
Re: [R] Problem with comparing multiple data sets

2015-05-29 Thread Mohammad Alimohammadi

Hi everyone.

I tried the modeest package on my initial test data and it worked.
However, it doesn't work on the entire data set. I saved one of the
portions that gives an error (not for all of the values, but for some of
them). For example, lines 36, 37 and 39 correctly show the mode value,
but lines 38 and 40 are not correct. The same error is repeated for many
of the values.

[36,] 2
[37,] 2
[38,] Numeric,3
[39,] 1
[40,] Numeric,3



#This is what I did:
 df <- read.csv(file="Part1-modif.csv", head=TRUE, sep=",")
 Out <- apply(df[,2:length(df)],1, mfv)
 t(t(Out))


#This is the data set

structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label =
c("#authentication,access control",
"#privacy,personal data", "#security,malicious,security",
"data controller",
"id management,security", "password,recovery"), class = "factor"),
class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), class.2 = c(2L, 2L, 2L,
0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 2L), class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L,
2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("terms",
"class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA,
-50L))



Also, when I try to add the terms to the result, I get an error:

 mode.names <- data.frame(df[,1], Out)
Error in data.frame(df[, 1], Out) :
arguments imply differing number of rows: 50, 3







On Thu, May 28, 2015 at 9:24 AM, Mohammad Alimohammadi 
mxalimoha...@ualr.edu wrote:

 Thank you David for your help !

 On Wed, May 27, 2015 at 7:31 PM, David L Carlson dcarl...@tamu.edu
 wrote:

  cat(paste0("[", 1:length(Out), "] #dac ", Out), sep="\n")

  David

 *From:* Mohammad Alimohammadi [mailto:mxalimoha...@ualr.edu]
 *Sent:* Wednesday, May 27, 2015 2:29 PM
 *To:* David L Carlson; r-help@r-project.org

 *Subject:* Re: [R] Problem with comparing multiple data sets



 Thanks David it worked !



 One more thing. I hope it's not complicated. Is it also possible to
 display the terms for each row next to it?



 for example:



 [1] #dac2

 [2] #dac0

 [3] #dac1

 ...









 On Wed, May 27, 2015 at 2:18 PM, David L Carlson dcarl...@tamu.edu
 wrote:

 Save the result of the apply() function:

 Out <- apply(df[ ,2:length(df)], 1, mfv)

 Then there are several options:

 Approximately what you asked for
 data.frame(Out)
 t(t(Out))

 More typing but exactly what you asked for
 cat(paste0("[", 1:length(Out), "] ", Out), sep="\n")
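
 As a rough illustration, the three options above can be tried on a short
 made-up vector. This is only a sketch: the vector Out below is a
 hypothetical stand-in for the result of apply(), not the thread's data.

```r
Out <- c(0, 2, 1)   # hypothetical stand-in for the result of apply()

# Option 1: a one-column data frame, one value per row
as_df <- data.frame(Out)
as_df

# Option 2: t() of a vector gives a 1 x n matrix;
# transposing again gives an n x 1 matrix
as_col <- t(t(Out))
dim(as_col)

# Option 3: build the "[i] value" lines explicitly and print one per line
lines_out <- paste0("[", seq_along(Out), "] ", Out)
cat(lines_out, sep = "\n")
```

 The cat() version only prints; if the values are needed later, keeping
 Out itself (or the data frame) is probably more convenient.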


 David L. Carlson
 Department of Anthropology
 Texas A&M University




Re: [R] Problem with comparing multiple data sets

2015-05-29 Thread John Kane

Hi Mohammad,
I have no idea what is happening, but for some reason your new data (renamed df1, 
since df is also the name of a function in R) is producing a list, whereas dff2 (your 
original test data) gives a vector, as you wanted.

It may be obvious, but I don't see why df1 is giving us a list.  As far as I can 
tell, the two data sets are structurally the same.
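
A possible explanation, not confirmed anywhere in the thread: mfv() appears to return every value that is tied for most frequent, so a row whose three classes all differ (for example 2, 1, 0, as in rows 38 and 40 of the new data) yields three values instead of one. apply() can only simplify its result to a vector when every row returns the same number of values; otherwise it returns a list, which would print as "Numeric,3". The sketch below uses base R only. The helper all_modes() is a hypothetical stand-in written to mimic that assumed tie behaviour, and the three-row data frame is made up.

```r
# Hypothetical stand-in for mfv(): returns ALL values tied for most
# frequent (this mirrors the tie behaviour assumed above).
all_modes <- function(x) {
  tab <- table(x)
  as.numeric(names(tab)[tab == max(tab)])
}

# Made-up data: rows 1 and 2 have a clear mode, row 3 is a three-way tie.
toy <- data.frame(class.1 = c(2L, 1L, 2L),
                  class.2 = c(2L, 1L, 1L),
                  class.3 = c(0L, 0L, 0L))

out_list <- apply(toy, 1, all_modes)
str(out_list)        # a list, because row 3 returns three values

# One way to force a single value per row: flag ties as NA.
one_mode <- function(x) {
  m <- all_modes(x)
  if (length(m) == 1) m else NA_real_
}
out_vec <- apply(toy, 1, one_mode)
out_vec              # a plain numeric vector: 2, 1, NA
```

If this is the cause, it would also account for the "differing number of rows: 50, 3" error, since data.frame() would be given a list whose elements have different lengths. Whether ties should become NA, or be broken some other way, is a decision about the data rather than about R.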

The two data sets are below the program.  
## =
library(modeest)

# Original test data 
str(dff2)
head(dff2)

# sample of new data
str(df1)
head(df1)

Out.dff2 <- apply(dff2[ ,2:length(dff2)], 1, mfv)
str(Out.dff2)

Out.df1 <- apply(df1[ , 2:length(df1)], 1, mfv)
str(Out.df1)


## =
## New data set 
df1 <- structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label =
c("#authentication,access control",
"#privacy,personal data", "#security,malicious,security", "data controller",
"id management,security", "password,recovery"), class = "factor"),
class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), class.2 = c(2L, 2L, 2L,
0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 2L), class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L,
2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("terms",
"class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA,
-50L))

## Original test data set

dff2 <- structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L,
 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac",
 "#mac,#security",
 "accountability,anonymous", "data security,encryption,security"
 ), class = "factor"), class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L,
 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L,
 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L,
 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
 class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L,
 0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c("terms", "class.1",
 "class.2", "class.3"), class = "data.frame", row.names = c(NA,
 -49L))

##=



John Kane
Kingston ON Canada

-Original Message-
From: mxalimoha...@ualr.edu
Sent: Fri, 29 May 2015 11:40:41 -0500
To: dcarl...@tamu.edu, drjimle...@gmail.com, jrkrid...@inbox.com, 
r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets


Re: [R] Problem with comparing multiple data sets

2015-05-29 Thread Jim Lemon

Hi Mohammad,
It looks like you are still having problems with this. Given your
latest data set, as below, here is something that might do what you
want. From David's message, I'm not sure whether you are operating on
a single data frame or a list.

# this is the data set as taken from your message below
madf<-structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label =
c("#authentication,access control",
"#privacy,personal data", "#security,malicious,security", "data controller",
"id management,security", "password,recovery"), class = "factor"),
class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L), class.2 = c(2L, 2L, 2L,
0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L,
2L, 2L), class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L,
2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("terms",
"class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA,
-50L))

# define a function that extracts the value from one field
# selected by a value in another field
extract_by_value<-function(x,field1,value1,field2) {
 return(x[x[,field1]==value1,field2])
}

# define another function that equates all of the values
sub_value<-function(x,field1,value1,field2,value2) {
 x[x[,field1]==value1,field2]<-value2
 return(x)
}

# this now steps through every value in key_field
# and operates on every field listed in change_fields
conformity<-function(x,key_field,change_fields) {
 keys<-unique(x[,key_field])
 for(key in keys) {
  for(change_field in change_fields) {
   # get the most frequent value in change_field
   # for the desired value in key_field
   most_freq<-as.numeric(names(which.max(table(
    extract_by_value(x,key_field,key,change_field)))))
   # now set all the values to the most frequent
   x<-sub_value(x,key_field,key,change_field,most_freq)
  }
 }
 return(x)
}

conformity(madf,"terms",c("class.1","class.2","class.3"))

Obviously you will want to save the return value of conformity into
your original data frame or create a new one.
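
To see what this kind of "conforming" does, here is a small sketch on made-up data. It is not the conformity() function above but an alternative using base R's ave(), which applies a function within each group and returns a vector of the original length. The names most_frequent and toy are hypothetical, and ties go to the smallest value because which.max() returns the first maximum of the sorted table.

```r
# Most frequent value of x; ties resolve to the smallest value because
# table() sorts its names and which.max() returns the first maximum.
most_frequent <- function(x) {
  as.numeric(names(which.max(table(x))))
}

# Made-up data: two terms, two class columns
toy <- data.frame(terms   = c("a", "a", "a", "b", "b"),
                  class.1 = c(0, 0, 2, 1, 1),
                  class.2 = c(2, 2, 0, 1, 0),
                  stringsAsFactors = FALSE)

conformed <- toy
for (field in c("class.1", "class.2")) {
  # Within each term, replace every value by that term's most frequent value
  conformed[[field]] <- ave(toy[[field]], toy$terms, FUN = most_frequent)
}
conformed
```

Note that this works down each column within a term, which is a different question from finding the most frequent class across the three columns of a single row.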

Jim


Re: [R] Problem with comparing multiple data sets

2015-05-28 Thread John Kane

Lovely solution, Mohammad. I had not even heard of the modeest package.

For names, I'd just create another data.frame:

mode.names <- data.frame(df[,1], Out)

John Kane
Kingston ON Canada


 
 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org]
 On Behalf Of Mohammad Alimohammadi
 Sent: Wednesday, May 27, 2015 1:47 PM
 To: John Kane; r-help@r-project.org
 Subject: Re: [R] Problem with comparing multiple data sets
 
 Ok. so I read about the (modeest) package that gives the results that I
 am looking for (most repeated value).
 
 I modified the data frame a little and moved the text to the first
 column.
 This is the data frame with all 3 possible classes for each term.
 
 =
 structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L,
 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac",
 "#mac,#security",
 "accountability,anonymous", "data security,encryption,security"
 ), class = "factor"), class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L,
 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L,
 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L,
 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
 class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L,
 0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c("terms", "class.1",
 "class.2", "class.3"), class = "data.frame", row.names = c(NA,
 -49L))
 =
 #Then I applied the function below:
 
 ==
 library(modeest)
 df <- read.csv(file="short.csv", head=TRUE, sep=",")
 apply(df[ ,2:length(df)], 1, mfv)
 
 
 # It gives the most frequent value for each row, which is what I need. The
 # only problem is that all the values are displayed in one single row.
 
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 2 1 1 1 1 0 0 0 0 2 1 2
 
 It would be much better to show them in separate rows.
 For example:
 
  [1] 0
 
  [2] 0
 
  [3] 1
 
 
 Any idea how to do this?
 
 
 
 On Wed, May 27, 2015 at 10:11 AM, Mohammad Alimohammadi
 mxalimoha...@ualr.edu wrote:
 
 Hi Jim,
 
 Thank you for your advice.
 
 I'm not sure exactly how to incorporate this function, though. I added a
 portion of the actual data sets. All 3 data sets have the same items (text)
 with different class values, so I need to assign the most repeated class
 (0, 1, 2) to each text.
 
 For example: if line 1 has the text "aaa", it may be assigned to class 0 in
 dat1, 2 in dat2 and 0 in dat3. In this case "aaa" will be assigned to
 0 (the most repeated value). The same goes for each text.
 
 I really appreciate your help.
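
 The rule described above (take the class that most of the three data sets
 agree on) could be sketched in base R roughly as follows. The data frames
 are made up and far smaller than the real ones, the sketch assumes the rows
 of dat1, dat2 and dat3 are in the same order, and majority() is a
 hypothetical helper that returns NA when all three classes differ.

```r
# Made-up stand-ins for the three data sets
dat1 <- data.frame(terms = c("aaa", "bbb", "ccc"), class.1 = c(0L, 1L, 2L),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(terms = c("aaa", "bbb", "ccc"), class.2 = c(2L, 1L, 1L),
                   stringsAsFactors = FALSE)
dat3 <- data.frame(terms = c("aaa", "bbb", "ccc"), class.3 = c(0L, 2L, 0L),
                   stringsAsFactors = FALSE)

# Assumption: rows line up across the three data sets.
stopifnot(identical(dat1$terms, dat2$terms),
          identical(dat1$terms, dat3$terms))

combined <- data.frame(terms   = dat1$terms,
                       class.1 = dat1$class.1,
                       class.2 = dat2$class.2,
                       class.3 = dat3$class.3,
                       stringsAsFactors = FALSE)

# The single most frequent class in a row, or NA if there is a tie
majority <- function(x) {
  tab <- table(x)
  top <- names(tab)[tab == max(tab)]
  if (length(top) == 1) as.integer(top) else NA_integer_
}

combined$class <- apply(combined[, c("class.1", "class.2", "class.3")],
                        1, majority)
combined
```

 Here "aaa" (0, 2, 0) gets class 0 and "bbb" (1, 1, 2) gets class 1, while
 "ccc" (2, 1, 0) has no majority and is left as NA.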
 
 =
 
 *dat1*
 
 structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
 c("#dac",
 "#mac,#security", "accountability,anonymous",
 "data security,encryption,security"
 ), class = "factor")), .Names = c("class.1", "terms"), class =
 "data.frame"

Re: [R] Problem with comparing multiple data sets

2015-05-27 Thread John Kane

Hi Mohammad, 

I went back and reread your original statement of the problem, and I think 
I kinda grasp it. It is actually quite clear, and I had misunderstood it completely.

At the moment I have no idea how to approach it.  As Jim Lemon said, it looks 
easy but may not be.  I'll go back and re-examine Jim's approach.

You might want to create three sample data sets of the original data layouts 
and upload them, in dput() format, to the list.  It may be easier to tackle 
from that approach.

In any case, in the existing data set is a 2 a numeric value 2 or just an 
on/off indicator?  

John Kane
Kingston ON Canada


 -Original Message-
 From: mxalimoha...@ualr.edu
 Sent: Tue, 26 May 2015 20:11:08 -0500
 To: r-help@r-project.org
 Subject: Re: [R] Problem with comparing multiple data sets
 
 Thank you John. Yes, as you mentioned, this is not really what I am
 looking for.
 
 It's interesting because I was really thinking that it should be pretty
 easy. All I need to do is compare class1, class2 and class3 for each
 text and put the most frequent number next to it in each row, then repeat
 that for all the rows. Apparently it's not that simple.
 
 Sorry, I didn't notice that I sent it only to you! Thanks for letting me
 know.
 
 I would appreciate it if anybody can help with this.
 
 Thank you.
 
 
 
 
 On Tue, May 26, 2015 at 7:27 PM, John Kane jrkrid...@inbox.com wrote:
 
 Hi Mohammad,
 
 The data came through beautifully despite the fact that you posted in
 HTML.  Please, post in plain text.
 
 Oh, just as I was ready to push Send, I noticed you only replied to me.
 You really should reply to the R-help list since there are a lot more
 and better people to help there. Besides, it's a world-wide list. Others
 can play with the problem while we sleep :) .
 
 I will just reply to you but I really suggest sending all of this to the
 list.
 
 Now I am wondering what to do with the data. As a first swipe I just
 added up all the values in each class by each text value. Results are
 below. Not what you want by any means but perhaps a small step.
 
 Then I started to think: are we really interested in the sum, or should we
 be looking at incidence, that is, should we be looking at the frequency
 rather than the sum?
 
 Is
 class.1 class.2 class.3 terms
       0       2       0 #dac
 
 a value of 2 (sum) or a hit of 1 (count or freq)?
 
 Anyway, below is what I have tried so far -- it may not be anywhere near
 what you want, but if it makes any sense then I think we just need to
 pick off the highest values for each combination of terms and class to
 give you what you want.
 
 I suspect our real data-munging gurus can do  all this faster and better
 than I can but hopefully it is a start.
 
 Where your data set is dat1
 #=
 # If reshape2 is not installed.
 install.packages("reshape2")
 #=
 
 library(reshape2)
 mdat <- melt(dat1, id.vars = c("terms"),
              variable.name = "class",
              value.name = "value",
              na.rm = FALSE)
 
 mdat1 <- aggregate(value ~ terms + class, data = mdat, sum)
 
 mdat1[order(mdat1$terms, mdat1$class), ]
 
 #=
 
 
 John Kane
 Kingston ON Canada
 
 -Original Message-
 From: mxalimoha...@ualr.edu
 Sent: Tue, 26 May 2015 09:50:43 -0500
 To: jrkrid...@inbox.com
 Subject: Re: [R] Problem with comparing multiple data sets
 
 Thank you John for being patient with me.
 
 My original post was about comparing 3 sets of data which differed in
 their class value for the same text. However, I thought it might be
 easier to combine those 3 data sets into one that shows the 3 different
 classes and then find the most frequent class value for the text. So
 that's what I did. Now I only want to add the most frequent class value
 in a new column.
 
 I tried to create a dput version of the data set (only a small part of
 it) so you can see. I hope it works.
 
 Tweet1 <- read.csv(file="part1_complete.csv", head=TRUE, sep=",")
 
 dput(head(Tweet1, 100))
 
 structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
 
 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 0L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
 
 1L, 2L, 1L, 1L, 1L, 0L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
 
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), class.2 = c(2L,
 
 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 
 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
 
 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L,
 
 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
 
 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 
 1L, 1L, 1L), class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 
 0L, 0L, 0L, 0L, 0L, 0L

Re: [R] Problem with comparing multiple data sets

2015-05-27 Thread Mohammad Alimohammadi

Thanks John,

I really hope it can be answered. Yes all 3 data sets have the same items.

On Wed, May 27, 2015 at 9:32 AM, John Kane jrkrid...@inbox.com wrote:

 Thanks Mohammad.
 The data appear to have come through just fine. This probably means you
 can ignore some of the questions I just sent you -- our emails are crossing.

 I probably will not get a chance  to look at this til this afternoon
 (10:25 here now). We can hope someone with more skill than I have will have
 solved the problem by then.

 This is starting to sound a bit like a psychometric inter-rater
 reliability study.  Does each data set contain the same set of items ?


 John Kane
 Kingston ON Canada

 -Original Message-
 From: mxalimoha...@ualr.edu
 Sent: Wed, 27 May 2015 09:18:12 -0500
 To: jrkrid...@inbox.com, r-help@r-project.org
 Subject: Re: [R] Problem with comparing multiple data sets

 Hi John,

 I created the original data set with dput. This time I only loaded 50
 values for each data set (dat1, dat2, dat3).

 About your question: 0, 1 and 2 are all indicators of a specific class. The
 task is to compare 3 independent classifications of a certain term and
 determine the actual class of the term by finding the most frequently
 assigned number for that term.

 I thought it might be easier to combine them into 1 data frame but either
 way is fine.

 Let me know if it shows up clean. I saved the dput in txt file and copied
 here from that file. I assume this is the right way to do it. I might be
 wrong.

 ==

 dat1

 structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,

 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,

 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,

 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L,

 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
 c("#dac",

 "#mac,#security", "accountability,anonymous",
 "data security,encryption,security"

 ), class = "factor")), .Names = c("class.1", "terms"), class =
 "data.frame", row.names = c(NA,

 -49L))

 dat2

 structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L,

 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,

 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,

 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), terms = structure(c(1L, 1L,

 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
 c("#dac",

 "#mac,#security", "accountability,anonymous",
 "data security,encryption,security"

 ), class = "factor")), .Names = c("class.2", "terms"), class =
 "data.frame", row.names = c(NA,

 -49L))

 dat3

 structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,

 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,

 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,

 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L), terms = structure(c(1L, 1L,

 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,

 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
 c("#dac",

 "#mac,#security", "accountability,anonymous",
 "data security,encryption,security"

 ), class = "factor")), .Names = c("class.3", "terms"), class =
 "data.frame", row.names = c(NA,

 -49L))

 =

 On Wed, May 27, 2015 at 8:05 AM, John Kane jrkrid...@inbox.com wrote:


Re: [R] Problem with comparing multiple data sets

2015-05-27 Thread John Kane

Hi Mohammad,

My mantra for the day is "Plain Text, Plain Text". À bas HTML.
And I really need to get out of here.  

I have not found a solution but is this a bit more like what you want?

#===

dat1 <- structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
c("#dac",
"#mac,#security", "accountability,anonymous",
"data security,encryption,security"
), class = "factor")), .Names = c("class.1", "terms"), class =
"data.frame", row.names = c(NA,
-49L))

dat2 <- structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), terms = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
c("#dac",
"#mac,#security", "accountability,anonymous",
"data security,encryption,security"
), class = "factor")), .Names = c("class.2", "terms"), class =
"data.frame", row.names = c(NA,
-49L))

dat3 <- structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
c("#dac",
"#mac,#security", "accountability,anonymous",
"data security,encryption,security"
), class = "factor")), .Names = c("class.3", "terms"), class =
"data.frame", row.names = c(NA,
-49L))

names(dat1) <- names(dat2) <- names(dat3) <- c("class", "term")

bbind <- rbind(dat1, dat2, dat3)

with(bbind, table(term, class))

#=
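
A cross-tabulation like the one above can be reduced to one class per term by taking, for each row of the table, the column with the highest count. The sketch below uses made-up data with the same two column names (class, term); the object names tab and best are hypothetical, and ties go to the first (lowest) class because which.max() returns the first maximum.

```r
# Made-up stacked data in the same layout as bbind
bbind <- data.frame(class = c(0L, 0L, 2L, 1L, 1L, 2L),
                    term  = c("#dac", "#dac", "#dac", "#mac", "#mac", "#mac"),
                    stringsAsFactors = FALSE)

# Counts of each class within each term
tab <- with(bbind, table(term, class))
tab

# For each term (row), take the class (column name) with the largest count
best <- apply(tab, 1, function(counts) names(counts)[which.max(counts)])
best
```

The result is a character vector named by term; as.integer(best) turns the class labels back into numbers. Note that this counts over all rows sharing a term, not over the three classifications of a single row.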

John Kane
Kingston ON Canada

-Original Message-
From: mxalimoha...@ualr.edu
Sent: Wed, 27 May 2015 09:37:24 -0500
To: jrkrid...@inbox.com, r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Thanks John,

I really hope it can be answered. Yes all 3 data sets have the same items.

On Wed, May 27, 2015 at 9:32 AM, John Kane jrkrid...@inbox.com wrote:

Thanks Mohammad.
 The data appear to have come through just fine. This probably means you can 
ignore some of the questions I just sent you -- our emails are crossing.

 I probably will not get a chance  to look at this til this afternoon (10:25 
here now). We can hope someone with more skill than I have will have solved the 
problem by then.

 This is starting to sound a bit like a psychometric inter-rater reliability 
study.  Does each data set contain the same set of items ?

 John Kane
 Kingston ON Canada


Re: [R] Problem with comparing multiple data sets

2015-05-27 Thread John Kane

Thanks Mohammad. 
The data appear to have come through just fine. This probably means you can 
ignore some of the questions I just sent you -- our emails are crossing. 

I probably will not get a chance  to look at this til this afternoon (10:25 
here now). We can hope someone with more skill than I have will have solved the 
problem by then.

This is starting to sound a bit like a psychometric inter-rater reliability 
study.  Does each data set contain the same set of items ?


John Kane
Kingston ON Canada

-Original Message-
From: mxalimoha...@ualr.edu
Sent: Wed, 27 May 2015 09:18:12 -0500
To: jrkrid...@inbox.com, r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Hi John,

I created the original data set with dput . This time I only loaded 50 values 
for each data set (dat1, dat2, dat3).

About your question: 0, 1 and 2 are each an indicator of a specific class. The task 
is to compare 3 independent classifications of a given term and determine 
the actual class of the term by finding the most frequently assigned number for 
that term.

I thought it might be easier to combine them into 1 data frame but either way 
is fine.

Let me know if it shows up clean. I saved the dput in txt file and copied here 
from that file. I assume this is the right way to do it. I might be wrong.

==

dat1

structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
c("#dac", "#mac,#security", "accountability,anonymous",
"data security,encryption,security"), class = "factor")),
.Names = c("class.1", "terms"), class = "data.frame",
row.names = c(NA, -49L))

dat2

structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), terms = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
c("#dac", "#mac,#security", "accountability,anonymous",
"data security,encryption,security"), class = "factor")),
.Names = c("class.2", "terms"), class = "data.frame",
row.names = c(NA, -49L))

dat3

structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
c("#dac", "#mac,#security", "accountability,anonymous",
"data security,encryption,security"), class = "factor")),
.Names = c("class.3", "terms"), class = "data.frame",
row.names = c(NA, -49L))

=

On Wed, May 27, 2015 at 8:05 AM, John Kane jrkrid...@inbox.com wrote:

Hi Mohammad,

 I went back and reread your original statement of the problem and I 
think I kinda grasp it. It is actually quite clear and I misunderstood it 
completely.

 At the moment I have no idea how to approach it.  As Jim Lemon said, it looks 
easy but may not be.  I'll go back and re-examine Jim's approach.

 You might want to create three sample data sets of the original data layouts 
and upload them, in dput() format, to the list.  It may be easier to tackle 
from that approach.

 In any case, in the existing data set is a 2 a numeric value 2 or just an 
on/off indicator?

 John Kane
 Kingston ON Canada

  -Original Message-
  From: mxalimoha...@ualr.edu

 Sent: Tue, 26 May 2015 20:11:08 -0500
  To: r-help@r-project.org
  Subject: Re: [R] Problem with comparing multiple data sets
 
  Thank you John. Yes. as you mentioned this is not really what I am
  looking
  for.
 
  It's interesting because I was really thinking that it should be pretty
  easy. All I need to do is just compare class1, class2 and class3 for each
  text and put the most frequent number next to it in each row. Repeat it
  for
  all the rows. Apparently it's not that simple.
 
  Sorry I didn't notice that I sent it only to you! Thanks for letting me
  know.
 
  I appreciate if anybody can help on this.
 
  Thank you.
 
 
 
 
  On Tue, May 26, 2015 at 7:27 PM, John Kane jrkrid...@inbox.com wrote:
 
  Hi Mohammad,
 
  The data came through beautifully despite the fact that you posted in
  HTML.  Please, post

Re: [R] Problem with comparing multiple data sets

2015-05-27 Thread John Kane

I was wondering about the layout of each of your data sets. I cobbled together 
what I think are the most likely scenarios.  My bet is that the data sets most 
closely resemble my data set 4 in structure. Am I correct?  I dropped the other 
two columns in your data layout as likely to be immaterial to the problem.

data set 1 (unique text and class)
class text
0 text1
2 text2
1 text3
2 text4

data set 2 (unique class, multiple text)
class text
0 text1
0 text1
0 text1
2 text2
1 text3
2 text4

data set 3 (multiple classes, multiple text)
class text
0 text1
0 text1
1 text1
2 text2
1 text3
2 text4

data set 4 (multiple classes, multiple text, text not found in other data sets)
class text
0 text1
0 text1
1 text1
2 text2
1 text3
2 text4
2 text6
0 text6

John Kane
Kingston ON Canada


 -Original Message-
 From: mxalimoha...@ualr.edu
 Sent: Tue, 26 May 2015 20:11:08 -0500
 To: r-help@r-project.org
 Subject: Re: [R] Problem with comparing multiple data sets
 
 Thank you John. Yes. as you mentioned this is not really what I am
 looking
 for.
 
 It's interesting because I was really thinking that it should be pretty
 easy. All I need to do is just compare class1, class2 and class3 for each
 text and put the most frequent number next to it in each row. Repeat it
 for
 all the rows. Apparently it's not that simple.
 
 Sorry I didn't notice that I sent it only to you! Thanks for letting me
 know.
 
 I appreciate if anybody can help on this.
 
 Thank you.
 
 
 
 
 On Tue, May 26, 2015 at 7:27 PM, John Kane jrkrid...@inbox.com wrote:
 
 Hi Mohammad,
 
 The data came through beautifully despite the fact that you posted in
 HTML.  Please, post in plain text.
 
 Oh, just as I was ready to push Send, I  noticed you only replied to me.
 You really should reply to the R-help list since there are a lot more
 and
 better people to help there. Besides it's a world-wide list. Others can
 play with the problem while we sleep :) .
 
 I will just reply to you but I really suggest sending all of this to the
 list.
 
 Now I am wondering what to do with the data. As a first swipe I just
 added
 up all the values in each class by each text value. Results are below.
 Not
 what you want by any means but perhaps a small step.
 
 Then I started to think are we really interested in the sum or should we
 be looking at incidence, that is should we be looking at the frequency
 rather than the sum?
 
 Is
 class.1 class.2 class.3 terms
       0       2       0 #dac
 
 a value of 2 (sum) or a hit of 1 (count or freq) ?
 
 Anyway below is what I have tried so far -- it may not be anywhere near
 what you want but if it makes any sense then I think we just need to
 pick
 off the highest values for each combination of terms and class to give
 you
 what you want.
 
 I suspect our real data-munging gurus can do  all this faster and better
 than I can but hopefully it is a start.
 
 Where your data set is dat1
 #=
 # If reshape2 is not installed.
 install.packages("reshape2")
 #=
 
 library(reshape2)
 # Reshape to long form: one row per (terms, class column) pair
 mdat <- melt(dat1, id.vars = c("terms"),
              variable.name = "class",
              value.name = "value",
              na.rm = FALSE)
 
 # Sum the values for each combination of terms and class
 mdat1 <- aggregate(value ~ terms + class, data = mdat, sum)
 
 mdat1[order(mdat1$terms, mdat1$class), ]
 
 #=
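
To illustrate the sum-versus-frequency question raised above, here is a small sketch that computes both for comparison. It uses base R only rather than reshape2, and `wide`, `long`, `sums` and `freq` are made-up names and data for this example, not objects from the thread.

```r
# Made-up wide data: one row per item, three independent class columns
wide <- data.frame(terms   = c("#dac", "#dac", "#mac"),
                   class.1 = c(0L, 0L, 2L),
                   class.2 = c(2L, 2L, 1L),
                   class.3 = c(0L, 2L, 1L))

# Stack the class columns into long form (roughly what melt() does)
long <- data.frame(terms = rep(wide$terms, times = 3),
                   value = c(wide$class.1, wide$class.2, wide$class.3))

# Sum of the codes per term: treats 0/1/2 as quantities
sums <- aggregate(value ~ terms, data = long, sum)

# Frequency of each code per term: treats 0/1/2 as labels
freq <- table(long$terms, long$value)

sums
freq
```

If 0, 1 and 2 are only class labels, the frequency table is the one that answers "which class was assigned most often"; the sum mixes the labels together.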
 
 
 John Kane
 Kingston ON Canada
 
 -Original Message-
 From: mxalimoha...@ualr.edu
 Sent: Tue, 26 May 2015 09:50:43 -0500
 To: jrkrid...@inbox.com
 Subject: Re: [R] Problem with comparing multiple data sets
 
 Thank you John for being patient with me.
 
 My original post was to compare 3 sets of data which had difference in
 their class value for the same text. However, I thought it might be
 easier
 to combine those 3 data sets into one that shows the 3 different classes
 and then find the most frequent class value for the text. So that's what
 I
 did. Now I only want to add the most frequent class value in a new
 column.
 
 I tried to create a dput version of the data set (Only a small part of
 it)
 so you can see. I hope it works.
 
 Tweet1 <- read.csv(file = "part1_complete.csv", header = TRUE, sep = ",")
 
 dput(head(Tweet1, 100))
 
 structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 0L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
 1L, 2L, 1L, 1L, 1L, 0L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), class.2 = c(2L,
 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L

Re: [R] Problem with comparing multiple data sets

2015-05-27 Thread Mohammad Alimohammadi

Hi John,

I created the original data set with dput . This time I only loaded 50
values for each data set (dat1, dat2, dat3).

About your question: 0, 1 and 2 are each an indicator of a specific class. The
task is to compare 3 independent classifications of a given term and
determine the actual class of the term by finding the most frequently
assigned number for that term.

I thought it might be easier to combine them into 1 data frame but either
way is fine.

Let me know if it shows up clean. I saved the dput in txt file and copied
here from that file. I assume this is the right way to do it. I might be
wrong.


==

*dat1*

structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
c("#dac", "#mac,#security", "accountability,anonymous",
"data security,encryption,security"), class = "factor")),
.Names = c("class.1", "terms"), class = "data.frame",
row.names = c(NA, -49L))


*dat2*

structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), terms = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
c("#dac", "#mac,#security", "accountability,anonymous",
"data security,encryption,security"), class = "factor")),
.Names = c("class.2", "terms"), class = "data.frame",
row.names = c(NA, -49L))

*dat3*

structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
c("#dac", "#mac,#security", "accountability,anonymous",
"data security,encryption,security"), class = "factor")),
.Names = c("class.3", "terms"), class = "data.frame",
row.names = c(NA, -49L))

=










Re: [R] Problem with comparing multiple data sets

2015-05-27 Thread Mohammad Alimohammadi

 ("text1","text2","text3"))
 df3<-data.frame(Class=c(2,1,0),Comment=c("com1","com2","com3"),
  Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
 dflist<-list(df1,df2,df3)
 dflist

 # define a function that extracts the value from one field
 # selected by a value in another field
 extract_by_value<-function(x,field1,value1,field2) {
  return(x[x[,field1]==value1,field2])
 }

 # define another function that equates all of the values
 sub_value<-function(x,field1,value1,field2,value2) {
  x[x[,field1]==value1,field2]<-value2
  return(x)
 }

 conformity<-function(x,fieldname1,value1,fieldname2) {
  # get the most frequent value in fieldname2
  # for the desired value in fieldname1
  most_freq<-as.numeric(names(which.max(table(unlist(lapply(x,
   extract_by_value,fieldname1,value1,fieldname2))))))
  # now set all the values to the most frequent
  for(i in 1:length(x))
   x[[i]]<-sub_value(x[[i]],fieldname1,value1,fieldname2,most_freq)
  return(x)
 }

 conformity(dflist,"Text","text1","Class")
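
The call above handles a single text value. A rough sketch of how the same idea might be repeated for every text value follows. The data frames and the helper `most_frequent` are hypothetical, written only for this illustration, and the code assumes all data frames contain the same Text values.

```r
# Hypothetical data in the layout from the original question
df1 <- data.frame(Class = c(0, 2, 1), Text = c("text1", "text2", "text3"))
df2 <- data.frame(Class = c(0, 2, 2), Text = c("text1", "text2", "text3"))
df3 <- data.frame(Class = c(2, 1, 2), Text = c("text1", "text2", "text3"))
dflist <- list(df1, df2, df3)

# Most frequent Class for one Text value, pooled over all data frames
most_frequent <- function(x, text) {
  votes <- unlist(lapply(x, function(d) d$Class[d$Text == text]))
  as.numeric(names(which.max(table(votes))))
}

# Repeat for every distinct Text value and overwrite Class in each data frame
for (txt in unique(as.character(df1$Text))) {
  winner <- most_frequent(dflist, txt)
  dflist <- lapply(dflist, function(d) {
    d$Class[d$Text == txt] <- winner
    d
  })
}
dflist[[3]]
```

After the loop every data frame in the list carries the same, majority-voted Class for each Text; a tie is resolved in favour of the lowest class value because which.max() returns the first maximum.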

 Jim

 On Sat, May 23, 2015 at 11:23 PM, John Kane jrkrid...@inbox.com wrote:
  Hi Mohammad
 
  Welcome to the R-help list.
 
  There probably is a fairly easy way to what you want but I think we
 probably need a bit more background information on what you are trying to
 achieve.  I know I'm not exactly clear on your decision rule(s).
 
  It would also be very useful to see some actual sample data in usable
 R format. Have a look at these links
 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 and http://adv-r.had.co.nz/Reproducibility.html for some hints on what
 you might want to include in your question.
 
  In particular, read up about dput()  in those links and/or see ?dput.
 This is the generally preferred way to supply sample or illustrative data
 to the R-help list.  It basically creates a perfect copy of the data as it
 exists on 'your' machine so that R-help readers see exactly what you do.
 
 
 
 
 
 
 
  John Kane
  Kingston ON Canada
 
 
  -Original Message-
  From: mxalimoha...@ualr.edu
  Sent: Fri, 22 May 2015 12:37:50 -0500
  To: r-help@r-project.org
  Subject: [R] Problem with comparing multiple data sets
 
  Hi everyone,
 
  I am very new to R and I have a task to do. I appreciate any help. I
 have
  3
  data sets. Each data set has 4 columns. For example:
 
  Class  Comment  Term  Text
  0      com1     aac   text1
  2      com2     aax   text2
  1      com3     vvx   text3
 
  Now I need to compare the class column across the 3 data sets and assign
  the most frequent class to each text. For example, if text1 is assigned to
  class 0 in data sets 1 and 2 but to class 2 in data set 3, then it should
  be assigned to class 0. If all three agree, the class stays the same.
  The ideal thing would be to keep the same format and just update the
  class. Is there any easy way to do this?
 
  Thanks a lot.
 
[[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  




 --
 Mohammad Alimohammadi | Graduate Assistant
 University of Arkansas at Little Rock | College of Science and Mathematics
 (CSAM)
 | mxalimoha...@ualr.edu | ualr.edu

 Public URL: http://scholar.google.com/citations?user=MsfN_i8J




-- 
Mohammad Alimohammadi | Graduate Assistant
University of Arkansas at Little Rock | College of Science and Mathematics
(CSAM)
501.346.8007 | mxalimoha...@ualr.edu | ualr.edu

Public URL: http://scholar.google.com/citations?user=MsfN_i8J


Re: [R] Problem with comparing multiple data sets

2015-05-27 Thread David L Carlson

Save the result of the apply() function:

Out <- apply(df[ ,2:length(df)], 1, mfv)

Then there are several options:

Approximately what you asked for
data.frame(Out)
t(t(Out))

More typing but exactly what you asked for
cat(paste0("[", 1:length(Out), "] ", Out), sep="\n")
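
One caveat, offered as a sketch rather than a fix to the code above: as far as I know, mfv() returns every tied value when a row has no single most frequent class, and apply() then returns a list instead of a plain vector. A base-R alternative that always returns exactly one value per row could look like this. `row_mode` is a name made up for this example, the data are invented, and breaking ties in favour of the smallest value is an arbitrary choice.

```r
# Row-wise mode in base R; on a tie the smallest value wins (arbitrary choice)
row_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Made-up example: the third row is a three-way tie
df <- data.frame(terms   = c("a", "b", "c"),
                 class.1 = c(0L, 2L, 1L),
                 class.2 = c(2L, 2L, 2L),
                 class.3 = c(0L, 1L, 0L))

Out <- apply(df[, 2:length(df)], 1, row_mode)
Out
```

Because every call returns a single number, Out stays an ordinary numeric vector and can be passed to data.frame() or cat() as shown above.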


David L. Carlson
Department of Anthropology
Texas A&M University


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mohammad 
Alimohammadi
Sent: Wednesday, May 27, 2015 1:47 PM
To: John Kane; r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Ok. so I read about the (modeest) package that gives the results that I
am looking for (most repeated value).

I modified the data frame a little and moved the text to the first column.
This is the data frame with all 3 possible classes for each term.

=
structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L,
4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac",
"#mac,#security", "accountability,anonymous",
"data security,encryption,security"), class = "factor"),
class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L,
2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L,
0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c("terms", "class.1",
"class.2", "class.3"), class = "data.frame", row.names = c(NA,
-49L))
=
#Then I applied the function below:

==
library(modeest)
df <- read.csv(file="short.csv", header=TRUE, sep=",")
apply(df[ ,2:length(df)], 1, mfv)


# It gives the most frequent value for each row which is what I need. The
only problem is that all the values are displayed in one single row.

 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 1 1 1 1 0 0 0 0 2 1 2

It would be much better to show them in separate rows.
For example:

 [1] 0

 [2] 0

 [3] 1


Any idea how to do this?




On Wed, May 27, 2015 at 10:11 AM, Mohammad Alimohammadi 
mxalimoha...@ualr.edu wrote:

 Hi Jim,

 Thank you for your advice.

 I'm not sure how to exactly incorporate this function though. I added a
 portion of the actual data sets. all 3 data sets have the same items (text)
 with different class values. So I need to assign the most repeated class
 (0,1,2) for each text.

 For example: if line1 has text aaa. It may be assigned to class 0 in
 dat1, 2 in dat 2 and 0 in dat3. in this case the aaa will be assigned to
 0 (most repeated value). So it goes for each text.

 I really appreciate your help.

 =

 *dat1*

 structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
 c("#dac", "#mac,#security", "accountability,anonymous",
 "data security,encryption,security"), class = "factor")),
 .Names = c("class.1", "terms"), class = "data.frame",
 row.names = c(NA, -49L))


 *dat2*

 structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,
 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), terms = structure(c(1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
 c("#dac", "#mac,#security", "accountability,anonymous",
 "data security,encryption,security"), class = "factor")),
 .Names = c("class.2", "terms"), class = "data.frame",
 row.names = c(NA, -49L))


 *dat3*

 structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
 c(#dac,
 #mac

Re: [R] Problem with comparing multiple data sets

2015-05-27 Thread Mohammad Alimohammadi

Thanks David it worked !

One more thing. I hope it's not complicated. Is it also possible to display
the terms for each row next to it?

for example:

[1] #dac2
[2] #dac0
[3] #dac1
...





Re: [R] Problem with comparing multiple data sets

2015-05-27 Thread David L Carlson

cat(paste0("[", 1:length(Out), "] #dac ", Out), sep="\n")
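
The line above prints the literal text #dac on every row. If the data contain more than one term, a variant that takes the term from the data frame itself may be closer to what was asked. This is only a sketch: `df` and `Out` below are made-up stand-ins for the objects in the thread, and `result` and `lines` are names introduced for the example.

```r
# Made-up stand-ins for the thread's df and Out
df <- data.frame(terms   = c("#dac", "#dac", "#mac,#security"),
                 class.1 = c(0L, 2L, 1L),
                 class.2 = c(2L, 2L, 1L),
                 class.3 = c(0L, 1L, 2L))
Out <- c(0, 2, 1)   # row-wise most frequent class, computed elsewhere

# Option 1: a two-column data frame with the term next to its class
result <- data.frame(terms = df$terms, class = Out)

# Option 2: printed lines in the "[i] term value" style
lines <- paste0("[", seq_along(Out), "] ", df$terms, " ", Out)
cat(lines, sep = "\n")
```

Option 1 is usually easier to work with afterwards, for example with write.csv(); Option 2 only changes how the result is displayed.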

David
From: Mohammad Alimohammadi [mailto:mxalimoha...@ualr.edu]
Sent: Wednesday, May 27, 2015 2:29 PM
To: David L Carlson; r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Thanks David it worked !

One more thing. I hope it's not complicated. Is it also possible to display the 
terms for each row next to it?

for example:

[1] #dac2
[2] #dac0
[3] #dac1
...




On Wed, May 27, 2015 at 2:18 PM, David L Carlson 
dcarl...@tamu.edu wrote:
Save the result of the apply() function:

Out <- apply(df[ ,2:length(df)], 1, mfv)

Then there are several options:

Approximately what you asked for
data.frame(Out)
t(t(Out))

More typing but exactly what you asked for
cat(paste0("[", 1:length(Out), "] ", Out), sep="\n")


David L. Carlson
Department of Anthropology
Texas A&M University


-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mohammad Alimohammadi
Sent: Wednesday, May 27, 2015 1:47 PM
To: John Kane; r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Ok. so I read about the (modeest) package that gives the results that I
am looking for (most repeated value).

I modified the data frame a little and moved the text to the first column.
This is the data frame with all 3 possible classes for each term.

=
structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L,
4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac",
"#mac,#security", "accountability,anonymous",
"data security,encryption,security"), class = "factor"),
class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L,
2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L,
0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c("terms", "class.1",
"class.2", "class.3"), class = "data.frame", row.names = c(NA,
-49L))
=
#Then I applied the function below:

==
library(modeest)
df <- read.csv(file="short.csv", head=TRUE, sep=",")
apply(df[ ,2:length(df)], 1, mfv)


# It gives the most frequent value for each row which is what I need. The
only problem is that all the values are displayed in one single row.

 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 1 1 1 1 0 0 0 0 2 1 2

It would be much better to show them in separate rows.
For example:

 [1] 0

 [2] 0

 [3] 1


Any idea how to do this?
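
The row-wise "most frequent value" step can also be sketched in base R without the modeest package. This is only an illustration: `row_mode` is a hypothetical helper (not part of any package), the three rows are toy values, and ties are resolved by taking the smallest class value, which is an assumption the thread does not settle. Because the helper always returns a single value per row, the result is a plain vector that can be stored as a new column and printed one row per line.

```r
# Hypothetical helper (not from the thread): most frequent value in a vector.
# table() sorts by value and which.max() takes the first maximum,
# so a tie goes to the smallest value (an assumption).
row_mode <- function(x) {
  counts <- table(x)
  as.integer(names(counts)[which.max(counts)])
}

# Toy data in the same layout as the combined data frame
df <- data.frame(
  terms   = c("#dac", "#dac", "accountability,anonymous"),
  class.1 = c(0L, 0L, 2L),
  class.2 = c(2L, 0L, 2L),
  class.3 = c(0L, 0L, 2L),
  stringsAsFactors = FALSE
)

# Apply the helper to the class columns only, one row at a time
df$mode <- apply(df[, c("class.1", "class.2", "class.3")], 1, row_mode)

# Printing the data frame shows one term and one value per row
print(df[, c("terms", "mode")])
```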



On Wed, May 27, 2015 at 10:11 AM, Mohammad Alimohammadi 
mxalimoha...@ualr.edu wrote:

 Hi Jim,

 Thank you for your advice.

 I'm not sure how to exactly incorporate this function though. I added a
 portion of the actual data sets. all 3 data sets have the same items (text)
 with different class values. So I need to assign the most repeated class
 (0,1,2) for each text.

 For example: if line 1 has the text "aaa", it may be assigned to class 0 in
 dat1, 2 in dat2 and 0 in dat3. In this case "aaa" will be assigned to
 0 (the most repeated value). The same goes for each text.

 I really appreciate your help.

 =

 *dat1*

 structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
 c("#dac", "#mac,#security", "accountability,anonymous",
 "data security,encryption,security"), class = "factor")),
 .Names = c("class.1", "terms"), class = "data.frame",
 row.names = c(NA, -49L))


 *dat2*

 structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,
 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), terms = structure(c(1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
 c(#dac,
 #mac

Re: [R] Problem with comparing multiple data sets

2015-05-26 Thread Mohammad Alimohammadi

Thank you John. Yes. as you mentioned this is not really what I am looking
for.

It's interesting because I was really thinking that it should be pretty
easy. All I need to do is just compare class1, class2 and class3 for each
text and put the most frequent number next to it in each row. Repeat it for
all the rows. Apparently it's not that simple.

Sorry I didn't notice that I sent it only to you! Thanks for letting me
know.

I would appreciate it if anybody can help with this.

Thank you.




On Tue, May 26, 2015 at 7:27 PM, John Kane jrkrid...@inbox.com wrote:

 Hi Mohammad,

 The data came through beautifully despite the fact that you posted in
 HTML.  Please, post in plain text.

 Oh, just as I was ready to push Send, I  noticed you only replied to me.
 You really should reply to the R-help list since there are a lot more and
 better people to help there. Besides it's a world-wide list. Others can
 play with the problem while we sleep :) .

 I will just reply to you but I really suggest sending all of this to the
 list.

 Now I am wondering what to do with the data. As a first swipe I just added
 up all the values in each class by each text value. Results are below. Not
 what you want by any means but perhaps a small step.

 Then I started to think are we really interested in the sum or should we
 be looking at incidence, that is should we be looking at the frequency
 rather than the sum?

 Is
 class.1  class.2  class.3  terms
 0        2        0        #dac

 a value of 2 (sum) or a hit of 1 (count or freq) ?

 Anyway below is what I have tried so far -- it may not be anywhere near
 what you want but if it makes any sense then I think we just need to pick
 off the highest values for each combination of terms and class to give you
 what you want.

 I suspect our real data-munging gurus can do  all this faster and better
 than I can but hopefully it is a start.

 Where your data set is dat1
 #=
 # If reshape2 is not installed.
 install.packages("reshape2")
 #=

 library(reshape2)
 mdat <- melt(dat1, id.vars = c("terms"),
              variable.name = "class",
              value.name = "value",
              na.rm = FALSE)

 mdat1 <- aggregate(value ~ terms + class, data = mdat, sum)

 mdat1[order(mdat1$terms, mdat1$class), ]

 #=
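
The frequency question raised above (count how often each class value occurs, rather than summing the values) can be sketched in base R with table(). This is an illustration under stated assumptions, not the method used in the thread: the data frame `dat`, its toy values, and the names `long`, `counts` and `most_frequent` are all hypothetical, and ties go to the smaller class value because which.max() keeps the first maximum.

```r
# Toy data in the same shape as the combined data frame (illustrative values)
dat <- data.frame(
  terms   = c("#dac", "#dac", "#mac,#security", "#mac,#security"),
  class.1 = c(0L, 0L, 2L, 1L),
  class.2 = c(2L, 0L, 1L, 1L),
  class.3 = c(0L, 0L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Stack the three class columns into long form using base R only
long <- data.frame(
  terms = rep(dat$terms, times = 3),
  value = c(dat$class.1, dat$class.2, dat$class.3)
)

# Cross-tabulate: rows are terms, columns are class values, cells are counts
counts <- table(long$terms, long$value)

# For each term, keep the class value with the highest count
most_frequent <- apply(counts, 1, function(n) as.integer(names(n)[which.max(n)]))
most_frequent
```

Note that this picks one class per term across all of its rows; picking one class per row needs a row-wise count instead.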


 John Kane
 Kingston ON Canada

 -----Original Message-----
 From: mxalimoha...@ualr.edu
 Sent: Tue, 26 May 2015 09:50:43 -0500
 To: jrkrid...@inbox.com
 Subject: Re: [R] Problem with comparing multiple data sets

 Thank you John for being patient with me.

 My original post was to compare 3 sets of data which had difference in
 their class value for the same text. However, I thought it might be easier
 to combine those 3 data sets into one that shows the 3 different classes
 and then find the most frequent class value for the text. So that's what I
 did. Now I only want to add the most frequent class value in a new column.

 I tried to create a dput version of the data set (Only a small part of it)
 so you can see. I hope it works.

  Tweet1 <- read.csv(file="part1_complete.csv", head=TRUE, sep=",")

  dput(head(Tweet1, 100))

 structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 0L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
 1L, 2L, 1L, 1L, 1L, 0L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), class.2 = c(2L,
 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L,
 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L), class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L, 0L, 2L, 2L, 0L, 2L, 1L, 1L, 1L,
 1L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 1L, 0L, 0L, 2L, 2L, 2L, 2L, 2L,
 0L, 2L, 2L, 1L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L), terms = structure(c(9L,
 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
 9L, 9L, 9L, 9L, 69L, 69L, 69L, 69L, 69L, 40L, 40L, 40L, 40L,
 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 98L, 98L, 98L, 98L, 98L,
 98L, 98L, 98L, 98L, 98L, 98L, 98L, 98L, 98L, 23L, 87L, 87L, 87L,
 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L,
 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L,
 87L, 87L), .Label = c(#accountability

Re: [R] Problem with comparing multiple data sets

2015-05-24 Thread Jim Lemon

Hi Mohammad,
You know, I thought this would be fairly easy, but it wasn't really.

df1<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
df2<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
df3<-data.frame(Class=c(2,1,0),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
dflist<-list(df1,df2,df3)
dflist

# define a function that extracts the value from one field
# selected by a value in another field
extract_by_value<-function(x,field1,value1,field2) {
 return(x[x[,field1]==value1,field2])
}

# define another function that equates all of the values
sub_value<-function(x,field1,value1,field2,value2) {
 x[x[,field1]==value1,field2]<-value2
 return(x)
}

conformity<-function(x,fieldname1,value1,fieldname2) {
 # get the most frequent value in fieldname2
 # for the desired value in fieldname1
 most_freq<-as.numeric(names(which.max(table(unlist(lapply(x,
  extract_by_value,fieldname1,value1,fieldname2))))))
 # now set all the values to the most frequent
 for(i in 1:length(x))
  x[[i]]<-sub_value(x[[i]],fieldname1,value1,fieldname2,most_freq)
 return(x)
}

conformity(dflist,"Text","text1","Class")
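
The call above conforms one Text value at a time. As an alternative, here is a rough sketch that votes on every row at once. It assumes the three data sets list the same texts in the same row order, which the thread does not confirm; the Comment and Term columns are dropped for brevity, and the names `votes`, `majority` and `updated` are made up for this example.

```r
# Same toy Class/Text values as the example data above
df1 <- data.frame(Class = c(0, 2, 1), Text = c("text1", "text2", "text3"))
df2 <- data.frame(Class = c(0, 2, 1), Text = c("text1", "text2", "text3"))
df3 <- data.frame(Class = c(2, 1, 0), Text = c("text1", "text2", "text3"))
dflist <- list(df1, df2, df3)

# One column of Class values per data set; rows line up by position
votes <- sapply(dflist, function(d) d$Class)

# Most frequent value in each row (the first maximum wins on ties)
majority <- apply(votes, 1, function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
})

# Keep the original format and overwrite Class with the majority value
updated <- lapply(dflist, function(d) {
  d$Class <- majority
  d
})
updated[[3]]
```

If the row order differs between data sets, the rows would need to be matched on Text first, for example with merge().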

Jim

On Sat, May 23, 2015 at 11:23 PM, John Kane jrkrid...@inbox.com wrote:
 Hi Mohammad

 Welcome to the R-help list.

 There probably is a fairly easy way to do what you want but I think we probably 
 need a bit more background information on what you are trying to achieve.  I 
 know I'm not exactly clear on your decision rule(s).

 It would also be very useful to see some actual sample data in useable R 
 format. Have a look at these links 
 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
  and http://adv-r.had.co.nz/Reproducibility.html for some hints on what you 
 might want to include in your question.

 In particular, read up about dput()  in those links and/or see ?dput.  This 
 is the generally preferred way to supply sample or illustrative data to the 
 R-help list.  It basically creates a perfect copy of the data as it exists on 
 'your' machine so that R-help readers see exactly what you do.







 John Kane
 Kingston ON Canada


 -----Original Message-----
 From: mxalimoha...@ualr.edu
 Sent: Fri, 22 May 2015 12:37:50 -0500
 To: r-help@r-project.org
 Subject: [R] Problem with comparing multiple data sets

 Hi everyone,

 I am very new to R and I have a task to do. I appreciate any help. I have
 3 data sets. Each data set has 4 columns. For example:

 Class  Comment  Term  Text
 0      com1     aac   text1
 2      com2     aax   text2
 1      com3     vvx   text3

 Now I need to compare the class section between the 3 data sets and assign the
 most common class to that text. For example, if text1 is assigned to
 class 0 in data sets 1 & 2 but assigned as 2 in data set 3 then it should be
 assigned to class 0. If they are all the same, the class stays the
 same. The ideal thing would be to keep the same format and just update
 the class. Is there any easy way to do this?

 Thanks a lot.

   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 

Re: [R] Problem with comparing multiple data sets

2015-05-23 Thread John Kane

Hi Mohammad 

Welcome to the R-help list.

There probably is a fairly easy way to do what you want but I think we probably 
need a bit more background information on what you are trying to achieve.  I 
know I'm not exactly clear on your decision rule(s). 

It would also be very useful to see some actual sample data in useable R 
format. Have a look at these links 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 and http://adv-r.had.co.nz/Reproducibility.html for some hints on what you 
might want to include in your question.

In particular, read up about dput()  in those links and/or see ?dput.  This is 
the generally preferred way to supply sample or illustrative data to the R-help 
list.  It basically creates a perfect copy of the data as it exists on 'your' 
machine so that R-help readers see exactly what you do.  
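
As a small illustration of the dput() advice, the sketch below prints a data frame as R code and then rebuilds it from that text. The data frame `dat` and its values are invented for the example; capture.output(), parse() and eval() are used here only to simulate a reader pasting the printed text back into R.

```r
# A small data frame standing in for the real data (illustrative values)
dat <- data.frame(
  terms   = c("#dac", "#mac,#security"),
  class.1 = c(0L, 2L),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; that output goes in the email
dput(dat)

# Simulate a reader rebuilding the object from the printed text
txt      <- paste(capture.output(dput(dat)), collapse = "\n")
restored <- eval(parse(text = txt))

# The rebuilt copy should match the original column for column
all.equal(dat, restored)
```

For a large data set, dput(head(dat, 20)) keeps the posted sample short.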







John Kane
Kingston ON Canada


 -----Original Message-----
 From: mxalimoha...@ualr.edu
 Sent: Fri, 22 May 2015 12:37:50 -0500
 To: r-help@r-project.org
 Subject: [R] Problem with comparing multiple data sets
 
 Hi everyone,
 
 I am very new to R and I have a task to do. I appreciate any help. I have
 3 data sets. Each data set has 4 columns. For example:
 
 Class  Comment  Term  Text
 0      com1     aac   text1
 2      com2     aax   text2
 1      com3     vvx   text3
 
 Now I need to compare the class section between the 3 data sets and assign the
 most common class to that text. For example, if text1 is assigned to
 class 0 in data sets 1 & 2 but assigned as 2 in data set 3 then it should be
 assigned to class 0. If they are all the same, the class stays the
 same. The ideal thing would be to keep the same format and just update
 the class. Is there any easy way to do this?
 
 Thanks a lot.
 
   [[alternative HTML version deleted]]
 

[R] Problem with comparing multiple data sets

2015-05-22 Thread Mohammad Alimohammadi

Hi everyone,

I am very new to R and I have a task to do. I appreciate any help. I have 3
data sets. Each data set has 4 columns. For example:

Class  Comment  Term  Text
0      com1     aac   text1
2      com2     aax   text2
1      com3     vvx   text3

Now I need to compare the class section between the 3 data sets and assign the
most common class to that text. For example, if text1 is assigned to
class 0 in data sets 1 & 2 but assigned as 2 in data set 3 then it should be
assigned to class 0. If they are all the same, the class stays the
same. The ideal thing would be to keep the same format and just update the
class. Is there any easy way to do this?

Thanks a lot.

[[alternative HTML version deleted]]

