Re: [R] Problem with comparing multiple data sets
Hi everyone. I tried the modeest package on my initial test data and it worked. However, it doesn't work on the entire data set. I saved one of the portions that gives the error (not for all of the values, but for some of them). For example, rows 36, 37 and 39 correctly show the mode value, but 38 and 40 are not correct. This error is repeated for many of the values.

[36,] 2
[37,] 2
[38,] Numeric,3
[39,] 1
[40,] Numeric,3

#This is what I did:
library(modeest)
df <- read.csv(file = "Part1-modif.csv", header = TRUE, sep = ",")
Out <- apply(df[, 2:length(df)], 1, mfv)
t(t(Out))

#This is the data set
structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("#authentication,access control", "#privacy,personal data", "#security,malicious,security", "data controller", "id management,security", "password,recovery"), class = "factor"),
    class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L),
    class.2 = c(2L, 2L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L),
    class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
    .Names = c("terms", "class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA, -50L))

Also, when I try to add the terms to the result, it gives me an error:
mode.names <- data.frame(df[, 1], Out)
Error in data.frame(df[, 1], Out) :
  arguments imply differing number of rows: 50, 3

On Thu, May 28, 2015 at 9:24 AM, Mohammad Alimohammadi mxalimoha...@ualr.edu wrote:
Thank you David for your help!

On Wed, May 27, 2015 at 7:31 PM, David L Carlson dcarl...@tamu.edu wrote:
cat(paste0("[", 1:length(Out), "] #dac ", Out), sep = "\n")
David

From: Mohammad Alimohammadi [mailto:mxalimoha...@ualr.edu]
Sent: Wednesday, May 27, 2015 2:29 PM
To: David L Carlson; r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Thanks David, it worked! One more thing. I hope it's not complicated. Is it also possible to display the term for each row next to it? For example:
[1] #dac2
[2] #dac0
[3] #dac1
...

On Wed, May 27, 2015 at 2:18 PM, David L Carlson dcarl...@tamu.edu wrote:
Save the result of the apply() function:
Out <- apply(df[, 2:length(df)], 1, mfv)
Then there are several options. Approximately what you asked for:
data.frame(Out)
t(t(Out))
More typing, but exactly what you asked for:
cat(paste0("[", 1:length(Out), "] ", Out), sep = "\n")

David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mohammad Alimohammadi
Sent: Wednesday, May 27, 2015 1:47 PM
To: John Kane; r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Ok, so I read about the modeest package, which gives the results I am looking for (the most repeated value). I modified the data frame a little and moved the text to the first column. This is the data frame with all 3 possible classes for each term.
=====
structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac", "#mac,#security", "accountability,anonymous", "data security,encryption,security "), class = "factor"),
    class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L),
    class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
    class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L)),
    .Names = c("terms", "class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA, -49L))
=====
#Then I applied the function below:
=====
library(modeest)
df <- read.csv(file = "short.csv
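The `Numeric,3` rows are consistent with ties. In rows 38 and 40 of the posted data the three classes are 2, 1 and 0, so every code occurs exactly once, and `mfv()` returns all of the most frequent values rather than a single one. Once any row returns three values, `apply()` can no longer simplify its result to a vector and returns a list, which is most likely also what `data.frame(df[, 1], Out)` trips over. Below is a minimal base-R sketch (no modeest needed) that returns exactly one value per row and uses `NA` when no single class is most frequent; `row_mode()` is a hypothetical helper, and flagging ties as `NA` is an assumption about how disagreements should be treated:

```r
# Hypothetical helper: one mode per row, NA when the top count is shared.
row_mode <- function(x) {
  counts <- table(x)                          # how often each class code occurs
  top <- names(counts)[counts == max(counts)] # every code with the highest count
  if (length(top) == 1) as.numeric(top) else NA_real_
}

# Rows 36-40 of the posted data; all five rows have the term "password,recovery"
votes <- data.frame(class.1 = c(2, 2, 2, 1, 2),
                    class.2 = c(2, 2, 1, 1, 1),
                    class.3 = c(0, 0, 0, 0, 0))
row_terms <- rep("password,recovery", 5)

Out <- apply(votes, 1, row_mode)   # plain numeric vector with values 2, 2, NA, 1, NA

# With exactly one value per row, the terms can be attached without a row-count error
mode.names <- data.frame(terms = row_terms, mode = Out)

# Or print one row per line, with the term next to its class
cat(paste0("[", seq_along(Out), "] ", row_terms, " ", Out), sep = "\n")
```

If every row should get a class even when the three classifications disagree, returning `as.numeric(top[1])` instead of `NA_real_` picks the smallest tied code; which rule is appropriate depends on how such disagreements ought to be resolved.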
Re: [R] Problem with comparing multiple data sets
Hi Mohammad,
I have no idea what is happening, but for some reason your new data (renamed df1, since df is already the name of an R function) is outputting a list, whereas dff2 (your original test data) is giving a vector as you wanted. It may be obvious, but I don't see why df1 is giving us a list. As far as I can tell the two data sets are structurally the same. The two data sets are below the program.

##=====
library(modeest)
# Original test data
str(dff2)
head(dff2)
# sample of new data
str(df1)
head(df1)
Out.dff2 <- apply(dff2[, 2:length(dff2)], 1, mfv)
str(Out.dff2)
Out.df1 <- apply(df1[, 2:length(df1)], 1, mfv)
str(Out.df1)
##=====
## New data set
df1 <- structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("#authentication,access control", "#privacy,personal data", "#security,malicious,security", "data controller", "id management,security", "password,recovery"), class = "factor"),
    class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L),
    class.2 = c(2L, 2L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L),
    class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
    .Names = c("terms", "class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA, -50L))
## Original test data set
dff2 <- structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac", "#mac,#security", "accountability,anonymous", "data security,encryption,security "), class = "factor"),
    class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L),
    class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
    class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L)),
    .Names = c("terms", "class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA, -49L))
##=====
John Kane
Kingston ON Canada

-----Original Message-----
From: mxalimoha...@ualr.edu
Sent: Fri, 29 May 2015 11:40:41 -0500
To: dcarl...@tamu.edu, drjimle...@gmail.com, jrkrid...@inbox.com, r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Hi everyone. I tried the modeest package on my initial test data and it worked. However, it doesn't work on the entire data set. I saved one of the portions that gives the error (not for all of the values, but for some of them). For example, rows 36, 37 and 39 correctly show the mode value, but 38 and 40 are not correct. This error is repeated for many of the values.
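A likely answer to why one data set gives a vector and the other a list: `apply()` simplifies to a vector only when every call returns exactly one value. When the calls return different lengths, as happens when `mfv()` reports all tied values for some rows, the result stays a list. The posted test data (dff2) has no three-way splits, since every row has at least two equal classes, while the new data does (rows 38 and 40, for example). The sketch below reproduces this with `all_modes()`, a hypothetical base-R stand-in for `mfv()` that also returns every most frequent value, and shows how to find the rows responsible:

```r
# Hypothetical stand-in for modeest::mfv(): returns every most frequent value
all_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

no_ties   <- rbind(c(0, 2, 0), c(2, 2, 2), c(1, 1, 0))  # one mode per row
with_ties <- rbind(c(2, 2, 0), c(2, 1, 0), c(1, 1, 0))  # row 2 is a three-way split

out_a <- apply(no_ties, 1, all_modes)    # numeric vector: 0 2 1
out_b <- apply(with_ties, 1, all_modes)  # list, because row 2 returns 0, 1 and 2

str(out_a)
str(out_b)

# Rows that produced more than one value
n_vals <- vapply(out_b, length, integer(1))
which(n_vals > 1)
```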
Re: [R] Problem with comparing multiple data sets
Hi Mohammad,
It looks like you are still having problems with this. Given your latest data set, as below, here is something that might do what you want. From David's message, I'm not sure whether you are operating on a single data frame or a list.

# this is the data set as taken from your latest message
madf <- structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("#authentication,access control", "#privacy,personal data", "#security,malicious,security", "data controller", "id management,security", "password,recovery"), class = "factor"),
    class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L),
    class.2 = c(2L, 2L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L),
    class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
    .Names = c("terms", "class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA, -50L))

# define a function that extracts the value from one field
# selected by a value in another field
extract_by_value <- function(x, field1, value1, field2) {
  return(x[x[, field1] == value1, field2])
}
# define another function that equates all of the values
sub_value <- function(x, field1, value1, field2, value2) {
  x[x[, field1] == value1, field2] <- value2
  return(x)
}
# this now steps through every value in key_field
# and operates on every field listed in change_fields
conformity <- function(x, key_field, change_fields) {
  keys <- unique(x[, key_field])
  for (key in keys) {
    for (change_field in change_fields) {
      # get the most frequent value in change_field
      # for the desired value in key_field
      most_freq <- as.numeric(names(which.max(table(
        extract_by_value(x, key_field, key, change_field)))))
      # now set all the values to the most frequent
      x <- sub_value(x, key_field, key, change_field, most_freq)
    }
  }
  return(x)
}
conformity(madf, "terms", c("class.1", "class.2", "class.3"))

Obviously you will want to save the return value of conformity into your original data frame or create a new one.

Jim
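Note that Jim's `conformity()` works per term rather than per row: for each term it overwrites every class column with that term's most frequent value. Because `which.max()` returns the first maximum and `table()` sorts the codes, a tie within a term goes to the smallest code. The same per-term rule can be written with base R's `ave()`, which applies a function within groups and returns a vector aligned with the original rows. A minimal sketch on a toy data frame shaped like `madf`; `term_mode()` and the toy values are illustrative, not taken from the thread:

```r
# Most frequent value of a vector; which.max() takes the first maximum,
# so ties resolve to the smallest code (table() sorts the codes).
term_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

toy <- data.frame(terms   = c("a", "a", "a", "b", "b"),
                  class.1 = c(2L, 2L, 0L, 1L, 2L),
                  class.2 = c(1L, 2L, 2L, 1L, 2L))

conformed <- toy
for (col in c("class.1", "class.2")) {
  # ave() calls term_mode() once per term and writes the result back to every row of that term
  conformed[[col]] <- ave(toy[[col]], toy$terms, FUN = term_mode)
}
conformed
```

Term `b` has one vote each for 1 and 2 in both columns, so it receives 1 (the smaller code); if such ties should be flagged instead of resolved, `term_mode()` is the single place to change.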
Re: [R] Problem with comparing multiple data sets
Lovely solution, Mohammad. I had not even heard of the modeest package.
For names, I'd just create another data.frame:
mode.names <- data.frame(df[, 1], Out)

John Kane
Kingston ON Canada

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mohammad Alimohammadi
Sent: Wednesday, May 27, 2015 1:47 PM
To: John Kane; r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Ok, so I read about the modeest package, which gives the results I am looking for (the most repeated value). I modified the data frame a little and moved the text to the first column. This is the data frame with all 3 possible classes for each term.
=====
structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac", "#mac,#security", "accountability,anonymous", "data security,encryption,security "), class = "factor"),
    class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L),
    class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
    class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L)),
    .Names = c("terms", "class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA, -49L))
=====
#Then I applied the function below:
=====
library(modeest)
df <- read.csv(file = "short.csv", header = TRUE, sep = ",")
apply(df[, 2:length(df)], 1, mfv)

# It gives the most frequent value for each row, which is what I need. The only problem is that all the values are displayed in one single row:
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 1 1 0 0 0 0 2 1 2
It would be much better to show them in separate rows. For example:
[1] 0
[2] 0
[3] 1
Any idea how to do this?

On Wed, May 27, 2015 at 10:11 AM, Mohammad Alimohammadi mxalimoha...@ualr.edu wrote:
Hi Jim,
Thank you for your advice. I'm not sure exactly how to incorporate this function, though. I added a portion of the actual data sets. All 3 data sets have the same items (text) with different class values, so I need to assign the most repeated class (0, 1, 2) to each text. For example, suppose line 1 has the text aaa.
It may be assigned to class 0 in dat1, 2 in dat2 and 0 in dat3; in this case aaa will be assigned to 0 (the most repeated value). The same goes for each text. I really appreciate your help.
=====
dat1
structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L),
    terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac", "#mac,#security", "accountability,anonymous", "data security,encryption,security "), class = "factor")),
    .Names = c("class.1", "terms"), class = "data.frame"
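With exactly three classifications per item, the rule in the `aaa` example (0 in dat1, 2 in dat2 and 0 in dat3 gives 0) reduces to: a class wins if it appears at least twice, and if all three differ there is no majority. That can be written as a vectorized comparison without any counting. A sketch, assuming the three class vectors are already aligned item by item; `majority3()` and the example vectors are hypothetical:

```r
# Majority of three aligned votes per item; NA when all three differ.
majority3 <- function(r1, r2, r3) {
  ifelse(r1 == r2 | r1 == r3, r1,     # r1 agrees with at least one other vote
         ifelse(r2 == r3, r2, NA))    # otherwise only r2 and r3 can agree
}

class.1 <- c(0, 2, 2, 1)   # item 1 mirrors the aaa example: 0, 2, 0
class.2 <- c(2, 2, 1, 1)
class.3 <- c(0, 0, 0, 2)

winner <- majority3(class.1, class.2, class.3)
winner   # 0 2 NA 1
```

This shortcut only holds for three votes; with more classifications per item a count-based rule, such as taking the most frequent value from `table()`, is needed.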
Re: [R] Problem with comparing multiple data sets
Hi Mohammad,
I went back and reread your original statement of the problem above, and I think I kinda grasp it. It is actually quite clear and I misunderstood it completely. At the moment I have no idea how to approach it. As Jim Lemon said, it looks easy but may not be. I'll go back and re-examine Jim's approach.

You might want to create three sample data sets of the original data layouts and upload them, in dput() format, to the list. It may be easier to tackle from that approach.

In any case, in the existing data set, is a 2 a numeric value of 2 or just an on/off indicator?

John Kane
Kingston ON Canada

-----Original Message-----
From: mxalimoha...@ualr.edu
Sent: Tue, 26 May 2015 20:11:08 -0500
To: r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Thank you John. Yes, as you mentioned, this is not really what I am looking for. It's interesting, because I really thought it should be pretty easy. All I need to do is compare class1, class2 and class3 for each text and put the most frequent number next to it in each row, then repeat that for all the rows. Apparently it's not that simple. Sorry, I didn't notice that I sent it only to you! Thanks for letting me know. I would appreciate it if anybody can help on this. Thank you.

On Tue, May 26, 2015 at 7:27 PM, John Kane jrkrid...@inbox.com wrote:
Hi Mohammad,
The data came through beautifully despite the fact that you posted in HTML. Please post in plain text.

Oh, just as I was ready to push Send, I noticed you only replied to me. You really should reply to the R-help list, since there are a lot more and better people to help there. Besides, it's a world-wide list; others can play with the problem while we sleep :). I will just reply to you, but I really suggest sending all of this to the list.

Now I am wondering what to do with the data. As a first swipe I just added up all the values in each class by each text value. Results are below. Not what you want by any means, but perhaps a small step.
Then I started to think: are we really interested in the sum, or should we be looking at incidence, that is, at the frequency rather than the sum? Is

        class.1 class.2 class
  #dac        0       2     0

a value of 2 (sum) or a hit of 1 (count or freq)?

Anyway, below is what I have tried so far -- it may not be anywhere near what you want, but if it makes any sense then I think we just need to pick off the highest values for each combination of terms and class to give you what you want. I suspect our real data-munging gurus can do all this faster and better than I can, but hopefully it is a start.

Where your data set is dat1:
#=====
# If reshape2 is not installed.
install.packages("reshape2")
#=====
library(reshape2)
mdat <- melt(dat1, id.vars = c("terms"), variable.name = "class", value.name = "value", na.rm = FALSE)
mdat1 <- aggregate(value ~ terms + class, data = mdat, sum)
mdat1[order(mdat1$terms, mdat1$class), ]
#=====
John Kane
Kingston ON Canada

-----Original Message-----
From: mxalimoha...@ualr.edu
Sent: Tue, 26 May 2015 09:50:43 -0500
To: jrkrid...@inbox.com
Subject: Re: [R] Problem with comparing multiple data sets

Thank you John for being patient with me. My original post was about comparing 3 sets of data whose class values differ for the same text. However, I thought it might be easier to combine those 3 data sets into one that shows the 3 different classes and then find the most frequent class value for each text, so that's what I did. Now I only want to add the most frequent class value in a new column. I tried to create a dput version of the data set (only a small part of it) so you can see. I hope it works.
Tweet1 <- read.csv(file = "part1_complete.csv", header = TRUE, sep = ",")
dput(head(Tweet1, 100))
structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 0L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 0L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L),
    class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
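On the sum-versus-count question: since 0, 1 and 2 are class labels rather than amounts, summing them can hide the answer. One vote for class 2 plus two votes for class 0 sums to 2, exactly like two votes for class 1 plus one for class 0. Counting how often each code occurs per term keeps the information needed to pick the most frequent class. A base-R sketch on a toy long-format table with the same columns as the `melt()` output above (`terms`, `class`, `value`); the values are illustrative only:

```r
# Toy long-format data with the columns produced by the melt() call (illustrative)
mdat <- data.frame(
  terms = rep(c("t1", "t2"), each = 3),
  class = rep(c("class.1", "class.2", "class.3"), times = 2),
  value = c(2, 0, 0,    # t1: one vote for code 2, two votes for code 0
            1, 1, 0)    # t2: two votes for code 1, one vote for code 0
)

# Both terms sum to 2, so the sums cannot tell them apart
sums <- aggregate(value ~ terms, data = mdat, FUN = sum)
sums

# Counting each code per term keeps the distinction
counts <- with(mdat, table(terms, value))
counts

# Most frequent code per term (which.max takes the first maximum on ties)
top_class <- apply(counts, 1, function(r) as.numeric(colnames(counts)[which.max(r)]))
top_class   # t1 = 0, t2 = 1
```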
Re: [R] Problem with comparing multiple data sets
Thanks John, I really hope it can be answered. Yes, all 3 data sets have the same items.

On Wed, May 27, 2015 at 9:32 AM, John Kane jrkrid...@inbox.com wrote:
Thanks Mohammad. The data appear to have come through just fine. This probably means you can ignore some of the questions I just sent you -- our emails are crossing. I probably will not get a chance to look at this til this afternoon (10:25 here now). We can hope someone with more skill than I have will have solved the problem by then.

This is starting to sound a bit like a psychometric inter-rater reliability study. Does each data set contain the same set of items?

John Kane
Kingston ON Canada

-----Original Message-----
From: mxalimoha...@ualr.edu
Sent: Wed, 27 May 2015 09:18:12 -0500
To: jrkrid...@inbox.com, r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Hi John,
I created the original data set with dput. This time I only loaded 50 values for each data set (dat1, dat2, dat3). About your question: 0, 1 and 2 are all indicators of a specific class. The task is to compare 3 independent classifications of a certain term and determine the actual class of the term by finding the most frequently assigned number for that term. I thought it might be easier to combine them into 1 data frame, but either way is fine. Let me know if it shows up clean. I saved the dput in a txt file and copied it here from that file. I assume this is the right way to do it. I might be wrong.
=====
dat1
structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L),
    terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac", "#mac,#security", "accountability,anonymous", "data security,encryption,security "), class = "factor")),
    .Names = c("class.1", "terms"), class = "data.frame", row.names = c(NA, -49L))
dat2
structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
    terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac", "#mac,#security", "accountability,anonymous", "data security,encryption,security "), class = "factor")),
    .Names = c("class.2", "terms"), class = "data.frame", row.names = c(NA, -49L))
dat3
structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L),
    terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac", "#mac,#security", "accountability,anonymous", "data security,encryption,security "), class = "factor")),
    .Names = c("class.3", "terms"), class = "data.frame", row.names = c(NA, -49L))
=====

On Wed, May 27, 2015 at 8:05 AM, John Kane jrkrid...@inbox.com wrote:
Hi Mohammad,
I went back and reread your original statement of the problem above, and I think I kinda grasp it. It is actually quite clear and I misunderstood it completely. At the moment I have no idea how to approach it. As Jim Lemon said, it looks easy but may not be. I'll go back and re-examine Jim's approach.

You might want to create three sample data sets of the original data layouts and upload them, in dput() format, to the list. It may be easier to tackle from that approach.

In any case, in the existing data set, is a 2 a numeric value of 2 or just an on/off indicator?

John Kane
Kingston ON Canada

-----Original Message-----
From: mxalimoha...@ualr.edu
Sent: Tue, 26 May 2015 20:11:08 -0500
To: r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Thank you John. Yes, as you mentioned, this is not really what I am looking for. It's interesting, because I really thought it should be pretty easy. All I need to do is compare class1, class2 and class3 for each text and put the most frequent number next to it in each row, then repeat that for all the rows. Apparently it's not that simple. Sorry, I didn't notice that I sent it only to you! Thanks for letting me know. I would appreciate it if anybody can help
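Because the three data sets contain the same items, they can be combined by position into the one-row-per-item layout Mohammad used (`terms`, `class.1`, `class.2`, `class.3`), provided the terms really line up row by row. Joining with `merge()` on `terms` would not do this, since terms such as `#dac` occur many times and each repeat would be paired with every matching row of the other data set. A minimal sketch with small stand-ins for `dat1`, `dat2` and `dat3` (same column names as the posted `dput()` output, toy values):

```r
# Toy stand-ins for dat1, dat2, dat3 (same columns as the posted data)
dat1 <- data.frame(class.1 = c(0L, 2L, 1L), terms = c("#dac", "#dac", "#mac,#security"))
dat2 <- data.frame(class.2 = c(2L, 2L, 1L), terms = c("#dac", "#dac", "#mac,#security"))
dat3 <- data.frame(class.3 = c(0L, 0L, 1L), terms = c("#dac", "#dac", "#mac,#security"))

# Stop early if the items are not in the same order in all three data sets
same_order <- identical(as.character(dat1$terms), as.character(dat2$terms)) &&
              identical(as.character(dat1$terms), as.character(dat3$terms))
stopifnot(same_order)

# Combine by position into one row per item
combined <- data.frame(terms   = dat1$terms,
                       class.1 = dat1$class.1,
                       class.2 = dat2$class.2,
                       class.3 = dat3$class.3)
combined
```

If the check fails, the rows have to be matched on something more specific than the term text before combining them.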
Re: [R] Problem with comparing multiple data sets
Hi Mohammad,
My mantra for the day is Plain Text, Plain Text. À bas HTML (down with HTML). And I really need to get out of here. I have not found a solution, but is this a bit more like what you want?

#=====
dat1 <- structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L),
    terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac", "#mac,#security", "accountability,anonymous", "data security,encryption,security "), class = "factor")),
    .Names = c("class.1", "terms"), class = "data.frame", row.names = c(NA, -49L))
dat2 <- structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
    terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac", "#mac,#security", "accountability,anonymous", "data security,encryption,security "), class = "factor")),
    .Names = c("class.2", "terms"), class = "data.frame", row.names = c(NA, -49L))
dat3 <- structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L),
    terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac", "#mac,#security", "accountability,anonymous", "data security,encryption,security "), class = "factor")),
    .Names = c("class.3", "terms"), class = "data.frame", row.names = c(NA, -49L))
names(dat1) <- names(dat2) <- names(dat3) <- c("class", "term")
bbind <- rbind(dat1, dat2, dat3)
with(bbind, table(term, class))
#=====
John Kane
Kingston ON Canada

-----Original Message-----
From: mxalimoha...@ualr.edu
Sent: Wed, 27 May 2015 09:37:24 -0500
To: jrkrid...@inbox.com, r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Thanks John, I really hope it can be answered. Yes, all 3 data sets have the same items.
Re: [R] Problem with comparing multiple data sets
Thanks Mohammad. The data appear to have come through just fine. This probably means you can ignore some of the questions I just sent you; our emails are crossing. I probably will not get a chance to look at this until this afternoon (10:25 here now). We can hope someone with more skill than I have will have solved the problem by then.

This is starting to sound a bit like a psychometric inter-rater reliability study. Does each data set contain the same set of items?

John Kane
Kingston ON Canada
Re: [R] Problem with comparing multiple data sets
I was wondering about the layout of each of your data sets. I cobbled together what I think are the most likely scenarios. My bet is the data sets most closely resemble my data set 4 in structure. Am I correct? I dropped the other two columns in your data layout as likely to be immaterial to the problem.

data set 1 (unique text and class)
class text
0     text1
2     text2
1     text3
2     text4

data set 2 (unique class, multiple text)
class text
0     text1
0     text1
0     text1
2     text2
1     text3
2     text4

data set 3 (multiple classes, multiple text)
class text
0     text1
0     text1
1     text1
2     text2
1     text3
2     text4

data set 4 (multiple classes, multiple text, text not found in other data sets)
class text
0     text1
0     text1
1     text1
2     text2
1     text3
2     text4
2     text6
0     text6

John Kane
Kingston ON Canada
Re: [R] Problem with comparing multiple data sets
Hi John,

I created the original data set with dput. This time I only loaded 50 values for each data set (dat1, dat2, dat3). About your question: 0, 1 and 2 each indicate a specific class. The task is to compare 3 independent classifications of a certain term and determine the actual class of the term by finding the most frequently assigned number for that term. I thought it might be easier to combine them into 1 data frame, but either way is fine. Let me know if it shows up clean. I saved the dput output in a txt file and copied it here from that file. I assume this is the right way to do it; I might be wrong.
==
dat1
structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c(#dac, #mac,#security, accountability,anonymous, data security,encryption,security ), class = factor)), .Names = c(class.1, terms), class = data.frame, row.names = c(NA, -49L))
dat2
structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c(#dac, #mac,#security, accountability,anonymous, data security,encryption,security ), class = factor)), .Names = c(class.2, terms), class = data.frame, row.names = c(NA, -49L))
dat3
structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L), terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c(#dac, #mac,#security, accountability,anonymous, data security,encryption,security ), class = factor)), .Names = c(class.3, terms), class = data.frame, row.names = c(NA, -49L))
=
On Wed, May 27, 2015 at 8:05 AM, John Kane jrkrid...@inbox.com wrote:
Hi Mohammad,

I went back and reread your original statement of the problem above and I think I kinda grasp it. It is actually quite clear and I misunderstood it completely. At the moment I have no idea how to approach it. As Jim Lemon said, it looks easy but may not be. I'll go back and re-examine Jim's approach.

You might want to create three sample data sets of the original data layouts and upload them, in dput() format, to the list. It may be easier to tackle from that approach.

In any case, in the existing data set is a 2 a numeric value 2 or just an on/off indicator?

John Kane
Kingston ON Canada
Re: [R] Problem with comparing multiple data sets
("text1","text2","text3"))
df3 <- data.frame(Class=c(2,1,0), Comment=c("com1","com2","com3"),
  Term=c("aac","aax","vvx"), Text=c("text1","text2","text3"))
dflist <- list(df1, df2, df3)
dflist
# define a function that extracts the value from one field
# selected by a value in another field
extract_by_value <- function(x, field1, value1, field2) {
  return(x[x[,field1]==value1, field2])
}
# define another function that equates all of the values
sub_value <- function(x, field1, value1, field2, value2) {
  x[x[,field1]==value1, field2] <- value2
  return(x)
}
conformity <- function(x, fieldname1, value1, fieldname2) {
  # get the most frequent value in fieldname2
  # for the desired value in fieldname1
  most_freq <- as.numeric(names(which.max(table(unlist(lapply(x,
    extract_by_value, fieldname1, value1, fieldname2))))))
  # now set all the values to the most frequent
  for(i in 1:length(x))
    x[[i]] <- sub_value(x[[i]], fieldname1, value1, fieldname2, most_freq)
  return(x)
}
conformity(dflist, "Text", "text1", "Class")

Jim

On Sat, May 23, 2015 at 11:23 PM, John Kane jrkrid...@inbox.com wrote:
Hi Mohammad

Welcome to the R-help list. There probably is a fairly easy way to do what you want, but I think we probably need a bit more background information on what you are trying to achieve. I know I'm not exactly clear on your decision rule(s).

It would also be very useful to see some actual sample data in usable R format. Have a look at these links http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and http://adv-r.had.co.nz/Reproducibility.html for some hints on what you might want to include in your question.

In particular, read up about dput() in those links and/or see ?dput. This is the generally preferred way to supply sample or illustrative data to the R-help list. It basically creates a perfect copy of the data as it exists on 'your' machine so that R-help readers see exactly what you do.
John Kane
Kingston ON Canada

-Original Message-
From: mxalimoha...@ualr.edu
Sent: Fri, 22 May 2015 12:37:50 -0500
To: r-help@r-project.org
Subject: [R] Problem with comparing multiple data sets

Hi everyone,

I am very new to R and I have a task to do. I appreciate any help. I have 3 data sets. Each data set has 4 columns. For example:

Class Comment Term Text
0     com1    aac  text1
2     com2    aax  text2
1     com3    vvx  text3

Now I need to compare the Class values between the 3 data sets and assign the most frequent class to each text. For example, if text1 is assigned to class 0 in data sets 1 and 2 but to class 2 in data set 3, then it should be assigned to class 0. If they are all the same, the class stays the same. The ideal thing would be to keep the same format and just update the class. Is there any easy way to do this?

Thanks a lot.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Mohammad Alimohammadi | Graduate Assistant University of Arkansas at Little Rock | College of Science and Mathematics (CSAM) | mxalimoha...@ualr.edu | ualr.edu Public URL: http://scholar.google.com/citations?user=MsfN_i8J
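For the layout in the original post (three data sets with the same Text items), a minimal base-R sketch could line up the three Class columns by Text with match(), take a row-wise majority vote, and write the result back into a copy of the first data set so the four-column format is unchanged. The df1/df2/df3 values below are hypothetical toy data. With three raters a tie can only happen when all three disagree; falling back to the first data set's class in that case is an assumption, not a rule stated in the thread.

```r
# Hypothetical stand-ins for the three data sets described in the original post
df1 <- data.frame(Class = c(0, 2, 1), Comment = c("com1", "com2", "com3"),
                  Term = c("aac", "aax", "vvx"), Text = c("text1", "text2", "text3"))
df2 <- df1
df2$Class <- c(0, 1, 1)
df3 <- df1
df3$Class <- c(2, 1, 0)

# Most frequent value of x; on a tie the value that appears first wins,
# i.e. the class from df1 when all three data sets disagree
majority <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# One row per Text, one column per data set; match() aligns df2 and df3
# to df1 even if their rows are in a different order
votes <- cbind(df1$Class,
               df2$Class[match(df1$Text, df2$Text)],
               df3$Class[match(df1$Text, df3$Text)])

# Keep the original four-column format and only update Class
result <- df1
result$Class <- apply(votes, 1, majority)
result
```

Here text1 gets class 0 (votes 0, 0, 2), text2 gets class 1 (votes 2, 1, 1) and text3 gets class 1 (votes 1, 1, 0).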
Re: [R] Problem with comparing multiple data sets
Save the result of the apply() function:

Out <- apply(df[ ,2:length(df)], 1, mfv)

Then there are several options. Approximately what you asked for:

data.frame(Out)
t(t(Out))

More typing, but exactly what you asked for:

cat(paste0("[", 1:length(Out), "] ", Out), sep="\n")

David L. Carlson
Department of Anthropology
Texas A&M University

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mohammad Alimohammadi
Sent: Wednesday, May 27, 2015 1:47 PM
To: John Kane; r-help@r-project.org
Subject: Re: [R] Problem with comparing multiple data sets

Ok. So I read about the (modeest) package, which gives the results that I am looking for (the most repeated value). I modified the data frame a little and moved the text to the first column. This is the data frame with all 3 possible classes for each term.
=
structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c(#dac, #mac,#security, accountability,anonymous, data security,encryption,security ), class = factor), class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c(terms, class.1, class.2, class.3), class = data.frame, row.names = c(NA, -49L))
=
#Then I applied the function below:
==
library(modeest)
df <- read.csv(file="short.csv", head=TRUE, sep=",")
apply(df[ 
,2:length(df)], 1, mfv) # It gives the most frequent value for each row which is what I need. The only problem is that all the values are displayed in one single row. [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 1 1 0 0 0 0 2 1 2 It would be much better to show them in separate rows. For example: [1] 0 [2] 0 [3] 1 Any idea how to do this? On Wed, May 27, 2015 at 10:11 AM, Mohammad Alimohammadi mxalimoha...@ualr.edu wrote: Hi Jim, Thank you for your advice. I'm not sure how to exactly incorporate this function though. I added a portion of the actual data sets. all 3 data sets have the same items (text) with different class values. So I need to assign the most repeated class (0,1,2) for each text. For example: if line1 has text aaa. It may be assigned to class 0 in dat1, 2 in dat 2 and 0 in dat3. in this case the aaa will be assigned to 0 (most repeated value). So it goes for each text. I really appreciate your help. = *dat1* structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c(#dac, #mac,#security, accountability,anonymous, data security,encryption,security ), class = factor)), .Names = c(class.1, terms), class = data.frame, row.names = c(NA, -49L)) *dat2* structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c(#dac, #mac,#security, accountability,anonymous, data security,encryption,security ), class = factor)), .Names = c(class.2, terms), class = data.frame, row.names = c(NA, -49L)) *dat3* structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L), terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c(#dac, #mac
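A possible explanation for rows that do not come back as a single number: mfv() from modeest appears to return every value that ties for most frequent, so a row whose three classes are 0, 1 and 2 yields three modes, and apply() then returns a list instead of a plain vector (which would explain rows printed as Numeric,3). A minimal base-R sketch that always returns exactly one value per row, flagging rows with no majority as NA, is below. The toy data frame is hypothetical, and marking ties as NA rather than picking a class is an assumption about what is wanted.

```r
# Hypothetical toy version of the combined data frame: terms first, then one
# column per classification
df <- data.frame(terms   = c("#dac", "#dac", "#mac", "#mac"),
                 class.1 = c(0, 2, 1, 0),
                 class.2 = c(2, 2, 1, 1),
                 class.3 = c(0, 1, 1, 2))

# Return the single most frequent value, or NA when several values tie
# (with three classifications that only happens when all three disagree)
mode_or_na <- function(x) {
  counts <- table(x)
  top <- names(counts)[counts == max(counts)]
  if (length(top) > 1) NA_real_ else as.numeric(top)
}

Out <- apply(df[, -1], 1, mode_or_na)  # atomic vector, one value per row
Out
which(is.na(Out))  # rows where the three classifications all differ
```

Because every row produces exactly one value, Out stays a plain numeric vector (here 0, 2, 1, NA) and lines up one-to-one with the rows of df.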
Re: [R] Problem with comparing multiple data sets
Thanks David, it worked! One more thing; I hope it's not complicated. Is it also possible to display the term for each row next to it? For example:

[1] #dac 2
[2] #dac 0
[3] #dac 1
...
Re: [R] Problem with comparing multiple data sets
cat(paste0("[", 1:length(Out), "] #dac ", Out), sep="\n")

David
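Rather than hard-coding "#dac", the terms column itself can be pasted next to each result. A minimal sketch, assuming Out holds exactly one value per row of df (rows with tied modes would have to be resolved first); the df and Out values here are hypothetical.

```r
# Hypothetical inputs: the terms column and one most-frequent class per row
df  <- data.frame(terms = c("#dac", "#dac", "#mac"), stringsAsFactors = FALSE)
Out <- c(2, 0, 1)

# Option 1: keep the result as a data frame (handy for write.csv())
mode.names <- data.frame(terms = df$terms, class = Out)
mode.names

# Option 2: print "[i] term class" lines, like the example in the question
cat(paste0("[", seq_along(Out), "] ", df$terms, " ", Out), sep = "\n")
```

Option 2 prints [1] #dac 2, [2] #dac 0 and [3] #mac 1 on separate lines. Both options rely on df$terms and Out having the same length.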
Re: [R] Problem with comparing multiple data sets
Thank you, John. Yes, as you mentioned, this is not really what I am looking for. It's interesting, because I really thought it should be pretty easy. All I need to do is compare class.1, class.2 and class.3 for each text, put the most frequent number next to it in the same row, and repeat that for all rows. Apparently it's not that simple. Sorry, I didn't notice that I had sent it only to you! Thanks for letting me know. I'd appreciate it if anybody could help with this. Thank you.

On Tue, May 26, 2015 at 7:27 PM, John Kane jrkrid...@inbox.com wrote:
Hi Mohammad,
The data came through beautifully despite the fact that you posted in HTML. Please post in plain text.
Oh, just as I was ready to push Send, I noticed you only replied to me. You really should reply to the R-help list, since there are a lot more and better people to help there. Besides, it's a worldwide list; others can play with the problem while we sleep :). I will just reply to you, but I really suggest sending all of this to the list.
Now I am wondering what to do with the data. As a first swipe, I just added up all the values in each class for each text value. Results are below. Not what you want by any means, but perhaps a small step.
Then I started to wonder: are we really interested in the sum, or should we be looking at incidence, that is, at the frequency rather than the sum? Is

      class.1 class.2 class.3
#dac        0       2       0

a value of 2 (sum) or a hit of 1 (count or freq)?
Anyway, below is what I have tried so far. It may not be anywhere near what you want, but if it makes any sense, then I think we just need to pick off the highest values for each combination of terms and class to give you what you want. I suspect our real data-munging gurus can do all this faster and better than I can, but hopefully it is a start.
Where your data set is dat1:
#=
# If reshape2 is not installed:
install.packages("reshape2")
#=
library(reshape2)
mdat <- melt(dat1, id.vars = c("terms"), variable.name = "class",
             value.name = "value", na.rm = FALSE)
mdat1 <- aggregate(value ~ terms + class, data = mdat, sum)
mdat1[order(mdat1$terms, mdat1$class), ]
#=
John Kane
Kingston ON Canada

-----Original Message-----
From: mxalimoha...@ualr.edu
Sent: Tue, 26 May 2015 09:50:43 -0500
To: jrkrid...@inbox.com
Subject: Re: [R] Problem with comparing multiple data sets

Thank you, John, for being patient with me. My original post was about comparing 3 data sets that have different class values for the same text. However, I thought it might be easier to combine those 3 data sets into one that shows the 3 different classes and then find the most frequent class value for each text. So that's what I did. Now I only want to add the most frequent class value in a new column. I tried to create a dput version of the data set (only a small part of it) so you can see. I hope it works.

Tweet1 <- read.csv(file = "part1_complete.csv", header = TRUE, sep = ",")
dput(head(Tweet1, 100))
structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 0L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 0L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L), class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L, 0L, 2L, 2L, 0L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 2L, 1L, 0L, 0L, 1L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 0L, 2L, 2L, 1L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L), terms = structure(c(9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 69L, 69L, 69L, 69L, 69L, 40L, 40L, 40L, 40L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 98L, 98L, 98L, 98L, 98L, 98L, 98L, 98L, 98L, 98L, 98L, 98L, 98L, 98L, 23L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L, 87L), .Label = c(#accountability
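John's question (summing versus counting) can be illustrated with a small base-R sketch of the long format that `melt()` produces. The long table is built by hand here so the example stands alone.

Some details are assumptions:

- `id` is a hypothetical row identifier. Terms such as #dac repeat across many rows, so the vote is taken per row, not per term.
- The values mirror three rows of the combined data: (0, 2, 0), (0, 0, 0) and (2, 1, 2).

```r
# Hand-built long format: one row per (row id, class column) pair.
# 'id' is a hypothetical row identifier
long <- data.frame(
  id    = rep(1:3, times = 3),
  class = rep(c("class.1", "class.2", "class.3"), each = 3),
  value = c(0, 0, 2,   # class.1 for rows 1-3
            2, 0, 1,   # class.2 for rows 1-3
            0, 0, 2)   # class.3 for rows 1-3
)

# Summing treats the class labels as quantities: row 1 (0, 2, 0) sums to 2
sums <- aggregate(value ~ id, data = long, FUN = sum)
print(sums)

# Counting instead: how often each class label occurs in each row
tab <- table(long$id, long$value)
print(tab)

# Column with the highest count per row; which.max() takes the first
# maximum, so ties go to the smallest label
mode_by_row <- as.numeric(colnames(tab)[apply(tab, 1, which.max)])
print(mode_by_row)
```

Counting answers the "hit of 1" reading of the question: for row 1 the label 0 occurs twice and 2 occurs once, so the row's most frequent class is 0. The sum of 2 says nothing about which label won.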
Re: [R] Problem with comparing multiple data sets
Hi Mohammad,
You know, I thought this would be fairly easy, but it wasn't really.

df1<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
df2<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
df3<-data.frame(Class=c(2,1,0),Comment=c("com1","com2","com3"),
 Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
dflist<-list(df1,df2,df3)
dflist
# define a function that extracts the value from one field
# selected by a value in another field
extract_by_value<-function(x,field1,value1,field2) {
 return(x[x[,field1]==value1,field2])
}
# define another function that equates all of the values
sub_value<-function(x,field1,value1,field2,value2) {
 x[x[,field1]==value1,field2]<-value2
 return(x)
}
conformity<-function(x,fieldname1,value1,fieldname2) {
 # get the most frequent value in fieldname2
 # for the desired value in fieldname1
 most_freq<-as.numeric(names(which.max(table(unlist(lapply(x,
  extract_by_value,fieldname1,value1,fieldname2))))))
 # now set all the values to the most frequent
 for(i in 1:length(x))
  x[[i]]<-sub_value(x[[i]],fieldname1,value1,fieldname2,most_freq)
 return(x)
}
conformity(dflist,"Text","text1","Class")

Jim

On Sat, May 23, 2015 at 11:23 PM, John Kane jrkrid...@inbox.com wrote:
Hi Mohammad
Welcome to the R-help list.
There probably is a fairly easy way to do what you want, but I think we probably need a bit more background information on what you are trying to achieve. I know I'm not exactly clear on your decision rule(s).
It would also be very useful to see some actual sample data in usable R format. Have a look at these links
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
and
http://adv-r.had.co.nz/Reproducibility.html
for some hints on what you might want to include in your question.
In particular, read up about dput() in those links and/or see ?dput. This is the generally preferred way to supply sample or illustrative data to the R-help list. It basically creates a perfect copy of the data as it exists on 'your' machine, so that R-help readers see exactly what you do.

John Kane
Kingston ON Canada

-----Original Message-----
From: mxalimoha...@ualr.edu
Sent: Fri, 22 May 2015 12:37:50 -0500
To: r-help@r-project.org
Subject: [R] Problem with comparing multiple data sets

Hi everyone,
I am very new to R and I have a task to do. I'd appreciate any help.
I have 3 data sets. Each data set has 4 columns. For example:

Class Comment Term Text
0     com1    aac  text1
2     com2    aax  text2
1     com3    vvx  text3

Now I need to compare the Class column across the 3 data sets and assign the most frequent class to each text. For example, if text1 is assigned to class 0 in data sets 1 and 2 but to class 2 in data set 3, then it should be assigned to class 0. If they are all the same, the class stays the same. Ideally I would keep the same format and just update the class.
Is there any easy way to do this?
Thanks a lot.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
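Jim's `conformity()` updates one text per call. As an alternative, here is a minimal base-R sketch that applies the same most-frequent rule to every text in one pass. It writes the result back into each data set, so the original four-column format is kept, as the first post asked.

It assumes that `Text` identifies the same item across all data sets. Ties go to the smallest class value, which is an arbitrary choice; the data are illustrative only.

```r
# Small data sets in the format from the original question
# (Class, Comment, Term, Text); values are illustrative only
df1 <- data.frame(Class = c(0, 2, 1), Comment = c("com1", "com2", "com3"),
                  Term = c("aac", "aax", "vvx"), Text = c("text1", "text2", "text3"),
                  stringsAsFactors = FALSE)
df2 <- df1
df3 <- df1
df3$Class <- c(2, 1, 0)
dflist <- list(df1, df2, df3)

# Stack the Text/Class columns of all data sets
all_rows <- do.call(rbind, lapply(dflist, function(d) d[, c("Text", "Class")]))

# Most frequent Class per Text; table() sorts its names and which.max()
# takes the first maximum, so ties resolve toward the smallest class
most_freq <- tapply(all_rows$Class, all_rows$Text, function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
})

# Write the agreed class back, keeping each data set's original columns
dflist <- lapply(dflist, function(d) {
  d$Class <- as.numeric(most_freq[as.character(d$Text)])
  d
})
print(dflist[[3]])
```

With these values, text1 has classes 0, 0 and 2 across the three data sets, so it becomes 0 everywhere. Likewise, text2 becomes 2 and text3 becomes 1.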