Re: [R] missing and replace
Dear All,

Replacing missing values with means is generally not a good idea:

"Perhaps the easiest way to impute is to replace each missing value with the mean of the observed values for that variable. Unfortunately, this strategy can severely distort the distribution for this variable, leading to complications with summary measures including, notably, underestimates of the standard deviation. Moreover, mean imputation distorts relationships between variables by “pulling” estimates of the correlation toward zero."

That's from Gelman and Hill; more here: http://www.stat.columbia.edu/~gelman/arm/missing.pdf

best,
Fraser

From: Val [valkr...@gmail.com]
Sent: Wednesday, April 26, 2017 8:45 PM
To: r-help@R-project.org (r-help@r-project.org)
Subject: [R] missing and replace

Hi all,

I have a data frame with three variables. Some of the variables have missing values, and I want to replace those missing values (represented by NA) with the mean value of that variable. In this sample data, variables y and z have missing values. The mean values of y and z are 152.25 and 359.5, respectively. I want to replace those missing values with the respective mean value (rounded to the nearest whole number).

DF1 <- read.table(header=TRUE, text='ID1 x y z
1 25 122 352
2 30 135 376
3 40 NA 350
4 26 157 NA
5 60 195 360')

mean x = 36.2
mean y = 152.25
mean z = 359.5

output
ID1 x y z
1 25 122 352
2 30 135 376
3 40 152 350
4 26 157 360
5 60 195 360

Thank you in advance

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
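A minimal base-R sketch of what Val asks for, which also checks Gelman and Hill's warning on the same data. impute_mean() is a hypothetical helper written for this illustration; it replaces each NA with the rounded mean of the observed values in that column, and the last lines compare the standard deviation of y before and after imputation.

```r
# Rebuild the example data frame from the original post
DF1 <- read.table(header = TRUE, text = "
ID1 x   y   z
1   25 122 352
2   30 135 376
3   40  NA 350
4   26 157  NA
5   60 195 360
")

# Hypothetical helper: replace NAs with the rounded mean of the observed values
impute_mean <- function(v) {
  v[is.na(v)] <- round(mean(v, na.rm = TRUE))
  v
}

DF2 <- DF1
DF2[] <- lapply(DF1, impute_mean)  # DF2[] keeps the data.frame structure
print(DF2)

# Gelman and Hill's caveat: mean imputation shrinks the standard deviation
sd_observed <- sd(DF1$y, na.rm = TRUE)  # sd of the 4 observed y values
sd_imputed  <- sd(DF2$y)                # sd after filling in the mean
```

On this data the standard deviation of y drops from roughly 32 to roughly 28, even though only one value was filled in.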
Re: [R] F Distribution
Dear Bob,

You want...

> qf(.95, 1, 1)
[1] 161.4476

Best,
Fraser

-----Original Message-----
From: Robert Sherry [mailto:rsher...@comcast.net]
Sent: Monday, December 21, 2015 2:51 PM
To: R Project Help
Subject: [R] F Distribution

When I use a table from a Schaum book, I see that for the 95th percentile, with v_1 = 1 and v_2 = 1, the value is 161. In the modern era, looking values up in a table is less than ideal. Therefore, I would expect R to have a function to do this, and based upon my reading of the documentation, I would expect the following call to give the value I expect:

pf(.95, 1, 1)

However, it produces 0.4918373. Therefore, I conclude that I am using the wrong function. What function should I use?

Thanks
Bob
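The distinction can be sketched in a few lines: qf() is the quantile function (the inverse CDF), so it takes a probability and returns a value of F, while pf() is the CDF, which takes a value of F and returns a probability. The numbers below are the ones quoted in this thread.

```r
# Quantile: the value below which 95% of the F(1, 1) distribution lies
crit <- qf(0.95, df1 = 1, df2 = 1)   # 161.4476, the table value of 161

# CDF: P(F <= q); feeding the quantile back in recovers 0.95
p_back <- pf(crit, df1 = 1, df2 = 1)

# pf(0.95, 1, 1) answers a different question: P(F <= 0.95)
p_wrong <- pf(0.95, df1 = 1, df2 = 1)  # 0.4918373, as in Bob's post
```

The same q/p pairing holds for the other distributions in {stats} (qnorm/pnorm, qt/pt, qchisq/pchisq).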
Re: [R] textplot() in wordcloud package
Just a quick note to thank David Carlson and Jim Lemon for the helpful replies on this. The easiest solution is to punt on wordcloud and use either the plotrix or the maptools package. In plotrix the function is thigmophobe.labels(); in maptools it's pointLabel(). Neither complains when you pass it the pos= parameter.

Thank you one and all!

Best,
Fraser

-----Original Message-----
From: David L Carlson [mailto:dcarl...@tamu.edu]
Sent: Monday, March 16, 2015 12:02 PM
To: David L Carlson; Fraser D. Neiman; r-help@r-project.org
Subject: RE: textplot() in wordcloud package

Another possibility is to use pointLabel() in package maptools. For your example:

library(maptools)
plot(x, y)
pointLabel(x, y, text1)

Advantages of pointLabel() are that it returns a list of the x and y coordinates of the labels that you can tweak if necessary and, at least in your example, it does a better job of avoiding labels being chopped at the plot margins.

David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson
Sent: Monday, March 16, 2015 10:44 AM
To: Fraser D. Neiman; r-help@r-project.org
Subject: Re: [R] textplot() in wordcloud package

You should contact the package maintainer about this. The problem is that the pos= argument is being passed to strwidth() and strheight(), and those functions do not know what to do with it. In the meantime:

suppressWarnings(textplot(x, y, text1, new=FALSE, show.lines=FALSE, pos=4))

will eliminate the warnings.

David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Fraser D. Neiman
Sent: Friday, March 13, 2015 3:29 PM
To: r-help@r-project.org
Subject: [R] textplot() in wordcloud package

Dear All,

The textplot() function in the wordcloud package seems to do a good job of generating non-overlapping labels on a scatter plot. But it throws warnings when I try to use the pos= parameter to position the text labels relative to a given x-y point. Here is a simple example:

x <- runif(100)
y <- runif(100)
text1 <- rep('LAB', 100)
plot(x, y)
textplot(x, y, text1, new=FALSE, show.lines=FALSE, pos=4)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In strwidth(words[i], cex = cex[i], ...) : "pos" is not a graphical parameter
2: In strheight(words[i], cex = cex[i], ...) : "pos" is not a graphical parameter

How can I pass the pos= parameter to text() without generating the warnings? I am doubly puzzled by the warnings because, in the graph that results from the foregoing code, the labels are to the right of the points, as pos=4 requests.

Thanks!
Fraser D. Neiman
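The mechanism David describes can be reproduced with base graphics alone. This is a sketch, not wordcloud's code: label_points() and count_warnings() are hypothetical helpers written for the illustration. The idea is to measure the labels with strwidth()/strheight() using only cex, and forward the rest of '...' (such as pos = 4) to text(), which does accept it. That also explains why the labels still land to the right of the points: text() honours pos, and only the measuring calls complain.

```r
# Hypothetical helper (not part of wordcloud, plotrix or maptools)
label_points <- function(x, y, labels, cex = 1, ...) {
  w <- strwidth(labels, cex = cex)    # '...' deliberately NOT passed here
  h <- strheight(labels, cex = cex)
  text(x, y, labels, cex = cex, ...)  # pos, offset, col, ... are valid for text()
  invisible(list(width = w, height = h))
}

# Hypothetical helper: count (and silence) warnings raised by an expression
count_warnings <- function(expr) {
  n <- 0L
  withCallingHandlers(expr, warning = function(w) {
    n <<- n + 1L
    invokeRestart("muffleWarning")
  })
  n
}

pdf(NULL)  # null device, so the sketch also runs non-interactively
set.seed(1)
x <- runif(100)
y <- runif(100)
text1 <- rep("LAB", 100)
plot(x, y)

# Passing pos= to strwidth() reproduces the warning from the original post ...
n_bad <- count_warnings(strwidth(text1, cex = 1, pos = 4))
# ... while keeping it away from the measuring functions does not
n_good <- count_warnings(label_points(x, y, text1, pos = 4))
invisible(dev.off())
```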
[R] textplot() in wordcloud package
Dear All,

The textplot() function in the wordcloud package seems to do a good job of generating non-overlapping labels on a scatter plot. But it throws warnings when I try to use the pos= parameter to position the text labels relative to a given x-y point. Here is a simple example:

x <- runif(100)
y <- runif(100)
text1 <- rep('LAB', 100)
plot(x, y)
textplot(x, y, text1, new=FALSE, show.lines=FALSE, pos=4)
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In strwidth(words[i], cex = cex[i], ...) : "pos" is not a graphical parameter
2: In strheight(words[i], cex = cex[i], ...) : "pos" is not a graphical parameter

How can I pass the pos= parameter to text() without generating the warnings? I am doubly puzzled by the warnings because, in the graph that results from the foregoing code, the labels are to the right of the points, as pos=4 requests.

Thanks!
Fraser D. Neiman
Re: [R] R vs. RStudio?
In my experience, another negative to RStudio is its performance when trying to access code or data files on a remote server over a VPN connection: even modest files can take minutes to load and sometimes crash the session. The native R GUI seems to handle this better, and I am often forced to use it when working remotely. But there is enough other good stuff in RStudio to make this a bummer.

Fraser

-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Sunday, January 11, 2015 5:31 AM
To: Boris Steipe; R mailing list
Subject: Re: [R] R vs. RStudio?

On 10/01/2015 9:22 PM, Boris Steipe wrote:
> Could someone kindly enlighten me whether there are currently advantages to using RStudio vs. the normal R GUI? On the Mac I can't seem to find anything compelling; on Windows (which I don't use myself) I noticed last year that there seems to be no syntax highlighting available for the R GUI, but RStudio had it. Surely there must be some value proposition in that project; what am I missing?

I find several advantages, and one or two disadvantages.

- The debugger is nicer. You can set breakpoints in the code editor and it installs them in the right place.
- It has lots of support for things like Sweave, knitr, rmarkdown, etc.
- It is easy to switch between different projects.
- It looks the same on all platforms, so if you switch platforms you still know what you're doing.

Negatives:

- I don't like the tiled display. I find it doesn't give me enough space.
- At least until recently (I haven't checked with the latest release), it converts files to the native format, i.e. saving a file on Windows gives you CR LF line endings, and doing it elsewhere converts them to LF. This is really irritating when files get changed for no good reason.

Duncan Murdoch
[R] posterior probabilities from lda.predict
Dear All,

I have used the lda() function in the MASS package to estimate a set of discriminant functions to assign samples from a training set to one of six groups. The cross-validation generates nearly perfect predictions for samples in the training set. Hooray!

Now I want to use predict() on the fitted lda object (i.e. predict.lda()) to estimate both discriminant function scores and probabilities of group membership for a second set of samples whose group membership is unknown. For each unknown sample, predict.lda() produces six probabilities. These probabilities sum to one. So predict.lda() seems to assume that the unknown samples do, in fact, belong to one of the six groups.

The problem is that it is nearly certain that some of the unknown samples in the second set do not belong to any of the six groups. For those samples, the probabilities of group membership should be close to zero for all six groups. In fact, identifying which samples are unlikely to belong to any of the six groups is a major goal of the analysis.

So the question is: what is predict.lda() doing behind the scenes to force the group membership probabilities to sum to one? How do I get it not to do this, and instead produce probabilities that accurately reflect the large Mahalanobis distances of some of the unknown samples from every group centroid?

I have searched the R-help archive on this and have found several folks asking similar questions, but no helpful answers.

Thanks very much!
Fraser
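As far as we understand MASS, the posteriors from predict.lda() come from Bayes' rule: each group's prior times its estimated density, divided by the sum of those terms over the known groups, so they sum to one by construction and cannot say "none of the above". One possible alternative, sketched below under stated assumptions, is to look at the squared Mahalanobis distance from each unknown sample to every group centroid using the pooled within-group covariance, and flag samples that are far from all of them. iris stands in for the six-group training set, `far` is a hypothetical unknown sample, and the chi-square cutoff assumes roughly multivariate-normal groups while ignoring the estimation error in the means and covariance.

```r
library(MASS)  # lda(); MASS is a recommended package shipped with R

# Stand-in training data: 3 groups, 4 variables
X <- as.matrix(iris[, 1:4])
g <- iris$Species
fit <- lda(X, grouping = g)

# Hypothetical "unknown" sample, far from every group centroid
far <- matrix(20, nrow = 1, ncol = 4, dimnames = list(NULL, colnames(X)))
post <- predict(fit, far)$posterior
rowSums(post)  # still 1: posteriors are normalised over the known groups

# Pooled within-group covariance (the common covariance LDA assumes)
resid <- X - fit$means[as.integer(g), ]
S_w <- crossprod(resid) / (nrow(X) - nlevels(g))

# Squared Mahalanobis distance from the sample to each group centroid
d2 <- apply(fit$means, 1, function(m) mahalanobis(far, center = m, cov = S_w))

# Flag the sample if it is implausibly far from ALL centroids (assumed cutoff)
cutoff <- qchisq(0.999, df = ncol(X))
outlier <- min(d2) > cutoff
```

The 0.999 level is an arbitrary choice for the sketch; with many unknown samples some multiplicity adjustment may be worth considering.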
Re: [R] RODBC and PostgreSQL problems
Dear All,

I just wanted to follow up my question with an answer, which I owe to Robbie Bingler at UVA's IATH. The code chunk that bombed is here:

> sqlQuery(DRCch, paste("
+ SELECT *
+ FROM tblCeramicWare
+ "))
[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM tblCeramicWare \n '"

The following works:

> sqlQuery(DRCch, paste('SELECT *
+ FROM
+ "tblCeramicWare"
+ '))
  WareID                             Ware
1      1 Coarse Earthenware, unidentified
2      2               Red Agate, refined
3     97   Agate, refined (Whieldon-type)
4      4                          Redware
5      5                          Buckley
6      6                     Iberian Ware
7     87      North Devon Gravel Tempered

Note the double quotes around the table name and the single quotes enclosing the SQL text string that is the argument to paste(). PostgreSQL folds unquoted identifiers to lower case, which is why the error refers to tblceramicware; double-quoting the name preserves its mixed case. String comparisons in a WHERE clause require single-quoted literals, and to prevent R from interpreting these as the end of the SQL string, one escapes them with \:

> sqlQuery(DRCch, paste('SELECT * from "tblCeramicWare" WHERE "Ware" = \'Slip Dip\' '))
  WareID     Ware
1     93 Slip Dip

Thanks to Robbie and to all the folks on the R-help list for their help.

Best,
Fraser

From: Fraser D. Neiman
Sent: Friday, May 30, 2014 2:00 PM
To: r-help@r-project.org
Subject: RODBC and PostgreSQL problems

Dear All,

I am trying for the first time to run SQL queries against a remote PostgreSQL database via RODBC. I am able to establish a connection just fine, as shown by getting results back from the sqlTables(), sqlColumns() and sqlPrimaryKeys() functions in RODBC. However, when I try to run a SQL query using the sqlQuery() function I get

[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM tblCeramicWare

What am I doing wrong? Here are the relevant snips from the R console. What's puzzling is that tblCeramicWare is recognized as an argument to sqlColumns() and sqlPrimaryKeys(), but NOT in sqlQuery().

Thanks for any pointers.

best,
Fraser

> library(RODBC)
> # connect to DAACS and assign a name (DRCch) to the connection
> DRCch <- odbcConnect("postgreSQL35W", case = "nochange", uid = "XX", pwd = "XX")
> # list the tables that are available
> sqlTables(DRCch, tableType = "TABLE")
   TABLE_QUALIFIER TABLE_OWNER        TABLE_NAME TABLE_TYPE REMARKS
1 daacs-production      public      TempSTPTable      TABLE
2 daacs-production      public        activities      TABLE
3 daacs-production      public          articles      TABLE
4 daacs-production      public schema_migrations      TABLE
5 daacs-production      public     tblACDistance      TABLE
6 daacs-production      public    tblArtifactBox      TABLE
7 daacs-production      public  tblArtifactImage      TABLE
8 daacs-production      public     tblBasicColor      TABLE
9 daacs-production      public           tblBead      TABLE
> sqlColumns(DRCch, "tblCeramicWare")
   TABLE_QUALIFIER TABLE_OWNER     TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME PRECISION LENGTH SCALE RADIX NULLABLE
1 daacs-production      public tblCeramicWare      WareID         4      int4        10      4     0    10        0
2 daacs-production      public tblCeramicWare        Ware        -9   varchar        50    100    NA    NA        1
  REMARKS                         COLUMN_DEF SQL_DATA_TYPE SQL_DATETIME_SUB CHAR_OCTET_LENGTH ORDINAL_POSITION
1         nextval('global_id_seq'::regclass)             4               NA                -1                1
2                                         NA            -9               NA               100                2
  IS_NULLABLE DISPLAY_SIZE FIELD_TYPE AUTO_INCREMENT PHYSICAL NUMBER TABLE OID BASE TYPEID TYPMOD
1          NA           11         23              1               1     27441           0     -1
2          NA           50       1043              0               2     27441           0     50
> sqlPrimaryKeys(DRCch, "tblCeramicWare")
   TABLE_QUALIFIER TABLE_OWNER     TABLE_NAME COLUMN_NAME KEY_SEQ             PK_NAME
1 daacs-production      public tblCeramicWare      WareID       1 tblCeramicWare_pkey
> sqlQuery(DRCch, paste("
+ SELECT *
+ FROM tblCeramicWare
+ "))
[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n
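The quoting rules Robbie describes can be tried without a database connection. quote_ident() and quote_literal() below are hypothetical helpers, not RODBC functions; they follow PostgreSQL's conventions of double quotes around identifiers (preserving mixed case), single quotes around string literals, and doubling any embedded quote character. Building the string with sprintf() avoids the hand-escaped \' sequences.

```r
# Hypothetical helpers (not part of RODBC)
quote_ident   <- function(x) paste0('"', gsub('"', '""', x, fixed = TRUE), '"')
quote_literal <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")

sql <- sprintf("SELECT * FROM %s WHERE %s = %s",
               quote_ident("tblCeramicWare"),
               quote_ident("Ware"),
               quote_literal("Slip Dip"))
cat(sql, "\n")
# On an open connection this string would then go to sqlQuery(DRCch, sql)
```

Doubling an embedded quote, as in 'O''Brien', is the standard SQL way to put a quote character inside a literal, so values containing apostrophes do not break the query.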
[R] RODBC and PostgreSQL problems
Dear All,

I am trying for the first time to run SQL queries against a remote PostgreSQL database via RODBC. I am able to establish a connection just fine, as shown by getting results back from the sqlTables(), sqlColumns() and sqlPrimaryKeys() functions in RODBC. However, when I try to run a SQL query using the sqlQuery() function I get

[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM tblCeramicWare

What am I doing wrong? Here are the relevant snips from the R console. What's puzzling is that tblCeramicWare is recognized as an argument to sqlColumns() and sqlPrimaryKeys(), but NOT in sqlQuery().

Thanks for any pointers.

best,
Fraser

> library(RODBC)
> # connect to DAACS and assign a name (DRCch) to the connection
> DRCch <- odbcConnect("postgreSQL35W", case = "nochange", uid = "XX", pwd = "XX")
> # list the tables that are available
> sqlTables(DRCch, tableType = "TABLE")
   TABLE_QUALIFIER TABLE_OWNER        TABLE_NAME TABLE_TYPE REMARKS
1 daacs-production      public      TempSTPTable      TABLE
2 daacs-production      public        activities      TABLE
3 daacs-production      public          articles      TABLE
4 daacs-production      public schema_migrations      TABLE
5 daacs-production      public     tblACDistance      TABLE
6 daacs-production      public    tblArtifactBox      TABLE
7 daacs-production      public  tblArtifactImage      TABLE
8 daacs-production      public     tblBasicColor      TABLE
9 daacs-production      public           tblBead      TABLE
> sqlColumns(DRCch, "tblCeramicWare")
   TABLE_QUALIFIER TABLE_OWNER     TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME PRECISION LENGTH SCALE RADIX NULLABLE
1 daacs-production      public tblCeramicWare      WareID         4      int4        10      4     0    10        0
2 daacs-production      public tblCeramicWare        Ware        -9   varchar        50    100    NA    NA        1
  REMARKS                         COLUMN_DEF SQL_DATA_TYPE SQL_DATETIME_SUB CHAR_OCTET_LENGTH ORDINAL_POSITION
1         nextval('global_id_seq'::regclass)             4               NA                -1                1
2                                         NA            -9               NA               100                2
  IS_NULLABLE DISPLAY_SIZE FIELD_TYPE AUTO_INCREMENT PHYSICAL NUMBER TABLE OID BASE TYPEID TYPMOD
1          NA           11         23              1               1     27441           0     -1
2          NA           50       1043              0               2     27441           0     50
> sqlPrimaryKeys(DRCch, "tblCeramicWare")
   TABLE_QUALIFIER TABLE_OWNER     TABLE_NAME COLUMN_NAME KEY_SEQ             PK_NAME
1 daacs-production      public tblCeramicWare      WareID       1 tblCeramicWare_pkey
> sqlQuery(DRCch, paste("
+ SELECT *
+ FROM tblCeramicWare
+ "))
[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM tblCeramicWare \n '"

Fraser D. Neiman
Department of Archaeology, Monticello
(434) 984 9812
[R] puzzling classical Mahalanobis distances from covMcd() {robustbase}
Greetings,

I am puzzled about why the _classical_ Mahalanobis distances that I get using the {stats} mahalanobis() function do not match the distances I get from the {robustbase} covMcd() function. Here is an example:

x <- matrix(rnorm(10*3), ncol = 3)

# here is the {stats} result:
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
D2
 [1] 1.5135795 1.3761046 1.0367444 1.8111585 4.3038621 5.3195918 3.2798665 5.7559301
 [9] 2.2172150 0.3859475

# here is the {robustbase} result:
library(robustbase)
D2rb <- covMcd(x)
D2rb$raw.mah
 [1] 0.7737193 1.1177445 0.7290794 0.6275703 3.5517622 6.0334350 1.0582663 5.7169250
 [9] 0.9420184 0.4210470

According to the help file for covMcd {robustbase}:

raw.mah: mahalanobis distances of the observations based on the raw estimate of the location and scatter.

So I think the second set of numbers should match the first. But they do not. What am I missing here?

Thanks,
Fraser
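A quick way to tell whether a set of squared distances is classical: when they are computed from colMeans() and the (n - 1)-denominator cov(), they always sum to exactly (n - 1) * p, because the sum equals the trace of S^{-1} (n - 1) S. The base-R sketch below checks this and recomputes the distances by hand; the seed is an assumption, since the original post did not set one.

```r
set.seed(42)  # assumption: any seed works, the post did not set one
n <- 10
p <- 3
x <- matrix(rnorm(n * p), ncol = p)

# Classical squared Mahalanobis distances, as in the post
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)

# The same distances by hand: (x_i - xbar)' S^{-1} (x_i - xbar)
xc <- sweep(x, 2, colMeans(x))
D2_manual <- rowSums((xc %*% solve(Sx)) * xc)

# Classical distances sum to (n - 1) * p, here 27
total <- sum(D2)
```

Applied to the numbers in the post, the D2 values sum to 27 = (10 - 1) * 3, as they must, while the raw.mah values sum to about 20.97, so they were not computed from the classical mean and covariance. As we read the robustbase documentation, "raw" refers to the raw (not yet reweighted) MCD estimate of location and scatter, which is itself robust rather than classical.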