Re: [R] missing and replace

2017-04-27 Thread Fraser D. Neiman
Dear All,

Replacing missing values with means is generally not a good idea:

"Perhaps the easiest way to impute is to replace each missing
value with the mean of the observed values for that variable. Unfortunately,
this strategy can severely distort the distribution for this variable, leading
to complications with summary measures including, notably, underestimates of
the standard deviation. Moreover, mean imputation distorts relationships
between variables by “pulling” estimates of the correlation toward zero."

That's from Gelman and Hill; more here:
http://www.stat.columbia.edu/~gelman/arm/missing.pdf


best, Fraser
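
The two distortions named in the quotation are easy to see in a small
simulation. The following is a minimal sketch on made-up data (the variable
names, the sample size, and the 30% missingness rate are assumptions chosen for
illustration, not anything taken from Gelman and Hill). It deletes values of y
completely at random, fills them with the observed mean, and compares the
standard deviation and the correlation with x before and after.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 0.6)

# Delete 30% of y completely at random.
y_obs <- y
y_obs[sample(n, 300)] <- NA

# Mean imputation: every NA becomes the mean of the observed values.
y_imp <- ifelse(is.na(y_obs), mean(y_obs, na.rm = TRUE), y_obs)

# Standard deviation before and after imputation.
sd_obs <- sd(y_obs, na.rm = TRUE)
sd_imp <- sd(y_imp)

# Correlation with x before (complete cases) and after imputation.
r_obs <- cor(x, y_obs, use = "complete.obs")
r_imp <- cor(x, y_imp)

print(c(sd_obs = sd_obs, sd_imp = sd_imp, r_obs = r_obs, r_imp = r_imp))
```

Because every imputed value sits exactly at the observed mean, it adds nothing
to the sum of squares of y or to its cross-product with x, while the sample
size grows. Both the standard deviation and the absolute correlation therefore
shrink.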


From: Val [valkr...@gmail.com]
Sent: Wednesday, April 26, 2017 8:45 PM
To: r-help@R-project.org (r-help@r-project.org)
Subject: [R] missing and replace

Hi all,

I have a data frame with three variables. Some of the variables
have missing values, and I want to replace those missing values
(represented by NA) with the mean value of that variable. In this
sample data, variables y and z have missing values. The mean values
of y and z are 152.25 and 359.5, respectively. I want to replace those
missing values with the respective mean value (rounded to the nearest
whole number).

DF1 <- read.table(header=TRUE, text='ID1 x y z
1  25  122  352
2  30  135  376
3  40   NA  350
4  26  157   NA
5  60  195  360')
mean x= 36.2
mean y=152.25
mean z= 359.5

output
ID1  x  y  z
1   25 122   352
2   30 135   376
3   40 152   350
4   26 157   360
5   60 195   360


Thank you in advance
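
For readers who still want the output requested above, here is one way to
produce it in base R. This is a sketch rather than the only approach; the name
`vars` is introduced here purely for illustration.

```r
DF1 <- read.table(header = TRUE, text = 'ID1 x y z
1  25  122  352
2  30  135  376
3  40   NA  350
4  26  157   NA
5  60  195  360')

# Columns to impute; ID1 is left alone.
vars <- c("x", "y", "z")

# Replace each NA with the rounded mean of the observed values in that column.
DF1[vars] <- lapply(DF1[vars], function(v) {
  v[is.na(v)] <- round(mean(v, na.rm = TRUE))
  v
})

DF1
```

Note that round() in R rounds a trailing .5 to the even neighbour, which is why
359.5 becomes 360 here.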

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] F Distribution

2015-12-21 Thread Fraser D. Neiman
Dear Bob,

You want...

> qf( .95,1, 1)
[1] 161.4476

Best, Fraser
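
The difference between the two calls is that pf() is the cumulative
distribution function (it maps a value to a probability), while qf() is the
quantile function (it maps a probability to a value). A short check, using only
base R, shows that they are inverses of each other:

```r
# 95th percentile of F with 1 and 1 degrees of freedom.
q <- qf(0.95, df1 = 1, df2 = 1)
q

# Feeding that value back into the CDF recovers the probability 0.95.
p <- pf(q, df1 = 1, df2 = 1)
p

# The original call asked for P(F <= 0.95), which is a different question.
p_at_095 <- pf(0.95, df1 = 1, df2 = 1)
p_at_095
```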

-----Original Message-----
From: Robert Sherry [mailto:rsher...@comcast.net] 
Sent: Monday, December 21, 2015 2:51 PM
To: R Project Help
Subject: [R] F Distribution


When I use a table, from a Schaum book, I see that for the 95th percentile, with
v_1 = 1 and v_2 = 1 the value is 161. In the modern era, looking values up in a
table is less than ideal. Therefore, I would expect R to have a function to do 
this and based upon my reading of the documentation, I would expect the 
following call to get the value I expect:
  pf( .95,1, 1)
However, it produces
 0.4918373
Therefore, I conclude that I am using the wrong function. What function should 
I use?

Thanks
Bob



Re: [R] textplot() in wordcloud package

2015-03-16 Thread Fraser D. Neiman

Just a quick note to thank David Carlson and Jim Lemon for the helpful replies
on this...

The easiest solution is to punt on wordcloud and use either the plotrix or
maptools packages.

In plotrix the function is thigmophobe.labels(). In maptools it's pointLabel().

Neither complains when you pass it the pos= parameter.

Thank you one and all!

Best, Fraser



-----Original Message-----
From: David L Carlson [mailto:dcarl...@tamu.edu] 
Sent: Monday, March 16, 2015 12:02 PM
To: David L Carlson; Fraser D. Neiman; r-help@r-project.org
Subject: RE: textplot() in wordcloud package

Another possibility is to use pointLabel() in package maptools. For your example

library(maptools)

plot(x,y)
pointLabel(x, y, text1)

Advantages of pointLabel() are that it returns a list of the x and y 
coordinates of the labels that you can tweak if necessary and, at least in your 
example, it does a better job of avoiding labels being chopped at the plot 
margins.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson
Sent: Monday, March 16, 2015 10:44 AM
To: Fraser D. Neiman; r-help@r-project.org
Subject: Re: [R] textplot() in wordcloud package

You should contact the package maintainer about this. The problem is that the 
pos= argument is being passed to strwidth() and strheight() and those functions 
do not know what to do with it. In the meantime:

suppressWarnings(textplot(x,y, text1, new=F, show.lines=F,  
  pos=4))

will eliminate the warnings.

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
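
The mechanism described above (extra arguments in ... being handed to a
function that does not understand them) can be reproduced in base R without
wordcloud. The sketch below uses two hypothetical wrappers, noisy_label() and
quiet_label(), which are illustrations only and not the actual wordcloud
source. It also opens a throwaway pdf(NULL) device so that it runs without a
display.

```r
# Hypothetical wrappers for illustration; these are not wordcloud functions.
pdf(NULL)                       # throwaway graphics device
plot(1:3, 1:3)

# Forwards '...' to both strwidth() and text(): strwidth() warns about 'pos'.
noisy_label <- function(x, y, words, ...) {
  w <- strwidth(words, ...)
  text(x, y, words, ...)
  invisible(w)
}

# Forwards '...' to text() only, so 'pos' never reaches strwidth().
quiet_label <- function(x, y, words, ...) {
  w <- strwidth(words)
  text(x, y, words, ...)
  invisible(w)
}

# Capture the first warning raised by the noisy version, if any.
msg <- tryCatch({
  noisy_label(1:3, 1:3, c("a", "b", "c"), pos = 4)
  "no warning"
}, warning = function(w) conditionMessage(w))
msg

# The quiet version places the labels to the right of the points silently.
quiet_label(1:3, 1:3, c("a", "b", "c"), pos = 4)
invisible(dev.off())
```

This also suggests why the labels still land in the right place despite the
warnings: text() accepts pos=, and only the width and height measurements
object to it.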




[R] textplot() in wordcloud package

2015-03-13 Thread Fraser D. Neiman
Dear All,

The textplot() function in the wordcloud package seems to do a good job of
generating non-overlapping labels on a scatter plot.
But it throws warnings when I try to use the pos= parameter to position the
text labels relative to a given x-y point.

Here is a simple example:

> library(wordcloud)
> x <- runif(100)
> y <- runif(100)
> text1 <- rep('LAB', 100)
>
> plot(x,y)
> textplot(x,y, text1, new=F, show.lines=F,
+   pos=4)

There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In strwidth(words[i], cex = cex[i], ...) : "pos" is not a graphical parameter
2: In strheight(words[i], cex = cex[i], ...) : "pos" is not a graphical parameter

How can I pass the pos= parameter to text() without generating the warnings?

I am doubly puzzled by the warnings because, in the graph that results from the
foregoing code, the labels are to the right of the points, as 'pos=4' requests.

Thanks!

Fraser D. Neiman



Re: [R] R vs. RStudio?

2015-01-12 Thread Fraser D. Neiman

In my experience, another negative to RStudio is its performance when trying
to access code or data files on a remote server over a VPN connection: even
modest files can take minutes to load and sometimes crash the session.

The native R GUI seems to handle this better, and I am often forced to use it
when working remotely. But there is enough other good stuff in RStudio to make
this a bummer.




Fraser  

-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] 
Sent: Sunday, January 11, 2015 5:31 AM
To: Boris Steipe; R mailing list
Subject: Re: [R] R vs. RStudio?

On 10/01/2015 9:22 PM, Boris Steipe wrote:
> Could someone kindly enlighten me whether there are currently advantages to
> use R Studio vs. the normal R GUI? On the Mac I can't seem to find anything
> compelling, on Windows (which I don't use myself) I noticed last year that
> there seems to be no syntax highlighting available for the R GUI but R Studio
> had it.
>
> Surely there must be some value proposition in that project, what am I
> missing?

I find several advantages, and one or two disadvantages.

 - The debugger is nicer.  You can set breakpoints in the code editor and it 
installs them in the right place.

 - It has lots of support for things like Sweave, knitr, rmarkdown, etc.

 - It is easy to switch between different projects.

 - It looks the same on all platforms, so if you switch platforms you still 
know what you're doing.

Negatives:

 - I don't like the tiled display.  I find it doesn't give me enough space.

 - At least until recently, I haven't checked with the latest release, it 
converts files to the native format, i.e. saving a file on Windows gives you CR 
LF line endings, doing it elsewhere converts them to LF.
This is really irritating when files get changed for no good reason.

Duncan Murdoch



[R] posterior probabilities from lda.predict

2014-08-29 Thread Fraser D. Neiman
Dear All,

I have used the lda() function in the MASS library to estimate a set of
discriminant functions to assign samples from a training set to one of six
groups.  The cross validation generates nearly perfect predictions for samples
in the training set.  Hooray!

Now I want to use predict.lda() to estimate both discriminant function scores
and probabilities of group membership for a second set of samples whose group
membership is unknown.  For each unknown sample, predict.lda() produces six
probabilities. These probabilities sum to one. So predict.lda() seems to assume
that the unknown samples do, in fact, belong to one of the six groups.

The problem is that it is nearly certain that some of the unknown samples in
the second set do not belong to any of the six groups. For those samples,
probabilities of group membership should be close to zero for all six groups.
In fact, identifying which samples are unlikely to belong to any of the six
groups is a major goal of the analysis.

So the question is, what is predict.lda() doing behind the scenes to force the
group membership probabilities to sum to one? How do I get it to not do this
and produce probabilities that accurately reflect the large Mahalanobis
distances of some of the unknown samples from any group centroid?

I have searched the R-list archive on this and have found several folks asking
similar questions, but no helpful answers.

Thanks very much!

Fraser
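
Posterior probabilities from a discriminant analysis are conditional on the
sample belonging to one of the fitted groups, so they sum to one by
construction. One way to flag samples that fit no group is to look at the
squared Mahalanobis distances to each centroid directly. The sketch below is an
illustration under stated assumptions, not a feature of MASS: it uses the
built-in iris data as a stand-in (two species as the training groups, the third
as the "unknown" set), computes the pooled within-group covariance by hand, and
converts distances to chi-square tail probabilities, which presumes multivariate
normality within groups.

```r
library(MASS)

# Stand-in data: train on two iris species, treat the third as "unknown".
train   <- droplevels(subset(iris, Species != "virginica"))
unknown <- subset(iris, Species == "virginica")
fit     <- lda(Species ~ ., data = train)

# Posteriors are normalised across the fitted groups, so each row sums to 1.
post <- predict(fit, unknown)$posterior
range(rowSums(post))

# Pooled within-group covariance, computed by hand from the training data.
X <- as.matrix(train[, 1:4])
g <- train$Species
centered <- X - fit$means[as.character(g), ]
S_pooled <- crossprod(centered) / (nrow(X) - nlevels(g))

# Squared Mahalanobis distance from each unknown sample to each centroid.
U  <- as.matrix(unknown[, 1:4])
d2 <- sapply(rownames(fit$means), function(k) {
  mahalanobis(U, fit$means[k, ], S_pooled)
})

# Upper-tail chi-square probability: small for every group means the sample
# is far from all centroids, whatever the posterior says.
typicality <- pchisq(d2, df = ncol(U), lower.tail = FALSE)
head(cbind(d2, typicality))
```

A sample whose tail probability is tiny for every group is a candidate for
"none of the above", even though its posterior probabilities still sum to one.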


Re: [R] RODBC and PosgreSQL problems

2014-06-04 Thread Fraser D. Neiman
Dear All,

I just wanted to follow up my question with an answer, which I owe to Robbie
Bingler at UVA's IATH. The code chunk that bombed is here:

> sqlQuery(DRCch,paste("
+  SELECT *
+  FROM tblCeramicWare
+  "))
[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM tblCeramicWare \n '"

The following works:

> sqlQuery(DRCch,paste('SELECT *
+   FROM
+  "tblCeramicWare"
+  '))
  WareID                             Ware
1      1 Coarse Earthenware, unidentified
2      2               Red Agate, refined
3     97   Agate, refined (Whieldon-type)
4      4                          Redware
5      5                          Buckley
6      6                     Iberian Ware
7     87      North Devon Gravel Tempered

Note the double quotes around the table name and the single quotes enclosing
the SQL text string that is the argument to the paste() function. PostgreSQL
folds unquoted identifiers to lower case, which is why the error message
complains about tblceramicware; double quotes preserve the mixed case.

Comparisons in a WHERE clause often require single-quoted text strings. To
prevent R from interpreting these as the end of the SQL string, one escapes
them with a backslash:

> sqlQuery(DRCch,paste('SELECT * from "tblCeramicWare" WHERE "Ware" = \'Slip Dip\' '))
  WareID     Ware
1     93 Slip Dip


Thanks to Robbie and to all the folks on the R-Help list for their help.

Best, Fraser
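
The quoting rules above can be wrapped in two small helpers so that the nesting
of quote characters does not have to be done by hand. This is a minimal sketch:
quote_ident() and quote_literal() are hypothetical names defined here and are
not part of RODBC. It only builds the SQL string, so it runs without a database
connection.

```r
# Hypothetical helpers for illustration; RODBC does not provide these.
# Identifiers go in double quotes (embedded double quotes are doubled).
quote_ident <- function(x) {
  paste0('"', gsub('"', '""', x, fixed = TRUE), '"')
}
# String literals go in single quotes (embedded single quotes are doubled).
quote_literal <- function(x) {
  paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

sql <- sprintf("SELECT * FROM %s WHERE %s = %s",
               quote_ident("tblCeramicWare"),
               quote_ident("Ware"),
               quote_literal("Slip Dip"))
cat(sql, "\n")

# With an open channel this string would be passed on as: sqlQuery(DRCch, sql)
```

Building the string with sprintf() keeps the R quoting and the SQL quoting
separate, so no backslash escapes are needed.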




[R] RODBC and PosgreSQL problems

2014-05-30 Thread Fraser D. Neiman

Dear All,

I am trying for the first time to run SQL queries against a remote PostgreSQL
database via RODBC. I am able to establish a connection just fine, as shown by
getting results back from the sqlTables(), sqlColumns() and sqlPrimaryKeys()
functions in RODBC. However, when I try to run a SQL query using the sqlQuery()
function I get

[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM tblCeramicWare

What am I doing wrong?

Here are the relevant snips from the R console.  What's puzzling is that
tblCeramicWare is recognized as an argument to sqlColumns() and
sqlPrimaryKeys(), but NOT in sqlQuery().

Thanks for any pointers.

best, Fraser

> library(RODBC)
>
> # connect to DAACS and assign a name (DRCch) to the connection
> DRCch <- odbcConnect("postgreSQL35W", case = "nochange", uid = "XX", pwd = "XX");
>
> # list the tables that are available
> sqlTables(DRCch, tableType = "TABLE")
   TABLE_QUALIFIER TABLE_OWNER        TABLE_NAME TABLE_TYPE REMARKS
1 daacs-production      public      TempSTPTable      TABLE
2 daacs-production      public        activities      TABLE
3 daacs-production      public          articles      TABLE
4 daacs-production      public schema_migrations      TABLE
5 daacs-production      public     tblACDistance      TABLE
6 daacs-production      public    tblArtifactBox      TABLE
7 daacs-production      public  tblArtifactImage      TABLE
8 daacs-production      public     tblBasicColor      TABLE
9 daacs-production      public           tblBead      TABLE


> sqlColumns(DRCch, "tblCeramicWare")
   TABLE_QUALIFIER TABLE_OWNER     TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME PRECISION LENGTH SCALE RADIX NULLABLE
1 daacs-production      public tblCeramicWare      WareID         4      int4        10      4     0    10        0
2 daacs-production      public tblCeramicWare        Ware        -9   varchar        50    100    NA    NA        1
  REMARKS                         COLUMN_DEF SQL_DATA_TYPE SQL_DATETIME_SUB CHAR_OCTET_LENGTH ORDINAL_POSITION
1         nextval('global_id_seq'::regclass)             4               NA                -1                1
2                                         NA            -9               NA               100                2
  IS_NULLABLE DISPLAY_SIZE FIELD_TYPE AUTO_INCREMENT PHYSICAL NUMBER TABLE OID BASE TYPEID TYPMOD
1          NA           11         23              1               1     27441           0     -1
2          NA           50       1043              0               2     27441           0     50
> sqlPrimaryKeys(DRCch, "tblCeramicWare")
   TABLE_QUALIFIER TABLE_OWNER     TABLE_NAME COLUMN_NAME KEY_SEQ             PK_NAME
1 daacs-production      public tblCeramicWare      WareID       1 tblCeramicWare_pkey

> sqlQuery(DRCch,paste("
+  SELECT *
+  FROM tblCeramicWare
+  "))
[1] "42P01 7 ERROR: relation \"tblceramicware\" does not exist;\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect '\n SELECT * \n FROM tblCeramicWare \n '"




Fraser D. Neiman
Department of Archaeology, Monticello
(434) 984 9812




[R] puzzling classical Mahalanobis distances from covMcd() {robustbase}

2012-07-27 Thread Fraser D. Neiman
Greetings,

I am puzzled about why the _classical_ Mahalanobis distances that I get using
the {stats} mahalanobis() function do not match the distances I get from the
{robustbase} covMcd() function. Here is an example:

x <- matrix(rnorm(10*3), ncol = 3)

#here is the {stats} result:
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
D2

 [1] 1.5135795 1.3761046 1.0367444 1.8111585 4.3038621 5.3195918 3.2798665 5.7559301
 [9] 2.2172150 0.3859475


#here is the {robustbase} result
library(robustbase)
D2rb <- covMcd(x)
D2rb$raw.mah

 [1] 0.7737193 1.1177445 0.7290794 0.6275703 3.5517622 6.0334350 1.0582663 5.7169250
 [9] 0.9420184 0.4210470

According to the help file for covMcd{robustbase}

raw.mah mahalanobis distances of the observations based on the raw estimate of
the location and scatter.

So I think the second set of numbers should match the first. But they do not.
What am I missing here?

Thanks, Fraser
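
One reading of the help text that would explain the mismatch: "raw" in raw.mah
refers to the raw (not yet reweighted) MCD estimate of location and scatter,
which is itself a robust estimate fitted to a subset of the rows, not to the
classical mean and covariance of all rows. The sketch below is a way to check
this, assuming the returned object carries raw.center and raw.cov components as
the covMcd() documentation lists them. The robustbase part is skipped if the
package is not installed.

```r
set.seed(42)
x <- matrix(rnorm(10 * 3), ncol = 3)

# Classical distances: mean vector and covariance from all rows.
D2 <- mahalanobis(x, colMeans(x), cov(x))

# For classical distances the sum is always (n - 1) * p, here 9 * 3 = 27.
sum(D2)

if (requireNamespace("robustbase", quietly = TRUE)) {
  fit <- robustbase::covMcd(x)
  # Distances recomputed from the raw MCD location and scatter.
  D2raw <- mahalanobis(x, fit$raw.center, fit$raw.cov)
  print(cbind(classical = D2, raw.mah = fit$raw.mah, recomputed = D2raw))
}
```

If that reading is right, the last two columns should agree with each other
and differ from the first. The classical distances in the original post sum to
27, as expected, while the raw.mah values do not.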
