Re: [R] Splitting a vector
You don't need to wrap 1:12 in c(). Since matrices are just folded vectors, you can convert vector X to a matrix Xm:

Xm <- matrix(X, nrow = 3)

and access columns to get your sub-vectors: Xm[, 1], Xm[, 2], and so on.

---
Jeff Newmiller  jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.

Long Vo long_vo_...@yahoo.com.vn wrote:

Hi, I am quite new to R, so I know this is probably very basic, but how can I split a sequence of numbers into multiple parts of equal length? For example, I have a vector X = c(1:12). I simply need to split it into sub-vectors of the same length N. Say N = 3; then I need the output to look like

1 2 3
4 5 6
7 8 9
10 11 12

Better still if the sub-vectors can be named, so that I can use them later for individual study, probably in a loop in which a function is applied to each of them. I just want them in consecutive order, so really no fancy conditions here. Any help to this amateur is greatly appreciated,

Long

--
View this message in context: http://r.789695.n4.nabble.com/Splitting-a-vector-tp4681930.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
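For the archives, the matrix-folding approach above can be sketched as follows; split() is shown as an additional option since it returns a named list, which suits the follow-up goal of applying a function to each piece (this assumes, as in the example, that the vector length is a multiple of N):

```r
X <- 1:12
N <- 3

# Fold the vector into a matrix with N rows; each column is one sub-vector
Xm <- matrix(X, nrow = N)
Xm[, 1]  # 1 2 3
Xm[, 4]  # 10 11 12

# split() returns a named list of sub-vectors, convenient for lapply()/sapply()
parts <- split(X, ceiling(seq_along(X) / N))
parts[["2"]]        # 4 5 6
sapply(parts, sum)  # apply a function to each sub-vector
```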
[R] read.csv interpreting numbers as factors
Why does R interpret a column of numbers in a csv file as a factor when using read.csv(), and how can I prevent that? The data looks like

9928
3502
146
404
1831
686
249

I tried kick <- read.csv("kick.csv", stringsAsFactors = FALSE) as well as kick <- read.csv("kick.csv"). Thanks

On Mon, Dec 2, 2013 at 5:16 PM, William Dunlap wdun...@tibco.com wrote:

> It seems so inefficient.

But ifelse knows nothing about the expressions given as its second and third arguments -- it only sees their values after they are evaluated. Even if it could see the expressions, it would not be able to assume that f(x[i]) is the same as f(x)[i], or things like ifelse(x > 0, cumsum(x), cumsum(-x)) would not work. You can avoid computing all of f(x) and then extracting a few elements from it by doing something like

x <- c("Wednesday", "Monday", "Wednesday")
z1 <- character(length(x))
z1[x == "Monday"] <- "Mon"
z1[x == "Tuesday"] <- "Tue"
z1[x == "Wednesday"] <- "Wed"

or

LongDayNames <- c("Monday", "Tuesday", "Wednesday")
ShortDayNames <- c("Mon", "Tue", "Wed")
z2 <- character(length(x))
for (i in seq_along(LongDayNames)) {
  z2[x == LongDayNames[i]] <- ShortDayNames[i]
}

To avoid the repeated x == value[i] you can use match(x, values):

z3 <- ShortDayNames[match(x, LongDayNames)]

z1, z2, and z3 are identical character vectors. Or, you can use factors:

> factor(x, levels = LongDayNames, labels = ShortDayNames)
[1] Wed Mon Wed
Levels: Mon Tue Wed

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bill
Sent: Monday, December 02, 2013 4:50 PM
To: Duncan Murdoch
Cc: r-help@r-project.org
Subject: Re: [R] ifelse - does it manage the indexing?

It seems so inefficient. I mean the whole first vector will be evaluated. Then if the second if is run, the whole vector will be evaluated again. Then if the next if is run, the whole vector will be evaluated again. And so on. And this could be only to test the first element (if it is false for each if statement).
Then this would be repeated again and again. Is that really the way it works? Or am I not thinking clearly?

On Mon, Dec 2, 2013 at 4:48 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

On 13-12-02 7:33 PM, Bill wrote:

ifelse(day_of_week == "Monday", 1,
  ifelse(day_of_week == "Tuesday", 2,
    ifelse(day_of_week == "Wednesday", 3,
      ifelse(day_of_week == "Thursday", 4,
        ifelse(day_of_week == "Friday", 5,
          ifelse(day_of_week == "Saturday", 6, 7))))))

In code like the above, day_of_week is a vector, and so day_of_week == "Monday" will result in a boolean vector. Suppose day_of_week is Monday, Thursday, Friday, Tuesday. So day_of_week == "Monday" will be TRUE, FALSE, FALSE, FALSE. I think that ifelse will test the first element and it will generate a 1. At this point it will not have run day_of_week == "Tuesday" yet. Then it will test the second element of day_of_week, it will be false, and this will cause it to evaluate day_of_week == "Tuesday". My question would be: does the evaluation of day_of_week == "Tuesday" result in the generation of an entire boolean vector (which would be in this case FALSE, FALSE, FALSE, TRUE), or does ifelse manage the indexing so that it only tests the second element of the original vector (which is Thursday)? And for that matter, does it therefore not even bother to generate the first boolean vector I mentioned above (TRUE, FALSE, FALSE, FALSE) but rather just check the first element? Not sure if I have explained this well, but if you understand, I would appreciate a reply.

See the help for the function. If any element of the test is true, the full first vector will be evaluated. If any element is false, the second one will be evaluated. There are no shortcuts of the kind you describe.

Duncan Murdoch
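Duncan's point -- that ifelse() evaluates its yes and no arguments as whole vectors rather than element by element -- can be verified with a small counter; the function f and the sample data here are made up for illustration:

```r
calls <- 0L
seen  <- integer(0)
f <- function(v) {
  calls <<- calls + 1L          # count how often the branch expression runs
  seen  <<- c(seen, length(v))  # record how many elements it received
  v * 10
}

x <- c(-2, 3, -1, 4)
res <- ifelse(x > 0, f(x), f(-x))
res    # 20 30 10 40 -- picked elementwise from the two full results
calls  # 2 -- each branch expression was evaluated exactly once
seen   # 4 4 -- and each time on the whole vector
```

So both branches are computed in full whenever the test contains at least one TRUE and one FALSE, exactly as the help page describes.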
Re: [R] read.csv interpreting numbers as factors
It is bad netiquette to hijack an existing thread for a new topic. Please start a new email thread when changing topics.

If your data really consists of what you show, then read.csv won't behave that way. I suggest that you open the file in a text editor and look for odd characters. They may be invisible. Going out on a limb, you may be trying to read a tab-separated file, and if so then you need to use the sep = "\t" argument to read.csv.

---
Jeff Newmiller  jdnew...@dcn.davis.ca.us
Sent from my phone. Please excuse my brevity.

Bill william...@gmail.com wrote:

> Why does R interpret a column of numbers in a csv file as a factor when
> using read.csv(), and how can I prevent that?
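Jeff's sep suggestion can be tried out like this; the file contents below are invented for illustration:

```r
# Write a small tab-separated file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("a\tb", "9928\t3502", "146\t404"), tmp)

# read.csv() defaults to sep = ","; with a tab-separated file that leaves
# each whole line as a single field. Passing sep = "\t" parses the columns.
kick <- read.csv(tmp, sep = "\t")
str(kick)  # two integer columns, no factors
```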
Re: [R] read.csv interpreting numbers as factors
On Tue, Dec 10, 2013 at 10:06 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

> If your data really consists of what you show, then read.csv won't behave
> that way. I suggest that you open the file in a text editor and look for
> odd characters.

Or something in the data isn't a valid number. Try:

as.numeric(as.character(factorthingyouthinkshouldbenumbers))

and if you get any NA values, then those things aren't valid number formats. You need as.numeric(as.character(..)) because otherwise as.numeric just gets the underlying integer codes for the factor levels:

> f <- factor(c("1", "1", "2", "three", "4", "69"))
> f
[1] 1     1     2     three 4     69
Levels: 1 2 4 69 three
> as.numeric(f)
[1] 1 1 2 5 3 4
> as.numeric(as.character(f))
[1]  1  1  2 NA  4 69
Warning message:
NAs introduced by coercion
Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
In that case you need to correct those cases inside the function. Everything else stays the same. Something like

fff <- function(x) {
  tb <- table(x)
  if (max(tb) == min(tb)) res <- "None" else res <- names(which.max(tb))
  res
}

Regards
Petr

From: Walter Anderson [mailto:wandrso...@gmail.com]
Sent: Monday, December 09, 2013 5:47 PM
To: PIKAL Petr
Cc: r-help@r-project.org
Subject: Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

Petr,

Thank you for your assistance; however, your code does not produce the correct results in the following two cases:

ID,Q1,Q2,Q3
17,Option 1, Option 3, Option 2
24,Option 2, Option 3, Option 1

In both cases it chooses a preference of Option 1, when the correct answer is None.

# The data is expected to be organized as:
# ID, Q1, Q2, Q3
# and the questions are in the following format:
# Q.1) Which do you prefer?  1) Option 1  2) Option 2
# Q.2) Which do you prefer?  1) Option 1  2) Option 3
# Q.3) Which do you prefer?  1) Option 2  2) Option 3

# Test data that shows all possible responses to the questions
ID,Q1,Q2,Q3
1,0,0,0
2,0,0,1
3,0,0,2
4,0,1,0
5,0,1,1
6,0,1,2
7,0,2,0
8,0,2,1
9,0,2,2
10,1,0,0
11,1,0,1
12,1,0,2
13,1,1,0
14,1,1,1
15,1,1,2
16,1,2,0
17,1,2,1
18,1,2,2
19,2,0,0
20,2,0,1
21,2,0,2
22,2,1,0
23,2,1,1
24,2,1,2
25,2,2,0
26,2,2,1
27,2,2,2

On 12/09/2013 10:21 AM, PIKAL Petr wrote:

Hi, you are still rather cryptic. If you had said you want to extract the prevalent choices in each row, it would have saved me (and you) a lot of time.
Import data and make the necessary changes:

survey.results <- read.csv("SamplePairedComparisonData.csv")
survey.results[survey.results[, 3] == 2, 3] <- 3  # Convert column 3 (Q2) to 3 if the existing value is 2
# The order of the following two commands is important, don't change it
survey.results[survey.results[, 4] == 2, 4] <- 3  # Convert column 4 (Q3) to 3 if the existing value is 2
survey.results[survey.results[, 4] == 1, 4] <- 2  # Convert column 4 (Q3) to 2 if the existing value is 1
survey.results$Q1 <- factor(survey.results$Q1, labels = c("None", "Option 1", "Option 2"))
survey.results$Q2 <- factor(survey.results$Q2, labels = c("None", "Option 1", "Option 3"))
survey.results$Q3 <- factor(survey.results$Q3, labels = c("None", "Option 2", "Option 3"))

Then make the object a matrix without the first column:

sr <- survey.results[, -1]
sr <- as.matrix(sr)

An elaborate evaluation function:

fff <- function(x) names(which.max(table(x)))

Apply the function to the matrix:

result <- apply(sr, 1, fff)

Bind the result with the original data:

survey.results <- cbind(survey.results, result)

And voila, you shall have the required values.

Regards
Petr

From: Walter Anderson [mailto:wandrso...@gmail.com]
Sent: Monday, December 09, 2013 4:37 PM
To: PIKAL Petr
Subject: Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

Petr,

First, let me thank you for your assistance on this. Your ideas below for eliminating the three functions I had that 'adjusted' the data were received and appreciated. I included that approach in my revised script. I am only left with a for loop and my final preference-determining code. I am including the current script and test data (all 27 possible choices included). From my review of all of the responses, I don't see an alternative to the if-then block that tests the state of the three questions, or the for loop that executes it. I made some attempts to use the ifelse statement mentioned in the responses, but couldn't get anything to work.
Again, thanks for your assistance and time on this!

Walter
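Petr's tie-handling function can be checked against the two problem rows; the matrix below uses made-up rows standing in for the poster's survey file:

```r
# Most frequent choice per row; an exact tie across all choices means "None"
fff <- function(x) {
  tb <- table(x)
  if (max(tb) == min(tb)) "None" else names(which.max(tb))
}

sr <- rbind(c("Option 1", "Option 1", "Option 2"),  # clear winner
            c("Option 1", "Option 3", "Option 2"))  # three-way tie -> "None"
apply(sr, 1, fff)
# "Option 1" "None"
```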
Re: [R] lattice: superposed boxplots with same colors for rectangles and umbrellas and filled boxes
Thank you very much, Rich, for the fast and very helpful reply! It helped me to reach my goal.

Regards -- Gerrit

PS: I extended your solution to a version that allows slightly finer control over the components of the boxplots, in particular if one wants to combine it with, e.g., panel.stripplot() or panel.average(). It was also possible to simplify my call of bwplot() a bit. The code is certainly not perfect and fully tested, but does work (for me ;-)), and follows as a little 'thank you':

panel.bwplot.constantColor <- function(..., col, fill, cex, pch, dot.pch,
                                       umbrella.lty, box.alpha) {
  ## Date: Mon, 9 Dec 2013 17:52:38 -0500
  ## From: Richard M. Heiberger r...@temple.edu
  ## Subject: Re: [R] lattice: superposed boxplots with same colors
  ##          for rectangles and umbrellas and filled boxes
  ## to be included in next version of the HH package
  box.save <- list(box.dot       = trellis.par.get("box.dot"),
                   box.rectangle = trellis.par.get("box.rectangle"),
                   box.umbrella  = trellis.par.get("box.umbrella"),
                   plot.symbol   = trellis.par.get("plot.symbol"))
  trellis.par.set(box.dot       = list(col = col),
                  box.rectangle = list(col = col, alpha = box.alpha),
                  box.umbrella  = list(col = col, lty = umbrella.lty,
                                       alpha = box.alpha),
                  plot.symbol   = list(col = col))
  panel.bwplot(..., fill = col, cex = cex, pch = dot.pch)
  trellis.par.set(box.save)
}

bwplot(Y ~ F1, groups = F2, data = Data, jitter.data = TRUE,
       col = c("red", "blue"), box.alpha = 1/4, dot.pch = 17,
       umbrella.lty = 1, do.out = FALSE,
       panel = panel.superpose,
       panel.groups = panel.bwplot.constantColor)

Thank you for the opportunity to illustrate this. I solved this problem last week and it will be included in the next version of the HH package (about a month away).
panel.bwplot.constantColor <- function(..., col, fill, cex, pch) {
  ## to be included in next version of the HH package
  box.save <- list(box.dot       = trellis.par.get("box.dot"),
                   box.rectangle = trellis.par.get("box.rectangle"),
                   box.umbrella  = trellis.par.get("box.umbrella"),
                   plot.symbol   = trellis.par.get("plot.symbol"))
  trellis.par.set(box.dot       = list(col = col),
                  box.rectangle = list(col = col),
                  box.umbrella  = list(col = col),
                  plot.symbol   = list(col = col))
  panel.bwplot(..., fill = col, cex = cex, pch = pch)
  trellis.par.set(box.save)
}

bwplot(Y ~ F1, groups = F2, data = Data,
       col = mycolors, fill = mycolors, jitter.data = TRUE, pch = 19,
       panel = panel.superpose,
       panel.groups = function(..., pch, col, alpha) {
         panel.stripplot(...)
         panel.bwplot.constantColor(..., pch = pch, col = col, alpha = alpha,
                                    do.out = FALSE)
       },
       par.settings = list(box.dot       = list(alpha = myalpha),
                           box.rectangle = list(col = mycolors, alpha = myalpha),
                           box.umbrella  = list(col = mycolors, alpha = myalpha)))

Rich

On Mon, Dec 9, 2013 at 5:17 PM, Gerrit Eichner gerrit.eich...@math.uni-giessen.de wrote:

Dear R-list,

I've been trying to produce a sort of an interaction plot wherein colored stripplots and boxplots are superposed. In particular, I want the colors of the (transparently) filled boxes to be the same as the colors of the box borders (rectangle) and their whiskers (umbrella). Below I create an artificial data set and provide the code with which I've come up so far, a fusion and adaptation of a few pieces of code I've found in the help pages and the mail archives. It does almost what I want, but still colors the rectangles and the umbrellas in black (of course, because that is the setting in the example presented below). However, the latter is what I want to change.
x <- c(rep(1:4, each = 10), rep(-2, 40))
Data <- data.frame(F1 = gl(4, 10, length = 80, labels = LETTERS[1:4]),
                   F2 = gl(2, 40),
                   Y  = x + rnorm(length(x)))

mycolors <- c("red", "blue")
myalpha  <- 0.33

bwplot(Y ~ F1, groups = F2, data = Data,
       col = mycolors, fill = mycolors, jitter.data = TRUE,
       panel = panel.superpose,
       panel.groups = function(..., pch, col, alpha) {
         panel.stripplot(...)
         panel.bwplot(..., do.out = FALSE)
       },
       par.settings = list(box.dot       = list(alpha = myalpha),
                           box.rectangle = list(col = "black", alpha = myalpha),
                           box.umbrella  = list(col = "black", alpha = myalpha)))

If I'd provide mycolors instead of "black" to the col-component of the (sub-)lists given to
[R] Problem to predict new data with mclust
Hi,

I am trying to use mclust to cluster some data (train_pca_10). I get the clusters, but when I try to use the model to predict new data (test1) I get this error:

mClust2 <- Mclust(train_pca_10, G = 2)
pred <- predict.Mclust(mClust2, test1)
Error in if (warn) warning(WARNING) : argument is of length zero

Can anyone see the problem here?

Thanks,
Pereira.
Re: [R] Multiple Lorenz curves in one diagram - populations with different n
> I want to plot on the same diagram curves from two different populations
> that have different n. How can I do this?

Use lines() as well as plot().

S
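A minimal sketch of the plot()/lines() overlay, with made-up cumulative-share data standing in for real Lorenz curves (a package such as ineq would normally compute them); since each curve carries its own x coordinates, the two populations' different n is no obstacle:

```r
# Made-up cumulative shares for two populations of different size
p1 <- seq(0, 1, length.out = 11)   # population 1: 10 groups
p2 <- seq(0, 1, length.out = 21)   # population 2: 20 groups
L1 <- p1^2                         # illustrative Lorenz-shaped curves
L2 <- p2^1.5

pdf(tempfile(fileext = ".pdf"))    # any open graphics device works
plot(p1, L1, type = "l", col = "red",
     xlab = "Cumulative population share",
     ylab = "Cumulative income share")
lines(p2, L2, col = "blue")        # second curve, different n, same axes
abline(0, 1, lty = 2)              # line of perfect equality
dev.off()
```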
[R] rbind/timestamp problem
Hello world,

I am stuck somehow... I am trying to add a timestamp to a sensitivity matrix:

# this code works:
sensitivity <- data.frame(mtype = 1, cdate = 2)
sensitivity <- rbind(sensitivity, c(mtype = 2, cdate = 2))

# this code does not work - no idea why
sensitivity <- data.frame(mtype = 1,
                          cdate = as.POSIXct(Sys.time(), origin = "1970-01-01"))
sensitivity <- rbind(sensitivity,
                     c(mtype = 2, cdate = as.POSIXct(Sys.time(), origin = "1970-01-01")))

Error message:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied

Any hints why the second part does not work?

Greetings,
Georg

--
Georg Hoermann, Dep. of Hydrology, Ecology, Kiel University, Germany
+49/431/2190916, mo: +49/176/64335754, icq: 348340729, skype: ghoermann
Re: [R] rbind/timestamp problem
Try this:

sensitivity <- rbind(sensitivity,
                     data.frame(mtype = 2,
                                cdate = as.POSIXct(Sys.time(), origin = "1970-01-01")))

> sensitivity
  mtype               cdate
1     1 2013-12-10 09:35:53
2     2 2013-12-10 09:37:02

The 'c' function is coercing to numeric. If you want to 'rbind' rows to a dataframe, then make sure you use a dataframe.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Tue, Dec 10, 2013 at 8:55 AM, Georg Hörmann ghoerm...@hydrology.uni-kiel.de wrote:

> I am stuck somehow... I am trying to add a timestamp to a sensitivity matrix.
> Any hints why the second part does not work?
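Jim's diagnosis can be made visible directly; the timestamps below are fixed stand-ins for Sys.time():

```r
df <- data.frame(mtype = 1,
                 cdate = as.POSIXct("2013-12-10 09:35:53", tz = "UTC"))

# c() collapses everything to a bare named numeric vector, dropping the
# POSIXct class, so rbind() later has to rebuild a date from a plain
# number -- which is where the 'origin' must be supplied error comes from
newrow <- c(mtype = 2, cdate = as.POSIXct("2013-12-10 09:37:02", tz = "UTC"))
class(newrow)  # "numeric" -- the POSIXct class attribute is gone

# Binding a one-row data.frame instead keeps the column classes intact
df2 <- rbind(df, data.frame(mtype = 2,
                            cdate = as.POSIXct("2013-12-10 09:37:02", tz = "UTC")))
class(df2$cdate)  # "POSIXct" "POSIXt"
```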
Re: [R] Add column to DF based on 2 columns in another DF
Hi,

Try:

library(reshape2)
m1 <- melt(LookupTable, id.vars = "Str")
m2 <- m1
res <- merge(MainDataFrame, m1,
             by.x = c("Str", "index"), by.y = c("Str", "variable"))
res[order(res$Str), c(3:6, 1:2, 7)]

# or
library(plyr)
colnames(m2)[-1] <- c("index", "index_num")
m2$index <- as.character(m2$index)
join(MainDataFrame, m2, by = c("Str", "index"))

A.K.

LookupTable <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Str IND13 IND12 IND11 IND07 IND06
1 517 529 562 562 567
2 517 529 562 562 568
3 517 529 562 562 567
4 517 529 562 562 569
5 517 529 562 562 567
6 517 529 562 562 567
7 517 529 562 562 560
8 517 529 562 562 567
9 517 529 562 562 567
10 517 529 562 562 567")

MainDataFrame <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Pid YEAR MONTH Fips Str index
600250 2006 7 6037 1 IND06
600250 2006 7 6037 2 IND06
600250 2006 7 6037 3 IND06
600250 2006 7 6037 4 IND06
600250 2006 7 6037 5 IND06
600353 2007 9 48097 6 IND07
600772 2006 2 6039 7 IND06
600947 2007 1 13207 7 IND07
601055 2007 9 13315 8 IND07
601103 2006 5 21093 10 IND06")

MainDataFrame_New <- ??
# What is the best way to add a new column index_num to MainDataFrame that is
# populated with the number corresponding to 'Str' and 'index' in LookupTable?
# Pid    YEAR MONTH Fips  Str index index_num
# 600250 2006 7     6037  1   IND06 567
# 600250 2006 7     6037  2   IND06 568
# 600250 2006 7     6037  3   IND06 567
# 600250 2006 7     6037  4   IND06 569
# 600250 2006 7     6037  5   IND06 567
# 600353 2007 9     48097 6   IND07 562
# 600772 2006 2     6039  7   IND06 560
# 600947 2007 1     13207 7   IND07 562
# 601055 2007 9     13315 8   IND07 562
# 601103 2006 5     21093 10  IND06 567
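For the archives, a base-R alternative needing no extra packages: match() the rows and index the lookup matrix with (row, column) pairs. The tables below are a small subset of the question's data:

```r
LookupTable <- data.frame(Str = c(1, 2, 6, 7),
                          IND07 = c(562, 562, 562, 562),
                          IND06 = c(567, 568, 567, 560))
MainDataFrame <- data.frame(Str   = c(1, 2, 6, 7),
                            index = c("IND06", "IND06", "IND07", "IND06"),
                            stringsAsFactors = FALSE)

# For each row, pick the lookup row matching Str and the column matching
# index, in one vectorized matrix-indexing step
lk <- as.matrix(LookupTable[, -1])
ij <- cbind(match(MainDataFrame$Str, LookupTable$Str),
            match(MainDataFrame$index, colnames(lk)))
MainDataFrame$index_num <- lk[ij]
MainDataFrame$index_num  # 567 568 562 560
```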
Re: [R] How can I find nonstandard or control characters in a large file?
andrewH wrote:

> However, my suspicion is that there are some funky characters, either
> control characters or characters with some non-standard encoding, somewhere
> in this 14 gig file. Moreover, I am concerned that these characters may
> cause me trouble down the road even if I use a different approach to
> getting columns out of the file.

This is not an R solution, but here's a Windows utility I wrote to produce a table of frequency counts for all hex characters x00 to xFF in a file:

http://www.efg2.com/Lab/OtherProjects/CharCount.ZIP

Normally, you'll want to scrutinize anything below x20 or above x7F, since ASCII printable characters are in the range x20 to x7E. You can see how many tab (x09) characters are in the file, and whether the line endings are from Linux (x0A) or Windows (paired x0D and x0A).

The ZIP includes Delphi source code, but provides a Windows executable. I made a change several months ago to allow drag-and-drop, so you can just drop the file on the application to have the characters counted. Just run the EXE after unzipping. No installation is needed.

Once you find problem characters in the file, you can read the file as character data and use sub/gsub or other tools to remove or alter them.

efg

Earl F Glynn
UMKC School of Medicine
Center for Health Insights
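For the archives, the same byte-frequency count can be done in R itself (a sketch; the sample file is created inline for illustration, and a real 14 GB file would be read in chunks rather than all at once):

```r
# Create a tiny sample file containing a tab and Windows line endings
tmp <- tempfile()
writeBin(charToRaw("abc\tabc\r\n"), tmp)

# Read the raw bytes and tabulate their frequencies; names of the table
# are the byte values 0-255. Scrutinize anything below 32 or above 126.
bytes  <- readBin(tmp, what = "raw", n = file.info(tmp)$size)
counts <- table(as.integer(bytes))
counts  # e.g. one tab (9), one CR (13), one LF (10), two each of a/b/c
```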
Re: [R] Problem to predict new data with mclust
Hi,

It's unlikely that anyone will be able to see your problem without a reproducible example:

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Are you using the mclust package? That would be the first thing we need to know. Then, have you worked through the examples in ?predict.Mclust, and do you understand them? Then, what do your data look like? str() and such are useful, but the best option is to use dput() to create a reproducible subset, or to reproduce your problem using one of the built-in datasets in R, as is done in the above documentation examples.

Sarah

On Tue, Dec 10, 2013 at 6:41 AM, Welma Pereira welma.pere...@gmail.com wrote:

> I am trying to use mclust to cluster some data (train_pca_10). I get the
> clusters, but when I try to use the model to predict new data (test1) I
> get this error:
> Error in if (warn) warning(WARNING) : argument is of length zero

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] Using assign with mapply
David Winsemius dwinsemius at comcast.net writes:

> So what happens if you try this:
> mapply(assign, kkk$vars, kkk$vals, MoreArgs = list(envir = .GlobalEnv))

Yes, it works in certain situations, as does the equivalent code:

kkk <- data.frame(vars = c("var1", "var2", "var3"), vals = c(10, 20, 30),
                  stringsAsFactors = FALSE)
mapply(assign, kkk$vars, kkk$vals, MoreArgs = list(pos = 1))
> var1
[1] 10

See, however, the following example:

example <- function() {
  var1 <- 250
  kkk <- data.frame(vars = c("var1", "var2", "var3"), vals = c(10, 20, 30),
                    stringsAsFactors = FALSE)
  mapply(assign, kkk$vars, kkk$vals, MoreArgs = list(pos = 1))
  print(var2)
  print(var1)
}
example()
[1] 20
[1] 250

My question is: how do I get the combination of mapply and assign to affect the variables defined in the block where the construction is executed? Maybe there is still something I don't understand.

Thanks,
-Sergio.
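For the archives, one answer (a sketch; example2 is a made-up name): pos = 1 always targets the global search path, so pass the function's own evaluation frame as envir instead:

```r
example2 <- function() {
  var1 <- 250
  kkk <- data.frame(vars = c("var1", "var2", "var3"), vals = c(10, 20, 30),
                    stringsAsFactors = FALSE)
  env <- environment()  # this function's own frame, not the global env
  mapply(assign, kkk$vars, kkk$vals, MoreArgs = list(envir = env))
  c(var1 = var1, var2 = var2)
}
example2()  # var1 is now 10 (the local 250 was overwritten), var2 is 20
```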
Re: [R] 3-D interpretation
Hi Shane,

On Tue, Dec 10, 2013 at 10:35 AM, Shane Carey careys...@gmail.com wrote:

> Hi, I'm trying to create a 3-D interpretation of a geological fault using
> the akima package. But my code fails, highlighted in red below. Does
> anyone have any ideas why this is?
>
> This is my error:
> Error in interp.old(x, y, z, xo = xo, yo = yo, ncp = 0, extrap = extrap, :
>   duplicate data points: need to set 'duplicate = ..'

This is a plain-text email list, so no highlighting in red is available. More importantly, you should take a look at:

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

and provide a simple example with data that demonstrates your problem.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] 3-D interpretation
Hi,

I have since solved that problem, but now R keeps crashing. I have over 20,000 points. Can R handle that amount?

Cheers

On Tue, Dec 10, 2013 at 3:35 PM, Shane Carey careys...@gmail.com wrote:

> Hi, I'm trying to create a 3-D interpretation of a geological fault using
> the akima package. But my code fails, highlighted in red below.

--
Shane
[R] 3-D interpretation
Hi, I'm trying to create a 3-D interpretation of a geological fault using the akima package, but my code fails below, highlighted in red. Does anyone have any idea why this is? Thanks

dat <- read.delim("D:\\fault.txt", header = TRUE, sep = ",")
library(akima)
library(rgl)
# data
rgl.spheres(dat$X, dat$Z, dat$Y, 1, color = "red")
rgl.bbox()
# bivariate linear interpolation
# interp:
akima.li <- interp(dat$X, dat$Y, dat$Z,
                   xo = seq(min(dat$X), max(dat$X), length = 100),
                   yo = seq(min(dat$Y), max(dat$Y), length = 100))

This is my error:
Error in interp.old(x, y, z, xo = xo, yo = yo, ncp = 0, extrap = extrap, : duplicate data points: need to set 'duplicate = ..'

# interp surface:
rgl.surface(akima.li$X, akima.li$Y, akima.li$Z, color = "green", alpha = c(0.5))
# interpp:
akima.p <- interpp(dat$X, dat$Y, dat$Z,
                   runif(200, min(dat$X), max(dat$X)),
                   runif(200, min(dat$Y), max(dat$Y)))
# interpp points:
rgl.points(akima.p$X, akima.p$Z, akima.p$Y, size = 4, color = "yellow")
# bivariate spline interpolation
# data
rgl.spheres(dat$X, dat$Z, dat$Y, 0.5, color = "red")
rgl.bbox()
# bivariate cubic spline interpolation
# interp:
akima.si <- interp(dat$X, dat$Y, dat$Z,
                   xo = seq(min(dat$X), max(dat$X), length = 100),
                   yo = seq(min(dat$Y), max(dat$Y), length = 100),
                   linear = FALSE, extrap = TRUE)

--
Shane
Re: [R] 3-D interpretation
Hi,

On Tue, Dec 10, 2013 at 10:42 AM, Shane Carey careys...@gmail.com wrote:
> Hi, I have since solved that problem,

It would be useful to post the solution, for the archives, so that others with the same problem can benefit from your insight.

> but now R keeps crashing. I have over 20,000 points. Can R handle that amount?

R can. Your computer may not be able to. But again, we don't have enough information to answer you, since that depends on OS, available RAM, etc.

Sarah

> [rest of quoted message and code snipped]

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] 3-D interpretation
Hi Sarah, here is some of the data: structure(list(X = c(9816966.951, 9816963.08, 9816947.516, 9816939.51, 9816924.005, 9816916.096, 9816901.984, 9816896.967, 9816892.928, 9816890.743, 9816886.427, 9816884.006, 9816879.185, 9816876.468, 9816871.027, 9816868.276, 9816863.404, 9816860.712, 9816855.409, 9816852.487, 9816846.77, 9816843.635, 9816837.53, 9816834.196, 9816827.733, 9816824.214, 9816817.419, 9816811.926, 9816797.531, 9816789.588, 9816774.134, 9816765.716, 9816749.492, 9816740.717, 9816723.946, 9816714.931, 9816697.836, 9816675.781, 9816606.59, 9816578.983, 9816544.487, 9816531.179, 9816514.376, 9816505.891, 9816489.52, 9816481.288, 9816465.377, 9816457.374, 9816441.836, 9816434.026, 9816418.807, 9816411.161, 9816396.205, 9816385.052, 9816355.719, 9816341.037, 9816312.075, 9816304.796, 9816304.784, 9816916.962, 9816917.365, 9816918.982, 9816919.414, 9816919.519, 9816919.545, 9816599.694, 9816615.198, 9816677.27, 9816697.31, 9816714.671, 9816723.431, 9816740.464, 9816748.995, 9816765.474, 9816773.661, 9816789.36, 9816797.089, 9816811.764, 9816817.212, 9816824.113, 9816827.536, 9816834.1, 9816837.345, 9816843.545, 9816846.596, 9816852.404, 9816855.248, 9816860.635, 9816863.257, 9816868.198, 9816870.861, 9816876.39, 9816879.039, 9816883.937, 9816886.296, 9816890.68, 9816892.809, 9816896.829, 9816901.555, 9816915.878, 9816921.977, 9816931.382, 9816933.721, 9816619.581, 9816631.77, 9816680.572, 9816697.262, 9816714.648, 9816723.385, 9816740.441, 9816748.95, 9816765.452, 9816773.618, 9816789.339, 9816797.049, 9816811.749, 9816817.193, 9816824.103, 9816827.518, 9816834.092, 9816837.328, 9816843.537, 9816846.58, 9816852.396, 9816855.233, 9816860.628, 9816863.244, 9816868.191, 9816870.846, 9816876.383, 9816879.025, 9816883.93, 9816886.284, 9816890.674, 9816892.798, 9816896.817, 9816901.516, 9816915.859, 9816921.464, 9816929.009, 9816930.886, 9816639.468, 9816648.343, 9816683.875, 9816697.215, 9816714.624, 9816723.338, 9816740.418, 9816748.905, 9816765.43, 
9816773.575, 9816789.318, 9816797.009, 9816811.734, 9816817.175, 9816824.094, 9816827.5, 9816834.083, 9816837.311, 9816843.528, 9816846.565, 9816852.389, 9816855.218, 9816860.621, 9816863.23, 9816868.184, 9816870.831, 9816876.376, 9816879.012, 9816883.924, 9816886.272, 9816890.669, 9816892.787, 9816896.805, 9816901.477, 9816915.839, 9816920.952, 9816926.637, 9816928.05, 9816659.354, 9816664.915, 9816687.177, 9816697.167, 9816714.6, 9816723.291, 9816740.395, 9816748.859, 9816765.407, 9816773.532, 9816789.298, 9816796.969, 9816811.719, 9816817.156, 9816824.085, 9816827.482, 9816834.074, 9816837.294, 9816843.52, 9816846.549, 9816852.381, 9816855.204, 9816860.614, 9816863.217, 9816868.176, 9816870.816, 9816876.369, 9816878.999, 9816883.918, 9816886.26, 9816890.663, 9816892.777, 9816896.792, 9816901.438, 9816915.819, 9816920.439, 9816924.264, 9816925.215, 9816679.241, 9816681.487, 9816690.48, 9816697.119, 9816714.577, 9816723.244, 9816740.372, 9816748.814, 9816765.385, 9816773.489, 9816789.277, 9816796.929, 9816811.705, 9816817.137, 9816824.076, 9816827.464, 9816834.065, 9816837.277, 9816843.512, 9816846.533, 9816852.373, 9816855.189, 9816860.607, 9816863.203, 9816868.169, 9816870.801, 9816876.362, 9816878.985, 9816883.911, 9816886.248, 9816890.657, 9816892.766, 9816896.78, 9816901.399, 9816915.799, 9816919.926, 9816921.892, 9816922.38, 9816541.016, 9816548.667, 9816579.391, 9816605.021, 9816675.293, 9816697.454, 9816714.742, 9816723.572, 9816740.533, 9816749.131, 9816765.54, 9816773.79, 9816789.422, 9816797.21, 9816811.808, 9816817.269, 9816824.14, 9816827.589, 9816834.126, 9816837.395, 9816843.569, 9816846.643, 9816852.427, 9816855.292, 9816860.656, 9816863.297, 9816868.219, 9816870.906, 9816876.411, 9816879.078, 9816883.956, 9816886.331, 9816890.697, 9816892.842, 9816896.867, 9816901.672, 9816915.938, 9816923.514, 9816938.5, 9816942.226, 9816560.486, 9816564.893, 9816582.587, 9816604.825, 9816675.232, 9816697.406, 9816714.719, 9816723.525, 9816740.51, 9816749.085, 
9816765.518, 9816773.747, 9816789.401, 9816797.17, 9816811.793, 9816817.25, 9816824.131, 9816827.572, 9816834.118, 9816837.378, 9816843.561, 9816846.628, 9816852.419, 9816855.277, 9816860.649, 9816863.284, 9816868.212, 9816870.891, 9816876.404, 9816879.065, 9816883.949, 9816886.32, 9816890.692, 9816892.831, 9816896.854, 9816901.633, 9816915.918, 9816923.002, 9816936.127, 9816939.391, 9816579.957, 9816581.118, 9816585.783, 9816604.629, 9816675.171, 9816697.358, 9816714.695, 9816723.478, 9816740.487, 9816749.04, 9816765.496, 9816773.704, 9816789.381, 9816797.13, 9816811.778, 9816817.231, 9816824.122, 9816827.554, 9816834.109, 9816837.361, 9816843.553, 9816846.612, 9816852.411, 9816855.262, 9816860.642, 9816863.27, 9816868.205, 9816870.876, 9816876.397, 9816879.052, 9816883.943, 9816886.308, 9816890.686, 9816892.82, 9816896.842, 9816901.594, 9816915.898, 9816922.489, 9816933.755, 9816936.556, 9816521.879, 9816524.089, 9816532.945, 9816543.835, 9816578.476, 9816605.217, 9816675.354, 9816697.501,
Re: [R] 3-D interpretation
The error says you have duplicate points in dat, so you need to set duplicate= to tell interp() how to handle them. See ?interp.

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Shane Carey
Sent: Tuesday, December 10, 2013 9:36 AM
To: r-help@r-project.org
Subject: [R] 3-D interpretation

> [quoted message and code snipped]
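For the archives, the fix the error message points at looks like this (a sketch; dat is the poster's data, and the akima package is assumed to be installed):

```r
library(akima)

# duplicate = "mean" averages z over duplicated (x, y) locations;
# "strip" and "error" are among the other documented choices (see ?interp).
akima.li <- interp(dat$X, dat$Y, dat$Z,
                   xo = seq(min(dat$X), max(dat$X), length = 100),
                   yo = seq(min(dat$Y), max(dat$Y), length = 100),
                   duplicate = "mean")
```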
Re: [R] 3-D interpretation
Hey David, I set it equal to the mean, duplicate = "mean", but still no joy. Thanks

On Tue, Dec 10, 2013 at 4:28 PM, David Carlson dcarl...@tamu.edu wrote:
> The error says you have duplicate points in dat, so you need to set duplicate= to tell interp() how to handle them. See ?interp.
> [rest of quoted message and code snipped]

--
Shane
Re: [R] 3-D interpretation
It just keeps crashing on me. It seems to crash on this line:

rgl.surface(akima.li$x, akima.li$y, akima.li$z, color = "green", alpha = c(0.5))

On Tue, Dec 10, 2013 at 4:32 PM, David Carlson dcarl...@tamu.edu wrote:
> I guess you will have to give us a reproducible example. Are you still getting the same error message?
> David
> [rest of quoted thread snipped]

--
Shane
Re: [R] 3-D interpretation
I guess you will have to give us a reproducible example. Are you still getting the same error message?

David

From: Shane Carey [mailto:careys...@gmail.com]
Sent: Tuesday, December 10, 2013 10:30 AM
To: dcarl...@tamu.edu
Cc: r-help@r-project.org
Subject: Re: [R] 3-D interpretation

> Hey David, I set it equal to the mean, duplicate = "mean", but still no joy. Thanks
> [rest of quoted thread snipped]
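One thing worth ruling out before blaming data volume: interp() returns its grid in components named x, y and z (lowercase, with z a matrix), so akima.li$X in the earlier surface call would be NULL. A hedged sketch of the surface call under that assumption (rgl and a successful interp() result assumed):

```r
library(rgl)

# akima::interp() output: akima.li$x and akima.li$y are the grid vectors,
# akima.li$z is the matrix of interpolated heights; rgl.surface() expects
# exactly that shape.
rgl.surface(akima.li$x, akima.li$y, akima.li$z,
            color = "green", alpha = 0.5)
```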
Re: [R] rbind/timestamp problem
Thank you, now it works. I would never have found the source of the problem. I am trying to build a data frame with results from a sensitivity analysis for a model. I use rbind because the number of rows is unknown, and I use the timestamp as a hint for the students that they are not analysing data from last week :-). Anyway, thank you very much, it helped a lot. Greetings, Georg

On 10.12.2013 15:40, jim holtman wrote:
> try this:
> sensitivity <- rbind(sensitivity,
>                      data.frame(mtype = 2,
>                                 cdate = as.POSIXct(Sys.time(), origin = "1970-01-01")))
> sensitivity
>   mtype               cdate
> 1     1 2013-12-10 09:35:53
> 2     2 2013-12-10 09:37:02
> The 'c' function is coercing to numeric. If you want to 'rbind' rows to a dataframe, then make sure you use a dataframe.
> Jim Holtman, Data Munger Guru

--
Georg Hoermann, Department of Hydrology and Water Resources Management, Kiel University, Germany
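Jim's point about c() coercing can be seen in a couple of lines (a minimal sketch):

```r
now <- Sys.time()

# c() with a numeric first element coerces the POSIXct to plain numeric
# (seconds since the epoch), which is where the garbled timestamps came from:
v <- c(mtype = 2, cdate = now)

# rbind() on data frames keeps the column class intact:
sens <- rbind(data.frame(mtype = 1, cdate = now),
              data.frame(mtype = 2, cdate = now))
class(sens$cdate)   # stays POSIXct
```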
Re: [R] growth curve estimation
Dear Vito, Robert and David, thank you for your replies. Although I made some steps forward on my own, your answers helped a lot and gave me more insight. The only question I have left is why this code gives me an error (and that is what David asked for):

m1 <- lms(BMI, age, data = adatok_fiu, cent = c(3,10,25,50,75,90,97), families = "BCCG")
centiles.pred(m1, xname = "age", xvalues = seq(10, 20, .5), cent = c(3,10,25,50,75,90,97))
Error in X[onlydata, , drop = FALSE] : (subscript) logical subscript too long

but

m3 <- gamlss(BMI ~ pb(age), sigma.formula = ~pb(age), nu.formula = ~pb(age),
             tau.formula = ~pb(age), family = BCT, data = adatok_fiu)
centiles.pred(m3, xname = "age", xvalues = seq(10, 20, .5), cent = c(3,10,25,50,75,90,97))

works as expected, although I get a warning:

Warning message: In predict.gamlss(obj, what = "tau", newdata = newx, type = "response", : There is a discrepancy between the original and the re-fit used to achieve 'safe' predictions

I had the feeling centiles.pred() should work for the result of an lms() call just as it does for gamlss(). Thank you all again! daniel

From: David Winsemius [dwinsem...@comcast.net]
Sent: 9 December 2013 21:40
To: Dániel Kehl
Cc: r-help@r-project.org
Subject: Re: [R] growth curve estimation

On Dec 8, 2013, at 7:45 AM, Dániel Kehl wrote:
> Dear Community, I am struggling with a growth curve estimation problem. It is a classic BMI change with age calculation. I am not an expert in this field but have some statistical experience in other fields. Of course I started reading classical papers related to the topic and understood the concept of the LMS method. After that I thought it would be a piece of cake: R must have a package related to the topic, so with the data I would be just a few lines of code away from the results.

You might want to look at a more recent discussion: http://www.who.int/entity/childgrowth/standards/velocity/tr3chap_2.pdf (The WHO centre has published their R code.)

> I encountered some problems:
> - I found at least three packages related to LMS: gamlss, VGAM and an old one, lmsqreg (I did not try this one because it does not support my R version, 3.0.1)
> - it was relatively easy to get plots of percentiles in both packages, although they were far from the same (I also tried another program, LMSchartmaker; it gave different results from the previous ones)
> - I tried to get tables of predicted values (with the predict function in VGAM and with centiles.pred in gamlss) but without any success.

Don't see any code or error messages.

> - I tried to use the function gamlss() instead of lms() in gamlss but I could not force them to give the same (or very similar) results, but the centiles.pred() function did work as expected for the model resulting from gamlss()
> - lms gives really different results if k is specified in different ways; which is best?

Won't that depend on the amount and distribution of the data?

> Also I have a general question: some publications state they estimated the centiles so that around 18 years of age the curves pass through certain points. How is that possible?
> Thank you for any suggestions or insights about the methods or preferred package!

Here is my code (without data):

#gamlss
library(gamlss)
library(VGAM)
library(foreign)
adatok <- read.spss("MDSZ adatok.sav", to.data.frame = TRUE)
adatok_fiu <- subset(adatok, adatok$gender == "Fiúk")[, 2:3]
row.names(adatok_fiu) <- NULL
adatok_lany <- subset(adatok, adatok$gender == "L÷nyok")[, 2:3]
row.names(adatok_lany) <- NULL
m1 <- lms(BMI, age, data = adatok_fiu, cent = c(3,10,25,50,75,90,97), families = "BCCG")
fittedPlot(m1, x = adatok_fiu$age)
m1 <- lms(BMI, age, data = adatok_fiu, cent = c(3,10,25,50,75,90,97), families = "BCCG", method.pb = "GAIC", k = log(1455))
fittedPlot(m1, x = adatok_fiu$age)
m1 <- lms(BMI, age, data = adatok_fiu, cent = c(3,10,25,50,75,90,97), families = "BCCG", method.pb = "GAIC")
fittedPlot(m1, x = adatok_fiu$age)
m2 <- lms(BMI, age, data = adatok_lany, cent = c(3,10,25,50,75,90,97), families = "BCCG")
m2 <- lms(BMI, age, data = adatok_lany, cent = c(3,10,25,50,75,90,97), families = "BCCG", method.pb = "GAIC", k = log(1144))
m2 <- lms(BMI, age, data = adatok_lany, cent = c(3,10,25,50,75,90,97), families = "BCCG", method.pb = "GAIC")
m3 <- gamlss(BMI ~ age, family = BCT, data = adatok_fiu)
centiles(m3, xvar = adatok_fiu$age, cent = c(3,10,25,50,75,90,97))
newx <- seq(12, 20, .5)
centiles.pred(m1, xname = "age", xvalues = newx)
centiles.pred(m3, xname = "age", xvalues = newx)
centiles(m1, adatok_fiu$age)

#VGAM
library(foreign)
library(VGAM)
library(VGAMdata)
adatok <- read.spss("MDSZ adatok.sav", to.data.frame = TRUE)
adatok_fiu <- subset(adatok, adatok$gender == "Fiúk")
adatok_lany <- subset(adatok, adatok$gender == "L÷nyok")
fit1 <- vgam(BMI ~ s(age), lms.bcn(percentiles = c(3, 10, 25, 50, 75, 90, 97)), adatok_fiu)
fit2 <- vgam(BMI ~ s(age), lms.bcn(percentiles = c(3, 10, 25, 50, 75, 90, 97)), adatok_lany)
qtplot(fit1, percentiles = c(3, 10, 25,
[R] If-statement in for-loop
Hi everyone, you might find my question elementary, but I am a beginner and unfortunately I can't fix the problem. I simulate the following algorithm, and some values of c are NA. Therefore, I should add the following two if-statements, but I don't know how to do it in a for-loop. Thanks in advance if someone helps me!

The conditions:
If there is no input greater than or equal to d, then ALG = b*T
If max(s + b*(0:T)) < b*T, then OPT = b*T

The algorithm:

a <- 0.0008
b <- 0.0001
T <- 100
t <- 0:T
alpha <- 1
d <- sqrt(a * b) * T - b * t
N <- 100
c <- rep(0, N)
for (j in 1:N) {
  e <- rnorm(T, mean = 0, sd = 0.001)
  s <- c(runif(1, 0, a*T), rep(0, T-1))
  minx <- 0
  for (i in 2:T) {
    x <- alpha * s[i-1] + e[i]
    maxx <- a*(T-i)
    if (x < minx) {
      s[i] <- minx
    } else {
      if (x > maxx) {
        s[i] <- maxx
      } else s[i] <- x
    }
  }
  p <- which(s >= d)[1]
  ALG <- s[p] + b*(p-1)
  OPT <- max(s + b*(0:T))
  c[j] <- OPT / ALG
}
head(c)
mean(c)
plot(c)

--
[[alternative HTML version deleted]]
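Putting the two stated conditions into the loop body might look like this (a sketch with minimal stand-in data; d is truncated to length(s) to avoid recycling, and the OPT index uses seq_along(s) so it matches the ALG indexing):

```r
set.seed(1)
a <- 0.0008; b <- 0.0001; T <- 100
t <- 0:T
d <- sqrt(a * b) * T - b * t
s <- c(runif(1, 0, a * T), rep(0, T - 1))   # stand-in for one simulated path

p <- which(s >= d[seq_along(s)])[1]
if (is.na(p)) {                   # condition 1: no entry of s reaches d
  ALG <- b * T
} else {
  ALG <- s[p] + b * (p - 1)
}

OPT <- max(s + b * (seq_along(s) - 1))
if (OPT < b * T) OPT <- b * T     # condition 2: floor OPT at b*T

ratio <- OPT / ALG                # this is c[j]; never NA now
```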
Re: [R] adding tables
How about this:

t2 <- table(c(10, 11, 12, 13))
t1 <- table(c(1, 1, 2, 4, 5))
t12 <- rbind(as.data.frame(t1), as.data.frame(t2))
xtabs(Freq ~ ., t12)

It works for overlapping tables:

t2 <- table(c(10, 11, 12, 13, 1, 2))
t1 <- table(c(1, 1, 2, 4, 5, 11, 12))
t12 <- rbind(as.data.frame(t1), as.data.frame(t2))
xtabs(Freq ~ ., t12)

and it will work for two-dimensional tables as well:

t2 <- table(c(10, 11, 12, 13), letters[rep(1:2, each = 2)])
t1 <- table(c(1, 1, 2, 4, 5), letters[rep(c(1, 3), length.out = 5)])
t12a <- rbind(as.data.frame(t1), as.data.frame(t2))
xtabs(Freq ~ ., t12a)

cheers.

On 10.12.2013 00:59, Ross Boylan wrote:
> Can anyone recommend a good way to add tables? Ideally I would like
> t1 <- table(x1)
> t2 <- table(x2)
> t1 + t2
> If t1 and t2 have the same levels this works fine, but I need something that will work even if they differ, e.g.,
> t1
> 1 2 4 5
> 2 1 1 1
> t2 <- table(c(10, 11, 12, 13))
> t1 + t2  # apparently does simple vector addition
> 1 2 4 5
> 3 2 2 2
> whereas I want
>  1  2  4  5 10 11 12 13
>  2  1  1  1  1  1  1  1

--
Eik Vettorazzi
Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf
Martinistr. 52, 20246 Hamburg
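Another option, without going through data frames: sum the counts over the union of the names (a base-R sketch):

```r
t1 <- table(c(1, 1, 2, 4, 5))
t2 <- table(c(10, 11, 12, 13, 1, 2))

# build a zero-filled vector over all names seen in either table,
# then add each table's counts into the matching positions
lev  <- sort(unique(c(names(t1), names(t2))))
tsum <- setNames(numeric(length(lev)), lev)
tsum[names(t1)] <- tsum[names(t1)] + t1
tsum[names(t2)] <- tsum[names(t2)] + t2
tsum   # counts added where names overlap, kept where they don't
```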
Re: [R] Problem to predict new data with mclust
Btw, I am using the mclust package, and basically wanted to reproduce the mstep and estep of the Mclust algorithm for new data. Any help is appreciated!

On 10 December 2013 21:23, Welma Pereira welma.pere...@gmail.com wrote:
> Hi, thanks for the hint Sarah. So here is my problem again (the training data is in train_pca and the test data in test_pca, attached):
>
> mclust2_pca <- Mclust(train_pca, G = 2, modelNames = c("EII", "VII", "EEI", "EVI", "VEI", "VVI"))
> pred <- predict.Mclust(mclust2_pca, test)
> Warning message:
> In cdensVVI(data = c(2.80217508052409, 2.75071740560707, 2.6175058179515, : cannot compute E-step
>
> Thanks!

> [rest of quoted thread snipped]
Re: [R] Problem to predict new data with mclust
Many people do not want to download and open random attachments. That's why I suggested dput() with part of your data. dput(head(yourdata, 20)) is enough for many problems, though it's good to try it out. Or use one of the datasets built into R.

Sarah

On Tue, Dec 10, 2013 at 3:23 PM, Welma Pereira welma.pere...@gmail.com wrote:
> [quoted message and code snipped]
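For the archives, a minimal sketch of the fit-then-predict workflow on a built-in dataset (assuming the mclust package; `iris` stands in here for the poster's PCA scores):

```r
library(mclust)

# Fit a two-component mixture to the numeric columns of iris,
# then predict cluster membership for "new" rows.
mod  <- Mclust(iris[, 1:4], G = 2)
pred <- predict(mod, newdata = iris[1:10, 1:4])
str(pred)  # a list with $classification and $z (membership probabilities)
```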
[R] Get average model after dredge function ran in a loop
Dear all I'm a beginner in R and I'm trying to get the final model after I run the dredge function for 10 times (with a loop): ###Building the Model Coy.glm0-glm(pa ~ shrub + snowdep + tree + bio5 + bio6 + bio12 + log(human+1), data=Coy.pa, family=binomial) summary(Coy.glm0) install.packages('MuMIn') library(MuMIn) Coy.dredge-dredge(Coy.glm0) head(Coy.dredge) ##Look in which colum is AIC ###Building a simulation Coy.models-Coy.dredge[1,c(1:13)] Coy.models ###Turn a loop who will create 10 models run=1 while(run11) #11 means 10 models. { Coy.abs$Long-runif(300,498,2579440) Coy.abs$Lat-runif(300,-51483,1377669) Coy.pa-rbind(Coy.train, Coy.abs) train ou prSS Coy.ppp-ppp(Coy.pa$Long,Coy.pa$Lat, window=win, unitname=meters) Coy.pa$snowdep-snowdepz.im[Coy.ppp, drop=F] Coy.pa$tree-treez.im[Coy.ppp, drop=F] Coy.pa$bio5-bio5z.im[Coy.ppp, drop=F] Coy.pa$bio6-bio6z.im[Coy.ppp, drop=F] Coy.pa$bio12-bio12z.im[Coy.ppp, drop=F] Coy.pa$human-humanz.im[Coy.ppp, drop=F] Coy.pa$shrub-shrub.im[Coy.ppp, drop=F] Coy.glm0-glm(pa ~ shrub + snowdep + tree + bio5 + bio6 + bio12+ log(human+1), data=Coy.pa, family=binomial) Coy.dredge-dredge(Coy.glm0) Coy.models-rbind(Coy.models, Coy.dredge[1,c(1:13)]) run=run+1 } I do get a best model for each run which I then hand pick and add to a table. The problem is that I have 11 models now in this table and I want their average to get the final model. I don't know how to do it from the table (as the model.avg() will tell me I only have one model in the table, because it's not recognizing the different rows as different models), but on the other hand there must be a way to do it directly in the loop, only I'm not sure at what point of the script I should be asking for it and how the code should be written. I would very much appreciate any help you can give me. Thank you. 
Cat __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
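No answer to this question appears in the digest, so here is a sketch of the usual MuMIn idiom, on invented toy data (the variable names below are made up and stand in for the Coy.pa columns). The key point is that model.avg() wants fitted model objects, which get.models() reconstructs from rows of the dredge table — not the table rows themselves:

```r
library(MuMIn)

# Toy stand-in for the Coy.pa data (names invented for illustration)
set.seed(42)
d <- data.frame(pa = rbinom(120, 1, 0.5),
                x1 = rnorm(120), x2 = rnorm(120), x3 = rnorm(120))

# dredge() requires na.action = na.fail on the global model
global <- glm(pa ~ x1 + x2 + x3, data = d, family = binomial,
              na.action = na.fail)
dd <- dredge(global)

# get.models() turns rows of the dredge table back into fitted model
# objects; model.avg() averages those objects.
avg <- model.avg(get.models(dd, subset = delta < 4))
summary(avg)
```

For the loop version, the same idea should apply: instead of rbind()ing `Coy.dredge[1, ]` rows, keep the fitted objects across runs, e.g. `best[[run]] <- get.models(Coy.dredge, subset = 1)[[1]]`, and pass that list to model.avg() after the loop (model.avg() may complain if the collected models are not distinct).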
Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function
Petr, Thank you for your assistance, particularly this last bit. It really helped me understand exactly how your solution worked and why I was confused by sapply processing rows. It was apply that could process rows (and/or columns) of a matrix. Anyway, for anyone who wants the version of my script that incorporates all of the suggested changes, here it is.

survey.results <- read.csv("SamplePairedComparisonData.csv")

# The following function evaluates which, if any, preference the respondent has shown
get.pref <- function(x)
{
  tb <- table(x)
  if ((length(tb) == 3) && (max(tb) == min(tb)))
    retVal <- "None"
    # This option checks for a couple of special cases where the simple which.max
    # function fails to detect that each of the three items has a single response.
  else
    # For all other conditions the following detects which response received the most
    # responses (2+) and records a "None" in any case that doesn't achieve that level
    # of response.
    retVal <- names(which.max(table(x)))
  return(retVal)
}

# The data is expected to be organized as so:
# ID, Q1, Q2, Q3
# and the questions are in the following format
# Q.1) Which do you prefer?
#   1) Option 1
#   2) Option 2
# Q.2) Which do you prefer?
#   1) Option 1
#   2) Option 3
# Q.3) Which do you prefer?
#   1) Option 2
#   2) Option 3
# The next three lines reprocess the data table to organize the questions so that the
# values for Q.2) are 1 or 3 and the values for Q.3) are 2 or 3
survey.results[survey.results[,3]==2,3] <- 3 # Convert column 3 (Q2) to a value of 3 if the existing value is 2
# The order of the following two commands is important; don't change it
survey.results[survey.results[,4]==2,4] <- 3 # Convert column 4 (Q3) to a value of 3 if the existing value is 2
survey.results[survey.results[,4]==1,4] <- 2 # Convert column 4 (Q3) to a value of 2 if the existing value is 1

# The following lines convert the numerical values in the fields to text (factors).
# If the questions don't have empty (zero) values then the "None" should be removed
# from those questions where zero isn't possible.
# The labels 'Option 1' and 'Option 2' can be replaced with more appropriate text as needed.
survey.results$Q1 <- factor(survey.results$Q1, labels=c("None","Option 1","Option 2"))
survey.results$Q2 <- factor(survey.results$Q2, labels=c("None","Option 1","Option 3"))
survey.results$Q3 <- factor(survey.results$Q3, labels=c("None","Option 2","Option 3"))

# Take the survey data table, drop the first column (ID), and convert the remaining
# three columns to a matrix. Then apply the get.pref function to each row of the matrix.
survey.results$Preference <- apply(as.matrix(survey.results[,-1]), 1, get.pref)

On 12/10/2013 05:02 AM, PIKAL Petr wrote: In that case you need to correct those cases inside the function. Everything else stays the same. Something like

fff <- function(x)
{
  tb <- table(x)
  if (max(tb) == min(tb)) res <- "None"
  else res <- names(which.max(table(x)))
}

Regards Petr __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
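As a self-contained illustration of the pattern discussed above (apply a tie-aware preference function over the rows of a matrix), here is a toy version with made-up responses; the response matrix is hypothetical, not from the original survey file:

```r
get.pref <- function(x) {
  tb <- table(x)
  # All three options tied (each chosen once): no clear preference
  if (length(tb) == 3 && max(tb) == min(tb)) return("None")
  names(which.max(tb))
}

# Two hypothetical respondents, three paired-comparison answers each
resp <- matrix(c("A", "A", "B",    # row 1: A chosen twice -> prefers A
                 "A", "B", "C"),   # row 2: three-way tie   -> no preference
               nrow = 2, byrow = TRUE)
apply(resp, 1, get.pref)
# row 1 gives "A", row 2 gives "None"
```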
[R] Add column to DF based on 2 columns in another DF
LookupTable <- read.table(header = TRUE, stringsAsFactors = FALSE, text="
Str IND13 IND12 IND11 IND07 IND06
1 517 529 562 562 567
2 517 529 562 562 568
3 517 529 562 562 567
4 517 529 562 562 569
5 517 529 562 562 567
6 517 529 562 562 567
7 517 529 562 562 560
8 517 529 562 562 567
9 517 529 562 562 567
10 517 529 562 562 567")

MainDataFrame <- read.table(header = TRUE, stringsAsFactors = FALSE, text="
Pid YEAR MONTH Fips Str index
600250 2006 7 6037 1 IND06
600250 2006 7 6037 2 IND06
600250 2006 7 6037 3 IND06
600250 2006 7 6037 4 IND06
600250 2006 7 6037 5 IND06
600353 2007 9 48097 6 IND07
600772 2006 2 6039 7 IND06
600947 2007 1 13207 7 IND07
601055 2007 9 13315 8 IND07
601103 2006 5 21093 10 IND06")

MainDataFrame_New <- ??
# What is the best way to add a new column index_num to MainDataFrame that is populated
# with the number corresponding to 'Str' and 'index' in LookupTable:
# Pid YEAR MONTH Fips Str index index_num
# 600250 2006 7 6037 1 IND06 567
# 600250 2006 7 6037 2 IND06 568
# 600250 2006 7 6037 3 IND06 567
# 600250 2006 7 6037 4 IND06 569
# 600250 2006 7 6037 5 IND06 567
# 600353 2007 9 48097 6 IND07 562
# 600772 2006 2 6039 7 IND06 560
# 600947 2007 1 13207 7 IND07 562
# 601055 2007 9 13315 8 IND07 562
# 601103 2006 5 21093 10 IND06 567
-- View this message in context: http://r.789695.n4.nabble.com/Add-column-to-DF-based-on-2-columns-in-another-DF-tp4681950.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
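The code the follow-up refers to is not included in this digest, so here is one common approach, sketched against the two data frames as defined in the question: match the Str values to rows of LookupTable and the index values to its column names, then pull the cells out in one step with a two-column matrix index:

```r
# (assumes LookupTable and MainDataFrame exactly as read in above)
row.idx <- match(MainDataFrame$Str,   LookupTable$Str)     # which lookup row
col.idx <- match(MainDataFrame$index, names(LookupTable))  # which lookup column

# Indexing a matrix with a two-column matrix of (row, col) pairs
# returns one cell per pair:
MainDataFrame$index_num <- as.matrix(LookupTable)[cbind(row.idx, col.idx)]
```

Checked by hand against the commented expected output in the question (e.g. Str 1 / IND06 gives 567, Str 2 / IND06 gives 568, Str 6 / IND07 gives 562), this reproduces the index_num column shown there.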
Re: [R] Fwd: Datatable manipulation
Hi, I have a doubt after a long time :) . I have a function which has sqldf statements and it works fine when I call it from the console, but when I call that function within another function, it gives me the error that *Error in sqliteExecStatement(con, statement, bind.data) :* *RS-DBI driver: (error in statement: no such table: table_name)* What is wrong here? On Fri, Nov 22, 2013 at 7:27 PM, arun smartpink...@yahoo.com wrote: Hi, Assuming that this is the case:

dat1 <- read.table(text="a b c d e
1 2 3 4 5
10 9 8 7 6", sep="", header=TRUE)
Names1 <- read.table(text="Original New
e ee
g gg
a aa
c cc
f ff", sep="", header=TRUE, stringsAsFactors=FALSE)
indx <- match(names(dat1), Names1[,1])
names(dat1)[names(dat1) %in% Names1[,1]] <- Names1[,2][indx[!is.na(indx)]]
dat1
#   aa b cc d ee
# 1  1 2  3 4  5
# 2 10 9  8 7  6

A.K. On Friday, November 22, 2013 4:46 AM, Nitisha jha nitisha...@gmail.com wrote: Hey! I got this one. :) For the match function, actually I just want the ones that are matching to be replaced. The rest should stay the same. How do I do that? When I tried your command, if there is no match, it writes var2 or something. On Fri, Nov 22, 2013 at 12:38 AM, arun smartpink...@yahoo.com wrote: Hi, Try:

dat1 <- read.table(text="a b c d e
1 2 3 4 5
10 9 8 7 6", sep="", header=TRUE)
Names1 <- read.table(text="Original New
e ee
b bb
a aa
c cc
d dd", sep="", header=TRUE, stringsAsFactors=FALSE)

It is better to dput() your dataset. For example:

dput(Names1)
structure(list(Original = c("e", "b", "a", "c", "d"), New = c("ee",
"bb", "aa", "cc", "dd")), .Names = c("Original", "New"),
class = "data.frame", row.names = c(NA, -5L))

names(dat1) <- Names1[,2][match(names(dat1), Names1[,1])]
dat1
#   aa bb cc dd ee
# 1  1  2  3  4  5
# 2 10  9  8  7  6

A.K. On Thursday, November 21, 2013 1:45 PM, Nitisha jha nitisha...@gmail.com wrote: Hi, Thanks. I used as.character() and got the right strings. Btw, I have lots of handicaps regarding R. I have to rename the columns (I have 22 columns here). I have the new names along with the original names in another dataset. Right now, I am hardcoding all the 19 name changes (tedious and not optimal). The first 3 names remain the same. I will give you a sample dataset. Let me know if there is any easy way of doing this. Pardon the displaced column labels.

Original dataset:
a b c d e
1 2 3 4 5
10 9 8 7 6

Dataset for name change:
Original New
e ee
b bb
a aa
c cc
d dd

I want my final dataset to be like this:
aa bb cc dd ee
1 2 3 4 5
10 9 8 7 6

Could you tell me an optimal way to do it? My method is tedious and not good. Also, is there a way to import .xls without Perl (on Windows)? Thanks for being patient. :) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] data distribution for lme
Hi folks, I am using the lme package of R, and am wondering if it is assumed that the dependent factor (what we fit for; y in many relevant texts) has to have a normal Gaussian distribution? Are there any margins within which some skewness in the data is accepted, and how, within R itself, could one check the distribution of the data? Thanks, Peyman __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fwd: Datatable manipulation
Hi, Check these links: http://stackoverflow.com/questions/19626534/r-issue-error-in-sqliteexecstatementcon-statement-bind-data-no-such-tabl http://r.789695.n4.nabble.com/Problem-with-SQLDF-Error-in-sqliteExecStatement-con-statement-bind-data-RS-DBI-driver-error-in-state-td4621931.html A.K. On Tuesday, December 10, 2013 6:54 AM, Nitisha jha nitisha...@gmail.com wrote: Hi, I have a doubt after a long time :) . I have a function which has sqldf statements and it works fine when I call it from console but when I am calling that function within another function, it gives me the error that Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: table_name) What is wrong here? [...] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Add column to DF based on 2 columns in another DF
Thanks, AK. That gets the job done and I learned 2 ways to do it. Much appreciated as always. -- View this message in context: http://r.789695.n4.nabble.com/Add-column-to-DF-based-on-2-columns-in-another-DF-tp4681950p4681969.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data distribution for lme
This is not really an R question -- it is statistics. In any case, you should do better posting this on the R-Sig-Mixed-Models list, which concerns itself with matters like this. However, I'll hazard a guess at an answer: maybe. (Vague questions elicit vague answers). Cheers, Bert On Tue, Dec 10, 2013 at 6:55 AM, peyman zira...@gmail.com wrote: Hi folks, I am using the lme package of R, and am wondering if it is assumed that the dependent factor (what we fit for; y in many relevant texts) has to have a normal Gaussian distribution? Is there any margins where some skewness in the data is accepted and how within R itself one could check distribution of the data? Thanks, Peyman __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] radial.plot shaded region
I'm working with radial.plot in the plotrix package. Is there a way to show a shaded region for a specific range of values in radial.lim? For example:

# test data from radial.plot{plotrix}
testlen <- c(sin(seq(0, 1.98*pi, length=100)) + 2 + rnorm(100)/10)
testpos <- seq(0, 1.98*pi, length=100)
# the original plot
radial.plot(testlen, testpos, rp.type="p", main="Test Polygon", line.col="blue")
# my attempt to shade the region between 3 and 3.5:
radial.plot(
  matrix(c(3.5, 3), byrow = TRUE),
  matrix(c(testpos, testpos), byrow = TRUE, nrow = 2),
  rp.type="p", main="Test Polygon",
  poly.col = c('light blue', 'white'),
  radial.lim = c(0, 3.5)
)
# In effect, draw a polygon at 3.5 filled with light blue, then another at 3 filled with white.
# Now overplot the values of interest:
radial.plot(testlen, testpos, rp.type="p", main="Test Polygon", line.col="blue",
            radial.lim = c(0, 3.5), add = TRUE)

Is there an easier way? How can I re-draw the grid lines that are in the original plot? Thanks, Steve . . . . . . . . . . . . . . . Steven M. Hodge Child and Adolescent Neurodevelopment Initiative S3-301 University of Massachusetts Medical School 55 Lake Avenue North Worcester, MA 01655 steven.ho...@umassmed.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] radial.plot shaded region
On 12/11/2013 10:33 AM, Hodge, Steven wrote: I'm working with radial.plot in the plotrix package. Is there a way to show a shaded region for a specific range of values in radial.lim? [...] Is there an easier way? How can I re-draw the grid lines that are in the original plot? Hi Steve, That is a tricky one. The best I can do at the moment is to suggest the following be added:

radial.grid(radial.lim=c(0,3.5), grid.pos=seq(0,3.5,length.out=8))
radial.plot(testlen, testpos, rp.type="p", main="", line.col="blue", add=TRUE)

There may be a solution using the radial.pie function, and if I find it, I'll post it. Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data distribution for lme
No - it is assumed to be conditionally normal, that is, normal conditional on the model. So you should be looking at the distributions of the residuals rather than of the response variable, as an indicator for whether or not the model assumptions are satisfied. Skewness in the residuals may or may not affect the outcome; it depends on what the purpose of the model is. Skewness in residuals may arise from a number of different problems with the model. I hope that this helps, Andrew On Wed, Dec 11, 2013 at 1:55 AM, peyman zira...@gmail.com wrote: Hi folks, I am using the lme package of R, and am wondering if it is assumed that the dependent factor (what we fit for; y in many relevant texts) has to have a normal Gaussian distribution? Is there any margins where some skewness in the data is accepted and how within R itself one could check distribution of the data? Thanks, Peyman __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Deputy Director, CEBRA Senior Lecturer in Applied Statistics Tel: +61-3-8344-6410 Department of Mathematics and Statistics Fax: +61-3-8344-4599 University of Melbourne, VIC 3010 Australia Email: a.robin...@ms.unimelb.edu.au Website: http://www.ms.unimelb.edu.au FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/ SPuR: http://www.ms.unimelb.edu.au/spuRs/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
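The residual check described in this reply can be done directly on a fitted lme object; here is a minimal sketch using the Orthodont data shipped with nlme (model chosen only for illustration, not from the original poster's problem):

```r
library(nlme)

# A simple random-intercept model on the built-in Orthodont data
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Inspect the *residuals*, not the raw response:
r <- resid(fit, type = "normalized")
qqnorm(r); qqline(r)   # points near the line => roughly normal residuals
plot(fit)              # standardized residuals vs fitted values
hist(r)                # quick look at skewness in the residuals
```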
Re: [R] fisher.test - can I use non-integer expected values?
On Dec 10, 2013, at 2:04 PM, bakerwl wrote: I seem to be able to use expected values that are decimal (e.g., 1.33) when using chisq.test but not when using fisher.test. There are no expected values in the input to fisher.test. This happens when using an array/matrix as input. Fisher.test returns: Error in sprintf(gettext(fmt, domain = domain), ...) : invalid format '%d'; use format %s for character objects. Thus, it appears fisher.test is looking for integers only. That would seem to be a very reasonable assumption. I tried putting the data in x and y factor objects, but that does not work either. Is there another way to use non-integer expected values with fisher.test or is that a limitation of fisher.test? If I must use integer expected values, I suppose one option would be round the expected value down or up to an integer. But, which? I tried that, but they produce different p values. Well, of course. First, you tell us why you need `fisher.test` at all. It says very clearly it is for count data and you clearly want to do something with input that is not counts. `prop.test` will test a distribution of counts against expected proportions and `binom.test` will do an exact test of a Bernoulli experiment against (one) proportion. Thanks for any help! View this message in context: http://r.789695.n4.nabble.com/fisher-test-can-I-use-non-integer-expected-values-tp4681976.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data distribution for lme
See inline below. On 12/11/13 11:28, Bert Gunter wrote: This is not really an R question -- it is statistics. In any case, you should do better posting this on the R-Sig-Mixed-Models list, which concerns itself with matters like this. However, I'll hazard a guess at an answer: maybe. (Vague questions elicit vague answers). No! Nay! Never! Well, hardly ever. The ***y*** values will rarely be Gaussian. (Think about a simple one-way anova, with 3 levels, and N(0,sigma^2) errors. The y values will have a distribution which is a mixture of 3 independent Gaussian distributions.) You *may* wish to worry about whether the ***errors*** have a Gaussian distribution. Some inferential results depend on this, but in many cases these results are quite robust to non-Gaussianity. There. I have exhausted my knowledge of the subject. cheers, Rolf Cheers, Bert On Tue, Dec 10, 2013 at 6:55 AM, peyman zira...@gmail.com wrote: Hi folks, I am using the lme package of R, and am wondering if it is assumed that the dependent factor (what we fit for; y in many relevant texts) has to have a normal Gaussian distribution? Is there any margins where some skewness in the data is accepted and how within R itself one could check distribution of the data? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] radial.plot shaded region
On Dec 10, 2013, at 4:24 PM, Jim Lemon wrote: On 12/11/2013 10:33 AM, Hodge, Steven wrote: I'm working with radial.plot in the plotrix package. Is there a way to show a shaded region for a specific range of values in radial.lim? [...] Hi Steve, That is a tricky one. The best I can do at the moment is to suggest the following be added: radial.grid(radial.lim=c(0,3.5), grid.pos=seq(0,3.5,length.out=8)) radial.plot(testlen, testpos, rp.type="p", main="", line.col="blue", add=TRUE) There may be a solution using the radial.pie function, and if I find it, I'll post it. Looking at the code I would have thought it would not be too difficult to de-couple the plot setup and the grid plotting in this section:

if (!add) {
    par(mar = mar, pty = "s")
    plot(c(-maxlength, maxlength), c(-maxlength, maxlength),
         type = "n", axes = FALSE, main = main, xlab = xlab, ylab = ylab)
    if (show.grid) {
        for (i in seq(length(grid.pos), 1, by = -1)) {
            xpos <- cos(angles) * (grid.pos[i] - radial.lim[1])
            ypos <- sin(angles) * (grid.pos[i] - radial.lim[1])
            polygon(xpos, ypos, border = grid.col, col = grid.bg)
        }
    }

But you are the Plotmeister and I am a humble Student.
-- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fisher.test - can I use non-integer expected values?
David, Thanks for your reply--I appreciate your thoughts. I will look at prop.test. The reason I chose fisher.test over chisq.test is that fisher.test is more appropriate when observed counts are not numerous--empty cells and cells with counts < 5 are less of a problem. Expected values are needed to test a null hypothesis against observed counts, but if total observed counts are 20 for 3 categories, then a null hypothesis of a random effect would use expected values = 6.67 in each of the 3 categories (20/3). Yes, fisher.test is for count data and so is chisq.test, but chisq.test allows 6.67 to be input as expected values in each of 3 categories, while fisher.test does not seem to allow this? I don't think it is inherent in Fisher's exact test itself that expected values must be integers, but I am not sure. -- View this message in context: http://r.789695.n4.nabble.com/fisher-test-can-I-use-non-integer-expected-values-tp4681976p4681989.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] radial.plot shaded region
On 12/11/2013 11:24 AM, Jim Lemon wrote: ... There may be a solution using the radial.pie function, and if I find it, I'll post it. Hi Steve, Here it is. Just call radial.pie twice to get the annulus, then call radial.grid with the appropriate arguments, and then you can add whatever radial.plot you want, using the add argument.

radial.pie(3.5, sector.colors="lightblue")
radial.pie(3, sector.colors="white", add=TRUE)
radial.grid(radial.lim=c(0,3.5), grid.pos=seq(0,3.5,length.out=8))

Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data distribution for lme
Thanks Rolf and Andrew. I was entirely too careless and should take a trip to the woodshed (google David Stockman woodshed for the reference). The correct answer therefore is: maybe for the residuals, for the right model, of course. But I still think the crowd on r-sig-mixed-models is the right place to hash it out, if anything meaningful can indeed be made of it. Cheers, Bert On Tue, Dec 10, 2013 at 5:33 PM, Rolf Turner r.tur...@auckland.ac.nz wrote: See inline below. On 12/11/13 11:28, Bert Gunter wrote: This is not really an R question -- it is statistics. In any case, you should do better posting this on the R-Sig-Mixed-Models list, which concerns itself with matters like this. However, I'll hazard a guess at an answer: maybe. (Vague questions elicit vague answers). No! Nay! Never! Well, hardly ever. The ***y*** values will rarely be Gaussian. (Think about a simple one-way anova, with 3 levels, and N(0,sigma^2) errors. The y values will have a distribution which is a mixture of 3 independent Gaussian distributions.) You *may* wish to worry about whether the ***errors*** have a Gaussian distribution. Some inferential results depend on this, but in many cases these results are quite robust to non-Gaussianity. There. I have exhausted my knowledge of the subject. cheers, Rolf Cheers, Bert On Tue, Dec 10, 2013 at 6:55 AM, peyman zira...@gmail.com wrote: Hi folks, I am using the lme package of R, and am wondering if it is assumed that the dependent factor (what we fit for; y in many relevant texts) has to have a normal Gaussian distribution? Is there any margins where some skewness in the data is accepted and how within R itself one could check distribution of the data? -- Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fisher.test - can I use non-integer expected values?
On Dec 10, 2013, at 6:55 PM, bakerwl wrote: David, Thanks for your reply--I appreciate your thoughts. I will look at prop.test. The reason I chose fisher.test over chisq.test is that fisher.test is more appropriate when observed counts are not numerous--empty cells and cells with counts 5 are less a problem. Expected values are needed to test a null hypothesis against observed counts, but if total observed counts are 20 for 3 categories, then a null hypothesis of a random effect would use expected values = 6.67 in each of the 3 categories (20/3). Yes, fisher.test is for count data and so is chisq.test, but chisq.test allows 6.67 to be input as expected values in each of 3 categories, while fisher.test does not seem to allow this? I don't think it is inherent in Fisher's exact test itself that expected values must be integers, but not sure. I see it differently, although I could be further educated on the subject and I've been wrong on Rhelp before. I think it _is_ inherent in Fisher's Exact Test. FET is essentially a permutation test built on the hypergeometric distribution (a discrete distribution) and it is unclear what to do with 1.33 of an entity under conditions of permutation. The chi-square test (one of many so-called chi-square tests) is a pretty good approximation to the discrete counterparts despite the fact that the chi-square distribution takes continuous arguments and generally holds well down to expected counts of 5. The link between the chi-square and binomial distributions is through there variances: npq vs sum(o-e)^2/n. You can develop arguments in the limit that converge fairly quickly. -- View this message in context: http://r.789695.n4.nabble.com/fisher-test-can-I-use-non-integer-expected-values-tp4681976p4681989.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fisher.test - can I use non-integer expected values?
On Dec 10, 2013, at 8:21 PM, David Winsemius wrote: On Dec 10, 2013, at 6:55 PM, bakerwl wrote: David, Thanks for your reply--I appreciate your thoughts. I will look at prop.test. The reason I chose fisher.test over chisq.test is that fisher.test is more appropriate when observed counts are not numerous--empty cells and cells with counts 5 are less a problem. Expected values are needed to test a null hypothesis against observed counts, but if total observed counts are 20 for 3 categories, then a null hypothesis of a random effect would use expected values = 6.67 in each of the 3 categories (20/3). Yes, fisher.test is for count data and so is chisq.test, but chisq.test allows 6.67 to be input as expected values in each of 3 categories, while fisher.test does not seem to allow this? I don't think it is inherent in Fisher's exact test itself that expected values must be integers, but not sure. I see it differently, although I could be further educated on the subject and I've been wrong on Rhelp before. I think it _is_ inherent in Fisher's Exact Test. FET is essentially a permutation test built on the hypergeometric distribution (a discrete distribution) and it is unclear what to do with 1.33 of an entity under conditions of permutation. The chi-square test (one of many so-called chi-square tests) is a pretty good approximation to the discrete counterparts despite the fact that the chi-square distribution takes continuous arguments and generally holds well down to expected counts of 5. The link between the chi-square and binomial distributions is through there variances: npq vs sum(o-e)^2/n. You can develop arguments in the limit that converge fairly quickly. I was careless there, both in the spelling of 'their' and in the connection of chi-square distributions to binomial. You should consult more authoritative source for the mathematics of similarities in their large sample features. 
-- David

-- View this message in context: http://r.789695.n4.nabble.com/fisher-test-can-I-use-non-integer-expected-values-tp4681976p4681989.html
Sent from the R help mailing list archive at Nabble.com.

David Winsemius
Alameda, CA, USA
Re: [R] fisher.test - can I use non-integer expected values?
On Tue, Dec 10, 2013 at 6:55 PM, bakerwl bake...@uwyo.edu wrote:

> Expected values are needed to test a null hypothesis against observed
> counts, but if total observed counts are 20 for 3 categories, then a null
> hypothesis of a random effect would use expected values = 6.67 in each of
> the 3 categories (20/3). Yes, fisher.test is for count data and so is
> chisq.test, but chisq.test allows 6.67 to be input as expected values in
> each of 3 categories, while fisher.test does not seem to allow this?

To the best of my knowledge (which may be limited), you never put expected counts as input to the Fisher Exact Test; you need to put in actual observed counts. The Fisher test tests the independence of two different random variables, each of which has a set of categorical outcomes. From what you wrote it appears that you have only one random variable that can take 3 different values, and you want a statistical test for whether the frequencies are the same. You can use chisq.test for this by specifying the probabilities (argument p) and running it as a goodness-of-fit test. I am not aware of a goodness-of-fit way of using fisher.test.

If you actually have two different variables, one of which can take two values and the other of which can take three values, you need the actual observed counts for each of the 6 combinations of the two variables. You put these counts into a 2x3 table and supply that to fisher.test or chisq.test.

> I don't think it is inherent in Fisher's exact test itself that expected
> values must be integers, but not sure.

I think it is inherent in Fisher's Exact test. The test makes certain assumptions about the distribution of the numbers you put in. If you put in non-integers, you necessarily violate those assumptions and the test is then not applicable.

Peter
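The goodness-of-fit route via chisq.test and its p argument looks like this (a minimal sketch; the observed counts are hypothetical, totalling 20 over 3 categories as in the original question):

```r
obs <- c(9, 4, 7)                # hypothetical observed counts, n = 20
# Null hypothesis: the three categories are equally likely (p = 1/3 each).
gof <- chisq.test(obs, p = rep(1/3, 3))
gof$expected                     # 6.666667 in each cell -- the 20/3 value
gof$p.value                      # p-value for the equal-frequencies null
```

Note that the non-integer expected values enter only through the probabilities in `p`; the observed input is still a vector of integer counts.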
Re: [R] Stochastic Dominance Analysis
Does anyone know of R packages that run/test stochastic dominance with respect to a function (SDRF) and/or stochastic efficiency with respect to a function (SERF)?

Peter Maclean
Department of Economics
UDSM
[R] setting effect plot parameters
I have tried to set parameters of an effects plot with moderate success, but I'm finding that although the plot(effect()) method uses trellis graphics (via the lattice package), not all aspects of the plot can be controlled by standard trellis graphics. I am able to set any parameters that are given in trellis.par.get() easily enough, for example:

    axis.components <- trellis.par.get("axis.components")
    axis.components$right$tck <- 0
    axis.components$top$tck <- 0
    axis.components$left$pad1 <- 2
    axis.components$left$pad2 <- 2
    trellis.par.set("axis.components", axis.components)

but I have been unable to change other axis parameters, such as setting the number of ticks on the bottom and left axes. I can do this in a conventional lattice plot such as xyplot:

    xyplot(lar ~ mdep, data = m1,
           scales = list(
             x = list(tck = c(1, 0), at = seq(100, max(m1$mdep), 100)),
             y = list(tck = c(1, 0), at = seq(0, max(m1$lar) + 0.1, 0.05))))

but this does not work for effects plots. Any help would be much appreciated.
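One possible workaround (a sketch, untested against the effects package itself): a lattice object can be modified after creation with update(), so if plot(effect(...)) returns a trellis object it may be possible to capture it and override the scales there; whether plot.eff honours this is an assumption. The demonstration below uses a plain xyplot object so that it is self-contained:

```r
library(lattice)

# Toy data standing in for the poster's m1 data frame.
set.seed(1)
d <- data.frame(x = 1:20, y = rnorm(20))
p <- xyplot(y ~ x, data = d)

# update() rebuilds the trellis object with new settings, here a custom
# set of bottom-axis tick positions and inward-only tick marks:
p2 <- update(p, scales = list(x = list(at = seq(0, 20, 5), tck = c(1, 0))))
class(p2)   # still a "trellis" object; print(p2) draws it with the new axis
```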
Re: [R] Splitting a vector
This does what I needed. However, as the output is a list object, is there any way to apply a function to such an object? For example, if I want to compute the mean of the 4th sub-vector, I can't simply use

    Y <- split(X, as.numeric(gl(length(X), 3, length(X))))
    mean(Y[4])

as the error message shows: argument is not numeric or logical

-- View this message in context: http://r.789695.n4.nabble.com/Splitting-a-vector-tp4681930p4681987.html
Sent from the R help mailing list archive at Nabble.com.
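For the record, the error comes from the indexing: `Y[4]` with single brackets returns a one-element list, while `Y[[4]]` returns the numeric sub-vector itself. A minimal sketch:

```r
X <- 1:12
Y <- split(X, as.numeric(gl(length(X), 3, length(X))))

mean(Y[[4]])     # [[ ]] extracts the vector; mean of 10,11,12 is 11
# mean(Y[4])     # [ ] returns a length-1 list -> "argument is not numeric"

sapply(Y, mean)  # apply the function to every sub-vector at once
```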
[R] SDM using BIOMOD2 error message
Hi, I am working on Species Distribution Modeling in R using BIOMOD2. When picking my algorithms to model I keep getting an error message about length(pred). This is the code that I'm using:

    library(knitr)
    library(markdown)
    library(rgdal)
    library(raster)
    library(shapefiles)
    library(biomod2)
    library(dismo)
    library(sp)
    library(rgeos)
    library(maptools)
    library(MigClim)

    setwd("V:/BIOCLIM")
    bio2 <- raster("bio_2.bil")
    bio5 <- raster("bio_5.bil")
    bio6 <- raster("bio_6.bil")
    bio15 <- raster("bio_15.bil")
    bio18 <- raster("bio_18.bil")
    bio19 <- raster("bio_19.bil")

    bio2 <- crop(bio2, extent(-112, -105, 34, 39))
    bio5 <- crop(bio5, extent(-112, -105, 34, 39))
    bio6 <- crop(bio6, extent(-112, -105, 34, 39))
    bio15 <- crop(bio15, extent(-112, -105, 34, 39))
    bio18 <- crop(bio18, extent(-112, -105, 34, 39))
    bio19 <- crop(bio19, extent(-112, -105, 34, 39))

    envStack <- stack(bio2, bio5, bio6, bio15, bio18, bio19)
    rm

    # bring in your presence or presence/absence data
    setwd("C:/Users/Lindsie/Documents/R")
    BH <- read.csv("occurrence_BH.csv")
    head(BH)

    myBiomodData <- BIOMOD_FormatingData(resp.var = as.matrix(BH[, 3]),
        expl.var = envStack, resp.xy = BH[, 1:2], resp.name = "Huntii",
        PA.nb.rep = 2, PA.nb.absences = 909, PA.strategy = "random",
        na.rm = TRUE)

    myBiomodOption <- BIOMOD_ModelingOptions()

    myBiomodModelOut <- BIOMOD_Modeling(myBiomodData,
        models = c("GLM", "GBM", "GAM", "CTA", "ANN", "SRE", "FDA", "MARS", "RF"),
        models.options = myBiomodOption, NbRunEval = 1, DataSplit = 70,
        Yweights = NULL, VarImport = 3,
        models.eval.meth = c("ROC", "TSS", "KAPPA"),
        SaveObj = TRUE, rescal.all.models = TRUE)

At this point I get an error message that says:

    Error in length(pred) : 'pred' is missing

Any suggestions? Thanks

-- View this message in context: http://r.789695.n4.nabble.com/SDM-using-BIOMOD2-error-message-tp4681993.html
Sent from the R help mailing list archive at Nabble.com.