[R] Multivariate discrete HMMs
Hi r-help, I have been using your RHmm package for some time and have recently had to apply it to a new dataset. I have a dataset with a number of discrete observation variables that change over time, and I would love to try modelling them with an HMM. I was wondering if RHmm can be used to model a multivariate discrete HMM, i.e., one where each observation is a vector of discrete measurements? From what I see in the documentation and from playing around with examples, it seems this may not be possible. My understanding of the mathematics behind multivariate HMMs is limited, so I would appreciate any advice you might be able to give. Thanks for any help anyone can give __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
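[Editorial example] As far as I can tell, RHmm's discrete emission distribution is univariate only. One package that does accept a vector of categorical observations (treated as conditionally independent given the hidden state) is depmixS4; the following is an untested sketch assuming its multinomial response interface:

```r
# Untested sketch (assumes the depmixS4 package is installed): a 2-state
# HMM with two categorical observation columns, modelled as conditionally
# independent given the hidden state.
library(depmixS4)
d <- data.frame(obs1 = factor(sample(c("a", "b", "c"), 200, replace = TRUE)),
                obs2 = factor(sample(c("x", "y"),      200, replace = TRUE)))
mod <- depmix(list(obs1 ~ 1, obs2 ~ 1), data = d, nstates = 2,
              family = list(multinomial(), multinomial()))
fitted <- fit(mod)   # EM estimation of transition and emission parameters
summary(fitted)
```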
Re: [R] Splitting a categorical variable into multiple variables
Thanks Bert. I guess I was just wondering if there was a way to create the new factors automatically, without me having to hard-code the level names in my R code. Rgds Claus On Fri, Aug 9, 2013 at 3:42 PM, Bert Gunter wrote: > ... or if you want to keep the unchanged levels the same: > > zz <- factor(ifelse( z %in% c("a", "b"),"d" ,levels(z)[z])) > > -- Bert > > On Fri, Aug 9, 2013 at 7:35 AM, Bert Gunter wrote: >> If I understand what you mean, just recode them. >> >> z <- factor(letters[1:3]) >> z >> zz <- factor(ifelse( z %in% c("a", "b"),"d" ,z)) >> zz >> >> Cheers, >> Bert >> >> On Fri, Aug 9, 2013 at 7:10 AM, Claus O'Rourke >> wrote: >>> Hello R-Help, >>> I have a variable with > 32 levels and I'd like to split this into two >>> variables such that both new variables have <= 32 levels. This is >>> to handle the limit of 32-level predictor variables in R's Random >>> Forest implementation. Might someone be able to suggest an elegant way >>> to do this? I've tried googling for this, but haven't hit the right >>> search terms. >>> >>> Regards
>> >> >> >> -- >> >> Bert Gunter >> Genentech Nonclinical Biostatistics >> >> Internal Contact Info: >> Phone: 467-7374 >> Website: >> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm > > > > -- > > Bert Gunter > Genentech Nonclinical Biostatistics > > Internal Contact Info: > Phone: 467-7374 > Website: > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Splitting a categorical variable into multiple variables
Hello R-Help, I have a variable with > 32 levels and I'd like to split this into two variables such that both new variables have <= 32 levels. This is to handle the limit of 32-level predictor variables in R's Random Forest implementation. Might someone be able to suggest an elegant way to do this? I've tried googling for this, but haven't hit the right search terms. Regards
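[Editorial example] One generic way to do the split, as a sketch rather than a canonical recipe: partition the level set in two, and in each new factor collapse the other half into a single "other" level, so neither factor exceeds the 32-level limit:

```r
# Sketch: split a many-level factor x into two factors, each keeping half
# of the original levels and collapsing the rest into "other".
x <- factor(sample(paste0("lvl", 1:40), 500, replace = TRUE))
half1 <- levels(x)[seq_len(ceiling(nlevels(x) / 2))]   # first 20 levels
x1 <- factor(ifelse(x %in% half1, as.character(x), "other"))
x2 <- factor(ifelse(x %in% half1, "other", as.character(x)))
nlevels(x1); nlevels(x2)   # each at most 21, under the 32-level cap
```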
Re: [R] Levels in new data fed to SVM
Thanks for clarifying! On Thu, Jan 10, 2013 at 12:47 PM, Uwe Ligges wrote: > > > On 08.01.2013 21:14, Claus O'Rourke wrote: >> >> Hi all, >> I've encountered an issue using svm (e1071) in the specific case of >> supplying new data which may not have the full range of levels that >> were present in the training data. >> >> I've constructed this really primitive example to illustrate the point: >> >>> library(e1071) >>> training.data <- data.frame(x = c("yellow","red","yellow","red"), a = >>> c("alpha","alpha","beta","beta"), b = c("a", "b", "a", "c")) >>> my.model <- svm(x ~ .,data=training.data) >>> test.data <- data.frame(x = c("yellow","red"), a = c("alpha","beta"), b = >>> c("a", "b")) >>> predict(my.model,test.data) >> >> Error in predict.svm(my.model, test.data) : >> test data does not match model ! >>> >>> >>> levels(test.data$b) <- levels(training.data$b) >>> predict(my.model,test.data) >> >> 1 2 >> yellow red >> Levels: red yellow >> >> In the first case test.data$b does not have the level "c", and this >> results in the input data being rejected. I've debugged this down to >> the point of model matrix creation in the SVM R code. Once I fill up >> the levels in the test data with the levels from the original data, >> there is no problem at all. >> >> Assuming my test data has to come from another source, where the number >> of category levels seen might not always be as large as those for the >> original training data, is there a better way I should be handling >> this? > > > > You have to tell the factor about the possible levels; the data does not > necessarily contain examples of all of them. > That means: > > levels(test.data$b) <- c("a", "b", "c") > predict(my.model,test.data) > > will help. > > Best, > Uwe Ligges > > > >> Thanks
[R] Levels in new data fed to SVM
Hi all, I've encountered an issue using svm (e1071) in the specific case of supplying new data which may not have the full range of levels that were present in the training data. I've constructed this really primitive example to illustrate the point: > library(e1071) > training.data <- data.frame(x = c("yellow","red","yellow","red"), a = > c("alpha","alpha","beta","beta"), b = c("a", "b", "a", "c")) > my.model <- svm(x ~ .,data=training.data) > test.data <- data.frame(x = c("yellow","red"), a = c("alpha","beta"), b = > c("a", "b")) > predict(my.model,test.data) Error in predict.svm(my.model, test.data) : test data does not match model ! > > levels(test.data$b) <- levels(training.data$b) > predict(my.model,test.data) 1 2 yellow red Levels: red yellow In the first case test.data$b does not have the level "c", and this results in the input data being rejected. I've debugged this down to the point of model matrix creation in the SVM R code. Once I fill up the levels in the test data with the levels from the original data, there is no problem at all. Assuming my test data has to come from another source, where the number of category levels seen might not always be as large as those for the original training data, is there a better way I should be handling this? Thanks
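[Editorial note] One caveat on the fix used in this thread: assigning with `levels(test.data$b) <- ...` renames the existing levels by position, which can silently relabel values if the orders differ. A safer idiom (a suggestion, not from the thread) is to rebuild the factor with the training-level set:

```r
# Standalone illustration: re-declare a test factor so it carries the
# full training-level set without renaming any of its existing values.
train_b <- factor(c("a", "b", "a", "c"))
test_b  <- factor(c("a", "b"))                       # level "c" missing
test_b  <- factor(test_b, levels = levels(train_b))  # values unchanged
levels(test_b)   # now "a" "b" "c"
```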
[R] Training parameters for a HMM
Hi, I'm a newbie to the world of HMMs and HMMs in R. I've had a look at the hmm package and the RHmm package, but I couldn't see anything straightforward on how a labelled sequential dataset, with observed values and underlying states, might be used to construct and train an HMM given no pre-computed values for the transition, emission or initial state distributions. Does anyone have any excerpts of code that could get me moving in the right direction? To put it another way, let's say that I have the simple HMM topology that is shown here: http://en.wikipedia.org/wiki/File:HiddenMarkovModel.png And I have somehow collected datasets with observations and labelled hidden states:

Sequence 1:
Obs Hid
y1  X1
y2  X2
y2  X2
y4  X1
... ...
y3  X3

Sequence N:
Obs Hid
y2  X1
y2  X2
y2  X1
y4  X1
... ...
y4  X1

I'm assuming categorical variables for y and X. I know I really am starting from scratch here, so I'd be very grateful if anyone could point out to me how I could go about automatically constructing and parameterizing an HMM for data like this. Thanks for your patience. Claus.
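[Editorial example] For what it's worth, when the hidden states are labelled in the training data (the supervised case), no Baum-Welch training is needed: the maximum-likelihood estimates of the HMM parameters are simply normalised count tables. A minimal base-R sketch:

```r
# One labelled sequence; with several sequences, pool the counts.
hid <- c("X1", "X2", "X2", "X1", "X3")
obs <- c("y1", "y2", "y2", "y4", "y3")
# Transition matrix: counts of state[t] -> state[t+1], row-normalised.
trans <- prop.table(table(from = head(hid, -1), to = tail(hid, -1)), margin = 1)
# Emission matrix: counts of (state, observation) pairs, row-normalised.
emis  <- prop.table(table(state = hid, obs = obs), margin = 1)
# Initial state distribution: first state of each sequence.
init  <- prop.table(table(hid[1]))
trans; emis; init
```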
Re: [R] Newbie-ish question on iteratively applying function to dataframe
Brilliant - that was really useful! On Tue, Mar 15, 2011 at 3:46 PM, Ista Zahn wrote: > Hi Claus, > > On Tue, Mar 15, 2011 at 9:33 AM, Claus O'Rourke > wrote: >> Hi, >> I am trying to iteratively apply a function to a selection of columns >> in a dataframe. I've had a look around and from what I have read, I >> should be using some version of the apply function, but I'm really >> having some headaches with it. > > I would just do it in a loop (see below) >> >> Let me be more specific with an example. >> >> Say I have a data frame similar to the following >> >> A x y z r1 r2 r3 r4 >> 0.1 0.2 0.1 ... >> 0.1 0.3 ... >> 0.2 ... >> >> i.e., a number of columns, each of the same length, and all containing >> real numbers. Of these columns, I want to model one variable, say A, >> as a function of other variables, say x, y, z, and any one of my r1, >> r2, r3, ... variables. >> >> i.e., I want to model >> A ~ x + y + z + r1 >> A ~ x + y + z + r2 >> >> A ~ x + y + z + rn >> >> But where the number of 'r' variables I will have will be large, and I >> don't know the specific number of these variables in advance. >> >> My question first is, how can I select all the columns in a dataframe >> that have a heading that matches a string pattern? > > ?grep > >> >> And then related to this, what would be the best way of repeatedly >> applying my modelling function to the result? > > Well, I don't know about the "best" way. But why not just > > set.seed(21) > dat <- as.data.frame(matrix(rnorm(1e5), ncol = 100, dimnames = list > (1:1000, c("A", "x", "y", "z", paste("r", 1:96, sep = ""))))) > > mods <- list() > for(i in grep("r", names(dat), value = TRUE)) { > mods[[i]] <- lm(as.formula(paste("A ~ x + y + z +", i)), data = dat) > } > > Note that you should be cautious about making any inferences based on > this kind of method. 
In the example above 9 r variables are > "significant" at the .05 level, even though the data was generated > "randomly": > > sort(sapply(mods, function(x) coef(summary(x))[5, 4])) > > Best, > Ista >> >> Many thanks for any help for this occasional R amateur. >> >> Claus > > -- > Ista Zahn > Graduate student > University of Rochester > Department of Clinical and Social Psychology > http://yourpsyche.org
[R] Newbie-ish question on iteratively applying function to dataframe
Hi, I am trying to iteratively apply a function to a selection of columns in a dataframe. I've had a look around and from what I have read, I should be using some version of the apply function, but I'm really having some headaches with it. Let me be more specific with an example. Say I have a data frame similar to the following A x y z r1 r2 r3 r4 0.1 0.2 0.1 ... 0.1 0.3 ... 0.2 ... i.e., a number of columns, each of the same length, and all containing real numbers. Of these columns, I want to model one variable, say A, as a function of other variables, say x, y, z, and any one of my r1, r2, r3, ... variables. i.e., I want to model A ~ x + y + z + r1 A ~ x + y + z + r2 A ~ x + y + z + rn But where the number of 'r' variables I will have will be large, and I don't know the specific number of these variables in advance. My question first is, how can I select all the columns in a dataframe that have a heading that matches a string pattern? And then related to this, what would be the best way of repeatedly applying my modelling function to the result? Many thanks for any help for this occasional R amateur. Claus
[R] Mixed Design ANOVA - singular error model
Dear r-help list, I would like to run a mixed design ANOVA to compare the results from one population sample to another. Here my within-subject variable (StimulusID) has 45 levels and my between-subject variable (Group) has two levels. In addition to the number of levels in the within-subject variable being very large, one other 'feature' of my data is that it is not balanced on the between-subject variable. I have attached a copy of my data for reference. To run the ANOVA, I did this: summary(aov(Result ~ StimulusID * Group + Error(ParticipantToken/(StimulusID + Group)), data = data)) My results were roughly as I had expected, but at the end of the output I have the warning that the model is singular. I have seen this warning listed in other help requests, but from what I saw there it meant that one of the variables was redundant, as it was a nesting of the other. I don't think this is the case in my data. Can I trust the results of a mixed design ANOVA with this warning? Or indeed with the unbalanced between-subject variable? Many thanks for any help for a stats novice.

Error: ParticipantToken
          Df Sum Sq Mean Sq F value Pr(>F)
Group      1   0.69  0.6934  0.0437 0.8356
Residuals 39 619.47 15.8840

Error: ParticipantToken:StimulusID
                   Df Sum Sq Mean Sq F value    Pr(>F)
StimulusID         44 4607.4 104.713 60.3868 < 2.2e-16 ***
StimulusID:Group   44  170.3   3.870  2.2317 8.106e-06 ***
Residuals        1716 2975.6   1.734
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Warning message:
In aov(Result ~ StimulusID * Group + Error(ParticipantToken/(StimulusID + :
  Error() model is singular
[R] Pairwise cross correlation from data set
Dear list, Following up on an earlier post, I would like to reorder a dataset and compute pairwise correlations, but I'm having some real problems getting this done. My data looks something like:

Participant Stimulus Measurement
p1          s1       5
p1          s2       6.1
p1          s3       7
p2          s1       4.8
p2          s2       6
p2          s3       6.5
p3          s1       4
p3          s2       7
p3          s3       6

As a first step I would imagine that I have to rearrange my data into a frame more like this:

Stimulus p1  p2  p3
s1       5   4.8 4
s2       6.1 6   7
s3       7   6.5 6

And then do the pairwise correlations between {p1,p2}, {p1,p3}, {p2,p3}. I can do all of this manually, i.e., using some messy case-specific code, but can anyone please point out the best way to do this in a more generalizable way? Thanks for any help you can give a novice! Claus
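[Editorial example] A generalizable base-R route (a sketch using the toy numbers above): reshape to wide format with one column per participant, then cor() returns every pairwise correlation at once:

```r
d <- data.frame(Participant = rep(c("p1", "p2", "p3"), each = 3),
                Stimulus    = rep(c("s1", "s2", "s3"), times = 3),
                Measurement = c(5, 6.1, 7, 4.8, 6, 6.5, 4, 7, 6))
# Wide format: one row per stimulus, one Measurement column per participant.
w <- reshape(d, idvar = "Stimulus", timevar = "Participant",
             direction = "wide")
cor(w[, -1])   # 3x3 matrix of all pairwise correlations
```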
[R] ANOVA of a sort
Dear R Help, I have a general question - I know this is the R list, but I hope someone can help me out a little, as I've always found the help here to be absolutely fantastic. I have run a psychological study where participants are given multiple stimuli and their responses to those stimuli are measured on the same numerical scale, i.e., the data is something like:

Participant Stimulus Measurement
p1          s1       5
p1          s2       6.1
p1          s3       7
p2          s1       4.8
p2          s2       6
p2          s3       6.5
p3          s1       4
p3          s2       7
p3          s3       6

I would like to be able to measure the between-participant variability for my data - i.e., determine whether measurements are relatively homogeneous across participants and whether there are very strange outliers (i.e., participants who maybe gave random or purposefully incorrect answers). Can anyone point me towards the correct type of tests for quantifying this? I have read that a repeated measures ANOVA might be a starting point. Many many thanks for any help you can give me! Claus
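[Editorial example] Before any formal test, a quick base-R screen of between-participant variability can help spot odd responders (a sketch on toy numbers matching the layout above):

```r
d <- data.frame(Participant = rep(c("p1", "p2", "p3"), each = 3),
                Measurement = c(5, 6.1, 7, 4.8, 6, 6.5, 4, 7, 6))
# Per-participant mean and spread of the responses.
aggregate(Measurement ~ Participant, data = d,
          FUN = function(m) c(mean = mean(m), sd = sd(m)))
# Visual outlier screen: one box per participant.
boxplot(Measurement ~ Participant, data = d)
```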
Re: [R] substring comparison
Thanks. It works perfectly. On Thu, Apr 29, 2010 at 6:24 PM, Henrique Dallazuanna wrote: > Try with grepl: > > data$ContainsThe <- ifelse(grepl("the",data$Utt),"y","n") > > On Thu, Apr 29, 2010 at 2:17 PM, Claus O'Rourke > wrote: >> >> Hi all, >> >> I'm writing a script to do some basic text analysis in R. Let's assume >> I have a data frame named data which contains a column named 'utt' >> which contains strings. Is there a straightforward way to achieve >> something like this: >> >> data$ContainsThe <- ifelse(startsWith(data$Utt,"the"),"y","n") >> >> or >> >> data$ContainsThe <- ifelse(contains(data$Utt,"the"),"y","n") >> ? >> >> I tried using grep >> data$ContainsThe <- ifelse(grep("the",data$Utt),"y","n") >> >> but this doesn't work because grep only returns the rows for which >> grep succeeded. >> >> Thanks for any pointers >> >> Claus > > -- > Henrique Dallazuanna > Curitiba-Paraná-Brasil > 25° 25' 40" S 49° 16' 22" O
[R] substring comparison
Hi all, I'm writing a script to do some basic text analysis in R. Let's assume I have a data frame named data which contains a column named 'utt' which contains strings. Is there a straightforward way to achieve something like this: data$ContainsThe <- ifelse(startsWith(data$Utt,"the"),"y","n") or data$ContainsThe <- ifelse(contains(data$Utt,"the"),"y","n") ? I tried using grep data$ContainsThe <- ifelse(grep("the",data$Utt),"y","n") but this doesn't work because grep only returns the rows for which grep succeeded. Thanks for any pointers Claus
Re: [R] Practical work with logistic regression
Thanks everyone for the replies, that sure cleared up some things for me. On Fri, Apr 23, 2010 at 9:11 AM, Jan van der Laan wrote: > When you just want to calculate the probability of a new observation xi > belonging to class A or B, and do not have to do any new model > estimations or other analyses, the easiest way is probably to write > the estimated coefficients to a text file and read them into your > java/c/whatever program, then use them directly to calculate the > probabilities. This is simply > > p_i = 1/(1+e^(-(b0 + b1 x1i + ... + bk xki))) > > with b0 ... bk your parameters. Easy to implement. Having an > interface to R for this seems to me overkill. > > Regards, > Jan
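[Editorial example] Before porting the coefficients, it is worth confirming in R that the hand-coded formula reproduces predict(..., type = "response"); a sketch on toy data:

```r
set.seed(1)
d <- data.frame(y  = rbinom(100, 1, 0.5),
                x1 = rnorm(100), x2 = rnorm(100))
m <- glm(y ~ x1 + x2, data = d, family = binomial)
b <- coef(m)                             # b0, b1, b2 to export to Java/C
# Hand-coded logistic function using the exported coefficients.
p_hand <- 1 / (1 + exp(-(b[1] + b[2] * d$x1 + b[3] * d$x2)))
all.equal(unname(p_hand), unname(predict(m, type = "response")))  # TRUE
```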
[R] Practical work with logistic regression
Dear all, I have a couple of short noob questions for whoever can take them. I'm from a very non-stats background, so sorry for offending anybody with stupid questions! :-) I have been using logistic regression, via glm, to analyse a binary dependent variable against a couple of independent variables. All has gone well so far. In my work I have to compare the accuracy of this analysis to a C4.5 machine learning approach. With the machine learning, a straightforward measure of the quality of the classifier is simply the percentage of correctly classified instances. I can calculate this for the resultant model by comparing predictions to original values 'manually'. My question: is this not automatically - or easily - calculated in the produced model or the summary of that model? I want to use my model in real time to produce results for new inputs. Basically this model is to be used as a classifier for a robot in real time. Can anyone suggest the best way that a produced model can be used directly in external code once the model has been developed in R? If my external code is in Java, then using JRI is one option. A more efficient method would be to take the intercept and coefficients and actually code up the function in the appropriate programming language. Has anyone ever tried doing this? Apologies again for the stupid questions, but the sooner I get some of these things straight, the better. Claus
Re: [R] interpretation of p values for highly correlated logistic analysis
Thank you both for your advice. I'll follow up on it, but it is good to know that this is a known effect. Claus On Wed, Mar 31, 2010 at 3:02 PM, Stephan Kolassa wrote: > Hi Claus, > > welcome to the wonderful world of collinearity (or multicollinearity, as > some call it)! You have a near linear relationship between some of your > predictors, which can (and in your case does) lead to extreme parameter > estimates, which in some cases almost cancel out (a coefficient of +/-40 on > a categorical variable in logistic regression is a lot, and the intercept > and two of the roman parameter estimates almost cancel out) but which are > rather unstable (hence your high p-values). > > Belsley, Kuh and Welsch did some work on condition indices and variance > decomposition proportions, and variance inflation factors are quite popular > for diagnosing multicollinearity - google these terms for a bit, and > enlightenment will surely follow. > > What can you do? You should definitely think long and hard about your data. > Should you be doing separate regressions for some factor levels? Should you > drop a factor from the analysis? Should you do a categorical analogue of > Principal Components Analysis on your data before the regression? I > personally have never done this, but correspondence analysis has been > recommended as a "discrete alternative" to PCA on this list, see a couple of > books by M. J. Greenacre. > > Best of luck! > Stephan > > > claus orourke schrieb: >> >> Dear list, >> >> I want to perform a logistic regression analysis with multiple >> categorical predictors (i.e., a logit) on some data where there is a >> very definite relationship between one predictor and the >> response variable. The problem I have is that in such a >> case the p value goes very high (while I as a naive newbie would >> expect it to crash towards 0). >> >> I'll illustrate my problem with some toy data. 
Say I have the >> following data as an input frame: >>
>> roman animal colour
>> 1  alpha dog black
>> 2  beta  cat white
>> 3  alpha dog black
>> 4  alpha cat black
>> 5  beta  dog white
>> 6  alpha cat black
>> 7  gamma dog white
>> 8  alpha cat black
>> 9  gamma dog white
>> 10 beta  cat white
>> 11 alpha dog black
>> 12 alpha cat black
>> 13 gamma dog white
>> 14 alpha cat black
>> 15 beta  dog white
>> 16 beta  cat black
>> 17 alpha cat black
>> 18 beta  dog white
>>
>> In this toy data you can see that roman:alpha and roman:beta are >> pretty good predictors of colour >> >> Let's say I perform logistic analysis directly on the raw data with >> colour as a response variable: >> >>> options(contrasts=c("contr.treatment","contr.poly")) >>> anal1 <- glm(data$colour~data$roman+data$animal,family=binomial) >> >> then I find that my p values for each individual level coefficient >> approach 1: >>
>> Coefficients:
>>                 Estimate Std. Error z value Pr(>|z|)
>> (Intercept)       -41.65   19609.49  -0.002    0.998
>> data$romanbeta     42.35   19609.49   0.002    0.998
>> data$romangamma    43.74   31089.48   0.001    0.999
>> data$animaldog     20.48   13866.00   0.001    0.999
>>
>> while I expect the p value for roman:beta to be quite low because it >> is a good predictor of colour:white >> >> On the other hand, if I then run an anova with a Chi-sq test on the >> resulting model, I find, as I would expect, that 'roman' is a good >> predictor of colour. >> >>> anova(anal1,test="Chisq") >> >> Analysis of Deviance Table >> >> Model: binomial, link: logit >> >> Response: data$colour >> >> Terms added sequentially (first to last) >>
>>             Df Deviance Resid. Df Resid. Dev P(>|Chi|)
>> NULL                           17    24.7306
>> data$roman   2  19.3239         15     5.4067 6.366e-05 ***
>> data$animal  1   1.5876         14     3.8191    0.2077
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Can anyone please explain why my p value is so high for the individual >> levels? >> >> Sorry for what is likely a stupid question. 
>> >> Claus >> >> p.s., when I run logistic analysis on data that is more 'randomised' >> everything comes out as I expect.