[R] Approximate Taylor series
Hi All,

Say I have the values of a function f(x1,x2,x3,x4) for some, but not all, combinations of x1, x2, x3, x4, and the functional form is not known. Techniques like regression have not given me satisfactory results, and the function may be more complex than we thought.

I wanted to use Taylor's approximation of a continuous function to approximate the functional form from the given data, but I could not find a package in R that does that. Can anyone suggest a way to do it?

-- 
Anindya Sankar Dey

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
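A second-order Taylor expansion around a point is a quadratic polynomial in x1, ..., x4, so one way to approach the question above is to estimate that polynomial by least squares. The sketch below is only an illustration under stated assumptions: the data frame `dat` and the generating function are hypothetical stand-ins for the real, incompletely observed data, and a quadratic surface may not be flexible enough for the real problem.

```r
# Hypothetical data: in practice `dat` would hold the observed values of f.
set.seed(42)
n <- 300
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n), x4 = runif(n))
# Stand-in for the unknown function, used only to generate example values.
dat$f <- with(dat, exp(x1) * sin(x2) + x3 * x4)

# Second-order Taylor-style surface: intercept, linear terms,
# all pairwise cross terms, and pure quadratic terms.
fit <- lm(f ~ (x1 + x2 + x3 + x4)^2 +
            I(x1^2) + I(x2^2) + I(x3^2) + I(x4^2),
          data = dat)

n_terms <- length(coef(fit))    # 1 + 4 + 6 + 4 coefficients
r2 <- summary(fit)$r.squared    # how well the quadratic surface fits

# Predict at a point where f was not observed.
new_pt <- data.frame(x1 = 0.5, x2 = 0.5, x3 = 0.5, x4 = 0.5)
pred <- predict(fit, newdata = new_pt)
```

If the quadratic fit is poor, third-order terms can be added in the same way, at the cost of many more coefficients to estimate.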
[R] Probable Error in fmsb package
Hi All,

The fmsb package has a function for the Variance Inflation Factor (VIF), and its documentation describes it as follows:

"To evaluate multicolinearity of multiple regression model, calculating the variance inflation factor (VIF) from the result of lm(). If VIF is more than 10, multicolinearity is strongly suggested."

The function computes the VIF of a model as 1/(1-R^2), where R^2 is the coefficient of determination of that model. Nowhere in the literature have I come across this definition, as VIF is always computed at the level of an individual variable. The structure is almost the same, but the R^2 in the textbook VIF for a variable comes from regressing that variable on the remaining predictors.

I only became aware of this when many freshers from non-statistics backgrounds whom I interviewed for an analytics position answered that the only definition of VIF they knew was 1/(1 - coefficient of determination), and that there is an R package which calculates VIF like that. After some research I found that such a function does indeed exist in the fmsb package.

Please help me understand: has an alternative definition of the Variance Inflation Factor ever emerged in theory? Does it really make sense to have a VIF at the model level, given that it does not help in dealing with multicollinearity during model building? And if I am right, what should I do about it?

-- 
Anindya Sankar Dey
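For comparison, the per-variable definition referred to above can be computed in base R by regressing each predictor on the others and applying 1/(1-R^2) to that auxiliary fit. This is a minimal sketch, not the fmsb implementation; the helper name `vif_one` is hypothetical and the built-in mtcars data is used only as an example.

```r
# Per-predictor VIF in base R, using the built-in mtcars data as an example.
predictors <- c("disp", "hp", "wt", "drat")

vif_one <- function(target, others, data) {
  # Regress one predictor on the remaining predictors ...
  aux <- lm(reformulate(others, response = target), data = data)
  # ... and turn that auxiliary R^2 into a variance inflation factor.
  1 / (1 - summary(aux)$r.squared)
}

vifs <- sapply(predictors, function(p) {
  vif_one(p, setdiff(predictors, p), mtcars)
})
print(round(vifs, 2))
```

Each value is at least 1, and a large value flags the specific predictor that is well explained by the others, which is the information needed during model building.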
[R] Text Mining - Remove punctuation not removing quotes and dashes
Hi,

I have been doing some text mining. I created the document-term matrix (DTM) using the following steps:

    corpus1 <- VCorpus(VectorSource(resume1$Dat1))
    corpus1 <- tm_map(corpus1, content_transformer(tolower))
    dtm <- DocumentTermMatrix(corpus1, control = list(removePunctuation = TRUE, removeNumbers = TRUE, removeSparseTerms = TRUE, stopwords = TRUE))

After running all of this I am still getting terms like -quotation, "fun, model", etc. What can I do about it? I do not need these dashes and extra quotation marks.

-- 
Anindya Sankar Dey
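One possible cause, offered as an assumption because the input text is not shown, is that the leftover characters are typographic (curly) quotes or long dashes rather than plain ASCII punctuation. A minimal base R sketch that strips both kinds follows; the helper name `strip_punct` and the sample strings are hypothetical.

```r
# Remove ASCII punctuation plus common typographic quotes and dashes,
# then collapse the whitespace left behind.
strip_punct <- function(x) {
  # \u2018 \u2019: curly single quotes; \u201C \u201D: curly double quotes
  # \u2013 \u2014: en dash and em dash
  y <- gsub("[[:punct:]\u2018\u2019\u201C\u201D\u2013\u2014]", " ", x)
  trimws(gsub("\\s+", " ", y))
}

# Hypothetical examples of the kind of terms described above.
terms <- c("-quotation", "\u201Cfun, model\u201D", "state\u2014of\u2014the\u2014art")
clean <- strip_punct(terms)
print(clean)
```

With tm, a helper like this could be applied before building the matrix, for example with tm_map(corpus1, content_transformer(strip_punct)).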
[R] plm returning fewer variables than provided as input
Hi,

I was fitting a plm model with around 400 variables, but after passing them to the plm function I get coefficients for only 265 variables. Can anyone explain the reason? Is there a size restriction in plm?

-- 
Anindya Sankar Dey
[R] Data transformation to list for event occurrence
Hi,

Say I have the following data

    ID  Week  Event_Occurence
    A   1     0
    A   2     0
    A   3     1
    A   4     0
    B   1     1
    B   2     0
    B   3     0
    B   4     1

which records whether an individual experienced an event in a particular week. I wish to create a list whose first element is a vector of the week numbers in which the event occurred for A, followed by the same for B. Can you help me create this?

-- 
Anindya Sankar Dey
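One way to build such a list, sketched in base R with the example data typed in by hand, is to keep only the rows where the event occurred and split the week numbers by ID:

```r
dat <- data.frame(
  ID = rep(c("A", "B"), each = 4),
  Week = rep(1:4, times = 2),
  Event_Occurence = c(0, 0, 1, 0, 1, 0, 0, 1)
)

# Rows in which the event occurred.
hit <- dat$Event_Occurence == 1

# One list element per ID; using a factor keeps IDs that never had an event
# (they would appear as empty vectors).
event_weeks <- split(dat$Week[hit], factor(dat$ID)[hit])
print(event_weeks)
```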
Re: [R] (no subject)
You can easily subset the data and then use rowSums. Say your dataset is named data1. Then write

    data2 <- data1[, c(7, 12, 45, 57)]

and then

    result <- rowSums(data2)

On Fri, Aug 23, 2013 at 3:47 PM, rajib prasad wrote:
> I am new to R. I have a data like:
>
>      x   y   z   w   p  ..  m
> 1   10  15  20  25  30
> 2   11  16  21  26  31
> 3   12  17  18  19  20
> 4   51  52  53  55  67
> ...
>
> thus I have 145 rows and 160 column in my data which is named as
> data.csv. Now i want to create a new column 'm' and for every row m
> will take value = column 7 + column 12 + column 57 + column 45, i.e. for
> every row it will take the sum of the corresponding row's 7 & 12 & 57
> & 45 column's values.
> So, how to write the code for this operation?
>
> --
> RAJIB PRASAD
> Centre for Economic Studies & Planning
> Jawaharlal Nehru University
> New Delhi-67

-- 
Anindya Sankar Dey
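As a small illustration of the rowSums approach, the sketch below builds a hypothetical 145 x 160 data frame in place of data.csv (which would normally be loaded with read.csv) and adds the new column m directly:

```r
# Hypothetical stand-in for data.csv: 145 rows, 160 numeric columns.
set.seed(1)
data1 <- as.data.frame(
  matrix(sample(1:100, 145 * 160, replace = TRUE), nrow = 145, ncol = 160)
)

# Sum columns 7, 12, 45 and 57 row by row and store the result as column m.
cols <- c(7, 12, 45, 57)
data1$m <- rowSums(data1[, cols])
head(data1$m)
```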
Re: [R] Combining two tables without going through lot of ifelse statement
Hi All,

Arun's solution is working. Now can someone help me with an extension? If we have multiple tables like this, combining them with rbind works. But if I want a generic function, where we do not know in advance how many tables will be created, can loops still be avoided?

On Fri, Aug 23, 2013 at 7:15 PM, arun wrote:
> In the case of ?data.table()
>
> dt1<-data.table(rbind(as.matrix(dat1),as.matrix(dat2))) ## converted the data.frame to matrix to mimic the situation
> dt2<- subset(dt1[,sum(V2),by=V1],V1!=0)
> setnames(dt2,2,"V2")
> dt2
> #    V1 V2
> #1:  1 10
> #2:  3 10
> #3:  2 10
>
> #or
>
> res<-with(as.data.frame(rbind(as.matrix(dat1),as.matrix(dat2))),aggregate(V2~V1,FUN=sum))
> res1<- res[res[,1]!=0,]
> res1
> #  V1 V2
> #2  1 10
> #3  2 10
> #4  3 10
> A.K.
>
> From: Anindya Sankar Dey
> To: arun
> Sent: Friday, August 23, 2013 9:40 AM
> Subject: Re: [R] Combining two tables without going through lot of ifelse statement
>
> Mine is matrices, will this work on matrices as well?
>
> Thanks for your help
>
> On Fri, Aug 23, 2013 at 7:02 PM, arun wrote:
> > However it is not clear when you mention these are tables. There is
> > ?table() and ?data.frame and the structure will be different in each case.
> > Here, I assumed that your table is data.frame..
> >
> > ----- Original Message -----
> > From: arun
> > To: Anindya Sankar Dey
> > Cc: R help
> > Sent: Friday, August 23, 2013 9:30 AM
> > Subject: Re: [R] Combining two tables without going through lot of ifelse statement
> >
> > Hi,
> > Try:
> >
> > dat1<- read.table(text="
> > 1 10
> > 3 5
> > 0 0
> > ",sep="",header=FALSE)
> > dat2<- read.table(text="
> > 2 10
> > 0 0
> > 3 5
> > ",sep="",header=FALSE)
> > res<-with(rbind(dat1,dat2),aggregate(V2~V1,FUN=sum))
> > res1<-res[res[,1]!=0,]
> > res1
> > #  V1 V2
> > #2  1 10
> > #3  2 10
> > #4  3 10
> >
> > #or
> > library(data.table)
> > dt1<- data.table(rbind(dat1,dat2))
> > dt2<-subset(dt1[,sum(V2),by=V1],V1!=0)
> > setnames(dt2,2,"V2")
> > dt2
> > #    V1 V2
> > #1:  1 10
> > #2:  3 10
> > #3:  2 10
> >
> > A.K.
> >
> > ----- Original Message -----
> > From: Anindya Sankar Dey
> > To: r-help
> > Sent: Friday, August 23, 2013 8:59 AM
> > Subject: [R] Combining two tables without going through lot of ifelse statement
> >
> > HI All,
> >
> > Suppose I have two table like below
> >
> > Table 1:
> >
> > 1 10
> > 3 5
> > 0 0
> >
> > Table 2:
> >
> > 2 10
> > 0 0
> > 3 5
> >
> > I need to create a new table like below
> >
> > Table 3:
> >
> > 1 10
> > 2 10
> > 3 10
> >
> > The row may interchange in table 3, but is there any way to do this instead
> > of writing lot of if-else and loops?
> >
> > Thanks in advance.
> >
> > --
> > Anindya Sankar Dey

-- 
Anindya Sankar Dey
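For the follow-up question about an unknown number of tables, one loop-free possibility is to collect the tables in a list and stack them with do.call(rbind, ...) before aggregating as in arun's answer. The sketch below is only an illustration: the function name `combine_tables` and the third table are hypothetical, and every table is assumed to have exactly two columns (key, value).

```r
# Combine any number of two-column tables (matrices or data frames):
# stack them, sum the second column by the first, and drop the 0 key.
combine_tables <- function(tables) {
  stacked <- as.data.frame(do.call(rbind, lapply(tables, as.matrix)))
  names(stacked) <- c("V1", "V2")
  res <- aggregate(V2 ~ V1, data = stacked, FUN = sum)
  res[res$V1 != 0, ]
}

# Tables 1 and 2 from the thread, plus a hypothetical third table.
m1 <- matrix(c(1, 3, 0, 10, 5, 0), ncol = 2)
m2 <- matrix(c(2, 0, 3, 10, 0, 5), ncol = 2)
m3 <- matrix(c(1, 2, 0, 1, 2, 0), ncol = 2)

out <- combine_tables(list(m1, m2, m3))
print(out)
```

Because the input is a list, the same call works whether two or two hundred tables were created.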
[R] Combining two tables without going through lot of ifelse statement
Hi All,

Suppose I have two tables like below

Table 1:

    1 10
    3  5
    0  0

Table 2:

    2 10
    0  0
    3  5

I need to create a new table like below

Table 3:

    1 10
    2 10
    3 10

The rows may be in a different order in Table 3, but is there any way to do this without writing a lot of if-else statements and loops?

Thanks in advance.

-- 
Anindya Sankar Dey
[R] Apriori probabilities in naiveBayes function
Hi All,

I applied the naiveBayes function in the e1071 package to the iris data, and here is the list that was created:

    structure(list(
      apriori = structure(c(50L, 50L, 50L), .Dim = 3L, .Dimnames = structure(list(Y = c("setosa", "versicolor", "virginica")), .Names = "Y"), class = "table"),
      tables = structure(list(
        Sepal.Length = structure(c(5.006, 5.936, 6.588, 0.352489687213451, 0.516171147063863, 0.635879593274432), .Dim = c(3L, 2L), .Dimnames = structure(list(Y = c("setosa", "versicolor", "virginica"), Sepal.Length = NULL), .Names = c("Y", "Sepal.Length"))),
        Sepal.Width = structure(c(3.428, 2.77, 2.974, 0.379064369096289, 0.313798323378411, 0.322496638172637), .Dim = c(3L, 2L), .Dimnames = structure(list(Y = c("setosa", "versicolor", "virginica"), Sepal.Width = NULL), .Names = c("Y", "Sepal.Width"))),
        Petal.Length = structure(c(1.462, 4.26, 5.552, 0.173663996480184, 0.469910977239958, 0.551894695663983), .Dim = c(3L, 2L), .Dimnames = structure(list(Y = c("setosa", "versicolor", "virginica"), Petal.Length = NULL), .Names = c("Y", "Petal.Length"))),
        Petal.Width = structure(c(0.246, 1.326, 2.026, 0.105385589380046, 0.197752680004544, 0.274650055636667), .Dim = c(3L, 2L), .Dimnames = structure(list(Y = c("setosa", "versicolor", "virginica"), Petal.Width = NULL), .Names = c("Y", "Petal.Width")))),
        .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")),
      levels = c("setosa", "versicolor", "virginica"),
      call = quote(naiveBayes.default(x = X, y = Y, laplace = laplace))),
      .Names = c("apriori", "tables", "levels", "call"), class = "naiveBayes")

What I am unable to understand is this: the first element of the list, apriori, is stored as a vector of counts (50, 50, 50), yet when the model is printed it correctly shows the probabilities (0.33, 0.33, 0.33). Can anyone tell me which part of the code is doing this?

-- 
Anindya Sankar Dey
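The stored apriori element holds class counts, and dividing the counts by their total gives the probabilities that appear in the printed output. My understanding, which is worth verifying with getAnywhere("print.naiveBayes"), is that the print method performs this normalisation at display time rather than storing the probabilities. The arithmetic itself can be sketched in base R without e1071:

```r
# Class counts for iris, the same numbers stored in the apriori element.
counts <- table(iris$Species)

# Normalising the counts gives the prior probabilities.
priors <- counts / sum(counts)
print(round(priors, 2))
```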
[R] Plotting time data for various countries in same graph
Hi,

I have the following kind of data

    Time    Country  Values
    2010Q1  India     5
    2010Q2  India     7
    2010Q3  India     5
    2010Q4  India     9
    2010Q1  China    10
    2010Q2  China     6
    2010Q3  China     9
    2010Q4  China    14

I need to plot a graph with time on the x-axis, the Values on the y-axis, and two lines, one for India and one for China. I don't have great knowledge of graphics in R. I was trying to use

    ggplot(data,aes(x=Time,y=Values,colour=Country))

but this does not help. Can anyone help me with this?

-- 
Anindya Sankar Dey
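A minimal sketch of one way to draw the two lines in base R, with the example data typed in by hand. The ggplot call in the question only sets up the axes and mapping; it needs a line layer and a grouping variable, and that variant is shown as a comment because it assumes the ggplot2 package is installed.

```r
dat <- data.frame(
  Time    = rep(c("2010Q1", "2010Q2", "2010Q3", "2010Q4"), times = 2),
  Country = rep(c("India", "China"), each = 4),
  Values  = c(5, 7, 5, 9, 10, 6, 9, 14)
)

# Reshape to a Time x Country matrix so each country becomes one line.
wide <- with(dat, tapply(Values, list(Time, Country), sum))

# One line per column, with the quarters as x-axis labels.
matplot(wide, type = "l", lty = 1, col = c("red", "blue"),
        xaxt = "n", xlab = "Time", ylab = "Values")
axis(1, at = seq_len(nrow(wide)), labels = rownames(wide))
legend("topleft", legend = colnames(wide), col = c("red", "blue"), lty = 1)

# ggplot2 equivalent (assumes ggplot2 is installed):
# ggplot(dat, aes(x = Time, y = Values, colour = Country, group = Country)) +
#   geom_line()
```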
Re: [R] improving/speeding up a very large, slow simulation
.out=length.out)
>
> permut.grid<-expand.grid(number.strata.range, project.n.range,
>     project.acreage.range, project.mean, project.sd.range,
>     number.verification.plots, verification.range, allowed.deviation)
> # create a matrix with all combinations of the supplied vectors
>
> #assign names to the columns of the grid of combinations
> names.permut<-c("number.strata", "project.n.plots", "project.acreage",
>     "project.mean", "project.sd", "number.verification.plots",
>     "verification.mean", "allowed.deviation")
>
> names(permut.grid)<-names.permut # done
>
> combinations<-length(permut.grid[,1])
>
> #need to know the size of the master matrix, which is the number of
> #replications * each combination of the supplied factors
> size <-reps*combinations
>
> # we want a df from which to read all the data into the simulation, and
> # record the results
> permut.col<-ncol(permut.grid)
> col.base<-ncol(permut.grid)+2
> base <- matrix(nrow=size, ncol=col.base)
> base <-data.frame(base)
>
> # supply the names
> names.base<-c("number.strata", "project.n.plots", "project.acreage",
>     "project.mean", "project.sd", "number.verification.plots",
>     "verification.mean", "allowed.deviation","success.fail",
>     "plots.to.success")
>
> names(base)<-names.base
>
> # need to create index vectors for the base, master df
> ends <- seq(reps+1, size+1, by=reps)
> begins <- ends-reps
> index <- cbind(begins, ends-1)
> #done
>
> # next, need to assign the first 6 columns and number of rows = to the
> # number of reps in the simulation to be the given row in the permut.grid
> # matrix
>
> pb <- winProgressBar(title="Create base, big loop 1 of 2", label="0% done",
>     min=0, max=100, initial=0)
>
> for (i in 1:combinations) {
>
>     base[index[i,1]:index[i,2],1:permut.col] <- permut.grid[i,]
>     #progress bar
>     info <- sprintf("%d%% done", round((i/combinations)*100))
>     setWinProgressBar(pb, (i/combinations)*100, label=info)
> }
>
> close(pb)
>
> # now, simply feed the values replicated the number of times we want to run
> # the simulation into the sequential.unpaired function, and assign the values
> # to the appropriate columns
>
> out.index1<-ncol(permut.grid)+1
> out.index2<-ncol(permut.grid)+2
>
> #progress bar
> pb <- winProgressBar(title="fill values, big loop 2 of 2", label="0% done",
>     min=0, max=100, initial=0)
>
> for (i in 1:size){
>
>     scalar.base <- base[i,]
>     verification.plots <- rnorm(scalar.base$number.verification.plots,
>         scalar.base$verification.mean, scalar.base$project.sd)
>     result<- sequential.unpaired(scalar.base$number.strata,
>         scalar.base$project.n.plots, scalar.base$project.mean,
>         scalar.base$project.sd, verification.plots,
>         scalar.base$allowed.deviation, scalar.base$project.acreage,
>         min.plots='default', alpha)
>
>     base[i,out.index1] <- result[[6]][1]
>     base[i,out.index2] <- result[[7]][1]
>     info <- sprintf("%d%% done", round((i/size)*100))
>     setWinProgressBar(pb, (i/size)*100, label=info)
> }
>
> close(pb)
> #return the results
> return(base)
>
> }
>
> # I would like reps to = 1000, but that takes a really long time right now
> test5 <- simulation.unpaired(reps=5, project.acreage.range = c(99,
>     110,510,5100,11000), project.mean=100, project.n.min=10, project.n.max=100,
>     project.sd.min=.01, project.sd.max=.2, verification.mean.min=100,
>     verification.mean.max=130, number.verification.plots.min=10,
>     number.verification.plots.max=50, length.out = 5)

-- 
Anindya Sankar Dey
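One possible speed-up for the first loop in the quoted code, offered as a sketch rather than as the advice given in this thread, is to build the replicated data frame in a single indexing step instead of assigning blocks of rows inside a loop. The names permut.grid, reps, success.fail and plots.to.success follow the quoted code; the small grid below is a hypothetical stand-in.

```r
# Hypothetical, much smaller grid of parameter combinations.
permut.grid <- expand.grid(
  number.strata = 1:2,
  project.mean  = c(100, 110),
  project.sd    = c(0.01, 0.2)
)
reps <- 5

# Repeat every row of the grid `reps` times in one vectorised step,
# replacing the row-block assignment loop.
base <- permut.grid[rep(seq_len(nrow(permut.grid)), each = reps), ]
rownames(base) <- NULL

# Result columns, to be filled in by the simulation.
base$success.fail     <- NA
base$plots.to.success <- NA
```

For the second loop, collecting the results in preallocated vectors and attaching them to the data frame once at the end usually avoids much of the cost of assigning into single data frame cells on every iteration.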
Re: [R] Unable to work with Rattle
Hi,

You can download the package here:
http://cran.r-project.org/web/packages/XML/index.html

Regards,
Anindya

On Sun, Oct 14, 2012 at 10:14 PM, balaji sarangarajan wrote:
> Hello,
>
> I have installed R version 2.15.1 and I am trying to work with the
> rattle package. I got the error below:
>
> Error in loadTooltips() : could not find function "xmlTreeParse"
> In addition: Warning messages:
> 1: package XML is not available (for R version 2.15.1)
> 2: In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
>    there is no package called XML
>
> Is there any possibility of adding the XML package for R version 2.15.1?
>
> Thanks in advance.
>
> Regards,
> Balaji

-- 
Anindya Sankar Dey