[R] How to process each element in 3 minute interval using a for loop in R?
Hi R-users, This should be a simple question: How can I delay each loop process in some minutes? The reason for this is I need to avoid too much traffic to get longitudes and latitudes of 2000 addresses using google API. I am searching for solutions with keywords like interval, minutes, delay, but no directly relevant clues have come up yet. Many thanks in advance. Best, Taka [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to process each element in 3 minute interval using a for loop in R?
On 09/07/2014 08:17, Takatsugu Kobayashi wrote:

Hi R-users, This should be a simple question: How can I delay each loop process in some minutes? The reason for this is I need to avoid too much traffic to get longitudes and latitudes of 2000 addresses using google API. I am searching for solutions with keywords like interval, minutes, delay, but no directly relevant clues have come up yet.

See ?Sys.sleep

Many thanks in advance. Best, Taka

-- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
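To make the Sys.sleep pointer concrete, here is a minimal sketch of a throttled loop. The function geocode_one() is a hypothetical placeholder (it is not part of any package and only returns dummy coordinates), and the names geocode_slowly and delay_sec are likewise made up for illustration; swap in whatever request function you actually use.

```r
# Hypothetical stand-in for a real geocoding request; it only returns dummy
# coordinates so that the sketch runs without network access.
geocode_one <- function(address) {
  c(lon = NA_real_, lat = NA_real_)
}

# Call geocode_one() for each address, pausing delay_sec seconds between calls.
geocode_slowly <- function(addresses, delay_sec = 180) {
  out <- vector("list", length(addresses))
  for (i in seq_along(addresses)) {
    out[[i]] <- geocode_one(addresses[i])
    # No need to wait after the last request.
    if (i < length(addresses)) Sys.sleep(delay_sec)
  }
  do.call(rbind, out)
}

# Short delay for demonstration; delay_sec = 180 would give a 3-minute interval.
coords <- geocode_slowly(c("address 1", "address 2", "address 3"), delay_sec = 0.1)
```

Sys.sleep() takes its argument in seconds, so a 3-minute pause is Sys.sleep(180). With 2000 addresses at that spacing the loop would run for roughly four days, so a shorter interval may be worth checking against the API's actual rate limit.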
Re: [R] reorder a list
Thanks Bill and the other guys for the variety of useful replies! In fact I'm working with pretty big lists (with ~35000 sublists) and Bill's solution is the fastest one in terms of computing time. Now comes the second part of the question... :-)

I have my usual list of values and time indices to sort:

    A1 <- list(c(1:4), c(2,4,5), 23, c(4,5,13))

and then another list A2 with variables which have to be paired with the values of A1:

    A2 <- sapply(A1, exp) # (in my case there's no exp relation between A1 and A2, they're completely uncorrelated. That's just an example)
    A2
    [[1]] [1]  2.718282  7.389056 20.085537 54.598150
    [[2]] [1]   7.389056  54.598150 148.413159
    [[3]] [1] 9744803446
    [[4]] [1]     54.59815    148.41316 442413.39201

Now I'd like to reorder the elements of A2 according to the same rule applied for A1:

    f <- function (x) {
      lengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
      split(rep(seq_along(x), lengths), unlist(x, use.names = FALSE))
    }
    B1 <- f(A1)

and thus obtain a list B2 which looks like this:

    B2
    $`1`  [1] 2.718282
    $`2`  [1] 7.389056 7.389056
    $`3`  [1] 20.08554
    $`4`  [1] 54.59815 54.59815 54.59815
    $`5`  [1] 148.4132 148.4132
    $`13` [1] 442413.4
    $`23` [1] 9744803446

(In this example each element is the exp() of the sublist name, but in a general case they would be uncorrelated, and the resulting elements of each sublist would be different.) Any idea? Alfio

From: wdun...@tibco.com Date: Tue, 8 Jul 2014 12:11:09 -0700 Subject: Re: [R] reorder a list To: alfio...@hotmail.com CC: r-help@r-project.org

    f <- function (x) {
      lengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
      split(rep(seq_along(x), lengths), unlist(x, use.names = FALSE))
    }
    f(A1)

gives about what you want (has, e.g., name 23, not position 23, in output)

Bill Dunlap TIBCO Software wdunlap tibco.com

On Tue, Jul 8, 2014 at 9:39 AM, Lorenzo Alfieri alfio...@hotmail.com wrote:

Hi, I'm trying to find a way to reorder the elements of a list. Let's say I have a list like this:

    A1 <- list(c(1:4), c(2,4,5), 23, c(4,5,13))
    A1
    [[1]] [1] 1 2 3 4
    [[2]] [1] 2 4 5
    [[3]] [1] 23
    [[4]] [1] 4 5 13

All the elements included in it are values, while each sublist is a time index. Now, I'd like to reorder the list (without looping) so as to obtain one sublist for each value, which includes all the time indices where that value appears. In other words, the result should look like this:

    A2
    [[1]]  [1] 1
    [[2]]  [1] 1 2   # because value 2 appears in the time index [[1]] and [[2]] of A1
    [[3]]  [1] 1
    [[4]]  [1] 1 2 4
    [[5]]  [1] 2 4
    [[13]] [1] 4
    [[23]] [1] 3

Any suggestion? Thanks Alfio
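The second part of the question has no reply in this digest. One possible approach, offered only as a sketch, reuses the same split() idea: if A2 has exactly the same shape as A1 (same number of sublists, same lengths), the flattened A2 values can be grouped by the flattened A1 values. The exp() values below are just the stand-in data from the post.

```r
A1 <- list(1:4, c(2, 4, 5), 23, c(4, 5, 13))
A2 <- lapply(A1, exp)   # stand-in values, as in the post

# Assumption: A2 is parallel to A1, element by element.
stopifnot(identical(lengths(A1), lengths(A2)))

# Group the flattened A2 values by the matching flattened A1 values.
B2 <- split(unlist(A2, use.names = FALSE), unlist(A1, use.names = FALSE))
str(B2)
```

Because split() works on the flattened vectors, this stays loop-free and should scale in much the same way as the function f() used for B1.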
[R] matrix built by diagonal matrices with a given structure (2nd trial)
Dear R-users, three weeks ago I sent the mail below, but I didn't receive any response. Maybe it was overlooked. Thanks anyway for all the help you give us through this mailing list, Giancarlo Camarda

[...]

I have a matrix made of a series of vectors that do not overlap in the row dimension, arranged in a given structure. Something like:

    |a1,  0,  0,  0|
    | 0, a2, a3,  0|
    |a4,  0,  0, a5|

where the ai are column vectors of equal length, m. My aim is to construct a new matrix formed by diagonal matrices built from these vectors and placed following the original structure. Something like:

    |diag(a1),        0,        0,        0|
    |       0, diag(a2), diag(a3),        0|
    |diag(a4),        0,        0, diag(a5)|

Of course the zeros are vectors of length m, and empty (m times m) matrices in the first and second scheme, respectively. I found a way to obtain what I need by selecting rows from an augmented version of the original matrix, which I constructed using the Kronecker product. I was wondering whether there is a more elegant and straightforward procedure. See below a simple reproducible example of my challenge in which the length of the vectors is 4. Thanks in advance for your help, Giancarlo Camarda

    ## size of the diagonal matrices
    ## or length of the vectors
    m <- 4
    ## the original matrix
    ze <- rep(0,m)
    A <- cbind(c(1,2,3,4,ze,13,14,15,16), c(ze,5,6,7,8,ze), c(ze,9,10,11,12,ze), c(ze,ze,17,18,19,20))
    ## augmenting the original matrix
    A1 <- kronecker(A, diag(m))
    ## which rows to select
    w1 <- seq(1, m^2, length=m)
    w2 <- seq(0, 2*m^2, by=m^2)
    w0 <- outer(w1, w2, FUN="+")
    w <- c(w0)
    ## final matrix
    A2 <- A1[w,]
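No reply to this question appears in the digest. As one possible alternative, sketched here under the assumption that nrow(A) is a multiple of m, each column of A can be repeated m times and then multiplied elementwise by a 0/1 mask that keeps only the diagonal position of every m-by-m block. The names expanded, mask and A2_alt are made up for this illustration.

```r
m  <- 4
ze <- rep(0, m)
A  <- cbind(c(1:4, ze, 13:16), c(ze, 5:8, ze), c(ze, 9:12, ze), c(ze, ze, 17:20))

nblock_rows <- nrow(A) / m   # number of stacked vectors per column

# Repeat every column of A m times, so each block is m x m ...
expanded <- A[, rep(seq_len(ncol(A)), each = m)]
# ... then zero out everything except the diagonal of each block.
mask   <- kronecker(matrix(1, nblock_rows, ncol(A)), diag(m))
A2_alt <- expanded * mask
```

This avoids building the larger (m * nrow(A)) by (m * ncol(A)) intermediate matrix and the row-index bookkeeping, at the cost of one mask of the same size as the result.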
Re: [R] How to process each element in 3 minute interval using a for loop in R?
On 9 Jul 2014 17:19, Takatsugu Kobayashi taquito2...@gmail.com wrote:

Hi R-users, This should be a simple question: How can I delay each loop process in some minutes? The reason for this is I need to avoid too much traffic to get longitudes and latitudes of 2000 addresses using google API.

R is too fast? In a loop? Preposterous!

I am searching for solutions with keywords like interval, minutes, delay, but no directly relevant clues have come up yet. Many thanks in advance. Best, Taka
Re: [R] Transform a data.frame with ; sep column and another one in a a new one with the same two column but with repetitions
On 05-07-2014 00:43, John McKown wrote:

I messed up my original response by not including r-help in the distribution. And now I won't look as bad because, after a short nap, I have a new, much shorter (but more difficult, for me, to understand) answer.

    # The original data is in the variable x.
    z=data.frame(TC=x$TC, WC=I(mapply(strsplit,x$WC,MoreArgs=list(';'),USE.NAMES=FALSE)));
    result=data.frame(TC=rep(x$TC,sapply(z$WC,length)),WC=unlist(z$WC));

There may be a way to eliminate the temporary variable z. Maybe I need another nap! The heart of this is the mapply, which results in a list where each entry in the list is another list. And the entries in the embedded list are the results from the output of strsplit() on the WC information. If this needs to be a function, then

    splitUp <- function(x) {
      z=data.frame(TC=x$TC, WC=I(mapply(strsplit,x$WC,MoreArgs=list(';'),USE.NAMES=FALSE)));
      result=data.frame(TC=rep(x$TC,sapply(z$WC,length)),WC=unlist(z$WC));
      return(result);
    }

Then invoke it with:

    flattened.result <- splitUp(original.data.frame);

On Fri, Jul 4, 2014 at 7:50 AM, João Azevedo Patrício joao.patri...@gmx.pt wrote:

Hi, I've been trying to solve this issue but with no success. I have some data like this:

    1  TC  WC
    2  0   Instruments Instrumentation; Nuclear Science Technology; Physics, Particles Fields; Spectroscopy
    3  0   Nanoscience Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied
    4  2   Physics, Nuclear; Physics, Particles Fields
    5  0   Chemistry, Inorganic Nuclear
    6  2   Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy Metallurgical Engineering

And I need to have this:

    1  TC  WC
    2  0   Instruments Instrumentation
    2  0   Nuclear Science Technology
    2  0   Physics, Particles Fields
    2  0   Spectroscopy
    3  0   Nanoscience Nanotechnology
    3  0   Materials Science, Multidisciplinary
    3  0   Physics, Applied
    4  2   Physics, Nuclear
    4  2   Physics, Particles Fields
    5  0   Chemistry, Inorganic Nuclear
    6  2   Chemistry, Physical
    6  2   Materials Science, Multidisciplinary
    6  2   Metallurgy Metallurgical Engineering

This means repeating the row for each element in WC and keeping the same value in TC. The goal is to check how many TC (sum) there are by WC, when WC is multiple. I've tried to separate the column using strsplit but then I cannot keep track of TC. Thanks in advance. -- João Azevedo Patrício

I've been testing it and the results are coming out nicely. It grabs a CSV taken from ISI Web of Science, works it out and produces a table organized by WC (Web of Science category) with number of papers per area, citations and impact factor. My code is like this right now:

    isi <- read.table("file.csv", header = TRUE, sep = ";") ## get citations and web of science categories file
    isisplit=data.frame(TC=isi$TC, WC=I(mapply(strsplit,isi$WC,MoreArgs=list(';'),USE.NAMES=FALSE)));
    result=data.frame(TC=rep(isi$TC,sapply(isisplit$WC,length)),WC=unlist(isisplit$WC));
    isisplit$WC <- str_trim(isisplit$WC)
    wccitations <- aggregate(isisplit$TC, by=list(Category=isisplit$WC), FUN = sum) ## creates a table with the list of WCategories and the specific citations
    colnames(wccitations) <- c("WC", "TC")
    wcproduction <- table(isisplit$WC) ## creates a table with the number of pubs by WCategories
    wcproduction <- as.data.table(wcproduction)
    colnames(wcproduction) <- c("WC", "PUB")
    wc <- data.frame(WC = wccitations$WC, PUB = wcproduction$PUB, TC = wccitations$TC, IMP = round((wcproduction$PUB/wccitations$TC), digits = 2))
    wc[wc == Inf] = 0 ## removes inf in impact by impact 0
    write.table(wc, file = "file.csv", sep = ";", dec = ",")

-- João Azevedo Patrício Tel.: +31 91 400 53 63 Portugal @ http://tripaforra.bl.ee Take 2 seconds to think before you act
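For readers who want to try the split-and-repeat idea without extra packages, here is a compact base-R sketch. It is not the posters' code: the toy data frame is invented, and the names parts, long, citations, pubs and summary_tab are chosen only for this illustration. It assumes a reasonably recent R, since lengths() and trimws() are base functions added after this thread was written.

```r
# Toy data in the shape described in the thread (TC = times cited,
# WC = semicolon-separated categories). The values are made up.
x <- data.frame(
  TC = c(0, 2, 3),
  WC = c("Spectroscopy; Physics, Applied", "Physics, Nuclear", "Physics, Applied"),
  stringsAsFactors = FALSE
)

# One row per category, repeating TC for every category of a paper.
parts <- strsplit(x$WC, ";", fixed = TRUE)
long <- data.frame(
  TC = rep(x$TC, lengths(parts)),
  WC = trimws(unlist(parts)),
  stringsAsFactors = FALSE
)

# Citations and number of publications per category, joined on WC.
citations <- aggregate(TC ~ WC, data = long, FUN = sum)
pubs <- as.data.frame(table(WC = long$WC), responseName = "PUB",
                      stringsAsFactors = FALSE)
summary_tab <- merge(pubs, citations, by = "WC")
summary_tab
```

Computing both summaries from the same long table and joining them with merge() avoids relying on two separately built tables happening to share the same row order.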
[R] Cutting hierarchical cluster tree at specific height fails
Hi, I'd like to cut a hierarchical cluster tree calculated with hclust at a specific height. However, I always get the following error message:

    Error in cutree(hc, h = 60) : the 'height' component of 'tree' is not sorted (increasingly)

Here is a working example to show that the code fails when a height is specified in cutree(). In contrast, specifying the number of clusters in cutree() works. What is the exact problem and how can I solve it?

    x <- c(rnorm(100,50,10),rnorm(100,200,25),rnorm(100,80,15))
    y <- c(rnorm(100,50,10),rnorm(100,200,25),rnorm(100,150,25))
    df <- data.frame(x,y)
    plot(df)
    hc <- hclust(dist(df,method = "euclidean"), method="centroid")
    plot(hc)
    df$memb <- cutree(hc, h = 60) # this does not work
    df$memb <- cutree(hc, k = 3) # this works!
    plot(df$x,df$y,col=df$memb)

Thank you for your hints! Best regards, Johannes
[R] [R-pkgs] icd9 - a new R package
Dear R people, The new package 'icd9' provides a range of tools for working with ICD-9-CM codes. http://cran.r-project.org/web/packages/icd9/index.html https://github.com/jackwasey/icd9

ICD-9 (clinical modification) is primarily used for categorizing diseases in the USA for hospital administration, whereas ICD-10 is used by the rest of the world for disease surveillance. This package is currently restricted to ICD-9-CM codes. I've seen other R code which manipulates ICD-9 codes, but the mistake is often made of thinking they are numeric. This is not the case, e.g. 100.0 is different from 100 and 100.00. This package takes care of validating these codes, explaining them (converting code to plain English), comparing them, and attributing codes to groups of codes to assign co-morbidities to patients.

ICD-9 codes are often provided in a shortened format without a decimal place, and these have distinct validation rules. Functions to convert between decimal and short forms are provided. All key parts use vectorized code, and comorbidities for a million patient visits can be assigned in a few seconds on a modest workstation.

SAS code is published by AHRQ to allow assignment of ICD-9 codes to comorbidities. This package contains some SAS source to R code translation, so that the canonical ICD-9-CM to comorbidity mapping provided by AHRQ can be derived directly without the cumbersome and error-prone manual task of re-encoding the relationships in R. I believe a SAS to R converter was an April Fools' joke some time ago, but this is indeed a very limited answer to that problem. http://www.biostatistics.dk/sas2r/index.html

A short vignette covers the major use-cases. http://cran.r-project.org/web/packages/icd9/vignettes/icd9.pdf

The code is supported by a fairly thorough test suite, and is well documented in the hope that it will be easier for users of the package to understand it, and to get involved. I chose only to export key functions where I had thought carefully about the external API, but all internal functions are documented and contain potentially useful nuggets for power users. Comments and contributions are most welcome. In particular, I'd love to see unit tests corresponding to any failures you may encounter working with your own ICD-9 data. Hope you find this useful. Jack

-- Jack Wasey Resident Physician, Anesthesiology and Critical Care Medicine Johns Hopkins Hospital Baltimore, MD, USA ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
[R] [R-pkgs] h2o - fast scalable glm, deeplearning, gbm, randomForest, plyr for big datasets
http://cran.r-project.org/web/packages/h2o/

Please try h2o. H2O is a fast, scalable, open source R package for Generalized Linear Modeling, Deep Learning, Gradient Boosting, RandomForest and k-means on large, terabyte-scale datasets. This package allows you to scale R over Hadoop-like datasets in-memory on multiple machines. Under the guidance of a strong scientific advisory council from Stanford, the likes of Stephen Boyd, Rob Tibshirani and Trevor Hastie, our distributed systems team built an R package that calls fast implementations of GLM, GBM, DeepLearning and RandomForest to allow modeling on big data in Open Source. It offers connectors to Hadoop, S3 and other file formats, and extensibility via R-expressions at scale, plyr and java.

thanks, Sri

-- ceo co-founder, 0xdata Inc http://www.0xdata.com/ culture.code.customer
[R] R Studio v3.0.3 for Windows 32bits is too slow
Hi R'er, I have a dataset which has a matrix of 7502 x 1426 (rows x columns). The data is in a CSV format which has a size around 68Mb. This dataset is less than 10% of our dataset. I have been adopting the Anomaly detection method as described by http://www.mattpeeples.net/kmeans.html . It has been running more than 24hrs and still haven't completed the calculation. I did manage to run it with a smaller dataset (ie, 2100 rows x 1426 columns). It took around 12hrs to run. I have a few questions and need your expertise guidance. 1) Is there any better Open source tools to use to do in one tool (eg, R Studio): prepare data, build models, validate models, test models and present data. I am looking a tool which will allow me to do the same as per the above link (Matt Peeples' blog). 2) Is there an Open source tools to perform the above which will allow me to run on top of Hadoop eco-system? 3) Can we use R Studio for windows as a client to run on top of Hadoop eco-system? If yes, please point me to the site where they have a use cases or samples. Thanks and Regards, Truong Phan
[R] Error evaluating partitioning around medoids clustering method R clValid package
I have a data.frame with 300 observations of 36 numerical, categorical, and NA variables. I am trying to evaluate the partitioning around medoids clustering algorithm for a marketing segmentation study. My original dataset has over 130,000 observations, but I took a sample for easy reproducibility reasons.

My machine: Mac OSX 10.9.3

    sessionInfo()
    R version 3.1.0 (2014-04-10)
    Platform: x86_64-apple-darwin13.1.0 (64-bit)

Problem: Getting an error when doing internal and stability evaluation with the clValid CRAN package in R.

Code:

    #Convert csv to data.frame
    frame <- as.data.frame(Smallstore1)
    library(cluster)
    #Create dissimilarity matrix
    #Gower coefficient for finding distance between mixed variables
    daisy1 <- daisy(frame, metric = "gower", type = list(ordratio = c(1:36)))
    #k-medoid algorithm with 3 clusters
    kanswers <- pam(daisy1, 3, diss = TRUE)
    #Evaluate k-medoid clustering algorithm with 2 to 6 clusters
    #Import clValid package
    library(clValid)
    #Internal validation
    internval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "internal")
    #Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <- Biobase::exprs(obj), : EXPR must be a length 1 vector
    #Error in summary(internval1) :
    #error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'internval1' not found
    #External validation
    stabval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "stability")
    #Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <- Biobase::exprs(obj), : EXPR must be a length 1 vector

Data: I put the data.frame in a dissimilarity matrix using the daisy function and used partitioning around medoids with 3 clusters. The daisy and pam functions come from the cluster CRAN package in R. Since the data.frame has mixed values, the gower distance coefficient is used. Here's the head of the first 7 variables, but I took out the names of the email for privacy reasons.

    head(frame)
      user_id email          Age   Gender Household.Income Marital.Status Presence.of.children
    1 12945   @bellycard.com NA    Male   NA               NA             NA
    2 12947   @bellycard.com NA    Male   NA               NA             NA
    3 12990   @gmail.com     NA    NA     NA               NA             NA
    4 13160   @gmail.com     25-34 Male   100k-125k        Single         No
    5 13195   @gmail.com     NA    Male   75k-100k         Single         No
    6 13286   @gmail.com     NA    NA     NA               NA             NA

Please let me know if I can provide more information.

-- Scott Davis Cell: (408)826-9561 Skype ID: Scdavis61 San Jose, CA.
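No reply to this post appears in the digest. One plausible reading of the error, offered as an assumption rather than a confirmed diagnosis, is that the failing switch(class(obj), ...) call shown in the message receives an object whose class vector has two elements; objects returned by daisy() carry the classes "dissimilarity" and "dist". The sketch below reproduces the same error message with base R only, using a hand-built object with that class vector instead of a real daisy() result.

```r
# Assumption: an object with the same two-element class vector that
# cluster::daisy() returns, built here with base R only.
d <- dist(matrix(rnorm(20), nrow = 5))
class(d) <- c("dissimilarity", "dist")

# switch() needs a length-1 EXPR, so a two-element class vector fails.
msg <- tryCatch(
  switch(class(d), matrix = "matrix branch", "default branch"),
  error = function(e) conditionMessage(e)
)
msg
```

If this reading is right, clValid() would need to be given the data themselves (the branches visible in the error message are matrix and ExpressionSet) rather than the dissimilarity object. Whether it can handle mixed-type columns like the ones shown here is not something this thread settles.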
[R] HPGL or PCL plotting device? Or otherwise plotting plots
Hi, I want to print plots on a Roland DXY-1100 plotter. How can I do this from R? I think the easiest thing would be a graphics device for Printer Command Language or Hewlett-Packard Graphics Language, but I haven't managed to find any of those. Thanks Tom
Re: [R] HPGL or PCL plotting device? Or otherwise plotting plots
Oh it was easier than I thought.

    postscript('project-contracts.ps')
    hist(log(projects$n.contracts))
    dev.off()

Then run this from the shell.

    pstoedit -f plot-hpgl project-contracts.ps project-contracts.hpgl

And send it to the plotter.

On 09 Jul 13:10, Thomas Levine wrote:

Hi, I want to print plots on a Roland DXY-1100 plotter. How can I do this from R? I think the easiest thing would be a graphics device for Printer Command Language or Hewlett-Packard Graphics Language, but I haven't managed to find any of those. Thanks Tom
Re: [R] R Studio v3.0.3 for Windows 32bits is too slow
RStudio is a separate product with its own support. Post there, not here.

-- Bert

Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. Clifford Stoll

On Tue, Jul 8, 2014 at 7:34 PM, Phan, Truong Q troung.p...@team.telstra.com wrote:

Hi R'er, I have a dataset which has a matrix of 7502 x 1426 (rows x columns). The data is in a CSV format which has a size around 68Mb. This dataset is less than 10% of our dataset. I have been adopting the Anomaly detection method as described by http://www.mattpeeples.net/kmeans.html . It has been running more than 24hrs and still haven't completed the calculation. I did manage to run it with a smaller dataset (ie, 2100 rows x 1426 columns). It took around 12hrs to run. I have a few questions and need your expertise guidance. 1) Is there any better Open source tools to use to do in one tool (eg, R Studio): prepare data, build models, validate models, test models and present data. I am looking a tool which will allow me to do the same as per the above link (Matt Peeples' blog). 2) Is there an Open source tools to perform the above which will allow me to run on top of Hadoop eco-system? 3) Can we use R Studio for windows as a client to run on top of Hadoop eco-system? If yes, please point me to the site where they have a use cases or samples. Thanks and Regards, Truong Phan
Re: [R] eclat problem
Thanks Michael, everything is going perfectly right now! I didn't expect such an extensive list of itemsets given the insights on the data that I have. And the error message didn't give me the right clue. Now it is working fine! Thanks for your time! Best Regards. Alvaro.

-----Original Message----- From: Michael Hahsler [mailto:mhahs...@lyle.smu.edu] Sent: Wednesday, 09 July 2014 10:02 To: r-help@r-project.org; Alvaro Flores Subject: Re: [R] eclat problem

Hi Alvaro, this was a tricky problem. Under Windows R uses the trio library (different from the package Trio which creates very similar error messages) for printf support. arules currently contains a bug that results in an invalid format string for printf when an error message is created. For your problem below the error message should read "out of memory", but since creating the error message produces an invalid printf format string you see under Windows the internal error instead. This problem will be fixed in the next release of arules (version 1.1-4). Note however that your code still runs out of memory and you need to increase support and/or restrict the number of items in the itemsets (both with the list for parameter; see also class? ECparameter). -Michael

On 07.07.2014 22:56, Alvaro Flores wrote:

I'm working with the arules package and I'm constantly trying to mine frequent itemsets in different datasets. But recently R kept returning the same error message:

    Error in eclat(txn, parameter = list(supp = 0.001)) : internal error in trio library

It is just this particular dataset that gives me problems. Has anyone ever encountered and fixed this error? Here is an example of the transaction data set:
Here are an example of the transaction data set: items 1 {001200-3, 004100-3, 004200-5, 004500-9, 004600-5} 2 {001524-K, 002100-2} 3 {00179, 03807, 08019, 09314, 12432} 4 {002000} 5 {002600-4, 002700-0} 6 {004115-F, 02/100073A, 02/630935A, 044.1567.0, 044.1567.0/I, 1010301FA, 1012015-400-, 1117285, 1118100-201-4020, 1118105-051-M, 173171, 1903628, 1903628/I, 1903629, 1903629/I, 1907566, 1907567, 1907570, 1907571, 1931018, 2.4419.340.0, 215420/N, 2654408-N, 2992242, 2992544, 2996416, 2VC-115561, 320/04133A, 4102AZL.14.100N00, 4110Z.14.30, 4625547, 477556, 477556/O, 478736, 478736/O, 500054655, 581/18096, 6170005, 957E-6731 A, BF8T-6731 BA, BG2X-6731 CA, DBPN-6731 A, F2NN-6714 AB, LF16015, LF3000, LF3345, LF3346, LF3349, LF3806, LF4054, LF9009, RE504836, RE59754, T19044/I, TAE-115561, W-950/7, ZP520} 7 {005226, 012.0348.0, 012.0349.0, 02/910150A, 1105010E834N00, 1105020D354, 1117011-630-W, 1117025-621-, 1372444, 1393640, 1457434310001, 1521219, 1873018, 1901605, 1902134, 1902138, 1902138/I, 1907640, 1907640/I, 1908547, 1908547/I, 1930010, 19BG920-30001, 20430751, 20514654, 20976003/O, 20998367, 215460, 26560143, 26560201, 26560201/I, 2710806, 2992241, 2992241/I, 2992300, 2992662, 2992662/I, 2995711, 2997376, 2R0-127177, 2R0-127177 A, 2RD-127491, 32/401102, 32/912001A, 32/925423, 32/925760, 32/925869, 32/925915, 320/07155, 343144, 4102H.15.110, 4102H.15.110N00, 4102H.15.20, 500315480, 500315480/I, 500316868, 550228, 550228/N, 582042, 612630080011N00, 612630080087, 7146717, 8159975/O, 81BASE921, 98439681, AR50041, BC1132N01, BF0X-9155 AA, BF5T-9155 AB, BF8T-9155 DA, DDN-99162 B, DONN-9N074 BG, E5HT-9155 CA, E7HN-9155 AA, FF42000, FF5421, FF5458, FF5488, FS1000, FS1015, FS1241, FS1242, FS1280, PSD460/1, PSD970/1, R28-30M, RC45MB, RE62418, RK120MBQ2, T22VA, WK-723} 8 {005227,
[R] Revolutions blog: June 2014 Roundup
Revolution Analytics staff and guests write about R every weekday at the Revolutions blog: http://blog.revolutionanalytics.com and every month I post a summary of articles from the previous month of particular interest to readers of r-help. In case you missed them, here are some articles related to R from the month of June: The useR! 2014 conference in Los Angeles opened with 16 tutorials: http://bit.ly/1rSoqeh DataInformed published my article on how various companies use R: http://bit.ly/1rSos5L Joe Rickert reviews the new book Applied Predictive Modeling by Max Kuhn and Kjell Johnson, which is rich with examples in R and the caret package: http://bit.ly/1rSopam Hadley Wickham's new ggvis package features a new syntax to create interactive ggplot2-style graphics: http://bit.ly/1rSoqei Guest poster Wayne Smith reviews the R and Statistics presentations at the Intel International Science and Engineering Fair: http://bit.ly/1rSos5K Bank of America uses R to make mundane tables stand out, as reported in a recent FastCoLabs article: http://bit.ly/1rSoqen DataCamp created an infographic comparing SAS, R and SPSS: http://bit.ly/1rSos5M Prizes on offer for the best R graphic mapping the locations of R user groups: http://bit.ly/1rSoqeo Guy Abel used the circlize package to visualize the players in the World Cup and the location of their home teams: http://bit.ly/1rSos5N How to create a clean financial data set for backtesting using data from Quandl: http://bit.ly/1rSoqep R's popularity continues to surge, with high rankings in the latest KDNuggets poll and Redmonk language rankings: http://bit.ly/1rSos5O Analysis of movie palettes using Python and R reveals that Hollywood cinematographers prefer orange and blue: http://bit.ly/1rSoqeq There are now 141 R user groups worldwide, with recent additions in Chennai (India), Exeter (UK), Miami (FL), Durham (NH), Albany (NY) and Charlotte (NC): http://bit.ly/1rSos5Q Two more companies share how they use R: the ride-sharing 
company Uber, and CultureAmp (a people intelligence platform): http://bit.ly/1rSoquF Tutorial on constructing a term structure of interest rates with R: http://bit.ly/1rSos5R R is featured in a Dataversity article on the relevance of open source analytics for businesses: http://bit.ly/1rSoqer The China R Users Conference attracted more than 1000 attendees: http://bit.ly/1rSoquE A look at the state of the art in Deep Learning research, including the darch and deepnet packages: http://bit.ly/1rSos5U One million students have enrolled in Coursera courses based on R: http://bit.ly/1rSoquG An updated function for reading data into R from Google Spreadsheets that works with Google's current security model: http://bit.ly/1rSos5W General interest stories (not related to R) in the past month included: the illusory colors of Benham's Top (http://bit.ly/1rSos5V), an airport performance of All by Myself (http://bit.ly/1rSoquH), and beer bottle harmonies (http://bit.ly/1rSos5X). Meeting times for local R user groups (http://bit.ly/eC5YQe) can be found on the updated R Community Calendar at: http://bit.ly/bb3naW If you're looking for more articles about R, you can find summaries from previous months at http://blog.revolutionanalytics.com/roundups/. You can receive daily blog posts via email using services like blogtrottr.com, or join the Revolution Analytics mailing list at http://revolutionanalytics.com/newsletter to be alerted to new articles on a monthly basis. As always, thanks for the comments and please keep sending suggestions to me at da...@revolutionanalytics.com or via Twitter (I'm @revodavid). Cheers, # David -- David M Smith da...@revolutionanalytics.com Chief Community Officer, Revolution Analytics http://blog.revolutionanalytics.com Tel: +1 (650) 646-9523 (Chicago IL, USA) Twitter: @revodavid -- Try Enterprise R Now! 
https://aws.amazon.com/marketplace/seller-profile/ref=_ptnr_emailfooter?ie=UTF8id=3c6536d3-8115-4bc0-a713-be58e257a7be Get a 14 Day Free Trial of Revolution R Enterprise on AWS Marketplace
[R] How to include factor levels into plot title?
Hi all,

I'd like to include the levels of one of my variables in the title of a plot, concatenated into a single string, e.g. 'These are the levels: setosa, versicolor, virginica'. I've been working with the code below, but I don't get the desired result. Any suggestions would be a great help. Thanks!

    dd <- iris
    plot(dd$Sepal.Length, dd$Petal.Length,
         main = sprintf("These are the levels: %s", levels(dd$Species)))
Re: [R] Cutting hierarchical cluster tree at specific height fails
To cut the tree at a given height, the clustering algorithm must produce consistently increasing height values with no reversals. You used one of the two methods in hclust that do not do this. Note the following from the hclust manual page:

    Note however, that methods "median" and "centroid" are not leading to a monotone distance measure, or equivalently the resulting dendrograms can have so called inversions (which are hard to interpret).

And from the cutree manual page:

    Cutting trees at a given height is only possible for ultrametric trees (with monotone clustering heights).

Use a different method (but not "median").

David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Johannes Radinger
Sent: Wednesday, July 9, 2014 7:07 AM
To: R help
Subject: [R] Cutting hierarchical cluster tree at specific height fails

Hi,

I'd like to cut a hierarchical cluster tree calculated with hclust at a specific height. However, I always get the following error message:

    Error in cutree(hc, h = 60) : the 'height' component of 'tree' is not sorted (increasingly)

Here is a working example showing that the code fails when a height is specified in cutree(). In contrast, specifying the number of clusters in cutree() works. What is the exact problem and how can I solve it?

    x <- c(rnorm(100,50,10), rnorm(100,200,25), rnorm(100,80,15))
    y <- c(rnorm(100,50,10), rnorm(100,200,25), rnorm(100,150,25))
    df <- data.frame(x, y)
    plot(df)
    hc <- hclust(dist(df, method = "euclidean"), method = "centroid")
    plot(hc)
    df$memb <- cutree(hc, h = 60)  # this does not work
    df$memb <- cutree(hc, k = 3)   # this works!
    plot(df$x, df$y, col = df$memb)

Thank you for your hints!
Best regards,
Johannes
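As a minimal sketch of the advice above, the same simulated data can be clustered with a monotone linkage and then cut at a height. The choice of "average" linkage and the seed are assumptions made only for illustration; any method other than "centroid" and "median" should behave the same way with respect to cutree(h = ...).

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

x <- c(rnorm(100, 50, 10), rnorm(100, 200, 25), rnorm(100, 80, 15))
y <- c(rnorm(100, 50, 10), rnorm(100, 200, 25), rnorm(100, 150, 25))
df <- data.frame(x, y)
d <- dist(df, method = "euclidean")

hc_centroid <- hclust(d, method = "centroid")
hc_average  <- hclust(d, method = "average")

# cutree(h = ...) needs non-decreasing merge heights;
# is.unsorted() reports whether a height ever decreases (an inversion).
is.unsorted(hc_centroid$height)  # may be TRUE for centroid linkage
is.unsorted(hc_average$height)   # FALSE: average linkage is monotone

# Cutting at a height now works
memb <- cutree(hc_average, h = 60)
table(memb)
```

Checking is.unsorted() on the height component before calling cutree() is a cheap way to see in advance whether a cut by height can succeed.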
Re: [R] How to include factor levels into plot title?
How about:

    plot(dd$Sepal.Length, dd$Petal.Length,
         main = paste("These are the levels:",
                      paste(levels(dd$Species), collapse = ", ")))

Thanks for the actual reproducible example!

Sarah

On Wed, Jul 9, 2014 at 11:24 AM, Bea GD aguitatie...@hotmail.com wrote:
> I'd like to include the levels of one of my variables in the title of a plot. I'd like these factor levels to be concatenated. [...]

--
Sarah Goslee
http://www.functionaldiversity.org
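A small sketch of why the sprintf() call in the question does not give a single title: sprintf() is vectorised, so three levels produce three strings, whereas collapsing the levels first gives one. The variable names below are illustrative only.

```r
lev <- levels(iris$Species)

# sprintf() is vectorised over its arguments: one string per level
per_level <- sprintf("These are the levels: %s", lev)
length(per_level)

# Collapsing the levels first yields a single title string
title_paste <- paste("These are the levels:", paste(lev, collapse = ", "))

# Equivalent with sprintf(), using toString() to do the collapsing
title_sprintf <- sprintf("These are the levels: %s", toString(lev))

title_paste
```

Either single string can then be passed as the main argument of plot().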
Re: [R] eclat problem
Hi Alvaro, this was a tricky problem. Under Windows R uses the trio library (different from the package Trio which creates very similar error messages) for printf support. arules currently contains a bug that results in an invalid format string for printf when an error message is created. For your problem below the error message should read out of memory, but since creating the error message produces an invalid printf format string you see under Windows the internal error instead. This problem will be fixed in the next release of arules (version 1.1-4). Note however that your code still runs out of memory and you need to increase support and/or restrict the number of items in the itemsets (both with the list for parameter; see also class? ECparameter). -Michael On 07.07.2014 22:56, Alvaro Flores wrote: I'm working with arule packages and I'm constantly trying to mine frequent itemsets in different datasets. But recently R kept returning the same error message : Error in eclat(txn, parameter = list(supp = 0.001)) : internal error in trio library Is just this particular dataset that gives me problems. Anyone has ever passed and fixed this error? 
Here are an example of the transaction data set: items 1 {001200-3, 004100-3, 004200-5, 004500-9, 004600-5} 2 {001524-K, 002100-2} 3 {00179, 03807, 08019, 09314, 12432} 4 {002000} 5 {002600-4, 002700-0} 6 {004115-F, 02/100073A, 02/630935A, 044.1567.0, 044.1567.0/I, 1010301FA, 1012015-400-, 1117285, 1118100-201-4020, 1118105-051-M, 173171, 1903628, 1903628/I, 1903629, 1903629/I, 1907566, 1907567, 1907570, 1907571, 1931018, 2.4419.340.0, 215420/N, 2654408-N, 2992242, 2992544, 2996416, 2VC-115561, 320/04133A, 4102AZL.14.100N00, 4110Z.14.30, 4625547, 477556, 477556/O, 478736, 478736/O, 500054655, 581/18096, 6170005, 957E-6731 A, BF8T-6731 BA, BG2X-6731 CA, DBPN-6731 A, F2NN-6714 AB, LF16015, LF3000, LF3345, LF3346, LF3349, LF3806, LF4054, LF9009, RE504836, RE59754, T19044/I, TAE-115561, W-950/7, ZP520} 7 {005226, 012.0348.0, 012.0349.0, 02/910150A, 1105010E834N00, 1105020D354, 1117011-630-W, 1117025-621-, 1372444, 1393640, 1457434310001, 1521219, 1873018, 1901605, 1902134, 1902138, 1902138/I, 1907640, 1907640/I, 1908547, 1908547/I, 1930010, 19BG920-30001, 20430751, 20514654, 20976003/O, 20998367, 215460, 26560143, 26560201, 26560201/I, 2710806, 2992241, 2992241/I, 2992300, 2992662, 2992662/I, 2995711, 2997376, 2R0-127177, 2R0-127177 A, 2RD-127491, 32/401102, 32/912001A, 32/925423, 32/925760, 32/925869, 32/925915, 320/07155, 343144, 4102H.15.110, 4102H.15.110N00, 4102H.15.20, 500315480, 500315480/I, 500316868, 550228, 550228/N, 582042, 612630080011N00, 612630080087, 7146717, 8159975/O, 81BASE921, 98439681, AR50041, BC1132N01, BF0X-9155 AA, BF5T-9155 AB, BF8T-9155 DA, DDN-99162 B, DONN-9N074 BG, E5HT-9155 CA, E7HN-9155 AA, FF42000, FF5421, FF5458, FF5488, FS1000, FS1015, FS1241, FS1242, FS1280, PSD460/1, PSD970/1, R28-30M, RC45MB, RE62418, RK120MBQ2, T22VA, WK-723} 8 {005227, 2641311, 2641371, 2641406, 2641725, 2641729, 2641808, 376518, 4757883, 72013, 72061, 8190393, 9986316, D8NN-9350 AA, DDN-9350, RE42211} 9 {0055, 0087, 0482, 0484, 0531, 11329, 8311} 10 
{007.0762.0/40, 014.0428.0, 1114036, 1118369, 1118375, 1118376, 1118377, 1118379, 1305546, 1312934, 1677591, 1677592, 1677593, 2.1539.130.0, 2.1539.259.0, 20515059/C, 275092/C, 275636/C, 2RD-107124, 31358393-G, 3135X031, 3135X063, 4622074, 4622074/G, 4742199, 4742202, 4770623, 4803030/G, 500337911,
Re: [R] R Studio v3.0.3 for Windows 32bits is too slow
Grumpy today, Bert? While it is a fact that RStudio is a separate tool from R, it is clear from the question that the OP is interested in capabilities that R is providing and he simply cannot tell the difference.

OP:

1) "Better" is a word that leads to pointless arguments. You will have to be the judge of what works for you. I caution you that Open Source tools almost always achieve success by interoperating with other OS tools, and much of the success you have already obtained is the result of many contributions, of which R and its contributed packages deserve the lion's share of credit. RStudio is a very convenient editor that makes using R and LaTeX and Markdown and version control easier, but it is unlikely that either the blame for your dissatisfaction or the credit for your success should be attributed to RStudio. I have successfully used all sorts of plain text editors and command line interfaces with R, and if you plan to scale up your projects then you will likely want to be very clear on this distinction between editors and computing tools so you can distribute your work on multiple parallel servers (where editors may not necessarily even be helpful) even if you choose to use RStudio as your controlling environment for launching such tasks.

2) and 3) I know that R has contributed packages that can manage Hadoop data processing, but I have no personal experience with them. Google is your friend... especially if you keep in mind that these tools are not all found in one monolithic package.

For future reference: this is a plain text mailing list, so please adjust your mail client appropriately when sending to this list. Also, there are considerable resources mentioned in the Posting Guide that you should be aware of... see the link below.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On July 9, 2014 7:10:00 AM PDT, Bert Gunter gunter.ber...@gene.com wrote:
> RStudio is a separate product with its own support. Post there, not here.
>
> -- Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
> Clifford Stoll
>
> On Tue, Jul 8, 2014 at 7:34 PM, Phan, Truong Q troung.p...@team.telstra.com wrote:
>> Hi R'er,
>>
>> I have a dataset which is a matrix of 7502 x 1426 (rows x columns). The data is in CSV format, with a size of around 68Mb. This dataset is less than 10% of our full dataset. I have been adopting the anomaly detection method described at http://www.mattpeeples.net/kmeans.html . It has been running for more than 24hrs and still hasn't completed the calculation. I did manage to run it with a smaller dataset (i.e., 2100 rows x 1426 columns). It took around 12hrs to run.
>>
>> I have a few questions and need your expert guidance.
>> 1) Are there any better Open source tools that do everything in one tool (e.g., R Studio): prepare data, build models, validate models, test models and present data? I am looking for a tool which will allow me to do the same as in the above link (Matt Peeples' blog).
>> 2) Are there Open source tools to perform the above which will allow me to run on top of the Hadoop eco-system?
>> 3) Can we use R Studio for Windows as a client to run on top of the Hadoop eco-system? If yes, please point me to a site with use cases or samples.
>>
>> Thanks and Regards,
>> Truong Phan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival Analysis with an Historical Control
The code is actually available at the websites you provide. Try "View page source" in your browser. The most cryptic code isn't needed because the math functions (e.g., the incomplete gamma function) are available in R.

-----Original Message-----
From: Paul Miller [mailto:pjmiller...@yahoo.com]
Sent: Tuesday, July 08, 2014 12:00 PM
To: r-help@r-project.org
Subject: [R] Survival Analysis with an Historical Control

Hello All,

I'm trying to figure out how to perform a survival analysis with an historical control. I've spent some time looking online and in my books but haven't found much showing how to do this. I was wondering if there is an R package that can do it, or if there are resources somewhere that show the actual steps one takes, or if some knowledgeable person might be willing to share some code.

Here is a statement that describes the sort of analysis I'm being asked to do.

A one-sample parametric test assuming an exponential form of survival was used to test the hypothesis that the treatment produces a median PFS no greater than the historical control PFS of 16 weeks. A sample median PFS greater than 20.57 weeks would fall beyond the critical value associated with the null hypothesis, and would be considered statistically significant at alpha = .05, 1 tailed.

My understanding is that the cutoff of 20.57 weeks was obtained using an online calculator that can be found at:

http://www.swogstat.org/stat/public/one_survival.htm

Thus far, I've been unable to determine what values were plugged into the calculator to get the cutoff.

There's another calculator for a nonparametric test that can be found at:

http://www.swogstat.org/stat/public/one_nonparametric_survival.htm

It would be nice to try doing this using both a parametric and a non-parametric model.

So my first question would be whether the approach outlined above is valid or if the analysis should be done some other way. If the basic idea is correct, is it relatively easy (for a Terry Therneau type genius) to implement the whole thing using R? The calculator is a great tool, but, if reasonable, it would be nice to be able to look at some code to see how the numbers actually get produced.

Below are some sample survival data and code in case this proves helpful.

Thanks,

Paul

    ### Example Data: GD2 Vaccine ###

    connection <- textConnection("
    GD2  1   8 12   GD2  3 -12 10   GD2  6 -52  7   GD2  7  28 10
    GD2  8  44  6   GD2 10  14  8   GD2 12   3  8   GD2 14 -52  9
    GD2 15  35 11   GD2 18   6 13   GD2 20  12  7   GD2 23  -7 13
    GD2 24 -52  9   GD2 26 -52 12   GD2 28  36 13   GD2 31 -52  8
    GD2 33   9 10   GD2 34 -11 16   GD2 36 -52  6   GD2 39  15 14
    GD2 40  13 13   GD2 42  21 13   GD2 44 -24 16   GD2 46 -52 13
    GD2 48  28  9   GD2  2  15  9   GD2  4 -44 10   GD2  5  -2 12
    GD2  9   8  7   GD2 11  12  7   GD2 13 -52  7   GD2 16  21  7
    GD2 17  19 11   GD2 19   6 16   GD2 21  10 16   GD2 22 -15  6
    GD2 25   4 15   GD2 27  -9  9   GD2 29  27 10   GD2 30   1 17
    GD2 32  12  8   GD2 35  20  8   GD2 37 -32  8   GD2 38  15  8
    GD2 41   5 14   GD2 43  35 13   GD2 45  28  9   GD2 47   6 15
    ")

    hsv <- data.frame(scan(connection, list(VAC="", PAT=0, WKS=0, X=0)))
    # Negative WKS values mark censored observations
    hsv <- transform(hsv, CENS=ifelse(WKS < 1, 1, 0), WKS=abs(WKS))
    head(hsv)

    require(survival)

    survObj <- Surv(hsv$WKS, hsv$CENS==0) ~ 1

    km <- survfit(survObj, type=c("kaplan-meier"))
    print(km)

    paraExp <- survreg(survObj, dist="exponential")
    print(paraExp)
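To illustrate the point that the needed math functions are already in R, here is a hedged sketch of the textbook one-sample exponential calculation. It is not taken from the SWOG calculator and is not claimed to reproduce the 20.57-week cutoff; the function name crit_median and the event counts passed to it are hypothetical. It assumes d observed events and uses the fact that, under exponential survival with no (or Type II) censoring, 2 * lambda0 * (total time at risk) follows a chi-square distribution with 2d degrees of freedom; with random censoring this is only an approximation.

```r
# Smallest sample median that would be significant at one-sided level alpha
# when testing H0: true median <= m0 under exponential survival.
# d is the number of observed events (a hypothetical input here).
crit_median <- function(m0, d, alpha = 0.05) {
  # Estimated median = log(2) * total_time / d, and under H0
  # 2 * (log(2) / m0) * total_time ~ chi-square(2d), hence:
  m0 * qchisq(1 - alpha, df = 2 * d) / (2 * d)
}

# Historical control median of 16 weeks, for a few illustrative event counts
events  <- c(20, 40, 80)
cutoffs <- sapply(events, function(d) crit_median(16, d))
data.frame(events, cutoffs)
```

The cutoff always lies above the historical median and moves toward it as the number of events grows, which is one way to sanity-check whatever inputs were given to the online calculator.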
Re: [R] reorder a list
Is the following 'g' what you want? A better example might be with

    A2a <- lapply(A1, function(x) x + seq_along(x)/(100*length(x)))

    g <- function (x, y) {
        xLengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
        yLengths <- vapply(y, FUN = length, FUN.VALUE = 0L)
        stopifnot(identical(xLengths, yLengths))
        split(unlist(y, use.names = FALSE), unlist(x, use.names = FALSE))
    }

Used as

    g(A1, A2)
    $`1`
    [1] 2.718282
    $`2`
    [1] 7.389056 7.389056
    $`3`
    [1] 20.08554
    $`4`
    [1] 54.59815 54.59815 54.59815
    $`5`
    [1] 148.4132 148.4132
    $`13`
    [1] 442413.4
    $`23`
    [1] 9744803446

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jul 9, 2014 at 2:04 AM, Lorenzo Alfieri alfio...@hotmail.com wrote:
> Thanks Bill and the other guys for the variety of useful replies! In fact I'm working with pretty big lists (with ~35000 sublists) and Bill's solution is the fastest one in terms of computing time.
> Now comes the second part of the question... :-)
> I have my usual list of values and time indices to sort:
>
>     A1<-list(c(1:4),c(2,4,5),23,c(4,5,13))
>
> and then another list A2 with variables which have to be paired with the values of A1:
>
>     A2<-sapply(A1, exp)  # (in my case there's no exp relation between A1 and A2, they're completely uncorrelated. That's just an example)
>     A2
>     [[1]]
>     [1]  2.718282  7.389056 20.085537 54.598150
>     [[2]]
>     [1]   7.389056  54.598150 148.413159
>     [[3]]
>     [1] 9744803446
>     [[4]]
>     [1]     54.59815    148.41316 442413.39201
>
> Now I'd like to reorder the elements of A2 according to the same rule applied for A1:
>
>     f <- function (x) {
>         lengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
>         split(rep(seq_along(x), lengths), unlist(x, use.names = FALSE))
>     }
>     B1<-f(A1)
>
> and thus obtain a list B2 which looks like this:
>
>     B2
>     $`1`
>     [1] 2.718282
>     $`2`
>     [1] 7.389056 7.389056
>     $`3`
>     [1] 20.08554
>     $`4`
>     [1] 54.59815 54.59815 54.59815
>     $`5`
>     [1] 148.4132 148.4132
>     $`13`
>     [1] 442413.4
>     $`23`
>     [1] 9744803446
>
> (In this example each element is the exp() of the sublist name, but in a general case they would be uncorrelated, and the resulting elements of each sublist would be different.)
> Any idea?
> Alfio
>
> From: wdun...@tibco.com
> Date: Tue, 8 Jul 2014 12:11:09 -0700
> Subject: Re: [R] reorder a list
> To: alfio...@hotmail.com
> CC: r-help@r-project.org
>
>>     f <- function (x) {
>>         lengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
>>         split(rep(seq_along(x), lengths), unlist(x, use.names = FALSE))
>>     }
>>     f(A1)  # gives about what you want (has, e.g., name 23, not position 23, in output)
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Tue, Jul 8, 2014 at 9:39 AM, Lorenzo Alfieri alfio...@hotmail.com wrote:
>>> Hi,
>>> I'm trying to find a way to reorder the elements of a list. Let's say I have a list like this:
>>>
>>>     A1<-list(c(1:4),c(2,4,5),23,c(4,5,13))
>>>     A1
>>>     [[1]]
>>>     [1] 1 2 3 4
>>>     [[2]]
>>>     [1] 2 4 5
>>>     [[3]]
>>>     [1] 23
>>>     [[4]]
>>>     [1]  4  5 13
>>>
>>> All the elements included in it are values, while each sublist is a time index. Now, I'd like to reorder the list (without looping) so as to obtain one sublist for each value, which includes all the time indices where each value appears. In other words, the result should look like this:
>>>
>>>     A2
>>>     [[1]]
>>>     [1] 1
>>>     [[2]]
>>>     [1] 1 2  # because value 2 appears in the time index [[1]] and [[2]] of A1
>>>     [[3]]
>>>     [1] 1
>>>     [[4]]
>>>     [1] 1 2 4
>>>     [[5]]
>>>     [1] 2 4
>>>     [[13]]
>>>     [1] 4
>>>     [[23]]
>>>     [1] 3
>>>
>>> Any suggestion?
>>> Thanks
>>> Alfio
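The two functions from this thread can be run side by side on the small example. This is only a sketch for checking the behaviour: f() inverts the list (value to time indices) and g() regroups a companion list by the same keys, here using the "better example" A2a so that companion values are distinguishable.

```r
A1  <- list(1:4, c(2, 4, 5), 23, c(4, 5, 13))
A2a <- lapply(A1, function(x) x + seq_along(x) / (100 * length(x)))

# Value -> time indices at which the value occurs
f <- function(x) {
  lengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
  split(rep(seq_along(x), lengths), unlist(x, use.names = FALSE))
}

# Regroup the elements of y by the matching values in x
g <- function(x, y) {
  xLengths <- vapply(x, FUN = length, FUN.VALUE = 0L)
  yLengths <- vapply(y, FUN = length, FUN.VALUE = 0L)
  stopifnot(identical(xLengths, yLengths))  # x and y must line up element by element
  split(unlist(y, use.names = FALSE), unlist(x, use.names = FALSE))
}

B1 <- f(A1)       # e.g. B1[["4"]] holds the time indices 1, 2 and 4
B2 <- g(A1, A2a)  # e.g. B2[["4"]] holds the three companion values for key 4
str(B1)
str(B2)
```

Because both functions split on the same unlisted keys, B1 and B2 share names and element counts, so the time index and companion value at the same position belong together.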
Re: [R] reorder a list
Thanks for the suggestion. I found that I get the result I wanted with this simple command:

    split(unlist(A2), unlist(A1))
    $`1`
    [1] 2.718282
    $`2`
    [1] 7.389056 7.389056
    $`3`
    [1] 20.08554
    $`4`
    [1] 54.59815 54.59815 54.59815
    $`5`
    [1] 148.4132 148.4132
    $`13`
    [1] 442413.4
    $`23`
    [1] 9744803446

which is indeed very similar to Bill's solution.

Alfio

From: wdun...@tibco.com
Date: Wed, 9 Jul 2014 09:26:14 -0700
Subject: Re: [R] reorder a list
To: alfio...@hotmail.com
CC: r-help@r-project.org

> Is the following 'g' what you want? [...]
[R] symbols in a data frame
Hello,

I have recently received a dataset from a metal analysis company. The dataset is filled with less-than symbols ("<"). What I am looking for is an efficient way to subset for any whole numbers from the dataset. The column is automatically formatted as a factor because of the symbols, making it difficult to deal with the numbers in a useful way. So in sum, any ideas on how I could subset the example below for only whole numbers?

Thanks in advance!

Sam

    #code
    metals <- structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L,
    8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L),
    .Label = c("Antimony", "Arsenic", "Barium", "Beryllium",
    "Boron (Hot Water Soluble)", "Cadmium", "Chromium", "Cobalt", "Copper",
    "Lead", "Mercury", "Molybdenum", "Nickel", "pH 1:2", "Selenium",
    "Silver", "Thallium", "Tin", "Vanadium", "Zinc"), class = "factor"),
    Cedar.Creek = structure(c(3L, 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L,
    7L, 3L, 7L, 3L, 45L, 4L, 4L, 3L), .Label = c("<1", "<10", "<100",
    "<1000", "<200", "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07",
    "1.1", "1.4", "1.5", "137", "154", "163", "165", "169", "178", "2.3",
    "2.4", "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50",
    "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72", "77",
    "89", "951"), class = "factor")), .Names = c("Parameter", "Cedar.Creek"),
    row.names = c(NA, 19L), class = "data.frame")
Re: [R] symbols in a data frame
Hi Sam,

I'd take the similar tack of removing the "<" instead. Note that if you import the data frame using the stringsAsFactors=FALSE argument, you don't need the first step.

    metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
    metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
    metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)

    R> str(metals)
    'data.frame':   19 obs. of  2 variables:
     $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
     $ Cedar.Creek: num  100 100 500 100 10 1000 100 516 550 10 ...

Sarah

On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers tonightstheni...@gmail.com wrote:
> I have recently received a dataset from a metal analysis company. The dataset is filled with less than symbols. [...] So in sum any ideas on how I could subset the example below for only whole numbers? [...]

--
Sarah Goslee
http://www.functionaldiversity.org
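The remark about stringsAsFactors=FALSE can be sketched as follows. The three-row CSV text and the object name metals_chr are made up for illustration; with a real file, the file path would replace the text argument.

```r
# Hypothetical CSV content standing in for the lab's file
csv_text <- "Parameter,Cedar.Creek\nAntimony,<100\nCopper,516\nCadmium,<10\n"

# Import as character rather than factor, so no as.character() step is needed
metals_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
raw_class  <- class(metals_chr$Cedar.Creek)

# Strip the literal "<" and convert to numeric
metals_chr$Cedar.Creek <- as.numeric(gsub("<", "", metals_chr$Cedar.Creek, fixed = TRUE))
str(metals_chr)
```

Using fixed = TRUE treats "<" as a literal character rather than a regular expression, which makes no difference here but avoids surprises with other symbols.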
Re: [R] symbols in a data frame
On Jul 9, 2014, at 12:19 PM, Sam Albers tonightstheni...@gmail.com wrote:
> I have recently received a dataset from a metal analysis company. The dataset is filled with less than symbols. [...] So in sum any ideas on how I could subset the example below for only whole numbers? [...]

Sam,

You can use ?gsub to remove the '<' characters from the column and then use ?subset to select the records you wish. Note that gsub() returns a character vector, so you want to coerce to numeric.

    > as.numeric(gsub("<", "", metals$Cedar.Creek))
     [1]  100  100  500  100   10 1000  100  516  550   10  200  500  100
    [14]  500  100  951 1000 1000  100

For example:

    > subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) == 100)
       Parameter Cedar.Creek
    1   Antimony        <100
    2    Arsenic        <100
    4  Beryllium        <100
    7     Cobalt        <100
    13  Selenium        <100
    15  Thallium        <100
    19  Antimony        <100

    > subset(metals, as.numeric(gsub("<", "", Cedar.Creek)) <= 500)
        Parameter Cedar.Creek
    1    Antimony        <100
    2     Arsenic        <100
    3      Barium        <500
    4   Beryllium        <100
    5     Cadmium         <10
    7      Cobalt        <100
    10    Mercury         <10
    11 Molybdenum        <200
    12     Nickel        <500
    13   Selenium        <100
    14     Silver        <500
    15   Thallium        <100
    19   Antimony        <100

You can also just create a new column that is numeric and go from there:

    metals$CC.Num <- as.numeric(gsub("<", "", metals$Cedar.Creek))

    > str(metals)
    'data.frame':   19 obs. of  3 variables:
     $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
     $ Cedar.Creek: Factor w/ 45 levels "<1","<10","<100",..: 3 3 7 3 2 4 3 34 36 2 ...
     $ CC.Num     : num  100 100 500 100 10 1000 100 516 550 10 ...

    > metals
        Parameter Cedar.Creek CC.Num
    1    Antimony        <100    100
    2     Arsenic        <100    100
    3      Barium        <500    500
    4   Beryllium        <100    100
    5     Cadmium         <10     10
    6    Chromium       <1000   1000
    7      Cobalt        <100    100
    8      Copper         516    516
    9        Lead         550    550
    10    Mercury         <10     10
    11 Molybdenum        <200    200
    12     Nickel        <500    500
    13   Selenium        <100    100
    14     Silver        <500    500
    15   Thallium        <100    100
    16        Tin         951    951
    17   Vanadium       <1000   1000
    18       Zinc       <1000   1000
    19   Antimony        <100    100

Regards,

Marc Schwartz
Re: [R] symbols in a data frame
Well, ?grep and ?regex are clearly apropos here -- dealing with character data is an essential skill for handling input from diverse sources with various formatting conventions. I suggest you go through one of the many regular expression tutorials on the web to learn more.

But this may not be the important issue here at all. If <k means the value is left censored at k -- i.e. we know it's less than k but not how much less -- then Sarah's proposal is not what you want to do. Exactly what you do want to do depends on context, and as it concerns statistical methodology, is not something that should be discussed here. Consult a local statistician if this is a correct guess. Otherwise ignore.

... and please post in plain text in future (as requested), as HTML can get garbled.

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
> Hi Sam,
>
> I'd take the similar tack of removing the < instead. Note that if you import the data frame using the stringsAsFactors=FALSE argument, you don't need the first step.
>
>   metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
>   metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
>   metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
>
>   R> str(metals)
>   'data.frame': 19 obs. of 2 variables:
>    $ Parameter  : Factor w/ 20 levels "Antimony","Arsenic",..: 1 2 3 4 6 7 8 9 10 11 ...
>    $ Cedar.Creek: num 100 100 500 100 10 1000 100 516 550 10 ...
>
> Sarah
>
> On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers tonightstheni...@gmail.com wrote:
>> Hello,
>>
>> I have recently received a dataset from a metal analysis company. The dataset is filled with less-than symbols. What I am looking for is an efficient way to subset for any whole numbers from the dataset. The column is automatically formatted as a factor because of the symbols, making it difficult to deal with the numbers in a useful way. So in sum, any ideas on how I could subset the example below for only whole numbers?
>>
>> Thanks in advance!
>>
>> Sam
>>
>> #code
>> metals <- structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label = c("Antimony", "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)", "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury", "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium", "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L, 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L, 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200", "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4", "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4", "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50", "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72", "77", "89", "951"), class = "factor")), .Names = c("Parameter", "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
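Sarah's first step (as.character() before as.numeric()) matters because as.numeric() applied directly to a factor returns the internal level codes rather than the printed values. A minimal sketch of the difference follows; the four-element factor is made up for illustration and is not taken from the metals data.

```r
# Hypothetical toy column with the same kind of "<" prefixes
f <- factor(c("<100", "516", "<10", "516"))

# Directly converting a factor gives the integer level codes (1, 2, 3, ...),
# not the numbers that are printed
codes <- as.numeric(f)

# Convert to character first, strip the literal "<", then convert
vals <- as.numeric(gsub("<", "", as.character(f), fixed = TRUE))

print(codes)
print(vals)
```

Using fixed = TRUE treats "<" as a literal character rather than a regular expression, which is all that is needed here. Note that the stripped values are detection limits, not measurements, which is the point Bert raises above.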
Re: [R] How to include factor levels into plot title?
@ Sarah

Thanks a lot, paste does the job perfectly!

On 09/07/2014 17:46, Sarah Goslee wrote:
> How about:
>
>   plot(dd$Sepal.Length, dd$Petal.Length,
>        main=paste("These are the levels:", paste(levels(dd$Species), collapse=", ")))
>
> Thanks for the actual reproducible example!
>
> Sarah
>
> On Wed, Jul 9, 2014 at 11:24 AM, Bea GD aguitatie...@hotmail.com wrote:
>> Hi all,
>>
>> I'd like to include the levels of one of my variables in the title of a plot. I'd like these factor levels to be concatenated, e.g. 'These are the levels: setosa, versicolor, virginica'. I've been working with this code but I don't get the desired results. Any suggestions would be a great help. Thanks!
>>
>>   dd <- iris
>>   plot(dd$Sepal.Length, dd$Petal.Length,
>>        main=sprintf("These are the levels: %s", levels(dd$Species)))
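For anyone wondering why the sprintf() attempt did not give a single title: sprintf() is vectorised over its arguments, so a three-level factor yields three strings, whereas paste(..., collapse=) joins them into one. A short sketch using the same iris data; the variable names are only for illustration.

```r
lv <- levels(iris$Species)

# sprintf() recycles the format over each level: three separate strings
per_level <- sprintf("These are the levels: %s", lv)

# Collapsing first gives one string, as in Sarah's answer
title1 <- paste("These are the levels:", paste(lv, collapse = ", "))

# toString() is a base R shortcut for paste(x, collapse = ", ")
title2 <- sprintf("These are the levels: %s", toString(lv))

print(length(per_level))
print(title1)
```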
Re: [R] symbols in a data frame
Thanks for all the responses. It is sometimes difficult to outline exactly what you need. These responses were helpful to get there. Speaking to Bert's point a bit, I needed a column to identify where the < symbol was used. If I knew more about R I think I might be embarrassed to post my solution to that problem, but here is how I used Sarah's solution and still kept the info about detection limits. I'm sure there is a more elegant way:

  metals <- structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label = c("Antimony", "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)", "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury", "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium", "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L, 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L, 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200", "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4", "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4", "22", "24", "244", "27.2", "274", "3", "3.1", "40.2", "43", "50", "516", "53.3", "550", "569", "65", "66.1", "68", "7.6", "72", "77", "89", "951"), class = "factor")), .Names = c("Parameter", "Cedar.Creek"), row.names = c(NA, 19L), class = "data.frame")

  # keep a copy of the original factor column, including the "<" prefixes
  metals$temp1 <- metals$Cedar.Creek
  metals$Cedar.Creek <- as.character(metals$Cedar.Creek)
  metals$Cedar.Creek <- gsub("<", "", metals$Cedar.Creek)
  metals$Cedar.Creek <- as.numeric(metals$Cedar.Creek)
  # TRUE where the original label had no "<" in front of it
  metals$temp2 <- metals$temp1 == metals$Cedar.Creek
  metals$Detection <- factor(ifelse(metals$temp2 == TRUE, "Measured", "Limit"))
  metals[, c(1, 2, 5)]

Thanks again!

Sam
Re: [R] HPGL or PCL plotting device? Or otherwise plotting plots
Actually, this doesn't _quite_ do what I want; I want different R colors (1, 2, 3, &c.) to select different pens in HPGL (SP1, SP2, SP3, &c.), but the HPGL file I get selects only pen 1.

A hacky way to do this would be to generate a few different postscript files for the different colors on the plot, create the corresponding HPGL files, edit the SP command in each of them, and concatenate them. But maybe there's a better way?

On 09 Jul 13:32, Thomas Levine wrote:
> Oh it was easier than I thought.
>
>   postscript('project-contracts.ps')
>   hist(log(projects$n.contracts))
>   dev.off()
>
> Then run this from the shell.
>
>   pstoedit -f plot-hpgl project-contracts.ps project-contracts.hpgl
>
> And send it to the plotter.
>
> On 09 Jul 13:10, Thomas Levine wrote:
>> Hi,
>>
>> I want to print plots on a Roland DXY-1100 plotter. How can I do this from R? I think the easiest thing would be a graphics device for Printer Command Language or Hewlett-Packard Graphics Language, but I haven't managed to find any of those.
>>
>> Thanks
>>
>> Tom
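The R half of the "hacky way" described above could be sketched roughly as follows: one PostScript file per colour, all drawn on the same coordinate system so the layers line up. The data, the grouping variable and the file names are invented for illustration; converting each file with pstoedit and editing its SP command would still have to be done afterwards, outside R.

```r
# Hypothetical example data: three groups that would be drawn in R colours 1, 2, 3
set.seed(42)
x   <- runif(30)
y   <- runif(30)
grp <- rep(1:3, each = 10)

out_dir  <- tempdir()
ps_files <- character(0)

for (k in sort(unique(grp))) {
  f <- file.path(out_dir, sprintf("layer-pen%d.ps", k))
  postscript(f)
  # Set up identical axes in every layer from the full data,
  # but draw the axes and labels only in the first file
  plot(x, y, type = "n", axes = (k == 1), ann = (k == 1))
  points(x[grp == k], y[grp == k])
  dev.off()
  ps_files <- c(ps_files, f)
}

print(basename(ps_files))
```

Each file contains only the marks meant for one pen, so nothing in it needs to carry colour information.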
[R] Cansisc: Error in eigen(eHe, symmetric = TRUE)
Hi,

I have a problem using the function candisc from the candisc package.

  bosques1 <- read.csv("bosques1.csv", header=TRUE, encoding="latin1")
  bosques1 <- na.exclude(bosques1)
  attach(bosques1)

  # Regression model
  mod <- lm(cbind(biomasa, altdosel, altsoto, cobertura, riqarb, elevacion, temperatura, precipitacion, Acanthaceae, Apocinaceae, Araceae, Araliaceae, Arecaceae, Aspleniaceae, Begoniaceae, Blechnaceae, Bromeliaceae, Clusiaceae, Cyclanthaceae, Davalliaceae, Denstaedtiaceae, Dryopteridaceae, Ericaceae, Gesneriaceae, Hymenophyllaceae, indet., Lauraceae, Lomariopsidaceae, Lycopodiaceae, Melastomataceae, Moraceae, Myrsinaceae, Ophioglossaceae, Orchidaceae, Peperomia, Piperaceae, Poaceae, Polypodiaceae, Primulaceae, Pteridaceae, Pteridophyta.taxa, Rubiaceae, Vittariaceae) ~ sitio, data=bosques1)
  summary(mod)

  # Plot 1
  can <- candisc(mod, term="sitio", data=bosques1, ndim=1, eig=T)  ### The error happens here, so I cannot run the plot.
  plot(can, titles.1d = c("Puntuación canónica", "Estructura"))
  summary(can, means = FALSE, scores = TRUE, coef = c("std"), digits = 2)

The error is:

  Error in eigen(eHe, symmetric = TRUE) : infinite or missing values in 'x'
  In addition: Warning message:
  In sqrt(wmd) : NaNs produced

Please help!

--
Maria Judith Carmona Higuita.
Biology student - Universidad de Antioquia
Medellín - Colombia

"Happiness happens when you fit into your life, when you fit so harmoniously that whatever you do is a joy to you. Suddenly you will know it, and meditation will follow you. If you love the work you do, if you love the way you live, then you are already meditating and nothing can distract you." Osho
[R] using match to obtain non-sorted index values from non-sorted vector
Hello all,

I've been struggling with the best way to find index values from a large vector with elements that will match elements of a subset vector [the table argument in match()]. BUT the index values can't come out sorted (as we'd get in which(X %in% Y)). My 'population' vector can't be sorted.

  pop.df <- data.frame(pop=c(1,6,4,3,10))

The subset:

  Tset = c(10,3,6)

So I'd like to get these index values (from pop.df), in this order: 5,4,2

If it could be sorted I could use:

  which(sort(pop.df$pop) %in% sort(Tset))

But sorting will cause more grief later, so best not mess with it. Here is my hopefully adequate MWE of a solution. I'm keen to see if anybody has a better suggestion.

Thanks!

  ###BEGIN R
  # pop is the full set of values, it has no info on their ranking
  # I don't want to sort these data. They need to remain in this order.
  pop.df <- data.frame(pop=c(1,6,4,3,10))

  # rank.df is my dataframe that tells me the top three rankings (derived elsewhere)
  rank.df <- data.frame(rank=1:3, Tset = c(10,3,6))  # Target set

  # match.df will be my source of row index based on rank
  match.df <- data.frame(match.vec=match(pop.df$pop, table=rank.df$Tset), index.vec=1:nrow(pop.df))

  # rank.df will now include the index location in pop.df where I can find the top three ranks.
  rank.df <- merge(rank.df, match.df, by.x='rank', by.y='match.vec')
  rank.df
  ###END

___
Michael Folkes
Salmon Stock Assessment
Canadian Dept. of Fisheries & Oceans
Pacific Biological Station
3190 Hammond Bay Rd.
Nanaimo, B.C., Canada V9T-6N7
Ph (250) 756-7264 Fax (250) 756-7053
michael.fol...@dfo-mpo.gc.ca
Re: [R] using match to obtain non-sorted index values from non-sorted vector
There may be a faster way, but

  sapply(Tset, function(x) which(pop.df$pop==x))
  [1] 5 4 2

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
Re: [R] symbols in a data frame
Hi Sam,

> But this may not be the important issue here at all. If <k means the value is left censored at k -- i.e. we know it's less than k but not how much less -- then Sarah's proposal is not what you want to do. Exactly what you do want to do depends on context, and as it concerns statistical methodology, is not something that should be discussed here. Consult a local statistician if this is a correct guess.

I'd like to chime in with Bert's advice here. Unless the LOQs are very few*, they have the potential to seriously mess up any further data analysis.

Actually, I'd recommend you go one step back and ask the analysis lab whether they can supply you with the uncensored data, specifying the LOQ separately.

A while ago I posted some illustrations about such censoring-at-LOQ situations on Cross Validated, which may help you in deciding how to go on: http://stats.stackexchange.com/a/30739/4598

Claudia (Analytical Chemist & Chemometrician)

* or you know that they'll not matter for the particular data analysis you want to do

--
Claudia Beleites, Chemist
Spectroscopy/Imaging
Leibniz Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany
email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399
Re: [R] using match to obtain non-sorted index values from non-sorted vector
So nice! Apply wins again. Thanks David.

Michael
Re: [R] using match to obtain non-sorted index values from non-sorted vector
On Jul 9, 2014, at 1:13 PM, Folkes, Michael wrote:

> So nice! Apply wins again.

I doubt that `sapply( ..., which(,) )` would win a foot race with `match`:

  match(Tset, pop.df$pop)
  [1] 5 4 2

--
David.

David Winsemius
Alameda, CA, USA
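Speed aside, the two approaches in this thread also behave differently when a target value is missing from the population. The sketch below uses the vectors from the original post plus one made-up target set (Tset2) containing a value that is not in pop.

```r
pop  <- c(1, 6, 4, 3, 10)
Tset <- c(10, 3, 6)

# Both give the positions in pop, in the order of Tset
idx_match  <- match(Tset, pop)
idx_sapply <- sapply(Tset, function(x) which(pop == x))

# Hypothetical target set with a value (99) that pop does not contain
Tset2 <- c(10, 99, 6)
idx_match2  <- match(Tset2, pop)                            # NA marks the miss
idx_sapply2 <- sapply(Tset2, function(x) which(pop == x))   # no longer a plain vector

print(idx_match)
print(idx_match2)
str(idx_sapply2)
```

match() always returns one integer per target (the first match, or NA), while the sapply()/which() version falls back to a list as soon as any target has zero or several matches.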
Re: [R] using match to obtain non-sorted index values from non-sorted vector
Oh dear, I seem to have suffered a case of reversed arguments. This explains my surprise that R didn't have this in a function already - as it does! I was following the pattern of search.vector %in% pattern, but match() arguments are opposite this.

Thanks to both Davids.

Michael
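To make the argument order concrete: match(x, table) returns, for each element of x, its position in table, and x %in% table is documented as match(x, table, nomatch = 0) > 0. So the vector whose positions are wanted has to go second. A small sketch with the vectors from the original post:

```r
pop  <- c(1, 6, 4, 3, 10)
Tset <- c(10, 3, 6)

# Positions in pop, but in pop's own (ascending) order
in_pop_order <- which(pop %in% Tset)

# Positions in pop, in the order of Tset: what the post asked for
by_target <- match(Tset, pop)

# Arguments reversed: positions in Tset for each element of pop
reversed <- match(pop, Tset)

print(in_pop_order)  # 2 4 5
print(by_target)     # 5 4 2
print(reversed)      # NA 3 NA 2 1
```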
Re: [R] R Studio v3.0.3 for Windows 32bits is too slow
On 10/07/14 04:24, Jeff Newmiller wrote:

> Grumpy today, Bert?

<SNIP>

Bert is ***always*** grumpy! :-) If he weren't, I'd get worried.

But then someone else, not more than a million miles from this email, has a strong tendency to be grumpy (acerbic?) as well.

Of course ***I*** am ***never*** grumpy! :-)

cheers,

Rolf
Re: [R] Cansisc: Error in eigen(eHe, symmetric = TRUE)
Dear Maria Judith Carmona Higuita,

Since you didn't include enough information (such as access to your data) to reproduce the error, one can only guess. My guess: you have fewer observations in your data set than response variables on the LHS of the multivariate linear model.

I hope this helps,
John

John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
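John's guess can be checked without the candisc package. The error (residual) cross-product matrix of a multivariate lm() cannot have rank higher than the residual degrees of freedom, so with fewer observations than responses it is necessarily singular. The sketch below uses simulated data only; n, p and the grouping factor are made up and are not the bosques1 data.

```r
set.seed(1)
n <- 6    # observations (hypothetical)
p <- 10   # response variables (hypothetical), more than n

Y     <- matrix(rnorm(n * p), nrow = n, ncol = p)
sitio <- factor(rep(c("a", "b", "c"), each = 2))

mod <- lm(Y ~ sitio)

resid_df <- mod$df.residual          # n minus the number of groups
E        <- crossprod(residuals(mod))  # p x p error SSP matrix
rank_E   <- qr(E)$rank

cat("responses:", p, " residual df:", resid_df, " rank of E:", rank_E, "\n")
```

On the real data the same comparison is nrow(bosques1) after na.exclude() against the number of columns inside cbind(); na.exclude() can remove many rows if any response has missing values.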
Re: [R] symbols in a data frame
After reading the metals data frame, I would do this: metals$result - as.numeric(gsub('','',metals$Cedar.Creek)) metals$flag - ifelse(grepl('',metals$Cedar.Creek),'','h') Also, assuming you got your data into R using read.table(), read.csv(), or similar, I would include stringsAsFactors=TRUE as another argument to the function call. You don't need factors at this point. -Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 7/9/14 11:02 AM, Sam Albers tonightstheni...@gmail.com wrote: Thanks for all the responses. It sometimes difficult to outline exactly what you need. These response were helpful to get there. Speaking to Bert's point a bit, I needed a column to identify where the symbol was used. If I knew more about R I think I might be embarrassed to post my solution to that problem but here is how I used Sarah's solution but still kept the info about detection limits. I'm sure there is a more elegant way: metals - structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label = c(Antimony, Arsenic, Barium, Beryllium, Boron (Hot Water Soluble), Cadmium, Chromium, Cobalt, Copper, Lead, Mercury, Molybdenum, Nickel, pH 1:2, Selenium, Silver, Thallium, Tin, Vanadium, Zinc), class = factor), Cedar.Creek = structure(c(3L, 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L, 4L, 4L, 3L), .Label = c(1, 10, 100, 1000, 200, 5, 500, 0.1, 0.13, 0.5, 0.8, 1.07, 1.1, 1.4, 1.5, 137, 154, 163, 165, 169, 178, 2.3, 2.4, 22, 24, 244, 27.2, 274, 3, 3.1, 40.2, 43, 50, 516, 53.3, 550, 569, 65, 66.1, 68, 7.6, 72, 77, 89, 951), class = factor)), .Names = c(Parameter, Cedar.Creek), row.names = c(NA, 19L), class = data.frame) metals$temp1-metals$Cedar.Creek metals$Cedar.Creek - as.character(metals$Cedar.Creek) metals$Cedar.Creek - gsub(, , metals$Cedar.Creek) metals$Cedar.Creek - as.numeric(metals$Cedar.Creek) 
metals$temp2-metals$temp1==metals$Cedar.Creek metals$Detection-factor(ifelse(metals$temp2==TRUE,Measured,Limit)) metals[,c(1,2,5)] Thanks again! Sam On Wed, Jul 9, 2014 at 10:41 AM, Bert Gunter gunter.ber...@gene.com wrote: Well, ?grep and ?regex are clearly apropos here -- dealing with character data is an essential skill for handling input from diverse sources with various formatting conventions. I suggest you go through one of the many regular expression tutorials on the web to learn more. But this may not be the important issue here at all. If k means the value is left censored at k -- i.e. we know it's less than k but not how much less -- than Sarah's proposal is not what you want to do. Exactly what you do want to do depends on context, and as it concerns statistical methodology, is not something that should be discussed here. Consult a local statistician if this is a correct guess. Otherwise ignore. ... and please post in plain text in future (as requested) as HTML can get garbled. Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. Clifford Stoll On Wed, Jul 9, 2014 at 10:26 AM, Sarah Goslee sarah.gos...@gmail.com wrote: Hi Sam, I'd take the similar tack of removing the instead. Note that if you import the data frame using the stringsAsFactors=FALSE argument, you don't need the first step. metals$Cedar.Creek - as.character(metals$Cedar.Creek) metals$Cedar.Creek - gsub(, , metals$Cedar.Creek) metals$Cedar.Creek - as.numeric(metals$Cedar.Creek) R str(metals) 'data.frame':19 obs. of 2 variables: $ Parameter : Factor w/ 20 levels Antimony,Arsenic,..: 1 2 3 4 6 7 8 9 10 11 ... $ Cedar.Creek: num 100 100 500 100 10 1000 100 516 550 10 ... Sarah On Wed, Jul 9, 2014 at 1:19 PM, Sam Albers tonightstheni...@gmail.com wrote: Hello, I have recently received a dataset from a metal analysis company. The dataset is filled with less than symbols. 
What I am looking for is an efficient way to subset for any whole numbers from the dataset. The column is automatically formatted as a factor because of the symbols, making it difficult to deal with the numbers in a useful way. So in sum, any ideas on how I could subset the example below for only whole numbers? Thanks in advance! Sam

  #code
  metals <- structure(list(Parameter = structure(c(1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 18L, 19L, 20L, 1L), .Label = c("Antimony", "Arsenic", "Barium", "Beryllium", "Boron (Hot Water Soluble)", "Cadmium", "Chromium", "Cobalt", "Copper", "Lead", "Mercury", "Molybdenum", "Nickel", "pH 1:2", "Selenium", "Silver", "Thallium", "Tin", "Vanadium", "Zinc"), class = "factor"), Cedar.Creek = structure(c(3L, 3L, 7L, 3L, 2L, 4L, 3L, 34L, 36L, 2L, 5L, 7L, 3L, 7L, 3L, 45L, 4L, 4L, 3L), .Label = c("<1", "<10", "<100", "<1000", "<200", "<5", "<500", "0.1", "0.13", "0.5", "0.8", "1.07", "1.1", "1.4", "1.5", "137", "154", "163", "165", "169", "178", "2.3", "2.4", "22", "24", "244", "27.2",
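The approach discussed in this thread (strip the < with gsub(), keep a separate flag built with grepl()) can be sketched on a handful of made-up values. The vector `raw` and the column names below are illustrative assumptions, not the poster's data; the "Limit"/"Measured" labels follow Sam's code. As Bert points out, whether a censored value should be treated as the limit itself is a statistical question this sketch does not settle.

```r
# Made-up lab values; "<" marks a result reported below the detection limit.
raw <- c("<100", "516", "<10", "0.13", "<1000")

parsed <- data.frame(
  raw    = raw,
  # numeric part only, with the "<" removed
  result = as.numeric(gsub("<", "", raw, fixed = TRUE)),
  # remember which rows were detection limits rather than measurements
  status = ifelse(grepl("<", raw, fixed = TRUE), "Limit", "Measured"),
  stringsAsFactors = FALSE
)

# Rows that are real measurements and also whole numbers
measured_whole <- parsed[parsed$status == "Measured" &
                           parsed$result == round(parsed$result), ]

print(parsed)
print(measured_whole)
```

Keeping the flag in its own column means the numeric column can be used directly while the censoring information is still available for later analysis.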
[R] function completing properly
Hi R community, i created a function (mkdate) as follows:

  mkdate = function(x) {
    x$date = as.Date(paste(x$year, x$month, x$day, sep="-"))
    x$wy = ifelse(x$month >= 10, x$year+1, x$year)
    x$yd = as.integer(format(as.Date(x$date), format="%j"))
    x$wyd = cal.wyd(x)
    x
  }

the function results in adding the new columns date, wy, yd, and wyd to the table i apply it to. this has always worked in R version 2.14.2. however, in R version 3.1.0, instead of my mkdate function adding those columns to my existing table, it just overwrites my table and leaves me with just a list of the last variable created by my mkdate function. so i end up with just a list of numbers representing wyd, and lose all the data in my original table. does anyone know what would now be causing this to occur, and what i need to do to make my function work properly again? thank you for any assistance, Janet

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Studio v3.0.3 for Windows 32bits is too slow
Grumpy today, Jeff? For the concrete issue, I'd conjecture that the base problem is that there are way too many columns in the data and that the nature of the method is not properly understood. It is not obvious that k-means clustering based on Euclidean distance makes sense in 1426-dimensional space. It is quite possible that the data set does not even consist of columns measured in the same units. Even if it does fit the problem, it is quite computationally intensive. Some sort of feature extraction or data reduction technique is likely to be required. So basically, further study of the methodology, or contact with a machine learning expert (which I am not) seems advisable. -pd

On 09 Jul 2014, at 18:24 , Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

Grumpy today, Bert? While it is a fact that RStudio is a separate tool from R, it is clear from the question that the OP is interested in capabilities that R is providing and he simply cannot tell the difference.

OP: 1) "Better" is a word that leads to pointless arguments. You will have to be the judge of what works for you. I caution you that Open Source tools almost always achieve success by interoperating with other OS tools, and much of the success you have already obtained is the result of many contributions, of which R and its contributed packages deserve the lion's share of credit. RStudio is a very convenient editor that makes using R and LaTeX and Markdown and version control easier, but it is unlikely that either the blame for your dissatisfaction or the credit for your success should be attributed to RStudio.
I have successfully used all sorts of plain text editors and command line interfaces with R, and if you plan to scale up your projects then you will likely want to be very clear on this distinction between editors and computing tools so you can distribute your work on multiple parallel servers (where editors may not necessarily even be helpful) even if you choose to use RStudio as your controlling environment for launching such tasks.

2) and 3) I know that R has contributed packages that can manage Hadoop data processing, but I have no personal experience with them. Google is your friend... especially if you keep in mind that these tools are not all found in one monolithic package.

For future reference: this is a plain text mailing list, so please adjust your mail client appropriately when sending to this list. Also, there are considerable resources mentioned in the Posting Guide that you should be aware of... see the link below.

--- Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us, Research Engineer (Solar/Batteries/Software/Embedded Controllers) --- Sent from my phone. Please excuse my brevity.

On July 9, 2014 7:10:00 AM PDT, Bert Gunter gunter.ber...@gene.com wrote:

RStudio is a separate product with its own support. Post there, not here. -- Bert

Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll

On Tue, Jul 8, 2014 at 7:34 PM, Phan, Truong Q troung.p...@team.telstra.com wrote:

Hi R'er, I have a dataset which has a matrix of 7502 x 1426 (rows x columns). The data is in a CSV format which has a size around 68Mb. This dataset is less than 10% of our dataset. I have been adopting the Anomaly detection method as described by http://www.mattpeeples.net/kmeans.html .
It has been running for more than 24 hrs and still hasn't completed the calculation. I did manage to run it with a smaller dataset (i.e., 2100 rows x 1426 columns). It took around 12 hrs to run. I have a few questions and need your expert guidance.

1) Are there any better open source tools that can do all of this in one tool (e.g., R Studio): prepare data, build models, validate models, test models and present data? I am looking for a tool which will allow me to do the same as per the above link (Matt Peeples' blog).
2) Are there open source tools to perform the above which will allow me to run on top of the Hadoop ecosystem?
3) Can we use R Studio for Windows as a client to run on top of the Hadoop ecosystem? If yes, please point me to a site with use cases or samples.

Thanks and Regards, Truong Phan
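Peter Dalgaard's suggestion of reducing the data before clustering could look roughly like the sketch below: standardise the columns, keep a few principal components, and run k-means on those scores. The synthetic matrix, the choice of 5 components and the choice of 2 clusters are all assumptions made for illustration; nothing here is tuned to the poster's data, and PCA is only one of several possible reduction techniques.

```r
set.seed(123)

# Small synthetic stand-in for the poster's 7502 x 1426 matrix.
n <- 300
p <- 40
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[1:100, 1:5] <- X[1:100, 1:5] + 4   # plant one shifted group so there is structure to find

# Standardise the columns (they may be in different units), then reduce dimension.
pc <- prcomp(X, center = TRUE, scale. = TRUE)
k_components <- 5                    # arbitrary choice for this sketch
scores <- pc$x[, seq_len(k_components), drop = FALSE]

# Cluster in the reduced space instead of the full column space.
km <- kmeans(scores, centers = 2, nstart = 10)
print(table(km$cluster))
```

Clustering a handful of component scores is far cheaper than clustering all original columns, at the cost of discarding whatever variation the dropped components carried.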
Re: [R] function completing properly
I think you are mistaken. Please provide an example of how you used this function in any version of R that behaved as you describe. Also, please post in plain text to avoid the what-you-see-is-not-what-we-see feature that HTML email provides.

--- Jeff Newmiller, DCN: jdnew...@dcn.davis.ca.us, Research Engineer (Solar/Batteries/Software/Embedded Controllers) --- Sent from my phone. Please excuse my brevity.

On July 9, 2014 4:47:39 PM PDT, Janet Choate jsc@gmail.com wrote:

Hi R community, i created a function (mkdate) as follows:

  mkdate = function(x) {
    x$date = as.Date(paste(x$year, x$month, x$day, sep="-"))
    x$wy = ifelse(x$month >= 10, x$year+1, x$year)
    x$yd = as.integer(format(as.Date(x$date), format="%j"))
    x$wyd = cal.wyd(x)
    x
  }

the function results in adding the new columns date, wy, yd, and wyd to the table i apply it to. this has always worked in R version 2.14.2. however, in R version 3.1.0, instead of my mkdate function adding those columns to my existing table, it just overwrites my table and leaves me with just a list of the last variable created by my mkdate function. so i end up with just a list of numbers representing wyd, and lose all the data in my original table. does anyone know what would now be causing this to occur, and what i need to do to make my function work properly again? thank you for any assistance, Janet
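No cause is established in this thread. One pattern that would produce the described symptom in any version of R is a function whose last expression is an assignment: the function then returns only the assigned value, and `d <- f(d)` replaces the table with a vector. The sketch below contrasts that with a version ending in a bare `x`. The helper `cal_wyd_sketch()` is a hypothetical stand-in for the poster's own `cal.wyd()`, which is not shown in the thread; it assumes the water year starts on 1 October.

```r
# Hypothetical stand-in for cal.wyd(): day of the water year, counted from 1 October.
cal_wyd_sketch <- function(x) {
  start_year <- ifelse(x$month >= 10, x$year, x$year - 1)
  start <- as.Date(paste(start_year, 10, 1, sep = "-"))
  as.integer(x$date - start) + 1L
}

mkdate <- function(x) {
  x$date <- as.Date(paste(x$year, x$month, x$day, sep = "-"))
  x$wy   <- ifelse(x$month >= 10, x$year + 1, x$year)
  x$yd   <- as.integer(format(x$date, format = "%j"))
  x$wyd  <- cal_wyd_sketch(x)
  x   # return the whole data frame
}

# Same idea but without the final `x`: the value of the last assignment is returned.
mkdate_no_return <- function(x) {
  x$wy <- ifelse(x$month >= 10, x$year + 1, x$year)
}

d0  <- data.frame(year = c(2013, 2014), month = c(10, 7), day = c(1, 9))
bad <- mkdate_no_return(d0)   # a plain numeric vector, not a data frame
d   <- mkdate(d0)             # data frame with the four new columns
str(bad)
str(d)
```

If the real function does end in `x`, as posted, then the place to look would be how its result is assigned at the call site, which is what Jeff's request for a reproducible example is after.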
Re: [R] Cansisc: Error in eigen(eHe, symmetric = TRUE)
Dear Judith,

I take it from your reply that you have *more* observations than there are response variables in the multivariate linear model, but since you still haven't provided access to the data, it's still impossible to tell what the problem is. I don't follow your application, possibly because I'm ignorant of the area in which you work, but also possibly because there's insufficient information about that too. When you say that there are 0 abundances, I assume that this doesn't mean that some abundance values are 0 for *all* observations. If that's the case, then that would I believe produce a computational error, though not I think the one you observed. As an aside, if there are many 0 abundances then using the multivariate normal distribution for the responses likely isn't reasonable, which is what you're doing, but this in itself won't produce a computational error in candisc(). So, to reiterate, without the data, there's not much more that I can say. Because I'm out of town and will be traveling tomorrow, I'm unlikely to be able to respond again for several days.

Best, John

On Wed, 9 Jul 2014 17:54:56 -0500 Maria Judith Carmona H juditycarm...@gmail.com wrote:

Dear John, I am including abundance values in my data set so obviously I have zero abundances. The problem is that if I plot only the factors (biomasa, altdosel, altsoto, cobertura, riqarb, elevacion, temperatura, precipitacion) I get the graphic, the same happens when I include only the families, but I want to see the effect of all these factors+families on this plot.
In fact I included only certain families:

  prueba4 <- lm(cbind(biomasa,altdosel,altsoto,cobertura,riqarb,elevacion,temperatura,precipitacion,
                      Araceae,Begoniaceae,Bromeliaceae,Clusiaceae,Cyclanthaceae,Ericaceae,Gesneriaceae,
                      Melastomataceae,Orchidaceae,Piperaceae,Pteridophyta) ~ sitio, data=bosques.p)
  canprueba2 <- candisc(prueba2, term="sitio", data=bosques.p, ndim=1)
  Error in eigen (EHD, symmetric = TRUE): infinite or missing values in 'x'
  In addition: Warning message: In sqrt (wmd): NaNs produced

You see I get the same error. Best regards, Judith

On Wed, Jul 9, 2014 at 5:30 PM, John Fox j...@mcmaster.ca wrote:

Dear Maria Judith Carmona Higuita, Since you didn't include enough information (such as access to your data) to reproduce the error, one can only guess. My guess: you have fewer observations in your data set than response variables on the LHS of the multivariate linear model. I hope this helps, John

John Fox, Professor McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/

On Wed, 9 Jul 2014 11:36:35 -0500 Maria Judith Carmona H juditycarm...@gmail.com wrote:

Hi, I have a problem using the function candisc from the candisc package.
  bosques1 <- read.csv("bosques1.csv", header=TRUE, encoding="latin1")
  bosques1 <- na.exclude(bosques1)
  attach(bosques1)
  # Regression model
  mod <- lm(cbind(biomasa,altdosel,altsoto,cobertura,riqarb,elevacion,temperatura,precipitacion,
                  Acanthaceae, Apocinaceae, Araceae, Araliaceae, Arecaceae, Aspleniaceae, Begoniaceae,
                  Blechnaceae, Bromeliaceae, Clusiaceae, Cyclanthaceae, Davalliaceae, Denstaedtiaceae,
                  Dryopteridaceae, Ericaceae, Gesneriaceae, Hymenophyllaceae, indet., Lauraceae,
                  Lomariopsidaceae, Lycopodiaceae, Melastomataceae, Moraceae, Myrsinaceae,
                  Ophioglossaceae, Orchidaceae, Peperomia, Piperaceae, Poaceae, Polypodiaceae,
                  Primulaceae, Pteridaceae, Pteridophyta.taxa, Rubiaceae, Vittariaceae) ~ sitio, data=bosques1)
  summary(mod)
  # Plot 1
  can <- candisc(mod, term="sitio", data=bosques1, ndim=1, eig=T) ### The error happens here, so I can not run the plot.
  plot(can, titles.1d = c("Puntuación canónica", "Estructura"))
  summary(can, means = FALSE, scores = TRUE, coef = c("std"), digits = 2)

The error is:

  Error in eigen(eHe, symmetric = TRUE) : infinite or missing values in 'x'
  In addition: Warning message: In sqrt(wmd) : NaNs produced

Please help!

-- Maria Judith Carmona Higuita. Biology student - Universidad de Antioquia, Medellín - Colombia. "Happiness happens when you fit into your life, when you fit so harmoniously that whatever you do is a joy to you. Suddenly you will know it, and meditation will follow you. If you love the work you do, if you love the way you live, then you are already meditating and nothing can distract you." Osho
[R] Installing rgdal and rjags packages on a linux cluster
Dear R Help,

I'm trying to install the rjags and rgdal packages on a linux cluster running R 3.0.3. However, I'm having problems installing them successfully. Both packages require external programs (JAGS and GDAL, respectively), which have been successfully installed.

For rjags, the error message reads:

  configure: error: Location of JAGS headers not defined. Use configure arg '--with-jags-include' or environment variable 'JAGS_INCLUDE'

I tried the following:

  install.packages("rjags", configure.args = list("--with-jags-include"))

This returns a different error:

  configure: error: Problem with header file yes/Console.h

From my readings of various help pages, it seems that I need to download the developer version of the rjags package, in order to supply the header files. Is this correct? If so, where do I find developer packages and how do I install them? R package development is new to me.

For rgdal, the error message reads:

  Error: gdal-config not found
  The gdal-config script distributed with GDAL could not be found.

Here, it's my understanding that I need to install PROJ.4 libraries and the developer versions of the rgdal and proj4 packages. Is this correct? Are the problems with installing rjags and rgdal basically the same? Could the problems be caused by running an older version of R? Any help would be greatly appreciated.

Adam Zeilinger -- Adam Zeilinger Postdoctoral scholar Berkeley Initiative for Global Change Biology University of California Berkeley http://www.linkedin.com/in/adamzeilinger/
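No reply to this question appears here, so the following is only a sketch under stated assumptions. The second error message ("yes/Console.h") is what a configure script typically produces when a `--with-...` flag is given without a value: the option is read as "yes" instead of a directory. Passing explicit paths avoids that. The three locations below (`/opt/jags`, `/opt/gdal/bin/gdal-config`, `/opt/proj`) are hypothetical placeholders for wherever JAGS, GDAL and PROJ.4 were actually installed on the cluster. The "developer versions" mentioned in such help pages usually mean the system-level header files of those libraries, not separate R packages.

```r
# Hypothetical install locations on the cluster; replace with the real ones.
jags_prefix <- "/opt/jags"
gdal_config <- "/opt/gdal/bin/gdal-config"
proj_prefix <- "/opt/proj"

# Each configure flag carries an explicit value (flag=path).
rjags_args <- c(
  paste0("--with-jags-include=", file.path(jags_prefix, "include", "JAGS")),
  paste0("--with-jags-lib=",     file.path(jags_prefix, "lib"))
)
rgdal_args <- c(
  paste0("--with-gdal-config=",  gdal_config),
  paste0("--with-proj-include=", file.path(proj_prefix, "include")),
  paste0("--with-proj-lib=",     file.path(proj_prefix, "lib"))
)

run_install <- FALSE  # set to TRUE on the cluster once the paths are correct
if (run_install) {
  install.packages("rjags", configure.args = paste(rjags_args, collapse = " "))
  install.packages("rgdal", configure.args = paste(rgdal_args, collapse = " "))
}

print(rjags_args)
print(rgdal_args)
```

The install calls are guarded so the block can be run anywhere to inspect the argument strings before attempting a build.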
Re: [R] Cansisc: Error in eigen(eHe, symmetric = TRUE)
Dear John,

I am including abundance values in my data set so obviously I have zero abundances. The problem is that if I plot only the factors (biomasa, altdosel, altsoto, cobertura, riqarb, elevacion, temperatura, precipitacion) I get the graphic, the same happens when I include only the families, but I want to see the effect of all these factors+families on this plot. In fact I included only certain families:

  prueba4 <- lm(cbind(biomasa,altdosel,altsoto,cobertura,riqarb,elevacion,temperatura,precipitacion,
                      Araceae,Begoniaceae,Bromeliaceae,Clusiaceae,Cyclanthaceae,Ericaceae,Gesneriaceae,
                      Melastomataceae,Orchidaceae,Piperaceae,Pteridophyta) ~ sitio, data=bosques.p)
  canprueba2 <- candisc(prueba2, term="sitio", data=bosques.p, ndim=1)
  Error in eigen (EHD, symmetric = TRUE): infinite or missing values in 'x'
  In addition: Warning message: In sqrt (wmd): NaNs produced

You see I get the same error. Best regards, Judith

On Wed, Jul 9, 2014 at 5:30 PM, John Fox j...@mcmaster.ca wrote:

Dear Maria Judith Carmona Higuita, Since you didn't include enough information (such as access to your data) to reproduce the error, one can only guess. My guess: you have fewer observations in your data set than response variables on the LHS of the multivariate linear model. I hope this helps, John

John Fox, Professor McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/

On Wed, 9 Jul 2014 11:36:35 -0500 Maria Judith Carmona H juditycarm...@gmail.com wrote:

Hi, I have a problem using the function candisc from the candisc package.
  bosques1 <- read.csv("bosques1.csv", header=TRUE, encoding="latin1")
  bosques1 <- na.exclude(bosques1)
  attach(bosques1)
  # Regression model
  mod <- lm(cbind(biomasa,altdosel,altsoto,cobertura,riqarb,elevacion,temperatura,precipitacion,
                  Acanthaceae, Apocinaceae, Araceae, Araliaceae, Arecaceae, Aspleniaceae, Begoniaceae,
                  Blechnaceae, Bromeliaceae, Clusiaceae, Cyclanthaceae, Davalliaceae, Denstaedtiaceae,
                  Dryopteridaceae, Ericaceae, Gesneriaceae, Hymenophyllaceae, indet., Lauraceae,
                  Lomariopsidaceae, Lycopodiaceae, Melastomataceae, Moraceae, Myrsinaceae,
                  Ophioglossaceae, Orchidaceae, Peperomia, Piperaceae, Poaceae, Polypodiaceae,
                  Primulaceae, Pteridaceae, Pteridophyta.taxa, Rubiaceae, Vittariaceae) ~ sitio, data=bosques1)
  summary(mod)
  # Plot 1
  can <- candisc(mod, term="sitio", data=bosques1, ndim=1, eig=T) ### The error happens here, so I can not run the plot.
  plot(can, titles.1d = c("Puntuación canónica", "Estructura"))
  summary(can, means = FALSE, scores = TRUE, coef = c("std"), digits = 2)

The error is:

  Error in eigen(eHe, symmetric = TRUE) : infinite or missing values in 'x'
  In addition: Warning message: In sqrt(wmd) : NaNs produced

Please help!

-- Maria Judith Carmona Higuita. Biology student - Universidad de Antioquia, Medellín - Colombia. "Happiness happens when you fit into your life, when you fit so harmoniously that whatever you do is a joy to you. Suddenly you will know it, and meditation will follow you. If you love the work you do, if you love the way you live, then you are already meditating and nothing can distract you." Osho
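Without the data, the cause of the eigen() error stays open. The two conditions John Fox raises in this thread (fewer observations than response variables, and responses that are 0 for all observations) can at least be checked before calling candisc(). The sketch below does that on a small made-up data frame; the column names `y1`..`y3` and the planted all-zero column are assumptions for illustration and only base R is used.

```r
set.seed(1)

# Made-up data: one grouping factor and three responses, one of them all zero.
dat <- data.frame(
  sitio = factor(rep(c("a", "b", "c"), each = 4)),
  y1 = rnorm(12),
  y2 = rnorm(12),
  y3 = 0            # an "abundance" that is 0 for every observation
)
resp <- c("y1", "y2", "y3")
Y <- as.matrix(dat[resp])

# Check 1: observations versus number of response variables.
n_obs  <- nrow(Y)
n_resp <- ncol(Y)

# Check 2: responses with no variation at all.
constant_cols <- resp[apply(Y, 2, function(v) var(v) == 0)]

# Check 3: is the residual cross-product matrix of the multivariate fit full rank?
mod    <- lm(Y ~ sitio, data = dat)
E      <- crossprod(residuals(mod))
rank_E <- qr(E)$rank

cat("observations:", n_obs, " responses:", n_resp, "\n")
cat("constant responses:", constant_cols, "\n")
cat("rank of residual SSP matrix:", rank_E, "of", n_resp, "\n")
```

A rank below the number of responses means the error matrix is singular, which is the kind of situation in which downstream matrix computations can fail.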
Re: [R] Cansisc: Error in eigen(eHe, symmetric = TRUE)
Dear John, Here is my data set. Thanks.

On Wed, Jul 9, 2014 at 8:12 PM, John Fox j...@mcmaster.ca wrote:

Dear Judith,

I take it from your reply that you have *more* observations than there are response variables in the multivariate linear model, but since you still haven't provided access to the data, it's still impossible to tell what the problem is. I don't follow your application, possibly because I'm ignorant of the area in which you work, but also possibly because there's insufficient information about that too. When you say that there are 0 abundances, I assume that this doesn't mean that some abundance values are 0 for *all* observations. If that's the case, then that would I believe produce a computational error, though not I think the one you observed. As an aside, if there are many 0 abundances then using the multivariate normal distribution for the responses likely isn't reasonable, which is what you're doing, but this in itself won't produce a computational error in candisc(). So, to reiterate, without the data, there's not much more that I can say. Because I'm out of town and will be traveling tomorrow, I'm unlikely to be able to respond again for several days.

Best, John

On Wed, 9 Jul 2014 17:54:56 -0500 Maria Judith Carmona H juditycarm...@gmail.com wrote:

Dear John, I am including abundance values in my data set so obviously I have zero abundances. The problem is that if I plot only the factors (biomasa, altdosel, altsoto, cobertura, riqarb, elevacion, temperatura, precipitacion) I get the graphic, the same happens when I include only the families, but I want to see the effect of all these factors+families on this plot.
In fact I included only certain families:

  prueba4 <- lm(cbind(biomasa,altdosel,altsoto,cobertura,riqarb,elevacion,temperatura,precipitacion,
                      Araceae,Begoniaceae,Bromeliaceae,Clusiaceae,Cyclanthaceae,Ericaceae,Gesneriaceae,
                      Melastomataceae,Orchidaceae,Piperaceae,Pteridophyta) ~ sitio, data=bosques.p)
  canprueba2 <- candisc(prueba2, term="sitio", data=bosques.p, ndim=1)
  Error in eigen (EHD, symmetric = TRUE): infinite or missing values in 'x'
  In addition: Warning message: In sqrt (wmd): NaNs produced

You see I get the same error. Best regards, Judith

On Wed, Jul 9, 2014 at 5:30 PM, John Fox j...@mcmaster.ca wrote:

Dear Maria Judith Carmona Higuita, Since you didn't include enough information (such as access to your data) to reproduce the error, one can only guess. My guess: you have fewer observations in your data set than response variables on the LHS of the multivariate linear model. I hope this helps, John

John Fox, Professor McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/

On Wed, 9 Jul 2014 11:36:35 -0500 Maria Judith Carmona H juditycarm...@gmail.com wrote:

Hi, I have a problem using the function candisc from the candisc package.
  bosques1 <- read.csv("bosques1.csv", header=TRUE, encoding="latin1")
  bosques1 <- na.exclude(bosques1)
  attach(bosques1)
  # Regression model
  mod <- lm(cbind(biomasa,altdosel,altsoto,cobertura,riqarb,elevacion,temperatura,precipitacion,
                  Acanthaceae, Apocinaceae, Araceae, Araliaceae, Arecaceae, Aspleniaceae, Begoniaceae,
                  Blechnaceae, Bromeliaceae, Clusiaceae, Cyclanthaceae, Davalliaceae, Denstaedtiaceae,
                  Dryopteridaceae, Ericaceae, Gesneriaceae, Hymenophyllaceae, indet., Lauraceae,
                  Lomariopsidaceae, Lycopodiaceae, Melastomataceae, Moraceae, Myrsinaceae,
                  Ophioglossaceae, Orchidaceae, Peperomia, Piperaceae, Poaceae, Polypodiaceae,
                  Primulaceae, Pteridaceae, Pteridophyta.taxa, Rubiaceae, Vittariaceae) ~ sitio, data=bosques1)
  summary(mod)
  # Plot 1
  can <- candisc(mod, term="sitio", data=bosques1, ndim=1, eig=T) ### The error happens here, so I can not run the plot.
  plot(can, titles.1d = c("Puntuación canónica", "Estructura"))
  summary(can, means = FALSE, scores = TRUE, coef = c("std"), digits = 2)

The error is:

  Error in eigen(eHe, symmetric = TRUE) : infinite or missing values in 'x'
  In addition: Warning message: In sqrt(wmd) : NaNs produced

Please help!

-- Maria Judith Carmona Higuita. Biology student - Universidad de Antioquia, Medellín - Colombia. "Happiness happens when you fit into your life, when you fit so harmoniously that whatever you do is a joy to you. Suddenly you will know it, and meditation will follow you. If you love the work you do, if you love the way you live, then you are already meditating and nothing can distract you." Osho
[R] Information about font
Hi, I have this set of R scripts which are run on a linux box and create plots with the lattice package. I do not specify any custom font family, so I believe that whatever is the default font on my system is used in the plot.

1- how can I know which is the default font used in my plots?
2- is this font specific to R or can it be used by external tools?
3- if this font can be used by external tools, how can I know the location of this font on my system?

Thank you in advance for your help. Sebastien
Re: [R] Information about font
On Jul 9, 2014, at 7:47 PM, Sébastien Bihorel wrote:

Hi, I have this set of R scripts which are run on a linux box and create plots with the lattice package. I do not specify any custom font family, so I believe that whatever is the default font on my system is used in the plot. 1- how can I know which is the default font used in my plots? 2- is this font specific to R or can it be used by external tools? 3- if this font can be used by external tools, how can I know the location of this font on my system?

Fonts are specific to the graphical device being used. You have not specified what device you are using.

  ?Devices

The fonts are provided by your OS setup.

  ?pdfFonts
  ?Type1Font
  ?grid::gpar

Thank you in advance for your help. Sebastien [[alternative HTML version deleted]]

Still having trouble understanding your mail client?

-- David Winsemius Alameda, CA, USA
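Since the answer depends on the graphics device, here is a minimal sketch for one device only, pdf(), using the functions behind the help pages David lists. It assumes the plots are written with pdf(); for other devices (png(), X11(), and so on) the defaults are stored elsewhere and this does not apply.

```r
# Default font family that pdf() would use unless a family is given explicitly.
default_family <- grDevices::pdf.options()$family

# Font families the PDF device knows by name (see ?pdfFonts).
known_families <- names(grDevices::pdfFonts())

cat("pdf() default family:", default_family, "\n")
cat("some families known to pdf():", head(known_families), "\n")
```

The names reported here are font family names as the PDF device understands them, which is a different question from where a font file lives on disk.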
[R] Decision Tree
Hi R-helpers, Is it possible to change the color of the boxes when plotting decision trees using 'fancyRpartPlot()' from the rattle package? -- Regards, Abhinaba Roy
[R] quantmod: How could I change the name in chartSeries
hi, guys, I am just a beginner with the excellent R package, quantmod. I don't quite know how to change the y-axis name in the chartSeries function. Actually, I want to write something like the following function, so that I can complete the financial analysis with just one line of code. The following function is designed to show some aspects of the SP500. And now I want to change the stock.name on the y-axis to SP500. Is there any way to do this? THX William

  stock.price <- function(stock.name, stock.code){
    ## Loading..
    library(zoo)
    library(xts)
    library(TTR)
    library(Defaults)
    library(quantmod)
    #--
    ## Theme: white
    theme.white <- chartTheme("white")
    names(theme.white)
    theme.white$bg.col <- "white"
    theme.white$up.col <- "red"
    theme.white$dn.col <- "green"
    #--
    ## main function
    stock.name <- getSymbols(stock.code, from = "2010-01-01", to = Sys.Date(), src = "yahoo", auto.assign=FALSE)
    chartSeries(stock.name, theme = theme.white,
                # subset = 'last 12 months',
                TA = "addVo(); addSMA(); addEnvelope(); addMACD(); addMomentum(); addROC(); addBBands()")
    addLines(v = which(stock.name[,4] == max(stock.name[,4])), col = "gray")
  }

  stock.price("SP500", "^GSPC")
RE: [R-es] R-help-es Digest, Vol 65, Issue 13
Exactly: lots of work and little fear of looking foolish when asking and asking again. A good combination... Regards.

Isidro Hidalgo Arellano, Observatorio Regional de Empleo, Consejería de Empleo y Economía http://www.jccm.es

-----Original message----- From: r-help-es-boun...@r-project.org [mailto:r-help-es-bounces@r-project.org] On behalf of Eva Prieto Castro. Sent: Wednesday, 9 July 2014 8:51. To: ERNESTO JIMENEZ GARRIDO; r-help-es@r-project.org. Subject: Re: [R-es] R-help-es Digest, Vol 65, Issue 13

Hello, Ernesto: I think your decision is a mistake. Knowledge comes to you gradually, as a result of making an effort, of asking questions, and of having people around who, even though they know a lot, are willing and able to adapt to the other person's situation. After all, the last stage of learning is teaching... My knowledge of R is not advanced and I have never hesitated to ask. I invite you to do the same. Regards. Eva

On Wednesday, 9 July 2014 8:00, ERNESTO JIMENEZ GARRIDO ernestjime...@ub.edu wrote:

Dear all, I joined this mailing list with the aim of advancing in my learning process with R and R-Commander. My limited knowledge makes me see that this is not the forum I need; its level is too advanced for me. If you could point me to a suitable forum, one where I would have the chance to discuss problems with the first steps in R and R-Commander, I would be grateful. Thank you in advance. May things go well for you in your slow progress with this common, universal and free program.
Ernest Jiménez Garrido, Professor Associat UB, Departament d'Econometria, Estadística i Economia Espanyola

From: r-help-es-boun...@r-project.org [r-help-es-boun...@r-project.org] on behalf of r-help-es-requ...@r-project.org [r-help-es-request@r-project.org]. Sent: Tuesday, 8 July 2014 19:43. To: r-help-e...@r-project.org. Subject: R-help-es Digest, Vol 65, Issue 13

Send messages for the R-help-es list to r-help-es@r-project.org

To subscribe or unsubscribe via the web: https://stat.ethz.ch/mailman/listinfo/r-help-es

Or by e-mail, sending a message with the text "help" in the subject or body to: r-help-es-requ...@r-project.org

You can contact the list owner by writing to: r-help-es-ow...@r-project.org

If you reply to any content of this message, please edit the subject line so that the text is more specific than "Re: Contents of R-help-es digest". Also, please include in the reply only those parts of the message you are replying to.

Today's topics:
1. Generated package does not detect the custom environment created. (Eva Prieto Castro)
2. Re: Generated package does not detect the custom environment created. (Eva Prieto Castro)
3. Re: Generated package does not detect the custom environment created. (miguel.angel.rodriguez.mui...@sergas.es)
4. Re: Generated package does not detect the custom environment created. (Eva Prieto Castro)
5. Re: Generated package does not detect the custom environment created. (rubenfcasal)

-- Message: 1 Date: Tue, 8 Jul 2014 11:38:22 +0100 To: r-help-es r-help-es@r-project.org Subject: [R-es] Generated package does not detect the custom environment created. Message-ID: 1404815902.81335.yahoomail...@web171506.mail.ir2.yahoo.com Content-Type: text/plain

Good morning: Please, could someone create an R script with the code I am sending and try to package it?
I always managed to do it, but with the current version of R (3.1.0), once the package zip has been built and loaded from the RGui, it does not detect the existence of the environment I created (.Ch.env). It is as if the package could now only consist (for practical purposes) of functions, without allowing an underlying data structure such as the one formed by lGlo and bStarted, both included in the created environment (.Ch.env).

  .Ch.env <- new.env()
  .Ch.env$lGlo <- list()
  .Ch.env$bStarted <- FALSE

  CheckGloCreated <- function() {
    if (.Ch.env$bStarted == TRUE) {
      stop("Data structures were already initialized.", call.=FALSE)
    }
  }

  ChrL.Start <- function() {
    CheckGloCreated()
    .Ch.env$bStarted <- TRUE
    cat("Tested.\n")
  }

The only peculiarity when packaging is that in Ch-internal.r (if you call the package Ch) you have to correct the line generated by package.skeleton and replace it with the following:

  .Ch.env <- new.env()

Thanks in advance. Sincerely, Eva

[[alternative HTML version deleted]] -- Message: 2 Date: Tue, 8 Jul 2014 12:45:44 +0100 To: r-help-es
Re: [R-es] The list is here to help...
Hello: Looking back a little, we should remember, at least the oldest members here, that this list is meant to help those who have problems and find it hard to turn to specialized lists, whether because of the language or because of a lack of experience with R or even with other, more applied, aspects. It is a list for helping each other without resorting to the old "read the f... manual"; I believe that from the start the list has worked with unsurpassable cordiality, and the help has come at all sorts of levels; besides, I am sure it has made some personal relationships easier. I hope to go on being useful, especially to newcomers, although there are always fingers and keyboards quicker than mine :o( Greetings and thanks to all list members: the askers, the answerers and the arguers ;-)

On 09/07/14 09:53, miguel.angel.rodriguez.mui...@sergas.es wrote:

Hello Ernesto. Since you are thinking of leaving the list, I would (first) jump in and ask something (one of those questions you have been saving for the right forum). What can you lose? :-) Regards,

Miguel Ángel Rodríguez Muíños, Dirección Xeral de Innovación e Xestión da Saúde Pública, Consellería de Sanidade, Xunta de Galicia http://dxsp.sergas.es

-----Original message----- From: r-help-es-boun...@r-project.org [mailto:r-help-es-boun...@r-project.org] On behalf of ERNESTO JIMENEZ GARRIDO. Sent: Wednesday, 9 July 2014 8:01. To: r-help-es@r-project.org. Subject: Re: [R-es] R-help-es Digest, Vol 65, Issue 13

Dear all, I joined this mailing list with the aim of advancing in my learning process with R and R-Commander. My limited knowledge makes me see that this is not the forum I need; its level is too advanced for me. If you could point me to a suitable forum, one where I would have the chance to discuss problems with the first steps in R and R-Commander, I would be grateful. Thank you in advance.
Que os vaya bien en vuestro lento avance con este programa común, universal y gratuíto. Ernest Jiménez Garrido Professor Associat UB Nota: A información contida nesta mensaxe e os seus posibles documentos adxuntos é privada e confidencial e está dirixida únicamente ó seu destinatario/a. Se vostede non é o/a destinatario/a orixinal desta mensaxe, por favor elimínea. A distribución ou copia desta mensaxe non está autorizada. Nota: La información contenida en este mensaje y sus posibles documentos adjuntos es privada y confidencial y está dirigida únicamente a su destinatario/a. Si usted no es el/la destinatario/a original de este mensaje, por favor elimínelo. La distribución o copia de este mensaje no está autorizada. See more languages: http://www.sergas.es/aviso_confidencialidad.htm ___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es -- José Antonio Palazón Ferrando Profesor Titular. Departamento de Ecología e Hidrología. Facultad de Biología. Universidad de Murcia. Campus Universitario de Espinardo 30100 MURCIA-SPAIN Telf: +34 868 88 49 80 Fax : +34 868 88 39 63 Email: pala...@um.es ___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
Re: [R-es] error with a file
Hello, how are you?

Are the conditions that define the four data sets mutually exclusive? I would try to check whether any rows satisfy more than one of them.

You could create a vector of labels at the start that defines which rows go to each data set. That would guarantee that every row ends up in one and only one of them. You would also have a cleaner design that is easier to read and maintain.

Regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com

2014-07-09 13:56 GMT+02:00 Marta valdes lopez martavalde...@gmail.com:

Hello everyone,

I would like to ask for your help finding an error in this file that I cannot track down. I have checked and tested everything a thousand times and still cannot find it. I am sharing the file through Google Drive because it is very large.

monicap_50.csv
https://docs.google.com/file/d/0B8o2KrPEgG7ATlBMc19lTVk1d3M/edit?usp=drive_web

This is the script. What I do not understand is the following: I have 592044 records, and after removing the NAs 586561 remain. When I run the script, the sum z2+z4+z5+z6 (the states) should equal z1, the total number of records, but for some reason I get more records than there are. I have compared against the file in Excel and the NA counts are correct.
    library(chron)
    library(xlsx)

    filename <- "monicap_50.csv"
    DBxy <- read.csv(filename, sep=";", header=TRUE, dec=",")
    DBx <- na.omit(DBxy)
    names(DBx) <- c("Boat", "DateTime", "TimeDiff", "Latitude", "Longitude",
                    "Course", "Speed", "distNm", "calcSpeed", "calcCourse",
                    "distHb", "Harbour", "idTrip", "vmsAngle", "calcAngle",
                    "vmsLeg", "calcLeg", "Trip_vmsLeg", "Trip_calcLeg",
                    "lengthTrip", "lengthTrip_vmsLeg", "lengthTrip_calcLeg",
                    "Time", "Date")

    #Formatting date and time variables
    DBx$Date <- strptime(DBx$Date, "%d-%m-%Y")
    DBx$Year <- as.POSIXlt(DBx$Date)$year+1900
    if(filename != "monicap_50.csv") {DBx$Time <- paste(DBx$Time, ":00", sep="")} #NOT necessary for Monicap and Univerest_50
    DBx$Time <- times(DBx$Time) #Works for Monicap AND UNIVEREST_50 ONLY
    DBx$Boat <- gsub("^\\s+|\\s+$", "", DBx$Boat)

    #Read file with boat codes and gears
    codeBoats <- read.csv("CODES_2002-2010New.csv", sep=",", header=TRUE) #Laptop
    codeBoats$CODIGO <- gsub("^\\s+|\\s+$", "", codeBoats$CODIGO)

    #Assigning a Fishing license based on Boat and Year
    DBx$gear <- codeBoats$Lic[match(paste(DBx$Boat, DBx$Year), paste(codeBoats$CODIGO, codeBoats$Year))]
    z0 <- length(DBx$gear)
    z1 <- length(DBx$gear)
    z1

    #defining speed and distance limits
    speedFishing <- 2.0
    speedHarb <- 1.0
    distHbRule <- 3.0
    speedSteam <- 2.0
    minTime <- times(c("05:59:59")) #usual beginning of fishing operations
    maxTime <- times(c("20:59:59")) #usual finishing of fishing operations

    # The comparison and logical operators (<=, >, &) and the empty strings
    # in the selections below were lost in the archived copy of this message;
    # they are restored here by inference from the variable names.

    #Selecting Harbour
    DBharbour <- na.omit(DBx[DBx$distHb <= distHbRule & DBx$calcSpeed <= speedHarb,])
    DBharbour$State <- "Harbour" #MONICAP= 10618; UNIVER1= ; UNIVER2= ; UNIVEREST= 1028
    z2 <- length(DBharbour$State)

    #Selecting Steaming
    DBsteaming <- na.omit(DBx[(DBx$calcSpeed > speedFishing) | (DBx$distHb <= distHbRule & DBx$calcSpeed > speedHarb),])
    DBsteaming$State <- "Steaming" #MONICAP= 88398; UNIVER1= ; UNIVER2= ; UNIVEREST= 53748
    DBsteaming$Harbour <- ""
    z4 <- length(DBsteaming$State)

    #Selecting Fishing
    DBfishing <- na.omit(DBx[(DBx$calcSpeed <= speedFishing & DBx$distHb > distHbRule & DBx$Time > minTime & DBx$Time <= maxTime),])
    DBfishing$State <- "Fishing"
    DBfishing$Harbour <- ""
    z5 <- length(DBfishing$State)

    #Selecting night
    DBnight <- na.omit(DBx[(DBx$calcSpeed <= speedFishing & DBx$distHb > distHbRule & (DBx$Time <= minTime | DBx$Time > maxTime)),])
    DBnight$State <- "Night" #MONICAP=10434; UNIVER1= 16677; UNIVER2= 25789
    DBnight$Harbour <- ""
    z6 <- length(DBnight$State)

If anyone spots the error and can lend me a hand I would be grateful; if not, I will keep wrestling with the file!

Many thanks, regards
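The label-vector idea suggested in the reply can be sketched as follows. This is only an illustration on made-up data, not the original script: the toy data frame, the numeric `hour` column (standing in for the chron `Time` column), the limits `minHour` and `maxHour`, and the helper names `slow`, `offshore`, `daytime`, `conds` and `nMatch` are all assumptions introduced here. The speed and distance thresholds reuse the names from the script.

```r
# Toy data standing in for DBx (hypothetical values, one row with a missing speed).
DBx <- data.frame(
  calcSpeed = c(0.5, 1.5, 3.0, 1.0, 1.8, NA),
  distHb    = c(1.0, 2.0, 5.0, 4.0, 6.0, 2.0),
  hour      = c(10, 12, 14, 9, 23, 8)
)

speedFishing <- 2.0
speedHarb    <- 1.0
distHbRule   <- 3.0
minHour      <- 6    # hypothetical stand-in for minTime
maxHour      <- 21   # hypothetical stand-in for maxTime

# Building blocks shared by the four conditions.
slow     <- DBx$calcSpeed <= speedFishing
offshore <- DBx$distHb > distHbRule
daytime  <- DBx$hour > minHour & DBx$hour <= maxHour

# One logical column per state; each row should match at most one column.
conds <- cbind(
  Harbour  = !offshore & DBx$calcSpeed <= speedHarb,
  Steaming = !slow | (!offshore & DBx$calcSpeed > speedHarb),
  Fishing  = slow & offshore & daytime,
  Night    = slow & offshore & !daytime
)
conds[is.na(conds)] <- FALSE   # rows with missing inputs match nothing

# Number of states each row satisfies; anything above 1 is an overlap.
nMatch <- rowSums(conds)
stopifnot(all(nMatch <= 1))

# Single label vector: rows matching exactly one state get its name, others stay NA.
DBx$State <- NA_character_
one <- nMatch == 1
DBx$State[one] <- colnames(conds)[max.col(conds * 1, ties.method = "first")[one]]

# Counts per state, including unlabelled rows; these always add up to nrow(DBx).
table(DBx$State, useNA = "ifany")
```

Because every row carries exactly one label (or NA), the per-state counts cannot exceed the total, and `nMatch` shows directly which rows, if any, satisfy more than one condition.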
[R-es] Converting a date to numeric and back to a date
Hello everyone:

It must be something silly, but I cannot work out why the following line does not return the current date:

    as.Date(as.numeric(Sys.time()))

I ran this test because I cannot manage to turn a number that was converted from a date, and then modified, back into a date.

Thanks in advance.

Regards,
Alberto.
Re: [R-es] Converting a date to numeric and back to a date
Hello,

You get an error because you do not specify the origin, exactly as the documentation states:

    as.Date(as.numeric(Sys.time()))
    Error in as.Date.numeric(as.numeric(Sys.time())) : 'origin' must be supplied

-
Details

The usual vector re-cycling rules are applied to x and format so the answer will be of length that of the longer of the vectors.

Locale-specific conversions to and from character strings are used where appropriate and available. This affects the names of the days and months.

The as.Date methods accept character strings, factors, logical NA and objects of classes POSIXlt and POSIXct. (The last is converted to days by ignoring the time after midnight in the representation of the time in specified time zone, default UTC.) Also objects of class date (from package date) and dates (from package chron). Character strings are processed as far as necessary for the format specified: any trailing characters are ignored.

*as.Date will accept numeric data (the number of days since an epoch), but only if origin is supplied.*

The format and as.character methods ignore any fractional part of the date.
-

Regards,
Carlos Ortega
www.qualityexcellence.es

On 9 July 2014 at 14:25, Alberto Soria alberto.so...@ari-solar.es wrote:

Hello everyone:

It must be something silly, but I cannot work out why the following line does not return the current date:

    as.Date(as.numeric(Sys.time()))

I ran this test because I cannot manage to turn a number that was converted from a date, and then modified, back into a date.

Thanks in advance.

Regards,
Alberto.
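Besides the missing origin, the units matter: `as.numeric()` on a date-time such as `Sys.time()` gives seconds since the epoch, whereas `as.Date()` interprets a number as days. A minimal sketch of the difference, using a fixed timestamp instead of `Sys.time()` so the result is reproducible; the variable names are illustrative only:

```r
# A fixed date-time in UTC, standing in for Sys.time().
t <- as.POSIXct("2014-07-09 12:25:00", tz = "UTC")

secs <- as.numeric(t)            # seconds since 1970-01-01 00:00:00 UTC
days <- as.numeric(as.Date(t))   # whole days since 1970-01-01

# Days can be turned back into a Date once the origin is given.
d_from_days <- as.Date(days, origin = "1970-01-01")

# Seconds must first be converted to whole days (86400 seconds per day).
d_from_secs <- as.Date(secs %/% 86400, origin = "1970-01-01")

format(d_from_days)
format(d_from_secs)
```

Passing `secs` directly as if it were a number of days would point to a date millions of years away, which is why the conversion to days comes first.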
Re: [R-es] Converting a date to numeric and back to a date
Hello Alberto,

You need

    as.Date(as.numeric(as.Date(Sys.time())), origin = '1970-01-01')

This part

    as.numeric(as.Date(Sys.time())) # 16260

gives you the number of days elapsed since 1 January 1970. Then, using that date as the origin, you recover the current date.

Regards,
Jorge.-

2014-07-09 22:25 GMT+10:00 Alberto Soria alberto.so...@ari-solar.es:

Hello everyone:

It must be something silly, but I cannot work out why the following line does not return the current date:

    as.Date(as.numeric(Sys.time()))

I ran this test because I cannot manage to turn a number that was converted from a date, and then modified, back into a date.

Thanks in advance.

Regards,
Alberto.
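The round trip described in the original question (date to number, modify the number, back to a date) might look like the sketch below. The fixed dates and the offsets of 30 days and 3600 seconds are made up for illustration; the same pattern is shown for date-times with `as.POSIXct()`, where the number counts seconds rather than days.

```r
# Date -> numeric -> modified -> Date
d  <- as.Date("2014-07-09")
n  <- as.numeric(d)                               # days since 1970-01-01
d2 <- as.Date(n + 30, origin = "1970-01-01")      # 30 days later

# Date-time -> numeric -> modified -> date-time (units are seconds here)
t  <- as.POSIXct("2014-07-09 12:25:00", tz = "UTC")
s  <- as.numeric(t)                               # seconds since the epoch
t2 <- as.POSIXct(s + 3600, origin = "1970-01-01", tz = "UTC")  # one hour later

format(d2)
format(t2, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Supplying `origin` explicitly keeps the code working on R versions, such as the 3.1.0 release discussed in this thread, where the numeric methods require it.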
Re: [R-es] Converting a date to numeric and back to a date
Hello Alberto,

You may find some clues to the solution at the following link:

http://www.noamross.net/blog/2014/2/10/using-times-and-dates-in-r---presentation-code.html

I hope it is useful.

Regards,

--
Alexandre Alonso Fernández
Instituto de Investigaciones Marinas, IIM-CSIC
Departamento de Recursos y Ecología Marina
Grupo de Ecología Pesquera
http://www.iim.csic.es/pesquerias
https://twitter.com/FisheriesIIM

On 09/07/2014 14:32, Carlos Ortega wrote:

Hello,

You get an error because you do not specify the origin, exactly as the documentation states:

    as.Date(as.numeric(Sys.time()))
    Error in as.Date.numeric(as.numeric(Sys.time())) : 'origin' must be supplied

-
Details

The usual vector re-cycling rules are applied to x and format so the answer will be of length that of the longer of the vectors.

Locale-specific conversions to and from character strings are used where appropriate and available. This affects the names of the days and months.

The as.Date methods accept character strings, factors, logical NA and objects of classes POSIXlt and POSIXct. (The last is converted to days by ignoring the time after midnight in the representation of the time in specified time zone, default UTC.) Also objects of class date (from package date) and dates (from package chron). Character strings are processed as far as necessary for the format specified: any trailing characters are ignored.

*as.Date will accept numeric data (the number of days since an epoch), but only if origin is supplied.*

The format and as.character methods ignore any fractional part of the date.
-

Regards,
Carlos Ortega
www.qualityexcellence.es

On 9 July 2014 at 14:25, Alberto Soria alberto.so...@ari-solar.es wrote:

Hello everyone:

It must be something silly, but I cannot work out why the following line does not return the current date:

    as.Date(as.numeric(Sys.time()))

I ran this test because I cannot manage to turn a number that was converted from a date, and then modified, back into a date.

Thanks in advance.

Regards,
Alberto.