Re: [R] tried half-precision but size 2 is unknown on this machine
Following the posting guide, and hence reading the help page first, helps:

    Possible sizes are 1, 2, 4 and possibly 8 for integer or logical vectors, and 4, 8 and possibly 12/16 for numeric vectors.

Best,
Uwe Ligges

On 04.01.2015 08:03, Mike Miller wrote:
> Thanks for the pedantic insult, but no thanks. I'd rather just hear if anyone reading this is able to make something like this work on any architecture:
>
>     vec <- 1:10/10
>     con <- file("test.bin16", "wb")
>     writeBin(vec, con, size = 2)
>     close(con)
>
> If they can do it, they can tell me about it. That shouldn't ruin the list for anyone else.
>
> I can understand why a machine architecture would prevent floating-point operations with half-precision numbers, but I can't understand how it prevents us from encoding doubles as half-precision to store them in a file. They could then be read back in, translated on the fly into doubles.
>
> Like I said, I've been using integers instead of floats to store the numbers in files, but it could be slightly more convenient to use half-precision floats for storage instead of converting integers to floats.
>
> Almost forgot. Please tell me how this changes anything:
>
>     > sessionInfo()
>     R version 3.1.1 (2014-07-10)
>     Platform: x86_64-unknown-linux-gnu (64-bit)
>
>     locale:
>     [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=C
>         LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
>         LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>     attached base packages:
>     [1] stats graphics grDevices utils datasets methods base
>
>     loaded via a namespace (and not attached):
>     [1] tools_3.1.1
>
> Also, this is how the hexbin package is described:
>
>     Description: Binning and plotting functions for hexagonal bins.
>
> So I guess that suggestion wasn't helping me much, either.
>
> Mike
>
> On Sat, 3 Jan 2015, Jeff Newmiller wrote:
>> Your message is missing either a reproducible example or an indication of your R environment (such as the output of sessionInfo()).
>>
>> Yes, the machine architecture can prevent certain types of operations. This is however a poor venue for discussing such issues. I suggest that you investigate the hexbin package for binary data handling, and if you still have issues then post again, following the posting guide recommendations.
>>
>> Jeff Newmiller
>> Sent from my phone. Please excuse my brevity.
>>
>> On January 3, 2015 9:31:02 PM PST, Mike Miller <mbmille...@gmail.com> wrote:
>>> It's an IEEE standard format:
>>>
>>> http://en.wikipedia.org/wiki/Half-precision_floating-point_format#IEEE_754_half-precision_binary_floating-point_format:_binary16
>>>
>>> This is what I see:
>>>
>>>     > writeBin(vec, con, size = 2)
>>>     Error in writeBin(vec, con, size = 2) : size 2 is unknown on this machine
>>>
>>> I'm not sure what the machine has to do with it. It's really up to the software, isn't it?
>>>
>>> Is there a way to get R to read/write half-precision numbers (binary16)?
>>>
>>> It isn't a big deal for me because unsigned 16-bit integers are working well enough, but I'd like to have an answer for people who ask why I make them divide by 1000 all the time. ;-)
>>>
>>> Mike

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] tried half-precision but size 2 is unknown on this machine
On 04/01/2015 12:31 AM, Mike Miller wrote:
> It's an IEEE standard format:
>
> http://en.wikipedia.org/wiki/Half-precision_floating-point_format#IEEE_754_half-precision_binary_floating-point_format:_binary16
>
> This is what I see:
>
>     > writeBin(vec, con, size = 2)
>     Error in writeBin(vec, con, size = 2) : size 2 is unknown on this machine
>
> I'm not sure what the machine has to do with it. It's really up to the software, isn't it?

Yes, but R relies on the underlying C run-time library for a lot of things like this. On your platform, is there a C type corresponding to half precision? If so, let us know the details, and we'll possibly add it to writeBin.

> Is there a way to get R to read/write half-precision numbers (binary16)?

If it's not supported by the C run-time library and has to be done entirely using other types, that's the sort of thing that belongs in a user-contributed package. I'm not aware of one that already has it, so you may have to write this yourself.

Duncan Murdoch

> It isn't a big deal for me because unsigned 16-bit integers are working well enough, but I'd like to have an answer for people who ask why I make them divide by 1000 all the time. ;-)
>
> Mike
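To illustrate what doing the conversion "entirely using other types" could look like, here is a minimal sketch that reads binary16 bit patterns as unsigned 16-bit integers and decodes them to doubles in plain R. The function name half_to_double is made up for this example and is not part of base R or any package mentioned in the thread. Only decoding is shown; encoding doubles as binary16 would additionally need a rounding rule.

```r
# Hypothetical helper (not from base R or any package): decode IEEE 754
# binary16 bit patterns, given as integers in 0..65535, into doubles.
half_to_double <- function(h) {
  h    <- as.integer(h)
  sgn  <- ifelse(h >= 32768L, -1, 1)     # bit 15: sign
  expo <- (h %/% 1024L) %% 32L           # bits 10-14: biased exponent
  frac <- h %% 1024L                     # bits 0-9: fraction
  ifelse(expo == 0L,
         sgn * 2^(-14) * (frac / 1024),                    # zero and subnormals
         ifelse(expo == 31L,
                ifelse(frac == 0L, sgn * Inf, NaN),        # infinities and NaN
                sgn * 2^(expo - 15) * (1 + frac / 1024)))  # normal numbers
}

# Two little-endian binary16 values: 0x3C00 (1.0) and 0xC000 (-2.0).
bytes <- as.raw(c(0x00, 0x3c, 0x00, 0xc0))
bits  <- readBin(bytes, what = "integer", n = 2, size = 2,
                 signed = FALSE, endian = "little")
vals  <- half_to_double(bits)
print(vals)
```

Integer division and modulo are used instead of bit operations so that the sketch depends only on basic arithmetic.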
Re: [R] Separating a Complicated String Vector
I'm coming to R from Python, so I coded a Python3 solution:

    data = """alabama
    bates
    tuscaloosa
    smith
    arkansas
    fayette
    little rock
    alaska
    juneau
    nome""".split("\n")
    state_list = ["alabama", "arkansas", "alaska"]  # etc.

    return_list = []
    for word in data:
        if word in state_list:
            current_state = word
        else:
            return_list.append([current_state, word])
    print(return_list)

... and then translated it to R:

    data = "alabama
    bates
    tuscaloosa
    smith
    arkansas
    fayette
    little rock
    alaska
    juneau
    nome"
    data = strsplit(data, split="\n")[[1]]

    states = vector()
    cities = vector()
    for (word in data) {
        if (word %in% tolower(state.name)) {
            current_state = word
        } else {
            states = c(states, current_state)
            cities = c(cities, word)
        }
    }
    print(data.frame(V1=states, V2=cities))

-John

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Sunday, January 04, 2015 2:48 AM
To: npretnar
Cc: R-help@r-project.org
Subject: Re: [R] Separating a Complicated String Vector

On Jan 3, 2015, at 9:20 PM, npretnar wrote:
> Sorry. Bad example on my part. Try this. V1 is ...
>
>     V1
>     alabama
>     bates
>     tuscaloosa
>     smith
>     arkansas
>     fayette
>     little rock
>     alaska
>     juneau
>     nome
>
> And I want:
>
>     V1        V2
>     alabama   bates
>     alabama   tuscaloosa
>     alabama   smith
>     arkansas  fayette
>     arkansas  little rock
>     alaska    juneau
>     alaska    nome

    dat$is_state <- grepl(tolower(paste(state.name, collapse="|")), dat$V1)
    dat$thisstate <- cumsum(rownames(dat) %in% which(dat$is_state))
    dat2 <- data.frame(V1 = dat$V1[dat$is_state][dat$thisstate[!dat$is_state]],
                       V2 = dat$V1[!dat$is_state])
    dat2
            V1         V2
    1  alabama      bates
    2  alabama tuscaloosa
    3  alabama      smith
    4 arkansas    fayette
    5 arkansas     little
    6 arkansas       rock
    7   alaska     juneau
    8   alaska       nome

-- 
David.

> This is more representative of the problem, extended to all 50 states.
>
> - Nick
>
> On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
>> I'm not sure what's so complicated about that (am I missing something?). You can search using grep, and replace using gsub, so
>>
>>     tmpDF <- read.table(text="V1 V2
>>     A 5
>>     a1 1
>>     a2 1
>>     a3 1
>>     a4 1
>>     a5 1
>>     B 4
>>     b1 1
>>     b2 1
>>     b3 1
>>     b4 1", header=TRUE)
>>
>>     tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
>>     data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))
>>
>> Seems to do the trick.
>>
>> Best,
>> Ista
>>
>> On Sat, Jan 3, 2015 at 9:41 PM, npretnar <npret...@gmail.com> wrote:
>>> I have a string variable (V1) in a data frame structured as follows:
>>>
>>>     V1 V2
>>>     A  5
>>>     a1 1
>>>     a2 1
>>>     a3 1
>>>     a4 1
>>>     a5 1
>>>     B  4
>>>     b1 1
>>>     b2 1
>>>     b3 1
>>>     b4 1
>>>
>>> I want the following:
>>>
>>>     V1 V2 V3
>>>     a1 1  A
>>>     a2 1  A
>>>     a3 1  A
>>>     a4 1  A
>>>     a5 1  A
>>>     b1 1  B
>>>     b2 1  B
>>>     b3 1  B
>>>     b4 1  B
>>>
>>> I am not sure how to go about making this transformation besides writing a long vector that contains each of the categorical string names (these are state names, so it would be a really long vector).
>>>
>>> Any help would be greatly appreciated.
>>>
>>> Thanks,
>>> Nicholas Pretnar
>>> Mizzou Economics Grad Assistant
>>> npret...@gmail.com

David Winsemius
Alameda, CA, USA
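For comparison, a vectorized sketch of the same state/city split, assuming each element of the vector is either a lower-case US state name or a place belonging to the most recent state above it. It matches whole elements with %in% rather than substrings, so "little rock" stays in one piece. The variable names are chosen for this example only.

```r
x <- c("alabama", "bates", "tuscaloosa", "smith",
       "arkansas", "fayette", "little rock",
       "alaska", "juneau", "nome")

# TRUE where the whole element is a state name (state.name ships with base R)
is_state <- x %in% tolower(state.name)

# cumsum() numbers the state blocks 1, 2, 3, ...; indexing the state
# names with it carries each state down over the rows that follow it
state_of_row <- x[is_state][cumsum(is_state)]

result <- data.frame(V1 = state_of_row[!is_state],
                     V2 = x[!is_state],
                     stringsAsFactors = FALSE)
print(result)
```

A place whose name equals a state name (for example a city called "washington") would be treated as a state here, so real data may need an extra rule.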
Re: [R] sapply function and poisson distribution
Thank you for your answer. Yes, that sounds right. I thought the same thing, but the problem is: how can I generalize the command to any vectors of numbers, not only the specific example c(1,2), c(0.1,0.8)?

2015-01-04 0:45 GMT+00:00 Pete Brecknock [via R] <ml-node+s789695n4701358...@n4.nabble.com>:
> dimnik wrote
>> I want to find a function that takes in two vectors of numbers that have the same length. The output should be a list of vectors, where each vector is a sequence of randomly generated Poisson variables, where the number of samples in each vector is determined by the entries in the first input vector and the lambdas come from the entries in the second input vector.
>>
>> For example, if the inputs are c(1,2) and c(0.1,0.8), the output will be a list of two vectors where the first vector has a single sample from Poisson(0.1) and the second vector has two samples from Poisson(0.8). How can I do all that using the sapply function?
>>
>> Thank you in advance.
>
> How about using mapply, the multivariate version of sapply? Based on your example ...
>
>     mapply(function(x,y) rpois(x,y), c(1,2), c(0.1,0.8))
>
> HTH
>
> Pete
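One way to generalize the mapply call to arbitrary input vectors is to wrap it in a function. The sketch below is only an illustration; the name rpois_list is invented here. SIMPLIFY = FALSE is used so that the result stays a list even when all the sample sizes happen to be equal.

```r
# Hypothetical wrapper: draws n[i] Poisson values with mean lambda[i]
# and returns one vector per (n, lambda) pair.
rpois_list <- function(n, lambda) {
  stopifnot(length(n) == length(lambda))
  # mapply passes n[i] and lambda[i] as the first two arguments of rpois
  mapply(rpois, n, lambda, SIMPLIFY = FALSE)
}

set.seed(1)
res <- rpois_list(c(1, 2, 5), c(0.1, 0.8, 3))
str(res)
```

Map(rpois, n, lambda) would give the same kind of list, since Map is mapply without simplification.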
Re: [R] How to group by then count?
Dear Monnand,

One possible way would be to use as.factor(); in the summary you would get counts for every level. Like this:

    x = c("1", "1", "2", "1", "5", "2")
    summary(as.factor(x))

Cheers,
Christian

> Hi all,
>
> I thought this was a very naive problem, but I have not found any solution which is idiomatic to R.
>
> The problem is like this: assuming we have a vector of strings:
>
>     x = c("1", "1", "2", "1", "5", "2")
>
> We want to count the number of appearances of each string, i.e. in vector x, string "1" appears 3 times, "2" appears twice and "5" appears once. Then I want to know which string is the majority. In this case, it is "1".
>
> For imperative languages like C, C++, Java and Python, I would use a hash table to count each string, where keys are the strings and values are the number of appearances. For functional languages like Clojure, there are higher-order functions like group-by.
>
> However, for R, I can hardly find a good solution to this simple problem. I found a hash package, which implements a hash table. However, installing a package simply for a hash table is really annoying for me. I did find aggregate and other functions which operate on data frames. But in my case, it is a simple vector. Converting it to a data frame may not be desirable. (Or is it?)
>
> Could anyone suggest an idiomatic way of doing such a job in R? I would appreciate your help!
>
> -Monnand
Re: [R] tried half-precision but size 2 is unknown on this machine
Sorry about the dead lead on the package... it is hexView. It does not support FP16 directly though... You would have to find another way to make that conversion. Some people have posted code that may be usable with Rcpp [1]. I believe your architecture may support hardware conversion of FP32 to FP16. If you came up with a portable version, I imagine that would be a nice contribution to make to hexView.

[1] https://fgiesen.wordpress.com/2012/03/28/half-to-float-done-quic/

Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.

On January 3, 2015 11:03:19 PM PST, Mike Miller <mbmille...@gmail.com> wrote:
> Thanks for the pedantic insult, but no thanks. I'd rather just hear if anyone reading this is able to make something like this work on any architecture:
>
>     vec <- 1:10/10
>     con <- file("test.bin16", "wb")
>     writeBin(vec, con, size = 2)
>     close(con)
>
> If they can do it, they can tell me about it. That shouldn't ruin the list for anyone else.
>
> I can understand why a machine architecture would prevent floating-point operations with half-precision numbers, but I can't understand how it prevents us from encoding doubles as half-precision to store them in a file. They could then be read back in, translated on the fly into doubles.
>
> Like I said, I've been using integers instead of floats to store the numbers in files, but it could be slightly more convenient to use half-precision floats for storage instead of converting integers to floats.
>
> Almost forgot. Please tell me how this changes anything:
>
>     > sessionInfo()
>     R version 3.1.1 (2014-07-10)
>     Platform: x86_64-unknown-linux-gnu (64-bit)
>
>     locale:
>     [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=C
>         LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
>         LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>     attached base packages:
>     [1] stats graphics grDevices utils datasets methods base
>
>     loaded via a namespace (and not attached):
>     [1] tools_3.1.1
>
> Also, this is how the hexbin package is described:
>
>     Description: Binning and plotting functions for hexagonal bins.
>
> So I guess that suggestion wasn't helping me much, either.
>
> Mike
>
> On Sat, 3 Jan 2015, Jeff Newmiller wrote:
>> Your message is missing either a reproducible example or an indication of your R environment (such as the output of sessionInfo()).
>>
>> Yes, the machine architecture can prevent certain types of operations. This is however a poor venue for discussing such issues. I suggest that you investigate the hexbin package for binary data handling, and if you still have issues then post again, following the posting guide recommendations.
>>
>> Jeff Newmiller
>> Sent from my phone. Please excuse my brevity.
>>
>> On January 3, 2015 9:31:02 PM PST, Mike Miller <mbmille...@gmail.com> wrote:
>>> It's an IEEE standard format:
>>>
>>> http://en.wikipedia.org/wiki/Half-precision_floating-point_format#IEEE_754_half-precision_binary_floating-point_format:_binary16
>>>
>>> This is what I see:
>>>
>>>     > writeBin(vec, con, size = 2)
>>>     Error in writeBin(vec, con, size = 2) : size 2 is unknown on this machine
>>>
>>> I'm not sure what the machine has to do with it. It's really up to the software, isn't it?
>>>
>>> Is there a way to get R to read/write half-precision numbers (binary16)?
>>>
>>> It isn't a big deal for me because unsigned 16-bit integers are working well enough, but I'd like to have an answer for people who ask why I make them divide by 1000 all the time. ;-)
>>>
>>> Mike
Re: [R-es] Help identifying elements in the cluster
Hi, how are you?

Your problem is that what you call the name is a factor. Look at this:

    > cat(iris$Species[1])
    1
    > cat(as.character(iris$Species[1]))
    setosa

Regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On 4 January 2015, 10:39, Jose Manuel Veiga del Baño <chem...@um.es> wrote:
> Hello everyone,
>
> I have a problem that I cannot solve. I run the cluster analysis of 280 elements with this sequence:
>
>     library(cluster)
>     clusplot(mydata2, fit2$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)
>
> The 280 elements are plotted correctly, with each element's name replaced by a number. Now I need to know which element name corresponds to each number, and for that I use:
>
>     clusters <- sapply(unique(groups), function(x) mydata2$PESTICIDA[groups == x])
>
> but when I try to retrieve the name that corresponds to a number, I always get the number back and cannot get the name. That is, if I evaluate clusterx[k,1] I see the name, but when I pass it to cat to report it, I get the number again:
>
>     for (j in 1:ncluster) {
>         clusterx <- data.frame(clusters[j])
>         cat("Numero de cluster=", j, "\n")
>         for (k in 1:nrow(clusterx)) {
>             cat(clusterx[k,1], sep="//")
>         }
>     }
>
> I have searched but cannot find a way to identify the element. Has anyone run into this problem, or does anyone know how to solve it?
>
> Many thanks.
>
> Dr. José M. Veiga
> Dpt. Química Agrícola, Geología y Edafología.
> Universidad de Murcia.

_______________________________________________
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es
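The behaviour can be checked without the original data. The sketch below uses the built-in iris data set, as in the reply above, and captures what cat() prints so the two results can be compared; the variable names are only for this example. In the loop from the question, the corresponding change would presumably be cat(as.character(clusterx[k,1]), sep="//"), assuming that column is a factor.

```r
f <- iris$Species[1]   # a factor of length 1 whose label is "setosa"

# cat() works on the underlying integer codes of a factor ...
as_code <- capture.output(cat(f))
# ... so convert to character first to print the label
as_name <- capture.output(cat(as.character(f)))

print(c(code = as_code, name = as_name))
```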
Re: [R] problem with vegan function rda()
Lukas,

Lukas Kohl <llukas.kkohl at gmail.com> writes:
> Hello R-list
>
> Maybe someone knows what's going on here. I'm trying to re-run a script I wrote earlier this year using the function rda() in the vegan package. The script ran fine back then, and I did not change the dataset, so I was wondering whether there's some problem in an updated version of the package (I re-installed R + all packages since then).
>
> Thanks for any advice,
> Lukas Kohl

Sorry for the late reply: I don't follow this list regularly. A couple of points about your question:

(1) vegan indeed has function rda(), but its output has no resemblance to your example (except for the line repeating Call:). Either you are using some other package or you have made up your own version of rda() or you do not show its output.

(2) The rda() function in vegan 2.2-0. This is documented in NEWS. You can see this by issuing the vegandocs("NEWS") command after loading library(vegan).

(3) I have no idea what your script does (it does something different than vegan::rda()) and I cannot reproduce the problem without that knowledge.

Kind regards,
Jari Oksanen

PS. Sorry for not top-posting: Gmane does not allow it.
PS2. Sorry for removing some of your message: Gmane requires this.

> So here's the output I get:
>
>     > rda(rel)
>     Call:
>     rda(X = rel)
>
>     Regularization parameters:
>     NULL
>
>     Prior probabilities of groups:
>     NULL
>
>     Misclassification rate:
>            apparent: %
>     Warning message:
>     In is.na(x$error.rate[1]) : is.na() applied to non-(list or vector) of type 'NULL'
>
> There's no NA's in my dataset, which seems ok..
>
>     > sum(is.na(rel))
>     [1] 0
>     > nrow(rel)
>     [1] 59
>     > ncol(rel)
>     [1] 49
>     > head(rel)
>     X14.0 i.15.0 ai.15.0 br.16.0 X15.1 n.15.0 i.16.0
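Point (1) can be checked from the R prompt by asking which package a bare function name resolves to. The sketch below uses sd() from the stats package so that it runs without vegan installed; where_from is a helper name invented for this example. Applied to rda, it would show whether vegan or another attached package is supplying the function (klaR, for instance, also exports a function called rda, although the thread does not confirm which package was involved).

```r
# Hypothetical helper: name of the namespace a function was defined in
where_from <- function(f) environmentName(environment(f))

src   <- where_from(sd)   # sd() lives in the stats namespace
found <- find("sd")       # every attached package providing that name
print(src)
print(found)

# With vegan installed, the same check and an explicit call would be:
#   where_from(rda)
#   vegan::rda(rel)   # `rel` is the data set from the question
```

Calling a function with the package prefix, as in vegan::rda(), sidesteps masking by packages attached later.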
Re: [R] tried half-precision but size 2 is unknown on this machine
On 04/01/2015 12:12, Duncan Murdoch wrote:
> On 04/01/2015 12:31 AM, Mike Miller wrote:
>> It's an IEEE standard format:
>>
>> http://en.wikipedia.org/wiki/Half-precision_floating-point_format#IEEE_754_half-precision_binary_floating-point_format:_binary16
>>
>> This is what I see:
>>
>>     > writeBin(vec, con, size = 2)
>>     Error in writeBin(vec, con, size = 2) : size 2 is unknown on this machine
>>
>> I'm not sure what the machine has to do with it. It's really up to the software, isn't it?
>
> Yes, but R relies on the underlying C run-time library for a lot of things like this. On your platform, is there a C type corresponding to half precision? If so, let us know the details, and we'll possibly add it to writeBin.

There is an IEC60559 (aka IEEE 754) 'half-precision floating-point type', but I know of no support by a C runtime on any platform I have used (there is a lot more in IEC60559 which is almost never supported).

>> Is there a way to get R to read/write half-precision numbers (binary16)?
>
> If it's not supported by the C run-time library and has to be done entirely using other types, that's the sort of thing that belongs in a user-contributed package. I'm not aware of one that already has it, so you may have to write this yourself.

There is a C++ library called 'half' which could be wrapped. See http://half.sourceforge.net/ : it has a lot of compiler-specific code.

> Duncan Murdoch
>
>> It isn't a big deal for me because unsigned 16-bit integers are working well enough, but I'd like to have an answer for people who ask why I make them divide by 1000 all the time. ;-)
>>
>> Mike

-- 
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
[R-es] Help identifying elements in the cluster
Hello everyone,

I have a problem that I cannot solve. I run the cluster analysis of 280 elements with this sequence:

    library(cluster)
    clusplot(mydata2, fit2$cluster, color=TRUE, shade=TRUE, labels=2, lines=0)

The 280 elements are plotted correctly, with each element's name replaced by a number. Now I need to know which element name corresponds to each number, and for that I use:

    clusters <- sapply(unique(groups), function(x) mydata2$PESTICIDA[groups == x])

but when I try to retrieve the name that corresponds to a number, I always get the number back and cannot get the name. That is, if I evaluate clusterx[k,1] I see the name, but when I pass it to cat to report it, I get the number again:

    for (j in 1:ncluster) {
        clusterx <- data.frame(clusters[j])
        cat("Numero de cluster=", j, "\n")
        for (k in 1:nrow(clusterx)) {
            cat(clusterx[k,1], sep="//")
        }
    }

I have searched but cannot find a way to identify the element. Has anyone run into this problem, or does anyone know how to solve it?

Many thanks.

Dr. José M. Veiga
Dpt. Química Agrícola, Geología y Edafología.
Universidad de Murcia.
[R] How to group by then count?
Hi all,

I thought this was a very naive problem, but I have not found any solution which is idiomatic to R.

The problem is like this: assuming we have a vector of strings:

    x = c("1", "1", "2", "1", "5", "2")

We want to count the number of appearances of each string, i.e. in vector x, string "1" appears 3 times, "2" appears twice and "5" appears once. Then I want to know which string is the majority. In this case, it is "1".

For imperative languages like C, C++, Java and Python, I would use a hash table to count each string, where keys are the strings and values are the number of appearances. For functional languages like Clojure, there are higher-order functions like group-by.

However, for R, I can hardly find a good solution to this simple problem. I found a hash package, which implements a hash table. However, installing a package simply for a hash table is really annoying for me. I did find aggregate and other functions which operate on data frames. But in my case, it is a simple vector. Converting it to a data frame may not be desirable. (Or is it?)

Could anyone suggest an idiomatic way of doing such a job in R? I would appreciate your help!

-Monnand
Re: [R] How to group by then count?
On 04-01-2015, at 10:02, Monnand <monn...@gmail.com> wrote:
> Hi all,
>
> I thought this was a very naive problem, but I have not found any solution which is idiomatic to R.
>
> The problem is like this: assuming we have a vector of strings:
>
>     x = c("1", "1", "2", "1", "5", "2")
>
> We want to count the number of appearances of each string, i.e. in vector x, string "1" appears 3 times, "2" appears twice and "5" appears once. Then I want to know which string is the majority. In this case, it is "1".
>
> For imperative languages like C, C++, Java and Python, I would use a hash table to count each string, where keys are the strings and values are the number of appearances. For functional languages like Clojure, there are higher-order functions like group-by.
>
> However, for R, I can hardly find a good solution to this simple problem. I found a hash package, which implements a hash table. However, installing a package simply for a hash table is really annoying for me. I did find aggregate and other functions which operate on data frames. But in my case, it is a simple vector. Converting it to a data frame may not be desirable. (Or is it?)
>
> Could anyone suggest an idiomatic way of doing such a job in R? I would appreciate your help!

Have a look at table:

    ?table

Berend
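As a short sketch of what table() gives for the vector in the question: the counts come back as a named table, and which.max() picks out the most frequent value. The variable names are chosen for this example only.

```r
x <- c("1", "1", "2", "1", "5", "2")

counts <- table(x)   # one count per distinct string
print(counts)

# Name of the most frequent string; which.max() returns the first
# maximum, so ties are resolved in favour of the earlier level
majority <- names(counts)[which.max(counts)]
print(majority)

# Counts ordered from most to least frequent
print(sort(counts, decreasing = TRUE))
```

No data frame or extra package is needed; the table can be indexed by name, for example counts[["2"]].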
Re: [R] tried half-precision but size 2 is unknown on this machine
Thanks! So it looks like I can say R writeBin/readBin does not support half-precision floats, even though the error message "size 2 is unknown on this machine" seems to contradict that (for some machine). I tried to figure out from the source code (src/main/connections.c) how it decides what is possible, but that was a little beyond me. That was really just to satisfy my curiosity.

The unsigned 16-bit integer approach works well enough for me, and it has the advantage that I know it will always work on anyone's system. I'm working with numbers from 0 to 2 with no more than 4 significant digits, so a 16-bit float with 11 binary digits of precision was appealing. It's not hard to work with uint16, though, and od also reads it easily.

I've been working on a message about this application which I will share soon, probably later tonight. I'm also experimenting with lossy storage using a single byte per integer (uint8). That might be a good strategy because the numbers I'm working with are inherently imprecise. It seems to work fine in R, but it doesn't seem to work with GNU od (Linux/UNIX program), and that makes me wonder what else can handle it. uint16 seems the safer bet, and there is no loss of precision. Of course, the downside is that the uint16 file is twice as big as the uint8 file, and these files may be several hundred GB in size.

Mike

On Sun, 4 Jan 2015, Uwe Ligges wrote:
> Following the posting guide and hence reading the help page first helps:
>
>     Possible sizes are 1, 2, 4 and possibly 8 for integer or logical vectors, and 4, 8 and possibly 12/16 for numeric vectors.

On Sun, 4 Jan 2015, Duncan Murdoch wrote:
> On 04/01/2015 12:31 AM, Mike Miller wrote:
>> It's an IEEE standard format:
>>
>> http://en.wikipedia.org/wiki/Half-precision_floating-point_format#IEEE_754_half-precision_binary_floating-point_format:_binary16
>>
>> This is what I see:
>>
>>     > writeBin(vec, con, size = 2)
>>     Error in writeBin(vec, con, size = 2) : size 2 is unknown on this machine
>>
>> I'm not sure what the machine has to do with it. It's really up to the software, isn't it?
>
> Yes, but R relies on the underlying C run-time library for a lot of things like this. On your platform, is there a C type corresponding to half precision? If so, let us know the details, and we'll possibly add it to writeBin.
>
>> Is there a way to get R to read/write half-precision numbers (binary16)?
>
> If it's not supported by the C run-time library and has to be done entirely using other types, that's the sort of thing that belongs in a user-contributed package. I'm not aware of one that already has it, so you may have to write this yourself.
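The scaled-integer storage described above could look roughly like the sketch below. The scale factor of 1000 follows the "divide by 1000" remark earlier in the thread; the sample values, file name and variable names are assumptions made for illustration.

```r
vec  <- c(0, 0.123, 1.5, 1.999, 2)   # values in [0, 2], three decimals
path <- tempfile(fileext = ".bin")   # throwaway file for the demo

# Write: scale to integers and store two bytes per value
con <- file(path, "wb")
writeBin(as.integer(round(vec * 1000)), con, size = 2)
close(con)

# Read: treat the two-byte values as unsigned, then undo the scaling
con  <- file(path, "rb")
back <- readBin(con, what = "integer", n = length(vec),
                size = 2, signed = FALSE) / 1000
close(con)

print(back)
print(file.info(path)$size)   # two bytes per stored value
```

With signed = FALSE the stored codes are read back as 0 to 65535, so at this scale values up to 65.535 would be representable.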
Re: [R] How to group by then count?
This seems to me to be a case where thinking in terms of computer programming concepts is getting in the way a bit. Approach it as a data analysis task; the S language (upon which R is based) is designed in part for data analysis, so there is a function that does most of the job for you.

(I changed your vector of strings to make the result more easily interpreted.)

    x = c("1", "1", "2", "1", "5", "2", '3', '5', '5', '2', '2')
    tmp <- table(x)        ## counts the number of appearances of each element
    tmp[tmp==max(tmp)]     ## finds which one occurs most often
    2
    4

Meaning that the element '2' appears 4 times.

The table() function should be fast even with long vectors. Here's an example with a vector of length 1 million:

    foo <- table( sample(letters, 1e6, replace=TRUE) )

One of the seminal books on the S language is John M Chambers' "Programming with Data" -- and I would emphasize the "with Data" part of that title.

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 1/4/15, 1:02 AM, Monnand monn...@gmail.com wrote:

> Hi all,
>
> I thought this was a very naive problem, but I have not found any solution which is idiomatic to R.
>
> The problem is like this: Assuming we have a vector of strings:
>
>     x = c("1", "1", "2", "1", "5", "2")
>
> We want to count the number of appearances of each string, i.e. in vector x, string "1" appears 3 times, "2" appears twice and "5" appears once. Then I want to know which string is the majority. In this case, it is "1".
>
> For imperative languages like C, C++, Java and Python, I would use a hash table to count the strings, where keys are the strings and values are the numbers of appearances. For functional languages like Clojure, there are higher-order functions like group-by.
>
> However, for R, I can hardly find a good solution to this simple problem. I found a hash package, which implements a hash table. However, installing a package simply for a hash table is really annoying for me. I did find aggregate and other functions which operate on data frames. But in my case, it is a simple vector. Converting it to a data frame may not be desirable. (Or is it?)
>
> Could anyone suggest an idiomatic way of doing such a job in R? I would appreciate your help!
>
> -Monnand

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
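A minimal sketch of the table() approach applied to the vector from the original question. The variable names `counts` and `top` are illustrative only; the comparison against max() keeps every value tied for first place, which a single "majority" lookup would hide.

```r
# Count occurrences of each string and report the most frequent one(s).
x <- c("1", "1", "2", "1", "5", "2")

counts <- table(x)                           # one count per distinct value
top <- names(counts)[counts == max(counts)]  # all values tied for the maximum

print(counts)
print(top)
```

If only one winner is wanted, names(which.max(counts)) returns the first of any tied values.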
[R] dealing with NA in readBin() and writeBin()
The help doc for readBin and writeBin tells me this:

    Handling R's missing and special (Inf, -Inf and NaN) values is discussed in the ‘R Data Import/Export’ manual.

So I go here:

http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values

Unfortunately, I don't really understand that. Suppose I am using single-byte integers and I want 255 (binary 11111111) to be translated to NA. Is it possible to do that? Of course I could always do something like this:

    X[ X==255 ] <- NA

The problem with that is that I want to process the data on the fly, dividing the integer to produce a double in the range from 0 to 2:

    X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127

It looks like this still works:

    X[ X==255/127 ] <- NA

It would be neater if there were some kind of translation option for the input stream, like the way GNU tr (Linux/UNIX) works. I'm looking around and not finding such a thing. I can use gsub() to translate on the fly and then coerce back to integer format:

    X <- as.integer(gsub(255, NA, readBin( file, what="integer", n=N, size=1, signed=FALSE)))/127

What is your opinion of that tactic? Is there a better way? I don't know if that has any advantage over the postprocessing tactic above.

Maybe what I need is something like gsub() that can operate on numeric values...

    X <- numsub(255, NA, readBin( file, what="integer", n=N, size=1, signed=FALSE))/127

...but if that isn't better in terms of speed or memory usage than postprocessing like this...

    X[ X==255/127 ] <- NA

...then I really don't need it (for this, but it would be good to know about).

The na.strings="NA" functionality of scan() is neat, but I guess that doesn't work with the binary read system. I don't think I can scan the readBin input because it isn't a file or stdin.

Mike
Re: [R] dealing with NA in readBin() and writeBin()
On 04/01/2015 5:13 PM, Mike Miller wrote:
> The help doc for readBin and writeBin tells me this:
>
>     Handling R's missing and special (Inf, -Inf and NaN) values is discussed in the ‘R Data Import/Export’ manual.
>
> So I go here:
>
> http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values
>
> Unfortunately, I don't really understand that. Suppose I am using single-byte integers and I want 255 (binary 11111111) to be translated to NA. Is it possible to do that? Of course I could always do something like this:
>
>     X[ X==255 ] <- NA
>
> The problem with that is that I want to process the data on the fly, dividing the integer to produce a double in the range from 0 to 2:
>
>     X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127

Why? Why not do it in three steps, i.e.

    X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)
    X[ X==255 ] <- NA
    X <- X/127

If you are worried about the extra typing, then write a function to handle all three steps.

> It looks like this still works:
>
>     X[ X==255/127 ] <- NA

I suspect that would work on all current platforms, but I wouldn't trust it. Don't use == on floating point values unless you know they are fractions with 2^n in the denominator.

> It would be neater if there were some kind of translation option for the input stream, like the way GNU tr (Linux/UNIX) works. I'm looking around and not finding such a thing. I can use gsub() to translate on the fly and then coerce back to integer format:

It's really trivial to write a wrapper for readBin to do what you want:

    myReadBin <- function(...) {
      X <- readBin(...)
      X[ X==255 ] <- NA
      X
    }

Duncan Murdoch
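The three-step approach can be sketched end to end as follows. This is an illustration under stated assumptions, not code from the thread: the helper name `read_scaled_bytes`, its `sentinel` and `divisor` arguments, and the four-byte demo file are all hypothetical.

```r
# Hypothetical helper: read unsigned single-byte integers, treat one
# sentinel value (255 by default) as missing, then rescale.
read_scaled_bytes <- function(path, n, sentinel = 255L, divisor = 127) {
  con <- file(path, "rb")
  on.exit(close(con))
  x <- readBin(con, what = "integer", n = n, size = 1, signed = FALSE)
  x[x == sentinel] <- NA   # exact comparison is safe: x is still integer here
  x / divisor              # NA stays NA after the division
}

# Round trip through a temporary file to show the behaviour.
tmp <- tempfile(fileext = ".bin")
con <- file(tmp, "wb")
writeBin(as.raw(c(0, 127, 254, 255)), con)
close(con)

res <- read_scaled_bytes(tmp, n = 4)
unlink(tmp)
print(res)
```

Replacing the sentinel before dividing means the equality test is made on integers, so no floating-point comparison is needed.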
Re: [R] dealing with NA in readBin() and writeBin()
On Sun, 4 Jan 2015, Duncan Murdoch wrote:

> On 04/01/2015 5:13 PM, Mike Miller wrote:
>> Suppose I am using single-byte integers and I want 255 (binary 11111111) to be translated to NA. Is it possible to do that? Of course I could always do something like this:
>>
>>     X[ X==255 ] <- NA
>>
>> The problem with that is that I want to process the data on the fly, dividing the integer to produce a double in the range from 0 to 2:
>>
>>     X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127
>
> Why? Why not do it in three steps, i.e.
>
>     X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)
>     X[ X==255 ] <- NA
>     X <- X/127
>
> If you are worried about the extra typing, then write a function to handle all three steps.

The thing I was concerned about is the memory usage, not the typing, because everything will be scripted. But maybe memory isn't an issue and I never have to hold two copies in memory simultaneously. There will be about 50 million elements, typically.

I think in terms of processing numbers that are streaming into memory, but that might not be what R is doing. For example, with scan() and na.strings="NA", I picture it changing strings to NA as they are read, but it might load the whole file as character, then do all the work with things like what=numeric() and na.strings="NA" after the fact. Maybe that doesn't impose an extra memory burden.

>> It looks like this still works:
>>
>>     X[ X==255/127 ] <- NA
>
> I suspect that would work on all current platforms, but I wouldn't trust it. Don't use == on floating point values unless you know they are fractions with 2^n in the denominator.

Good point about platforms. I was concerned about the use of ==, and you've convinced me it is not trustworthy. Thanks very much.

Mike
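The warning about == on floating-point values can be illustrated with a short sketch. The variable names are illustrative; the behaviour shown assumes standard IEEE 754 double arithmetic, which R uses on all common platforms.

```r
# Fractions whose denominator is a power of two are stored exactly ...
exact <- (0.25 + 0.5) == 0.75

# ... but most decimal fractions are not, so == can fail unexpectedly.
inexact <- (0.1 + 0.2) == 0.3

# A tolerance-based comparison is the usual alternative.
close_enough <- isTRUE(all.equal(0.1 + 0.2, 0.3))

print(c(exact = exact, inexact = inexact, close_enough = close_enough))
```

For the byte data in this thread, the simplest way to avoid the issue is to test for 255 while the values are still integers, before dividing by 127.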
Re: [R] sapply function and poisson distribution
dimnik wrote
> Thank you for your answer. Yes, that sounds right. I thought the same thing, but the problem is: how can I generalize the command for every vector of numbers, not only for the specific example c(1,2), c(0.1,0.8)?
>
> 2015-01-04 0:45 GMT+00:00 Pete Brecknock [via R]:
>
>> dimnik wrote
>>> I want to find a function that takes in two vectors of numbers that have the same length. The output should be a list of vectors, where each vector is a sequence of randomly generated Poisson variables, where the number of samples in each vector is determined by the entries in the first input vector and the lambdas come from the entries in the second input vector.
>>>
>>> For example: if the inputs are c(1,2) and c(0.1,0.8), the output will be a list of two vectors, where the first vector has a single sample from Poisson(0.1) and the second vector has two samples from Poisson(0.8). How can I do all that kind of stuff using the sapply function?
>>>
>>> Thank you in advance
>>
>> How about using mapply, the multivariate version of sapply?
>>
>> Based on your example ...
>>
>>     mapply(function(x,y) rpois(x,y), c(1,2),c(0.1,0.8))
>>
>> HTH
>>
>> Pete

Not sure how you intend to specify the input vectors for n and lambda.

One way would be as below; you can amend the 2 vectors with the values of your choice.

    n <- c(1,2,3,4,5)
    lambda <- c(0.1,0.8,1.2,2.2,4.2)
    mapply(function(x,y) rpois(x,y), n, lambda)

HTH

Pete

--
View this message in context: http://r.789695.n4.nabble.com/sapply-function-and-poisson-distribution-tp4701353p4701384.html
Sent from the R help mailing list archive at Nabble.com.
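One way to generalize the mapply() call is to wrap it in a function. The sketch below is an assumption-laden illustration: the name `rpois_list` and its length check are not from the thread. It also sets SIMPLIFY = FALSE, because mapply() otherwise collapses the result into a matrix or vector whenever every sample has the same length, which would break the "list of vectors" requirement.

```r
# Hypothetical wrapper: one Poisson sample vector per (n, lambda) pair.
rpois_list <- function(n, lambda) {
  stopifnot(length(n) == length(lambda))
  # SIMPLIFY = FALSE keeps the result a list even when all n are equal.
  mapply(function(k, l) rpois(k, l), n, lambda, SIMPLIFY = FALSE)
}

set.seed(1)  # only to make the illustration repeatable
out <- rpois_list(c(1, 2), c(0.1, 0.8))
print(out)

# Equal sample sizes still give a list rather than a matrix.
same_size <- rpois_list(c(2, 2), c(1, 1))
```

The same effect is available with Map(rpois, n, lambda), which always returns a list.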
[R] counting sets of consecutive integers in a vector
I have a vector of sorted positive integer values (e.g., positive integers after applying sort() and unique()). For example, this:

    c(1,2,5,6,7,8,25,30,31,32,33)

I want to make a matrix from that vector that has two columns: (1) the first value in every run of consecutive integer values, and (2) the corresponding number of consecutive values. For example:

    c(1:20)

would become this...

    1 20

...because there are 20 consecutive integers beginning with 1 and

    c(1,2,5,6,7,8,25,30,31,32,33)

would become

     1 2
     5 4
    25 1
    30 4

What would be the best way to accomplish this? Here is my first effort:

    v <- c(1,2,5,6,7,8,25,30,31,32,33)
    L <- rle( v - 1:length(v) )$lengths
    n <- length( L )
    matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)
         [,1] [,2]
    [1,]    1    2
    [2,]    5    4
    [3,]   25    1
    [4,]   30    4

I suppose that works well enough, but there may be a better way, and besides, I wouldn't want to deny anyone here the opportunity to solve a fun puzzle. ;-)

The use for this is that I will be doing repeated seeks of a binary file to extract data. seek() gives the starting point and readBin(n=X) gives the number of bytes to read. So when there are many consecutive variables to be read, I can multiply the X in n=X by that number instead of doing many different seek() calls. (The data are in a transposed format where I read in every record for some variable as sequential elements.) I'm probably not the first person to deal with this.

Best,

Mike

--
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4J
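The seek-and-read use case described above might look like the following sketch. Everything about the file is an assumption made for illustration: a demo file of 40 four-byte integers, one value per position, with element i holding 10 * i. Note that readBin()'s n argument counts items of `size` bytes each rather than bytes, so the byte offset for seek() is computed separately.

```r
# Build the two-column run matrix (start value, run length) for sorted,
# unique integer positions v.
v <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)
starts <- v[c(1, which(diff(v) != 1) + 1)]
lens   <- rle(v - seq_along(v))$lengths
runs   <- cbind(start = starts, length = lens)

# Demo file (assumed layout): element i holds the value 10 * i.
tmp <- tempfile(fileext = ".bin")
con <- file(tmp, "wb")
writeBin(as.integer(10 * (1:40)), con, size = 4)
close(con)

# One seek() and one readBin() per run instead of one per element.
con <- file(tmp, "rb")
got <- unlist(lapply(seq_len(nrow(runs)), function(i) {
  seek(con, where = (runs[i, "start"] - 1) * 4, origin = "start")
  readBin(con, what = "integer", n = runs[i, "length"], size = 4)
}))
close(con)
unlink(tmp)
print(got)
```

With 11 positions in 4 runs, this makes 4 seek() calls rather than 11.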
Re: [R] counting sets of consecutive integers in a vector
Tena koe Mike

An alternative, which is slightly faster:

    diffv <- diff(v)
    starts <- c(1, which(diffv!=1)+1)
    cbind(v[starts], c(diff(starts), length(v)-starts[length(starts)]+1))

Peter Alspach
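The diff()-based idea can be wrapped in a small function that also copes with empty and single-element input. This is a sketch; the name `consecutive_runs` and the column names are hypothetical, and the input is assumed to be sorted and free of duplicates, as in the question.

```r
# Hypothetical helper wrapping the diff()-based approach.
consecutive_runs <- function(v) {
  if (length(v) == 0L) {
    return(cbind(start = numeric(0), length = integer(0)))
  }
  starts <- c(1L, which(diff(v) != 1) + 1L)  # index where each run begins
  lens   <- diff(c(starts, length(v) + 1L))  # distance to the next start
  cbind(start = v[starts], length = lens)
}

runs <- consecutive_runs(c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33))
print(runs)
```

Appending length(v) + 1 as a sentinel start lets a single diff() produce every run length, including the last one.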
Re: [R] counting sets of consecutive integers in a vector
Here is another approach:

    v <- c(1,2,5,6,7,8,25,30,31,32,33)
    # split by differences != 1
    t(sapply(split(v, cumsum(c(1, diff(v)) != 1)), function(x){
        c(value = x[1L], length = length(x))  # output first value and length
    }))
      value length
    0     1      2
    1     5      4
    2    25      1
    3    30      4

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Re: [R] counting sets of consecutive integers in a vector
Here is a solution using data.table

    require(data.table)
    x <- data.table(v, diff = cumsum(c(1, diff(v)) != 1))
    x
         v diff
     1:  1    0
     2:  2    0
     3:  5    1
     4:  6    1
     5:  7    1
     6:  8    1
     7: 25    2
     8: 30    3
     9: 31    3
    10: 32    3
    11: 33    3
    x[, list(value = v[1L], length = .N), key = 'diff']
       diff value length
    1:    0     1      2
    2:    1     5      4
    3:    2    25      1
    4:    3    30      4
    x[, list(value = v[1L], length = .N), key = 'diff'][, -1, with = FALSE]  # get rid of 'diff' column
       value length
    1:     1      2
    2:     5      4
    3:    25      1
    4:    30      4

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Re: [R] Separating a Complicated String Vector
    f <- function (x) {
        isState <- is.element(tolower(x), tolower(state.name))
        w <- which(isState)
        data.frame(State = x[rep(w, diff(c(w, length(x) + 1)) - 1L)],
                   City = x[!isState])
    }

E.g.,

    V1 <- c("alabama", "bates", "tuscaloosa", "smith", "arkansas", "fayette", "little rock", "alaska", "juneau", "nome")
    f(V1)
         State        City
    1  alabama       bates
    2  alabama  tuscaloosa
    3  alabama       smith
    4 arkansas     fayette
    5 arkansas little rock
    6   alaska      juneau
    7   alaska        nome

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Jan 3, 2015 at 9:20 PM, npretnar npret...@gmail.com wrote:

> Sorry. Bad example on my part. Try this. V1 is ...
>
>     V1
>     alabama
>     bates
>     tuscaloosa
>     smith
>     arkansas
>     fayette
>     little rock
>     alaska
>     juneau
>     nome
>
> And I want:
>
>     V1        V2
>     alabama   bates
>     alabama   tuscaloosa
>     alabama   smith
>     arkansas  fayette
>     arkansas  little rock
>     alaska    juneau
>     alaska    nome
>
> This is more representative of the problem, extended to all 50 states.
>
> - Nick
>
> On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
>
>> I'm not sure what's so complicated about that (am I missing something?). You can search using grep, and replace using gsub, so
>>
>>     tmpDF <- read.table(text="V1 V2
>>     A 5
>>     a1 1
>>     a2 1
>>     a3 1
>>     a4 1
>>     a5 1
>>     B 4
>>     b1 1
>>     b2 1
>>     b3 1
>>     b4 1", header=TRUE)
>>     tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
>>     data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))
>>
>> Seems to do the trick.
>>
>> Best,
>> Ista
>>
>> On Sat, Jan 3, 2015 at 9:41 PM, npretnar npret...@gmail.com wrote:
>>> I have a string variable (V1) in a data frame structured as follows:
>>>
>>>     V1 V2
>>>     A  5
>>>     a1 1
>>>     a2 1
>>>     a3 1
>>>     a4 1
>>>     a5 1
>>>     B  4
>>>     b1 1
>>>     b2 1
>>>     b3 1
>>>     b4 1
>>>
>>> I want the following:
>>>
>>>     V1 V2 V3
>>>     a1 1  A
>>>     a2 1  A
>>>     a3 1  A
>>>     a4 1  A
>>>     a5 1  A
>>>     b1 1  B
>>>     b2 1  B
>>>     b3 1  B
>>>     b4 1  B
>>>
>>> I am not sure how to go about making this transformation besides writing a long vector that contains each of the categorical string names (these are state names, so it would be a really long vector). Any help would be greatly appreciated.
>>>
>>> Thanks,
>>>
>>> Nicholas Pretnar
>>> Mizzou Economics Grad Assistant
>>> npret...@gmail.com
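An alternative formulation of the same idea, offered as a sketch rather than a replacement for f(): cumsum() over the "is this a state?" flag labels each element with the index of the most recent state header. The variable names are illustrative, and it is assumed that every state name precedes its cities and that no city shares its name with a state.

```r
x <- c("alabama", "bates", "tuscaloosa", "smith",
       "arkansas", "fayette", "little rock",
       "alaska", "juneau", "nome")

# state.name ships with base R (datasets package).
is_state <- tolower(x) %in% tolower(state.name)
group    <- cumsum(is_state)   # running index of the current state header

res <- data.frame(State = x[is_state][group][!is_state],
                  City  = x[!is_state],
                  stringsAsFactors = FALSE)
print(res)
```

A city such as "washington" or "new york" would be treated as a state header by either version, so real data may need an extra column or marker to tell the two apart.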
Re: [R] counting sets of consecutive integers in a vector
Thanks, Peter. Why not cbind your idea for the first column with my idea for the second column and get it done in one line?:

    v <- c(1,2,5,6,7,8,25,30,31,32,33)
    M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths )
    M
         [,1] [,2]
    [1,]    1    2
    [2,]    5    4
    [3,]   25    1
    [4,]   30    4

I find that pretty appealing and I'll probably stick with it. It seems quite fast. Here's an example:

    # make fairly long vector
    v <- sort(unique(round(10*runif(10
    length(v)
    [1] 63274

    # time the procedure:
    ptm <- proc.time() ; M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths ) ; proc.time() - ptm
       user  system elapsed
       0.03    0.00    0.03
    dim(M)
    [1] 23212     2

I probably won't be using vectors any longer than that, and this isn't the kind of thing that I do over and over again, so that speed is excellent.

Mike
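The timing experiment can be reproduced along these lines. The vector size is an assumption: the archived message lost the numbers used to build the test vector, and N = 1e5 is only a guess that is roughly consistent with the reported length of about 63,000. system.time() is used here in place of the proc.time() bookkeeping; timings will differ by machine.

```r
set.seed(42)   # repeatable illustration only
N <- 1e5       # assumed size, not taken from the original message
v <- sort(unique(round(N * runif(N))))

timing <- system.time(
  M <- cbind(v[c(1, which(diff(v) != 1) + 1)],
             rle(v - seq_along(v))$lengths)
)
print(length(v))
print(timing)
print(dim(M))
```

A quick sanity check on any such result is that the run lengths in the second column sum to length(v).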