Re: [R] tried half-precision but size 2 is unknown on this machine

2015-01-04 Thread Uwe Ligges

Following the posting guide and hence reading the help page first helps:

Possible sizes are 1, 2, 4 and possibly 8 for integer or logical 
vectors, and 4, 8 and possibly 12/16 for numeric vectors.
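
To illustrate the sizes quoted from the help page, here is a minimal sketch (the temporary file and the variable names are assumptions for illustration, not part of the original exchange). It suggests that size = 2 is accepted for integer vectors, while the same size for a double vector raises the error in the subject line:

```r
# Sketch: size = 2 works for integer vectors but not for doubles.
tmp <- tempfile(fileext = ".bin")   # hypothetical scratch file

con <- file(tmp, "wb")
writeBin(c(0L, 100L, 1000L), con, size = 2)   # three integers, 2 bytes each
close(con)

con <- file(tmp, "rb")
ints <- readBin(con, what = "integer", n = 3, size = 2, signed = FALSE)
close(con)
ints

# The same size with a double vector is rejected; capture the message.
err <- tryCatch(writeBin(c(0.1, 0.2), raw(), size = 2),
                error = function(e) conditionMessage(e))
err
```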


Best,
Uwe Ligges


On 04.01.2015 08:03, Mike Miller wrote:

Thanks for the pedantic insult, but no thanks.  I'd rather just hear if
anyone reading this is able to make something like this work on any
architecture:

vec <- 1:10/10
con <- file("test.bin16", "wb")
writeBin(vec, con, size = 2)
close(con)

If they can do it, they can tell me about it.  That shouldn't ruin the
list for anyone else.

I can understand why a machine architecture would prevent floating-point
operations with half-precision numbers, but I can't understand how it
prevents us from encoding doubles as half-precision to store them in a
file.  They could then be read back in, translated on the fly into
doubles.  Like I said, I've been using integers instead of floats to
store the numbers in files, but it could be slightly more convenient to
use half-precision floats for storage instead of converting integers to
floats.

Almost forgot.  Please tell me how this changes anything:


sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C   LC_TIME=en_US.UTF-8
LC_COLLATE=C   LC_MONETARY=en_US.UTF-8   LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8   LC_NAME=C
LC_ADDRESS=C   LC_TELEPHONE=C   LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.1.1


Also, this is how the hexbin package is described:

Description Binning and plotting functions for hexagonal bins.

So I guess that suggestion wasn't helping me much, either.

Mike


On Sat, 3 Jan 2015, Jeff Newmiller wrote:


Your message is missing either a reproducible example or an indication
of your R environment (such as the output of sessionInfo()).

Yes, the machine architecture can prevent certain types of operations.
This is however a poor venue for discussing such issues.

I suggest that you investigate the hexbin package for binary data
handling, and if you still have issues then post again, following the
posting guide recommendations.

---

Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---

Sent from my phone. Please excuse my brevity.

On January 3, 2015 9:31:02 PM PST, Mike Miller mbmille...@gmail.com
wrote:

It's an IEEE standard format:

http://en.wikipedia.org/wiki/Half-precision_floating-point_format#IEEE_754_half-precision_binary_floating-point_format:_binary16


This is what I see:

writeBin(vec , con, size=2 )
Error in writeBin(vec, con, size = 2) : size 2 is unknown on this machine

I'm not sure what the machine has to do with it.  It's really up to the
software, isn't it?

Is there a way to get R to read/write half-precision numbers (binary16)?

It isn't a big deal for me because unsigned 16-bit integers are working
well enough, but I'd like to have an answer for people who ask why I make
them divide by 1000 all the time.  ;-)

Mike

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






Re: [R] tried half-precision but size 2 is unknown on this machine

2015-01-04 Thread Duncan Murdoch
On 04/01/2015 12:31 AM, Mike Miller wrote:
 It's an IEEE standard format:
 
 http://en.wikipedia.org/wiki/Half-precision_floating-point_format#IEEE_754_half-precision_binary_floating-point_format:_binary16
 
 This is what I see:
 
 writeBin(vec , con, size=2 )
 Error in writeBin(vec, con, size = 2) : size 2 is unknown on this machine
 
 I'm not sure what the machine has to do with it.  It's really up to the 
 software, isn't it?

Yes, but R relies on the underlying C run-time library for a lot of
things like this.  On your platform, is there a C type corresponding to
half precision?  If so, let us know the details, and we'll possibly add
it to writeBin.


 
 Is there a way to get R to read/write half-precision numbers (binary16)?

If it's not supported by the C run-time library and has to be done
entirely using other types, that's the sort of thing that belongs in a
user-contributed package.  I'm not aware of one that already has it, so
you may have to write this yourself.
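
As a rough illustration of what such a package would need to do, here is a minimal sketch in plain R. The helper names half_encode() and half_decode() are hypothetical, not functions from R or any existing package. It assumes finite input, encodes -0 as +0, and relies on floor(log2(x)) giving the right exponent, so treat it as a starting point rather than a tested implementation:

```r
# Sketch only: pack finite doubles into IEEE 754 binary16 bit patterns.
# half_encode() and half_decode() are hypothetical helpers.
half_encode <- function(x) {
  stopifnot(is.numeric(x), all(is.finite(x)))
  ax   <- abs(x)
  sgn  <- ifelse(x < 0, 32768, 0)        # sign bit (bit 15)
  e    <- pmax(floor(log2(ax)), -14)     # unbiased exponent, clamped for subnormals
  m    <- round(ax / 2^e * 1024)         # significand scaled to 10 fraction bits
  bits <- (e + 14) * 1024 + m            # exponent field and fraction combined
  bits[bits >= 31744] <- 31744           # too large for binary16: infinity
  as.integer(sgn + bits)
}

half_decode <- function(h) {
  s <- ifelse(h >= 32768, -1, 1)
  e <- (h %/% 1024) %% 32                # 5-bit exponent field
  f <- h %% 1024                         # 10-bit fraction field
  ifelse(e == 0,
         s * f * 2^(-24),                # zero and subnormals
         ifelse(e == 31,
                ifelse(f == 0, s * Inf, NaN),
                s * (1 + f / 1024) * 2^(e - 15)))
}

# Round trip through a file. Only non-negative values are written here,
# so every bit pattern stays below 32768.
vec <- 1:10 / 10
tmp <- tempfile(fileext = ".bin16")
con <- file(tmp, "wb")
writeBin(half_encode(vec), con, size = 2, endian = "little")
close(con)

con <- file(tmp, "rb")
back <- half_decode(readBin(con, what = "integer", n = length(vec), size = 2,
                            signed = FALSE, endian = "little"))
close(con)
max(abs(back - vec))                     # rounding error of the round trip
```

Values beyond the binary16 range come back as Inf, and numbers such as 0.1 come back slightly rounded, since only about 3 significant decimal digits survive.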

Duncan Murdoch

 
 It isn't a big deal for me because unsigned 16-bit integers are working 
 well enough, but I'd like to have an answer for people who ask why I make 
 them divide by 1000 all the time.  ;-)
 
 Mike
 


Re: [R] Separating a Complicated String Vector

2015-01-04 Thread John Posner
I'm coming to R from Python, so I coded a Python3 solution:

#
data = """alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome
""".splitlines()

state_list = ["alabama", "arkansas", "alaska"]   # etc.

return_list = []
for word in data:
    if word in state_list:
        current_state = word
    else:
        return_list.append([current_state, word])

print(return_list)
#

... and then translated it to R:

#
data = "alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome
"

data = strsplit(data, split="\n")[[1]]

states = vector()
cities = vector()

for (word in data) {
  if (word %in% tolower(state.name)) {
    current_state = word
  } else {
    states = c(states, current_state)
    cities = c(cities, word)
  }
}

print(data.frame(V1=states, V2=cities))
#

-John




 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David
 Winsemius
 Sent: Sunday, January 04, 2015 2:48 AM
 To: npretnar
 Cc: R-help@r-project.org
 Subject: Re: [R] Separating a Complicated String Vector


 On Jan 3, 2015, at 9:20 PM, npretnar wrote:

  Sorry. Bad example on my part. Try this. V1 is ...
 
  V1
  alabama
  bates
  tuscaloosa
  smith
  arkansas
  fayette
  little rock
  alaska
  juneau
  nome
 
  And I want:
 
  V1        V2
  alabama   bates
  alabama   tuscaloosa
  alabama   smith
  arkansas  fayette
  arkansas  little rock
  alaska    juneau
  alaska    nome


 dat$is_state <- grepl(tolower(paste(state.name, collapse="|")), dat$V1)

 dat$thisstate <- cumsum(rownames(dat) %in% which(dat$is_state))
 dat2 <- data.frame(V1 = dat$V1[dat$is_state][dat$thisstate[!dat$is_state]],
                    V2 = dat$V1[!dat$is_state])


  dat2
          V1         V2
 1   alabama      bates
 2   alabama tuscaloosa
 3   alabama      smith
 4  arkansas    fayette
 5  arkansas     little
 6  arkansas       rock
 7    alaska     juneau
 8    alaska       nome

 --
 David.
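
 The output above splits "little rock" into two rows. For what it is worth, here is a minimal sketch that keeps multi-word city names together, assuming the values are held as a character vector with one entry per line and that the first entry is a state (the vector v1 below is typed in by hand for illustration):

```r
# Sketch: label each city with the most recent state name seen above it.
v1 <- c("alabama", "bates", "tuscaloosa", "smith",
        "arkansas", "fayette", "little rock",
        "alaska", "juneau", "nome")

is_state <- v1 %in% tolower(state.name)   # exact match against built-in state names
group    <- cumsum(is_state)              # running index of the current state

dat2 <- data.frame(V1 = v1[is_state][group[!is_state]],
                   V2 = v1[!is_state],
                   stringsAsFactors = FALSE)
dat2
```

 Exact matching with %in% also avoids treating a city as a state just because its name contains a state name as a substring.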

 
  This is more representative of the problem, extended to all 50 states.
 
  - Nick
 
 
  On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
 
  I'm not sure what's so complicated about that (am I missing
  something?). You can search using grep, and replace using gsub, so
 
  tmpDF <- read.table(text="V1  V2
  A   5
  a1  1
  a2  1
  a3  1
  a4  1
  a5  1
  B   4
  b1  1
  b2  1
  b3  1
  b4  1",
                      header=TRUE)
  tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
  data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))
 
  Seems to do the trick.
 
  Best,
  Ista
 
  On Sat, Jan 3, 2015 at 9:41 PM, npretnar npret...@gmail.com wrote:
  I have a string variable (V1) in a data frame structured as follows:
 
  V1  V2
  A   5
  a1  1
  a2  1
  a3  1
  a4  1
  a5  1
  B   4
  b1  1
  b2  1
  b3  1
  b4  1
 
  I want the following:
 
  V1  V2  V3
  a1  1   A
  a2  1   A
  a3  1   A
  a4  1   A
  a5  1   A
  b1  1   B
  b2  1   B
  b3  1   B
  b4  1   B
 
  I am not sure how to go about making this transformation besides
  writing a long vector that contains each of the categorical string names
  (these are state names, so it would be a really long vector). Any help
  would be greatly appreciated.
 
  Thanks,
 
  Nicholas Pretnar
  Mizzou Economics Grad Assistant
  npret...@gmail.com


 David Winsemius
 Alameda, CA, USA



Re: [R] sapply function and poisson distribution

2015-01-04 Thread dimnik
Thank you for your answer. Yes, that sounds right; I thought of the same
thing. The problem is how to generalize the command to any pair of vectors
of numbers, not only the specific example c(1,2), c(0.1,0.8).

2015-01-04 0:45 GMT+00:00 Pete Brecknock [via R] 
ml-node+s789695n4701358...@n4.nabble.com:

  dimnik wrote
 I want to write a function that takes two vectors of numbers of the same
 length. The output should be a list of vectors, where each vector is a
 sequence of randomly generated Poisson variables. The number of samples in
 each vector is determined by the entries in the first input vector, and the
 lambdas come from the entries in the second input vector. For example, if
 the inputs are c(1,2) and c(0.1,0.8), the output will be a list of two
 vectors where the first vector has a single sample from Poisson(0.1) and
 the second vector has two samples from Poisson(0.8). How can I do this
 using the sapply function?
 Thank you in advance.

 How about using mapply, the multivariate version of sapply?

 Based on your example ...

 mapply(function(x,y) rpois(x,y), c(1,2),c(0.1,0.8))
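
 Regarding the follow-up question about generalizing this call, one possible sketch wraps it in a function. The name rpois_list is made up for illustration; SIMPLIFY = FALSE is a standard mapply() argument, used here so the result stays a list even when all counts are equal:

```r
# Sketch: rpois_list() is a hypothetical wrapper around the mapply() call.
rpois_list <- function(n, lambda) {
  stopifnot(length(n) == length(lambda))
  # Without SIMPLIFY = FALSE, equal counts would be collapsed into a matrix.
  mapply(function(k, l) rpois(k, l), n, lambda, SIMPLIFY = FALSE)
}

set.seed(1)                       # only to make the draws repeatable
out <- rpois_list(c(1, 2), c(0.1, 0.8))
lengths(out)                      # number of samples in each list element
```

 The same function then accepts any two vectors of equal length, for example rpois_list(c(3, 3, 5), c(1, 2, 10)).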

 HTH

 Pete








Re: [R] How to group by then count?

2015-01-04 Thread Christian Brandstätter
Dear Monnad,

one possible way would be to use as.factor() and in the summary you would get 
counts for every level.

Like this:

x = c("1", "1", "2", "1", "5", "2")

summary(as.factor(x))

Cheers, Christian


 Hi all,

 I thought this was a very naive problem but I have not found any solution
 which is idiomatic to R.

 The problem is like this:

 Assuming we have a vector of strings:
   x = c("1", "1", "2", "1", "5", "2")

 We want to count number of appearance of each string. i.e. in vector x,
 string 1 appears 3 times; 2 appears twice and 5 appears once. Then I
 want to know which string is the majority. In this case, it is 1.

 For imperative languages like C, C++ Java and python, I would use a hash
 table to count each strings where keys are the strings and values are the
 number of appearance. For functional languages like clojure, there're
 higher order functions like group-by.

 However, for R, I can hardly find a good solution to this simple problem. I
 found a hash package, which implements hash table. However, installing a
 package simple for a hash table is really annoying for me. I did find
 aggregate and other functions which operates on data frames. But in my
 case, it is a simple vector. Converting it to a data frame may be not
 desirable. (Or is it?)

 Could anyone suggest me an idiomatic way of doing such job in R? I would be
 appreciate for your help!

 -Monnand

   [[alternative HTML version deleted]]



Re: [R] tried half-precision but size 2 is unknown on this machine

2015-01-04 Thread Jeff Newmiller
Sorry about the dead lead on the package... it is hexView.  It does not support 
FP16 directly though... You would have to find another way to make that 
conversion. Some people have posted code that may be usable with Rcpp [1]. I 
believe your architecture may support hardware conversion of FP32 to FP16. If 
you came up with a portable version, I imagine that would be a nice 
contribution to make to hexView.

[1] https://fgiesen.wordpress.com/2012/03/28/half-to-float-done-quic/
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
--- 
Sent from my phone. Please excuse my brevity.

On January 3, 2015 11:03:19 PM PST, Mike Miller mbmille...@gmail.com wrote:
Thanks for the pedantic insult, but no thanks.  I'd rather just hear if 
anyone reading this is able to make something like this work on any 
architecture:

vec <- 1:10/10
con <- file("test.bin16", "wb")
writeBin(vec, con, size = 2)
close(con)

If they can do it, they can tell me about it.  That shouldn't ruin the 
list for anyone else.

I can understand why a machine architecture would prevent floating-point 
operations with half-precision numbers, but I can't understand how it 
prevents us from encoding doubles as half-precision to store them in a 
file.  They could then be read back in, translated on the fly into 
doubles.  Like I said, I've been using integers instead of floats to store 
the numbers in files, but it could be slightly more convenient to use 
half-precision floats for storage instead of converting integers to 
floats.

Almost forgot.  Please tell me how this changes anything:

 sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C   LC_TIME=en_US.UTF-8
LC_COLLATE=C   LC_MONETARY=en_US.UTF-8   LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8   LC_NAME=C
LC_ADDRESS=C   LC_TELEPHONE=C   LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.1.1


Also, this is how the hexbin package is described:

Description Binning and plotting functions for hexagonal bins.

So I guess that suggestion wasn't helping me much, either.

Mike


On Sat, 3 Jan 2015, Jeff Newmiller wrote:

 Your message is missing either a reproducible example or an indication 
 of your R environment (such as the output of sessionInfo()).

 Yes, the machine architecture can prevent certain types of operations. 
 This is however a poor venue for discussing such issues.

 I suggest that you investigate the hexbin package for binary data 
 handling, and if you still have issues then post again, following the
 posting guide recommendations.


---
 Jeff Newmiller
 DCN: jdnew...@dcn.davis.ca.us
 Research Engineer (Solar/Batteries/Software/Embedded Controllers)

---
 Sent from my phone. Please excuse my brevity.

 On January 3, 2015 9:31:02 PM PST, Mike Miller mbmille...@gmail.com wrote:
 It's an IEEE standard format:

 http://en.wikipedia.org/wiki/Half-precision_floating-point_format#IEEE_754_half-precision_binary_floating-point_format:_binary16

 This is what I see:

 writeBin(vec , con, size=2 )
 Error in writeBin(vec, con, size = 2) : size 2 is unknown on this machine

 I'm not sure what the machine has to do with it.  It's really up to the
 software, isn't it?

 Is there a way to get R to read/write half-precision numbers (binary16)?

 It isn't a big deal for me because unsigned 16-bit integers are working
 well enough, but I'd like to have an answer for people who ask why I make
 them divide by 1000 all the time.  ;-)

 Mike


Re: [R-es] Help identifying elements in the cluster

2015-01-04 Thread Carlos J. Gil Bellosta
Hello, how are you?

Your problem is that what you call the name is a factor. Look at this:

 cat(iris$Species[1])
1

 cat(as.character(iris$Species[1]))
setosa
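
Applied to the loop in the original question, the same idea might look like the sketch below. The pesticide names and group labels are invented placeholders, since the poster's mydata2 is not available:

```r
# Sketch with made-up data; the names below are placeholders.
pesticida <- factor(c("atrazina", "glifosato", "atrazina", "diazinon"))
groups    <- c(1, 2, 1, 2)

# Convert the factor to character before grouping so cat() prints the labels
# instead of the underlying integer codes.
clusters <- lapply(unique(groups),
                   function(g) as.character(pesticida[groups == g]))

for (j in seq_along(clusters)) {
  cat("Cluster number =", j, "\n")
  cat(clusters[[j]], sep = " // ")
  cat("\n")
}
```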

Regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com


On 4 January 2015 at 10:39, Jose Manuel Veiga del Baño
chem...@um.es wrote:
 Hello everyone,

 I have a problem that I cannot solve. I run a cluster analysis of
 280 elements with the following sequence:

   library(cluster)
   clusplot(mydata2, fit2$cluster, color=TRUE, shade=TRUE,
            labels=2, lines=0)
 The 280 elements are plotted correctly, with each element's name replaced
 by a number. I now need to know which element name corresponds to each
 number, so I build the groups with:
 clusters <- sapply(unique(groups), function(x) mydata2$PESTICIDA[groups == x])
 but when I try to retrieve the name that corresponds to a number, I always
 get the number back and cannot get the name. That is, clusterx[k,1] shows
 the name, but when I pass it to cat to print it, I get the number again:
 for (j in 1:ncluster){
   clusterx <- data.frame(clusters[j])
   cat("Numero de cluster=", j, "\n")
   for (k in 1:nrow(clusterx)){
     cat(clusterx[k,1], sep="//")
   }
 }

 I have looked but cannot find a way to identify the element. Has anyone
 run into this problem, or does anyone know how to solve it?

 Many thanks.

 Dr. José M. Veiga
 Dpt. Química Agrícola, Geología y Edafología.
 Universidad de Murcia.


 ___
 R-help-es mailing list
 R-help-es@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-help-es



Re: [R] problem with vegan function rda()

2015-01-04 Thread Jari Oksanen
Lukas,


Lukas Kohl llukas.kkohl at gmail.com writes:

 
 Hello R-list
 
 Maybe someone knows what's going on here.
 
 I'm trying to re-run a script I wrote earlier this year using the function
 rda() in the vegan package. The script ran fine back then, and I did not
 change the dataset, so I was wondering whether there's some problem in an
 updated version of the package (I re-installed R + all packages since then).
 
 Thanks for any advice,
 Lukas Kohl
 

Sorry for the late reply: I don't follow this list regularly.

A couple of points about your question:

(1) vegan indeed has function rda(), but its output has no resemblance
to your example (except for the line repeating Call:). Either you are using
some other package or you have made up your own version of rda() or you
do not show its output.

(2) The rda() function changed in vegan 2.2-0. This is documented in NEWS. You can
see this by issuing the vegandocs("NEWS") command after loading library(vegan).

(3) I have no idea what your script does (it does something different than
vegan::rda()) and I cannot reproduce the problem without that knowledge.

Kind regards, Jari Oksanen
 
PS. Sorry for not top-posting: Gmane does not allow it.

PS2. Sorry for removing some of your message: Gmane requires this.

 --
 
 So here's the output I get:
 
  rda(rel)
 Call:
 rda(X = rel)
 
 Regularization parameters:
 NULL
 
 Prior probabilities of groups:
 NULL
 
 Misclassification rate:
apparent:  %
 Warning message:
 In is.na(x$error.rate[1]) :
   is.na() applied to non-(list or vector) of type 'NULL'
 
 There's no NA's in my dataset, which seems ok..
 
  sum(is.na(rel))
 [1] 0
  nrow(rel)
 [1] 59
  ncol(rel)
 [1] 49
  head(rel)
   X14.0   i.15.0  ai.15.0br.16.0  X15.1n.15.0i.16.0



Re: [R] tried half-precision but size 2 is unknown on this machine

2015-01-04 Thread Prof Brian Ripley

On 04/01/2015 12:12, Duncan Murdoch wrote:

On 04/01/2015 12:31 AM, Mike Miller wrote:

It's an IEEE standard format:

http://en.wikipedia.org/wiki/Half-precision_floating-point_format#IEEE_754_half-precision_binary_floating-point_format:_binary16

This is what I see:


writeBin(vec , con, size=2 )

Error in writeBin(vec, con, size = 2) : size 2 is unknown on this machine

I'm not sure what the machine has to do with it.  It's really up to the
software, isn't it?


Yes, but R relies on the underlying C run-time library for a lot of
things like this.  On your platform, is there a C type corresponding to
half precision?  If so, let us know the details, and we'll possibly add
it to writeBin.


There is an IEC60559 (aka IEEE 754) 'half-precision floating-point type', 
but I know of no support by a C runtime on any platform I have used 
(there is a lot more in IEC60559 which is almost never supported).



Is there a way to get R to read/write half-precision numbers (binary16)?


If it's not supported by the C run-time library and has to be done
entirely using other types, that's the sort of thing that belongs in a
user-contributed package.  I'm not aware of one that already has it, so
you may have to write this yourself.


There is a C++ library called 'half' which could be wrapped.  See 
http://half.sourceforge.net/ : it has a lot of compiler-specific code.




Duncan Murdoch



It isn't a big deal for me because unsigned 16-bit integers are working
well enough, but I'd like to have an answer for people who ask why I make
them divide by 1000 all the time.  ;-)

Mike



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK



[R-es] Help identifying elements in the cluster

2015-01-04 Thread Jose Manuel Veiga del Baño
Hello everyone,

I have a problem that I cannot solve. I run a cluster analysis of
280 elements with the following sequence:

  library(cluster)
  clusplot(mydata2, fit2$cluster, color=TRUE, shade=TRUE,
           labels=2, lines=0)
The 280 elements are plotted correctly, with each element's name replaced
by a number. I now need to know which element name corresponds to each
number, so I build the groups with:
clusters <- sapply(unique(groups), function(x) mydata2$PESTICIDA[groups == x])
but when I try to retrieve the name that corresponds to a number, I always
get the number back and cannot get the name. That is, clusterx[k,1] shows
the name, but when I pass it to cat to print it, I get the number again:
   for (j in 1:ncluster){
     clusterx <- data.frame(clusters[j])
     cat("Numero de cluster=", j, "\n")
     for (k in 1:nrow(clusterx)){
       cat(clusterx[k,1], sep="//")
     }
   }

I have looked but cannot find a way to identify the element. Has anyone
run into this problem, or does anyone know how to solve it?

Many thanks.

Dr. José M. Veiga
Dpt. Química Agrícola, Geología y Edafología.
Universidad de Murcia.



[R] How to group by then count?

2015-01-04 Thread Monnand
Hi all,

I thought this was a very naive problem but I have not found any solution
which is idiomatic to R.

The problem is like this:

Assuming we have a vector of strings:
 x = c("1", "1", "2", "1", "5", "2")

We want to count the number of appearances of each string, i.e. in vector x,
string "1" appears 3 times, "2" appears twice and "5" appears once. Then I
want to know which string is the majority. In this case, it is "1".

For imperative languages like C, C++, Java and Python, I would use a hash
table to count the strings, where keys are the strings and values are the
numbers of appearances. For functional languages like Clojure, there are
higher-order functions like group-by.

However, for R, I can hardly find a good solution to this simple problem. I
found a hash package, which implements a hash table. However, installing a
package simply for a hash table is really annoying for me. I did find
aggregate and other functions which operate on data frames. But in my
case, it is a simple vector. Converting it to a data frame may not be
desirable. (Or is it?)

Could anyone suggest an idiomatic way of doing such a job in R? I would
appreciate your help!

-Monnand



Re: [R] How to group by then count?

2015-01-04 Thread Berend Hasselman

 On 04-01-2015, at 10:02, Monnand monn...@gmail.com wrote:
 
 Hi all,
 
 I thought this was a very naive problem but I have not found any solution
 which is idiomatic to R.
 
 The problem is like this:
 
 Assuming we have a vector of strings:
 x = c("1", "1", "2", "1", "5", "2")
 
 We want to count number of appearance of each string. i.e. in vector x,
 string 1 appears 3 times; 2 appears twice and 5 appears once. Then I
 want to know which string is the majority. In this case, it is 1.
 
 For imperative languages like C, C++ Java and python, I would use a hash
 table to count each strings where keys are the strings and values are the
 number of appearance. For functional languages like clojure, there're
 higher order functions like group-by.
 
 However, for R, I can hardly find a good solution to this simple problem. I
 found a hash package, which implements hash table. However, installing a
 package simple for a hash table is really annoying for me. I did find
 aggregate and other functions which operates on data frames. But in my
 case, it is a simple vector. Converting it to a data frame may be not
 desirable. (Or is it?)
 
 Could anyone suggest me an idiomatic way of doing such job in R? I would be
 appreciate for your help!
 

Have a look at table:

?table
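
A minimal sketch of how table() applies to the vector from the question (the variable names counts and majority are chosen here for illustration):

```r
x <- c("1", "1", "2", "1", "5", "2")

counts <- table(x)                             # one count per distinct string
counts

majority <- names(counts)[which.max(counts)]   # label with the highest count
majority
```

which.max() returns the first maximum, so ties are resolved in favor of the label that sorts first.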

Berend



Re: [R] tried half-precision but size 2 is unknown on this machine

2015-01-04 Thread Mike Miller
Thanks!  So it looks like I can say R writeBin/readBin does not support 
half-precision floats even though the error message "size 2 is unknown on 
this machine" seems to contradict that (for some machine).  I tried to 
figure out from the source code (src/main/connections.c) how it decides 
what is possible, but that was a little beyond me.  That was really just 
to satisfy my curiosity.  The unsigned 16-bit integer approach works 
well-enough for me and it has the advantage that I know it will always 
work on anyone's system.  I'm working with numbers from 0 to 2 with no 
more than 4 significant digits, so a 16-bit float with 11 bits of 
precision was appealing.  It's not hard to work with uint16, though, and 
od also reads it easily.  I've been working on a message about this 
application which I will share soon, probably later tonight.
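
For readers following along, the scaled-integer approach described above might be sketched as follows. The scale factor of 1000 is taken from the earlier "divide by 1000" remark; the sample values and the temporary file are assumptions for illustration:

```r
# Sketch: store values in [0, 2] as unsigned 16-bit integers.
scale <- 1000                             # assumed scale factor
vals  <- c(0, 0.125, 1, 1.999, 2)         # made-up sample values

tmp <- tempfile(fileext = ".u16")
con <- file(tmp, "wb")
# Scaled values stay below 32768, so they fit in two bytes without wrapping.
writeBin(as.integer(round(vals * scale)), con, size = 2, endian = "little")
close(con)

con <- file(tmp, "rb")
back <- readBin(con, what = "integer", n = length(vals), size = 2,
                signed = FALSE, endian = "little") / scale
close(con)

all.equal(back, vals)
```

Fixing endian on both sides keeps the file readable on machines with a different native byte order.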


I'm also experimenting with a lossy storage using a single byte per 
integer (uint8).  That might be a good strategy because the numbers I'm 
working with are inherently imprecise.  It seems to work fine in R, but it 
doesn't seem to work with GNU od (Linux/UNIX program) and that makes me 
wonder what else can handle it.  uint16 seems the safer bet, and there is 
no loss of precision.  Of course, the downside is that the uint16 file is 
twice as big as the uint8 file, and these files may be several hundred GB 
in size.
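
The single-byte variant could be sketched like this, assuming the values lie in [0, 2] and are mapped linearly onto 0..255 (the mapping and the sample values are assumptions, not necessarily the scheme described above):

```r
# Sketch: lossy one-byte storage via raw vectors.
vals  <- c(0, 0.5, 1, 1.5, 2)             # made-up sample values
codes <- as.raw(round(vals / 2 * 255))    # one byte per value, 0..255

tmp <- tempfile(fileext = ".u8")
writeBin(codes, tmp)                      # a file name is accepted as the connection

back <- as.integer(readBin(tmp, what = "raw", n = length(vals))) / 255 * 2
max(abs(back - vals))                     # bounded by half a step, 1/255
```

Using raw vectors sidesteps any question about signed versus unsigned bytes, at the cost of a coarser grid of 256 levels.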


Mike


On Sun, 4 Jan 2015, Uwe Ligges wrote:


Following the posting guide and hence reading the help page first helps:

Possible sizes are 1, 2, 4 and possibly 8 for integer or logical vectors, 
and 4, 8 and possibly 12/16 for numeric vectors.



On Sun, 4 Jan 2015, Duncan Murdoch wrote:


On 04/01/2015 12:31 AM, Mike Miller wrote:

It's an IEEE standard format:

http://en.wikipedia.org/wiki/Half-precision_floating-point_format#IEEE_754_half-precision_binary_floating-point_format:_binary16

This is what I see:


writeBin(vec , con, size=2 )

Error in writeBin(vec, con, size = 2) : size 2 is unknown on this machine

I'm not sure what the machine has to do with it.  It's really up to the
software, isn't it?


Yes, but R relies on the underlying C run-time library for a lot of 
things like this.  On your platform, is there a C type corresponding to 
half precision?  If so, let us know the details, and we'll possibly add 
it to writeBin.





Is there a way to get R to read/write half-precision numbers 
(binary16)?


If it's not supported by the C run-time library and has to be done 
entirely using other types, that's the sort of thing that belongs in a 
user-contributed package.  I'm not aware of one that already has it, so 
you may have to write this yourself.
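
For the reading direction, doing this "entirely using other types" might look like the sketch below: the 16-bit patterns are read as unsigned integers and decoded by hand following the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits). The function name half_to_double is made up for this example; it is not part of R or of any package known to this thread.

```r
# Decode IEEE 754 binary16 bit patterns (integers in 0..65535) to doubles.
# half_to_double is a hypothetical helper name.
half_to_double <- function(h) {
  h    <- as.integer(h)
  sign <- ifelse(bitwAnd(bitwShiftR(h, 15L), 1L) == 1L, -1, 1)
  e    <- bitwAnd(bitwShiftR(h, 10L), 31L)    # 5-bit biased exponent
  f    <- bitwAnd(h, 1023L)                   # 10-bit fraction
  out  <- sign * 2^(e - 15) * (1 + f / 1024)  # normal numbers
  sub  <- e == 0L                             # zeros and subnormals
  out[sub] <- sign[sub] * 2^(-14) * (f[sub] / 1024)
  special <- e == 31L                         # infinities and NaN
  out[special] <- ifelse(f[special] == 0L, sign[special] * Inf, NaN)
  out
}

# Bit patterns for 1, -2, the largest finite half, the smallest subnormal, +Inf
half_to_double(c(0x3C00, 0xC000, 0x7BFF, 0x0001, 0x7C00))
```

The bit patterns themselves could be obtained with readBin(con, what = "integer", size = 2, signed = FALSE), which is supported; only the size = 2 numeric case is not.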


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to group by then count?

2015-01-04 Thread MacQueen, Don
This seems to me to be a case where thinking in terms of computer
programming concepts is getting in the way a bit. Approach it as a data
analysis task; the S language (upon which R is based) is designed in part
for data analysis so there is a function that does most of the job for you.

(I changed your vector of strings to make the result more easily
interpreted)

 x = c("1", "1", "2", "1", "5", "2",'3','5','5','2','2')
 tmp <- table(x)  ## counts the number of appearances of each element
 tmp[tmp==max(tmp)]   ## finds which one occurs most often
2 
4 

Meaning that the element '2' appears 4 times.  The table() function should
be fast even with long vectors. Here's an example with a vector of length
1 million:

foo <- table( sample(letters, 1e6, replace=TRUE) )
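
A short sketch of how the table() result can answer the original question directly, using the six-element vector from the question. Whether ties should return one value or all of them is an assumption left to the reader; both variants are shown.

```r
x <- c("1", "1", "2", "1", "5", "2")   # the vector from the original question

counts <- table(x)                     # named counts, one per distinct string
names(counts)[which.max(counts)]       # the first most frequent value
names(counts)[counts == max(counts)]   # all values tied for most frequent
sort(counts, decreasing = TRUE)        # full ranking, most frequent first
```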


One of the seminal books on the S language is John M Chambers' "Programming
with Data" -- and I would emphasize the "with Data" part of that title.

-- 

Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 1/4/15, 1:02 AM, Monnand monn...@gmail.com wrote:

Hi all,

I thought this was a very naive problem but I have not found any solution
which is idiomatic to R.

The problem is like this:

Assuming we have a vector of strings:
 x = c("1", "1", "2", "1", "5", "2")

We want to count the number of appearances of each string, i.e. in vector x,
string "1" appears 3 times, "2" appears twice and "5" appears once. Then I
want to know which string is the majority. In this case, it is "1".

For imperative languages like C, C++ Java and python, I would use a hash
table to count each strings where keys are the strings and values are the
number of appearance. For functional languages like clojure, there're
higher order functions like group-by.

However, for R, I can hardly find a good solution to this simple problem. I
found a hash package, which implements a hash table. However, installing a
package simply for a hash table is really annoying for me. I did find
aggregate and other functions which operate on data frames. But in my
case, it is a simple vector. Converting it to a data frame may not be
desirable. (Or is it?)

Could anyone suggest an idiomatic way of doing such a job in R? I would
appreciate your help!

-Monnand



[R] dealing with NA in readBin() and writeBin()

2015-01-04 Thread Mike Miller

The help doc for readBin/writeBin tells me this:

Handling R's missing and special (Inf, -Inf and NaN) values is discussed 
in the ‘R Data Import/Export’ manual.


So I go here:

http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values

Unfortunately, I don't really understand that.  Suppose I am using 
single-byte integers and I want 255 (binary 11111111) to be translated to 
NA.  Is it possible to do that?  Of course I could always do something 
like this:


X[ X==255 ] <- NA

The problem with that is that I want to process the data on the fly, 
dividing the integer to produce a double in the range from 0 to 2:


X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127

It looks like this still works:

X[ X==255/127 ] <- NA

It would be neater if there were some kind of translation option for the 
input stream, like the way GNU tr (Linux/UNIX) works.  I'm looking around 
and not finding such a thing.  I can use gsub() to translate on the fly 
and then coerce back to integer format:


X <- as.integer(gsub(255, NA, readBin( file, what="integer", n=N, size=1, 
signed=FALSE)))/127

What is your opinion of that tactic?  Is there a better way?  I don't know 
if that has any advantage over the postprocessing tactic above.  Maybe what 
I need is something like gsub() that can operate on numeric values...


X <- numsub(255, NA, readBin( file, what="integer", n=N, size=1, 
signed=FALSE))/127

...but if that isn't better in terms of speed or memory usage than 
postprocessing like this...


X[ X==255/127 ] <- NA

...then I really don't need it (for this, but it would be good to know 
about).



The na.strings = "NA" functionality of scan() is neat, but I guess that 
doesn't work with the binary read system.  I don't think I can scan the 
readBin input because it isn't a file or stdin.


Mike


Re: [R] dealing with NA in readBin() and writeBin()

2015-01-04 Thread Duncan Murdoch
On 04/01/2015 5:13 PM, Mike Miller wrote:
 The help doc for readBin writeBin tells me this:
 
 Handling R's missing and special (Inf, -Inf and NaN) values is discussed 
 in the ‘R Data Import/Export’ manual.
 
 So I go here:
 
 http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values
 
 Unfortunately, I don't really understand that.  Suppose I am using 
 single-byte integers and I want 255 (binary 11111111) to be translated to 
 NA.  Is it possible to do that?  Of course I could always do something 
 like this:
 
 X[ X==255 ] <- NA
 
 The problem with that is that I want to process the data on the fly, 
 dividing the integer to produce a double in the range from 0 to 2:
 
 X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127

Why?  Why not do it in three steps, i.e.

X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)
X[ X==255 ] <- NA
X <- X/127

If you are worried about the extra typing, then write a function to
handle all three steps.

 
 It looks like this still works:
 
 X[ X==255/127 ] <- NA

I suspect that would work on all current platforms, but I wouldn't trust
it.  Don't use == on floating point values unless you know they are
fractions with 2^n in the denominator.
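
A brief illustration of the caution above. The vector x is invented for the example; the comparisons show why exact equality is only dependable for fractions whose denominator is a power of two, and what a tolerance-based test might look like instead (the tolerance 1e-9 is an arbitrary choice).

```r
x <- c(0, 64, 127, 255) / 127   # scaled values; 255/127 stands for "missing"

# Tolerance-based comparison instead of ==
is_missing <- abs(x - 255/127) < 1e-9
is_missing

# Classic case where == surprises: these decimal fractions are not exact
0.1 + 0.2 == 0.3                     # FALSE with IEEE 754 doubles
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE, compares with a tolerance

# Fractions with a power of two in the denominator are represented exactly
0.25 + 0.5 == 0.75                   # TRUE
```

Recoding 255 to NA while the data are still integers, as in the three-step version, avoids the question entirely.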

 It would be neater if there were some kind of translation option for the 
 input stream, like the way GNU tr (Linux/UNIX) works.  I'm looking around 
 and not finding such a thing.  I can use gsub() to translate on the fly 
 and then coerce back to integer format:

It's really trivial to write a wrapper for readBin to do what you want:

myReadBin <- function(...) {
  X <- readBin(...)
  X[ X==255 ] <- NA
  X
}

Duncan Murdoch

 


Re: [R] dealing with NA in readBin() and writeBin()

2015-01-04 Thread Mike Miller

On Sun, 4 Jan 2015, Duncan Murdoch wrote:


On 04/01/2015 5:13 PM, Mike Miller wrote:

The help doc for readBin writeBin tells me this:

Handling R's missing and special (Inf, -Inf and NaN) values is discussed
in the ‘R Data Import/Export’ manual.

So I go here:

http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values

Unfortunately, I don't really understand that.  Suppose I am using
single-byte integers and I want 255 (binary 11111111) to be translated to
NA.  Is it possible to do that?  Of course I could always do something
like this:

X[ X==255 ] <- NA

The problem with that is that I want to process the data on the fly,
dividing the integer to produce a double in the range from 0 to 2:

X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127


Why?  Why not do it in three steps, i.e.

X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)
X[ X==255 ] <- NA
X <- X/127

If you are worried about the extra typing, then write a function to 
handle all three steps.


The thing I was concerned about is the memory usage, not the typing, 
because everything will be scripted.  But maybe memory isn't an issue and 
I never have to hold two copies in memory simultaneously.  There will be 
about 50 million elements, typically.


I think in terms of processing numbers that are streaming into memory, but 
that might not be what R is doing.  For example, with scan() and 
na.strings="NA", I picture it changing strings to NA as they are read, but 
it might load the whole file as character, then do all the work with things 
like what=numeric() and na.strings="NA" after the fact.  Maybe that 
doesn't impose an extra memory burden.
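
If memory did turn out to matter, one option would be to read the file in chunks so that the integer intermediate never exceeds the chunk size. The sketch below is only an illustration of that idea: the function name read_scaled, its arguments, and the chunk size are all hypothetical, and nothing here has been measured against the single-pass version.

```r
# Hypothetical helper: read unsigned bytes in chunks, recode 255 as NA, rescale.
read_scaled <- function(path, n, chunk = 1e6, na_code = 255L, scale = 127) {
  con <- file(path, "rb")
  on.exit(close(con))
  out <- numeric(n)                  # full-length result vector
  pos <- 0
  while (pos < n) {
    k <- min(chunk, n - pos)
    x <- readBin(con, what = "integer", n = k, size = 1, signed = FALSE)
    if (length(x) == 0) break        # file shorter than expected
    x[x == na_code] <- NA            # recode while the values are still integers
    out[pos + seq_along(x)] <- x / scale
    pos <- pos + length(x)
  }
  if (pos < n) out[seq_len(pos)] else out
}

# Small self-test with a temporary file of four bytes
tmp <- tempfile()
writeBin(as.integer(c(0, 127, 254, 255)), tmp, size = 1)
res <- read_scaled(tmp, n = 4, chunk = 3)
res
```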




It looks like this still works:

X[ X==255/127 ] <- NA


I suspect that would work on all current platforms, but I wouldn't trust 
it.  Don't use == on floating point values unless you know they are 
fractions with 2^n in the denominator.


Good point about platforms.  I was concerned about the use of ==, and 
you've convinced me it is not trustworthy.


Thanks very much.

Mike


Re: [R] sapply function and poisson distribution

2015-01-04 Thread Pete Brecknock
dimnik wrote
 Thank you for your answer. Yes, that sounds right. I thought the same thing,
 but the problem is how I can generalize the command to any vectors of
 numbers, not only the specific example c(1,2), c(0.1,0.8).
 
 2015-01-04 0:45 GMT+00:00 Pete Brecknock [via R]:
 
  dimnik wrote
 I want to find a function that takes in two vectors of numbers that have
 the same length. The output should be a list of vectors, where each vector
 is a sequence of randomly generated Poisson variables; the number of
 samples in each vector is determined by the entries in the first input
 vector and the lambdas come from the entries in the second input vector.
 For example, if the inputs are c(1,2) and c(0.1,0.8), the output will be a
 list of two vectors where the first vector has a single sample from
 Poisson(0.1) and the second vector has two samples from Poisson(0.8). How
 can I do all that using the sapply function?
 Thank you in advance.

 How about using mapply, the multivariate version of sapply?

 Based on your example ...

 mapply(function(x,y) rpois(x,y), c(1,2),c(0.1,0.8))

 HTH

 Pete



Not sure how you intend to specify the input vectors for n and lambda

One way would be as below - you can amend the 2 vectors with the values of
your choice.

n <- c(1,2,3,4,5)
lambda <- c(0.1,0.8,1.2,2.2,4.2)

mapply(function(x,y) rpois(x,y), n, lambda)  
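
To generalize this to arbitrary inputs, the call could be wrapped in a function, as in the sketch below. The name rpois_list is made up for the example. One detail worth noting: mapply() simplifies its result to a matrix when every element has the same length, so SIMPLIFY = FALSE is used here to guarantee a list.

```r
# Hypothetical wrapper: n = sample sizes, lambda = Poisson means
rpois_list <- function(n, lambda) {
  stopifnot(length(n) == length(lambda))
  # SIMPLIFY = FALSE keeps the result a list even if all n are equal
  mapply(rpois, n, lambda, SIMPLIFY = FALSE)
}

set.seed(1)
res <- rpois_list(c(1, 2), c(0.1, 0.8))
lengths(res)   # number of samples in each list element
```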

HTH

Pete







[R] counting sets of consecutive integers in a vector

2015-01-04 Thread Mike Miller
I have a vector of sorted positive integer values (e.g., positive integers 
after applying sort() and unique()).  For example, this:


c(1,2,5,6,7,8,25,30,31,32,33)

I want to make a matrix from that vector that has two columns: (1) the 
first value in every run of consecutive integer values, and (2) the 
corresponding number of consecutive values.  For example:


c(1:20) would become this...

1  20

...because there are 20 consecutive integers beginning with 1 and 
c(1,2,5,6,7,8,25,30,31,32,33) would become


1  2
5  4
25 1
30 4

What would be the best way to accomplish this?  Here is my first effort:

v <- c(1,2,5,6,7,8,25,30,31,32,33)
L <- rle( v - 1:length(v) )$lengths
n <- length( L )
matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)

     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4

I suppose that works well enough, but there may be a better way, and 
besides, I wouldn't want to deny anyone here the opportunity to solve a 
fun puzzle.  ;-)


The use for this is that I will be doing repeated seeks of a binary file 
to extract data.  seek() gives the starting point and readBin(n=X) gives 
the number of bytes to read.  So when there are many consecutive variables 
to be read, I can multiply the X in n=X by that number instead of doing 
many different seek() calls.  (The data are in a transposed format where I 
read in every record for some variable as sequential elements.)  I'm 
probably not the first person to deal with this.
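
The seek-and-read plan described above could be sketched as follows. The file layout is an assumption invented for the example (each variable stored as nrec consecutive 2-byte unsigned integers), as are the fake data. Note that the n argument of readBin() counts records rather than bytes, while seek() takes a byte offset.

```r
# Assumed layout (hypothetical): variable j occupies records
# ((j - 1) * nrec + 1) .. (j * nrec), each a 2-byte unsigned integer.
nrec <- 3L
size <- 2L
tmp <- tempfile()
writeBin(as.integer(1:(40 * nrec)), tmp, size = size)   # 40 variables of fake data

v <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)            # variables wanted
starts <- v[c(1, which(diff(v) != 1) + 1)]              # first variable of each run
lens   <- rle(v - seq_along(v))$lengths                 # length of each run

con <- file(tmp, "rb")
chunks <- lapply(seq_along(starts), function(i) {
  seek(con, where = (starts[i] - 1) * nrec * size)      # byte offset of the run
  readBin(con, what = "integer", n = lens[i] * nrec,    # n counts records
          size = size, signed = FALSE)
})
close(con)

X <- matrix(unlist(chunks), nrow = nrec)   # one column per requested variable
dim(X)
```

With four runs, this takes four seek() calls instead of eleven.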


Best,

Mike

--
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4J



Re: [R] counting sets of consecutive integers in a vector

2015-01-04 Thread Peter Alspach
Tena koe Mike

An alternative, which is slightly faster:

  diffv <- diff(v)
  starts <- c(1, which(diffv!=1)+1)
  cbind(v[starts], c(diff(starts), length(v)-starts[length(starts)]+1))

Peter Alspach



Re: [R] counting sets of consecutive integers in a vector

2015-01-04 Thread jim holtman
Here is another approach:

> v <- c(1,2,5,6,7,8,25,30,31,32,33)
>
> # split by differences != 1
> t(sapply(split(v, cumsum(c(1, diff(v)) != 1)), function(x){
+ c(value = x[1L], length = length(x))  # output first value and length
+ }))
  value length
0     1      2
1     5      4
2    25      1
3    30      4



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] counting sets of consecutive integers in a vector

2015-01-04 Thread jim holtman
Here is a solution using data.table

> require(data.table)
> x <- data.table(v, diff = cumsum(c(1, diff(v)) != 1))
> x
     v diff
 1:  1    0
 2:  2    0
 3:  5    1
 4:  6    1
 5:  7    1
 6:  8    1
 7: 25    2
 8: 30    3
 9: 31    3
10: 32    3
11: 33    3
> x[, list(value = v[1L], length = .N), key = 'diff']
   diff value length
1:    0     1      2
2:    1     5      4
3:    2    25      1
4:    3    30      4
> x[, list(value = v[1L], length = .N), key = 'diff'][, -1, with = FALSE]  # get rid of 'diff' column
   value length
1:     1      2
2:     5      4
3:    25      1
4:    30      4


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Separating a Complicated String Vector

2015-01-04 Thread William Dunlap
f <- function (x) {
    isState <- is.element(tolower(x), tolower(state.name))
    w <- which(isState)
    data.frame(State = x[rep(w, diff(c(w, length(x) + 1)) - 1L)],
               City = x[!isState])
}

E.g.,
V1 <- c("alabama", "bates", "tuscaloosa", "smith", "arkansas", "fayette",
        "little rock", "alaska", "juneau", "nome")
f(V1)
     State        City
1  alabama       bates
2  alabama  tuscaloosa
3  alabama       smith
4 arkansas     fayette
5 arkansas little rock
6   alaska      juneau
7   alaska        nome
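
An equivalent way to express the same idea, shown only as a sketch: cumsum() over the "is a state" flags numbers each block, and that number then indexes the vector of state names. Like f() above, it assumes every state header matches an entry of state.name and that no city shares its name with a state.

```r
V1 <- c("alabama", "bates", "tuscaloosa", "smith", "arkansas", "fayette",
        "little rock", "alaska", "juneau", "nome")

isState <- tolower(V1) %in% tolower(state.name)   # TRUE at each state header
group   <- cumsum(isState)                        # block number of every element
res <- data.frame(State = V1[isState][group[!isState]],
                  City  = V1[!isState])
res
```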



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Jan 3, 2015 at 9:20 PM, npretnar npret...@gmail.com wrote:

 Sorry. Bad example on my part. Try this. V1 is ...

 V1
 alabama
 bates
 tuscaloosa
 smith
 arkansas
 fayette
 little rock
 alaska
 juneau
 nome

 And I want:

 V1        V2
 alabama   bates
 alabama   tuscaloosa
 alabama   smith
 arkansas  fayette
 arkansas  little rock
 alaska    juneau
 alaska    nome

 This is more representative of the problem, extended to all 50 states.

 - Nick


 On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:

  I'm not sure what's so complicated about that (am I missing
  something?). You can search using grep, and replace using gsub, so
 
  tmpDF <- read.table(text="V1  V2
  A   5
  a1  1
  a2  1
  a3  1
  a4  1
  a5  1
  B   4
  b1  1
  b2  1
  b3  1
  b4  1",
  header=TRUE)
  tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
  data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))
 
  Seems to do the trick.
 
  Best,
  Ista
 
  On Sat, Jan 3, 2015 at 9:41 PM, npretnar npret...@gmail.com wrote:
  I have a string variable (V1) in a data frame structured as follows:
 
  V1  V2
  A   5
  a1  1
  a2  1
  a3  1
  a4  1
  a5  1
  B   4
  b1  1
  b2  1
  b3  1
  b4  1
 
  I want the following:
 
  V1  V2  V3
  a1  1   A
  a2  1   A
  a3  1   A
  a4  1   A
  a5  1   A
  b1  1   B
  b2  1   B
  b3  1   B
  b4  1   B
 
  I am not sure how to go about making this transformation besides
 writing a long vector that contains each of the categorical string names
 (these are state names, so it would be a really long vector). Any help
 would be greatly appreciated.
 
  Thanks,
 
  Nicholas Pretnar
  Mizzou Economics Grad Assistant
  npret...@gmail.com
 


Re: [R] counting sets of consecutive integers in a vector

2015-01-04 Thread Mike Miller
Thanks, Peter.  Why not cbind your idea for the first column with my idea 
for the second column and get it done in one line?:


v <- c(1,2,5,6,7,8,25,30,31,32,33)
M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths )
M

     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4

I find that pretty appealing and I'll probably stick with it.  It seems 
quite fast.  Here's an example:


# make fairly long vector
v <- sort(unique(round(10*runif(10
length(v)
[1] 63274

# time the procedure:
ptm <- proc.time() ; M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths ) ; proc.time() - ptm
   user  system elapsed
   0.03    0.00    0.03

dim(M)
[1] 23212 2

I probably won't be using vectors any longer than that, and this isn't the 
kind of thing that I do over and over again, so that speed is excellent.
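
For reuse, the one-liner could be wrapped in a small function, as sketched below. The name consecutive_runs and the column labels are made up, and the simulated input in the timing step uses arbitrary sizes rather than the ones from this message. The function assumes its input is sorted and free of duplicates, as stated at the start of the thread.

```r
# Hypothetical helper wrapping the one-liner; returns a 2-column matrix
consecutive_runs <- function(v) {
  if (length(v) == 0) return(matrix(numeric(0), ncol = 2))
  starts <- v[c(1, which(diff(v) != 1) + 1)]   # first value of each run
  lens   <- rle(v - seq_along(v))$lengths      # number of values in each run
  cbind(start = starts, length = lens)
}

consecutive_runs(c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33))
consecutive_runs(1:20)

# Timing on simulated input (sizes chosen arbitrarily)
set.seed(42)
v <- sort(unique(sample.int(1e5, 6e4)))
system.time(M <- consecutive_runs(v))
```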


Mike


