[R] use of names() within lapply()

2013-04-17 Thread Ivan Alves
Dear all,

List g has 2 elements

> names(g)
[1] "2009-10-07" "2012-02-29"

and the list plot

lapply(g, plot, main=names(g))

results in identical plot titles, each showing both list names, whereas distinct titles 
names(g[1]) and names(g[2]) are sought. Clearly, lapply is passing 'g' instead 
of consecutively passing g[1] and then g[2] to process the additional 'main' 
argument to plot. help(lapply) is silent on how to pass parameters element-wise. 
Any suggestion would be appreciated.

Kind regards,
Ivan
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use of names() within lapply()

2013-04-17 Thread Ivan Alves
Dear Duncan and A.K.
Many thanks for your super quick help. The modified lapply did the trick; 
mapply died with the error "Error in dots[[2L]][[1L]] : object of type 'builtin' 
is not subsettable".
Kind regards,
Ivan
On 17 Apr 2013, at 17:12, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 17/04/2013 11:04 AM, Ivan Alves wrote:
 Dear all,
 
 List g has 2 elements
 
 > names(g)
 [1] "2009-10-07" "2012-02-29"
 
 and the list plot
 
 lapply(g, plot, main=names(g))
 
 results in identical plot titles, each showing both list names, whereas distinct titles 
 names(g[1]) and names(g[2]) are sought. Clearly, lapply is passing 'g' 
 instead of consecutively passing g[1] and then g[2] to process the additional 
 'main' argument to plot. help(lapply) is silent on how to pass parameters 
 element-wise. Any suggestion would be appreciated.
 
 I think you want mapply rather than lapply, or you could do lapply on a 
 vector of indices.  For example,
 
 mapply(plot, g, main=names)
 
 or
 
 lapply(1:2, function(i) plot(g[[i]], main=names(g)[i]))
 
 Duncan Murdoch
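
A minimal sketch of both suggestions, assuming g is a named list of numeric vectors (the two vectors below are invented stand-ins for the poster's data). The mapply error Ivan reports comes from passing `names`, the function, instead of `names(g)`; with the character vector, mapply() pairs g[[i]] with names(g)[i], and seq_along(g) avoids hard-coding 1:2.

```r
# Hypothetical stand-in for the poster's two-element named list g
g <- list(`2009-10-07` = c(3, 1, 2), `2012-02-29` = c(5, 4, 6))

pdf(NULL)  # null graphics device, so the sketch runs without a screen

# Pass names(g) (a character vector), not names (the function):
# mapply() walks g and names(g) in parallel, one title per plot
invisible(mapply(plot, g, main = names(g)))

# Same idea with lapply() over positions, without hard-coding 1:2
invisible(lapply(seq_along(g), function(i) plot(g[[i]], main = names(g)[i])))

invisible(dev.off())

# The title each call receives, collected instead of plotted
titles <- unname(mapply(function(x, main) main, g, main = names(g)))
titles
```

mapply() vectorises over every argument supplied in `...`, including named ones such as `main`; arguments that should stay constant across calls belong in MoreArgs instead.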



[R] recursive function on a structured list of lists (dendrogram)

2012-10-24 Thread Ivan Alves
Dear all,

I have been trying the following to no avail and would be very grateful for 
any help. From a dendrogram (a recursive list of lists with some structure), I 
would like to obtain information about the component lists and about the 
enclosing list at the same time. In dendrogram terms, I would like the label 
of each leaf and the height of the branch enclosing it.

A dendrogram example (from the help file of stats::dendrogram), and some 
functions showing how it is structured:

hc <- hclust(dist(USArrests), "ave")
dend1 <- as.dendrogram(hc)
plot(dend1)
str(dend1)

Similarly to dendrapply(), I tried to recursively obtain from the tree a list 
including, for each member (leaf), the height of the list containing it. 
However, I fail to fully grasp how the recursion is done within the 
function so that elements are saved at both the leaf and the branch levels. 
For reference, the dendrapply function is as follows:

function (X, FUN, ...) 
{
    FUN <- match.fun(FUN)
    if (!inherits(X, "dendrogram")) 
        stop("'X' is not a dendrogram")
    Napply <- function(d) {
        r <- FUN(d, ...)
        if (!is.leaf(d)) {
            if (!is.list(r)) 
                r <- as.list(r)
            if (length(r) < (n <- length(d))) 
                r[seq_len(n)] <- vector("list", n)
            r[] <- lapply(d, Napply)
        }
        r
    }
    Napply(X)
}

I essentially don't manage to 'save' the height of a branch (a list of lists) 
so that it can be used in the next iterations and attached to the leaves there. 
Many thanks for any guidance on how to implement such a recursive function.

Kind regards,
Ivan
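
One way to carry a branch height down the tree is to pass it as an extra argument to a recursive function, instead of trying to store it inside dendrapply(). The sketch below is not dendrapply() itself; `leaf_parent_height` is a hypothetical helper name, and it returns one row per leaf with the leaf label and the height of the branch directly enclosing it.

```r
hc <- hclust(dist(USArrests), "ave")
dend1 <- as.dendrogram(hc)

# Hypothetical helper: recurse down the tree, handing each child the height
# of the node that contains it; a leaf returns its label plus that height.
leaf_parent_height <- function(d, parent_height = NA_real_) {
  if (is.leaf(d)) {
    return(data.frame(label = attr(d, "label"),
                      parent_height = parent_height,
                      stringsAsFactors = FALSE))
  }
  h <- attr(d, "height")  # height of this branch, passed on to its children
  do.call(rbind, lapply(d, leaf_parent_height, parent_height = h))
}

res <- leaf_parent_height(dend1)
head(res)
```

The "saving" happens through the function argument: each recursive call receives the height from its caller, so nothing has to be stored between iterations.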



Re: [R] 2 (related) problems with RODBC in 64 bit Windows

2012-08-29 Thread Ivan Alves
Dear Uwe,
Many thanks for the reply.
On 1, the problem is that RODBC on 32 bit 'interprets' factors correctly, 
whereas on 64 bit it gives the error below. On both systems, forcing characters 
(via colClasses = "character" in read.csv) results in no problems. I still 
see this as a problem of the 64-bit implementation.
On 2, many thanks; once I gather the courage to address Prof. Ripley, I will 
send him an account of my experience.

Kind regards,
Ivan
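
Uwe Ligges' explanation (read.csv() returning factors, which nchar() rejects) can be checked without a database. A minimal sketch, assuming a hypothetical credentials file whose DSN name, user and password are invented below; `stringsAsFactors = TRUE` is set explicitly to mimic the read.csv() default of R versions before 4.0. odbcConnect() itself is not called here.

```r
# Hypothetical credentials file contents (all values invented)
csv <- "dsn,username,password
MYDSN,scott,tiger"

# stringsAsFactors = TRUE reproduces the old read.csv() default
cred_f <- read.csv(text = csv, row.names = 1, stringsAsFactors = TRUE)
uid_f <- cred_f["MYDSN", "username"]
is.factor(uid_f)

# nchar() refuses factors, matching the error reported from odbcConnect()
err <- tryCatch(nchar(uid_f), error = conditionMessage)

# Option 1: convert at the point of use
uid <- as.character(uid_f)

# Option 2: never create factors when reading the file
cred_c <- read.csv(text = csv, row.names = 1, colClasses = "character")
uid2 <- cred_c["MYDSN", "username"]
```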
On 29 Aug 2012, at 15:08, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:

 
 
 On 24.08.2012 21:53, Ivan Alves wrote:
 Hi all,
 
 I am encountering an RODBC problem in R 2.15.1 on Windows 64-bit that I do 
 not encounter with the same setup on Windows 32-bit (in both cases the latest 
 binary version of RODBC from the same repository, obtained with 
 install.packages("RODBC"), and the Oracle ODBC client software installed in 
 64-bit and 32-bit respectively).
 
 1.  The code looks like
 
 
 library(RODBC)
 
 credentials <- read.csv("~/credentials.csv", header=TRUE, row.names=1)
 
 db <- odbcConnect(dsn="DSN", uid=credentials["DSN", "username"], 
 pwd=credentials["DSN", "password"], rows_at_time=1024)
 
 
 The odbcConnect call fails with the following error:
 
 Error in nchar(uid) : 'nchar()' requires a character vector
 
 The credentials are read correctly, and credentials["DSN", "username"] 
 correctly returns the following (a factor, by the way):
 
 [1] _username_
 
 Levels: …
 
 
 When I run the equivalent call with direct arguments
 
 
 db <- odbcConnect("DSN", uid="_username_", pwd="_password_", 
 rows_at_time=1024)
 
 
 it works fine. Furthermore, both calls work on Windows 32-bit, and on both 
 systems when the colClasses = "character" option is used. Is this perhaps a 
 problem in 64-bit RODBC's handling of factors that does not occur in 
 32-bit?
 
 
 I think 32-bit and 64-bit behave the same way (but you have not compared 
 exactly the same setup): reading
 
 credentials <- read.csv("~/credentials.csv", header=TRUE, row.names=1)
 
 results in factors for username and password that have to be converted to 
 character. It is unrelated to RODBC.
 
 
 
 
 
 2.  Furthermore (and as reported in 
 http://stackoverflow.com/questions/3407015/querying-oracle-db-from-revolution-r-using-rodbc),
  there are issues with using sqlQuery without the option believeNRows=FALSE, 
 as RODBC still seems to have issues with signed vs. unsigned integers (or 
 sizeof(long)) between 32 and 64 bit.
 
 
 I don't know, but that is something you may want to report (preferably 
 including patches) to the package maintainer.
 
 Uwe Ligges
 
 
 Is there any chance that the problems have the same source in the RODBC code 
 and could be addressed in the near future (after apparently years of making 
 the transition to 64 bit difficult for work with Oracle servers)? (Is there an 
 implicit encouragement to use RJDBC when combining 64-bit R and Oracle databases?)
 
 Many thanks in advance for any guidance.
 
 Ivan


[R] 2 (related) problems with RODBC in 64 bit Windows

2012-08-24 Thread Ivan Alves
Hi all,

I am encountering an RODBC problem in R 2.15.1 on Windows 64-bit that I do not 
encounter with the same setup on Windows 32-bit (in both cases the latest binary 
version of RODBC from the same repository, obtained with 
install.packages("RODBC"), and the Oracle ODBC client software installed in 
64-bit and 32-bit respectively).

1.  The code looks like


library(RODBC)

credentials <- read.csv("~/credentials.csv", header=TRUE, row.names=1)

db <- odbcConnect(dsn="DSN", uid=credentials["DSN", "username"], 
pwd=credentials["DSN", "password"], rows_at_time=1024)


The odbcConnect call fails with the following error:

Error in nchar(uid) : 'nchar()' requires a character vector

The credentials are read correctly, and credentials["DSN", "username"] 
correctly returns the following (a factor, by the way):

[1] _username_

Levels: …


When I run the equivalent call with direct arguments


db <- odbcConnect("DSN", uid="_username_", pwd="_password_", rows_at_time=1024)


it works fine. Furthermore, both calls work on Windows 32-bit, and on both 
systems when the colClasses = "character" option is used. Is this perhaps a 
problem in 64-bit RODBC's handling of factors that does not occur in 
32-bit?


2.  Furthermore (and as reported in 
http://stackoverflow.com/questions/3407015/querying-oracle-db-from-revolution-r-using-rodbc),
 there are issues with using sqlQuery without the option believeNRows=FALSE, as 
RODBC still seems to have issues with signed vs. unsigned integers (or 
sizeof(long)) between 32 and 64 bit.


Is there any chance that the problems have the same source in the RODBC code and 
could be addressed in the near future (after apparently years of making the 
transition to 64 bit difficult for work with Oracle servers)? (Is there an implicit 
encouragement to use RJDBC when combining 64-bit R and Oracle databases?)

Many thanks in advance for any guidance.

Ivan


Re: [R] Graph in R with edge weights

2010-12-01 Thread Ivan Alves
Hi Arthur,

I was asking the same thing and came across the following (you need the sna 
package):

http://students.washington.edu/mclarkso/documents/gplot%20Ver2.pdf

Take a look at the edge.lwd and vertex.cex arguments of the gplot function in 
the examples. You can supply vectors to give different values to different nodes.

Kind regards,
Ivan
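
A minimal sketch with sna, assuming the package is installed (plotting is skipped otherwise) and using invented weights for the four-node graph in Arthur's message below. The weights sit in an adjacency matrix `w`; passing the same matrix as `edge.lwd` asks gplot() for edge-specific line widths, `vertex.cex` takes one size per node, and `diag = TRUE` lets the p -> p loop be drawn.

```r
# Invented weights for the s, p, q, r example (row = from, column = to)
nodes <- c("s", "p", "q", "r")
w <- matrix(0, nrow = 4, ncol = 4, dimnames = list(nodes, nodes))
w["s", "p"] <- 1; w["s", "q"] <- 3
w["p", "p"] <- 2; w["p", "q"] <- 1
w["q", "p"] <- 4; w["q", "r"] <- 2
w["r", "s"] <- 5

if (requireNamespace("sna", quietly = TRUE)) {
  pdf(NULL)  # null device so the sketch runs non-interactively
  sna::gplot(w, label = nodes, displaylabels = TRUE, diag = TRUE,
             edge.lwd = w,                  # per-edge line widths
             vertex.cex = c(2, 1, 1, 1.5))  # per-node sizes
  invisible(dev.off())
}
```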
On Dec 1, 2010, at 9:31 AM, arturs.onz...@gmail.com wrote:

 Can you please show a code example of how to draw a graph with some nodes and
 edges, but with weights? I only found "Using edge weights for labels" in
 http://www.bioconductor.org/packages/release/bioc/vignettes/Rgraphviz/inst/doc/Rgraphviz.pdf
 but...
 
 Here is an example:
 
 library(graph); library(Rgraphviz)
 myNodes = c("s", "p", "q", "r")
 myEdges = list(
 s = list(edges = c("p", "q")),
 p = list(edges = c("p", "q")),
 q = list(edges = c("p", "r")),
 r = list(edges = c("s")))
 g = new("graphNEL", nodes = myNodes,
 edgeL = myEdges, edgemode =
 "directed")
 plot(g)
 
 but how about weights?
 
 
 Thanx.
 


Re: [R] ggplot2 adding vertical line at a certain date

2009-05-28 Thread Ivan Alves

Check out geom_vline:

 + geom_vline(xintercept=as.numeric(as.Date("2002-11-01")))

[you may not need to convert the date to numeric in the most recent  
ggplot2 version]
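
With current ggplot2 releases a Date can usually be passed straight to `xintercept` on a date axis. A minimal sketch, assuming ggplot2 is installed (the block does nothing otherwise) and using a small invented data frame in place of melt.updn:

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Invented quarterly data standing in for melt.updn
  df <- data.frame(
    date  = seq(as.Date("2002-01-01"), by = "quarter", length.out = 8),
    value = c(1.1, 0.7, 0.4, 0.5, 0.4, 0.8, 0.6, 0.8)
  )

  # Vertical dashed line at 1 November 2002, given directly as a Date
  p <- ggplot(df, aes(date, value)) +
    geom_point() +
    geom_vline(xintercept = as.Date("2002-11-01"), linetype = "dashed")

  pdf(NULL)  # render to a null device
  print(p)
  invisible(dev.off())
}
```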


On 27 May 2009, at 20:31, stephen sefick wrote:


library(ggplot2)

melt.updn <- (structure(list(date = structure(c(11808, 11869, 11961, 11992,
12084, 12173, 12265, 12418, 12600, 12631, 12753, 12996, 13057,
13149, 11808, 11869, 11961, 11992, 12084, 12173, 12265, 12418,
12600, 12631, 12753, 12996, 13057, 13149), class = "Date"), variable =
structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("unrestored",
"restored"), class = "factor"), value = c(1.1080259671261,
0.732188576856918, 0.410334408061265, 0.458980396410056,
0.429867902470711, 0.83126337241925, 0.602008712602784,
0.818751283264408, 1.12606382402475, 0.246174719479079,
0.941043753226865, 0.986511619794787, 0.291074883642735,
0.346361775752625, 1.36209038621623, 0.878561166753624,
0.525156715576168, 0.80305564765846, 1.08084449441812,
1.24906568558731, 0.970954515841768, 0.936838439269239,
1.26970090246036, 0.337831520417547, 0.909204325710795,
0.951009811036613, 0.290735620653709, 0.426683515714219)),
.Names = c("date", "variable", "value"), row.names = c(NA, -28L),
class = "data.frame"))

qplot(date, value, data=melt.updn, shape=variable)+geom_smooth()

#I would like to add a line at November 1, 2002
#thanks for the help

--
Stephen Sefick

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis



Re: [R] data frames with å, ä, and ö (=non-ASCII characters) from windows to mac os x

2009-01-16 Thread Ivan Alves

Hi,

On my system (see below), it works fine (inputting the code below at 
the R prompt). Make sure that the input file is encoded in UTF-8.


Rgds,

Ivan
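
If the strings arrived as Latin-1 bytes (a common situation when data written under a Windows locale are opened in a UTF-8 locale), one possible repair is to declare their encoding and re-encode them. A minimal sketch with two invented strings; whether this matches the bytes in Gustaf's .RData file is an assumption.

```r
# Invented strings written as Latin-1 bytes (\xe5 = å, \xe4 = ä, \xd6 = Ö)
x <- c("Sk\xe5ne l\xe4n", "\xd6rebro l\xe4n")
Encoding(x) <- "latin1"   # declare what the bytes are

# Re-encode to UTF-8, the encoding of the Mac locale shown below
y <- iconv(x, from = "latin1", to = "UTF-8")
Encoding(y)

# For a factor column, the same conversion would be applied to its levels:
# levels(df$Län) <- iconv(levels(df$Län), from = "latin1", to = "UTF-8")
```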

> sessionInfo()
R version 2.8.1 Patched (2009-01-14 r47602)
i386-apple-darwin9.6.0

locale:
en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
> structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L, 7L, 9L,
18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L, 14L, 12L), .Label
= c("AB", "AC", "BD", "C", "D", "E", "F", "G", "H", "I", "K", "M", "N",
"O", "S", "T", "U", "W", "X", "Y", "Z"), class = "factor"), Län =
structure(c(1L, 4L, 3L, 5L, 6L, 7L, 8L, 2L, 9L, 10L, 20L, 21L, 13L,
14L, 15L, 16L, 17L, 18L, 12L, 19L, 11L), .Label = c("Blekinge län",
"Dalarnas län", "Gotlands län", "Gävleborgs län", "Hallands län",
"Jämtlands län", "Jönköpings län", "Kalmar län", "Kronobergs län",
"Norrbottens län", "Skåne län", "Stockholms län", "Södermanlands län",
"Uppsala län", "Värmlands län", "Västerbottens län",
"Västernorrlands län", "Västmanlands län", "Västra Götalands län",
"Örebro län", "Östergötlands län"), class = "factor")), .Names =
c("LANKOD", "Län"), class = "data.frame", row.names = c("0", "1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20"))

   LANKOD                  Län
0       K         Blekinge län
1       X       Gävleborgs län
2       I         Gotlands län
3       N         Hallands län
4       Z        Jämtlands län
5       F       Jönköpings län
6       H           Kalmar län
7       W         Dalarnas län
8       G       Kronobergs län
9      BD      Norrbottens län
10      T           Örebro län
11      E    Östergötlands län
12      D    Södermanlands län
13      C          Uppsala län
14      S        Värmlands län
15     AC    Västerbottens län
16      Y  Västernorrlands län
17      U     Västmanlands län
18     AB       Stockholms län
19      O Västra Götalands län
20      M            Skåne län
> Länkarta <- structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L,
21L, 7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L, 14L,
12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G", "H", "I",
"K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"), class =
"factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L, 8L, 2L, 9L, 10L,
20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L, 19L, 11L), .Label =
c("Blekinge län", "Dalarnas län", "Gotlands län", "Gävleborgs län",
"Hallands län", "Jämtlands län", "Jönköpings län", "Kalmar län",
"Kronobergs län", "Norrbottens län", "Skåne län", "Stockholms län",
"Södermanlands län", "Uppsala län", "Värmlands län",
"Västerbottens län", "Västernorrlands län", "Västmanlands län",
"Västra Götalands län", "Örebro län", "Östergötlands län"), class =
"factor")), .Names = c("LANKOD", "Län"), class = "data.frame",
row.names = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14", "15", "16", "17", "18", "19", "20"))

> ls()
[1] "Länkarta"

On 16 Jan 2009, at 14:13, Gustaf Rydevik wrote:


Hi,
I ran into this issue previously and managed to solve it, but I've
forgotten how and am getting frustrated...

I have a data frame (see below) with Scandinavian characters in R
(2.7.1) running on a Windows XP computer. I save the data frame in an
RData file on a USB stick and load() it in R (2.8.0) running on OS X
10.5. Now the name of the data frame and all factor labels with
Scandinavian characters are scrambled. How do I make R on OS X read my
data frame correctly?
From what I've managed to find in the list archives and the FAQ, I
should either

1) run
Sys.setlocale("LC_ALL", "en_US.UTF-8") ### Doesn't change anything
or
2) run
defaults write org.R-project.R force.LANG en_US.UTF-8
in the terminal, which doesn't help either.
I must admit that I couldn't quite follow the documentation I found
on locales, so I might have messed up somewhere along the line.

Many thanks in advance for your help!

Regards,

Gustaf




Länkarta <-
structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L,
7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L,
14L, 12L), .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G",
"H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"
), class = "factor"), Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L,
8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L,
19L, 11L), .Label = c("Blekinge län", "Dalarnas län", "Gotlands län",
"Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län",
"Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län",
"Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län",
"Västerbottens län", "Västernorrlands län", "Västmanlands län",
"Västra Götalands län", "Örebro län", "Östergötlands län"), class =
"factor")), .Names = c("LANKOD",
"Län"), class = "data.frame", row.names = c("0", "1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20"))



--
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik


[R] Treatment of Date ODBC objects in R (RODBC)

2008-12-22 Thread Ivan Alves

Dear all,

When retrieving an Oracle DATE data type by means of RODBC (version 
1.2-4), I get different classes in R depending on which operating 
system I am on:

On Mac OS X I get class "Date"
On Windows I get class c("POSIXt", "POSIXct")

The problem is material, as converting the POSIXct object 
with as.Date() returns one day less (2008-12-17 00:00:00 CET is 
returned as 2008-12-16).


I have 2 related questions:

1. Is there a way to control the conversion RODBC uses for Date 
types, or is this controlled by the ODBC driver (in my case the 
Oracle driver on Windows and Actual on Mac OS X)?


2. What is the trick to get as.Date() to return the _intended_ date  
(the date that the OS X environment correctly reads)?
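
A minimal sketch of the off-by-one behaviour and two ways around it, assuming the Windows driver delivers midnight timestamps in Central European Time (Europe/Berlin is used here as a stand-in zone). In R versions of that era, as.Date() on a POSIXct converted in UTC unless told otherwise, which moves midnight CET back to 23:00 on the previous day. This does not answer question 1, which depends on the driver.

```r
# Midnight on 17 Dec 2008 in Central European Time
x <- as.POSIXct("2008-12-17 00:00:00", tz = "Europe/Berlin")

# Converting in UTC gives the previous calendar day
d_utc <- as.Date(x, tz = "UTC")

# Converting in the timestamp's own time zone keeps the intended date
d_local <- as.Date(x, tz = "Europe/Berlin")

# Equivalent: format in the object's time zone, then parse as a Date
d_fmt <- as.Date(format(x, "%Y-%m-%d"))

c(d_utc, d_local, d_fmt)
```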


Many thanks in advance for any guidance.

Best regards,
Ivan



Re: [R] aggregating along bins and bin-quantiles

2008-10-22 Thread Ivan Alves

Dear Mark and all interested,

Unfortunately the code provided by Mark does not work: there is a 
syntax error when it is run as provided. I looked at possibly solving the 
problem, but without much knowledge of the output of split (it looks 
like a list of lists, and not a list of data frames), it is difficult 
to identify where in the call to lapply the problem arises. The 
problem, both in Mark's code and in my original (with tapply), lies in the 
format of the output of the call to an implicit loop. In fact, I find 
this area of R one of the most obscure to my simplistic way of 
thinking (I would expect the output to have the same format as the 
input, data.frame to data.frame), but I am certain there must be good 
reasons for the way implicit loop functions return what they do.


Any further help would be appreciated, as I may have to resort to some  
(less elegant) loop...


Kind regards,
Ivan

On 22 Oct 2008, at 00:22, [EMAIL PROTECTED] wrote:
Hi Ivan: I think I understand better, so below is some new code, but 
I'm still not totally sure that it's what you want. If not, then I 
think it brings you closer anyway; the split function is very 
useful and I think that's what you need. Let me know if the code below is 
what you needed.
If it's close but not quite right, I can look at it again; it's not 
a problem. If I'm totally off, maybe you should resend to the list, 
because that means I probably can't fix it.


#==========================================================================
a <- read.csv(file = "/opt/mark/research/equity/projects/R_mails/example.csv",
              colClasses = c("Date", "numeric")) # beware of the path


# SPLIT BY DATE
# TO CREATE A LIST OF
# DATAFRAMES
DFlist <- split(a, a$Date)
str(DFlist)

# USE LAPPLY TO CALL cut AND
# THEN aggregate ON EACH COMPONENT
# DATAFRAME IN THE LIST
tempresult <- lapply(DFlist, function(.df) {
  # include.lowest = TRUE keeps each bin's minimum value from becoming NA
  .df$quantile <- cut(.df$value,
                      breaks = quantile(.df$value, probs = seq(0, 1, 0.1), na.rm = TRUE),
                      include.lowest = TRUE)
  aggregate(.df$value, list(DATE = .df$Date, QUANTILE = .df$quantile), sum)
})

# CHECK IF IT WORKED
print(tempresult)

# RBIND EVERYTHING BACK TOGETHER
# SO THAT IT'S ONE DATAFRAME
finalresult <- do.call(rbind, tempresult)
print(finalresult)




On Tue, Oct 21, 2008 at  5:47 PM, Ivan Alves wrote:


Hello Mark,
Many thanks for the reply. Your suggestion is essentially 
equivalent to my first attempt: the quantiles are estimated for the 
WHOLE of the a$value column. What I need is to 
first break down the value column into bins determined by the 
a$Date column and THEN estimate the quantiles for each bin. You 
see, I need the quantiles for each bin, not for all 
the entries; thus if there are 12 dates (or bins), I would 
need 12 x 10 deciles, not just 10.

Kind regards,
Ivan
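
A minimal sketch of per-bin deciles, assuming a data frame shaped like example.csv (a Date column as the bin and a numeric value column); the data below are simulated, not Ivan's. ave() computes the decile of each value within its own Date group and returns a vector aligned with the rows, which avoids the length mismatch that tapply() produces (one list element per bin rather than one entry per row).

```r
set.seed(1)

# Simulated stand-in for example.csv: three dates ("bins"), 100 values each
a <- data.frame(
  Date  = rep(as.Date(c("2006-12-31", "2007-12-31", "2008-12-31")), each = 100),
  value = rnorm(300)
)

# Decile (1-10) of each value within its own Date bin, aligned with the rows of a
a$decile <- ave(a$value, a$Date, FUN = function(v) {
  cut(v,
      breaks = quantile(v, probs = seq(0, 1, 0.1), na.rm = TRUE),
      include.lowest = TRUE, labels = FALSE)
})

# Sum of values for every (Date, decile) combination: 3 bins x 10 deciles
res <- aggregate(value ~ Date + decile, data = a, FUN = sum)
head(res)
```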

On 21 Oct 2008, at 22:20, [EMAIL PROTECTED] wrote:

Hi: I still wasn't very clear on what you wanted, but that might be 
because I didn't save your original email; I doubt that the code below 
helps. I used cut instead of cut2 because I didn't have Hmisc 
loaded, and I think cut does what you want. Jim will probably follow up 
later with a better answer.
He's the real expert with this type of thing. I just like to 
practice.


a <- read.csv(file = "/opt/mark/research/equity/projects/R_mails/example.csv",
              colClasses = c("Date", "numeric"))
a$quantile <- cut(a$value, breaks = quantile(a$value, probs = seq(0, 1, 0.1), na.rm = TRUE))

aggregate(a$value, list(DATE = a$Date, QUANTILE = a$quantile), sum)


On 21 Oct 2008, at 09:25, Ivan Alves wrote:


Dear all,

Thanks to Jim and Mark for suggesting that I include reproducible 
code. Please note that the enclosed file would need to go into 
the home folder, or the path for reading the CSV file must be 
changed. I hope no encoding issues emerge when reading it.


And the code

library(Hmisc) # need the cut2 function to mark the quantile a given line belongs to
a <- read.csv(file = "~/example.csv", colClasses = c("Date", "numeric")) # beware of the path

dim(a) # should give [1] 5076    2
# should give the sum by year, but on the quantiles for the whole population
aggregate(a$value, list(Date = a[, "Date"], Quantile = cut2(a$value, g = 10)), sum)
# gives the error mentioned below
aggregate(a$value, list(Date = a[, "Date"], Quantile = tapply(a$value, a$Date, cut2, g = 10)), sum)


Once again, many thanks for any help
Ivan

On 21 Oct 2008, at 02:40, jim holtman wrote:


PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

You need to at least post a subset of your data so that we can
understand the data structures that you are using. 'dput' will create
an easily readable format for posting your data (much easier than if
you post the listing of a table). Usually it is some 'type mismatch',
which says you really have to have the data to run the script against.


On Mon, Oct 20, 2008 at 6:38 PM, Ivan Alves [EMAIL PROTECTED] wrote:

Dear all,

I would like

[R] coalesce columns within a data frame

2008-10-22 Thread Ivan Alves

Dear all,

I searched the mail archives and the R site and found no guidance  
(tried merge, cbind and terms like coalesce with no success).   
There surely is a way to coalesce (like in SQL) columns in a  
dataframe, right?  For example, I would like to go from a dataframe  
with two columns to one with only one as follows:


From

Name.x Name.y
nx1 ny1
nx2 NA
NA ny3
NA NA
...

To

Name
nx1
nx2
ny3
NA
...

where column Name.x is taken if there is a value, and if not then  
column Name.y


Any help would be appreciated

Kind regards,
Ivan



Re: [R] coalesce columns within a data frame

2008-10-22 Thread Ivan Alves

Dear all,
Thanks for all the replies.
I get something with Duncan's code (slightly more compact than the  
other two), but of class integer, whereas the two inputs are class  
factor.  Clearly the name information is lost.  I did not see  
anything on this in the help page for ifelse.


On this experience I also tried
df$Name <- df$NAME.x
df[is.na(df$NAME.x), "Name"] <- df[is.na(df$NAME.x), "NAME.y"]

but then again the factor issue was a problem (clearly the levels  
are not the same and then there is a conflict)


Any further guidance?
Kind regards,
Ivan

On 22 Oct 2008, at 17:26, Duncan Murdoch wrote:


On 10/22/2008 11:21 AM, Ivan Alves wrote:

Dear all,
I searched the mail archives and the R site and found no guidance   
(tried merge, cbind and terms like coalesce with no  
success).   There surely is a way to coalesce (like in SQL) columns  
in a  dataframe, right?  For example, I would like to go from a  
dataframe  with two columns to one with only one as follows:

From
Name.x Name.y
nx1 ny1
nx2 NA
NA ny3
NA NA
...
To
Name
nx1
nx2
ny3
NA
...
where column Name.x is taken if there is a value, and if not then   
column Name.y

Any help would be appreciated


I don't know of any special function to do that, but ifelse() can  
handle it easily:


Name <- ifelse(is.na(Name.x), Name.y, Name.x)

(If those are columns of a dataframe named df, you'd prefix each  
column name by df$, or do


within(df, Name <- ifelse(is.na(Name.x), Name.y, Name.x))

Duncan Murdoch




Re: [R] coalesce columns within a data frame

2008-10-22 Thread Ivan Alves
Many thanks to all for their help. Factors are indeed very tricky, and I 
sided with the conversion to character.

Kind regards,
Ivan
On 22 Oct 2008, at 19:01, Duncan Murdoch wrote:


On 10/22/2008 12:09 PM, Ivan Alves wrote:

Dear all,
Thanks for all the replies.
I get something with Duncan's code (slightly more compact than the   
other two), but of class integer, whereas the two inputs are  
class  factor.  Clearly the name information is lost.  I did not  
see  anything on this in the help page for ifelse.


It is there, in this warning:

The mode of the result may depend on the value of 'test', and the
class attribute of the result is taken from 'test' and may be
inappropriate for the values selected from 'yes' and 'no'.

You'd want the result to be a factor, but those attributes are  
lost.  I think this is a result of two design flaws:  ifelse()  
shouldn't base the class on the test, it should base it on the  
values.  And factors in S and R have all sorts of problems.


You can work around this by converting to character vectors:

Name <- ifelse(is.na(Name.x), as.character(Name.y), as.character(Name.x))


If you really want factors, you can convert back at the end, but why  
would you want to?
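
Duncan's character-vector workaround can be wrapped in a small helper. A minimal sketch using a hypothetical `coalesce_chr()` and a data frame built from the example in the question, with factor columns as in Ivan's data:

```r
# Data from the question, read as factors as in Ivan's data frame
df <- data.frame(Name.x = c("nx1", "nx2", NA, NA),
                 Name.y = c("ny1", NA, "ny3", NA),
                 stringsAsFactors = TRUE)

# Hypothetical helper: take x where present, otherwise y.
# Converting to character first stops ifelse() from returning factor codes.
coalesce_chr <- function(x, y) {
  x <- as.character(x)
  y <- as.character(y)
  ifelse(is.na(x), y, x)
}

df$Name <- coalesce_chr(df$Name.x, df$Name.y)

# If a factor really is wanted, convert back at the end
df$NameFactor <- factor(df$Name)
df
```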


Duncan Murdoch


On this experience I also tried
df$Name <- df$NAME.x
df[is.na(df$NAME.x), "Name"] <- df[is.na(df$NAME.x), "NAME.y"]
but then again the factor issue was a problem (clearly the  
levels  are not the same and then there is a conflict)

Any further guidance?
Kind regards,
Ivan
On 22 Oct 2008, at 17:26, Duncan Murdoch wrote:

On 10/22/2008 11:21 AM, Ivan Alves wrote:

Dear all,
I searched the mail archives and the R site and found no  
guidance   (tried merge, cbind and terms like coalesce with  
no  success).   There surely is a way to coalesce (like in SQL)  
columns  in a  dataframe, right?  For example, I would like to go  
from a  dataframe  with two columns to one with only one as  
follows:

From
Name.x Name.y
nx1 ny1
nx2 NA
NA ny3
NA NA
...
To
Name
nx1
nx2
ny3
NA
...
where column Name.x is taken if there is a value, and if not  
then   column Name.y

Any help would be appreciated


I don't know of any special function to do that, but ifelse() can   
handle it easily:


Name <- ifelse(is.na(Name.x), Name.y, Name.x)

(If those are columns of a dataframe named df, you'd prefix each   
column name by df$, or do


within(df, Name <- ifelse(is.na(Name.x), Name.y, Name.x))

Duncan Murdoch



Re: [R] aggregating along bins and bin-quantiles

2008-10-21 Thread Ivan Alves

Dear all,

Thanks to Jim and Mark for suggesting that I include reproducible 
code. Please note that the enclosed file would need to go into the 
home folder, or the path for reading the CSV file must be changed. I 
hope no encoding issues emerge when reading it.


And the code

library(Hmisc) # need the cut2 function to mark the quantile a given line belongs to
a <- read.csv(file = "~/example.csv", colClasses = c("Date", "numeric")) # beware of the path

dim(a) # should give [1] 5076    2
# should give the sum by year, but on the quantiles for the whole population
aggregate(a$value, list(Date = a[, "Date"], Quantile = cut2(a$value, g = 10)), sum)
# gives the error mentioned below
aggregate(a$value, list(Date = a[, "Date"], Quantile = tapply(a$value, a$Date, cut2, g = 10)), sum)


Once again, many thanks for any help
Ivan

On 21 Oct 2008, at 02:40, jim holtman wrote:


PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

You need to at least post a subset of your data so that we can
understand the data structures that you are using.  'dput' will create
an easily readable format for posting your data (much easier than if
you post the listing of a table).  Usually it is some 'type mismatch'
which says you really have to have the data to run the script against.

On Mon, Oct 20, 2008 at 6:38 PM, Ivan Alves [EMAIL PROTECTED] wrote:

Dear all,

I would like to aggregate a data frame (consisting of 2 columns - one
for the bins, say factors, and one for the values) along bins and
quantiles within the bins.

I have tried

aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = cut2(data.frame$values, g = 10)), sum)

but then the quantiles apply to the population as a whole and not the
individual bins. Upon this realisation I have tried

aggregate(data.frame$values, list(bin = data.frame$bin,
          Quantile = tapply(data.frame$values, data.frame$bin, cut2, g = 10)), sum)


which gives the following error:

Error in sort.list(unique.default(x), na.last = TRUE) :
 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what. I
believe the error stems either from (a) the output of tapply being a
list whose length equals the number of bins, rather than one of the
same length as the values, or (b) aggregate somehow objecting that the
second list (of the quantiles within the bins) is not sorted nicely.

1. Do you have a reference for doing the summation on both bins and
quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong
and how I can solve the sort/list issue?

Any help would be greatly appreciated

Kind regards,

Ivan Alves







--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




[R] aggregating along bins and bin-quantiles

2008-10-20 Thread Ivan Alves
Dear all,

I would like to aggregate a data frame (consisting of two columns, one
for the bins, say factors, and one for the values) by bin and by
quantile within each bin.

I have tried

aggregate(data.frame$values,
          list(bin = data.frame$bin, Quantile = cut2(data.frame$bin, g = 10)),
          sum)

but then the quantiles are computed over the population as a whole
rather than within the individual bins. Having realised this, I tried

aggregate(data.frame$values,
          list(bin = data.frame$bin,
               Quantile = tapply(data.frame$values, data.frame$bin, cut2, g = 10)),
          sum)

which gives the following error:

Error in sort.list(unique.default(x), na.last = TRUE) :
   'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what. I
believe the error stems either from (a) the output of tapply being a
list whose length equals the number of bins, rather than one of the
same length as the values, or (b) aggregate somehow objecting that the
second list (of the quantiles within the bins) is not sorted nicely.
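Possibility (a) is very likely the cause: when the function returns a vector per bin, tapply() returns a list with one element per bin, whereas aggregate() expects every grouping variable to have one entry per row. The small base-R sketch below (hypothetical data, with rank() standing in for cut2()) shows the length mismatch and how unsplit() puts the per-bin results back into row order.

```r
# Hypothetical data: 12 rows spread over 3 bins of unequal size.
df <- data.frame(bin    = rep(c("a", "b", "c"), times = c(4, 3, 5)),
                 values = c(5, 1, 9, 3, 8, 2, 6, 7, 4, 10, 11, 12))

# tapply() with a vector-valued function gives one list element per bin ...
by_bin <- tapply(df$values, df$bin, rank)
length(by_bin)   # 3 (bins), not 12 (rows)
is.list(by_bin)  # TRUE

# ... so it cannot serve directly as a grouping variable. unsplit() reverses
# the split by bin and returns a vector aligned with the rows of df.
aligned <- unsplit(by_bin, df$bin)
aligned          # 3 1 4 2 3 1 2 2 1 3 4 5
```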

1. Do you have a reference for doing the summation on both bins and  
quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong  
and how I can solve the sort/list issue?
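To see why the first attempt does not answer question 1, here is a small base-R comparison on made-up data, with cut() on quantile() breaks standing in for cut2(g = 10): with breaks taken from the pooled values, each bin reaches only some of the deciles, while recomputing the breaks inside each bin via ave() spreads every bin across all ten. This assumes the values within a bin are distinct enough for the quantile breaks to be unique; cut() stops with an error otherwise.

```r
# Made-up data: bin "low" holds small values, bin "high" holds large ones.
df <- data.frame(bin    = rep(c("low", "high"), each = 10),
                 values = c(1:10, 101:110))

# Decile number (1..10) of each value, using breaks computed from v itself.
decile <- function(v) {
  cut(v, breaks = quantile(v, probs = seq(0, 1, by = 0.1)),
      include.lowest = TRUE, labels = FALSE)
}

# Pooled: one set of breaks for all rows, as in the first aggregate() call.
df$pooled <- decile(df$values)
# Within bin: breaks recomputed separately for each bin.
df$within <- ave(df$values, df$bin, FUN = decile)

table(df$bin, df$pooled)  # "low" only in deciles 1-5, "high" only in 6-10
table(df$bin, df$within)  # each bin spread over all 10 deciles

# Sum by bin and within-bin decile.
aggregate(df$values, list(bin = df$bin, Quantile = df$within), sum)
```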

Any help would be greatly appreciated

Kind regards,

Ivan Alves




Re: [R] aggregating along bins and bin-quantiles

2008-10-20 Thread Ivan Alves
Apologies: there was a typo in the first command (introduced when
translating the names); the question is still valid.


On 21 Oct 2008, at 00:38, Ivan Alves wrote:


Dear all,

I would like to aggregate a data frame (consisting of two columns, one
for the bins, say factors, and one for the values) by bin and by
quantile within each bin.

I have tried

aggregate(data.frame$values,
          list(bin = data.frame$bin, Quantile = cut2(data.frame$values, g = 10)),
          sum)

but then the quantiles are computed over the population as a whole
rather than within the individual bins. Having realised this, I tried

aggregate(data.frame$values,
          list(bin = data.frame$bin,
               Quantile = tapply(data.frame$values, data.frame$bin, cut2, g = 10)),
          sum)

which gives the following error:

Error in sort.list(unique.default(x), na.last = TRUE) :
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what. I
believe the error stems either from (a) the output of tapply being a
list whose length equals the number of bins, rather than one of the
same length as the values, or (b) aggregate somehow objecting that the
second list (of the quantiles within the bins) does not appear to be
sorted nicely.

1. Do you have a reference for doing the summation on both bins and
quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong
and how I can solve the sort/list issue?

Any help would be greatly appreciated

Kind regards,

Ivan Alves

