[R] fractional ranks
Hi, is there a function to calculate fractional ranks? Thanks
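[Editor's sketch, assuming "fractional ranks" means either ties averaged to fractional values or ranks rescaled to (0, 1]; both are available from base R's rank():]

    x <- c(10, 20, 20, 30)

    # rank() averages ties by default, giving fractional ranks such as 2.5
    rank(x)              # 1.0 2.5 2.5 4.0

    # relative (fractional) ranks on (0, 1]: rank divided by the number of values
    rank(x) / length(x)  # 0.250 0.625 0.625 1.000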
[R] R: Securities earning covariance
Thank you for your very fast response. I just tried to use the zoo package, after having read the vignettes, but I get this error message:

    Warning messages:
    1: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
    2: In x$EARNINGS : $ operator is invalid for atomic vectors, returning NULL
    3: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
    4: In x$EARNINGS : $ operator is invalid for atomic vectors, returning NULL
    5: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
    6: In x$EARNINGS : $ operator is invalid for atomic vectors, returning NULL

Am I missing something? Thank you again

Angelo Linardi

-----Original Message-----
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 5 June 2008 17:55
To: LINARDI ANGELO
Cc: r-help@r-project.org
Subject: Re: [R] Securities earning covariance

Check out the three vignettes (i.e. pdf documents in the zoo package). e.g.

    Lines <- "SEC_ID DAY EARNING
    IT001 20070101 5.467
    IT001 20070102 5.456
    IT001 20070103 4.954
    IT001 20070104 3.456
    IT002 20070101 1.456
    IT002 20070102 1.345
    IT002 20070103 1.233
    IT003 20070101 0.345
    IT003 20070102 0.367
    IT003 20070103 0.319"

    DF <- read.table(textConnection(Lines), header = TRUE)
    DFs <- split(DF, DF$SEC_ID)
    library(zoo)
    f <- function(DF.) zoo(DF.$EARNING, as.Date(format(DF.$DAY), "%Y%m%d"))
    z <- do.call(merge, lapply(DFs, f))
    cov(z)  # uses n-1

On Thu, Jun 5, 2008 at 11:41 AM, [EMAIL PROTECTED] wrote:

Good morning, I am a new R user and I am trying to learn how to use it. I am trying to solve this problem. I have a data frame df of daily securities earnings (for a year) as follows:

    SEC_ID DAY      EARNING
    IT001  20070101 5.467
    IT001  20070102 5.456
    IT001  20070103 4.954
    IT001  20070104 3.456
    ..
    IT002  20070101 1.456
    IT002  20070102 1.345
    IT002  20070103 1.233
    ..
    IT003  20070101 0.345
    IT003  20070102 0.367
    IT003  20070103 0.319
    ..

And so on: about 800 different SEC_ID and about 18 rows. I have to calculate the covariance for each couple of securities x and y according to the formula:

    Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)

where x' and y' are the means of the securities' earnings in the year, N is the number of observations, and sx and sy are the standard deviations of x and y. To do this I could build a data frame df2 like this:

    DAY      SEC_ID.x SEC_ID.y EARNING.x EARNING.y x' y' sx sy
    20070101 IT001    IT002    5.467     1.456     a  b  aa bb
    20070101 IT001    IT003    5.467     0.345     a  c  aa cc
    20070101 IT002    IT003    1.456     0.345     b  c  bb cc
    20070102 IT001    IT002    5.456     1.345     a  b  aa bb
    20070102 IT001    IT003    5.456     0.367     a  c  aa cc
    20070102 IT002    IT003    1.345     0.367     b  c  bb cc
    ...

(merging df with itself with the condition SEC_ID.x < SEC_ID.y) and then easily calculate the formula; but the dimensions are too big (the process stops with an out-of-memory message). Besides partitioning the input and using a loop, are there any smarter solutions (eventually using split and other ways of subgroup merging) to solve the problem? Are there any shortcuts using statistical built-in functions (e.g. cov, vcov)?

Thank you in advance
Angelo Linardi
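[Editor's note: dividing by sx*sy, as the posted formula does, gives the correlation rather than the covariance, so once z exists the matching results come from cor(), or from rescaling the covariance matrix. A minimal sketch, assuming the zoo object z from Gabor's example:]

    # correlation matrix directly (this is what the posted formula computes)
    cor(z, use = "pairwise.complete.obs")

    # or rescale a covariance matrix into a correlation matrix
    cov2cor(cov(z, use = "pairwise.complete.obs"))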
[R] R: Securities earning covariance
It works perfectly, thank you so much. Now I will try to put the results into a suitable form (a data frame like this):

    SEC_ID.x SEC_ID.y EARN_COV

Thank you again
Angelo Linardi

-----Original Message-----
From: Patrick Burns [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 5 June 2008 18:11
To: LINARDI ANGELO
Cc: r-help@r-project.org
Subject: Re: [R] Securities earning covariance

I would start by creating a matrix that holds the returns, with rows being the dates and columns being the securities. You can do this by something along the lines of:

    days <- as.character(df[, 'DAY'])
    sec  <- as.character(df[, 'SEC_ID'])
    earningmat <- array(NA, c(length(unique(days)), length(unique(sec))),
                        list(sort(unique(days)), unique(sec)))
    submat <- cbind(match(days, rownames(earningmat)),
                    match(sec, colnames(earningmat)))
    earningmat[submat] <- as.numeric(as.character(df[, 'EARNING']))

Notice that the 'as.numeric(as.character())' in the last line may not be needed -- but if it is needed, it is needed in a big way. If the 'EARNING' column is a factor (because there was at least one item that didn't appear to be numeric when it was read in), then skipping the 'as.numeric(as.character())' call will put the codes for the factor into the matrix. It will be numeric as you expect, but complete garbage.

The trick with 'submat' is explained in any complete description of subscripting -- the subscripting section of Chapter 1 of S Poetry, for instance.

Once you have a suitable matrix, then you can use 'var' or some other function to get the variance matrix. Depending on where you are going, a factor model variance may be better. You can get 'factor.model.stat' from the public domain area of the Burns Statistics website. This is especially useful if there are missing values in your matrix.

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

[EMAIL PROTECTED] wrote:

Good morning, I am a new R user and I am trying to learn how to use it. [rest of the original question snipped; it is quoted in full in the previous message]
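[Editor's sketch of the reshaping step Angelo mentions above: to get from the variance matrix back to the three-column data frame, one route is via as.table(). This assumes the earningmat matrix from Patrick's reply:]

    covmat <- var(earningmat, use = "pairwise.complete.obs")

    # melt the matrix into (row, column, value) triples
    longform <- as.data.frame(as.table(covmat))
    names(longform) <- c("SEC_ID.x", "SEC_ID.y", "EARN_COV")

    # keep each unordered pair of securities only once
    longform <- longform[as.integer(longform$SEC_ID.x) <
                         as.integer(longform$SEC_ID.y), ]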
Re: [R] Getting R and x11 to work
On Fri, 6 Jun 2008, Rick Bilonick wrote:

I'm using Suse Linux Enterprise Desktop 10.2 (SP2) on an HP 2133 (x86) mini-notebook. (There apparently are a LOT of bugs in 10.1!) I downloaded R-base from the openSuse 10.2 repository and was (finally) able to install it (after installing blas and gcc-fortran). I can start an R session and do computations. When I try to do any graphics using x11, I get the message:

    unable to load shared library '/usr/lib/R/modules//R_X11.so':
    /usr/lib/R/modules//R_X11.so: undefined symbol: cairo_image_surface_get_data

Does anyone have an idea on how to fix this?

Yes, your binary version of R is incompatible with the version of cairo you have installed (if you have one). Really the RPM should have checked that, so please report it to the maintainer.

Short-term fix: set X11.options(type = "Xlib") in the session, or in .Rprofile via

    setHook(packageEvent("grDevices", "onLoad"),
            function(...) grDevices::X11.options(type = "Xlib"))

Longer-term fix: install or update cairo >= 1.2 (and preferably >= 1.4).

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] Multiple comment.char under read.table
Hi all,

Suppose I want to read a text file with read.table. It contains lines to be skipped that begin with ! and ^. Is there a way to include these two values in the read.table call? I tried this, but it doesn't seem to work:

    dat <- read.table("mydata.txt", comment.char = c("!", "^"),
                      na.strings = "null", sep = "\t")

Please advise.

--
Gundala Viswanath
Re: [R] power of a multiway ANOVA
thank you ! :)

2008/6/5 Rolf Turner [EMAIL PROTECTED]:

On 6/06/2008, at 1:08 AM, biologeeks wrote:

Dear all, in the package pwr there is the function power.anova.test, which permits obtaining the power for a one-way ANOVA... but I'm looking for a way to compute the power of a multiway ANOVA (i.e. find 1-beta). Is it possible? Do you have some ideas?

The cumulative F distribution function pf in R allows for a non-centrality parameter, so you can calculate the power of any F-test, against any properly specified alternative hypothesis, if you know what you are doing. (And if you don't know what you're doing, don't do it.)

cheers,

Rolf Turner
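[Editor's sketch of Rolf's suggestion, with made-up degrees of freedom and noncentrality: the power of an F-test at level alpha is the probability that a noncentral F exceeds the central critical value.]

    # hypothetical multiway-ANOVA effect: df1 = 3, df2 = 36,
    # noncentrality lambda -- here simply assumed, in practice derived
    # from the effect size and sample size
    alpha  <- 0.05
    df1    <- 3
    df2    <- 36
    lambda <- 10

    Fcrit <- qf(1 - alpha, df1, df2)               # central critical value
    power <- 1 - pf(Fcrit, df1, df2, ncp = lambda) # P(reject | alternative)
    power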
[R] How can I display a characters table ?
I would like to generate graphics text. I have a 67x2 table with a 5-character string in column 1 and a 2-character string in column 2. Is it possible to make such a table appear in a graphics window or a message-box pop-up window? Thank you so much.

--
Maura E.M
Re: [R] (baseline) logistic regression + gof functions?
Hi,

I'm not sure why you think glm doesn't provide goodness of fit tests. Have a look at anova.glm and summary.glm. All the functions you mention can deal with multiple predictors. multinom deals with non-binary data. lrm will deal with ordinal data as well as binary. polr (in the MASS package) will also do ordinal logistic regression.

David

On Thu, Jun 5, 2008 at 10:33 PM, Wim Bertels [EMAIL PROTECTED] wrote:

Hallo, which function can I use to do (baseline) logistic regression + goodness of fit tests? So far I found:

# logistic on binary data: lrm combined with resid(model, 'gof')
# logistic on binary data: glm, with no gof-test
# baseline logit on binary data: multinom, with no gof-test
(# also, what if the data are not binary and there is more than one predictor in the model?)

Hints? Suggestions? Other functions that might help?

mvg,
Wim
[R] modifying tbrm function
Hi,

I don't have much experience writing functions and would like to modify the simple tbrm() function from package dplR in order to save the weights that it produces. I have tried using the superassignment operator as explained in the R intro, but is this the right way to save a variable from within a function? This is my code:

    mytukey <- function(x, C = 9)
    {
        wt <- rep(0, length(x))
        x.med <- median(x)
        S.star <- median(abs(x - x.med))
        w0 <- (x - x.med)/(C * S.star + 1e-06)
        lt0.flag <- abs(w0) <= 1
        wt[lt0.flag] <- ((1 - w0^2)^2)[lt0.flag]
        t.bi.m <- sum(wt * x)/sum(wt)
        myweights <<- wt  # this is my added line
        t.bi.m
    }

Thanks,
D.
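[Editor's note: superassignment works, but it writes into the global workspace as a side effect, which is easy to get wrong. A more conventional sketch is to return both results from the function:]

    # same computation, but returning the value and the weights together
    mytukey2 <- function(x, C = 9) {
        wt <- rep(0, length(x))
        x.med <- median(x)
        S.star <- median(abs(x - x.med))
        w0 <- (x - x.med) / (C * S.star + 1e-06)
        keep <- abs(w0) <= 1
        wt[keep] <- ((1 - w0^2)^2)[keep]
        list(value = sum(wt * x) / sum(wt), weights = wt)
    }

    res <- mytukey2(rnorm(20))
    res$value    # the biweight mean
    res$weights  # the saved weights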
[R] Lattice: key does not accept German umlaute
    library(lattice)

    ## works as expected
    xyplot(1~1, key = list(text = list(c("Maenner"))))

    ## works as expected
    xyplot(1~1, key = list(text = list(c("Maenner"))), xlab = "M\344nner")

    ## gives an error
    xyplot(1~1, key = list(text = list(c("M\344nner"))))

Is this a bug?

TIA,

Bernd
[R] which question
Hello list,

I was trying to select a column of a data frame using the which command. I was actually selecting the rows of the data frame using which, and then displaying a certain column of them. The command that I was using is:

    sequence <- mydata[which(human[,3] %in% genes.sam.names), 9]

In the above command, mydata is my data frame and 9 is the column which I want to display. The rest are just other variables that I use. The which command is supposed to retrieve the rows of interest. The rows are well retrieved; however, if for a certain row column 9 is NA, the respective element of column 10 is displayed. How can I fix that?

Thank you very much,
Eleni
Re: [R] label outliers in geom_boxplot (ggplot2)
hadley wickham wrote:

2008/5/27 Mihalicza Péter [EMAIL PROTECTED]:

Dear List and Hadley,

I would like to have a boxplot with ggplot2 and have the outlier values labelled with their name attribute. So I did

    library(ggplot2)
    dat <- data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5),
                      name = letters[1:20])
    p <- ggplot(dat, aes(y = val, x = num)) +
         geom_boxplot(outlier.size = 4, outlier.colour = "green")
    p + geom_text(label = dat$name)

But this -- of course -- labels all the data points. So I searched high and low to find a way to label only the outliers, but I couldn't find any solution. Probably my keywords were inappropriate, but I looked at the ggplot website and the book also. So I did this:

    boxout <- boxplot(dat$val)$out
    outname <- as.character(dat$name)
    outname[(dat$val %in% boxout) == FALSE] <- "\n"
    p + geom_text(label = outname)

This works, but seems like a hack to me. Is there an obvious solution that I am missing?

I don't think so. This type of problem (where you need to independently access the statistics generated by ggplot) does come up fairly often, but I don't have any particularly good solution for it.

It's too obvious, so I am positive that there is a good reason for not doing this, but still: why is it not possible to have an outlier output in stat_boxplot that can be used in geom_text()? Something like this, with upper:

    dat <- data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5),
                      name = letters[1:20])
    ggplot(dat, aes(y = val, x = num)) +
        stat_boxplot(outlier.size = 4, outlier.colour = "green") +
        geom_text(aes(y = ..upper..), label = "This is upper hinge")

Unfortunately, this does not work and gives the error message:

    Error in eval(expr, envir, enclos) : object "upper" not found

Is it because you can only use stat outputs within the stat statements? Could it be possible to make them available outside the statements too?

P.S. Sorry for taking so long to respond, I've been at my sister's wedding in New Zealand

Thanks for the answer and happy marriage to your sister!

Peter
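[Editor's sketch of one way to label only the outliers without padding the label vector: give geom_text its own data, computed with boxplot.stats(). Written against current ggplot2, using the dat frame from above:]

    library(ggplot2)
    dat <- data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5),
                      name = letters[1:20])

    # rows flagged as outliers by the usual 1.5*IQR boxplot rule
    out <- subset(dat, val %in% boxplot.stats(dat$val)$out)

    ggplot(dat, aes(x = num, y = val)) +
        geom_boxplot() +
        geom_text(data = out, aes(label = name), hjust = -0.5)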
Re: [R] (baseline) logistic regression + gof functions?
Thanks for the quick reply David; so far this sums up to:

# logistic on binary data: lrm combined with resid(model, 'gof')
# logistic on raw binary data: glm, with gof using anova.glm (I think that anova.glm only makes sense on grouped binary data, not on raw binary data...) (so what is the gof for raw binary data and glm?)
# baseline logit on raw multicategory data: multinom, with no gof-test (the only reference I found was by GOEMAN Jelle J. and LE CESSIE Saskia, e.g. https://openaccess.leidenuniv.nl/bitstream/1887/4324/22/04.pdf but there is no R implementation?)
(# also, what if the data are not binary and there is more than one predictor in the model?)
# what if the grouped data are very unbalanced and might have a lot of empty cell counts?

Hints? Suggestions? Other functions that might help?

e.g.: by grouped data I mean (example for binomial):

    Var1 (with 3 levels)
        L1 L2 L3
    T    1  1  2
    F    0  2  0

by raw data I mean:

    Bin Var1
    T   L1
    T   L2
    T   L3
    T   L3
    F   L2
    F   L2

mvg,
Wim
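[Editor's sketch: for grouped binomial data, one common glm goodness-of-fit check is to compare the residual deviance to a chi-squared distribution on the residual degrees of freedom. This is only valid when the group counts are reasonably large, which is exactly why it does not apply to raw 0/1 rows. Made-up grouped data:]

    # hypothetical grouped binomial data: successes/failures per x level
    d <- data.frame(x    = 1:5,
                    succ = c(2, 5, 9, 14, 18),
                    fail = c(18, 15, 11, 6, 2))

    fit <- glm(cbind(succ, fail) ~ x, family = binomial, data = d)

    # deviance goodness-of-fit test (grouped data only, not raw binary data)
    pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)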
[R] bartlett-test
I have transformed my data to a data frame named df with only column names (no row names). Each column represents one sample with 3 observations (indeed nrow(df) = 3 and ncol(df) = 92). In order to check homoskedasticity of variance across my 92 samples I do:

    bartlett.test(df)

It works and gives me a result, but I'm afraid of getting a false result, knowing that a call of this function requires a vector of data x and a factor g, and that g is omitted when x is a list of vectors. Is the call that I make correct? Is a data frame in my case considered like a list of vectors?
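[Editor's note: a data frame is a list of its columns, so bartlett.test(df) uses the list method with each column as one group (see ?bartlett.test: g is ignored when x is a list). An equivalent, more explicit sketch using base R's stack() to build the vector-plus-factor form:]

    # toy stand-in for the 3 x 92 data frame
    df <- as.data.frame(matrix(rnorm(3 * 92), nrow = 3))

    # same test, list form: each column is one group
    bartlett.test(df)

    # explicit vector + grouping-factor form
    long <- stack(df)   # columns: values, ind (the source column as a factor)
    bartlett.test(values ~ ind, data = long)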
[R] Agreggating data using external aggregation rules
Dear R experts,

I am currently facing a tricky problem which I have read a lot about in the various R mailing lists without finding exactly what I need. I have a big data frame DF (about 2,000,000 rows) with 7 columns being variables and 1 being a measure (using reshape package nomenclature). There are no duplicates in it. For each of the variables I have some rules to apply, COD_IN being the value of the variable in DF and COD_OUT the one to be transformed to; once the new codes are in DF, I have to aggregate the new DF (for example summing the measure). Usually the total transformation (merge + aggregate) really decreases the number of lines in the data frame, but sometimes it can grow depending on the rule. Just to give an idea, the first rule in v1 maps 820 different values into 7. Using SQL and a database this can be done in a very straightforward way (for example on the variable v1):

    Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure)
    From DF, RULE_v1
    Where v1 = COD_IN
    Group by COD_OUT, v2, v3, v4, v5, v6, v7

So the first choice would be using a database; the second one would be splitting the data frame and then joining the results. Is there any other way to do the merge + aggregate that avoids the blow-up caused by the merge?

Thank you in advance
Angelo Linardi
Re: [R] Lattice: key does not accept German umlaute
Bernd Weiss wrote:

| library(lattice)
|
| ## works as expected
| xyplot(1~1, key = list(text = list(c("Maenner"))))
|
| ## works as expected
| xyplot(1~1, key = list(text = list(c("Maenner"))), xlab = "M\344nner")
|
| ## gives an error
| xyplot(1~1, key = list(text = list(c("M\344nner"))))
|
| Is this a bug?

Sorry, I forgot to mention my sessionInfo():

    R version 2.7.0 (2008-04-22)
    i386-pc-mingw32

    locale:
    LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] lattice_0.17-8

    loaded via a namespace (and not attached):
    [1] grid_2.7.0 tools_2.7.0

Bernd
Re: [R] which question
Eleni Christodoulou elenichri at gmail.com writes:

I was trying to select a column of a data frame using the which command. I was actually selecting the rows of the data frame using which, and then displaying a certain column of them. The command that I was using is:

    sequence <- mydata[which(human[,3] %in% genes.sam.names), 9]

Please provide a running example; the description of mydata is difficult to read.

Dieter
Re: [R] vector comparison
Karin Lagesen wrote:

I know this is fairly basic, but I must have somehow missed it in the manuals. I have two vectors, often of unequal length. I would like to compare them for identity. Order of elements does not matter, but they should contain the same. I.e., I want this kind of comparison:

    > if (1 == 1) show("yes") else show("blah")
    [1] "yes"
    > if (1 == 2) show("yes") else show("blah")
    [1] "blah"

only replacing the numbers with, for instance, the vectors

    a <- c("a")
    b <- c("b", "c")
    c <- c("c", "b")

Now, I realize I only get a warning when comparing things, but this to me means that I am not doing it correctly:

    > if (a == a) show("yes") else show("blah")
    [1] "yes"
    > if (a == b) show("yes") else show("blah")
    [1] "blah"
    Warning message:
    In if (a == b) show("yes") else show("blah") :
      the condition has length > 1 and only the first element will be used
    > if (b == c) show("yes") else show("blah")
    [1] "blah"
    Warning message:
    In if (b == c) show("yes") else show("blah") :
      the condition has length > 1 and only the first element will be used

I have also tried the %in% comparator, but that one throws warnings too:

    > if (b %in% c) show("yes") else show("blah")
    [1] "yes"
    Warning message:
    In if (b %in% c) show("yes") else show("blah") :
      the condition has length > 1 and only the first element will be used
    > if (c %in% c) show("yes") else show("blah")
    [1] "yes"
    Warning message:
    In if (c %in% c) show("yes") else show("blah") :
      the condition has length > 1 and only the first element will be used

So, how is this really supposed to be done?

Hi Karin,

My interpretation of your question is that you want to test whether two vectors contain the same elements, whether or not the order of those elements is the same. I'll first assume that the vectors must only have elements from the same _set_ and it doesn't matter if they have different lengths:

    if(length(unique(a)) == length(unique(b))) {
      if(all(sort(unique(a)) == sort(unique(b)))) cat("Yes\n") else cat("No\n")
    } else cat("No\n")

However, if the lengths must be the same, but the order may be different:

    if(length(a) == length(b)) {
      if(all(sort(a) == sort(b))) cat("Yes\n") else cat("No\n")
    } else cat("No\n")

The latter test ensures that if there are repeated elements, the number of repeats of each element is the same in each vector.

Jim
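[Editor's note: base R also has a set operation that does exactly the first comparison: setequal() tests whether two vectors contain the same set of elements, ignoring order and duplicates. A quick sketch with Karin's vectors:]

    a <- c("a")
    b <- c("b", "c")
    c <- c("c", "b")

    setequal(b, c)   # TRUE: same elements, order ignored
    setequal(a, b)   # FALSE

    # if duplicate counts must match too, compare sorted equal-length vectors
    length(b) == length(c) && all(sort(b) == sort(c))   # TRUE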
Re: [R] Y values below the X plot
jpardila wrote:

Dear List,

I am creating a plot and I want to insert the tabular data below the X axis. I mean, for every value of X I want to show the value in Y as a table below the plot. I think the attached image gives an idea of what I mean by this. Below is the code I am using now... but as you see the Y values don't have the right location. Maybe I should insert them as a table? Any ideas on that? This should be easy to do but I don't have much experience in R.

Many thanks in advance,
JP

http://www.nabble.com/file/p17670311/legend.jpg

    img1 <- c(-5.28191709, -5.364480081, -4.829456677, -5.325101503,
              -5.212952356, -5.181171896, -5.211122693, -5.153677663,
              -5.292961077, -5.151612394, -5.056544559, -5.151457115,
              -5.332984571, -5.325259917, -5.523870109, -5.429800485,
              -5.436455325)
    img2 <- c(-5.55, -5.56, -5.72, -5.57, -5.34, -5.18, -5.18, -5.36, -5.46,
              -5.32, -5.29, -5.37, -5.42, -5.45, -5.75, -5.75, -5.77)
    angle <- 26:42
    plot(img1 ~ angle, type = "o", xlab = "Incident angle", ylab = "sigma",
         ylim = c(-8, -2), lwd = 2, col = 8, pch = 19, cex = 1, axes = FALSE)
    lines(img2 ~ angle, lwd = 2, type = "o", col = 1, pch = 19, cex = 1)
    legend(38, -2, format(img1, digits = 2), cex = 0.8)
    legend(40, -2, format(img2, digits = 2), cex = 0.8)
    legend(26, -2, c("Image 1", "Image 2"), cex = 0.8, lwd = 2,
           col = c(8, 1), pch = 19, lty = 1:2, bty = "n")
    abline(h = -1:-8, v = 25:45, col = "lightgray", lty = 3)
    axis(1, at = 2 * 0:22)
    axis(2, at = -8:-2)

---

Hi JP,

I thought I could do this with addtable2plot, but I hadn't coded a column spacing into it (maybe next version). However, this is almost what you want, and I'm sure you can work out how to add the lines:

    plot(img1 ~ angle, type = "o", xlab = "Incident angle", ylab = "sigma",
         ylim = c(-8, -2), lwd = 2, col = 8, pch = 19, cex = 1, axes = FALSE)
    box()
    lines(img2 ~ angle, lwd = 2, type = "o", col = 1, pch = 19, cex = 1)
    tablerownames <- "Angle\nImage1\nImage2"
    mtext(c(tablerownames,
            paste(angle, round(img1, 2), round(img2, 2), sep = "\n")),
          1, line = 1, at = c(24.7, angle), cex = 0.5)

Jim
Re: [R] Lattice: key does not accept German umlaute
Well, you failed to give the 'at a minimum' information asked for in the posting guide, and \344 is locale-specific. I see 'MingW32' below, so I will guess this is German-language Windows. We don't know what the error was, either.

It works correctly for me in CP1252 with R-patched, and gives an error in 2.7.0 (and works in 2.6.2). I think it was fixed as a side effect of

    o Rare string width calculations in package grid were not
      interpreting the string encoding correctly.

although it is not the same problem that NEWS item refers to.

My error message in 2.7.0 was

    Error in grid.Call.graphics(L_setviewport, pvp, TRUE) :
      invalid input 'Männer' in 'utf8towcs'

which is what makes me think this was to do with sizing the viewport.

So please update to R-patched and try again.

On Fri, 6 Jun 2008, Bernd Weiss wrote:

[the xyplot examples are quoted in full earlier in this thread]

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Lattice: key does not accept German umlaute
Bernd Weiss bernd.weiss at uni-koeln.de writes:

library(lattice)
## gives an error
xyplot(1~1, key = list(text = list(c("M\344nner"))))

Is this a bug?

You forgot to mention your version; assuming 2.7.0 unpatched. Corrected by Brian Ripley in the developer version (and probably also in patched): http://finzi.psych.upenn.edu/R/Rhelp02a/archive/129251.html

Dieter
[R] Merging two dataframes
Hi All,

Newbie question for you all, but I have been looking at the archives and the help material to get a rough idea of what I want to do.

I would like to merge two dataframes together based on a keyed variable in one dataframe linking to the other dataframe. Only some of the cases will match, but I would like to keep the others as well. My dataframes have 67 and 28 cases respectively, and I would like to end up with one file 67 cases long (all 28 are matched cases).

I can use the merge command to merge two datasets together, but I still get some odd results. I'm using the code below:

    ETC      <- read.csv(file = "CSV_Data2.csv", header = TRUE, sep = ",")
    SURVEY   <- read.csv(file = "survey.csv", header = TRUE, sep = ",")
    FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")

The merged file seems to have 1800 cases, while the ETC data file only has 67 and the SURVEY file only has 28. (Reading the help, it looks as if it merges 1 case with all cases in the other file, which is not what I want.) The matching variable fields are the 'ord' field and the 'uid' field.

Can anyone advise please?

--
Michael Pearmain
Re: [R] Lattice: key does not accept German umlaute
Prof Brian Ripley wrote:

[...]

| It works correctly for me in CP1252 with R-patched, and gives an error
| in 2.7.0 (and works in 2.6.2). [...]
| So please update to R-patched and try again.

That's it! Thanks for your help.

Bernd
[R] boxplot changes fontsize of labels
Hi all!

So far I have learned some R, but finalizing my plots so they look publishable seems not to be possible. I set up some boxplots. Everything works well, but when I put more than two of them in one plot, the labels of the axes appear smaller than the normal font size:

    x <- rnorm(30)
    y <- rnorm(30)
    par(mfrow = c(1, 4))
    boxplot(x, y, names = c("horray", "hurra"))
    mtext("Jubel", side = 1, line = 2)

In case I take one or two boxplots this does not happen:

    par(mfrow = c(1, 2))
    boxplot(x, y, names = c("horray", "hurra"))
    mtext("Jubel", side = 1, line = 2)

The cex.axis seems not to be changed, as setting it to 1.0 doesn't change the behaviour. If cex.axis = 1.3 in the first example, the font size used by boxplot and by mtext is about the same. But as I use a function to draw quite a few of these plots, this hack is not a proper solution. I couldn't find anything about this behaviour in the documentation or on the internet. Can anybody explain? All hints are appreciated.

Thanks,
S. Merz
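[Editor's note: this appears to be the documented behaviour of par(): in a layout with exactly two rows and columns the base cex is reduced to 0.83, and with three or more rows or columns to 0.66, which can leave the axis annotation and a later mtext() call at different sizes. A sketch of one workaround, resetting the base cex after choosing the layout:]

    x <- rnorm(30)
    y <- rnorm(30)

    par(mfrow = c(1, 4))
    par(cex = 1)   # undo the automatic 0.66 reduction triggered by mfrow
    boxplot(x, y, names = c("horray", "hurra"))
    mtext("Jubel", side = 1, line = 2)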
[R] simple data question
If I wanted to use a name for a column with two words, say Dick Cheney and George Bush, can I put these in quotes ("Dick Cheney" and "George Bush") to get both read.table and read.zoo to recognize them when reading into R?

thanks

Stephen

--
Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals.
-K. Mullis
Re: [R] Multiple comment.char under read.table
According to the help file, comment.char only takes one character, so you'll have to do some 'magic' :)

I'd suggest first running mydata through sed, replacing one of the comment chars with the other, then running read.table with the one comment char that remains:

    sed -e 's/^\^/!/' mydata.txt > mydata2.txt

Alternatively, you could do read.table twice, once with ! and once with ^, and then pull out all the common rows from the two results.

on 06/06/2008 03:47 AM Gundala Viswanath said the following:

Hi all, suppose I want to read a text file with read.table. It contains lines to be skipped that begin with ! and ^. [...]
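[Editor's sketch of a third route that stays inside R: filter the offending lines before parsing, by reading the raw lines, dropping those that start with either comment character, and feeding the rest to read.table(). The file name is taken from the question:]

    lines <- readLines("mydata.txt")

    # drop lines whose first character is ! or ^
    keep <- !grepl("^[!^]", lines)

    dat <- read.table(textConnection(lines[keep]),
                      na.strings = "null", sep = "\t")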
Re: [R] simple data question
Should work -- you don't even have to put them in quotes, if your field separator is not a space. Why don't you just try it and see what comes out? :)

on 06/06/2008 08:43 AM stephen sefick said the following:

If I wanted to use a name for a column with two words, say Dick Cheney and George Bush, can I put these in quotes to get them to read into R using both read.table and read.zoo? thanks Stephen
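[Editor's note: one detail worth knowing here is that by default read.table() runs column names through make.names(), which turns "Dick Cheney" into Dick.Cheney. Passing check.names = FALSE keeps the space, at the cost of needing backticks or [[ ]] to access the column. A sketch, with a hypothetical file name:]

    dat <- read.table("politicians.txt", header = TRUE, sep = "\t",
                      check.names = FALSE)
    names(dat)            # e.g. "Dick Cheney" "George Bush"
    dat[["Dick Cheney"]]  # or dat$`Dick Cheney`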
Re: [R] R: Securities earning covariance
Update your version of zoo to the latest one.

On Fri, Jun 6, 2008 at 3:18 AM, [EMAIL PROTECTED] wrote:

Thank you for your very fast response. I just tried to use the zoo package, after having read the vignettes, but I get this error message: "$ operator is invalid for atomic vectors, returning NULL" (repeated for x$DAY and x$EARNINGS). Am I missing something? Thank you again. Angelo Linardi

[rest of the exchange snipped; it is quoted in full in the first message of this thread]
Re: [R] Merging two dataframes
try this:

    FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid",
                      all.x = TRUE, all.y = FALSE)

on 06/06/2008 07:30 AM Michael Pearmain said the following:

Hi All, Newbie question for you all. I would like to merge two dataframes together based on a keyed variable in one dataframe linking to the other dataframe. Only some of the cases will match, but I would like to keep the others as well. My dataframes have 67 and 28 cases respectively, and I would like to end up with one file 67 cases long (all 28 are matched cases). [...]
[R] request: a class having max frequency
Dear R users,

I have a very basic question. I tried but could not find the required result. Using

    dat <- pima
    f <- table(dat[,9])
    f
      0   1
    500 268

I want to find the class (here 0) having maximum frequency, i.e. 500. I used which.max(f), which gives

    0
    1

How can I get only the 0? Thanks and best regards

Muhammad Azam
Ph.D. Student
Department of Medical Statistics, Informatics and Health Economics
University of Innsbruck, Austria
Re: [R] request: a class having max frequency
On 6/6/2008 9:14 AM, Muhammad Azam wrote:

[the question is quoted above]

    > table(iris$Species)
        setosa versicolor  virginica
            50         50         50

    > which.max(table(iris$Species))
    setosa
         1

    > names(which.max(table(iris$Species)))
    [1] "setosa"

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
Re: [R] request: a class having max frequency
The 0 is the name of the item and the 1 is the index in f of the maximum class (since f is a table, and the first element of the table is the maximum, which.max returns a 1). So, if you just want to know which class is maximum you can say

    names(which.max(f))

Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.
6465 Wayzata Blvd. Suite 170, Minneapolis, MN 55426
Tel: 952.417.4719 | Mobile: 612.201.8978
[EMAIL PROTECTED]
http://www.markettools.com
Re: [R] request: a class having max frequency
    names(f)[which.max(f)]

on 06/06/2008 09:14 AM Muhammad Azam said the following:

[the question is quoted above]
Re: [R] Problem in executing R on server
Run the sessionInfo() command in R, as the posting guide requests!

Jason Lee wrote:

Hi, I am not too sure it's what you meant. Below is the closest data for each session from top:

    PID   USER  PR NI VIRT RES  SHR  S %CPU %MEM TIME+   COMMAND
    26792 jason 25  0 283m 199m 2620 R  100  0.6 0:00.38 R

The numbers changed as the processes are running. I am actually sharing the server with a few other people. I don't think this is a problem. And, for my own PC:

    PID   USER  PR NI VIRT RES  SHR  S %CPU %MEM TIME+   COMMAND
    6192  jason 25  0 157m 148m 2888 R  100 14.8 1081:21 R

On Fri, Jun 6, 2008 at 12:46 PM, Erik Iverson [EMAIL PROTECTED] wrote:

And what is your sessionInfo() in each case!

Jason Lee wrote:

Hi, I query free -m. On my server it is:

          total  used  free  shared  buffers  cached
    Mem:  32190  8758 23431       0      742    2156

And on my PC:

          total  used  free  shared  buffers  cached
    Mem:   1002   986    16       0      132     255

On the server, the above figure is after I exited R. It seems that there is still a lot of free memory available, if I am not wrong.

On Fri, Jun 6, 2008 at 12:29 PM, Erik Iverson [EMAIL PROTECTED] wrote:

How much RAM is installed in your Sun Solaris server? How much RAM is installed on your PC?

Jason Lee wrote:

Hi, I am actually trying to do some matrix multiplications of large datasets of 3000 columns and 150 rows. And I am running R version 2.7.0. I tried setting

    R --min-vsize=10M --max-vsize=100M --min-nsize=500k --max-nsize=1000M

Yet I still get:

    Error: cannot allocate vector of size 17.7 Mb

I am running on a Sun Solaris server. Please advise. Thanks.

On Fri, Jun 6, 2008 at 11:50 AM, Erik Iverson [EMAIL PROTECTED] wrote:

Jason Lee wrote:

Hi R-listers, I have a problem executing R on a server. It returns "Error: cannot allocate vector of size 15.8 Mb" each time I execute R on the server, but it doesn't give me any problem on my own PC (except that it runs extremely slowly). Any pointers on this? I tried to read the FAQ on this issue in the archive, but it seems there is no one solution to this.

And that is because there is no one cause of this issue. I might guess your 'server' has less memory than your 'PC', but you didn't say anything about your respective setups, or what you are even trying to do with R.

I tried to simplify my code but it seems the problem is still the same. Please advise. Thanks.
Re: [R] request: a class having max frequency
On 6/6/2008 9:18 AM, Chuck Cleland wrote:

[previous answer quoted above]

If, as above, more than one category frequency is at the maximum, you might want something like this:

    > x <- table(iris$Species)
    > which(x == max(x))
        setosa versicolor  virginica
             1          2          3
    > names(which(x == max(x)))
    [1] "setosa"     "versicolor" "virginica"

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
[R] Manipulating DataSets
Hello R-users,

I have a very simple problem I wanted to solve. I have a large dataset as such:

      Lag X.Symbol  Time     TickType ReferenceNumber Price  Size X.Symbol.1 Time.1   TickType.1 ReferenceNumber.1
    1 ES  3:ESZ7.GB 08:30:00 B        74390987        151075 44   3:ESZ7.GB  08:30:00 A          74390988
    2 ES  3:YMZ7.EC 08:30:00 B        74390993        13686  17   3:YMZ7.EC  08:30:00 A          74390994
    3 YM  3:ESZ7.GB 08:30:00 B        74391135        151075 49   3:ESZ7.GB  08:30:00 A          74391136
    4 YM  3:YMZ7.EC 08:30:00 B        74390998        13686  17   3:YMZ7.EC  08:30:00 A          74390999
    5 YM  3:ESZ7.GB 08:30:00 B        74391135        151075 49   3:ESZ7.GB  08:30:00 A          74391136
    6 YM  3:YMZ7.EC 08:30:00 B        74391000        13686  14   3:YMZ7.EC  08:30:00 A          74391001

      Price.1 Size.1 LeadTime MidPoint Spread
    1 151100  22     08:30:00 151087.5 25
    2 13688   27     08:30:00 13687.0  2
    3 151100  22     08:30:00 151087.5 25
    4 13688   27     08:30:00 13687.0  2
    5 151100  22     08:30:00 151087.5 25
    6 13688   27     08:30:00 13687.0  2

All I wanted to do was take log(MidPoint[2]) - log(MidPoint[1]) for a symbol such as 3:ESZ7.GB. So the first one would be log(151087.5) - log(151087.5). I wanted to do this throughout the dataset and add the result as another column. I would appreciate any help.

Regards,
Neil Gupta
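[Editor's sketch of one base-R way to do this, assuming the data frame is called dat and using the column names shown above: group the rows by symbol with ave() and take first differences of the log mid-points. The first row of each symbol gets NA, since it has no predecessor.]

    # log return of the mid-point, computed within each symbol
    dat$LogMidDiff <- ave(log(dat$MidPoint), dat$X.Symbol.1,
                          FUN = function(p) c(NA, diff(p)))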
Re: [R] Agreggating data using external aggregation rules
Use aggregate() for aggregation and use indexing or subset() for selection. Alternately try the sqldf package, http://sqldf.googlecode.com, which allows one to perform SQL operations on data frames.

On Fri, Jun 6, 2008 at 6:12 AM, [EMAIL PROTECTED] wrote:

Dear R experts, I am currently facing a tricky problem which I have read a lot about in the various R mailing lists without finding exactly what I need. [rest of the question snipped; it is quoted in full in the original message above]
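[Editor's sketch of the SQL from the question in base R, assuming a rule table RULE_v1 with columns COD_IN and COD_OUT (names taken from the post): merge() maps v1 to its new code, and aggregate() then sums the measure within each combination of the remaining variables.]

    # map v1 -> COD_OUT, then sum measure within the new grouping
    DF2 <- merge(DF, RULE_v1, by.x = "v1", by.y = "COD_IN")

    agg <- aggregate(DF2$measure,
                     by = list(COD_OUT = DF2$COD_OUT,
                               v2 = DF2$v2, v3 = DF2$v3, v4 = DF2$v4,
                               v5 = DF2$v5, v6 = DF2$v6, v7 = DF2$v7),
                     FUN = sum)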
[R] Subsetting to unique values
I want to take the first row of each unique ID value from a data frame. For instance

    ddTable <- data.frame(Id = c(1, 1, 2, 2),
                          name = c("Paul", "Joe", "Bob", "Larry"))

I want a dataset that is

    Id name
     1 Paul
     2 Bob

unique(ddTable) will give me all 4 rows, and unique(ddTable$Id) will give me c(1, 2), but not accompanied by the name column.
Re: [R] How can I display a character table?
Dear Maura, try the function textplot from the package gplots. You can say textplot(yourmatrix) and get a plot of a character matrix.

On Fri, 6 Jun 2008, Maura E Monville wrote: I would like to generate a graphics text. I have a 67x2 table with a 5-character string in column 1 and a 2-character string in column 2. Is it possible to make such a table appear in a graphics window or a message-box pop-up window? Thank you so much. -- Maura E.M
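A small worked example of the suggestion (the matrix here is made-up data standing in for the 67x2 table):

    library(gplots)
    m <- cbind(code = c("alpha", "bravo", "gamma"),
               id   = c("A1", "B2", "C3"))
    textplot(m)   # draws the character matrix on the current graphics device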
Re: [R] Subsetting to unique values
On 6/6/2008 9:35 AM, Emslie, Paul [Ctr] wrote: [question snipped; see the original message above]

    ddTable <- data.frame(Id = c(1, 1, 2, 2),
                          name = c("Paul", "Joe", "Bob", "Larry"))

    !duplicated(ddTable$Id)
    [1]  TRUE FALSE  TRUE FALSE

    ddTable[!duplicated(ddTable$Id), ]
      Id name
    1  1 Paul
    3  2 Bob

?duplicated

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor, New York, NY 10010
tel: (212) 845-4495 (Tu, Th)  tel: (732) 512-0171 (M, W, F)  fax: (917) 438-0894
Re: [R] simple data question
Good point. Thanks.

On Fri, Jun 6, 2008 at 9:05 AM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: should work - you don't even have to put them in quotes, if your field separator is not a space. why don't you just try it and see what comes out? :)

on 06/06/2008 08:43 AM stephen sefick said the following: if I wanted to use a name for a column with two words, say Dick Cheney and George Bush, can I put these in quotes ("Dick Cheney" and "George Bush") to get them to be read into R, with both read.table and read.zoo recognizing them? thanks, Stephen

-- Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
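A quick illustration of the point (invented data, not from the thread): with a non-space separator, quoted multi-word headers survive read.table if name mangling is turned off with check.names = FALSE.

    Lines <- '"Dick Cheney","George Bush"
    1,2
    3,4'
    d <- read.table(textConnection(Lines), header = TRUE, sep = ",",
                    check.names = FALSE)
    names(d)            # "Dick Cheney" "George Bush"
    d[["Dick Cheney"]]  # spaces in names need [[ ]] or backticks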
Re: [R] Subsetting to unique values
I don't have R on this machine, but will this work?

    myrows <- match(unique(ddTable$Id), ddTable$Id)
    unis <- ddTable[myrows, ]

--- On Fri, 6/6/08, Emslie, Paul [Ctr] [EMAIL PROTECTED] wrote: From: Emslie, Paul [Ctr] Subject: [R] Subsetting to unique values To: r-help@r-project.org Received: Friday, June 6, 2008, 9:35 AM [question snipped; see the original message above]
Re: [R] which question
An example is:

    symbol <- human[which(human[,3] %in% genes.sam.names), 8]

The data human and genes.sam.names are attached. The result of the above command is:

    symbol
     [1] CCL18                  MARCO                  SYT13
     [4] FOXC1                  CDH3
     [7] CA12                   CELSR1                 NM_018440
    [10] MICROTUBULE-ASSOCIATED NM_015529              ESR1
    [13] PHGDH                  GABRP                  LGMN
    [16] MMP9                   BMP7                   KLF5
    [19] RIPK2                  GATA3                  NM_032023
    [22] TRIM2                  CCND1                  MMP12
    [25] LDHB                   AF493978               SOD2
    [28] SOD2                   SOD2                   NME5
    [31] STC2                   RBP1                   ROPN1
    [34] RDH10                  KRTHB1                 SLPI
    [37] BBOX1                  FOXA1                  NM_005669
    [40] MCCC2                  CHI3L1                 GSTM3
    [43] LPIN1                  DSC2                   FADS2
    [46] ELF5                   CYP1B1                 LMO4
    [49] AL035297               NM_152398              AB018342
    [52] PIK3R1                 NFKBIE                 MLZE
    [55] NFIB                   NM_052997              NM_006023
    [58] CPB1                   CXCL13                 CBR3
    [61] NM_017527              FABP7                  DACH
    [64] IFI27                  ACOX2                  CXCL11
    [67] UGP2                   CLDN4                  M12740
    [70] IGKC                   IGKC                   CLECSF12
    [73] AY069977               HOXB2                  SOX11
    [76]                        NM_017422              TLR2
    [79] CKS1B                  BC017946               APOBEC3B
    [82]                        HLA-DRB1               HLA-DQB1
    [85]                        CCL13                  C4orf7
    [88]                        NM_173552
    21345 Levels: (2 (32 (55.11 (AIB-1) (ALU (CAK1) (CAP4) (CASPASE ... ZYX

As you can see, apart from the gene symbols, which are the required thing, RefSeq IDs are also retrieved... Thanks a lot, Eleni

On Fri, Jun 6, 2008 at 1:23 PM, Dieter Menne [EMAIL PROTECTED] wrote: Eleni Christodoulou elenichri at gmail.com writes: I was trying to select a column of a data frame using the which command. I was actually selecting the rows of the data frame using which, and then displayed a certain column of it. The command that I was using is:

    sequence <- mydata[which(human[,3] %in% genes.sam.names), 9]

Please provide a running example. The mydata are difficult to read. Dieter
Re: [R] Subsetting to unique values
Emslie, Paul [Ctr] emsliep at atac.mil writes: [question snipped; see the original message above]

    ddTable[-which(duplicated(ddTable$Id)), ]

HTH, Adrian
Re: [R] Java to R interface
Try and make sure that R is in your Windows Path variable. I got your message when I first did this, but when I did the above it then worked...
[R] Startup speed for a lengthy script
Colleagues, several days ago I wrote to the list about a lengthy delay in the startup of a script. I will start with a brief summary of that email. I have a 10,000 line script of which the final 3000 lines constitute a function. The script contains time-markers (cat(date())) so that I can determine how fast it was read. When I invoke the script from the OS (R --slave < Script.R; similar performance with R 2.6.1 or 2.7.0 on Mac / Linux / Windows), the first 7000 lines were read in 5 seconds, then it took 2 minutes to read the remaining 3000 lines. I inquired as to the cause of the lengthy reading of the final 3000 lines.

Subsequently, I whittled the 3000 lines down to ~1000 (moving 2000 lines to smaller functions). Now the first 9000 lines still read in ~6 seconds and the final 1000 lines in ~15 seconds. Better, but not ideal.

However, I just encountered a new situation that I don't understand. The R code is now embedded in a graphical interface built with REALbasic. When I invoke the script in that environment, the first 9000 lines take the usual 6 seconds. But, to my surprise, the final 1000 lines take 2 seconds! There is one major difference in the implementation: with the GUI, the commands are pushed, i.e., the GUI opens R, then sends a continuous stream of code. Does anyone have any idea as to why the delay should be so different in the two settings?

Dennis

Dennis Fisher MD
P (The P Less Than Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-415-564-2220
www.PLessThan.com
Re: [R] which question
I didn't get any attached data, but my suspicion here is that you have somehow got RefSeq IDs in column 8 of human, as well as the gene symbols. Did you read this data in from a text file?

Eleni Christodoulou wrote: [quoted output snipped; see the previous message]

--
Richard D. Pearson  [EMAIL PROTECTED]
School of Computer Science,  http://www.cs.man.ac.uk/~pearsonr
University of Manchester, Oxford Road, Manchester M13 9PL, UK.
Tel: +44 161 275 6178  Mob: +44 7971 221181  Fax: +44 161 275 6204
Re: [R] Merging two dataframes
cool. :) yea, the argument names are by.x and by.y, so your by.SURVEY and by.ETC were ignored in the black hole of arguments passed to other methods.

on 06/06/2008 09:11 AM Michael Pearmain said the following: Thanks, works perfectly. Was the problem due to me putting by.SURVEY and by.ETC rather than by.y and by.x? I think when i was playing around i tried the all. command in that setup as well. Mike

On Fri, Jun 6, 2008 at 2:07 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: try this:

    FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid",
                      all.x = TRUE, all.y = FALSE)

on 06/06/2008 07:30 AM Michael Pearmain said the following: Hi All, Newbie question for you all, but i have been looking at the archives and the help stuff to get a rough idea of what i want to do. I would like to merge two dataframes together based on a keyed variable in one dataframe linking to the other dataframe. Only some of the cases will match, but i would like to keep the others as well. My dataframes have 67 and 28 cases respectively and i would like to end up with one file 67 cases long (all 28 are matched cases). I can use the merge command to merge two datasets together, but i still get some odd results; i'm using the code below:

    ETC <- read.csv(file = "CSV_Data2.csv", header = TRUE, sep = ",")
    SURVEY <- read.csv(file = "survey.csv", header = TRUE, sep = ",")
    FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")

The merged file seems to have 1800 cases, while the ETC data file only has 67 and the SURVEY file only has 28. (Reading the help it looks as if it merges 1 case with all cases in the other file, which is not what i want.) The matching variable fields are the 'ord' field and the 'uid' field. Can anyone advise please?

--
Michael Pearmain
Senior Statistical Analyst
1st Floor, 180 Great Portland St. London W1W 5QZ
Doubleclick is a part of the Google group of companies
[R] fit.variogram sgeostat error
Hi, when I run the next line it works fine:

    fit.spherical(var, 0, 2.6, 250, type = 'c', iterations = 10,
                  tolerance = 1e-06, echo = FALSE, plot.it = TRUE,
                  weighted = TRUE, delta = 0.1, verbose = TRUE)

But when I use the next one, it gives an error:

    fit.variogram('spherical', var, nugget = 0, sill = 2.6, range = 250,
                  plot.it = TRUE, iterations = 0)

This is the error:

    Error in fit.variogram('spherical', var, nugget = 0, sill = 2.6, range = 250, :
      unused argument(s) (nugget = 0, sill = 2.6, range = 250, plot.it = TRUE, iterations = 0)

Any suggestions? Alexys H
[R] lsmeans
Hello, I have the following function call:

    lme(fixed = Error ~ Temperature * Tumour, random = ~1 | ID, data = error_DB)

which returns an lme object. I am interested in carrying out some kind of lsmeans computation on the returned fit, but I cannot find any function to do this in R. I've seen the effect() function, but it does not work with lme objects. Any idea?

Best, Dani

--
Daniel Valverde Saubí
Grup de Biologia Molecular de Llevats, Facultat de Veterinària de la Universitat Autònoma de Barcelona, Edifici V, Campus UAB, 08193 Cerdanyola del Vallès, SPAIN
Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN)
Grup d'Aplicacions Biomèdiques de la RMN, Facultat de Biociències, Universitat Autònoma de Barcelona, Edifici Cs, Campus UAB, 08193 Cerdanyola del Vallès, SPAIN
+34 93 5814126
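One possible route, offered here as a hedged sketch rather than a reply from the thread: the multcomp package can compute simultaneous comparisons of factor levels from an lme fit, which covers part of what lsmeans does elsewhere. It assumes Temperature is a factor, and an additive model is used below to keep the main-effect contrasts well-defined (with the Temperature * Tumour interaction, level comparisons need more care).

    library(nlme)
    library(multcomp)
    fit <- lme(Error ~ Temperature + Tumour, random = ~1 | ID, data = error_DB)
    # pairwise (Tukey-style) comparisons of the Temperature levels
    summary(glht(fit, linfct = mcp(Temperature = "Tukey")))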
Re: [R] Improving data processing efficiency
Anybody have any thoughts on this? Please? :)

on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:

Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had recent IPOs, some have not (I have a binary flag set). The total dataset size is 700k+ rows.

My goal is this: for every quarter since issue for each IPO, I need to find a matched firm in the same industry, close in market cap. So, e.g., for firm X, which had an IPO, I need to find a matched non-issuing firm in quarter 1 since IPO, then a (possibly different) non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing firm (there are about 8300 of these).

Thus it seems to me that I need to be doing a lot of data selection and subsetting, and looping (yikes!), but the result appears to be highly inefficient and takes ages (well, many hours). What I am doing, in pseudocode, is this:

1. for each quarter of data, get out all the IPOs and all the eligible non-issuing firms.
2. for each IPO in a quarter, grab all the non-issuers in the same industry, sort them by size, and finally grab the matching firm closest in size (the exact procedure is to grab the closest bigger firm if one exists, and just the biggest available if all are smaller)
3. assign the matched firm-observation the same quarters-since-issue as the IPO being matched
4. rbind them all into the matching dataset.

The function I currently have is pasted below, for your reference. Is there any way to make it produce the same result but much faster? Specifically, I am guessing eliminating some loops would be very good, but I don't see how, since I need to do some fancy footwork for each IPO in each quarter to find the matching firm. I'll be doing a few things similar to this, so it's somewhat important to up the efficiency of this. Maybe some of you R-fu masters can clue me in? :) I would appreciate any help, tips, tricks, tweaks, you name it! :)

== my function below ===

fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata, quarters_since_issue = 40) {

    # rbind for matrix is cheaper, so typecast the result to matrix
    result = matrix(nrow = 0, ncol = ncol(tfdata))
    colnames = names(tfdata)
    quarterends = sort(unique(tfdata$DATE))

    for (aquarter in quarterends) {
        tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
        tfdata_quarter_fitting_nonissuers = tfdata_quarter[
            (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
            (tfdata_quarter$IPO.Flag == 0), ]
        tfdata_quarter_ipoissuers = tfdata_quarter[tfdata_quarter$IPO.Flag == 1, ]

        for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
            arow = tfdata_quarter_ipoissuers[i, ]
            industrypeers = tfdata_quarter_fitting_nonissuers[
                tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
            industrypeers = industrypeers[order(industrypeers$Market.Cap.13f), ]
            if (nrow(industrypeers) > 0) {
                if (nrow(industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ]) > 0) {
                    # closest bigger peer, if one exists
                    bestpeer = industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1, ]
                } else {
                    # all peers are smaller: take the biggest
                    bestpeer = industrypeers[nrow(industrypeers), ]
                }
                bestpeer$Quarters.Since.IPO.Issue = arow$Quarters.Since.IPO.Issue
                # tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
                result = rbind(result, as.matrix(bestpeer))
            }
        }
        # result = rbind(result, tfdata_quarter)
        print(aquarter)
    }

    result = as.data.frame(result)
    names(result) = colnames
    return(result)
}

= end of my function =
Re: [R] label outliers in geom_boxplot (ggplot2)
It's too obvious, so I am positive that there is a good reason for not doing this, but still: why is it not possible to have an outlier output in stat_boxplot that can be used by geom_text()? Something like this, with upper:

    dat = data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5), name = letters[1:20])
    ggplot(dat, aes(y = val, x = num)) +
      stat_boxplot(outlier.size = 4, outlier.colour = "green") +
      geom_text(aes(y = ..upper..), label = "This is upper hinge")

Unfortunately, this does not work and gives the error message:

    Error in eval(expr, envir, enclos) : object 'upper' not found

Is it because you can only use stat outputs within the stat statements? Could it be possible to make them available outside the statements too?

You can generally, but it won't work here. The problem is that you want a different y aesthetic for the statistic (val) than you do for the geom (upper) and there's no way to get around that with the current design of ggplot2.

Hadley
--
http://had.co.nz/
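A workaround sketch (an editorial addition, not Hadley's suggestion): compute the hinge outside ggplot2 with boxplot.stats() and feed it to the text layer as its own data.

    library(ggplot2)
    dat <- data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5))
    upper <- boxplot.stats(dat$val)$stats[4]   # the upper hinge
    ggplot(dat, aes(x = num, y = val)) +
      geom_boxplot() +
      geom_text(data = data.frame(num = 1, val = upper),
                label = "upper hinge", vjust = -0.5)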
Re: [R] Improving data processing efficiency
One thing that is likely to speed the code significantly is if you create 'result' to be its final size and then subscript into it. Something like:

    result[i, ] <- bestpeer

(though I'm not sure if 'i' is the proper index).

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

Daniel Folkinshteyn wrote: [original question and function snipped; see the first message in this thread]
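To make the advice concrete, a minimal sketch of the preallocate-and-fill pattern (not from the thread; the row bound and counter name are illustrative assumptions):

    # growing with rbind() copies the whole matrix on every call;
    # preallocating once and filling by index avoids that
    n_max  <- 100000                                  # assumed upper bound on matched rows
    result <- matrix(NA, nrow = n_max, ncol = ncol(tfdata))
    k <- 0
    # inside the loops, instead of result = rbind(result, as.matrix(bestpeer)):
    #     k <- k + 1
    #     result[k, ] <- as.matrix(bestpeer)
    # and afterwards drop the unused rows:
    #     result <- result[seq_len(k), , drop = FALSE]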
Re: [R] Improving data processing efficiency
Try reading the posting guide before posting.

On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: [quoted message and function snipped; see the first message in this thread]
[R] Store filename
Hi list, is it possible to save the name of a file automatically when reading it using read.table() or some other function? My aim is to then create an output table whose name is the name of the original table with a suffix like _out. Example:

    mydata = read.table("Run224_v2_060308.txt", sep = "\t", header = TRUE)

    ## store name?
    myfile = the_name_of_the_file

    ## do analysis of the data and store it in a data.frame myoutput

    ## write output in tab format
    write.table(myoutput, c(myfile, "_out.txt"), sep = "\t")

The name of the new file would be "Run224_v2_060308_out.txt". Thanks in advance, David
[R] where to download BRugs?
Hi all, does anyone know where to download the BRugs package? I did not find it on the r-project website. Thanks. NL
Re: [R] Improving data processing efficiency
I did! What did I miss?

on 06/06/2008 11:45 AM Gabor Grothendieck said the following: Try reading the posting guide before posting. [rest of the quoted thread snipped]
[R] How to force two regression coefficients to be equal but opposite in sign?
Is there a way to set up a regression in R that forces two coefficients to be equal but opposite in sign?

I'm trying to set up a model where a subject appears in a pair of environments where a measurement X is made. There are a total of 5 environments, one of which is a baseline. But each observation is for a subject in only two of them, and not all subjects will appear in each environment. Each of the environments has an effect on the variable X. I want to measure the relative effects of each environment E on X with a model:

    Xj = Xi * Ei / Ej

Ei of the baseline environment is set equal to 1. With a log transform, a linear-looking regression can be written as:

    log(Xj) = log(Xi) + log(Ei) - log(Ej)

My data look like:

    #  E1   X1  E2   X2
    1   A  .20   B  .25

What I've tried in R:

    env <- c("A", "B", "C", "D", "E")
    # Note: data is made up just for this example
    df <- data.frame(
      X1 = c(.20,.10,.40,.05,.10,.24,.30,.70,.48,.22,.87,.29,.24,.19,.92),
      X2 = c(.25,.12,.45,.01,.19,.50,.30,.40,.50,.40,.68,.30,.16,.02,.70),
      E1 = c("A","A","A","B","B","B","C","C","C","D","D","D","E","E","E"),
      E2 = c("B","C","D","A","D","E","A","B","E","B","C","E","A","B","C")
    )
    model <- lm(log(X2) ~ log(X1) + E1 + E2, data = df)
    summary(model)

    Call: lm(formula = log(X2) ~ log(X1) + E1 + E2, data = df)

    Residuals:
          1       2       3       4       5       6       7       8
     0.3240  0.2621 -0.5861 -1.0283  0.5861  0.4422  0.3831 -0.2608
          9      10      11      12      13      14      15
    -0.1222  0.9002 -0.5802 -0.3200  0.6452 -0.9634  0.3182

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.54563    1.71558   0.318    0.763
    log(X1)      1.29745    0.57295   2.265    0.073 .
    E1B         -0.23571    0.95738  -0.246    0.815
    E1C         -0.57057    1.20490  -0.474    0.656
    E1D         -0.22988    0.98274  -0.234    0.824
    E1E         -1.17181    1.02918  -1.139    0.306
    E2B         -0.16775    0.87803  -0.191    0.856
    E2C          0.05952    1.12779   0.053    0.960
    E2D          0.43077    1.19485   0.361    0.733
    E2E          0.40633    0.98289   0.413    0.696
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.004 on 5 degrees of freedom
    Multiple R-squared: 0.7622, Adjusted R-squared: 0.3343
    F-statistic: 1.781 on 9 and 5 DF, p-value: 0.2721

What I need to do is force the corresponding environment coefficients to be equal in absolute value but opposite in sign. That is:

    E1B = -E2B
    E1C = -E2C
    E1D = -E2D
    E1E = -E2E

In essence, E1 and E2 are the same variable, but can play two different roles in the model depending on whether it's the first part of the observation or the second part.

I searched the archive, and the closest thing I found to my situation was: http://tolstoy.newcastle.edu.au/R/e4/help/08/03/6773.html But the response to that thread didn't seem to be applicable to my situation. Any pointers would be appreciated.

Thanks, Keith
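One standard way to impose this constraint (an editorial sketch, not a reply from the thread): replace the two factors by signed indicator differences, so a single coefficient enters with +1 when the level appears as E1 and -1 when it appears as E2, exactly matching log(Ei) - log(Ej) above.

    # signed indicators enforce coef(E1 = l) == -coef(E2 = l); baseline "A" is dropped
    for (l in c("B", "C", "D", "E")) {
      df[[paste("D", l, sep = "")]] <- (df$E1 == l) - (df$E2 == l)
    }
    model2 <- lm(log(X2) ~ log(X1) + DB + DC + DD + DE, data = df)
    summary(model2)   # coef of DB estimates log(E_B), entering +1 for E1 and -1 for E2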
Re: [R] Improving data processing efficiency
It's summarized in the last line appended to r-help messages. Note "reproducible" and "minimal".

On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: I did! What did I miss? [rest of the quoted thread snipped]
Re: [R] Improving data processing efficiency
That is the last line of every message to r-help.

On Fri, Jun 6, 2008 at 12:05 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote: It's summarized in the last line appended to r-help messages. Note "reproducible" and "minimal". [rest of the quoted thread snipped]
Re: [R] Store filename
well, where are you getting the filename in the first place? are you looping over a list of filenames that comes from somewhere? generally, for concatenating strings, look at function 'paste':

    write.table(myoutput, paste(myfile, "_out.txt", sep = ""), sep = "\t")

on 06/06/2008 11:51 AM DAVID ARTETA GARCIA said the following: [question snipped; see the original message above]
Re: [R] Store filename
You can write your own function, something about like this:

    read.table2 <- function(file, ...) {
      x <- read.table(file, ...)
      attributes(x)[["file_name"]] <- file
      return(x)
    }

    mydata <- read.table2("Run224_v2_060308.txt", sep = "\t", header = TRUE)
    myfile <- attr(mydata, "file_name")

On Fri, Jun 6, 2008 at 12:51 PM, DAVID ARTETA GARCIA [EMAIL PROTECTED] wrote: [question snipped; see the original message above]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
[R] fit.contrast error
Hello, I am trying to perform a fit.contrast() on an lme object with this code:

    attach(error_DB)
    model_temperature <- lme(Error ~ Temperature, data = error_DB, random = ~1 | ID)
    summary(model_temperature)
    fit.contrast(model_temperature, "Temperature", c(-1, 1), conf.int = 0.95)
    detach(error_DB)

but I got this error:

    Error in `contrasts<-`(`*tmp*`, value = c(-0.5, 0.5)) :
      contrasts apply only to factors

My database is a data frame, very similar to that of the Orthodont data. Could anyone give me some advice on how to solve the problem?

Best, Dani

--
Daniel Valverde Saubí
Grup de Biologia Molecular de Llevats, Facultat de Veterinària de la Universitat Autònoma de Barcelona, Edifici V, Campus UAB, 08193 Cerdanyola del Vallès, SPAIN
Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN)
Grup d'Aplicacions Biomèdiques de la RMN, Facultat de Biociències, Universitat Autònoma de Barcelona, Edifici Cs, Campus UAB, 08193 Cerdanyola del Vallès, SPAIN
+34 93 5814126
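The error text points at a likely cause (an editorial note, not a reply from the thread): contrasts can only be set on factors, so Temperature has probably been read in as numeric. A hedged sketch of the fix:

    library(nlme)
    library(gmodels)
    # assumption: Temperature arrived as numeric; convert it before fitting
    error_DB$Temperature <- factor(error_DB$Temperature)
    model_temperature <- lme(Error ~ Temperature, data = error_DB, random = ~1 | ID)
    fit.contrast(model_temperature, "Temperature", c(-1, 1), conf.int = 0.95)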
Re: [R] where to download BRugs?
Dear NL,

BRugs is available from the CRAN extras repository hosted by Brian Ripley.

    install.packages("BRugs")

should install it as before (for R-2.7.x), if you have not changed the list of default repositories.

Best wishes, Uwe Ligges

Nanye Long wrote: [question snipped; see the original message above]
Re: [R] choosing an appropriate linear model
Perhaps this was too big a question, so I'll ask something shorter: I have fit a linear model, and want to use its prediction intervals to calculate the sum of many individual predictions.

1) Some of the lower prediction intervals are negative, which is non-sensical. Should I just set all negative predictions to zero, or is there another way to require non-negative predictions only?

2) I am interested in the sum of many predictions based on the lm. How can I calculate the 95% prediction interval for the sum? Should I calculate a root mean square of the individual errors, or use a bootstrap method, or something else?

ps. the data is attached to the end of this email.

On Thu, Jun 5, 2008 at 6:25 PM, Levi Waldron [EMAIL PROTECTED] wrote: I am trying to model the observed leaching of wood preservative chemicals from treated wood during an outdoor experiment where leaching is caused by rainfall events. For each rainfall event, the amount of rainfall was recorded as well as the amount of preservative chemical leached. A number of climatic variables were measured, but the most important is the amount of rainfall. I have tried a simple linear model, with zero intercept because zero rainfall cannot cause any leaching (the leachdata dataframe is attached to this email). The diagnostics show clearly non-normally distributed residuals with a simple linear regression, and I am trying to figure out what to do about it (see the attached diagnostics.png). This dataset contains measurements from 57 rainfall events on three replicate samples, for a total of 171 measurements. Part of the problem is that physically, the leaching values can only be positive, so for the smaller rainfall amounts the residuals are all positive. If I allow an intercept then it is significantly positive, possibly since the researcher wouldn't have collected measurements for very small rain events, but in terms of the model it doesn't make sense physically to have a positive intercept, particularly since lab experiments have shown that a certain amount of rain exposure is required to wet the wood before leaching begins. I can get more normally distributed residuals by log-transforming the response, or by using the optimal Box-Cox transformation of lambda = 0.21, which produces nicer-looking residuals but unsatisfactory prediction, which is the main goal of the model (also attached). Any advice on how to create a better predictive model? I presume it has something to do with glm, especially since I have repeated rainfalls on replicate samples, but any advice on the approach to take would be much appreciated. The code I used to produce the attached plots is included below.

    leach.lm <- lm(leachate ~ rainmm - 1, data = leachdata)

    png("diagnostics.png", height = 1200, width = 700)
    par(mfrow = c(3, 2))
    plot(leachate ~ rainmm, data = leachdata, main = "Data and fitted line")
    abline(leach.lm)
    plot(predict(leach.lm) ~ leachdata$leachate,
         main = "predicted vs. observed leaching amount",
         xlim = c(0, 12), ylim = c(0, 12),
         xlab = "observed leaching", ylab = "predicted leaching")
    abline(a = 0, b = 1)
    plot(leach.lm)
    dev.off()

    library(MASS)
    boxcox(leach.lm, plotit = TRUE, lambda = seq(0, 0.4, by = 0.01))

    boxtran <- function(y, lambda, inverse = FALSE) {
      if (inverse) return((lambda * y + 1)^(1/lambda))
      else return((y^lambda - 1)/lambda)
    }

    png("boxcox-diagnostics.png", height = 1200, width = 700)
    par(mfrow = c(3, 2))
    logleach.lm <- lm(boxtran(leachate, 0.21) ~ rainmm - 1, data = leachdata)
    plot(leachate ~ rainmm, data = leachdata, main = "Data and fitted line")
    x <- leachdata$rainmm
    y <- boxtran(predict(logleach.lm), 0.21, TRUE)
    xy <- cbind(x, y)[order(x), ]
    lines(xy)
    plot(y ~ leachdata$leachate, xlim = c(0, 12), ylim = c(0, 12),
         main = "predicted vs. observed leaching amount",
         xlab = "observed leaching", ylab = "predicted leaching")
    abline(a = 0, b = 1)
    plot(logleach.lm)
    dev.off()

    `leachdata` <- structure(list(rainmm = c(19.68, 36.168, 18.632, 2.74, 0.822,
    9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042, 2.74,
    18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754, 12.604,
    2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548, 1.644,
    3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918, 54.2,
    43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124, 10.96,
    7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632, 2.74,
    0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042,
    2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754,
    12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548,
    1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918,
    54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124,
    10.96, 7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632,
    2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288,
    9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714,
    5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21,
    0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274,
    1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13,
    7.124, 10.96, 7.672,
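On question 2, a minimal sketch of one textbook route (an editorial addition, not from the thread; it assumes independent normal errors, which the diagnostics above already call into question): for the no-intercept fit, the prediction variance of the sum of new observations combines the slope uncertainty and one residual variance per observation, Var = (sum of x)^2 * Var(slope) + n * sigma^2.

    # 95% prediction interval for the SUM of new observations from leach.lm,
    # at rainfall values xnew (here reusing the observed values as a stand-in)
    xnew <- leachdata$rainmm
    tot  <- sum(xnew) * coef(leach.lm)        # point prediction of the sum
    s2   <- summary(leach.lm)$sigma^2         # residual variance
    vb   <- vcov(leach.lm)[1, 1]              # variance of the slope estimate
    se   <- sqrt(sum(xnew)^2 * vb + length(xnew) * s2)
    tot + c(-1, 1) * qt(0.975, df.residual(leach.lm)) * se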
[R] reorder breaking by half
Hi, I want to reorder the colors given by rainbow(7) so that the last four move to the front and the first three to the end. For example: ci <- rainbow(7) ; ci [1] "#FF0000FF" "#FFDB00FF" "#49FF00FF" "#00FF92FF" "#0092FFFF" "#4900FFFF" [7] "#FF00DBFF" I would like "#FF0000FF" "#FFDB00FF" "#49FF00FF" to be at the end of ci, and the rest to be at the beginning. How can I do that? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rmeta package: metaplot or forestplot of meta-analysis under DSL (random) model
The package has a plot() method for random-effects meta-analyses as well, either those produced by meta.DSL or meta.summaries. There are examples on the help page for meta.DSL. -thomas On Tue, 27 May 2008, Shi, Jiajun [BSD] - KNP wrote: Dear all, I could not draw a forest plot for meta-analysis under random models using the rmeta package. The rmeta package has a default function for the MH (fixed-effect) model. Has the rmeta package been updated with such a function? Or has someone revised it and kept the code private? I would appreciate it if you could provide some information on this question. Thanks, Andrew This email is intended only for the use of the individua...{{dropped:12}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
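For reference, a minimal sketch along the lines of the examples on the meta.DSL help page (using the cochrane data shipped with rmeta; untested here, so treat the argument order as an assumption to check against the help page):

library(rmeta)
data(cochrane)
## random-effects (DerSimonian-Laird) meta-analysis, then its plot() method
m <- meta.DSL(n.trt, n.ctrl, ev.trt, ev.ctrl, names = name, data = cochrane)
summary(m)
plot(m)   # forest-style plot for the random-effects fit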
[R] Problem with subset
Hi, I am new to R and i am looking for a way to extract a subset from a vector. I have a vector of numbers oscillating around zero (a decreasing autocorrelation function) and i would like to extract only the first positive part of the function (from zero lag to the lag where the function inverts its sign for the first time). I have tried subset(myvector, myvector > 0) but this obviously extracts all the positive intervals, not only the first one. Is there a logical statement i can use in subset? I prefer not to use an if statement that would probably slow down the code. Thanks a lot, Luca * dr. Luca Mortarini [EMAIL PROTECTED] Università del Piemonte Orientale Dipartimento di Scienze e Tecnologie Avanzate __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Manipulating DataSets
On Fri, 6 Jun 2008, Neil Gupta wrote: Hello R-users, I have a very simple problem I wanted to solve. I have a large dataset as such:

Lag X.Symbol Time TickType ReferenceNumber Price Size X.Symbol.1 Time.1 TickType.1 ReferenceNumber.1
1 ES 3:ESZ7.GB 08:30:00 B 74390987 151075 44 3:ESZ7.GB 08:30:00 A 74390988
2 ES 3:YMZ7.EC 08:30:00 B 74390993 13686 17 3:YMZ7.EC 08:30:00 A 74390994
3 YM 3:ESZ7.GB 08:30:00 B 74391135 151075 49 3:ESZ7.GB 08:30:00 A 74391136
4 YM 3:YMZ7.EC 08:30:00 B 74390998 13686 17 3:YMZ7.EC 08:30:00 A 74390999
5 YM 3:ESZ7.GB 08:30:00 B 74391135 151075 49 3:ESZ7.GB 08:30:00 A 74391136
6 YM 3:YMZ7.EC 08:30:00 B 74391000 13686 14 3:YMZ7.EC 08:30:00 A 74391001

Price.1 Size.1 LeadTime MidPoint Spread
1 151100 22 08:30:00 151087.5 25
2 13688 27 08:30:00 13687.0 2
3 151100 22 08:30:00 151087.5 25
4 13688 27 08:30:00 13687.0 2
5 151100 22 08:30:00 151087.5 25
6 13688 27 08:30:00 13687.0 2

All I wanted to do was take the log(MidPoint[2]) - log(MidPoint[1]) for a symbol 3:ESZ7.GB. So the first one would be log(151087.5) - log(151087.5). I wanted to do this throughout the data set and add that in another column. I would appreciate any help. See example(split). Note the ### data frame variation, which should serve as a template for your problem. HTH, Chuck Regards, Neil Gupta [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
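If the goal is only the within-symbol change in log mid-points, ave() is a compact alternative to split() — a sketch, assuming the data frame is called DF and rows are time-ordered within each symbol:

## First row of each symbol gets NA; later rows get log(MidPoint[t]) - log(MidPoint[t-1]).
DF$dlogmid <- ave(DF$MidPoint, DF$X.Symbol.1,
                  FUN = function(x) c(NA, diff(log(x))))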
Re: [R] Improving data processing efficiency
i thought since the function code (which i provided in full) was pretty short, it would be reasonably easy to just read the code and see what it's doing. but ok, so... i am attaching a zip file, with a small sample of the data set (tab delimited), and the function code, in a zip file (posting guidelines claim that some archive formats are allowed, i assume zip is one of them... would appreciate your comments! :) on 06/06/2008 12:05 PM Gabor Grothendieck said the following: Its summarized in the last line to r-help. Note reproducible and minimal. On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: i did! what did i miss? on 06/06/2008 11:45 AM Gabor Grothendieck said the following: Try reading the posting guide before posting. On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following: Hi everyone! I have a question about data processing efficiency. My data are as follows: I have a data set on quarterly institutional ownership of equities; some of them have had recent IPOs, some have not (I have a binary flag set). The total dataset size is 700k+ rows. My goal is this: For every quarter since issue for each IPO, I need to find a matched firm in the same industry, and close in market cap. So, e.g., for firm X, which had an IPO, i need to find a matched non-issuing firm in quarter 1 since IPO, then a (possibly different) non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing firm (there are about 8300 of these). Thus it seems to me that I need to be doing a lot of data selection and subsetting, and looping (yikes!), but the result appears to be highly inefficient and takes ages (well, many hours). What I am doing, in pseudocode, is this: 1. for each quarter of data, getting out all the IPOs and all the eligible non-issuing firms. 2. for each IPO in a quarter, grab all the non-issuers in the same industry, sort them by size, and finally grab a matching firm closest in size (the exact procedure is to grab the closest bigger firm if one exists, and just the biggest available if all are smaller) 3. assign the matched firm-observation the same quarters since issue as the IPO being matched 4. rbind them all into the matching dataset. The function I currently have is pasted below, for your reference. Is there any way to make it produce the same result but much faster? Specifically, I am guessing eliminating some loops would be very good, but I don't see how, since I need to do some fancy footwork for each IPO in each quarter to find the matching firm. I'll be doing a few things similar to this, so it's somewhat important to up the efficiency of this. Maybe some of you R-fu masters can clue me in? :) I would appreciate any help, tips, tricks, tweaks, you name it! 
:) == my function below ===

fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata, quarters_since_issue = 40) {
    result = matrix(nrow = 0, ncol = ncol(tfdata)) # rbind for matrix is cheaper, so typecast the result to matrix
    colnames = names(tfdata)
    quarterends = sort(unique(tfdata$DATE))
    for (aquarter in quarterends) {
        tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
        tfdata_quarter_fitting_nonissuers = tfdata_quarter[
            (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
            (tfdata_quarter$IPO.Flag == 0), ]
        tfdata_quarter_ipoissuers = tfdata_quarter[tfdata_quarter$IPO.Flag == 1, ]
        for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
            arow = tfdata_quarter_ipoissuers[i, ]
            industrypeers = tfdata_quarter_fitting_nonissuers[
                tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
            industrypeers = industrypeers[order(industrypeers$Market.Cap.13f), ]
            if (nrow(industrypeers) > 0) {
                if (nrow(industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ]) > 0) {
                    bestpeer = industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1, ]
                } else {
                    bestpeer = industrypeers[nrow(industrypeers), ]
                }
                bestpeer$Quarters.Since.IPO.Issue = arow$Quarters.Since.IPO.Issue
                # tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
                result = rbind(result, as.matrix(bestpeer))
            }
        }
        # result = rbind(result, tfdata_quarter)
        print(aquarter)
    }
    result = as.data.frame(result)
    names(result) = colnames
    return(result)
}

= end of my function =

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __
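Not from the thread — a sketch of one way to shrink the inner peer search, assuming each industry's non-issuer market caps have been sorted ascending once per quarter (pick.peer is a hypothetical helper):

## Sketch: index of the closest bigger peer in a sorted cap vector,
## falling back to the biggest available, per the matching rule above.
pick.peer <- function(caps.sorted, target) {
    idx <- which(caps.sorted >= target)[1]   # first peer at least as big
    if (is.na(idx)) length(caps.sorted) else idx
}
## e.g. pick.peer(c(1, 5, 9), 4) returns 2; pick.peer(c(1, 5, 9), 20) returns 3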
Re: [R] lsmeans
Dear Dani, I intend at some point to extend the effects package to linear and generalized linear mixed-effects models, probably using lmer() rather than lme(), but as you discovered, it doesn't handle these models now. It wouldn't be hard, however, to do the computations yourself, using the coefficient vector for the fixed effects and a suitably constructed model-matrix to compute the effects; you could also get standard errors by using the covariance matrix for the fixed effects. I hope this helps, John On Fri, 06 Jun 2008 17:05:58 +0200 Dani Valverde [EMAIL PROTECTED] wrote: Hello, I have the next function call: lme(fixed=Error ~ Temperature * Tumour ,random = ~1|ID, data=error_DB) which returns an lme object. I am interested on carrying out some kind of lsmeans on the data returned, but I cannot find any function to do this in R. I'have seen the effect() function, but it does not work with lme objects. Any idea? Best, Dani -- Daniel Valverde Saubí Grup de Biologia Molecular de Llevats Facultat de Veterinària de la Universitat Autònoma de Barcelona Edifici V, Campus UAB 08193 Cerdanyola del Vallès- SPAIN Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN) Grup d'Aplicacions Biomèdiques de la RMN Facultat de Biociències Universitat Autònoma de Barcelona Edifici Cs, Campus UAB 08193 Cerdanyola del Vallès- SPAIN +34 93 5814126 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
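A rough sketch of the computation John describes ('mod' standing in for the lme fit and 'newgrid' for a hypothetical data frame of covariate combinations of interest):

library(nlme)
X   <- model.matrix(~ Temperature * Tumour, data = newgrid)  # fixed-effects design
eff <- drop(X %*% fixef(mod))                                # estimated means
se  <- sqrt(diag(X %*% vcov(mod) %*% t(X)))                  # their standard errors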
Re: [R] Improving data processing efficiency
thanks for the tip! i'll try that and see how big of a difference that makes... if i am not sure what exactly the size will be, am i better off making it larger, and then later stripping off the blank rows, or making it smaller, and appending the missing rows? on 06/06/2008 11:44 AM Patrick Burns said the following: One thing that is likely to speed the code significantly is if you create 'result' to be its final size and then subscript into it. Something like: result[i, ] <- bestpeer (though I'm not sure if 'i' is the proper index). Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) Daniel Folkinshteyn wrote: Anybody have any thoughts on this? Please? :) on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following: [original post and function quoted in full; elided here as they repeat the earlier message verbatim] __ R-help@r-project.org mailing list
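A sketch of Patrick's suggestion applied to the function in the earlier message ('n.max' is a hypothetical upper bound, e.g. the number of IPO rows in tfdata):

result <- as.data.frame(matrix(NA, nrow = n.max, ncol = ncol(tfdata)))
names(result) <- names(tfdata)
k <- 0
## inside the loops, instead of result = rbind(result, as.matrix(bestpeer)):
##     k <- k + 1
##     result[k, ] <- bestpeer
## after the loops, drop the unused rows:
result <- result[seq_len(k), ]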
Re: [R] reorder breaking by half
ci = rainbow(7)[c(4:7, 1:3)] on 06/06/2008 01:02 PM avilella said the following: Hi, I want to reorder the colors given by rainbow(7) so that the last four move to the front and the first three to the end. For example: ci <- rainbow(7) ; ci [1] "#FF0000FF" "#FFDB00FF" "#49FF00FF" "#00FF92FF" "#0092FFFF" "#4900FFFF" [7] "#FF00DBFF" I would like "#FF0000FF" "#FFDB00FF" "#49FF00FF" to be at the end of ci, and the rest to be at the beginning. How can I do that? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
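The same reshuffle without hard-coding the indices (a sketch):

ci   <- rainbow(7)
half <- floor(length(ci) / 2)                    # 3 for length 7
ci   <- c(ci[(half + 1):length(ci)], ci[1:half]) # last four first, first three last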
Re: [R] Improving data processing efficiency
I think the posting guide may not be clear enough and have suggested that it be clarified. Hopefully this better communicates what is required and why in a shorter amount of space: https://stat.ethz.ch/pipermail/r-devel/2008-June/049891.html On Fri, Jun 6, 2008 at 1:25 PM, Daniel Folkinshteyn [EMAIL PROTECTED] wrote: [earlier exchange, original post, and function quoted in full; elided here as they repeat the messages above verbatim]
Re: [R] Improving data processing efficiency
just in case, uploaded it to the server, you can get the zip file i mentioned here: http://astro.temple.edu/~dfolkins/helplistfiles.zip on 06/06/2008 01:25 PM Daniel Folkinshteyn said the following: [earlier exchange, original post, and function quoted in full; elided here as they repeat the messages above verbatim] __ R-help@r-project.org mailing list
Re: [R] How to force two regression coefficients to be equal but opposite in sign?
One simple way is to do something like: fit <- lm(y ~ I(x1 - x2) + x3, data = mydata) The first coefficient (after the intercept) will be the slope for x1; the slope for x2 will be the negative of that. This model is nested in the fuller model with x1 and x2 fit separately, and you can therefore test for differences. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Woolner, Keith Sent: Friday, June 06, 2008 10:07 AM To: r-help@r-project.org Subject: [R] How to force two regression coefficients to be equal but opposite in sign? Is there a way to set up a regression in R that forces two coefficients to be equal but opposite in sign? I'm trying to set up a model where a subject appears in a pair of environments where a measurement X is made. There are a total of 5 environments, one of which is a baseline. But each observation is for a subject in only two of them, and not all subjects will appear in each environment. Each of the environments has an effect on the variable X. I want to measure the relative effects of each environment E on X with a model: Xj = Xi * Ei / Ej Ei of the baseline environment is set equal to 1. With a log transform, a linear-looking regression can be written as: log(Xj) = log(Xi) + log(Ei) - log(Ej) My data look like:

# E1 X1  E2 X2
1 A .20  B .25

What I've tried in R:

env <- c("A", "B", "C", "D", "E")
# Note: data is made up just for this example
df <- data.frame(
    X1 = c(.20,.10,.40,.05,.10,.24,.30,.70,.48,.22,.87,.29,.24,.19,.92),
    X2 = c(.25,.12,.45,.01,.19,.50,.30,.40,.50,.40,.68,.30,.16,.02,.70),
    E1 = c("A","A","A","B","B","B","C","C","C","D","D","D","E","E","E"),
    E2 = c("B","C","D","A","D","E","A","B","E","B","C","E","A","B","C")
)
model <- lm(log(X2) ~ log(X1) + E1 + E2, data = df)
summary(model)

Call: lm(formula = log(X2) ~ log(X1) + E1 + E2, data = df) Residuals: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0.3240 0.2621 -0.5861 -1.0283 0.5861 0.4422 0.3831 -0.2608 -0.1222 0.9002 -0.5802 -0.3200 0.6452 -0.9634 0.3182

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.54563    1.71558   0.318    0.763
log(X1)      1.29745    0.57295   2.265    0.073 .
E1B         -0.23571    0.95738  -0.246    0.815
E1C         -0.57057    1.20490  -0.474    0.656
E1D         -0.22988    0.98274  -0.234    0.824
E1E         -1.17181    1.02918  -1.139    0.306
E2B         -0.16775    0.87803  -0.191    0.856
E2C          0.05952    1.12779   0.053    0.960
E2D          0.43077    1.19485   0.361    0.733
E2E          0.40633    0.98289   0.413    0.696
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.004 on 5 degrees of freedom Multiple R-squared: 0.7622, Adjusted R-squared: 0.3343 F-statistic: 1.781 on 9 and 5 DF, p-value: 0.2721 What I need to do is force the corresponding environment coefficients to be equal in absolute value, but opposite in sign. That is: E1B = -E2B, E1C = -E2C, E1D = -E2D, E1E = -E2E. In essence, E1 and E2 are the same variable, but can play two different roles in the model depending on whether it's the first part of the observation or the second part. I searched the archive, and the closest thing I found to my situation was: http://tolstoy.newcastle.edu.au/R/e4/help/08/03/6773.html But the response to that thread didn't seem to be applicable to my situation. Any pointers would be appreciated. Thanks, Keith [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
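For the factor-valued environments in Keith's example, one way to apply the same idea — a sketch, not from the thread — is a signed incidence matrix, so each non-baseline environment gets a single coefficient that enters with +1 when it is E1 and -1 when it is E2:

lev <- c("B", "C", "D", "E")                          # "A" is the baseline
Z   <- sapply(lev, function(l) (df$E1 == l) - (df$E2 == l))
model2 <- lm(log(X2) ~ log(X1) + Z, data = df)        # one coefficient per environment
summary(model2)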
Re: [R] where to download BRugs?
On Fri, 6 Jun 2008, Nanye Long wrote: Hi all, Does anyone know where to download the BRugs package? I did not find it on the r-project website. Thanks. It is Windows-only, and you download it from 'CRAN (extras)', which is part of the default repository set on Windows versions of R. So install.packages("BRugs") is all that is needed, unless you changed something to stop it working. (It is only available for R >= 2.6.0.) -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improving data processing efficiency
Ok, sorry about the zip, then. :) Thanks for taking the trouble to clue me in as to the best posting procedure! well, here's a dput-ed version of the small data subset you can use for testing. below that, an updated version of the function, with extra explanatory comments, and producing an extra column showing exactly what is matched to what. to test, just run the function, with the dataset as sole argument. Thanks again; i'd appreciate any input on this. === begin dataset dput representation === structure(list(PERMNO = c(10001L, 10001L, 10298L, 10298L, 10484L, 10484L, 10515L, 10515L, 10634L, 10634L, ...), DATE = c(19900331, 19900630, 19900630, 19900331, ...), Shares.Owned = c(50100, 50100, 25, 293500, ...), ... [the remainder of the dput output, and the updated function that followed it, are truncated in the archive]
Re: [R] Subsetting to unique values
The interesting thing about R is that there are several ways to skin the cat; here is yet another solution: do.call(rbind, by(ddTable, ddTable$Id, function(z) z[1, , drop = FALSE])) Id name 1 1 Paul 2 2 Bob On Fri, Jun 6, 2008 at 9:35 AM, Emslie, Paul [Ctr] [EMAIL PROTECTED] wrote: I want to take the first row of each unique Id value from a data frame. For instance ddTable <- data.frame(Id = c(1, 1, 2, 2), name = c("Paul", "Joe", "Bob", "Larry")) I want a dataset that is Id name 1 Paul 2 Bob unique(ddTable) will give me all 4 rows, and unique(ddTable$Id) will give me c(1,2), but not accompanied by the name column. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
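And one more for the collection, assuming (as in the example) that the first occurrence of each Id is the row wanted:

ddTable[!duplicated(ddTable$Id), ]   # keeps the first row per Id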
Re: [R] ggplot questions
Thanx Thierry, Suggestion #1 had no effect. I have been playing with variants on #2 along the way. DaveT. -Original Message- From: ONKELINX, Thierry [mailto:[EMAIL PROTECTED] Sent: June 6, 2008 04:02 AM To: Thompson, David (MNR); hadley wickham Cc: r-help@r-project.org Subject: RE: [R] ggplot questions David, 1. Try scale_x_continuous(lim = c(0, 360)) + scale_y_continuous(lim = c(0, 16)) 2. You could set the colour of the gridlines equal to the backgroup colour with ggopt HTH, Thierry __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improving data processing efficiency
That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from optimal. If you pick the possibly too small route, then increasing the size in largish junks is much better than adding a row at a time. Pat Daniel Folkinshteyn wrote: thanks for the tip! i'll try that and see how big of a difference that makes... if i am not sure what exactly the size will be, am i better off making it larger, and then later stripping off the blank rows, or making it smaller, and appending the missing rows? on 06/06/2008 11:44 AM Patrick Burns said the following: One thing that is likely to speed the code significantly is if you create 'result' to be its final size and then subscript into it. Something like: result[i, ] <- bestpeer (though I'm not sure if 'i' is the proper index). Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) [original post and function quoted in full; elided here as they repeat earlier messages verbatim]
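A sketch of the grow-in-large-chunks variant ('chunk' is a hypothetical chunk size; 'k' counts filled rows as in the preallocation sketch above):

if (k == nrow(result)) {   # out of room: add a chunk of empty rows at once
    extra <- as.data.frame(matrix(NA, nrow = chunk, ncol = ncol(result)))
    names(extra) <- names(result)
    result <- rbind(result, extra)
}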
Re: [R] Problem with subset
On Fri, 6 Jun 2008, Luca Mortarini wrote: [question quoted in full; see the original post above] For vector subsets you probably want "[". Because, from help("["): For ordinary vectors, the result is simply x[subset & !is.na(subset)]. But see ?rle. Something like myvector[ 1 : rle( myvector > 0 )$lengths[ 1 ] ] should work. HTH, Chuck Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
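A quick check of the rle() idea on a made-up oscillating series (a sketch):

acfv <- c(1.0, 0.8, 0.5, 0.2, -0.1, 0.3, -0.2)
acfv[1:rle(acfv > 0)$lengths[1]]   # 1.0 0.8 0.5 0.2 — the first positive run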
Re: [R] ggplot questions
Does the difference have something to do with ggplot() using ranges derived from the data? When I modify my original 'test' dataframe with two extra rows as defined below, I get expected results in both versions. Order shouldn't matter - and if it's making a difference, that's a bug. But I'm still not completely sure what you're expecting. This highlights my next question (warned you ;-) ), I have been unsuccessful in trying to define fixed plotting ranges to generate a 'template' graphic that I may reuse with successive 'overstory plot' data sets. I have used '+ xlim(0, 360) + ylim(0, 16)' but, this seems to not have any effect on the final plot layout. Could you please produce a small reproducible example that demonstrates this? It may well be a bug. Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improving data processing efficiency
Cool, I do have an upper bound, so I'll try it and see how much of a speed boost it gives me. Thanks for the suggestion! on 06/06/2008 02:03 PM Patrick Burns said the following: That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from optimal. If you pick the possibly too small route, then increasing the size in largish junks is much better than adding a row at a time. Pat [rest of the quoted exchange, original post, and function elided; they repeat earlier messages verbatim]
Re: [R] boxplot changes fontsize of labels
Please read the help for par(mfrow)! AFAICS this is nothing to do with boxplot(). In a layout with exactly two rows and columns the base value of 'cex' is reduced by a factor of 0.83; if there are three or more of either rows or columns, the reduction factor is 0.66. See also the 'consider the alternatives' note in that entry. On Fri, 6 Jun 2008, Sebastian Merz wrote: Hi all! So far I have learned some R, but finalizing my plots so they look publishable seems not to be possible. I set up some boxplots. Everything works well, but when I put more than two of them in one plot the labels of the axes appear smaller than the normal font size. x <- rnorm(30) ; y <- rnorm(30) ; par(mfrow=c(1,4)) ; boxplot(x, y, names=c("horray", "hurra")) ; mtext("Jubel", side=1, line=2) In case I take one or two boxplots this does not happen: par(mfrow=c(1,2)) ; boxplot(x, y, names=c("horray", "hurra")) ; mtext("Jubel", side=1, line=2) The cex.axis seems not to be changed, as setting it to 1.0 doesn't change the behaviour. If cex.axis=1.3 in the first example, the font size used by boxplot and by mtext is about the same. But as I use a function to draw quite a few of these plots, this hack is not a proper solution. I couldn't find anything about this behaviour in the documentation or on the net. Can anybody explain? All hints are appreciated. Thanks, S. Merz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
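A sketch of the workaround implied by that help entry — reset 'cex' after mfrow shrinks it (values follow the example in the question):

x <- rnorm(30); y <- rnorm(30)
par(mfrow = c(1, 4))
par(cex = 1)   # undo the 0.66 reduction a 1x4 layout triggers
boxplot(x, y, names = c("horray", "hurra"))
mtext("Jubel", side = 1, line = 2)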
Re: [R] Improving data processing efficiency
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Burns Sent: Friday, June 06, 2008 12:04 PM To: Daniel Folkinshteyn Cc: r-help@r-project.org Subject: Re: [R] Improving data processing efficiency That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from optimal. If you pick the possibly too small route, then increasing the size in largish junks is much better than adding a row at a time. Pat, I am unfamiliar with the use of the word junk as a unit of measure for data objects. I figure there are a few different possibilities: 1. You are using the term intentionally meaning that you suggest he increases the size in terms of old cars and broken pianos rather than used up pens and broken pencils. 2. This was a Freudian slip based on your opinion of some datasets you have seen. 3. Somewhere between your mind and the final product jumps/chunks became junks (possibly a microsoft correction, or just typing too fast combined with number 2). 4. junks is an official measure of data/object size that I need to learn more about (the history of the term possibly being related to 2 and 3 above). Please let it be #4, I would love to be able to tell some clients that I have received a junk of data from them. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Improving data processing efficiency
On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow [EMAIL PROTECTED] wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Burns Sent: Friday, June 06, 2008 12:04 PM To: Daniel Folkinshteyn Cc: r-help@r-project.org Subject: Re: [R] Improving data processing efficiency That is going to be situation dependent, but if you have a reasonable upper bound, then that will be much easier and not far from optimal. If you pick the possibly too small route, then increasing the size in largish junks is much better than adding a row at a time. Pat, I am unfamiliar with the use of the word junk as a unit of measure for data objects. I figure there are a few different possibilities: 1. You are using the term intentionally meaning that you suggest he increases the size in terms of old cars and broken pianos rather than used up pens and broken pencils. 2. This was a Freudian slip based on your opinion of some datasets you have seen. 3. Somewhere between your mind and the final product jumps/chunks became junks (possibly a microsoft correction, or just typing too fast combined with number 2). 4. junks is an official measure of data/object size that I need to learn more about (the history of the term possibly being related to 2 and 3 above). 5. Chinese sailing vessel. http://en.wikipedia.org/wiki/Junk_(ship) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Random Forest
Hello, does there exist a package for multivariate random forests, namely for multivariate response data? It seems to be impossible with the randomForest function, and I did not find any information about this in the help pages... Thank you for your help Bertrand __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] mean
Hi, I have a simple question. If I have a table and I want to compute the mean for each row, how can I do it? E.g.:

  c1 c2 c3 mean
1 12 13 14 ??
2 15 24 10 ??
...

Thanks, Marco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
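One way (a sketch): rowMeans() over the numeric columns.

tab <- data.frame(c1 = c(12, 15), c2 = c(13, 24), c3 = c(14, 10))
tab$mean <- rowMeans(tab[, c("c1", "c2", "c3")])
tab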
Re: [R] Java to R interface
The path to R/bin is in the Windows PATH variable. Yet I get this error. On Jun 6, 10:37 am, Dumblauskas, Jerry [EMAIL PROTECTED] wrote: Try and make sure that R is in your Windows Path variable. I got your message when I first did this, but when I did the above it then worked... [[alternative HTML version deleted]] __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R (D)COM Server not working on windows domain account
I have installed R (D)COM on a (Windows) machine that is part of a Windows domain. If I run the test file in a local (log into this machine) administrative account, it works fine. If I run the test file on a domain account with administrative rights, it will not connect to the server, even if I change the account type from roaming to local. Anyone have any ideas?

Thanks, Gregg
Re: [R] Improving data processing efficiency
My guess is that number 2 is closest to the mark. Typing too fast is unfortunately not one of my habitual attributes.

Gabor Grothendieck wrote:
> [thread snipped; see the messages above]
> 5. Chinese sailing vessel. http://en.wikipedia.org/wiki/Junk_(ship)
[R] R + Linux
Dear all,

I'm planning to install Linux on my computer to run R (I'm bored of W..XP). However, I haven't used Linux before and I would appreciate, if possible, suggestions/comments about which would be the best option to install, say Fedora, Ubuntu or OpenSuse, which to my impression are the most popular ones (at least on the R-help lists). The computer is a PC desktop with 4GB RAM and an Intel Quad-Core Xeon processor and will be used only to run R.

Thanks, Steven
Re: [R] Improving data processing efficiency
-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
Sent: Friday, June 06, 2008 12:33 PM
To: Greg Snow
Cc: Patrick Burns; Daniel Folkinshteyn; r-help@r-project.org
Subject: Re: [R] Improving data processing efficiency

> [earlier thread snipped; see the messages above]
> 5. Chinese sailing vessel. http://en.wikipedia.org/wiki/Junk_(ship)

Thanks for expanding my vocabulary (hmm, how am I going to use that word in context today?). So, if 5 is the case, then Pat's original statement can be reworded as:

"If you pick the possibly too small route, then increasing the size in largish Chinese sailing vessels is much better than adding a row boat at a time."

While that is probably true, I am not sure what that would mean in terms of the original data processing question.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
Re: [R] mean
TABLE <- matrix(data = c(12, 13, 14, 15, 24, 10), byrow = TRUE, nrow = 2, ncol = 3)
TABLE
     [,1] [,2] [,3]
[1,]   12   13   14
[2,]   15   24   10
apply(TABLE, 1, mean)
[1] 13.0 16.3

Chunhao

Quoting Marco Chiapello [EMAIL PROTECTED]:
> [question snipped; see above]
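For completeness: base R also provides a vectorized rowMeans(), which avoids the per-row function calls of apply() and is the idiomatic choice for this task.

TABLE <- matrix(c(12, 13, 14, 15, 24, 10), byrow = TRUE, nrow = 2)
rowMeans(TABLE)                        # 13.00000 16.33333
cbind(TABLE, mean = rowMeans(TABLE))   # append the means as a fourth column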
Re: [R] ggplot questions
OK,

The original ggplot() construct (below) on the following two dataframes (test1, test2) generates different outputs, which I have attached. The output I expect is that shown in test2.png. My expectation is that, having set the plotting limits with 'scale_x_continuous(lim = c(0, 360)) + scale_y_continuous(lim = c(0, 16))', both data sets should produce the same output except for the 'o' at plot center and the 'N' at the top. The only difference between the two dataframes is the inclusion of the first two rows in test2, with the rplt column changed to character:

> test2[1:2, ]
  oplt rplt  az dist
1    0    o   0    0
2    0    N 360   16

Ahhh, wait a second! In composing this message I may have found the problem. It appears that including the 'scale_x_continuous()' component twice in my original version was causing (?) the erratic behaviour. And I have confirmed that the ordering of the layer, scale* and coord* components does not affect the output. However, I'm still getting more x-breaks than requested, with radial lines corresponding to 45, 135, 225, 315 degrees (NE, SE, SW, NW). Still open to suggestions on that.

# new version working with both dataframes
ggplot() +
  coord_polar() +
  layer(data = test1,
        mapping = aes(x = az, y = dist, label = rplt),
        geom = "text") +
  scale_x_continuous(lim = c(0, 360), breaks = c(90, 180, 270, 360),
                     labels = c('E', 'S', 'W', 'N')) +
  scale_y_continuous(lim = c(0, 16), breaks = c(0, 4, 8, 12, 16),
                     labels = c('centre', '4m', '8m', '12m', '16m'))

# original version NOT WORKING with test1
ggplot() +
  coord_polar() +
  scale_x_continuous(lim = c(0, 360)) +
  scale_y_continuous(lim = c(0, 16)) +
  layer(data = test,
        mapping = aes(x = az, y = dist, label = rplt),
        geom = "text") +
  scale_x_continuous(breaks = c(90, 180, 270, 360),
                     labels = c('90', '180', '270', '360'))

# data generating test1.png
test1 <- structure(list(
    oplt = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    rplt = 1:10,
    az = c(57L, 94L, 96L, 152L, 182L, 185L, 227L, 264L, 332L, 354L),
    dist = c(4.09, 2.8, 7.08, 7.09, 3.28, 7.85, 6.12, 1.97, 7.68, 7.9)),
  .Names = c("oplt", "rplt", "az", "dist"),
  row.names = c(NA, 10L), class = "data.frame")

# data generating test2.png
test2 <- structure(list(
    oplt = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
    rplt = c("o", "N", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
    az = c(0, 360, 57, 94, 96, 152, 182, 185, 227, 264, 332, 354),
    dist = c(0, 16, 4.09, 2.8, 7.08, 7.09, 3.28, 7.85, 6.12, 1.97, 7.68, 7.9)),
  .Names = c("oplt", "rplt", "az", "dist"),
  row.names = c(NA, 12L), class = "data.frame")

Many, many thanks for your patience and perseverance on this one Hadley,
DaveT.

-Original Message-
From: hadley wickham [mailto:[EMAIL PROTECTED]
Sent: June 6, 2008 02:06 PM
To: Thompson, David (MNR)
Cc: r-help@r-project.org
Subject: Re: [R] ggplot questions

> Does the difference have something to do with ggplot() using ranges
> derived from the data? When I modify my original 'test' dataframe with
> two extra rows as defined below, I get expected results in both versions.

Order shouldn't matter, and if it's making a difference, that's a bug. But I'm still not completely sure what you're expecting.

> This highlights my next question (warned you ;-) ). I have been
> unsuccessful in trying to define fixed plotting ranges to generate a
> 'template' graphic that I may reuse with successive 'overstory plot'
> data sets. I have used '+ xlim(0, 360) + ylim(0, 16)' but this seems
> to have no effect on the final plot layout.

Could you please produce a small reproducible example that demonstrates this? It may well be a bug.
Hadley
--
http://had.co.nz/

[attachments: test1.png, test2.png]
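On the leftover radial lines at 45, 135, 225 and 315 degrees: those look like minor grid lines rather than extra breaks. In later versions of ggplot2 (an assumption; this may not apply to the release used in this thread) they can be suppressed explicitly:

scale_x_continuous(limits = c(0, 360),
                   breaks = c(90, 180, 270, 360),
                   labels = c('E', 'S', 'W', 'N'),
                   minor_breaks = NULL)   # draw no minor grid lines between breaks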