Re: [R] Optimization under an absolute value constraint

2007-09-07 Thread roger koenker
This should be possible in the lasso2 package.
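
For a quick experiment without lasso2, one heuristic is to fold the
constraint into a reparametrization: any point with |w|+|x|+|y|+|z| = 1
can be written as v / sum(abs(v)) for unconstrained v.  A minimal,
hedged sketch (the objective is the poster's example; several local
minima are possible, so multiple random starts are advisable):

f <- function(p) (2 * p[2] + p[3]) * (p[1] - p[4])   # p = (w, x, y, z)
g <- function(v) f(v / sum(abs(v)))                  # impose the constraint
set.seed(1)
best <- optim(rnorm(4), g)                           # Nelder-Mead search
best$par / sum(abs(best$par))                        # candidate solution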


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Sep 7, 2007, at 1:17 PM, Phil Xiang wrote:

 I need to optimize a multivariate function f(w, x, y, z, ...) under  
 an absolute value constraint. For instance:

 min { (2x+y) (w-z) }

 under the constraint:

 |w| + |x| + |y| + |z| = 1.0.

 Is there any R function that does this? Thank you for your help!


 Phil Xiang



Re: [R] Monotonic interpolation

2007-09-06 Thread roger koenker
You might look at the monotone fitting available in the rqss()
function of the quantreg package.
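
A minimal sketch of that suggestion on simulated data, assuming the
quantreg package: constraint = "I" in qss() imposes monotone
(increasing) fitting, and lambda controls the smoothness.

library(quantreg)
set.seed(1)
x <- sort(runif(100))
y <- pnorm(x, 0.5, 0.2) + rnorm(100, sd = 0.02)  # noisy distribution function
fit <- rqss(y ~ qss(x, constraint = "I", lambda = 0.1))
plot(x, y)
lines(x, predict(fit, newdata = data.frame(x = x)), col = 2)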

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Sep 6, 2007, at 10:03 AM, excalibur wrote:




 On Thu, Sep 6, at 09:45, excalibur wrote:


 Hello everybody, has anyone got a function for smooth monotonic
 interpolation
 (splines ...) of a univariate function (like a distribution
 function for
 example) ?

 approxfun() might be what you're looking for.

 Is the result of approxfun() inevitably monotonic ?



Re: [R] piecewise linear approximation

2007-08-30 Thread roger koenker
If you want to minimize absolute error for this, then you can
try the rqss fitting in the quantreg package and tune lambda
to get one break in the fitted function.
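
A rough sketch of that tuning idea on simulated data (assuming
quantreg; lambda = 10 is just a starting value -- increase it until
only one break survives):

library(quantreg)
set.seed(2)
x <- 1:100
y <- c(0.5 * x[1:60], 30 + 2 * (x[61:100] - 60)) + rnorm(100)
fit <- rqss(y ~ qss(x, lambda = 10))   # larger lambda, fewer breaks
plot(x, y)
lines(x, predict(fit, newdata = data.frame(x = x)), col = 2)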


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Aug 29, 2007, at 8:05 PM, Achim Zeileis wrote:

 On Wed, 29 Aug 2007, Naxerova, Kamila wrote:

 Dear list,

 I have a series of data points which I want to approximate with  
 exactly two
 linear functions. I would like to choose the intervals so that the  
 total
 deviation from my fitted lines is minimal. How do I best do this?

 From the information you give it seems that you want to partition  
 a model
 like
lm(y ~ x)
 along a certain ordering of the observations. Without any further
 restrictions you can do that with the function breakpoints() in  
 package
 strucchange. If there are continuity restrictions or something like
 that, you want to look at the segmented package.

 hth,
 Z

 Thanks!
 Kamila


 The information transmitted in this electronic communication... 
 {{dropped}}



Re: [R] quantile(table)?

2007-08-28 Thread roger koenker
You could use:

require(quantreg)
  rq(index ~ 1, weights=count, tau=0:5/5)
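
A self-contained version with the poster's numbers (interior taus
only; the answers may differ slightly from quantile()'s default
interpolation scheme):

require(quantreg)
index <- c(-7, 1, 2, 7, 11)
count <- c(32, 9382, 2192, 190, 201)
coef(rq(index ~ 1, weights = count, tau = c(.25, .5, .75)))  # no expansion
quantile(rep(index, count), c(.25, .5, .75))                 # wasteful check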

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Aug 28, 2007, at 9:22 AM, Seung Jun wrote:

 Hi,

 I have data in the following form:

   index  count
      -7     32
       1   9382
       2   2192
       7    190
      11    201

 I'd like to get quantiles from the data.  I thought about something  
 like this:

   index <- c(-7, 1, 2, 7, 11)
   count <- c(32, 9382, 2192, 190, 201)
   quantile(rep(index, count))

 It answers correctly, but I feel it's wasteful especially when count
 is generally large.  So, my question is, is there a way to get
 quantiles directly from this table (without coding at a low level)?

 Thanks,
 Seung



Re: [R] perception of graphical data

2007-08-24 Thread roger koenker
You might want to look at the cartogram literature.  See e.g.

http://www-personal.umich.edu/~mejn/election/

I don't know of an R implementation of this sort of thing, but
perhaps others can correct me.

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Aug 24, 2007, at 12:30 PM, Yeh, Richard C wrote:

 Hello,

 I apologize that this is off-topic.  I am seeking information on
 perception of graphical data, in an effort to improve the plots I
 produce.  Would anyone point me to literature reviews in this  
 area?  (Or
 keywords to try on google?)  Is this located somewhere near cognitive
 science, psychology, human factors research?

 For example, some specific questions I have are:

 I recall as a child when I first saw a map where the areas of the
 containers (geographical states) were drawn as rectangles,  
 proportional
 to a quantity other than land area.  Does anyone know of an algorithm
 for drawing such maps?  Would anyone know of a journal or reference
 where I can find studies on whether subjects reading these maps can
 accurately assess the meaning of the different areas, as [some of us]
 can assess different heights on a bar graph?  (What about areas in bar
 graphs with non-uniform widths?)

 Scatter plots of microarray data often attempt to represent  
 thousands or
 tens of thousands of points, but all I read from them are density and
 distribution --- the gene names cannot be shown.  At what point,  
 would a
 sunflowerplot-like display or a smooth gradient be better?  When two
 data points drawn as 50% gray disks are small and tangent, are they
 perceptually equivalent to a single, 100% black disk?  Or a 50% gray
 disk with twice the area?  What problems are known about plotting with
 disks --- do viewers use the area or the diameter (or neither) to  
 gauge
 weight?


 As you can tell, I'm a non-expert, mixing issues of data  
 interpretation,
 visual perception, graphic representation.  Previously, I didn't have
 the flexibility of R's graphics, so I didn't need to think so much.
 I've read some of Edward S. Tufte's books, but found them more
 qualitative than quantitative.

 Thanks!

 Richard

 212-933-3305 / [EMAIL PROTECTED]



Re: [R] (Most efficient) way to make random sequences of random sequences

2007-08-21 Thread roger koenker
One way:

N <- 10
s <- c(apply(matrix(rep(1:3, N), 3, N), 2, sample))
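
An equivalent formulation is c(replicate(N, sample(3))): each call to
sample(3) yields one random permutation of 1:3, so no sub-sequence can
repeat a number.  A quick check:

N <- 4
set.seed(3)
s <- c(replicate(N, sample(3)))
matrix(s, nrow = 3)   # each column is one permutation of 1:3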


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Aug 21, 2007, at 3:49 PM, Emmanuel Levy wrote:

 Hi,

 I was wondering what would be the (most efficient) way to generate
 a sequence of sequences, I mean:

 if I have 1,2 and 3.

 I'd like to generate a sequence of length N*3 (N ~ 1,000,000 or more)

 Where random permutations of the sequence 1,2,3 follow each other.

 i.e. 1,2,3, 1,3,2, 3,2,1

 /!\ The thing is that there should never be twice the same number
 in the same sub-sequence, meaning that this is different from
 generating a vector with the numbers 1, 2 and 3 randomly distributed.

 Any suggestion very welcome! Thanks,

 Emmanuel



Re: [R] image plot with multiple x values

2007-08-17 Thread roger koenker
If you are willing to go to the bother of representing your data
as a sparse matrix, the package SparseM has a version of image()
that will do what you would like to do, I believe.


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Aug 17, 2007, at 1:51 PM, baptiste Auguié wrote:

 Hi,

 New to R, I don't find a way to plot the following data with image():

 x is a N * M matrix
 y is a vector of length M
 z is a N*M matrix

 I wish to plot z as a greyscale image, but my x axis is different for
 every row of the z data.

 Here is a minimal example,

 theta <- c(3:6)       # N
 y <- c(1:5)           # M

 x <- theta %*% t(y)   # N * M
 z <- sin(x)           # N * M

 image(z)

 This doesn't give what I want, as the x axis needs to be shifted as
 we go from one line to the next (probably clearer if you plot
 matplot(x,z): the curves are shifted).

 The way I see it, I need either to construct a bigger matrix with all
 possible values of x giving the new dimension and arbitrary values
 for the missing points, or find a plotting function that would plot
 lines by lines. The ordering of the x and z values is giving me a
 headache on the first idea, and I can't find any option / alternative
 to image.

 Thanks in advance!

 baptiste



Re: [R] smoothing function for proportions

2007-08-10 Thread roger koenker
It is not entirely clear what you are using for y values in  
smooth.spline,
but it would appear that it is just the point estimates.  I would  
suggest
using instead -- at each x value -- a few equally spaced quantiles of
the estimated proportions.  Implicitly, smooth.spline expects to be  
fitting
a mean curve to data that has constant variance, so you might also
consider reweighting to approximate this, as well.
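
A hedged sketch of the reweighting idea on simulated data: since
var(phat) is roughly p(1-p)/n, weighting each point by n/(p(1-p))
approximates the constant-variance setting that smooth.spline
implicitly assumes.

set.seed(4)
n <- round(exp(seq(log(500), log(5), length = 50)))   # trials decline in x
x <- seq_along(n)
p <- pmin(pmax(rbinom(50, n, 0.3) / n, 0.01), 0.99)   # estimated proportions
fit <- smooth.spline(x, p, w = n / (p * (1 - p)))     # inverse-variance weights
plot(x, p)
lines(fit, col = 2)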


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Aug 10, 2007, at 10:23 AM, Rose Hoberman wrote:

 Sorry, forgot to attach the graph.

 On 8/10/07, Rose Hoberman [EMAIL PROTECTED] wrote:
 I am looking for a function that can fit a smooth function to a  
 vector
 of estimated proportions, such that the smoothed value is within
 specified confidence bounds of each proportion.  In other words,  
 given
 a small number of trials and large confidence intervals, I would
 prefer the function to vary smoothly, but given a large number of
 trials and small confidence intervals, I would prefer the function to
 lie within the confidence intervals, even if it is not smooth.

 I have attached a postscript file illustrating a data set I would  
 like
 to smooth.  As the figure shows, for large values of x, I have few
 data points, and so the ML estimate of the proportion varies widely,
 and the confidence intervals are very large.  When I use the
 smooth.spline function with a large value of spar (the red line), the
 function is not as smooth as desired for large values of x.  When I
 use a smaller value of spar (the green line), the function fails to
 stay within the confidence bounds of the proportions.   Is there a
 smoothing function for which I can specify upper and lower limits for
 the y value for specific values of x?

 Thanks for any suggestions,

 Rose

 smoothProportions.ps


Re: [R] Predict using SparseM.slm

2007-08-01 Thread roger koenker
If you are feeling altruistic you could write a predict method for
slm objects; it wouldn't be much work to adapt what is already
available and follow the predict.lm prototype.  On the other
hand if you are looking for something quick and dirty you can
always resort to

newX %*% coef(slmobj)
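
A toy illustration of that route (all names illustrative; note that
newX must include the intercept column):

library(SparseM)
set.seed(5)
train <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
slmobj <- slm(y ~ x1 + x2, data = train)
newX <- cbind(1, x1 = rnorm(5), x2 = rnorm(5))   # intercept included
drop(newX %*% coef(slmobj))                      # quick-and-dirty predictions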


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Aug 1, 2007, at 4:42 PM, T. Balachander wrote:

 Hi,

 I am trying out the SparseM package and had a
 question. The following piece of code works fine:

 ...
 fit = slm(model, data = trainData, weights = weight)

 ...

 But how do I use the fit object to predict the values
 on say a reserved testDataSet? In the regular lm
 function I would do something like this:

 predict.lm(fit,testDataSet)

 Thanks
 -Bala






Re: [R] plotting a summary.rq object in using pkg quantreg

2007-07-24 Thread roger koenker
Package questions to package maintainers, please.

The short answer is that your alpha = .4 parameter needs to
be passed to summary, not to plot.  Try this:

 plot(summary(rq(foodexp ~ income, tau = 1:49/50, data = engel), alpha = .4),
      nrow = 1, ncol = 2, ols = TRUE)

A longer answer would involve a boring disquisition about various  
fitting methods
and standard error estimation methods and their historical evolution  
and defaults.
(By default rank-based confidence bands are being used for the engel  
data since
the sample size is relatively small.)

Regarding your more fundamental question:  you can always modify   
functions
such as summary.rq or plot.summary.rqs  -- see for example ?fix.




url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Jul 24, 2007, at 11:07 AM, Jeff G. wrote:

 Hello,

 I am having problems adjusting the plot output from the quantreg
 package.  Anyone know what I'm doing wrong?

 For example (borrowing from the help files):

 plot(summary(rq(foodexp~income,tau = 1:49/50,data=engel)), nrow=1,
 ncol=2,alpha = .4, ols = TRUE, xlab=test)

 The alpha= parameter seems to have no effect on my output, even  
 when I
 set it to a ridiculous value like 0.4.  Also, though in the help  
 file it
 says "..." are optional arguments to plot, xlab (as an example)
 seems
 to do nothing.  If the answer is that I should extract the values I  
 need
 and construct the plot I want independently of the rq.process object,
 that is okay I suppose, if inefficient.  Maybe a more fundamental
 question is how do I get in and see how plot is working in this  
 case so
 that I can modify.

 Thanks much!

 J

 P.S.  I've explored using plot.summary.rqs but the problems seem to be
 the same.



Re: [R] quantreg behavior changes for N > 1000

2007-07-24 Thread roger koenker
When in doubt:  RTFM --  Quoting from ?summary.rq

se: specifies the method used to compute standard
    errors.  There are currently five available methods:

    1.  "rank" which produces confidence intervals for the
        estimated parameters by inverting a rank test as
        described in Koenker (1994).  The default option
        assumes that the errors are iid, while the option iid =
        FALSE implements the proposal of Koenker and Machado
        (1999).  This is the default method unless the sample
        size exceeds 1001, or cov = FALSE, in which case se =
        "nid" is used.
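
So one way to make the two fits below comparable is to request the
same method explicitly on both sides of the threshold, e.g. (using the
testx, testy defined in the quoted message):

s1000 <- summary(rq(testy[1:1000] ~ testx[1:1000], tau = 0.02), se = "rank")
s1001 <- summary(rq(testy[1:1001] ~ testx[1:1001], tau = 0.02), se = "rank")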

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Jul 24, 2007, at 12:57 PM, Jeff G. wrote:

 Hello again R-experts and novices (like me),

 This seems like a bug to me - or maybe it's intentional...can anyone
 confirm?  Up to 1000 reps, summary() of a rq object gives different
 output and subtly different confidence interval estimates.

 Thanks, Jeff

 testx=runif(1200)
 testy=rnorm(1200, 5)

 test.rq=summary(rq(testy[1:1000]~testx[1:1000], tau=2:98/100))
 test.rq[[1]]
 Gives this output:
 Call: rq(formula = testy[1:1000] ~ testx[1:1000], tau = 2:98/100)

 tau: [1] 0.02

 Coefficients:
                coefficients  lower bd  upper bd
 (Intercept)         3.00026   2.45142   3.17098
 testx[1:1000]      -0.00870  -0.39817   0.49946

 test.rq=summary(rq(testy[1:1001]~testx[1:1001], tau=2:98/100))
 test.rq[[1]]

 Gives this (different) output:
 Call: rq(formula = testy[1:1001] ~ testx[1:1001], tau = 2:98/100)

 tau: [1] 0.02

 Coefficients:
                Value     Std. Error  t value    Pr(>|t|)
 (Intercept)    3.00026   0.21605     13.88658   0.0
 testx[1:1001] -0.00870   0.32976     -0.02638   0.97896


 plot(test.rq, nrow=2, ncol=2) # The slope estimates appear to be the
 same but there are subtle differences in the confidence intervals,  
 which
 shouldn't be due simply to the inclusion of one more point.



Re: [R] crimtab related question

2007-07-24 Thread roger koenker
While on the subject of mechanical methods of statistical research  I  
can't
resist quoting Doob's (1997) Statistical Science interview:

 My system, complicated by my inaccurate typing, led to retyping  
 material over and over, and for some time I had an electric drill  
 on my desk, provided with an eraser bit which I used to erase  
 typing. I rarely used the system of brushing white fluid over a  
 typed error because I was not patient enough to let the fluid dry  
 before retyping. Long after my first book was done I discovered the  
 tape rolls which cover lines of type. As I typed and retyped my  
 work it became so repugnant to me that I had more and more  
 difficulty even to look at it to check it. This fact accounts for  
 many slips that a careful reading would have discovered. I commonly  
 used a stochastic system of checking, picking a page and then a  
 place on the page at random and reading a few sentences, in order  
 to avoid reading it in context and thereby to avoid reading what  
 was in my mind rather than what I had written. At first I would  
 catch something at almost every trial, and I would continue until  
 several trials would yield nothing. I have tried this system on  
 other authors, betting for example that I would find something to  
 correct on a randomly chosen printed page of text, and  
 nonmathematicians suffering under the delusion that mathematics is
 errorless would be surprised at how many bets I have won.

The relevance to the present inquiry is confirmed by the misspelling  
of Dennison in the Annals reference
quoted below.  See, for example:

http://www.amazon.com/Avery-Dennison-Metal-Rim-Tags/dp/B000AN376G

On the substance of Jean's question, Mark's interpretation seems very  
plausible.

Thanks to Jean and to Martin Maechler for adding this dataset to R.


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Jul 24, 2007, at 4:42 PM, Mark Difford wrote:


 Hi Jean,

 You haven't yet had a reply from an authoritative source, so here
 is my tuppence worth on part of your enquiry.

 It's almost certain that the receiving box is a receptacle into  
 which tags
 were placed after they had been drawn and the inscribed measurement  
 noted
 down.  Measurements on three tags were unwittingly not noted before  
 the tags
 were transferred to the receiving box.  They lay there with a good  
 many
 other tags, so the inscribed measurement/tag couldn't be recovered.

 I hope this clarifies some points.

 Regards,
 Mark.


 Jean lobry wrote:

 Dear all,

 the dataset documented under ?crimtab was also used in:

 @article{TreloarAE1934,
  title = {The adequacy of {S}tudent's criterion of
   deviations in small sample means},
  author = {Treloar, A.E. and Wilder, M.A.},
  journal = {The Annals of Mathematical Statistics},
  volume = {5},
  pages = {324-341},
  year = {1934}
 }

 The following is from page 335 of the above paper:

 From the table provided by MacDonell (1902) on
 the associated variation of stature (to the nearest inch)
 and length of the left middle finger (to the nearest
 millimeter) in 3000 British criminals, the measurements
 were transferred to 3000 numbered Denison metal-rim
 tags from which the cords had been removed. After
 thorough checking and mixing of these circular disks,
 samples of 5 tags each were drawn at random until the
 supply was exhausted. Unfortunately, three of these
 samples were erroneously returned to a receiving box
 before being copied, and the records of 597 samples only
 are available.

 Could someone give me a clue about the kind of device
 that was used here? Is it a kind of lottery machine?
 I don't understand why three samples were lost. What
 is this receiving box?

 Thanks for any hint,

 Best,
 -- 
 Jean R. Lobry([EMAIL PROTECTED])
 Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
 43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
 allo  : +33 472 43 27 56 fax: +33 472 43 13 88
 http://pbil.univ-lyon1.fr/members/lobry/





Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread roger koenker
An important potential benefit of R solutions shared by awk, sed, ...
is that they provide a reproducible way to document exactly how one got
from one version of the data to the next.  This seems to be the main
problem with handicraft methods like editing Excel files: it is too
easy to introduce new errors that can't be tracked down at later
stages of the analysis.


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Jun 10, 2007, at 4:14 PM, (Ted Harding) wrote:

 On 10-Jun-07 19:27:50, Stephen Tucker wrote:

 Since R is supposed to be a complete programming language,
 I wonder why these tools couldn't be implemented in R
 (unless speed is the issue). Of course, it's a naive desire
 to have a single language that does everything, but it seems
 that R currently has most of the functions necessary to do
 the type of data cleaning described.

 In principle that is certainly true. A couple of comments,
 though.

 1. R's rich data structures are likely to be superfluous.
Mostly, at the sanitisation stage, one is working with
flat files (row  column). This straightforward format
is often easier to handle using simple programs for the
 kind of basic filtering needed, rather than getting into
the heavier programming constructs of R.

 2. As follow-on and contrast at the same time, very often
what should be a nice flat file with no rough edges is not.
If there are variable numbers of fields per line, R will
not handle it straightforwardly (you can force it in,
but it's more elaborate). There are related issues as well.

 a) If someone entering data into an Excel table lets their
cursor wander outside the row/col range of the table,
this can cause invisible entities to be planted in the
extraneous cells. When saved as a CSV, this file then
has variable numbers of fields per line, and possibly
also extra lines with arbitrary blank fields.

 cat datafile.csv | awk 'BEGIN{FS=","}{n=NF; print n}'

will give you the numbers of fields in each line.

If you further pipe it into | sort -nu you will get
the distinct field-numbers. If you know (by now) how many
fields there should be (e.g. 10), then

 cat datafile.csv | awk 'BEGIN{FS=","} (NF != 10){print NR, NF}'

will tell you which lines have the wrong number of fields,
and how many fields they have. You can similarly count how
many lines there are (e.g. pipe into wc -l).

 b) People sometimes randomly use a blank space or a "." in a
    cell to denote a missing value. Consistent use of either
    is OK: ",," in a CSV will be treated as NA by R. The use
    of "." can be more problematic. If for instance you try to
    read the following CSV into R as a dataframe:

    1,2,.,4
    2,.,4,5
    3,4,.,6

    the "." in cols 2 and 3 is treated as the character ".",
    with the result that something complicated happens to
    the typing of the items.

    typeof(D[i,j]) is always "integer". sum(D[1,1]) = 1, but
    sum(D[1,2]) gives a type-error, even though the entry
    is in fact 2. And so on, in various combinations.

    And as.matrix(D) is of course a matrix of characters.

In fact, columns 2 and 3 of D are treated as factors!

    for(i in (1:3)){ for(j in (1:4)){ print(D[i,j]) }}
[1] 1
[1] 2
Levels: . 2 4
[1] .
Levels: . 4
[1] 4
[1] 2
[1] .
Levels: . 2 4
[1] 4
Levels: . 4
[1] 5
[1] 3
[1] 4
Levels: . 2 4
[1] .
Levels: . 4
[1] 6

This is getting altogether too complicated for the job
one wants to do!

    And it gets worse when people mix ",," and ",.,"!

On the other hand, a simple brush with awk (or sed in
this case) can sort it once and for all, without waking
the sleeping dogs in R.
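
 For comparison, hedged R equivalents of the two fixes above:
 count.fields() for the field counts in (a), and na.strings for the
 mixed missing-value codes in (b); "datafile.csv" is the same
 illustrative file name.

 nf <- count.fields("datafile.csv", sep = ",")
 table(nf)         # distinct field counts and their frequencies
 which(nf != 10)   # lines with the wrong number of fields
 D <- read.csv("datafile.csv", header = FALSE, na.strings = c("", "."))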

 I could go on. R undoubtedly has the power, but it can very
 quickly get over-complicated for simple jobs.

 Best wishes to all,
 Ted.

 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861
 Date: 10-Jun-07   Time: 22:14:35
 -- XFMail --



Re: [R] Metropolis-Hastings Markov Chain Monte Carlo in Spatstat

2007-06-06 Thread roger koenker
Take a look at:  http://sepwww.stanford.edu/software/ratfor.html
and in particular the link there to the original paper by Brian
Kernighan describing ratfor; it is only 14 pages, but it is a model
of clarity of exposition and design.

I wouldn't worry too much about the makefile  -- it probably
knows exactly what to do with ratfor provided you have the
ratfor preprocessor available from the above link, and the rest
of the tools to build from source.

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Jun 6, 2007, at 4:42 PM, Kevin C Packard wrote:

 I'm testing some different formulations of pairwise interaction  
 point processes
 in Spatstat (version 1.11-6) using R 2.5.0 on a Windows platform  
 and I wish to
 simulate them using the Metropolis-Hastings algorithm implemented  
 with Spatstat.
 Spatstat utilizes Fortran77 code with the preprocessor RatFor to do  
 the
 Metropolis-Hastings MCMC, but the Makefile is more complicated than  
 any I have
 worked with.
 Any suggestions on how I could get started working with the Fortran  
 code in
 conjunction with RatFor is appreciated.

 Sincerely,
 Kevin

 Kevin Packard
 Department of Forestry, PhD student
 Department of Statistics, MS student
 Virginia Polytechnic Institute and State University
 Blacksburg, Virginia, USA



Re: [R] How to use density function to find h_{k}

2007-06-03 Thread Roger Koenker
You might try:  http://www.stanford.edu/~kasparr/software/silverman.r

But  take a look at the referenced paper by Silverman first.  You could 
also try the CRAN package ftnonpar by Kovac and Davies.
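
The key ingredient -- counting the modes of a density estimate at a
given bandwidth -- is only a few lines, which reduces the critical
bandwidth search to a one-dimensional search.  A sketch (a mode here
is a strict local maximum of the estimate on its grid):

nmodes <- function(x, bw) {
  d <- density(x, bw = bw)$y
  sum(diff(sign(diff(d))) == -2)   # strict local maxima
}
set.seed(7)
x <- c(rnorm(100), rnorm(100, 4))
sapply(c(0.1, 0.5, 2), nmodes, x = x)   # mode count falls as bw grows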


url:   www.econ.uiuc.edu/~roger/my.html    Roger Koenker
email: [EMAIL PROTECTED]                   Department of Economics
vox:   217-333-4558                        University of Illinois
fax:   217-244-6678                        Champaign, IL 61820

On Sun, 3 Jun 2007, Patrick Wang wrote:

 Hi, All:

 How can I use the density function to find the minimum bandwidth that
 makes the density function have one mode, 2 modes, 3 modes, etc.?
 Usually the larger the bandwidth, the fewer modes the density has
 (less bumpy).

 It would be impossible to try all possible bandwidths and then plot the
 pdf to see how many modes it has. Is there an automatic way to do this,
 like a for loop over 1000 bandwidths in (0, 1)? Is there a function to
 get the number of modes from the density function? The Mode function in
 R does not seem to serve this purpose.


 Thanks
 pat



Re: [R] Smoothing a path in 2D

2007-05-30 Thread roger koenker
You might have a look at the fda package of Ramsay on CRAN.
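
Without fda, a quick sketch of the same idea is to smooth each
coordinate separately as a function of the time index and recombine:

set.seed(8)
t <- 1:50
x <- cos(t / 8) + rnorm(50, sd = 0.05)   # noisy path coordinates
y <- sin(t / 8) + rnorm(50, sd = 0.05)
tt <- seq(1, 50, length = 500)
xs <- predict(smooth.spline(t, x), tt)$y
ys <- predict(smooth.spline(t, y), tt)$y
plot(x, y)
lines(xs, ys, col = 2)   # smooth path through the points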


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On May 30, 2007, at 9:42 AM, Dieter Vanderelst wrote:

 Hello,

 I'm currently trying to find a method to interpolate or smooth data  
 that
 represent a trajectory in space.

 For example, I have an ordered (=time) set of (x,y) tuples which
 constitute a path in a 2D space.

 Is there a way using R to interpolate between these points in a way
 similar to spline interpolation so that I get a smooth path in space?

 Greetings,
 Dieter

 -- 
 Dieter Vanderelst
 [EMAIL PROTECTED]
 Department of Industrial Design
 Designed Intelligence



Re: [R] nlme fixed effects specification

2007-05-09 Thread roger koenker
Just to provide some closure on this thread, let me add two comments:

1.  Doug's version of my sweep function:

diffid1 <-
function(h, id) {
  id <- as.factor(id)[, drop = TRUE]
  apply(as.matrix(h), 2, function(x) x - tapply(x, id, mean)[id])
}

is far more elegant than my original, and works perfectly, but

2.  I should have mentioned that the proposed strategy gets the
coefficient estimates right; however, their standard errors need a
degrees of freedom correction, which in the present instance
is non-negligible -- sqrt(98/89) -- since the lm() step doesn't
know that we have already estimated the fixed effects with the
sweep operation.
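
In code, the correction amounts to rescaling the reported standard
errors; with n = 100 observations, 10 swept-out group means, and one
slope, lm() reports 98 residual degrees of freedom where 89 are
appropriate.  A sketch, using diffid1 from above on the simulated
data of this thread:

set.seed(1)
fe <- as.factor(as.integer(runif(100) * 10))
y <- rnorm(100); x <- rnorm(100)
fit <- summary(lm(diffid1(y, fe) ~ diffid1(x, fe)))
coef(fit)[, "Std. Error"] * sqrt(98 / 89)   # corrected standard errors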

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On May 5, 2007, at 7:16 PM, Douglas Bates wrote:

 On 5/5/07, roger koenker [EMAIL PROTECTED] wrote:

 On May 5, 2007, at 3:14 PM, Douglas Bates wrote:
 
  As Roger indicated in another reply you should be able to obtain  
 the
  results you want by sweeping out the means of the groups from  
 both x
  and y.  However, I tried Roger's function and a modified version  
 that
  I wrote and could not show this.  I'm not sure what I am doing  
 wrong.

 Doug,  Isn't it just that you are generating a  balanced factor and
 Ivo is
 generating an unbalanced one -- he wrote:

  fe = as.factor( as.integer( runif(100)*10 ) );

 the coefficient on x is the same

 or, aarrgh,  is it that you don't like the s.e. being wrong.   I
 didn't notice
 this at first.  But it shouldn't happen.  I'll have to take another
 look at  this.

 No, my mistake was much dumber than that.  I was comparing the wrong
 coefficient.  For some reason I was comparing the coefficient for x in
 the second fit to the Intercept from the first fit.

 I'm glad that it really is working and, yes, you are right, the
 degrees of freedom are wrong in the second fit because the effect of
 those 10 degrees of freedom are removed from the data before the model
 is fit.


  I enclose a transcript that shows that I can reproduce the  
 result from
  Roger's function but it doesn't do what either of us think it  
 should.
  BTW, I realize that the estimate for the Intercept should be  
 zero in
  this case.
 
 
 
  now, with a few IQ points more, I would have looked at the lme
  function instead of the nlme function in library(nlme).[then
  again, I could understand stats a lot better with a few more IQ
  points.]  I am reading the lme description now, but I still don't
  understand how to specify that I want to have dummies in my
  specification, plus the x variable, and that's it.  I think I  
 am not
  understanding the integration of fixed and random effects in  
 the same
  R functions.
 
  thanks for pointing me at your lme4 library.  on linux, version
  2.5.0, I did
R CMD INSTALL matrix*.tar.gz
R CMD INSTALL lme4*.tar.gz
  and it installed painlessly.  (I guess R install packages don't  
 have
  knowledge of what they rely on;  lme4 requires matrix, which  
 the docs
  state, but having gotten this wrong, I didn't get an error.  no  
 big
  deal.  I guess I am too used to automatic resolution of  
 dependencies
  from linux installers these days that I did not expect this.)
 
  I now tried your specification:
 
   library(lme4)
  Loading required package: Matrix
  Loading required package: lattice
   lmer(y~x+(1|fe))
  Linear mixed-effects model fit by REML
  Formula: y ~ x + (1 | fe)
    AIC  BIC  logLik  MLdeviance  REMLdeviance
    282  290    -138         270           276
  Random effects:
    Groups   Name         Variance        Std.Dev.
    fe       (Intercept)  0.0445          0.211
    Residual              0.889548532468  0.9431588
  number of obs: 100, groups: fe, 10
 
  Fixed effects:
               Estimate Std. Error t value
   (Intercept)  -0.0188     0.0943  -0.199
   x             0.0528     0.0904   0.585
 
  Correlation of Fixed Effects:
(Intr)
  x -0.022
  Warning messages:
  1: Estimated variance for factor 'fe' is effectively zero
   in: `LMEoptimize<-`(`*tmp*`, value = list(maxIter = 200L,
  tolerance = 0.000149011611938477,
  2: $ operator not defined for this S4 class, returning NULL in:
  x$symbolic.cor
 
  Without being a statistician, I can still determine that this  
 is not
  the model I would like to work with.  The coefficient is  
 0.0528, not
  0.0232.  (I am also not sure why I am getting these warning  
 messages
  on my system, either, but I don't think it matters.)
 
  is there a simple way to get the equivalent specification for my
  smple
  model, using lmer or lme, which does not choke on huge data sets?
 
  regards,
 
  /ivo
 
  Ivo_Rout.txt

Re: [R] nlme fixed effects specification

2007-05-05 Thread roger koenker
Ivo,

I don't know whether you ever got a proper answer to this question.
Here is a kludgy one --  someone else can probably provide
a more elegant version of my diffid function.

What you want to do is sweep out the mean deviations from both y
and x based on the factor fe and then estimate the simple y on x  
linear model.

I have an old function that was originally designed to do panel data
models that looks like this:

diffid <- function(h, id)
{
    if(is.vector(h))
        h <- matrix(h, ncol = 1)
    Ph <- unique(id)
    Ph <- cbind(Ph, table(id))
    for(i in 1:ncol(h))
        Ph <- cbind(Ph, tapply(h[, i], id, mean))  # group means, by column
    is <- tapply(id, id)
    Ph <- Ph[is, -(1:2)]   # expand means to one row per observation
    h - Ph                 # return the within-group deviations
}

With this  you can do:

set.seed(1);
fe = as.factor( as.integer( runif(100)*10 ) ); y=rnorm(100); x=rnorm(100);
summary(lm(diffid(y,fe) ~ diffid(x,fe)))

HTH,

Roger


On May 4, 2007, at 3:08 PM, ivo welch wrote:

 hi doug:  yikes.  could I have done better?  Oh dear.  I tried to make
 my example clearer half-way through, but made it worse.  I meant

 set.seed(1);
 fe = as.factor( as.integer( runif(100)*10 ) ); y=rnorm(100); x=rnorm(100);
 print(summary(lm( y ~ x + fe)))
   deleted
 Coefficients:
              Estimate Std. Error t value Pr(>|t|)
 (Intercept)   0.1128     0.3680    0.31     0.76
 x             0.0232     0.0960    0.24     0.81
 fe1          -0.6628     0.5467   -1.21     0.23
   deleted more fe's
 Residual standard error: 0.949 on 89 degrees of freedom
 Multiple R-Squared: 0.0838, Adjusted R-squared: -0.0192
 F-statistic: 0.814 on 10 and 89 DF,  p-value: 0.616

 I really am interested only in this linear specification, the
 coefficient on x (0.0232) and the R^2 of 8.38% (adjusted -1.92%).  If
 I did not have so much data in my real application, I would never have
 to look at nlme or nlme4.  I really only want to be able to run this
 specification through lm with far more observations (100,000) and
 groups (10,000), and be done with my problem.

 now, with a few IQ points more, I would have looked at the lme
 function instead of the nlme function in library(nlme).[then
 again, I could understand stats a lot better with a few more IQ
 points.]  I am reading the lme description now, but I still don't
 understand how to specify that I want to have dummies in my
 specification, plus the x variable, and that's it.  I think I am not
 understanding the integration of fixed and random effects in the same
 R functions.

 thanks for pointing me at your lme4 library.  on linux, version  
 2.5.0, I did
   R CMD INSTALL matrix*.tar.gz
   R CMD INSTALL lme4*.tar.gz
 and it installed painlessly.  (I guess R install packages don't have
 knowledge of what they rely on;  lme4 requires matrix, which the docs
 state, but having gotten this wrong, I didn't get an error.  no big
 deal.  I guess I am too used to automatic resolution of dependencies
 from linux installers these days that I did not expect this.)

 I now tried your specification:

 library(lme4)
 Loading required package: Matrix
 Loading required package: lattice
 lmer(y~x+(1|fe))
 Linear mixed-effects model fit by REML
 Formula: y ~ x + (1 | fe)
   AIC  BIC  logLik  MLdeviance  REMLdeviance
   282  290    -138         270           276
 Random effects:
  Groups   Name         Variance        Std.Dev.
  fe       (Intercept)  0.0445          0.211
  Residual              0.889548532468  0.9431588
 number of obs: 100, groups: fe, 10

 Fixed effects:
             Estimate Std. Error t value
 (Intercept)  -0.0188     0.0943  -0.199
 x             0.0528     0.0904   0.585

 Correlation of Fixed Effects:
   (Intr)
 x -0.022
 Warning messages:
 1: Estimated variance for factor 'fe' is effectively zero
  in: `LMEoptimize<-`(`*tmp*`, value = list(maxIter = 200L, tolerance =
 0.000149011611938477,
 2: $ operator not defined for this S4 class, returning NULL in:
 x$symbolic.cor

 Without being a statistician, I can still determine that this is not
 the model I would like to work with.  The coefficient is 0.0528, not
 0.0232.  (I am also not sure why I am getting these warning messages
 on my system, either, but I don't think it matters.)

 is there a simple way to get the equivalent specification for my smple
 model, using lmer or lme, which does not choke on huge data sets?

 regards,

 /ivo



Re: [R] Freeman-Tukey arcsine transformation

2007-03-13 Thread roger koenker
As a further footnote on this, I can't resist mentioning a letter  
that appears
in Technometrics (1977) by Steve  Portnoy who notes that

2 arcsin(sqrt(p)) = arcsin(2p - 1) + pi/2

and asks: it would be of historical interest to know if any early  
statisticians
were aware of this, and if so, why the former version was  
preferred.  The
latter version seems more convenient since it obviously obviates the  
need
for special tables that appear in many places.



url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Mar 13, 2007, at 1:48 PM, Sebastian P. Luque wrote:

 On Tue, 13 Mar 2007 14:15:16 -0400,
 Bos, Roger [EMAIL PROTECTED] wrote:

 I'm curious what this transformation does, but I am not curious  
 enough
 to pay $14 to find out.  Someone once told me that the arcsine was a
 good way to transform data and make it more 'normal'.  I am  
 wondering if
 this is an improved method.  Anyone know of a free reference?

 My Zar¹ says this is just:


 p' = 1/2 * (asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1))))


 so solving for x should give the back-transformation.  It is  
 recommended
 when the proportions that need to be disciplined are very close  
 to the
 ends of the range (0, 1; 0, 100).
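
 As an R one-liner (the function name ft is illustrative), with x
 successes in n trials:

 ft <- function(x, n)
   0.5 * (asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1))))
 ft(3, 10)   # e.g. 3 successes in 10 trials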


 + *Footnotes* +
 ¹ @BOOK{149,
   title = {Biostatistical analysis},
   publisher = {Prentice-Hall, Inc.},
   year = {1996},
   author = {Zar, J. H.},
   address = {Upper Saddle River, New Jersey},
   key = {149},
 }


 -- 
 Seb



Re: [R] Linear programming with sparse matrix input format?

2007-03-05 Thread roger koenker
If you can reformulate your LP as an L1 problem, which is known to be
possible without loss of generality, but perhaps not without loss of  
sleep,
then you could use the sparse quantile regression functions in the
quantreg package.


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Mar 5, 2007, at 5:30 PM, Talbot Katz wrote:

 Hi.

 I am aware of three different R packages for linear programming: glpk,
 linprog, lpSolve.  From what I can tell, if there are N variables  
 and M
 constraints, all these solvers require the full NxM constraint  
 matrix.  Some
 linear solvers I know of (not in R) have a sparse matrix input  
 format.  Are
 there any linear solvers in R that have a sparse matrix input format?
 (including the possibility of glpk, linprog, and lpSolve, in case I  
 might
 have missed something in the documentation).  Thanks!

 --  TMK  --
 212-460-5430  home
 917-656-5351  cell



Re: [R] tournaments to dendrograms

2007-03-05 Thread roger koenker
I've had no response to the enquiry below, so I made a rather half-baked
version in grid  --  code and pdf are available here:

http://www.econ.uiuc.edu/~roger/research/ncaa

comments would be welcome.   This is _the_  ubiquitous graphic this  
time of
year in the US, so R should take a shot at it.  My first attempt is  
rather primitive
but I have to say that Paul's grid package is  superb.

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Feb 22, 2007, at 4:08 PM, roger koenker wrote:

 Does anyone have (good) experience converting tables of tournament
 results into dendrogram-like graphics?  Tables, for example, like  
 this:

 read.table(url("http://www.econ.uiuc.edu/~roger/research/ncaa/NCAA.d"))

 Any pointers appreciated.   RK

 url:   www.econ.uiuc.edu/~roger    Roger Koenker
 email: [EMAIL PROTECTED]           Department of Economics
 vox:   217-333-4558                University of Illinois
 fax:   217-244-6678                Champaign, IL 61820



Re: [R] Packages in R for least median squares regression and computing outliers (thompson tau technique etc.)

2007-02-28 Thread roger koenker
It's not often one gets to correct Gabor, but no,

least median of squares is not the same as least absolute error
regression.

Take a look at the package robust if you want lms.
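
For reference, lqs() in MASS also fits least median of squares
directly; a small sketch with a contaminated sample:

library(MASS)
set.seed(9)
x <- rnorm(50); y <- 2 * x + rnorm(50)
y[1:5] <- y[1:5] + 10                 # five gross outliers
coef(lqs(y ~ x, method = "lms"))      # the LMS fit is barely affected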

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Feb 28, 2007, at 1:24 PM, Gabor Grothendieck wrote:

 Try rq in quantreg using the default value for tau.

 On 2/28/07, lalitha viswanath [EMAIL PROTECTED] wrote:
 Hi
 I am looking for suitable packages in R that do
 regression analyses using least median squares method
 (or better). Additionally, I am also looking for
 packages that implement algorithms/methods for
 detecting outliers that can be discarded before doing
 the regression analyses.

 Although some websites refer to lms method under
 package lps in R, I am unable to find such a package
 on CRAN.

 I would greatly appreciate any pointers to suitable
 functions/packages for doing the above analyses.

 Thanks
 Lalitha





Re: [R] loop issues (r.squared)

2007-02-08 Thread roger koenker
both Matrix and SparseM have formats of this type.

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Feb 8, 2007, at 4:45 PM, andy1983 wrote:


 That was a neat trick. However, it created a new problem.

 Before, it took way too long for 10,000 columns to finish.

 Now, I test the memory limit. With 10,000 columns, I use up about  
 1.5 GBs.

 Assuming memory is not the issue, I still end up with a huge matrix  
 that is
 difficult to export. Is there a way to convert it to 3 columns (1  
 for row, 1
 for column, 1 for value)?

 Thanks.



 Greg Snow wrote:

 The most straight forward way that I can think of is just:

 cor(my.mat)^2 # assuming my.mat is the matrix with your data in the
 columns

 That will give you all the R^2 values for regressing 1 column on 1
 column (it is called R-squared for a reason).


 I would like to compare every column in my matrix with every
 other column and get the r-squared. I have been using the
 following formula and loops:
 summary(lm(matrix[,x]~matrix[,y]))$r.squared
 where x and y are the looping column numbers

 If I have 100 columns (10,000 iterations), the loops give me
 results in a reasonable time.
 If I try 10,000 columns, the loops take forever even if there
 is no formula inside. I am guessing I can vectorize my code
 so that I could eliminate one or both loops. Unfortunately, I
 can't figure out how to.





Re: [R] heteroscedasticity problem

2007-02-07 Thread roger koenker
If you haven't already you might want to take a look at:

http://www.econ.uiuc.edu/~roger/research/rq/QReco.pdf

which is written by and for ecologists.
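
In that spirit, a small simulated sketch: fitting several conditional
quantiles with rq() displays both the trend and the changing spread
directly (all numbers here are illustrative):

library(quantreg)
set.seed(11)
x <- runif(200, 1, 10)
y <- 1 + 2 * x - 0.1 * x^2 + rnorm(200, sd = 2 / x)   # spread falls in x
fit <- rq(y ~ x + I(x^2), tau = c(0.1, 0.5, 0.9))
xs <- seq(1, 10, length = 100)
plot(x, y)
matlines(xs, cbind(1, xs, xs^2) %*% coef(fit), lty = 1)   # quantile curves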


url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Feb 7, 2007, at 2:52 PM, [EMAIL PROTECTED] wrote:






 Dear Listers,

 I have a regression problem (x-y) with biological data, where x  
 influences
 y in two ways, (1) y increases with x and (2) the variation around  
 the mean
 (residuals) decreases with increasing x, i.e. y becomes more  
 'predictable'
 as x increases.
 The relationship is saturating; y ~ a + bx + cx^2 gives a very good
 fit.

 I know basically how to test for heteroscedasticity. My question is if
 there is an elegant regression method which captures both the mean and
 the (non-constant) variation around the mean. Such a method would  
 ideally
 yield an estimate of the mean and its variation, both as a function  
 of x.

 The pattern corresponds very well to some established ecological  
 theory
 (each x is the species richness of a community of primary  
 producers, y is
 the productivity of each community; productivity and its  
 predictability
 both increase with increasing species richness).

 Apologies for the probably clumsy description of my problem - I am an
 ecologist, not a statistician (but a big fan of R).

 Cheers,
 Robert


 Robert Ptacnik
 Norwegian Institute for Water Research (NIVA)
 Gaustadalléen 21
 NO-0349 Oslo
  FON +47 982 277 81
 FAX +47 221 852 00



Re: [R] memory-efficient column aggregation of a sparse matrix

2007-02-01 Thread roger koenker
Doug is right, I think, that this would be easier with full indexing
using the matrix.coo class, if you want to use SparseM.  But
then the tapply seems to be the way to go.

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820


On Feb 1, 2007, at 7:22 AM, Douglas Bates wrote:

 On 1/31/07, Jon Stearley [EMAIL PROTECTED] wrote:
 I need to sum the columns of a sparse matrix according to a factor -
 ie given a sparse matrix X and a factor fac of length ncol(X), sum
 the elements by column factors and return the sparse matrix Y of size
 nrow(X) by nlevels(f).  The appended code does the job, but is
 unacceptably memory-bound because tapply() uses a non-sparse
 representation.  Can anyone suggest a more memory and cpu efficient
 approach?  Eg, a sparse matrix tapply method?  Thanks.

 This is the sort of operation that is much more easily performed in
 the triplet representation of a sparse matrix where each nonzero
 element is represented by its row index, column index and value.
 Using that representation you could map the column indices according
 to the factor then convert back to one of the other representations.
 The only question would be what to do about nonzeros in different
 columns of the original matrix that get mapped to the same element in
 the result.  It turns out that in the sparse matrix code used by the
 Matrix package the triplet representation allows for duplicate index
 positions with the convention that the resulting value at a position
 is the sum of the values of any triplets with that index pair.

 If you decide to use this approach please be aware that the indices
 for the triplet representation in the Matrix package are 0-based (as
 in C code) not 1-based (as in R code).  (I imagine that Martin is
 thinking we really should change that as he reads this part.)


 --
 +--+
 | Jon Stearley  (505) 845-7571  (FAX 844-9297) |
 | Sandia National Laboratories  Scalable Systems Integration   |
 +--+


 # x and y are of SparseM class matrix.csr
 aggregate.csr <-
 function(x, fac) {
  # make a vector indicating the row of each nonzero
  rows <- integer(length=length(x@ra))
  rows[x@ia[1:nrow(x)]] <- 1 # put a 1 at start of each row
  rows <- as.integer(cumsum(rows)) # and finish with a cumsum

  # make a vector indicating the column factor of each nonzero
  f <- fac[x@ja]

  # aggregate by row,f
  y <- tapply(x@ra, list(rows,f), sum)

  # sparsify it
  y[is.na(y)] <- 0  # change tapply NAs to as.matrix.csr 0s
  y <- as.matrix.csr(y)

  y
 }


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SparseM and Stepwise Problem

2007-01-30 Thread roger koenker
One simple possibility -- if you can generate the X matrix in dense
form -- is
the coercion

X <- as.matrix.csr(X)

Unfortunately, there is no current way to go from a formula to a  
sparse X
matrix  without  passing through a dense version of X first.   
Otherwise you
need to use new() to define the X matrix directly.  This is usually  
not that
difficult, but it depends on the model.



url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Jan 30, 2007, at 5:31 PM, [EMAIL PROTECTED] wrote:

 I'm trying to use stepAIC on sparse matrices, and I need some help.
 The documentation for slm.fit suggests:
 slm.fit and slm.wfit call slm.fit.csr to do Cholesky decomposition  
 and then
 backsolve to obtain the least squares estimated coefficients. These  
 functions can be
 called directly if the user is willing to specify the design matrix  
 in matrix.csr form.
 This is often advantageous in large problems to reduce memory  
 requirements.
 I need some help or a reference that will show how to create the  
 design matrix from
 data in matrix.csr form.
 Thanks for any help.


 -- 
 David Katz
  www.davidkatzconsulting.com
541 482-1137

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inverse function of ecdf

2007-01-28 Thread roger koenker
quantile() does some somewhat exotic interpolation --- if you are  
wanting to
match moments you need to be more explicit about how you are computing
moments for the two approaches...

On Jan 28, 2007, at 5:06 PM, Geoffrey Zhu wrote:

 Hi Benilton,

 I tried this. It sort of works, but the results are not very
 satisfactory: the 3rd and higher moments differ from those of the
 original by a large amount. Do you have any better way to do this?

 Thanks,
 Geoffrey

 -Original Message-
 From: Benilton Carvalho [mailto:[EMAIL PROTECTED]
 Sent: Sunday, January 28, 2007 4:45 PM
 To: Geoffrey Zhu
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Inverse function of ecdf

 ?quantile

 b

 On Jan 28, 2007, at 5:41 PM, Geoffrey Zhu wrote:

 Hi Everyone,

 I want to generate some random numbers according to some empirical
 distribution. Therefore I am looking for the inverse of an empirical
 cumulative distribution function. I haven't found any in R. Can  
 anyone

 give a pointer?

 Thanks,
 Geoffrey




 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 2 problems with latex.table (quantreg package) - reproducible

2007-01-10 Thread roger koenker
The usual R-help etiquette recommends:

1.  questions about packages go to the maintainer, not to R-help.

2.  examples should be reproducible:  ie self contained.

if you look carefully at the function latex.summary.rqs  you will see
that there is a failure to pass the argument ... on to  
latex.table.  This
_may_ be the source of your problem if in fact your v1 and v2 were
summary.rqs objects, but I doubt that they are.

You might try caption = "".  More generally there are much improved
latex tools elsewhere in R; if you aren't making tables that are  
specific
to quantreg, you might want to use them.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Jan 10, 2007, at 12:23 PM, Kati Schweitzer wrote:

 Dear all,

 When using latex.table from the quantreg package, I don't seem to  
 be able to set
 table.env=FALSE: when I don't specify caption (as I think I should,  
 when
 understanding the R help rightly(?)), I get an error message, and  
 when I
 do so, of course I get one, as well.
 The funny thing is that a table is indeed produced in the first case,
 so I get a nice tabular, but as I'm using the command within a for-
 loop, the loop stops due to the error and only one latex table is
 produced.

 Example R-Code:

 library(quantreg)

 v1 <- c("val1","val1","val2")
 v2 <- c("val1","val2","val2")
 tab <- table(v1,v2)

 latex.table(tab,table.env=FALSE)
 #error - german R error message (saying that caption  is missing and
 has no default :-) ):
 #Fehler in cat(caption, "\n", file = fi, append = TRUE) :
 #   Argument caption fehlt (ohne Standardwert)

 latex.table(tab,table.env=FALSE,caption="nothing")
 #error - german R error message:
 #Fehler in latex.table(tab, table.env = FALSE, caption = "nothing") :
 #   you must have table.env=TRUE if caption is given


 The second problem is, that - when using latex.table to produce a
 tabular within a table environment - I would like to specify cgroup
 with only one value - one multicolumn being a heading for both columns
 in the table.
 But I'm not able to produce latex-compilable code:

 latex.table(tab,cgroup="v2",caption="my table")

 gives me the following latex code:
 \begin{table}[hptb]
 \begin{center}
 \begin{tabular}{|l||c|c|} \hline
 \multicolumn{1}{|l||}{\bf tab}&\multicolumn{}{c||}{}\multicolumn{2}{c|}{\bf v2}\\ \cline{2-3}
 \multicolumn{1}{|l||}{}&\multicolumn{1}{c|}{val1}&\multicolumn{1}{c|}{val2}\\
 \hline
 val1&1&1\\
 val2&0&1\\
 \hline
 \end{tabular}
 \vspace{3mm}
 \caption{my table\label{tab}}
 \end{center}
 \end{table}

 and within this code the problem is the second multicolumn
 (\multicolumn{}{c||}{}), as it has no number specifying how many
 columns the multicolumn should cover. Latex (at least my version)
 complains.
 When deleting this part of the code, the table is compiled and looks
 exactly how I want it to look. I'm doing this with a system call and
 an shell script right now, but this seems pretty ugly to me...

 When I specify 2 columns, this problem doesn't occur:
 latex.table(tab,cgroup=c("blah","v2"),caption="my table")

 I'm running R Version 2.3.0 (2006-04-24) on a linux machine Fedora
 Core 5 (i386).

 Can anyone help me find my mistakes?

 Thanks a lot
 ... and sorry for my bad English and potential newbie mistakes!!
 Kati

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] package dependency tree

2007-01-02 Thread roger koenker
Is there a painless way to find the names of all packages on CRAN
that Depend on a specified package?


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] matrix size

2007-01-01 Thread roger koenker

On Jan 1, 2007, at 4:43 PM, Armelini, Guillermo wrote:

 Hello everyone
 Could anybody tell me how to set the following matrix?

 n2 <- matrix(nrow=10185,ncol=10185,seq(0,0,length=103734225))

You can use:

library(SparseM)
as.matrix.coo(0,10185,10185)

but then you need to find something interesting to do with such a
boring matrix...
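
For what it's worth, the memory arithmetic (a sketch, using the same
coercion as above):

library(SparseM)
A <- as.matrix.coo(0,10185,10185)
object.size(A)      # a few hundred bytes for the all-zero sparse matrix
10185^2 * 8 / 1024  # ~810423 Kb, exactly the dense allocation that failed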



 R answer was
 Error: cannot allocate vector of size 810423 Kb

 Is there any solution? I tried to increase the memory size but it
 didn't work.
 G




 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RuleFit quantreg: partial dependence plots; showing an effect

2006-12-20 Thread roger koenker
They are entirely different:  Rulefit is a fiendishly clever  
combination of decision tree  formulation
of models and L1-regularization intended to select parsimonious fits  
to very complicated
responses yielding e.g. piecewise constant functions.  Rulefit   
estimates the  conditional
mean of the response over the covariate space, but permits a very  
flexible, but linear-in-parameters,
specification of the covariate effects on the conditional  
mean.  The quantile
regression plotting you refer to adopts a fixed, linear specification  
for conditional quantile
functions and given that specification depicts how the covariates  
influence the various
conditional quantiles of the response.   Thus, roughly speaking,  
Rulefit is focused on
flexibility in the x-space, maintaining the classical conditional  
mean objective; while
QR is trying to be more flexible in the y-direction, and maintaining  
a fixed, linear
in parameters specification for the covariate effects at each quantile.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Dec 20, 2006, at 4:17 AM, Mark Difford wrote:

 Dear List,

 I would greatly appreciate help on the following matter:

 The RuleFit program of Professor Friedman uses partial dependence  
 plots
 to explore the effect of an explanatory variable on the response
 variable, after accounting for the average effects of the other
 variables.  The plot method [plot(summary(rq(y ~ x1 + x2,
 t=seq(.1,.9,.05))))] of Professor Koenker's quantreg program  
 appears to
 do the same thing.


 Question:
 Is there a difference between these two types of plot in the manner  
 in which they depict the relationship between explanatory variables  
 and the response variable?

 Thank you in advance for your help.

 Regards,
 Mark Difford.

 -
 Mark Difford
 Ph.D. candidate, Botany Department,
 Nelson Mandela Metropolitan University,
 Port Elizabeth, SA.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RuleFit quantreg: partial dependence plots; showing an effect

2006-12-20 Thread roger koenker


On Dec 20, 2006, at 8:43 AM, Ravi Varadhan wrote:

 Dear Roger,

 Is it possible to combine the two ideas that you mentioned: (1)  
 algorithmic
 approaches of Breiman, Friedman, and others that achieve  
 flexibility in the
 predictor space, and (2) robust and flexible regression like QR  
 that achieve
 flexibility in the response space, so as to achieve complete  
 flexibility?
 If it is possible, are you or anyone else in the R community  
 working on
 this?


There are some tentative steps in this direction.  One is the rqss()  
fitting
in my quantreg package which does QR fitting with additive models
using total variation as a roughness penalty for nonlinear terms.
Another, along more tree structured lines, is Nicolai Meinshausen's
quantregforest package.
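
A minimal rqss() call looks like this (a sketch, with simulated data):

library(quantreg)
set.seed(1)
x <- sort(runif(200, 0, 10))
y <- sin(x) + x/4 + rnorm(200, sd = 0.3)
f <- rqss(y ~ qss(x, lambda = 2), tau = 0.5)  # TV-penalized median fit
plot(f)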

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of roger koenker
 Sent: Wednesday, December 20, 2006 8:57 AM
 To: Mark Difford
 Cc: R-help list
 Subject: Re: [R] RuleFit  quantreg: partial dependence plots;  
 showing an
 effect

 They are entirely different:  Rulefit is a fiendishly clever
 combination of decision tree  formulation
 of models and L1-regularization intended to select parsimonious fits
 to very complicated
 responses yielding e.g. piecewise constant functions.  Rulefit
 estimates the  conditional
 mean of the response over the covariate space, but permits a very
 flexible, but linear in
 parameters specifications of the covariate effects on the conditional
 mean.  The quantile
 regression plotting you refer to adopts a fixed, linear specification
 for conditional quantile
 functions and given that specification depicts how the covariates
 influence the various
 conditional quantiles of the response.   Thus, roughly speaking,
 Rulefit is focused on
 flexibility in the x-space, maintaining the classical conditional
 mean objective; while
 QR is trying to be more flexible in the y-direction, and maintaining
 a fixed, linear
 in parameters specification for the covariate effects at each  
 quantile.


 url:www.econ.uiuc.edu/~rogerRoger Koenker
 email[EMAIL PROTECTED]Department of Economics
 vox: 217-333-4558University of Illinois
 fax:   217-244-6678Champaign, IL 61820


 On Dec 20, 2006, at 4:17 AM, Mark Difford wrote:

 Dear List,

 I would greatly appreciate help on the following matter:

 The RuleFit program of Professor Friedman uses partial dependence
 plots
 to explore the effect of an explanatory variable on the response
 variable, after accounting for the average effects of the other
 variables.  The plot method [plot(summary(rq(y ~ x1 + x2,
 t=seq(.1,.9,.05))))] of Professor Koenker's quantreg program
 appears to
 do the same thing.


 Question:
 Is there a difference between these two types of plot in the manner
 in which they depict the relationship between explanatory variables
 and the response variable?

 Thank you in advance for your help.

 Regards,
 Mark Difford.

 -
 Mark Difford
 Ph.D. candidate, Botany Department,
 Nelson Mandela Metropolitan University,
 Port Elizabeth, SA.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nonlinear quantile regression

2006-12-02 Thread roger koenker
This isn't a nonlinear QR problem.  You can write:

f <- rq(y ~ log(x),  data=Dat, tau=0.25)

which corresponds to the model

Q_y (.25|x)  =  a log(x) + b

note the sign convention on b.
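
For instance, with simulated data (illustration only; true a = 2, b = 1):

set.seed(1)
Dat <- data.frame(x = runif(200, 1, 10))
Dat$y <- 2*log(Dat$x) - 1 + rnorm(200)
f <- rq(y ~ log(x), data = Dat, tau = 0.25)
coef(f)  # slope estimates a; intercept estimates -b plus the .25 error quantile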

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Dec 2, 2006, at 1:47 PM, Ricardo Bessa wrote:

 Hello, I'm having a problem using nonlinear quantile regression, the
 function nlrq.
 I want to do a quantile regression of a nonlinear function of the form
 a*log(x)-b; the coefficients "a" and "b" are my objective. I tried to  
 use the
 command:

 funx <- function(x,a,b){
 res <- a*log(x)-b
 res
 }

 Dat.nlrq <- nlrq(y ~ funx(x, a, b), data=Dat, tau=0.25, trace=TRUE)

 But I can't solve the problem. How do I put the formula "y ~ funx(x,a,b)"?

 _
 MSN Busca: fácil, rápido, direto ao ponto.  http://search.msn.com.br

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] scanning a pdf scan

2006-10-27 Thread roger koenker
I have a pdf scan of several pages of data from a quite famous old  
paper by
C.S. Peirce (1873).  I would like (what else?) to convert it into an  
R dataframe.
Somewhat to my surprise the pdf seems to already be in a character  
recognized
form, since I can search for numerical strings and they are nicely  
found.  Of
course, as is usual with such tables there are also headings and  
column lines, etc
etc. that are less interesting than the numbers themselves.  I've  
tried saving the
pdf in various formats, some of which look vaguely tractable, but I'm  
hoping
that there is something that is more automatic.

Does anyone have experience that they could share toward this objective?


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scanning a pdf scan

2006-10-27 Thread roger koenker
Thanks for your suggestions.  Trial and error experimentation
with adobe acrobat produced the following method:

It looks like it is possible to highlight the numerical part of the
table in Acrobat and then copy/paste into a text file, with about
98 percent accuracy.  Wonders never cease.
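
From there, read.table() on the pasted text finishes the job (a sketch,
with made-up values):

txt <- c("1  0.48  0.12",
         "2  0.51  0.08")
read.table(textConnection(txt), col.names = c("obs", "v1", "v2"))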


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Oct 27, 2006, at 11:52 AM, Gabor Grothendieck wrote:

 I don't have specific experience with this but strapply
 of package gsubfn can extract information from a string by content
 as opposed to delimiters. e.g.

 library(gsubfn)
 strapply("abc34def56xyz", "[0-9]+", c)[[1]]
 [1] "34" "56"

 On 10/27/06, roger koenker [EMAIL PROTECTED] wrote:
 I have a pdf scan of several pages of data from a quite famous old
 paper by
 C.S. Peirce (1873).  I would like (what else?) to convert it into an
 R dataframe.
 Somewhat to my surprise the pdf seems to already be in a character
 recognized
 form, since I can search for numerical strings and they are nicely
 found.  Of
 course, as is usual with such tables there are also headings and
 column lines, etc
 etc. that are less interesting than the numbers themselves.  I've
 tried saving the
 pdf in various formats, some of which look vaguely tractable, but I'm
 hoping
 that there is something that is more automatic.

 Does anyone have experience that they could share toward this  
 objective?


 url:www.econ.uiuc.edu/~rogerRoger Koenker
 email[EMAIL PROTECTED]Department of Economics
 vox: 217-333-4558University of Illinois
 fax:   217-244-6678Champaign, IL 61820

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantile regression questions

2006-10-26 Thread roger koenker
Brian,

It is hard to say at this level of resolution of the question, but it  
would seem that you might
be able to start by considering each sample vector as a repeated  
measurement of the
fiber length -- so 12 obs in the first 1/16th bin, 235 in the next  
and so forth, all associated
with some vector of covariates representing location, variety, etc,  
then the conventional
quantile regression would serve to estimate a conditional quantile  
function for fiber length
for each possible covariate setting --- obviously this would require  
some model for the
way that the covariate effects fit together, linearity,  possible  
interactions, etc etc, and it
would also presume that it made sense to treat the vector of  
responses as independent
measurements.  Building in possible dependence involves some new  
challenges, but
there is some recent experience with inferential methods for  
microarrays that have
incorporated these effects.
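
Concretely, the expansion might look like this (a sketch; the first two
counts are from your message, everything else is invented):

counts <- c(12, 235, 355)        # fibers per 1/16 inch bin
len <- seq_along(counts) / 16    # a nominal length for each bin
y <- rep(len, counts)            # one pseudo-observation per fiber
quantile(y, c(.1, .5, .9))       # rq() would add the covariates to this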

I'd be happy to hear more about the data and possible models, but  
this should be
routed privately since the topic is rather too specialized for R-help.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Oct 26, 2006, at 7:20 AM, Brian Gardunia wrote:

 I am relatively new to R, but am intrigued by its flexibility.  I  
 am interested in quantile regression and quantile estimation as  
 regards to cotton fiber length distributions.  The length  
 distribution affects spinning and weaving properties, so it is  
 desirable to select for certain distribution types.  The AFIS fiber  
 testing machinery outputs a vector for each sample of type c(12,  
 235, 355, . . . n) with the number of fibers in n=40 1/16 inch  
 length categories.  My question is what would be the best way to  
 convert the raw output to quantiles and whether it would be  
 appropriate to use quantile regression to look at whether location,  
 variety, replication, etc. modify the length distribution.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantile Regression

2006-10-25 Thread roger koenker
> data(engel)
> attach(engel)
> rq(y~x)
Call:
rq(formula = y ~ x)

Coefficients:
(Intercept)           x 
 81.4822474   0.5601806 

Degrees of freedom: 235 total; 233 residual
> rq(y~x) -> f
> f$tau
[1] 0.5

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Oct 25, 2006, at 4:39 AM, [EMAIL PROTECTED] wrote:

 Hi,

 how is it possible to retrieve the corresponding tau value for each  
 observed data pair (x(t) y(t), t=1,...,n) when doing a quantile  
 regression like

 rq.fit <- rq(y~x,tau=-1).

 Thank you for your help.

 Jaci
 --

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem loading SparseM package

2006-10-12 Thread roger koenker

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Oct 12, 2006, at 7:12 AM, Roger Bivand wrote:

 On Thu, 12 Oct 2006, Coomaren Vencatasawmy wrote:

 Hi,
 I have just installed R 2.4.0 and when I try to load SparseM, I get
 the following error message

 library(SparseM)
 Package SparseM (0.71) loaded.  To cite, see citation(SparseM)
 Error in loadNamespace(package, c(which.lib.loc, lib.loc),  
 keep.source = keep.source) :
 in 'SparseM' methods specified for export, but none  
 defined: as.matrix.csr, as.matrix.csc, as.matrix.ssr,  
 as.matrix.ssc, as.matrix.coo, as.matrix, t, coerce, dim, diff,  
 diag, diag-, det, norm, chol, backsolve, solve, model.matrix,  
 model.response, %*%, %x%, image
 Error: package/namespace load failed for 'SparseM'


 Please re-install the package. All contributed packages using new- 
 style
 classes need to be re-installed because the internal representation of
 such classes and methods has changed, see CHANGES TO S4 METHODS in  
 NEWS.
 Doing:

 update.packages(checkBuilt = TRUE)

 will check your libraries for packages built under previous  
 releases and
 replace them with ones built for the platform release.


 I have contacted the package maintainers and they couldn't be of  
 any help.

 I do not recall getting this error in older R versions.

 Regards

 Coomaren

 Send instant messages to your online friends http:// 
 uk.messenger.yahoo.com
  [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


 -- 
 Roger Bivand
 Economic Geography Section, Department of Economics, Norwegian  
 School of
 Economics and Business Administration, Helleveien 30, N-5045 Bergen,
 Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
 e-mail: [EMAIL PROTECTED]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] solaris 64 build?

2006-10-05 Thread roger koenker
We have a solaris/sparc machine that has been running an old version
of R-devel:  Version 2.2.0 Under development (unstable) (2005-06-04  
r34577)
which was built as m64 from sources.  Attempting to upgrade to 2.4.0  
the configure step
goes ok, but I'm getting early on from make:

 gcc -m64  -L/opt/sfw/lib/sparcv9  -L/usr/lib/sparcv9
 -L/usr/openwin/lib/sparcv9  -L/usr/local/lib -o R.bin Rmain.o
 CConverters.o CommandLineArgs.o  Rdynload.o Renviron.o RNG.o  apply.o
 arithmetic.o apse.o array.o attrib.o  base.o bind.o builtin.o
 character.o coerce.o colors.o complex.o connections.o context.o  cov.o
 cum.o  dcf.o datetime.o debug.o deparse.o deriv.o  dotcode.o dounzip.o
 dstruct.o duplicate.o  engine.o envir.o errors.o eval.o  format.o
 fourier.o  gevents.o gram.o gram-ex.o graphics.o  identical.o  
 internet.o
 iosupport.o  lapack.o list.o localecharset.o logic.o  main.o mapply.o
 match.o memory.o model.o  names.o  objects.o optim.o optimize.o
 options.o  par.o paste.o pcre.o platform.o  plot.o plot3d.o plotmath.o
 print.o printarray.o printvector.o printutils.o qsort.o  random.o
 regex.o registration.o relop.o rlocale.o  saveload.o scan.o seq.o
 serialize.o size.o sort.o source.o split.o  sprintf.o startup.o
 subassign.o subscript.o subset.o summary.o sysutils.o  unique.o util.o
 version.o vfonts.o xxxpr.o  mkdtemp.o ../unix/libunix.a
 ../appl/libappl.a ../nmath/libnmath.a -L../../lib -lRblas
 -L/usr/local/encap/gf7764-3.4.3+2/lib/gcc/sparc64-sun-solaris2.9/3.4.3
 -L/usr/ccs/bin/sparcv9 -L/usr/ccs/bin -L/usr/ccs/lib
 -L/usr/local/encap/gf7764-3.4.3+2/lib/sparcv9
 -L/usr/local/encap/gf7764-3.4.3+2/lib -lg2c -lm -lgcc_s
 ../extra/zlib/libz.a  ../extra/bzip2/libbz2.a ../extra/pcre/libpcre.a
 ../extra/intl/libintl.a  -lreadline -ltermcap -lnsl -lsocket -ldl -lm


 Undefined   first referenced
 symbol in file
 __builtin_isnan arithmetic.o
 ld: fatal: Symbol referencing errors. No output written to R.bin
 collect2: ld returned 1 exit status

I've tried to look at the difference in outcomes in the old R-devel
version --  if I touch arithmetic.c there and then type make I get
something
almost the same as above except for the following  bits that are new  
to 2.4.0
(this diff is after replacing spaces with linebreaks obviously.)

ysidro.econ.uiuc.edu% diff t0 t1
54a55
  localecharset.o
81a83
  rlocale.o
101a104
  mkdtemp.o
104a108,109
  -L../../lib
  -lRblas


Has there been some change in the way that Rblas is used, or in
isnan?  It didn't seem so from a look at arithmetic.c, but this is well
beyond me.

I hope that someone sees something suspicious, or could point me
toward a better diagnostic.  Thanks,

Roger


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How can I generate these numbers

2006-10-02 Thread roger koenker
Try:

  rsimplex <- function(n){
u <- diff(sort(runif(n)))
c(u,1-sum(u))
}
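
For example:

  set.seed(42)
  rsimplex(3)        # three values in [0,1]
  sum(rsimplex(5))   # always 1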

On Oct 2, 2006, at 5:43 PM, Rolf Turner wrote:

 Ricardo Rios wrote:

 Hi Rolf Turner, I have a statistical model; the model needs these
 numbers to calculate the probability. These numbers must be random.

 For example I need that
 magicfunction(3)
 [1] 0.3152460 0.5231614 0.1615926
 magicfunction(3)
 [1]  0.6147933 0.3122999  0.0729068

 but the argument of the function is arbitrary. Does somebody
 know if this function exists in R?

   As far as I know, no such function exists in R, but
   it would be totally trivial to write one, if that's
   what you really want.

   However the question you pose makes little sense to me.  If
   you really have a ``statistical model'' then there must be
   some marginal distribution for each of the probabilities (I
   *assume* they are probabilities) going into the sequence
   which you wish to sum to 1.

   You mention no such distribution.

   To generate such a sequence with an arbitrary marginal
   distribution is so trivial that it does not bear discussing.

   If you really can't see how to do this, then you probably
   shouldn't be messing about with ``statistical models''.

   You did not explicitly deny that this is a homework problem.

   I still suspect that it is.

   cheers,

   Rolf Turner

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] counting a sequence of charactors or numbers

2006-09-30 Thread roger koenker
?rle
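
For the quoted sequence, e.g.:

s <- c(rep("S",6), rep("W",8), rep("S",8), rep("W",8), rep("S",13), rep("W",9))
rle(s)$lengths   # 6 8 8 8 13 9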

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Sep 30, 2006, at 5:13 PM, Joe Byers wrote:

 I have the following sequence of characters.  These could be  
 integers as
 well.  For this problem, only two values are valid.

 S S S S S S W W W W W W W W S S S S S S S S W W W W W W W W S S S S  
 S S
 S S S S S S S W W W W W W W W W

 I need to determine the counts of the classes/groups in sequence, as
 6,8,8,8,13,9, where the sum of these equals my total observations.

 Any help is greatly appreciated.

 Thank you
 Joe

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Greedy triangulation

2006-09-14 Thread roger koenker
Or, perhaps, tripack?

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Sep 14, 2006, at 10:32 AM, Greg Snow wrote:

 Does the deldir package do what you want?


 --  
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 [EMAIL PROTECTED]
 (801) 408-8111


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Dan Bebber
 Sent: Thursday, September 14, 2006 3:56 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Greedy triangulation

 Hello,

 does anyone have code that will generate a greedy triangulation
 (triangulation that uses shortest non-overlapping edges) for a set of
 points in Euclidean space?

 Thanks,
 Dan Bebber
 ___
 Dr. Daniel P. Bebber
 Department of Plant Sciences
 University of Oxford
 South Parks Road
 Oxford OX1 3RB
 UK
 Tel. 01865 275060

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Ranking and selection statistical procedure

2006-08-31 Thread roger koenker
Look at ?rank ?order and ?quantile  assuming that you are using
these terms as in cs.
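
For instance:

x <- c(30, 10, 20)
rank(x)          # 3 1 2
order(x)         # 2 3 1 -- the indices that would sort x
quantile(x, .5)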


url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Aug 31, 2006, at 5:20 AM, Prasanna BALAPRAKASH wrote:

 Dear R helpers

 I would like to know if the Ranking and Selection statistical
 procedure has been implemented in R. I made a quick search in the R
 packages list but I could not find it.

 Thanks in advance
 Prasanna

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can R compute the expected value of a random variable?

2006-08-27 Thread roger koenker
General questions elicit general answers; more specific questions
elicit more specific answers.For example,

> exp(2+9/2)
[1] 665.1416
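
(For a lognormal with meanlog mu and sdlog sigma the mean is
exp(mu + sigma^2/2); here mu = 2 and sigma = 3, assuming, as in the
integrand below, that a lognormal is the distribution in question.)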

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Aug 27, 2006, at 11:52 AM, Paul Smith wrote:

 On 8/26/06, Mike Nielsen [EMAIL PROTECTED] wrote:
 Yes.

 Can R compute the expected value of a random variable?

 Mike: thank you very much indeed for your so insightful and complete
 answer. I have  meanwhile deepened my research and, as a consequence,
 I have found the following solution, which seems to work fine:

 integrand <- function(x){x*dlnorm(x,meanlog=2,sdlog=3)}
 integrate(integrand,-Inf, Inf)
 665.146 with absolute error < 0.046


 There is also a package apt to calculate expected values: it is called
 distrEx. (Thanks, Matthias.)

 Paul

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating trace of products

2006-08-14 Thread roger koenker
I would suspect that something simple like

sum(diag(crossprod(A,B)))

would be quite competitive...
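
And since tr(A'B) = sum(A * B) elementwise, symmetry of A lets you skip
forming any product at all; a quick numerical check:

A <- crossprod(matrix(rnorm(9), 3))   # symmetric test matrices
B <- crossprod(matrix(rnorm(9), 3))
all.equal(sum(diag(crossprod(A,B))), sum(A * B))    # TRUE
AB <- A %*% B
all.equal(sum(diag(AB %*% AB)), sum(AB * t(AB)))    # TRUE: tr(ABAB)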

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Aug 14, 2006, at 6:58 AM, Søren Højsgaard wrote:

 Dear all,
 I need to calculate tr(A B), tr(A B A B) and similar quantities  
 **fast** where the matrices A, B are symmetrical. I've searched for  
 built-in functions for that purpose, but without luck. Can anyone  
 help?
 Thanks in advance
 Søren

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pseudo R for Quant Reg

2006-08-02 Thread roger koenker
This is getting to be a faq -- here is a prior answer:

 No, but the objective function can be computed for any fitted
 rq object, say f,  as

   rho <- function(u,tau=.5)u*(tau - (u < 0))
   V <- sum(rho(f$resid, f$tau))

 so it is easy to roll your own.

I don't much like R1, or R2 for that matter, so it isn't likely to
be automatically provided in quantreg any time soon.
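
For the record, a complete toy computation of R1 at tau = .5 (using the
engel data shipped with quantreg, with columns x and y as elsewhere in
this thread):

library(quantreg)
data(engel)
rho <- function(u,tau=.5)u*(tau - (u < 0))
f1 <- rq(y ~ x, tau = .5, data = engel)  # unrestricted fit
f0 <- rq(y ~ 1, tau = .5, data = engel)  # intercept-only fit
1 - sum(rho(f1$resid, f1$tau))/sum(rho(f0$resid, f0$tau))  # R1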


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Aug 1, 2006, at 11:46 AM, [EMAIL PROTECTED] wrote:

 Dear R Users,

 Did someone implement the R1 (pseudo R^2) and likelihood ratio
 statistics for quantile regressions, which are some of the inference
 procedures for quantile regression
 found in Koenker and Machado (1999)?
 I tried the Ox version, but my dataset is too large (> 50.000) and the
 algorithm breaks.
 
 Ricardo Gonçalves Silva, M. Sc.
 Apoio aos Processos de Modelagem Matemática
 Econometria & Inadimplência
 Serasa S.A.
 (11) - 6847-8889
 [EMAIL PROTECTED]


 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding the position of a variable in a data.frame

2006-08-02 Thread roger koenker
it is the well-known wicked which problem:  if you had (grammatically  
incorrectly)
thought ... which I want to change then you might have been led
to type (in another window):

?which

and you would have seen the light.  Maybe that() should be an alias
for which()?

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Aug 2, 2006, at 4:01 PM, John Kane wrote:

 Simple problem but I don't see the answer. I'm trying
 to clean up some data.
 I have 120 columns in a data.frame.  I have one value
 in a column named blaw that I want to change. How do
 I find the coordinates? I can find the row by doing a
 subset on the data.frame, but how do I find out where
 blaw is in the columns without manually counting them
 or converting names(Df) to a list and reading down the
 list.

 Simple example

 cat <- c( 3,5,6,8,0)
 dog <- c(3,5,3,6, 0)
 rat <- c (5, 5, 4, 9, 0)
 bat <- c( 12, 42, 45, 32, 54)

 Df <- data.frame(cbind(cat, dog, rat, bat))
 Df
 subset(Df, bat >= 50)

 results
   cat dog rat bat
 5   0   0   0  54


 Thus I know that my target is in row 5 but how do I
 figure out where 'bat' is?

 All I want to do is be able to say
 Df[5,4] <- 100

 Is there some way to have function(bat) return the
 column number: some kind of a colnum() function?  I
 had thought that I had found something in
 library(gdata) matchcols but no luck.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Warning Messages using rq -quantile regressions

2006-07-23 Thread roger koenker

On Jul 23, 2006, at 5:27 AM, roger koenker wrote:

 When computing the median from a sample with an even number of  
 distinct
 values there is inherently some ambiguity about its value:  any  
 value between
 the middle order statistics is a median.  Similarly, in  
 regression settings the
 optimization problem solved by the br version of the simplex  
 algorithm,
 modified to do general quantile regression identifies cases where  
 there may
 be non uniqueness of this type.  When there are continuous  
 covariates this
 is quite rare, when covariates are discrete then it is relatively  
 common, at
 least when tau is chosen from the rationals.  For univariate  
 quantiles R provides
 several methods of resolving this sort of ambiguity by  
 interpolation, br doesn't
 try to do this, instead returning the first vertex solution that it  
 comes to.  Should
 we worry about this?  My answer would be no.  Viewed from an  
 asymptotic
 perspective any choice of a unique value among the multiple  
 solutions is a
 1/n perturbation  -- with 2500 observations this is unlikely to be  
 interesting.
 More to the point, inference about the coefficients of the model,  
 which provides
 O(1/sqrt(n)) intervals is perfectly capable of assessing the  
 meaningful uncertainty
 about these values.  Finally, if you would prefer an estimation  
 procedure that
 produced unique values more like the interpolation procedures in  
 the univariate
 setting, you could try the fn option for the algorithm.  Interior  
 point methods for
 solving linear programming problems have the feature that they  
 tend to converge
 to the centroid of solutions sets when such sets exist.  This  
 approach provides a
 means to assess the magnitude of the non-uniqueness in a particular  
 application.
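
 A toy illustration of the ambiguity (an added aside, not in the original
 post): with an even number of points every value between the two middle
 order statistics attains the same objective,

 y <- c(1, 2, 3, 4)
 sum(abs(y - 2.2)) == sum(abs(y - 2.8))  # TRUE: both are medians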

 I hope that this helps,

 url:www.econ.uiuc.edu/~rogerRoger Koenker
 email   [EMAIL PROTECTED]   Department of  
 Economics
 vox:217-333-4558University of Illinois
 fax:217-244-6678Champaign, IL 61820


 On Jul 22, 2006, at 9:07 PM, Neil KM wrote:

 I am new to using quantile regressions in R. I have estimated a
 set of
 coefficients using the method="br" algorithm with the rq command
 at various
 quantiles along the entire distribution.

 My data set contains approximately 2,500 observations and I have 7  
 predictor
 variables. I receive the following warning message:

 Solution may be nonunique in: rq.fit.br(x, y, tau = tau, ...)

 There are 13 warnings of this type after I run a single  model. My  
 results
 are similar to the results I received in other stat programs  
 using quantile
 reg procedures. I am unclear what these warning messages imply and  
 if there
 are problems with model fit/convergence that I may need to consider.
 Any help would be appreciated. Thanks!

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantreg error

2006-07-17 Thread roger koenker
As I have already told you once, and as the posting guide suggests,

If the question relates to a contributed package, e.g., one
downloaded from CRAN, try contacting the package maintainer first.
You can also use find("functionname") and packageDescription
("packagename") to find this information. Only send such questions to  
R-help or R-devel if you get no reply or need further assistance.  
This applies to both requests for help and to bug reports.

the error message seems quite clear:  it means that the model that  
you have specified
implicitly with the formula has a singular X matrix.  The quantile  
regression fitting
functions don't understand about singular designs;  some day they may  
but it isn't
a high priority for me.
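
If you want to see the singularity for yourself before calling rq(),
something along these lines (a sketch, with your own data frame exo):

X <- model.matrix(dep ~ ., data = exo)
qr(X)$rank < ncol(X)   # TRUE flags a singular design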


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Jul 17, 2006, at 9:27 AM, [EMAIL PROTECTED] wrote:

 Dear User,
 I got the following error running a regression quantile:

 rq1 <- rq(dep ~ ., model=TRUE, data=exo, tau=0.5);
 summary(rq1)
 Error in rq.fit.fnb(x, y, tau = tau + h) :
 Error info =  75 in stepy: singular design

 Any hint about the problem?


 Thanks a lot,
 
 Ricardo Gonçalves Silva, M. Sc.
 Apoio aos Processos de Modelagem Matemática
 Econometria & Inadimplência
 Serasa S.A.
 (11) - 6847-8889
 [EMAIL PROTECTED]


 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] package:Matrix handling of data with identical indices

2006-07-09 Thread roger koenker


On 7/8/06, Thaden, John J [EMAIL PROTECTED] wrote:

 As there is nothing inherent in either compressed, sparse,
 format that would prevent recognition and handling of
 duplicated index pairs, I'm curious why the dgCMatrix
 class doesn't also add x values in those instances?

why not multiply them?  or take the larger one, or ...?  I would
interpret this as a case of user negligence -- there is no
natural default behavior for such cases.

On Jul 9, 2006, at 11:06 AM, Douglas Bates wrote:

 Your matrix Mc should be flagged as invalid.  Martin and I should
 discuss whether we want to add such a test to the validity method.  It
 is not difficult to add the test but there will be a penalty in that
 it will slow down all operations on such matrices and I'm not sure if
 we want to pay that price to catch a rather infrequently occurring
 problem.

Elaborating the validity procedure to flag such instances seems
to be well worth the  speed penalty in my view.  Of course,
anticipating every such misstep imposes a heavy burden
on developers and constitutes the real cost of more elaborate
validity checking.

[My 2cents based on experience with SparseM.]

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] KhmaladzeTest

2006-07-08 Thread roger koenker
Questions about packages should be directed to the package maintainers.
A more concise example of the difficulty, with accessible data, would
also be helpful.
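
For reference, a small self-contained call patterned on the package
examples (a sketch; see ?KhmaladzeTest for the argument details):

library(quantreg)
data(barro)
KhmaladzeTest(y.net ~ lgdp2 + fse2, data = barro, taus = seq(.2,.8,by = .05))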

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Jul 7, 2006, at 7:39 PM, raul sanchez wrote:

 Hello. I am a beginner in R and I cannot implement the
 KhmaladzeTest in the following command. Please help me!!!
   PS: I attach the results and the messages of the R program

   R : Copyright 2006, The R Foundation for Statistical Computing
 Version 2.3.1 (2006-06-01)
 ISBN 3-900051-07-0

 R is free software and comes with ABSOLUTELY NO WARRANTY.
 You may redistribute it under certain conditions.
 Type 'license()' or 'licence()' for distribution details.

 R is a collaborative project with many contributors.
 Type 'contributors()' for more information and
 'citation()' on how to cite R or R packages in publications.

 Type 'demo()' for some demos, 'help()' for on-line help,
 or 'help.start()' for an HTML browser interface to help.
 Type 'q()' to quit R.

 utils:::menuInstallLocal()
 package 'quantreg' successfully unpacked and MD5 sums checked
 updating HTML package descriptions
 utils:::menuInstallLocal()
 package 'foreign' successfully unpacked and MD5 sums checked
 updating HTML package descriptions
 utils:::menuInstallLocal()
 package 'Rcmdr' successfully unpacked and MD5 sums checked
 updating HTML package descriptions
 local({pkg <- select.list(sort(.packages(all.available = TRUE)))
 + if(nchar(pkg)) library(pkg, character.only=TRUE)})
 local({pkg <- select.list(sort(.packages(all.available = TRUE)))
 + if(nchar(pkg)) library(pkg, character.only=TRUE)})
 quantreg package loaded:  To cite see citation(quantreg)
 local({pkg <- select.list(sort(.packages(all.available = TRUE)))
 + if(nchar(pkg)) library(pkg, character.only=TRUE)})
 local({pkg <- select.list(sort(.packages(all.available = TRUE)))
 + if(nchar(pkg)) library(pkg, character.only=TRUE)})
 Loading required package: tcltk
 Loading Tcl/Tk interface ... done
 --- Please select a CRAN mirror for use in this session ---
 also installing the dependencies 'acepack', 'scatterplot3d',  
 'fBasics',
 'Hmisc', 'quadprog', 'oz', 'mlbench', 'randomForest', 'SparseM',
 'xtable', 'chron', 'fCalendar', 'its', 'tseries', 'DAAG', 'e1071',  
 'mvtnorm',
 'zoo', 'strucchange', 'sandwich', 'dynlm', 'leaps'

 [repeated download-progress output for the dependencies listed above
 omitted; the message is truncated at this point in the archive]

Re: [R] sparse matrix, rnorm, malloc

2006-06-10 Thread roger koenker
You need to look at the packages specifically designed  for
sparse matrices:  SparseM and Matrix.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Jun 10, 2006, at 12:53 PM, g l wrote:

 Hi,

 I'm Sorry for any cross-posting. I've reviewed the archives and could
 not find an exact answer to my question below.

 I'm trying to generate very large sparse matrices (< 1% non-zero
 entries per row). I have a sparse matrix function below which works
 well until the row/col count exceeds 10,000. This is being run on a
 machine with 32G memory:

 sparse_matrix <- function(dims,rnd,p) {
  ptm <- proc.time()
  x <- round(rnorm(dims*dims),rnd)
  x[((abs(x) - p) > 0)] <- 0
  y <- matrix(x,nrow=dims,ncol=dims)
  proc.time() - ptm
 }

 When trying to generate the matrix around 20,000 rows/cols on a
 machine with 32G of memory, the error message I receive is:

 R(335) malloc: *** vm_allocate(size=324096) failed (error code=3)
 R(335) malloc: *** error: can't allocate region
 R(335) malloc: *** set a breakpoint in szone_error to debug
 R(335) malloc: *** vm_allocate(size=324096) failed (error code=3)
 R(335) malloc: *** error: can't allocate region
 R(335) malloc: *** set a breakpoint in szone_error to debug
 Error: cannot allocate vector of size 3125000 Kb
 Error in round(rnorm(dims * dims), rnd) : unable to find the argument
 'x' in selecting a method for function 'round'

 * Last error line is obvious. Question:  on machine w/32G memory, why
 can't it allocate a vector of size 3125000 Kb?

 When trying to generate the matrix around 30,000 rows/cols, the error
 message I receive is:

 Error in rnorm(dims * dims) : cannot allocate vector of length 900000000
 Error in round(rnorm(dims * dims), rnd) : unable to find the argument
 'x' in selecting a method for function 'round'

 * Last error line is obvious. Question: is this 900000000 bytes?
 kilobytes? This error seems to be specific now to rnorm, but it
 doesn't indicate the length metric (b/Kb/Mb) as it did for 20,000
 rows/cols. Even if this is Mb, why can't this be allocated on a machine
 with 32G free memory?

 When trying to generate the matrix with over 50,000 rows/cols, the
 error message I receive is:

 Error in rnorm(n, mean, sd) : invalid arguments
 In addition: Warning message:
 NAs introduced by coercion
 Error in round(rnorm(dims * dims), rnd) : unable to find the argument
 'x' in selecting a method for function 'round'

 * Same.

 Why would it generate different errors in each case? Code fixes? Any
 simple ways to generate sparse matrices which would avoid above
 problems?

 Thanks in advance,

 Gavin

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] sparse matrix, rnorm, malloc

2006-06-10 Thread roger koenker

As an example of how one might do this sort of thing in SparseM
ignoring the rounding aspect...

require(SparseM)
require(msm) #for rtnorm
sm <- function(dim,rnd,q){
 n <- rbinom(1, dim * dim, 2 * pnorm(q) - 1)
 ia <- sample(dim,n,replace = TRUE)
 ja <- sample(dim,n,replace = TRUE)
 ra <- rtnorm(n,lower = -q, upper = q)
 A <- new("matrix.coo", ia = as.integer(ia), ja = as.integer(ja),
          ra = ra, dimension = as.integer(c(dim,dim)))
 A <- as.matrix.csr(A)
 }

For dim = 5000 and q = .03 which exceeds Gavin's suggested  1 percent
density, this takes about 30 seconds on my imac and according to Rprof
about 95 percent of that (total) time is spent generating the  
truncated normals.
Word of warning:  pushing this too much further  gets tedious  since the
number of random numbers grows like dim^2.  For example, dim = 20,000
and q = .02 takes 432 seconds with again 93% of the total time spent in
rnorm and rtnorm...


url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Jun 10, 2006, at 12:53 PM, g l wrote:

 Hi,

 I'm Sorry for any cross-posting. I've reviewed the archives and could
 not find an exact answer to my question below.

 I'm trying to generate very large sparse matrices (< 1% non-zero
 entries per row). I have a sparse matrix function below which works
 well until the row/col count exceeds 10,000. This is being run on a
 machine with 32G memory:

 sparse_matrix <- function(dims,rnd,p) {
  ptm <- proc.time()
  x <- round(rnorm(dims*dims),rnd)
  x[((abs(x) - p) > 0)] <- 0
  y <- matrix(x,nrow=dims,ncol=dims)
  proc.time() - ptm
 }

 When trying to generate the matrix around 20,000 rows/cols on a
 machine with 32G of memory, the error message I receive is:

 R(335) malloc: *** vm_allocate(size=324096) failed (error code=3)
 R(335) malloc: *** error: can't allocate region
 R(335) malloc: *** set a breakpoint in szone_error to debug
 R(335) malloc: *** vm_allocate(size=324096) failed (error code=3)
 R(335) malloc: *** error: can't allocate region
 R(335) malloc: *** set a breakpoint in szone_error to debug
 Error: cannot allocate vector of size 3125000 Kb
 Error in round(rnorm(dims * dims), rnd) : unable to find the argument
 'x' in selecting a method for function 'round'

 * Last error line is obvious. Question:  on machine w/32G memory, why
 can't it allocate a vector of size 3125000 Kb?

 When trying to generate the matrix around 30,000 rows/cols, the error
 message I receive is:

 Error in rnorm(dims * dims) : cannot allocate vector of length 900000000
 Error in round(rnorm(dims * dims), rnd) : unable to find the argument
 'x' in selecting a method for function 'round'

 * Last error line is obvious. Question: is this 900000000 bytes?
 kilobytes? This error seems to be specific now to rnorm, but it
 doesn't indicate the length metric (b/Kb/Mb) as it did for 20,000
 rows/cols. Even if this is Mb, why can't this be allocated on a machine
 with 32G free memory?

 When trying to generate the matrix with over 50,000 rows/cols, the
 error message I receive is:

 Error in rnorm(n, mean, sd) : invalid arguments
 In addition: Warning message:
 NAs introduced by coercion
 Error in round(rnorm(dims * dims), rnd) : unable to find the argument
 'x' in selecting a method for function 'round'

 * Same.

 Why would it generate different errors in each case? Code fixes? Any
 simple ways to generate sparse matrices which would avoid above
 problems?

 Thanks in advance,

 Gavin

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Re-binning histogram data

2006-06-09 Thread roger koenker
On Jun 9, 2006, at 7:38 AM, Duncan Murdoch wrote:

 Now, if you were to suggest that the stem() function is a bizarre
 simulation of a stone-age tool on a modern computer, I might agree.


But as a stone-age (blackboard)  tool it is unsurpassed.  It is the only
bright spot in the usually depressing ritual  of returning exam
results.  Full disclosure of the distribution in a very concise  
encoding.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R crashes on quantreg

2006-06-07 Thread roger koenker
Since the crash occurs plotting the lm object it is unclear what
this has to do with quantreg, but maybe you could explain

1.  what you mean by crash,
2.  something about x,y,

This is best addressed to the maintainer of the package rather than to
R-help, provided, of course, that it is really a question about  
quantreg.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Jun 7, 2006, at 2:32 PM, Mu Tian wrote:

 I was trying quantreg package,

 lm1 - lm(y~x)
 rq1 - rq(y~x)
 plot(summary(rq1)) #then got a warning says singular value, etc.  
 but this
 line can be omited
 plot(lm1) #crash here

 It happened every time on my PC, Windows XP Pro Serv. Pack 1,  
 Pentium(4)
 3.00G.

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R crashes on quantreg

2006-06-07 Thread roger koenker
R-help doesn't  foward attached data files like this, but Brian
kindly forwarded it to me.

You need to restrict X so that it is full rank,  it now has
rank 19 and column dimension 29 (with intercept).  See
for example svd(cbind(1,x)).

I'll add some better checking for this, but it will basically amount
to setting singular.ok = FALSE in lm() and forcing users to do
the rank reduction themselves.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Jun 7, 2006, at 3:05 PM, Mu Tian wrote:

 I attached the data file here. I restarted the PC but it still  
 happens. It
 says a memory address could not be written. I am not sure it is a  
 problem of
 R or quantreg but I plot without problems before I load quantreg.

 Thank you.

 Tian

 On 6/7/06, Prof Brian Ripley [EMAIL PROTECTED] wrote:

 Without y and x we cannot reproduce this.

 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html

 On Wed, 7 Jun 2006, Mu Tian wrote:

  I forgot to mention my R version is 2.3.1 and quantreg is the most
 updated
  too.

 It has a version number, which the posting guide tells you how to  
 find.

  On 6/7/06, Mu Tian [EMAIL PROTECTED] wrote:
 
   I was trying quantreg package,
 
  lm1 - lm(y~x)
  rq1 - rq(y~x)
  plot(summary(rq1)) #then got a warning says singular value,  
 etc. but
 this
  line can be omited
  plot(lm1) #crash here
 
  It happened every time on my PC, Windows XP Pro Serv. Pack 1,
 Pentium(4)
  3.00G.
 
 
[[alternative HTML version deleted]]



 --
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Re : Large database help

2006-05-16 Thread roger koenker
In ancient times, 1999 or so, Alvaro Novo and I experimented with an
interface to mysql that brought chunks of data into R and accumulated  
results.
This is still described and available on the web in its original form at

http://www.econ.uiuc.edu/~roger/research/rq/LM.html

Despite claims of future developments nothing emerged, so anyone
considering further explorations with it may need training in  
Rchaeology.

The toy problem we were solving was a large least squares problem,
which was a stalking horse for large quantile regression  problems.   
Around the same
time I discovered sparse linear algebra and realized that virtually all
large problems that I was interested in were better handled from
that perspective.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On May 16, 2006, at 3:57 PM, Robert Citek wrote:


 On May 16, 2006, at 11:19 AM, Prof Brian Ripley wrote:
 Well, there *is* a manual about R Data Import/Export, and this does
 discuss using R with DBMSs with examples.  How about reading it?

 Thanks for the pointer:

http://cran.r-project.org/doc/manuals/R-data.html#Relational-
 databases

 Unfortunately, that manual doesn't really answer my question.  My
 question is not about how do I make R interact with a database, but
 rather how do I make R interact with a database containing large sets.

 The point being made is that you can import just the columns you
 need, and indeed summaries of those columns.

 That sounds great in theory.  Now I want to reduce it to practice.
 In the toy problem from the previous post, how can one compute the
 mean of a set of 1e9 numbers?  R has some difficulty generating a
 billion (1e9) number set let alone taking the mean of that set.  To  
 wit:

 bigset <- runif(1e9,0,1e9)

 runs out of memory on my system.  I realize that I can do some fancy
 data shuffling and hand-waving to calculate the mean.  But I was
 wondering if R has a module that already abstracts out that magic,
 perhaps using a database.

 Any pointers to more detailed reading is greatly appreciated.

 Regards,
 - Robert
 http://www.cwelug.org/downloads
 Help others get OpenSource software.  Distribute FLOSS
 for Windows, Linux, *BSD, and MacOS X with BitTorrent

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] (no subject)

2006-05-16 Thread roger koenker
an upgrade:   from the flintstones -- to the michelin  man...


On May 16, 2006, at 4:40 PM, Thomas Lumley wrote:

 On Tue, 16 May 2006, roger koenker wrote:

 In ancient times, 1999 or so, Alvaro Novo and I experimented with an
 interface to mysql that brought chunks of data into R and accumulated
 results.
 This is still described and available on the web in its original  
 form at

  http://www.econ.uiuc.edu/~roger/research/rq/LM.html

 Despite claims of future developments nothing emerged, so anyone
 considering further explorations with it may need training in
 Rchaeology.

 A few hours ago I submitted to CRAN a package biglm that does large
 linear regression models using a similar strategy (it uses  
 incremental QR
 decomposition rather than accumalating the crossproduct matrix). It  
 also
 computes the Huber/White sandwich variance estimate in the same single
 pass over the data.

 Assuming I haven't messed up the package checking, it will appear
 in the next couple of days on CRAN. The syntax looks like
 a <- biglm(log(Volume) ~ log(Girth) + log(Height), chunk1)
 a <- update(a, chunk2)
 a <- update(a, chunk3)
summary(a)

 where chunk1, chunk2, chunk3 are chunks of the data.


   -thomas

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Polygon-like interactive selection of plotted points

2006-04-26 Thread roger koenker
?in.convex.hull  in the package tripack.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Apr 26, 2006, at 1:25 PM, Marc Schwartz (via MN) wrote:

 On Wed, 2006-04-26 at 18:13 +0100, Florian Nigsch wrote:
 [Please CC me for all replies, since I am not currently subscribed to
 the list.]

 Hi all,

 I have the following problem/question: Imagine you have a two-
 dimensional plot, and you want to select a number of points, around
 which you could draw a polygon. The points of the polygon are defined
 by clicking in the graphics window (locator()/identify()), all points
 inside the polygon are returned as an object.

 Is something like this already implemented?

 Thanks a lot in advance,

 Florian

 I don't know if anyone has created a single function do to this  
 (though
 it is always possible).

 However, using:

   RSiteSearch(points inside polygon)

 brings up several function hits that, if put together with the above
 interactive functions, could be used to do what you wish. That is,  
 input
 the matrix of x,y coords of the interactively selected polygon and the
 x,y coords of the underlying points set to return the points inside or
 outside the polygon boundaries.
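
 As a concrete (hedged) sketch of that pairing -- here assuming the sp
 package's point.in.polygon() for the inside test:

 library(sp)
 x <- runif(100); y <- runif(100)
 plot(x, y)
 poly <- locator(type = "l")      # click the polygon's vertices
 inside <- point.in.polygon(x, y, poly$x, poly$y) > 0
 points(x[inside], y[inside], col = "red", pch = 19)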

 Just as an FYI, you might also want to look at ?chull, which is in the
 base R distribution and returns the set of points on the convex  
 hull of
 the underlying point set. This is to some extent, the inverse of what
 you wish to do.

 HTH,

 Marc Schwartz

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Heteroskedasticity in Tobit models

2006-04-25 Thread roger koenker
Powell's quantile regression method is available in the quantreg
package:  rq(..., method = "fcen", ...)


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Apr 25, 2006, at 2:07 PM, Alan Spearot wrote:

 Hello,

 I've had no luck finding an R package that has the ability to  
 estimate a
 Tobit model allowing for heteroskedasticity (multiplicative, for  
 example).
 Am I missing something in survReg?  Is there another package that I'm
 unaware of?  Is there an add-on package that will test for
 heteroskedasticity?

 Thanks for your help.

 Cheers,
 Alan Spearot

 --
 Alan Spearot
 Department of Economics
 University of Wisconsin - Madison
 [EMAIL PROTECTED]

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Handling large dataset dataframe

2006-04-24 Thread roger koenker
You can read chunks of it at a time and store it in sparse matrix
form using the packages SparseM or Matrix, but then you need
to think about what you want to do with it: least squares sorts
of things are ok, but other options are somewhat limited...


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Apr 24, 2006, at 12:41 PM, Sachin J wrote:

 Hi,

   I have a dataset consisting of 350,000 rows and 266 columns.  Out  
 of 266 columns 250 are dummy variable columns. I am trying to read  
 this data set into R dataframe object but unable to do it due to  
 memory size limitations (object size created is too large to handle  
 in R).  Is there a way to handle such a large dataset in R.

   My PC has 1GB of RAM, and 55 GB harddisk space running windows XP.

   Any pointers would be of great help.

   TIA
   Sachin

   
 -

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] running median and smoothing splines for robust surface f itting

2006-03-16 Thread roger koenker
Andy's comment gives me an excuse to mention that rqss() in
my quantreg package does median smoothing for 1d and 2d functions
and additive models involving such functions, using total
variation of f' and grad f as roughness penalties.  Further
references are available from ?rqss.
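
A hedged illustration of the 2d case (a minimal sketch on fake data; the
lambda value is arbitrary and would need tuning for real use):

library(quantreg)
n <- 500
x <- runif(n); y <- runif(n)
z <- sin(4 * x) * cos(4 * y) + rnorm(n)           # noisy surface
fit <- rqss(z ~ qss(cbind(x, y), lambda = 0.1))   # median triogram fit
summary(fitted(fit))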

Roger

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Mar 16, 2006, at 6:13 AM, Liaw, Andy wrote:

 loess() should be able to do robust 2D smoothing.

 There's no natural ordering in 2D, so defining running medians can be
 tricky.  I seem to recall Prof. Koenker talked about some robust 2D
 smoothing method at useR! 2004, but can't remember if it's  
 available in some
 packages.

 Andy

 From: Vladislav Petyuk

 Hi,
 Are there any multidimenstional versions of runmed() and
 smooth.spline() functions? I need to fit surface into quite
 noisy 3D data.

 Below is an example (2D) of kind of fittings I do.
 Thank you,
 Vlad

 #=generating complex x,y dataset with gaussian & uniform noise==
 x <- seq(1:1)
 x2 <- rep(NA,2*length(x))
 y2 <- rep(NA,2*length(x))
 x2[seq(1,length(x2),2)] <- x
 x2[seq(2,length(x2),2)] <- x
 y2[seq(1,length(x2),2)] <- sin(4*pi*x/length(x)) + rnorm(length(x))
 y2[seq(2,length(x2),2)] <- runif(length(x),min=-5,max=5)
 #===

 #=robust & smooth fit===
 y3 <- runmed(y2,51,endrule="median")  # first round of running median
 y4 <- smooth.spline(x2,y3,df=10)      # second round of smoothing splines
 #===

 #=plotting data==
 plot(x2,y2,pch=19,cex=0.1)
 points(x2,y3,col="red",pch=19,cex=0.1)    # running median
 points(y4,col="green",pch=19,cex=0.1)     # smoothing splines
 #===

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html



 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problem for wtd.quantile()

2006-03-16 Thread roger koenker
Certainly an improvement, but probably not what is really
wanted... I get:

  rq(x ~ 1, weights=w,tau = c(.01,.25,.5,.75,.99))
Call:
rq(formula = x ~ 1, tau = c(0.01, 0.25, 0.5, 0.75, 0.99), weights = w)

Coefficients:
 tau= 0.01 tau= 0.25 tau= 0.50 tau= 0.75 tau= 0.99
(Intercept) 1 1 2 3 5

Degrees of freedom: 5 total; 4 residual

The first observation x=1 has weight .33  so it should be the
.25 quantile, unless there is some interpolation going on

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Mar 16, 2006, at 7:34 AM, Liaw, Andy wrote:

 Perhaps you're looking for this?

 ?wtd.quantile
 wtd.quantile(x,weights=w, normwt=TRUE)
   0%  25%  50%  75% 100%
    1    2    2    3    5

 Andy

 From: Jing Yang

 Dear R-users,

 I don't know if there is a problem in wtd.quantile (from
 library Hmisc):
 
 x <- c(1,2,3,4,5)
 w <- c(0.5,0.4,0.3,0.2,0.1)
 wtd.quantile(x,weights=w)
 ---
 The output is:
   0%  25%  50%  75% 100%
 3.00 3.25 3.50 3.75 4.00

 The version of R I am using is: 2.1.0

 Best,Jing



 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] transforming data frame for use with persp

2006-02-13 Thread roger koenker

a strategy for this that I  use is just

persp(interp(x,y,z))

where interp is from the akima package, and x,y,z are all
of the same length.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Feb 13, 2006, at 3:07 PM, Denis Chabot wrote:

 Hi,

 This is probably documented, but I cannot find the right words or
 expression for a search. My attempts failed.

 I have a data frame of 3 vectors (x, y and z) and would like to
 transform this so that I could use persp. Presently I have y-level
 copies of each x level, and a z value for each x-y pair. I need 2
 columns giving the possible levels of x and y, and then a
 transformation of z from a long vector into a matrix of x-level rows
 and y-level columns. How do I accomplish this?

 In this example, I made a set of x and y values to get predictions
 from a GAM, then combined them with the predictions into a data
 frame. This is the one I'd like to transform as described above:

 My.data <- expand.grid(Depth=seq(40,220, 20), Temp=seq(-1, 6, 0.5))
 predgam <- predict.gam(dxt.gam, My.data, type="response")
 pred.data <- data.frame(My.data, predgam)

 pred.data has 150 lines and 3 columns.
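
 For a complete grid like this one, a hedged alternative to interpolation
 is a plain reshape (a sketch with a stand-in for predgam; it relies on
 expand.grid() varying its first factor fastest):

 Depth <- seq(40, 220, 20)
 Temp  <- seq(-1, 6, 0.5)
 My.data <- expand.grid(Depth = Depth, Temp = Temp)
 z <- with(My.data, sin(Depth/50) + Temp/10)    # stand-in for predgam
 z.mat <- matrix(z, nrow = length(Depth), ncol = length(Temp))
 persp(Depth, Temp, z.mat, theta = 30, phi = 30)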

 Thanks for your help,

 Denis Chabot

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] rob var/cov + LAD regression

2006-02-08 Thread roger koenker

On Feb 8, 2006, at 10:22 AM, Angelo Secchi wrote:


 1. Is there a function to have a  jackknifed corrected  var/cov  
 estimate (as described in MacKinnon and White 1985) in a standard  
 OLS regression?

package:  sandwich

 2. Does R possess a LAD (Least Absolute Deviation) regression  
 function?

package:  quantreg

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] appeal --- add sd to summary for univariates

2006-02-06 Thread roger koenker

On Feb 6, 2006, at 2:34 PM, ivo welch wrote:

 Aside, a logical ordering might also be:
mean sd min q1 med q3 max
 rather than have mean buried in between order statistics.

Just where it belongs, IMHO

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Tobit estimation?

2006-01-19 Thread roger koenker
For adventurous, but skeptical souls who lack faith in the usual
Gaussian tobit assumptions, I could mention that there is new
fcen  method for the quantreg rq() function that implements
Powell's tobit estimator using an algorithm of Bernd Fitzenberger.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Jan 19, 2006, at 6:04 AM, Achim Zeileis wrote:

 On Thu, 19 Jan 2006 14:05:58 +0530 Ajay Narottam Shah wrote:

 Folks,

 Based on
   http://www.biostat.wustl.edu/archives/html/s-news/1999-06/ 
 msg00125.html

 I thought I should experiment with using survreg() to estimate tobit
 models.

 I've been working on a convenience interface to survreg() that  
 makes it
 particularly easy to fit tobit models re-using the survreg()
 infrastructure. The package containing the code will hopefully be
 release soon - anyone who wants a devel snapshot, please contact me
 off-list.
 Ajay, I'll send you the code in a separate mail.

 Best,
 Z

 I start by simulating a data frame with 100 observations from a tobit
 model

 x1 <- runif(100)
 x2 <- runif(100)*3
 ystar <- 2 + 3*x1 - 4*x2 + rnorm(100)*2
 y <- ystar
 censored <- ystar <= 0
 y[censored] <- 0
 D <- data.frame(y, x1, x2)
 head(D)
   y x1x2
 1 0.000 0.86848630 2.6275703
 2 0.000 0.88675832 1.7199261
 3 2.7559349 0.38341782 0.6247869
 4 0.000 0.02679007 2.4617981
 5 2.2634588 0.96974450 0.4345950
 6 0.6563741 0.92623096 2.4983289

 # Estimate it
 library(survival)
 tfit <- survreg(Surv(y, y > 0, type='left') ~ x1 + x2,
   data=D, dist='gaussian', link='identity')

 It says:

   Error in survreg.control(...) : unused argument(s) (link ...)
   Execution halted

 My competence on library(survival) is zero. Is it still the case that
 it's possible to be clever and estimate the tobit model using
 library(survival)?

 I also saw the two-equation setup in the micEcon library. I haven't
 yet understood when I would use that and when I would use a straight
 estimation of a censored regression by MLE. Can someone shed light on
 that? My situation is: Foreign investment on the Indian stock
 market. Lots of firms have zero foreign investment. But many do have
 foreign investment. I thought this is a natural tobit situation.

 -- 
 Ajay Shah
 http://www.mayin.org/ajayshah
 [EMAIL PROTECTED]
 http://ajayshahblog.blogspot.com *(:-? - wizard who doesn't know the
 answer.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html


 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] I think simple R question

2006-01-12 Thread roger koenker
see ?rle
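
To spell that out, a minimal sketch against the example quoted below
(run ends come from cumsum() of the run lengths):

x <- c(1, -1, 1, 1, 1, -1, 1, 1, -1, -1)
H <- 3
r <- rle(x)
ends <- cumsum(r$lengths)        # last index of each run
hit  <- which(r$lengths >= H)    # runs at least H long
out  <- numeric(length(x))
pos  <- ends[hit] + 1            # the "next spot" after each such run
keep <- pos <= length(x)         # a run may end at the last element
out[pos[keep]] <- sign(r$values[hit][keep])
out                              # 0 0 0 0 0 1 0 0 0 0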


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Jan 12, 2006, at 9:56 AM, Mark Leeds wrote:

 I have a vector x with #'s ( 1 or -1 in them ) in it and I want to
 mark a new vector with the sign of the value of the a streak
 of H where H = some number ( at the next spot in the vector )

 So, say H was equal to 3 and
 I had a vector of

 [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10]

   1   -1    1    1    1   -1    1    1   -1   -1

 then, I would want a function to return a new
 vector of

 [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10]

   0    0    0    0    0    1    0    0    0    0

 As I said, I used to do these things like this
 it's been a while and I'm rusty with this stuff.

 Without looping is preferred but looping is okay
 also.

   Mark

 **
 This email and any files transmitted with it are confidentia... 
 {{dropped}}

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] update?

2006-01-02 Thread roger koenker
I'm having problems with environments and update() that
I expect have a simple explanation.  To illustrate, suppose
I wanted to make a very primitive Tukey one-degree-of-
freedom for nonadditivity test and naively wrote:

nonadd <- function(formula){
 f <- lm(formula)
 v <- f$fitted.values^2
 g <- update(f, . ~ . + v)
 anova(f,g)
 }

x <- rnorm(20)
y <- rnorm(20)
nonadd(y ~ x)

Evidently, update is looking in the environment producing f and
doesn't find v, so I get:

Error in eval(expr, envir, enclos) : Object v not found

This may (or may not) be related to the discussion at:
http://bugs.r-project.org/cgi-bin/R/Models?id=1861;user=guest

but in any case I hope that someone can suggest how such
difficulties can be circumvented.
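
A minimal sketch of one workaround (passing the data along explicitly, so
that the constructed variable is visible when update() re-evaluates the
call; whether this is the best idiom I leave open):

nonadd <- function(formula, data){
 f <- lm(formula, data = data)
 data$v <- f$fitted.values^2           # augment the data, not the env
 g <- update(f, . ~ . + v, data = data)
 anova(f,g)
 }
nonadd(y ~ x, data.frame(x = rnorm(20), y = rnorm(20)))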


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] GLM Logit and coefficient testing (linear combination)

2005-12-18 Thread roger koenker
see ?anova.glm
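
A hedged sketch of that route on simulated data: under H0 the two
coefficients are equal, so the predictor enters only through x1 + x2
and the restricted fit is nested in the full one.

set.seed(1)
x1 <- rnorm(200); x2 <- rnorm(200)
y  <- rbinom(200, 1, plogis(0.5 + x1 + 1.5*x2))
full  <- glm(y ~ x1 + x2, family = binomial)
restr <- glm(y ~ I(x1 + x2), family = binomial)
anova(restr, full, test = "Chisq")    # LR test of beta1 = beta2
# and the pseudo R2 computed as the question describes:
null <- glm(y ~ 1, family = binomial)
1 - as.numeric(logLik(full))/as.numeric(logLik(null))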

On Dec 18, 2005, at 10:32 AM, David STADELMANN wrote:

 Hi,

 I am running glm logit regressions with R and I would like to test a
 linear combination of coefficients (H0: beta1=beta2 against H1:
 beta1beta2). Is there a package for such a test or how can I perform
 it otherwise (perhaps with logLik() ???)?

 Additionally I was wondering if there was no routine to calculate  
 pseudo
 R2s for logit regressions. Currently I am calculating the pseudo R2 by
 comparing the maximum value of the log-Likelihood-function of the  
 fitted
 model with the maximum log-likelihood-function of a model containing
 only a constant. Any better ideas?

 Thanks a lot for your help.
 David

 ##
 David Stadelmann
 Seminar für Finanzwissenschaft
 Université de Fribourg
 Bureau F410
 Bd de Pérolles 90
 CH-1700 Fribourg
 SCHWEIZ

 Tel: +41 (026) 300 93 82
 Fax: +41 (026) 300 96 78
 Tel (priv): +41 (044) 586 78 99
 Mob (priv): +41 (076) 542 33 48
 Email: [EMAIL PROTECTED]
 Internet: http://www.unifr.ch/finwiss
 Internet (priv): http://david.stadelmann-online.com

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] quantile regression problem

2005-12-10 Thread roger koenker
Since almost all (>95%) of the observations are concentrated at x=0
and x=1,
any fitting you do is strongly influenced by what would be obtained
by simply fitting quantiles at these two points and interpolating, and
extrapolating according to your favored model.  I did the following:

require(quantreg)
formula <- log(y) ~ x

plot(x,y)
z <- 1:30/10
for(tau in 10:19/20){
 f <- rq(formula,tau = tau)
 lines(z,exp(cbind(1,z) %*% f$coef))
 }


url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Dec 10, 2005, at 11:30 AM, [EMAIL PROTECTED] wrote:

 Dear List members,

 I would like to ask for advise on quantile regression in R.

 I am trying to perform an analysis of a relationship between  
 species abundance and its habitat requirements -
 the habitat requirements are, however, codes - 0,1,2,3... where
 0<1<2<3 and the scale is linear - so I would be happy to treat them
 as continuous

 The analysis of the data somehow does not work, I am trying to  
 perform linear quantile regression using rq function and I cannot  
 figure out whether there is a way to analyse the data using  
 quantile regression ( I would really like to do this since the  
 shape is an envelope) or whether it is not possible.

 I tested that if I replace the categories with continuous data of  
 the same range it works perfectly. In the form I have them ( and I  
 cannot change it) I am getting
  errors - mainly about non-positive fis.

 Could somebody please let me know whether there was a way to  
 analyse the data?
 The data are enclosed and the question is
 Is there a relationship between abundance and absdeviation?
 I am interested in the upperlimit so I wanted to analyze the upper 5%.

 Thanks a lot for your help

 All the best

 Zuzana Munzbergova

 www.natur.cuni.cz/~zuzmun
 GSS1a.txt
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Matrix of dummy variables from a factor

2005-12-06 Thread roger koenker

On Dec 6, 2005, at 3:27 PM, Berton Gunter wrote:

 But note: There are (almost?) no situations in R where the dummy  
 variables
 coding is needed. The coding is (almost?) always handled properly  
 by the
 modeling functions themselves.

 Question: Can someone provide a straightforward example where the  
 dummy
 variable coding **is** explicitly needed?


Bert's  question offers an opportunity for me to mention (again) my  
long standing wish
for someone to write a version of model.matrix that directly produced  
a matrix
in one of the common  sparse matrix formats.   This could be a good   
project for one of
you who like using ; ?
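
In the meantime, a hedged sketch of the obvious (wasteful) workaround --
build the dense matrix first, then coerce, which of course forfeits the
memory savings one wants for really large problems:

library(SparseM)
f <- factor(sample(letters[1:5], 20, replace = TRUE))
X <- model.matrix(~ f - 1)     # explicit dummy columns
Xs <- as.matrix.csr(X)         # the same matrix in sparse csr storage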

Roger

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Closed form for regression splines

2005-12-05 Thread roger koenker
you can do:

X <- model.matrix(formula, data = your.data)
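
And if the closed form itself is wanted (see the question below), a hedged
numerical sketch: on each inter-knot interval the spline is exactly a cubic,
so four evaluations pin down its coefficients.

library(splines)
x <- seq(0, 1, length = 200)
B <- bs(x, df = 6)                     # toy basis; substitute the fitted one
beta <- runif(ncol(B))                 # stand-in for fitted coefficients
knots <- c(0, attr(B, "knots"), 1)
piece <- function(a, b) {
  xx <- seq(a, b, length = 4)          # 4 points determine a cubic
  yy <- c(predict(B, xx) %*% beta)
  coef(lm(yy ~ xx + I(xx^2) + I(xx^3)))
}
lapply(seq_len(length(knots) - 1),
       function(i) piece(knots[i], knots[i + 1]))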


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Dec 5, 2005, at 7:36 AM, Stephen A Roberts wrote:


 Greetings,

 I have a model fitted using bs() and need to be able to write down  
 a closed form for the spline function to enable the use of the  
 fitted model outside R. Does anyone know a simple way of extracting  
 the piecewise cubics from the coefficients and knots? As far as I  
 know they are defined by recurrence relationships, but the R  
 implementation is buried in C code, and I guess is non-trivial to
 invert. I know about predict.bs() within R, but I want the full  
 piecewise cubic.

 Steve.

   Dr Steve Roberts
   [EMAIL PROTECTED]

 Senior Lecturer in Medical Statistics,
 Biostatistics Group,
 University of Manchester,

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Robust Non-linear Regression

2005-11-13 Thread roger koenker
you might consider nlrq() in the quantreg package, which does median
regression for nonlinear response functions
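
A hedged sketch on the test data from the question (nlrq() fitting the
four-parameter logistic at the median; the start values are just guesses):

library(quantreg)
x <- rep(seq(-5, -2, length = 50), 4)
y <- SSfpl(x, 0, 100, -3.5, 1) + rnorm(length(x), sd = 5)
y[sample(seq_along(y), floor(length(y)/50))] <- 200   # 2% outliers
fit <- nlrq(y ~ SSfpl(x, A, B, xmid, scal), tau = 0.5,
            start = list(A = 0, B = 100, xmid = -3.5, scal = 1))
summary(fit)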


url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Nov 13, 2005, at 3:47 PM, Vermeiren, Hans [VRCBE] wrote:

 Hi,

 I'm trying to use Robust non-linear regression to fit dose response  
 curves.
 Maybe I didn't look hard enough, but I didn't find robust methods for
 NON-linear regression implemented in R. A method that looked good to me
 but is unfortunately not (yet) implemented in R is described in
 http://www.graphpad.com/articles/RobustNonlinearRegression_files/frame.htm


 in short: instead of using the premise that the residuals are gaussian,
 they propose a Lorentzian distribution; instead of minimizing the squared
 residuals SUM (Y-Yhat)^2, the objective function is now

 SUM log(1 + ((Y-Yhat)/RobustSD)^2)

 where RobustSD is the 68th percentile of the absolute values of the
 residuals

 my question is: is there a smart and elegant way to change the objective
 function from squared distance D^2 to log(1 + D^2/Rsd^2)?

 or alternatively to write this as a weighted non-linear regression  
 where the
 weights are recalculated during the iterations
 in nlme it is possible to specify weights, possibly that is the way  
 to do
 it, but I didn't manage to get it working
 the weights should then be something like:

 SUM (log(1+(resid(.)/quantile(all_residuals,0.68))^2)) / SUM (resid(.))

 the test data I use:
 x <- seq(-5,-2,length=50)
 x <- rep(x,4)
 y <- SSfpl(x,0,100,-3.5,1)
 y <- y+rnorm(length(y),sd=5)
 y[sample(1:length(y),floor(length(y)/50))] <- 200  # add 2% outliers at 200

 thanks a lot

 Hans Vermeiren


 [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] open source and R

2005-11-13 Thread roger koenker

 On Nov 13, 2005, at 3:24 PM, Robert wrote:


 I am curious about one thing: since the reason for using R is that it is
 an easy-to-learn language and it is good for getting more people
 involved, why do most of the packages written in R use other
 languages such as FORTRAN? I understand some functions have
 already been written in another language, or are faster when
 implemented in another language. But my understanding is that if the user
 does not know that language (for example, FORTRAN), the package is
 still a black box to him, because he cannot improve the package
 and cannot be involved in the development.
 When I searched the packages of R, I saw many packages with  
 duplicated or similar functions. the main difference among them  
 are the different functions implemented using other languages,  
 which are always a black box to the users. So it is very hard for  
 users to believe the package will run something they need, let  
 alone getting involved in the development.



 No, the box is not black, it is utterly transparent.  Of course, what
 you can recognize and understand inside depends on you.  Just say no
 to linguistic chauvinism -- even R-ism.


 url:www.econ.uiuc.edu/~rogerRoger Koenker
 email   [EMAIL PROTECTED]   Department of  
 Economics
 vox:217-333-4558University of Illinois
 fax:217-244-6678Champaign, IL 61820



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] elements in a matrix to a vector

2005-11-09 Thread roger koenker
If you are really looking for a way to extract the non-zero elements you
can use something like the following:

  library(SparseM)
  > A
       [,1] [,2] [,3]
  [1,]    0    0    3
  [2,]    2    0    0
  [3,]    0    4    0
  > as.matrix.csr(A)@ra
  [1] 3 2 4

there is a tolerance parameter in the coercion to sparse representation
to decide what is really zero -- by default this is eps = .Machine$double.eps.
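
For completeness, a hedged base-R alternative (no tolerance handling;
transposing first makes the values come out in row order, matching the
example above):

A <- rbind(c(0,0,3), c(2,0,0), c(0,4,0))
tA <- t(A)
tA[tA != 0]    # 3 2 4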


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Nov 9, 2005, at 10:14 AM, Mike Jones wrote:

 hi all,

 i'm trying to get elements in a matrix into a vector.  i need a
 streamlined way to do it as the way i'm doing it is not very
 serviceable.  an example is a 3x3 matrix like

 0 0 3
 2 0 0
 0 4 0

 to a vector like

 3 2 4

 thanks...mj

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting- 
 guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] rgl.snapshot failed

2005-06-10 Thread roger koenker
I've installed the rgl package on a Suse x86-64 machine (further details
below) and it produces nice screen images.  Unfortunately, rgl.snapshot's
attempts to make png files produce only the response "failed".  For other
graphics png() works fine, and capabilities() indicates that it is there.
If anyone has a suggestion of what might be explored at this point I'd be
very appreciative.

platform x86_64-unknown-linux-gnu
arch x86_64
os   linux-gnu
system   x86_64, linux-gnu
status
major2
minor1.0
year 2005
month04
day  18
language R

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Robustness of Segmented Regression Contributed by Muggeo

2005-06-08 Thread roger koenker
You might try rqss() in the quantreg package.  It gives piecewise linear fits
for a nonparametric form of median regression, using total variation of the
derivative of the fitted function as a roughness penalty.  A tuning parameter
(lambda) controls the number of distinct segments.  More details are
available in the vignette for the quantreg package.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Jun 8, 2005, at 7:25 AM, Park, Kyong H Mr. RDECOM wrote:



Hello, R users,
I applied the segmented regression method contributed by Muggeo and got
different slope estimates depending on the initial break points. The results
are listed below and I'd like to know what a reasonable approach for handling
this kind of problem is. I think applying various initial break points is
certainly not an efficient approach. Is there any other method to deal with
segmented regression? From a graph, v shapes are clearer at 1.2 and 1.5
break points than at 1.5 and 1.7. Appreciate your help.

Result1:
Initial break points are 1.2 and 1.5. The estimated break points  
and slopes:


Estimated Break-Point(s):
            Est.   St.Err
Mean.Vel   1.285  0.05258
           1.652  0.01247

          Est.       St.Err.    t value    CI(95%).l   CI(95%).u
slope1    0.4248705  0.3027957   1.403159  -0.1685982   1.018339
slope2    2.3281445  0.3079903   7.559149   1.7244946   2.931794
slope3    9.5425516  0.7554035  12.632390   8.0619879  11.023115

Adjusted R-squared: 0.9924.

Result2:
Initial break points are 1.5 and 1.7. The estimated break points  
and slopes:


Estimated Break-Point(s):
            Est.   St.Err
Mean.Vel   1.412  0.02195
           1.699  0.01001

          Est.        St.Err.    t value    CI(95%).l   CI(95%).u
slope1    0.7300483   0.1381587   5.284129   0.4592623    1.000834
slope2    3.4479466   0.2442530  14.116289   2.9692194    3.926674
slope3   12.500       1.7783840   7.028853   9.0144314   15.985569


Adjusted R-squared: 0.995.




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting- 
guide.html





__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] make install on solaris 10

2005-06-06 Thread roger koenker
We have recently upgraded to Solaris 10 on a couple of sparc machines  
with the usual
mildly mysterious consequences for library locations, etc, etc.  I've  
managed to configure

R 2.1.0 for a 64 bit version with:

R is now configured for sparc-sun-solaris2.10

  Source directory:  .
  Installation directory:/usr/local

  C compiler:gcc -m64 -g -O2
  C++ compiler:  g++  -m64 -fPIC
  Fortran compiler:  g77  -m64 -g -O2

  Interfaces supported:  X11
  External libraries:readline
  Additional capabilities:   PNG, JPEG, MBCS, NLS
  Options enabled:   R profiling

  Recommended packages:  yes

configure:47559: WARNING: you cannot build info or html versions of  
the R manuals


and make and make check seem to run smoothly, however make install  
dies with

the following messages:

ysidro.econ.uiuc.edu# make install
installing doc ...
creating doc/html/resources.html
*** Error code 255
The following command caused the error:
false --html --no-split --no-headers ./resources.texi -o ../html/ 
resources.html

make: Fatal error: Command failed for target `../html/resources.html'
Current working directory /usr/local/encap/R-2.1.0/doc/manual
installing doc/html ...
installing doc/html/search ...
/usr/local/bin/install: resources.html: No such file or directory
*** Error code 1
The following command caused the error:
for f in resources.html; do \
  /usr/local/bin/install -c -m 644 ${f} /usr/local/lib/R/doc/html; \
done
make: Fatal error: Command failed for target `install'
Current working directory /usr/local/encap/R-2.1.0/doc/html
*** Error code 1
The following command caused the error:
for d in html manual; do \
  (cd ${d}  make install) || exit 1; \
done
make: Fatal error: Command failed for target `install'
Current working directory /usr/local/encap/R-2.1.0/doc
*** Error code 1
The following command caused the error:
for d in m4 tools doc etc share src po tests; do \
  (cd ${d}  make install) || exit 1; \
done
make: Fatal error: Command failed for target `install'

and running R from the bin directory gives:

> capabilities()
    jpeg      png    tcltk      X11 http/ftp  sockets   libxml     fifo
   FALSE    FALSE    FALSE    FALSE     TRUE     TRUE     TRUE     TRUE
  cledit  IEEE754    iconv
    TRUE     TRUE    FALSE

Any suggestions would be greatly appreciated.  With solaris 9 we had a
64 bit build but never encountered such problems, and I don't see anything
in the archives or the install manual that is relevant -- but of course,
I'm not very clear about what I'm looking for either.

Roger


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] make install on solaris 10

2005-06-06 Thread roger koenker


On Jun 6, 2005, at 10:08 AM, Peter Dalgaard wrote:




It's your missing (or outdated) makeinfo that is coming back to bite
you. However, I'm a bit confuzzled because we do ship resources.html
et al. as part of the R tarball, so there shouldn't be a need to build
them. Were you building from an SVN checkout?

The way out is to install texinfo 4.7 or better. If you have the .html
files, you might be able to get by just by touch-ing or copying them.


On Jun 6, 2005, at 9:14 AM, Prof Brian Ripley wrote:


As far as I can see something has deleted doc/html/resources.html:  
it is in the tarball. I cannot immediately guess what: have you  
done any sort of `make clean'?


Copying it from the virgin sources and doing `make install' again  
should fix this: if not perhaps you can keep an eye on what is  
apparently removing it.


BTW, where did /usr/local/bin/install come from?  If that is not  
doing what is expected, it could be the problem.


Having:

1.  Downloaded a fresh version of R-devel
2.  Installed texinfo 4.8
3.  moved my rogue /usr/local/bin/install file out of the way

R now builds and installs fine.  It looks like X11 support is still  
missing

but presumably just needs -L/usr/openwin/lib/sparcv9.  Some further
investigation is needed for png, jpeg and tctlk support, but this can  
wait

for a little while.

Thanks very much for your help.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Piecewise Linear Regression

2005-05-30 Thread roger koenker
It is conventional to fit piecewise linear models by assuming  
Gaussian error and
using least squares methods, but one can argue that median regression  
provides
a more robust approach to this problem.  You might consider the  
following fit:


 x = c(6.25,6.25,12.50,12.50,18.75,25.00,25.00,25.00,31.25,31.25,37.50,37.50,50.00,50.00,62.50,62.50,75.00,75.00,75.00,100.00,100.00)
 y = c(0.328,0.395,0.321,0.239,0.282,0.230,0.273,0.347,0.211,0.210,0.259,0.186,0.301,0.270,0.252,0.247,0.277,0.229,0.225,0.168,0.202)

library(quantreg)
plot(x,y)
fit <- rqss(y ~ qss(x))
plot(fit)

it gives 5 segments not 3, but this can be controlled by the choice
of lambda in the qss function, for example, try:

fit <- rqss(y ~ qss(x, lambda = 3))
plot(fit, col = "red")

which gives a fit like you suggest might be reasonable with only
three segments.




url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On May 30, 2005, at 6:38 PM, Abhyuday Mandal wrote:


Hi,

I need to fit a piecewise linear regression.

x = c(6.25,6.25,12.50,12.50,18.75,25.00,25.00,25.00,31.25,31.25,37.50,37.50,50.00,50.00,62.50,62.50,75.00,75.00,75.00,100.00,100.00)
y = c(0.328,0.395,0.321,0.239,0.282,0.230,0.273,0.347,0.211,0.210,0.259,0.186,0.301,0.270,0.252,0.247,0.277,0.229,0.225,0.168,0.202)


there are two change points. so the fitted curve should look like



[ASCII sketch: a decreasing segment, then an increasing one, then a
decreasing one -- a \/\ shape]

How do I do this in R ?

Thank you,
Abhyuday

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting- 
guide.html




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plotting image/contour on irregular grid

2005-05-06 Thread roger koenker

On May 6, 2005, at 2:45 PM, Roger Bivand wrote:
On Fri, 6 May 2005, m p wrote:
Hello,
I'd like to make a z(x,y) plot for irregularly spaced
x,y. What are routines are available in R for this
purpose?
One possibility is to interpolate a regular grid using interp() in the
akima package, then use image() or contour(). Another is to use
levelplot() with formula z ~ x + y in the lattice package, and the
equivalent contourplot(); here, the x,y pairs must lie on a grid,  
but do
not need to fill the grid (so are regularly spaced with missing grid
cells).

You could also try tripack and rgl.triangles to produce piecewise linear
surfaces on the Delaunay triangulation of the x,y points.
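
A hedged sketch of the first route (akima's interp() onto a regular grid,
then image() with contour() on top):

library(akima)
n <- 200
x <- runif(n); y <- runif(n)
z <- sin(2*x) + cos(3*y)      # z(x,y) at scattered points
g <- interp(x, y, z)          # list with $x, $y and a $z grid
image(g)
contour(g, add = TRUE)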
Roger
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] normality test

2005-04-28 Thread roger koenker
For my money, Frank's comment should go into fortunes.  It seems a
rather Sisyphean battle to keep the lessons of robustness on the
statistical table, but it is nevertheless well worthwhile.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Apr 28, 2005, at 7:46 AM, Frank E Harrell Jr wrote:
Usually (but not always) doing tests of normality reflect a lack of 
understanding of the power of rank tests, and an assumption of high 
power for the tests (qq plots don't always help with that because of 
their subjectivity).  When possible it's good to choose a robust 
method.  Also, doing pre-testing for normality can affect the type I 
error of the overall analysis.

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt 
University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] standard errors for orthogonal linear regression

2005-04-28 Thread roger koenker
Wayne Fuller's Measurement Error Models is a good reference.
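
Failing a closed form, a hedged sketch: the orthogonal-regression slope is
the direction of the first principal component (assuming equal error
variances in x and y), and a bootstrap gives a serviceable standard error.

set.seed(1)
x <- rnorm(100); y <- 2*x + rnorm(100)
tls.slope <- function(x, y) {
  v <- eigen(cov(cbind(x, y)))$vectors[, 1]   # first PC direction
  v[2]/v[1]
}
b <- replicate(999, {i <- sample(length(x), replace = TRUE)
                     tls.slope(x[i], y[i])})
c(slope = tls.slope(x, y), se = sd(b))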
url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Apr 28, 2005, at 1:19 PM, [EMAIL PROTECTED] wrote:
Could someone please help me by giving me a reference to how one 
computes standard errors for the coefficients in an orthogonal linear 
regression, or perhaps someone has some R code? (I would accept a 
derivation or formula, but as a former teacher, I know how that can 
rankle.) I tried to imitate what's done in the code for lm() but went 
astray somewhere and got nonsense.

(This type of modeling goes by several names: total least squares, 
errors in variables, orthogonal distance regression (ODR), depending 
on where you are coming from.)

I have found ODRpack, but I haven't yet plowed through the Fortran to 
see if what I need is there; I'm working on it.
Thanks!

David L. Reiner
 
Rho Trading
440 S. LaSalle St -- Suite 620
Chicago  IL  60605
 
312-362-4963 (voice)
312-362-4941 (fax)
 
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Construction of a large sparse matrix

2005-04-18 Thread roger koenker
The dense blocks are too big, as Reid has already written --
for smaller instances of this sort of thing I would suggest that the
kronecker
product %x% operator in SparseM would be more convenient.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
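
A small sketch of that suggestion, with toy block sizes:

library(SparseM)
k <- 4
I3 <- as.matrix.csr(diag(3))           # sparse 3 x 3 identity
J  <- as.matrix.csr(matrix(1, k, k))   # k x k block of ones (the "unity" J)
V  <- I3 %x% J                         # block diagonal with three J blocks
dim(V)                                 # 12 x 12, still stored in csr form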
On Apr 18, 2005, at 3:54 PM, Doran, Harold wrote:
Dear List:
I'm working to construct a very large sparse matrix and have found
relief using the SparseM package. I have encountered an issue that is
confusing to me and wonder if anyone may be able to suggest a smarter
solution. The matrix I'm creating is a covariance matrix for a larger
research problem that is subsequently used in a simulation. Below is 
the
latex form of the matrix if anyone wants to see the pattern I am trying
to create.

The core of my problem seems to localize to the last line of the
following portion of code.
n <- sample.size * 4
k <- n / 4
vl.mat <- as.matrix.csr(0, n, n)
block <- 1:k   # each submatrix size
for (i in 1:3) vl.mat[i*k + block, i*k + block] <- LE
When the variable LE is 0, the matrix is easily created. For example,
when sample.size = 10,000 this matrix was created on my machine in 
about
1 second. Here is the object size.

object.size(vl.mat)
[1] 160692
However, when LE is any number other than 0, the code generates an
error. For example, when I try LE <- 2 I get
Error: cannot allocate vector of size 781250 Kb
In addition: Warning message:
Reached total allocation of 1024Mb: see help(memory.size)
Error in as.matrix.coo(as.matrix.csr(value, nrow = length(rw), ncol =
length(cl))) :
Unable to find the argument x in selecting a method for
function as.matrix.coo
I'm guessing that single digit integers should occupy the same amount 
of
memory. So, I'm thinking that the matrix is less sparse and the
problem is related to the introduction of a non-zero element (seems
obvious). However, the matrix still retains a very large proportion of
zeros. In fact, there are still more zeros than non-zero elements.

Can anyone suggest a reason why I am not able to create this matrix? 
I'm
at the limit of my experience and could use a pointer if anyone is able
to provide one.

Many thanks,
Harold
P.S. The matrix above is added to another matrix to create the
covariance matrix below. The code above is designed to create the
portion of the matrix \sigma^2_{vle}\bm{J} .
\begin{equation}
\label{vert:cov}
\bm{\Phi} = var
\left[
\begin{array}{c}
Y^*_{1}\\
Y^*_{2}\\
Y^*_{3}\\
Y^*_{4}\\
\end{array}
\right]
=
\left[
\begin{array}{cccc}
\sigma^2_{\epsilon}\bm{I} & \sigma^2_{\epsilon}\rho\bm{I} & \bm{0} & \bm{0}\\
\sigma^2_{\epsilon}\rho\bm{I} & \sigma^2_{\epsilon}\bm{I}+\sigma^2_{vle}\bm{J} & \sigma^2_{\epsilon}\rho^2\bm{I} & \bm{0}\\
\bm{0} & \sigma^2_{\epsilon}\rho^2\bm{I} & \sigma^2_{\epsilon}\bm{I}+\sigma^2_{vle}\bm{J} & \sigma^2_{\epsilon}\rho^3\bm{I}\\
\bm{0} & \bm{0} & \sigma^2_{\epsilon}\rho^3\bm{I} & \sigma^2_{\epsilon}\bm{I}+\sigma^2_{vle}\bm{J}\\
\end{array}
\right]
\end{equation}
where $\bm{I}$ is the identity matrix, $\bm{J}$ is the unity matrix, 
and
$\rho$ is the autocorrelation.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] French Curve

2005-04-06 Thread roger koenker
On Apr 6, 2005, at 1:48 AM, Martin Maechler wrote:
Median filtering, a.k.a. running medians, has one distinctive
advantage {over smooth.spline() or other so-called linear smoothers}:
   it is robust, i.e., not distorted by gross outliers.
Running medians is implemented in runmed() {standard stats package}
in a particularly optimized way rather than using the more general
running(.) approach of package 'gtools'.
Median smoothing splines are also implemented in the quantreg
package; see ?rqss.  But they produce piecewise linear fits, so
they may not appeal to those accustomed to French curves.
url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
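
A tiny illustration of the robustness point (simulated data with one
gross outlier; the names are invented):

set.seed(1)
x <- seq(0, 1, length = 101)
y <- sin(2 * pi * x) + rnorm(101, sd = 0.1)
y[50] <- 10                            # one gross outlier
plot(x, y)
lines(x, runmed(y, k = 7), col = 2)    # running medians: barely perturbed
lines(smooth.spline(x, y), col = 4)    # linear smoother: pulled off course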
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] off-topic question: Latex and R in industries

2005-04-06 Thread roger koenker
My favorite answer to this question is: because there is no one to sue.
url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Apr 6, 2005, at 10:38 AM, Wensui Liu wrote:
Latex and R are really cool stuff. I am just wondering how they are
used in industry. But based on my own experience, very rare. Why?
How about the opinion of other listers? Thanks.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] total variation penalty

2005-03-02 Thread roger koenker
On Mar 2, 2005, at 6:25 PM, Vadim Ogranovich wrote:
I was recently plowing through the docs of the quantreg package by 
Roger
Koenker and came across the total variation penalty approach to
1-dimensional spline fitting. I googled around a bit and have found 
some
papers originated in the image processing community, but (apart from
Roger's papers) no paper that would discuss its statistical aspects.
You might look at
@article{davi:kova:2001,
Author = {Davies, P. L. and Kovac, A.},
Title = {Local Extremes, Runs, Strings and Multiresolution},
Year = 2001,
Journal = {The Annals of Statistics},
Volume = 29,
Number = 1,
Pages = {1--65},
Keywords = {[62G07 (MSC2000)]; [65D10 (MSC2000)]; [62G20 (MSC2000)];
   [nonparametric regression]; [local extremes]; [runs];
   [strings]; [multiresolution analysis]; [asymptotics];
   [outliers]; [low power peaks]; nonparametric function
   estimation}
}
They are using total variation of the function rather than total 
variation of its derivative
as in the KNP paper mentioned below, but there are close connections 
between the
methods.

There are several recent papers on what Tibshirani calls the lasso vs.
other penalties for
regression problems... for example:

@article{knig:fu:2000,
Author = {Knight, Keith and Fu, Wenjiang},
Title = {Asymptotics for Lasso-type Estimators},
Year = 2000,
Journal = {The Annals of Statistics},
Volume = 28,
Number = 5,
Pages = {1356--1378},
Keywords = {[62J05 (MSC1991)]; [62J07 (MSC1991)]; [62E20 (MSC1991)];
   [60F05 (MSC1991)]; [Penalized regression]; [Lasso];
   [shrinkage estimation]; [epi-convergence in 
distribution];
   neural network models}
}
@article{fan:li:2001,
Author = {Fan, Jianqing and Li, Runze},
Title = {Variable Selection Via Nonconcave Penalized Likelihood and 
Its
Oracle Properties},
Year = 2001,
Journal = {Journal of the American Statistical Association},
Volume = 96,
Number = 456,
Pages = {1348--1360},
Keywords = {[HARD THRESHOLDING]; [LASSO]; [NONNEGATIVE GARROTE];
   [PENALIZED LIKELIHOOD]; [ORACLE ESTIMATOR]; [SCAD]; [SOFT
   THRESHOLDING]}
}

I have a couple of questions in this regard:
* Is it more natural to consider the total variation penalty in the
context of quantile regression than in the context of OLS?
Not especially; see the lasso literature, which is predominantly based
on Gaussian likelihood.  The taut string idea is also based on Gaussian
fidelity, at least in its original form.  There are some computational
conveniences involved in using l1 penalties with l1 fidelities, but with
the development of modern interior point algorithms, l1 vs l2 fidelity 
isn't really
much of a distinction.  The real question is:  do you believe in that 
old
time religion, do you have that Gaussian faith?  I don't.

* Could someone please point to a good overview paper on the subject?
Ideally something that compares merits of different penalty functions.
See above
There seems to be an ongoing effort to generalize this approach to 2d,
but at this time I am more interested in 1-d smoothing.
For the sake of completeness, the additive model component of quantreg 
is
based primarily on the following two papers:

@article{koen:ng:port:1994,
Author = {Koenker, Roger and Ng, Pin and Portnoy, Stephen},
Title = {Quantile Smoothing Splines},
Year = 1994,
Journal = {Biometrika},
Volume = 81,
Pages = {673--680}
}
@article{KM.04,
Author = {Koenker, R. and I. Mizera},
Title = {Penalized Triograms:  Total Variation Regularization 
for Bivariate Smoothing},
Journal = JRSS-B,
Volume = 66,
Pages = {145--163},
Year = 2004
}

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
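
For a concrete starting point, a minimal sketch of 1-d total variation
smoothing with rqss() (the data are simulated and lambda is an untuned
guess):

library(quantreg)
set.seed(1)
x <- sort(runif(200))
y <- sin(2 * pi * x) + rt(200, df = 2) / 10   # heavy-tailed noise
fit <- rqss(y ~ qss(x, lambda = 0.1))         # median fit, TV penalty on f'
plot(fit)                                     # plot the fitted component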
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] logit link + alternatives

2005-02-07 Thread roger koenker
Just for the record --  NEWS for 2.1.0 includes:
o   binomial() has a new cauchit link (suggested by Roger Koenker).
The MASS polr for ordered response is also now adapted for the Cauchit
case.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
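
So, with 2.1.0 or later, a minimal sketch on simulated data should be as
simple as:

set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, pcauchy(0.5 + x))   # binary response, Cauchy latent link
fit <- glm(y ~ x, family = binomial(link = "cauchit"))
summary(fit)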
On Feb 7, 2005, at 7:01 AM, (Ted Harding) wrote:
On 07-Feb-05 [EMAIL PROTECTED] wrote:
Help needed with lm function:
Dear R's,
Could anyone tell me how to replace the link function (probit, logit,
loglog, etc.) in glm
with an arbitrary user-defined function? The task is to perform ML
estimation of betas
for a dichotomous target variable.
Maybe there is already a package for this (I did not find one).
Any hints or a code excerpt would be welcome!
Thank you -Jeff
I asked a similar question last year (2 April 2004) since I wanted
a cauchy link in a binary response model (the data suggested
heavy tails). I thought in the first place that I saw a fairly
straightforward way to do it, but Brian Ripley's informed response
put me off, once I had looked into the details of what would be
involved (his reply which includes my original mail follows):
# On Fri, 2 Apr 2004 [EMAIL PROTECTED] wrote:
#
#  I am interested in extending the repertoire of link functions
#  in glm(Y~X, family=binomial(link=...)) to include a tan link:
# 
# eta = (4/pi)*tan(mu)
# 
#  i.e. this link bears the same relation to the Cauchy distribution
#  as the probit link bears to the Gaussian. I'm interested in sage
#  advice about this from people who know their way aroung glm.
# 
#  From the surface, it looks as though it might just be a matter
#  of re-writing 'make.link' in the obvious sort of way so as to
#  incorporate tan, but I fear traps ...
#
# How are you going to do that?  If you edit make.link and have your
# own local copy, the namespace scoping will ensure that the system
# copy gets used, and the code in binomial() will ensure that even
# that does not get  called except for the pre-coded list of links.
#
#  What am I missing?
#
# You need a local, modified, copy of binomial, too, AFAICS.
As I say, the implied details put me off for a while, but in
this particular case Thomas W Yee came up with a ready-made
solution (23 April 2004):
# my VGAM package at www.stat.auckland.ac.nz/~yee
# now has the tan link for binomialff().
# It is tan(pi*(mu-0.5)).
(See his full mail in the R-help archives for April 2004
for several important details regarding this implementation).
So: it would seem to be quite possible to write your own link
function, but it would take quite a bit of work and would
involve re-writing at least the code for 'make.link'
and for 'binomial', and being careful about how you use them.
Hoping this helps,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 07-Feb-05   Time: 12:57:07
-- XFMail --
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] A modified log transformation with real finite values for negatives and zeros?

2005-02-02 Thread roger koenker
Bickel and Doksum (JASA, 1981) discuss a modified version of the Box-Cox
transformation that looks like this:

y <- (sign(y) * abs(y)^lambda - 1) / lambda

and in the original Box-Cox paper there was an offset parameter that gives
rise to somewhat peculiar likelihood theory, as in the 3-parameter
log-normal, where one gets an unbounded likelihood by letting the
threshold parameter approach the first order statistic from below, but
for which the likelihood equations seem to provide a perfectly sensible
root.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
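
A quick sketch of the Bickel-Doksum transform as an R function (the
lambda values are arbitrary):

bd <- function(y, lambda) (sign(y) * abs(y)^lambda - 1) / lambda
y <- seq(-5, 5, length = 201)
plot(y, bd(y, 0.5), type = "l")   # real and finite for negatives and zero
lines(y, bd(y, 1), col = 2)       # lambda = 1 is just y - 1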
On Feb 2, 2005, at 1:28 PM, Spencer Graves wrote:
 Does anyone have any ideas (or even experience) regarding a 
modified log transformation that would assign real finite values to 
zeros and negative numbers?  I encounter this routinely in a couple of 
different situations:
 * Physical measurements that are often lognormally distributed 
except for values that are less than additive normal measurement 
error.  I'd like to take logarithms of the clearly positive values and 
assign some smaller finite number(s) for values less than or equal to 
zero.  I also might like to decompose the values into mean plus 
variance of the logs plus variance of additive normal noise.  However, 
that would require more machinery than is appropriate for exploratory 
data analysis.
 * Integers most of which are plausibly Poisson counts but include 
a few negative values.  People in manufacturing sometimes report the 
number of defects added between two steps in the process, computed 
as the difference between the number counted before and after 
intervening steps.  These counts are occasionally negative either 
because defects are removed in processing or because of a miscount 
either before or after.
 For an example, see www.prodsyse.com/log0.  There, you can also 
download working R code for such a transformation along with 
PowerPoint slides documenting some of the logic behind the code.  It's 
not included here, because it's too much for a standard R post.
 Comments?  Thanks,
 spencer graves

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] read.matrix.csr bug (e1071)?

2005-01-28 Thread roger koenker
Don't you want read.matrix.csr, not read.matrix?
url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
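
That is, assuming the rest of the code quoted below, the final call
should read read.matrix.csr("sparse.dat").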
On Jan 28, 2005, at 9:22 AM, Jeszenszky Peter wrote:
Hello,
I would like to read and write sparse matrices using the
functions write.matrix.csr() and read.matrix.csr()
of the package e1071. Writing is OK but reading back the
matrix fails:
x <- rnorm(100)
m <- matrix(x, 10)
m[m < 0.5] <- 0
m.csr <- as.matrix.csr(m)
write.matrix.csr(m, "sparse.dat")
read.matrix("sparse.dat")
	Error in initialize(value, ...) : Can't use object of class integer 
in new():  Class matrix.csr does not extend that class

Is something wrong with the code above or it must be
considered as a bug?
Best regards,
Peter
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] CIS inquiries

2005-01-24 Thread roger koenker
Does anyone have an automated way to make Current Index to Statistics
inquiries from R, or from the Unix command line?  I thought it might be
convenient to have something like this for occasions in which I'm in a
foreign domain and would like to make inquiries on my office machine
without firing up a full-fledged browser.  Lynx is OK for this purpose,
but it
might be nice to have something more specifically designed for CIS.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Peak finding algorithm

2004-12-09 Thread roger koenker
You might want to look at the ftnonpar package.  You haven't quite
specified whether
you are thinking about estimating densities, or regression functions, or
some third
option, or whether "2-dimensional" means functions R -> R or functions
R^2 -> R;
my recollection is that ftnonpar is (mostly?) about the R -> R case.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Dec 9, 2004, at 3:01 PM, Gene Cutler wrote:
I'm sure there must be various peak-finding algorithms out there.  Not 
knowing of any, I have written one myself*, but I thought I'd ask to 
see what's out there.

Basically, I have a 2-dimensional data set and I want to identify 
local peaks in the data, while ignoring trivial peaks.  My naive 
algorithm first identifies every peak and valley (points where the
slope changes sign), then shaves off shallow peaks and valleys based
on an arbitrary depth parameter, then returns whatever is left.  This 
produces decent results, but, again, I'd like to know what other 
implementations are available.

(* source available on request)
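
For comparison, a rough sketch of such a naive peak finder -- this is
not Gene's code; the names are invented and plateaus are ignored:

find.peaks <- function(y, depth = 0) {
  s <- sign(diff(y))
  pk <- which(diff(s) == -2) + 1   # slope + then -: local maxima
  vl <- which(diff(s) ==  2) + 1   # slope - then +: local minima
  if (length(pk) == 0) return(pk)
  keep <- sapply(pk, function(p) {
    left  <- if (any(vl < p)) max(vl[vl < p]) else 1
    right <- if (any(vl > p)) min(vl[vl > p]) else length(y)
    y[p] - max(y[left], y[right]) >= depth   # shave off shallow peaks
  })
  pk[keep]
}
y <- c(0, 1, 0.8, 3, 0.2, 0.4, 0.3, 2, 0)
find.peaks(y, depth = 1)   # c(4, 8): the two non-trivial peaks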
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Protocol for answering basic questions

2004-12-01 Thread roger koenker
Maybe it would be helpful to think of R-help as something more than
the Oracle of Delphi.  Questions, ideally, should be framed in such a
way that they might lead to improvements in R: extensions of the code
or, more frequently, clarifications or extensions of the documentation.
Indeed, the R-help archive itself serves this function and could
profitably
be searched prior to firing off a question to R-help.  As traffic on
R-help
increases, there is a delicate balance that must be maintained in order
to keep knowledgeable users interested in the list.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Dec 1, 2004, at 10:56 AM, James Foadi wrote:
On Wednesday 01 Dec 2004 4:46 pm, Robert Brown FM CEFAS wrote:
Understandable, but not a recipe to encourage the use of R by other than
experts. The R community needs to decide if they really only want expert
statistician users and make this clear if it is the case. Alternatively,
if they are to encourage novices, the present approach is not the way to
do it.
I perfectly agree with Robert Brown. Although I have been captivated by R,
and will keep using it, I would appreciate it if R gurus could make
this clear.
this clear.

Thanks
James
--
Dr James Foadi
Structural Biology Laboratory
Department of Chemistry
University of York
YORK YO10 5YW
UK
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] impute missing values in correlated variables: transcan?

2004-11-30 Thread roger koenker
At the risk of stirring up a hornet's nest, I'd suggest that
means are dangerous in such applications.  A nice paper
on combining ratings is: Gilbert Bassett and Joseph Persky,
Rating Skating, JASA, 1994, 1075-1079.
url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
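
A tiny illustration of the median alternative (hypothetical ratings,
one row per applicant, raters in columns):

ratings <- matrix(c(3, 4, NA, 3,
                    5, 5,  4, NA,
                    2, 3,  2, 2), nrow = 3, byrow = TRUE)
apply(ratings, 1, median, na.rm = TRUE)   # combined rating per applicant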
On Nov 30, 2004, at 10:52 AM, Jonathan Baron wrote:
I would like to impute missing data in a set of correlated
variables (columns of a matrix).  It looks like transcan() from
Hmisc is roughly what I want.  It says, "transcan automatically
transforms continuous and categorical variables to have maximum
correlation with the best linear combination of the other
variables."  And, "By default, transcan imputes NAs with best
guess expected values of transformed variables, back transformed
to the original scale."
But I can't get it to work.  I say
m1 <- matrix(1:20 + rnorm(20), 5, 4)  # four correlated variables
colnames(m1) <- paste("R", 1:4, sep = "")
m1[c(2, 19)] <- NA                    # simulate some missing data
library(Hmisc)
transcan(m1, data = m1)
and I get
Error in rcspline.eval(y, nk = nk, inclx = TRUE) :
  fewer than 6 non-missing observations with knots omitted
I've tried a few other things, but I think it is time to ask for
help.
The specific problem is a real one.  Our graduate admissions
committee (4 members) rates applications, and we average the
ratings to get an overall rating for each applicant.  Sometimes
one of the committee members is absent, or late; hence the
missing data.  The members differ in the way they use the rating
scale, in both slope and intercept (if you regress each on the
mean).  Many decisions end up depending on the second decimal
place of the averages, so we want to do better than just averaging
the non-missing ratings.
Maybe I'm just not seeing something really simple.  In fact, the
problem is simpler than transcan assumes, since we are willing to
assume linearity of the regression of each variable on the other
variables.  Other members proposed solutions that assumed this,
but they did not take into account the fact that missing data at
the high or low end of each variable (each member's ratings)
would change its mean.
Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Avoiding for-loops

2004-11-25 Thread roger koenker
The lower triangle can be obtained by
A[row(A) > col(A)]
url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
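
For instance:

A <- matrix(1:16, 4)
A[row(A) > col(A)]   # lower triangle as a vector, in column-major order
A[lower.tri(A)]      # the equivalent built-in helper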
On Nov 25, 2004, at 11:15 AM, John wrote:
Hello R-users,
I have a symmetric matrix of numerical values and I
want to obtain those values in the upper or lower
triangle of the matrix in a vector. I tried to do the
job by using two for-loops but it doesn't seem to be a
clever way, and I'd like to know a more efficient code
for a large matrix of thousands of rows and columns.
Below is my code for your reference.
Thanks a lot.
John

# mtx.sym is a symmetric matrix
my.ftn <- function(size_mtx, mtx) {
  my.vector <- c()
  for (i in 1:size_mtx) {
    cat(".")
    for (j in 1:size_mtx) {
      if (upper.tri(mtx)[i, j]) {
        my.vector <- c(my.vector, mtx[i, j])
      }
    }
  }
  cat("\n")
  my.vector  # note: the original posting never returned this vector
}

# if I have a matrix, mtx.sym, of 100x100
my.ftn(100, mtx.sym)
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] The hidden costs of GPL software?

2004-11-23 Thread roger koenker
Having just finished an index, I would like to second John's comments.
Even as an author, it is difficult to achieve some degree of
completeness and consistency.
Of course, maybe a real whizz at clustering could assemble something
very useful quite easily.  All of us who have had the frustration of 
searching
for a forgotten function would be grateful.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Nov 23, 2004, at 7:48 AM, John Fox wrote:
Dear Duncan,
I don't think that there is an automatic, nearly costless way of 
providing
an effective solution to locating R resources. The problem seems to me 
to be
analogous to indexing a book. There's an excellent description of what 
that
process *should* look like in the Chicago Manual of Style, and it's a 
lot of
work. In my experience, most book indexes are quite poor, and 
automatically
generated indexes, while not useless, are even worse, since one should 
index
concepts, not words. The ideal indexer is therefore the author of the 
book.

I guess that the question boils down to how important is it to provide 
an
analogue of a good index to R? As I said in a previous message, I 
believe
that the current search facilities work pretty well -- about as well 
as one
could expect of an automatic approach. I don't believe that there's an
effective centralized solution, so doing something more ambitious than 
is
currently available implies farming out the process to package 
authors. Of
course, there's no guarantee that all package authors will be diligent
indexers.

Regards,
 John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Duncan Murdoch
Sent: Monday, November 22, 2004 8:55 AM
To: Cliff Lunneborg
Cc: [EMAIL PROTECTED]
Subject: Re: [R] The hidden costs of GPL software?
On Fri, 19 Nov 2004 13:59:23 -0800, Cliff Lunneborg
[EMAIL PROTECTED] quoted John Fox:
Why not, as previously has been proposed, replace the current static
(and, in my view, not very useful) set of keywords in R
documentation
with the requirement that package authors supply their own
keywords for
each documented object? I believe that this is the intent of the
concept entries in Rd files, but their use certainly is not
required or
even actively encouraged. (They're just mentioned in passing in the
Writing R Extensions manual.
That would not be easy and won't happen quickly.  There are some
problems:
 - The base packages mostly don't use  \concept. (E.g. base
has 365 man pages, only about 15 of them use it).  Adding it
to each file is a fairly time-consuming task.
- Before we started, we'd need to agree as to what they are for.
Right now, I think they are mainly used when the name of a
concept doesn't match the name of the function that
implements it, e.g.
modulo, remainder, promise, argmin, assertion.  The
need for this usage is pretty rare.  If they were used for
everything, what would they contain?
 - Keywording in a useful way is hard.  There are spelling
issues (e.g. optimise versus optimize); our fuzzy matching
helps with those.
But there are also multiple names for the same thing, and
multiple meanings for the same name.
Duncan Murdoch
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] 2d approx

2004-10-14 Thread roger koenker
?interp in akima, for f: R^2 -> R.
url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
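
And, in the same grid-then-interpolate spirit as the approx() trick
described below, a rough sketch of hand-rolled bilinear interpolation
(untested against gam output; query points must lie strictly inside
the grid):

bilin <- function(xg, yg, zg, x, y) {
  i <- findInterval(x, xg)   # grid cell indices for each query point
  j <- findInterval(y, yg)
  tx <- (x - xg[i]) / (xg[i + 1] - xg[i])
  ty <- (y - yg[j]) / (yg[j + 1] - yg[j])
  (1 - tx) * (1 - ty) * zg[cbind(i, j)] +
    tx * (1 - ty) * zg[cbind(i + 1, j)] +
    (1 - tx) * ty * zg[cbind(i, j + 1)] +
    tx * ty * zg[cbind(i + 1, j + 1)]
}

Here zg would be the gam fit tabulated on the xg by yg grid.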
On Oct 14, 2004, at 8:09 PM, Vadim Ogranovich wrote:
Hi,
I am looking for a function that generalizes 'approx' to two (or more)
dimensions. The references on the approx help page point toward 
splines,
but a) splines is what I am trying to avoid in the first place and b)
splines (except for mgcv splines) seem to be one dimensional.

Here is a more detailed account. Using mgcv:gam I fit an additive model
xy.gam according to the formula y ~ s(x), which is a spline under the
hood. If I now wish to compute model prediction for new data I could 
use
predict.gam(xy.gam, newdata). However newdata will first be expanded
into a large matrix of coefficients with respect to the spline basis
functions. For example if the length of newdata is 1e6 and the size of
the basis is 100 than the matrix of coefficients is 100*1e6, i.e. huge.
The predict.gam recognizes the problem and works around it by doing a
piece-meal prediction, but this turns out to be too slow for my needs.

One way around is to tabulate s(x) on a fine enough grid and use approx
for prediction. Something like this (pseudo-code)
x.grid <- seq(min(newdata), max(newdata), length = 1000)
y.grid <- predict.gam(xy.gam, x.grid)
y.newdata <- approx(x.grid, y.grid, newdata)$y
I didn't test this, but I expect it to be dramatically faster than
predict.gam.
Unfortunately I don't know how to extend it into 2D. Your suggestions
are very welcome!
Thanks,
Vadim
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] ordered probit and cauchit

2004-09-21 Thread roger koenker
What is the current state of the R art for ordered probit models, and,
more esoterically, is there any available R strategy for ordered cauchit
models, i.e. ordered multinomial alternatives with a Cauchy link
function?  MCMC
is an option, obviously, but for a univariate latent variable model 
this seems
to be overkill... standard mle methods should be preferable.  (??)

Googling reveals that spss provides such functions... just to wave a red
flag.
url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
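
For the ordered probit part, polr() in MASS with method = "probit" is
one standard route; a minimal sketch using the housing data shipped
with MASS (and, per the NEWS note quoted elsewhere in this archive,
later MASS versions also accept method = "cauchit"):

library(MASS)
fit <- polr(Sat ~ Infl + Type + Cont, data = housing,
            weights = Freq, method = "probit")
summary(fit)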
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] adding observations to lm for fast recursive residuals?

2004-09-15 Thread roger koenker
In my quantreg package there is a function called lm.fit.recursive() 
that, as the .Rd file
says:

Description:
 This function fits a linear model by recursive least squares.  It
 is a utility routine for the 'khmaladzize' function of the
 quantile regression package.
Usage:
 lm.fit.recursive(X, y, int=TRUE)
Arguments:
   X: Design Matrix
   y: Response Variable
 int: if TRUE then append intercept to X
Value:
 returns a p by n matrix of fitted parameters, where p is the number
 of columns of the design matrix (plus one if int = TRUE). The ith
 column gives the solution up to time i.
It is written in Fortran, so it should be reasonably quick.
HTH
url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
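
A quick usage sketch (toy data; columns with i < p are not meaningful):

library(quantreg)
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
B <- lm.fit.recursive(cbind(x), y, int = TRUE)   # p x n matrix of estimates
B[, n]                                           # full-sample intercept, slope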
On Sep 15, 2004, at 9:53 AM, [EMAIL PROTECTED] wrote:
dear R community:  i have been looking but failed to find the 
following:  is there a function in R that updates a plain OLS lm() 
model with one additional observation, so that I can write a function 
that computes recursive residuals *quickly*?

PS: (I looked at package strucchange, but if I am not mistaken, the 
recresid function there takes longer than iterating over the models 
fresh from start to end.)  I know the two functions do not do the same 
thing, but the main part (OLS) is the same:
   handrecurse.test <- function(y, x) {
     z <- rep(NA, T)   # T: sample size, defined in the workspace
     for (i in 2:T) z[i] <- coef(lm(y[1:i] ~ x[1:i]))[2]
     return(z)
   }
   system.time(handrecurse.test(y, x))
   [1] 0.69 0.00 0.70 0.00 0.00
   system.time(length(recresid(y ~ x)))
   [1] 1.44 0.07 1.59 0.00 0.00

pointers appreciated.  regards, /iaw
---
ivo welch
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

