Re: [R] Optimization under an absolute value constraint

2007-09-07 Thread roger koenker
This should be possible in the lasso2 package.
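Failing that, a crude sketch with optim() (not lasso2; made-up starting
values, untested): the constraint |w| + |x| + |y| + |z| = 1 can be imposed
by normalizing an unconstrained vector.

f <- function(v) (2 * v[2] + v[3]) * (v[1] - v[4])  # v = (w, x, y, z)
g <- function(u) f(u / sum(abs(u)))                 # objective on the L1 sphere
fit <- optim(c(1, -1, 1, -1), g)                    # Nelder-Mead by default
v <- fit$par / sum(abs(fit$par))                    # solution on the constraint set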


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Sep 7, 2007, at 1:17 PM, Phil Xiang wrote:

> I need to optimize a multivariate function f(w, x, y, z, ...) under  
> an absolute value constraint. For instance:
>
> min { (2x+y) (w-z) }
>
> under the constraint:
>
> |w| +  |x| + |y| + |z| = 1.0 .
>
> Is there any R function that does this? Thank you for your help!
>
>
> Phil Xiang
>



Re: [R] Monotonic interpolation

2007-09-06 Thread roger koenker
You might look at the monotone fitting available in the rqss()
function of the quantreg package.
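For instance (a small sketch with made-up data; constraint = "I" in qss()
asks for an increasing fit):

library(quantreg)
set.seed(1)
x <- sort(runif(200))
y <- pnorm(x, 0.5, 0.15) + rnorm(200, sd = 0.05)  # noisy distribution function
fit <- rqss(y ~ qss(x, constraint = "I"))         # "I" = monotone increasing
plot(fit)                                         # plots the fitted monotone curve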

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Sep 6, 2007, at 10:03 AM, excalibur wrote:

>
>
>
> On Thu, 6 Sept, at 09:45, excalibur wrote:
>
>>
>> Hello everybody, has anyone got a function for smooth monotonic
>> interpolation
>> (splines ...) of a univariate function (like a distribution
>> function for
>> example) ?
>
> approxfun() might be what you're looking for.
>
> Is the result of approxfun() inevitably monotonic ?
>



Re: [R] piecewise linear approximation

2007-08-30 Thread roger koenker
If you want to minimize absolute error for this, then you can
try the rqss fitting in the quantreg package and tune lambda
to get one break in the fitted function.
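For instance, a sketch with made-up data having a single kink (the lambda
value here is only a starting point for the tuning):

library(quantreg)
set.seed(1)
x <- seq(0, 10, length = 100)
y <- pmin(2 * x, 8 + 0.2 * x) + rnorm(100, sd = 0.5)  # one true break
fit <- rqss(y ~ qss(x, lambda = 5), tau = 0.5)        # larger lambda, fewer breaks
plot(fit)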


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Aug 29, 2007, at 8:05 PM, Achim Zeileis wrote:

> On Wed, 29 Aug 2007, Naxerova, Kamila wrote:
>
>> Dear list,
>>
>> I have a series of data points which I want to approximate with  
>> exactly two
>> linear functions. I would like to choose the intervals so that the  
>> total
>> deviation from my fitted lines is minimal. How do I best do this?
>
> From the information you give it seems that you want to partition a model
> like
>lm(y ~ x)
> along a certain ordering of the observations. Without any further
> restrictions you can do that with the function breakpoints() in  
> package
> "strucchange". If there are continuity restrictions or something like
> that, you want to look at the "segmented" package.
>
> hth,
> Z
>
>> Thanks!
>> Kamila
>>
>>
>>
>>
>>
>



Re: [R] quantile(table)?

2007-08-28 Thread roger koenker
You could use:

require(quantreg)
  rq(index ~ 1, weights=count, tau=0:5/5)

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Aug 28, 2007, at 9:22 AM, Seung Jun wrote:

> Hi,
>
> I have data in the following form:
>
>   index  count
>      -7     32
>       1   9382
>       2   2192
>       7    190
>      11    201
>
> I'd like to get quantiles from the data.  I thought about something  
> like this:
>
>   index <- c(-7, 1, 2, 7, 11)
>   count <- c(32,  9382, 2192, 190, 201)
>   quantile(rep(index, count))
>
> It answers correctly, but I feel it's wasteful especially when count
> is generally large.  So, my question is, is there a way to get
> quantiles directly from this table (without coding at a low level)?
>
> Thanks,
> Seung
>



Re: [R] perception of graphical data

2007-08-24 Thread roger koenker
You might want to look at the cartogram literature.  See e.g.

http://www-personal.umich.edu/~mejn/election/

I don't know of an R implementation of this sort of thing, but
perhaps others can correct me.

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Aug 24, 2007, at 12:30 PM, Yeh, Richard C wrote:

> Hello,
>
> I apologize that this is off-topic.  I am seeking information on
> perception of graphical data, in an effort to improve the plots I
> produce.  Would anyone point me to literature reviews in this  
> area?  (Or
> keywords to try on google?)  Is this located somewhere near cognitive
> science, psychology, human factors research?
>
> For example, some specific questions I have are:
>
> I recall as a child when I first saw a map where the areas of the
> containers (geographical states) were drawn as rectangles,  
> proportional
> to a quantity other than land area.  Does anyone know of an algorithm
> for drawing such maps?  Would anyone know of a journal or reference
> where I can find studies on whether subjects reading these maps can
> accurately assess the meaning of the different areas, as [some of us]
> can assess different heights on a bar graph?  (What about areas in bar
> graphs with non-uniform widths?)
>
> Scatter plots of microarray data often attempt to represent  
> thousands or
> tens of thousands of points, but all I read from them are density and
> distribution --- the gene names cannot be shown.  At what point,  
> would a
> sunflowerplot-like display or a smooth gradient be better?  When two
> data points drawn as 50% gray disks are small and tangent, are they
> perceptually equivalent to a single, 100% black disk?  Or a 50% gray
> disk with twice the area?  What problems are known about plotting with
> disks --- do viewers use the area or the diameter (or neither) to  
> gauge
> weight?
>
>
> As you can tell, I'm a non-expert, mixing issues of data  
> interpretation,
> visual perception, graphic representation.  Previously, I didn't have
> the flexibility of R's graphics, so I didn't need to think so much.
> I've read some of Edward S. Tufte's books, but found them more
> qualitative than quantitative.
>
> Thanks!
>
> Richard
>
> 212-933-3305 / [EMAIL PROTECTED]
>



Re: [R] (Most efficient) way to make random sequences of random sequences

2007-08-21 Thread roger koenker
One way:

N <- 10
 s <- c(apply(matrix(rep(1:3,N),3,N),2,sample))
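And a quick sanity check of the construction: every consecutive block of
three in s should be a permutation of 1:3.

all(apply(matrix(s, 3, N), 2, function(b) all(sort(b) == 1:3)))  # TRUE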


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Aug 21, 2007, at 3:49 PM, Emmanuel Levy wrote:

> Hi,
>
> I was wondering the what would be the (most efficient) way to generate
> a sequence
> of sequences, i mean:
>
> if I have 1,2 and 3.
>
> I'd like to generate a sequence of length N*3 (N ~ 1,000,000 or more)
>
> Where random permutations of the sequence 1,2,3 follow each other.
>
> i.e  1,2,3,1,3,2,3,2,1
>
> /!\ The thing is that there should never be twice the same number
> in the same sub-sequence, meaning that this is different from
> generating a vector with the numbers 1,2 and 3 randomly distributed.
>
> Any suggestion very welcome! Thanks,
>
> Emmanuel
>



Re: [R] image plot with multiple x values

2007-08-17 Thread roger koenker
If you are willing to go to the bother of representing your data
as a sparse matrix, the package SparseM has a version of image()
that will do what you would like to do, I believe.
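For instance, a rough sketch with made-up triplet data (slot names as in
SparseM's matrix.coo class; untested):

library(SparseM)
set.seed(1)
A <- new("matrix.coo",
         ra = rnorm(20),                         # nonzero values
         ia = sample(1:50, 20, replace = TRUE),  # row indices
         ja = sample(1:50, 20, replace = TRUE),  # column indices
         dimension = c(50L, 50L))
image(as.matrix.csr(A))  # SparseM's image() method for csr matrices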


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Aug 17, 2007, at 1:51 PM, baptiste Auguié wrote:

> Hi,
>
> New to R, I don't find a way to plot the following data with image():
>
> x is a N * M matrix
> y is a vector of length M
> z is a N*M matrix
>
> I wish to plot z as a greyscale image, but my x axis is different for
> every row of the z data.
>
> Here is a minimal example,
>
>> theta<-c(3:6) # N
>> y<-c(1:5) # M
>>
>> x<-theta%*%t(y)# N * M
>> z<-sin(x) # N * M
>>
>> image(z)
>
> This doesn't give what I want, as the x axis needs to be shifted as
> we go from one line to the next. (probably clearer if you plot
> matplot(x,z): the curves are shifted)
>
> The way I see it, I need either to construct a bigger matrix with all
> possible values of x giving the new dimension and arbitrary values
> for the missing points, or find a plotting function that would plot
> lines by lines. The ordering of the x and z values is giving me a
> headache on the first idea, and I can't find any option / alternative
> to image.
>
> Thanks in advance!
>
> baptiste
>



Re: [R] smoothing function for proportions

2007-08-10 Thread roger koenker
It is not entirely clear what you are using for y values in  
smooth.spline,
but it would appear that it is just the point estimates.  I would  
suggest
using instead -- at each x value -- a few equally spaced quantiles of
the estimated proportions.  Implicitly, smooth.spline expects to be  
fitting
a mean curve to data that has constant variance, so you might also
consider reweighting to approximate this, as well.
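For instance, a small sketch of the reweighting idea with made-up binomial
data, weighting each point by the approximate inverse variance n/(p(1-p)):

set.seed(1)
x <- 1:50
n <- pmax(round(500 * exp(-x / 10)), 3)  # number of trials shrinks with x
p <- rbinom(50, n, 0.5) / n              # estimated proportions
p <- pmin(pmax(p, 0.02), 0.98)           # keep the weights finite
fit <- smooth.spline(x, p, w = n / (p * (1 - p)))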


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Aug 10, 2007, at 10:23 AM, Rose Hoberman wrote:

> Sorry, forgot to attach the graph.
>
> On 8/10/07, Rose Hoberman <[EMAIL PROTECTED]> wrote:
>> I am looking for a function that can fit a smooth function to a  
>> vector
>> of estimated proportions, such that the smoothed value is within
>> specified confidence bounds of each proportion.  In other words,  
>> given
>> a small number of trials and large confidence intervals, I would
>> prefer the function to vary smoothly, but given a large number of
>> trials and small confidence intervals, I would prefer the function to
>> lie within the confidence intervals, even if it is not smooth.
>>
>> I have attached a postscript file illustrating a data set I would  
>> like
>> to smooth.  As the figure shows, for large values of x, I have few
>> data points, and so the ML estimate of the proportion varies widely,
>> and the confidence intervals are very large.  When I use the
>> smooth.spline function with a large value of spar (the red line), the
>> function is not as smooth as desired for large values of x.  When I
>> use a smaller value of spar (the green line), the function fails to
>> stay within the confidence bounds of the proportions.   Is there a
>> smoothing function for which I can specify upper and lower limits for
>> the y value for specific values of x?
>>
>> Thanks for any suggestions,
>>
>> Rose
>>
>> 



Re: [R] Predict using SparseM.slm

2007-08-01 Thread roger koenker
If you are feeling altruistic you could write a predict method for
slm objects, it wouldn't be much work to adapt what is already
available and  follow the  predict.lm prototype.  On the other
hand if you are looking for something quick and dirty you can
always resort to

newX %*% coef(slmobj)
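One caveat: newX must be laid out like the fitted model matrix, intercept
column included.  A sketch with the objects from the message below (assuming
all columns of the test set are predictors):

newX <- cbind(1, as.matrix(testDataSet))  # leading 1s for the intercept
yhat <- newX %*% coef(slmobj)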


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Aug 1, 2007, at 4:42 PM, T. Balachander wrote:

> Hi,
>
> I am trying out the SparseM package and had a
> question.  The following piece of code works fine:
>
> ...
> fit = slm(model, data = trainData, weights = weight)
>
> ...
>
> But how do I use the fit object to predict the values
> on say a reserved testDataSet? In the regular lm
> function I would do something like this:
>
> predict.lm(fit,testDataSet)
>
> Thanks
> -Bala
>
>
>
>



Re: [R] crimtab related question

2007-07-24 Thread roger koenker
While on the subject of mechanical methods of statistical research, I
can't resist quoting Doob's (1997) Statistical Science interview:

> My system, complicated by my inaccurate typing, led to retyping  
> material over and over, and for some time I had an electric drill  
> on my desk, provided with an eraser bit which I used to erase  
> typing. I rarely used the system of brushing white fluid over a  
> typed error because I was not patient enough to let the fluid dry  
> before retyping. Long after my first book was done I discovered the  
> tape rolls which cover lines of type. As I typed and retyped my  
> work it became so repugnant to me that I had more and more  
> difficulty even to look at it to check it. This fact accounts for  
> many slips that a careful reading would have discovered. I commonly  
> used a stochastic system of checking, picking a page and then a  
> place on the page at random and reading a few sentences, in order  
> to avoid reading it in context and thereby to avoid reading what  
> was in my mind rather than what I had written. At first I would  
> catch something at almost every trial, and I would continue until  
> several trials would yield nothing. I have tried this system on  
> other authors, betting for example that I would find something to  
> correct on a randomly chosen printed page of text, and  
> nonmathematicans suffering under the delusion that mathematics is  
> errorless would be surprised at how many bets I have won.

The relevance to the present inquiry is confirmed by the misspelling  
of Dennison in the Annals reference
quoted below.  See, for example:

http://www.amazon.com/Avery-Dennison-Metal-Rim-Tags/dp/B000AN376G

On the substance of Jean's question, Mark's interpretation seems very  
plausible.

Thanks to Jean and to Martin Maechler for adding this dataset to R.


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Jul 24, 2007, at 4:42 PM, Mark Difford wrote:

>
> Hi Jean,
>
> You haven't yet had a reply from an authoritative source, so here is my
> tuppence worth to part of your enquiry.
>
> It's almost certain that the "receiving box" is a receptacle into  
> which tags
> were placed after they had been drawn and the inscribed measurement  
> noted
> down.  Measurements on three tags were unwittingly not noted before  
> the tags
> were transferred to the receiving box.  They lay there with a good  
> many
> other tags, so the inscribed measurement/tag couldn't be recovered.
>
> I hope this clarifies some points.
>
> Regards,
> Mark.
>
>
> Jean lobry wrote:
>>
>> Dear all,
>>
>> the dataset documented under ?crimtab was also used in:
>>
>> @article{TreloarAE1934,
>>  title = {The adequacy of "{S}tudent's" criterion of
>>   deviations in small sample means},
>>  author = {Treloar, A.E. and Wilder, M.A.},
>>  journal = {The Annals of Mathematical Statistics},
>>  volume = {5},
>>  pages = {324-341},
>>  year = {1934}
>> }
>>
>> The following is from page 335 of the above paper:
>>
>> "From the table provided by MacDonell (1902) on
>> the associated variation of stature (to the nearest inch)
>> and length of the left middle finger (to the nearest
>> millimeter) in 3000 British criminals, the measurements
>> were transferred to 3000 numbered Denison metal-rim
>> tags from which the cords had been removed. After
>> thorough checking and mixing of these circular disks,
>> samples of 5 tags each were drawn at random until the
>> supply was exhausted. Unfortunately, three of these
>> samples were erroneously returned to a receiving box
>> before being copied, and the records of 597 samples only
>> are available."
>>
>> Could someone give me a clue about the kind of device
>> that was used here? Is it a kind of lottery machine?
>> I don't understand why three samples were lost. What
>> is this "receiving box"?
>>
>> Thanks for any hint,
>>
>> Best,
>> -- 
>> Jean R. Lobry([EMAIL PROTECTED])
>> Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
>> 43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
>> allo  : +33 472 43 27 56 fax: +33 472 43 13 88
>> http://pbil.univ-lyon1.fr/members/lobry/
>>

Re: [R] quantreg behavior changes for N>1000

2007-07-24 Thread roger koenker
When in doubt:  RTFM --  Quoting from ?summary.rq

se: specifies the method used to compute standard
    errors.  There are currently five available methods:

   1.  '"rank"' which produces confidence intervals for the
       estimated parameters by inverting a rank test as
       described in Koenker (1994).  The default option
       assumes that the errors are iid, while the option iid =
       FALSE implements the proposal of Koenker and Machado
       (1999).  This is the default method unless the sample
       size exceeds 1001, or cov = FALSE, in which case se =
       "nid" is used.

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Jul 24, 2007, at 12:57 PM, Jeff G. wrote:

> Hello again R-experts and novices (like me),
>
> This seems like a bug to me - or maybe it's intentional...can anyone
> confirm?  Up to 1000 reps, summary() of a rq object gives different
> output and subtly different confidence interval estimates.
>
> ThanksJeff
>
> testx=runif(1200)
> testy=rnorm(1200, 5)
>
> test.rq=summary(rq(testy[1:1000]~testx[1:1000], tau=2:98/100))
> test.rq[[1]]
> Gives this output:
> Call: rq(formula = testy[1:1000] ~ testx[1:1000], tau = 2:98/100)
>
> tau: [1] 0.02
>
> Coefficients:
>               coefficients  lower bd  upper bd
> (Intercept)        3.00026   2.45142   3.17098
> testx[1:1000]     -0.00870  -0.39817   0.49946
>
> test.rq=summary(rq(testy[1:1001]~testx[1:1001], tau=2:98/100))
> test.rq[[1]]
>
> Gives this (different) output:
> Call: rq(formula = testy[1:1001] ~ testx[1:1001], tau = 2:98/100)
>
> tau: [1] 0.02
>
> Coefficients:
>               Value     Std. Error  t value    Pr(>|t|)
> (Intercept)    3.00026   0.21605    13.88658   0.0
> testx[1:1001] -0.00870   0.32976    -0.02638   0.97896
>
>
> plot(test.rq, nrow=2, ncol=2) # The slope estimates appear to be the
> same but there are subtle differences in the confidence intervals,  
> which
> shouldn't be due simply to the inclusion of one more point.
>



Re: [R] plotting a summary.rq object in using pkg quantreg

2007-07-24 Thread roger koenker
Package questions to package maintainers, please.

The short answer is that your alpha = .4 parameter needs to
be passed to summary(), not to plot().  Try this:

> plot(summary(rq(foodexp ~ income, tau = 1:49/50, data = engel), alpha = .4),
>  nrow = 1, ncol = 2, ols = TRUE)

A longer answer would involve a boring disquisition about various  
fitting methods
and standard error estimation methods and their historical evolution  
and defaults.
(By default rank-based confidence bands are being used for the engel  
data since
the sample size is relatively small.)

Regarding your more fundamental question:  you can always modify   
functions
such as summary.rq or plot.summary.rqs  -- see for example ?fix.




url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Jul 24, 2007, at 11:07 AM, Jeff G. wrote:

> Hello,
>
> I am having problems adjusting the plot output from the quantreg
> package.  Anyone know what I'm doing wrong?
>
> For example (borrowing from the help files):
>
> plot(summary(rq(foodexp~income,tau = 1:49/50,data=engel)), nrow=1,
> ncol=2,alpha = .4, ols = TRUE, xlab="test")
>
> The "alpha=" parameter seems to have no effect on my output, even  
> when I
> set it to a ridiculous value like 0.4.  Also, though in the help  
> file it
> says "... = optional arguments to plot", "xlab" (as an example)
> seems
> to do nothing.  If the answer is that I should extract the values I  
> need
> and construct the plot I want independently of the rq.process object,
> that it okay I suppose, if inefficient.  Maybe a more fundamental
> question is how do I get in and see how plot is working in this  
> case so
> that I can modify.
>
> Thanks much!
>
> J
>
> P.S.  I've explored using plot.summary.rqs but the problems seem to be
> the same.
>



Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread roger koenker
An important potential benefit of R solutions shared by awk, sed, ...
is that they provide a reproducible way to document exactly how one got
from one version of the data to the next.  This seems to be the main
problem with handicraft methods like editing Excel files: it is too
easy to introduce new errors that can't be tracked down at later
stages of the analysis.


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Jun 10, 2007, at 4:14 PM, (Ted Harding) wrote:

> On 10-Jun-07 19:27:50, Stephen Tucker wrote:
>>
>> Since R is supposed to be a complete programming language,
>> I wonder why these tools couldn't be implemented in R
>> (unless speed is the issue). Of course, it's a naive desire
>> to have a single language that does everything, but it seems
>> that R currently has most of the functions necessary to do
>> the type of data cleaning described.
>
> In principle that is certainly true. A couple of comments,
> though.
>
> 1. R's rich data structures are likely to be superfluous.
>Mostly, at the sanitisation stage, one is working with
>"flat" files (row & column). This straightforward format
>is often easier to handle using simple programs for the
>kind of basic filtering needed, rather then getting into
>the heavier programming constructs of R.
>
> 2. As follow-on and contrast at the same time, very often
>what should be a nice flat file with no rough edges is not.
>If there are variable numbers of fields per line, R will
>not handle it straightforwardly (you can force it in,
>but it's more elaborate). There are related issues as well.
>
> a) If someone entering data into an Excel table lets their
>cursor wander outside the row/col range of the table,
>this can cause invisible entities to be planted in the
>extraneous cells. When saved as a CSV, this file then
>has variable numbers of fields per line, and possibly
>also extra lines with arbitrary blank fields.
>
>cat datafile.csv | awk 'BEGIN{FS=","}{n=NF;print n}'
>
>will give you the numbers of fields in each line.
>
>If you further pipe it into | sort -nu you will get
>the distinct field-numbers. If you know (by now) how many
>fields there should be (e.g. 10), then
>
>cat datafile.csv | awk 'BEGIN{FS=","} (NF != 10){print NR ", " NF}'
>
>will tell you which lines have the wrong number of fields,
>and how many fields they have. You can similarly count how
>many lines there are (e.g. pipe into wc -l).
>
> b) People sometimes randomly use a blank space or a "." in a
>cell to denote a missing value. Consistent use of either
>is OK: ",," in a CSV will be treated as "NA" by R. The use
>of "." can be more problematic. If for instance you try to
>read the following CSV into R as a dataframe:
>
>1,2,.,4
>2,.,4,5
>3,4,.,6
>
>the "." in cols 2 and 3 is treated as the character ".",
>with the result that something complicated happens to
>the typing of the items.
>
>typeof(D[i,j]) is always integer. sum(D[1,1])=1, but
>sum(D[1,2]) gives a type error, even though the entry
>is in fact 2. And so on, in various combinations.
>
>And as.matrix(D) is of course a matrix of characters.
>
>In fact, columns 2 and 3 of D are treated as factors!
>
>for(i in (1:3)){ for(j in (1:4)){ print( (D[i,j]))}}
>[1] 1
>[1] 2
>Levels: . 2 4
>[1] .
>Levels: . 4
>[1] 4
>[1] 2
>[1] .
>Levels: . 2 4
>[1] 4
>Levels: . 4
>[1] 5
>[1] 3
>[1] 4
>Levels: . 2 4
>[1] .
>Levels: . 4
>[1] 6
>
>This is getting altogether too complicated for the job
>one wants to do!
>
>And it gets worse when people mix ",," and ",.,"!
>
>On the other hand, a simple brush with awk (or sed in
>this case) can sort it once and for all, without waking
>the sleeping dogs in R.
>
> I could go on. R undoubtedly has the power, but it can very
> quickly get over-complicated for simple jobs.
>
> Best wishes to all,
> Ted.
>
> 
> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> Fax-to-email: +44 (0)870 094 0861
> Date: 10-Jun-07   Time: 22

Re: [R] Metropolis-Hastings Markov Chain Monte Carlo in Spatstat

2007-06-06 Thread roger koenker
Take a look at:  http://sepwww.stanford.edu/software/ratfor.html
and in particular the link there to the original paper by Brian
Kernighan describing ratfor; it is only 14 pages, but it is a model
of clarity of exposition and design.

I wouldn't worry too much about the makefile  -- it probably
knows exactly what to do with ratfor provided you have the
ratfor preprocessor available from the above link, and the rest
of the tools to build from source.

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Jun 6, 2007, at 4:42 PM, Kevin C Packard wrote:

> I'm testing some different formulations of pairwise interaction  
> point processes
> in Spatstat (version 1.11-6) using R 2.5.0 on a Windows platform  
> and I wish to
> simulate them using the Metropolis-Hastings algorithm implemented  
> with Spatstat.
> Spatstat utilizes Fortran77 code with the preprocessor RatFor to do  
> the
> Metropolis-Hastings MCMC, but the Makefile is more complicated than  
> any I have
> worked with.
> Any suggestions on how I could get started working with the Fortran  
> code in
> conjunction with RatFor is appreciated.
>
> Sincerely,
> Kevin
>
> Kevin Packard
> Department of Forestry, PhD student
> Department of Statistics, MS student
> Virginia Polytechnic Institute and State University
> Blacksburg, Virginia, USA
>



Re: [R] How to use density function to find h_{k}

2007-06-03 Thread Roger Koenker
You might try:  http://www.stanford.edu/~kasparr/software/silverman.r

But  take a look at the referenced paper by Silverman first.  You could 
also try the CRAN package ftnonpar by Kovac and Davies.
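On the automatic question: one can count the modes of density() at a given
bandwidth and bisect for the smallest bandwidth giving at most k modes; for
Gaussian kernels the mode count is monotone in the bandwidth, which is what
makes Silverman's critical bandwidth well defined.  A sketch (all names
made up):

nmodes <- function(x, bw) {
  d <- density(x, bw = bw)
  sum(diff(sign(diff(d$y))) == -2)  # count local maxima of the estimate
}
h.crit <- function(x, k, lo = 0.01, hi = 5, tol = 1e-4) {
  while (hi - lo > tol) {           # bisection: smallest bw with <= k modes
    mid <- (lo + hi) / 2
    if (nmodes(x, mid) <= k) hi <- mid else lo <- mid
  }
  hi
}
x <- c(rnorm(100), rnorm(100, mean = 4))
h.crit(x, 1)  # critical bandwidth between unimodal and bimodal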


url:    www.econ.uiuc.edu/~roger/my.html    Roger Koenker
email:  [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820

On Sun, 3 Jun 2007, Patrick Wang wrote:

> Hi, All:
>
> How can I use the density function to find the minimum bandwidth that
> makes the density estimate have one mode, 2 modes, 3 modes, etc.?
> Usually the larger the bandwidth, the fewer modes the density has --
> less bumpy.
>
> It would be impossible to try all possible bandwidths and then plot the
> pdf to see how many modes each has.  Is there an automatic way to do this,
> like a loop over 1000 bandwidths in (0, 1)?  Is there a function to get
> the number of modes from the density function?  The Mode function in R
> does not seem to serve this purpose.
>
>
> Thanks
> pat
>



Re: [R] Smoothing a path in 2D

2007-05-30 Thread roger koenker
You might have a look at the fda package of Ramsay on CRAN.
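A minimal base R alternative, with made-up data: treat the observation
order as the parameter t and smooth x(t) and y(t) separately, which yields
a smooth planar curve:

set.seed(1)
xs <- cumsum(rnorm(12)); ys <- cumsum(rnorm(12))  # an ordered 2D path
t <- seq_along(xs)
sx <- spline(t, xs, n = 200)  # interpolating splines in t
sy <- spline(t, ys, n = 200)
plot(xs, ys); lines(sx$y, sy$y)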


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On May 30, 2007, at 9:42 AM, Dieter Vanderelst wrote:

> Hello,
>
> I'm currently trying to find a method to interpolate or smooth data  
> that
> represent a trajectory in space.
>
> For example, I have an ordered (=time) set of (x,y) tuples which
> constitute a path in a 2D space.
>
> Is there a way using R to interpolate between these points in a way
> similar to spline interpolation so that I get a smooth path in space?
>
> Greetings,
> Dieter
>
> -- 
> Dieter Vanderelst
> [EMAIL PROTECTED]
> Department of Industrial Design
> Designed Intelligence
>
>



Re: [R] Where to find "nprq"?

2007-05-28 Thread roger koenker
It has been folded into my quantreg package.

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On May 28, 2007, at 4:32 AM, Rainer M. Krug wrote:

> Hi
>
> I am trying to install the package "pheno", but it needs the package
> "nprq" by Roger Koenker et al. which I can I find this package? It  
> does
> not seem to be on CRAN and googling also doesn't give me an URL -  
> is it
> still somewhere available?
>
> Thanks,
>
> Rainer
>
>
> -- 
> NEW EMAIL ADDRESS AND ADDRESS:
>
> [EMAIL PROTECTED]
>
> [EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH
>
> Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation
> Biology (UCT)
>
> Leslie Hill Institute for Plant Conservation
> University of Cape Town
> Rondebosch 7701
> South Africa
>
> Fax:  +27 - (0)86 516 2782
> Fax:  +27 - (0)21 650 2440 (w)
> Cell: +27 - (0)83 9479 042
>
> Skype:RMkrug
>
> email:[EMAIL PROTECTED]
>   [EMAIL PROTECTED]
>



Re: [R] nlme fixed effects specification

2007-05-09 Thread roger koenker
Just to provide some closure on this thread, let me add two comments:

1.  Doug's version of my sweep function:

diffid1 <-
function(h, id) {
 id <- as.factor(id)[ , drop = TRUE]
 apply(as.matrix(h), 2, function(x) x - tapply(x, id, mean)[id])
}

is far more elegant than my original, and works perfectly, but

2.  I should have mentioned that the proposed strategy gets the
coefficient estimates right; however, their standard errors need a
degrees of freedom correction, which in the present instance
is non-negligible -- sqrt(98/89) -- since the lm() step doesn't
know that we have already estimated the fixed effects with the
sweep operation.
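Concretely, with the simulated data from this thread (10 group means
estimated, hence 98 vs. 89 error degrees of freedom):

set.seed(1)
fe <- as.factor(as.integer(runif(100) * 10)); y <- rnorm(100); x <- rnorm(100)
fit <- lm(drop(diffid1(y, fe)) ~ drop(diffid1(x, fe)))  # drop() keeps a vector response
coef(summary(fit))[, "Std. Error"] * sqrt(98 / 89)      # corrected standard errors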

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On May 5, 2007, at 7:16 PM, Douglas Bates wrote:

> On 5/5/07, roger koenker <[EMAIL PROTECTED]> wrote:
>>
>> On May 5, 2007, at 3:14 PM, Douglas Bates wrote:
>> >
>> > As Roger indicated in another reply you should be able to obtain  
>> the
>> > results you want by sweeping out the means of the groups from  
>> both x
>> > and y.  However, I tried Roger's function and a modified version  
>> that
>> > I wrote and could not show this.  I'm not sure what I am doing  
>> wrong.
>>
>> Doug,  Isn't it just that you are generating a  balanced factor and
>> Ivo is
>> generating an unbalanced one -- he wrote:
>
>> > fe = as.factor( as.integer( runif(100)*10 ) );
>
>> the coefficient on x is the same
>
>> or, aarrgh,  is it that you don't like the s.e. being wrong.   I
>> didn't notice
>> this at first.  But it shouldn't happen.  I'll have to take another
>> look at  this.
>
> No, my mistake was much dumber than that.  I was comparing the wrong
> coefficient.  For some reason I was comparing the coefficient for x in
> the second fit to the Intercept from the first fit.
>
> I'm glad that it really is working and, yes, you are right, the
> degrees of freedom are wrong in the second fit because the effect of
> those 10 degrees of freedom are removed from the data before the model
> is fit.
>
>
>> > I enclose a transcript that shows that I can reproduce the  
>> result from
>> > Roger's function but it doesn't do what either of us think it  
>> should.
>> > BTW, I realize that the estimate for the Intercept should be  
>> zero in
>> > this case.
>> >
>> >
>> >
>> >> now, with a few IQ points more, I would have looked at the lme
>> >> function instead of the nlme function in library(nlme).[then
>> >> again, I could understand stats a lot better with a few more IQ
>> >> points.]  I am reading the lme description now, but I still don't
>> >> understand how to specify that I want to have dummies in my
>> >> specification, plus the x variable, and that's it.  I think I  
>> am not
>> >> understanding the integration of fixed and random effects in  
>> the same
>> >> R functions.
>> >>
>> >> thanks for pointing me at your lme4 library.  on linux, version
>> >> 2.5.0, I did
>> >>   R CMD INSTALL matrix*.tar.gz
>> >>   R CMD INSTALL lme4*.tar.gz
>> >> and it installed painlessly.  (I guess R install packages don't  
>> have
>> >> knowledge of what they rely on;  lme4 requires matrix, which  
>> the docs
>> >> state, but having gotten this wrong, I didn't get an error.  no  
>> big
>> >> deal.  I guess I am too used to automatic resolution of  
>> dependencies
>> >> from linux installers these days that I did not expect this.)
>> >>
>> >> I now tried your specification:
>> >>
>> >> > library(lme4)
>> >> Loading required package: Matrix
>> >> Loading required package: lattice
>> >> > lmer(y~x+(1|fe))
>> >> Linear mixed-effects model fit by REML
>> >> Formula: y ~ x + (1 | fe)
>> >>  AIC BIC logLik MLdeviance REMLdeviance
>> >>  282 290   -138        270          276
>> >> Random effects:
>> >>  Groups   Name        Variance        Std.Dev.
>> >>  fe       (Intercept) 0.0445          0.211
>> >>  Residual             0.889548532468  0.9431588
>> >> number of obs: 100, groups: fe, 10
>> >>
>> >> Fixed effects:

Re: [R] nlme fixed effects specification

2007-05-05 Thread roger koenker
Ivo,

I don't know whether you ever got a proper answer to this question.
Here is a kludgy one --  someone else can probably provide
a more elegant version of my diffid function.

What you want to do is sweep out the mean deviations from both y
and x based on the factor fe and then estimate the simple y on x  
linear model.

I have an old function that was originally designed to do panel data
models that looks like this:

"diffid" <- function(h, id)
{
 if(is.vector(h))
 h <- matrix(h, ncol = 1)
 Ph <- unique(id)
 Ph <- cbind(Ph, table(id))
 for(i in 1:ncol(h))
 Ph <- cbind(Ph, tapply(h[, i], id, mean))
 is <- tapply(id, id)
 Ph <- Ph[is,  - (1:2)]
 h - Ph
}

With this  you can do:

set.seed(1);
fe = as.factor( as.integer( runif(100)*10 ) ); y=rnorm(100); x=rnorm(100);
summary(lm(diffid(y,fe) ~ diffid(x,fe)))

HTH,

Roger


On May 4, 2007, at 3:08 PM, ivo welch wrote:

> hi doug:  yikes.  could I have done better?  Oh dear.  I tried to make
> my example clearer half-way through, but made it worse.  I meant
>
> set.seed(1);
> fe = as.factor( as.integer( runif(100)*10 ) ); y=rnorm(100); x=rnorm(100);
> print(summary(lm( y ~ x + fe)))
> ...
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   0.1128     0.3680    0.31     0.76
> x             0.0232     0.0960    0.24     0.81
> fe1          -0.6628     0.5467   -1.21     0.23
> ...
> Residual standard error: 0.949 on 89 degrees of freedom
> Multiple R-Squared: 0.0838, Adjusted R-squared: -0.0192
> F-statistic: 0.814 on 10 and 89 DF,  p-value: 0.616
>
> I really am interested only in this linear specification, the
> coefficient on x (0.0232) and the R^2 of 8.38% (adjusted -1.92%).  If
> I did not have so much data in my real application, I would never have
> to look at nlme or nlme4.  I really only want to be able to run this
> specification through lm with far more observations (100,000) and
> groups (10,000), and be done with my problem.
>
> now, with a few IQ points more, I would have looked at the lme
> function instead of the nlme function in library(nlme).[then
> again, I could understand stats a lot better with a few more IQ
> points.]  I am reading the lme description now, but I still don't
> understand how to specify that I want to have dummies in my
> specification, plus the x variable, and that's it.  I think I am not
> understanding the integration of fixed and random effects in the same
> R functions.
>
> thanks for pointing me at your lme4 library.  on linux, version  
> 2.5.0, I did
>   R CMD INSTALL matrix*.tar.gz
>   R CMD INSTALL lme4*.tar.gz
> and it installed painlessly.  (I guess R install packages don't have
> knowledge of what they rely on;  lme4 requires matrix, which the docs
> state, but having gotten this wrong, I didn't get an error.  no big
> deal.  I guess I am too used to automatic resolution of dependencies
> from linux installers these days that I did not expect this.)
>
> I now tried your specification:
>
>> library(lme4)
> Loading required package: Matrix
> Loading required package: lattice
>> lmer(y~x+(1|fe))
> Linear mixed-effects model fit by REML
> Formula: y ~ x + (1 | fe)
>  AIC BIC logLik MLdeviance REMLdeviance
>  282 290   -138        270          276
> Random effects:
>  Groups   Name        Variance        Std.Dev.
>  fe       (Intercept) 0.0445          0.211
>  Residual             0.889548532468  0.9431588
> number of obs: 100, groups: fe, 10
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept)  -0.0188 0.0943  -0.199
> x 0.0528 0.0904   0.585
>
> Correlation of Fixed Effects:
>   (Intr)
> x -0.022
> Warning messages:
> 1: Estimated variance for factor 'fe' is effectively zero
>  in: `LMEoptimize<-`(`*tmp*`, value = list(maxIter = 200L, tolerance =
> 0.000149011611938477,
> 2: $ operator not defined for this S4 class, returning NULL in: x 
> $symbolic.cor
>
> Without being a statistician, I can still determine that this is not
> the model I would like to work with.  The coefficient is 0.0528, not
> 0.0232.  (I am also not sure why I am getting these warning messages
> on my system, either, but I don't think it matters.)
>
> is there a simple way to get the equivalent specification for my smple
> model, using lmer or lme, which does not choke on huge data sets?
>
> regards,
>
> /ivo
>



Re: [R] Freeman-Tukey arcsine transformation

2007-03-13 Thread roger koenker
As a further footnote on this, I can't resist mentioning a letter  
that appears
in Technometrics (1977) by Steve  Portnoy who notes that

2 arcsin(sqrt(p)) = arcsin(2p - 1) + pi/2

and asks: "it would be of historical interest to know if any early  
statisticians
were aware of this, and if so, why the former version was  
preferred."  The
latter version seems more convenient since it obviously obviates the  
need
for special tables that appear in many places.
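Both facts are easy to check numerically; a small sketch (ft below is an
assumed name for the Freeman-Tukey average quoted further down):

p <- runif(5)
all.equal(2 * asin(sqrt(p)), asin(2 * p - 1) + pi / 2)  # TRUE

# Freeman-Tukey double-arcsine of a count x out of n trials (Zar, below)
ft <- function(x, n) 0.5 * (asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1))))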



url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Mar 13, 2007, at 1:48 PM, Sebastian P. Luque wrote:

> On Tue, 13 Mar 2007 14:15:16 -0400,
> "Bos, Roger" <[EMAIL PROTECTED]> wrote:
>
>> I'm curious what this transformation does, but I am not curious  
>> enough
>> to pay $14 to find out.  Someone once told me that the arcsine was a
>> good way to transform data and make it more 'normal'.  I am  
>> wondering if
>> this is an improved method.  Anyone know of a free reference?
>
> My Zar¹ says this is just:
>
>
> p' = 1/2 * (asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1))))
>
>
> so solving for x should give the back-transformation.  It is  
> recommended
> when the proportions that need to be "disciplined" are very close  
> to the
> ends of the range (0, 1; 0, 100).
>
>
> + *Footnotes* +
> ¹ @BOOK{149,
>   title = {Biostatistical analysis},
>   publisher = {Prentice-Hall, Inc.},
>   year = {1996},
>   author = {Zar, J. H.},
>   address = {Upper Saddle River, New Jersey},
>   key = {149},
> }
>
>
> -- 
> Seb
>



Re: [R] tournaments to dendrograms

2007-03-05 Thread roger koenker
I've had no response to the enquiry below, so I made a rather half-baked
version in grid  --  code and pdf are available here:

http://www.econ.uiuc.edu/~roger/research/ncaa

comments would be welcome.   This is _the_  ubiquitous graphic this  
time of
year in the US, so R should take a shot at it.  My first attempt is  
rather primitive
but I have to say that Paul's grid package is  superb.

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Feb 22, 2007, at 4:08 PM, roger koenker wrote:

> Does anyone have (good) experience converting tables of tournament
> results into dendrogram-like graphics?  Tables, for example, like  
> this:
>
> read.table(url("http://www.econ.uiuc.edu/~roger/research/ncaa/NCAA.d"))
>
> Any pointers appreciated.   RK
>
> url:    www.econ.uiuc.edu/~roger        Roger Koenker
> email:  [EMAIL PROTECTED]               Department of Economics
> vox:    217-333-4558                    University of Illinois
> fax:    217-244-6678                    Champaign, IL 61820
>



Re: [R] Linear programming with sparse matrix input format?

2007-03-05 Thread roger koenker
If you can reformulate your LP as an L1 problem, which is known to be
possible without loss of generality, but perhaps not without loss of  
sleep,
then you could use the sparse quantile regression functions in the
quantreg package.
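The sparse entry point there is rq.fit.sfn(), which takes a matrix.csr
design; a small made-up sketch:

library(quantreg)
library(SparseM)
set.seed(1)
X <- as.matrix.csr(cbind(1, matrix(rnorm(200), 100, 2)))  # sparse design
y <- rnorm(100)
fit <- rq.fit.sfn(X, y, tau = 0.5)  # median regression via the sparse solver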


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Mar 5, 2007, at 5:30 PM, Talbot Katz wrote:

> Hi.
>
> I am aware of three different R packages for linear programming: glpk,
> linprog, lpSolve.  From what I can tell, if there are N variables  
> and M
> constraints, all these solvers require the full NxM constraint  
> matrix.  Some
> linear solvers I know of (not in R) have a sparse matrix input  
> format.  Are
> there any linear solvers in R that have a sparse matrix input format?
> (including the possibility of glpk, linprog, and lpSolve, in case I  
> might
> have missed something in the documentation).  Thanks!
>
> --  TMK  --
> 212-460-5430  home
> 917-656-5351  cell
>



Re: [R] Packages in R for least median squares regression and computing outliers (thompson tau technique etc.)

2007-02-28 Thread roger koenker
It's not often one needs to correct Gabor, but no,

least median of squares  is not the same as least absolute error  
regression.

Take a look at the package robust if you want the lms.
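There is also lqs() in MASS, e.g. on a built-in data set:

library(MASS)
fit <- lqs(stack.loss ~ ., data = stackloss, method = "lms")  # least median of squares
coef(fit)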

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Feb 28, 2007, at 1:24 PM, Gabor Grothendieck wrote:

> Try rq in quantreg using the default value for tau.
>
> On 2/28/07, lalitha viswanath <[EMAIL PROTECTED]> wrote:
>> Hi
>> I am looking for suitable packages in R that do
>> regression analyses using least median squares method
>> (or better). Additionally, I am also looking for
>> packages that implement algorithms/methods for
>> detecting outliers that can be discarded before doing
>> the regression analyses.
>>
>> Although some websites refer to "lms" method under
>> package "lps" in R, I am unable to find such a package
>> on CRAN.
>>
>> I would greatly appreciate any pointers to suitable
>> functions/packages for doing the above analyses.
>>
>> Thanks
>> Lalitha
>>
>>
>>
>>
>



[R] tournaments to dendrograms

2007-02-22 Thread roger koenker
Does anyone have (good) experience converting tables of tournament
results into dendrogram-like graphics?  Tables, for example, like this:

read.table(url("http://www.econ.uiuc.edu/~roger/research/ncaa/NCAA.d"))

Any pointers appreciated.   RK

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820



Re: [R] loop issues (r.squared)

2007-02-08 Thread roger koenker
Both Matrix and SparseM have formats of this type.
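And if all you need is the 3-column (row, column, value) layout you ask
about below, base R can produce it directly from a dense matrix, e.g.:

m <- cor(matrix(rnorm(50), 10, 5))^2  # stand-in for the R^2 matrix
nz <- which(m != 0, arr.ind = TRUE)   # (row, col) pairs of the nonzeros
triplets <- cbind(nz, value = m[nz])  # columns: row, col, value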

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Feb 8, 2007, at 4:45 PM, andy1983 wrote:

>
> That was a neat trick. However, it created a new problem.
>
> Before, it took way too long for 10,000 columns to finish.
>
> Now, I test the memory limit. With 10,000 columns, I use up about  
> 1.5 GBs.
>
> Assuming memory is not the issue, I still end up with a huge matrix  
> that is
> difficult to export. Is there a way to convert it to 3 columns (1  
> for row, 1
> for column, 1 for value)?
>
> Thanks.
>
>
>
> Greg Snow wrote:
>>
>> The most straight forward way that I can think of is just:
>>
>>> cor(my.mat)^2 # assuming my.mat is the matrix with your data in the
>> columns
>>
>> That will give you all the R^2 values for regressing 1 column on 1
>> column (it is called R-squared for a reason).
>>
>>
>>> I would like to compare every column in my matrix with every
>>> other column and get the r-squared. I have been using the
>>> following formula and loops:
>>> summary(lm(matrix[,x]~matrix[,y]))$r.squared
>>> where x and y are the looping column numbers
>>>
>>> If I have 100 columns (10,000 iterations), the loops give me
>>> results in a reasonable time.
>>> If I try 10,000 columns, the loops take forever even if there
>>> is no formula inside. I am guessing I can vectorize my code
>>> so that I could eliminate one or both loops. Unfortunately, I
>>> can't figure out how to.
>>
>>
>
>



Re: [R] heteroscedasticity problem

2007-02-07 Thread roger koenker
If you haven't already you might want to take a look at:

http://www.econ.uiuc.edu/~roger/research/rq/QReco.pdf

which is written by and for ecologists.


url:    www.econ.uiuc.edu/~roger        Roger Koenker
email:  [EMAIL PROTECTED]               Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820


On Feb 7, 2007, at 2:52 PM, [EMAIL PROTECTED] wrote:

>
>
>
>
>
> Dear Listers,
>
> I have a regression problem (x->y) with biological data, where x  
> influences
> y in two ways, (1) y increases with x and (2) the variation around  
> the mean
> (residuals) decreases with increasing x, i.e. y becomes more  
> 'predictable'
> as x increases.
> The relationship is saturating, y~a + bx + cx^2, gives a very good  
> fit.
>
> I know basically how to test for heteroscedasticity. My question is if
> there is an elegant regression method, which captures both, the  
> mean and
> the (non-constant) variation around the mean. Such a method would  
> ideally
> yield an estimate of the mean and its variation, both as a function  
> of x.
>
> The pattern corresponds very well to some established ecological  
> theory
> (each x is the species richness of a community of primary  
> producers, y is
> the productivity of each community; productivity and its  
> predictability
> both increase with increasing species richness).
>
> Apologies for the probably clumsy description of my problem - I am an
> ecologist, not a statistician (but a big fan of R).
>
> Cheers,
> Robert
>
>
> Robert Ptacnik
> Norwegian Institute for Water Research (NIVA)
> Gaustadalléen 21
> NO-0349 Oslo
>  FON +47 982 277 81
> FAX +47 221 852 00
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory-efficient column aggregation of a sparse matrix

2007-02-01 Thread roger koenker
Doug is right, I think, that this would be easier with full indexing
using the matrix.coo class, if you want to use SparseM.  But even
then, tapply seems to be the way to go.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Feb 1, 2007, at 7:22 AM, Douglas Bates wrote:

> On 1/31/07, Jon Stearley <[EMAIL PROTECTED]> wrote:
>> I need to sum the columns of a sparse matrix according to a factor -
>> ie given a sparse matrix X and a factor fac of length ncol(X), sum
>> the elements by column factors and return the sparse matrix Y of size
>> nrow(X) by nlevels(f).  The appended code does the job, but is
>> unacceptably memory-bound because tapply() uses a non-sparse
>> representation.  Can anyone suggest a more memory and cpu efficient
>> approach?  Eg, a sparse matrix tapply method?  Thanks.
>
> This is the sort of operation that is much more easily performed in
> the triplet representation of a sparse matrix where each nonzero
> element is represented by its row index, column index and value.
> Using that representation you could map the column indices according
> to the factor then convert back to one of the other representations.
> The only question would be what to do about nonzeros in different
> columns of the original matrix that get mapped to the same element in
> the result.  It turns out that in the sparse matrix code used by the
> Matrix package the triplet representation allows for duplicate index
> positions with the convention that the resulting value at a position
> is the sum of the values of any triplets with that index pair.
>
> If you decide to use this approach please be aware that the indices
> for the triplet representation in the Matrix package are 0-based (as
> in C code) not 1-based (as in R code).  (I imagine that Martin is
> thinking "we really should change that" as he reads this part.)
>
>>
>> --
>> +--+
>> | Jon Stearley  (505) 845-7571  (FAX 844-9297) |
>> | Sandia National Laboratories  Scalable Systems Integration   |
>> +--+
>>
>>
>> # x and y are of SparseM class matrix.csr
>> "aggregate.csr" <-
>> function(x, fac) {
>>  # make a vector indicating the row of each nonzero
>>  rows <- integer(length=length(x@ra))
>>  rows[x@ia[1:nrow(x)]] <- 1 # put a 1 at start of each row
>>  rows <- as.integer(cumsum(rows)) # and finish with a cumsum
>>
>>  # make a vector indicating the column factor of each nonzero
>>  f <- fac[x@ja]
>>
>>  # aggregate by row,f
>>  y <- tapply(x@ra, list(rows,f), sum)
>>
>>  # sparsify it
>>  y[is.na(y)] <- 0  # change tapply NAs to as.matrix.csr 0s
>>  y <- as.matrix.csr(y)
>>
>>  y
>> }
>>
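
A sketch of the triplet approach Doug describes, using the Matrix
package (its current API; agg.cols is just an illustrative name, and
sparseMatrix() sums entries with duplicated (i, j) pairs when it
compresses the result):

library(Matrix)
agg.cols <- function(X, fac) {    # fac: factor of length ncol(X)
    Xt <- as(X, "TsparseMatrix")  # triplet form; i, j slots are 0-based
    sparseMatrix(i = Xt@i + 1L,
                 j = as.integer(fac)[Xt@j + 1L],  # remap columns by factor
                 x = Xt@x,
                 dims = c(nrow(X), nlevels(fac)))
}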

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SparseM and Stepwise Problem

2007-01-30 Thread roger koenker
One simple possibility  -- if you can generate the X matrix in dense  
form is
the coercion

X <- as.matrix.csr(X)

Unfortunately, there is no current way to go from a formula to a sparse X
matrix without passing through a dense version of X first.  Otherwise you
need to use new() to define the X matrix directly.  This is usually not
that difficult, but it depends on the model.
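
For concreteness, a tiny sketch of the direct route, using the slot
layout of the matrix.csr class (ra holds the nonzero values, ja their
column indices, ia the row pointers):

X <- new("matrix.csr",
         ra = c(1, 1, 1, 1),              # nonzero values
         ja = as.integer(c(1, 1, 2, 1)),  # column index of each value
         ia = as.integer(c(1, 2, 4, 5)),  # row i occupies ra[ia[i]:(ia[i+1]-1)]
         dimension = as.integer(c(3, 2))) # a 3 x 2 design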



url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Jan 30, 2007, at 5:31 PM, [EMAIL PROTECTED] wrote:

> I'm trying to use stepAIC on sparse matrices, and I need some help.
> The documentation for slm.fit suggests:
> slm.fit and slm.wfit call slm.fit.csr to do Cholesky decomposition  
> and then
> backsolve to obtain the least squares estimated coefficients. These  
> functions can be
> called directly if the user is willing to specify the design matrix  
> in matrix.csr form.
> This is often advantageous in large problems to reduce memory  
> requirements.
> I need some help or a reference that will show how to create the  
> design matrix from
> data in matrix.csr form.
> Thanks for any help.
>
>
> -- 
> David Katz
>  www.davidkatzconsulting.com
>541 482-1137
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inverse function of ecdf

2007-01-28 Thread roger koenker
quantile() does some somewhat exotic interpolation --- if you are  
wanting to
match moments you need to be more explicit about how you are computing
moments for the two approaches...
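
If all that is wanted is draws from the empirical distribution, the
type = 1 quantile is the exact inverse of the ECDF and avoids the
interpolation entirely -- a sketch:

r.ecdf <- function(n, x) quantile(x, runif(n), type = 1, names = FALSE)
# equivalently: sample(x, n, replace = TRUE)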

On Jan 28, 2007, at 5:06 PM, Geoffrey Zhu wrote:

> Hi Benilton,
>
> I tried this. It sort of works, but the results are not very
> satisfactionary. The 3rd moment and higher do not match those of the
> original by a large difference. Do you have any better way to do this?
>
> Thanks,
> Geoffrey
>
> -Original Message-
> From: Benilton Carvalho [mailto:[EMAIL PROTECTED]
> Sent: Sunday, January 28, 2007 4:45 PM
> To: Geoffrey Zhu
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] Inverse function of ecdf
>
> ?quantile
>
> b
>
> On Jan 28, 2007, at 5:41 PM, Geoffrey Zhu wrote:
>
>> Hi Everyone,
>>
>> I want to generate some random numbers according to some empirical
>> distribution. Therefore I am looking for the inverse of an empirical
>> cumulative distribution function. I haven't found any in R. Can  
>> anyone
>
>> give a pointer?
>>
>> Thanks,
>> Geoffrey
>
>
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 2 problems with latex.table (quantreg package) - reproducible

2007-01-10 Thread roger koenker
The usual R-help etiquette recommends:

1.  questions about packages go to the maintainer, not to R-help.

2.  examples should be reproducible, i.e. self-contained.

if you look carefully at the function latex.summary.rqs  you will see
that there is a failure to pass the argument "..." on to  
latex.table.  This
_may_ be the source of your problem if in fact your v1 and v2 were
summary.rqs objects, but I doubt that they are.

You might try caption = "".  More generally there are much improved
latex tools elsewhere in R; if you aren't making tables that are  
specific
to quantreg, you might want to use them.
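
For instance, with the xtable package (one of those tools; a sketch,
assuming it is installed), the table.env = FALSE behaviour corresponds
to floating = FALSE:

library(xtable)
tab <- table(v1, v2)
print(xtable(tab), floating = FALSE)   # bare tabular, no table environment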


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Jan 10, 2007, at 12:23 PM, Kati Schweitzer wrote:

> Dear all,
>
> When using latex.table from the quantreg package, I don't seem to  
> be able to set
> table.env=FALSE: when I don't specify caption (as I think I should,  
> when
> understanding the R help rightly(?)), I get an error message, and  
> when I
> do so, of course I get one, as well.
> The funny thing is, that a table is indeed produced in the first case,
> so I get a nice tabular, but as I'm using the command within a for -
> loop, the loop stops due to the error and only one latex table is
> produced.
>
> Example R-Code:
>
> library(quantreg)
>
> v1 <- c("val1","val1","val2")
> v2 <- c("val1","val2","val2")
> tab <- table(v1,v2)
>
> latex.table(tab,table.env=FALSE)
> #error - German R error message (saying that caption is missing and
> has no default :-) ):
> #Fehler in cat(caption, "\n", file = fi, append = TRUE) :
> #   Argument "caption" fehlt (ohne Standardwert)
>
> latex.table(tab,table.env=FALSE,caption="nothing")
> #error - German R error message:
> #Fehler in latex.table(tab, table.env = FALSE, caption = "nothing") :
> #   you must have table.env=TRUE if caption is given
>
>
> The second problem is, that - when using latex.table to produce a
> tabular within a table environment - I would like to specify cgroup
> with only one value - one multicolumn being a heading for both columns
> in the table.
> But I'm not able to produce latex-compilable code:
>
> latex.table(tab,cgroup="v2",caption="my table")
>
> gives me the following latex code:
> \begin{table}[hptb]
> \begin{center}
> \begin{tabular}{|l||c|c|} \hline
> \multicolumn{1}{|l||}{\bf
> tab}&\multicolumn{}{c||}{}&\multicolumn{2}{c|}{\bf v2}\\ \cline{2-3}
> \multicolumn{1}{|l||}{}&\multicolumn{1}{c|}{val1}&\multicolumn{1} 
> {c|}{val2}\\
> \hline
> val1&1&1\\
> val2&0&1\\
> \hline
> \end{tabular}
> \vspace{3mm}
> \caption{my table\label{tab}}
> \end{center}
> \end{table}
>
> and within this code the problem is the second multicolumn
> (&\multicolumn{}{c||}{}), as it has no number specifying how many
> columns the multicolumn should cover. Latex (at least my version)
> complains.
> When deleting this part of the code, the table is compiled and looks
> exactly how I want it to look. I'm doing this with a system call and
> an shell script right now, but this seems pretty ugly to me...
>
> When I specify 2 columns, this problem doesn't occur:
> latex.table(tab,cgroup=c("blah","v2"),caption="my table")
>
> I'm running R Version 2.3.0 (2006-04-24) on a linux machine Fedora
> Core 5 (i386).
>
> Can anyone help me find my mistakes?
>
> Thanks a lot
> ... and sorry for my bad English and potential newbie mistakes!!
> Kati
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] package dependency tree

2007-01-02 Thread roger koenker
Is there a painless way to find the names of all packages on CRAN
that "Depend" on a specified package?


url:www.econ.uiuc.edu/~roger    Roger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] matrix size

2007-01-01 Thread roger koenker

On Jan 1, 2007, at 4:43 PM, Armelini, Guillermo wrote:

> Hello everyone
> Could anybody tell me how to set the following matrix?
>
> n2<-matrix(nrow=10185,ncol=10185,seq(0,0,length=103734225))

You can use:

library(SparseM)
as.matrix.coo(0,10185,10185)

but then you need to find something interesting to do with such a
boring matrix...  (Note the arithmetic behind the error quoted below: a
dense 10185 x 10185 double matrix needs 10185^2 x 8 bytes, i.e. about
810,423 KB.)


>
> R answer was
> Error: cannot allocate vector of size 810423 Kb
>
> Are there any solution? I tried to increase the memory size but it  
> didn't work
> G
>
>
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RuleFit & quantreg: partial dependence plots; showing an effect

2006-12-20 Thread roger koenker


On Dec 20, 2006, at 8:43 AM, Ravi Varadhan wrote:

> Dear Roger,
>
> Is it possible to combine the two ideas that you mentioned: (1)  
> algorithmic
> approaches of Breiman, Friedman, and others that achieve  
> flexibility in the
> predictor space, and (2) robust and flexible regression like QR  
> that achieve
> flexibility in the response space, so as to achieve complete  
> flexibility?
> If it is possible, are you or anyone else in the R community  
> working on
> this?
>
>
There are some tentative steps in this direction.  One is the rqss()  
fitting
in my quantreg package which does QR fitting with additive models
using total variation as a roughness penalty for nonlinear terms.
Another, along more tree structured lines, is Nicolai Meinshausen's
quantregforest package.
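
A minimal rqss() sketch, assuming numeric vectors x and y and an
illustrative choice of the total-variation penalty parameter lambda:

library(quantreg)
f <- rqss(y ~ qss(x, lambda = 1), tau = .5)
plot(f)   # the fitted (possibly nonlinear) conditional median function
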
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of roger koenker
> Sent: Wednesday, December 20, 2006 8:57 AM
> To: Mark Difford
> Cc: R-help list
> Subject: Re: [R] RuleFit & quantreg: partial dependence plots;  
> showing an
> effect
>
> They are entirely different:  Rulefit is a fiendishly clever
> combination of decision tree  formulation
> of models and L1-regularization intended to select parsimonious fits
> to very complicated
> responses yielding e.g. piecewise constant functions.  Rulefit
> estimates the  conditional
> mean of the response over the covariate space, but permits a very
> flexible, but linear-in-parameters, specification of the covariate effects on the conditional
> mean.  The quantile
> regression plotting you refer to adopts a fixed, linear specification
> for conditional quantile
> functions and, given that specification, depicts how the covariates
> influence the various
> conditional quantiles of the response.   Thus, roughly speaking,
> Rulefit is focused on
> flexibility in the x-space, maintaining the classical conditional
> mean objective; while
> QR is trying to be more flexible in the y-direction, and maintaining
> a fixed, linear-in-parameters specification for the covariate effects at each
> quantile.
>
>
> url:www.econ.uiuc.edu/~rogerRoger Koenker
> email[EMAIL PROTECTED]Department of Economics
> vox: 217-333-4558University of Illinois
> fax:   217-244-6678Champaign, IL 61820
>
>
> On Dec 20, 2006, at 4:17 AM, Mark Difford wrote:
>
>> Dear List,
>>
>> I would greatly appreciate help on the following matter:
>>
>> The RuleFit program of Professor Friedman uses partial dependence
>> plots
>> to explore the effect of an explanatory variable on the response
>> variable, after accounting for the average effects of the other
>> variables.  The plot method [plot(summary(rq(y ~ x1 + x2,
>> t=seq(.1,.9,.05] of Professor Koenker's quantreg program
>> appears to
>> do the same thing.
>>
>>
>> Question:
>> Is there a difference between these two types of plot in the manner
>> in which they depict the relationship between explanatory variables
>> and the response variable ?
>>
>> Thank you in advance for your help.
>>
>> Regards,
>> Mark Difford.
>>
>> -
>> Mark Difford
>> Ph.D. candidate, Botany Department,
>> Nelson Mandela Metropolitan University,
>> Port Elizabeth, SA.
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RuleFit & quantreg: partial dependence plots; showing an effect

2006-12-20 Thread roger koenker
They are entirely different:  Rulefit is a fiendishly clever
combination of decision tree formulation of models and
L1-regularization intended to select parsimonious fits to very
complicated responses, yielding e.g. piecewise constant functions.
Rulefit estimates the conditional mean of the response over the
covariate space, but permits a very flexible, but linear-in-parameters,
specification of the covariate effects on the conditional mean.  The
quantile regression plotting you refer to adopts a fixed, linear
specification for conditional quantile functions and, given that
specification, depicts how the covariates influence the various
conditional quantiles of the response.  Thus, roughly speaking, Rulefit
is focused on flexibility in the x-space, maintaining the classical
conditional mean objective; while QR is trying to be more flexible in
the y-direction, and maintaining a fixed, linear-in-parameters
specification for the covariate effects at each quantile.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Dec 20, 2006, at 4:17 AM, Mark Difford wrote:

> Dear List,
>
> I would greatly appreciate help on the following matter:
>
> The RuleFit program of Professor Friedman uses partial dependence  
> plots
> to explore the effect of an explanatory variable on the response
> variable, after accounting for the average effects of the other
> variables.  The plot method [plot(summary(rq(y ~ x1 + x2,
> t=seq(.1,.9,.05] of Professor Koenker's quantreg program  
> appears to
> do the same thing.
>
>
> Question:
> Is there a difference between these two types of plot in the manner  
> in which they depict the relationship between explanatory variables  
> and the response variable ?
>
> Thank you in advance for your help.
>
> Regards,
> Mark Difford.
>
> -
> Mark Difford
> Ph.D. candidate, Botany Department,
> Nelson Mandela Metropolitan University,
> Port Elizabeth, SA.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nonlinear quantile regression

2006-12-02 Thread roger koenker
This isn't a nonlinear QR problem.  You can write:

f <- rq(y ~ log(x),  data=Dat, tau=0.25)

which corresponds to the model

Q_y (.25|x)  =  a log(x) + b

note the sign convention on b.

url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Dec 2, 2006, at 1:47 PM, Ricardo Bessa wrote:

> Hello, I'm having a problem using nonlinear quantile regression, the
> function nlrq.
> I want to fit a quantile regression on a nonlinear function of the form
> a*log(x)-b; the coefficients "a" and "b" are my objective. I tried to
> use the
> command:
>
> funx <- function(x,a,b){
> res <- a*log(x)-b
> res
> }
>
> Dat.nlrq <- nlrq(y ~ funx(x, a, b), data=Dat, tau=0.25, trace=TRUE)
>
> But I can't solve the problem. How do I write the formula "y ~ funx(x,a,b)"?
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scanning a pdf scan

2006-10-27 Thread roger koenker
Thanks for your suggestions.  Trial and error experimentation
with adobe acrobat produced the following method:

It looks like it is possible to highlight the numerical part of the
table in Acrobat and then copy/paste into a text file, with about
98 percent accuracy.  Wonders never cease.


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Oct 27, 2006, at 11:52 AM, Gabor Grothendieck wrote:

> I don't have specific experience with this but strapply
> of package gsubfn can extract information from a string by content
> as opposed to delimiters. e.g.
>
>> library(gsubfn)
>> strapply("abc34def56xyz", "[0-9]+", c)[[1]]
> [1] "34" "56"
>
> On 10/27/06, roger koenker <[EMAIL PROTECTED]> wrote:
>> I have a pdf scan of several pages of data from a quite famous old
>> paper by
>> C.S. Peirce (1873).  I would like (what else?) to convert it into an
>> R dataframe.
>> Somewhat to my surprise the pdf seems to already be in a character
>> recognized
>> form, since I can search for numerical strings and they are nicely
>> found.  Of
>> course, as is usual with such tables there are also headings and
>> column lines, etc
>> etc. that are less interesting than the numbers themselves.  I've
>> tried saving the
>> pdf in various formats, some of which look vaguely tractable, but I'm
>> hoping
>> that there is something that is more automatic.
>>
>> Does anyone have experience that they could share toward this  
>> objective?
>>
>>
>> url:www.econ.uiuc.edu/~rogerRoger Koenker
>> email[EMAIL PROTECTED]Department of Economics
>> vox: 217-333-4558University of Illinois
>> fax:   217-244-6678Champaign, IL 61820
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting- 
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
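
For reference, the same extraction can be done in base R (a sketch, no
extra packages):

x <- "abc34def56xyz"
regmatches(x, gregexpr("[0-9]+", x))[[1]]
# [1] "34" "56"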

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] scanning a pdf scan

2006-10-27 Thread roger koenker
I have a pdf scan of several pages of data from a quite famous old  
paper by
C.S. Peirce (1873).  I would like (what else?) to convert it into an
R dataframe.
Somewhat to my surprise the pdf seems to already be in a character  
recognized
form, since I can search for numerical strings and they are nicely  
found.  Of
course, as is usual with such tables there are also headings and  
column lines, etc
etc. that are less interesting than the numbers themselves.  I've  
tried saving the
pdf in various formats, some of which look vaguely tractable, but I'm  
hoping
that there is something that is more automatic.

Does anyone have experience that they could share toward this objective?


url:www.econ.uiuc.edu/~roger    Roger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantile regression questions

2006-10-26 Thread roger koenker
Brian,

It is hard to say at this level of resolution of the question, but it  
would seem that you might
be able to start by considering each sample vector as a repeated
measurement of the
fiber length -- so 12 obs in the first 1/16th bin, 235 in the next  
and so forth, all associated
with some vector of covariates representing location, variety, etc,  
then the conventional
quantile regression would serve to estimate a conditional quantile  
function for fiber length
for each possible covariate setting --- obviously this would require  
some model for the
way that the covariate effects fit together, linearity,  possible  
interactions, etc etc, and it
would also presume that it made sense to treat the vector of  
responses as independent
measurements.  Building in possible dependence involves some new  
challenges, but
there is some recent experience with inferential methods for  
microarrays that have
incorporated these effects.

I'd be happy to hear more about the data and possible models, but  
this should be
routed privately since the topic is rather too specialized for R-help.


url:www.econ.uiuc.edu/~roger    Roger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Oct 26, 2006, at 7:20 AM, Brian Gardunia wrote:

> I am relatively new to R, but am intrigued by its flexibility.  I  
> am interested in quantile regression and quantile estimation as  
> regards to cotton fiber length distributions.  The length  
> distribution affects spinning and weaving properties, so it is  
> desirable to select for certain distribution types.  The AFIS fiber  
> testing machinery outputs a vector for each sample of type c(12,  
> 235, 355, . . . n) with the number of fibers in n=40 1/16 inch  
> length categories.  My question is what would be the best way to  
> convert the raw output to quantiles and whether it would be  
> appropriate to use quantile regression to look at whether location,  
> variety, replication, etc. modify the length distribution.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantile Regression

2006-10-25 Thread roger koenker
 > data(engel)
 > attach(engel)
 > rq(y~x)
Call:
rq(formula = y ~ x)

Coefficients:
(Intercept)   x
81.4822474   0.5601806

Degrees of freedom: 235 total; 233 residual
 > rq(y~x)->f
 > f$tau
[1] 0.5
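
For the tau = -1 case asked about, the breakpoints of the whole quantile
process are stored in the solution array rather than in $tau -- a sketch:

f <- rq(y ~ x, tau = -1)
taus <- f$sol[1, ]   # first row of sol holds the breakpoints tau_1, ..., tau_J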

url:www.econ.uiuc.edu/~roger    Roger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Oct 25, 2006, at 4:39 AM, [EMAIL PROTECTED] wrote:

> Hi,
>
> how is it possible to retrieve the corresponding tau value for each  
> observed data pair (x(t) y(t), t=1,...,n) when doing a quantile  
> regression like
>
> rq.fit <- rq(y~x,tau=-1).
>
> Thank you for your help.
>
> Jaci
> --
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem loading SparseM package

2006-10-12 Thread roger koenker

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Oct 12, 2006, at 7:12 AM, Roger Bivand wrote:

> On Thu, 12 Oct 2006, Coomaren Vencatasawmy wrote:
>
>> Hi,
>>  I have just installed R 2.4.0 and when I try to load SparseM, I get
>> the following error message
>>
>> library(SparseM)
>> Package SparseM (0.71) loaded.  To cite, see citation("SparseM")
>> Error in loadNamespace(package, c(which.lib.loc, lib.loc),  
>> keep.source = keep.source) :
>> in 'SparseM' methods specified for export, but none  
>> defined: as.matrix.csr, as.matrix.csc, as.matrix.ssr,  
>> as.matrix.ssc, as.matrix.coo, as.matrix, t, coerce, dim, diff,  
>> diag, diag<-, det, norm, chol, backsolve, solve, model.matrix,  
>> model.response, %*%, %x%, image
>> Error: package/namespace load failed for 'SparseM'
>>
>
> Please re-install the package. All contributed packages using new- 
> style
> classes need to be re-installed because the internal representation of
> such classes and methods has changed, see CHANGES TO S4 METHODS in  
> NEWS.
> Doing:
>
> update.packages(checkBuilt = TRUE)
>
> will check your libraries for packages built under previous  
> releases and
> replace them with ones built for the platform release.
>
>>
>> I have contacted the package maintainers and they couldn't be of  
>> any help.
>>
>> I do not recall getting this error in older R versions.
>>
>> Regards
>>
>> Coomaren
>>
>> Send instant messages to your online friends http:// 
>> uk.messenger.yahoo.com
>>  [[alternative HTML version deleted]]
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting- 
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> -- 
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian  
> School of
> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: [EMAIL PROTECTED]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] solaris 64 build?

2006-10-05 Thread roger koenker
We have a solaris/sparc machine that has been running an old version
of R-devel:  Version 2.2.0 Under development (unstable) (2005-06-04  
r34577)
which was built as m64 from sources.  Attempting to upgrade to 2.4.0  
the configure step
goes ok, but I'm getting early on from make:

> gcc -m64  -L/opt/sfw/lib/sparcv9  -L/usr/lib/sparcv9
> -L/usr/openwin/lib/sparcv9  -L/usr/local/lib -o R.bin Rmain.o
> CConverters.o CommandLineArgs.o  Rdynload.o Renviron.o RNG.o  apply.o
> arithmetic.o apse.o array.o attrib.o  base.o bind.o builtin.o
> character.o coerce.o colors.o complex.o connections.o context.o  cov.o
> cum.o  dcf.o datetime.o debug.o deparse.o deriv.o  dotcode.o dounzip.o
> dstruct.o duplicate.o  engine.o envir.o errors.o eval.o  format.o
> fourier.o  gevents.o gram.o gram-ex.o graphics.o  identical.o  
> internet.o
> iosupport.o  lapack.o list.o localecharset.o logic.o  main.o mapply.o
> match.o memory.o model.o  names.o  objects.o optim.o optimize.o
> options.o  par.o paste.o pcre.o platform.o  plot.o plot3d.o plotmath.o
> print.o printarray.o printvector.o printutils.o qsort.o  random.o
> regex.o registration.o relop.o rlocale.o  saveload.o scan.o seq.o
> serialize.o size.o sort.o source.o split.o  sprintf.o startup.o
> subassign.o subscript.o subset.o summary.o sysutils.o  unique.o util.o
> version.o vfonts.o xxxpr.o  mkdtemp.o ../unix/libunix.a
> ../appl/libappl.a ../nmath/libnmath.a -L../../lib -lRblas
> -L/usr/local/encap/gf7764-3.4.3+2/lib/gcc/sparc64-sun-solaris2.9/3.4.3
> -L/usr/ccs/bin/sparcv9 -L/usr/ccs/bin -L/usr/ccs/lib
> -L/usr/local/encap/gf7764-3.4.3+2/lib/sparcv9
> -L/usr/local/encap/gf7764-3.4.3+2/lib -lg2c -lm -lgcc_s
> ../extra/zlib/libz.a  ../extra/bzip2/libbz2.a ../extra/pcre/libpcre.a
> ../extra/intl/libintl.a  -lreadline -ltermcap -lnsl -lsocket -ldl -lm


> Undefined   first referenced
> symbol in file
> __builtin_isnan arithmetic.o
> ld: fatal: Symbol referencing errors. No output written to R.bin
> collect2: ld returned 1 exit status

I've tried to look at the difference in outcomes in the old R-devel
version -- if I touch arithmetic.c there and then type make I get
something
almost the same as above except for the following  bits that are new  
to 2.4.0
(this diff is after replacing spaces with linebreaks obviously.)

ysidro.econ.uiuc.edu% diff t0 t1
54a55
 > localecharset.o
81a83
 > rlocale.o
101a104
 > mkdtemp.o
104a108,109
 > -L../../lib
 > -lRblas


Has there been some change in the way that Rblas is used, or in
isnan?  It didn't seem so from a look at arithmetic.c, but this is well
beyond me.

I hope that someone sees something suspicious, or could point me
toward a better diagnostic.  Thanks,

Roger


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How can I generate these numbers

2006-10-02 Thread roger koenker
Try:

 > rsimplex <- function(n){
u <- diff(sort(runif(n)))
c(u,1-sum(u))
}
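
The textbook uniform-on-the-simplex variant uses the spacings of n - 1
uniforms instead -- a sketch:

rsimplex2 <- function(n) diff(c(0, sort(runif(n - 1)), 1))
rsimplex2(3)   # three nonnegative numbers summing to 1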

On Oct 2, 2006, at 5:43 PM, Rolf Turner wrote:

> Ricardo Rios wrote:
>
>> Hi Rolf Turner, I have a  statistical model, it model need this
>> numbers for calculate the probability. This numbers must be random.
>>
>> For example I need that
>>> magicfunction(3)
>>> [1] 0.3152460 0.5231614 0.1615926
>>> magicfunction(3)
>>> [1]  0.6147933 0.3122999  0.0729068
>>
>> but the argument of the function is arbitrary , does somebody
>> know if exist this function in R?
>
>   As far as I know, no such function exists in R, but
>   it would be totally trivial to write one, if that's
>   what you really want.
>
>   However the question you pose makes little sense to me.  If
>   you really have a ``statisical model'' then there must be
>   some marginal distribution for each of the probabilities (I
>   *assume* they are probabilities) going into the sequence
>   which you wish to sum to 1.
>
>   You mention no such distribution.
>
>   To generate such a sequence with an arbitray marginal
>   distribution is so trivial that it does not bear discussing.
>
>   If you really can't see how to do this, then you probably
>   shouldn't be messing about with ``statistical models''.
>
>   You did not explicitly deny that this is a homework problem.
>
>   I still suspect that it is.
>
>   cheers,
>
>   Rolf Turner
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] counting a sequence of charactors or numbers

2006-09-30 Thread roger koenker
?rle
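
Applied to the sequence below -- a sketch, with the run lengths taken
from the question:

x <- rep(c("S", "W", "S", "W", "S", "W"), times = c(6, 8, 8, 8, 13, 9))
rle(x)$lengths
# [1]  6  8  8  8 13  9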

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Sep 30, 2006, at 5:13 PM, Joe Byers wrote:

> I have the following sequence of characters.  These could be  
> integers as
> well.  For this problem, only two values are valid.
>
> S S S S S S W W W W W W W W S S S S S S S S W W W W W W W W S S S S  
> S S
> S S S S S S S W W W W W W W W W
>
> I need to determine the run lengths of the groups in sequence, i.e.
> 6,8,8,8,13,9, where the sum of these equals my total observations.
>
> Any help is greatly appreciated.
>
> Thank you
> Joe
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Greedy triangulation

2006-09-14 Thread roger koenker
Or, perhaps, tripack?

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Sep 14, 2006, at 10:32 AM, Greg Snow wrote:

> Does the deldir package do what you want?
>
>
> --  
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [EMAIL PROTECTED]
> (801) 408-8111
>
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Dan Bebber
> Sent: Thursday, September 14, 2006 3:56 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Greedy triangulation
>
> Hello,
>
> does anyone have code that will generate a greedy triangulation
> (triangulation that uses shortest non-overlapping edges) for a set of
> points in Euclidean space?
>
> Thanks,
> Dan Bebber
> ___
> Dr. Daniel P. Bebber
> Department of Plant Sciences
> University of Oxford
> South Parks Road
> Oxford OX1 3RB
> UK
> Tel. 01865 275060
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Ranking and selection statistical procedure

2006-08-31 Thread roger koenker
Look at ?rank ?order and ?quantile  assuming that you are using
these terms as in cs.
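
A quick illustration of the three, with made-up numbers:

x <- c(3.1, 5.7, 1.2, 4.4)
rank(x)           # 2 4 1 3 : the rank of each element
order(x)          # 3 1 4 2 : the permutation that sorts x
quantile(x, .9)   # the 90th percentile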


url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Aug 31, 2006, at 5:20 AM, Prasanna BALAPRAKASH wrote:

> Dear R helpers
>
> I would like to know if the "Ranking and Selection" statistical
> procedure has been implemented in R. I made a quick search in the R
> packages list but I could not find it.
>
> Thanks in advance
> Prasanna
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can R compute the expected value of a random variable?

2006-08-27 Thread roger koenker
General questions elicit general answers; more specific questions
elicit more specific answers.For example,

 > exp(2+9/2)
[1] 665.1416
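
That number is the closed-form lognormal mean relevant to the thread
below: for X ~ lognormal(meanlog = 2, sdlog = 3), E[X] = exp(meanlog +
sdlog^2/2).  A numerical check (a sketch):

exp(2 + 3^2/2)                                      # 665.1416, exact
integrate(function(x) x * dlnorm(x, 2, 3), 0, Inf)  # agrees numerically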

url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Aug 27, 2006, at 11:52 AM, Paul Smith wrote:

> On 8/26/06, Mike Nielsen <[EMAIL PROTECTED]> wrote:
>> Yes.
>>
>>> Can R compute the expected value of a random variable?
>
> Mike: thank you very much indeed for your so insightful and complete
> answer. I have  meanwhile deepened my research and, as a consequence,
> I have found the following solution, which seems to work fine:
>
>> integrand <- function(x){x*dlnorm(x,meanlog=2,sdlog=3)}
>> integrate(integrand,-Inf, Inf)
> 665.146 with absolute error < 0.046
>>
>
> There is also a package apt to calculate expected values: it is called
> distrEx. (Thanks, Matthias.)
>
> Paul
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating trace of products

2006-08-14 Thread roger koenker
I would suspect that something simple like

sum(diag(crossprod(A,B)))

would be quite competitive...
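
For symmetric A and B the diagonal sum reduces to an elementwise
product, which skips forming the full matrix product -- a sketch:

trAB   <- sum(A * B)      # tr(AB) = sum_ij A_ij B_ij when B is symmetric
C      <- A %*% B
trABAB <- sum(C * t(C))   # tr(ABAB) = tr(CC) = sum_ij C_ij C_ji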

url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Aug 14, 2006, at 6:58 AM, Søren Højsgaard wrote:

> Dear all,
> I need to calculate tr(A B), tr(A B A B) and similar quantities  
> **fast** where the matrices A, B are symmetrical. I've searched for  
> built-in functions for that purpose, but without luck. Can anyone  
> help?
> Thanks in advance
> Søren
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding the position of a variable in a data.frame

2006-08-02 Thread roger koenker
it is the well-known wicked which problem:  if you had (grammatically  
incorrectly)
thought "... which I want to change" then you might have been led
to type (in another window):

?which

and you would have seen the light.  Maybe that() should be an alias
for which()?
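
In terms of the example below -- a sketch:

j <- which(names(Df) == "bat")   # the column number of 'bat'
i <- which(Df$bat >= 50)         # the row found by the subset
Df[i, j] <- 100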

url:www.econ.uiuc.edu/~roger    Roger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Aug 2, 2006, at 4:01 PM, John Kane wrote:

> Simple problem but I don't see the answer. I'm trying
> to clean up some data
> I have 120 columns in a data.frame.  I have one value
> in a column named "blaw" that I want to change. How do
> I find the coordinates. I can find the row by doing a
> subset on the data.frame but how do I find out here
> "blaw " is in columns without manually counting them
> or converting names(Df) to a list and reading down the
> list.
>
> Simple example
>
> cat <- c( 3,5,6,8,0)
> dog <- c(3,5,3,6, 0)
> rat <- c (5, 5, 4, 9, 0)
> bat <- c( 12, 42, 45, 32, 54)
>
> Df <- data.frame(cbind(cat, dog, rat, bat))
> Df
> subset(Df, bat >= 50)
>
> results
>   cat dog rat bat
> 5   0   0   0  54
>
>
> Thus I know that my target is in row 5 but how do I
> figure out where 'bat' is?
>
> All I want to do is be able to say
> Df[5,4] <- 100
>
> Is there some way to have function(bat) return the
> column number: some kind of a colnum() function?  I
> had thought that I had found something in
> library(gdata)'s matchcols, but no luck.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pseudo R for Quant Reg

2006-08-02 Thread roger koenker
This is getting to be a faq -- here is a prior answer:

> No, but the objective function can be computed for any fitted
> rq object, say f,  as
>
>   rho <- function(u,tau=.5)u*(tau - (u < 0))
>   V <- sum(rho(f$resid, f$tau))
>
> so it is easy to roll your own

I don't much like R1, or R2 for that matter, so it isn't likely to
be automatically provided in quantreg any time soon.
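
Spelled out, the Koenker-Machado R1 at quantile tau is one minus the
ratio of the unrestricted to the restricted objective values -- a
sketch, with f1 a fitted model and f0 the intercept-only fit at the
same tau (y, x and tau are placeholders):

rho <- function(u, tau = .5) u * (tau - (u < 0))
f1  <- rq(y ~ x, tau = tau)
f0  <- rq(y ~ 1, tau = tau)
R1  <- 1 - sum(rho(f1$resid, tau)) / sum(rho(f0$resid, tau))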


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Aug 1, 2006, at 11:46 AM, [EMAIL PROTECTED] wrote:

> Dear R Users,
>
> Has anyone implemented the R1 (pseudo R^2) and likelihood ratio
> statistics for quantile regressions, which are some of the inference
> procedures for quantile regression
> found in Koenker and Machado (1999)?
> I tried the Ox version, but my dataset is too large (> 50,000) and the
> algorithm breaks.
> 
> Ricardo Gonçalves Silva, M. Sc.
> Apoio aos Processos de Modelagem Matemática
> Econometria & Inadimplência
> Serasa S.A.
> (11) - 6847-8889
> [EMAIL PROTECTED]
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Warning Messages using rq -quantile regressions

2006-07-23 Thread roger koenker

On Jul 23, 2006, at 5:27 AM, roger koenker wrote:

> When computing the median from a sample with an even number of  
> distinct
> values there is inherently some ambiguity about its value:  any  
> value between
> the middle order statistics is "a" median.  Similarly, in  
> regression settings the
> optimization problem solved by the "br" version of the simplex  
> algorithm,
> modified to do general quantile regression identifies cases where  
> there may
> be non uniqueness of this type.  When there are "continuous"  
> covariates this
> is quite rare, when covariates are discrete then it is relatively  
> common, at
> least when tau is chosen from the rationals.  For univariate  
> quantiles R provides
> several methods of resolving this sort of ambiguity by  
> interpolation, "br" doesn't
> try to do this, instead returning the first vertex solution that it  
> comes to.  Should
> we worry about this?  My answer would be no.  Viewed from an  
> asymptotic
> perspective any choice of a unique value among the multiple  
> solutions is a
> 1/n perturbation  -- with 2500 observations this is unlikely to be  
> interesting.
> More to the point, inference about the coefficients of the model,  
> which provides
> O(1/sqrt(n)) intervals is perfectly capable of assessing the  
> meaningful uncertainty
> about these values.  Finally, if you would prefer an estimation  
> procedure that
> produced unique values more like the interpolation procedures in  
> the univariate
> setting, you could try the "fn" option for the algorithm.  Interior  
> point methods for
> solving linear programming problems have the "feature" that they  
> tend to converge
> to the centroid of solutions sets when such sets exist.  This  
> approach provides a
> means to assess the magnitude of the non-uniqueness in a particular  
> application.
>
> I hope that this helps,
>
> url:www.econ.uiuc.edu/~rogerRoger Koenker
> email   [EMAIL PROTECTED]   Department of  
> Economics
> vox:217-333-4558University of Illinois
> fax:217-244-6678Champaign, IL 61820
>
>
> On Jul 22, 2006, at 9:07 PM, Neil KM wrote:
>
>> I am new to using quantile regressions in R. I have estimated a
>> set of
>> coefficients using the method="br" algorithm with the rq command  
>> at various
>> quantiles along the entire distribution.
>>
>> My data set contains approximately 2,500 observations and I have 7  
>> predictor
>> variables. I receive the following warning message:
>>
>> Solution may be nonunique in: rq.fit.br(x, y, tau = tau, ...)
>>
>> There are 13 warnings of this type after I run a single model. My
>> results
>> are similar to the results I received in other stat programs
>> using quantile
>> reg procedures. I am unclear what these warning messages imply and  
>> if there
>> are problems with model fit/convergence that I may need to consider.
>> Any help would be appreciated. Thanks!
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting- 
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantreg error

2006-07-17 Thread roger koenker
As I have already told you once, and as the posting guide suggests,

"If the question relates to a contributed package , e.g., one  
downloaded from CRAN, try contacting the package maintainer first.  
You can also use find("functionname") and packageDescription 
("packagename") to find this information. Only send such questions to  
R-help or R-devel if you get no reply or need further assistance.  
This applies to both requests for help and to bug reports."

the error message seems quite clear:  it means that the model you have
specified implicitly with the formula has a singular (rank-deficient) X
matrix.  The quantile regression fitting functions don't know how to
handle singular designs; some day they may, but it isn't
a high priority for me.
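
A quick way to see the rank deficiency before calling rq -- a sketch,
using the names from your call:

X <- model.matrix(dep ~ ., data = exo)
qr(X)$rank < ncol(X)   # TRUE means the design is singular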


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820


On Jul 17, 2006, at 9:27 AM, [EMAIL PROTECTED] wrote:

> Dear User,
> I got the following error running a regression quantile:
>
>> rq1<-rq(dep ~ ., model=TRUE, data=exo, tau=0.5 );
>> summary(rq1)
> Erro em rq.fit.fnb(x, y, tau = tau + h) :
> Error info =  75 in stepy: singular design
>
> Any hint about the problem?
>
>
> Thanks a lot,
> 
> Ricardo Gonçalves Silva, M. Sc.
> Apoio aos Processos de Modelagem Matemática
> Econometria & Inadimplência
> Serasa S.A.
> (11) - 6847-8889
> [EMAIL PROTECTED]
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] package:Matrix handling of data with identical indices

2006-07-09 Thread roger koenker

On 7/8/06, Thaden, John J <[EMAIL PROTECTED]> wrote:

> As there is nothing inherent in either compressed, sparse,
> format that would prevent recognition and handling of
> duplicated index pairs, I'm curious why the dgCMatrix
> class doesn't also add x values in those instances?

why not multiply them?  or take the larger one, or ...?  I would
interpret this as a case of user negligence -- there is no
"natural" default behavior for such cases.

On Jul 9, 2006, at 11:06 AM, Douglas Bates wrote:

> Your matrix Mc should be flagged as invalid.  Martin and I should
> discuss whether we want to add such a test to the validity method.  It
> is not difficult to add the test but there will be a penalty in that
> it will slow down all operations on such matrices and I'm not sure if
> we want to pay that price to catch a rather infrequently occurring
> problem.

Elaborating the validity procedure to flag such instances seems
to be well worth the  speed penalty in my view.  Of course,
anticipating every such misstep imposes a heavy burden
on developers and constitutes the real "cost" of more elaborate
validity checking.

[My 2cents based on experience with SparseM.]

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] KhmaladzeTest

2006-07-08 Thread roger koenker
Questions about packages should be directed to the package maintainers.
A more concise example of the difficulty, with accessible data would  
also be helpful.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On Jul 7, 2006, at 7:39 PM, raul sanchez wrote:

> Hello. I am a beginner in R and I cannot implement the
> KhmaladzeTest in the following command. Please help me!!!
>   PS: I attach the results and the messages of the R program
>
>   R : Copyright 2006, The R Foundation for Statistical Computing
> Version 2.3.1 (2006-06-01)
> ISBN 3-900051-07-0
>
> R es un software libre y viene sin GARANTIA ALGUNA.
> Usted puede redistribuirlo bajo ciertas circunstancias.
> Escriba 'license()' o 'licence()' para detalles de distribucion.
>
> R es un proyecto colaborativo con muchos contribuyentes.
> Escriba 'contributors()' para obtener mas informacion y
> 'citation()' para saber como citar R o paquetes de R en publicaciones.
>
> Escriba 'demo()' para demostraciones, 'help()' para el sistema on-line
> de ayuda,
> o 'help.start()' para abrir el sistema de ayuda HTML con su navegador.
> Escriba 'q()' para salir de R.
>
>> utils:::menuInstallLocal()
> package 'quantreg' successfully unpacked and MD5 sums checked
> updating HTML package descriptions
>> utils:::menuInstallLocal()
> package 'foreign' successfully unpacked and MD5 sums checked
> updating HTML package descriptions
>> utils:::menuInstallLocal()
> package 'Rcmdr' successfully unpacked and MD5 sums checked
> updating HTML package descriptions
>> local({pkg <- select.list(sort(.packages(all.available = TRUE)))
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
>> local({pkg <- select.list(sort(.packages(all.available = TRUE)))
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
> quantreg package loaded:  To cite see citation("quantreg")
>> local({pkg <- select.list(sort(.packages(all.available = TRUE)))
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
>> local({pkg <- select.list(sort(.packages(all.available = TRUE)))
> + if(nchar(pkg)) library(pkg, character.only=TRUE)})
> Loading required package: tcltk
> Loading Tcl/Tk interface ... done
> --- Please select a CRAN mirror for use in this session ---
> also installing the dependencies 'acepack', 'scatterplot3d',  
> 'fBasics',
> 'Hmisc', 'quadprog', 'oz', 'mlbench', 'randomForest', 'SparseM',
> 'xtable', 'chron', 'fCalendar', 'its', 'tseries', 'DAAG', 'e1071',  
> 'mvtnorm',
> 'zoo', 'strucchange', 'sandwich', 'dynlm', 'leaps'
>
> trying URL 'http://cran.au.r-project.org/bin/windows/contrib/2.3/acepack_1.3-2.2.zip'
> Content type 'application/zip' length 55667 bytes
> opened URL
> downloaded 54Kb
>
> trying URL 'http://cran.au.r-project.org/bin/windows/contrib/2.3/scatterplot3d_0.3-24.zip'
> Content type 'application/zip' length 540318 bytes
> opened URL
> downloaded 527Kb
>
> trying URL 'http://cran.au.r-project.org/bin/windows/contrib/2.3/fBasics_221.10065.zip'
> Content type 'application/zip' length 3327499 bytes
> opened URL
> downloaded 3249Kb
>
> trying URL 'http://cran.au.r-project.org/bin/windows/contrib/2.3/Hmisc_3.0-12.zip'
> Content type 'application/zip' length 1993038 bytes
> opened URL
> downloaded 1946Kb
>
> trying URL 'http://cran.au.r-project.org/bin/windows/contrib/2.3/quadprog_1.4-8.zip'
> Content type 'application/zip' length 38626 bytes
> opened URL
> downloaded 37Kb
>
> trying URL 'http://cran.au.r-project.org/bin/windows/contrib/2.3/oz_1.0-13.zip'
> Content type 'application/zip' length 39640 bytes
> opened URL
> downloaded 38Kb
>
> trying URL 'http://cran.au.r-project.org/bin/windows/contrib/2.3/mlbench_1.1-1.zip'
> Content type 'application/zip' length 1324913 bytes
> opened URL
> downloaded 1293Kb
>
> trying URL 'http://cran.au.r-project.org/bin/windows/contrib/2.3/randomForest_4.5-16.zip'
> Content

Re: [R] sparse matrix, rnorm, malloc

2006-06-10 Thread roger koenker

As an example of how one might do this sort of thing in SparseM
ignoring the rounding aspect...

require(SparseM)
require(msm)   # for rtnorm
sm <- function(dim, rnd, q){
   # expected density is P(|Z| <= q) = 2*pnorm(q) - 1;
   # the rnd (rounding) argument is accepted but ignored here
   n  <- rbinom(1, dim * dim, 2 * pnorm(q) - 1)
   ia <- sample(dim, n, replace = TRUE)
   ja <- sample(dim, n, replace = TRUE)
   ra <- rtnorm(n, lower = -q, upper = q)
   A  <- new("matrix.coo", ia = as.integer(ia), ja = as.integer(ja),
             ra = ra, dimension = as.integer(c(dim, dim)))
   as.matrix.csr(A)
}
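A call matching the timings reported below would be:

     A <- sm(5000, rnd = 2, q = 0.03)   # about 2.4 percent nonzero
     dim(A)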

For dim = 5000 and q = .03 which exceeds Gavin's suggested  1 percent
density, this takes about 30 seconds on my imac and according to Rprof
about 95 percent of that (total) time is spent generating the  
truncated normals.
Word of warning:  pushing this too much further  gets tedious  since the
number of random numbers grows like dim^2.  For example, dim = 20,000
and q = .02 takes 432 seconds with again 93% of the total time spent in
rnorm and rtnorm...


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Jun 10, 2006, at 12:53 PM, g l wrote:

> Hi,
>
> I'm sorry for any cross-posting. I've reviewed the archives and could
> not find an exact answer to my question below.
>
> I'm trying to generate very large sparse matrices (< 1% non-zero
> entries per row). I have a sparse matrix function below which works
> well until the row/col count exceeds 10,000. This is being run on a
> machine with 32G memory:
>
> sparse_matrix <- function(dims,rnd,p) {
>  ptm <- proc.time()
>  x <- round(rnorm(dims*dims),rnd)
>  x[((abs(x) - p) < 0)] <- 0
>  y <- matrix(x,nrow=dims,ncol=dims)
>  proc.time() - ptm
> }
>
> When trying to generate the matrix around 20,000 rows/cols on a
> machine with 32G of memory, the error message I receive is:
>
> R(335) malloc: *** vm_allocate(size=324096) failed (error code=3)
> R(335) malloc: *** error: can't allocate region
> R(335) malloc: *** set a breakpoint in szone_error to debug
> R(335) malloc: *** vm_allocate(size=324096) failed (error code=3)
> R(335) malloc: *** error: can't allocate region
> R(335) malloc: *** set a breakpoint in szone_error to debug
> Error: cannot allocate vector of size 3125000 Kb
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question:  on machine w/32G memory, why
> can't it allocate a vector of size 3125000 Kb?
>
> When trying to generate the matrix around 30,000 rows/cols, the error
> message I receive is:
>
> Error in rnorm(dims * dims) : cannot allocate vector of length  
> 9
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Last error line is obvious. Question: is this 9 bytes?
> kilobytes? This error seems to be specific now to rnorm, but it
> doesn't indicate the length metric (b/Kb/Mb) as it did for 20,000
> rows/cols. Even if this Mb, why can't this be allocated on a machine
> with 32G free memory?
>
> When trying to generate the matrix with over 50,000 rows/cols, the
> error message I receive is:
>
> Error in rnorm(n, mean, sd) : invalid arguments
> In addition: Warning message:
> NAs introduced by coercion
> Error in round(rnorm(dims * dims), rnd) : unable to find the argument
> 'x' in selecting a method for function 'round'
>
> * Same.
>
> Why would it generate different errors in each case? Code fixes? Any
> simple ways to generate sparse matrices which would avoid above
> problems?
>
> Thanks in advance,
>
> Gavin
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] sparse matrix, rnorm, malloc

2006-06-10 Thread roger koenker
You need to look at the packages specifically designed  for
sparse matrices:  SparseM and Matrix.
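For example, with the Matrix package one can write (a sketch --
rsparsematrix() comes from Matrix releases more recent than this
thread, so treat the call as an assumption):

     library(Matrix)
     A <- rsparsematrix(20000, 20000, density = 0.01)   # ~4e6 nonzeros
     print(object.size(A), units = "Mb")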


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Jun 10, 2006, at 12:53 PM, g l wrote:

> [original message snipped -- it is quoted in full in the previous reply]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Re-binning histogram data

2006-06-09 Thread roger koenker
On Jun 9, 2006, at 7:38 AM, Duncan Murdoch wrote:
>
> Now, if you were to suggest that the stem() function is a bizarre
> simulation of a stone-age tool on a modern computer, I might agree.
>

But as a stone-age (blackboard) tool it is unsurpassed.  It is the only
bright spot in the usually depressing ritual of returning exam
results.  Full disclosure of the distribution in a very concise
encoding.
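For example, with some made-up exam scores:

     stem(round(rnorm(40, mean = 72, sd = 10)))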

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R crashes on quantreg

2006-06-07 Thread roger koenker
R-help doesn't  foward attached data files like this, but Brian
kindly forwarded it to me.

You need to restrict X so that it is full rank; it now has
rank 19 and column dimension 29 (with intercept).  See
for example svd(cbind(1,x)).

I'll add some better checking for this, but it will basically amount
to setting singular.ok = FALSE in lm() and forcing users to do
the rank reduction themselves.


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Jun 7, 2006, at 3:05 PM, Mu Tian wrote:

> I attached the data file here. I restarted the PC but it still  
> happens. It
> says a memory address could not be written. I am not sure it is a  
> problem of
> R or quantreg but I plot without problems before I load quantreg.
>
> Thank you.
>
> Tian
>
> On 6/7/06, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:
>>
>> Without y and x we cannot reproduce this.
>>
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>>
>> On Wed, 7 Jun 2006, Mu Tian wrote:
>>
>> > I forgot to mention my R version is 2.3.1 and quantreg is the most
>> updated
>> > too.
>>
>> It has a version number, which the posting guide tells you how to  
>> find.
>>
>> > On 6/7/06, Mu Tian <[EMAIL PROTECTED]> wrote:
>> >>
>> >>  I was trying "quantreg" package,
>> >>
>> >> lm1 <- lm(y~x)
>> >> rq1 <- rq(y~x)
>> >> plot(summary(rq1)) #then got a warning says singular value,  
>> etc. but
>> this
>> >> line can be omited
>> >> plot(lm1) #crash here
>> >>
>> >> It happened every time on my PC, Windows XP Pro Serv. Pack 1,
>> Pentium(4)
>> >> 3.00G.
>> >>
>> >
>> >   [[alternative HTML version deleted]]
>>
>>
>>
>> --
>> Brian D. Ripley,  [EMAIL PROTECTED]
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel:  +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UKFax:  +44 1865 272595
>>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R crashes on quantreg

2006-06-07 Thread roger koenker
Since the "crash" occurs plotting the lm object it is unclear what
this has to do with quantreg, but maybe you could explain

1.  what you mean by crash,
2.  something about x,y,

This is best addressed to the maintainer of the package rather than to
R-help, provided, of course, that it is really a question about  
quantreg.

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Jun 7, 2006, at 2:32 PM, Mu Tian wrote:

> I was trying "quantreg" package,
>
> lm1 <- lm(y~x)
> rq1 <- rq(y~x)
> plot(summary(rq1)) #then got a warning says singular value, etc.  
> but this
> line can be omited
> plot(lm1) #crash here
>
> It happened every time on my PC, Windows XP Pro Serv. Pack 1,  
> Pentium(4)
> 3.00G.
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] (no subject)

2006-05-16 Thread roger koenker
an upgrade:   from the flintstones -- to the michelin  man...


On May 16, 2006, at 4:40 PM, Thomas Lumley wrote:

> On Tue, 16 May 2006, roger koenker wrote:
>
>> In ancient times, 1999 or so, Alvaro Novo and I experimented with an
>> interface to mysql that brought chunks of data into R and accumulated
>> results.
>> This is still described and available on the web in its original  
>> form at
>>
>>  http://www.econ.uiuc.edu/~roger/research/rq/LM.html
>>
>> Despite claims of "future developments" nothing emerged, so anyone
>> considering further explorations with it may need training in
>> Rchaeology.
>
> A few hours ago I submitted to CRAN a package "biglm" that does large
> linear regression models using a similar strategy (it uses  
> incremental QR
> decomposition rather than accumulating the crossproduct matrix). It  
> also
> computes the Huber/White sandwich variance estimate in the same single
> pass over the data.
>
> Assuming I haven't messed up the package checking it will appear
> in the next couple of day on CRAN. The syntax looks like
>a <- biglm(log(Volume) ~ log(Girth) + log(Height), chunk1)
>a <- update(a, chunk2)
>a <- update(a, chunk3)
>summary(a)
>
> where chunk1, chunk2, chunk3 are chunks of the data.
>
>
>   -thomas
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Re : Large database help

2006-05-16 Thread roger koenker
In ancient times, 1999 or so, Alvaro Novo and I experimented with an
interface to mysql that brought chunks of data into R and accumulated  
results.
This is still described and available on the web in its original form at

http://www.econ.uiuc.edu/~roger/research/rq/LM.html

Despite claims of "future developments" nothing emerged, so anyone
considering further explorations with it may need training in  
Rchaeology.

The toy problem we were solving was a large least squares problem,
which was a stalking horse for large quantile regression  problems.   
Around the same
time I discovered sparse linear algebra and realized that virtually all
large problems that I was interested in were better handled in from
that perspective.

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On May 16, 2006, at 3:57 PM, Robert Citek wrote:

>
> On May 16, 2006, at 11:19 AM, Prof Brian Ripley wrote:
>> Well, there *is* a manual about R Data Import/Export, and this does
>> discuss using R with DBMSs with examples.  How about reading it?
>
> Thanks for the pointer:
>
>http://cran.r-project.org/doc/manuals/R-data.html#Relational-
> databases
>
> Unfortunately, that manual doesn't really answer my question.  My
> question is not about how do I make R interact with a database, but
> rather how do I make R interact with a database containing large sets.
>
>> The point being made is that you can import just the columns you
>> need, and indeed summaries of those columns.
>
> That sounds great in theory.  Now I want to reduce it to practice.
> In the toy problem from the previous post, how can one compute the
> mean of a set of 1e9 numbers?  R has some difficulty generating a
> billion (1e9) number set let alone taking the mean of that set.  To  
> wit:
>
>bigset <- runif(1e9,0,1e9)
>
> runs out of memory on my system.  I realize that I can do some fancy
> data shuffling and hand-waving to calculate the mean.  But I was
> wondering if R has a module that already abstracts out that magic,
> perhaps using a database.
>
> Any pointers to more detailed reading is greatly appreciated.
>
> Regards,
> - Robert
> http://www.cwelug.org/downloads
> Help others get OpenSource software.  Distribute FLOSS
> for Windows, Linux, *BSD, and MacOS X with BitTorrent
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Polygon-like interactive selection of plotted points

2006-04-26 Thread roger koenker
?in.convex.hull  in the package tripack.


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Apr 26, 2006, at 1:25 PM, Marc Schwartz (via MN) wrote:

> On Wed, 2006-04-26 at 18:13 +0100, Florian Nigsch wrote:
>> [Please CC me for all replies, since I am not currently subscribed to
>> the list.]
>>
>> Hi all,
>>
>> I have the following problem/question: Imagine you have a two-
>> dimensional plot, and you want to select a number of points, around
>> which you could draw a polygon. The points of the polygon are defined
>> by clicking in the graphics window (locator()/identify()), all points
>> inside the polygon are returned as an object.
>>
>> Is something like this already implemented?
>>
>> Thanks a lot in advance,
>>
>> Florian
>
> I don't know if anyone has created a single function do to this  
> (though
> it is always possible).
>
> However, using:
>
>   RSiteSearch("points inside polygon")
>
> brings up several function hits that, if put together with the above
> interactive functions, could be used to do what you wish. That is,  
> input
> the matrix of x,y coords of the interactively selected polygon and the
> x,y coords of the underlying points set to return the points inside or
> outside the polygon boundaries.
>
> Just as an FYI, you might also want to look at ?chull, which is in the
> base R distribution and returns the set of points on the convex  
> hull of
> the underlying point set. This is to some extent, the inverse of what
> you wish to do.
>
> HTH,
>
> Marc Schwartz
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Heteroskedasticity in Tobit models

2006-04-25 Thread roger koenker
Powell's quantile regression method is available in the quantreg
package  rq(..., method="fcen", ...)


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Apr 25, 2006, at 2:07 PM, Alan Spearot wrote:

> Hello,
>
> I've had no luck finding an R package that has the ability to  
> estimate a
> Tobit model allowing for heteroskedasticity (multiplicative, for  
> example).
> Am I missing something in survReg?  Is there another package that I'm
> unaware of?  Is there an add-on package that will test for
> heteroskedasticity?
>
> Thanks for your help.
>
> Cheers,
> Alan Spearot
>
> --
> Alan Spearot
> Department of Economics
> University of Wisconsin - Madison
> [EMAIL PROTECTED]
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Handling large dataset & dataframe

2006-04-24 Thread roger koenker
You can read chunks of it at a time and store it in sparse matrix
form using the packages SparseM or Matrix, but then you need
to think about what you want to do with it; least squares sorts
of things are ok, but other options are somewhat limited...
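A sketch of the least squares case (hypothetical file name and layout,
with the response in the first column and no header):

     con <- file("bigdata.dat", open = "r")
     XtX <- 0; Xty <- 0
     repeat {
       d <- tryCatch(read.table(con, nrows = 50000),
                     error = function(e) NULL)   # NULL at end of file
       if (is.null(d) || nrow(d) == 0) break
       X <- cbind(1, as.matrix(d[, -1]))         # intercept + predictors
       y <- d[, 1]
       XtX <- XtX + crossprod(X)                 # accumulate X'X
       Xty <- Xty + crossprod(X, y)              # accumulate X'y
     }
     close(con)
     beta <- solve(XtX, Xty)                     # normal equations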


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Apr 24, 2006, at 12:41 PM, Sachin J wrote:

> Hi,
>
>   I have a dataset consisting of 350,000 rows and 266 columns.  Out  
> of 266 columns 250 are dummy variable columns. I am trying to read  
> this data set into R dataframe object but unable to do it due to  
> memory size limitations (object size created is too large to handle  
> in R).  Is there a way to handle such a large dataset in R.
>
>   My PC has 1GB of RAM, and 55 GB harddisk space running windows XP.
>
>   Any pointers would be of great help.
>
>   TIA
>   Sachin
>
>   
> -
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Guidance on step() with large dataset (750K) solicited...

2006-04-13 Thread roger koenker
Jeff,

I don't know whether this is likely to be feasible, but if you could
replace calls to lm() with calls to a sparse matrix version of lm() --
either slm() in SparseM or something similar in Matrix -- then I would
think that you should be safe from memory problems.  Adapting step()
might be more than you really bargained for, though; I don't know the
code.

Roger

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Apr 13, 2006, at 2:41 PM, Jeffrey Racine wrote:

> Hi.
>
> Background - I am working with a dataset involving around 750K
> observations, where many of the variables (8/11) are unordered  
> factors.
>
> The typical model used to model this relationship in the literature  
> has
> been a simple linear additive model, but this is rejected out of  
> hand by
> the data. I was asked to model this via kernel methods, but first  
> wanted
> to play with the parametric specification out of curiosity.
>
> I thought it would be interesting to see what type of model  
> stepwise BIC
> would yield, and have been playing with the step() function (on R-beta
> due to the factor.scope() problem that has been fixed in the  
> patched and
> beta version).
>
> I am running this on a 64bit box with 32GB of RAM and tons of swap,  
> but
> am hitting the memory wall as occasionally memory needs grow to  
> ungodly
> proportions (in the early iterations the program starts out around 8GB
> but quickly grows to 15GB, then grows from there). This is not due  
> to my
> using the beta version, as this also arises under R-2.2.1 for what  
> that
> is worth.
>
> My question is whether or not there is some simple way to  
> substantially
> reduce the memory footprint for this procedure. I took a look at
> previous posts for step() and memory issues, but still wonder whether
> there might be a switch or possibly better way of constructing my  
> model
> that would overcome the memory issues.
>
> I include the code below, and any comments or suggestions would be  
> most
> welcome (besides `what type of idiot lets information criteria  
> determine
> their model ;-)')
>
> Thanks ever so much in advance.
>
> -- Jeff
>
>  Begin 
>
> ## Read in the full data set (n=745466 observations)
>
> data <- read.table("../data_header.dat",header=TRUE)
>
> ## Create a data frame with all categorical variables declared as
> ## unordered factors
>
> data <- data.frame(logrprice=data$logrprice,
>cgt=factor(data$cgt),  
>cag=factor(data$cag),
>gstann=factor(data$gstann),
>fhogann=factor(data$fhogann),
>gstfhog=factor(data$gstfhog),
>luc=factor(data$luc),
>municipality=factor(data$municipality),
>time=factor(data$time),
>distance=data$distance,
>logr=data$logr,
>loginc=data$loginc)
>
> ## Estimate a simple linear model (used repeatedly in the literature,
> ## fails the most simple of model specification tests e.g.,
> ## resettest())
>
> model.linear <- lm(logrprice~.,data=data)
>
> ## Now conduct stepwise (BIC) regression using the step() function in
> ## the stats library. The lower model is the unconditional mean of y,
> ## the upper having polynomials of up to order 6 in the three
> ## continuous covariates, with interaction among all variables of
> ## order 2.
>
> n <- nrow(data)
>
> model.bic <- step(model.linear,
>   scope=list(
> lower=~ 1,
> upper=~ (.
>  +I(logr^2)
>  +I(logr^3)
>  +I(logr^4)
>  +I(logr^5)
>  +I(logr^6)
>  +I(distance^2)
>  +I(distance^3)
>  +I(distance^4)
>  +I(distance^5)
>  +I(distance^6)
>  +I(loginc^2)
>  +I(loginc^3)
>  +I(loginc^4)
>  +I(loginc^5)
>  +I(loginc^6))
> ^2),
>   trace=TRUE,
>   k=log(n)
>   )
>
> summary(model.bic)
>
>  End 
> -- 
> Professor J. S. Racine Pho

Re: [R] problem for wtd.quantile()

2006-03-16 Thread roger koenker
Certainly an improvement, but probably not what is really
wanted... I get:

 > rq(x ~ 1, weights=w,tau = c(.01,.25,.5,.75,.99))
Call:
rq(formula = x ~ 1, tau = c(0.01, 0.25, 0.5, 0.75, 0.99), weights = w)

Coefficients:
            tau= 0.01 tau= 0.25 tau= 0.50 tau= 0.75 tau= 0.99
(Intercept)         1         1         2         3         5

Degrees of freedom: 5 total; 4 residual

The first observation x=1 has weight .33  so it should be the
.25 quantile, unless there is some interpolation going on

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Mar 16, 2006, at 7:34 AM, Liaw, Andy wrote:

> Perhaps you're looking for this?
>
>> ?wtd.quantile
>> wtd.quantile(x,weights=w, normwt=TRUE)
>   0%  25%  50%  75% 100%
>    1    2    2    3    5
>
> Andy
>
> From: Jing Yang
>>
>> Dear R-users,
>>
>> I don't know if there is a problem in wtd.quantile (from
>> library "Hmisc"):
>> 
>> x <- c(1,2,3,4,5)
>> w <- c(0.5,0.4,0.3,0.2,0.1)
>> wtd.quantile(x,weights=w)
>> ---
>> The output is:
>>   0%  25%  50%  75% 100%
>> 3.00 3.25 3.50 3.75 4.00
>>
>> The version of R I am using is: 2.1.0
>>
>> Best,Jing
>>
>>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] running median and smoothing splines for robust surface f itting

2006-03-16 Thread roger koenker
Andy's comment gives me an excuse to mention that rqss() in
my quantreg package does median smoothing for 1d and 2d functions,
and additive models involving such functions, using total
variation of f' and grad f as roughness penalties.  Further
references are available from ?rqss.
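For the 2d case the example from ?rqss runs roughly like this (from
memory -- check the help page):

     library(quantreg)
     data(CobarOre)
     fit <- rqss(z ~ qss(cbind(x, y), lambda = 0.08), data = CobarOre)
     fit   # see ?plot.rqss for rendering the fitted surface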

Roger

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Mar 16, 2006, at 6:13 AM, Liaw, Andy wrote:

> loess() should be able to do robust 2D smoothing.
>
> There's no natural ordering in 2D, so defining running medians can be
> tricky.  I seem to recall Prof. Koenker talked about some robust 2D
> smoothing method at useR! 2004, but can't remember if it's  
> available in some
> packages.
>
> Andy
>
> From: Vladislav Petyuk
>>
>> Hi,
>> Are there any multidimenstional versions of runmed() and
>> smooth.spline() functions? I need to fit surface into quite
>> noisy 3D data.
>>
>> Below is an example (2D) of kind of fittings I do.
>> Thank you,
>> Vlad
>>
> #=generating complex x,y dataset with gaussian & uniform noise==
> x <- seq(1:1)
> x2 <- rep(NA,2*length(x))
> y2 <- rep(NA,2*length(x))
> x2[seq(1,length(x2),2)] <- x
> x2[seq(2,length(x2),2)] <- x
> y2[seq(1,length(x2),2)] <- sin(4*pi*x/length(x)) + rnorm(length(x))
> y2[seq(2,length(x2),2)] <- runif(length(x),min=-5,max=5)
>> #===
>>
>> #=robust & smooth fit===
> y3 <- runmed(y2,51,endrule="median")  # first round: running median
> y4 <- smooth.spline(x2,y3,df=10)      # second round: smoothing splines
>> #===
>>
>> #=ploting data==
>> plot(x2,y2,pch=19,cex=0.1)
>> points(x2,y3,col="red",pch=19,cex=0.1) #running median
>> points(y4,col="green",pch=19,cex=0.1) #smoothing splines
>> #===
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>>
>>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] transforming data frame for use with persp

2006-02-13 Thread roger koenker

a strategy for this that I use is just

persp(interp(x,y,z))

where interp is from the akima package, and x,y,z are all
of the same length.
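With the prediction grid constructed below that becomes, e.g. (a
sketch, assuming the akima package is installed):

     library(akima)
     with(pred.data, persp(interp(Depth, Temp, predgam)))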


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Feb 13, 2006, at 3:07 PM, Denis Chabot wrote:

> Hi,
>
> This is probably documented, but I cannot find the right words or
> expression for a search. My attempts failed.
>
> I have a data frame of 3 vectors (x, y and z) and would like to
> transform this so that I could use persp. Presently I have y-level
> copies of each x level, and a z value for each x-y pair. I need 2
> columns giving the possible levels of x and y, and then a
> transformation of z from a long vector into a matrix of x-level rows
> and y-level columns. How do I accomplish this?
>
> In this example, I made a set of x and y values to get predictions
> from a GAM, then combined them with the predictions into a data
> frame. This is the one I'd like to transform as described above:
>
> My.data <- expand.grid(Depth=seq(40,220, 20), Temp=seq(-1, 6, 0.5))
> predgam <- predict.gam(dxt.gam, My.data, type="response")
> pred.data <- data.frame(My.data, predgam)
>
> pred.data has 150 lines and 3 columns.
>
> Thanks for your help,
>
> Denis Chabot
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] rob var/cov + LAD regression

2006-02-08 Thread roger koenker

On Feb 8, 2006, at 10:22 AM, Angelo Secchi wrote:

>
> 1. Is there a function to get a "jackknife-corrected var/cov
> estimate" (as described in MacKinnon and White 1985) in a standard
> OLS regression?

package:  sandwich
>
> 2. Does R possess a LAD (Least Absolute Deviation) regression  
> function?
>
package:  quantreg
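Quick sketches of both, using the built-in stackloss data:

     library(sandwich); library(quantreg)
     fit <- lm(stack.loss ~ ., data = stackloss)
     vcovHC(fit, type = "HC3")  # HC3 approximates the MacKinnon-White jackknife
     rq(stack.loss ~ ., tau = 0.5, data = stackloss)  # LAD = median regression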

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] appeal --- add sd to summary for univariates

2006-02-06 Thread roger koenker

On Feb 6, 2006, at 2:34 PM, ivo welch wrote:
>
> Aside, a logical ordering might also be:
>mean sd min q1 med q3 max
> rather than have mean buried in between order statistics.

Just where it belongs, IMHO

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Tobit estimation?

2006-01-19 Thread roger koenker
For adventurous, but skeptical souls who lack faith in the usual
Gaussian tobit assumptions, I could mention that there is new
"fcen"  method for the quantreg rq() function that implements
Powell's tobit estimator using an algorithm of Bernd Fitzenberger.


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Jan 19, 2006, at 6:04 AM, Achim Zeileis wrote:

> On Thu, 19 Jan 2006 14:05:58 +0530 Ajay Narottam Shah wrote:
>
>> Folks,
>>
>> Based on
>>   http://www.biostat.wustl.edu/archives/html/s-news/1999-06/ 
>> msg00125.html
>>
>> I thought I should experiment with using survreg() to estimate tobit
>> models.
>
> I've been working on a convenience interface to survreg() that  
> makes it
> particularly easy to fit tobit models re-using the survreg()
> infrastructure. The package containing the code will hopefully be
> release soon - anyone who wants a devel snapshot, please contact me
> off-list.
> Ajay, I'll send you the code in a separate mail.
>
> Best,
> Z
>
>> I start by simulating a data frame with 100 observations from a tobit
>> model
>>
>>> x1 <- runif(100)
>>> x2 <- runif(100)*3
>>> ystar <- 2 + 3*x1 - 4*x2 + rnorm(100)*2
>>> y <- ystar
>>> censored <- ystar <= 0
>>> y[censored] <- 0
>>> D <- data.frame(y, x1, x2)
>>> head(D)
>           y         x1        x2
>> 1 0.000 0.86848630 2.6275703
>> 2 0.000 0.88675832 1.7199261
>> 3 2.7559349 0.38341782 0.6247869
>> 4 0.000 0.02679007 2.4617981
>> 5 2.2634588 0.96974450 0.4345950
>> 6 0.6563741 0.92623096 2.4983289
>>
>>> # Estimate it
>>> library(survival)
>>> tfit <- survreg(Surv(y, y>0, type='left') ~ x1 + x2,
>>   data=D, dist='gaussian', link='identity')
>>
>> It says:
>>
>>   Error in survreg.control(...) : unused argument(s) (link ...)
>>   Execution halted
>>
>> My competence on library(survival) is zero. Is it still the case that
>> it's possible to be clever and estimate the tobit model using
>> library(survival)?
>>
>> I also saw the two-equation setup in the micEcon library. I haven't
>> yet understood when I would use that and when I would use a straight
>> estimation of a censored regression by MLE. Can someone shed light on
>> that? My situation is: Foreign investment on the Indian stock
>> market. Lots of firms have zero foreign investment. But many do have
>> foreign investment. I thought this is a natural tobit situation.
>>
>> -- 
>> Ajay Shah
>> http://www.mayin.org/ajayshah
>> [EMAIL PROTECTED]
>> http://ajayshahblog.blogspot.com <*(:-? - wizard who doesn't know the
>> answer.
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] I think simple R question

2006-01-12 Thread roger koenker
see ?rle
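For your example below, one way to put rle() to work (a sketch; I
assume runs of length >= H should be marked):

     x <- c(1, -1, 1, 1, 1, -1, 1, 1, -1, -1)
     H <- 3
     r <- rle(x)
     out  <- numeric(length(x))
     ends <- cumsum(r$lengths)        # last index of each run
     hit  <- which(r$lengths >= H)    # runs of length at least H
     pos  <- ends[hit] + 1            # the spot just after each such run
     keep <- pos <= length(x)         # drop a run that ends the vector
     out[pos[keep]] <- r$values[hit][keep]
     out   # 0 0 0 0 0 1 0 0 0 0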


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Jan 12, 2006, at 9:56 AM, Mark Leeds wrote:

> I have a vector x with #'s (1 or -1) in it, and I want to
> "mark" a new vector, at the next spot in the vector, with the
> sign of the value of a streak of H, where H = some number.
>
> So, say H was equal to 3 and
> I had a vector of
>
> [1]  [2]  [3]  [4]  [5]   [6]  [7]  [8]  [9]  [10]
>
>  1   -1    1    1    1   -1    1    1   -1   -1
>
> then, I would want a function to return a new
> vector of
>
>
> [1]  [2]  [3]  [4]  [5]   [6]  [7]  [8]  [9]  [10]
>
>  0    0    0    0    0    1    0    0    0    0
>
> As I said, I used to do these things like this
> it's been a while and I'm rusty with this stuff.
>
> Without looping is preferred but looping is okay
> also.
>
>Mark
>
> **
> This email and any files transmitted with it are confidentia... 
> {{dropped}}
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] update?

2006-01-02 Thread roger koenker
I'm having problems with environments and update() that
I expect have a simple explanation.  To illustrate, suppose
I wanted to make a very primitive Tukey one-degree-of-
freedom for nonadditivity test and naively wrote:

nonadd <- function(formula){
 f <- lm(formula)
 v <- f$fitted.values^2
 g <- update(f, . ~ . + v)
 anova(f,g)
 }

x <- rnorm(20)
y <- rnorm(20)
nonadd(y ~ x)

Evidently, update is looking in the environment producing f and
doesn't find v, so I get:

Error in eval(expr, envir, enclos) : Object "v" not found

This may (or may not) be related to the discussion at:
http://bugs.r-project.org/cgi-bin/R/Models?id=1861;user=guest

but in any case I hope that someone can suggest how such
difficulties can be circumvented.
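For what it's worth, carrying the data along explicitly does seem to
circumvent it (a sketch), though I would still like to understand why
the original version fails:

     nonadd2 <- function(formula, data){
       f <- lm(formula, data = data)
       data$v <- fitted(f)^2
       g <- update(f, . ~ . + v, data = data)
       anova(f, g)
     }
     nonadd2(y ~ x, data.frame(x = x, y = y))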


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] GLM Logit and coefficient testing (linear combination)

2005-12-18 Thread roger koenker
see ?anova.glm
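anova.glm() compares nested fits.  For H0: beta1 = beta2, a sketch
(with made-up names) is to impose the common coefficient and test:

     fit1 <- glm(y ~ x1 + x2, family = binomial)      # unrestricted
     fit0 <- glm(y ~ I(x1 + x2), family = binomial)   # imposes beta1 = beta2
     anova(fit0, fit1, test = "Chisq")                # likelihood ratio test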

On Dec 18, 2005, at 10:32 AM, David STADELMANN wrote:

> Hi,
>
> I am running glm logit regressions with R and I would like to test a
> linear combination of coefficients (H0: beta1=beta2 against H1:
> beta1<>beta2). Is there a package for such a test or how can I perform
> it otherwise (perhaps with logLik() ???)?
>
> Additionally I was wondering if there was no routine to calculate  
> pseudo
> R2s for logit regressions. Currently I am calculating the pseudo R2 by
> comparing the maximum value of the log-Likelihood-function of the  
> fitted
> model with the maximum log-likelihood-function of a model containing
> only a constant. Any better ideas?
>
> Thanks a lot for your help.
> David
>
> ##
> David Stadelmann
> Seminar für Finanzwissenschaft
> Université de Fribourg
> Bureau F410
> Bd de Pérolles 90
> CH-1700 Fribourg
> SCHWEIZ
>
> Tel: +41 (026) 300 93 82
> Fax: +41 (026) 300 96 78
> Tel (priv): +41 (044) 586 78 99
> Mob (priv): +41 (076) 542 33 48
> Email: [EMAIL PROTECTED]
> Internet: http://www.unifr.ch/finwiss
> Internet (priv): http://david.stadelmann-online.com
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] quantile regression problem

2005-12-10 Thread roger koenker
Since almost all (95%) of the observations are concentrated at x=0  
and x=1,
any fitting you do is strongly influenced by what would be obtained
by simply fitting quantiles at these two points and interpolating, and
extrapolating according to your favored model.  I did the following:

require(quantreg)
formula <- log(y) ~ x

plot(x, y)
z <- 1:30/10                    # grid on which to draw the fitted curves
for(tau in 10:19/20){           # the upper quantiles 0.50, 0.55, ..., 0.95
  f <- rq(formula, tau = tau)
  lines(z, exp(cbind(1, z) %*% f$coef))   # undo the log transform
}


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Dec 10, 2005, at 11:30 AM, [EMAIL PROTECTED] wrote:

> Dear List members,
>
> I would like to ask for advise on quantile regression in R.
>
> I am trying to perform an analysis of a relationship between  
> species abundance and its habitat requirements -
> the habitat requirements are, however, codes - 0,1,2,3... where  
> 0<1<2<3 and the scale is linear - so I would be happy to treat them  
> as continuos
>
> The analysis of the data somehow does not work, I am trying to  
> perform linear quantile regression using rq function and I cannot  
> figure out whether there is a way to analyse the data using  
> quantile regression ( I would really like to do this since the  
> shape is an envelope) or whether it is not possible.
>
> I tested that if I replace the categories with continuous data of  
> the same range it works perfectly. In the form I have them ( and I  
> cannot change it) I am getting
>  errors - mainly about non-positive fis.
>
> Could somebody please let me know whether there was a way to  
> analyse the data?
> The data are enclosed and the question is
> Is there a relationship between abundance and absdeviation?
> I am interested in the upperlimit so I wanted to analyze the upper 5%.
>
> Thanks a lot for your help
>
> All the best
>
> Zuzana Munzbergova
>
> www.natur.cuni.cz/~zuzmun
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Matrix of dummy variables from a factor

2005-12-06 Thread roger koenker

On Dec 6, 2005, at 3:27 PM, Berton Gunter wrote:

> But note: There are (almost?) no situations in R where the dummy  
> variables
> coding is needed. The coding is (almost?) always handled properly  
> by the
> modeling functions themselves.
>
> Question: Can someone provide a "straightforward" example where the  
> dummy
> variable coding **is** explicitly needed?
>

Bert's question offers an opportunity for me to mention (again) my
long-standing wish for someone to write a version of model.matrix
that directly produced a matrix in one of the common sparse matrix
formats.  This could be a good project for one of you who like
using ";" ?
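As a sketch of the interface I have in mind (sparse.model.matrix()
here is hypothetical -- it is the wished-for function):

     d <- data.frame(f = gl(4, 2), x = rnorm(8))
     X <- sparse.model.matrix(~ f + x, data = d)   # desired: a dgCMatrix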

Roger

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Closed form for regression splines

2005-12-05 Thread roger koenker
you can do:

X <- model.matrix(formula, data = your.data)


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Dec 5, 2005, at 7:36 AM, Stephen A Roberts wrote:

>
> Greetings,
>
> I have a model fitted using bs() and need to be able to write down  
> a closed form for the spline function to enable the use of the  
> fitted model outside R. Does anyone know a simple way of extracting  
> the piecewise cubics from the coefficients and knots? As far as I  
> know they are defined by recurrence relationships, but the R  
> implementation is buried in C code, and I guess in non-trivial to  
> invert. I know about predict.bs() within R, but I want the full  
> piecewise cubic.
>
> Steve.
>
>   Dr Steve Roberts
>   [EMAIL PROTECTED]
>
> Senior Lecturer in Medical Statistics,
> Biostatistics Group,
> University of Manchester,
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] open source and R

2005-11-13 Thread roger koenker
On Nov 13, 2005, at 3:24 PM, Robert wrote:

> I am curious about one thing: since the reason for using R is that R
> is an easy-to-learn language and it is good for getting more people
> involved, why do most of the packages written in R use other
> languages such as FORTRAN? I understand some functions have already
> been written in another language, or are faster when implemented in
> another language. But my understanding is that if the user does not
> know that language (for example, FORTRAN), the package is still a
> black box to him, because he cannot improve the package and cannot
> be involved in the development.
> When I searched the packages of R, I saw many packages with
> duplicated or similar functions. The main differences among them
> are the different functions implemented using other languages,
> which are always a black box to the users. So it is very hard for
> users to believe the package will run something they need, let
> alone get involved in the development.

No, the box is not black, it is utterly transparent.  Of course, what
you can recognize and understand inside depends on you.  Just say "no"
to linguistic chauvinism -- even R-ism.

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Robust Non-linear Regression

2005-11-13 Thread roger koenker
you might consider nlrq() in the quantreg package, which does median
regression for nonlinear response functions
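With your test data below the call would run along these lines (a
sketch; nlrq() accepts selfStart models the way nls() does, as I
recall):

     library(quantreg)
     fit <- nlrq(y ~ SSfpl(x, A, B, xmid, scal), tau = 0.5)
     summary(fit)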


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Nov 13, 2005, at 3:47 PM, Vermeiren, Hans [VRCBE] wrote:

> Hi,
>
> I'm trying to use Robust non-linear regression to fit dose response  
> curves.
> Maybe I didnt look good enough, but I dind't find robust methods  
> for NON
> linear regression implemented in R. A method that looked good to me  
> but is
> unfortunately not (yet) implemented in R is described in
> http://www.graphpad.com/articles/RobustNonlinearRegression_files/ 
> frame.htm
> <http://www.graphpad.com/articles/RobustNonlinearRegression_files/ 
> frame.htm>
>
>
> in short: instead of using the premise that the residuals are  
> gaussian they
> propose a Lorentzian distribution,
> in stead of minimizing the squared residus SUM (Y-Yhat)^2, the  
> objective
> function is now
> SUM log(1+(Y-Yhat)^2/ RobustSD)
>
> where RobustSD is the 68th percentile of the absolute value of the  
> residuals
>
> my question is: is there a smart and elegant way to change to  
> objective
> function from squared Distance to log(1+D^2/Rsd^2) ?
>
> or alternatively to write this as a weighted non-linear regression  
> where the
> weights are recalculated during the iterations
> in nlme it is possible to specify weights, possibly that is the way  
> to do
> it, but I didn't manage to get it working
> the weights should then be something like:
>
> SUM (log(1+(resid(.)/quantile(all_residuals,0.68))^2)) / SUM (resid 
> (.))
>
> the test data I use :
> x<-seq(-5,-2,length=50)
> x<-rep(x,4)
> y<-SSfpl(x,0,100,-3.5,1)
> y<-y+rnorm(length(y),sd=5)
> y[sample(1:length(y),floor(length(y)/50))]<-200 # add 2% outliers  
> at 200
>
> thanks a lot
>
> Hans Vermeiren
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] elements in a matrix to a vector

2005-11-09 Thread roger koenker
If you are really looking for a way to extract the non-zero elements you
can use something like the following:

 > library(SparseM)
 >  A
  [,1] [,2] [,3]
[1,]003
[2,]200
[3,]040
 > as.matrix.csr(A)@ra
[1] 3 2 4

there is a tolerance parameter in the coercion to sparse representation
to decide what is really "zero"  -- by default this is  eps = .Machine 
$double.eps.


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Nov 9, 2005, at 10:14 AM, Mike Jones wrote:

> hi all,
>
> i'm trying to get elements in a matrix into a vector.  i need a
> "streamlined" way to do it as the way i'm doing it is not very
> serviceable.  an example is a 3x3 matrix like
>
> 0 0 3
> 2 0 0
> 0 4 0
>
> to a vector like
>
> 3 2 4
>
> thanks...mj
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting- 
> guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] rgl.snapshot "failed"

2005-06-10 Thread roger koenker
I've installed the rgl package on a Suse x86-64 machine (further
details below) and it produces nice screen images.  Unfortunately,
rgl.snapshot's attempts to make png files produce only the response
"failed".  For other graphics png() works fine, and capabilities()
indicates that it is there.  If anyone has a suggestion of what might
be explored at this point I'd be very appreciative.

platform x86_64-unknown-linux-gnu
arch x86_64
os   linux-gnu
system   x86_64, linux-gnu
status
major2
minor1.0
year 2005
month04
day  18
language R

url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Robustness of Segmented Regression Contributed by Muggeo

2005-06-08 Thread roger koenker
You might try rqss() in the quantreg package.  It gives piecewise
linear fits for a nonparametric form of median regression, using
total variation of the derivative of the fitted function as a
penalty term.  A tuning parameter (lambda) controls the number of
distinct segments.  More details are available in the vignette for
the quantreg package.


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820


On Jun 8, 2005, at 7:25 AM, Park, Kyong H Mr. RDECOM wrote:



Hello, R users,
I applied the segmented regression method contributed by Muggeo and got
different slope estimates depending on the initial break points.  The
results are listed below and I'd like to know what a reasonable approach
is for handling this kind of problem.  I think applying various initial
break points is certainly not an efficient approach.  Are there any
other methods to deal with segmented regression?  From a graph, the v
shapes are clearer at break points 1.2 and 1.5 than at 1.5 and 1.7.
Appreciate your help.

Result1:
Initial break points are 1.2 and 1.5. The estimated break points and
slopes:

Estimated Break-Point(s):
              Est.   St.Err
Mean.Vel     1.285  0.05258
             1.652  0.01247

              Est.    St.Err.    t value  CI(95%).l  CI(95%).u
slope1   0.4248705  0.3027957   1.403159 -0.1685982   1.018339
slope2   2.3281445  0.3079903   7.559149  1.7244946   2.931794
slope3   9.5425516  0.7554035  12.632390  8.0619879  11.023115

Adjusted R-squared: 0.9924.

Result2:
Initial break points are 1.5 and 1.7. The estimated break points and
slopes:

Estimated Break-Point(s):
              Est.   St.Err
Mean.Vel     1.412  0.02195
             1.699  0.01001

              Est.    St.Err.    t value  CI(95%).l  CI(95%).u
slope1   0.7300483  0.1381587   5.284129  0.4592623   1.000834
slope2   3.4479466  0.2442530  14.116289  2.9692194   3.926674
slope3  12.500      1.7783840   7.028853  9.0144314  15.985569


Adjusted R-squared: 0.995.




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting- 
guide.html





__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] make install on solaris 10

2005-06-06 Thread roger koenker


On Jun 6, 2005, at 10:08 AM, Peter Dalgaard wrote:




It's your missing (or outdated) makeinfo that is coming back to bite
you. However, I'm a bit confuzzled because we do ship resources.html
et al. as part of the R tarball, so there shouldn't be a need to build
them. Were you building from an SVN checkout?

The way out is to install texinfo 4.7 or better. If you have the .html
files, you might be able to get by just by touch-ing or copying them.


On Jun 6, 2005, at 9:14 AM, Prof Brian Ripley wrote:


As far as I can see something has deleted doc/html/resources.html:  
it is in the tarball. I cannot immediately guess what: have you  
done any sort of `make clean'?


Copying it from the virgin sources and doing `make install' again  
should fix this: if not perhaps you can keep an eye on what is  
apparently removing it.


BTW, where did /usr/local/bin/install come from?  If that is not  
doing what is expected, it could be the problem.


Having:

1.  Downloaded a fresh version of R-devel
2.  Installed texinfo 4.8
3.  moved my rogue /usr/local/bin/install file out of the way

R now builds and installs fine.  It looks like X11 support is still
missing, but presumably just needs -L/usr/openwin/lib/sparcv9.  Some
further investigation is needed for png, jpeg and tcltk support, but
this can wait for a little while.

Thanks very much for your help.


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email   [EMAIL PROTECTED]                   Department of Economics
vox:    217-333-4558                        University of Illinois
fax:    217-244-6678                        Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] make install on solaris 10

2005-06-06 Thread roger koenker
We have recently upgraded to Solaris 10 on a couple of sparc machines
with the usual mildly mysterious consequences for library locations,
etc, etc.  I've managed to configure R 2.1.0 for a 64 bit version with:

R is now configured for sparc-sun-solaris2.10

  Source directory:  .
  Installation directory:/usr/local

  C compiler:gcc -m64 -g -O2
  C++ compiler:  g++  -m64 -fPIC
  Fortran compiler:  g77  -m64 -g -O2

  Interfaces supported:  X11
  External libraries:readline
  Additional capabilities:   PNG, JPEG, MBCS, NLS
  Options enabled:   R profiling

  Recommended packages:  yes

configure:47559: WARNING: you cannot build info or html versions of  
the R manuals


and make and make check seem to run smoothly, however "make install"  
dies with

the following messages:

ysidro.econ.uiuc.edu# make install
installing doc ...
creating doc/html/resources.html
*** Error code 255
The following command caused the error:
false --html --no-split --no-headers ./resources.texi -o ../html/ 
resources.html

make: Fatal error: Command failed for target `../html/resources.html'
Current working directory /usr/local/encap/R-2.1.0/doc/manual
installing doc/html ...
installing doc/html/search ...
/usr/local/bin/install: resources.html: No such file or directory
*** Error code 1
The following command caused the error:
for f in resources.html; do \
  /usr/local/bin/install -c -m 644 ${f} "/usr/local/lib/R/doc/html"; \
done
make: Fatal error: Command failed for target `install'
Current working directory /usr/local/encap/R-2.1.0/doc/html
*** Error code 1
The following command caused the error:
for d in html manual; do \
  (cd ${d} && make install) || exit 1; \
done
make: Fatal error: Command failed for target `install'
Current working directory /usr/local/encap/R-2.1.0/doc
*** Error code 1
The following command caused the error:
for d in m4 tools doc etc share src po tests; do \
  (cd ${d} && make install) || exit 1; \
done
make: Fatal error: Command failed for target `install'

and running R from the bin directory gives:

> capabilities()
jpeg  pngtcltk  X11 http/ftp  sockets   libxml fifo
   FALSEFALSEFALSEFALSE TRUE TRUE TRUE TRUE
  cledit  IEEE754iconv
TRUE TRUEFALSE

Any suggestions would be greatly appreciated.  With solaris 9 we had
a 64 bit build but never encountered such problems, and I don't see
anything in the archives or the install manual that is relevant -- but
of course, I'm not very clear about what I'm looking for either.

Roger


url:www.econ.uiuc.edu/~rogerRoger Koenker
email[EMAIL PROTECTED]Department of Economics
vox: 217-333-4558University of Illinois
fax:   217-244-6678Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Piecewise Linear Regression

2005-05-30 Thread roger koenker
It is conventional to fit piecewise linear models by assuming  
Gaussian error and
using least squares methods, but one can argue that median regression  
provides
a more robust approach to this problem.  You might consider the  
following fit:


x = c(6.25, 6.25, 12.50, 12.50, 18.75, 25.00, 25.00, 25.00, 31.25, 31.25,
      37.50, 37.50, 50.00, 50.00, 62.50, 62.50, 75.00, 75.00, 75.00,
      100.00, 100.00)
y = c(0.328, 0.395, 0.321, 0.239, 0.282, 0.230, 0.273, 0.347, 0.211, 0.210,
      0.259, 0.186, 0.301, 0.270, 0.252, 0.247, 0.277, 0.229, 0.225,
      0.168, 0.202)

library(quantreg)
plot(x,y)
fit <- rqss(y ~ qss(x))
plot(fit)

it gives 5 segments not 3, but this can be controlled by the choice
of lambda in the qss function; for example, try:

fit <- rqss(y ~ qss(x, lambda = 3))
plot(fit, col = "red")

which gives a fit like you suggest might be reasonable with only  
three segments.




url:www.econ.uiuc.edu/~roger        Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820


On May 30, 2005, at 6:38 PM, Abhyuday Mandal wrote:


Hi,

I need to fit a piecewise linear regression.

x = c(6.25, 6.25, 12.50, 12.50, 18.75, 25.00, 25.00, 25.00, 31.25, 31.25,
      37.50, 37.50, 50.00, 50.00, 62.50, 62.50, 75.00, 75.00, 75.00,
      100.00, 100.00)
y = c(0.328, 0.395, 0.321, 0.239, 0.282, 0.230, 0.273, 0.347, 0.211, 0.210,
      0.259, 0.186, 0.301, 0.270, 0.252, 0.247, 0.277, 0.229, 0.225,
      0.168, 0.202)


There are two change points, so the fitted curve should look like



\
 \  /\
  \/  \
   \
\

How do I do this in R ?

Thank you,
Abhyuday

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plotting image/contour on irregular grid

2005-05-06 Thread roger koenker

On May 6, 2005, at 2:45 PM, Roger Bivand wrote:
On Fri, 6 May 2005, m p wrote:
Hello,
I'd like to make a z(x,y) plot for irregularly spaced
x,y. What are routines are available in R for this
purpose?
One possibility is to interpolate a regular grid using interp() in the
akima package, then use image() or contour(). Another is to use
levelplot() with formula z ~ x + y in the lattice package, and the
equivalent contourplot(); here, the x,y pairs must lie on a grid,  
but do
not need to fill the grid (so are regularly spaced with missing grid
cells).

You could also try tripack and rgl.triangles to produce piecewise linear
surfaces on the Delaunay triangulation of the x,y points.
Roger
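
A minimal sketch of the akima route, on simulated irregularly spaced
points (the data here are illustrative, not from the thread):

library(akima)
set.seed(1)
x <- runif(200); y <- runif(200)
z <- sin(2 * pi * x) * cos(2 * pi * y)
g <- interp(x, y, z)     # interpolate onto a regular grid
image(g)                 # image of the interpolated surface
contour(g, add = TRUE)   # overlay contours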
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] standard errors for orthogonal linear regression

2005-04-28 Thread roger koenker
Wayne Fuller's Measurement Error Models is a good reference.
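
For the simple bivariate case, a minimal sketch (not from Fuller's
book) of the orthogonal regression point estimate via the principal
components of the covariance matrix; the standard errors asked about
below could then be approximated by bootstrapping this function:

tls <- function(x, y) {
  # direction of the smallest principal component is normal to the line
  v <- eigen(cov(cbind(x, y)))$vectors[, 2]
  b <- -v[1] / v[2]              # orthogonal regression slope
  a <- mean(y) - b * mean(x)     # line passes through the means
  c(intercept = a, slope = b)
}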
url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Apr 28, 2005, at 1:19 PM, <[EMAIL PROTECTED]> wrote:
Could someone please help me by giving me a reference to how one 
computes standard errors for the coefficients in an orthogonal linear 
regression, or perhaps someone has some R code? (I would accept a 
derivation or formula, but as a former teacher, I know how that can 
rankle.) I tried to imitate what's done in the code for lm() but went 
astray somewhere and got nonsense.

(This type of modeling goes by several names: total least squares, 
errors in variables, orthogonal distance regression (ODR), depending 
on where you are coming from.)

I have found ODRpack, but I haven't yet plowed through the Fortran to 
see if what I need is there; I'm working on it
Thanks!

David L. Reiner
 
Rho Trading
440 S. LaSalle St -- Suite 620
Chicago  IL  60605
 
312-362-4963 (voice)
312-362-4941 (fax)
 
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] normality test

2005-04-28 Thread roger koenker
For my money,  Frank's comment should go into fortunes.  It seems a
rather Sisyphean battle to keep the lessons of robustness on the 
statistical table
but nevertheless well worthwhile.

url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Apr 28, 2005, at 7:46 AM, Frank E Harrell Jr wrote:
Usually (but not always) doing tests of normality reflect a lack of 
understanding of the power of rank tests, and an assumption of high 
power for the tests (qq plots don't always help with that because of 
their subjectivity).  When possible it's good to choose a robust 
method.  Also, doing pre-testing for normality can affect the type I 
error of the overall analysis.

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Construction of a large sparse matrix

2005-04-18 Thread roger koenker
The dense blocks are too big, as Reid has already written -- for
smaller instances of this sort of thing I would suggest that the
kronecker product %x% operator in SparseM would be more convenient.
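
A minimal sketch of that suggestion, on toy dimensions rather than
Harold's full problem:

library(SparseM)
k <- 4; m <- 25
I <- as.matrix.csr(diag(k))
J <- as.matrix.csr(matrix(1, m, m))
B <- I %x% J   # sparse block-diagonal matrix with k unity blocks
dim(B)         # 100 x 100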

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Apr 18, 2005, at 3:54 PM, Doran, Harold wrote:
Dear List:
I'm working to construct a very large sparse matrix and have found
relief using the SparseM package. I have encountered an issue that is
confusing to me and wonder if anyone may be able to suggest a smarter
solution. The matrix I'm creating is a covariance matrix for a larger
research problem that is subsequently used in a simulation. Below is 
the
latex form of the matrix if anyone wants to see the pattern I am trying
to create.

The core of my problem seems to localize to the last line of the
following portion of code.
n<-sample.size*4
k<-n/4
vl.mat <- as.matrix.csr(0, n, n)
block <- 1:k #each submatrix size
for(i in 1:3) vl.mat[i *k + block, i*k + block] <- LE
When the variable LE is 0, the matrix is easily created. For example,
when sample.size = 10,000 this matrix was created on my machine in 
about
1 second. Here is the object size.

object.size(vl.mat)
[1] 160692
However, when LE is any number other than 0, the code generates an
error. For example, when I try LE <- 2 I get
Error: cannot allocate vector of size 781250 Kb
In addition: Warning message:
Reached total allocation of 1024Mb: see help(memory.size)
Error in as.matrix.coo(as.matrix.csr(value, nrow = length(rw), ncol =
length(cl))) :
Unable to find the argument "x" in selecting a method for
function "as.matrix.coo"
I'm guessing that single digit integers should occupy the same amount 
of
memory. So, I'm thinking that the matrix is "less sparse" and the
problem is related to the introduction of a non-zero element (seems
obvious). However, the matrix still retains a very large proportion of
zeros. In fact, there are still more zeros than non-zero elements.

Can anyone suggest a reason why I am not able to create this matrix? 
I'm
at the limit of my experience and could use a pointer if anyone is able
to provide one.

Many thanks,
Harold
P.S. The matrix above is added to another matrix to create the
covariance matrix below. The code above is designed to create the
portion of the matrix \sigma^2_{vle}\bm{J} .
\begin{equation}
\label{vert:cov}
\bm{\Phi} = var
\left[
\begin{array}{c}
Y^*_{1}\\
Y^*_{2}\\
Y^*_{3}\\
Y^*_{4}\\
\end{array}
\right]
=
\left[
\begin{array}{cccc}
\sigma^2_{\epsilon}\bm{I} & \sigma^2_{\epsilon}\rho\bm{I} & \bm{0} & \bm{0}\\
\sigma^2_{\epsilon}\rho\bm{I} & \sigma^2_{\epsilon}\bm{I}+\sigma^2_{vle}\bm{J} & \sigma^2_{\epsilon}\rho^2\bm{I} & \bm{0}\\
\bm{0} & \sigma^2_{\epsilon}\rho^2\bm{I} & \sigma^2_{\epsilon}\bm{I}+\sigma^2_{vle}\bm{J} & \sigma^2_{\epsilon}\rho^3\bm{I}\\
\bm{0} & \bm{0} & \sigma^2_{\epsilon}\rho^3\bm{I} & \sigma^2_{\epsilon}\bm{I}+\sigma^2_{vle}\bm{J}\\
\end{array}
\right]
\end{equation}
where $\bm{I}$ is the identity matrix, $\bm{J}$ is the unity matrix, 
and
$\rho$ is the autocorrelation.


[[alternative HTML version deleted]]
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] off-topic question: Latex and R in industries

2005-04-06 Thread roger koenker
my favorite answer to this question is "because there is no one to sue."
url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Apr 6, 2005, at 10:38 AM, Wensui Liu wrote:
Latex and R are really cool stuff. I am just wondering how they are
used in industry. But based on my own experience, very rare. Why?
How about the opinion of other listers? Thanks.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] French Curve

2005-04-06 Thread roger koenker
On Apr 6, 2005, at 1:48 AM, Martin Maechler wrote:
Median filtering aka "running medians" has one distinctive
advantage {over smooth.spline() or other so called linear smoothers}:
   It is "robust" i.e. not distorted by gross outliers.
Running medians is implemented in runmed() {standard "stats" package}
in a particularly optimized way rather than using the more general
running(.) approach of package 'gtools'.
Median smoothing splines are also implemented in the quantreg
package see ?rqss, but they produce piecewise linear fitting so
they may not appeal to those accustomed to french curves.
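
A small illustration of that robustness point, on simulated data with
one gross outlier (not from the thread):

x <- seq(0, 1, length = 201)
y <- sin(2 * pi * x) + rnorm(201, sd = 0.1)
y[101] <- 10                              # one gross outlier
plot(x, y, ylim = c(-1.5, 1.5))
lines(x, runmed(y, k = 11), col = "red")  # running medians: barely moved
lines(smooth.spline(x, y), col = "blue")  # linear smoother: pulled upward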
url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] total variation penalty

2005-03-02 Thread roger koenker
On Mar 2, 2005, at 6:25 PM, Vadim Ogranovich wrote:
I was recently plowing through the docs of the quantreg package by
Roger Koenker and came across the total variation penalty approach to
1-dimensional spline fitting. I googled around a bit and found some
papers originating in the image processing community, but (apart from
Roger's papers) no paper that would discuss its statistical aspects.
You might look at
@article{davi:kova:2001,
Author = {Davies, P. L. and Kovac, A.},
Title = {Local Extremes, Runs, Strings and Multiresolution},
Year = 2001,
Journal = {The Annals of Statistics},
Volume = 29,
Number = 1,
Pages = {1--65},
Keywords = {[62G07 (MSC2000)]; [65D10 (MSC2000)]; [62G20 (MSC2000)];
   [nonparametric regression]; [local extremes]; [runs];
   [strings]; [multiresolution analysis]; [asymptotics];
   [outliers]; [low power peaks]; nonparametric function
   estimation}
}
They are using total variation of the function rather than total 
variation of its derivative
as in the KNP paper mentioned below, but there are close connections 
between the
methods.

There are several recent  papers on what Tibshirani calls the lasso vs 
other penalties for
regression problems... for example:

@article{knig:fu:2000,
Author = {Knight, Keith and Fu, Wenjiang},
Title = {Asymptotics for Lasso-type Estimators},
Year = 2000,
Journal = {The Annals of Statistics},
Volume = 28,
Number = 5,
Pages = {1356--1378},
Keywords = {[62J05 (MSC1991)]; [62J07 (MSC1991)]; [62E20 (MSC1991)];
   [60F05 (MSC1991)]; [Penalized regression]; [Lasso];
   [shrinkage estimation]; [epi-convergence in 
distribution];
   neural network models}
}
@article{fan:li:2001,
Author = {Fan, Jianqing and Li, Runze},
Title = {Variable Selection Via Nonconcave Penalized Likelihood and 
Its
Oracle Properties},
Year = 2001,
Journal = {Journal of the American Statistical Association},
Volume = 96,
Number = 456,
Pages = {1348--1360},
Keywords = {[HARD THRESHOLDING]; [LASSO]; [NONNEGATIVE GARROTE];
   [PENALIZED LIKELIHOOD]; [ORACLE ESTIMATOR]; [SCAD]; [SOFT


I have a couple of questions in this regard:
* Is it more natural to consider the total variation penalty in the
context of quantile regression than in the context of OLS?
Not especially, see the lasso literature which is predominantly based
on Gaussian likelihood.  The taut string idea is also based on Gaussian
fidelity, at least in its original form.  There are some computational
conveniences involved in using l1 penalties with l1 fidelities, but with
the development of modern interior point algorithms, l1 vs l2 fidelity 
isn't really
much of a distinction.  The real question is:  do you believe in that 
old
time religion, do you have that Gaussian faith?  I don't.

* Could someone please point to a good overview paper on the subject?
Ideally something that compares merits of different penalty functions.
See above
There seems to be an ongoing effort to generalize this approach to 2d,
but at this time I am more interested in 1-d smoothing.
For the sake of completeness, the additive model component of quantreg 
is
based primarily on the following two papers:

@article{koen:ng:port:1994,
Author = {Koenker, Roger and Ng, Pin and Portnoy, Stephen},
Title = {Quantile Smoothing Splines},
Year = 1994,
Journal = {Biometrika},
Volume = 81,
Pages = {673--680}
}
@article{KM.04,
Author = {Koenker, R. and I. Mizera},
Title = {Penalized Triograms:  Total Variation Regularization 
for Bivariate Smoothing},
Journal = JRSS-B,
Volume = 66,
Pages = {145--163},
Year = 2004
}
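
For completeness, a minimal 1-d sketch of total variation penalized
fitting in quantreg, in the style of the rqss examples elsewhere on
this list (toy data; lambda tunes the penalty on the total variation
of the derivative of the fit):

library(quantreg)
x <- sort(runif(200))
y <- sin(8 * x) + rnorm(200, sd = 0.3)
plot(x, y)
f <- rqss(y ~ qss(x, lambda = 1))  # larger lambda gives fewer kinks
plot(f)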

url:www.econ.uiuc.edu/~roger        Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] logit link + alternatives

2005-02-07 Thread roger koenker
Just for the record --  NEWS for 2.1.0 includes:
o   binomial() has a new "cauchit" link (suggested by Roger Koenker).
The MASS function polr() for ordered responses is also now adapted for
the cauchit case.
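
With R >= 2.1.0 the cauchit link can therefore be used directly in
glm(); a minimal sketch with simulated data:

x <- rnorm(500)
p <- pcauchy(0.5 + 2 * x)   # inverse cauchit link
y <- rbinom(500, 1, p)
fit <- glm(y ~ x, family = binomial(link = "cauchit"))
summary(fit)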

url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Feb 7, 2005, at 7:01 AM, (Ted Harding) wrote:
On 07-Feb-05 [EMAIL PROTECTED] wrote:
Help needed with lm function:
Dear R's,
Could anyone tell me how to replace the link function (probit, logit,
loglog, etc.) in lm with an arbitrary user-defined function? The task
is to perform ML estimation of betas for a dichotomous target
variable.
Maybe there is already a package for this (I did not find one).
Any hints or a code excerpt would be welcome!
Thank you -Jeff
I asked a similar question last year (2 April 2004) since I wanted
a "cauchy" link in a binary response model (the data suggested
heavy tails). I thought in the first place that I saw a fairly
straightforward way to do it, but Brian Ripley's informed response
put me off, once I had looked into the details of what would be
involved (his reply which includes my original mail follows):
# On Fri, 2 Apr 2004 [EMAIL PROTECTED] wrote:
#
# > I am interested in extending the repertoire of link functions
# > in glm(Y~X, family=binomial(link=...)) to include a "tan" link:
# >
# >eta = (4/pi)*tan(mu)
# >
# > i.e. this link bears the same relation to the Cauchy distribution
# > as the probit link bears to the Gaussian. I'm interested in sage
# > advice about this from people who know their way aroung glm.
# >
# > From the surface, it looks as though it might just be a matter
# > of re-writing 'make.link' in the obvious sort of way so as to
# > incorporate "tan", but I fear traps ...
#
# How are you going to do that?  If you edit make.link and have your
# own local copy, the namespace scoping will ensure that the system
# copy gets used, and the code in binomial() will ensure that even
# that does not get  called except for the pre-coded list of links.
#
# > What am I missing?
#
# You need a local, modified, copy of binomial, too, AFAICS.
As I say, the implied details put me off for a while, but in
this particular case Thomas W Yee came up with a ready-made
solution (23 April 2004):
# my VGAM package at www.stat.auckland.ac.nz/~yee
# now has the tan link for binomialff().
# It is tan(pi*(mu-0.5)).
(See his full mail in the R-help archives for April 2004
for several important details regarding this implementation).
So: it would seem to be quite possible to write your own link
function, but it would take quite a bit of work and would involve
re-writing at least the code for 'make.link' and for 'binomial', and
being careful about how you use them.
Hoping this helps,
Ted.

E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Feb-05   Time: 12:57:07
-- XFMail --
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] A modified log transformation with real finite values for negatives and zeros?

2005-02-02 Thread roger koenker
Bickel and Doksum (JASA, 1981) discuss a modified version of the Box-Cox
transformation that looks like this:

y -> (sgn(y) * abs(y)^lambda - 1)/lambda

and in the original Box-Cox paper there was an offset parameter that
gives rise to somewhat peculiar likelihood theory, as in the
3-parameter log-normal, where one gets an unbounded likelihood by
letting the threshold parameter approach the first order statistic
from below, but for which the likelihood equations seem to provide a
perfectly sensible root.
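
A one-line sketch of the Bickel-Doksum transform as written above; it
is defined and finite for negatives and zeros alike:

bd <- function(y, lambda) (sign(y) * abs(y)^lambda - 1) / lambda
bd(c(-2, 0, 3), lambda = 0.5)   # all real, finite values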

url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Feb 2, 2005, at 1:28 PM, Spencer Graves wrote:
 Does anyone have any ideas (or even experience) regarding a 
modified log transformation that would assign real finite values to 
zeros and negative numbers?  I encounter this routinely in a couple of 
different situations:
 * Physical measurements that are often lognormally distributed 
except for values that are less than additive normal measurement 
error.  I'd like to take logarithms of the clearly positive values and 
assign some smaller finite number(s) for values less than or equal to 
zero.  I also might like to decompose the values into mean plus 
variance of the logs plus variance of additive normal noise.  However, 
that would require more machinery than is appropriate for exploratory 
data analysis.
 * Integers most of which are plausibly Poisson counts but include 
a few negative values.  People in manufacturing sometimes report the 
number of defects "added" between two steps in the process, computed 
as the difference between the number counted before and after 
intervening steps.  These counts are occasionally negative either 
because defects are removed in processing or because of a miscount 
either before or after.
 For an example, see "www.prodsyse.com/log0".  There, you can also 
download working R code for such a transformation along with 
PowerPoint slides documenting some of the logic behind the code.  It's 
not included here, because it's too much for a standard R post.
 Comments?  Thanks,
 spencer graves

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] read.matrix.csr bug (e1071)?

2005-01-28 Thread roger koenker
Don't you want read.matrix.csr not read.matrix?
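
That is, the last two lines of the example quoted below would become
(note also that one presumably means to write the csr object m.csr,
not the dense matrix m):

write.matrix.csr(m.csr, "sparse.dat")
m2 <- read.matrix.csr("sparse.dat")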
url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Jan 28, 2005, at 9:22 AM, Jeszenszky Peter wrote:
Hello,
I would like to read and write sparse matrices using the
functions write.matrix.csr() and read.matrix.csr()
of the package e1071. Writing is OK but reading back the
matrix fails:
x <- rnorm(100)
m <- matrix(x, 10)
m[m < 0.5] <- 0
m.csr <- as.matrix.csr(m)
write.matrix.csr(m, "sparse.dat")
read.matrix("sparse.dat")
	Error in initialize(value, ...) : Can't use object of class "integer" 
in new():  Class "matrix.csr" does not extend that class

Is something wrong with the code above or it must be
considered as a bug?
Best regards,
Peter
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] CIS inquiries

2005-01-24 Thread roger koenker
Does anyone have an automated way to make Current Index to Statistics
inquiries from R, or from the Unix command line?  I thought it might be
convenient to have something like this for occasions in which I'm in a
foreign domain and would like to make inquiries on my office machine
without firing up a full-fledged browser.  Lynx is ok for this purpose,
but it might be nice to have something more specifically designed for
CIS.

url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Peak finding algorithm

2004-12-09 Thread roger koenker
You might want to look at the ftnonpar package.  You haven't quite
specified whether you are thinking about estimating densities,
regression functions, or some third option, or what 2-dimensional
means: functions R -> R or functions R^2 -> R; my recollection is
that ftnonpar is (mostly?) about the R -> R case.
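
For the R -> R case, a naive sketch of the sort of depth-pruned peak
finder described below (find_peaks and depth are illustrative names,
not the poster's code):

find_peaks <- function(y, depth = 0) {
  d <- diff(sign(diff(y)))
  i <- which(d == -2) + 1   # interior local maxima
  # shave off shallow peaks: keep those at least 'depth' above neighbours
  i[y[i] - pmax(y[i - 1], y[i + 1]) >= depth]
}
find_peaks(c(1, 3, 2, 2.1, 2, 5, 1), depth = 0.5)   # returns 2 and 6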

url:www.econ.uiuc.edu/~roger        Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Dec 9, 2004, at 3:01 PM, Gene Cutler wrote:
I'm sure there must be various peak-finding algorithms out there.  Not 
knowing of any, I have written one myself*, but I thought I'd ask to 
see what's out there.

Basically, I have a 2-dimensional data set and I want to identify 
local peaks in the data, while ignoring "trivial" peaks.  My naive 
algorithm first identifies every peak and valley (point of inflection 
change in the graph), then shaves off shallow peaks and valleys based 
on an arbitrary depth parameter, then returns whatever is left.  This 
produces decent results, but, again, I'd like to know what other 
implementations are available.

(* source available on request)
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Protocol for answering basic questions

2004-12-01 Thread roger koenker
Maybe it would be helpful to think of R-help as something more than
the Oracle of Delphi.  Questions, ideally, should  be framed in such a
way that they might lead to improvements in R:  extensions of the code
or, more frequently  clarifications or extensions of the documentation.
Indeed the R-help archive itself serves this function and could 
profitably
be searched prior  to firing off a question to R-help.  As traffic on 
R-help
increases there is a delicate balance that must be maintained in order
to keep knowledgeable users interested in the list.

url:www.econ.uiuc.edu/~rogerRoger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Dec 1, 2004, at 10:56 AM, James Foadi wrote:
On Wednesday 01 Dec 2004 4:46 pm, Robert Brown FM CEFAS wrote:
Understandable, but not a recipe to encourage the use of R by other
than experts. The R community needs to decide if they really only want
expert statistician users and make this clear if it is the case.
Alternatively, if they are to encourage novices, the present approach
is not the way to do it.
I perfectly agree with Robert Brown. Although I have been captivated
by "R", and will keep using it, I would appreciate it if "R" gurus
could make this clear.

Thanks
James
--
Dr James Foadi
Structural Biology Laboratory
Department of Chemistry
University of York
YORK YO10 5YW
UK
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] impute missing values in correlated variables: transcan?

2004-11-30 Thread roger koenker
At the risk of stirring up a hornet's nest, I'd suggest that means
are dangerous in such applications.  A nice paper on combining
ratings is: Gilbert Bassett and Joseph Persky, Rating Skating, JASA,
1994, 1075-1079.
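
A hedged sketch of the linear idea mentioned in the question below
(regress each rater's scores on the mean of the others, then impute
the missing entries from that fit); impute_linear is an illustrative
name, not Hmisc code:

impute_linear <- function(M) {
  for (j in seq_len(ncol(M))) {
    mbar <- rowMeans(M[, -j, drop = FALSE], na.rm = TRUE)
    fit <- lm(M[, j] ~ mbar)               # rows with NA response dropped
    miss <- is.na(M[, j])
    M[miss, j] <- predict(fit, newdata = data.frame(mbar = mbar[miss]))
  }
  M
}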
url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Nov 30, 2004, at 10:52 AM, Jonathan Baron wrote:
I would like to impute missing data in a set of correlated
variables (columns of a matrix).  It looks like transcan() from
Hmisc is roughly what I want.  It says, "transcan automatically
transforms continuous and categorical variables to have maximum
correlation with the best linear combination of the other
variables." And, "By default, transcan imputes NAs with "best
guess" expected values of transformed variables, back transformed
to the original scale."
But I can't get it to work.  I say
m1 <- matrix(1:20+rnorm(20),5,)  # four correlated variables
colnames(m1) <- paste("R",1:4,sep="")
m1[c(2,19)] <- NA# simulate some missing data
library(Hmisc)
transcan(m1,data=m1)
and I get
Error in rcspline.eval(y, nk = nk, inclx = TRUE) :
  fewer than 6 non-missing observations with knots omitted
I've tried a few other things, but I think it is time to ask for
help.
The specific problem is a real one.  Our graduate admissions
committee (4 members) rates applications, and we average the
ratings to get an overall rating for each applicant.  Sometimes
one of the committee members is absent, or late; hence the
missing data.  The members differ in the way they use the rating
scale, in both slope and intercept (if you regress each on the
mean).  Many decisions end up depending on the second decimal
place of the averages, so we want to do better than just averaging
the non-missing ratings.
Maybe I'm just not seeing something really simple.  In fact, the
problem is simpler than transcan assumes, since we are willing to
assume linearity of the regression of each variable on the other
variables.  Other members proposed solutions that assumed this,
but they did not take into account the fact that missing data at
the high or low end of each variable (each member's ratings)
would change its mean.
Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R search page: http://finzi.psych.upenn.edu/
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Avoiding for-loops

2004-11-25 Thread roger koenker
The lower triangle can be obtained by
A[row(A)>col(A)]
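or, equivalently, with upper.tri(); a quick sketch on a toy matrix:

A <- matrix(rnorm(16), 4)
A <- A + t(A)                    # toy symmetric matrix
v1 <- A[row(A) > col(A)]         # lower triangle as a vector
v2 <- A[upper.tri(A)]            # upper triangle; same values if symmetric
all.equal(sort(v1), sort(v2))    # TRUE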
url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Nov 25, 2004, at 11:15 AM, John wrote:
Hello R-users,
I have a symmetric matrix of numerical values and I
want to obtain those values in the upper or lower
triangle of the matrix in a vector. I tried to do the
job by using two for-loops but it doesn't seem to be a
clever way, and I'd like to know a more efficient code
for a large matrix of thousands of rows and columns.
Below is my code for your reference.
Thanks a lot.
John

# mtx.sym is a symmetric matrix
my.ftn <- function(size_mtx, mtx) {
+ my.vector <- c()
+ for ( i in 1:size_mtx ) {
+ cat(".")
+ for ( j in 1:size_mtx ) {
+ if ( upper.tri(mtx)[i,j] ) {
+ my.vector <- c(my.vector, mtx[i,j])
+ }}}
+ cat("\n")
+ }

# if I have a matrix, mtx.sym, of 100x100
my.ftn(100, mtx.sym)
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] The hidden costs of GPL software?

2004-11-23 Thread roger koenker
Having just finished an index I would like to second John's comments.
Even as an author, it is  difficult to achieve some degree of
completeness and consistency.
Of course, maybe a real whizz at clustering could assemble something
very useful quite easily.  All of us who have had the frustration of 
searching
for a forgotten function would be grateful.

url:www.econ.uiuc.edu/~roger    Roger Koenker
email   [EMAIL PROTECTED]   Department of Economics
vox:217-333-4558University of Illinois
fax:217-244-6678Champaign, IL 61820
On Nov 23, 2004, at 7:48 AM, John Fox wrote:
Dear Duncan,
I don't think that there is an automatic, nearly costless way of 
providing
an effective solution to locating R resources. The problem seems to me 
to be
analogous to indexing a book. There's an excellent description of what 
that
process *should* look like in the Chicago Manual of Style, and it's a 
lot of
work. In my experience, most book indexes are quite poor, and 
automatically
generated indexes, while not useless, are even worse, since one should 
index
concepts, not words. The ideal indexer is therefore the author of the 
book.

I guess that the question boils down to how important is it to provide 
an
analogue of a good index to R? As I said in a previous message, I 
believe
that the current search facilities work pretty well -- about as well 
as one
could expect of an automatic approach. I don't believe that there's an
effective centralized solution, so doing something more ambitious than 
is
currently available implies farming out the process to package 
authors. Of
course, there's no guarantee that all package authors will be diligent
indexers.

Regards,
 John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Duncan Murdoch
Sent: Monday, November 22, 2004 8:55 AM
To: Cliff Lunneborg
Cc: [EMAIL PROTECTED]
Subject: Re: [R] The hidden costs of GPL software?
On Fri, 19 Nov 2004 13:59:23 -0800, "Cliff Lunneborg"
<[EMAIL PROTECTED]> quoted John Fox:
Why not, as previously has been proposed, replace the current static
(and, in my view, not very useful) set of keywords in R
documentation
with the requirement that package authors supply their own
keywords for
each documented object? I believe that this is the intent of the
concept entries in Rd files, but their use certainly is not
required or
even actively encouraged. (They're just mentioned in passing in the
Writing R Extensions manual.
That would not be easy and won't happen quickly.  There are some
problems:
 - The base packages mostly don't use  \concept. (E.g. base
has 365 man pages, only about 15 of them use it).  Adding it
to each file is a fairly time-consuming task.
- Before we started, we'd need to agree as to what they are for.
Right now, I think they are mainly used when the name of a
concept doesn't match the name of the function that
implements it, e.g.
"modulo", "remainder", "promise", "argmin", "assertion".  The
need for this usage is pretty rare.  If they were used for
everything, what would they contain?
 - Keywording in a useful way is hard.  There are spelling
issues (e.g. optimise versus optimize); our fuzzy matching
helps with those.
But there are also multiple names for the same thing, and
multiple meanings for the same name.
Duncan Murdoch
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

