Re: [R] Fitted Value Pareto Distribution

2007-06-13 Thread Markus Jäntti
livia wrote:
> I would like to fit a Pareto Distribution and I am using the following codes. 
> 
> I thought the fitted (fit1) should be the fitted value for the data, is it
> correct? As the result of the "fitted" turns out to be a single value for
> all. 
> 
> fit=vglm(ycf1 ~ 1, pareto1(location=alpha), trace=TRUE, crit="c") 
> fitted(fit) 
> 
> The result is 
> fitted(fit)
> [,1]
>  [1,] 0.07752694
>  [2,] 0.07752694
>  [3,] 0.07752694
>  [4,] 0.07752694
>  [5,] 0.07752694
>  [6,] 0.07752694
>  [7,] 0.07752694
>  [8,] 0.07752694
>  [9,] 0.07752694
> [10,] 0.07752694
> [11,] 0.07752694
> [12,] 0.07752694
> [13,] 0.07752694
> 
> Could anybody give me some advice? 
> 


Your model only includes an intercept, so the fitted value  is supposed to be 
the same for all units (there is nothing in your model that allows the fitted 
value to vary across units).

markus

-- 
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generalized moment method

2007-06-11 Thread Markus Jäntti
Sebastian Kruk wrote:
> Dear everyone:
> 
> I have to finish my thesis to graduate as Bs. in Economics.
> 
> I choose to estimate a New Keynesian Phillips Curve (NKPC) for Uruguay
> using Generalized Moment Method (GMM).
> 
> I do not know programming or R but I  would like to use it.
> 
> Should I use gee, geepack or gam?

Dear Sebastián -- neither geepack nor gam provide GMM estimators. GMM -- or at 
least minimum distance estimation techniques -- rely on fitting by linear or 
more often non-linear least squares functions of smaller parameter vectors to 
the empirical moments of your problem. R is a suitable tool for this, but there 
is AFAIK know general GMM package.

The details of our model would need to be known before any further advice can 
be 
given.

Regards,

Markus

> 
> Thanks in advance,
> 
> Sebastián.
> 
> ***
> 
> ¡Hola todos!
> 
> Para terminiar mi licenciatura en Economía debo hacer un trabajo de
> investigación monográfico.
> 
> Elegi como tema la estimación de la curva de Phillips de los Nuevos
> Keynesianos (CPNK).
> 
> No se programar ni conosco el lenguaje R pero me gustaria usarlo para
> estimar la CPNK usando el método generalizado de los momentos (MGM).
> 
> ¿Debería usar el paquete gee, geepack o gam?
> 
> Gracias a todos.
> 
> Sebastián.
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-- 
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with gls

2007-01-06 Thread Markus Jäntti
On Sat, 2007-01-06 at 01:21 -0500, Zhenqiang Lu wrote:
> Hello R-users,
> 
> I am using gls function in R to fit a model with certain correlation
> structure.
> 
> The medol as:
> fit.a<-gls(y~1,data=test.data,correlation=corAR1(form=~1|aa),method="ML")
> mu<-summary(fit.a)$coefficient
> 
> With the toy data I made to test, the estimate of  mu is exactly equal to
> the overall mean of y which can not be true.
> 

On the contrary, this has to be true. 

Due to a well-known result (a recent is is Greene, 2003, sec 14.2.2, pp
343-344) when the covariates are the same in every equation, a
"seemingly unrelated regression" of this sort has GLS and OLS produce
identical results.   The OLS estimate is exactly the arithmetic
average. 
 
@Book{greene2003,
  Author = {William H Greene},
  Title  = {Econometric Analysis},
  Publisher  = {Prentice-Hall International Ltd},
  Address= {London},
  Edition= {Fifth},
  year   = 2003,
}

> But, if I make a toy data with y more than two replicates (for each level
> of aa we have more than 2 abservations, for example N=4), the estimates of
> mu will not be as the same as the overall mean of y.

This is what is strange. You should probably provide example code of
this too. 

> 
> Would you please tell me why this happens?
> The following is my testing code.

The code had some typos in it (and you did not load nlme). I believe
this would be correct:

require(mvtnorm)
library(nlme)
rho=-0.5
N=2
sigma.a=2

Rho.a<-matrix(c(1,rho^(1:(N-1)))[outer(X=1:N,Y=1:N,function(x,y)
1+abs(x-y))],ncol=N)
Sigma.a<-sigma.a^2 * Rho.a/(1-rho^2)
y<-rep(0,N*20)
for (ii in 1:20)
 {y[((ii-1)*N+1):(ii*N)]<-rmvnorm(1, mean=rep(0,N), sigma=Sigma.a)
 }
test.data<-data.frame(y=y,aa=rep(1:20,each=N,len=N*20))
fit.a<-gls(y~1,data=test.data,correlation=corAR1(form=~1|
aa),method="ML")
mu.a<-summary(fit.a)$coefficient
rho.a<-getVarCov(fit.a)[1,2]/getVarCov(fit.a)[1,1]
print(c(mean(y),mu.a))

Regards,

Markus

> Thanks.
> James
> 
> 
> 
> require(mvtnorm)
> rho=-0.5
> N=2
> sigma.a=2
> 
> Rho.a<-matrix(c(1,rho^(1:(N-1)))[outer(X=1:N,Y=1:N,function(x,y)
> 1+abs(x-y))],ncol=N)
> Sigma.a<-sigma.a^2 * Rho/(1-rho.a^2)
> y<-rep(0,N*20)
> for (ii in 1:20)
>  {y[((ii-1)*N+1):(ii*N)]<-rmvnorm(1, mean=rep(0,N), sigma=Sigma.a)
>  }
> test.data<-data.frame(y=y,aa=rep(1:20,each=N,len=N*20))
> fit.a<-gls(y~1,data=test.data,correlation=corAR1(form=~1|aa),method="ML")
> mu.a<-summary(fit.a)$coefficient
> rho.a<-getVarCov(fit.a)[1,2]/getVarCov(fit.a)[1,1]
> print(c(mean(y),mu.a))
> 
> __
> Zhenqiang (James) Lu
> 
> Department of Statistics
> Purdue University
> West Lafayette, IN 47906
> TEL: (765) 494-0027
> FAX: (765) 494-0558
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
-- 
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using weighted.mean with tapply()

2005-08-03 Thread Markus Jäntti
roger bos wrote:
> I am trying to calculate the weighted mean for a of 10 deciles and I
> get an error:
> 
>>decile <- tapply(X=mat$trt1m, INDEX=mat$Rank, FUN=weighted.mean, w=mat$mcap)
> 
> Error in FUN(X[[1]], ...) : 'x' and 'w' must have the same length
> 
> All three of my inputs have the same length, as shown below, and the
> weighted.mean calculation works by itself, just not in tapply()
> 

Hi -- I asked pretty much this same question some years ago:

http://www.r-project.org/nocvs/mail/r-help/1999/2160.html

The answer turns out to be that you should pass the index to tapply
rather than the data. In your case this would, I think, translate to

decile <- tapply(seq(along=mat$trlm, mat$Rank,
function(i, x=mat$trlm[i], w=mat$mcap[i])
weighted.mean(x[i], w[i]))

hope this helps.

regards,

Markus

> 
>>length(mat$Rank)
> 
> [1] 1853
> 
>>length(mat$mcap)
> 
> [1] 1853
> 
>>length(mat$trt1m)
> 
> [1] 1853
> 
>>mean(mat$trt1m)
> 
> [1] -0.04475397
> weighted.mean(mat$trt1m, w=mat$mcap)
> [1] -0.04819243
> 
>>mat$mcap[is.na(mat$mcap)] <- min(mat$mcap, na.rm=TRUE)
> 
> 
> I am probably making a simple error in how I pass the optional
> parameter w.  Any help would be greatly appreciated.
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 


-- 
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] poly() in lm() leads to wrong coefficients (but correct residuals)

2005-06-29 Thread Markus Jäntti
On Wed, 2005-06-29 at 18:19 +0200, Andreas Neumann wrote:
> Dear all,
> 
> I am using poly() in lm() in the following form.
> 
> 1> DelsDPWOS.lm3 <- lm(DelsPDWOS[,1] ~ poly(DelsPDWOS[,4],3))
> 
> 2> DelsDPWOS.I.lm3 <- lm(DelsPDWOS[,1] ~ poly(I(DelsPDWOS[,4]),3))
> 
> 3> DelsDPWOS.2.lm3 <-
>lm(DelsPDWOS[,1]~DelsPDWOS[,4]+I(DelsPDWOS[,4]^2)+I(DelsPDWOS[,4]^3))
> 
> 1 and 2 lead to identical but wrong results. 3 is correct. Surprisingly
> (to me) the residuals are the same (correct) in all cases although the
> coefficients of 1 (and 2) are wrong and do not in any way match the
> stated regression polynomial. (summaries below)
> 
> QUESTION:
> Is there a correct way to use poly() in lm() since version 3 becomes quite
> tedious if used more often especially with higher order polynomials?
> 

The coefficients  using 1 and 2 are not incorrect.  
poly() defines orthogonal polynomials, whereas your
DelsPDWOS[,4]+I(DelsPDWOS[,4]^2)+I(DelsPDWOS[,4]^3 
contruct defines an ordinary polynomial. 

You should be able to recover version 3 coefficients from 1 and 2.
See help(poly)

> x <- runif(10)
> x
 [1] 0.1878 0.2415 0.5834 0.6556 0.4112 0.3399 0.8144 0.1134 0.7360
0.0463
> model.matrix(~ poly(x, 2))
   (Intercept) poly(x, 2)1 poly(x, 2)2
11-0.27648 -0.0452
21-0.21052 -0.1899
31 0.20937 -0.2708
41 0.29799 -0.1021
51-0.00212 -0.4117
61-0.08970 -0.3621
71 0.49297  0.4968
81-0.36790  0.2148
91 0.39672  0.1620
10   1-0.45033  0.5082
attr(,"assign")
[1] 0 1 1
> model.matrix(~ x + I(x^2))
   (Intercept)  x  I(x^2)
11 0.1878 0.03528
21 0.2415 0.05834
31 0.5834 0.34040
41 0.6556 0.42982
51 0.4112 0.16911
61 0.3399 0.11554
71 0.8144 0.66320
81 0.1134 0.01286
91 0.7360 0.54169
10   1 0.0463 0.00214
attr(,"assign")
[1] 0 1 2
>


Regards,

-- 
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Problem loading library Design

2005-03-24 Thread Markus Jäntti
Emmanuelle Comets wrote:
I have used library Design (Frank Harrell) in the past but when I tried 
this week, I get :
 > library(Design)
Error in eval(expr, envir, enclos) : Object "formula.default" not found

I updated to the latest version of the library (2.0) with the same 
result. My version of R may not be the latest but it's recent (and I 
probably changed since using Design for the last time) :

 > version
 _
platform i686-pc-linux-gnu
arch i686
os   linux-gnu
system   i686, linux-gnu
status
major1
minor9.1
year 2004
month06
day  21
language R
I just tried this in
> version
 _
platform i386-pc-linux-gnu
arch i386
os   linux-gnu
system   i386, linux-gnu
status
major2
minor0.1
year 2004
month11
day  15
language R
where it works. Your R is pretty old and there were many changes in 
moving to 2.0 (including the namespaces, which may be the reason you are
getting that particular error).

Upgrading R would appear to be the solution.
Regards,
markus
Could anyone please point me in the right direction to solve my problem? 
Thank you in advance,
Emmanuelle

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html


--
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Quantiles of data in a contingency table

2005-03-17 Thread Markus Jäntti
Matt Mohebbi wrote:
Hello, 

I have data of the following form:

data <- data.frame(type=c("c","d","e"), size=c(10,20,30), count=c(20,10,5))
data
  type size count
1c   1020
2d   2010
3e   30 5
I would like to compute the quantiles of size given the counts. For
instance, in this example, the median size would be 10. Is there an
easy way of doing this?
One at least is to the function wtd.median [and wtd.quantile] in package 
Hmisc by Frank Harrell.

install.packages("Hmisc")
library(Hmisc)
wtd.median(data$size, data$weights)
is likely a route to get you what you want.
regards,
markus
Is there a good way to deal with data in this format in general? Much
of R seems to center around having an entry for each item. This
question (http://www.r-project.org/nocvs/mail/r-help/2000/0102.html)
seems to be related but no one provided an answer.
Thanks, 
Matt

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] using poly in a linear regression in the presence of NA fails (despite subsetting them out)

2005-02-14 Thread Markus Jäntti
I ran into a to me surprising result on running lm with an orthogonal 
polynomial among the predictors.

The lm command resulted in
Error in qr(X) : NA/NaN/Inf in foreign function call (arg 1)
Error during wrapup:
despite my using a "subset" in the call to get rid of NA's.
poly is apparently evaluated before any NA's are subsetted out
of the data.
Example code (attached to this e-mail as as a script):
> ## generate some data
> n <- 50
> x <- runif(n)
> a0 <- 10
> a1 <- .5
> sigma.e <- 1
> y <- a0 + a1*x + rnorm(n)*sigma.e
> tmp.d <- data.frame(y, x)
> rm(list=c("n", "x", "a0", "a1", "sigma.e", "y"))
>
> print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d)
+
+ ## now make a few NA's
+
+ tmp.d$x[1:2] <- rep(NA, 2)
Error: syntax error
Error during wrapup:
>
> ## this fails, just as it should
> print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d))
Call:
lm(formula = y ~ poly(x, 2), data = tmp.d)
Coefficients:
(Intercept)  poly(x, 2)1  poly(x, 2)2
 10.380   -0.242   -1.441
>
> ## these also fail, but should not?
>
> print(lm.2 <- lm(y ~ poly(x, 2), data = tmp.d, subset = !is.na(x)))
Call:
lm(formula = y ~ poly(x, 2), data = tmp.d, subset = !is.na(x))
Coefficients:
(Intercept)  poly(x, 2)1  poly(x, 2)2
 10.380   -0.242   -1.441
> print(lm.3 <- lm(y ~ poly(x, 2), data = tmp.d, na.action = na.omit))
Call:
lm(formula = y ~ poly(x, 2), data = tmp.d, na.action = na.omit)
Coefficients:
(Intercept)  poly(x, 2)1  poly(x, 2)2
 10.380   -0.242   -1.441
>
> ## but this works
>
> print(lm.3 <- lm(y ~ poly(x, 2), data = subset(tmp.d, subset = 
!is.na(x

Call:
lm(formula = y ~ poly(x, 2), data = subset(tmp.d, subset = !is.na(x)))
Coefficients:
(Intercept)  poly(x, 2)1  poly(x, 2)2
 10.380   -0.242   -1.441

The documentation of lm is *not* misleading at this point, saying that
subset 	an optional vector specifying a subset of observations to be 
used in the fitting process.

which implies that data are subsetted once lm.fit is called.
All the same, this behavior is a little unexpected to me.
Is it to be considered a feature, that is, does it produce beneficial 
side effects which explain why it works as it does?

Regards,
Markus
I am running R on a Debian testing system with kernel 2.6.10 and
> version
 _
platform i386-pc-linux-gnu
arch i386
os   linux-gnu
system   i386, linux-gnu
status
major2
minor0.1
year 2004
month11
day  15
language R
--
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti

## generate some data
n <- 50
x <- runif(n)
a0 <- 10
a1 <- .5
sigma.e <- 1
y <- a0 + a1*x + rnorm(n)*sigma.e
tmp.d <- data.frame(y, x)
rm(list=c("n", "x", "a0", "a1", "sigma.e", "y"))

print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d)

## now make a few NA's

tmp.d$x[1:2] <- rep(NA, 2)

## this fails, just as it should
print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d))

## these also fail, but should not?

print(lm.2 <- lm(y ~ poly(x, 2), data = tmp.d, subset = !is.na(x)))
print(lm.3 <- lm(y ~ poly(x, 2), data = tmp.d, na.action = na.omit))

## but this does not

print(lm.3 <- lm(y ~ poly(x, 2), data = subset(tmp.d, subset = !is.na(x

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] Std Err on Concentration measures

2005-02-07 Thread Markus Jäntti
Angelo Secchi wrote:
> Hi,
> I'm using the ineq package to calculate some concentration measures
(Gini, Herfindal, ...) and I was wondering if there's around also a
function to calculate standard error on these measures. If not, is anybody
aware of where I can find a reference on this point?
> Thanks.
>

There are several suggestions for estimating the variance of the Gini
coefficient:a literature search should reveal some suggestions.
If the sampling design is complex, you may be best off using a
resampling method, such as bootstrap.

Here are a few references:

@Article{nayakandgastwirth1989,
   Author = {Tapan K Nayak and Joseph L Gastwirth},
   Title  = {The use of diversity analysis to assess the relative
influence of factors affecting the income
distribution},
   Journal= {Journal of Business \& Economic Statistics},
   Volume = {7},
   Pages  = {453--460},
   subject= {Asymptotic distribution Diversity measure Gini index
U
statistic Utility function},
   year   = 1989,
   month  = oct,
}

@Article{sandstromwretmanandwalden1988a,
   Author = {Arne Sandström and Jan Wretman and Bertil Walden},
   Title  = {Variance estimators of the Gini coefficient -
probability sampling},
   Journal= {Journal of Business \& Economic Statistics},
   Volume = {6},
   Pages  = {113--119},
   subject= {Simulations},
   year   = 1988,
   month  = jan,
}

@Article{davidsonandduclos1997,
   Author = {Russell Davidson and Jean-Yves Duclos},
   Title  = {Statistical inference for the measurement of the
incidence of taxes and transfers},
   Journal= {Econometrica},
   Volume = {65},
   Number = {6},
   Pages  = {1453--1465},
   month  = {November},
   year   = 1997,
}

cheers,

markus
-- 
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti
###

This message has been scanned by F-Secure Anti-Virus for Mic...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] weighted mean

2003-11-26 Thread Markus Jäntti
On Wed, 2003-11-26 at 01:28, [EMAIL PROTECTED] wrote:
> How do I go about generating a WEIGHTED mean (and standard error) of a
> variable (e.g., expenditures) for each level of a categorical variable
> (e.g., geographic region)?  I'm looking for something comparable to PROC
> MEANS in SAS with both a class and weight statement.
> 

I asked this question a few years ago and this is the anwer I got works:

see 
http://www.r-project.org/nocvs/mail/r-help/1999/2160.html

tapply(seq(along=wage), list(Educc,Year), 
   function(i, x=wage, w=weight)  weighted.mean(x[i], w[i]))

(wage and weight are the variable of interest and weight, while Educc
and Year are the factors)

Regards,

Markus
>  
> 
> Thanks.
> 
>  
> 
> Marc
> 
>  
> 
> 
> 
> 
> 
> 
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
-- 
Markus Jäntti <[EMAIL PROTECTED]>
Abo Akademi University

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] rbind.data.frame: character comverted to factor

2003-02-12 Thread Markus Jäntti
Dear All,

on rbind:ing together a number of data.frames, I found that
character variables are converted into factors. Since this
occurred for a data identifier, it was a little inconvenient
and, to me, unexpected. (The help page explains the
general procedure used. I also found that on forming 
a data frame, character variables are converted to factors.

The help page on read.table has the 'as.is' argument, which
I suppose kind of suggests that character variables  tend to
get converted into factors. Is there such a "preference" for 
factors and should this behaviour be expected?

Example code

 d1 <- data.frame(id =letters[1:20], x = runif(20))
d2 <- data.frame(id =paste(letters[1:20],letters[1:20], sep = ""),  x =
rexp(20))
d3 <- rbind(d1, d2)
str(d1) # <- id is factor
str(d2) # <- id is factor
str(d3) # <- id is factor
d1[["id"]] <- as.character(d1[["id"]])
d2[["id"]] <- as.character(d2[["id"]])
d3 <- rbind(d1, d2)
str(d1) # <- id is character
str(d2) # <- id is character
str(d3) # <- id is factor

Regards,

Markus
-- 
Markus Jäntti <[EMAIL PROTECTED]>
Statistics Finland

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help