Re: [R] Large mixed & crossed-effect model looking at educational spending on crime rates with error messages

2019-09-03 Thread Patrick (Malone Quantitative)
(And post in plain text)


Re: [R] Large mixed & crossed-effect model looking at educational spending on crime rates with error messages

2019-09-03 Thread Bert Gunter
You should post this on the R-sig-mixed-models list, not here. When you do
so, use ?dput to include data, not cut and paste.
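
For example, a minimal sketch of the kind of snippet meant here, with "mydata"
standing in for your data frame:

```
## dump a small, self-contained, copy-pasteable piece of the data
dput(head(mydata, 20))
```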

Cheers,
Bert



[R] Large mixed & crossed-effect model looking at educational spending on crime rates with error messages

2019-09-03 Thread Ades, James

I posted my question at Stack Overflow, where it didn’t get much of a response, 
and I was pointed in this direction by Ben Bolker. I’m happy to send the whole 
dataset to anyone who wants but thought that it would be presumptuous to 
include an enormous dput() here.

I’m looking at the effects of education spending per school district on crime 
rate (FBI crime data/UCR) within the cities and towns those school districts 
serve over a fifteen year period. The DV now has 203,410 observations of 
city/town crime data over those fifteen years. (I use that figure with some 
reticence, because there are so many moving parts and things to account for, 
but having employed over 100 datasets and hours passing through the code again, 
I think that figure is correct.)

Cities are technically crossed with school districts, in that a single city can 
be served by multiple school districts, which means one city can have multiple 
values for expenditure per student. School districts, however, also overlap 
with counties. As if things weren’t complicated enough, cities are mostly 
nested within counties (a few cities straddle two counties, but that’s rare and 
usually involves only a small area). Given that each city/town has a distinct 
PLACE_ID, my understanding is that this could be represented either as 
(1|PLACE_ID) + (1|STATE/COUNTY_ID) or as (1|STATE/COUNTY_ID/PLACE_ID).
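
For concreteness, the two intercept-only versions of those specifications would 
look roughly like the sketch below (CRIME_TOTAL as the DV, grouping names as 
above); (1|STATE/COUNTY_ID) is just shorthand that lme4 expands to 
(1|STATE) + (1|STATE:COUNTY_ID):

```
library(lme4)

## option 1: city crossed with the state/county nesting
f_crossed <- CRIME_TOTAL ~ 1 + (1 | PLACE_ID) + (1 | STATE/COUNTY_ID)

## option 2: city treated as nested within county within state
f_nested  <- CRIME_TOTAL ~ 1 + (1 | STATE/COUNTY_ID/PLACE_ID)
```

If every city sat in exactly one county the two would describe the same 
structure, since PLACE_ID is already uniquely coded; the difference shows up 
only for the handful of cities spanning two counties, where option 1 keeps a 
single effect per city and option 2 would split that city into two 
county-specific levels.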

I’m pretty familiar with mixed-effect models, and I’ve looked through clear and 
informative posts such as this one: 
https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified.
 I believe it remains sensible to also include school district 
(full_district_id) as another crossed effect, as below:

glmer.total <- glmer(CRIME_TOTAL ~ 1 + (year | PLACE_ID) + (1 | STATE/COUNTY_ID) +
                       (year | full_district_id),
                     data = total.years, family = "poisson",
                     na.action = "na.omit",
                     control = glmerControl(optimizer = "nloptwrap",
                                            calc.derivs = FALSE))

The covariates (not included in this model, to keep things null and simple) are 
centered and, where appropriate, logged: population per city, population 
density per city, year, unemployment rate per county, proportion of children 
living in poverty per school district, per capita income per county, difference 
in the share who voted Democrat in presidential elections per county, log 
enforcement per city/town, and centered expenditure per student / 1000 (per 
school district). PLACE_ID corresponds to cities and towns, COUNTY_ID to 
counties, full_district_id to school districts, and STATE to states.
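
(For reference, the centering and logging described here amounts to something 
like the sketch below; the column names are placeholders rather than the actual 
ones in total.years.)

```
## sketch of the data prep described above (placeholder column names)
total.years <- transform(total.years,
                         log_pop  = log(pop),                                        # log city population
                         year_c   = year - mean(year, na.rm = TRUE),                  # centered year
                         expend_c = expend_per_1000 - mean(expend_per_1000, na.rm = TRUE))  # centered spending per student / 1000
```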

First, if I try to run the full model on UCSD’s supercomputer, I get an error 
that the job was killed, presumably because it reached the point of consuming 
too much RAM (I think 125 MB).
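
(For reference, the in-memory size of the raw data frame itself can be checked 
with something like:)

```
## rough check of how much memory the data frame alone occupies
print(object.size(total.years), units = "Mb")
```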

I then tried to create a smaller subset of the data with arrange(STATE, 
COUNTY_ID, PLACE_ID), slicing out the first ten states (up through Delaware), 
which leaves 26,599 observations. If I run the null model with the above code 
on that subset, I get the following error:
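
(In dplyr terms, that subsetting step looks roughly like the sketch below; 
state_10 is a made-up name for the subset, and it assumes STATE sorts 
alphabetically so the first ten states run through Delaware.)

```
library(dplyr)

## keep the first ten states (alphabetically), ordered within state
state_10 <- total.years %>%
  arrange(STATE, COUNTY_ID, PLACE_ID) %>%
  filter(STATE %in% head(sort(unique(STATE)), 10))
```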

```
Error in getOptfun(optimizer) :
  optimizer function must use (at least) formal parameters ‘fn’, ‘par’, 
‘lower’, ‘control’
```

Then I tried the optimx optimizer, with this configuration:

control = glmerControl(optimizer = "optimx",
                       optCtrl = list(method = "nlminb",
                                      maxit = 1,
                                      iter.max = 1,
                                      eval.max = 1,
                                      lower = c(0, 0, 0),
                                      upper = c(Inf, 10, 1)))

and I received the following warnings. Since this is a null model, there aren’t 
really any variables to rescale:

```
Warning messages:
1: In nlminb(start = par, objective = ufn, gradient = ugr, lower = lower,  :
  unrecognized control elements named ‘lower’, ‘upper’ ignored
2: In nlminb(start = par, objective = ufn, gradient = ugr, lower = lower,  :
  unrecognized control elements named ‘lower’, ‘upper’ ignored
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00102386 (tol = 0.001, component 
1)
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
```
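
(If I read warnings 1 and 2 right, nlminb simply doesn’t accept ‘lower’/‘upper’ 
through its control list, so a control specification that leaves the bounds out 
of optCtrl, roughly as below, would at least avoid those two warnings; the 
iteration limits shown are placeholders, not the values above.)

```
library(lme4)

## sketch: optimx/nlminb via glmerControl, without bounds in the control list
ctrl_nlminb <- glmerControl(optimizer = "optimx",
                            optCtrl = list(method   = "nlminb",
                                           iter.max = 1e5,   # placeholder limit
                                           eval.max = 1e5))  # placeholder limit
```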

I then tried a larger subset (92,486 observations, through Missouri). First I 
tried the nloptr optimizer, and then optimx. I still received the same errors 
as above.

I’ve checked and rechecked everything, so I wanted to solicit advice, either on 
where I might be going wrong or on what I could do to resolve these error 
messages.

I’ve provided a brief snippet of the data below as a dput() (randomly pulling a 
number of cities within counties of Arkansas, Arizona, and Alabama).

Thanks!




231, 206, 935, 1070, 974, 1108, 1244, 1095, 1131, 1151, 1420,
1316, 1321, 1414, 1484), full_district_id = c("0100240", "0100240",
"0100240", "0100240", "0100240", "0100240", "0100240", "0100240",
"0100240", "0100240", "0100240", "0100240", "0100240", "0100240",
"0100240", "0100240", "0100240", "0100240", "0100240", "0100240",
"0100240", "0100240", "0100240", "0100240", "0100240", "0100240",
"0100240", "0