When I set autodiff = true in the Gist I posted above, I get the message 
"ERROR: no method clogit_ll(Array{Dual{Float64},1},)".

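One possible cause, assuming the version of clogit_ll in the Gist restricts its argument to Vector{Float64} (the copy quoted below uses the looser beta::Vector, which would accept this argument): automatic differentiation calls the objective with an array of dual numbers, and a signature that only admits Float64 vectors has no method for that. A minimal sketch of the mechanism, using a hypothetical stand-in type ToyDual rather than the real Dual type, and hypothetical function names:

```julia
# ToyDual is a hypothetical stand-in for the dual-number type used by autodiff.
struct ToyDual <: Real
    value::Float64
    deriv::Float64
end

# Restrictive signature: only accepts vectors of Float64.
strict_objective(beta::Vector{Float64}) = length(beta)

# Looser signature: accepts a vector of any Real subtype, including ToyDual.
generic_objective(beta::Vector{T}) where {T<:Real} = length(beta)

duals = [ToyDual(0.0, 1.0), ToyDual(0.0, 0.0)]

# The strict version has no matching method for Vector{ToyDual}.
strict_fails = try
    strict_objective(duals)
    false
catch err
    err isa MethodError
end

println(strict_fails)              # whether the strict call raised a MethodError
println(generic_objective(duals))  # the generic version dispatches normally
```

If this is the cause, the same loosening would probably also be needed in compute_ll and in any arrays it preallocates as Float64.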

> If you can, please do share an example of your code. Logit-style models 
> are in general numerically unstable, so it would be good to see how exactly 
> you’ve coded things up.
> One thing you may be able to do is use automatic differentiation via the 
> autodiff = true keyword to optimize, but that assumes that your objective 
> function is written in completely pure Julia code (which means, for 
> example, that your code must not call any of the functions provided by 
> Distributions.jl that are not written in Julia).
>> Hello,
>> I installed Julia a couple of days ago and was impressed how easy it was 
>> to make the switch from Matlab and to parallelize my code
>> (something I had never done before in any language; I'm an economist with 
>> only limited programming experience, mainly in Stata and Matlab).
>> However, I ran into a problem when using Optim.jl for Maximum Likelihood 
>> estimation of a conditional logit model. With the default Nelder-Mead 
>> algorithm, optimize from the Optim.jl package gave me the same result that 
>> I had obtained in Stata and Matlab.
>> With gradient-based methods such as BFGS, however, the algorithm jumped 
>> from the starting values to parameter values that are completely different. 
>> This happened for all the starting values I tried, including the case in 
>> which I took a vector that is close to the optimum from the Nelder-Mead 
>> algorithm.  
>> The problem seems to be that the algorithm tried values so large (in 
>> absolute value) that this caused problems for the objective
>> function, where I call exponential functions into which these parameter 
>> values enter. As a result, the optimization based on the BFGS algorithm did 
>> not produce the expected optimum.
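
The overflow described here is commonly avoided by subtracting the largest utility before exponentiating (the log-sum-exp trick). A minimal sketch, not taken from the Gist: logsumexp_stable, logit_logprob and the example utilities are hypothetical, and the syntax targets a current Julia release.

```julia
# Hypothetical helper: log(sum(exp.(v))) computed without overflow.
function logsumexp_stable(v::AbstractVector{<:Real})
    m = maximum(v)                      # largest utility in the choice set
    return m + log(sum(exp.(v .- m)))   # every exponent is now <= 0
end

# Log-probability that alternative `chosen` is picked, given utilities v.
logit_logprob(v::AbstractVector{<:Real}, chosen::Integer) =
    v[chosen] - logsumexp_stable(v)

v = [1000.0, 999.0, 0.0]                # utilities large enough to overflow exp

naive = v[1] - log(sum(exp.(v)))        # exp(1000.0) is Inf, so this is -Inf
stable = logit_logprob(v, 1)            # stays finite

println(naive)
println(stable)
```

With this form, a trial step to very large parameter values still returns a finite (if poor) objective value, which gives the line search something to back off from.
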
>> While I could try to provide the analytical gradient in this simple case, 
>> I was planning to use Julia for Maximum Likelihood or Simulated Maximum 
>> Likelihood estimation in cases where the gradient is more difficult to 
>> derive, so it would be good if I could make the optimizer run also with 
>> numerical gradients.
>> I suspect that my problems with optimize from Optim.jl could have 
>> something to do with the gradient() function. In the example below, for 
>> instance, I do not understand why the output of the gradient function 
>> includes values such as 11470.7, given that the function values differ only 
>> minimally.
>> Best wishes,
>> Holger
>> julia> Optim.gradient(clogit_ll,zeros(4))
>> 60554544523933395e-22
>> 0
>> 0
>> 0
>> 14923.564009972584
>> -60554544523933395e-22
>> 0
>> 0
>> 0
>> 14923.565228435104
>> 0
>> 60554544523933395e-22
>> 0
>> 0
>> 14923.569064311248
>> 0
>> -60554544523933395e-22
>> 0
>> 0
>> 14923.560174904109
>> 0
>> 0
>> 60554544523933395e-22
>> 0
>> 14923.63413848258
>> 0
>> 0
>> -60554544523933395e-22
>> 0
>> 14923.495218282553
>> 0
>> 0
>> 0
>> 60554544523933395e-22
>> 14923.58699717058
>> 0
>> 0
>> 0
>> -60554544523933395e-22
>> 14923.54224130672
>> 4-element Array{Float64,1}:
>>   -100.609
>>    734.0
>>  11470.7
>>   3695.5
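
The evaluations printed above look like a central finite difference, assuming 60554544523933395e-22 denotes a step of about 6.06e-6 (roughly the cube root of machine epsilon). The difference in function values is divided by twice that tiny step, so a change of only 0.14 in the objective corresponds to a slope in the thousands. A short check of the third component, using only numbers copied from the output:

```julia
# Step size read from the output above (60554544523933395e-22 = 6.0554544523933395e-6).
h = 6.0554544523933395e-6

# Objective values printed for beta[3] = +h and beta[3] = -h.
f_plus  = 14923.63413848258
f_minus = 14923.495218282553

# Central difference approximation of the third partial derivative.
g3 = (f_plus - f_minus) / (2 * h)

println(round(g3, digits=1))
```

This agrees with the 11470.7 reported by gradient(), so the large entries seem to reflect a steep objective at zeros(4) rather than an error in the finite differences themselves.
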
>> function clogit_ll(beta::Vector)
>>     # Print the parameters and the return value to
>>     # check how gradient() and optimize() work.
>>     println(beta) 
>>     println(-sum(compute_ll(beta,T,0)))
>>     # compute_ll computes the individual likelihood contributions
>>     # in the sample. T is the number of periods in the panel. The 0
>>     # is not used in this simple example. In related functions, I
>>     # pass on different values here to estimate finite mixtures of
>>     # the conditional logit model.
>>     return -sum(compute_ll(beta,T,0))
>> end
