Hi,
Long text, but not fully convincing, at least concerning my questions (still
posted at the bottom). I'm risking a hurried reply without reading all the
references (including the "to be published" ones and the PhD thesis).
See comments below.


> that the likelihood term is described by a goodness of fit, say a chi-square
> function, which improves as the models become more "complex" or increase
> in the number of parameters. The Ockham's Razor term, on the other hand,
> penalizes a model for the number of parameters by including the a priori
> distribution and uncertainties in parameter values. Hence this term
> "offsets" the influence the likelihood term may have, thereby arriving
> at a choice of model where the number of parameters can be justified. In
> this case it is possible to use a uniform prior

Both the lognormal and the gamma are physically based distributions, both
have the same number of parameters, both give the same restored profile
inside the noise and the same chi-square (Rw for the gamma is even slightly
better), the prior distributions are the same (uniform?), and the
uncertainties in the parameter values are comparable. So what does the
Razor term penalize?
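To make my question concrete, here is a rough sketch (my own construction,
not taken from [1-5]) of how I understand the evidence and its Ockham factor
for a two-parameter model, assuming a Laplace approximation and uniform
priors; all numbers, covariances and prior widths below are placeholders:

import numpy as np

def log_evidence_laplace(chi2_min, cov, prior_widths):
    """Log-evidence ~ log L_max + log(Ockham factor), Laplace approximation."""
    log_L_max = -0.5 * chi2_min                       # best-fit likelihood term
    log_ockham = (0.5 * np.log(np.linalg.det(2.0 * np.pi * np.asarray(cov)))
                  - np.sum(np.log(prior_widths)))     # posterior volume / prior volume
    return log_L_max + log_ockham

# Placeholder values: two 2-parameter models with nearly equal chi-square,
# comparable parameter covariances and identical uniform prior ranges.
logZ_logn = log_evidence_laplace(102.3, [[0.040, 0.001], [0.001, 0.0025]], [10.0, 2.0])
logZ_gam = log_evidence_laplace(101.9, [[0.050, 0.001], [0.001, 0.0030]], [10.0, 2.0])
print("Bayes factor (lognormal/gamma) ~", np.exp(logZ_logn - logZ_gam))

With the same number of parameters, the same uniform prior ranges and
comparable covariances, the two Ockham factors come out nearly equal, which
is the point of my question.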


> space (see below). The solution with the greatest entropy relative to the
> a priori model and the experimental data is the solution with the least
> assumptions, or the solution with the most amount of "randomness".
> Solutions with a lower entropy are solutions where specific assumptions
> have been made which cannot be justified. For example, we apply this
> method to determining the modal properties of a size distribution, i.e.
> monomodal or bimodal size distributions. Say we assume the distribution
> has bimodal features when in reality it is a monomodal distribution;
> this assumption will result in a solution with a lower entropy. The
> same is also the case for the converse problem (see [3]). Moreover, we
> can use this method to select between different distribution models
> such as lognormal or gamma distributions.

This means that for the gamma distribution you found the entropy was lower,
and therefore that model is not justified. Significantly lower? Presumably
you have a measure of the significance of the difference between the
entropies of the two models? On the other hand, why is the lognormal "the
solution with the least assumptions"?

> What we are trying to do with the full Bayesian/MaxEnt method (see
> [1]) is to determine a "free form solution" or a "non-parametric
> solution" [5], f, where the solution is either a line profile or a
> distribution determined from the experimental data and knowledge of the
> instrumental, noise and background effects. By "free form" or
> "non-parametric solution" I mean a profile and/or distribution which
> does not assume a specific set of parameters, as defined by say a
> lognormal distribution or a Voigt line profile function. The a priori
> model, m, can be defined by a

And, certainly, the free-form solution has the highest entropy, in any case
higher than the initial guess (lognormal). This is the optimal solution, if
I understood correctly. I wonder whether this optimal solution, from the
MaxEnt point of view, is not one of the following solutions:
w*Logn + (1-w)*Gamma. The peak profiles are indifferent to these solutions
(of infinite multiplicity).
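Purely as an illustration of what I mean (all parameter values invented,
nothing taken from the papers), the family of candidates would be built as
follows; each member is a legitimate "free form" candidate, and my question
is whether the peak profiles, or the entropy, can tell them apart:

import numpy as np
from scipy.stats import lognorm, gamma

D = np.linspace(1.0, 60.0, 300)              # crystallite-size grid in nm (illustrative)
dD = D[1] - D[0]
f_logn = lognorm.pdf(D, s=0.4, scale=15.0)   # an illustrative lognormal size distribution
f_gam = gamma.pdf(D, a=6.0, scale=2.5)       # an illustrative gamma size distribution

for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    f_w = w * f_logn + (1.0 - w) * f_gam     # one member of the mixture family
    f_w = f_w / (np.sum(f_w) * dD)           # renormalize on the grid
    mean_size = np.sum(D * f_w) * dD         # mean crystallite size of this member
    print(f"w = {w:.2f}   <D> = {mean_size:.2f} nm")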



> Dear All,
> I'm sorry for the delay in replying. I also want to pass on my thanks to
> Jim Cline for pointing out that I wasn't around to respond to some of the
> queries/issues. It has been interesting reading the discussion since
> coming back to Sydney. I don't mean to add more fuel to the fire, but I
> do hope to outline/address some of the issues which have been raised,
> while giving a more precise outline of the Bayesian/maximum entropy
> (MaxEnt) method as applied to line profile analysis.
>
> By way of background information and reference, the most recent
> publications which present the theory and application of the
> Bayesian/MaxEnt method to analyzing size/shape-broadened (simulated and
> experimental) data are given in [1-4] -- see below. None of this work
> would be possible without the core collaborators which include: Jim
> Cline (NIST), Walter Kalceff (UTS), Annette Dowd (UTS) and John Bonevich
> (NIST) in various combinations.
>
> In summary, [1] gives a full mathematical derivation of the
> Bayesian/MaxEnt method and applies it to simulated and experimental
> data. In [2], the full Bayesian/MaxEnt and Markov Chain Monte Carlo
> (MCMC) methods are applied to size analysis. Ref. [3] shows how
> Bayesian/MaxEnt/MCMC methods can be applied to distinguish monomodal and
> bimodal size distributions, i.e. Bayesian model selection. Ref. [4] is
> another application of Bayesian model selection, used to distinguish
> between lognormal and gamma distributions; it also includes full TEM
> data/analysis and demonstrates how the method is sensitive to shape
> and microstrain effects in the line profile data.
>
> Let me first address  Nicolae's queries. The application of the
> Bayesian/MCMC method in [2] was simply to demonstrate how the method
> could be used to explore the parameter space, while also determining the
> probability density functions of the parameters. It's not used in model
> selection, but as mentioned above this is outlined in [3,4]. In the
> context of [2], a uniform prior distribution can be used and the method
> in this case defaults to maximizing the likelihood function. However,
> the situation is different in Bayesian model selection, where two
> competing terms arise from the evaluation of the evidence, or
> integrated likelihood function: the first being the Ockham's Razor term;
> the second, the likelihood term (see chapter 4 in [5]). We all know
> that the likelihood term is described by a goodness of fit, say a chi-square
> function, which improves as the models become more "complex" or increase
> in the number of parameters. The Ockham's Razor term, on the other hand,
> penalizes a model for the number of parameters by including the a priori
> distribution and uncertainties in the parameter values. Hence this term
> "offsets" the influence the likelihood term may have, thereby arriving
> at a choice of model where the number of parameters can be justified. In
> this case it is possible to use a uniform prior (chapter 4 in [5]).
> However, care must be taken in defining the limits and in carefully
> quantifying all the uncertainties (see [3]).
>
> Essentially what we do is apply the Bayesian/MCMC methods to explore
> the parameters using a variety of models, determine the distributions
> for all the parameters, carry out the integration to evaluate the
> evidence term and finally evaluate the probabilities for all the models.
> Once a favorable model is found, it becomes the a priori model in the
> full Bayesian/MaxEnt method; that is, our initial guess for the full
> Bayesian/MaxEnt method. The plausibility of this model is quantified by
> the entropy function. In other words, it acts as a measure in a functional
> space (see below). The solution with the greatest entropy relative to the
> a priori model and the experimental data is the solution with the least
> assumptions, or the solution with the most amount of "randomness".
> Solutions with a lower entropy are solutions where specific assumptions
> have been made which cannot be justified. For example, we apply this
> method to determining the modal properties of a size distribution, i.e.
> monomodal or bimodal size distributions. Say we assume the distribution
> has bimodal features when in reality it is a monomodal distribution;
> this assumption will result in a solution with a lower entropy. The
> same is also the case for the converse problem (see [3]). Moreover, we
> can use this method to select between different distribution models
> such as lognormal or gamma distributions.
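For concreteness, the last step described above -- turning the evidences
into model probabilities, assuming equal prior probability for each model --
amounts to the following (the log-evidence values are placeholders, not
results from [2-4]):

import numpy as np

log_evidence = {"monomodal lognormal": -51.2,   # placeholder log-evidences
                "bimodal lognormal": -54.0,
                "monomodal gamma": -52.1}

logZ = np.array(list(log_evidence.values()))
prob = np.exp(logZ - logZ.max())      # subtract the maximum for numerical stability
prob = prob / prob.sum()              # equal model priors: P(M|data) proportional to Z
for name, p in zip(log_evidence, prob):
    print(f"P({name} | data) ~ {p:.2f}")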
>
> In the case of the full Bayesian/MaxEnt method, the a priori probability
> density function is defined in terms of the entropy function:
> S=\sum_{i=1}^{M}[f_{i}-m_{i}-f_{i}\ln(f_{i}/m_{i})],
> where m is the a priori model; f is the unknown line profile or
> distribution, summed over the 1...M points that define the
> distribution/profile.
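Written out directly (a literal transcription of the formula above, nothing
more), the entropy of a candidate f relative to the a priori model m is:

import numpy as np

def skilling_entropy(f, m):
    """S = sum_i [ f_i - m_i - f_i * ln(f_i / m_i) ]."""
    f = np.asarray(f, dtype=float)
    m = np.asarray(m, dtype=float)
    return np.sum(f - m - f * np.log(f / m))

# S is zero only when f equals m, and negative otherwise.
m = np.array([0.2, 0.5, 0.3])
print(skilling_entropy(m, m))                  # 0.0
print(skilling_entropy([0.1, 0.6, 0.3], m))    # about -0.04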
> What we are trying to do with the full Bayesian/MaxEnt method (see
> [1]) is to determine a "free form solution" or a "non-parametric
> solution" [5], f, where the solution is either a line profile or a
> distribution determined from the experimental data and knowledge of the
> instrumental, noise and background effects. By "free form" or
> "non-parametric solution" I mean a profile and/or distribution which
> does not assume a specific set of parameters, as defined by say a
> lognormal distribution or a Voigt line profile function. The a priori
> model, m, can be defined by a specific distribution or line profile
> function, but as mentioned above it only represents an initial guess as
> defined by the MCMC approach.
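As I read this, the free-form f is found by balancing the entropy against
the misfit to the data. A schematic of that kind of objective (my own sketch
under simplifying assumptions -- a toy smearing kernel, Gaussian noise and a
fixed regularization weight alpha -- not the actual algorithm of [1]) is:

import numpy as np
from scipy.optimize import minimize

def neg_Q(log_f, data, sigma, kernel, m, alpha):
    """Negative of Q(f) = alpha*S(f, m) - chi2(f)/2, parameterized so that f > 0."""
    f = np.exp(log_f)
    S = np.sum(f - m - f * np.log(f / m))              # entropy relative to the a priori model
    chi2 = np.sum(((data - kernel @ f) / sigma) ** 2)  # misfit to the observed profile
    return -(alpha * S - 0.5 * chi2)

npts = 50
kernel = np.eye(npts)                                  # toy instrument kernel: local smearing
for k in (1, 2):
    kernel += np.eye(npts, k=k) + np.eye(npts, k=-k)
kernel = kernel / kernel.sum(axis=1, keepdims=True)

true_f = np.exp(-0.5 * ((np.arange(npts) - 25.0) / 5.0) ** 2)  # a "true" profile for the test
rng = np.random.default_rng(0)
data = kernel @ true_f + rng.normal(0.0, 0.02, npts)           # simulated, noisy observation
m = np.full(npts, true_f.mean())                               # flat a priori model

res = minimize(neg_Q, np.log(m), args=(data, 0.02, kernel, m, 0.05),
               method="L-BFGS-B")
f_maxent = np.exp(res.x)               # the free-form estimate; no lognormal/gamma assumed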
>
> Why the entropy function? Let's just say there are over 100 years of
> statistical mechanics and almost 50 years of communication theory,
> information theory and MaxEnt literature which give some justification
> for the use of the entropy function. For starters, try [5-8]. But recent
> advances in "statistical geometry" and differential-geometry methods
> applied to statistics have given a firm mathematical basis for the use
> of measures like the entropy function (see [9]). Moreover, these recent
> developments demonstrate that it is important to take into
> consideration the metric or geometry of the problem.
>
> In a mathematical physics context, line profile analysis is not simply
> about parameter estimation and curve fitting, but more fundamentally a
> problem of mapping a functional space. By functional space I mean that
> each point represents a size, shape and/or dislocation distribution. The
> present methods used in line profile analysis make specific assumptions
> about the distributions and/or profile functions, and only represent a
> "point" in the functional space. Moreover, using Bayesian/maximum
> entropy reasoning, these assumptions may not be physically justified.
> This is the basic underlying weakness in the present methods. By stating
> this I don't mean to be critical of the present efforts. It is a
> statement drawn from a theoretical/mathematical point of view.
>
> The Bayesian/MCMC and Bayesian/MaxEnt methods have been tested on a
> wide variety of simulated data and are being used to re-analyze
> experimental data (i.e. the CeO2 round-robin data). Here I can't stress
> enough the need to carry out full and rigorous simulations which take
> into account the instrumental and noise/background effects etc. In
> addition, when/where possible, blind tests should be carried out. I will
> be presenting two talks, at Denver and Florence, where I will be giving
> a full description of these methods as applied to developing the latest
> NIST nanocrystallite-size SRM 1979. (Hope to see you there...)
>
> About microstrain/dislocations: this is a really hard problem... We are
> presently working on applying the Bayesian/MaxEnt methods to determining
> microstrain/dislocation distributions. We have a number of theoretical
> approaches for this up our sleeves. A simplified approach has been
> presented in [10] (see chapter 5). It has been generalized to include
> elastically anisotropic materials by applying the contrast factors
> (unpublished). The second approach is computationally time consuming but
> involves simulating various microstructures and determining their
> probabilities and entropy relative to the experimental data. But this
> problem is really hard and progress is slow. A third approach quantifies
> the probabilities for various models and selects the best given the
> experimental data. This includes both size and dislocation broadening
> effects. (I'm never short on ideas....)
>
> I hope this helps and has addressed some of the queries/questions. Best
> Regards, Nick
>
>
> References
> [1] Armstrong, N. et al. (2004a), "Bayesian inference of nanoparticle
> broadened x-ray line profiles", J. Res. Nat. Inst. Stand. Techn.,
> 109(1), 155-178.
> URL: http://nvl.nist.gov/pub/nistpubs/jres/109/1/cnt109-1.htm
> [2] Armstrong, N. et al. (2004b), "A Bayesian/Maximum Entropy method for
> certification of a nanocrystallite-size NIST Standard Reference
> Material", chapter 8 in "Diffraction analysis of the microstructure of
> materials", Springer-Verlag.
> [3] Armstrong, N. et al. (2004c), "X-ray diffraction characterisation of
> nanoparticle size and shape distributions: application to bimodal
> distributions", Proceedings of the Wagga Wagga Condensed Matter Physics
> & Materials Science Conference, January 2004.
> URL: http://www.aip.org.au/wagga2004/papers.php
> [4] Armstrong, N. et al. (2005), "Bayesian analysis of ceria
> nanoparticles from line profile data", to be published in Advances in
> X-ray Analysis.
> [5] Sivia, D.S. (1996), "Data Analysis: A Bayesian Tutorial".
> [6] Jaynes (1982), Proc. IEEE, 70(9), 939-952.
> [7] Johnson & Shore (1983), IEEE Trans. IT, 26(6), 942-943.
> [8] Shore & Johnson (1980), IEEE Trans. IT, 26(1), 26-37.
> [9] Amari (1985), "Differential-geometrical methods in statistics",
> Springer, Berlin.
> [10] Armstrong, N., PhD Thesis, UTS, Australia.
>
> Nicolae Popa wrote:
>
> >Hi,
> >So, to summarize your statements: by using Bayesian/Max.Entr. we can
> >distinguish between two distributions that cannot be distinguished by
> >maximum likelihood (least squares)? Hard to swallow, once the restored
> >peak profiles are "the same" inside the noise. What other information
> >than the peak profile, instrumental profile and statistical noise do we
> >have that Bayes/Max.Ent. can use and the least squares cannot?
> >
> >"prior distributions to be uniform" - if I understand correctly you
> >refer to the distributions of "D0" and "sigma" of the lognormal (gamma)
> >distribution from which the least squares "chooses" the solution, not
> >to the distribution itself (logn, gamma). Then, how is this prior
> >distribution for Bayes/Max.Ent.?
> >
> >Best,
> >Nick Popa
> >
> >
> >>Hi
> >>Sorry for the delay. The Bayesian results showed that the lognormal was
> >>more probable. Yes, the problem is ill-conditioned, which is why you
> >>need to

