Re: [Numpy-discussion] Model and experiment fitting

2006-10-23 Thread Tom Loredo

Hi Sebastian,

I'm still unclear about the problem.  From your last description, it
sounds like the problem is *not* 2-D; you are trying to model
a 1-d function of a 1-D argument (unless B is not the scalar field
strength, but rather \vec B, the 3-D field vector).

Also, your problem appears to be a regression (i.e., curve/surface
fitting) problem, not a density estimation problem (comparing samples
drawn from two distributions), so the Numerical Recipes suggestion
(which addresses comparing densities) is probably not relevant.

It sounds like you have some observed data that perhaps can be modeled as

y_i = f(x_i; theta) + e_i

where e_i represent measurement error, f(x;theta) is the "true
model" that gives the intensity as a function of the field strength,
which depends on some parameters theta, and y_i are the measured
intensities.  I presume you know something about the y_i measurement
errors (like a std error for all or for each of them).

You also have a computational model that gives you simulation
data that can be modeled as

Y_i = g(X_i; theta)

with (presumably) no Y_i "measurement error" (though perhaps there is 
an equivalent if you use Monte Carlo in the calculation or have
other quantifiable sources of error).

As you phrased the problem, it appears you know theta exactly for
both the observational and simulation data (an unusual situation!).
You just want to ascertain whether g is a good approximation to
f.  Is this correct?

There is a large literature on problems like this where theta
is *unknown* and one wants to either calibrate the simulation or
infer theta for the observations from a sparse set of runs of
the simulation at various theta.  But no papers come immediately
to mind for the known theta case, though the "validation" stage of
some of the available papers addresses it.  For the unknown theta case,
the relevant literature is fairly new and goes under various
names (DACE: Design & Analysis of Computer Experiments;
BACCO: Bayesian Analysis of Computer Code Output; MUCM: Managing
Uncertainty in Complex Models).  The main tools are
interpolators that *quantify uncertainty* in the interpolation and 
machinery to propagate uncertainty through subsequent analyses.  Gaussian
processes (which include things like splines, kriging and random
walks as special cases, but with "error propagation") are
used to build an "emulator" for the simulation (an emulator is
an interpolator that also gives you measures of uncertainty
in the interpolation).  There are free codes available for
Matlab and R implementing this paradigm, but it's a tricky
business (as I am slowly discovering, having just started to
delve into it).
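
For concreteness, here is a toy numpy sketch of the emulator idea: a
zero-mean Gaussian process with a squared-exponential kernel, conditioned
on noise-free simulator runs.  All names and hyperparameters are
illustrative assumptions (real DACE/BACCO codes estimate the kernel
parameters from the runs); this is a sketch, not any package's API.

import numpy as np

def sqexp_kernel(a, b, length=1.0, sigma_f=1.0):
    # Squared-exponential covariance between 1-D point sets a and b.
    d = a[:, None] - b[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / length)**2)

def gp_emulate(X, Y, Xnew, length=1.0, sigma_f=1.0, jitter=1e-10):
    # Posterior mean and pointwise variance at Xnew, given runs (X, Y).
    K = sqexp_kernel(X, X, length, sigma_f) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y
    Ks = sqexp_kernel(Xnew, X, length, sigma_f)
    mean = np.dot(Ks, alpha)                  # the interpolant
    v = np.linalg.solve(L, Ks.T)
    var = sigma_f**2 - np.sum(v**2, axis=0)   # its uncertainty
    return mean, var

# Toy usage: "emulate" sin() from five simulator runs.
X = np.linspace(0.0, 6.0, 5)
mean, var = gp_emulate(X, np.sin(X), np.linspace(0.0, 6.0, 50))

The variance shrinks to ~0 at the run locations and grows between them,
which is exactly the "interpolator that quantifies uncertainty" above.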

It is unclear from your description why the X_i cannot be made
equal to the x_i.  If either or both are uncontrollable---esp. if
you cannot set x_i but have to rely on "nature" providing you
x_i values that would differ from one set of observations to
the next---this adds a layer of complexity that is not trivial.
It becomes a "measurement error problem" (aka "errors-in-the-variables
problem"), with subtle aspects (you can easily go astray by 
simply ignoring measurement errors on the x's; they do *not*
"average out" as you get more sample points).  This can
(and has) been incorporated into the emulator framework, though
I don't know if any of the free code does this.

Part of the MUCM/BACCO approach is estimation of a "bias" or
"misfit" term between the simulation and calibration data (here
it would be delta(x) = g(x)-f(x)).  Perhaps
your problem can be phrased in terms of whether the bias function
is significantly nonzero anywhere, and if so, where.

There are various places to go to learn about this stuff if it
interests you; here are a few links.  The first two are to the
home pages of two working groups at SAMSI (an interdisciplinary
statistics/appl. math institute), which is devoting this year
to research on methods and applications in analysis of computer
models.

http://www.stat.duke.edu/~fei/samsi/index.html
http://www4.stat.ncsu.edu/~gawhite/SAMSIweb/
http://mucm.group.shef.ac.uk/
http://cran.r-project.org/src/contrib/Descriptions/BACCO.html
http://www2.imm.dtu.dk/~hbn/dace/

A particularly accessible paper is by the Los Alamos statistics
group:

http://portal.acm.org/citation.cfm?id=1039891.1039922

Again, I don't think this literature directly addresses your problem,
but it may provide the "right" way to approach it, if you are willing
to do some work to connect your problem to this framework.  The short
answer to your question is that I think you will be hard pressed to
get a good answer to your question using off-the-shelf SciPy tools.

Good luck,
Tom Loredo



Re: [Numpy-discussion] Model and experiment fitting.

2006-10-22 Thread Sebastian Żurek
Nadav Horesh wrote:
> 1. If at least one of your data sets to be interpolated is on a grid,
> you can use the numpy.ndimage.map function for fast interpolation of a
> 2-D (in fact, any-dimensional) dataset.

I've already used splines to interpolate missing simulated points. That
procedure works great and is very fast. But I'll check numpy.ndimage - I
haven't used it yet.


> 2. Isn't there an analytic expression for averaging the expectation
> values of SH over all possible orientations between B and the crystal
> axis? My experience shows that some analytic work can save 99% of
> simulation time.

Well, the simulations are already very fast: roughly 0.3 s for a single
powder spectrum (2.8 GHz Pentium D). The calculations are performed by
an external, very good EPR spectrum simulation tool. The author must
have built a lot of optimizations into it, but unfortunately it is a
binary tool and I do not know what exactly sits inside it... All I know
is that the orientations are represented by a grid (with an increment
step tunable by the user). From the library documentation: "After having
computed the spectrum for a number of orientations specified, the
simulation function interpolates these spectra for additional
orientations before summing up all spectra." The interpolation is
accomplished with splines.




Thank you for your comment,

best regards

Sebastian




Re: [Numpy-discussion] Model and experiment fitting.

2006-10-21 Thread Nadav Horesh

1. If at least one of your data sets to be interpolated is on a grid, you
can use the numpy.ndimage.map function for fast interpolation of a 2-D
(in fact, any-dimensional) dataset; see the sketch below.

2. Isn't there an analytic expression for averaging the expectation
values of SH over all possible orientations between B and the crystal
axis? My experience shows that some analytic work can save 99% of
simulation time.
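
For reference, the corresponding function in current SciPy is
scipy.ndimage.map_coordinates; here is a minimal sketch (grid origin,
spacing and values are made up for illustration):

import numpy as np
from scipy import ndimage

# Simulated data sampled on a uniform grid x = x0 + i*dx.
x0, dx = 0.0, 0.1
y_grid = np.sin(np.arange(0.0, 10.0, dx))   # placeholder grid values

# Experimental abscissas, generally off-grid.
x_exp = np.array([0.32, 1.7, 4.05])

# Convert physical coordinates to fractional grid indices, then
# spline-interpolate the gridded values at those indices.
idx = (x_exp - x0) / dx
y_at_exp = ndimage.map_coordinates(y_grid, idx[None, :], order=3)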

  Nadav



Re: [Numpy-discussion] Model and experiment fitting.

2006-10-21 Thread Sebastian Żurek
The problem seems to be solved, thanks to A. M. Archibald's hint.

I've used splines to fit the simulation data. After that, I can easily
find any Y(X) point for all X in the range (x_min, x_max), where x_min
and x_max bound the experimental independent variable. The experimental
data stay untouched.
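
For the record, a minimal sketch of this workflow with scipy.interpolate
(the arrays are placeholders standing in for the real spectra):

import numpy as np
from scipy import interpolate

# Placeholder spectra.
simX = np.linspace(0.0, 1.0, 200)
simY = np.exp(-((simX - 0.5) / 0.1)**2)               # simulated
expX = np.linspace(0.02, 0.98, 50)
expY = np.exp(-((expX - 0.5) / 0.1)**2) + 0.01 * np.random.randn(50)

# Cubic spline through the simulated points (s=0: exact interpolation),
# evaluated at the experimental abscissas.
tck = interpolate.splrep(simX, simY, s=0)
simY_at_expX = interpolate.splev(expX, tck)

rmse = np.sqrt(np.mean((simY_at_expX - expY)**2))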

Sorry for all the confusion I've caused.

Thanks a lot to all of you!

Sebastian




Re: [Numpy-discussion] Model and experiment fitting.

2006-10-21 Thread Charles R Harris
On 10/21/06, Sebastian Żurek <[EMAIL PROTECTED]> wrote:
> To make it clear: I've already solved the problem by a simple linear
> interpolation of simulated points within a narrow neighborhood of each
> experimental data point. The simulation points are uniformly distributed
> along the X range, with a density I'm able to tune. It all works quite
> well, but I find it a 'brute-force' method and I wonder if there's a more
> sophisticated method, maybe already incorporated into some Python module?
> Anyway, it looks like it's impossible to compare two discrete 2D data
> sets without any interpolation involved... :]

I note that interpolation can be seen as a linear map from the data
points to the interpolation points, so a lot of standard tools can be
used.
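
Concretely, a small numpy sketch (the helper and the grids are made up)
that builds the matrix A of piecewise-linear interpolation, so that
A.dot(y_data) reproduces np.interp(x_new, x_data, y_data) and standard
linear-algebra tools apply:

import numpy as np

def linear_interp_matrix(x_data, x_new):
    # Each row of A holds the two nonzero weights of one interpolation
    # point; valid for x_new inside the range of x_data.
    A = np.zeros((len(x_new), len(x_data)))
    j = np.clip(np.searchsorted(x_data, x_new) - 1, 0, len(x_data) - 2)
    w = (x_new - x_data[j]) / (x_data[j + 1] - x_data[j])
    A[np.arange(len(x_new)), j] = 1.0 - w
    A[np.arange(len(x_new)), j + 1] = w
    return A

x_data = np.linspace(0.0, 1.0, 11)
x_new = np.array([0.05, 0.5, 0.93])
A = linear_interp_matrix(x_data, x_new)
y_data = x_data**2
print(np.allclose(A.dot(y_data), np.interp(x_new, x_data, y_data)))  # True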
Chuck


Re: [Numpy-discussion] Model and experiment fitting.

2006-10-21 Thread Sebastian Żurek
A. M. Archibald wrote:
> 
> In scipy there are some very convenient spline fitting tools which
> will allow you to fit a nice smooth spline through the simulation data
> points (or near, if they have some uncertainty); you can then easily
> look at the RMS difference in the y values. You can also, less easily,
> look at the distance from the curve allowing for some uncertainty in
> the x values.
> 

I'll try spline fitting. I've already done some linear interpolation
(see my reply to Robert Kern), which works well enough to use. I'm
working on a genetic-algorithm application for the model parameter
optimization problem, and this RMSE comparison serves as the 'fitness
function'. The fitness function is an important element of the whole
procedure, so I'm trying to find the best way to obtain it.


> I suppose you could also fit a curve through the experimental points
> and compare the two curves in some way.
> 

Well, I can do that, indeed. But every fitting procedure introduces some
additional error, so when it comes to fitting, I must use it very
cautiously. Fitting the simulated data points should be the only
acceptable fitting procedure, I guess.

> If you want to avoid using an a priori model, Numerical Recipes
> discuss some possible approaches ("Do two-dimensional distributions
> differ?" at http://www.nrbook.com/a/bookcpdf.html is one) but it's not
> clear how to turn the problem you describe into a solvable one - some
> assumption about how the models vary between sampled x values appears
> to be necessary, and that amounts to interpolation.
> 

I'll look at this NR discussion.



Thank you for these comments!

Sebastian




Re: [Numpy-discussion] Model and experiment fitting.

2006-10-21 Thread Sebastian Żurek
Robert Kern wrote:

> Your description is a bit vague. 

Probably due to my weak English... I'll try to make myself clearer now.

> Do you mean that you have some model function f
> that maps X values to Y values?
> 
>     f(x) -> y
> 

My model is a quantum energy operator - a spin Hamiltonian (SH) - with
some additional assumptions about the so-called 'line shape', 'line
width', etc.

   It describes various electron interactions, visible in electron 
paramagnetic resonance (EPR, ESR) experiment. The simplest SH can
be written in a form:
H = m B g S   (1)
where m is a constant (bohr magneton), B is magnetic field (my 
x-variable), g is so called 'zeeman matrix' and S is total spin angular
momentum operator.

Summing it all together: the simple model is parametrized by:
  - line shape,
  - line width,
  - Zeeman matrix (a 3x3 diagonal matrix - the spatial dependence),
  - total spin S.

After diagonalization of the SH (1) one can obtain the so-called
'resonance fields' and 'resonance intensities'. After a convolution with
an appropriate line-shape function, parametrized by the line width, one
finally gets the simulated EPR spectrum (simDat=[[X1,...,Xn],[Y1,...,Yn]]).
This is a rough, schematic description, appropriate for EPR spectra of
monocrystals.

In my situation the problem is more complicated - I have polycrystalline
(powder) data, and to obtain a simulated EPR powder spectrum I need to
sum up the EPR spectra of monocrystals from many possible spatial
orientations; the resultant spectrum is an envelope of all the
monocrystal spectra.

There's no simple model function that maps X -> Y.


> If that is the case, is there some reason that you cannot run your simulation 
> using the same X points as your experimental data?
> 

I can only specify an X range and the number of X values within that
range; there's no way to obtain Y(X) for one specified X. These
limitations come on one hand from the external program I'm using to
simulate the EPR spectra, and on the other from the spatial averaging of
EPR data for powders, where a lot of interpolation is involved.


> OTOH, is there some other independent variable (say Z) that *is* common
> between your experimental and simulated data?
> 
>     f(z) -> (x, y)
> 

This is probably the situation I'm in. These other variables are my
model parameters, namely: line shape, line width, Zeeman matrix... and
they're common between the experiment and the simulation.


To make it clear: I've already solved the problem by a simple linear
interpolation of simulated points within a narrow neighborhood of each
experimental data point. The simulation points are uniformly distributed
along the X range, with a density I'm able to tune. It all works quite
well, but I find it a 'brute-force' method and I wonder if there's a more
sophisticated method, maybe already incorporated into some Python module?

Anyway, it looks like it's impossible to compare two discrete 2D data
sets without any interpolation involved... :]


A. M. Archibald has proposed spline fitting, which I'll try. I'll also
look at the Numerical Recipes discussion he proposed.


Sebastian




Re: [Numpy-discussion] Model and experiment fitting.

2006-10-20 Thread A. M. Archibald
On 20/10/06, Sebastian Żurek <[EMAIL PROTECTED]> wrote:


> Is there something like that in any numerical python modules (numpy,
> pylab) I could use?

In scipy there are some very convenient spline fitting tools which
will allow you to fit a nice smooth spline through the simulation data
points (or near them, if they have some uncertainty); you can then easily
look at the RMS difference in the y values. You can also, less easily,
look at the distance from the curve allowing for some uncertainty in
the x values.
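
A minimal sketch of that with scipy.interpolate (the smoothing factor s
is an assumption to be tuned to your noise level; s=0 would reproduce
exact interpolation):

import numpy as np
from scipy import interpolate

x_sim = np.linspace(0.0, 10.0, 100)
y_sim = np.sin(x_sim) + 0.05 * np.random.randn(100)  # noisy simulator output

# Smoothing spline: larger s trades fidelity for smoothness.
spl = interpolate.UnivariateSpline(x_sim, y_sim, s=len(x_sim) * 0.05**2)

x_exp = np.array([1.3, 4.7, 8.2])     # experimental abscissas
y_fit = spl(x_exp)                    # smooth curve evaluated there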

I suppose you could also fit a curve through the experimental points
and compare the two curves in some way.

> I can imagine, I can fit the data with some polynomial or whatever,
> and than compare the fitted data, but my goal is to operate on
> as raw data as it's possible.

If you want to avoid using an a priori model, Numerical Recipes
discusses some possible approaches ("Do two-dimensional distributions
differ?" at http://www.nrbook.com/a/bookcpdf.html is one) but it's not
clear how to turn the problem you describe into a solvable one - some
assumption about how the models vary between sampled x values appears
to be necessary, and that amounts to interpolation.

A. M. Archibald


Re: [Numpy-discussion] Model and experiment fitting.

2006-10-20 Thread Robert Kern
Sebastian Żurek wrote:
> Hi!
> 
> This is probably a silly question, but I'm getting confused by a
> certain problem: a comparison between experimental data points (a 2D
> point set) and a model (a 2D point set - no analytical form).
> 
> The physical model produces (by sophisticated simulations done by an
> external program) some 2D point data, and one of my tasks is to compare
> those calculated data with the experimental ones.
> 
> The experimental and modeled data have the form of 2D curves, built of n
> 2D points, i.e.:
> 
> expDat=[[x1,x2,x3,..xn],[y1,y2,y3,...,yn]]
> simDat=[[X1,X2,X3,...,Xn],[Y1,Y2,Y3,...,Yn]]
> 
> The task of determining, let's say, a root mean squared error (RMSE)
> is trivial if x1==X1, x2==X2, etc.
> 
> In general - and this is the common situation - xk differs from Xk, and
> one may not simply compare succeeding Yk and yk (k=1..n) to determine
> the goodness of fit. The distance h = Xk - X(k-1) is constant, but the
> corresponding distance m(k) = xk - x(k-1) depends on the k-th point and
> is not constant, although the data array lengths for simulation and
> experiment are the same.

Your description is a bit vague. Do you mean that you have some model
function f that maps X values to Y values?

    f(x) -> y

If that is the case, is there some reason that you cannot run your simulation 
using the same X points as your experimental data?

OTOH, is there some other independent variable (say Z) that *is* common between 
your experimental and simulated data?

   f(z) -> (x, y)

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco




[Numpy-discussion] Model and experiment fitting.

2006-10-20 Thread Sebastian Żurek
Hi!

This is probably a silly question, but I'm getting confused by a
certain problem: a comparison between experimental data points (a 2D
point set) and a model (a 2D point set - no analytical form).

The physical model produces (by sophisticated simulations done by an
external program) some 2D point data, and one of my tasks is to compare
those calculated data with the experimental ones.

The experimental and modeled data have the form of 2D curves, built of n
2D points, i.e.:

expDat=[[x1,x2,x3,..xn],[y1,y2,y3,...,yn]]
simDat=[[X1,X2,X3,...,Xn],[Y1,Y2,Y3,...,Yn]]

The task of determining, let's say, a root mean squared error (RMSE)
is trivial if x1==X1, x2==X2, etc.

In general - and this is the common situation - xk differs from Xk, and
one may not simply compare succeeding Yk and yk (k=1..n) to determine
the goodness of fit. The distance h = Xk - X(k-1) is constant, but the
corresponding distance m(k) = xk - x(k-1) depends on the k-th point and
is not constant, although the data array lengths for simulation and
experiment are the same.

My first idea was to do some interpolation to obtain the missing points,
but I did it 'by hand' (which, BTW, gave quite rewarding results), and I
suppose there's some numpy method to do it for me, isn't there?

I'd like to do something like:

gfit(expDat, simDat, 'measure_type')

which I hope will return a number determining the goodness of fit
(mean squared error, root mean squared error, ...) of two sets of
discrete 2D data points.

Is there something like that in any numerical python modules (numpy, 
pylab) I could use?
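
Something like this hand-rolled sketch, built on np.interp, is what I
have in mind (the gfit name and its arguments are hypothetical):

import numpy as np

def gfit(expDat, simDat, measure='rmse'):
    # Linearly interpolate the simulated curve at the experimental
    # abscissas, then compare; simDat[0] must be sorted ascending.
    ex, ey = np.asarray(expDat[0]), np.asarray(expDat[1])
    sx, sy = np.asarray(simDat[0]), np.asarray(simDat[1])
    resid = np.interp(ex, sx, sy) - ey
    if measure == 'mse':
        return np.mean(resid**2)
    if measure == 'rmse':
        return np.sqrt(np.mean(resid**2))
    raise ValueError("unknown measure: %s" % measure)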


I can imagine that I could fit the data with some polynomial or
whatever, and then compare the fitted data, but my goal is to operate on
data as raw as possible.

Thanks for your comments!

Sebastian

