Re: [Numpy-discussion] Model and experiment fitting
Hi Sebastian, I'm still unclear about the problem. From your last description, it sounds like the problem is *not* 2-D; you are trying to model a 1-d function of a 1-D argument (unless B is not the scalar field strength, but rather \vec B, the 3-D field vector). Also, your problem appears to be a regression (i.e., curve/surface fitting) problem, not a density estimation problem (comparing samples drawn from two distributions), so the Numerical Recipes suggestion (which addresses comparing densities) is probably not relevant. It sounds like you have some observed data that perhaps can be modeled as y_i = f(x_i; theta) + e_i where e_i represent measurement error, f(x;theta) is the "true model" that gives the intensity as a function of the field strength, which depends on some parameters theta, and y_i are the measured intensities. I presume you know something about the y_i measurement errors (like a std error for all or for each of them). You also have a computational model that gives you simulation data that can be modeled as Y_i = g(X_i; theta) with (presumably) no Y_i "measurement error" (though perhaps there is an equivalent if you use Monte Carlo in the calculation or have other quantifiable sources of error). As you phrased the problem, it appears you know theta exactly for both the observational and simulation data (an unusual situation!). You just want to ascertain whether g is a good approximation to f. Is this correct? There is a large literature on problems like this where theta is *unknown* and one wants to either calibrate the simulation or infer theta for the observations from a sparse set of runs of the simulation at various theta. But no papers come immediately to mind for the known theta case, though the "validation" stage of some of the available papers address it. For the unknown theta case, the relevant literature is fairly new and goes under various names (DACE: Design & Analysis of Computer Experiments; BACCO: Bayesian Analysis of Computer Code Output; MUCM: Managing Uncertainty in Complex Models). The main tools are interpolators that *quantify uncertainty* in the interpolation and machinery to propagate uncertainty through subsequent analyses. Gaussian processes (which include things like splines, kriging and random walks as special cases, but with "error propagation") are used to build an "emulator" for the simulation (an emulator is an interpolator that also gives you measures of uncertainty in the interpolation). There are free codes available for Matlab and R implementing this paradigm, but it's a tricky business (as I am slowly discovering, having just started to delve into it). It is unclear from your description why the X_i cannot be made equal to the x_i. If either or both are uncontrollable---esp. if you cannot set x_i but have to rely on "nature" providing you x_i values that would differ from one set of observations to the next---this adds a layer of complexity that is not trivial. It becomes a "measurement error problem" (aka "errors-in-the-variables problem"), with subtle aspects (you can easily go astray by simply ignoring measurement errors on the x's; they do *not* "average out" as you get more sample points). This can (and has) been incorporated into the emulator framework, though I don't know if any of the free code does this. Part of the MUCM/BACCO approach is estimation of a "bias" or "misfit" term between the simulation and calibration data (here it would be delta(x) = g(x)-f(x)). Perhaps your problem can be phrased in terms of whether the bias function is significantly nonzero anywhere, and if so, where. There are various places to go to learn about this stuff if it interests you; here are a few links. The first two are to the home pages of two working groups at SAMSI (an interdisciplinary statistics/appl. math institute), which is devoting this year to research on methods and applications in analysis of computer models. http://www.stat.duke.edu/~fei/samsi/index.html http://www4.stat.ncsu.edu/~gawhite/SAMSIweb/ http://mucm.group.shef.ac.uk/ http://cran.r-project.org/src/contrib/Descriptions/BACCO.html http://www2.imm.dtu.dk/~hbn/dace/ A particularly accessible paper is by the Los Alamos statistics group: http://portal.acm.org/citation.cfm?id=1039891.1039922 Again, I don't think this literature directly addresses your problem, but it may provide the "right" way to approach it, if you are willing to do some work to connect your problem to this framework. The short answer to your question is that I think you will be hard pressed to get a good answer to your question using off-the-shelf SciPy tools. Good luck, Tom Loredo - This mail sent through IMP: http://horde.org/imp/ - Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your
Re: [Numpy-discussion] Model and experiment fitting.
Nadav Horesh napisał(a): > 1. If at least one of your data sets to be interpulated is on a grid, > you can use numpy.ndimage.map function for fast interpolation for 2d (in fact > for any dimensional) dataset. I've already used a splines to interpolate a missing simulated points. That procedure works great and is very fast. But I'll check the numpy.ndimage - I haven't used it, yet. > 2. Isn't there an analytic expression to average the expectration values of > SH over all possible orientations > between B and the crystal axis? My experience shows that some analytic work > can save 99% of simulation time. Well, the simulations are already very fast. The time consumption is approximately ~0.3s for a single powder spectrum (2.8GHz Pentium D). The calculations are held by an external, very fine EPR spectra simulation tool. The author must have incorporated into it a lot of rationalizations, but this is a binary tool (unfortunately) and I do not know, what exactly sits inside of it... All I know, is that the orientations are represented by a grid (with an increment step tunable by a user). From library documentation: "After having computed the spectrum for a number of orientations specified, the simulation function interpolates these spectra for additional orientations before summing up all spectra." The interpolation is accomplish with a splines. Thank you for your comment, best regards Sebastian - Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 ___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
Re: [Numpy-discussion] Model and experiment fitting.
1. If at least one of your data sets to be interpulated is on a grid, you can use numpy.ndimage.map function for fast interpolation for 2d (in fact for any dimensional) dataset. 2. Isn't there an analytic expression to average the expectration values of SH over all possible orientations between B and the crystal axis? My experience shows that some analytic work can save 99% of simulation time. Nadav -Original Message- From: [EMAIL PROTECTED] on behalf of Sebastian Zurek Sent: Sat 21-Oct-06 15:41 To: numpy-discussion@lists.sourceforge.net Cc: Subject:Re: [Numpy-discussion] Model and experiment fitting. Robert Kern napisal(a): > Your description is a bit vague. Possibly by my weak English... I'll try to make myself clearer now. Do you mean that you have some model function f > that maps X values to Y values? > >f(x) -> y > My model is quantum energy operator - spin hamiltonian (SH) with some additional assumption about so called 'line shape', 'line widths',etc. It describes various electron interactions, visible in electron paramagnetic resonance (EPR, ESR) experiment. The simplest SH can be written in a form: H = m B g S (1) where m is a constant (bohr magneton), B is magnetic field (my x-variable), g is so called 'zeeman matrix' and S is total spin angular momentum operator. Summing it all together: the simple model is parametrized by: - line shape, - line width, - zeeman matrix (3x3 diagonal matrix - the spatial dependence), - total spin S. After SH (1) diagonalization one can obtain so called 'resonance fields' and 'resonance intensities'. After a convolution with appropriate line shape function which is parametrized by the line width one can finally get the simulated EPR spectrum (simDat=[[X1,...,Xn],[Y1,...,Yn]]). This is a roughly, schematic description, appropriate to EPR spectra of monocrystals. In my situation the problem is more sophisticated - I have polycrystaline (powders) data, and to obtain a simulated EPR powder spectrum I need to sum up the EPR spectra of monocrystals that come from many possible spatial orientations, and the resultant spectrum is an envelope of all the monocrystals spectra. There's no simple model function that maps X -> Y. > If that is the case, is there some reason that you cannot run your simulation > using the same X points as your experimental data? > I can only demand a X range and number of X values within the range, there's no possibility to find the Y(X) for a specified X. These limitations on one hand come from the external program I'm using to simulate the EPR spectra, on the other are a result of spatial averaging of EPR data for powders, where a lot of interpolations are involved. > OTOH, is there some other independent variable (say Z) that *is* common > between > your experimental and simulated data? > >f(z) -> (x, y) > This is probably the situation I'm in. These other variables are my model parameters, namely: line shape-width, zeeman matrix... and they're commen between the experiment and the simulation. To make it clear. I've already solved the problem by a simple linear interpolation of simulated points within the narrow neighborhood of experimental data point. The simulation points are uniformly distributed along the X-range, with a density I'm able to tune. It all works quite well but I'm founding it as a 'brute-force' method and I wonder, if there's any more sophisticated and maybe already incorporated into any Python module method? Anyway, it looks like it's impossible to compare two discrete 2D data sets without any interpolations included... :] A. M. Archibald has proposed spline fitting, which I'll try. I'll also look at the Numerical Recipes discussion he has proposed. Sebastian - Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 ___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion <>- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
Re: [Numpy-discussion] Model and experiment fitting.
The problem seemed to be solved, by the A. M. Archibald clue. I've used splines to fit the simulation data. After that, I can easily find any Y(X) point, for all X in range (x_min,x_max) where x_min and x_max are the experiment independent variable. The experimental data stay untouched. Sorry for all the confusion I've made. Thanks a lot to all of You! Sebastian - Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 ___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
Re: [Numpy-discussion] Model and experiment fitting.
On 10/21/06, Sebastian Żurek <[EMAIL PROTECTED]> wrote: Robert Kern napisał(a):To make it clear.I've already solved the problem by a simple linear interpolation ofsimulated points within the narrow neighborhood of experimental datapoint. The simulation points are uniformly distributed along the X-range, with a density I'm able to tune. It all works quite well butI'm founding it as a 'brute-force' method and I wonder, if there's anymore sophisticated and maybe already incorporated into any Python module method?Anyway, it looks like it's impossible to compare two discrete 2D datasets without any interpolations included... :]I note that interpolation can be seen as a linear map from the data points to the interpolation points, so a lot of standard tools to be used. Chuck - Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
Re: [Numpy-discussion] Model and experiment fitting.
A. M. Archibald napisał(a): > > In scipy there are some very convenient spline fitting tools which > will allow you to fit a nice smooth spline through the simulation data > points (or near, if they have some uncertainty); you can then easily > look at the RMS difference in the y values. You can also, less easily, > look at the distance from the curve allowing for some uncertainty in > the x values. > I'll try a spline fitting. I've already made some linear interpolations (see Robert Kern answer) which works well enough to use it. I'm working on a genetic algorithms application to the model parameters optimalization problem and this RMSe comparison serves me as 'fitness function'. This 'fitness function' is important element in whole procedure, so I'm trying to found the best solution to obtain it. > I suppose you could also fit a curve through the experimental points > and compare the two curves in some way. > Well, I can do it, indeed. But every single fitting procedure implicate some additional error, so when it comes to fit, I must use it very cautiously. The simulated data-points fitting should be the only acceptable fitting procedure, I guess. > If you want to avoid using an a priori model, Numerical Recipes > discuss some possible approaches ("Do two-dimensional distributions > differ?" at http://www.nrbook.com/a/bookcpdf.html is one) but it's not > clear how to turn the problem you describe into a solvable one - some > assumption about how the models vary between sampled x values appears > to be necessary, and that amounts to interpolation. > I'll look to this NR discussion. Thank You for these comments! Sebastian - Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 ___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
Re: [Numpy-discussion] Model and experiment fitting.
Robert Kern napisał(a): > Your description is a bit vague. Possibly by my weak English... I'll try to make myself clearer now. Do you mean that you have some model function f > that maps X values to Y values? > >f(x) -> y > My model is quantum energy operator - spin hamiltonian (SH) with some additional assumption about so called 'line shape', 'line widths',etc. It describes various electron interactions, visible in electron paramagnetic resonance (EPR, ESR) experiment. The simplest SH can be written in a form: H = m B g S (1) where m is a constant (bohr magneton), B is magnetic field (my x-variable), g is so called 'zeeman matrix' and S is total spin angular momentum operator. Summing it all together: the simple model is parametrized by: - line shape, - line width, - zeeman matrix (3x3 diagonal matrix - the spatial dependence), - total spin S. After SH (1) diagonalization one can obtain so called 'resonance fields' and 'resonance intensities'. After a convolution with appropriate line shape function which is parametrized by the line width one can finally get the simulated EPR spectrum (simDat=[[X1,...,Xn],[Y1,...,Yn]]). This is a roughly, schematic description, appropriate to EPR spectra of monocrystals. In my situation the problem is more sophisticated - I have polycrystaline (powders) data, and to obtain a simulated EPR powder spectrum I need to sum up the EPR spectra of monocrystals that come from many possible spatial orientations, and the resultant spectrum is an envelope of all the monocrystals spectra. There's no simple model function that maps X -> Y. > If that is the case, is there some reason that you cannot run your simulation > using the same X points as your experimental data? > I can only demand a X range and number of X values within the range, there's no possibility to find the Y(X) for a specified X. These limitations on one hand come from the external program I'm using to simulate the EPR spectra, on the other are a result of spatial averaging of EPR data for powders, where a lot of interpolations are involved. > OTOH, is there some other independent variable (say Z) that *is* common > between > your experimental and simulated data? > >f(z) -> (x, y) > This is probably the situation I'm in. These other variables are my model parameters, namely: line shape-width, zeeman matrix... and they're commen between the experiment and the simulation. To make it clear. I've already solved the problem by a simple linear interpolation of simulated points within the narrow neighborhood of experimental data point. The simulation points are uniformly distributed along the X-range, with a density I'm able to tune. It all works quite well but I'm founding it as a 'brute-force' method and I wonder, if there's any more sophisticated and maybe already incorporated into any Python module method? Anyway, it looks like it's impossible to compare two discrete 2D data sets without any interpolations included... :] A. M. Archibald has proposed spline fitting, which I'll try. I'll also look at the Numerical Recipes discussion he has proposed. Sebastian - Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 ___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
Re: [Numpy-discussion] Model and experiment fitting.
On 20/10/06, Sebastian Żurek <[EMAIL PROTECTED]> wrote: > Is there something like that in any numerical python modules (numpy, > pylab) I could use? In scipy there are some very convenient spline fitting tools which will allow you to fit a nice smooth spline through the simulation data points (or near, if they have some uncertainty); you can then easily look at the RMS difference in the y values. You can also, less easily, look at the distance from the curve allowing for some uncertainty in the x values. I suppose you could also fit a curve through the experimental points and compare the two curves in some way. > I can imagine, I can fit the data with some polynomial or whatever, > and than compare the fitted data, but my goal is to operate on > as raw data as it's possible. If you want to avoid using an a priori model, Numerical Recipes discuss some possible approaches ("Do two-dimensional distributions differ?" at http://www.nrbook.com/a/bookcpdf.html is one) but it's not clear how to turn the problem you describe into a solvable one - some assumption about how the models vary between sampled x values appears to be necessary, and that amounts to interpolation. A. M. Archibald - Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 ___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
Re: [Numpy-discussion] Model and experiment fitting.
Sebastian Żurek wrote: > Hi! > > This is probably a silly question but I'm getting confused with a > certain problem: a comparison between experimental data points (2D > points set) and a model (2D points set - no analytical form). > > The physical model produces (by a sophisticated simulations done by an > external program) some 2D points data and one of my task is to compare > those calculated data with an experimental one. > > The experimental and modeled data have form of 2D curves, build of n > 2D-points, i.e.: > > expDat=[[x1,x2,x3,..xn],[y1,y2,y3,...,yn]] > simDat=[[X1,X2,X3,...,Xn],[Y1,Y2,Y3,...,Yn]] > > The task of determining, let's say, a root mean squarred error (RMSe) > is trivial if x1==X1, x2==X2, etc. > > In general, which is a common situation xk differs from Xk (k=0..n) and > one may not simply compare succeeding Yk and yk (k=0..n) to determine > the goodness-of-fit. The distance h=Xk-X(k-1) is constant, but similar > distance m(k)=xk-x(k-1) depends on k-th point and is not a constant > value, although the data array lengths for simulation and experiment are > the same. Your description is a bit vague. Do you mean that you have some model function f that maps X values to Y values? f(x) -> y If that is the case, is there some reason that you cannot run your simulation using the same X points as your experimental data? OTOH, is there some other independent variable (say Z) that *is* common between your experimental and simulated data? f(z) -> (x, y) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco - Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 ___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion
[Numpy-discussion] Model and experiment fitting.
Hi! This is probably a silly question but I'm getting confused with a certain problem: a comparison between experimental data points (2D points set) and a model (2D points set - no analytical form). The physical model produces (by a sophisticated simulations done by an external program) some 2D points data and one of my task is to compare those calculated data with an experimental one. The experimental and modeled data have form of 2D curves, build of n 2D-points, i.e.: expDat=[[x1,x2,x3,..xn],[y1,y2,y3,...,yn]] simDat=[[X1,X2,X3,...,Xn],[Y1,Y2,Y3,...,Yn]] The task of determining, let's say, a root mean squarred error (RMSe) is trivial if x1==X1, x2==X2, etc. In general, which is a common situation xk differs from Xk (k=0..n) and one may not simply compare succeeding Yk and yk (k=0..n) to determine the goodness-of-fit. The distance h=Xk-X(k-1) is constant, but similar distance m(k)=xk-x(k-1) depends on k-th point and is not a constant value, although the data array lengths for simulation and experiment are the same. My first idea was to do some interpolations to obtain the missing points, but I held it 'by a hand' (which, BTW gave quite rewarding results) and I suppose, there's some i.g. numpy method to do it for me, isn't it? I suppose to do something like: gfit(expDat,simDat,'measure_type') which I hope will return the number determining the goodness-of-fit (mean squarred error, root mean squarred error,...) of two sets of discrete 2D data points. Is there something like that in any numerical python modules (numpy, pylab) I could use? I can imagine, I can fit the data with some polynomial or whatever, and than compare the fitted data, but my goal is to operate on as raw data as it's possible. Thanks for your comments! Sebastian - Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 ___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion