Hi Sebastian, I'm still unclear about the problem. From your last description, it sounds like the problem is *not* 2-D; you are trying to model a 1-D function of a 1-D argument (unless B is not the scalar field strength but rather \vec B, the 3-D field vector).
Also, your problem appears to be a regression (i.e., curve/surface fitting) problem, not a density estimation problem (comparing samples drawn from two distributions), so the Numerical Recipes suggestion (which addresses comparing densities) is probably not relevant.

It sounds like you have some observed data that can perhaps be modeled as y_i = f(x_i; theta) + e_i, where the e_i represent measurement error, f(x; theta) is the "true model" that gives the intensity as a function of the field strength (depending on some parameters theta), and the y_i are the measured intensities. I presume you know something about the y_i measurement errors (like a standard error for all of them, or for each of them). You also have a computational model that gives you simulation data that can be modeled as Y_i = g(X_i; theta), with (presumably) no Y_i "measurement error" (though perhaps there is an equivalent if you use Monte Carlo in the calculation or have other quantifiable sources of error).

As you phrased the problem, it appears you know theta exactly for both the observational and simulation data (an unusual situation!). You just want to ascertain whether g is a good approximation to f. Is this correct?

There is a large literature on problems like this where theta is *unknown* and one wants to either calibrate the simulation or infer theta for the observations from a sparse set of runs of the simulation at various theta. No papers come immediately to mind for the known-theta case, though the "validation" stage of some of the available work addresses it. For the unknown-theta case, the relevant literature is fairly new and goes under various names (DACE: Design & Analysis of Computer Experiments; BACCO: Bayesian Analysis of Computer Code Output; MUCM: Managing Uncertainty in Complex Models). The main tools are interpolators that *quantify uncertainty* in the interpolation, plus machinery to propagate that uncertainty through subsequent analyses.
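If you can evaluate g at the same x_i as the observations, one quick sanity check (not a substitute for the machinery below, and with entirely made-up data standing in for your measurements) is a chi-square comparison of the weighted residuals:

```python
import numpy as np
from scipy import stats

# Hypothetical measured intensities y at field strengths x, with known
# per-point standard errors sigma, and simulation output g_x at the same x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
f_true = np.sin(2 * np.pi * x)          # stand-in for the unknown f(x; theta)
sigma = 0.1 * np.ones_like(x)
y = f_true + rng.normal(0.0, sigma)     # observed data with measurement error
g_x = f_true                            # pretend the simulation is exact here

# If g is a good approximation to f, the weighted residuals should be
# roughly N(0,1), so their sum of squares is ~chi^2 with n degrees of freedom.
resid = (y - g_x) / sigma
chi2 = np.sum(resid**2)
p_value = stats.chi2.sf(chi2, df=len(x))
print(chi2, p_value)
```

A small p-value would flag a discrepancy somewhere, but it says nothing about *where* g departs from f, which is why the emulator/bias approach below is more informative.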
Gaussian processes (which include things like splines, kriging, and random walks as special cases, but with "error propagation") are used to build an "emulator" for the simulation (an emulator is an interpolator that also gives you measures of uncertainty in the interpolation). There are free codes available for Matlab and R implementing this paradigm, but it's a tricky business (as I am slowly discovering, having just started to delve into it).

It is unclear from your description why the X_i cannot be made equal to the x_i. If either or both are uncontrollable (especially if you cannot set the x_i but have to rely on "nature" providing x_i values that would differ from one set of observations to the next), this adds a nontrivial layer of complexity. It becomes a "measurement error problem" (aka an "errors-in-the-variables problem"), with subtle aspects: you can easily go astray by simply ignoring measurement errors on the x's; they do *not* "average out" as you get more sample points. This can be (and has been) incorporated into the emulator framework, though I don't know whether any of the free code does it.

Part of the MUCM/BACCO approach is estimation of a "bias" or "misfit" term between the simulation and the calibration data (here it would be delta(x) = g(x) - f(x)). Perhaps your problem can be phrased in terms of whether the bias function is significantly nonzero anywhere, and if so, where.

There are various places to go to learn about this stuff if it interests you; here are a few links. The first two are to the home pages of two working groups at SAMSI (an interdisciplinary statistics/applied-math institute), which is devoting this year to research on methods and applications in the analysis of computer models.
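Before those links, here is a toy pure-NumPy sketch of what an emulator does, so the idea is concrete. Everything in it (the squared-exponential kernel, the length scale, the test functions, the constant offset playing the role of the bias) is an assumption for illustration, not your problem:

```python
import numpy as np

# Minimal Gaussian-process interpolator (squared-exponential kernel).
# The length scale ell, amplitude amp, and jitter are illustrative choices.
def gp_predict(X_train, Y_train, X_new, ell=0.2, amp=1.0, jitter=1e-10):
    def k(a, b):
        return amp**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
    K = k(X_train, X_train) + jitter * np.eye(len(X_train))
    Ks = k(X_new, X_train)
    Kss = k(X_new, X_new)
    alpha = np.linalg.solve(K, Y_train)
    mean = Ks @ alpha                                  # interpolated values
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)          # interpolation uncertainty
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Emulate a toy "simulation" g from 8 runs, then ask where the bias
# delta(x) = g(x) - f(x) exceeds the emulator's own 2-sigma uncertainty.
X = np.linspace(0.0, 1.0, 8)
g = np.cos(2 * np.pi * X)                  # toy simulator output at design points
x_new = np.linspace(0.0, 1.0, 50)
mean, sd = gp_predict(X, g, x_new)
f_obs = np.cos(2 * np.pi * x_new) + 0.05   # toy "truth" with a constant offset
delta = mean - f_obs
significant = np.abs(delta) > 2 * sd       # where the bias stands out
```

The real MUCM/BACCO machinery treats the bias itself as a GP and folds in the measurement errors on y (and possibly x), but the basic output is the same: a bias estimate *with* an uncertainty band, so "significantly nonzero" has a precise meaning.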
http://www.stat.duke.edu/~fei/samsi/index.html
http://www4.stat.ncsu.edu/~gawhite/SAMSIweb/
http://mucm.group.shef.ac.uk/
http://cran.r-project.org/src/contrib/Descriptions/BACCO.html
http://www2.imm.dtu.dk/~hbn/dace/

A particularly accessible paper is by the Los Alamos statistics group:

http://portal.acm.org/citation.cfm?id=1039891.1039922

Again, I don't think this literature directly addresses your problem, but it may provide the "right" way to approach it, if you are willing to do some work to connect your problem to this framework.

The short answer to your question is that I think you will be hard pressed to get a good answer using off-the-shelf SciPy tools.

Good luck,
Tom Loredo