On 17 December 2015 at 11:00, Timothée Nicolas <timothee.nico...@gmail.com> wrote:
> Hi,
>
> So, valgrind is OK (at least on the local machine. Actually on the cluster
> helios, it produces strange results even for the simplest petsc program
> PetscInitialize followed by PetscFinalize, I will try to figure this out
> with their technical team), and I have also tried with exactly the same
> versions (3.6.0) and it does not change the behavior.
>
> So now I would like to know how to have a grip on what comes in and out of
> the SNES and the KSP internal to the SNES. That is, I would like to inspect
> manually the vector which enters the SNES in the first place (should be
> zero I believe), what is being fed to the KSP, and the vector which comes
> out of it, etc. if possible at each iteration of the SNES. I want to
> actually *see* these vectors, and compute their norm by hand. The trouble
> is, it is really hard to understand why the newton residuals are not
> reduced since the KSP converges so nicely. This does not make any sense to
> me, so I want to know what happens to the vectors. But on the SNES list of
> routines, I did not find the tools that would allow me to do that (and
> messing around with the C code is too hard for me, it would take me weeks).
> Does someone have a hint ?
>

The only sane way to do this is to write a custom monitor for your SNES object.

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESMonitorSet.html

Inside your monitor, you have access to the SNES and everything it defines, e.g. the current solution, the nonlinear residual, the KSP, etc. See these pages:

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetSolution.html
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetFunction.html
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESGetKSP.html

Then you can pull apart the residual and compute specific norms (or plot the residual).

Hopefully you can access everything you need to perform your analysis.
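To make the monitor idea concrete, here is a minimal, self-contained sketch in plain Python. It is not PETSc code: the toy 2x2 system and the names `newton`, `monitor`, `residual`, `jacobian`, `solve_2x2` and `norm2` are all hypothetical and chosen only for illustration. It shows the pattern a custom monitor follows: the solver calls your callback once per nonlinear iteration, and the callback inspects the current solution and residual and computes their norms by hand. In PETSc the callback would instead be registered with SNESMonitorSet and would obtain the vectors through SNESGetSolution and SNESGetFunction.

```python
import math


def norm2(v):
    """Euclidean norm, computed by hand."""
    return math.sqrt(sum(x * x for x in v))


def residual(u):
    """Hypothetical nonlinear system F(u) = 0, with a root at (1, 2)."""
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]


def jacobian(u):
    """Analytic Jacobian of the toy system above."""
    x, y = u
    return [[2.0 * x, 1.0], [1.0, 2.0 * y]]


def solve_2x2(J, b):
    """Solve J * du = b with Cramer's rule (stands in for the linear solver)."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    du0 = (b[0] * J[1][1] - J[0][1] * b[1]) / det
    du1 = (J[0][0] * b[1] - J[1][0] * b[0]) / det
    return [du0, du1]


def newton(u0, monitor, tol=1e-12, max_it=20):
    """Plain Newton iteration that calls monitor(it, u, f) at every step."""
    u = list(u0)
    for it in range(max_it + 1):
        f = residual(u)
        monitor(it, u, f)  # hand the current state to the user callback
        if norm2(f) < tol:
            break
        du = solve_2x2(jacobian(u), [-fi for fi in f])
        u = [ui + dui for ui, dui in zip(u, du)]
    return u


history = []


def monitor(it, u, f):
    """User callback: record and print the norms of the solution and residual."""
    history.append((it, list(u), norm2(f)))
    print(f"{it:3d}  |u| = {norm2(u):.12e}  |F(u)| = {norm2(f):.12e}")


solution = newton([1.5, 1.5], monitor)
```

Running the same monitor on both machines and comparing the printed norms iteration by iteration is one way to find the first point at which the two runs disagree.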
Cheers,
  Dave

>
> Thx
>
> Timothee
>
>
> 2015-12-15 14:20 GMT+09:00 Matthew Knepley <knep...@gmail.com>:
>
>> On Mon, Dec 14, 2015 at 11:06 PM, Timothée Nicolas <timothee.nico...@gmail.com> wrote:
>>
>>> There is a difference in valgrind indeed between the two. It seems to be
>>> clean on my desktop Mac OS X but not on the cluster. I'll try to see what's
>>> causing this. I still don't understand well what's causing memory leaks in
>>> the case where all PETSc objects are freed correctly (as can be checked
>>> with -log_summary).
>>>
>>> Also, I have tried running either
>>>
>>> valgrind ./my_code -option1 -option2...
>>>
>>> or
>>>
>>> valgrind mpiexec -n 1 ./my_code -option1 -option2...
>>>
>>
>> Note here you would need --trace-children=yes for valgrind.
>>
>> Matt
>>
>>
>>> It seems the second is the correct way to proceed right ? This gives
>>> very different behaviour for valgrind.
>>>
>>> Timothee
>>>
>>>
>>> 2015-12-14 17:38 GMT+09:00 Timothée Nicolas <timothee.nico...@gmail.com>:
>>>
>>>> OK, I'll try that, thx
>>>>
>>>> 2015-12-14 17:38 GMT+09:00 Dave May <dave.mayhe...@gmail.com>:
>>>>
>>>>> You have the configure line, so it should be relatively
>>>>> straightforward to configure / build petsc in your home directory.
>>>>>
>>>>>
>>>>> On 14 December 2015 at 09:34, Timothée Nicolas <timothee.nico...@gmail.com> wrote:
>>>>>
>>>>>> OK, The problem is that I don't think I can change this easily as far
>>>>>> as the cluster is concerned. I obtain access to petsc by loading the
>>>>>> petsc module, and even if I have a few choices, I don't see any debug
>>>>>> builds...
>>>>>>
>>>>>> 2015-12-14 17:26 GMT+09:00 Dave May <dave.mayhe...@gmail.com>:
>>>>>>
>>>>>>>
>>>>>>> On Monday, 14 December 2015, Timothée Nicolas <timothee.nico...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hum, OK. I use FORTRAN by the way. Is your comment still valid ?
>>>>>>>>
>>>>>>>
>>>>>>> No. Fortran compilers init variables to zero.
>>>>>>> In this case, I would run a debug build on your OSX machine through
>>>>>>> valgrind and make sure it is clean.
>>>>>>>
>>>>>>> The other obvious thing to check is what happens if you use exactly
>>>>>>> the same petsc builds on both machines. I see 3.6.1 and 3.6.0 are
>>>>>>> being used.
>>>>>>>
>>>>>>> For all this type of checking, I would definitely use debug builds
>>>>>>> on both machines. Your cluster build is using the highest level of
>>>>>>> optimization...
>>>>>>>
>>>>>>>
>>>>>>>> I'll check anyway, but I thought I had been careful about this sort
>>>>>>>> of thing.
>>>>>>>>
>>>>>>>> Also, I thought the problem on Mac OS X may have been due to the
>>>>>>>> fact I used the version with debugging on, so I reran configure with
>>>>>>>> --with-debugging=no, which did not change anything.
>>>>>>>>
>>>>>>>> Thx
>>>>>>>>
>>>>>>>> Timothee
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-12-14 17:04 GMT+09:00 Dave May <dave.mayhe...@gmail.com>:
>>>>>>>>
>>>>>>>>> One suggestion is you have some uninitialized variables in your
>>>>>>>>> pcshell. Despite your arch being called "debug", your configure
>>>>>>>>> options indicate you have turned debugging off.
>>>>>>>>>
>>>>>>>>> C standard doesn't prescribe how uninit variables should be
>>>>>>>>> treated - the behavior is labelled as undefined. As a result,
>>>>>>>>> different compilers on different archs with the same optimization
>>>>>>>>> flags can and will treat uninit variables differently. I find OSX
>>>>>>>>> c compilers tend to set them to zero.
>>>>>>>>>
>>>>>>>>> I suggest compiling a debug build on both machines and trying your
>>>>>>>>> test again. Also, consider running the debug builds through valgrind.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>> On Monday, 14 December 2015, Timothée Nicolas <timothee.nico...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have noticed I have a VERY big difference in behaviour between
>>>>>>>>>> two machines in my problem, solved with SNES. I can't explain it,
>>>>>>>>>> because I have tested my operators which give the same result. I
>>>>>>>>>> also checked that the vectors fed to the SNES are the same. The
>>>>>>>>>> problem happens only with my shell preconditioner. When I don't
>>>>>>>>>> use it, and simply solve using -snes_mf, I don't see any more than
>>>>>>>>>> the usual 3-4 changing digits at the end of the residuals.
>>>>>>>>>> However, when I use my pcshell, the results are completely
>>>>>>>>>> different between the two machines.
>>>>>>>>>>
>>>>>>>>>> I have attached output_SuperComputer.txt and
>>>>>>>>>> output_DesktopComputer.txt, which correspond to the output from
>>>>>>>>>> the exact same code and options (and of course same input data
>>>>>>>>>> file !). More precisely
>>>>>>>>>>
>>>>>>>>>> output_SuperComputer.txt : output on a supercomputer called
>>>>>>>>>> Helios, sorry I don't know the exact specs.
>>>>>>>>>> In this case, the SNES norms are reduced successively:
>>>>>>>>>> 0 SNES Function norm 4.867111712420e-03
>>>>>>>>>> 1 SNES Function norm 5.632325929998e-08
>>>>>>>>>> 2 SNES Function norm 7.427800084502e-15
>>>>>>>>>>
>>>>>>>>>> output_DesktopComputer.txt : output on a Mac OS X Yosemite 3.4
>>>>>>>>>> GHz Intel Core i5 16GB 1600 MHz DDR3. (the same happens on another
>>>>>>>>>> laptop with Mac OS X Mavericks).
>>>>>>>>>> In this case, I obtain the following for the SNES norms:
>>>>>>>>>> 0 SNES Function norm 4.867111713544e-03
>>>>>>>>>> 1 SNES Function norm 1.560094052222e-03
>>>>>>>>>> 2 SNES Function norm 1.552118650943e-03
>>>>>>>>>> 3 SNES Function norm 1.552106297094e-03
>>>>>>>>>> 4 SNES Function norm 1.552106277949e-03
>>>>>>>>>> which I can't explain, because otherwise the KSP residual (with
>>>>>>>>>> the same operator, which I checked) behaves well.
>>>>>>>>>>
>>>>>>>>>> As you can see, the first time the preconditioner is applied
>>>>>>>>>> (DB_, DP_, Drho_ and PS_ solves), the two outputs coincide (except
>>>>>>>>>> for the last few digits, up to 9 actually, which is more than I
>>>>>>>>>> would expect), and everything starts to diverge at the first print
>>>>>>>>>> of the main KSP (the one stemming from the SNES) residual norms.
>>>>>>>>>>
>>>>>>>>>> Do you have an idea what may cause such a strange behaviour ?
>>>>>>>>>>
>>>>>>>>>> Best
>>>>>>>>>>
>>>>>>>>>> Timothee
>>>>>>>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>