Re: [julia-users] Re: Simple Finite Difference Methods
this stack overflow question indicates that there are two options (http://stackoverflow.com/questions/153559/what-are-some-good-profilers-for-native-c-on-windows):

https://software.intel.com/sites/default/files/managed/cd/92/Intel-VTune-AmplifierXE-2015-Product-Brief-072914.pdf ($900)
http://www.glowcode.com/summary.htm ($500)

On Wed Dec 03 2014 at 9:11:28 AM Stefan Karpinski ste...@karpinski.org wrote:

This seems nuts. There have to be good profilers on Windows – how do those work?

On Tue, Dec 2, 2014 at 11:55 PM, Jameson Nash vtjn...@gmail.com wrote:

(I forgot to mention that, to be fair, the Windows machine used to run this test was an underpowered dual-core hyperthreaded Atom processor, whereas the Linux and Mac machines were pretty comparable Xeon and Sandy Bridge machines, respectively. I only gave Windows a factor-of-2 advantage in the above computation to account for this gap.)

On Tue Dec 02 2014 at 10:50:20 PM Tim Holy tim.h...@gmail.com wrote:

Wow, those are pathetically slow backtraces. Since most of us don't have machines with 500 cores, I don't see anything we can do. --Tim

On Wednesday, December 03, 2014 03:14:02 AM Jameson Nash wrote:

you could copy the whole stack (typically only a few 100 KB, max of 8 MB), then do the stack walk offline. if you could change the stack pages to copy-on-write, it may even not be too expensive.
but this is the real problem:

```
|__/ | x86_64-linux-gnu

julia> @time for i=1:10^4 backtrace() end
elapsed time: 2.789268693 seconds (3200320016 bytes allocated, 89.29% gc time)
```

```
|__/ | x86_64-apple-darwin14.0.0

julia> @time for i=1:10^4 backtrace() end
elapsed time: 2.586410216 seconds (640048 bytes allocated, 89.96% gc time)
```

```
jameson@julia:~/julia-win32$ ./usr/bin/julia.exe -E @time for i=1:10^3 backtrace() end
fixme:winsock:WS_EnterSingleProtocolW unknown Protocol 0x
fixme:winsock:WS_EnterSingleProtocolW unknown Protocol 0x
err:dbghelp_stabs:stabs_parse Unknown stab type 0x0a
elapsed time: 22.6314386 seconds (320032016 bytes allocated, 1.51% gc time)
```

```
|__/ | i686-w64-mingw32

julia> @time for i=1:10^4 backtrace() end
elapsed time: 69.243275608 seconds (3200320800 bytes allocated, 13.16% gc time)
```

And yes, those gc fractions are verifiably correct. With gc_disable(), they execute in 1/10 of the time. So that pretty much means you must take 1/100 of the samples if you want to preserve roughly the same slowdown. On Linux, I find the slowdown to be in the range of 2-5x, and consider that to be pretty reasonable, especially for what you're getting. If you took the same number of samples on Windows, it would cause a 200-500x slowdown (give or take a few percent). If you wanted to offload this work to other cores to get the same level of accuracy and no more slowdown than Linux, you would need a machine with 200-500 processors (give or take 2-5)!

(I think I did those conversions correctly. However, since I just did them for the purposes of this email, sans calculator, and as I was typing, let me know if I made more than a factor-of-2 error somewhere, or just have fun reading https://what-if.xkcd.com/84/ instead.)

On Tue Dec 02 2014 at 6:23:07 PM Tim Holy tim.h...@gmail.com wrote:

On Tuesday, December 02, 2014 10:24:43 PM Jameson Nash wrote:

You can't profile a moving target.
The thread must be frozen first to ensure the stack trace doesn't change while attempting to record it.

Got it. I assume there's no good way to make a copy and then discard it if the copy is bad? --Tim

On Tue, Dec 2, 2014 at 5:12 PM Tim Holy tim.h...@gmail.com wrote:

If the work of walking the stack is done in the thread, why does it cause any slowdown of the main process? But of course the time it takes to complete the backtrace sets an upper limit on how frequently you can take a snapshot. In that case, though, couldn't you just have the thread always collecting backtraces? --Tim

On Tuesday, December 02, 2014 09:54:17 PM Jameson Nash wrote:

That's essentially what we do now (minus the busy-wait part). The overhead is too high to run it any more frequently -- it already causes a significant performance penalty on the system, even at the much lower sample rate than Linux. However, I suspect the truncated backtraces on win32 were exaggerating the effect somewhat -- that should not be as much of an issue now. Sure, Windows lets you snoop on (and modify) the address space of any process; you just need to find the right handle.
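The benchmark Jameson quotes above can be reproduced on a current Julia with a short sketch like the one below. Note this is an approximation: the thread used Julia 0.3, where the garbage collector was toggled with `gc_disable()`/`gc_enable()`; modern Julia spells this `GC.enable(false)`/`GC.enable(true)`.

```julia
# Sketch: measure the cost of collecting backtraces, with and without GC,
# as in Jameson's numbers above. Function and variable names are illustrative.
function time_backtraces(n)
    @elapsed for _ in 1:n
        backtrace()
    end
end

time_backtraces(10)             # warm up (compile first)
t_gc = time_backtraces(10^3)    # with the GC running
GC.enable(false)
t_nogc = time_backtraces(10^3)  # with the GC disabled, as in the thread
GC.enable(true)
```

On the thread's machines, disabling GC cut the time to roughly a tenth, which is what motivates the "take 1/100 of the samples" arithmetic above.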
Re: [julia-users] Re: Simple Finite Difference Methods
Some potentially-interesting links (of which I understand very little):

http://stackoverflow.com/questions/860602/recommended-open-source-profilers#comment2363112_1137133
http://stackoverflow.com/questions/8406175/optimizing-stack-walking-performance

I can tell from this comment: https://github.com/JuliaLang/julia/issues/2597#issuecomment-15159868 that you already know about this (and its negatives): http://www.lenholgate.com/blog/2008/09/alternative-call-stack-capturing.html

--Tim
Re: [julia-users] Re: Simple Finite Difference Methods
That's a pretty serious bummer. I can't believe anybody puts up with this. Should we change the default initialization https://github.com/JuliaLang/julia/blob/b1c99af9bdeef22e0999b28388597757541e2cc7/base/profile.jl#L44 so that, on Windows, it fires every 100ms or so? And add a note to the Profile docs? --Tim

On Wednesday, December 03, 2014 03:37:59 PM Jameson Nash wrote:

The suggestion apparently is to use Event Tracing for Windows (aka ptrace/dtrace). Yes, that is faster (on Linux too...), but entirely misses the point of profiling user code. The other offerings typically wrap StackWalk64 (as we do), and complain about how absurdly slow it is. We used to use RtlCaptureStackBackTrace, but it often failed to give useful backtraces. I think it depends too heavily on walking the EBP pointer chain (which typically doesn't exist on x86_64). As it happens, the remaining suggestions fall into the category of "well, obviously you should just write your own EBP (32-bit base pointer register) pointer chain walk algorithm. here, I'll even write part of it for you..." -- which would be very helpful if RBP (the 64-bit base pointer register) were used to make stack frame chains (hint: it isn't). And these days, EBP isn't used to make stack pointer chains on x86 either. LLVM 3.5 contains the ability to interpret COFF files, so you could presumably write your own stack-walk algorithm. I don't recommend it, however.
You might have to call out to StackWalk anyway to access the Microsoft symbol server (yes, off their network servers) to complete the stack-walk symbol lookup correctly.
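Tim's proposal (fire every 100 ms on Windows instead of every 1 ms) can be tried by hand today. A minimal sketch, using the current `Profile` API; on Julia 0.3, `Profile.init` took positional `(n, delay)` arguments rather than keywords, and the parameter values here are only illustrative:

```julia
using Profile

# Lower the sampling rate to one sample every 100 ms, as proposed for Windows.
# n is the sample-buffer size; delay is the sampling interval in seconds.
Profile.init(n = 10^6, delay = 0.1)

# Profile something long enough to actually collect samples at this rate.
Profile.clear()
@profile begin
    A = randn(200, 200)
    for _ in 1:100
        A * A
    end
end
```

Calling `Profile.init()` with no arguments returns the current `(n, delay)` settings, which is how the defaults are inspected later in this thread.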
Re: [julia-users] Re: Simple Finite Difference Methods
Yes, probably (I thought we already had). Someone would need to do some comparison work though first.
Re: [julia-users] Re: Simple Finite Difference Methods
Can somebody on a Windows system report back with the output of `Profile.init()`? --Tim
Re: [julia-users] Re: Simple Finite Difference Methods
```
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.3 (2014-11-23 20:19 UTC)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |  x86_64-w64-mingw32

julia> Profile.init()
(100,0.001)

julia>
```
Re: [julia-users] Re: Simple Finite Difference Methods
The accuracy of Windows timers is somewhat questionable. I don't know if 5% is good enough for this purpose, but: one of our collaborators uses (a lightly modified version of) the code below in a real-time imaging application: http://www.codeguru.com/cpp/w-p/system/timers/article.php/c5759/Creating-a-HighPrecision-HighResolution-and-Highly-Reliable-Timer-Utilising-Minimal-CPU-Resources.htm
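The timer-accuracy concern can be probed empirically. A rough, portable sketch (names are illustrative, and this measures the monotonic clock Julia exposes, not the specific Windows timer APIs the thread discusses): estimate the smallest nonzero step between consecutive timestamps, which bounds how finely a sampling profiler could be scheduled.

```julia
# Estimate the granularity of the monotonic clock by taking many timestamps
# and finding the smallest nonzero difference between consecutive ones.
function clock_step_ns(samples = 100_000)
    ts = [time_ns() for _ in 1:samples]
    steps = filter(>(0), diff(ts))
    isempty(steps) ? zero(UInt64) : minimum(steps)
end

clock_step_ns()
```

On a clock with coarse resolution, the minimum step comes out large, and a 1 ms sampling delay is effectively fiction.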
Re: [julia-users] Re: Simple Finite Difference Methods
Thanks. Looks like that needs to be changed. --Tim

On Wednesday, December 03, 2014 08:53:50 AM Daniel Høegh wrote:

julia> Profile.init()
(100,0.001)
Re: [julia-users] Re: Simple Finite Difference Methods
By default, the profiler takes one sample per millisecond. In practice, the timing is quite precise on Linux, seemingly within a factor of twoish on OSX, and nowhere close on Windows. So at least on Linux you can simply read samples as milliseconds.

If you want to visualize the relative contributions of each statement, I highly recommend ProfileView. If you use LightTable, it's already built in via the profile() command. The combination of ProfileView and @profile is, in my (extremely biased) opinion, quite powerful compared to tools I used previously in other programming environments. Finally, there's IProfile.jl, which works via a completely different mechanism but does report raw timings (with some pretty big caveats).

Best, --Tim

On Monday, December 01, 2014 10:13:16 PM Christoph Ortner wrote:

How do you get timings from the Julia profiler, or even better, %-es? I guess one can convert from the numbers one gets, but it is a bit painful? Christoph
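Tim's rule of thumb (one sample ≈ one millisecond at the default delay) can be made explicit with a tiny helper. A sketch; `samples_to_ms` is a hypothetical name, not part of the Profile API:

```julia
using Profile

# Profile.init() with no arguments returns the current (buffer size, delay).
# With the default delay of 0.001 s, one sample corresponds to ~1 ms of runtime.
_, delay = Profile.init()
samples_to_ms(nsamples) = nsamples * delay * 1_000

samples_to_ms(250)  # e.g. 250 samples ≈ 250 ms at the default 1 ms delay
```

This is exactly the conversion that is "a bit painful" to do by hand, and why ProfileView's relative (percentage-style) view is often more convenient.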
Re: [julia-users] Re: Simple Finite Difference Methods
Hi Christoph,

If you pass in a function via an argument, the compiler gets relatively little information about that function and cannot do any optimization on it. That is part of the reason why map(f, xs) is currently slower than a for-loop. There are ongoing discussions on how to solve that problem; as an example, see https://github.com/JuliaLang/julia/issues/8450

For profiling I used LightTable's ProfileView support, and I have to agree with Tim that it is pretty amazing.

Valentin

On Tuesday, 2 December 2014 07:04:39 UTC+1, Christoph Ortner wrote:

I took a look at the Lennard-Jones example. On top of the memory allocation from array slicing, there is also a lot of overhead in passing lj(r) as an input function phi. This is counterintuitive to me. Why is this a problem? Christoph
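The common workaround in the Julia 0.3 era was to wrap the function in its own type (a "functor"), giving the compiler a concrete type to specialize on instead of a generic `Function` argument. A hypothetical sketch in modern syntax (`LJ` and `energy` are illustrative names standing in for Christoph's `lj(r)` and `phi`; note that modern Julia gives every function its own type, so this specialization now happens automatically):

```julia
# A callable type standing in for the Lennard-Jones potential lj(r),
# here with well depth and radius normalized to 1.
struct LJ end
(::LJ)(r) = r^-12 - 2r^-6

# Because phi's type is concrete (LJ), the compiler can specialize
# and inline the inner call, unlike a 0.3-era generic Function argument.
energy(phi, rs) = sum(phi(r) for r in rs)

energy(LJ(), [1.0, 1.5, 2.0])
```

At the potential's minimum r = 1, the pair energy is exactly -1 in these units, which makes the sketch easy to sanity-check.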
Re: [julia-users] Re: Simple Finite Difference Methods
Dear Tim and Valentin, thanks for the feedback. It sounds like I need to give LightTable another try; so far I have even had difficulties setting it up to work properly with Julia. But I will definitely try to play more with ProfileView in Julia. Christoph
Re: [julia-users] Re: Simple Finite Difference Methods
I have also experienced the inaccurate profile timings on Windows. Is the reason for the bad profiler performance on Windows understood? Are there plans for improvement? Thanks, --Peter
Re: [julia-users] Re: Simple Finite Difference Methods
I think it's just that Windows is bad at scheduling tasks with short-latency, high-precision timing, but I am not the right person to answer such questions. --Tim
Re: [julia-users] Re: Simple Finite Difference Methods
Thank you for these ideas. I particularly like how to use unicode - I may need to learn how to do this.

The manual has some sections on Unicode input using LaTeX-like tab completion syntax.

Having to manually write out all these loops is a bit of a pain. Is @devec not doing the job?

I haven't really tried.

You seem to say that I shouldn't use the return keyword - I like it for readability. But is there a Julia style guide I should start following?

All I mean is that it's not strictly required if all you do is return the value of the last statement. We don't really have a style guide.

Thanks, Jiahao Chen, Staff Research Scientist, MIT Computer Science and Artificial Intelligence Laboratory
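The point about `return` can be shown in two lines (function names here are illustrative): a Julia function returns the value of its last evaluated expression, so `return` is optional, though keeping it for readability, as Christoph prefers, is perfectly fine.

```julia
# Implicit return: the last expression's value is what the caller gets.
function square_plus_one(x)
    y = x^2
    y + 1            # returned implicitly
end

# Explicit return: same result, just more verbose.
function square_plus_one_explicit(x)
    y = x^2
    return y + 1
end
```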
Re: [julia-users] Re: Simple Finite Difference Methods
Correct. Windows imposes a much higher overhead on just about every aspect of doing profiling. Unfortunately, there isn't much we can do about this, other than to complain to Microsoft. (It doesn't have signals, so we must emulate them with a separate thread. The accuracy of Windows timers is somewhat questionable. And the stack-walk library (for recording the backtrace) is apparently just badly written, and therefore insanely slow and memory hungry.)
Re: [julia-users] Re: Simple Finite Difference Methods
Although, over Thanksgiving, I pushed a number of fixes which should improve the quality of backtraces on win32 (and make sys.dll usable there).

On Tue, Dec 2, 2014 at 1:20 PM Jameson Nash vtjn...@gmail.com wrote:
Correct. Windows imposes a much higher overhead on just about every aspect of doing profiling. [...]
Re: [julia-users] Re: Simple Finite Difference Methods
On Windows, is there any chance that one could set up a separate thread for profiling and use busy-wait to do the timing? (I don't even know whether one thread can snoop on the execution state of another thread.) --Tim

On Tuesday, December 02, 2014 06:22:39 PM Jameson Nash wrote:
Although, over thanksgiving, I pushed a number of fixes which should improve the quality of backtraces on win32 (and make sys.dll usable there) [...]
Re: [julia-users] Re: Simple Finite Difference Methods
That's essentially what we do now (minus the busy-wait part). The overhead is too high to run it any more frequently -- it already causes a significant performance penalty on the system, even at a much lower sample rate than on Linux. However, I suspect the truncated backtraces on win32 were exaggerating the effect somewhat -- that should not be as much of an issue now. Sure, Windows lets you snoop on (and modify) the address space of any process; you just need to find the right handle.

On Tue, Dec 2, 2014 at 2:18 PM Tim Holy tim.h...@gmail.com wrote:
On Windows, is there any chance that one could set up a separate thread for profiling and use busy-wait to do the timing? [...]
Re: [julia-users] Re: Simple Finite Difference Methods
If the work of walking the stack is done in the thread, why does it cause any slowdown of the main process? But of course the time it takes to complete the backtrace sets an upper limit on how frequently you can take a snapshot. In that case, though, couldn't you just have the thread always collecting backtraces? --Tim

On Tuesday, December 02, 2014 09:54:17 PM Jameson Nash wrote:
That's essentially what we do now. (Minus the busy wait part). [...]
Re: [julia-users] Re: Simple Finite Difference Methods
You can't profile a moving target. The thread must be frozen first to ensure the stack trace doesn't change while attempting to record it.

On Tue, Dec 2, 2014 at 5:12 PM Tim Holy tim.h...@gmail.com wrote:
If the work of walking the stack is done in the thread, why does it cause any slowdown of the main process? [...]
Re: [julia-users] Re: Simple Finite Difference Methods
On Tuesday, December 02, 2014 10:24:43 PM Jameson Nash wrote:
You can't profile a moving target. The thread must be frozen first to ensure the stack trace doesn't change while attempting to record it

Got it. I assume there's no good way to make a copy and then discard if the copy is bad? --Tim [...]
Re: [julia-users] Re: Simple Finite Difference Methods
you could copy the whole stack (typically only a few 100kb, max of 8MB), then do the stack walk offline. if you could change the stack pages to copy-on-write, it may even not be too expensive. but this is the real problem:

```
# x86_64-linux-gnu
julia> @time for i=1:10^4 backtrace() end
elapsed time: 2.789268693 seconds (3200320016 bytes allocated, 89.29% gc time)
```

```
# x86_64-apple-darwin14.0.0
julia> @time for i=1:10^4 backtrace() end
elapsed time: 2.586410216 seconds (640048 bytes allocated, 89.96% gc time)
```

```
# win32 build (note: 10^3 iterations here, not 10^4)
jameson@julia:~/julia-win32$ ./usr/bin/julia.exe -E "@time for i=1:10^3 backtrace() end"
fixme:winsock:WS_EnterSingleProtocolW unknown Protocol 0x
fixme:winsock:WS_EnterSingleProtocolW unknown Protocol 0x
err:dbghelp_stabs:stabs_parse Unknown stab type 0x0a
elapsed time: 22.6314386 seconds (320032016 bytes allocated, 1.51% gc time)
```

```
# i686-w64-mingw32
julia> @time for i=1:10^4 backtrace() end
elapsed time: 69.243275608 seconds (3200320800 bytes allocated, 13.16% gc time)
```

And yes, those gc fractions are verifiably correct. With gc_disable(), they execute in 1/10 of the time. So, that pretty much means you must take 1/100 of the samples if you want to preserve roughly the same slowdown. On linux, I find the slowdown to be in the range of 2-5x, and consider that to be pretty reasonable, especially for what you're getting. If you took the same number of samples on windows, it would cause a 200-500x slowdown (give or take a few percent). If you wanted to offload this work to other cores to get the same level of accuracy and no more slowdown than linux, you would need a machine with 200-500 processors (give or take 2-5)! (I think I did those conversions correctly.
However, since I just did them for the purposes of this email, sans calculator, and as I was typing, let me know if I made more than a factor of 2 error somewhere, or just have fun reading https://what-if.xkcd.com/84/ instead)

On Tue Dec 02 2014 at 6:23:07 PM Tim Holy tim.h...@gmail.com wrote:
Got it. I assume there's no good way to make a copy and then discard if the copy is bad? --Tim [...]
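Jameson's back-of-the-envelope conversion can be rechecked against the timings he posted (a sketch only; the figures are copied from the benchmarks in this message, the variable names are mine, and the factor-of-2 hardware slack is the allowance he describes granting the Windows machine):

```julia
# Per-backtrace cost from the timings above (seconds / iterations).
linux_per_bt = 2.789268693 / 10^4   # x86_64-linux-gnu, 10^4 iterations
win32_per_bt = 22.6314386 / 10^3    # win32 build, only 10^3 iterations

# Raw per-sample cost ratio: roughly 81x, i.e. on the order of the ~100x
# implied by "take 1/100 of the samples" once you allow for his stated
# factor-of-2 uncertainty.
ratio = win32_per_bt / linux_per_bt

# Scaling the 2-5x Linux slowdown by that ratio lands in the
# few-hundred-x range claimed for Windows at the same sample rate.
extrapolated = (2 * ratio, 5 * ratio)
```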
Re: [julia-users] Re: Simple Finite Difference Methods
Wow, those are pathetically slow backtraces. Since most of us don't have machines with 500 cores, I don't see anything we can do. --Tim

On Wednesday, December 03, 2014 03:14:02 AM Jameson Nash wrote:
you could copy the whole stack (typically only a few 100kb, max of 8MB), then do the stack walk offline. [...]
Re: [julia-users] Re: Simple Finite Difference Methods
(I forgot to mention that, to be fair, the Windows machine used to run this test was an underpowered dual-core hyperthreaded Atom processor, whereas the Linux and Mac machines were pretty comparable Xeon and Sandy Bridge machines, respectively. I only gave Windows a factor of 2 advantage in the above computation in my accounting for this gap.)

On Tue Dec 02 2014 at 10:50:20 PM Tim Holy tim.h...@gmail.com wrote:
Wow, those are pathetically-slow backtraces. Since most of us don't have machines with 500 cores, I don't see anything we can do. --Tim [...]
[julia-users] Re: Simple Finite Difference Methods
Hi Valentin, This is extremely helpful - thank you for providing these tweaks. I hope you don't mind if I incorporate this. (And sorry for the typos - I am normally better at testing.) Christoph On Sunday, 30 November 2014 20:54:40 UTC, Valentin Churavy wrote: I found a second error in lj_cstyle: t is calculated wrongly. `t = 1./s*s*s` evaluates left-to-right as `((1./s)*s)*s`, which is not `1/s^3`. It should probably be `t = 1.0/(s*s*s)`, with `E += t*t - 2.*t` and `dJ = -12.0*(t*t - t)/s`. cstyle is on my machine still two times faster than my optimized variant of lj_pretty. Best, Valentin On Sunday, 30 November 2014 20:54:58 UTC+1, Valentin Churavy wrote: Nice work! Regarding the pretty Julia version of the Lennard-Jones MD: you can shave off another second (on my machine) by not passing in the lj method as a parameter, but calling it directly. I tried to write an optimized version of your lj_pretty function by analysing it with @profile and rewriting the slow parts. You can see my results here: https://gist.github.com/vchuravy/f42f458717a7a49395a5 I went through it step by step and applied one optimization at a time. You can also see the computation time spent at each line as a comment. Mostly I just removed temporary array allocations and then applied your math optimization. One question though: in lj_cstyle(x) you calculate `dJ = -12.*(t*t - t) * s`; shouldn't it be `dJ = -12.*(t*t - t) / s`? Kind regards, Valentin
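[Editor's note] Putting Valentin's two corrections together, the fixed pair computation can be sketched as follows. This is a sketch, not the notebook's exact code; `s` is assumed to hold the squared pair distance, as in the lj_cstyle discussion:

```julia
# Corrected Lennard-Jones pair term, assuming s = |r|^2 (squared distance).
# The original bug: 1./s*s*s evaluates left-to-right as ((1/s)*s)*s, i.e. s,
# not 1/s^3, so the division must be parenthesized.
function lj_pair(s)
    t  = 1.0 / (s*s*s)           # t = 1/s^3 = r^-6
    E  = t*t - 2.0*t             # pair energy: r^-12 - 2 r^-6
    dJ = -12.0 * (t*t - t) / s   # gradient prefactor: the force contribution is
                                 # dJ * r (the chain-rule factor 2 from
                                 # d|r|^2/dr = 2r is folded into the 12)
    return E, dJ
end
```

At the minimum `s = 1` this gives energy -1 and zero gradient, as expected for the 12-6 potential.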
[julia-users] Re: Simple Finite Difference Methods
You are very welcome. I hope that your collection of notebooks continues to grow. On Monday, 1 December 2014 14:09:22 UTC+1, Christoph Ortner wrote: Hi Valentin, This is extremely helpful - thank you for providing these tweaks. I hope you don't mind if I incorporate this. (And sorry for the typos - I am normally better at testing.) Christoph
[julia-users] Re: Simple Finite Difference Methods
Hi Steven, Thanks for the feedback - yes, it was intentional, but I am definitely planning to extend the notebook with performance enhancements. I was not yet aware of this distinction between Int and Integer. Thanks for pointing it out. Christoph On Monday, 1 December 2014 15:49:10 UTC, Steven G. Johnson wrote: I notice that in your finite-element notebook you use: T = zeros(Integer, 3, 2*N^2) You really want to use Int in cases like this (Int = Int64 on 64-bit machines and Int32 on 32-bit machines). Using Integer means that you have an array of pointers to generic integer containers, whose type must be checked at runtime (e.g. T[i] could be an Int32, Int64, BigInt, etc.). (You also have very Matlab-like code that allocates zillions of little temporary arrays in your inner loops. I assume this is intentional, but it is definitely suboptimal for performance, possibly by an order of magnitude.)
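[Editor's note] The distinction Steven describes is visible directly from the element types. A small sketch (isconcretetype is the current name for this check; older Julia versions called it isleaftype):

```julia
N = 4
T_bad  = zeros(Integer, 3, 2*N^2)  # Matrix{Integer}: elements are boxed,
                                   # pointer-referenced values of unknown width
T_good = zeros(Int, 3, 2*N^2)      # Matrix{Int}: inline machine integers

# Integer is an abstract type, so the compiler cannot assume a fixed layout:
println(isconcretetype(eltype(T_bad)))   # false
println(isconcretetype(eltype(T_good)))  # true

# Any Integer subtype can be stored in T_bad, forcing runtime type checks:
T_bad[1] = big(1)   # works, because BigInt <: Integer
```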
Re: [julia-users] Re: Simple Finite Difference Methods
I took a look at the Lennard-Jones example. On top of the memory allocation from array slicing, there is also a lot of overhead in passing lj(r) as an input function phi. There is also some overhead from computing size(x) in the range for the inner loop. Consider this version of the code, which is C style in that the loops are explicitly written out, but otherwise is not that far from the original lj_pretty:

```
# definition of the Lennard-Jones potential
function LJ(r)
    r⁻² = sumabs2(r)
    r⁻⁶ = 1.0/(r⁻²*r⁻²*r⁻²)
    J = r⁻⁶*r⁻⁶ - 2.0r⁻⁶
    ∇ᵣJ = -12. * (r⁻⁶*r⁻⁶ - r⁻⁶) / r⁻²
    J, ∇ᵣJ #--
end

function lj_cstyle2(x)
    d, N = size(x)
    E = 0.0
    ∇E = zeros(x) # note that zeros supports this form too
    r = zeros(d)
    @inbounds for n = 1:(N-1), k = (n+1):N
        for i = 1:d
            r[i] = x[i,k] - x[i,n]
        end
        J, ∇ᵣJ = LJ(r) #--
        E += J
        for i = 1:d
            ∇E[i,k] += ∇ᵣJ * r[i]
            ∇E[i,n] -= ∇ᵣJ * r[i]
        end
    end
    E, ∇E # return keyword not strictly required
end
```

For me, lj_cstyle2 runs in 83 ms, compared to 66 ms for lj_cstyle. There is some overhead from constructing and unpacking the tuples in the two lines marked with #--, but I think it is unavoidable if you want to have a single external function that computes both the potential and its gradient. Note that this version of LJ(r) computes two numbers, not a number and a vector. For any central potential the gradient along r, ∇ᵣJ, is sufficient to characterize the entire force ∇E, so there is not much loss of generality in not returning the force in the original coordinates. In this case, the mathematical structure of the problem is not entirely at odds with its implementation in code. Thanks, Jiahao Chen Staff Research Scientist MIT Computer Science and Artificial Intelligence Laboratory On Mon, Dec 1, 2014 at 12:14 PM, Christoph Ortner christophortn...@gmail.com wrote: Hi Steven, Thanks for the feedback - yes it was intentional, but I am definitely planning to extend the notebook with performance enhancements. I was not yet aware of this distinction between Int and Integer. Thanks for pointing it out. Christoph
Re: [julia-users] Re: Simple Finite Difference Methods
* r⁻² should be r², of course. Note that the following very naive version of LJ(r)

```
function LJ(r)
    r² = sumabs2(r)
    J = 1.0/(r²)^6 - 2.0/(r²)^3
    ∇ᵣJ = (-12.0/(r²)^6 + 12.0/(r²)^3) / r²
    return J, ∇ᵣJ
end
```

results in code that runs in 0.42 s, which is not terrible (and such performance would be expected for more complicated Lennard-Jones potentials which are not 12-6). Thanks, Jiahao Chen Staff Research Scientist MIT Computer Science and Artificial Intelligence Laboratory
Re: [julia-users] Re: Simple Finite Difference Methods
Thank you for these ideas. I particularly like the use of unicode - I may need to learn how to do this. Having to manually write out all these loops is a bit of a pain. Does @devec not do the job? You seem to say that I shouldn't use the return keyword - I like it for readability. Is there a Julia style guide I should start following? Christoph
Re: [julia-users] Re: Simple Finite Difference Methods
"I took a look at the Lennard-Jones example. On top of the memory allocation from array slicing, there is also a lot of overhead in passing lj(r) as an input function phi." This is counterintuitive to me. Why is this a problem? Christoph
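[Editor's note] For context on Christoph's question: in the Julia of this thread (v0.3), a function passed as an argument had the abstract type Function, so the inner call could not be inlined or specialized on; each call was dispatched dynamically and its result boxed. Since Julia 0.5, every function has its own type and this penalty has largely disappeared. A minimal illustration with a hypothetical potential phi (my sketch, not code from the thread):

```julia
phi(s) = 1.0 / (s*s*s)   # hypothetical stand-in for the lj potential

# Potential received as an argument: in Julia 0.3, f had type Function, so
# every call f(s) went through dynamic dispatch inside the hot loop.
function energy_passed(f, n)
    acc = 0.0
    for i = 1:n
        acc += f(i + 0.5)
    end
    acc
end

# Potential called directly: the call is statically resolved and inlinable.
function energy_direct(n)
    acc = 0.0
    for i = 1:n
        acc += phi(i + 0.5)
    end
    acc
end

# Both compute the same value; only the 0.3-era performance differed.
```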
[julia-users] Re: Simple Finite Difference Methods
How do you get timings from the Julia profiler, or even better, percentages? I guess one can convert from the counts it reports, but that is a bit painful. Christoph
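[Editor's note] A sketch of the conversion, using the current Profile API (the 2014-era API differed slightly): the built-in profiler is a sampler, so it only reports counts; a timing is a count multiplied by the sampling delay, and a percentage is a count divided by the total number of samples.

```julia
using Profile

# Some work to profile (arbitrary example function).
function mywork(n)
    s = 0.0
    for i = 1:n
        s += sin(i) / i
    end
    s
end

Profile.init(delay = 0.001)   # take one sample every millisecond
mywork(10)                    # warm up, so compilation isn't what gets profiled
@profile mywork(10^7)

# Each Count in the flat report corresponds to ~0.001 s of run time;
# dividing by the total sample count gives the percentage for that line.
Profile.print(format = :flat)
```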
[julia-users] Re: Simple Finite Difference Methods
Belated update to this thread: I have now finished a first draft of three tutorial-like numerical PDE notebooks; they can be viewed at http://homepages.warwick.ac.uk/staff/C.Ortner/index.php?page=julia I have two more coming up in the near future, one on spectral methods, the other on an optimisation problem. For the moment, I am using them primarily for my research group to learn Julia, and to show it to colleagues when they are interested. Q1: May I use the Julia logo on that website, as well as for any tutorials / courses that I teach based on Julia? Q2: Eventually I think it would be good to have a Julia Examples page such as http://www.mathworks.com/examples/ Q3: I'd of course be interested in feedback.
[julia-users] Re: Simple Finite Difference Methods
Nice work! Regarding the pretty Julia version of the Lennard-Jones MD: you can shave off another second (on my machine) by not passing in the lj method as a parameter, but calling it directly. I tried to write an optimized version of your lj_pretty function by analysing it with @profile and rewriting the slow parts. You can see my results here: https://gist.github.com/vchuravy/f42f458717a7a49395a5 I went through it step by step and applied one optimization at a time. You can also see the computation time spent at each line as a comment. Mostly I just removed temporary array allocations and then applied your math optimization. One question though: in lj_cstyle(x) you calculate `dJ = -12.*(t*t - t) * s`; shouldn't it be `dJ = -12.*(t*t - t) / s`? Kind regards, Valentin
[julia-users] Re: Simple Finite Difference Methods
I found a second error in lj_cstyle: t is calculated wrongly. `t = 1./s*s*s` evaluates left-to-right as `((1./s)*s)*s`, which is not `1/s^3`. It should probably be `t = 1.0/(s*s*s)`, with `E += t*t - 2.*t` and `dJ = -12.0*(t*t - t)/s`. cstyle is on my machine still two times faster than my optimized variant of lj_pretty. Best, Valentin
[julia-users] Re: Simple Finite Difference Methods
I prefer to construct multidimensional Laplacian matrices from the 1d ones via Kronecker products, and the 1d ones from -D'*D where D is a 1d first-derivative matrix, to make the structure (symmetric-definite!) and origin of the matrices clearer. I've posted a notebook showing some examples from my 18.303 class at MIT: http://nbviewer.ipython.org/url/math.mit.edu/~stevenj/18.303/lecture-10.ipynb (It would be nicer to construct the sdiff1 function so that the matrix is sparse from the beginning, rather than converting from a dense matrix. But I was lazy and dense matrices on 1d grids are cheap.)
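[Editor's note] Steven's construction can be sketched in a few lines. This is my sketch in current Julia; sdiff1 here is a minimal difference-matrix builder of my own, not necessarily identical to the one in his notebook:

```julia
using SparseArrays, LinearAlgebra

# 1d first-difference matrix: (N+1)×N, with ±1/h on two diagonals.
sdiff1(N, h) = spdiagm(N + 1, N, 0 => fill(-1/h, N), -1 => fill(1/h, N))

# 1d Dirichlet Laplacian as -D'D: symmetric negative definite by construction.
laplacian1d(N, h) = (D = sdiff1(N, h); -D' * D)

# 2d Laplacian on an N×N grid via Kronecker products:
# Δ₂ = Δ₁ ⊗ I + I ⊗ Δ₁
function laplacian2d(N, h)
    L1 = laplacian1d(N, h)
    Id = sparse(1.0I, N, N)
    kron(L1, Id) + kron(Id, L1)
end
```

For N = 2, h = 1 this gives the familiar tridiagonal [-2 1; 1 -2] in 1d and a 4×4 matrix with -4 on the diagonal in 2d; symmetry is automatic from the -D'D form, which is the point of building it this way.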
[julia-users] Re: Simple Finite Difference Methods
Turns out this was *very* quick to translate from the Matlab codes. If there is any interest I can post my IJulia notebook somewhere. On Thursday, 20 November 2014 15:08:30 UTC, Christoph Ortner wrote: I am trying to port http://uk.mathworks.com/examples/matlab/1091-finite-difference-laplacian to Julia for a quick Julia intro for some grad students at my department. Has anybody ported `numgrid` and `delsq` ? Thanks, Christoph
Re: [julia-users] Re: Simple Finite Difference Methods
I've put the current version at https://dl.dropboxusercontent.com/u/9561945/JuliaNBs/Laplacian.ipynb Would there be any interest in porting a larger set of Matlab examples to Julia? I wouldn't mind porting a subset of the Mathematics section on http://uk.mathworks.com/examples/matlab This might be a small way I could contribute here, as so far I've mostly been enjoying the benefits of everyone else's hard work :). Christoph On Thursday, 20 November 2014 17:33:25 UTC, Stefan Karpinski wrote: Yes, translation from Matlab to Julia is generally pretty easy. Would be cool to take a look at the code you ended up with. On Thu, Nov 20, 2014 at 11:54 AM, Christoph Ortner christop...@gmail.com wrote: Turns out this was *very* quick to translate from the Matlab codes. If there is any interest I can post my IJulia notebook somewhere. On Thursday, 20 November 2014 15:08:30 UTC, Christoph Ortner wrote: I am trying to port http://uk.mathworks.com/examples/matlab/1091-finite-difference-laplacian to Julia for a quick Julia intro for some grad students at my department. Has anybody ported `numgrid` and `delsq`? Thanks, Christoph
Re: [julia-users] Re: Simple Finite Difference Methods
I would be quite wary of material that may be copyrighted by MathWorks. You may not be legally allowed to port it or even download it. On Nov 20, 2014, at 2:23 PM, Christoph Ortner christophortn...@gmail.com wrote: I've put the current version at https://dl.dropboxusercontent.com/u/9561945/JuliaNBs/Laplacian.ipynb Would there be any interest in porting a larger set of Matlab examples to Julia? I wouldn't mind porting a subset of the Mathematics section on http://uk.mathworks.com/examples/matlab This might be a small way I could contribute here, as so far I've mostly been enjoying the benefits of everyone else's hard work :). Christoph
Re: [julia-users] Re: Simple Finite Difference Methods
The usual copyright issues apply here though: numgrid and delsq are copyrighted Matlab functions which you are not allowed to copy, and even less allowed to post copied code from. Best re-implement them without looking at the source code! Same goes for all of http://uk.mathworks.com/examples/matlab On Thu, 2014-11-20 at 20:23, Christoph Ortner christophortn...@gmail.com wrote: I've put the current version at https://dl.dropboxusercontent.com/u/9561945/JuliaNBs/Laplacian.ipynb Would there be any interest in porting a larger set of Matlab examples to Julia? I wouldn't mind porting a subset of the Mathematics section on http://uk.mathworks.com/examples/matlab This might be a small way I could contribute here, as so far I've mostly been enjoying the benefits of everyone else's hard work :). Christoph
Re: [julia-users] Re: Simple Finite Difference Methods
Too bad. Anyhow, if I continue to develop a similar set of elementary tutorials, I will make them available. Christoph On Thursday, 20 November 2014 19:29:34 UTC, Mauro wrote: The usual copyright issues apply here though: numgrid and delsq are copyrighted Matlab functions which you are not allowed to copy, and even less allowed to post copied code from. Best re-implement them without looking at the source code! Same goes for all of http://uk.mathworks.com/examples/matlab
Re: [julia-users] Re: Simple Finite Difference Methods
I've also removed the notebook. On Thursday, 20 November 2014 20:15:51 UTC, Christoph Ortner wrote: too bad. Anyhow, if I continue to develop a similar set of elementary tutorials, I will make them available. Christoph
Re: [julia-users] Re: Simple Finite Difference Methods
The whole copyright stuff is a pain! IANAL, but as far as I understand it: - it's ok to copy examples which have some open source licence, probably needing to keep that licence. - it's ok to re-do someone else's examples as long as you don't look at the source code. However, that is probably a bit tricky as examples are usually mixed with code. - if unsure, ask the original author. This should work fine for academics and the like, but probably less well for MathWorks. This could be a PDE example to look into: http://wiki.scipy.org/PerformancePython Tutorials/examples would definitely be useful! On Thu, 2014-11-20 at 21:17, Christoph Ortner christophortn...@gmail.com wrote: I've also removed the notebook.
Re: [julia-users] Re: Simple Finite Difference Methods
Good idea to start from the Python tutorials - I will look into that. Christoph On Thursday, 20 November 2014 20:45:01 UTC, Mauro wrote: The whole copyright stuff is a pain! IANAL, but as far as I understand it: - it's ok to copy examples which have some open source licence, probably needing to keep that licence. - it's ok to re-do someone else's examples as long as you don't look at the source code. However, that is probably a bit tricky as examples are usually mixed with code. - if unsure, ask the original author. This should work fine for academics and the like, but probably less well for MathWorks. This could be a PDE example to look into: http://wiki.scipy.org/PerformancePython Tutorials/examples would definitely be useful!
Re: [julia-users] Re: Simple Finite Difference Methods
On 21 November 2014 05:35, Mauro mauro...@runbox.com wrote: - it's ok to re-do someone else's examples as long as you don't look at the source code. However, that is probably a bit tricky as examples are usually mixed with code. Having been involved with WiFi drivers back in the day, it is insane what kind of hacks you can pull off to dodge copyright. The Wikipedia article on Clean room design [1] is a pretty good read. Pontus [1]: http://en.wikipedia.org/wiki/Clean_room_design