Re: Program size, linking matter, and static this()
I tried different versions of DMD 2.057:
- compiled from sources in the release zip (Gentoo ebuild)
- using the 32-bit binaries in the release zip
- compiling the latest 32-bit version of DMD from the repository

I tried different compiler flags or no flags at all, compiled similar code in C++ to see if the linker is OK, and tried -m32 and -m64, all to no avail. Then I found a solution that I can hardly imagine happening only on my unique snowflake of a system ;) :

struct Test { __gshared byte abcd[10 * 1024 * 1024]; }

If it weren't for your own test results, I'd assume there is a small compiler bug in the code that decides what can go into .bss, one that makes it look only for data explicitly flagged as __gshared, but not other immutable data. (Something like that anyway.) I back-tracked the compiler code to where it either calls obj_bytes (good case, goes into .bss) or obj_lidata (bad case) to write the 10 MB of zeros. But there were so many call sites that I figured someone with inside knowledge would figure it out faster.

As a side effect of this experiment I found this combination to do funny things at runtime:
---
struct Test {
    byte arr1[1024 * 1024 * 10];
    __gshared byte arr2[1024 * 1024 * 10];
}
int main() {
    Test test;
    return 0;
}
---
-- Marco

On 18.01.2012 at 11:18, Walter Bright newshou...@digitalmars.com wrote:
> On 1/18/2012 1:43 AM, Marco Leise wrote:
>> It is back again! The following struct in my main module increases the executable size by 10MB with DMD 2.057:
>> struct Test { byte abcd[10 * 1024 * 1024]; }
> Compile it, obj2asm the result, and you'll see it goes into the BSS segment: [...]
> Adding a void main(){} yields an executable of 145,948 bytes.
Re: Program size, linking matter, and static this()
P.S.: I could have realized it earlier: DMD uses the Windows PE BSS section quite well! It is Linux where the .bss section is not used! I'll file a bug report about this after lunch and look forward to smaller executables under Linux any time soon :D
Re: Program size, linking matter, and static this()
On 27.12.2011 at 03:42, Marco Leise marco.le...@gmx.de wrote:
> On 21.12.2011 at 07:11, Walter Bright newshou...@digitalmars.com wrote:
>> On 12/20/2011 5:52 PM, Marco Leise wrote:
>>> Ok, I jumped on the bandwagon too early. Personally I only had this problem with classes and structs. struct Test { byte arr[1024 * 1024 *10]; } and class Test { byte arr[1024 * 1024 *10]; } both create a 10MB executable. While for the class, init may contain more data than just that one field, I don't see the struct adding anything or going into TLS. Can these initializers also go into .bss?
>> The struct one already does. Compile it, obj2asm it, and you'll see it there.
> Ah, I see it now. Sorry for the noise!

It is back again! The following struct in my main module increases the executable size by 10MB with DMD 2.057:

struct Test { byte abcd[10 * 1024 * 1024]; }

It seems not to do so with *either* of these declarations, which create static arrays in the module:

byte abcd[10 * 1024 * 1024];
__gshared byte abcd[10 * 1024 * 1024];
Re: Program size, linking matter, and static this()
On 1/18/2012 1:43 AM, Marco Leise wrote:
> It is back again! The following struct in my main module increases the executable size by 10MB with DMD 2.057:
> struct Test { byte abcd[10 * 1024 * 1024]; }

Compile it, obj2asm the result, and you'll see it goes into the BSS segment:

_TEXT   segment dword use32 public 'CODE'   ;size is 0
_TEXT   ends
_DATA   segment para use32 public 'DATA'    ;size is 12
_DATA   ends
CONST   segment para use32 public 'CONST'   ;size is 0
CONST   ends
_BSS    segment para use32 public 'BSS'     ;size is 10485760
_BSS    ends
FLAT    group
        extrn   _D19TypeInfo_S3foo4Test6__initZ
        public  _D3foo4Test6__initZ
FMB     segment dword use32 public 'DATA'   ;size is 0
FMB     ends
FM      segment dword use32 public 'DATA'   ;size is 4
FM      ends
FME     segment dword use32 public 'DATA'   ;size is 0
FME     ends
        extrn   _D15TypeInfo_Struct6__vtblZ
        public  _D3foo12__ModuleInfoZ
_D19TypeInfo_S3foo4Test6__initZ COMDAT flags=x0 attr=x10 align=x0

_TEXT   segment
        assume  CS:_TEXT
_TEXT   ends
_DATA   segment
_D3foo12__ModuleInfoZ:
        db      004h,000h,000h,0ff80h,000h,000h,000h,000h ;
        db      066h,06fh,06fh,000h ;foo.
_DATA   ends
CONST   segment
CONST   ends
_BSS    segment
_BSS    ends
FMB     segment
FMB     ends
FM      segment
        dd      offset FLAT:_D3foo12__ModuleInfoZ
FM      ends
FME     segment
FME     ends
_D19TypeInfo_S3foo4Test6__initZ comdat
        dd      offset FLAT:_D15TypeInfo_Struct6__vtblZ
        db      000h,000h,000h,000h ;
        db      008h,000h,000h,000h ;
        dd      offset FLAT:_D19TypeInfo_S3foo4Test6__initZ[03Ch]
        db      000h,000h,0ffa0h,000h,000h,000h,000h,000h ;
        db      000h,000h,000h,000h,000h,000h,000h,000h ;
        db      000h,000h,000h,000h,000h,000h,000h,000h ;
        db      000h,000h,000h,000h,000h,000h,000h,000h ;
        db      000h,000h,000h,000h,000h,000h,000h,000h ;
        db      001h,000h,000h,000h,066h,06fh,06fh,02eh ;foo.
        db      054h,065h,073h,074h,000h ;Test.
_D19TypeInfo_S3foo4Test6__initZ ends
        end

Adding a void main(){} yields an executable of 145,948 bytes.
Re: Program size, linking matter, and static this()
On 18.01.2012 at 11:18, Walter Bright newshou...@digitalmars.com wrote:
> On 1/18/2012 1:43 AM, Marco Leise wrote:
>> It is back again! The following struct in my main module increases the executable size by 10MB with DMD 2.057:
>> struct Test { byte abcd[10 * 1024 * 1024]; }
> Compile it, obj2asm the result, and you'll see it goes into the BSS segment: [...]
> Adding a void main(){} yields an executable of 145,948 bytes.

Thanks for checking back. I'll have to experiment a bit to narrow this one down. It comes and goes like a ghost. I was using 64-bit Linux and the switches -O -release on a medium-sized code base.
Re: Program size, linking matter, and static this()
Yes they are. static constructors completely chicken out on them. Not only is there no real attempt to determine whether the static constructors are actually dependent (which granted, isn't an easy problem), but there is _zero_ support in the language for resolving such circular dependencies. There's no way to say that they _aren't_ dependent even if you can clearly see that they aren't. The solution used in Phobos (which won't work in std.datetime due to the use of immutable and pure) is to create a C module which has the code from the static constructor and then have a separate module which calls it in its static constructor.

Which is a hack, because that C function is a compiler wall while the dependency persists. Btw., stdiobase and datebase are obsolete; the cycles have vanished. You would get this only if std.dateparse had a shared static ctor too:

Cycle detected between modules with ctors/dtors:
std.date -> std.dateparse -> std.date
object.Exception@src/rt/minfo.d(309): Aborting!

There is a cleaner hack to solve the issue, but I really don't like it. There are two DAGs that are iterated: one for shared static this and one for static this.

module a;
import b;
shared static this() { }

module b;
import a, core.atomic : cas;
shared bool initialized;
static this() { if (!cas(&initialized, false, true)) return; ... }
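The difference between the two kinds of module constructor mentioned above can be seen in a small self-contained sketch. It is not code from the thread: the counter names and the `worker` function are hypothetical, and it only assumes the documented behaviour that `shared static this()` runs once per process while `static this()` runs once per thread.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

// Hypothetical counters, shared so that every thread updates the same variable.
shared int processInits; // bumped by the per-process constructor
shared int threadInits;  // bumped by the per-thread constructor

// Runs once, before main, for the whole process.
shared static this() { atomicOp!"+="(processInits, 1); }

// Runs once for every thread, including the main thread.
static this() { atomicOp!"+="(threadInits, 1); }

// Empty thread body; starting the thread is enough to trigger `static this()`.
void worker() { }

// Starts one extra thread and reports how often the thread-local ctor has run.
int runWorkerAndCountThreadInits()
{
    auto t = new Thread(&worker);
    t.start();
    t.join();
    return atomicLoad(threadInits);
}

void main()
{
    assert(runWorkerAndCountThreadInits() == 2); // main thread + one worker
    assert(atomicLoad(processInits) == 1);       // still only one per-process run
}
```

This is also why the `cas` guard in the hack above is needed at all: without it, the body of `static this()` would execute again in every new thread.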
Re: Program size, linking matter, and static this()
On Wed, 18 Jan 2012 12:14:07 +0100, Martin Nowak d...@dawgfoto.de wrote:
> Yes they are. static constructors completely chicken out on them. Not only is there no real attempt to determine whether the static constructors are actually dependent (which granted, isn't an easy problem), but there is _zero_ support in the language for resolving such circular dependencies. There's no way to say that they _aren't_ dependent even if you can clearly see that they aren't. The solution used in Phobos (which won't work in std.datetime due to the use of immutable and pure) is to create a C module which has the code from the static constructor and then have a separate module which calls it in its static constructor.
>
> Which is a hack because that C function is a compiler wall while the dependency persists. Btw. that stdiobase and datebase are obsolete the cycles have vanished. You will get this only if std.dateparse had a shared static ctor too.
>
> Cycle detected between modules with ctors/dtors:
> std.date -> std.dateparse -> std.date
> object.Exception@src/rt/minfo.d(309): Aborting!
>
> There is a cleaner hack to solve the issue but I really don't like it. It's two DAGs that are iterated one for shared static this and one for static this.
>
> module a;
> import b;
> shared static this() { }
>
> module b;
> import a, core.atomic : cas;
> shared bool initialized;
> static this() { if (!cas(initialized, false, true)) return; ... }

Forget about it. Immutable initialization shouldn't work from thread-local ctors. But hey, I found a bug, and it already had a number: http://d.puremagic.com/issues/show_bug.cgi?id=4923.
Re: Program size, linking matter, and static this()
I'm not an expert in linkers, but my understanding is that linkers naturally remove unused object files. That, coupled with dmd's ability to break compilation output in many pseudo-object files, would take care of the matter. Truth be told, once you link in Object.factory(), bam - all classes are linked.

That's strange, because Object.factory should only require TypeInfo_Class, which only indirectly iterates through all modules. The ModuleInfos do drag in all their classes, so what we currently don't get is a module with only some of its classes.

What OS are you using? Can you bundle up some files that reproduce this?
Re: Program size, linking matter, and static this()
On 21.12.2011 at 07:11, Walter Bright newshou...@digitalmars.com wrote:
> On 12/20/2011 5:52 PM, Marco Leise wrote:
>> Ok, I jumped on the bandwagon too early. Personally I only had this problem with classes and structs. struct Test { byte arr[1024 * 1024 *10]; } and class Test { byte arr[1024 * 1024 *10]; } both create a 10MB executable. While for the class, init may contain more data than just that one field, I don't see the struct adding anything or going into TLS. Can these initializers also go into .bss?
> The struct one already does. Compile it, obj2asm it, and you'll see it there.

Ah, I see it now. Sorry for the noise!
Re: Program size, linking matter, and static this()
On 21.12.2011 at 0:22, Walter Bright wrote:
> First off, dmd most definitely puts 0 initialized static data into the BSS segment. So what's going on here?
> 1. char data is not initialized to 0, it is initialized to 0xFF. Non-zero data cannot be put in BSS.

Sorry, that came from copying C code into my post. A ubyte array was what I tested in D.

> 2. Static data goes, by default, into thread local storage. BSS data is not thread local. To put it in global data, it has to be declared with __gshared.

I completely forgot about TLS.

> So, __gshared byte arr[1024 * 1024 *10]; will go into BSS. There is pretty much no reason to have such huge arrays in static data. Instead, dynamically allocate them.

Of course, it was just an example of a huge executable. Now I see that dmd uses BSS, thank you for the explanation! I still think that zero-filled TLS arrays could occupy no space in the executable, but that would need support from both the compiler and the D run-time system, and it is surely not worth the time it would take to implement. I apologize for the unfair accusation.
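The two points conceded above can be checked with a few assertions. This is a minimal sketch rather than anything posted in the thread: the variable names and the `allEqual` helper are hypothetical, the arrays are kept tiny, and it uses the `T[N] name` declaration form expected by current compilers. Which section each variable finally lands in depends on the compiler and platform, so the sketch only checks the default values and storage classes that drive that decision.

```d
// Module-level variables are thread-local unless marked otherwise.
char[16]  charArr;             // thread-local; char.init is 0xFF, so not all-zero
ubyte[16] tlsArr;              // thread-local; all-zero, but still TLS data
__gshared ubyte[16] globalArr; // one process-wide copy; all-zero

// Hypothetical helper: true if every element of data equals value.
bool allEqual(T)(const T[] data, T value)
{
    foreach (x; data)
        if (x != value)
            return false;
    return true;
}

void main()
{
    assert(char.init == 0xFF); // point 1: char does not default to zero
    assert(ubyte.init == 0);   // byte-sized integers do
    assert(allEqual!char(charArr[], cast(char) 0xFF));
    assert(allEqual!ubyte(tlsArr[], cast(ubyte) 0));
    assert(allEqual!ubyte(globalArr[], cast(ubyte) 0));
}
```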
Re: Program size, linking matter, and static this()
On Wed, 21 Dec 2011 07:34:30 +0200, Jonathan M Davis jmdavisp...@gmx.com wrote:
> On Wednesday, December 21, 2011 06:18:59 Jakob Ovrum wrote:
>> On Wednesday, 21 December 2011 at 02:10:30 UTC, Jonathan M Davis wrote:
>>> It's not the only place in Phobos which uses a class as a namespace. I believe that both std.process and std.windows.registry are doing the same thing. In this case, it nicely groups all of the functions that are grabbing the time in one form or another. They're all effectively grabbing the time from the system clock, so they're grouped on Clock.
>>> - Jonathan M Davis
>> Sounds like the perfect candidate for its own module.
> Not out of the question, I suppose, but it would make an awfully small module and would inevitably make it that much harder for people to figure out how to get the current time.
> - Jonathan M Davis

Supporting module nesting in a single file wouldn't hurt, would it?

module main;
module nested { }
Re: Program size, linking matter, and static this()
On 21. 12. 2011 14:22, so wrote:
> Supporting module nesting in single file wouldn't hurt, would it?
> module main;
> module nested { }

Kind of...

import std.stdio; // for writeln and readln

template MyNamespaceImpl () { int i; }
alias MyNamespaceImpl!() MyNamespace;

void main () {
    MyNamespace.i = 1;
    with (MyNamespace) { i = 2; }
    writeln(MyNamespace.i);
    readln();
}
Re: Program size, linking matter, and static this()
On 16.12.2011 at 21:29, Andrei Alexandrescu wrote:
> Hello,
>
> Late last night Walter and I figured a few interesting tidbits of information. Allow me to give some context, discuss them, and sketch a few approaches for improving things.
>
> A while ago Walter wanted to enable function-level linking, i.e. only get the needed functions from a given (and presumably large) module. So he arranged things that a library contains many small object files (that actually are generated from a single .d file and never exist on disk, only inside the library file, which can be considered an archive like tar). Then the linker would only pick the used object files from the library and link those in.
>
> Unfortunately that didn't have nearly the expected impact - essentially the size of most binaries stayed the same. The mystery was unsolved, and Walter needed to move on to other things.
>
> One particularly annoying issue is that even programs that don't ostensibly use anything from an imported module may balloon inexplicably in size. Consider:
>
> import std.path;
> void main(){}
>
> This program, after stripping and all, has some 750KB in size. Removing the import line reduces the size to 218KB. That includes the runtime support, garbage collector, and such, and I'll consider it a baseline. (A similar but separate discussion could be focused on reducing the baseline size, but herein I'll consider it constant.)
>
> What we'd simply want is to be able to import stuff without blatantly paying for what we don't use. If a program imports std.path and uses no function from it, it should be as large as a program without the import. Furthermore, the increase should be incremental - using 2-3 functions from std.path should only increase the executable size by a little, not suddenly link in all code in that module.
>
> But in experiments it seemed like program size would increase in sudden amounts when certain modules were included. After much investigation we figured that the following fateful causal sequence happened:
>
> 1. Some modules define static constructors with static this() or static shared this(), and/or static destructors.
> 2. These constructors/destructors are linked in automatically whenever a module is included.
> 3. Importing a module with a static constructor (or destructor) will generate its ModuleInfo structure, which contains static information about all module members. In particular, it keeps virtual table pointers for all classes defined inside the module.
> 4. That means generating ModuleInfo refers to all virtual functions defined in that module, whether they're used or not.
> 5. The phenomenon is transitive, e.g. even if std.path has no static constructors but imports std.datetime which does, a ModuleInfo is generated for std.path too, in addition to the one for std.datetime. So now classes inside std.path (if any) will be all linked in.
> 6. It follows that a module that defines classes which in turn use other functions in other modules, and has static constructors (or includes other modules that do) will balloon the size of the executable suddenly.
>
> There are a few approaches that we can use to improve the state of affairs.
>
> A. On the library side, use static constructors and destructors sparingly inside druntime and std. We can use lazy initialization instead of compulsively initializing library internals. I think this is often a worthy thing to do in any case (dynamic libraries etc) because it only does work if and when work needs to be done at the small cost of a check upon each use.
> B. On the compiler side, we could use a similar lazy initialization trick to only refer to class methods in the module if they're actually needed. I'm being vague here because I'm not sure what and how that can be done.
>
> Here's a list of all files in std using static cdtors:
>
> std/__fileinit.d
> std/concurrency.d
> std/cpuid.d
> std/cstream.d
> std/datebase.d
> std/datetime.d
> std/encoding.d
> std/internal/math/biguintcore.d
> std/internal/math/biguintx86.d
> std/internal/processinit.d
> std/internal/windows/advapi32.d
> std/mmfile.d
> std/parallelism.d
> std/perf.d
> std/socket.d
> std/stdiobase.d
> std/uri.d
>
> The majority of them don't do a lot of work and are not much used inside phobos, so they don't blow up the executable. The main one that could receive some attention is std.datetime. It has a few static ctors and a lot of classes. Essentially just importing std.datetime or any std module that transitively imports std.datetime (and there are many of them) ends up linking in most of Phobos and blows the size up from the 218KB baseline to 700KB.
>
> Jonathan, could I impose on you to replace all static cdtors in std.datetime with lazy initialization? I looked through it and it strikes me as a reasonably simple job, but I think you'd know better what to do than me.
>
> A similar effort could be conducted to reduce or eliminate static cdtors from druntime. I made the experiment of commenting them all, and that reduced the size of the baseline from 218KB to 200KB. This is a good amount, but not as
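Approach A in the message above (lazy initialization in place of a static constructor) might look roughly like the following. This is only an illustrative sketch and not the change that was made to std.datetime: the `squares` table and the `squareTable` accessor are hypothetical stand-ins for whatever state a module would otherwise build in `static this()`.

```d
// Hypothetical module state that a `static this()` block would normally fill.
// It is thread-local by default, so no synchronization is needed here.
private int[] squares;

// Accessor that builds the table on first use instead of at program start.
// The price is one null check per call, as the message above points out.
int[] squareTable()
{
    if (squares is null)
    {
        squares = new int[](16);
        foreach (i, ref s; squares)
            s = cast(int)(i * i);
    }
    return squares;
}

void main()
{
    assert(squareTable()[5] == 25);
    // A second call reuses the table built by the first one.
    assert(squareTable() is squareTable());
}
```

Because the module no longer declares a static constructor, it no longer forces the chain of ModuleInfo references described in steps 3 to 5; whether that shrinks a given binary still depends on what else the module imports.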
Re: Program size, linking matter, and static this()
On 12/20/11 9:00 AM, Denis Shelomovskij wrote:
> On 16.12.2011 at 21:29, Andrei Alexandrescu wrote:
>> [snip]
> Really sorry, but it sounds silly for me. It's a minor problem. Does anyone really cares about 600 KiB (3.5x) size change in an empty program? Yes, he does, but only if there is no other size increases in real programs.

In my experience, in a system programming language people do care about baseline size for one reason or another. I'd agree the reason is often overstated. But I did notice that people take a look at D and use "hello, world" size as a proxy for the language's overall overhead - runtime, handling of linking etc. You may or may not care about the conclusions of our investigation, but we and a category of people do care, for a variety of project sizes and approaches to building them.

> Now dmd have at least _two order of magnitude_ file size increase. I posted that problem four months ago at Building GtkD app on Win32 results in 111 MiB file mostly from zeroes.
> [snip]
> ---
> char arr[1024 * 1024 * 10];
> void main() { }
> ---
> [snip]
> If described issues aren't much more significant than static this(), show me where am I wrong, please.

Using BSS is a nice optimization, but not all compilers do it, and I know for a fact MSVC didn't have it for a long time. That's probably why I got used to thinking "poor style" when seeing a large statically-sized buffer with static duration. I'd say both issues deserve to be looked at, and saying one is more significant than the other would be difficult.

Andrei
Re: Program size, linking matter, and static this()
On 19.12.2011 at 20:43, Jacob Carlborg d...@me.com wrote:
> It could be useful for a package manager. Theoretically all installed packages could share the same dynamic library. But I would guess the packages would depend on different versions of the library and the package manager would end up installing a whole bunch of different versions of the Phobos and druntime.

No! Let's please try to get closer to something that works with package managers than to the situation on Windows. On Windows I see few applications that install libraries separately, unless they started on Linux or the libraries are established like DirectX. In the past DLLs from newly installed programs used to overwrite existing DLLs. IIRC the DLLs were then checked for their versions by installers, so they are never downgraded, but that still broke some applications with library updates that changed the API. Starting with Vista, there is the winsxs directory that - as I understand it - keeps a copy of every version of every DLL associated with the programs that installed/use them.

Package managers are close to my ideal world:
- different API versions (major revisions) can be installed in parallel
- applications link to the API version they were designed for
- bug fixes replace the old DLL for the whole system, all applications benefit
- RAM is shared between applications that use the same DLL

I'd think it would be bad to make cuts here. If you cannot even imagine an operating system with 1000 little apps like type/cat, cp/copy, sed etc. written in D, because they would all link statically against the runtime and cause major bloat, then that is turning off another few % of C users and purists. You don't drive an off-road car because you go off-road so often, but because you could imagine it. (Please buy small cars for city use.)

Linking against different library versions goes like this in practice: There is at least one version installed, maybe libphobos2.so.1.057. The 1 would be a major revision (one where hard deprecations occur). Then there is a link named libphobos2.so.1 to that file, which all applications using API version 1 link against. So the actual file can be updated to libphobos2.so.1.058 without recompiles or breakage.
Re: Program size, linking matter, and static this()
On 19.12.2011 at 19:08, Walter Bright newshou...@digitalmars.com wrote:
> On 12/16/2011 2:55 PM, Walter Bright wrote:
>> For example, in std.datetime there's final class Clock. It inherits nothing, and nothing can be derived from it. The comments for it say it is merely a namespace. It should be a struct.
> Or perhaps it should be in its own module.

When I first saw it I thought "That's how _Java_ goes about free functions: Make it a class." :)
Re: Program size, linking matter, and static this()
On Tuesday, 20 December 2011 at 20:51:38 UTC, Marco Leise wrote:
> On 19.12.2011 at 20:43, Jacob Carlborg d...@me.com wrote:
>
> On Windows I see few applications that install libraries separately, unless they started on Linux or the libraries are established like DirectX. In the past DLLs from newly installed programs used to overwrite existing DLLs. IIRC the DLLs were then checked for their versions by installers, so they are never downgraded, but that still broke some applications with library updates that changed the API. Starting with Vista, there is the winsxs directory that - as I understand it - keeps a copy of every version of every dll associated to the programs that installed/use them.

Minor nitpick: winsxs has been around since XP.
Re: Program size, linking matter, and static this()
On 20.12.2011 at 16:00, Denis Shelomovskij verylonglogin@gmail.com wrote:
> The second dmd issue (that was discovered because of 99.00% of zeros) is that _it doesn't use bss section_. Let's look at the C++ program built using Microsoft's cl:
> ---
> char arr[1024 * 1024 * 10];
> void main() { }
> ---
> It results in a ~10KiB executable, because `arr` is initialized with zero bytes and put in the bss section. If one of its elements is set to non-zero:
> ---
> char arr[1024 * 1024 * 10] = { 1 };
> void main() { }
> ---
> The array can't be in .bss any more and the resulting executable size will be increased by ~10MiB. The following D program results in a ~10MiB executable:
> ---
> ubyte[1024 * 1024 * 10] arr;
> void main() { }
> ---
> So, if there really is a reason not to use .bss, it should be clearly explained. If described issues aren't much more significant than static this(), show me where am I wrong, please.

+1. I didn't know about .bss, but static arrays of zeroes (global, struct, class) increasing the executable size looked like a problem wanting a solution. I hope it is easy to solve for dmd and was simply never implemented because it was considered unimportant.
Re: Program size, linking matter, and static this()
On 12/20/2011 6:23 AM, Andrei Alexandrescu wrote:
> On 12/20/11 9:00 AM, Denis Shelomovskij wrote:
>> Now dmd have at least _two order of magnitude_ file size increase. I posted that problem four months ago at Building GtkD app on Win32 results in 111 MiB file mostly from zeroes.
>> [snip]
>> ---
>> char arr[1024 * 1024 * 10];
>> void main() { }
>> ---
>> [snip]
>> If described issues aren't much more significant than static this(), show me where am I wrong, please.
> Using BSS is a nice optimization, but not all compilers do it and I know for a fact MSVC didn't have it for a long time. That's probably why I got used to thinking poor style when seeing a large statically-sized buffer with static duration. I'd say both issues deserve to be looked at, and saying one is more significant than the other would be difficult.

First off, dmd most definitely puts 0 initialized static data into the BSS segment. So what's going on here?

1. char data is not initialized to 0, it is initialized to 0xFF. Non-zero data cannot be put in BSS.
2. Static data goes, by default, into thread local storage. BSS data is not thread local. To put it in global data, it has to be declared with __gshared.

So,

__gshared byte arr[1024 * 1024 *10];

will go into BSS.

There is pretty much no reason to have such huge arrays in static data. Instead, dynamically allocate them.
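The closing advice, allocating large buffers at run time, can be sketched as follows. The 10 MiB size is taken from the examples in this thread; `bufferSize` and `makeBuffer` are hypothetical names chosen for the illustration.

```d
// Size used throughout the thread's examples: 10 MiB.
enum size_t bufferSize = 10 * 1024 * 1024;

// Allocates the buffer from the GC heap when it is first needed.
// Nothing of this size is stored in the executable; the elements are
// set to byte.init (zero) by the allocation itself.
byte[] makeBuffer()
{
    return new byte[](bufferSize);
}

void main()
{
    auto buf = makeBuffer();
    assert(buf.length == bufferSize);
    assert(buf[0] == 0 && buf[$ - 1] == 0);
}
```

Unlike a static array, the buffer's lifetime and size are then decided by the program at run time, which is usually what is wanted for something this large.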
Re: Program size, linking matter, and static this()
On 12/20/2011 1:07 PM, Marco Leise wrote: +1. I didn't know about .bss, but static arrays of zeroes (global, struct, class) increasing the executable size looked like a problem wanting a solution. I hope it is easy to solve for dmd and is just an unimportant issue, so was never implemented. I added a faq entry for this.
Re: Program size, linking matter, and static this()
On Tuesday, 20 December 2011 at 14:01:04 UTC, Denis Shelomovskij wrote:
> Detailed description: GtkD is built using single (gtk-one-obj.lib) or separate (one per source file) object files (gtk-sep-obj.lib). Then main.d that imports gtk.Main is built using those libraries. Then zeroCount utils is built and launched over resulting files:
> --
> Now let's calculate zero bytes counts:
> --
> Zero bytes |     % |  Non-zero | Total bytes | File
>    3628311 | 21.56 |  13202153 |    16830464 | gtk-one-obj.lib
>    1953124 | 15.98 |  10272924 |    12226048 | gtk-sep-obj.lib
>  127968798 | 99.00 |   1298430 |   129267228 | main-one-obj.exe
>     743821 | 37.51 |   1239183 |     1983004 | main-sep-obj.exe
> Done.
>
> So we have to use very slow per-file build to produce a good (not 100 MiB) executable. No matter what *.exe is launched, its process allocates ~20MiB of RAM (loaded Gtk dll-s).

I believe this is bug 2254: http://d.puremagic.com/issues/show_bug.cgi?id=2254

The cause is the way DMD builds libraries. The old way of building libraries (using a librarian) does not create libraries that exhibit this problem when linked with an executable.
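The zeroCount utility itself is not shown in the thread. A rough sketch of a tool producing the same columns might look like this; `ZeroStats`, `countZeros` and the output format are assumptions made for the illustration, not the original program.

```d
import std.file : read;
import std.stdio : writefln;

// Hypothetical result type: how many bytes of a file are zero.
struct ZeroStats
{
    size_t zeros;
    size_t total;

    // Percentage of zero bytes; 0 for an empty input.
    double percent() const
    {
        return total ? 100.0 * zeros / total : 0.0;
    }
}

// Counts the zero bytes in a block of data.
ZeroStats countZeros(const(ubyte)[] data)
{
    ZeroStats s;
    s.total = data.length;
    foreach (b; data)
        if (b == 0)
            ++s.zeros;
    return s;
}

void main(string[] args)
{
    // One row per file named on the command line, using the column order
    // of the table above: zero bytes, %, non-zero, total, file.
    foreach (name; args[1 .. $])
    {
        auto s = countZeros(cast(const(ubyte)[]) read(name));
        writefln("%10s|%6.2f|%10s|%10s|%s",
                 s.zeros, s.percent, s.total - s.zeros, s.total, name);
    }
}
```

Run against the files listed above, the `%` column is simply zero bytes divided by total bytes, e.g. 3628311 / 16830464 for gtk-one-obj.lib, which gives the 21.56 shown in the table.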
Re: Program size, linking matter, and static this()
On 20.12.2011 at 22:39, Walter Bright newshou...@digitalmars.com wrote:
> On 12/20/2011 1:07 PM, Marco Leise wrote:
>> +1. I didn't know about .bss, but static arrays of zeroes (global, struct, class) increasing the executable size looked like a problem wanting a solution. I hope it is easy to solve for dmd and is just an unimportant issue, so was never implemented.
> I added a faq entry for this.

Ok, I jumped on the bandwagon too early. Personally I only had this problem with classes and structs.

struct Test { byte arr[1024 * 1024 *10]; }

and

class Test { byte arr[1024 * 1024 *10]; }

both create a 10MB executable. While for the class, init may contain more data than just that one field, I don't see the struct adding anything or going into TLS. Can these initializers also go into .bss?
Re: Program size, linking matter, and static this()
On Tuesday, December 20, 2011 17:32:53 Andrei Alexandrescu wrote:
> On 12/20/11 2:58 PM, Marco Leise wrote:
>> On 19.12.2011 at 19:08, Walter Bright newshou...@digitalmars.com wrote:
>>> On 12/16/2011 2:55 PM, Walter Bright wrote:
>>>> For example, in std.datetime there's final class Clock. It inherits nothing, and nothing can be derived from it. The comments for it say it is merely a namespace. It should be a struct.
>>> Or perhaps it should be in its own module.
>> When I first saw it I thought "That's how _Java_ goes about free functions: Make it a class." :)
> Same here. If I had my way I'd rethink the name of those functions. Having a cutesy prefix Clock. is hardly justifiable.

It's not the only place in Phobos which uses a class as a namespace. I believe that both std.process and std.windows.registry are doing the same thing. In this case, it nicely groups all of the functions that are grabbing the time in one form or another. They're all effectively grabbing the time from the system clock, so they're grouped on Clock.

- Jonathan M Davis
Re: Program size, linking matter, and static this()
On Wednesday, 21 December 2011 at 02:10:30 UTC, Jonathan M Davis wrote:
> On Tuesday, December 20, 2011 17:32:53 Andrei Alexandrescu wrote:
>> On 12/20/11 2:58 PM, Marco Leise wrote:
>>> On 19.12.2011 at 19:08, Walter Bright newshou...@digitalmars.com wrote:
>>>> On 12/16/2011 2:55 PM, Walter Bright wrote:
>>>>> For example, in std.datetime there's final class Clock. It inherits nothing, and nothing can be derived from it. The comments for it say it is merely a namespace. It should be a struct.
>>>> Or perhaps it should be in its own module.
>>> When I first saw it I thought "That's how _Java_ goes about free functions: Make it a class." :)
>> Same here. If I had my way I'd rethink the name of those functions. Having a cutesy prefix Clock. is hardly justifiable.
> It's not the only place in Phobos which uses a class as a namespace. I believe that both std.process and std.windows.registry are doing the same thing. In this case, it nicely groups all of the functions that are grabbing the time in one form or another. They're all effectively grabbing the time from the system clock, so they're grouped on Clock.
> - Jonathan M Davis

Sounds like the perfect candidate for its own module.
Re: Program size, linking matter, and static this()
On Wednesday, December 21, 2011 06:18:59 Jakob Ovrum wrote:
> On Wednesday, 21 December 2011 at 02:10:30 UTC, Jonathan M Davis wrote:
>> It's not the only place in Phobos which uses a class as a namespace. I believe that both std.process and std.windows.registry are doing the same thing. In this case, it nicely groups all of the functions that are grabbing the time in one form or another. They're all effectively grabbing the time from the system clock, so they're grouped on Clock.
>> - Jonathan M Davis
> Sounds like the perfect candidate for its own module.

Not out of the question, I suppose, but it would make an awfully small module and would inevitably make it that much harder for people to figure out how to get the current time.

- Jonathan M Davis
Re: Program size, linking matter, and static this()
On Tuesday, December 20, 2011 21:34:30 Jonathan M Davis wrote: On Wednesday, December 21, 2011 06:18:59 Jakob Ovrum wrote: On Wednesday, 21 December 2011 at 02:10:30 UTC, Jonathan M Davis wrote: It's not the only place in Phobos which uses a class as a namespace. I believe that both std.process and std.windows.registry are doing the same thing. In this case, it nicely groups all of the functions that are grabbing the time in one form or another. They're all effectively grabbing the time from the system clock, so they're grouped on Clock. - Jonathan M Davis Sounds like the perfect candidate for its own module. Not out of the question, I suppose, but it would make an awfully small module and would inevitably make it that much harder for people to figure out how to get the current time. Not to mention, I quite like the effect that you get with it as a class, since it's explicit that it's coming from the clock, whereas if it were a module, that wouldn't be the case. You get the same effect with std.process' Environment. When you're calling functions on it, it's explicit that you're getting information from and affecting the environment. In a way, it's like a singleton, but there's nothing to instantiate. - Jonathan M Davis
Re: Program size, linking matter, and static this()
On 12/20/2011 5:52 PM, Marco Leise wrote: Ok, I jumped on the bandwagon too early. Personally I only had this problem with classes and structs. struct Test { byte arr[1024 * 1024 * 10]; } and class Test { byte arr[1024 * 1024 * 10]; } both create a 10MB executable. While for the class, init may contain more data than just that one field, I don't see the struct adding anything or going into TLS. Can these initializers also go into .bss? The struct one already does. Compile it, obj2asm it, and you'll see it there.
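A minimal sketch of the experiment discussed here, written with the current `byte[N] arr` array syntax rather than the C-style `byte arr[N]` used in the thread (recent compilers deprecate or reject the C-style form). The names `tenMiB`, `instance` and `allZero` are made up for this illustration. The code only checks the precondition for a .bss placement, namely that the initializer is all zeros; whether the compiler and linker actually keep it out of the file is what obj2asm (or a section dump of the binary) has to show.

```d
import std.stdio : writeln;

enum size_t tenMiB = 10 * 1024 * 1024;

struct Test
{
    byte[tenMiB] arr; // default-initialized to all zeros
}

// Module-level instance so the 10 MiB do not live on the stack.
__gshared Test instance;

// Returns true if every byte of the slice is zero.
bool allZero(const(byte)[] data)
{
    foreach (b; data)
        if (b != 0)
            return false;
    return true;
}

void main()
{
    static assert(Test.sizeof == tenMiB);
    writeln("Test.sizeof = ", Test.sizeof);
    writeln("all zero: ", allZero(instance.arr[]));
}
```

Comparing the size of the resulting executable with and without the struct, as Marco did, is the quickest way to see whether the zeros ended up in the file on a given compiler version.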
Re: Program size, linking matter, and static this()
On Fri, 16 Dec 2011 17:30:44 -0500, torhu no@spam.invalid wrote: On 16.12.2011 22:28, Steven Schveighoffer wrote: In short, dlls will solve the problem, let's work on that instead of shuffling around code. How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe. The DLL is loaded into memory once. With static linking, it's loaded every time you run an exe. -Steve
Re: Program size, linking matter, and static this()
On Sun, 18 Dec 2011 18:02:10 -0500, Marco Leise marco.le...@gmx.de wrote: On 16.12.2011 at 23:08, Steven Schveighoffer schvei...@yahoo.com wrote: Note that on Linux today, the executable is not truly static -- OS libs are dynamically linked. That should hold true for any OS. Otherwise, how would the program communicate with the kernel and drivers, i.e. render a button on the screen? Some dynamically linked in functions must provide the interface to that administrative singleton that manages system resources. Not necessarily. On Linux, system calls provide the interface between the code and the OS. A system call is essentially an OS interrupt, similar to a network protocol. You don't need dynamic linking to implement it. Remember, Linux didn't even support dynamic libraries before kernel 1.2 maybe? Hm... must check Wikipedia... But my point is, if the intention is that you have a myriad of D based libraries or executables on your system, then druntime and phobos enter the same realm as glibc. -Steve
Re: Program size, linking matter, and static this()
On 12/16/2011 2:55 PM, Walter Bright wrote: For example, in std.datetime there's final class Clock. It inherits nothing, and nothing can be derived from it. The comments for it say it is merely a namespace. It should be a struct. Or perhaps it should be in its own module.
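To make the suggestion concrete, here is a small sketch of a struct used purely as a namespace. `FakeClock`, `ticks` and `counter` are hypothetical names for illustration, not the real std.datetime.Clock API. Unlike a final class, a struct does not derive from Object, so there is no vtbl with pointers to Object's functions.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a "namespace" type.
struct FakeClock
{
    @disable this();            // nothing to instantiate

    // Illustrative only: returns an increasing counter, not real time.
    static long ticks()
    {
        return counter++;
    }

    private static long counter; // thread-local, like all D2 statics by default
}

void main()
{
    auto a = FakeClock.ticks();
    auto b = FakeClock.ticks();
    writeln(a, " ", b);

    // Declaring a variable of the type is rejected at compile time.
    static assert(!__traits(compiles, { FakeClock c; }));
}
```

Call sites still read `FakeClock.ticks()`, so the explicit prefix that the class version provides is kept.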
Re: Program size, linking matter, and static this()
On 19.12.2011 16:08, Steven Schveighoffer wrote: On Fri, 16 Dec 2011 17:30:44 -0500, torhu no@spam.invalid wrote: On 16.12.2011 22:28, Steven Schveighoffer wrote: In short, dlls will solve the problem, let's work on that instead of shuffling around code. How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe. The DLL is loaded into memory once. With static linking, it's loaded every time you run an exe. I thought we were talking about distribution sizes, not memory use. But anyway, DLLs won't do a lot as long as people don't have a whole bunch of D programs installed.
Re: Program size, linking matter, and static this()
On 12/19/2011 7:17 AM, Steven Schveighoffer wrote: On Fri, 16 Dec 2011 17:55:47 -0500, Walter Bright newshou...@digitalmars.com wrote: For example, in std.datetime there's final class Clock. It inherits nothing, and nothing can be derived from it. The comments for it say it is merely a namespace. It should be a struct. Although I don't disagree with you that it should be a struct and not a class, does it have anything in its vtbl anyways if it's final? Yes. The pointers to Object's functions, and a pointer to the TypeInfo for that class. I'm just trying to understand what gets pulled in when you import a module with static ctors... Write some trivial code snippets, compile them, and take a look at the object file with obj2asm.
Re: Program size, linking matter, and static this()
On Mon, 19 Dec 2011 13:09:18 -0500, torhu no@spam.invalid wrote: On 19.12.2011 16:08, Steven Schveighoffer wrote: On Fri, 16 Dec 2011 17:30:44 -0500, torhu no@spam.invalid wrote: On 16.12.2011 22:28, Steven Schveighoffer wrote: In short, dlls will solve the problem, let's work on that instead of shuffling around code. How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe. The DLL is loaded into memory once. With static linking, it's loaded every time you run an exe. I thought we were talking about distribution sizes, not memory use. But anyway, DLLs won't do a lot as long as people don't have a whole bunch of D programs installed. Right, in order for dlls to make a difference, you need to separate the library install from the exe install, as is done most of the time. If you are installing one D application on your box, what would be the issue with the size anyway? The complaint is generally that the size is much bigger than a hello world compiled for C/C++, which obviously doesn't take into account that the C/C++ standard libraries are DLLs. -Steve
Re: Program size, linking matter, and static this()
On Mon, 19 Dec 2011 13:09:42 -0500, Walter Bright newshou...@digitalmars.com wrote: On 12/19/2011 7:17 AM, Steven Schveighoffer wrote: On Fri, 16 Dec 2011 17:55:47 -0500, Walter Bright newshou...@digitalmars.com wrote: For example, in std.datetime there's final class Clock. It inherits nothing, and nothing can be derived from it. The comments for it say it is merely a namespace. It should be a struct. Although I don't disagree with you that it should be a struct and not a class, does it have anything in its vtbl anyways if it's final? Yes. The pointers to Object's functions, and a pointer to the TypeInfo for that class. Well pointers to Object's functions shouldn't add any bloat. The TypeInfo may, but that shouldn't pull in any real code from the module, right? I'm just trying to understand what gets pulled in when you import a module with static ctors... Write some trivial code snippets, compile them, and take a look at the object file with obj2asm. I'll rephrase -- I'm trying to understand what's *supposed* to happen :) Trusting that the compiler is doing it right isn't always correct. Though it probably is in this case. -Steve
Re: Program size, linking matter, and static this()
On 2011-12-19 19:09, torhu wrote: On 19.12.2011 16:08, Steven Schveighoffer wrote: On Fri, 16 Dec 2011 17:30:44 -0500, torhu no@spam.invalid wrote: On 16.12.2011 22:28, Steven Schveighoffer wrote: In short, dlls will solve the problem, let's work on that instead of shuffling around code. How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe. The DLL is loaded into memory once. With static linking, it's loaded every time you run an exe. I thought we were talking about distribution sizes, not memory use. But anyway, DLLs won't do a lot as long as people don't have a whole bunch of D programs installed. It could be useful for a package manager. Theoretically all installed packages could share the same dynamic library. But I would guess that the packages would depend on different versions of the library and the package manager would end up installing a whole bunch of different versions of Phobos and druntime. -- /Jacob Carlborg
Re: Program size, linking matter, and static this()
On Sat, 17 Dec 2011 23:12:16 +0200, Jakob Ovrum jakobov...@gmail.com wrote: I suspect that the reason a static member function is prevalent is because it's easy to just make the constructor private (and not have to mess with things like C++'s `friend`). In D, there's no real difference because you can still use private members as long as you're in the same module. Exactly. There is no difference between static A.make and makeA in D. The only difference between them I can see is that the module-level function doesn't expose the class name directly when using the function, which is but a minor improvement. You have to expose it either way, no? A.make instead of makeA
Re: Program size, linking matter, and static this()
On Sunday, 18 December 2011 at 08:56:56 UTC, so wrote: You have to expose either way no? A.make instead of makeA Yeah, in most sane code, I would imagine so. But still, the original example was just `make` versus `A.make`. They could both obscure their return type through various means (like auto), but imo it makes less sense to do so for the static member function - I would be surprised to call `A.make` and not get a value of type `A`. But it would only be a tiny improvement and I don't think it's really relevant to the singleton pattern.
Re: Program size, linking matter, and static this()
On Sunday, 18 December 2011 at 09:26:58 UTC, Jakob Ovrum wrote: On Sunday, 18 December 2011 at 08:56:56 UTC, so wrote: You have to expose either way no? A.make instead of makeA Yeah, in most sane code, I would imagine so. But still, the original example was just `make` versus `A.make`. They could both obscure their return type through various means (like auto), but imo it makes less sense to do so for the static member function - I would be surprised to call `A.make` and not get a value of type `A`. But it would only be a tiny improvement and I don't think it's really relevant to the singleton pattern. Sorry, I'm wrong, that wasn't the case at all. The original example was indeed `A.make` versus `makeB`.
Re: Program size, linking matter, and static this()
On 18/12/2011 03:01, Jonathan M Davis wrote: On Saturday, December 17, 2011 19:44:28 deadalnix wrote: Very good point. CTFE is improving with each version of dmd, and is a real alternative to static this(); It should be considered when appropriate, it has many benefits. I think that in general, the uses for static this fall into one of two categories: 1. Initializing stuff that can't be initialized at compile time. This includes stuff like classes or AAs as well as stuff which needs to be initialized with a value which isn't known until runtime (e.g. when the program started running). 2. Calling functions which need to be called at the beginning of the program (e.g. a function which does something to the environment that the program is running in). As CTFE improves, #1 should become smaller and smaller, and static this should be needed less and less, but #2 will always remain. It _is_ however the far rarer of the two use cases. So, ultimately static this may become very rare. - Jonathan M Davis In the Java/C# world, they use dependency injection frameworks like Google Guice or picocontainer to deal with this issue. In the case of datetime, though, I suspect it would be using a hammer to crush a fly.
Re: Program size, linking matter, and static this()
On 16.12.2011 at 23:08, Steven Schveighoffer schvei...@yahoo.com wrote: Note that on Linux today, the executable is not truly static -- OS libs are dynamically linked. That should hold true for any OS. Otherwise, how would the program communicate with the kernel and drivers, i.e. render a button on the screen? Some dynamically linked in functions must provide the interface to that administrative singleton that manages system resources.
Re: Program size, linking matter, and static this()
On Sat, 17 Dec 2011 01:50:51 +0200, Jonathan M Davis jmdavisp...@gmx.com wrote: On Friday, December 16, 2011 17:13:49 Andrei Alexandrescu wrote: Maybe there's an issue with the design. Maybe Singleton (the most damned of all patterns) is not the best choice here. Or maybe the use of an inheritance hierarchy with a grand total of 4 classes. Or maybe the encapsulation could be rethought. The general point is, a design lives within a language. Any language is going to disallow a few designs or make them unsuitable for particular situation. This is, again, multiplied by the context: it's the standard library. I don't know what's wrong with singletons. It's a great pattern in certain circumstances. I don't like patterns much but when it comes to singleton i absolutely hate it. Just ask yourself what does it do to earn that fancy name. NOTHING. It is nothing but a hype of those who want to rule everything with one paradigm. Generic solutions/rules/paradigms are our final target WHEN they are elegant. If you are using singleton in your C++/D (or any other M-P language) code, do yourself a favor and trash that book you learned it from. --- class A { static A make(); } class B; B makeB(); --- What A.make can do makeB can not? (Other than creating objects of two different types :P )
Re: Program size, linking matter, and static this()
On 17/12/2011 00:18, maarten van damme wrote: how did other languages solve this issue? I can't imagine D being the only language with static constructors, do they have that problem too? AFAIK, as in D, it's best practice to avoid static constructors as much as possible in Java, Python and I imagine C# as well, even though the running order is well-defined. The dependency injection design pattern seems to help here.
Re: Program size, linking matter, and static this()
On 16/12/2011 22:45, Andrei Alexandrescu wrote: On 12/16/11 3:38 PM, Trass3r wrote: A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! And std.stdio is supposed to be just a simple wrapper around C FILE. In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime. Once we solve the static constructor issue, function-level linking should take care of pulling only the minimum needed. One interesting fact is that a lot of issues that I tended to take non-critically (templates cause bloat, intermodule dependencies cause bloat, static linking creates large programs) looked a whole lot differently when I looked closer at causes and effects. Andrei Fantastic! :)
Re: Program size, linking matter, and static this()
On 17/12/2011 02:39, Andrei Alexandrescu wrote: On 12/16/11 6:54 PM, Jonathan M Davis wrote: By contrast, we could have a simple feature that was explained in the documentation along with static constructors which made it easy to tell the compiler that the order doesn't matter - either by saying that it doesn't matter at all or that it doesn't matter in regards to a specific module. e.g. @nodepends(std.file) static this() { } Now the code doesn't have to be redesigned to get around the fact that the compiler just isn't smart enough to figure it out on its own. Sure, the feature is potentially unsafe, but so are plenty of other features in D. That is hardly a good argument in favor of the feature :o). One issue that you might have not considered is that this is more brittle than it might seem. Even though the dependency pattern is painfully obvious to the human at a point in time, maintenance work can easily change that, and in very non-obvious ways (e.g. dependency cycles spanning multiple modules). I've seen it happening in C++, and when you realize it it's quite mind-boggling. The best situation would be if the compiler was smart enough to figure it out for itself, but barring that this definitely seems like a far cleaner solution than having to try and figure out how to break up some of the initialization code for a module into a separate module, especially when features such as immutable and pure tend to make such separation impossible without some nasty casts. It would just be way simpler to have a feature which allowed you to tell the compiler that there was no dependency. I think the only right approach to this must be principled - either by CTFEing the constructor or by guaranteeing it calls no functions that may close a dependency cycle. Even without that, I'd say we're in very good shape. Andrei Very good point. CTFE is improving with each version of dmd, and is a real alternative to static this(); It should be considered when appropriate, it has many benefits.
Re: Program size, linking matter, and static this()
On 12/17/11 6:34 AM, so wrote: If you are using singleton in your C++/D (or any other M-P language) code, do yourself a favor and trash that book you learned it from. --- class A { static A make(); } class B; B makeB(); --- What A.make can do makeB can not? (Other than creating objects of two different types :P ) Singleton has two benefits. One, you can't accidentally create more than one instance. The second, which is often overlooked, is that you still benefit of polymorphism (as opposed to making its state global). Andrei
Re: Program size, linking matter, and static this()
On Sat, 17 Dec 2011 21:20:33 +0200, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 12/17/11 6:34 AM, so wrote: If you are using singleton in your C++/D (or any other M-P language) code, do yourself a favor and trash that book you learned it from. --- class A { static A make(); } class B; B makeB(); --- What A.make can do makeB can not? (Other than creating objects of two different types :P ) Singleton has two benefits. One, you can't accidentally create more than one instance. The second, which is often overlooked, is that you still benefit of polymorphism (as opposed to making its state global). Andrei Now i am puzzled, makeB does both and does better. (better as it doesn't expose any detail to user)
Re: Program size, linking matter, and static this()
On Saturday, 17 December 2011 at 21:02:58 UTC, so wrote: On Sat, 17 Dec 2011 21:20:33 +0200, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 12/17/11 6:34 AM, so wrote: If you are using singleton in your C++/D (or any other M-P language) code, do yourself a favor and trash that book you learned it from. --- class A { static A make(); } class B; B makeB(); --- What A.make can do makeB can not? (Other than creating objects of two different types :P ) Singleton has two benefits. One, you can't accidentally create more than one instance. The second, which is often overlooked, is that you still benefit of polymorphism (as opposed to making its state global). Andrei Now i am puzzled, makeB does both and does better. (better as it doesn't expose any detail to user) Both of your examples are the singleton pattern if `make` returns the same instance every time, and arguably (optionally?) A or B shouldn't be instantiable in any other way. I suspect that the reason a static member function is prevalent is because it's easy to just make the constructor private (and not have to mess with things like C++'s `friend`). In D, there's no real difference because you can still use private members as long as you're in the same module. The only difference between them I can see is that the module-level function doesn't expose the class name directly when using the function, which is but a minor improvement.
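The two forms being compared can be sketched as follows, assuming both live in one module. The bodies of `make` and `makeB` and the variables `instance` and `theB` are filled in here for illustration only; the original snippet gave just the declarations. Both constructors are private, and `makeB` can still call `new B` because `private` in D is scoped to the module, not to the class.

```d
class A
{
    private this() {}            // only code in this module can construct A

    static A make()
    {
        if (instance is null)
            instance = new A;
        return instance;
    }

    private static A instance;   // thread-local by default in D2
}

class B
{
    private this() {}
}

private B theB;                  // thread-local module variable

B makeB()
{
    if (theB is null)
        theB = new B;            // allowed: same module as B
    return theB;
}

void main()
{
    // Both forms hand back the same object on repeated calls.
    assert(A.make() is A.make());
    assert(makeB() is makeB());
}
```

Note that with plain `static` and module variables this gives one instance per thread; a process-wide singleton would need `__gshared` or `shared` plus synchronization.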
Re: Program size, linking matter, and static this()
On Saturday, December 17, 2011 19:44:28 deadalnix wrote: Very good point. CTFE is improving with each version of dmd, and is a real alternative to static this(); It should be considered when appropriate, it has many benefits. I think that in general, the uses for static this fall into one of two categories: 1. Initializing stuff that can't be initialized at compile time. This includes stuff like classes or AAs as well as stuff which needs to be initialized with a value which isn't known until runtime (e.g. when the program started running). 2. Calling functions which need to be called at the beginning of the program (e.g. a function which does something to the environment that the program is running in). As CTFE improves, #1 should become smaller and smaller, and static this should be needed less and less, but #2 will always remain. It _is_ however the far rarer of the two use cases. So, ultimately static this may become very rare. - Jonathan M Davis
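A small sketch of the distinction drawn above, with made-up names (`makeSquares`, `squares`, `codes`): a table that a plain function can compute is initialized through CTFE and needs no static constructor, while an associative array, which belongs to category 1 in this message, is still filled in by `static this()`.

```d
import std.stdio : writeln;

// An ordinary function; simple enough to be evaluated at compile time.
int[10] makeSquares()
{
    int[10] result;
    foreach (i; 0 .. 10)
        result[i] = i * i;
    return result;
}

// A module-level initializer forces CTFE: no static this() involved.
immutable int[10] squares = makeSquares();
static assert(squares[3] == 9);  // the value is known to the compiler

// Category 1: built at program (thread) start by a static constructor.
int[string] codes;

static this()
{
    codes = ["ok": 0, "error": 1];
}

void main()
{
    writeln(squares[9]);      // prints 81
    writeln(codes["error"]);  // prints 1
}
```

Only the second half makes the module carry a static constructor, which is what ties it into the initialization-order and ModuleInfo machinery discussed in this thread.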
Re: Program size, linking matter, and static this()
On Saturday, December 17, 2011 13:20:33 Andrei Alexandrescu wrote: On 12/17/11 6:34 AM, so wrote: If you are using singleton in your C++/D (or any other M-P language) code, do yourself a favor and trash that book you learned it from. --- class A { static A make(); } class B; B makeB(); --- What A.make can do makeB can not? (Other than creating objects of two different types :P ) Singleton has two benefits. One, you can't accidentally create more than one instance. The second, which is often overlooked, is that you still benefit of polymorphism (as opposed to making its state global). Yes. There are occasions when singleton is very useful and makes perfect sense. There's every possibility that it's a design pattern which is overused, and if you don't need it, you probably shouldn't use it, but there _are_ cases where it's useful. In the case of std.datetime, the UTC and LocalTime classes are singletons because there's absolutely no point in ever allocating more than one of each. It would be a waste of memory. Imagine if auto time = Clock.currTime(); had to allocate a LocalTime object every time. That's a lot of useless heap allocation. By making it a singleton, it's far more efficient. Currently, it does _no_ heap allocation, and once the singleton becomes lazy, it'll only allocate on the first call. I don't see a valid reason _not_ to use a singleton in this case - certainly not as long as time zones are classes, and I think that they make the most sense as classes considering what they have to do and how they have to behave. - Jonathan M Davis
Program size, linking matter, and static this()
Hello, Late last night Walter and I figured a few interesting tidbits of information. Allow me to give some context, discuss them, and sketch a few approaches for improving things. A while ago Walter wanted to enable function-level linking, i.e. only get the needed functions from a given (and presumably large) module. So he arranged things that a library contains many small object files (that actually are generated from a single .d file and never exist on disk, only inside the library file, which can be considered an archive like tar). Then the linker would only pick the used object files from the library and link those in. Unfortunately that didn't have nearly the expected impact - essentially the size of most binaries stayed the same. The mystery was unsolved, and Walter needed to move on to other things. One particularly annoying issue is that even programs that don't ostensibly use anything from an imported module may balloon inexplicably in size. Consider: import std.path; void main(){} This program, after stripping and all, has some 750KB in size. Removing the import line reduces the size to 218KB. That includes the runtime support, garbage collector, and such, and I'll consider it a baseline. (A similar but separate discussion could be focused on reducing the baseline size, but herein I'll consider it constant.) What we'd simply want is to be able to import stuff without blatantly paying for what we don't use. If a program imports std.path and uses no function from it, it should be as large as a program without the import. Furthermore, the increase should be incremental - using 2-3 functions from std.path should only increase the executable size by a little, not suddenly link in all code in that module. But in experiments it seemed like program size would increase in sudden amounts when certain modules were included. After much investigation we figured that the following fateful causal sequence happened: 1. 
Some modules define static constructors with static this() or static shared this(), and/or static destructors. 2. These constructors/destructors are linked in automatically whenever a module is included. 3. Importing a module with a static constructor (or destructor) will generate its ModuleInfo structure, which contains static information about all module members. In particular, it keeps virtual table pointers for all classes defined inside the module. 4. That means generating ModuleInfo refers to all virtual functions defined in that module, whether they're used or not. 5. The phenomenon is transitive, e.g. even if std.path has no static constructors but imports std.datetime which does, a ModuleInfo is generated for std.path too, in addition to the one for std.datetime. So now classes inside std.path (if any) will be all linked in. 6. It follows that a module that defines classes which in turn use other functions in other modules, and has static constructors (or includes other modules that do) will balloon the size of the executable suddenly. There are a few approaches that we can use to improve the state of affairs. A. On the library side, use static constructors and destructors sparingly inside druntime and std. We can use lazy initialization instead of compulsively initializing library internals. I think this is often a worthy thing to do in any case (dynamic libraries etc) because it only does work if and when work needs to be done at the small cost of a check upon each use. B. On the compiler side, we could use a similar lazy initialization trick to only refer to class methods in the module if they're actually needed. I'm being vague here because I'm not sure what and how that can be done. 
Here's a list of all files in std using static cdtors: std/__fileinit.d std/concurrency.d std/cpuid.d std/cstream.d std/datebase.d std/datetime.d std/encoding.d std/internal/math/biguintcore.d std/internal/math/biguintx86.d std/internal/processinit.d std/internal/windows/advapi32.d std/mmfile.d std/parallelism.d std/perf.d std/socket.d std/stdiobase.d std/uri.d The majority of them don't do a lot of work and are not much used inside phobos, so they don't blow up the executable. The main one that could receive some attention is std.datetime. It has a few static ctors and a lot of classes. Essentially just importing std.datetime or any std module that transitively imports std.datetime (and there are many of them) ends up linking in most of Phobos and blows the size up from the 218KB baseline to 700KB. Jonathan, could I impose on you to replace all static cdtors in std.datetime with lazy initialization? I looked through it and it strikes me as a reasonably simple job, but I think you'd know better what to do than me. A similar effort could be conducted to reduce or eliminate static cdtors from druntime. I made the experiment of commenting them all, and that reduced the size of the baseline from 218KB to 200KB. This is a
Re: Program size, linking matter, and static this()
Interesting stuff. Andrei Alexandrescu seewebsiteforem...@erdani.org wrote in message news:jcg2lu$17p2$1...@digitalmars.com... We can use lazy initialization instead of compulsively initializing library internals. I think this is often a worthy thing to do in any case (dynamic libraries etc) because it only does work if and when work needs to be done at the small cost of a check upon each use. That also has the benefit of reducing the risk of dreaded circular ctor dependency problems.
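The lazy-initialization idea quoted above can be sketched like this for thread-local module state. The names `table`, `initCount` and `lookup` are hypothetical. Instead of filling the table in a `static this()`, the accessor builds it on first use, at the cost of one null check per call.

```d
import std.stdio : writeln;

private int[string] table;   // thread-local; null until first use
private int initCount;       // counts how often the table gets built

int lookup(string key)
{
    if (table is null)       // the small check paid on each use
    {
        table = ["one": 1, "two": 2];
        ++initCount;
    }
    return table[key];
}

void main()
{
    writeln(lookup("one"));
    writeln(lookup("two"));
    writeln("initialized ", initCount, " time(s)");
}
```

Because the module no longer has a static constructor, it does not take part in constructor ordering at all, which is the reduced risk of circular dependencies mentioned in this reply. State shared across threads needs more care than this sketch shows.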
Re: Program size, linking matter, and static this()
On Fri, 16 Dec 2011 13:29:18 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: Hello, Late last night Walter and I figured a few interesting tidbits of information. Allow me to give some context, discuss them, and sketch a few approaches for improving things. A while ago Walter wanted to enable function-level linking, i.e. only get the needed functions from a given (and presumably large) module. So he arranged things that a library contains many small object files (that actually are generated from a single .d file and never exist on disk, only inside the library file, which can be considered an archive like tar). Then the linker would only pick the used object files from the library and link those in. Unfortunately that didn't have nearly the expected impact - essentially the size of most binaries stayed the same. The mystery was unsolved, and Walter needed to move on to other things. One particularly annoying issue is that even programs that don't ostensibly use anything from an imported module may balloon inexplicably in size. Consider: import std.path; void main(){} This program, after stripping and all, has some 750KB in size. Removing the import line reduces the size to 218KB. That includes the runtime support, garbage collector, and such, and I'll consider it a baseline. (A similar but separate discussion could be focused on reducing the baseline size, but herein I'll consider it constant.) What we'd simply want is to be able to import stuff without blatantly paying for what we don't use. If a program imports std.path and uses no function from it, it should be as large as a program without the import. Furthermore, the increase should be incremental - using 2-3 functions from std.path should only increase the executable size by a little, not suddenly link in all code in that module. But in experiments it seemed like program size would increase in sudden amounts when certain modules were included. 
After much investigation we figured that the following fateful causal sequence happened: 1. Some modules define static constructors with static this() or static shared this(), and/or static destructors. 2. These constructors/destructors are linked in automatically whenever a module is included. 3. Importing a module with a static constructor (or destructor) will generate its ModuleInfo structure, which contains static information about all module members. In particular, it keeps virtual table pointers for all classes defined inside the module. 4. That means generating ModuleInfo refers to all virtual functions defined in that module, whether they're used or not. 5. The phenomenon is transitive, e.g. even if std.path has no static constructors but imports std.datetime which does, a ModuleInfo is generated for std.path too, in addition to the one for std.datetime. So now classes inside std.path (if any) will be all linked in. 6. It follows that a module that defines classes which in turn use other functions in other modules, and has static constructors (or includes other modules that do) will balloon the size of the executable suddenly. There are a few approaches that we can use to improve the state of affairs. A. On the library side, use static constructors and destructors sparingly inside druntime and std. We can use lazy initialization instead of compulsively initializing library internals. I think this is often a worthy thing to do in any case (dynamic libraries etc) because it only does work if and when work needs to be done at the small cost of a check upon each use. B. On the compiler side, we could use a similar lazy initialization trick to only refer to class methods in the module if they're actually needed. I'm being vague here because I'm not sure what and how that can be done. I disagree with this assessment. It's good to know the cause of the problem, but let's look at the root issue -- reflection. 
The only reason to include class information for classes not being referenced is to be able to construct/use classes at runtime instead of at compile time. But if you look at D's runtime reflection capabilities, they are quite poor. You can only construct a class at runtime if it has a zero-arg constructor. So essentially, we are paying the penalty of having runtime reflection in terms of bloat, but get very very little benefit. I think there are two things that need to be considered: 1. We eventually should have some reasonably complete runtime reflection capability 2. Runtime reflection and shared libraries go hand-in-hand. With shared library support, the bloat penalty isn't nearly as significant. I don't think the right answer is to avoid using features of the language because the compiler/runtime has some design deficiencies. At some point these deficiencies will be fixed, and then we are left with a library that has seemingly odd design choices that we can't change.
Re: Program size, linking matter, and static this()
On Friday, December 16, 2011 12:29:18 Andrei Alexandrescu wrote: Jonathan, could I impose on you to replace all static cdtors in std.datetime with lazy initialization? I looked through it and it strikes me as a reasonably simple job, but I think you'd know better what to do than me. A similar effort could be conducted to reduce or eliminate static cdtors from druntime. I made the experiment of commenting them all out, and that reduced the size of the baseline from 218KB to 200KB. This is a good amount, but not as dramatic as what we can get by working on std.datetime. Hmm. I had a reply for this already, but it seems to have disappeared, so I'll try again. You could make core.time use property functions instead of the static immutable variables that it's using now for ticksPerSec and appOrigin, but doing that right would require introducing a mutex or synchronized block (which is really just a mutex under the hood anyway), and I'm loath to do that in time-related code. ticksPerSec gets used all over the place in TickDuration, and that could have a negative impact on performance for something that needs to be really fast (since it's used in stuff like StopWatch and benchmarking). On top of that, in order to maintain the current semantics, the property functions would have to be pure, which they can't be without doing some nasty casting to convince the compiler that stuff which isn't pure is actually pure. For std.datetime, the problem would be reduced if a class could be created in CTFE and still be around at runtime, but we can't do that yet, and it wouldn't completely solve the problem, since the shared static constructor related to LocalTime has to call tzset. So, some sort of runtime initialization must be done. And the instances for the singleton are not only immutable, but the functions for getting them are pure. So, once again, some nasty casting would be required to get it to work without breaking purity. And once again, we'd have to introduce a mutex.
And for both core.time and std.datetime we're talking about a mutex that would be needed only briefly, to ensure that we don't end up with two threads trying to initialize the variable at the same time. After that, it would just be impeding performance for no value. They're classic situations for static constructors - initializing static immutable variables - and really, they _should_ be using static constructors. If we have to get rid of them, it's to get around other problems in the language or compiler instead of fixing those problems. So, on some level, that seems like a failure on the part of the language and the compiler. If we _have_ to find a workaround, then we have to find a workaround, but I find the need to be distasteful to say the least. I previously tried to get rid of the static constructors in std.datetime and couldn't, precisely because they're needed unless you play major casting games to get around immutable and pure. If we play nice, it's impossible to get rid of the static constructors in std.datetime. It probably is possible if we do nasty casting, but (much as I hate to use the word) it seems like this is a hack to get around the fact that the compiler isn't dealing with static constructors as well as we'd like. I'd _really_ like to see this fixed at the compiler level. And honestly, I think that a far worse problem with static constructors is circular dependencies. _That_ is something that needs to be addressed with regards to static constructors. In general at this point, it's looking like static constructors are turning out to be a bit of a failure on some level, given the issues that we're having because of them, and I think that we should fix the language and/or compiler so that they _aren't_ a failure. - Jonathan M Davis
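For illustration, here is a minimal sketch of the lazy-initialization alternative being debated. It is not code from core.time: `queryTicksPerSec`, `ticksReady`, `ticksValue` and `initCalls` are hypothetical names, and the constant stands in for a value that would really be queried from the OS. The sketch uses double-checked locking with an anonymous `synchronized` block, so the lock is only contended until the value has been published.

```d
import core.atomic : atomicLoad, atomicStore;
import std.stdio : writeln;

// Hypothetical stand-in for a value that can only be computed at run time.
// The counter exists only so the example can show the query runs once.
__gshared int initCalls;

long queryTicksPerSec()
{
    ++initCalls;
    return 1_000_000;
}

private shared bool ticksReady;     // true once ticksValue has been published
private __gshared long ticksValue;  // written exactly once, under the lock

// Lazy replacement for a variable that was set in a static constructor.
// Not pure: marking it pure would need the kind of cast objected to above.
@property long ticksPerSec()
{
    if (!atomicLoad(ticksReady))            // fast path after initialization
    {
        synchronized                        // anonymous global mutex
        {
            if (!atomicLoad(ticksReady))    // re-check while holding the lock
            {
                ticksValue = queryTicksPerSec();
                atomicStore(ticksReady, true);
            }
        }
    }
    return ticksValue;
}

void main()
{
    writeln(ticksPerSec);   // first call performs the query
    writeln(ticksPerSec);   // later calls only do the atomic load
    assert(initCalls == 1);
}
```

The accessor shows both costs named in the message: every call pays for at least an atomic load, and the function is neither pure nor a source of an immutable value unless casts are added.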
Re: Program size, linking matter, and static this()
On 12/16/11 1:23 PM, Steven Schveighoffer wrote: I disagree with this assessment. It's good to know the cause of the problem, but let's look at the root issue -- reflection. The only reason to include class information for classes not being referenced is to be able to construct/use classes at runtime instead of at compile time. But if you look at D's runtime reflection capabilities, they are quite poor. You can only construct a class at runtime if it has a zero-arg constructor. So essentially, we are paying the penalty of having runtime reflection in terms of bloat, but get very very little benefit. I'd almost agree, but the code shown doesn't use Object.factory(). So that shouldn't be linked in, and shouldn't pull vtables. I think there are two things that need to be considered: 1. We eventually should have some reasonably complete runtime reflection capability 2. Runtime reflection and shared libraries go hand-in-hand. With shared library support, the bloat penalty isn't nearly as significant. I don't think the right answer is to avoid using features of the language because the compiler/runtime has some design deficiencies. At some point these deficiencies will be fixed, and then we are left with a library that has seemingly odd design choices that we can't change. Runtime reflection is great, but I think it's a separate issue from what's discussed here. Andrei
Re: Program size, linking matter, and static this()
On 12/16/2011 08:41 PM, Jonathan M Davis wrote: On Friday, December 16, 2011 12:29:18 Andrei Alexandrescu wrote: Jonathan, could I impose on you to replace all static cdtors in std.datetime with lazy initialization? I looked through it and it strikes me as a reasonably simple job, but I think you'd know better what to do than me. A similar effort could be conducted to reduce or eliminate static cdtors from druntime. I made the experiment of commenting them all out, and that reduced the size of the baseline from 218KB to 200KB. This is a good amount, but not as dramatic as what we can get by working on std.datetime. Hmm. I had a reply for this already, but it seems to have disappeared, so I'll try again. You could make core.time use property functions instead of the static immutable variables that it's using now for ticksPerSec and appOrigin, but doing that right would require introducing a mutex or synchronized block (which is really just a mutex under the hood anyway), and I'm loath to do that in time-related code. ticksPerSec gets used all over the place in TickDuration, and that could have a negative impact on performance for something that needs to be really fast (since it's used in stuff like StopWatch and benchmarking). On top of that, in order to maintain the current semantics, the property functions would have to be pure, which they can't be without doing some nasty casting to convince the compiler that stuff which isn't pure is actually pure. lazy variables would resolve this. For std.datetime, the problem would be reduced if a class could be created in CTFE and still be around at runtime, but we can't do that yet, and it wouldn't completely solve the problem, since the shared static constructor related to LocalTime has to call tzset. So, some sort of runtime initialization must be done. And the instances for the singleton are not only immutable, but the functions for getting them are pure.
So, once again, some nasty casting would be required to get it to work without breaking purity. And once again, we'd have to introduce a mutex. And for both core.time and std.datetime we're talking about a mutex that would be needed only briefly, to ensure that we don't end up with two threads trying to initialize the variable at the same time. After that, it would just be impeding performance for no value. They're classic situations for static constructors - initializing static immutable variables - and really, they _should_ be using static constructors. If we have to get rid of them, it's to get around other problems in the language or compiler instead of fixing those problems. So, on some level, that seems like a failure on the part of the language

no.

and the compiler.

yes. Although I am not severely affected by 500kb of bloat.

If we _have_ to find a workaround, then we have to find a workaround, but I find the need to be distasteful to say the least. I previously tried to get rid of the static constructors in std.datetime and couldn't, precisely because they're needed unless you play major casting games to get around immutable and pure. If we play nice, it's impossible to get rid of the static constructors in std.datetime. It probably is possible if we do nasty casting, but (much as I hate to use the word) it seems like this is a hack to get around the fact that the compiler isn't dealing with static constructors as well as we'd like. I'd _really_ like to see this fixed at the compiler level. And honestly, I think that a far worse problem with static constructors is circular dependencies. _That_ is something that needs to be addressed with regards to static constructors.

Circular dependencies are not to be blamed on the design of static constructors.
In general at this point, it's looking like static constructors are turning out to be a bit of a failure on some level, given the issues that we're having because of them, and I think that we should fix the language and/or compiler so that they _aren't_ a failure. - Jonathan M Davis We are having (minor!!) problems because the task of initializing global data in a modular way is inherently hard. Just have a look how other languages handle initialization of global data and you'll notice that the D solution is actually very sensible.
Re: Program size, linking matter, and static this()
On Friday, December 16, 2011 21:06:49 Timon Gehr wrote: On 12/16/2011 08:41 PM, Jonathan M Davis wrote: On Friday, December 16, 2011 12:29:18 Andrei Alexandrescu wrote: Jonathan, could I impose on you to replace all static cdtors in std.datetime with lazy initialization? I looked through it and it strikes me as a reasonably simple job, but I think you'd know better what to do than me. A similar effort could be conducted to reduce or eliminate static cdtors from druntime. I made the experiment of commenting them all out, and that reduced the size of the baseline from 218KB to 200KB. This is a good amount, but not as dramatic as what we can get by working on std.datetime. Hmm. I had a reply for this already, but it seems to have disappeared, so I'll try again. You could make core.time use property functions instead of the static immutable variables that it's using now for ticksPerSec and appOrigin, but doing that right would require introducing a mutex or synchronized block (which is really just a mutex under the hood anyway), and I'm loath to do that in time-related code. ticksPerSec gets used all over the place in TickDuration, and that could have a negative impact on performance for something that needs to be really fast (since it's used in stuff like StopWatch and benchmarking). On top of that, in order to maintain the current semantics, the property functions would have to be pure, which they can't be without doing some nasty casting to convince the compiler that stuff which isn't pure is actually pure. lazy variables would resolve this. True, but we don't have them. Circular dependencies are not to be blamed on the design of static constructors. Yes they are. static constructors completely chicken out on them. Not only is there no real attempt to determine whether the static constructors are actually dependent (which granted, isn't an easy problem), but there is _zero_ support in the language for resolving such circular dependencies.
There's no way to say that they _aren't_ dependent even if you can clearly see that they aren't. The solution used in Phobos (which won't work in std.datetime due to the use of immutable and pure) is to create a C module which has the code from the static constructor and then have a separate module which calls it in its static constructor. It works, but it's not pretty (and it doesn't always work - e.g. std.datetime), and it would be _far_ better if you could just mark a static constructor as not depending on anything or mark it as not depending on a specific module or something similar. And given how disgusting it generally is to even figure out what's causing a circular dependency when the runtime won't start your program because of it, I really think that this is a problem which should be resolved. static constructors need to be improved. In general at this point, it's looking like static constructors are turning out to be a bit of a failure on some level, given the issues that we're having because of them, and I think that we should fix the language and/or compiler so that they _aren't_ a failure. - Jonathan M Davis We are having (minor!!) problems because the task of initializing global data in a modular way is inherently hard. Just have a look how other languages handle initialization of global data and you'll notice that the D solution is actually very sensible. Yes. The situation with D is better than that of many other languages, but what problems we do have can be _really_ annoying to deal with. Having to deal with circular dependencies due to static module constructors which aren't actually interdependent is one of the most annoying issues in D IMHO. - Jonathan M Davis
Re: Program size, linking matter, and static this()
On 12/16/11 1:41 PM, Jonathan M Davis wrote: You could make core.time use property functions instead of the static immutable variables that it's using now for ticksPerSec and appOrigin, but doing that right would require introducing a mutex or synchronized block (which is really just a mutex under the hood anyway), and I'm loath to do that in time-related code. ticksPerSec gets used all over the place in TickDuration, and that could have a negative impact on performance for something that needs to be really fast (since it's used in stuff like StopWatch and benchmarking). On top of that, in order to maintain the current semantics, the property functions would have to be pure, which they can't be without doing some nasty casting to convince the compiler that stuff which isn't pure is actually pure. For std.datetime, the problem would be reduced if a class could be created in CTFE and still be around at runtime, but we can't do that yet, and it wouldn't completely solve the problem, since the shared static constructor related to LocalTime has to call tzset. So, some sort of runtime initialization must be done. And the instances for the singleton are not only immutable, but the functions for getting them are pure. So, once again, some nasty casting would be required to get it to work without breaking purity. And once again, we'd have to introduce a mutex. And for both core.time and std.datetime we're talking about a mutex that would be needed only briefly, to ensure that we don't end up with two threads trying to initialize the variable at the same time. After that, it would just be impeding performance for no value. They're classic situations for static constructors - initializing static immutable variables - and really, they _should_ be using static constructors. If we have to get rid of them, it's to get around other problems in the language or compiler instead of fixing those problems.
So, on some level, that seems like a failure on the part of the language and the compiler. If we _have_ to find a workaround, then we have to find a workaround, but I find the need to be distasteful to say the least. I previously tried to get rid of the static constructors in std.datetime and couldn't, precisely because they're needed unless you play major casting games to get around immutable and pure. If we play nice, it's impossible to get rid of the static constructors in std.datetime. It probably is possible if we do nasty casting, but (much as I hate to use the word) it seems like this is a hack to get around the fact that the compiler isn't dealing with static constructors as well as we'd like. I'd _really_ like to see this fixed at the compiler level. I understand and empathize with the sentiment, and I agree with most of the technical points at face value, save for a few details. But there are other things at stake. Consider scope. Many arguments applicable to application code are not quite fit for the standard library. The stdlib is where the compiler innards, the runtime innards, and the OS innards all meet, and the role of the stdlib is to provide nice abstractions to client code. Inside the stdlib it's entirely expected to find things like __traits that most nobody has heard of, casts, and other things that would normally be shunned in application code. I'd be more worried if there was no possibility to do what we need to do. The standard library is not a place to play it nice. We can't afford to say "well yeah, everyone's binary is bloated and slower to start, but we didn't like the cast that would have taken care of that." As another matter, there is value in minimizing compulsive work during library startup.
Consider for example this code in std.datetime:

    shared static this()
    {
        tzset();
        _localTime = new immutable(LocalTime)();
    }

This summons the garbage collector right off the bat, thus wiping out anyone's chance of compiling and linking without a GC - as many people seem to want to do. And that happens not only to programs that import and use std.datetime, but to programs using any part of the standard library that transitively imports std.datetime, even for the most innocuous uses, and even if they never, ever use _localTime! That one line essentially locks out 75% of the standard library to anyone wishing to ever avoid using the GC. And honestly, I think that a far worse problem with static constructors is circular dependencies. _That_ is something that needs to be addressed with regards to static constructors. In general at this point, it's looking like static constructors are turning out to be a bit of a failure on some level, given the issues that we're having because of them, and I think that we should fix the language and/or compiler so that they _aren't_ a failure. Here I totally disagree. The design is sound. The issues discussed here are entirely implementation-detail artifacts. Andrei
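To make the startup-cost argument concrete, here is a small sketch contrasting the two styles. It is only an illustration under assumptions: `Zone`, `eagerZone`, `lazyZone` and the two counters are hypothetical names rather than anything in std.datetime, the tzset call is left out to keep the example portable, and the lazy accessor is deliberately not thread-safe.

```d
import std.stdio : writeln;

// Counters that exist only to make the initialization order observable.
__gshared int eagerRuns;
__gshared int lazyRuns;

// Hypothetical stand-in for a singleton such as LocalTime.
final class Zone
{
    string name;
    this(string name) { this.name = name; }
}

__gshared Zone eagerZone;
private __gshared Zone lazyZoneStorage;

// Eager: runs before main() in every program that links this module,
// and allocates from the GC even if eagerZone is never read.
shared static this()
{
    eagerZone = new Zone("eager");
    ++eagerRuns;
}

// Lazy: the allocation happens on first use only.
// Deliberately not thread-safe, to keep the sketch short.
Zone lazyZone()
{
    if (lazyZoneStorage is null)
    {
        lazyZoneStorage = new Zone("lazy");
        ++lazyRuns;
    }
    return lazyZoneStorage;
}

void main()
{
    // The static constructor has already run; the lazy path has not.
    writeln("eager runs: ", eagerRuns, ", lazy runs: ", lazyRuns);
    auto z = lazyZone();
    writeln("eager runs: ", eagerRuns, ", lazy runs: ", lazyRuns);
    assert(lazyZone() is z);   // later calls return the same instance
}
```

A program that never calls `lazyZone` never pays for its allocation, whereas the work in the static constructor is done unconditionally.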
Re: Program size, linking matter, and static this()
On 2011-12-16 20:48, Andrei Alexandrescu wrote: On 12/16/11 1:23 PM, Steven Schveighoffer wrote: I disagree with this assessment. It's good to know the cause of the problem, but let's look at the root issue -- reflection. The only reason to include class information for classes not being referenced is to be able to construct/use classes at runtime instead of at compile time. But if you look at D's runtime reflection capabilities, they are quite poor. You can only construct a class at runtime if it has a zero-arg constructor. So essentially, we are paying the penalty of having runtime reflection in terms of bloat, but get very very little benefit. I'd almost agree, but the code shown doesn't use Object.factory(). So that shouldn't be linked in, and shouldn't pull vtables. There is other runtime reflection functionality that can be used. I think there are two things that need to be considered: 1. We eventually should have some reasonably complete runtime reflection capability 2. Runtime reflection and shared libraries go hand-in-hand. With shared library support, the bloat penalty isn't nearly as significant. I don't think the right answer is to avoid using features of the language because the compiler/runtime has some design deficiencies. At some point these deficiencies will be fixed, and then we are left with a library that has seemingly odd design choices that we can't change. Runtime reflection is great, but I think it's a separate issue from what's discussed here. I don't think it's completely separate. Can the compiler know if runtime reflection is used or not? -- /Jacob Carlborg
Re: Program size, linking matter, and static this()
On 12/16/11 2:47 PM, Jacob Carlborg wrote: I don't think it's completely separate. Can the compiler know if runtime reflection is used or not? Yes. Reflection is used if reflection primitive functions are called. Andrei
Re: Program size, linking matter, and static this()
On 2011-12-16 20:23, Steven Schveighoffer wrote: I disagree with this assessment. It's good to know the cause of the problem, but let's look at the root issue -- reflection. The only reason to include class information for classes not being referenced is to be able to construct/use classes at runtime instead of at compile time. But if you look at D's runtime reflection capabilities, they are quite poor. You can only construct a class at runtime if it has a zero-arg constructor. It's not very useful as is, but you can create your own version that doesn't call the constructor and that can be more useful sometimes. I'm using that technique in my serialization library and providing a special method that can act as a constructor. -- /Jacob Carlborg
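As a concrete picture of the zero-arg constructor limitation, here is a short sketch using druntime's Object.factory. The module name `factorydemo` and the classes `Greeter` and `NeedsArg` are hypothetical, chosen only for this example; the class names are passed as run-time strings, which is why the linker cannot tell from the call site which classes are needed.

```d
module factorydemo;

import std.stdio : writeln;

// Has a zero-argument constructor, so Object.factory can build it.
class Greeter
{
    string text;
    this() { text = "hello from Greeter"; }
}

// Only has a constructor that takes an argument.
class NeedsArg
{
    int n;
    this(int n) { this.n = n; }
}

void main()
{
    // The class is named only by a run-time string.
    Object a = Object.factory("factorydemo.Greeter");
    auto g = cast(Greeter) a;
    writeln(g is null ? "not created" : g.text);

    // No default constructor: factory reports failure with null.
    Object b = Object.factory("factorydemo.NeedsArg");
    writeln(b is null ? "NeedsArg: null" : "NeedsArg: created");

    // Unknown names also yield null.
    assert(Object.factory("factorydemo.Missing") is null);
}
```

Neither class is instantiated by name anywhere else in the program, yet both must stay reachable through ModuleInfo for the lookup to work, which is the link between reflection and binary size argued over in this thread.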
Re: Program size, linking matter, and static this()
On Friday, December 16, 2011 14:44:48 Andrei Alexandrescu wrote: On 12/16/11 1:41 PM, Jonathan M Davis wrote: I understand and empathize with the sentiment, and I agree with most of the technical points at face value, save for a few details. But there are other things at stake. Consider scope. Many arguments applicable to application code are not quite fit for the standard library. The stdlib is where the compiler innards, the runtime innards, and the OS innards all meet, and the role of the stdlib is to provide nice abstractions to client code. Inside the stdlib it's entirely expected to find things like __traits that most nobody has heard of, casts, and other things that would normally be shunned in application code. I'd be more worried if there was no possibility to do what we need to do. The standard library is not a place to play it nice. We can't afford to say "well yeah, everyone's binary is bloated and slower to start, but we didn't like the cast that would have taken care of that." I'm not completely against this precisely because of this, but at the same time, it strikes me as completely ridiculous to have to resort to some nasty casting simply to reduce the binary size of the base executable. I'd much rather see the compiler improved such that this isn't necessary. As another matter, there is value in minimizing compulsive work during library startup. Consider for example this code in std.datetime: shared static this() { tzset(); _localTime = new immutable(LocalTime)(); } This summons the garbage collector right off the bat, thus wiping out anyone's chance of compiling and linking without a GC - as many people seem to want to do. And that happens not only to programs that import and use std.datetime, but to programs using any part of the standard library that transitively imports std.datetime, even for the most innocuous uses, and even if they never, ever use _localTime!
That one line essentially locks out 75% of the standard library to anyone wishing to ever avoid using the GC. This, on the other hand, is of much greater concern, and is a much better argument for using the ugly casting necessary to get rid of the static constructors, even if the compiler did a fantastic job at cutting out the extra cruft in the binary - though as far as the GC goes, it might not be an issue once CTFE is good enough to create classes at compile time that still exist at runtime. Unfortunately, the necessity of tzset would remain, however. And honestly, I think that a far worse problem with static constructors is circular dependencies. _That_ is something that needs to be addressed with regards to static constructors. In general at this point, it's looking like static constructors are turning out to be a bit of a failure on some level, given the issues that we're having because of them, and I think that we should fix the language and/or compiler so that they _aren't_ a failure. Here I totally disagree. The design is sound. The issues discussed here are entirely implementation-detail artifacts. As far as the binary size goes, I completely agree that it's an implementation issue, but I definitely think that the issue with circular dependencies is a design issue which needs to be addressed. The basics of static constructors wouldn't have to change drastically, but there should at least be a way to indicate to the compiler that there is not actually a circular dependency. I don't think that I have ever seen druntime blow up on a circular dependency where there was actually a circular dependency. It's just that the compiler (or druntime or both) isn't smart enough to determine whether the static constructors _actually_ create a circular dependency. It has no way of determining which module's static constructors should be called first and gives up. We need a way to give it that information so that it can order them when they aren't actually interdependent.
_That_ is the design flaw that I see in static constructors, and it's one of the most annoying issues in the language IMHO (which arguably just goes to show how good D is in general, I suppose). - Jonathan M Davis
Re: Program size, linking matter, and static this()
On 2011-12-16 21:49, Andrei Alexandrescu wrote: On 12/16/11 2:47 PM, Jacob Carlborg wrote: I don't think it's completely separate. Can the compiler know if runtime reflection is used or not? Yes. Reflection is used if reflection primitive functions are called. Andrei Yeah, but how does the compiler know which functions are reflection primitives? By hard-coding them in the compiler? Or perhaps the compiler already needs to know this. -- /Jacob Carlborg
Re: Program size, linking matter, and static this()
On Fri, 16 Dec 2011 14:48:33 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 12/16/11 1:23 PM, Steven Schveighoffer wrote: I disagree with this assessment. It's good to know the cause of the problem, but let's look at the root issue -- reflection. The only reason to include class information for classes not being referenced is to be able to construct/use classes at runtime instead of at compile time. But if you look at D's runtime reflection capabilities, they are quite poor. You can only construct a class at runtime if it has a zero-arg constructor. So essentially, we are paying the penalty of having runtime reflection in terms of bloat, but get very very little benefit. I'd almost agree, but the code showed doesn't use Object.factory(). So that shouldn't be linked in, and shouldn't pull vtables. You cannot know until link time whether factory is used when compiling individual files. By then it's probably too late to exclude them. The point is that you can instantiate unreferenced classes simply by calling them out by name. I think there are two things that need to be considered: 1. We eventually should have some reasonably complete runtime reflection capability 2. Runtime reflection and shared libraries go hand-in-hand. With shared library support, the bloat penalty isn't nearly as significant. I don't think the right answer is to avoid using features of the language because the compiler/runtime has some design deficiencies. At some point these deficiencies will be fixed, and then we are left with a library that has seemingly odd design choices that we can't change. Runtime reflection is great, but I think it's a separate issue from what's discussed here. I'm not pushing for runtime reflection, all I'm saying is, I don't think it's worth changing how the library is written to work around something because the *compiler* is incorrectly implemented/designed. So why don't we just leave the code size situation as-is? 
500kb is not a terribly significant amount, but dlls are on the horizon (Walter has publicly said so). Then size becomes a moot point. If we get reflection, then you will find that having excluded all the runtime information when not used is going to hamper D's reflection capability, and we'll probably have to start putting it back in anyway. In short, dlls will solve the problem, let's work on that instead of shuffling around code. -Steve
Re: Program size, linking matter, and static this()
A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! And std.stdio is supposed to be just a simple wrapper around C FILE.
Re: Program size, linking matter, and static this()
On Fri, 16 Dec 2011 16:28:03 -0500, Steven Schveighoffer schvei...@yahoo.com wrote: So why don't we just leave the code size situation as-is? 500kb is not a terribly significant amount, but dlls are on the horizon (Walter has publicly said so). Then size becomes a moot point. If we get reflection, then you will find that having excluded all the runtime information when not used is going to hamper D's reflection capability, and we'll probably have to start putting it back in anyway. In short, dlls will solve the problem, let's work on that instead of shuffling around code. The other valid option I see is removing the link to the virtual tables, thereby disabling reflection via factory until we can implement full reflection. -Steve
Re: Program size, linking matter, and static this()
On 12/16/2011 09:31 PM, Jonathan M Davis wrote: On Friday, December 16, 2011 21:06:49 Timon Gehr wrote: On 12/16/2011 08:41 PM, Jonathan M Davis wrote: On Friday, December 16, 2011 12:29:18 Andrei Alexandrescu wrote: Jonathan, could I impose on you to replace all static cdtors in std.datetime with lazy initialization? I looked through it and it strikes me as a reasonably simple job, but I think you'd know better what to do than me. A similar effort could be conducted to reduce or eliminate static cdtors from druntime. I made the experiment of commenting them all out, and that reduced the size of the baseline from 218KB to 200KB. This is a good amount, but not as dramatic as what we can get by working on std.datetime. Hmm. I had a reply for this already, but it seems to have disappeared, so I'll try again. You could make core.time use property functions instead of the static immutable variables that it's using now for ticksPerSec and appOrigin, but doing that right would require introducing a mutex or synchronized block (which is really just a mutex under the hood anyway), and I'm loath to do that in time-related code. ticksPerSec gets used all over the place in TickDuration, and that could have a negative impact on performance for something that needs to be really fast (since it's used in stuff like StopWatch and benchmarking). On top of that, in order to maintain the current semantics, the property functions would have to be pure, which they can't be without doing some nasty casting to convince the compiler that stuff which isn't pure is actually pure. lazy variables would resolve this. True, but we don't have them. Circular dependencies are not to be blamed on the design of static constructors. Yes they are. No. They arise from the design of the module hierarchy. static constructors completely chicken out on them.
Not only is there no real attempt to determine whether the static constructors are actually dependent (which granted, isn't an easy problem), I don't think that is an option. but there is _zero_ support in the language for resolving such circular dependencies. There's no way to say that they _aren't_ dependent even if you can clearly see that they aren't. Yes there is. The compiler and runtime understand that they are not mutually dependent if their modules are not mutually dependent. Package level is the right level for dealing with such issues because the circular dependencies are a modularity problem. The solution used in Phobos (which won't work in std.datetime due to the use of immutable and pure) is to create a C module which has the code from the static constructor and then have a separate module which calls it in its static constructor. You don't need a C function if you just factor out every variable it initializes to the separate D module. __fileinit.d works that way. I don't see why stdiobase.d could not do the same. It works, but it's not pretty (and it doesn't always work - e.g. std.datetime), and it would be _far_ better if you could just mark a static constructor as not depending on anything or mark it as not depending on a specific module or something similar. How would that be checked? And given how disgusting it generally is to even figure out what's causing a circular dependency when the runtime won't start your program because of it, I really think that this is a problem which should be resolved. static constructors need to be improved. Nobody has figured out how to solve the problem of modular global data initialization. That is because there probably is no solution. In general at this point, it's looking like static constructors are turning out to be a bit of a failure on some level, given the issues that we're having because of them, and I think that we should fix the language and/or compiler so that they _aren't_ a failure. 
- Jonathan M Davis We are having (minor!!) problems because the task of initializing global data in a modular way is inherently hard. Just have a look at how other languages handle initialization of global data and you'll notice that the D solution is actually very sensible. Yes. The situation with D is better than that of many other languages, but what problems we do have can be _really_ annoying to deal with. Having to deal with circular dependencies due to static module constructors which aren't actually interdependent is one of the most annoying issues in D IMHO. Adding a language construct that turns off the checking entirely (as you seem to suggest) is not at all better than having to create a few additional source files.
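For reference, the lazy-initialization alternative to a static constructor discussed above could look roughly like the sketch below. It is only an illustration under stated assumptions: `queryTicksPerSec`, `cachedTicksPerSec`, `seenByThisThread` and `lazyTicksPerSec` are hypothetical names, not core.time internals. It uses a thread-local flag so that the mutex is taken at most once per thread rather than on every call.

```d
// Hypothetical stand-in for an expensive OS query; not the real core.time code.
private long queryTicksPerSec()
{
    return 1_000_000;
}

private __gshared long cachedTicksPerSec; // one copy for the whole process
private bool seenByThisThread;            // module-level variables are thread-local in D

long lazyTicksPerSec()
{
    if (!seenByThisThread)
    {
        // An unnamed synchronized statement locks a mutex private to this statement.
        synchronized
        {
            if (cachedTicksPerSec == 0)
                cachedTicksPerSec = queryTicksPerSec();
        }
        seenByThisThread = true; // later calls on this thread skip the lock
    }
    return cachedTicksPerSec;
}
```

Note that this function is not `pure`, which is exactly the semantic difference from a `static immutable` variable that was objected to above.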
Re: Program size, linking matter, and static this()
On Fri, 16 Dec 2011 15:58:28 -0500, Jonathan M Davis jmdavisp...@gmx.com wrote: On Friday, December 16, 2011 14:44:48 Andrei Alexandrescu wrote: As another matter, there is value in minimizing compulsive work during library startup. Consider for example this code in std.datetime: shared static this() { tzset(); _localTime = new immutable(LocalTime)(); } This summons the garbage collector right off the bat, thus wiping off anyone's chance of compiling and linking without a GC - as many people seem to want to do. And that happens not to programs that import and use std.datetime, but to programs using any part of the standard library that transitively imports std.datetime, even for the most innocuous uses, and even if they never, ever use _localTime! That one line essentially locks out 75% of the standard library to anyone wishing to ever avoid using the GC. This, on the other hand, is of much greater concern, and is a much better argument for using the ugly casting necessary to get rid of the static constructors, even if the compiler did a fantastic job at cutting out the extra cruft in the binary - though as far as the GC goes, it might not be an issue once CTFE is good enough to create classes at compile time that still exist at runtime. Unfortunately, the necessity of tzset would remain however. This can be solved with malloc and emplace -Steve
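A minimal sketch of the "malloc and emplace" idea, assuming a made-up class: `FakeLocalTime` and `makeWithoutGC` are hypothetical names standing in for std.datetime's LocalTime and its initialization, and the `immutable` qualifier from the real code is left out for brevity.

```d
import core.stdc.stdlib : malloc;
import std.conv : emplace;

// Hypothetical stand-in for std.datetime's LocalTime.
final class FakeLocalTime
{
    int utcOffsetMinutes;
    this(int offset) { utcOffsetMinutes = offset; }
}

FakeLocalTime makeWithoutGC(int offset)
{
    // Size of one class instance, known at compile time.
    enum size = __traits(classInstanceSize, FakeLocalTime);
    void* raw = malloc(size);
    assert(raw !is null, "malloc failed");
    // Run the constructor in the malloc'ed block instead of on the GC heap.
    // The block is never freed here; a singleton lives for the whole process.
    return emplace!FakeLocalTime(raw[0 .. size], offset);
}
```

The object never touches the GC heap, so creating it does not by itself pull the collector into the program.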
Re: Program size, linking matter, and static this()
On 12/16/11 3:38 PM, Trass3r wrote: A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! And std.stdio is supposed to be just a simple wrapper around C FILE. In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime. Once we solve the static constructor issue, function-level linking should take care of pulling only the minimum needed. One interesting fact is that a lot of issues that I tended to take non-critically (templates cause bloat, intermodule dependencies cause bloat, static linking creates large programs) looked a whole lot differently when I looked closer at causes and effects. Andrei
Re: Program size, linking matter, and static this()
On Friday, 16 December 2011 at 21:28:03 UTC, Steven Schveighoffer wrote: In short, dlls will solve the problem, let's work on that instead of shuffling around code. I wouldn't want to cripple either - put all the reflection info in the dll, but keep it sufficiently decoupled so the linker can strip it out when statically linking. The effort in decoupling most the code isn't great.
Re: Program size, linking matter, and static this()
On 12/16/11 3:43 PM, Steven Schveighoffer wrote: On Fri, 16 Dec 2011 15:58:28 -0500, Jonathan M Davis jmdavisp...@gmx.com wrote: On Friday, December 16, 2011 14:44:48 Andrei Alexandrescu wrote: As another matter, there is value in minimizing compulsive work during library startup. Consider for example this code in std.datetime: shared static this() { tzset(); _localTime = new immutable(LocalTime)(); } This summons the garbage collector right off the bat, thus wiping off anyone's chance of compiling and linking without a GC - as many people seem to want to do. And that happens not to programs that import and use std.datetime, but to programs using any part of the standard library that transitively imports std.datetime, even for the most innocuous uses, and even if they never, ever use _localTime! That one line essentially locks out 75% of the standard library to anyone wishing to ever avoid using the GC. This, on the other hand, is of much greater concern, and is a much better argument for using the ugly casting necessary to get rid of the static constructors, even if the compiler did a fantastic job at cutting out the extra cruft in the binary - though as far as the GC goes, it might not be an issue once CTFE is good enough to create classes at compile time that still exist at runtime. Unfortunately, the necessity of tzset would remain however. This can be solved with malloc and emplace Sure you meant static ubyte[__traits(classInstanceSize, T)] and emplace :o). Andrei
Re: Program size, linking matter, and static this()
Am 16.12.2011, 22:45 Uhr, schrieb Andrei Alexandrescu seewebsiteforem...@erdani.org: On 12/16/11 3:38 PM, Trass3r wrote: A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! And std.stdio is supposed to be just a simple wrapper around C FILE. In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime. Yep, the 30 modules is a measure I took before that commit. Once we solve the static constructor issue, function-level linking should take care of pulling only the minimum needed. Also by pulling in I just meant the imports. But the planned lazy semantic analysis should improve the situation.
Re: Program size, linking matter, and static this()
Andrei Alexandrescu Wrote: On 12/16/11 3:38 PM, Trass3r wrote: A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! And std.stdio is supposed to be just a simple wrapper around C FILE. In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime. Once we solve the static constructor issue, function-level linking should take care of pulling only the minimum needed. One interesting fact is that a lot of issues that I tended to take non-critically (templates cause bloat, intermodule dependencies cause bloat, static linking creates large programs) looked a whole lot differently when I looked closer at causes and effects. Andrei http://wiki.freepascal.org/Size_Matters Otherwise a great language that never did manage to remove bloated factor from its name. Many people stopped using it because of that, including me. I guess people do not like bloat when programming systems stuff.
Re: Program size, linking matter, and static this()
On 12/16/11 3:28 PM, Steven Schveighoffer wrote: On Fri, 16 Dec 2011 14:48:33 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 12/16/11 1:23 PM, Steven Schveighoffer wrote: I disagree with this assessment. It's good to know the cause of the problem, but let's look at the root issue -- reflection. The only reason to include class information for classes not being referenced is to be able to construct/use classes at runtime instead of at compile time. But if you look at D's runtime reflection capabilities, they are quite poor. You can only construct a class at runtime if it has a zero-arg constructor. So essentially, we are paying the penalty of having runtime reflection in terms of bloat, but get very very little benefit. I'd almost agree, but the code showed doesn't use Object.factory(). So that shouldn't be linked in, and shouldn't pull vtables. You cannot know until link time whether factory is used when compiling individual files. By then it's probably too late to exclude them. I'm not an expert in linkers, but my understanding is that linkers naturally remove unused object files. That, coupled with dmd's ability to break compilation output in many pseudo-object files, would take care of the matter. Truth be told, once you link in Object.factory(), bam - all classes are linked. The point is that you can instantiate unreferenced classes simply by calling them out by name. Yah, but you must call a function to do that. I'm not pushing for runtime reflection, all I'm saying is, I don't think it's worth changing how the library is written to work around something because the *compiler* is incorrectly implemented/designed. So why don't we just leave the code size situation as-is? 500kb is not a terribly significant amount, but dlls are on the horizon (Walter has publicly said so). Then size becomes a moot point. 
If we get reflection, then you will find that having excluded all the runtime information when not used is going to hamper D's reflection capability, and we'll probably have to start putting it back in anyway. In short, dlls will solve the problem, let's work on that instead of shuffling around code. I think there are more issues with static this() than simply executable size, as discussed. Also, adding dynamic linking capability does not mean we give up on static linking. A lot of programs use static linking by choice, and for good reasons. Andrei
Re: Program size, linking matter, and static this()
On 12/16/2011 10:53 PM, Trass3r wrote: Am 16.12.2011, 22:45 Uhr, schrieb Andrei Alexandrescu seewebsiteforem...@erdani.org: On 12/16/11 3:38 PM, Trass3r wrote: A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! And std.stdio is supposed to be just a simple wrapper around C FILE. In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime. Yep, the 30 modules is a measure I took before that commit. Once we solve the static constructor issue, function-level linking should take care of pulling only the minimum needed. Also by pulling in I just meant the imports. But the planned lazy semantic analysis should improve the situation.
I think it is already lazy?
---
module a;
void foo(){ imanundefinedsymbolandcauseacompileerror(); }
---
---
module b;
import a;
void main(){ foo(); }
---
$ dmd -c b # compiles fine
Re: Program size, linking matter, and static this()
On Friday, 16 December 2011 at 21:45:43 UTC, Andrei Alexandrescu wrote: Once we solve the static constructor issue, function-level linking should take care of pulling only the minimum needed. This sounds fantastic. One interesting fact is that a lot of issues that I tended to take non-critically (templates cause bloat, intermodule dependencies cause bloat, static linking creates large programs) looked a whole lot differently when I looked closer at causes and effects. I'd be careful not to overgeneralize from this though; templates do have the potential to bloat things up, etc. Though static linking has and always shall rock. (For bloated templates, I had a monster of one in web.d that shrunk the binary by about three megabytes by refactoring some of it into regular functions. Shaved two seconds off the compile time too! Note this binary is my work project, so your results may vary with my library. It was basically inlining several kilobytes of the same stuff into hundreds of different functions... 10 kb * 300 functions = lots of code.)
Re: Program size, linking matter, and static this()
On Fri, 16 Dec 2011 17:00:45 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 12/16/11 3:28 PM, Steven Schveighoffer wrote: On Fri, 16 Dec 2011 14:48:33 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 12/16/11 1:23 PM, Steven Schveighoffer wrote: I disagree with this assessment. It's good to know the cause of the problem, but let's look at the root issue -- reflection. The only reason to include class information for classes not being referenced is to be able to construct/use classes at runtime instead of at compile time. But if you look at D's runtime reflection capabilities, they are quite poor. You can only construct a class at runtime if it has a zero-arg constructor. So essentially, we are paying the penalty of having runtime reflection in terms of bloat, but get very very little benefit. I'd almost agree, but the code showed doesn't use Object.factory(). So that shouldn't be linked in, and shouldn't pull vtables. You cannot know until link time whether factory is used when compiling individual files. By then it's probably too late to exclude them. I'm not an expert in linkers, but my understanding is that linkers naturally remove unused object files. That, coupled with dmd's ability to break compilation output in many pseudo-object files, would take care of the matter. Truth be told, once you link in Object.factory(), bam - all classes are linked. Factory doesn't directly reference classes, it does so through the moduleinfo tree/array (not sure what it is). So the way it works is, the linker includes the module info because it's defined as static data, which includes the vtable functions, and factory can instantiate non-referenced classes because of this fact, not the other way around. I'm not pushing for runtime reflection, all I'm saying is, I don't think it's worth changing how the library is written to work around something because the *compiler* is incorrectly implemented/designed. 
So why don't we just leave the code size situation as-is? 500kb is not a terribly significant amount, but dlls are on the horizon (Walter has publicly said so). Then size becomes a moot point. If we get reflection, then you will find that having excluded all the runtime information when not used is going to hamper D's reflection capability, and we'll probably have to start putting it back in anyway. In short, dlls will solve the problem, let's work on that instead of shuffling around code. I think there are more issues with static this() than simply executable size, as discussed. Also, adding dynamic linking capability does not mean we give up on static linking. A lot of programs use static linking by choice, and for good reasons. Even statically linked programs might use runtime reflection. I agree the issue is not static linking vs. dynamic linking, but dynamic linking would hide the problem quite well. Note that on Linux today, the executable is not truly static -- OS libs are dynamically linked. Another option is to disable runtime reflection via a compiler switch (which would sever the ties between moduleinfo and classinfo). Then we simply must make sure we don't use factory in the library anywhere. -Steve
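To make the reflection point concrete, here is a small sketch of what Object.factory offers: a class is created from nothing but its fully qualified name, and only its zero-argument constructor can be used. `Greeter` and `makeByName` are hypothetical names chosen for illustration.

```d
// Hypothetical example class; Object.factory can only use a zero-argument constructor.
class Greeter
{
    this() {}
    string greet() { return "hi"; }
}

// Creates an object from its fully qualified name ("modulename.ClassName").
// Returns null when no linked-in class has that name.
Object makeByName(string qualifiedName)
{
    return Object.factory(qualifiedName);
}
```

A caller would pass a string such as "mymodule.Greeter" (the module name here is an assumption) and cast the result to the expected type. Because the name is only known at run time, the linker cannot tell which classes will be requested, which is the reason given in the thread for everything staying linked in.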
Re: Program size, linking matter, and static this()
On Fri, 16 Dec 2011 16:48:18 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 12/16/11 3:43 PM, Steven Schveighoffer wrote: On Fri, 16 Dec 2011 15:58:28 -0500, Jonathan M Davis jmdavisp...@gmx.com wrote: On Friday, December 16, 2011 14:44:48 Andrei Alexandrescu wrote: As another matter, there is value in minimizing compulsive work during library startup. Consider for example this code in std.datetime: shared static this() { tzset(); _localTime = new immutable(LocalTime)(); } This summons the garbage collector right off the bat, thus wiping off anyone's chance of compiling and linking without a GC - as many people seem to want to do. And that happens not to programs that import and use std.datetime, but to programs using any part of the standard library that transitively imports std.datetime, even for the most innocuous uses, and even if they never, ever use _localTime! That one line essentially locks out 75% of the standard library to anyone wishing to ever avoid using the GC. This, on the other hand, is of much greater concern, and is a much better argument for using the ugly casting necessary to get rid of the static constructors, even if the compiler did a fantastic job at cutting out the extra cruft in the binary - though as far as the GC goes, it might not be an issue once CTFE is good enough to create classes at compile time that still exist at runtime. Unfortunately, the necessity of tzset would remain however. This can be solved with malloc and emplace Sure you meant static ubyte[__traits(classInstanceSize, T)] and emplace :o). That works too! -Steve
Re: Program size, linking matter, and static this()
On Fri, 16 Dec 2011 16:48:47 -0500, Adam D. Ruppe destructiona...@gmail.com wrote: On Friday, 16 December 2011 at 21:28:03 UTC, Steven Schveighoffer wrote: In short, dlls will solve the problem, let's work on that instead of shuffling around code. I wouldn't want to cripple either - put all the reflection info in the dll, but keep it sufficiently decoupled so the linker can strip it out when statically linking. The effort in decoupling most the code isn't great. The only way I can think of to decouple it is to disable it with a compiler switch, since the compiler is the one including the info. I envision a nasty world where libraries are built 4 ways, with two orthogonal factors -- dynamic vs. static, and reflection vs. no reflection. Oh, hello visual C++, what are you doing here? -Steve
Re: Program size, linking matter, and static this()
On 16.12.2011 22:28, Steven Schveighoffer wrote: In short, dlls will solve the problem, let's work on that instead of shuffling around code. How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe.
Re: Program size, linking matter, and static this()
On Friday, December 16, 2011 22:41:14 Timon Gehr wrote: On 12/16/2011 09:31 PM, Jonathan M Davis wrote: You don't need a C function if you just factor out every variable it initializes to the separate D module. __fileinit.d works that way. I don't see why stdiobase.d could not do the same. That only works if the variable being initialized is in the new module instead of the original module, which you can't always do. It works, but it's not pretty (and it doesn't always work - e.g. std.datetime), and it would be _far_ better if you could just mark a static constructor as not depending on anything or mark it as not depending on a specific module or something similar. How would that be checked? It wouldn't be. It wouldn't need to be. The programmer is telling the compiler that there isn't a dependency. It's up to the programmer to make sure that it's right, and if it's wrong, it's their fault. There are plenty of other features like that in D - just not SafeD. annoying issues in D IMHO. Adding a language construct that turns off the checking entirely (as you seem to suggest) is not at all better than having to create a few additional source files. I completely disagree. For instance, it's impossible to move the singleton instances of UTC and LocalTime from std.datetime into another module without breaking encapsulation, and it's definitely impossible to do it and leave them as members of their respective classes. Those static constructors clearly don't rely on any other modules except for the one which gives the declaration for tzset (and has no static constructors). But if std.file needed a module constructor, we'd end up with a circular dependency between std.datetime and std.file when clearly nothing in std.datetime's static constructor relies on std.file in any way shape or form. It would be a huge improvement to be able to just mark those static constructors as not relying on any other modules having their static constructors run first. 
As it stands, it's a royal pain to deal with any circular dependencies which pop up and because of that, it quickly becomes best practice to avoid static constructors as much as possible, which is a big problem IMHO. Factoring out the static constructor's contents into a separate module is not always possible, and it's an ugly solution IMHO. I'd _much_ rather have a feature where I can tell the compiler that there is no circular dependency so that it can appropriately order the loading of the modules. - Jonathan M Davis
Re: Program size, linking matter, and static this()
On Friday, December 16, 2011 23:30:44 torhu wrote: On 16.12.2011 22:28, Steven Schveighoffer wrote: In short, dlls will solve the problem, let's work on that instead of shuffling around code. How exactly do they solve the problem? An exe plus a DLL version of the library will usually be larger than just a statically linked exe. You have to stick it all in the DLL anyway (since you can't know which parts will and won't be used), so the whole issue of not including unused functionality goes away completely. There's no point in worrying about how much unused functionality gets included when you have no choice but to include everything regardless of whether it's actually used. - Jonathan M Davis
Re: Program size, linking matter, and static this()
On 16/12/2011 18:29, Andrei Alexandrescu wrote: Here's a list of all files in std using static cdtors:
std/__fileinit.d
std/concurrency.d
std/cpuid.d
std/cstream.d
std/datebase.d
std/datetime.d
std/encoding.d
std/internal/math/biguintcore.d
std/internal/math/biguintx86.d
std/internal/processinit.d
std/internal/windows/advapi32.d
std/mmfile.d
std/parallelism.d
std/perf.d
std/socket.d
std/stdiobase.d
std/uri.d
On a slightly related note: http://d.puremagic.com/issues/show_bug.cgi?id=5614 Basically, do the static constructors in __fileinit and mmfile need to exist on a (hypothetical) 64bit Windows build?
Re: Program size, linking matter, and static this()
On Dec 16, 2011, at 10:29 AM, Andrei Alexandrescu wrote: But in experiments it seemed like program size would increase in sudden amounts when certain modules were included. After much investigation we figured that the following fateful causal sequence happened:
1. Some modules define static constructors with static this() or shared static this(), and/or static destructors.
2. These constructors/destructors are linked in automatically whenever a module is included.
3. Importing a module with a static constructor (or destructor) will generate its ModuleInfo structure, which contains static information about all module members. In particular, it keeps virtual table pointers for all classes defined inside the module.
What is gained from having class vtbls referenced by ModuleInfo? Could we put them elsewhere?
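As a rough illustration of what the ModuleInfo references make possible, the sketch below walks every ModuleInfo linked into the program and lists the classes each one records. `classesKnownToRuntime` and `Marker` are hypothetical names; which modules actually get a ModuleInfo depends on the compiler version, so the exact output is not something to rely on.

```d
// Hypothetical marker class so that this module contributes at least one entry.
class Marker {}

// Collects the names of all classes reachable through the linked-in ModuleInfo records.
string[] classesKnownToRuntime()
{
    string[] names;
    foreach (m; ModuleInfo)         // druntime iterates over every ModuleInfo in the binary
        foreach (c; m.localClasses) // TypeInfo_Class entries recorded for that module
            names ~= c.name;
    return names;
}
```

This is the same data a by-name lookup of classes has to search, which suggests why the class information stays reachable from ModuleInfo.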
Re: Program size, linking matter, and static this()
On 12/16/2011 1:41 PM, Timon Gehr wrote: Adding a language construct that turns off the checking entirely (as you seem to suggest) is not at all better than having to create a few additional source files. I also don't really see how turning off checking is even slightly more elegant than using a dirty cast. The additional source file thing is best because it fits in with the guarantees of the language - it is not a hack nor does it require trust in the programmer to get it right. It's not going to have heisenbugs where it working or not depends on arbitrary link order.
Re: Program size, linking matter, and static this()
On Dec 16, 2011, at 12:44 PM, Andrei Alexandrescu wrote: Consider scope. Many arguments applicable to application code are not quite fit for the standard library. The stdlib is where the compiler innards, the runtime innards, and the OS innards all meet, and the role of the stdlib is to provide nice abstractions to client code. Inside the stdlib it's entirely expected to find things like __traits most nobody heard of, casts, and other things that would be normally shunned in application code. I'd be more worried if there was no possibility to do what we need to do. The standard library is not a place to play it nice. We can't afford to say "well yeah everyone's binary is bloated and slower to start but we didn't like the cast that would have taken care of that". I think this is a reasonable assertion about druntime, but the standard library itself should require very little black magic, though the use of obscure features (like __traits) could be commonplace.
Re: Program size, linking matter, and static this()
On 12/16/2011 1:45 PM, Andrei Alexandrescu wrote: On 12/16/11 3:38 PM, Trass3r wrote: A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! And std.stdio is supposed to be just a simple wrapper around C FILE. In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is druntime. Another thing is to avoid using classes for things where one does not expect it to ever be derived from. Use a struct instead, as referencing parts of the struct implementation will not pull in the whole of it, nor is there a vtbl[] to pull it all in. For example, in std.datetime there's final class Clock. It inherits nothing, and nothing can be derived from it. The comments for it say it is merely a namespace. It should be a struct.
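A sketch of the suggested change, using a made-up type rather than the real std.datetime code: `ClockNamespace` and `now` are hypothetical names. The struct only groups static functions, and disabling its default constructor keeps anyone from creating instances of it.

```d
import core.time : MonoTime;

// Hypothetical replacement for a "final class used as a namespace".
struct ClockNamespace
{
    @disable this(); // the type only groups functions; instances make no sense

    // Static members need no object, no vtbl[], and no class TypeInfo.
    static MonoTime now()
    {
        return MonoTime.currTime;
    }
}
```

Call sites look the same as with the class version, for example `ClockNamespace.now()`.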
Re: Program size, linking matter, and static this()
On 12/16/2011 11:39 PM, Jonathan M Davis wrote: [...] For instance, it's impossible to move the singleton instances of UTC and LocalTime from std.datetime into another module without breaking encapsulation. In what way would encapsulation be broken by just moving the class to a helper module?
Re: Program size, linking matter, and static this()
On 12/16/11 2:58 PM, Jonathan M Davis wrote: Unfortunately, the necessity of tzset would remain however. Why? From http://pubs.opengroup.org/onlinepubs/007904875/functions/tzset.html: The tzset() function shall use the value of the environment variable TZ to set time conversion information used by ctime(), localtime(), mktime(), and strftime(). If TZ is absent from the environment, implementation-defined default timezone information shall be used. I'd expect a good standard library implementation for D would call tzset() once per process instance, lazily, inside the wrapper functions for the four functions above. Alternatively, people could call the stdc.* versions and expect tzset() to _not_ have been called. That strikes the right balance between convenience, flexibility, and efficiency. Andrei
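Calling tzset() lazily, once per process, could be sketched as follows. `ensureTzset` and `tzDone` are hypothetical names; a wrapper around localtime() and friends would call `ensureTzset()` first. On non-Posix systems the sketch only sets the flag.

```d
import core.atomic : atomicLoad, atomicStore;

version (Posix) import core.sys.posix.time : tzset;

private shared bool tzDone; // set once tzset() has run in this process

void ensureTzset()
{
    if (atomicLoad(tzDone))
        return; // fast path after the first call

    synchronized // statement-private mutex; only contended during the first calls
    {
        if (!atomicLoad(tzDone))
        {
            version (Posix) tzset(); // reads TZ and sets the C library's conversion info
            atomicStore(tzDone, true);
        }
    }
}
```

Programs that never reach a time-conversion wrapper never pay for the call, and nothing runs at startup.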
Re: Program size, linking matter, and static this()
On Dec 16, 2011, at 1:38 PM, Trass3r wrote: A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! This was one of the major motivations for separating druntime from phobos. The last thing anyone wants is for something in runtime to print to the console and end up pulling in 80% of the standard library as a result.
Re: Program size, linking matter, and static this()
On Dec 16, 2011, at 1:45 PM, Andrei Alexandrescu wrote: On 12/16/11 3:38 PM, Trass3r wrote: A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! And std.stdio is supposed to be just a simple wrapper around C FILE. In fact it doesn't (after yesterday's commit). The std code in hello, world is a minuscule 3KB. The rest of 218KB is runtime. Once upon a time, a minimal D app was roughly 65K. TypeInfo has ballooned a lot since then however. It's worth considering whether you're writing a Windows or Posix app as well, since the Posix headers are far more extensive (and thus may result in far more ModuleInfo instances).
Re: Program size, linking matter, and static this()
On Dec 16, 2011, at 1:48 PM, Andrei Alexandrescu wrote: On 12/16/11 3:43 PM, Steven Schveighoffer wrote: This can be solved with malloc and emplace Sure you meant static ubyte[__traits(classInstanceSize, T)] and emplace :o). Don't forget the 16 byte alignment :-)
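Putting the static buffer, emplace, and the alignment remark together gives roughly the sketch below. `FakeUTC`, `utcStorage` and `utcInstance` are hypothetical names, and the lazy check is deliberately not thread-safe to keep the example short.

```d
import std.conv : emplace;

// Hypothetical stand-in for a singleton class such as std.datetime's UTC.
final class FakeUTC
{
    int id;
    this(int i) { id = i; }
}

// Static storage sized for exactly one instance, with explicit 16-byte alignment.
align(16) private __gshared ubyte[__traits(classInstanceSize, FakeUTC)] utcStorage;
private __gshared FakeUTC utcSingleton;

FakeUTC utcInstance()
{
    if (utcSingleton is null) // not thread-safe; sketch only
        utcSingleton = emplace!FakeUTC(utcStorage[], 7);
    return utcSingleton;
}
```

No heap allocation of any kind happens here; the object lives in the program's static data.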
Re: Program size, linking matter, and static this()
On Dec 16, 2011, at 2:00 PM, Andrei Alexandrescu wrote: I'm not an expert in linkers, but my understanding is that linkers naturally remove unused object files. That, coupled with dmd's ability to break compilation output in many pseudo-object files, would take care of the matter. Truth be told, once you link in Object.factory(), bam - all classes are linked. There's an old bugzilla entry that may apply: http://d.puremagic.com/issues/show_bug.cgi?id=879
Re: Program size, linking matter, and static this()
On 12/16/11 4:39 PM, Jonathan M Davis wrote: It wouldn't be. It wouldn't need to be. The programmer is telling the compiler that there isn't a dependency. It's up to the programmer to make sure that it's right, and it's wrong, it's their fault. There are plenty of other features like that in D - just not SafeD. I don't see progress here over arranging packages and modules to reflect program structure in a way that clarifies it to the human /and/ the compiler. annoying issues in D IMHO. Adding a language construct that turns off the checking entirely (as you seem to suggest) is not at all better than having to create a few additional source files. I completely disagree. For instance, it's impossible to move the singleton instances of UTC and LocalTime from std.datetime into another module without breaking encapsulation, and it's definitely impossible to do it and leave them as members of their respective classes. Maybe there's an issue with the design. Maybe Singleton (the most damned of all patterns) is not the best choice here. Or maybe the use of an inheritance hierarchy with a grand total of 4 classes. Or maybe the encapsulation could be rethought. The general point is, a design lives within a language. Any language is going to disallow a few designs or make them unsuitable for particular situation. This is, again, multiplied by the context: it's the standard library. Those static constructors clearly don't rely on any other modules except for the one which gives the declaration for tzset (and has no static constructors). But if std.file needed a module constructor, we'd end up with a circular dependency between std.datetime and std.file when clearly nothing in std.datetime's static constructor relies on std.file in any way shape or form. It would be a huge improvement to be able to just mark those static constructors as not relying on any other modules having their static constructors run first. 
As it stands, it's a royal pain to deal with any circular dependencies which pop up and because of that, it quickly becomes best practice to avoid static constructors as much as possible, which is a big problem IMHO. I think this point has gotten into an extreme, a corner of the design space. Yeah, sky's blue, apple pie is good (and too much of it gives diabetes), and module dependencies can be messy. But it strikes me as a bit backwards to add instructions in the core language to lessen guarantees and make things even messier, when alternatives exist that foster better dependency control for the very rare situations that need intervention. It's just not proportional response. The persona using such a feature would be quite an odd combination - a developer with sophisticated enough needs to want unchecked dependencies as a feature, yet naive enough to be unable to solve the problem without the feature, and yet again sophisticated enough to not make mistakes in using said feature. Factoring out the static constructor's contents into a separate module is not always possible, and it's an ugly solution IMHO. I'd _much_ rather have a feature where I can tell the compiler that there is no circular dependency so that it can appropriately order the loading of the modules. But what's the appropriate order then? :o) Andrei
Re: Program size, linking matter, and static this()
How did other languages solve this issue? I can't imagine D being the only language with static constructors. Do they have that problem too?
Re: Program size, linking matter, and static this()
On 12/16/11 5:08 PM, Sean Kelly wrote: On Dec 16, 2011, at 1:38 PM, Trass3r wrote: A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! This was one of the major motivations for separating druntime from phobos. The last thing anyone wants is for something in runtime to print to the console and end up pulling in 80% of the standard library as a result. Well, right now druntime itself may have become the interdependency knot it once wanted to shun :o). Commenting out all static cdtors from druntime only reduced the code size from 218KB to 200KB for a do-nothing program, so most of druntime is compulsively linked and loaded. I think we can improve things a bit there. Andrei
Re: Program size, linking matter, and static this()
On 12/17/2011 12:11 AM, Sean Kelly wrote: On Dec 16, 2011, at 1:48 PM, Andrei Alexandrescu wrote: On 12/16/11 3:43 PM, Steven Schveighoffer wrote: This can be solved with malloc and emplace Sure you meant static ubyte[__traits(classInstanceSize, T)] and emplace :o). Don't forget the 16 byte alignment :-) Which is currently relatively easy: http://d.puremagic.com/issues/show_bug.cgi?id=6635
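The static-buffer-plus-emplace idea mentioned above could be sketched roughly as follows. The class `Point`, the buffer `storage`, and the helper `makePoint` are hypothetical names used only for illustration, and the sketch assumes a compiler that honours `align(16)` on a module-level array (the linked issue concerns exactly that kind of alignment support).

```d
import std.conv : emplace;

// Hypothetical class used only to demonstrate the technique.
class Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

// Statically allocated (thread-local) storage sized for one Point
// instance, with the 16-byte alignment requested explicitly.
align(16) ubyte[__traits(classInstanceSize, Point)] storage;

// Constructs a Point inside the static buffer instead of on the GC heap.
// Calling it again overwrites the previous object in place.
Point makePoint(int x, int y)
{
    return emplace!Point(storage[], x, y);
}

void main()
{
    auto p = makePoint(3, 4);
    assert(p.x == 3 && p.y == 4);
    // The object lives in the static buffer, not in a fresh allocation.
    assert(cast(void*) p == cast(void*) storage.ptr);
}
```

No allocation happens at run time with this approach, but the object's lifetime is tied to the buffer, so nothing may hold a reference to an earlier object once `makePoint` is called again.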
Re: Program size, linking matter, and static this()
On Dec 16, 2011, at 3:24 PM, Walter Bright wrote: On 12/16/2011 3:18 PM, maarten van damme wrote: how did other languages solve this issue? I can't imagine D being the only language with static constructors, do they have that problem too? In C++, the order that static constructors run is implementation defined. No guarantees at all. The programmer has no reasonable way to control the order in which they are done. (Of course, C++ doesn't even have modules, so the notion of a module constructor is tenuous at best.) This aspect of C++ drives me absolutely crazy. Though I imagine it bothers a lot of people given all the coverage static initialization has gotten in C++ literature.
Re: Program size, linking matter, and static this()
On Dec 16, 2011, at 3:16 PM, Andrei Alexandrescu wrote: On 12/16/11 5:08 PM, Sean Kelly wrote: On Dec 16, 2011, at 1:38 PM, Trass3r wrote: A related issue is phobos being an intermodule dependency monster. A simple hello world pulls in almost 30 modules! This was one of the major motivations for separating druntime from phobos. The last thing anyone wants is for something in runtime to print to the console and end up pulling in 80% of the standard library as a result. Well, right now druntime itself may have become the interdependency knot it once wanted to shun :o). The first place to look would be rt/. I know there's some tool that generates dependency graphs for D. Does Descent do that?
Re: Program size, linking matter, and static this()
On 12/17/2011 12:18 AM, maarten van damme wrote: how did other languages solve this issue? I can't imagine D being the only language with static constructors, do they have that problem too? Nobody has solved the issue. The approach in Java and C#, for instance, is to call the static constructor lazily upon class load time. That means it can be called at an arbitrary point during your program execution. And if you accidentally have circular dependencies between static constructors, your program may or may not blow up or behave badly.
Re: Program size, linking matter, and static this()
On Friday, December 16, 2011 17:13:49 Andrei Alexandrescu wrote:

Maybe there's an issue with the design. Maybe Singleton (the most damned of all patterns) is not the best choice here. Or maybe the use of an inheritance hierarchy with a grand total of 4 classes. Or maybe the encapsulation could be rethought. The general point is, a design lives within a language. Any language is going to disallow a few designs or make them unsuitable for particular situations. This is, again, multiplied by the context: it's the standard library.

I don't know what's wrong with singletons. It's a great pattern in certain circumstances. In this case, it avoids unnecessary allocations every time that you do something like Clock.currTime(). There's no reason to keep allocating new instances of LocalTime and wasting memory. The data in all of them would be identical. And since the time zone has to be dynamic, it requires either a class or function pointers (or delegates). And since multiple functions are involved per time zone, it's far cleaner to use a class. It has the added benefit of giving you a nice place to do stuff like ask the time zone its name. So, I don't see what could be better than using classes for the time zones like it does now. And given the fact that it's completely unnecessary and wasteful to allocate multiple instances of UTC and LocalTime, it seems to me that the singleton pattern is exactly the correct solution for this problem.

There would be fewer potential issues with circular dependencies if std.datetime were broken up, but the consensus seems to be that we don't want to do that. Regardless, if I find a way to lazily load the singletons in spite of immutable and pure, then there won't be any more need for the static constructors for them. There's still one for the unit tests, but worse comes to worst, that functionality could be moved to a function which is called by the first unittest block.

But what's the appropriate order then? :o)

It doesn't matter. The static constructors in std.datetime have no dependencies on other modules at all aside from object and the core module which holds the declaration for tzset. In neither case do they depend on any other static constructors. In my experience, that's almost always the case. But because of how circular dependencies are treated, the compiler/runtime considers it a circular dependency as soon as two modules which import each other directly - or worse, indirectly - both have module constructors, regardless of whether there is anything even vaguely interdependent about those static constructors and what they initialize. So, you're forced to move stuff into other modules, and in some cases (such as when pure or immutable is being used), that may not work.

Clearly, I'm not going to win any arguments on this, given that both you and Walter are definitely opposed, but I definitely think that the current situation with circular dependencies is one of D's major warts.

- Jonathan M Davis
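As an illustration of the lazy-loading idea mentioned in the post above, a singleton can be created on first use instead of in a static constructor. This is only a sketch under simplifying assumptions: `UtcZone` is a hypothetical stand-in, not std.datetime's actual class, and the accessor is neither `pure` nor does it return an `immutable` instance, which is precisely the difficulty the post refers to.

```d
// Hypothetical stand-in for a time zone class; not std.datetime's code.
final class UtcZone
{
    // Private constructor: instances are only created via instance().
    private this() {}

    // Returns the per-thread instance, creating it on first use.
    // No module constructor is involved, so no ordering check applies.
    static UtcZone instance()
    {
        static UtcZone cached; // function-local static, thread-local in D
        if (cached is null)
            cached = new UtcZone;
        return cached;
    }

    string name() const { return "UTC"; }
}

void main()
{
    auto a = UtcZone.instance();
    auto b = UtcZone.instance();
    assert(a is b);            // same object on every call in this thread
    assert(a.name() == "UTC");
}
```

The trade-off is that the accessor reads and writes mutable static state, so it cannot be marked `pure`, and each thread gets its own instance unless additional synchronization is added.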