Re: Why is GNU/Linux so Bloated?
Hi all!

On Thursday 11 June 2009 16:59:32 Shlomi Fish wrote:
> Hi all!
>
> Based on the gcc-4.4.0 (with -Os) / x86-Linux shared library sizes here:
> http://tech.groups.yahoo.com/group/fc-solve-discuss/message/998
>
> And the Visual C++/Win32 (also x86) .dll sizes here:
> http://tech.groups.yahoo.com/group/fc-solve-discuss/message/999
>
> My question is: why are the Visual C++ generated binaries so much smaller than the equivalent Linux ones? Any insights would be appreciated.

Replying to myself, I'd like to note that I recently fixed some build problems in the Freecell Solver distribution, and after I was through, MSVC now generates a larger .dll file, comparable in size to the gcc -Os one, i.e. 40K-50K. I guess it previously wasn't built correctly.

Regards,

Shlomi Fish

-- 
Shlomi Fish http://www.shlomifish.org/
Humanity - Parody of Modern Life - http://xrl.us/bkeut

God gave us two eyes and ten fingers so we will type five times as much as we read.

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
Re: Why is GNU/Linux so Bloated?
Hi Shlomi,

On Thu, Jun 25, 2009 at 04:20:00PM +0300, Shlomi Fish wrote:
> Replying to myself, I'd like to note that I recently fixed some build problems in the Freecell Solver distribution, and after I was through, MSVC now generates a larger .dll file, comparable in size to the gcc -Os one - i.e: 40K-50K. I guess it previously wasn't built correctly.

Care to elaborate on those build problems? What build option caused the binary size bloat?

baruch

-- 
Tk Open Systems
bar...@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il
Re: Why is GNU/Linux so Bloated?
On Thursday 25 June 2009 16:52:11 Baruch Siach wrote:
> Hi Shlomi,
>
> On Thu, Jun 25, 2009 at 04:20:00PM +0300, Shlomi Fish wrote:
> > Replying to myself, I'd like to note that I recently fixed some build problems in the Freecell Solver distribution, and after I was through, MSVC now generates a larger .dll file, comparable in size to the gcc -Os one - i.e: 40K-50K. I guess it previously wasn't built correctly.
>
> Care to elaborate on those build problems? What build option caused the binary size bloat?

Actually, it is the other way around. Some build issues caused MSVC to generate much smaller (and probably malfunctioning) binaries. I have now fixed several problems that caused the build to fail in the 2.32.0 release. With a previous release, however, the build did not fail (but generated the small binaries).

The NEWS file for 2.32.1 reads:

{
Version 2.32.1: (25-Jun-2009)

1. Added a #define BUILDING_DLL 1 so fcs_dllexport.h will work fine on Microsoft Visual C++.

2. Normalised the DLLEXPORT modifiers.

3. Some fixes to the CMake build system:
    - CHECK_C_COMPILER_FLAG now uses a different variable for each flag, since the variable was cached.
    - tcmalloc is now truly optional.

4. Moved the declaration of the strncasecmp(a,b,c) macro for WIN32 systems to before its first use.

5. All of this was done to fix many build/compilation problems.
}

Regards,

Shlomi Fish

> baruch

-- 
Shlomi Fish http://www.shlomifish.org/
Optimizing Code for Speed - http://xrl.us/begfgk
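For readers unfamiliar with the BUILDING_DLL / DLLEXPORT idiom mentioned in the NEWS entry, here is a minimal sketch of how such an export header is commonly written. It is not the actual content of fcs_dllexport.h, which is not shown in this thread; the function name example_lib_version_major is hypothetical, and only the macro names BUILDING_DLL and DLLEXPORT come from the message above.

```c
/* Sketch of a common export-macro pattern. This is an assumption about
 * the general idiom, NOT a copy of Freecell Solver's fcs_dllexport.h. */

/* The library's own sources define this before including the header,
 * so that its functions are marked as exported rather than imported. */
#define BUILDING_DLL 1

#if defined(_WIN32)
#  ifdef BUILDING_DLL
#    define DLLEXPORT __declspec(dllexport)  /* compiling the DLL itself */
#  else
#    define DLLEXPORT __declspec(dllimport)  /* compiling a client of the DLL */
#  endif
#else
#  define DLLEXPORT  /* expands to nothing on non-Windows platforms */
#endif

/* Hypothetical public function, standing in for the real
 * freecell_solver_user_* entry points. */
DLLEXPORT int example_lib_version_major(void)
{
    return 2;
}
```

In this pattern, a build that forgets to define BUILDING_DLL declares the library's functions as imports while compiling the library itself. Whether that is what produced the small MSVC binaries discussed above is not stated in the thread.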
Re: Why is GNU/Linux so Bloated?
On Thursday 11 June 2009 16:59:32 Shlomi Fish wrote:
> Hi all!
>
> Based on the gcc-4.4.0 (with -Os) / x86-Linux shared library sizes here:
> http://tech.groups.yahoo.com/group/fc-solve-discuss/message/998
>
> And the Visual C++/Win32 (also x86) .dll sizes here:
> http://tech.groups.yahoo.com/group/fc-solve-discuss/message/999
>
> My question is: why are the Visual C++ generated binaries so much smaller than the equivalent Linux ones? Any insights would be appreciated.

Replying to myself, I'd like to note that I did the following test:

    shlomi:~$ cat test_shared_lib.c
    int myzero() { return 0; }
    shlomi:~$ gcc -Os -o libtest.so -shared test_shared_lib.c
    shlomi:~$ ls -l libtest.so
    -rwxr-xr-x 1 shlomi shlomi 3768 2009-06-21 11:44 libtest.so
    shlomi:~$ strip libtest.so
    shlomi:~$ ls -l libtest.so
    -rwxr-xr-x 1 shlomi shlomi 2448 2009-06-21 11:44 libtest.so
    shlomi:~$

So the overhead of having a mostly-empty shared library is 2,448 bytes (or 3,768 bytes before strip), which isn't very high and doesn't explain why the gcc-generated code is so much larger than the MSVC-generated one.

Regards,

Shlomi Fish

-- 
Shlomi Fish http://www.shlomifish.org/
Funny Anti-Terrorism Story - http://xrl.us/bjn7t
Re: Why is GNU/Linux so Bloated?
On Thu, Jun 11, 2009, Shlomi Fish wrote about "Re: Why is GNU/Linux so Bloated?":
> I've compared the size of the Linux .so file (after -Os and strip) to the size of the Windows MSVC-generated .dll.
>
> With gcc -Os before strip - 86,464 bytes
> same after strip - 74,584

Shlomi, what did you expect strip to do for the shared object? It definitely does not, and cannot, remove the *dynamic* symbol table which is needed to link this library. Try nm -D on your library to see the dynamic symbol table even after the strip. Is it possible that gcc saves a lot of crap in this symbol table that Windows doesn't?

Finally, I have no idea what your makefile looks like, but make sure that you do not accidentally statically-link the C library into your shared object. You'll want to dynamically-link it (to add a dependency), but not statically link it (which will add some actual code from the C library into your shared library).

-- 
Nadav Har'El | Sunday, Jun 14 2009, 22 Sivan 5769
n...@math.technion.ac.il | Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il | Sign in pool: Welcome to our OOL. Notice there is no P, please keep it that way.
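To make the point about the dynamic symbol table concrete, the following is a rough sketch of a helper that counts the entries of the .dynsym section, which is the table that nm -D lists and that strip leaves in place. It is an illustration only: count_dynsyms is a hypothetical name, and the code assumes a Linux system with <elf.h> and <link.h>, and an ELF file of the same word size and byte order as the machine running it.

```c
/* Sketch: count the entries in an ELF file's dynamic symbol table.
 * Assumes Linux/glibc headers and a file matching the native ELF class. */
#include <elf.h>
#include <link.h>   /* ElfW() selects Elf32_* or Elf64_* for this platform */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Returns the number of .dynsym entries (including
 * the reserved null entry at index 0), or -1 on any error. */
long count_dynsyms(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    long result = -1;
    unsigned char native_class =
        (sizeof(ElfW(Addr)) == 8) ? ELFCLASS64 : ELFCLASS32;
    ElfW(Ehdr) eh;

    /* Validate the ELF magic, the word size and the section header size. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != native_class ||
        eh.e_shentsize != sizeof(ElfW(Shdr)))
        goto done;

    result = 0;
    /* Walk the section header table looking for SHT_DYNSYM sections.
     * (Files with more than 0xff00 sections are not handled here.) */
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        ElfW(Shdr) sh;
        long offset = (long)(eh.e_shoff + (ElfW(Off))i * sizeof sh);
        if (fseek(f, offset, SEEK_SET) != 0 ||
            fread(&sh, sizeof sh, 1, f) != 1) {
            result = -1;
            goto done;
        }
        if (sh.sh_type == SHT_DYNSYM && sh.sh_entsize != 0)
            result += (long)(sh.sh_size / sh.sh_entsize);
    }

done:
    fclose(f);
    return result;
}
```

Calling count_dynsyms on a shared library before and after strip should give the same number, since strip removes the ordinary symbol table (.symtab) but not .dynsym.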
Re: Why is GNU/Linux so Bloated?
On Sunday 14 June 2009 16:33:17 Nadav Har'El wrote:
> On Thu, Jun 11, 2009, Shlomi Fish wrote about "Re: Why is GNU/Linux so Bloated?":
> > I've compared the size of the Linux .so file (after -Os and strip) to the size of the Windows MSVC-generated .dll.
> >
> > With gcc -Os before strip - 86,464 bytes
> > same after strip - 74,584
>
> Shlomi, what did you expect strip to do for the shared object? It definitely does not, and cannot, remove the *dynamic* symbol table which is needed to link this library. Try nm -D on your library to see the dynamic symbol table even after the strip. Is it possible that gcc saves a lot of crap in this symbol table that Windows doesn't?

With nm -D I'm getting:

{{
     w _Jv_RegisterClasses
     U __assert_fail
da64 A __bss_start
     U __ctype_b_loc
     U __ctype_toupper_loc
     w __cxa_finalize
     w __gmon_start__
da64 A _edata
da70 A _end
aec8 T _fini
2658 T _init
     U atof
     U atoi
     U calloc
     U fclose
     U fgets
     U fopen
     U fread
     U free
7cda T freecell_solver_user_alloc
7b9b T freecell_solver_user_apply_preset
40e9 T freecell_solver_user_cmd_line_parse_args
3078 T freecell_solver_user_cmd_line_parse_args_with_file_nesting_count
7fd7 T freecell_solver_user_current_state_as_string
7b7f T freecell_solver_user_free
778f T freecell_solver_user_get_current_depth
7ffc T freecell_solver_user_get_invalid_state_error_string
7ab7 T freecell_solver_user_get_lib_version
7873 T freecell_solver_user_get_limit_iterations
78d3 T freecell_solver_user_get_max_num_decks
78bf T freecell_solver_user_get_max_num_freecells
78c9 T freecell_solver_user_get_max_num_stacks
7881 T freecell_solver_user_get_moves_left
80cf T freecell_solver_user_get_next_move
7a73 T freecell_solver_user_get_num_soft_threads_in_instance
7a40 T freecell_solver_user_get_num_states_in_collection
7860 T freecell_solver_user_get_num_times
7fb5 T freecell_solver_user_iter_state_as_string
777a T freecell_solver_user_limit_current_instance_iterations
78ae T freecell_solver_user_limit_depth
776c T freecell_solver_user_limit_iterations
7a4e T freecell_solver_user_limit_num_states_in_collection
80c3 T freecell_solver_user_move_to_string
80a1 T freecell_solver_user_move_to_string_w_state
7f56 T freecell_solver_user_next_hard_thread
7cf9 T freecell_solver_user_next_instance
7f83 T freecell_solver_user_next_soft_thread
7e7d T freecell_solver_user_recycle
7cbc T freecell_solver_user_reset
8127 T freecell_solver_user_resume_solution
7981 T freecell_solver_user_set_a_star_weight
7a81 T freecell_solver_user_set_calc_real_depth
7947 T freecell_solver_user_set_empty_stacks_filled_by
77c9 T freecell_solver_user_set_game
7ecd T freecell_solver_user_set_hard_thread_prelude
79ee T freecell_solver_user_set_iter_handler
77c2 T freecell_solver_user_set_num_decks
77b4 T freecell_solver_user_set_num_freecells
77bb T freecell_solver_user_set_num_stacks
7d8a T freecell_solver_user_set_optimization_scan_tests_order
7a29 T freecell_solver_user_set_random_seed
7a92 T freecell_solver_user_set_reparent_states
7aa3 T freecell_solver_user_set_scans_synergy
7917 T freecell_solver_user_set_sequence_move
78dd T freecell_solver_user_set_sequences_are_built_by_type
7f17 T freecell_solver_user_set_soft_thread_name
7a5f T freecell_solver_user_set_soft_thread_step
789d T freecell_solver_user_set_solution_optimization
77a0 T freecell_solver_user_set_solving_method
7de4 T freecell_solver_user_set_tests_order
8385 T freecell_solver_user_solve_board
     U fseek
     U ftell
     U getenv
     U malloc
     U memcmp
     U memmove
     U memset
     U pow
     U puts
     U qsort
     U realloc
     U sprintf
     U strchr
     U strcmp
     U strcpy
     U strdup
     U strncasecmp
     U strncmp
     U strncpy
     U vsprintf
}}

It's everything I expect it to be and not more - the external API and the functions it imports from libc. However, running strip on the MSVC-generated .dll's generates a .dll under 20KB that also seems to be fully functional.

> Finally, I have no idea what your makefile looks like, but make sure that you do not accidentally statically-link the C library into your shared object. You'll want to dynamically-link it (to add a dependency), but not statically link it (which will add some actual code from the C library into your shared library).

I'm not statically linking libc. With libc statically linked, the .so is much larger:

{
$ ldd libfreecell-solver.so.0
    linux-gate.so.1 => (0xe000)
    libm.so.6 => /lib/i686/libm.so.6 (0xb7e81000)
    libc.so.6 => /lib/i686/libc.so.6 (0xb7d1d000)
    /lib/ld
Re: Why is GNU/Linux so Bloated?
Shachar Shemesh shac...@shemesh.biz writes:
> I'm not sure whether base addresses are allocated randomly or something else is at work here, but collisions are not that common.

You can manually rebase a DLL at post-link time, and I think that DLLs shipped by commercial vendors (such as MS :) have precomputed base addresses to avoid the overhead of load-time relocations. If many of the DLLs in your experience came from 3rd parties then this may explain your observation.

http://en.wikipedia.org/wiki/Portable_Executable#Relocations
http://www.ddj.com/184416272

-- 
Oleg Goldshmidt | p...@goldshmidt.org
Re: Why is GNU/Linux so Bloated?
On Fri, Jun 12, 2009 at 10:21:56AM +0300, Oleg Goldshmidt wrote:
> Shachar Shemesh shac...@shemesh.biz writes:
> > I'm not sure whether base addresses are allocated randomly or something else is at work here, but collisions are not that common.
>
> You can manually rebase a DLL at post-link time,

On Linux you use the package 'prelink'.

> and I think that DLLs shipped by commercial vendors (such as MS :) have precomputed base addresses to avoid the overhead of load-time relocations. If many of the DLLs in your experience came from 3rd parties then this may explain your observation.
>
> http://en.wikipedia.org/wiki/Portable_Executable#Relocations
> http://www.ddj.com/184416272

If you care about that, install prelink. IIRC distributions tend to configure it to run a weekly cron job to prelink all binaries on the system.

-- 
Tzafrir Cohen         | tzaf...@jabber.org | VIM is
http://tzafrir.org.il |                    | a Mutt's
tzaf...@cohens.org.il |                    | best
ICQ# 16849754         |                    | friend
Re: Why is GNU/Linux so Bloated?
On Friday 12 June 2009 00:13:45 Ori Berger wrote:
> Shlomi Fish wrote:
> > I've compared the size of the Linux .so file (after -Os and strip) to the size of the Windows MSVC-generated .dll.
> >
> > With gcc -Os before strip - 86,464 bytes
> > same after strip - 74,584
> >
> > With gcc -Os that can solve Freecell only - before strip: 71,440
> > After strip - 60,312
> >
> > Now on Windows, Visual C++ generated the files in:
> > http://tech.groups.yahoo.com/group/fc-solve-discuss/message/999
> > I have some freecell-solver.dll's there and since it is the output of cygwin's ls -l, you can determine their size. The Freecell-only DLL after strip is 18,944 bytes long.
>
> http://www.nedprod.com/programs/gccvisibility.html
>
> This is a good description of what might be causing this, and how to solve it. As the page notes, the visibility option has been integrated into gcc 4, but you do have to use it.

Implementing this for libfreecell-solver.so (in a separate branch for the time being) has made the situation a bit better: 53,332 bytes instead of 59,492 bytes. Reportedly (in your link), it should also make the library more performant, which I'd like to try out.

Nevertheless, the MSVC .dll is still much smaller.

Regards,

Shlomi Fish

-- 
Shlomi Fish http://www.shlomifish.org/
Stop Using MSIE - http://www.shlomifish.org/no-ie/
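As an illustration of the visibility approach discussed above, here is a small sketch of how symbols are typically marked with GCC 4 and later. The macro and function names (EXAMPLE_PUBLIC, EXAMPLE_LOCAL, example_internal_helper, example_public_entry) are hypothetical and are not taken from Freecell Solver; the sketch assumes the library is compiled with gcc's -fvisibility=hidden option so that everything not explicitly marked stays out of the dynamic symbol table.

```c
/* Sketch of GCC symbol visibility annotations (names are hypothetical). */
#if defined(__GNUC__) && __GNUC__ >= 4
#  define EXAMPLE_PUBLIC __attribute__((visibility("default")))
#  define EXAMPLE_LOCAL  __attribute__((visibility("hidden")))
#else
#  define EXAMPLE_PUBLIC
#  define EXAMPLE_LOCAL
#endif

/* Used by several source files inside the library, but not part of the
 * external API: hidden, so it needs no dynamic symbol table entry. */
EXAMPLE_LOCAL int example_internal_helper(int x)
{
    return x * 2;
}

/* Part of the external API: stays visible to programs that link
 * against the shared library. */
EXAMPLE_PUBLIC int example_public_entry(int x)
{
    return example_internal_helper(x) + 1;
}
```

After building with something like gcc -Os -fPIC -fvisibility=hidden -shared, nm -D on the result should list example_public_entry but not example_internal_helper.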
Re: Why is GNU/Linux so Bloated?
Oleg Goldshmidt wrote:
> Shachar Shemesh shac...@shemesh.biz writes:
> > I'm not sure whether base addresses are allocated randomly or something else is at work here, but collisions are not that common.
>
> You can manually rebase a DLL at post-link time, and I think that DLLs shipped by commercial vendors (such as MS :) have precomputed base addresses to avoid the overhead of load-time relocations. If many of the DLLs in your experience came from 3rd parties then this may explain your observation.

When they did not come from 3rd parties, I got a link warning when I linked my application, and I rebased them. Either way, actual collisions were not that frequent.

Shachar

> http://en.wikipedia.org/wiki/Portable_Executable#Relocations
> http://www.ddj.com/184416272

-- 
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com
Why is GNU/Linux so Bloated?
Hi all!

Based on the gcc-4.4.0 (with -Os) / x86-Linux shared library sizes here:
http://tech.groups.yahoo.com/group/fc-solve-discuss/message/998

And the Visual C++/Win32 (also x86) .dll sizes here:
http://tech.groups.yahoo.com/group/fc-solve-discuss/message/999

My question is: why are the Visual C++ generated binaries so much smaller than the equivalent Linux ones? Any insights would be appreciated.

Regards,

Shlomi Fish

-- 
Shlomi Fish http://www.shlomifish.org/
Funny Anti-Terrorism Story - http://xrl.us/bjn7t
Re: Why is GNU/Linux so Bloated?
Shlomi Fish shlo...@iglu.org.il writes:
> Hi all!
>
> Based on the gcc-4.4.0 (with -Os) / x86-Linux shared library sizes here:
> http://tech.groups.yahoo.com/group/fc-solve-discuss/message/998
>
> And the Visual C++/Win32 (also x86) .dll sizes here:
> http://tech.groups.yahoo.com/group/fc-solve-discuss/message/999
>
> My question is: why are the Visual C++ generated binaries so much smaller than the equivalent Linux ones? Any insights would be appreciated.

Shlomi,

The short answer is, I don't know. I didn't even try to figure out where the apples and the oranges were in your fc-solve-discuss postings. Since you don't list files and sizes (at least not in any way I can decipher, being unfamiliar with the project) or specify how you compile and link (apart from -Os), I don't know if you compare apples to apples.

I'll assume you compare dynamically linked executables on Linux/gcc and on Windows/cl, and the corresponding so and dll libraries.

I'll wave my hands wildly and offer a couple of guesses that you can try to investigate. They may be completely off the mark.

1) You probably know that DLLs work differently from Linux shared libraries. DLLs contain relocatable code that uses a preferred base address to which the loader will want to map the file. If a process is linked against several libraries all but one need to be relocated to other free addresses, COW-ed while fixing the addresses, independently paged, etc. This also means that DLLs are dynamically loaded, but not shared (they can only be shared between processes with the same memory layout).

Linux shared libraries contain position-independent code (PIC) that uses only relative (to the program counter) addresses. These libraries are really shared. PIC implies address translation tables that are filled at load time, but I suppose they are allocated at link time. This may be one source of size overhead. I have no idea how important this overhead is, you need to consult the experts.

There are (or at least used to be) -fPIC and -fpic options to GCC. IIRC, -fpic implied a limit on the size of translation tables, and refused to build if the resulting tables were too large. In comparison, -fPIC implies no limits. However, I seem to recall that the limits were quite small.

2) I suppose that the structure of the code is important. E.g., does your optimization include inlining? Inlining replicates code, objects, etc., hence it may affect something. I am not sure if -Os overrides inlining.

3) Do you use exceptions a lot? IIRC, GCC generates stack unwinding information for each function that may throw an exception (unless something changed - you are using a recent version). This information is stored in the executable. I don't know if the MS compiler does the same thing.

-- 
Oleg Goldshmidt | p...@goldshmidt.org
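To illustrate guess 2) about inlining and code size, here is a small sketch, not taken from Freecell Solver, contrasting a helper the compiler is free to inline with one that is forced to remain a single out-of-line copy. All names are hypothetical, and __attribute__((noinline)) is a GCC extension; whether a given compiler actually duplicates the inline helper's body at each call site depends on its heuristics and optimization level.

```c
/* Sketch: inlining can replicate a function body at every call site. */
#if defined(__GNUC__)
#  define EXAMPLE_NOINLINE __attribute__((noinline))
#else
#  define EXAMPLE_NOINLINE
#endif

/* The compiler may paste this body into each caller. */
static inline int example_square(int x)
{
    return x * x;
}

/* With GCC, this stays a single shared copy that callers jump to. */
static EXAMPLE_NOINLINE int example_square_shared(int x)
{
    return x * x;
}

/* Two call sites: up to two copies of the multiplication if inlined. */
int example_sum_squares_inlined(int a, int b)
{
    return example_square(a) + example_square(b);
}

/* Two call sites, but only one copy of the helper's code. */
int example_sum_squares_shared(int a, int b)
{
    return example_square_shared(a) + example_square_shared(b);
}
```

For a body this small, inlining is usually the smaller choice as well, since a call sequence costs more bytes than a single multiply; the size trade-off only starts to matter for larger helpers with many call sites.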
Re: Why is GNU/Linux so Bloated?
On Thursday 11 June 2009 22:22:13 Oleg Goldshmidt wrote:
> Shlomi Fish shlo...@iglu.org.il writes:
> > Hi all!
> >
> > Based on the gcc-4.4.0 (with -Os) / x86-Linux shared library sizes here:
> > http://tech.groups.yahoo.com/group/fc-solve-discuss/message/998
> >
> > And the Visual C++/Win32 (also x86) .dll sizes here:
> > http://tech.groups.yahoo.com/group/fc-solve-discuss/message/999
> >
> > My question is: why are the Visual C++ generated binaries so much smaller than the equivalent Linux ones? Any insights would be appreciated.
>
> Shlomi,
>
> The short answer is, I don't know. I didn't even try to figure out where the apples and the oranges were in your fc-solve-discuss postings. Since you don't list files and sizes (at least not in any way I can decipher, being unfamiliar with the project) or specify how you compile and link (apart from -Os), I don't know if you compare apples to apples.

I've compared the size of the Linux .so file (after -Os and strip) to the size of the Windows MSVC-generated .dll.

With gcc -Os before strip - 86,464 bytes
same after strip - 74,584

With gcc -Os that can solve Freecell only - before strip: 71,440
After strip - 60,312

Now on Windows, Visual C++ generated the files in:
http://tech.groups.yahoo.com/group/fc-solve-discuss/message/999
I have some freecell-solver.dll's there and since it is the output of cygwin's ls -l, you can determine their size. The Freecell-only DLL after strip is 18,944 bytes long.

> I'll assume you compare dynamically linked executables on Linux/gcc and on Windows/cl, and the corresponding so and dll libraries.

I'm only interested in the shared-libraries/DLLs. I should note that from what I tried MinGW's gcc (rather old) also generates relatively large .so files (that's what I ended up with - don't know if they are valid DLLs), on Windows (same computer as the one I have Visual C++ on). Hopefully, I'll play with it more tomorrow.

> I'll wave my hands wildly and offer a couple of guesses that you can try to investigate. They may be completely off the mark.
>
> 1) You probably know that DLLs work differently from Linux shared libraries. DLLs contain relocatable code that uses a preferred base address to which the loader will want to map the file. If a process is linked against several libraries all but one need to be relocated to other free addresses, COW-ed while fixing the addresses, independently paged, etc. This also means that DLLs are dynamically loaded, but not shared (they can only be shared between processes with the same memory layout).
>
> Linux shared libraries contain position-independent code (PIC) that uses only relative (to the program counter) addresses. These libraries are really shared. PIC implies address translation tables that are filled at load time, but I suppose they are allocated at link time. This may be one source of size overhead. I have no idea how important this overhead is, you need to consult the experts.
>
> There are (or at least used to be) -fPIC and -fpic options to GCC. IIRC, -fpic implied a limit on the size of translation tables, and refused to build if the resulting tables were too large. In comparison, -fPIC implies no limits. However, I seem to recall that the limits were quite small.

OK, I'll try.

> 2) I suppose that the structure of the code is important. E.g., does your optimization include inlining? Inlining replicates code, objects, etc., hence it may affect something. I am not sure if -Os overrides inlining.

I am using inlining. However, Visual C++ also respects that (via its ANSI C __inline keyword that I'm using). Maybe it does a better job with it with MinSizeRel than gcc does.

> 3) Do you use exceptions a lot? IIRC, GCC generates stack unwinding information for each function that may throw an exception (unless something changed - you are using a recent version). This information is stored in the executable. I don't know if the MS compiler does the same thing.

I don't have any exceptions. This is ANSI C code.

Regards,

Shlomi Fish

-- 
Shlomi Fish http://www.shlomifish.org/
Star Trek: We, the Living Dead - http://xrl.us/omqz4
Re: Why is GNU/Linux so Bloated?
Shlomi Fish wrote:
> I've compared the size of the Linux .so file (after -Os and strip) to the size of the Windows MSVC-generated .dll.
>
> With gcc -Os before strip - 86,464 bytes
> same after strip - 74,584
>
> With gcc -Os that can solve Freecell only - before strip: 71,440
> After strip - 60,312
>
> Now on Windows, Visual C++ generated the files in:
> http://tech.groups.yahoo.com/group/fc-solve-discuss/message/999
> I have some freecell-solver.dll's there and since it is the output of cygwin's ls -l, you can determine their size. The Freecell-only DLL after strip is 18,944 bytes long.

http://www.nedprod.com/programs/gccvisibility.html

This is a good description of what might be causing this, and how to solve it. As the page notes, the visibility option has been integrated into gcc 4, but you do have to use it.
Re: Why is GNU/Linux so Bloated?
Oleg Goldshmidt wrote:
> If a process is linked against several libraries all but one need to be relocated to other free addresses,

No, this statement is very far from the truth. I have worked quite a lot with DLLs, and very rarely saw the linker message saying that two DLLs require overlapping base addresses. I'm not sure whether base addresses are allocated randomly or something else is at work here, but collisions are not that common.

What you did neglect to count, however, is the size of the runtime library. Also, there may be differences in the executable size that do not translate to final in-memory size, and it's not clear which -Os is trying to minimize.

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com