[fpc-devel] IsMultiThread always true issue 30535
Hi,

is the commit from rev. 35567 compatible with the 3.0.x fixes branch? If so, could someone commit it there as well?

regards,
--
Dimitrios Chr. Ioannidis
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Staticaly link C/C++ library (.lib) into FreePascal on Windows
On Sun, Mar 19, 2017 at 5:43 AM, Sven Barth via fpc-devel <fpc-devel@lists.freepascal.org> wrote:

> On 19.03.2017 04:53, "silvioprog" wrote:
> > Unfortunately you can't use the static libraries (.a) of Intel because
> > they are generated for Linux, in spite of static libraries being
> > cross-platform.
>
> Nonsense. Static libraries are as platform-specific as any other binary
> code; after all, it needs to call OS functions.

Well, by cross-platform I mean anything that is implemented on multiple platforms, so since ar archives can be generated for multiple platforms, the term makes sense to me. :-)

> > I'm not sure about the .lib files. MS's COFF files adopt the .lib
> > extension, but the sizes below are a little bit strange:
> >
> > `libippi.a`:
> > . original - 251 MB;
> > . stripped - 192 MB.
> >
> > `libippi.lib`:
> > . original - 853 KB;
> > . stripped - no strip needed, it is already small.
>
> Seems like the second one is merely an import library for the DLL instead
> of a real static library.

Indeed.

> And of course that is COFF as well. MSVC only supports COFF.

There isn't only one kind of COFF; AFAIK MS has its own COFF style and MSVC supports only that. Sure, Intel must have used the MSVC one. However, my *suggestion* that LacaK confirm this was only because he can generate an object or a shared library from an MS COFF file, which would solve his problem!

--
Silvio Clécio
Re: [fpc-devel] Optimization of redundant mov's
Martok wrote:
> a:= CurrentHash[0]; b:= CurrentHash[1]; c:= CurrentHash[2]; d:= CurrentHash[3];
> 000100074943 488b8424a002 mov 0x2a0(%rsp),%rax
> 00010007494B 4c8b5038 mov 0x38(%rax),%r10
> 00010007494F 488b8424a002 mov 0x2a0(%rsp),%rax
> 000100074957 4c8b5840 mov 0x40(%rax),%r11
> 00010007495B 488b9424a002 mov 0x2a0(%rsp),%rdx
> 000100074963 488b4248 mov 0x48(%rdx),%rax
> 000100074967 488b9424a002 mov 0x2a0(%rsp),%rdx
> 00010007496F 488b6a50 mov 0x50(%rdx),%rbp
>
> Every single one of the "mov 0x2a0(%rsp), %rxx" instructions except the first
> is redundant and causes another memory round-trip. At the same time, more
> registers are used, which probably makes other optimizations more difficult,
> especially when something similar happens on i386.
>
> Now, the fun part: I haven't been able to build a simple test that causes the
> same issue (the self-pointer already is in %rcx and not fetched from the stack
> each time), so I have a feeling this may be a side effect of some other part
> of the code.

It's called register spilling: once there are no registers left to hold values, the compiler has to pick registers whose value will be kept in memory instead. Register allocation is an NP-complete problem, so the result will never be 100% optimal (at least if you don't want to wait forever while the compiler checks out all possible assignments).

One possible heuristic, which is used by FPC's register allocator, is to spill the register that conflicts with the largest number of other registers (to minimise the number of registers spilled to memory).

There are techniques to spill more optimally (e.g. live range splitting), and there are also other kinds of optimisations that could be run after register allocation to make the code more optimal. CSE at the assembler level could be used in this case. That's a very complex undertaking for relatively little gain though. E.g. those memory loads are probably optimised by the processor itself (not necessarily even coming from the L1 cache, but possibly from the write-back buffer).

Jonas
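The spill heuristic Jonas describes can be illustrated with a small sketch. This is not FPC's register allocator; it is a minimal Python model under stated assumptions: the live ranges, the variable names (`self_ptr`, `a` to `d`), and the stopping rule (stop once every remaining node has fewer conflicts than there are registers) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_interference(live_ranges):
    """Build a conflict graph from {name: (first_use, last_use)} ranges.

    Two values conflict when their (inclusive) live ranges overlap,
    i.e. they would need to sit in different registers at some point.
    """
    graph = defaultdict(set)
    names = list(live_ranges)
    for i, a in enumerate(names):
        graph[a]  # make sure isolated values still appear as nodes
        a_start, a_end = live_ranges[a]
        for b in names[i + 1:]:
            b_start, b_end = live_ranges[b]
            if a_start <= b_end and b_start <= a_end:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def choose_spills(graph, num_registers):
    """Greedily spill the value with the most conflicts.

    Repeats until every remaining value conflicts with fewer than
    num_registers others, which is sufficient for the rest to be
    assigned registers. Ties are broken alphabetically.
    """
    work = {name: set(neigh) for name, neigh in graph.items()}
    spilled = []
    while work and max(len(n) for n in work.values()) >= num_registers:
        victim = max(sorted(work), key=lambda name: len(work[name]))
        spilled.append(victim)
        for neighbour in work.pop(victim):
            work[neighbour].discard(victim)
    return spilled


# Hypothetical live ranges: the self pointer is live across the whole
# block, while a..d are short-lived temporaries.
ranges = {
    "self_ptr": (0, 9),
    "a": (1, 3),
    "b": (2, 4),
    "c": (5, 7),
    "d": (6, 8),
}
graph = build_interference(ranges)
print(choose_spills(graph, num_registers=2))
```

In this toy model the long-lived self pointer conflicts with every temporary, so it is the first spill candidate. That matches the pattern in the listing above, where the pointer is reloaded from `0x2a0(%rsp)` before each field access.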
Re: [fpc-devel] Staticaly link C/C++ library (.lib) into FreePascal on Windows
On 19.03.2017 04:53, "silvioprog" wrote:
>
> On Wed, Mar 15, 2017 at 4:38 AM, LacaK wrote:
>>>
>>> I forgot a question, could you send your ippi .a files for us? If so, I can try a test here. :-)
>>
>> Yes of course: I have uploaded them here http://uschovna.zoznam.sk/download?code=1342688547227-EZyyeVzToDVVkkbJNCbN
>> But be aware that I am on Windows, not Linux. (Despite this, I have also added to the ZIP the .a files as they are installed by Intel into the directory "Linux". Only .lib files are installed in the directory "Windows".)
>> If I can repeat my question: Can I use ".a" libraries also on Windows? If not, can I use ".lib" files created by C/C++ (I do not know how they are built)?
>> Thank you
>>
>> -Laco.
>
> Unfortunately you can't use the static libraries (.a) of Intel because they are generated for Linux, in spite of static libraries being cross-platform.

Nonsense. Static libraries are as platform-specific as any other binary code; after all, it needs to call OS functions.

> I'm not sure about the .lib files. MS's COFF files adopt the .lib extension, but the sizes below are a little bit strange:
>
> `libippi.a`:
> . original - 251 MB;
> . stripped - 192 MB.
>
> `libippi.lib`:
> . original - 853 KB;
> . stripped - no strip needed, it is already small.

Seems like the second one is merely an import library for the DLL instead of a real static library.

And of course that is COFF as well. MSVC only supports COFF.

Regards,
Sven
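One way to check the "import library" guess is to look inside the archive. The sketch below is an illustration, not something from the thread. It assumes the `.lib` uses the common `!<arch>` archive container and that short-format import members begin with the two 16-bit signature values 0x0000 and 0xFFFF, as described in Microsoft's PE/COFF specification. The function names are hypothetical, and the signature test is only a heuristic.

```python
import struct

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name 16, date 12, uid 6, gid 6, mode 8, size 10, end 2
SPECIAL_NAMES = {"/", "//", "/SYM64/"}  # symbol tables and long-name table


def iter_members(data):
    """Yield (name, payload) for each member of an ar-style archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar-style archive")
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        name = header[0:16].decode("ascii", "replace").rstrip()
        size = int(header[48:58].decode("ascii").strip())
        start = pos + HEADER_SIZE
        yield name, data[start:start + size]
        pos = start + size + (size & 1)  # members are 2-byte aligned


def is_short_import(payload):
    """Heuristic: short import members start with 0x0000, 0xFFFF."""
    if len(payload) < 4:
        return False
    sig1, sig2 = struct.unpack_from("<HH", payload)
    return sig1 == 0x0000 and sig2 == 0xFFFF


def summarize(data):
    """Return (object_members, import_members), skipping bookkeeping members."""
    total = imports = 0
    for name, payload in iter_members(data):
        if name in SPECIAL_NAMES:
            continue
        total += 1
        if is_short_import(payload):
            imports += 1
    return total, imports
```

Calling `summarize(open(path, "rb").read())` on a library returns the two counts. If nearly all members are short import records, the file is most likely an import library for a DLL; a real static library consists of full object files.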
Re: [fpc-devel] Some questions about compiler work on x86_64-win64
On 18.03.2017 23:11, "Bishop" wrote:
>
> 03/18/17 00:51:05, Sven Barth via fpc-devel <fpc-devel@lists.freepascal.org>:
>
> > Bo, the main sense of this is to detect when a new thread is started and, more importantly, terminated, because only with this can we free the threadvar area of the thread accordingly (if the thread is an external one, not one started using BeginThread or TThread).
>
> Thanks, now I understand how it works. Also, as I understand it, on Linux (and other Unixes) the threadvar area for external threads is allocated on first access (and freed via pthread's ability to call a destructor for a key).

Correct. (And my first word should have been "No", not "Bo"; stupid smartphone keyboard.)

> > Why *should* it be auto generated if one can use a table and let the RTL do the rest.
>
> Isn't it better to do everything that can be done at compile time? It is not a more complex solution for the compiler code, and as I see it, it is more harmonious. (This applies not only to INITFINAL but to all tables that the RTL uses to do the work of the compiler/linker, for example FPC_THREADVARTABLES. Different modules, I mean DLL or SO, use different TLS keys for their threadvar regions. But why must the offset of a variable from the beginning of the threadvar region be generated at runtime? Isn't that the linker's job?) Possibly this depends on those "dynamic packages"?

If you have different modules (the binary and the libraries) then they are *separate* entities, because a Pascal library could be used with a C binary, and thus a library has the whole RTL statically linked (or at least the part that is used). Only dynamic packages allow units to be transparently part of different binary modules while still forming one whole application. Package libraries can, however, only be used by a binary compiled with the same compiler, as they rely on quite a bit of compiler magic.

> > Also with the addition of dynamic packages this will move even more towards a table based approach.
>
> Where can I read about what they are and why we need them? What kind of problems are they meant to solve? We already have DLL/SO, and as far as I know and can see, that is enough for now. Possibly my knowledge is not enough to see the whole problem.

With dynamic packages you can share classes, strings, memory, etc. between the modules (the main binary and the different package libraries), because the RTL will exist only once. And all this is transparent for the user. When you use ordinary libraries you need to use a shared memory manager to pass strings around, and you can't use the "as" and "is" operators inside the main binary on classes passed in from the libraries (and by extension this also applies to exceptions).

> > But you can set the corresponding PE flag for ASLR using $SetPEOpts (or so). No recompilation needed in that case.
>
> I can. But what if I don't want an ASLR binary? That is totally valid.

Since ASLR is disabled by default in FPC, that question is useless.

> > Microsoft recommended that approach for Win64 so why should we do the work and implement it differently even if ASLR isn't enabled by default for FPC executables?
>
> A recommendation is not a law (as it is with SEH on Win64). C compilers allow both types of programs, depending on what the programmer needs. Does it take much work to change it? As I see it, it is just one check in the compiler code for global variables (if PIC is selected, use RIP-relative addressing; if not, use direct addressing). It is already done on Linux. I think it would be better to give the compiler user more possibilities when it costs almost nothing.

If it is so important to you: patches are welcome. But keep in mind that the default needs to be the status quo.

Regards,
Sven
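Whether a given Windows binary has the ASLR flag set can be checked by reading its PE header. The following Python sketch is an illustration only. The flag name `IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE` (0x0040) and the field offsets are taken from Microsoft's PE/COFF specification, not from this thread, and the helper name `has_dynamic_base` is hypothetical.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # image may be relocated (ASLR)
PE32_MAGIC = 0x10B
PE32_PLUS_MAGIC = 0x20B


def has_dynamic_base(image):
    """Return True if the PE image (as bytes) has the ASLR flag set."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS header")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    optional_header = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", image, optional_header)
    if magic not in (PE32_MAGIC, PE32_PLUS_MAGIC):
        raise ValueError("unknown optional header magic")
    # DllCharacteristics sits at offset 70 in both PE32 and PE32+.
    (characteristics,) = struct.unpack_from("<H", image, optional_header + 70)
    return bool(characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
```

Usage would be along the lines of `has_dynamic_base(open("project1.exe", "rb").read())`, where the file name is a placeholder. For a binary built with FPC's defaults, the thread implies this returns False.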
[fpc-devel] Optimization of redundant mov's
Hi all,

there has been some discussion about FPC's optimizer in #31444, prompting me to investigate some of my own code. Generally speaking, the generated assembler is not all that bad (I like how it uses LEA for almost all integer arithmetic), but I keep seeing sections with lots of redundant MOVs.

Example, from a SHA512 implementation: CurrentHash is a field of the current class, compiled with anything above -O2, -CpCOREAVX2, -Px86_64.

a:= CurrentHash[0]; b:= CurrentHash[1]; c:= CurrentHash[2]; d:= CurrentHash[3];
000100074943 488b8424a002 mov 0x2a0(%rsp),%rax
00010007494B 4c8b5038 mov 0x38(%rax),%r10
00010007494F 488b8424a002 mov 0x2a0(%rsp),%rax
000100074957 4c8b5840 mov 0x40(%rax),%r11
00010007495B 488b9424a002 mov 0x2a0(%rsp),%rdx
000100074963 488b4248 mov 0x48(%rdx),%rax
000100074967 488b9424a002 mov 0x2a0(%rsp),%rdx
00010007496F 488b6a50 mov 0x50(%rdx),%rbp

Every single one of the "mov 0x2a0(%rsp), %rxx" instructions except the first is redundant and causes another memory round-trip. At the same time, more registers are used, which probably makes other optimizations more difficult, especially when something similar happens on i386.

Now, the fun part: I haven't been able to build a simple test that causes the same issue (the self-pointer already is in %rcx and not fetched from the stack each time), so I have a feeling this may be a side effect of some other part of the code.

Does this sound familiar to anyone? If so, what could I do about it?

Regards,
Martok