Re: [Tinycc-devel] Minimizing libtcc memory use

2024-03-08 Thread Eric Raible
First off: if you are tired of this conversation, just tell me; I get it.


> > # text 32, data.rw  4, data.ro  4, bss 4 bytes
> > # memory usage: 8192 to run, 649 symbols, 2901 other, 1639290 max (bytes)
> > mem_cur_size=11742 (bytes)
>
> > So tcc_print_stats() says 11742, but then displays values totaling 11786.
>
> What 11786 ?
>
> 8192 + 649 + 2901 = 11742
>
I misinterpreted the output, and didn't realize that the 44 bytes
from text, data.rw, data.ro, and bss (32 + 4 + 4 + 4) were already included.


> There is nothing wrong here:  11742 + 32 blocks * 96 = 14814
> Where 96 is the size of tcc's mem_debug extra header.
>
> If you want to see the 11742 from valgrind then you just need to
> run the same example with a normal tcc compiled without MEM_DEBUG.
>
> Which makes sense I would think.
>
100%


> But when showing the example with MEM_DEBUG and -bench -vv  I
> did not expect you to doubt the numbers in the first place.
>
I shouldn't have.  But once my allocator and valgrind agreed,
I went down the wrong path, especially since I was parsing the
output incorrectly.


> Rather, I was just trying to show how you could get some numbers
> for your own real case instead, which, as you suspect, could be
> minimized from 29 kB down to 1-2 kB.  Most likely that's impossible, but
> if we had some numbers we could also tell why.
>

And showing me was helpful; I use it below to get some numbers.

I don't know enough about the internals of tcc to really even be having
this conversation.  But I do know that my interactive application can
have many TCCStates, and with:

built with
./configure --extra-cflags="-DMEM_DEBUG"
and with tcc_set_options(state, "-Werror -vv -bench");
a simple hello world (single-TCCState) example reports:
-
0: .text     0x14b57000  len 001bc  align 1000
1: .data.ro  0x14b571c0  len 00030  align 0008
2: .data     0x14b571f0  len 00078  align 0008
2: .bss      0x14b57268  len 00050  align 0008
2: .got      0x14b572b8  len 00030  align 0008
-
protect rwx 0x14b57000  len 01000
-

That looks great!

But tcc has actually allocated 21248 bytes, according to both my
(more sophisticated) custom allocator and valgrind
(for instance, if I _don't_ delete the state).

So ~80% of the bytes are unaccounted for.
This scales exactly with the number of states, so it's not ideal.

My loaded C has a callback to register all of its functions
with the main application.  After that, I have no need for
tcc_get_symbol() support, or in fact anything from libtcc
except for tcc_delete().

So I'm trying to free early all of the unneeded stuff that
tcc_delete() will eventually free anyway; that is what I was
calling tcc_finalize().  The goal is to reduce the minimal TCCState
size from ~21 kB to the 4 kB page required for PROT_EXEC.

- Eric
___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel


Re: [Tinycc-devel] Minimizing libtcc memory use

2024-03-08 Thread grischka via Tinycc-devel

On 08.03.2024 07:30, Eric Raible wrote:

> I guess that I just want the numbers to add up.
> Using your example:
>
> 1) -DMEM_DEBUG -DCONFIG_RUNMEM_RO=0
> 2) your test.c
> 3) but I added an early return to tcc_delete() to no-op it
>
> Running: valgrind tcc -nostdlib -vv -bench -run test.c
> produced:
>
> tcc version 0.9.28rc 2024-03-03 mob@9d2068c6* (AArch64 Linux)
> -> test.c
> -
> 0: .text     0x4ccb000  len 00020  align 1000
> 1: .data.ro  0x4ccb020  len 4  align 0008
> 2: .data     0x4ccb028  len 4  align 0008
> 2: .bss      0x4ccb030  len 4  align 0008
> -
> protect rwx 0x4ccb000  len 01000
> -
> # 3030 idents, 4 lines, 92 bytes
> # 0.463 s, 8 lines/s, 0.0 MB/s
> # text 32, data.rw  4, data.ro  4, bss 4 bytes
> # memory usage: 8192 to run, 649 symbols, 2901 other, 1639290 max (bytes)
> mem_cur_size=11742 (bytes)
>
> So tcc_print_stats() says 11742, but then displays values totaling 11786.


What 11786 ?

8192 + 649 + 2901 = 11742

> ==2188==
> ==2188== HEAP SUMMARY:
> ==2188== in use at exit: 14,814 bytes in 32 blocks
>
> And valgrind reports 14814.  I have never seen valgrind be wrong about this,
> especially since my (luckily correct) allocator reported 14814 as well.

There is nothing wrong here:  11742 + 32 blocks * 96 = 14814
Where 96 is the size of tcc's mem_debug extra header.

If you want to see the 11742 from valgrind then you just need to
run the same example with a normal tcc compiled without MEM_DEBUG.

Which makes sense I would think.

But when showing the example with MEM_DEBUG and -bench -vv  I
did not expect you to doubt the numbers in the first place.

Rather, I was just trying to show how you could get some numbers
for your own real case instead, which, as you suspect, could be
minimized from 29 kB down to 1-2 kB.  Most likely that's impossible, but
if we had some numbers we could also tell why.

-- gr




Re: [Tinycc-devel] Minimizing libtcc memory use

2024-03-06 Thread Eric Raible
I'm sorry for the delayed response.  I've been camping, with no wifi for a
while...

On Mon, Mar 4, 2024 at 3:47 AM grischka via Tinycc-devel <tinycc-devel@nongnu.org> wrote:

> On 03.03.2024 21:26, Eric Raible wrote:
> >  > isn't there a garbage collecting done at the end to remove all the unused stuff
> >  > to produce a binary that contains only the necessary parts ?
> >
> > That very well might be the case, but given that tcc_get_symbol()
> > can be used at any time between tcc_relocate() and tcc_delete(),
> > it follows that _at least_ symbols are resident in the TCCState.
> > What I'm wondering about is the feasibility of keeping just code and
> > data, and flushing everything else.  This would require a new API -
> > something like tcc_finalize(TCCState *) or perhaps
> > tcc_finalize(TCCState *, flags), where flags specify what to flush.
>
> Me thinks this discussion is going around in circles.
>

I'm sorry for that.  tcc as it is has been tremendously useful to me,
and all I'm trying to do is help.  Btw, I was not advocating for any changes
to tcc_relocate(), except that once it became a 1-arg function I suggested
that it could be eliminated and that tcc_add_symbol() could call it on demand.

> Your tcc_finalize() subject of obsession was what we already had
> all the time since 0.9.25, that is, the option to allocate the
> executable code separately from the state and therefore the option
> to delete (finalize) the state and still keep the code.
>

I understand that now.  I had previously used only automatic allocation
in tcc_relocate(), and since it worked I never explored further.  My main
interest was in tcc_set_realloc(), and I am happy that it was "approved".

...
> At which point I found that maintaining two options for running
> code might not be to the best for both sides, neither for tcc wrt.
> maintainability nor for users wrt. avoidance of inconvenience.
> ...
> Where simple plus complicated is more complicated than just
> complicated btw.
>

I understand the frustration and agree that maintainability is a
crucial consideration.


> > I don't know enough about the internals, but if I'm willing to run with
> > CONFIG_RUNMEM_RO, it seems like the per TCCState memory use in my case
> > could be decreased from something like 29K to 1K or 2K.
> >
> > I should mention that the memory usage in my case is 29K regardless
> > of whether CONFIG_RUNMEM_RO is 0 or 1.
>
> How do you know this?
>

Because I installed a custom allocator with tcc_set_realloc().
My exact case is hard for anyone but me to reproduce, but the
following standalone example illustrates that mem_cur_size
might be incorrect:

#if 0
Everything here assumes the following patch:

diff --git a/libtcc.c b/libtcc.c
index 6d720e74..332ec13d 100644
--- a/libtcc.c
+++ b/libtcc.c
@@ -844,6 +844,12 @@ LIBTCCAPI TCCState *tcc_new(void)
 LIBTCCAPI void tcc_delete(TCCState *s1)
 {
+    if (s1->do_bench) {
+        fprintf(stderr, "Top of tcc_delete()\n");
+        tcc_print_stats(s1, 0);
+        fprintf(stderr, "Early return from tcc_delete() - expect leaks!\n");
+        return;
+    }
     /* free sections */
     tccelf_delete(s1);
@@ -2209,5 +2215,6 @@ PUB_FUNC void tcc_print_stats(TCCState *s1, unsigned total_time)
     }
 #endif
     fprintf(stderr, " %d max (bytes)\n", mem_max_size);
+    fprintf(stderr, "mem_cur_size=%d (bytes)\n", mem_cur_size);
 #endif
 }

Running this program under valgrind on my ARM Debian machine with either
of these configs seems to show that mem_max_size is undercounting:

./configure --extra-cflags="-DMEM_DEBUG"
./configure --extra-cflags="-DMEM_DEBUG -DCONFIG_RUNMEM_RO=0"
#endif

#include <...>  /* three system header names were lost in the archive; */
#include <...>  /* the code below needs at least <stdio.h> (fprintf) */
#include <...>  /* and <stdlib.h> (realloc, free, exit) */
#include "libtcc.h"

/* each block is preceded by a header holding the requested size */
typedef struct block { unsigned long size; char data[]; } block;

#define _B2S(block) ((block) ? (block)->size : 0)
#define _B2D(block) ((block) ? (block)->data : 0)
#define _D2B(data)  ((data)  ? (block *)(data) - 1 : 0)
#define _D2S(data)  _B2S(_D2B(data))

static unsigned int blocks, mcalls, rcalls, fcalls, bytes, max;

static void *reallocator(void *data, unsigned long want)
{
    block *b = _D2B(data);
    int had = _D2S(data);
    if (want) {
        // some accounting; sizeof *b is the size of the header
        if (had) { ++rcalls; bytes -= had + sizeof *b; }
        else     { ++blocks; ++mcalls; }
        bytes += want + sizeof *b;
        if (max < bytes) max = bytes;
        b = realloc(b, want + sizeof *b);
        b->size = want;
    } else if (had) {
        --blocks; ++fcalls;
        bytes -= had + sizeof *b;
        free(b);
        b = 0;
    }
    return _B2D(b);
}

static void fatal(char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

int main()
{
    tcc_set_realloc(reallocator);
    TCCState *state = tcc_new();
    if (!state)
        fatal("excuse me");
    tcc_set_options(state, "-bench");
    tcc_set_output_type(state, TCC_OUTPUT_MEMORY);
    if (tcc_compile_string(state, "int x, y, z=1;\n"
                                  "int foo(int arg) { return arg + z; }"))
        fatal("sorry");
    if (tcc_relocate(state))
        fatal("so

Re: [Tinycc-devel] Minimizing libtcc memory use

2024-03-04 Thread grischka via Tinycc-devel

On 03.03.2024 21:26, Eric Raible wrote:

> > isn't there a garbage collecting done at the end to remove all the unused stuff
> > to produce a binary that contains only the necessary parts ?
>
> That very well might be the case, but given that tcc_get_symbol()
> can be used at any time between tcc_relocate() and tcc_delete(),
> it follows that _at least_ symbols are resident in the TCCState.
> What I'm wondering about is the feasibility of keeping just code and
> data, and flushing everything else.  This would require a new API -
> something like tcc_finalize(TCCState *) or perhaps
> tcc_finalize(TCCState *, flags), where flags specify what to flush.


Me thinks this discussion is going around in circles.

Your tcc_finalize() subject of obsession was what we already had
all the time since 0.9.25, that is, the option to allocate the
executable code separately from the state and therefore the option
to delete (finalize) the state and still keep the code.

It just happened that at some point people found that tcc should not
only let them have the cake for their own, but that they also wanted
tcc to provide them with some extra convenience to clean up with it
(including unprotecting memory and releasing the unwind table on win64).

At which point I found that maintaining two options for running
code might not be to the best for both sides, neither for tcc wrt.
maintainability nor for users wrt. avoidance of inconvenience.

Of course now that we have that simpler API, it was clear that
soon someone would come up suspecting some benefit if it were not
only simple but complicated too, optionally of course.

Where simple plus complicated is more complicated than just
complicated btw.

And it's completely unclear what such a benefit could be,
or under what scenario.


> I don't know enough about the internals, but if I'm willing to run with
> CONFIG_RUNMEM_RO, it seems like the per TCCState memory use in my case
> could be decreased from something like 29K to 1K or 2K.
>
> I should mention that the memory usage in my case is 29K regardless
> of whether CONFIG_RUNMEM_RO is 0 or 1.


How do you know this?

Some output from tcc -nostdlib -vv -bench -run test.c
(tcc compiled with -DMEM_DEBUG -DCONFIG_RUNMEM_RO=0)

  int x = 1;                   // .data
  int y;                       // .bss
  const int z;                 // .rdata
  int _start() { return 0; }   // .text

tcc version 0.9.28rc 2024-03-04 mob@d4f7fcbb* (i386 Windows)
-> test.c
-
memory  007B6390  len 02000
-
0: .text     007B7000  len 00018  align 1000
1: .rdata    007B7040  len 4  align 0040
2: .data     007B7044  len 4  align 0004
2: .bss      007B7048  len 4  align 0004
-
protect rwx 007B7000  len 01000
-
# 1074 idents, 6980 lines, 137488 bytes
# 0.001 s, 698 lines/s, 137.5 MB/s
# text 17, data.rw 4, data.ro 4, bss 4 bytes
# memory usage: 10250 bytes (8192 to run, 499 for 14 symbols, 1559 for state etc)

Which is 8kB to run and 2kB extra.

Same for -run tcc.c
-
memory  00E159D8  len 63000
-
0: .text     00E16000  len 43418  align 1000
1: .rdata    00E59440  len 08c00  align 0040
2: .data     00E62040  len 00311  align 0004
2: .bss      00E62358  len 15938  align 0008
-
protect rwx 00E16000  len 44000
-
# 29262 idents, 99132 lines, 3377368 bytes
# 0.202 s, 490752 lines/s, 16.7 MB/s
# text 270677, data.rw 4, data.ro 35789, bss 88112 bytes
# memory usage: 413204 bytes (405504 to run, 6030 for 158 symbols, 1670 for state etc)

Where the 4 bytes of initialized data.rw, btw., are Eric Raible's
static void *(*reallocator)(void*, unsigned long) = default_reallocator;

-- gr



Thanks - Eric





Re: [Tinycc-devel] Minimizing libtcc memory use

2024-03-03 Thread Eric Raible
> isn't there a garbage collecting done at the end to remove all the unused stuff
> to produce a binary that contains only the necessary parts ?

That very well might be the case, but given that tcc_get_symbol()
can be used at any time between tcc_relocate() and tcc_delete(),
it follows that _at least_ symbols are resident in the TCCState.
What I'm wondering about is the feasibility of keeping just code and
data, and flushing everything else.  This would require a new API -
something like tcc_finalize(TCCState *) or perhaps
tcc_finalize(TCCState *, flags), where flags specify what to flush.

I don't know enough about the internals, but if I'm willing to run with
CONFIG_RUNMEM_RO, it seems like the per TCCState memory use in my case
could be decreased from something like 29K to 1K or 2K.

I should mention that the memory usage in my case is 29K regardless
of whether CONFIG_RUNMEM_RO is 0 or 1.

Thanks - Eric