Re: In-place extension of arrays only for certain alignment?

2022-08-17 Thread Ali Çehreli via Digitalmars-d-learn

On 8/17/22 19:27, Steven Schveighoffer wrote:
> On 8/17/22 10:09 PM, Ali Çehreli wrote:
>>  > IIRC, your data does not need to be sequential in *physical memory*,
>>  > which means you can use a ring buffer that is segmented instead of
>>  > virtually mapped, and that can be of any size.
>>
>> I thought about that as well. But I would like the blocks
>> (Appenders?) to be equal in size so that opIndex can still provide an
>> O(1) guarantee. (Compute the block + an offset.)
>
> It's still O(1). You only have 2 slices to worry about.

Sometimes 2... I wanted to leave the sliding window width dynamic.

So, there will be M buffers, not 2. If their lengths are not equal, 
opIndex must be O(M). M is expected to be small but still...


M buffers of 'pageSize - (meta data).sizeof' each.
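Something like the following sketch is what I have in mind (made-up
names, nothing tested):

```d
// Sketch: M equal-sized blocks keep opIndex O(1): one division and
// one modulus, independent of how many blocks there are.
struct SegmentedBuffer(T)
{
    T[][] blocks;      // M blocks, each exactly blockSize elements long
    size_t blockSize;  // e.g. (pageSize - metadataSize) / T.sizeof

    ref T opIndex(size_t i)
    {
        return blocks[i / blockSize][i % blockSize];
    }
}
```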

BeerConf... Sure... :)

Ali



Re: In-place extension of arrays only for certain alignment?

2022-08-17 Thread Steven Schveighoffer via Digitalmars-d-learn

On 8/17/22 10:09 PM, Ali Çehreli wrote:

>  > IIRC, your data does not need to be sequential in *physical memory*,
>  > which means you can use a ring buffer that is segmented instead of
>  > virtually mapped, and that can be of any size.
>
> I thought about that as well. But I would like the blocks
> (Appenders?) to be equal in size so that opIndex can still provide an
> O(1) guarantee. (Compute the block + an offset.)


It's still O(1). You only have 2 slices to worry about.
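Even if the two slices have different lengths, indexing is a single
comparison. A sketch (made-up names):

```d
// Sketch: a segmented ring buffer is at most 2 contiguous slices,
// so indexing stays O(1) even when the slices differ in length.
struct TwoSlices(T)
{
    T[] first;   // from the read position to the end of the storage
    T[] second;  // the wrapped-around part at the start of the storage

    ref T opIndex(size_t i)
    {
        return i < first.length ? first[i] : second[i - first.length];
    }
}
```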

Perhaps next beerconf we can discuss!

-Steve


Re: In-place extension of arrays only for certain alignment?

2022-08-17 Thread Ali Çehreli via Digitalmars-d-learn

On 8/17/22 18:31, Steven Schveighoffer wrote:

> 1. I highly recommend trying out the ring buffer solution to see if it
> helps. The disadvantage here is that you need to tie up at least a page
> of memory.

I started to think holding on to multiple pages of memory should not 
matter anyway. If really needed, an array of Appenders could be used; 
when really really needed, they may come from a free list.


Aside: I looked at Appender's implementation and saw that extending is 
one of its concerns as well.


> 2. All my tests using the ring buffer showed little to no performance
> improvement over just copying back to the front of the buffer. So
> consider just copying the data back to the front of an already allocated
> block.

That makes sense as well. One worry would be types with copy 
constructors. (I am not sure whether D is still a language where structs 
can freely be moved around.)
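For plain data, copying back to the front could be as simple as the
following sketch (made-up names); with copy constructors I am less sure
such a move is allowed:

```d
import core.stdc.string : memmove;

// Sketch: when the live window reaches the end of its storage, move
// the elements back to offset 0 and reuse the same allocation.
// Fine for plain data; questionable if elements have copy constructors.
void compact(T)(ref T[] window, T[] storage)
{
    memmove(storage.ptr, window.ptr, window.length * T.sizeof);
    window = storage[0 .. window.length];
}
```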


> IIRC, your data does not need to be sequential in *physical memory*,
> which means you can use a ring buffer that is segmented instead of
> virtually mapped, and that can be of any size.

I thought about that as well. But I would like the blocks (Appenders?)
to be equal in size so that opIndex can still provide an O(1)
guarantee. (Compute the block + an offset.)


>
> -Steve

Ali



Re: In-place extension of arrays only for certain alignment?

2022-08-17 Thread Steven Schveighoffer via Digitalmars-d-learn

On 8/17/22 2:40 PM, Ali Çehreli wrote:

> On 8/16/22 19:33, Steven Schveighoffer wrote:
>
> Using a 16-byte block sounds like a good strategy at first because
> nobody knows whether an array will get more than one element.
>
> However, if my guess is correct (i.e. the first element of size 16
> bytes is placed in a 16-byte block), then the next allocation will
> always allocate memory for the second element.


A 16-byte element must be put in a 32-byte block; you still need one
byte for the metadata.




> One might argue that dynamic arrays are likely to have more than a
> single element, so the initial block should be at least twice the
> element size. This would cut the allocation count by 1 for all arrays.
> And in my case of 1-element arrays, the allocation count would be
> halved. (Because I pay for every single append right now.)


So yes, if you have a 32-byte block for 16-byte elements, it means you 
can only fit one element in the block. If you are using a sliding window 
approach, where you remove the first element and then append another, 
you will in effect reallocate on every append.
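You can see the effect with something like this sketch (the exact block
sizes and reallocation points are implementation details of the GC, so
treat the output as illustrative):

```d
import std.stdio : writeln;

struct S { long a, b; }      // a 16-byte element

void main()
{
    S[] win = [S(0, 0)];     // one element; the usable block is full
    foreach (i; 0 .. 4)
    {
        win = win[1 .. $];   // slide: drop the front element
        win ~= S(i, i);      // append: no room behind the slice
        writeln(win.ptr);    // expect a new address on each iteration
    }
}
```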


Using the append/popFront mechanism to implement your sliding window is 
going to perform badly. Appending is not designed to make this situation 
perform well.


> That all makes sense. I didn't think the metadata would be at the end,
> but I sense it's related to the "end slice", so it's a better place
> there. (?)


It's for alignment. If I put 1 byte at the front, this means I have to 
always skip 7 or 15 more bytes (depending on alignment requirements).


BUT, I put the metadata at the front on big (page+ size) blocks, because 
I can both afford to skip 16 bytes in a block of 4096, and if you extend 
the block, there is no need to move the metadata later. Consider that 
the metadata lookup cache could be out of date if it had to move.



>> What is your focus? Why do you really want this "optimization" of
>> gluing together items to happen?
>
> This is about what you and I talked about in the past and something I
> mentioned in my DConf lightning talk this year. I am imagining a
> FIFO-like cache where elements of a source range are stored. There is a
> sliding window involved. I want to drop the unused front elements
> because they are expensive in 2 ways:
>
> 1) If the elements are not needed anymore, I have to move my slice
> forward so that the GC collects their pages.
>
> 2) If they are not needed anymore, I don't want to even copy them to a
> new block because this would be expensive, and in the case of an
> infinite source range, impossible.


Ah! I actually have a solution for this in iopipe -- a ring buffer.
Basically, you map the same physical pages of memory sequentially. It
allows you to simply change the pointer, and never need to copy anything.

See this code for an example (I only have it for Posix, but Windows has
similar features; I still have to add them):
https://github.com/schveiguy/iopipe/blob/6a8c10d2858f92978d72c55eecc7ad55fcc207e2/source/iopipe/buffer.d#L306
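The core of the trick is roughly this (a simplified, Linux-flavored
sketch, not the actual iopipe code; error handling is omitted, and `fd`
would come from something like shm_open, sized to a multiple of the
page size):

```d
import core.sys.posix.sys.mman;

// Map the same physical pages twice, back to back. Afterwards
// buf[0 .. len] and buf[len .. 2 * len] alias the same memory, so a
// slice that wraps around is still contiguous in virtual memory.
ubyte[] mapRing(int fd, size_t len)
{
    // Reserve 2 * len of contiguous address space.
    auto base = cast(ubyte*) mmap(null, len * 2, PROT_NONE,
                                  MAP_PRIVATE | MAP_ANON, -1, 0);
    // Overlay the fd's pages onto both halves of the reservation.
    mmap(base,       len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    mmap(base + len, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    return base[0 .. len * 2];
}
```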


> The question is when to apply this dropping of old front elements. When
> I need to add one more element to the array, I can detect whether this
> *may* allocate by the expression 'arr.length == arr.capacity'. But not
> really, because the runtime may give me adjacent room without
> allocation. So I can delay the "drop the front elements" computation
> because there will be no actual copying at this time.


Even if you could do this, it doesn't help: at the end of the pool, you
need to reallocate into another pool (or back to the beginning of the
pool), because there can be no free pages after the last page in the
pool (you can't merge pools together).



>> https://dlang.org/phobos/core_memory.html#.GC.extend
>
> Ok, that sounds very useful. In addition to "I can detect when it *may*
> allocate", I can detect whether there is adjacent free room. (I can ask
> for just a 1-element extension; I tested it, and it works.) (I guess
> this GC.extend has to grab a GC lock.)
>
> However, for that to work, I seem to need the initial block pointer
> that the GC knows about. (My sliding array's .ptr does not work, so I
> have to save the initial arr.ptr.)


You can get this via `GC.query`, but that means 2 calls into the GC.
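Roughly like this (a sketch; `tryExtend` is a made-up helper):

```d
import core.memory : GC;

// Try to extend the block backing `arr` (a possibly-sliced window) by
// at least one element. Returns true if adjacent room was obtained.
bool tryExtend(T)(T[] arr, size_t wanted = 64)
{
    // arr.ptr may be an interior pointer after slicing; GC.query
    // returns the block info, including the base address.
    auto info = GC.query(arr.ptr);
    // extend takes a minimum and a desired extension, in bytes.
    return GC.extend(info.base, T.sizeof, wanted * T.sizeof) != 0;
}
```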


> Conclusion:
>
> 1) Although understanding the inner workings of the runtime is very
> useful and core.memory has interesting functions, it feels like too
> much fragile work to get exactly what I want. I should manage my own
> memory (likely still backed by the GC).
>
> 2) I argue that the initial allocation should be more than 1 element so
> that we skip 1 allocation for most arrays and 50% for my window-of-1
> sliding window case.


So 2 things here:

1. I highly recommend trying out the ring buffer solution to see if it 
helps. The disadvantage here is that you need to tie up at least a page 
of memory.
2. All my tests using the ring buffer showed little to no performance
improvement over just copying back to the front of the buffer. So
consider just copying the data back to the front of an already
allocated block.

IIRC, your data does not need to be sequential in *physical memory*,
which means you can use a ring buffer that is segmented instead of
virtually mapped, and that can be of any size.

-Steve

Re: In-place extension of arrays only for certain alignment?

2022-08-17 Thread Ali Çehreli via Digitalmars-d-learn

On 8/16/22 19:33, Steven Schveighoffer wrote:

> Everything in the memory allocator is in terms of pages. A pool is a
> section of pages. The large blocks are a *multiple* of pages, whereas
> the small blocks are pages that are *divided* into same-sized chunks.

Thank you. I am appreciating this discussion a lot.

> When you want to allocate a block, if it's a half-page or less, then it
> goes into the small pool, where you can't glue 2 blocks together.

That's how it should be, because we wouldn't want to do extra work to
figure out which blocks are glued.


>>  > The reason why your `bad` version fails is because when it must
>>  > reallocate, it still is only allocating 1 element.

I will get back to this 1 element allocation below.

>> That part I still don't understand. The same block of e.g. 16 bytes
>> still has room. Why not use that remaining portion?
>
> It does until it doesn't. Then it needs to reallocate.

I think my problem was caused by my test data being 16 bytes and (I
think) the runtime picking memory from the 16-byte pool, without any
hope of future growth into an adjacent block.


Using a 16-byte block sounds like a good strategy at first because
nobody knows whether an array will get more than one element.

However, if my guess is correct (i.e. the first element of size 16
bytes is placed in a 16-byte block), then the next allocation will
always allocate memory for the second element.


One might argue that dynamic arrays are likely to have more than a
single element, so the initial block should be at least twice the
element size. This would cut the allocation count by 1 for all arrays.
And in my case of 1-element arrays, the allocation count would be
halved. (Because I pay for every single append right now.)


Of course, I can understand that there can be applications where a
large number of arrays (e.g. a 2D array) may have zero elements or one
element, in which case my proposal of allocating the first element in,
e.g., a 32-byte block would be wasteful.


I think such cases are rare, and for their sake the current strategy
makes all arrays incur 1 extra allocation.


> Maybe some ASCII art? `A` is "used by the current slice", `x` is
> "allocated, but not referenced by the array". `.` is "part of the block
> but not used (i.e. can grow into this)". `M` is "metadata"
>
> ```d
> auto arr = new ubyte[10];      // AAAA AAAA AA.. ...M
> arr = arr[1 .. $];             // xAAA AAAA AA.. ...M
> arr ~= 0;                      // xAAA AAAA AAA. ...M
> arr ~= 0;                      // xAAA AAAA AAAA ...M
> arr = arr[3 .. $];             // xxxx AAAA AAAA ...M
> arr ~= cast(ubyte[])[1, 2, 3]; // xxxx AAAA AAAA AAAM // full!
> arr ~= 1;                      // AAAA AAAA AAAA .... .... .... .... ...M
>                                // reallocated! arr.ptr has changed
> ```

That all makes sense. I didn't think the metadata would be at the end,
but I sense it's related to the "end slice", so it's a better place
there. (?)


> Metadata isn't always stored. Plus, it's optimized for the block size.
> For example, any block that is 256 bytes or less only needs a single
> byte to store the "used" space.

That's pretty interesting and smart.

> What is your focus? Why do you really want this "optimization" of gluing
> together items to happen?

This is about what you and I talked about in the past and something I 
mentioned in my DConf lightning talk this year. I am imagining a 
FIFO-like cache where elements of a source range are stored. There is a 
sliding window involved. I want to drop the unused front elements 
because they are expensive in 2 ways:


1) If the elements are not needed anymore, I have to move my slice 
forward so that the GC collects their pages.


2) If they are not needed anymore, I don't want to even copy them to a 
new block because this would be expensive, and in the case of an 
infinite source range, impossible.


The question is when to apply this dropping of old front elements. When
I need to add one more element to the array, I can detect whether this
*may* allocate by the expression 'arr.length == arr.capacity'. But not
really, because the runtime may give me adjacent room without
allocation. So I can delay the "drop the front elements" computation
because there will be no actual copying at this time.
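In code, the delayed check would look something like this sketch:

```d
// Sketch: delay dropping the front elements until an append may
// actually allocate. The check is conservative: the runtime (or
// GC.extend) may still find adjacent room without copying.
void put(T)(ref T[] arr, T next)
{
    if (arr.length == arr.capacity)
    {
        // Appending below *may* reallocate; this would be the point
        // to drop the stale front elements first.
    }
    arr ~= next;
}
```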


But the bigger issue is, because I drop elements my array never gets 
large enough to take advantage of this optimization and there is an 
allocation for every single append.


> https://dlang.org/phobos/core_memory.html#.GC.extend

Ok, that sounds very useful. In addition to "I can detect when it *may*
allocate", I can detect whether there is adjacent free room. (I can ask
for just a 1-element extension; I tested it, and it works.) (I guess
this GC.extend has to grab a GC lock.)

However, for that to work, I seem to need the initial block pointer
that the GC knows about. (My sliding array's .ptr does not work, so I
have to save the initial arr.ptr.)


Conclusion:

1) Although understanding the inner workings of the runtime is very
useful and core.memory has interesting functions, it feels like too
much fragile work to get exactly what I want. I should manage my own
memory (likely still backed by the GC).

2) I argue that the initial allocation should be more than 1 element so
that we skip 1 allocation for most arrays and 50% for my window-of-1
sliding window case.

Ali

Re: Programs in D are huge

2022-08-17 Thread Ali Çehreli via Digitalmars-d-learn

On 8/17/22 09:28, Diego wrote:

> I'm writing a little terminal tool, so I think `-betterC` is the best
> and simplest solution in my case.

It depends on what you mean by a terminal tool, but in general, no: the
full feature set of D is the more useful option.


I've written a family of programs that would normally be run on the 
terminal; I had no issues that would warrant -betterC.


Ali



Re: Programs in D are huge

2022-08-17 Thread Diego via Digitalmars-d-learn

Thank you to everyone,

I'm writing a little terminal tool, so I think `-betterC` is the
best and simplest solution in my case.







Re: Compile time int to string conversion in BetterC

2022-08-17 Thread Steven Schveighoffer via Digitalmars-d-learn

On 8/17/22 6:38 AM, Dennis wrote:

> On Wednesday, 17 August 2022 at 08:44:30 UTC, Ogi wrote:
>
>> Maybe I’m missing something?
>
> I had the same problem, and came up with the following trick:
>
> ```D
> enum itoa(int i) = i.stringof;
> ```


I have the same thing in my code:

```d
enum intStr(int x) = x.stringof;
```

The reason you do this is to shoehorn things that aren't technically 
ints (such as enum types) into ints. This avoids weirdness like 
`cast(foo)bar` or whatnot that might happen if you just use stringof on 
any expression.
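For example (with a hypothetical enum type):

```d
enum Level { low = 1, high = 2 }

enum intStr(int x) = x.stringof;

// Level.high converts to int at the template parameter, so the result
// is the plain digit string, not an enum-flavored rendering.
static assert(intStr!(Level.high) == "2");
```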




> Now I need to warn you that the output of `stringof` is technically
> implementation defined per the specification, so you shouldn't rely on
> it. In practice [this doesn't stop
> people](https://github.com/libmir/mir-algorithm/pull/422), and I don't
> think integers will ever not be printed as a string of base 10 digits.


Yeah, I wouldn't worry about stringof for ints.

-Steve


Re: Compile time int to string conversion in BetterC

2022-08-17 Thread Dennis via Digitalmars-d-learn

On Wednesday, 17 August 2022 at 08:44:30 UTC, Ogi wrote:

> Maybe I’m missing something?


I had the same problem, and came up with the following trick:

```D
enum itoa(int i) = i.stringof;

enum major = 3;
enum minor = 2;
enum patch = 1;

enum versionString = itoa!major ~ "." ~ itoa!minor ~ "." ~ itoa!patch;


static assert(versionString == "3.2.1");
```

Now I need to warn you that the output of `stringof` is 
technically implementation defined per the specification, so you 
shouldn't rely on it. In practice [this doesn't stop 
people](https://github.com/libmir/mir-algorithm/pull/422), and I 
don't think integers will ever not be printed as a string of base 
10 digits.


Compile time int to string conversion in BetterC

2022-08-17 Thread Ogi via Digitalmars-d-learn
It’s 2022 already and BetterC still imposes limits at compile 
time and makes things awkward.


I have 3 integer enums that represent my library version. And I
want to generate a "1.2.3" enum from them. This is a trivial 
thing to do in standard D but I can’t find a way to do it with 
BetterC enabled. I can’t use `to`, `text`, `sformat` or 
`toChars`, as none of them is compatible with BetterC. And I 
can’t use `sprintf` either because ‘stdc’ is not available at 
compile time.


On the other hand, string to integer conversion is simple: just 
‘mixin’ it! So as a workaround, I put numbers into private string 
enums, and the integer enums are generated from them. But this is 
not nice.
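That is, the workaround is something like this sketch:

```d
// Workaround sketch: keep the digits as private string enums, derive
// the integer enums from them with mixin, and build the version string
// by plain concatenation. All of this works with -betterC.
private enum majorStr = "1";
private enum minorStr = "2";
private enum patchStr = "3";

enum major = mixin(majorStr);  // string -> int at compile time
enum minor = mixin(minorStr);
enum patch = mixin(patchStr);

enum versionString = majorStr ~ "." ~ minorStr ~ "." ~ patchStr;
static assert(versionString == "1.2.3");
```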


Maybe I’m missing something?