Re: [PD] Store data in memory more efficiently than in arrays

2022-01-14 Thread Roman Haefeli
Hi IOhannes

Interesting side-notes. Thanks!
Roman

On Fri, 2022-01-14 at 09:11 +0100, IOhannes m zmoelnig wrote:
> 
> sidenote: of course we are not alone.
> take for example the most popular programming language¹ of the last
> few years:
> a boolean value ideally requires a single bit to be stored accurately.
> now in python (tested on Python 3.9 on a 64-bit linux system), it
> doesn't take 1 bit, or even 1 byte; instead it takes 8 bytes (each
> list slot holds an 8-byte pointer to the shared True object).
> 
>  >>> import sys
>  >>> sys.getsizeof([True]*3) - sys.getsizeof([True]*2)
> 8
> 
> at least if you store the boolean in an array (a python list). a
> single boolean value outside of an array carries some extra metadata
> and takes 28 bytes in total²:
> 
>  >>> sys.getsizeof(True)
> 28
> 


___
Pd-list@lists.iem.at mailing list
UNSUBSCRIBE and account-management -> 
https://lists.puredata.info/listinfo/pd-list


Re: [PD] Store data in memory more efficiently than in arrays

2022-01-14 Thread Roman Haefeli
Dear José 

Thanks for the hint. I was indeed aware of this library and would also
use it if I hit a memory limit while dealing with sound files. It seems
its purpose is specifically to store sound more efficiently, not so much
arbitrary byte-level data (though one could easily write an abstraction
to store 2 bytes at one [table16] index).
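The arithmetic such an abstraction would perform is small. A minimal sketch in Python, for illustration only: the helper names are hypothetical, and it assumes a [table16] cell can hold an unsigned 16-bit value. The helpfile only says "16bit values" without stating whether they are signed, so a helper for the signed reading is included as well.

```python
from typing import List


def pack_bytes(data: bytes) -> List[int]:
    """Pack bytes into 16-bit words, two bytes per word.

    The first byte of each pair goes into the high 8 bits, the second into
    the low 8 bits. Odd-length input is padded with one zero byte, so the
    caller has to keep track of the original length.
    """
    if len(data) % 2:
        data = data + b"\x00"
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]


def unpack_words(words: List[int], length: int) -> bytes:
    """Split 16-bit words back into bytes and drop any padding."""
    out = bytearray()
    for w in words:
        if not 0 <= w <= 0xFFFF:
            raise ValueError(f"not an unsigned 16-bit value: {w}")
        out.append((w >> 8) & 0xFF)  # high byte
        out.append(w & 0xFF)         # low byte
    return bytes(out[:length])


def to_signed16(w: int) -> int:
    """Reinterpret an unsigned 16-bit word as signed (if the table stores signed values)."""
    return w - 0x10000 if w >= 0x8000 else w


payload = b"Pd!"
words = pack_bytes(payload)
print(words)                              # [20580, 8448]
print(unpack_words(words, len(payload)))  # b'Pd!'
print(to_signed16(0xFFFF))                # -1
```

Three bytes fit into two cells here; the stored length is what lets the reader strip the padding byte again.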

I was just curious: since we now have a general way to deal with files
on a byte level, I wondered whether there is a similarly general way to
deal with bytes in memory. Obviously, there are arrays, though on
today's 64-bit systems they use 8 times more memory than strictly
necessary. I'll stick to them anyway.

Roman


On Thu, 2022-01-13 at 11:41 -0300, José de Abreu wrote:
> Roman, maybe you could use iem16? 
> 
> https://git.iem.at/pd/iem16 says that it is "16bit storage for Pd"
> 
> looking at the helpfile of [table16] we read:
> 
> "[table16] stores 16bit values. The normal pd-tables ([table], array)
> store the values as floating-points. While floating points are
> (often) more precise (this is of course not really true..., esp. when
> comparing integer(4byte) to floating-point.) they use a lot of memory
> (4byte).
> 
> [table16] uses only 16bit (2bytes) to store the values, which is half
> of the memory."
> 
> So maybe it is exactly what you need?




Re: [PD] Store data in memory more efficiently than in arrays

2022-01-14 Thread IOhannes m zmoelnig

On 1/13/22 15:41, José de Abreu wrote:

> Roman, maybe you could use iem16?
>
> [...]
>
> [table16] uses only 16bit (2bytes) to store the values, which is half of
> the memory."
>
> So maybe it is exactly what you need?

i don't really think so.
afaict, roman is mainly concerned about a *potential waste* of memory.
in his words:
> Not that I ever hit a memory limit, I'm just curious.


so to answer roman's question first (possibly repeating what christof said):

of course it would be possible to store data in a more packed format, 
saving quite a lot of memory (a factor of 8 for byte data on a 64-bit system!).

however, it would complicate the internal data handling a lot.
right now, there's a unified data model, where each message (or array) 
consists of *atoms* of a single size: this allows us to write code 
*once* for multiple cases (and whenever there's a bug, it only needs to 
be fixed once) rather than special-casing different data-layouts with 
similar but subtly different code (and whenever there's a bug, it needs 
to be fixed in each place separately, with the possibility to forget one 
of those places every time we do it...).

it also allows us to have "data structures".

so we are trading code complexity for memory consumption.

this is a trade i would do any time (favouring more memory over more 
complex code).


obviously this comes with problems: since we need more memory, we might 
hit the limit of physical RAM, in which case we get into trouble.


but since - as you say - you've never actually hit the memory limit (and 
according to the number of times this is being discussed on the list, it 
seems that hardly anybody ever does), i'd classify this as "premature 
optimization".


sidenote: of course we are not alone.
take for example the most popular programming language¹ of the last few 
years:

a boolean value ideally requires a single bit to be stored accurately.
now in python (tested on Python 3.9 on a 64-bit linux system), it doesn't 
take 1 bit, or even 1 byte; instead it takes 8 bytes (each list slot 
holds an 8-byte pointer to the shared True object).


>>> import sys
>>> sys.getsizeof([True]*3) - sys.getsizeof([True]*2)
8

at least if you store the boolean in an array (a python list). a single 
boolean value outside of an array carries some extra metadata and takes 
28 bytes in total²:


>>> sys.getsizeof(True)
28
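The per-element cost of a list and the `bytes` alternative mentioned in footnote ² can be compared side by side. A small sketch, assuming CPython (these sizes are implementation details of that interpreter): a list slot costs one pointer, while `bytes` stores each value inline in a single byte.

```python
import struct
import sys

# Assumption: CPython. struct's "P" format is the size of a native pointer.
POINTER_SIZE = struct.calcsize("P")  # 8 on a 64-bit build, 4 on a 32-bit one
N = 1000

# A list stores one pointer per element; every element points at the same True.
list_cost = sys.getsizeof([True] * (N + 1)) - sys.getsizeof([True] * N)

# bytes stores its values inline, one byte per element.
bytes_cost = sys.getsizeof(bytes(N + 1)) - sys.getsizeof(bytes(N))

print("pointer size:", POINTER_SIZE)
print("per list element:", list_cost)
print("per bytes element:", bytes_cost)
print("ratio:", list_cost // bytes_cost)
```

On a 64-bit build the ratio is the same factor of 8 discussed for Pd arrays.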




now about "iem16":

that library was indeed written to store data more efficiently.
i wrote it in 2003 or so (according to the VCS history and some comments 
in the code) to implement the live electronics for a piece that required 
a long (IIRC: 20 minutes) multichannel (IIRC: 4 channels) delayline.


back then, Pd was 32-bit practically everywhere (the first amd64 
processor was released in 2003; the first windows to run on a 64-bit 
address space was released in 2005), so a single number stored in an 
array required (only) 4 bytes.
if my math serves me right, the required delay line would need a 
laughable 200MiB of RAM.
otoh, a "PowerBook G4 (late 2002)" would be equipped with 256MB by 
default³. specs for PC laptops would probably be about the same.
so it was practically impossible to run a 200MB delayline on such 
systems (at least if you also wanted to run Pd and an OS), and we had to 
trim down the memory consumption so the patch could be used on the 
musicians' laptops.
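The size of such a delay line is duration times sample rate times channel count times bytes per sample. A small calculator, assuming a 44.1 kHz sample rate (the message does not state one) and using the recalled 20 minutes and 4 channels as inputs:

```python
def delay_line_bytes(seconds: int, channels: int, sample_rate: int, bytes_per_sample: int) -> int:
    """Memory needed to hold `seconds` of audio for `channels` channels."""
    return seconds * sample_rate * channels * bytes_per_sample


MIB = 2 ** 20
SECONDS = 20 * 60  # 20 minutes, as recalled in the message
SR = 44100         # assumption: sample rate not given in the thread

for label, width in [("4-byte float (32-bit Pd array)", 4),
                     ("2-byte cell ([table16])", 2),
                     ("8-byte t_word (64-bit Pd)", 8)]:
    per_channel = delay_line_bytes(SECONDS, 1, SR, width) / MIB
    four_channels = delay_line_bytes(SECONDS, 4, SR, width) / MIB
    print(f"{label:32s} {per_channel:7.1f} MiB/channel, {four_channels:7.1f} MiB for 4 channels")
```

Under these assumptions one 20-minute channel of 4-byte samples comes to roughly 202 MiB; the total grows linearly with the channel count and with the bytes per sample.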


i don't remember having had a need for this library since then.

mtgasr
IOhannes


¹ according to PYPL: https://pypl.github.io/PYPL.html
² this argument is somewhat flawed, as python also gives us the 'bytes' 
class to store data in byte-arrays, where each byte consumes exactly 1 byte.
³ https://en.wikipedia.org/wiki/PowerBook_G4




Re: [PD] Store data in memory more efficiently than in arrays

2022-01-13 Thread José de Abreu
Roman, maybe you could use iem16?

https://git.iem.at/pd/iem16 says that it is "16bit storage for Pd"

looking at the helpfile of [table16] we read:

"[table16] stores 16bit values. The normal pd-tables ([table], array) store
the values as floating-points. While floating points are (often) more
precise (this is of course not really true..., esp. when comparing
integer(4byte) to floating-point.) they use a lot of memory (4byte).

[table16] uses only 16bit (2bytes) to store the values, which is half of
the memory."

So maybe it is exactly what you need?
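The helpfile's remark on precision can be made concrete. A 4-byte float has a 24-bit significand, so not every integer that fits in 4 bytes survives a round trip through it, while a 16-bit cell offers only 65536 levels. A sketch that emulates both storage formats with Python's `struct` module; scaling samples in [-1, 1] by 32767 is a common convention and an assumption here, not necessarily what [table16] does internally.

```python
import struct


def as_float32(x: float) -> float:
    """Round-trip a number through a 4-byte IEEE 754 float (Pd's default t_float size)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def as_int16_sample(x: float) -> float:
    """Quantize a sample in [-1, 1] to a signed 16-bit integer and back.

    The scale factor 32767 is an assumed convention, for illustration only.
    """
    q = max(-32768, min(32767, round(x * 32767)))
    stored = struct.unpack("<h", struct.pack("<h", q))[0]  # exactly 2 bytes
    return stored / 32767


print(struct.calcsize("<f"), struct.calcsize("<h"))  # 4 2
print(as_float32(float(2**24 + 1)))  # 16777216.0: 2**24 + 1 is not representable
print(as_float32(0.1))               # 0.10000000149011612
print(as_int16_sample(0.1))          # nearest of the 65536 available levels
```

The float keeps far more resolution for small audio values, while the 16-bit cell halves the memory at the cost of a fixed quantization step.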

On Wed, Jan 12, 2022 at 11:57, Christof Ressi wrote:

> > I read
> > once in IRC that one value in a Pd-array requires not 4 bytes, but 8
> > bytes on 64-bit systems.
> Yes. Pd's graphical arrays (and Pd's data structure arrays in general)
> are implemented as a linear array of "words" (t_word). A "word" can hold
> one of several possible types. It is implemented as a C union, so the
> overall size is always that of the largest member. In our case, the
> largest member is a pointer (e.g. t_symbol *), which is 4 bytes on a
> 32-bit system and 8 bytes on a 64-bit system.
>
> This means that even if you added a "byte" type, the overall size of
> "t_word" would stay the same.
>
> However, you can always implement your own byte array object as an
> external. But as you noted, this is not necessary unless you're on a
> very tight memory and/or CPU budget.
>
> Christof
>
> On 12.01.2022 14:20, Roman Haefeli wrote:
> > Hi
> >
> > I have sometimes stored byte data (lists of bytes) in arrays. IIRC, I read
> > once in IRC that one value in a Pd-array requires not 4 bytes, but 8
> > bytes on 64-bit systems. Since storing plain bytes is not such an
> > uncommon use case for me, I wonder if it can be done more efficiently.
> > Not that I ever hit a memory limit, I'm just curious. With the new
> > (amazing!) [file] object, dealing with byte lists has become even more
> > appealing, so the desire to store them in memory increases.
> >
> >
> > Roman


Re: [PD] Store data in memory more efficiently than in arrays

2022-01-12 Thread Christof Ressi

> I read
> once in IRC that one value in a Pd-array requires not 4 bytes, but 8
> bytes on 64-bit systems.

Yes. Pd's graphical arrays (and Pd's data structure arrays in general) 
are implemented as a linear array of "words" (t_word). A "word" can hold 
one of several possible types. It is implemented as a C union, so the 
overall size is always that of the largest member. In our case, the 
largest member is a pointer (e.g. t_symbol *), which is 4 bytes on a 
32-bit system and 8 bytes on a 64-bit system.
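The union layout can be checked with Python's `ctypes`. This is a simplified stand-in for `t_word`, not Pd's actual header: it keeps only a float member and one pointer member (standing in for `t_symbol *` and the other pointer types), plus a variant with a hypothetical one-byte member to show that adding it does not shrink anything.

```python
import ctypes


class Word(ctypes.Union):
    """Simplified stand-in for Pd's t_word: a float or a pointer."""
    _fields_ = [
        ("w_float", ctypes.c_float),    # t_float in a single-precision build
        ("w_symbol", ctypes.c_void_p),  # stands in for t_symbol * and other pointers
    ]


class WordWithByte(ctypes.Union):
    """The same union with a hypothetical one-byte member added."""
    _fields_ = [
        ("w_float", ctypes.c_float),
        ("w_symbol", ctypes.c_void_p),
        ("w_byte", ctypes.c_uint8),     # hypothetical, not part of Pd
    ]


ptr = ctypes.sizeof(ctypes.c_void_p)
print("pointer:", ptr)                               # 8 on 64-bit, 4 on 32-bit
print("Word:", ctypes.sizeof(Word))                  # same as the pointer
print("WordWithByte:", ctypes.sizeof(WordWithByte))  # unchanged

n = 1_000_000
print("1M bytes stored one per word:", n * ctypes.sizeof(Word), "bytes")
print("1M bytes packed:", n * ctypes.sizeof(ctypes.c_uint8), "bytes")
```

A union is as large as its largest member, so the pointer decides the slot size and a smaller member only leaves unused space.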


This means that even if you added a "byte" type, the overall size of 
"t_word" would stay the same.


However, you can always implement your own byte array object as an 
external. But as you noted, this is not necessary unless you're on a 
very tight memory and/or CPU budget.
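The storage side of such an external is simple. A sketch in Python of what a hypothetical byte-array object might offer (resize, write a list of numbers at an offset, read a range back); a real external would be written in C against Pd's API, and none of these names correspond to an existing Pd object.

```python
from typing import List


class ByteTable:
    """Hypothetical byte store: one byte of memory per element."""

    def __init__(self, size: int = 0) -> None:
        self._data = bytearray(size)

    def resize(self, size: int) -> None:
        """Grow (zero-filled) or shrink the table."""
        if size < 0:
            raise ValueError("size must be non-negative")
        if size > len(self._data):
            self._data.extend(bytes(size - len(self._data)))
        else:
            del self._data[size:]

    def write(self, offset: int, values: List[float]) -> None:
        """Store numbers starting at offset; Pd sends floats, truncated to int here."""
        if offset < 0 or offset + len(values) > len(self._data):
            raise IndexError("write out of range")
        for i, v in enumerate(values):
            b = int(v)
            if not 0 <= b <= 255:
                raise ValueError(f"not a byte: {v}")
            self._data[offset + i] = b

    def read(self, offset: int, count: int) -> List[int]:
        """Return count bytes starting at offset as a list of ints."""
        if offset < 0 or count < 0 or offset + count > len(self._data):
            raise IndexError("read out of range")
        return list(self._data[offset:offset + count])

    def __len__(self) -> int:
        return len(self._data)


t = ByteTable(4)
t.write(1, [72, 105, 33.0])
print(t.read(0, 4))           # [0, 72, 105, 33]
t.resize(6)
print(len(t), t.read(4, 2))   # 6 [0, 0]
```

Backing the table with a `bytearray` gives the one-byte-per-element cost that a packed C buffer would have.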


Christof

On 12.01.2022 14:20, Roman Haefeli wrote:

> Hi
>
> I have sometimes stored byte data (lists of bytes) in arrays. IIRC, I read
> once in IRC that one value in a Pd-array requires not 4 bytes, but 8
> bytes on 64-bit systems. Since storing plain bytes is not such an
> uncommon use case for me, I wonder if it can be done more efficiently.
> Not that I ever hit a memory limit, I'm just curious. With the new
> (amazing!) [file] object, dealing with byte lists has become even more
> appealing, so the desire to store them in memory increases.
>
>
> Roman


