To add to the great explanations already given by Johannes and Christoph, Miller gave a seminar last spring quarter called "Inside Pd" where he explains the inner workings of Pd in great detail. Here is the course site: http://msp.ucsd.edu/syllabi/206.20s/index.htm and here the videos: http://msp.ucsd.edu/syllabi/206.20s/movies/ In the video 2b.apr10.mp4 <http://msp.ucsd.edu/syllabi/206.20s/movies/2b.apr10.mp4> he explains the symbol storage strategy.
I'm very grateful to be able to access this content, thank you Miller! El jue., 1 oct. 2020 a las 13:21, Christof Ressi (<[email protected]>) escribió: > > your computer memory will define the time when it will crash Pd (it will > > crash, when all the strings in the symboltable eat up all the memory > > available) > I think eating up all available memory is not a likely scenario on > modern computers with a 64-bit address space and virtual memory. > > The actual problem is that Pd's symbol table is implemented as a hash > table with seperate chaining > (https://en.wikipedia.org/wiki/Hash_table#Separate_chaining). This means > that symbols which have the same hash value (= hash collision) are > connected as a linked list. If you look up a such a symbol, you have to > walk the linked list and do a string comparison for each element until > you find a match. > > The more symbols you add, the more hash collisions you get and the more > symbols end up in the same bucket. In practice, this means that for a > large number of elements, insertion and lookup becomes more and more > expensive because the linked lists for each bucket grow larger and > larger. While the hash lookup takes constant time "O(1)", the linked > list traversal takes linear time "O(N)" (and is not very cache friendly). > > Good hash table implementations mitigate this problems by a technique > called "rehashing" where the hash table array is resized if the "load > factor" (number of elements / number of buckets) exceed a certain > threshold (= "load factor"). Then every item is hashed again and > inserted back into the new hash table. > > Unfortunately, Pd doesn't do any rehashing and it is certainly possible > to significantly slow down Pd by flooding the symbol table. A common > load factor threshold is 0.75, but if you create 10 000 unique symbols > in Pd, the load factor would be ~10! Not very good... > > In fact, I've already been thinking about improving the symbol table. > > Christof > > On 01.10.2020 10:28, IOhannes m zmoelnig wrote: > > On 2020-10-01 09:54, oliver wrote: > >> IOhannes m zmoelnig wrote: > >>> On 2020-10-01 09:22, hans w. koch wrote: > >>>> but be aware of the risks of invoking makefilname all too often. > >>> note that if you use dollsyms (as in `[$1$2(`) you are filling up the > >>> symbol table just as well. > >> i was just about to ask, if the attached modified patch would avoid that > >> problem, but you replied already. > >> > >> could you please clarify the used term "invoke" a bit ? > >> i guess the number of [makefilename] objects isn't the problem, but how > >> much/often its conversion mechanism is used, right ? > > yes (the latter). > > > >> does that mean that everytime a number->symbol conversion happens > >> (regardless how it is done) the symboltable is filled and will > > no. > > everytime a *new* symbol is created. > > the point of symbols (vs ordinary strings) is, that a single literal > > only needs to be stored once. > > so if you first create a string "rubadub" (however you do this), a new > > entry for the symbol 'rubadub' is created. > > now, if you concatenate the symbols 'rubad' and 'ub', the result is > > "rubadub", which already happens to be in the symbol table (and thus no > > new entry needs to be generated). > > for Pd these strings are *identical*. > > this is cool as we can really easily compare the two strings. if they > > occupy the same entry in the symbol table (which basically means, that > > Pd gets the same pointer for when turning the literal into a symbol), > > then the two strings are the same. > > so rather than having to compare each character of the string > > "sjfdjdasjfsfjrueincru057894_curtrfenr3ewf8354j3wp57jp3" with each > > character of "sjfdjdasjfsfjrueincru057894_curtrfenr3ewfB354j3wp57jp3" , > > Pd only needs to compare two pointers - and this can be done in a single > > step on your CPU. > > > > the problem with generating symbol programmatically (e.g. by sending > > numbers to [makefilename %d]) is, that it is so super easy to generate > > lots and lots of (different) symbols. > > > > > >> eventually slow down or crash PD ? > >> > >> so, as a live example: writing number values to GUI labels dynamically > >> is a potentially dangerous thing ? what's the threshold there ? > > your computer memory will define the time when it will crash Pd (it will > > crash, when all the strings in the symboltable eat up all the memory > > available) > > > > as for the slow-down, why not simply create a patch that tests this for > you? > > > > create labels with [makefilename label%08d] with the input ranging from > > 0...2000000 (or so; you'll notice when it gets slow). > > measure the time it takes to generate the symbols (well, measure the > > time it takes to generate 10000 symbols or) > > > > > >> or is there any way to clear the symboltable ? > > i think i covered this in another ("the" other) post quite recently. > > > > > > gfsdm > > IOhannes > > > > > > _______________________________________________ > > [email protected] mailing list > > UNSUBSCRIBE and account-management -> > https://lists.puredata.info/listinfo/pd-list > > > > _______________________________________________ > [email protected] mailing list > UNSUBSCRIBE and account-management -> > https://lists.puredata.info/listinfo/pd-list > -- Maximiliano Estudies *VDT Referat Beschallung* +49 176 36784771 omslo.com maxiestudies.com
_______________________________________________ [email protected] mailing list UNSUBSCRIBE and account-management -> https://lists.puredata.info/listinfo/pd-list
