Getting back to the idea of storing symbols by 3!:1 as delimited strings: this would both improve storage and eliminate the error-prone dependence on 10 s: for persistence.
I've got the esc method (which uses ;:) in the latest jpp ( https://github.com/Pascal-J/jpp ) to 2MB/s throughput using 2 passes, with an escape code to handle both embedded escapes and nulls, and null-delimited data/symbols.

Several improvements to ;: would make this significantly faster and more flexible:

Emit null when j=-1 and emitword is issued (previously suggested). This allows null fields to be "parsed" easily. (The current method is to use function code 2 and examine gaps in order to add nulls in a 2nd pass. Function code 2 is slower than 0, and there is overhead in calculating gaps and inserting nulls.)

Add an action code that suspends/pauses the current word. The next start word would append to the current word, skipping any characters that were scanned during the pause. This would allow "deleting" items in the middle of a word in a single pass instead of using the 2-pass approach (with the 2nd pass using function code 1). Alternatively, it could function like ev, but if ew is in the same state, it discards the elements between startwords.

A custom action code (one interpretation of Henry's inclination, though he may have thought of custom function codes) that has a way of inserting a character. This would allow building an escaped sequence by inserting the escape character before the last character seen. Custom action codes would need to return, at least, the characters to include (if it is not an ew,ev class), newi, and newj. A new function code would be a variation on 2: emit i (i-j), actioncode, though "characters to include" would interact directly with function codes 0 and 1.

A powerful tool for nested structures (see the parenw machine in fsm.ijs that builds trees from parentheses groups) would be emitwordandIncreaseDepth and emitwordandDecreaseDepth actions.
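To picture what the proposed depth-changing emit actions would buy, here is a minimal sketch in Python rather than J. It is not the parenw machine from fsm.ijs and does not use ;: at all; the function name `build_tree` and the choice of "(" and ")" as the depth triggers are assumptions made only for illustration. It shows a single left-to-right pass in which an "emit word and increase depth" action opens a nested group and an "emit word and decrease depth" action closes it.

```python
def build_tree(text):
    """Build a nested list from parenthesised text in one pass.

    Hypothetical illustration: '(' plays the role of an
    emit-word-and-increase-depth action, ')' of an
    emit-word-and-decrease-depth action, whitespace of a plain emit-word.
    """
    root = []
    stack = [root]   # stack[-1] is the group currently receiving words
    word = []        # characters of the word being accumulated

    def emit_word():
        # Emit the pending word (if any) into the current group.
        if word:
            stack[-1].append("".join(word))
            word.clear()

    for ch in text:
        if ch == "(":
            emit_word()
            child = []
            stack[-1].append(child)   # nest a new group
            stack.append(child)       # depth + 1
        elif ch == ")":
            emit_word()
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()               # depth - 1
        elif ch.isspace():
            emit_word()
        else:
            word.append(ch)

    emit_word()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root
```

With this sketch, `build_tree("a (b (c d)) e")` yields `['a', ['b', ['c', 'd']], 'e']`. The point is only that depth bookkeeping attached to the emit step removes the need for a second pass over the token list.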
So, part of the return parameters for custom actions would be a code for the action: (noword, word, WordincreaseDepth, WordDecreaseDepth, vector)

________________________________
From: 'Pascal Jasmin' via Programming <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Sunday, March 19, 2017 12:38 PM
Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)

The idea of double nullchars doesn't work, as there's no way to know whether a null is embedded at the end of one "string" or at the beginning of the next string. A null followed by a code for the number of consecutive nulls would work, though. If there are 255 nulls, the code 255 0 would be used; for 510 consecutive nulls, 255 255 0...

________________________________
From: 'Pascal Jasmin' via Programming <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Sunday, March 19, 2017 11:33 AM
Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)

Assuming that this comes with some improvement for s:, it would be easy to favour that improvement.

Things not to like about a global symbol table are that every typo is included, and that any "app"/set that is loaded joins that table. AFAIU, corruption happens if you create symbols and then restore a table with 10&s:, so any application that relies on 10&s: can crash another previously "loaded application".

A problem is that 3!:1, or 3!:3 anyway, seems to just store indexes for symbols, which relies on 10 s: for actual persistence.

A suggestion for 3!:1 of symbols would be to scan the array containing symbols for null (\0), then store 2&s: if none is found, or 5&s: if there is a \0. AFAIU, utf8 is safe in that 0 is never used as an extended byte. An alternative to 5&s: would be a new 8&s: where "data nulls" are encoded similarly to embedded ' in strings: double nullchars encode a data nullchar, and a single nullchar encodes the terminating 2&s: nullchar. This format c/would be used for 3!:1. 2&s: could be modified to be the 8&s: proposal.
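The two null-encoding ideas in this thread (doubling data nulls, and a null followed by a run count) can be compared with a small sketch. It is written in Python purely for illustration and is not how s: or 3!:1 store anything. The function names are hypothetical, and one detail is an assumption that the messages do not spell out: here every null byte is followed by a count, and a count of 0 marks the end of a string, which is consistent with the "255 0" and "255 255 0" examples.

```python
def encode_doubled(strings):
    # Doubling proposal: a data null becomes two nulls, a single null terminates.
    return b"".join(s.replace(b"\0", b"\0\0") + b"\0" for s in strings)


def _count_bytes(n):
    # 255 means "255 more, keep reading"; the final byte (0..254) ends the count.
    full, rest = divmod(n, 255)
    return bytes([255] * full + [rest])


def encode_counted(strings):
    # Assumed scheme: null + count. Count 0 is the string terminator.
    out = bytearray()
    for s in strings:
        i = 0
        while i < len(s):
            if s[i] != 0:
                out.append(s[i])
                i += 1
                continue
            j = i
            while j < len(s) and s[j] == 0:   # measure the run of nulls
                j += 1
            out.append(0)
            out += _count_bytes(j - i)
            i = j
        out += b"\0\0"                        # null with count 0 = end of string
    return bytes(out)


def decode_counted(data):
    strings, cur, i = [], bytearray(), 0
    while i < len(data):
        b = data[i]
        i += 1
        if b != 0:
            cur.append(b)
            continue
        n = 0
        while data[i] == 255:                 # accumulate full chunks of 255
            n += 255
            i += 1
        n += data[i]
        i += 1
        if n == 0:                            # terminator
            strings.append(bytes(cur))
            cur = bytearray()
        else:                                 # run of n data nulls
            cur += b"\0" * n
    return strings
```

Under the doubling scheme, the lists `[b"a\0", b"b"]` and `[b"a", b"\0b"]` produce identical bytes, which is the ambiguity described in the 12:38 PM message. The counted scheme round-trips both, at the cost of a two-byte terminator in this particular interpretation.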
10&s: could store in this new format for portability. But the problem of previously assigned symbols in the session persists, so a locale-level symbol table would make the most sense for robustness. Also, an "application"/locale that just uses `true`false symbols (a bad example, but replace it with a small set of enums) would (presumably) be faster if it didn't share a symbol table with a very large symbol array principally used to avoid string fills.

A question about symbols/3!:1... the documentation suggests that indexes are limited to 32-bit values. Is that true for j64 too? Query (new) and query (old) are not completely clear in the documentation either; do they differ from i. or e. ?

________________________________
From: Henry Rich <[email protected]>
To: Programming forum <[email protected]>
Sent: Sunday, March 19, 2017 12:14 AM
Subject: [Jprogramming] Show cause hearing - (10 s: y)

Does anyone use (10 s: y)? It is problematic in that the hash table (0 s: 4) may depend on the CPU and the J release level. I would rather decommit (10 s: y) and have the user reload the symbol table de novo.

Any objections?

Henry Rich
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
