I'm not sure that eliminating syntax errors to get null words is a good idea.

-- 
Raul


On Fri, Apr 7, 2017 at 12:35 PM, 'Pascal Jasmin' via Programming
<[email protected]> wrote:
> getting back to the idea of storing symbols by 3!:1 as delimited strings.  
> This would both be an improvement in storage, and eliminate the error prone 
> dependence on 10 s:
>
> I've got the esc (uses ;:) method in the latest jpp ( 
> https://github.com/Pascal-J/jpp ) to get 2MB/s throughput using 2 passes, and 
> an escape code to handle both embedded escapes and nulls, and null-delimited 
> data/symbols.
>
> Several improvements to ;: would make this significantly faster and more 
> flexible:
>
> emit null when j=-1, and emitword issued (previously suggested):  This allows 
> null fields to easily be "parsed" (the current method uses function code 2 
> and examines gaps in order to add nulls in a 2nd pass; function code 2 is 
> slower than 0, and there is overhead in calculating gaps and inserting nulls)
>
> Add an action code that suspends/pauses current word.  Next start word will 
> append to current word, skipping any characters that were scanned during 
> pause.  This would allow "deleting" items in the middle of a word in a single 
> pass instead of using the 2 pass approach (with 2nd pass using function code 
> 1).  Alternatively, it could function like ev, but if ew is in same state, it 
> discards the elements between startword's.
>
>
> A custom action code (one interpretation of Henry's inclination, though he 
> may have thought of custom function codes) that has a way of inserting a 
> character.  This would allow building an escaped sequence by inserting the 
> escape character prior to last seen.
>
> Custom action codes would need to return characters to include (if it is not 
> an ew,ev class), newi, newj at least.  A new function code would be a 
> variation on 2, emit i (i-j), actioncode, though "characters to include" 
> would interact directly with function codes 0 and 1.
>
>
> A powerful tool for nested structures (see parenw machine in fsm.ijs that 
> builds trees from parentheses groups) would be emitwordandIncreaseDepth 
> and emitwordandDecreaseDepth actions.  So, part of the return parameters 
> for custom actions would be a code for the action: (noword, word, 
> WordincreaseDepth, WordDecreaseDepth, vector)
>
>
>
>
> ________________________________
> From: 'Pascal Jasmin' via Programming <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Sunday, March 19, 2017 12:38 PM
> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>
>
>
> idea for double nullchars doesn't work as there's no way to know if a null is 
> embedded at the end of one "string" or beginning of next string.  Though null 
> followed by a code of the number of consecutive nulls would work.  If there 
> are 255 nulls, the code 255 0 would be used.  510 consecutive nulls 255 255 
> 0...
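
A minimal Python sketch of the two encodings discussed above, for illustration only. The function names are hypothetical, and how separate strings would be delimited under the count scheme is not specified in the message, so the sketch only handles nulls embedded in a single string. It assumes each run of nulls becomes one null followed by count bytes that are summed, with a byte below 255 ending the count, which matches the "255 0" and "255 255 0" examples.

```python
def double_null_join(strings):
    """Rejected scheme: data null -> two nulls, terminator -> one null."""
    return b"".join(s.replace(b"\x00", b"\x00\x00") + b"\x00" for s in strings)


def encode_nulls(data: bytes) -> bytes:
    """Count scheme (assumed reading): a run of n nulls -> 0x00 + count bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != 0:
            out.append(data[i])
            i += 1
            continue
        j = i
        while j < len(data) and data[j] == 0:
            j += 1
        run = j - i
        out.append(0)            # marker: a run of nulls starts here
        while run >= 255:
            out.append(255)      # 255 means "255 more, keep reading"
            run -= 255
        out.append(run)          # a byte below 255 ends the count
        i = j
    return bytes(out)


def decode_nulls(enc: bytes) -> bytes:
    """Inverse of encode_nulls."""
    out = bytearray()
    i = 0
    while i < len(enc):
        b = enc[i]
        i += 1
        if b != 0:
            out.append(b)
            continue
        run = 0
        while True:
            c = enc[i]
            i += 1
            run += c
            if c < 255:
                break
        out.extend(b"\x00" * run)
    return bytes(out)


# The ambiguity: a trailing null and a leading null produce the same bytes.
same = double_null_join([b"a\x00", b"b"]) == double_null_join([b"a", b"\x00b"])
```

Here `same` is true, which is the end-of-one-string versus start-of-next problem. The count scheme avoids it because the byte after a null is always read as a count, never as data.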
>
>
>
>
> ________________________________
> From: 'Pascal Jasmin' via Programming <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Sunday, March 19, 2017 11:33 AM
> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>
>
>
>
>
> Assuming that this comes with some improvement for s: then it would be easy 
> to favour that improvement.
>
> Things not to like about a global symbol table are that every typo is 
> included, and any "app"/set that is loaded joins that table.  AFAIU, 
> corruption happens if you create symbols, and then restore a table with 
> 10&s:, and so any application that relies on 10&s: can crash another 
> previously "loaded application"
>
>
> A problem is that 3!:1, or 3!:3 anyway, seems to just store indexes for 
> symbols, which relies on 10 s: for actual persistence.
>
> A suggestion for 3!:1 of symbols would be to scan the array containing 
> symbols for null (\0), then store 2&s: if not included, or 5&s: if there is a 
> \0.  AFAIU, utf8 is safe to not include 0 as an extended byte.
>
> An alternative to 5&s: would be a new 8&s: where "data nulls" are encoded 
> similar to embedded ' in strings.  double nullchars encode a data nullchar.  
> single nullchar encodes terminating 2&s: nullchar.  This format c/would be 
> used for 3!:1.  2&s: could be modified to be the 8&s: proposal.
>
> 10&s: could store in this new format for portability.  But the problem of 
> previously assigned symbols in session persists, and so a locale level symbol 
> table would make the most sense for robustness.  Also, an 
> "application"/locale that just uses `true`false symbols (bad example but 
> replace with small set of enums), would (presumably) be faster if it didn't 
> share a symbol table with a very large symbol array principally used to avoid 
> string fills.
>
>
> A question about symbols/3!:1... the documentation suggests that indexes are 
> limited to 32bit values.  Is that true for j64 too?  Query (new) and query 
> (old) are not completely clear in the documentation either; do they differ 
> from i. or e. ?
>
>
>
> ________________________________
> From: Henry Rich <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Sunday, March 19, 2017 12:14 AM
> Subject: [Jprogramming] Show cause hearing - (10 s: y)
>
>
>
> Does anyone use (10 s: y)?
>
> It is problematic in that the hash table (0 s: 4) may depend on the CPU
> and the J release level.
>
> I would rather decommit (10 s: y) and have the user reload the symbol
> table de novo.  Any objections?
>
>
> Henry Rich
>
> ----------------------------------------------------------------------
>
> For information about J forums see http://www.jsoftware.com/forums.htm
