Getting back to the idea of having 3!:1 store symbols as delimited strings: this 
would both improve storage and eliminate the error-prone 
dependence on 10 s:

I've got the esc method (which uses ;:) in the latest jpp ( 
https://github.com/Pascal-J/jpp ) up to 2MB/s throughput using 2 passes, with 
an escape code that handles both embedded escapes and nulls, and null-delimited 
data/symbols.

Several improvements to ;: would make this significantly faster and more 
flexible:

Emit a null when j=-1 and an emitword is issued (previously suggested).  This 
allows null fields to be "parsed" easily.  The current method uses function 
code 2 and examines gaps in order to add nulls in a 2nd pass; function code 2 
is slower than 0, and there is overhead in calculating gaps and inserting nulls.
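
A rough sketch of that 2-pass workaround, in Python rather than J, just to show the gap arithmetic.  Assumptions: the first pass reports (start, length) pairs for the non-empty fields only (the kind of output function code 2 gives), the delimiter is a single character, and both helper names are hypothetical.

```python
def nonempty_spans(text, delim=","):
    """Stand-in for the first pass: (start, length) of each non-empty field."""
    spans, start = [], None
    for i, ch in enumerate(text):
        if ch == delim:
            if start is not None:
                spans.append((start, i - start))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(text) - start))
    return spans


def fill_null_fields(text, spans, delim=","):
    """Second pass: rebuild every field, adding '' for each null field
    implied by the delimiters found in the gaps between spans."""
    fields = []
    prev_end = 0   # index just past the previous non-empty field
    first = True   # no non-empty field seen yet
    for start, length in spans:
        gap = text[prev_end:start]
        # One delimiter in the gap just closes the previous field;
        # before the first field, every delimiter marks a null field.
        empties = gap.count(delim) - (0 if first else 1)
        fields.extend([""] * empties)
        fields.append(text[start:start + length])
        prev_end = start + length
        first = False
    # Delimiters after the last span each open one more (null) field.
    trailing = text[prev_end:].count(delim) + (1 if first else 0)
    fields.extend([""] * trailing)
    return fields
```

The second function is the overhead being complained about: with a null emitted directly by the machine, none of this gap bookkeeping would be needed.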

Add an action code that suspends/pauses the current word.  The next start word 
would append to the current word, skipping any characters that were scanned 
during the pause.  This would allow "deleting" items in the middle of a word 
in a single pass instead of using the 2-pass approach (with the 2nd pass using 
function code 1).  Alternatively, it could function like ev, but if ew is in 
the same state, it discards the elements between startwords.
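
To make the pause idea concrete, here is a minimal single-pass sketch in Python.  It is not ;: and does not use its state tables; it only mimics the proposed behaviour under my own assumptions: an escape character pauses the word and is dropped, the following character resumes the same word, and a delimiter emits it.  The function name and the choice of "\" and "," are hypothetical.

```python
def scan_with_pause(text, delim=",", esc="\\"):
    """Split on delim in one pass, dropping each esc character and keeping
    the character after it as ordinary data."""
    words = []
    pieces = []        # closed pieces of the word being built
    seg_start = None   # start of the open piece; None while paused or idle
    escaped = False
    for i, ch in enumerate(text):
        if escaped:
            seg_start = i              # "start word": resume the paused word
            escaped = False
        elif ch == esc:
            if seg_start is not None:  # "pause": close the piece, skip esc
                pieces.append(text[seg_start:i])
            seg_start = None
            escaped = True
        elif ch == delim:
            if seg_start is not None:
                pieces.append(text[seg_start:i])
            words.append("".join(pieces))   # "emit word"
            pieces, seg_start = [], None
        elif seg_start is None:
            seg_start = i
    if seg_start is not None:
        pieces.append(text[seg_start:])
    words.append("".join(pieces))
    return words
```

For example, scan_with_pause("a\\,b,c") gives ['a,b', 'c'] without a second pass to delete the escape character.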


A custom action code (one interpretation of Henry's inclination, though he may 
have thought of custom function codes) that has a way of inserting a character. 
 This would allow building an escaped sequence by inserting the escape 
character before the last character seen.

Custom action codes would need to return the characters to include (if it is 
not an ew,ev class), and newi, newj at least.  A new function code would be a 
variation on 2, emitting i (i-j), actioncode, though "characters to include" 
would interact directly with function codes 0 and 1. 


A powerful tool for nested structures (see the parenw machine in fsm.ijs that 
builds trees from parentheses groups) would be emitwordandIncreaseDepth and 
emitwordandDecreaseDepth actions.  So, part of the return parameters for 
custom actions would be a code for the action: (noword, word, 
WordincreaseDepth, WordDecreaseDepth, vector)
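
As an illustration of what the two depth actions would buy, here is a small Python sketch that builds a nested list in one pass.  It is not the parenw machine from fsm.ijs; the function name, the whitespace handling, and the mapping of "(" to emit-and-increase-depth and ")" to emit-and-decrease-depth are my assumptions.

```python
def parse_groups(text):
    """Build nested lists from parenthesised, whitespace-separated words."""
    root = []
    stack = [root]   # stack[-1] is the list at the current depth
    word = []        # characters of the word being scanned

    def emit():
        # "emit word": append the pending word, if any, at the current depth
        if word:
            stack[-1].append("".join(word))
            word.clear()

    for ch in text:
        if ch == "(":            # emit word and increase depth
            emit()
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == ")":          # emit word and decrease depth
            emit()
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif ch.isspace():       # plain emit word
            emit()
        else:
            word.append(ch)
    emit()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root
```

With this, parse_groups("a (b (c) d) e") returns ['a', ['b', ['c'], 'd'], 'e'].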




________________________________
From: 'Pascal Jasmin' via Programming <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Sunday, March 19, 2017 12:38 PM
Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)



The idea for double nullchars doesn't work, as there's no way to know whether 
a null is embedded at the end of one "string" or at the beginning of the next 
string.  A null followed by a code for the number of consecutive nulls would 
work, though.  If there are 255 nulls, the code 255 0 would be used; for 510 
consecutive nulls, 255 255 0...
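
A minimal sketch of that count-coded scheme, in Python just to show the bytes.  The count bytes follow the 255 0 / 255 255 0 examples above.  The terminator is NOT specified above; this sketch ASSUMES a string ends with a null whose count is 0 (bytes 0 0), which is one way to keep the format unambiguous.  All names are hypothetical.

```python
NUL = 0


def encode_count(n):
    """Count bytes for a run of n nulls: 255 repeated, then a byte below 255."""
    out = bytearray()
    while n >= 255:
        out.append(255)
        n -= 255
    out.append(n)
    return bytes(out)


def encode_field(data):
    """Encode one string; each run of nulls becomes NUL + count bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == NUL:
            j = i
            while j < len(data) and data[j] == NUL:
                j += 1
            out.append(NUL)
            out += encode_count(j - i)
            i = j
        else:
            out.append(data[i])
            i += 1
    out += b"\x00\x00"   # ASSUMPTION: terminator is a null with count 0
    return bytes(out)


def decode_fields(blob):
    """Inverse of encode_field over a concatenation of encoded strings.
    Raises IndexError on truncated input."""
    fields, cur, i = [], bytearray(), 0
    while i < len(blob):
        b = blob[i]
        i += 1
        if b != NUL:
            cur.append(b)
            continue
        n = 0
        while True:              # sum count bytes until one is below 255
            c = blob[i]
            i += 1
            n += c
            if c < 255:
                break
        if n == 0:               # count 0: end of string
            fields.append(bytes(cur))
            cur = bytearray()
        else:
            cur += bytes(n)      # n embedded nulls
    return fields
```

Under that assumption, a null at the end of one string and a null at the start of the next now encode differently, which is exactly what plain doubling could not do.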




________________________________
From: 'Pascal Jasmin' via Programming <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Sunday, March 19, 2017 11:33 AM
Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)





Assuming that this comes with some improvement for s: then it would be easy to 
favour that improvement.

Things not to like about a global symbol table are that every typo is included, 
and any "app"/set that is loaded joins that table.  AFAIU, corruption happens 
if you create symbols and then restore a table with 10&s:, and so any 
application that relies on 10&s: can crash another previously "loaded 
application".


A problem is that 3!:1, or 3!:3 anyway, seems to just store indexes for 
symbols, which relies on 10 s: for actual persistence.

A suggestion for 3!:1 of symbols would be to scan the array containing symbols 
for null (\0), then store 2&s: if none is found, or 5&s: if there is a \0.  
AFAIU, utf8 is safe here, since it never uses 0 as a byte of a multi-byte 
sequence.
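
A small Python sketch of that scan-then-choose step, only to show the decision and the resulting bytes.  The layout labels and the function name are hypothetical, and the length-prefixed fallback is just a stand-in for whatever non-delimited form (5&s: in the suggestion above) would actually be stored.

```python
def pack_symbols(symbols):
    """Pack symbol names as UTF-8: null-terminated when no name contains a
    null, otherwise a length-prefixed fallback.  Returns (layout, payload)."""
    encoded = [s.encode("utf-8") for s in symbols]
    if any(0 in e for e in encoded):
        # A null in the data would be mistaken for a terminator, so fall
        # back to a 4-byte little-endian length before each name.
        payload = b"".join(len(e).to_bytes(4, "little") + e for e in encoded)
        return "length-prefixed", payload
    return "null-terminated", b"".join(e + b"\x00" for e in encoded)
```

The scan only has to look for the byte 0: in UTF-8 every byte of a multi-byte sequence is 0x80 or above, so a 0 byte can only come from an actual U+0000 in the name.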

An alternative to 5&s: would be a new 8&s: where "data nulls" are encoded 
similarly to embedded ' in strings.  Double nullchars encode a data nullchar.  
A single nullchar encodes the terminating 2&s: nullchar.  This format 
could/would be used for 3!:1.  2&s: could be modified to be the 8&s: proposal.
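
For reference, the doubling rule can be written down in a few lines of Python (function name hypothetical).  Doing so also reproduces the ambiguity pointed out in the follow-up message in this thread: a null at the end of one string and a null at the start of the next produce identical bytes.

```python
def encode_doubled(strings):
    """Double every data null; a single null terminates each string."""
    return b"".join(s.replace(b"\x00", b"\x00\x00") + b"\x00" for s in strings)


# Two different lists of strings...
ends_with_null = encode_doubled([b"x\x00", b"y"])
starts_with_null = encode_doubled([b"x", b"\x00y"])

# ...encode to the same byte sequence: x 0 0 0 y 0
same_bytes = ends_with_null == starts_with_null
```

Here same_bytes is True, so a decoder cannot tell the two inputs apart.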

10&s: could store in this new format for portability.  But the problem of 
previously assigned symbols in session persists, and so a locale level symbol 
table would make the most sense for robustness.  Also, an "application"/locale 
that just uses `true`false symbols (bad example but replace with small set of 
enums), would (presumably) be faster if it didn't share a symbol table with a 
very large symbol array principally used to avoid string fills.


A question about symbols/3!:1... the documentation suggests that indexes are 
limited to 32bit values.  Is that true for j64 too?  Query (new) and query 
(old) are not completely clear in the documentation either; do they differ 
from i. or e. ?



________________________________
From: Henry Rich <[email protected]>
To: Programming forum <[email protected]> 
Sent: Sunday, March 19, 2017 12:14 AM
Subject: [Jprogramming] Show cause hearing - (10 s: y)



Does anyone use (10 s: y)?

It is problematic in that the hash table (0 s: 4) may depend on the CPU
and the J release level.

I would rather decommit (10 s: y) and have the user reload the symbol
table de novo.  Any objections?


Henry Rich

----------------------------------------------------------------------

For information about J forums see http://www.jsoftware.com/forums.htm

