CSV has a lot of issues (consider, for example, quoting and escaping
quotes), and if you wanted ;: to handle csv you would probably think
about adding new operations to deal with the complexity.
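
To make the quoting problem concrete, here is a small sketch (the sample
line is invented for illustration, not taken from any real data): a plain
cut on commas with <;._1 also cuts inside a quoted field.

```j
NB. Hypothetical record: three CSV fields, the middle one quoted.
line =: 'a,"b,c",d'

NB. Prepend the delimiter as the fret and cut on it.
fields =: <;._1 ',' , line

NB. The comma inside the quotes is treated as a delimiter too,
NB. so this counts 4 pieces even though the record has 3 fields.
# fields
```

Handling that case needs state (inside or outside quotes), and doubled
quotes inside a quoted field add another case on top of it.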

Thanks,

-- 
Raul


On Fri, Apr 7, 2017 at 3:03 PM, 'Pascal Jasmin' via Programming
<[email protected]> wrote:
> skipping a character is done "all the time", by resetting j.  It can only
> skip over the beginning of word characters.
>
>> it's hard to imagine a case where empty tokens are meaningful and
>> useful.
>
> csv or other delimited data:
>
> a,,b,c has 4 fields.  one empty.
>
> there'd be no change to the state machine tokenizing J language.  You are
> currently not allowed to emit empty.  The change would not force you to start 
> doing so.
>
>> (from a comprehensibility point of view) to
>> be using <;._1 or <;._2 for that
>
> it's slower.  and the equivalent sj matrix to <;._1 is a single "row".
> ________________________________
> From: Raul Miller <[email protected]>
> To: Programming forum <[email protected]>
> Sent: Friday, April 7, 2017 1:49 PM
> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>
>
>
> I am not sure that empty words make sense with ;:
>
> Each state transition is a character. So, to achieve "empty box" you
> would need to be skipping a character.
>
> (So, for example, right now, you could dedicate a character to be a
> placeholder and then remove all instances of that character from all
> boxes.)
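>
> A rough sketch of that workaround, assuming '~' is a character that
> never occurs in the real data (the boxes here are written by hand, not
> produced by ;:):
>
> ```j
> NB. Hypothetical tokenizer output where '~' stands in for an empty word.
> boxes =: 'ab' ; '~' ; 'cd'
>
> NB. Remove every '~' from inside each box;
> NB. the placeholder box becomes an empty box.
> cleaned =: -.&'~' &.> boxes
>
> NB. 1 if the result contains an empty box.
> a: e. cleaned
> ```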
>
> Anyways, the ;: handles the 'tokenizer' role for the language, and
> it's hard to imagine a case where empty tokens are meaningful and
> useful.
>
> Presumably you do want the empties for some reason, but I am thinking
> it would make more sense (from a comprehensibility point of view) to
> be using <;._1 or <;._2 for that.
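>
> For a line such as a,,b,c, a minimal sketch of those two cuts (no
> quoting or escaping is handled here):
>
> ```j
> NB. <;._1 takes the first character of its argument as the delimiter.
> f1 =: <;._1 ',a,,b,c'
>
> NB. <;._2 takes the last character as the delimiter instead.
> f2 =: <;._2 'a,,b,c,'
>
> NB. Both give 4 boxes, and the second box is empty in each.
> (# f1) , (# f2)
> f1 -: f2
> ```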
>
> Thanks,
>
> --
> Raul
>
>
>
>
> On Fri, Apr 7, 2017 at 1:25 PM, 'Pascal Jasmin' via Programming
> <[email protected]> wrote:
>> So right now, ew when j=-1 is a syntax error.  And, also currently, you can 
>> never emit empty boxes.  If for some reason the intent of your machine is to 
>> never emit empty boxes, then that output will give you a clue that you did 
>> not define it correctly.  No current machine would be affected.  The speed 
>> boost though would be significant compared to the workarounds.  You could 
>> also check with a: e. result if you want to discard all results with an 
>> error.
>>
>>
>>
>>
>> ________________________________
>> From: Raul Miller <[email protected]>
>> To: Programming forum <[email protected]>
>> Sent: Friday, April 7, 2017 1:12 PM
>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>
>>
>>
>> I'm not sure that eliminating syntax errors to get null words is a good idea.
>>
>> --
>> Raul
>>
>>
>> On Fri, Apr 7, 2017 at 12:35 PM, 'Pascal Jasmin' via Programming
>> <[email protected]> wrote:
>>> getting back to the idea of storing symbols by 3!:1 as delimited strings.  
>>> This would both be an improvement in storage, and eliminate the error prone 
>>> dependence on 10 s:
>>>
>>> I've got the esc (uses ;:) method in the latest jpp ( 
>>> https://github.com/Pascal-J/jpp ) to get 2MB/s throughput using 2 passes, 
>>> and an escape code to handle both embedded escapes and nulls, and null 
>>> delimited data/symbols.
>>>
>>> Several improvements to ;: would make this significantly faster and more 
>>> flexible:
>>>
>>> emit null when j=-1, and emitword issued (previously suggested):  This 
>>> allows null fields to easily be "parsed" (current method used is to use 
>>> function code 2, and examine gaps in order to add nulls as a 2nd pass.  
>>> function code 2 is slower than 0, and overhead in calculating gaps, and 
>>> inserting nulls)
>>>
>>> Add an action code that suspends/pauses current word.  Next start word will 
>>> append to current word, skipping any characters that were scanned during 
>>> pause.  This would allow "deleting" items in the middle of a word in a 
>>> single pass instead of using the 2 pass approach (with 2nd pass using 
>>> function code 1).  Alternatively, it could function like ev, but if ew is 
>>> in same state, it discards the elements between startword's.
>>>
>>>
>>> A custom action code (one interpretation of Henry's inclination, though he 
>>> may have thought of custom function codes) that has a way of inserting a 
>>> character.  This would allow building an escaped sequence by inserting the 
>>> escape character prior to last seen.
>>>
>>> Custom action codes would need to return characters to include (if it is 
>>> not an ew,ev class), newi, newj at least.  A new function code would be a 
>>> variation on 2, emit i (i-j), actioncode, though "characters to include" 
>>> would interact directly with function codes 0 and 1.
>>>
>>>
>>> A powerful tool for nested structures (see parenw machine in fsm.ijs that 
>>> builds trees from parentheses groups) would be an emitwordandIncreaseDepth 
>>> and emitwordandDecreaseDepth actions.  So, as part of the return parameters 
>>> for custom actions would be a code for the action: (noword, word, 
>>> WordincreaseDepth, WordDecreaseDepth, vector)
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: 'Pascal Jasmin' via Programming <[email protected]>
>>> To: "[email protected]" <[email protected]>
>>> Sent: Sunday, March 19, 2017 12:38 PM
>>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>>
>>>
>>>
>>> idea for double nullchars doesn't work as there's no way to know if a null 
>>> is embedded at the end of one "string" or beginning of next string.  Though 
>>> null followed by a code of the number of consecutive nulls would work.  If 
>>> there are 255 nulls, the code 255 0 would be used.  510 consecutive nulls 
>>> 255 255 0...
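>>>
>>> One possible reading of that count code, as a hedged sketch (the verb
>>> name runcode is made up here, and the layout of 255 bytes followed by a
>>> remainder byte is an assumption drawn from the two examples above):
>>>
>>> ```j
>>> NB. y is the number of consecutive nulls in a run.
>>> NB. Emit one 255 for each full block of 255 nulls, then the remainder.
>>> runcode =: 3 : '((<. y % 255) # 255) , 255 | y'
>>>
>>> runcode 3      NB. 3
>>> runcode 255    NB. 255 0
>>> runcode 510    NB. 255 255 0
>>> ```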
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: 'Pascal Jasmin' via Programming <[email protected]>
>>> To: "[email protected]" <[email protected]>
>>> Sent: Sunday, March 19, 2017 11:33 AM
>>> Subject: Re: [Jprogramming] Show cause hearing - (10 s: y)
>>>
>>>
>>>
>>>
>>>
>>> Assuming that this comes with some improvement for s: then it would be easy 
>>> to favour that improvement.
>>>
>>> things not to like about a global symbol table are that every typo is 
>>> included, and any "app"/set that is loaded joins that table.  AFAIU, 
>>> corruption happens if you create symbols, and then restore a table with 
>>> 10&s:, and so any application that relies on 10&s: can crash another 
>>> previously "loaded application".
>>>
>>>
>>> A problem is that 3!:1, or 3!:3 anyway, seems to just store indexes for 
>>> symbols, which relies on 10 s: for actual persistence.
>>>
>>> A suggestion for 3!:1 of symbols would be to scan the array containing 
>>> symbols for null (\0), then store 2&s: if not included, or 5&s: if there is 
>>> a \0.  AFAIU, utf8 is safe to not include 0 as an extended byte.
>>>
>>> An alternative to 5&s: would be a new 8&s: where "data nulls" are encoded 
>>> similar to embedded ' in strings.  double nullchars encode a data nullchar. 
>>>  single nullchar encodes terminating 2&s: nullchar.  This format c/would be 
>>> used for 3!:1.  2&s: could be modified to be the 8&s: proposal.
>>>
>>> 10&s: could store in this new format for portability.  But the problem of 
>>> previously assigned symbols in session persists, and so a locale level 
>>> symbol table would make the most sense for robustness.  Also, an 
>>> "application"/locale that just uses `true`false symbols (bad example but 
>>> replace with small set of enums), would (presumably) be faster if it didn't 
>>> share a symbol table with a very large symbol array principally used to 
>>> avoid string fills.
>>>
>>>
>>> A question about symbols/3!:1... the documentation suggests that indexes 
>>> are limited to 32bit values.  Is that true for j64 too?  Query (new) and 
>>> query (old) is not completely clear in documentation either, and does that 
>>> differ from i. or e. ?
>>>
>>>
>>>
>>> ________________________________
>>> From: Henry Rich <[email protected]>
>>> To: Programming forum <[email protected]>
>>> Sent: Sunday, March 19, 2017 12:14 AM
>>> Subject: [Jprogramming] Show cause hearing - (10 s: y)
>>>
>>>
>>>
>>> Does anyone use (10 s: y)?
>>>
>>> It is problematic in that the hash table (0 s: 4) may depend on the CPU
>>> and the J release level.
>>>
>>> I would rather decommit (10 s: y) and have the user reload the symbol
>>> table de novo.  Any objections?
>>>
>>> Henry Rich
>>>
>>> ----------------------------------------------------------------------
>>>
>>> For information about J forums see http://www.jsoftware.com/forums.htm
