Re: [Jprogramming] Dictionaries WAS: Report on the J wiki meeting of January 27, 2022

'Pascal Jasmin' via Programming Sun, 06 Feb 2022 13:34:33 -0800

You covered some of the issues with a data encapsulated class approach like 
yours.

The big issue for me is that your set verb returns 0 0 $0, but even if it 
returned the object reference, J is poor at compound expressions that operate 
on an object.  You need to pass strings to what effectively becomes a DSL.

The new j903 modifier trains are useful here, but the result is still messy:

d=: dict 'abc';1 2 3

loc_z_=: (,&'_'@[ ,&'_'@, ":@>@])"1 0 boxopen
in_z_ =: ([. loc ].)~

NB. parameterizing dictionary as an adverb for lhs of fork, and hard coding
NB. on rhs
d ('gf' in ]: + 'gf' (in d)) 'a'

2

but if set returned an object, having a verb that operated on that object would 
require explicit code  (__y will work) to be simple.

Then there is the issue of a set operation that doesn't want the "forced side 
effect" of permanently altering the object, and instead wants a copy to be 
used temporarily.  The same applies to a filter/query operation that returns 
multiple "records".

Instead of a data encapsulated class, functions that operate on inverted tables 
would allow returning a new/subset of the "data".  This adds extra work to 
save the result, but a class needs extra work to copy the object in order to 
modify only the copy.  Deciding in advance whether the caller will want to 
overwrite the original dictionary seems above the paygrade of a function 
operating on inverted tables.  With a class you must also remember to destroy 
the copy in your code when it is supposed to be discarded (actually a hard 
problem that would need its own dsl to solve all "responsibility 
combinations").  And then J has unfriendly access problems when operating with 
an object parameter to a function that is not an explicit function.

J's strengths come from its functional approach.  Returning a new copy of data 
is functional.  It is very easy in J, especially in console, to modify the 
previous line of code such that it assigns a new result value to existing or 
new variable names.  Double checking that the function works properly before 
overwriting "production" or lesser data is a prudent approach I'd recommend 
100% of the time.  J's impure functional approach is also the perfect 
functional approach.  Pure (never side effect) functions inside, but the last 
caller/user (outside) decides on what side effects to make.
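
A minimal sketch of that pure-inside, side-effect-outside style, assuming a 
two-column keyed table stored as boxed keys ; numeric values.  The names t, 
t2, get and set are hypothetical and are not taken from any existing dict 
implementation:

```j
NB. Hypothetical keyed table: (boxed string keys) ; (numeric values)
t =: (;:'a b c') ; 1 2 3

NB. get: x is the table, y is a list of boxed keys
get =: 4 : '((>{.x) i. y) { >{:x'

NB. set: x is the table, y is (boxed keys) ; (values)
NB. Returns a NEW table; x itself is never modified.
set =: 4 : 0
'k v' =. x                     NB. unpack key and value columns
'nk nv' =. y                   NB. unpack incoming keys and values
k =. ~. k , nk                 NB. append only the keys not already present
v =. nv (k i. nk)} (#k) {. v   NB. pad the value column, then amend by key index
k ; v
)

t2 =: t set (;:'b d') ; 20 40  NB. the caller decides where the result goes
t get ;:'b'                    NB. 2       (original untouched)
t2 get ;:'b d'                 NB. 20 40
```

Overwriting the original is then a deliberate choice made on the last line 
typed, for example t =: t set ..., rather than something the function forces.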

An inverted table argument makes it easy to write functions that operate on 
that y argument inverted table.  An encapsulated class makes that difficult to 
extend.  I still think "keyed table" (multi column dictionary including 
potential multicolumn keys uniquely identifying a record) is the right 
approach to a generalized dictionary, and most (90%+) column use cases would be 
uniformly typed.  A defining property of dictionaries is access by full key 
match, which necessarily brings symbols as an optimization feature of fields. 
But even for a dictionary/keyed table, general query access is a nice to have 
that you get with inverted tables, along with an ability to convert to/from 
symbols when "necessary".

A class based approach to keyed tables is possible and easiest to create.

I've mentioned a general datastructure framework, which is metadata about the 
data in one box, data in the other.  Metadata is a "property dictionary" where 
values are data or functions.  A string encoding is possible especially if 
there is a "class type" field that directs the encoding/decoding, but encoding 
values as boxed items to distinguish among different types/classes of values 
and functions is also an option.  There is an easyish 1:1 mapping between a 
metadata structure about data, and a class definition that references DATA 
variable, or better yet, use data that is expected to conform to metadata 
understanding of the data as its y function parameter.  This necessarily makes 
this approach exactly as easy as the first.  Write a class, and use it either 
as a class or as a metadata described structure (data), to be chosen by the 
user.
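
One way to picture the metadata-in-one-box, data-in-the-other layout is the 
sketch below.  It assumes the metadata is a two-column boxed table of property 
name and value.  The names meta, data, ds, prop and cols, and the CLASS 
property, are hypothetical; only the FIELDS keyword comes from this thread:

```j
NB. Hypothetical metadata: property name ; value, one row per property
meta =: ('FIELDS' ; 'name qty') ,: 'CLASS' ; 'keyedtable'

NB. Hypothetical data: an inverted table with one box per column
data =: (;:'apple pear') ; 3 5

NB. The datastructure: box 0 holds metadata, box 1 holds data
ds =: meta ; < data

NB. prop: x is the datastructure, y is a property name (string)
prop =: 4 : 0
m =. > {. x                        NB. open the metadata table
> ({:"1 m) {~ ({."1 m) i. < y      NB. value whose name matches y
)

ds prop 'CLASS'                    NB. keyedtable
cols =: ;: ds prop 'FIELDS'        NB. column names as boxed words
> (cols i. <'qty') { > {: ds       NB. 3 5   (the qty column)
```

Because ds is an ordinary noun, a function taking it as y can return a 
modified copy without touching the original.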

A third option, especially if it applies just to keyed tables, is having a 
dsl/description of the inverted table structure as an adverb parameter.  An 
adverb allows for optimization in the returned verb/modifier. To optimize get 
(your valuable feature of your dict class), you only need to know the table 
constraints/definitions.  set using a datastructure definition can generate a 
(pre)validation of input, along with informative descriptions for why elements 
fail if they do.  A multi column dictionary description dsl would look like:

key: ... value: ...  NB. where ... is a list of fields with attributes 
(reserved words not allowed as field names) as follows:

colname: u(nique): s(orted): type: or b(oxed): (optional if first item 
determines type.  But benefits optimization if provided in dictionary 
description)

single line definition potential is a huge convenience for both copy/edit 
coding, and console simplicity.

So a generic get (by whole field match) is an adverb that first uses 'keyed 
table def' get, but then takes a column list (indexes or colnames) that permits 
an indexing optimization step on that index (m&i. where m is the column 
parameter).  When a single column is passed, all keys in y are used to 
retrieve records (one for each key passed).  When multiple columns are part 
of the final adverb parameter, y is expected as boxed values for each 
column, and all records with a key match are retrieved.  It is possible to 
choose (with an additional (named) adverb) that if only one record is in the 
dataset, just raw values instead of the full dictionary structure are returned.
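
A small sketch of the single-column case, assuming the key column and value 
column are plain nouns.  The names keys, vals, lookup and get are 
hypothetical:

```j
keys =: ;:'a b c'
vals =: 1 2 3

NB. Binding the key column lets J prepare the search once, at bind time
lookup =: keys&i.

NB. N V V fork: vals {~ (lookup y), one value per key in y
get =: vals {~ lookup

get ;:'c a'              NB. 3 1
```

The bound forms capture the values of keys and vals when they are defined, so 
after keys are added or removed both definitions have to be executed again 
before get sees the change.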

A metadata encoded datastructure seems superior to the adverb dsl processor in 
that an adverb dsl processor could with a preceding adverb interpret any 
meta+data parameter with just the metadata portion that allows it to operate on 
any other similar structured/metadata'd data.

The end goal of an approach, IMO, should be to create improvements to J in 
terms of generic inverted table functions, with some specific improvements 
already identified in this thread:

'column list' { meta-described-dictionary NB. use FIELDS metadata keyword that 
contains symbol data, to retrieve column indexes (or other potential use of 
FIELDS duck named variable specific to datastructure) referenced in string.

&:: =: bind =: (& @: ;) new modifier train such that dyadic m&:: f and f &::n 
are (m&f)(@:;) or (f&n)(@:;).

J already gives bound =: (f&n) or (m&f) a special dyadic interpretation, 
bound^:x y.  The above enhancement would allow an interpretation of bound(@:;) 
which allows writing f for 3 arguments, i.e. compound 2 boxed x or y arguments, 
but allows the user to provide the compound part as dyadic unboxed arguments. 
&:: compounded allows even more arguments.  If x takes 3 (boxed) arguments then 
arg0&::f&::y applied dyadically has x as arg1 and y as arg2.  If applied 
monadically, then the 3rd x argument (arg2) to f would be missing, and f 
could/would deal with it.  Compounding &:: calls would increase the arity of 
functions from 3 to higher than 3 parameters.

This feature would also allow optimizing inside f.  If f is explicit then any 
line that is varname =: f x (if m&f is bound) or f y (if f&n is bound) can be 
optimized, the ideal structure being x =. f x or y =. f y internally, as proof 
that the original x can be discarded.  If f is implicit, then any u@] or u@[ 
can be optimized away to a constant based on m&::f or f&::n, and if N V N 
occurs as a result of that optimization, then that too can be optimized into a 
constant.

What the above allows beyond syntax sugar for more than 2 parameter verbs, is 
not having to resort to self-written-code optimizations inside adverbs.  verbs 
can self optimize based on bound parameters (when for example (m i. ]) has the 
same optimization as m&i.).

> Lua table references

I've been thinking of k/q as the guiding model.  Lua's variant (boxed) key and 
variant (boxed) values tables have the simplicity of storing every potential 
scenario, but as a dictionary implementation, would provide a strong incentive 
to avoid the dictionaries for performance reasons.  If you wanted to use a 
dictionary as a key, in J, you could use a linear representation of that 
dictionary in order to keep all keys as strings.

But, sorry for repeating, a boxed/variant column type can coexist alongside 
uniformly typed columns.

Metadata (not at all Lua interpretation) would instead specify types and 
attributes of inverted table columns in the case of keyed tables.  But also 
(kinda like Lua) include optimized/specified functions related to data.

In general, I'd also say that access_keys_ being limited to valid spaceless J 
naming conventions is not a huge sacrifice for accessnames.  Extending to 
spaceless unicode strings is not an ease of use problem if the user wants 
unicode keys, though it would interfere with that 1:1 J locale/classname 
mapping of datastructure metadata.

On Sunday, February 6, 2022, 09:52:04 a.m. EST, Jan-Pieter Jacobs 
<janpieter.jac...@gmail.com> wrote: 

Hi Pascal,
I responded inline below:

> A workaround is to optimize SET, ADD, UPDATE, DEL for bulk operations
> (multiple items processed at once  (] F..) super useful), and after bulk
> operations, "redefine"  (just repeat execution of same definition) GET such
> that any m&i. updates.  Also update FILTER functions (GET multiple) if they
> gain from static binding optimization.
>

This is, if I get it correctly, exactly what my dict implementation (
https://github.com/jpjacobs/types_dict) does: it allows
setting/updating/removing multiple keys and the lookup verbs used are
updated only if there is a change in keys

>
> An approach that just presumes key uniqueness instead of enforcing it, is
> for GET to be based on i: instead of i. and then any ADD with a duplicate
> key effectively will return the last updated/added values.
>

This would gather a lot of garbage and would lose the advantage of
in-place updating.
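
A tiny sketch of the difference, with hypothetical keys and vals nouns in which 
key a has been added twice:

```j
keys =: ;:'a b a'            NB. 'a' was added a second time
vals =: 1 2 3

(keys i. <,'a') { vals       NB. 1   i. finds the first match
(keys i: <,'a') { vals       NB. 3   i: finds the last match
```

The superseded pair a ; 1 stays in both columns until something compacts them, 
which is the accumulating garbage referred to above.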

>
> Back to generic datastructure, everything a class can do is possible
> within a datastructure.  All administrative "properties" (names) and their
> associated values including functions can be encoded in a dictionary,
> including a string representation dsl for representing "name values" with
> ease as to function/data.  What specializes a datastructure over a "mere"
> class is the concept of existential data held by the datastructure that a J
> user would want complete access to that data.  In a class based
> implementation, a universal name data =: holds the core data that the J
> programmer would want access to.  Usually, it is compound greater than
> atomic data that can be represented as inverted tables of "linked data".
> And part of the data specifying dsl's purpose is to include descriptions
> that permit any possible optimizations that include what k/q's attributes
> do (sorted, unique), but with extensible dsl, any other
> implications/constraints on the data can use/select a specific
> implementation of universally named "accessors"/functions
>

> So a datastructure contains 2 boxes:  1st holds the name of the
> datastructure class (for lookup value of any metadata of that classname),
> and all administrative properties, and specialized functions for
> GET/ADD/DELETE and other functions expected to have meaning relative to its
> "existential" data, and the 2nd box holds the (likely compound and so extra
> boxed) "data"
>
> An advantage of a compound datastructure over a class is the user gets to
> decide whether to overwrite the "permanent" data while still having access
> to SET/DEL/ADD functionality of their own copy they may want for their
> application/data needs. It is also possible for generic GET/ADD/DELETE to
> query the datastructure as to how it can best accomplish its integral
> functionality, should there not be a specialized version defined in the
> datastructure, and GET as an adverb that takes either '',
> datastructure_name, or a specific instance of datastructure can optimize
> itself as a first step, or one that can be bound to an optimized named
> function, or if '' is the adverb parameter to GET, then the generic verb
> "inspect y for datastructure properties" before selecting implementation is
> returned.
>

I think these ideas are pretty much what Lua implements with its tables
(dictionaries that can contain anything as keys and values, joined by their
metatables, i.e. tables that can contain functions to override e.g.
indexing operations). These tables do everything: from working as locales
(function environments), over separating modules (our addons) to
implementing OOP (making liberal use of the __call metamethod, specifying
what happens if you call a table as if you were calling a function, and
__index, specifying what happens if you try to get a non-existent key in a
table).

In my view, the problem with a locale-based dict implementation like mine
is currently that you cannot nest dicts without losing generality.
As numbered locales are referred to by boxed numbers, you could make a
special case for these in your implementation, but would evidently lose
the possibility to store boxed numbers. Even when adding checks to whether
a boxed number is a locale, one cannot be sure the user intended to refer
to a locale or actually wanted to store a boxed number.

One could think of using the locales themselves as dicts, but there you'd
have the problem that:
- only valid names can be keys
- referring to values is only possible with dict__key, which precludes
doing so tacitly.

For such implementation to work, one could (note, I have no clue about the
implementation itself :p):
- make a datatype only for referring to locales
- implement indexing into that type with {:: following more or less the
same idea as indexing with {::
- provide a verb to amend along the same lines
- have a conjunction DoneIn that allows something like verb DoneIn mylocale
(could be called 'of' as well)
- allow any value as "name" in locales.

Like that, implementing a dict that allows storing arbitrary keys and
values, nesting dicts and even self-reference, reference loops etc, using
locales would become possible.

In the end, I guess this would end up at about the same functionality as
Lua does for tables… so I don't know what's more effort: implementing
everything in J/C, or binding Lua. There's been a time I would have loved
to have Lua instead of J's explicit language, but I guess that would end up
as a different language :).

Jan-Pieter

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm