Re: [Jprogramming] Dictionaries WAS: Report on the J wiki meeting of January 27, 2022

'Pascal Jasmin' via Programming Tue, 01 Feb 2022 08:53:35 -0800

The distinction between an x-column table and an x-column dictionary (as an
inverted table) is unique keys: unique values that make up one or more
columns.  While a strict definition of a dictionary is a data structure that
supports just value lookup through keys, the ability to make any query at all
on the data is an important "nice to have".
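
To make the distinction concrete, here is a minimal Python sketch (not J, and
not a proposed implementation).  The table contents, the column names and the
helpers key_rows, is_dictionary, lookup and query are all hypothetical, chosen
only for illustration.

```python
# Hypothetical inverted table: one list per column, rows aligned by position.
table = {
    "customer": ["ann", "bob", "ann"],
    "order_id": [1, 1, 2],
    "total": [10.0, 25.5, 7.25],
}
KEY_COLUMNS = ("customer", "order_id")  # assumed key: two columns together


def key_rows(tbl, key_cols):
    """Return the key tuple of every row, in row order."""
    return list(zip(*(tbl[c] for c in key_cols)))


def is_dictionary(tbl, key_cols):
    """A table qualifies as a dictionary when its key tuples are unique."""
    keys = key_rows(tbl, key_cols)
    return len(keys) == len(set(keys))


def lookup(tbl, key_cols, key):
    """Value lookup by key: return the non-key fields of the matching row."""
    row = key_rows(tbl, key_cols).index(key)  # raises ValueError if absent
    return {c: v[row] for c, v in tbl.items() if c not in key_cols}


def query(tbl, column, predicate):
    """General query: row indices where a column satisfies a predicate."""
    return [i for i, v in enumerate(tbl[column]) if predicate(v)]
```

With these sample rows, "customer" alone is not a valid key (ann appears
twice), while the pair (customer, order_id) is.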

Variant columns, as opposed to single-typed columns, are a valid form of
dictionary, especially when values can hold other dictionaries, are trees of
some type, or include a variable-length list as a field (though in J, this is
considered typed with a "special fill code").

Typed columns can also represent a tree structure (which differs from a graph
in that bottom nodes never point back up the tree to create a cycle), where a
multicolumn key includes parent and child keys, with a "null" child coding for
leaves (childless nodes), or a "parent node" can include a value field that is
a list of all of its direct child node keys.
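
A rough Python sketch of the two encodings just described, using a made-up
four-node tree.  The column names and the helpers children_by_pairs, leaves
and children_by_list are assumptions for illustration only.

```python
# Encoding 1: a (parent, child) key per row; a None child marks a leaf.
parent = ["root", "root", "a", "b", "c"]
child = ["a", "b", "c", None, None]


def children_by_pairs(node):
    """Direct children of a node, read from the (parent, child) columns."""
    return [c for p, c in zip(parent, child) if p == node and c is not None]


def leaves():
    """Nodes coded as childless via the null child."""
    return [p for p, c in zip(parent, child) if c is None]


# Encoding 2: one row per node; the value field lists its direct child keys.
node = ["root", "a", "b", "c"]
kids = [["a", "b"], ["c"], [], []]


def children_by_list(n):
    """Direct children of a node, read from its list-valued field."""
    return kids[node.index(n)]
```

Both encodings answer the same question; the first keeps every column simply
typed, while the second needs a variable-length list field.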

In keeping with DBMS structure, table/dictionary relationships can be
implemented as one-to-many (customer -> orders) or many-to-many (customer ->
orders -> product skus) relationships when the "child" association has a
different set of fields than the parent table/dictionary.  Many-to-many
(graph) relationships are handled with an intermediate table/dictionary that
pairs the relationship keys between the 2 tables/dictionaries.
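
As a sketch of the intermediate table idea (in Python, with invented data),
the hypothetical order_items table below holds nothing but key pairs linking
an orders table to a skus table.  All table names, column names and helper
functions are assumptions.

```python
# Two hypothetical tables, each with its own unique key column.
orders = {"order_id": [1, 2], "customer": ["ann", "bob"]}
skus = {"sku": ["X1", "Y2"], "name": ["widget", "gadget"]}

# Intermediate table: one row per (order, sku) pairing, nothing else.
order_items = {"order_id": [1, 1, 2], "sku": ["X1", "Y2", "X1"]}


def skus_for_order(order_id):
    """All sku keys paired with one order."""
    pairs = zip(order_items["order_id"], order_items["sku"])
    return [s for o, s in pairs if o == order_id]


def orders_for_sku(sku):
    """All order keys paired with one sku (the reverse direction)."""
    pairs = zip(order_items["order_id"], order_items["sku"])
    return [o for o, s in pairs if s == sku]
```

The relationship can be walked in either direction with the same cost, and no
sku row is duplicated.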

The alternative, simple implementation of trees is to represent parent-child
relationships by embedding a tree/dictionary as the value of the parent key.
The disadvantage is that retrieval is hard unless you query only in parent
hierarchy order.  The trouble with implementing graphs as embedded
graphs/dictionaries is duplicated nodes.  But it is possible to keep a
dictionary of customers in which each customer's orders are a subdictionary,
where the orders subdictionary just holds the sku keys, and any extended
information about product skus is kept in another table.
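
A small Python sketch of that embedded layout, with invented data.  The names
customers, sku_info and customers_who_bought are hypothetical.

```python
# Customers -> orders subdictionary -> list of sku keys only.
customers = {
    "ann": {1: ["X1", "Y2"], 2: ["X1"]},
    "bob": {3: ["Y2"]},
}

# Extended sku information lives in a separate table, so it is not duplicated.
sku_info = {"X1": "widget", "Y2": "gadget"}

# Easy: querying in parent hierarchy order is a direct path.
ann_order_2 = customers["ann"][2]


def customers_who_bought(sku):
    """Hard direction: every customer and every order must be scanned."""
    return sorted(
        name
        for name, orders in customers.items()
        if any(sku in items for items in orders.values())
    )
```

The top-down lookup is a direct path, while the query against the hierarchy
order has to visit the whole structure.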

I'll point out that IMO, an associative array differs from a dictionary in
that there is no requirement for unique keys, just an ability to look up one
set of values from 1 column by querying the other column(s).  The many-to-many
relationship from DBMS is typically an associative array implementation.
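
In Python terms, this reading of an associative array could be sketched as two
aligned columns where a lookup returns every match rather than a single value.
The data and the helper lookup_all are made up for illustration.

```python
# Two aligned columns with repeated values on both sides (no unique key).
customer = ["ann", "ann", "bob"]
sku = ["X1", "Y2", "X1"]


def lookup_all(key_column, value_column, key):
    """Return every value whose row matches the key; may be 0, 1 or many."""
    return [v for k, v in zip(key_column, value_column) if k == key]
```

Either column can serve as the query side, e.g. lookup_all(customer, sku,
"ann") or lookup_all(sku, customer, "X1").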

The reason I'm pushing for consistently typed, column-based dictionaries is
that they are the most flexible to query, the easiest to maintain "integrity"
for, and offer the highest performance and lowest space.  BLOBs that must be
decoded to query are slow.  It is perhaps more manageable if there is a single
type of "blob", such as an embedded dictionary/table with set fields, or a
duck-typed query response (complexity encoded in your query functions).

But this push for typing still allows a "boxed type", which allows any
"craziness" the dictionary user wishes for.

On Tuesday, February 1, 2022, 08:37:08 a.m. EST, ethiejiesa via Programming 
<[email protected]> wrote: 

Henry Rich <[email protected]> wrote:
> I think I agree with all your statements, but you are not responding to 
> my questions, which will help focus the discussion:
> 
> 1. What is a Dictionary, EXACTLY?

FWIW, I find myself confused by this question.

Maybe I am just failing to pick up on an implicit convention here, but an
"exact definition" is only precise/unambiguous relative to a given base
reality. Case in point, defining a dictionary as a mathematical function from
a set of keys to a set of values, is precise and exact in the sense of ZFC, but
I suspect it's not particularly useful in this situation.

What particular "base reality" is best to define against here?

Maybe negative answers could help clarify things?
- Why is a 2-column inverted table, together with appropriate access idioms,
  not a dictionary?
- HPC folk have been representing trees as cleverly-arranged arrays for years,
  apparently. Why are tries [0] not dictionaries?

I suspect that answers might include discussion about specific performance
issues. So maybe complexity bounds on operations need to be specified for a
sufficiently exact answer?

Or maybe J primitives already suffice, but we're simply noting that the
appropriate idioms have a really steep learning curve and that we want a more
beginner-approachable alternative?

Wonder what Roger Hui would say... :/
https://www.jsoftware.com/papers/APLHashingModel.htm

[0]:https://en.wikipedia.org/wiki/Trie#Replacing_other_data_structures
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm