[Factor-talk] Some sequence `depth` ideas

cat via Factor-talk Sun, 05 Jan 2020 18:00:27 -0800

I've been experimenting with some recursive definitions of sequence "depth". My 
main question is:


- whether the same (or similar adaptable) functionality already exists by a 
different name somewhere in the standard library and
- which vocabulary (sequences, -.extras, -.deep) they should be in, if I add 
them to the standard library in a PR

The primary application (especially of leaf-depth, but the others too) is for 
generalised N-dimensional, possibly-heterogeneous, possibly-misshapen matrix 
words (e.g. working with non-mathematical 5-dimensional matrices), but are 
obviously applicable to generalised sequences, and do not belong in 
math.matrices.

Help me find nonsense and logical inconsistencies in this summary, which the 
implementation should match:

depth is maximum depth of all subsequences, disregarding sibling elements / 
homogeneity of 
[branches](https://docs.factorcode.org/content/word-branch__que__,sequences.deep.html)
 and leaves (non-branches). Its semantics consider the outer containing 
sequence itself (even when empty) to be a "layer" and therefore counted, and 
considers leaves to also be a layer. A flat sequence of non-sequences [leaves; 
scalars] has a depth of 2. Subtracting 1 to discount containing sequences is 
not good enough; an alternative definition for depth* would/will be needed 
which starts at 0.

homogenetic-depth is the number of layers to which the sequence is homogeneous, 
i.e. a subsequence [layer] has only branches, or only leaves for elements. If a 
subsequence immediately contains another sequence and a non-sequence, that 
subsequence mixes branches and leaves, and is not homogeneous, so layer depth 
counting must stop on the parent sequence. depth and this word work the same 
for empty and deep-homogeneous sequences, but if any of its input's 
subsequences are ever heterogeneous, this word will stop counting whereas 
`depth` will find the bottom of heterogeneous subsequences.

Lastly, leaf-depth ("scalar depth") is similar to homogeneous-depth with only 2 
differences:

- Counting layers starts at 0 instead of 1; this means an empty input sequence 
is considered to have a leaf-depth of 0, because an empty sequence has no 
leaves.

- The counter is incremented every time the word is called and the input isn't 
empty, whereas in homogenetic-depth it's only incremented when the sequence is 
homogeneous.

This is because leaf-depth wants to consider heterogeneous layers (where 
branches and leaves are in the same parent sequence) as containing only 
non-sequence elements. In other words, if a sequence is beside a non-sequence 
on a layer, the sequence is treated as an opaque object like a string, even if 
it is iterable.

This allows, e.g., a heterogeneous matrix to have numerical and array elements, 
and treat them simply as sub-elements (albeit heterogeneous), instead of 
considering { { 1 { 2 } } } to be a misshapen 3-D matrix. Conversely, the basic 
depth definition is consistent with considering that a misshapen matrix.

If you can optimise these definitions, especially helping to write them in a 
tail-call-optimizable way, I'd appreciate that input as well. It does not seem 
like unless-empty can allow for TCO, and the way (depth) recurses is very 
unfortunate.

USING: kernel sequences sequences.extras sequences.deep sequences.private 
grouping
math combinators.short-circuit tools.test ;
IN: depth

<PRIVATE

: (depth) ( n seq -- n )
  [
    [ 1 + ] dip [ dup branch? [ (depth) ] [ drop ] if ] with map supremum
  ] unless-empty ; recursive foldable

: (homogenetic-depth) ( n seq -- n )
  [
    dup [ branch? ] map [ first-unsafe ] [ all-equal? ] bi [
      [ 1 + ] 2dip [ flatten1 (homogenetic-depth) ] [ drop ] if
    ] [
      2drop
    ] if
  ] unless-empty ; recursive foldable

: (leaf-depth) ( n seq -- n )
  [
    [ 1 + ] [ dup [ branch? ] map [ first-unsafe ] [ all-equal? ] bi ] bi* [
      [ flatten1 (leaf-depth) ] [ drop ] if
    ] [
      2drop
    ] if
  ] unless-empty ; recursive foldable

PRIVATE>

: depth ( seq -- n )
  1 swap (depth) ; inline

! depth to which the sequence is homogeneous of branches or leaves
! stop counting when it stops being homogeneous
: homogenetic-depth ( seq -- n )
  1 swap (homogenetic-depth) ; inline

! scalar depth, similar but distinct to homogenetic depth
! i.e., when a subsequence is heterogeneous (mixed branches and leaves),
! then algorithms targeted at homogeneity and/or matrices consider branches
! that are beside leaves to be leaves too, i.e. subsequences that are on a 
layer with
! other non-sequences are treated as scalar elements rather than sequence data 
with
! sub-elements
: leaf-depth ( seq -- n )
  0 swap (leaf-depth) ; inline

USE: arrays
IN: depth.tests

{ 1 } [ { } depth ] unit-test
{ 1 } [ { } homogeneous-depth ] unit-test
{ 0 } [ { } leaf-depth ] unit-test

{ 2 } [ { 1 } depth ] unit-test
{ 2 } [ { 1 } homogeneous-depth ] unit-test
{ 1 } [ { 1 } leaf-depth ] unit-test
{ 1 } [ { 1 2 } leaf-depth ] unit-test
{ 1 } [ 1 100 <array> leaf-depth ] unit-test

{ 3 } [ { { 1 } } depth ] unit-test
{ 3 } [ { { 1 } } homogeneous-depth ] unit-test
{ 2 } [ { { 1 } } leaf-depth ] unit-test

{ 3 } [ { { 1 } { 2 } } depth ] unit-test
{ 3 } [ { { 1 } { 2 } } homogeneous-depth ] unit-test
{ 2 } [ { { 1 } { 2 } } leaf-depth ] unit-test

{ 3 } [ { { 1 } 2 } depth ] unit-test
{ 1 } [ { { 1 } 2 } homogeneous-depth ] unit-test
{ 1 } [ { { 1 } 2 } leaf-depth ] unit-test
{ 2 } [ { { 1 } { { 2 } } } leaf-depth ] unit-test

{ 2 } [
  {
    { 1 { 2 } }
    { { 3 } 4 }
  } leaf-depth
] unit-test

_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk

[Factor-talk] Some sequence `depth` ideas

Reply via email to