I've been experimenting with some recursive definitions of sequence "depth". My
main question is:
- whether the same (or similar adaptable) functionality already exists by a
different name somewhere in the standard library and
- which vocabulary (sequences, -.extras, -.deep) they should be in, if I add
them to the standard library in a PR
The primary application (especially of leaf-depth, but the others too) is for
generalised N-dimensional, possibly-heterogeneous, possibly-misshapen matrix
words (e.g. working with non-mathematical 5-dimensional matrices), but are
obviously applicable to generalised sequences, and do not belong in
math.matrices.
Help me find nonsense and logical inconsistencies in this summary, which the
implementation should match:
depth is maximum depth of all subsequences, disregarding sibling elements /
homogeneity of
[branches](https://docs.factorcode.org/content/word-branch__que__,sequences.deep.html)
and leaves (non-branches). Its semantics consider the outer containing
sequence itself (even when empty) to be a "layer" and therefore counted, and
considers leaves to also be a layer. A flat sequence of non-sequences [leaves;
scalars] has a depth of 2. Subtracting 1 to discount containing sequences is
not good enough; an alternative definition for depth* would/will be needed
which starts at 0.
homogenetic-depth is the number of layers to which the sequence is homogeneous,
i.e. a subsequence [layer] has only branches, or only leaves for elements. If a
subsequence immediately contains another sequence and a non-sequence, that
subsequence mixes branches and leaves, and is not homogeneous, so layer depth
counting must stop on the parent sequence. depth and this word work the same
for empty and deep-homogeneous sequences, but if any of its input's
subsequences are ever heterogeneous, this word will stop counting whereas
`depth` will find the bottom of heterogeneous subsequences.
Lastly, leaf-depth ("scalar depth") is similar to homogeneous-depth with only 2
differences:
- Counting layers starts at 0 instead of 1; this means an empty input sequence
is considered to have a leaf-depth of 0, because an empty sequence has no
leaves.
- The counter is incremented every time the word is called and the input isn't
empty, whereas in homogenetic-depth it's only incremented when the sequence is
homogeneous.
This is because leaf-depth wants to consider heterogeneous layers (where
branches and leaves are in the same parent sequence) as containing only
non-sequence elements. In other words, if a sequence is beside a non-sequence
on a layer, the sequence is treated as an opaque object like a string, even if
it is iterable.
This allows, e.g., a heterogeneous matrix to have numerical and array elements,
and treat them simply as sub-elements (albeit heterogeneous), instead of
considering { { 1 { 2 } } } to be a misshapen 3-D matrix. Conversely, the basic
depth definition is consistent with considering that a misshapen matrix.
If you can optimise these definitions, especially helping to write them in a
tail-call-optimizable way, I'd appreciate that input as well. It does not seem
like unless-empty can allow for TCO, and the way (depth) recurses is very
unfortunate.
USING: kernel sequences sequences.extras sequences.deep sequences.private
grouping
math combinators.short-circuit tools.test ;
IN: depth
<PRIVATE
: (depth) ( n seq -- n )
[
[ 1 + ] dip [ dup branch? [ (depth) ] [ drop ] if ] with map supremum
] unless-empty ; recursive foldable
: (homogenetic-depth) ( n seq -- n )
[
dup [ branch? ] map [ first-unsafe ] [ all-equal? ] bi [
[ 1 + ] 2dip [ flatten1 (homogenetic-depth) ] [ drop ] if
] [
2drop
] if
] unless-empty ; recursive foldable
: (leaf-depth) ( n seq -- n )
[
[ 1 + ] [ dup [ branch? ] map [ first-unsafe ] [ all-equal? ] bi ] bi* [
[ flatten1 (leaf-depth) ] [ drop ] if
] [
2drop
] if
] unless-empty ; recursive foldable
PRIVATE>
: depth ( seq -- n )
1 swap (depth) ; inline
! depth to which the sequence is homogeneous of branches or leaves
! stop counting when it stops being homogeneous
: homogenetic-depth ( seq -- n )
1 swap (homogenetic-depth) ; inline
! scalar depth, similar but distinct to homogenetic depth
! i.e., when a subsequence is heterogeneous (mixed branches and leaves),
! then algorithms targeted at homogeneity and/or matrices consider branches
! that are beside leaves to be leaves too, i.e. subsequences that are on a
layer with
! other non-sequences are treated as scalar elements rather than sequence data
with
! sub-elements
: leaf-depth ( seq -- n )
0 swap (leaf-depth) ; inline
USE: arrays
IN: depth.tests
{ 1 } [ { } depth ] unit-test
{ 1 } [ { } homogeneous-depth ] unit-test
{ 0 } [ { } leaf-depth ] unit-test
{ 2 } [ { 1 } depth ] unit-test
{ 2 } [ { 1 } homogeneous-depth ] unit-test
{ 1 } [ { 1 } leaf-depth ] unit-test
{ 1 } [ { 1 2 } leaf-depth ] unit-test
{ 1 } [ 1 100 <array> leaf-depth ] unit-test
{ 3 } [ { { 1 } } depth ] unit-test
{ 3 } [ { { 1 } } homogeneous-depth ] unit-test
{ 2 } [ { { 1 } } leaf-depth ] unit-test
{ 3 } [ { { 1 } { 2 } } depth ] unit-test
{ 3 } [ { { 1 } { 2 } } homogeneous-depth ] unit-test
{ 2 } [ { { 1 } { 2 } } leaf-depth ] unit-test
{ 3 } [ { { 1 } 2 } depth ] unit-test
{ 1 } [ { { 1 } 2 } homogeneous-depth ] unit-test
{ 1 } [ { { 1 } 2 } leaf-depth ] unit-test
{ 2 } [ { { 1 } { { 2 } } } leaf-depth ] unit-test
{ 2 } [
{
{ 1 { 2 } }
{ { 3 } 4 }
} leaf-depth
] unit-test
_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk