It would help to see the code that was failing on symbols and boxes. One
quick solution is x ({.,#)/. i.#x; that gives a two-column table whose first
column gives the index of each distinct item's first occurrence and whose
second column gives the number of times that item occurs. Faster is to
separate #/.~x and I.@:~:x, building the count and index lists separately.
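
A minimal sketch of the above, assuming the data is a list of boxed words
(the sample sentence is made up for illustration):

```j
NB. Hypothetical sample data: a list of 8 boxed words.
x =: ;: 'the cat saw the dog and the cat'

NB. One row per distinct word: index of first occurrence, then count.
x ({. , #)/. i. # x
NB. 0 3
NB. 1 2
NB. 2 1
NB. 4 1
NB. 5 1

NB. The same information as two separate lists.
I. ~: x      NB. 0 1 2 4 5   first-occurrence indices
#/.~ x       NB. 3 2 1 1 1   counts, in nub order

NB. The distinct words themselves, in the same order as the rows above.
~. x
```

Key works on items, so the same phrases should apply unchanged to symbols or
to the rows of a rank 2 array.
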
Assuming you do not care about the content of the words, you may find it
convenient to create a dictionary of words encountered, and then represent
each word with its index in the dictionary.
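
As a rough sketch of the dictionary idea (the names words, dict, ids and
grams are only illustrative, and the 3-gram step is my assumption about how
you would want to group them):

```j
NB. Hypothetical sample data: a list of boxed words.
words =: ;: 'the cat saw the dog and the cat'

dict =: ~. words         NB. dictionary: distinct words, first-appearance order
ids  =: dict i. words    NB. each word replaced by its index in dict
NB. ids is 0 1 2 0 3 4 0 1

grams =: 3 ]\ ids        NB. overlapping 3-grams, one per row (integer table)

NB. Distinct 3-grams and their counts; key treats each row as one item.
(~. grams) ; #/.~ grams

NB. Indices map back to words through the dictionary.
(0 { grams) { dict
```

Everything after the second line works on plain integers, so the boxes are
only touched once, when building dict and ids.
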
> +/|:= and tying that to the nub somehow
That should work. I would use +/"1=y rather than +/|:=y. The result is a
frequency count for each element of the nub; no extra mapping required.
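
For instance, on a small made-up list of boxed words (a sketch only):

```j
NB. Hypothetical sample data: a list of boxed words.
y =: ;: 'the cat saw the dog and the cat'

+/"1 = y                  NB. 3 2 1 1 1   one count per item of the nub
(~. y) ,. <"0 +/"1 = y    NB. two-column boxed table: word, count

NB. The counts agree with those produced by key.
(+/"1 = y) -: #/.~ y      NB. 1
```

Note that = y is a boolean table with one row per nub item and one column
per item of y, so on a novel-sized input the key form is likely to be the
lighter of the two.
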
-E
On Mon, 12 Sep 2022, 'Viktor Grigorov' via Programming wrote:
Hey,
Whilst getting back to a Markov text generator in J, I quickly ran into the
issue that all the top-result verbs for histograms, found by querying the
wiki and the mailing lists, deal with numeric types only. One would have to
reshape the items appropriately to a new dimension. But if one has n-tuples
of words, what then? The verbs give domain errors on symbols and boxes.
Obviously, I. can't be used, and something long would just perform
poorly, like +/|:= and tying that to the nub somehow. What would
work with symbols, boxed strings, or a rank 2 array? What would work well? A
typical novel is 8e4 +- 2e4 words, which is a lot of tuples, and my idea is
to employ 3-, 4-, and 5-length groupings, just to compare results with
different settings. I've dealt little-to-not-at-all with symbols, and boxed
strings,
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm