It would be helpful to know what code you were using that did not work on symbols and boxes. One quick solution is x ({.,#)/. i.#x, which gives a two-column table: the first column holds the index of the first occurrence of each distinct item, and the second column holds the number of times that item occurs. Faster is to build the index and count lists separately, with I.@:~:x and #/.~x.
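
As a rough sketch of both forms on boxed words (the sample sentence and the names words, firstIdx and counts are made up for illustration, not taken from your code):

```j
NB. Hypothetical sample data: a list of 8 boxed words.
words =: ;: 'the cat and the dog and the bird'

NB. One pass with key: first index and count for each distinct word.
words ({. , #)/. i. # words

NB. The same two columns built separately.
firstIdx =: I. ~: words   NB. 0 1 2 4 7  (indices of first occurrences)
counts   =: #/.~ words    NB. 3 1 2 1 1  (occurrences of each distinct word)
firstIdx ,. counts
```

Both expressions should give the same 5-by-2 table here, with rows in the order of ~. words.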

Assuming you do not care about the content of the words, you may find it convenient to create a dictionary of words encountered, and then represent each word with its index in the dictionary.
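
A minimal sketch of that idea, assuming the text has already been split into a list of boxed words (the sample sentence and the names dict, ids and tri are made up here): the nub serves as the dictionary, dyadic i. looks each word up in it, and the n-tuples become rows of an integer table.

```j
NB. Hypothetical sample data: a list of 11 boxed words.
words =: ;: 'the cat and the dog and the cat and the bird'

dict =: ~. words        NB. distinct words, in order of first appearance
ids  =: dict i. words   NB. 0 1 2 0 3 2 0 1 2 0 4
tri  =: 3 ]\ ids        NB. overlapping 3-tuples, one per row (9 rows here)

(~. tri) ,. #/.~ tri    NB. each distinct 3-tuple followed by its count
```

The words can be recovered at any point with ids { dict, so nothing is lost by working on the integers.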

> +/|:= and tying that to the nub somehow

That should work. I would use +/"1=y rather than +/|:=y. The result is a frequency count for each element of the nub; no extra mapping required.
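
For instance, a small sketch on a table of boxed 2-tuples (the sample data and the name y are only illustrative); the counts come out in the same order as the items of ~.y:

```j
NB. Hypothetical sample data: overlapping pairs of boxed words, one pair per row.
y =: 2 ]\ ;: 'a b a b c a b'

+/"1 = y                  NB. 3 1 1 1  (one count per item of ~. y)
(~. y) ,. <"0 +/"1 = y    NB. each distinct pair next to its boxed count
```

Note that =y builds a boolean table with one row per distinct item and one column per item of y, so for long inputs it may be worth comparing against #/.~y, which gives the same counts.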

 -E

On Mon, 12 Sep 2022, 'Viktor Grigorov' via Programming wrote:

Hey,

While getting back to a Markov text generator in J, I quickly ran into a problem: all the top-result histogram verbs I found by searching the wiki and the mailing lists deal with numeric types only. One would have to reshape the items appropriately into a new dimension. But what if one has n-tuples of words? The verbs give domain errors on symbols and boxes.

Obviously, I. can't be used, and something long would just be ill-performant, like +/|:= and tying that to the nub somehow. What would work with symbols, boxed strings, or rank-2 arrays? What would work well? A typical novel is 8e4 +- 2e4 words, which makes a lot of tuples, and my idea is to use 3-, 4-, and 5-length groupings, just to compare results with different settings. I've dealt little-to-not-at-all with symbols, and boxed strings,


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm