As usual, Brother Steve is seeing this at a much higher conceptual and
process level than I am.
From the perspective of Analytic Journalism, if we're dealing with a
large data set -- say 10K to 1 million records -- we would first draw
a sample of a small TK percent to develop and test our assumptions,
methods, and process. Once the process is stable, we run it against a
larger sample; if it's still stable, we throw it against the total dataset.
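A minimal sketch of that staged-sampling loop, assuming a pandas
DataFrame and a hypothetical validate() routine that stands in for
whatever assumptions/methods/process we're testing:

    import pandas as pd

    def staged_run(df: pd.DataFrame, validate, fractions=(0.01, 0.10, 1.0), seed=42):
        """Run `validate` on progressively larger samples; stop as soon as it breaks."""
        for frac in fractions:
            # Draw a reproducible sample, or use the full dataset on the final pass.
            sample = df if frac >= 1.0 else df.sample(frac=frac, random_state=seed)
            stable = validate(sample)
            print(f"fraction={frac:.0%}  n={len(sample)}  stable={stable}")
            if not stable:
                return False  # rework assumptions/methods before scaling up
        return True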
Your original post triggered a cascade of memories (some of which I
blurted out here), as well as a jam session with my
bar-friend-cum-technical-interlocutor GPT, who led me on a merry chase
through some latent techniques I once ideated on (some blurted here
earlier).
A phrase that came out of that tête-à-tête, one that fits what I think
you are describing from your own POV (highly relevant in these modern
times of 3M-record data dumps from the DOJ meant to baffle-with-BS), is
"pre-image". GPT offered me the more explicit denotation:
The pre-image is not “the” original data point, but: *the equivalence
class of upstream possibilities consistent with the downstream
observation.*
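As a toy illustration (the lossy map f and the candidate range below
are invented for the example, nothing GPT or I actually ran): the
pre-image of an observed downstream value is just the set of upstream
candidates that the map sends to it.

    def f(x: int) -> int:
        # A many-to-one "downstream" summary: keep only the tens digit.
        return x // 10

    def pre_image(y: int, candidates: range) -> set:
        """The equivalence class of upstream values consistent with observing y."""
        return {x for x in candidates if f(x) == y}

    print(pre_image(4, range(100)))  # {40, 41, ..., 49}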
I have a lot of respect for those of you who swim well in
high-dimensional, poorly defined, poorly conditioned data sets such
as "the news stream".