As usual, Brother Steve is seeing this at a much higher conceptual and process level than I am. From the perspective of Analytic Journalism, if we're dealing with a large data set -- say 10K to 1 million records -- we would first draw a sample of a small TK percent to develop and test our assumptions, methods, and process. Once it's stable, run it against a larger sample. If it is still stable, then throw it against the total dataset.
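A rough Python sketch of that staged approach, purely as an illustration. The function name `staged_run`, the 1% / 10% / 100% fractions, and the 5% tolerance are all assumptions made for the example (the actual sample percentage is still TK), and "stable" is simplified here to mean "a single numeric result that doesn't move much from one stage to the next."

```python
import random

def staged_run(records, analyze, fractions=(0.01, 0.10, 1.0), tolerance=0.05, seed=0):
    """Run `analyze` on growing random samples; stop if the result drifts.

    All defaults are hypothetical placeholders, not recommended values.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    results = []
    for fraction in fractions:
        size = max(1, int(len(records) * fraction))
        # At 100 percent, use the whole data set rather than a sample.
        sample = records if size >= len(records) else rng.sample(records, size)
        result = analyze(sample)
        if results:
            previous = results[-1][1]
            # "Stable" here means: within `tolerance` (relative) of the prior stage.
            if abs(result - previous) > tolerance * abs(previous):
                raise ValueError(
                    f"unstable at {fraction:.0%}: {previous} -> {result}"
                )
        results.append((fraction, result))
    return results
```

Called as `staged_run(rows, my_metric)`, where `rows` is a list and `my_metric` returns one number, it gives back a list of `(fraction, result)` pairs, or raises `ValueError` at the first stage where the number jumps. In real work the "is it stable?" judgment would cover methods and process too, not just one number.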
At least that's what I thought this guy was talking about.

Tom

On Thu, Feb 5, 2026 at 2:21 PM Steve Smith <[email protected]> wrote:

> https://uknow.uky.edu/research/nsf-career-award-supports-faster-smarter-use-massive-scientific-data
>
> Sounds like a semi-new vocabulary for what many of us have been doing for
> years. So where's our $500k check?
>
> Def fits some of my work in 2005-6 at LBL on using the compression indices
> in massive data sets to identify "interesting" (low-compression ==> high
> entropy ==> interesting) subsets/slices/views of massive data sets. My
> only published work from that was actually in providing visualization of
> the compression itself as a diagnostic for the multi-scale effectivity of
> the compression on various data sets, but I believed that the whole process
> could have been inverted to actually enhance efficient analysis.
> Unfortunately it was the Data Storage folks funding/driving the work, not
> the Viz or Data Mining folks.
>
> We also touched on deriving CFD familiar-measures like Div/Curl over
> non-obvious data sets, once again as a meta-field to try to intuit within.
> This was part of a DHS-funded "visual analytics" effort, unfortunately
> managed by a rival lab who managed to undermine most project proposals not
> led by their lab... bah-humbug! I wasn't directly active in it, but
> LANL had a 1 year LDRD project funding the normalization of a broad suite
> of graph-metric algorithms for the same purpose (INCAA?) I don't know of
> any applications of it beyond the principals on the project re-using their
> tools and understanding going forward. Meta-Tools often don't get good
> traction?
>
> I'm generally a fan of "early career" funding like this even though it
> naturally fosters jealousy among peers, etc. My elderDotter was in the
> running for several of those types of things about 10 years ago (wow, time
> flies!) when there was a pause in the program and her eligibility timed out
> while the system took that pause.
>
> It still rankles her that she missed that boat. She also missed the
> CRISPR boat by an inch or two (coulda been a contender for a position in
> Doudna Lab just before that unfolded) but for personal life constraints!
>
> All the counterfactuals we conjure when we reflect and ideate!
>
> *I coulda been a contendah!*
>
> .- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. / ... --- -- . / .- .-. . / ..- ... . ..-. ..- .-..
> FRIAM Applied Complexity Group listserv
> Fridays 9a-12p Friday St. Johns Cafe / Thursdays 9a-12p Zoom
> https://bit.ly/virtualfriam
> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
> FRIAM-COMIC http://friam-comic.blogspot.com/
> archives: 5/2017 thru present
> https://redfish.com/pipermail/friam_redfish.com/
> 1/2003 thru 6/2021 http://friam.383.s1.nabble.com/

--
++++++++++++++++++++++++++++
Tom Johnson - [email protected]
+1 505 577 6482
Santa Fe, New Mexico USA
*New Mexico Writers <https://nmwriters.org/>*
++++++++++++++++++++++++++++
