On Sun, Jul 23, 2023, 8:16 PM James Bowery <jabow...@gmail.com> wrote:
> https://aclanthology.org/2023.findings-acl.426.pdf

Yes, and I think the only reason gzip didn't outperform the other text classifiers on the largest data sets is that it only finds matching strings over a 32 KB window.

Of course, text classification is mostly adversarial, used for spam filtering and censorship. String matching can be easily defeated by deliberately misspelling words like "v1agra". But smarter algorithms that understand text and images have solved the problem. Your inbox is no longer full of spam and viruses. Your political posts just get downranked instead of banned.

> For some reason this reminds me of Matt's distributed competitive routing
> AGI proposal <https://www.mattmahoney.net/agi2.html>:

Ah, yes, in 2008, before smart phones, social media, blockchain, and the Arab Spring ushered in the demand for internet censorship. A time when we were young and idealistic and thought that our ideas about AGI would change the world. Now we are older and watching big tech solve the problems we failed to solve, and maybe not liking those changes. Instead of the internet being a tool for the people to control the government, it is becoming the other way around.

Well, I did warn you that AGI would be expensive. It would require a global effort and a funding model equal to decades of world GDP, the value of all the human labor that would be automated. My motivation was to divide the workload (hardware, software, knowledge collection) among millions of specialist peers. The goal was to do this in a hostile environment. As with blockchain, messages could not be deleted, edited, forged, or refuted, but they would be protected by pairwise private key digital signatures and a reputation network instead of proof of work. I didn't give a thought to evading censorship because at the time it didn't exist. I now realize the dangers of centralized control.
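Going back to the gzip classifier at the top of this message: here is a minimal sketch of the idea, assuming normalized compression distance (NCD) with a 1-nearest-neighbor vote. The function names and the toy training strings are made up for illustration and are not taken from the paper's code. It also shows why deliberate misspelling hurts: the obfuscated text shares fewer long substrings with the known spam, so its distance goes up.

```python
import gzip

def clen(text: str) -> int:
    """Size in bytes of the gzip-compressed text.

    gzip uses DEFLATE, which only finds repeated strings inside a
    32 KB sliding window.
    """
    return len(gzip.compress(text.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small when x and y share strings."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(query: str, labeled: list[tuple[str, str]]) -> str:
    """Return the label of the training text nearest to query (1-NN)."""
    _, label = min(labeled, key=lambda pair: ncd(query, pair[0]))
    return label

# Hypothetical toy training set, for illustration only.
TRAIN = [
    ("buy cheap viagra pills online now with free shipping today", "spam"),
    ("meeting agenda for the quarterly budget review on monday morning", "ham"),
]

if __name__ == "__main__":
    clean = "cheap viagra pills online, free shipping today"
    obfuscated = "che4p v1agra p1lls 0nline, fr3e sh1pping t0day"
    spam_text = TRAIN[0][0]
    print("label for clean query:", classify(clean, TRAIN))
    print("distance to spam, clean:     ", ncd(spam_text, clean))
    print("distance to spam, obfuscated:", ncd(spam_text, obfuscated))
```

A real classifier would use many training examples per class and a k larger than 1; this is only meant to show the distance computation.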
I suppose the technical reason for failure is that a distributed search index takes O(n log n) storage and work, but a centralized index like Google's is O(n) and for a long time did everything we wanted. Distributed search depends on messages being compressible, so it wouldn't work for encrypted data like Bitcoin transactions. But the bigger reason is that people won't use a social network unless a lot of people already use it. If Google+ was a failure, what chance do I have of kick-starting one?

My biggest mistake was assuming that people would be willing to make all their personal data public, which is what you need to do to distribute all human knowledge to billions of peers. This would eliminate identity theft and the need for passwords because everyone could instantly see what you are doing if you claim to be someone else. You couldn't secretly stalk someone because your queries would be public, just like the responses. There would be no such thing as a data breach because the data wouldn't be secret.

Of course I was wrong. We only share all our data with a few big companies like Google, Amazon, MasterCard, etc. And it is getting worse. Social security numbers and birthdays were never meant to be secret. Facebook built a face recognition database with a billion names and a trillion labeled images, and then deleted it. It is now effectively illegal to post someone's picture without their permission.

Maybe I am being pessimistic about P2P networks. Freenet and Tor are mostly unusable because they lack search engines. Napster was killed by shutting down its centralized search service. USENET was O(n^2) and disappeared. Mastodon lacks a funding model. Bitcoin uses 1% of the world's electricity. Ethereum, with proof of stake and support for arbitrary messages, is still O(n^2), making transactions unaffordable for widespread use.

I'm afraid that censorship is here to stay, in part because we want it as long as it's applied to other voices we want silenced.
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T4dbad1e5c8d7f685-Mcc76ed80bc4b3196ee78c085
Delivery options: https://agi.topicbox.com/groups/agi/subscription