Note: I wrote the following item to Dave Molnar, as part of our off-line conversation. I ended up summing up a bunch of points I wanted to put out to the list, and Dave has given me permission to include his remarks. A few places refer to "you"...this is why.
On Monday, April 29, 2002, at 09:06 AM, David A Molnar wrote:

> On Mon, 29 Apr 2002, Tim May wrote:
>> to Help the Cause. I pointed out to him the Big Brotherish trends and
>> how his data mining software would be more likely to be used to track
>> dissidents than it would be to stop an Arab from hijacking a plane.
>
> Yes, this was what disturbed me a bit at the workshop. Privacy issues
> were discussed, but most of the time it seemed like lip service. No one
> brought up the issue of oversight, control, and explanation of the new
> methods we'd all develop. Not to mention that the problems we were
> supposed to solve started out vague and stayed pretty vague.

[Note: this is a discussion about "data mining," the subject of a couple of recent workshops and conferences, post-9/11. I was referring to a friend of mine who runs a very successful data mining operation in the hedge fund business, and how he wants to apply his expertise to help the anti-terrorism battle.]

The deep error, which has been with us for a long time, is the assumption that we can create legal systems or surveillance systems which go after "bad guys" but not "good guys." That is, that we can separate "bad guys" like Mohammed Atta from "good guys," all in advance of actual criminal or terrorist acts.

Your later point about how the creators of these data mining systems want "protections" which prevent systems from going too far, from extracting too much information, from compiling too many dossier entries...this is just one of many examples. (Others being: restrictions on cash and crypto and many other things, surveillance cameras, etc.)

The talk of "safeguards" misses the important error. The error is that any system usable by John Q. Public to protect his privacy is usable by Mohammed Atta to protect HIS privacy, absent some way to classify John Q. Public and Mohammed Atta into two different classes. Such a system was not in place on Sept 10th, and it is unlikely ever to be in place.
(The upcoming film "Minority Report" is just the latest treatment of this theme: can criminals be classified in advance of their crimes? Phrenologists used to measure head shape; now we have "personality inventory" tests in grade school, trying to separate out the future psychopaths and thought criminals from the rest of the herd.)

Given that such a classifier (in topos terms, a "subobject classifier") does not exist at present, the only solution is then to ban all forms of cash, for example. Or place surveillance cameras in all public places. Or set up comprehensive national dossier systems. And the "safeguards" in data mining will of course be either subverted or ignored, as any safeguard which protects John Q. Public will, perforce, protect "future Columbine killers," future Charles Mansons, and future Mohammed Attas.

The radical view many of us espouse is actually the one envisaged by the Founders: protection from government is more important than catching a few criminals in advance of their crimes. (There is probably a more elegant, universal way of phrasing this.) Yes, some people who use digital cash will be bad guys. Yes, some people who use remailers will be child porn sellers. Yes, etc.

[Note: the following is more speculative, meant as a comment to Dave, a math major. When I outline my full proposal on how category theory and topos theory apply to our kind of issues, I'll lay out the arguments in much more detail.]

The topos connection is very real, in terms of outlook shift. If someone asks "Is Person X a criminal or not a criminal?," this is not meaningful in terms of future actions. It is only meaningful in terms of a *constructive* proof: has this person already *committed* a crime? If a crime can be demonstrated and the right causal links established, the person has been proved to have committed a crime.
This is of course the intuitionist (in Brouwer's sense) position, which I am now realizing I support, which others should support, and which in fact matches reality in many important ways.

---Digression on Intuitionism---

Intuitionism is defined at length in online sources, e.g., MathWorld. It has nothing to do with mysticism or irrationality. Rather, it's an alternative to conventional 20th century logic. In the intuitionist view, completed infinities are not assumed and the "law of the excluded middle" is not used. This has implications for the Axiom of Choice and its equivalent forms. Radical when Brouwer first proposed it nearly a century ago, but used extensively since the 1960s.

Closely related to "time-varying sets," where set membership is a function of time, naturally enough. ("John is now a member of the set of civilians, but tomorrow he becomes a member of the set of soldiers.") Also related to alternatives to Boolean algebra, such as Heyting algebra.

Classical logic requires "omniscient observers": "There are either stars older than 20 billion years or there are not." "The number of grains of sand in the world is either XXX or it is not." "There will either be a woman president of the U.S. by 2040 or there will not be."

Lee Smolin, in "Three Roads to Quantum Gravity," gives a good example involving "facts" which are unknowable to _anyone_ at our location in space and time because they involve happenings outside our light cone: for example, the number of cats living in a galaxy 30 billion light-years from our own. No light, no signal, from that galaxy has ever reached our part of the universe, so there is absolutely no way of knowing the truth or falsity of statements about cats living in that galaxy. Similarly, statements about the future, or statements about events in possible worlds (a la Kripke), are not meaningful using simple classical logic.
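The failure of the excluded middle in a Heyting algebra can be made concrete. Here is a minimal sketch (my illustration, not from the original post): the three-element Heyting algebra 0 < 1/2 < 1, where 1/2 can be read as "not yet decided."

```python
# The three-element Heyting algebra {0, 1/2, 1}.
# Meet is min, join is max, and a -> b is the greatest c
# with min(a, c) <= b; negation is a -> 0.

H = (0.0, 0.5, 1.0)

def meet(a, b):     # logical "and"
    return min(a, b)

def join(a, b):     # logical "or"
    return max(a, b)

def implies(a, b):  # a -> b: greatest c in H such that meet(a, c) <= b
    return max(c for c in H if meet(a, c) <= b)

def neg(a):         # intuitionistic negation: a -> 0
    return implies(a, 0.0)

# Excluded middle holds only for fully decided values:
for p in H:
    print(p, join(p, neg(p)))
# p = 0.5 gives 0.5, not 1: "X is a criminal or X is not a criminal"
# need not be true *today* of a person whose status is undecided.
```

Any chain ordered like this is a Heyting algebra, and only the two-element chain collapses back to Boolean logic; the "undecided" middle value is exactly what the time-varying-set picture needs.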
Astute readers will see that many of the "top down models" in financial crypto, in systems of interacting agents, involve assumptions about omniscience (e.g., on the part of the programmers). Is Alice "really" trustworthy or not? Does Bob the Banker have the funds on deposit to cover claims? The logic of systems of malicious, devious agents is the logic of incomplete knowledge and intuitionistic logic. "Show me the money!" is the essence of a constructive proof, a la Brouwer.

---End digression, for now----

What people want to know is "Will Person X commit a crime in the future?" (And hence we should deny him access to strong crypto _now_, for example, which is the whole point of attempting to surveil, restrict, and use data mining to ferret out bad trends.) Even the strongest believer in the law of the excluded middle would not argue that the question "Will Person X commit a crime in the future?" has a "Yes" or "No" answer at the _present_ time. (Well, actually, I suppose some folks _would_. They would say "I personally don't know if he will, but in 50 years he either will have committed a crime or he will not have committed a crime.")

As with the Schrodinger's Cat "paradox," we cannot say anything at the present time about "crime/no crime" or "dead/not dead." (This is the sense in which a Heyting algebra dealing with time-varying membership in sets is more interesting for our purposes than a Boolean algebra of yes/no truth values.)

How radical is this view? People have been reasoning with incomplete current and future knowledge for a long time, needless to say. Betting on what someone will or will not do, betting on future outcomes, is the third oldest profession. All of the usual tools apply: probability, the Bayesian outlook, "indicators" (those psychological inventories), skull shape, etc. So I'm not saying a Heyting algebra gives "novel" results, but it changes the focus. (In quantum mechanics, for example, it links directly with the "many worlds" and "consistent histories" interpretations.)
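Incidentally, the Bayesian tools just mentioned already show, quantitatively, why hunting rare bad guys by "indicator" flounders. A sketch with purely made-up numbers (the base-rate arithmetic is standard; none of these figures come from the original post):

```python
# Hypothetical numbers: 300 actual plotters hiding in a population
# of 300 million, and an implausibly good "indicator" that flags
# 99% of real plotters and only 1% of innocents.

population = 300_000_000
plotters   = 300
innocents  = population - plotters

true_hits  = 0.99 * plotters    # plotters correctly flagged
false_hits = 0.01 * innocents   # innocents flagged anyway

# P(plotter | flagged), by Bayes' rule:
p_plotter_given_flag = true_hits / (true_hits + false_hits)

print(false_hits)             # ~3 million innocents flagged
print(p_plotter_given_flag)   # ~0.0001: flags are ~99.99% false alarms
```

Even granting the surveillers a far better classifier than any that exists, the rarity of the target class means nearly every dossier opened is on a John Q. Public.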
Can we Identify the Bad Guys?

Getting back to law enforcement attempting to predict the future: the lack of any meaningful way to predict who will be a future Mohammed Atta or Charles Manson, and who thus "should" be restricted in his civil liberties, is the important point.

Could any amount of data mining have identified Mohammed Atta and his two dozen or so co-conspirators? Sure, *now* we know that an "indicator" is "unemployed Arab taking flying lessons," but we surely did not know this prior to 9/11. Finding correlations ("took flying lessons," "showed interest in chemical engineering," "partied at a strip club") is not hard. But it is not very useful.

To the law enforcement world, this means _everyone_ must be tracked and surveilled, dossiers compiled. All of the talk about "safeguards" in the data mining is just talk. Any safeguard sufficient to give John Q. Public protection will give Mohammed Atta protection...because operationally they are identical persons: there is no subobject classifier which can distinguish them! Mohammed Atta is indistinguishable from other Arab men who generally fit the same criteria...assuming we don't know in *advance* that "unemployed Arab taking flying lessons" is an important subobject classifier.

(Sure, _now_ we are looking for unemployed Arabs taking flying lessons, and not finding any. That's the easy part. The hard part is finding out about the future, about other types of attacks. There's the rub. And this is why becoming a dossier society is not the answer.)

> One speaker, who otherwise gave an **excellent** recap of discussion,
> even went so far as to try to draw a distinction between "social/legal
> issues" and "scientific issues" and then tried to say "we should not
> care about the social stuff and focus primarily on the science." Part
> of this was motivated by some of the same concerns I see you expressing
> with regards to law school -- a feeling that the traditional
> political/legal methods we have are about defending turf instead of
> solving the problem at hand, so the real answers will come from the
> science.

Indeed, the major "changes in ground truth" (what is actually seen on the "ground," as in a battle) have come from technology. It was the invention and sale of the Xerox machine and the VCR that altered legal ideas about copyright and "fair use," not a bunch of lawyers pontificating. In both cases, the ground truth had already shifted, in a kind of knowledgequake, and the Supremes had only two choices: accept the new reality by arguing about "fair use" and "time-shifting," or declare such machines contraband and authorize the use of storm troopers to collect the millions of copiers and VCRs already sold. They chose the first option.

(The DMCA and the Hollywood proposals have not yet reached this stage, so Congress is pushing the storm trooper approach: a la Sklyarov, requirements that PCs have Big Brother features built in, etc. How this will play out will depend on how many "workarounds" get out there. This is an important point: getting more hacks and workarounds out there will increase the chances that the laws will be changed. One good hacker is worth a hundred lawyers.)

> Where this line of thought fails and fails miserably is when the
> "science" can give us new options, but we won't look for them if we
> aren't thinking to. As an example, Chaum came up with anonymous digital
> cash in part because he was interested in privacy - and that opens up
> all new avenues...if privacy had not been present in the "science" side
> of things, we wouldn't have gotten that.

Precisely! This is why the talk of how the Cypherpunks list (and similar lists) should not be political is so wrong-headed: without a political compass, where would we head?
(That the dominant political philosophy here is closely attuned to what is now called "libertarianism" (but which used to be called "liberalism," or "classical liberalism") is more because that's the only political philosophy attuned to distributed, non-hierarchical systems. One might imagine a list oriented toward using strong crypto to help with fascism, or with Maoism, but there would be some deep conflicts. The absence of such groups, or even of milder "strong crypto for social welfare" forms, tells us a lot.)

> One encouraging thing was that a lot of the "data mining" speakers
> seemed very interested in exploring methods for limiting use of their
> techniques. i.e. making sure you can't get "too much" out of the
> database. Unfortunately the discussion of "how much is too much" wasn't
> in scope here, but at least some mechanisms might end up being in
> place...

And, as I argued above, I doubt that any such "limits" will be very _useful_ or very _long-lasting_. Trusting in others to provide safeguards or limits is not very compelling.

(BTW, as you probably know or can imagine, there have been crypto methods proposed for safeguarding certain kinds of data collection, e.g., schemes using "random coin flip protocols" for answering questions like "Are you homosexual?"--supposedly "useful" for public health planners trying to deal with HIV/AIDS issues. The idea is that the pollee XORs his answer with a random bit. His answer then doesn't _implicate_ him, but overall statistics can still be deduced from a large enough sample. Ho hum. Better to simply tell the poller "None of your fucking business...get off my property.")

The core point is the familiar one: we are coming to, or have reached, a fork in the road. Down one path lies the Surveillance State, the Panopticon, with ubiquitous cameras, intrusive questions, restrictions on untraceable spending, and other detritus of the police state.
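An aside on the random-coin-flip polling scheme mentioned above: this is what the survey literature calls "randomized response" (Warner's method). One detail the one-line XOR description glosses over: the random bit must be biased, or no statistics survive at all. A minimal sketch, with made-up numbers and a 75% chance of answering truthfully:

```python
import random

random.seed(1)  # deterministic for the demo

def respond(truth: bool, p_truth: float = 0.75) -> bool:
    """Report the true answer with probability p_truth, else its
    flip -- i.e., XOR the answer with a *biased* random bit."""
    flip = random.random() >= p_truth
    return truth != flip  # XOR

# Simulate 100,000 pollees, 20% of whom would truly answer "yes".
n, true_rate, p = 100_000, 0.20, 0.75
answers = [respond(random.random() < true_rate, p) for _ in range(n)]
observed = sum(answers) / n

# Any single answer is deniable, yet the aggregate is recoverable:
# observed = p*t + (1-p)*(1-t)  =>  t = (observed - (1-p)) / (2p - 1)
estimate = (observed - (1 - p)) / (2 * p - 1)
print(round(estimate, 3))  # close to the true 0.20
```

No single "yes" implicates anyone (each answer is 25% likely to be a lie), which is exactly the claimed safeguard; the pollster still learns the population rate, which is exactly May's objection to the whole exercise.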
Down the other path lies a universe of strong crypto, with a web of "opaque pipes" linking "opaque objects." Technologists can make the second path the reality. Lawyers and lawmakers will try to take us down the first path.

--Tim May

--
Timothy C. May    [EMAIL PROTECTED]    Corralitos, California
Political: Co-founder Cypherpunks/crypto anarchy/Cyphernomicon
Technical: physics/soft errors/Smalltalk/Squeak/ML/agents/games/Go
Personal: b.1951/UCSB/Intel '74-'86/retired/investor/motorcycles/guns
Recent interests: category theory, toposes, algebraic topology