Re: Spamming the Data Space – CLIP, GPT and synthetic data
> I question whether the notion of pastiche makes any sense at all without
> interpretation (presumably Luke has something to say about that). What's
> certain is that the autonomy question is becoming urgent.

If an AI model produces a tasteless, derivative image in a forest and there's no human there, is it pastiche? ;-)

Sure, you're right, pastiche is everywhere, and is very much in the eye of the beholder. Even without direct AI production of images, a lot of the shows on Netflix, for example, feel strangely familiar, precisely because they cut-and-paste elements of 'cult' or 'successful' shows together. 'Stranger Things' was essentially engineered from viewer data, and Netflix knew it was going to be successful even before it launched.

And this kinda brings me back to some of the genuine questions I have around AI models and cultural critique: the lines here seem really arbitrary. Why is my DALL-E generated image 'derivative' or a 'tainted' image, according to the tech commentators I mentioned earlier, and my 'manual' pixel art not? I honestly don't see what the difference is between a Midjourney collage and a Photoshop collage. The same question goes for text. Why is Hardy Boys #44 (or whatever formulaic fiction you like) a 'real' book, and the same story cranked out by GPT-3 'contaminated' or 'synthetic'?

Molly, in her excellent post, raises the issue of 'false production'. But I genuinely don't know what that would look like. Is it false because it's based on existing images (but collage artists already do this), or because there's not enough 'intentionality' (prompting feels too easy?), or because it's generated too quickly? In some ways, the exact same critique of AI production could be levelled at meme creation (choose your base image, add the classic meme typeface, and pump out a million permutations), but these images are somehow considered an 'organic' part of the internet, while what comes next is going to be artificial, mass-produced, spam, synthetic, and so on.

At the core of all this, I think, is the instinct that there's something unique about 'human' cultural production. (Even though AI models are absolutely based on human labor, designs, illustrations, books, etc, their packaging and obfuscation of this data makes them 'machinic' in the popular consciousness.) Terms like 'meaning', or 'intention', or 'autonomy' gesture to this desire, this hunch that something will be lost, that some ground will be ceded with the move to AI image models, large language models, and so on. I'm sympathetic to this, and don't want to come across as an apologist for Big Tech, OpenAI, etc. But I guess my struggle is to put my finger on what that 'special something' is.

Many of these posts have suggested future autonomous zones where 'synthetic' culture is banned. What would be the hallmark or signature of these spaces? Rules like 'no digital tools' or 'no algorithmic media' may come to mind, but these overlook the most crucial element of 'new' cultural production: reading or listening or viewing other people's works.

- 'Perplexed in Mianjin/Brisbane'
A.I. Lenin: What is to be Done Today, by ChatGPT and Dmytri Kleiner
In the early 20th century, Lenin recognized the importance of an all-Russian newspaper as a means of building the capacities and capabilities necessary for a revolutionary movement to overthrow capitalism. Similarly, a digital agency can help build the capacities and capabilities necessary for a revolutionary movement to overthrow capitalism in the modern world. As Lenin wrote in "What Is To Be Done?": "Without a newspaper, it is impossible to unite, to direct, to arouse, and to organize the masses" (Lenin, 1902, p. 28).

For Lenin, it was not the newspaper as such that was important, but the organization capable of publishing it. In this sense, a digital agency is not an end in itself, but a means of building the abilities necessary for a revolutionary movement to coordinate and communicate with supporters, as well as carry out propaganda and agitation efforts.

One key capacity that a digital agency can help build is the ability to disseminate information and propaganda effectively. Social media platforms and other modern technologies have become major channels for the distribution of information and ideas, and a digital agency can help a revolutionary movement leverage these channels to reach a wide audience and spread its message. In the past, Leninist organizations used underground newspapers and smuggled literature to disseminate their message, often at great risk to their own safety. Today, a digital agency can help a revolutionary movement use modern technologies to reach a wider audience and spread its message more effectively. As Lenin wrote in "The Tasks of the Russian Social Democrats": "The spread of revolutionary ideas among the masses depends above all on the degree of their own organization" (Lenin, 1898, p. 42). And as he wrote in "Left-Wing Communism: An Infantile Disorder": "The party must have its own press, its own organization, and its own set of tactics" (Lenin, 1920, p. 22).

Another key capacity that a digital agency can help build is the ability to mobilize and organize supporters. In the past, Leninist organizations relied on secret meetings and underground networks, known as konspiratsiya, to coordinate their activities. Today, social media and other modern technologies can provide similar capabilities, allowing a revolutionary movement to quickly and effectively organize and mobilize supporters. For example, a Leninist organization might use a closed social media group or a secure messaging app to organize meetings, distribute propaganda, and coordinate actions. As Lenin wrote in "One Step Forward, Two Steps Back": "Only konspiratsiya can ensure the freedom and independence of the Party" (Lenin, 1904, p. 80).

Konspiratsiya refers to the practice of maintaining secrecy and keeping activities hidden from the authorities in order to avoid detection and repression. In the past, Leninist organizations used a variety of tactics to maintain secrecy and avoid detection, such as using code words and symbols, holding meetings in secret locations, and using fake names. In the context of a digital agency, this might involve using aliases and pseudonyms, as well as secure communication methods and infrastructure, to protect against surveillance and to maintain secrecy. As Lenin wrote in "Left-Wing Communism: An Infantile Disorder": "The party must have its own press, its own organization, and its own set of tactics" (Lenin, 1920, p. 22).
A digital agency can also help build the capacity for a revolutionary movement to engage in digital activism and direct action. This might involve creating and distributing memes and other digital content that can go viral and spread the movement's message, or using hacking and other digital tactics to disrupt the operations of capitalist institutions. In the past, Leninist organizations used a variety of tactics to engage in direct action, such as strikes, boycotts, and sabotage, as well as organizing demonstrations and protests. In the context of a digital agency, these tactics might be adapted for the digital realm, such as organizing online boycotts or launching cyber attacks against capitalist institutions.

Another key capacity that a digital agency can help build is the ability to defend against digital threats and attacks. This might involve developing and implementing security measures to protect against cyber threats, such as hackers and malware, as well as implementing measures to protect against surveillance and monitoring by the authorities. In the past, Leninist organizations often faced repression and persecution from the state, and they had to develop tactics and strategies to defend against these threats. In the context of a digital agency, this might involve using encryption and other secure communication methods, as well as developing contingency plans and backup systems to ensure continued operation.
Re: Spamming the Data Space – CLIP, GPT and synthetic data
Thank you, Francis, for this very interesting work, and for the responses from all. Some of you may not know me at all, so by way of introduction: I am an independent researcher/writer/feminist artist working between film, media and cultural studies, on the history and theory of new media in the context of post-war arts culture. I have been teaching about AI, AL, and varied artists' works in these fields for a few years, with great interest as a new medium (AI) comes in.

Some salient features of this analysis and the varied responses stand out for further critique. If I am correctly reading your text, Francis (and I thank you for the separation of the two modi), on the one hand there is the potential for new forms of datasets (as assets and commodities) which are "walled" and controlled, as collections or arrays, which will undoubtedly be made, and indeed already are being made. Some examples: Lev Manovich's PhotoTime projects; Trevor Paglen's experimental arrays, in which he inputs the dataset and then works with it; or experiments with the MS-made Genesis Maps software used on/with the Met's digital collections. Some of you may have seen or been at the event written about here: https://www.metmuseum.org/perspectives/articles/2019/2/artificial-intelligence-machine-learning-art-authorship

Francis writes of something similar, controlled data spaces (like online communities, maybe, where contributing or donating is controlled by membership?):

"We are going to create separate data ecologies, which prohibit spamming the data space. These would be spaces, comparable to the no-photo-policy in clubs like Berghain or IFZ, with a no-synthetics policy. While vast areas of the information space may be indeed flooded, these would be valuable zones of cultural exchange."

On the other hand, there is the ability for AIs to remake data based on anything they are allowed to pull in, and then to feed that invention back into data pools and streams. So the question arises of who will control data, and how. This question is familiar. Capitalist interests will have their reasons and methods; maybe this is what Brian alluded to about understanding that we have seen these mechanisms at work before. And there is the decided problem of having capitalist interests further empowered through the accelerated reproduction and distribution of their wares, especially when this uncontrolled reproduction/replication is prone to prejudices, biases, falsehoods, and omissions.

Let's see what we can foresee. Can we foresee cultural progress continuing as a growing alienation among larger sectors of global populations, no longer able to find any useful or rational meaning in a plethora of false production? (A great resignation from the virtual spectacle and an end to capitalism of this kind, literally its collapse from overextension.) Will there be regulation of AI on the net, further regulating the circulation of information? Will those with the least access, those most vulnerable to data mining of their lives, become even more susceptible to exploitation, as they fail to understand how to navigate this new synthetic reality to their own advantage, or where self-interest is a manufactured byproduct of the spectacle?

Whatever we might foresee, the band-aid of tweaking SEO for the sake of a false surface of BIPOC representation is lame (Francis' example about a few more pictures of BIPOC being worked in).

Molly
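To make the "controlled data space" idea a little more concrete, here is a minimal sketch in Python of what a membership-gated, no-synthetics submission policy could look like. Everything in it is hypothetical: the member registry, the provenance tag, and the is_synthetic() check are invented placeholders for whatever verification such a community would actually agree on.

from dataclasses import dataclass

@dataclass
class Submission:
    author: str
    content: str
    provenance: str  # e.g. 'camera', 'scanner', 'model-output' (hypothetical tags)

MEMBERS = {"alice", "bob"}            # contribution controlled by membership
BANNED_PROVENANCE = {"model-output"}  # the no-synthetics policy

def is_synthetic(sub: Submission) -> bool:
    # Hypothetical check: trusts the declared provenance tag. Real detection
    # of model output is unreliable, so in practice such a policy would lean
    # on membership and trust rather than automated classification.
    return sub.provenance in BANNED_PROVENANCE

def admit(sub: Submission) -> bool:
    # Admit only verified members whose material passes the policy.
    return sub.author in MEMBERS and not is_synthetic(sub)

pool = [
    Submission("alice", "field recording", "camera"),
    Submission("mallory", "viral image", "model-output"),
    Submission("bob", "essay draft", "scanner"),
]
ecology = [s for s in pool if admit(s)]
print([s.content for s in ecology])  # ['field recording', 'essay draft']

The design point is that the gate sits at contribution time, like the door policy of the club in Francis's analogy, rather than trying to clean the pool after the flood.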
Re: Spamming the Data Space – CLIP, GPT and synthetic data
Hey Francis,

Thanks for your response. Just briefly, I think there was some misframing of my response. It was not meant as a takedown or critique of your text, but rather a more general response to the idea tabled in the opening line: "For the last time in human history the cultural-data space has not been contaminated." This is an idea I had seen a few times from others the same day: "The internet is now forever contaminated with images made by AI" (Mike Cook, an AI researcher at King's College London); "How AI-Generated Text Is Poisoning the Internet"; etc. I was kicking around this idea and seeing what it implied, hence delving into this idea of post-truth authenticity etc., which your text doesn't talk about.

I know you get this stuff, and can see issues with some of these assumptions, which is precisely why I mentioned your dnb example as a culture of remixing. Nevertheless, I did think your original piece (and most people's work, including mine) can be strengthened by thinking longer term / historically, as the presentist framing of tech is often very powerful. That was my only real feedback. But I can see how this more general 'essay as jumping-off point' discussion could be misread, so apologies for that.

-L
Re: Spamming the Data Space – CLIP, GPT and synthetic data
[This was written yesterday, so it responds mostly to Luke and Felix.]

I agree that pastiche is a fundamental cultural process. But if it's so fundamental, then to make any distinctions you have to look at its effects in specific contexts. One such context, in the recent past, is postmodernism. It's relevant in some ways, but I agree with Francis that the present context is quite different.

Postmodern pastiche is the original twist that an individual gives to a mass-distribution image. In the arts of the 1980s and 90s, the pastiche aesthetic had the effect of disqualifying a whole range of avant-garde practices, from neo-dadaist transgression to modernist abstraction, all of which consciously tried to mark off a space *outside* corporate-capitalist aesthetic production. From one angle, the acceptance of a common commercial culture was a good thing: it reduced the power of elite gatekeepers, since the raw material of art was now ready-to-hand, without racial, financial and educational barriers to access. But the quest for autonomy is another fundamental cultural process, and in contemporary societies, autonomy from highly manipulative aesthetic production is crucial. Otherwise, there's nowhere to develop any divergent ethical/political orientation. As the focus of commercial culture shifted online, these problems took on new guises. Most of my own work as a cultural critic in the 2000s was devoted to autonomy in the communication societies. And then came social media, making the whole situation dramatically worse.

Today, Francis points to the floods of imagery that are already being produced by AI/statistical computing, and he predicts second and third generations of degraded images, synthesized from the initial ones. I was struck by this word "degraded" in the initial text, and I think it corresponds to something more than simple entropy on the level of data. The absence of any individual or subcultural viewpoint at the origin of the statistically generated images, and the corresponding lack of particular affects, aspirations, insights or blindspots, renders yet another fundamental cultural process obsolete: namely, interpretation. I question whether the notion of pastiche makes any sense at all without interpretation (presumably Luke has something to say about that). What's certain is that the autonomy question is becoming urgent.

Autonomy is not about purity, nor self-sufficiency, nor withdrawal. It's about the ability to establish the terms (particularly the overarching value orientation) that will guide one's engagement with society. I agree with Francis that being able to filter out statistically produced images (and music, and discourse) is going to become a major issue under flood conditions. And I'd go further: whoever is not able to form or join an interpretative community, very consciously dedicated to making meaning with respect to art or other cultural practices, is going to experience a very profound alienation during the next phase of the communication societies.
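As a footnote to the filtering question, here is a minimal sketch of what such filtering might look like in practice, assuming a hypothetical detector that assigns each item a score for being statistically produced. No reliable detector of this kind actually exists; the scores and titles below are invented for illustration, and any threshold trades false positives against false negatives.

# Toy sketch of filtering a flooded feed. The 'synthetic_score' values
# stand in for the output of a hypothetical (and in reality unreliable)
# detector of statistically produced material.
FEED = [
    {"title": "zine scan",          "synthetic_score": 0.05},
    {"title": "prompt portrait",    "synthetic_score": 0.97},
    {"title": "live set photo",     "synthetic_score": 0.12},
    {"title": "generated collage",  "synthetic_score": 0.88},
]

THRESHOLD = 0.5  # arbitrary cut-off; lowering it filters more aggressively
                 # but throws out more genuinely human work

kept = [item for item in FEED if item["synthetic_score"] < THRESHOLD]
print([item["title"] for item in kept])  # ['zine scan', 'live set photo']

The mechanics are trivial; the hard part is everything the sketch assumes away, which is why the interpretative community, rather than the classifier, may end up doing the real filtering work.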
Re: Spamming the Data Space – CLIP, GPT and synthetic data
Dear Luke, dear All

> Interesting essay Francis, and always appreciate Brian's thoughtful
> comments. I think the historical angle Brian is pointing towards is
> important as a way to push against the claims of AI models as somehow
> entirely new or revolutionary.
>
> In particular, I want to push back against this idea that this is the last
> 'pure' cultural snapshot available to AI models, that future harvesting
> will be 'tainted' by automated content.

At no point did I allude to the 'pureness' of a cultural snapshot, as you suggest. Why should I? I was discussing this from a material perspective, where data for training diffusion models becomes the statistical material to inform these models. This data has never been 'pure'. I used the distinction of uncontaminated/contaminated to show the difference between a training process for machine learning which builds on a snapshot that is still uncontaminated by the outputs of CLIP or GPT, and one which includes text and images generated with these techniques on a large scale.

It is obvious, but maybe I should have made it more clear, that the training data in itself is already far from pure. Honestly, I'm a bit shocked you would suggest I'd come up with a nostalgic argument about purity.

> Francis' examples of hip hop and dnb culture, with sampling at their
> heart, already start to point to the problems with this statement. Culture
> has always been a project of cutting and splicing, appropriating,
> transforming, and remaking existing material. It's funny that AI
> commentators like Gary Marcus talk about GPT-3 as the 'king of pastiche'.
> Pastiche is what culture does. Indeed, we have whole genres (the romance
> novel, the murder mystery, etc) that are about reproducing certain elements
> in slightly different permutations, over and over again.

Maybe it is no coincidence that I included exactly this example.

> Unspoken in this claim of machines 'tainting' or 'corrupting' culture is
> the idea of authenticity.

I didn't claim 'tainting' or 'corrupting' culture, not even unspoken. Who am I to argue against the productive forces?

> It really reminds me of the moral panic surrounding algorithmic news and
> platform-driven disinformation, where pundits lamented the shift from truth
> to 'post-truth.' This is not to suggest that misinformation is not an
> issue, nor that veracity doesn't matter (i.e. Rohingya and Facebook). But
> the premise of some halcyon age of truth prior to the digital needs to get
> wrecked.

I agree. Only, I never equated 'uncontaminated' with a "truth prior to the digital"; I equated it with a snapshot that doesn't contain material created by transformer models.

> Yes, large language models and other AI technologies do introduce new
> conditions, generating truth claims rapidly and at scale. But rather than
> hand-wringing about 'fake news,' it's more productive to see how they
> splice together several truth theories (coherence, consensus, social
> construction, etc) into new formations.

I was more interested in two points:

1.) Subversion: What I called in my original text the 'data space' (created through cultural snapshots, as suggested by Eva Cetinic) is an already biased, largely uncurated information space where image data and language data are scraped and then mathematically-statistically merged together. The focus here is the sheer scale on which this happens. GPT-3 and CLIP are techniques that both build on massive data scraping (compared, for instance, to GANs), so that it is only possible for well-funded organizations such as OpenAI or LAION to build these datasets. This data space could be spammed a) if you want to subvert it and b) if you want to advertise. The spam would need to be on a large scale in order to influence the next (contaminated) iteration of a cultural snapshot. In that sense only I used the un/contaminated distinction.

2.) In response to Brian, I evoked a scenario that builds on what we already experience when it comes to information spamming. We all know that misinformation is a social and _not_ a machinic function. Maybe I should have made this more clear (I simply assumed it). I ignored Brian's comment on the decline of culture, whatever this would mean, and could have been more precise in this regard. I don't assume culture declines. Beyond this, there have been discussions about deepfakes, for instance, and we saw that deepfakes are not needed at all to create misinformation, when one can just cut any video using standard video editing practices towards 'make-believe'. I wasn't 'hand-wringing' about fake news in my comment to Brian; instead I was quoting Langlois with the concept of 'real fakes'. Further, I'm suggesting that CLIP and GPT make it easier to automate large-scale spamming, making online communities uninhabitable or moderation more difficult. Maybe I'm overestimating the effect. We can already observe GPT-3-automated comments appearing on Twitter, or the ban of ChatGPT posts on sites like Stack Overflow.
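To put a rough number on the scale point: the toy simulation below (all quantities invented for illustration; this is not how CLIP or GPT training actually works) treats the cultural snapshot as a one-dimensional distribution and shows how little a small spam campaign shifts the statistics of the next snapshot, and how much a flood does.

import numpy as np

# Toy sketch of why spam must be large-scale to influence the next
# (contaminated) snapshot. 'Culture' is a standard Gaussian; 'spam'
# is a biased Gaussian centred where the spammer wants to pull things.
rng = np.random.default_rng(0)

organic = rng.normal(loc=0.0, scale=1.0, size=100_000)  # scraped 'organic' data
spam_signal = 3.0                                       # the spammer's target

for spam_fraction in (0.001, 0.01, 0.1, 0.5):
    n_spam = int(len(organic) * spam_fraction / (1 - spam_fraction))
    spam = rng.normal(loc=spam_signal, scale=0.5, size=n_spam)
    snapshot = np.concatenate([organic, spam])  # the next 'contaminated' snapshot
    # Crudely, a model trained on the snapshot reproduces its statistics.
    print(f"spam share {spam_fraction:5.1%} -> snapshot mean {snapshot.mean():+.3f}")

At a 0.1% spam share the snapshot's mean barely moves; at 50% it is pulled halfway to the spammer's signal. Which is the point: subverting or advertising into the data space is a brute-force, well-resourced activity, not something a few stray images accomplish.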