Dear list members,

        We are excited to announce that an AIGC corpus, the aiTECCL
Corpus, is now available to all members of the research community. The
aiTECCL Corpus was compiled by Jiajin Xu and Mingchen Sun of the Corpus
Research Group of the National Research Centre for Foreign Language
Education at Beijing Foreign Studies University.

        The corpus consists of two million words generated by the GPT-3.5
model, using the same writing prompts as those employed in *the TECCL
Corpus* <http://corpus.bfsu.edu.cn/info/1070/1449.htm>. It aims to serve as a
reference corpus that exhibits native-like linguistic quality. The corpus
was made available online on 9 August 2023.

        URL: http://114.251.154.212/cqp/

        Username: test

        Password: test


        Please cite: Xu, Jiajin & Mingchen Sun. 2023. aiTECCL: An AIGC
English Essay Corpus. Beijing: National Research Centre for Foreign
Language Education, Beijing Foreign Studies University. Available online:
http://corpus.bfsu.edu.cn/info/1082/1913.htm


*Justifying the concept of "AIGC Corpus" (Artificial Intelligence Generated
Content Corpus) or Generative Corpus*

        The creation of the AIGC Corpus helps expand the concept of
"corpus". In the classic definition of a corpus, the included materials
must be language samples that are authentically or naturally occurring in
real-life communication. Clearly, generative texts do not fall under this
category. We believe that the rationale for the generative corpus can be
viewed from at least three aspects:

        1. The so-called principle of "authenticity" itself is a matter of
degree. For example, whether essays written by learners under exam
conditions belong to genuine communication is questionable. In existing
research, some elicited data also has authenticity issues similar to those
found in learners' interlanguage. Therefore, from the perspective of
existing corpora, there are texts with varying degrees of authenticity.

        2. The generative corpus can serve as an essential complement to
existing corpora. The emergence of the generative corpus can reconcile the
distinction between "probable language" and "possible language."
Linguistic instances that have not yet appeared in reality can be
generated using large language models.

        3. Creating a corpus using artificial intelligence technology is a
second-best solution under the current conditions for building specific
types of corpora. For example, the aiTECCL corpus simulates a reference
corpus of approximately 10,000 essays that are close in language quality
to English native-speaker writing and are written on the same topics as
the Chinese learners' essays. Without the use of artificial intelligence
methods for generation, it might be impossible to obtain a reference
corpus of such quality and comparability. Similarly, corpora for languages
from least-developed countries or countries with extremely small
populations would be impossible to establish in the short term without
generative technology.

        Further details about the prompt and the Python script we utilised
to create the corpus will be provided on the site soon
<http://corpus.bfsu.edu.cn/info/1082/1913.htm>.


Best wishes,


Jiajin Xu

Ph.D., Professor

National Research Centre for Foreign Language Education

Beijing Foreign Studies University, China
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-le...@list.elra.info