I wanted to share this news so that we can get some feedback. All comments are welcome.
Wajdi
---------- Forwarded message ----------
From: Mark Dredze <mdre...@cs.jhu.edu>
To: c...@cs.jhu.edu
Date: Thu, 4 Feb 2010 05:13:39 -0500
Subject: Proposal for the NAACL Mechanical Turk Workshop
Congratulations! Your proposal for the NAACL Workshop on Creating Speech and Language Data With Amazon’s Mechanical Turk was selected to receive the $100 credit. In order to receive the credit, could you please confirm that:

1. You intend to submit a short paper (4 pages) describing the results of your proposal.
2. You or one of your co-authors will attend the workshop at NAACL in LA. (Note that the price of attending the workshop is higher than the $100 credit.)
3. You will submit your data and your HIT templates along with your paper, and that it's OK to publish them on the workshop web site. If you cannot share your data due to licensing, please let us know the restrictions. For commonly available corpora (LDC), we will allow the posting of the annotations with instructions for how to merge them with the corpus.

If you agree with all of that, please send us your Amazon.com account name and we'll forward it to Amazon Mechanical Turk so that they can apply the credit. You are of course welcome to add your own funds as well.

Finally, we've set up a wiki on GitHub so that people can share tips and advice about using Mechanical Turk: http://wiki.github.com/callison-burch/mechanical_turk_workshop/

We've put up two hints. One shows how to record which country your Turkers live in. The other shows some JavaScript for highlighting words by clicking on them and recording which words are clicked. Please add to the wiki!

Best Regards,
Chris Callison-Burch and Mark Dredze

p.s. Please let your co-authors know that your proposal was awarded, since we're only sending these notifications to the lead author.
Project Proposal Draft
Quran corpus annotation with Amazon’s Mechanical Turk

Wajdi Zaghouani, Linguistic Data Consortium, University of Pennsylvania, waj...@ldc.upenn.edu
Kais Dukes, School of Computing, University of Leeds, s...@leeds.ac.uk

1- Quran project presentation
The Quranic Arabic Corpus is an open source project hosted by the Language Research Group at the University of Leeds. The aim of this project is to provide a richly annotated linguistic resource for researchers who want to study the language of the Quran. The corpus shows the Arabic grammar, syntax and morphology of each word in the Holy Quran, and it is divided into two levels of analysis: morphological annotation and a syntactic treebank.

2- Current annotation and needs
Currently, the annotation is provided by volunteer annotators, most of whom are Arabic linguists. Corrections to the online corpus can be made easily by clicking on an Arabic word and then posting the suggested correction, which is reviewed before being included in the corpus. In addition, a message board provides a discussion space for issues and suggestions regarding the project.

3- Proposed experiment using Mechanical Turk
Mechanical Turk opens new possibilities for annotating speech and text. We are very interested in running an experiment to evaluate the effectiveness of using Mechanical Turk to perform corrections and annotations of the Quran corpus, especially in comparison with the existing message-board corrections. Which approach produces better quality? What annotation volume can the new approach achieve? Could the two approaches complement each other? Paying for suggested corrections to part-of-speech tagging might encourage individuals with knowledge of the Arabic language to participate who might not otherwise do so.
It may also yield better quality and higher consistency than free volunteer annotation. The existing website already provides all the infrastructure required to begin this experiment (an online part-of-speech tagging tool). The proposed experiment would compare the two approaches in terms of annotator speed, inter-annotator agreement, and tagging accuracy.
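The proposal does not say which agreement statistic or gold standard would be used. As one possibility, here is a minimal Python sketch assuming Cohen's kappa for two annotators who tagged the same sequence of words, plus plain accuracy against a reference tagging (for example, an expert-reviewed subset of the corpus). The function names and the tag labels in the example are illustrative assumptions, not part of the Quranic Arabic Corpus tools.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators who tagged the same items (assumed metric)."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two equal-length, non-empty tag sequences")
    n = len(tags_a)
    # Observed agreement: fraction of words given the same tag by both annotators.
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Expected chance agreement, from each annotator's own tag distribution.
    count_a = Counter(tags_a)
    count_b = Counter(tags_b)
    p_e = sum(count_a[t] * count_b[t] for t in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical tag everywhere: perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def accuracy(tags, gold):
    """Fraction of tags that match a reference (gold) tagging."""
    if len(tags) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty tag sequences")
    return sum(t == g for t, g in zip(tags, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical part-of-speech tags for six words from two annotators.
    annotator_a = ["N", "V", "P", "N", "ADJ", "N"]
    annotator_b = ["N", "V", "N", "N", "ADJ", "V"]
    print(round(cohen_kappa(annotator_a, annotator_b), 3))  # 0.5
```

Here the two annotators agree on 4 of 6 words (observed agreement 2/3), chance agreement is 1/3, so kappa is (2/3 - 1/3) / (1 - 1/3) = 0.5. Kappa is used rather than raw percent agreement because it discounts the agreement two annotators would reach by chance given how often each uses each tag.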