[Corpora-List] September 2023 Newsletter - LDC

Penn LDC via Corpora Fri, 15 Sep 2023 08:02:50 -0700

In this newsletter:
LDC data and commercial technology development

New publications:
CALLFRIEND Russian Speech<https://catalog.ldc.upenn.edu/LDC2023S08>
CALLFRIEND Russian Text<https://catalog.ldc.upenn.edu/LDC2023T09>
________________________________
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a prerequisite 
for obtaining a commercial license to almost all LDC databases. Non-member 
organizations, including non-member for-profit organizations, cannot use LDC 
data to develop or test products for commercialization, nor can they use LDC 
data in any commercial product or for any commercial purpose. LDC data users 
should consult corpus-specific license agreements for limitations on the use of 
certain corpora. Visit the 
Licensing<https://www.ldc.upenn.edu/data-management/using/licensing> page for 
further information.
________________________________


New publications:
CALLFRIEND Russian Speech<https://catalog.ldc.upenn.edu/LDC2023S08> was 
developed by LDC and consists of 48 hours of telephone conversations (100 
recordings) between native speakers of Russian. The calls were recorded in 1999 
as part of the CALLFRIEND collection, a project designed primarily to support 
research in automatic language identification. One hundred native Russian 
speakers living in the continental United States each made a single phone call, 
lasting up to 30 minutes, to a family member or friend living in the United 
States.

All recordings are of domestic calls routed through LDC's automated telephone 
collection platform and stored as 2-channel (4-wire) 8-kHz mu-law samples taken 
directly from a public telephone network via a T-1 circuit. Each audio file is 
a FLAC-compressed MS-WAV (RIFF) file containing 2-channel, 8-kHz, 16-bit PCM 
sample data.
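
As a rough guide to storage needs, the uncompressed size implied by the stated 
format (2 channels, 8 kHz, 16-bit PCM) can be worked out directly. This is a 
minimal sketch: the constant and function names are illustrative, and the 
figures describe decoded PCM only, so the FLAC-compressed files as distributed 
will be smaller.

```python
# Uncompressed PCM size for 2-channel, 8 kHz, 16-bit audio.
# Names here are illustrative and not part of the corpus documentation.
SAMPLE_RATE_HZ = 8_000
CHANNELS = 2
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def pcm_bytes(seconds: float) -> int:
    """Return the number of bytes of raw PCM for the given duration."""
    return int(seconds * SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE)


bytes_per_second = pcm_bytes(1)        # 32,000 bytes
max_call_bytes = pcm_bytes(30 * 60)    # 57,600,000 bytes for a 30-minute call
corpus_bytes = pcm_bytes(48 * 3600)    # 5,529,600,000 bytes for 48 hours

print(bytes_per_second, max_call_bytes, corpus_bytes)
```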

This release also provides call metadata, including speaker gender, the number 
of speakers on each channel, and call duration.

Corresponding transcripts and a lexicon are available in CALLFRIEND Russian 
Text (LDC2023T09<http://catalog.ldc.upenn.edu/LDC2023T09>).

2023 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.
*
CALLFRIEND Russian Text<https://catalog.ldc.upenn.edu/LDC2023T09> contains the 
corresponding transcripts and a lexicon for CALLFRIEND Russian 
Speech<https://catalog.ldc.upenn.edu/LDC2023S08>, that is, 48 hours of 
telephone conversations (100 recordings) between native Russian speakers.

Each transcript line has four main fields (begin_offset, end_offset, 
speaker_label, transcript_text) separated by tabs. Each transcript contains a 
list of time-stamped segments ordered by their begin_offset values, with no 
blank lines.
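
A transcript in this four-field, tab-separated layout could be read along the 
following lines. This is only a sketch under stated assumptions: the offsets 
are taken to be decimal numbers, and the `Segment` type, the function names, 
and the sample lines are hypothetical rather than taken from the corpus 
documentation.

```python
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    begin_offset: float
    end_offset: float
    speaker_label: str
    transcript_text: str


def parse_transcript(lines: Iterable[str]) -> List[Segment]:
    """Parse tab-separated transcript lines into Segment records."""
    segments = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # the format has no blank lines; skip defensively
        # Split on the first three tabs only, so the text field stays whole.
        begin, end, speaker, text = line.split("\t", 3)
        # Assumption: offsets are decimal numbers (units not stated here).
        segments.append(Segment(float(begin), float(end), speaker, text))
    return segments


def is_ordered(segments: List[Segment]) -> bool:
    """Check that segments appear in non-decreasing begin_offset order."""
    return all(a.begin_offset <= b.begin_offset
               for a, b in zip(segments, segments[1:]))


# Made-up sample lines, for illustration only.
sample = ["0.00\t1.50\tA\tалло", "1.20\t3.75\tB\tда, привет"]
segments = parse_transcript(sample)
print(len(segments), is_ordered(segments))
```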

The lexicon covers the word forms in the 97 transcript files. The main lexicon 
table contains three columns per row: Cyrillic orthography, phonetic 
transliteration, and numeric representation of syllabic stress.
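
The three-column lexicon table could be loaded in a similar way. The column 
delimiter and the encoding of the stress field are not stated in this 
announcement, so the sketch below assumes tab-separated rows and keeps the 
stress value as an uninterpreted string; `LexEntry`, `load_lexicon`, and the 
sample row are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class LexEntry(NamedTuple):
    orthography: str      # Cyrillic word form
    transliteration: str  # phonetic transliteration
    stress: str           # numeric stress representation, kept as a string


def load_lexicon(lines: Iterable[str],
                 delimiter: str = "\t") -> Dict[str, List[LexEntry]]:
    """Group lexicon rows by Cyrillic word form.

    Assumption: rows are delimiter-separated with exactly three columns.
    Rows with a different column count are skipped.
    """
    lexicon: Dict[str, List[LexEntry]] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != 3:
            continue
        entry = LexEntry(*fields)
        # A word form may have more than one entry, so keep a list per form.
        lexicon.setdefault(entry.orthography, []).append(entry)
    return lexicon


# Made-up row, for illustration only; the real stress notation may differ.
lexicon = load_lexicon(["слово\tslovo\t10"])
print(lexicon["слово"][0].transliteration)
```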

Corresponding speech data is available as CALLFRIEND Russian Speech 
(LDC2023S08<http://catalog.ldc.upenn.edu/LDC2023S08>).

2023 members can access this corpus through their LDC accounts. Non-members may 
license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC 
account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to 
"Receive Newsletter" under Account Options or contact LDC for assistance.


Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: l...@ldc.upenn.edu<mailto:l...@ldc.upenn.edu>
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104

_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-le...@list.elra.info
