Wmatrix is a software tool for corpus analysis and comparison. It provides a web interface to the English USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains.

Wmatrix allows the user to run these tools via a web browser such as Chrome or Firefox, and so will run on any computer (Mac, Windows, Linux) with a web browser and a network connection.

Wmatrix was initially developed by Paul Rayson in the REVERE project, extended and applied to corpus linguistics during PhD work, and is still being updated regularly. Earlier versions were available for Unix via terminal-based command line access (tmatrix) and Unix via X Windows (Xmatrix), but these only offered retrieval of text pre-annotated with USAS
and CLAWS.

*Sections in this introduction to Wmatrix*: screenshots, screencasts (short video introductions), acknowledgements and references for Wmatrix, and example applications and publications.

*Tutorial for Wmatrix*: step-by-step instructions using a case study on how to compare the Liberal Democrat and Labour Party manifestos for the 2005 UK General Election (updated May 2022). Further examples of the application to the 2010 general election manifestos can be seen on Paul's blog. The plain text versions of the 2010 UK election manifestos can be downloaded for use in your favourite text analysis software (with thanks to Martin Wynne for editing two of the files). TEI-encoded versions of the 2010 election manifestos are now available (with thanks to Lou Burnard). A similar analysis has also been carried out on the 2015, 2017 and 2019 General Election manifestos, with downloadable versions of the documents from seven main parties.

Two versions of Wmatrix are now live:

- wmatrix4.lancaster.ac.uk/

*Usernames for Wmatrix* are free to members and alumni of Lancaster University for non-commercial research. Please apply on Wmatrix5 using your Lancaster email address, or if you no longer have access to a Lancaster address as an alumnus then please contact Paul Rayson.

Accounts on Wmatrix5 are freely available for UK government and academic researchers in countries on the OECD DAC list of ODA recipients, and these accounts will stay free beyond the current one-month trial period. Please apply on Wmatrix5 using your organisational email address.

*For non-commercial research and teaching* (e.g. by non-Lancaster academics and students): a free one-month trial is available for individual academic users; please apply on Wmatrix5 using your organisational email address to set up a username and password. Once the one-month trial has expired, usernames are available for 50 per username per year from the online secure order page run by Lancaster University. Multiple usernames (or years) may be purchased at a reduced cost, e.g. for teaching purposes. Please contact Paul for details.

Further development, support, and external availability of
Wmatrix currently depends on licensing its use.

Introduction to Wmatrix

Folders

Wmatrix users can upload their own corpus data to the system, so that it can be automatically annotated and viewed within the web browser. Each file is stored in a folder (equivalent to a folder in Windows or directory on Unix).

Input format guidelines

The analysis may be improved
with some pre-editing of the input text, although pre-editing is not normally required. There are guidelines provided for texts to be tagged by CLAWS. Most important is the replacement of less-than (<) and greater-than (>) characters by the corresponding SGML entity references (&lt;) and (&gt;) respectively. The text may contain well-formed HTML, SGML or XML tags. If the text contains less-than or greater-than symbols in formulae, for example, then CLAWS may mistake large quantities of the following text for SGML tags, or fail to POS tag the file. The guidelines mention start and end text markers, but these are not required since they are inserted for you by Wmatrix.

Tag wizard

Wmatrix
users can upload their file and complete the automatic tagging process by clicking on the tag wizard. Once the file has been uploaded to the web server, it is POS tagged by CLAWS and semantically tagged by USAS. This process can be carried out step by step, starting with the 'load file without tagging' option in the advanced interface.

As a shortcut, you can simply upload frequency profiles if you have them. The format for a frequency list is a very simple two-column format with a total line at the head of the file. You can see an example of this. The column widths are not
significant.

My Tag Wizard

My Tag Wizard is a variant of the tag wizard which allows you to override or extend the system dictionaries for your own data. There are two main uses. First, you can override the current most likely tag for any word or multiword expression (MWE). Second, you can extend the dictionaries in terms of coverage of vocabulary and tagset. For example, you can create a new tag by listing the words and MWEs that you wish to be tagged with it.

Viewing folders

By clicking on the folder name, the user can see its contents.
Following the application of the tag wizard, the folder contains the original text, POS and semantically tagged versions of that text, and a set of frequency profiles.

Simple and advanced interfaces

The user can toggle between simple and advanced interfaces in Wmatrix. The advanced interface offers more options and more control over the data.

Frequency profiles

From the folder view, the user can click on a frequency list to see the most frequent items in their corpus. Frequency lists are available for words in the simple interface, and in the advanced interface for POS tags and semantic tags. The lists can be sorted alphabetically or by frequency.

Concordances

From the frequency list view, the user can click on
'concordance' and see standard concordances. These can show the usual word-based concordance as well as all occurrences of words in one POS or semantic category.

Key words, key POS and key domains: comparison of frequency lists

From the folder view, the user can click on 'compare frequency list' to perform a comparison of the frequency list for their corpus against another larger normative corpus such as the BNC sampler, or against another of their own texts (once that text has been loaded into Wmatrix). This comparison can be carried out at the word level to see keywords, at the POS level (in the advanced interface), or at the semantic level (to see key concepts or domains). The log-likelihood statistic is employed by Wmatrix. For more details, see the log-likelihood calculator.

In
the simple interface, word and tag clouds are shown which visualise the more significant differences in larger font sizes. In the advanced interface, more detailed frequency information is also displayed in table form. The key comparison shows the most significant key items towards the top of the list, since the result is sorted on the LL (log-likelihood) field, which shows how significant the difference is. You should just look at items with a '+' code, since this shows overuse in your text as compared to the standard English corpora. For a difference to be statistically significant, you should look at items with an LL value over about 7, since 6.63 is the cut-off for 99% confidence of significance (p < 0.01).

N-grams and c-grams

Recurrent sequences of
words are called n-grams in Wmatrix. These are similar to clusters in WordSmith and lexical bundles in Biber's work. You can calculate n-grams of length 2 to 5 for each text. Collapsed-grams (or c-grams) are a merged version of these lists. They show you which 2-grams are subsets of 3-grams, which 3-grams are subsets of 4-grams, and so on. The resulting c-gram list is a tree structure with the longest n-grams on the left and shortest n-grams on the right.

Collocations

Collocations in Wmatrix are pairs of words that occur together more often than would be expected due to chance. There is a choice of 11 different statistics that can be used to calculate the strength of association between the two words. For further details about these statistics, see the following paper:

Piao, S. (2002) Word alignment in English-Chinese parallel corpora. *Literary
and Linguistic Computing*, 17 (2), 207-230. doi:10.1093/llc/17.2.207

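To illustrate what an association statistic of this kind measures, here is a minimal Python sketch of pointwise mutual information (PMI) for adjacent word pairs. PMI is a widely used measure in corpus linguistics, but it is an assumption here that it resembles any of the 11 statistics offered by Wmatrix. The function name `bigram_pmi`, the `min_count` parameter, and the toy token list are hypothetical and not part of Wmatrix. The score is the base-2 logarithm of the observed pair frequency divided by the frequency expected if the two words were independent, so a score above 0 means the pair occurs more often than chance would predict.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Return a PMI score for each adjacent word pair seen at least min_count times.

    Illustrative sketch only: this is not Wmatrix's implementation.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter(zip(tokens, tokens[1:]))  # adjacent pairs only
    scores = {}
    for (w1, w2), observed in pair_freq.items():
        if observed < min_count:
            continue  # very rare pairs give unreliable scores
        # Pair frequency expected if w1 and w2 occurred independently.
        expected = word_freq[w1] * word_freq[w2] / n
        scores[(w1, w2)] = math.log2(observed / expected)
    return scores


# Toy example (hypothetical data, not taken from any Wmatrix corpus).
tokens = "new york is big and new york is old".split()
for pair, score in sorted(bigram_pmi(tokens).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```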
The collocation feature was introduced in September 2009 and is currently in beta testing.

Screencasts:

This section shows short video introductions to the Wmatrix software. Further videos will be appearing soon.

Acknowledgements and references:

Wmatrix was initially developed within the REVERE project (REVerse Engineering of Requirements) funded by the EPSRC, project number GR/MO4846. Lancaster University proof-of-concept funding in July 2006 provided support for a new server and continued software development. In December 2006, further interface design using XHTML/CSS was carried out by Andrew Foote (InfoLab21 Knowledge Business Centre), funded under support from the European Regional Development Fund. Through a Lancaster University small grant (Towards an Online Conceptual Database of the Latin Vulgate Bible), a 'reader' interface is being developed for pre-tagged corpora.

