On 11/03/2010 03:10 PM, stefano franchi wrote:
Dear Lyxers,

does anyone know of a script that would convert a text/tex/lyx file to a sorted list of words (after removing all punctuation)? That would be very helpful when creating an index, but I could not find anything with the usual googling techniques. Note that I am not talking about an automatic indexing tool. More modestly, I am looking for some software help in generating a provisional list of words to be pared down (and later inserted) manually. There is an old page describing the procedure I would like to follow here: http://www.karakas-online.de/mySGML/lyx-automatic-index-generation.html. Unfortunately, the awk script it describes seems tied to an old version of LyX.

Steve Litt describes the same technique in an old post, but he seems to assume it would be trivial to whip up the needed script. Not true for me, I am afraid...

Here's a script. Run it on an export of the LyX file to plain text as follows:
    perl w.pl <yourfile.txt
I've set it to ignore words of fewer than 4 characters, but you'll see where you can change that.

Richard

=====

Sample output

135ff: 1
13ff: 1
1880s: 2
201x: 2
20ff: 1
94ff: 1
abandon: 1
abandons: 1
able: 2
about: 16
above: 2
absent: 2
abstract: 1
acceptance: 1
accepting: 1
accordingly: 1
account: 1
accuse: 1
achieved: 2
acquaintance: 1
actor: 1
actually: 1
added: 1
addition: 1

etc....
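
For readers who cannot open the attachment, here is a minimal sketch of the same idea in Python. It is not the attached w.pl: the names count_words, format_counts and MIN_LENGTH, the regular expression, and the sample sentence are all assumptions, chosen only to reproduce the behaviour described above (punctuation removed, words of fewer than 4 characters ignored, a sorted "word: count" list like the sample output).

```python
import re
from collections import Counter

# Assumption: mirrors the "fewer than 4 characters" cutoff mentioned above.
MIN_LENGTH = 4


def count_words(text, min_length=MIN_LENGTH):
    """Count lowercased alphanumeric tokens of at least min_length characters.

    Punctuation (including apostrophes and hyphens) acts as a separator,
    so "don't" is split into "don" and "t" and both fall under the cutoff.
    """
    # [^\W_]+ matches runs of letters and digits (Unicode-aware), no underscore.
    tokens = re.findall(r"[^\W_]+", text.lower())
    return Counter(t for t in tokens if len(t) >= min_length)


def format_counts(counts):
    """Return 'word: count' lines sorted by word, digits before letters."""
    return [f"{word}: {n}" for word, n in sorted(counts.items())]


# Hypothetical sample text, only to show the output format.
sample = "The actor was able, able to abandon it. 1880s!"
for line in format_counts(count_words(sample)):
    print(line)
```

To run this on a plain-text export instead of the sample sentence, replace `sample` with `sys.stdin.read()` (after `import sys`) and redirect the file into the script, in the same way as the perl command above.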

Attachment: w.pl
Description: Perl program
