> What is "alpha-offset format"?

We corpus-research folks need to process thousands of files the way other people process bytes. UTF-8 was basically an Americanization of all alphabets. UTF is great for describing an alphabet, but not for text files.
UTF-8 turned all files into streams, which is bad for questions such as: what is the character/string sequence starting at the nth addressable unit of a file? Answering that with UTF-8 ranges from far too complicated to impossible. Alpha offset also neatly splits a file into its different segments: ALPHABETICAL text, JS, CSS, ...
lbrtchx
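The "nth addressable unit" problem can be made concrete. What follows is a minimal sketch in Python. It does not implement alpha-offset format, which isn't specified here. It only shows why UTF-8 resists random access. UTF-8 uses 1 to 4 bytes per code point, so locating the nth character means scanning from the start of the file, and an arbitrary byte offset may land inside a character. A fixed-width encoding such as UTF-32 allows direct offset arithmetic instead. The helper names `byte_offset_of_char` and `is_char_boundary` are hypothetical and exist only for illustration.

```python
def byte_offset_of_char(data: bytes, n: int) -> int:
    """Return the byte offset where the n-th (0-based) code point starts in UTF-8 data.

    UTF-8 is variable-width (1-4 bytes per code point), so there is no
    arithmetic shortcut: this walks the bytes from the start, O(n).
    """
    count = 0
    for i, b in enumerate(data):
        # Continuation bytes look like 10xxxxxx; any other byte starts a code point.
        if b & 0xC0 != 0x80:
            if count == n:
                return i
            count += 1
    raise IndexError("code point index out of range")


def is_char_boundary(data: bytes, offset: int) -> bool:
    """True if `offset` is a valid place to start decoding (not a continuation byte)."""
    return offset == len(data) or (data[offset] & 0xC0) != 0x80


text = "Straße für Ärzte"
data = text.encode("utf-8")
print(len(text), len(data))  # 16 19: ß, ü and Ä each take 2 bytes

# Code point 7 ('f') lives at byte 8, found only by scanning.
off = byte_offset_of_char(data, 7)
print(off, data[off:off + 1].decode("utf-8"))  # 8 f

# Byte 5 is the second byte of 'ß', so decoding cannot start there.
print(is_char_boundary(data, 5))  # False

# With fixed-width UTF-32, code point n is simply at byte 4 * n.
utf32 = text.encode("utf-32-le")
print(utf32[4 * 7:4 * 8].decode("utf-32-le"))  # f
```

Calling `data[5:].decode("utf-8")` would raise `UnicodeDecodeError`, because a slice taken at a raw byte offset is not guaranteed to start on a character. That is the kind of situation the post objects to. The usual workarounds are a linear scan like the one above or a precomputed index of character offsets.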