Package: libsphinxbase3
Version: 0.8+5prealpha+1-1
Severity: important
Tags: upstream
Control: affects -1 src:pocketsphinx

Hi,

The build for pocketsphinx fails on 64-bit big-endian architectures with "No space left on device", as the testsuite log files fill up with hundreds of gigabytes of warnings. The first indication of the problem in the log files is:

> Sorry, this does not support more than 33554432 n-grams of a particular
> order. Edit util/bit_packing.hh and fix the bit packing functions

where 33554432 is 0x2000000, i.e. 2 byte-swapped as a 32-bit value. This error isn't fatal though, and libsphinxbase3 continues to try to build the trie, with tons of duplicate word warnings, as it's reading all kinds of garbage.

The issues stem from widespread use of fread to read multi-byte values with no regard for their endianness. The first error, the wrong number of n-grams, comes from reading into the "counts" array in ngram_model_trie_read_bin. The library has functions like bio_fread which can do the byte-swapping for the caller, so presumably these should be used instead. For this file format, though, there does not seem to be an easy way to determine the endianness of the file from some header magic, as there is for some of the others (but maybe it's intended to always be little-endian).

32-bit big-endian architectures have the same underlying bugs, but it seems they die a lot earlier, failing to calloc huge sizes (presumably these same calls are made on 64-bit architectures but can be satisfied thanks to overcommitting), and thus don't actually try to build the trie and spew all the warnings.

There are "only" 62 calls to fread in sphinxbase (and a further 45 in pocketsphinx), so it shouldn't be too hard for someone with knowledge of the codebase to audit their uses, especially since my guess is that most of them can be turned into something like `bio_fread(..., IS_BIG_ENDIAN)`. Similarly, the corresponding fwrite calls should be audited too.

Regards,
James
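
P.S. As a rough illustration of the kind of endian-safe read I have in mind, here is a minimal standalone sketch. It assumes the file format is meant to always be little-endian, which I have not confirmed. The names read_u32_le and swap_u32 are hypothetical and are not existing sphinxbase functions; the real fix would presumably go through bio_fread instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of sphinxbase): read one 32-bit value that
 * is stored little-endian in the file, independent of the host byte order.
 * Returns 1 on success and 0 on a short read or I/O error. */
int read_u32_le(FILE *fp, uint32_t *out)
{
    unsigned char b[4];

    assert(fp != NULL && out != NULL);
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return 0;

    /* Assemble the value byte by byte, so the result does not depend on
     * how the host lays out a uint32_t in memory. */
    *out = (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
    return 1;
}

/* Plain 32-bit byte swap: the value a raw fread of the same four bytes
 * would produce on a host of the opposite endianness. */
uint32_t swap_u32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u)
         | (v << 24);
}
```

Reading each element of an array such as "counts" through a helper of this shape would give the same values on little-endian and big-endian hosts, at the cost of one call per element; a bulk read followed by a conditional swap would be the other obvious design.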