Hi Andreas,

You're looking at a comparison between two FASTQ reads:

    @Rec1
    GCATGATATATACAAC
    +
    012345'FBcEFGHIJ

and

    @Rec1
    GCATGATATATACAAC
    +
    012345(FBcEFGHIJ

The format is roughly

    @[name]
    [N * Nucleotides]
    +
    [N * Phred quality scores]

So the difference is that the quality score for the 7th base (a 'T') is "'" instead of "(", corresponding to a Phred quality score of 6 instead of 7 (calculated as ASCII value - 33). It is merely a coincidence that this happened between numerical and non-numerical values.

During the collapse step, in which two overlapping reads are merged, updated quality scores are calculated as a product of the scores for the two copies of the overlapping base-pair. This results in either a higher quality score being assigned (for identical positions) or a lower quality score being assigned (for mismatching positions).

The calculation to determine the updated quality score makes use of std::log10, which appears to produce slightly different results on i386 vs amd64. As a result, a small number of updated quality scores are off by one on i386 compared to amd64. I determined this simply by printing every single intermediate result for this calculation for binaries built with and without -m32.

The solution I am taking is to pre-calculate the lookup table and include a hardcoded copy of it, instead of populating the table when it is first used. It is not the prettiest solution, but it ensures that results can be reproduced regardless of the architecture.

I realize that my last comment was rather vague, but I hope that this makes it clear what the issue was.

See https://en.wikipedia.org/wiki/FASTQ_format for a detailed description of the format.

-- 
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
https://github.com/MikkelSchubert/adapterremoval/issues/35#issuecomment-471948966
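
As a rough illustration of the "ASCII value - 33" arithmetic described above, here is a minimal C++ sketch that decodes Phred+33 characters and locates the first position where two quality strings differ. The helper names phred33_to_score and first_difference are hypothetical, chosen for this example only; they are not taken from AdapterRemoval.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert one Phred+33 encoded character to its
// numeric quality score (ASCII value minus 33).
int phred33_to_score(char c) {
    // Printable Phred+33 characters lie in the range '!' (33) to '~' (126).
    assert(c >= 33 && c <= 126);
    return static_cast<int>(c) - 33;
}

// Hypothetical helper: return the 1-based position of the first character
// at which the two strings differ, or 0 if they are identical.
std::size_t first_difference(const std::string& a, const std::string& b) {
    const std::size_t shared = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < shared; ++i) {
        if (a[i] != b[i]) {
            return i + 1;
        }
    }
    // One string is a strict prefix of the other: they differ right after it.
    return a.size() == b.size() ? 0 : shared + 1;
}
```

Applied to the two quality strings from the reads above, first_difference returns 7, and phred33_to_score gives 6 for "'" and 7 for "(", matching the off-by-one difference described in this comment.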