Am 07.11.23 um 20:37 schrieb Jan Trukenmüller:
The MIDI file parser misinterprets events without status byte when they appear 
directly after a Meta of SysEx event.

For my bugfix I had to decide between two possible solutions:
- Strict solution: Throw an InvalidMidiDataException
- Tolerant solution: Use the status of the last channel event as running status

I used the tolerant solution.
I will send an email to the list client-libs-dev with my reasons.

-------------

Commit messages:
  - 8319598: Fixed incorrect interpretation of interrupted running status in 
SMFParser

Changes: https://git.openjdk.org/jdk/pull/16546/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16546&range=00
   Issue: https://bugs.openjdk.org/browse/JDK-8319598
   Stats: 121 lines in 2 files changed: 118 ins; 0 del; 3 mod
   Patch: https://git.openjdk.org/jdk/pull/16546.diff
   Fetch: git fetch https://git.openjdk.org/jdk.git pull/16546/head:pull/16546

PR: https://git.openjdk.org/jdk/pull/16546


Hi,

As I promised, here are my reasons for choosing the tolerant solution.
But first a short description what my PR is about.

Definition:
An "interrupted running status" means:
- In a MIDI file a running status is active
- The next event is a Meta or SysEx event
- The next event does not contain a status byte

Problem:
Files with "interrupted running status" are not parsed correctly.

Possible solutions:
- Strict solution: Throw an InvalidMidiDataException whenever
  an interrupted running status is found
- Tolerant soution: Use the status byte of the last channel
  event with a status byte

Here are 4 reasons for choosing the tolerant solution, sorted from weak to strong:


Reason 1:

The SMFParser is very tolerant against even worse violations (e.g. data bytes greater than 0x7F, meta data with wrong length and so on) So I think it is in the spirit of the parser to also allow this minor violation. I think that interrupted running status is only a "minor" violation because there is only one sentence in the specification with this assumption. And because there is hardly any other possible interpretation that could make sense.


Reason 2:

I tried out a minimal file with interrupted running status with several other applications. The following ones interpret the file like suggested by the tolerant solution:

- timidity
- Rosegarden
- Musescore3
- aplaymidi
- Audacious
- midicsv
- JZZ.midi.SMF.js (JZZ.js)
- VLC (with fluidsynth plugin)

The following tools behave differently:

- Pipewire's pw-mididump
  reacts with a segmentation fault
- fluidsynth -a alsa -m alsa_seq -l -i testfile.mid
  Doesn't play the file but returns a successful exit code.
  No error message.
- pmidi
  Error (Meta event 60 not implemented\nUnexpected end of file)

That's 8:3.
And I don't think that the behavior of the 3 is desirable.


Reason 3:

I downloaded a large archive with more than 169.000 MIDI files and did a lot of analyses and tests.
First without the bugfix and then with the bugfix.
The results can be summarized like this:

- 1.1% of all MIDI files fail to be loaded by SMFParser (for ANY reason)
- 0.43% of all MIDI files contain interrupted running status
  (maybe some more that have a different problem before the first
  IRS occurs)
- From the files with IRS
  - before the bugfix:
    - 15.67% can be parsed without exception
             (but with incorrectly interpreted data)
    - 69.9% fail with InvalidMidiDataException
    - 14.42% fail with EOFException
  - after the bugfix:
    - 94.2% can be parsed without exception
    - 5.4% fail with InvalidMidiDataException
    - 0.4% fail with EOFException
- The bugfix fixes exceptions for:
  - 78.5% of files with interrupted running status
  - 30.25% of ALL failing MIDI files in general !!!!!
- The bugfix doesn't change anything to any MIDI file without
  interrupted running status

Details:

169457 files are in the archive.
721 contain interrupted running status. (I copied them to a different folder to be able to analyze them separately.)

Statistics from (only) the 721 files with interrupted running status:
- before the fix:
  - all:                      721
  - success:                  113
  - InvalidMidiDataException: 504
  - EOFException:             104
- after the fix:
  - all:                      721
  - success:                  679
  - InvalidMidiDataException:  39
  - EOFException:               3

Statistics from ALL files before and after the fix:
- before the fix:
  - all:                       169457
  - success:                   167586
  - fail:                        1871
    - InvalidMidiDataException:  1606
    - EOFException:               265
- after the fix:
  - all:                       169457
  - success:                   168152
  - fail:                        1305
    - InvalidMidiDataException:  1141
    - EOFException:               164


Reason 4:

In a different branch of the JDK I added my bugfix and also a lot of debug information to print out several events before and after every interrupted running status. I tested this against the 721 files with IRS. The result was more than 8000 occurrences of IRS, and I looked at each of them!
Manually!!!
The vast majority of these occurrences shows very clearly that the data makes absolute sense.

However a few of them (maybe 20-30) made no sense at all or were at least "suspicious".
So I looked at these remaining files with a hex editor.
In most cases there was a meta message with an incorrect length field or something similar, so that the following bytes were interpreted incorrectly.


Conclusion:

After this investigation I'm very confident that the tolerant solution is the best. And isn't it pretty cool to fix 30% of all exceptions with such a simple fix?

Best regards
Jan Trukenmüller
(Github user: truj)

Reply via email to