Hey All Over in the Apache Jena [1] project one of the RDF serialisations we support is based upon a binary encoding [2] of RDF using Thrift [3]. We use the Thrift compact protocol and have various implementations of our high-level reader [4] and writer [5] interfaces wrapped around those.
However, it was recently noticed that when presented with completely junk data our readers would silently return. And more worryingly, taking an otherwise valid Thrift data stream and arbitrarily truncating it would produce some valid output and silently ignore the trailing malformed data thus losing some data. Upon debugging this was determined to be due to our own EOF exception check at [6]. Historically this appears to have been added due to THRIFT-5022 which was fixed in your codebase back in Nov 2019 [7]. However, even with that fix calling protocol.getTransport().isOpen() isn’t a usable check for EOF as even if EOF has been reached this can still return true as the underlying stream might not have been closed yet. Similarly, protocol.getTransport().peek() doesn’t seem a reliable indicator of whether EOF has been reached (it just defers to isOpen() in the default implementation), and we don’t know whether the underlying InputStream is buffered or not (though we do try to enforce that in our wrapper code). So, we’ve currently ended up with this draft PR [8] which feels like a complete hack as I’m basically examining the Exception stack trace to see where in the read attempt we were when we failed and use that as an indicator of malformed input vs true EOF. I feel like we’re missing something obvious about how best to use the Thrift Java APIs, any guidance/suggestions on how to do differentiate between EOF and malformed inputs better would be much appreciated! Or if it’s preferable to handle this as a JIRA Issue let me know and I can get that filed Cheers, Rob [1]: https://jena.apache.org [2]: https://jena.apache.org/documentation/io/rdf-binary.html [3]: https://github.com/apache/jena/blob/main/jena-arq/Grammar/RDF-Thrift/BinaryRDF.thrift [4]: https://github.com/apache/jena/blob/93335dd1b3b2502968ef3cdcf5b7d60540577e9b/jena-arq/src/main/java/org/apache/jena/riot/thrift/ThriftRDF.java#L165-L185 [5]: https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/riot/thrift/StreamRDF2Thrift.java [6]: https://github.com/apache/jena/blob/93335dd1b3b2502968ef3cdcf5b7d60540577e9b/jena-arq/src/main/java/org/apache/jena/riot/thrift/ThriftRDF.java#L177-L179 [7]: https://lists.apache.org/thread/j03hf73yv1ljc0fk6ffohfc22yo3v7hd [8]: https://github.com/apache/jena/pull/2408/files#diff-2dce962e024e06557e133fe135c8e968e1a381aee892894f236b9a22046008d4