[PR] Added Span and ReadOnlySpan support to DataInput and DataOutput [lucenenet]

via GitHub Thu, 02 Oct 2025 17:00:04 -0700


NightOwl888 opened a new pull request, #1195:
URL: https://github.com/apache/lucenenet/pull/1195


   <!-- Thank you for submitting a pull request to our repo. -->
   
   <!-- Please do NOT submit PRs for features in newer versions of Lucene. We 
should keep the target version consistent across our repository to make the 
upgrade process to newer versions of Lucene go smoothly. -->
   
   <!-- If this is your first PR in the Lucene.NET repo, please run through the 
checklist
   below to ensure a smooth review and merge process for your PR. -->
   
   - [x] You've read the [Contributor 
Guide](https://github.com/apache/lucenenet/blob/master/CONTRIBUTING.md) and 
[Code of Conduct](https://www.apache.org/foundation/policies/conduct.html).
   - [x] You've included unit or integration tests for your change, where 
applicable.
   - [x] You've included inline docs for your change, where applicable.
   - [ ] There's an open issue for the PR that you are making. If you'd like to 
propose a change, please [open an 
issue](https://github.com/apache/lucenenet/issues/new/choose) to discuss the 
change or find an existing issue.
   
   <!-- Once all that is done, you're ready to go. Open the PR with the content 
below. -->
   
   Added Span<byte> and ReadOnlySpan<byte> support to DataInput and DataOutput
   
   ## Description
   
   This is part of an ongoing effort to broaden support for System.Memory types.
   
   This changes the primary I/O methods of `DataInput` and `DataOutput` to use 
`Span<byte>` and `ReadOnlySpan<byte>` instead of `byte[]` + range. The reason 
for this is to support slicing more readily in addition to directly supporting 
reading/writing from/to the stack. This will help us shift some of the 
temporary heap allocations onto the stack, which will take pressure off of the 
GC.
   
   There is some bias here toward .NET Core, as .NET Framework and .NET 
Standard 2.0 do not support 
   span-based overloads on `Stream`. As a result, there are patches to add this 
support included in this PR by using temporary buffers derived from the array 
pool. This is exactly the same approach taken in .NET Core on the `Stream` 
class (and even the same code), since forcing implementations to use spans 
would have been a breaking API contract change. IMHO, it is worth a little pain 
on .NET Framework to take advantage of the span APIs on `FileStream` on .NET 
Core, especially being that this will also allow us to reduce heap allocations 
in other layers such as the codecs.
   
   Such performance improvements are not included here, but will be in future 
PRs.
   
   ### New APIs
   
   ```c#
   namespace Lucene.Net.Index
   {
       public sealed class ByteSliceReader : DataInput
       {
           public override void ReadBytes(Span<byte> destination)
       }
       public class MockIndexInput : BufferedIndexInput
       {
           protected override void ReadInternal(Span<byte> destination)
       }
   }
   namespace Lucene.Net.Store
   {
       public abstract class BufferedIndexInput : IndexInput
       {
           protected abstract void ReadInternal(Span<byte> destination)
           public override sealed void ReadBytes(Span<byte> destination)
           public override sealed void ReadBytes(Span<byte> destination, bool 
useBuffer)
       }
       public abstract class BufferedIndexOutput : IndexOutput
       {
           protected internal abstract void FlushBuffer(ReadOnlySpan<byte> 
source)
           public override void WriteBytes(ReadOnlySpan<byte> source)
       }
       public sealed class ByteArrayDataInput : DataInput
       {
           public override void ReadBytes(Span<byte> destination)
       }
       public sealed class ByteArrayDataOutput : DataOutput
       {
           public override void WriteBytes(ReadOnlySpan<byte> source)
       }
       public abstract class ByteBufferIndexInput : IndexInput
       {
           public override sealed void ReadBytes(Span<byte> destination)
       }
       public abstract class DataInput
       {
           public abstract void ReadBytes(Span<byte> destination)
           public virtual void ReadBytes(Span<byte> destination, bool useBuffer)
       }
       public abstract class DataOutput
       {
           public abstract void WriteBytes(ReadOnlySpan<byte> source);
       }
       public abstract class FSDirectory : BaseDirectory
       {
           protected class FSIndexOutput : BufferedIndexOutput
           {
               public override void WriteBytes(ReadOnlySpan<byte> source)
               protected internal override void FlushBuffer(ReadOnlySpan<byte> 
source)
           }
       }
       public class InputStreamDataInput : DataInput, IDisposable
       {
           public override void ReadBytes(Span<byte> destination)
       }
       public class MockIndexInputWrapper : IndexInput
       {
           public override void ReadBytes(Span<byte> destination)
           public override void ReadBytes(Span<byte> destination, bool 
useBuffer)
       }
       public class MockIndexOutputWrapper : IndexOutput
       {
           public override void WriteBytes(ReadOnlySpan<byte> source)
       }
       public class NIOFSDirectory : FSDirectory
       {
           protected class NIOFSIndexInput : BufferedIndexInput
           {
               protected override void ReadInternal(Span<byte> destination)
           }
       }
       public class OutputStreamDataOutput : DataOutput, IDisposable
       {
           public override void WriteBytes(ReadOnlySpan<byte> source)
       }
       public class RAMInputStream : IndexInput
       {
           public override void ReadBytes(Span<byte> destination)
       }
       public class RAMOutputStream : IndexOutput
       {
           public override void WriteBytes(ReadOnlySpan<byte> source)
       }
       public class SimpleFSDirectory : FSDirectory
       {
           protected internal class SimpleFSIndexInput : BufferedIndexInput
           {
               protected override void ReadInternal(Span<byte> destination)
           }
       }
   }
   namespace Lucene.Net.Util
   {
       public sealed class GrowableByteArrayDataOutput : DataOutput
       {
           public override void WriteBytes(ReadOnlySpan<byte> source)
       }
       public sealed class PagedBytes
       {
           public sealed class PagedBytesDataInput : DataInput
           {
               public override void ReadBytes(Span<byte> destination)
           }
           public sealed class PagedBytesDataOutput : DataOutput
           {
               public override void WriteBytes(ReadOnlySpan<byte> source)
           }
       }
       public class ThrottledIndexOutput : IndexOutput
       {
           public override void WriteBytes(ReadOnlySpan<byte> source)
       }
   }
   ```
   
   The breaking change is that both `DataInput` and `DataOutput` classes gain a 
new abstract method for subclasses to implement. The `ReadBytes(byte[], int, 
int)` and `WriteBytes(byte[], int, int)` overloads were changed from `abstract` 
to `virtual`, but are still supported.
   
   ## Debugging
   
   Note that dropping the indices from the `ReadBytes` and `WriteBytes` is 
standard practice when using spans because of their ability to slice. It also 
means there is nothing that could go wrong that we would need guard clauses to 
catch. However, this means that we won't have 1-to-1 parity with Lucene. The 
"offset" will already be calculated prior to the method call, so it will be 
different than it will be when running the code in Java (although by a 
predictable amount).
   
   I considered trying to come up with some way that we could enable some sort 
of debug mode that specifically matches the Java code, but that seems like 
overkill. I also considered writing more robust tests for all subclasses to 
`DataInput` and `DataOutput`. That could still be done, but ultimately it would 
end up duplicating some of the Lucene tests that already exist.
   
   Finally, I took a look at the Lucene 10.x implementations of 
[`DataInput`](https://github.com/apache/lucene/blob/releases/lucene/10.3.0/lucene/core/src/java/org/apache/lucene/store/DataInput.java)
 and 
[`DataOutput`](https://github.com/apache/lucene/blob/releases/lucene/10.3.0/lucene/core/src/java/org/apache/lucene/store/DataOutput.java)
 and noticed that other than some new members they are largely unchanged. Also, 
when this code was being debugged, we didn't have the ability to repeat the 
random tests that failed. At the end of the day, I don't think it will be as 
hard to debug this going forward as it had been in the past.
   
   That being said, moving from the hard-coded byte conversions to the 
[`BinaryPrimitives` 
class](https://learn.microsoft.com/en-us/dotnet/api/system.buffers.binary.binaryprimitives?view=net-9.0)
 and creating our own "binary primitive" utility to handle `VInt32` and 
`VInt64` conversions in one place would drastically reduce the number of things 
that could fail in these classes. But it probably would be a good idea to move 
some of these hard-coded conversions into a test to confirm compatibility with 
`BinaryPrimitives` both across operating systems and through future versions of 
both .NET and Lucene.
   
   ## NIOFSDirectory Thread Safety
   
   This also addresses some issues with `NIOFSDirectory` and fixes it so it 
properly functions on .NET Core.
   
   1. It eliminates the use of `ByteBuffer` and instead uses `Span<byte>`
   2. It now calls 
[`RandomAccess.Read()`](https://learn.microsoft.com/en-us/dotnet/api/system.io.randomaccess.read?view=net-9.0)
 on .NET Core, which means the operation is now completely thread safe to use 
while updating the underlying stream.
   3. On .NET Framework, we still fall back to changing the position of the 
stream to read, and then reverting back to the original position. This is done 
within a lock so it is atomic, but does not sychronize with the rest of the 
stream.
   
   This is fixable on .NET Standard/.NET Framework by porting over the 
`RandomAccess` bits from .NET Core (which ultimately call native APIs). It is 
probably not very difficult now that we have some native API calls in the 
project. However, it may still not be worth the effort. We might be better off 
excluding this class from .NET Framework and .NET Standard or at least 
documenting that it is broken.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] Added Span and ReadOnlySpan support to DataInput and DataOutput [lucenenet]

Reply via email to