[
https://issues.apache.org/jira/browse/LUCENENET-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shad Storhaug closed LUCENENET-629.
-----------------------------------
Resolution: Abandoned
Moved to GitHub: https://github.com/apache/lucenenet/issues/262
> Lucene & Memory Mapped Files
> ----------------------------
>
> Key: LUCENENET-629
> URL: https://issues.apache.org/jira/browse/LUCENENET-629
> Project: Lucene.Net
> Issue Type: Improvement
> Components: Lucene.Net Core
> Affects Versions: Lucene.Net 4.8.0
> Reporter: Shad Storhaug
> Priority: Minor
> Labels: up-for-grabs
>
> This came in on the user mailing list on 15-July-2019 and was originally
> reported by Vincent Van Den Berghe ([email protected])
>
> {quote}Hello everyone,
>
> I've just had an interesting performance debugging session, and one of the
> things I've learned is probably applicable for Lucene.NET.
> I'll give it here with no guarantees, hoping that it might be useful to
> someone.
>
> Lucene uses memory mapped files for reading, most notably via
> MemoryMappedFileByteBuffer. Profiling indicated that there are 2 calls that
> have quite some overhead:
>
> public override ByteBuffer Get(byte[] dst, int offset, int length)
> public override byte Get()
>
> These calls spend their time in 2 methods of MemoryMappedViewAccessor:
>
> public int ReadArray<T>(long position, T[] array, int offset, int count) where T : struct;
> public byte ReadByte(long position);
>
> The implementation of both contains a lot of overhead, especially
> ReadArray<T>: apart from the parameter validation, this method makes sure
> that the generic parameter T is properly aligned. This is irrelevant in our
> use case, since T is byte. But because the method implementation doesn't make
> any assumptions about T (other than the fact that it must be a value type,
> which is the generic constraint), every call goes through the same motions,
> every time.
> Microsoft should have provided specializations for common value types, and
> certainly for byte arrays. Sadly, this is not the case.
> The other one, ReadByte, acquires and releases the (unsafe) pointer before
> dereferencing it to return one single byte.
>
> A way to do this more efficiently (while avoiding unsafe code), is to acquire
> the pointer handle associated with the view accessor, and use that pointer to
> marshal information back to the caller.
> To do this, MemoryMappedFileByteBuffer needs one extra member variable to
> hold the address:
>
> private long m_Ptr;
>
>
> Then, the 2 MemoryMappedFileByteBuffer constructors need to be rewritten as
> follows (mainly to avoid code duplication):
>
> public MemoryMappedFileByteBuffer(MemoryMappedViewAccessor accessor, int capacity)
>     : this(accessor, capacity, 0)
> {
> }
>
> public MemoryMappedFileByteBuffer(MemoryMappedViewAccessor accessor, int capacity, int offset)
>     : base(capacity)
> {
>     this.accessor = accessor;
>     this.offset = offset;
>
>     System.Runtime.CompilerServices.RuntimeHelpers.PrepareConstrainedRegions();
>     try
>     {
>     }
>     finally
>     {
>         bool success = false;
>         accessor.SafeMemoryMappedViewHandle.DangerousAddRef(ref success);
>         m_Ptr = accessor.SafeMemoryMappedViewHandle.DangerousGetHandle().ToInt64()
>             + accessor.PointerOffset;
>     }
> }
>
> The only thing this does is get the pointer handle. Yes, the method has
> the word "Dangerous" in it, but it's perfectly safe :). Note that this needs
> .NET version 4.5.1 or later, because we want the starting position of the
> view from the beginning of the memory mapped file through the PointerOffset
> property, which is unavailable in earlier .NET releases.
> What the constructor does is to get a 64-bit quantity representing the start
> of the memory mapped view. The special construct with an "empty try block"
> conforms to the documentation regarding constrained execution regions
> (although I think it's more of a cargo-cult thing, since constrained
> execution doesn't solve a lot of problems in this case).
>
> Finally, the Dispose method needs to be extended to release the pointer
> handle using DangerousRelease:
>
> public void Dispose()
> {
>     if (accessor != null)
>     {
>         accessor.SafeMemoryMappedViewHandle.DangerousRelease();
>         accessor.Dispose();
>         accessor = null;
>     }
> }
>
> At this point, we can replace the ReadArray call in the ByteBuffer Get
> method with this:
>
> Marshal.Copy(new IntPtr(m_Ptr + Ix(NextGetIndex(length))), dst, offset, length);
>
> And the ReadByte method becomes:
>
> public override byte Get()
> {
>     return Marshal.ReadByte(new IntPtr(m_Ptr + Ix(NextGetIndex())));
> }
>
>
> The Marshal class contains various read methods for various data types
> (ReadInt16, ReadInt32), and it would be possible to rewrite all other methods
> that currently assemble the types byte-per-byte. This is left as an exercise
> for the reader. In any case, these methods have a lot less overhead than the
> corresponding methods in the memory view accessor.
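>
> As a rough sketch of that exercise (hypothetical, not from the original
> report; Ix and NextGetIndex are the buffer's existing index helpers, and
> Lucene's on-disk format is big-endian while Marshal.ReadInt32 reads in
> native byte order, so a swap is needed on little-endian machines), a
> 32-bit read might look like:
>
> // Hypothetical sketch; assumes the m_Ptr field from the constructor above.
> public override int GetInt32()
> {
>     int value = Marshal.ReadInt32(new IntPtr(m_Ptr + Ix(NextGetIndex(4))));
>     // Lucene stores integers big-endian; this swaps on little-endian hosts.
>     return System.Net.IPAddress.NetworkToHostOrder(value);
> }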
>
> In my measurements, even when files reside on slow devices, the performance
> improvements are noticeable: I'm seeing improvements of 5%, especially for
> large segments. If you have slow I/O, the slow I/O still dominates, of
> course: no such thing as a free lunch and all that.
>
> As I said, no guarantees. Have fun with it! If you find something that is
> unacceptable, let me know.
>
>
> Vincent
>
> {quote}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)