[ 
https://issues.apache.org/jira/browse/LUCENENET-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shad Storhaug closed LUCENENET-629.
-----------------------------------
    Resolution: Abandoned

Moved to GitHub: https://github.com/apache/lucenenet/issues/262

> Lucene & Memory Mapped Files
> ----------------------------
>
>                 Key: LUCENENET-629
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-629
>             Project: Lucene.Net
>          Issue Type: Improvement
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Shad Storhaug
>            Priority: Minor
>              Labels: up-for-grabs
>
> This came in on the user mailing list on 15-July-2019 and was originally 
> reported by Vincent Van Den Berghe.
>  
> {quote}Hello everyone,
>  
> I've just had an interesting performance debugging session, and one of the 
> things I've learned is probably applicable for Lucene.NET.
> I'll give it here with no guarantees, hoping that it might be useful to 
> someone.
>  
> Lucene uses memory mapped files for reading, most notably via 
> MemoryMappedFileByteBuffer. Profiling indicated that there are 2 calls that 
> have quite some overhead:
>  
>         public override ByteBuffer Get(byte[] dst, int offset, int length)
>         public override byte Get()
>  
> These calls spend their time in 2 methods of MemoryMappedViewAccessor:
>  
> public int ReadArray<T>(long position, T[] array, int offset, int count) where T : struct;
> public byte ReadByte(long position);
>  
> The implementation of both contains a lot of overhead, especially 
> ReadArray<T>: apart from the parameter validation, this method makes sure 
> that the generic parameter T is properly aligned. This is irrelevant in our 
> use case, since T is byte. But because the method implementation doesn't make 
> any assumptions on T (other than the fact that it must be a value type, which 
> is the generic constraint), every call goes through the same motions, every 
> time.
> Microsoft should have provided specializations for common value types, and 
> certainly for byte arrays. Sadly, this is not the case.
> The other one, ReadByte, acquires and releases the (unsafe) pointer before 
> dereferencing it to return one single byte.
>  
> A way to do this more efficiently (while avoiding unsafe code) is to acquire 
> the pointer handle associated with the view accessor, and use that pointer to 
> marshal information back to the caller.
> To do this, MemoryMappedFileByteBuffer needs one extra member variable to 
> hold the address:
>  
>        private long m_Ptr;
>  
>  
> Then, the 2 MemoryMappedFileByteBuffer constructors need to be rewritten as 
> follows (mainly to avoid code duplication):
>  
>               public MemoryMappedFileByteBuffer(MemoryMappedViewAccessor accessor, int capacity)
>                      : this(accessor, capacity, 0)
>               {
>               }
>  
>               public MemoryMappedFileByteBuffer(MemoryMappedViewAccessor accessor, int capacity, int offset)
>                      : base(capacity)
>               {
>                      this.accessor = accessor;
>                      this.offset = offset;
>                      System.Runtime.CompilerServices.RuntimeHelpers.PrepareConstrainedRegions();
>                      try
>                      {
>                      }
>                      finally
>                      {
>                            bool success = false;
>                            accessor.SafeMemoryMappedViewHandle.DangerousAddRef(ref success);
>                            m_Ptr = accessor.SafeMemoryMappedViewHandle.DangerousGetHandle().ToInt64() + accessor.PointerOffset;
>                      }
>               }
>  
> All this does is obtain the pointer handle. Yes, the method has 
> the word "Dangerous" in it, but it's perfectly safe :). Note that this needs 
> .NET version 4.5.1 or later, because we want the starting position of the 
> view from the beginning of the memory mapped file through the PointerOffset 
> property which is unavailable in earlier .NET releases.
> What the constructor does is to get a 64-bit quantity representing the start 
> of the memory mapped view. The special construct with an "empty try block" 
> conforms to the documentation regarding constrained execution regions 
> (although I think it's more of a cargo-cult thing, since constrained 
> execution doesn't solve a lot of problems in this case).
>  
> Finally, the Dispose method needs to be extended to release the pointer 
> handle using DangerousRelease:
>  
>         public void Dispose()
>         {
>             if (accessor != null)
>             {
>               accessor.SafeMemoryMappedViewHandle.DangerousRelease();
>               accessor.Dispose();
>               accessor = null;
>             }
>         }
>  
> At this point, we can replace the ReadArray in ByteBuffer Get by this:
>  
> Marshal.Copy(new IntPtr(m_Ptr + Ix(NextGetIndex(length))), dst, offset, length);
>  
> And the Get method that called ReadByte becomes:
>  
>         public override byte Get()
>         {
>               return Marshal.ReadByte(new IntPtr(m_Ptr + Ix(NextGetIndex())));
>         }
>  
>  
> The Marshal class contains various read methods for different data types 
> (ReadInt16, ReadInt32), and it would be possible to rewrite all other methods 
> that currently assemble the types byte by byte. This is left as an exercise 
> for the reader. In any case, these methods have a lot less overhead than the 
> corresponding methods in the memory view accessor.
>  
> In my measurements, even when files reside on slow devices, the performance 
> improvements are noticeable: I'm seeing improvements of 5%, especially for 
> large segments. If you have slow I/O, the slow I/O still dominates, of 
> course: no such thing as a free lunch and all that.
>  
> As I said, no guarantees. Have fun with it! If you find something that is 
> unacceptable, let me know.
>  
>  
> Vincent
>  
> {quote}
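
The pointer-based reads described in the quoted message can be tried in isolation with a small sketch like the one below. It is not the Lucene.NET implementation: the class name MappedReadSketch, the anonymous 4096-byte mapping, and the test data are assumptions made only to keep the example self-contained. It also releases the handle reference only if DangerousAddRef reported success, which the quoted constructor does not check.

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Hypothetical stand-alone sketch; names and sizes are illustrative only.
public static class MappedReadSketch
{
    public static byte[] Run()
    {
        // Anonymous (not file-backed) mapping so the sketch needs no file on disk.
        using (var mmf = MemoryMappedFile.CreateNew(null, 4096))
        using (var accessor = mmf.CreateViewAccessor(0, 16))
        {
            // Fill the view with the bytes 1..16 through the regular accessor API.
            for (int i = 0; i < 16; i++)
            {
                accessor.Write(i, (byte)(i + 1));
            }

            var handle = accessor.SafeMemoryMappedViewHandle;
            bool addedRef = false;
            try
            {
                // Pin the handle and compute the address where the view starts.
                handle.DangerousAddRef(ref addedRef);
                long basePtr = handle.DangerousGetHandle().ToInt64() + accessor.PointerOffset;

                var dst = new byte[5];
                // Single-byte read, replacing accessor.ReadByte(0).
                dst[0] = Marshal.ReadByte(new IntPtr(basePtr));
                // Bulk read of 4 bytes from position 4, replacing accessor.ReadArray(...).
                Marshal.Copy(new IntPtr(basePtr + 4), dst, 1, 4);
                return dst;
            }
            finally
            {
                // Release only the reference we actually acquired.
                if (addedRef)
                {
                    handle.DangerousRelease();
                }
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run()));
    }
}
```

Keeping the AddRef/Release pair inside one try/finally is a choice made for this sketch; the quoted proposal instead holds the reference for the lifetime of the buffer and releases it in Dispose.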

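For the multi-byte reads that the quoted message leaves as an exercise, one possible shape is sketched below. It assumes the buffer is read in big-endian order (the default for Java-style byte buffers); Marshal.ReadInt32 reads in the machine's native order, so the value is converted with IPAddress.NetworkToHostOrder. The class and method names are hypothetical, and unmanaged memory from AllocHGlobal stands in for the mapped view.

```csharp
using System;
using System.Net;
using System.Runtime.InteropServices;

// Hypothetical sketch; not part of Lucene.NET.
public static class MarshalIntSketch
{
    // Reads a 32-bit integer stored in big-endian order at byte offset 'index'.
    public static int ReadInt32BigEndian(IntPtr basePtr, int index)
    {
        // Native-order read of 4 bytes at basePtr + index.
        int raw = Marshal.ReadInt32(basePtr, index);
        // Network order is big-endian, so this swaps bytes on little-endian hosts
        // and is a no-op on big-endian hosts.
        return IPAddress.NetworkToHostOrder(raw);
    }

    public static int Demo()
    {
        // Unmanaged scratch memory stands in for the memory-mapped view.
        IntPtr mem = Marshal.AllocHGlobal(8);
        try
        {
            var data = new byte[] { 0, 0x12, 0x34, 0x56, 0x78, 0, 0, 0 };
            Marshal.Copy(data, 0, mem, data.Length);
            // Read the four bytes 12 34 56 78 starting at offset 1.
            return ReadInt32BigEndian(mem, 1);
        }
        finally
        {
            Marshal.FreeHGlobal(mem);
        }
    }

    public static void Main()
    {
        Console.WriteLine("0x{0:X8}", Demo());
    }
}
```

Whether this is faster than assembling the value byte by byte would need to be measured; the sketch only shows that the byte-order conversion has to be handled explicitly.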


--
This message was sent by Atlassian Jira
(v8.20.1#820001)
