[ https://issues.apache.org/jira/browse/LUCENE-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-753:
--------------------------------------
Attachment: FileReadTest.java
Carrying forward from this thread:
http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200806.mbox/[EMAIL PROTECTED]
Jason Rutherglen <[EMAIL PROTECTED]> wrote:
{quote}
After thinking more about the pool of RandomAccessFiles I think
LUCENE-753 is the best solution. I am not sure how much work nor if
pool of RandomAccessFiles creates more synchronization problems and if
it is only to benefit windows, does not seem worthwhile.
{quote}
It wasn't clear to me that pread would in fact perform better than
letting each thread use its own private RandomAccessFile.
So I modified (attached) FileReadTest.java to add a new SeparateFile
implementation, which opens a private RandomAccessFile per-thread and
then just does "classic" seeks & reads on that file. Then I ran the
test on 3 platforms (results below), using 4 threads.
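For readers without the attachment at hand, here is a minimal sketch of the per-thread idea. It is not the code from FileReadTest.java; the class name SeparateFileReader and its methods are made up for illustration, and it assumes a ThreadLocal is an acceptable way to give each thread its own file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch (not the attached SeparateFile implementation):
// each thread owns a private RandomAccessFile, so seek + read needs no
// lock shared with other threads.
class SeparateFileReader {
    private final File path;
    private final ThreadLocal<RandomAccessFile> perThread =
        new ThreadLocal<RandomAccessFile>();

    SeparateFileReader(File path) {
        this.path = path;
    }

    // "Classic" seek followed by read, on the calling thread's own file.
    int read(long pos, byte[] buf, int off, int len) throws IOException {
        RandomAccessFile raf = perThread.get();
        if (raf == null) {
            // First use on this thread: open a private descriptor.
            raf = new RandomAccessFile(path, "r");
            perThread.set(raf);
        }
        raf.seek(pos);
        return raf.read(buf, off, len);
    }

    // Closes only the calling thread's file; each thread that has read
    // must call this itself.
    void closeForCurrentThread() throws IOException {
        RandomAccessFile raf = perThread.get();
        if (raf != null) {
            raf.close();
            perThread.remove();
        }
    }
}
```

The cost is one open file descriptor per reading thread per file, which is the complexity trade-off discussed further down.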
The results are very interesting -- using SeparateFile is always
faster, especially so on WinXP Pro (115% faster than the next fastest,
ClassicFile) but also surprisingly so on Linux (44% faster than the
next fastest, ChannelPread). On Mac OS X it was 5% faster than
ChannelPread. So on all platforms it's faster, when using multiple
threads, to use separate files.
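The percentages follow directly from the MB/sec figures in the listings below. A small sketch of the arithmetic (the class and method names are made up for illustration; the numbers are copied from the results):

```java
// Illustrative helper: how much faster one throughput figure is than
// another, in percent.
class Speedup {
    static double percentFaster(double fast, double slow) {
        return (fast / slow - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // SeparateFile vs. the next fastest implementation on each platform.
        // WinXP Pro: SeparateFile vs. ClassicFile, roughly 115%.
        System.out.println(percentFaster(426.2910211211688, 198.32836297275932));
        // Linux: SeparateFile vs. ChannelPread, roughly 44%.
        System.out.println(percentFaster(756.0497282072947, 525.5711326480665));
        // Mac OS X: SeparateFile vs. ChannelPread, roughly 5%.
        System.out.println(percentFaster(274.4177632386015, 260.4677476008888));
    }
}
```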
I don't have a Windows server-class machine readily accessible, so if
someone could run the test on one, and on other platforms (Solaris),
to see if these results are reproducible, that'd be great.
This is a strong argument for some sort of pooling of
RandomAccessFiles under FSDirectory, though the counterbalance is
clearly added complexity. I think if we combined the two approaches,
i.e. use separate RandomAccessFile objects per thread as managed by a
pool, and then use the best read mode on each (classic on Windows,
channel pread on all others), we'd likely get the best performance yet.
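To make the combined idea concrete, here is a rough sketch of what such a pool might look like. Nothing here is Lucene code or part of any patch on this issue; PooledFileReader is a hypothetical name, the pool is unbounded, and detecting Windows via the os.name system property is an assumption of the sketch.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: a pool of RandomAccessFiles over one path, with
// the read mode chosen per platform.
class PooledFileReader {
    private final File path;
    private final boolean usePread;
    private final ConcurrentLinkedQueue<RandomAccessFile> idle =
        new ConcurrentLinkedQueue<RandomAccessFile>();

    PooledFileReader(File path) {
        this.path = path;
        // Assumption: classic seek + read on Windows, pread elsewhere.
        this.usePread =
            !System.getProperty("os.name", "").startsWith("Windows");
    }

    int read(long pos, byte[] buf, int off, int len) throws IOException {
        // Borrow an idle file, or open a new one if none is free. The pool
        // therefore grows to the peak number of concurrent readers.
        RandomAccessFile raf = idle.poll();
        if (raf == null) {
            raf = new RandomAccessFile(path, "r");
        }
        try {
            if (usePread) {
                // Positional read: does not move the channel's position.
                return raf.getChannel().read(ByteBuffer.wrap(buf, off, len), pos);
            }
            raf.seek(pos);
            return raf.read(buf, off, len);
        } finally {
            idle.offer(raf);
        }
    }

    // Closes idle files only; call once no reads are in flight.
    void close() throws IOException {
        RandomAccessFile raf;
        while ((raf = idle.poll()) != null) {
            raf.close();
        }
    }
}
```

A real version would need to bound the pool and decide what to do with a file after a failed read, which is where the added complexity comes in.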
Mac OS X 10.5.3, single WD Velociraptor hard drive, Sun JRE 1.6.0_05
{code}
config: impl=ClassicFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=151884, MB/sec=176.73715203708093
config: impl=SeparateFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=97820, MB/sec=274.4177632386015
config: impl=ChannelPread serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=103059, MB/sec=260.4677476008888
config: impl=ChannelFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=176250, MB/sec=152.30380482269504
config: impl=ChannelTransfer serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=365904, MB/sec=73.36226332589969
{code}
Linux 2.6.22.1, 6-drive RAID 5 array, Sun JRE 1.6.0_06
{code}
config: impl=ClassicFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=75592, MB/sec=355.1109323737962
config: impl=SeparateFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=35505, MB/sec=756.0497282072947
config: impl=ChannelPread serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=51075, MB/sec=525.5711326480665
config: impl=ChannelFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=95640, MB/sec=280.6727896277708
config: impl=ChannelTransfer serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=93711, MB/sec=286.45031639828835
{code}
Windows XP Pro, laptop, Sun JRE 1.4.2_15
{code}
config: impl=ClassicFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=135349, MB/sec=198.32836297275932
config: impl=SeparateFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=62970, MB/sec=426.2910211211688
config: impl=ChannelPread serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=174606, MB/sec=153.73781886074937
config: impl=ChannelFile serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=152171, MB/sec=176.4038193873997
config: impl=ChannelTransfer serial=true nThreads=4 iterations=100 bufsize=1024 filelen=67108864
answer=-23909200, ms=275603, MB/sec=97.39932293915524
{code}
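A note on reading the listings: the MB/sec values appear consistent with total bytes = filelen * nThreads * iterations, divided by the elapsed time, with MB meaning 10^6 bytes. That is inferred from the reported numbers, not taken from the FileReadTest.java source, so treat the sketch below (with made-up names) as an assumption:

```java
// Illustrative only: reproduces the reported MB/sec from the config line
// and the elapsed ms, under the assumptions stated above.
class Throughput {
    static double mbPerSec(long filelen, int nThreads, int iterations, long ms) {
        // Assumption: every thread reads the whole file on every iteration.
        double totalBytes = (double) filelen * nThreads * iterations;
        // Assumption: 1 MB = 1,000,000 bytes.
        return totalBytes / 1e6 / (ms / 1000.0);
    }

    public static void main(String[] args) {
        // Mac OS X ClassicFile run: reported MB/sec=176.73715203708093
        System.out.println(mbPerSec(67108864L, 4, 100, 151884L));
        // Linux SeparateFile run: reported MB/sec=756.0497282072947
        System.out.println(mbPerSec(67108864L, 4, 100, 35505L));
    }
}
```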
> Use NIO positional read to avoid synchronization in FSIndexInput
> ----------------------------------------------------------------
>
> Key: LUCENE-753
> URL: https://issues.apache.org/jira/browse/LUCENE-753
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Store
> Reporter: Yonik Seeley
> Attachments: FileReadTest.java, FileReadTest.java, FileReadTest.java,
> FileReadTest.java, FSIndexInput.patch, FSIndexInput.patch, lucene-753.patch
>
>
> As suggested by Doug, we could use NIO pread to avoid synchronization on the
> underlying file.
> This could mitigate any MT performance drop caused by reducing the number of
> files in the index format.