Hi Varun,

Nice proposal, very complete. Only one thing is missing: you should mention somewhere how many hours a week you are willing to spend on the project and whether there are any holidays during which you won't be able to work.
Good luck ;)

On Wed, Apr 6, 2011 at 5:57 PM, Varun Thacker <varunthacker1...@gmail.com> wrote:
> I have drafted the proposal on the official GSoC website. This is the link
> to my proposal: http://goo.gl/uYXrV
> Please do let me know if anything needs to be changed, added or removed.
>
> I will keep on working on it till the deadline on the 8th.
>
> On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
>> That test code looks good -- you would really have seen awful
>> performance had you used O_DIRECT, since you read byte by byte.
>>
>> A more realistic test is to read a whole buffer at a time (e.g. 4 KB is
>> what Lucene now uses during merging, but we'd probably raise this to
>> something like 1 MB when using O_DIRECT).
>>
>> Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and
>> for good reason: its existence means projects like ours can use it to
>> "work around" limitations in the Linux I/O APIs that control the buffer
>> cache when, otherwise, we might conceivably make patches to fix Linux
>> correctly. It's an escape hatch, and we all use the escape hatch
>> instead of trying to fix Linux for real...
>>
>> For example, the NOREUSE flag is a no-op now in Linux, which is a
>> shame, because that's precisely the flag we'd want to use for merging
>> (along with SEQUENTIAL). Had that flag been implemented well, it'd
>> give better results than our workaround using O_DIRECT.
>>
>> Anyway, given how things are, until we can get more control (waaaay
>> up in Javaland) over the buffer cache, O_DIRECT (via a native Directory
>> impl through JNI) is our only real option today.
>>
>> More details here:
>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
>>
>> Note that other OSs likely do a better job and actually implement
>> NOREUSE and similar APIs, so the generic Unix/WindowsNativeDirectory
>> would simply use NOREUSE on these platforms for I/O during segment
>> merging.
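Mike's suggestion of reading a whole buffer at a time, with the SEQUENTIAL and NOREUSE hints, can be sketched as below. This is a minimal illustration in Python, not the JNI/C code the thread has in mind: `os.posix_fadvise` is a thin wrapper over posix_fadvise(2) on Linux and some other Unices, while the function name `read_with_hints` and the 1 MB default are assumptions chosen for the example. Since NOREUSE did nothing on Linux at the time, the sketch also issues `POSIX_FADV_DONTNEED` on each range after reading it, as a rough approximation of what NOREUSE is meant to do; how well that compares with O_DIRECT is not something the sketch establishes.

```python
import os

BUFFER_SIZE = 1 << 20  # 1 MB, the larger buffer size suggested for merge I/O (assumption)


def read_with_hints(path, buffer_size=BUFFER_SIZE):
    """Read `path` front to back in `buffer_size` chunks, hinting that access
    is sequential and that the data will not be reused.

    Returns the total number of bytes read. Hints are skipped on platforms
    without posix_fadvise (e.g. macOS), much like an #ifdef would in C.
    """
    has_fadvise = hasattr(os, "posix_fadvise")
    fd = os.open(path, os.O_RDONLY)
    try:
        if has_fadvise:
            # offset=0, len=0 means "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            # A no-op on Linux at the time of this thread, per the email above.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_NOREUSE)
        total = 0
        while True:
            chunk = os.read(fd, buffer_size)  # one whole buffer, not byte by byte
            if not chunk:
                break
            if has_fadvise:
                # Ask the kernel to drop the clean pages just consumed so they
                # are less likely to evict pages that searches still need.
                os.posix_fadvise(fd, total, len(chunk), os.POSIX_FADV_DONTNEED)
            total += len(chunk)
        return total
    finally:
        os.close(fd)
```

Passing a smaller `buffer_size` (for example 4096, the size Mike says Lucene currently uses when merging) makes it easy to compare buffer sizes on the same file.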
>>
>> Mike
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker
>> <varunthacker1...@gmail.com> wrote:
>> > Hi. I wrote sample code to test the speed difference between SEQUENTIAL
>> > and O_DIRECT reads (I used the madvise flag MADV_DONTNEED).
>> >
>> > This is the link to the code: http://pastebin.com/8QywKGyS
>> >
>> > There was a speed difference when I switched between the two flags. I
>> > have not used the O_DIRECT flag because Linus had criticized it.
>> >
>> > Is this what the flags are intended to be used for? This is just
>> > sample code with a test file.
>> >
>> > On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer
>> > <simon.willna...@googlemail.com> wrote:
>> >> Hey Varun,
>> >> On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
>> >> <luc...@mikemccandless.com> wrote:
>> >>> Hi Varun,
>> >>>
>> >>> Those two issues would make a great GSoC! Comments below...
>> >> +1
>> >>>
>> >>> On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
>> >>> <varunthacker1...@gmail.com> wrote:
>> >>>
>> >>>> I would like to combine two tasks as part of my project, namely
>> >>>> "Directory createOutput and openInput should take an IOContext"
>> >>>> (LUCENE-2793), and complement it with "Generalize DirectIOLinuxDir
>> >>>> to UnixDir" (LUCENE-2795).
>> >>>>
>> >>>> The first part of the project is aimed at significantly reducing the
>> >>>> time taken to search during indexing by adding an IOContext which
>> >>>> would store the buffer size and have options to bypass the OS's buffer
>> >>>> cache (this is what causes the slowdown in search) and other hints.
>> >>>> Once that is completed, I would move on to LUCENE-2795 and generalize
>> >>>> the Directory implementation to make a UnixDirectory.
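Varun's pastebin code is not reproduced in the thread, so the following is only a hypothetical sketch of the kind of comparison he describes, combined with Mike's advice to read a whole buffer at a time. It maps the file read-only and either applies `MADV_SEQUENTIAL` to the whole mapping up front or applies `MADV_DONTNEED` to each chunk after reading it (`mmap.madvise` needs Python 3.8+ and a platform that defines these constants). The name `scan_mmap` and the 1 MB chunk size are assumptions. One caveat: on Linux, `MADV_DONTNEED` on a file-backed mapping discards this process's mapping of the pages but does not by itself evict them from the page cache, so it is not a substitute for O_DIRECT.

```python
import mmap
import os
import time

CHUNK = 1 << 20  # read a whole 1 MB buffer at a time (assumption), not byte by byte


def scan_mmap(path, advice, chunk=CHUNK):
    """Scan `path` through a read-only mmap using madvise `advice`.

    mmap.MADV_DONTNEED is applied to each chunk after it has been read;
    any other advice (e.g. mmap.MADV_SEQUENTIAL) is applied to the whole
    mapping before the scan. Returns (bytes_read, elapsed_seconds).
    """
    if chunk <= 0 or chunk % mmap.PAGESIZE:
        # madvise needs a page-aligned start address for each chunk.
        raise ValueError("chunk must be a positive multiple of mmap.PAGESIZE")
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0.0  # an empty file cannot be mapped
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        per_chunk = advice == mmap.MADV_DONTNEED
        if not per_chunk:
            m.madvise(advice)  # hint for the whole mapping
        start = time.perf_counter()
        total = 0
        for off in range(0, size, chunk):
            n = len(m[off:off + chunk])  # slicing copies the bytes, touching every page
            if per_chunk:
                m.madvise(mmap.MADV_DONTNEED, off, n)
            total += n
        return total, time.perf_counter() - start
```

Only a pass over a file that is not already cached measures real disk I/O; repeated passes mostly hit the page cache, so timings are only comparable when each run starts with the cache in the same state.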
>> >>>
>> >>> So, the first part (LUCENE-2793) should cause no change at all to
>> >>> performance, functionality, etc., because it's "merely" installing the
>> >>> plumbing (IOContext threaded throughout the low-level store APIs in
>> >>> Lucene) so that higher levels can send important details down to the
>> >>> Directory. We'd fix IndexWriter/IndexReader to fill out this
>> >>> IOContext with the details (merging, flushing, new reader, etc.).
>> >>>
>> >>> There's some fun/freedom here in figuring out just what details should
>> >>> be included in IOContext... (e.g. is it low level "set buffer size to
>> >>> 4 KB" or is it high level "I am opening a new near-real-time reader").
>> >>>
>> >>> This first step is a rote cutover, just changing APIs but in no way
>> >>> taking advantage of the new APIs.
>> >>>
>> >>> The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
>> >>> by creating a UnixDir impl that, using JNI (C code), passes advanced
>> >>> flags when opening files, based on the incoming IOContext.
>> >>>
>> >>> The goal is a single UnixDir that has ifdefs so that it's usable
>> >>> across multiple Unices, and e.g. would use direct IO if the context is
>> >>> merging. If we are ambitious we could rope Windows into the mix, too,
>> >>> and then this would be NativeDir...
>> >>>
>> >>> We can measure success by validating that a big merge while searching
>> >>> does not hurt search performance? (I.e. we should be able to reproduce
>> >>> the results from
>> >>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html)
>> >>
>> >> Thanks for the summary, Mike!
>> >>>
>> >>>> I have spoken to Michael McCandless and Simon Willnauer about
>> >>>> undertaking these tasks. Michael McCandless has agreed to mentor me.
>> >>>> I would love to be able to contribute to and learn from the Apache
>> >>>> Lucene community this summer. I would also love suggestions on how
>> >>>> to make my application proposal stronger.
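The two-step plan Mike outlines, with LUCENE-2793 threading an IOContext down to the Directory and LUCENE-2795 letting a native Directory act on it, can be sketched roughly as follows. The real work would be Java plus JNI/C code; this Python version only shows the shape, and every name in it (`Context`, the `IOContext` fields, `open_flags_for`, `advice_for`) is hypothetical rather than Lucene's API. It carries both a high-level reason (merge, flush, new reader) and a low-level buffer-size hint, chooses `O_DIRECT` only for merges on platforms that define it (`hasattr` standing in for the C `#ifdef`), and otherwise falls back to SEQUENTIAL/NOREUSE hints, as Mike suggests for OSs that implement NOREUSE.

```python
import os
from dataclasses import dataclass
from enum import Enum


class Context(Enum):
    """Hypothetical high-level reasons a file is being opened."""
    DEFAULT = "default"
    FLUSH = "flush"
    MERGE = "merge"
    READ = "read"  # e.g. opening a new near-real-time reader


@dataclass(frozen=True)
class IOContext:
    """Hypothetical stand-in for the IOContext that LUCENE-2793 would thread
    through Directory createOutput/openInput: a high-level reason plus a
    low-level buffer-size hint."""
    context: Context = Context.DEFAULT
    buffer_size: int = 4096


def open_flags_for(ctx, writing):
    """Pick os.open flags the way a UnixDir might (hypothetical policy).

    Direct I/O is requested only for merges, and only where the platform
    defines O_DIRECT; the hasattr check plays the role of an #ifdef in C.
    """
    flags = (os.O_WRONLY | os.O_CREAT | os.O_TRUNC) if writing else os.O_RDONLY
    if ctx.context is Context.MERGE and hasattr(os, "O_DIRECT"):
        flags |= os.O_DIRECT
    return flags


def advice_for(ctx):
    """posix_fadvise hints to apply after opening; empty if none apply.

    Hypothetical fallback for merges on platforms without O_DIRECT that
    do provide posix_fadvise with NOREUSE.
    """
    if (ctx.context is Context.MERGE and not hasattr(os, "O_DIRECT")
            and hasattr(os, "POSIX_FADV_NOREUSE")):
        return [os.POSIX_FADV_SEQUENTIAL, os.POSIX_FADV_NOREUSE]
    return []
```

For example, `open_flags_for(IOContext(Context.MERGE, 1 << 20), writing=False)` yields `O_RDONLY | O_DIRECT` on Linux. Keeping the reason and the buffer size as separate fields is one way to cover both options Mike raises; note that a real direct-I/O reader on Linux would also need block-aligned buffers, offsets and lengths, which a large buffer such as 1 MB accommodates easily.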
>> >>>
>> >>> I think either Simon or I can be the "official" mentor, and then the
>> >>> other one of us (and other Lucene committers) will support/chime
>> >>> in...
>> >>
>> >> I will take the official responsibility here once we are there!
>> >> simon
>> >>>
>> >>> This is an important change for Lucene!
>> >>>
>> >>> Mike
>> >
>> > --
>> > Regards,
>> > Varun Thacker
>> > http://varunthacker.wordpress.com