I have updated my proposal online to mention the time I would be able to dedicate to the project.

On Thu, Apr 7, 2011 at 7:05 AM, Adriano Crestani <[email protected]> wrote:

> Hi Varun,
>
> Nice proposal, very complete. Only one thing missing, you should mention
> somewhere how many hours a week you are willing to spend working on the
> project and whether there is any holiday you won't be able to work.
>
> Good luck ;)
>
>
> On Wed, Apr 6, 2011 at 5:57 PM, Varun Thacker
> <[email protected]> wrote:
>
>> I have drafted the proposal on the official GSoC website. This is the
>> link to my proposal: http://goo.gl/uYXrV. Please do let me know if
>> anything needs to be changed, added or removed.
>>
>> I will keep on working on it till the deadline on the 8th.
>>
>> On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless
>> <[email protected]> wrote:
>>
>>> That test code looks good -- you really should have seen awful
>>> performance had you used O_DIRECT since you read byte by byte.
>>>
>>> A more realistic test is to read a whole buffer (eg 4 KB is what
>>> Lucene now uses during merging, but we'd probably up this to like 1 MB
>>> when using O_DIRECT).
>>>
>>> Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and
>>> for good reason: its existence means projects like ours can use it to
>>> "work around" limitations in the Linux IO apis that control the buffer
>>> cache when, otherwise, we might conceivably make patches to fix Linux
>>> correctly. It's an escape hatch, and we all use the escape hatch
>>> instead of trying to fix Linux for real...
>>>
>>> For example the NOREUSE flag is a no-op now in Linux, which is a
>>> shame, because that's precisely the flag we'd want to use for merging
>>> (along with SEQUENTIAL). Had that flag been implemented well, it'd
>>> give better results than our workaround using O_DIRECT.
>>>
>>> Anyway, given how things are, until we can get more control (waaaay
>>> up in Javaland) over the buffer cache, O_DIRECT (via native directory
>>> impl through JNI) is our only real option, today.
>>>
>>> More details here:
>>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
>>>
>>> Note that other OSs likely do a better job and actually implement
>>> NOREUSE, and similar APIs, so the generic Unix/WindowsNativeDirectory
>>> would simply use NOREUSE on these platforms for I/O during segment
>>> merging.
>>>
>>> Mike
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker
>>> <[email protected]> wrote:
>>> > Hi. I wrote a sample code to test out speed difference between SEQUENTIAL
>>> > and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads.
>>> >
>>> > This is the link to the code: http://pastebin.com/8QywKGyS
>>> >
>>> > There was a speed difference when I switched between the two flags. I
>>> > have not used the O_DIRECT flag because Linus had criticized it.
>>> >
>>> > Is this what the flags are intended to be used for? This is just a sample
>>> > code with a test file.
>>> >
>>> > On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer
>>> > <[email protected]> wrote:
>>> >> Hey Varun,
>>> >> On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
>>> >> <[email protected]> wrote:
>>> >>> Hi Varun,
>>> >>>
>>> >>> Those two issues would make a great GSoC! Comments below...
>>> >> +1
>>> >>>
>>> >>> On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
>>> >>> <[email protected]> wrote:
>>> >>>
>>> >>>> I would like to combine two tasks as part of my project,
>>> >>>> namely Directory createOutput and openInput should take an IOContext
>>> >>>> (Lucene-2793) and complement it by Generalize DirectIOLinuxDir to
>>> >>>> UnixDir (Lucene-2795).
>>> >>>>
>>> >>>> The first part of the project is aimed at significantly reducing time
>>> >>>> taken to search during indexing by adding an IOContext which would
>>> >>>> store buffer size and have options to bypass the OS’s buffer cache
>>> >>>> (This is what causes the slowdown in search) and other hints. Once
>>> >>>> completed I would move on to Lucene-2795 and generalize the Directory
>>> >>>> implementation to make a UnixDirectory.
>>> >>>
>>> >>> So, the first part (LUCENE-2793) should cause no change at all to
>>> >>> performance, functionality, etc., because it's "merely" installing the
>>> >>> plumbing (IOContext threaded throughout the low-level store APIs in
>>> >>> Lucene) so that higher levels can send important details down to the
>>> >>> Directory. We'd fix IndexWriter/IndexReader to fill out this
>>> >>> IOContext with the details (merging, flushing, new reader, etc.).
>>> >>>
>>> >>> There's some fun/freedom here in figuring out just what details should
>>> >>> be included in IOContext... (eg: is it low level "set buffer size to 4 KB"
>>> >>> or is it high level "I am opening a new near-real-time reader").
>>> >>>
>>> >>> This first step is a rote cutover, just changing APIs but in no way
>>> >>> taking advantage of the new APIs.
>>> >>>
>>> >>> The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
>>> >>> by creating a UnixDir impl that, using JNI (C code), passes advanced
>>> >>> flags when opening files, based on the incoming IOContext.
>>> >>>
>>> >>> The goal is a single UnixDir that has ifdefs so that it's usable
>>> >>> across multiple Unices, and eg would use direct IO if the context is
>>> >>> merging. If we are ambitious we could rope Windows into the mix, too,
>>> >>> and then this would be NativeDir...
>>> >>>
>>> >>> We can measure success by validating that a big merge while searching
>>> >>> does not hurt search performance? (Ie we should be able to reproduce
>>> >>> the results from
>>> >>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
>>> >>
>>> >> Thanks for the summary mike!
>>> >>>
>>> >>>> I have spoken to Michael McCandless and Simon Willnauer about
>>> >>>> undertaking these tasks. Michael McCandless has agreed to mentor me.
>>> >>>> I would love to be able to contribute and learn from Apache Lucene
>>> >>>> community this summer. Also I would love suggestions on how to make my
>>> >>>> application proposal stronger.
>>> >>>
>>> >>> I think either Simon or I can be the "official" mentor, and then the
>>> >>> other one of us (and other Lucene committers) will support/chime
>>> >>> in...
>>> >>
>>> >> I will take the official responsibility here once we are there!
>>> >> simon
>>> >>>
>>> >>> This is an important change for Lucene!
>>> >>>
>>> >>> Mike
>>> >>>
>>> >>> ---------------------------------------------------------------------
>>> >>> To unsubscribe, e-mail: [email protected]
>>> >>> For additional commands, e-mail: [email protected]
>>> >>>
>>> >
>>> > --
>>> > Regards,
>>> > Varun Thacker
>>> > http://varunthacker.wordpress.com
>>> >
>>
>> --
>> Regards,
>> Varun Thacker
>> http://varunthacker.wordpress.com
>>
>

--
Regards,
Varun Thacker
http://varunthacker.wordpress.com
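
The IOContext idea discussed in the quoted messages can be illustrated with a minimal sketch. It is written in Python for brevity and is not Lucene code: `Context`, `IOContext`, `OpenOptions` and `choose_open_options` are hypothetical names, and the 4 KB and 1 MB figures are simply the buffer sizes floated in Mike's message, not settled values. The sketch only shows how a high-level hint (merging, flushing, reading) might be turned into low-level choices by a directory implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Context(Enum):
    """High-level reason a file is being opened (hypothetical names)."""
    MERGE = auto()
    FLUSH = auto()
    READ = auto()


@dataclass(frozen=True)
class IOContext:
    """Hint passed down from the index writer/reader to the directory."""
    context: Context


@dataclass(frozen=True)
class OpenOptions:
    """Low-level choices a native directory could derive from the hint."""
    buffer_size: int
    direct_io: bool


# Assumed values, taken from the figures mentioned in the thread.
DEFAULT_BUFFER = 4 * 1024        # 4 KB
DIRECT_IO_BUFFER = 1024 * 1024   # 1 MB, suggested for O_DIRECT


def choose_open_options(ctx: IOContext, direct_io_supported: bool) -> OpenOptions:
    """Use direct IO with a large buffer only for merges, and only if available."""
    if ctx.context is Context.MERGE and direct_io_supported:
        return OpenOptions(buffer_size=DIRECT_IO_BUFFER, direct_io=True)
    return OpenOptions(buffer_size=DEFAULT_BUFFER, direct_io=False)
```

With direct IO confined to merges, files opened with a `READ` context keep going through the OS buffer cache, which is the behaviour the thread is after.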
