Re: Learning Lucene from ground up
+1 to MyCoy's suggestion.

To answer your most immediate questions:
 - Lucene mostly loads metadata into memory when it opens a segment (the dvm, tmd, fdm, vem, nvm and kdm files). Other files are memory-mapped, and Lucene relies on the filesystem cache to keep their data efficiently available. This gives Lucene a very small memory footprint for searching.
 - Finite state machines are mostly used for suggesters and for the terms index (tip file), which essentially stores all prefixes that are shared by 25-40 terms in an FST.

On Sun, Nov 6, 2022 at 2:12 AM MyCoy Z wrote:

> I just started learning Lucene HNSW source code last months.
>
> I find the most effective way is to start with the testcases, set debugging
> break points in the code you're interested in, and walk through the code
>
> Regards
> MyCoy
>
> On Fri, Nov 4, 2022 at 9:24 PM Rahul Goswami wrote:
>
> > Hello,
> > I have been working with Lucene and Solr for quite some time and have a
> > good understanding of a lot of moving parts at the code level. However I
> > wish to learn Lucene internals from the ground up and want to familiarize
> > myself with all the dirty details. I would like to know what would be the
> > best way to go about it.
> >
> > To kick things off, I have been thinking about picking up “Lucene in
> > Action”, but have been hesitant (and possibly wrongly) since it is based on
> > Lucene 3.0 and we have come a long way since then. To give an example of
> > the level of detail I wish to learn (among other things) would be what
> > parts of a segment (.tim, .tip, etc) get loaded in memory at search time,
> > which part uses finite state machines and why, etc
> >
> > I would really appreciate any thoughts/inputs on how I can go about this.
> > Thanks in advance!
> >
> > Regards,
> > Rahul
> >

--
Adrien
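As a rough illustration of the split described in the reply above (small metadata read onto the heap, bulk data memory-mapped and served from the filesystem cache), here is a minimal sketch using only plain java.nio. It is not Lucene's code: the class name, the ".meta" and ".data" temp files, and their layout are all made up for the example. Lucene's real memory-mapping lives in MMapDirectory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not Lucene's implementation or file format.
class MmapSketch {

    // "Metadata" file: a short list of long offsets, read fully onto the
    // Java heap when the (made-up) segment is opened.
    static long[] loadMetadata(Path meta) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(meta));
        long[] offsets = new long[buf.remaining() / Long.BYTES];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = buf.getLong();
        }
        return offsets;
    }

    // "Data" file: mapped rather than read. Pages are pulled in by the OS on
    // first access and kept in the filesystem cache, not on the Java heap.
    static MappedByteBuffer mapData(Path data) throws IOException {
        try (FileChannel ch = FileChannel.open(data, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Writes two small temp files, then looks up record `index` by using the
    // heap-resident offsets to address the memory-mapped data.
    static long lookup(int index) {
        try {
            Path meta = Files.createTempFile("sketch", ".meta");
            Path data = Files.createTempFile("sketch", ".data");
            meta.toFile().deleteOnExit();
            data.toFile().deleteOnExit();

            // Three 8-byte records: 10, 20, 30.
            ByteBuffer d = ByteBuffer.allocate(3 * Long.BYTES);
            d.putLong(10L).putLong(20L).putLong(30L);
            Files.write(data, d.array());

            // Their byte offsets in the data file: 0, 8, 16.
            ByteBuffer m = ByteBuffer.allocate(3 * Long.BYTES);
            m.putLong(0L).putLong(8L).putLong(16L);
            Files.write(meta, m.array());

            long[] offsets = loadMetadata(meta);
            MappedByteBuffer mapped = mapData(data);
            return mapped.getLong((int) offsets[index]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup(1)); // prints 20
    }
}
```

The point of the pattern is that only the offsets array occupies heap; how much of the data file is resident at any time is left to the operating system.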
Re: Learning Lucene from ground up
I just started learning the Lucene HNSW source code last month.

I find the most effective way is to start with the test cases, set debugging breakpoints in the code you're interested in, and walk through the code.

Regards
MyCoy

On Fri, Nov 4, 2022 at 9:24 PM Rahul Goswami wrote:

> Hello,
> I have been working with Lucene and Solr for quite some time and have a
> good understanding of a lot of moving parts at the code level. However I
> wish to learn Lucene internals from the ground up and want to familiarize
> myself with all the dirty details. I would like to know what would be the
> best way to go about it.
>
> To kick things off, I have been thinking about picking up “Lucene in
> Action”, but have been hesitant (and possibly wrongly) since it is based on
> Lucene 3.0 and we have come a long way since then. To give an example of
> the level of detail I wish to learn (among other things) would be what
> parts of a segment (.tim, .tip, etc) get loaded in memory at search time,
> which part uses finite state machines and why, etc
>
> I would really appreciate any thoughts/inputs on how I can go about this.
> Thanks in advance!
>
> Regards,
> Rahul
>
Learning Lucene from ground up
Hello,

I have been working with Lucene and Solr for quite some time and have a good understanding of a lot of moving parts at the code level. However, I wish to learn Lucene internals from the ground up and want to familiarize myself with all the dirty details. I would like to know the best way to go about it.

To kick things off, I have been thinking about picking up “Lucene in Action”, but have been hesitant (possibly wrongly) since it is based on Lucene 3.0 and we have come a long way since then. An example of the level of detail I wish to learn (among other things) would be which parts of a segment (.tim, .tip, etc.) get loaded into memory at search time, which parts use finite state machines and why, etc.

I would really appreciate any thoughts/inputs on how I can go about this. Thanks in advance!

Regards,
Rahul