On Sun, Mar 13, 2011 at 12:11 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Simon these are great summaries -- can you post them on the issues too?
> Thanks!
done!

simon

>
> On Sat, Mar 12, 2011 at 4:35 PM, Simon Willnauer
> <simon.willna...@googlemail.com> wrote:
>> Hey,
>>
>> On Sat, Mar 12, 2011 at 5:32 PM, Zhijie Shen <zjshe...@gmail.com> wrote:
>>> Hi developers,
>>>
>>> I'm a graduate student from National University of Singapore, majoring in
>>> Computer Science. My enthusiasm for open source and information retrieval
>>> drives me to participate in GSoC'11 with your community. I first got to know
>>> Lucene when I was a software engineering intern at IBM, working on Lotus
>>> Connections.
>>
>> Awesome and welcome to Lucene :)
>>>
>>> I've already checked out the source code and successfully built it
>>> locally. Meanwhile, I have begun to read through the Jira issues, and am most
>>> interested in issues 2308, 2309 and 2621, which seem to be the refactoring
>>> tasks (please correct me if I'm wrong). My personal feeling is that these
>>> tasks will be more appropriate for a beginner to start with. Moreover, I think
>>> that to start with such a big project, it is more efficient to read through the
>>> discussion on Jira to understand the problem, and then dive into the related
>>> code with the problem kept in mind. What is your opinion? I'm looking
>>> forward to your guidance.
>>
>> Apparently you survived the first steps to get into Lucene and Solr!
>> Great! You also looked at JIRA, which is even better. So let me say
>> a few words about the issues you have listed.
>>
>> LUCENE-2621 - Extend Codec to handle also stored fields and term vectors
>>
>> This is a very interesting and at the same time very much needed
>> feature which involves API design, refactoring and an in-depth
>> understanding of how IndexWriter and its internals work. The API which
>> needs to be refactored (Codec API) was made to consume PostingLists
>> once an in-memory index segment is flushed to disk. Yet, to expose
>> stored fields to this API we need to prepare it to consume data for
>> every document while we build the in-memory segment.
>> So there is a little paradigm mismatch here which needs to be addressed.
>>
>> LUCENE-2309 - Fully decouple IndexWriter from analyzers
>>
>> This is something I have been looking forward to for quite a while. It
>> would pave the way for analysis capabilities other than the ones
>> Lucene offers today. It seems to be more refactoring-heavy than the
>> others but might require less knowledge about the IndexWriter (IW)
>> internals than the codec one. Yet, it is still a very interesting
>> issue / project to work on and fairly self-contained.
>>
>> LUCENE-2308 - Separately specify a field's type
>>
>> FieldType aims, on the one hand, to separate field properties from the
>> actual value and, on the other, to make Field easier to extend. Both
>> seem equally important yet far from easy to achieve. Fieldable and
>> Field are a core API and changes to them need to be well thought out.
>> Furthermore, this issue can easily cause drastic performance degradation
>> if not done right. Consider this a massive change, since fields are used
>> almost all over Lucene and Solr.
>>
>> I wrote those little summaries not to scare you away, not at all!
>> Rather, I tried to spell out what to expect from the issues and to make
>> it easier for you to pick the one you would like to work on. I will try
>> to update the descriptions of those issues if they are not already clear
>> enough (LUCENE-2621 seems kind of too brief though) in the next couple
>> of days.
>>
>> If you have any question regarding those issues or any other, feel
>> free to ask here on the list or on the issue directly (you might need
>> a JIRA account; if you don't have one already, you should get one :)
>> Reading the JIRA issues might help you understand what they are
>> about, but they are usually written by core devs or long-time
>> contributors, so please ask any question you have and don't hesitate to
>> ask if you have problems with anything.
>>
>> Simon
>>>
>>> Regards,
>>> Zhijie
>>>
>>> --
>>> Zhijie Shen
>>> School of Computing
>>> National University of Singapore
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
> --
> Mike
>
> http://blog.mikemccandless.com
>