Looking into the payload patch is next on my list after I finish up the benchmarking, but I know it is going to require several sets of eyes and some discussion. I would like to see the more flexible index format addressed, and I am not sure about the removal of Fieldable as an interface, but I admit I haven't looked into the depths just yet. The reason I like Fieldable as an interface is that I think it keeps us open to having Documents be backed by databases, etc. (I have been toying with the notion of a DBFieldable, a Fieldable implementation backed by a DB for seamless use when creating Documents, but it isn't ready yet.)

As for management, I think we would benefit from getting a few people to work toward a set of goals for improvements. I can think of at least 3 major things (and I am sure others could add theirs) that would have significant impact:
1. Flexible Indexing
2. Java 1.5
3. Merge improvements suggested by Dave Balmain and Marvin Humphrey concerning segments and also storing vector information outside of the merge segments.

We have some planning docs on the Wiki, but I don't think anyone has really taken up that torch.

-Grant


On Oct 17, 2006, at 9:12 AM, Erik Hatcher wrote:

Steven,

Thanks for this prod. I've been meaning to debrief the Lucene dev group on a recent visit I made to IBM's Silicon Valley Labs, where I presented Lucene and met with their search gurus. A recurring theme from them was the management of the Java Lucene project, and what they and we can do to get patches applied and move Lucene forward.

For example, the payload design proposed by one of the SVL team members is something they really need to accomplish clever XML indexing, and we've not done anything about it even though the proposal was backwards compatible and added something valuable to Lucene. Granted, that did open the discussion up into a more generalized "flexible index format" topic, but that has yet to become real code.

I am all for us adopting the JIRA techniques that Hadoop uses, and for us committers (and contributors) to become more efficient and effective with patch and release management.

Now all we need are volunteers to facilitate this. I'd love to help more, but I'm much too frazzled and swamped to manage something like this myself currently. But I strongly support it!

        Erik



On Oct 17, 2006, at 8:19 AM, Steven Parkes wrote:

As a member of a number of Lucene subprojects dev lists, I've been
comparing the way the Jira workflow is used on the different projects.
In particular, I've been noting the difference between the workflows
that Lucene Java and Hadoop use. Hadoop in particular has a state for
"patch submitted" which, as it's used on Hadoop, seems to facilitate
communication. State changes (open->patch submitted,
patch submitted->open) seem to help communication between contributors
and reviewers. Looking at the Lucene Java Jira, sometimes "[patch]" is
put at the beginning of the description of the issue to indicate
something similar, but this isn't used too consistently and doesn't seem
to be as effective. It also requires custom filters to easily see all
issues in the "patch submitted" state.

Is there sufficient interest to consider this for Lucene Java? (I'd
write "any interest", but since I'm interested, there's at least some.)

There are a couple of issues around the Hadoop workflow I'm aware of.
One is that once an issue is closed, it can't be reopened. As I
understand it, this is because on Hadoop, they use the Jira feature
which allows automated generation of release notes. As someone who is
responsible for tracking the changes between releases for my company,
this is actually a win for me, so I like the way Hadoop does it. But it
does add the step of needing to start a new Jira issue rather than just
reopening an old one.

The other thing I was thinking of was the case where we say "if you're
working on something, go ahead and submit a patch even if it's not
polished or you aren't sure you want it to be a candidate for the trunk.
Let others look at it." I think that's clearly a good thing to have, but
I wonder what the best way to handle it in Jira is. What state should be
used?

There are quite a few open issues in Jira. Makes me a little
uncomfortable, since in my experience, when you get really long lists of
issues that are not addressed, the effectiveness of an issue tracking
system seems to drop dramatically. Anybody else think this? I've got
time to contribute to cleanup if there's sufficient interest.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org



