Hi Guys,

>> I vote for reverting this patch, unless there is an overall consensus
>> among Nutch developers that it's ok to keep it as it is - on one hand
>> considering the added functionality and simplification of Nutch code,
>> and on the other hand considering the (lack of) maturity of Tika.
>
> I agree with Andrzej here. I would have waited a bit more before rushing
> into this. Because at this point (where no Tika releases have been made)
> it might (even though it does not look like it right now) even be
> possible that the project will be retired without any releases at all.

I'm not out to beat a dead horse here, but the thought comes to mind: what about the vitality of the code as it exists within the Nutch code base? When was the last time anybody at all worked on the mime system? It was pioneered by Jerome, but he's been largely inactive as a committer for more than a year now, and it doesn't look like that's going to change. I ported what was largely Nutch's mime system, with Jerome's improvements, to Tika, where the code is actively being developed by me (and vetted by the other *active* members of the team), in contrast to Nutch. As a developer, I don't want to maintain the code in both places, but I'm willing to maintain the Nutch use of and interface to Tika, which means that Nutch will inherit the benefits of this approach.

Having been a member of the Nutch community for almost 2 years now, I can't tell you how many times people have asked for Nutch to be able to reliably detect XML content. This is reified in the form of a number of different JIRA issues that reference that deficiency and that are, for all intents and purposes, not being worked on at all.

I'm all for following the process, and so forth, but at the same time, I think the Nutch community needs to take a serious look at itself with regard to the "sacred" nature of the trunk, which we currently treat with a large amount of sensitivity. On other projects (of course, I'm biased, but I use my own work, Tika for instance, as an example), the trunk is not something that is expected to be "always working". It is regularly expected to be somewhere bugs can exist, and where they can be fixed before a release is made. That's not the way it feels on this project, and quite honestly I think it stymies progress.

Finally, there is precedent for what I did with the Tika patch and its making its way into Nutch.
If I recall, something very similar happened when Hadoop came along: NDFS (as it was called at the time) and MapReduce made their way into an external library, and Nutch was made to rely on that (at the time) in-development library. This makes sense, because the folks working on Hadoop were actively working on updates to the portion of the code that Nutch relied upon, and all the developers that were interested in that portion of the code started developing in that arena. I'm not comparing Hadoop to Tika, but certainly there are some similarities here.

-Chris

______________________________________________
Chris Mattmann, Ph.D.
[EMAIL PROTECTED]
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory
Pasadena, CA
Office: 171-266B
Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology.