[ https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788731#action_12788731 ]
Grant Ingersoll commented on LUCENE-1377:
-----------------------------------------

(kind of rambling, but...)

We probably should also include Nutch. I think they have their own analyzers too.

While I think it is reasonable to consolidate Analyzers, it is a slippery slope (next up: function queries, then faceting, then the schema, ZooKeeper integration, etc. Solr users love function queries, and they get a lot of love from both users and devs, while Lucene's function queries languish and were never brought over to Solr b/c no one did the work and Solr has moved on as well). One could easily make the case that all of Solr should be a part of Lucene, or that all the guts of Solr other than the "scaffolding" part should be pulled into Lucene, at which point Solr simply becomes the Lucene search server.

At the same time, you could make the case that Solr could be its own TLP: Solr often has different goals from Lucene, not to mention different release cycles, etc. (For instance, we don't ask Compass, DBSight, et al. to donate back their analyzers, right?) Besides, there are many things that are just easier to do in Solr b/c it provides the framework around Lucene that _ALL OF US HAVE BUILT_ in one form or another over the years.

That being said, Solr and Lucene have a special relationship in that they both fly under the Lucene flag and many Solr committers are also Lucene committers, so we should try to coordinate a bit more, while maintaining independence.

As I've said before, though, it often seems like a one-way street for those Lucene committers who are not Solr users. Namely, they grab stuff from Solr and put it into Lucene, but then they don't make Solr "whole" again (function queries are "exhibit A").
At the same time, when I see talk of Lucene adding schema-like features or other stuff that is already in Solr and that I don't think belongs in Lucene, I think "why not just use Solr", yet the Lucene community at times seems intent on duplication, too, so the knife cuts both ways, as they say.

At any rate, the reason I'm pro Analyzer consolidation (I could even see it being a standalone sub-project) is that Analyzers are useful in their own right outside of search altogether. For instance, Mahout currently has a dep. on both core and contrib/analyzers for the sole purpose of using the Analyzers (well, we also have a utility dependency on Lucene core to create feature vectors from a Lucene index, but the core has a dep on Analyzers).

So, perhaps, the PMC and broader community should take this up on general@ and we should create an Analyzers project where all Lucene ecosystem committers have commit rights, _WITH THE VERY RESTRICTIVE_ approach that changes need to be tested across projects. Of course, this will, as Yonik points out, become a bottleneck for each project, and it is far from perfect too. However, we could still allow projects to create their own analyzers, which then get promoted when deemed useful by others.

On the other hand, I am kind of not thrilled about a whole other subproject with its own lists, JIRA, etc. just for Analyzers. Maybe we could just have separate SVN (so that we can change commit access, builds, etc.) and java-dev is still the place for discussion. Then each project could have its dependency on that code as needed and the Analyzers code could be released separately, etc. So, pretty much a full project, but without the overhead.

> Add HTMLStripReader and WordDelimiterFilter from SOLR
> -----------------------------------------------------
>
>                 Key: LUCENE-1377
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1377
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 2.3.2
>            Reporter: Jason Rutherglen
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> SOLR has two classes HTMLStripReader and WordDelimiterFilter which are very
> useful for a wide variety of use cases. It would be good to place them into
> core Lucene.