Hi Karl, I think shingle is designed to make the phase search faster, it'll generate a lot of "seemed like" phase by pos only and completely disregard the meaning, that's not good enough.
Regards, Andrew On Tue, Oct 6, 2009 at 11:51 PM, Karl Wettin <[email protected]> wrote: > Hi Andrew, > > I think you are looking for the shingle package in contrib/analyzers. > > > karl > > 6 okt 2009 kl. 13.42 skrev Andrew Zhang: > > > Hi guys, >> >> The requirement is very simple here, e.g. for this sentence, 'The NBA >> formally announced its new *social media* guidelines Wednesday', I want >> to >> treat '*social media*' as a whole phase term. The default english >> analyzers >> came with lucene all deal with single word, so it you want to get the most >> frequent terms, *social *and *media* are separated, and each of them can't >> represent a good meaning as *social media*, right? >> >> I know there's a way built on some phase dictionary, and try to match the >> phase already there, very like the way to do with chinese language, but is >> there an open source solution for english, I mean I don't want to build a >> phase dictionary myself, and I also want a smart way, which can "discover" >> the phase automatically. I got 2 millions docs analyzered the norma way, >> all >> single terms, which I can use as a base source, and it's possible to find >> that *social media *came together frequently, but I really don't know >> what's >> the reverse way. >> >> I tried to find some phase analyzers, but no luck. so any advices? >> >> Regards, >> Andrew >> -- >> Simple is best >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > > -- Simple is best
