I have used Basistech linguistics in two products at two companies and they make high-quality software. At one point, I met with our Japanese partner, in Japan, and was able to make them comfortable with using Basistech instead of their own morphological package.
wunder

On Apr 20, 2014, at 7:16 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Have you looked at commercial offerings? At some point, it becomes an
> ROI issue. If it is becoming such a serious issue:
> http://www.basistech.com/text-analytics/rosette/base-linguistics/asian-languages/
>
> Regards,
>    Alex.
> P.s. This is a link, not a recommendation. I haven't tested either
> their quality or their pricing
>
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Mon, Apr 21, 2014 at 8:50 AM, Shawn Heisey <s...@elyograg.org> wrote:
>> On 4/20/2014 6:20 PM, Benson Margulies wrote:
>>> Could I perhaps wonder why your customer is so intent on indexing
>>> ngrams? Why not use Kuromoji and index words?
>>
>> The data is not just Japanese. There is a mixture. For text in the
>> Latin character set, StandardTokenizer and other similar things do not
>> work for us, mostly because of the way that they handle punctuation.
>> ICUTokenizer with its default rule set wouldn't work either, but as
>> you'll see below, I've got a modified ruleset for Latin.
>>
>> The following is what I currently have for my analysis. A lot of this
>> has evolved over the last few years on my other index that is primarily
>> English:
>>
>> http://apaste.info/ypy
>>
>> We may need to have a major overhaul of our analysis chain for this
>> customer. Perhaps what we've learned in the past won't apply here.
>>
>> Right now we have outputUnigrams enabled for both index and query. This
>> solves the phrase query problem but causes things to match that the
>> customer doesn't want to match.
>>
>> Thanks,
>> Shawn
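For readers following along, the outputUnigrams trade-off Shawn describes can be sketched as a Solr fieldType. This is an illustrative sketch only, not Shawn's actual schema (his full chain is at the apaste link above); the fieldType name is hypothetical, and the attribute values are the stock defaults for CJKBigramFilterFactory:

```xml
<!-- Hypothetical sketch of a mixed CJK/Latin field.
     Not Shawn's real config; shown to illustrate the trade-off discussed. -->
<fieldType name="text_cjk_mixed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- ICUTokenizer; Shawn uses a modified Latin ruleset on top of this -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- outputUnigrams="true" emits single-character tokens alongside the
         bigrams, so single-character phrase queries can match -- but those
         lone characters also match documents the customer may not want. -->
    <filter class="solr.CJKBigramFilterFactory"
            han="true" hiragana="true" katakana="true" hangul="true"
            outputUnigrams="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Benson's alternative, dictionary-based word segmentation, would swap the tokenizer for `solr.JapaneseTokenizerFactory` (Kuromoji) and drop the bigram filter entirely, but as Shawn notes, that only works if the content is actually Japanese rather than a mixture.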