[ 
https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721048#action_12721048
 ] 

Mark Miller commented on LUCENE-1628:
-------------------------------------

Looks pretty good. I'm not sure whether we should update to the new Token API 
here, or just commit and handle that in the other issue. I guess we might as 
well do it here first.

Is it better to put the raw text in the tests like that, or do you think it 
would be better to use Unicode escapes, maybe with the raw text in a comment? 
I remember running into encoding issues with such things in a past life as I 
moved source code around.
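
To make the question concrete, here is a rough sketch of what the escaped 
form could look like. The class name, the constant, and the helper method are 
all made up for illustration and are not taken from the patch. The comment 
names the letters so that the whole file stays ASCII; the raw text could go 
in that comment instead if we trust the file encoding.

```java
// Hypothetical example class, not part of the LUCENE-1628 patch.
class EscapedLiteralExample {

    // Persian word for "book": keheh, teh, alef, beh.
    // The literal uses Unicode escapes, so it survives any source encoding.
    static final String BOOK = "\u06A9\u062A\u0627\u0628";

    // Turns each char of the raw text into a six-character escape sequence,
    // which can then be pasted into a string literal in a test.
    static String toUnicodeEscapes(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            sb.append(String.format("\\u%04X", (int) raw.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(BOOK.length());
        System.out.println(toUnicodeEscapes(BOOK));
    }
}
```

The trade-off is readability: the escaped literal is harder to review, which 
is why a comment next to it helps.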

> Persian Analyzer
> ----------------
>
>                 Key: LUCENE-1628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1628
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1628.patch, LUCENE-1628.patch
>
>
> A simple Persian analyzer.
> I measured the TREC scores below with the benchmark package against 
> http://ece.ut.ac.ir/DBRG/Hamshahri/ :
> SimpleAnalyzer:
> SUMMARY
>   Search Seconds:         0.012
>   DocName Seconds:        0.020
>   Num Points:           981.015
>   Num Good Points:       33.738
>   Max Good Points:       36.185
>   Average Precision:      0.374
>   MRR:                    0.667
>   Recall:                 0.905
>   Precision At 1:         0.585
>   Precision At 2:         0.531
>   Precision At 3:         0.513
>   Precision At 4:         0.496
>   Precision At 5:         0.486
>   Precision At 6:         0.487
>   Precision At 7:         0.479
>   Precision At 8:         0.465
>   Precision At 9:         0.458
>   Precision At 10:        0.460
>   Precision At 11:        0.453
>   Precision At 12:        0.453
>   Precision At 13:        0.445
>   Precision At 14:        0.438
>   Precision At 15:        0.438
>   Precision At 16:        0.438
>   Precision At 17:        0.429
>   Precision At 18:        0.429
>   Precision At 19:        0.419
>   Precision At 20:        0.415
> PersianAnalyzer:
> SUMMARY
>   Search Seconds:         0.004
>   DocName Seconds:        0.011
>   Num Points:           987.692
>   Num Good Points:       36.123
>   Max Good Points:       36.185
>   Average Precision:      0.481
>   MRR:                    0.833
>   Recall:                 0.998
>   Precision At 1:         0.754
>   Precision At 2:         0.715
>   Precision At 3:         0.646
>   Precision At 4:         0.646
>   Precision At 5:         0.631
>   Precision At 6:         0.621
>   Precision At 7:         0.593
>   Precision At 8:         0.577
>   Precision At 9:         0.573
>   Precision At 10:        0.566
>   Precision At 11:        0.572
>   Precision At 12:        0.562
>   Precision At 13:        0.554
>   Precision At 14:        0.549
>   Precision At 15:        0.542
>   Precision At 16:        0.538
>   Precision At 17:        0.533
>   Precision At 18:        0.527
>   Precision At 19:        0.525
>   Precision At 20:        0.518
