[ 
https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718275#action_12718275
 ] 

Mark Miller commented on LUCENE-1628:
-------------------------------------

Okay, I see that the stopword list for Arabic was committed by Grant under the 
BSD license. I'll take that as an "it's okay" unless anyone speaks up.

Thanks for all these great Analyzers, Robert.

> Persian Analyzer
> ----------------
>
>                 Key: LUCENE-1628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1628
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1628.patch, LUCENE-1628.patch
>
>
> A simple Persian analyzer.
> I measured TREC scores with the benchmark package against 
> http://ece.ut.ac.ir/DBRG/Hamshahri/ (results below):
> SimpleAnalyzer:
> SUMMARY
>   Search Seconds:         0.012
>   DocName Seconds:        0.020
>   Num Points:           981.015
>   Num Good Points:       33.738
>   Max Good Points:       36.185
>   Average Precision:      0.374
>   MRR:                    0.667
>   Recall:                 0.905
>   Precision At 1:         0.585
>   Precision At 2:         0.531
>   Precision At 3:         0.513
>   Precision At 4:         0.496
>   Precision At 5:         0.486
>   Precision At 6:         0.487
>   Precision At 7:         0.479
>   Precision At 8:         0.465
>   Precision At 9:         0.458
>   Precision At 10:        0.460
>   Precision At 11:        0.453
>   Precision At 12:        0.453
>   Precision At 13:        0.445
>   Precision At 14:        0.438
>   Precision At 15:        0.438
>   Precision At 16:        0.438
>   Precision At 17:        0.429
>   Precision At 18:        0.429
>   Precision At 19:        0.419
>   Precision At 20:        0.415
> PersianAnalyzer:
> SUMMARY
>   Search Seconds:         0.004
>   DocName Seconds:        0.011
>   Num Points:           987.692
>   Num Good Points:       36.123
>   Max Good Points:       36.185
>   Average Precision:      0.481
>   MRR:                    0.833
>   Recall:                 0.998
>   Precision At 1:         0.754
>   Precision At 2:         0.715
>   Precision At 3:         0.646
>   Precision At 4:         0.646
>   Precision At 5:         0.631
>   Precision At 6:         0.621
>   Precision At 7:         0.593
>   Precision At 8:         0.577
>   Precision At 9:         0.573
>   Precision At 10:        0.566
>   Precision At 11:        0.572
>   Precision At 12:        0.562
>   Precision At 13:        0.554
>   Precision At 14:        0.549
>   Precision At 15:        0.542
>   Precision At 16:        0.538
>   Precision At 17:        0.533
>   Precision At 18:        0.527
>   Precision At 19:        0.525
>   Precision At 20:        0.518
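
For anyone comparing the two summaries above: the "Precision At k" and "MRR" rows can be read through the standard IR definitions, sketched below. This is only an illustration under that assumption; the function names and the sample document ids are hypothetical, and the benchmark package's own implementation may differ in detail.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def reciprocal_rank(ranked_doc_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_over_queries(per_query_values):
    """Average a per-query metric; a summary row such as MRR is a mean like this."""
    values = list(per_query_values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical data: two queries, each with a ranked result list and a
# set of judged-relevant document ids.
runs = [
    (["d3", "d1", "d7"], {"d1", "d2"}),
    (["d5", "d9", "d4"], {"d5"}),
]
mrr = mean_over_queries(reciprocal_rank(r, rel) for r, rel in runs)
p_at_2 = mean_over_queries(precision_at_k(r, rel, 2) for r, rel in runs)
```

Under these definitions, a higher "Precision At 1" together with a higher MRR means the first relevant document tends to appear nearer the top of the ranking, which matches the direction of the PersianAnalyzer numbers relative to SimpleAnalyzer.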
