[jira] Updated: (LUCENE-1628) Persian Analyzer

2009-08-10 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1628:


Attachment: LUCENE-1628.patch

Implement reusableTokenStream here too.
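Background on this change: the Analyzer class in Lucene 2.9 provides reusableTokenStream(String, Reader). It lets an analyzer hand back the same token stream chain on each thread instead of allocating a new chain for every field. Below is a stdlib-only sketch of that per-thread caching pattern. The Chain class and its whitespace splitting are hypothetical stand-ins, not the patch's code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
    /** Hypothetical stand-in for a tokenizer + filter chain that can be re-pointed at new input. */
    public static final class Chain {
        private Reader input;

        Chain(Reader input) { this.input = input; }

        /** Re-targets the existing chain at new input instead of building a new one. */
        void reset(Reader newInput) { this.input = newInput; }

        /** Drains the current input and splits it on whitespace. */
        public List<String> tokens() {
            StringBuilder sb = new StringBuilder();
            try {
                int c;
                while ((c = input.read()) != -1) sb.append((char) c);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            List<String> out = new ArrayList<>();
            for (String t : sb.toString().trim().split("\\s+")) {
                if (!t.isEmpty()) out.add(t);
            }
            return out;
        }
    }

    // One cached chain per thread: chains are stateful and not safe to share across threads.
    private final ThreadLocal<Chain> previous = new ThreadLocal<>();

    /** Always builds a fresh chain. */
    public Chain tokenStream(Reader reader) { return new Chain(reader); }

    /** Returns this thread's cached chain, reset onto the new input, creating it on first use. */
    public Chain reusableTokenStream(Reader reader) {
        Chain chain = previous.get();
        if (chain == null) {
            chain = new Chain(reader);
            previous.set(chain);
        } else {
            chain.reset(reader);
        }
        return chain;
    }
}
```

The design trade-off is allocation versus safety. Reuse avoids building objects per field, but callers must finish consuming one stream before asking for the next on the same thread.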

 Persian Analyzer
 

 Key: LUCENE-1628
 URL: https://issues.apache.org/jira/browse/LUCENE-1628
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, 
 LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.txt


 A simple Persian analyzer.
 I measured TREC scores with the benchmark package against the Hamshahri
 collection (http://ece.ut.ac.ir/DBRG/Hamshahri/); results:
 Metric             SimpleAnalyzer  PersianAnalyzer
 Search Seconds     0.012           0.004
 DocName Seconds    0.020           0.011
 Num Points         981.015         987.692
 Num Good Points    33.738          36.123
 Max Good Points    36.185          36.185
 Average Precision  0.374           0.481
 MRR                0.667           0.833
 Recall             0.905           0.998
 Precision At 1     0.585           0.754
 Precision At 2     0.531           0.715
 Precision At 3     0.513           0.646
 Precision At 4     0.496           0.646
 Precision At 5     0.486           0.631
 Precision At 6     0.487           0.621
 Precision At 7     0.479           0.593
 Precision At 8     0.465           0.577
 Precision At 9     0.458           0.573
 Precision At 10    0.460           0.566
 Precision At 11    0.453           0.572
 Precision At 12    0.453           0.562
 Precision At 13    0.445           0.554
 Precision At 14    0.438           0.549
 Precision At 15    0.438           0.542
 Precision At 16    0.438           0.538
 Precision At 17    0.429           0.533
 Precision At 18    0.429           0.527
 Precision At 19    0.419           0.525
 Precision At 20    0.415           0.518
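To help read the quality rows of the table, here is a sketch of the standard per-query TREC-style definitions of precision at k, reciprocal rank, average precision and recall. The reported Average Precision and MRR are means of these values over all queries. This uses textbook formulas and is an assumption about what the columns measure. The benchmark package's exact bookkeeping (for example "Num Points" and "Good Points") may differ and is not reproduced here.

```java
/** Per-query ranking metrics; relevant[i] says whether the result at rank i + 1 is relevant. */
public final class RankMetrics {
    private RankMetrics() {}

    /** Precision at cutoff k: relevant results among the top k, divided by k. */
    public static double precisionAt(boolean[] relevant, int k) {
        int hits = 0;
        for (int i = 0; i < k && i < relevant.length; i++) {
            if (relevant[i]) hits++;
        }
        return (double) hits / k;
    }

    /** 1 / rank of the first relevant result, or 0 if none was retrieved. */
    public static double reciprocalRank(boolean[] relevant) {
        for (int i = 0; i < relevant.length; i++) {
            if (relevant[i]) return 1.0 / (i + 1);
        }
        return 0.0;
    }

    /** Sum of precision at each relevant rank, divided by the query's total relevant docs. */
    public static double averagePrecision(boolean[] relevant, int totalRelevant) {
        if (totalRelevant == 0) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < relevant.length; i++) {
            if (relevant[i]) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / totalRelevant;
    }

    /** Fraction of the query's relevant documents that were retrieved at all. */
    public static double recall(boolean[] relevant, int totalRelevant) {
        if (totalRelevant == 0) return 0.0;
        int hits = 0;
        for (boolean r : relevant) {
            if (r) hits++;
        }
        return (double) hits / totalRelevant;
    }

    /** Averages a per-query metric over all queries (e.g. MAP, MRR). */
    public static double mean(double[] perQuery) {
        if (perQuery.length == 0) return 0.0;
        double s = 0.0;
        for (double v : perQuery) s += v;
        return s / perQuery.length;
    }
}
```

Take a ranking with relevance pattern {no, yes, no, yes} and 3 relevant documents in total. P@2 is 0.5, reciprocal rank is 0.5, average precision is (1/2 + 2/4) / 3 = 1/3, and recall is 2/3.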

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1628) Persian Analyzer

2009-07-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1628:


Attachment: LUCENE-1628.patch

Add LowerCaseFilter, consistent with the Arabic analyzer; it's user-friendly for
the common case where there is also some English text.
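Persian script has no letter case, so lowercasing leaves Persian tokens unchanged and only folds Latin ones. That makes it cheap to add for mixed Persian/English text. The sketch below lowercases character by character after a plain whitespace split. The whitespace split is a simplifying assumption and does not reproduce the analyzer's actual tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public final class LowerCaseSketch {
    private LowerCaseSketch() {}

    /** Splits on whitespace, then lowercases each char of each token. */
    public static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            char[] buf = token.toCharArray();
            for (int i = 0; i < buf.length; i++) {
                // Arabic-script letters have no case mapping, so they pass through unchanged.
                buf[i] = Character.toLowerCase(buf[i]);
            }
            out.add(new String(buf));
        }
        return out;
    }
}
```

For example, "Lucene", a Persian word, and "SEARCH" come out as "lucene", the same Persian word, and "search". A query typed in either case therefore matches English terms in the index.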


 Persian Analyzer
 

 Key: LUCENE-1628
 URL: https://issues.apache.org/jira/browse/LUCENE-1628
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, 
 LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.txt





[jira] Updated: (LUCENE-1628) Persian Analyzer

2009-07-24 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1628:


Attachment: LUCENE-1628.patch

Move analyzers/fa to analyzers/common/fa.
Make PersianNormalizationFilter final.
Switch to the new TokenStream API.
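The following sketch shows the kind of character folding a Persian normalization step performs. It maps Persian-specific letter forms onto their Arabic equivalents so that variant spellings index to the same term. The mapping table here (Farsi yeh and yeh barree to yeh, keheh to kaf, heh goal and heh-with-yeh to heh, dropping hamza above) is an assumption for illustration and may not match the patch's exact rules. Following the change above, the sketch class is final.

```java
public final class PersianNormalizerSketch {
    // Assumed mapping table (illustrative, not taken from the patch).
    static final char YEH = '\u064A';         // ARABIC LETTER YEH
    static final char FARSI_YEH = '\u06CC';   // ARABIC LETTER FARSI YEH
    static final char YEH_BARREE = '\u06D2';  // ARABIC LETTER YEH BARREE
    static final char KEHEH = '\u06A9';       // ARABIC LETTER KEHEH
    static final char KAF = '\u0643';         // ARABIC LETTER KAF
    static final char HAMZA_ABOVE = '\u0654'; // ARABIC HAMZA ABOVE (combining mark)
    static final char HEH_YEH = '\u06C0';     // ARABIC LETTER HEH WITH YEH ABOVE
    static final char HEH_GOAL = '\u06C1';    // ARABIC LETTER HEH GOAL
    static final char HEH = '\u0647';         // ARABIC LETTER HEH

    private PersianNormalizerSketch() {}

    /** Normalizes s[0..len) in place and returns the new length (deletions shrink it). */
    public static int normalize(char[] s, int len) {
        int out = 0;
        for (int i = 0; i < len; i++) {
            char c = s[i];
            switch (c) {
                case FARSI_YEH:
                case YEH_BARREE:
                    c = YEH;
                    break;
                case KEHEH:
                    c = KAF;
                    break;
                case HEH_YEH:
                case HEH_GOAL:
                    c = HEH;
                    break;
                case HAMZA_ABOVE:
                    continue; // drop the mark entirely
                default:
                    break;
            }
            s[out++] = c;
        }
        return out;
    }

    /** Convenience wrapper over the in-place version. */
    public static String normalize(String text) {
        char[] buf = text.toCharArray();
        return new String(buf, 0, normalize(buf, buf.length));
    }
}
```

Working in place on a char buffer and returning a new length mirrors how term-buffer filters typically avoid allocating a String per token.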


 Persian Analyzer
 

 Key: LUCENE-1628
 URL: https://issues.apache.org/jira/browse/LUCENE-1628
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, 
 LUCENE-1628.patch, LUCENE-1628.txt





[jira] Updated: (LUCENE-1628) Persian Analyzer

2009-05-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1628:


Attachment: LUCENE-1628.patch

Moved the Farsi stopwords file to the resources folder and added a test to ensure it loads.
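Below is a sketch of loading a bundled stopword list from the classpath, which is what keeping the file under resources enables. It assumes one UTF-8 word per line, with blank lines and lines starting with '#' ignored. Those file-format details are assumptions, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public final class StopwordLoaderSketch {
    private StopwordLoaderSketch() {}

    /** One word per line; trims whitespace, skips blank lines and '#' comment lines. */
    public static Set<String> load(Reader reader) {
        try (BufferedReader br = new BufferedReader(reader)) {
            return br.lines()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                    .collect(Collectors.toCollection(HashSet::new));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads a UTF-8 word list stored next to the given class; fails loudly if it is missing. */
    public static Set<String> loadResource(Class<?> anchor, String name) {
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalStateException("stopword resource not found on classpath: " + name);
        }
        return load(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```

A load test in this style could call loadResource with the analyzer class and a hypothetical name such as "stopwords.txt", then assert that the returned set is non-empty. That catches packaging mistakes where the file is left out of the jar.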


 Persian Analyzer
 

 Key: LUCENE-1628
 URL: https://issues.apache.org/jira/browse/LUCENE-1628
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1628.patch, LUCENE-1628.patch





[jira] Updated: (LUCENE-1628) Persian Analyzer

2009-05-04 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1628:
---

Fix Version/s: 2.9

 Persian Analyzer
 

 Key: LUCENE-1628
 URL: https://issues.apache.org/jira/browse/LUCENE-1628
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1628.patch





[jira] Updated: (LUCENE-1628) Persian Analyzer

2009-05-03 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1628:


Attachment: LUCENE-1628.patch

Patch file.

 Persian Analyzer
 

 Key: LUCENE-1628
 URL: https://issues.apache.org/jira/browse/LUCENE-1628
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-1628.patch

