[jira] Updated: (LUCENE-1628) Persian Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1628:

    Attachment: LUCENE-1628.patch

Implement reusableTokenStream here too.

Persian Analyzer

    Key: LUCENE-1628
    URL: https://issues.apache.org/jira/browse/LUCENE-1628
    Project: Lucene - Java
    Issue Type: New Feature
    Components: contrib/analyzers
    Reporter: Robert Muir
    Assignee: Robert Muir
    Priority: Minor
    Fix For: 2.9
    Attachments: LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.txt

A simple Persian analyzer.
I measured TREC scores, shown below, with the benchmark package against http://ece.ut.ac.ir/DBRG/Hamshahri/ :

SimpleAnalyzer:
SUMMARY
  Search Seconds:       0.012
  DocName Seconds:      0.020
  Num Points:         981.015
  Num Good Points:     33.738
  Max Good Points:     36.185
  Average Precision:    0.374
  MRR:                  0.667
  Recall:               0.905
  Precision At 1:       0.585
  Precision At 2:       0.531
  Precision At 3:       0.513
  Precision At 4:       0.496
  Precision At 5:       0.486
  Precision At 6:       0.487
  Precision At 7:       0.479
  Precision At 8:       0.465
  Precision At 9:       0.458
  Precision At 10:      0.460
  Precision At 11:      0.453
  Precision At 12:      0.453
  Precision At 13:      0.445
  Precision At 14:      0.438
  Precision At 15:      0.438
  Precision At 16:      0.438
  Precision At 17:      0.429
  Precision At 18:      0.429
  Precision At 19:      0.419
  Precision At 20:      0.415

PersianAnalyzer:
SUMMARY
  Search Seconds:       0.004
  DocName Seconds:      0.011
  Num Points:         987.692
  Num Good Points:     36.123
  Max Good Points:     36.185
  Average Precision:    0.481
  MRR:                  0.833
  Recall:               0.998
  Precision At 1:       0.754
  Precision At 2:       0.715
  Precision At 3:       0.646
  Precision At 4:       0.646
  Precision At 5:       0.631
  Precision At 6:       0.621
  Precision At 7:       0.593
  Precision At 8:       0.577
  Precision At 9:       0.573
  Precision At 10:      0.566
  Precision At 11:      0.572
  Precision At 12:      0.562
  Precision At 13:      0.554
  Precision At 14:      0.549
  Precision At 15:      0.542
  Precision At 16:      0.538
  Precision At 17:      0.533
  Precision At 18:      0.527
  Precision At 19:      0.525
  Precision At 20:      0.518

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
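
For readers unfamiliar with the summary figures quoted in this thread, here is a minimal sketch of how Precision At N, MRR (as a per-query reciprocal rank), Recall and Average Precision are commonly defined for a single query. These are the textbook definitions; the benchmark package may differ in details such as averaging across queries. The class name RankMetrics and the document ids are hypothetical and used only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: textbook ranking metrics for one query.
final class RankMetrics {

    /** Fraction of the top k results that are relevant (missing results count as misses). */
    public static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        int hits = 0;
        int limit = Math.min(k, ranked.size());
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    /** 1 / rank of the first relevant result, or 0 if none was returned. */
    public static double reciprocalRank(List<String> ranked, Set<String> relevant) {
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }

    /** Fraction of all relevant documents that appear anywhere in the results. */
    public static double recall(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        Set<String> found = new HashSet<>(ranked);
        found.retainAll(relevant);
        return (double) found.size() / relevant.size();
    }

    /** Mean of the precision values at each rank where a relevant result occurs. */
    public static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("d1", "d2", "d3", "d4");
        Set<String> relevant = new HashSet<>(Arrays.asList("d2", "d4", "d9"));
        System.out.println("P@2 = " + precisionAtK(ranked, relevant, 2));
        System.out.println("RR  = " + reciprocalRank(ranked, relevant));
        System.out.println("Rec = " + recall(ranked, relevant));
        System.out.println("AP  = " + averagePrecision(ranked, relevant));
    }
}
```

Under these definitions, a higher Precision At 1 for PersianAnalyzer than for SimpleAnalyzer means that a relevant document was ranked first for a larger share of the queries.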
[jira] Updated: (LUCENE-1628) Persian Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1628:

    Attachment: LUCENE-1628.patch

Add LowerCaseFilter, consistent with the Arabic analyzer. It is user-friendly for the common case where there is also some English text.

Persian Analyzer

    Key: LUCENE-1628
    URL: https://issues.apache.org/jira/browse/LUCENE-1628
    Project: Lucene - Java
    Issue Type: New Feature
    Components: contrib/analyzers
    Reporter: Robert Muir
    Assignee: Mark Miller
    Priority: Minor
    Fix For: 2.9
    Attachments: LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.txt
[jira] Updated: (LUCENE-1628) Persian Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1628:

    Attachment: LUCENE-1628.patch

- Move analyzers/fa to analyzers/common/fa.
- Make PersianNormalizationFilter final.
- Switch to the new API.

Persian Analyzer

    Key: LUCENE-1628
    URL: https://issues.apache.org/jira/browse/LUCENE-1628
    Project: Lucene - Java
    Issue Type: New Feature
    Components: contrib/analyzers
    Reporter: Robert Muir
    Assignee: Mark Miller
    Priority: Minor
    Fix For: 2.9
    Attachments: LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.patch, LUCENE-1628.txt
[jira] Updated: (LUCENE-1628) Persian Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1628:

    Attachment: LUCENE-1628.patch

Moved the Farsi stopwords file to the resources folder and added a test to ensure it loads.

Persian Analyzer

    Key: LUCENE-1628
    URL: https://issues.apache.org/jira/browse/LUCENE-1628
    Project: Lucene - Java
    Issue Type: New Feature
    Components: contrib/analyzers
    Reporter: Robert Muir
    Priority: Minor
    Fix For: 2.9
    Attachments: LUCENE-1628.patch, LUCENE-1628.patch
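
As an illustration of the kind of check described in this update, here is a minimal sketch of reading a one-word-per-line stopword list, written in plain Java without Lucene. The class name StopwordLoader, the resource name stopwords.txt and the convention that lines starting with "#" are comments are all assumptions for this sketch; the actual patch may load and test the file differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene or of the attached patch.
final class StopwordLoader {

    /**
     * Reads one stopword per line, skipping blank lines and lines that
     * start with '#' (assumed here to mark comments). Closes the reader.
     */
    public static Set<String> load(Reader reader) {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (word.isEmpty() || word.startsWith("#")) {
                    continue;
                }
                words.add(word);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

A load test along these lines could open the file from the classpath with StopwordLoader.class.getResourceAsStream("stopwords.txt"), wrap it in an InputStreamReader using StandardCharsets.UTF_8, and assert that the stream is not null and the resulting set is not empty.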
[jira] Updated: (LUCENE-1628) Persian Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1628:

    Fix Version/s: 2.9

Persian Analyzer

    Key: LUCENE-1628
    URL: https://issues.apache.org/jira/browse/LUCENE-1628
    Project: Lucene - Java
    Issue Type: New Feature
    Components: contrib/analyzers
    Reporter: Robert Muir
    Priority: Minor
    Fix For: 2.9
    Attachments: LUCENE-1628.patch
[jira] Updated: (LUCENE-1628) Persian Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1628:

    Attachment: LUCENE-1628.patch

patch file

Persian Analyzer

    Key: LUCENE-1628
    URL: https://issues.apache.org/jira/browse/LUCENE-1628
    Project: Lucene - Java
    Issue Type: New Feature
    Components: contrib/analyzers
    Reporter: Robert Muir
    Priority: Minor
    Attachments: LUCENE-1628.patch