[Bug 73605] No normalization for ancient greek accents in searches
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

--- Comment #9 from paolo anghileri ---
(In reply to Nik Everett from comment #8)

Thanks Nik, I'll try that approach.
As you suggested, I'll post a link to the Lucene patch here soon so that you can review it.

Thanks for your suggestions,
Paolo

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 73605] No normalization for ancient greek accents in searches
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

--- Comment #8 from Nik Everett ---
(In reply to paolo anghileri from comment #7)
> (In reply to Nik Everett from comment #6)
>
> I am not a Wikimedia expert and have not yet explored the CirrusSearch
> code. As a CirrusSearch developer, do you think this normalization has to go
> through Lucene, or could it be implemented directly in the CirrusSearch
> extension, or perhaps in its dependency Elasticsearch?
>
> If it can only be done in Lucene, I'll try adding the extra normalization
> there and propose a patch.

Try getting it in Lucene. Anything in Cirrus would be a nasty hack.
[Bug 73605] No normalization for ancient greek accents in searches
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

--- Comment #7 from paolo anghileri ---
(In reply to Nik Everett from comment #6)

I am not a Wikimedia expert and have not yet explored the CirrusSearch code. As a CirrusSearch developer, do you think this normalization has to go through Lucene, or could it be implemented directly in the CirrusSearch extension, or perhaps in its dependency Elasticsearch?

If it can only be done in Lucene, I'll try adding the extra normalization there and propose a patch.
[Bug 73605] No normalization for ancient greek accents in searches
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

--- Comment #6 from Nik Everett ---
(In reply to paolo anghileri from comment #5)

If you want to propose a change to implement it in Lucene, link it here and I'll jump over there and help. I'm not a Lucene committer, but I can certainly review it and prod a committer.

(In reply to paolo anghileri from comment #4)
> I will run local tests in the next few days. Do you have any suggestions
> about the search backend or extensions?

Use CirrusSearch. It's the search backend that we use on all of our wikis. It's better than the built-in MySQL search in just about every way. It's also the only option that lets the normalization from Lucene take effect.
[Bug 73605] No normalization for ancient greek accents in searches
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

--- Comment #5 from paolo anghileri ---
(In reply to Nik Everett from comment #2)

Thank you Nik,

I had a look at that file. I am not an experienced MediaWiki developer, but if the problem really is related to it, maybe I can help by adding the extra normalization.

Thanks,
Paolo
[Bug 73605] No normalization for ancient greek accents in searches
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

--- Comment #4 from paolo anghileri ---
(In reply to Andre Klapper from comment #1)
> Thanks for taking the time to report this!
>
> I tried the search on https://el.wikipedia.org (which uses the CirrusSearch
> extension) and αλφα finds άλφα but ἄλφα only seems to find ἄλφα.
> Which search backend/extension do you use? Which MediaWiki version is this?

As for the second part of the question: I am at a preliminary stage of this project and have not installed MediaWiki yet, so I have only tested on public MediaWiki instances so far, for instance el.wiktionary.org.

I will run local tests in the next few days. Do you have any suggestions about the search backend or extensions?

Thanks again,
Paolo
[Bug 73605] No normalization for ancient greek accents in searches
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

--- Comment #3 from paolo anghileri ---
(In reply to Andre Klapper from comment #1)
> Thanks for taking the time to report this!
>
> I tried the search on https://el.wikipedia.org (which uses the CirrusSearch
> extension) and αλφα finds άλφα but ἄλφα only seems to find ἄλφα.
> Which search backend/extension do you use? Which MediaWiki version is this?

Thank you Andre for the reply.
This is the same behaviour I found in my searches.

I need to be able to search for and retrieve Ancient Greek words both with the vowel diacritics specified (the search string άλφα retrieves άλφα, αλφα and άλφα) and without them (the search string αλφα retrieves άλφα, αλφα and άλφα).

The fact that this works for Modern Greek but not for Ancient Greek suggests to me that Ancient Greek, which uses different diacritics, is not supported.
[Bug 73605] No normalization for ancient greek accents in searches
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

Nik Everett changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |upstream
[Bug 73605] No normalization for ancient greek accents in searches
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

Nik Everett changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #2 from Nik Everett ---
Cirrus uses Elasticsearch for the analysis, which in turn uses Apache Lucene. I imagine the right place to implement this is there. It looks like https://github.com/apache/lucene-solr/blob/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/el/GreekLowerCaseFilter.java implements the normalization. I'd file a bug over there. It doesn't _look_ like adding the extra normalization would be that hard. I suppose you'd have to decide with them whether the additions should be enabled by default (so you could just add them to that file) or optional. If optional, you'd just make a new filter, I believe.

After it's released in Lucene and Elasticsearch we could enable it by default for Greek across the site, I think.
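
[Editor's illustration, not part of the original thread.] The kind of "extra normalization" discussed in comment #2 can be sketched outside Lucene. The Python below is only an approximation of the idea, not the Lucene filter API and not a proposed patch: it assumes that folding means decomposing each character (Unicode NFD), dropping every combining mark, and lowercasing. The names `fold_greek`, `build_index` and `search` are hypothetical helpers invented for this sketch.

```python
import unicodedata
from collections import defaultdict


def fold_greek(text):
    """Hypothetical helper: strip all diacritics, then lowercase."""
    # NFD splits a precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters, non-zero for marks.
    bare = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    return bare.lower()


def build_index(words):
    """Map each folded form to the set of original spellings."""
    index = defaultdict(set)
    for word in words:
        index[fold_greek(word)].add(word)
    return index


def search(index, query):
    """Fold the query the same way as the indexed words before lookup."""
    return index.get(fold_greek(query), set())


WORDS = [
    "\u03b1\u03bb\u03c6\u03b1",  # αλφα, no accent
    "\u03ac\u03bb\u03c6\u03b1",  # άλφα, monotonic (tonos)
    "\u1f04\u03bb\u03c6\u03b1",  # ἄλφα, polytonic (psili and oxia)
]
index = build_index(WORDS)

for query in WORDS:
    print(query, "->", sorted(search(index, query)))
```

Because the same folding is applied at index time and at query time, any of the three spellings retrieves all three. Whether such folding should be on by default or live in a separate optional filter is the open question raised in the comment above.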
[Bug 73605] No normalization for ancient greek accents in searches
https://bugzilla.wikimedia.org/show_bug.cgi?id=73605

Andre Klapper changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aklap...@wikimedia.org
            Summary|Ancient greek accents       |No normalization for
                   |problem in searches         |ancient greek accents in
                   |                            |searches

--- Comment #1 from Andre Klapper ---
Thanks for taking the time to report this!

I tried the search on https://el.wikipedia.org (which uses the CirrusSearch extension) and αλφα finds άλφα but ἄλφα only seems to find ἄλφα.
Which search backend/extension do you use? Which MediaWiki version is this?
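
[Editor's illustration, not part of the original thread.] One way to see why the two accented spellings in comment #1 might behave differently is to look at their code points. The sketch below assumes the precomposed forms: U+03AC for the monotonic ά and U+1F04 for the polytonic ἄ (the thread does not say which encoding the wiki pages actually use). `describe` is a hypothetical helper, and the snippet says nothing about what the search backend itself does.

```python
import unicodedata


def describe(word):
    """Return 'U+XXXX NAME' for every code point in word."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in word]


MONOTONIC = "\u03ac\u03bb\u03c6\u03b1"  # άλφα, first letter from the basic Greek block
POLYTONIC = "\u1f04\u03bb\u03c6\u03b1"  # ἄλφα, first letter from the Greek Extended block

for word in (MONOTONIC, POLYTONIC):
    # Only the first letter differs between the two spellings.
    print(word, describe(word)[0])
```

The first letters sit in different Unicode blocks, so a normalization step that only knows about the basic Greek block would plausibly fold the first spelling to αλφα and leave the second untouched.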