Our East Asian Languages Librarian has approached me with a problem he wants to see solved. According to him, the North American library cataloging rules for constructing Pinyin transliterations differ from the rules used in China. As a result, native Chinese speakers have a lot of trouble searching our catalog (in his words, it is "practically unusable").

His proposal, and I think it's a good one, is that since we're re-indexing our records into Solr anyway, we could apply an algorithm at index time to convert North American Pinyin to Chinese-rules Pinyin, index both values, and thus make the catalog much more useful to an under-served population. This seems like a great suggestion to me, but before I start devoting development cycles to it I wanted to poll the community: is there a more obvious answer that I'm not seeing? Has anyone solved this already?
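To make the idea concrete, here is a rough sketch of the "index both values" step. It assumes, purely for illustration, that the difference between the two conventions is mainly one of word division: that our records separate syllables with spaces, while searchers type syllables joined into words. The function name `pinyin_index_values` is hypothetical, and the sketch is in Python only to show the logic; it is not SolrMarc code. Real word segmentation would need a dictionary, so this version just adds joined forms next to the original.

```python
def pinyin_index_values(spaced_pinyin):
    """Return the original spaced Pinyin plus joined-syllable variants.

    Assumption: the input separates syllables with spaces, and users may
    search with syllables joined into words. No dictionary is used, so
    some joined pairs will not be real words.
    """
    syllables = spaced_pinyin.lower().split()
    values = [" ".join(syllables)]  # the value as cataloged, normalized
    if len(syllables) > 1:
        # Whole string with all spaces removed.
        values.append("".join(syllables))
        # Every adjacent pair of syllables joined, skipping duplicates.
        for first, second in zip(syllables, syllables[1:]):
            pair = first + second
            if pair not in values:
                values.append(pair)
    return values


if __name__ == "__main__":
    for value in pinyin_index_values("Zhong guo li shi"):
        print(value)
```

For "Zhong guo li shi" this would index the original string alongside "zhongguolishi", "zhongguo", "guoli", and "lishi". The pair "guoli" is noise here, so the approach trades some precision for recall; whether that trade is acceptable is part of what I'd like feedback on.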
What's the right place for such a piece of code? SolrMarc seems the obvious place to me. As it has been described to me so far, this doesn't seem to be an issue that affects people outside the library realm, which makes it seem too niche and community-specific to be built into the Lucene codebase, but I could be wrong about that. Maybe it would be better as a Lucene contrib library?
So, thoughts? Anyone know more about this than I do and want to speak up?
Thanks! Bess