Thank you for your reply. As for playback, I also think that singing each note the moment it is entered is impossible; the lyrics need to be set first, and even then the synthesis takes time. But getting it to play the way Cadencii does would probably be good: press play once everything is set. Cadencii takes a while to do that, though, and at some point the time spent waiting for the synthesis is probably several times the time spent actually editing (that said, I think a lot of optimization is possible in Cadencii, so it is probably not the best example).
Leaving the questions about dictionaries for later, a side note about my struggles with v.Connect-STAND, Cadencii's synthesis engine. I have finally been able to get some results out of it (by switching between Linux and Windows whenever one of them runs into a problem). The rendering is more than decent in my opinion (although it depends a lot on the settings and on the voicebank used, and it can sound worse than e-cantorix if not used properly (okay, not that bad, but still)), and I think it is an interesting tool overall (some Utau users import their Utaus into v.Connect-STAND to get a better rendering, though it is sometimes a little tricky). However, a few points hinder direct use:

- The Windows binaries won't work unless the system is as Japanese as possible. I don't know what is causing this yet (I am not used to compiling on Windows), but it needs a fix.

- Encoding auto-detection is probably needed; even my Linux-built version expects input encoded as Shift-JIS by default (the typical encoding of files created by Japanese users on Windows). It supports other encodings, but the user has to specify them.

- The software takes a meta text sequence file (its own format) and outputs audio. I think implementing a conversion from a score to a meta text sequence would be sufficient for the first part of the project (generating the audio), but optionally an optimization might be possible: as v.Connect is based on WORLD (which implements real-time singing synthesis, according to its introduction page), I wonder whether the code could be changed to intercept the parameters before the audio is generated and play them back in real time. I have not dived far enough into v.Connect's code, so if someone who has thinks I am heading down a wrong and completely impossible path, please let me know.

A very interesting point, however, is its ability to convert and use Utau voicebanks, given the great number of downloadable Utaus on the net (let's forget for now about the mass of problems that alone causes). While looking into the possibility of using English with Utau voices, I came across, among others, this page: http://utau.wiki/cv-vc (see also: utau.wiki/tutorials:cvvc-english-tutorial-by-mystsaphyr ). This seems popular enough that a lot of utauloids use this method to simulate non-Japanese pronunciation. Namine Ritsu, a free voice for v.Connect-STAND (and the most popular one), also has recordings of this kind, although the English rendering is far from perfect and accents are entirely left to the user to simulate. There are also (non-open-source) plugins that can convert lyrics (or rather sequence files) from CVVC to VCV (another style used in Utaus). Even though this lets the user get voicebanks from the internet and add their own, I can easily think of a few issues one could run into:

- Making the user input phonetic symbols instead of actual lyrics is not a solution. It may be possible to convert lyrics to espeak phonemes and implement the remaining conversion step (which would depend on the voice); see the espeak sketch after this list. That brings another set of problems: the user would need to supply both the word and its hyphenation. And even then, some problems are bound to happen, either because the word isn't in the dictionary or because the sound isn't available in the voicebank. In the first case, the user may need to provide the pronunciation themselves (for a proper noun, for example). Besides this, should we let the user modify the pronunciation (after it is generated automatically) to simulate an accent or to make something sound more natural?

- Encoding problems, always. Japanese on Windows is unpredictably tricky to deal with; see the encoding sketch after this list.

- Voicebanks are usually recorded for one specific language. I could be wrong, but for now I don't see how we could detect the language unless the user specifies it. Also, some of the Japanese ones are only compatible with either romaji or kana (we could use kakasi to convert either the lyrics or the voicebank).
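About the espeak idea in the first item above, just to make it concrete, here is a minimal sketch of the lyric-to-phonemes step, assuming the espeak binary is installed and on the PATH. The helper name espeakPhonemes() is mine, and the mapping from espeak's phoneme mnemonics to whatever symbols a given voicebank expects (the "remaining conversion step") is not covered here.

    // Minimal sketch: ask espeak for the phoneme mnemonics of one lyric word.
    // "-q" suppresses audio output, "-x" writes phoneme mnemonics to stdout,
    // "-v" selects the voice/language.
    #include <QProcess>
    #include <QString>

    static QString espeakPhonemes(const QString& word, const QString& voice = "en")
    {
        QProcess p;
        p.start("espeak", { "-q", "-x", "-v", voice, word });
        if (!p.waitForFinished(3000))
            return QString();   // espeak not installed, or it hung
        return QString::fromLocal8Bit(p.readAllStandardOutput()).trimmed();
    }

The output (something like "h@l'oU" for "hello") would then have to go through a per-voicebank table mapping espeak phonemes to CVVC/VCV aliases, and splitting the result across the note syllables (the hyphenation problem) is exactly the part this sketch does not touch.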
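And for the encoding issue, the kind of fallback I have in mind is only a heuristic, not real charset detection (a dedicated detector such as uchardet or ICU would do a better job); decodeSequenceFile() is just an illustrative name:

    // Minimal sketch: treat the file as UTF-8 if it decodes cleanly,
    // otherwise fall back to Shift-JIS (the usual encoding of files
    // created by Japanese users on Windows).
    #include <QByteArray>
    #include <QString>
    #include <QTextCodec>

    static QString decodeSequenceFile(const QByteArray& bytes)
    {
        QTextCodec::ConverterState state;
        QTextCodec* utf8 = QTextCodec::codecForName("UTF-8");
        QString text = utf8->toUnicode(bytes.constData(), bytes.size(), &state);
        if (state.invalidChars == 0)
            return text;                     // decodes cleanly as UTF-8
        QTextCodec* sjis = QTextCodec::codecForName("Shift-JIS");
        return sjis ? sjis->toUnicode(bytes) : text;
    }

Something like this would at least remove the need to tell v.Connect-STAND the encoding by hand in the common cases.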
Anyway, I don't think one summer's worth of work would be enough to even consider all these issues (everything is so much more complicated than it first seems). The question is how much would make an acceptable project. What I have in mind for now is roughly the following:

- As a first step, taking care of the usability issues of v.Connect-STAND, or ideally turning it into a usable library.

- Implementing the generation of meta text sequences (it would be interesting to see how Cadencii, the open-source C++/Qt editor, does it). This should include the processing of whatever settings we have (including phonemes), since this kind of file is supposed to provide all the information needed for synthesis.

- Making a MuseScore plugin out of the two items above. This would additionally include:
  - the front-end (collecting settings)
  - the playback function

Though I don't know whether this is relevant to the current discussion (or at all), while looking for good free voice data I found Namine Ritsu's license very unclear (the site the wiki pages link to for the terms of use no longer exists). There is a separation between the character (visual art, profile, ...) and the voice resources. I suspect from the contradictory official information that the terms have changed over time. The character itself seems to be the property of Canon, but there do not seem to be any restrictions on the use of the voices. In addition, this voicebank (http://hal-the-cat.music.coocan.jp/ritsu_e.html) says it is released under the terms of the GPLv3, so I assume at least this one is safe enough.

[Unclear official material:
- http://www.canon-voice.com/english/kiyaku.html (the English says something very unclear about the character, but the voice is free)
- http://canon-voice.com/ritsu.html ]

So the immediate questions are:

- Is this a realistic and/or acceptable project?
- I am not aware of the MuseScore plugin rules, so is such an approach all right? If not, what would be a better way?
- I am not sure where to integrate the second part, but I think the part integrated into MuseScore should be as general as possible, so that support for other tools can be added gradually.

Sorry for the long post. Please let me know your opinion, and whether I am analyzing things wrong!
