On 03/22/2016 08:18 PM, David Cuny wrote:
> Non-developer David again:
>
> The issues of running UTAU (and UTAU-derived tools) under a Japanese
> locale have been enough to keep me from trying it out.

For European languages, eSpeak is the best option, but it will require
much more work. eSpeak can convert text to phonetic symbols for many
languages.
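For instance, a minimal sketch of that conversion in Python (assuming the
espeak binary is on PATH; the function name is just for illustration, and
the exact mnemonics printed vary between eSpeak versions):

    import subprocess

    def text_to_phonemes(text, voice="en"):
        # -q: no audio output, -x: print phoneme mnemonics,
        # -v: select the voice/language
        result = subprocess.run(
            ["espeak", "-q", "-x", "-v", voice, text],
            capture_output=True, text=True, check=True)
        return result.stdout.strip()

    print(text_to_phonemes("singing"))  # e.g. "s'ININ"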
>
>> Making the user input phonetic symbols instead of actual lyrics is
>> not a solution.
>
> Sorry, I didn't mean to propose that. I just wanted to note that a
> fallback that allowed phonetic symbols would be necessary.
>
> As to the rest, my (unofficial) thought is that it currently takes
> quite a bit of manual intervention to get English working well with the
> UTAU toolchain, whether it uses VCV or CVVC. And each approach requires
> a different set of tools to connect the samples together. It seems to
> me that there's quite a bit of risk of not coming out with something
> usable at the end.
>
> -- David
>
> On Tue, Mar 22, 2016 at 7:58 AM, syrma <[email protected]> wrote:
>
>> Thank you for your reply.
>>
>> As for the playback, I also think that singing each note the moment we
>> put it down is impossible; we need to set the lyrics, and even then,
>> the synthesis takes time. But getting it to play as in Cadencii would
>> probably be good; that is, pressing play once everything is set.
>> Cadencii takes a while to do that, though, and at some point the time
>> spent waiting for the synthesis is probably several times the time
>> spent actually editing (that said, I think a lot of optimization is
>> possible in Cadencii, so it's probably not the best example).
>>
>> Leaving the questions about dictionaries for later, a side note about
>> my struggles with v.Connect-STAND, Cadencii's synthesis engine. I have
>> finally been able to get some results out of it (by switching between
>> my Linux and Windows machines every time one of them runs into a
>> problem). The rendering is more than decent in my opinion (although it
>> depends a lot on the settings and the voicebank used, and it could
>> sound worse than e-cantorix if not used properly (okay, not that bad,
>> but still)), and I think it is an interesting tool overall (some UTAU
>> users import their UTAU projects into v.Connect-STAND to get a better
>> rendering, but it is sometimes a little tricky). However, a few points
>> hinder direct use:
>>
>> - The Windows binaries won't work unless the system is as Japanese as
>> possible, and while I don't know what is causing this yet (because I
>> am not used to compiling on Windows), it needs a fix.
>> - Encoding auto-detection is probably needed; even my Linux-built
>> version expects its input to be encoded as Shift-JIS by default (the
>> typical encoding of files created by Japanese users on Windows). It
>> supports other encodings, but the user needs to specify them.
>> - The software takes a meta-text sequence file (its own format) and
>> outputs an audio file. While I think implementing a conversion from a
>> score to a meta-text sequence would be sufficient for the first part
>> of the project (generating the audio), I believe an optimization might
>> optionally be possible. As v.Connect is based on WORLD (which
>> implements real-time singing synthesis, according to its introduction
>> page), I am wondering whether it would be possible to change the code
>> to intercept the parameters before the audio is generated and play
>> them back in real time. I have not dived far enough into v.Connect's
>> code, so if someone who has thinks I am going down a wrong and
>> completely impossible path, please do let me know.
>>
>> A very interesting point, however, is its ability to convert and use
>> UTAU voicebanks, given the great number of downloadable UTAU voices on
>> the net (let's forget for now about the mass of problems that alone
>> causes).
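On the encoding auto-detection point above: for the common cases it may
be enough to try UTF-8 first and fall back to Shift-JIS, since UTF-8 is
strict enough that Shift-JIS text rarely decodes as valid UTF-8. A rough,
untested sketch (the function name is illustrative; a real implementation
might use a detector library such as chardet instead):

    def read_sequence_file(path):
        # Try the strict encoding first; fall back to Shift-JIS (cp932),
        # the usual encoding of files made on Japanese Windows systems.
        with open(path, "rb") as f:
            data = f.read()
        for encoding in ("utf-8", "cp932"):
            try:
                return data.decode(encoding)
            except UnicodeDecodeError:
                continue
        raise ValueError("could not detect the encoding of %s" % path)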
>> While looking for ways to use English with UTAU voices, I came across,
>> among others, this page: http://utau.wiki/cv-vc (see also:
>> utau.wiki/tutorials:cvvc-english-tutorial-by-mystsaphyr ). This seems
>> to be popular enough that a lot of UTAUloids use this method to
>> simulate non-Japanese pronunciation. Namine Ritsu, a free voice for
>> v.Connect-STAND (and the most popular one), also has recordings of
>> this kind, although the way English is rendered is far from perfect,
>> and accents are left entirely for the user to simulate. There are also
>> (non-open-source) plugins that can convert lyrics (or rather sequence
>> files) from CVVC to VCV (another style used in UTAU voicebanks). Even
>> though this lets the user fetch and add voice sets from the internet,
>> I can easily think of a few issues one can come across:
>>
>> - Making the user input phonetic symbols instead of actual lyrics is
>> not a solution. I think it may be possible to convert lyrics to eSpeak
>> phonemes and implement the remaining conversion step (which would
>> depend on the voice). That brings us to another set of problems: the
>> user would need to supply both the word and its hyphenation. And even
>> then, other problems are bound to happen, either because the word
>> isn't in the dictionary or because the sound isn't available. In the
>> first case, the user may need to provide the pronunciation (for a
>> proper noun, for example). Besides this, should we let the user modify
>> the automatically generated pronunciation, to simulate an accent or to
>> make something sound more natural?
>>
>> - Encoding problems, always. Japanese on Windows is unpredictably
>> tricky to deal with.
>>
>> - Voicebanks are usually recorded for one specific language. I could
>> be wrong, but for now I don't see how we could detect the language
>> unless the user specifies it. Also, some of the Japanese voicebanks
>> are only compatible with either romaji or kana (we could use kakasi to
>> convert either the lyrics or the voicebank).
>>
>> Anyway, I don't think one summer's worth of work would be enough to
>> even think about all the issues (everything is so much more
>> complicated than it first seems). The question is: how much would make
>> an acceptable project?
>>
>> The project I have in mind for now would be something like the
>> following:
>>
>> - As a first step, taking care of the usability issues of
>> v.Connect-STAND, or ideally turning it into a usable library.
>> - Implementing the generation of meta-text sequences (it would be
>> interesting to see how Cadencii, the open-source C++/Qt editor, does
>> it). This should include the processing of whatever settings we have
>> (including phonemes), as this kind of file should provide all the
>> information needed for synthesis.
>> - Making a MuseScore plugin out of the two aforementioned items. This
>> would additionally include:
>>   - the front-end (collecting settings)
>>   - the playback function
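On the romaji/kana compatibility point above: kakasi is scriptable from
the command line, so the conversion could be wrapped like this (a sketch
only; it assumes a kakasi build that accepts UTF-8 I/O, and the
-Ha/-Ka/-Ja flags convert hiragana/katakana/kanji to romaji):

    import subprocess

    def kana_to_romaji(text):
        # Pipe the lyrics through kakasi and read back the romaji.
        result = subprocess.run(
            ["kakasi", "-Ha", "-Ka", "-Ja", "-i", "utf8", "-o", "utf8"],
            input=text, capture_output=True, text=True, check=True)
        return result.stdout.strip()

    print(kana_to_romaji("らりるれろ"))  # -> rarirurero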
>> Though I don't know if this is relevant to the current discussion (or
>> at all): while looking for good free voice data, I found Namine
>> Ritsu's license very unclear (the site the wiki pages link to for the
>> terms of use doesn't exist anymore). There is a separation between the
>> character (visual art, profile, ...) and the voice resources. I
>> suspect from the contradictory official information that the license
>> has changed over time. The character itself seems to be the property
>> of canon, but there don't seem to be any restrictions on the use of
>> the voices. In addition, this voicebank
>> (http://hal-the-cat.music.coocan.jp/ritsu_e.html) says it is released
>> under the terms of the GPLv3. I assume at least this voicebank is safe
>> enough.
>> [Unclear official material:
>> - http://www.canon-voice.com/english/kiyaku.html (the English says
>> something very unclear about the character, but the voice is free)
>> - http://canon-voice.com/ritsu.html ]
>>
>> So the immediate questions are:
>> - Is this a realistic and/or acceptable project?
>> - I am not aware of the MuseScore plugin rules, so is such an approach
>> alright? If not, what is the better way?
>> - I am not sure where to integrate the second part, but I think the
>> part integrated into MuseScore should be as general as possible, to
>> gradually add support for other tools.
>>
>> Sorry for the long post. Please let me know your opinion, and whether
>> I am analyzing things wrong!

--
Sent from my Libreboot X200
