On Tue, Jul 3, 2012 at 1:23 PM, Dave Fisher <dave2w...@comcast.net> wrote:
>
> On Jul 2, 2012, at 5:49 PM, Rob Weir wrote:
>
> > On Mon, Jul 2, 2012 at 7:18 PM, Dave Fisher <dave2w...@comcast.net> wrote:
> >> Sorry for the top post. I like where this is going. A lot of
> >> interesting ideas.
> >>
> >> I have one major concern. How do we manage the human created content as
> >> people replace and/or edit the translations? What happens when the original
> >> English (or French) page is changed? To me we are really discussing
> >> managing Markdown text. If the names of files are like:
> >>
> >> index.mdtext
> >> index.en-GB.mdtext
> >> index.fr.mdtext
> >>
>
> Prefacing my remarks with a general agreement to your ideas, but with
> exceptions that we should consider.
>
> >
> > Wouldn't you say that 99% of the website is HTML and Wiki Text today?
> > There is very little Markdown in use outside of the Podling project
> > pages.
>
> Yes, that is true. However all the markdown created will be AL2 licensed
> while the html is mostly something from the TermsOfUse that was discussed
> in a different thread.
>
> If you are concerned with a clear IP trail, and I know you are, then we
> should proceed with Markdown and select HTML.
>
> >
> > In any case, it should be possible to use Pootle for this, just as we
> > manage changing product strings and updating those.
> >
> > There are a good number of convertors for getting to/from Pootle
> > format: http://translate.sourceforge.net/wiki/toolkit/index
> >
> > Note html2po.
> >
> > I bet writing mdtext2po (and the inverse) would be possible.
> >
> > If we used Pootle for this, we'd need to define some sort of schedule,
> > since it is not really a "release" in the traditional sense. But you
> > could imagine every month or so, doing a cycle of:
> >
> > 1) html2po and mdtext2po
> >
> > 2) Load into Pootle
> >
> > 3) Volunteers translate
> >
> > 4) At specified time run po2html and po2mdtext
> >
> > 5) check in the new website files
>
> This can work for content that does not need a quick response.
> Other files like announcements and various news sources require a more
> immediate approach. This same goal is the initial goal of the Apache CMS.
> It is to eliminate extraneous process.
>
> I think that process for translations is good and you provide one that
> will work for most content.
>
> I would add that the branding and navigation (but not the footer) markdown
> / ssi's would benefit from this approach.
>
> >
> >> We'll have some type of Apache CMS magic that can handle translated SSI
> >> elements. I need to write Joe / infra-dev an email...
> >>
> >> Then if we can tie together the CMS to take translations and somehow
> >> inform either or both the human and/or the tool translators when changes
> >> occur in other languages ... svn diff can be used... assuming that...
> >>
> >
> > The issue is the average translator is not a markup (or markdown)
> > person. They use Pootle or similar tools that facilitate translation.
> > What do we need to be translator-friendly? Consistency between how
> > we translate UI and webpages might help.
>
> What we are targeting here is the NL user who notices a problem with a
> translation and wants to help. These people may not find Pootle to be any
> more usable than HTML.
>
> Let's not let a Pootle "all encompassing" process break the ability of
> committers and contributors to make ad hoc contributions. If the tool you
> describe is built then it should certainly carry some svn tags so that
> merges of updated content from Pootle don't overwrite any CMS based
> contributions that have been made in the interim. These will be merge
> conflicts.
>
> As long as your process includes (1) every time then there isn't a problem
> with that. However that won't work because the translations in Pootle are
> where the bulk of the work occurs.
>
> I think there are benefits to both the CMS and the Pootle approaches; more
> thought will need to go into the timeline and how to fully leverage the
> human element between the project's base content, NL users and L10N
> Translators. As always our goal is to allow more and more of the community
> to be able to easily contribute.
>
> Let's divide up the process as follows. I'm adding the notion of a string
> table which could be implemented as a file for each language with:
>
> string.mdtext
>     homepage: home
>
> string.fr.mdtext
>     homepage: maison
>
> string.en-GB.mdtext
>     homepage: home
>
> string.it.mdtext
>     homepage: casa
>
> (A) Apache CMS - Web Content is Edited / Website's built.
>     Some changes will be via string (sledgehammer ... ) and others by
>     content page.
>
> (My outline)
>
> (B) CMS to Pootle process
>
> > 1) html2po and mdtext2po
>
>     Keep an index of the files included in this set. Perhaps use an
>     attribute (mdtext) or metatag (html) to self identify translatable files.
>     (footer.mdtext must remain in English since it has legal
>     implications and translators are unlikely to be IP lawyers.)
>
> > 2) Load into Pootle
> >
>     Include string table changes made through CMS.
>     During load handle any differences caused by conflicts between
>     changes made in (C) since the last (D)
>
> (C) Pootle Apache Instance
>
> > 3) Volunteers translate
>
>     Committers by name and contributors from po or other files.
>
> (D) Pootle to CMS process
>
> > 4) At specified time run po2html and po2mdtext
>
>     Use the index from (B) of the files included.
>
> > 5) check in the new website files
>
>     During merge handle conflicts between changes made in (A) since (B)
>
> (A) and (C) are continuous.
>
> (B) and (D) can be done in whatever sequence and frequency make sense to
> the L10N team.
>
> Are we getting to a reasonable framework? There is certainly detail work
> about conversion in and out, but I do think this is something we can
> incrementally work towards together.
>
> Regards,
> Dave
>
> >
> >> With markdown it will be easy to have a header parameter that will
> >> signal the inclusion of an SSI detailing the machine translated page vs.
> >> human translation situation. By making it an SSI and translatable it can
> >> become something different language groups can handle in an organic way.
> >> We'll have an objective measure of the engagement of different language
> >> communities based on the number of edits, number of translators and how
> >> up to date and/or responsive they are.
> >>
> >> I think we could start by creating a test-auto.mdtext file, and using
> >> the translate.google to convert it to 100 pages. Put the scripts in the
> >> ooo-site/trunk/tools/ directory. If they are perl scripts then in
> >> ooo-site/lib/.
> >>
> >> Regards,
> >> Dave
> >>
> >> On Jul 2, 2012, at 2:43 PM, Kay Schenk wrote:
> >>
> >>> On Mon, Jul 2, 2012 at 2:27 PM, Rob Weir <robw...@apache.org> wrote:
> >>>
> >>>> On Mon, Jul 2, 2012 at 4:20 PM, Kay Schenk <kay.sch...@gmail.com> wrote:
> >>>>> On Mon, Jul 2, 2012 at 7:14 AM, Rob Weir <robw...@apache.org> wrote:
> >>>>>
> >>>>>> On Mon, Jul 2, 2012 at 9:57 AM, Donald Whytock <dwhyt...@gmail.com> wrote:
> >>>>>>> You don't have to use Google Translate for the entire site into a
> >>>>>>> given language. Better than no page at all in a given language is a
> >>>>>>
> >>>>>> True. To enable this integration requires adding markup to two
> >>>>>> places in the HTML file:
> >>>>>>
> >>>>>> 1) Load some script in the <head> section
> >>>>>>
> >>>>>> 2) Add a Google-provided <div> to wherever in the page we want the
> >>>>>> language selector drop down to be.
> >>>>>>
> >>>>>> It would be really easy to add this to a small number of selected pages.
> >>>>>>
> >>>>>> It would also be easy to add to all pages via the CMS template.
> >>>>>>
> >>>>>> What would be hard is managing this for a large number of pages, but
> >>>>>> not all pages.
> >>>>>>
> >>>>>>> page in a given language that says, "Hi there!
> >>>>>>> This is the site for Apache OpenOffice. We welcome translations
> >>>>>>> of our site into your language, and invite you to volunteer at the
> >>>>>>> following email address: <blah> Or you can submit a translation
> >>>>>>> through Google Translate, which was used to produce this page."
> >>>>>>>
> >>>>>>> Something as short as that is less likely to be garbled in
> >>>>>>> auto-translation than something technical, and it tells potential
> >>>>>>> contributors what to do to help out.
> >>>>>>>
> >>>>>>
> >>>>>> The trick would be to get people to visit that page. Unless it was on
> >>>>>> the home page.
> >>>>>>
> >>>>>> -Rob
> >>>>>>
> >>>>>>> Don
> >>>>>>
> >>>>>
> >>>>> OK, it took me a little while to weed through Google's info on this.
> >>>>>
> >>>>> A good sample can be found at:
> >>>>>
> >>>>> http://googleblog.blogspot.com/2009/09/translate-your-website-with-google.html
> >>>>>
> >>>>> Is there any possibility we could add the gadget to the OOo blogs site --
> >>>>>
> >>>>> https://blogs.apache.org/OOo/
> >>>>>
> >>>>> just for fun and see what we think?
> >>>>> This way we'd just be impacting one page and not a whole site.
> >>>>>
> >>>>
> >>>> If we want access to review and approve suggestions made by readers
> >>>> then it needs to be on a domain that we "own". This is in common with
> >>>> most Google services: you need to demonstrate that you control the
> >>>> domain, typically by adding a special META tag to the homepage. For
> >>>> *.openoffice.org this is easy, and I've already done this to enable
> >>>> Google Analytics. If we want to do the same for the blog we'd need
> >>>> the ability to insert special markup into the <head> and <body> of the
> >>>> blog template. I'm not sure whether this is possible with our Roller
> >>>> setup.
> >>>>
> >>>
> >>> oh -- well too bad. It could have been fun.
> >>>
> >>>
> >>>>
> >>>> Another way of testing this, in a quantitative way, is via what is
> >>>> called "A/B Testing".
> >>>> With this approach we define an action a
> >>>> satisfied site visitor might take, like downloading AOO 3.4. Then we
> >>>> randomly show some users the original home page (or download page or
> >>>> any other page we're testing). This is "A". We show other
> >>>> users a different version, "B". For example, B could have the
> >>>> translation enabled. Then we run this "experiment" for a period of
> >>>> time, like a week or two, tracking which version of the page has the
> >>>> higher success rate with users.
> >>>>
> >>>
> >>> hmmmm...interesting
> >>>
> >>> OK, I've looked at the rest of your post here and will think about this for
> >>> a bit.
> >>>
> >>>
> >>>>
> >>>> If the machine translated page confuses users, or makes
> >>>> them suspect the page, then the download %'s will be lower than for the
> >>>> original page. And if the translated page is helpful then the
> >>>> download numbers would be higher.
> >>>>
> >>>> You could imagine other success indicators. Pretty much anything that
> >>>> has a URL can be measured. For example, imagine we add a link, "This
> >>>> page solved my problem" to the bottom of every documentation page.
> >>>> Even though the link would just go to a "thanks" page, we could use
> >>>> that action to measure the success of translated versus untranslated
> >>>> pages.
> >>>>
> >>>> Of course, we don't need to do this all at once. But I'd recommend we
> >>>> think of ways of quantifying success. The website serves our users.
> >>>> How do we know what is working well and what isn't? How can we design
> >>>> experiments to test alternative approaches?
> >>>>
> >>>>
> >>>> Possible successes for users might be:
> >>>>
> >>>> - downloaded AOO
> >>>>
> >>>> - found answer to their question
> >>>>
> >>>> - signed up for our announcement list
> >>>>
> >>>> - entered their first bug report
> >>>>
> >>>> - signed up for one of the project lists
> >>>>
> >>>> - made first wiki contribution
> >>>>
> >>>> - followed/liked/+1'ed us on one of our social networking sites
> >>>>
> >>>> Measure, improve, repeat. Constant improvement and optimization.
> >>>>
> >>>> We can debate what will improve the website for the users. Or we can
> >>>> test and measure. A/B testing is a new option for us, a technique
> >>>> that once was used only by the largest commercial websites, but is now
> >>>> available to everyone via Google's "content experiments" support in
> >>>> Google Analytics.
> >>>>
> >>>> -Rob
> >>>>
> >>>>> I think that might be a perfect application for something like this.
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> ----------------------------------------------------------------------------------------
> >>>>> MzK
> >>>>>
> >>>>> "I would rather have a donkey that takes me there
> >>>>> than a horse that will not fare."
> >>>>> -- Portuguese proverb
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> ----------------------------------------------------------------------------------------
> >>> MzK
> >>>
> >>> "I would rather have a donkey that takes me there
> >>> than a horse that will not fare."
> >>> -- Portuguese proverb
> >>
>

More info on options...

This is an old article, but...

http://www.labnol.org/internet/google-translation-widgets/10135/

As an FYI, the translate "gadget" is still available:

http://www.gstatic.com/

It is JS based and would be a good tool to use for stuff like
"announcements" etc. if we wanted to send them out as HTML instead of text.
I guess this is what I was thinking about when I mentioned the blog
articles. This can be applied on a page by page basis. I have NO idea how
good it is though.

I have no experience with the "product" Rob originally mentioned --

https://translate.google.com/manager/

but since the setup seems to be "bulk" and the mdtext pages actually do get
translated to HTML before display, well, I would imagine the licensing
would get translated also?

So, Rob, in summary, I imagine you were thinking of this for the "English"
portions of our existing site(s) -- NOT the NL areas, correct? We would need
to find out how "tailorable" this is -- maybe we could exclude areas.

--
----------------------------------------------------------------------------------------
MzK

"I would rather have a donkey that takes me there
 than a horse that will not fare."
                          -- Portuguese proverb
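
For reference, here is a minimal sketch of the per-language string table quoted earlier in the thread (string.mdtext, string.fr.mdtext, and so on). It is an illustration only. The in-memory dictionary, the function names `table_name` and `lookup`, and the fallback order (exact language, then base language, then the default table) are all assumptions and were not specified in the discussion.

```python
# Hypothetical in-memory stand-in for the per-language string files
# named in the thread. A real setup would parse the .mdtext files.
STRING_TABLES = {
    "": {"homepage": "home"},        # string.mdtext (default)
    "fr": {"homepage": "maison"},    # string.fr.mdtext
    "en-GB": {"homepage": "home"},   # string.en-GB.mdtext
    "it": {"homepage": "casa"},      # string.it.mdtext
}


def table_name(lang: str) -> str:
    """Return the file name for a language code, following the thread's naming."""
    return f"string.{lang}.mdtext" if lang else "string.mdtext"


def lookup(key: str, lang: str, tables=STRING_TABLES) -> str:
    """Look up a string, falling back from e.g. 'fr-CA' to 'fr' to the default.

    The fallback order is an assumption made for this sketch.
    """
    candidates = [lang]
    if "-" in lang:
        candidates.append(lang.split("-", 1)[0])  # base language, e.g. "fr"
    candidates.append("")  # default table (string.mdtext)
    for candidate in candidates:
        value = tables.get(candidate, {}).get(key)
        if value is not None:
            return value
    raise KeyError(key)
```

With a fallback like this, a language that has no table yet would still render the default string rather than failing, which seems to fit the goal of letting translations arrive incrementally.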