Re: [Apertium-stuff] [GSOC] Unify the metadix formats Queries (Mikel Forcada)
On 4 March 2014 10:44, Gaurav Agrawal ergaur...@gmail.com wrote: Hello Mikel, Sorry for the late reply; I was busy with some assignment work at my university. Hi Gaurav: have you read about metadix? Have you understood how metadix dictionaries are converted to the .dix format used by Apertium compilers? -- I have read the Contributing to an existing pair information on the wiki, as well as the page about the dictionary (.dix) file, and done the basic installation. I can't find the information about the metadix files on the wiki; can you please suggest some resources? I would suggest that you check the files apertium-fr-es.fr.metadix in the apertium-fr-es package, and apertium-en-ca.en.metadix in the apertium-en-ca package, which demonstrate respectively the use of the prm and sa elements. -- Sefam Are any of the mentors around? jimregan yes, they're the ones trolling you -- Subversion Kills Productivity. Get off Subversion Make the Move to Perforce. With Perforce, you get hassle-free workflows. Merge that actually works. Faster operations. Version large binaries. Built-in WAN optimization and the freedom to use Git, Perforce or both. Make the move to Perforce. http://pubads.g.doubleclick.net/gampad/clk?id=122218951iu=/4140/ostg.clktrk ___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Re: [Apertium-stuff] Regarding Joining the mailing list
On 4 March 2014 16:29, TARUN GUPTA tarungupta@lnmiit.ac.in wrote: Sir, When I click the confirmation link it shows invalid confirmation string; please tell me what to do. The confirmation link is only valid once, so clicking it a second time will tell you that it's invalid. It's fine, you're subscribed, there's nothing to worry about. Now -- in a separate thread, please! -- you can talk to us about the project you're interested in.
Re: [Apertium-stuff] [GSOC] Unify the metadix formats Queries
On 28 February 2014 04:28, Gaurav Agrawal ergaur...@gmail.com wrote: Hello All, I am Gaurav Agrawal, an M.Tech student in Computer Science and Engineering at IIIT Hyderabad. I am very interested in machine learning and want to spend my summer contributing to open source. So GSoC is the best opportunity, and Apertium is the best organization in machine learning for me. As I have good prior knowledge of XML and Java, and basic knowledge of Python and shell scripts, I found the project Unify the metadix formats interesting and suitable for me. Thanks to #Unhammer #firespeaker #wei2912 for suggesting the wiki pages for a basic understanding of the Apertium project and for the installation. Presently, I have been working on the Coding Challenge :) I have a few queries about it: 1) For the entry: <e r="RL" lm="débil"><i>débil</i><par n="abdominal__adj"/></e> the suggested output is: (débil::débil)[abdominal_adj]; # débil But as it is RL, i.e. Right to Left, as per my understanding it should be: (débil::débil)[abdominal_adj]; # débil ? You are correct. 2) Similarly, for the conversion of: <e r="LR" lm="inapropiado"><i>inapropiad</i><par n="absolut/o__adj"/></e> the suggested output is: (inapropiad::inapropiad)[absolut/o_adj]; # inapropiado But as it is LR, i.e. Left to Right, as per my understanding it should be: (inapropiad::inapropiad)[absolut/o_adj]; # inapropiado ? You are correct. 3) The entry: <e lm="multa de tráfico"><i>multa</i><par n="abeja__n"/><p><l><b/>de<b/>tráfico</l><r><g><b/>de<b/>tráfico</g></r></p></e> should become: (multa:multa)[abeja__n](_de_tráfico)); # multa de tráfico We have both the left (l) and right (r) parts in the pair (p): <p><l><b/>de<b/>tráfico</l><r><g><b/>de<b/>tráfico</g></r></p></e> But in the conversion we only have (_de_tráfico)) and not (_de_tráfico:_de_tráfico)). Is it because both the left and right parts are equal? If yes, do we do this only when there are multiwords with inner inflection and we have the <g> tag?
How will we treat the case where the left and right parts are different, with the <g> tag? I would assume that the output of '<p><l><b/>de<b/>tráfico</l><r><g><b/>de<b/>tráfico</g></r></p></e>' should be '(_de_tráfico:#_de_tráfico)' -- i.e., that <p> is processed as usual, and that <g> inserts the '#' symbol as in the text stream. '(_de_tráfico)' is the output I would expect to see for <i><b/>de<b/>tráfico</i>.
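The rule suggested in the reply above can be made concrete with a minimal Python sketch (not the coding challenge's reference solution). It assumes, as in the examples in this thread, that <b/> is written as '_', that <g> inserts '#' before its contents, that <i> is written once, and that <p> is written as 'left:right'. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def render_side(elem):
    """Flatten an <l>, <r>, <i> or <g> element into a string.

    Assumption (taken from the examples in this thread): <b/> becomes '_',
    and a <g> group is written with a leading '#'.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "b":
            parts.append("_")
        elif child.tag == "g":
            parts.append("#" + render_side(child))
        else:
            raise ValueError("unexpected element <%s>" % child.tag)
        parts.append(child.tail or "")
    return "".join(parts)


def render_segment(xml_text):
    """Render an <i> or <p> segment in the '(left:right)' notation."""
    elem = ET.fromstring(xml_text)
    if elem.tag == "i":
        # Identical on both sides, so it is written only once.
        return "(%s)" % render_side(elem)
    if elem.tag == "p":
        # A pair is processed as usual: left side, colon, right side.
        left = render_side(elem.find("l"))
        right = render_side(elem.find("r"))
        return "(%s:%s)" % (left, right)
    raise ValueError("expected <i> or <p>, got <%s>" % elem.tag)


# Example calls for the two cases discussed above.
pair = render_segment("<p><l><b/>de<b/>tráfico</l><r><g><b/>de<b/>tráfico</g></r></p>")
ident = render_segment("<i><b/>de<b/>tráfico</i>")
```

With these inputs, pair is '(_de_tráfico:#_de_tráfico)' and ident is '(_de_tráfico)', matching the outputs expected in the reply.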
[Apertium-stuff] GSoC Proposal: Diacritic restoration (was: Re: Helping you as a gsoc applicant.)
On 28 February 2014 18:21, Alex Aruj alex.a...@gmail.com wrote: Hi group, Hi. One part of GSoC is that you will learn how to engage with an open source community; you've taken your first step. Good job! A necessary part of interacting with open source communities is to communicate via mailing lists, and there is a certain amount of etiquette involved. In this particular instance, what you have done is usually called thread hijacking -- you've sent a mail as a reply to another, but on a completely different topic. A normal thread on a mailing list is essentially a single conversation, and interjecting an email on an unrelated topic interrupts the usual flow of conversation. This is bad for us, as the result can be a confusing mix of two separate conversations, under the same heading. It's bad for you, as a GSoC applicant, because there may come a time when one of the mentors will need to refer to an earlier part of the communication with you, and will find it difficult to find your email. In future, please write a new email when writing on a new subject, rather than using 'reply'. It's a minor inconvenience to copy and paste the mailing list address, but it's more than outweighed by the later inconvenience involved if you need to refer back to an earlier email. To help you, I've changed the subject to one more appropriate to your proposal. I am considering tackling the 'restoration of diacritic marks' task. I am in the middle of my second semester of C++ and winding down my full-time job in a translation company in order to study computational issues related to language and work freelance in my pair ESEN, and possibly to develop more in PTEN. Anyway, back to GSOC: Is the priority to make the charlifter case-sensitive and for it to respect superblanks exactly as in the example in the box laid out here http://wiki.apertium.org/wiki/Superblanks? Respecting superblanks is a must: diacritic restoration must not be applied to them. 
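As a rough illustration of the superblank requirement, the Python sketch below applies a restoration function only to the text between superblanks and copies every [...] superblank through untouched. It assumes that superblanks are non-nested [...] spans and that a backslash escapes the next character; restore_text is a hypothetical stand-in for the Charlifter model.

```python
def split_superblanks(stream):
    """Split an Apertium-style stream into (is_superblank, text) pieces.

    Assumptions: superblanks are [...] spans that do not nest, and a
    backslash escapes the following character, so "\\[" in running text
    does not open a superblank.
    """
    pieces, buf, in_blank, i = [], [], False, 0
    while i < len(stream):
        ch = stream[i]
        if ch == "\\" and i + 1 < len(stream):
            buf.append(stream[i:i + 2])  # keep the escaped pair as-is
            i += 2
            continue
        if ch == "[" and not in_blank:
            if buf:
                pieces.append((False, "".join(buf)))
            buf, in_blank = [ch], True
        elif ch == "]" and in_blank:
            buf.append(ch)
            pieces.append((True, "".join(buf)))
            buf, in_blank = [], False
        else:
            buf.append(ch)
        i += 1
    if buf:
        # An unterminated "[" is treated as a superblank, to be safe.
        pieces.append((in_blank, "".join(buf)))
    return pieces


def restore_outside_superblanks(stream, restore_text):
    """Apply restore_text only to the text between superblanks."""
    return "".join(text if is_blank else restore_text(text)
                   for is_blank, text in split_superblanks(stream))
```

For example, with a toy restore_text of lambda s: s.replace("cafe", "café"), the stream 'Un cafe[<b class="cafe">]cafe[</b>].' becomes 'Un café[<b class="cafe">]café[</b>].': the attribute value inside the superblank is left alone.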
Case should definitely be _respected_: the output needs to match the input in terms of case. As for case sensitivity, Kevin Scannell is the person to ask for a definitive answer. My feeling is that case sensitivity can potentially be more accurate, but in the absence of sufficient data, case insensitive (trained on lowercase) should be the default. Should the tasks be done in this order or according to applicant interest? http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Accent_and_diacritic_restoration The task itself is to port Charlifter. Adding rule-based replacements can be done in a number of ways, but possibly the easiest (and likely most effective) way would be to do so in a similar manner to apertium-tagger -- by adding non-statistically derived probabilities (i.e., you insert a high probability for a rule-based replacement). Training models is necessary to test the system -- this is a non-code task, and cannot be a requirement. You will need to train multiple models, because testing with one will not be sufficient, but whatever you can manage of the remainder during the wrap-up time should be sufficient. Inform charlifter with target-language information... -- I think this is necessary to make this a full GSoC project (that is, I don't imagine a port of Charlifter will take 3 months by itself). Ideally, this should be started before midterms, but taking midterms as a starting point would be fine. Are the main coding skills needed for this task boolean operations, loops and file input/output, or is there something exotic I should be aware of (see next question ; ) )? Anything to help understand finite state automata in this process? Are the different nodes basically functions that are called as the diacritic mark, word, structure is analyzed? The port is the important part. There may be some 'exotic' stuff, depending on how much time is left over, but you'll just be calling functions, not implementing them.
Nothing scary :)
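The answers above on case (match the input's case, with case-insensitive lookup trained on lowercase as the default) and on rule-based replacements (inject a high, non-statistical probability, as apertium-tagger does) could fit together roughly as in this Python sketch. MODEL, RULES and RULE_PROB are hypothetical toy values, not Charlifter's or apertium-tagger's data structures. The sketch also assumes restoration maps each character to exactly one character, which holds for adding accents but not in general.

```python
# Hypothetical lowercase model: word -> {candidate: probability}.
MODEL = {
    "cafe": {"café": 0.9, "cafe": 0.1},
    "esta": {"esta": 0.6, "está": 0.4},
}

# Hypothetical toy rule: force a candidate by giving it a high
# probability that did not come from the training data.
RULES = {"esta": "está"}
RULE_PROB = 0.99


def transfer_case(original, restored):
    """Copy the case of each input character onto the restored word.

    Assumes restoration is character-for-character (e.g. e -> é); if the
    lengths differ, the restored word is returned unchanged.
    """
    if len(original) != len(restored):
        return restored
    return "".join(r.upper() if o.isupper() else r
                   for o, r in zip(original, restored))


def restore_word(word):
    """Pick the most probable candidate, then give it the input's case."""
    key = word.lower()  # case-insensitive lookup: model trained on lowercase
    candidates = dict(MODEL.get(key, {key: 1.0}))
    if key in RULES:
        candidates[RULES[key]] = RULE_PROB  # rule overrides the statistics
    best = max(candidates, key=candidates.get)
    return transfer_case(word, best)
```

Here restore_word("CAFE") gives "CAFÉ", and restore_word("Esta") gives "Está" because the toy rule outweighs the model's preference for "esta".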
Re: [Apertium-stuff] Translation memories with Apertium
On 14 February 2014 19:04, Mikel Forcada m...@dlsi.ua.es wrote: Jim, apertiumers: I have explained the task better in the Ideas page. Maybe it will become clear that this is far from being trivial. Fran and I were talking this afternoon and he can tell you about that too. My mistake, I assumed this was related to a project idea I had listed in previous years. In essence, you seem to be talking about a command-line version of Miquel's OmegaT plugin. Is that the case?
Re: [Apertium-stuff] GSoC Idea : Stepping into Web 3.0 with WebRTC+Apertium
On 14 February 2014 03:29, Aayush Kothari aayush.kothar...@gmail.com wrote: Thank you, Jimmy and Keld, for your inputs. I gave the Speech Translation wiki page a thorough read and also looked up GenieTalk. Here's what I want to add: The Web Speech API* open-sourced by Google Open source? In what way is uploading sound files to a proprietary server open source? (Open API != open source). The idea is to exploit the in-browser capability and do away with the need to download or install anything on your computer/tablet/phone. It should be as simple as just picking up a Nexus 7 or an iPad, going to a URL in Chrome and starting to talk. How do you propose to add translation? Via the web service? Please let me know if I'm overlooking a caveat Honestly, too many to list. Most importantly, your list of potential applications is still more the stuff of science fiction (I mentioned Star Trek for a reason), and I was rather hoping that if you did enough background reading, you would realise that yourself. or going beyond scope here I can only imagine that it is. Calling a bunch of JavaScript APIs to get the basic 'you speak, translate the ASR output, encode it via TTS, send over a WebRTC channel' would not take even a week to do. You haven't talked about the details, but if you're just calling APIs for translation, ASR/TTS, etc., then I imagine that you intend to spend the bulk of the project working on how to route the conversation to inject the translation in a way that's not intrusive. That's more a WebRTC project than Apertium.
Re: [Apertium-stuff] GSOC idea: improve support for non-standard input
On 13 February 2014 09:55, Francis Tyers fty...@prompsit.com wrote: You'll need to discuss licensing with Apple and get them to change the terms for their Application Shop so that GPL programs are allowed. The Free Software Foundation already did this (someone added an app based on GNU Go), and got nowhere. Good luck with that!
Re: [Apertium-stuff] GSoC Idea : Stepping into Web 3.0 with WebRTC+Apertium
On 12 February 2014 17:59, Aayush Kothari aayush.kothar...@gmail.com wrote: Hello all, Forgive me if this project already exists somewhere, but this is something I have truly wanted implemented since I learned what WebRTC is about and capable of. So the idea is this - WebRTC already gives you the ability to have in-browser audio/video chats, and there are many implementations of the same already out there. But none of them allow communication between two people who speak different languages - something that led to the demand for human and, eventually, computer-aided translators such as Google Translate (sadly not free anymore) and Apertium. With my idea, and constantly evolving web browsers, it'd be a wonderful gift for a huge chunk of internet users. Speech-to-speech translation is the dream of anyone who grew up watching Star Trek :) A basic idea of what it'd do: It would allow a Japanese guy and a French guy to speak to the browser in their native languages and display what the Japanese person actually meant in French on the French guy's screen. It also gives you the chance to speak in Japanese but be heard in French on the other side, by having the bot (such as a SpeechSynthesisUtterance instance) speak out a translated version of what you said. As well as speech synthesis, you would need speech recognition. I'd suggest that you start with http://en.wikipedia.org/wiki/Speech_translation and follow the links in the article, to familiarise yourself with what would be involved.
Re: [Apertium-stuff] GSOC idea: make an app for Iphone/Ipad
On 05/02/2014, Francis Tyers fty...@prompsit.com wrote: On Wed, 05/02/2014 at 13:56 +0100, Xavi Ivars wrote: 2014-02-05 Francis Tyers fty...@prompsit.com: A related question for Mikel: How much work would it be to make Mitzuli support HFST and VislCG in translators? Would it be enough work to make a GSOC project, do you think? It's a really sought-after feature here (in Tromsø). The main problem I see here is having (and maintaining) an Android-compatible Java port of both libraries (if the Android NDK can't be used). As far as I am aware there is/was a Java library for HFST,[1] If you know who to ask to add a licence, that would make a good first step.
Re: [Apertium-stuff] GSOC 2014 intro
On 14 January 2014 10:39, Prateek Gupta prateekgupta.3...@gmail.com wrote: Hello, Hi! I am a B.E. student from India interested in participating in GSOC 2014 with the Apertium organization. The participating organisations for GSoC 2014 have not been selected yet, and there is no guarantee that Apertium will be selected. While it seems likely, given Apertium's multi-year participation, please remember that it is not certain. Can anyone guide me to the current development of the project and its probable ideas, and help me understand the project better? http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code contains last year's ideas -- it's a good way to get an idea of the types of projects that are likely to be selected.
Re: [Apertium-stuff] apertium docs in epub format
On 10 January 2014 15:11, Francis Tyers fty...@prompsit.com wrote: Does anyone have experience with epub? How hard would it be to get the PDF documentation in epub format? (I think it's basically XHTML in a zip file). Plus an index/metadata in XML, it can be (there's another format intended for screenreaders that can be used instead of XHTML). Calibre (http://calibre-ebook.com) should handle the conversion.
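For anyone curious about the "XHTML in a zip file plus XML metadata" structure, here is a minimal sketch using Python's standard zipfile module. It assumes the usual EPUB 3 container layout (a stored, uncompressed mimetype entry first; META-INF/container.xml pointing at a package document that lists the content files and a navigation document). The OEBPS/ file names, the placeholder identifier and the date are arbitrary assumptions, and the output has not been run through a validator; Calibre remains the practical route for converting the existing documentation.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{uid}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="text" href="text.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="text"/>
  </spine>
</package>
"""

XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body>{body}</body>
</html>
"""

NAV_BODY = '<nav epub:type="toc"><ol><li><a href="text.xhtml">{title}</a></li></ol></nav>'


def build_epub(title, body_xhtml,
               uid="urn:uuid:00000000-0000-0000-0000-000000000000",
               modified="2014-01-10T00:00:00Z"):
    """Return the bytes of a one-chapter EPUB; body_xhtml is inserted as-is."""
    t = escape(title)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        deflate = zipfile.ZIP_DEFLATED
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflate)
        zf.writestr("OEBPS/content.opf",
                    PACKAGE_OPF.format(uid=escape(uid), title=t, modified=modified),
                    compress_type=deflate)
        zf.writestr("OEBPS/nav.xhtml",
                    XHTML.format(title=t, body=NAV_BODY.format(title=t)),
                    compress_type=deflate)
        zf.writestr("OEBPS/text.xhtml", XHTML.format(title=t, body=body_xhtml),
                    compress_type=deflate)
    return buf.getvalue()
```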
Re: [Apertium-stuff] [POSSIBLESPAM]: Re: [POSSIBLESPAM]: Re: Call for bids: Apertiummaintenance
On 29 December 2013 14:57, Mikel Forcada m...@dlsi.ua.es wrote: On 12/29/2013 03:50 PM, aboobacker sidheeque mk wrote: Maybe, but I am a person, not a company. BTW, I am currently trying to create an Apertium PPA for Ubuntu, https://launchpad.net/~aboobackervyd/+archive/apertium; it is not completed yet :-) I wonder what the project management committee would say, but personally I wouldn't have any problem hiring a person instead of a company. If it were up to me: if you think you can provide the services, prepare a bid which is attractive technically as well as economically, and you'll be taken into consideration. I'll second that.
Re: [Apertium-stuff] A design limitation: perfect format handling in transfer may be impossible
On 25 December 2013 09:01, Mikel Forcada m...@dlsi.ua.es wrote: On 12/24/2013 08:51 PM, Jimmy O'Regan wrote: This is a known issue (e.g., Jacob mentions it in this thread from 2009: http://sourceforge.net/mailarchive/forum.php?thread_name=20cf28cd0904300204v45f35e51i118f4d146f83748%40mail.gmail.com&forum_name=apertium-stuff) A minor quibble: Sergio's message does not address XML validity at all, which is one of the key points in my message. Many quibble returns: I said Jacob (3rd message in the thread), not Sergio :) Merry Christmas.
Re: [Apertium-stuff] A design limitation: perfect format handling in transfer may be impossible
On 24 December 2013 15:34, Mikel Forcada m...@dlsi.ua.es wrote: Hi all, As part of my work with students in the Google Code-In (notably galaxyfeeder) I have found a limitation in the current design of Apertium, as regards handling of format tags (encapsulated as superblanks) in Apertium. I would appreciate it very much if someone has time to turn this message into a proper bug report, although, as will be seen, rather than a bug, it is a design limitation. Since transfer rules (.t1x, .t2x) have to move superblanks around explicitly, it may be the case that valid HTML or XML is rendered invalid. For instance, a translated ODT file may not open, or a translated XHTML page may not be valid. This is a known issue (e.g., Jacob mentions it in this thread from 2009: http://sourceforge.net/mailarchive/forum.php?thread_name=20cf28cd0904300204v45f35e51i118f4d146f83748%40mail.gmail.com&forum_name=apertium-stuff) For instance, a rule can move around <b pos="1"/> and <b pos="2"/>. If <b pos="1"/> is <sometag> and <b pos="2"/> is </sometag>, the result is that </sometag> comes before <sometag>, leading to invalid XML or HTML. Similar validity errors may be introduced when tags are lost or repeated. Careful writing of rules may avoid this. In each rule, one can always make sure to output superblanks in the same order, and as late as possible, so that the format is preserved as much as possible. But not everything can be avoided this way. Even if superblanks inside a .t1x chunk are correctly handled, .t2x may move chunks around (with their superblanks inside, so nothing can be done about it) and lead to invalid HTML or XML. I see no easy way to solve this without a serious redesign of blank management (perhaps by keeping a standoff list of blanks outside the stream). But I think it's good to be aware of it. Matxin's format (which is already supported by some of the tools) might be a good starting point for this, but it would be best to use an XML parser for XML-based formats.
You mentioned ITS support as a wishlist item not too long ago, which would make parsing a requirement; perhaps it would be best to bundle the two together for a GSoC project.
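Mikel's example can be reproduced in a few lines. The Python sketch below models a rule's output as a list of words and <b pos="N"/> references, shows that emitting each superblank where the rule placed it puts </sometag> before <sometag>, and shows one mitigation in the spirit of "output superblanks in the same order": blank slots stay where the rule put them but are filled in source order. The data model and function names are hypothetical, the tag matcher is a rough regex rather than an XML parser, and, as noted above, nothing at this level helps when .t2x moves whole chunks.

```python
import re

# Rough tag matcher: group 1 is "/" for a closing tag, group 2 the tag
# name, group 3 is "/" for a self-closing tag. Not a full XML parser.
TAG = re.compile(r"<(/?)([A-Za-z][\w:.-]*)[^>]*?(/?)>")


def is_balanced(text):
    """Return True if the tags in text open and close in nested order."""
    stack = []
    for closing, name, self_closing in TAG.findall(text):
        if self_closing:
            continue
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


def render(rule_output, blanks, keep_source_order=False):
    """Render rule output into a string.

    rule_output: list of ("word", text) or ("blank", pos) items, where pos
    is the b pos number the rule chose.
    blanks: dict mapping pos -> superblank content.
    If keep_source_order is True, blank slots stay where the rule put them
    but are filled in ascending pos order.
    """
    in_order = iter(sorted(v for kind, v in rule_output if kind == "blank"))
    out = []
    for kind, value in rule_output:
        if kind == "blank":
            out.append(blanks[next(in_order) if keep_source_order else value])
        else:
            out.append(value)
    return "".join(out)


blanks = {1: "<sometag>", 2: "</sometag>"}
# A rule that swaps its two words and moves the blanks along with them.
swapped = [("blank", 2), ("word", "B "), ("blank", 1), ("word", "A")]
```

Here render(swapped, blanks) produces '</sometag>B <sometag>A', which is_balanced rejects, while render(swapped, blanks, keep_source_order=True) produces '<sometag>B </sometag>A'. The trade-off is that formatting may end up around a different word than in the source, in exchange for well-formed output.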
Re: [Apertium-stuff] Task ideas for Google Code In
On 7 December 2013 10:02, Gabriel Esteban Gullón yufu...@gmail.com wrote: Hi, I'm Gabriel Esteban, one of this year's GCI students. The other day I downloaded the Apertium app on my phone, and I saw a lot of things that can be improved (I also downloaded the Apertium app from SVN). In the following lines I will put forward all the things that I think can be improved. The translations of the app. The app is only translated into English and French; I propose creating some tasks for translating the app into other languages. It's easy: you only need to translate one XML file by hand. If you want, I can carry out the task of collecting all the files created by other students and uploading them to SVN. Design. I propose creating a task that asks for implementing ActionBarSherlock, a library that makes it possible to include the ActionBar on devices running Android 2.2 or later. (I don't know what license ActionBarSherlock uses; that could be another task.) https://github.com/JakeWharton/ActionBarSherlock says Apache 2.0
Re: [Apertium-stuff] Short technical question
On 2 December 2013 12:17, Yannis Haralambous yannis.haralamb...@telecom-bretagne.eu wrote: thanks for your answer! concerning the tagger training, I think there is a lack of information on the Wiki. 1) if I choose unsupervised training, there is a page describing what to do, starting with a raw text file in the given language. It is not clear whether the TSX file is generated during the unsupervised training, or whether it has to exist already It has to exist already. There is currently nothing that generates adequate TSX. Also, what do you mean by closed list? In the examples given or in the existing TSX files, I don't see why some lists of fine tags are called closed and others not... It's 'closed' if nothing new will be added to it, 'open' otherwise. Prepositions and conjunctions are (usually) closed, while nouns and verbs are typically open. 2) the supervised training method not being documented... ...I was wondering whether I can produce a TSX file by running TreeTagger on some large amount of text and then search for frequent/forbidden patterns in the tags produced? If it works, it would mean that all I need to do is to establish a match between TreeTagger tags (the coarse ones), and Apertium tags (the fine ones). TreeTagger's tags would more or less correspond to fine tags in Apertium. Final question: is there somewhere a description of the .prob file format? No, there's nothing to describe it other than the code that reads and writes it. You can get a dump of the probabilities using the prob2text tool that comes with the tagger training tools package.
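Yannis's idea of mining frequent and forbidden patterns from TreeTagger output could start with something like the Python sketch below. It counts adjacent tag pairs in tagged sentences and lists ordered pairs of reasonably common tags that never occur next to each other. The input format (one list of tags per sentence) and the min_tag_count threshold are assumptions. An unseen pair is only a candidate for a forbidden sequence and needs checking by hand, since even a large corpus misses rare but valid sequences. Mapping TreeTagger tags onto Apertium tags is a separate step that is not shown here.

```python
from collections import Counter
from itertools import product


def tag_bigrams(sentences):
    """Count adjacent tag pairs in sentences given as lists of tags."""
    counts = Counter()
    for tags in sentences:
        counts.update(zip(tags, tags[1:]))
    return counts


def candidate_forbidden(sentences, min_tag_count=1):
    """List ordered tag pairs that never occur adjacently.

    Only tags seen at least min_tag_count times are considered, so that
    pairs are not proposed merely because one of the tags is rare.
    """
    tag_counts = Counter(tag for tags in sentences for tag in tags)
    common = sorted(t for t, c in tag_counts.items() if c >= min_tag_count)
    seen = tag_bigrams(sentences)
    return [pair for pair in product(common, repeat=2) if pair not in seen]
```

For instance, with the two toy sentences [["DT", "NN", "VBZ"], ["DT", "JJ", "NN"]], ("DT", "VBZ") is proposed as a candidate while ("DT", "NN") is not.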
Re: [Apertium-stuff] Short questions
On 1 December 2013 10:13, Yannis Haralambous yannis.haralamb...@telecom-bretagne.eu wrote: Hi again, could you please help me in understanding the semantics of the structural transfer module programming language, by answering a few short questions? I'm reading the code apertium-es-ca.ca-es.t1x. In the rule called REGLA: NOM you use the macro f_enviaa. In this macro you have the following code: <equal><clip pos="1" side="sl" part="a_npant"/><lit-tag v="np.ant"/></equal> I do understand that you test the value of attribute a_npant of the class; it is defined as follows: <def-attr n="a_npant"><attr-item tags="np.ant"/></def-attr> This part of the macro is irrelevant to 'REGLA: NOM', as it will never contain 'np.ant'. If you look at the pattern: <pattern><pattern-item n="nom"/></pattern> which in section-def-cats is: <def-cat n="nom"><cat-item tags="n.*"/></def-cat> you'll see that it can never match 'np.ant'. I presume this macro is used in other rules, which can match np.ant. Is the purpose of the test that the two tags np (proper noun) and ant (anthroponym) should be present in the source token? No, it is not a test that they _should_ be present, it is a test for _if_ they are present. The macro can be explained as: if the variable 'valverb' contains the value '2' and if _either_ the sl tags contain 'np.ant' _or_ the tl lemma is in the list 'huma', then output the preposition 'a'. The second part of the 'or' makes this macro relevant to this rule: if the tl lemma is in the list 'huma'. Later in the rule you send a lexical unit to the output: <lu><clip pos="1" side="tl" part="lemh"/><clip pos="1" side="tl" part="a_nom"/><clip pos="1" side="tl" part="gen"/><clip pos="1" side="tl" part="nbr"/><clip pos="1" side="tl" part="lemq"/></lu> and I see that you send a_nom, which is <def-attr n="a_nom"><attr-item tags="n"/><attr-item tags="n.acr"/><attr-item tags="np.loc"/></def-attr> Which one of the three tags do you send to the output? How is the choice done? The tag will be either 'n' or 'n.acr', depending on what is on the tl side of the lexicon.
def-attr selections are made using regexes. (It cannot be np.loc in this case, as that is not matched by the rule.) Furthermore, you separate lemh and lemq, but in the rule there has been no segmentation of the lemma; where does the segmentation come from? lemh and lemq (and lem, whole, and tags) are predefined by transfer. In this case, with a multiword with inner inflection, e.g. 'tener<vblex># en cuenta', lem will contain 'tener# en cuenta', lemh will contain 'tener' and lemq will contain ' en cuenta'. This is mostly used for verbs with enclitic pronouns, which need to be placed between lemh and lemq. Another question: in the same rule, to decide whether you are going to apply f_concord1 (which checks gender and number and sets variables genero and numero) or f_enviaa (which sends an 'a' only if the variable valverb==2 or if the token is an anthroponymic proper noun), you check whether the lemma is equal to pas in the singular number. I looked in the dictionary and pas means step. I was wondering how come this word pas (in the singular) serves to detect anthroponymic proper nouns? It doesn't. At all. The macros are skipped in the single case of 'pas' -- neither applies. Finally, on line 3388 starts rule DETERMINANT NOM; this rule uses two tokens, the determinant and the noun: <pattern><pattern-item n="det"/><pattern-item n="nom"/></pattern> that makes two tokens. But on line 3404 I see the following code: <test><in caseless="yes"><clip pos="3" side="sl" part="lem"/><list n="mesos"/></in></test> with the purpose of checking whether the noun is a month name. Here the pos argument takes value 3. What is the meaning of pos=3 when there are only two tokens? That's an error; it should be 'pos=2'.
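To make the predefined parts concrete, here is a minimal Python sketch, assuming the lexical unit reaches transfer as 'tener# en cuenta<vblex><inf>' (that is, after apertium-pretransfer has moved the invariable queue in front of the tags). The regular expressions are a simplified approximation of the predefined lem, lemh, lemq and tags patterns that ignores escaped characters, and split_lu is a hypothetical helper, not part of apertium-transfer. In this sketch the queue keeps its leading '#', which is what lets lemh + tags + lemq reassemble the form the generator expects.

```python
import re

# Simplified approximations of the predefined parts (escapes such as \< are ignored).
LEM = re.compile(r"^[^<]+")          # whole lemma: everything before the first tag
LEMH = re.compile(r"^[^<#]+")        # lemma head: stops at '#' or at the first tag
LEMQ = re.compile(r"#[- _][^<]+")    # lemma queue: '#' plus the invariable part
TAGS = re.compile(r"(?:<[^>]+>)+")   # the run of tags


def split_lu(lu: str) -> dict:
    """Split a transfer-side lexical unit into lem, lemh, lemq, tags and whole."""
    def first(pattern):
        match = pattern.search(lu)
        return match.group(0) if match else ""

    return {
        "lem": first(LEM),
        "lemh": first(LEMH),
        "lemq": first(LEMQ),
        "tags": first(TAGS),
        "whole": lu,
    }


parts = split_lu("tener# en cuenta<vblex><inf>")
# Emitting lemh, then the tags, then lemq puts the inflection back inside the multiword.
rebuilt = parts["lemh"] + parts["tags"] + parts["lemq"]   # 'tener<vblex><inf># en cuenta'
```

For a word without a queue, such as 'casa<n><f><sg>', lemq is simply empty, so the same output pattern works for both.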
Re: [Apertium-stuff] A bug in the t1x processor?
On 22 November 2013 14:22, Mikel L. Forcada m...@dlsi.ua.es wrote: Dear Sergio, dear list, Aida and I think we have found a bug in the t1x processor and we have chased it down to a single file containing only the necessary definitions and a single rule. Unfortunately, it has to be tested by installing apertium-eng-kaz (following the steps here: http://wiki.apertium.org/wiki/English_and_Kazakh; hfst and all, sorry!). That shouldn't be necessary. How to test the (possible) bug: install apertium-eng-kaz, then replace apertium-eng-kaz.kaz-eng.t1x with the file with the same name in the dev/ directory, compile, and run this test: echo жазғанмын | apertium -d . kaz-eng-transfer The output is: apertium-transfer: Rule 1 жаз<v><tv><past><p1><sg>/write<vblex><past><p1><sg>/record<vblex><past><p1><sg> ^+++ HELLO, I AM THE WRONG RULE! +++ The following should be <ND> or <zzz>:<sg>$^default<default>{^.<sent>$} 'zzz', not '<zzz>' - the assignment is 'lit', not 'lit-tag' (and to assign a tag, it would have to be declared in the relevant def-attr). If you look at the rule, first the value ND is assigned to the number clip and then there is a choose block that tests for 1st person and assigns zzz to the number clip. Neither of these two values is printed; instead, the value extracted from жаз<v><tv><past><p1><sg> is printed, namely sg. We also saw some other strange behaviour, but this was the easiest to reproduce. Basically, assignments to clips are overridden and the values obtained in previous assignments seem to prevail. We would appreciate it very much if someone could look into this bug. You're trying to change 'sl'; apertium-transfer wasn't designed with that in mind. I'm dimly aware that support was added for input from lt-proc -b; I was not aware that the source would be preserved even with that. Víctor wrote a while back about a bug in that support, involving a variable that was not used - are you using a version with Víctor's patch applied? Similarly, it could be that sl output (if there is any!)
is coming straight from the input buffer. But more fundamentally, do you actually intend to change sl? All the best Mikel P.S. Another behaviour we observed is that under some circumstances you cannot assign values to clips outside their definition range, but we haven't been able to isolate the problem. You can only assign, using lit-tag, values that are included in the part's def-attr. You can get around this with lit if you need to.
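The difference between the two kinds of literal can be sketched in Python as follows. This is an illustration, not apertium-transfer's code: lit and lit_tag are hypothetical stand-ins for <lit v="..."/> and <lit-tag v="..."/>, and the number def-attr (sg, pl, sp, ND) is an assumed example. lit-tag turns each dot-separated part into a tag, and, following the advice above, the sketch refuses lit-tag values that are not declared in the part's def-attr.

```python
def lit(value: str) -> str:
    """<lit v="..."/>: the string is used exactly as written."""
    return value


def lit_tag(value: str, def_attr_items) -> str:
    """<lit-tag v="a.b"/>: each dot-separated part becomes a tag, giving '<a><b>'.

    Assumption for this sketch: only values declared in the part's def-attr
    are accepted, since an undeclared tag could not be matched by the clip later.
    """
    if value not in def_attr_items:
        raise ValueError(f"{value!r} is not declared in the def-attr")
    return "".join(f"<{tag}>" for tag in value.split("."))


# Hypothetical def-attr for number.
NBR = ["sg", "pl", "sp", "ND"]
```

With these definitions, lit("zzz") yields the bare string 'zzz', while lit_tag("ND", NBR) yields '<ND>'.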
Re: [Apertium-stuff] Windows installation problems
On 21 November 2013 15:49, Kevin Brubeck Unhammer unham...@fsfe.org wrote: Jimmy O'Regan jore...@gmail.com writes: On 21 November 2013 15:05, Kevin Brubeck Unhammer unham...@fsfe.org wrote: Jimmy O'Regan jore...@gmail.com writes: I'm not 100% sure about this, but there was a problem with Cygwin recently - IIRC, certain programs are no longer installed by default - and we should really either update that installer, or remove it. Seems like it needs an update, yes: http://www.google-melange.com/gci/task/view/google/gci2013/6396457749839872 http://superuser.com/a/628401 Does the source for that installer exist anywhere? (Can't find anything likely in SVN.) It's probably in Melange -- that was a GCI student, two or three years ago. The sourceforge files seem to be from 2010. I found http://www.google-melange.com/gci/task/view/google/gci2010/7068214 http://www.google-melange.com/gci/task/view/google/gci2010/7076217 but: Download Broken :-/ I'll see if I have a copy on my old laptop, but digging up a charger for it might prove troublesome.
Re: [Apertium-stuff] Internationalization Tag Set
On 5 November 2013 06:38, Mikel Forcada m...@dlsi.ua.es wrote: On 11/05/2013 02:12 AM, Jimmy O'Regan wrote: Last sentence of the abstract: ITS 2.0 focuses on HTML, XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF), as well as the Natural Language Processing Interchange Format (NIF). -- tl;dr, it's not just for XML. Jim: XLIFF is an XML application. With added emphasis: ITS 2.0 focuses on *HTML*, etc. Not just XML.
Re: [Apertium-stuff] Internationalization Tag Set
On 4 November 2013 18:38, Bernard Chardonneau bechapert...@free.fr wrote: Date: Mon, 04 Nov 2013 10:23:54 +0100 From: Mikel L. Forcada m...@dlsi.ua.es To: apertium-stuff@lists.sourceforge.net Reply-To: apertium-stuff@lists.sourceforge.net Subject: [Apertium-stuff] Internationalization Tag Set Hi Apertiumers! A new standard has been adopted by the W3C which relates to the internationalization of web content. I think we in Apertium should be aware of this: http://www.w3.org/TR/its20/ All the best Mikel -- OK, but what should we do with that? No problem if it is for the apertium.org website, for the small part outside the wiki. The wiki is not in XML format. Last sentence of the abstract: ITS 2.0 focuses on HTML, XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF), as well as the Natural Language Processing Interchange Format (NIF). -- tl;dr, it's not just for XML. The wiki generates HTML, and it's not a major task to add templates for ITS. Further, ITS is designed to be used inline, or as stand-off annotation. It's possible to use ITS stand-off to annotate even plain text, though the XPath to do so would be horrible. But using ITS annotation for documentation or language data is about the last thing I think of in relation to Apertium. At the most basic, it would be nice to have Apertium respect ITS instructions that say 'don't translate this part of the document', for example, or to skip sections that have been translated by another tool, or even to add basic provenance information.
Re: [Apertium-stuff] [Fwd: [GSoC Mentors] Google Summer of Code 2014 + 10 Things]
On 8 October 2013 20:14, Francis Tyers fty...@prompsit.com wrote: Hey all! Looks like GSOC will be taking place next year! \o/ \o/ \o/ We got the notice a lot earlier this year :) Yeah... what gives? It's not even 2014 yet! :)
Re: [Apertium-stuff] Tagger training prerequisites
On 23 September 2013 08:17, Per Tunedal per.tune...@operamail.com wrote: Hi, what should the text files look like before starting the tagger training? One sentence per line? Something else? Is a text formatted like below OK: Antingen genom att gå in under rätt rubrik ovan och lägga till ditt bidrag eller lägg ditt bidrag i bufferten om du inte vet var eller hur det ska stå. I Önskelistan lägger du förslag på sånt du tycker borde vara med. Or should e.g. the punctuation marks be separated like: I Önskelistan lägger du förslag på sånt du tycker borde vara med . No, you don't need to do that. You don't really need to have the text split into sentences either, but it makes life a little easier if there are problems. Some of the older language pairs have makefiles for tagger training. At a minimum, you will need to adapt the variables for the language, and make sure that lt-proc is called with the same set of switches as in the primary mode (if you're training for Swedish in sv-da, the mode will be the one that starts <mode name="sv-da" install="yes">). The tagset specification is where you have the most scope to control the tagger. I wrote a linter tool because of the problems you were reporting; I'd recommend that you run it before training.
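If you do decide to put one sentence per line, to make problems easier to trace, a naive regular-expression splitter is usually enough for a first pass. This is only a sketch: it assumes a sentence ends at '.', '!' or '?' followed by whitespace and an uppercase letter (with Å, Ä and Ö listed explicitly for Swedish), so abbreviations such as 't.ex. Stockholm' will be split wrongly.

```python
import re

# Split after ., ! or ? when followed by whitespace and an uppercase letter.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-ZÅÄÖ])")


def one_sentence_per_line(text: str) -> str:
    """Return `text` with one (naively detected) sentence per line."""
    sentences = (s.strip() for s in SENTENCE_BREAK.split(text))
    return "\n".join(s for s in sentences if s)
```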
Re: [Apertium-stuff] Tagger training prerequisites
On 23 September 2013 15:45, Per Tunedal per.tune...@operamail.com wrote: Hi, Thanks! I noticed your tool, but unfortunately I'm not sure how to use it! SYNOPSIS apertium-tsx-lint tsx-file [DIC] [DIC] is the 'dictionary' generated during tagger training (not an actual dictionary!). It'll run without it, but it won't give all the warnings. BTW -- are you training for Swedish? Supervised or unsupervised?
Re: [Apertium-stuff] New mode for the Apertium Tagger
On 21 September 2013 18:29, Mikel Forcada m...@dlsi.ua.es wrote: On 09/21/2013 02:11 PM, Francis Tyers wrote: No, basically I'm asking if it can work without specifying the set of coarse tags. What would happen if one did not specify the set of coarse tags in the HMM tagger? The tagset specification serves two purposes: to cluster similar tags, and to mark which of these are open, and which are closed. Without open classes, the tagger will fail to train, as there is nothing to assign to unknowns; without closed classes, the tagger is free to assign them to unknowns. Without clustering, the size of the model balloons, and data sparseness becomes a greater problem. It would be nice to have this feature, but I think this is a bit out of the scope of Gang's project.
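The two purposes of the tagset specification can be illustrated with a toy Python model. Everything here is hypothetical: the coarse tag names, the patterns and the open/closed flags are made up for the example, and the matching rule ('n.*' matches 'n' optionally followed by more tags) is a simplification of what a .tsx file expresses. Fine tag sequences collapse into a handful of coarse tags, so the tagger only has to estimate coarse-by-coarse transitions, and unknown words may only receive open categories.

```python
# Hypothetical tagset: (coarse tag, fine-tag pattern, open class?)
TAGSET = [
    ("NOM", "n.*", True),
    ("ADJ", "adj.*", True),
    ("VERB", "vblex.*", True),
    ("DET", "det.*", False),
    ("PREP", "pr", False),
]


def matches(fine: str, pattern: str) -> bool:
    """'x.*' matches 'x' or 'x.<anything>'; other patterns must match exactly."""
    if pattern.endswith(".*"):
        prefix = pattern[:-2]
        return fine == prefix or fine.startswith(prefix + ".")
    return fine == pattern


def coarse_tag(fine: str):
    """Cluster a fine tag sequence such as 'n.f.sg' into a coarse tag (first match wins)."""
    for name, pattern, _is_open in TAGSET:
        if matches(fine, pattern):
            return name
    return None


def unknown_word_candidates():
    """Only open classes may be assigned to unknown words."""
    return [name for name, _pattern, is_open in TAGSET if is_open]
```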
Re: [Apertium-stuff] One more difference between Swedish and Danish monodix
On 11 September 2013 07:38, Per Tunedal per.tune...@operamail.com wrote: Hi, Apertium presupposes that the form in the source language can be generated in the target language, right? Yes and no. Apertium by default passes on the remainder of the tags after what is matched in the bidix. So if the input is 'foo<n><sg>', and the bidix has 'foo<n>:bar<n>', then the output will be 'bar<n><sg>'. This is what happens with the default rule, or if the rule that matches uses 'part=tags'. But transfer rules are generally written to have more selective 'part's, and the tags can otherwise be modified by transfer. What if the form doesn't exist in the target language? How to handle that? The Swedish adjective blå (=blue) might have the old-fashioned masculine definite form ending in -e: blåe, just as most other adjectives. As far as I know there isn't any masculine form in Danish; anyhow there isn't any in the original Danish monodix. How do I manage to translate blåe to Danish? It's analysed as adj.pst.m.sg.def, but a similar form doesn't exist in Danish. If this is truly exceptional, add an entry with the full set of needed tags (i.e., as far as 'm'); if it's not, handle it in transfer. The output will probably need to be '<GD>', but that assumes that concordance is done in transfer (it ought to be, but...) BTW A similar problem would occur if I ever try to translate French or Spanish to Swedish: in French and Spanish, verbs in the subjunctive flourish, but they don't exist in Swedish (except in some rare cases, mainly idiomatic expressions). How is this handled in the pair en-es? It's handled in transfer, but the en-es transfer rules are not exactly beginner-friendly -- you'd need to gain quite a bit of experience with transfer to hope to understand some of them.
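The default behaviour described above, where tags not matched in the bidix are carried over, can be sketched in Python. This is a simplification under stated assumptions: the bidix is a plain dict from left side to right side, the longest left side that matches up to a tag boundary wins, and translate_lu is a hypothetical helper rather than lt-proc's actual lookup.

```python
def translate_lu(source: str, bidix: dict):
    """Translate e.g. 'foo<n><sg>' with bidix {'foo<n>': 'bar<n>'} into 'bar<n><sg>'."""
    for left in sorted(bidix, key=len, reverse=True):   # longest match first
        if source.startswith(left):
            rest = source[len(left):]
            # Only accept a match that ends at a tag boundary.
            if rest == "" or rest.startswith("<"):
                return bidix[left] + rest              # unmatched tags pass through
    return None                                        # not in the bidix
```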
Re: [Apertium-stuff] Differences in paradigmes for Swedish and Danish
On 11 September 2013 07:15, Per Tunedal per.tune...@operamail.com wrote: Hi, yes, this has to be corrected for several entries. If it's corrected in is-sv I might just copy the entries: I have copied these once before. But my original question is: the translation to Swedish (generation) cannot work if two forms have the same analysis, can it? Apertium cannot choose what to generate, can it? How to handle that? Direction restrictions. In the monodix, <e r="LR"> means 'analyse only' (i.e., do not generate), and <e r="RL"> means 'generate only' (i.e., do not analyse). So you would change <e> to <e r="LR">. But in your case, the analysis was wrong, so fix that instead.
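How direction restrictions affect the two uses of a monodix can be modelled in a few lines of Python. The entries are a hypothetical English example (two spellings, one restricted to analysis), not taken from any Apertium dictionary; 'LR' marks an entry used only for analysis and 'RL' one used only for generation, as described above.

```python
# Hypothetical monodix entries: (surface form, analysis, restriction or None)
ENTRIES = [
    ("colour", "colour<n><sg>", None),
    ("color", "colour<n><sg>", "LR"),   # accepted on input, never generated
]


def usable(restriction, direction: str) -> bool:
    """An unrestricted entry works both ways; a restricted one only in its direction."""
    return restriction is None or restriction == direction


def analyse(surface: str, entries=ENTRIES):
    return [a for s, a, r in entries if s == surface and usable(r, "LR")]


def generate(analysis: str, entries=ENTRIES):
    return [s for s, a, r in entries if a == analysis and usable(r, "RL")]
```

Both spellings analyse to the same lexical form, but only one surface form comes back out, which is exactly what removes the generation ambiguity.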
Re: [Apertium-stuff] Differences in paradigmes for Swedish and Danish
On 10 September 2013 18:09, Per Tunedal per.tune...@operamail.com wrote: Hi, Working on the Swedish verb ställa and the Danish equivalent stille. I'm confused about the entries in the sv monodix as some have the very same tags: <pardef n="följ/a__vblex"> <e><p><l>a</l><r>a<s n="vblex"/><s n="inf"/></r></p><par n="S__voice"/></e> <e><p><l>er</l><r>a<s n="vblex"/><s n="pres"/><s n="actv"/></r></p></e> <e><p><l>es</l><r>a<s n="vblex"/><s n="pres"/><s n="actv"/></r></p></e> <e r="LR"><p><l>s</l><r>a<s n="vblex"/><s n="pres"/><s n="actv"/></r></p></e> According to sv.wiktionary, these last two are both passive, but this looks to be the problem - two generation candidates for vblex.pres.actv. It's either mistagged, or a restriction needs to be added.
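A quick way to find this kind of problem is to group a paradigm's generable entries by analysis and report any analysis with more than one surface form. The Python sketch below hard-codes the four entries quoted above as (suffix, right-side tags, restriction), omitting the shared 'a' on the right side and the S__voice continuation; generation_conflicts is a hypothetical helper, not an existing Apertium tool, and a real check would parse the .dix file.

```python
from collections import defaultdict

# The entries of följ/a__vblex quoted above: (left suffix, right-side tags, restriction)
PARDEF = [
    ("a", "<vblex><inf>", None),
    ("er", "<vblex><pres><actv>", None),
    ("es", "<vblex><pres><actv>", None),
    ("s", "<vblex><pres><actv>", "LR"),
]


def generation_conflicts(entries):
    """Map each analysis to its surface forms when more than one could be generated."""
    forms = defaultdict(list)
    for left, right, restriction in entries:
        if restriction != "LR":            # LR entries are analysis-only
            forms[right].append(left)
    return {analysis: fs for analysis, fs in forms.items() if len(fs) > 1}
```

On these entries it reports 'er' and 'es' competing for vblex.pres.actv, matching the diagnosis above.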
Re: [Apertium-stuff] OT Punctuation in Spanish was: Re: IBM1 partly better than Apertium from French to Spanish
On 3 September 2013 14:30, Per Tunedal per.tune...@operamail.com wrote: Hi, I presume that in my toy corpus the most appropriate would be: ¡Tomad un bloque! Probably. And if it's really important to you, as you've got a regular one-line-per-sentence layout, you can write a simple script to insert it. Further, I assume that Tomad ¡un bloque! and Tomad un ¡bloque! both emphasize that the person should take a block and not, for instance, a cone. Is there any difference between them? It depends on context, I guess. Is it possible to emphasize that the person should take one and not two items? Tomad ¡un! bloque. Or is this done in another way? I assume so, I just hadn't thought of it. Anyway, I was just trying to make the point that it's a more difficult problem than it might appear to be (in fact, I'd say it's AI-complete), not to start a discussion.
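Such a script could look like the Python sketch below. It assumes one sentence per line and only handles the simplest case: a line ending in '!' gets '¡' at the very start, which is the placement chosen for '¡Tomad un bloque!'. The emphatic placements discussed above would need knowledge the script does not have, so lines that already contain '¡' are left alone.

```python
def add_inverted_exclamation(line: str) -> str:
    """Prefix '¡' to a sentence that ends with '!' and contains no '¡' yet."""
    sentence = line.strip()
    if sentence.endswith("!") and "¡" not in sentence:
        return "¡" + sentence
    return sentence


def process(lines):
    return [add_inverted_exclamation(line) for line in lines]
```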
Re: [Apertium-stuff] IBM1 partly better than Apertium from French to Spanish
On 3 September 2013 14:51, Per Tunedal per.tune...@operamail.com wrote: Hi again, one more thing: I suppose I should start the sentences with a capital letter too? Or doesn't that matter to Apertium? It shouldn't make much of a difference with the input, but you'll find that some language pairs capitalise the first word in the sentence, regardless of whether or not it was capitalised in the input.
Re: [Apertium-stuff] IBM1 partly better than Apertium from French to Spanish
On 2 September 2013 15:49, Xavi Ivars xavi.iv...@gmail.com wrote: 2013/9/2 Jimmy O'Regan jore...@gmail.com The first test sentences for the Block World Corpus are better translated by the outdated statistical translation model IBM model 1 in the direction French to Spanish. Apparently, Apertium has some problems with the imperative of verbs and goes for the subjunctive used in negated requests (this problem persists in the omitted sentences): Original: prenez une flèche prenez un bloc prenez un cône bleu Your sentences are not terminated. If they had been, you would have seen the output you expected. This is something I noticed when the first email was sent, but I didn't look deeper into it: it doesn't make sense that in a rule-based engine like Apertium prenez was translated sometimes as tomad and sometimes as tomáis, when in fact one of the (key?) benefits of rule-based systems is predictability of the output. It's not hard to understand. There are no sentence boundaries, making this one big sentence. The tagger sees something that may be an imperative following a noun and, because that's unlikely, chooses something else instead (as would a rule-based disambiguator with a reasonable set of rules). Garbage in, garbage out.
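For input laid out one sentence per line, like the test sentences above, the fix is simply to terminate every line before translating. A minimal Python sketch, assuming a full stop is an acceptable default terminator (terminate and terminate_all are illustrative names):

```python
def terminate(line: str, mark: str = ".") -> str:
    """Append `mark` to a non-empty line that does not already end a sentence."""
    sentence = line.rstrip()
    if sentence and sentence[-1] not in ".!?":
        return sentence + mark
    return sentence


def terminate_all(text: str) -> str:
    return "\n".join(terminate(line) for line in text.splitlines())
```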
Re: [Apertium-stuff] Updating errors of the website - Warning!!
On 2 August 2013 17:02, Guillermo Puebla Suárez guillerpue...@hotmail.com wrote: Hello to all Apertiumers, Hi! I'm new on this mailing lists, To be perfectly frank, from the tone of your writing I think that you're new to mailing lists and open source in general. That's ok; everyone begins at the beginning. but I'd like to inform you about not updating the website and pages related to Apertium (Eslema, Prompsit, Opentrad, etc.). So here's how it works: Apertium is open source, so everyone is free to run their own webservices providing Apertium. That does not mean that we have any influence over them: the groups you listed are separate entities. We can only announce new releases (see below) and hope that they update. I'm specifically working on Asturian and Spanish languages. Thanks for your contributions. I told Francis to change some codes in es-ast (Spanish-Asturian) package but we are not able to enjoy them because the WEB IS NOT UP-TO-DATE. Typing in uppercase is considered shouting on mailing lists. Please don't shout, we can hear you just fine :) That the web is 'not up to date' -- that's to be expected, and that's *what we want*. The version in SVN is a development version: it's untested. To best present ourselves, we only want the tested, released versions to be presented to users. For a company such as Prompsit, it could even be irresponsible to present a development version, as they may have customers depending on the service. If you're prepared to do the work involved in preparing a new release, we'd be happy to help, but otherwise, you'll just have to wait until someone else is prepared to do that work. I encourage anybody who has got access to do this to contact me at this email and update the website with the latest sources, I'm free all day long and part of night. 
'Send me an offlist email' is usually not the done thing on mailing lists: we answer in public (and use mailing list archives) so our answers can be of use to anyone who may be searching for the answer later (and so to the benefit of the project as a whole), not specifically for the benefit of the person asking. (On some mailing lists, if you ask for offlist email, you'll be presented with a set of consulting rates.)
Re: [Apertium-stuff] Updating errors of the website - Warning!!
On 2 August 2013 18:53, Guillermo Puebla Suárez guillerpue...@hotmail.com wrote: In short, what are the tested versions (from SVN I imagine)? No, the tested versions are the tarballs available for download here: https://sourceforge.net/projects/apertium/files/
Re: [Apertium-stuff] merge of ATT - lttoolbox binary compiler into lttoolbox
On 19 July 2013 17:49, Francis Tyers fty...@prompsit.com wrote: One question: my preference is for lt-comp to parse both the .dix and the ATT files. The behaviour would be: if (fileIsValidXML()) { parse_xml(); } else if (fileIsValidATT()) { parse_att(); } else { fail(); } Would this be OK for people? * I'd prefer to have the ATT compiler as part of lt-comp as opposed to a separate program. * I'd prefer it to work out the file format automatically rather than for people to have to specify the format to compile from. Er... why? 'lt-comp lr' and 'lt-comp rl' are presumably irrelevant, so why not make it 'lt-comp att' or whatever else makes sense.
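For comparison, the automatic detection Francis proposes can be sketched in Python. This is not lt-comp's code: detect_format and its helpers are hypothetical, the XML test is simply "does it parse", and the ATT test assumes the usual tab-separated layout (arcs with source, target, input, output and an optional weight; final-state lines with a state and an optional weight), ignoring multi-transducer separators.

```python
import xml.etree.ElementTree as ET


def looks_like_xml(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def looks_like_att(text: str) -> bool:
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    for line in lines:
        fields = line.split("\t")
        if not fields[0].isdigit():
            return False                      # every line starts with a state number
        if len(fields) in (4, 5) and fields[1].isdigit():
            continue                          # arc: src, dst, in, out[, weight]
        if len(fields) in (1, 2):
            continue                          # final state: state[, weight]
        return False
    return True


def detect_format(text: str) -> str:
    if looks_like_xml(text):
        return "dix"
    if looks_like_att(text):
        return "att"
    raise ValueError("input is neither XML nor ATT")
```

The sketch also shows the cost of guessing: a malformed .dix falls through to the ATT check and ends in a vaguer error than an explicit format argument would give.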
Re: [Apertium-stuff] idea about transfer files
On 13 July 2013 10:38, Francis Tyers fty...@prompsit.com wrote: While we're on the subject of the transfer files and making changes, I had an idea the other month about making it easier to teach apertium transfer: make the attributes and variables sections optional. Try out r45737. Nothing seems to have broken so far, but I'll back it out if something does.
Re: [Apertium-stuff] idea about transfer files
On 13 July 2013 15:34, Mikel Forcada m...@dlsi.ua.es wrote: Jim, Fran: Sorry but I have to step in and say that I am not happy with the procedure followed here. Neither am I. I'd love to be able to put the changes in a branch, say try out this branch, and merge if all is well or discard if not. As is, the only choice for *both* making changes *and* having someone else test them is commit/revert or never to change anything.
Re: [Apertium-stuff] Ask for help on HMM unsupervised training
On 6 June 2013 10:14, Francis Tyers fty...@prompsit.com wrote: I think the problem is that the extra analyses are added by regular expressions which are not covered in the expansion. Not with 'Mar'. The regexes that were in those dictionaries 1) were not specific about gender (even when they could/should have been), 2) did not capture individual words like that, and 3) are disabled in those dictionaries because they prove the JWZ 'now you have two problems' axiom. (Actually, at least three, at last count.)
Re: [Apertium-stuff] Rules for proper names
On 30 May 2013 18:47, Francis Tyers fty...@prompsit.com wrote: On Thu, 30/05/2013 at 19:42 +0200, Per Tunedal wrote: The most difficult part would be to find the names. Perhaps someone has some ideas? In Icelandic--English, regular expressions are used. See e.g. the pardefs for persons and lastnames in is.dix. This is not altogether recommended though, as regular expressions slow down your transducer. What you could do is use them on a large corpus and then mass-add the names they find after superficial checking. Census data is easy to find, gazetteers for NER are easy to find, en.wiktionary has categories for names (http://en.wiktionary.org/wiki/Category:Surnames_by_language http://en.wiktionary.org/wiki/Category:Male_given_names_by_language http://en.wiktionary.org/wiki/Category:Female_given_names_by_language), as do en.wikipedia (http://en.wikipedia.org/wiki/Category:Surnames http://en.wikipedia.org/wiki/Category:Given_names), da.wikipedia (http://da.wikipedia.org/wiki/Kategori:Efternavne http://da.wikipedia.org/wiki/Kategori:Fornavne), and sv.wikipedia (http://sv.wikipedia.org/wiki/Kategori:Efternamn http://sv.wikipedia.org/wiki/Kategori:Förnamn), and Europarl has speaker annotation which contains the name of the speaker.
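Once a list of names has been collected from one of these sources and checked, mass-adding them is mostly string generation. A minimal Python sketch, assuming one name per input line and a proper-noun paradigm called 'Anna__np' (the paradigm name is hypothetical; use whichever pardef the monodix actually defines). Multi-word names would need <b/> for their spaces, which this sketch does not handle.

```python
from xml.sax.saxutils import escape, quoteattr


def name_entries(names, paradigm: str) -> str:
    """Turn a list of names into monodix <e> lines, dropping blanks and duplicates."""
    unique = sorted({name.strip() for name in names if name.strip()})
    return "\n".join(
        f"<e lm={quoteattr(name)}><i>{escape(name)}</i><par n={quoteattr(paradigm)}/></e>"
        for name in unique
    )
```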
Re: [Apertium-stuff] Strange behaviour of the on-line version of the translator
On 13 May 2013 22:07, Bernard Chardonneau bechapert...@free.fr wrote:
> A less important problem: changes made to available language pairs more than a year ago have still not been taken into account by the online translators. This also concerns the http://apertium.saluton.dk website.

Presumably, those pairs have not had a new release. The development versions in SVN are often quite unstable, and even when they are not, they are rarely as thoroughly tested as the released versions. It would be unwise to use them in a web service (if not outright damaging to the project's reputation!).
Re: [Apertium-stuff] [GSoC 2013] Simpledix improvements
[Sorry, I've only noticed now that the email didn't send!]

On 30 April 2013 08:53, d...@alu.ua.es d...@alu.ua.es wrote:
> 2013/4/30 Jimmy O'Regan jore...@gmail.com
>> On 29 April 2013 18:07, d...@alu.ua.es d...@alu.ua.es wrote:
>>> Hi everybody,
>> I'd prefer to have a meta-configuration: if it sees 'vblex', then generate 'pri.p3.sg', 'inf' and 'pp.m.sg', etc., and generate the configuration based on that. It would be trivial to add a task to dixtools to do this, and should be easy enough to do otherwise.
> The automatic script already takes that kind of meta-configuration. You can see an example at the end of (http://wiki.apertium.org/wiki/User:Dtr5#Making_your_own_configuration_file). But that method has some problems: it is really slow (it takes around 2 hours to process the es-ca dictionaries with the sample configuration), it is

That seems wrong. There should be no reason for this to happen. The maximum that I would expect from a dixtools-based tool to do this would be a few seconds. Perhaps you should investigate that?

>>> I encourage developers to test Simpledix (http://apertium.vm.bytemark.co.uk/simpledix). It only has configuration files for the es-ca pair, but it would give you a better understanding of the current state of the tool, and of how it could be improved. If somebody needs a bit more information, you can read the tutorial on the wiki (http://wiki.apertium.org/wiki/User:Dtr5). I am looking forward to hearing some feedback on this project.
>> I really like the idea of having an easy-to-use interface for editing the dictionaries, but I'd like you to give some thought to the _next_ problem, too: what to do with the changes, to make it easier for users to contribute them. Passing whole dix files around can work, but would be quite a pain - it would be much better to be able to pass just the changes. Do you have any thoughts on that?
> When you export the dictionaries, a simple xslt transformation puts all the new entries at the end of the dictionary. I could provide only the difference, greatly reducing the size of that download.

Sure, that's an option. There should be plenty of pre-built diff/patch tools out there.

> As for uploading, I think nothing can be done.

There are plenty of options. At the most basic, all of our interactions with SVN are via HTTP. At the very least, you can provide a configuration option to specify the address of the package in SVN, then download the files directly from there. With a little more effort, there are functions for SVN (http://php.net/manual/en/ref.svn.php) so at the very least, you can provide the revision number of the dictionaries that have been modified. Yet more complicated would be to use git as a backing store (e.g., using http://gitorious.org/git-php), and create a branch whenever someone edits the dictionaries. Language pair maintainers who are able to use git could pull directly, or git's machinery could be used to export patch sets. It would even give the option of allowing logged in users to pick up where they left off.
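Passing "just the changes" rather than whole dix files can be done with an ordinary unified diff. A minimal sketch using Python's standard `difflib`; the file name and the tiny dictionary below are made-up placeholders (the paradigm name `cas/a__n` is illustrative only), and this is not part of Simpledix.

```python
import difflib


def dix_patch(original, edited, name="apertium-xx-yy.xx.dix"):
    """Return a unified diff holding only the lines that changed.

    `name` is a placeholder file name used in the diff headers."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile="a/" + name,
        tofile="b/" + name,
    )
    return "".join(diff)


ORIGINAL = """<dictionary>
  <section id="main" type="standard">
    <e lm="casa"><i>cas</i><par n="cas/a__n"/></e>
  </section>
</dictionary>
"""
# The export step appends new entries at the end of the section.
EDITED = ORIGINAL.replace(
    "  </section>",
    '    <e lm="gata"><i>gat</i><par n="cas/a__n"/></e>\n  </section>',
)

print(dix_patch(ORIGINAL, EDITED), end="")
```

Because new entries are appended at the end, the diff stays small; a maintainer could then apply it with the standard `patch` tool (for example `patch -p1 < changes.diff`) instead of receiving the whole file.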
Re: [Apertium-stuff] [GSoC 2013] Simpledix improvements
On 3 May 2013 11:57, d...@alu.ua.es d...@alu.ua.es wrote:
>> [Sorry, I've only noticed now that the email didn't send!]
> No problem. I got some work for this summer, so I don't think I can give GSoC ~30h/week. Won't apply this year. If (as expected) I get some free time in late June, I'll do the configuration file generation improvements.

That's a pity, but best of luck with the job!

>> That seems wrong. There should be no reason for this to happen. The maximum that I would expect from a dixtools-based tool to do this would be a few seconds. Perhaps you should investigate that?
> It does not use dixtools, but some bash + xslt. Of course, it should not take more than a couple of minutes.

Yeah, I took a (brief!) look at the scripts. I have seen something like this kind of slowdown before (with EXSLT and xsltproc, IIRC), but nothing struck me as familiar.

>>> As for uploading, I think nothing can be done.
>> There are plenty of options. [...] It would even give the option of allowing logged in users to pick up where they left off.
> As is, you can choose to upload the dictionaries and configuration files from a URL, like the ones SourceForge provides for direct download.

Yes, you can pull the individual files straight from SVN.

> Interacting directly with repositories seems a good idea, but it requires a major rework: the tool would need to know how to test the dictionaries before uploading to the repository. Nowadays, it does not even need Apertium to work.

I don't think you'd need to trouble yourself with doing more than validating (the XML of) the dictionaries (which is just a DTD-based validation). It'd be nice, sure, to check if they also compile, but as long as the XML is more or less valid, it should be ok.

> Another option is downloading the tool and setting it up locally. Simpledix only needs a web server, xsltproc and BaseX, all of them easily installed (they are part of the official Debian repositories). As for keeping the progress of the users, if you don't close your session, you can save the URL (which has your id as a GET parameter), and keep working later. But as the tool lacks proper session management, anybody can use that id and close your session (erasing your progress), so working that way is not advised.

I was just mentioning that as a positive side-effect -- it doesn't overly interest me. I'm far more interested in making it as easy as possible for potential contributors to contribute, and to merge those contributions. Really, I have GitHub's pull requests in mind as the ideal: there's an open source clone called GitLab (http://gitlab.org) and they seem to have this model, so it might not be too hard to port to PHP.
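The real check is DTD-based (for example `xmllint --noout --dtdvalid dix.dtd file.dix`, where the location of `dix.dtd` is an assumption). Python's standard library cannot do DTD validation, so the sketch below only checks well-formedness, the root element, and a rough approximation of the rule that every `<e>` needs an `<i>`, `<p>`, `<par>` or `<re>` child. It is a lightweight pre-upload sanity check under those assumptions, not a substitute for the DTD.

```python
import xml.etree.ElementTree as ET

# Approximation of the dix DTD rule for <e>: it needs at least one of these.
ENTRY_CHILDREN = {"i", "p", "par", "re"}


def check_dix(text):
    """Return a list of problems found in a .dix document (empty if none).

    Checks well-formedness, the root element and that every <e> has a
    content child. This is NOT a full DTD validation."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != "dictionary":
        problems.append(f"root element is <{root.tag}>, expected <dictionary>")
    for n, entry in enumerate(root.iter("e"), start=1):
        if not any(child.tag in ENTRY_CHILDREN for child in entry):
            problems.append(f"entry {n} has no <i>, <p>, <par> or <re> child")
    return problems


print(check_dix('<dictionary><section id="main" type="standard">'
                '<e lm="gat"><i>gat</i></e><e lm="x"/></section></dictionary>'))
```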
Re: [Apertium-stuff] GSoC: Visual editor for transfer rules
On 1 May 2013 15:22, Lipka Boldizsár lip...@zoho.com wrote:
> Hi all,

Hi!

> I'm a Molecular Bionics student from Hungary (yeah, quite far away from machine translation, I know, but at least I can code) and am interested in the GSoC idea 'Visual interface for writing structural transfer rules'. I did some research on the matter already, went through the New Language Pair Howto and the machine translation series at wiki.apertium.eu, and I'm currently reading the Transfer Rules Examples article. Do you think I need anything else to put together a good proposal?

You could show us what you came up with while working through the howto, but you're on the right track.

> (Guess I'm a bit too late, but meh. There's no harm in trying.)

Correct :)
Re: [Apertium-stuff] Apertium-stuff Digest, Vol 72, Issue 48
On 29 April 2013 16:58, Francis Tyers fty...@prompsit.com wrote:
> On Mon, 29 Apr 2013 at 21:20 +0530, Anand Soni wrote:
>> Hi, Sentiment analysis will not directly help machine translation. But machine translation can definitely help sentiment analysis. Most of the work in sentiment analysis has been done in English only. After building the sentiment analysis tool, we can integrate it with a translator to do sentiment analysis for many languages. This is the idea behind this project. Thus, it may be viewed as a new feature for the Apertium machine translator. Please share your ideas on this.
> (1) Please do not reply to list digest posts.

...without trimming them!

> (2) Apertium is a machine translation project. Our goal is to make machine translation systems :) Any project should have as its goal either making a machine translation system, or improving the framework for making machine translation systems.
> (3) This sounds very much like Opinum[1], which may be open sourced at some point.
> [1] Bonev, Boyan, Gema Ramírez-Sánchez, and Sergio Ortiz Rojas. Opinum: statistical sentiment analysis for opinion classification. Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis. Association for Computational Linguistics, 2012.
Re: [Apertium-stuff] proposal page of sphinx
On 28 April 2013 08:37, sphinx jiang yishan...@gmail.com wrote:
> Hi, This is my original GSoC proposal page. Maybe something needs to be modified, so I'm publishing it now to get some advice~~ http://wiki.apertium.org/wiki/User:Sphinx/GSoC_2013_Application:_%22Chinese(simple)-Chinese(traditional)_language_pair%22

You should replace the screenshots with text (use <pre>) and maybe rename your GitHub repo to include the language names.
Re: [Apertium-stuff] Coding Challenge for idea Sliding-window part-of-speech tagger
On 20 April 2013 16:37, Gang Chen pkucheng...@gmail.com wrote:
> I've done the coding challenge for this idea, with the code here: https://github.com/elephantgcc/gsoc-2013/blob/master/ApertiumFilter.py

As the project is listed as a C++ project, the coding challenge ought to also be carried out in C++. As is, we don't know whether or not you can even compile C++, let alone write it.
Re: [Apertium-stuff] Problem in generating tagged corpus
On 12 April 2013 22:50, Mohit Aggarwal mohit@gmail.com wrote:
> Hi, I tried to tag a corpus using Moses as described on the coding challenge page http://wiki.apertium.org/wiki/Generating_lexical-selection_rules_from_a_parallel_corpus . I first cleaned the corpus with
>     perl (path to your mosesdecoder)/scripts/training/clean-corpus-n.perl europarl-v7.es-en es en europarl.clean 1 40
> then I tried to tag the corpus with
>     nohup cat europarl.clean.en | apertium-destxt |\
>     apertium -f none -d /home/fran/source/apertium-en-es en-es-pretransfer europarl.tagged.en
> But each time I execute this command, the output tagged file contains a different number of lines, which is not equal to the number of lines in the input file. Please tell me if there is something I'm doing wrong.

The most obvious thing that seems wrong to me is that you probably don't have a directory named '/home/fran/source/apertium-en-es'.
Re: [Apertium-stuff] Apertium PMC election: election board
On 8 April 2013 11:45, Kevin Brubeck Unhammer unham...@fsfe.org wrote:
> Mikel Forcada m...@dlsi.ua.es writes:
>> [...] The current temporary census consists of:
>> As it has been more than 7 days, this is considered the definitive census of Committers with the right to vote.

I think something went wrong between these two stages, as Jonathan at least was unaware that the election was taking place -- voter turnout was much lower this time around than last. Perhaps something went amiss during the census? And if so, wouldn't it be appropriate to re-open it?
Re: [Apertium-stuff] Apertium PMC election: election board
On 9 April 2013 17:32, Xavi Ivars xavi.iv...@gmail.com wrote:
> 2013/4/9 Jimmy O'Regan jore...@gmail.com
>> I think something went wrong between these two stages [...] wouldn't it be appropriate to re-open it?
> I wouldn't say it was much lower. It has decreased, but I don't think the number is so low that we could think something strange happened (28 registered voters in 2011 [1] and 22 this year [2]).

Aha. Sorry, I somehow had the impression that the number was much higher last time. In any case, as Mikel has pointed out on IRC, the decision is the election board's.
Re: [Apertium-stuff] Android ideas (was: Google joins Apertium in providing offline translation on Android.)
On 8 April 2013 07:47, Kevin Brubeck Unhammer unham...@fsfe.org wrote:
> Mikel Artetxe artet...@gmail.com writes:
>> As for Google Translate's app integrating better with Android, it is true that it has some great features that Apertium's app misses. Implementing some of them (like offline OCR[1], which was suggested during the last GSoC) would be nice and relatively easy, but some others (like TTS or voice recognition, at least for all the minor languages that Apertium supports) would probably be unachievable for us.
> Doesn't Android come with some recognition and TTS built-in?

Kind of, but the ASR is pretty limited (command and control only), and there are either no tools available for adapting the language data, or the tools are only available as Windows binaries (and even then, you don't get the full set of tools). The ASR most people think of when they think of Android is a proprietary Google-branded add-on, and maybe they've added an offline mode in more recent versions, but it at least used to be true that it did nothing more than record a sound file and send it to a Google server to be processed (there's equivalent code in the Chrome tree; it's really not interesting).
Re: [Apertium-stuff] Google joins Apertium in providing offline translation on Android.
On 7 April 2013 20:08, Mikel Artetxe artet...@gmail.com wrote:
> As for Google Translate's app integrating better with Android, it is true that it has some great features that Apertium's app misses. Implementing some of them (like offline OCR[1], which was suggested during the last GSoC) would be nice and relatively easy, but some others (like TTS or voice recognition, at least for all the minor languages that Apertium supports)

TTS is not a big problem. eSpeak is available for Android (via the Eyes-Free project), and I think CMU Flite is too. I added a 'generate with tags' mode to lt-proc for exactly this purpose, but a wrapper to pick out the ambiguous words and annotate them with, say, SSML would be needed (not a whole lot of work, though).

ASR is more of a problem. PocketSphinx is available for Android, but there are very few languages with available acoustic models. (If you want to help change that, VoxForge (http://www.voxforge.org/) is building open data for ASR.) The English model is relatively well developed, but they have models for other languages.

> write about topics (and apps) suggested by readers. Wouldn't it be nice to suggest that they write an article about Apertium's app? It's just an idea; perhaps somebody has already tried something like that...

I'd assume that nobody has; marketing is not one of the project's strong points. If you have ideas about how we can change that, I know I'd love to hear them.
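A rough sketch of the kind of SSML wrapper described above. The input format, a list of `(surface, tags)` pairs with dot-joined tags, is an assumption: the exact output of lt-proc's 'generate with tags' mode is not shown here. The homograph table is a hypothetical example. The markup uses the standard SSML `<speak>` root and the `<phoneme alphabet="ipa" ph="...">` element, adding it only where the tags decide the pronunciation.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical homograph table: (surface form, tag prefix) -> IPA.
HOMOGRAPHS = {
    ("read", "vblex.pres"): "riːd",
    ("read", "vblex.past"): "rɛd",
    ("live", "adj"): "laɪv",
    ("live", "vblex"): "lɪv",
}


def pronunciation(surface, tags):
    """Return IPA for an ambiguous word given its tags, or None."""
    for (word, prefix), ipa in HOMOGRAPHS.items():
        if surface.lower() == word and (tags == prefix or tags.startswith(prefix + ".")):
            return ipa
    return None


def to_ssml(tokens, lang="en"):
    """Wrap (surface, tags) pairs in SSML, adding <phoneme> only where needed."""
    parts = []
    for surface, tags in tokens:
        ipa = pronunciation(surface, tags)
        if ipa is None:
            parts.append(escape(surface))
        else:
            parts.append(f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(surface)}</phoneme>')
    return ('<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
            f"xml:lang={quoteattr(lang)}>" + " ".join(parts) + "</speak>")


print(to_ssml([("I", "prn.subj"), ("read", "vblex.past"), ("it", "prn.obj")]))
```

Only words whose pronunciation depends on the analysis get extra markup, so the output stays close to plain text for engines with partial SSML support.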
Re: [Apertium-stuff] The terrible tagger was :Re: New page about transfer rules ready (in French)
On 2 March 2013 02:52, Jimmy O'Regan jore...@gmail.com wrote:
> On 1 March 2013 12:39, Per Tunedal per.tune...@operamail.com wrote:
>> Hmm... It's selected all the time! That's a bit confusing. Why does the tagger choose something previously unknown (man) instead of the indefinite article (en)?
> Because it's getting the coarse tag 'PRN', which was presumably the most common of the available options in the corpus the tagger was trained on. 'PRN' contains the tags-item 'prn.*', so it catches a lot.

I've made a lint tool for TSX: https://github.com/jimregan/apertium-tsx-lint

This is one of the errors it will catch, in case it comes up again in future:

    $ cat examples/multimatch.tsx
    <?xml version="1.0" encoding="UTF-8"?>
    <tagger name="multimatch">
      <tagset>
        <def-label name="TESTMATCH">
          <tags-item tags="prn.*"/>
        </def-label>
      </tagset>
    </tagger>
    $ echo '^foo<prn><subj>/foo<prn><obj>$' | perl apertium-tsx-lint.pl examples/multimatch.tsx
    MASKED_AMBIGUITY: TESTMATCH (4) matches more than one analysis:
    INPUT: ^foo<prn><subj>/foo<prn><obj>$
    MATCHED: <prn><subj>/<prn><obj>

(Unlike libxml2, Perl's XML::Parser gives the correct line numbers :)
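The MASKED_AMBIGUITY check boils down to: compile each tags-item pattern, then see whether one label matches more than one analysis of the same word. This is a minimal Python re-implementation of that idea, not the Perl tool itself. It assumes that `*` in a TSX pattern stands for any (possibly empty) run of tags, handles only the `tags` attribute (not `lemma`), and skips any stream element without tags, such as a surface form.

```python
import re


def tsx_pattern_to_regex(tags):
    """Compile a TSX tags-item pattern such as 'prn.*' into a regex over
    tag strings such as '<prn><subj>'.
    Assumption: '*' stands for any (possibly empty) run of tags."""
    parts = []
    for tag in tags.split("."):
        parts.append(r"(?:<[^<>]+>)*" if tag == "*" else re.escape(f"<{tag}>"))
    return re.compile("".join(parts))


def analyses(unit):
    """Split '^foo/foo<prn><subj>/foo<prn><obj>$' into tag strings,
    skipping any element (like the surface form) that has no tags."""
    body = unit.strip().lstrip("^").rstrip("$")
    result = []
    for item in body.split("/"):
        start = item.find("<")
        if start != -1:
            result.append(item[start:])
    return result


def masked_ambiguities(labels, unit):
    """Report each label whose pattern matches more than one analysis."""
    tag_strings = analyses(unit)
    report = []
    for name, pattern in labels.items():
        rx = tsx_pattern_to_regex(pattern)
        hits = [t for t in tag_strings if rx.fullmatch(t)]
        if len(hits) > 1:
            report.append((name, hits))
    return report


print(masked_ambiguities({"TESTMATCH": "prn.*"}, "^foo/foo<prn><subj>/foo<prn><obj>$"))
```

A finer-grained label such as `prn.subj` would match only one of the two analyses, which is why splitting an over-broad coarse tag removes the masked ambiguity.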
Re: [Apertium-stuff] Apertium-stuff Digest, Vol 71, Issue 48
On 29 March 2013 03:13, Anand Soni anand.92.s...@gmail.com wrote:
> Hello! By the 'toolbar', I just meant the online translation platform of Apertium (http://www.apertium.org).

The main reason I asked is that your idea is quite vague, and I prefer to think in more concrete terms. It's still not clear to me whether you are proposing to integrate an existing package for anaphora resolution into something like Apertium AWI, in a similar way to spelling and grammar checking, to give a human translator a visual indication of pronouns that may have been incorrectly translated; or whether you're proposing to add a module that aims to improve the automatic translation of these pronouns.

> And presently, it is not clear to me how I will go about integrating it with the translator. I want ideas on whether this will be a good addition to Apertium or not. I will definitely think about the integration part.

I was rather hoping that you would make some attempt to explain how you would go about integrating it, because doing so could have made what you're proposing clearer. Until I know exactly what you're proposing, I can't tell you whether or not it's a good idea.
Re: [Apertium-stuff] Apertium-stuff Digest, Vol 71, Issue 40
On 25 March 2013 03:20, Anand Soni anand.92.s...@gmail.com wrote:
> Now, I understand and believe you, Sir, that this would be rather difficult.

We're not that formal. None of us has received a knighthood; there are no 'Sirs' here :)

> I too think that I should work on a project with a higher probability of 'success'.

Good. It's quite easy, when dealing with something new, to underestimate the difficulty of the task, and three months is not a lot of time.

> I have been thinking of other ideas.

Good. I'd be interested in hearing them. You should join the IRC channel - it's a little easier to talk about ideas that are maybe not fully formed when communication is in realtime.

> I will definitely keep in touch with you regarding any idea that comes up in my mind so that I can be guided by you on whether I should do it or not.

I'm not seeking to veto your ideas. If there is something that I think may be more difficult than you think, I will tell you -- and, if you want a clarification, please ask! -- but if you want to insist, then insist!

> And really sorry for the way I replied to the previous mails. I assure you that it was not intentional and definitely will not happen again.

There's no need to apologise. The point of GSoC is to get students involved in Open Source projects, and participating in mailing lists is part of that, and comes with a set of conventions that are not obvious at first. Hopefully, you can appreciate that trimming the email you're replying to down to just the parts you wish to address is a better way to communicate than allowing your words to be lost! (I used 'ignore' in my email yesterday, because that's how it would appear to you; I hope you will pay heed to Bernard's reply, that he tried, and failed, to find what you had written.)
Re: [Apertium-stuff] Apertium-stuff Digest, Vol 71, Issue 37
First of all... Replying to a digest email the way you did is a very good way to be ignored, because your reply is buried in the middle of many emails that have already been read. It also gives the impression that you are either too lazy or too self-important to consider the time of the others on the list, who have to search for your answer. Don't do that: that's not the impression you want to give. Now...

On 24 March 2013 04:14, Anand Soni anand.92.s...@gmail.com wrote:
> Sorry for using the wrong terminology. It is translation, not transliteration. I kind of changed the whole idea by using this word! I do not know how difficult that would be but,

I *do* know; that's why I told you it's too difficult. Perhaps you feel I'm underestimating you because of your slip-up in terminology, but let me assure you that this is not the case. This would be a difficult project for an exceptional, experienced student.

> I would definitely like to start working on this. I will keep asking things here and to my mentor if I get stuck. Also, I will definitely figure out things myself. But this is the project that I would like to do as of now. If I change my idea, I will let you know.

We can only accept a small fraction of the proposals we receive - if we are even fortunate enough to be selected again - and given the choice between an ambitious, but unrealistic, proposal that seems likely to fail, and a more modest, but realistic, proposal that seems likely to succeed, we, as mentors, will typically choose the latter. So your choice is this: pick a more realistic project, or make us believe that *you* can make the project realistic.
Re: [Apertium-stuff] Apertium-stuff Digest, Vol 71, Issue 26
On 23 March 2013 12:19, Anand Soni anand.92.s...@gmail.com wrote:
> Hello Everyone! One of the project ideas that I would like to introduce is an English-Hindi transliteration pair

Transliteration? Now, do you actually mean transliteration, or do you mean translation? You'll have to be quite careful with your terminology.

> that Apertium does not currently support. This language pair, if implemented and released, would be very useful to millions of Indians and would be a nice quality addition to the Apertium toolbox. Also, I plan to do word-sense disambiguation, which is also one of Apertium's proposed ideas. Currently, Apertium has only a limited number of language pairs, and this addition would be valuable. Please give me your feedback on this idea. I will think further about the implementation details and soon submit a proposal for it if this idea sounds good to you.

The standard advice applies: a language pair involving English is too difficult for someone new to Apertium, and even for someone with a lot of experience, it would be difficult to complete in 3 months. I suggest that you take a look at translation between more closely related languages, where the learning curve is lower, and the chance of success is higher.
Re: [Apertium-stuff] Idea for GSOC: tools to train supervised taggers
On 20 March 2013 21:59, Francis Tyers fty...@prompsit.com wrote:
> I've added it to the ideas page. If anyone would like to expand on it, the 'read more' page is here: http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Interface_for_creating_tagged_corpora

I assumed Gema was talking specifically about a web interface ('upload') rather than a desktop tool. (IIRC, Jacob's apertium-viewer can be used for that.)
Re: [Apertium-stuff] Apertium in android
On 18 March 2013 10:19, karunakar medamoni kannaiah.chi...@gmail.com wrote: Hi Jacob, thanks for your reply. I have downloaded the Apertium toolkit for Java, the jar file lttoolbox.jar. Here we are doing translation from Sanskrit to Hindi (http://sanskrit.uohyd.ac.in/scl/). We have compiled a bin file for Sanskrit. Let me explain what I am actually doing. I am trying to get the morphological information of a word from the bin file; its size is around 17MB. Initially I integrated the Apertium code with my code, and it runs fine in Eclipse. On Android, however, I get an out-of-memory error. Please find the logcat of my Android application. I also posted about this error in the Android forums. Later I noticed that Apertium has also released a translator for Android, which is why I have posted my query to Apertium.

I think memory issues like this are part of what prompted Jacob to work on memory mapping the transducers. Try using the latest SVN version of lttoolbox-java and see if you still have the same issue.
Re: [Apertium-stuff] Idea for GSOC
On 12 March 2013 16:58, Tino Didriksen tino.didrik...@gmail.com wrote: On Tue, Mar 12, 2013 at 12:21 PM, Francis Tyers fty...@prompsit.com wrote: On Tue, 12/03/2013 at 10:55, Jimmy O'Regan wrote: Sorry, I wasn't clear enough. The idea is segmentation. I said that segmentation by itself would probably make a good project, where "by itself" was intended to mean that the project would just be segmentation. In practice, you will also have to work on a language pair where this can be used. zh_CN-zh_TW is a perfect candidate, because segmentation is not strictly necessary for this language pair, i.e., you use it to demonstrate that segmentation is working, without _needing_ to. In that regard, you will need to also allot some time to developing that language pair, though it will not be the primary focus of the project.

So this would be for languages where word boundaries are not written ... Chinese/Thai/Lao/Khmer/Burmese etc.? Yes, that could be interesting. But, if it was the case that the project would be for just segmentation, then ideally it would be tested on more than one language.

Sounds trivially done by making a thin shell over ICU's BreakIterator: http://userguide.icu-project.org/boundaryanalysis

Not really. That's aimed more at segmentation for display purposes (wrapping lines and the like), where things like ambiguity in the segmentation are not a pressing concern. We can already get something equivalent in lttoolbox, by setting the dictionary to postblank by default.
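To make the ambiguity point concrete, here is a minimal Python sketch. It is not ICU's BreakIterator and not the Hierarchical HMM method mentioned elsewhere in the thread; it uses two greedy dictionary-based segmenters (forward and backward maximum matching, swapped in purely for illustration) with a hypothetical five-word lexicon, and they disagree about where the word boundaries in the same string fall.

```python
def forward_max_match(text: str, lexicon: set, max_len: int = 4) -> list:
    """Left to right: at each position take the longest lexicon word, else one character."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


def backward_max_match(text: str, lexicon: set, max_len: int = 4) -> list:
    """Right to left: at each position take the longest lexicon word ending there."""
    words, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            candidate = text[j - n:j]
            if n == 1 or candidate in lexicon:
                words.insert(0, candidate)
                j -= n
                break
    return words


# Hypothetical toy lexicon; "研究生命起源" means roughly "study the origin of life".
lexicon = {"研究", "研究生", "生命", "命", "起源"}
print(forward_max_match("研究生命起源", lexicon))   # ['研究生', '命', '起源']
print(backward_max_match("研究生命起源", lexicon))  # ['研究', '生命', '起源']
```

The forward reading starts with 研究生 ("graduate student"), the backward one with 研究 ("study"). A segmenter feeding a translation pipeline has to pick one of these readings, which a line-wrapping API never needs to do.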
Re: [Apertium-stuff] Idea for GSOC
On 12 March 2013 11:21, Francis Tyers fty...@prompsit.com wrote: On Tue, 12/03/2013 at 10:55, Jimmy O'Regan wrote: In practice, you will also have to work on a language pair where this can be used. zh_CN-zh_TW is a perfect candidate, because segmentation is not strictly necessary for this language pair, i.e., you use it to demonstrate that segmentation is working, without _needing_ to. In that regard, you will need to also allot some time to developing that language pair, though it will not be the primary focus of the project. So this would be for languages where word boundaries are not written ... Chinese/Thai/Lao/Khmer/Burmese etc.?

Yes, that's the idea.

Yes, that could be interesting. But, if it was the case that the project would be for just segmentation, then ideally it would be tested on more than one language.

Ideally, yes. In practice... maybe. Language-independent segmentation is not the most well-trodden path, and anything that I have seen that claims language independence was only tested on a single language. The method used in the project I pointed to is one of the few that claims language independence, but that implementation might have some language-specific parts (I can't tell for sure, as the comments are in Chinese, but it doesn't look like it). I would require language independence as a project goal, but wouldn't make it a hard requirement before midterms; it's more of an 'avoid the obvious' guideline. As for actually testing how independent it is... I don't know how that's going to work out. There are plenty of resources for Chinese, and drastically fewer for everything else.
Re: [Apertium-stuff] Idea for GSOC
On 11 March 2013 18:04, sphinx jiang yishan...@gmail.com wrote: Hi, I would like to suggest an idea for the Apertium GSoC program. Several days ago I talked to Jimmy, and was enlightened by the idea "Segmentation by itself".

Sorry, I wasn't clear enough. The idea is segmentation. I said that segmentation by itself would probably make a good project, where "by itself" was intended to mean that the project would just be segmentation. In practice, you will also have to work on a language pair where this can be used. zh_CN-zh_TW is a perfect candidate, because segmentation is not strictly necessary for this language pair, i.e., you use it to demonstrate that segmentation is working, without _needing_ to. In that regard, you will need to also allot some time to developing that language pair, though it will not be the primary focus of the project.

There are ports of Hierarchical HMM segmentation, especially the imdict-chinese-analyzer, which does Chinese segmentation and is written in Java. I think it can be ported to C++ and used in Apertium. Then we can complete a program that translates Chinese zh_CN to zh_TW with its own segmentation. Is my idea possible to achieve? I am looking forward to your reply~~

A straightforward port will not be sufficient. The module will, at the very least, also need to handle the Apertium stream format. Your proposal should take this into account.
Re: [Apertium-stuff] ask for help
On 8 March 2013 16:34, sphinx jiang yishan...@gmail.com wrote: Dear authors of Apertium: Hi. I am a beginner with Apertium who wants to make a Chinese-related language pair, so I need some help from the author of http://apertium.svn.sourceforge.net/viewvc/apertium/incubator/apertium-zh_CN-zh_TW/ Would you please tell me how I can get in touch with the author? Thank you very much~~

The author of that module was a student who was hoping to apply for Google Summer of Code. It was partly a proof-of-concept to see if it could be done without adding a specific module for segmentation (which is still a matter to be determined, though I do recall that the student was quite pleased with the results). If you have any questions, you can ask them here, and we'll do our best to answer.
Re: [Apertium-stuff] Fwd: Apertium PMC Election
On 3 March 2013 19:25, Juan Pablo Martínez Cortés jpm...@unizar.es wrote: I'm not sure, but in case I am entitled to vote:

Sure you are: if you are a registered developer (i.e., if you can commit to the SVN repository), you're entitled to vote.
Re: [Apertium-stuff] The terrible tagger was :Re: New page about transfer rules ready (in French)
On 1 March 2013 09:20, Per Tunedal per.tune...@operamail.com wrote: Next mail??

I sent two emails, one after the other. The second had an example (both forms of 'man') of your coarse tagset being too broad, which underlined the point I was trying to make in the first.
Re: [Apertium-stuff] google translate altering the meaning in eu-es
On 28 February 2013 09:52, Antonio Toral ato...@computing.dcu.ie wrote: Hi apertiumers, I came across a news story in Basque about someone getting in trouble for speaking in Basque to the police: http://www.ateakireki.com/2013/02/lizarrako-gazte-bat-epaituko-dute.html I used Google and Apertium to translate it into Spanish to read it and... Google changes "a youngster speaks in Basque in front of the police" into "a youngster speaks in ENGLISH in front of the police", or the dangers of statistical machine translation!

It's a localisation artifact. A less obvious example of the same factors leads to Austria becoming Ireland (http://itre.cis.upenn.edu/~myl/languagelog/archives/005492.html): for example, on a news website there could be several instances of the phrase 'últimas noticias en español' in the Spanish edition, where the English equivalent would have 'recent news in English'. The language model will typically favour 'in English' because it occurs in English much more often (in a typical corpus) than 'in Spanish'. Another typical error, due to the terrible number handling in most SMT systems, causes 'millón' to become 'billion': Spanish typically uses the long scale, so 'billón' almost never collocates with 'billion', and when '5.000 millón' -> '5 billion' has been naively converted to '_NUM_ millón' -> '_NUM_ billion', you're down to little more than a coin flip whether the output will be 'million' or 'billion'. (Sergio had a paper on mixing Apertium and Moses, where he got better results partly by adding better number handling.)
It's not just SMT, though: simplistic filters lead to the sprinter Tyson Gay becoming Tyson Homosexual (http://languagelog.ldc.upenn.edu/nll/?p=294), and automated currency converters to 50 Cent becoming RM1.50 (http://languagelog.ldc.upenn.edu/nll/?p=3915), and it's not like humans don't make translation errors either. I'm not sure if it's a coincidence that Moses is named for a biblical figure who was the subject of a quite long-lasting translation error (http://en.wikipedia.org/wiki/Moses_(Michelangelo)#Horns)
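Returning to the 'millón'/'billion' problem: a minimal sketch of explicit scale handling, assuming Spanish input with '.' as the thousands separator and ',' as the decimal mark. The value is computed first and only then named with the English short scale, so '5.000 millones' (5 x 10^9) comes out as '5 billion' and 'billones' (10^12) as 'trillion'. The function name and scale tables are illustrative only, not taken from Apertium, Moses, or the paper mentioned above.

```python
# Illustrative scale tables; Spanish uses the long scale, English the short scale.
SPANISH_SCALE = {
    "millón": 10**6, "millones": 10**6,
    "millardo": 10**9, "millardos": 10**9,
    "billón": 10**12, "billones": 10**12,
}
ENGLISH_SCALE = [(10**12, "trillion"), (10**9, "billion"), (10**6, "million")]


def spanish_amount_to_english(amount: str, scale_word: str) -> str:
    """Compute the value first, then name it with the English short scale."""
    # Spanish writes '.' as the thousands separator and ',' as the decimal mark.
    number = float(amount.replace(".", "").replace(",", "."))
    value = number * SPANISH_SCALE[scale_word]
    for size, name in ENGLISH_SCALE:
        if value >= size:
            return f"{value / size:g} {name}"
    return f"{value:g}"


print(spanish_amount_to_english("5.000", "millones"))  # 5 billion
print(spanish_amount_to_english("2", "billones"))      # 2 trillion
print(spanish_amount_to_english("3,5", "millones"))    # 3.5 million
```

Replacing the number with a placeholder such as _NUM_ throws away exactly the magnitude this function needs, which is why the scale word alone cannot decide between 'million' and 'billion'.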
Re: [Apertium-stuff] The terrible tagger was :Re: New page about transfer rules ready (in French)
On 1 March 2013 12:39, Per Tunedal per.tune...@operamail.com wrote: Hi again, thanks for the thorough answer. I've glanced through it and have a few quick comments: 1. The problem right now is that the pronoun man is chosen instead of the indefinite article en: en man (= a man) becomes man man! (Yes, yet another man!) Not that it's never chosen!

The details don't really make a difference, as long as you understand the general idea. Nothing can ever be chosen based on right context. In the tagger. It can be done in transfer, though. I'd prefer to explain this with an example, but I'm rather fussy about transfer, and I perhaps could be convinced to adapt those rules, but not without repeatedly questioning the sanity/competence/parentage/predilections of whoever wrote them, so it'd probably be better if we found time when I could simply rewrite them.

2. Will it be easier or harder for the tagger if I split the pardef for the pronoun man (the way that's common in Apertium) into a pronoun (man - en) and a determiner (ens)? Man blir glad när ens barn ger en blommor. = You get happy when your children give you flowers.

I'm not sure what you mean here.

3. And what about the dialectal variant that uses en instead of man: en - en - ens (now very popular among young trendy people)? What's the least confusing way to handle it?

'en' as a form of 'man' has the same tags, so 1) it would need to be part of a new coarse tag, and 2) that coarse tag would need to use the 'lemma' attribute:

    <def-label name="PRNENSUBJ" closed="true">
      <tags-item lemma="en" tags="prn.pers.p3.ut.sg.nom"/>
    </def-label>

(However, if 'en' and 'man' both lead to the same translation, you can let it be discarded.) It looks to me like you've created a new ambiguity class, by adding 'en' as an analysis of 'man'.
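A rough Python model of what a lemma-restricted def-label such as PRNENSUBJ asks for, assuming a simplified reading of .tsx semantics: a label may restrict itself to one lemma, and '*' in a tag pattern matches any remaining tags. The label list is hypothetical, and in this toy the first matching label wins; the real apertium-tagger may resolve overlapping labels differently, so check its documentation before relying on the order.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical labels: (name, lemma restriction or None, dotted tag pattern).
LABELS = [
    ("PRNENSUBJ", "en", "prn.pers.p3.ut.sg.nom"),
    ("PRN", None, "prn.*"),
    ("DET", None, "det.*"),
]


def coarse_tag(lemma: str, fine_tags: list) -> Optional[str]:
    """Return the first label whose lemma restriction and tag pattern both match."""
    dotted = ".".join(fine_tags)
    for name, required_lemma, pattern in LABELS:
        if required_lemma is not None and lemma != required_lemma:
            continue
        if fnmatchcase(dotted, pattern):  # '*' here also matches across dots
            return name
    return None  # no coarse tag matched this reading


print(coarse_tag("en", ["prn", "pers", "p3", "ut", "sg", "nom"]))   # PRNENSUBJ
print(coarse_tag("man", ["prn", "pers", "p3", "ut", "sg", "nom"]))  # PRN
print(coarse_tag("en", ["det", "ind", "ut", "sg"]))                 # DET
```

The same fine tags land in different coarse tags depending on the lemma, which is what lets the tagger keep 'en' apart from 'man'.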
If there is no corresponding coarse tag in the tagger .tsx file *and* if the tagger has not been trained to determine probabilities for that tag, then it will never, ever, be selected because you have not made it possible for the tagger to select it.

Hmm... It's selected all the time! That's a bit confusing. Why does the tagger choose something previously unknown (man) instead of the indefinite article (en)?

Because it's getting the coarse tag 'PRN', which was presumably the most common of the available options in the corpus the tagger was trained on. 'PRN' contains the tags-item 'prn.*', so it catches a lot. It's probably easiest to think of a 'coarse tag' as a category, by the way.

This is different from what Jacob told you: essentially, that a bigram tagger simply lacks the context to make a correct determination between pronoun and determiner. In many cases, this requires right context (i.e., knowing what the next word is), but the tagger has only left context (i.e., the previous word).

Fine. Now I know how the tagger works. The previous word. I will ponder on that one.

There's more to it than that, but writing an email that long would probably have introduced more confusion (and hurt my wrists :)

at a minimum, those could be adapted[1]:

I will have to treat all other variants as well: p1, p2, and plural.

for other pronouns.

Yes, that's why I gave you a set of commands to find what those tags are :)

Really, you need to adapt the .tsx files and retrain the tagger.

Yes, it couldn't be worse, could it? Francis tells me I need to add a lot more words, though. But I don't think it would do any harm if I retrained the tagger with the few additions and corrections I've already done.

Probably not.

If I've retrained it once, I suppose it would be easy to do it again when even more words are added.

Yes.

[1] There's no need to keep 'prn.subj.*' or 'prn.obj.*' because nothing in the dictionaries matches.

Right.
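A deliberately simplified, greedy illustration of the left-context point, using hypothetical bigram counts: each word's tag is chosen from its ambiguity class using only the previously chosen tag. Apertium's actual HMM tagger estimates probabilities and decodes differently, so treat this as a picture of the failure mode ('en man' coming out as pronoun plus noun), not of the real algorithm.

```python
from collections import Counter

# Hypothetical counts of (previous coarse tag, coarse tag) from a training corpus.
BIGRAMS = Counter({
    ("SENT", "PRN"): 40, ("SENT", "DET"): 25,
    ("PRN", "VERB"): 50, ("PRN", "NOUN"): 5,
    ("DET", "NOUN"): 60,
})


def greedy_left_context(ambiguity_classes: list) -> list:
    """Choose each tag from its ambiguity class using only the previous choice."""
    previous, chosen = "SENT", []
    for options in ambiguity_classes:
        # The next word is never consulted, so right context cannot help here.
        best = max(options, key=lambda tag: BIGRAMS[(previous, tag)])
        chosen.append(best)
        previous = best
    return chosen


# "en man": 'en' may be PRN or DET; 'man' may be PRN or NOUN.
print(greedy_left_context([["PRN", "DET"], ["PRN", "NOUN"]]))  # ['PRN', 'NOUN']
```

Here DET followed by NOUN is the most frequent bigram, but in this greedy toy that evidence only becomes visible one word too late, after 'en' has already been committed to PRN.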
That's because I've changed all subj to nom and all obj to acc to be compatible with other language pairs. Was that a bad thing?

Not in itself, but the .tsx file also needs to be updated. It's also important to note that if you add new categories (= coarse tags), you will need to retrain the tagger; simply updating the rules will not be enough. Otherwise (if the coarse tags have not changed, just the entries in them, as I did with PRNOBJ and PRNSUBJ) it's fine.

    $ lt-expand apertium-sv-da.sv.dix | [etc.]

Would this be a list of words that are not at all caught by the tagger, or what?

No, words that are _partially_ caught by the tagger, i.e., where there are missing coarse tags.
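The pipeline after lt-expand is elided in the thread. As a hypothetical stand-in (not the commands Jimmy sent), this Python sketch reads lt-expand-style lines of the form surface:lemma<tag><tag>..., also accepting the directional ':>:' and ':<:' separators, and reports analyses whose tag sequence matches none of a hand-written list of coarse-tag patterns. The pattern matching ('*' matches any remaining tags) is a simplification of real .tsx semantics.

```python
import re
from fnmatch import fnmatchcase

# Accepts "form:analysis" as well as the directional "form:>:analysis" / "form:<:analysis".
LINE_RE = re.compile(r"^(.*?):(?:[<>]:)?(.*)$")
TAG_RE = re.compile(r"<([^>]+)>")


def uncovered(expanded_lines, patterns):
    """Return (form, analysis) pairs whose tags match none of the coarse-tag patterns."""
    missing = []
    for line in expanded_lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        form, analysis = m.groups()
        dotted = ".".join(TAG_RE.findall(analysis))
        if not any(fnmatchcase(dotted, p) for p in patterns):
            missing.append((form, analysis))
    return missing


sample = [
    "man:man<prn><pers><p3><ut><sg><nom>",
    "en:>:en<det><ind><ut><sg>",
    "glad:glad<adj><pos><ut><sg><ind>",
]
print(uncovered(sample, ["prn.*", "det.*"]))
# [('glad', 'glad<adj><pos><ut><sg><ind>')]
```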
Re: [Apertium-stuff] The terrible tagger was :Re: New page about transfer rules ready (in French)
On 28 February 2013 09:01, Per Tunedal per.tune...@operamail.com wrote: Hi, it might be helpful to have some information on how the tagger (le Tageur redoutable?) actually works. How can I help the tagger when I add words and paradigms to the dictionaries? I suppose the structure of the dictionaries, and specifically the paradigms, has a great impact on the work of the tagger.

Not directly. The tagger is entirely independent of the dictionaries. The fine tags (the tags coming from the dictionary) need to have corresponding coarse tags (the tags used by the tagger) that are sufficient to disambiguate the text. Coarse tags group together equivalent fine tags, which helps to alleviate the data sparseness problem: not all words occur in all contexts, so we group them together so that what we know about classes of words applies to all words in that class. The coarse tags should be as broad as possible, but not too broad: if two word forms match the same coarse tag, then that tag needs to be split, for example. See my next mail.
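One way to check the "not too broad" rule mechanically, sketched in Python with simplified pattern matching ('*' matches any remaining tags, first matching label wins): group each surface form's readings by coarse tag and flag any coarse tag that swallows two different readings, since the tagger then has no way to tell them apart. The readings and labels in the example are hypothetical, not taken from apertium-sv-da.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def collapsed_readings(readings, labels):
    """Flag coarse tags that cover two or more distinct readings of one surface form.

    readings: (surface form, dotted fine-tag string) pairs.
    labels: (coarse tag name, pattern) pairs; the first matching pattern wins.
    """
    groups = defaultdict(lambda: defaultdict(set))
    for surface, tags in readings:
        label = next((name for name, pat in labels if fnmatchcase(tags, pat)), None)
        groups[surface][label].add(tags)
    report = {}
    for surface, by_label in groups.items():
        clashes = {label: sorted(t) for label, t in by_label.items() if len(t) > 1}
        if clashes:
            report[surface] = clashes
    return report


readings = [
    ("en", "prn.pers.p3.ut.sg.nom"),
    ("en", "prn.ind.ut.sg"),
    ("en", "det.ind.ut.sg"),
]
print(collapsed_readings(readings, [("PRN", "prn.*"), ("DET", "det.*")]))
# {'en': {'PRN': ['prn.ind.ut.sg', 'prn.pers.p3.ut.sg.nom']}}
```

A clash like this suggests splitting PRN (for example into personal and indefinite labels) if the two readings translate differently.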
Re: [Apertium-stuff] Apertium PMC Election
On 27 February 2013 10:51, Mikel L. Forcada m...@dlsi.ua.es wrote: Dear Apertiumers, it's time for the Apertium assembly of committers to elect a new Project Management Committee. We need to update our census. I will take care of that. This message is being sent to the apertium-stuff mailing list. This message will also be sent to all developers to their @users.sf.net addresses. If you receive this message and you want to register to vote in this election, please reply to this message, adding your SourceForge developer ID and your full name before March 4 at 23:59 CET.

SF id: jimregan
SF name: Jimmy O Regan
Re: [Apertium-stuff] Apertium-viewer (r42723) crashes
On 26 February 2013 16:54, Ilnar Salimzyan ilnar.salimz...@gmail.com wrote:

    org.apertium.lttoolbox.process.FSTProcessor.analysis(FSTProcessor.java:886)
    at org.apertium.lttoolbox.LTProc.doMain(LTProc.java:284)
    at org.apertium.pipeline.Dispatcher.doLTProc(Dispatcher.java:297)
    at org.apertium.pipeline.Dispatcher.dispatch(Dispatcher.java:381)
    at apertiumview.Pipeline$PipelineTask.run(Pipeline.java:123)
    at apertiumview.Pipeline$1.run(Pipeline.java:41)

(IIRC) This is the equivalent of lt-proc -a, but your mode uses HFST, which is probably what's causing the problem. I think the pipeline has some hardcoded assumptions of what an Apertium pipeline consists of, and this has probably not been tested with Apertium+HFST.
Re: [Apertium-stuff] Cheap bilingual dictionary
On 14 February 2013 09:16, Per Tunedal per.tune...@operamail.com wrote: Thank you! At last I can start working :-) Per PS: Maybe this should be added to the Wiki?

The various editions of Wikipedia are having interwiki links migrated to Wikidata, so it might be better to look into that. If nothing else, it's a cleaner source of data.
Re: [Apertium-stuff] Cheap bilingual dictionary
On 13 February 2013 11:27, Per Tunedal per.tune...@operamail.com wrote: Hi, I'm experimenting with the script on the page http://wiki.apertium.org/wiki/Building_dictionaries . I'm repeatedly getting an error message:

    'import sitecustomize' failed; use -v for traceback

All the same, I get results:

    <e><p><l>plånbok<s n="n"/></l><r>Portemonnæ<s n="n"/></r></p></e>
    <e><p><l>programkod<s n="n"/></l><r>Kildekode<s n="n"/></r></p></e>
    <e><p><l>register<s n="n"/></l><r>Register<s n="n"/></r></p></e>
    <e><p><l>replik<s n="n"/></l><r>Replik<s n="n"/></r></p></e>
    <e><p><l>scanner<s n="n"/></l><r>Skanner<s n="n"/></r></p></e>

The Danish national characters are distorted, though. Any suggestions?

    cat [your file]|perl -MEncode -ane 'chomp;if(m!(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)!){print "$1$2$3".encode("iso-8859-1",decode("utf-8", $4))."$5\n";}'
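The Perl one-liner works by decoding the damaged text as UTF-8 and re-encoding it as ISO-8859-1, which undoes the "UTF-8 read as Latin-1" double encoding (so "Ã¥" becomes "å" again). A rough Python equivalent is sketched below. Unlike the one-liner, it repairs both sides of the entry and leaves a string alone when the round trip fails, which is one way to avoid the '?' damage that appears later in this thread. The regular expression assumes exactly the single-line noun-entry layout `<e><p><l>...<s n="n"/></l><r>...<s n="n"/></r></p></e>` and nothing more general.

```python
import re

# Matches single-line noun entries in the layout used in this thread (assumed).
ENTRY_RE = re.compile(
    r'(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)'
)


def unmangle(text: str) -> str:
    """Undo UTF-8 that was misread as Latin-1 (e.g. 'Ã¥' -> 'å').

    If the round trip fails, the text was not mangled this way and is
    returned unchanged instead of being damaged further.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def fix_entry(line: str) -> str:
    """Repair both the left and the right side of a matching entry."""
    m = ENTRY_RE.search(line)
    if m is None:
        return line
    open_l, left, middle, right, close = m.groups()
    fixed = open_l + unmangle(left) + middle + unmangle(right) + close
    return line[:m.start()] + fixed + line[m.end():]


print(fix_entry('<e><p><l>blomkål<s n="n"/></l><r>BlomkÃ¥l<s n="n"/></r></p></e>'))
# <e><p><l>blomkål<s n="n"/></l><r>Blomkål<s n="n"/></r></p></e>
```

The guard relies on the fact that correctly encoded Scandinavian letters ('å', 'ø', 'æ') become lone bytes in Latin-1 that are not valid UTF-8, so the decode step fails and the original text is kept.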
Re: [Apertium-stuff] Pan-Lexical database on line
On 13 February 2013 16:03, Jimmy O'Regan jore...@gmail.com wrote: On 13 February 2013 15:14, Mikel Forcada m...@dlsi.ua.es wrote: On 02/13/2013 03:06 PM, Federico Gobbo wrote: It's a pity that there is no indication about the copyright, but I think that it can be used anyway by us.

I think that when copyright is not explicitly regulated it amounts to "all rights reserved" according to the Berne Convention. Therefore, my opinion is quite the opposite.

Yes. From a quick look at their sources, the project looks like a lawsuit waiting to happen, so I would avoid it like the plague.

On the upside, in cleanly-licensed (CC-BY-SA) terms, there's a CSV file from the DBPedia-Wiktionary project that provides translations from both en.wiktionary and de.wiktionary (http://downloads.dbpedia.org/wiktionary/dumps/de/wiktionary_de+en_2012-04-01_translations.csv.gz). The entries look like this:

    aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,eel-like,English
    aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,eellike,English
    aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,węgorzowaty,Polish

The endpoint is offline at the moment, but the aim of the project is to provide a linked-data view on multiple editions of Wiktionary simultaneously (i.e., it uses a proper database, but it's a graph database rather than an SQL database). Inflection is not currently extracted, because of the sheer number of templates involved, but most of the rest of Wiktionary is.
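A minimal sketch of turning rows like these into bidix entries for one direction. The column order (word, its language, part of speech, sense gloss, translation, translation language) is only inferred from the three sample rows, and the part-of-speech mapping is hypothetical; check both against the actual dump before relying on them. Spaces in multiword entries are written as `<b/>`, and XML special characters are escaped.

```python
import csv
import io
from xml.sax.saxutils import escape

# Hypothetical mapping from Wiktionary part-of-speech names to Apertium-style tags.
POS_TAGS = {"Noun": "n", "Adjective": "adj", "Verb": "vblex", "Adverb": "adv"}


def rows_to_bidix(csv_text: str, source_lang: str, target_lang: str) -> list:
    """Build <e> entries for rows translating source_lang words into target_lang."""
    entries = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 6:
            continue  # skip blank or malformed rows
        word, lang, pos, _gloss, translation, trans_lang = row
        if lang != source_lang or trans_lang != target_lang or pos not in POS_TAGS:
            continue
        tag = f'<s n="{POS_TAGS[pos]}"/>'
        left = escape(word).replace(" ", "<b/>")
        right = escape(translation).replace(" ", "<b/>")
        entries.append(f"<e><p><l>{left}{tag}</l><r>{right}{tag}</r></p></e>")
    return entries


sample = """aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,eel-like,English
aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,eellike,English
aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,węgorzowaty,Polish
"""
for entry in rows_to_bidix(sample, "German", "English"):
    print(entry)
```

The output still needs human review: it keeps every listed translation (here both 'eel-like' and 'eellike'), and a bidix usually wants one preferred entry per direction.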
Re: [Apertium-stuff] Cheap bilingual dictionary
On 13 February 2013 17:53, Per Tunedal per.tune...@operamail.com wrote: Hi, I just found out that the script for generating bidix entries works alright when translating from the left language to the right language. Translating from Swedish to Danish works OK in the pair sv-da, but if I try to translate a Danish text, the lexical entries are reversed:

    <e><p><l>bagepulver<s n="n"/></l><r>Bakpulver<s n="n"/></r></p></e>
    <e><p><l>basilikum<s n="n"/></l><r>Basilika_<s n="n"/></r></p></e>
    <e><p><l>blade<s n="n"/></l><r>Blad<s n="n"/></r></p></e>
    <e><p><l>blegselleri<s n="n"/></l><r>Selleri<s n="n"/></r></p></e>
    <e><p><l>blini<s n="n"/></l><r>Blinier<s n="n"/></r></p></e>
    <e><p><l>blomkål<s n="n"/></l><r>BlomkÃ¥l<s n="n"/></r></p></e>

You still have the same problem: 'BlomkÃ¥l' should (presumably) be 'Blomkål'.
Re: [Apertium-stuff] Cheap bilingual dictionary
On 13 February 2013 21:00, Per Tunedal per.tune...@operamail.com wrote: Well, I ran your script afterwards, and the Swedish characters were corrected, but the Danish ones were damaged:

Before:

    <e><p><l>BlomkÃ¥l<s n="n"/></l><r>blomkål<s n="n"/></r></p></e>
    <e><p><l>BlÃ¥mussla<s n="n"/></l><r>blåmusling<s n="n"/></r></p></e>
    <e><p><l>Samlag<s n="n"/></l><r>bolde<s n="n"/></r></p></e>
    <e><p><l>Bomb<s n="n"/></l><r>bombe<s n="n"/></r></p></e>
    <e><p><l>Brandy_<s n="n"/></l><r>brandy<s n="n"/></r></p></e>
    <e><p><l>Hallonsläktet<s n="n"/></l><r>brombær<s n="n"/></r></p></e>
    <e><p><l>BröllopstÃ¥rta<s n="n"/></l><r>bryllupskage<s n="n"/></r></p></e>
    <e><p><l>Kvinnobröst<s n="n"/></l><r>bryst<s n="n"/></r></p></e>
    <e><p><l>Bröd<s n="n"/></l><r>brød<s n="n"/></r></p></e>
    <e><p><l>Bulgur<s n="n"/></l><r>bulgur<s n="n"/></r></p></e>
    <e><p><l>Bunsenbrännare<s n="n"/></l><r>bunsenbrænder<s n="n"/></r></p></e>
    <e><p><l>Böna<s n="n"/></l><r>bønne<s n="n"/></r></p></e>
    <e><p><l>Böna<s n="n"/></l><r>bønner<s n="n"/></r></p></e>

After:

    <e><p><l>Blomkål<s n="n"/></l><r>blomk?l<s n="n"/></r></p></e>
    <e><p><l>Blåmussla<s n="n"/></l><r>bl?musling<s n="n"/></r></p></e>
    <e><p><l>Samlag<s n="n"/></l><r>bolde<s n="n"/></r></p></e>
    <e><p><l>Bomb<s n="n"/></l><r>bombe<s n="n"/></r></p></e>
    <e><p><l>Brandy_<s n="n"/></l><r>brandy<s n="n"/></r></p></e>
    <e><p><l>Hallonsläktet<s n="n"/></l><r>bromb?r<s n="n"/></r></p></e>
    <e><p><l>Bröllopstårta<s n="n"/></l><r>bryllupskage<s n="n"/></r></p></e>
    <e><p><l>Kvinnobröst<s n="n"/></l><r>bryst<s n="n"/></r></p></e>
    <e><p><l>Bröd<s n="n"/></l><r>br?d<s n="n"/></r></p></e>
    <e><p><l>Bulgur<s n="n"/></l><r>bulgur<s n="n"/></r></p></e>
    <e><p><l>Bunsenbrännare<s n="n"/></l><r>bunsenbr?nder<s n="n"/></r></p></e>
    <e><p><l>Böna<s n="n"/></l><r>b?nne<s n="n"/></r></p></e>
    <e><p><l>Böna<s n="n"/></l><r>b?nner<s n="n"/></r></p></e>

That's strange, because your script corrected the file translated in the other direction OK.

Yes, because it was expecting the corrupted characters to be on the right, so to go the other way it would need to be:

    perl -MEncode -ane 'chomp;if(m!(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)!){$rec=encode("iso-8859-1",decode("utf-8", $2));if($2 eq lc($2)){$rec=lc($rec);}; print "$1$rec$3$4$5\n";}'
Re: [Apertium-stuff] A more simple example for transfer rules
On 2 February 2013 17:30, Bernard Chardonneau bechapert...@free.fr wrote: The wiki page by Francis about writing transfer rules is interesting (and it is good that it was written), but the example is not simple enough for me to know what to write in the different sections. For instance, in <def-cats>, you seem to describe separate words when transfer rules are supposed to work with groups of words.

No, it's defining tag categories. 'sg', 'pl', 'sp', and 'ND' (number to be determined) fit into the category of 'number', so having a category containing these elements allows us to treat any of these items as one, rather than having to treat them individually. So, if you want to have simple agreement between words, you can take the contents of the relevant category, whatever it is, without having to treat what it _really_ is. So instead of checking if one word contains '<lit-tag v="sg"/>', then '<lit-tag v="pl"/>' ... etc., you can use '<clip .../>' where the 'part' attribute is whatever you named the number category. So, if I have:

    <def-attr n="nbr">
      <attr-item tags="sg"/>
      <attr-item tags="pl"/>
      <attr-item tags="sp"/>
      <attr-item tags="ND"/>
    </def-attr>

and the input word was '^foo<n><sg>$', then <clip pos="1" side="sl" part="nbr"/> equals 'sg'. (Here, 'pos' has the same meaning as with '<b/>': the number of the word relative to the <pattern-item> that matched it; and 'side' is either 'sl' (source language, or input) or 'tl' (target language, or output).)
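A toy Python model of what `<clip part="nbr"/>` does for single-tag attribute items, based on the def-attr in this reply: parse the lexical unit, then return the first of its tags that belongs to the named attribute, or an empty string. It also illustrates the follow-up point in the thread that an attribute matches wherever its tags occur (a noun and a pronoun both yield 'acc'). The 'case' attribute is hypothetical, multi-tag items such as `tags="n.sg"` are not handled, and the real transfer engine works differently internally.

```python
import re

# Attribute definitions mirroring <def-attr>: name -> single-tag <attr-item> values.
ATTRS = {
    "nbr": ["sg", "pl", "sp", "ND"],
    "case": ["nom", "acc"],  # hypothetical second attribute
}

LU_RE = re.compile(r"^\^([^<$]*)((?:<[^>]+>)*)\$$")


def clip(lexical_unit: str, part: str) -> str:
    """Return the tag of attribute `part` in a unit like '^foo<n><sg>$', or ''."""
    m = LU_RE.match(lexical_unit)
    if m is None:
        raise ValueError(f"not a lexical unit: {lexical_unit!r}")
    for tag in re.findall(r"<([^>]+)>", m.group(2)):
        if tag in ATTRS[part]:
            return tag
    return ""


print(clip("^foo<n><sg>$", "nbr"))          # sg
print(clip("^hund<n><pl><acc>$", "case"))   # acc (a noun)
print(clip("^han<prn><acc>$", "case"))      # acc (a pronoun)
print(repr(clip("^snart<adv>$", "nbr")))    # ''
```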
Re: [Apertium-stuff] A more simple example for transfer rules
On 2 February 2013 19:50, Bernard Chardonneau bechapert...@free.fr wrote: OK for that. And according to the wiki, as there can be kinds of regular expressions in <cat-item>

The implementation uses regular expressions, but that is an implementation detail which should not be relied upon.

there can also be patterns with categories of words including a special tag (for instance a noun with an acc attribute).

The category matches the attribute, not a combination of them. So a 'case' category that matches 'nom' and 'acc' will match those attributes wherever they may appear, not just in nouns.
Re: [Apertium-stuff] Using Apertium language pair jar files in an Android app?
On 30 December 2012 12:07, Francis Tyers fty...@prompsit.com wrote: It means the source code for the whole app. IIRC the GPL forbids linking with non-free code. Someone else may be better placed to answer this, though. Fran

On Sun, 30/12/2012 at 16:49 +0530, Mark Carter wrote: Thanks very much for your reply. When you say "the source code", do you mean the entire source code of my app? What about if there was a separate module specifically for the Apertium stuff? Would it be enough to just release the source for that?

The long answer is that it would heavily depend on just how such a module was structured[1]; the short answer is 'no'. Additionally, it is not sufficient to merely release the source code: to vastly over-simplify, the source *and* the build scripts must be released under terms which are compatible with the GPL. The GPL has other requirements that you may not find particularly appealing: the first that comes to mind is that each recipient of a GPL-licensed program is allowed to further redistribute the program under the terms of the GPL.

[1] I expect that you'll excuse me for not spending the time to enumerate these options, as this would run contrary to our interests.
Re: [Apertium-stuff] Android app released
On 30 December 2012 07:36, Mikel Forcada m...@dlsi.ua.es wrote: AUTHORS AND CONTRIBUTORS This app would not have been possible without the support of Google Summer of Code (GSoC) stipends.
2012 GSoC student Mikel Artetxe - Making Java port of lttoolbox (dictionary engine) embeddable
2012 GSoC student Arink Verma - Created an Android app using lttoolbox-java
2012 GSoC Jacob Nordfalk - Mentor of Mikel and Arink; re-architecting/re-engineering of lttoolbox-java for memory-constrained devices; Android app revision, autumn 2012
2010 GSoC student Stephen Tigner - Java port of the Apertium C++ library
2009 GSoC student Raphaël Laurent - Java port of the Apertium C++ library
2008 Nic Cottrell - Initial draft of Java port
2009-2012 Jacob Nordfalk - Maintainer and GSoC mentor
There should be some nod to the authors of the C++ version of Apertium, and to Stephen's involvement in mentoring Arink and Mikel. Also, self-serving as it is, there's a bunch of code in the tagger that I wrote that's still exactly as I left it.
Re: [Apertium-stuff] Duplicate entries in apertium-es-ca monilingual dics
On 29 November 2012 14:25, Jimmy O'Regan jore...@gmail.com wrote:

    $ diff -u sort.dix out.dix | grep '^\-.*v='
    -    <e v="val"><p><l>egeixi</l> <r>egir<s n="vblex"/><s n="prs"/><s n="p3"/><s n="sg"/></r></p></e>

These entries:

    -    <e v="cat"><p><l>eguin</l><r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/></r></p></e>
    -    <e v="cat"><p><l>eguin</l><r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/><j/></r></p><par n="S__anant"/></e>
    -    <e v="cat"><p><l>egui</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/></r></p></e>
    -    <e v="cat"><p><l>egui</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/><j/></r></p><par n="S__vagi"/></e>
    -    <e v="val"><p><l>eguen</l><r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/></r></p></e>
    -    <e v="val"><p><l>eguen</l><r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/><j/></r></p><par n="S__anant"/></e>
    -    <e v="val"><p><l>ega</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/></r></p></e>
    -    <e v="val"><p><l>ega</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/><j/></r></p><par n="S__vagi"/></e>

were genuinely redundant, so I fixed them.
Re: [Apertium-stuff] ACX files
On 18 November 2012 14:55, Bernard Chardonneau bechapert...@free.fr wrote:

    $(PREFVAR2)$(PREFIX1).autogen.bin: $(PREFVAR2)$(LANG2).dix
        apertium-validate-dictionary $(PREFVAR2)$(LANG2).dix
        lt-comp rl $(PREFVAR2)$(LANG2).dix $@ $(BASENAME).$(LANG1).acx

    $(PREFVAR1)$(PREFIX2).autogen.bin: $(PREFVAR1)$(LANG1).dix
        apertium-validate-dictionary $(PREFVAR1)$(LANG1).dix
        lt-comp rl $(PREFVAR1)$(LANG1).dix $@ $(BASENAME).$(LANG2).acx

Two questions: 1) For the second autogen, should $(BASENAME).$(LANG2).acx not be used instead? 2) Are .acx files useful for generation? In reverse order: 2) No. 1) It's not used, so it doesn't matter. ACX is used to specify alternative characters for analysis, and for most language pairs it's not used for much more than to normalise apostrophes (the acx files in fr-es are most likely identical, for example). In rl mode, lt-proc does not process the ACX file.
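As a rough illustration of "alternative characters for analysis", the Python sketch below reads an ACX-style file and maps each alternative character back to its canonical one. It assumes the layout <analysis-chars>, <char value="..."> and <equiv-char value="..."/>, and it rewrites the input text, which only approximates how the analyser actually uses the file; load_equivalences and normalise are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumed ACX layout: a typographic apostrophe (U+2019) is accepted
# wherever the dictionary has a plain ASCII apostrophe.
ACX_SAMPLE = """<analysis-chars>
  <char value="'">
    <equiv-char value="\u2019"/>
  </char>
</analysis-chars>"""


def load_equivalences(acx_text):
    """Map each alternative character to the canonical character it stands for."""
    root = ET.fromstring(acx_text)
    mapping = {}
    for char in root.iter("char"):
        canonical = char.get("value")
        for equiv in char.iter("equiv-char"):
            mapping[equiv.get("value")] = canonical
    return mapping


def normalise(text, mapping):
    """Replace alternative characters by their canonical forms."""
    return "".join(mapping.get(c, c) for c in text)


equivalences = load_equivalences(ACX_SAMPLE)
print(normalise("l\u2019home", equivalences))  # l'home
```

Since the mapping only matters when matching input against the dictionary, it also shows why an ACX file has nothing to contribute in the generation (rl) direction.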
Re: [Apertium-stuff] google code in task descriptions
On 1 November 2012 07:20, Mikel Forcada m...@dlsi.ua.es wrote: On 10/31/2012 10:49 PM, Francis Tyers wrote: Make a 50 sentences long translation memory Why so short? I think it is quite easy if one finds text and aligns it... Why from wikipedia? To have open content text without having to explain the issues is one good reason. Also, while working on Spanish-Aragonese, I noticed that, quite often, the first sentences of a pair of equivalent articles were parallel, or almost parallel, even if the rest of the text diverges. There may be other areas - template properties, image descriptions, descriptions on Wikimedia Commons, etc. - where we could be looking for parallel sentences that become more visible once we've seen a collection of them.
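Once first sentences of equivalent articles have been paired up by hand, packaging them as a translation memory is mechanical. A minimal Python sketch, assuming TMX 1.4 as the target format and that the sentence pairs are already collected; build_tmx and the creationtool value are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_tmx(pairs, src_lang, tgt_lang):
    """Return a TMX 1.4 document (as a string) for (source, target) sentence pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "wiki-tm-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per sentence pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


pairs = [("Dublin is the capital of Ireland.", "Dublín es la capital de Irlanda.")]
print(build_tmx(pairs, "en", "es"))
```

Fifty such pairs give a 50-sentence memory; the hard part remains finding and checking the parallel sentences, not writing the file.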
Re: [Apertium-stuff] Debugging?
On 30 October 2012 16:07, Yannis Haralambous yannis.haralamb...@telecom-bretagne.eu wrote: dear Apertium people, is it possible to follow the structural transfer of a sentence step by step? For example: what are the chunks, which rule is applied to each, what is the result for each chunk. In other words, is there a debugging option for the structural transfer module? You can get this with apertium-transfer -t but it's not available from the script (it wouldn't make sense) -- you'll have to manually provide the entire pipeline.
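A minimal sketch of that manual pipeline, written as a Python helper that only assembles the shell command string (it runs nothing). The file names follow a common pair layout (es-ca.automorf.bin, es-ca.prob, apertium-es-ca.es-ca.t1x, es-ca.t1x.bin, es-ca.autobil.bin), but they are assumptions about your installation, as is the single transfer stage; pairs with further transfer stages need those appended.

```python
def trace_pipeline(pair, src, tgt, pair_dir="."):
    """Build the pipeline up to structural transfer, with rule tracing (-t) on.

    pair is e.g. "es-ca"; src/tgt give the translation direction.
    File names are assumed, not read from the pair's modes file.
    """
    d = pair_dir.rstrip("/")
    direction = f"{src}-{tgt}"
    stages = [
        "apertium-destxt",                                # deformatting
        f"lt-proc {d}/{direction}.automorf.bin",          # morphological analysis
        f"apertium-tagger -g {d}/{direction}.prob",       # POS disambiguation
        "apertium-pretransfer",
        f"apertium-transfer -t {d}/apertium-{pair}.{direction}.t1x "
        f"{d}/{direction}.t1x.bin {d}/{direction}.autobil.bin",  # traced transfer
    ]
    return " | ".join(stages)


print(trace_pipeline("es-ca", "es", "ca", "/usr/share/apertium/apertium-es-ca"))
```

Piping a test sentence into the printed command shows the traced transfer output directly, without generation, so each chunk can be inspected before lt-proc -g would turn it back into surface forms.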
Re: [Apertium-stuff] CorpusCatcher
On 26 October 2012 13:41, Per Tunedal per.tune...@operamail.com wrote: Hi, what's the status of corpuscatcher? I would like to get a monolingual corpus, but corpuscatcher uses Yahoo to crawl the web, and I get an error message about a deprecated Yahoo search API. Any updates? Any way to circumvent the issue? Any alternatives? I've only taken a quick look at github, but it seems to be under pretty active development. Have you tried the development version? Or asking corpuscatcher's developers?
Re: [Apertium-stuff] Bitextor installation
On 26 October 2012 13:57, Raymond HS raymh...@gmail.com wrote: Hi Jim, For the Antara website, I think most of their stories are not translations (more like comparable than parallel). But I believe there are some of them that are direct translations. Actually it will be good if Bitextor can use some linguistic information (like bilingual dictionary) during the alignment process. :) IIRC, Bitextor only uses document structure. If you already have a set of aligned documents, Hunalign can use a dictionary to improve existing sentence alignments, and maligna can additionally create IBM Model 1 models. Finding parallel document pairs in comparable corpora is a less researched problem, but Felipe's doctrans project (http://code.google.com/p/doctrans/) happily does that - you'll need a phrase table from Moses to use it, though.
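To give a feel for the lexical signal a bilingual dictionary adds on top of structure or length, here is a minimal Python sketch that scores a candidate sentence pair by the fraction of source tokens with a dictionary translation in the target. This is an illustration of the idea only, not Hunalign's scoring; the whitespace tokenisation and the toy Indonesian-English dictionary are assumptions.

```python
def dictionary_overlap(src_sentence, tgt_sentence, bidict):
    """Fraction of source tokens that have at least one dictionary
    translation among the target tokens (0.0 for an empty source)."""
    src_tokens = src_sentence.lower().split()
    tgt_tokens = set(tgt_sentence.lower().split())
    if not src_tokens:
        return 0.0
    hits = sum(1 for tok in src_tokens if bidict.get(tok, set()) & tgt_tokens)
    return hits / len(src_tokens)


# Toy dictionary, invented for illustration.
bidict = {"rumah": {"house"}, "besar": {"big", "large"}}
print(dictionary_overlap("rumah besar", "big house", bidict))  # 1.0
print(dictionary_overlap("rumah itu", "the house", bidict))    # 0.5
```

Averaged over sentences, a score like this can also help rank candidate document pairs in comparable corpora, though a real system would combine it with length and position cues.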
Re: [Apertium-stuff] CorpusCatcher
On 26 October 2012 15:27, Per Tunedal per.tune...@operamail.com wrote: Hi, Strange. I only find files two years old. Have you found any newer files somewhere? Do you have any more information? I googled 'corpuscatcher'
Re: [Apertium-stuff] Bitextor installation
On 26 October 2012 15:51, Raymond HS raymh...@gmail.com wrote: Hi Jim, Finding parallel document pairs in comparable corpora is a less researched problem, but Felipe's doctrans project (http://code.google.com/p/doctrans/) happily does that - you'll need a phrase table from Moses to use it, though. Thanks for giving me this information. This is probably what I need, and I am also working with Moses at the moment. Have you tried compiling the program? Not recently. I do recall that I had to patch something to get it to compile, though. Now that I think of it, I think the Moses interfaces changed in the meantime, so it might be some effort to get running. Felipe is subscribed to this list, and might be able to provide some insight, when he has time. When I ran the configure script, it doesn't seem to find PhraseDictionaryTreeAdaptor.h in Moses. This is how I ran the configure script (the Moses program is already installed locally):

    ./configure --with-srilm=$HOME/software/srilm --with-moses=$HOME/software/mosesdecoder --prefix=$HOME/local

and the error output:

    checking PhraseDictionaryTreeAdaptor.h usability... no
    checking PhraseDictionaryTreeAdaptor.h presence... no
    checking for PhraseDictionaryTreeAdaptor.h... no
    configure: error: Cannot find MOSES!

Is there a bug in the script? Thanks again for your help. :) This might be because of the interface change I mentioned above.
Re: [Apertium-stuff] Bitextor installation
On 25 October 2012 17:09, Raymond HS raymh...@gmail.com wrote: Hi everyone, I wanted to try Bitextor to get some parallel texts from the Web. I have installed all the required libraries on my Ubuntu, but when I tried to compile the Bitextor source code, I got the following error:

    g++ -g -O2 -o bitextor BitextCandidates.o TranslationMemory.o DownloadMod.o FilePreprocess.o GlobalParams.o Heuristics.o WebFile.o WebSite.o Bitextor.o -L/home/raymondhs/local/lib -ltagaligner3 -lenca -lm -lxml2 -ltre -ltidy -ltextcat
    /home/raymondhs/local/lib/libtagaligner3.so: undefined reference to `std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > EditDistanceTools::EditDistanceBeam<short>(std::vector<short, std::allocator<short> >, std::vector<short, std::allocator<short> >, double (*)(short const, short const, short const), bool const, double const, double*)'
    collect2: ld returned 1 exit status

So there seems to be a linker error in libtagaligner, and I can't really figure out why. Any hint why this could happen? Thanks! Missing template instantiation. I sent Miquel a patch (attached) for this in 2010, but I guess he never got around to applying it. [Attachment: tagaligner.patch]
Re: [Apertium-stuff] apertium es-de
On 25 October 2012 20:10, Isabel Imbernón isabelimber...@gmail.com wrote: Hi, I've been trying to cross the en-es.dix with the en-de.dix to get the es-de.dix, but I can't manage it. I've been doing it according to the wiki page about crossdics, so I use the script:

    apertium-dixtools cross-param monA.dix -n bilAB.dix -n bilBC.dix monC.dix

which in my case would be:

    apertium-dixtools cross-param dics/apertium-en-es.es.dix -n dics/apertium-en-es.en-es.dix -n dics/apertium-en-de.en-de.dix dics/apertium-en-de.de.dix

Isn't that right? However, I get the following errors: Clipping down to the relevant part:

    Reading file null/schemas/cross-model.xml
    Error (null/schemas/cross-model.xml): /Users/isaimbernon/null/schemas/cross-model.xml (No such file or directory)

Can anyone explain to me what's happening? You need to provide a cross model. There's some documentation on the wiki (http://wiki.apertium.org/wiki/Cross_Model). I find that the best thing to do is to just use the default model (schemas/cross-model.xml in the distribution), and edit the detected patterns that crossdics outputs.
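The core idea behind crossing can be sketched in a few lines of Python: combine every A-B entry with every B-C entry that shares the pivot lemma and part of speech. This is a simplification under stated assumptions. apertium-dixtools also matches tag patterns according to the cross model, which this sketch ignores, and the (lemma, lemma, pos) tuples are toy data.

```python
from collections import defaultdict


def cross(bil_ab, bil_bc):
    """Cross A->B and B->C entries through the pivot language B.

    Each entry is (source_lemma, target_lemma, pos); entries are combined
    only when the pivot lemma and part of speech agree.
    """
    by_pivot = defaultdict(list)
    for b, c, pos in bil_bc:
        by_pivot[(b, pos)].append(c)
    crossed = set()
    for a, b, pos in bil_ab:
        for c in by_pivot.get((b, pos), []):
            crossed.add((a, c, pos))
    return sorted(crossed)


es_en = [("casa", "house", "n"), ("perro", "dog", "n"), ("banco", "bank", "n")]
en_de = [("house", "Haus", "n"), ("dog", "Hund", "n"),
         ("bank", "Bank", "n"), ("bank", "Ufer", "n")]
for entry in cross(es_en, en_de):
    print(entry)
```

Note that "banco" comes out paired with both "Bank" and "Ufer" (riverbank): an ambiguous pivot word multiplies candidates, which is one reason crossed output needs review before it goes into a dictionary.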
Re: [Apertium-stuff] apertium es-de
On 25 October 2012 20:57, Jimmy O'Regan jore...@gmail.com wrote: On 25 October 2012 20:10, Isabel Imbernón isabelimber...@gmail.com wrote: Hi, I've been trying to cross the en-es.dix with the en-de.dix to get the es-de.dix, but I can't manage it. I've been doing it according to the wiki page about crossdics, so I use the script:

    apertium-dixtools cross-param monA.dix -n bilAB.dix -n bilBC.dix monC.dix

which in my case would be:

    apertium-dixtools cross-param dics/apertium-en-es.es.dix -n dics/apertium-en-es.en-es.dix -n dics/apertium-en-de.en-de.dix dics/apertium-en-de.de.dix

Isn't that right? However, I get the following errors: Clipping down to the relevant part:

    Reading file null/schemas/cross-model.xml
    Error (null/schemas/cross-model.xml): /Users/isaimbernon/null/schemas/cross-model.xml (No such file or directory)

Can anyone explain to me what's happening? You need to provide a cross model. There's some documentation on the wiki (http://wiki.apertium.org/wiki/Cross_Model). I find that the best thing to do is to just use the default model (schemas/cross-model.xml in the distribution), and edit the detected patterns that crossdics outputs. In case anyone was interested, thanks to Isabel's feedback, the cross-param function in the apertium-dixtools script is a little more robust, and the crossing process no longer creates empty dictionaries if there is a mismatch of sections.
Re: [Apertium-stuff] Google Code-in tasks
On 23 October 2012 20:35, Bernard Chardonneau bechapert...@free.fr wrote: Date: Mon, 22 Oct 2012 20:14:35 +0100 From: Jimmy O'Regan jore...@gmail.com To: Apertium-stuff apertium-stuff@lists.sourceforge.net Reply-To: apertium-stuff@lists.sourceforge.net Subject: [Apertium-stuff] Google Code-in tasks http://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in Feel free to add tasks - we now have the minimum 5 tasks in each category, but more are welcome. Please bear in mind that these are intended to be completed by high school students. In fact, for these tasks, which can be done quickly, which matters more, and by how much: - asking new people to do something useful for the project? - making new people discover the Apertium project? The latter fits into 'outreach', which, along with 'code', 'documentation', 'user interface', and 'research', is one of the areas of contribution. All are considered equally important. Another problem is mentoring. As this work would be short, with the result in 48 hours or less, whoever mentors it will need to be available at short notice. As for me, I cannot promise anything. That may depend on the hour, the day of the week, and the week of the year. For anyone who is new, or relatively new, to GCI (or GSoC), my recommendation would be to wait until the competition is under way, and to take an observational role on a handful of tasks. Seeing how it works in practice will give a better idea of what's involved than any explanation, and will also allow a type of 'meta mentoring' (mentoring new mentors) that is quite difficult to provide outside of the programme. Unlike last year, we do not need to 'stockpile' tasks, but can add them as the competition progresses, so there isn't a vital need to think of everything in advance.
Re: [Apertium-stuff] bug apertium.
On Wednesday, 24 October 2012, Mikel Forcada m...@dlsi.ua.es wrote: On 10/24/2012 08:38 PM, erik wrote: Hi, hello, look, I am trying to solve this problem with gaupol and apertium. One of the developers of gaupol thinks the problem lies in apertium. Can you help me? https://bugzilla.gnome.org/show_bug.cgi?id=686772 Thanks in advance! Erik, I am copying this message to apertium-stuff to see if someone can please help (perhaps Kevin Unhammer), as I don't know what the problem is. But I suspect the problem is in the way apertium is being invoked from gaupol. If I invoke /usr/local/bin/apertium -l in my installation, what I get is the list of installed language pairs, and the status is zero. My version is Apertium 3.2.0. Older versions of Apertium didn't have the -l option. The bug report mentions Ubuntu - the packages in Ubuntu (via Debian) are ancient, so I'd assume that's what's happening.
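For a front end such as gaupol, one defensive option is to treat a failing apertium -l as "pair listing unavailable" rather than as a fatal error. A minimal Python sketch, assuming apertium -l prints one pair per line and exits non-zero (or is simply absent) on installations that predate the option; installed_pairs is a hypothetical name, not gaupol's code.

```python
import subprocess


def installed_pairs(command=("apertium", "-l")):
    """Return the language pairs listed by `apertium -l`, or None if unavailable.

    None covers both a missing executable and a non-zero exit status, which
    is what we assume an older apertium without -l would produce.
    """
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        return None
    if result.returncode != 0:
        return None
    # One pair per line, e.g. "en-es"; blank lines are ignored.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

A caller can then fall back to a manually configured pair list when the result is None, instead of reporting an error that really comes from an outdated package.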
Re: [Apertium-stuff] Google Code-in tasks
On 23 October 2012 07:40, Mikel L. Forcada m...@dlsi.ua.es wrote: Hi there, if the Apertium OmegaT plugin has not been modified since it was contributed, there is a task that a Java programmer could probably attempt with a bit of help from some of our Java experts: escaping OmegaT's format codes (<u>, <i1>, etc.) so that Apertium does not translate them. This involves researching a bit what kind of format tags OmegaT produces (I can help here), and writing a quick filter that does that. Would this be an adequate GCI task? It looks like a quick hack to me, but requires a bit of guidance. What do you guys think? This is exactly the sort of task we're looking for :)
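The filter itself would be written in Java for the plugin, but the idea fits in a short Python sketch. It assumes OmegaT tags look like <b0>, </b0> or <f1/> (letters, optional digits, optional slash), and that text in square brackets is treated by Apertium as a superblank and passed through untranslated, provided the text is handed to the pipeline after the deformatting stage so the brackets are not escaped again. The set of characters escaped elsewhere is an assumption about the stream format, and protect/restore are hypothetical names.

```python
import re

# Assumed shape of OmegaT tags: <b0>, </b0>, <f1/>, <u>, ...
TAG_RE = re.compile(r"(</?[A-Za-z]+\d*/?>)")
# Assumed set of characters reserved by the Apertium stream format.
SPECIAL = set("\\[]^$@/<>{}")


def protect(segment):
    """Wrap OmegaT tags in superblanks and backslash-escape reserved characters."""
    out = []
    for i, piece in enumerate(TAG_RE.split(segment)):
        if i % 2:  # odd indices are the captured tags
            out.append("[" + piece + "]")
        else:
            out.append("".join("\\" + c if c in SPECIAL else c for c in piece))
    return "".join(out)


def restore(translated):
    """Undo protect(): drop superblank brackets and unescape characters."""
    out = []
    i = 0
    while i < len(translated):
        c = translated[i]
        if c == "\\" and i + 1 < len(translated):
            out.append(translated[i + 1])
            i += 2
        elif c in "[]":  # unescaped brackets delimit superblanks
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(protect("Hello <b0>world</b0>!"))  # Hello [<b0>]world[</b0>]!
```

The round trip only holds if the translation step keeps superblanks intact, which is exactly what the plugin-side research into OmegaT's tag formats would need to confirm.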
[Apertium-stuff] Google Code-in tasks
http://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in Feel free to add tasks - we now have the minimum 5 tasks in each category, but more are welcome. Please bear in mind that these are intended to be completed by high school students.
Re: [Apertium-stuff] Google Code-in
On 16 October 2012 15:43, Francis Tyers fty...@prompsit.com wrote: Hey all, It's that time of year again! Next Monday we'll be applying again for the Google Code-in. One of the most important things we need is a list of tasks suitable for 13-17 year olds. http://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in There are some changes this year with respect to last year: * There are no translation tasks :( * No difficulty rating * No monetary incentive this year. IIRC, they also changed it so there are two winners from each organisation, chosen by the organisation.
Re: [Apertium-stuff] Word selection by sens was: Re: Adding Swedish nouns from SALDO to da-se was: Re: Danish - Swedish Nouns
On 9 October 2012 14:14, k...@keldix.com wrote: On Tue, Oct 09, 2012 at 09:41:41AM +0200, Per Tunedal wrote: Hi Keld, I liked your algo but had to think it over. After I'd slept on it, a few things came to mind: My initial go on an algorithm is then: I found a homonym. Each of the homonyms has a placement in the meaning tree via its father and mother relations. Unfortunately, I've no idea what the father relation is. Maybe you should follow only the mother relations? The father relation is meant to discriminate between the same mother relations. So maybe it can be of help. I don't know. I take it into account to generalize wordnet-like structures; there may be more than one relation from a given homonym. Saldo is not a WordNet (and its creators don't claim that it is, only that it is equivalent for some purposes), and this is one of the major differences. WordNet synsets can have an unlimited number of typed references, whereas Saldo has a maximum of two untyped references (what the type is depends on the pair, and does not seem to be encoded anywhere that's publicly available). On the plus side, there is a relatively complete set of mappings between the English WordNet and Saldo, so WordNet types could be inferred from those alignments, though how accurate the results would be remains to be seen. And a general Apertium wordnet module and algorithm should be able to handle more than one upwards relation. In the monodix markup this could then be marked with a rel tag, and more rel tags may be present. I need input from people more in the know on whether this could be the recommended way to mark up such meaning relations in the monodix. The problem with using WordNet is that the synsets are simultaneously too fine grained -- i.e., they represent a distinction without a difference when it comes to translation, such as 'tree' the plant vs. 'tree' meaning a tree-like structure (parse tree, family tree, etc.)
-- and too coarse grained -- synsets are conceptual, rather than lexical, so while 'panther' and 'leopard' are the same animal, we can never say 'black leopard' or 'a panther never changes its spots' -- to be useful for MT. In addition, there is no indication of the relative importance of a sense, which may be too obscure for inclusion in a translation lexicon (e.g., 'torpedo' meaning 'hitman' is a sense of that word that I have only seen in WordNet). If you were to give some thought to how you might split, merge, and prune WordNet synsets into something that's useful for translation, then you might be able to generate some interest.
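The structural difference described above can be made concrete with a small Python sketch: a SALDO-style sense has one obligatory untyped "mother" link and at most one untyped "father" link, so following only mothers yields a single path to the root, unlike WordNet's unlimited typed references. The sense identifiers and links below are invented for illustration and are not real SALDO data; SaldoSense and mother_chain are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SaldoSense:
    """A SALDO-style sense: at most two untyped upward links."""
    sense_id: str
    mother: Optional[str]          # None only for the root sense
    father: Optional[str] = None   # optional second link


def mother_chain(sense_id, senses):
    """Follow mother links upwards, returning the path (stops at the root,
    at an unknown id, or if a cycle is detected)."""
    path = []
    current = sense_id
    while current is not None and current in senses and current not in path:
        path.append(current)
        current = senses[current].mother
    return path


# Toy data: two homonyms of "bank" placed under different mothers.
senses = {
    "PRIM..1": SaldoSense("PRIM..1", None),
    "pengar..1": SaldoSense("pengar..1", "PRIM..1"),
    "sand..1": SaldoSense("sand..1", "PRIM..1"),
    "bank..1": SaldoSense("bank..1", "pengar..1"),
    "bank..2": SaldoSense("bank..2", "sand..1", father="hav..1"),
}
print(mother_chain("bank..1", senses))  # ['bank..1', 'pengar..1', 'PRIM..1']
print(mother_chain("bank..2", senses))  # ['bank..2', 'sand..1', 'PRIM..1']
```

Because the links carry no type, the chain says where a homonym sits, but not how it relates to its mother, which is the information a WordNet alignment would have to supply.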
Re: [Apertium-stuff] Word selection by sens was: Re: Adding Swedish nouns from SALDO to da-se was: Re: Danish - Swedish Nouns
On 9 October 2012 15:14, Francis Tyers fty...@prompsit.com wrote: * For Swedish-Danish this will be unnecessary. * For other language pairs in Apertium, there are no free WordNets. Thus the method would have zero applicability. es-ca, es-it, ca-it, en-es, en-ca, es-gl, en-gl are all candidates (Spanish, Catalan, Galician, and Italian WordNets were all released under CC-BY earlier this year).
Re: [Apertium-stuff] Word selection by sens was: Re: Adding Swedish nouns from SALDO to da-se was: Re: Danish - Swedish Nouns
On 9 October 2012 15:59, Francis Tyers fty...@prompsit.com wrote: On Tue, 09 Oct 2012 at 15:50 +0100, Jimmy O'Regan wrote: On 9 October 2012 15:14, Francis Tyers fty...@prompsit.com wrote: * For Swedish-Danish this will be unnecessary. * For other language pairs in Apertium, there are no free WordNets. Thus the method would have zero applicability. es-ca, es-it, ca-it, en-es, en-ca, es-gl, en-gl are all candidates (Spanish, Catalan, Galician, and Italian WordNets were all released under CC-BY earlier this year). The whole shebang? Or just a part? -- I know that 10% or so of the es/ca ones have been available in FreeLing for a while. Yep, all. The ca one was available under the GPL for a while. Ooh, cool: http://adimen.si.ehu.es/web/MCR Shame about the Basque one though. Yeah, but even though it's one of the crappy CC licences, it's still a CC licence, so uses that are prohibited by the database laws (and not by copyright) are fair game. Anyway, if you get something working, I have test data for English--Spanish and would be happy to compare any method using WordNet to my methods. I'm interested in wordnets for other reasons, I was just mentioning they're there. Frankly, I still think that trying to use raw wordnet data for lexical selection would be a massive waste of time, but if someone wants to prove me wrong, more power to them.
Re: [Apertium-stuff] Danish - Swedish was: Re: Swedish - Norwegian
On 7 September 2012 07:26, Per Tunedal per.tune...@operamail.com wrote: Hi, originally the translation was broken as both lines used <e>. The translations of the two Swedish words inte and icke should be ikke in Danish. In the opposite direction the Danish ikke should in most cases be translated with inte in Swedish, but in some contexts icke would be better. As Apertium cannot yet handle a one-to-two relation, I changed the line with icke to <e r="LR">. 5 minutes of googling leads me to believe that 'har ikke gjort' in Danish translates as 'har inte gjort' in Swedish, and this is the sort of case that can be handled in a rule. (It might be wrong, but it doesn't matter for the example.) I'll assume that applies for all past participles (and gloss over that it changes to the supine in Swedish, because where there's 'har inte gjort', I assume you can also have 'har gjort', so that change would be best handled in a macro). I'm also going to skip over defining the 'def-cat' pieces for the 'pattern-item' parts - it's enough to mention that they do have to be defined.
You can handle that in two ways: you can either ignore 'ikke', and replace it completely:

    <rule>
      <pattern>
        <pattern-item n="haver"/>
        <pattern-item n="ikke"/>
        <pattern-item n="pp"/>
      </pattern>
      <action>
        <let>
          <clip pos="2" side="tl" part="lem"/>
          <lit v="inte"/>
        </let>
        <out>
          <lu><clip pos="1" side="tl" part="whole"/></lu>
          <b pos="1"/>
          <lu><lit v="inte"/><lit-tag v="adv"/></lu>
          <b pos="2"/>
          <lu><clip pos="3" side="tl" part="whole"/></lu>
        </out>
      </action>
    </rule>

or you can change its lemma:

    <rule>
      <pattern>
        <pattern-item n="haver"/>
        <pattern-item n="ikke"/>
        <pattern-item n="pp"/>
      </pattern>
      <action>
        <let>
          <clip pos="2" side="tl" part="lem"/>
          <lit v="inte"/>
        </let>
        <out>
          <lu><clip pos="1" side="tl" part="whole"/></lu>
          <b pos="1"/>
          <lu><clip pos="2" side="tl" part="whole"/></lu>
          <b pos="2"/>
          <lu><clip pos="3" side="tl" part="whole"/></lu>
        </out>
      </action>
    </rule>

It seems to me that other adverbs could fit in the same place as 'ikke', so you could use a test instead:

    <rule>
      <pattern>
        <pattern-item n="haver"/>
        <pattern-item n="adv"/>
        <pattern-item n="pp"/>
      </pattern>
      <action>
        <choose>
          <when>
            <test>
              <equal>
                <clip pos="2" side="sl" part="lem"/>
                <lit v="ikke"/>
              </equal>
            </test>
            <let>
              <clip pos="2" side="tl" part="lem"/>
              <lit v="inte"/>
            </let>
          </when>
        </choose>
        <out>
          <lu><clip pos="1" side="tl" part="whole"/></lu>
          <b pos="1"/>
          <lu><clip pos="2" side="tl" part="whole"/></lu>
          <b pos="2"/>
          <lu><clip pos="3" side="tl" part="whole"/></lu>
        </out>
      </action>
    </rule>

The 'choose' part basically means 'if the lemma part of the source language (sl) is ikke, then change the target language (tl) to inte (but do nothing otherwise)'.
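The effect of the third rule can be mirrored in a few lines of Python, which may help when checking what the choose/when block does to a matched chunk. This is a simulation under stated assumptions, not apertium-transfer: each lexical unit is a (source, target) pair of "lemma<tag>..." strings, the haver + adv + pp pattern is assumed to have matched already, and the bilingual dictionary is supposed to have produced "icke" for "ikke". split_lu and apply_ikke_rule are hypothetical names.

```python
import re

LU_RE = re.compile(r"^([^<]*)((?:<[^>]+>)*)$")


def split_lu(lu):
    """Split 'ikke<adv>' into ('ikke', '<adv>')."""
    m = LU_RE.match(lu)
    if m is None:
        raise ValueError(f"not a lexical unit: {lu!r}")
    return m.group(1), m.group(2)


def apply_ikke_rule(chunk):
    """chunk: three (sl, tl) pairs already matched by haver + adv + pp.

    If the source lemma at position 2 is 'ikke', the target lemma at that
    position becomes 'inte' (tags kept); otherwise nothing changes.
    """
    out = []
    for pos, (sl, tl) in enumerate(chunk, start=1):
        if pos == 2 and split_lu(sl)[0] == "ikke":
            _, tags = split_lu(tl)
            tl = "inte" + tags
        out.append(tl)
    return " ".join(out)


chunk = [("have<vbhaver><pri>", "ha<vbhaver><pri>"),
         ("ikke<adv>", "icke<adv>"),
         ("gøre<vblex><pp>", "göra<vblex><pp>")]
print(apply_ikke_rule(chunk))  # ha<vbhaver><pri> inte<adv> göra<vblex><pp>
```

With any other adverb in position 2 the chunk passes through unchanged, which is the "do nothing otherwise" branch of the rule.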
Re: [Apertium-stuff] Formatters and deformatter for man pages and mnémonic files added to apertium
On 4 September 2012 20:38, Bernard Chardonneau bechapert...@free.fr wrote: So, when you wrote '... you should put these somewhere else' (approved by Mikel), I thought the problem was having put the files directly in trunk before other people had tested them, not having put them in the apertium directory. Definitely the latter (don't put them in the apertium/ directory). trunk/ is a lot more relaxed, but we do expect things in trunk to be release quality - if it doesn't work out of the box, we'll expect you to move it - but there aren't any problems with creating new modules. As it has been explained now, there is no problem creating a new directory (in trunk, I think, in this case) for my new formatter and deformatters, with a makefile and 2 shell scripts called apertium-man and apertium-mnemo to allow simpler usage, as the apertium shell script will not support these formats. Right. And if you think a version based on an XML file may be interesting to put in apertium trunk, that may be something interesting to develop, but not during the next 4 months for me (and as there will be another working solution, there is no urgency). Sure.
Re: [Apertium-stuff] Formatters and deformatter for man pages and mnémonic files added to apertium
On 3 September 2012 09:52, Bernard Chardonneau bechapert...@free.fr wrote:
> Hello,
> As mentioned in another email in August, I have developed deformatters and reformatters for man pages and mnemonic files. I have added the source files in http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium/apertium/

Unless Sergio says otherwise, you should put these somewhere else.
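For readers new to Apertium formats: a deformatter hides non-translatable formatting inside bracketed "superblanks" so the translation pipeline passes it through, and the reformatter restores it afterwards. The Python sketch below shows that round trip for a tiny subset of man-page (roff) markup. It is not Bernard's code. The reserved-character set, the decision to treat only control-line macro names and the \fB/\fI/\fR/\fP font escapes as formatting, keeping newlines inside superblanks, and all function names are assumptions made for illustration.

```python
import re

# Assumption: characters that must be backslash-escaped in the Apertium stream.
RESERVED = "\\[]^$@/<>"

# Hypothetical choice: only roff font escapes count as inline formatting.
FONT_ESCAPE = re.compile(r"\\f[BIRP]")


def escape(text: str) -> str:
    """Backslash-escape every reserved stream character."""
    return "".join("\\" + c if c in RESERVED else c for c in text)


def superblank(fmt: str) -> str:
    """Wrap formatting in [...] so the translator passes it through untouched."""
    return "[" + escape(fmt) + "]"


def deformat_line(line: str) -> str:
    out = []
    # roff control lines start with "." or "'": hide the macro name (e.g. ".SH ").
    if line[:1] in (".", "'"):
        m = re.match(r"[.'][^ ]*\s*", line)
        out.append(superblank(m.group(0)))
        line = line[m.end():]
    pos = 0
    for fm in FONT_ESCAPE.finditer(line):
        out.append(escape(line[pos:fm.start()]))  # translatable text
        out.append(superblank(fm.group(0)))       # \fB, \fI, ... as formatting
        pos = fm.end()
    out.append(escape(line[pos:]))
    return "".join(out)


def deformat(page: str) -> str:
    # Assumption: newlines are kept inside superblanks to preserve line structure.
    return superblank("\n").join(deformat_line(ln) for ln in page.split("\n"))


def reformat(stream: str) -> str:
    """Inverse sketch: drop unescaped superblank brackets and undo escapes."""
    out = []
    i = 0
    while i < len(stream):
        c = stream[i]
        if c == "\\" and i + 1 < len(stream):
            out.append(stream[i + 1])  # escaped character: keep it literally
            i += 2
        elif c in "[]":
            i += 1  # superblank delimiter: drop it
        else:
            out.append(c)
            i += 1
    return "".join(out)


page = ".SH NAME\nfoo \\fBbar\\fR <baz>"
assert reformat(deformat(page)) == page
```

Because every reserved character is escaped and only unescaped brackets are treated as superblank delimiters, reformat(deformat(page)) == page holds for any input in this sketch; a real man-page deformatter would need to cover far more of roff (tables, .TP paragraphs, other escapes).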
Re: [Apertium-stuff] Formatters and deformatter for man pages and mnémonic files added to apertium
On 3 September 2012 16:04, Bernard Chardonneau bechapert...@free.fr wrote:
>> On 09/03/2012 01:05 PM, Jimmy O'Regan wrote:
>>> Unless Sergio says otherwise, you should put these somewhere else.
>> +1
>> Mikel
> Well, when I asked about that 5 weeks ago, I didn't get any answer on that point.

Without trawling back through old mail, my recollection is that Kevin Unhammer pointed you to the mediawiki deformatter, which is in a separate directory. I for one took the implication to be 'this is the example to follow', including where to put it; I assume others did too. I also felt that there was nothing to be said to improve on that answer; again, I assume others did too.

I'm sorry we didn't make that sufficiently clear, _but_ when a) you are diverging from existing conventions, and b) you are using a different programming language, I for one feel that the default position should be 'make this a separate module', and I'm relatively confident that the majority of open source projects take a similar stance, particularly when it comes to their core software.