Re: [Apertium-stuff] [GSOC] Unify the metadix formats Queries (Mikel Forcada)

2014-03-04 Thread Jimmy O'Regan
On 4 March 2014 10:44, Gaurav Agrawal ergaur...@gmail.com wrote:
 Hello Mikel,

 Sorry for the late reply; I was busy with assignment work at my
 university.


 Hi Gaurav:
 have you read about metadix? Have you understood how metadix
 dictionaries are converted to the .dix format used by Apertium
 compilers?

 -- I have read the "Contributing to an existing pair" page on the wiki,
 as well as the wiki page about the dictionary (.dix) file format, and I have
 done the basic installation. I can't find any information about metadix
 files on the wiki; can you please suggest some resources?


I would suggest that you check the files apertium-fr-es.fr.metadix in
the apertium-fr-es package, and apertium-en-ca.en.metadix in the
apertium-en-ca package, which demonstrate respectively the use of the
prm and sa elements.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you

--
Subversion Kills Productivity. Get off Subversion  Make the Move to Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works. 
Faster operations. Version large binaries.  Built-in WAN optimization and the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951iu=/4140/ostg.clktrk
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Regarding Joining the mailing list

2014-03-04 Thread Jimmy O'Regan
On 4 March 2014 16:29, TARUN GUPTA tarungupta@lnmiit.ac.in wrote:
 Sir, when I click the confirmation link it shows "invalid confirmation
 string". Please tell me what to do.

The confirmation link is only valid once, so clicking it a second time
will tell you that it's invalid. It's fine, you're subscribed, there's
nothing to worry about.

Now -- in a separate thread, please! -- you can talk to us about the
project you're interested in.



Re: [Apertium-stuff] [GSOC] Unify the metadix formats Queries

2014-03-03 Thread Jimmy O'Regan
On 28 February 2014 04:28, Gaurav Agrawal ergaur...@gmail.com wrote:
 Hello All,

 I am Gaurav Agrawal, student of M.Tech in Computer Science and Engineering
 at IIIT, Hyderabad.

 I am very much interested in machine learning and want to spend my summer
 contributing to open source. So GSoC is the best opportunity, and
 Apertium is the best organization in machine learning for me.

 As I have good prior knowledge of XML and Java, and also basic
 knowledge of Python and shell scripts, I found the project "Unify
 the metadix formats" interesting and suitable for me.

 Thanks to #Unhammer #firespeaker #wei2912 for suggesting wiki
 pages for a basic understanding of the Apertium project and for the
 installation.

 Presently, I am working on the Coding Challenge :)

 I have a few queries about it:

 1)

 For the entry:
 <e r="RL" lm="débil"><i>débil</i><par n="abdominal__adj"/></e>

 Output suggested:

 (débil::débil)[abdominal_adj]; # débil

 But as it is RL, i.e. Right to Left, as per my understanding it should be:

 (débil::débil)[abdominal_adj]; # débil ?


You are correct.

 2)

 Similarly for the conversion of :
 <e r="LR" lm="inapropiado"><i>inapropiad</i><par n="absolut/o__adj"/></e>

 Output Suggested:
 (inapropiad::inapropiad)[absolut/o_adj]; # inapropiado

 But as it is LR, i.e. Left to Right, as per my understanding it should be:

 (inapropiad::inapropiad)[absolut/o_adj]; # inapropiado ?


You are correct.

 3)

 The entry:
  <e lm="multa de tráfico"><i>multa</i><par
 n="abeja__n"/><p><l><b/>de<b/>tráfico</l><r><g><b/>de<b/>tráfico</g></r></p></e>

 should become:

 (multa:multa)[abeja__n](_de_tráfico)); # multa de tráfico

 We have both the left (l) and right (r) parts in the pair (p):

 <p><l><b/>de<b/>tráfico</l><r><g><b/>de<b/>tráfico</g></r></p></e>

 But in the conversion we only have (_de_tráfico)) and not
 (_de_tráfico:_de_tráfico)). Is that because the left and right parts are
 equal?
 If yes, do we do it this way only when there are multiwords with inner
 inflection and we have the g tag?
 How will we treat the case where the left and right parts are different with
 the g tag?


I would assume that the output of
'<p><l><b/>de<b/>tráfico</l><r><g><b/>de<b/>tráfico</g></r></p></e>'
should be '(_de_tráfico:#_de_tráfico)' -- i.e., that p is processed
as usual, and that g inserts the '#' symbol as in the text stream.
'(_de_tráfico)' is the output I would expect to see for
<i><b/>de<b/>tráfico</i>
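
A minimal Python sketch of that reading, under stated assumptions: the b element is written as '_', the g element contributes a leading '#', an i element yields a single side, and a p element yields 'left:right' from its l and r children. The function names are hypothetical, and this is not the project's actual converter.

```python
import xml.etree.ElementTree as ET


def flatten(node):
    """Render the mixed content of a dix element as plain text.

    Assumptions: <b/> is written as '_' and <g> prefixes its content with '#'.
    Any other child element is ignored apart from its trailing text.
    """
    out = [node.text or ""]
    for child in node:
        if child.tag == "b":
            out.append("_")
        elif child.tag == "g":
            out.append("#" + flatten(child))
        out.append(child.tail or "")
    return "".join(out)


def render(fragment):
    """Render an <i> or <p> fragment in the parenthesised form used above."""
    node = ET.fromstring(fragment)
    if node.tag == "i":
        return "(" + flatten(node) + ")"
    if node.tag == "p":
        left = flatten(node.find("l"))
        right = flatten(node.find("r"))
        return "(" + left + ":" + right + ")"
    raise ValueError("expected <i> or <p>, got <%s>" % node.tag)


PAIR = "<p><l><b/>de<b/>tráfico</l><r><g><b/>de<b/>tráfico</g></r></p>"
IDENTITY = "<i><b/>de<b/>tráfico</i>"
print(render(PAIR))
print(render(IDENTITY))
```

With these assumptions the pair renders with both sides and the '#', while the identity element renders a single side.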




[Apertium-stuff] GSoC Proposal: Diacritic restoration (was: Re: Helping you as a gsoc applicant.)

2014-02-28 Thread Jimmy O'Regan
On 28 February 2014 18:21, Alex Aruj alex.a...@gmail.com wrote:
 Hi group,


Hi.

One part of GSoC is that you will learn how to engage with an open
source community; you've taken your first step. Good job!

A necessary part of interacting with open source communities is to
communicate via mailing lists, and there is a certain amount of
etiquette involved.

In this particular instance, what you have done is usually called
thread hijacking -- you've sent a mail as a reply to another, but on
a completely different topic. A normal thread on a mailing list is
essentially a single conversation, and interjecting an email on an
unrelated topic interrupts the usual flow of conversation. This is bad
for us, as the result can be a confusing mix of two separate
conversations, under the same heading. It's bad for you, as a GSoC
applicant, because there may come a time when one of the mentors will
need to refer to an earlier part of the communication with you, and
will find it difficult to find your email.

In future, please write a new email when writing on a new subject,
rather than using 'reply'. It's a minor inconvenience to copy and
paste the mailing list address, but it's more than outweighed by the
later inconvenience involved if you need to refer back to an earlier
email.

To help you, I've changed the subject to one more appropriate to your proposal.

 I am considering tackling the 'restoration of diacritic marks' task. I am in
 the middle of my second semester of C++ and winding down my full-time job in
 a translation company in order to study computational issues related to
 language and work freelance in my pair ES-EN, and possibly to develop more
 in PT-EN. Anyway, back to GSOC:

 Is the priority to make the charlifter case-sensitive and for it to respect
 superblanks exactly as in the example in the box laid out here
 http://wiki.apertium.org/wiki/Superblanks?


Respecting superblanks is a must: diacritic restoration must not be
applied to them.

Case should definitely be _respected_: the output needs to match the
input in terms of case.

As for case sensitivity, Kevin Scannell is the person to ask for a
definitive answer.  My feeling is that case sensitivity can
potentially be more accurate, but in the absence of sufficient data,
case insensitive (trained on lowercase) should be the default.
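
To make the two requirements concrete, here is a small Python sketch. It is only an illustration, not Charlifter's code: the lookup table stands in for a trained model, superblanks are assumed to be plain square-bracketed spans (escaped brackets are ignored), and all names are hypothetical.

```python
import re

# Assumption: a superblank is any "[...]" span; escaped brackets are not handled.
SUPERBLANK = re.compile(r"(\[[^\]]*\])")
WORD = re.compile(r"\w+")


def match_case(source, restored):
    """Give the restored word the same case pattern as the input word."""
    if source.isupper():
        return restored.upper()
    if source[:1].isupper():
        return restored[:1].upper() + restored[1:]
    return restored


def restore(text, table):
    """Restore diacritics outside superblanks, using a lowercase lookup table."""
    def fix_word(match):
        word = match.group(0)
        return match_case(word, table.get(word.lower(), word))

    parts = SUPERBLANK.split(text)
    # With a capturing group, split() puts the superblanks at odd indices.
    return "".join(
        part if index % 2 else WORD.sub(fix_word, part)
        for index, part in enumerate(parts)
    )


TABLE = {"cafe": "café"}  # stand-in for a model trained on lowercase text
print(restore('Cafe [<b class="cafe">]CAFE[</b>] cafe', TABLE))
```

The lookup is case insensitive, but the output keeps the case of the input, and nothing inside the brackets is touched.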

 Should the tasks be done in this order or according to applicant interest?
 http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Accent_and_diacritic_restoration


The task itself is to port Charlifter.

Adding rule-based replacements can be done in a number of ways, but
possibly the easiest (and likely most effective) way would be to do so
in a similar manner to apertium-tagger -- by adding non-statistically
derived probabilities (i.e., you insert a high probability for a
rule-based replacement).
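
As a rough Python sketch of that idea (the data structures, names and the 0.99 figure are all assumptions for illustration, not anything taken from Charlifter or apertium-tagger): the statistical candidates are kept, and a rule simply injects its replacement with a high probability so that it wins the comparison.

```python
def pick(word, stat_probs, rules, rule_prob=0.99):
    """Choose a restoration for `word`.

    stat_probs maps a word to {candidate: probability} from a trained model;
    rules maps a word to a hand-written replacement. Both are hypothetical.
    """
    # Fall back to the unchanged word when the model knows nothing about it.
    candidates = dict(stat_probs.get(word, {word: 1.0}))
    if word in rules:
        # The rule is not derived statistically; it is inserted with a
        # high probability so that it outranks the model's candidates.
        candidates[rules[word]] = rule_prob
    return max(candidates, key=candidates.get)


STATS = {"resume": {"resume": 0.6, "résumé": 0.4}}
RULES = {"resume": "résumé"}
print(pick("resume", STATS, {}))
print(pick("resume", STATS, RULES))
```

The advantage of doing it this way is that the selection code stays the same whether or not a rule exists for the word.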

Training models is necessary to test the system -- this is a
non-code task, and cannot be a requirement. You will need to train
multiple models, because testing with one will not be sufficient, but
whatever you can manage of the remainder during the wrap-up time
should be sufficient.

"Inform charlifter with target-language information..." -- I think
this is necessary to make this a full GSoC project (that is, I don't
imagine a port of Charlifter will take 3 months by itself). Ideally,
this should be started before midterms, but taking midterms as a
starting point would be fine.

 Are the main coding skills needed for this task boolean operations, loops
 and file input/output knowledge, or is there something exotic I should be
 aware of (see next question ;) )?

 Anything to help understand finite state automata in this process? Are the
 different nodes basically functions that are called as the diacritic mark,
 word, structure is analyzed?

The port is the important part. There may be some 'exotic' stuff,
depending on how much time is left over, but you'll just be calling
functions, not implementing them. Nothing scary :)



Re: [Apertium-stuff] Translation memories with Apertium

2014-02-15 Thread Jimmy O'Regan
On 14 February 2014 19:04, Mikel Forcada m...@dlsi.ua.es wrote:
 Jim, apertiumers:
 I have explained the task better in the Ideas page. Maybe it will become
 clear that this is far from being trivial. Fran and I were talking this
 afternoon and he can tell you about that too.

My mistake, I assumed this was related to a project idea I had listed
in previous years.

In essence, you seem to be talking about a command-line version of
Miquel's OmegaT plugin. Is that the case?




Re: [Apertium-stuff] GSoC Idea : Stepping into Web 3.0 with WebRTC+Apertium

2014-02-14 Thread Jimmy O'Regan
On 14 February 2014 03:29, Aayush Kothari aayush.kothar...@gmail.com wrote:
 Thank you, Jimmy and Keld for your inputs.

 I gave the Speech Translation wiki page a thorough read and also looked up
 GenieTalk. Here's what I want to add:

  The Web Speech API* open-sourced by Google

Open source? In what way is uploading sound files to a proprietary
server open source?
(Open API != open source).

 The idea is to exploit the in-browser capability and do away with the need
 to download or install anything on your computer/tablet/phone. It should be
 as simple as just picking up a Nexus 7 or an iPad, going to a URL in Chrome
 and beginning to talk.


How do you propose to add translation? Via the web service?

 Please let me know if I'm overlooking a caveat

Honestly, too many to list.

Most importantly, your list of potential applications is still more
the stuff of science fiction (I mentioned Star Trek for a reason), and
I was rather hoping that if you did enough background reading, you
would realise that yourself.

 or going beyond scope here

I can only imagine that it is. Calling a bunch of Javascript APIs to
get the basic 'you speak, translate the ASR output, encode it via TTS,
send over a WebRTC channel' would not take even a week to do. You
haven't talked about the details, but if you're just calling APIs for
translation, ASR/TTS, etc., then I imagine that you intend to spend
the bulk of the project working on how to route the conversation to
inject the translation in a way that's not intrusive. That's more a
WebRTC project than Apertium.



Re: [Apertium-stuff] GSOC idea: improve support for non-standard input

2014-02-13 Thread Jimmy O'Regan
On 13 February 2014 09:55, Francis Tyers fty...@prompsit.com wrote:
 You'll need to discuss licensing with Apple and get them to change the
 terms for their Application Shop so that GPL programs are allowed.

The Free Software Foundation already did this (someone added an app
based on GNU Go), and got nowhere. Good luck with that!



Re: [Apertium-stuff] GSoC Idea : Stepping into Web 3.0 with WebRTC+Apertium

2014-02-12 Thread Jimmy O'Regan
On 12 February 2014 17:59, Aayush Kothari aayush.kothar...@gmail.com wrote:
 Hello all,

 Forgive me if this project already exists somewhere, but this is something I
 have truly wanted implemented since I learned what WebRTC is about and
 capable of.
 So the idea is this: WebRTC already gives you the ability to have
 in-browser audio/video chats, and there are many implementations of this
 already out there. But what none of them do is allow communication
 between two people who speak different languages, something
 that led to the demand for human and, eventually, computer-aided translators
 such as Google Translate (sadly not free anymore) and Apertium. With my
 idea, and constantly evolving web browsers, it'd be a wonderful gift for a
 huge chunk of internet users.


Speech-to-speech translation is the dream of anyone who grew up
watching Star Trek :)

 A basic idea of what it'd do:

 It would allow a Japanese guy and a French guy to speak to the browser in
 their native languages and display what the Japanese person actually meant
 in French on the French guy's screen.

 It also gives you the chance to speak in Japanese but be heard in French on
 the other side by having the bot (such as a SpeechSynthesisUtterance
 instance) speak out a translated version of what you said.

As well as speech synthesis, you would need speech recognition.

I'd suggest that you start with
http://en.wikipedia.org/wiki/Speech_translation and follow the links
in the article, to familiarise yourself with what would be involved.




Re: [Apertium-stuff] GSOC idea: make an app for Iphone/Ipad

2014-02-05 Thread Jimmy O'Regan
On 05/02/2014, Francis Tyers fty...@prompsit.com wrote:
 On Wed, 05 Feb 2014 at 13:56 +0100, Xavi Ivars wrote:

 2014-02-05 Francis Tyers fty...@prompsit.com:

 A related question for Mikel:

 How much work would it be to make Mitzuli support HFST and VislCG in
 translators? Would it be enough work to make a GSOC project, do you
 think? It's a really sought-after feature here (in Tromsø).

 The main problem I see here is having (and maintaining) an
 Android-compatible Java port of both libraries (if the Android NDK
 can't be used).

 As far as I am aware there is/was a Java library for HFST,[1]

If you know who to ask to add a licence, that would make a good first step.



Re: [Apertium-stuff] GSOC 2014 intro

2014-01-14 Thread Jimmy O'Regan
On 14 January 2014 10:39, Prateek Gupta prateekgupta.3...@gmail.com wrote:
 Hello,

Hi!

 I am a B.E. student from India interested to participate in GSOC 2014 with
 Apertium organization.

The participating organisations for GSoC 2014 have not been selected
yet, and there is no guarantee that Apertium will be selected. While
it seems likely, given Apertium's multi-year participation, please
remember that it is not certain.

 Can anyone guide me to the current development of the project and its
 probable ideas and help me to understand the project for a better
 understanding?

http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code contains
last year's ideas -- it's a good way to get an idea of the types of
projects that are likely to be selected.




Re: [Apertium-stuff] apertium docs in epub format

2014-01-10 Thread Jimmy O'Regan
On 10 January 2014 15:11, Francis Tyers fty...@prompsit.com wrote:
 Does anyone have experience with epub ? How hard would it be to get the
 PDF documentation in epub format ? (I think it's basically XHTML in a
 zip file).

It can be, plus an index and metadata in XML (there's another format
intended for screen readers that can be used instead of XHTML). Calibre
(http://calibre-ebook.com) should handle the conversion.
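
To show the "XHTML in a zip, plus XML metadata" layout, here is a Python sketch that builds a bare skeleton in memory. It is only an illustration of the container structure: file names such as OEBPS/ch1.xhtml and the title and identifier values are made up, a complete EPUB also needs a navigation document, and the result has not been checked with any validator. It does not show the Calibre conversion itself.

```python
import io
import zipfile

# Points the reader at the package file that holds the metadata and index.
CONTAINER = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Metadata, list of files (manifest) and reading order (spine).
PACKAGE = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="id">example-id</dc:identifier>
  </metadata>
  <manifest>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

CHAPTER = """<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Hello.</p></body>
</html>
"""


def build_epub_skeleton():
    """Return the bytes of a zip laid out like an EPUB container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        # The mimetype entry comes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/content.opf", PACKAGE,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/ch1.xhtml", CHAPTER,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(build_epub_skeleton())) as epub:
    print(epub.namelist())
```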



Re: [Apertium-stuff] [POSSIBLESPAM]: Re: [POSSIBLESPAM]: Re: Call for bids: Apertium maintenance

2013-12-29 Thread Jimmy O'Regan
On 29 December 2013 14:57, Mikel Forcada m...@dlsi.ua.es wrote:
 On 12/29/2013 03:50 PM, aboobacker sidheeque mk wrote:
 Maybe, but I am a person, not a company. BTW, I am currently trying to
 create an Apertium PPA for Ubuntu,
 https://launchpad.net/~aboobackervyd/+archive/apertium ; it is not
 completed yet :-)
 I wonder what the project management committee would say, but personally
 I wouldn't have any problem hiring a person instead of a company. If it
 were up to me: if you think you can provide the services, prepare a bid
 which is attractive technically as well as economically, and you'll be
 taken into consideration.

I'll second that.



Re: [Apertium-stuff] A design limitation: perfect format handling in transfer may be impossible

2013-12-25 Thread Jimmy O'Regan
On 25 December 2013 09:01, Mikel Forcada m...@dlsi.ua.es wrote:
 On 12/24/2013 08:51 PM, Jimmy O'Regan wrote:
 This is a known issue (e.g., Jacob mentions it in this thread from
 2009: http://sourceforge.net/mailarchive/forum.php?thread_name=20cf28cd0904300204v45f35e51i118f4d146f83748%40mail.gmail.com&forum_name=apertium-stuff)
 A minor quibble: Sergio's message does not address XML validity at all,
 which is one of the key points in my message.

Many quibble returns: I said Jacob (3rd message in the thread), not Sergio :)

Merry Christmas.



Re: [Apertium-stuff] A design limitation: perfect format handling in transfer may be impossible

2013-12-24 Thread Jimmy O'Regan
On 24 December 2013 15:34, Mikel Forcada m...@dlsi.ua.es wrote:
 Hi all,

 As part of my work with students in the Google Code-In (notably
 galaxyfeeder) I have found a limitation in the current design of
 Apertium, as regards handling of format tags (encapsulated as
 superblanks) in Apertium.

 I would appreciate it very much if someone has time to turn this message
 into a proper bug report, although, as will be seen, rather than a bug, it
 is a design limitation.

 Since transfer rules (.t1x, .t2x) have to move superblanks around
 explicitly, it may be the case that valid HTML or XML is rendered
 invalid. For instance, a translated ODT file may not open, or a
 translated XHTML page may not be valid.


This is a known issue (e.g., Jacob mentions it in this thread from
2009: 
http://sourceforge.net/mailarchive/forum.php?thread_name=20cf28cd0904300204v45f35e51i118f4d146f83748%40mail.gmail.com&forum_name=apertium-stuff)

 For instance, a rule can move around <b pos="1"/> and <b pos="2"/>. If <b
 pos="1"/> is <sometag> and <b pos="2"/> is </sometag>, the result
 is that </sometag> comes before <sometag>, leading to invalid XML or HTML.

 Similar validity errors may be introduced when tags are lost or repeated.

 Careful writing of rules may avoid this. In each rule, one can always
 make sure to output superblanks in the same order, and as late as possible,
 so that the format is preserved as much as possible.

 But not everything can be avoided this way.

 Even if superblanks inside a .t1x chunk are correctly handled, .t2x may
 move chunks around (with their superblanks inside, so nothing can be
 done about it) and lead to invalid HTML or XML.

 I see no easy way to solve this without a serious redesign of blank
 management (perhaps by keeping a standoff list of blanks outside the
 stream). But I think it's good to be aware of it.


Matxin's format (which is already supported by some of the tools)
might be a good starting point for this, but it would be best to use
an XML parser for XML-based formats. You mentioned ITS support as a
wishlist item not too long ago, which would make parsing a
requirement; perhaps it would be best to bundle the two together for a
GSoC project.
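
As an illustration of why a real XML parser helps here, the following Python sketch checks whether a translated fragment is still well-formed after its format tags have been moved. It is only a sketch: the function name is hypothetical, the fragment is wrapped in a dummy root element so that a parser will accept mixed text, and it says nothing about how Apertium or Matxin actually handle blanks.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML inside a dummy root."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


# The opening and closing tags in their original order.
print(is_well_formed("<em>casa</em> blanca"))
# The same two tags after a rule has swapped the blanks that carry them.
print(is_well_formed("</em>blanca <em>casa"))
```

A check like this could be run on the output of a translation to detect the reordering problem, even though it does not repair it.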



Re: [Apertium-stuff] Task ideas for Google Code In

2013-12-07 Thread Jimmy O'Regan
On 7 December 2013 10:02, Gabriel Esteban Gullón yufu...@gmail.com wrote:
 Hi,

 I'm Gabriel Esteban, one of this year's GCI students. The other day
 I downloaded the Apertium app on my phone, and I saw a lot of things that
 can be improved (I also downloaded the Apertium app from the svn). In the
 following lines I will put forward all the things that I think can be
 improved.

 Translations of the app. The app is only translated into English and
 French; I propose creating some tasks for translating the app into other
 languages. It's easy: you only need to translate one XML file by hand. If
 you want, I can take on the task of collecting all the files created by
 other students and uploading them to the svn.
 Design. I propose creating a task that asks for implementing
 ActionBarSherlock, a library that allows an ActionBar to be included on
 devices running Android 2.2 and above. (I don't know what license
 ActionBarSherlock uses; finding out can be another task.)

https://github.com/JakeWharton/ActionBarSherlock says Apache 2.0



Re: [Apertium-stuff] Short technical question

2013-12-02 Thread Jimmy O'Regan
On 2 December 2013 12:17, Yannis Haralambous
yannis.haralamb...@telecom-bretagne.eu wrote:
 thanks for your answer!

 concerning the tagger training, I think there is a lack of information on the 
 Wiki.

 1) if I choose unsupervised training, there is a page describing what to do, 
 starting with a raw text file in the given language. It is not clear whether 
 the TSX file is generated during the unsupervised training, or whether it has 
 to exist already


It has to exist already. There is currently nothing that generates adequate TSX.

 Also, what do you mean by closed list? In the examples given or in the 
 existing TSX files, I don't see why some lists of fine tags are called 
 closed and others not...


It's 'closed' if nothing new will be added to it, 'open' otherwise.
Prepositions and conjunctions are (usually) closed, while nouns and
verbs are typically open.

 2) the supervised training method not being documented...

 ...I was wondering whether I can produce a TSX file by running TreeTagger on 
 some large amount of text and then search for frequent/forbidden patterns in 
 the tags produced? If it works, it would mean that all I need to do is to 
 establish a match between TreeTagger tags (the coarse ones), and Apertium 
 tags (the fine ones).


TreeTagger's tags would more or less correspond to fine tags in Apertium.

 Final question: is there somewhere a description of the .prob file format?


No, there's nothing to describe it other than the code that reads and
writes it. You can get a dump of the probabilities using the prob2text
tool that comes with the tagger training tools package.



Re: [Apertium-stuff] Short questions

2013-12-01 Thread Jimmy O'Regan
On 1 December 2013 10:13, Yannis Haralambous
yannis.haralamb...@telecom-bretagne.eu wrote:
 Hi again,

 could you please help me in understanding the semantics of the structural 
 transfer module programming language, by answering a few short questions?

 I'm reading the code apertium-es-ca.ca-es.t1x.

 In the rule called REGLA: NOM you use the macro f_enviaa. In this macro you 
 have the following code:

 <equal>
   <clip pos="1" side="sl" part="a_npant"/>
   <lit-tag v="np.ant"/>
 </equal>

 I do understand that you test the value of attribute a_npant of the class, it 
 is defined as follows:

 <def-attr n="a_npant">
   <attr-item tags="np.ant"/>
 </def-attr>

This part of the macro is irrelevant to 'REGLA: NOM', as it will never
contain 'np.ant'. If you look at the pattern:
  <pattern>
    <pattern-item n="nom"/>
  </pattern>

which in section-def-cats is:

<def-cat n="nom">
  <cat-item tags="n.*"/>
</def-cat>

you'll see that it can never match 'np.ant'. I presume this macro is
used in other rules, which can match np.ant.
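
A small Python sketch of why the 'nom' category cannot match 'np.ant'. It is a simplification and not the transfer module's own matcher: tags are compared as whole dot-separated pieces, and '*' is assumed here to stand for one or more further tags. The function names are hypothetical.

```python
import re


def cat_item_regex(tags):
    """Turn a cat-item tags value such as 'n.*' into a regex over dotted tags."""
    pieces = []
    for piece in tags.split("."):
        if piece == "*":
            # Assumption: '*' stands for one or more further tags.
            pieces.append(r"[^.]+(?:\.[^.]+)*")
        else:
            pieces.append(re.escape(piece))
    return re.compile(r"\.".join(pieces))


def matches(tags, lexical_tags):
    """True if the whole tag sequence fits the cat-item pattern."""
    return cat_item_regex(tags).fullmatch(lexical_tags) is not None


print(matches("n.*", "n.m.sg"))       # a common noun
print(matches("n.*", "np.ant.m.sg"))  # an anthroponym: first tag is 'np', not 'n'
```

Because the first tag has to be exactly 'n', a lexical unit whose first tag is 'np' is never selected by this pattern.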


 Is the purpose of the test that the two tags np (proper noun) and ant 
 (anthroponym) should be present in the source token?


No, it is not a test that they _should_ be present, it is a test for
_if_ they are present.

The macro can be explained as: if the variable 'valverb' contains the
value '2', and if _either_ the sl tags contain 'np.ant' _or_ the tl
lemma is in the list 'huma', then output the preposition 'a'. The
second part of the 'or' is what makes this macro relevant to this rule: the
tl lemma may be in the list 'huma'.
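
The same condition written out as a Python sketch, purely to show the and/or structure. The function name, the dotted tag string and the contents of the 'huma' list are assumptions for illustration; the real list is defined in the .t1x file.

```python
def should_output_a(valverb, sl_tags, tl_lemma, huma):
    """Mirror the macro's test: valverb is '2' AND (np.ant OR lemma in huma)."""
    tags = sl_tags.split(".")
    is_anthroponym = tags[:2] == ["np", "ant"]
    return valverb == "2" and (is_anthroponym or tl_lemma in huma)


HUMA = {"persona", "amigo"}  # hypothetical entries

# A common noun matched by the 'nom' rule: only the 'huma' test can fire.
print(should_output_a("2", "n.m.sg", "amigo", HUMA))
print(should_output_a("2", "n.f.sg", "casa", HUMA))
# valverb is not '2', so nothing is output whatever the noun is.
print(should_output_a("1", "n.m.sg", "amigo", HUMA))
```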

 Later in the rule you send a lexical unit to the output:

   <lu>
     <clip pos="1" side="tl" part="lemh"/>
     <clip pos="1" side="tl" part="a_nom"/>
     <clip pos="1" side="tl" part="gen"/>
     <clip pos="1" side="tl" part="nbr"/>
     <clip pos="1" side="tl" part="lemq"/>
   </lu>

 and I see that you send a_nom, which is

 <def-attr n="a_nom">
   <attr-item tags="n"/>
   <attr-item tags="n.acr"/>
   <attr-item tags="np.loc"/>
 </def-attr>

 Which one of the three tags do you send to the output? How is the choice done?

The tag will be either 'n' or 'n.acr', depending on what is on the tl
side of the lexicon. def-attr selections are made using regexes. (It
cannot be np.loc in this case, as that is not matched by the rule).
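
To make 'selections are made using regexes' a bit more concrete, here is
a small Python sketch. It is a model, not the real implementation: the
helper names are invented, and trying the longer alternatives first is
an assumption of this sketch.

```python
import re

def compile_def_attr(items):
    """Turn attr-item values such as 'n.acr' into one regex over tags."""
    # 'n.acr' stands for the tag sequence '<n><acr>'.
    alternatives = ["".join(f"<{tag}>" for tag in item.split("."))
                    for item in items]
    # Assumption: longer alternatives are tried first, so that
    # '<n><acr>' is preferred over the shorter '<n>'.
    alternatives.sort(key=len, reverse=True)
    return re.compile("|".join(re.escape(alt) for alt in alternatives))

def clip(lexical_unit, attr_regex):
    """Return the part of the lexical unit matched by the attribute."""
    match = attr_regex.search(lexical_unit)
    return match.group(0) if match else ""

a_nom = compile_def_attr(["n", "n.acr", "np.loc"])
print(clip("casa<n><f><sg>", a_nom))      # <n>
print(clip("ONU<n><acr><f><sg>", a_nom))  # <n><acr>
```

So the clip does not 'choose' a tag; it returns whichever of the listed
alternatives is actually present in the lexical unit.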


 Furthermore, you separate lemh and lemq, but in the rule there has been no 
 segmentation of the lemma, where does the segmentation come from?


lemh and lemq (and lem, whole, and tags) are predefined by transfer.
In this case, with a multiword with inner inflection, e.g.
'tener<vblex># en cuenta', lem will contain 'tener# en cuenta', lemh
will contain 'tener' and lemq will contain ' en cuenta'. This is
mostly used for verbs with enclitic pronouns, which need to be placed
between lemh and lemq.
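
A string-level Python sketch of that split, following the description
above (the function name is made up, and the sketch only mimics the
values described here, not how transfer computes them internally):

```python
def split_lemma(lem):
    """Split a lemma into (lemh, lemq) at the '#' marker."""
    head, _marker, queue = lem.partition("#")
    return head, queue

lemh, lemq = split_lemma("tener# en cuenta")
print(repr(lemh))  # 'tener'
print(repr(lemq))  # ' en cuenta'

# An enclitic pronoun goes between the two parts.
print(lemh + "lo" + lemq)  # tenerlo en cuenta
```

For a lemma without '#', lemq is simply empty, which is why the rule can
always output both parts.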

 Another question: in the same rule, to decide whether you are going to apply 
 f_concord1 (which checks gender and number and sets variables genero and 
 numero) or f_enviaa (which sends an a only if the variable valverb==2 or if 
 the token is an anthroponymic proper noun), you check whether the lemma is 
 equal to pas in the singular number. I looked in the dictionary and pas 
 means step. I was wondering how come this word pas (in the singular) serves 
 to detect anthroponymic proper nouns?

It doesn't. At all. The macros are skipped in the single case of 'pas'
-- neither apply.


 Finally, on line 3388 starts rule DETERMINANT NOM, this rule uses two tokens, 
 the determinant and the noun:

   <pattern>
     <pattern-item n="det"/>
     <pattern-item n="nom"/>
   </pattern>

 that makes two tokens.  But on line 3404 I see the following code:

 <test>
   <in caseless="yes">
     <clip pos="3" side="sl" part="lem"/>
     <list n="mesos"/>
   </in>
 </test>

 with the purpose of checking whether the noun is a month name. Here the pos 
 argument takes value 3. What is the meaning of pos=3 when there are only 
 two tokens?

That's an error; it should be 'pos=2'.
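
This kind of mistake is easy to check for mechanically. A minimal Python
sketch, assuming each rule can be parsed on its own as XML; the function
name and the cut-down rule are hypothetical, not taken from the real
file:

```python
import xml.etree.ElementTree as ET

def out_of_range_positions(rule_xml):
    """Return (tag, pos) pairs whose pos exceeds the pattern length."""
    rule = ET.fromstring(rule_xml)
    n_items = len(rule.findall("./pattern/pattern-item"))
    problems = []
    for element in rule.iter():
        pos = element.get("pos")
        if pos is not None and pos.isdigit() and int(pos) > n_items:
            problems.append((element.tag, int(pos)))
    return problems

# Cut-down, hypothetical version of the rule discussed above.
RULE = """
<rule>
  <pattern>
    <pattern-item n="det"/>
    <pattern-item n="nom"/>
  </pattern>
  <action>
    <choose><when><test>
      <in caseless="yes">
        <clip pos="3" side="sl" part="lem"/>
        <list n="mesos"/>
      </in>
    </test></when></choose>
  </action>
</rule>
"""

print(out_of_range_positions(RULE))  # [('clip', 3)]
```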




Re: [Apertium-stuff] A bug in the t1x processor?

2013-11-22 Thread Jimmy O'Regan
On 22 November 2013 14:22, Mikel L. Forcada m...@dlsi.ua.es wrote:
 Dear Sergio, dear list,

 Aida and I think we have found a bug in the t1x processor and we have chased
 it down to a single file containing only the definitions necessary and a
 single rule. Unfortunately, it has to be tested by installing
 apertium-eng-kaz (following the steps here:
 http://wiki.apertium.org/wiki/English_and_Kazakh (hfst and all!, sorry).

That shouldn't be necessary.


 How to test the (possible) bug: install apertium-eng-kaz, then replace
 apertium-eng-kaz.kaz-eng.t1x with the file with the same name in the dev/
 directory, compile, and run this test:

 echo жазғанмын | apertium -d. kaz-eng-transfer

 The output should be:

 apertium-transfer: Rule 1 жаз<v><tv><past><p1><sg>/
 write<vblex><past><p1><sg>/record<vblex><past><p1><sg>
 ^+++ HELLO, I AM THE WRONG RULE! +++ The following should be ND or
 <zzz>:<sg>$^default<default>{^.<sent>$}

'zzz', not '<zzz>' - the assignment is 'lit', not 'lit-tag' (and
to assign a tag, it would have to be declared in the relevant
def-attr).
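
In other words, the two elements produce different strings. A tiny
Python model of the difference (the function names simply mirror the
element names; this is not the transfer code):

```python
def lit(v):
    """<lit v="..."/>: the value is used verbatim."""
    return v

def lit_tag(v):
    """<lit-tag v="..."/>: each dot-separated part becomes a tag."""
    return "".join(f"<{part}>" for part in v.split("."))

print(lit("zzz"))         # zzz
print(lit_tag("zzz"))     # <zzz>
print(lit_tag("np.ant"))  # <np><ant>
```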


 If you look at the rule, first the value ND to the number clip is assigned
 and then there is a choose block that tests for 1st person and assigns
 zzz to the number clip. None of these two values are printed; instead, the
 value extracted from жаз<v><tv><past><p1><sg> is printed, namely <sg>.

 We also saw some other strange behavior, but this was the easiest to
 reproduce.

 Basically, assignments to clips are overridden and the values obtained in
 previous assignments seem to prevail.

 We would appreciate it very much if someone could look into this bug.

You're trying to change 'sl'; apertium-transfer wasn't designed with
that in mind. I'm dimly aware that support was added for input from
lt-proc -b, but I was not aware that the source would be preserved even
with that. Víctor wrote a while back about a bug in that support, that
involved a variable that was not used - are you using a version with
Víctor's patch applied? Similarly, it could be that sl output (if
there is any!) is coming straight from the input buffer.

But more fundamentally, do you actually intend to change sl?


 All the best

 Mikel

 P.S. Another behaviour we observed is that under some circumstances you
 cannot assign values to clips outside their definition range, but we haven't
 been able to isolate the problem.

You can only assign, using lit-tag, values that are included in the
part's def-attr. You can get around this with lit if you need to.



Re: [Apertium-stuff] Windows installation problems

2013-11-21 Thread Jimmy O'Regan
On 21 November 2013 15:49, Kevin Brubeck Unhammer unham...@fsfe.org wrote:
 Jimmy O'Regan jore...@gmail.com
 writes:

 On 21 November 2013 15:05, Kevin Brubeck Unhammer
 unham...@fsfe.org wrote:
 Jimmy O'Regan jore...@gmail.com
 writes:

 I'm not 100% about this, but there was a problem with Cygwin recently
 - IIRC, certain programs are no longer installed by default - and we
 should really either update that installer, or remove it.

 Seems like it needs an update, yes:
 http://www.google-melange.com/gci/task/view/google/gci2013/6396457749839872
 http://superuser.com/a/628401

 Does the source for that installer exist anywhere? (Can't find anything
 likely in SVN.)

 It's probably in Melange -- that was a GCI student, two or three years ago.

 The sourceforge files seem to be from 2010. I found
 http://www.google-melange.com/gci/task/view/google/gci2010/7068214
 http://www.google-melange.com/gci/task/view/google/gci2010/7076217
 but: Download Broken :-/

I'll see if I have a copy on my old laptop, but digging up a charger
for it might prove troublesome.




Re: [Apertium-stuff] Internationalization Tag Set

2013-11-05 Thread Jimmy O'Regan
On 5 November 2013 06:38, Mikel Forcada m...@dlsi.ua.es wrote:
 On 11/05/2013 02:12 AM, Jimmy O'Regan wrote:
 Last sentence of the abstract: ITS 2.0 focuses on HTML, XML-based
 formats in general, and can leverage processing based on the XML
 Localization Interchange File Format (XLIFF), as well as the Natural
 Language Processing Interchange Format (NIF). -- tl;dr, it's not just
 for XML.
 Jim:

 XLIFF is an XML application.

With added emphasis: ITS 2.0 focuses on *HTML*, etc. Not just XML.



Re: [Apertium-stuff] Internationalization Tag Set

2013-11-04 Thread Jimmy O'Regan
On 4 November 2013 18:38, Bernard Chardonneau bechapert...@free.fr wrote:
 Date: Mon, 04 Nov 2013 10:23:54 +0100
 From: Mikel L. Forcada m...@dlsi.ua.es
 To: apertium-stuff@lists.sourceforge.net
 Reply-To: apertium-stuff@lists.sourceforge.net
 Subject: [Apertium-stuff] Internationalization Tag Set

 Hi Apertiumers!

 A new standard has been adopted by the W3C which relates the
 internationalization of web content. I think we in Apertium should be
 aware of this:

 http://www.w3.org/TR/its20/

 All the best

 Mikel

 --

 OK but what to do with that ?

 No problem if it is for apertium.org website for the small part outside
 the wiki. The wiki is not in XML format.


Last sentence of the abstract: ITS 2.0 focuses on HTML, XML-based
formats in general, and can leverage processing based on the XML
Localization Interchange File Format (XLIFF), as well as the Natural
Language Processing Interchange Format (NIF). -- tl;dr, it's not just
for XML.

The wiki generates HTML, and it's not a major task to add templates
for ITS. Further, ITS is designed to be used inline, or as stand off
annotation. It's possible to use ITS stand off to annotate even plain
text, though the XPath to do so would be horrible.

But using ITS annotation for documentation or language data is about
the last thing I think of in relation to Apertium. At the most basic,
it would be nice to have Apertium respect ITS instructions that say
'don't translate this part of the document', for example, or to skip
sections that have been translated by another tool, or even to add
basic provenance information.



Re: [Apertium-stuff] [Fwd: [GSoC Mentors] Google Summer of Code 2014 + 10 Things]

2013-10-08 Thread Jimmy O'Regan
On 8 October 2013 20:14, Francis Tyers fty...@prompsit.com wrote:
 Hey all!

 Looks like GSOC will be taking place next year ! \o/ \o/ \o/ We got the
 notice a lot earlier this year :)

Yeah... what gives? It's not even 2014 yet! :)



Re: [Apertium-stuff] Tagger training prerequisites

2013-09-23 Thread Jimmy O'Regan
On 23 September 2013 08:17, Per Tunedal per.tune...@operamail.com wrote:

 Hi,
 what should the text-files look like before starting the tagger
 training? One sentence a line? Something else?

 Is a text formatted like below OK:

 Antingen genom att gå in under rätt rubrik ovan och lägga till ditt
 bidrag eller lägg ditt bidrag i bufferten om du inte vet var eller hur
 det ska stå.
 I Önskelistan lägger du förslag på sånt du tycker borde vara med.

 Or should e.g. the punctuation marks be separated like:
 I Önskelistan lägger du förslag på sånt du tycker borde vara med .


No, you don't need to do that. You don't really need to have the text
split into sentences either, but it makes life a little easier if
there are problems.

Some of the older language pairs have makefiles for tagger training.
At a minimum, you will need to adapt the variables for language, and
make sure that lt-proc is called with the same set of switches as the
primary mode (if you're training for Swedish in sv-da, the mode will
be the one that starts <mode name="sv-da" install="yes">).
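
If you want to read the switches out of modes.xml rather than by eye,
something like this Python sketch would do. The function name and the
sample modes.xml content (including the '-w' switch and the file names)
are invented for the example; check them against your own file.

```python
import xml.etree.ElementTree as ET

def lt_proc_invocation(modes_xml, mode_name):
    """Return the lt-proc command and files of the named mode, or None."""
    root = ET.fromstring(modes_xml)
    for mode in root.iter("mode"):
        if mode.get("name") != mode_name:
            continue
        for program in mode.iter("program"):
            command = program.get("name", "").split()
            # The first lt-proc in the pipeline is the analyser.
            if command[:1] == ["lt-proc"]:
                files = [f.get("name") for f in program.findall("file")]
                return command + files
    return None

# Made-up sample; a real modes.xml has more programs per pipeline.
MODES = """
<modes>
  <mode name="sv-da" install="yes">
    <pipeline>
      <program name="lt-proc -w">
        <file name="sv-da.automorf.bin"/>
      </program>
      <program name="apertium-tagger -g">
        <file name="sv-da.prob"/>
      </program>
    </pipeline>
  </mode>
</modes>
"""

print(lt_proc_invocation(MODES, "sv-da"))
```

Whatever switches show up there are the ones the training makefile
should pass to lt-proc as well.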

The tagset specification is where you have the most scope to control
the tagger. I wrote a linter tool because of problems you were
reporting; I'd recommend that you run it before training.



Re: [Apertium-stuff] Tagger training prerequisites

2013-09-23 Thread Jimmy O'Regan
On 23 September 2013 15:45, Per Tunedal per.tune...@operamail.com wrote:
 Hi,
 Thanks!
 I noticed your tool, but unfortunately I'm not sure how to use it!

SYNOPSIS

apertium-tsx-lint tsx-file [DIC]

[DIC] is the 'dictionary' generated during tagger training (not an
actual dictionary!). It'll run without it, but it won't give all the
warnings.

BTW -- are you training for Swedish? Supervised or unsupervised?



Re: [Apertium-stuff] New mode for the Apertium Tagger

2013-09-21 Thread Jimmy O'Regan
On 21 September 2013 18:29, Mikel Forcada m...@dlsi.ua.es wrote:
 On 09/21/2013 02:11 PM, Francis Tyers wrote:
 No, basically I'm asking if it can work without specifying the set of
 coarse tags.
 What would happen if one did not specify the set of coarse tags in the
 HMM tagger?


The tagset specification serves two purposes: to cluster similar tags,
and to mark which of these are open, and which are closed. Without
open classes, the tagger will fail to train, as there is nothing to
assign to unknowns; without closed classes, the tagger is free to
assign them to unknowns. Without clustering, the size of the model
balloons, and data sparseness becomes a greater problem.
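
A back-of-the-envelope Python illustration of the last point, assuming a
first-order HMM (one transition parameter per ordered pair of tags). The
tag list is invented, and clustering on the first tag is only a stand-in
for a real tagset specification.

```python
def coarse(fine_tag):
    # Assumption for illustration only: cluster on the first tag.
    return fine_tag.split(".")[0]

def transition_count(tags):
    """Number of tag-to-tag transition parameters in a first-order HMM."""
    return len(tags) ** 2

# Invented fine-grained tags.
FINE = ["n.m.sg", "n.m.pl", "n.f.sg", "n.f.pl",
        "adj.m.sg", "adj.f.sg", "vblex.pres", "vblex.past"]
COARSE = sorted({coarse(tag) for tag in FINE})

print(COARSE)                    # ['adj', 'n', 'vblex']
print(transition_count(FINE))    # 64
print(transition_count(COARSE))  # 9
```

With a realistic number of fine-grained tags the difference is far
larger, and most of the extra transitions will never be seen in a
training corpus.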

 It would be nice to have this feature, but I think this is a bit out of
 the scope of Gang's project.




Re: [Apertium-stuff] One more difference between Swedish and Danish monodix

2013-09-11 Thread Jimmy O'Regan
On 11 September 2013 07:38, Per Tunedal per.tune...@operamail.com wrote:

 Hi,
 Apertium presupposes that the form in the source language could be
 generated in the target language, right?

Yes and no.

Apertium by default passes on the remainder of the tags after what is
matched in the bidix. So if the input is 'foo<n><sg>', and the bidix
has 'foo<n>:bar<n>' then the output will be 'bar<n><sg>'. This is what
happens with the default rule, or if the rule that matches uses
'part=tags'. But, transfer rules are generally written to have more
selective 'part's, and the tags can otherwise be modified by transfer.
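
The default behaviour can be modelled in a few lines of Python. This is
only a string-level sketch with an invented function name; the real
lookup is done on the compiled bidix, not on a dict.

```python
def bidix_lookup(lexical_unit, bidix):
    """Translate the matched prefix and pass the remaining tags on."""
    # Prefer the longest left-hand side that matches.
    for left in sorted(bidix, key=len, reverse=True):
        if lexical_unit.startswith(left):
            remainder = lexical_unit[len(left):]
            return bidix[left] + remainder
    return None  # not in the bidix

BIDIX = {"foo<n>": "bar<n>"}
print(bidix_lookup("foo<n><sg>", BIDIX))  # bar<n><sg>
```

The '<sg>' is never mentioned in the bidix entry; it survives only
because it is carried over as the remainder.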

 What if the form doesn't exist
 in the target language? How to handle that?

 The Swedish adjective blå (=blue) might have the old-fashioned
 masculine definite form ending on -e: blåe, just as most other
 adjectives. As far as I know there isn't any masculine form in Danish,
 anyhow there isn't anyone in the original Danish monodix. How do I
 manage to translate blåe to Danish? It's analysed as adj.pst.m.sg.def,
 but a similar form doesn't exist in Danish.


If this is truly exceptional, add an entry with the full amount of
needed tags (i.e., as far as 'm'); if it's not, handle it in
transfer. The output will probably need to be 'GD', but that assumes
that concordance is done in transfer (it ought to be, but...)

 BTW A similar problem would occur if I ever try to translate French or
 Spanish to Swedish: In French and Spanish verbs in subjunctive form
 flourish, but they don't exist in Swedish (except in some rare cases,
 mainly idiomatic expressions). How is this handled in the pair en-es?

It's handled in transfer, but the en-es transfer rules are not exactly
beginner-friendly -- you'd need to gain quite a bit of experience with
transfer to hope to understand some of them.



Re: [Apertium-stuff] Differences in paradigmes for Swedish and Danish

2013-09-11 Thread Jimmy O'Regan
On 11 September 2013 07:15, Per Tunedal per.tune...@operamail.com wrote:
 Hi,
 yes, this has to be corrected for several entries. If it's corrected in
 is-sv I might just copy the entries: I have copied these once before.
 But my original question is:
 The translation to Swedish (generation) cannot work if two forms have
 the same analysis, can it? Apertium cannot choose what to generate, can
 it?
 How to handle that?

Direction restrictions. In the monodix, <e r="LR"> means 'analyse
only' (i.e., do not generate), and <e r="RL"> means 'generate only'
(i.e., do not analyse). So you would change <e> to <e r="LR">.
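
A small Python model of what the restriction does when the two
directions are compiled. The entries are made up (an English spelling
variant rather than your Swedish forms), and the real compiler builds
transducers, not dicts.

```python
# (surface form, analysis, restriction) - invented entries.
ENTRIES = [
    ("colour", "colour<n><sg>", None),  # both directions
    ("color", "colour<n><sg>", "LR"),   # analyse only
]

def analyser(entries):
    """Surface form -> analysis; skips 'generate only' entries."""
    return {s: a for s, a, r in entries if r in (None, "LR")}

def generator(entries):
    """Analysis -> surface form; skips 'analyse only' entries."""
    return {a: s for s, a, r in entries if r in (None, "RL")}

print(analyser(ENTRIES))
print(generator(ENTRIES))
```

Both spellings are recognised, but only 'colour' is ever generated, so
the generator has exactly one candidate per analysis.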

But in your case, the analysis was wrong, so fix that instead.



Re: [Apertium-stuff] Differences in paradigmes for Swedish and Danish

2013-09-10 Thread Jimmy O'Regan
On 10 September 2013 18:09, Per Tunedal per.tune...@operamail.com wrote:
 Hi,
 Working on the Swedish verb ställa and the Danish equivalent stille.
 I'm confused about the entries in the sv monodix as some have the very
 same tags:

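 <pardef n="följ/a__vblex">
   <e>       <p><l>a</l>  <r>a<s n="vblex"/><s n="inf"/></r></p><par n="S__voice"/></e>
   <e>       <p><l>er</l> <r>a<s n="vblex"/><s n="pres"/><s n="actv"/></r></p></e>
   <e>       <p><l>es</l> <r>a<s n="vblex"/><s n="pres"/><s n="actv"/></r></p></e>
   <e r="LR"><p><l>s</l>  <r>a<s n="vblex"/><s n="pres"/><s n="actv"/></r></p></e>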

According to sv.wiktionary, these last two are both passive, but this
looks to be the problem - two generation candidates for
vblex.pres.actv. It's either mistagged, or a restriction needs to be
added.
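
You can spot this sort of clash mechanically. A Python sketch using the
three present-tense entries quoted above, typed in by hand as tuples
(the function name is invented, and a real check would read the .dix
file instead):

```python
from collections import defaultdict

def generation_conflicts(entries):
    """Return analyses that more than one entry is allowed to generate."""
    by_analysis = defaultdict(list)
    for surface, analysis, restriction in entries:
        if restriction != "LR":  # LR entries are never generated
            by_analysis[analysis].append(surface)
    return {a: s for a, s in by_analysis.items() if len(s) > 1}

# (left side, right side, restriction) from the quoted paradigm.
PARADIGM = [
    ("er", "a<vblex><pres><actv>", None),
    ("es", "a<vblex><pres><actv>", None),
    ("s", "a<vblex><pres><actv>", "LR"),
]

print(generation_conflicts(PARADIGM))
# {'a<vblex><pres><actv>': ['er', 'es']}
```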



Re: [Apertium-stuff] OT Punctuation in Spanish was: Re: IBM1 partly better than Apertium from French to Spanish

2013-09-03 Thread Jimmy O'Regan
On 3 September 2013 14:30, Per Tunedal per.tune...@operamail.com wrote:
 Hi,
 I presume that in my toy corpus the most appropriate would be:

 ¡Tomad un bloque!

Probably. And if it's really important to you, as you've got a regular
one-line-per-sentence layout, you can write a simple script to insert
it.
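
Something along these lines, as a minimal Python sketch. It assumes one
sentence per line and that the whole sentence is the exclamation, which
is exactly the simplification that does not hold in general.

```python
def add_opening_exclamation(line):
    """Prefix a line ending in '!' with the opening mark."""
    sentence = line.strip()
    if sentence.endswith("!") and not sentence.startswith("¡"):
        return "¡" + sentence
    return sentence

for line in ["Tomad un bloque!", "¡Tomad un cono!", "Tomad una flecha."]:
    print(add_opening_exclamation(line))
```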


 Further, I assume that Tomad ¡un bloque! and Tomad un ¡bloque! both
 emphasize that the person should take a block and not for instance a
 cone. Is there any difference between them?


It depends on context, I guess.

 Is it possible to emphasize that the person should take one and not two
 items? Tomad  ¡un! bloque.  or is this done in an other way?

I assume so, I just hadn't thought of it.

Anyway, I was just trying to make the point that it's a more difficult
problem than it might appear to be (in fact, I'd say it's
AI-complete), not to start a discussion.



Re: [Apertium-stuff] IBM1 partly better than Apertium from French to Spanish

2013-09-03 Thread Jimmy O'Regan
On 3 September 2013 14:51, Per Tunedal per.tune...@operamail.com wrote:
 Hi again,
 one more thing:

 I suppose I should start the sentences with a capital letter too? Or
 doesn't that matter to Apertium?

It shouldn't make much of a difference with the input, but you'll find
that some language pairs capitalise the first word in the sentence,
regardless of whether or not it was capitalised in the input.



Re: [Apertium-stuff] IBM1 partly better than Apertium from French to Spanish

2013-09-02 Thread Jimmy O'Regan
On 2 September 2013 15:49, Xavi Ivars xavi.iv...@gmail.com wrote:
 2013/9/2 Jimmy O'Regan jore...@gmail.com


  The first test
  sentences for the Block World Corpus are better translated by the
  outdated statistical translation model IBM model 1 in the direction
  French to Spanish. Apparently, Apertium has some problems with the
  imperative of  verbs and goes for the subjunctive used in negated
  requests (this problem persists in the omitted sentences):
 
  Original:
 
  prenez une flèche
  prenez un bloc
  prenez un cône bleu
 

 Your sentences are not terminated. If they had been, you would have
 seen the output you expected.



 This is something I noticed when the first email was sent, but I didn't look
 deeper in it: it doesn't make sense that in a rule-based engine like
 Apertium prenez was translated sometimes as tomad and sometimes as
 tomáis, when in fact one of the (key?) benefits of the rule-based systems
 is predictability on the output.

It's not hard to understand. There are no sentence boundaries, making
this one big sentence. The tagger sees something that may be an
imperative following a noun and, because that's unlikely, chooses
something else instead (as would a rule-based disambiguator with a
reasonable set of rules). Garbage in, garbage out.
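
For a one-sentence-per-line test file, the fix on the input side is
trivial. A minimal Python sketch (the function name is made up, and
ending every line with a full stop is an assumption about what the
corpus needs):

```python
def terminate(line):
    """Add a full stop to a non-empty line that lacks end punctuation."""
    sentence = line.rstrip()
    if sentence and sentence[-1] not in ".!?":
        return sentence + "."
    return sentence

for line in ["prenez une flèche", "prenez un bloc", "prenez un cône bleu"]:
    print(terminate(line))
```

With the terminators in place, each line reaches the tagger as its own
sentence instead of as part of one long one.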




Re: [Apertium-stuff] Updating errors of the website - Warning!!

2013-08-02 Thread Jimmy O'Regan
On 2 August 2013 17:02, Guillermo Puebla Suárez
guillerpue...@hotmail.com wrote:
 Hello to all Apertiumers,


Hi!

 I'm new on this mailing list,

To be perfectly frank, from the tone of your writing I think that
you're new to mailing lists and open source in general. That's ok;
everyone begins at the beginning.

 but I'd like to inform you about not updating
 the website and pages related to Apertium (Eslema, Prompsit, Opentrad,
 etc.).

So here's how it works: Apertium is open source, so everyone is free
to run their own webservices providing Apertium. That does not mean
that we have any influence over them: the groups you listed are
separate entities. We can only announce new releases (see below) and
hope that they update.

 I'm specifically working on Asturian and Spanish languages.

Thanks for your contributions.

 I told
 Francis to change some codes in es-ast (Spanish-Asturian) package but we are
 not able to enjoy them because the WEB IS NOT UP-TO-DATE.

Typing in uppercase is considered shouting on mailing lists. Please
don't shout, we can hear you just fine :)

That the web is 'not up to date' -- that's to be expected, and that's
*what we want*. The version in SVN is a development version: it's
untested. To best present ourselves, we only want the tested, released
versions to be presented to users. For a company such as Prompsit, it
could even be irresponsible to present a development version, as they
may have customers depending on the service.

If you're prepared to do the work involved in preparing a new release,
we'd be happy to help, but otherwise, you'll just have to wait until
someone else is prepared to do that work.

 I encourage
 anybody who has got access to do this to contact me at this email and update
 the website with the latest sources, I'm free all day long and part of
 night.

'Send me an offlist email' is usually not the done thing on mailing
lists: we answer in public (and use mailing list archives) so our
answers can be of use to anyone who may be searching for the answer
later (and so to the benefit of the project as a whole), not
specifically for the benefit of the person asking. (On some mailing
lists, if you ask for offlist email, you'll be presented with a set of
consulting rates).



Re: [Apertium-stuff] Updating errors of the website - Warning!!

2013-08-02 Thread Jimmy O'Regan
On 2 August 2013 18:53, Guillermo Puebla Suárez
guillerpue...@hotmail.com wrote:
 In short, what are the tested versions (from SVN I imagine)?


No, the tested versions are the tarballs available for download here:
https://sourceforge.net/projects/apertium/files/




Re: [Apertium-stuff] merge of ATT - lttoolbox binary compiler into lttoolbox

2013-07-19 Thread Jimmy O'Regan
On 19 July 2013 17:49, Francis Tyers fty...@prompsit.com wrote:
 One question:

 My preference is for lt-comp to parse both the .dix and the ATT files.

 The behaviour would be:

 if(fileIsValidXML()) {
   parse_xml
 } else if(fileIsValidATT()) {
   parse_att
 } else {
   fail
 }

 Would this be ok for people?

 * I'd prefer to have the ATT compiler as part of lt-comp as opposed to
 a separate program.
 * I'd prefer it to work out the file format automatically rather than
 for people to have to specify the format to compile from.

Er... why? 'lt-comp lr' and 'lt-comp rl' are presumably irrelevant, so
why not make it 'lt-comp att' or whatever else makes sense.
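
For comparison, here is roughly what the autodetection would have to do,
as a Python sketch. The function names are invented, and the ATT check
assumes tab-separated lines with four or five fields for transitions and
one or two for final states; it is not what lt-comp does.

```python
import xml.etree.ElementTree as ET

def is_xml(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def is_att(text):
    # Assumed layout: "src<TAB>dst<TAB>in<TAB>out[<TAB>weight]" for
    # transitions and "state[<TAB>weight]" for final states.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    for line in lines:
        fields = line.split("\t")
        if len(fields) in (4, 5):
            if not (fields[0].isdigit() and fields[1].isdigit()):
                return False
        elif len(fields) in (1, 2):
            if not fields[0].isdigit():
                return False
        else:
            return False
    return True

def detect_format(text):
    if is_xml(text):
        return "xml"
    if is_att(text):
        return "att"
    return "unknown"

print(detect_format("<dictionary/>"))       # xml
print(detect_format("0\t1\ta\tb\n1\n"))     # att
```

An explicit 'att' argument avoids all of this guessing, and a wrong
guess on a malformed file gives a much less useful error message than a
parser that knows what it was asked to read.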



Re: [Apertium-stuff] idea about transfer files

2013-07-13 Thread Jimmy O'Regan
On 13 July 2013 10:38, Francis Tyers fty...@prompsit.com wrote:
 While we're on the subject of the transfer files and making
 changes, I had an idea the other month about making it easier
 to teach apertium transfer: make the attributes
 and variables sections optional.

Try out r45737. Nothing seems to have broken so far, but I'll back it
out if something does.



Re: [Apertium-stuff] idea about transfer files

2013-07-13 Thread Jimmy O'Regan
On 13 July 2013 15:34, Mikel Forcada m...@dlsi.ua.es wrote:
 Jim, Fran:

 Sorry but I have to step in and say that I am not happy with the procedure
 followed here.


Neither am I. I'd love to be able to put the changes in a branch, say
try out this branch, and merge if all is well or discard if not. As
is, the only choice for *both* making changes *and* having someone
else test them is commit/revert or never to change anything.




Re: [Apertium-stuff] Ask for help on HMM unsupervised training

2013-06-07 Thread Jimmy O'Regan
On 6 June 2013 10:14, Francis Tyers fty...@prompsit.com wrote:
 I think the problem is that the extra analyses are added by regular
 expressions which are not covered in the expansion.

Not with 'Mar'. The regexes that were in those dictionaries 1) were not
specific about gender (even when they could/should have been), 2) did
not capture individual words like that, and 3) are disabled in those
dictionaries because they prove the JWZ 'now you have two problems'
axiom. (Actually, at least 3, last count)




Re: [Apertium-stuff] Rules for proper names

2013-05-31 Thread Jimmy O'Regan
On 30 May 2013 18:47, Francis Tyers fty...@prompsit.com wrote:
 On Thu, 30 May 2013 at 19:42 +0200, Per Tunedal wrote:
 The most difficult part would be to find the names. Perhaps someone has
 any ideas?

 In Icelandic--English, regular expressions are used. See e.g. pardefs
 for persons and lastnames in is.dix

 This is not altogether recommended though, as regular expressions slow
 down your transducer. What you could do is use them on a large corpus
 and then mass-add the ones after superficial checking.

Census data is easy to find, gazetteers for NER are easy to find,
en.wiktionary has categories for names
(http://en.wiktionary.org/wiki/Category:Surnames_by_language
http://en.wiktionary.org/wiki/Category:Male_given_names_by_language
http://en.wiktionary.org/wiki/Category:Female_given_names_by_language),
as do en.wikipedia (http://en.wikipedia.org/wiki/Category:Surnames
http://en.wikipedia.org/wiki/Category:Given_names), da.wikipedia
(http://da.wikipedia.org/wiki/Kategori:Efternavne
http://da.wikipedia.org/wiki/Kategori:Fornavne), and sv.wikipedia
(http://sv.wikipedia.org/wiki/Kategori:Efternamn
http://sv.wikipedia.org/wiki/Kategori:Förnamn), and Europarl has
speaker annotation which contains the name of the speaker.
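
Once you have a checked list of names from any of those sources, turning it into dictionary entries is mechanical. A rough Python sketch, assuming a plain list of names as input; the paradigm name is passed in because it depends on the dictionary, and the one used in the usage note is hypothetical:

```python
from xml.sax.saxutils import escape, quoteattr


def name_to_entry(name, paradigm):
    """Build one monolingual dictionary entry for a proper name.

    Spaces inside the name become <b/>, the blank element used in .dix files.
    """
    stem = "<b/>".join(escape(part) for part in name.split())
    return f"<e lm={quoteattr(name)}><i>{stem}</i><par n={quoteattr(paradigm)}/></e>"


def names_to_entries(names, paradigm):
    """Deduplicate and sort a name list, then emit one entry per name."""
    unique = sorted({name.strip() for name in names if name.strip()})
    return [name_to_entry(name, paradigm) for name in unique]
```

For example, names_to_entries(["Anna", "Per"], "example__np") gives two <e> elements ready to paste into a section, where "example__np" stands in for whatever proper-noun paradigm the dictionary really defines.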



Re: [Apertium-stuff] Stange behaviour of the on-line version of the translator

2013-05-13 Thread Jimmy O'Regan
On 13 May 2013 22:07, Bernard Chardonneau bechapert...@free.fr wrote:
 A less important problem: changes made to available language pairs more
 than a year ago are still not taken into account by the on-line translators.
 This also applies to the http://apertium.saluton.dk website.

Presumably, those pairs have not had a new release. The development
versions in SVN are often quite unstable, and are otherwise rarely as
thoroughly tested as the versions that are released, and it would be
unwise to use them in a web service (if not outright damaging to the
project's reputation!).



Re: [Apertium-stuff] [GSoC 2013] Simpledix improvements

2013-05-03 Thread Jimmy O'Regan
[Sorry, I've only noticed now that the email didn't send!]

On 30 April 2013 08:53, d...@alu.ua.es d...@alu.ua.es wrote:
 2013/4/30 Jimmy O'Regan jore...@gmail.com
 On 29 April 2013 18:07, d...@alu.ua.es d...@alu.ua.es wrote:
  Hi everybody,
 I'd prefer to have a meta-configuration: if it sees 'vblex', then
 generate 'pri.p3.sg', 'inf' and 'pp.m.sg', etc. and generate the
 configuration based on that. It would be trivial to add a task to
 dixtools to do this, and should be easy enough to do otherwise.


 The automatic script already takes that kind of meta-configuration. You can
 see an example at the end of
 (http://wiki.apertium.org/wiki/User:Dtr5#Making_your_own_configuration_file).

 But that method has some problems: it is really slow (takes around 2 hours
 for processing es-ca dictionaries with the sample configuration), it is

That seems wrong. There should be no reason for this to happen. The
maximum that I would expect from a dixtools-based tool to do this
would be a few seconds. Perhaps you should investigate that?

  I encourage developers to test Simpledix
  (http://apertium.vm.bytemark.co.uk/simpledix). It only has configuration
  files for the es-ca pair, but it would give you a better understanding
  of
  the current state of the tool, and see how it could be improved.
 
  If somebody needs a bit more information, you can read the tutorial on
  the
  wiki (http://wiki.apertium.org/wiki/User:Dtr5).
 
  I am looking forward to hearing some feedback on this project.

 I really like the idea of having an easy to use interface for editing
 the dictionaries, but I'd like you to give some thought to the _next_
 problem, too: what to do with the changes, to make it easier for users
 to contribute them. Passing whole dix files around can work, but would
 be quite a pain - it would be much better to be able to pass just the
 changes. Do you have any thoughts on that?


 When you export the dictionaries, a simple xslt transformation puts all the
 new entries at the end of the dictionary. I could provide only the
 difference, greatly reducing the size of that download.


Sure, that's an option. There should be plenty of pre-built diff/patch
tools out there.
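
As a rough idea of how little is needed, here is a sketch using Python's standard difflib (the tool itself is bash + xslt, so this is only an illustration; the function name and the default file name are made up):

```python
import difflib


def dictionary_patch(original, edited, name="dictionary.dix"):
    """Return a unified diff between two versions of a dictionary file."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)
```

The output is in unified diff format, so a maintainer should be able to apply it with the usual patch tools instead of receiving the whole file.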

 As for uploading, I think nothing can be done.

There are plenty of options. At the most basic, all of our
interactions with SVN are via HTTP. At the very least, you can provide
a configuration option to specify the address of the package in SVN,
then download the files directly from there. With a little more
effort, there are functions for SVN
(http://php.net/manual/en/ref.svn.php) so at the very least, you can
provide the revision number of the dictionaries that have been
modified. Yet more complicated would be to use git as a backing store
(e.g., using http://gitorious.org/git-php), and create a branch
whenever someone edits the dictionaries. Language pair maintainers who
are able to use git could pull directly, or git's machinery could be
used to export patch sets. It would even give the option of allowing
logged in users to pick up where they left off.



Re: [Apertium-stuff] [GSoC 2013] Simpledix improvements

2013-05-03 Thread Jimmy O'Regan
On 3 May 2013 11:57, d...@alu.ua.es d...@alu.ua.es wrote:
 [Sorry, I've only noticed now that the email didn't send!]


 No problem.

 I got some work for this summer, so I don't think I can give GSoC ~30h/week.
 Won't apply this year. If (as expected) I get some free time late June, I'll
 do the configuration file generation improvements.


That's a pity, but best of luck with the job!

 That seems wrong. There should be no reason for this to happen. The
 maximum that I would expect from a dixtools-based tool to do this
 would be a few seconds. Perhaps you should investigate that?


 It does not use dixtools, but some bash + xslt. Of course, it should not
 take more than a couple of minutes.


Yeah, I took a (brief!) look at the scripts. I have seen something
like this kind of slowdown before, (with EXSLT and xsltproc, IIRC),
but nothing struck me as familiar.


  As for uploading, I think nothing can be done.

 There are plenty of options. At the most basic, all of our
 interactions with SVN are via HTTP. At the very least, you can provide
 a configuration option to specify the address of the package in SVN,
 then download the files directly from there. With a little more
 effort, there are functions for SVN
 (http://php.net/manual/en/ref.svn.php) so at the very least, you can
 provide the revision number of the dictionaries that have been
 modified. Yet more complicated would be to use git as a backing store
 (e.g., using http://gitorious.org/git-php), and create a branch
 whenever someone edits the dictionaries. Language pair maintainers who
 are able to use git could pull directly, or git's machinery could be
 used to export patch sets. It would even give the option of allowing
 logged in users to pick up where they left off.


 As is, you can choose to upload the dictionaries and configuration files
 from an url, like the ones sourceforge provides for direct download.


Yes, you can pull the individual files straight from SVN.

 Interacting directly with repositories seems a good idea, but it requires a
 major rework: the tool would need to know how to test the dictionaries
 before uploading to the repository. Nowadays, it does not even need Apertium
 to work.


I don't think you'd need to trouble yourself with doing more than
validating (the XML of) the dictionaries (which is just a DTD-based
validation). It'd be nice, sure, to check if they also compile, but as
long as the XML is more or less valid, it should be ok.
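
Something along these lines would cover the "more or less valid" case. It is only a sketch in Python: it checks well-formedness and the root element name, not the DTD, and the function name is made up. Proper DTD validation would need an external validator such as xmllint.

```python
import xml.etree.ElementTree as ET


def check_dictionary(xml_text):
    """Return (ok, message) for a quick pre-upload sanity check.

    This only checks well-formedness and the root element name; full
    DTD validation needs an external validator.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, f"not well-formed: {err}"
    if root.tag != "dictionary":
        return False, f"unexpected root element: {root.tag}"
    return True, "ok"
```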

 Another option is downloading the tool and setting it locally. Simpledix
 only needs a web server, xsltproc and BaseX, all of them easily installed
 (they are part of the official debian repositories).

 As for keeping the progress of the users, if you don't close your session,
 you can save the url (that has your id as a get parameter), and keep working
 later. But as the tool lacks proper session management, anybody can use that
 id, and close your session (erasing your progress), so it is not advised to
 work that way.


I was just mentioning that as a positive side-effect -- it doesn't
overly interest me. I'm far more interested in making it as easy as
possible for potential contributors to contribute, and to merge those
contributions. Really, I have github's pull requests in mind as the
ideal: there's an open source clone called GitLab (http://gitlab.org)
and they seem to have this model, so it might not be too hard to port
to PHP.




Re: [Apertium-stuff] GSoC: Visual editor for transfer rules

2013-05-01 Thread Jimmy O'Regan
On 1 May 2013 15:22, Lipka Boldizsár lip...@zoho.com wrote:
 Hi all,


Hi!

 I'm a Molecular Bionics student from Hungary (yeah, quite far away from
 machine translation, I know, but at least I can code) and am interested in the
 GSoC idea Visual interface for writing structural transfer rules. I did some
 research on the matter already, gone through the New Language Pair Howto, the
 machine translation series at wiki.apertium.eu and I'm currently reading the
 Transfer Rules Examples article. Do you think I need anything else to put
 together a good proposal?


You could show us what you came up with while working through the
howto, but you're on the right track.

 (Guess I'm a bit too late, but meh. There's no harm in trying.)

Correct :)



Re: [Apertium-stuff] Apertium-stuff Digest, Vol 72, Issue 48

2013-04-29 Thread Jimmy O'Regan
On 29 April 2013 16:58, Francis Tyers fty...@prompsit.com wrote:
 On Mon, 29 April 2013 at 21:20 +0530, Anand Soni wrote:
 Hi,

 Sentiment analysis will not directly help machine translation. But,
 machine translation can definitely help sentiment analysis. Most of
 the work in sentiment analysis has been done in English only. After
 building the sentiment analysis tool, we can integrate it with a
 translator to do sentiment analysis for many languages. This is the
 idea that I have behind this project.
 Thus, it may be viewed as a new feature for Apertium machine
 translator. Please share your ideas on this.

 (1) Please do not reply to list digest posts.


...without trimming them!

 (2) Apertium is a machine translation project. Our goal is to make
 machine translation systems :) Any project should have either making a
 machine translation system, or improving the framework for making
 machine translation systems as a goal.

(3) This sounds very much like Opinum[1], which may be open sourced at
some point.

[1] Bonev, Boyan, Gema Ramírez-Sánchez, and Sergio Ortiz Rojas.
Opinum: statistical sentiment analysis for opinion classification.
Proceedings of the 3rd Workshop in Computational Approaches to
Subjectivity and Sentiment Analysis. Association for Computational
Linguistics, 2012.



Re: [Apertium-stuff] proposal page of sphinx

2013-04-28 Thread Jimmy O'Regan
On 28 April 2013 08:37, sphinx jiang yishan...@gmail.com wrote:
 Hi,
 This is my draft GSoC proposal page. Some things may need to be
 modified, so I am publishing it now to get some advice.

 http://wiki.apertium.org/wiki/User:Sphinx/GSoC_2013_Application:_%22Chinese(simple)-Chinese(traditional)_language_pair%22

You should replace the screenshots with text (use <pre>) and maybe
rename your github repo to include the language names.



Re: [Apertium-stuff] Coding Challenge for idea Sliding-window part-of-speech tagger

2013-04-20 Thread Jimmy O'Regan
On 20 April 2013 16:37, Gang Chen pkucheng...@gmail.com wrote:
 I've done the coding challenge for this idea, with the code here:

 https://github.com/elephantgcc/gsoc-2013/blob/master/ApertiumFilter.py

As the project is listed as a C++ project, the coding challenge ought
to also be carried out in C++. As is, we don't know whether or not you
can even compile C++, let alone write it.




Re: [Apertium-stuff] Problem in generating tagged corpus

2013-04-12 Thread Jimmy O'Regan
On 12 April 2013 22:50, Mohit Aggarwal mohit@gmail.com wrote:
 Hi,

 I tried to tag a corpus by using moses as given in the coding challenge page
 http://wiki.apertium.org/wiki/Generating_lexical-selection_rules_from_a_parallel_corpus
 .
 I first cleaned the corpus by

 perl (path to your mosesdecoder)/scripts/training/clean-corpus-n.perl
 europarl-v7.es-en es en europarl.clean 1 40

 then I tried to tag the corpus by
 nohup cat europarl.clean.en | apertium-destxt |\
  apertium -f none -d /home/fran/source/apertium-en-es en-es-pretransfer \
  > europarl.tagged.en &

 But each time I execute this command the output tagged file contains
 different
 number of lines which is not equal to the number of lines in the input file.


 Please tell me is there something I'm doing wrong.

The most obvious thing that seems wrong to me is that you probably
don't have a directory named '/home/fran/source/apertium-en-es'



Re: [Apertium-stuff] Apertium PMC election: election board

2013-04-09 Thread Jimmy O'Regan
On 8 April 2013 11:45, Kevin Brubeck Unhammer unham...@fsfe.org wrote:
 Mikel Forcada m...@dlsi.ua.es writes:

 [...]

 The current temporary census consists of:

 As it has been more than 7 days, this is considered the definitive
 census of Committers with right to vote.


I think something went wrong between these two stages, as Jonathan at
least was unaware that the election was taking place -- voter turnout
was much lower this time around than last, perhaps something went
amiss during the census? And if so, wouldn't it be appropriate to
re-open it?



Re: [Apertium-stuff] Apertium PMC election: election board

2013-04-09 Thread Jimmy O'Regan
On 9 April 2013 17:32, Xavi Ivars xavi.iv...@gmail.com wrote:
 2013/4/9 Jimmy O'Regan jore...@gmail.com


 I think something went wrong between these two stages, as Jonathan at
 least was unaware that the election was taking place -- voter turnout
 was much lower this time around than last, perhaps something went
 amiss during the census? And if so, wouldn't it be appropriate to
 re-open it?


 I wouldn't say it was much lower. It has decreased, but I don't think the
 number is so low that we could think something strange happened (28
 registered voters in 2011 [1] and 22 in this year [2]).

Aha. Sorry, I somehow had the impression that the number was much
higher last time.

In any case, as Mikel has pointed out on IRC, the decision is the
election board's.



Re: [Apertium-stuff] Android ideas (was: Google joins Apertium in providing offline translation on Android.)

2013-04-08 Thread Jimmy O'Regan
On 8 April 2013 07:47, Kevin Brubeck Unhammer unham...@fsfe.org wrote:
 Mikel Artetxe artet...@gmail.com writes:
 As for Google Translate's app integrating better in Android, it is
 true that it has some great features that Apertium's app misses.
 Implementing some of them (like offline OCR[1], which was suggested
 during last GSoC) would be nice and relatively easy, but some others
 (like TTS or voice recognition, at least for all the minor languages
 that Apertium supports) would probably be unachievable for us.

 Doesn't Android come with some recognition and TTS built-in?

Kind of, but the ASR is pretty limited (command and control only), and
there are either no tools available for adapting the language data, or
the tools are only available as Windows binaries (and even then, you
don't get the full set of tools). The ASR most people think of when
they think of Android is a proprietary Google-branded add-on, and
maybe they've added an offline mode in more recent versions, but it at
least used to be true that it did nothing more than record a sound
file and send it to a Google server to be processed (there's
equivalent code in the Chrome tree. It's really not interesting).



Re: [Apertium-stuff] Google joins Apertium in providing offline translation on Android.

2013-04-07 Thread Jimmy O'Regan
On 7 April 2013 20:08, Mikel Artetxe artet...@gmail.com wrote:
 As for Google Translate's app integrating better in Android, it is true that
 it has some great features that Apertium's app misses. Implementing some of
 them (like offline OCR[1], which was suggested during last GSoC) would be
 nice and relatively easy, but some others (like TTS or voice recognition, at
 least for all the minor languages that Apertium supports)

TTS is not a big problem. eSpeak is available for Android (via the
Eyes-Free project), and I think CMU Flite is too. I added 'generate
with tags' mode to lt-proc for exactly this purpose, but a wrapper to
pick out the ambiguous words and annotate with, say, SSML would be
needed (not a whole lot of work, though).

ASR is more of a problem. PocketSphinx is available for Android, but
there are very few languages with available acoustic models. (If you
want to help to change that, VoxForge (http://www.voxforge.org/) are
building open data for ASR). The English model is relatively well
developed, but they have models for other languages.

 write about topics -and apps- suggested by readers. Wouldn't it be nice to
 suggest them to write an article about Apertium's app? It's just an idea,
 perhaps somebody has already tried something like that...

I'd assume that nobody has, marketing is not a project strong point.
If you have ideas about how we can change that, I know I'd love to
hear them.



Re: [Apertium-stuff] The terrible tagger was :Re: New page about transfer rules ready (in French)

2013-04-02 Thread Jimmy O'Regan
On 2 March 2013 02:52, Jimmy O'Regan jore...@gmail.com wrote:
 On 1 March 2013 12:39, Per Tunedal per.tune...@operamail.com wrote:
 Hmm It's selected all the time! That's a bit confusing. Why does the
 tagger choose something previously unknown (man) instead of the
 indefinite article (en)?


 Because it's getting the coarse tag 'PRN', which was presumably the
 most common of the available options in the corpus the tagger was
 trained on.

 'PRN' contains the tags-item 'prn.*', so it catches a lot.

I've made a lint tool for TSX: https://github.com/jimregan/apertium-tsx-lint
This is one of the errors it will catch, in case it comes up again in future:

$ cat examples/multimatch.tsx
<?xml version="1.0" encoding="UTF-8"?>
<tagger name="multimatch">
<tagset>
  <def-label name="TESTMATCH">
    <tags-item tags="prn.*"/>
  </def-label>
</tagset>
</tagger>

$ echo "foo<prn><subj>/foo<prn><obj>" | perl apertium-tsx-lint.pl examples/multimatch.tsx
MASKED_AMBIGUITY: TESTMATCH (4) matches more than one analysis:
INPUT: foo<prn><subj>/foo<prn><obj>
MATCHED: <prn><subj>/<prn><obj>

(Unlike libxml2, Perl's XML::Parser gives the correct line numbers :)
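
The idea behind the check (one label swallowing more than one analysis of the same ambiguous word) is simple enough to sketch. This is not the Perl code from the repository, just a rough Python illustration with made-up function names, assuming analyses are written as lemma<tag><tag> separated by '/', tags in a tags-item are dot-separated, and '*' stands for any run of tags:

```python
import re


def tags_item_to_regex(tags):
    """Turn a tags-item pattern such as 'prn.*' into a regex over '<prn><subj>' strings.

    Assumed semantics: tags are dot-separated, and '*' stands for any
    run of zero or more tags.
    """
    parts = []
    for tag in tags.split("."):
        parts.append("(?:<[^<>]+>)*" if tag == "*" else re.escape(f"<{tag}>"))
    return re.compile("".join(parts))


def masked_analyses(lexical_unit, tags):
    """Return the tag strings of every analysis that the pattern matches in full."""
    pattern = tags_item_to_regex(tags)
    matched = []
    for analysis in lexical_unit.split("/"):
        # Everything from the first '<' onwards is the tag string.
        tag_string = analysis[analysis.find("<"):] if "<" in analysis else ""
        if tag_string and pattern.fullmatch(tag_string):
            matched.append(tag_string)
    return matched
```

If masked_analyses returns more than one tag string for a single label, the tagger cannot tell those analyses apart, which is the situation with 'PRN' here.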



Re: [Apertium-stuff] Apertium-stuff Digest, Vol 71, Issue 48

2013-03-29 Thread Jimmy O'Regan
On 29 March 2013 03:13, Anand Soni anand.92.s...@gmail.com wrote:
 Hello!

 By the 'toolbar', I just meant the online translation platform of Apertium
 (http://www.apertium.org).

The main reason I asked is because your idea is quite vague, and I
prefer to think in more concrete terms.

It's still not clear to me whether you are proposing to integrate an
existing package for anaphora resolution into something like Apertium
AWI in a similar way to spelling and grammar checking, to provide a
visual indication to a human translator of pronouns that may have been
incorrectly translated; or, if you're proposing to add a module that
aims to improve the automatic translation of these pronouns.

 And presently, it is not clear to me how I will
 go about integrating it with the translator. I want ideas on whether this
 will be a good addition to Apertium or not. I will definitely think on the
 integration part.

I was rather hoping that you would make some attempt to explain how
you would go about integrating it, because doing so could have made
what you're proposing more clear. Until I know what exactly you're
proposing, I can't tell you whether or not it's a good idea.


Re: [Apertium-stuff] Apertium-stuff Digest, Vol 71, Issue 40

2013-03-25 Thread Jimmy O'Regan
On 25 March 2013 03:20, Anand Soni anand.92.s...@gmail.com wrote:
 Now, I understand and believe you, Sir, that this would be rather difficult.

We're not that formal. None of us has received a knighthood, there are
no 'Sirs' here :)

 I too think that I should work on a project with a higher probability
 of 'success'.

Good. It's quite easy, when dealing with something new, to
underestimate the difficulty of the task, and three months is not a
lot of time.

 I have been thinking of other ideas.

Good. I'd be interested in hearing them. You should join the IRC
channel - it's a little easier to talk about ideas that are maybe not
fully formed when communication is in realtime.

 I will definitely keep in
 touch with you regarding any idea that comes up in my mind so that I can be
 guided by you on whether I should do it or not.

I'm not seeking to veto your ideas. If there is something that I think
may be more difficult than you think, I will tell you -- and, if you
want a clarification, please ask! -- but if you want to insist, then
insist!

 And really sorry for the way
 I replied to the previous mails. I assure you that it was not intentional
 and definitely, will not happen again.

There's no need to apologise. The point of GSoC is to get students
involved in Open Source projects, and participating in mailing lists
is part of that, and comes with a set of conventions that are not
obvious at first.

Hopefully, you can appreciate that trimming the email you're replying
to down to just the parts you wish to address is a better way to
communicate than allowing your words to be lost! (I used 'ignore' in
my email yesterday, because that's how it would appear to you; I hope
you will pay heed to Bernard's reply, that he tried, and failed, to
find what you had written).



Re: [Apertium-stuff] Apertium-stuff Digest, Vol 71, Issue 37

2013-03-24 Thread Jimmy O'Regan
First of all...

Replying to a digest email the way you did is a very good way to be
ignored, because your reply is buried in the middle of many emails
that have already been read.

It also gives the impression that you are either too lazy or too
self-important to consider the time of the others on the list, who
have to search for your answer. Don't do that: that's not the
impression you want to give.

Now...

On 24 March 2013 04:14, Anand Soni anand.92.s...@gmail.com wrote:
 Sorry for using the wrong terminology. It is translation, not
 transliteration. I kind of changed the whole idea by using this word! I
 do not know how difficult that would be but,

I *do* know, that's why I told you it's too difficult. Perhaps you
feel I'm underestimating you, because of your slip-up in terminology,
but let me assure you that this is not the case. This would be a
difficult project for an exceptional, experienced student.

 I would definitely like to start
 working on this. I will keep asking things here and to my mentor if I get
 stuck. Also, I will definitely figure out things myself. But, this is the
 project that I would like to do as of now. If I change my idea, I will let
 you know.

We can only accept a small fraction of the proposals we receive - if
we are even fortunate enough to be selected again - and given the
choice between an ambitious, but unrealistic, proposal that seems
likely to fail, and a more modest, but realistic, proposal that seems
likely to succeed, we, as mentors, will typically choose the latter.

So your choice is this: pick a more realistic project, or make us
believe that *you* can make the project realistic.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Apertium-stuff Digest, Vol 71, Issue 26

2013-03-23 Thread Jimmy O'Regan
On 23 March 2013 12:19, Anand Soni anand.92.s...@gmail.com wrote:
 Hello Everyone!

 One of the project ideas that I would like to introduce is an English-Hindi
 transliteration pair

Transliteration? Now, do you actually mean transliteration, or do you
mean translation? You'll have to be quite careful with your
terminology.

 that Apertium does not support currently. This language
 pair, if implemented and released, would be very useful to millions of
 Indians and would be a nice quality addition to the Apertium toolbox. Also,
 I plan to do word-sense disambiguation, which is also one of the proposed
 ideas of Apertium. Currently, Apertium has only a limited number of language
 pairs. And this addition will be valuable. Please give me your feedback on
 this idea. I will think further on the implementation details and soon
 submit a proposal for the same if this idea sounds good to you.

The standard advice applies: a language pair involving English is too
difficult for someone new to Apertium, and even for someone with a lot
of experience, it would be difficult to complete in 3 months.

I suggest that you take a look at translation between more closely
related languages, where the learning curve is lower, and the chance
of success is higher.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Idea for GSOC: tools to train supervised taggers

2013-03-20 Thread Jimmy O'Regan
On 20 March 2013 21:59, Francis Tyers fty...@prompsit.com wrote:
 I've added it to the ideas page, if anyone would like to expand on it,
 the read more page is here:

 http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Interface_for_creating_tagged_corpora

I assumed Gema was talking specifically about a web interface
('upload') rather than a desktop tool. (IIRC, Jacob's apertium-viewer
can be used for that).

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Apertium in android

2013-03-18 Thread Jimmy O'Regan
On 18 March 2013 10:19, karunakar medamoni kannaiah.chi...@gmail.com wrote:
 Hi Jacob

 Thanks for your reply. I have downloaded the Apertium toolkit for Java jar
 file, lttoolbox.jar. We are working on translation from Sanskrit to
 Hindi (http://sanskrit.uohyd.ac.in/scl/). We have a compiled bin file for
 Sanskrit.

 Let me explain what I am actually doing. I am trying to get the morph
 information of a word from the bin file, which is around 17MB in size.
 Initially I integrated the Apertium code with my code, and it runs fine in
 Eclipse. On Android, however, I am getting an out of memory error. Please
 find the logcat of my Android application. I also posted about this error
 in Android forums. Later I thought that, since Apertium has also released
 an Apertium translator for Android, I should post my query to Apertium.

I think memory issues like this are part of what prompted Jacob to
work on memory mapping the transducers. Try using the latest SVN
version of lttoolbox-java and see if you still have the same issue.


-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Idea for GSOC

2013-03-13 Thread Jimmy O'Regan
On 12 March 2013 16:58, Tino Didriksen tino.didrik...@gmail.com wrote:
 On Tue, Mar 12, 2013 at 12:21 PM, Francis Tyers fty...@prompsit.com wrote:

 On Tue, 12 March 2013 at 10:55 +, Jimmy O'Regan wrote:
 

  Sorry, I wasn't clear enough. The idea is segmentation. I said that
  segmentation by itself would probably make a good project, where "by
  itself" was intended to mean that the project would just be
  segmentation.
 
  In practice, you will also have to work on a language pair where this
  can be used. zh_CN-zh_TW is a perfect candidate, because segmentation
  is not strictly necessary for this language pair - i.e., you use it to
  demonstrate that segmentation is working, without _needing_ to. In
  that regard, you will need to also allot some time to developing that
  language pair, though it will not be the primary focus of the project.

 So this would be for languages where word boundaries are not written ...
 Chinese/Thai/Lao/Khmer/Burmese etc. ?

 Yes, that could be interesting. But, if it was the case that the project
 would be for just segmentation, then ideally it would be tested on more
 than one language.


 Sounds trivially done by making a thin shell over ICU's BreakIterator:
 http://userguide.icu-project.org/boundaryanalysis

Not really. That's aimed more at segmentation for display purposes -
wrapping lines and the like - where things like ambiguity in the
segmentation are not a pressing concern. We can already get something
equivalent in lttoolbox, by setting the dictionary to postblank by
default.
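
To illustrate why ambiguity matters here, a minimal Python sketch follows. It is not part of lttoolbox or ICU; the function name and the toy lexicon are made up for the example, and unspaced English stands in for a language written without word boundaries. It simply enumerates every way of splitting a string into dictionary words.

```python
def segmentations(text, lexicon):
    """Return every way to split text into words found in lexicon."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this prefix and segment whatever is left over.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical toy lexicon, for illustration only.
LEXICON = {"the", "there", "therein", "rein", "in", "here"}

for words in segmentations("therein", LEXICON):
    print(" ".join(words))
```

A line-breaking segmenter only needs one acceptable set of break points; a translation pipeline has to choose among all of the candidates, which is the part a thin wrapper would not address.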

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Idea for GSOC

2013-03-13 Thread Jimmy O'Regan
On 12 March 2013 11:21, Francis Tyers fty...@prompsit.com wrote:
 On Tue, 12 March 2013 at 10:55 +, Jimmy O'Regan wrote:
 In practice, you will also have to work on a language pair where this
 can be used. zh_CN-zh_TW is a perfect candidate, because segmentation
 is not strictly necessary for this language pair - i.e., you use it to
 demonstrate that segmentation is working, without _needing_ to. In
 that regard, you will need to also allot some time to developing that
 language pair, though it will not be the primary focus of the project.

 So this would be for languages where word boundaries are not written ...
 Chinese/Thai/Lao/Khmer/Burmese etc. ?


Yes, that's the idea.

 Yes, that could be interesting. But, if it was the case that the project
 would be for just segmentation, then ideally it would be tested on more
 than one language.

Ideally, yes. In practice... maybe. Language-independent segmentation
is not the most well-trodden path, and anything that I have seen that
claims language-independence was only tested on a single language. The
method used in the project I pointed to is one of the few that claims
language independence, but that implementation might have some (I
can't tell for sure, as the comments are in Chinese, but it doesn't
look like it).

I would require language independence as a project goal, but wouldn't
make it a hard requirement before midterms - more of an 'avoid the
obvious' guideline. As for actually testing how independent it is... I
don't know how that's going to work out. There are plenty of resources
for Chinese, and drastically fewer for everything else.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Idea for GSOC

2013-03-12 Thread Jimmy O'Regan
On 11 March 2013 18:04, sphinx jiang yishan...@gmail.com wrote:
 Hi,

 I would like to suggest an idea for the Apertium GSOC program. Several days ago
 I talked to Jimmy, and was enlightened by the idea "Segmentation by itself".


Sorry, I wasn't clear enough. The idea is segmentation. I said that
segmentation by itself would probably make a good project, where "by
itself" was intended to mean that the project would just be
segmentation.

In practice, you will also have to work on a language pair where this
can be used. zh_CN-zh_TW is a perfect candidate, because segmentation
is not strictly necessary for this language pair - i.e., you use it to
demonstrate that segmentation is working, without _needing_ to. In
that regard, you will need to also allot some time to developing that
language pair, though it will not be the primary focus of the project.

 The Hierarchical HMM segmentation ports, especially the
 imdict-chinese-analyzer, which does Chinese segmentation and is written in
 Java, could, I think, be ported to C++ and used in Apertium. Then we could
 have the program translate Chinese-ZH to Chinese-TW with its own segmentation.

 Is my idea possible to achieve? I am looking forward to your reply~~

A straightforward port will not be sufficient. The module will, at the
very least, also need to handle the Apertium stream format. Your
proposal should take this into account.
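
As a rough idea of what handling the stream format involves, here is a minimal Python sketch. It assumes only the basic layout, in which each lexical unit is written as ^surface/analysis/analysis$ and everything between units is a blank. The function name is hypothetical, the tags in the usage note are purely illustrative, and superblanks in [...] and some escaping corner cases are deliberately not handled.

```python
import re

# One lexical unit: ^ ... $ with no unescaped ^ or $ inside.
LU = re.compile(r"\^((?:\\.|[^\\^$])*)\$")


def parse_stream(stream):
    """Split a stream into ('blank', text) and ('lu', surface, analyses) items."""
    items = []
    pos = 0
    for match in LU.finditer(stream):
        if match.start() > pos:
            items.append(("blank", stream[pos:match.start()]))
        # Analyses are separated by '/' (a simple check for escaped slashes).
        surface, *analyses = re.split(r"(?<!\\)/", match.group(1))
        items.append(("lu", surface, analyses))
        pos = match.end()
    if pos < len(stream):
        items.append(("blank", stream[pos:]))
    return items
```

For example, parse_stream("^the/the<det>$ ^man/man<n>/man<vblex>$") gives two 'lu' items separated by a single-space 'blank' item. A segmenter would sit in front of this, turning raw text into such units, and would have to pass blanks through untouched.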

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] ask for help

2013-03-08 Thread Jimmy O'Regan
On 8 March 2013 16:34, sphinx jiang yishan...@gmail.com wrote:
 dear authors of apertium:


Hi.

  I am a beginner with Apertium who wants to make a Chinese-related
 language pair, so I need some help from the author of
 http://apertium.svn.sourceforge.net/viewvc/apertium/incubator/apertium-zh_CN-zh_TW/

  So would you please tell me how I can get in touch with the
 author? Thank you very much~~


The author of that module was a student who was hoping to apply for
Google Summer of Code. It was partly a proof-of-concept to see if it
could be done, without adding a specific module for segmentation
(which is still a matter to be determined, though I do recall that the
student was quite pleased with the results).

If you have any questions, you can ask them here, and we'll do our
best to answer.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Fwd: Apertium PMC Election

2013-03-03 Thread Jimmy O'Regan
On 3 March 2013 19:25, Juan Pablo Martínez Cortés jpm...@unizar.es wrote:
 I'm not sure, but in case I am entitled to vote:

Sure you are - if you are a registered developer (i.e., if you can
commit to the SVN repository), you're entitled to vote.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] The terrible tagger was :Re: New page about transfer rules ready (in French)

2013-03-01 Thread Jimmy O'Regan
On 1 March 2013 09:20, Per Tunedal per.tune...@operamail.com wrote:
 Next mail??

I sent two emails, one after the other. The second had an example
(both forms of 'man') of your coarse tagset being too broad, which
underlined the point I was trying to make in the first.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] google translate altering the meaning in eu-es

2013-03-01 Thread Jimmy O'Regan
On 28 February 2013 09:52, Antonio Toral ato...@computing.dcu.ie wrote:
 Hi apertiumers,

 I came across a news story in Basque about someone getting in trouble
 for speaking in Basque to the police
 http://www.ateakireki.com/2013/02/lizarrako-gazte-bat-epaituko-dute.html

 I used Google and Apertium to translate it into Spanish to read it and...

 Google changes "a youngster speaks in Basque in front of the police" for
 "a youngster speaks in ENGLISH in front of the police", or the dangers
 of statistical machine translation!

It's a localisation artifact. A less obvious example of the same
factors leads to Austria becoming Ireland
(http://itre.cis.upenn.edu/~myl/languagelog/archives/005492.html): for
example, on a news website there could be several instances of the
phrase 'últimas noticias en español' in the Spanish edition, where the
English equivalent would have 'recent news in English'. The language
model will typically favour 'in English' because it occurs in English
much more often (in a typical corpus) than 'in Spanish'.

Another typical error, due to the terrible number handling in most SMT
systems, causes 'millón' to become 'billion': Spanish typically uses
the long scale, so 'billón' almost never collocates with 'billion', and
when '5.000 millón -> 5 billion' has been naively converted to '_NUM_
millón -> _NUM_ billion', you're down to little more than a coin flip
whether the output will be million or billion. (Sergio had a paper on
mixing Apertium and Moses, where he got better results partly by
adding better number handling).
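
The coin flip can be seen in a minimal Python sketch. Everything in it is an assumption made for illustration: the regular expression, the function name and the two training pairs are invented, and no particular SMT toolkit is being described.

```python
import re

# Digits with optional '.' or ',' group separators, e.g. 5, 5.000, 1,5
NUM = re.compile(r"\d+(?:[.,]\d+)*")


def normalise(text):
    """Replace every number with the placeholder _NUM_."""
    return NUM.sub("_NUM_", text)


# Hypothetical training pairs (Spanish, English); both are correct.
pairs = [
    ("5.000 millones", "5 billion"),
    ("5 millones", "5 million"),
]

# Collect, for each normalised source phrase, the targets seen with it.
table = {}
for es, en in pairs:
    table.setdefault(normalise(es), set()).add(normalise(en))

print(table)
```

After normalisation both Spanish phrases collapse into the same key, so that key is paired with both '_NUM_ million' and '_NUM_ billion', and the magnitude that distinguished them is gone.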

It's not just SMT, though: simplistic filters lead to the sprinter
Tyson Gay becoming Tyson Homosexual
(http://languagelog.ldc.upenn.edu/nll/?p=294), and automated currency
converters to 50 Cent becoming RM1.50
(http://languagelog.ldc.upenn.edu/nll/?p=3915), and it's not like
humans don't make translation errors either - I'm not sure if it's a
coincidence that Moses is named for a biblical figure who was the
subject of a quite long lasting translation error
(http://en.wikipedia.org/wiki/Moses_(Michelangelo)#Horns)

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] The terrible tagger was :Re: New page about transfer rules ready (in French)

2013-03-01 Thread Jimmy O'Regan
On 1 March 2013 12:39, Per Tunedal per.tune...@operamail.com wrote:
 Hi again,
 thanks for the thorough answer. I've glanced through it and have a few
 quick comments:

 1. The problem right now is that the pronoun "man" is chosen instead of
 the indefinite article "en":
 "en man" (= "a man") becomes "man man"! (Yes, yet another man!). Not
 that it's never chosen!


The details don't really make a difference, as long as you understand
the general idea. Nothing can ever be chosen based on right context.
In the tagger.

It can be done in transfer, though. I'd prefer to explain this with an
example, but I'm rather fussy about transfer, and I perhaps could be
convinced to adapt those rules, but not without repeatedly questioning
the sanity/competence/parentage/predilections of whoever wrote them,
so it'd probably be better if we found time when I could simply
rewrite them.

 2. Will it be easier or harder for the tagger if I split the pardef for
 the pronoun "man" (the way that's common in Apertium) into a pronoun ("man"
 - "en") and a determiner ("ens")? "Man blir glad när ens barn ger en
 blommor." = "You get happy when your children give you flowers."


I'm not sure what you mean here.

 3. And what about the dialectal variant that uses "en" instead of "man": "en"
 - "en" - "ens" (now very popular among young trendy people). What's the
 least confusing way to handle it?

'en' as a form of 'man' has the same tags, so it would need to 1) be
part of a new coarse tag, and 2) that coarse tag would need to use the
'lemma' attribute:

  <def-label name="PRNENSUBJ" closed="true">
    <tags-item lemma="en" tags="prn.pers.p3.ut.sg.nom"/>
  </def-label>

(However, if 'en' and 'man' both lead to the same translation, you can
let it be discarded).

 It looks to me like you've created a new ambiguity class, by adding
 'en' as an analysis of 'man'. If there is no corresponding coarse tag
 in the tagger .tsx file *and* if the tagger has not been trained to
 determine probabilities for that tag, then it will never, ever, be
 selected because you have not made it possible for the tagger to
 select it.

 Hmm. It's selected all the time! That's a bit confusing. Why does the
 tagger choose something previously unknown ("man") instead of the
 indefinite article ("en")?


Because it's getting the coarse tag 'PRN', which was presumably the
most common of the available options in the corpus the tagger was
trained on.

'PRN' contains the tags-item 'prn.*', so it catches a lot.

It's probably easiest to think of 'coarse tag' as a category, by the way.
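
A minimal Python sketch of the category idea, with loud caveats: the label list below is hypothetical and only modelled on the labels discussed in this thread, and trying the labels in order so that the more specific one wins is an assumption made for the example, not a description of how apertium-tagger actually resolves overlapping labels.

```python
from fnmatch import fnmatchcase

# (label name, required lemma or None, tag pattern). Hypothetical entries.
LABELS = [
    ("PRNENSUBJ", "en", "prn.pers.p3.ut.sg.nom"),
    ("PRN", None, "prn.*"),
]


def coarse_tag(lemma, fine_tags):
    """Return the first label whose lemma and tag pattern both match."""
    for name, label_lemma, pattern in LABELS:
        if label_lemma is not None and label_lemma != lemma:
            continue  # this label is restricted to a different lemma
        if fnmatchcase(fine_tags, pattern):
            return name
    return None


print(coarse_tag("en", "prn.pers.p3.ut.sg.nom"))
print(coarse_tag("man", "prn.pers.p3.ut.sg.nom"))
print(coarse_tag("man", "n.ut.sg.ind.nom"))
```

Under those assumptions the first call lands in PRNENSUBJ, the second falls through to the broad PRN category because the lemma does not match, and the third matches nothing, which is the situation where a coarse tag is missing.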

 This is different from what Jacob told you: essentially, that a bigram
 tagger simply lacks the context to make a correct determination
 between pronoun and determiner - in many cases, this requires right
 context (i.e., knowing what the next word is), but the tagger has only
 left context (i.e., the previous word).

 Fine. Now I know how the tagger works. The previous word. I will ponder
 on that one.


There's more to it than that, but writing an email that long would
probably have introduced more confusion (and hurt my wrists :)

 at a minimum, those could be adapted[1]:


 I will have to treat all other variants as well: p1, p2, and plural, for
 other pronouns.


Yes, that's why I gave you a set of commands to find what those tags are :)

 Really, you need to adapt the .tsx files and retrain the tagger.

 Yes, it couldn't be worse, could it? Francis tells me I need to add a
 lot more words, though. But I don't think it would do any harm if I
 retrained the tagger with the few additions and corrections I've already
 made.

Probably not.

 If I've retrained it once, I suppose it would be easy to do
 it again when even more words are added.


Yes.

 [1] There's no need to keep 'prn.subj.*' or 'prn.obj.*' because
 nothing in the dictionaries matches.

 Right. That's because I've changed all subj to nom and all obj to
 acc to be compatible with other language pairs. Was that a bad thing?

Not in itself, but the .tsx file also needs to be updated.

It's also important to note that if you add new categories (=coarse
tags), you will need to retrain the tagger - simply updating the rules
will not be enough. Otherwise - if the coarse tags have not changed,
just the entries in them (as I did with PRNOBJ and PRNSUBJ) - it's
fine.

 $ lt-expand apertium-sv-da.sv.dix | [etc.]

 Would this be a list of words that are not at all caught by the tagger,
 or what?

No, words that are _partially_ caught by the tagger - i.e., where
there are missing coarse tags.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you


Re: [Apertium-stuff] The terrible tagger was :Re: New page about transfer rules ready (in French)

2013-02-28 Thread Jimmy O'Regan
On 28 February 2013 09:01, Per Tunedal per.tune...@operamail.com wrote:
 Hi,
 it might be helpful with some information on how the tagger (le Tageur
 redoutable?) actually works. How can I help the tagger when I add words
 and paradigms to the dictionaries? I suppose the structure of the
 dictionaries, and specifically the paradigms, has a great impact on the
 work of the tagger.

Not directly. The tagger is entirely independent of the dictionaries.

The fine tags (the tags coming from the dictionary) need to have
corresponding coarse tags (the tags used by the tagger) that are
sufficient to disambiguate the text. Coarse tags group together
equivalent fine tags, which helps to alleviate the data sparseness
problem: not all words occur in all contexts, so we group them
together so that what we know about classes of words applies to all
words in that class. The coarse tags should be as broad as possible,
but not too broad - if two word forms match the same coarse tag, then
that tag needs to be split, for example. See my next mail.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Apertium PMC Election

2013-02-27 Thread Jimmy O'Regan
On 27 February 2013 10:51, Mikel L. Forcada m...@dlsi.ua.es wrote:
 Dear Apertiumers

 It's time for the Apertium assembly of committers to elect a new Project
 Management Committee. We need to update our census. I will take care of
 that. This message is being sent to the apertium-stuff mailing list.

 This message will also be sent to all developers to their @users.sf.net
 addresses.

 If you receive this message and you want to register to vote in this
 election, please reply to this message, adding your SourceForge
 developer ID and your full name before March 4 at 23:59 CET.

SF id: jimregan
SF name: Jimmy O Regan

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Apertium-viewer (r42723) crashes

2013-02-26 Thread Jimmy O'Regan
On 26 February 2013 16:54, Ilnar Salimzyan ilnar.salimz...@gmail.com wrote:
 org.apertium.lttoolbox.process.FSTProcessor.analysis(FSTProcessor.java:886)
 at org.apertium.lttoolbox.LTProc.doMain(LTProc.java:284)
 at org.apertium.pipeline.Dispatcher.doLTProc(Dispatcher.java:297)
 at org.apertium.pipeline.Dispatcher.dispatch(Dispatcher.java:381)
 at apertiumview.Pipeline$PipelineTask.run(Pipeline.java:123)
 at apertiumview.Pipeline$1.run(Pipeline.java:41)

(IIRC) This is the equivalent of lt-proc -a, but your mode uses HFST,
which is probably what's causing the problem. I think the pipeline has
some hardcoded assumptions of what an Apertium pipeline consists of,
and this has probably not been tested with Apertium+HFST.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Cheap bilingual dictionary

2013-02-14 Thread Jimmy O'Regan
On 14 February 2013 09:16, Per Tunedal per.tune...@operamail.com wrote:
 Thank you!
 At last I can start working :-)
 Per
 PS Maybe this should be added to the Wiki?

The various editions of Wikipedia are having interwiki links migrated
to Wikidata, so it might be better to look into that. If nothing else,
it's a cleaner source of data.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Cheap bilingual dictionary

2013-02-13 Thread Jimmy O'Regan
On 13 February 2013 11:27, Per Tunedal per.tune...@operamail.com wrote:
 Hi,
 I'm experimenting with the script on the page
 http://wiki.apertium.org/wiki/Building_dictionaries .

 I'm repeatedly getting an error message:

 'import sitecustomize' failed; use -v for traceback

 All the same, I get results:

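 <e><p><l>plånbok<s n="n"/></l><r>Portemonnæ<s n="n"/></r></p></e>
 <e><p><l>programkod<s n="n"/></l><r>Kildekode<s n="n"/></r></p></e>
 <e><p><l>register<s n="n"/></l><r>Register<s n="n"/></r></p></e>
 <e><p><l>replik<s n="n"/></l><r>Replik<s n="n"/></r></p></e>
 <e><p><l>scanner<s n="n"/></l><r>Skanner<s n="n"/></r></p></e>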

 The Danish national characters are distorted, though.

 Any suggestions?


cat [your file] | perl -MEncode -ane 'chomp;
  if (m!(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)!) {
    print "$1$2$3" . encode("iso-8859-1", decode("utf-8", $4)) . "$5\n";
  }'


-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Pan-Lexical database on line

2013-02-13 Thread Jimmy O'Regan
On 13 February 2013 16:03, Jimmy O'Regan jore...@gmail.com wrote:
 On 13 February 2013 15:14, Mikel Forcada m...@dlsi.ua.es wrote:
 On 02/13/2013 03:06 PM, Federico Gobbo wrote:
 It's a pity that there is no indication about the copyright, but I think
 that it can be used anyway by us.
 I think that when copyright is not explicitly regulated it amounts to
 "all rights reserved" according to the Berne Convention. Therefore, my
 opinion is quite the opposite.


 Yes. From a quick look at their sources, the project looks like a
 lawsuit waiting to happen, so I would avoid it like the plague.


On the upside, in cleanly-licensed (CC-BY-SA) terms, there's a CSV
file from the DBPedia-Wiktionary project that provides translations
from both en.wiktionary and de.wiktionary
(http://downloads.dbpedia.org/wiktionary/dumps/de/wiktionary_de+en_2012-04-01_translations.csv.gz).
The entries look like this:

aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,eel-like,English
aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,eellike,English
aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,węgorzowaty,Polish

The endpoint is offline at the moment, but the aim of the project is
to provide a linked-data view on multiple editions of Wiktionary
simultaneously. (i.e., it uses a proper database, but it's a graph
database rather than an SQL database). Inflection is not currently
extracted, because of the sheer number of templates involved, but most
of the rest of Wiktionary is.
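
A minimal Python sketch of pulling translation pairs out of rows like these. The six-column layout (word, language, part of speech, gloss, translation, target language) is inferred from the three sample entries only, not from any documentation of the dump, and the function name is made up.

```python
import csv
import io

# The three sample rows quoted in this mail, one entry per line.
SAMPLE = """\
aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,eel-like,English
aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,eellike,English
aalartig,German,Adjective,einem Aal ähnlich; wie ein Aal,węgorzowaty,Polish
"""


def translations(handle, source_lang, target_lang):
    """Yield (source word, part of speech, target word) for one language pair."""
    for row in csv.reader(handle):
        if len(row) != 6:
            continue  # skip rows that do not fit the assumed layout
        word, lang, pos, _gloss, target, target_language = row
        if lang == source_lang and target_language == target_lang:
            yield word, pos, target


for entry in translations(io.StringIO(SAMPLE), "German", "English"):
    print(entry)
```

The gloss column is ignored here, but it is what would let several senses of the same word be kept apart when building bidix entries.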

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Cheap bilingual dictionary

2013-02-13 Thread Jimmy O'Regan
On 13 February 2013 17:53, Per Tunedal per.tune...@operamail.com wrote:
 Hi,
 I just found out that the script for generating bidix entries works
 alright when translating from the left language to the right language.
 Translating from Swedish to Danish works OK in the pair sv-da, but if I
 try to translate a Danish text, the lexical entries are reversed:

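 <e><p><l>bagepulver<s n="n"/></l><r>Bakpulver<s n="n"/></r></p></e>
 <e><p><l>basilikum<s n="n"/></l><r>Basilika_<s n="n"/></r></p></e>
 <e><p><l>blade<s n="n"/></l><r>Blad<s n="n"/></r></p></e>
 <e><p><l>blegselleri<s n="n"/></l><r>Selleri<s n="n"/></r></p></e>
 <e><p><l>blini<s n="n"/></l><r>Blinier<s n="n"/></r></p></e>
 <e><p><l>blomkål<s n="n"/></l><r>BlomkÃ¥l<s n="n"/></r></p></e>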

You still have the same problem: 'BlomkÃ¥l' should (presumably) be 'Blomkål'
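
For reference, a minimal Python sketch of how this kind of corruption arises and is reversed. It assumes the usual cause, UTF-8 bytes being read as ISO-8859-1; nothing in the thread confirms which step of the script does this.

```python
word = "Blomkål"

# UTF-8 bytes misread as ISO-8859-1: each non-ASCII letter becomes two.
garbled = word.encode("utf-8").decode("iso-8859-1")
print(garbled)            # BlomkÃ¥l

# Reversing the misreading restores the original.
restored = garbled.encode("iso-8859-1").decode("utf-8")
print(restored == word)   # True
```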

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Cheap bilingual dictionary

2013-02-13 Thread Jimmy O'Regan
On 13 February 2013 21:00, Per Tunedal per.tune...@operamail.com wrote:
 Well,
 I ran your script afterwards, and the Swedish characters were corrected
 - but the Danish ones were damaged:

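 Before:
 <e><p><l>BlomkÃ¥l<s n="n"/></l><r>blomkål<s n="n"/></r></p></e>
 <e><p><l>BlÃ¥mussla<s n="n"/></l><r>blåmusling<s n="n"/></r></p></e>
 <e><p><l>Samlag<s n="n"/></l><r>bolde<s n="n"/></r></p></e>
 <e><p><l>Bomb<s n="n"/></l><r>bombe<s n="n"/></r></p></e>
 <e><p><l>Brandy_<s n="n"/></l><r>brandy<s n="n"/></r></p></e>
 <e><p><l>Hallonsläktet<s n="n"/></l><r>brombær<s n="n"/></r></p></e>
 <e><p><l>Bröllopstårta<s n="n"/></l><r>bryllupskage<s n="n"/></r></p></e>
 <e><p><l>Kvinnobröst<s n="n"/></l><r>bryst<s n="n"/></r></p></e>
 <e><p><l>Bröd<s n="n"/></l><r>brød<s n="n"/></r></p></e>
 <e><p><l>Bulgur<s n="n"/></l><r>bulgur<s n="n"/></r></p></e>
 <e><p><l>Bunsenbrännare<s n="n"/></l><r>bunsenbrænder<s n="n"/></r></p></e>
 <e><p><l>Böna<s n="n"/></l><r>bønne<s n="n"/></r></p></e>
 <e><p><l>Böna<s n="n"/></l><r>bønner<s n="n"/></r></p></e>

 after:
 <e><p><l>Blomkål<s n="n"/></l><r>blomk?l<s n="n"/></r></p></e>
 <e><p><l>Blåmussla<s n="n"/></l><r>bl?musling<s n="n"/></r></p></e>
 <e><p><l>Samlag<s n="n"/></l><r>bolde<s n="n"/></r></p></e>
 <e><p><l>Bomb<s n="n"/></l><r>bombe<s n="n"/></r></p></e>
 <e><p><l>Brandy_<s n="n"/></l><r>brandy<s n="n"/></r></p></e>
 <e><p><l>Hallonsläktet<s n="n"/></l><r>bromb?r<s n="n"/></r></p></e>
 <e><p><l>Bröllopstårta<s n="n"/></l><r>bryllupskage<s n="n"/></r></p></e>
 <e><p><l>Kvinnobröst<s n="n"/></l><r>bryst<s n="n"/></r></p></e>
 <e><p><l>Bröd<s n="n"/></l><r>br?d<s n="n"/></r></p></e>
 <e><p><l>Bulgur<s n="n"/></l><r>bulgur<s n="n"/></r></p></e>
 <e><p><l>Bunsenbrännare<s n="n"/></l><r>bunsenbr?nder<s n="n"/></r></p></e>
 <e><p><l>Böna<s n="n"/></l><r>b?nne<s n="n"/></r></p></e>
 <e><p><l>Böna<s n="n"/></l><r>b?nner<s n="n"/></r></p></e>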

 That's strange, because your script corrected the file translated in the
 other direction OK.

Yes, because it was expecting the corrupted characters to be on the
right, so to go the other way it would need to be:
perl -MEncode -ane 'chomp;if(m!(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)!){$rec=encode("iso-8859-1",decode("utf-8", $2));if($2 eq lc($2)){$rec=lc($rec);}; print "$1$rec$3$4$5\n";}'
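
For anyone who would rather not hard-code which side is damaged, the same repair can be sketched in Python. This is only an illustration, not the script used in this thread: it assumes the entries have the form <e><p><l>...<s n="n"/></l><r>...<s n="n"/></r></p></e>, and the names ENTRY, fix_mojibake and repair_entry are made up for the example. It tries the ISO-8859-1 -> UTF-8 round trip on both sides and keeps the original text whenever the round trip fails, so lemmas that are already correct are left alone.

```python
import re

# Assumed entry shape; other tag sets would need a different pattern.
ENTRY = re.compile(
    r'(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)'
)

def fix_mojibake(text):
    """Undo UTF-8 that was wrongly decoded as ISO-8859-1, if possible."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded (or not representable): leave it untouched.
        return text

def repair_entry(line):
    """Repair the left and right lemma of every matching entry in a line."""
    def _fix(match):
        return (match.group(1) + fix_mojibake(match.group(2)) + match.group(3)
                + fix_mojibake(match.group(4)) + match.group(5))
    return ENTRY.sub(_fix, line)

if __name__ == "__main__":
    import sys
    for raw in sys.stdin:
        sys.stdout.write(repair_entry(raw))
```

Because an already-correct 'å' is not valid UTF-8 once re-encoded as ISO-8859-1, the round trip raises an error for it and the text is kept as it was, which avoids the '?' damage shown above.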



Re: [Apertium-stuff] A more simple example for transfer rules

2013-02-03 Thread Jimmy O'Regan
On 2 February 2013 17:30, Bernard Chardonneau bechapert...@free.fr wrote:
 The wiki page of Francis about writing transfer rules is interesting
 (and it is good to have written it), but the example is not simple
 enough for me to know what to write in the different sections.

 For instance, in <def-cats>, you seem to describe separate words when
 transfer rules are supposed to work with groups of words.


No, it's defining tag categories. 'sg', 'pl', 'sp', and 'ND' (number
to be determined) fit into the category of 'number', so having a
category containing these elements allows us to treat any of these
items as one, rather than having to treat them individually. So, if
you want to have simple agreement between words, you can take the
contents of the relevant category, whatever it is, without having to
treat what it _really_ is. So instead of checking if one word contains
'<lit-tag v="sg"/>', then '<lit-tag v="pl"/>' ... etc., you can use
'<clip .../>' where the 'part' attribute is whatever you named the
number category.

So, if I have:
<def-attr n="nbr">
  <attr-item tags="sg"/>
  <attr-item tags="pl"/>
  <attr-item tags="sp"/>
  <attr-item tags="ND"/>
</def-attr>

and the input word was '^foo<n><sg>$'

then
<clip pos="1" side="sl" part="nbr"/>

equals 'sg'.

(Here, 'pos' has the same meaning as with '<b>' - the number of the
word relative to the '<pattern-item>' that matched it - and 'side' is
either 'sl' (source language, or input) or 'tl' (target language, or
output)).
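
The lookup described above can be mimicked in a few lines of Python. This is a rough sketch of the idea only, not how apertium-transfer is implemented: DEF_ATTRS and clip() are hypothetical names, tags are matched as plain strings, and the result is returned without angle brackets to match the description above.

```python
import re

# Hypothetical stand-in for the <def-attr n="nbr"> block: name -> tags.
DEF_ATTRS = {"nbr": ["sg", "pl", "sp", "ND"]}

def clip(lexical_unit, part, def_attrs=DEF_ATTRS):
    """Return the first tag of the lexical unit that belongs to `part`.

    An empty string is returned when no tag of the category is present.
    """
    tags = re.findall(r"<([^<>]+)>", lexical_unit)
    for tag in tags:
        if tag in def_attrs[part]:
            return tag
    return ""

# Simple agreement: copy whatever number the first word has,
# without checking for sg, pl, sp and ND one by one.
number_of_first_word = clip("^foo<n><sg>$", "nbr")
```

The same call works whatever the part of speech is, because the category only looks at the tags it lists, not at the rest of the analysis.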



Re: [Apertium-stuff] A more simple example for transfer rules

2013-02-03 Thread Jimmy O'Regan
On 2 February 2013 19:50, Bernard Chardonneau bechapert...@free.fr wrote:
 OK for that. And according to the wiki, as there can be kinds of regular
 expressions in

The implementation uses regular expressions, but that is an
implementation detail which should not be relied upon.

 <cat-item>, there can also be patterns with categories of
 words including a special tag (for instance a noun with an acc attribute).

The category matches the attribute, not a combination of them. So a
'case' category that matches 'nom' and 'acc' will match those
attributes wherever they may appear, not just in nouns.



Re: [Apertium-stuff] Using Apertium language pair jar files in an Android app?

2012-12-30 Thread Jimmy O'Regan
On 30 December 2012 12:07, Francis Tyers fty...@prompsit.com wrote:
 It means the source code for the whole app. iirc the GPL forbids linking
 with non-free code. Someone else may be better placed to answer this
 though.

 Fran

 On Sun, 30 Dec 2012 at 16:49 +0530, Mark Carter wrote:
 Thanks very much for your reply.


 When you say the source code - do you mean the entire source code of
 my app?


 What about if there was a separate module specifically for the
 apertium stuff?  Would it be enough to just release the source for
 that?


The long answer is that it would heavily depend on just how such a
module was structured[1]; the short answer is 'no'. Additionally, it
is not sufficient to merely release the source code - to vastly
over-simplify, the source *and* the build scripts must be released
under terms which are compatible with the GPL.

The GPL has other requirements that you may not find particularly
appealing: the first that comes to mind is that each recipient of a
GPL-licensed program is allowed to further redistribute the program
under the terms of the GPL.

[1] I expect that you'll excuse me for not spending the time to
enumerate these options, as this would run contrary to our interests.



Re: [Apertium-stuff] Android app released

2012-12-30 Thread Jimmy O'Regan
On 30 December 2012 07:36, Mikel Forcada m...@dlsi.ua.es wrote:
 AUTHORS AND CONTRIBUTORS

 This app would not have been possible without the support of Google Summer
 of Code (GSoC) stipends.

 2012 GSoC student Mikel Artetxe - Making Java port of lttoolbox (dictionary
 engine) embeddable

 2012 GSoC student Arink Verma - Created an Android app using lttoolbox-java

 2012 GSoC Jacob Nordfalk - Mentor of Mikel and Arink
 Re-architectengineering of lttoolbox-java for memory-constrained devices
 Android app revision autumn 2012

 2010 GSoC student Stephen Tigner - Java port of the Apertium C++ library

 2009 GSoC student Raphaël Laurent - Java port of the Apertium C++
 library

 2008 Nic Cottrell - Initial draft of Java port

 2009-2012 Jacob Nordfalk - Maintainer and GSOC mentor

There should be some nod to the authors of the C++ version of
Apertium, and to Stephen's involvement in mentoring Arink and Mikel.
Also, self-serving as it is, there's a bunch of code in the tagger
that I wrote that's still exactly as I left it.




Re: [Apertium-stuff] Duplicate entries in apertium-es-ca monilingual dics

2012-11-29 Thread Jimmy O'Regan
On 29 November 2012 14:25, Jimmy O'Regan jore...@gmail.com wrote:
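 $ diff -u sort.dix out.dix |grep '^\-.*v='
 -  <e v="val"><p><l>egeixi</l>   <r>regir<s n="vblex"/><s n="prs"/><s n="p3"/><s n="sg"/></r></p></e>

These entries:

 -  <e v="cat"><p><l>eguin</l><r>reure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/></r></p></e>
 -  <e v="cat"><p><l>eguin</l><r>reure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/><j/></r></p><par n="S__anant"/></e>
 -  <e v="cat"><p><l>egui</l> <r>reure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/></r></p></e>
 -  <e v="cat"><p><l>egui</l> <r>reure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/><j/></r></p><par n="S__vagi"/></e>
 -  <e v="val"><p><l>eguen</l><r>reure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/></r></p></e>
 -  <e v="val"><p><l>eguen</l><r>reure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/><j/></r></p><par n="S__anant"/></e>
 -  <e v="val"><p><l>ega</l>  <r>reure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/></r></p></e>
 -  <e v="val"><p><l>ega</l>  <r>reure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/><j/></r></p><par n="S__vagi"/></e>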

were genuinely redundant, so I fixed them.



Re: [Apertium-stuff] ACX files

2012-11-18 Thread Jimmy O'Regan
On 18 November 2012 14:55, Bernard Chardonneau bechapert...@free.fr wrote:
 $(PREFVAR2)$(PREFIX1).autogen.bin: $(PREFVAR2)$(LANG2).dix
 apertium-validate-dictionary $(PREFVAR2)$(LANG2).dix
 lt-comp rl $(PREFVAR2)$(LANG2).dix $@ $(BASENAME).$(LANG1).acx

 $(PREFVAR1)$(PREFIX2).autogen.bin: $(PREFVAR1)$(LANG1).dix
 apertium-validate-dictionary $(PREFVAR1)$(LANG1).dix
 lt-comp rl $(PREFVAR1)$(LANG1).dix $@ $(BASENAME).$(LANG2).acx

 Two questions :

 1) For the second autogen, shouldn't $(BASENAME).$(LANG2).acx be
 used instead?

 2) Are .acx files useful for generation?

In reverse order:

2) No

1) It's not used, so it doesn't matter.

ACX is used to specify alternative characters for analysis, and for
most language pairs it's not used for much more than to normalise
apostrophes (the acx files in fr-es are most likely identical, for
example). In rl mode, lt-comp does not process the ACX file.
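
To make "alternative characters for analysis" a bit more concrete, here is a conceptual sketch in Python. It is not how lttoolbox implements ACX handling; the function name, the mapping and the sample word are assumptions made up for the illustration. It only shows the effect: each character that has equivalents can also be typed as any of them, so the analyser accepts every resulting spelling of the same dictionary entry.

```python
from itertools import product

def accepted_forms(surface, equivalents):
    """List every spelling that would be accepted for one dictionary form.

    `equivalents` maps a character used in the dictionary to the
    alternative characters that may replace it in the input text.
    """
    options = [[ch] + list(equivalents.get(ch, [])) for ch in surface]
    return {"".join(combo) for combo in product(*options)}

# Hypothetical apostrophe normalisation: the typewriter apostrophe in the
# dictionary may also appear as a typographic one in the text.
APOSTROPHES = {"'": ["\u2019"]}
forms = accepted_forms("l'", APOSTROPHES)
```

Generation goes from analyses to a single surface form, so there is nothing for such alternatives to do there, which fits the answer to question 2 above.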



Re: [Apertium-stuff] google code in task descriptions

2012-11-01 Thread Jimmy O'Regan
On 1 November 2012 07:20, Mikel Forcada m...@dlsi.ua.es wrote:
 On 10/31/2012 10:49 PM, Francis Tyers wrote:
 Make a 50 sentences long translation memory
 Why so short? I think it is quite easy if one finds text and aligns
 it... Why from wikipedia?

To have open content text without having to explain the issues is one
good reason. Also, while working on Spanish-Aragonese, I noticed that,
quite often, the first sentences of a pair of equivalent articles were
parallel, or almost parallel, even if the rest of the text diverges.
There may be other areas - template properties, image descriptions,
descriptions on Wikimedia Commons, etc. - where we could be looking
for parallel sentences that become more visible once we've seen a
collection of them.
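
As a rough illustration of the first-sentence heuristic, a task could start from something like the sketch below. It assumes the equivalent articles have already been paired and reduced to plain text; the function names and the sample strings are placeholders, and the sentence splitter is deliberately naive (abbreviations will fool it), so every candidate pair still needs a human check.

```python
import re

def first_sentence(text):
    """Return the text up to the first '.', '!' or '?' followed by a space."""
    stripped = text.strip()
    match = re.search(r"(.+?[.!?])(\s|$)", stripped, flags=re.S)
    return match.group(1) if match else stripped

def candidate_pairs(article_pairs):
    """Pair the first sentences of already-matched article texts."""
    return [(first_sentence(src), first_sentence(tgt))
            for src, tgt in article_pairs]

# Placeholder texts standing in for two equivalent articles.
pairs = candidate_pairs([("A is B. More text here.", "A es B. Resto del texto.")])
```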



Re: [Apertium-stuff] Debugging?

2012-10-30 Thread Jimmy O'Regan
On 30 October 2012 16:07, Yannis Haralambous
yannis.haralamb...@telecom-bretagne.eu wrote:
 dear Apertium people,

 is it possible to follow the structural transfer of a sentence step by step? 
 For example: what are the chunks, which rule is applied to each, what is the 
 result for each chunk. In other words, is there a debugging option for the 
 structural transfer module?


You can get this with
apertium-transfer -t
but it's not available from the script (it wouldn't make sense) --
you'll have to manually provide the entire pipeline.



Re: [Apertium-stuff] CorpusCatcher

2012-10-26 Thread Jimmy O'Regan
On 26 October 2012 13:41, Per Tunedal per.tune...@operamail.com wrote:
 Hi,
 what's the status of corpuscatcher? I would like to get a monolingual
 corpus, but corpus catcher uses Yahoo to crawl the web. And I get an
 error message about a deprecated Yahoo search API.

 Any updates? Any way to circumvent the issue? Any alternatives?

I've only taken a quick look at github, but it seems to be under
pretty active development. Have you tried the development version? Or
asking corpuscatcher's developers?




Re: [Apertium-stuff] Bitextor installation

2012-10-26 Thread Jimmy O'Regan
On 26 October 2012 13:57, Raymond HS raymh...@gmail.com wrote:
 Hi Jim,

 For the Antara website, I think most of their stories are not translations
 (more like comparable than parallel). But I believe there are some of them
 that are direct translations. Actually it will be good if Bitextor can use
 some linguistic information (like bilingual dictionary) during the alignment
 process. :)

IIRC, Bitextor only uses document structure. If you already have a set
of aligned documents, Hunalign can use a dictionary to improve
existing sentence alignments, and maligna can additionally create IBM
Model 1 models.

Finding parallel document pairs in comparable corpora is a less
researched problem, but Felipe's doctrans project
(http://code.google.com/p/doctrans/) happily does that - you'll need a
phrase table from Moses to use it, though.




Re: [Apertium-stuff] CorpusCatcher

2012-10-26 Thread Jimmy O'Regan
On 26 October 2012 15:27, Per Tunedal per.tune...@operamail.com wrote:
 Hi,
 Strange. I only find files two years old. Have you found any newer files
 somewhere? Do you have any more information?

I googled 'corpuscatcher'



Re: [Apertium-stuff] Bitextor installation

2012-10-26 Thread Jimmy O'Regan
On 26 October 2012 15:51, Raymond HS raymh...@gmail.com wrote:
 Hi Jim,

 Finding parallel document pairs in comparable corpora is a less
 researched problem, but Felipe's doctrans project
 (http://code.google.com/p/doctrans/) happily does that - you'll need a
 phrase table from Moses to use it, though.


 Thanks for giving me this information. This is probably what I need, and I
 am also working using Moses at the moment.

 Have you tried compiling the program?

Not recently. I do recall that I had to patch something to get it to
compile, though. Now that I think of it, I think the Moses interfaces
changed in the meantime, so it might be some effort to get running.

Felipe is subscribed to this list, and might be able to provide some
insight, when he has time.

 When I ran the configure script, it
 doesn't seem to find PhraseDictionaryTreeAdaptor.h in Moses. This is how I
 ran the configure script: (the Moses program is already installed locally)

 ./configure --with-srilm=$HOME/software/srilm
 --with-moses=$HOME/software/mosesdecoder --prefix=$HOME/local

 and the error output:

 checking PhraseDictionaryTreeAdaptor.h usability... no
 checking PhraseDictionaryTreeAdaptor.h presence... no
 checking for PhraseDictionaryTreeAdaptor.h... no
 configure: error: Cannot find MOSES!

 Is there a bug in the script? Thanks again for your help. :)

This might be because of the interface change I mentioned, above.



Re: [Apertium-stuff] Bitextor installation

2012-10-25 Thread Jimmy O'Regan
On 25 October 2012 17:09, Raymond HS raymh...@gmail.com wrote:
 Hi everyone,

 I wanted to try Bitextor to get some parallel texts from the Web. I have
 installed all the required libraries on my Ubuntu, but when I tried to
 compile the Bitextor source code, I got the following error:

 g++  -g -O2 -o bitextor BitextCandidates.o TranslationMemory.o DownloadMod.o
 FilePreprocess.o GlobalParams.o Heuristics.o WebFile.o WebSite.o Bitextor.o
 -L/home/raymondhs/local/lib -ltagaligner3 -lenca -lm -lxml2   -ltre -ltidy
 -ltextcat

 /home/raymondhs/local/lib/libtagaligner3.so: undefined reference to
 `std::basic_string<wchar_t, std::char_traits<wchar_t>,
 std::allocator<wchar_t> >
 EditDistanceTools::EditDistanceBeam<short>(std::vector<short,
 std::allocator<short> >, std::vector<short, std::allocator<short> >,
 double (*)(short const, short const, short const), bool const, double
 const, double*)'

 collect2: ld returned 1 exit status

 So there seems to be a linker error in libtagaligner, which I can't really
 figure out why. Any hint why this could happen? Thanks!

Missing template instantiation. I sent Miquel a patch (attached) for
this in 2010, but I guess he never got around to applying it.



tagaligner.patch
Description: Binary data


Re: [Apertium-stuff] apertium es-de

2012-10-25 Thread Jimmy O'Regan
On 25 October 2012 20:10, Isabel Imbernón isabelimber...@gmail.com wrote:

 Hi,

 I've been trying to cross the en-es.dix with the en-de.dix to get the
 es-de.dix, but I can't get it to work. I've been following the wiki page
 about crossdics, so I use the script: apertium-dixtools cross-param monA.dix
 -n bilAB.dix -n bilBC.dix monC.dix, which in my case would be
 apertium-dixtools cross-param dics/apertium-en-es.es.dix -n
 dics/apertium-en-es.en-es.dix -n dics/apertium-en-de.en-de.dix
 dics/apertium-en-de.de.dix. Is that right?
 However, I get the following errors:


Clipping down to the relevant part:

 Reading file null/schemas/cross-model.xml
 Error (null/schemas/cross-model.xml):
 /Users/isaimbernon/null/schemas/cross-model.xml (No such file or directory)


 Can anyone explain me what's happening?

You need to provide a cross model. There's some documentation on the
wiki (http://wiki.apertium.org/wiki/Cross_Model). I find that the best
thing to do is to just use the default model (schemas/cross-model.xml
in the distribution), and edit the detected patterns that crossdics
outputs.



Re: [Apertium-stuff] apertium es-de

2012-10-25 Thread Jimmy O'Regan
On 25 October 2012 20:57, Jimmy O'Regan jore...@gmail.com wrote:
 On 25 October 2012 20:10, Isabel Imbernón isabelimber...@gmail.com wrote:

 Hi,

 I've been trying to cross the en-es.dix with the en-de.dix to get the
 es-de.dix, but I can't get it to work. I've been following the wiki page
 about crossdics, so I use the script: apertium-dixtools cross-param monA.dix
 -n bilAB.dix -n bilBC.dix monC.dix, which in my case would be
 apertium-dixtools cross-param dics/apertium-en-es.es.dix -n
 dics/apertium-en-es.en-es.dix -n dics/apertium-en-de.en-de.dix
 dics/apertium-en-de.de.dix. Is that right?
 However, I get the following errors:


 Clipping down to the relevant part:

 Reading file null/schemas/cross-model.xml
 Error (null/schemas/cross-model.xml):
 /Users/isaimbernon/null/schemas/cross-model.xml (No such file or directory)


 Can anyone explain me what's happening?

 You need to provide a cross model. There's some documentation on the
 wiki (http://wiki.apertium.org/wiki/Cross_Model). I find that the best
 thing to do is to just use the default model (schemas/cross-model.xml
 in the distribution), and edit the detected patterns that crossdics
 outputs.

In case anyone is interested: thanks to Isabel's feedback, the
cross-param function in the apertium-dixtools script is now a little
more robust, and the crossing process no longer creates empty
dictionaries if there is a mismatch of sections.




Re: [Apertium-stuff] Google Code-in tasks

2012-10-24 Thread Jimmy O'Regan
On 23 October 2012 20:35, Bernard Chardonneau bechapert...@free.fr wrote:
 Date: Mon, 22 Oct 2012 20:14:35 +0100
 From: Jimmy O'Regan jore...@gmail.com
 To: Apertium-stuff apertium-stuff@lists.sourceforge.net
 Reply-To: apertium-stuff@lists.sourceforge.net
 Subject: [Apertium-stuff] Google Code-in tasks

 http://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in

 Feel free to add tasks - we now have the minimum 5 tasks in each
 category, but more are welcome. Please bear in mind that these are
 intended to be completed by high school students.

 In fact, for these tasks that can be done quickly, which of the
 following matters, and how much:
 - asking new people to do something useful for the project?
 - helping new people discover the Apertium project?


The latter fits into 'outreach', which, along with 'code',
'documentation', 'user interface', and 'research', is one of the areas
of contribution. All are considered equally important.

 Another problem is mentoring. As this work would be short, with
 the result due in 48 hours or less, anybody who mentors it will need
 to be available at short notice. As for me, I cannot promise
 anything. That may depend on the hour, the day of the week, and the
 week of the year.

For anyone who is new, or relatively new, to GCI (or GSoC), my
recommendation would be to wait until the competition is under way,
and to take an observational role on a handful of tasks. Seeing how it
works in practice will give a better idea of what's involved than
any explanation, and will also allow a type of 'meta mentoring'
(mentoring new mentors) that is quite difficult to provide outside of
the programme. Unlike last year, we do not need to 'stockpile' tasks,
but can add them as the competition progresses, so there isn't a vital
need to think of everything in advance.



Re: [Apertium-stuff] bug apertium.

2012-10-24 Thread Jimmy O'Regan
On Wednesday, 24 October 2012, Mikel Forcada m...@dlsi.ua.es wrote:
 On 10/24/2012 08:38 PM, erik wrote:
 Hola, hello

 look, i am trying to solve this problem with gaupol and apertium.
 one of the developers of gaupol thinks the problem lies in apertium.
 Can you help me?

 https://bugzilla.gnome.org/show_bug.cgi?id=686772

 thanks in advance!
 Erik,

 I am copying this message to apertium-stuff to see if someone can please
 help (perhaps Kevin Unhammer), as I don't know what the problem is. But
 I suspect the problem is in the way apertium is being invoked from
 gaupol. If I invoke

 /usr/local/bin/apertium -l

 in my installation, what I get is the list of installed language pairs,
 and the status is zero.

 My version is Apertium 3.2.0.

Older versions of Apertium didn't have the -l option. The bug report
mentions Ubuntu - the packages in Ubuntu (via Debian) are ancient, so I'd
assume that's what's happening.


Re: [Apertium-stuff] Google Code-in tasks

2012-10-23 Thread Jimmy O'Regan
On 23 October 2012 07:40, Mikel L. Forcada m...@dlsi.ua.es wrote:
 Hi there,

 if the Apertium OmegaT plugin has not been modified since it was
 contributed, there is a task that a Java programmer could probably
 attempt with a bit of help from some of our Java experts: escaping
 OmegaT's format codes (<u>, <i1>, etc.) so that Apertium does not
 translate them. This involves: researching a bit what kind of format
 tags OmegaT produces (I can help here), and writing a quick filter that
 does that. Would this be an adequate GCI task? It looks like a quick
 hack to me, but requires a bit of guidance.

 What do you guys think?

This is exactly the sort of task we're looking for :)



[Apertium-stuff] Google Code-in tasks

2012-10-22 Thread Jimmy O'Regan
http://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in

Feel free to add tasks - we now have the minimum 5 tasks in each
category, but more are welcome. Please bear in mind that these are
intended to be completed by high school students.



Re: [Apertium-stuff] Google Code-in

2012-10-16 Thread Jimmy O'Regan
On 16 October 2012 15:43, Francis Tyers fty...@prompsit.com wrote:
 Hey all,

 It's that time of year again!  Next Monday we'll be applying again for
 the Google Code-in.  One of the most important things we need is a list
 of tasks suitable for 13-17 year olds.

 http://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in

 There are some changes this year with respect to last year:

 * There are no translation tasks :(
 * No difficulty rating
 * No monetary incentive this year.

IIRC, they also changed it so there are two winners from each
organisation, chosen by the organisation.



Re: [Apertium-stuff] Word selection by sens was: Re: Adding Swedish nouns from SALDO to da-se was: Re: Danish - Swedish Nouns

2012-10-09 Thread Jimmy O'Regan
On 9 October 2012 14:14,  k...@keldix.com wrote:
 On Tue, Oct 09, 2012 at 09:41:41AM +0200, Per Tunedal wrote:
 Hej Keld,
 I liked your algo but had to think it over. After I've slept on it, a
 few things came to mind:

 My initial go on an algorithm is then: I found a homonym.
 Each of the homonyms have a placement in the meaning tree via its father
 and mother relations.

 Unfortunately, I've no idea what's the father relation. Maybe you should
 follow only the mother relations?

 The father relation is meant to discriminate between the same mother 
 relations.
 So maybe it can be of help. I don't know. I take it into account to generalize
 wordnet-like structures, there may be more than one relation from a given 
 homonym

Saldo is not a WordNet (and its creators don't claim that it is, only
that it is equivalent for some purposes), and this is one of the major
differences. WordNet synsets can have an unlimited number of typed
references, whereas Saldo has a maximum of two untyped references (what
the type is depends on the pair, and does not seem to be encoded
anywhere that's publicly available).

On the plus side, there is a relatively complete set of mappings
between the English WordNet and Saldo, so WordNet types could be
inferred from those alignments, though how accurate the results
would be remains to be seen.

 And a general Apertium wordnet module and algorithm should be able
 to handle more than one upwards relation. In the monodix markup
 this could then be marked with a rel tag, and more
 rel tags may be present. I need input from people more in the know on
 whether this could be the recommended way to mark up such meaning
 relations in the monodix.


The problem with using WordNet is that the synsets are simultaneously
too fine-grained -- i.e., they represent a distinction without a
difference when it comes to translation, such as 'tree' the plant vs.
'tree' meaning a tree-like structure (parse tree, family tree, etc.)
-- and too coarse-grained -- synsets are conceptual, rather than
lexical, so while 'panther' and 'leopard' are the same animal, we can
never say 'black leopard' or 'a panther never changes its spots' -- to
be useful for MT. In addition, there is no indication of the relative
importance of a sense, which may be too obscure for inclusion in a
translation lexicon (e.g., 'torpedo' meaning 'hitman' is a sense of
that word that I have only seen in WordNet).

If you were to give some thought to how you might split, merge, and
prune WordNet synsets into something that's useful for translation,
then you might be able to generate some interest.
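
As a rough starting point, the merge and prune steps could look something like the Python sketch below. It rests on assumptions that are not in WordNet itself: that each sense already has a candidate translation, and that some external count (from a corpus, say) stands in for the missing "relative importance". The function name, the sense ids, and the counts are all made up for illustration, and the split step (panther vs. leopard) is not covered.

```python
from collections import defaultdict

def merge_and_prune(senses, min_count=1):
    """Collapse senses that translate the same way, then drop rare ones.

    senses: iterable of (lemma, sense_id, translation, count) tuples,
    where count is an externally supplied importance score (assumed).
    """
    merged = defaultdict(lambda: {"senses": [], "count": 0})
    for lemma, sense_id, translation, count in senses:
        # merge: one entry per (lemma, translation) pair
        entry = merged[(lemma, translation)]
        entry["senses"].append(sense_id)
        entry["count"] += count
    # prune: keep only entries that were actually observed often enough
    return {key: val for key, val in merged.items()
            if val["count"] >= min_count}


# Invented example data (English to Spanish); the counts are not real.
senses = [
    ("tree", "tree.plant", "árbol", 40),
    ("tree", "tree.structure", "árbol", 12),
    ("torpedo", "torpedo.weapon", "torpedo", 9),
    ("torpedo", "torpedo.hitman", "sicario", 0),
]

for key, val in merge_and_prune(senses).items():
    print(key, val["senses"], val["count"])
```

With these numbers the two 'tree' senses collapse into a single entry, and the 'hitman' sense of 'torpedo' is dropped because nothing supports it.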

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Word selection by sens was: Re: Adding Swedish nouns from SALDO to da-se was: Re: Danish - Swedish Nouns

2012-10-09 Thread Jimmy O'Regan
On 9 October 2012 15:14, Francis Tyers fty...@prompsit.com wrote:
 * For Swedish-Danish this will be unnecessary.
 * For other language pairs in Apertium, there are no free WordNets. Thus
 the method would have zero applicability.

es-ca, es-it, ca-it, en-es, en-ca, es-gl, en-gl are all candidates
(Spanish, Catalan, Galician, and Italian WordNets were all released
under CC-BY earlier this year).

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Word selection by sens was: Re: Adding Swedish nouns from SALDO to da-se was: Re: Danish - Swedish Nouns

2012-10-09 Thread Jimmy O'Regan
On 9 October 2012 15:59, Francis Tyers fty...@prompsit.com wrote:
 El dt 09 de 10 de 2012 a les 15:50 +0100, en/na Jimmy O'Regan va
 escriure:
 On 9 October 2012 15:14, Francis Tyers fty...@prompsit.com wrote:
  * For Swedish-Danish this will be unnecessary.
  * For other language pairs in Apertium, there are no free WordNets. Thus
  the method would have zero applicability.

 es-ca, es-it, ca-it, en-es, en-ca, es-gl, en-gl are all candidates
 (Spanish, Catalan, Galician, and Italian WordNets were all released
 under CC-BY earlier this year).

 The whole shebang? Or just a part? -- I know that 10% or so of the
 es/ca ones have been available in FreeLing for a while.


Yep, all. The ca one was available under the GPL for a while.

 Ooh, cool: http://adimen.si.ehu.es/web/MCR

 Shame about the Basque one though.

Yeah, but even though it's one of the crappy CC licences, it's still a
CC licence, so uses that are prohibited by the database laws (and not
by copyright) are fair game.


 Anyway, if you get something working, I have test data for
 English--Spanish and would be happy to compare any method using WordNet
 to my methods.

I'm interested in wordnets for other reasons, I was just mentioning
they're there. Frankly, I still think that trying to use raw wordnet
data for lexical selection would be a massive waste of time, but if
someone wants to prove me wrong, more power to them.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Danish - Swedish was: Re: Swedish - Norwegian

2012-09-07 Thread Jimmy O'Regan
On 7 September 2012 07:26, Per Tunedal per.tune...@operamail.com wrote:
 Hi,
 originally the translation was broken, as both lines used <e>.

 The translations of the two Swedish words inte and icke should be
 ikke in Danish. In the opposite direction the Danish ikke should in
 most cases be translated with inte in Swedish, but in some contexts
 icke would be better. As Apertium cannot yet handle a one-to-two
 relation I changed the line with icke to <e r="LR">.

5 minutes of googling leads me to believe that 'har ikke gjort' in
Danish translates as 'har inte gjort' in Swedish, and this is the sort
of case that can be handled in a rule. (It might be wrong, but it
doesn't matter for the example). I'll assume that applies for all past
participles (and gloss over that it changes to the supine in Swedish,
because where there's 'har inte gjort', I assume you can also have
'har gjort', so that change would be best handled in a macro). I'm
also going to skip over defining the 'def-cat' pieces for the
'pattern-item' parts - it's enough to mention that they do have to be
defined.

You can handle that in two ways: you can either ignore 'ikke', and
replace it completely:
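<rule>
  <pattern>
    <pattern-item n="haver"/>
    <pattern-item n="ikke"/>
    <pattern-item n="pp"/>
  </pattern>
  <action>
    <!-- not needed in this variant: position 2 is not output below -->
    <let>
      <clip pos="2" side="tl" part="lem"/>
      <lit v="inte"/>
    </let>
    <out>
      <lu>
        <clip pos="1" side="tl" part="whole"/>
      </lu>
      <b pos="1"/>
      <lu>
        <lit v="inte"/>
        <lit-tag v="adv"/>
      </lu>
      <b pos="2"/>
      <lu>
        <clip pos="3" side="tl" part="whole"/>
      </lu>
    </out>
  </action>
</rule>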

or you can change its lemma:

<rule>
  <pattern>
    <pattern-item n="haver"/>
    <pattern-item n="ikke"/>
    <pattern-item n="pp"/>
  </pattern>
  <action>
    <let>
      <clip pos="2" side="tl" part="lem"/>
      <lit v="inte"/>
    </let>
    <out>
      <lu>
        <clip pos="1" side="tl" part="whole"/>
      </lu>
      <b pos="1"/>
      <lu>
        <clip pos="2" side="tl" part="whole"/>
      </lu>
      <b pos="2"/>
      <lu>
        <clip pos="3" side="tl" part="whole"/>
      </lu>
    </out>
  </action>
</rule>

It seems to me that other adverbs could fit in the same place as
'ikke', so you could use a test instead:
<rule>
  <pattern>
    <pattern-item n="haver"/>
    <pattern-item n="adv"/>
    <pattern-item n="pp"/>
  </pattern>
  <action>
    <choose>
      <when>
        <test>
          <equal>
            <clip pos="2" side="sl" part="lem"/>
            <lit v="ikke"/>
          </equal>
        </test>
        <let>
          <clip pos="2" side="tl" part="lem"/>
          <lit v="inte"/>
        </let>
      </when>
    </choose>
    <out>
      <lu>
        <clip pos="1" side="tl" part="whole"/>
      </lu>
      <b pos="1"/>
      <lu>
        <clip pos="2" side="tl" part="whole"/>
      </lu>
      <b pos="2"/>
      <lu>
        <clip pos="3" side="tl" part="whole"/>
      </lu>
    </out>
  </action>
</rule>

The 'choose' part basically means 'if the lemma of the source
language (sl) word is "ikke", then change the target language (tl)
lemma to "inte" (but do nothing otherwise)'.
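
If the XML is hard to follow, the same logic can be written out as a few lines of Python. This is only a sketch to show the control flow. It is not how Apertium runs rules: the `apply_rule` function and the dict layout (`sl_lem`, `tl_lem`) are hypothetical stand-ins for lexical units, and the lemma pairs are invented for the example.

```python
from copy import deepcopy

# Hypothetical stand-in for Apertium lexical units: each unit is a dict
# holding a source-language lemma ("sl_lem") and a target-language lemma
# ("tl_lem"). Real transfer works on a tagged stream, not on dicts.

def apply_rule(units):
    """Mimic the rule for an already matched (haver, adv, pp) sequence."""
    out = deepcopy(units)               # leave the caller's data untouched
    adverb = out[1]                     # pos="2" in the rule is index 1 here
    if adverb["sl_lem"] == "ikke":      # <test><equal>: sl lemma vs. "ikke"
        adverb["tl_lem"] = "inte"       # <let>: set the tl lemma to "inte"
    # no 'otherwise' branch: any other adverb passes through unchanged
    return [u["tl_lem"] for u in out]   # <out>: units 1, 2, 3 in order


negated = [
    {"sl_lem": "have", "tl_lem": "ha"},
    {"sl_lem": "ikke", "tl_lem": "icke"},
    {"sl_lem": "gøre", "tl_lem": "göra"},
]
other = [
    {"sl_lem": "have", "tl_lem": "ha"},
    {"sl_lem": "aldrig", "tl_lem": "aldrig"},
    {"sl_lem": "gøre", "tl_lem": "göra"},
]

print(apply_rule(negated))  # ['ha', 'inte', 'göra']
print(apply_rule(other))    # ['ha', 'aldrig', 'göra']
```

The second call is the point of using a test rather than matching 'ikke' in the pattern: the rule still fires for any adverb in that slot, but only 'ikke' has its lemma changed.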

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Formatters and deformatter for man pages and mnémonic files added to apertium

2012-09-04 Thread Jimmy O'Regan
On 4 September 2012 20:38, Bernard Chardonneau bechapert...@free.fr wrote:
 So, when you wrote ... you should put these somewhere else (approved
 by Mikel), I thought the problem was to have put the files directly in
 trunk before other people test them, not to have put them in apertium
 directory.


Definitely the latter (don't put them in the apertium/ directory).
trunk/ is a lot more relaxed, but we do expect things in trunk to be
release quality - if it doesn't work out of the box, we'll expect you
to move it - but there aren't any problems about creating new modules.

 As it has been explained now, no problem to create a new directory (I
 think in trunk for that case) to put my new formatter and deformatters
 with a makefile and 2 shell scripts called apertium-man and
 apertium-mnemo to allow simpler usage, as the apertium shell script
 will not support these formats.

Right.

 And if you think a version based on an XML file may be interesting to
 put in apertium trunk, that may be something interesting to develop,
 but not during the next 4 months for me (and as there will be another
 working solution, there is no urgency).

Sure.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Formatters and deformatter for man pages and mnémonic files added to apertium

2012-09-03 Thread Jimmy O'Regan
On 3 September 2012 09:52, Bernard Chardonneau bechapert...@free.fr wrote:
 Hello

 As indicated in another email in August, I developed deformatters and
 reformatter for man pages and mnémonic files.

 I added the source files in
 http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium/apertium/


Unless Sergio says otherwise, you should put these somewhere else.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you



Re: [Apertium-stuff] Formatters and deformatter for man pages and mnémonic files added to apertium

2012-09-03 Thread Jimmy O'Regan
On 3 September 2012 16:04, Bernard Chardonneau bechapert...@free.fr wrote:
 Al 09/03/2012 01:05 PM, En/na Jimmy O'Regan ha escrit:
  Unless Sergio says otherwise, you should put these somewhere else.
 +1

 Mikel


 Well, when I asked a question about that 5 weeks ago, I did not get
 any answer on that point.


Without trawling back through old mail, my recollection is that Kevin
Unhammer pointed out the mediawiki deformatter to you, which is in a
separate directory. I for one took the implication to be 'this is the
example to follow' - including where to put it - and I assume others did
too. I also felt that there was nothing to be said to improve on that
answer; again, I assume that others did too. I'm sorry we didn't make
that sufficiently clear, _but_ when you are a) diverging from existing
conventions; and b) using a different programming language, I for one
feel that the default position should be 'make this a separate
module', and I'm relatively confident that the majority of open source
projects adopt a similar stance, particularly when it comes to their
core software.

-- 
Sefam Are any of the mentors around?
jimregan yes, they're the ones trolling you


