[OFF TOPIC] Instead of AdSense (was: Re: [off topic] A new project - automatic translation)
[This is a question which really belongs more to the Hackers-IL mailing list than to Linux-IL, but the thread started here, so I am continuing it on the same mailing list. Apologies to those who are not interested in the subject. I am crossposting to both mailing lists, and suggest that those who are interested subscribe to Hackers-IL and follow up there.] The question is how to make a translation Web site pay for itself. In principle, a Web 2.0 level Web site needs two flows of resources: 1. Information - from visitors, who contribute content, like those who edit articles in Wikipedia. 2. Cash - to pay for hosting the Web site, for a Webmaster to supervise, manage and improve it, for a sysadmin to grow the hosting infrastructure as needed, and for motivating the innovator who had the original idea. Usually, Web sites can use Google AdSense to generate some cash flow. Visitors browse the Web site, and as they look for additional resources, they click through relevant ads which they happen to see. The visitors are in information-search mode, and are receptive to additional information in targeted ads. Therefore, AdSense is good for both them and the Web site owner. However, when one is doing a translation, one wants to concentrate on the task at hand. One does not want to be distracted by links (unless they lead to thesaurus-like information) or ads. Therefore, AdSense would be neither helpful nor effective on a translation Web site. Does anyone have a bright idea for how to generate revenue from a translation Web site, in lieu of targeted ads? Of course, this should not be based upon subscription fees, exacting micropayments from visitors, relying upon donations from grateful companies and individuals, or other forms of coercion. --- Omer [Thanks to Shlomi Fish for mentioning the Web 2.0 article at http://www.paulgraham.com/web20.html, which inspired me to ask this question.] -- Sent from a PC running a top secret test version of Windows 97. 
My own blog is at http://www.livejournal.com/users/tddpirate/ My opinions, as expressed in this E-mail message, are mine alone. They do not represent the official policy of any organization with which I may be affiliated in any way. WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html = To unsubscribe, send mail to [EMAIL PROTECTED] with the word unsubscribe in the message body, e.g., run the command echo unsubscribe | mail [EMAIL PROTECTED]
Re: [OFF TOPIC] Instead of AdSense (was: Re: [off topic] A new project - automatic translation)
On Tue, Nov 22, 2005 at 07:52:16PM +0200, Omer Zak wrote: Therefore, AdSense is good for both them and the Web site owner. You should note that not only will Amateur radio operators aka Hams avoid your site, they will ask their family and friends to do so. Geoff. -- Geoffrey S. Mendelson, Jerusalem, Israel [EMAIL PROTECTED] N3OWJ/4X1GM IL Voice: (07)-7424-1667 IL Fax: 972-2-648-1443 U.S. Voice: 1-215-821-1838 You should have boycotted Google while you could, now Google supported BPL is in action. Time is running out on worldwide radio communication.
Re: [off topic] A new project - automatic translation
Thanks! Uri. Ehud Karni wrote: On Wed, 16 Nov 2005 17:58:16 +0200, Uri Even-Chen wrote: Very interesting. I haven't read Ray Kurzweil's book, nor had I heard his name until a few days ago, when I saw it in one of the Wikipedia articles you sent me (about translation). I also read about him and it's very impressive. I want to write to him and ask his opinion about my idea. Do you happen to know his E-mail address? One of his emails is [EMAIL PROTECTED] (if you get past the junk filters). You can also try to reach him at http://www.kurzweilai.net/ . Ehud. Ehud Karni wrote: Uri, You might find The Rosetta Project http://www.rosettaproject.org/live interesting. Ehud.
Re: [off topic] A new project - automatic translation
Sorry for not replying yesterday. I was very busy. Anyway, here's my reply. Ely Levy wrote: Yeah, join us ;) or work together with the Mila team (which I think is also trying to reach the same goal). Btw, I was thinking about automatically translating software with the right glossary. It seems that is a much easier task, especially with the right PO comments. If anyone wishes to join and help ;) Ely Please give me more information on what you do and how I can contribute. Let me remind you what I previously wrote: Uri Even-Chen wrote: Are the Hspell and WordNet databases available to use? Can you send me links? I'll be glad to contribute if I can, but my focus is not on any specific language (Hebrew) but on translating in general. My idea is to create something like Wikipedia (but for translations), and yes, I understand how huge the task is. Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il
Re: [off topic] A new project - automatic translation
Nadav Har'El wrote: This is getting wildly off-topic, but... Ray Kurzweil, in his book The Age of Spiritual Machines: When Computers Exceed Human Intelligence, makes the following observation about human thought, and how computers can imitate it. If you'll allow me to put what I remember from his writing into my own words: He believes that human thought has two modes: computation and pattern recognition. Examples of the former include an arithmetic computation, or thinking about several options one after another, and an example of the latter is face recognition. He believes that chess is an example where both modes are used: a chess expert, like a chess novice, goes in his head through many of the possible moves and his opponent's possible reactions (this is the computational mode), but unlike a novice, he also does some pattern recognition on each of the resulting boards, and instantly (without sequential computation) recognizes situations which are good, or bad, for him. This final recognition is the part of their thought process that chess players can't really explain, and is often called intuition. Kurzweil argues that a chess-playing program could act similarly - walk the (fantastically huge) tree of possible moves and counter-moves, sequentially, and at every junction apply a neural network that recognizes good boards, and prune the tree at that junction if the neural network decides that this move is not worth it. This technique, of walking huge trees with a *heuristic function*, is well known in AI (look up A*, Minimax, etc.) and is not Kurzweil's invention. But his interesting insight is that neural networks are useful but SHOULD NOT (not only CAN NOT) be used directly to solve every problem, but rather should be combined with other computational techniques. It is arguable that, similarly, neural networks should not be used directly to parse language. 
It is very possible that language understanding and generation have both a computational, or sequential, aspect (reading the words one by one, following some sort of state machine in your head), and a pattern recognition aspect. Very interesting. I haven't read Ray Kurzweil's book, nor had I heard his name until a few days ago, when I saw it in one of the Wikipedia articles you sent me (about translation). I also read about him and it's very impressive. I want to write to him and ask his opinion about my idea. Do you happen to know his E-mail address? If you're interested in what I did with neural networks - I used them to compose music. If you want to see more details, look at Speedy Composer: http://www.speedy.co.il/composer/ The music is very nice :) Thanks! Unfortunately I didn't have much time to invest in this project, but I'm sure that better-quality music can be reached if more time is invested (by the right people, of course). I already had ideas on how to reach better quality, but I never had time to implement them. Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il
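The sequential half of the idea quoted above, reading the words one by one while following a state machine, can be illustrated with a toy finite-state acceptor. This is only a sketch under loud assumptions: the grammar (a determiner, any number of adjectives, a noun, a verb), the mini-lexicon, and the name `accepts` are all hypothetical, each word gets exactly one category, and the pattern-recognition half of the idea is not modeled at all.

```python
from typing import Dict, Tuple

# Hypothetical mini-lexicon: exactly one category per word.
LEXICON: Dict[str, str] = {
    "the": "DET", "a": "DET",
    "old": "ADJ", "black": "ADJ",
    "man": "NOUN", "cat": "NOUN",
    "sleeps": "VERB", "runs": "VERB",
}

# Toy grammar DET ADJ* NOUN VERB as a transition table: (state, category) -> state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "DET"): "after_det",
    ("after_det", "ADJ"): "after_det",    # any number of adjectives
    ("after_det", "NOUN"): "after_noun",
    ("after_noun", "VERB"): "sentence",
}
ACCEPTING = {"sentence"}


def accepts(sentence: str) -> bool:
    state = "start"
    for word in sentence.lower().split():  # read the words one by one
        category = LEXICON.get(word)
        next_state = TRANSITIONS.get((state, category)) if category else None
        if next_state is None:  # unknown word or no transition: reject
            return False
        state = next_state
    return state in ACCEPTING


print(accepts("The old black cat sleeps"))  # True
print(accepts("Cat the sleeps"))            # False
```

A real grammar would need ambiguous words, recursion and much more, which is exactly where a purely sequential machine starts to struggle.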
Re: [off topic] A new project - automatic translation
Shachar Shemesh wrote: Not trying to discourage you, but it seems that this approach, if it's going to work at all, will likely only start working when it has a HUGE database of phrases. The words approach seems fairly hopeless to me. I'm aware of it. That's why I want many people to contribute to the database. Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il
Re: [off topic] A new project - automatic translation
Nadav Har'El wrote: Are the Hspell and WordNet databases available to use? Can you send me links? I'll be glad to contribute if I can, but my focus is not on any Of course, both are released with free software licenses: http://www.ivrix.org.il/projects/spell-checker/ http://cl.haifa.ac.il/projects/mwn/ Thanks! http://www.wiktionary.org/ is like Wikipedia (but for translations), but for individual words. As people have noted here time and again, this is only a small step in the translation direction. Arguably, it is even a step in the wrong direction (with the WordNet effort being more in the right direction, because it translates individual word senses, rather than words). I'm aware of it and I agree it's not in the right direction. I need to establish a database of words and phrases and their translations in various languages. This database should be good enough to turn a text in one language into another language by search and replace, so that people who know only the target language will be able to understand most of what the original text is about (it doesn't have to be perfect). Then it will be improved over time (with feedback). If there is any database I can use which contains texts (or words, phrases) and their translations, please let me know. I would like to start with something. Again, Wiktionary has this for words. If you read some machine-translation literature, you'll find references to a bunch of corpora which contain quality texts translated into several languages, with an explicit correspondence between the sentences in each language. For example, there is a corpus of EU laws translated into several EU languages. And I believe there's also a UN corpus. Sorry, I don't have any links. And, for an interesting corpus, why not try... the bible? It has been translated into countless languages, and a strict correspondence between the verses has been observed. Of course, some of the translations feature some archaic language :-) No thanks. 
I think the bible is completely irrelevant. It uses archaic language, religious words, etc. And I don't want to mess with God/Moses/Jesus/The Pope or whoever thinks he has copyrights on the bible... I would rather use modern language instead. Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il
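The sentence-aligned corpora mentioned above (EU laws, a UN corpus, Bible verses) can be mined for word translations even without a dictionary. One simple co-occurrence heuristic, swapped in here for illustration and not proposed anywhere in this thread, is the Dice coefficient: a source word and a target word that keep showing up in the same aligned sentence pairs are probably translations of each other. The function name `dice_lexicon` and the four-pair English-Spanish corpus are invented for the example; a real corpus would need proper tokenization, far more data, and some way to handle words that translate into several words.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def dice_lexicon(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """For each source word, pick the target word with the highest Dice score.

    Dice(s, t) = 2 * C(s, t) / (C(s) + C(t)), where C counts how many aligned
    sentence pairs contain the word (or both words).
    """
    src_count: Counter = Counter()
    tgt_count: Counter = Counter()
    both_count: Counter = Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        both_count.update(product(s_words, t_words))
    best: Dict[str, Tuple[float, str]] = {}
    # Sorted iteration makes tie-breaking deterministic (alphabetical).
    for (s, t), both in sorted(both_count.items()):
        score = 2 * both / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][0]:
            best[s] = (score, t)
    return {s: t for s, (_, t) in best.items()}


corpus = [  # invented sentence-aligned English-Spanish pairs
    ("green house", "casa verde"),
    ("big house", "casa grande"),
    ("big book", "libro grande"),
    ("green book", "libro verde"),
]
print(dice_lexicon(corpus))
# {'big': 'grande', 'book': 'libro', 'green': 'verde', 'house': 'casa'}
```

On this tiny corpus every English word is paired with its Spanish counterpart. With realistic text, frequent function words such as articles co-occur with almost everything, so the scores need more care than this sketch gives them.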
Re: [off topic] A new project - automatic translation
Shachar Shemesh wrote: In other words, this is a case where a human programmer has to analyze the problem, decide on a solution path, decide where, if at all, to apply neural networks, and then program the whole thing. Compare it to cars or airplanes. They can't work on their own - they need people to build them, and then drive them. But when using them you can go much faster than you can on your own feet. It's the same with computers. Theoretically, everything you can do with a computer you can also do without it, but it will take you much more time. In many cases, even though programming a task is difficult and takes much time, you can still achieve more by programming something and letting the computer do it than by doing it on your own. I don't think neural networks are an exception - they are just a programming tool. But in some cases you can achieve more with neural networks than you can achieve without them. I'll give you a simple example: There are people who can compose good music, but I'm not one of them. By programming a computer, I was able to let the computer compose melodies which are better than what I could compose without the computer. It doesn't mean the computer is smarter than me. It just means I could teach it to do something I can't do on my own. Nadav Har'El wrote: There's no argument that programming such a thing will take effort. Artificial intelligence doesn't mean some sort of hey look, it's magic, I'll get a working program without doing any effort!. Rather, the idea is that the programmer, while being an expert programmer, does not have to be an expert chess player (to use this example), and the program can learn how to play chess by watching the games of grandmasters. There's a division of labor, if you will, between the programmer who can program, and the teachers who can play chess extremely well but couldn't program if their life depended on it. 
To return to the translation issue, the idea that Uri raised was that he wanted to write a translation program, and perhaps spend a good deal of effort doing so, but since he doesn't really know how to translate French to Swedish (for example), he himself cannot teach the program to do that, and he hopes that the program could pick up that skill from experts in these languages. By the way, chess is probably not a very good example of this division of labor (programmer vs. teacher), because with the strength of modern computers, even the most naive, brute-force, tree-walking algorithms with the most simplistic heuristic functions can actually play great chess. A programmer is enough, and you don't even need an expert chess teacher. These sorts of simplistic algorithms make my Palm Pilot beat me at chess every time, and a stronger computer beat even the best chess player in the world. Now you're probably saying: 'well, chess doesn't actually require intelligence to play, and these programs should not be called artificial intelligence'. Kurzweil also points out this interesting phenomenon, the drifting definition of artificial intelligence. He claims that by definition, a computer will never be called intelligent, because whenever we learn how to do something with a computer, we'll suddenly say that this task does not require intelligence. He gives as examples OCR and speech recognition, tasks once thought to be too intelligent for a computer to undertake, but now that computers do them casually we call these tasks un-intelligent, and move our intelligence bar a little higher. I agree. People tend to think that computers are machines and therefore are not intelligent. Personally, I don't think that's true. I think computers are capable of doing intelligent things, and people are also capable of doing (very) unintelligent things (and vice versa). So it's not a yes/no question whether a person or a computer is intelligent. What matters is the action itself. 
And I think many intelligent actions we do can be done, and will be done in the future, by computers, like playing chess, playing music, composing music, translating, understanding speech, speaking etc. In the future we may even be able to hire computers for many jobs we do now. For example - a secretary. Or maybe even a software programmer! And maybe also a prime minister - I think ANY computer can be smarter than what we have now! :-) Regarding artificial intelligence, you might be interested in reading (at least the last few paragraphs of) my summary of the Speedy Composer project (from 6 years ago). It's available in Hebrew and English (I translated it manually): http://www.speedy.co.il/composer/summary.php http://music.speedy.co.il/speedy_composer.php By the way, the issue is not only intelligence but also feelings: I believe a computer is even capable of having feelings. Or at least of acting as if it has feelings, which is the same. When computers have emotions, I think it will be a real breakthrough in artificial intelligence. It reminds me some
Re: [off topic] A new project - automatic translation
Shachar Shemesh wrote: Nothing is. I'll settle for ideas regarding what things they are useful for. Please restrict your answer to those in which success ratio can be accurately measured. Sorry, no time at the moment. Just answer the above question, i.e. - can you give a (relatively) objective standard by which the quality of the music your neural network produced can be measured? Please don't take this question as a taunt. Music is a highly subjective thing, and there is nothing wrong with a program that can produce good music, regardless of what definition of good you may wish to use. With translations, however, quality measurement is far less subjective. Any program that consistently produces a translation of a general text that 20% of the target native-speaking population will call a good translation will get my appreciation. I'll even exclude literary texts, as those tend to be harder to translate. My experience is that you can use neural networks to compose music. It doesn't mean you can't compose music without them - it's just a tool. Like a piano, a guitar, or many other instruments - each one of them can be used to play music. The same is true of neural networks. Regarding other uses of neural networks - I'm not an expert, but I know they have been used for pattern recognition and all sorts of things in physics and other areas. What is special about neural networks is their ability to generalize. You teach them something, then they learn and generalize to data which was never given to them. Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il
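Uri's point about generalization can be made concrete with the simplest neural network, a single perceptron. This sketch is illustrative only and is not the network used in Speedy Composer: it assumes an invented rule (points with x1 + x2 > 1 are labeled +1, all others -1), trains on six points with the classic perceptron update, and then classifies points that never appeared in the training data. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Weights = Tuple[float, float, float]


def train_perceptron(samples: List[Tuple[Point, int]], epochs: int = 20,
                     rate: float = 1.0) -> Weights:
    """Classic perceptron rule: adjust the weights only after a mistake."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in samples:  # label is +1 or -1
            predicted = 1 if w1 * x1 + w2 * x2 + b > 0 else -1
            if predicted != label:
                w1 += rate * label * x1
                w2 += rate * label * x2
                b += rate * label
    return w1, w2, b


def classify(weights: Weights, point: Point) -> int:
    w1, w2, b = weights
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else -1


# Invented training data for the rule "x1 + x2 > 1", which is never told to the network.
training = [((3, 3), 1), ((0, 0), -1), ((4, 1), 1),
            ((1, 0), -1), ((1, 4), 1), ((0, 1), -1)]
weights = train_perceptron(training)
print(weights)  # (2.0, 2.0, -2.0)
print([classify(weights, p) for p in [(2, 2), (5, 0), (0.5, 0.25)]])  # [1, 1, -1]
```

With these samples the weights settle after two passes at (2.0, 2.0, -2.0), i.e. the boundary 2*x1 + 2*x2 - 2 = 0, which is the rule x1 + x2 > 1 itself, so the three unseen points are classified correctly.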
Re: [off topic] A new project - automatic translation
Barry.R wrote: You may be interested in an article which appeared in the Proceedings of the National Academy of Sciences (America) August 8th issue entitled Unsupervised Learning of Natural Languages by four Israelis: Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman. They maintain that they have developed an unsupervised algorithm that discovers hierarchical structures in sequences of data. This algorithm has been tested on several thousand sentences in languages it was originally unfamiliar with and was able to decode the grammatical structure of these languages and produce acceptable sentences. Barry. Thanks. I found it, printed it and will read it. Thanks for your advice. Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il
Re: [off topic] A new project - automatic translation
On Wed, 16 Nov 2005 17:58:16 +0200, Uri Even-Chen wrote: Very interesting. I haven't read Ray Kurzweil's book, nor had I heard his name until a few days ago, when I saw it in one of the Wikipedia articles you sent me (about translation). I also read about him and it's very impressive. I want to write to him and ask his opinion about my idea. Do you happen to know his E-mail address? One of his emails is [EMAIL PROTECTED] (if you get past the junk filters). You can also try to reach him at http://www.kurzweilai.net/ . Ehud. -- Ehud Karni Tel: +972-3-7966-561 /\ Mivtach - Simon Fax: +972-3-7966-667 \ / ASCII Ribbon Campaign Insurance agencies (USA) voice mail and X Against HTML Mail http://www.mvs.co.il FAX: 1-815-5509341 / \ GnuPG: 98EA398D http://www.keyserver.net/Better Safe Than Sorry
Re: [off topic] A new project - automatic translation
Uri, You might find The Rosetta Project http://www.rosettaproject.org/live interesting. Ehud. -- Ehud Karni Tel: +972-3-7966-561 /\ Mivtach - Simon Fax: +972-3-7966-667 \ / ASCII Ribbon Campaign Insurance agencies (USA) voice mail and X Against HTML Mail http://www.mvs.co.il FAX: 1-815-5509341 / \ GnuPG: 98EA398D http://www.keyserver.net/Better Safe Than Sorry
Re: [off topic] A new project - automatic translation
Nadav Har'El wrote: On Mon, Nov 14, 2005, Uri Even-Chen wrote about Re: [off topic] A new project - automatic translation: This is getting wildly off-topic, but... but interesting. but unlike a novice, he also does some pattern recognition on each of the resulting boards, and instantly (without sequential computation) recognizes situations which are good, or bad, for him. This final recognition is the part of their thought process that chess players can't really explain, and is often called intuition. Kurzweil argues that a chess-playing program could act similarly - walk the (fantastically huge) tree of possible moves and counter-moves, sequentially, and at every junction apply a neural network that recognizes good boards, and prune the tree at that junction if the neural network decides that this move is not worth it. In other words, this is a case where a human programmer has to analyze the problem, decide on a solution path, decide where, if at all, to apply neural networks, and then program the whole thing. I agree that this may produce a chess program that plays well. I refuse to call it artificial intelligence. I have nothing against neural networks as a programming tool (aside from the fact that they are, in my humble opinion, too complex for casual use). I just don't like the way of thinking that says just throw a neural network at it and everything will be ok given enough training data and time. Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
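Kurzweil's scheme as paraphrased above (walk the game tree sequentially, let a pattern recognizer prune junctions that look hopeless, and judge the leaves by "intuition") can be sketched as depth-limited negamax. Chess is far too big for an example, so this sketch assumes a toy take-away game instead: players alternately remove 1, 2 or 3 stones, and whoever takes the last stone wins. The `intuition` function is a hand-written stand-in for the trained neural network and happens to encode the known winning rule for this game (a pile that is a multiple of 4 is lost for the player to move); all names are hypothetical.

```python
from typing import Callable, List

MOVES = (1, 2, 3)  # toy game: a player may take 1, 2 or 3 stones


def legal_moves(stones: int) -> List[int]:
    return [m for m in MOVES if m <= stones]


def intuition(stones: int) -> float:
    """Stand-in for a trained pattern recognizer: +1 if the position looks
    won for the player to move, -1 if it looks lost."""
    return -1.0 if stones % 4 == 0 else 1.0


def worth_exploring(stones_after: int) -> bool:
    """Pruning rule: only keep moves that leave the opponent a bad position."""
    return intuition(stones_after) < 0


def candidates(stones: int, keep: Callable[[int], bool]) -> List[int]:
    moves = legal_moves(stones)
    # If the recognizer rejects every move, fall back to searching them all.
    return [m for m in moves if keep(stones - m)] or moves


def negamax(stones: int, depth: int,
            evaluate: Callable[[int], float],
            keep: Callable[[int], bool]) -> float:
    """Value of the position for the player to move (+1 win, -1 loss)."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone and won
    if depth == 0:
        return evaluate(stones)  # leaf: trust the pattern recognizer
    return max(-negamax(stones - m, depth - 1, evaluate, keep)
               for m in candidates(stones, keep))


def best_move(stones: int, depth: int,
              evaluate: Callable[[int], float] = intuition,
              keep: Callable[[int], bool] = worth_exploring) -> int:
    return max(candidates(stones, keep),
               key=lambda m: -negamax(stones - m, depth - 1, evaluate, keep))


print(best_move(10, depth=4))  # 2: leaves the opponent 8 stones
```

The division of labor discussed in this thread is visible here: the programmer writes the search, while all the game knowledge lives in `intuition`. A weaker recognizer would keep bad moves or prune good ones, and deeper search would have to compensate.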
Re: [off topic] A new project - automatic translation
On Tue, Nov 15, 2005, Shachar Shemesh wrote about Re: [off topic] A new project - automatic translation: This is getting wildly off-topic, but... but interesting. .. In other words, this is a case where a human programmer has to analyze the problem, decide on a solution path, decide where, if at all, to apply neural networks, and then program the whole thing. .. There's no argument that programming such a thing will take effort. Artificial intelligence doesn't mean some sort of hey look, it's magic, I'll get a working program without doing any effort!. Rather, the idea is that the programmer, while being an expert programmer, does not have to be an expert chess player (to use this example), and the program can learn how to play chess by watching the games of grandmasters. There's a division of labor, if you will, between the programmer who can program, and the teachers who can play chess extremely well but couldn't program if their life depended on it. To return to the translation issue, the idea that Uri raised was that he wanted to write a translation program, and perhaps spend a good deal of effort doing so, but since he doesn't really know how to translate French to Swedish (for example), he himself cannot teach the program to do that, and he hopes that the program could pick up that skill from experts in these languages. By the way, chess is probably not a very good example of this division of labor (programmer vs. teacher), because with the strength of modern computers, even the most naive, brute-force, tree-walking algorithms with the most simplistic heuristic functions can actually play great chess. A programmer is enough, and you don't even need an expert chess teacher. These sorts of simplistic algorithms make my Palm Pilot beat me at chess every time, and a stronger computer beat even the best chess player in the world. 
Now you're probably saying: 'well, chess doesn't actually require intelligence to play, and these programs should not be called artificial intelligence'. Kurzweil also points out this interesting phenomenon, the drifting definition of artificial intelligence. He claims that by definition, a computer will never be called intelligent, because whenever we learn how to do something with a computer, we'll suddenly say that this task does not require intelligence. He gives as examples OCR and speech recognition, tasks once thought to be too intelligent for a computer to undertake, but now that computers do them casually we call these tasks un-intelligent, and move our intelligence bar a little higher. -- Nadav Har'El|Tuesday, Nov 15 2005, 13 Heshvan 5766 [EMAIL PROTECTED] |- Phone +972-523-790466, ICQ 13349191 |[I'm] so full of action, my name should http://nadav.harel.org.il |be a verb -- Big Daddy Kane (Raw, 1987)
Re: [off topic] A new project - automatic translation
On Sun, Nov 13, 2005 at 09:42:48PM +0200, Danny Lieberman wrote: As long as you can find content-expert professional human translators at 5cents/word, you won't have a business proposition, because a free community effort will depend precisely upon the people who make a living from translation. As long as you can find professional programmers at $smallnum dollars, you won't have a business proposition, because a free community effort will depend precisely upon the people who make a living from programming. Need I say more? Cheers, Muli -- Muli Ben-Yehuda http://www.mulix.org | http://mulix.livejournal.com/
Re: [off topic] A new project - automatic translation
Uri Even-Chen wrote: Hi Shachar, Shachar Shemesh wrote: I know about four people who deal with linguistics (two of them actual linguists, one of them subscribed to this list). It would be nice if any of your friends would be interested in giving me their advice. I want to write something that actually works, but not spend too much time writing it. I'm counting on volunteers who will contribute to this project. Like I said, one of said acquaintances is a subscriber to this list. He will speak up if he so chooses. Bear in mind that any translation engine contains a rather huge list of words and various attributes about each one of them. The whole appeal of generative-linguistics-based engines is that you can significantly reduce the number of attributes you store per word. Of course, this is also PRECISELY the reason they give out such appalling results. Also bear in mind that creating this list of words is a task given to trained linguists - i.e. - it's totally manual, demands highly skilled workers (and not of the sort of skill computer people usually possess), and is quite time consuming. You MAY find such lists available somewhere for use in engines based on the usual technology, but I wouldn't count on them enabling you to achieve anything better than what the current engines already know how to do. The short of it is that the reason that 30 years of research has not produced any good results with machine translation is that they are using the wrong tool for the job. I agree. I want to try a new approach which was never tried before (as far as I know). Care to explain what it is? Don't forget that if translating (or any other type of NLP) consisted merely of getting a list of words and their meanings, the problem would have been solved long ago. I have heard of some cases where interesting results were achieved using the neural network approach - teach the engine by letting it look at translations done by others. 
Having studied neural networks somewhat, I find the whole art is in choosing the correct learning network. A friend of mine once referred to the entire field as Artificial Stupidity. You set up a computer program, you tell it, with precise details, what it needs to do. You set it out to do it. It does it. You cry out in wonder look, it learns all by itself! My first friend claims that his engine achieves the same level of accuracy as professional engines today WITHOUT BEING ADAPTED TO THE USED CORPUS. He even claims he has no particular problems with the standard gotchas, such as the infamous Time flies like an arrow, fruit flies like a banana. It would be interesting to see your friend's engine in action. Do you have any link? No. It's a private company looking for financing. I did talk to him about releasing it as open source should the other alternative be letting the technology sink, but I'm actually hoping for his sake that that doesn't happen (and things are looking fairly good in that respect). In the meantime, I'm not aware of any web presence the company has at all. Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
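Shachar's point that a list of words and their meanings is not enough shows up even in the famous example sentence. In this sketch a hypothetical mini-lexicon, with simplified part-of-speech readings chosen for illustration, records only what each word can be; enumerating the combinations shows how many readings a word list alone leaves open. Choosing among them is the hard part, and the lexicon does not do it.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical, simplified lexicon: the parts of speech each word can take.
READINGS: Dict[str, List[str]] = {
    "time": ["noun", "verb"],          # "time" itself, or "to time" something
    "flies": ["noun", "verb"],         # insects, or "to fly"
    "like": ["verb", "preposition"],   # "to like", or "similar to"
    "an": ["determiner"],
    "arrow": ["noun"],
}


def tag_sequences(sentence: str) -> List[Tuple[str, ...]]:
    """Every combination of readings the lexicon allows, word by word."""
    words = sentence.lower().split()
    return list(product(*(READINGS[w] for w in words)))


readings = tag_sequences("Time flies like an arrow")
print(len(readings))  # 2 * 2 * 2 * 1 * 1 = 8 candidate readings
```

Only one of the eight, noun-verb-preposition-determiner-noun, is the everyday reading, while "fruit flies like a banana" needs a different pattern (noun-noun-verb-determiner-noun), and nothing in the word list says which one applies.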
Re: [off topic] A new project - automatic translation
Hi Shachar, Shachar Shemesh wrote: I know about four people who deal with linguistics (two of them actual linguists, one of them subscribed to this list). It would be nice if any of your friends would be interested in giving me their advice. I want to write something that actually works, but not spend too much time writing it. I'm counting on volunteers who will contribute to this project. The short of it is that the reason that 30 years of research has not produced any good results with machine translation is that they are using the wrong tool for the job. I agree. I want to try a new approach which was never tried before (as far as I know). My first friend claims that his engine achieves the same level of accuracy as professional engines today WITHOUT BEING ADAPTED TO THE USED CORPUS. He even claims he has no particular problems with the standard gotchas, such as the infamous Time flies like an arrow, fruit flies like a banana. It would be interesting to see your friend's engine in action. Do you have any link? Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il
Re: [off topic] A new project - automatic translation
Uri, I am not sure that you grasp the enormity of the task at hand. However, I don't want to discourage you. Unlike others, I would appreciate a stupid translator that translates word for word. It would be wonderful to have free software that is as bad as Google or Babelfish. I would LOVE to have a free tool to make fun of. Now we have nothing. There are a few already, no? So go on and get to work. Note that if I were you, I'd set myself more realistic goals first. In a way, I am you: when we set out to write hspell we had dreams of a linguistic future. I hope that the huge list that we collected, of almost all modern Hebrew words, will be useful to you. Even the hugest tasks can be started as small ones. WordNet and the free dictionary could be a good start (WordNet is probably more what he needs). I want to have both an algorithm and a database of languages (words, phrases etc) that will improve over time. That is, start with a simple algorithm, and feed data into it. The data will be sources and translations in any language. When there is enough data for a given The database you describe here is not dissimilar to WordNet, and I am told that a few list members are trying to extend the Hebrew WordNet. Maybe you can join them. Yeah, join us ;) or work together with the Mila team (which I think is also trying to reach the same goal). Btw, I was thinking about automatically translating software with the right glossary. It seems that is a much easier task, especially with the right PO comments. If anyone wishes to join and help ;) Ely -- Dan Kenigsberg http://www.cs.technion.ac.il/~danken ICQ 162180901
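Ely's aside about translating software with a glossary is easy to prototype, because gettext PO files already pair each original string (`msgid`) with its translation (`msgstr`). The following is a minimal sketch, not an existing tool: the function `prefill_po` and the sample glossary are hypothetical, and it only handles single-line `msgid`/`msgstr` pairs with exact matches, ignoring multi-line strings, `msgctxt`, plural forms and quote escaping.

```python
import re
from typing import Dict, Optional

MSGID = re.compile(r'^msgid "(.*)"$')


def prefill_po(po_text: str, glossary: Dict[str, str]) -> str:
    """Fill empty single-line msgstr entries whose msgid is in the glossary."""
    out = []
    current_id: Optional[str] = None
    for line in po_text.splitlines():
        match = MSGID.match(line)
        if match:
            current_id = match.group(1)
        elif line == 'msgstr ""' and current_id in glossary:
            line = 'msgstr "%s"' % glossary[current_id]
        out.append(line)  # everything else is copied unchanged
    return "\n".join(out) + "\n"


sample = 'msgid "File"\nmsgstr ""\n\nmsgid "Quit"\nmsgstr ""\n'
print(prefill_po(sample, {"File": "Fichier"}))
```

Existing translations are left alone, and strings missing from the glossary stay empty for a human. A real tool would probably also flag machine-filled entries as `#, fuzzy` so that a translator reviews them before they are used.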
Re: [off topic] A new project - automatic translation
On Mon, Nov 14, 2005 at 01:46:03PM +0200, Offer Kaye wrote: On 11/14/05, Dan Kenigsberg wrote: that is as bad as Google or Babelfish. I would LOVE to have a free tool to make fun of. Now we have nothing. Enter the Perl module Lingua::Translate - Go ahead, make fun of it :) Since you are asking so nicely, I'll simply quote from its man page: Locale::Translate translates text from one written language to another. Currently this is implemented by contacting Babelfish (http://babelfish.altavista.com/), so see there for the language pairs that are supported. Babelfish uses SysTran (http://www.systran.org/) to perform Meaning it is not interestingly free (nor very interesting at all), and has no Hebrew support (but has Arabic!). -- Dan Kenigsberg http://www.cs.technion.ac.il/~danken ICQ 162180901
Re: [off topic] A new project - automatic translation
Shachar Shemesh wrote: Care to explain what it is? Don't forget that if translating (or any other type of NLP) consisted merely of getting a list of words and their meanings, the problem would have been solved long ago. I think I already explained it. I'm copying what I wrote yesterday: I want to consider using existing databases, which are free to use, such as your Hspell project or Wikipedia, to feed initial data into the system. The main goal is to have (for each pair of languages) a list of translations of words, phrases and maybe even sentences. Then, the algorithm will just do search and replace - for every word, phrase or sentence it will replace it with its equivalent in the target language. I think it's quite a simple algorithm to start with. And then it will be improved in the future. (Even Linux was not written in one day!) Look at my E-mail from yesterday for more details. I have heard of some cases where interesting results were achieved using the neural network approach - teach the engine by letting it look at translations done by others. Having learned some neural networks, I can say the whole art is in choosing the correct learning network. A friend of mine once referred to the entire field as "Artificial Stupidity". You set up a computer program, you tell it, in precise detail, what it needs to do. You set it out to do it. It does it. You cry out in wonder: "look, it learns all by itself!" I worked with artificial neural networks in the past and I think the "Artificial Stupidity" approach is wrong. Neural networks do work in some cases. The idea (in a nutshell) is that they can generalize, and you don't teach them how to generalize. You just feed them with data, train them and they generalize by themselves. On the other hand, neural networks are not really intelligent. You can't teach them to play chess, for example. They are not suitable for everything. If you're interested in what I did with neural networks - I used them to compose music.
If you want to see more details, look at Speedy Composer: http://www.speedy.co.il/composer/ Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il
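The "search and replace" scheme described above (replace each word, phrase or sentence with its stored equivalent) can be sketched as greedy longest-match substitution over a phrase table. This is a minimal illustration under assumptions not in the thread: whitespace tokenization, lowercasing, a hypothetical `translate_greedy` function and a toy English-to-French table. Unknown words pass through unchanged, and nothing is reordered or disambiguated, which is exactly the weakness the replies point out.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase table: source phrase (tuple of lowercase tokens)
# mapped to its stored equivalent in the target language.
PhraseTable = Dict[Tuple[str, ...], str]


def translate_greedy(text: str, table: PhraseTable) -> str:
    """Replace the longest known phrase at each position; copy unknown words."""
    tokens = text.lower().split()
    max_len = max((len(k) for k in table), default=1)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then progressively shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No entry of any length: pass the word through untranslated.
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

With the table `{("good",): "bon", ("good", "morning"): "bonjour", ("friend",): "ami"}`, "Good morning friend" becomes "bonjour ami", while "good friend" falls back to the word-by-word "bon ami".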
Re: [off topic] A new project - automatic translation
Hi Dan, Dan Kenigsberg wrote: I am not sure that you grasp the enormity of the task at hand. However, I don't want to discourage you. Unlike others, I would appreciate a stupid translator that replaces a word by a word. It would be wonderful to have free software that is as bad as Google or Babelfish. I would LOVE to have a free tool to make fun of. Now we have nothing. So go on and get to work. Note that if I were you, I'd set myself more realistic goals first. In a way, I am you: when we set out to write hspell we had dreams of a linguistic future. I hope that the huge list that we collected, of almost all modern Hebrew words, will be useful to you. Thanks for encouraging me (if you are not being cynical). The database you describe here is not dissimilar to WordNet, and I am told that a few list members are trying to extend the Hebrew WordNet. Maybe you can join them. Are the Hspell and WordNet databases available for use? Can you send me links? I'll be glad to contribute if I can, but my focus is not on any specific language (such as Hebrew) but on translation in general. My idea is to create something like Wikipedia (but for translations), and yes, I understand how huge the task is. If there is any database I can use which contains texts (or words, phrases) and their translations, please let me know. I would like to start with something. Of course, it must be a database which is free to use without legal problems. Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il
Re: [off topic] A new project - automatic translation
Uri Even-Chen wrote: I want to consider using existing databases, which are free to use, such as your Hspell project or Wikipedia, to feed initial data into the system. The main goal is to have (for each pair of languages) a list of translations of words, phrases and maybe even sentences. Then, the algorithm will just do search and replace - for every word, phrase or sentence it will replace it with its equivalent in the target language. I think it's quite a simple algorithm to start with. And then it will be improved in the future. (Even Linux was not written in one day!) Not trying to discourage you, but it seems that this approach, if it's going to work at all, will likely only start working when it has a HUGE database of phrases. The word-by-word approach seems fairly hopeless to me. I worked with artificial neural networks in the past and I think the approach of Artificial Stupidity is wrong. Neural networks do work in some cases. Yes, if you pick the right neural function, choose the correct number of network levels, and decide how many neurons to put in each level. This does strike me as almost the same thing as actually coding the thing, except that I am not aware of any better way of deciding what the right numbers are than trial and error. My (limited) experience with neural networks is that even detecting over-learning is not easy. The idea (in a nutshell) is that they can generalize, and you don't teach them how to generalize. Unless you call choosing the right number of neurons, the number of layers in the network and the right neural function "teaching".
If you happen, like me, to think that there is no fundamental difference between the art of getting the neural network parameters right and the art of programming an algorithmic solution to the same problem (except that in the former case, the programmer himself is rather powerless if a bug is found after the product is released), then I see nothing special about neural networks' ability to generalize. They are not suitable for everything. Nothing is. I'll settle for ideas regarding what things they are useful for. Please restrict your answer to those in which the success ratio can be accurately measured. If you're interested in what I did with neural networks - I used them to compose music. If you want to see more details, look at Speedy Composer: http://www.speedy.co.il/composer/ Sorry, no time at the moment. Just answer the above question, i.e. can you give a (relatively) objective standard by which the quality of the music your neural network produced can be measured? Please don't take this question as a taunt. Music is a highly subjective thing, and there is nothing wrong with a program that can produce good music, regardless of what definition of "good" you may wish to use. With translations, however, quality measurement is far less subjective. Any program that consistently produces a translation of a general text that 20% of the target language's native speakers will call a good translation will get my appreciation. I'll even exclude literary text, as it tends to be harder to translate. Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il Shachar -- Shachar Shemesh Lingnu Open Source Consulting ltd. Have you backed up today's work? http://www.lingnu.com/backup.html
Re: [off topic] A new project - automatic translation
Offer Kaye wrote: There's *no way* to go from a simplistic search and replace of single words (or very short/simple phrases) to a full-blown translation software. There's no improvement you could make that would make such a methodology work for complete sentences in a real language. Anyone who tells you otherwise is trying to sell you something... You first need to come up with a working method, then find a way to implement it. You can't start with a naive list of words to replace and expand that to translating complete sentences. I want to start with a simple algorithm and improve it with time. If a better algorithm is found or developed, it can completely replace the initial algorithm. But I want to start with something, and something which is not too difficult to implement. As for maintaining *lists of sentences* to translate... dude, are you nuts? ;-) It will not contain every sentence in the world. But it can remember specific sentences from texts which were previously translated. If the same sentence is found again, the same translation will be used. But the main focus is words and phrases, not sentences. Finally, since you mentioned Wikipedia, here are some useful links: http://en.wikipedia.org/wiki/Machine_translation See especially the section on Free (open source) software for existing efforts where you might be able to help. A more general article regarding translation: http://en.wikipedia.org/wiki/Translation Very interesting. Thanks for the links. I'm glad I wrote to this mailing list - I received many useful comments. I also read the article Translation memory [http://en.wikipedia.org/wiki/Translation_memory] (linked from the articles you sent me). I think the term "translation memory" describes what I want to do: I want to create a huge translation memory database of many languages, which will be used to translate texts from one language to another. Its data will come from volunteers all over the world.
This translation memory database will be used together with other methods to translate texts. The more words and phrases there are in this translation memory database, the better the quality of the translation will be. And it will be big if many people contribute to it. Compare to Wikipedia as an encyclopedia. Best Regards, Uri Even-Chen Speedy Net Raanana, Israel. E-mail: [EMAIL PROTECTED] Phone: +972-9-7715013 Website: www.uri.co.il
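A translation memory of the kind described here stores previously translated segments and reuses them when the same (or a very similar) segment appears again. A minimal sketch, assuming exact lookup after normalizing case and whitespace, with a fuzzy fallback based on Python's `difflib.SequenceMatcher`; the `TranslationMemory` class and the 0.8 threshold are hypothetical choices, not taken from any existing TM tool.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


class TranslationMemory:
    """Minimal in-memory translation memory (hypothetical sketch).

    Returns the stored translation for an exact match, or the closest
    "fuzzy" match whose similarity is at or above the threshold.
    """

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # arbitrary cut-off, assumed for illustration
        self.segments: Dict[str, str] = {}

    @staticmethod
    def _normalize(segment: str) -> str:
        # Lowercase and collapse runs of whitespace so trivial variants match.
        return " ".join(segment.lower().split())

    def add(self, source: str, target: str) -> None:
        self.segments[self._normalize(source)] = target

    def lookup(self, source: str) -> Optional[Tuple[str, float]]:
        key = self._normalize(source)
        if key in self.segments:
            return self.segments[key], 1.0
        best: Optional[Tuple[str, float]] = None
        # Linear scan over all stored segments; fine for a sketch, slow at scale.
        for stored, target in self.segments.items():
            score = SequenceMatcher(None, key, stored).ratio()
            if score >= self.threshold and (best is None or score > best[1]):
                best = (target, score)
        return best
```

Real TM tools use more elaborate segment matching and indexing; the point here is only the exact-then-fuzzy lookup order.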
Re: [off topic] A new project - automatic translation
On Mon, Nov 14, 2005, Uri Even-Chen wrote about Re: [off topic] A new project - automatic translation: On the other hand, neural networks are not really intelligent. You can't teach them to play chess, for example. They are not suitable for everything. This is getting wildly off-topic, but... Ray Kurzweil, in his book The Age of Spiritual Machines: When Computers Exceed Human Intelligence, makes the following observation about human thought, and how computers can imitate it. If you'll allow me to put what I remember from his writing into my own words: He believes that human thought has two modes: computation and pattern recognition. Examples of the former include an arithmetic computation, or thinking about several options one after another; an example of the latter is face recognition. He believes that chess is an example where both modes are used: a chess expert, like a chess novice, goes in his head through many of the possible moves and his opponent's possible reactions (this is the computational mode), but unlike a novice, he also does some pattern recognition on each of the resulting boards, and instantly (without sequential computation) recognizes situations which are good, or bad, for him. This final recognition is the part of their thought process that chess players can't really explain, and is often called intuition. Kurzweil argues that a chess-playing program could act similarly - walk the (fantastically huge) tree of possible moves and counter-moves, sequentially, and at every junction apply a neural network that recognizes good boards, and prune the tree at that junction if the neural network decides that this move is not worth it. These techniques, of walking huge trees with a *heuristic function*, are well known in AI (look up A*, Minimax, etc.) and are not Kurzweil's invention.
But his interesting insight is that neural networks are useful but SHOULD NOT (not only CAN NOT) be used directly to solve every problem, but rather should be combined with other computational techniques. Arguably, neural networks similarly should not be used directly to parse language. It is very possible that language understanding and generation have both a computational, or sequential, aspect (reading the words one by one, following some sort of state machine in your head), and a pattern recognition aspect. If you're interested in what I did with neural networks - I used them to compose music. If you want to see more details, look at Speedy Composer: http://www.speedy.co.il/composer/ The music is very nice :) -- Nadav Har'El| Monday, Nov 14 2005, 13 Heshvan 5766 [EMAIL PROTECTED] |- Phone +972-523-790466, ICQ 13349191 |A computer without Microsoft is like a http://nadav.harel.org.il |chocolate cake without mustard.
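The scheme attributed to Kurzweil above (walk the tree of moves and counter-moves, judge positions with a pattern recognizer, prune branches judged not worth exploring) can be sketched as depth-limited minimax with a heuristic `evaluate` function and forward pruning. This is standard minimax, not a reconstruction of any particular chess program; `evaluate` stands in for the neural network, and the callable-based interface and `prune_below` threshold are assumptions for illustration.

```python
from typing import Callable, List, TypeVar

State = TypeVar("State")


def minimax(
    state: State,
    depth: int,
    maximizing: bool,
    moves: Callable[[State], List[State]],
    evaluate: Callable[[State], float],
    prune_below: float = float("-inf"),
) -> float:
    """Depth-limited minimax; scores are from the maximizing player's view.

    `evaluate` plays the role of the pattern recognizer: it scores leaf
    positions and also decides which of our candidate moves to discard.
    """
    children = moves(state)
    if depth == 0 or not children:
        return evaluate(state)
    if maximizing:
        # Forward pruning: skip our moves the evaluator rates as hopeless
        # (but keep all of them if every move would be pruned).
        kept = [c for c in children if evaluate(c) >= prune_below] or children
        return max(minimax(c, depth - 1, False, moves, evaluate, prune_below) for c in kept)
    # Opponent's turn: assume the reply that is worst for us.
    return min(minimax(c, depth - 1, True, moves, evaluate, prune_below) for c in children)
```

Because pruning happens before a subtree is searched, a poorly trained evaluator can discard a good move: in a toy tree where move `b` scores only 1 but guarantees at least 7 two plies later, while move `a` scores 4 but guarantees only 3, setting `prune_below=2` changes the result from 7 to 3.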
Re: [off topic] A new project - automatic translation
On Mon, Nov 14, 2005, Uri Even-Chen wrote about Re: [off topic] A new project - automatic translation: Are the Hspell and WordNet databases available to use? Can you send me links? I'll be glad to contribute if I can, but my focus is not on any Of course, both are released under free software licenses: http://www.ivrix.org.il/projects/spell-checker/ http://cl.haifa.ac.il/projects/mwn/ specific language (Hebrew) but on translating in general. My idea is to create something like Wikipedia (but for translations), and yes, I understand how huge the task is. http://www.wiktionary.org/ is like Wikipedia (but for translations), but for individual words. As people have noted here time and again, this is only a small step in the translation direction. Arguably, it is even a step in the wrong direction (with the WordNet effort being more in the right direction, because it translates individual word senses rather than words). If there is any database I can use, which contains texts (or words, phrases) and their translations, please let me know. I would like to start with something. Again, Wiktionary has this for words. If you read some machine-translation literature, you'll find references to a bunch of corpora which contain quality texts translated into several languages, with an explicit correspondence between the sentences in each language. For example, there is a corpus of EU laws translated into several EU languages. And I believe there's also a UN corpus. Sorry, I don't have any links. And, for an interesting corpus, why not try... the Bible? It has been translated into countless languages, and a strict correspondence between the verses has been observed. Of course, some of the translations feature some archaic language :-) -- Nadav Har'El| Monday, Nov 14 2005, 13 Heshvan 5766 [EMAIL PROTECTED] |- Phone +972-523-790466, ICQ 13349191 |Strike not only while the iron is hot, http://nadav.harel.org.il |make the iron hot by striking it.
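The verse-level correspondence mentioned above means that sentence alignment for a Bible-based parallel corpus comes almost for free: pair up verses that share a reference. A minimal sketch, assuming each translation is available as plain text with one verse per line in the hypothetical form `Book chapter:verse<TAB>text`, and that both files use the same book names; the sites mentioned in the thread may use entirely different formats.

```python
from typing import Dict, Iterable, List, Tuple

VerseId = Tuple[str, int, int]  # (book, chapter, verse)


def parse_verses(lines: Iterable[str]) -> Dict[VerseId, str]:
    """Parse lines of the assumed form 'Book chapter:verse<TAB>text'."""
    verses: Dict[VerseId, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        ref, text = line.split("\t", 1)
        # rsplit keeps multi-word book names such as "1 Kings" intact.
        book, chap_verse = ref.rsplit(" ", 1)
        chapter, verse = chap_verse.split(":")
        verses[(book, int(chapter), int(verse))] = text.strip()
    return verses


def align(source: Dict[VerseId, str], target: Dict[VerseId, str]) -> List[Tuple[str, str]]:
    """Pair up verses present in both translations; others are skipped."""
    shared = sorted(source.keys() & target.keys())
    return [(source[v], target[v]) for v in shared]
```

Pairs come out in sorted key order (alphabetical by book name), not canonical biblical order, and verses missing from either translation are simply left out.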
Re: [off topic] A new project - automatic translation
Uri, I am not sure that you grasp the enormity of the task at hand. However, I don't want to discourage you. Unlike others, I would appreciate a stupid translator that replaces a word by a word. It would be wonderful to have free software that is as bad as Google or Babelfish. I would LOVE to have a free tool to make fun of. Now we have nothing. So go on and get to work. Note that if I were you, I'd set myself more realistic goals first. In a way, I am you: when we set out to write hspell we had dreams of a linguistic future. I hope that the huge list that we collected, of almost all modern Hebrew words, will be useful to you. I want to have both an algorithm and a database of languages (words, phrases etc) that will improve over time. That is, start with a simple algorithm, and feed data into it. The data will be sources and translations in any language. When there is enough data for a given The database you describe here is not dissimilar to WordNet, and I am told that a few list members are trying to extend the Hebrew WordNet. Maybe you can join them. -- Dan Kenigsberg http://www.cs.technion.ac.il/~danken ICQ 162180901
Re: [off topic] A new project - automatic translation
On 11/13/05, Uri Even-Chen wrote: The main goal is to have (for each pair of languages) a list of translations of words, phrases and maybe even sentences. Then, the algorithm will just do search and replace - for every word, phrase or sentence it will replace it with its equivalent in the target language. I think it's quite a simple algorithm to start with. And then it will be improved in the future. (Even Linux was not written in one day!) There's *no way* to go from a simplistic search and replace of single words (or very short/simple phrases) to a full-blown translation software. There's no improvement you could make that would make such a methodology work for complete sentences in a real language. Anyone who tells you otherwise is trying to sell you something... You first need to come up with a working method, then find a way to implement it. You can't start with a naive list of words to replace and expand that to translating complete sentences. As for maintaining *lists of sentences* to translate... dude, are you nuts? ;-) Finally, since you mentioned Wikipedia, here are some useful links: http://en.wikipedia.org/wiki/Machine_translation See especially the section on Free (open source) software for existing efforts where you might be able to help. A more general article regarding translation: http://en.wikipedia.org/wiki/Translation Best regards, -- Offer Kaye
Re: [off topic] A new project - automatic translation
On 11/14/05, Dan Kenigsberg wrote: Unlike others, I would appreciate a stupid translator that replaces a word by a word. It would be wonderful to have free software that is as bad as Google or Babelfish. I would LOVE to have a free tool to make fun of. Now we have nothing. Enter the Perl module Lingua::Translate - http://search.cpan.org/dist/Lingua-Translate/ It uses SYSTRAN by default, which is the same backend used by Google (AFAIK): http://www.systransoft.com/index.html Go ahead, make fun of it :) Cheers, -- Offer Kaye
Re: [off topic] A new project - automatic translation
Nadav Har'El wrote: And, for an interesting corpus, why not try... the Bible? It has been translated into countless languages, and a strict correspondence between the verses has been observed. Of course, some of the translations feature some archaic language :-) Hebrew and the 1917 JPS English translation: http://www.mechon-mamre.org/ Lots of translations: http://bible.gospelcom.net/ -- Thanks, Uri http://translation.israel.net
Re: [off topic] A new project - automatic translation
Uri Even-Chen wrote: I also read the article Translation memory [http://en.wikipedia.org/wiki/Translation_memory] (linked from the articles you sent me). I think the term Translation memory describes what I want to do: I want to create a huge translation memory database of many languages, which will be used to translate texts from one language to another. A single TM database for all fields isn't very useful. -- Thanks, Uri http://translation.israel.net
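One way to act on the point that a single TM for all fields isn't very useful is to partition the memory by subject field and search only the fields relevant to the text being translated. A minimal sketch with a hypothetical `DomainTM` class; the field names and the priority-order lookup are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Optional, Sequence


class DomainTM:
    """Translation memory partitioned by subject field (hypothetical sketch)."""

    def __init__(self) -> None:
        # field name -> {source segment -> target segment}
        self.by_domain: DefaultDict[str, Dict[str, str]] = defaultdict(dict)

    def add(self, domain: str, source: str, target: str) -> None:
        self.by_domain[domain][source] = target

    def lookup(self, source: str, domains: Sequence[str]) -> Optional[str]:
        """Search the given fields in priority order; None if no field has it."""
        for domain in domains:
            # .get() on the defaultdict does not create empty entries.
            hit = self.by_domain.get(domain, {}).get(source)
            if hit is not None:
                return hit
        return None
```

After `add("software", "bug", "error")` and `add("biology", "bug", "insecto")`, the same source segment yields different targets depending on the field list passed to `lookup`.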
Re: [off topic] A new project - automatic translation
Uri Even-Chen wrote: Offer Kaye wrote: There's *no way* to go from a simplistic search and replace of single words (or very short/simple phrases) to a full-blown translation software. There's no improvement you could make that would make such a methodology work for complete sentences in a real language. Anyone who tells you otherwise is trying to sell you something... You first need to come up with a working method, then find a way to implement it. You can't start with a naive list of words to replace and expand that to translating complete sentences. I want to start with a simple algorithm and improve it with time. If a better algorithm is found or developed, it can completely replace the initial algorithm. But I want to start with something, and something which is not too difficult to implement. You may be interested in an article which appeared in the August 8th issue of the Proceedings of the National Academy of Sciences (USA), entitled "Unsupervised Learning of Natural Languages", by four Israelis: Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman. They maintain that they have developed an unsupervised algorithm that discovers hierarchical structures in sequences of data. This algorithm has been tested on several thousand sentences in languages it was originally unfamiliar with and was able to decode the grammatical structure of these languages and produce acceptable sentences. Barry.
Re: [off topic] A new project - automatic translation
On Sun, 13 Nov 2005, Uri Even-Chen wrote: Date: Sun, 13 Nov 2005 15:13:38 +0200 From: Uri Even-Chen [EMAIL PROTECTED] To: linux-il linux-il@linux.org.il Subject: [off topic] A new project - automatic translation Hi people, I am thinking about starting a new open source project related to automatic translation. The idea is to create a website and/or software that will automatically translate texts from one language to another. I want the website or software to be able to get feedback from users and learn from mistakes. The quality of translation should improve over time. I would like to get your feedback about this idea: - Do you think it's a good idea? - Do you think it can be implemented? - Do you have any suggestions on how to implement it? - How many people are necessary to implement such an idea? - Are you interested in being involved? Personally I think there is a need for such a website or software. There are existing websites and programs, but they have three main disadvantages: 1. Some of them are not free. 2. Most of them support only a few languages. 3. I think all of them should improve the quality of translation. I want to have a website that supports many languages, is free to use, and whose quality of translation will improve over time. What do you think? It is not so much a question of implementation, free or not, website or program, but of algorithms. Before you run to implement, you need to research the subject. Many researchers are already researching it, and as you can see, the results are far from perfect, since this is a problem of natural languages. Orna. -- Orna Agmon http://ladypine.org/ http://haifux.org/~ladypine/ ICQ: 348759096
Re: [off topic] A new project - automatic translation
On Sun, Nov 13, 2005, Uri Even-Chen wrote about [off topic] A new project - automatic translation: I am thinking about starting a new open source project related to automatic translation. The idea is to create a website and/or software that will automatically translate texts from one language to another. I want the website or software to be able to get feedback from users and learn from mistakes. The quality of translation should improve over time. I would like to get your feedback about this idea: - Do you think it's a good idea? The idea of machine translation is obviously a good one, and it would be even better if we had one that was free, both in price and in freedom to inspect and to improve the code. BUT, between saying that it's a good idea, and actually being able to implement it, there's a VERY VERY LONG road. - Do you think it can be implemented? - Do you have any suggestions on how to implement it? Machine translation, or at least good machine translation, has long been an open question in AI research. Better and better algorithms are appearing, and your first course of action should probably be to read up on known approaches (in linguistic journals, books, university courses, etc.). What will very likely NOT WORK is any naive approach, including the one which you seem to imply above (some sort of simplistic machine learning approach) - people already tried these simplistic approaches, long ago, and they just didn't work. In addition to algorithms, you'll also need linguistic data. There has been plenty of research on trying to teach a computer a language without linguistic data, i.e., using only untagged texts. But I'm not aware of such research ever being fully successful. Arguably, even a baby doesn't get only untagged text when he learns a language, but rather also gets input on physical objects corresponding to words, gets corrected by his teacher, and so on.
So be prepared for a lot (and I mean A LOT) of work on preparing linguistic data: lexicons of words, dictionaries of word *meanings* (with links between different languages), various representations of grammar. You will probably also need collections of idioms, names, and various other ways of representing world knowledge, which is often needed for good translation. Alternative approaches use tagged texts instead of linguistic data, but these are also hard to come by (especially in Hebrew). All of this is very far from being easy, unfortunately. When we started Hspell (http://ivrix.org.il/projects/spell-checker/), we envisioned it as the first step toward more sophisticated linguistic applications, including machine translation. But it was only the first step, in a journey of many miles :( - How many people are necessary to implement such an idea? My estimate (based on nothing but pure guesswork) is that you can get something sort-of-working in 5 person-years. This is about 10 times more work than went into Hspell so far... But then again, I'm not a translation expert (or even a novice) and maybe I'm grossly underestimating the complexity involved. I also suggest you take a look at http://www.mila.cs.technion.ac.il/, which is the Knowledge Center for Processing Hebrew. This is a cooperation of people from academia who work in the field of Computational Linguistics in Hebrew, and they finally started to cooperate in building the basic building blocks that are necessary to advance Hebrew linguistic research. These building blocks will be released as free software, and they include (or will include) a morphological analyzer (similar in purpose to Hspell), word-sense disambiguators, tagged texts, grammar analyzers, and so on. I assume that they are also interested in advancing their toolset to the point that they will have translation tools, text understanding and generation tools, and so on. But they are also quite far from this goal.
- Are you interested in being involved? I am, but slowly slowly (in the same mode I've been working on Hspell so far) :) I want to have a website that supports many languages, is free to use, and whose quality of translation will improve over time. I am not sure I understand the "quality of translation will improve over time" thing. What makes you think that a translator, whether computerized or even human, can learn to improve his translation capabilities merely by translating more texts (without any feedback)? And even if there's feedback, how will it be used? Are you aware of any papers on machine translation that actually can learn from past experience? -- Nadav Har'El| Sunday, Nov 13 2005, 12 Heshvan 5766 [EMAIL PROTECTED] |- Phone +972-523-790466, ICQ 13349191 |I started out with nothing... I still http://nadav.harel.org.il |have most of it.
Re: [off topic] A new project - automatic translation
Uri, This is a ripe research area, as Orna has pointed out; I might add that it has been around for almost 30 years with no significant breakthroughs. Having just finished the first phase of translating our main project's Web site into French, German and Italian, I can vouch that there is an enormous amount of quality translation resources available online - go to http://www.proz.com/, which has leveled the playing field of pricing and translation service providers. In other words, IMHO, you don't have a business proposition. Danny Orna Agmon wrote: On Sun, 13 Nov 2005, Uri Even-Chen wrote: Date: Sun, 13 Nov 2005 15:13:38 +0200 From: Uri Even-Chen [EMAIL PROTECTED] To: linux-il linux-il@linux.org.il Subject: [off topic] A new project - automatic translation Hi people, I am thinking about starting a new open source project related to automatic translation. The idea is to create a website and/or software that will automatically translate texts from one language to another. I want the website or software to be able to get feedback from users and learn from mistakes. The quality of translation should improve over time. I would like to get your feedback about this idea: - Do you think it's a good idea? - Do you think it can be implemented? - Do you have any suggestions on how to implement it? - How many people are necessary to implement such an idea? - Are you interested in being involved? Personally I think there is a need for such a website or software. There are existing websites and programs, but they have three main disadvantages: 1. Some of them are not free. 2. Most of them support only a few languages. 3. I think all of them should improve the quality of translation. I want to have a website that supports many languages, is free to use, and whose quality of translation will improve over time. What do you think? It is not so much a question of implementation, free or not, website or program, but of algorithms.
Before you run to implement, you need to research the subject. Many researchers are already researching it, and as you can see, the results are far from perfect, since this is a problem of natural languages. Orna. -- Orna Agmon http://ladypine.org/ http://haifux.org/~ladypine/ ICQ: 348759096 -- Danny Lieberman Visit us at http://www.software.co.il Office + 972 8 970-1485 Cell + 972 54 447-1114
Re: [off topic] A new project - automatic translation
I'm replying to a few people together. Orna Agmon wrote: It is not so much a question of implementation, free or not, website or program, but of algorithms. Before you run to implement, you need to research the subject. Many researchers are already researching it, and as you can see, the results are far from perfect, since this is a problem of natural languages. Yes, I'm aware of it. Nadav Har'El wrote: The idea of machine translation is obviously a good one, and it would be even better if we had one that was free, both in price and in freedom to inspect and to improve the code. BUT, between saying that it's a good idea, and actually being able to implement it, there's a VERY VERY LONG road. I agree. Machine translation, or at least a good one, has long been an open question in AI research. Better and better algorithms are appearing, and your first course of action should probably be to read up on known approaches (in linguistic journals, books, university courses, etc.). What will very likely NOT WORK is any naive approach, including the one which you seem to imply above (some sort of simplistic machine learning approach) - people already tried these simplistic approaches, long ago, and they just didn't work. I want to have both an algorithm and a database of languages (words, phrases, etc.) that will improve over time. That is, start with a simple algorithm, and feed data into it. The data will be source texts and their translations, in any language. When there is enough data for a given pair of languages, the software will be able to try to translate. People will correct and improve the translations and feed them back into the system. This will improve the quality of translation for the given languages. In addition, the algorithms will be improved over time. All feedback and improvements will be done by volunteers, in the spirit of Wikipedia and similar projects. Using the website will be free of charge for all users.
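Uri's idea of feeding aligned texts into the system per language pair, and only attempting translation once there is "enough" data, could be sketched as a small store of sentence pairs keyed by translation direction. This is a minimal sketch under stated assumptions: the class `ParallelCorpus`, its methods, and the `min_pairs` threshold (set to 3 here) are all hypothetical, since the message gives no number and names no existing system.

```python
# Hypothetical threshold: the message only says "enough data", no number.
MIN_PAIRS = 3


class ParallelCorpus:
    """Aligned (source, translation) pairs, stored per directed language pair."""

    def __init__(self, min_pairs=MIN_PAIRS):
        self.min_pairs = min_pairs
        self._pairs = {}  # (src_lang, tgt_lang) -> list of (source, translation)

    def add(self, src_lang, tgt_lang, source, translation):
        """Feed one aligned pair into the store for this direction."""
        self._pairs.setdefault((src_lang, tgt_lang), []).append((source, translation))

    def count(self, src_lang, tgt_lang):
        return len(self._pairs.get((src_lang, tgt_lang), []))

    def ready(self, src_lang, tgt_lang):
        """True once this direction has enough pairs to attempt translation."""
        return self.count(src_lang, tgt_lang) >= self.min_pairs


corpus = ParallelCorpus()
corpus.add("en", "fr", "Thank you", "Merci")
corpus.add("en", "fr", "Good night", "Bonne nuit")
print(corpus.ready("en", "fr"))  # False: only 2 pairs so far
corpus.add("en", "fr", "My friend", "Mon ami")
print(corpus.ready("en", "fr"))  # True
print(corpus.ready("fr", "en"))  # False: each direction is counted separately
```

Language codes are plain strings and nothing in the store is language-specific, in line with the goal that the first version work for any pair of languages. Counting each direction separately is one possible design choice; a real system might reuse English-to-French pairs for French-to-English as well.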
When we started Hspell (http://ivrix.org.il/projects/spell-checker/), we envisioned it as the first step toward more sophisticated linguistic applications, including machine translation. But it was only the first step, in a journey of many miles :( I want to consider using existing databases that are free to use, such as your Hspell project or Wikipedia, to feed initial data into the system. The main goal is to have (for each pair of languages) a list of translations of words, phrases and maybe even sentences. Then, the algorithm will just do search and replace: every word, phrase or sentence will be replaced with its equivalent in the target language. I think it's quite a simple algorithm to start with, and it will be improved in the future. (Even Linux was not written in one day!) My estimate (based on nothing but pure guesswork) is that you can get something sort-of-working in 5 person-years. This is about 10 times more work than went into Hspell so far... But then again, I'm not a translation expert (or even a novice) and maybe I'm grossly underestimating the complexity involved. I hope that writing the first version (alpha) will require less than one person-year, not including feeding the data into the system. I also suggest you take a look at http://www.mila.cs.technion.ac.il/, which is the Knowledge Center for Processing Hebrew. This is a cooperation of people from academia who work in the field of computational linguistics in Hebrew, and they have finally started to cooperate in building the basic building blocks that are necessary to advance Hebrew linguistic research. These building blocks will be released as free software, and they include (or will include) a morphological analyzer (similar in purpose to Hspell), word-sense disambiguators, tagged texts, grammar analyzers, and so on.
I assume that they are interested as well in advancing their toolset to the point where they will have translation tools, text understanding and generation tools, and so on. But they are also quite far from this goal. I want the first algorithm (alpha) to be language-independent: it should work for any pair of languages. Of course I want to support Hebrew, but many other languages too. I am not sure I understand the "quality of translation will improve over time" thing. What makes you think that a translator, whether computerized or even human, can learn to improve his translation capabilities merely by translating more texts (without any feedback)? And even if there's feedback, how will it be used? Are you aware of any papers on machine translation that actually can learn from past experience? In order for the quality of translation to improve over time, there is a need for feedback. People (who understand both languages) should correct translations and feed them back into the system. The algorithm should remember the feedback and update the database. The next time the same sentence (or phrase, or word) is translated by the system, the corrected
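Uri's plan, a per-language-pair table of words, phrases and sentences used for search and replace, with volunteer corrections written back into that table, could look roughly like the following minimal Python sketch for a single language pair. `PhraseTranslator`, its methods and the toy English-to-French entries are hypothetical illustrations, not part of Hspell, MILA or any other existing system. The sketch assumes greedy longest-match lookup (try the longest known phrase first) and copies unknown words through unchanged.

```python
class PhraseTranslator:
    """Hypothetical search-and-replace translator for ONE language pair.

    The phrase table maps a tuple of lower-cased source words to a target
    string. Volunteer corrections are written back into the same table, so
    the next lookup of that word, phrase or sentence returns the fix.
    """

    def __init__(self, phrase_table):
        self.table = dict(phrase_table)
        # Longest known phrase, in words; longer spans can never match.
        self.max_len = max((len(k) for k in self.table), default=1)

    @staticmethod
    def _key(text):
        return tuple(text.lower().split())

    def correct(self, source, corrected):
        """Store a volunteer's correction; it replaces any earlier entry."""
        key = self._key(source)
        self.table[key] = corrected
        self.max_len = max(self.max_len, len(key))

    def translate(self, text):
        words = text.split()
        out, i = [], 0
        while i < len(words):
            # Greedy: try the longest phrase starting at word i first.
            for n in range(min(self.max_len, len(words) - i), 0, -1):
                key = tuple(w.lower() for w in words[i:i + n])
                if key in self.table:
                    out.append(self.table[key])
                    i += n
                    break
            else:
                # No entry even for the single word: copy it through unchanged.
                out.append(words[i])
                i += 1
        return " ".join(out)


# Toy English-to-French table, for illustration only.
t = PhraseTranslator({
    ("thank",): "remercier",
    ("you",): "vous",
    ("my",): "mon",
    ("friend",): "ami",
})
print(t.translate("Thank you my friend"))  # remercier vous mon ami
t.correct("thank you", "merci")           # a volunteer fixes the phrase
print(t.translate("Thank you my friend"))  # merci mon ami
```

The first output is the literal word-for-word result a volunteer would fix; after the correction, the longer phrase "thank you" wins over its individual words. The lookup has no notion of context, though: one entry per phrase means a word like "flies" gets the same translation whether it is a verb or a noun, which is the kind of limitation of naive approaches that Orna and Nadav point to.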
Re: [off topic] A new project - automatic translation
I'll start by stating that all of my info comes from a close friend, who is both a linguist and the owner of a small startup that does NLP (Natural Language Processing). I cannot vouch for the truth of all of the opinions I am passing on here, but I can testify that I have seen his technology, and it's fairly awesome. Danny Lieberman wrote: Uri, This is a ripe research area, as Orna has pointed out; I might add that it has been around for almost 30 years with no significant breakthroughs. According to this friend of mine, it is more precise to say that there was much research on the matter some 30 years ago. He says that most of the research done was in a field of linguistics called Generative Linguistics, led by one Noam Chomsky. According to another friend of mine, Chomsky, while revered by generative linguists around the world, is renowned for having his theories about NLP knocked down, one after the other, to which he simply goes out and makes up a new theory. I know about four people who deal with linguistics (two of them actual linguists, one of them subscribed to this list). None of them has any respect whatsoever for Generative Linguistics. From the little I know about what Generative Linguistics is about, neither do I. The short of it is that the reason 30 years of research has not produced any good results in machine translation is that they are using the wrong tool for the job. My first friend claims that his engine achieves the same level of accuracy as professional engines do today WITHOUT BEING ADAPTED TO THE CORPUS USED. He even claims he has no particular problems with the standard gotchas, such as the infamous "Time flies like an arrow, fruit flies like a banana." That's the end of my rant. If any list-reading, differential-SCSI-cable-yielding Structural Linguist wishes to add his professional view of things, he is welcome. Shachar -- Shachar Shemesh Lingnu Open Source Consulting Ltd. Have you backed up today's work?
http://www.lingnu.com/backup.html
Re: [off topic] A new project - automatic translation
I'm sure this problem will eventually be cracked. However, experience and the laws of physics teach us that problems and physical objects roll downhill to lower energy levels, not uphill to higher ones. I challenge any NLP system to understand that in German you write "Home" for home page and not "Startseite". As long as you can find content-expert professional human translators at 5 cents/word, you won't have a business proposition, because a free community effort will depend precisely upon the people who make a living from translation. OTOH, if a technology solution costs more than that, it won't be a viable alternative either. BTW, I hear that organizations like the NSA use NLP to listen in on conversations and translate, but they also have a small army of linguists to make sense of it. That's the end of my 5c :-) -- Danny Lieberman Visit us at http://www.software.co.il Office + 972 8 970-1485 Cell + 972 54 447-1114