[OFF TOPIC] Instead of AdSense (was: Re: [off topic] A new project - automatic translation)

2005-11-22 Thread Omer Zak
[This is a question, which really belongs more to the Hackers-IL mailing
list than to Linux-IL, but the thread started here so I am continuing it in
the same mailing list.  Apologies to those who are not interested in the
subject.
I am crossposting to both mailing lists, and suggest that those who are
interested subscribe to Hackers-IL and follow it up there.]

The question is how to make a translation Web site pay for itself.

In principle, a Web 2.0 level Web site needs two flows of resources:
1. Information - from visitors, who contribute content.  Like those who
edit articles in Wikipedia.
2. Cash - to pay for hosting the Web site, for a Webmaster to supervise,
manage and improve it, for a sysadmin to grow the hosting infrastructure
as needed, for motivating the innovator who had the original idea.

Usually, Web sites can utilize Google AdSense to generate some cash
flow.  Visitors browse the Web site, and as they look for additional
resources, they click through relevant ads, which they happen to see.
The visitors are in information search mode, and are receptive to
looking for additional information in targeted ads.

Therefore, AdSense is good for both them and the Web site owner.

However, when one wants to do a translation, one wants to
concentrate on the task at hand.  One does not want to be distracted by
links (unless they lead to thesaurus-like information) or ads.

Therefore, AdSense would be neither helpful nor effective in a
translation Web site.

Does anyone have a bright idea how to generate revenue from a
translation Web site, in lieu of targeted ads?
Of course, this should not be based upon subscription fees, exacting
micropayments from visitors, relying upon donations from grateful
companies and individuals, or other forms of coercion.
 --- Omer
[Thanks to Shlomi Fish for mentioning the Web 2.0 article in
http://www.paulgraham.com/web20.html, which inspired me to ask this
question.]
-- 
Sent from a PC running a top secret test version of Windows 97.
My own blog is at http://www.livejournal.com/users/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS:  at http://www.zak.co.il/spamwarning.html


=
To unsubscribe, send mail to [EMAIL PROTECTED] with
the word unsubscribe in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]



Re: [OFF TOPIC] Instead of AdSense (was: Re: [off topic] A new project - automatic translation)

2005-11-22 Thread Geoffrey S. Mendelson
On Tue, Nov 22, 2005 at 07:52:16PM +0200, Omer Zak wrote:

 Therefore, AdSense is good for both them and the Web site owner.

You should note that not only will Amateur radio operators aka Hams
avoid your site, they will ask their family and friends to do so.

Geoff.
-- 
Geoffrey S. Mendelson, Jerusalem, Israel [EMAIL PROTECTED]  N3OWJ/4X1GM
IL Voice: (07)-7424-1667  IL Fax: 972-2-648-1443 U.S. Voice: 1-215-821-1838 
You should have boycotted Google while you could, now Google supported
BPL is in action. Time is running out on worldwide radio communication.




Re: [off topic] A new project - automatic translation

2005-11-17 Thread Uri Even-Chen

Thanks!

Uri.


Ehud Karni wrote:

On Wed, 16 Nov 2005 17:58:16 +0200, Uri Even-Chen wrote:


Very interesting.  I haven't read Ray Kurzweil's book, nor heard his
name until a few days ago, when I saw his name in one of the Wikipedia
articles you sent me (about translation).  I also read about him and
it's very impressive.  I want to write him and ask his opinion about my
idea.  Do you happen to know his E-mail address?



One of his email addresses is [EMAIL PROTECTED] (if you'll pass the junk
filters). You can also try to reach him at http://www.kurzweilai.net/ .

Ehud.




Ehud Karni wrote:

Uri,

You might find The Rosetta Project http://www.rosettaproject.org/live
interesting.

Ehud.








Re: [off topic] A new project - automatic translation

2005-11-16 Thread Uri Even-Chen

Sorry for not replying yesterday.  I was so busy.  Anyway, here's my reply.

Ely Levy wrote:

Yea, join us;) or work together with the Mila team (which I think is also trying
to get to the same goal).
Btw I was thinking about automatically translating software with the right
glossary. It seems that is a much easier task. Especially with the right po
comments. If anyone wishes to join and help;)

Ely


Please give me more information on what you do and how I can contribute.
I remind you what I previously wrote:

Uri Even-Chen wrote:

Are the Hspell and WordNet databases available to use?  Can you send me
links?  I'll be glad to contribute if I can, but my focus is not on any
specific language (Hebrew) but on translating in general.  My idea is to
create something like Wikipedia (but for translations), and yes, I
understand how huge the task is.


Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il






Re: [off topic] A new project - automatic translation

2005-11-16 Thread Uri Even-Chen

Nadav Har'El wrote:

This is getting wildly off-topic, but...

Ray Kurzweil, in his book The Age of Spiritual Machines: When Computers
Exceed Human Intelligence, makes the following observation about human
thought, and how computers can imitate it. If you'll allow me to put what
I remember from his writing into my own words:

He believes that human thought has two modes: computation and pattern
recognition. Examples of the former include an arithmetic computation, or
thinking about several options one after another, and an example of the
latter is face recognition. He believes that chess is an example where
both modes are used: a chess expert, like a chess novice, goes in his head
through many of the possible moves and his opponent's possible reactions (this
is the computational mode), but unlike a novice, he also does some pattern
recognition on each of the resulting boards, and instantly (without
sequential computation) recognizes situations which are good, or bad, for
him. This final recognition is the part of their thought-process that
chess-players can't really explain, and is often called intuition.

Kurzweil argues that a chess-playing program could act similarly - walk the
(fantastically huge) tree of possible moves and counter-moves, sequentially,
and at every junction apply a neural network that recognizes good boards,
and prune the tree at that junction if the neural network decides that
this move is not worth it.

This technique, of walking huge trees with a *heuristic function*, is well
known in AI (look up A star, Minimax, etc.) and is not Kurzweil's
invention. But his interesting insight is that Neural Networks are useful
but SHOULD NOT (not only CAN NOT) be used directly to solve every problem,
but rather should be combined with other computational techniques.
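
A minimal sketch of the tree walk described above, assuming a toy game in
place of chess and a hand-written scoring rule in place of a trained neural
network. The names negamax, nim_moves and nim_evaluate are hypothetical and
do not come from Kurzweil's book or from any library. The search walks the
move tree sequentially (the "computational" mode) and, at every junction,
asks the evaluator which replies are worth exploring (the "recognition"
mode), dropping the rest.

```python
from typing import Callable, List, Optional


def negamax(state: int,
            depth: int,
            moves: Callable[[int], List[int]],
            evaluate: Callable[[int], float],
            keep: Optional[int] = None) -> float:
    """Score `state` from the point of view of the player about to move.

    `evaluate` is the stand-in for the pattern recognizer: it scores a
    position for the player to move there. If `keep` is given, only the
    `keep` most promising replies are explored at each junction.
    """
    children = moves(state)
    if depth == 0 or not children:
        return evaluate(state)
    if keep is not None:
        # A child position is scored for the opponent, so the lowest
        # scores are the best replies for us; prune everything else.
        children = sorted(children, key=evaluate)[:keep]
    return max(-negamax(c, depth - 1, moves, evaluate, keep)
               for c in children)


# Toy game (an assumption for illustration): players alternately take
# 1-3 stones from a pile; whoever takes the last stone wins.
def nim_moves(stones: int) -> List[int]:
    return [stones - k for k in (1, 2, 3) if stones - k >= 0]


def nim_evaluate(stones: int) -> float:
    # Hand-written rule standing in for a trained recognizer: piles that
    # are a multiple of 4 (including the empty pile) are bad for the
    # player to move.
    return -1.0 if stones % 4 == 0 else 1.0


if __name__ == "__main__":
    for pile in (4, 5, 7):
        print(pile, negamax(pile, depth=10, moves=nim_moves,
                            evaluate=nim_evaluate, keep=1))
```

With keep=1 the search follows a single line of play per junction, so the
quality of the result depends entirely on the evaluator; with keep=None it
degrades to a plain exhaustive walk.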

It is arguable that similarly, neural networks should not be used directly
to parse language. It is very possible that language understanding and
generation has both a computational, or sequential, aspect (reading the
words one by one, following some sort of state machine in your head), and
a pattern recognition aspect.


Very interesting.  I haven't read Ray Kurzweil's book, nor heard his
name until a few days ago, when I saw his name in one of the Wikipedia
articles you sent me (about translation).  I also read about him and
it's very impressive.  I want to write him and ask his opinion about my
idea.  Do you happen to know his E-mail address?


If you're interested in what I did with neural networks - I used them to
compose music.  If you want to see more details, look at Speedy
Composer: http://www.speedy.co.il/composer/



The music is very nice :)


Thanks!  Unfortunately I didn't have too much time to invest in this
project, but I'm sure that a better quality of music can be reached if
more time is invested (by the right people, of course).  I already had
ideas how to reach better quality, but I never had time to implement them.

Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il






Re: [off topic] A new project - automatic translation

2005-11-16 Thread Uri Even-Chen

Shachar Shemesh wrote:

Not trying to discourage you, but it seems that this approach, if it's
going to work at all, will likely only start working when it has a HUGE
database of phrases. The words approach seems fairly hopeless to me.


I'm aware of it.  That's why I want many people to contribute to the
database.

Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il






Re: [off topic] A new project - automatic translation

2005-11-16 Thread Uri Even-Chen

Nadav Har'El wrote:

Are the Hspell and WordNet databases available to use?  Can you send me
links?  I'll be glad to contribute if I can, but my focus is not on any



Of course, both are released with free software licenses:

http://www.ivrix.org.il/projects/spell-checker/

http://cl.haifa.ac.il/projects/mwn/


Thanks!


http://www.wiktionary.org/ is like Wikipedia (but for translations),
but for individual words. Like people noted here time and again, this is
only a small step in the translation direction. Arguably, it is even a step
in the wrong direction (with the wordnet effort being more in the right
direction, because it translates individual word senses, rather than words).


I'm aware of it and I agree it's not in the right direction.  I need to
establish a database of words and phrases and their translation in
various languages.  This database should be good enough to do search
and replace for a text in one language into another language, so that
people who know only the target language will be able to understand most
of what the original text is about (it doesn't have to be perfect).
Then it will be improved over time (with feedback).
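
A minimal sketch of the "search and replace" idea in the paragraph above,
assuming the database is a plain dictionary that maps source-language words
and phrases to target-language text. The function name translate, the
max_phrase_len parameter and the tiny English-to-French table are all
hypothetical, made up for illustration. At each position the longest known
phrase wins, and unknown words are passed through unchanged so a reader can
still follow most of the text.

```python
from typing import Dict


def translate(text: str,
              phrase_table: Dict[str, str],
              max_phrase_len: int = 3) -> str:
    """Greedy longest-match replacement of words and phrases."""
    words = text.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_phrase_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in phrase_table:
                out.append(phrase_table[phrase])
                i += n
                break
        else:
            # Nothing in the database matched: keep the original word.
            out.append(words[i])
            i += 1
    return " ".join(out)


# Toy database (an assumption for illustration only).
TABLE = {
    "good morning": "bonjour",
    "thank you": "merci",
    "good": "bon",
    "my": "mon",
    "friend": "ami",
}

if __name__ == "__main__":
    print(translate("Good morning my friend", TABLE))
    print(translate("Thank you my good friend", TABLE))
    print(translate("Hello friend", TABLE))
```

The sketch ignores word order, inflection and word senses, which is exactly
the weakness other posters in this thread point out; contributed phrases
only help by making more of the longer matches succeed.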


If there is any database I can use, which contains texts (or words,
phrases) and their translations, please let me know.  I would like to
start with something.



Again, Wiktionary has this for words.
If you'll read some machine-translation literature, you'll find references
to a bunch of corpora which contain quality texts translated into several
languages, with an explicit correspondence between the sentences in each
language. For example, there is a corpus of EU laws translated into several
EU languages. And I believe there's also a UN corpus. Sorry, I don't have
any links.

And, for an interesting corpus, why not try... the bible? It has been
translated into countless languages, and a strict correspondence between
the verses has been observed. Of course, some of the translations feature
some archaic language :-)


No thanks.  I think the bible is completely irrelevant.  It's archaic
language, religious words etc.  And I don't want to mess with
God/Moses/Jesus/The Pope or whoever thinks he has copyrights on the
bible...  I would rather use modern language instead.

Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il







Re: [off topic] A new project - automatic translation

2005-11-16 Thread Uri Even-Chen

Shachar Shemesh wrote:

In other words, this is a case where a human programmer has to analyze
the problem, decide on a solution path, decide where, if at all, to
apply neural networks, and then program the whole thing.


Compare it to cars or airplanes.  They can't work on their own - they
need people to build them, and then drive them.  But when using them you
can go much faster than you can go with your own feet.

It's the same with computers.  Theoretically, everything you can do with
a computer you can also do without it, but it will take you much more
time.  In many cases, even though programming a task is difficult and
takes much time, you can still achieve more by programming something and
let the computer do it than by doing it on your own.  I don't think
neural networks are an exception - they are just a programming tool.
But in some cases you can achieve more with neural networks than what
you achieve without them.

I'll give you a simple example: There are people who can compose good
music, but I'm not one of them.  By programming a computer, I was able
to let the computer compose melodies which are better than what I could
compose without the computer.  It doesn't mean the computer is smarter
than me.  It just means I could teach him to do something I can't do on
my own.

Nadav Har'El wrote:

There's no argument that programming such a thing will take effort. Artificial
intelligence doesn't mean some sort of "hey look, it's magic, I'll get a
working program without doing any effort!". Rather, the idea is that the
programmer, while being an expert programmer, does not have to be an expert
chess player (to use this example), and the program can learn how to play
chess by watching the games of grandmasters. There's a division of labor,
if you will, by the programmer who can program, and the teachers who can
play chess extremely well but couldn't program if their life depended on it.
To return to the translation issue, the idea that Uri raised was that he
wanted to write a translation program, and perhaps spend a good deal of effort
doing so, but since he doesn't really know how to translate French to Swedish
(for example), he himself cannot teach the program to do that, and he hopes
that the program could pick up that skill from experts of these languages.

By the way, chess is probably not a very good example for this division of
labor (programmer vs. teacher), because with the strength of modern computers,
even the most naive, brute-force, tree-walking algorithms with the most
simplistic heuristic functions, can actually play great chess. A programmer
is enough, and you don't even need an expert chess teacher. These sorts of
simplistic algorithms make my Palm Pilot beat me at chess every time, and
a stronger computer beat even the best chess player in the world.

Now you're probably saying: 'well, chess doesn't actually require intelligence
to play, and these programs should not be called artificial intelligence'.
Kurzweil also points out this interesting phenomenon, of the drifting
definition of artificial intelligence. He claims that by definition, a
computer will never be called intelligent, because whenever we learn how
to do something with a computer, we'll suddenly say that this task does not
require intelligence. He gives as examples OCR and speech recognition, tasks
once thought to be too intelligent for a computer to undertake, but now
that computers do them casually we call these tasks un-intelligent, and
move our intelligence bar a little higher.


I agree.  People tend to think that computers are machines and
therefore are not intelligent.  Personally I think it's not true.  I
think computers are capable of doing intelligent things, and people are
also capable of doing (very) unintelligent things (and vice versa).  So
it's not a yes/no question whether a person or computer is intelligent.
What matters is the action itself.  And I think many intelligent
actions we do can be done, and will be done in the future, by computers.
Like playing chess, playing music, composing music, translating,
understanding speech, speaking etc.  In the future we may even be able
to hire computers for many jobs we do now.  For example - a secretary.
Or maybe even a software programmer!  And maybe also a prime minister -
I think ANY computer can be smarter than what we have now! :-)

Regarding artificial intelligence, you might be interested to read (at
least the last few paragraphs) of my summary of the Speedy Composer
project (from 6 years ago).  It's available in Hebrew and English (I
translated it manually):
http://www.speedy.co.il/composer/summary.php
http://music.speedy.co.il/speedy_composer.php

By the way, the issue is not only intelligence but also feelings: I
believe a computer is even capable of having feelings.  Or at least, act
as if he has feelings, which is the same.  When computers will have
emotions, I think it will be a real breakthrough in artificial intelligence.

It reminds me some 

Re: [off topic] A new project - automatic translation

2005-11-16 Thread Uri Even-Chen

Shachar Shemesh wrote:

Nothing is. I'll settle for ideas regarding what things they are useful
for. Please restrict your answer to those in which success ratio can be
accurately measured.



Sorry, no time at the moment. Just answer the above question, i.e. - can
you give a (relatively) objective standard by which the quality of the music
your neural network produced can be measured? Please don't understand
this question as a taunt. Music is a highly subjective thing, and there
is nothing wrong with a program that can produce good music, regardless
of what definition of good you may wish to use. With translations,
however, quality measurement is, by far, less subjective. Any program
that consistently produces a translation of a general text that 20% of
the target native speaking population will call a good translation
will get my appreciation. I'll even not include literary text, as those
tend to be harder to translate.


My experience is that you can use neural networks to compose music.  It
doesn't mean you can't compose music without them - it's just a tool.
Like there is a piano, a guitar, many instruments - each one of them can
be used to play music.  The same is with neural networks.

Regarding other uses of neural networks - I'm not an expert, but I know
they have been used for pattern recognition and all sort of things in
physics and other areas.  What is special with neural networks is their
ability to generalize.  You teach them something, then they learn and
generalize on data which was never given to them.

Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il






Re: [off topic] A new project - automatic translation

2005-11-16 Thread Uri Even-Chen

Barry.R wrote:
You may be interested in an article which appeared in the Proceedings
of the National Academy of Sciences (America), August 8th issue,
entitled Unsupervised Learning of Natural Languages, by four Israelis:
Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman.
They maintain that they have developed an unsupervised algorithm that
discovers hierarchical structures in sequences of data. This algorithm has
been tested on several thousand sentences in languages it was originally
unfamiliar with and was able to decode the grammatical structure of these
languages and produce acceptable sentences.


Barry.


Thanks.  I found it, printed it and will read it.  Thanks for your advice.

Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il






Re: [off topic] A new project - automatic translation

2005-11-16 Thread Ehud Karni
On Wed, 16 Nov 2005 17:58:16 +0200, Uri Even-Chen wrote:

 Very interesting.  I haven't read Ray Kurzweil's book, nor heard his
 name until a few days ago, when I saw his name in one of the Wikipedia
 articles you sent me (about translation).  I also read about him and
 it's very impressive.  I want to write him and ask his opinion about my
 idea.  Do you happen to know his E-mail address?

One of his email addresses is [EMAIL PROTECTED] (if you'll pass the junk
filters). You can also try to reach him at http://www.kurzweilai.net/ .

Ehud.


--
 Ehud Karni   Tel: +972-3-7966-561  /\
 Mivtach - Simon  Fax: +972-3-7966-667  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D http://www.keyserver.net/Better Safe Than Sorry




Re: [off topic] A new project - automatic translation

2005-11-16 Thread Ehud Karni
Uri,

You might find The Rosetta Project http://www.rosettaproject.org/live
interesting.

Ehud.


--
 Ehud Karni   Tel: +972-3-7966-561  /\
 Mivtach - Simon  Fax: +972-3-7966-667  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D http://www.keyserver.net/Better Safe Than Sorry




Re: [off topic] A new project - automatic translation

2005-11-15 Thread Shachar Shemesh
Nadav Har'El wrote:

On Mon, Nov 14, 2005, Uri Even-Chen wrote about Re: [off topic] A new project 
- automatic translation:
  

This is getting wildly off-topic, but...
  

but interesting.

but unlike a novice, he also does some pattern
recognition on each of the resulting boards, and instantly (without
sequential computation) recognizes situations which are good, or bad, for
him. This final recognition is the part of their thought-process that
chess-players can't really explain, and is often called intuition.

Kurzweil argues that a chess-playing program could act similarly - walk the
(fantastically huge) tree of possible moves and counter-moves, sequentially,
and at every junction apply a neural network that recognizes good boards,
and prune the tree at that junction if the neural network decides that
this move is not worth it.
  

In other words, this is a case where a human programmer has to analyze
the problem, decide on a solution path, decide where, if at all, to
apply neural networks, and then program the whole thing.

I agree that this may produce a well playing chess program. I refuse to
call it artificial intelligence. I have nothing against neural
networks as a programming tool (aside from the fact that they are, in my
humble opinion, too complex for casual use).  I just don't like the
thought that says just throw a neural network at it and everything will
be ok given enough training data and time.

  Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html





Re: [off topic] A new project - automatic translation

2005-11-15 Thread Nadav Har'El
On Tue, Nov 15, 2005, Shachar Shemesh wrote about Re: [off topic] A new 
project - automatic translation:
 This is getting wildly off-topic, but...
 but interesting.
 
..
 In other words, this is a case where a human programmer has to analyze
 the problem, decide on a solution path, decide where, if at all, to
 apply neural networks, and then program the whole thing.
..

There's no argument that programming such a thing will take effort. Artificial
intelligence doesn't mean some sort of "hey look, it's magic, I'll get a
working program without doing any effort!". Rather, the idea is that the
programmer, while being an expert programmer, does not have to be an expert
chess player (to use this example), and the program can learn how to play
chess by watching the games of grandmasters. There's a division of labor,
if you will, by the programmer who can program, and the teachers who can
play chess extremely well but couldn't program if their life depended on it.
To return to the translation issue, the idea that Uri raised was that he
wanted to write a translation program, and perhaps spend a good deal of effort
doing so, but since he doesn't really know how to translate French to Swedish
(for example), he himself cannot teach the program to do that, and he hopes
that the program could pick up that skill from experts of these languages.

By the way, chess is probably not a very good example for this division of
labor (programmer vs. teacher), because with the strength of modern computers,
even the most naive, brute-force, tree-walking algorithms with the most
simplistic heuristic functions, can actually play great chess. A programmer
is enough, and you don't even need an expert chess teacher. These sorts of
simplistic algorithms make my Palm Pilot beat me at chess every time, and
a stronger computer beat even the best chess player in the world.

Now you're probably saying: 'well, chess doesn't actually require intelligence
to play, and these programs should not be called artificial intelligence'.
Kurzweil also points out this interesting phenomenon, of the drifting
definition of artificial intelligence. He claims that by definition, a
computer will never be called intelligent, because whenever we learn how
to do something with a computer, we'll suddenly say that this task does not
require intelligence. He gives as examples OCR and speech recognition, tasks
once thought to be too intelligent for a computer to undertake, but now
that computers do them casually we call these tasks un-intelligent, and
move our intelligence bar a little higher.

-- 
Nadav Har'El|Tuesday, Nov 15 2005, 13 Heshvan 5766
[EMAIL PROTECTED] |-
Phone +972-523-790466, ICQ 13349191 |[I'm] so full of action, my name should
http://nadav.harel.org.il   |be a verb -- Big Daddy Kane (Raw, 1987)




Re: [off topic] A new project - automatic translation

2005-11-14 Thread Muli Ben-Yehuda
On Sun, Nov 13, 2005 at 09:42:48PM +0200, Danny Lieberman wrote:

 As long as 
 you can find content-expert professional human translators at 
 5cents/word, you won't have a business proposition, because  a free  
 community effort will depend precisely upon the people who make a living 
 from translation.

As long as you can find professional programmers at $smallnum dollars,
you won't have a business proposition, because a free community effort
will depend precisely upon the people who make a living from
programming.

Need I say more?

Cheers,
Muli
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/





Re: [off topic] A new project - automatic translation

2005-11-14 Thread Shachar Shemesh
Uri Even-Chen wrote:

 Hi Shachar,

 Shachar Shemesh wrote:

 I know about four people who deal with linguistics (two of them actual
 linguists, one of them subscribed to this list).


 It would be nice if any of your friends would be interested in giving me
 their advice.  I want to write something that actually works, but not
 spend too much time writing it.  I'm counting on volunteers who will
 contribute to this project.

Like I said, one of said acquaintances is a subscriber to this list. He
will speak up if he so chooses.

Bear in mind that any translation engine contains a rather huge list of
words and various attributes about each one of them. The whole appeal of
Generative linguistics based engines is that you can significantly
reduce the number of attributes you store per-word. Of course, this is
also PRECISELY the reason they give out such appalling results.

Also bear in mind that creating this list of words is a task given to
trained linguists to do - i.e. - it's totally manual, demands highly
skilled workers (and not of the sort of skill computer people usually
possess), and is quite time consuming. You MAY find such lists available
somewhere for use in engines based on the usual technology, but I
wouldn't count on them enabling you to achieve anything better than what
the current engines already know how to.

 The short of it is that the reason that 30 years of research has not
 produced any good results with machine translations is that they are
 using the wrong tool for the job.

 I agree.  I want to try a new approach which was never tried before (as
 far as I know).

Care to explain what it is? Don't forget that if translating (or any
other type of NLP) consisted merely of getting a list of words and their
meaning, the problem would have been long ago solved.

I have heard of some cases where interesting results were achieved
using the neural network approach - teach the engine by letting it look
at translations done by others. Having studied neural networks a little, I
can say the whole art is in choosing the correct learning network. A friend
of mine once referred to the entire field as "Artificial Stupidity". You set
up a computer program, you tell it, with precise details, what it needs to
do. You set it out to do it. It does it. You cry out in wonder "look, it
learns all by itself!"

 My first friend claims that his engine
 achieves the same level of accuracy as professional engines today
 WITHOUT BEING ADAPTED TO THE USED CORPUS. He even claims he has no
 particular problems with the standard gotchas, such as the infamous
 Time flies like an arrow, fruit flies like a banana.

 It would be interesting to see your friend's engine in action.  Do you
 have any link?

No. It's a private company looking for financing. I did talk to him
about releasing it open source should the other alternative be letting
the technology sink, but I'm actually hoping for his sake that that
doesn't happen (and things are looking fairly well, in that respect). In
the mean while, I'm not aware of any web-presence the company has at all.

 Best Regards,

 Uri Even-Chen
 Speedy Net
 Raanana, Israel.

 E-mail: [EMAIL PROTECTED]
 Phone: +972-9-7715013
 Website: www.uri.co.il


-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html





Re: [off topic] A new project - automatic translation

2005-11-14 Thread Uri Even-Chen

Hi Shachar,

Shachar Shemesh wrote:

I know about four people who deal with linguistics (two of them actual
linguists, one of them subscribed to this list).


It would be nice if any of your friends would be interested in giving me
their advice.  I want to write something that actually works, but not
spend too much time writing it.  I'm counting on volunteers who will
contribute to this project.


The short of it is that the reason that 30 years of research has not
produced any good results with machine translations is that they are
using the wrong tool for the job.


I agree.  I want to try a new approach which was never tried before (as
far as I know).


My first friend claims that his engine
achieves the same level of accuracy as professional engines today
WITHOUT BEING ADAPTED TO THE USED CORPUS. He even claims he has no
particular problems with the standard gotchas, such as the infamous
Time flies like an arrow, fruit flies like a banana.


It would be interesting to see your friend's engine in action.  Do you
have any link?

Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il






Re: [off topic] A new project - automatic translation

2005-11-14 Thread Ely Levy

 Uri,

 I am not sure that you grasp the enormity of the task at hand. However, I don't
 want to discourage you. Unlike others, I would appreciate a stupid translator
 that replaces a word by a word. It would be wonderful to have a free software
 that is as bad as Google or babelfish. I would LOVE to have a free tool to make
 fun of. Now we have nothing.

There are a few already, no?

 So go on and get to work. Note that if I were you, I'd set myself more
 realistic goals first. In a way, I am you: when we set out to write hspell we
 had dreams of a linguistic future. I hope that the huge list that we collected,
 of almost all modern Hebrew words, will be useful to you.

Even the hugest tasks can be started as small ones.
WordNet and the free dictionary could be a good start
(WordNet is more what he probably needs).
  I want to have both an algorithm and a database of languages (words,
  phrases etc) that will improve over time.  That is, start with a simple
  algorithm, and feed data into it.  The data will be sources and
  translations in any language.  When there is enough data for a given

 The database you describe here is not dissimilar to WordNet, and I am told that
 a few list members are trying to extend the Hebrew WordNet. Maybe you can join
 them.

Yeah, join us ;) or work together with the Mila team (which I think is also
trying to get to the same goal).
Btw, I was thinking about automatically translating software with the right
glossary. It seems that is a much easier task, especially with the right po
comments. If anyone wishes to join and help ;)

Ely

 --
 Dan Kenigsberg    http://www.cs.technion.ac.il/~danken    ICQ 162180901




Re: [off topic] A new project - automatic translation

2005-11-14 Thread Dan Kenigsberg
On Mon, Nov 14, 2005 at 01:46:03PM +0200, Offer Kaye wrote:
 On 11/14/05, Dan Kenigsberg wrote:
  that is as bad as Google or babelfish. I would LOVE to have a free tool to
  make fun of. Now we have nothing.
 
 Enter the Perl module Lingua::Translate -
 Go ahead, make fun of it :)

Since you are asking so nicely, I'll simply quote from its man page:

Locale::Translate translates text from one written
language to another.  Currently this is implemented by
contacting Babelfish (http://babelfish.altavista.com/), so
see there for the language pairs that are supported.
Babelfish uses SysTran (http://www.systran.org/) to perform

Meaning it is not interestingly free (or extremely interesting at all), and has
no Hebrew support (but has Arabic!).

-- 
Dan Kenigsberg    http://www.cs.technion.ac.il/~danken    ICQ 162180901




Re: [off topic] A new project - automatic translation

2005-11-14 Thread Uri Even-Chen

Shachar Shemesh wrote:

Care to explain what it is? Don't forget that if translating (or any
other type of NLP) consisted merely of getting a list of words and their
meaning, the problem would have been long ago solved.


I think I already explained it.  I'm copying what I wrote yesterday:

I want to consider using existing databases, which are free to use, such
as your Hspell project or Wikipedia - to feed initial data into the
system.  The main goal is to have (for each pair of languages) a list of
translations of words, phrases and maybe even sentences.  Then, the
algorithm will just do search and replace - for every word, phrase or
sentence it will replace it with its equivalent in the target languages.
I think it's quite a simple algorithm to start with.  And then it will
be improved in the future.  (Even Linux was not written in one day!).
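
A minimal sketch of the search-and-replace idea described above, written in
Python purely for illustration. The phrase table, the function name and the
greedy longest-match rule are all assumptions, not part of any existing
project; the toy English-to-French entries are made up for the example.

```python
def translate(text, phrase_table, max_phrase_len=3):
    """Replace words and phrases using greedy longest-match lookup."""
    words = text.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_phrase_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in phrase_table:
                out.append(phrase_table[phrase])
                i += n
                break
        else:
            # Unknown word: pass it through unchanged.
            out.append(words[i])
            i += 1
    return " ".join(out)


# Hypothetical toy table, for illustration only.
TOY_TABLE = {
    "good morning": "bonjour",
    "good": "bon",
    "the": "le",
    "cat": "chat",
}

print(translate("Good morning the cat", TOY_TABLE))  # bonjour le chat
```

Note what the sketch does not do: it ignores word order, agreement and
ambiguity, so it only gets as far as the phrase table happens to reach.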

Look at my E-mail from yesterday for more details.


I have heard of some cases where interesting results were achieved
using the neural network approach - teach the engine by letting it look
at translations done by others. Having learned some neural networks, the
whole art is in choosing the correct learning network. A friend of mine
once referred to the entire field as Artificial Stupidity. You set up
a computer program, you tell it, with precise details, what it needs to
do. You set it out to do it. It does it. You cry out in wonder "look, it
learns all by itself!"


I worked with artificial neural networks in the past and I think the
approach of Artificial Stupidity is wrong.  Neural networks do work in
some cases.  The idea (in a nutshell) is that they can generalize, and
you don't teach them how to generalize.  You just feed them with data,
train them and they generalize by themselves.

On the other hand, neural networks are not really intelligent.  You
can't teach them to play chess, for example.  They are not suitable for
everything.

If you're interested in what I did with neural networks - I used them to
compose music.  If you want to see more details, look at Speedy
Composer: http://www.speedy.co.il/composer/

Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il






Re: [off topic] A new project - automatic translation

2005-11-14 Thread Uri Even-Chen

Hi Dan,

Dan Kenigsberg wrote:

I am not sure that you grasp the enormity of the task at hand. However, I don't
want to discourage you. Unlike others, I would appreciate a stupid translator
that replaces a word by a word. It would be wonderful to have a free software
that is as bad as Google or babelfish. I would LOVE to have a free tool to make
fun of. Now we have nothing.

So go on and get to work. Note that if I were you, I'd set myself more
realistic goals first. In a way, I am you: when we set out to write hspell we
had dreams of a linguistic future. I hope that the huge list that we collected,
of almost all modern Hebrew words, will be useful to you.


Thanks for encouraging me (if you are not being cynical).


The database you describe here is not dissimilar to WordNet, and I am told that
a few list members are trying to extend the Hebrew WordNet. Maybe you can join
them.


Are the Hspell and WordNet databases available to use?  Can you send me
links?  I'll be glad to contribute if I can, but my focus is not on any
specific language (Hebrew) but on translating in general.  My idea is to
create something like Wikipedia (but for translations), and yes, I
understand how huge the task is.

If there is any database I can use, which contains texts (or words,
phrases) and their translations, please let me know.  I would like to
start with something.

Of course, it must be a database which is free to use without legal
problems.

Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il






Re: [off topic] A new project - automatic translation

2005-11-14 Thread Shachar Shemesh
Uri Even-Chen wrote:

 I want to consider using existing databases, which are free to use, such
 as your Hspell project or Wikipedia - to feed initial data into the
 system.  The main goal is to have (for each pair of languages) a list of
 translations of words, phrases and maybe even sentences.  Then, the
 algorithm will just do search and replace - for every word, phrase or
 sentence it will replace it with its equivalent in the target languages.
 I think it's quite a simple algorithm to start with.  And then it will
 be improved in the future.  (Even Linux was not written in one day!).

Not trying to discourage you, but it seems that this approach, if it's
going to work at all, will likely only start working when it has a HUGE
database of phrases. The words approach seems fairly hopeless to me.

 I worked with artificial neural networks in the past and I think the
 approach of Artificial Stupidity is wrong.  Neural networks do work in
 some cases.

Yes, if you pick the right neural function, and choose the correct
number of network levels, and how many neurons to put in each level.
This does strike me as almost the same thing as actually coding the
thing, except that I am not aware of any better way of deciding what
this right number is except trial and error. My (limited) experience
with neural networks is that even detecting over-learning is not easy.

   The idea (in a nutshell) is that they can generalize, and
 you don't teach them how to generalize.

Unless you call choosing the right number of neurons, the number of
layers in the network and the right neural function "teaching". If you
happen, like me, to think that there is no fundamental difference between
the art of getting the neural network parameters right and the art of
programming an algorithmic solution to the same problem (except that in the
former case, the programmer is rather powerless if a bug is found after the
product is released), then there is nothing special about neural networks'
ability to generalize.

 They are not suitable for everything.

Nothing is. I'll settle for ideas regarding what things they are useful
for. Please restrict your answer to those in which success ratio can be
accurately measured.

 If you're interested in what I did with neural networks - I used them to
 compose music.  If you want to see more details, look at Speedy
 Composer: http://www.speedy.co.il/composer/

Sorry, no time at the moment. Just answer the above question, i.e. - can
you give a (relatively) objective standard by which the quality of the music
your neural network produced can be measured? Please don't take
this question as a taunt. Music is a highly subjective thing, and there
is nothing wrong with a program that can produce good music, regardless
of what definition of "good" you may wish to use. With translations,
however, quality measurement is far less subjective. Any program
that consistently produces a translation of a general text that 20% of
the target language's native speakers will call "a good translation"
will get my appreciation. I won't even include literary texts, as those
tend to be harder to translate.

 Best Regards,

 Uri Even-Chen
 Speedy Net
 Raanana, Israel.

 E-mail: [EMAIL PROTECTED]
 Phone: +972-9-7715013
 Website: www.uri.co.il
 

  Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html





Re: [off topic] A new project - automatic translation

2005-11-14 Thread Uri Even-Chen

Offer Kaye wrote:

There's *no way* to go from a simplistic search and replace of
single words (or very short/simple phrases) to a full blown
translation software. There's no improvement you could make that
would make such a methodology work for complete sentences in a real
language. Anyone who tells you otherwise is trying to sell you
something...

You first need to come up with a working method, then find a way to
implement it. You can't try to start with a naive list of words to
replace and expand that to translating complete sentences.


I want to start with a simple algorithm and improve it with time.  If a
better algorithm is found or developed, it can completely replace the
initial algorithm.  But I want to start with something, and something
which is not too difficult to implement.


As for maintaining *lists of sentences* to translate... dude, are you nuts? ;-)


It will not contain every sentence in the world.  But it can remember
specific sentences of texts which were previously translated.  If the
same sentence will be found again, the same translation will be used.

But the main focus is words and phrases, and not sentences.


Finally, since you mentioned Wikipedia, here are some useful links:
   http://en.wikipedia.org/wiki/Machine_translation
See especially the section on Free (open source) software for
existing efforts where you might be able to help.
A more general article regarding translation:
   http://en.wikipedia.org/wiki/Translation


Very interesting.  Thanks for the links.  I'm glad I wrote to this mailing
list - I received many useful comments.

I also read the article Translation memory
[http://en.wikipedia.org/wiki/Translation_memory] (linked from the
articles you sent me).  I think the term Translation memory describes
what I want to do: I want to create a huge translation memory database
of many languages, which will be used to translate texts from one
language to another.  Its data will come from volunteers all over the
world.  This translation memory database will be used together with
other methods to translate texts.  The more words and phrases there are
in this translation memory database, the better the quality of the
translation will be.  And it will be big if many people will contribute
to it.  Compare to Wikipedia as an encyclopedia.
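
A rough sketch of what such a translation memory could look like, in Python.
It stores previously translated sentences, reuses an exact match when one
exists, and otherwise falls back to the closest stored sentence. The class
name, the normalization rule, the 0.8 cutoff and the use of the standard
library's difflib for fuzzy matching are all assumptions made for this
example, not a description of any existing tool.

```python
import difflib


class TranslationMemory:
    """Toy translation memory: source sentence -> stored translation."""

    def __init__(self):
        self._segments = {}

    @staticmethod
    def _normalize(sentence):
        # Lowercase and collapse whitespace so trivial variants match.
        return " ".join(sentence.lower().split())

    def add(self, source, target):
        self._segments[self._normalize(source)] = target

    def lookup(self, sentence, cutoff=0.8):
        """Return (translation, score); (None, 0.0) if nothing is close."""
        key = self._normalize(sentence)
        if key in self._segments:
            return self._segments[key], 1.0
        close = difflib.get_close_matches(
            key, list(self._segments), n=1, cutoff=cutoff)
        if not close:
            return None, 0.0
        score = difflib.SequenceMatcher(None, key, close[0]).ratio()
        return self._segments[close[0]], score


tm = TranslationMemory()
tm.add("The file was saved.", "El archivo fue guardado.")  # made-up entry
print(tm.lookup("the file   was saved."))
print(tm.lookup("The file was saved"))
```

A fuzzy hit is only a suggestion: the score tells a human translator how far
the stored sentence is from the new one.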

Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il






Re: [off topic] A new project - automatic translation

2005-11-14 Thread Nadav Har'El
On Mon, Nov 14, 2005, Uri Even-Chen wrote about Re: [off topic] A new project 
- automatic translation:
 On the other hand, neural networks are not really intelligent.  You
 can't teach them to play chess, for example.  They are not suitable for
 everything.

This is getting wildly off-topic, but...

Ray Kurzweil, in his book "The Age of Spiritual Machines: When Computers
Exceed Human Intelligence", makes the following observation about human
thought, and how computers can imitate it. If you'll allow me to put what
I remember from his writing into my own words:

He believes that human thought has two modes: computation and pattern
recognition. Examples of the former include an arithmetic computation, or
thinking about several options one after another, and an example of the
latter is face recognition. He believes that chess is an example where
both modes are used: a chess expert, like a chess novice, goes in his head
through many of the possible moves and his opponent's possible reactions (this
is the computational mode), but unlike a novice, he also does some pattern
recognition on each of the resulting boards, and instantly (without
sequential computation) recognizes situations which are good, or bad, for
him. This final recognition is the part of their thought process that
chess players can't really explain, and is often called intuition.

Kurzweil argues that a chess-playing program could act similarly - walk the
(fantastically huge) tree of possible moves and counter-moves, sequentially,
and at every junction apply a neural network that recognizes good boards,
and prune the tree at that junction if the neural network decides that
this move is not worth it.

Such techniques, of walking huge trees with a *heuristic function*, are well
known in AI (look up A*, Minimax, etc.) and are not Kurzweil's
invention. But his interesting insight is that neural networks are useful
but SHOULD NOT (not only CAN NOT) be used directly to solve every problem,
but rather should be combined with other computational techniques.
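
To make the combination concrete, here is a small Python sketch of
depth-limited minimax with alpha-beta pruning. This is the standard textbook
technique, swapped in for illustration; it is not Kurzweil's exact scheme. The
evaluate callback stands where a board-recognizing neural network would go,
and the tiny game tree and its scores are invented for the example.

```python
def minimax(state, depth, maximizing, children, evaluate,
            alpha=float("-inf"), beta=float("inf")):
    """Depth-limited minimax with alpha-beta pruning."""
    kids = children(state)
    if depth == 0 or not kids:
        # Search horizon or leaf: ask the heuristic (the "intuition").
        return evaluate(state)
    if maximizing:
        best = float("-inf")
        for kid in kids:
            best = max(best, minimax(kid, depth - 1, False,
                                     children, evaluate, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the opponent would never allow this branch
        return best
    best = float("inf")
    for kid in kids:
        best = min(best, minimax(kid, depth - 1, True,
                                 children, evaluate, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best


# Hypothetical toy game tree and heuristic scores.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
SCORES = {"a": 4, "b": 6, "a1": 3, "a2": 5, "b1": 2, "b2": 9}


def children(state):
    return TREE.get(state, [])


def evaluate(state):
    return SCORES.get(state, 0)


print(minimax("root", 2, True, children, evaluate))  # 3
print(minimax("root", 1, True, children, evaluate))  # 6
```

With depth 2 the search looks past the heuristic's guess for "b" and finds
that the opponent can force a score of 2 there, so "a" wins with 3; with depth
1 it trusts the heuristic alone and picks "b".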

It is arguable that similarly, neural networks should not be used directly
to parse language. It is very possible that language understanding and
generation has both a computational, or sequential, aspect (reading the
words one by one, following some sort of state machine in your head), and
a pattern recognition aspect.

 If you're interested in what I did with neural networks - I used them to
 compose music.  If you want to see more details, look at Speedy
 Composer: http://www.speedy.co.il/composer/

The music is very nice :)

-- 
Nadav Har'El| Monday, Nov 14 2005, 13 Heshvan 5766
[EMAIL PROTECTED] |-
Phone +972-523-790466, ICQ 13349191 |A computer without Microsoft is like a
http://nadav.harel.org.il   |chocolate cake without mustard.




Re: [off topic] A new project - automatic translation

2005-11-14 Thread Nadav Har'El
On Mon, Nov 14, 2005, Uri Even-Chen wrote about Re: [off topic] A new project 
- automatic translation:
 Are the Hspell and WordNet databases available to use?  Can you send me
 links?  I'll be glad to contribute if I can, but my focus is not on any

Of course, both are released with free software licenses:

http://www.ivrix.org.il/projects/spell-checker/

http://cl.haifa.ac.il/projects/mwn/

 specific language (Hebrew) but on translating in general.  My idea is to
 create something like Wikipedia (but for translations), and yes, I
 understand how huge the task is.

http://www.wiktionary.org/ is like Wikipedia (but for translations),
but for individual words. As people noted here time and again, this is
only a small step in the translation direction. Arguably, it is even a step
in the wrong direction (with the WordNet effort being more in the right
direction, because it translates individual word senses, rather than words).

 If there is any database I can use, which contains texts (or words,
 phrases) and their translations, please let me know.  I would like to
 start with something.

Again, Wiktionary has this for words.
If you'll read some machine-translation literature, you'll find references
to a bunch of corpora which contain quality texts translated into several
languages, with an explicit correspondence between the sentences in each
language. For example, there is a corpus of EU laws translated into several
EU languages. And I believe there's also a UN corpus. Sorry, I don't have
any links.

And, for an interesting corpus, why not try... the bible? It has been
translated into countless languages, and a strict correspondence between
the verses has been observed. Of course, some of the translations feature
some archaic language :-)
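
As a rough illustration of why verse-aligned (or sentence-aligned) texts are
useful, here is a Python sketch that pairs up two translations by verse
reference and counts which words tend to appear together. The function names,
the (chapter, verse) keys and the toy sentences are all invented for the
example; real corpora and real alignment methods are far more involved.

```python
from collections import Counter, defaultdict


def align_by_verse(source_verses, target_verses):
    """Pair up texts that share the same verse reference."""
    shared = sorted(set(source_verses) & set(target_verses))
    return [(source_verses[ref], target_verses[ref]) for ref in shared]


def cooccurrence(pairs):
    """Count how often each source word appears alongside each target word."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for s in set(src.lower().split()):
            for t in set(tgt.lower().split()):
                counts[s][t] += 1
    return counts


# Made-up toy "verses", keyed by (chapter, verse).
EN = {(1, 1): "light is good", (1, 2): "water is good",
      (1, 3): "light and water", (1, 4): "no counterpart"}
ES = {(1, 1): "luz es buena", (1, 2): "agua es buena",
      (1, 3): "luz y agua"}

pairs = align_by_verse(EN, ES)
counts = cooccurrence(pairs)
print(counts["light"].most_common(1))
print(counts["water"].most_common(1))
```

Even this crude count picks out plausible word pairs from three aligned
lines, which hints at why people bother collecting such corpora.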


-- 
Nadav Har'El| Monday, Nov 14 2005, 13 Heshvan 5766
[EMAIL PROTECTED] |-
Phone +972-523-790466, ICQ 13349191 |Strike not only while the iron is hot,
http://nadav.harel.org.il   |make the iron hot by striking it.




Re: [off topic] A new project - automatic translation

2005-11-14 Thread Dan Kenigsberg
Uri,

I am not sure that you grasp the enormity of the task at hand. However, I don't
want to discourage you. Unlike others, I would appreciate a stupid translator
that replaces a word by a word. It would be wonderful to have a free software
that is as bad as Google or babelfish. I would LOVE to have a free tool to make
fun of. Now we have nothing.

So go on and get to work. Note that if I were you, I'd set myself more
realistic goals first. In a way, I am you: when we set out to write hspell we
had dreams of a linguistic future. I hope that the huge list that we collected,
of almost all modern Hebrew words, will be useful to you.

 I want to have both an algorithm and a database of languages (words,
 phrases etc) that will improve over time.  That is, start with a simple
 algorithm, and feed data into it.  The data will be sources and
 translations in any language.  When there is enough data for a given

The database you describe here is not dissimilar to WordNet, and I am told that
a few list members are trying to extend the Hebrew WordNet. Maybe you can join
them.

-- 
Dan Kenigsberg    http://www.cs.technion.ac.il/~danken    ICQ 162180901




Re: [off topic] A new project - automatic translation

2005-11-14 Thread Offer Kaye
On 11/13/05, Uri Even-Chen wrote:
  The main goal is to have (for each pair of languages) a list of
 translations of words, phrases and maybe even sentences.  Then, the
 algorithm will just do search and replace - for every word, phrase or
 sentence it will replace it with its equivalent in the target languages.
 I think it's quite a simple algorithm to start with.  And then it will
 be improved in the future.  (Even Linux was not written in one day!).

There's *no way* to go from a simplistic search and replace of
single words (or very short/simple phrases) to a full blown
translation software. There's no improvement you could make that
would make such a methodology work for complete sentences in a real
language. Anyone who tells you otherwise is trying to sell you
something...

You first need to come up with a working method, then find a way to
implement it. You can't try to start with a naive list of words to
replace and expand that to translating complete sentences.

As for maintaining *lists of sentences* to translate... dude, are you nuts? ;-)

Finally, since you mentioned Wikipedia, here are some useful links:
   http://en.wikipedia.org/wiki/Machine_translation
See especially the section on Free (open source) software for
existing efforts where you might be able to help.
A more general article regarding translation:
   http://en.wikipedia.org/wiki/Translation


Best regards,
--
Offer Kaye




Re: [off topic] A new project - automatic translation

2005-11-14 Thread Offer Kaye
On 11/14/05, Dan Kenigsberg wrote:
 Unlike others, I would appreciate a stupid translator
 that replaces a word by a word. It would be wonderful to have a free software
 that is as bad as Google or babelfish. I would LOVE to have a free tool to make
 fun of. Now we have nothing.


Enter the Perl module Lingua::Translate -
   http://search.cpan.org/dist/Lingua-Translate/

It uses SYSTRAN by default, which is the same backend used by Google (AFAIK):
   http://www.systransoft.com/index.html

Go ahead, make fun of it :)

Cheers,
--
Offer Kaye




Re: [off topic] A new project - automatic translation

2005-11-14 Thread Uri Bruck

Nadav Har'El wrote:



And, for an interesting corpus, why not try... the bible? It has been
translated into countless languages, and a strict correspondence between
the verses has been observed. Of course, some of the translations feature
some archaic language :-)


Hebrew and the 1917 JPS English translation.
http://www.mechon-mamre.org/

Lots of translations:
http://bible.gospelcom.net/





--
Thanks,
Uri
http://translation.israel.net




Re: [off topic] A new project - automatic translation

2005-11-14 Thread Uri Bruck

Uri Even-Chen wrote:



I also read the article Translation memory
[http://en.wikipedia.org/wiki/Translation_memory] (linked from the
articles you sent me).  I think the term Translation memory describes
what I want to do: I want to create a huge translation memory database
of many languages, which will be used to translate texts from one
language to another.


A single TM database for all fields isn't very useful.


--
Thanks,
Uri
http://translation.israel.net




Re: [off topic] A new project - automatic translation

2005-11-14 Thread Barry.R

Uri Even-Chen wrote:


Offer Kaye wrote:


There's *no way* to go from a simplistic search and replace of
single words (or very short/simple phrases) to a full blown
translation software. There's no improvement you could make that
would make such a methodology work for complete sentences in a real
language. Anyone who tells you otherwise is trying to sell you
something...

You first need to come up with a working method, then find a way to
implement it. You can't try to start with a naive list of words to
replace and expand that to translating complete sentences.



I want to start with a simple algorithm and improve it with time.  If a
better algorithm is found or developed, it can completely replace the
initial algorithm.  But I want to start with something, and something
which is not too difficult to implement.

You may be interested in an article which appeared in the August 8th issue
of the Proceedings of the National Academy of Sciences (America),
entitled "Unsupervised Learning of Natural Languages", by four Israelis:
Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman.

They maintain that they have developed an unsupervised algorithm that
discovers hierarchical structures in sequences of data. This algorithm has
been tested on several thousand sentences in languages it was originally
unfamiliar with, and was able to decode the grammatical structure of these
languages and produce acceptable sentences.


Barry.





Re: [off topic] A new project - automatic translation

2005-11-13 Thread Orna Agmon
On Sun, 13 Nov 2005, Uri Even-Chen wrote:

 Date: Sun, 13 Nov 2005 15:13:38 +0200
 From: Uri Even-Chen [EMAIL PROTECTED]
 To: linux-il linux-il@linux.org.il
 Subject: [off topic] A new project - automatic translation

 Hi people,

 I am thinking about starting a new open source project related to
 automatic translation.  The idea is to create a website and/or software
 that will automatically translate texts from one language to another.  I
 want the website or software to be able to get feedback from users and
 learn from mistakes.  The quality of translation should improve over time.

 I would like to get your feedback about this idea:
 - Do you think it's a good idea?
 - Do you think it can be implemented?
 - Do you have any suggestions how to implement it?
 - How many people are necessary to implement such an idea?
 - Are you interested in being involved?

 Personally I think there is a need for such a website or software.
 There are existing websites or softwares, but they have three main
 disadvantages:

 1. Some of them are not free.
 2. Most of them support only a few languages.
 3. I think all of them should improve the quality of translation.

 I want to have a website that supports many languages, its usage is free,
 and the quality of translation will improve over time.

 What do you think?

It is not so much a question of implementation, free or not, website or
program, but of algorithms. Before you run to implement, you need to
research the subject.  Many researchers are already researching it,
and as you can see, the results are far from perfect, since this is a
problem of natural languages.

Orna.
--
Orna Agmon http://ladypine.org/  http://haifux.org/~ladypine/
ICQ: 348759096





Re: [off topic] A new project - automatic translation

2005-11-13 Thread Nadav Har'El
On Sun, Nov 13, 2005, Uri Even-Chen wrote about [off topic] A new project - 
automatic translation:
 I am thinking about starting a new open source project related to
 automatic translation.  The idea is to create a website and/or software
 that will automatically translate texts from one language to another.  I
 want the website or software to be able to get feedback from users and
 learn from mistakes.  The quality of translation should improve over time.
 
 I would like to get your feedback about this idea:
 - Do you think it's a good idea?

The idea of machine translation is obviously a good one, and it would be
even better if we had one that was free, both in price and in freedom to
inspect and to improve the code.

BUT, between saying that it's a good idea, and actually being able to
implement it, there's a VERY VERY LONG road.

 - Do you think it can be implemented?
 - Do you have any suggestions how to implement it?

Machine translation, or at least good machine translation, has long been an
open question in AI research. Better and better algorithms are appearing, and
your first course of action should probably be to read up on known approaches
(in linguistic journals, books, university courses, etc.). What will very
likely NOT WORK is any naive approach, including the one which you seem to
imply above (some sort of simplistic machine learning approach) - people
already tried these simplistic approaches, long ago, and they just didn't work.

In addition to algorithms, you'll also need linguistic data. There has been
plenty of research on trying to teach a computer a language without linguistic
data, i.e., using only untagged texts. But I'm not aware of such research ever
being fully successful. Arguably, even a baby doesn't get untagged text when
he learns a language, but rather also gets input on physical objects
corresponding to words, gets corrected by his teacher, and so on.

So be prepared for a lot (and I mean A LOT) of work on preparing linguistic
data: lexicons of words, dictionaries of word *meanings* (with links between
different languages), various representations of grammar. You will probably
also need collections of idioms, names, and various other ways of representing
world knowledge, which is often needed for good translation. Alternative
approaches use tagged texts instead of linguistic data, but these are also
hard to come by (especially in Hebrew).

All of this is very far from being easy, unfortunately.

When we started Hspell (http://ivrix.org.il/projects/spell-checker/), we
envisioned it as the first step toward more sophisticated linguistic
applications, including machine translation. But it was only the first step,
in the journey of many miles :(

 - How many people are necessary to implement such an idea?

My estimate (based on nothing but pure guesswork) is that you can get something
sort-of-working in 5 person-years. This is about 10 times more work than went
into Hspell so far... But then again, I'm not a translation expert (or even
a novice) and maybe I'm grossly underestimating the complexity involved.

I also suggest you take a look at http://www.mila.cs.technion.ac.il/,
which is the Knowledge Center for Processing Hebrew. This is a cooperation
of people from the Academia who work in the field of Computational
Linguistics, in Hebrew, and they finally started to cooperate in building
basic building blocks that are necessary to advance Hebrew linguistic
research. These building blocks will be released as free software, and
they include (or will include) a morphological analyzer (similar in purpose
to Hspell), word-sense disambiguators, tagged texts, grammar analyzers,
and so on. I assume that they are interested as well to advance their
toolset to the point that they will also have translation tools, text
understanding and generation tools, and so on. But they are also quite
far from this goal.

 - Are you interested in being involved?

I am, but slowly slowly (in the same mode I've been working on Hspell so far)
:)

 I want to have a website that supports many languages, its usage is free,
 and the quality of translation will improve over time.

I am not sure I understand the "quality of translation will improve over time"
thing. What makes you think that a translator, whether computerized or even
human, can learn to improve his translation capabilities merely by translating
more texts (without any feedback)? And even if there's feedback, how will it
be used? Are you aware of any papers on machine-translation that actually can
learn from past experience?

-- 
Nadav Har'El| Sunday, Nov 13 2005, 12 Heshvan 5766
[EMAIL PROTECTED] |-
Phone +972-523-790466, ICQ 13349191 |I started out with nothing... I still
http://nadav.harel.org.il   |have most of it.

=
To unsubscribe, send mail to [EMAIL PROTECTED] with
the word unsubscribe in the message 

Re: [off topic] A new project - automatic translation

2005-11-13 Thread Danny Lieberman

Uri,

This is a ripe research area, as Orna has pointed out. I might add that
it has been around for almost 30 years with no significant breakthroughs.


Having just finished the first phase of translating our main
project's Web site to French, German and Italian, I can vouch that
there is an enormous amount of quality translation resources available
online. Go to http://www.proz.com/, which has leveled the playing field
of pricing and translation service providers.


In other words, IMHO, you don't have a business proposition.

Danny

Orna Agmon wrote:


On Sun, 13 Nov 2005, Uri Even-Chen wrote:

 


Date: Sun, 13 Nov 2005 15:13:38 +0200
From: Uri Even-Chen [EMAIL PROTECTED]
To: linux-il linux-il@linux.org.il
Subject: [off topic] A new project - automatic translation

Hi people,

I am thinking about starting a new open source project related to
automatic translation.  The idea is to create a website and/or software
that will automatically translate texts from one language to another.  I
want the website or software to be able to get feedback from users and
learn from mistakes.  The quality of translation should improve over time.

I would like to get your feedback about this idea:
- Do you think it's a good idea?
- Do you think it can be implemented?
- Do you have any suggestions how to implement it?
- How many people are necessary to implement such an idea?
- Are you interested in being involved?

Personally I think there is a need for such a website or software.
There are existing websites or softwares, but they have three main
disadvantages:

1. Some of them are not free.
2. Most of them support only a few languages.
3. I think all of them should improve the quality of translation.

I want to have a website that supports many languages, its usage is free,
and the quality of translation will improve over time.

What do you think?
   



It is not so much a question of implementation, free or not, website or
program, but of algorithms. Before you run to implement, you need to
research the subject.  Many researchers already are researching it,
and as you can see, the results are far from perfect, since this is a
problem of natural languages.

Orna.
--
Orna Agmon http://ladypine.org/  http://haifux.org/~ladypine/
ICQ: 348759096





 



--
Danny Lieberman
Visit us at http://www.software.co.il
Office + 972  8 970-1485
Cell   + 972 54 447-1114






Re: [off topic] A new project - automatic translation

2005-11-13 Thread Uri Even-Chen

I'm replying to a few people together.

Orna Agmon wrote:

It is not so much a question of implementation, free or not, website or
program, but of algorithms. Before you run to implement, you need to
research the subject.  Many researchers already are researching it,
and as you can see, the results are far from perfect, since this is a
problem of natural languages.


Yes, I'm aware of it.

Nadav Har'El wrote:

The idea of machine translation is obviously a good one, and it would be
even better if we had one that was free, both in price and in freedom to
inspect and to improve the code.

BUT, between saying that it's a good idea, and actually being able to
implement it, there's a VERY VERY LONG road.


I agree.


Machine translation, or at least good machine translation, has long been an open question
in AI research. Better and better algorithms are appearing, and your first
course of action should probably be to read up on known approaches (in linguistic
journals, books, university courses, etc.). What will very likely NOT WORK
is any naive approach, including the one which you seem to imply above (some
sort of simplistic machine learning approach) - people already tried these
simplistic approaches, long ago, and they just didn't work.


I want to have both an algorithm and a database of languages (words,
phrases etc) that will improve over time.  That is, start with a simple
algorithm, and feed data into it.  The data will be sources and
translations in any language.  When there is enough data for a given
pair of languages, the software will be able to try to translate.
People will correct and improve the translations and feed them back into
the system.  This will improve the quality of translation for the given
languages.  In addition, the algorithms will be improved over time.

All feedback and improvements will be done by volunteers, in the spirit of
Wikipedia and similar projects.  Using the website for all users will be
free of charge.
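
The feed-and-correct loop described above could be sketched roughly as
follows. This is only an illustration under simple assumptions: the class
name TranslationMemory and its methods are made up for this example,
storage is an in-memory dict, and the latest correction simply overwrites
the previous entry.

```python
class TranslationMemory:
    """Stores known translations per language pair; corrections overwrite."""

    def __init__(self):
        # (source_lang, target_lang, source_text) -> current best translation
        self._entries = {}

    def add(self, source_lang, target_lang, source_text, translation):
        """Feed in a source/translation pair (initial data or a correction)."""
        self._entries[(source_lang, target_lang, source_text)] = translation

    def translate(self, source_lang, target_lang, source_text):
        """Return the stored translation, or None if the text is unknown."""
        return self._entries.get((source_lang, target_lang, source_text))


tm = TranslationMemory()
tm.add("en", "de", "good morning", "gut Morgen")    # initial, poor entry
tm.add("en", "de", "good morning", "guten Morgen")  # a volunteer's correction
```

A real system would also have to decide whose correction wins when
volunteers disagree; this sketch does not address that.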


When we started Hspell (http://ivrix.org.il/projects/spell-checker/), we
envisioned it as the first step toward more sophisticated linguistic
applications, including machine translation. But it was only the first step,
in the journey of many miles :(


I want to consider using existing databases, which are free to use, such
as your Hspell project or Wikipedia - to feed initial data into the
system.  The main goal is to have (for each pair of languages) a list of
translations of words, phrases and maybe even sentences.  Then, the
algorithm will just do search and replace - for every word, phrase or
sentence it will replace it with its equivalent in the target language.
I think it's quite a simple algorithm to start with.  And then it will
be improved in the future.  (Even Linux was not written in one day!).
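
A minimal sketch of the search-and-replace idea described above, assuming a
per-language-pair table that maps words and phrases to their equivalents.
The function name translate, the max_phrase_len parameter and the toy EN_DE
table are assumptions for illustration. The longest matching phrase is
tried first, and unknown words are passed through unchanged.

```python
def translate(text, table, max_phrase_len=3):
    """Greedy longest-match replacement of words and phrases using `table`."""
    words = text.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i, then shorter ones.
        for n in range(min(max_phrase_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                output.append(table[phrase])
                i += n
                break
        else:
            # No entry for this word: leave it untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


# Hypothetical toy table for English -> German.
EN_DE = {
    "good morning": "guten Morgen",
    "good": "gut",
    "morning": "Morgen",
    "house": "Haus",
}
```

With this table, translate("Good morning", EN_DE) uses the phrase entry
rather than the two single-word entries. The lookup has no notion of
grammar, agreement or word order; it only substitutes table entries.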


My estimate (based on nothing but pure guesswork) is that you can get something
sort-of-working in 5 person-years. This is about 10 times more work than went
into Hspell so far... But then again, I'm not a translation expert (or even
a novice) and maybe I'm grossly underestimating the complexity involved.


I hope that for writing the first version (alpha), it will require less
than one person-year.  Not including feeding the data into the system.


I also suggest you take a look at http://www.mila.cs.technion.ac.il/,
which is the Knowledge Center for Processing Hebrew. This is a cooperation
of people from the Academia who work in the field of Computational
Linguistics, in Hebrew, and they finally started to cooperate in building
basic building blocks that are necessary to advance Hebrew linguistic
research. These building blocks will be released as free software, and
they include (or will include) a morphological analyzer (similar in purpose
to Hspell), word-sense disambiguators, tagged texts, grammar analyzers,
and so on. I assume that they are interested as well to advance their
toolset to the point that they will also have translation tools, text
understanding and generation tools, and so on. But they are also quite
far from this goal.


I want the first algorithm (alpha) to be independent of language - it
should work for any pair of languages.  Of course I want to support
Hebrew, but many other languages too.


I am not sure I understand the "quality of translation will improve over time"
thing. What makes you think that a translator, whether computerized or even
human, can learn to improve his translation capabilities merely by translating
more texts (without any feedback)? And even if there's feedback, how will it
be used? Are you aware of any papers on machine-translation that actually can
learn from past experience?


In order for the quality of translation to improve over time, there is
need for feedback.  People (who understand both languages) should
correct translations and feed them back into the system.  The algorithm
should remember the feedback and update the database.  The next time the
same sentence (or phrase, or word) is translated by the system, the
corrected 

Re: [off topic] A new project - automatic translation

2005-11-13 Thread Shachar Shemesh
I'll start by stating that all of my info comes from a close friend, who
is both a linguist and the owner of a small startup that does NLP
(Natural Language Processing). I cannot testify to the truth behind all
of the opinions I am transferring here, but I can testify that I have
seen his technology, and it's fairly awesome.

Danny Lieberman wrote:

 Uri,

 This is a ripe research area as Orna has pointed out, I might add that
 it has been around for almost 30 years with no significant breakthroughs.

According to this friend of mine, it is more precise to say that there
was much research on the matter some 30 years ago. He says that
most of the research done was in a field of linguistics called
Generative Linguistics, led by one Noam Chomsky. According to another
friend of mine, Chomsky, while revered by generative linguists around
the world, is renowned for having his theories around NLP being knocked
down, one after the other, to which he simply goes out and makes up a
new theory.

I know about four people who deal with linguistics (two of them actual
linguists, one of them subscribed to this list). None of them has any
respect whatsoever for Generative Linguistics. From the little I know
about what Generative Linguistics is about, neither do I.

The short of it is that the reason that 30 years of research has not
produced any good results with machine translations is that they are
using the wrong tool for the job. My first friend claims that his engine
achieves the same level of accuracy as professional engines today
WITHOUT BEING ADAPTED TO THE USED CORPUS. He even claims he has no
particular problems with the standard gotchas, such as the infamous
"Time flies like an arrow, fruit flies like a banana."

That's the end of my rant. If any list-reading, differential-SCSI-cable-wielding
Structural Linguist wishes to add his professional view of things, he is
welcome.

 Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html





Re: [off topic] A new project - automatic translation

2005-11-13 Thread Danny Lieberman

I'm sure this problem will eventually be cracked.

However, experience and the laws of physics teach us that problems and 
physical objects roll down walls to the lower energy levels not up walls 
to higher energy levels.  I challenge any NLP system to understand that 
in German you write "Home" for "home page" and not "StartSeit".  As long as
you can find content-expert professional human translators at
5 cents/word, you won't have a business proposition, because a free
community effort will depend precisely upon the people who make a living
from translation. OTOH - if a technology solution costs more than that,
it won't be a viable alternative either.


BTW - I hear that organizations like NSA use NLP to listen in on 
conversations and translate - but they also have  a small army of 
linguists to make sense of it.


that's the end of my 5c :-)

--
Danny Lieberman
Visit us at http://www.software.co.il
Office + 972  8 970-1485
Cell   + 972 54 447-1114


