Re: [Mt-list] Public release of Haitian Creole language data by Carnegie Mellon

Jeff Allen Sat, 23 Jan 2010 02:29:00 -0800

All,

Can we please end this nitpicking over licensing and potential
lawsuits?

As Bob and Alon have clearly said, this licensing statement has been used for
over a decade with other CMU published stuff.  Did anything negative ever come
of that?   If not, then the case is closed.

As Bob said, we just want credit to be given where credit is due.  I work
outside of this industry and had the guts (in the midst of an already stressed
out software dev cycle) to approach my boss a week ago with a proposal to
participate in this special Haiti Disaster Relief project, and to see how the
company would let me participate.  So just give credit to the participants and
their affiliations. Create a little "About" menu item in your tools with a link
off to the content providers.
Last night was the first time in a week that I've slept more than 2 or 3 hours.
And we are working on unarchiving, verifying, validating more CMU content.  This
takes time to do, and I am thankful to my employer for allowing me to
participate.

I've been receiving so many emails and calls for a week from people screaming to
have Haitian Creole data.  Who else has been willing to give such content to the
community at large?

We were able to deliver to Doctors without Borders in Haiti the list of 1600
sentences that were translated in 1 day by an experienced translation company.
Watch the news. These sentences are needed NOW because in a few days, the value
of that small corpus will have dropped to null if the people who could benefit
from the communication are no longer living.

And I'm right now answering emails to 2 separate Haitian Creole content
providers who did not give me their data 12 years ago, and now might need to
arrange some confcalls.  These are relationships that took years to develop and
maintain. It's about who you know and how much they trust you.  And
there is no magic wand to accelerate this, even in a crisis situation.

The statements in this discussion thread can undermine and destroy all efforts
at present to get others to also make their content available, in whatever way
and conditions that they are willing to do so.

So if you want to potentially see more data from orgs other than CMU, then it
would be more constructive to send the CMU guys (Bob as the Haitian Creole
Project PI, Alon as the AMTA president) and me (as participant now + industry
advisor to both the LINGUIST-list and also to MultiLingual Computing and
Technology) your statements of support of the CMU initiative.  Your letters of
support (email is OK) for the continued release of Haitian Creole data will be
one way to make things possible.

And spread the word all over the internet (every blog, every discussion
forum, every newsletter, every news list) that this initiative by CMU is a first
and essential step to make language data available for building language
technologies upon it.
And encourage others to do the same with their data, as they are able to do
so.

Please try to grasp and feel that moment in time of 10 days back in March 1998
where Haitian colleagues and I were speaking in Creole with 200 students at the
Universite d'Haiti with instructions on how to do the recording sessions.
And the final day with my seminar (in Creole) to a full room of Haitians about
the reason, the need and the ways of doing text and speech data collection, and
what it could do to develop a speech-to-speech MT system.  I was inventing
terminology in Haitian Creole that had never existed. (BTW, that seminar is
sitting on a VHS cassette and I need to get the conversion kit and find the time
to convert it to DV format and make it available to all.)

And then a week later when we built the system and were creating the demos along
with our Haitian colleagues, and to see the incredible excitement in their eyes
and the emotion in their voices.  And at that moment in time, I told them that
no one in the world could ever again say that Haitian Creole was a substandard,
broken patois.  Because you can't create a speech-to-speech MT system to convert
a "non-language" to a language.  You can only match two entities which are at
the same level of the hierarchy:  a real language to a real language.

And then as I sat there a week ago watching the TV and was nearly in tears as 10
years of my life (half of it only in my free time, late nights, while family
members were already sleeping) had been spent working with people of several
different French-based Creoles, helping promote the idea of elevating the value
of their mother tongue from the oppressed, substandard, broken-patois to the
right to call it a "language".  All of that in vain if thousands of people would
suffer and die today because the system is sitting in a box, and the data is
archived on various media. Something had to be done to help to resolve
immediate needs, short-term needs, and longer-term needs.   And in one week, we
have started to release the data that has been carefully checked. It's not like
the crap content that you will find on internet sites.  We had already done the
triage and clean-up of that over a decade ago.

* Send your letters of support
* Do what you can do with what we can make available
* Encourage and promote all further efforts (because it will be much more
complicated for the data that has been created by others)
* Develop all types of systems that you can, but don't do it just to say that you
created it for a minority language as some type of neat research project. Do it
to solve a real concrete need right now for a suffering people and nation.

Sorry for the seemingly long email, but if you want the really long version,
with all the details covering linguistic principles, language history,
socio-psychological issues, and the types of technologies and language
processing methods that can be developed, then go visit my publications page (on
my LinkedIn profile) on Creole linguistics and Creole technologies. But that would
in itself take a full week to read all of those works. So I guess this email is
the short version.

Yours faithfully and for the people of Haiti,

Jeff

Jeff Allen
SAP Business Objects Division
Advisor, MultiLingual (Computing & Technology) magazine
Advisor, LINGUIST-List
http://www.linkedin.com/in/jeffallen

===============
Quoting Vadim Berman <[email protected]>:
> Hmm, yes, actually. Common sense is handy :-) .
>
> I only recall a handful of lawsuits related to free stuff, and usually the
> point was to get a lot of money from someone in the size of IBM. I kind of
> doubt that free corpora from
> a university for a language without much commercial potential in NLP would
> result in something like this.
>
>  Best regards,
>  Vadim
>
>   ----- Original Message -----
>   From: Job M. van Zuijlen
>   To: [email protected]
>   Sent: Saturday, January 23, 2010 10:29 AM
>   Subject: Re: [Mt-list] Public release of Haitian Creole language
> data by Carnegie Mellon
>
>
>   Some of the verbiage used in this discussion (lawyer bomb...) doesn't
> particularly encourage people to make their data freely available. What
> happened to common sense?  I think CMU's initiative should be commended.
>
>   Job van Zuijlen
>
>
>   From: Robert Frederking
>   Sent: Friday, January 22, 2010 16:32
>   To: Francis Tyers
>   Cc: [email protected]
>   Subject: Re: [Mt-list] Public release of Haitian Creole language data
> by Carnegie Mellon
>
>
>   I'm not a lawyer, but let me start by stating that our intent was simply
> that re-use included acknowledgement.  This was not intended to be a
> splash-screen on every start-up, or making the software pronounce our names
> at the start of every sentence.  :-)  It only has to be "clearly visible" in
> anyone's source files.
>
>   We aren't interested in suing people; we are a non-profit research
> organization.  But like the Regents in California, we have a responsibility
> to our sponsors that appropriate credit is given for our work.  So this is
> intended to be like the old BSD advertising clause, which is generally
> considered to be clear from a legal point of view.
>
>   Please use the data however you want; just don't say you originally
> collected it.
>
>       Bob
>
>   Francis Tyers wrote:
> [ Sorry in advance for cross posting ]
>
> I'm going over this on the debian-legal mailing list (a good place to
> ask about issues in free/open-source software licensing).
>
> There is a question about clause 5 of the licence:
>
> ----------------------------------------------------------------------------
>
> ##  5. Any commercial, public or published work that uses this data      ##
> ##     must contain a clearly visible acknowledgment as to the           ##
> ##     provenance of the data.                                           ##
>
> ----------------------------------------------------------------------------
>
> From debian-legal:
>
>  My concern is whether, contrary to the favourable interpretation you
>  give, this is intended to act like an obnoxious advertising clause.
>
>  In other words, what will satisfy "contain" in "contain a clearly
>  visible acknowledgement"? Is it sufficient for the acknowledgement to
>  be "clearly visible" only after inspecting various files in the source
>  code?
>
>  Or is the copyright holder's intent that the acknowledgement be clearly
>  visible to every recipient, even those who receive a non-source form of
>  the work? The latter would be a non-free restriction, like the
>  obnoxious advertising clause in the older BSD licenses.
>
>  This looks, as it is currently worded, more like a lawyerbomb now that
>  I consider it. I would appreciate input on this from legally-trained
>  minds.
>
> ----------------------------------------------------------------------------
>
> Could you confirm if that clause means that the acknowledgement should
> be _clearly visible_ to _every recipient_ or would it suffice to be
> visible after inspecting the source code?
>
> Thanks for your help in this and best regards,
>
> Francis Tyers
>
>
> On Thu, 21 Jan 2010 at 22:59 -0500, Alon Lavie wrote:
>   Hi Francis,
>
> Thanks for the suggestion, but we were advised to leave the licensing
> language as is.  Our licensing language is effectively equivalent to the
> MIT license and is unambiguous with respect to releasing the data for
> any use (commercial or non-commercial).
>
> Best regards,
>
> - *Alon*
>
> Francis Tyers wrote:
>     On Thu, 21 Jan 2010 at 14:49 -0500, Robert Frederking wrote:
>
>       The Language Technologies Institute (LTI) of Carnegie Mellon
> University's
> School of Computer Science (CMU SCS) is making publicly available the
> Haitian Creole spoken and text data that we have collected or produced. We
> are providing this data with minimal restrictions in order to
> allow others to develop language technology for Haiti, in parallel with our
> own efforts to help with this crisis. Since organizing the data in a useful
> fashion is not instantaneous, and more text data is currently being
> produced
> by collaborators, we will be publishing the data incrementally on the web,
> as it becomes available.  To access the currently available data, please
> visit the website at  http://www.speech.cs.cmu.edu/haitian/
>
>         Would you consider also dual/triple licensing the data under an
> existing
> free software licence, such as the MIT licence[1] or the GNU GPL[2] ?
> This way it could be combined with existing data under these licences
> (e.g. the majority of free/open-source software) and researchers and
> developers don't need to hire legal advice to determine if they can
> combine their work with yours.
>
> Best regards,
>
> Fran
>
> 1. http://en.wikipedia.org/wiki/MIT_Licence#License_terms
> 2. http://www.gnu.org/licenses/gpl.html
>

_______________________________________________
Mt-list mailing list
