All,

Can we please end this discussion about nitpicking over licensing and potential lawsuits?
As Bob and Alon have clearly said, this licensing statement has been used for over a decade with other CMU-published material. Did anything negative ever come of that? If not, then the case is closed. As Bob said, we just want credit to be given where credit is due. I work outside of this industry, and a week ago I had the guts (in the midst of an already stressed-out software development cycle) to approach my boss with a proposal to participate in this special Haiti Disaster Relief project and to see how the company would let me participate. So just give credit to the participants and their affiliations: create a little "About" menu item in your tools with a link to the content providers. Last night was the first time in a week that I've slept more than 2 or 3 hours. And we are working on unarchiving, verifying, and validating more CMU content. This takes time to do, and I am thankful to my employer for allowing me to participate. I've been receiving so many emails and calls this past week from people screaming for Haitian Creole data. Who else has been willing to give such content to the community at large? We were able to deliver to Doctors Without Borders in Haiti a list of 1600 sentences that were translated in 1 day by an experienced translation company. Watch the news. These sentences are needed NOW, because in a few days the value of that small corpus will have dropped to nothing if the people who could benefit from the communication are no longer living. And right now I'm answering emails to 2 separate Haitian Creole content providers who did not give me their data 12 years ago, and I might now need to arrange some conference calls. These are relationships that took years to develop and maintain. It's about who you know and how much they trust you. And there is no magic wand to accelerate this, even in a crisis situation.
The statements in this discussion thread can undermine and destroy all present efforts to get others to also make their content available, in whatever way and under whatever conditions they are willing to do so. So if you want to see more data from organizations other than CMU, it would be more constructive to send the CMU guys (Bob as the Haitian Creole Project PI, Alon as the AMTA president) and me (as a participant now, and as an industry advisor to both the LINGUIST-List and MultiLingual Computing and Technology) your statements of support for the CMU initiative. Your letters of support (email is OK) for the continued release of Haitian Creole data will be one way to make things possible. And spread the word all over the internet (every blog, every discussion forum, every newsletter, every news list) that this initiative by CMU is a first and essential step toward making language data available for building language technologies, and that it encourages others to do the same with their data, as they are able to. Please try to grasp and feel those 10 days back in March 1998 when Haitian colleagues and I were speaking in Creole with 200 students at the Universite d'Haiti, giving instructions on how to do the recording sessions. And then the final day, with my seminar (in Creole) to a full room of Haitians about the reasons for, the need for, and the ways of doing text and speech data collection, and what it could do to develop a speech-to-speech MT system. I was inventing terminology in Haitian Creole that had never existed. (BTW, that seminar is sitting on a VHS cassette, and I need to get the conversion kit and find the time to convert it to DV format and make it available to all.) And then, a week later, we built the system and created the demos along with our Haitian colleagues, and I saw the incredible excitement in their eyes and the emotion in their voices.
And at that moment, I told them that no one in the world could ever again say that Haitian Creole was a substandard, broken patois. Because you can't create a speech-to-speech MT system that converts a "non-language" into a language. You can only match two entities that are at the same level of the hierarchy: a real language to a real language. And then a week ago, I sat watching TV, nearly in tears. Ten years of my life (half of it only in my free time, late at night, while family members were already sleeping) had been spent working with speakers of several different French-based Creoles, helping promote the idea of elevating the value of their mother tongue from an oppressed, substandard, broken patois to something with the right to be called a "language". All of that would be in vain if thousands of people suffered and died today because the system was sitting in a box and the data was archived on various media. Something had to be done to address immediate needs, short-term needs, and longer-term needs. And within one week, we have started to release data that has been carefully checked. It's not like the crap content that you will find on internet sites; we had already done the triage and clean-up of that over a decade ago.

* Send your letters of support
* Do what you can with what we can make available
* Encourage and promote all further efforts (because it will be much more complicated for the data that has been created by others)
* Develop all types of systems that you can, but don't do it just to say that you created one for a minority language as some type of neat research project. Do it to solve a real, concrete need right now for a suffering people and nation.
Sorry for the seemingly long email, but if you want the really long version, with all the details covering linguistic principles, language history, socio-psychological issues, and the types of technologies and language processing methods that can be developed, then visit the publications page on my LinkedIn profile for my work on Creole linguistics and Creole technologies. But it would take a full week just to read all of those works. So I guess this email is the short version.

Yours faithfully and for the people of Haiti,

Jeff

Jeff Allen
SAP Business Objects Division
Advisor, MultiLingual (Computing & Technology) magazine
Advisor, LINGUIST-List
http://www.linkedin.com/in/jeffallen

===============
Quoting Vadim Berman <[email protected]>:

> Hmm, yes, actually. Common sense is handy :-) .
>
> I only recall a handful of lawsuits related to free stuff, and usually the
> point was to get a lot of money from someone the size of IBM. I kind of
> doubt that free corpora from a university for a language without much
> commercial potential in NLP would result in something like this.
>
> Best regards,
> Vadim
>
> ----- Original Message -----
> From: Job M. van Zuijlen
> To: [email protected]
> Sent: Saturday, January 23, 2010 10:29 AM
> Subject: Re: [Mt-list] Public release of Haitian Creole language
> data by Carnegie Mellon
>
>
> Some of the verbiage used in this discussion (lawyer bomb...) doesn't
> particularly encourage people to make their data freely available. What
> happened to common sense? I think CMU's initiative should be commended.
>
> Job van Zuijlen
>
>
> From: Robert Frederking
> Sent: Friday, January 22, 2010 16:32
> To: Francis Tyers
> Cc: [email protected]
> Subject: Re: [Mt-list] Public release of Haitian Creole language data
> by Carnegie Mellon
>
>
> I'm not a lawyer, but let me start by stating that our intent was simply
> that re-use included acknowledgement.
> This was not intended to be a
> splash-screen on every start-up, or making the software pronounce our names
> at the start of every sentence. :-) It only has to be "clearly visible" in
> anyone's source files.
>
> We aren't interested in suing people; we are a non-profit research
> organization. But like the Regents in California, we have a responsibility
> to our sponsors that appropriate credit is given for our work. So this is
> intended to be like the old BSD advertising clause, which is generally
> considered to be clear from a legal point of view.
>
> Please use the data however you want; just don't say you originally
> collected it.
>
> Bob
>
> Francis Tyers wrote:
> [ Sorry in advance for cross posting ]
>
> I'm going over this on the debian-legal mailing list (a good place to
> ask about issues in free/open-source software licensing).
>
> There is a question about clause 5 of the licence:
>
> ----------------------------------------------------------------------------
>
> ## 5. Any commercial, public or published work that uses this data ##
> ##    must contain a clearly visible acknowledgment as to the        ##
> ##    provenance of the data.                                        ##
>
> ----------------------------------------------------------------------------
>
> From debian-legal:
>
> My concern is whether, contrary to the favourable interpretation you
> give, this is intended to act like an obnoxious advertising clause.
>
> In other words, what will satisfy "contain" in "contain a clearly
> visible acknowledgement"? Is it sufficient for the acknowledgement to
> be "clearly visible" only after inspecting various files in the source
> code?
>
> Or is the copyright holder's intent that the acknowledgement be clearly
> visible to every recipient, even those who receive a non-source form of
> the work? The latter would be a non-free restriction, like the
> obnoxious advertising clause in the older BSD licenses.
>
> This looks, as it is currently worded, more like a lawyerbomb now that
> I consider it.
> I would appreciate input on this from legally-trained
> minds.
>
> ----------------------------------------------------------------------------
>
> Could you confirm if that clause means that the acknowledgement should
> be _clearly visible_ to _every recipient_ or would it suffice to be
> visible after inspecting the source code?
>
> Thanks for your help in this and best regards,
>
> Francis Tyers
>
>
> On Thu, 21 Jan 2010 at 22:59 -0500, Alon Lavie wrote:
> Hi Francis,
>
> Thanks for the suggestion, but we were advised to leave the licensing
> language as is. Our licensing language is effectively equivalent to the
> MIT license and is unambiguous with respect to releasing the data for
> any use (commercial or non-commercial).
>
> Best regards,
>
> - *Alon*
>
> Francis Tyers wrote:
> On Thu, 21 Jan 2010 at 14:49 -0500, Robert Frederking wrote:
>
> The Language Technologies Institute (LTI) of Carnegie Mellon University's
> School of Computer Science (CMU SCS) is making publicly available the
> Haitian Creole spoken and text data that we have collected or produced. We
> are providing this data with minimal restrictions in order to
> allow others to develop language technology for Haiti, in parallel with our
> own efforts to help with this crisis. Since organizing the data in a useful
> fashion is not instantaneous, and more text data is currently being produced
> by collaborators, we will be publishing the data incrementally on the web,
> as it becomes available. To access the currently available data, please
> visit the website at http://www.speech.cs.cmu.edu/haitian/
>
> Would you consider also dual/triple licensing the data under an existing
> free software licence, such as the MIT licence[1] or the GNU GPL[2]?
> This way it could be combined with existing data under these licences
> (e.g. the majority of free/open-source software) and researchers and
> developers don't need to hire legal advice to determine if they can
> combine their work with yours.
>
> Best regards,
>
> Fran
>
> 1. http://en.wikipedia.org/wiki/MIT_Licence#License_terms
> 2. http://www.gnu.org/licenses/gpl.html
_______________________________________________
Mt-list mailing list
