Re: [RESULT][VOTE] - Graduate Apache Joshua (incubating) as a TLP

2018-10-02 Thread lewis john mcgibbney
Hi Tommaso,
Excellent, are you able to take care of this?

On Tue, Oct 2, 2018 at 12:40 AM 
wrote:

> From: Tommaso Teofili 
> To: dev@joshua.incubator.apache.org
> Cc:
> Bcc:
> Date: Tue, 2 Oct 2018 09:40:14 +0200
> Subject: Fwd: [RESULT][VOTE] - Graduate Apache Joshua (incubating) as a TLP
> Hi folks,
>
> our TLP graduation vote has succeeded :-)
> We should now follow the required steps to complete graduation.
> I think the next step is submitting the resolution to the ASF board [1].
>
> Regards,
> Tommaso
>
> [1] :
> http://incubator.apache.org/guides/graduation.html#submission_of_the_resolution_to_the_board
>


Fwd: FW: September 2018 Newsletter - LDC

2018-09-17 Thread lewis john mcgibbney
-- Forwarded message -
From: Mcgibbney, Lewis J (398M) 
Date: Mon, Sep 17, 2018 at 12:39 PM
Subject: FW: September 2018 Newsletter - LDC
To: lewis john mcgibbney 






Dr. Lewis John McGibbney Ph.D., B.Sc.

Data Scientist II

Computer Science for Data Intensive Applications Group (398M)

Instrument Software and Science Data Systems Section (398)

Jet Propulsion Laboratory

California Institute of Technology

4800 Oak Grove Drive

Pasadena, California 91109-8099

Mail Stop : 158-256C

Tel:  (+1) (818)-393-7402

Cell: (+1) (626)-487-3476

Fax:  (+1) (818)-393-1190

Email: lewis.j.mcgibb...@jpl.nasa.gov

ORCID: orcid.org/-0003-2185-928X






 Dare Mighty Things



*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Monday, September 17, 2018 at 12:09 PM
*To: *Penn LDC 
*Subject: *September 2018 Newsletter - LDC



In this newsletter:


New Publications:

BOLT Information Retrieval Comprehensive Training and Evaluation
<https://catalog.ldc.upenn.edu/LDC2018T18>

HAVIC MED Event E051-E060 -- Videos, Metadata and Annotation
<https://catalog.ldc.upenn.edu/LDC2018V01>

Multi-Language Conversational Telephone Speech 2011 -- Spanish
<https://catalog.ldc.upenn.edu/LDC2018S12>

IARPA Babel Kazakh Language Pack IARPA-babel302b-v1.0a
<https://catalog.ldc.upenn.edu/LDC2018S13>




New publications:



(1) BOLT Information Retrieval Comprehensive Training and Evaluation
<https://catalog.ldc.upenn.edu/LDC2018T18> was developed by LDC and
consists of all data produced in support of the Information Retrieval (IR
<https://www.ldc.upenn.edu/collaborations/current-projects/bolt/information-retrieval>)
task within the DARPA Broad Operational Language Translation (BOLT)
Program, including annotations, source documents and scoring software.



The BOLT IR task sought to support development of systems that could take
as input a natural language English query sentence, return relevant
responses to that query from a large corpus of informal documents in the
three BOLT languages (Arabic, Chinese, and English) and translate responses
from non-English documents into English. This release contains (1)
natural-language IR queries, system responses to queries, and
manually-generated assessment judgments for system responses; (2)
discussion forum source documents in Arabic, Chinese and English; (3)
scoring software for each evaluation phase; and (4) experimental data
developed in Phase 2.



BOLT Information Retrieval Comprehensive Training and Evaluation is
distributed via web download.



2018 Subscription Members will automatically receive copies of this corpus.
2018 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $2,500.

*

(2) HAVIC MED Event E051-E060 -- Videos, Metadata and Annotation
<https://catalog.ldc.upenn.edu/LDC2018V01> was developed by LDC and
consists of approximately 53 hours of user-generated videos with
annotation and metadata. To advance multimodal event detection and related
technologies, LDC developed, in collaboration with NIST
<https://www.nist.gov/> (the National Institute of Standards and
Technology), a large, heterogeneous, annotated multimodal corpus for HAVIC
<https://www.ldc.upenn.edu/collaborations/past-projects/havic> (the
Heterogeneous Audio Visual Internet Collection) that was used in the
NIST-sponsored MED
<https://www.nist.gov/itl/iad/mig/trecvid-multimedia-event-detection-evaluation-track>
(Multimedia Event Detection) task for several years. HAVIC MED Event
E051-E060 is a subset of that corpus, specifically, a collection of event
videos for the HAVIC Project originally released to support the 2016
Multimedia Event Detection task
<https://www.nist.gov/itl/iad/mig/med-2016-evaluation>.



The data consists of videos of various events (event videos) and videos
completely unrelated to events (background videos) harvested by a large
team of human annotators. Each event video was manually annotated with a
set of judgments describing its event properties and other salient
features. Background videos were labeled with topic and genre categories.



HAVIC MED Event E051-E060 -- Videos, Metadata and Annotation is distributed
via web download.



2018 Subscription Members will automatically receive copies of this corpus.
2018 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for $2,000.

*

(3) Multi-Language Conversational Telephone Speech 2011 -- Spanish
<https://catalog.ldc.upenn.edu/LDC2018S12> was developed by LDC and
consists of approximately 23 hours of telephone speech in Spanish.



The data were collected primarily to support research and technology
evaluation in automatic language identification, and portions of these
telephone calls were used in the NIST 2011 Language Recognition Evaluation (
LRE <https://www.nis

[jira] [Commented] (JOSHUA-335) Consider using thrax2

2018-09-06 Thread Lewis John McGibbney (JIRA)


[ 
https://issues.apache.org/jira/browse/JOSHUA-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606589#comment-16606589
 ] 

Lewis John McGibbney commented on JOSHUA-335:
-

[~mjwall]

bq. Either by replacing it in the pipeline.pl

This is the quickest solution.

bq. or reworking the pipeline as I have seen discussed in other tickets.

This is what needs to be done. It is hellish, and I think we should invest a GSoC
project in trying to achieve it. Can you please reference the ticket here if
there is one?

> Consider using thrax2
> -
>
> Key: JOSHUA-335
> URL: https://issues.apache.org/jira/browse/JOSHUA-335
> Project: Joshua
>  Issue Type: Improvement
>  Components: thrax
>Reporter: Michael Wall
>Priority: Minor
>
> Ran across this https://github.com/jweese/thrax2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JOSHUA-335) Consider using thrax2

2018-09-06 Thread Lewis John McGibbney (JIRA)


[ 
https://issues.apache.org/jira/browse/JOSHUA-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606465#comment-16606465
 ] 

Lewis John McGibbney commented on JOSHUA-335:
-

[~mjwall] this is interesting. IIRC Thrax is always used when building LMs in
Joshua. Are you planning on taking this on?






Re: [DISCUSS] Graduation (was Re: Path to TLP)

2018-09-06 Thread lewis john mcgibbney
Hi Chris,

I am +1 on this; please carry my VOTE through. I am also eager for Joshua
to graduate, as we have been stagnating here for a while, the only
roadblock being for the mentors and PPMC to stand up and push the button.
Thanks for looping back in.
Lewis

On Thu, Sep 6, 2018 at 8:36 AM 
wrote:

>
> From: Chris Mattmann 
> To: "dev@joshua.incubator.apache.org" 
> Cc:
> Bcc:
> Date: Thu, 06 Sep 2018 08:36:20 -0700
> Subject: Re: [DISCUSS] Graduation (was Re: Path to TLP)
> Coming back to this.
>
> Sorry it took so long :/
>
> Here is a proposed graduation template. I will call for a VOTE on it
> by mid-next week once the discussion comes to consensus.
>
> WHEREAS, the Board of Directors deems it to be in the best
> interests of the Foundation and consistent with the
> Foundation's purpose to establish a Project Management
> Committee charged with the creation and maintenance of
> open-source software, for distribution at no charge to
> the public, related to statistical and other forms of machine
> translation.
>
> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> Committee (PMC), to be known as the "Apache Joshua Project",
> be and hereby is established pursuant to Bylaws of the
> Foundation; and be it further
>
> RESOLVED, that the Apache Joshua Project be and hereby is
> responsible for the creation and maintenance of software
> related to statistical and other forms of machine translation;
> and be it further
>
> RESOLVED, that the office of "Vice President, Apache Joshua" be
> and hereby is created, the person holding such office to
> serve at the direction of the Board of Directors as the chair
> of the Apache Joshua Project, and to have primary responsibility
> for management of the projects within the scope of
> responsibility of the Apache Joshua Project; and be it further
>
> RESOLVED, that the persons listed immediately below be and
> hereby are appointed to serve as the initial members of the
> Apache Joshua Project:
>
> * Tom Barber
> * Thamme Gowda
> * Felix Hieber
> * Lewis John McGibbney
> * Chris Mattmann
> * Matt Post
> * Paul Ramirez
> * Henry Saputra
> * Kellen Sunderland
> * Tommaso Teofili
>
> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matt Post
> be appointed to the office of Vice President, Apache Joshua to
> serve in accordance with and subject to the direction of the
> Board of Directors and the Bylaws of the Foundation until
> death, resignation, retirement, removal or disqualification,
> or until a successor is appointed; and be it further
>
> RESOLVED, that the initial Apache Joshua PMC be and hereby is
> tasked with the creation of a set of bylaws intended to
> encourage open development and increased participation in the
> Apache Joshua Project; and be it further
>
> RESOLVED, that the Apache Joshua Project be and hereby
> is tasked with the migration and rationalization of the Apache
> Incubator Joshua podling; and be it further
>
> RESOLVED, that all responsibilities pertaining to the Apache
> Incubator Joshua podling encumbered upon the Apache Incubator
> Project are hereafter discharged.
>
> Cheers,
> Chris
>
> From: Thamme Gowda
> Reply-To: "dev@joshua.incubator.apache.org" <dev@joshua.incubator.apache.org>
> Date: Saturday, February 3, 2018 at 7:51 PM
> To: "dev@joshua.incubator.apache.org"
> Subject: Re: [DISCUSS] Graduation (was Re: Path to TLP)
>
> Great news!
>
> 2018-02-01 19:48 GMT-08:00 Mattmann, Chris A (1761) <chris.a.mattm...@jpl.nasa.gov>:
>
> +1. I’ll draft the resolution and send it shortly for a community vote.
>
> > On Feb 1, 2018, at 7:22 PM, Tom Barber  wrote:
> >
> > I'd just like to dig this one back up. Seeing how Matt accepted the
> > proposal and there is action from Tommaso and Lewis to get stuff merged,
> > it seems like there is general consensus to get Joshua out of the
> > incubator.
> >
> > Tom
>

Fwd: FW: August 2018 Newsletter - LDC

2018-08-16 Thread lewis john mcgibbney
FYI

-- Forwarded message -
From: Mcgibbney, Lewis J (398M) 
Date: Thu, Aug 16, 2018 at 12:18 PM
Subject: FW: August 2018 Newsletter - LDC
To: lewis john mcgibbney 









*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Wednesday, August 15, 2018 at 8:09 AM
*To: *Penn LDC 
*Subject: *August 2018 Newsletter - LDC



*In this newsletter:*

*LDC at Interspeech 2018*

*Fall 2018 LDC Data Scholarship Program*

*New Publications:*

BOLT English SMS/Chat <https://catalog.ldc.upenn.edu/LDC2018T19>

CIEMPIESS Balance <https://catalog.ldc.upenn.edu/LDC2018S11>



2011 NIST Language Recognition Evaluation Test Set
<https://catalog.ldc.upenn.edu/LDC2018S06>





*LDC at Interspeech 2018*

LDC will participate in various ways at Interspeech 2018
<http://interspeech2018.org/index.html>, held this year in Hyderabad,
India, September 2-6. It is co-organizing the special session The First
DIHARD Speech Diarization Challenge
<https://coml.lscp.ens.fr/dihard/index.html> on September 3, and is a
sponsor of the September 1 pre-conference workshop, Young Female
Researchers in Speech Science & Technology
<https://sites.google.com/view/yfrsw2018/home> (YFRSW). Results of recent
work, “Global TIMIT: Acoustic Phonetic Datasets for the World’s Languages,”
will be presented during the poster session on September 3.

*Fall 2018 LDC Data Scholarship Program*

Students can apply for the Fall 2018 Data Scholarship Program now through
September 15, 2018. The LDC Data Scholarship program provides students with
access to LDC data at no cost. For more information on application
requirements and program rules, please visit LDC Data Scholarships
<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.




*New publications:*



(1) BOLT English SMS/Chat <https://catalog.ldc.upenn.edu/LDC2018T19> was
developed by LDC and consists of naturally-occurring Short Message Service
(SMS) and Chat (CHT) data collected through data donations and live
collection from native English speakers. The corpus contains 18,429
conversations totaling 3,674,802 words across 375,967 messages.

The BOLT <https://www.ldc.upenn.edu/collaborations/current-projects/bolt>
(Broad Operational Language Translation) program developed machine
translation and information retrieval for less formal genres, focusing
particularly on user-generated content. LDC supported the BOLT program by
collecting informal data sources -- discussion forums, text messaging, and
chat -- in Chinese, Egyptian Arabic, and English. The collected data was
translated and annotated for various tasks including word alignment,
treebanking, propbanking, and co-reference.

BOLT English SMS/Chat is available via web download.



2018 Subscription Members will receive copies of this corpus. 2018 Standard
Members may request a copy as part of their 16 free membership corpora.
Non-members may license this data for US $1750.



*



(2) CIEMPIESS Balance <https://catalog.ldc.upenn.edu/LDC2018S11> (Corpus de
Investigación en Español de México del Posgrado de Ingeniería Eléctrica y
Servicio Social) was developed by the Development of Speech Technologies
program at the School of Engineering <http://www.ingenieria.unam.mx> at the
National Autonomous University of Mexico <http://www.unam.mx/> (UNAM) and consists
of approximately 18 hours of Mexican Spanish broadcast speech with
associated transcripts. The goal of this work was to create acoustic models
for automatic speech recognition. For more information and documentation
see the CIEMPIESS-UNAM Project website <http://www.CIEMPIESS.org/>.



CIEMPIESS Balance is a companion corpus to CIEMPIESS Light, released by LDC
as LDC2017S23 <https://catalog.ldc.upenn.edu/LDC2017S23>. It was developed
so that the data sets together constitute a gender-balanced corpus. The
gender breakdown in CIEMPIESS Light is approximately 75% male and 25%
female. In CIEMPIESS Balance, the gender breakdown is approximately 25%
male and 75% female.



The majority of the speech recordings were collected from Radio-IUS
<http://www.derecho.unam.mx/cultura-juridica/radio.php>, a UNAM radio
station. Other recordings were taken from IUS Canal Multimedia
<https://www.youtube.com/user/DEDUNAM/videos> and Centro Universitario de
Estudios Jurídicos
<https://www.youtube.com/channel/UCTxkzdUd0tiXT5BN5o6Xo-A/video

Re: Apache Joshua Implementation

2018-07-03 Thread Lewis John Mcgibbney
Hi Rosie,
Replies inline

On Tue, Jul 3, 2018 at 6:41 AM, rosie.ole...@baesystems.com <
rosie.ole...@baesystems.com> wrote:

>
>
> Our goal is to use Joshua to translate large documents with a reasonable
> balance of speed and accuracy.
>

The terms 'reasonable speed' and 'accuracy' should be defined more precisely,
as there are of course tradeoffs between them. Both are highly configurable,
depending on how the language model(s) within the SMT system are generated and used.


> We want to integrate this into the rest of our platform, so we are
> developing a REST API to wrap around the functionality.
>

Sounds good.


>
>
> Our solution is to create a Node.js Express API and call Joshua. We have
> narrowed this down to two possibilities: running Joshua commands through
> the command line, or running Joshua as an HTTP server and formatting the input
> document content into sentences to send to the Joshua REST endpoint.
>

There is an existing Python implementation demonstrating how this could be
done:
https://github.com/joshua-decoder/joshua_translation_engine


>
>
> With both options we have had a number of issues.
>
>
>
> Firstly, running the commands through the command line.
>
> · The documentation is specific to Linux and bash terminals,
> whereas we want to apply the functionality on a Windows operating system,
> so we can only run bash scripts through a git-bash terminal, which has been
> difficult to integrate into a Node.js module. We are especially having issues
> running the prepare.sh script. Do you have any solutions for running
> this script, or for mimicking what it does, from the Windows command line?
>
Absolutely none whatsoever; I have not used Windows for many years. I
would strongly suggest that you run Joshua as a service.

>
>
> Secondly, running Joshua as an HTTP server.
>
>
>
> · The documentation for using the Joshua live server suggests that,
> to translate text, the content must be manually broken down into sentences,
> with a separate HTTP GET request sent to the server for each sentence and
> the sentence passed in the URL. Is there any functionality in Joshua that
> handles the translation of a large block of text?
>
No. AFAIK, input is processed sentence by sentence.
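
A minimal sketch of the client-side step described above, assuming Joshua processes its input one sentence at a time: split a block of text into sentences and build one GET URL per sentence. The server address `http://localhost:5674/` and the `q` query parameter are hypothetical placeholders, not confirmed by this thread; check them against the server documentation for your Joshua version or language pack. The regex splitter is deliberately naive and will mis-split abbreviations such as "Dr.".

```python
import re
from urllib.parse import urlencode

# Hypothetical server address and query parameter name: assumptions for
# illustration only; verify them against your Joshua server documentation.
JOSHUA_URL = "http://localhost:5674/"
QUERY_PARAM = "q"

# Naive sentence boundary: any whitespace run that follows '.', '!' or '?'.
# It will mis-split abbreviations such as "Dr." and is only a sketch.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split a block of text into stripped, non-empty sentences."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def request_urls(text, base_url=JOSHUA_URL):
    """Build one GET URL per sentence, with the sentence URL-encoded."""
    return [
        base_url + "?" + urlencode({QUERY_PARAM: sentence})
        for sentence in split_sentences(text)
    ]


if __name__ == "__main__":
    document = "Hola mundo. ¿Cómo estás?\nMuy bien!"
    for url in request_urls(document):
        print(url)
```

Each URL could then be fetched in order (for example with `urllib.request.urlopen`) and the translated sentences joined back together; for production use, a language-aware sentence segmenter would be a safer choice than the regex.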

>
>
> Overall, we understand it will be more efficient and faster to run Joshua
> as an HTTP server, ideally with multiple languages, especially since we are
> having problems running the tokenise and normalise scripts through the
> command line. Do you have an idea of which method is best for implementing
> Joshua?
>

See above.


>
>
> Alternatively, we have an idea of interacting directly with the methods in
> the jar file, but we can’t find any documentation on using it. Do you have
> any insight on this?
>

You shouldn't need to do this; provisioning Joshua-as-a-service will enable
all of the functionality you require. Each language pack already provides
this as well. See
https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs


>
>
>
> Finally what is the status of Joshua in the Apache Incubator?
>

We are in the process of graduating as a top level project.


> Is it still being developed and supported?
>

Yes, Joshua is being developed and maintained by the existing community.

Please keep the questions coming.
Lewis


Re: Help

2018-07-02 Thread lewis john mcgibbney
Hi Folks,
Please CC dev@joshua in correspondence.
Thanks
Lewis

On Mon, Jul 2, 2018 at 11:58 Smith, Adam 
wrote:

> Kevin / Rosie / Tom,
>
> Lewis and the Joshua incubator team have kindly agreed to help us with our
> Apache Joshua implementation.
>
> Tomorrow morning, can we list our questions in a batch and send them?
>
> Lewis, please look out for some questions by this time tomorrow. The team
> are struggling to configure AJ correctly for a microservice architecture.
> We may need a VC or call. We are in the UK, so I am assuming
> there is a time difference.
>
> Many thanks,
> Adam S
>
-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Fwd: FW: June 2018 Newsletter - LDC

2018-06-20 Thread lewis john mcgibbney
-- Forwarded message --
From: Mcgibbney, Lewis J (398M) 
Date: Tue, Jun 19, 2018 at 3:34 PM
Subject: FW: June 2018 Newsletter - LDC
To: lewis john mcgibbney 









*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Monday, June 18, 2018 at 8:09 AM
*To: *Penn LDC 
*Subject: *June 2018 Newsletter - LDC



*In this newsletter:*

*LDC Catalog certified as CoreTrustSeal data repository*

*LDC data and commercial technology development*

*New Publications:*

*BOLT Chinese SMS/Chat* <https://catalog.ldc.upenn.edu/LDC2018T15>

*Multi-Language Conversational Telephone Speech 2011 -- Central European*
<https://catalog.ldc.upenn.edu/LDC2018S08>

*TAC KBP English Entity Linking - Comprehensive Training and Evaluation
Data 2009-2013* <https://catalog.ldc.upenn.edu/LDC2018T16>

*IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b*
<https://catalog.ldc.upenn.edu/LDC2018S07>

__

*LDC Catalog certified as CoreTrustSeal data repository*

LDC is pleased to announce that the Catalog <https://catalog.ldc.upenn.edu/>
has been awarded the CoreTrustSeal <https://www.coretrustseal.org/> for
recognition as a trustworthy data repository. This means that the Catalog
meets a series of standards covering data access, rights management,
curation, and storage developed by the ICSU World Data System and the Data
Seal of Approval. LDC joins the other 136 certified repositories around the
globe in the commitment to promote sustainable and trustworthy data
infrastructures.

*LDC data and commercial technology development*

For-profit organizations are reminded that an LDC membership is a
pre-requisite for obtaining a commercial license to almost all LDC
databases. Non-member organizations, including non-member for-profit
organizations, cannot use LDC data to develop or test products for
commercialization, nor can they use LDC data in any commercial product or
for any commercial purpose. LDC data users should consult corpus-specific
license agreements for limitations on the use of certain corpora. Visit the
Licensing <https://www.ldc.upenn.edu/data-management/using/licensing> page
for further information.


___


*New publications:*

(1) *BOLT Chinese SMS/Chat* <https://catalog.ldc.upenn.edu/LDC2018T15> was
developed by LDC and consists of naturally-occurring Short Message Service
(SMS) and Chat (CHT) data collected through data donations and live
collection involving native speakers of Chinese. The corpus contains 14,877
conversations totaling 3,005,810 words across 497,543 messages.

The BOLT <https://www.ldc.upenn.edu/collaborations/current-projects/bolt> (Broad
Operational Language Translation) program developed machine translation and
information retrieval for less formal genres, focusing particularly on
user-generated content. LDC supported the BOLT program by collecting
informal data sources – discussion forums, text messaging, and chat – in
Chinese, Egyptian Arabic, and English. The collected data was translated
and annotated for various tasks including word alignment, treebanking,
propbanking, and co-reference. The data in this release was collected using
two methods: new collection via LDC's collection platform, and donation of
SMS or chat archives from BOLT collection participants.

BOLT Chinese SMS/Chat is distributed via web download.

2018 Subscription Members will automatically receive copies of this corpus.
2018 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for US $1750.



*



(2) *Multi-Language Conversational Telephone Speech 2011 -- Central
European* <https://catalog.ldc.upenn.edu/LDC2018S08> was developed by LDC
and consists of approximately 44 hours of telephone speech in two
distinct language varieties of Central Europe: Czech and Slovak.

The data were collected primarily to support research and technology
evaluation in automatic language identification, and portions of these
telephone cal

[RESULT] WAS Re: [VOTE] Graduate the Apache Joshua (Incubating) Project

2018-05-01 Thread lewis john mcgibbney
Hi Folks,
72 hours has come and gone. I am closing off this thread. Thank you to
everyone that VOTE'd. RESULT is below

[7] +1 Graduate the Apache Joshua (Incubating) Project
Rajesh Dharmadhikari
Lewis John McGibbney*
Tommaso Teofili*
Thamme Gowda*
Tom Barber*
Chris A. Mattmann*
kellen sunderland*

[0] -1 DO NOT Graduate the Apache Joshua (Incubating) Project... please
provide reasoning

* Joshua PPMC Binding

The VOTE therefore passes :)
I'll continue with the Graduation as described in the hyperlink below.
Thanks
Lewis

On Tue, Apr 24, 2018 at 10:02 PM, lewis john mcgibbney 
wrote:

> Hi Folks,
> I would like to open a VOTE for graduating the Apache Joshua (Incubating)
> project.
> For those that are interested, the Incubator guidelines on graduation can
> be found at [0].
> Joshua has been reporting to the IPMC since 16th March 2016 and made one
> Incubating release.
>
> Joshua Basics
>
>- Podling Proposal <http://wiki.apache.org/incubator/JoshuaProposal>
>- Status: current
>- Established: 2016-02-13
>- Incubating for 802 days
>- Prior Board Reports <https://whimsy.apache.org/board/minutes/Joshua>
>
> There are a few issues to resolve before drafting the graduation
> resolution however this community VOTE is timely. The VOTE will be open at
> least 72 hours and will pass if 3 +1's are received from the Joshua PPMC.
>
> [ ] +1 Graduate the Apache Joshua (Incubating) Project
> [ ] -1 DO NOT Graduate the Apache Joshua (Incubating) Project... please
> provide reasoning
>
> P.S. Here is my binding +1
>
> [0] https://incubator.apache.org/guides/graduation.html#the_graduation_process
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>





[REPORT]

2018-05-01 Thread lewis john mcgibbney
Joshua

Joshua is a statistical machine translation toolkit.

Joshua has been incubating since 2016-02-13.

Three most important issues to address in the move towards graduation:

  1. Complete the graduation process.
  2. Further identify specific use cases that Joshua might excel at.
  3. Continue to attract active developers and users.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?

The Joshua community has VOTE'd on graduation with favorable results.
https://s.apache.org/dk8M

How has the community developed since the last report?

  1. Jeff Zemerick and Suneel Marthi presented 'Embracing Diversity: Searching
     over multiple languages' at Haystack Conf, Charlottesville VA, using
     Apache Joshua, Apache NiFi and Apache OpenNLP, on April 10, 2018.
     https://smarthi.github.io/haystack-embracing-diversity-searching-over-multiple-languages/#/
  2. Suneel Marthi and Kellen Sunderland presented Streaming Pipelines for
     Neural Machine Translation using Apache Joshua, Apache Flink and Apache
     OpenNLP at DataWorks Summit, Berlin, on April 19, 2018.
     https://smarthi.github.io/DSW-Berlin18-Streaming-NMT/#/

How has the project developed since the last report?

The community has been engaged with the graduation VOTE.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

  [X] Initial setup
  [X] Working towards first release
  [X] Community building
  [X] Nearing graduation
  [ ] Other:

Date of last release:

  2017-06-22

When were the last committers or PPMC members elected?

  - 2016-11-16 Michael A. Hedderich (mhedderich) joins the Joshua PPMC + Committership.
  - 2016-11-16 Tobias Domhan (tdomhan) joins the Joshua PPMC + Committership.
  - 2016-11-02 Max Thomas (mthomas) joins the Joshua PPMC + Committership.

Signed-off-by:

  [ ](joshua) Paul Ramirez
     Comments:
  [X](joshua) Lewis John McGibbney
     Comments:
  [ ](joshua) Chris Mattmann
     Comments:
  [ ](joshua) Tom Barber
     Comments:





Re: [VOTE] Graduate the Apache Joshua (Incubating) Project

2018-04-24 Thread lewis john mcgibbney
Thank you for the reply, Rajesh; your VOTE will be registered.

Lewis

On Tue, Apr 24, 2018 at 11:06 PM, Rajesh Dharmadhikari <
rajes...@techmahindra.com> wrote:

> Hello Lewis,
>
>
>
> I am a user of Joshua. I used Joshua for proof-of-concept (POC) work.
>
> I agree with your proposal.
>
>
>
> I could not find a way to vote, but I support this proposal.
>
>
>
> Thanks,
>
> Rajesh Dharmadhikari
>
>
>
> *From:* lewis john mcgibbney [mailto:lewi...@apache.org]
> *Sent:* Wednesday, April 25, 2018 10:32 AM
> *To:* dev@joshua.incubator.apache.org; u...@joshua.incubator.apache.org
> *Subject:* [VOTE] Graduate the Apache Joshua (Incubating) Project
>
>
>
> Hi Folks,
>
> I would like to open a VOTE for graduating the Apache Joshua (Incubating)
> project.
>
> For those that are interested, the Incubator guidelines on graduation can
> be found at [0].
>
> Joshua has been reporting to the IPMC since 16th March 2016 and made one
> Incubating release.
>
> Joshua Basics
>
>- Podling Proposal <http://wiki.apache.org/incubator/JoshuaProposal>
>- Status: current
>- Established: 2016-02-13
>- Incubating for 802 days
>- Prior Board Reports <https://whimsy.apache.org/board/minutes/Joshua>
>
> There are a few issues to resolve before drafting the graduation
> resolution however this community VOTE is timely. The VOTE will be open at
> least 72 hours and will pass if 3 +1's are received from the Joshua PPMC.
>
> [ ] +1 Graduate the Apache Joshua (Incubating) Project
>
> [ ] -1 DO NOT Graduate the Apache Joshua (Incubating) Project... please
> provide reasoning
>
>
>
> P.S. Here is my binding +1
>
>
> [0] https://incubator.apache.org/guides/graduation.html#the_graduation_process
>
>
> --
>
> http://home.apache.org/~lewismc/
>
> http://people.apache.org/keys/committer/lewismc
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[VOTE] Graduate the Apache Joshua (Incubating) Project

2018-04-24 Thread lewis john mcgibbney
Hi Folks,
I would like to open a VOTE for graduating the Apache Joshua (Incubating)
project.
For those that are interested, the Incubator guidelines on graduation can
be found at [0].
Joshua has been reporting to the IPMC since 16th March 2016 and has made one
incubating release.

Joshua Basics

   - Podling Proposal <http://wiki.apache.org/incubator/JoshuaProposal>
   - Status: current
   - Established: 2016-02-13
   - Incubating for 802 days
   - Prior Board Reports <https://whimsy.apache.org/board/minutes/Joshua>

There are a few issues to resolve before drafting the graduation resolution;
however, this community VOTE is timely. The VOTE will be open for at least 72
hours and will pass if three +1 votes are received from the Joshua PPMC.

[ ] +1 Graduate the Apache Joshua (Incubating) Project
[ ] -1 Do NOT graduate the Apache Joshua (Incubating) Project... please
provide reasoning

P.S. Here is my binding +1

[0]
https://incubator.apache.org/guides/graduation.html#the_graduation_process


-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Fwd: FW: April 2018 Newsletter - LDC

2018-04-18 Thread lewis john mcgibbney
-- Forwarded message --
From: Mcgibbney, Lewis J (398M) 
Date: Mon, Apr 16, 2018 at 9:57 AM
Subject: FW: April 2018 Newsletter - LDC
To: lewis john mcgibbney 






Dr. Lewis John McGibbney Ph.D., B.Sc.

Data Scientist II

Computer Science for Data Intensive Applications Group (398M)

Instrument Software and Science Data Systems Section (398)

Jet Propulsion Laboratory

California Institute of Technology

4800 Oak Grove Drive

Pasadena, California 91109-8099

Mail Stop : 158-256C

Tel:  (+1) (818)-393-7402

Cell: (+1) (626)-487-3476

Fax:  (+1) (818)-393-1190

Email: lewis.j.mcgibb...@jpl.nasa.gov

ORCID: orcid.org/-0003-2185-928X






 Dare Mighty Things

*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Friday, April 13, 2018 at 7:47 AM
*To: *Penn LDC 
*Subject: *April 2018 Newsletter - LDC






*In this newsletter: *

*LDC at ICASSP 2018*

*LDC at the Philadelphia Science Carnival*

*New Publications:*

*Concretely Annotated New York Times*
<https://catalog.ldc.upenn.edu/LDC2018T12>

*H2, E2, ERK1 Children's Writing* <https://catalog.ldc.upenn.edu/LDC2018T05>

*TRAD Arabic-French Parallel Text -- Newsgroup*
<https://catalog.ldc.upenn.edu/LDC2018T13>

__

*LDC at ICASSP 2018*

LDC will be exhibiting at ICASSP 2018, held this year April 15-20 in
Calgary, Canada. Stop by booth B2 to learn more about recent developments
at the Consortium and new publications.

Also, be on the lookout for the following presentations featuring LDC work:


*Enhancement and Analysis of Conversational Speech: JSALT 2017*
Tuesday, April 17, 16:00 - 18:00
Session: Speech Analysis

*Leveraging LSTM Models for Overlap Detection in Multi-Party Meetings*
Wednesday, April 18, 13:30 - 15:30
Session: Speaker Diarization & Identification


*A Novel LSTM-based Speech Preprocessor for Speaker Diarization in
Realistic Mismatch Conditions*
Wednesday, April 18, 13:30 - 15:30
Session: Speaker Diarization & Identification

LDC will post conference updates via our Twitter feed and Facebook page. We
hope to see you there!


*LDC at the Philadelphia Science Carnival*

LDC will share the fun of language with the community on Saturday, April
28, with a booth at the Philadelphia Science Carnival
<https://www.fi.edu/psf/science-carnival>. Visitors will enjoy three
language-oriented educational activities, including a language
identification game and Chinese character recognition.

The Philadelphia Science Carnival is an annual event organized by
Philadelphia’s Franklin Institute to acquaint children and adults with the
joys of science.


___


*New Publications:*



(1) Concretely Annotated New York Times
<https://catalog.ldc.upenn.edu/LDC2018T12> was developed by Johns Hopkins
University's Human Language Technology Center of Excellence
<http://hltcoe.jhu.edu/>. It adds multiple kinds and instances of
automatically-generated syntactic, semantic, and coreference annotations to
The New York Times Annotated Corpus (LDC2008T19
<https://catalog.ldc.upenn.edu/LDC2008T19>). Concrete
<http://hltcoe.github.io/> is a schema for representing structured,
hierarchical, and overlapping linguistic annotations. This release provides
multiple tool outputs producing the same annotation types as different
annotation theories under a shared tokenization. Concretely Annotated New
York Times contains all of the 1.8 million articles in The New York Times
Annotated Corpus.

Concretely Annotated New York Times is distributed via hard drive.

2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora. Any organization that licensed The New York Times Annotated Corpus
(LDC2008T19) may request a copy of Concretely Annotated New York Times
(LDC2018T12) for a $250 media fee.  Non-members may license this data for
$300.


*



(2) H2, E2, ERK1 Children's Writing
<https://catalog.ldc.upenn.edu/LDC2018T05> was developed by the Cooperative
State University Baden-Württemberg, University of Education
<http://www.dhbw.de/english/dhbw/about-us.html>. It consists of
approximately 2,000 texts written over four months by 173 German school
children aged six through eleven. The data in this corpus was
collected by elementary schools in Baden-Württemberg, Germany, and
digitized at the Cooperative State University during the 2016/2017 school
year.

Establish whether "Apache Joshua" is a suitable name

2018-03-05 Thread lewis john mcgibbney
Hi Folks,
I've been working with a few folks on PODLINGNAMESEARCH-97 [0]. This is
essentially to determine whether the Joshua name can be trademarked and
used going forward, should we, as a PPMC/PMC, wish to do so.
Can folks please post their opinions on that thread? Once they do, we can move
on.
Thank you
Lewis

[0] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-97

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[REPORT] Apache Joshua (Incubating)

2018-02-07 Thread lewis john mcgibbney
Joshua

Joshua is a statistical machine translation toolkit.

Joshua has been incubating since 2016-02-13.

Three most important issues to address in the move towards graduation:

  1. Draft the graduation resolution.
  2. Identify specific use cases that Joshua might excel at.
  3. Attract active developers and users.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

The Joshua PPMC is in the process of moving towards an initial graduation
resolution draft.

How has the community developed since the last report?

The PPMC has decided not to use the 7.X branch as the new master. The current
master branch has more features, and we have decided to use it as the basis
for moving forward.

How has the project developed since the last report?

New work is going into updating the Homebrew Formula. This will make
Joshua available with some 50 or so language packs. Essentially, this
will make Joshua the most comprehensively packaged open source machine
translation library available.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

  [X] Initial setup
  [X] Working towards first release
  [X] Community building
  [X] Nearing graduation
  [ ] Other:

Date of last release:

  2017-06-22

When were the last committers or PPMC members elected?

  - 2016-11-16 Michael A. Hedderich (mhedderich) joins the Joshua PPMC +
    Committership.
  - 2016-11-16 Tobias Domhan (tdomhan) joins the Joshua PPMC + Committership.
  - 2016-11-02 Max Thomas (mthomas) joins the Joshua PPMC + Committership.

Signed-off-by:

  [ ](joshua) Paul Ramirez
     Comments:
  [X](joshua) Lewis John McGibbney
     Comments:
  [ ](joshua) Chris Mattmann
     Comments:
  [ ](joshua) Tom Barber
     Comments:



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[jira] [Commented] (JOSHUA-334) Update Homebrew Formular with all language pack options

2018-02-02 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351304#comment-16351304
 ] 

Lewis John McGibbney commented on JOSHUA-334:
-

Progress can be seen at
https://github.com/lewismc/homebrew-core/tree/joshua_language_packs
There is still a lot of SHA256 calculation and remote URL resolution for
Dropbox to do, but we are getting there.

> Update Homebrew Formular with all language pack options
> ---
>
> Key: JOSHUA-334
> URL: https://issues.apache.org/jira/browse/JOSHUA-334
> Project: Joshua
>  Issue Type: Improvement
>  Components: homebrew-formula, language packs
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 6.2
>
>
> When I originally wrote the [Homebrew 
> Formula|https://github.com/Homebrew/homebrew-core/blob/00eea5b204b069416142352ca24314c024f5d6c7/Formula/joshua.rb#L18-L20],
>  I added options for installing the old *with-es-en-phrase-pack*, 
> *with-ar-en-phrase-pack* and *with-zh-en-hiero-pack* language packs.
> Back then, these were staged on Matt's server at Johns Hopkins, but they have
> since been relocated to Tom's Dropbox. Additionally, we now have a wealth of
> other language packs which are not currently available through the Formula.
> This issue is pretty large in scope, but in essence will update the Formula 
> to provide options for installing [all of our language 
> packs|https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs].
> Once this is done, it will be very powerful and extremely useful tooling for 
> Joshua.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (JOSHUA-334) Update Homebrew Formular with all language pack options

2018-02-02 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created JOSHUA-334:
---

 Summary: Update Homebrew Formular with all language pack options
 Key: JOSHUA-334
 URL: https://issues.apache.org/jira/browse/JOSHUA-334
 Project: Joshua
  Issue Type: Improvement
  Components: homebrew-formula, language packs
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 6.2


When I originally wrote the [Homebrew 
Formula|https://github.com/Homebrew/homebrew-core/blob/00eea5b204b069416142352ca24314c024f5d6c7/Formula/joshua.rb#L18-L20],
 I added options for installing the old *with-es-en-phrase-pack*, 
*with-ar-en-phrase-pack* and *with-zh-en-hiero-pack* language packs.
Back then, these were staged on Matt's server at Johns Hopkins, but they have
since been relocated to Tom's Dropbox. Additionally, we now have a wealth of
other language packs which are not currently available through the Formula.
This issue is pretty large in scope, but in essence will update the Formula to 
provide options for installing [all of our language 
packs|https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs].
Once this is done, it will be very powerful and extremely useful tooling for 
Joshua.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (JOSHUA-328) failure when glue grammar is listed first

2018-02-02 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-328:

Fix Version/s: (was: 6.1)
   6.2

> failure when glue grammar is listed first
> -
>
> Key: JOSHUA-328
> URL: https://issues.apache.org/jira/browse/JOSHUA-328
> Project: Joshua
>  Issue Type: Bug
>Affects Versions: 6.1
>Reporter: Matt Post
>Priority: Major
> Fix For: 6.2
>
>
> If doing CKY-decoding (-search cky), listing the glue grammar before the 
> packed grammar results in a parsing failure. E.g., the following lines in the 
> config file:
> tm = thrax -maxspan -1 -owner glue -path model/glue.grammar
> tm = thrax -maxspan 20 -path model/grammar.packed -owner pt
> will result in failed decoding every time, and a printing of the following 
> error message:
> ERROR - the goal_bin does not have exactly one item
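A possible workaround, sketched here only under the assumption (implied by the report, not confirmed in it) that the failure depends solely on the order of the `tm` lines: list the packed grammar first and the glue grammar second. The flags and paths are copied unchanged from the report; only the line order differs.

```
tm = thrax -maxspan 20 -path model/grammar.packed -owner pt
tm = thrax -maxspan -1 -owner glue -path model/glue.grammar
```

This does not fix the underlying bug; it only avoids the ordering the report identifies as failing under `-search cky`.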



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: version 3 language pack

2018-01-30 Thread lewis john mcgibbney
Hi Cameron,
Replies inline

On Mon, Jan 29, 2018 at 8:20 PM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: Cameron 
> To: d...@joshua.apache.org
> Cc:
> Bcc:
> Date: Wed, 24 Jan 2018 23:05:55 +
> Subject: version 3 language pack
> Hi
>
> I have been using the German-English version 2 language pack and think it is
> great.
>

Excellent. Thank you for the feedback.


>
> My computer is quite slow; can you give me an indication of the time taken
> to translate the example file on a reasonably fast machine?
>

As you might guess, this depends on a number of variables, e.g.
1) size of the input
2) available RAM on the machine
3) size of the model
4) etc.
Which example file are you referring to? Is there one included with the
language pack?


>
> When will the version 3 pack be released for this language combination?
>
>
>
Not any time soon. Unless more community contributions come in, this
language pack can be considered 'final'.
Lewis


Fwd: FW: January 2018 Newsletter - LDC

2018-01-16 Thread lewis john mcgibbney
FYI folks.

-- Forwarded message --
From: Mcgibbney, Lewis J (398M) 
Date: Tue, Jan 16, 2018 at 9:09 AM
Subject: FW: January 2018 Newsletter - LDC
To: lewis john mcgibbney 






Dr. Lewis John McGibbney Ph.D., B.Sc.

Data Scientist II

Computer Science for Data Intensive Applications Group (398M)

Instrument Software and Science Data Systems Section (398)

Jet Propulsion Laboratory

California Institute of Technology

4800 Oak Grove Drive

Pasadena, California 91109-8099

Mail Stop : 158-256C

Tel:  (+1) (818)-393-7402

Cell: (+1) (626)-487-3476

Fax:  (+1) (818)-393-1190

Email: lewis.j.mcgibb...@jpl.nasa.gov

ORCID: orcid.org/-0003-2185-928X






 Dare Mighty Things



*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Tuesday, January 16, 2018 at 8:15 AM
*To: *Penn LDC 
*Subject: *January 2018 Newsletter - LDC



*In this newsletter: *

*Membership Discounts for MY2018 Still Available*

*New Publications:*

DEFT Spanish Treebank <https://catalog.ldc.upenn.edu/LDC2018T01>

DIRHA English WSJ Audio <https://catalog.ldc.upenn.edu/LDC2018S01>

TRAD Chinese-French Parallel Text – Blog
<https://catalog.ldc.upenn.edu/LDC2018T02>




___



*Membership Discounts for MY2018 Still Available*

Join LDC while membership savings are still available. Now through March 1,
2018, renewing MY2017 members will receive a 10% discount off the
membership fee. New or non-consecutive member organizations will receive a
5% discount. Membership remains the most economical way to access LDC
releases. This year’s planned publications include Multilanguage
Conversational Telephone Speech, IARPA Babel Language Packs (telephone
speech and transcripts), DIRHA (Distant-speech Interaction for Robust Home
Applications), TRAD (Chinese-French and Arabic-French parallel text), data
from BOLT, DEFT, LORELEI, RATS and TAC KBP, and more. Browse the Members
<https://www.ldc.upenn.edu/members/join-ldc> pages for details on
membership options and benefits.


___


*New Publications:*



(1) DEFT Spanish Treebank <https://catalog.ldc.upenn.edu/LDC2018T01> was
developed by LDC and the Language and Computation Center (CLiC), University
of Barcelona <http://clic.ub.edu/>. It contains treebank annotation of
international Spanish newswire text and Latin American Spanish discussion
forum data created for the DARPA Deep Exploration and Filtering of Text
(DEFT) program. DEFT Spanish Treebank supported the program's goal of deep
natural language understanding.



Newswire source files were selected from Spanish Gigaword Third Edition (
LDC2011T12 <https://catalog.ldc.upenn.edu/ldc2011t12>) and were manually
sentence-segmented for DEFT. Discussion forum source files were selected
from Spanish discussion forum source data collected by LDC, consisting of
continuous multi-posts of 100-1000 words.



This release contains 114 files (54,394 tokens) of newswire data and 60
files (55,307 tokens) of discussion forum data all of which were annotated
with constituents and syntactic functions.

DEFT Spanish Treebank is distributed via web download.



2018 Subscription Members will receive copies of this corpus. 2018 Standard
Members may request a copy as part of their 16 free membership corpora.
Non-members may license this data for US $1000.



*



(2) DIRHA English WSJ Audio <https://catalog.ldc.upenn.edu/LDC2018S01> was
developed as part of the Distant-Speech Interaction for Robust Home
Applications (DIRHA) Project <https://dirha.fbk.eu/>, which addressed
natural spontaneous speech interaction with distant microphones in a
domestic environment. It comprises approximately 85 hours of real and
simulated read speech by six native American English speakers. The target
utterances were taken from CSR-I (WSJ0) Complete (LDC93S6A
<https://catalog.ldc.upenn.edu/LDC93S6A/>), specifically the 5,000-word
subset of read speech from Wall Street Journal news text.



Speech was collected in a real apartment setting with typical domestic
background noise and inter/intra-room reverberation effects. Annotations,
speaker metadata and images of the apartment setting are also included.



DIRHA English WSJ Audio is distributed via web download.



2018 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2018
Standard Members may request a copy as part of their 16 free membership
corpora.

[jira] [Commented] (JOSHUA-332) Merge 7 branch into master

2018-01-10 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320535#comment-16320535
 ] 

Lewis John McGibbney commented on JOSHUA-332:
-

If this is an entire PITA, I would just leave it and close it as not an issue.

>  Merge 7 branch into master
> ---
>
> Key: JOSHUA-332
> URL: https://issues.apache.org/jira/browse/JOSHUA-332
> Project: Joshua
>  Issue Type: Task
>  Components: core
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 7
>
>
> As discussed on the mailing list, let's branch _master_ into a _6x_ branch 
> and merge branch _7_ into _master_ in order to keep developing on top of the 
> latest in the main branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (JOSHUA-333) The English-English Language Pack download links are broken.

2018-01-05 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313425#comment-16313425
 ] 

Lewis John McGibbney commented on JOSHUA-333:
-

[~bugg_tb] were these files copied when we migrated from [~post]'s server to 
Dropbox?

> The English-English Language Pack download links are broken.
> 
>
> Key: JOSHUA-333
> URL: https://issues.apache.org/jira/browse/JOSHUA-333
> Project: Joshua
>  Issue Type: Bug
>Reporter: David Gonzalez
>
> On the Apache Joshua English-English wiki page the ruleset (PPDB v2) 
> downloads are all broken (404).
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65142863



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[REPORT]

2017-10-31 Thread lewis john mcgibbney
Hi Folks,
The report is below; please check and augment it if you have time.

https://wiki.apache.org/incubator/November2017
Lewis

Joshua

Joshua is a statistical machine translation toolkit.

Joshua has been incubating since 2016-02-13.

Three most important issues to address in the move towards graduation:

  1. Make another Apache Joshua incubating release (7.0.0).
  2. Identify specific use cases that Joshua might excel at.
  3. Attract active developers and users.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

  None

How has the community developed since the last report?

The community mailing lists continue to see a few new users requiring
assistance with using the project. Questions have been answered in
reasonable time. Community building continues to be the biggest
challenge for Joshua right now.

How has the project developed since the last report?

The PPMC is actively working towards a 7.0.0 release candidate, which will
essentially be a re-modularization of the Joshua source code for
distribution and, hopefully, improved consumption as a dependency within
other projects. We have seen a few pull requests logged specifically
addressing source code formatting; this is a positive.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

  [X] Initial setup
  [X] Working towards first release
  [X] Community building
  [X] Nearing graduation
  [ ] Other:

Date of last release:

  2017-06-22

When were the last committers or PPMC members elected?

  - 2016-11-16 Michael A. Hedderich (mhedderich) joins the Joshua PPMC +
    Committership.
  - 2016-11-16 Tobias Domhan (tdomhan) joins the Joshua PPMC + Committership.
  - 2016-11-02 Max Thomas (mthomas) joins the Joshua PPMC + Committership.

Signed-off-by:

  [ ](joshua) Paul Ramirez
     Comments:
  [X](joshua) Lewis John McGibbney
     Comments: I suspect that the Joshua PPMC will move towards proposing
     that the project graduate after the 7.0.0 release. I am working with
     Tommaso to progress the 7.X branch merge into mainstream development.
  [ ](joshua) Chris Mattmann
     Comments:
  [ ](joshua) Tom Barber
     Comments:

IPMC/Shepherd notes:


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


[jira] [Commented] (JOSHUA-332) Merge 7 branch into master

2017-10-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220815#comment-16220815
 ] 

Lewis John McGibbney commented on JOSHUA-332:
-

Damn, Tommaso. Is there still a lot of work to do?

>  Merge 7 branch into master
> ---
>
> Key: JOSHUA-332
> URL: https://issues.apache.org/jira/browse/JOSHUA-332
> Project: Joshua
>  Issue Type: Task
>  Components: core
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 7
>
>
> As discussed on the mailing list, let's branch _master_ into a _6x_ branch 
> and merge branch _7_ into _master_ in order to keep developing on top of the 
> latest in the main branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (JOSHUA-332) Merge 7 branch into master

2017-10-25 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219517#comment-16219517
 ] 

Lewis John McGibbney commented on JOSHUA-332:
-

[~teofili] I see that your recent [mailing list 
discussion|https://lists.apache.org/thread.html/b43cdffd8f3ea7b7c70929eed4aaa989af31bcdc5b5e8320ff412dd4@%3Cdev.joshua.apache.org%3E]
may not have been resolved yet. Is this preventing the replacement of the
current master with the 7 branch?
Thanks

>  Merge 7 branch into master
> ---
>
> Key: JOSHUA-332
> URL: https://issues.apache.org/jira/browse/JOSHUA-332
> Project: Joshua
>  Issue Type: Task
>  Components: core
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 7
>
>
> As discussed on the mailing list, let's branch _master_ into a _6x_ branch 
> and merge branch _7_ into _master_ in order to keep developing on top of the 
> latest in the main branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] Graduation (was Re: Path to TLP)

2017-10-06 Thread lewis john mcgibbney
Hi Folks,
In my limited experience at Apache ;)
I've come to notice that communities, and therefore projects, remain useful
far beyond whatever current industry or academia happens to be focused on.
Examples are all over the place, but my own experience stems from my
involvement with the Apache Nutch project. Key inventors of that software
moved on to Hadoop and goodness knows whatever else, but the current Nutch
community remains at around ~1K subscribers on our user@ mailing list. I've
personally seen and pushed >15 releases used by countless (1000's of)
people around the world. The software exists as THE best maintained,
highest quality, production-ready Web search software available to
this day.
Chris' points are well founded, and Tommaso's match very appropriately the
fact that Joshua is nowhere near a dead project. I acknowledge that no-one
said it was. The resources available for Joshua are FAR more comprehensive
than anything else I've seen. FAR FAR more comprehensive. Joshua is the
FIRST toolkit to be made available as a packaged, consumable,
community-backed software artifact for anyone attempting to get involved
with machine translation.
NONE of the NMT software communities even come close to providing new
software developers with translation packs as Joshua does. They don't even
come close. AFAIK, all of the people so far working on NMT have kept
everything proprietary... which is utterly useless for the next person or
the next academic, etc.
This highlights the essence and hits at the heart of why a group of us
shepherded Joshua into the ASF in the first place.
Believe me, if people are actively discussing a new release on an Apache
mailing list (or any mailing list for that matter), there is always purpose
in continuing.
To bring this back a bit, I will openly state that, Matt, you have been an
excellent champion for JHU, as well as for yourself, in the way you have
adopted and displayed a forward-thinking, collaborative mentality for Joshua.
If you feel your job is 'done', then I congratulate you.
Joshua will live on... at Apache.
Writing software at Apache is not about a competition. It is about writing
high quality software in a collaborative environment for the public good.
We achieve this through peer review from people we have probably never met.
That is called community.
If you would be gracious enough to stay with the community as PMC Chair,
it would be highly appreciated. If you feel at any time that this is
too much, then let us know. We will be here and we will act when we cross
that bridge.
Over and out folks.
Lewis


On Thu, Oct 5, 2017 at 10:04 PM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
>
> From: Matt Post 
> To: dev@joshua.incubator.apache.org
> Cc:
> Bcc:
> Date: Fri, 6 Oct 2017 07:03:58 +0200
> Subject: Re: [DISCUSS] Graduation (was Re: Path to TLP)
> Thanks Tommaso. Though, I should say, initial thanks goes to Zhifei Li. I
> just took it over.
>
> I think I can stick around in the capacity Chris suggests. Thanks, all.
>
> matt
>
>


Re: [DISCUSS] Graduation (was Re: Path to TLP)

2017-09-28 Thread lewis john mcgibbney
Certainly, I think Joshua is ready to graduate.
I see no reason that we shouldn't also push a 7.X release once 7.X becomes
master.
That is by no means, however, a blocker for us taking a [VOTE] thread to
general@incubator.a.o.
Anyone else?

On Mon, Sep 25, 2017 at 5:14 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
>
> From: Chris Mattmann 
> To: "dev@joshua.incubator.apache.org" 
> Cc:
> Bcc:
> Date: Fri, 22 Sep 2017 16:21:26 -0700
> Subject: [DISCUSS] Graduation (was Re: Path to TLP)
> Tom, glad you raised this issue. IMO, Joshua is ready for TLP.
>
> We’ve:
>
> 1. Added new PPMC/committers
> 2. Made a release
> 3. Been friendly and cordial and welcoming on the lists
> 4. Vetted the software
> 5. Produced some decent, emerging docs
>
> Graduation time…Thoughts?
>
> Cheers,
> Chris
>
> P.S. Subject line changed to officially turn this into a [DISCUSS] and
> hopefully a [VOTE]
>
>
>
>
>
> Tommaso
>
>
> > Lewis
> >
> > [0]
> >
> > https://lists.apache.org/thread.html/67e5cdde9c9652ec98bca2705cb1c4bdd253ec9c0966e9b3064c47d8@%3Cdev.joshua.apache.org%3E
> >
> > On Wed, Sep 20, 2017 at 2:50 AM, <
> > dev-digest-h...@joshua.incubator.apache.org> wrote:
> >
> > >
> > > From: Tommaso Teofili 
> > > To: "dev@joshua.incubator.apache.org"
> > > Cc:
> > > Bcc:
> > > Date: Wed, 20 Sep 2017 09:50:29 +
> > > Subject: merging 7 branch to master
> > > hi all,
> > >
> > > how about :
> > > - moving the master branch into a 6.x branch
> > > - merging 7 branch into master
> > >
> > > This way we can support 6.1 with bugfixes and minor releases and go ahead
> > > with development on version 7.
> > >
> > > Regards,
> > > Tommaso
> > >
> > >
> >
> >
> > --
> > http://home.apache.org/~lewismc/
> > @hectorMcSpector
> > http://www.linkedin.com/in/lmcgibbney
> >
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: merging 7 branch to master

2017-09-20 Thread lewis john mcgibbney
Hi Tommaso,

I'm +1 to this. From my previous thread on the topic [0], I've seen
nothing arguing against making 7.X the master.
Do you want to go ahead and do this, Tommaso? Also, do you want to take a stab
at merging JOSHUA-209 into the new master?
After that is done, we can go ahead and clean up the branches and then
look at the Git workflow as others suggested.
Lewis

[0]
https://lists.apache.org/thread.html/67e5cdde9c9652ec98bca2705cb1c4bdd253ec9c0966e9b3064c47d8@%3Cdev.joshua.apache.org%3E

On Wed, Sep 20, 2017 at 2:50 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: Tommaso Teofili 
> To: "dev@joshua.incubator.apache.org" 
> Cc:
> Bcc:
> Date: Wed, 20 Sep 2017 09:50:29 +
> Subject: merging 7 branch to master
> hi all,
>
> how about :
> - moving the master branch into a 6.x branch
> - merging 7 branch into master
>
> This way we can support 6.1 with bugfixes and minor releases and go ahead
> with development on version 7.
>
> Regards,
> Tommaso
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: About how to use Jousha translator

2017-09-12 Thread lewis john mcgibbney
Hi Tehetena,
If some words are not trained correctly then you may wish to experiment
building a new model.
This is when you need to run the pipeline yourself and make best efforts to
converge on a better translation consistency.
Lewis

On Tue, Sep 12, 2017 at 5:57 AM Tehetena Alemu  wrote:

>
> Okay, thank you very much again, Lewis. I will try to contact them.
>
> The translator works well, but it doesn't translate some basic Amharic
> words. I think the Amharic part of the translator is weak; I might need
> some enhancements to use it in my study.
>
> I have no word to thank you again for your help.
>
>
>
>
> On Tuesday, September 12, 2017, lewis john mcgibbney 
> wrote:
>
>> If I were you, I would simply contact dev@joshua with that query then.
>> Someone on the list should hopefully see the comment and respond.
>> It looks like an update to this documentation may be required, as I
>> am not sure anyone is actively working on this... I may be wrong, however!
>>
>> On Tue, Sep 12, 2017 at 3:07 AM Tehetena Alemu 
>> wrote:
>>
>>> https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
>>>
>>> "*Version 3 Language Packs Coming Soon*
>>> (March 2017) Version 3 language packs with Kenlm (via Docker) and more
>>> complete Google Translate API support
>>> <https://cloud.google.com/translate/docs/reference/rest> are coming
>>> soon. If you have questions, comments, concerns, or wish to help, please
>>> post questions to the Joshua mailing list: d...@joshua.apache.org."
>>>
>>> Tehetena Alemu
>>>
>>> On Tue, Sep 12, 2017 at 1:45 AM, lewis john mcgibbney <
>>> lewi...@apache.org> wrote:
>>>
>>>> Where did you get this information from?
>>>>
>>>> On Mon, Sep 11, 2017 at 12:28 PM, Tehetena Alemu 
>>>> wrote:
>>>>
>>>>> Thank you very much Lewis, it is very kind of you. Your help means a
>>>>> lot. By the way, 2 weeks is the time I spent trying different options,
>>>>> not the time I spent waiting for a response.
>>>>>
>>>>> On another note, I just found out that Joshua language pack 3 will be
>>>>> released soon, with Google Translate support. When will it be released?
>>>>> It would be a very good contribution to my paper.
>>>>>
>>>>> Best,
>>>>>
>>>>>
>>>>> --
>>>>> Tehetena Alemu
>>>>>
>>>>>
>>>>
>>>>
>>
>
>
> --
> Tehetena Alemu
>
-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: About how to use Joshua translator

2017-09-12 Thread lewis john mcgibbney
If I were you I would simply contact dev@joshua with that query then.
Someone on the list should hopefully see the comment and respond.
It looks like an update to this documentation is possibly required as I am
not sure if anyone is actively working on this... I may be wrong however!

On Tue, Sep 12, 2017 at 3:07 AM Tehetena Alemu  wrote:

> https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
>
> "*Version 3 Language Packs Coming Soon*
> (March 2017) Version 3 language packs with Kenlm (via Docker) and more
> complete Google Translate API support
> <https://cloud.google.com/translate/docs/reference/rest> are coming soon.
> If you have questions, comments, concerns, or wish to help, please post
> questions to the Joshua mailing list: d...@joshua.apache.org."
>
> Tehetena Alemu
>
> On Tue, Sep 12, 2017 at 1:45 AM, lewis john mcgibbney 
> wrote:
>
>> Where did you get this information from?
>>
>> On Mon, Sep 11, 2017 at 12:28 PM, Tehetena Alemu 
>> wrote:
>>
>>> Thank you very much Lewis, it is very kind of you. Your help means a
>>> lot. By the way, 2 weeks is the time I spent trying different options,
>>> not the time I spent waiting for a response.
>>>
>>> On another note, I just found out that Joshua language pack 3 will be
>>> released soon, with Google Translate support. When will it be released?
>>> It would be a very good contribution to my paper.
>>>
>>> Best,
>>>
>>>
>>> --
>>> Tehetena Alemu
>>>
>>>
>>
>>
>>
>
-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: About how to use Joshua translator

2017-09-11 Thread lewis john mcgibbney
Where did you get this information from?

On Mon, Sep 11, 2017 at 12:28 PM, Tehetena Alemu  wrote:

> Thank you very much Lewis, it is very kind of you. Your help means a lot.
> By the way, 2 weeks is the time I spent trying different options, not the
> time I spent waiting for a response.
>
> On another note, I just found out that Joshua language pack 3 will be
> released soon, with Google Translate support. When will it be released? It
> would be a very good contribution to my paper.
>
> Best,
>
>
> --
> Tehetena Alemu
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: About how to use Joshua translator

2017-09-11 Thread lewis john mcgibbney
Hi Tehetena,
If you are looking for help with Joshua, please consider joining the Joshua
community mailing lists:

https://cwiki.apache.org/confluence/display/JOSHUA/Support

More people will be able to help you there, which will hopefully save you
from waiting 2 weeks for an answer. Please see my other responses
inline.

On Mon, Sep 11, 2017 at 11:38 AM, Tehetena Alemu  wrote:

> Thank you again for your help. I have tried removing the backslashes but
> the error is still the same.
>
>
As I stated before, I don't think you are looking to create/train a language
model... you are only looking to use the Amharic–English language pack. You
do not need to run the pipeline. You should therefore do the following:


   1. Download the language pack from the relevant wiki page
   https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs
   2. Decompress it as follows: "tar xopf apache-joshua-am-en-2016-11-18.tar"
   3. Change into the decompressed language pack directory: "cd
   apache-joshua-am-en-2016-11-18"
   4. Read the README contained in the pack: "cat README"
   5. Run the Joshua decoder: "./joshua"
   6. The Joshua decoder will start running, accepting input from STDIN and
   writing to STDOUT. Joshua expects its input in the form of a single
   sentence per line. Each sentence should first be piped through
   `prepare.sh`, which normalizes and tokenizes the input for the language
   pack's source language. You can therefore pipe some text into the
   decoder: "cat example.am | ./prepare.sh | ./joshua > output.en"
   7. Take a look at the output: "cat output.en"

Hopefully the above enables you to move forward.


> By the way, I don't think I understand very well how to define "--type
> (hiero...)". It has been difficult for me to understand and follow how
> the Joshua translator works. Plus, I couldn't find a README file in the
> language pack. And it took me more than 2 weeks. Your help means a lot,
> because I have run out of time.
>
>
>
Please read what I have provided above. It works perfectly for me so it
should also work for you. If it does not, write back to us and please make
sure to include the dev@ mailing list when you respond.
Thank you
Lewis


Re: About how to use Joshua translator

2017-09-08 Thread lewis john mcgibbney
Hi Tehetena,
Replies inline...

On Thu, Sep 7, 2017 at 6:46 AM, Tehetena Alemu  wrote:

> Thank you for your reply and the links.
>
> Even though I don't fully understand which command to use to finally
> translate English text to Amharic, I have successfully completed this
> tutorial: "https://cwiki.apache.org/confluence/display/JOSHUA/Getting+Started"
>

ok good.


>
> But when I came to the Joshua Tutorial
> "https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Tutorial",
> I faced the following error in "creating the baseline run", as shown
> in the picture attached below.
> [image: Inline image 1]
>
> Thank you very much for your help.
>
>
>
The command above is for running a pipeline which generates a language
pack. This is much more than you need in order to use the Amharic -->
English language pack.

In addition, I see you are attempting to invoke the pipeline with
incorrect parameters. Please read the output "...FATAL: You must define
--type (hiero|samt|ghkm|phrase|moses)"
Please ensure that one of these types is included in the parameters.
Additionally, if you are running this entire task on one line of standard
input, please remove all instances of "\" as these are line-continuation
characters, only needed when a command spans multiple lines of input.

I would also advise you to take a look inside of the language pack itself.
If I'm not mistaken all language packs are packaged with a README which
clearly details how to run basic translations.
Lewis


Re: About how to use Joshua translator

2017-09-06 Thread lewis john mcgibbney
Hi Tehetena,
CC dev@joshua

On Sat, Sep 2, 2017 at 1:21 AM, Tehetena Alemu  wrote:

> Dear Lewis,
>
> I thank you for making your translation system free. I want to use Joshua
> translation in my thesis. But I find it very hard to know how I can use
> Joshua for translation, and there is no simplified tutorial.
>

The primary tutorial can be found at
https://cwiki.apache.org/confluence/display/JOSHUA/Getting+Started
followed by
https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Tutorial
If there are specific issues please report them to the community mailing
list.


>
> I have successfully managed to build Joshua using Ant. But I got confused
> about how to use Joshua to translate text. I want to use the Amharic and
> English language pack.
>

This language pack can be obtained from the Amharic–English entry on
https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs

What specifically is it you are having problems with?


>
> Would you mind giving an example using either the command line, Java,
> or Apache Tika to translate from Amharic to English or vice versa?
>

You should look at one of the deployment mechanisms
https://cwiki.apache.org/confluence/display/JOSHUA/Deployment
Again if you have any issues then please let us know.
Thanks
Lewis


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: Podling Report Reminder - August 2017

2017-08-05 Thread lewis john mcgibbney
Hello Suneel,
You're not an outsider at all... :)
Yes, I've added the contribution to the wiki page now.
I watched your session and enjoyed it very much. I am glad they were
recording sessions at Berlin Buzzwords as I enjoyed listening to your German
accent :) :) :)
Tommaso's was equally enjoyable!
Very good work and very well presented. I think we should definitely link
to it from our wiki site.
Lewis

On Sat, Aug 5, 2017 at 7:27 AM,  wrote:

>
> From: Suneel Marthi 
> To: dev@joshua.incubator.apache.org
> Cc:
> Bcc:
> Date: Tue, 1 Aug 2017 18:58:40 -0400
> Subject: Re: Podling Report Reminder - August 2017
> An outsider to the party here, pardon my meddling:
>
> Would you also want to add the following to the report:
>
> Tommaso Teofili and Suneel Marthi presented 'Embracing Diversity:
> Searching Over Multiple Languages' on June 12, 2017 at Berlin Buzzwords,
> Berlin, demonstrating machine translation using Apache Joshua.
>
>


Re: Podling Report Reminder - August 2017

2017-08-01 Thread lewis john mcgibbney
Hi Folks,
I've contributed our report.
Please scope it out and see what you think.
Lewis

On Sat, Jul 29, 2017 at 6:02 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> dev Digest 29 Jul 2017 13:02:50 - Issue 225
>
> Topics (messages 2237 through 2237)
>
> Podling Report Reminder - August 2017
> 2237 by: johndament.apache.org
>
> --
>
>
>
> -- Forwarded message --
> From: johndam...@apache.org
> To: dev@joshua.incubator.apache.org
> Cc:
> Bcc:
> Date: Sat, 29 Jul 2017 13:02:37 -
> Subject: Podling Report Reminder - August 2017
> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 16 August 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, August 02).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/August2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: Joshua decoder error

2017-07-31 Thread lewis john mcgibbney
Hi Arezoo,

On Mon, Jul 31, 2017 at 1:22 PM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
>
> In addition, I set JAVA_HOME to the following different paths:
> default-java, java-1.8.0-openjdk-amd64, java-8-openjdk-amd64,
> java-9-openjdk-amd64.
>
>
There is no need for you to point JAVA_HOME at so many different Java
distributions.
If I were you I would use the most recent Java 1.8 you can find. I build
Joshua against it and all is well.
Lewis


Fwd: FW: July 2017 Newsletter -- LDC

2017-07-19 Thread lewis john mcgibbney
FYI folks

-- Forwarded message -
From: Mcgibbney, Lewis J (398M) 
Date: Wed, Jul 19, 2017 at 10:14 AM
Subject: FW: July 2017 Newsletter -- LDC
To: lewis john mcgibbney 

*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Tuesday, July 18, 2017 at 8:51 AM
*To: *Penn LDC 
*Subject: *July 2017 Newsletter -- LDC



*In this newsletter*



*LDC at ACL 2017*



*Fall 2017 Data Scholarship Program*



*New corpora: *



BOLT English Discussion Forums <https://catalog.ldc.upenn.edu/LDC2017T11>
IARPA Babel Tamil Language Pack IARPA-babel204b-v1.1b
<https://catalog.ldc.upenn.edu/LDC2017S13>
KSUEmotions <https://catalog.ldc.upenn.edu/LDC2017S12>
Metalogue Multi-Issue Bargaining Dialogue
<https://catalog.ldc.upenn.edu/LDC2017S11>

___

*LDC at ACL 2017: July 31-August 2, Vancouver, Canada*

ACL has returned to North America and LDC is taking this opportunity to
interact with top HLT researchers gathering in Vancouver, Canada.  Stop by
our exhibition table to learn more about recent developments at the
Consortium and new publications.



*Fall 2017 Data Scholarship Program*

Student applications for the Fall 2017 LDC Data Scholarship program are
being accepted now through Friday, September 15, 2017, 11:59PM EST. The LDC
Data Scholarship program provides university students with access to LDC
data at no cost. Students must complete an application which consists of a
data use proposal and letter of support from their advisor.

For more information on application requirements and program rules, please
visit the LDC Data Scholarship page
<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.

Applicants can email their materials to the LDC Data Scholarship program.



*New corpora*

(1) BOLT English Discussion Forums
<https://catalog.ldc.upenn.edu/LDC2017T11> was developed by LDC and
consists of 830,440 discussion forum threads in English harvested from the
Internet using a combination of manual and automatic processes.

The BOLT <https://www.ldc.upenn.edu/collaborations/current-projects/bolt>
(Broad
Operational Language Translation) program developed machine translation and
information retrieval for less formal genres, focusing particularly on
user-generated content in Chinese, Egyptian Arabic and English. The
collected data was translated and annotated for various tasks including
word alignment, treebanking, propbanking and co-reference.

The material in this release represents the unannotated English source data
in the discussion forum genre. Collection was seeded based on the results
of manual data scouting by native speaker annotators. When multiple threads
from a forum were submitted, the entire forum was automatically harvested
and added to the collection. Only a small portion of the threads included
in this release were manually reviewed, and it is expected that there may
be some offensive or otherwise undesired content as well as some threads
that contain a large amount of non-English content. Language identification
was performed on all threads in this corpus (using CLD2
<https://github.com/CLD2Owners/cld2>).

BOLT English Discussion Forums is distributed via web download.

2017 Subscription Members will automatically receive copies of this corpus.
2017 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for US $3500.

***

(2) IARPA Babel Tamil Language Pack IARPA-babel204b-v1.1b
<https://catalog.ldc.upenn.edu/LDC2017S13> was developed by Appen for the
IARPA (Intelligence Advanced Research Projects Activity) Babel program. It
contains 200 hours of Tamil conversational and scripted telephone speech
collected in 2012 and 2013 along with corresponding transcripts.

The Babel program focuses on underserved languages and seeks to develop
speech recognition technology that can be rapidly applied to any human
language to support keyword search performance over large amounts of
recorded speech.

The Tamil speech in this release represents that spoken in the Northern,
Central, Southern and Western dialect regions of the Indian state of Tamil
Nadu. The gender distribution among speakers is approximately equal;
speakers' ages range from 16 years to 65 years. Calls were made using
different telephones (e.g., mobile, landline) from a variety of
environments including the street, a home or office, a public place, and
inside 

Re: Merging 7.X into master??? + cleaning up branches

2017-07-09 Thread lewis john mcgibbney
Hi Folks,

On Sat, Jul 8, 2017 at 9:38 PM,  wrote:

>
> From: Matt Post 
> To: dev@joshua.incubator.apache.org
> Cc:
> Bcc:
> Date: Tue, 4 Jul 2017 12:40:36 -0400
> Subject: Re: Merging 7.X into master??? + cleaning up branches
>

...snip


> I think that it would be better to focus on low-resource scenarios and
> user-focused applications, instead.
>
>
This is a valid point. IMHO there is currently NO toolkit/framework out
there which makes it easy to undertake language translation tasks... Joshua
in its current state (Maven artifacts, community language packs and
source downloads) represents the best resource in the field, and this speaks
volumes towards the user-focused applications perspective you mention above,
Matt.

Additionally, there is still a significant win to be had from continued
development of 7.X branch (as new master) with the aim of further infusion
into other Apache products and communities.

We also still have Incubator graduation to think about... so there is loads
on the table at this stage.

My opinion is that we should make an effort to forward port everything from
master into 7.X and then possibly make another release as the
re-architected codebase.

Are there any objections to this? If you are too busy, Matt, then I would
step up and champion the effort. A few questions right now:

   1. Which branch is 7.X? Is it [0]? Is there a JIRA ticket open for this?
   2. Does anyone have a suggestion for how to forward port issues from
   master to 7.X?
   3. Which branches can we delete to clean things up a wee bit?

Thanks
Lewis

[0] https://github.com/apache/incubator-joshua/tree/7


2 PhD studentships in Translation Technology

2017-06-30 Thread lewis john mcgibbney
http://rgcl.wlv.ac.uk/2017/06/30/2-phd-studentships-translation-technology/

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: [ANNOUNCE] - Apache Joshua 6.1 incubating release

2017-06-27 Thread lewis john mcgibbney
Hi Suneel,
I think it's worth opening a JIRA issue and we can possibly mark it for 7.X?
lewis

On Tue, Jun 27, 2017 at 9:36 PM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: Suneel Marthi 
> To: dev@joshua.incubator.apache.org
> Cc:
> Bcc:
> Date: Fri, 23 Jun 2017 01:59:28 -0400
> Subject: Re: [ANNOUNCE] - Apache Joshua 6.1 incubating release
> Congrats on the release.
>
> I have been a silent lurker on this channel since I first heard of Joshua
> last September at Amazon, Berlin.
>
> Tommaso and I recently gave a talk at Berlin Buzzwords 2017 -
> 'Embracing Diversity - searching over multiple languages' [1]
> using Apache Joshua for Machine Translation, and Apache OpenNLP for
> language detection.
>
> I have been wondering how much of the present VLPS can be replaced by
> OpenNLP with Flink/Beam pipelines.
> I gave a talk last week at Hadoop Summit, San Jose about 'Large Scale Text
> Processing with Apache OpenNLP and Apache Flink' [2].
>
> Also, Thrax, which is presently MapReduce-based, can definitely be
> ported over to modern streaming distributed frameworks like Flink/Kafka
> Streams/Beam.
>
>
> [1]
> https://www.youtube.com/watch?v=ZrWxySF-9KY&index=20&t=2s&list=PLq-odUc2x7i-9Nijx-WfoRMoAfHC9XzTt
> [2] https://www.slideshare.net/SuneelMarthi/large-scale-text-processing
>
>
>


Merging 7.X into master??? + cleaning up branches

2017-06-27 Thread lewis john mcgibbney
Hi Folks,
Two things...

   1. Currently the branches for Joshua are a bit of a mess... it would be
   better if they were named after JIRA issues such that the mappings back to
   some concrete development were explicit. Does anyone want to clean these up?
   2. Now that 6.1-incubating is released and live, is there any desire to
   merge 7.X branch into master and continue development there? I was not
   involved with the 7.X development but it looked like a significant step
   forward... it would be a shame for that work to stagnate.

Thanks,

lewis

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: [ANNOUNCE] - Apache Joshua 6.1 incubating release

2017-06-22 Thread lewis john mcgibbney
Hi Tommaso,
EXCELLENT :)
@Matt are you able to Tweet this out and make some tags?
@Tommaso, where else did you announce this? Is it possible for us to make
some more noise on various other communication forums/channels?
This is brilliant news. Thank you Tommaso for being persistent with the
release process, I am glad that we were able to recover the artifacts.
Lewis

On Thu, Jun 22, 2017 at 5:55 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: Tommaso Teofili 
> To: annou...@apache.org
> Cc: "dev@joshua.incubator.apache.org" 
> Bcc:
> Date: Thu, 22 Jun 2017 12:54:49 +
> Subject: [ANNOUNCE] - Apache Joshua 6.1 incubating release
> Hi Folks,
>
> The Apache Joshua team (PPMC) is pleased to announce the immediate
> availability of Apache Joshua 6.1 (incubating).
>
> Apache Joshua is a statistical machine translation decoder for
> phrase-based, hierarchical, and syntax-based machine translation, written
> in Java.
>
> Apache Joshua is released as both source code, downloads for which can be
> found at ASF dist download site [0] as well as Maven artifacts which can be
> found on Maven central [1].
>
> The full Jira release report can be found here [3].
>
> Thank you,
> Tommaso (on behalf of Apache Joshua PPMC)
>
> — DISCLAIMER Apache Joshua is an effort undergoing incubation at The Apache
> Software Foundation (ASF), sponsored by the Apache Incubator PMC.
> Incubation is required of all newly accepted projects until a further
> review indicates that the infrastructure, communications, and decision
> making process have stabilized in a manner consistent with other successful
> ASF projects. While incubation status is not necessarily a reflection of
> the completeness or stability of the code, it does indicate that the project
> has yet to be fully endorsed by the ASF.
>
> [0] http://apache.org/dist/incubator/joshua/6.1/
> [1] http://search.maven.org/#search|ga|1|g%3A%22org.apache.joshua%22
> [3]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319720&version=12335049
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: Staging repo gone

2017-06-14 Thread lewis john mcgibbney
Hi Tommaso,
This is strange, frustrating and disappointing. I am not aware of any
auto-purging on repository.apache.org, however I cannot confirm this as I
do not know!
I'm going to take this over to INFRA and see what is up. Please track it on
https://issues.apache.org/jira/projects/INFRA/issues/INFRA-14352
Lewis

On Wed, Jun 14, 2017 at 9:11 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: Tommaso Teofili 
> To: "dev@joshua.incubator.apache.org" 
> Cc:
> Bcc:
> Date: Tue, 13 Jun 2017 13:14:29 +
> Subject: Staging repo gone
> Hi all,
>
> in order to proceed with the remaining release tasks [0]: I can't find the
> closed staging repo [1] anymore. Did anyone release it?
>
> Regards,
> Tommaso
>
> [0] :
> https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure
> [1] :
> https://repository.apache.org/content/repositories/orgapachejoshua-1005
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Thumbs up from general@ to release Joshua 6.1 (Incubating)

2017-06-10 Thread lewis john mcgibbney
Hi Folks,
Both Justin and John have provided us with +1's for releasing... which is
quite frankly great.
We've been undertaking a good bit of due diligence for this release... it
has admittedly taken a hellish amount of time to push through. On the
bright side, we have now nearly made the first official Apache release
which is a huge milestone for the project and for getting the word out that
we are alive and kicking in the Incubator.
Huge thank you to Tommaso who has been acting as release manager and
community liaison so to speak. It makes a huge difference and is greatly
appreciated.
Once Tommaso's RESULT thread hits general@ we can progress with the
remaining release management items.
Hopefully there will be a release announcement pretty soon.
In the meantime, can everyone begin thinking about appropriate avenues and
communication forums for us to publicize the release announcement? If you
could, please append them to the release management document on the Joshua
wiki.
Best
Lewis


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Request to confirm that cdec classes can be included in Apache Joshua (Incubating) v6.1 Release

2017-06-09 Thread lewis john mcgibbney
Good Afternoon David,
I took this query to the cdec issue tracker [0] and was advised to contact
you directly, so I am doing exactly that.
If you would kindly review the following description [1] and provide your
feedback it would be greatly appreciated.
Kind Regards
Lewis
(On behalf of the Apache Joshua Podling Project Management Committee)

[0] https://github.com/redpony/cdec/issues/94
[1] https://github.com/redpony/cdec/issues/94#issue-233715741

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: Podling Report Reminder - June 2017

2017-05-25 Thread lewis john mcgibbney
Hi Folks,
I've populated this report. If any mentors are able to look it over, it
would be appreciated.
Lewis

On Tue, May 23, 2017 at 3:00 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> dev Digest 23 May 2017 10:00:50 - Issue 206
>
> Topics (messages 2179 through 2179)
>
> Podling Report Reminder - June 2017
> 2179 by: johndament.apache.org
>
> --
>
>
>
> -- Forwarded message --
> From: johndam...@apache.org
> To: dev@joshua.incubator.apache.org
> Cc:
> Bcc:
> Date: Tue, 23 May 2017 10:00:46 -
> Subject: Podling Report Reminder - June 2017
> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 21 June 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, June 07).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/June2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: [VOTE] Release Apache Joshua 6.1 (Incubating) RC4

2017-05-18 Thread lewis john mcgibbney
Hi Folks,
I generated it by simply copying the JIRA report into CHANGES.md. It turns
out that this looks somewhat nice due to markdown rendering!
No real work involved... so I can't take credit for anything :(
I'll get on to the release ASAP.
Thanks

On Mon, May 8, 2017 at 6:04 AM, Tommaso Teofili 
wrote:

> Hi Henry,
>
> I just took the one created by Lewis, so he should know :-)
>
> Regards,
> Tommaso
>
> On Thu, May 4, 2017 at 8:39 PM, Henry Saputra <
> henry.sapu...@gmail.com> wrote:
>
>> Hi Tommaso, I'm curious: how did you generate the CHANGES.md for the
>> release notes?
>>
>> Was it manually created by looking at the JIRA tickets and Git PR merges?
>>
>> Thanks,
>>
>> - Henry
>>
>> On Thu, Mar 9, 2017 at 5:40 AM, Tommaso Teofili <
>> tommaso.teof...@gmail.com>
>> wrote:
>>
>> > Hi Folks,
>> > Please VOTE on the Apache Joshua 6.1 Release Candidate #4.
>> >
>> > We solved 36 issues:
>> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319720&version=12335049
>> >
>> > Git source tag (23d5bda277028ea42b142b221639fd233a08da36):
>> > https://git-wip-us.apache.org/repos/asf?p=incubator-joshua.git;a=commit;h=23d5bda277028ea42b142b221639fd233a08da36
>> >
>> > Staging repo:
>> > https://repository.apache.org/content/repositories/orgapachejoshua-1005
>> >
>> > Source Release Artifacts:
>> > https://dist.apache.org/repos/dist/dev/incubator/joshua/6.1/
>> >
>> > PGP release keys (signed using 891768A5):
>> > https://git1-us-west.apache.org/repos/asf?p=incubator-joshua.git;a=blob_plain;f=KEYS;h=aa18365bf5c8c8fb17b084f783a75c3a2460a98d;hb=HEAD
>> >
>> > Vote will be open for 72 hours.
>> > Thank you to everyone that is able to VOTE as well as everyone that
>> > contributed to Apache Joshua 6.1.
>> >
>> > [ ] +1, let's get it released!!!
>> > [ ] +/-0, fine, but consider fixing a few issues before...
>> > [ ] -1, nope, because... (and please explain why)
>> >
>> > Regards,
>> > Tommaso
>> >
>>
>




Fwd: FW: May 2017 Newsletter -- LDC

2017-05-15 Thread lewis john mcgibbney
FYI Folks

-- Forwarded message -
From: Mcgibbney, Lewis J (398M) 
Date: Mon, May 15, 2017 at 9:09 AM
Subject: FW: May 2017 Newsletter -- LDC
To: lewis john mcgibbney 






Dr. Lewis John McGibbney Ph.D., B.Sc.

Data Scientist II

Computer Science for Data Intensive Applications Group 398M

Jet Propulsion Laboratory

California Institute of Technology

4800 Oak Grove Drive

Pasadena, California 91109-8099

Mail Stop : 158-256C

Tel:  (+1) (818)-393-7402

Cell: (+1) (626)-487-3476

Fax:  (+1) (818)-393-1190

Email: lewis.j.mcgibb...@jpl.nasa.gov







 Dare Mighty Things



*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Monday, May 15, 2017 at 8:22 AM
*To: *Penn LDC 
*Subject: *May 2017 Newsletter -- LDC



*In this newsletter:*



*Recent Collaborations*



*New publications:*



IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a
<https://catalog.ldc.upenn.edu/LDC2017S08>

Multi-Language Conversational Telephone Speech 2011 -- Turkish
<https://catalog.ldc.upenn.edu/LDC2017S09>

Phrase Detectives Corpus <https://catalog.ldc.upenn.edu/ldc2017T08>

The EventStatus Corpus <https://catalog.ldc.upenn.edu/ldc2017T09>

*_*

*Recent Collaborations*

Collaborations play an important role in many LDC activities. Over the past
twenty-five years, LDC has partnered, consulted, and otherwise
“collaborated” with a variety of organizations to advance research
community goals. Recently, LDC partnered with Oxford Wave Research
<http://www.oxfordwaveresearch.com/> to integrate its latest speech
technology into data collection and annotation processes. LDC also supports
the Hearables Challenge
<https://ninesights.ninesigma.com/web/hearables/innovationcontest>
sponsored by the National Science Foundation by creating and distributing
training and test corpora. Finally, LDC Executive Director Chris Cieri is
working with international colleagues to plan LREC2018
<http://www.lrec-conf.org/lrec2018/lrec2018.htm> as a member of the
Conference Programme Committee.

LDC welcomes new collaborations. Let us know what interests you and how we
can work together. Contact LDC  to begin the
conversation.



*New publications:*

(1) IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a
<https://catalog.ldc.upenn.edu/LDC2017S08> was developed by Appen for the
IARPA (Intelligence Advanced Research Projects Activity) Babel program. It
contains approximately 207 hours of Lao conversational and scripted
telephone speech collected in 2013 along with corresponding transcripts.



The Babel program focuses on underserved languages and seeks to develop
speech recognition technology that can be rapidly applied to any human
language to support keyword search performance over large amounts of
recorded speech.



The Lao speech in this release represents that spoken in the Vientiane
dialect region in Laos. The gender distribution among speakers is
approximately equal; speakers' ages range from 16 years to 60 years. Calls
were made using different telephones (e.g., mobile, landline) from a
variety of environments including the street, a home or office, a public
place, and inside a vehicle.



IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a is distributed via web
download.



2017 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2017
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for US $25.

*

(2) Multi-Language Conversational Telephone Speech 2011 -- Turkish
<https://catalog.ldc.upenn.edu/LDC2017S09> was developed by LDC and
comprises approximately 18 hours of telephone speech in Turkish. The
data was collected primarily to support research and technology evaluation
in automatic language identification, and portions of these telephone calls
were used in the NIST 2011 Language Recognition Evaluation (LRE
<https://www.nist.gov/itl/iad/mig/2011-language-recognition-evaluation>).

Participants were recruited by native speakers who contacted acquaintances
in their social network. Those native speakers made one call, up to 15
minutes, to each acquaintance. The data was collected using LDC's telephone
collection infrastructure
<https://www.ldc.upenn.edu/about/facilities/human-subjects-collection>,
which consists of three computer telephony systems. Human auditors labeled calls
for callee gender, dialect type and noise. Demographic information about
the participants was not collected.

LDC has also released Multi-Language Conversational Telephone Speech 2011
-- Slavic Group (LDC2016S11 <https://catalog.ldc.upenn.edu/LDC2016S11>).

Multi-Language Conversational Telephone Speech 2011 -- Turkish is
distributed via web download.

2017 Subscription Members will automatically receive copies of this corpus.
2017 Standard Membe

Re: ping on RC4 vote

2017-04-25 Thread lewis john mcgibbney
Hi Tommaso,
Can you close this VOTE out with a RESULT thread then progress to
general@incubator with a VOTE?
Thanks
Lewis

On Tue, Apr 25, 2017 at 8:24 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: Tommaso Teofili 
> To: dev@joshua.incubator.apache.org
> Cc:
> Bcc:
> Date: Mon, 24 Apr 2017 20:25:11 +
> Subject: Re: ping on RC4 vote
> hi all,
>
> I've uploaded the artifacts from the Nexus staging repo to /dist/dev; those
> should be the good ones, so we can proceed and get the IPMC vote
> started.
>
> Regards,
> Tommaso
>
>
>


Re: ping on RC4 vote

2017-04-20 Thread lewis john mcgibbney
PING Tommaso.

On Thu, Apr 13, 2017 at 11:32 AM, lewis john mcgibbney 
wrote:

> Hi Tommaso,
>
> Go for it. Let's get some more feedback and then we can take it to the
> IPMC if the VOTE passes here.
> Lewis
>
> On Mon, Apr 10, 2017 at 5:46 AM, <dev-digest-h...@joshua.incubator.apache.org> wrote:
>
>>
>> Thanks a lot, Lewis, for your in-depth analysis, which makes things clearer
>> now.
>> I can find the mentioned (wrong) binary files in the source packages on
>> dist/dev [1], while I can't find them within the ones on the staging repo
>> [2].
>> So if I can copy the ones from the staging repo to dist/dev, that should be
>> OK; perhaps that's what I should have done in the first place.
>>
>> What do you think ?
>> Regards,
>> Tommaso
>>
>> [1] : https://dist.apache.org/repos/dist/dev/incubator/joshua/6.1/
>> [2] :
>> https://repository.apache.org/content/repositories/orgapachejoshua-1005/org/apache/joshua/joshua-incubating/6.1/
>>
>>
>>




Fwd: FW: April 2017 Newsletter -- LDC

2017-04-17 Thread lewis john mcgibbney
Hi Folks,
FYI
Lewis

-- Forwarded message --
From: Mcgibbney, Lewis J (398M) 
Date: Mon, Apr 17, 2017 at 10:46 AM
Subject: FW: April 2017 Newsletter -- LDC
To: lewis john mcgibbney 






*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Monday, April 17, 2017 at 8:05 AM
*To: *Penn LDC 
*Subject: *April 2017 Newsletter -- LDC



*In this newsletter*

*LDC celebrates 25 years *

*LDC data and commercial technology development *

*New publications:*

2010 NIST Speaker Recognition Evaluation Test Set
<https://catalog.ldc.upenn.edu/LDC2017S06>

BOLT Egyptian Arabic SMS/Chat and Transliteration
<https://catalog.ldc.upenn.edu/LDC2017T07>

CHiME2 Grid <https://catalog.ldc.upenn.edu/LDC2017S07>


_

*LDC celebrates 25 years*

April 2017 marks the beginning of LDC’s 25th year as the leader in language
resource development and distribution. Founded in 1992, the Consortium has
grown from a data repository to a vibrant data center that creates, shares
and archives language resources. The Catalog continues to grow, boasting
over 700 titles in more than 90 languages. With the support of members,
licensees, sponsors and collaborators, LDC has distributed over 120,000
copies of data to more than 3,500 organizations worldwide. Our heartfelt
thanks for your support as we continue our mission to provide large
quantities of diverse data, research program support and high quality
member services.



*LDC data and commercial technology development *

Any organization wishing to use LDC data to develop or test products for
commercialization, or to use LDC data in any commercial product or for any
commercial purpose, must first license the data as a For-Profit Member.
Once the data is licensed under the For-Profit Membership, the organization
retains perpetual rights to use the data for commercial technology
development. LDC data users should consult corpus-specific license
agreements for limitations on the use of certain corpora. Visit our
Licensing <https://www.ldc.upenn.edu/data-management/using/licensing> page
for more information.



*New Corpora*

(1) 2010 NIST Speaker Recognition Evaluation Test Set
<https://catalog.ldc.upenn.edu/LDC2017S06> was developed by LDC and NIST
(National Institute of Standards and Technology). It contains 2,255 hours
of American English telephone speech and interview speech recorded over a
microphone channel used as test data in the NIST-sponsored 2010 Speaker
Recognition Evaluation (SRE)
<http://www.itl.nist.gov/iad/mig/tests/spk/2010/index.html>.

The telephone speech segments include two-channel excerpts of approximately
10 seconds and 5 minutes. There are also summed-channel excerpts in the
range of 5 minutes. The microphone excerpts are 3-15 minutes in duration.
As in prior evaluations, intervals of silence were not removed.

The 2010 evaluation includes not only conversational telephone speech (CTS)
recorded over ordinary telephone channels for the core training and test
conditions, but also CTS and conversational interview speech recorded over
a room microphone channel. Unlike prior evaluations, some of the
conversational telephone style speech was collected in a manner to produce
particularly high, or particularly low, vocal effort on the part of the
speaker of interest. In addition to evaluation data, this package also
includes answer keys, trial and train files, development data and
evaluation documentation.

2010 NIST Speaker Recognition Evaluation Test Set is distributed via hard
drive.

2017 Subscription Members will receive copies of this corpus. 2017 Standard
Members may request a copy as part of their 16 free membership corpora.
Non-members may license this data for US $4000.

*

(2) BOLT Egyptian Arabic SMS/Chat and Transliteration
<https://catalog.ldc.upenn.edu/LDC2017T07> was developed by LDC and
consists of naturally-occurring Short Message Service (SMS) and Chat (CHT)
data collected through data donations and live collection involving native
speakers of Egyptian Arabic. The corpus contains 5,691 conversations
totaling 1,029,248 words across 262,026 messages. Messages were natively
written in either Arabic orthography or romanized Arabizi. A total of 1,856
Arabizi conversations (287,022 words) were transliterated from the original
romanized Arabizi script into standard Arabic orthography and then
reviewed, corrected and normalized by LDC annotators accord

Re: ping on RC4 vote

2017-04-13 Thread lewis john mcgibbney
Hi Tommaso,

Go for it. Let's get some more feedback and then we can take it to the IPMC
if the VOTE passes here.
Lewis

On Mon, Apr 10, 2017 at 5:46 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> Thanks a lot, Lewis, for your in-depth analysis, which makes things clearer
> now.
> I can find the mentioned (wrong) binary files in the source packages on
> dist/dev [1], while I can't find them within the ones on the staging repo
> [2].
> So if I can copy the ones from the staging repo to dist/dev, that should be
> OK; perhaps that's what I should have done in the first place.
>
> What do you think ?
> Regards,
> Tommaso
>
> [1] : https://dist.apache.org/repos/dist/dev/incubator/joshua/6.1/
> [2] :
> https://repository.apache.org/content/repositories/orgapachejoshua-1005/org/apache/joshua/joshua-incubating/6.1/
>
>
>


Podling Report April 2017

2017-04-07 Thread lewis john mcgibbney
Hi Folks,
Please find this at
https://wiki.apache.org/incubator/April2017
I'll be more active helping out with the release candidate this coming week.
Thanks
Lewis



Re: ping on RC4 vote

2017-04-07 Thread lewis john mcgibbney
Hi Tommaso,

On Thu, Apr 6, 2017 at 5:31 AM,  wrote:

>
> From: Tommaso Teofili 
> To: dev@joshua.incubator.apache.org
> Cc:
> Bcc:
> Date: Sat, 01 Apr 2017 17:06:06 +
> Subject: Re: ping on RC4 vote
> I really have no idea; I just executed the Maven commands as per the wiki [1],
> then found that my /target directory had all the expected
> artifacts but no md5 / sha1 signatures for them. On the other hand, it seems
> they got generated at some point and existed in the staging repo on Nexus.
>

This seems strange. I just used a very similar release procedure on another
project (Gora), and we were able to provide all signatures, with the staging and
repository artifacts being the same. It should be noted, however, that the
release policy [0] does not explicitly state which cryptographic signature
method should be used, only that "...All supplied packages MUST be
cryptographically signed by the Release Manager with a detached signature."

[0] http://apache.org/legal/release-policy.html#release-signing

In my opinion, if one signature method is provided (which it is), then
that satisfies the release policy. The mismatch does, however, raise the
question of whether the staging and repository artifacts are the same,
so I decided to check. Here are my results.

I calculated an MD5 checksum for the staging -src.tar.gz artifact and then
for the repository artifact, as follows:

gpg --print-md MD5 joshua-incubating-6.1-src.tar.gz > joshua-incubating-6.1-src.tar.gz.md5
joshua-incubating-6.1-src.tar.gz: 9A 13 8A E8 F6 A3 12 8C  64 77 9B 29 18 FD 86 48

gpg --print-md MD5 joshua-incubating-6.1-src.tar.gz > joshua-incubating-6.1-src.tar.gz.md5
joshua-incubating-6.1-src.tar.gz: 16 75 A7 A9 B0 D7 DF 56  61 06 52 FA C9 12 D2 6F
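Comparing digests by eye is error-prone, and `gpg --print-md` prints them as spaced uppercase pairs (sometimes wrapped over several lines), while the `.md5` files published next to Maven artifacts typically hold plain lowercase hex. A minimal shell sketch, assuming that plain-hex format; `normalize_gpg_digest` is a hypothetical helper, not part of any Joshua or ASF tooling, and the two sample lines are simply the digests reported above:

```shell
# Hypothetical helper: turn `gpg --print-md` output (a "<file>:" prefix,
# uppercase hex in spaced pairs, possibly wrapped) into the plain lowercase
# hex string that md5sum-style checksum files contain.
normalize_gpg_digest() {
  # drop the "<file>: " prefix on the first line, then all whitespace,
  # then lowercase the hex digits
  sed '1s/^[^:]*: *//' | tr -d ' \t\n' | tr 'A-F' 'a-f'
}

# The two digests reported above, one per copy of the source tarball.
staging=$(printf '%s\n' 'joshua-incubating-6.1-src.tar.gz: 9A 13 8A E8 F6 A3 12 8C  64 77 9B 29 18 FD 86 48' | normalize_gpg_digest)
repo=$(printf '%s\n' 'joshua-incubating-6.1-src.tar.gz: 16 75 A7 A9 B0 D7 DF 56  61 06 52 FA C9 12 D2 6F' | normalize_gpg_digest)

if [ "$staging" = "$repo" ]; then
  echo "artifacts match"
else
  echo "artifacts differ: $staging vs $repo"
fi
```

Run as-is, this prints `artifacts differ: 9a138ae8f6a3128c64779b2918fd8648 vs 1675a7a9b0d7df56610652fac912d26f`, i.e. the two tarballs are not byte-identical. The same normalization works for a stronger digest such as `gpg --print-md SHA512 <file>`.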

I then undertook a manual diff of the directories:

diff -r apache-joshua-6.1-incubating ./maven/apache-joshua-6.1-incubating | grep apache-joshua-6.1-incubating | awk '{print $4}' > difference1.txt

difference1.txt contained the following entries

build_binary
lmplz
query
sentclient
sentclient.dSYM
sentserver
sentserver.dSYM
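Because `awk '{print $4}'` keeps only the last component of each `Only in <dir>: <name>` line that `diff -r` prints, the locations then had to be recovered with `find`. A sketch that keeps full relative paths in one pass, using only POSIX `find`, `sort` and `comm`; `only_in_first` is a hypothetical helper name:

```shell
# Hypothetical helper: print regular files (as paths relative to each root)
# that exist under the first tree but not under the second.
only_in_first() {
  first_list=$(mktemp)
  second_list=$(mktemp)
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C
  (cd "$1" && find . -type f | LC_ALL=C sort) > "$first_list"
  (cd "$2" && find . -type f | LC_ALL=C sort) > "$second_list"
  # -2 hides lines only in the second list, -3 hides lines common to both
  LC_ALL=C comm -23 "$first_list" "$second_list"
  rm -f "$first_list" "$second_list"
}

# Usage, with the directory names from the diff above:
# only_in_first apache-joshua-6.1-incubating ./maven/apache-joshua-6.1-incubating
```

Since it lists regular files, a bundle directory such as `sentserver.dSYM` shows up through the files inside it rather than as a single entry.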

These files can be found at the following locations:

lmcgibbn@LMC-056430 ~/Desktop/apache-joshua-6.1-incubating $ find . -name "build_binary"
./bin/build_binary
lmcgibbn@LMC-056430 ~/Desktop/apache-joshua-6.1-incubating $ find . -name "lmplz"
./bin/lmplz
lmcgibbn@LMC-056430 ~/Desktop/apache-joshua-6.1-incubating $ find . -name "query"
./bin/query
lmcgibbn@LMC-056430 ~/Desktop/apache-joshua-6.1-incubating $ find . -name "sentclient"
./scripts/training/parallelize/sentclient
./scripts/training/parallelize/sentclient.dSYM/Contents/Resources/DWARF/sentclient
lmcgibbn@LMC-056430 ~/Desktop/apache-joshua-6.1-incubating $ find . -name "sentclient.dSYM"
./scripts/training/parallelize/sentclient.dSYM
lmcgibbn@LMC-056430 ~/Desktop/apache-joshua-6.1-incubating $ find . -name "sentserver"
./scripts/training/parallelize/sentserver
./scripts/training/parallelize/sentserver.dSYM/Contents/Resources/DWARF/sentserver
lmcgibbn@LMC-056430 ~/Desktop/apache-joshua-6.1-incubating $ find . -name "sentserver.dSYM"
./scripts/training/parallelize/sentserver.dSYM

These are binary files and should not be included within the release
candidate.
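Binaries like these can also be caught before an RC is cut. One common heuristic, sketched below, combines `grep -I` (treat binary files as non-matching) with `-L` (print files that have no match); both flags exist in GNU and BSD grep but are not POSIX, and `list_binary_files` is a hypothetical name. It is only an approximation: a non-empty text file made up solely of blank lines would also be listed.

```shell
# Hypothetical helper: list non-empty regular files under a tree that grep
# classifies as binary. Text files contain at least one character, so they
# match "." and are filtered out by -L; binary files never "match" under -I.
list_binary_files() {
  # grep's exit status with -L varies between versions, so ignore it here
  find "$1" -type f -size +0 -exec grep -IL . {} + || true
}

# Usage on an unpacked source release:
# list_binary_files apache-joshua-6.1-incubating
```

Compiled executables such as `bin/lmplz` contain NUL bytes, so they would be expected to appear in its output.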



> Having realized that, I manually created the md5 counterparts for the source
> distribution packages and uploaded both the artifacts and the md5 signatures to
> /dist.
>
> I am not sure myself whether this is acceptable or expected behaviour (it's
> one of my first times as a release manager).
>
> I guess we could simply put the stuff from Nexus on /dist/dev instead, as
> that is what will go into /dist/release anyway once we release the
> staging repo. WDYT?
>
>
It is therefore my opinion that you should replace the staging artifacts with
the artifacts present within the repository... or DROP the release candidate
and push another one.
Lewis


Re: ping on RC4 vote

2017-03-29 Thread lewis john mcgibbney
Hi Folks,
I would also like to encourage people to take a look and VOTE as soon as
possible.
I'm in regular contact with some folks over at the Linguistic Data
Consortium [0] (as are several of us I'm sure) and they have tentatively
agreed to announce our release (should it be done by then) in their next
newsletter... which has a wide reader base.

Thank you Tommaso for hanging on here.

To clarify, I'm a +1

[0] https://www.ldc.upenn.edu/

On Wed, Mar 29, 2017 at 8:39 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
>
> From: Tommaso Teofili 
> To: "dev@joshua.incubator.apache.org" 
> Cc:
> Bcc:
> Date: Wed, 29 Mar 2017 15:39:18 +
> Subject: Re: ping on RC4 vote
> ping
>
>


Fwd: FW: March 2017 Newsletter -- LDC

2017-03-17 Thread lewis john mcgibbney
Hi Team,
Please see below for LDC March Newsletter.
Lewis

-- Forwarded message --
From: Mcgibbney, Lewis J (398M) 
Date: Fri, Mar 17, 2017 at 12:16 PM
Subject: FW: March 2017 Newsletter -- LDC
To: Lewis John McGibbney 






*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Friday, March 17, 2017 at 7:49 AM
*To: *Penn LDC 
*Subject: *March 2017 Newsletter -- LDC



*In this newsletter*

BOLT Chinese Discussion Forum Parallel Training Data
<https://catalog.ldc.upenn.edu/LDC2017T05>

IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d
<https://catalog.ldc.upenn.edu/LDC2017S05>

Noisy TIMIT Speech <https://catalog.ldc.upenn.edu/LDC2017S04>

GALE English-Chinese Parallel Aligned Treebank -- Training
<https://catalog.ldc.upenn.edu/LDC2017T06>

*New Corpora*

(1) BOLT Chinese Discussion Forum Parallel Training Data
<https://catalog.ldc.upenn.edu/LDC2017T05> was developed by LDC and
consists of 1,876,799 tokens of Chinese discussion forum data collected for
the DARPA BOLT program along with their corresponding English translations.

The BOLT <https://www.ldc.upenn.edu/collaborations/current-projects/bolt>
(Broad Operational Language Translation) program developed machine translation and
information retrieval for less formal genres, focusing particularly on
user-generated content. LDC supported the BOLT program by collecting
informal data sources -- discussion forums, text messaging and chat -- in
Chinese, Egyptian Arabic and English. The collected data was translated and
annotated for various tasks including word alignment, treebanking,
propbanking and co-reference.

The source data in this release consists of discussion forum threads
harvested from the Internet by LDC using a combination of manual and
automatic processes. The full source data collection is released as BOLT
Chinese Discussion Forums (LDC2016T05
<https://catalog.ldc.upenn.edu/LDC2016T05>). Word-aligned and tagged data
is released as BOLT Chinese-English Word Alignment and Tagging - Discussion
Forum Training (LDC2016T19 <https://catalog.ldc.upenn.edu/LDC2016T19>).

BOLT Chinese Discussion Forum Parallel Training Data is distributed via web
download.



2017 Subscription Members will automatically receive copies of this corpus.
2017 Standard Members may request a copy as part of their 16 free
membership corpora. Non-members may license this data for US $1750.

*

(2) IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d
<https://catalog.ldc.upenn.edu/LDC2017S05> was developed by Appen for the
IARPA (Intelligence Advanced Research Projects Activity) Babel program. It
contains approximately 200 hours of Swahili conversational and scripted
telephone speech collected from 2012-2014 along with corresponding
transcripts.

The Babel program focuses on underserved languages and seeks to develop
speech recognition technology that can be rapidly applied to any human
language to support keyword search performance over large amounts of
recorded speech.



The Swahili speech in this release represents that spoken in the Nairobi
dialect region of Kenya. The gender distribution among speakers is
approximately equal; speakers' ages range from 16 years to 65 years. Calls
were made using different telephones (e.g., mobile, landline) from a
variety of environments including the street, a home or office, a public
place, and inside a vehicle.



Transcripts are encoded in UTF-8.



IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d is distributed via
web download.



2017 Subscription Members will receive copies of this corpus provided they
have submitted a completed copy of the special license agreement. 2017
Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for US $25.

*

(3) Noisy TIMIT Speech <https://catalog.ldc.upenn.edu/LDC2017S04> was
developed by the Florida Institute of Technology <http://www.fit.edu/> and
contains approximately 322 hours of speech from the TIMIT Acoustic-Phonetic
Continuous Speech Corpus (LDC93S1 <https://catalog.ldc.upenn.edu/LDC93S1>)
modified with different additive noise levels. Only the audio has been
modified; the original arrangement of the TIMIT corpus is still as
described by the TIMIT documentation.

The additive noise types are white, pink, blue, red, violet and babble, with
levels varying in 5 dB (decibel) steps, ranging from 5 to 50 dB. The color
noise types were generated artificia

Re: [VOTE] Release Apache Joshua 6.1 (Incubating) RC4

2017-03-16 Thread lewis john mcgibbney
Hi Tommaso,
It looks like you caught the PPMC on a bad week... we will get the VOTE
done, don't worry ;)
Thanks for putting the RC together.
Comments inline

On Mon, Mar 13, 2017 at 3:58 PM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

SIGS look good, as do the tags and staging repos.

In the primary release src at
https://dist.apache.org/repos/dist/dev/incubator/joshua/6.1/joshua-incubating-6.1-src.tar.gz,
the compressed archive is called joshua-incubating-6.1-src, but when I
decompress it, the directory is called apache-joshua-6.1-incubating. This is a
minor inconsistency that we may wish to address for the next incubating release.
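The naming mismatch above can be checked mechanically by comparing an archive's file name with the top-level directory it unpacks to. A small sketch, assuming a gzip-compressed tarball whose entries share one root directory; `check_archive_root` is a hypothetical helper:

```shell
# Hypothetical helper: compare a .tar.gz file name (minus the extension)
# with the top-level directory (or directories) its entries live under.
check_archive_root() {
  archive="$1"
  expected=$(basename "$archive" .tar.gz)
  # first path component of every entry, de-duplicated
  roots=$(tar -tzf "$archive" | awk -F/ '{print $1}' | sort -u)
  if [ "$roots" = "$expected" ]; then
    echo "consistent: $expected"
  else
    echo "mismatch: archive $expected unpacks to $roots"
  fi
}

# Usage:
# check_archive_root joshua-incubating-6.1-src.tar.gz
```

For the RC4 source tarball this would be expected to report `mismatch: archive joshua-incubating-6.1-src unpacks to apache-joshua-6.1-incubating`.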

When I build (mvn clean install) I get the following... damn laptop. This
is the same issue I got when I tried to spin the original RC2 myself. It is
specific to my environment, so not a blocker.

[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 29.351 s
[INFO] Finished at: 2017-03-16T17:07:16-07:00
[INFO] Final Memory: 41M/697M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single (source-release-assembly) on project joshua-incubating: Execution source-release-assembly of goal org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single failed: user id '498339010' is too big ( > 2097151 ). -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
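The limit in this error is not arbitrary: 2097151 is octal 7777777, the largest value that fits in the 8-byte uid/gid fields of a classic ustar tar header (7 octal digits plus a terminator), so accounts with very large numeric uids, such as the 498339010 above, overflow it. A sketch that checks the current uid against that limit; `uid_fits_ustar` is a hypothetical helper. A workaround widely reported for this error, not verified here, is to set the maven-assembly-plugin `tarLongFileMode` parameter to `posix`, which reportedly lets the archiver write POSIX extended headers for large ids.

```shell
# The classic ustar header stores uid and gid as 7 octal digits plus a
# terminator, so the largest id it can hold is octal 7777777.
USTAR_MAX_ID=$(( 07777777 ))   # 2097151, the limit quoted in the Maven error

# Hypothetical helper: report whether a numeric uid fits in a ustar header.
uid_fits_ustar() {
  if [ "$1" -gt "$USTAR_MAX_ID" ]; then
    echo "uid $1 is too big (> $USTAR_MAX_ID)"
    return 1
  fi
  echo "uid $1 fits"
}

# Check the account that will run the build.
uid_fits_ustar "$(id -u)" || echo "this uid cannot be stored in a plain ustar header"
```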

A mvn clean test results in the following

[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 18.971 s
[INFO] Finished at: 2017-03-16T17:09:25-07:00
[INFO] Final Memory: 34M/608M
[INFO]


CHANGES, DISCLAIMER, LICENSE, NOTICE and README all look good. DOAP is
slightly out of date; however, it reflects the first RC.


[X] +1, let's get it released!!!

Thank you Tommaso



[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues

2017-02-21 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876442#comment-15876442
 ] 

Lewis John McGibbney commented on JOSHUA-324:
-

[~teofili] yes thank you very much, please do.

> Address Apache Joshua 6.1 RC#2 Issues
> -
>
> Key: JOSHUA-324
> URL: https://issues.apache.org/jira/browse/JOSHUA-324
> Project: Joshua
>  Issue Type: Task
>Affects Versions: 6.1
>    Reporter: Lewis John McGibbney
>Assignee: Tommaso Teofili
>Priority: Blocker
> Fix For: 6.1
>
>
> Feedback from [~jmclean] (thank you Justin) on our RC#2 is as follows
> {code}
> ==
> - You're missing incubating in the release artifacts' name. [1]
> - There are a number of binary files in the source release that look to be
> compiled source code.
> I checked:
> - name doesn’t include incubating
> - signatures and hashes correct
> - DISCLAIMER exists
> - LICENSE is missing a few things (see below)
> - a source file is missing an Apache header [7]
> - Several unexpected binary files are contained in the source release
> [8][9][10][11]
> - Can compile from source
> License is missing:
> - MIT licensed normalize.css v3.0.3 bundled in [5]
> - glyph icon fonts [6]
> Not an issue but it's a little odd to have LICENSE and NOTICE.txt - usually
> both are bare or both have .txt extension.
> Also, while looking at your site, I noticed that the download links of your
> incubating site [2] point to GitHub; please change them to point to the official
> release area.
> Also, the 6.1 release has already been tagged and is available for public
> download on GitHub [4] before this vote is finished. This is IMO against
> Apache release policy [3]; please remove it.
> I also notice you recently released the language packs (18th Nov) but there
> doesn’t seem to have been a vote for that? Any reason for this?
> ===
> [1] http://incubator.apache.org/incubation/Incubation_Policy.html#Releases
> [2] 
> https://cwiki.apache.org/confluence/display/JOSHUA/Apache+Joshua+%28Incubating%29+Home
> [3] http://www.apache.org/dev/release.html#what
> [4] https://github.com/apache/incubator-joshua/releases
> [5] ./demo/bootstrap/css/bootstrap.min.css
> [6] apache-joshua-6.1/demo/bootstrap/fonts/*
> [7] ./src/test/java/org/apache/joshua/decoder/ff/tm/OwnerMapTest.java
> [8] ./bin/GIZA++
> [9] ./bin/mkcls
> [10] ./bin/snt2cooc.out
> [11] ./src/test/resources/berkeley_lm/lm.berkeleylm.gz
> [12] http://www.mail-archive.com/general%40incubator.apache.org/msg57543.html
> [13] http://www.mail-archive.com/general%40incubator.apache.org/msg57551.html
> {code}
> This is a blocking issue and until addressed we cannot release 6.1-incubating



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
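The missing Apache header flagged in [7] is the kind of problem Apache RAT (Release Audit Tool, also available as a Maven plugin) reports. As a quick approximation without RAT, a grep-based sketch that lists `.java` files lacking the opening phrase of the standard ASF source header; `missing_asf_header` is a hypothetical helper and only looks at Java files:

```shell
# Hypothetical helper: list .java files under a tree that do not contain the
# opening phrase of the standard ASF license header.
missing_asf_header() {
  # grep -L prints files with no match; its exit status varies between
  # grep versions, so it is ignored here
  find "$1" -type f -name '*.java' \
    -exec grep -L 'Licensed to the Apache Software Foundation' {} + || true
}

# Usage on an unpacked source release:
# missing_asf_header apache-joshua-6.1-incubating
```

Run over the RC#2 tree, the file cited in [7] (`OwnerMapTest.java`) should appear in its output; other languages would need their own patterns.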


Fwd: Google Summer of Code 2017 is coming

2017-02-03 Thread lewis john mcgibbney
Hi Folks,
Please see below. If anyone is interested in participating in or mentoring
a GSoC project, please respond to this thread. Usually, from there you
can open a Jira ticket in whichever project you are interested in, and
we take it from there.
Have a great weekend.
Lewis


-- Forwarded message --
From: Ulrich Stärk 
Date: Fri, Feb 3, 2017 at 11:50 AM
Subject: Google Summer of Code 2017 is coming
To: ment...@community.apache.org


Hello PMCs (incubator Mentors, please forward this email to your podlings),

Google Summer of Code [1] is a program sponsored by Google allowing students
to spend their summer working on open source software. Students will receive
stipends for developing open source software full-time for three months.
Projects will provide mentoring and project ideas, and in return have the
chance to get new code developed and - most importantly - to identify and
bring in new committers.

The ASF will apply as a participating organization, meaning individual
projects don't have to apply separately.

If you want to participate with your project, we ask you to do the following
things as soon as possible, but no later than 2017-02-09:

1. understand what it means to be a mentor [2].

2. record your project ideas.

Just create issues in JIRA, label them with gsoc2017, and they will show up
at [3]. Please be as specific as possible when describing your idea. Include
the programming language, the tools and skills required, but try not to scare
potential students away. They are supposed to learn what's required before
the program starts.

Use labels, e.g. for the programming language (java, c, c++, erlang, python,
brainfuck, ...) or technology area (cloud, xml, web, foo, bar, ...) and
record them at [5].

Please use the COMDEV JIRA project for recording your ideas if your project
doesn't use JIRA (e.g. httpd, ooo). Contact d...@community.apache.org if you
need assistance.

[4] contains some additional information (will be updated for 2017 shortly).

3. subscribe to ment...@community.apache.org; this list is restricted to
potential mentors and meant to be used as a private list (please hold general
discussions on the public d...@community.apache.org list as much as
possible). Use a recognized address when subscribing (@apache.org or one of
your alias addresses on record).

Note that the ASF isn't accepted as a participating organization yet;
nevertheless, you *have to* start recording your ideas now or we will not get
accepted.

Over the years we were able to complete hundreds of projects successfully.
Some of our prior students are active contributors now! Let's make this year
a success again!

Cheers,

Uli

P.S.: Except for the private parts (label spreadsheet mostly), this email is
free to be shared publicly if you want to.

[1] https://summerofcode.withgoogle.com/
[2] http://community.apache.org/guide-to-being-a-mentor.html
[3] http://s.apache.org/gsoc2017ideas
[4] http://community.apache.org/gsoc.html
[5] http://s.apache.org/gsoclabels






[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues

2017-01-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15841593#comment-15841593
 ] 

Lewis John McGibbney commented on JOSHUA-324:
-

Hi Team, OK, I need someone else to please try out the release procedure
I've been documenting over at 
https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure#JoshuaReleaseManagementProcedure-Preparingareleasecandidate%28RC%29forcommunityVOTE%27ing
I cannot get past the issue I've been documenting on the mailing list, and it is
driving me insane.

Mailing list thread is available at 
http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg02023.html 

> Address Apache Joshua 6.1 RC#2 Issues
> -
>
> Key: JOSHUA-324
> URL: https://issues.apache.org/jira/browse/JOSHUA-324
> Project: Joshua
>  Issue Type: Task
>Affects Versions: 6.1
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 6.1
>
>
> Feedback from [~jmclean] (thank you Justin) on our RC#2 is as follows
> {code}
> ==
> - You're missing incubating in the release artifacts' name. [1]
> - There are a number of binary files in the source release that look to be
> compiled source code.
> I checked:
> - name doesn’t include incubating
> - signatures and hashes correct
> - DISCLAIMER exists
> - LICENSE is missing a few things (see below)
> - a source file is missing an Apache header [7]
> - Several unexpected binary files are contained in the source release
> [8][9][10][11]
> - Can compile from source
> License is missing:
> - MIT licensed normalize.css v3.0.3 bundled in [5]
> - glyph icon fonts [6]
> Not an issue, but it's a little odd to have LICENSE and NOTICE.txt; usually
> both are bare or both have a .txt extension.
> Also, while looking at your site, I noticed that the download links on your
> incubating site [2] point to GitHub; please change them to point to the official
> release area.
> Also, the 6.1 release has already been tagged and is available for public
> download on GitHub [4] before this vote is finished. This is IMO against
> Apache release policy [3]; please remove it.
> I also notice you recently released the language packs (18th Nov) but there
> doesn’t seem to have been a vote for that? Any reason for this?
> ===
> [1] http://incubator.apache.org/incubation/Incubation_Policy.html#Releases
> [2] https://cwiki.apache.org/confluence/display/JOSHUA/Apache+Joshua+%28Incubating%29+Home
> [3] http://www.apache.org/dev/release.html#what
> [4] https://github.com/apache/incubator-joshua/releases
> [5] ./demo/bootstrap/css/bootstrap.min.css
> [6] apache-joshua-6.1/demo/bootstrap/fonts/*
> [7] ./src/test/java/org/apache/joshua/decoder/ff/tm/OwnerMapTest.java
> [8] ./bin/GIZA++
> [9] ./bin/mkcls
> [10] ./bin/snt2cooc.out
> [11] ./src/test/resources/berkeley_lm/lm.berkeleylm.gz
> [12] http://www.mail-archive.com/general%40incubator.apache.org/msg57543.html
> [13] http://www.mail-archive.com/general%40incubator.apache.org/msg57551.html
> {code}
> This is a blocking issue; until it is addressed we cannot release 6.1-incubating.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues

2017-01-25 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838087#comment-15838087
 ] 

Lewis John McGibbney commented on JOSHUA-324:
-

[~post] the only pending issue is the mvn assembly issue I described at 
http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg02023.html
I'll have a crack today and try to resolve it.



mvn assembly issues

2017-01-18 Thread lewis john mcgibbney
Hi Folks,
Does anyone know how to work around this issue? The code in question can be
found at
https://github.com/apache/incubator-joshua/blob/master/pom.xml#L287-L309
Lewis

[INFO]

[INFO] BUILD FAILURE
[INFO]

[INFO] Total time: 16.222 s
[INFO] Finished at: 2017-01-18T13:59:41-08:00
[INFO] Final Memory: 37M/639M
[INFO]

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single (source-release-assembly) on project joshua-incubating: Execution source-release-assembly of goal org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single failed: user id '498339010' is too big ( > 2097151 ). -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
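For context: the limit in the error matches the classic POSIX ustar tar header, which stores the numeric user id in a 7-digit octal field, so the largest representable id is octal 7777777 = 2097151. The minimal bash sketch below checks whether a given uid fits; the helper name `uid_fits_ustar` is hypothetical and the script is an illustration, not part of the Joshua build. A commonly suggested workaround, not tested in this thread, is to set `<tarLongFileMode>posix</tarLongFileMode>` in the maven-assembly-plugin configuration so large ids are written with POSIX extended headers; please verify that against the plugin documentation for the version in use.

```bash
#!/usr/bin/env bash
# Sketch: does a numeric user id fit in the 7-digit octal uid field of a
# POSIX ustar tar header? (uid_fits_ustar is a hypothetical helper name.)

# Largest value representable in 7 octal digits: 8^7 - 1 = 2097151.
ustar_max_id=$((8#7777777))

uid_fits_ustar() {
  local uid=$1
  (( uid <= ustar_max_id ))
}

current_uid=$(id -u)
if uid_fits_ustar "$current_uid"; then
  echo "UID $current_uid fits in a ustar header (max $ustar_max_id)."
else
  echo "UID $current_uid exceeds the ustar limit of $ustar_max_id."
fi
```

With the id from the log above, `uid_fits_ustar 498339010` returns a non-zero status.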



[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues

2017-01-17 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827360#comment-15827360
 ] 

Lewis John McGibbney commented on JOSHUA-324:
-

I'll finish my QA and produce an RC#3 tomorrow, folks. Thanks.
I've just committed 
{code}
commit ae755a8bc0b1de9475285fcc8d35d8a8b5f00a6f
Author: Lewis John McGibbney 
Date:   Tue Jan 17 19:12:10 2017 -0800

JOSHUA-324 Address Apache Joshua 6.1 RC#2 Issues
{code}



Re: Rebase on Relese

2017-01-17 Thread lewis john mcgibbney
Hi Matt,

On Mon, Jan 16, 2017 at 9:27 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: Matt Post 
> To: dev@joshua.incubator.apache.org
> Cc:
> Date: Fri, 13 Jan 2017 12:28:59 -0500
> Subject: Re: Rebase on Relese
> Hi Lewis,
>
> Welcome back!
>
> I think we have checked off all the things on your list, and are ready any
> time for the release. Do you have the time to double-check, and then to
> head up this effort?
>
>
I'll try to get back on to it today.
Thanks for everyone's patience.
Lewis


Re: Plugging self-hosted Joshua into mailman?

2017-01-17 Thread lewis john mcgibbney
Hi Karel,
The short answer is yes.
I would advise you to start with the tutorial at
https://cwiki.apache.org/confluence/display/JOSHUA/Getting+Started
If anything causes you problems, please write back here.
Once you have worked through the tutorial, you will have a much better
feel for the workflow required.
I can see the Apache Tika language identification and translate APIs being
of particular use here in a runtime context. We have a
Joshua implementation over in Tika which can help you with this task; however,
try the Joshua tutorial first.
Lewis

On Mon, Jan 16, 2017 at 7:41 AM, Chris Mattmann  wrote:

> Hi Karel,
>
> I would recommend moving this thread to dev@joshua.incubator.apache.org
> instead of the private list. I’ve moved private to BCC.
>
> Thank you.
>
> Cheers,
> Chris
>
>
>
> On 1/16/17, 6:58 AM, wrote:
>
> Hello,
>
> We would like to build a self-hosted machine translation system that
> could be plugged into our mailman installs. The objective is that the
> members of our multicultural network would be able to send email in
> their mother language and it would be delivered to the list
> machine-translated (and vice versa).
>
> Are we on the right track with Joshua? I suppose that a lot of
> configuration would be needed, but at this point I want to know whether I
> am completely mistaken in considering your software for this.
>
> Thanks
>
> karel
>
>
> --
> ~~~
> Karel Novotny
> Knowledge Sharing & Network Development Coordinator
> APC - The Association for Progressive Communications
> https://www.apc.org
> GSM: +420 605 243 246 (GMT +1)
> jabber: ka...@riseup.net
> Working/online: Monday - Thursday
> ~~~
> My public OpenPGP key: https://pgp.mit.edu/pks/lookup?op=get&search=0x7FDEF502377E4FCA
>
>
>
>
>
>




Rebase on Relese

2017-01-13 Thread lewis john mcgibbney
Hi Folks,
Where are we with the release? I apologize for disappearing; my phone and
laptop were off for close to 3 weeks.
Can someone bring me up to date on where we are?
Thanks
Lewis



Skype’s real-time translation now works for calls to mobiles and landlines - The Verge

2016-12-12 Thread Lewis John Mcgibbney
Skype’s real-time translation now works for calls to mobiles and landlines - The Verge


https://apple.news/ATIBym7JBSmmPm4-jNBOMdA


Re: Downloading of non ASF licensed code

2016-12-02 Thread lewis john mcgibbney
Hi Matt,
Can you please open a ticket for this and commit it to the master branch?
Thanks

On Thu, Dec 1, 2016 at 4:49 AM,  wrote:

> From: Matt Post 
> To: dev@joshua.incubator.apache.org
> Cc:
> Date: Mon, 28 Nov 2016 11:16:26 -0500
> Subject: Re: Downloading of non ASF licensed code
> This would be easy to do. Maybe just a simple prompt that alerts the user?
> Something like
>
> echo "Warning: this script downloads many tools used in building and running"
> echo "Joshua. Not all of them are Apache Licensed. If you wish to continue, hit Enter."
> read -r j
> # A bare Enter continues; any other input aborts the script.
> if [[ -n $j ]]; then
>   echo "Quitting."
>   exit 1
> fi
>
>
>


[jira] [Commented] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues

2016-11-29 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706577#comment-15706577
 ] 

Lewis John McGibbney commented on JOSHUA-324:
-

Hi Folks, I've assigned this to myself and will begin working on a pull request 
to incrementally address the above issues.



[jira] [Assigned] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues

2016-11-29 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reassigned JOSHUA-324:
---

Assignee: Lewis John McGibbney



Issues to Fix with Apache Joshua 6.1 RC#2

2016-11-29 Thread lewis john mcgibbney
Hi Folks,
We have a number of issues to fix which were picked up over on general@. In
particular, we received excellent feedback from my good friend Justin [12]
[13]. As the general@ VOTE has not yet been open for 72 hours, I am not going
to close it; however, we should use this time to fix the issues on master
before we spin an RC#3. They can be summarized as follows.
I've opened a Jira issue to track all of this:
https://issues.apache.org/jira/browse/JOSHUA-324
Let's track the progress on the Jira ticket.

==
- You're missing incubating in the release artifacts' name. [1]
- There are a number of binary files in the source release that look to be
compiled source code.

I checked:
- name doesn’t include incubating
- signatures and hashes correct
- DISCLAIMER exists
- LICENSE is missing a few things (see below)
- a source file is missing an Apache header [7]
- Several unexpected binary files are contained in the source release
[8][9][10][11]
- Can compile from source

License is missing:
- MIT licensed normalize.css v3.0.3 bundled in [5]
- glyph icon fonts [6]

Not an issue, but it's a little odd to have LICENSE and NOTICE.txt; usually
both are bare or both have a .txt extension.

Also, while looking at your site, I noticed that the download links on your
incubating site [2] point to GitHub; please change them to point to the official
release area.
Also, the 6.1 release has already been tagged and is available for public
download on GitHub [4] before this vote is finished. This is IMO against
Apache release policy [3]; please remove it.

I also notice you recently released the language packs (18th Nov) but there
doesn’t seem to have been a vote for that? Any reason for this?
===

[1] http://incubator.apache.org/incubation/Incubation_Policy.html#Releases
[2] https://cwiki.apache.org/confluence/display/JOSHUA/Apache+Joshua+%28Incubating%29+Home
[3] http://www.apache.org/dev/release.html#what
[4] https://github.com/apache/incubator-joshua/releases
[5] ./demo/bootstrap/css/bootstrap.min.css
[6] apache-joshua-6.1/demo/bootstrap/fonts/*
[7] ./src/test/java/org/apache/joshua/decoder/ff/tm/OwnerMapTest.java
[8] ./bin/GIZA++
[9] ./bin/mkcls
[10] ./bin/snt2cooc.out
[11] ./src/test/resources/berkeley_lm/lm.berkeleylm.gz
[12] http://www.mail-archive.com/general%40incubator.apache.org/msg57543.html
[13] http://www.mail-archive.com/general%40incubator.apache.org/msg57551.html
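A few of the points in Justin's review can be checked mechanically before spinning the next RC. The bash sketch below assumes the source release has been unpacked into a directory; the function name `preflight` and the exact checks are illustrative assumptions, covering only the naming, DISCLAIMER, and LICENSE/NOTICE consistency points. License contents, Apache headers, and bundled binaries still need human review.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for an unpacked incubator source release.
# Prints one line per problem and returns non-zero if any were found.
preflight() {
  local dir=$1 problems=0

  # 1. The release name should carry the "incubating" marker.
  case "$(basename "$dir")" in
    *incubating*) ;;
    *) echo "name lacks 'incubating': $(basename "$dir")"; problems=$((problems + 1)) ;;
  esac

  # 2. Podling releases ship a DISCLAIMER file.
  if [ ! -f "$dir/DISCLAIMER" ] && [ ! -f "$dir/DISCLAIMER.txt" ]; then
    echo "missing DISCLAIMER"; problems=$((problems + 1))
  fi

  # 3. LICENSE and NOTICE should use the same naming style (both bare or both .txt).
  if { [ -f "$dir/LICENSE" ] && [ -f "$dir/NOTICE.txt" ]; } ||
     { [ -f "$dir/LICENSE.txt" ] && [ -f "$dir/NOTICE" ]; }; then
    echo "LICENSE/NOTICE naming differs"; problems=$((problems + 1))
  fi

  (( problems == 0 ))
}
```

Run against a directory named `apache-joshua-6.1` containing LICENSE and NOTICE.txt, it would report both the missing marker and the naming mismatch.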




[jira] [Created] (JOSHUA-324) Address Apache Joshua 6.1 RC#2 Issues

2016-11-29 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created JOSHUA-324:
---

 Summary: Address Apache Joshua 6.1 RC#2 Issues
 Key: JOSHUA-324
 URL: https://issues.apache.org/jira/browse/JOSHUA-324
 Project: Joshua
  Issue Type: Task
Affects Versions: 6.1
Reporter: Lewis John McGibbney
Priority: Blocker
 Fix For: 6.1


Feedback from [~jmclean] (thank you Justin) on our RC#2 is as follows
{code}
==
- You're missing incubating in the release artifacts' name. [1]
- There are a number of binary files in the source release that look to be
compiled source code.

I checked:
- name doesn’t include incubating
- signatures and hashes correct
- DISCLAIMER exists
- LICENSE is missing a few things (see below)
- a source file is missing an Apache header [7]
- Several unexpected binary files are contained in the source release
[8][9][10][11]
- Can compile from source

License is missing:
- MIT licensed normalize.css v3.0.3 bundled in [5]
- glyph icon fonts [6]

Not an issue, but it's a little odd to have LICENSE and NOTICE.txt; usually
both are bare or both have a .txt extension.

Also, while looking at your site, I noticed that the download links on your
incubating site [2] point to GitHub; please change them to point to the official
release area.
Also, the 6.1 release has already been tagged and is available for public
download on GitHub [4] before this vote is finished. This is IMO against
Apache release policy [3]; please remove it.

I also notice you recently released the language packs (18th Nov) but there
doesn’t seem to have been a vote for that? Any reason for this?
===

[1] http://incubator.apache.org/incubation/Incubation_Policy.html#Releases
[2] https://cwiki.apache.org/confluence/display/JOSHUA/Apache+Joshua+%28Incubating%29+Home
[3] http://www.apache.org/dev/release.html#what
[4] https://github.com/apache/incubator-joshua/releases
[5] ./demo/bootstrap/css/bootstrap.min.css
[6] apache-joshua-6.1/demo/bootstrap/fonts/*
[7] ./src/test/java/org/apache/joshua/decoder/ff/tm/OwnerMapTest.java
[8] ./bin/GIZA++
[9] ./bin/mkcls
[10] ./bin/snt2cooc.out
[11] ./src/test/resources/berkeley_lm/lm.berkeleylm.gz
[12] http://www.mail-archive.com/general%40incubator.apache.org/msg57543.html
[13] http://www.mail-archive.com/general%40incubator.apache.org/msg57551.html
{code}
This is a blocking issue; until it is addressed we cannot release 6.1-incubating.





[RESULT] WAS Re: [VOTE] Release Apache Joshua 6.1 RC#2

2016-11-28 Thread lewis john mcgibbney
Evening All,
OK, 72 hours have come and gone. I'm going to close off this VOTE thread. The
following VOTEs were cast.

[10] +1, let's get it released!!!
Lewis John McGibbney
Matt Post
Tommaso Teofili
John Hewitt
Kellen Sunderland
Tom Barber
Chris A. Mattmann
Henry Saputra
Michael A. Hedderich
Felix Hieber

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)
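For reference, the ASF release policy decides release votes by majority approval: at least three binding +1 votes and more binding +1 than -1 votes. For a podling, the binding votes are ultimately those of Incubator PMC members on general@. The bash sketch below encodes only that arithmetic; the helper name `vote_passes` is hypothetical, and the thread does not record which of the ten votes above were binding.

```bash
#!/usr/bin/env bash
# Sketch of the "majority approval" rule used for ASF release votes.
# Arguments are counts of BINDING votes only (vote_passes is hypothetical).
vote_passes() {
  local plus=$1 minus=$2
  # At least three binding +1 votes, and more +1s than -1s.
  (( plus >= 3 && plus > minus ))
}

for tally in "10 0" "2 1" "3 3"; do
  # Word-splitting of $tally is intentional: "10 0" -> plus=10, minus=0.
  if vote_passes $tally; then
    echo "+1/-1 = $tally: passes"
  else
    echo "+1/-1 = $tally: does not pass"
  fi
done
```

Note that a -1 on a release vote is not a veto under this rule; it counts against the +1s.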

Thank you to everyone who VOTEd. I'll proceed to general@ and see how we
get on.
Thanks
Lewis

On Tue, Nov 22, 2016 at 9:15 PM, lewis john mcgibbney 
wrote:

> Hello user@ and dev,
> Please VOTE on the Apache Joshua 6.1 Release Candidate #2.
>
> We solved 50 issues: https://s.apache.org/joshua6.1
>
> Git source tag (29c8be650d53216f779a340d33f8f61af4d45629):
> https://s.apache.org/pk2t
>
> Staging repo: https://repository.apache.org/content/repositories/orgapachejoshua-1001/
>
> Source Release Artifacts: https://dist.apache.org/repos/dist/dev/incubator/joshua/
>
> PGP release keys (signed using 48BAEBF6): https://dist.apache.org/repos/dist/release/incubator/joshua/KEYS
>
> Vote will be open for 72 hours.
> Thank you to everyone that is able to VOTE as well as everyone that
> contributed to Apache Joshua 6.1.
>
> [ ] +1, let's get it released!!!
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. here is my +1
>
>





[VOTE] Release Apache Joshua 6.1 RC#2

2016-11-22 Thread lewis john mcgibbney
Hello user@ and dev,
Please VOTE on the Apache Joshua 6.1 Release Candidate #2.

We solved 50 issues: https://s.apache.org/joshua6.1

Git source tag (29c8be650d53216f779a340d33f8f61af4d45629):
https://s.apache.org/pk2t

Staging repo:
https://repository.apache.org/content/repositories/orgapachejoshua-1001/

Source Release Artifacts: https://dist.apache.org/repos/dist/dev/incubator/joshua/

PGP release keys (signed using 48BAEBF6): https://dist.apache.org/repos/dist/release/incubator/joshua/KEYS
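Voters usually confirm that each artifact's signature and checksum are correct before casting a vote. The bash sketch below covers only the checksum half; it assumes GNU coreutils' `sha512sum` is available and that the expected digest is a hex string, possibly upper-case or split by whitespace as in some `.sha512` files. The helper name `verify_sha512` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare a file's SHA-512 digest with an expected hex digest.
# Requires GNU coreutils' sha512sum; verify_sha512 is a hypothetical helper.
verify_sha512() {
  local artifact=$1 expected actual
  # Normalize the expected digest: drop whitespace, lower-case the hex.
  expected=$(printf '%s' "$2" | tr -d '[:space:]' | tr 'A-F' 'a-f')
  # sha512sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha512sum "$artifact" | awk '{print $1}')
  # An unreadable file yields an empty digest, which never matches.
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

The signature half uses standard GnuPG commands: import the published KEYS file with `gpg --import KEYS`, then run `gpg --verify <artifact>.asc <artifact>`. The artifact file names themselves are not listed in this thread.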

Vote will be open for 72 hours.
Thank you to everyone that is able to VOTE as well as everyone that
contributed to Apache Joshua 6.1.

[ ] +1, let's get it released!!!
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)

P.S. here is my +1



[jira] [Updated] (JOSHUA-315) Thrax keeps all rules

2016-11-22 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-315:

Fix Version/s: (was: 6.2)
   6.1

> Thrax keeps all rules
> -
>
> Key: JOSHUA-315
> URL: https://issues.apache.org/jira/browse/JOSHUA-315
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
> Fix For: 6.1
>
>
> When extracting rules, Thrax keeps *all* target-side options for each source side. For
> large bitexts and common source sides (e.g., "de" for Spanish–English), there
> can be tens of thousands of translations, due to errors in the alignments and
> phenomena like garbage collection. The decoder throws out all but the top
> num_translation_options of these (default 20), but before doing so, it has to
> score all the target-side options with all feature functions, including the
> language model. This slows down "warming up" of the model and means that the
> first sentences to use these items are very slow to translate.
> I have updated scripts/training/filter-rules.pl to filter using Thrax's
> rarity penalty field, but it would be much better if Thrax were to keep only
> the 100 most frequent translation options for each source side.
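The requested behaviour, keeping only the N most frequent translation options per source side, amounts to a per-key top-N filter. The bash sketch below illustrates that idea with sort and awk. It assumes a simplified, hypothetical tab-free input format of `source ||| target ||| count`, not Thrax's actual grammar layout, and it is not how filter-rules.pl or Thrax implement filtering.

```bash
#!/usr/bin/env bash
# Sketch: keep at most N target-side options per source side.
# ASSUMED input (hypothetical, tab-free): "source ||| target ||| count"
keep_top_n() {
  local n=$1
  # 1) Prefix each rule with tab-separated sort keys: source, count.
  # 2) Sort by source (byte order), then by count, highest first.
  # 3) Emit only the first n rules seen for each source side.
  awk -F' [|][|][|] ' '{ print $1 "\t" $3 "\t" $0 }' |
    LC_ALL=C sort -t $'\t' -k1,1 -k2,2nr |
    awk -F'\t' -v n="$n" '$1 != prev { prev = $1; c = 0 } ++c <= n { print $3 }'
}
```

For example, `keep_top_n 100 < rules.txt` (a hypothetical file) would apply the limit suggested in the issue. Output is grouped by source side rather than kept in input order.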





Re: Dockerhub hosted images

2016-11-22 Thread lewis john mcgibbney
Hi Kellen,
Nice :)
Another option is for us to host these via the Apache account.
https://hub.docker.com/r/apache/
We could then add a badge to our README which points to the Dockerfile(s).
Do you want to open a ticket over on the INFRA Jira for this?

On Tue, Nov 22, 2016 at 1:57 PM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

> From: kellen sunderland 
> To: "dev@joshua.incubator.apache.org" 
> Cc:
> Date: Tue, 22 Nov 2016 22:56:56 +0100
> Subject: Re: Dockerhub hosted images
> Ok, the first image should be properly uploaded now.
>
> https://hub.docker.com/r/kellens/apache-joshua-es-en-2016-10-05/
>
> -Kellen
>
>


Dockerfile Issue

2016-11-22 Thread lewis john mcgibbney
Hi Kellen,
Have you fixed your Dockerfile issue? If so, please confirm and I can
spin out RC#2 for 6.1.
Thanks in advance Kellen.
Lewis



[RESULTS] WAS Re: [VOTE] Release Apache Joshua (Incubating) 6.1

2016-11-22 Thread lewis john mcgibbney
Hello Folks,
72 hours have come and gone, so I am going to close off this thread.
The VOTEs are as follows:

[2] +1, let's get it released!!!
Lewis John McGibbney
Chris Mattmann

[0] +/-0, fine, but consider to fix few issues before...

[1] -1, nope, because... (and please explain why)
Henry Saputra

The -1 is justified over at
http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg01891.html
Thank you, Henry, for your attention to detail. The VOTE does not pass; we
will respin an RC and VOTE again over on dev@joshua.
Lewis


On Fri, Nov 18, 2016 at 2:11 PM, lewis john mcgibbney 
wrote:

> Hello general@incubator,
> Please VOTE on the Apache Joshua 6.1 Release Candidate #1. The release
> VOTE has passed over on user@ and dev@joshua with the following results:
> http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg01884.html
>
> We solved 44 issues: https://s.apache.org/joshua6.1
>
> Git source tag (167489bbd78526b9833fe7c88646bf96101d5d2b):
> https://s.apache.org/joshua6.1tag
>
> Staging repo: https://repository.apache.org/content/repositories/orgapachejoshua-1000/
>
> Source Release Artifacts: https://dist.apache.org/repos/dist/dev/incubator/joshua/
>
> PGP release keys (signed using 48BAEBF6): https://dist.apache.org/repos/dist/release/incubator/joshua/KEYS
>
> Vote will be open for 72 hours.
> Thank you to everyone that is able to VOTE as well as everyone that
> contributed to Apache Joshua 6.1.
>
> [ ] +1, let's get it released!!!
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. here is my +1
>
>





[jira] [Updated] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'

2016-11-22 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated JOSHUA-316:

Fix Version/s: (was: 6.2)
   6.1

> run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a 
> bytes-like object is required, not 'str'
> -
>
> Key: JOSHUA-316
> URL: https://issues.apache.org/jira/browse/JOSHUA-316
> Project: Joshua
>  Issue Type: Bug
>  Components: bundler
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Critical
> Fix For: 6.1
>
>
> {code}
> [glue-tune] rebuilding...
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source
>  [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/create_glue_grammar.sh 
> /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed > 
> /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue
>   took 1 seconds (1s)
> [tune-bundle] rebuilding...
>   
> dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source
>  [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp2/tune/model/run-joshua.sh
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force 
> --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir 
> /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> /usr/local/joshua_resources/russian_experiments/exp2/tune/model 
> --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" 
> -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 
> tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function 
> "StateMinimizingLanguageModel -lm_order 5 -lm_file 
> /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm"  -tm0/type 
> hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm 
> /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed --tm 
> /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue
>   JOB FAILED (return code 1)
> * Running the copy-config.pl script with the command: 
> /usr/local/incubator-joshua/scripts/copy-config.pl -top-n 300 -output-format 
> "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 
> tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " 
> -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file 
> /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm"  -tm0/type 
> hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue
> Traceback (most recent call last):
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 748, in main
>     operations = collect_operations(opts)
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 637, in collect_operations
>     opts.copy_config_options
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 202, in filter_through_copy_config_script
>     result, err = p.communicate(config_text)
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1072, in communicate
>     stdout, stderr = self._communicate(input, endtime, timeout)
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1700, in _communicate
>     input_view = memoryview(self._input)
> TypeError: memoryview: a bytes-like object is required, not 'str'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 760, in 
>     main(sys.argv)
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 751, in main
>     error_quit(e.message)
> AttributeError: 'TypeError' object has no attribute 'message'
> * WARNING: no key 'outputformat' found in config file (appending to end)
> * WARNING: no key 'search' found in config file (appending to end)
> * WARNING: no key 'topn' found in config file (appending to end)
> * WARNING: no key 'markoovs' found in config file (appending to end)
> {code}
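The traceback shows two separate Python 3 problems in run_bundler.py: p.communicate(config_text) hands a str to a pipe opened in binary mode, and the fallback handler then reads e.message, an attribute Python 3 exceptions no longer have. Below is a minimal sketch of both fixes, assuming the script drives copy-config.pl through subprocess.Popen as the traceback suggests. filter_through_script is a hypothetical stand-in for filter_through_copy_config_script, the echo command is a placeholder for copy-config.pl, and this is not necessarily the patch that resolved JOSHUA-316.

```python
import subprocess
import sys


def filter_through_script(cmd, config_text):
    """Pipe config_text through an external command and return its stdout.

    Hypothetical stand-in for run_bundler.py's filter_through_copy_config_script.
    universal_newlines=True opens the pipes in text mode, so communicate()
    accepts and returns str instead of requiring bytes on Python 3.
    """
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, universal_newlines=True)
    result, err = p.communicate(config_text)
    if p.returncode != 0:
        raise RuntimeError("command failed with code %d: %s" % (p.returncode, err))
    return result


def main():
    # Placeholder filter that echoes stdin; the real script runs copy-config.pl here.
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    try:
        sys.stdout.write(filter_through_script(echo, "top-n = 300\n"))
    except Exception as e:
        # Python 3 exceptions have no .message attribute; str(e) works on 2 and 3.
        sys.exit("ERROR: " + str(e))


if __name__ == "__main__":
    main()
```

Opening the pipes in text mode lets the rest of the script keep working with str; the alternative is to call config_text.encode() before communicate() and decode the output afterwards.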



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391

2016-11-22 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/JOSHUA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved JOSHUA-317.
-
Resolution: Fixed

> SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
> 
>
> Key: JOSHUA-317
> URL: https://issues.apache.org/jira/browse/JOSHUA-317
> Project: Joshua
>  Issue Type: Bug
>  Components: tuner
>Affects Versions: 6.0.5
> Environment: Python 3.5
>Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
> Fix For: 6.1
>
>
> {code}
> [tune-bundle] rebuilding...
>   
> dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source
>  [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force 
> --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir 
> /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/model 
> --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" 
> -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 
> tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function 
> "StateMinimizingLanguageModel -lm_order 5 -lm_file 
> /usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm"  -tm0/type 
> hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm 
> /usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue
>   took 0 seconds (0s)
> [mert-1] rebuilding...
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
> [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> [CHANGED]
>   dep=tune/model/grammar.packed/slice_0.source [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru 
> --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner 
> mert --decoder 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command 
> --decoder-config 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> --decoder-output-file 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest 
> --decoder-log-file 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log 
> --iterations 10 --metric 'BLEU 4 closest'
>   JOB FAILED (return code 1)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391
> 'ITERATIONS': `iterations`,
>   ^
> SyntaxError: invalid syntax
> {code}
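The SyntaxError comes from Python 2 backtick syntax: in Python 2, `x` is shorthand for repr(x), and Python 3 removed that form entirely. A minimal sketch of the corrected entry, assuming the surrounding dictionary simply maps template keys to strings; the helper name is hypothetical, and only the ITERATIONS entry is visible in the log.

```python
def tuner_template_values(iterations):
    """Hypothetical helper mirroring the dict entry at run_tuner.py line 391."""
    return {
        # Python 2 accepted `iterations` as shorthand for repr(iterations).
        # Python 3 removed backtick syntax, so the call must be spelled out.
        'ITERATIONS': repr(iterations),
    }


values = tuner_template_values(10)
print(values['ITERATIONS'])  # prints 10
```

For an int, repr() and str() produce the same text, so either works here; the 2to3 tool's repr fixer performs this rewrite automatically.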





[jira] [Resolved] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'

2016-11-22 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved JOSHUA-316.
-
Resolution: Fixed






[jira] [Updated] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391

2016-11-22 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/JOSHUA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated JOSHUA-317:

Fix Version/s: (was: 6.2)
   6.1






Re: Unable to run the language packs - facing some errors

2016-11-22 Thread lewis john mcgibbney
Hi Dixon,
There is no need to include builds@; that list is for issues regarding
builds.apache.org.
The trace ends in java.lang.OutOfMemoryError: Java heap space while loading
the BerkeleyLM model, so all you need to do is increase the memory available
to the JVM... this can
be done from within the script(s) which you are invoking.
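The fix is a larger maximum heap, which the JVM takes through its standard -Xmx option. The language pack's own launch scripts are not shown in this thread, so the sketch below only illustrates where the flag sits on a java command line. The jar name and the 8g default are hypothetical; the main class is the one named in the stack trace.

```python
import shlex


def decoder_command(jar, *decoder_args, heap="8g"):
    """Build a java command line that starts the Joshua decoder with a larger heap.

    jar and the heap default are hypothetical; -Xmx sets the JVM's maximum heap.
    Any extra decoder arguments are appended unchanged.
    """
    return (["java", "-Xmx" + heap, "-cp", jar,
             "org.apache.joshua.decoder.JoshuaDecoder"] + list(decoder_args))


cmd = decoder_command("joshua-with-dependencies.jar", heap="16g")
# Quote each part so the line can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
# prints: java -Xmx16g -cp joshua-with-dependencies.jar org.apache.joshua.decoder.JoshuaDecoder
```

Whatever script launches java, the -Xmx value should stay below the machine's physical memory.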

On Mon, Nov 21, 2016 at 4:41 AM, Dixon Daniel  wrote:

> Hi,
>
> I am trying to run the German to English Language pack but I get the error
> shown below:
>
> dixon@HOME:~/Joshua/apache-joshua-de-en-2016-11-18$ cat example.de |
> ./prepare.sh | ./joshua
> Exception in thread "main" java.lang.RuntimeException: Unable to
> instantiate feature function 'LanguageModel -lm_type berkeleylm -lm_order 4
> -lm_file model/lm.berkeleylm'!
> at org.apache.joshua.decoder.Decoder.initializeFeatureFunctions(
> Decoder.java:632)
> at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:394)
> at org.apache.joshua.decoder.Decoder.(Decoder.java:128)
> at org.apache.joshua.decoder.JoshuaDecoder.main(
> JoshuaDecoder.java:69)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
> NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
> DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.joshua.decoder.Decoder.initializeFeatureFunctions(
> Decoder.java:628)
> ... 3 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.lang.reflect.Array.newArray(Native Method)
> at java.lang.reflect.Array.newInstance(Array.java:75)
> at java.io.ObjectInputStream.readArray(ObjectInputStream.
> java:1678)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.
> java:1347)
> at java.io.ObjectInputStream.defaultReadFields(
> ObjectInputStream.java:2018)
> at java.io.ObjectInputStream.readSerialData(
> ObjectInputStream.java:1942)
> at java.io.ObjectInputStream.readOrdinaryObject(
> ObjectInputStream.java:1808)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.
> java:1353)
> at java.io.ObjectInputStream.defaultReadFields(
> ObjectInputStream.java:2018)
> at java.io.ObjectInputStream.readSerialData(
> ObjectInputStream.java:1942)
> at java.io.ObjectInputStream.readOrdinaryObject(
> ObjectInputStream.java:1808)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.
> java:1353)
> at java.io.ObjectInputStream.defaultReadFields(
> ObjectInputStream.java:2018)
> at java.io.ObjectInputStream.readSerialData(
> ObjectInputStream.java:1942)
> at java.io.ObjectInputStream.readOrdinaryObject(
> ObjectInputStream.java:1808)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.
> java:1353)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.
> java:373)
> at edu.berkeley.nlp.lm.io.IOUtils.readObjFile(IOUtils.java:139)
> at edu.berkeley.nlp.lm.io.IOUtils.readObjFileHard(
> IOUtils.java:164)
> at edu.berkeley.nlp.lm.io.IOUtils.readObjFileHard(
> IOUtils.java:159)
> at edu.berkeley.nlp.lm.io.LmReaders.readLmBinary(
> LmReaders.java:337)
> at org.apache.joshua.decoder.ff.lm.berkeley_lm.
> LMGrammarBerkeley.(LMGrammarBerkeley.java:87)
> at org.apache.joshua.decoder.ff.lm.LanguageModelFF.
> initializeLM(LanguageModelFF.java:158)
> at org.apache.joshua.decoder.ff.lm.LanguageModelFF.(
> LanguageModelFF.java:132)
> ... 8 more
>
> Could you please help me resolve this error?
>
> Thanks,
> Dixon
>
> Have a great day!
>





[VOTE] Release Apache Joshua (Incubating) 6.1

2016-11-18 Thread lewis john mcgibbney
Hello general@incubator,
Please VOTE on the Apache Joshua 6.1 Release Candidate #1. The release VOTE
has passed over on user@ and dev@joshua with the following results
http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg01884.html.

We solved 44 issues: https://s.apache.org/joshua6.1

Git source tag (167489bbd78526b9833fe7c88646bf96101d5d2b):
https://s.apache.org/joshua6.1tag

Staging repo: https://repository.apache.org/content/repositories/orgapachejoshua-1000/

Source Release Artifacts: https://dist.apache.org/repos/dist/dev/incubator/joshua/

PGP release keys (signed using 48BAEBF6): https://dist.apache.org/repos/dist/release/incubator/joshua/KEYS

Vote will be open for 72 hours.
Thank you to everyone that is able to VOTE as well as everyone that
contributed to Apache Joshua 6.1.

[ ] +1, let's get it released!!!
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)

P.S. here is my +1



[RESULTS] WAS Re: [VOTE] Release Apache Joshua (Incubating) 6.1

2016-11-18 Thread lewis john mcgibbney
Hi Team,
OK, so I am going to bring this VOTE to a close... 72 hours have come and
gone.
The VOTEs are in and are as follows:

[8] +1, let's get it released!!!
Lewis John McGibbney**
Paul M. Ramirez**
Matt Post*
John Hewitt*
Tommaso Teofili**
Thamme Gowda*
Kellen Sunderland*
Felix Hieber*

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

* = Joshua PPMC
** Joshua PPMC + Incubator PMC
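The tally above can be checked against the ASF release policy's majority-approval rule: a release vote passes with at least three binding +1 votes and more binding +1 than -1 votes. The sketch below is illustrative only. It assumes every voter marked * or ** counts as binding, and the Ballot type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ballot:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # assumption: voters marked * or ** count as binding


def release_vote_passes(ballots):
    """Majority approval: at least three binding +1s and more binding +1s than -1s."""
    plus = sum(1 for b in ballots if b.binding and b.value == 1)
    minus = sum(1 for b in ballots if b.binding and b.value == -1)
    return plus >= 3 and plus > minus


voters = ["Lewis John McGibbney", "Paul M. Ramirez", "Matt Post", "John Hewitt",
          "Tommaso Teofili", "Thamme Gowda", "Kellen Sunderland", "Felix Hieber"]
ballots = [Ballot(name, 1, True) for name in voters]
print(release_vote_passes(ballots))  # prints True
```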

Thank you to everyone who was able to VOTE, much appreciated. Also, thank
you to everyone who has contributed to Joshua in the past. The
project is looking in great shape, and the release of the language packs is
just killer.
I'll progress with the VOTE over on general@incubator.
See you there!
Lewis

On Mon, Nov 14, 2016 at 9:16 AM, lewis john mcgibbney 
wrote:

> Hi Folks,
> Please VOTE on the Apache Joshua 6.1 Release Candidate #1.
>
> We solved 44 issues: https://s.apache.org/joshua6.1
>
> Git source tag (167489bbd78526b9833fe7c88646bf96101d5d2b):
> https://s.apache.org/joshua6.1tag
>
> Staging repo: https://repository.apache.org/content/repositories/orgapachejoshua-1000/
>
> Source Release Artifacts: https://dist.apache.org/repos/dist/dev/incubator/joshua/
>
> PGP release keys (signed using 48BAEBF6): https://dist.apache.org/repos/dist/release/incubator/joshua/KEYS
>
> Vote will be open for 72 hours.
> Thank you to everyone that is able to VOTE as well as everyone that
> contributed to Apache Joshua 6.1.
>
> [ ] +1, let's get it released!!!
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. here is my +1
>
>





RE: package-info.java

2016-11-16 Thread lewis john mcgibbney
Hi Matt,
I receive the digest email; however, I saw your email on the remote list.
The answer is here:
https://www.intertech.com/Blog/whats-package-info-java-for/
I couldn't find any Oracle or OpenJDK-level documentation.
package.html is deprecated in the current JDK.



RE: "mvn assembly" no longer works

2016-11-16 Thread lewis john mcgibbney
Hi Matt,
Again, I am on the digest and didn't receive your message, but I'll reply here.
There is no need to use the Maven assembly plugin anymore; simply execute
mvn package and you will then see
./target/joshua-6.2-SNAPSHOT-jar-with-dependencies.jar, exactly the same as
before, but now built by a default Maven task rather than a custom plugin
implementation.
Do we need to update the README?



Fwd: FW: November 2016 Newsletter -- LDC

2016-11-15 Thread lewis john mcgibbney
Hi Folks,
LDC newsletter FYI.
I wonder if we should start publishing our notices in their newsletter. I
think I'll make an inquiry about that exact topic.
Lewis

-- Forwarded message --
From: Mcgibbney, Lewis J (398M) 
Date: Tue, Nov 15, 2016 at 10:02 AM
Subject: FW: November 2016 Newsletter -- LDC
To: "lewis.mcgibb...@gmail.com" 







*From: *Ldc-customers1  on behalf of
Penn LDC 
*Date: *Tuesday, November 15, 2016 at 9:44 AM
*To: *Penn LDC 
*Subject: *November 2016 Newsletter -- LDC



*In this newsletter:*

*Join LDC for Membership Year 2017*

*Commercial use and LDC data*

*Spring 2017 Data Scholarship Program*

*LDC closed November 24-25 for US Thanksgiving Holiday*



*New publications:*

JANA: A Human-Human Dialogues Corpus for Egyptian Dialect
<https://catalog.ldc.upenn.edu/LDC2016T24>



Multi-Language Conversational Telephone Speech 2011 – Slavic Group
<https://catalog.ldc.upenn.edu/LDC2016S11>



IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a
<https://catalog.ldc.upenn.edu/LDC2016S12>

GALE Phase 3 and 4 Chinese Newswire Parallel Text
<https://catalog.ldc.upenn.edu/LDC2016T25>



*Join LDC for Membership Year 2017*

Organizations engaged in language-related research, education and
technology development are invited to join LDC for Membership Year (MY)
2017. Consortium members enjoy unparalleled access and continuing rights to
new data releases and to an archive of close to 700 holdings.

Membership fees have not increased for 2017. In addition, discounts are
available for organizations who keep their membership current and for those
who join before March 1, 2017.

   • MY 2016 members receive a 10% discount if they renew their
membership before March 1, 2017. After March 1, MY2016 members receive a 5%
discount if they renew their membership any time in 2017.

   • New members and returning former members receive a 5% discount
off the membership fee if they join/renew before March 1, 2017.

Plans for MY2017 publications are in progress. Among the expected releases
are:

2010 NIST Speaker Recognition Evaluation data set

Multilanguage conversational telephone speech: developed to support
language identification research in related languages

UCLA High Speed Laryngeal Database: audio recordings and high-speed
videoendoscopic images of the vocal folds while sustaining vowels

Noisy TIMIT: TIMIT with added artificial noise

CHiME shared task data: noisy read WSJ speech

First Year Law Students’ Memoranda: memos to a hypothetical court with
annotations

IARPA Babel Language Packs: languages include Vietnamese, Haitian Creole,
Zulu, Kazakh and Lithuanian

BOLT: source, parallel and word-aligned data in all languages

RATS Keyword Spotting data set

GALE Phases 3 and 4: all tasks and languages

Visit Join LDC <https://www.ldc.upenn.edu/members/join-ldc> for details on
membership, user accounts and payment.

*Commercial use and LDC data*

For-profit organizations are reminded that an LDC membership is a
pre-requisite for obtaining a commercial license to almost all LDC
databases. Non-member organizations, including non-member for-profit
organizations, cannot use LDC data to develop or test products for
commercialization, nor can they use LDC data in any commercial product or
for any commercial purpose. LDC data users should consult corpus-specific
license agreements for limitations on the use of certain corpora. Visit the
Licensing <https://www.ldc.upenn.edu/data-management/using/licensing> page
for further information.

*Spring 2017 Data Scholarship Program*

Applications are now being accepted through January 15, 2017 for the Spring
2017 LDC Data Scholarship program which provides university students with
no-cost access to LDC data. Consult the LDC Data Scholarship
<https://www.ldc.upenn.edu/language-resources/data/data-scholarships> page
for further information about program rules and submission requirements.

*LDC closed November 24-25 for US Thanksgiving Holiday*

LDC will be closed on Thursday, November 24, 2016 and Friday, November 25,
2016 in observance of the US Thanksgiving Holiday. The office will reopen
on Monday, November 28, 2016.

*New Corpora*



(1) JANA: A Human-Human Dialogues Corpus for Egyptian Dialect
<https://catalog.ldc.upenn.edu/LDC2016T24> was developed by researchers at
Cairo University. This is a special release in addition to the LDC
scheduled corpora for membership year 2016, available under separate terms.



This corpus consists of 82 transcribed dialogues f

Re: Updating Incubator summary

2016-11-14 Thread lewis john mcgibbney
Hi Henri,
I just pushed the update to SVN. The page should refresh asynchronously fairly soon.

http://incubator.apache.org/projects/joshua.html

Thanks

On Sun, Nov 13, 2016 at 1:22 PM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: Henri Yandell 
> To: dev@joshua.incubator.apache.org
> Cc:
> Date: Sun, 13 Nov 2016 01:17:57 -0800
> Subject: Updating Incubator summary
> Would be useful to update this page:
>
> http://incubator.apache.org/projects/joshua.html
>
>
> Are there any of the checklist items that are still open?
>
>
As far as I am aware, no :)


[jira] [Updated] (JOSHUA-308) Apply consistent formatting to project and remove trailing whitespace

2016-11-14 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/JOSHUA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated JOSHUA-308:

Fix Version/s: 6.2

> Apply consistent formatting to project and remove trailing whitespace
> -
>
> Key: JOSHUA-308
> URL: https://issues.apache.org/jira/browse/JOSHUA-308
> Project: Joshua
>  Issue Type: Improvement
>Reporter: Max Thomas
>Priority: Minor
> Fix For: 6.2
>
>
> I suggest that the checked-in code format be applied to all files, with the
> following addition: remove trailing whitespace. Trailing whitespace makes it
> unnecessarily difficult to work with the code base.
> I thought that this was part of the format file, but I think it must be a 
> setting I have enabled outside of this in Eclipse.
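The cleanup requested here can also be done independently of Eclipse. Below is a minimal Python sketch, offered as an assumption rather than the project's formatter configuration: it strips spaces and tabs at the end of each line of every .java file under a directory and leaves the existing line endings untouched. fix_tree and its .java default are hypothetical choices.

```python
import re
from pathlib import Path

# Spaces or tabs immediately before a line ending or the end of the text.
_TRAILING_WS = re.compile(r"[ \t]+(?=\r?\n|\Z)")


def strip_trailing_whitespace(text):
    """Remove trailing spaces and tabs from every line, keeping line endings."""
    return _TRAILING_WS.sub("", text)


def fix_tree(root, suffix=".java"):
    """Rewrite files under root that have trailing whitespace; return changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*" + suffix)):
        # Work on bytes so existing \n or \r\n line endings are preserved.
        original = path.read_bytes().decode("utf-8")
        cleaned = strip_trailing_whitespace(original)
        if cleaned != original:
            path.write_bytes(cleaned.encode("utf-8"))
            changed.append(path)
    return changed
```

Running fix_tree on a checkout and reviewing the diff before committing keeps the whitespace-only change in a single, easily reviewed commit.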





[jira] [Updated] (JOSHUA-290) Provide Joshua artifact as a bundle

2016-11-14 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/JOSHUA-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated JOSHUA-290:

Fix Version/s: 6.2

> Provide Joshua artifact as a bundle
> ---
>
> Key: JOSHUA-290
> URL: https://issues.apache.org/jira/browse/JOSHUA-290
> Project: Joshua
>  Issue Type: Task
>  Components: build
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 6.2
>
>
> I think it'd be good if we could make the Joshua artifact an OSGi _bundle_.
> This would have no impact on plain java applications but would give the 
> following benefits:
> - make it possible to install it in OSGi environments
> - optionally introduce semantic versioning (together with the baseline
> plugin) that would help track, e.g., whether changes in APIs break backward
> compatibility





[jira] [Updated] (JOSHUA-51) add jhclark/bigfatlm

2016-11-14 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/JOSHUA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated JOSHUA-51:
---
Fix Version/s: 6.1

> add jhclark/bigfatlm
> 
>
> Key: JOSHUA-51
> URL: https://issues.apache.org/jira/browse/JOSHUA-51
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.2
>
>
> It would be nice to leverage more Hadoop tools in the pipeline.





[jira] [Updated] (JOSHUA-314) Enable set structured-output from config file

2016-11-14 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/JOSHUA-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated JOSHUA-314:

Fix Version/s: 6.2

> Enable set structured-output from config file
> -
>
> Key: JOSHUA-314
> URL: https://issues.apache.org/jira/browse/JOSHUA-314
> Project: Joshua
>  Issue Type: Improvement
>  Components: core
>Reporter: Tommaso Teofili
> Fix For: 6.2
>
>
> Currently, setting _use-structured-output = true_ in joshua.config results in 
> an error when the config is parsed, because the option is not explicitly 
> handled by {{JoshuaConfiguration#readConfig}} (it can only be set 
> programmatically). It would be nice to be able to configure it from the 
> config file too.





[jira] [Updated] (JOSHUA-51) add jhclark/bigfatlm

2016-11-14 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-51:
---
Fix Version/s: (was: 6.1)
   6.2

> add jhclark/bigfatlm
> 
>
> Key: JOSHUA-51
> URL: https://issues.apache.org/jira/browse/JOSHUA-51
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.2
>
>
> It would be nice to leverage more Hadoop tools in the pipeline.





[jira] [Resolved] (JOSHUA-323) Joshua 6.1 Release Management

2016-11-14 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved JOSHUA-323.
-
Resolution: Fixed

> Joshua 6.1 Release Management
> -
>
> Key: JOSHUA-323
> URL: https://issues.apache.org/jira/browse/JOSHUA-323
> Project: Joshua
>  Issue Type: Task
>  Components: build, release
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 6.1
>
>
> This is a governing ticket, for reference more than anything else. We need to 
> add all release-specific build additions to the parent pom.xml that enable us to 
> roll a release candidate.
> The process is also being documented over at 
> https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure





[VOTE] Release Apache Joshua (Incubating) 6.1

2016-11-14 Thread lewis john mcgibbney
Hi Folks,
Please VOTE on the Apache Joshua 6.1 Release Candidate #1.

We solved 44 issues: https://s.apache.org/joshua6.1

Git source tag (167489bbd78526b9833fe7c88646bf96101d5d2b):
https://s.apache.org/joshua6.1tag

Staging repo:
https://repository.apache.org/content/repositories/orgapachejoshua-1000/

Source Release Artifacts:
https://dist.apache.org/repos/dist/dev/incubator/joshua/

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/incubator/joshua/KEYS

Vote will be open for 72 hours.
Thank you to everyone who is able to VOTE, as well as everyone who
contributed to Apache Joshua 6.1.

[ ] +1, let's get it released!!!
[ ] +/-0, fine, but consider fixing a few issues first...
[ ] -1, nope, because... (and please explain why)

P.S. here is my +1

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


"Amazon launches voice-controlled music streaming service"

2016-11-14 Thread lewis john mcgibbney
Hi Folks,
Any Joshua involved deep down in here?


"Amazon launches voice-controlled music streaming service"


http://www.scotsman.com/future-scotland/tech/amazon-launches-voice-controlled-music-streaming-service-1-4286952


[jira] [Commented] (JOSHUA-323) Joshua 6.1 Release Management

2016-11-11 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656783#comment-15656783
 ] 

Lewis John McGibbney commented on JOSHUA-323:
-

All licensing issues are now addressed and merged into master. I still have some work to do 
with regard to release packaging, which is not quite up to scratch, but I will 
work on that tomorrow.

> Joshua 6.1 Release Management
> -
>
> Key: JOSHUA-323
> URL: https://issues.apache.org/jira/browse/JOSHUA-323
> Project: Joshua
>  Issue Type: Task
>  Components: build, release
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 6.1
>
>
> This is a governing ticket, for reference more than anything else. We need to 
> add all release-specific build additions to the parent pom.xml that enable us to 
> roll a release candidate.
> The process is also being documented over at 
> https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Release+Management+Procedure




