Re: moses2 vs. joshua

2016-10-06 Thread Mattmann, Chris A (3980)
Hear, hear. Great job, and thanks for hosting.

++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++
 

On 10/6/16, 12:49 AM, "kellen sunderland"  wrote:

Will do, but it might be a few days before I get the time to do a proper
test.  Thanks for hosting Matt.

On Thu, Oct 6, 2016 at 2:19 AM, Matt Post  wrote:

> Hi folks,
>
> Sorry this took so long, long story. But the four models that Hieu shared
> with me are ready. You can download them here; they're each about 15–20 GB.
>
>   http://cs.jhu.edu/~post/files/joshua-hiero-ar-en.tbz
>   http://cs.jhu.edu/~post/files/joshua-phrase-ar-en.tbz
>   http://cs.jhu.edu/~post/files/joshua-hiero-ru-en.tbz
>   http://cs.jhu.edu/~post/files/joshua-hiero-ru-en.tbz
>
> It'd be great if someone could test them on a machine with lots of cores,
> to see how things scale.
>
> matt
>
> On Sep 22, 2016, at 9:09 AM, Matt Post  wrote:
>
> Hi folks,
>
> I have finished the comparison. Here you can find graphs for ar-en and
> ru-en. The ground-up rewrite of Moses is
> about 2x–3x faster than Joshua.
>
> http://imgur.com/a/FcIbW
>
> One implication (untested) is that we are likely as fast as or faster than
> Moses.
>
> We could brainstorm things to do to close this gap. I'd be much happier
> with 2x or even 1.5x than with 3x, and I bet we could narrow this down. But
> I'd like to get the 6.1 release out of the way, first, so I'm pushing this
> off to next month. Sound cool?
>
> matt
>
>
> On Sep 19, 2016, at 6:26 AM, Matt Post  wrote:
>
> I can't believe I did this, but I mis-colored one of the hiero lines, and
> the Numbers legend doesn't show the line type. If you reload the dropbox
> file, it's fixed now. The difference is about 3x for both. Here's the table.
>
> Threads  Joshua  Moses2  Joshua (hiero)  Moses2 (hiero)  Phrase rate  Hiero rate
>       1     178      65            2116            1137         2.74        1.86
>       2     109      42            1014             389         2.60        2.61
>       4      78      29             596             213         2.69        2.80
>       6      72      25             473             154         2.88        3.07
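
The two rate columns appear to be the Joshua figure divided by the Moses2 figure at each thread count. A minimal Python sketch that reproduces them, assuming the four measurement columns are run times (the thread does not state the unit) and using hypothetical names such as TIMINGS and rates:

```python
# Measurements copied from the table above: threads -> (Joshua, Moses2,
# Joshua hiero, Moses2 hiero). The unit is not stated in the thread;
# treating the values as run times is an assumption.
TIMINGS = {
    1: (178, 65, 2116, 1137),
    2: (109, 42, 1014, 389),
    4: (78, 29, 596, 213),
    6: (72, 25, 473, 154),
}


def rates(timings):
    """Return {threads: (phrase_rate, hiero_rate)} as Joshua / Moses2 ratios."""
    result = {}
    for threads, (joshua, moses2, joshua_hiero, moses2_hiero) in timings.items():
        result[threads] = (joshua / moses2, joshua_hiero / moses2_hiero)
    return result


if __name__ == "__main__":
    for threads, (phrase, hiero) in sorted(rates(TIMINGS).items()):
        print(f"{threads} threads: phrase {phrase:.2f}x, hiero {hiero:.2f}x")
```

Rounded to two decimals, the ratios match the Phrase rate and Hiero rate columns as posted.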
>
> I'll put the models together and share them later today. This was on a
> 6-core machine and I agree it'd be nice to test with something much higher.
>
> matt
>
>
> On Sep 19, 2016, at 5:33 AM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>
> Do we just want to store these models somewhere temporarily?  I've got a
> OneDrive account and could share the models from there (as long as they're
> below 500GBs or so).
>
> On Mon, Sep 19, 2016 at 11:32 AM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
> Very nice results.  I think getting to within 25% of an optimized C++
> decoder from a Java decoder is impressive.  Great that Hieu has put in the
> work to make moses2 so fast as well; that gives organizations two quite
> nice decoding engines to choose from, both with reasonable performance.
>
> Matt: I had a question about the x axis here.  Is that the number of threads?
> We should be scaling more or less linearly with the number of threads; is
> that the case here?  If you post the models somewhere I can also do a quick
> benchmark on a machine with a few more cores.
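
One way to check the linear-scaling question against the numbers Matt posted later in this thread is to compute speedup and parallel efficiency relative to the single-thread run. This is only a sketch: it assumes the posted figures are run times (the unit is not given), and the names JOSHUA_PHRASE, MOSES2_PHRASE and scaling are made up for illustration.

```python
# Phrase-based figures from the table posted later in this thread
# (threads -> value). Treating them as run times is an assumption.
JOSHUA_PHRASE = {1: 178, 2: 109, 4: 78, 6: 72}
MOSES2_PHRASE = {1: 65, 2: 42, 4: 29, 6: 25}


def scaling(times):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run.

    Perfectly linear scaling would give speedup == threads and
    efficiency == 1.0 at every thread count.
    """
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


if __name__ == "__main__":
    for name, times in (("Joshua", JOSHUA_PHRASE), ("Moses2", MOSES2_PHRASE)):
        for threads, (speedup, efficiency) in scaling(times).items():
            print(f"{name} {threads} threads: "
                  f"speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Under that assumption, both phrase-based decoders reach only about 2.5x at 6 threads, which is well short of linear.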
>
> -Kellen
>
>
> On Mon, Sep 19, 2016 at 10:53 AM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
> On Sat, Sep 17, 2016 at 3:23 PM, Matt Post <p...@cs.jhu.edu> wrote:
>
> I'll ask Hieu; I don't anticipate any problems. One potential problem is
> that the models occupy about 15--20 GB; do you think Jenkins would host
> this?
>
>
> I'm not sure, can such models be downloaded and pruned 

Re: bigtranslate

2016-07-19 Thread Mattmann, Chris A (3980)
yep sounds great.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++










On 7/19/16, 1:02 AM, "Matt Post" <p...@cs.jhu.edu> wrote:

>Yes — after the first week of August. This would be useful to factor into 
>discussions about pulling the server out of the main code. 
>
>
>> On Jul 15, 2016, at 1:11 AM, Mattmann, Chris A (3980) 
>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>> 
>> Hey Matt,
>> 
>> I’d love some help. Yes I would like to add a connection via
>> Tika Translate to Joshua - probably via the REST server.
>> Wanna help?
>> 
>> Cheers,
>> Chris
>> 
>> On 7/13/16, 7:49 AM, "Matt Post" <p...@cs.jhu.edu> wrote:
>> 
>>> Chris,
>>> 
>>> This looks cool. How are you planning to get this to work with Joshua? Do 
>>> you need help with the API piece?
>>> 
>>> matt
>>> 
>>> 
>>>> On Jul 12, 2016, at 6:40 PM, Mattmann, Chris A (3980) 
>>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>> 
>>>> I will see about registering as well :)
>>>> 
>>>> I have BigTranslate up and working if anyone is interested. I am
>>>> currently evaluating it on the XDATA employment corpus with Lingo24
>>>> but next is Joshua (and hoping to use Bing Translate too). If anyone
>>>> has an Amazon unlimited key for translation to send my way, I would
>>>> love to add it to the mix too :)
>>>> 
>>>> http://github.com/chrismattmann/bigtranslate/
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> On 7/12/16, 5:12 PM, "kellen sunderland" <kellen.sunderl...@gmail.com> 
>>>> wrote:
>>>> 
>>>>> Thanks for forwarding Matt.  I think a fair number of people from my team
>>>>> will want to attend.  I'll pass around the registration link.
>>>>> 
>>>>> -Kellen
>>>>> On Jul 12, 2016 11:01 PM, "Matt Post" <p...@cs.jhu.edu> wrote:
>>>>> 
>>>>>> Hi everyone,
>>>>>> 
>>>>>> We had talked a while ago about Joshua projects for MT Marathon in 
>>>>>> Prague.
>>>>>> Registration (free) is now open. Let me know if you're planning to go and
>>>>>> we can make some plans!
>>>>>> 
>>>>>> http://ufal.mff.cuni.cz/mtm16/registration
>>>>>> 
>>>>>> matt
>>>>>> 
>>>>>> 
>>> 
>


Re: Russian Language Model for Joshua

2016-07-17 Thread Mattmann, Chris A (3980)
I was able to download it thanks!

On 7/17/16, 5:40 PM, "Matt Post" <p...@cs.jhu.edu> wrote:

>I don't mind hosting it, and JHU hasn't complained, but it's an ugly URL.
>
>matt
>
>
>> On Jul 16, 2016, at 5:45 PM, Mcgibbney, Lewis J (398M) 
>> <lewis.j.mcgibb...@jpl.nasa.gov> wrote:
>> 
>> Can you make this public for good? Or is it the size which is the issue?
>> Was this built using the master branch, Matt? I am having issues building models
>> with master... I'll post my issues on another thread.
>> 
>> Dr. Lewis John McGibbney Ph.D., B.Sc.
>> Data Scientist II
>> Computer Science for Data Intensive Applications Group 398M
>> Jet Propulsion Laboratory
>> California Institute of Technology
>> 4800 Oak Grove Drive
>> Pasadena, California 91109-8099
>> Mail Stop : 158-256C
>> Tel:  (+1) (818)-393-7402
>> Cell: (+1) (626)-487-3476
>> Fax:  (+1) (818)-393-1190
>> Email: lewis.j.mcgibb...@jpl.nasa.gov
>> 
>> 
>> 
>> Dare Mighty Things
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On 7/16/16, 1:09 PM, "Matt Post" <p...@cs.jhu.edu> wrote:
>> 
>>> Done:
>>> 
>>> http://cs.jhu.edu/~post/tmp/ru.kenlm
>>> 4106251755 bytes, sha1sum: 5c894e24dafa42bc44a5bb6822812d6234eda791
>>> 
>>> Let me know when you have it so I can delete it.
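
A download of this size is worth checking against the byte count and SHA-1 digest posted above before the source copy is deleted. A minimal Python sketch, assuming the file was saved locally as ru.kenlm (the local file name and the helper names sha1_of and verify are illustrative, not from the thread):

```python
import hashlib
import os

# Values posted in the message above.
EXPECTED_SIZE = 4106251755
EXPECTED_SHA1 = "5c894e24dafa42bc44a5bb6822812d6234eda791"


def sha1_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so a multi-gigabyte model never sits in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_size=EXPECTED_SIZE, expected_sha1=EXPECTED_SHA1):
    """Return True only if both the byte count and the SHA-1 digest match."""
    if os.path.getsize(path) != expected_size:
        return False
    return sha1_of(path) == expected_sha1


# "ru.kenlm" is an assumed local file name; nothing runs if it is absent.
if __name__ == "__main__" and os.path.exists("ru.kenlm"):
    print("OK" if verify("ru.kenlm") else "MISMATCH")
```

Comparing the size first is a cheap way to catch a truncated download before spending time hashing roughly 4 GB.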
>>> 
>>> matt
>>> 
>>> 
>>>> On Jul 15, 2016, at 4:42 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>> 
>>>> All right, started trying to recompile. If you have a machine with >
>>>> 256 GB of memory, it might be more efficient for me to give you the raw
>>>> ARPA file and for you to compile it. We'll see how it goes. Ping me in a
>>>> day if you don't hear from me.
>>>> 
>>>> matt
>>>> 
>>>> 
>>>>> On Jul 15, 2016, at 4:40 PM, Mattmann, Chris A (3980)
>>>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>>> 
>>>>> Yes please! :)
>>>>> 
>>>>> Sent from my iPhone
>>>>> 
>>>>>> On Jul 15, 2016, at 1:39 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>>> 
>>>>>> I have one built on Common Crawl. It's 25 GB uncompressed. My KenLM
>>>>>> compiles of it failed in the past, but I'll try again. I expect it to
>>>>>> be about 8 GB when that's done. Do you want it?
>>>>>> 
>>>>>> matt
>>>>>> 
>>>>>> 
>>>>>>> On Jul 15, 2016, at 3:50 PM, Mattmann, Chris A (3980)
>>>>>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>>>>> 
>>>>>>> Hey Folks,
>>>>>>> 
>>>>>>> Anyone have a Russian Language Model for Joshua? Lewis was working on
>>>>>>> one, not sure if he has it but just broadening the question.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
>


Re: Russian Language Model for Joshua

2016-07-15 Thread Mattmann, Chris A (3980)
Will do.

Adding Paul Zimdars - do we have an Amazon machine that has > 256GB
of memory? How much would that cost?

Cheers,
Chris

On 7/15/16, 1:42 PM, "Matt Post" <p...@cs.jhu.edu> wrote:

>All right, started trying to recompile. If you have a machine with > 256 GB of 
>memory, it might be more efficient for me to give you the raw ARPA file and 
>for you to compile it. We'll see how it goes. Ping me in a day if you don't 
>hear from me.
>
>matt
>
>
>> On Jul 15, 2016, at 4:40 PM, Mattmann, Chris A (3980) 
>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>> 
>> Yes please! :)
>> 
>> Sent from my iPhone
>> 
>>> On Jul 15, 2016, at 1:39 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>> 
>>> I have one built on Common Crawl. It's 25 GB uncompressed. My KenLM 
>>> compiles of it failed in the past, but I'll try again. I expect it to be 
>>> about 8 GB when that's done. Do you want it?
>>> 
>>> matt
>>> 
>>> 
>>>> On Jul 15, 2016, at 3:50 PM, Mattmann, Chris A (3980) 
>>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>> 
>>>> Hey Folks,
>>>> 
>>>> Anyone have a Russian Language Model for Joshua? Lewis was working on
>>>> one, not sure if he has it but just broadening the question.
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>> 
>


Re: Russian Language Model for Joshua

2016-07-15 Thread Mattmann, Chris A (3980)
Yes please! :)

Sent from my iPhone

> On Jul 15, 2016, at 1:39 PM, Matt Post <p...@cs.jhu.edu> wrote:
> 
> I have one built on Common Crawl. It's 25 GB uncompressed. My KenLM compiles 
> of it failed in the past, but I'll try again. I expect it to be about 8 GB 
> when that's done. Do you want it?
> 
> matt
> 
> 
>> On Jul 15, 2016, at 3:50 PM, Mattmann, Chris A (3980) 
>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>> 
>> Hey Folks,
>> 
>> Anyone have a Russian Language Model for Joshua? Lewis was working on
>> one, not sure if he has it but just broadening the question.
>> 
>> Cheers,
>> Chris
>> 
> 


Russian Language Model for Joshua

2016-07-15 Thread Mattmann, Chris A (3980)
Hey Folks,

Anyone have a Russian Language Model for Joshua? Lewis was working on
one, not sure if he has it but just broadening the question.

Cheers,
Chris


Re: MT marathon registration is open

2016-07-12 Thread Mattmann, Chris A (3980)
I will see about registering as well :)

I have BigTranslate up and working if anyone is interested. I am
currently evaluating it on the XDATA employment corpus with Lingo24
but next is Joshua (and hoping to use Bing Translate too). If anyone
has an Amazon unlimited key for translation to send my way, I would
love to add it to the mix too :)

http://github.com/chrismattmann/bigtranslate/

Cheers,
Chris

On 7/12/16, 5:12 PM, "kellen sunderland"  wrote:

>Thanks for forwarding Matt.  I think a fair number of people from my team
>will want to attend.  I'll pass around the registration link.
>
>-Kellen
>On Jul 12, 2016 11:01 PM, "Matt Post"  wrote:
>
>> Hi everyone,
>>
>> We had talked a while ago about Joshua projects for MT Marathon in Prague.
>> Registration (free) is now open. Let me know if you're planning to go and
>> we can make some plans!
>>
>> http://ufal.mff.cuni.cz/mtm16/registration
>>
>> matt
>>
>>


Re: Avoiding master failures with CI

2016-07-11 Thread Mattmann, Chris A (3980)
CI = continuous integration :)

On 7/11/16, 4:50 PM, "Matt Post"  wrote:

>This sounds fine to me. What does CI stand for?
>
>Another thing we should do, which might be complementary to this, is just be 
>more formal about our process. I had been using this method for a while:
>
>   http://nvie.com/posts/a-successful-git-branching-model/
>
>Sort of informally, but that could be a good approach (I think someone 
>suggested it a while ago). In short:
>
>- "master" is always stable and records official releases
>- development takes place on "develop"
>- if you need to make an important fix, you branch off master, fix it, then 
>merge that into both "master" (as a point release) and "develop"
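
The hotfix rule in the last bullet can be spelled out as a sequence of git commands. The sketch below only builds and prints the commands rather than running them; the function name hotfix_commands and the version string "6.0.1" are hypothetical, and the "hotfix-<version>" branch name follows the naming used in the linked article.

```python
import shlex


def hotfix_commands(version, master="master", develop="develop"):
    """Return the git commands for one hotfix under the branching model above."""
    branch = f"hotfix-{version}"
    return [
        # Branch off master; the fix itself is committed on this branch.
        ["git", "checkout", "-b", branch, master],
        # Merge into master and tag it as a point release.
        ["git", "checkout", master],
        ["git", "merge", "--no-ff", branch],
        ["git", "tag", "-a", version, "-m", f"Release {version}"],
        # Merge the same fix into develop so it is not lost.
        ["git", "checkout", develop],
        ["git", "merge", "--no-ff", branch],
        # The hotfix branch is no longer needed.
        ["git", "branch", "-d", branch],
    ]


if __name__ == "__main__":
    # "6.0.1" is a made-up version number used only for illustration.
    for command in hotfix_commands("6.0.1"):
        print(shlex.join(command))
```

The --no-ff merges keep each hotfix visible as its own merge commit on both branches, which is the convention the article recommends.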
>
>I was using "release" for releases and "master" for develop, but we could 
>adopt anything.
>
>Kellen, how does this fit with CI? It seems like we could set it up to do 
>testing on "master" and "develop" branches --- the first as a sanity check, 
>and the second as a test for when we could merge into master?
>
>matt
>
>
>> On Jul 11, 2016, at 8:17 AM, kellen sunderland  
>> wrote:
>> 
>> We've made a lot of progress on moving the project over to Apache + Maven.
>> I was wondering if now would be a good time to consider re-thinking how we
>> merge changes into master.  The main goal would be to make sure we have a
>> stable master branch that everyone can pull from.
>> 
>> What I'd suggest is that we only merge into master once CI has completed
>> testing.  This way we can codify style rules, best practices, and make sure
>> builds succeed and tests pass.  We can develop new features and create PRs as
>> normal, and then get quick feedback on whether those PRs are mergeable.  I'd also
>> suggest we disallow manual pushing to the master branch.
>> 
>> I'm not sure how much effort this would be with the existing CI server, but
>> I could investigate this if someone could grant me admin permissions.  If
>> it's a Jenkins server I'm sure it's possible.
>> 
>> Another option is to use Travis CI.  I have taken a quick look at Travis CI
>> and it seems like a quite polished solution.  It's free to use for open
>> source projects.  It supports automatically building + testing PRs.  The
>> interface is really clean.  It has email notifications and group
>> administration support.  It's got support for multiple (programming)
>> languages so we could in theory build kenlm as a build step and run those
>> tests.
>> 
>> Here's some more info on what the workflow with Travis-CI and PRs would be
>> https://docs.travis-ci.com/user/pull-requests
>> 
>> What do you guys think?  Is there a strong preference for using Jenkins
>> from the Apache community?  Would everyone be ok with avoiding direct
>> pushes to master?
>> 
>> -Kellen
>


Re: Avoiding master failures with CI

2016-07-11 Thread Mattmann, Chris A (3980)
+1, let’s start using Travis CI, IMO.

On 7/11/16, 8:17 AM, "kellen sunderland"  wrote:

>We've made a lot of progress on moving the project over to Apache + Maven.
>I was wondering if now would be a good time to consider re-thinking how we
>merge changes into master.  The main goal would be to make sure we have a
>stable master branch that everyone can pull from.
>
>What I'd suggest is that we only merge into master once CI has completed
>testing.  This way we can codify style rules, best practices, and make sure
>builds succeed and tests pass.  We can develop new features and create PRs as
>normal, and then get quick feedback on whether those PRs are mergeable.  I'd also
>suggest we disallow manual pushing to the master branch.
>
>I'm not sure how much effort this would be with the existing CI server, but
>I could investigate this if someone could grant me admin permissions.  If
>it's a Jenkins server I'm sure it's possible.
>
>Another option is to use Travis CI.  I have taken a quick look at Travis CI
>and it seems like a quite polished solution.  It's free to use for open
>source projects.  It supports automatically building + testing PRs.  The
>interface is really clean.  It has email notifications and group
>administration support.  It's got support for multiple (programming)
>languages so we could in theory build kenlm as a build step and run those
>tests.
>
>Here's some more info on what the workflow with Travis-CI and PRs would be
>https://docs.travis-ci.com/user/pull-requests
>
>What do you guys think?  Is there a strong preference for using Jenkins
>from the Apache community?  Would everyone be ok with avoiding direct
>pushes to master?
>
>-Kellen


Re: [IMPORTANT] Roadmap for 6.1 Release

2016-06-20 Thread Mattmann, Chris A (3980)
Thanks for doing the yeoman’s work, Lewis.

On 6/20/16, 11:34 AM, "Lewis John Mcgibbney"  wrote:

>Hi Folks,
>I've just smartened up Jira a bit with our Roadmap being defined as follows
>
>https://issues.apache.org/jira/browse/joshua/?selectedTab=com.atlassian.jira.jira-projects-plugin:roadmap-panel
>
>Right now there are only 14/14 issues as RESOLVED for 6.1. This is false as
>I know that many more issues have been addressed however I don't think that
>Jira tickets have been created for all changes to the source code. Maybe
>moving forward we could open Jira issues and link them to the Github
>tickets via commit messages?
>
>Additionally, everything that was currently UNRESOLVED has merely been
>pushed to 6.2. If this is not what is required then please reassign the fix
>version for any ticket(s) to 6.1 and we can fix.
>
>Finally, are there any mitigating factors that would prevent a 6.1 release
>candidate being prepared right now?
>Thanks
>Lewis
>
>-- 
>*Lewis*


Re: incubator wiki

2016-06-06 Thread Mattmann, Chris A (3980)
Hey Matt, can you grant perms to chrismattmann (username)?

On 6/6/16, 6:03 PM, "Matt Post"  wrote:

>Hi everyone,
>
>I made the confluence page public (read-only), as part of transitioning the 
>website there. It didn't seem to me that anything there was private, but if 
>something should be, we can lock down individual pages to members only.
>
>(Does anyone know how to have a Confluence group created?)
>
>matt


Re: too many emails

2016-05-26 Thread Mattmann, Chris A (3980)
Hey Matt,

To be clear, I’m asking for input on a name amongst those choices.

Also, we shouldn’t be archiving emails just so we can forget about them.
The point is that the conversation for the project should happen here.
If a conversation is relevant to development, then people who don’t
operate on GitHub, and who don’t treat it as the home of the project,
should still have a chance to participate by sending mail to this list.

That said, I think there’s a simple solution:

1) stand up new list (we can use the mlreq program here, once
we agree on the name)
https://infra.apache.org/officers/mlreq/


2) file INFRA JIRA ticket and have ASF GitHub bot send communication
to list from #1

Make sense? Agree?

Cheers,
Chris

On 5/26/16, 11:23 AM, "Matt Post" <p...@cs.jhu.edu> wrote:

>Chris, to be clear, are you asking for input on a name, or suggesting creating 
>all three lists?
>
>The main issue I'm concerned with is that comments on Github generate three 
>emails:
>
>- Github sends an email to dev
>- If the comment is on a pull that matches a JIRA issue, the ASF Github bot 
>sends an email to me
>- It also sends the same email to dev
>
>I think we should just (a) tell Github to stop posting to dev and (b) tell the 
>ASF Github bot to send everything to commits. We could create a new list, but 
>there's some complexity in maintaining lists themselves, and it seems that 
>commits would be a good place to bury things and forget about them.
>
>Would that satisfy the archiving goals? Who can do this? I don't seem to have 
>Github permission on incubator-joshua to do (a), and I don't know how to do 
>(b).
>
>matt
>
>
>
>
>> On May 26, 2016, at 11:34 AM, kellen sunderland 
>> <kellen.sunderl...@gmail.com> wrote:
>> 
>> I'd +1 as well.  Your breakdown looks good to me Chris.
>> 
>> On Thu, May 26, 2016 at 4:12 PM, Mattmann, Chris A (3980) <
>> chris.a.mattm...@jpl.nasa.gov> wrote:
>> 
>>> +1 to a separate list for GitHub stuff. Many communities (Kudu,
>>> Spark, etc.) end up doing this.
>>> 
>>> How about:
>>> 
>>> revi...@joshua.incubator.apache.org
>>> git...@joshua.incubator.apache.org
>>> iss...@joshua.incubator.apache.org
>>> 
>>> Any of those?
>>> 
>>> On 5/26/16, 6:00 AM, "Matt Post" <p...@cs.jhu.edu> wrote:
>>> 
>>>> I agree it's good to have Github stuff archived on Apache-owned domains,
>>> I just think that the list gets overwhelmed with garbage that most people
>>> are just deleting. I mean, I like the idea of skimming through commits, but
>>> today I am waking up to over 100 emails, and I have to pick out the
>>> auto-generated emails that I don't have time to read from the important
>>> ones. If most people are just saving things to a separate folder, that they
>>> are never going to read, isn't it better to turn off those auto-emails?
>>>> 
>>>> Why not use a separate list like git@ or archive@ for such posts? Then
>>> it's there for people 

Re: too many emails

2016-05-26 Thread Mattmann, Chris A (3980)
+1 to a separate list for GitHub stuff. Many communities (Kudu, 
Spark, etc.) end up doing this.

How about:

revi...@joshua.incubator.apache.org
git...@joshua.incubator.apache.org
iss...@joshua.incubator.apache.org

Any of those?

On 5/26/16, 6:00 AM, "Matt Post"  wrote:

>I agree it's good to have Github stuff archived on Apache-owned domains, I 
>just think that the list gets overwhelmed with garbage that most people are 
>just deleting. I mean, I like the idea of skimming through commits, but today 
>I am waking up to over 100 emails, and I have to pick out the auto-generated 
>emails that I don't have time to read from the important ones. If most people 
>are just saving things to a separate folder, that they are never going to 
>read, isn't it better to turn off those auto-emails?
>
>Why not use a separate list like git@ or archive@ for such posts? Then it's 
>there for people to search, but no one has to wade through it.
>
>
>
>
>> On May 26, 2016, at 12:45 AM, Lewis John Mcgibbney 
>>  wrote:
>> 
>> Hi Matt,
>> 
>> As Henry said. Either we get them going to a different list or else you
>> subscribe to dev-dig...@joshua.incubator.apache.org (subscribe through
>> dev-digest-subscr...@joshua.incubator.apache.org)?
>> Which do you prefer?
>> Quick reasoning as to why Github convo is shadowed on the Apache lists: if
>> Github ever goes away, then we lose all of the conversation. We archive it
>> @Apache so we cover our communities.
>> Thanks
>> 
>> 
>> On Wed, May 25, 2016 at 2:11 PM, <
>> dev-digest-h...@joshua.incubator.apache.org> wrote:
>> 
>>> 
>>> From: Matt Post 
>>> To: dev@joshua.incubator.apache.org
>>> Cc:
>>> Date: Wed, 25 May 2016 15:48:24 -0400
>>> Subject: too many emails
>>> Does someone know how to turn off the mailing of all github comments to
>>> dev?
>>> 
>>> The way I see it, we all have to be on dev, so it should be for people,
>>> not robots. I am getting every comment about three times.
>>> 
>>> I would just do it but I don't know how.
>>> 
>>> 
>


Re: May 2016 Newsletter – LDC

2016-05-20 Thread Mattmann, Chris A (3980)
Thanks Lewis. I’m also an org rep for NASA at LDC, and also via my
USC hat. Good show.

On 5/20/16, 8:45 AM, "Lewis John Mcgibbney"  wrote:

>Hi Folks,
>I've ended up as the primary JPL organizational rep for the Linguistic Data
>Consortium. They produce monthly newsletters (see below for the most
>recent) which I will be forwarding to dev@ Joshua from now on.
>They are pretty cool, especially the new datasets they publish.
>Lewis
>
>-- Forwarded message --
>From: *Mcgibbney, Lewis J (398M)* 
>Date: Friday, May 20, 2016
>Subject: Fwd: May 2016 Newsletter – LDC
>To: "lewis.mcgibb...@gmail.com" 
>
>
>
>
>Sent from my iPhone
>
>Begin forwarded message:
>
>*From:* Linguistic Data Consortium
>*Date:* May 16, 2016 at 8:20:33 AM PDT
>*To:* Linguistic Data Consortium
>*Subject:* *May 2016 Newsletter – LDC*
>
>*In this newsletter:*
>
>*LDC at LREC 2016*
>
>
>
>*New publications:*
>
>SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing
>
>GALE Phase 4 Chinese Broadcast Conversation Speech
>
>GALE Phase 4 Chinese Broadcast Conversation Transcripts
>
>
>
>
>
>*LDC at LREC 2016*
>
>
>
>LDC will attend the 10th Language Resource Evaluation Conference
>(LREC2016), hosted by ELRA, the European Language Resource Association. The
>conference will be held in Portorož, Slovenia from May 23-28 and features a
>broad range of sessions on language resources and human language
>technologies research. Seven LDC staff members will be presenting current
>work on topics including trends in HLT research, building language
>resources for autism spectrum disorders, data management plans, rapid
>development of morphological analyzers for typologically diverse languages,
>selection criteria for low resource language programs, multi-language
>speech collection for NIST LRE, novel incentives for collecting data and
>annotation from people, and more.
>
>
>
>Following the conference, LDC’s presented papers and posters will be
>available on LDC’s Papers Page.
>
>
>
>
>
>New Corpora
>
>
>
>(1) SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing consists of
>data, tools, system results, and publications associated with the 2014 and
>2015 tasks on Broad-Coverage Semantic Dependency Parsing (SDP) conducted in
>conjunction with the International Workshop on Semantic Evaluation (SemEval)
>and was developed by the SDP task organizers.
>
>SemEval is an ongoing series of evaluations of computational semantic
>analysis systems intended to explore the nature of meaning in language. It
>evolved from the Senseval word sense disambiguation series to include
>semantic analysis tasks outside of word sense disambiguation.
>
>This release is based on English, Chinese and Czech data from the following
>resources: Treebank-2 LDC95T17, Proposition Bank I LDC2004T14, NomBank v 1.0
>LDC2008T23 and CCGBank LDC2005T13 (English); Chinese Treebank (e.g., Chinese
>Treebank 8.0 LDC2013T21) (Chinese); and Prague Dependency Treebank (e.g.,
>Prague Dependency Treebank 2.0, LDC2006T01) (Czech).
>
>The results are presented as graphs in three target representations:
>MRS-Derived Semantic Dependencies (DM), Enju Predicate–Argument Structures
>(PAS), and Prague Semantic Dependencies (PSD). As a fourth, additional
>target representation CCGbank was converted to semantic dependency graphs
>(in the subdirectory ‘ccd’).
>
>SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing is distributed
>via web download.
>
>2016 Subscription Members will automatically receive two copies of this
>corpus. 2016 Standard 

Re: [NOTICE] Bootstrap Complete

2016-04-14 Thread Mattmann, Chris A (3980)
Thanks Lewis great work

On 4/14/16, 8:25 PM, "Lewis John Mcgibbney"  wrote:

>Hi Team,
>OK, so INFRA have closed out [0] and the transition to the Apache Incubator
>is complete.
>We can now formally focus on driving releases, building out the community
>and onwards towards graduation. All very positive.
>I've been updating our podling status page over at [1] so generally I would
>say we are in very good health.
>Excellent work.
>Lewis
>
>[0] https://issues.apache.org/jira/browse/INFRA-11264
>[1] http://incubator.apache.org/projects/joshua.html
>
>-- 
>*Lewis*


Re: programmatic API usage?

2016-04-12 Thread Mattmann, Chris A (3980)
Here is an example of how to run it programmatically, in TIKA-1343:


https://issues.apache.org/jira/browse/TIKA-1343


this patch:

https://reviews.apache.org/r/22761/

On 4/12/16, 8:33 AM, "Tommaso Teofili"  wrote:

>thanks Matt for the reply.
>
>What I could come up with so far is the following:
>String configFile = "/path/to/config";
>JoshuaConfiguration joshuaConfiguration = new JoshuaConfiguration();
>
>joshuaConfiguration.readConfigFile(configFile);
>Decoder decoder = new Decoder(joshuaConfiguration, configFile);
>ByteArrayOutputStream os = new ByteArrayOutputStream();
>decoder.decodeAll(new TranslationRequestStream(new BufferedReader(new
>StringReader(textToTranslate)), joshuaConfiguration), os);
>os.flush();
>byte[] bytes = os.toByteArray();
>String translationOutput = IOUtils.toString(bytes,
>Charset.defaultCharset().name());
>
>
>I think it'd be good if we could support such a usage, I'll keep
>experimenting.
>
>Regards,
>Tommaso
>
>
>
>On Tue, Apr 12, 2016 at 17:09, Matt Post wrote:
>
>> Hi Tommaso,
>>
>> There isn't really, unfortunately. I have never used Joshua as a library;
>> it would be nice if the Amazon folks (who I infer have done so, from a
>> comment on their last commit) would contribute a doc on this front.
>>
>> What is the preferred avenue for developer documentation? Javadocs, or
>> something else?
>>
>> matt
>>
>>
>> > On Apr 12, 2016, at 6:09 AM, Tommaso Teofili wrote:
>> >
>> > Hi all,
>> >
>> > I am going through the code (so I'll probably figure it out at some point),
>> > however I wonder if there's a quick guide on how to start using Joshua
>> > programmatically, as I am starting to look at how it could be integrated
>> > into other projects.
>> >
>> > Regards,
>> > Tommaso
>>
>>
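
Editor's sketch, not part of the thread: the snippet Tommaso posted above decodes
the output bytes with the platform default charset, so the same run can yield
different strings on different machines. Below is a minimal, stdlib-only sketch
of the same plumbing with an explicit UTF-8 decode. The class TranslateToString
and the interface StreamDecoder are hypothetical stand-ins for the
decoder.decodeAll(...) call in the quoted code; they are not Joshua's API, and
it is an assumption here that the decoder writes UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class TranslateToString {

    /** Hypothetical stand-in for the decodeAll(...) call in the quoted snippet. */
    public interface StreamDecoder {
        void decodeAll(BufferedReader input, OutputStream output) throws IOException;
    }

    /** Feeds text to the decoder and returns everything it wrote, decoded as UTF-8. */
    public static String translate(StreamDecoder decoder, String text) throws IOException {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            decoder.decodeAll(reader, os);
        }
        // Explicit charset (assumed UTF-8): the result no longer depends on the platform default.
        return new String(os.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Toy decoder that brackets each input line; it only exercises the plumbing.
        StreamDecoder toy = (in, out) -> {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(("[" + line + "]\n").getBytes(StandardCharsets.UTF_8));
            }
        };
        System.out.print(translate(toy, "hola\nmundo"));
    }
}
```

With a real decoder, the lambda would be replaced by a call that wraps the
reader the way the quoted snippet does; that wiring is left out because its
signatures are not confirmed anywhere in this thread.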


Re: hosting release files

2016-04-05 Thread Mattmann, Chris A (3980)
FYI:
https://issues.apache.org/jira/browse/INFRA-11602


Cheers,
Chris

On 4/5/16, 3:46 PM, "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov> wrote:

>You got it thanks 
>
>Sent from my iPhone
>
>> On Apr 5, 2016, at 3:44 PM, Daniel Gruno <humbed...@apache.org> wrote:
>> 
>>> On 04/05/2016 09:36 PM, Mattmann, Chris A (3980) wrote:
>>> Nowhere near OOo; it's normal range, so can you increase our limit to, say,
>>> 5 GB per upload to be safe? Thanks DG
>> 
>> I would, except it's nearly 10PM here and I'm playing the role of Arno
>> Dorian, mysterious assassin in Paris! Could you please file a JIRA for
>> this? :)
>> 
>> With regards,
>> Daniel.
>> 
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Apr 5, 2016, at 3:31 PM, Daniel Gruno <humbed...@apache.org> wrote:
>>>> 
>>>> The actual size of the releases doesn't matter much - we just need a
>>>> heads-up so we can increase your 'upload limit'. What will matter is how
>>>> many downloads you estimate will happen per month. Are we talking a few
>>>> hundred? thousand? million?
>>>> 
>>>> If it's within 'normal range', then there's nothing to worry about. If
>>>> it's OpenOffice figures, then we need to discuss hosting.
>>>> 
>>>> With regards,
>>>> Daniel.
>>>> 
>>>>> On 04/05/2016 09:23 PM, Matt Post wrote:
>>>>> Hi,
>>>>> 
>>>>> To add to this, there could easily be many tens of such files. We 
>>>>> currently have three language packs (for Chinese, Arabic, and Spanish), 
>>>>> but plan to add many of them over the coming months.
>>>>> 
>>>>>   http://joshua-decoder.org/language-packs/
>>>>> 
>>>>> Matt
>>>>> 
>>>>> 
>>>>>> On Apr 5, 2016, at 2:19 PM, Mattmann, Chris A (3980) 
>>>>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>>>> 
>>>>>> Thanks Matt.
>>>>>> 
>>>>>> I’m copying infrastruct...@apache.org on this email. Infra@ what
>>>>>> are your thoughts on the Apache Joshua (Incubating) podling being
>>>>>> able to release our language packs - which are on order of 1gb-3gb
>>>>>> each? Can you suggest any gotchas in doing so - I realize the concern
>>>>>> about release size, but these language packs are more than just 
>>>>>> convenience binaries, they help make Apache Joshua a complete product.
>>>>>> 
>>>>>> Please advise.
>>>>>> 
>>>>>> Thx.
>>>>>> 
>>>>>> Cheers,
>>>>>> Chris
>>>>>> 

Re: Logo for Joshua

2016-03-29 Thread Mattmann, Chris A (3980)
would be great to brand it..

-Original Message-
From: Lewis John Mcgibbney 
Reply-To: "dev@joshua.incubator.apache.org"

Date: Tuesday, March 29, 2016 at 8:26 AM
To: "dev@joshua.incubator.apache.org" 
Subject: Logo for Joshua

>Hi Folks,
>A bit of fun now...
>The current logo for Joshua can be found at [0], I actually quite like the
>color.
>Does anyone want to take on the task of branding it? Or do you want to
>leave it as it is?
>Ta
>
>[0] http://joshua-decoder.org/images/joshua-logo-small.png
>
>-- 
>*Lewis*



Re: consolidating thread

2016-03-28 Thread Mattmann, Chris A (3980)
Matt, what’s your Apache username?

-Original Message-
From: Matt Post 
Reply-To: "dev@joshua.incubator.apache.org"

Date: Monday, March 28, 2016 at 8:31 PM
To: "dev@joshua.incubator.apache.org" 
Subject: Re: consolidating thread

>>> 
>>> (4)
>>> 
>>> https://git-wip-us.apache.org/repos/asf?p=incubator-joshua.git;a=summary
>>> and https://github.com/joshua-decoder/joshua are out of sync. I tried to
>>> write to the former again, and got the same error. I assumed yesterday that
>>> this was scheduled downtime but it now seems to be a write bit or
>>> something. Do you know how to fix this?
>>> 
>>> $ git push apache master
>>> Counting objects: 18, done.
>>> Delta compression using up to 8 threads.
>>> Compressing objects: 100% (18/18), done.
>>> Writing objects: 100% (18/18), 1.57 KiB | 0 bytes/s, done.
>>> Total 18 (delta 14), reused 0 (delta 0)
>>> remote: Write access is currently disabled. The ASF Git
>>> remote: repositories are currently undergoing maintenance.
>>> remote:
>>> To https://git-wip-us.apache.org/repos/asf/incubator-joshua.git
>>> ! [remote rejected] master -> master (pre-receive hook declined)
>>> error: failed to push some refs to 'https://git-wip-us.apache.org/repos/asf/incubator-joshua.git'
>>> 
>>> 
>> Mmmm. OK. Can you please make sure that you have a key and that your
>> details are entered at http://id.apache.org, once this is done your
>> security authentication should be valid.
>> Let me know if this works for you.
>> Thanks
>
>I do have public keys listed there, but must still have something set up
>incorrectly.
>
>This document doesn't address git:
>http://www.apache.org/dev/version-control.html
>And this one doesn't address authentication:
>https://cwiki.apache.org/confluence/display/LUCENENET/Git+Setup+and+Pull+Requests
>
>Any thoughts?
>



Re: Migrating Community from Github and GoggleGroups to Apache

2016-03-28 Thread Mattmann, Chris A (3980)
secret...@apache.org and yes you can send 

Sent from my iPhone

> On Mar 28, 2016, at 7:42 PM, Matt Post  wrote:
> 
> Hi Henry,
> 
> I don't think I've filed this form, but I think I am authorized to. Who 
> should I send it to?
> 
> matt
> 
> 
>> On Mar 27, 2016, at 12:44 AM, Henry Saputra  wrote:
>> 
>> Hi Matt,
>> 
>> Before we move the code from Joshua Github to Apache Git repo, have you
>> submitted software grant [1] to ASF?
>> 
>> This is to ensure the "transfer" of rights to the code to ASF.
>> 
>> Does Johns Hopkins U have the rights for Joshua code base?
>> 
>> 
>> Thanks,
>> 
>> - Henry
>> 
>> 
>> [1] https://www.apache.org/licenses/software-grant.txt
>> 
>>> On Sat, Mar 26, 2016 at 8:52 PM, Matt Post  wrote:
>>> 
>>> Thanks for the detailed notes, Lewis, I just tried to push to [0] and got
>>> the following note:
>>> 
>>> $ git push apache master
>>> Counting objects: 3, done.
>>> Delta compression using up to 8 threads.
>>> Compressing objects: 100% (3/3), done.
>>> Writing objects: 100% (3/3), 306 bytes | 0 bytes/s, done.
>>> Total 3 (delta 2), reused 0 (delta 0)
>>> remote: Write access is currently disabled. The ASF Git
>>> remote: repositories are currently undergoing maintenance.
>>> remote:
>>> To https://git-wip-us.apache.org/repos/asf/incubator-joshua.git
>>> ! [remote rejected] master -> master (pre-receive hook declined)
>>> error: failed to push some refs to 'https://git-wip-us.apache.org/repos/asf/incubator-joshua.git'
>>> 
>>> I will assume this is just temporary and will try again sometime tomorrow.
>>> 
>>> Once [0] is up-to-date, I'll make the Apache branch at [1] the default and
>>> disable write access.
>>> 
>>> matt
>>> 
>>> 
>>>> On Mar 26, 2016, at 1:38 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
>>>>
>>>> Hi Matt,
>>>>
>>>>> On Sat, Mar 26, 2016 at 9:47 AM, Matt Post wrote:
>>>>>
>>>>> What is the new codebase convention supposed to be?
>>>>
>>>> So now, the canonical Joshua codebase MUST be at [0].
>>>> The Github mirror at [2] is merely a mirror, NOT the canonical source. All
>>>> of the future releases of Joshua will be cut from the source at [0], meaning
>>>> that all code development must now be transitioned to that codebase.
>>>>
>>>> [0] https://git-wip-us.apache.org/repos/asf/incubator-joshua.git
>>>>
>>>>> Does development still happen at [1] and get mirrored to [2]
>>>>> automatically?
>>>>
>>>> No, essentially [1] gets shut down with a notice (apache branch README)
>>>> telling people that the canonical source is at [0]; however, there is a
>>>> convenient mirror at [2] for pull requests, etc. The notice should also say
>>>> that the new website is at http://joshua.incubator.apache.org
>>>>
>>>>> I am uncertain what I am supposed to do (i.e., where I should set origin
>>>>> to be on my personal checkouts).
>>>>
>>>> So origin is now [0]
>>>>
>>>>> [1] https://github.com/joshua-decoder/joshua
>>>>> [2] https://github.com/apache/incubator-joshua
>>>>>
>>>>> I have created the apache branch on the main repo and pushed it up.
>>>>
>>>> Great, can you make this the default branch on the repository so that when
>>>> people try to navigate there the README clearly directs them towards [0]
>>>> for the canonical source or [2] for the mirror?
>>>>
>>>> I notice that [1] and [0] are out of sync now actually. Would it be
>>>> possible for you to forward port all of the code augmentations which have
>>>> taken place since the following commit
>>>>
>>>> mjpost Added ability to add, remove, and list
>>>> rules in the private phrase table
>>>> <https://github.com/apache/incubator-joshua/commit/9a7700b2b71f64370a4822335916867e9a7e5afe>
>>>>
>>>> to [0]?
>>>> Once this is done, I would suggest that you make [1] read only as this will
>>>> prevent the codebases diverging again.
>>>> Thanks, this is not typically a painful process; however, once in place we
>>>> will be good to go and begin building the community out.
>