Re: joshua_api

Matt Post Wed, 27 Apr 2016 11:33:06 -0700

Do you want me to fix the recapitalization? Or are you going to do that? I 
looked a bit, and it seems I'll have to add a method to get a word alignment 
object instead of just the string, so that I can poke through the alignments. 
This approach is as good as true-casing in some languages.
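
A rough sketch of what poking through the alignments could look like, 
assuming the alignments are available as a target-to-source list of source 
indices (the shape the old getTranslationWordAlignments() returned). The 
class AlignmentRecaser and its methods are hypothetical names, not existing 
Joshua code, and the rule used here (capitalize a target word if any aligned 
source word starts with an uppercase letter) is only one possible policy:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: projects source capitalization onto the target via word alignments. */
final class AlignmentRecaser {

  private AlignmentRecaser() {}

  /**
   * alignments.get(t) holds the source indices aligned to target token t
   * (target-to-source). Unaligned target tokens are left unchanged.
   */
  static List<String> recapitalize(List<String> source, List<String> target,
      List<List<Integer>> alignments) {
    List<String> result = new ArrayList<>(target);
    for (int t = 0; t < target.size() && t < alignments.size(); t++) {
      for (int s : alignments.get(t)) {
        // Ignore out-of-range indices rather than failing on a bad alignment.
        if (s >= 0 && s < source.size() && startsUpper(source.get(s))) {
          result.set(t, capitalize(target.get(t)));
          break;
        }
      }
    }
    return result;
  }

  private static boolean startsUpper(String word) {
    return !word.isEmpty() && Character.isUpperCase(word.charAt(0));
  }

  private static String capitalize(String word) {
    return word.isEmpty() ? word : Character.toUpperCase(word.charAt(0)) + word.substring(1);
  }
}
```

With source "Marie aime Paris", target "marie loves paris" and a one-to-one 
alignment, this yields "Marie loves Paris". It would over-capitalize for 
source languages that capitalize common nouns.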


A few other things:

- I saw a comment in the commit about the changes not working for phrase-based 
translation. Can you (or Felix) elaborate? What exactly will no longer work?

- Currently, there are multiple places where the "output-format" string has to 
get edited (KBestExtractor and in Translation). After you push your changes in, 
I'm going to make some edits so that this all occurs in one place.
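
For illustration, "one place" could be a small helper along these lines. 
This is only a sketch: OutputFormatter and its signature are hypothetical, 
and it covers just the %k, %s, %i, %f and %c placeholders that appear in the 
KBestExtractor diff quoted below (not %S, %t or %d). It scans the format 
string once, so a "%" inside the hypothesis itself is never re-expanded, 
which the chained replace() calls cannot guarantee:

```java
import java.util.Locale;

/** Hypothetical helper: the single place where output-format placeholders are expanded. */
final class OutputFormatter {

  private OutputFormatter() {}

  static String format(String template, int k, String hypothesis, int sentenceId,
      String features, float cost) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < template.length(); i++) {
      char c = template.charAt(i);
      if (c != '%' || i + 1 == template.length()) {
        out.append(c);
        continue;
      }
      char code = template.charAt(++i);
      switch (code) {
        case 'k': out.append(k); break;
        case 's': out.append(hypothesis); break;
        case 'i': out.append(sentenceId); break;
        case 'f': out.append(features); break;
        case 'c': out.append(String.format(Locale.ROOT, "%.3f", cost)); break;
        default: out.append('%').append(code); // unknown placeholder: keep verbatim
      }
    }
    return out.toString();
  }
}
```

Both KBestExtractor and Translation could then call the same method instead 
of each carrying its own chain of replace() calls.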

matt


> On Apr 27, 2016, at 2:25 PM, kellen sunderland <[email protected]> 
> wrote:
> 
> Thanks for taking a look Matt,
> 
> I think this is all we've got planned as far as changes relating to an API
> would go.  We have a few more commits coming but they're just performance
> improvements and they don't change too much in the way of interfaces or
> method signatures.
> 
> -Kellen
> 
> On Wed, Apr 27, 2016 at 4:47 AM, Matt Post <[email protected]> wrote:
> 
>> Kellen,
>> 
>> Great. I had a chance to start looking over the ReworkedExtractions
>> branch. I'll have some more time today. It looks good to me so far. Is
>> there anything else you plan to do, or does that branch contain basically
>> all of it (apart from the recapitalization fix, which I see should be
>> applied more selectively, maybe only when a -recapitalize flag is present,
>> to save time)?
>> 
>> matt
>> 
>> 
>>> On Apr 26, 2016, at 1:56 AM, kellen sunderland <
>> [email protected]> wrote:
>>> 
>>> Hey Matt,
>>> 
>>> I've opened a new pull request with a few of our commits, feel free to
>> take
>>> a look when you have some time.
>>> 
>>> More importantly I've pushed our queue of upcoming commits to the
>> following
>>> branch in my fork:
>>> 
>> https://github.com/KellenSunderland/incubator-joshua/commits/ReworkedExtractions
>>> .  From there you can get an idea for the work we've done so far.  I
>>> haven't opened a PR yet for these commits because there's still some
>>> merging I have to do (there's a few failing tests and I had to
>> temporarily
>>> comment out some of your casing code).  Once that's fixed I'll do a
>> proper
>>> PR for these commits.
>>> 
>>> -Kellen
>>> 
>>> On Mon, Apr 25, 2016 at 1:35 PM, Matt Post <[email protected]> wrote:
>>> 
>>>> Great. On that first point, I meant that translate() would return a
>>>> Translation object, which would know its hypergraph and could iterate
>> over
>>>> a KBestExtractor. In any case, though, it sounds like you are a bit
>> ahead
>>>> of me on this, so I'll wait for a push that I can see, and then we can
>>>> converge on the design.
>>>> 
>>>> matt
>>>> 
>>>> 
>>>>> On Apr 25, 2016, at 4:10 PM, Hieber, Felix <[email protected]> wrote:
>>>>> 
>>>>> Hi Matt,
>>>>> 
>>>>> These are some nice suggestions. Most of the work we have done is in
>>>>> line with what you propose, so I would agree with Kellen that we should
>>>>> synchronize and compare sooner rather than later.
>>>>> 
>>>>> Best,
>>>>> Felix
>>>>> 
>>>>>> On 25.04.2016, at 07:44, kellen sunderland <
>> [email protected]>
>>>> wrote:
>>>>>> 
>>>>>> Hey Matt,
>>>>>> 
>>>>>> Sorry for the late reply.  The Joshua-6 folder and tst may have just
>>>> been
>>>>>> artifacts of some symlinks I have locally.  Sorry they may have been
>>>> pushed
>>>>>> by mistake, I can clean that up.
>>>>>> 
>>>>>> Good idea to have the api code in a separate branch.  We can merge the
>>>> work
>>>>>> that we've done some time next week.
>>>>>> 
>>>>>> KBestExtractor is one of the things we want to return via the API.  We
>>>>>> already have some of this implemented though as you suggest.  I'll try
>>>> and
>>>>>> push the remaining work we've done into my github branch so you can
>>>> compare.
>>>>>> 
>>>>>> -Kellen
>>>>>> 
>>>>>>> On Mon, Apr 25, 2016 at 6:11 AM, Matt Post <[email protected]> wrote:
>>>>>>> 
>>>>>>> Okay, after looking at this a bit more, I have a better
>> understanding,
>>>> and
>>>>>>> an idea for how to move forward.
>>>>>>> 
>>>>>>> First, I see that Translation.java has provisions for structured
>>>> output.
>>>>>>> I'm guessing StructuredTranslation was added by mistake?
>>>>>>> 
>>>>>>> Moving forward, on the joshua_api branch, I was thinking of the
>>>> following,
>>>>>>> but want to make sure it doesn't collide with what you've done or are
>>>> doing:
>>>>>>> 
>>>>>>> - Factor KBestExtractor to return Translation objects instead of
>>>> printing,
>>>>>>> and also turn it into an iterator
>>>>>>> 
>>>>>>> - There's a real discrepancy with competing forest representations.
>>>>>>> There are operations on the hypergraph (via WalkerFunction), and then
>>>>>>> also operations on Derivations. This leads to code that operates on
>>>>>>> both. It would be nice if the KBestExtractor just returned something
>>>>>>> like a reduced "slice" of a forest, with new nodes containing only
>>>>>>> single back pointers, representing exactly the nth-best derivation.
>>>>>>> Then we could generically use the WalkerFunctions on that (e.g.,
>>>>>>> Viterbi extraction), and get rid of many of the DerivationVisitor
>>>>>>> classes.
>>>>>>> 
>>>>>>> - Related: constructing the k-best list is expensive, even for just
>> the
>>>>>>> first item, since you have to set up all the candidate lists and so
>> on.
>>>>>>> This led to me implementing top-n = 0, where you can get the
>>>> translation
>>>>>>> and some limited information (not replayed features) via Viterbi
>>>> extractors
>>>>>>> on the hypergraph, and you only have to call KBestExtractor if you
>>>> actually
>>>>>>> want k-best lists. This leads to dual code, e.g., substitutions of
>>>>>>> output_format in multiple places. The first item the KBestIterator
>>>> returns
>>>>>>> should be constructed more efficiently, on the assumption that the
>>>> caller
>>>>>>> might not ask for more items. The StructuredTranslation object
>> already
>>>> is
>>>>>>> lazy about returning things that are asked for (e.g., it will only
>>>> replay
>>>>>>> features if you ask for the feature functions).
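
A minimal sketch of the "cheap first item" idea from the last bullet, using 
stand-in types rather than Joshua's classes. LazyKBestIterator and both 
supplier parameters are hypothetical: the first supplier stands in for 
Viterbi extraction on the hypergraph, the second for the expensive k-best 
setup that produces the items from rank 2 onward. It assumes a 1-best 
always exists:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

/** Hypothetical iterator: cheap 1-best first, expensive k-best setup only on demand. */
final class LazyKBestIterator<T> implements Iterator<T> {

  private final Supplier<T> viterbi;               // cheap: best item only
  private final Supplier<Iterator<T>> kBestSetup;  // expensive: items from rank 2 on
  private boolean firstReturned = false;
  private Iterator<T> rest = null;

  LazyKBestIterator(Supplier<T> viterbi, Supplier<Iterator<T>> kBestSetup) {
    this.viterbi = viterbi;
    this.kBestSetup = kBestSetup;
  }

  @Override
  public boolean hasNext() {
    if (!firstReturned) {
      return true;
    }
    if (rest == null) {
      rest = kBestSetup.get(); // candidate lists are only built here
    }
    return rest.hasNext();
  }

  @Override
  public T next() {
    if (!firstReturned) {
      firstReturned = true;
      return viterbi.get();
    }
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return rest.next();
  }
}
```

A caller that only wants the best translation calls next() once and never 
pays for the k-best setup. Note that calling hasNext() after the first item 
does trigger the setup, since that is the only way to know whether a second 
item exists.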
>>>>>>> 
>>>>>>> I will probably implement most of these tonight and tomorrow unless
>>>> there
>>>>>>> are objections from anyone (including an objection asking for more
>>>> time to
>>>>>>> evaluate!)
>>>>>>> 
>>>>>>> matt
>>>>>>> 
>>>>>>> 
>>>>>>>> On Apr 23, 2016, at 7:22 PM, Matt Post <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Kellen suggested we create a Joshua API, which I think is an
>> excellent
>>>>>>> idea. I've just made a start at this. It is not done and needs more
>>>> work,
>>>>>>> but I know that the Amazon folks have done some things on the
>> backend,
>>>> and
>>>>>>> I wanted to make sure not to duplicate any work they might have done.
>>>> Also,
>>>>>>> it's something we should discuss.
>>>>>>>> 
>>>>>>>> First, I was a bit confused about the joshua-6 subdirectory, and the
>>>>>>> files there (also, what is tst/? Both of these were from a recent
>>>> commit).
>>>>>>> I moved those over and then things didn't compile. I got things
>>>> compiling
>>>>>>> and then made a few changes to StructuredTranslation.
>>>>>>>> 
>>>>>>>> The biggest change I hope doesn't create problems is that I
>> simplified
>>>>>>> StructuredTranslation to no longer contain the Hypergraph object;
>>>> instead,
>>>>>>> it contains a DerivationState object. This represents a particular
>>>> k-best
>>>>>>> derivation, using Huang & Chiang (2005)-style ranked back pointers.
>>>>>>> The nice thing is that you can simply define a DerivationVisitor class
>>>>>>> and pass it to DerivationState::visit, and it will see every node in a
>>>>>>> particular derivation.
>>>>>>>> 
>>>>>>>> This is distinct from WalkerFunction, which walks an entire
>>>> *HyperGraph*.
>>>>>>>> 
>>>>>>>> Let me know what you guys think about these changes, and maybe we
>>>>>>>> can spec out the API, and then clean things up inside a bit to use it
>>>>>>>> (there's no reason to be passing output stream writers to
>>>>>>>> KBestExtractor, for example...).
>>>>>>>> 
>>>>>>>> matt
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Begin forwarded message:
>>>>>>>>> 
>>>>>>>>> From: [email protected]
>>>>>>>>> Subject: incubator-joshua git commit: Simplified
>>>> StructuredTranslation
>>>>>>> to use derivations instead of hypergraphs, now using in
>> KBestExtractor
>>>>>>>>> Date: April 23, 2016 at 7:12:19 PM EDT
>>>>>>>>> To: [email protected]
>>>>>>>>> Reply-To: [email protected]
>>>>>>>>> 
>>>>>>>>> Repository: incubator-joshua
>>>>>>>>> Updated Branches:
>>>>>>>>> refs/heads/joshua_api [created] 824319561
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Simplified StructuredTranslation to use derivations instead of
>>>>>>> hypergraphs, now using in KBestExtractor
>>>>>>>>> 
>>>>>>>>> The StructuredTranslation object is a great idea. I rewrote it here
>>>> to
>>>>>>> do the following:
>>>>>>>>> 
>>>>>>>>> - It now compiles. I'm not sure why it was tucked under
>>>>>>> $JOSHUA/joshua-6, but I just noticed this, and when I brought it in,
>> it
>>>>>>> didn't work
>>>>>>>>> -  I rewrote it to be based on a single (k-best) derivation,
>> instead
>>>> of
>>>>>>> knowing about the whole hypergraph. We should also build a more
>> general
>>>>>>> object that knows about all the StructuredTranslation objects (maybe
>>>>>>> with some renaming)
>>>>>>>>> -  I changed it to have an option to only compute each of the items
>>>>>>> (e.g., features) if it was requested. The non-lazy version remains
>> the
>>>>>>> default.
>>>>>>>>> -  KBestExtractor now uses these. This is the first step to making
>> a
>>>>>>> proper API. My thinking is that a large object (maybe Translation?)
>>>> will
>>>>>>> contain the k-best extractor and can return StructuredTranslation
>>>> objects
>>>>>>> as requested (again, we may want to jiggle the names a bit)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Project:
>>>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
>>>>>>>>> Commit:
>>>>>>> 
>>>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/82431956
>>>>>>>>> Tree:
>>>>>>> 
>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/82431956
>>>>>>>>> Diff:
>>>>>>> 
>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/82431956
>>>>>>>>> 
>>>>>>>>> Branch: refs/heads/joshua_api
>>>>>>>>> Commit: 8243195611a17e0ef067ec7dbf6c4a57612d041b
>>>>>>>>> Parents: bc83a1a
>>>>>>>>> Author: Matt Post <[email protected]>
>>>>>>>>> Authored: Sat Apr 23 19:12:12 2016 -0400
>>>>>>>>> Committer: Matt Post <[email protected]>
>>>>>>>>> Committed: Sat Apr 23 19:12:12 2016 -0400
>>>>>>>>> 
>>>>>>>>> 
>>>> ----------------------------------------------------------------------
>>>>>>>>> src/joshua/decoder/StructuredTranslation.java   | 144
>>>>>>> ++++++++++---------
>>>>>>>>> .../decoder/hypergraph/KBestExtractor.java      |  47 +++---
>>>>>>>>> 2 files changed, 98 insertions(+), 93 deletions(-)
>>>>>>>>> 
>>>> ----------------------------------------------------------------------
>>>>>>> 
>>>> 
>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/82431956/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>> 
>>>> ----------------------------------------------------------------------
>>>>>>>>> diff --git a/src/joshua/decoder/StructuredTranslation.java
>>>>>>> b/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>> index 1939ea0..e3018b4 100644
>>>>>>>>> --- a/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>> +++ b/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>> @@ -10,7 +10,10 @@ import java.util.List;
>>>>>>>>> import java.util.Map;
>>>>>>>>> 
>>>>>>>>> import joshua.decoder.ff.FeatureFunction;
>>>>>>>>> +import joshua.decoder.ff.FeatureVector;
>>>>>>>>> import joshua.decoder.hypergraph.HyperGraph;
>>>>>>>>> +import joshua.decoder.hypergraph.KBestExtractor.DerivationState;
>>>>>>>>> +import joshua.decoder.io.DeNormalize;
>>>>>>>>> import
>> joshua.decoder.hypergraph.ViterbiFeatureVectorWalkerFunction;
>>>>>>>>> import joshua.decoder.hypergraph.ViterbiOutputStringWalkerFunction;
>>>>>>>>> import joshua.decoder.hypergraph.WalkerFunction;
>>>>>>>>> @@ -30,77 +33,51 @@ import joshua.decoder.segment_file.Sentence;
>>>>>>>>> public class StructuredTranslation {
>>>>>>>>> 
>>>>>>>>> private final Sentence sourceSentence;
>>>>>>>>> -  private final List<FeatureFunction> featureFunctions;
>>>>>>>>> +  private final DerivationState derivationRoot;
>>>>>>>>> +  private final JoshuaConfiguration joshuaConfiguration;
>>>>>>>>> 
>>>>>>>>> -  private final String translationString;
>>>>>>>>> -  private final List<String> translationTokens;
>>>>>>>>> -  private final float translationScore;
>>>>>>>>> -  private List<List<Integer>> translationWordAlignments;
>>>>>>>>> -  private Map<String,Float> translationFeatures;
>>>>>>>>> -  private final float extractionTime;
>>>>>>>>> +  private String translationString = null;
>>>>>>>>> +  private List<String> translationTokens = null;
>>>>>>>>> +  private String translationWordAlignments = null;
>>>>>>>>> +  private FeatureVector translationFeatures = null;
>>>>>>>>> +  private float extractionTime = 0.0f;
>>>>>>>>> +  private float translationScore = 0.0f;
>>>>>>>>> 
>>>>>>>>> +  /* If we need to replay the features, this will get set to true,
>>>> so
>>>>>>> that it's only done once */
>>>>>>>>> +  private boolean featuresReplayed = false;
>>>>>>>>> +
>>>>>>>>> public StructuredTranslation(final Sentence sourceSentence,
>>>>>>>>> -      final HyperGraph hypergraph,
>>>>>>>>> -      final List<FeatureFunction> featureFunctions) {
>>>>>>>>> -
>>>>>>>>> -      final long startTime = System.currentTimeMillis();
>>>>>>>>> -
>>>>>>>>> -      this.sourceSentence = sourceSentence;
>>>>>>>>> -      this.featureFunctions = featureFunctions;
>>>>>>>>> -      this.translationString = extractViterbiString(hypergraph);
>>>>>>>>> -      this.translationTokens = extractTranslationTokens();
>>>>>>>>> -      this.translationScore = extractTranslationScore(hypergraph);
>>>>>>>>> -      this.translationFeatures =
>> extractViterbiFeatures(hypergraph);
>>>>>>>>> -      this.translationWordAlignments =
>>>>>>> extractViterbiWordAlignment(hypergraph);
>>>>>>>>> -      this.extractionTime = (System.currentTimeMillis() -
>>>> startTime) /
>>>>>>> 1000.0f;
>>>>>>>>> -  }
>>>>>>>>> -
>>>>>>>>> -  private Map<String,Float> extractViterbiFeatures(final
>> HyperGraph
>>>>>>> hypergraph) {
>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>> -      return emptyMap();
>>>>>>>>> -    } else {
>>>>>>>>> -      ViterbiFeatureVectorWalkerFunction
>> viterbiFeatureVectorWalker
>>>> =
>>>>>>> new ViterbiFeatureVectorWalkerFunction(featureFunctions,
>>>> sourceSentence);
>>>>>>>>> -      walk(hypergraph.goalNode, viterbiFeatureVectorWalker);
>>>>>>>>> -      return new
>>>>>>> HashMap<String,Float>(viterbiFeatureVectorWalker.getFeaturesMap());
>>>>>>>>> -    }
>>>>>>>>> -  }
>>>>>>>>> +      final DerivationState derivationRoot,
>>>>>>>>> +      JoshuaConfiguration config) {
>>>>>>>>> 
>>>>>>>>> -  private List<List<Integer>> extractViterbiWordAlignment(final
>>>>>>> HyperGraph hypergraph) {
>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>> -      return emptyList();
>>>>>>>>> -    } else {
>>>>>>>>> -      final WordAlignmentExtractor wordAlignmentWalker = new
>>>>>>> WordAlignmentExtractor();
>>>>>>>>> -      walk(hypergraph.goalNode, wordAlignmentWalker);
>>>>>>>>> -      return wordAlignmentWalker.getFinalWordAlignments();
>>>>>>>>> -    }
>>>>>>>>> -  }
>>>>>>>>> -
>>>>>>>>> -  private float extractTranslationScore(final HyperGraph
>>>> hypergraph) {
>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>> -      return 0;
>>>>>>>>> -    } else {
>>>>>>>>> -      return hypergraph.goalNode.getScore();
>>>>>>>>> -    }
>>>>>>>>> -  }
>>>>>>>>> -
>>>>>>>>> -  private String extractViterbiString(final HyperGraph
>> hypergraph) {
>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>> -      return sourceSentence.source();
>>>>>>>>> -    } else {
>>>>>>>>> -      final WalkerFunction viterbiOutputStringWalker = new
>>>>>>> ViterbiOutputStringWalkerFunction();
>>>>>>>>> -      walk(hypergraph.goalNode, viterbiOutputStringWalker);
>>>>>>>>> -      return viterbiOutputStringWalker.toString();
>>>>>>>>> -    }
>>>>>>>>> +    this(sourceSentence, derivationRoot, config, true);
>>>>>>>>> }
>>>>>>>>> +
>>>>>>>>> 
>>>>>>>>> -  private List<String> extractTranslationTokens() {
>>>>>>>>> -    if (translationString.isEmpty()) {
>>>>>>>>> -      return emptyList();
>>>>>>>>> -    } else {
>>>>>>>>> -      return asList(translationString.split("\\s+"));
>>>>>>>>> +  public StructuredTranslation(final Sentence sourceSentence,
>>>>>>>>> +      final DerivationState derivationRoot,
>>>>>>>>> +      JoshuaConfiguration config,
>>>>>>>>> +      boolean now) {
>>>>>>>>> +
>>>>>>>>> +    final long startTime = System.currentTimeMillis();
>>>>>>>>> +
>>>>>>>>> +    this.sourceSentence = sourceSentence;
>>>>>>>>> +    this.derivationRoot = derivationRoot;
>>>>>>>>> +    this.joshuaConfiguration = config;
>>>>>>>>> +
>>>>>>>>> +    if (now) {
>>>>>>>>> +      getTranslationString();
>>>>>>>>> +      getTranslationTokens();
>>>>>>>>> +      getTranslationScore();
>>>>>>>>> +      getTranslationFeatures();
>>>>>>>>> +      getTranslationWordAlignments();
>>>>>>>>> }
>>>>>>>>> +    this.translationScore = getTranslationScore();
>>>>>>>>> +
>>>>>>>>> +    this.extractionTime = (System.currentTimeMillis() -
>> startTime) /
>>>>>>> 1000.0f;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> +
>>>>>>>>> // Getters to use upstream
>>>>>>>>> 
>>>>>>>>> public Sentence getSourceSentence() {
>>>>>>>>> @@ -112,25 +89,60 @@ public class StructuredTranslation {
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> public String getTranslationString() {
>>>>>>>>> -    return translationString;
>>>>>>>>> +    if (this.translationString == null) {
>>>>>>>>> +      if (derivationRoot == null) {
>>>>>>>>> +        this.translationString = sourceSentence.source();
>>>>>>>>> +      } else {
>>>>>>>>> +        this.translationString = derivationRoot.getHypothesis();
>>>>>>>>> +      }
>>>>>>>>> +    }
>>>>>>>>> +    return this.translationString;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> public List<String> getTranslationTokens() {
>>>>>>>>> +    if (this.translationTokens == null) {
>>>>>>>>> +      String trans = getTranslationString();
>>>>>>>>> +      if (trans.isEmpty()) {
>>>>>>>>> +        this.translationTokens = emptyList();
>>>>>>>>> +      } else {
>>>>>>>>> +        this.translationTokens = asList(trans.split("\\s+"));
>>>>>>>>> +      }
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> return translationTokens;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> public float getTranslationScore() {
>>>>>>>>> +    if (derivationRoot == null) {
>>>>>>>>> +      this.translationScore = 0.0f;
>>>>>>>>> +    } else {
>>>>>>>>> +      this.translationScore = derivationRoot.getModelCost();
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> return translationScore;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> /**
>>>>>>>>> * Returns a list of target to source alignments.
>>>>>>>>> */
>>>>>>>>> -  public List<List<Integer>> getTranslationWordAlignments() {
>>>>>>>>> -    return translationWordAlignments;
>>>>>>>>> +  public String getTranslationWordAlignments() {
>>>>>>>>> +    if (this.translationWordAlignments == null) {
>>>>>>>>> +      if (derivationRoot == null)
>>>>>>>>> +        this.translationWordAlignments = "";
>>>>>>>>> +      else {
>>>>>>>>> +        WordAlignmentExtractor wordAlignmentExtractor = new
>>>>>>> WordAlignmentExtractor();
>>>>>>>>> +        derivationRoot.visit(wordAlignmentExtractor);
>>>>>>>>> +        this.translationWordAlignments =
>>>>>>> wordAlignmentExtractor.toString();
>>>>>>>>> +      }
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    return this.translationWordAlignments;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> -  public Map<String,Float> getTranslationFeatures() {
>>>>>>>>> +  public FeatureVector getTranslationFeatures() {
>>>>>>>>> +    if (this.translationFeatures == null)
>>>>>>>>> +      this.translationFeatures = derivationRoot.replayFeatures();
>>>>>>>>> +
>>>>>>>>> return translationFeatures;
>>>>>>>>> }
>>>>>>> 
>>>> 
>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/82431956/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>> 
>>>> ----------------------------------------------------------------------
>>>>>>>>> diff --git a/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>> b/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>> index 42539cc..ea6ca73 100644
>>>>>>>>> --- a/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>> +++ b/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>> @@ -34,6 +34,7 @@ import java.util.regex.Matcher;
>>>>>>>>> import joshua.corpus.Vocabulary;
>>>>>>>>> import joshua.decoder.BLEU;
>>>>>>>>> import joshua.decoder.JoshuaConfiguration;
>>>>>>>>> +import joshua.decoder.StructuredTranslation;
>>>>>>>>> import joshua.decoder.chart_parser.ComputeNodeResult;
>>>>>>>>> import joshua.decoder.ff.FeatureFunction;
>>>>>>>>> import joshua.decoder.ff.FeatureVector;
>>>>>>>>> @@ -167,33 +168,25 @@ public class KBestExtractor {
>>>>>>>>> // Determine the k-best hypotheses at each HGNode
>>>>>>>>> VirtualNode virtualNode = getVirtualNode(node);
>>>>>>>>> DerivationState derivationState =
>>>>>>> virtualNode.lazyKBestExtractOnNode(this, k);
>>>>>>>>> +
>>>>>>>>> //    DerivationState derivationState = getKthDerivation(node, k);
>>>>>>>>> if (derivationState != null) {
>>>>>>>>> -      // ==== read the kbest from each hgnode and convert to
>> output
>>>>>>> format
>>>>>>>>> -      FeatureVector features = new FeatureVector();
>>>>>>>>> 
>>>>>>>>> -      /*
>>>>>>>>> -       * To save space, the decoder only stores the model cost, no
>>>> the
>>>>>>> individual feature values. If
>>>>>>>>> -       * you want to output them, you have to replay them.
>>>>>>>>> -       */
>>>>>>>>> -      String hypothesis = null;
>>>>>>>>> -      if (joshuaConfiguration.outputFormat.contains("%f")
>>>>>>>>> -          || joshuaConfiguration.outputFormat.contains("%d"))
>>>>>>>>> -        features = derivationState.replayFeatures();
>>>>>>>>> -
>>>>>>>>> -      hypothesis = derivationState.getHypothesis()
>>>>>>>>> +      StructuredTranslation translation = new
>> StructuredTranslation(
>>>>>>>>> +          sentence, derivationState, joshuaConfiguration);
>>>>>>>>> +
>>>>>>>>> +      String hypothesis = translation.getTranslationString()
>>>>>>>>>      .replaceAll("-lsb-", "[")
>>>>>>>>>      .replaceAll("-rsb-", "]")
>>>>>>>>>      .replaceAll("-pipe-", "|");
>>>>>>>>> 
>>>>>>>>> -
>>>>>>>>>  outputString = joshuaConfiguration.outputFormat
>>>>>>>>>      .replace("%k", Integer.toString(k))
>>>>>>>>>      .replace("%s", hypothesis)
>>>>>>>>>      .replace("%S", DeNormalize.processSingleLine(hypothesis))
>>>>>>>>>      .replace("%i", Integer.toString(sentence.id()))
>>>>>>>>> -          .replace("%f", joshuaConfiguration.moses ?
>>>>>>> features.mosesString() : features.toString())
>>>>>>>>> -          .replace("%c", String.format("%.3f",
>>>> derivationState.cost));
>>>>>>>>> +          .replace("%f", joshuaConfiguration.moses ?
>>>>>>> translation.getTranslationFeatures().mosesString() :
>>>>>>> translation.getTranslationFeatures().toString())
>>>>>>>>> +          .replace("%c", String.format("%.3f",
>>>>>>> translation.getTranslationScore()));
>>>>>>>>> 
>>>>>>>>>  if (joshuaConfiguration.outputFormat.contains("%t")) {
>>>>>>>>>    outputString = outputString.replace("%t",
>>>>>>> derivationState.getTree());
>>>>>>>>> @@ -250,11 +243,11 @@ public class KBestExtractor {
>>>>>>>>>  return;
>>>>>>>>> 
>>>>>>>>> for (int k = 1; k <= topN; k++) {
>>>>>>>>> -      String hypStr = getKthHyp(hg.goalNode, k);
>>>>>>>>> -      if (null == hypStr)
>>>>>>>>> +      String translation = getKthHyp(hg.goalNode, k);
>>>>>>>>> +      if (null == translation)
>>>>>>>>>    break;
>>>>>>>>> 
>>>>>>>>> -      out.write(hypStr);
>>>>>>>>> +      out.write(translation);
>>>>>>>>>  out.write("\n");
>>>>>>>>>  out.flush();
>>>>>>>>> }
>>>>>>>>> @@ -704,11 +697,11 @@ public class KBestExtractor {
>>>>>>>>> /**
>>>>>>>>> * Visits every state in the derivation in a depth-first order.
>>>>>>>>> */
>>>>>>>>> -    private DerivationVisitor visit(DerivationVisitor visitor) {
>>>>>>>>> +    public DerivationVisitor visit(DerivationVisitor visitor) {
>>>>>>>>>  return visit(visitor, 0);
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> -    private DerivationVisitor visit(DerivationVisitor visitor, int
>>>>>>> indent) {
>>>>>>>>> +    public DerivationVisitor visit(DerivationVisitor visitor, int
>>>>>>> indent) {
>>>>>>>>> 
>>>>>>>>>  visitor.before(this, indent);
>>>>>>>>> 
>>>>>>>>> @@ -733,25 +726,25 @@ public class KBestExtractor {
>>>>>>>>>  return visitor;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> -    private String getHypothesis() {
>>>>>>>>> +    public String getHypothesis() {
>>>>>>>>>  return getHypothesis(defaultSide);
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> -    private String getTree() {
>>>>>>>>> +    public String getTree() {
>>>>>>>>>  return visit(new TreeExtractor()).toString();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> -    private String getHypothesis(Side side) {
>>>>>>>>> +    public String getHypothesis(Side side) {
>>>>>>>>>  return visit(new HypothesisExtractor(side)).toString();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> -    private FeatureVector replayFeatures() {
>>>>>>>>> +    public FeatureVector replayFeatures() {
>>>>>>>>>  FeatureReplayer fp = new FeatureReplayer();
>>>>>>>>>  visit(fp);
>>>>>>>>>  return fp.getFeatures();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> -    private String getDerivation() {
>>>>>>>>> +    public String getDerivation() {
>>>>>>>>>  return visit(new DerivationExtractor()).toString();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> @@ -811,7 +804,7 @@ public class KBestExtractor {
>>>>>>>>> */
>>>>>>>>> void after(DerivationState state, int level);
>>>>>>>>> }
>>>>>>>>> -
>>>>>>>>> +
>>>>>>>>> /**
>>>>>>>>> * Extracts the hypothesis from the leaves of the tree using the
>>>>>>> generic (depth-first) visitor.
>>>>>>>>> * Since we're using the visitor, we can't just print out the words
>> as
>>>>>>> we see them. We have to
>>>>>>>>> @@ -878,7 +871,7 @@ public class KBestExtractor {
>>>>>>>>>  return outputs.pop().replaceAll("<s> ", "").replace(" </s>", "");
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>> -
>>>>>>>>> +
>>>>>>>>> /**
>>>>>>>>> * Assembles a Penn treebank format tree for a given derivation.
>>>>>>>>> */
>>>>>>> 
>>>>>>> 
