Do you want me to fix the recapitalization, or are you going to do that? I looked a bit, and it seems I'll have to add a method that returns a word alignment object instead of just the string, so that I can poke through the individual alignments. This approach is as good as true-casing in some languages.
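To make concrete why a structured alignment object would help, here is a rough sketch of projecting source-side capitalization onto the output through the alignments. It assumes the alignments are available as a target-to-source list of index lists (the shape the old List<List<Integer>> getter had). AlignmentRecapitalizer and its methods are made-up names for illustration, not existing Joshua code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Joshua): copies source capitalization onto target tokens. */
final class AlignmentRecapitalizer {

  /**
   * @param source source tokens with their original casing
   * @param target lowercased target tokens
   * @param targetToSource for each target index, the aligned source indices (may be empty)
   * @return target tokens, capitalized wherever an aligned source token is capitalized
   */
  static List<String> recapitalize(List<String> source, List<String> target,
                                   List<List<Integer>> targetToSource) {
    List<String> result = new ArrayList<>(target.size());
    for (int t = 0; t < target.size(); t++) {
      boolean capitalize = false;
      if (t < targetToSource.size()) {
        for (int s : targetToSource.get(t)) {
          if (s >= 0 && s < source.size() && startsUpper(source.get(s))) {
            capitalize = true;
            break;
          }
        }
      }
      String token = target.get(t);
      result.add(capitalize ? upperFirst(token) : token);
    }
    return result;
  }

  private static boolean startsUpper(String word) {
    return !word.isEmpty() && Character.isUpperCase(word.codePointAt(0));
  }

  private static String upperFirst(String word) {
    if (word.isEmpty()) {
      return word;
    }
    int first = word.codePointAt(0);
    return new StringBuilder()
        .appendCodePoint(Character.toUpperCase(first))
        .append(word.substring(Character.charCount(first)))
        .toString();
  }
}
```

With only the alignment string available, the caller would have to re-parse it before doing anything like this, which is the reason for wanting the object.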
A few other things:

- I saw a comment in the commit about the changes not working for phrase-based translation. Can you (or Felix) elaborate? What exactly will no longer work?

- Currently, there are multiple places where the "output-format" string has to get edited (KBestExtractor and in Translation). After you push your changes in, I'm going to make some edits so that this all occurs in one place.

matt


> On Apr 27, 2016, at 2:25 PM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>
> Thanks for taking a look Matt,
>
> I think this is all we've got planned as far as changes relating to an API
> would go. We have a few more commits coming but they're just performance
> improvements and they don't change too much in the way of interfaces or
> method signatures.
>
> -Kellen
>
> On Wed, Apr 27, 2016 at 4:47 AM, Matt Post <p...@cs.jhu.edu> wrote:
>
>> Kellen,
>>
>> Great. I had a chance to start looking over the ReworkedExtractions
>> branch. I'll have some more time today. It looks good to me so far. Is
>> there anything else you plan to do, or does that branch contain basically
>> all of it (apart from the recapitalization fix, which I see should be
>> applied more selectively, maybe only when a -recapitalize flag is present,
>> to save on time).
>>
>> matt
>>
>>
>>> On Apr 26, 2016, at 1:56 AM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>
>>> Hey Matt,
>>>
>>> I've opened a new pull request with a few of our commits, feel free to take
>>> a look when you have some time.
>>>
>>> More importantly I've pushed our queue of upcoming commits to the following
>>> branch in my fork:
>>> https://github.com/KellenSunderland/incubator-joshua/commits/ReworkedExtractions
>>> From there you can get an idea for the work we've done so far. I
>>> haven't opened a PR yet for these commits because there's still some
>>> merging I have to do (there's a few failing tests and I had to temporarily
>>> comment out some of your casing code).
>>> Once that's fixed I'll do a proper PR for these commits.
>>>
>>> -Kellen
>>>
>>> On Mon, Apr 25, 2016 at 1:35 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>
>>>> Great. On that first point, I meant that translate() would return a
>>>> Translation object, which would know its hypergraph and could iterate over
>>>> a KBestExtractor. In any case, though, it sounds like you are a bit ahead
>>>> of me on this, so I'll wait for a push that I can see, and then we can
>>>> converge on the design.
>>>>
>>>> matt
>>>>
>>>>
>>>>> On Apr 25, 2016, at 4:10 PM, Hieber, Felix <fhie...@amazon.de> wrote:
>>>>>
>>>>> Hi Matt,
>>>>>
>>>>> These are some nice suggestions. Most of the work we have done is in
>>>>> line with what you propose, so I would agree with Kellen that we should
>>>>> synchronize and compare sooner rather than later.
>>>>>
>>>>> Best,
>>>>> Felix
>>>>>
>>>>>> On 25.04.2016, at 07:44, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>>>>
>>>>>> Hey Matt,
>>>>>>
>>>>>> Sorry for the late reply. The Joshua-6 folder and tst may have just been
>>>>>> artifacts of some symlinks I have locally. Sorry, they may have been pushed
>>>>>> by mistake; I can clean that up.
>>>>>>
>>>>>> Good idea to have the api code in a separate branch. We can merge the work
>>>>>> that we've done some time next week.
>>>>>>
>>>>>> KBestExtractor is one of the things we want to return via the API. We
>>>>>> already have some of this implemented though as you suggest. I'll try and
>>>>>> push the remaining work we've done into my github branch so you can compare.
>>>>>>
>>>>>> -Kellen
>>>>>>
>>>>>>> On Mon, Apr 25, 2016 at 6:11 AM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>>>>
>>>>>>> Okay, after looking at this a bit more, I have a better understanding, and
>>>>>>> an idea for how to move forward.
>>>>>>>
>>>>>>> First, I see that Translation.java has provisions for structured output.
>>>>>>> I'm guessing StructuredTranslation was added by mistake?
>>>>>>>
>>>>>>> Moving forward, on the joshua_api branch, I was thinking of the following,
>>>>>>> but want to make sure it doesn't collide with what you've done or are doing:
>>>>>>>
>>>>>>> - Refactor KBestExtractor to return Translation objects instead of printing,
>>>>>>> and also turn it into an iterator
>>>>>>>
>>>>>>> - There's a real discrepancy with competing forest representations. There
>>>>>>> are operations on the hypergraph (via WalkerFunction), and then also
>>>>>>> operations on Derivations. This leads to code that operates on both. It
>>>>>>> would be nice if the KBestExtractor just returned something like a reduced
>>>>>>> "slice" of a forest, with new nodes containing only single back pointers,
>>>>>>> representing exactly the nth-best derivation. Then we could generically use
>>>>>>> the WalkerFunctions on that (e.g., Viterbi extraction), and get rid of many
>>>>>>> of the DerivationVisitor classes
>>>>>>>
>>>>>>> - Related: constructing the k-best list is expensive, even for just the
>>>>>>> first item, since you have to set up all the candidate lists and so on.
>>>>>>> This led to me implementing top-n = 0, where you can get the translation
>>>>>>> and some limited information (not replayed features) via Viterbi extractors
>>>>>>> on the hypergraph, and you only have to call KBestExtractor if you actually
>>>>>>> want k-best lists. This leads to duplicated code, e.g., substitutions of
>>>>>>> output_format in multiple places. The first item the KBestIterator returns
>>>>>>> should be constructed more efficiently, on the assumption that the caller
>>>>>>> might not ask for more items. The StructuredTranslation object is already
>>>>>>> lazy about returning things that are asked for (e.g., it will only replay
>>>>>>> features if you ask for the feature functions).
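As a rough illustration of the iterator-plus-laziness idea in the list above, here is a minimal sketch. It is not Joshua code: the Lazy helper and this KBestIterator implementation are assumptions made up for the example, and the extraction itself is abstracted as a function from rank k to the k-th item that returns null when no k-th item exists (mirroring how getKthHyp signals exhaustion). It does not address making the first item cheaper; it only shows that nothing is extracted until the caller asks.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;
import java.util.function.Supplier;

/** Computes a value on first request and caches it (hypothetical helper). */
final class Lazy<T> implements Supplier<T> {
  private Supplier<T> compute;
  private T value;
  private boolean done = false;

  Lazy(Supplier<T> compute) {
    this.compute = compute;
  }

  @Override
  public T get() {
    if (!done) {
      value = compute.get();
      done = true;
      compute = null; // let the closure be garbage collected
    }
    return value;
  }
}

/** Hands out the 1st, 2nd, ... best item, asking for each only when needed. */
final class KBestIterator<T> implements Iterator<T> {
  private final IntFunction<T> kthBest; // must return null when there is no k-th item
  private final int topN;
  private int k = 1;
  private T pending = null;

  KBestIterator(IntFunction<T> kthBest, int topN) {
    this.kthBest = kthBest;
    this.topN = topN;
  }

  @Override
  public boolean hasNext() {
    if (pending == null && k <= topN) {
      pending = kthBest.apply(k);
      if (pending == null) {
        k = topN + 1; // exhausted: do not ask again
      }
    }
    return pending != null;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T result = pending;
    pending = null;
    k++;
    return result;
  }
}
```

A translation object could wrap its replayed features in something like Lazy so that a caller who only wants the string never pays for the replay, and output formatting would then live in the one place that consumes the iterator.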
>>>>>>>
>>>>>>> I will probably implement most of these tonight and tomorrow unless there
>>>>>>> are objections from anyone (including an objection asking for more time to
>>>>>>> evaluate!)
>>>>>>>
>>>>>>> matt
>>>>>>>
>>>>>>>
>>>>>>>> On Apr 23, 2016, at 7:22 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Kellen suggested we create a Joshua API, which I think is an excellent
>>>>>>>> idea. I've just made a start at this. It is not done and needs more work,
>>>>>>>> but I know that the Amazon folks have done some things on the backend, and
>>>>>>>> I wanted to make sure not to duplicate any work they might have done. Also,
>>>>>>>> it's something we should discuss.
>>>>>>>>
>>>>>>>> First, I was a bit confused about the joshua-6 subdirectory, and the
>>>>>>>> files there (also, what is tst/? Both of these were from a recent commit).
>>>>>>>> I moved those over and then things didn't compile. I got things compiling
>>>>>>>> and then made a few changes to StructuredTranslation.
>>>>>>>>
>>>>>>>> The biggest change, which I hope doesn't create problems, is that I simplified
>>>>>>>> StructuredTranslation to no longer contain the HyperGraph object; instead,
>>>>>>>> it contains a DerivationState object. This represents a particular k-best
>>>>>>>> derivation, using Huang & Chiang (2005)-style ranked back pointers. The
>>>>>>>> nice thing is that you can simply define a DerivationVisitor class and
>>>>>>>> pass it to DerivationState::visit, and it will see every node in a
>>>>>>>> particular derivation.
>>>>>>>>
>>>>>>>> This is distinct from WalkerFunction, which walks an entire *HyperGraph*.
>>>>>>>>
>>>>>>>> Let me know what you guys think about these changes, and maybe we can
>>>>>>>> spec out the API, and then clean things up inside a bit to use it (there's
>>>>>>>> no reason to be passing output stream writers to KBestExtractor, for
>>>>>>>> example...).
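For anyone who has not looked at the visitor code, here is a toy version of the pattern described above. ToyDerivation, ToyVisitor and LeafCollector are stand-ins invented for this sketch, not the real DerivationState, DerivationVisitor or HypothesisExtractor classes; the only thing carried over from the real code is the before/after callback shape and the depth-first order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for one derivation: each node has a single, fixed list of children. */
final class ToyDerivation {
  final String label;
  final List<ToyDerivation> children;

  ToyDerivation(String label, ToyDerivation... children) {
    this.label = label;
    this.children = Arrays.asList(children);
  }

  /** Depth-first walk; returns the visitor so the call can be chained. */
  <V extends ToyVisitor> V visit(V visitor) {
    return visit(visitor, 0);
  }

  private <V extends ToyVisitor> V visit(V visitor, int level) {
    visitor.before(this, level);
    for (ToyDerivation child : children) {
      child.visit(visitor, level + 1);
    }
    visitor.after(this, level);
    return visitor;
  }
}

/** Callbacks invoked on the way down (before) and on the way back up (after). */
interface ToyVisitor {
  void before(ToyDerivation node, int level);

  void after(ToyDerivation node, int level);
}

/** Collects the leaf labels from left to right, i.e. the output string. */
final class LeafCollector implements ToyVisitor {
  private final List<String> words = new ArrayList<>();

  @Override
  public void before(ToyDerivation node, int level) {
    if (node.children.isEmpty()) {
      words.add(node.label);
    }
  }

  @Override
  public void after(ToyDerivation node, int level) {
    // nothing to do on the way back up for this visitor
  }

  @Override
  public String toString() {
    return String.join(" ", words);
  }
}
```

Because each node in a single derivation has exactly one set of children, a visitor never has to choose among competing hyperedges, which is what separates this from walking the full hypergraph.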
>>>>>>>>
>>>>>>>> matt
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Begin forwarded message:
>>>>>>>>>
>>>>>>>>> From: mjp...@apache.org
>>>>>>>>> Subject: incubator-joshua git commit: Simplified StructuredTranslation to use derivations instead of hypergraphs, now using in KBestExtractor
>>>>>>>>> Date: April 23, 2016 at 7:12:19 PM EDT
>>>>>>>>> To: comm...@joshua.incubator.apache.org
>>>>>>>>> Reply-To: dev@joshua.incubator.apache.org
>>>>>>>>>
>>>>>>>>> Repository: incubator-joshua
>>>>>>>>> Updated Branches:
>>>>>>>>>   refs/heads/joshua_api [created] 824319561
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Simplified StructuredTranslation to use derivations instead of hypergraphs, now using in KBestExtractor
>>>>>>>>>
>>>>>>>>> The StructuredTranslation object is a great idea. I rewrote it here to do the following:
>>>>>>>>>
>>>>>>>>> - It now compiles. I'm not sure why it was tucked under
>>>>>>>>> $JOSHUA/joshua-6, but I just noticed this, and when I brought it in, it
>>>>>>>>> didn't work
>>>>>>>>> - I rewrote it to be based on a single (k-best) derivation, instead of
>>>>>>>>> knowing about the whole hypergraph. We should also build a more general
>>>>>>>>> object that knows about all the StructuredTranslation objects (maybe with
>>>>>>>>> some renaming
>>>>>>>>> - I changed it to have an option to only compute each of the items
>>>>>>>>> (e.g., features) if it was requested. The non-lazy version remains the
>>>>>>>>> default.
>>>>>>>>> - KBestExtractor now uses these. This is the first step to making a
>>>>>>>>> proper API. My thinking is that a large object (maybe Translation?)
>>>>>>>>> will contain the k-best extractor and can return StructuredTranslation
>>>>>>>>> objects as requested (again, we may want to jiggle the names a bit)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
>>>>>>>>> Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/82431956
>>>>>>>>> Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/82431956
>>>>>>>>> Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/82431956
>>>>>>>>>
>>>>>>>>> Branch: refs/heads/joshua_api
>>>>>>>>> Commit: 8243195611a17e0ef067ec7dbf6c4a57612d041b
>>>>>>>>> Parents: bc83a1a
>>>>>>>>> Author: Matt Post <p...@cs.jhu.edu>
>>>>>>>>> Authored: Sat Apr 23 19:12:12 2016 -0400
>>>>>>>>> Committer: Matt Post <p...@cs.jhu.edu>
>>>>>>>>> Committed: Sat Apr 23 19:12:12 2016 -0400
>>>>>>>>>
>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>  src/joshua/decoder/StructuredTranslation.java   | 144 ++++++++++---------
>>>>>>>>>  .../decoder/hypergraph/KBestExtractor.java      |  47 +++---
>>>>>>>>>  2 files changed, 98 insertions(+), 93 deletions(-)
>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/82431956/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>> diff --git a/src/joshua/decoder/StructuredTranslation.java b/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>> index 1939ea0..e3018b4 100644
>>>>>>>>> --- a/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>> +++ b/src/joshua/decoder/StructuredTranslation.java
>>>>>>>>> @@ -10,7 +10,10 @@ import java.util.List;
>>>>>>>>>  import java.util.Map;
>>>>>>>>>
>>>>>>>>>  import joshua.decoder.ff.FeatureFunction;
>>>>>>>>> +import joshua.decoder.ff.FeatureVector;
>>>>>>>>>  import joshua.decoder.hypergraph.HyperGraph;
>>>>>>>>> +import joshua.decoder.hypergraph.KBestExtractor.DerivationState;
>>>>>>>>> +import joshua.decoder.io.DeNormalize;
>>>>>>>>>  import joshua.decoder.hypergraph.ViterbiFeatureVectorWalkerFunction;
>>>>>>>>>  import joshua.decoder.hypergraph.ViterbiOutputStringWalkerFunction;
>>>>>>>>>  import joshua.decoder.hypergraph.WalkerFunction;
>>>>>>>>> @@ -30,77 +33,51 @@ import joshua.decoder.segment_file.Sentence;
>>>>>>>>>  public class StructuredTranslation {
>>>>>>>>>
>>>>>>>>>    private final Sentence sourceSentence;
>>>>>>>>> -  private final List<FeatureFunction> featureFunctions;
>>>>>>>>> +  private final DerivationState derivationRoot;
>>>>>>>>> +  private final JoshuaConfiguration joshuaConfiguration;
>>>>>>>>>
>>>>>>>>> -  private final String translationString;
>>>>>>>>> -  private final List<String> translationTokens;
>>>>>>>>> -  private final float translationScore;
>>>>>>>>> -  private List<List<Integer>> translationWordAlignments;
>>>>>>>>> -  private Map<String,Float> translationFeatures;
>>>>>>>>> -  private final float extractionTime;
>>>>>>>>> +  private String translationString = null;
>>>>>>>>> +  private List<String> translationTokens = null;
>>>>>>>>> +  private String translationWordAlignments = null;
>>>>>>>>> +  private FeatureVector translationFeatures = null;
>>>>>>>>> +  private float extractionTime = 0.0f;
>>>>>>>>> +  private float translationScore = 0.0f;
>>>>>>>>>
>>>>>>>>> +  /* If we need to replay the features, this will get set to true, so that it's only done once */
>>>>>>>>> +  private boolean featuresReplayed = false;
>>>>>>>>> +
>>>>>>>>>    public StructuredTranslation(final Sentence sourceSentence,
>>>>>>>>> -      final HyperGraph hypergraph,
>>>>>>>>> -      final List<FeatureFunction> featureFunctions) {
>>>>>>>>> -
>>>>>>>>> -    final long startTime = System.currentTimeMillis();
>>>>>>>>> -
>>>>>>>>> -    this.sourceSentence = sourceSentence;
>>>>>>>>> -    this.featureFunctions = featureFunctions;
>>>>>>>>> -    this.translationString = extractViterbiString(hypergraph);
>>>>>>>>> -    this.translationTokens = extractTranslationTokens();
>>>>>>>>> -    this.translationScore = extractTranslationScore(hypergraph);
>>>>>>>>> -    this.translationFeatures = extractViterbiFeatures(hypergraph);
>>>>>>>>> -    this.translationWordAlignments = extractViterbiWordAlignment(hypergraph);
>>>>>>>>> -    this.extractionTime = (System.currentTimeMillis() - startTime) / 1000.0f;
>>>>>>>>> -  }
>>>>>>>>> -
>>>>>>>>> -  private Map<String,Float> extractViterbiFeatures(final HyperGraph hypergraph) {
>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>> -      return emptyMap();
>>>>>>>>> -    } else {
>>>>>>>>> -      ViterbiFeatureVectorWalkerFunction viterbiFeatureVectorWalker = new ViterbiFeatureVectorWalkerFunction(featureFunctions, sourceSentence);
>>>>>>>>> -      walk(hypergraph.goalNode, viterbiFeatureVectorWalker);
>>>>>>>>> -      return new HashMap<String,Float>(viterbiFeatureVectorWalker.getFeaturesMap());
>>>>>>>>> -    }
>>>>>>>>> -  }
>>>>>>>>> +      final DerivationState derivationRoot,
>>>>>>>>> +      JoshuaConfiguration config) {
>>>>>>>>>
>>>>>>>>> -  private List<List<Integer>> extractViterbiWordAlignment(final HyperGraph hypergraph) {
>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>> -      return emptyList();
>>>>>>>>> -    } else {
>>>>>>>>> -      final WordAlignmentExtractor wordAlignmentWalker = new WordAlignmentExtractor();
>>>>>>>>> -      walk(hypergraph.goalNode, wordAlignmentWalker);
>>>>>>>>> -      return wordAlignmentWalker.getFinalWordAlignments();
>>>>>>>>> -    }
>>>>>>>>> -  }
>>>>>>>>> -
>>>>>>>>> -  private float extractTranslationScore(final HyperGraph hypergraph) {
>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>> -      return 0;
>>>>>>>>> -    } else {
>>>>>>>>> -      return hypergraph.goalNode.getScore();
>>>>>>>>> -    }
>>>>>>>>> -  }
>>>>>>>>> -
>>>>>>>>> -  private String extractViterbiString(final HyperGraph hypergraph) {
>>>>>>>>> -    if (hypergraph == null) {
>>>>>>>>> -      return sourceSentence.source();
>>>>>>>>> -    } else {
>>>>>>>>> -      final WalkerFunction viterbiOutputStringWalker = new ViterbiOutputStringWalkerFunction();
>>>>>>>>> -      walk(hypergraph.goalNode, viterbiOutputStringWalker);
>>>>>>>>> -      return viterbiOutputStringWalker.toString();
>>>>>>>>> -    }
>>>>>>>>> +    this(sourceSentence, derivationRoot, config, true);
>>>>>>>>>    }
>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> -  private List<String> extractTranslationTokens() {
>>>>>>>>> -    if (translationString.isEmpty()) {
>>>>>>>>> -      return emptyList();
>>>>>>>>> -    } else {
>>>>>>>>> -      return asList(translationString.split("\\s+"));
>>>>>>>>> +  public StructuredTranslation(final Sentence sourceSentence,
>>>>>>>>> +      final DerivationState derivationRoot,
>>>>>>>>> +      JoshuaConfiguration config,
>>>>>>>>> +      boolean now) {
>>>>>>>>> +
>>>>>>>>> +    final long startTime = System.currentTimeMillis();
>>>>>>>>> +
>>>>>>>>> +    this.sourceSentence = sourceSentence;
>>>>>>>>> +    this.derivationRoot = derivationRoot;
>>>>>>>>> +    this.joshuaConfiguration = config;
>>>>>>>>> +
>>>>>>>>> +    if (now) {
>>>>>>>>> +      getTranslationString();
>>>>>>>>> +      getTranslationTokens();
>>>>>>>>> +      getTranslationScore();
>>>>>>>>> +      getTranslationFeatures();
>>>>>>>>> +      getTranslationWordAlignments();
>>>>>>>>>      }
>>>>>>>>> +    this.translationScore = getTranslationScore();
>>>>>>>>> +
>>>>>>>>> +    this.extractionTime = (System.currentTimeMillis() - startTime) / 1000.0f;
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>> +
>>>>>>>>>    // Getters to use upstream
>>>>>>>>>
>>>>>>>>>    public Sentence getSourceSentence() {
>>>>>>>>> @@ -112,25 +89,60 @@ public class StructuredTranslation {
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>>    public String getTranslationString() {
>>>>>>>>> -    return translationString;
>>>>>>>>> +    if (this.translationString == null) {
>>>>>>>>> +      if (derivationRoot == null) {
>>>>>>>>> +        this.translationString = sourceSentence.source();
>>>>>>>>> +      } else {
>>>>>>>>> +        this.translationString = derivationRoot.getHypothesis();
>>>>>>>>> +      }
>>>>>>>>> +    }
>>>>>>>>> +    return this.translationString;
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>>    public List<String> getTranslationTokens() {
>>>>>>>>> +    if (this.translationTokens == null) {
>>>>>>>>> +      String trans = getTranslationString();
>>>>>>>>> +      if (trans.isEmpty()) {
>>>>>>>>> +        this.translationTokens = emptyList();
>>>>>>>>> +      } else {
>>>>>>>>> +        this.translationTokens = asList(trans.split("\\s+"));
>>>>>>>>> +      }
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>      return translationTokens;
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>>    public float getTranslationScore() {
>>>>>>>>> +    if (derivationRoot == null) {
>>>>>>>>> +      this.translationScore = 0.0f;
>>>>>>>>> +    } else {
>>>>>>>>> +      this.translationScore = derivationRoot.getModelCost();
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>      return translationScore;
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>>    /**
>>>>>>>>>     * Returns a list of target to source alignments.
>>>>>>>>>     */
>>>>>>>>> -  public List<List<Integer>> getTranslationWordAlignments() {
>>>>>>>>> -    return translationWordAlignments;
>>>>>>>>> +  public String getTranslationWordAlignments() {
>>>>>>>>> +    if (this.translationWordAlignments == null) {
>>>>>>>>> +      if (derivationRoot == null)
>>>>>>>>> +        this.translationWordAlignments = "";
>>>>>>>>> +      else {
>>>>>>>>> +        WordAlignmentExtractor wordAlignmentExtractor = new WordAlignmentExtractor();
>>>>>>>>> +        derivationRoot.visit(wordAlignmentExtractor);
>>>>>>>>> +        this.translationWordAlignments = wordAlignmentExtractor.toString();
>>>>>>>>> +      }
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    return this.translationWordAlignments;
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>> -  public Map<String,Float> getTranslationFeatures() {
>>>>>>>>> +  public FeatureVector getTranslationFeatures() {
>>>>>>>>> +    if (this.translationFeatures == null)
>>>>>>>>> +      this.translationFeatures = derivationRoot.replayFeatures();
>>>>>>>>> +
>>>>>>>>>      return translationFeatures;
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>> http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/82431956/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>> diff --git a/src/joshua/decoder/hypergraph/KBestExtractor.java b/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>> index 42539cc..ea6ca73 100644
>>>>>>>>> --- a/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>> +++ b/src/joshua/decoder/hypergraph/KBestExtractor.java
>>>>>>>>> @@ -34,6 +34,7 @@ import java.util.regex.Matcher;
>>>>>>>>>  import joshua.corpus.Vocabulary;
>>>>>>>>>  import joshua.decoder.BLEU;
>>>>>>>>>  import joshua.decoder.JoshuaConfiguration;
>>>>>>>>> +import joshua.decoder.StructuredTranslation;
>>>>>>>>>  import joshua.decoder.chart_parser.ComputeNodeResult;
>>>>>>>>>  import joshua.decoder.ff.FeatureFunction;
>>>>>>>>>  import joshua.decoder.ff.FeatureVector;
>>>>>>>>> @@ -167,33 +168,25 @@ public class KBestExtractor {
>>>>>>>>>      // Determine the k-best hypotheses at each HGNode
>>>>>>>>>      VirtualNode virtualNode = getVirtualNode(node);
>>>>>>>>>      DerivationState derivationState = virtualNode.lazyKBestExtractOnNode(this, k);
>>>>>>>>> +
>>>>>>>>>      // DerivationState derivationState = getKthDerivation(node, k);
>>>>>>>>>      if (derivationState != null) {
>>>>>>>>> -      // ==== read the kbest from each hgnode and convert to output format
>>>>>>>>> -      FeatureVector features = new FeatureVector();
>>>>>>>>>
>>>>>>>>> -      /*
>>>>>>>>> -       * To save space, the decoder only stores the model cost, no the individual feature values. If
>>>>>>>>> -       * you want to output them, you have to replay them.
>>>>>>>>> -       */
>>>>>>>>> -      String hypothesis = null;
>>>>>>>>> -      if (joshuaConfiguration.outputFormat.contains("%f")
>>>>>>>>> -          || joshuaConfiguration.outputFormat.contains("%d"))
>>>>>>>>> -        features = derivationState.replayFeatures();
>>>>>>>>> -
>>>>>>>>> -      hypothesis = derivationState.getHypothesis()
>>>>>>>>> +      StructuredTranslation translation = new StructuredTranslation(
>>>>>>>>> +          sentence, derivationState, joshuaConfiguration);
>>>>>>>>> +
>>>>>>>>> +      String hypothesis = translation.getTranslationString()
>>>>>>>>>            .replaceAll("-lsb-", "[")
>>>>>>>>>            .replaceAll("-rsb-", "]")
>>>>>>>>>            .replaceAll("-pipe-", "|");
>>>>>>>>>
>>>>>>>>> -
>>>>>>>>>        outputString = joshuaConfiguration.outputFormat
>>>>>>>>>            .replace("%k", Integer.toString(k))
>>>>>>>>>            .replace("%s", hypothesis)
>>>>>>>>>            .replace("%S", DeNormalize.processSingleLine(hypothesis))
>>>>>>>>>            .replace("%i", Integer.toString(sentence.id()))
>>>>>>>>> -          .replace("%f", joshuaConfiguration.moses ? features.mosesString() : features.toString())
>>>>>>>>> -          .replace("%c", String.format("%.3f", derivationState.cost));
>>>>>>>>> +          .replace("%f", joshuaConfiguration.moses ? translation.getTranslationFeatures().mosesString() : translation.getTranslationFeatures().toString())
>>>>>>>>> +          .replace("%c", String.format("%.3f", translation.getTranslationScore()));
>>>>>>>>>
>>>>>>>>>        if (joshuaConfiguration.outputFormat.contains("%t")) {
>>>>>>>>>          outputString = outputString.replace("%t", derivationState.getTree());
>>>>>>>>> @@ -250,11 +243,11 @@ public class KBestExtractor {
>>>>>>>>>        return;
>>>>>>>>>
>>>>>>>>>      for (int k = 1; k <= topN; k++) {
>>>>>>>>> -      String hypStr = getKthHyp(hg.goalNode, k);
>>>>>>>>> -      if (null == hypStr)
>>>>>>>>> +      String translation = getKthHyp(hg.goalNode, k);
>>>>>>>>> +      if (null == translation)
>>>>>>>>>          break;
>>>>>>>>>
>>>>>>>>> -      out.write(hypStr);
>>>>>>>>> +      out.write(translation);
>>>>>>>>>        out.write("\n");
>>>>>>>>>        out.flush();
>>>>>>>>>      }
>>>>>>>>> @@ -704,11 +697,11 @@ public class KBestExtractor {
>>>>>>>>>      /**
>>>>>>>>>       * Visits every state in the derivation in a depth-first order.
>>>>>>>>>       */
>>>>>>>>> -    private DerivationVisitor visit(DerivationVisitor visitor) {
>>>>>>>>> +    public DerivationVisitor visit(DerivationVisitor visitor) {
>>>>>>>>>        return visit(visitor, 0);
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> -    private DerivationVisitor visit(DerivationVisitor visitor, int indent) {
>>>>>>>>> +    public DerivationVisitor visit(DerivationVisitor visitor, int indent) {
>>>>>>>>>
>>>>>>>>>        visitor.before(this, indent);
>>>>>>>>>
>>>>>>>>> @@ -733,25 +726,25 @@ public class KBestExtractor {
>>>>>>>>>        return visitor;
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> -    private String getHypothesis() {
>>>>>>>>> +    public String getHypothesis() {
>>>>>>>>>        return getHypothesis(defaultSide);
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> -    private String getTree() {
>>>>>>>>> +    public String getTree() {
>>>>>>>>>        return visit(new TreeExtractor()).toString();
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> -    private String getHypothesis(Side side) {
>>>>>>>>> +    public String getHypothesis(Side side) {
>>>>>>>>>        return visit(new HypothesisExtractor(side)).toString();
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> -    private FeatureVector replayFeatures() {
>>>>>>>>> +    public FeatureVector replayFeatures() {
>>>>>>>>>        FeatureReplayer fp = new FeatureReplayer();
>>>>>>>>>        visit(fp);
>>>>>>>>>        return fp.getFeatures();
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> -    private String getDerivation() {
>>>>>>>>> +    public String getDerivation() {
>>>>>>>>>        return visit(new DerivationExtractor()).toString();
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> @@ -811,7 +804,7 @@ public class KBestExtractor {
>>>>>>>>>       */
>>>>>>>>>      void after(DerivationState state, int level);
>>>>>>>>>    }
>>>>>>>>> -
>>>>>>>>> +
>>>>>>>>>    /**
>>>>>>>>>     * Extracts the hypothesis from the leaves of the tree using the generic (depth-first) visitor.
>>>>>>>>>     * Since we're using the visitor, we can't just print out the words as we see them. We have to
>>>>>>>>> @@ -878,7 +871,7 @@ public class KBestExtractor {
>>>>>>>>>        return outputs.pop().replaceAll("<s> ", "").replace(" </s>", "");
>>>>>>>>>      }
>>>>>>>>>    }
>>>>>>>>> -
>>>>>>>>> +
>>>>>>>>>    /**
>>>>>>>>>     * Assembles a Penn treebank format tree for a given derivation.
>>>>>>>>>     */
>>>>>>>
>>>>>>>
>>>>> Amazon Development Center Germany GmbH
>>>>> Berlin - Dresden - Aachen
>>>>> main office: Krausenstr. 38, 10117 Berlin
>>>>> Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
>>>>> Ust-ID: DE289237879
>>>>> Eingetragen am Amtsgericht Charlottenburg HRB 149173 B