Agreed about the cuteness of const Factor *. Let's say you're reading space-delimited file input:

    std::string line("Foo Bar Baz Quux .");

One can make a StringPiece(line.data(), 3) that looks like, and for most purposes acts like, std::string("Foo") but requires zero memory allocation. It's not null-terminated. It's just a const char * and a length, without ownership of the underlying memory. This makes parsing and splitting text super fast. util/tokenize_piece.hh provides an iterator for splitting strings.

Taking it a step further, util::FilePiece does a rolling mmap of a text file and gives you StringPiece. Zero-copy file reading.

In Moses, the preference order for function parameters is: const Factor *, then StringPiece, then std::string or char *.

On 10/10/2015 06:22 PM, Hieu Hoang wrote:
> Yep. The const Factor* is the original unique vocab ID, and it's more
> useful IMO because you can get the string back without referring back
> to the vocab factory. But use what you like.
>
> StringPiece is apparently faster for some operations.
>
> On 10 Oct 2015 5:35 pm, "Lane Schwartz" <dowob...@gmail.com> wrote:
>
> Wouldn't factor->GetId() be the unique integer ID of the string?
>
> On Fri, Oct 9, 2015 at 5:54 PM, Hieu Hoang <hieuho...@gmail.com> wrote:
>
> const Factor* is the vocab id. It's guaranteed to be unique for
> each unique string. You can map directly to the string using
> factor->GetString().
>
> On 09/10/2015 22:55, Lane Schwartz wrote:
>> Thanks, Marcin.
>>
>> So when the various components of Moses pass words back and
>> forth, what do they send each other? std::string? StringPiece?
>>
>> On Fri, Oct 9, 2015 at 4:28 PM, Marcin Junczys-Dowmunt
>> <junc...@amu.edu.pl> wrote:
>>
>> For instance, in my phrase table that would be
>>
>> mosesdecoder/moses/TranslationModel/CompactPT/PhraseDecoder.h
>>
>> StringVector<unsigned char, unsigned, std::allocator> m_sourceSymbols;
>> StringVector<unsigned char, unsigned, std::allocator> m_targetSymbols;
>>
>> That's a memory-mapped vector of strings.
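Since the whole thread is about mapping strings to integers and back, here is a minimal C++17 sketch of the two ideas mentioned above: splitting a line into non-owning views without copying (the role StringPiece plays), and a vocabulary that keeps every word back to back in one buffer indexed by offsets (roughly the shape of a vector of strings, but held in RAM rather than memory-mapped). It assumes std::string_view as a stand-in for StringPiece; SplitOnSpace and Vocab are hypothetical names, not Moses or KenLM APIs, and nothing here claims to mirror how Factor, StringVector or util/tokenize_piece.hh are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: splits `line` on single spaces into non-owning views.
// No characters are copied; every token points into the caller's buffer, so
// `line` must outlive the returned views (the same contract as a StringPiece).
std::vector<std::string_view> SplitOnSpace(std::string_view line) {
  std::vector<std::string_view> tokens;
  std::size_t start = 0;
  while (start < line.size()) {
    std::size_t end = line.find(' ', start);
    if (end == std::string_view::npos) end = line.size();
    if (end > start) tokens.push_back(line.substr(start, end - start));  // skip empty runs
    start = end + 1;
  }
  return tokens;
}

// Hypothetical vocabulary (not a Moses API). All words live back to back in
// one buffer; word i spans [offsets_[i], offsets_[i + 1]). This is an
// in-memory illustration, not a memory-mapped structure.
class Vocab {
 public:
  Vocab() : offsets_{0} {}

  // String -> integer: returns the existing id, or assigns the next free one.
  unsigned Intern(std::string_view word) {
    std::string key(word);
    auto it = ids_.find(key);
    if (it != ids_.end()) return it->second;
    unsigned id = static_cast<unsigned>(Size());
    buffer_.append(word.data(), word.size());
    offsets_.push_back(buffer_.size());
    ids_.emplace(std::move(key), id);
    return id;
  }

  // Integer -> string: a view into the shared buffer. It stays valid only
  // until the next Intern(), which may reallocate the buffer.
  std::string_view Lookup(unsigned id) const {
    assert(id < Size());
    return std::string_view(buffer_).substr(offsets_[id], offsets_[id + 1] - offsets_[id]);
  }

  std::size_t Size() const { return offsets_.size() - 1; }

 private:
  std::string buffer_;                             // all words, concatenated
  std::vector<std::size_t> offsets_;               // offsets_[i] = start of word i
  std::unordered_map<std::string, unsigned> ids_;  // owns its keys, so buffer growth is safe
};
```

For the example line "Foo Bar Baz Quux .", interning the five tokens assigns ids 0 to 4, interning "Bar" again returns 1, and Lookup(3) yields a view equal to "Quux". The hash map keeps its own std::string keys so that growing the buffer never invalidates them, at the cost of storing each word twice.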
>>
>> On 09.10.2015 at 23:22, Lane Schwartz wrote:
>>> Seriously? That sounds inefficient.
>>>
>>> I've found code in KenLM that maps from strings to
>>> integers, but not the other way around.
>>>
>>> Marcin, do you know, for example, where there is any Moses code
>>> that does the mapping for any data structure?
>>>
>>> On Fri, Oct 9, 2015 at 4:14 PM, Marcin Junczys-Dowmunt
>>> <junc...@amu.edu.pl> wrote:
>>>
>>> Hi,
>>> This would only be a simple thing if there were a
>>> common framework for that, but there isn't. Each
>>> data structure implements its own vocabularies and
>>> look-up tables. There is no common set of integers.
>>> Best,
>>> Marcin
>>>
>>> On 09.10.2015 at 23:11, Lane Schwartz wrote:
>>>> Hey,
>>>>
>>>> I know this should be a simple thing to find, but
>>>> what code in Moses is responsible for mapping back
>>>> and forth between strings and integers?
>>>>
>>>> Thanks,
>>>> Lane
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>> --
>>> When a place gets crowded enough to require ID's, social
>>> collapse is not far away. It is time to go elsewhere. The best
>>> thing about space travel is that it made it possible to go
>>> elsewhere.
>>> -- R.A. Heinlein, "Time Enough For Love"
>
> --
> Hieu Hoang
> http://www.hoang.co.uk/hieu