Agreed about the cuteness of const Factor *.

Let's say you're reading space-delimited file input.

std::string line("Foo Bar Baz Quux .");

One can make a StringPiece(line.data(), 3) that looks and for most
purposes acts like std::string("Foo") but requires zero memory
allocation.  It's not null terminated.  It's just a const char * and a
length without owning the underlying memory.  This makes it super fast
to parse/split text.  util/tokenize_piece.hh provides an iterator
operation for string splitting.
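
To make the idea concrete, here is a minimal sketch that uses C++17
std::string_view as a stand-in for StringPiece (the same pointer-plus-length
idea, but not the KenLM/Moses class itself). Prefix and SplitOnSpaces are
hypothetical helper names for illustration and are not what
util/tokenize_piece.hh provides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Stand-in for StringPiece(line.data(), n): a pointer plus a length.
// Nothing is copied and nothing is owned.
std::string_view Prefix(const std::string &s, std::size_t n) {
  assert(n <= s.size());
  return std::string_view(s.data(), n);
}

// Split on single spaces, skipping empty tokens. Every token points into
// the memory behind `line`, so that memory must outlive the returned views.
std::vector<std::string_view> SplitOnSpaces(std::string_view line) {
  std::vector<std::string_view> tokens;
  std::size_t pos = 0;
  while (pos < line.size()) {
    std::size_t end = line.find(' ', pos);
    if (end == std::string_view::npos) end = line.size();
    if (end > pos) tokens.push_back(line.substr(pos, end - pos));
    pos = end + 1;
  }
  return tokens;
}
```

With the line above, SplitOnSpaces(line) yields five views ("Foo", "Bar",
"Baz", "Quux", ".") that all point into line's own buffer. The only
allocation is the vector that holds the views; an iterator-based splitter
avoids even that.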

Taking it a step further, util::FilePiece does a rolling mmap of a text
file and gives you StringPiece.  Zero-copy file reading.

In Moses, the preference order for function parameters is const Factor *,
StringPiece, std::string or char *.
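
Going by the description quoted below, const Factor * behaves like an
interned string: one object per distinct string, so comparing pointers is
the same as comparing strings. A toy sketch of that idea follows. ToyFactor
and ToyFactorPool are hypothetical names, and this is not the actual Moses
implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a Moses Factor: it only remembers its string.
class ToyFactor {
 public:
  explicit ToyFactor(const std::string &s) : str_(s) {}
  const std::string &GetString() const { return str_; }
 private:
  std::string str_;
};

// Hypothetical vocab: hands out exactly one ToyFactor per distinct string.
class ToyFactorPool {
 public:
  const ToyFactor *Intern(const std::string &s) {
    // try_emplace leaves an existing entry alone, so a repeated string gets
    // the same object back. unordered_map never relocates its elements, so
    // the returned pointer stays valid as the pool grows.
    return &pool_.try_emplace(s, s).first->second;
  }
 private:
  std::unordered_map<std::string, ToyFactor> pool_;
};
```

Interning "Foo" twice returns the same pointer, so equality checks are a
single pointer comparison, and GetString() recovers the text without a
second lookup.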

On 10/10/2015 06:22 PM, Hieu Hoang wrote:
> Yep. The const Factor* is the original unique vocab ID, and it's more
> useful IMO because you can get the string back without referring back to
> the vocab factory. But use what you like.
> 
> StringPiece is apparently faster for some operations.
> 
> On 10 Oct 2015 5:35 pm, "Lane Schwartz" <dowob...@gmail.com
> <mailto:dowob...@gmail.com>> wrote:
> 
>     Wouldn't factor->GetId() be the unique integer ID of the string?
> 
>     On Fri, Oct 9, 2015 at 5:54 PM, Hieu Hoang <hieuho...@gmail.com
>     <mailto:hieuho...@gmail.com>> wrote:
> 
>         const Factor* is the vocab id. It's guaranteed to be unique for
>         each unique string. You can map directly to the string using
>            factor->GetString()
> 
> 
> 
>         On 09/10/2015 22:55, Lane Schwartz wrote:
>>         Thanks, Marcin.
>>
>>         So when the various components of Moses pass words back and
>>         forth, what do they send each other? std::string? StringPiece? 
>>
>>         On Fri, Oct 9, 2015 at 4:28 PM, Marcin Junczys-Dowmunt
>>         <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote:
>>
>>             For instance in my phrase table that would be
>>
>>             mosesdecoder/moses/TranslationModel/CompactPT/PhraseDecoder.h
>>
>>               StringVector<unsigned char, unsigned, std::allocator> m_sourceSymbols;
>>               StringVector<unsigned char, unsigned, std::allocator> m_targetSymbols;
>>
>>             That's a memory-mapped vector of strings.
>>
>>             On 09.10.2015 at 23:22, Lane Schwartz wrote:
>>>             Seriously? That sounds inefficient.
>>>
>>>             I've found code in KenLM that maps from strings to
>>>             integers, but not the other way around.
>>>
>>>             Marcin, do you know, for example, where any Moses code is
>>>             for doing the mapping for any data structure?
>>>
>>>
>>>             On Fri, Oct 9, 2015 at 4:14 PM, Marcin Junczys-Dowmunt
>>>             <junc...@amu.edu.pl <mailto:junc...@amu.edu.pl>> wrote:
>>>
>>>                 Hi,
>>>                 This would only be a simple thing if there were a
>>>                 common framework for that, but there isn't. Each
>>>                 data structure implements its own vocabularies and
>>>                 look-up tables. There is no common set of integers.
>>>                 Best,
>>>                 Marcin
>>>
>>>                 On 09.10.2015 at 23:11, Lane Schwartz wrote:
>>>>                 Hey,
>>>>
>>>>                 I know this should be a simple thing to find, but
>>>>                 what code in Moses is responsible for mapping back
>>>>                 and forth between strings and integers?
>>>>
>>>>                 Thanks,
>>>>                 Lane
>>>>
>>>>
>>>>
>>>>                 _______________________________________________
>>>>                 Moses-support mailing list
>>>>                 Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>>>>                 http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>             -- 
>>>             When a place gets crowded enough to require ID's, social
>>>             collapse is not far away.  It is time to go elsewhere.  The
>>>             best thing about space travel is that it made it possible
>>>             to go elsewhere.
>>>                             -- R.A. Heinlein, "Time Enough For Love"
> 
>         -- 
>         Hieu Hoang
>         http://www.hoang.co.uk/hieu
