Re: [Moses-support] Moses vocabulary code

2015-10-10 Thread Kenneth Heafield
Agreed about the cuteness of const Factor *.

Let's say you're reading space-delimited file input.

std::string line("Foo Bar Baz Quux .");

One can make a StringPiece(line.data(), 3) that looks and for most
purposes acts like std::string("Foo") but requires zero memory
allocation.  It's not null terminated.  It's just a const char * and a
length without owning the underlying memory.  This makes it super fast
to parse/split text.  util/tokenize_piece.hh provides an iterator
operation for string splitting.
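
To make the zero-allocation point concrete, here is a minimal sketch. It
assumes C++17 and uses std::string_view as a stand-in for StringPiece (the
same idea: a pointer plus a length, not owning the memory). SplitOnSpaces is
a hypothetical helper written for this illustration; it is not the iterator
from util/tokenize_piece.hh.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split on single spaces without copying any characters.
// Every returned view points into the caller's buffer, so that buffer must
// outlive the views.
std::vector<std::string_view> SplitOnSpaces(std::string_view text) {
  std::vector<std::string_view> tokens;
  std::size_t begin = 0;
  while (begin < text.size()) {
    std::size_t end = text.find(' ', begin);
    if (end == std::string_view::npos) end = text.size();
    // Skip empty tokens produced by consecutive spaces.
    if (end > begin) tokens.push_back(text.substr(begin, end - begin));
    begin = end + 1;
  }
  return tokens;
}
```

The only allocation here is the vector that holds the views; the token
characters themselves are never copied. An iterator-style splitter avoids
even that vector.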

Taking it a step further, util::FilePiece does a rolling mmap of a text
file and gives you StringPiece.  Zero-copy file reading.

In Moses, the preference order for function parameters is: const Factor *,
then StringPiece, then std::string or char *.

On 10/10/2015 06:22 PM, Hieu Hoang wrote:
> Yep. The const Factor* is the original unique vocab ID, and it's more
> useful IMO cos u can get the string back without referring back to the
> vocab factory. But use what u like.
> 
> StringPiece is apparently faster for some operations.

Re: [Moses-support] Moses vocabulary code

2015-10-10 Thread Hieu Hoang
Yep. The const Factor* is the original unique vocab ID, and it's more useful
IMO cos u can get the string back without referring back to the vocab
factory. But use what u like.

StringPiece is apparently faster for some operations.

On 10 Oct 2015 5:35 pm, "Lane Schwartz" wrote:

> Wouldn't factor->GetId() be the unique integer ID of the string?
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Moses vocabulary code

2015-10-10 Thread Lane Schwartz
BTW, what's the rationale for using StringPiece instead of std::string? I
thought the main reason for using StringPiece was the implicit conversion
from char *.

On Fri, Oct 9, 2015 at 5:15 PM, Kenneth Heafield wrote:

> The Moses common vocabulary is moses/FactorCollection.h.  Common
> practice in core Moses code is to pass around a const Factor * (which
> can be resolved to a StringPiece or a consecutive ID).
>
> If a feature/phrase table has its own ids because e.g. it's baked into
> the binary file, then there's a std::vector to map from Moses ID to
> feature function ID.  See moses/LM/Ken.h:99 for an example.
>
> std::string (or even StringPiece) conversion at decode time is a bug.  A
> sadly common one.



-- 
When a place gets crowded enough to require ID's, social collapse is not
far away.  It is time to go elsewhere.  The best thing about space travel
is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"


Re: [Moses-support] Moses vocabulary code

2015-10-10 Thread Lane Schwartz
Wouldn't factor->GetId() be the unique integer ID of the string?

On Fri, Oct 9, 2015 at 5:54 PM, Hieu Hoang  wrote:

> const Factor* is the vocab id. It's guaranteed to be unique for each
> unique string. You can map directly to the string using
>
>    factor->GetString()


Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Hieu Hoang
const Factor* is the vocab id. It's guaranteed to be unique for each 
unique string. You can map directly to the string using

   factor->GetString()
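
The "pointer as vocab id" idea can be sketched with a small interning table.
This is only an illustration under assumptions: Entry and Vocab are
hypothetical names, and the code is not the real Factor or FactorCollection
implementation. It shows why one pointer per distinct string is enough to
serve as an ID and still lets you get the string back directly.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a vocabulary entry (not the real Moses Factor).
struct Entry {
  std::string text;  // the surface string
  std::size_t id;    // consecutive ID, assigned in insertion order
};

// Hypothetical interning table: each distinct string gets exactly one Entry,
// so comparing two `const Entry *` values is equivalent to comparing strings.
class Vocab {
 public:
  const Entry *Intern(const std::string &word) {
    auto it = entries_.find(word);
    if (it == entries_.end()) {
      // size() is read before the insertion, so IDs run 0, 1, 2, ...
      it = entries_.emplace(word, Entry{word, entries_.size()}).first;
    }
    // std::unordered_map never moves its elements, so this address stays
    // valid for the lifetime of the Vocab, even after later insertions.
    return &it->second;
  }

 private:
  std::unordered_map<std::string, Entry> entries_;
};
```

With a table like this, word equality is a pointer comparison and the string
is one dereference away. The sketch is not thread-safe; a shared vocabulary
in a multi-threaded decoder would need locking around Intern.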


On 09/10/2015 22:55, Lane Schwartz wrote:

> Thanks, Marcin.
>
> So when the various components of Moses pass words back and forth,
> what do they send each other? std::string? StringPiece?




--
Hieu Hoang
http://www.hoang.co.uk/hieu



Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Marcin Junczys-Dowmunt

oh, I didn't know that. Is any feature function actually using that?

On 10.10.2015 at 00:54, Hieu Hoang wrote:

> const Factor* is the vocab id. It's guaranteed to be unique for each
> unique string. You can map directly to the string using
>
>    factor->GetString()




Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Hieu Hoang
err, I thought every ff uses (const Factor*), but perhaps not, now that I
look at the code. Probing pt uses it:

   ProbingPt.h line 43-47

Probing pt doesn't support multiple factors.

Most other pts support multiple factors, but do it inefficiently by
tokenizing the string into multiple factors.

The Factor class also has a unique id, which KenLM and IRSTLM use to map to
their internal ids. I would prefer everyone to use (const Factor*) as the
unique id, but it's no big deal.


On 09/10/2015 23:59, Marcin Junczys-Dowmunt wrote:

> oh, I didn't know that. Is any feature function actually using that?



Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Marcin Junczys-Dowmunt

For instance in my phrase table that would be

mosesdecoder/moses/TranslationModel/CompactPT/PhraseDecoder.h

  StringVector m_sourceSymbols;
  StringVector m_targetSymbols;

That's a memory-mapped vector of strings.

On 09.10.2015 at 23:22, Lane Schwartz wrote:

> Seriously? That sounds inefficient.
>
> I've found code in KenLM that maps from strings to integers, but not
> the other way around.
>
> Marcin, do you know, for example, where any Moses code is for doing
> the mapping for any data structure?





Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Marcin Junczys-Dowmunt

Hi,
This would only be a simple thing if there were a common framework for
that, but there isn't. Each data structure implements its own
vocabularies and look-up tables. There is no common set of integers.

Best,
Marcin

On 09.10.2015 at 23:11, Lane Schwartz wrote:

> Hey,
>
> I know this should be a simple thing to find, but what code in Moses
> is responsible for mapping back and forth between strings and integers?
>
> Thanks,
> Lane





Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Kenneth Heafield
The Moses common vocabulary is moses/FactorCollection.h.  Common
practice in core Moses code is to pass around a const Factor * (which
can be resolved to a StringPiece or a consecutive ID).

If a feature/phrase table has its own ids because e.g. it's baked into
the binary file, then there's a std::vector to map from Moses ID to
feature function ID.  See moses/LM/Ken.h:99 for an example.
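
The vector-based translation between ID spaces can be sketched as follows.
This is an illustration only, assuming the shared vocabulary hands out
small consecutive integers; IdMap and its members are hypothetical names,
and the code is not what moses/LM/Ken.h contains.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical translation table from a shared vocabulary ID to one model's
// internal word index. Words the model has never seen map to `unknown_id`.
class IdMap {
 public:
  explicit IdMap(std::uint32_t unknown_id) : unknown_id_(unknown_id) {}

  // Called at load time, once per word the model knows.
  void Set(std::size_t shared_id, std::uint32_t model_id) {
    if (shared_id >= table_.size()) table_.resize(shared_id + 1, unknown_id_);
    table_[shared_id] = model_id;
  }

  // Called at decode time: one bounds check and one array read, with no
  // hashing and no string handling.
  std::uint32_t Lookup(std::size_t shared_id) const {
    return shared_id < table_.size() ? table_[shared_id] : unknown_id_;
  }

 private:
  std::uint32_t unknown_id_;
  std::vector<std::uint32_t> table_;
};
```

All string work happens once while the table is filled; the per-word cost
during decoding is a single indexed read, which is the point of avoiding
string conversion at decode time.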

std::string (or even StringPiece) conversion at decode time is a bug.  A
sadly common one.

On 10/09/2015 10:22 PM, Lane Schwartz wrote:
> Seriously? That sounds inefficient.
> 
> I've found code in KenLM that maps from strings to integers, but not the
> other way around.
> 
> Marcin, do you know, for example, where any Moses code is for doing the
> mapping for any data structure?


Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Marcin Junczys-Dowmunt

Hopefully StringPiece if it's newer code.
In my own code I wasn't yet using StringPiece and I did not rewrite it 
after Moses switched mostly to StringPiece. Something to fix.


On 09.10.2015 at 23:55, Lane Schwartz wrote:

> Thanks, Marcin.
>
> So when the various components of Moses pass words back and forth,
> what do they send each other? std::string? StringPiece?




Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Lane Schwartz
Thanks, Marcin.

So when the various components of Moses pass words back and forth, what do
they send each other? std::string? StringPiece?

On Fri, Oct 9, 2015 at 4:28 PM, Marcin Junczys-Dowmunt wrote:

> For instance in my phrase table that would be
>
> mosesdecoder/moses/TranslationModel/CompactPT/PhraseDecoder.h
>
>   StringVector m_sourceSymbols;
>   StringVector m_targetSymbols;
>
> That's a memory-mapped vector of strings.


Re: [Moses-support] Moses vocabulary code

2015-10-09 Thread Lane Schwartz
Seriously? That sounds inefficient.

I've found code in KenLM that maps from strings to integers, but not the
other way around.

Marcin, do you know, for example, where any Moses code is for doing the
mapping for any data structure?


On Fri, Oct 9, 2015 at 4:14 PM, Marcin Junczys-Dowmunt wrote:

> Hi,
> This would only be a simple thing if there was a common framework for
> that, but there isn't. Each datastructure implements its own vocabularies
> and look-up tables. There is no common set of integers.
> Best,
> Marcin


[Moses-support] Moses vocabulary code

2015-10-09 Thread Lane Schwartz
Hey,

I know this should be a simple thing to find, but what code in Moses is
responsible for mapping back and forth between strings and integers?

Thanks,
Lane