Re: [Moses-support] Estimating probabilities with KenLM

Kenneth Heafield Fri, 22 Nov 2013 09:52:14 -0800

Hi,

        What OS are you on?  Cygwin?  Apparently every OS reports memory size
in a different way:

http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/physmem.c;h=2629936146e3042f927523322f18aca76996cd7f;hb=HEAD

The good news is that the above code is LGPLv2:

http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=modules/physmem;h=9644522e0493a85a9fb4ae7c4449741c2c1500ea;hb=HEAD

But currently I'm just using this short function that will fail on some
platforms:

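#include <stdint.h>
#if !defined(_WIN32) && !defined(_WIN64)
#include <unistd.h>  // sysconf
#endif

// Returns physical memory in bytes, or 0 if it cannot be determined.
uint64_t GuessPhysicalMemory() {
#if defined(_WIN32) || defined(_WIN64)
  return 0;  // not implemented on Windows
#elif defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
  long pages = sysconf(_SC_PHYS_PAGES);
  if (pages == -1) return 0;
  long page_size = sysconf(_SC_PAGESIZE);
  if (page_size == -1) return 0;
  return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
#else
  return 0;  // platform does not define these sysconf names
#endif
}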

If it fails (returns 0), I just don't let users specify memory as a
percentage.  So one thing to fix is putting physmem.{h,c} in util, then
changing calls to GuessPhysicalMemory to use it.  But I'm also not a fan
of the way the GNU code gives up and makes up a number at the end.
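
To make the failure mode concrete, here is a minimal sketch of how a
percentage could be resolved against the detected memory size, and how a
Windows branch might be filled in. GuessPhysicalMemorySketch and
ParsePercentSketch are hypothetical names for illustration, not KenLM
functions. The Windows branch assumes the Win32 GlobalMemoryStatusEx
call would be used rather than the gnulib code, and the error text only
approximates the message quoted below.

```cpp
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#else
#include <unistd.h>
#endif

// Hypothetical variant of GuessPhysicalMemory with a Windows branch.
// Returns bytes of physical memory, or 0 when it cannot be determined.
uint64_t GuessPhysicalMemorySketch() {
#if defined(_WIN32) || defined(_WIN64)
  MEMORYSTATUSEX status;
  status.dwLength = sizeof(status);
  if (!GlobalMemoryStatusEx(&status)) return 0;
  return static_cast<uint64_t>(status.ullTotalPhys);
#elif defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
  long pages = sysconf(_SC_PHYS_PAGES);
  long page_size = sysconf(_SC_PAGESIZE);
  if (pages == -1 || page_size == -1) return 0;
  return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
#else
  return 0;
#endif
}

// Hypothetical helper: turn a string such as "80%" into a byte count.
// Throws when physical memory is unknown (0), so percentages are simply
// refused on platforms where detection fails.
uint64_t ParsePercentSketch(const std::string &text, uint64_t physical) {
  if (text.empty() || text[text.size() - 1] != '%')
    throw std::invalid_argument("expected a value ending in %");
  char *end = nullptr;
  const double percent = std::strtod(text.c_str(), &end);
  // Require at least one parsed character and nothing between it and '%'.
  if (end == text.c_str() || end != text.c_str() + text.size() - 1)
    throw std::invalid_argument("bad percentage: " + text);
  // Written this way so that NaN is rejected too.
  if (!(percent >= 0.0 && percent <= 100.0))
    throw std::invalid_argument("percentage out of range: " + text);
  if (physical == 0)
    throw std::runtime_error("physical memory size could not be determined");
  return static_cast<uint64_t>(percent / 100.0 * static_cast<double>(physical));
}
```

A caller would pass the detected size in, for example
ParsePercentSketch("80%", GuessPhysicalMemorySketch()), so that a
detection failure surfaces as an exception instead of a made-up number.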

The second porting issue is that lmplz makes parallel use of pread,
pwrite, and write.  Windows is unsafe in this regard (POSIX requires
that pread/pwrite not change the file pointer; Windows has no way to
implement that atomically).  To fix this, we'll always specify the file
offset in cases that happen concurrently.  Extend util/stream/io.* with
a PWrite class based on PWriteOrThrow then change FileBuffer to use
PWrite.  Then I guess one should rename PReadOrThrow/PWriteOrThrow to
something that indicates they're not-quite-POSIX on Windows.  Also, the
macros in these functions should detect Cygwin, bypassing Cygwin's
"Function not implemented" and calling Windows APIs directly (they're
already there for _WIN32).

I don't have a Windows box, so I can say what should be changed at a high
level, but I need an actual user to ensure it compiles and runs correctly.

Kenneth

On 11/22/13 06:49, Prasanth K wrote:
> Hi, 
> 
> I am trying to use KenLM for building a language model on the Europarl
> corpus. Following the instructions in
> (http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc19),
> I added the few lines for getting KenLM to estimate the LM probabilities
> (order/n=5) to my EMS config file. Language model training dies with
> "Function not implemented" at the counting and sorting n-grams stage
> (the very first stage). Does this mean there is something wrong with
> my installation? Or is it just insufficient memory?
> 
> Incidentally, when I started giving the amount of memory in terms of %
> (80%) there was an error "Failed to parse .. into memory size because
> physical memory size could not be determined". I am also curious why
> this happens? 
> 
> Kenneth, can you shed some light on this? Thanks. 
> 
> - Regards,
> Prasanth
> 
> 
> 
> -- 
> "Theories have four stages of acceptance. i) this is worthless nonsense;
> ii) this is an interesting, but perverse, point of view, iii) this is
> true, but quite unimportant; iv) I always said so."
> 
>   --- J.B.S. Haldane
> 
> 
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
