I always edit the opennlp script and change it to what I need.

Anyway, we have a Two Pass Data Indexer, which writes the features to disk
to save memory during indexing. Depending on how you train, you might
have cutoff=5, which probably eliminates a lot of your features and
therefore saves a lot of memory.
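
The effect of a count cutoff can be sketched in plain Java. This is a minimal sketch under stated assumptions, not OpenNLP's TwoPassDataIndexer code: features are plain strings, a feature is kept only if it occurs at least `cutoff` times, and the class and method names (`CutoffSketch`, `countFeatures`, `indexFeatures`, `encode`) are hypothetical. Pass 1 only needs a count per distinct feature. Pass 2 re-reads the events and maps each one to the indices of its surviving features, so pruned features never take up space in the index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a two-pass count cutoff (not OpenNLP code).
public class CutoffSketch {

    // Pass 1: count how often each feature (context predicate) occurs.
    static Map<String, Integer> countFeatures(Iterable<String[]> events) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] context : events) {
            for (String feature : context) {
                counts.merge(feature, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keep features seen at least `cutoff` times and give each a dense index.
    static Map<String, Integer> indexFeatures(Map<String, Integer> counts, int cutoff) {
        Map<String, Integer> index = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                index.put(e.getKey(), index.size());
            }
        }
        return index;
    }

    // Pass 2: map one event to the indices of its retained features.
    // A disk-backed indexer would stream these results to a file.
    static int[] encode(String[] context, Map<String, Integer> index) {
        return Arrays.stream(context)
                .filter(index::containsKey)
                .mapToInt(index::get)
                .toArray();
    }

    public static void main(String[] args) {
        List<String[]> events = List.of(
                new String[] {"w=the", "cap=false"},
                new String[] {"w=Svetoslav", "cap=true"},
                new String[] {"w=the", "cap=false"});
        // With cutoff 2, only w=the and cap=false survive.
        Map<String, Integer> index = indexFeatures(countFeatures(events), 2);
        System.out.println(index.keySet());
        for (String[] context : events) {
            System.out.println(encode(context, index).length);
        }
    }
}
```

With millions of events, most rare features (for example one-off word forms) fall below the cutoff. This is why a cutoff of 5 can shrink memory use so much.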

The indexing might just need some time. How long did you wait?

Jörn

On 04/26/2013 12:33 PM, William Colen wrote:
From the command line you can specify the memory using

MAVEN_OPTS="-Xmx4048m"

You can also set it as a JVM argument if you are using the API:

java -Xmx4048m ...
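
To confirm which heap limit the JVM actually received, a small check like the following may help. `HeapCheck` and `maxHeapMegabytes` are hypothetical names. `Runtime.maxMemory()` is the standard Java call, and it may report slightly less than the `-Xmx` value depending on the garbage collector.

```java
// Hypothetical helper: print the maximum heap the running JVM may use,
// e.g. to check that an -Xmx setting actually took effect.
public class HeapCheck {

    static long maxHeapMegabytes() {
        // maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

For example, `java -Xmx4048m HeapCheck` should print a value close to 4048 MB. The same call can be logged at the start of a training program that uses the API.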



On Fri, Apr 26, 2013 at 4:30 AM, Svetoslav Marinov <[email protected]> wrote:

I use the API. Can one specify the memory size via the command line? I
think the default there is 1024M? With 8G of memory it failed during
"Computing event counts...", and with 16G it died during indexing:

  Computing event counts...  done. 50153300 events
          Indexing...

Svetoslav

On 2013-04-26 09:12, "Jörn Kottmann" <[email protected]> wrote:

On 04/26/2013 09:06 AM, Svetoslav Marinov wrote:
I'm wondering what the maximum size (if there is one) for training a NER
model is. I have a corpus of 2,600,000 sentences annotated with just one
category, 310M in size. However, the training never finishes: with 8G of
memory it failed with a Java out-of-memory exception, and when I increased
it to 16G it just died with no error message.
Do you use the command line interface or the API for the training?
At which stage of the training did you get the out of memory exception?
Where did it die when you used 16G of memory? (Maybe take a jstack.)
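
`jstack <pid>` (with the pid taken from `jps`) is the usual way to get a thread dump from outside the process. When training through the API, a similar dump can be taken from inside the JVM. This is a minimal sketch: `ThreadDump` and `dumpAllThreads` are hypothetical names, and `Thread.getAllStackTraces()` is standard Java. If the process is killed from outside, no code runs at that moment, but a dump printed periodically before then would still show where the training was.

```java
import java.util.Map;

// Hypothetical in-process alternative to `jstack <pid>`:
// collects the current stack trace of every live thread.
public class ThreadDump {

    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```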

Jörn
