I prefer the API as it gives me more flexibility and fits the overall architecture of our components. But here is part of my set-up:
Cutoff 6
Iterations 200
CustomFeatureGenerator looking at the 4 previous and 2 subsequent tokens.

So, I gave it a whole night and I saw the process was dead in the morning. But I'll give it another try and will let you know.

Thank you!

Svetoslav

On 2013-04-26 12:42, "Jörn Kottmann" <[email protected]> wrote:

>I always edit the opennlp script and change it to what I need.
>
>Anyway, we have a Two Pass Data Indexer which writes the features to disk
>to save memory during indexing, depending on how you train you might
>have a cutoff=5 which eliminates probably a lot of your features and
>therefore saves a lot of memory.
>
>The indexing might just need a bit of time, how long did you wait?
>
>Jörn
>
>On 04/26/2013 12:33 PM, William Colen wrote:
>> From command line you can specify memory using
>>
>> MAVEN_OPTS="-Xmx4048m"
>>
>> You can also set it as JVM arguments if you are using from the API:
>>
>> java -Xmx4048m ...
>>
>>
>> On Fri, Apr 26, 2013 at 4:30 AM, Svetoslav Marinov <
>> [email protected]> wrote:
>>
>>> I use the API. Can one specify the memory size via the command line? I
>>> think the default there is 1024M? At 8G memory during "computing event
>>> counts...", at 16G during indexing: "Computing event counts... done.
>>> 50153300 events
>>> Indexing..."
>>>
>>> Svetoslav
>>>
>>> On 2013-04-26 09:12, "Jörn Kottmann" <[email protected]> wrote:
>>>
>>>> On 04/26/2013 09:06 AM, Svetoslav Marinov wrote:
>>>>> I'm wondering what is the max size (if such exists) for training a NER
>>>>> model? I have a corpus of 2 600 000 sentences annotated with just one
>>>>> category, 310M in size. However, the training never finishes 8G memory
>>>>> resulted in java out of memory exception, and when I increased it to 16G
>>>>> it just died with no error message.
>>>> Do you use the command line interface or the API for the training?
>>>> At which stage of the training did you get the out of memory exception?
>>>> Where did it just die when you used 16G of memory (maybe do a jstack)?
>>>>
>>>> Jörn
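
Since the thread turns on how much heap the training JVM really gets (1024M default, 8G, 16G, or the -Xmx4048m suggested above), a quick check from inside the process may help when training through the API. This is only a minimal sketch using the standard java.lang.Runtime call; the class name HeapCheck and the helper toMiB are made up for illustration and are not part of OpenNLP.

```java
// Hypothetical helper, not part of OpenNLP: reports the heap limit of the running JVM.
final class HeapCheck {

    // Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Maximum amount of heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        // Call this (or log maxHeapMiB()) right before training starts.
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

Started with e.g. java -Xmx4048m HeapCheck, the printed value should be roughly the requested size (the JVM may report slightly less). If it shows a much smaller number, the memory option did not reach the process that actually runs the training.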
