On 2018-11-04 14:56, mansur wrote:
I also noticed another problem...

When I parse a very big file, I see that the cg-proc process's memory
consumption grows constantly, by about 0.1% every 2 seconds. My machine
(8 GB total) runs out of memory during the night...

# top's output:

32509 root      20   0 1201684   1,1g   6540 R  92,1  14,2  24:35.61
32508 root      20   0   79972  54880   3628 R  80,8   0,7  21:10.69
32490 root      20   0 1165972   1,1g   6080 R  29,1  13,7   8:02.71
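A minimal sketch for putting a number on that growth instead of reading it off top: it samples one process's resident set size at a fixed interval. It assumes Linux, since it reads VmRSS (in kB) from /proc/PID/status, and the function name log_rss is made up for this example.

```shell
#!/bin/sh
# Sample the resident set size (VmRSS, in kB) of one process at a fixed
# interval and print "unix-time rss" per sample.
# Linux only: reads /proc/PID/status.
# Usage: log_rss PID SAMPLES INTERVAL_SECONDS
log_rss() {
  pid=$1
  samples=$2
  interval=$3
  i=0
  while [ "$i" -lt "$samples" ]; do
    rss=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status" 2>/dev/null)
    if [ -z "$rss" ]; then
      return 1  # the process has exited (or never existed)
    fi
    printf '%s %s\n' "$(date +%s)" "$rss"
    i=$((i + 1))
    # Skip the sleep after the last sample.
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, `log_rss "$(pgrep -n cg-proc)" 1800 2 > rss.log` would take a sample every 2 seconds for about an hour, which makes it easier to compare runs that use each of the sed expressions listed below.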

# command I'm using:

cat file.txt | sed -r 's/$/\n/' |\
apertium -n -d ./apertium-tat tat-tagger |\
cg-proc ./apertium-tat/dev/mansur.bin > file.txt

I also noticed that how fast memory usage grows depends on which sed
expression I use in the command above:

1) 's/$/\n/'

2) 's/$/\n\n\n\n\n\n\n\n\n\n/'

3) 's/$/\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n/'

With the 3rd expression, memory consumption grows much more slowly than
with the 1st.
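To see what each expression injects into the pipeline, here is a minimal sketch that runs one of them over two sample lines. It assumes GNU sed, as the original command does: -r and "\n" meaning a newline in the replacement are GNU extensions. Expressions 1 and 2 only add empty lines, while expression 3 adds lines of full stops. One plausible reading, not confirmed in this thread, is that those full stops reach cg-proc as sentence-ending tokens and so close CG windows regularly, which would fit the slower growth. The function name show_variant is made up for this example.

```shell
#!/bin/sh
# Run one sed expression from the list above over two sample lines, so the
# extra lines it adds to the stream are visible.
# Assumes GNU sed (-r, and "\n" as a newline in the replacement).
show_variant() {
  printf 'first line\nsecond line\n' | sed -r "$1"
}
```

For example, `show_variant 's/$/\n/'` prints "first line", an empty line, "second line" and another empty line; with expression 2, ten empty lines follow each input line.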

Does the cg-proc process have a memory leak, or am I somehow using it the
wrong way? What can I do to process my file?

cat file.txt |\
sed -r 's/$/@.@#.#\n/' |\
apertium-destxt | apertium -f none -n -d ./apertium-tat tat-tagger |\
cg-proc ./apertium-tat/dev/mansur.bin |\
apertium-retxt > file.txt

Try that. If it doesn't help, make sure you have delimiters defined in your grammar.
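A minimal sketch for checking the second suggestion: it reports whether a CG-3 grammar source declares a DELIMITERS set. It assumes the text source behind mansur.bin is at hand (the thread only names the compiled file, so any .rlx path is hypothetical), and it only recognises lines that start with DELIMITERS. The DELIMITERS line it prints is illustrative, not taken from the tat grammar.

```shell
#!/bin/sh
# Report whether a CG-3 grammar source declares a DELIMITERS set.
# Only lines starting with DELIMITERS are recognised; a declaration
# written some other way would be missed.
# Returns 0 if found, 1 if not, 2 if the file cannot be read.
check_delimiters() {
  grammar=$1
  if [ ! -r "$grammar" ]; then
    echo "cannot read $grammar" >&2
    return 2
  fi
  # grep -n prints any matching lines with their line numbers.
  if grep -n '^[[:space:]]*DELIMITERS' "$grammar"; then
    echo "DELIMITERS defined in $grammar"
  else
    echo "no DELIMITERS in $grammar"
    # Illustrative only: a declaration of roughly this shape.
    echo '  e.g. DELIMITERS = "<.>" "<!>" "<?>" ;'
    return 1
  fi
}
```

For example, `check_delimiters ./apertium-tat/dev/mansur.rlx` (hypothetical path). After adding a DELIMITERS line, the grammar would need recompiling, e.g. with `cg-comp mansur.rlx mansur.bin`.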

Note that if you are trying to run two CGs in series,

      <program name="lt-proc -w">
        <file name="tat.automorf.bin"/>
      </program>
      <program name="cg-proc -w -1">
        <file name="tat.rlx.bin"/>
      </program>

You probably don't want the -1 option or the -w option on the first CG. The -1 option means "pick the first analysis" and -w means "restore lemma case". You probably
only want -w on the last cg-proc in the chain.
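Apertium turns each mode in modes.xml into a shell pipeline (the generated .mode file), so the advice above can be pictured as the hypothetical sketch below. It chains the tat.rlx.bin CG from the mode with the mansur.bin CG from the original command, passes neither -w nor -1 to the first, and puts -w only on the last. Any other stages of the real tat-tagger mode are left out, and the function name tat_two_cg is made up for this example.

```shell
#!/bin/sh
# Hypothetical sketch: two CGs in series as one shell pipeline.
# Other stages of the real tat-tagger mode are omitted.
# Reads deformatted Apertium text on stdin.
tat_two_cg() {
  lt-proc -w tat.automorf.bin |
    cg-proc tat.rlx.bin |                     # first CG: no -w, no -1
    cg-proc -w ./apertium-tat/dev/mansur.bin  # last CG: the only one with -w
}
```

It could be used as, e.g., `apertium-destxt < file.txt | tat_two_cg | apertium-retxt`.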


Apertium-stuff mailing list
