On 2018-11-04 14:56, mansur wrote:
I also noticed another problem...
When I parse a very big file, I see that the cg-proc process's memory
consumption grows constantly, by about 0.1% every 2 seconds. I run out
of memory on my machine (8 GB total) during the night...
# top's output:
  PID USER PR NI    VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
32509 root 20  0 1201684  1.1g 6540 R 92.1 14.2 24:35.61 cg-proc
32508 root 20  0   79972 54880 3628 R 80.8  0.7 21:10.69 lt-proc
32490 root 20  0 1165972  1.1g 6080 R 29.1 13.7  8:02.71 cg-proc
# the command I'm using:
cat file.txt | sed -r 's/$/\n/' \
  | apertium -n -d ./apertium-tat tat-tagger \
  | cg-proc ./apertium-tat/dev/mansur.bin > file.txt
I also noticed that how fast the memory usage grows depends on which
sed expression I use in the example above:
1) 's/$/\n/'
2) 's/$/\n\n\n\n\n\n\n\n\n\n/'
3) 's/$/\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n. . . . . . . . . .\n/'
With the 3rd variant, memory consumption grows much more slowly than
with the 1st.
Does the cg-proc process have a memory leak, or am I somehow using it
the wrong way? What can I do here to process my file successfully?
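One workaround, if the growth really is per-process state, is to split the input and restart the pipeline for each chunk, so no single cg-proc run has to accumulate the whole file. A sketch — `pipeline` below is a placeholder standing in for the real apertium/cg-proc chain:

```shell
# Placeholder for the real 'sed | apertium ... | cg-proc ...' chain.
pipeline() { sed -r 's/$/\n/'; }

printf 'a\nb\nc\nd\n' > big.txt
split -l 2 big.txt chunk.        # -> chunk.aa (a,b), chunk.ab (c,d)
for f in chunk.*; do
  pipeline < "$f"                # a fresh process tree per chunk
done > big.out
```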
cat file.txt \
  | sed -r 's/$/@.@#.#\n/' \
  | apertium-destxt \
  | apertium -f none -n -d ./apertium-tat tat-tagger \
  | cg-proc ./apertium-tat/dev/mansur.bin \
  | apertium-retxt > file.out.txt

Try that (note the output goes to file.out.txt: redirecting back into
file.txt would truncate your input before cat gets to read it). If that
doesn't help, make sure you have your delimiters defined.
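For reference, delimiters are declared at the top of the CG source (the .rlx file) before it is compiled to .bin. Without them, CG-3 can treat a very long stretch of input as a single window, which would match the growth described above. A typical Apertium-style declaration — the tag names are an assumption, adjust them to your tagset:

```
DELIMITERS = "<.>" "<!>" "<?>" ;
```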
Note that if you are trying to run two CGs in series, and your modes
file looks like this:
<program name="lt-proc -w">
  <file name="tat.automorf.bin"/>
</program>
<program name="cg-proc -w -1">
  <file name="tat.rlx.bin"/>
</program>
You probably don't want the -1 option or the -w option on the first CG.
The '-1' means "pick the first analysis" and '-w' means "restore lemma
case". You probably only want the -w on the last cg-proc in the chain.
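So a chain of two CGs might look like this in the modes file — the second grammar's file name here is invented for illustration:

```xml
<program name="cg-proc">
  <file name="tat.rlx.bin"/>
</program>
<program name="cg-proc -w">
  <file name="tat-second.rlx.bin"/>
</program>
```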
FRan
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff