I had the same problem too, and I am also interested in the
modifications needed to avoid the segmentation fault.

Since I only hit this in a simple test, I didn't bother fixing the
parser itself. Instead I wrote this script, which you might also find
useful. It splits the input into chunks of a given number of lines
(2500 here) and runs the command once per chunk. It is used like this:
<file split-file-wrapper.py 2500 parse-en-collins >outfile
(But it makes the processing much slower, since the command is restarted
for every chunk; modifying the source would be better.)
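
For reference, here is a minimal sketch of the same chunking idea as an
importable function, assuming Python 3.7 or later. The names chunks and
run_in_chunks are only illustrative and are not part of the attached
script; unlike the script, this version collects each run's output in
memory instead of streaming it.

```python
import subprocess
import sys
from itertools import islice


def chunks(lines, size):
    """Yield successive lists of at most `size` items from `lines`."""
    it = iter(lines)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def run_in_chunks(lines, size, command):
    """Run `command` once per chunk, feeding the chunk on its stdin.

    Returns the concatenated stdout of all runs. Raises
    subprocess.CalledProcessError as soon as one run exits non-zero.
    """
    outputs = []
    for block in chunks(lines, size):
        result = subprocess.run(command, input="".join(block),
                                capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return "".join(outputs)


if __name__ == "__main__":
    # Hypothetical invocation: chunk size first, then the command to run.
    sys.stdout.write(run_in_chunks(sys.stdin, int(sys.argv[1]), sys.argv[2:]))
```

Because each chunk starts a fresh process, a crash only loses the chunk
being processed, at the price of paying the start-up cost of the command
every time.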

-- 
Raphael Payen


On Fri, 2010-10-15 at 14:40 +0200, marco turchi wrote:
> Hi
> I have the same problem with the Collins parser. Do you know exactly
> what I need to change in the source code of the parser? Or do you have
> a modified version?
> 
> Thanks a lot
> Marco
> 
> On Thu, Jun 3, 2010 at 5:03 PM, Hwidong Na <le...@postech.ac.kr>
> wrote:
>         Hi,
>         
>         This is not because of the wrapper script, but the Collins'
>         parser. You can modify the source to iterate the
>         read_sentences function in the file "main.c". In addition, you
>         need to modify defined values in "grammar.h" to avoid
>         segmentation faults of long sentences.
>         
>         --
>         Hwidong Na <le...@postech.ac.kr>
>         KLE lab, POSTECH, KOREA
>         
>         
>         2010-05-27 (Thu), 19:20 +0800, dongxinghua0213:
>         
>         > hello,
>         > when parsing sentences using parse-en-collins.perl, I find
>         > that only 2500 parsed sentences are available, but the number
>         > of sentences is more than one hundred thousand. What can I do
>         > to parse all sentences?
>         >
>         >  thank you !
>         >
>         >
>         >
>         >
>         
>         > _______________________________________________
>         > Moses-support mailing list
>         > Moses-support@mit.edu
>         > http://mailman.mit.edu/mailman/listinfo/moses-support
>         
>         
>         
>         
>         
> 

#!/usr/bin/env python3
# Copyright Alpha CRC Ltd. distributed as GPL

import os
import subprocess
import sys


def usage():
    # Write to stderr so the message never ends up in the redirected output.
    sys.stderr.write("Runs <command> repeatedly, splitting input in chunks of <num-lines>\nUsage: "
                     + os.path.basename(sys.argv[0]) + " <num-lines> <command>\n")
    sys.exit(1)


if len(sys.argv) < 3:
    usage()
chunksize = int(sys.argv[1])
command = sys.argv[2:]
if chunksize <= 0:
    sys.stderr.write("invalid num\n")
    sys.exit(1)


def init_processpipe(c):
    # Start the command: its stdout goes straight to ours, its stderr is discarded.
    return subprocess.Popen(c, stdin=subprocess.PIPE, stdout=sys.stdout,
                            stderr=subprocess.DEVNULL)


def communicate_and_check(p):
    # Close the child's stdin, wait for it to exit, and stop on failure.
    p.communicate()
    if p.returncode != 0:
        sys.stderr.write("Stopped in line " + str(numlines) + " of iteration " + str(numiter)
                         + " (source line " + str(numiter * chunksize + numlines)
                         + ") with error: " + str(p.returncode) + "\n")
        sys.exit(p.returncode)
    sys.stdout.flush()


process = init_processpipe(command)
numlines = 0
numiter = 0
# Read raw bytes so the lines are passed through without re-encoding.
for line in sys.stdin.buffer:
    if numlines == chunksize:
        # Current chunk is full: finish this run and start a fresh process.
        communicate_and_check(process)
        process = init_processpipe(command)
        numlines = 0
        numiter += 1
    process.stdin.write(line)
    numlines += 1

communicate_and_check(process)