I had the same problem, and I am also interested in the modifications needed to avoid the segmentation fault.
Since I only tried it for a simple test, I didn't bother correcting the source. Instead I wrote this script, which you might find useful as well. It splits the input into chunks of 2500 lines and runs the parser once per chunk. It is used like this:

<file split-file-wrapper.py 2500 parse-en-collins >outfile

(It makes the processing much slower, though; modifying the source would be better.)

--
Raphael Payen

On Fri, 2010-10-15 at 14:40 +0200, marco turchi wrote:
> Hi
> I have the same problem with the Collins' parser. Do u know exactly
> what I need to change in the source code of the parser? or u have a
> modified version?
>
> Thanks a lot
> Marco
>
> On Thu, Jun 3, 2010 at 5:03 PM, Hwidong Na <le...@postech.ac.kr>
> wrote:
> Hi,
>
> This is not because of the wrapper script, but the Collins' parser.
> You can modify the source to iterate the read_sentences function in
> the file "main.c". In addition, you need to modify defined values in
> "grammar.h" to avoid segmentation faults of long sentences.
>
> --
> Hwidong Na <le...@postech.ac.kr>
> KLE lab, POSTECH, KOREA
>
> 2010-05-27 (Thu), 19:20 +0800, dongxinghua0213:
> > hello,
> > when parsing sentences using parse-en-collins.perl, I find only
> > 2500 parsed sentences are available, but the number of sentences
> > are more than one hundred thousand, what can I do to parse all
> > sentences ?
> >
> > thank you !
#!/usr/bin/env python3
# Copyright Alpha CRC Ltd. distributed as GPL
"""Run <command> repeatedly, feeding it stdin in chunks of <num-lines> lines."""
import os
import subprocess
import sys


def usage():
    # Usage goes to stderr because stdout carries the command's output.
    sys.stderr.write(
        "Runs <command> repetitively, splitting input in chunks of <num-lines>\n"
        "Usage: " + os.path.basename(sys.argv[0]) + " <num-lines> <command>\n")
    sys.exit(1)


if len(sys.argv) < 3:
    usage()
try:
    chunksize = int(sys.argv[1])
except ValueError:
    chunksize = 0
if chunksize <= 0:
    sys.stderr.write("invalid num\n")
    sys.exit(1)
command = sys.argv[2:]


def init_processpipe(c):
    # The child writes straight to our stdout; its stderr is discarded.
    return subprocess.Popen(c, stdin=subprocess.PIPE, stdout=sys.stdout,
                            stderr=subprocess.DEVNULL)


def communicate_and_check(p):
    # Close the child's stdin, wait for it to exit, and abort on failure.
    p.communicate()
    if p.returncode != 0:
        sys.stderr.write(
            "Stopped in line {} of iteration {} (source line {}) with error: {}\n"
            .format(numlines, numiter, numiter * chunksize + numlines,
                    p.returncode))
        sys.exit(p.returncode)
    sys.stdout.flush()


process = init_processpipe(command)
numlines = 0
numiter = 0
# Read bytes so the input is passed through unchanged, whatever its encoding.
for line in sys.stdin.buffer:
    if numlines == chunksize:
        # Current chunk is full: finish this run and start a new process.
        communicate_and_check(process)
        process = init_processpipe(command)
        numlines = 0
        numiter += 1
    process.stdin.write(line)
    numlines += 1
communicate_and_check(process)
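For anyone who would rather call the chunking from their own Python code than from the shell, here is a minimal sketch of the same idea. The names `chunks` and `run_in_chunks` are made up for this illustration and are not part of Moses or of the script above. Note that it assumes the output of each run fits in memory, which the wrapper script does not require, since it streams to stdout.

```python
import subprocess
import sys
from itertools import islice


def chunks(lines, size):
    """Yield successive lists of at most `size` items from the iterable `lines`."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_in_chunks(command, lines, size):
    """Run `command` once per chunk of byte lines; return the joined stdout."""
    out = []
    for chunk in chunks(lines, size):
        # check=True raises CalledProcessError if a run exits with an error.
        result = subprocess.run(command, input=b"".join(chunk),
                                stdout=subprocess.PIPE, check=True)
        out.append(result.stdout)
    return b"".join(out)
```

With a chunk size of 2500, an input of 6000 lines would be processed in three runs of 2500, 2500 and 1000 lines.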
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support