Hi,

You don't need to know the total number of lines in advance. I attached my modified version of "main.c", which reduces the number of sentences held in memory per iteration to 250 and repeats the read-and-parse step until no sentences remain. I tested it using "examples/sec??.tagged", and both input files are parsed without any segmentation fault.
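For illustration, the batching idea can be sketched on plain text lines, independent of the parser. This is only a minimal sketch under stated assumptions: `read_batch` is a hypothetical stand-in for `read_sentences` (whose real signature lives in the parser source), `process_all` and `MAXLINE` are made up for the example, and only the batch size of 250 comes from the attached file.

```c
#include <assert.h>
#include <stdio.h>

#define BATCH 250      /* lines held in memory at once (same value as BUFSIZE in the attachment) */
#define MAXLINE 4096   /* assumed upper bound on line length; hypothetical */

static char batch[BATCH][MAXLINE];

/* Hypothetical stand-in for read_sentences(): fills buf with up to max
   lines from fp and returns how many were read (0 at end of file).
   Note: a line longer than MAXLINE-1 characters is split by fgets. */
static int read_batch(FILE *fp, char buf[][MAXLINE], int max)
{
    int n = 0;

    assert(fp != NULL && max > 0);
    while (n < max && fgets(buf[n], MAXLINE, fp) != NULL)
        n++;
    return n;
}

/* Reads the whole file batch by batch, so the total number of lines
   never has to be known in advance. Returns the number of lines seen. */
static long process_all(FILE *fp)
{
    long total = 0;
    int n, i;

    while ((n = read_batch(fp, batch, BATCH)) > 0) {
        for (i = 0; i < n; i++) {
            /* a real parser would handle batch[i] here */
            total++;
        }
    }
    return total;
}
```

Calling `process_all` on an open input file walks through it 250 lines at a time, reusing the same buffer for every batch, and stops as soon as a batch comes back empty.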
Best regards,

--
Hwidong Na <le...@postech.ac.kr>
KLE lab, POSTECH, KOREA

2010-10-15 (Fri), 16:02 +0200, marco turchi:
> Hi
> thanks, I'm trying to modify the main.c in a way that it reads the
> file twice, the first time to get the number of lines and the second
> to run the parser. It is not the best solution, but if it works it can
> solve the problem.
>
> I do not yet take into account the segmentation fault.
>
> thanks
> Marco
>
> On Fri, Oct 15, 2010 at 3:49 PM, Raphael Payen <rpa...@alphacrc.com>
> wrote:
> > I also had the same problem, and I am also interested in the
> > modifications to make to avoid the segmentation fault.
> >
> > Since when I tried it was for a simple test and I didn't bother
> > correcting, I made this script, which you might use also. It
> > splits the input into chunks of 2500 lines. It is used like this:
> > <file split-file-wrapper.py 2500 parse-en-collins >outfile
> > (But it makes the processing much slower, modifying the source
> > would be better).
> >
> > --
> > Raphael Payen
> >
> > On Fri, 2010-10-15 at 14:40 +0200, marco turchi wrote:
> > > Hi
> > > I have the same problem with the Collins' parser. Do you know
> > > exactly what I need to change in the source code of the parser?
> > > Or do you have a modified version?
> > >
> > > Thanks a lot
> > > Marco
> > >
> > > On Thu, Jun 3, 2010 at 5:03 PM, Hwidong Na <le...@postech.ac.kr>
> > > wrote:
> > > > Hi,
> > > >
> > > > This is not because of the wrapper script, but the Collins'
> > > > parser. You can modify the source to iterate the read_sentences
> > > > function in the file "main.c". In addition, you need to modify
> > > > defined values in "grammar.h" to avoid segmentation faults of
> > > > long sentences.
> > > >
> > > > --
> > > > Hwidong Na <le...@postech.ac.kr>
> > > > KLE lab, POSTECH, KOREA
> > > >
> > > > 2010-05-27 (Thu), 19:20 +0800, dongxinghua0213:
> > > > > hello,
> > > > > when parsing sentences using parse-en-collins.perl, I find
> > > > > only 2500 parsed sentences are available, but the number of
> > > > > sentences is more than one hundred thousand. What can I do
> > > > > to parse all sentences?
> > > > >
> > > > > thank you!

/* This code is the statistical natural language parser described in

   M. Collins. 1999.  Head-Driven Statistical Models for Natural
   Language Parsing. PhD Dissertation, University of Pennsylvania.

   Copyright (C) 1999 Michael Collins

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License
   as published by the Free Software Foundation; either version 2
   of the License, or (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/

#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
#include <assert.h>

#include "lexicon.h"
#include "grammar.h"
#include "mymalloc.h"
#include "mymalloc_char.h"
#include "hash.h"
#include "prob.h"
#include "readevents.h"
#include "sentence.h"
#include "chart.h"

#define BUFSIZE 250

sentence_type sentences[BUFSIZE];

int main(int argc, char *argv[])
{
  int s;
  int numsentences;
  FILE *words;
  char grammar[1000];
  char buffer[1000];
  float temp;
  int npflag;
  time_t g_time;
  time_t s_time;

  if(argc!=8)
    {
      fprintf(stderr,"ERROR in command line, usage:\n cat countsfile | parser.out sentences-file grammarfile beamsize punctuation-flag distaflag distvflag npflag\n");
      return 0;
    }

  sscanf(argv[1],"%s",buffer);
  words=fopen(buffer,"r");
  assert(words!=NULL);

  sscanf(argv[2],"%s",grammar);

  sscanf(argv[3],"%f",&temp);
  BEAMPROB = log(temp);

  sscanf(argv[4],"%d",&PUNC_FLAG);
  sscanf(argv[5],"%d",&DISTAFLAG);
  sscanf(argv[6],"%d",&DISTVFLAG);
  sscanf(argv[7],"%d",&npflag);
  assert(npflag==0 || npflag==1);
  set_treebankoutputflag(npflag);

  mymalloc_init();
  mymalloc_char_init();

  hash_make_table(8000007,&new_hash);
  effhash_make_table(1000003,&eff_hash);

  read_grammar(grammar);

//  numsentences=read_sentences(words,sentences,BUFSIZE);
//
//  fprintf(stderr,"NUMSENTENCES %d\n",numsentences);

  read_events(stdin,&new_hash,-1);

  // iterate until no more sentences remain.
  numsentences = 1;
  while (numsentences > 0){
    numsentences=read_sentences(words,sentences,BUFSIZE);
    fprintf(stderr,"NUMSENTENCES %d\n",numsentences);

    for(s=0;s<numsentences;s++)
      {
        time(&g_time);
        pthresh = -5000000;
        parse_sentence(&sentences[s]);
        /* print_chart();*/
        time(&s_time);
        printf("TIME %d\n",(int) (s_time-g_time));
      }
  }

  return 1;
}
_______________________________________________ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support