Re: Possible Fix? (Re: UdmSearch: DeleteNoServer still broken in 3.1.9)
Hello!

Please find a patch which fixes the incorrect behaviour of the "DeleteNoServer no" command. The patch is against server.c.

Thanks for help in debugging!

Caffeinate The World wrote:
> --- Caffeinate The World [EMAIL PROTECTED] wrote:
> # ./indexer -ma -n 1 -u http://www.mnpage.com/%
> AddServer 'http://www.state.mn.us/' 17
> AddServer 'http://www.mnworkforcecenter.org/' 17
> AddServer 'http://www.exploreminnesota.com/' 17
> AddServer 'http://www.tpt.org/' 17
> AddServer 'http://www.gorp.com/gorp/location/mn/mn.htm' 17
> AddServer 'http://lists.rootsweb.com/index/usa/MN/' 17
> AddServer 'http://*.mn.us/*' 18
> AddServer '(null)' 17
> Indexer[12748]: indexer from mnogosearch-3.1.9/PgSQL started with '/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf'
> Indexer[12748]: [1] http://www.mnpage.com/magazines.html
> 0 'http://www.state.mn.us/' 17
> 1 'http://www.mnworkforcecenter.org/' 17
> 2 'http://www.exploreminnesota.com/' 17
> 3 'http://www.tpt.org/' 17
> 4 'http://www.gorp.com/gorp/location/mn/mn.htm' 17
> 5 'http://lists.rootsweb.com/index/usa/MN/' 17
> 6 'http://*.mn.us/*' 18
> 7 'http://lists.rootsweb.com/index/usa/MN/' 17
> Indexer[12748]: [1] No 'Server' command for url... deleted.
> Indexer[12748]: [1] Done (627 seconds)
> ---cut---
>
> looks like the 'null' server wasn't matched? also note that #5 and #7 are the same: 'http://lists.rootsweb.com/index/usa/MN/'
>
> --- Alexander Barkov [EMAIL PROTECTED] wrote:
> Well, indexer.conf is loaded as expected. Now find this in UdmFindServer():
>
>     for(i=0;i<Conf->nservers;i++){
>         int res;
>         regmatch_t subs[NS];
>
> and insert here:
>
>     printf("%d '%s' %d\n",i,Conf->Server[i].url,Conf->Server[i].match_type);

--- server.c.orig    Thu Feb  1 14:17:35 2001
+++ server.c    Thu Feb  1 14:20:46 2001
@@ -30,9 +30,9 @@
     /* to keep srv->url unchanged */
     strcpy(urlstr,UDM_NULL2EMPTY(srv->url));
 
-    if(UDM_SRV_TYPE(match_type)==UDM_SERVER_SUBSTR){
+    if((UDM_SRV_TYPE(match_type)==UDM_SERVER_SUBSTR)&&(urlstr[0])){
         /* Check whether valid URL is passed */
-        if((urlstr[0])&&(res=UdmParseURL(from,urlstr))){
+        if((res=UdmParseURL(from,urlstr))){
             switch(res){
                 case UDM_PARSEURL_LONG:
                     Conf->errcode=1;
Possible Fix? (Re: UdmSearch: DeleteNoServer still broken in 3.1.9)
alex or serge,

could you look over this patch? i believe this patch should fix this problem described below:

---cut---
# diff -ru indexer.c.orig indexer.c
--- indexer.c.orig    Tue Jan 30 10:45:03 2001
+++ indexer.c    Tue Jan 30 10:47:29 2001
@@ -368,7 +368,7 @@
     }
 
     /* Find correspondent Server record from indexer.conf */
-    if(!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))){
+    if((!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr)) && (!CurSrv->delete_no_server ))){
         UdmLog(Indexer,UDM_LOG_WARN,"No 'Server' command for url... deleted.");
         if(!strcmp(CurURL.filename,"robots.txt")){
             if(IND_OK==(result=UdmDeleteRobotsFromHost(Indexer,CurURL.hostinfo)))
---/cut---

--- Caffeinate The World [EMAIL PROTECTED] wrote:
> i reported this back in 3.1.9pre13. i have 'DeleteNoServer no' set with many URL's in my sql db not having associated Server commands. here i just tried to reindex and i see that my URL is being deleted:
>
> # indexer -m -s 200
> Indexer[2397]: indexer from mnogosearch-3.1.9/PgSQL started with '/usr/local/install/mnogosearch-3.1.9/etc/indexer.conf'
> jobs
> Indexer[2397]: [1] http://www.mnworkforcecenter.org/lmi/pub1/mms/index.htm
> Indexer[2397]: [1] No 'Server' command for url... deleted.
> ^C
> Received signal 2 - exit!
>
> (NOTE: i had to Ctrl-C it to stop it from deleting more URL's.)
>
> here is my full indexer.conf:
> ---cut---
> #Include inc1.conf
> DBAddr pgsql://***:*@/work/
> DBMode cache
> #SyslogFacility local7
> LogdAddr localhost:7000
> LocalCharset iso-8859-1
> Ispellmode db
> StopwordTable stopword
> #ServerTable server
> DeleteNoServer no
> #Allow *
> #Disallow NoMatch *.state.mn.us/*
> Disallow http://www.rootsweb.com/~mn*
> Disallow http://www.wxusa.com/*
> Disallow http://www.vitalrec.com/*
> Disallow http://*yahoo.com/*
> Disallow http://*aol.com/*
> Disallow http://www.salescircular.com/*
> Disallow http://*.wellsfargo.com/*
> # Disallow any except known extensions and directory index using "regex" match:
> Disallow NoMatch Regex \/$|\/SMTMall|\.htm$|\.html$|\.shtml$|\.jhtml$|\.phtml$|\.php$|\.php3$|\.asp|\.txt$
> # Exclude cgi-bin and non-parsed-headers using "string" match:
> Disallow */cgi-bin/* *.cgi */nph-*
> # Exclude anything with '?' sign in URL. Note that '?' sign has a
> # special meaning in "string" match, so we have to use "regex" match here:
> #Disallow Regex \?
> # Exclude some known extensions using fast "String" match:
> Disallow *.b *.sh *.md5 *.rpm
> Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2
> Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z
> Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx
> Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat
> Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra
> Disallow *.vrml *.wrl *.png
> Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_
> Disallow *.tex *.texi *.xls *.doc *.texinfo
> Disallow *.rtf *.pdf *.cdf *.ps
> Disallow *.ai *.eps *.ppt *.hqx
> Disallow *.cpt *.bms *.oda *.tcl
> Disallow *.o *.a *.la *.so
> Disallow *.pat *.pm *.m4 *.am *.css
> Disallow *.map *.aif *.sit *.sea
> Disallow *.m3u *.qt *.mov
> # Exclude Apache directory list in different sort order using "string" match:
> Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D
> # More complicated case. RAR .r00-.r99, ARJ a00-a99 files
> # and unix shared libraries. We use "Regex" match type here:
> Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
> #CheckOnly *.b *.sh *.md5
> #CheckOnly *.arj *.tar *.zip *.tgz *.gz
> #CheckOnly *.lha *.lzh *.rar *.zoo *.tar*.Z
> #CheckOnly *.gif *.jpg *.jpeg *.bmp *.tiff
> #CheckOnly *.vdo *.mpeg *.mpe *.mpg *.avi *.movie
> #CheckOnly *.mid *.mp3 *.rm *.ram *.wav *.aiff
> #CheckOnly *.vrml *.wrl *.png
> #CheckOnly *.exe *.cab *.dll *.bin *.class
> #CheckOnly *.tex *.texi *.xls *.doc *.texinfo
> #CheckOnly *.rtf *.pdf *.cdf *.ps
> #CheckOnly *.ai *.eps *.ppt *.hqx
> #CheckOnly *.cpt *.bms *.oda *.tcl
> #CheckOnly *.rpm *.m3u *.qt *.mov
> #CheckOnly *.map *.aif *.sit *.sea
> #
> # or check ANY except known text extensions using "regex" match:
> #Check NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$
> #HrefOnly */mail*.html */thread*.html
> UseRemoteContentType yes
> AddType text/plain *.txt *.pl *.js *.h *.c *.pm *.e
> AddType text/html *.html *.htm *.m
> AddType image/x-xpixmap *.xpm
> AddType image/x-xbitmap *.xbm
> AddType image/gif *.gif
> AddType Regex \.r[0-9][0-9]$
> AddType application/unknown *.*
> #Mime application/msword "text/plain; charset=cp1251" "catdoc $1"
> #Mime application/x-troff-man text/plain "deroff"
> #Mime text/x-postscript text/plain "ps2ascii"
> Period 6m
> #Tag string
> #Category FFAABBCCDD
> MaxHops 56
> MaxNetErrors 6
> ReadTimeOut 30s
> DocTimeOut 1m30s
> NetErrorDelayTime 1d
> Robots yes
Re: Possible Fix? (Re: UdmSearch: DeleteNoServer still broken in 3.1.9)
oops that didn't work. but i'm pretty sure we need to test for the condition of delete_no_server here. i also tried:

---cut---
/* Find correspondent Server record from indexer.conf */
if(!(CurSrv=UdmFindServer(Indexer->Conf,Doc->url,aliastr))){
    if(Indexer->Conf->csrv->delete_no_server){
        UdmLog(Indexer,UDM_LOG_WARN,"No 'Server' command for url... deleted.");
        if(!strcmp(CurURL.filename,"robots.txt")){
            if(IND_OK==(result=UdmDeleteRobotsFromHost(Indexer,CurURL.hostinfo)))
                result=UdmLoadRobots(Indexer);
        }else{
            result=IND_OK;
        }
        if(result==IND_OK)result=UdmDeleteUrl(Indexer,Doc->url_id);
        FreeDoc(Doc);
        return(result);
    }
}
---/cut---

but that didn't work either. any ideas?
Re: UdmSearch: DeleteNoServer still broken in 3.1.9
That's strange. I've tested your indexer.conf. Everything works fine. indexer does not delete this URL.

Caffeinate The World wrote:
> i reported this back in 3.1.9pre13. i have 'DeleteNoServer no' set with many URL's in my sql db not having associated Server commands.
>
> here is my full indexer.conf:
> ---cut---
> [...]
> Robots yes
> Clones yes
> BodyWeight 2
> TitleWeight 4
> KeywordWeight 8
> DescWeight 16
> #UrlWeight 16
> #UrlHostWeight 8
> #UrlPathWeight 8
> #UrlFileWeight 0
> #IspellCorrectFactor 1
> #IspellIncorrectFactor 1
> #NumberFactor 1
> #AlnumFactor 1
> #MinWordLength 1
> #MaxWordLength 32
> #DeleteBad no
> Index yes
> Follow path
> Server site http://www.state.mn.us/
> Server site http://www.exploreminnesota.com/
> Server site http://www.tpt.org/
> Server page
Re: Possible Fix? (Re: UdmSearch: DeleteNoServer still broken in 3.1.9)
This patch will not fix the problem. The problem is not here.

"DeleteNoServer no" is implemented by adding one virtual empty server after indexer.conf has been loaded. This means that if there is no other corresponding Server or Realm command for some URL, indexer will reach that last, empty server and execute something like this:

    strncmp(url,Server[i].url,strlen(Server[i].url))

where Server[i].url is an empty string. So any URL will pass this condition.

I can't reproduce this unexpected behaviour on my box. To debug it, please check two things:

1. In the function UdmAddServer in the file server.c, add as the first statement:

    printf("AddServer '%s' %d\n",srv->url,match_type);

   and check that an empty string appears in the output after all the "Server" arguments given in indexer.conf.

2. If the above works fine, add debugging output to the UdmFindServer function. I think it is clear enough how it works.
Re: Possible Fix? (Re: UdmSearch: DeleteNoServer still broken in 3.1.9)
Well, indexer.conf is loaded as expected.

Now find this in UdmFindServer():

    for(i=0;i<Conf->nservers;i++){
        int res;
        regmatch_t subs[NS];

and insert here:

    printf("%d '%s' %d\n",i,Conf->Server[i].url,Conf->Server[i].match_type);

Caffeinate The World wrote:
> # ./indexer -ma -u http://www.mnpage.com/%
> AddServer 'http://www.state.mn.us/' 17
> AddServer 'http://www.mnworkforcecenter.org/' 17
> AddServer 'http://www.exploreminnesota.com/' 17
> AddServer 'http://www.tpt.org/' 17
> AddServer 'http://www.gorp.com/gorp/location/mn/mn.htm' 17
> AddServer 'http://lists.rootsweb.com/index/usa/MN/' 17
> AddServer 'http://*.mn.us/*' 18
> AddServer '(null)' 17