Webboard: Trailing dot=segfault

2001-07-03 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:
v 3.1.17 RH linux 7.1, MySQL 3.23.26

Indexer[10300]: [1] http://www.freezone.org/.
Tue 03 12:35:25 [10044] Client #0 left
Segmentation fault (core dumped)

Three times in a row on the same URL. Notice the trailing dot 
in the URL. I didn't put it there; it is part of the URL as 
indexer printed it before crashing. Could it be that indexer 
can't deal with a malformed link like that? 

Z


Reply: http://www.mnogosearch.org/board/message.php?id=2577

___
If you want to unsubscribe send unsubscribe general
to [EMAIL PROTECTED]




Webboard: Trailing dot=segfault

2001-07-03 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Here's another one:

Indexer[12263]: [1] http://members.aol.com/pjmoy/..
Wed 04 03:03:10 [10044] Client #0 left
Segmentation fault (core dumped)

Notice the trailing dots. Indexer segfaults every time 
it comes to a URL with trailing dots and never otherwise. 
I have indexed these URLs with earlier versions without 
problems. I suspect that previous versions could deal 
with bad html better than .17 can. 

Ah, and another difference: in previous versions I didn't 
use crosswords, but now I do. It could be there too.
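
For reference, a minimal sketch of how trailing "." and ".." could be resolved before a URL is fetched, assuming the dots are meant as relative path segments. This is not what indexer does internally; the function names remove_dot_segments and normalize_url are made up for illustration, and the logic only loosely follows the dot-segment rules of RFC 3986.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." path segments (hypothetical helper)."""
    segments = path.split("/")
    out = []
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg == ".":
            # "." is the current directory; keep a trailing slash if it ends the path.
            if is_last:
                out.append("")
        elif seg == "..":
            # ".." drops the previous segment but never climbs above the root.
            if len(out) > 1:
                out.pop()
            if is_last:
                out.append("")
        else:
            out.append(seg)
    return "/".join(out)


def normalize_url(url: str) -> str:
    """Return the URL with dot segments in its path resolved."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


if __name__ == "__main__":
    for u in ("http://www.freezone.org/.", "http://members.aol.com/pjmoy/.."):
        print(u, "->", normalize_url(u))
```

With this sketch both URLs from the crash logs reduce to a plain directory URL, which is probably what the page authors intended.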

Z


Reply: http://www.mnogosearch.org/board/message.php?id=2582





Webboard: Weird search results

2001-06-13 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> It seems you are using search results cache and it is
> quite old. Remove all cached queries.

No, it's not that. I am not using results cache at all. 

Z

Reply: http://www.mnogosearch.org/board/message.php?id=2435





Spammers, mother fuckers, virii and this list

2001-04-08 Thread Zenon Panoussis


Guys (sadly no gals around as far as the eye can see...)

If we could all agree that 

- nationality and character are not derivatives of each other
- spammers are mother fuckers
- superonline.com is an ISP that really sucks (I have had to 
  deal with them myself, I know first-hand)
- Outlook is a mail client that really sucks and there is no 
  virus that won't spread through it
- anybody can make a mistake or become the victim of a mistake

and, most important of all, that 

- this list is not a general discussion list

then perhaps we could all shake virtual hands and leave this 
subject behind us and go back to our favourite nerding activities? 

Please do not reply, except by e-mail if you need to.

Z


-- 
oracle@everywhere: The ephemeral source of the eternal truth...




Re: Webboard: SSI

2001-04-06 Thread Zenon Panoussis



Alexander Barkov wrote:

>> Does anybody know of any way to put server side includes 
>> in search.htm? 

> Use $iurl(http://some/include.html) template
> syntax. It includes the given URL. You may also
> use $if(/usr/local/httpd/include.html). This command
> includes the given file from the local system.

It works excellently. And I realise that I have to read the 
documentation again. Last time I did that we were at 3.1.7 or 
so, and lots of things seem to have been added in the meanwhile. 

Z


-- 
oracle@everywhere: The ephemeral source of the eternal truth...




Webboard: SSI

2001-04-05 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Does anybody know of any way to put server side includes in 
search.htm? It seems rather impossible to do it through the 
server configuration; after all, the file is not processed 
by the server, but by search.cgi. 

Reply: http://search.mnogo.ru/board/message.php?id=1905





Webboard: Link length

2001-04-05 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

The webboard is going to reformat this and make it look like shit, but you can get the 
same information with 
  # mysql -p mnogosearch
  mysql> describe url;
The length is 128 characters. 

Z


+-----------------+--------------+------+-----+---------+----------------+---------------------------------+
| Field           | Type         | Null | Key | Default | Extra          | Privileges                      |
+-----------------+--------------+------+-----+---------+----------------+---------------------------------+
| rec_id          | int(11)      |      | PRI | NULL    | auto_increment | select,insert,update,references |
| status          | int(11)      |      |     | 0       |                | select,insert,update,references |
| url             | varchar(128) |      | UNI |         |                | select,insert,update,references |
| content_type    | varchar(48)  |      |     |         |                | select,insert,update,references |
| title           | varchar(128) |      |     |         |                | select,insert,update,references |
| txt             | varchar(255) |      |     |         |                | select,insert,update,references |
| docsize         | int(11)      |      |     | 0       |                | select,insert,update,references |
| last_index_time | int(11)      |      |     | 0       |                | select,insert,update,references |
| next_index_time | int(11)      |      |     | 0       |                | select,insert,update,references |
| last_mod_time   | int(11)      |      |     | 0       |                | select,insert,update,references |
| referrer        | int(11)      |      |     | 0       |                | select,insert,update,references |
| tag             | varchar(11)  |      |     | 0       |                | select,insert,update,references |
| hops            | int(11)      |      |     | 0       |                | select,insert,update,references |
| category        | varchar(11)  |      |     |         |                | select,insert,update,references |
| keywords        | varchar(255) |      |     |         |                | select,insert,update,references |
| description     | varchar(100) |      |     |         |                | select,insert,update,references |
| crc32           | int(11)      |      | MUL | 0       |                | select,insert,update,references |
| lang            | char(2)      |      |     |         |                | select,insert,update,references |
+-----------------+--------------+------+-----+---------+----------------+---------------------------------+
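
As a rough illustration of what the 128-character limit means in practice, here is a small sketch that checks whether a link would fit in the url column. The constant and the function name are made up for this example, and it assumes plain ASCII URLs, so that characters and bytes are the same thing.

```python
# Width of the url column, taken from varchar(128) in the describe output.
URL_COLUMN_LENGTH = 128


def fits_url_column(url: str, limit: int = URL_COLUMN_LENGTH) -> bool:
    """Return True if the URL is short enough to be stored without truncation."""
    return len(url) <= limit


if __name__ == "__main__":
    short = "http://www.here.com/index.html"
    long_url = "http://www.here.com/" + "a" * 128
    print(fits_url_column(short), fits_url_column(long_url))
```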

Reply: http://search.mnogo.ru/board/message.php?id=1906





Webboard: Compliments

2001-04-03 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

In the past couple of days I've been looking at urls scrolling 
up the screen, ending in things like

   Indexer[12895]: [1] Done (100122 seconds)

and

   200  0 190277 OK

In the meanwhile, no core dumps. No segfaults. No complaints. 
No problems whatsoever.

At the same time I have been getting lots of public compliments 
for "my" search engine, while really it is *your* search engine 
and the compliments should be addressed to you.

You guys have come a long way. It has to be said and acknowledged, 
and I am raising a glass to you. Cheers! Thanks for really good work. 



Reply: http://search.mnogo.ru/board/message.php?id=1891





Webboard: indexer -g

2001-04-01 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:
v 3.1.12, MySQL:

./indexer -g 0101 correctly indexes all the pages that are listed 
in that category in indexer.conf, but it doesn't stop there; it 
goes on and indexes pages in other categories too if they are linked 
to from pages in the right category. 

Hmm. Do I make myself clear? I think not. This is what I mean: 

Assume that indexer.conf contains the following:

Category 0101
Server site http://www.here.com
Category 0102
Server site http://www.there.org 


If I now give the command "indexer -g 0101" the indexer will 
crawl www.here.com. However, if there is a link in www.here.com 
to www.there.org, then the indexer will continue and index 
www.there.org too, which was not my intention.
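
To make the expected behaviour concrete, here is a minimal sketch of the filtering I have in mind. It is not how indexer is implemented; the CATEGORY_SERVERS dictionary is a hypothetical stand-in for the Category/Server pairs above, and the function names are invented for the example.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory version of the indexer.conf example above.
CATEGORY_SERVERS = {
    "0101": ["www.here.com"],
    "0102": ["www.there.org"],
}


def in_category(url: str, category: str) -> bool:
    """True if the URL's host is listed under the given category."""
    host = urlsplit(url).hostname
    return host in CATEGORY_SERVERS.get(category, [])


def links_to_follow(links, category):
    """Keep only the links that stay inside the requested category."""
    return [u for u in links if in_category(u, category)]


if __name__ == "__main__":
    found = ["http://www.here.com/a.html", "http://www.there.org/"]
    print(links_to_follow(found, "0101"))
```

With "-g 0101" only the www.here.com link would be queued; the link to www.there.org would be left for a run with "-g 0102".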

Z



Reply: http://search.mnogo.ru/board/message.php?id=1878





Webboard: One URL per domain

2001-04-01 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

At times a search term happens to be repeated lots of times on 
different pages of the same site, so that the results get clogged. 
Imagine looking for shops in your area that offer "home delivery". 
If one shop has the line "home delivery $2 extra per order" on 
every single page of every one of thousands of articles, you will 
never get across to any other shop. Yet, what you are looking for 
is different shops, not different articles.

The solution to this is an option to return only one URL per domain. 
To my knowledge, http://www.vindex.nl is currently the only search 
engine that offers this option, and it does so by default. I think it 
is an excellent option to add to the todo list.
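
A minimal sketch of the idea, assuming the results are already sorted by relevance and that "domain" simply means the host part of the URL. The function name and the sample URLs are invented for the example; nothing here reflects how search.cgi works internally.

```python
from urllib.parse import urlsplit


def one_per_host(ranked_urls):
    """Keep only the first (best-ranked) URL seen for each host."""
    seen = set()
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).hostname
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


if __name__ == "__main__":
    results = [
        "http://shop-a.example/article1.html",
        "http://shop-a.example/article2.html",
        "http://shop-b.example/delivery.html",
        "http://shop-a.example/article3.html",
    ]
    print(one_per_host(results))
```

In the "home delivery" example this would show one hit for each shop instead of thousands of articles from the same one.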



Reply: http://search.mnogo.ru/board/message.php?id=1879





Webboard: CVS

2001-03-31 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Being totally unfamiliar with CVS, I keep reading HOWTOs and yet can't manage to check 
out a current version (%#@*!). It would be very helpful if you'd put the exact cvs 
command together with the CVS info on the main page. 

Z


Reply: http://search.mnogo.ru/board/message.php?id=1877





Webboard: DB.robots ?

2001-03-13 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

Duh. It took some time to find out what had happened, but 
as it turns out I had typed 
  mysql -p database > source/create/mysql/create.txt 
thereby truncating the create.txt file to 0 bytes and not 
creating the tables either. db.robots simply happened to 
be the first table that indexer tried to access. The bug 
is me.

Z

Reply: http://search.mnogo.ru/board/message.php?id=1697





Re: mirror paths

2001-03-06 Thread Zenon Panoussis



Caffeinate The World wrote:

> #MirrorRoot /path/to/mirror
> #MirrorHeadersRoot /path/to/headers
>
> in regard to the above, are they relative to the installation path
> --PREFIX like var is?

No, these ones are absolute. 

Z


-- 
oracle@everywhere: The ephemeral source of the eternal truth...




Re: mirror paths

2001-03-06 Thread Zenon Panoussis



Caffeinate The World wrote:

> this is so strange, i still don't see anything in my mirror
> directories...

Try ./indexer -m 

Z


-- 
oracle@everywhere: The ephemeral source of the eternal truth...




Re: BOUNCE general@mnogosearch.org: Non-member submission from [Zenon Panoussis lrh@xs4all.nl]

2001-02-28 Thread Zenon Panoussis


Alexander Barkov wrote:

> I'm just not sure which version you are using.

3.1.11 with the add_url.3.1.11.diff patch only. 

> Is it here:
>
> }else{
>     /* Unknown Content-Type */
>     if(Method!=UDM_HEAD){
>         crc32=UdmCRC32(Doc->content, (size_t)realsize);
>         changed=!(crc32==Doc->crc32);
>         if(CurSrv->use_clones){
>             origin=UdmFindOrigin(Indexer, crc32, size);
>             origin=((origin==Doc->url_id)?0:origin);
>         }
>     }
> }
>
> please run the following commands in gdb:
>
> frame 1
> print content_type
> print Method
> print Doc
> print Doc->content
> print Doc->url
#0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
97      _CRC32_(crc, *p) ;
(gdb) frame 1
#1  0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
1150    crc32=UdmCRC32(Doc->content, (size_t)realsize);
(gdb) print content_type
$1 = 0x4021c027 "application/unknown"
(gdb) print Method
$2 = 1
(gdb) print Doc
$3 = (UDM_DOCUMENT *) 0x91ef7d8
(gdb) print Doc->content
$4 = 0x4021c03e ""
(gdb) print Doc->url
$5 = 0x91f0548 "http://www.xs4all.nl/~fishman/ls/."

See my (bounced) posting from [EMAIL PROTECTED] on 
Tue, 27 Feb 2001 15:46:37 +0100 for details about this and 
the other URLs that the indexer crashes on. 
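
One observation on the trace: size=4294967295 in frame 0 is exactly what a 32-bit unsigned conversion of -1 produces. Whether realsize really was -1 at that point is only my guess, and the sketch below assumes a 32-bit size_t (the helper name is made up), but it would explain why UdmCRC32 runs off the end of the buffer.

```python
def as_uint32(value: int) -> int:
    """Mimic a C conversion of a signed int to a 32-bit unsigned type."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    # An assumed realsize of -1 turns into the size seen in the backtrace.
    print(as_uint32(-1))  # 4294967295
```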

Z


-- 
oracle@everywhere: The ephemeral source of the eternal truth...




Webboard: Alias news://xyz news://123 ?

2001-02-28 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

> What exactly didn't work? Alias? Does indexer connect to the original
> server? Or something else?

Nothing at all happens. The indexer connects to cachelogd and 
exits again normally after one or two seconds without ever 
connecting to the news server. 

I gather from your answer that aliasing news should be no 
problem, so later today I'll try to trace just what indexer 
does and report back.

Z


Reply: http://search.mnogo.ru/board/message.php?id=1581





Re: BOUNCE general@mnogosearch.org: Non-member submission from [Zenon Panoussis lrh@xs4all.nl]

2001-02-27 Thread Zenon Panoussis


Hi

> I tested several pages from your site and everything seems to work fine.

What you see on the web is 3.1.10. The 3.1.11 (patched yesterday) 
runs in a separate directory and a separate database; this way I 
can keep the search functional while the 3.1.11 bugs are worked 
out of it :)  You can use 3.1.11 by going to 
http://search.freewinds.cx/cgi-bin/v4.cgi . On the other hand, it 
is *indexer* that's crashing; the search part works fine (apart 
from the little ul= problem I reported yesterday).


> Does indexer crash always on the same URL? 

This is new since yesterday's patch: indexer crashes after a few 
minutes, always in the middle of a URL, like this

Indexer[21800]: [1] http://www.scientology-kills.org/dead.htm
Indexer[21800]: [1] http://www.xs4all.nl/~fishman/ls/.
Tue 27 08:26:06 [21283] Client #0 left
Segmentation fault (core dumped)

This particular URL is not very long and contains no spaces or 
other funny stuff; what is missing after 
   http://www.xs4all.nl/~fishman/ls/ 
is something like 
   ls02b.html 

After a crash I restart indexer and it goes on with status 0 URLs 
in the order it has them, so it won't go back and won't crash on 
the same URL.


> Please send also "backtrace" gdb command output.

#0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
97  _CRC32_(crc, *p) ;
(gdb) print crc
$1 = 1181568253
(gdb) print p
$2 = 0x40499000 <Address 0x40499000 out of bounds>
(gdb) print *p
Cannot access memory at address 0x40499000
(gdb) backtrace
#0  0x80600ca in UdmCRC32 (buf=0x4021c03e "", size=4294967295) at crc32.c:97
#1  0x804d7f8 in UdmIndexNextURL (Indexer=0x807ca50, index_flags=4) at indexer.c:1150
#2  0x804a050 in thread_main (arg=0x0) at main.c:256
#3  0x804a9e4 in main (argc=3, argv=0xbab4) at main.c:596
#4  0x4009cbfc in __libc_start_main (main=0x804a16c <main>, argc=3, ubp_av=0xbab4, 
init=0x80496a8 <_init>, fini=0x806abfc <_fini>, rtld_fini=0x4000d674 <_dl_fini>, 
stack_end=0xbaac)
at ../sysdeps/generic/libc-start.c:118

I'm saving the core, 


-- 
oracle@everywhere: The ephemeral source of the eternal truth...




Webboard: Site search

2001-02-26 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

The escape bug in previous versions of cache mode is 
fixed, but ul=site seems to work only in the format 
http://www.domain.dom/ . Attempting http://www.domain 
yields no results at all and just www.domain.dom 
returns just anything (in other words, is ignored). 

Wouldn't it be a good idea to put the ul string 
between % wildcards in the SQL query, so that the 
user can type just any string and get his search 
limited to that? E.g. use 

[nothing] or http:// for any matches, 
http://www.domain.dom or just domain.dom for domain.dom,
.dom/ for anything in the .dom TLD.
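
A minimal sketch of the suggestion, using an in-memory SQLite table as a stand-in for the MySQL url table. The function names and sample rows are invented for the example, and the real query that search.cgi builds is certainly more involved; this only shows the LIKE pattern with a wildcard on each side.

```python
import sqlite3


def demo_connection():
    """Build a tiny stand-in for the url table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE url (url TEXT)")
    conn.executemany(
        "INSERT INTO url (url) VALUES (?)",
        [
            ("http://www.domain.dom/index.html",),
            ("http://www.other.org/page.html",),
            ("http://shop.domain.dom/",),
        ],
    )
    return conn


def search_urls(conn, ul):
    """Limit results to URLs containing the ul string anywhere."""
    # Note: any % or _ typed by the user would also act as a wildcard here.
    pattern = "%" + ul + "%"
    rows = conn.execute(
        "SELECT url FROM url WHERE url LIKE ? ORDER BY url", (pattern,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = demo_connection()
    for ul in ("", "http://", "domain.dom", ".org/"):
        print(repr(ul), search_urls(conn, ul))
```

An empty ul or just http:// matches everything, domain.dom matches both hosts in that domain, and .org/ matches anything in that TLD.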

Z



Reply: http://search.mnogo.ru/board/message.php?id=1537





Webboard: Segfault (bad reference)

2001-02-20 Thread Zenon Panoussis

Author: Zenon Panoussis
Email: [EMAIL PROTECTED]
Message:

First try with a freshly compiled 3.1.11 on RH-7.0, mysql.

# ./indexer -m -c 300
Indexer[20577]: indexer from mnogosearch-3.1.11/MySQL started with 
'/usr/local/mn3111-1/etc/indexer.conf'
Wed 21 04:24:54 [20556] Client #0 connected
Segmentation fault (core dumped)
# Wed 21 04:24:55 [20556] Client #0 left

# gdb indexer core
loading etc
#0  0x400d4e1f in _IO_vfprintf (s=0xbfff4f20, 
format=0x806eec0 "INSERT INTO url 
(url,referrer,hops,crc32,last_index_time,next_index_time,status,tag,category) VALUES 
('%s',%d,%d,0,%d,%d,0,'%s','%s')", ap=0xbfff5020) at ../sysdeps/i386/bits/string.h:343
343 ../sysdeps/i386/bits/string.h: No such file or directory.

I have string.h in the following places: 

# locate string.h
/usr/include/asm/string.h
/usr/include/linux/string.h
/usr/include/bits/string.h
/usr/include/string.h
/usr/include/g++-3/std/bastring.h
/usr/include/linuxconf/sstring.h
/usr/include/mysql/m_string.h
/usr/lib/bcc/include/string.h
/usr/local/include/php/ext/standard/php_string.h

I used --prefix=/usr/local/mn3111-1 --localstatedir=/var/mn3111-1 . 
That "3111-1" is almost as if I were expecting a 3111-2 ;) 

Z


Reply: http://search.mnogo.ru/board/message.php?id=1485
