UdmSearch: configure options:

2001-02-15 Thread hanksdc

For 3.1.10, is the little configure bug fixed? I.e., do I use

--enable-fast-tag

or

--enable-fasttag

to get the fast tag search?

-- Dan

-- 
Daniel Hanks - Systems/Database Administrator
About Inc., Web Services Division
1253 N. Research Way, Suite Q-2500.  Orem, UT 84097
ph: 801-437-6023  fax: 801-437-6020  email: [EMAIL PROTECTED]


__
If you want to unsubscribe send "unsubscribe udmsearch"
to [EMAIL PROTECTED]




Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-15 Thread hanksdc

Is this file to be used with 3.1.9 sources, or 3.1.10? (Either is fine - I can adjust 
as necessary quite easily).

Thanks for the fix. I have over a million URLs inserted and climbing. :-)

-- Dan


On Thu, 15 Feb 2001, Alexander Barkov wrote:

> Dan, please take the new cache.c and recompile everything.
> It should fix the problem.

Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-14 Thread hanksdc

I just have to put in my encounters here, because they seem very similar. I get a 
large amount of information indexed, but when I run splitter it core dumps 
somewhere midway through, and on one run it left weird directories in the $VAR/raw 
directory:

[root@spider raw]# ls -al
total 32988
drwxr-xr-x   5 root root     8192 Feb 14 04:13 .
drwxr-xr-x   6 root root     4096 Feb 13 01:58 ..
drwxr-xr-x   3 root root     4096 Feb 13 03:12 64
-rw-------   1 root root 33132544 Feb 14 04:13 core
-rw-r--r--   1 root root     8464 Feb 14 04:22 del.log
-rw-r--r--   1 root root   566272 Feb 14 04:22 wrd.log
drwxr-xr-x   3 root root     4096 Feb 13 03:58 ?Ë?64
drwxr-xr-x   3 root root     4096 Feb 13 06:06 à??18
[root@spider raw]#

Unfortunately I wasn't thinking and I deleted all the .done files, and not all of the 
logs were split. Well, back to indexing...

I'm using 3.1.9 on Linux/Oracle

-- Dan Hanks


On Wed, 14 Feb 2001, Zenon Panoussis wrote:

> Zenon Panoussis wrote:
> >
> > By now, I have almost 1 GB of indexed files, 4 indexer
> > crashes and one splitter crash. I'll do the debugging and
> > post its output tomorrow.
>
> ===
> # gdb indexer core.indexer.01
> GNU gdb 5.0
> Copyright 2000 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-redhat-linux"...
> Core was generated by `./indexer -m -s 200'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib/mysql/libmysqlclient.so.10...done.
> Loaded symbols for /usr/lib/mysql/libmysqlclient.so.10
> Reading symbols from /lib/libm.so.6...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /usr/lib/libz.so.1...done.
> Loaded symbols for /usr/lib/libz.so.1
> Reading symbols from /lib/libc.so.6...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib/libcrypt.so.1...done.
> Loaded symbols for /lib/libcrypt.so.1
> Reading symbols from /lib/libnsl.so.1...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/ld-linux.so.2...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from /lib/libnss_files.so.2...done.
> Loaded symbols for /lib/libnss_files.so.2
> Reading symbols from /lib/libnss_nisplus.so.2...done.
> Loaded symbols for /lib/libnss_nisplus.so.2
> Reading symbols from /lib/libnss_nis.so.2...done.
> Loaded symbols for /lib/libnss_nis.so.2
> Reading symbols from /lib/libnss_dns.so.2...done.
> Loaded symbols for /lib/libnss_dns.so.2
> Reading symbols from /lib/libresolv.so.2...done.
> Loaded symbols for /lib/libresolv.so.2
> #0  0x805e5fa in UdmCRC32 (buf=0x4021b03e "", size=4294967295) at crc32.c:97
> 97  _CRC32_(crc, *p) ;
> (gdb) print crc
> $1 = 1928826335
> (gdb) print p
> $2 = 0x40431000 
>
> ===
>
> # gdb indexer core.indexer.02
> 
> #0  0x805e5fa in UdmCRC32 (buf=0x4021b03e "", size=4294967295) at crc32.c:97
> 97  _CRC32_(crc, *p) ;
> (gdb) print crc
> $1 = 835566978
> (gdb) print p
> $2 = 0x40404000 
>
> ===
>
> # gdb indexer core.indexer.03
> 
> #0  0x805e5fa in UdmCRC32 (buf=0x4021b03e "", size=4294967295) at crc32.c:97
> 97  _CRC32_(crc, *p) ;
> (gdb) print crc
> $1 = 2869617068
> (gdb) print p
> $2 = 0x40404000 
>
> ===
>
> # gdb indexer core.indexer.04
> 
> (gdb) print crc
> $1 = 1253677059
> (gdb) print p
> $2 = 0x40431000 
>
> ===
>
> And finally the splitter:
>
> # gdb splitter core.splitter.01
> 
> This GDB was configured as "i386-redhat-linux"...
> Core was generated by `/usr/local/mnogo3110/sbin/splitter'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib/mysql/libmysqlclient.so.10...done.
> Loaded symbols for /usr/lib/mysql/libmysqlclient.so.10
> Reading symbols from /lib/libm.so.6...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /usr/lib/libz.so.1...done.
> Loaded symbols for /usr/lib/libz.so.1
> Reading symbols from /lib/libc.so.6...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib/libcrypt.so.1...done.
> Loaded symbols for /lib/libcrypt.so.1
> Reading symbols from /lib/libnsl.so.1...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/ld-linux.so.2...done.
> Loaded symbols for /lib/ld-linux.so.2
> #0  0x8057d15 in UdmSplitCacheLog (log=118) at cache.c:635
> 635     logwords[count+j].wrd_id=table[w].wrd_id;
> (gdb) print count
> $1 = 13121220
> (gdb) print count+j
> $2 = 13125316
> (gdb) print logwords
> $3 = (UDM_LOGWORD *) 0x0
> (gdb) print table[w]
> $4 = {wrd_id = 1918989871, weight = 1869507887, pos = 825454439, len = 1949249585}
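
A note on reading the numbers in the gdb output above. The sketch below is not part of UdmSearch. It assumes a 32-bit little-endian (i386) build, as the gdb banner indicates, and that each UDM_LOGWORD field printed in $4 is a 4-byte integer (an assumption, since the struct definition is not shown here). The helper names are made up for illustration. Under those assumptions, size=4294967295 is what -1 looks like as an unsigned 32-bit value, and the four fields of table[w] can be read back as raw bytes.

```python
import ctypes
import struct


def as_uint32(value: int) -> int:
    """Reinterpret a Python int as a 32-bit unsigned value, as C would."""
    return ctypes.c_uint32(value).value


def fields_as_text(*fields: int) -> str:
    """Pack 32-bit little-endian integers and show their bytes as text.

    Non-printable bytes are replaced with '.' so the result stays readable.
    """
    raw = struct.pack("<%dI" % len(fields), *fields)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    # size=4294967295 in the UdmCRC32 frame equals (unsigned 32-bit) -1.
    print(as_uint32(-1))
    # wrd_id, weight, pos, len from "print table[w]" in the splitter core.
    print(fields_as_text(1918989871, 1869507887, 825454439, 1949249585))
```

With the values from the thread this prints 4294967295 and /var/mnogo3110/t. A length of -1 reaching UdmCRC32 and path-like bytes sitting in the word table are both consistent with memory corruption, though the output alone does not show where it happens.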

Re: UdmSearch: Webboard: Segfault (grrr)

2001-02-14 Thread hanksdc

> > The only disadvantage is that it will not work on huge
> > search engines with millions of documents. There is a limit on the
> > total number of files on a file system in most Unixes.
> > For example, my 30G /usr partition on a FreeBSD box can hold about
> > 8 million files.
>
> Is that a per-file-system limit or a per-Unix-box limit?
>

It's a per-filesystem limit, and generally it comes from how the file system was 
created. Different parameters at creation time yield different results, so the number 
of available inodes really depends on the parameters with which you create the 
filesystem. On Linux it depends on parameters such as block_size, bytes_per_inode, etc.
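
As a rough illustration of the point above, here is a minimal Python sketch (not from UdmSearch). estimate_inodes just divides the file system size by a bytes-per-inode ratio; the 4096 figure is an assumed value chosen for illustration, not a known setting of the FreeBSD box mentioned in the quote. inode_usage reads the real totals for a mounted file system through the standard os.statvfs call (Unix only). Both function names are hypothetical.

```python
import os


def estimate_inodes(fs_size_bytes: int, bytes_per_inode: int) -> int:
    """Approximate inode count for a file system created with this ratio."""
    return fs_size_bytes // bytes_per_inode


def inode_usage(path: str = "/") -> tuple:
    """Return (total_inodes, free_inodes) for the file system holding path."""
    st = os.statvfs(path)
    return st.f_files, st.f_ffree


if __name__ == "__main__":
    # Hypothetical ratio of one inode per 4096 bytes on a 30 GiB partition.
    print(estimate_inodes(30 * 1024**3, 4096))
    total, free = inode_usage("/")
    print("inodes: %d total, %d free" % (total, free))
```

With the assumed ratio, a 30 GiB partition gets 7864320 inodes, which is in the same range as the "about 8 mln files" quoted above. Some file systems allocate inodes dynamically and may report 0 for the total, so treat the second figure as informational.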

-- Dan
=
Daniel Hanks
Network Administrator
Web Services Group

About
The Human Internet


1253 N. Research Way, Suite Q-2500.  Orem, UT 84097
ph: 801-437-6023  fax: 801-437-6020
email: [EMAIL PROTECTED]







Re: UdmSearch: possible search.cgi problem

2000-10-11 Thread hanksdc

Would it be safe to use the 3.1.6 splitter on data indexed by the indexer
from 3.1.5 or 3.1.2-pre1?

-- Dan Hanks


On Wed, 11 Oct 2000, Alexander Barkov wrote:

> BTW, in 3.1.6 we've fixed two major bugs in cache mode.
> So I'm afraid that everybody has to drop the database and reindex :-(
> We are sorry about that.
> 
> The problem is that the first indexing run did not put URL timestamps into
> /var/raw/del.log.
> 
> As a result, the same URL was distributed again and again
> every time splitter ran. This means that the same word in the
> same URL is found several times.
> 
> Fredy Kuenzler wrote:
> > 
> > Dear Alexander and all
> > 
> > I'm currently testing 3.1.6-pre2, indexing a large number of
> > domains, using the new cache mode. Currently approx. 500,000
> > documents in total, 1/3 already indexed, 2/3 still to go.
> > 
> > After indexing overnight with 4 concurrent indexer processes
> > (which seems to work fine) and running splitter afterwards, I
> > experienced a strange issue. It seems previous queries are still
> > being cached and won't be updated.
> > 
> > Here is the example:
> > Yesterday afternoon I queried "fussball". 49 results.
> > After indexing overnight and running splitter, I queried "fussball"
> > again. Still 49 results. If I query "fussball" and "sport", it
> > shows more results: fussball: 997 sport: 12881
> > 
> > I checked with different browsers to remove cache entries and
> > also with a previously unused proxy server - no difference.
> > 
> > You might try it out
> > http://potato.webtourist.net/cgi-bin/pftest/search.cgi
> > to figure out whether I'm experiencing a fata morgana or not :-)
> > 
> > Thanks
> > Fredy
> > 
> > BTW, there is another minor issue in search.cgi: the 2nd page is
> > numbered from 1 to 20 instead of 21 to 40, and so on (default
> > settings).
> 
> 

-- 
=
Daniel Hanks
Network Administrator
Web Services Group

About
The Human Internet


1253 N. Research Way, Suite Q-2500.  Orem, UT 84097
ph: 801-437-6023  fax: 801-437-6020
email: [EMAIL PROTECTED]







UdmSearch: Re: [devel] Segfaults in 3.1.5

2000-09-28 Thread hanksdc

Excellent! I figured that NULL wasn't supposed to be there! :-)

Thanks,

Dan

On Thu, 28 Sep 2000, Alexander Barkov wrote:

>   Hi!
> 
> Please find attached the patch against main.c which fixes this.
> 
> 
> 
> [EMAIL PROTECTED] wrote:
> > 
> > I'm getting segfaults when running indexer from 3.1.5.
> > I'm trying to do something like this:
> > ./indexer -i -f site_file
> > I'm including an alias file of about 50 lines, which worked fine with
> > 3.1.4-pre8. It seems (from strace output) that the indexer loads the alias
> > file fine, but the minute it tries to read from site_file it segfaults.
> > From playing around with the code, it almost seems like it's dying when trying
> > to do the strcmp near the top of
> > 
> > __INDLIB__ int UdmURLFile(UDM_AGENT *Indexer, int action) in indexer.c.
> > 
> > Perhaps in the line:
> > 
> > if(!strcmp(Indexer->Conf->url_file_name,"-"))
> > 
> > I'm downloading kdbg right now (not being too familiar with
> > gdb...alas...) so I'll take a 'closer' look at it to see if I can see
> > what's happening.
> 
> 

-- 
=
Daniel Hanks
Network Administrator
Web Services Group

About
The Human Internet


1508 N. Technology Way, Suite D-2300, Orem, UT 84097
ph: 801-437-6023  fax: 801-437-6020
email: [EMAIL PROTECTED]


