[dataparksearch] [Forum] Re: Indexing from lunch to the fence

2008-07-04 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Indexing from lunch to the fence

If you mean indexing the whole of Runet, then
Realm regex ^http://[^/\.]*\.ru/
Realm regex ^http://www.[^/\.]*\.ru/

If you mean indexing all the links found on a particular site, then 
that feature is not supported.
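
To see which hosts the two Realm patterns above would accept, here is a minimal sketch using Python's re module as a stand-in for the regex engine DataparkSearch is built with (the two engines may differ in details). The sample URLs are hypothetical. Note that the dot after "www" is unescaped in the second pattern, so it matches any character there.

```python
import re

# The two patterns from the Realm commands above, copied verbatim.
PATTERNS = [
    re.compile(r"^http://[^/\.]*\.ru/"),
    re.compile(r"^http://www.[^/\.]*\.ru/"),
]

def in_realm(url: str) -> bool:
    """Return True if any of the patterns matches the start of the URL."""
    return any(p.search(url) for p in PATTERNS)

# Hypothetical sample URLs, used only for illustration.
for url in ("http://example.ru/", "http://www.example.ru/page.html",
            "http://forum.example.ru/", "http://example.com/"):
    print(url, in_realm(url))
```

Under these patterns a second-level .ru domain matches with or without the www prefix, while other subdomains such as forum.example.ru do not.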
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1215180806



[dataparksearch] [Forum] Re: FTP search by directory and file names

2008-07-05 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: FTP search by directory and file names

Frankly, I'm surprised it works :) Yesterday's snapshot was unfinished; 
I fixed it today: http://www.dataparksearch.org/dpsearch-4.50-05072008.tar.bz2

I don't build a deb package, I only make the port for FreeBSD. If you give me a link 
to a description of how to build deb packages and where to send them (to a repository?), 
I'll try to release the next version as a deb package.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1214665125;page=2



[dataparksearch] [Forum] Re: FTP search by directory and file names

2008-07-07 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: MF
Subject: Re: FTP search by directory and file names

Here is the official Debian documentation.

http://www.us.debian.org/doc/manuals/maint-guide/

I have a hard time imagining how dataparksearch and mnogosearch can coexist on one 
system; mnogosearch is already in the repositories 
(http://packages.debian.org/search?keywords=mnogosearch), so an exclusion will 
probably have to be made.

I'll probably try to build the version with MySQL; if it works, I'll write back.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1214665125;page=2



[dataparksearch] [Forum] Re: RSS works selectively

2008-07-08 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: RSS works selectively

Check exactly which log you are looking at. This command enables the maximum 
level of debug output, so the output in error_log should 
increase.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=06;topic_id=1215548226



[dataparksearch] [Forum] Re: RSS works selectively

2008-07-09 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: RSS works selectively

Try running this from the command line:
QUERY_STRING="%F1%EE%E1%E0%EA%E8&c=&site=&m=all&sp=1&sy=0&s=DRP&tmplt=rss.htm" ./search.cgi 2>err

and show what gets written to the file err.
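
To read what that QUERY_STRING actually contains, here is a small sketch that decodes it with Python's standard library. The cp1251 charset is an assumption (the thread does not state which charset the search page uses); with a different charset the first field would decode differently.

```python
from urllib.parse import parse_qsl

# Query string copied verbatim from the command above.
QUERY_STRING = "%F1%EE%E1%E0%EA%E8&c=&site=&m=all&sp=1&sy=0&s=DRP&tmplt=rss.htm"

def decode_query(qs: str, charset: str = "cp1251"):
    """Split the query string into (name, value) pairs, keeping empty values.

    The charset default is an assumption, not something the thread confirms.
    """
    return parse_qsl(qs, keep_blank_values=True, encoding=charset)

for name, value in decode_query(QUERY_STRING):
    print(repr(name), "=", repr(value))
```

The first field has no "name=" part, so it is reported as a name with an empty value.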
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=06;topic_id=1215548226



[dataparksearch] [Forum] Re: Segmentation fault during indexing

2008-07-09 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: dalex
Subject: Re: Segmentation fault during indexing

Here is the backtrace of the dump from version 1.50 of the 5th of this month. It 
crashes with a segfault in the same way, except that I deleted the document it crashed 
on last time. Now it crashes on a different one.

# gdb /sbin/indexer core
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Core was generated by `:[1] URL:htdb:/04/'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from //lib/libdpsearch-4.so...done.
Loaded symbols for //lib/libdpsearch-4.so
Reading symbols from //lib/libdpcharset-4.so...done.
Loaded symbols for //lib/libdpcharset-4.so
Reading symbols from /usr/lib64/libmysqlclient.so.15...done.
Loaded symbols for /usr/lib64/libmysqlclient.so.15
Reading symbols from /lib64/tls/librt.so.1...done.
Loaded symbols for /lib64/tls/librt.so.1
Reading symbols from /lib64/libz.so.1...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /usr/lib64/libaspell.so.15...done.
Loaded symbols for /usr/lib64/libaspell.so.15
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /usr/lib64/libstdc++.so.5...done.
Loaded symbols for /usr/lib64/libstdc++.so.5
Reading symbols from /lib64/tls/libm.so.6...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/tls/libpthread.so.0...done.
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
#0  0x002a95849a0e in DpsDSTRAppendUni (dstr=Variable "dstr" is not available.
) at charset-utils.c:334
334 charset-utils.c: No such file or directory.
in charset-utils.c
(gdb) backtrace
#0  0x002a95849a0e in DpsDSTRAppendUni (dstr=Variable "dstr" is not available.
) at charset-utils.c:334
#1  0x002a95846ea2 in DpsUniDecomposeRecursive (buf=Variable "buf" is not available.
) at unidata.c:363
#2  0x002a95846f5e in DpsUniNormalizeNFD (buf=Variable "buf" is not available.
) at unidata.c:446
#3  0x002a9584706d in DpsUniNormalizeNFC (buf=Variable "buf" is not available.
) at unidata.c:470
#4  0x002a956cf409 in DpsPrepareItem (Indexer=Variable "Indexer" is not available.
) at parsehtml.c:103
#5  0x002a956d00cd in DpsPrepareWords (Indexer=Variable "Indexer" is not available.
) at parsehtml.c:469
#6  0x002a95684e2d in DpsIndexNextURL (Indexer=Variable "Indexer" is not available.
) at indexer.c:2054
#7  0x00404607 in main (argc=Variable "argc" is not available.
) at main.c:884
(gdb) q

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;topic_id=1214392453



[dataparksearch] [Forum] Re: RSS works selectively

2008-07-09 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: RSS works selectively

Try rebuilding with the --enable-syslog switch passed to configure instead of 
--disable-syslog.
Does debug information then appear in error_log or in the file err?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=06;topic_id=1215548226



[dataparksearch] [Forum] Problem during configure

2008-07-10 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Андрей
Subject: Problem during configure

Hello! I have only just found your technology and it interested me a lot.
I decided to install it on my server (ASPLinux 11), but...

checking for daemon... yes
checking for inet_addr... yes
checking for sqrt... no
checking for sqrt in -lm... yes
checking for libtre... yes
checking tre/regex.h usability... yes
checking tre/regex.h presence... yes
checking for tre/regex.h... yes
checking for ares_init in -lcares... no
checking for ares_init in -lares... no
checking for getaddrinfo in -lbind... yes
checking for hstrerror... no
checking for getaddrinfo... no
checking for inet_net_pton... no
checking for pthread_setconcurrency function prototype in pthread.h... no
checking for thr_setconcurrency function prototype in thread.h... no
checking for char*... yes
checking size of char*... configure: error: cannot compute sizeof (char*), 77
See `config.log' for more details.
configure failed: 256 at ./install.pl line 176,  line 32.

Where should I look?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;post=



[dataparksearch] [Forum] Re: Problem during configure

2008-07-10 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Андрей
Subject: Re: Problem during configure

Here are some more excerpts from config.log

This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by configure, which was
generated by GNU Autoconf 2.59.  Invocation command line was

  $ ./configure --prefix=/usr/local/dpsearch --bindir=/usr/local/dpsearch/bin 
--sbindir=/usr/local/dpsearch/sbin --sysconfdir=/usr/local/dpsearch/etc 
--localstatedir=/usr/local/dpsearch/var --libdir=/usr/local/dpsearch/lib 
--includedir=/usr/local/dpsearch/include --mandir=/usr/local/dpsearch/man 
--enable-shared --enable-syslog --enable-pthreads --enable-parser --enable-mp3 
--without-aspell --enable-file --enable-http --enable-ftp --enable-htdb 
--enable-news --with-mysql

## --------- ##
## Platform. ##
## --------- ##

hostname = localhost
uname -m = i686
uname -r = 2.6.14-1.1653.1aspsmp
uname -s = Linux
uname -v = #1 SMP Mon Jan 23 20:08:13 EET 2006

/usr/bin/uname -p = unknown
/bin/uname -X = unknown

/bin/arch  = i686
/usr/bin/arch -k   = unknown
/usr/convex/getsysinfo = unknown
hostinfo   = unknown
/bin/machine   = unknown
/usr/bin/oslevel   = unknown
/bin/universe  = unknown

PATH: /usr/kerberos/sbin
PATH: /usr/kerberos/bin
PATH: /usr/local/sbin
PATH: /usr/local/bin
PATH: /sbin
PATH: /bin
PATH: /usr/sbin
PATH: /usr/bin
PATH: /usr/X11R6/bin
PATH: /usr/NX/bin
PATH: /root/bin
PATH: /usr/NX/bin

..
...
..

## ----------- ##
## confdefs.h. ##
## ----------- ##

#define DPS_BASE_VERSION 4
#define DPS_TAIL_VERSION 49
#define DPS_VERSION_ID 449
#define HAVE_ARPA_INET_H 1
#define HAVE_ARPA_NAMESER_H 1
#define HAVE_BZERO 1
#define HAVE_DAEMON 1
#define HAVE_DLFCN_H 1
#define HAVE_FCNTL_H 1
#define HAVE_FSEEKO 1
#define HAVE_INTTYPES_H 1
#define HAVE_LIBBIND 1
#define HAVE_LIMITS_H 1
#define HAVE_MEMORY_H 1
#define HAVE_NETDB_H 1
#define HAVE_NETINET_IN_H 1
#define HAVE_NETINET_IN_SYSTM_H 1
#define HAVE_NETINET_IP_H 1
#define HAVE_NETINET_TCP_H 1
#define HAVE_PUTENV 1
#define HAVE_REGCOMP 1
#define HAVE_RESOLV_H 1
#define HAVE_SEMAPHORE_H 1
#define HAVE_SETENV 1
#define HAVE_SNPRINTF 1
#define HAVE_SOCKET 1
#define HAVE_STDINT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRCASECMP 1
#define HAVE_STRCASESTR 1
#define HAVE_STRDUP 1
#define HAVE_STRINGS_H 1
#define HAVE_STRING_H 1
#define HAVE_STRNCASECMP 1
#define HAVE_STRNDUP 1
#define HAVE_STRNLEN 1
#define HAVE_STRSTR 1
#define HAVE_STRTOK_R 1
#define HAVE_SYSLOG_H 1
#define HAVE_SYS_CDEFS_H 1
#define HAVE_SYS_IOCTL_H 1
#define HAVE_SYS_IPC_H 1
#define HAVE_SYS_MSG_H 1
#define HAVE_SYS_PARAM_H 1
#define HAVE_SYS_SELECT_H 1
#define HAVE_SYS_SEM_H 1
#define HAVE_SYS_SOCKET_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_SYS_SYSCTL_H 1
#define HAVE_SYS_TIMES_H 1
#define HAVE_SYS_TIME_H 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_WAIT_H 1
#define HAVE_TIMEGM 1
#define HAVE_TM_GMTOFF 1
#define HAVE_TRE_REGEX_H 1
#define HAVE_UNISTD_H 1
#define HAVE_UNISTD_H 1
#define HAVE_UNSETENV 1
#define HAVE_VSNPRINTF 1
#define PACKAGE "dpsearch"
#define PACKAGE_BUGREPORT ""
#define PACKAGE_NAME ""
#define PACKAGE_STRING ""
#define PACKAGE_TARNAME ""
#define PACKAGE_VERSION ""
#define STDC_HEADERS 1
#define STDC_HEADERS 1
#define VERSION "4.49"
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE 1
#endif
#ifdef __cplusplus
extern "C" void std::exit (int) throw (); using std::exit;

configure: exit 1

So far I can't figure out what is missing. Maybe some package is outdated.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;topic_id=1215757425



[dataparksearch] [Forum] Re: Segmentation fault during indexing

2008-07-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: dalex
Subject: Re: Segmentation fault during indexing

> At 19:55:38  10/07/08, Maxime wrote:
>Can the table you generate contain "words" longer than 256 
>characters?

I looked through it: yes, there were long runs of characters (underscores), possibly 
longer than 256 characters. But after recreating the table with those long 
underscore runs removed, the problem did not go away.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;topic_id=1214392453;reply=1215705338



[dataparksearch] [Forum] Re: RSS works selectively

2008-07-12 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: zabar
Subject: Re: RSS works selectively

> At 18:52:16  09/07/08, Maxime wrote:
>Try rebuilding with the --enable-syslog switch passed to configure instead of 
>--disable-syslog.
>Does debug information then appear in error_log or in the file err?
Done; the result is the same.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=06;topic_id=1215548226;reply=1215615136



[dataparksearch] [Forum] Re: Problem during configure

2008-07-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Андрей
Subject: Re: Problem during configure

> At 14:50:11  11/07/08, Maxime wrote:
>Check whether you have installed the packages needed to build software from source; 
>on Linux systems they are usually not installed by default.

Which ones exactly?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;topic_id=1215757425;reply=1215773411



[dataparksearch] [Forum] Re: Problem during configure

2008-07-14 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Problem during configure

The tools required for the build are listed in the documentation: 
http://www.dataparksearch.org/dpsearch-toolsreq.ru.html
I can't give you the package names for Linux, but besides the utilities listed on 
that page you will need to install linux-headers and the development 
packages for all the libraries that will be used with DataparkSearch.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;topic_id=1215757425



[dataparksearch] [Forum] Configuration of Dataparksearch utility with Cygwin linux utility?

2008-07-17 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Anup Nair
Subject: Configuration of Dataparksearch utility with Cygwin linux utility?

Hi,
I have been trying to install DataparkSearch using Cygwin on a Windows XP SP2 
system.
I have downloaded the entire installation of Cygwin, all repositories.

I can run the install.pl successfully but make gives errors. I used make 
version 3.81.
I used perl v5.10.0 to configure.

My system has MySQL as part of xampp 1.6.6a. I gave the path to the MySQL 
folder as
"/cygdrive/d/xampp/mysql", where 'd' is the automounted D drive partition. I 
also downloaded the development version of xampp and copied the files into the 
running version when I got a "could not find mysql.h" error.

The install path is default. I created the /usr/local/dpsearch directory.
It fails to autodetect my MySQL database even though I have xampp running. It 
detects PostgreSQL though, even though I haven’t installed it.

I only gave yes for MySQL support and no for all the rest.

For other options I gave the default (in brackets) value.

When I run make I get 6 warnings, all from sql.c: 
"assignment discards qualifiers from pointer target type" in functions DpsAddURL, 
DpsAddLink, DpsResAddDocInfoSQL, DpsHtdbGet and DpsLimitLinkSQL

Errors listed are
1. 'SHM_R' undeclared (first use in this function)
2. 'SHM_W' undeclared (first use in this function)
3. 'Env' undeclared (first use in this function)

Could anyone please guide me in successfully installing and running DataparkSearch with 
Cygwin? Is there any possibility of using DataparkSearch on a Windows-based 
system? My main criterion is to get a search engine working that indexes both text and 
multimedia files for our intranet.

Any help will be appreciated...

[EMAIL PROTECTED]
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;post=



[dataparksearch] [Forum] Re: Configuration of Dataparksearch utility with Cygwin linux utility?

2008-07-19 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Configuration of Dataparksearch utility with Cygwin linux utility?

DataparkSearch is Unix software. I doubt it would compile successfully on 
Windows. That said, I know nothing about Cygwin, so I can't advise 
you.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1216287469



[dataparksearch] [Forum] Re: Tested the new search

2008-07-20 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Tested the new search

Similar queries are essentially a separate search: the qtrack table is indexed 
by means of DataparkSearch, and that search is accessed through HttpRequest.

Phone numbers are a section extracted from the text by a regular-expression 
pattern, which is one of the variants of the Section command.
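
The idea of pulling phone numbers out of page text with a regular expression can be sketched in Python as follows. The pattern and the sample text are hypothetical; the actual expression given to the Section command is not shown in this thread.

```python
import re

# Hypothetical pattern: optional '+', a digit, then at least eight digits,
# spaces, dashes or parentheses, ending in a digit.
PHONE_RE = re.compile(r"\+?\d[\d\-\s()]{8,}\d")

def extract_phones(text: str) -> list[str]:
    """Return every substring of text that looks like a phone number."""
    return PHONE_RE.findall(text)

sample = "Call +7 (495) 123-45-67 or 8-800-555-35-35 today."
print(extract_phones(sample))
```

A pattern this loose will also catch other long digit runs, so a real configuration would probably need something stricter.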

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216509417



[dataparksearch] [Forum] Re: Tested the new search

2008-07-20 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Roman
Subject: Re: Tested the new search

I see. But wouldn't it be better to do it the way nigma.ru does (parse it from the text), 
so that the database doesn't need to be queried?

Here is another common glitch: on most pages the language is detected incorrectly; 
Russian pages get marked as bg, ro, cv or kv rather than ru.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216509417



[dataparksearch] [Forum] How To Use DPSearch

2008-07-20 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: will harris
Subject: How To Use DPSearch

It's not entirely clear to me how to use this program. The documentation lists 
several options but I am new, and am not exactly sure why I would want to do 
certain steps over other ones. I have dpsearch configured and running fine. I 
just don't know what to do next. I wanted to be able to give it search terms 
and have it branch out over networks looking for documents with those terms, 
but reading the docs it doesn't seem like that's what this does. Can anyone 
help me with pointers, advice, and perhaps even example config files to see 
how, and why you use the program the way you do?
Best Regards,
Will
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=



[dataparksearch] [Forum] Re: segfault | Can

2008-07-26 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Fox
Subject: Re: segfault | Can

During indexing,
after "indexer -Ecreate"
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506



[dataparksearch] [Forum] install

2008-07-28 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: install

I would like to either offer my server (high spec dedicated) for testing in 
exchange for install support, or find someone I can pay to help with the 
initial install.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=



[dataparksearch] [Forum] Indexers lock the database

2008-07-28 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: zabar
Subject: Indexers lock the database

FreeBSD 7.0/amd64
mysql 5.0.51a
During crawling, after entries like these

[74103]{12} Can't connect to host dreamtour.info:80
[74103]{15} Download timeout
[74103]{17} Download timeout

a pile of locks appears in the MySQL process list. Until these locks are killed, 
the indexer does not continue.
Please advise: what could the problem be?

The configs are given below.

cached-zoo.conf

I start it like this:
cached /usr/local/dpsearch/cached-zoo.conf

Listen 7000
DBAddr  mysql://*:[EMAIL PROTECTED]/*/?dbmode=cache
WrdFiles 4096
CacheLogWords 16384
CacheLogDels 8192
URLDataFiles 256
OptimizeAtUpdate yes
OptimizeInterval 3600
OptimizeRatio 5
VarDir /usr/local/dpsearch/var/zoo
Limit site:siteid
Limit c:category

indexer

Starting the indexer:
indexer -r -N 20 -H -W /usr/local/dpsearch/indexer-zoo.conf

DBAddr  mysql://*:[EMAIL PROTECTED]/*/?dbmode=cache&cached=localhost:7000
VarDir /usr/local/dpsearch/var/zoo
LocalCharset cp1251
CollectLinks yes
DoStore yes
Include stopwords.conf
Include langmap.conf
MinWordLength 1
MaxWordLength 25
MaxDocSize 51200
MinDocSize 2048
IndexDocSizeLimit 51200
URLSelectCacheSize 10240
MaxDepth 4
Period 600d
PeriodByHops 0 14d
PeriodByHops 1 30d
PeriodByHops 2 60d
PeriodByHops 3 120d
PeriodByHops 4 240d
PeriodByHops 5 480d
ParserTimeOut 3s
ReadTimeOut 3s
RobotsPeriod 30d
DocTimeOut 3s
ServerTable mysql://*:[EMAIL PROTECTED]/*/zoooz_server
Limit site:siteid
Limit c:category
PopRankMethod Neo
PopRankFeedBack yes
PopRankNeoIterations 10
PopRankUseTracking yes
MaxNetErrors 32
MaxSiteLevel 3
URLInfoSQL no
MarkForIndex no
CheckInsertSQL yes
DetectClones yes
Include sections.conf
RemoteCharset windows-1251
DefaultLang ru
VaryLang "ru en"
Disallow *sort=* *filmrnd.php* *trans=* *actor=* *producer=*
Disallow *sasn=???* *ortOrder=* *rderby=* *rder_by=* *rder=* *sortby=* *sort_by=*
Disallow */ad/* *&cb=???* *userpic* *showuser=?*
Disallow *video*&style=* *video*&leter=?*
Disallow *&end_mark=* *referrer*
Disallow */adm/* */admin* *login.* *=*auth*
Disallow */assets/* */classes/* */js/* */menus/*
Disallow *http://*http:/* *http://*www*/www*/*
Disallow */koi/koi* */koi/iso/* */koi/dos/* */koi/win/*
Disallow */koi8/koi* */koi8/iso/* */koi8/dos/* */koi8/win/*
Disallow */iso/*/iso/* */iso/koi8/* */iso/iso/* */iso/dos/* */iso/win/*
Disallow *out.cgi* *privatesend.* *action* *ubbmisc.* *findthread* */search.* *simplesearch*
Disallow *Ultimate.*email* *recent_user* *=*profile* *=*transfer*
Disallow *ultimatebb.*get_ip* *=reply* *send_topic* *next_topic* *edit_post*
Disallow *close_topic* *ultimatebb.*email* *delete_topic* *=agree*
Disallow *.b *.sh *.md5 *.rpm
Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2
Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z
Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx *.ico
Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat *.swf *.fla
Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra
Disallow *.vrml *.wrl *.png *.psd
Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_
Disallow *.tex *.texi *.xls *.doc *.texinfo
Disallow *.rtf *.pdf *.cdf *.ps
Disallow *.ai *.eps *.ppt *.hqx
Disallow *.cpt *.bms *.oda *.tcl
Disallow *.o *.a *.la *.so
Disallow *.pat *.pm *.m4 *.am *.css
Disallow *.map *.aif *.sit *.sea
Disallow *.m3u *.qt *.mov
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D *O=A *O=D
#Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
Include /*/disallows.conf
ReverseAlias regex ^(.*)&[a-zA-Z;]+=[a-zA-Z0-9]{32}(.*) $1$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{32}&(.*) $1?$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{32}&(.*) $1?$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{32}(.*) $1?$2
ReverseAlias regex ^(.*)&[a-zA-Z;]+=[a-zA-Z0-9]{32}(.*) $1$2
ReverseAlias regex ^(.*)[&\?][a-zA-Z;]+=[a-zA-Z0-9]{16}$ $1
ReverseAlias regex ^(.*)&[a-zA-Z;]+=[a-zA-Z0-9]{16}(.*) $1$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{16}&(.*) $1?$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{16}&(.*) $1?$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{16}(.*) $1?$2
ReverseAlias regex ^(.*)&[a-zA-Z;]+=[a-zA-Z0-9]{16}(.*) $1$2
ReverseAlias regex ^(.*)[&\?][a-zA-Z;]+=[a-zA-Z0-9]{32}$ $1
ReverseAlias regex ^(.*)([&\?])[a-zA-Z;]+=[a-zA-Z0-9]{32}&(.*) $1$2$3
ReverseAlias regex ^(.*)[&\?][a-zA-Z;]+=[a-zA-Z0-9]{16}$ $1
ReverseAlias regex ^(.*)([&\?])[a-zA-Z;]+=[a-zA-Z0-9]{16}&(.*) $1$2$3
HoldBadHrefs 30d
#UseRemoteContentType yes
AddType image/x-xpixmap *.xpm
AddType image/x-xbitmap *.xbm
AddType text/plain  *.txt  *.pl *.js *.h *.c *.pm *.e
AddType text/html   *.html *.htm
AddType text/rtf    *.rtf
AddType application/pdf *.pdf
AddType application/msword  *.doc
AddType application/vnd.ms-excel    *.xls
AddType text/x-postscript   *.ps
AddType application/unknown *.*
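
As an illustration of what the ReverseAlias regex rules in this config appear intended to do (strip 16- or 32-character session identifiers from URLs), here is a sketch of two of them in Python. It uses Python's re module rather than the regex engine DataparkSearch is built with, writes $1/$2 as \1/\2, and applies only the first matching rule; how DataparkSearch itself orders or combines the rules is not something this sketch claims. The URLs and the session id are hypothetical.

```python
import re

# Two of the ReverseAlias rules above, with $1/$2 rewritten as \1/\2
# because that is how Python's re module spells backreferences.
RULES = [
    (re.compile(r"^(.*)[&\?][a-zA-Z;]+=[a-zA-Z0-9]{32}$"), r"\1"),
    (re.compile(r"^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{32}&(.*)"), r"\1?\2"),
]

def strip_session_id(url: str) -> str:
    """Apply the first rule that matches; return the URL unchanged otherwise."""
    for pattern, replacement in RULES:
        if pattern.search(url):
            return pattern.sub(replacement, url, count=1)
    return url

SID = "0123456789abcdef" * 2  # hypothetical 32-character session id
print(strip_session_id("http://example.com/page.php?id=5&PHPSESSID=" + SID))
print(strip_session_id("http://example.com/page.php?sid=" + SID + "&id=5"))
```

Both sample URLs reduce to the same address without the session parameter, which is the point of such rules: one document should not be indexed under many session-specific URLs.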
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

[dataparksearch] [Forum] Re: install

2008-07-28 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: install

Hi,

I installed the script but when I run (make install) after successfully running 
./install.pl and make I get these errors

make[2]: *** [install-includeHEADERS] Error 1
make[2]: Leaving directory `/usr/local/dpsearch/include'
make[1]: *** [install-am] Error 2
make[1]: Leaving directory `/usr/local/dpsearch/include'
make: *** [install-recursive] Error 1

The server is 

OS: CentOS 5.x
Hardware: Intel Core 2 Duo Processor E6420/2048MB Ram/2x200GB SATA 
Drivers/100Mbps Port Speed/1600GB Bandwidth Per Month
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217236698



[dataparksearch] [Forum] Re: install

2008-07-28 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: install

It looks like you have put the sources under /usr/local/dpsearch and are trying 
to install into the same directory.
Try moving the sources into another directory, e.g. your home directory, and 
repeat the installation from there.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217236698



[dataparksearch] [Forum] Re: No

2008-07-30 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: No 

It looks like you have entered a Server command without a trailing slash. Try 
correcting it like this:
Server http://www.sina.com.cn/
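
A quick way to spot this kind of mistake is to scan the config for Server lines whose URL has no path at all. The sketch below is a hypothetical helper, not part of DataparkSearch; it assumes the command name is spelled exactly "Server" and treats any argument containing "://" as the URL.

```python
from urllib.parse import urlsplit

def servers_missing_slash(config_text: str) -> list[str]:
    """Return URLs from Server lines whose path component is empty."""
    missing = []
    for line in config_text.splitlines():
        tokens = line.split()
        # Assumption: the command is spelled exactly "Server".
        if not tokens or tokens[0] != "Server":
            continue
        for token in tokens[1:]:
            if "://" in token and urlsplit(token).path == "":
                missing.append(token)
    return missing

sample = "Server http://www.sina.com.cn\nServer http://www.sina.com.cn/\n"
print(servers_missing_slash(sample))
```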

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.com/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217405250



[dataparksearch] [Forum] Re: No

2008-07-30 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: ssharry
Subject: Re: No 

Thank you!
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.com/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217405250



[dataparksearch] [Forum] Re: Problem with install of 4.50

2008-07-31 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Problem with install of 4.50

Please look inside the config.log file in the directory where you ran 
configure/install.pl, especially at the line which starts with
checking for MySQL support...
What do this line and the few lines just after it look like in your config.log file?
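
Pulling that line and its neighbours out of a long config.log can be sketched as below. The helper name is hypothetical. It searches for the text anywhere in a line rather than only at the start, on the assumption that config.log prefixes its lines with the configure script's line number.

```python
import os

def lines_after(text: str, marker: str, count: int = 5) -> list[str]:
    """Return the first line containing marker plus the next `count` lines."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            return lines[i:i + count + 1]
    return []

# Run from the directory where configure/install.pl was executed.
if os.path.exists("config.log"):
    with open("config.log", errors="replace") as fh:
        for line in lines_after(fh.read(), "checking for MySQL support"):
            print(line)
```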
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217394663



[dataparksearch] [Forum] About Chinese charset

2008-08-01 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: ssharry
Subject: About Chinese charset

Hi,

I configured the project as follows, but I still can't see the correct Chinese words 
through the CGI.

./configure --prefix=/home/sc/ --with-pgsql=/usr/local/pgsql/ 
--with-extra-charsets=chinese --without-aspell
make 
make install

in indexer.conf

 pgsqlX
 Server http://www.sina.com.cn/
 LocalCharset BIG5
 LoadChineseList BIG5 /home/share/dpsearch-4.50/TraditionalChinese.freq 

./indexer -W  
 it runs without problem.

But when accessed through the CGI, it still can't show Chinese characters. Like this:

1. ÁªÏµÎÒÃÇ_ÐÂÀËÍø [0.006% Popularity: 0.25000] 
ÐÂÀËÍø¿Í»¡±¡PþÎñµç»¢X ÐÂÀËÍø²úÆ¡PÓû¡±¡PþÎñ¢G¬²úÆ¡P¡ÑÉѯ¢G¬¢G¡± ¡±...  
http://www.sina.com.cn/contactus.html - 21615 bytes [text/html] - Mon, 21 Jul 
2008, 20:28:17 CST 
[All results from this site ]  


Could you give me any suggestions? Thank you very much.
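
One way to investigate garbled output like the title line above is to recover the raw bytes and try decoding them with a different charset. The sketch below assumes the text was displayed as Latin-1 and tries GB2312; both choices are guesses for illustration, not something the thread confirms.

```python
# Title line exactly as it appears in the search output above.
garbled = "ÁªÏµÎÒÃÇ_ÐÂÀËÍø"

def reinterpret(text: str, shown_as: str = "latin-1", actual: str = "gb2312") -> str:
    """Recover the raw bytes behind the text and decode them with another charset."""
    return text.encode(shown_as).decode(actual)

print(reinterpret(garbled))
```

If the result is readable Chinese, the stored bytes are probably GB2312 rather than BIG5. That alone does not show at which step (indexing, storage or the search template) the charset mismatch arises.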



- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=



[dataparksearch] [Forum] Re: About Chinese charset

2008-08-01 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: About Chinese charset

Did you uncomment all the Chinese language maps in the langmap.conf file? They are 
commented out by default, since support for Chinese charsets isn't 
compiled in by default.
If you had to uncomment these maps, you have to reindex the pages already indexed.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217585036



[dataparksearch] [Forum] An Error about client_encoding

2008-08-04 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: ssharry
Subject: An Error about client_encoding


Hi

Here is the log of an error when indexing.

{sql.c:1990} Query: SELECT rec_id, hops FROM url WHERE url='http://www.verycd.com/tags/动漫/'
SQL-server message: ERROR:  invalid byte sequence for encoding "UTF8": 0xb6
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
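
The byte 0xb6 in the error message is consistent with the Chinese part of the URL having been sent in a legacy Chinese encoding rather than UTF-8. The sketch below checks this for GBK; that the indexer actually sent GBK bytes is an assumption, not something the log states.

```python
# The non-ASCII part of the URL from the failing query above.
url_part = "动漫"

gbk_bytes = url_part.encode("gbk")      # legacy Chinese encoding (assumed)
utf8_bytes = url_part.encode("utf-8")   # what a UTF8 database expects

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(gbk_bytes.hex(), is_valid_utf8(gbk_bytes))
print(utf8_bytes.hex(), is_valid_utf8(utf8_bytes))
```

The GBK form starts with byte b6 and is not valid UTF-8, which matches the byte reported by the server.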
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=03;post=



[dataparksearch] [Forum] Install for people like me cpanel - linux

2008-08-04 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Install for people like me cpanel - linux

As I couldn't find an install guide for dummies like me, this is what I did.

In cPanel, make sure you create a new MySQL database and give a user ALL 
privileges. The account and database name will look like this: acct_user and 
acct_databasename

1) SSH into your server with putty
2) cd /
3) mkdir dpsearch
4) cd /dpsearch
5) wget (url for dpsearch)
6) unpack tar file
7) cd dpsearch directory
8) ./install.pl (if this doesn't work, first type chmod 755 ./install.pl)
9) make selections
10) make
11) make install
12) cd /usr/local/dpsearch/bin
13) cp ./*.cgi /home/acct/public_html/cgi-bin
14) chown -R acct:acct /home/acct/public_html/cgi-bin
15) vi /usr/local/dpsearch/etc/indexer.conf-dist
16) add the information needed and then save it with :w indexer.conf. At the top, 
where it asks for your mysql info with foo and bar, use this format:

DBAddr  mysql://acct_user:[EMAIL PROTECTED]/acct_databasename/?dbmode=cache

17) rename all -dist files to their .conf and .htm names; type ls to see which 
ones are there, and edit them all accordingly
18) cd ../sbin
19) ./indexer -Ecreate
20) ./indexer 
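
For step 16, the shape of the DBAddr line can be sketched as below. This is only an illustration: the function name dbaddr_line is invented here, and the password and host values are placeholders, because the real ones were masked by the archive.

```python
def dbaddr_line(user: str, password: str, host: str, database: str,
                dbmode: str = "cache") -> str:
    """Build a DBAddr line in the shape shown in step 16."""
    return f"DBAddr mysql://{user}:{password}@{host}/{database}/?dbmode={dbmode}"


# "secret" and "localhost" are placeholders, not the poster's real settings.
line = dbaddr_line("acct_user", "secret", "localhost", "acct_databasename")
print(line)
```

Passing dbmode="multi" instead would give the storage mode discussed later in this thread.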

Now it should start indexing whatever you set up in your indexer.conf.

Go to www.yoursite.com/cgi-bin/search.cgi and it should show a search bar. If 
it doesn't, most likely the files have the wrong permissions or owners.

Anyway... I know this is a stupid list and all, but it took me a week to 
figure it out... now that it works... I love it! Having so much fun with this 
thing!

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=



[dataparksearch] [Forum] Re: Tested the new search

2008-08-04 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Roman
Subject: Re: Tested the new search

About stored: somewhere in the manual I saw an indexer command that reindexes the 
search database from the stored copies (I can't find right now exactly what it 
looks like). But won't it glitch, given that the links to those documents don't 
show up in search?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216509417;page=2



[dataparksearch] [Forum] Re: Tested the new search

2008-08-04 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Tested the new search

I don't yet know why the links disappear, so when indexing from the stored 
database (that is the -B switch for indexer) you may get only 30% of the documents 
from the stored database; the rest will be indexed as usual, by fetching them over 
the Internet.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216509417;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-05 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Right now everything is in dbmode multi.
As soon as I change this to cache, the following happens:

I search for mason -- no results
I search for Mason -- some results
I search for 1 -- no results

It seems that with cache turned on I cannot search any of the documents based 
on spelling. 
However, if I switch to dbmode multi, all works very well. Check it out at 
www.biblers.org/cgi-bin/search.cgi

I have put DoStore yes in all files.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-05 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

With dbmode cache you have to write out fresh URL data and limits using the 
command
./indexer -THW
after each indexing/reindexing run (or periodically, if indexing takes a long time).
Please note: if you use cached, this command exits immediately, but all the work is 
performed by cached, and this takes some time (depending on the search database size).

The "Not found rec:..." message for subdocument indexing is normal, since indexer 
first tries to fetch the subdoc from the stored database and only then from the 
remote host. This reduces traffic.
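
The "index, then run -THW" routine can be sketched as a small Python wrapper. This is a rough sketch under assumptions: the ./indexer path is taken from the post, while the function names and the idea of driving the indexer from Python are illustrative only.

```python
import subprocess
from typing import Callable, List, Sequence

INDEXER = "./indexer"  # path as used in the post; adjust for your install


def cache_mode_commands(indexer: str = INDEXER) -> List[List[str]]:
    """Commands for one indexing pass in dbmode cache: crawl, then -THW."""
    return [
        [indexer],           # index / reindex
        [indexer, "-THW"],   # write fresh URL data and limits afterwards
    ]


def run_pass(run: Callable[[Sequence[str]], None], indexer: str = INDEXER) -> None:
    """Run each command in order using the supplied runner."""
    for cmd in cache_mode_commands(indexer):
        run(cmd)


def shell_runner(cmd: Sequence[str]) -> None:
    """Real runner: raises an exception if a command fails."""
    subprocess.run(list(cmd), check=True)


# On a machine with dpsearch installed, from the sbin directory:
# run_pass(shell_runner)
```

Passing the runner in as a parameter keeps the command order easy to check without touching a real search database.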
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135



[dataparksearch] [Forum] Re: Search for XYZ. Search results: lait: 95421 / 95421 and don

2008-08-05 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: Search for XYZ. Search results: lait: 95421 / 95421  and don

Thank you,

I have done this and started indexing, and also ran the -THW command.
However, one thing is weird:

Search for Masons and you get results; search for masons and you get no results.

Also, if I click cached copy it goes to a "The webpage cannot be found" page.

Ahhh, one more!... my indexer gets stuck on .swf files!


- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1209717853



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-05 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Thank you,

I have done this and started indexing, and also ran the -THW command.
However, one thing is weird:

Search for Masons and you get results; search for masons and you get no results.

Also, if I click cached copy it goes to a "The webpage cannot be found" page.

Ahhh, one more!... my indexer gets stuck on .swf files!



- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135



[dataparksearch] [Forum] Re: How to crawl from one site to other sites using links?

2008-08-05 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: How to crawl from one site to other sites using links?

Please describe in more detail what you are expecting to get.

By default, dpsearch crawls all links between sites that have a 
corresponding Server/Realm/Subnet command in the indexer.conf file. So you need to 
write the appropriate commands in your indexer.conf file.

If you need to enable population of the links table for the PopRank 
calculation, you need to place the command
CollectLinks yes
into your indexer.conf file.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217940158



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-05 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Kicked off the indexer last night, and just came back to my office now...
17,000,000 indexed dict definitions... it's going well!
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-05 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Can I not put this GroupBySites=yes in the indexer or search.htm template?

If not, how do I pass it to my search.cgi?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-05 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

You may put it as a hidden CGI parameter into your search form:


You don't need to put it into your search template search.htm, since it is already 
there and takes either its default value or whatever was passed in the CGI parameters.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135



[dataparksearch] [Forum] ? in url

2008-08-06 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: pending
Subject: ? in url

Generally speaking, dpsearch correctly indexes my site, which uses a PHP 
framework. 

However, although it indexed all the required URLs, including 
those like http://mySiteDomain/products/1/index.html?id=353, it saved no 
words from pages with links of that form into the 'dict' table of 
the 'search' database. 

Other pages can be searched correctly. Could anyone tell me why dpsearch 
might be ignoring the content under links like: 
http://mySiteDomain/products/number/index.html?id=

Thanks in advance!
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=03;post=



[dataparksearch] [Forum] segfault

2008-08-06 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Fox
Subject: segfault

Migrated the database from 4.48 to 4.50:
indexer -Erehashstored

Search refuses to work, and this message appears in the system logs:

search.cgi[2681]: segfault at 8 ip 7ff145dcb932 sp 7fff4f530190 error 4 in 
libc-2.8.so[7ff145d3a000+141000]

After a downgrade to 4.48 everything is OK there.
On a test database everything was OK, though that one was indexed from scratch.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Thank you so much!

cache works
group by page works
indexer is running hard
Aspell is working

awesome!... thank you sooo much!

1 question for today

How to Disallow a url, for example no indexing of amazon.com


- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135



[dataparksearch] [Forum] Re: ? in url

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: ? in url

Please run the command:
./indexer -qamv5 -u http://mySiteDomain/products/1/index.html?id=353
The -v5 switch here enables full debug output, including information on why this 
page has or has not been indexed.
Please post the output of this command if it doesn't give you a clue.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=03;topic_id=1218086676



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

Put this command into your indexer.conf file:
Disallow regex amazon\.com
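
To see which URLs such a pattern would catch, here is a small Python check. It is only a sketch: Python's re module stands in for the indexer's own regex engine, which is an assumption, and the third URL is a made-up example.

```python
import re

# Python's re module stands in for the indexer's regex engine (an assumption).
pattern = re.compile(r"amazon\.com")

urls = [
    "http://www.amazon.com/gp/product/0060892080/",
    "http://www.bessel.org/",
    "http://www.amazoncom.example/",   # hypothetical: no literal dot before "com"
]

# Keep only the URLs in which the pattern occurs anywhere.
blocked = [u for u in urls if pattern.search(u)]
print(blocked)
```

The backslash matters: it makes the dot literal, so the pattern needs the exact text "amazon.com" somewhere in the URL.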

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

When I do that it gives me

indexer[9452]: {01} SubDoc.robots.txt: 'Disallow /'
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

No, this message means that a subdocument is disallowed by a rule in the 
robots.txt of the remote site.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

I am not sure what is happening,

but all my indexers seem to be stuck on amazon and nothing moves along... it gets 
worse if I put the line

Disallow regex amazon\.com  (or Regex)
 
in my indexer.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

What do you mean by "stuck amazon"?
Probably you've got a vast number of URLs from amazon.com, and indexer is deleting 
all of them according to this Disallow command.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

I am not sure what happened... but I guess you're right, it now has to delete all 
the amazon entries.

It's a lot of fine tuning, hey!


- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2



[dataparksearch] [Forum] Re: segfault | Can

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Fox
Subject: Re: segfault | Can

But after "indexer -Erehashstored" there is no way back. Apparently I will have to 
reindex from scratch.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506



[dataparksearch] [Forum] Re: segfault | Can

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: segfault | Can

Please enable core dumps for the user that search.cgi runs as, with the 
command 
limits -c unlimited
then create a report from the resulting dump as described here:
http://www.dataparksearch.org/dpsearch-misc.ru.html#bugs-core

If you made a backup of your /usr/local/dpsearch/var/ directory, you can 
roll back by restoring this directory from that backup.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2



[dataparksearch] [Forum] Re: segfault | Can

2008-08-07 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Fox
Subject: Re: segfault | Can

Started indexing from scratch and everything is OK; I had the chance to do it :) 
I don't think it makes sense to spend time on compatibility problems for now. 
If there are further problems, I'll post a dump. Thanks.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2



[dataparksearch] [Forum] Re: segfault | Can

2008-08-08 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Fox
Subject: Re: segfault | Can

Trouble with categories in version 4.50.
Indexing was done with these settings:

## LIMITS !!!
Limit c:category
...
##

Category 01
Server site http://site.name
...
When searching, we add "&c=01" to the URL and the result is "did not find any results".
In version 4.48 everything is OK.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2



[dataparksearch] [Forum] Re: segfault | Can

2008-08-08 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: segfault | Can

Is the same Limit command present in the search.htm template, or in the 
searchd.conf configuration file if searchd is used?

Add the following command to the search.htm template or to searchd.conf: 
LogLevel 5
What is then written to error_log when searching with the category limit? 
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-09 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Hi,

I have no idea what I did wrong,

but when I start my indexer (I did a ./indexer -C) it shows me the following:


[EMAIL PROTECTED] ~]# /usr/local/dpsearch/sbin/indexer
indexer[4172]: {00} indexer from dpsearch-4.50-mysql started with 
'/usr/local/dpsearch/etc/indexer.conf'
indexer[4172]: {01} Done (1 seconds, 0 documents, 0 bytes,  0.00 Kbytes/sec.)
indexer[4172]: {00} Total 1 seconds, 0 documents, 0 bytes,  0.00 Kbytes/sec,  
0.00 sec/doc, 0 bytes/doc.
indexer[4172]: {00} Neo PopRank: 0 documents, 0 pas,  0.00 Kpas/sec,  0.00 
sec/doc,  0.00 pas/doc.
[EMAIL PROTECTED] ~]#


Why doesn't it start indexing?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-09 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

What is the output of the command
/usr/local/dpsearch/sbin/indexer -S
?

Try running
/usr/local/dpsearch/sbin/indexer -a
which forces reindexing of all documents in the database.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-10 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Ok. it is running, but no dict is filled, 

Database statistics

Status  Expired   Total
-----------------------
     0   108210  111937 Not indexed yet
   200        0    5257 OK
   206        0       2 Partial OK
   301        0     238 Moved Permanently
   302        0     197 Moved Temporarily
   304        0     149 Not Modified
   401        0      53 Unauthorized
   403        0       3 Forbidden
   404        0      57 Not found
   406        0      37 Not Acceptable
   415        0     209 Unsupported Media Type
   500        0       6 Internal Server Error
   503        0      27 Service Unavailable
   504        0       2 Gateway Timeout
-----------------------
 Total   108210  118174

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-10 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

If you use dbmode cache, the dict table isn't filled. All data is stored under the 
/usr/local/dpsearch/var directory.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-10 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

I must have broken something, because there are no results anymore...
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-10 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Very confused,

If I search for "bible" I get over a thousand results, but if I then search for 
other words that appear in the results for "bible", they don't show up. What am I 
doing wrong?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3



[dataparksearch] [Forum] Re: ? in url

2008-08-10 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: pending
Subject: Re: ? in url

thanks a lot, i have figured out what the problem is. session issue for cgi
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=03;topic_id=1218086676



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

When dbmode cache is used, it uses caching to reduce disk usage. It looks like 
the word "bible" is one of the most frequent in your collection, so its buffer has 
already been flushed while the other buffers haven't filled yet.

If you use the cached daemon, you may flush all buffers using the command
/usr/local/dpsearch/sbin/indexer -TH

If you don't use the cached daemon, stop the indexer; it will flush all buffers on 
exit.

You also need to write the URL data for dbmode cache using the command
/usr/local/dpsearch/sbin/indexer -TW

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Hi, thanks as always,

I give up on cache mode, it is too much trouble... but multi is working nicely.

About the amazon exclusion: I put the line you gave me in indexer.conf, but 
it still seems to get stuck on amazon. It just looks stuck; these are the 
last lines, and it hangs every time:



k_7263062_4?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=right-1&pf_rd_r=0CZBSQYM5VNGQ1B7T737&pf_rd_t=1401&pf_rd_p=424603701&pf_rd_i=161771
indexer[18526]: {01} [] Subdoc URL: 
http://ad.doubleclick.net/adi/amzn.us.dp.books/nonfiction.true_accounts;sz=300x250;s=3;s=5;s=9;s=10;s=12;s=14;s=22;s=32;s=37;s=40;s=49;s=52;s=53;s=56;s=57;s=58;s=59;s=63;s=66;s=67;s=86;s=88;s=89;s=92;s=94;s=96;s=97;s=100;u=74fb9213ce0
indexer[18526]: {01} SubDoc.robots.txt: 'Disallow /'
indexer[18526]: {01} URL: 
http://www.amazon.com/gp/product/0060892080/ref=amb_link_7094502_3?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-5&pf_rd_r=0CZBSQYM5VNGQ1B7T737&pf_rd_t=1401&pf_rd_p=421043401&pf_rd_i=161771
indexer[18526]: {01} [] Subdoc URL: 
http://ad.doubleclick.net/adi/amzn.us.dp.books/childrens;sz=300x250;s=3;s=5;s=9;s=10;s=12;s=14;s=22;s=32;s=37;s=40;s=49;s=52;s=53;s=56;s=57;s=58;s=59;s=63;s=66;s=67;s=86;s=88;s=89;s=92;s=94;s=96;s=97;s=100;u=22c94b9c7c0e4cd982fa2a008c
indexer[18526]: {01} SubDoc.robots.txt: 'Disallow /'
indexer[18526]: {01} URL: 
http://www.amazon.com/gp/product/0061147761/ref=amb_link_7263062_6?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=right-1&pf_rd_r=0CZBSQYM5VNGQ1B7T737&pf_rd_t=1401&pf_rd_p=424603701&pf_rd_i=161771
indexer[18526]: {01} URL: 
http://www.amazon.com/gp/product/0061173509/ref=amb_link_7263082_1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=right-3&pf_rd_r=0CZBSQYM5VNGQ1B7T737&pf_rd_t=1401&pf_rd_p=424482201&pf_rd_i=161771
indexer[18526]: {01} [] Subdoc URL: 
http://ad.doubleclick.net/adi/amzn.us.dp.books/fiction_literature.fiction;sz=300
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

Please show the output for the command
/usr/local/dpsearch/sbin/indexer -v5 -n1 -u http://www.amazon.com/%

Yes, it will be huge, post it anyway.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

Place the
Allow *
command in your indexer.conf file below all of the Disallow commands. 
All Allow/Disallow commands are tried in order of appearance in 
indexer.conf, and only the first match applies.
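
The first-match rule can be modelled in a few lines of Python. This is a sketch of the ordering idea only, not of dpsearch's actual code: the rule tuples, the decide function, the use of fnmatch for wildcard patterns, and the default of "Allow" when nothing matches are all assumptions made for the illustration.

```python
import re
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, str, str]  # (action, match type, pattern)


def decide(url: str, rules: List[Rule], default: str = "Allow") -> str:
    """Return the action of the first rule whose pattern matches url."""
    for action, kind, pattern in rules:
        if kind == "regex":
            hit = re.search(pattern, url) is not None
        else:  # "string": shell-style wildcard match
            hit = fnmatchcase(url, pattern)
        if hit:
            return action   # first match wins; later rules are never consulted
    return default          # assumed fallback when no rule matches


amazon = "http://www.amazon.com/gp/product/0060892080/"
other = "http://www.bessel.org/index.html"

# "Allow *" placed first shadows every rule after it.
shadowed = [("Allow", "string", "*"), ("Disallow", "regex", r"amazon\.com")]
# The Disallow rule placed first takes effect; "Allow *" catches the rest.
fixed = [("Disallow", "regex", r"amazon\.com"), ("Allow", "string", "*")]

print(decide(amazon, shadowed))
print(decide(amazon, fixed))
print(decide(other, fixed))
```

In this model an early "Allow *" lets the amazon URL through no matter what follows, which is why the order of the commands matters.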
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

This is my indexer.conf 

Am I doing something wrong?

#VarDir /usr/local/dpsearch/var
#NewsExtensions no
#AccentExtensions no
#SyslogFacility local7
#LocalCharset iso-8859-1
#LocalCharset windows-1252
# Central Europe: Czech, Slovenian, Slovak, Hungarian
#LocalCharset iso-8859-2
#LocalCharset windows-1250
# Japanese
#LocalCharset UTF-8
CrossWords yes
CollectLinks yes
DoStore yes
StopwordFile stopwords/en.sl
Include stopwords.conf
#LangMapFile langmap/en.ascii.lm
Include langmap.conf
MinWordLength 1
MaxWordLength 32
#MaxDocSize 1048576
#MinDocSize 1024
#IndexDocSizeLimit 65536
#URLSelectCacheSize 10240
#HTTPHeader "User-Agent: My_Own_Agent"
#HTTPHeader "Accept-Language: ru, en"
#HTTPHeader "From: [EMAIL PROTECTED]"
#FlushServerTable
#ServerTable mysql://user:[EMAIL PROTECTED]/dbname/tablename
#UseDateHeader yes
Allow *
Allow Case *.HTM
Disallow *.b    *.sh   *.md5  *.rpm
Disallow *.arj  *.tar  *.zip  *.tgz  *.gz   *.z *.bz2 
Disallow *.lha  *.lzh  *.rar  *.zoo  *.ha   *.tar.Z
Disallow *.gif  *.jpg  *.jpeg *.bmp  *.tiff *.tif   *.xpm  *.xbm *.pcx
Disallow *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie *.mov  *.dat
Disallow *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff  *.ra
Disallow *.vrml *.wrl  *.png  *.psd
Disallow *.exe  *.com  *.cab  *.dll  *.bin  *.class *.ex_
Disallow *.tex  *.texi *.xls  *.doc  *.texinfo
Disallow *.rtf  *.pdf  *.cdf  *.ps
Disallow *.ai   *.eps  *.ppt  *.hqx
Disallow *.cpt  *.bms  *.oda  *.tcl
Disallow *.o    *.a    *.la   *.so 
Disallow *.pat  *.pm   *.m4   *.am   *.css
Disallow *.map  *.aif  *.sit  *.sea
Disallow *.m3u  *.qt   *.mov
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D *O=A *O=D
Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
#CheckOnly *.lha  *.lzh  *.rar  *.zoo  *.tar*.Z
#CheckOnly *.gif  *.jpg  *.jpeg *.bmp  *.tiff 
#CheckOnly *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie
#CheckOnly *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff
#CheckOnly *.vrml *.wrl  *.png
#CheckOnly *.exe  *.cab  *.dll  *.bin  *.class
#CheckOnly *.tex  *.texi *.xls  *.doc  *.texinfo
#CheckOnly *.rtf  *.pdf  *.cdf  *.ps
#CheckOnly *.ai   *.eps  *.ppt  *.hqx
#CheckOnly *.cpt  *.bms  *.oda  *.tcl
#CheckOnly *.rpm  *.m3u  *.qt   *.mov
#CheckOnly *.map  *.aif  *.sit  *.sea
# or check ANY except known text extensions using "regex" match:
#CheckOnly NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$
#HrefOnly */mail*.html */thread*.html
Allow .html .txt .php .php* .htm */ .shtml .pl 
Disallow *
#HoldBadHrefs 30d
#DeleteOlder 7d
# Default: yes
UseRemoteContentType yes
AddType image/x-xpixmap *.xpm
AddType image/x-xbitmap *.xbm
AddType image/gif   *.gif
AddType text/plain  *.txt  *.pl *.js *.h *.c *.pm *.e
AddType text/html   *.html *.htm
AddType text/rtf    *.rtf
AddType application/pdf *.pdf
AddType application/msword  *.doc
AddType application/vnd.ms-excel    *.xls
AddType text/x-postscript   *.ps
#DefaultLang en
MaxDocsPerServer -1
#MaxNetErrors 16
#ReadTimeOut 30s
#DocTimeOut 1m30s
#NetErrorDelayTime 1d
Robots yes
Cookies yes
DetectClones yes
Include sections.conf
Index yes
PopRankMethod Goo
PopRankSkipSameSite yes
PopRankFeedBack yes
Realm * 
IndexIf regex title [Jj]esus [Cc]hrist [Mm]asonry [Mm]asonic [Ff]reemason 
[Cc]hristianity [Cc]atholic [Rr]eligion [Hh]iram [Aa]bif [Aa]biff [Pp]rotestant 
[Cc]hurch [Ss]cientology [Aa]theism [Bb]aptist [Rr]ites [Kk]abala [Cc]abala 
[Tt]emplar 
IndexIf regex body [Jj]esus [Cc]hrist [Mm]asonry [Mm]asonic [Ff]reemason 
[Cc]hristianity [Cc]atholic [Rr]eligion [Hh]iram [Aa]bif [Aa]biff [Pp]rotestant 
[Cc]hurch [Ss]cientology [Aa]theism [Bb]aptist [Rr]ites [Kk]abala [Cc]abala 
[Tt]emplar 
NoIndexIf title * 
NoIndexIf body *
Disallow regex amazon\.com
Allow * 
URL en.wikipedia.org/wiki/Freemasonry
URL http://www.bessel.org/
URL http://www.sacred-texts.com/mas/
URL http://www.masonicinfo.com/
URL http;//www.masonicinfo.com/
URL http;//www.freemasons-freemasonry.com/
URL http://gnosismagazine.com/
URL http://www.freemasonrytoday.com/
URL http;//www.phoenixmasonry.org/
URL http://www.corcerstonesociety.com/
URL http://www.freemasonry.org/
URL http://albertpike.org/
URL http://freemasonry.net/somerset/
URL http://freimaurer.org/
URL http://gl-mi.org/lodges/dearborn-172/
URL http://masons.sk.ca/
URL http://mastermason.com
URL http://morelight.org/444/
URL http://mountvernon14.org/
URL http://mt.moriahlodgeno18.com/
URL http://mwphglne.org/
URL http://www.novusordosaeculorum.com/
URL http://www.churchesofchrist.net/
URL http://www.jewishencyclopedia.com/
URL http://www.fullbooks.com/
URL http://virtualreligion.net/
URL http://www.biblegateway.com/
URL http://seattlemasons.org/
URL http://valleylodge511.com/
URL http://www.2be1ask1.com/
URL http://www.alaska-mason.org/grand_lodge/
URL http://www.alphalodge729.com/
URL http://www.ancientlandmarks.com/
U

[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

Yes, it seems you need to comment out the
Allow *
command on the 31st line.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Пенетрантность DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Like this


CrossWords yes
CollectLinks yes
DoStore yes
StopwordFile stopwords/en.sl
Include stopwords.conf
Include langmap.conf
MinWordLength 1
MaxWordLength 32
#Allow *
Allow Case *.HTM
Disallow *.b    *.sh   *.md5  *.rpm
Disallow *.arj  *.tar  *.zip  *.tgz  *.gz   *.z *.bz2 
Disallow *.lha  *.lzh  *.rar  *.zoo  *.ha   *.tar.Z
Disallow *.gif  *.jpg  *.jpeg *.bmp  *.tiff *.tif   *.xpm  *.xbm *.pcx
Disallow *.vdo  *.mpeg *.mpe  *.mpg  *.avi  *.movie *.mov  *.dat
Disallow *.mid  *.mp3  *.rm   *.ram  *.wav  *.aiff  *.ra
Disallow *.vrml *.wrl  *.png  *.psd
Disallow *.exe  *.com  *.cab  *.dll  *.bin  *.class *.ex_
Disallow *.tex  *.texi *.xls  *.doc  *.texinfo
Disallow *.rtf  *.pdf  *.cdf  *.ps
Disallow *.ai   *.eps  *.ppt  *.hqx
Disallow *.cpt  *.bms  *.oda  *.tcl
Disallow *.o    *.a    *.la   *.so
Disallow *.pat  *.pm   *.m4   *.am   *.css
Disallow *.map  *.aif  *.sit  *.sea
Disallow *.m3u  *.qt   *.mov
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D *O=A *O=D
Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
Allow .html .txt .php .php* .htm */ .shtml .pl 
Disallow *
UseRemoteContentType yes
AddType image/x-xpixmap *.xpm
AddType image/x-xbitmap *.xbm
AddType image/gif   *.gif
AddType text/plain  *.txt  *.pl *.js *.h *.c *.pm *.e
AddType text/html   *.html *.htm
AddType text/rtf    *.rtf
AddType application/pdf *.pdf
AddType application/msword  *.doc
AddType application/vnd.ms-excel *.xls
AddType text/x-postscript   *.ps
MaxDocsPerServer -1
Robots yes
Cookies yes
DetectClones yes
Include sections.conf
Index yes
PopRankMethod Goo
PopRankSkipSameSite yes
PopRankFeedBack yes
Realm * 
IndexIf regex title [Jj]esus [Cc]hrist [Mm]asonry [Mm]asonic [Ff]reemason 
[Cc]hristianity [Cc]atholic [Rr]eligion [Hh]iram [Aa]bif [Aa]biff [Pp]rotestant 
[Cc]hurch [Ss]cientology [Aa]theism [Bb]aptist [Rr]ites [Kk]abala [Cc]abala 
[Tt]emplar 
IndexIf regex body [Jj]esus [Cc]hrist [Mm]asonry [Mm]asonic [Ff]reemason 
[Cc]hristianity [Cc]atholic [Rr]eligion [Hh]iram [Aa]bif [Aa]biff [Pp]rotestant 
[Cc]hurch [Ss]cientology [Aa]theism [Bb]aptist [Rr]ites [Kk]abala [Cc]abala 
[Tt]emplar 
NoIndexIf title * 
NoIndexIf body *
Disallow regex amazon\.com
Allow * 
URL en.wikipedia.org/wiki/Freemasonry

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

Yes, it is.
Please note, the commands
Disallow regex amazon\.com 
Allow *
have no effect, since all documents are already disallowed by the command
Disallow *
above.
If you need to disallow anything from the amazon.com domain, you need to move 
the command
Disallow regex amazon\.com 
above any Allow command in your config.
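
A minimal sketch of why the order matters, assuming the indexer applies the 
first Allow/Disallow rule that matches a URL and ignores the rest (which is 
what the explanation above implies). The rule tuples, the decide() helper and 
the "Allow" default for unmatched URLs are hypothetical illustrations, not 
DataparkSearch code:

```python
import re
from fnmatch import fnmatchcase

# Hypothetical rule lists: (action, kind, pattern). They only mimic the
# order of commands in a config; they are not parsed from a real file.
RULES_BROKEN = [
    ("Allow", "glob", "*.html"),
    ("Disallow", "glob", "*"),
    ("Disallow", "regex", r"amazon\.com"),  # never reached
    ("Allow", "glob", "*"),                 # never reached
]

RULES_FIXED = [
    ("Disallow", "regex", r"amazon\.com"),  # moved above every Allow
    ("Allow", "glob", "*.html"),
    ("Disallow", "glob", "*"),
]


def decide(url, rules):
    """Return the action of the first rule whose pattern matches the URL."""
    for action, kind, pattern in rules:
        if kind == "regex":
            matched = re.search(pattern, url) is not None
        else:
            matched = fnmatchcase(url, pattern)
        if matched:
            return action
    return "Allow"  # assumed default when nothing matches


if __name__ == "__main__":
    url = "http://www.amazon.com/book.html"
    print(decide(url, RULES_BROKEN))
    print(decide(url, RULES_FIXED))
```

With the broken order the amazon.com page is caught by the earlier Allow rule, 
so the later Disallow regex is never consulted; with the fixed order it is 
rejected first.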
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

That's confusing, sorry.

Like this? It looks silly!

Allow .html .txt .php .php* .htm */ .shtml .pl
Disallow regex amazon\.com
Allow *
Disallow *

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

Once again, the
Allow *
command just after
Disallow regex amazon\.com
command allows indexing of everything except amazon.com and makes any Allow 
/ Disallow command after it ineffective. It seems you need to remove this
Allow *
command.


- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

I did that. Thanks a lot for your patience.
One thing keeps happening:

My indexer keeps freezing or something. It starts, and then it stops after a 
few minutes, at different pages and places. I stop it and restart, and it keeps 
happening over and over again.

What I would love is to kick it off and just let it go for eternity.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

How many indexing threads do you start at the same time? (What is the value of 
the -N switch for indexer?)
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4



[dataparksearch] [Forum] Re: segfault | Can

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Fox
Subject: Re: segfault | Can

Limit is present in the search.htm template and in searchd.conf.
The error_log file does not appear; I managed to send the following info to 
syslog, in case it helps:

###
search.cgi started with '/home/indexer/dpsearch/etc/search.htm'
VarDir: '/home/indexer/dpsearch/var'
Affixes: 0, Spells: 0, Synonyms: 0, Acronyms: 0, Stopwords: 0
Chinese dictionary with 0 entries
Korean dictionary with 0 entries
Thai dictionary with 0 entries
Start DpsFind
DpsFind for pgsql://dpsearch:[EMAIL PROTECTED]/search/?dbmode=cache
DpsGetWords for pgsql://dpsearch:[EMAIL PROTECTED]/search/?dbmode=cache
.spell lang: en
Prepare query: mail, ltxt:mail
Segment lang:
wrd {4}: mail
00334200  - 004ce2ff 81bf0fff
num: 0
lims.0.size:1
[tree/wrd] ARetrieved rec_id: b216802 Size: 4019->9576
max_order: 0  max_order_inquery: 0
Start Order, Last-Modified and Excerpts
Stop  Order, Last-Modified and Excerpts: 0.00
Start DpsTrack
Stop  DpsTrack: 0.00
Done  DpsFind 0.002
###

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2



[dataparksearch] [Forum] Re: segfault | Can

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: segfault | Can

It looks as if there is no data in the category limit. Was the command 
indexer -TW 
run after indexing finished, and was searchd sent a -HUP signal to reload the 
URL data and limits, if they are preloaded into memory?

Does the user that searchd runs as have read permission for the files
/usr/local/dpsearch/var/tree/lim_* 
?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Hi,

At the end of the day this is the message

SQL-server message: MySQL driver: #1203: User biblers_search has already more 
than 'max_user_connections' active connections

indexer[4313]: {01} MySQL driver: #1203: User biblers_search has already more 
than 'max_user_connections' active connections
indexer[4313]: {01} Error: 'No appropriate storage support compiled'
indexer[4313]: {00} Total 104 seconds, 27 documents, 800717 bytes,  7.52 
Kbytes/sec,  3.85 sec/doc, 29656 bytes/doc.
indexer[4313]: {00} Neo PopRank: 0 documents, 0 pas,  0.00 Kpas/sec,  0.00 
sec/doc,  0.00 pas/doc.
[EMAIL PROTECTED] ~]#
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

What value of max_user_connections do you have for the user biblers_search?
How many indexers do you have running simultaneously, and how many indexing 
threads does each of them have?
By default, DataparkSearch opens one connection per indexing thread, so if 
you run 3 indexers with 4 indexing threads each, you'll have 12 connections to 
the SQL server.

To force indexer to use a single connection for all indexing threads, use the 
-U switch.
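
The arithmetic above can be sketched as a small helper. This is only an 
illustration under the assumptions stated in the reply (one connection per 
indexing thread by default, one per indexer process with -U); the function 
names are hypothetical, and other clients of the same SQL user, such as 
search.cgi, would count against max_user_connections as well:

```python
def sql_connections(indexers, threads_per_indexer, shared=False):
    """Estimate how many SQL connections the running indexers hold.

    shared=True models the -U switch: each indexer process uses a single
    connection for all of its indexing threads.
    """
    if indexers < 0 or threads_per_indexer < 0:
        raise ValueError("counts must be non-negative")
    per_indexer = 1 if shared else threads_per_indexer
    return indexers * per_indexer


def fits_limit(indexers, threads_per_indexer, max_user_connections,
               shared=False):
    """True if the estimated connection count stays within the limit."""
    needed = sql_connections(indexers, threads_per_indexer, shared)
    return needed <= max_user_connections


if __name__ == "__main__":
    # 3 indexers with 4 threads each, as in the example above
    print(sql_connections(3, 4))
    # the same setup with a shared connection per indexer
    print(sql_connections(3, 4, shared=True))
```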
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

I just increased max connections to 100, so it should be OK. I have 2 indexers 
running now.

But I wanted to use cache mode, so I did the following:

changed all dbmode multi to cache
added this line to search.htm, indexer.conf, cached.conf:
VarDir /usr/local/dpsearch/var/
./indexer -C
./indexer -Edrop
./indexer -Ecreate
cd ../var/cache | rm -rf *
/usr/local/dpsearch/sbin/indexer 

I waited a while, then ran ./indexer -TWH

But there are no search results.

I just don't know what is going wrong.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

Have you stopped
/usr/local/dpsearch/sbin/indexer
?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

I pressed Ctrl Z before I did all the other work.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-11 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

Ctrl Z suspends the program. To stop it, use Ctrl C.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-12 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

I had to make a decision anyway on multi or cache, and as multi works very 
well now, it's just the easier choice.

Thank you for your patience and kind advice!
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-12 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

Please note, dbmode cache works much faster with a huge number of URLs indexed.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5



[dataparksearch] [Forum] show total sites

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: mike
Subject: show total sites

I would like to put a blurb of info on the site:

total sites indexed
total size of index

can someone please advise me how to do this...
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=



[dataparksearch] [Forum] Re: show total sites

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: show total sites

You can find the number of sites indexed with this SQL query against the search 
database:
SELECT COUNT(*) FROM (SELECT distinct site_id FROM url) AS foo;

Please note, this query works only for PgSQL and MySQL 5.

You can find the number and the total size of the documents indexed with this 
SQL query:
SELECT COUNT(*), SUM(docsize) FROM url WHERE status IN 
(200,206,304,2200,2206,2304);

Please note, these queries are very heavy for medium and large databases, so 
it's better to run them periodically, write the numbers into a text file, and 
then include this text file in your web page. 
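
A minimal sketch of such a periodic job, to be run from cron or similar. To 
stay self-contained it uses an in-memory SQLite table as a stand-in for the 
real search database; the demo table (only the three columns the queries 
touch), the helper names and the output wording are assumptions for 
illustration. Against a real MySQL or PgSQL database you would pass a 
connection from the corresponding DB-API driver instead:

```python
import sqlite3

# The two queries quoted above, unchanged apart from line wrapping.
SITES_SQL = "SELECT COUNT(*) FROM (SELECT DISTINCT site_id FROM url) AS foo"
DOCS_SQL = ("SELECT COUNT(*), SUM(docsize) FROM url "
            "WHERE status IN (200,206,304,2200,2206,2304)")


def collect_stats(conn):
    """Return (sites, documents, total_bytes) for a DB-API connection."""
    cur = conn.cursor()
    cur.execute(SITES_SQL)
    sites = cur.fetchone()[0]
    cur.execute(DOCS_SQL)
    docs, size = cur.fetchone()
    return sites, docs, size or 0  # SUM() is NULL when no row matches


def format_stats(sites, docs, size):
    """Render the numbers as the text to include in a web page."""
    return (f"Sites indexed: {sites}\n"
            f"Documents indexed: {docs}\n"
            f"Total size: {size} bytes\n")


def write_stats(conn, path):
    """Write the current numbers to a text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(format_stats(*collect_stats(conn)))


def demo_connection():
    """Hypothetical stand-in database with a simplified url table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE url (site_id INTEGER, docsize INTEGER, status INTEGER)")
    conn.executemany(
        "INSERT INTO url VALUES (?, ?, ?)",
        [(1, 1000, 200), (1, 2000, 304), (2, 500, 404), (3, 700, 200)])
    return conn


if __name__ == "__main__":
    print(format_stats(*collect_stats(demo_connection())), end="")
```

Writing to a file keeps the heavy queries out of the page request path: the 
web page only reads the small text file.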
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1218615278



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

Yeah, I know, but I keep doing something wrong and can't get cache to work. 
Weird!
The multi database is useless if you want cache afterwards, right? It is one or 
the other, I think.

I wish it worked, but I am not so good at this!
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5



[dataparksearch] [Forum] Re: show total sites

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: show total sites

Thanks a lot!

Also, I was wondering: is there anywhere that the search terms are kept? It 
would be a great statistic to keep track of!
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1218615278



[dataparksearch] [Forum] Re: show total sites

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: show total sites

You need to enable search query tracking, see:
http://www.dataparksearch.org/dpsearch-track.en.html
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1218615278



[dataparksearch] [Forum] Cannot display search results

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: gagrilli
Subject: Cannot display search results

Hi, 
I am trying to set up DPsearch for the first time, so this is probably some 
stupid mistake, but here it is.
Apache 2.2.9, MySQL 5.0.51b, Perl 5.10.0 (just search.cgi, no mod or searchd).
I think I set up indexer.conf as per the instructions, but the indexer refuses 
to index anything unless I provide the ?socket parameter in the DBAddr command, 
pointing to my mysql.sock file. In this case I don't see how I could provide 
the desired dbmode in the same line. With the ?socket=... parameter, the 
indexer runs successfully, but my search turns up 0/0 results. I have set up a 
user and the tables get created, but there are no results. search.cgi gets 
called, as I can see from my server logs; DBAddr in search.html is the same as 
in indexer.conf; I have cache disabled.
Any pointers to what might be wrong are welcome. 
If there is anything else I can provide, I am glad to.
Thanks for the effort you put behind the project.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;post=



[dataparksearch] [Forum] Re: Cannot display search results

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Cannot display search results

You can use both the socket and dbmode parameters in DBAddr in this way:
DBAddr mysql://?socket=...&dbmode=...
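
A small sketch of how the parameters are joined: the first one follows "?" and 
each further one is appended with "&". The helper name and the sample values 
(the socket path in particular) are hypothetical; substitute your own. Values 
are joined as-is, with no URL escaping, which is an assumption based on the 
form shown above:

```python
def build_dbaddr(base, params):
    """Append '?key=value&key=value' to a DBAddr base such as 'mysql://'.

    A value of None produces a bare flag (a key without '=value').
    """
    parts = []
    for key, value in params.items():
        parts.append(key if value is None else f"{key}={value}")
    if not parts:
        return base
    return base + "?" + "&".join(parts)


if __name__ == "__main__":
    # Hypothetical socket path and dbmode, for illustration only
    print("DBAddr", build_dbaddr("mysql://", {
        "socket": "/tmp/mysql.sock",
        "dbmode": "multi",
    }))
```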

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1218653591



[dataparksearch] [Forum] Re: Cannot display search results

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: gagrilli
Subject: Re: Cannot display search results

Thanks for your quick reply, Maxime. 
I got the indexer working; it told me it had indexed 805 documents, but again 
nothing(!)
I think I am doing something wrong with the Server command, though, because 
looking at the MySQL tables I don't see the dict table populated with words, 
only the url and urlinfo ones. 
I am not interested in following the links in my documents while indexing, 
because they are not a website, only HTML and other types gathered together (I 
have external parsers in place, and anyway I search for the common words found 
in HTML docs), so I set the MaxHops command to 2. I can access the 
documents through the browser normally.
My Server command in indexer.conf is as follows:

Server http://localhost/CLI/  file:/opt/lampp/htdocs/CLI/

I take it this is the correct way to tell it not to request over HTTP but 
through the filesystem, right? 
Does the indexer respect that command, or should I use something else?
Does the search.cgi script respect that command, or should I use something 
else?

And one last question: do I just need to manually re-run indexer every time, 
or do I need to clear some cache somewhere?

Thanks again for your help, I appreciate it.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1218653591;reply=1218653955



[dataparksearch] [Forum] Re: Cannot display search results

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: gagrilli
Subject: Re: Cannot display search results

I really don't understand what else I can add to my indexer.conf so that the 
basic functionality works.
I tried changing the dbmode , I tried altering the Server directive, I tried 
changing the Period command, I tried changing the MaxHops command, I tried 
searching the forum, I tried deleting and recreating the database, I tried 
re-installing DPsearch. Please don't take this the wrong way, I'm sure the 
correct configuration is somewhere in the docs, I just can't seem to locate 
it...
OK, up to now..

--indexer connects to the database (MySQL),  runs OK, with increased verbosity 
I can see it parses my documents, with the -a option it gives 304 (not changed)
--search.cgi appears correctly on my browser (simple & extended)
--every search query has 0/0 result

I can post my .conf file if it is useful...

I don't expect to be taken by the hand here, just some direction I could follow.
I know the developers' time is not for me to abuse, but it would really mean a 
world of difference, because I'm in a somewhat tight schedule here.
Thanks again for any answer.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01&topic_id=1218653591



[dataparksearch] [Forum] Re: show total sites

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: show total sites

Do I need to Edrop Ecreate again to change it over? 
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1218615278



[dataparksearch] [Forum] Re: show total sites

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: show total sites

No, you don't need it.

You can add any URL to the database using the following SQL command:
INSERT INTO url (url, next_index_time) VALUES ('http://server.ext/', 0);
Attention: don't delete any URL this way!

You can also add any URL using the indexer command:
/usr/local/dpsearch/sbin/indexer -qiu http://server.ext/

Please note: in both cases, you need to have a corresponding 
Server/Realm/Subnet command in your config for each URL you feed in; otherwise 
every URL without an appropriate Server/Realm/Subnet command will be deleted 
when indexer tries to index it.

You can delete any URL from the database using the indexer:
/usr/local/dpsearch/sbin/indexer -Cu http://server.ext/

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1218615278



[dataparksearch] [Forum] Re: Cannot display search results

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Cannot display search results

Have you created your sections.conf file and included it from your indexer.conf 
file?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1218653591



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

If you would like to try cache mode once again, add the following command to 
your search.htm template
LogLevel 5
and show the output written to the server error log when you perform a search 
request.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

I am so sorry, but which error log? The server's error log shows no errors when 
I add LogLevel 5 to search.htm.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: getting closer to my end result

It's the error log of the web server where search.cgi is being called.

Or you can run search.cgi from command line:

/usr/local/dpsearch/bin/search.cgi bible 2>err.log

then show the content of err.log file.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5



[dataparksearch] [Forum] Re: getting closer to my end result

2008-08-13 Thread DataparkSearchForum
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: getting closer to my end result

search.cgi[3292]: {00} search.cgi started with 
'/usr/local/dpsearch/etc/search.htm'
search.cgi[3292]: {00} VarDir: '/usr/local/dpsearch/var'
search.cgi[3292]: {00} Affixes: 0, Spells: 0, Synonyms: 0, Acronyms: 0, 
Stopwords: 122
search.cgi[3292]: {00} Chinese dictionary with 0 entries
search.cgi[3292]: {00} Korean dictionary with 0 entries
search.cgi[3292]: {00} Thai dictionary with 0 entries
search.cgi[3292]: {00} Start DpsFind
search.cgi[3292]: {00} DpsFind for mysql://biblers_search:[EMAIL 
PROTECTED]/biblers_search/?dbmode=multi&trackquery
search.cgi[3292]: {00} DpsGetWords for mysql://biblers_search:[EMAIL 
PROTECTED]/biblers_search/?dbmode=multi&trackquery
search.cgi[3292]: {00} .spell lang: en
search.cgi[3292]: {00} Prepare query: bible, ltxt:bible
search.cgi[3292]: {00} Segment lang:
search.cgi[3292]: {00}  wrd {5}: bible
search.cgi[3292]: {00} Start search for 'bible'
search.cgi[3292]: {00} Stop search for 'bible'  0.10  18670 found
search.cgi[3292]: {00} Start sort by url_id 18670 words
search.cgi[3292]: {00} Stop sort by url_id: 0.00
search.cgi[3292]: {00} Start group by url_id 18670 docs
search.cgi[3292]: {00} max_order: 0  max_order_inquery: 0
search.cgi[3292]: {00} Stop group by url_id:0.08
search.cgi[3292]: {00} Start load url data 5096 docs
search.cgi[3292]: {00} Stop load url data:  0.10
search.cgi[3292]: {00} Start SORT by PATTERN 5096 words
search.cgi[3292]: {00} Stop SORT by PATTERN:0.00
search.cgi[3292]: {00} use_showcnt: 0  ratio: 0.00
search.cgi[3292]: {00} Start Order, Last-Modified and Excerpts
search.cgi[3292]: {00} [] Retrieve rec_id: 399488c7
search.cgi[3292]: {00} [] Retrieved rec_id: 399488c7 Size: 34395 Ratio: 28.07%
search.cgi[3292]: {00} [] Retrieve rec_id: d671482e
search.cgi[3292]: {00} [] Retrieved rec_id: d671482e Size: 69087 Ratio: 23.58%
search.cgi[3292]: {00} [] Retrieve rec_id: 30145be2
search.cgi[3292]: {00} [] Retrieved rec_id: 30145be2 Size: 63033 Ratio: 24.85%
search.cgi[3292]: {00} [] Retrieve rec_id: c1a76e13
search.cgi[3292]: {00} [] Retrieved rec_id: c1a76e13 Size: 17978 Ratio: 40.32%
search.cgi[3292]: {00} [] Retrieve rec_id: 2436fd71
search.cgi[3292]: {00} [] Retrieved rec_id: 2436fd71 Size: 13247 Ratio: 26.51%
search.cgi[3292]: {00} [] Retrieve rec_id: 386ba112
search.cgi[3292]: {00} [] Retrieved rec_id: 386ba112 Size: 51630 Ratio: 14.83%
search.cgi[3292]: {00} [] Retrieve rec_id: efb686f1
search.cgi[3292]: {00} [] Retrieved rec_id: efb686f1 Size: 13400 Ratio: 39.96%
search.cgi[3292]: {00} [] Retrieve rec_id: 1a1340d7
search.cgi[3292]: {00} [] Retrieved rec_id: 1a1340d7 Size: 45712 Ratio: 15.55%
search.cgi[3292]: {00} [] Retrieve rec_id: 79d513ed
search.cgi[3292]: {00} [] Retrieved rec_id: 79d513ed Size: 46894 Ratio: 17.28%
search.cgi[3292]: {00} [] Retrieve rec_id: 19371393
search.cgi[3292]: {00} [] Retrieved rec_id: 19371393 Size: 43155 Ratio: 16.39%
search.cgi[3292]: {00} Stop  Order, Last-Modified and Excerpts: 2.63
search.cgi[3292]: {00} Start DpsTrack
search.cgi[3292]: {00} Stop  DpsTrack: 0.00
search.cgi[3292]: {00} Done  DpsFind 12.925

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5


