[dataparksearch] [Forum] Re: Indexing "from lunch to the fence"
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Indexing "from lunch to the fence"

If you mean indexing the whole of Runet, then use:

Realm regex ^http://[^/\.]*\.ru/
Realm regex ^http://www.[^/\.]*\.ru/

If you mean indexing every link found on a particular site, that is not supported.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1215180806
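A quick way to see what the two Realm patterns from this post accept is to try them on sample URLs. The sketch below uses Python's re module, which is an assumption: DataparkSearch uses its own regex library and may differ in details. The host names are made up for illustration.

```python
import re

# The two patterns from the post, copied verbatim.
BARE = re.compile(r"^http://[^/\.]*\.ru/")
WWW = re.compile(r"^http://www.[^/\.]*\.ru/")


def in_realm(url: str) -> bool:
    """Return True if either Realm pattern matches the URL."""
    return bool(BARE.search(url) or WWW.search(url))


# Hypothetical sample URLs, for illustration only.
for url in (
    "http://example.ru/page.html",
    "http://www.example.ru/",
    "http://forum.example.ru/",
    "http://example.com/",
):
    print(url, in_realm(url))
```

Because [^/\.]* cannot cross a dot, the first pattern alone rejects http://www.example.ru/, which is why the second pattern is needed. Under these regex semantics, deeper subdomains such as forum.example.ru are matched by neither.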
[dataparksearch] [Forum] Re: FTP search by directory and file names
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: FTP search by directory and file names

Frankly, I'm surprised it works :) Yesterday's snapshot was unfinished; I fixed it today: http://www.dataparksearch.org/dpsearch-4.50-05072008.tar.bz2

I don't build a deb package; I only maintain the port for FreeBSD. If you give me a link to a description of how to build deb packages and where to send them (to a repository?), I will try to release the next version as a deb package.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1214665125;page=2
[dataparksearch] [Forum] Re: FTP search by directory and file names
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: MF
Subject: Re: FTP search by directory and file names

Here is the official Debian documentation: http://www.us.debian.org/doc/manuals/maint-guide/

I have little idea how dataparksearch and mnogosearch can coexist on one system; it's just that mnogosearch is already in the repositories: http://packages.debian.org/search?keywords=mnogosearch
An exception will probably have to be made.

I will probably try building a version with MySQL; if it works, I'll write back.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1214665125;page=2
[dataparksearch] [Forum] Re: RSS works only selectively
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: RSS works only selectively

Check exactly which log you are looking at. This command turns on the maximum level of debug output, so the output in error_log should grow.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=06;topic_id=1215548226
[dataparksearch] [Forum] Re: RSS works only selectively
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: RSS works only selectively

Try running this from the command line:

QUERY_STRING="%F1%EE%E1%E0%EA%E8&c=&site=&m=all&sp=1&sy=0&s=DRP&tmplt=rss.htm" ./search.cgi 2>err

and show what gets written to the file err.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=06;topic_id=1215548226
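To read the QUERY_STRING from this post, it helps to decode it. The sketch below uses Python's standard urllib.parse and assumes the search term is percent-encoded cp1251; that charset is an assumption, not something the post states. The string is taken as it appears in the post, including the first field, which has no parameter name.

```python
from urllib.parse import parse_qsl, unquote

# Query string exactly as quoted in the post.
RAW = "%F1%EE%E1%E0%EA%E8&c=&site=&m=all&sp=1&sy=0&s=DRP&tmplt=rss.htm"

# The first field carries the search term; the rest are name=value pairs.
first, _, rest = RAW.partition("&")

# Assumption: the term is cp1251-encoded (a common charset for Russian sites).
term = unquote(first, encoding="cp1251")
params = dict(parse_qsl(rest, keep_blank_values=True))

print(term)
print(params)
```

Under the cp1251 assumption the term decodes to a Russian word, and tmplt=rss.htm shows that the RSS template is the one being requested.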
[dataparksearch] [Forum] Re: Segmentation fault while indexing
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: dalex
Subject: Re: Segmentation fault while indexing

Here is a backtrace from the core dump of version 1.50 from the 5th of this month. It still ends in a segfault, except that I deleted the document it crashed on last time. Now it crashes on a different one.

# gdb /sbin/indexer core
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".

Core was generated by `:[1] URL:htdb:/04/'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from //lib/libdpsearch-4.so...done.
Loaded symbols for //lib/libdpsearch-4.so
Reading symbols from //lib/libdpcharset-4.so...done.
Loaded symbols for //lib/libdpcharset-4.so
Reading symbols from /usr/lib64/libmysqlclient.so.15...done.
Loaded symbols for /usr/lib64/libmysqlclient.so.15
Reading symbols from /lib64/tls/librt.so.1...done.
Loaded symbols for /lib64/tls/librt.so.1
Reading symbols from /lib64/libz.so.1...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /usr/lib64/libaspell.so.15...done.
Loaded symbols for /usr/lib64/libaspell.so.15
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /usr/lib64/libstdc++.so.5...done.
Loaded symbols for /usr/lib64/libstdc++.so.5
Reading symbols from /lib64/tls/libm.so.6...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/tls/libpthread.so.0...done.
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
#0 0x002a95849a0e in DpsDSTRAppendUni (dstr=Variable "dstr" is not available.
) at charset-utils.c:334
334 charset-utils.c: No such file or directory.
        in charset-utils.c
(gdb) backtrace
#0 0x002a95849a0e in DpsDSTRAppendUni (dstr=Variable "dstr" is not available.
) at charset-utils.c:334
#1 0x002a95846ea2 in DpsUniDecomposeRecursive (buf=Variable "buf" is not available.
) at unidata.c:363
#2 0x002a95846f5e in DpsUniNormalizeNFD (buf=Variable "buf" is not available.
) at unidata.c:446
#3 0x002a9584706d in DpsUniNormalizeNFC (buf=Variable "buf" is not available.
) at unidata.c:470
#4 0x002a956cf409 in DpsPrepareItem (Indexer=Variable "Indexer" is not available.
) at parsehtml.c:103
#5 0x002a956d00cd in DpsPrepareWords (Indexer=Variable "Indexer" is not available.
) at parsehtml.c:469
#6 0x002a95684e2d in DpsIndexNextURL (Indexer=Variable "Indexer" is not available.
) at indexer.c:2054
#7 0x00404607 in main (argc=Variable "argc" is not available.
) at main.c:884
(gdb) q

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;topic_id=1214392453
[dataparksearch] [Forum] Re: RSS works only selectively
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: RSS works only selectively

Try rebuilding, passing configure the --enable-syslog switch instead of --disable-syslog. Does debug information then appear in error_log or in the err file?

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=06;topic_id=1215548226
[dataparksearch] [Forum] Problem with configure
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Андрей
Subject: Problem with configure

Hello! I have just found your technology and it interested me a lot. I decided to install it on my server (ASPLinux 11), but...

checking for daemon... yes
checking for inet_addr... yes
checking for sqrt... no
checking for sqrt in -lm... yes
checking for libtre... yes
checking tre/regex.h usability... yes
checking tre/regex.h presence... yes
checking for tre/regex.h... yes
checking for ares_init in -lcares... no
checking for ares_init in -lares... no
checking for getaddrinfo in -lbind... yes
checking for hstrerror... no
checking for getaddrinfo... no
checking for inet_net_pton... no
checking for pthread_setconcurrency function prototype in pthread.h... no
checking for thr_setconcurrency function prototype in thread.h... no
checking for char*... yes
checking size of char*... configure: error: cannot compute sizeof (char*), 77
See `config.log' for more details.
configure failed: 256 at ./install.pl line 176, line 32.

Where should I look?

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;post=
[dataparksearch] [Forum] Re: Problem with configure
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Андрей
Subject: Re: Problem with configure

Here are some more excerpts from config.log:

This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by configure, which was
generated by GNU Autoconf 2.59. Invocation command line was

$ ./configure --prefix=/usr/local/dpsearch --bindir=/usr/local/dpsearch/bin --sbindir=/usr/local/dpsearch/sbin --sysconfdir=/usr/local/dpsearch/etc --localstatedir=/usr/local/dpsearch/var --libdir=/usr/local/dpsearch/lib --includedir=/usr/local/dpsearch/include --mandir=/usr/local/dpsearch/man --enable-shared --enable-syslog --enable-pthreads --enable-parser --enable-mp3 --without-aspell --enable-file --enable-http --enable-ftp --enable-htdb --enable-news --with-mysql

## - ##
## Platform. ##
## - ##

hostname = localhost
uname -m = i686
uname -r = 2.6.14-1.1653.1aspsmp
uname -s = Linux
uname -v = #1 SMP Mon Jan 23 20:08:13 EET 2006

/usr/bin/uname -p = unknown
/bin/uname -X = unknown

/bin/arch = i686
/usr/bin/arch -k = unknown
/usr/convex/getsysinfo = unknown
hostinfo = unknown
/bin/machine = unknown
/usr/bin/oslevel = unknown
/bin/universe = unknown

PATH: /usr/kerberos/sbin
PATH: /usr/kerberos/bin
PATH: /usr/local/sbin
PATH: /usr/local/bin
PATH: /sbin
PATH: /bin
PATH: /usr/sbin
PATH: /usr/bin
PATH: /usr/X11R6/bin
PATH: /usr/NX/bin
PATH: /root/bin
PATH: /usr/NX/bin

.. ... ..

## --- ##
## confdefs.h. ##
## --- ##

#define DPS_BASE_VERSION 4
#define DPS_TAIL_VERSION 49
#define DPS_VERSION_ID 449
#define HAVE_ARPA_INET_H 1
#define HAVE_ARPA_NAMESER_H 1
#define HAVE_BZERO 1
#define HAVE_DAEMON 1
#define HAVE_DLFCN_H 1
#define HAVE_FCNTL_H 1
#define HAVE_FSEEKO 1
#define HAVE_INTTYPES_H 1
#define HAVE_LIBBIND 1
#define HAVE_LIMITS_H 1
#define HAVE_MEMORY_H 1
#define HAVE_NETDB_H 1
#define HAVE_NETINET_IN_H 1
#define HAVE_NETINET_IN_SYSTM_H 1
#define HAVE_NETINET_IP_H 1
#define HAVE_NETINET_TCP_H 1
#define HAVE_PUTENV 1
#define HAVE_REGCOMP 1
#define HAVE_RESOLV_H 1
#define HAVE_SEMAPHORE_H 1
#define HAVE_SETENV 1
#define HAVE_SNPRINTF 1
#define HAVE_SOCKET 1
#define HAVE_STDINT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRCASECMP 1
#define HAVE_STRCASESTR 1
#define HAVE_STRDUP 1
#define HAVE_STRINGS_H 1
#define HAVE_STRING_H 1
#define HAVE_STRNCASECMP 1
#define HAVE_STRNDUP 1
#define HAVE_STRNLEN 1
#define HAVE_STRSTR 1
#define HAVE_STRTOK_R 1
#define HAVE_SYSLOG_H 1
#define HAVE_SYS_CDEFS_H 1
#define HAVE_SYS_IOCTL_H 1
#define HAVE_SYS_IPC_H 1
#define HAVE_SYS_MSG_H 1
#define HAVE_SYS_PARAM_H 1
#define HAVE_SYS_SELECT_H 1
#define HAVE_SYS_SEM_H 1
#define HAVE_SYS_SOCKET_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_SYS_SYSCTL_H 1
#define HAVE_SYS_TIMES_H 1
#define HAVE_SYS_TIME_H 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_WAIT_H 1
#define HAVE_TIMEGM 1
#define HAVE_TM_GMTOFF 1
#define HAVE_TRE_REGEX_H 1
#define HAVE_UNISTD_H 1
#define HAVE_UNISTD_H 1
#define HAVE_UNSETENV 1
#define HAVE_VSNPRINTF 1
#define PACKAGE "dpsearch"
#define PACKAGE_BUGREPORT ""
#define PACKAGE_NAME ""
#define PACKAGE_STRING ""
#define PACKAGE_TARNAME ""
#define PACKAGE_VERSION ""
#define STDC_HEADERS 1
#define STDC_HEADERS 1
#define VERSION "4.49"
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE 1
#endif
#ifdef __cplusplus
extern "C" void std::exit (int) throw (); using std::exit;

configure: exit 1

So far I cannot work out what is missing. Maybe some package is outdated.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;topic_id=1215757425
[dataparksearch] [Forum] Re: Segmentation fault while indexing
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: dalex
Subject: Re: Segmentation fault while indexing

> At 19:55:38 10/07/08, Maxime wrote:
>Can the table you generate contain "words" longer than 256
>characters?

I looked through it: yes, there were long runs of characters (underscores), possibly longer than 256 characters. But after I recreated the table with those long underscore runs removed, the problem did not go away.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;topic_id=1214392453;reply=1215705338
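The check discussed in this post (are there any "words" longer than 256 characters?) can be scripted instead of done by eye. This is a minimal sketch that assumes a "word" is any run of non-whitespace characters; the indexer's own tokenizer may split text differently. The function name is hypothetical.

```python
def find_long_tokens(text: str, limit: int = 256) -> list[str]:
    """Return whitespace-separated tokens longer than `limit` characters."""
    return [token for token in text.split() if len(token) > limit]


# Illustrative input: a 300-character run of underscores between normal words.
sample = "title " + "_" * 300 + " body"
for token in find_long_tokens(sample):
    print(len(token), token[:20] + "...")
```

Running it over each row of the generated table would list the rows worth inspecting. As the post notes, removing such runs did not fix the crash here, so it only rules out one suspect.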
[dataparksearch] [Forum] Re: RSS works only selectively
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: zabar
Subject: Re: RSS works only selectively

> At 18:52:16 09/07/08, Maxime wrote:
>Try rebuilding, passing configure the --enable-syslog switch instead of
>--disable-syslog.
>Does debug information then appear in error_log or in the err file?

Done; the result is the same.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=06;topic_id=1215548226;reply=1215615136
[dataparksearch] [Forum] Re: Problem with configure
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Андрей
Subject: Re: Problem with configure

> At 14:50:11 11/07/08, Maxime wrote:
>Check whether you have the packages needed for building software from sources installed;
>on Linux they are usually not installed by default.

Which ones exactly?

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;topic_id=1215757425;reply=1215773411
[dataparksearch] [Forum] Re: Problem with configure
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Problem with configure

The tools required for building are listed in the documentation: http://www.dataparksearch.org/dpsearch-toolsreq.ru.html

I cannot give you the package names for Linux, but besides the utilities listed on that page you will need to install linux-headers and the development packages for all libraries that will be used with DataparkSearch.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=04;topic_id=1215757425
[dataparksearch] [Forum] Configuration of Dataparksearch utility with Cygwin linux utility?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Anup Nair
Subject: Configuration of Dataparksearch utility with Cygwin linux utility?

Hi,

I have been trying to install DataparkSearch using Cygwin on a Windows XP SP2 system. I have downloaded the entire installation of Cygwin, all repositories. I can run install.pl successfully, but make gives errors. I used make version 3.81 and perl v5.10.0 to configure.

My system has MySQL as part of xampp 1.6.6a. I gave the path to the MySQL folder as "/cygdrive/d/xampp/mysql", where 'd' is the automounted D drive partition. I also downloaded the development version of xampp and copied the files into the running version when I got a "could not find mysql.h" error.

The install path is the default. I created the /usr/local/dpsearch directory. It fails to autodetect my MySQL database even though I have xampp running. It does detect PostgreSQL, even though I haven't installed it. I answered yes only for MySQL support and no for all the rest. For other options I gave the default (in brackets) value.

When I run make I get 6 warnings, all from sql.c: "assignment discards qualifiers from pointer target type" in functions DpsAddURL, DpsAddLink, DpsResAddDocInfoSQL, DpsHtdbGet and DpsLimitLinkSQL.

The errors listed are:
1. 'SHM_R' undeclared (first use in this function)
2. 'SHM_W' undeclared (first use in this function)
3. 'Env' undeclared (first use in this function)

Could anyone please guide me to successfully install and run Dataparksearch with Cygwin? Is there any possibility of using Dataparksearch on a Windows-based system? My main requirement is to get a search engine working to index both text and multimedia files for our intranet.

Any help will be appreciated...
[EMAIL PROTECTED]

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;post=
[dataparksearch] [Forum] Re: Configuration of Dataparksearch utility with Cygwin linux utility?
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Configuration of Dataparksearch utility with Cygwin linux utility?

DataparkSearch is Unix software. I doubt it would compile successfully on Windows. That said, I know nothing about Cygwin, so I can't advise you.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1216287469
[dataparksearch] [Forum] Re: Tested the new search
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Tested the new search

Related queries are a separate search: the qtrack table is indexed by means of DataparkSearch, and that search is accessed via HttpRequest. In essence it is a separate search.

Phone numbers are a section extracted from the text by a regular-expression pattern, which is one of the variants of the Section command.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216509417
[dataparksearch] [Forum] Re: Tested the new search
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Roman
Subject: Re: Tested the new search

I see. But wouldn't it be better to do it the way nigma.ru does (parse it from the text), so that the database doesn't have to be queried?

Here is another common glitch: on most pages the language is detected incorrectly. Russian pages get marked as bg, ro, cv, kv rather than ru.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216509417
[dataparksearch] [Forum] How To Use DPSearch
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: will harris
Subject: How To Use DPSearch

It's not entirely clear to me how to use this program. The documentation lists several options, but I am new and am not exactly sure why I would want to do certain steps over other ones. I have dpsearch configured and running fine. I just don't know what to do next.

I wanted to be able to give it search terms and have it branch out over networks looking for documents with those terms, but reading the docs it doesn't seem like that's what this does.

Can anyone help me with pointers, advice, and perhaps even example config files to see how, and why, you use the program the way you do?

Best Regards,
Will

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=
[dataparksearch] [Forum] Re: segfault | Can
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Fox
Subject: Re: segfault | Can

during indexing, after "indexer -Ecreate"

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506
[dataparksearch] [Forum] install
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: install

I would like to either offer my server (high spec dedicated) for testing in exchange for install support, or find someone I can pay to help with the initial install.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=
[dataparksearch] [Forum] Indexers lock the database
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: zabar
Subject: Indexers lock the database

FreeBSD 7.0/amd64
mysql 5.0.51a

During crawling, after log entries like these:

[74103]{12} Can't connect to host dreamtour.info:80
[74103]{15} Download timeout
[74103]{17} Download timeout

a large number of locks appear among the MySQL processes. Until those locks are killed, the indexer does not continue. Please advise what the problem might be. The configs are given below.

cached-zoo.conf
I start it like this:
cached /usr/local/dpsearch/cached-zoo.conf

Listen 7000
DBAddr mysql://*:[EMAIL PROTECTED]/*/?dbmode=cache
WrdFiles 4096
CacheLogWords 16384
CacheLogDels 8192
URLDataFiles 256
OptimizeAtUpdate yes
OptimizeInterval 3600
OptimizeRatio 5
VarDir /usr/local/dpsearch/var/zoo
Limit site:siteid
Limit c:category

indexer
I start indexer like this:
indexer -r -N 20 -H -W /usr/local/dpsearch/indexer-zoo.conf

DBAddr mysql://*:[EMAIL PROTECTED]/*/?dbmode=cache&cached=localhost:7000
VarDir /usr/local/dpsearch/var/zoo
LocalCharset cp1251
CollectLinks yes
DoStore yes
Include stopwords.conf
Include langmap.conf
MinWordLength 1
MaxWordLength 25
MaxDocSize 51200
MinDocSize 2048
IndexDocSizeLimit 51200
URLSelectCacheSize 10240
MaxDepth 4
Period 600d
PeriodByHops 0 14d
PeriodByHops 1 30d
PeriodByHops 2 60d
PeriodByHops 3 120d
PeriodByHops 4 240d
PeriodByHops 5 480d
ParserTimeOut 3s
ReadTimeOut 3s
RobotsPeriod 30d
DocTimeOut 3s
ServerTable mysql://*:[EMAIL PROTECTED]/*/zoooz_server
Limit site:siteid
Limit c:category
PopRankMethod Neo
PopRankFeedBack yes
PopRankNeoIterations 10
PopRankUseTracking yes
MaxNetErrors 32
MaxSiteLevel 3
URLInfoSQL no
MarkForIndex no
CheckInsertSQL yes
DetectClones yes
Include sections.conf
RemoteCharset windows-1251
DefaultLang ru
VaryLang "ru en"
Disallow *sort=* *filmrnd.php* *trans=* *actor=* *producer=*
Disallow *sasn=???* *ortOrder=* *rderby=* *rder_by=* *rder=* *sortby=* *sort_by=*
Disallow */ad/* *&cb=???* *userpic* *showuser=?*
Disallow *video*&style=* *video*&leter=?*
Disallow *&end_mark=* *referrer*
Disallow */adm/* */admin* *login.* *=*auth*
Disallow */assets/* */classes/* */js/* */menus/*
Disallow *http://*http:/* *http://*www*/www*/*
Disallow */koi/koi* */koi/iso/* */koi/dos/* */koi/win/*
Disallow */koi8/koi* */koi8/iso/* */koi8/dos/* */koi8/win/*
Disallow */iso/*/iso/* */iso/koi8/* */iso/iso/* */iso/dos/* */iso/win/*
Disallow *out.cgi* *privatesend.* *action* *ubbmisc.* *findthread* */search.* *simplesearch*
Disallow *Ultimate.*email* *recent_user* *=*profile* *=*transfer*
Disallow *ultimatebb.*get_ip* *=reply* *send_topic* *next_topic* *edit_post*
Disallow *close_topic* *ultimatebb.*email* *delete_topic* *=agree*
Disallow *.b *.sh *.md5 *.rpm
Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2
Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z
Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx *.ico
Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat *.swf *.fla
Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra
Disallow *.vrml *.wrl *.png *.psd
Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_
Disallow *.tex *.texi *.xls *.doc *.texinfo
Disallow *.rtf *.pdf *.cdf *.ps
Disallow *.ai *.eps *.ppt *.hqx
Disallow *.cpt *.bms *.oda *.tcl
Disallow *.o *.a *.la *.so
Disallow *.pat *.pm *.m4 *.am *.css
Disallow *.map *.aif *.sit *.sea
Disallow *.m3u *.qt *.mov
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D *O=A *O=D
#Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
Include /*/disallows.conf
ReverseAlias regex ^(.*)&[a-zA-Z;]+=[a-zA-Z0-9]{32}(.*) $1$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{32}&(.*) $1?$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{32}&(.*) $1?$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{32}(.*) $1?$2
ReverseAlias regex ^(.*)&[a-zA-Z;]+=[a-zA-Z0-9]{32}(.*) $1$2
ReverseAlias regex ^(.*)[&\?][a-zA-Z;]+=[a-zA-Z0-9]{16}$ $1
ReverseAlias regex ^(.*)&[a-zA-Z;]+=[a-zA-Z0-9]{16}(.*) $1$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{16}&(.*) $1?$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{16}&(.*) $1?$2
ReverseAlias regex ^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{16}(.*) $1?$2
ReverseAlias regex ^(.*)&[a-zA-Z;]+=[a-zA-Z0-9]{16}(.*) $1$2
ReverseAlias regex ^(.*)[&\?][a-zA-Z;]+=[a-zA-Z0-9]{32}$ $1
ReverseAlias regex ^(.*)([&\?])[a-zA-Z;]+=[a-zA-Z0-9]{32}&(.*) $1$2$3
ReverseAlias regex ^(.*)[&\?][a-zA-Z;]+=[a-zA-Z0-9]{16}$ $1
ReverseAlias regex ^(.*)([&\?])[a-zA-Z;]+=[a-zA-Z0-9]{16}&(.*) $1$2$3
HoldBadHrefs 30d
#UseRemoteContentType yes
AddType image/x-xpixmap *.xpm
AddType image/x-xbitmap *.xbm
AddType text/plain *.txt *.pl *.js *.h *.c *.pm *.e
AddType text/html *.html *.htm
AddType text/rtf *.rtf
AddType application/pdf *.pdf
AddType application/msword *.doc
AddType application/vnd.ms-excel *.xls
AddType text/x-postscript *.ps
AddType application/unknown *.*

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
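The ReverseAlias rules in this config appear intended to strip 16- and 32-character session identifiers from URLs. The sketch below replays two of them with Python's re module to show the effect. It assumes that $1, $2 behave like ordinary capture-group references; how DataparkSearch actually applies the rules may differ. The URLs, parameter names and the helper function are made up for illustration.

```python
import re


def apply_reverse_alias(url: str, pattern: str, template: str) -> str:
    """Apply one 'ReverseAlias regex <pattern> <template>' rule to a URL.

    Assumption: $N in the template refers to capture group N.
    """
    # Convert $1-style references into Python's \1 style.
    py_template = re.sub(r"\$(\d)", lambda m: "\\" + m.group(1), template)
    return re.sub(pattern, py_template, url, count=1)


SESSION = "0123456789abcdef0123456789abcdef"  # hypothetical 32-character id

# Rule: session parameter introduced by '&'.
print(apply_reverse_alias(
    "http://example.com/page.php?id=5&PHPSESSID=" + SESSION,
    r"^(.*)&[a-zA-Z;]+=[a-zA-Z0-9]{32}(.*)",
    "$1$2",
))

# Rule: session parameter introduced by '?' and followed by more parameters.
print(apply_reverse_alias(
    "http://example.com/a.php?sid=" + SESSION + "&x=1",
    r"^(.*)\?[a-zA-Z;]+=[a-zA-Z0-9]{32}&(.*)",
    "$1?$2",
))
```

Collapsing such URLs onto one canonical form keeps the same page from being stored once per session id.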
[dataparksearch] [Forum] Re: install
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Re: install

Hi,

I installed the script, but when I run (make install) after successfully running ./install.pl and make, I get these errors:

make[2]: *** [install-includeHEADERS] Error 1
make[2]: Leaving directory `/usr/local/dpsearch/include'
make[1]: *** [install-am] Error 2
make[1]: Leaving directory `/usr/local/dpsearch/include'
make: *** [install-recursive] Error 1

The server is:
OS: CentOS 5.x
Hardware: Intel Core 2 Duo Processor E6420/2048MB Ram/2x200GB SATA Drivers/100Mbps Port Speed/1600GB Bandwidth Per Month

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217236698
[dataparksearch] [Forum] Re: install
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: install

It looks like you have put the sources under /usr/local/dpsearch and are trying to install into the same directory. Try moving the sources into another directory, e.g. your home directory, and repeat the installation from that new place.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217236698
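The condition described in this reply (source tree located inside the install prefix) can be checked before running make install. A minimal sketch, assuming POSIX-style paths; the function name and the example source directory names are hypothetical.

```python
import os


def is_inside(path: str, prefix: str) -> bool:
    """Return True if `path` is `prefix` itself or lies somewhere below it."""
    path = os.path.abspath(path)
    prefix = os.path.abspath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


PREFIX = "/usr/local/dpsearch"

# Hypothetical source locations, for illustration only.
for source_dir in ("/usr/local/dpsearch/dpsearch-4.50", "/home/mike/dpsearch-4.50"):
    verdict = "move the sources elsewhere" if is_inside(source_dir, PREFIX) else "ok"
    print(source_dir, "->", verdict)
```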
[dataparksearch] [Forum] Re: No
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: No

It looks like you have entered a Server command without a trailing slash. Try correcting it like this:

Server http://www.sina.com.cn/

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.com/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217405250
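The mistake described in this reply is easy to scan for. The sketch below looks for Server lines whose URL has no path at all (no trailing slash after the host). It assumes the simplest form of the command, "Server <URL>", and takes the first argument containing "://" as the URL; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def server_lines_without_slash(config_text: str) -> list[str]:
    """Return 'Server' lines whose URL ends at the host name, with no '/' path."""
    suspicious = []
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0] != "Server":
            continue
        # Assumption: the first argument containing '://' is the URL.
        url = next((p for p in parts[1:] if "://" in p), None)
        if url is not None and urlsplit(url).path == "":
            suspicious.append(line.strip())
    return suspicious


sample = "Server http://www.sina.com.cn\nServer http://www.sina.com.cn/\n"
print(server_lines_without_slash(sample))
```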
[dataparksearch] [Forum] Re: No
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: ssharry
Subject: Re: No

Thank you!

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.com/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217405250
[dataparksearch] [Forum] Re: Problem with install of 4.50
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: Problem with install of 4.50

Please look inside the config.log file in the directory where you ran configure/install.pl, especially at the line which starts with

checking for MySQL support...

What do this line and the few lines just after it look like in your config.log file?

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217394663
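Pulling the requested excerpt out of a long config.log can be automated. This is a minimal sketch: it returns the first line containing a marker plus a few lines after it. The function name is hypothetical, and the sample log text in the usage lines is invented for illustration and is not real configure output.

```python
def lines_after_marker(log_text: str, marker: str, count: int = 5) -> list[str]:
    """Return the first line containing `marker` and up to `count` lines after it."""
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if marker in line:
            return lines[index:index + count + 1]
    return []


# Invented sample text; a real run would read the file instead, e.g.:
#   log_text = open("config.log", errors="replace").read()
sample_log = "\n".join([
    "configure:100: checking for something else",
    "configure:200: checking for MySQL support",
    "configure:201: first line after the check",
    "configure:202: second line after the check",
])
for line in lines_after_marker(sample_log, "checking for MySQL support", count=2):
    print(line)
```

Matching with "in" rather than "startswith" is deliberate, since config.log lines usually carry a "configure:NNN:" prefix before the "checking for" text.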
[dataparksearch] [Forum] About Chinese charset
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: ssharry
Subject: About Chinese charset

Hi,

I configured the project as follows, but still can't see the right Chinese words through cgi.

./configure --prefix=/home/sc/ --with-pgsql=/usr/local/pgsql/ --with-extra-charsets=chinese --without-aspell
make
make install

in indexer.conf:

pgsqlX
Server http://www.sina.com.cn/
LocalCharset BIG5
LoadChineseList BIG5 /home/share/dpsearch-4.50/TraditionalChinese.freq

./indexer -W

It runs without problem. But when accessing through cgi, it still can't show Chinese characters. Like this:

1. ÁªÏµÎÒÃÇ_ÐÂÀËÍø [0.006% Popularity: 0.25000]
ÐÂÀËÍø¿Í»¡±¡PþÎñµç»¢X ÐÂÀËÍø²úÆ¡PÓû¡±¡PþÎñ¢G¬²úÆ¡P¡ÑÉѯ¢G¬¢G¡± ¡±...
http://www.sina.com.cn/contactus.html - 21615 bytes [text/html] - Mon, 21 Jul 2008, 20:28:17 CST
[All results from this site ]

Could you give me any suggestions? Thank you very much.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=
[dataparksearch] [Forum] Re: About Chinese charset
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Maxime
Subject: Re: About Chinese charset

Did you uncomment all the Chinese language maps in the langmap.conf file? They are commented out by default, since support for Chinese charsets isn't compiled in by default. If you do need to uncomment these maps, you will have to reindex the pages already indexed.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217585036
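When search output looks like a run of accented Latin letters, it can help to work out which charset the underlying bytes really are. The sketch below shows the general technique with Python codecs: encode text in one charset, view the bytes as Latin-1, then reverse the step. Using GB2312 for the sample is an assumption for illustration, not a diagnosis of this thread; the function names are hypothetical.

```python
def view_as_latin1(text: str, real_charset: str) -> str:
    """Show how `text` looks when its `real_charset` bytes are read as Latin-1."""
    return text.encode(real_charset).decode("latin-1")


def recover(garbled: str, real_charset: str) -> str:
    """Undo the garbling above, given a guess at the real charset."""
    return garbled.encode("latin-1").decode(real_charset)


title = "新浪网"  # sample text; GB2312 below is an assumed charset
garbled = view_as_latin1(title, "gb2312")
print(garbled)
print(recover(garbled, "gb2312"))
```

If a guessed charset turns a garbled fragment back into readable text, that charset is a good candidate for the page's real encoding, which can then be compared against the LocalCharset and RemoteCharset settings in use.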
[dataparksearch] [Forum] An Error about client_encoding
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: ssharry
Subject: An Error about client_encoding

Hi,

Here is the log of an error when indexing.

{sql.c:1990} Query: SELECT rec_id, hops FROM url WHERE url='http://www.verycd.com/tags/动漫/'
SQL-server message: ERROR: invalid byte sequence for encoding "UTF8": 0xb6
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=03;post=
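The 0xb6 in this error message is consistent with the URL's Chinese path segment having been sent as GB2312 bytes to a server expecting UTF-8. The sketch below demonstrates that with Python codecs. That the indexer actually sent GB2312 bytes is an assumption; the log does not say which charset was used.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


segment = "动漫"  # path segment from the logged URL

gb_bytes = segment.encode("gb2312")   # assumed charset of the bytes sent
utf8_bytes = segment.encode("utf-8")

print(gb_bytes.hex(), is_valid_utf8(gb_bytes))
print(utf8_bytes.hex(), is_valid_utf8(utf8_bytes))
```

The GB2312 form starts with byte 0xb6, which cannot begin a UTF-8 sequence, so a UTF-8 server rejects it. Making the charset of the data sent match the client_encoding the server expects avoids the mismatch.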
[dataparksearch] [Forum] Install for people like me cpanel - linux
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Mike
Subject: Install for people like me cpanel - linux

As I couldn't find an install guide for dummies like me, this is what I did.

In cpanel, make sure you create a new mysql database and give a user ALL privileges. The account and database name will look like this: acct_user and acct_databasename

1) SSH into your server with putty
2) cd /
3) mkdir dpsearch
4) cd /dpsearch
5) wget (url for dpsearch)
6) unpack tar file
7) cd dpsearch directory
8) ./install.pl (if this doesn't work, first type chmod 755 ./install.pl)
9) make selections
10) make
11) make install
12) cd /usr/local/dpsearch/bin
13) cp ./*.cgi /home/acct/public_html/cgi-bin
14) chown -R acct:acct /home/acct/public_html/cgi-bin
15) vi /usr/local/dpsearch/etc/indexer.conf-dist
16) add the information needed and then :w indexer.conf
    At the top, where it asks for your mysql info with foo and bar, use this format:
    DBAddr mysql://acct_user:[EMAIL PROTECTED]/acct_databasename/?dbmode=cache
17) change all -dist files to .conf and .htm; check by typing ls to see which ones are there and edit them all accordingly
18) cd ../sbin
19) ./indexer -Ecreate
20) ./indexer

Now it should start indexing whatever information you set up in your indexer.conf.

Go to www.yoursite.com/cgi-bin/search.cgi and it should show a search bar. If it doesn't, most likely you have wrong permissions or owners for the files.

Anyway... I know this is a stupid list and all, but it took me a week to figure it out... now that it works... I love it! Having so much fun with this thing!

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=
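Step 17 of the list above (turning the shipped -dist files into live config files) can be scripted. A minimal sketch, assuming every file to activate simply carries a "-dist" suffix, as indexer.conf-dist does in step 15. It copies rather than renames, so the originals stay available, and it never overwrites a file that already exists. The function name is hypothetical.

```python
import shutil
from pathlib import Path

SUFFIX = "-dist"


def activate_dist_files(etc_dir: str) -> list[str]:
    """Copy each '*-dist' file in `etc_dir` to the same name without the suffix.

    Existing target files are left untouched. Returns the names created.
    """
    created = []
    for source in sorted(Path(etc_dir).glob("*" + SUFFIX)):
        target = source.with_name(source.name[: -len(SUFFIX)])
        if not target.exists():
            shutil.copyfile(source, target)
            created.append(target.name)
    return created
```

Calling activate_dist_files("/usr/local/dpsearch/etc") would cover the directory used in step 15; each copied file still needs editing by hand, as the post says.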
[dataparksearch] [Forum] Re: Tested the new search
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Roman
Subject: Re: Tested the new search

About stored: somewhere in the manual I saw an indexer command for reindexing the search database from the stored copies (I can't find right now exactly what it looks like). But won't it glitch, given that the links to those documents are themselves not visible in search?

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216509417;page=2
[dataparksearch] [Forum] Re: Tested the new search
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: Tested the new search I don't yet know why the links disappear, so when indexing from the stored database (that is the -B switch for indexer) you may get only 30% of the documents from the stored database; the rest will be indexed the usual way, by fetching them over the Internet. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216509417;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result Right now everything is on dbmode multi. As soon as I change this to cache, the following happens: I search for mason -- no results. I search for Mason -- some results. I search for 1 -- no results. It seems that with cache turned on I cannot search any of the documents based on spelling; however, if I switch to dbmode multi, everything works very well. Check it out: www.biblers.org/cgi-bin/search.cgi I have put DoStore yes in all files. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result With dbmode cache you have to write out fresh URL data and limits using the command ./indexer -THW after each indexing/reindexing run (or periodically, if indexing runs for a long time). Please note: if you use cached, this command exits immediately, but all the work is performed by cached, and this takes some time (depending on the size of the search database). The "Not found rec:..." message during subdocument indexing is normal, since indexer first tries to fetch the subdoc from the stored database and only then from the remote host. This reduces traffic. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135
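The routine described in this reply (run the indexer, then write out URL data and limits) could be scripted roughly as below. This is only a sketch: the helper names `cache_mode_cycle` and `run_cycle` and the default sbin path are assumptions made for illustration; the only dpsearch commands used are the bare indexer run and the -THW run quoted in the thread.

```python
import subprocess


def cache_mode_cycle(sbin_dir="/usr/local/dpsearch/sbin"):
    """Return the two commands in order: index, then write URL data and limits.

    The sbin_dir default is an assumption taken from paths quoted in the thread.
    """
    indexer = f"{sbin_dir}/indexer"
    return [[indexer], [indexer, "-THW"]]


def run_cycle(commands, runner=subprocess.run):
    """Run each command in turn; check=True makes a failing step raise."""
    for argv in commands:
        runner(argv, check=True)


# Typical use on the indexing host (not executed here):
#     run_cycle(cache_mode_cycle())
```

Keeping the command list separate from the runner makes it easy to print or log the commands before executing anything. When the cached daemon is in use, remember that the -THW step returns before the daemon has finished its work.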
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result Thank you, I have done this and started indexing, and also ran the -THW. However, one thing is weird: search for Masons and you get results; search for masons and you get no results. Also, if I click "cached copy" it goes to a "The webpage cannot be found" page. Ahhh, one more! My indexer gets stuck on .swf files! - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135
[dataparksearch] [Forum] Re: How to crawl from one site to other sites using links?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: How to crawl from one site to other sites using links? Please describe in more detail what you are expecting to get. By default, dpsearch crawls all links between sites that have a corresponding Server/Realm/Subnet command in the indexer.conf file, so you need to write the appropriate commands in your indexer.conf file. If you need to enable population of the links table for the PopRank calculation, place the command CollectLinks yes into your indexer.conf file. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1217940158
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result Kicked off the indexer last night and just came back to my office now: 17,000,000 indexed dict definitions. It's going well! - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result This GroupBySites=yes: can I not put this in the indexer or the search.htm template? If not, how do I pass it to my search.cgi? - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result You may put it as a hidden CGI parameter into your search form: You don't need to put it into your search template search.htm, since it is already there and takes either the default value or whatever was passed in the CGI parameters. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135
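The form markup that followed "search form:" did not survive in this archived copy. The same idea can be seen from the query-string side: a hidden form field simply ends up as one more name=value pair in the URL sent to search.cgi. The sketch below assumes the parameter is spelled GroupBySites, as in Mike's question, and that the query words travel in a parameter named q; both names, the helper `search_url`, and the host are assumptions to check against the dpsearch documentation.

```python
from urllib.parse import urlencode


def search_url(base, query, **extra):
    """Build a search.cgi URL; extra keyword arguments become CGI parameters."""
    params = {"q": query, **extra}  # "q" as the query parameter is an assumption
    return f"{base}?{urlencode(params)}"


url = search_url(
    "http://www.yoursite.com/cgi-bin/search.cgi",  # placeholder host
    "mason",
    GroupBySites="yes",  # spelling taken from the question above, unverified
)
print(url)
```

A hidden input in the search form with the same name and value would produce the same request when the form is submitted.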
[dataparksearch] [Forum] ? in url
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: pending Subject: ? in url Generally speaking, dpsearch indexes my site, which uses a PHP framework, correctly. However, although it did index all the required URLs, including ones like http://mySiteDomain/products/1/index.html?id=353, it saved no words for pages with that kind of link into the 'dict' table of the 'search' database. Other pages can be searched correctly. Could anyone tell me why dpsearch might be ignoring the content under links like: http://mySiteDomain/products/number/index.html?id= Thanks in advance! - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=03;post=
[dataparksearch] [Forum] segfault
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Fox Subject: segfault I migrated the database from 4.48 to 4.50 with indexer -Erehashstored. Search refuses to work, and this message appears in the system logs: search.cgi[2681]: segfault at 8 ip 7ff145dcb932 sp 7fff4f530190 error 4 in libc-2.8.so[7ff145d3a000+141000] After a downgrade to 4.48 everything is OK there. On the test database everything was OK, although that one was indexed from scratch. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result Thank you so much! Cache works, group by page works, the indexer is running hard, and Aspell is working awesome! Thank you sooo much! One question for today: how do I disallow a URL, for example no indexing of amazon.com? - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135
[dataparksearch] [Forum] Re: ? in url
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: ? in url Please run the command: ./indexer -qamv5 -u http://mySiteDomain/products/1/index.html?id=353 The -v5 switch here enables full debug output, including information on why this page has or has not been indexed. Please post the output of this command if it doesn't give you a clue. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=03;topic_id=1218086676
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result Put this command into your indexer.conf file: Disallow regex amazon\.com - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result When I do that it gives me indexer[9452]: {01} SubDoc.robots.txt: 'Disallow /' - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result No, this message means that a subdocument is disallowed by a rule in the robots.txt of the remote site. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result I am not sure what is happening, but all my indexers seem to be stuck on amazon and nothing moves along. It gets worse if I put the line Disallow regex amazon\.com (or Regex) in my indexer.conf - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result What do you mean by "stuck amazon"? Probably you've got a vast number of URLs from amazon.com, and indexer is deleting all of them according to this Disallow command. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result I am not sure what happened, but I guess you're right: it now has to delete all the amazon entries. It's a lot of fine tuning, hey! - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2
[dataparksearch] [Forum] Re: segfault | Can
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Fox Subject: Re: segfault | Can But after "indexer -Erehashstored" there is no way back. Apparently I will have to reindex from scratch. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506
[dataparksearch] [Forum] Re: segfault | Can
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: segfault | Can Please enable core dumps for the user that search.cgi runs as, using the command limits -c unlimited then produce a report from the resulting dump as described here: http://www.dataparksearch.org/dpsearch-misc.ru.html#bugs-core If you made a backup of your /usr/local/dpsearch/var/ directory, you can roll back by restoring the directory from that backup. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2
[dataparksearch] [Forum] Re: segfault | Can
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Fox Subject: Re: segfault | Can I started indexing from scratch and everything is OK; I got a chance to do that :) I think there is no point in spending time on compatibility problems for now. If further problems come up, I will post a dump. Thanks. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2
[dataparksearch] [Forum] Re: segfault | Can
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Fox Subject: Re: segfault | Can Trouble with categories in version 4.50. Indexing was done with these settings: ## LIMITS !!! Limit c:category ... ## Category 01 Server site http://site.name ... When searching we add "&c=01" to the URL, and the result is "did not find any results". In version 4.48 everything is OK. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2
[dataparksearch] [Forum] Re: segfault | Can
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: segfault | Can Is the same Limit command present in the search.htm template, or in the searchd.conf configuration file if searchd is used? Add the command LogLevel 5 to the search.htm template or to searchd.conf. What is then written to error_log when searching with the category limit? - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result
Hi, I have no idea what I did wrong, but when I start my indexer (I did a ./indexer -C) it shows me the following:
[EMAIL PROTECTED] ~]# /usr/local/dpsearch/sbin/indexer
indexer[4172]: {00} indexer from dpsearch-4.50-mysql started with '/usr/local/dpsearch/etc/indexer.conf'
indexer[4172]: {01} Done (1 seconds, 0 documents, 0 bytes, 0.00 Kbytes/sec.)
indexer[4172]: {00} Total 1 seconds, 0 documents, 0 bytes, 0.00 Kbytes/sec, 0.00 sec/doc, 0 bytes/doc.
indexer[4172]: {00} Neo PopRank: 0 documents, 0 pas, 0.00 Kpas/sec, 0.00 sec/doc, 0.00 pas/doc.
[EMAIL PROTECTED] ~]#
Why doesn't it start indexing?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result What is the output of the command /usr/local/dpsearch/sbin/indexer -S ? Try running /usr/local/dpsearch/sbin/indexer -a which forces reindexing of all documents in the database. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result
OK, it is running, but no dict is filled.
Database statistics
Status  Expired    Total
-----------------------------------------
     0   108210   111937  Not indexed yet
   200        0     5257  OK
   206        0        2  Partial OK
   301        0      238  Moved Permanently
   302        0      197  Moved Temporarily
   304        0      149  Not Modified
   401        0       53  Unauthorized
   403        0        3  Forbidden
   404        0       57  Not found
   406        0       37  Not Acceptable
   415        0      209  Unsupported Media Type
   500        0        6  Internal Server Error
   503        0       27  Service Unavailable
   504        0        2  Gateway Timeout
-----------------------------------------
 Total   108210   118174
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result If you use dbmode cache, the dict table isn't filled. All data is stored under the /usr/local/dpsearch/var directory. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result I must have broken something, because there are no results anymore... - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result Very confused. If I search for "bible" I get over a thousand results, but if I then search for other words that appear in the results for "bible", they don't show. What am I doing wrong? - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3
[dataparksearch] [Forum] Re: ? in url
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: pending Subject: Re: ? in url Thanks a lot, I have figured out what the problem is: a session issue for CGI. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=03;topic_id=1218086676
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result When dbmode cache is used, data is buffered to reduce disk usage. It looks like the word "bible" is one of the most used in your collection, so its buffer has already been flushed while the other buffers haven't filled yet. If you use the cached daemon, you may flush all buffers using the command /usr/local/dpsearch/sbin/indexer -TH If you don't use the cached daemon, stop the indexer; it will flush all buffers on exit. You also need to write the URL data for dbmode cache using the command /usr/local/dpsearch/sbin/indexer -TW - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result
Hi, thanks as always. I give up on cache mode, it is too much trouble, but multi is working nicely. About the amazon exclusion: I put the line you gave me in the indexer.conf, but it still seems to get stuck on amazon. It just looks stuck; these are the last lines, and it keeps hanging every time:
k_7263062_4?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=right-1&pf_rd_r=0CZBSQYM5VNGQ1B7T737&pf_rd_t=1401&pf_rd_p=424603701&pf_rd_i=161771
indexer[18526]: {01} [] Subdoc URL: http://ad.doubleclick.net/adi/amzn.us.dp.books/nonfiction.true_accounts;sz=300x250;s=3;s=5;s=9;s=10;s=12;s=14;s=22;s=32;s=37;s=40;s=49;s=52;s=53;s=56;s=57;s=58;s=59;s=63;s=66;s=67;s=86;s=88;s=89;s=92;s=94;s=96;s=97;s=100;u=74fb9213ce0
indexer[18526]: {01} SubDoc.robots.txt: 'Disallow /'
indexer[18526]: {01} URL: http://www.amazon.com/gp/product/0060892080/ref=amb_link_7094502_3?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-5&pf_rd_r=0CZBSQYM5VNGQ1B7T737&pf_rd_t=1401&pf_rd_p=421043401&pf_rd_i=161771
indexer[18526]: {01} [] Subdoc URL: http://ad.doubleclick.net/adi/amzn.us.dp.books/childrens;sz=300x250;s=3;s=5;s=9;s=10;s=12;s=14;s=22;s=32;s=37;s=40;s=49;s=52;s=53;s=56;s=57;s=58;s=59;s=63;s=66;s=67;s=86;s=88;s=89;s=92;s=94;s=96;s=97;s=100;u=22c94b9c7c0e4cd982fa2a008c
indexer[18526]: {01} SubDoc.robots.txt: 'Disallow /'
indexer[18526]: {01} URL: http://www.amazon.com/gp/product/0061147761/ref=amb_link_7263062_6?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=right-1&pf_rd_r=0CZBSQYM5VNGQ1B7T737&pf_rd_t=1401&pf_rd_p=424603701&pf_rd_i=161771
indexer[18526]: {01} URL: http://www.amazon.com/gp/product/0061173509/ref=amb_link_7263082_1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=right-3&pf_rd_r=0CZBSQYM5VNGQ1B7T737&pf_rd_t=1401&pf_rd_p=424482201&pf_rd_i=161771
indexer[18526]: {01} [] Subdoc URL: http://ad.doubleclick.net/adi/amzn.us.dp.books/fiction_literature.fiction;sz=300
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result Please show the output of the command /usr/local/dpsearch/sbin/indexer -v5 -n1 -u http://www.amazon.com/% Yes, it will be huge; post it anyway. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result Place the Allow * command in your indexer.conf file below all of the Disallow commands. All Allow/Disallow commands are tried in order of appearance in indexer.conf, and only the first match applies. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result
This is my indexer.conf. Am I doing something wrong?
#VarDir /usr/local/dpsearch/var
#NewsExtensions no
#AccentExtensions no
#SyslogFacility local7
#LocalCharset iso-8859-1
#LocalCharset windows-1252
# Central Europe: Czech, Slovenian, Slovak, Hungarian
#LocalCharset iso-8859-2
#LocalCharset windows-1250
# Japanese
#LocalCharset UTF-8
CrossWords yes
CollectLinks yes
DoStore yes
StopwordFile stopwords/en.sl
Include stopwords.conf
#LangMapFile langmap/en.ascii.lm
Include langmap.conf
MinWordLength 1
MaxWordLength 32
#MaxDocSize 1048576
#MinDocSize 1024
#IndexDocSizeLimit 65536
#URLSelectCacheSize 10240
#HTTPHeader "User-Agent: My_Own_Agent"
#HTTPHeader "Accept-Language: ru, en"
#HTTPHeader "From: [EMAIL PROTECTED]"
#FlushServerTable
#ServerTable mysql://user:[EMAIL PROTECTED]/dbname/tablename
#UseDateHeader yes
Allow *
Allow Case *.HTM
Disallow *.b *.sh *.md5 *.rpm
Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2
Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z
Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx
Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat
Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra
Disallow *.vrml *.wrl *.png *.psd
Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_
Disallow *.tex *.texi *.xls *.doc *.texinfo
Disallow *.rtf *.pdf *.cdf *.ps
Disallow *.ai *.eps *.ppt *.hqx
Disallow *.cpt *.bms *.oda *.tcl
Disallow *.o *.a *.la *.so
Disallow *.pat *.pm *.m4 *.am *.css
Disallow *.map *.aif *.sit *.sea
Disallow *.m3u *.qt *.mov
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D *O=A *O=D
Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
#CheckOnly *.lha *.lzh *.rar *.zoo *.tar*.Z
#CheckOnly *.gif *.jpg *.jpeg *.bmp *.tiff
#CheckOnly *.vdo *.mpeg *.mpe *.mpg *.avi *.movie
#CheckOnly *.mid *.mp3 *.rm *.ram *.wav *.aiff
#CheckOnly *.vrml *.wrl *.png
#CheckOnly *.exe *.cab *.dll *.bin *.class
#CheckOnly *.tex *.texi *.xls *.doc *.texinfo
#CheckOnly *.rtf *.pdf *.cdf *.ps
#CheckOnly *.ai *.eps *.ppt *.hqx
#CheckOnly *.cpt *.bms *.oda *.tcl
#CheckOnly *.rpm *.m3u *.qt *.mov
#CheckOnly *.map *.aif *.sit *.sea
# or check ANY except known text extensions using "regex" match:
#CheckOnly NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$
#HrefOnly */mail*.html */thread*.html
Allow .html .txt .php .php* .htm */ .shtml .pl
Disallow *
#HoldBadHrefs 30d
#DeleteOlder 7d
# Default: yes
UseRemoteContentType yes
AddType image/x-xpixmap *.xpm
AddType image/x-xbitmap *.xbm
AddType image/gif *.gif
AddType text/plain *.txt *.pl *.js *.h *.c *.pm *.e
AddType text/html *.html *.htm
AddType text/rtf *.rtf
AddType application/pdf *.pdf
AddType application/msword *.doc
AddType application/vnd.ms-excel *.xls
AddType text/x-postscript *.ps
#DefaultLang en
MaxDocsPerServer -1
#MaxNetErrors 16
#ReadTimeOut 30s
#DocTimeOut 1m30s
#NetErrorDelayTime 1d
Robots yes
Cookies yes
DetectClones yes
Include sections.conf
Index yes
PopRankMethod Goo
PopRankSkipSameSite yes
PopRankFeedBack yes
Realm *
IndexIf regex title [Jj]esus [Cc]hrist [Mm]asonry [Mm]asonic [Ff]reemason [Cc]hristianity [Cc]atholic [Rr]eligion [Hh]iram [Aa]bif [Aa]biff [Pp]rotestant [Cc]hurch [Ss]cientology [Aa]theism [Bb]aptist [Rr]ites [Kk]abala [Cc]abala [Tt]emplar
IndexIf regex body [Jj]esus [Cc]hrist [Mm]asonry [Mm]asonic [Ff]reemason [Cc]hristianity [Cc]atholic [Rr]eligion [Hh]iram [Aa]bif [Aa]biff [Pp]rotestant [Cc]hurch [Ss]cientology [Aa]theism [Bb]aptist [Rr]ites [Kk]abala [Cc]abala [Tt]emplar
NoIndexIf title *
NoIndexIf body *
Disallow regex amazon\.com
Allow *
URL en.wikipedia.org/wiki/Freemasonry
URL http://www.bessel.org/
URL http://www.sacred-texts.com/mas/
URL http://www.masonicinfo.com/
URL http;//www.masonicinfo.com/
URL http;//www.freemasons-freemasonry.com/
URL http://gnosismagazine.com/
URL http://www.freemasonrytoday.com/
URL http;//www.phoenixmasonry.org/
URL http://www.corcerstonesociety.com/
URL http://www.freemasonry.org/
URL http://albertpike.org/
URL http://freemasonry.net/somerset/
URL http://freimaurer.org/
URL http://gl-mi.org/lodges/dearborn-172/
URL http://masons.sk.ca/
URL http://mastermason.com
URL http://morelight.org/444/
URL http://mountvernon14.org/
URL http://mt.moriahlodgeno18.com/
URL http://mwphglne.org/
URL http://www.novusordosaeculorum.com/
URL http://www.churchesofchrist.net/
URL http://www.jewishencyclopedia.com/
URL http://www.fullbooks.com/
URL http://virtualreligion.net/
URL http://www.biblegateway.com/
URL http://seattlemasons.org/
URL http://valleylodge511.com/
URL http://www.2be1ask1.com/
URL http://www.alaska-mason.org/grand_lodge/
URL http://www.alphalodge729.com/
URL http://www.ancientlandmarks.com/
U
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result Yes, it seems you need to comment out the Allow * command on the 31st line. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result
Like this?
CrossWords yes
CollectLinks yes
DoStore yes
StopwordFile stopwords/en.sl
Include stopwords.conf
Include langmap.conf
MinWordLength 1
MaxWordLength 32
#Allow *
Allow Case *.HTM
Disallow *.b *.sh *.md5 *.rpm
Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2
Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z
Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx
Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat
Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra
Disallow *.vrml *.wrl *.png *.psd
Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_
Disallow *.tex *.texi *.xls *.doc *.texinfo
Disallow *.rtf *.pdf *.cdf *.ps
Disallow *.ai *.eps *.ppt *.hqx
Disallow *.cpt *.bms *.oda *.tcl
Disallow *.o *.a *.la *.so
Disallow *.pat *.pm *.m4 *.am *.css
Disallow *.map *.aif *.sit *.sea
Disallow *.m3u *.qt *.mov
Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D *O=A *O=D
Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$
Allow .html .txt .php .php* .htm */ .shtml .pl
Disallow *
UseRemoteContentType yes
AddType image/x-xpixmap *.xpm
AddType image/x-xbitmap *.xbm
AddType image/gif *.gif
AddType text/plain *.txt *.pl *.js *.h *.c *.pm *.e
AddType text/html *.html *.htm
AddType text/rtf *.rtf
AddType application/pdf *.pdf
AddType application/msword *.doc
AddType application/vnd.ms-excel *.xls
AddType text/x-postscript *.ps
MaxDocsPerServer -1
Robots yes
Cookies yes
DetectClones yes
Include sections.conf
Index yes
PopRankMethod Goo
PopRankSkipSameSite yes
PopRankFeedBack yes
Realm *
IndexIf regex title [Jj]esus [Cc]hrist [Mm]asonry [Mm]asonic [Ff]reemason [Cc]hristianity [Cc]atholic [Rr]eligion [Hh]iram [Aa]bif [Aa]biff [Pp]rotestant [Cc]hurch [Ss]cientology [Aa]theism [Bb]aptist [Rr]ites [Kk]abala [Cc]abala [Tt]emplar
IndexIf regex body [Jj]esus [Cc]hrist [Mm]asonry [Mm]asonic [Ff]reemason [Cc]hristianity [Cc]atholic [Rr]eligion [Hh]iram [Aa]bif [Aa]biff [Pp]rotestant [Cc]hurch [Ss]cientology [Aa]theism [Bb]aptist [Rr]ites [Kk]abala [Cc]abala [Tt]emplar
NoIndexIf title *
NoIndexIf body *
Disallow regex amazon\.com
Allow *
URL en.wikipedia.org/wiki/Freemasonry
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result Yes, that's right. Please note that the commands Disallow regex amazon\.com Allow * have no effect, since all documents are already disallowed by the Disallow * command above them. If you need to disallow everything from the amazon.com domain, move the command Disallow regex amazon\.com above all the Allow commands in your config. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=3
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result That's confusing, sorry. Like this? It looks silly! Allow .html .txt .php .php* .htm */ .shtml .pl Disallow regex amazon\.com Allow * Disallow * - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result Once again: the Allow * command just after the Disallow regex amazon\.com command allows indexing of everything except amazon.com and makes every Allow/Disallow command after it meaningless. It seems you need to remove this Allow * command. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4
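The "first match wins" behaviour discussed in this thread can be illustrated with a small model. This is a simplified sketch, not dpsearch's own matching code: it assumes wildcard patterns behave like shell globs and that matching ignores case, and the rule list, the `matches` and `decide` helpers, and the `*.html` pattern are all made up for the illustration.

```python
import re
from fnmatch import fnmatchcase

# (action, kind, pattern) in the order the commands would appear in indexer.conf.
# The amazon rule comes first, as recommended in the thread.
RULES = [
    ("Disallow", "regex", r"amazon\.com"),
    ("Allow", "string", "*.html"),
    ("Allow", "string", "*/"),
    ("Disallow", "string", "*"),
]


def matches(kind, pattern, url):
    """Assumed semantics: unanchored regex search, or glob-style wildcards."""
    if kind == "regex":
        return re.search(pattern, url, re.IGNORECASE) is not None
    return fnmatchcase(url.lower(), pattern.lower())


def decide(url, rules=RULES):
    """Return the action of the first rule that matches; later rules are ignored."""
    for action, kind, pattern in rules:
        if matches(kind, pattern, url):
            return action
    return None  # no rule matched; what dpsearch does then is not modelled here


amazon = "http://www.amazon.com/gp/product/0060892080/index.html"
print(decide(amazon))
print(decide("http://www.bessel.org/index.html"))

# An early "Allow *" shadows every rule after it, including the amazon one.
early_allow = [("Allow", "string", "*")] + RULES
print(decide(amazon, early_allow))
```

In this model, putting the amazon rule first blocks those URLs even though they end in .html, while an Allow * placed ahead of it lets them through, which is the ordering problem the thread keeps running into.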
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result I did that, thanks a lot for your patience. One thing keeps happening: my indexer keeps freezing or something. It starts, and then it stops after a few minutes, at different pages and places. I stop it and restart it, and it keeps happening over and over again. What I would love is to kick it off and just let it run for eternity. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result How many indexing threads do you start at the same time? (What is the value of the -N switch for indexer?) - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4
[dataparksearch] [Forum] Re: segfault | Can
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Fox Subject: Re: segfault | Can Limit is present in both the search.htm template and searchd.conf. The error_log file does not appear, but I managed to get the following output into syslog, in case it helps:
###
search.cgi started with '/home/indexer/dpsearch/etc/search.htm'
VarDir: '/home/indexer/dpsearch/var'
Affixes: 0, Spells: 0, Synonyms: 0, Acronyms: 0, Stopwords: 0
Chinese dictionary with 0 entries
Korean dictionary with 0 entries
Thai dictionary with 0 entries
Start DpsFind
DpsFind for pgsql://dpsearch:[EMAIL PROTECTED]/search/?dbmode=cache
DpsGetWords for pgsql://dpsearch:[EMAIL PROTECTED]/search/?dbmode=cache
.spell lang: en
Prepare query: mail, ltxt:mail
Segment lang:
wrd {4}: mail
00334200 - 004ce2ff 81bf0fff num: 0 lims.0.size:1
[tree/wrd] ARetrieved rec_id: b216802 Size: 4019->9576
max_order: 0 max_order_inquery: 0
Start Order, Last-Modified and Excerpts
Stop Order, Last-Modified and Excerpts: 0.00
Start DpsTrack
Stop DpsTrack: 0.00
Done DpsFind 0.002
###
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2
[dataparksearch] [Forum] Re: segfault | Can
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: segfault | Can It looks as if there is no data in the category limit. Was the indexer -TW command run after indexing finished, and was searchd sent a -HUP signal to reload the URL data and limits, if they are preloaded into memory? Does the user that searchd runs as have read permission for the files /usr/local/dpsearch/var/tree/lim_* ? - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1216734506;page=2
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result Hi, At the end of the day this is the message:
SQL-server message: MySQL driver: #1203: User biblers_search has already more than 'max_user_connections' active connections
indexer[4313]: {01} MySQL driver: #1203: User biblers_search has already more than 'max_user_connections' active connections
indexer[4313]: {01} Error: 'No appropriate storage support compiled'
indexer[4313]: {00} Total 104 seconds, 27 documents, 800717 bytes, 7.52 Kbytes/sec, 3.85 sec/doc, 29656 bytes/doc.
indexer[4313]: {00} Neo PopRank: 0 documents, 0 pas, 0.00 Kpas/sec, 0.00 sec/doc, 0.00 pas/doc.
[EMAIL PROTECTED] ~]#
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result What value of max_user_connections do you have for the user biblers_search? How many indexers are you running simultaneously, and how many indexing threads does each of them have? By default, DataparkSearch opens one connection per indexing thread, so if you run 3 indexers with 4 indexing threads each, you'll have 12 connections to the SQL server. To force indexer to use a single connection for all of its indexing threads, use the -U switch. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4
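The connection arithmetic in this reply can be written down as a tiny helper for comparing against max_user_connections. This is only a sketch: the function name and the `shared` parameter are made up for illustration, and it assumes that -U means one connection per indexer process.

```python
def sql_connections(indexers, threads_per_indexer, shared=False):
    """Estimate the SQL connections opened by a group of indexer processes.

    shared=True models starting every indexer with the -U switch,
    read here as one connection per process instead of one per thread.
    """
    per_indexer = 1 if shared else threads_per_indexer
    return indexers * per_indexer


# The example from the reply: 3 indexers with 4 indexing threads each.
print(sql_connections(3, 4))               # 12
print(sql_connections(3, 4, shared=True))  # 3
```

If the first number is above the max_user_connections value set for the database user, MySQL error #1203 like the one quoted above is the expected outcome.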
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result I just increased max connections to 100, so it should be OK. I have 2 indexers running now, BUT... I wanted to use cache mode and changed all dbmode multi to cache. I added this line to search.htm, indexer.conf and cached.conf:
VarDir /usr/local/dpsearch/var/
Then I ran:
./indexer -C
./indexer -Edrop
./indexer -Ecreate
cd ../var/cache | rm -rf *
/usr/local/dpsearch/sbin/indexer
I waited a while, then did ./indexer -TWH. But... no search results. I just don't know what is going wrong.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result Have you stopped /usr/local/dpsearch/sbin/indexer ? - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result I pressed Ctrl Z before I did all the other work... - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result Ctrl Z suspends the program. To stop it, use Ctrl C. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=4
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result I had to make a decision on multi or cache anyway, and as multi works very well now, it's just the easier choice. Thank you for your patience and kind advice! - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result Please note, dbmode cache works much faster when a huge number of URLs is indexed. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5
[dataparksearch] [Forum] show total sites
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: mike Subject: show total sites I would like to put a blurb of info on the site: total sites indexed and total size of the index. Can someone please advise me how to do this? - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;post=
[dataparksearch] [Forum] Re: show total sites
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: show total sites You can find the number of sites indexed with this SQL query against the search database:
SELECT COUNT(*) FROM (SELECT distinct site_id FROM url) AS foo;
Please note, this query works only for PgSQL and MySQL 5. The number and the total size of the documents indexed can be found with this SQL query:
SELECT COUNT(*), SUM(docsize) FROM url WHERE status IN (200,206,304,2200,2206,2304);
Please note, these queries are very heavy for medium and large databases, so it's better to run them periodically, write the numbers into a text file, and then include this text file in your web page.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1218615278
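The "run periodically and write to a text file" advice could look roughly like the sketch below. It is not part of DataparkSearch. The two queries are the ones from the reply; everything else is an assumption for illustration: the `write_stats` name, the output file name, and the in-memory SQLite table (reduced to the three columns the queries touch, with made-up rows) that stands in for the real MySQL or PgSQL database.

```python
import sqlite3
import tempfile
from pathlib import Path

SITES_SQL = "SELECT COUNT(*) FROM (SELECT DISTINCT site_id FROM url) AS foo"
DOCS_SQL = ("SELECT COUNT(*), SUM(docsize) FROM url "
            "WHERE status IN (200,206,304,2200,2206,2304)")


def write_stats(conn, out_path):
    """Run both queries once and cache the numbers in a text file."""
    cur = conn.cursor()
    cur.execute(SITES_SQL)
    (sites,) = cur.fetchone()
    cur.execute(DOCS_SQL)
    docs, size = cur.fetchone()
    # SUM() yields NULL/None when no row matches, so fall back to 0.
    text = f"sites: {sites}\ndocuments: {docs}\nbytes: {size or 0}\n"
    Path(out_path).write_text(text)
    return text


# Demo on an in-memory SQLite stand-in with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url (site_id INTEGER, docsize INTEGER, status INTEGER)")
conn.executemany("INSERT INTO url VALUES (?, ?, ?)",
                 [(1, 1000, 200), (1, 2000, 304), (2, 500, 404), (3, 700, 200)])
stats_file = Path(tempfile.gettempdir()) / "dpsearch_stats.txt"  # hypothetical name
print(write_stats(conn, stats_file), end="")
```

In a real setup the connection would come from a MySQL or PgSQL driver, the script would be run from cron or a similar scheduler, and the web page would only ever read the text file.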
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result Yeah, I know, but I keep doing something wrong and can't get cache to work... weird! The multi database is useless if you want cache afterwards, right? It is one or the other, I think... I wish it worked, but I am not so good! - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5
[dataparksearch] [Forum] Re: show total sites
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: show total sites Thanks a lot! Also, I was wondering: is there anywhere that the search terms are kept? It would be a great statistic to keep track of! - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1218615278
[dataparksearch] [Forum] Re: show total sites
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: show total sites You need to enable search query tracking, see: http://www.dataparksearch.org/dpsearch-track.en.html - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1218615278
[dataparksearch] [Forum] Cannot display search results
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: gagrilli Subject: Cannot display search results Hi, I'm trying to set up DPsearch for the first time, so this is probably some stupid mistake, but here it is. Apache 2.2.9, MySQL 5.0.51b, Perl 5.10.0 (just search.cgi, no mod or searchd). I think I set up indexer.conf as per the instructions, but the indexer refuses to index anything unless I provide the ?socket parameter in the DBAddr command, pointing to my mysql.sock file. In that case I don't see how I could provide the desired dbmode in the same line (?). With the ?socket=... parameter the indexer runs successfully, but my search turns up 0/0 results. I have set up a user and the tables get created, but no results. search.cgi gets called, as I can see from my server logs; DBAddr in search.html is the same as in indexer.conf; I have cache disabled... Any pointers to what might be wrong are welcome. Anything else I can provide, I am glad to. Thanks for the effort you put behind the project. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;post=
[dataparksearch] [Forum] Re: Cannot display search results
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: Cannot display search results You can use both the socket and dbmode parameters in DBAddr this way: DBAddr mysql://?socket=...&dbmode=... - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1218653591
[dataparksearch] [Forum] Re: Cannot display search results
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: gagrilli Subject: Re: Cannot display search results Thanks for your quick reply, Maxime. I got the indexer working; it told me it had 805 documents indexed, but again nothing(!). I think I am doing something wrong with the Server command, though, because looking at the MySQL tables I don't see the dict table populated with words, only the url and urlinfo ones. I am not interested in following the links in my documents while indexing, because they are not a website, only HTML and other types gathered together (I have external parsers in place, and anyway I search for the common words found in HTML docs), so I set the MaxHops command to 2. I can access the documents through the browser normally. My Server directive in indexer.conf is as follows: Server http://localhost/CLI/ file:/opt/lampp/htdocs/CLI/ I take it this is the correct way to tell it to fetch the documents through the filesystem rather than over HTTP, right? Does the indexer respect that command, or should I use something else? Does the search.cgi script respect that command, or should I use something else? And one last question: do I just need to manually re-run indexer every time, or do I need to clear some cache somewhere? Thanks again for your help, I appreciate it. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1218653591;reply=1218653955
[dataparksearch] [Forum] Re: Cannot display search results
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: gagrilli Subject: Re: Cannot display search results I really don't understand what else I can add to my indexer.conf so that the basic functionality appears. I tried changing the dbmode, I tried altering the Server directive, I tried changing the Period command, I tried changing the MaxHops command, I tried searching the forum, I tried deleting and recreating the database, I tried re-installing DPsearch. Please don't take this the wrong way; I'm sure the correct configuration is somewhere in the docs, I just can't seem to locate it. OK, up to now:
--indexer connects to the database (MySQL) and runs OK; with increased verbosity I can see it parses my documents; with the -a option it gives 304 (not changed)
--search.cgi appears correctly in my browser (simple & extended)
--every search query has 0/0 results
I can post my .conf file if it is useful. I don't expect to be taken by the hand here, just some direction I could follow. I know the developers' time is not for me to abuse, but it would really mean a world of difference, because I'm on a somewhat tight schedule here. Thanks again for any answer.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01&topic_id=1218653591
[dataparksearch] [Forum] Re: show total sites
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: show total sites Do I need to Edrop Ecreate again to change it over? - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1218615278
[dataparksearch] [Forum] Re: show total sites
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: show total sites No, you don't need it. You can add any URL to the database using the following SQL command:
INSERT INTO url (url, next_index_time) VALUES ('http://server.ext/', 0);
Attention: don't delete any URL this way! You can also add any URL using the indexer command:
/usr/local/dpsearch/sbin/indexer -qiu http://server.ext/
Please note: in both cases, you need to have a corresponding Server/Realm/Subnet command in your config for each URL fed; otherwise every URL without an appropriate Server/Realm/Subnet command will be deleted when indexer tries to index it. You can delete any URL from the database using the indexer:
/usr/local/dpsearch/sbin/indexer -Cu http://server.ext/
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1218615278
[dataparksearch] [Forum] Re: Cannot display search results
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: Cannot display search results Have you created your sections.conf file and included it from your indexer.conf file? - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1218653591
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result If you would like to try cache mode once again, add the following command to your search.htm template: LogLevel 5 and show the output written to the server error log when you perform a search request. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result I am so sorry, but which error log? The server's error log shows no errors when I add LogLevel 5 to search.htm. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Maxime Subject: Re: getting closer to my end result It's the error log of the web server on which search.cgi is called. Or you can run search.cgi from the command line: /usr/local/dpsearch/bin/search.cgi bible 2>err.log and then show the content of the err.log file. - - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5
[dataparksearch] [Forum] Re: getting closer to my end result
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Name: Mike Subject: Re: getting closer to my end result
search.cgi[3292]: {00} search.cgi started with '/usr/local/dpsearch/etc/search.htm'
search.cgi[3292]: {00} VarDir: '/usr/local/dpsearch/var'
search.cgi[3292]: {00} Affixes: 0, Spells: 0, Synonyms: 0, Acronyms: 0, Stopwords: 122
search.cgi[3292]: {00} Chinese dictionary with 0 entries
search.cgi[3292]: {00} Korean dictionary with 0 entries
search.cgi[3292]: {00} Thai dictionary with 0 entries
search.cgi[3292]: {00} Start DpsFind
search.cgi[3292]: {00} DpsFind for mysql://biblers_search:[EMAIL PROTECTED]/biblers_search/?dbmode=multi&trackquery
search.cgi[3292]: {00} DpsGetWords for mysql://biblers_search:[EMAIL PROTECTED]/biblers_search/?dbmode=multi&trackquery
search.cgi[3292]: {00} .spell lang: en
search.cgi[3292]: {00} Prepare query: bible, ltxt:bible
search.cgi[3292]: {00} Segment lang:
search.cgi[3292]: {00} wrd {5}: bible
search.cgi[3292]: {00} Start search for 'bible'
search.cgi[3292]: {00} Stop search for 'bible' 0.10 18670 found
search.cgi[3292]: {00} Start sort by url_id 18670 words
search.cgi[3292]: {00} Stop sort by url_id: 0.00
search.cgi[3292]: {00} Start group by url_id 18670 docs
search.cgi[3292]: {00} max_order: 0 max_order_inquery: 0
search.cgi[3292]: {00} Stop group by url_id:0.08
search.cgi[3292]: {00} Start load url data 5096 docs
search.cgi[3292]: {00} Stop load url data: 0.10
search.cgi[3292]: {00} Start SORT by PATTERN 5096 words
search.cgi[3292]: {00} Stop SORT by PATTERN:0.00
search.cgi[3292]: {00} use_showcnt: 0 ratio: 0.00
search.cgi[3292]: {00} Start Order, Last-Modified and Excerpts
search.cgi[3292]: {00} [] Retrieve rec_id: 399488c7
search.cgi[3292]: {00} [] Retrieved rec_id: 399488c7 Size: 34395 Ratio: 28.07%
search.cgi[3292]: {00} [] Retrieve rec_id: d671482e
search.cgi[3292]: {00} [] Retrieved rec_id: d671482e Size: 69087 Ratio: 23.58%
search.cgi[3292]: {00} [] Retrieve rec_id: 30145be2
search.cgi[3292]: {00} [] Retrieved rec_id: 30145be2 Size: 63033 Ratio: 24.85%
search.cgi[3292]: {00} [] Retrieve rec_id: c1a76e13
search.cgi[3292]: {00} [] Retrieved rec_id: c1a76e13 Size: 17978 Ratio: 40.32%
search.cgi[3292]: {00} [] Retrieve rec_id: 2436fd71
search.cgi[3292]: {00} [] Retrieved rec_id: 2436fd71 Size: 13247 Ratio: 26.51%
search.cgi[3292]: {00} [] Retrieve rec_id: 386ba112
search.cgi[3292]: {00} [] Retrieved rec_id: 386ba112 Size: 51630 Ratio: 14.83%
search.cgi[3292]: {00} [] Retrieve rec_id: efb686f1
search.cgi[3292]: {00} [] Retrieved rec_id: efb686f1 Size: 13400 Ratio: 39.96%
search.cgi[3292]: {00} [] Retrieve rec_id: 1a1340d7
search.cgi[3292]: {00} [] Retrieved rec_id: 1a1340d7 Size: 45712 Ratio: 15.55%
search.cgi[3292]: {00} [] Retrieve rec_id: 79d513ed
search.cgi[3292]: {00} [] Retrieved rec_id: 79d513ed Size: 46894 Ratio: 17.28%
search.cgi[3292]: {00} [] Retrieve rec_id: 19371393
search.cgi[3292]: {00} [] Retrieved rec_id: 19371393 Size: 43155 Ratio: 16.39%
search.cgi[3292]: {00} Stop Order, Last-Modified and Excerpts: 2.63
search.cgi[3292]: {00} Start DpsTrack
search.cgi[3292]: {00} Stop DpsTrack: 0.00
search.cgi[3292]: {00} Done DpsFind 12.925
- - - - - - - - - - - - - - - - - - - - - - - - - - - - Read the full topic here: http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=01;topic_id=1217914135;page=5