Re: Indexing a largish collection of mail and usenet messages?

2007-01-02 Thread Christophe Ollier

John L wrote:

I have a collection of archives of mailing list and news messages. The 
largest collection is pretty big, about 150,000 messages which means 
about 200 megabytes of text, shortly to be migrated to a FreeBSD 
server.  The lists are all active so archives typically add a few 
messages each day. I want to provide a full text search of each 
archive.  What software should I use?  I have been using the sturdy but 
ancient lqtext package. It's OK, but it has a few bugs I have yet to 
pick and I'm wondering if something better is available.


You could have a look at Lucene (http://lucene.apache.org/), a text 
search engine library written in Java. I don't know lqtext, but Lucene 
seems to work in a similar way: one program builds and updates an 
index, and a second program queries it.
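The build-then-query split can be illustrated with a toy inverted index. This is only a minimal sketch in Python, not Lucene's API: the class name TinyIndex, the message ids, and the sample texts are all made up for the illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index (hypothetical, for illustration only).

    Maps each word to the set of ids of the documents that contain it.
    """

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Indexing side: add one document without touching the others."""
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Query side: return sorted ids of documents containing every word."""
        words = tokenize(query)
        if not words:
            return []
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return sorted(result)


index = TinyIndex()
# Initial build (sample texts are invented).
index.add("msg-1", "Full text search of a mail archive")
index.add("msg-2", "Usenet archive migration to FreeBSD")
# Later update: only the new message is indexed.
index.add("msg-3", "Searching mail with Lucene")

print(index.search("mail archive"))  # ['msg-1']
```

A real engine adds ranking, on-disk storage, and stemming, but the division of labour between the indexing side and the query side is the same.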


It's only a library, so you have to program the interfaces yourself, 
both for you (indexing) and for your users (querying). There are ports 
to other languages; C, Perl, Python, and PHP (through Zend Framework) 
ports are in the ports tree.


First, I am NOT, repeat NOT, asking about web spiders.  The messages are 
directly available to indexing software as files on my server, so 
there's no advantage to running them through Apache on the way to the 
indexer. Also, the messages in the archive never change and I know what 
files are new each day, so it would be pointless for a package to 
re-spider the whole archive to look for the new messages.  I am not 
unalterably opposed to something that spiders if it is otherwise 
wonderful, but that approach hasn't been fruitful in the past.


Lucene can update an existing index with new documents.

What I want ideally is something that knows enough about the structure 
of mail messages to deal intelligently with headers vs. body, that can 
do something reasonable with MIME and HTML bodies (not urgent, I can 
always run them through demime on the way to the index), and most 
importantly that actually works with 150,000 messages.  I've seen lots 
of packages that look promising but that fall over dead once they get 
past 10,000 messages or so.


I don't think Lucene can do this out of the box, but you can attach 
arbitrary keywords (e.g. mail headers) to the documents you index.
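As a rough illustration of separating headers from body before indexing, here is a sketch that uses Python's standard email package. The field names, the to_fields helper, and the sample message are assumptions made for the example; how the resulting fields are fed to an indexer depends on the engine you choose.

```python
from email import message_from_string
from email.policy import default

# Invented sample message, for illustration only.
RAW = """\
From: alice@example.org
To: list@example.org
Subject: Indexing a mail archive
Date: Tue, 02 Jan 2007 10:00:00 +0000

Which search engine handles 150,000 messages?
"""


def to_fields(raw_message):
    """Split one message into separately indexable fields (hypothetical helper)."""
    msg = message_from_string(raw_message, policy=default)
    # Prefer the text/plain part; MIME messages without one yield an empty body.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "date": str(msg.get("Date", "")),
        "body": body.strip(),
    }


fields = to_fields(RAW)
print(fields["subject"])  # Indexing a mail archive
```

Each key can then be stored as its own keyword or field, so a search can be restricted to, say, the subject line.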


As for performance, I'm personally satisfied. I use the PHP port with 
20k documents: the full index takes about an hour to build, and queries 
take about 100 to 1000 ms. Lucene seems fit for millions of documents.



[...]


--
Christophe
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Torrent Program

2005-09-04 Thread christophe ollier

On 20/01/2005 13:25, Warren wrote:

I'm chasing a GUI torrent program that will allow multiple downloads of 
torrents without having to re-open the d/l program for each new torrent.  If 
anyone knows of such a program please let me know (not QTorrent).


You can try ports/net/mldonkey. It comes in different flavours, with or 
without a GUI. You can install it without the GUI (mldonkey-core) and 
then control it through a separate graphical front end or through the 
integrated telnet/web server.


Besides BitTorrent, MLDonkey lets you use other p2p protocols such as 
ed2k/Kademlia, Gnutella, FastTrack...




Re: Witch apache, mysql and php do i need ?

2005-01-27 Thread christophe ollier

On 27/01/2005 at 07:49, Gert Cuykens wrote:

So I can use phpMyAdmin, use PHP as an Apache module, make an SSL
connection and use the include feature.

The easiest way would be to install the phpMyAdmin port. It will build 
all the software it needs, with the exception of mysql-server (MySQL is 
split into separate client and server ports).

PS 2.1+ and 5+ are my favorite numbers :)

Sorry, the dependencies at this time show Apache 1.3.33, mod_php 4.4.3, 
MySQL 4.0.23a. Wrong numbers :)

Also can everybody make an SSL connection or do you have to register a
key [...] ?

I'm lucky you didn't use XOR, because the answer is yes and yes. 
Olivier already provided details.

Cheers,
--
Christophe


Re: Witch apache, mysql and php do i need ?

2005-01-27 Thread christophe ollier

On 27/01/2005 at 10:16, Gert Cuykens wrote:

If I install phpMyAdmin, will PHP run as a module in Apache or as CGI?

It's mod_php, as in module. It integrates PHP into Apache more tightly 
than classic CGI does.

Which MySQL do I install exactly? And after installing it, will it
work or will PHP say something like can't connect to MySQL :)

Personally, I use exactly the same server version as the client. As 
phpMyAdmin uses a 4.0.x client, I installed databases/mysql40-server 
from the ports. You'll have to edit the phpMyAdmin config files to match 
the server. See http://www.phpmyadmin.net/documentation/#setup

Do I need something extra for SSL?

Yes, but I have no personal experience with this. There seems to be a 
good page here: http://www.unixcities.com/apache-openssl/

Cheers,
--
Christophe


Re: Witch apache, mysql and php do i need ?

2005-01-27 Thread christophe ollier

On 28/01/2005 at 00:48, Gert Cuykens wrote:

7rxI# make pretty-print-run-depends-list
This port requires package(s) grep: /usr/ports/INDEX-5: No such file
or directory
 to run.
7rxI# make pretty-print-build-depends-list
7rxI#
How can I show the list of dependencies?

It seems your ports index is missing. I think you can try this to 
fetch a new one:

cd /usr/ports
make fetchindex
portsdb -u


Too many entries in UPDATING ?

2004-12-19 Thread christophe ollier

Hello,

I just undertook the process of upgrading from 5.2.1-RELEASE to 
RELENG_5_3. The first step was to read the documentation, the next to 
synchronize the sources with cvsup.

My problem comes at the "read /usr/src/UPDATING" step. I began to track 
the entries pertaining to my system. So far I have found two (20041010 
on rc scripts, and 20040925 about bind). The task seems overwhelming if 
I have to track all of them: the file contains many entries, but most of 
them are dated well _before_ my initial installation of 5.2.1.

Should I follow every entry, or only those dated after my last system 
upgrade (in my case, the initial installation)? And, if the latter is 
the case, how can I find the precise release date of 5.2.1-RELEASE? Is 
there a calendar somewhere on the web or, preferably, a log file or 
command on my system? Or maybe a utility that helps with this task?

Thanks for your help!
--
Christophe Ollier


Re: Too many entries in UPDATING ?

2004-12-19 Thread christophe ollier

On 20/12/2004 00:27, Robert Huff wrote:

To the original poster: are you asking how to determine when
the last update of your system sources (via cvsup etc.) happened?

Yes, that was my question. As my last update was an installation of 
5.2.1-RELEASE, I found its date (2004-02-22) in the file Kris pointed 
to (/usr/share/misc/bsd-family-tree). Now I know that I have to follow 
every entry in /usr/src/UPDATING dated after that day.
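Picking out the entries dated after a given day could be scripted roughly as follows. This is a sketch only: it assumes each entry starts with a YYYYMMDD: line at the left margin, the entries_after helper is made up, and the sample text below is invented placeholder content (only the two dates mentioned earlier come from the real file).

```python
import re

# An entry is assumed to start with an 8-digit date and a colon at line start.
ENTRY_RE = re.compile(r"^(\d{8}):", re.MULTILINE)


def entries_after(updating_text, cutoff):
    """Return (date, text) pairs for entries dated strictly after cutoff.

    Dates are YYYYMMDD strings, so plain string comparison orders them.
    """
    matches = list(ENTRY_RE.finditer(updating_text))
    entries = []
    for i, match in enumerate(matches):
        # An entry runs until the next dated line, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(updating_text)
        date = match.group(1)
        if date > cutoff:
            entries.append((date, updating_text[match.end():end].strip()))
    return entries


# Placeholder text, not the real contents of /usr/src/UPDATING.
SAMPLE = """\
20041010:
    (placeholder) note about rc scripts.

20040925:
    (placeholder) note about bind.

20031201:
    (placeholder) entry older than the installation.
"""

for date, text in entries_after(SAMPLE, "20040222"):
    print(date, text)
```

On the real file you would read /usr/src/UPDATING into a string first; any text that follows the last dated entry would be attached to that entry, so the output still needs a quick look by eye.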

Or how to determine when the last system (kernel+system utilities)
upgrade happened?

This information will also be useful one day, at the next system 
upgrade, when I have to determine which entries in UPDATING are 
relevant. uname -a seems to give the kernel compilation time. I 
don't know if there is a way to find the date of the last system 
upgrade; I can certainly note the day in my calendar.

Thanks again.
--
Christophe