Joachim Arrasz wrote:
Hello List.
we have written an application which includes OpenOffice Integration
into an OpenSource CMS (OpenCms).
For this CMS there is a Lucene integration available on SourceForge.
So now we are looking for search and index filters for Lucene that
we're able to inte
Hi Michael,
I wonder if you would be interested in cooperating on the
extracting/index management bit. We use Lucene and our own extractor
plugins for a Swing-application:
http://tockit.sf.net/docco
Code can be found here:
http://cvs.sourceforge.net/viewcvs.py/toscanaj/docco/
It is BSD-Style l
We did a simple one a while ago. Could probably be a bit more
sophisticated, but it seems to do its job on the little bit of testing we
did.
See
http://cvs.sourceforge.net/viewcvs.py/toscanaj/docco/source/org/tockit/docco/documenthandler/OpenOfficeDocumentHandler.java?rev=1.4&view=auto
HTH,
Pe
rano-gnome/...
cheers,
sv
On Wed, 14 Apr 2004, Peter Becker wrote:
Hello,
we released Docco 0.3 along with two updates for its plugins.
Docco is a personal document retrieval tool based on Apache's Lucene
indexing engine and Formal Concept Analysis. It allows you to create an
index for files on your file system which you can then search for
keywords. It can i
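Docco pairs keyword search with Formal Concept Analysis. For readers new to FCA, its core is two derivation operators on a document/keyword incidence relation: for a set of documents A, A' is the set of keywords they all share; for a set of keywords B, B' is the set of documents containing all of them; and (A, B) is a formal concept when A' = B and B' = A. The Java sketch below illustrates only those operators over an in-memory map. It is not Docco's code; the class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the FCA derivation operators over a document -> keywords map. */
public class ConceptSketch {

    /** A' : the keywords shared by every document in docs. */
    public static Set<String> commonKeywords(Map<String, Set<String>> incidence, Set<String> docs) {
        Set<String> result = new TreeSet<>();
        if (docs.isEmpty()) {
            // By definition the empty document set shares every keyword.
            for (Set<String> kws : incidence.values()) result.addAll(kws);
            return result;
        }
        boolean first = true;
        for (String doc : docs) {
            Set<String> kws = incidence.getOrDefault(doc, Collections.emptySet());
            if (first) {
                result.addAll(kws);
                first = false;
            } else {
                result.retainAll(kws); // intersect with each further document
            }
        }
        return result;
    }

    /** B' : the documents that contain every keyword in keywords. */
    public static Set<String> matchingDocuments(Map<String, Set<String>> incidence, Set<String> keywords) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : incidence.entrySet()) {
            if (e.getValue().containsAll(keywords)) result.add(e.getKey());
        }
        return result;
    }

    /** (A, B) is a formal concept iff A' = B and B' = A. */
    public static boolean isConcept(Map<String, Set<String>> incidence, Set<String> docs, Set<String> keywords) {
        return commonKeywords(incidence, docs).equals(keywords)
                && matchingDocuments(incidence, keywords).equals(docs);
    }
}
```

For any keyword query B, the pair (B', B'') is always a formal concept, so a query can be mapped onto a concept by calling matchingDocuments and then commonKeywords on its result.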
Hi Sebastian,
it doesn't use many Lucene features, and it mixes in Formal Concept Analysis
in a rather orthogonal way, but let me still advertise our little
Docco tool:
http://tockit.sourceforge.net/docco/index.html
It is based on Lucene, comes with a couple of indexing tools (including
HTM
Tatu Saloranta wrote:
On Thursday 18 September 2003 14:50, Michael Giles wrote:
I know, I know, the HTML Parser in the demo is just that (i.e. a demo), but
I also know that it is updated from time to time and performs much better
than the other ones that I have tested. Frustratingly, the very
Erik Hatcher wrote:
[...]
- Index text and HTML files. Any others? I don't want to get into
putting too many dependencies in though - let's keep it relatively
simple, although still demonstrative. Allow search filtering by last
modified date range and document type (extension).
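As a rough illustration of the filtering Erik proposes, here is a plain-Java sketch that keeps hits whose last-modified time falls in a range and whose file extension is in an allowed set. The Hit class and its fields are hypothetical stand-ins for whatever the demo would store per document; inside Lucene itself the date part would more naturally be handled in the query rather than on the result list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SearchFilterSketch {

    /** Hypothetical per-hit record: file path plus last-modified time in epoch millis. */
    public static final class Hit {
        public final String path;
        public final long lastModified;

        public Hit(String path, long lastModified) {
            this.path = path;
            this.lastModified = lastModified;
        }
    }

    /** Lower-cased extension without the dot, or "" if the file name has none. */
    public static String extensionOf(String path) {
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int dot = path.lastIndexOf('.');
        // A dot inside a directory name does not count as an extension.
        return dot > sep ? path.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    /** Keeps hits modified within [from, to] (inclusive) whose extension is allowed (lower case). */
    public static List<Hit> filter(List<Hit> hits, long from, long to, Set<String> extensions) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            boolean inRange = h.lastModified >= from && h.lastModified <= to;
            if (inRange && extensions.contains(extensionOf(h.path))) {
                kept.add(h);
            }
        }
        return kept;
    }
}
```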
If I may pl
ve you a reasonable estimate of the effort involved.
Cheers,
Peter
Best,
Gregor
-----Original Message-----
From: Peter Becker [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 02, 2003 1:52 PM
To: Lucene Users List
Subject: ANN: Docco 0.2 / contribution offer
Hi all,
we finally finished the 0.2 release of our little personal document
management tool based on Lucene:
http://tockit.sourceforge.net/docco/index.html
This might be interesting for some readers of this list since its source
contains some infrastructure for document handlers and index man
Brian Mila wrote:
amounts). I failed to find a way to get Lucene to give me this
information without hacking this or that. Considering the attention IR
Excuse me if this is off-topic, but isn't hacking the code what open source
software is all about?
Not always, but quite often :-)
I mean
hat end?
Regards,
Terry
----- Original Message -----
From: "Peter Becker" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, August 21, 2003 1:37 AM
Subject: Re: Similar Document Search
Hi all,
it seems there are quite a few people l
yet, therefore no code at this time...
Best regards,
Gregor
-----Original Message-----
From: Peter Becker [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 19, 2003 3:06 AM
To: Lucene Users List
Subject: Re: Similar Document Search
Hi Terry,
we have been thinking about the same problem and in the
le code if anyone is interested.
/magnus
Peter Becker wrote:
Hi Terry,
we have been thinking about the same problem and in the end we decided
that most likely the only good solution to this is to keep a
non-inverted index, i.e. a map from the documents to the terms. Then you
can look up the most frequent terms for the documents and query other documents
matching p
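A minimal in-memory sketch of the non-inverted index described above, assuming documents have already been tokenized: it maps each document to its term frequencies, picks a document's most frequent terms, and ranks other documents by how many of those terms they share. The class and method names are hypothetical, and counting shared terms is only one simple choice of similarity; a real version would persist the map next to the Lucene index and would more likely run the top terms as a Lucene query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory forward (document -> terms) index. */
public class ForwardIndexSketch {

    // document id -> (term -> frequency within that document)
    private final Map<String, Map<String, Integer>> docTerms = new HashMap<>();

    /** Adds a document's tokens; tokenization is assumed to happen elsewhere. */
    public void add(String docId, List<String> tokens) {
        Map<String, Integer> freqs = docTerms.computeIfAbsent(docId, k -> new HashMap<>());
        for (String t : tokens) {
            freqs.merge(t, 1, Integer::sum);
        }
    }

    /** The n most frequent terms of one document; ties are broken alphabetically. */
    public List<String> topTerms(String docId, int n) {
        Map<String, Integer> freqs = docTerms.getOrDefault(docId, Collections.emptyMap());
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freqs.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    /** Other documents ranked by how many of docId's top-n terms they contain. */
    public List<String> similar(String docId, int n) {
        List<String> probe = topTerms(docId, n);
        Map<String, Integer> shared = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : docTerms.entrySet()) {
            if (e.getKey().equals(docId)) continue; // skip the document itself
            int count = 0;
            for (String t : probe) {
                if (e.getValue().containsKey(t)) count++;
            }
            if (count > 0) shared.put(e.getKey(), count);
        }
        List<String> ranked = new ArrayList<>(shared.keySet());
        ranked.sort(Comparator.comparing((String d) -> shared.get(d)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return ranked;
    }
}
```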
Hi Tom,
Killeen, Tom wrote:
I am attempting to create approx 10 different Lucene indexes. I'm trying to
create them at the same time by running multiple processes and each index is
written to a new directory. Once I create more than one process - the
performance is very, very slow.
As Otis s
Kevin A. Burton wrote:
Killeen, Tom wrote:
I am attempting to create approx 10 different Lucene indexes. I'm trying to
create them at the same time by running multiple processes and each index is
written to a new directory. Once I create more than one process - the
performance is very, very s
Andrzej Bialecki wrote:
[...Luke feature requests...]
open the original documents with the platform-dependent MIME type viewer
Someone else already explained the problems with this... What is a
document in Lucene? It's a set of fields and their String values (or
their terms), so it's not possi
[redirected to lucene-user]
Me, too! :-)
We are currently playing with the small Reuters collection (about 21,500
news items from the 80s), but I don't know if I am allowed to distribute
it and it is too small anyway -- many of the implications we find are
based on 1 to 3 documents. I still ha
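In FCA terms, an implication A -> B between keyword sets holds when every document containing all of A also contains all of B, and its support is the number of documents containing A. An implication that holds with a support of only 1 to 3 documents can easily be accidental, which is presumably the concern here. The sketch below (hypothetical names, toy in-memory data) checks an implication and its support; the minimum-support threshold is an assumption, not something from the original message.

```java
import java.util.Map;
import java.util.Set;

public class ImplicationSketch {

    /** Number of documents whose terms include every term of the premise. */
    public static int support(Map<String, Set<String>> docTerms, Set<String> premise) {
        int count = 0;
        for (Set<String> terms : docTerms.values()) {
            if (terms.containsAll(premise)) count++;
        }
        return count;
    }

    /** premise -> conclusion holds if no document has the premise without the conclusion. */
    public static boolean holds(Map<String, Set<String>> docTerms, Set<String> premise, Set<String> conclusion) {
        for (Set<String> terms : docTerms.values()) {
            if (terms.containsAll(premise) && !terms.containsAll(conclusion)) return false;
        }
        return true;
    }

    /** Hypothetical threshold: only report implications backed by at least minSupport documents. */
    public static boolean worthReporting(Map<String, Set<String>> docTerms, Set<String> premise,
                                         Set<String> conclusion, int minSupport) {
        return holds(docTerms, premise, conclusion) && support(docTerms, premise) >= minSupport;
    }
}
```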
Hi all,
I am interested in comparing different query result sets in terms of term
frequency. Questions I'd like to answer are:
- what are the N most common terms in a result set?
- how often does term X occur in a certain result set?
The second one is of course easy to do with a boolean query, bu
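Assuming the per-document term frequencies for a result set are already available (for example from a document-to-terms map kept next to the index), both questions reduce to simple aggregation. The Java sketch below sums frequencies over the set, keeps the N most common terms with a size-bounded min-heap, and counts the documents that contain a given term. All names are hypothetical; obtaining the frequencies from Lucene is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class ResultSetTermStats {

    /** Sums each term's frequency over all documents in the result set. */
    public static Map<String, Integer> totals(List<Map<String, Integer>> resultSet) {
        Map<String, Integer> sums = new HashMap<>();
        for (Map<String, Integer> doc : resultSet) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                sums.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return sums;
    }

    /** The n terms with the highest total frequency, most frequent first (ties alphabetical). */
    public static List<String> mostCommon(List<Map<String, Integer>> resultSet, int n) {
        // Min-heap ordered "worst first", so the root is always the entry to evict.
        Comparator<Map.Entry<String, Integer>> worstFirst =
                Map.Entry.<String, Integer>comparingByValue()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey().reversed());
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(worstFirst);
        for (Map.Entry<String, Integer> e : totals(resultSet).entrySet()) {
            heap.offer(e);
            if (heap.size() > n) heap.poll();
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) result.add(heap.poll().getKey());
        Collections.reverse(result); // heap yields worst first
        return result;
    }

    /** How many documents of the result set contain the term at all. */
    public static int documentFrequency(List<Map<String, Integer>> resultSet, String term) {
        int count = 0;
        for (Map<String, Integer> doc : resultSet) {
            if (doc.getOrDefault(term, 0) > 0) count++;
        }
        return count;
    }
}
```

The heap keeps memory at O(N) regardless of vocabulary size, which matters if result sets are large.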
Roger Ford wrote:
[...index size troubles...]
Believe it or not, this 10 million documents was meant to be a single
partition of a much larger dataset. I'm not sure I'm at liberty to
discuss in detail the data I'm indexing - but it's a massive
genealogical database.
Roger,
maybe your data type is
StarOffice (OpenOffice and StarOffice 6, please correct me if
this isn't true), so I'd like to know if there is already any way to index
files of version 5.2 and below, or any clue on how to do this,
Thank you in advance,
Oscar Herrera
Bogotá, Colombia, SA.
----- Original Message -----
Fr
Hi Oscar,
we have been looking into the StarOffice/OpenOffice problem, although we
haven't done it and probably won't anytime soon as we have to move on to
other things. I see two approaches, both with variants:
(1) use the fact that it is just zipped XML: use a ZipInputStream to
open the file
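A sketch of approach (1), assuming an OpenOffice.org XML file (such as .sxw) whose text lives in the content.xml entry, with paragraphs and headings as text:p and text:h elements: it walks the archive with ZipInputStream and collects the character data of content.xml with a SAX parser. The class name is hypothetical and this is not the Docco handler linked elsewhere in the thread. Because content.xml may declare a DOCTYPE pointing at office.dtd, which is not in the archive, the handler resolves external entities to an empty document so that parsing does not fail. It does nothing for the binary StarOffice 5.2 formats.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OpenOfficeTextSketch {

    /** Returns the plain text of content.xml, or "" if the archive has no such entry. */
    public static String extractText(InputStream file) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(file)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("content.xml")) {
                    return parse(zip); // zip is positioned at the start of this entry
                }
            }
        }
        return "";
    }

    private static String parse(InputStream xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // office.dtd is not shipped inside the archive: substitute an empty DTD.
                return new InputSource(new StringReader(""));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("text:p") || qName.equals("text:h")) {
                    text.append('\n'); // paragraph or heading boundary
                } else if (qName.equals("text:s") || qName.equals("text:tab-stop")
                        || qName.equals("text:tab") || qName.equals("text:line-break")) {
                    text.append(' '); // collapse explicit spacing elements to one space
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(xml, handler);
        return text.toString().trim();
    }
}
```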
Hi Alvaro,
there are some examples in our code here -- working with a slightly
similar interface to the Ant task in the Lucene contributions.
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/toscanaj/docco/source/org/tockit/docco/indexer/documenthandler/
The actual step of turning it into a Luc
Hi,
is there any way to get the keywords for certain fields in a document
easily? The situation is that I have small sets of documents coming back
from queries and I want to compare those in terms of similarity. The
questions are: what are the common terms within each set and what are
the terms
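For the first question, assuming the terms of each document in a set are already available as a Java Set (how to pull them out of a particular field is left open here), the common terms are those occurring in every document, or in at least some fraction of them if a strict intersection is too harsh. A hypothetical sketch:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CommonTermsSketch {

    /**
     * Terms found in at least minFraction of the documents in the set.
     * minFraction = 1.0 yields the strict intersection of all term sets.
     */
    public static Set<String> commonTerms(List<Set<String>> docs, double minFraction) {
        // document frequency of each term within this set
        Map<String, Integer> docFreq = new HashMap<>();
        for (Set<String> terms : docs) {
            for (String t : terms) {
                docFreq.merge(t, 1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() >= minFraction * docs.size()) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

With minFraction = 1.0 this is the strict intersection; lowering it, say to 0.5, tolerates documents that happen to miss a term.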