- Original Message -
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, March 09, 2004 11:51 AM
Subject: Re: Lucene Taglib
> Also, there was a big smiley implicit in my JSP taglib rantings below.
> Certainly no offense intended.
None ta
On Mar 8, 2004, at 10:21 PM, Iskandar Salim wrote:
Thanks for the tips and comments.
Also, there was a big smiley implicit in my JSP taglib rantings below.
Certainly no offense intended. I've paid my Struts/taglib dues and am
now deep into a completely different web development paradigm that I
Thanks for the tips and comments.
Regards,
Iskandar
- Original Message -
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, March 08, 2004 7:48 PM
Subject: Re: Lucene Taglib
> On Mar 8, 2004, at 3:46 AM, Iskandar Salim wrote:
> > I've wor
I don't see any reason for this to be a Hashtable.
It seems an acceptable alternative to not share analyzer/filter
instances across threads - they don't really take up much space, so is
there a reason to share them? Or I'm guessing you're sharing it
implicitly through an IndexWriter, huh?
I'm looking at StopFilter.java right now...
I did a kill -3 java and a number of my threads were blocked here:
"ksa-task-thread-34" prio=1 tid=0xad89fbe8 nid=0x1c6e waiting for
monitor entry [b9bff000..b9bff8d0]
at java.util.Hashtable.get(Hashtable.java:332)
- waiting to lock <0x61
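
A minimal sketch of one way around the blocked threads above, assuming the stop-word table is never modified after it is built. An unsynchronized, read-only Set can then replace the synchronized Hashtable, so concurrent lookups never wait on a monitor. The class and method names are hypothetical and are not part of Lucene.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not Lucene's StopFilter: shows a lock-free, read-only lookup.
public class StopWordLookup {
    // Built once in the constructor and never modified afterwards, so
    // concurrent reads are safe without any synchronization.
    private final Set<String> stopWords;

    public StopWordLookup(String... words) {
        this.stopWords = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(words)));
    }

    // Unlike Hashtable.get, HashSet.contains does not take a monitor.
    public boolean isStopWord(String term) {
        return stopWords.contains(term);
    }

    public static void main(String[] args) {
        StopWordLookup lookup = new StopWordLookup("a", "the", "of");
        System.out.println(lookup.isStopWord("the"));
        System.out.println(lookup.isStopWord("lucene"));
    }
}
```

The other option raised above, one analyzer/filter instance per thread, avoids sharing altogether at the cost of a little memory.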
On Monday 08 March 2004 12:34, Erik Hatcher wrote:
> In the RealWorld... many applications actually just re-run a search and
> jump to the appropriate page within the hits; searching is generally
> plenty fast enough to alleviate concerns of caching.
>
> However, if you need to cache Hits, you n
I have a BooleanQuery that takes 3 TermQueries,
for example (title:colombo OR txt:colombo OR city:colombo).
I would like to mark hits that match in the field title in red on
display, txt in blue, and city in green, and maybe those that match in 2
fields in another color.
Is this possible?
Thanks
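
One possible approach, sketched under the assumption that the display code has the text of each field for every hit: re-check each field for the term and choose a color from the set of fields that matched. The class name, the color for multi-field matches, and the simple lowercase/split tokenization are all assumptions for illustration; a real setup would have to tokenize the same way the index analyzer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical display helper; it does not use any Lucene API.
public class FieldMatchColors {

    // Returns the names of the fields whose text contains the term as a whole token.
    static List<String> matchingFields(Map<String, String> fields, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String[] tokens = e.getValue().toLowerCase(Locale.ROOT).split("\\W+");
            if (Arrays.asList(tokens).contains(needle)) {
                matched.add(e.getKey());
            }
        }
        return matched;
    }

    // title -> red, txt -> blue, city -> green; "purple" for two or more fields is an arbitrary choice.
    static String colorFor(List<String> matched) {
        if (matched.size() > 1) return "purple";
        if (matched.isEmpty()) return "black";
        switch (matched.get(0)) {
            case "title": return "red";
            case "txt":   return "blue";
            case "city":  return "green";
            default:      return "black";
        }
    }
}
```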
My impression is the new term vector support should at least make this
type of comparison feasible in some manner. I'd be interested to see
what you come up with if you give this a try. You will need the latest
CVS codebase.
Erik
On Mar 8, 2004, at 4:37 PM, Michael Giles wrote:
I'm looking
That kind of fuzzy equality is an area of open research. You need to define what is an
acceptable error rate for Type 1 and Type 2 errors before you can think about
implementations that scale better. Approaches range from identifying document
vocabulary and statistics to raw hashing of the input
I'm looking for a way to filter out duplicate documents from an index
(either while indexing, or after the fact). It seems like there should be
an approach of comparing the terms for two documents, but I'm wondering if
any other folks (i.e. nutch) have come up with a solution to this problem.
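
The idea of comparing the terms of two documents could be sketched roughly as follows. This uses Jaccard similarity over term sets, which is one common choice and not something the thread itself settles on; the class name, the tokenization, and the threshold parameter are assumptions. The threshold is where the trade-off between Type 1 and Type 2 errors gets decided.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical near-duplicate check based on term-set overlap.
public class NearDuplicate {

    // Lowercases the text and splits it into a set of distinct terms.
    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, in the range [0, 1].
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty documents are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // A higher threshold flags fewer false duplicates but misses more real ones.
    static boolean isDuplicate(String doc1, String doc2, double threshold) {
        return jaccard(terms(doc1), terms(doc2)) >= threshold;
    }
}
```

Comparing every pair of documents this way is quadratic in the number of documents, which is why hashing-based approaches come up when scale matters.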
I tend to agree (but with the same uncertainty as to why I feel that way).
Regards,
Terry
- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, March 08, 2004 2:34 PM
Subject: Re: Sys properties Was: java.io.tmpdir as
I can't explain why, but I feel like the old index format should stay
by default. I feel like I'd rather a (slightly) faster index, and
switch to the compound one when/IF I encounter problems, than have a
safer, but slower index, and never realize that there is a faster
option available.
Weak arg
In the RealWorld... many applications actually just re-run a search and
jump to the appropriate page within the hits; searching is generally
plenty fast enough to alleviate concerns of caching.
However, if you need to cache Hits, you need to be sure to keep around
the originating IndexSearch
Hi all,
could someone describe their experience implementing
caching, sorting and paging of search
results?
Is a Stateful Session bean appropriate for this?
My wish is to obtain all search hits only in first
call, and after that, to iterate through Hit
Collection and display cached results.
I hav
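
The "re-run the search and jump to the appropriate page" approach mentioned above mostly comes down to computing which slice of the hits to display. A minimal sketch, assuming zero-based page numbers and a hit count returned by the search; the class and method names are hypothetical.

```java
// Hypothetical paging helper: computes the slice of hits to show for one page.
public class HitPager {

    // Returns {start, end}: the first hit index to show (inclusive) and the
    // index one past the last hit to show (exclusive). Pages are zero-based.
    static int[] pageBounds(int totalHits, int page, int pageSize) {
        if (totalHits < 0 || page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        long first = (long) page * pageSize; // long avoids int overflow
        int start = (int) Math.min(first, totalHits);
        int end = (int) Math.min(first + pageSize, totalHits);
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        int[] bounds = pageBounds(45, 2, 20);
        // After re-running the search, only hits[start .. end) need to be read.
        System.out.println(bounds[0] + ".." + bounds[1]);
    }
}
```

With this, no per-user state has to be kept between requests beyond the query string and the page number.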
hui wrote:
Index time:
compound format is 89 seconds slower.
compound format:
1389507 total milliseconds
non-compound format:
1300534 total milliseconds
The index size is 85m with 4 fields only. The files are stored in the index.
The compound format has only 3 files and the other has 13 files.
T
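
For reference, the arithmetic behind the "89 seconds slower" figure, using only the two totals reported above:

```latex
\[
1389507\ \text{ms} - 1300534\ \text{ms} = 88973\ \text{ms} \approx 89\ \text{s},
\qquad
\frac{88973}{1300534} \approx 0.068
\]
```

So in this run the compound format took roughly 6.8% longer to index than the non-compound format.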
Erik Hatcher wrote:
private static final DecimalFormat formatter =
new DecimalFormat("0"); // make this as wide as you need
For ints, ten digits is probably safest. Since Lucene uses prefix
compression on the term dictionary, you don't pay a penalty at search
time for long shared pre
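
A small sketch of the zero-padding idea, assuming non-negative ints and a ten-digit pattern as suggested above. Padding makes the string order of the terms match numeric order. The class and method names are hypothetical; a new formatter is created on each call here because DecimalFormat instances are not safe to share across threads without synchronization.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper for turning ints into sortable, fixed-width terms.
public class NumberPad {

    // Ten zeros: wide enough for Integer.MAX_VALUE (2147483647).
    private static final String PATTERN = "0000000000";

    static String pad(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        DecimalFormat formatter =
            new DecimalFormat(PATTERN, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return formatter.format(n);
    }

    public static void main(String[] args) {
        System.out.println(pad(42));
        // Unpadded, "7" sorts after "10" as a string; padded, it sorts before.
        System.out.println(pad(7).compareTo(pad(10)) < 0);
    }
}
```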
Thank you, the conversion option in Luke is really helpful for migrating
existing user indexes.
Regards,
Hui
-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Monday, March 08, 2004 10:57 AM
To: Lucene Users List
Subject: Re: Sys properties Was: java.io.tmpdir as loc
hui wrote:
Hi,
Here are the indexing performance test results for the two index formats.
A shameless plug: you can use Luke (http://www.getopt.org/luke) to
convert the same index between compound/non-compound formats. Which
could be useful to rule out any possible differences in the
indexin
Hi,
Here are the indexing performance test results for the two index formats.
1000 megahertz Intel Pentium III (2 installed)
32 kilobyte primary memory cache
256 kilobyte secondary memory cache
SCSI Hard drive 145.45 GB
RAM 3 GB
Windows 2000 Advanced Server, Service Pack 2
JDK 140
JVM me
On Mar 8, 2004, at 3:46 AM, Iskandar Salim wrote:
I've worked a bit on the taglib and added an "index" and "field" tag for
basic indexing capability, though I don't think it's really useful, apart
from, in my case, quick prototyping of web applications. What do you guys
think? I'm new to Lucen
Hi,
I've worked a bit on the taglib and added an "index" and "field" tag for
basic indexing capability, though I don't think it's really useful, apart
from, in my case quick prototyping of web applications. What do you guys
think? I'm new to Lucene and taglibs so I may have missed out lots of
t