Well, for the moment we don't. The Lucene index only contains the full
text content (indexed, not stored). We use Lucene to perform full-text
and fuzzy searches on the keywords field. Once we have the results, we
match them against the geospatial box provided by the user (we use
Oracle Spatial for that).
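A minimal sketch of that post-filtering step in plain Java, with a simple in-memory bounding-box check standing in for the Oracle Spatial call (the Hit class and the coordinates are hypothetical, not from the thread):

```java
import java.util.ArrayList;
import java.util.List;

public class SpatialPostFilter {
    // Hypothetical hit carrying the coordinates looked up for each Lucene result.
    static final class Hit {
        final int docId;
        final double lat, lon;
        Hit(int docId, double lat, double lon) { this.docId = docId; this.lat = lat; this.lon = lon; }
    }

    // Keep only the full-text hits that fall inside the user's bounding box.
    static List<Hit> filterByBox(List<Hit> hits, double minLat, double maxLat,
                                 double minLon, double maxLon) {
        List<Hit> kept = new ArrayList<Hit>();
        for (Hit h : hits) {
            if (h.lat >= minLat && h.lat <= maxLat && h.lon >= minLon && h.lon <= maxLon) {
                kept.add(h);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<Hit>();
        hits.add(new Hit(1, 48.85, 2.35));   // inside the box below
        hits.add(new Hit(2, 40.71, -74.0));  // longitude outside the box
        List<Hit> inBox = filterByBox(hits, 40.0, 50.0, 0.0, 10.0);
        System.out.println(inBox.size());    // prints 1
    }
}
```

In the real setup the box test runs inside Oracle Spatial rather than in Java; the sketch only shows the shape of the text-search-then-spatial-match pipeline.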
One trick I can think of is somehow keeping the internal
document id of a Lucene document the same after the document
is updated (i.e. deleted and re-inserted). I am not sure
whether Lucene has this capability.
Regards,
Rajesh
--- Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:
> That's correct, Rajesh. Paral
Right. And the typical answer to that is:
- If your terms are roughly equally distributed across all N indices (e.g. random
doc->index/shard assignment), the relevance scores will roughly match.
- If you have business rules for doc->index/shard distribution, then your
relevance scores will not be comparable.
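A toy illustration of the first point (the numbers are my own, not from the thread): an idf-style weight stays comparable across shards when each shard sees the same term-frequency proportions, and drifts apart when the assignment is skewed.

```java
public class ShardIdfDemo {
    // Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int numDocs, int docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // Random assignment: both shards hold the term in ~1% of their docs,
        // so idf (and hence scores) nearly agree.
        double shardA = idf(1_000_000, 10_000);
        double shardB = idf(2_000_000, 20_000);
        System.out.printf("random split: %.4f vs %.4f%n", shardA, shardB);

        // Skewed assignment (e.g. docs routed by category): same 3M docs and
        // 30K postings globally, but the term is concentrated in one shard,
        // so the two shards score the same term very differently.
        double skewA = idf(1_000_000, 29_000);
        double skewB = idf(2_000_000, 1_000);
        System.out.printf("skewed split: %.4f vs %.4f%n", skewA, skewB);
    }
}
```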
That's correct, Rajesh. ParallelReader has its uses, but I guess your case is
not one of them, unless we are all missing some key aspect of PR or a trick to
make it work in your case.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Rajesh
From: John Wang <[EMAIL PROTECTED]>
[...]
> sub index 1: 1 billion docs
> sub index 2: 1 billion docs
> sub index 3: 1 billion docs
>
> Federating search across these sub-indexes, you represent an index of 3
> billion docs, and all internal doc ids are of type int.
That falls under Daniel's "...unless
I am not sure why this is the case; the docid is internal to the sub-index. As
long as the sub-index size is below 2 billion, there is no need for the docid to
be a long. With multiple indexes, I was thinking of having an aggregator which
merges maybe only a page of the search results.
Example:
sub index 1: 1 billion
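A sketch of such an aggregator: a k-way merge over one page of pre-sorted per-shard results, where a hit's global identity is the pair (shard, local docid) so no id ever needs to be wider than an int. The ShardHit shape is hypothetical, not Lucene API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class PageMerger {
    // Hypothetical per-shard hit: globally identified by (shard, localDocId).
    static final class ShardHit {
        final int shard, localDocId;
        final float score;
        ShardHit(int shard, int localDocId, float score) {
            this.shard = shard; this.localDocId = localDocId; this.score = score;
        }
    }

    // Merge the top page returned by each sub-index into one page of size
    // pageSize, assuming each shard's list is already sorted by descending score.
    static List<ShardHit> mergePage(List<List<ShardHit>> perShard, int pageSize) {
        // Heap entry: {shardIndex, offsetWithinThatShardsList}, ordered so the
        // highest-scoring pending hit is polled first.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) ->
            Float.compare(perShard.get(b[0]).get(b[1]).score,
                          perShard.get(a[0]).get(a[1]).score));
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) heap.add(new int[] {s, 0});
        }
        List<ShardHit> page = new ArrayList<>();
        while (page.size() < pageSize && !heap.isEmpty()) {
            int[] top = heap.poll();
            List<ShardHit> list = perShard.get(top[0]);
            page.add(list.get(top[1]));
            if (top[1] + 1 < list.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return page;
    }

    public static void main(String[] args) {
        List<ShardHit> s0 = Arrays.asList(new ShardHit(0, 7, 0.9f), new ShardHit(0, 3, 0.4f));
        List<ShardHit> s1 = Arrays.asList(new ShardHit(1, 1, 0.8f), new ShardHit(1, 9, 0.6f));
        for (ShardHit h : mergePage(Arrays.asList(s0, s1), 3)) {
            System.out.println(h.shard + ":" + h.localDocId); // prints 0:7, 1:1, 1:9
        }
    }
}
```

This only merges by raw score; as noted earlier in the thread, the scores are only comparable if docs were distributed evenly across the shards.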
Stephane,
Could you describe how you set up the spatial area? Having a BooleanQuery with
200 terms in it definitely slows things down (I'm not sure exactly why yet
-- it seems like it shouldn't be "that" slow). If you can describe your
spatial area in fewer terms, you can get much better performance.
Thanks Yonik.
So, if rebuilding the second index is not an option
due to the large number of documents, then ParallelReader will
not work :-(
And I believe there is no other way than
ParallelReader to search across multiple indexes that
contain related data. Is there any other alternative?
I think, Multi
On Wed, Apr 30, 2008 at 10:52 PM, Rajesh parab <[EMAIL PROTECTED]> wrote:
> Can we somehow keep
> the internal document id the same after updating (i.e. delete
> and re-insert) an index document?
No. ParallelReader is not a general solution; it's an expert-level
solution that leaves the task of keeping t
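Why delete-and-re-insert breaks the alignment can be simulated with plain lists (a toy model of Lucene's dense, position-based docids, not Lucene code): re-adding an updated document renumbers it, so a parallel index keyed by docid no longer lines up.

```java
import java.util.ArrayList;
import java.util.List;

public class DocIdShiftDemo {
    public static void main(String[] args) {
        // Toy model: after merges, a Lucene docid is just the document's
        // current position in the index.
        List<String> primary = new ArrayList<>(List.of("A", "B", "C"));
        // Parallel index built when A, B, C had docids 0, 1, 2.
        List<String> parallel = new ArrayList<>(List.of("meta-A", "meta-B", "meta-C"));

        // "Update" B: delete it and re-insert the new version at the end.
        primary.remove("B");
        primary.add("B2");

        // B2 now has docid 2, but position 2 of the parallel index still
        // holds C's metadata -- the two readers are misaligned.
        int docIdOfB2 = primary.indexOf("B2");
        System.out.println(parallel.get(docIdOfB2)); // prints meta-C, not meta-B
    }
}
```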
On May 1, 2008, at 4:36 AM, esra wrote:
Hi,
The document's encoding is "UTF-8".
I tried the explain() method, and the result for the "د-ژ" range
search is:
fieldWeight(keywordIndex:ساب وو�ر in 0),
product of:
1.0 = tf(termFreq(keywordIndex:ساب وو�ر)=1)
0.30685282 = idf(do
Hi Esra,
Going back to the original problem statement, I see something that looks
illogical to me - please correct me if I'm wrong:
On Apr 30, 2008, at 3:21 AM, esra wrote:
> I am using Lucene's "IndexSearcher" to search the given XML by
> keyword, which contains Farsi information.
> While search
The issue here is a general one of trying to perform an efficient join between
an external resource (RDBMS) and Lucene.
This experiment may be of interest:
http://issues.apache.org/jira/browse/LUCENE-434
KeyMap.java embodies the core service, which translates from Lucene doc ids to
DB primary
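A sketch of the idea behind such a key map (a hypothetical shape, not the actual LUCENE-434 code): keep a docid-indexed array of primary keys, so a page of Lucene hits can be translated into DB keys for the join (e.g. a `WHERE pk IN (...)` query on the RDBMS side).

```java
public class DocIdToPkMap {
    // pk[docId] = database primary key for that Lucene document.
    // Built once per index generation, e.g. from a stored "pk" field,
    // and rebuilt whenever docids change (merges, deletes).
    private final long[] pk;

    DocIdToPkMap(long[] pk) { this.pk = pk; }

    // Translate one page of Lucene docids into DB primary keys.
    long[] translate(int[] docIds) {
        long[] keys = new long[docIds.length];
        for (int i = 0; i < docIds.length; i++) keys[i] = pk[docIds[i]];
        return keys;
    }

    public static void main(String[] args) {
        DocIdToPkMap map = new DocIdToPkMap(new long[] {1001L, 1002L, 1003L});
        long[] keys = map.translate(new int[] {2, 0});
        System.out.println(java.util.Arrays.toString(keys)); // prints [1003, 1001]
    }
}
```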
Hi,
The document's encoding is "UTF-8".
I tried the explain() method, and the result for the "د-ژ" range search is:
fieldWeight(keywordIndex:ساب وو�ر in 0), product of:
1.0 = tf(termFreq(keywordIndex:ساب وو�ر)=1)
0.30685282 = idf(docFreq=1)
1.0 = fieldNorm(field=keywordIndex,
Hi there,
We're using Lucene with Hibernate Search and we're very happy so far
with the performance and the usability of Lucene. We have, however, a
specific use case that prevents us from using only Lucene: spatial
queries. I already sent a mail to this list a while back about the
problem, and we starte
> Even if they're in multiple indexes, the doc IDs being ints
> will still prevent
> it from going past 2Gi unless you wrap your own framework around it.
Hm. Does this mean that a MultiReader has the int limit too?
I thought that this limit applied to a single index only...
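The limit is concrete: a Lucene docid is a Java int, and a MultiReader renumbers its sub-readers' docids into one combined int space, so any single reader view tops out at Integer.MAX_VALUE documents. Past that, int arithmetic simply wraps negative:

```java
public class DocIdLimitDemo {
    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE); // 2147483647, ~2.1 billion docs
        int next = Integer.MAX_VALUE + 1;      // silent two's-complement overflow
        System.out.println(next);              // -2147483648
    }
}
```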
Hi Steve,
Thanks for your reply. I know Farsi is written and read right-to-left.
I am using the RangeQuery class; its rewrite(IndexReader reader) method
decides whether a word is in the range or not via the compareTo method, and
this decision is made using Unicode code points.
While searching for the "د-ژ" range the lo
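That comparison can be checked directly (the sample terms below are my own, not esra's): the range test reduces to String.compareTo, which orders by UTF-16 code unit, so whether a Farsi term falls inside د (U+062F) .. ژ (U+0698) depends only on code-point order.

```java
public class FarsiRangeDemo {
    // True when term sorts between lower and upper inclusive, using the same
    // code-unit ordering that String.compareTo (and hence RangeQuery-era
    // Lucene term sorting) applies.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String lower = "\u062F"; // د
        String upper = "\u0698"; // ژ
        // "ساب" starts with س (U+0633), which lies between U+062F and U+0698.
        System.out.println(inRange("\u0633\u0627\u0628", lower, upper)); // prints true
        // "ی" (U+06CC) sorts above U+0698, so it is outside the range.
        System.out.println(inRange("\u06CC", lower, upper));             // prints false
    }
}
```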