I am using MMapDirectory in Lucene. My index is small (about 3GB
on disk) and I have plenty of memory available. The problem is that
when a term is first queried, it's slow. How can I "load" the whole
directory into memory? One solution is to use many queries to "warm" it
up. But I can't query all ter
I have an index of about 30 million short strings; the index size is
about 3GB on disk.
I have given the JVM 5GB of memory with default settings, on Ubuntu 12.04 with Sun JDK 7.
With 20 threads it's OK. But if I run 30 threads, after a
while the JVM does nothing but GC.
The Lucene version is 4.10.0.
as it works well
> with other much more demanding projects.
> ------ Original Mail ------
> From: "Li Li";
> Sent: Thursday, August 16, 2012, 9:59 AM
> To: "java-user";
>
> Subject: Re: Why does this query slow down Lucene?
How slow is it? Are all your searches slow, or only that query? How
many docs are indexed, and what is the size of the indexes? What's the
hardware configuration?
You should describe it clearly to get help.
On 2012-8-16 9:28 AM, "zhoucheng2008" wrote:
> Hi,
>
>
> I have the string "$21 a Day Once a Month" to
hi everyone,
in Lucene 4.0 alpha, I found that DocValues are available and gave
them a try. I am following the slides at
http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
I have got 2 questions.
1. is DocValues updatable now?
2. How
flush is not commit.
On Thu, Jun 28, 2012 at 2:42 PM, Aditya wrote:
> Hi Ram,
>
> I guess IndexWriter.SetMaxBufferedDocs will help...
>
> Regards
> Aditya
> www.findbestopensource.com
>
>
> On Wed, Jun 27, 2012 at 11:25 AM, Ramprakash Ramamoorthy <
> youngestachie...@gmail.com> wrote:
>
>> Dear,
Analyzer works, but I don't understand why
> StandardAnalyzer does not work, given that according to the ChineseAnalyzer
> deprecation note I should use StandardAnalyzer:
>
> @deprecated Use {@link StandardAnalyzer} instead, which has the same
> functionality.
>
> It is very annoying.
>
On Thu, Jun 28, 2012 at 11:14 AM, wangjing wrote:
> thanks
>
> could you help me to solve another problem,
>
> why lucene will reset lastDocID = 0 when finish add one doc?
it will not call finish after adding a document
reading the JavaDoc of FormatPostingsDocsConsumer
/** Called when w
lastDocID represents the last document which contains this term.
Because it will reuse this FormatPostingsDocsConsumer, you need to
clear all member variables in the finish method.
For Chinese, the standard analyzer will segment each character into its own
token; you should use the whitespace analyzer, or your own analyzer that can
keep it as one token, for wildcard search.
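The difference is easy to see even without Lucene; roughly, per-character tokens (what StandardAnalyzer produces for CJK text) versus one whitespace-delimited token (a plain-Java illustration, not analyzer code):

```java
public class Tokenize {
    public static void main(String[] args) {
        String s = "中文查询"; // a sample Chinese query string
        // Per-character tokens, roughly what StandardAnalyzer yields for CJK text:
        for (int i = 0; i < s.length(); i++) {
            System.out.println(s.charAt(i));
        }
        // Whitespace tokenization keeps the whole string as one token,
        // so a wildcard query such as 中文* has something to match against:
        for (String tok : s.split("\\s+")) {
            System.out.println(tok);
        }
    }
}
```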
On 2012-6-27 6:20 PM, "Paco Avila" wrote:
> Hi there,
>
> I have to index chinese content and I don't get the expected results when
>
what do you want to do?
1. sort all matched docs by field A.
2. sort all matched docs by relevance score, select the top 100 docs, and
then sort by field A
On Wed, Jun 27, 2012 at 1:44 PM, Yogesh patel
wrote:
> Thanks for reply Ian ,
>
> But I just gave a supposed document number... I have a 2-3 GB index
trying a 2nd multi-valued field with exactly (or closer to) what
> you need in it.
>
> -Paul
>
> > -Original Message-
> > From: Li Li [mailto:fancye...@gmail.com]
> > our old map implementation used about 10 ms, while the newer one is 40
> > ms. the reason i
ow,
> no additional fields or field caches), and retrieving document would
> be fast enough simply because all data is in RAM.
>
>
> On Fri, Jun 22, 2012 at 3:56 AM, Li Li wrote:
> > use collector and field cache is a good idea for ranking by certain
> > field'
r that special query.
>
> I hope that helps,
>
> -Paul
>
>> -Original Message-
>> From: Li Li [mailto:fancye...@gmail.com]
>> but as l can remember, in 2.9.x FieldCache can only apply to indexed but not
>> analyzed fields.
>
--
u are on 4.x you can use DocTermOrds
> (FieldCache.getDocTermOrds) which allows for multiple tokens per
> field.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jun 20, 2012 at 9:47 AM, Li Li wrote:
>> but as l can remember, in 2.9.x FieldCache can only ap
However if you are building result set in your own Collector, using
> FieldCache is quite straight forward.
>
>
> On Wed, Jun 20, 2012 at 3:49 PM, Li Li wrote:
> > hi all
> >I need to return certain fields of all matched documents quickly.
> > I am now using Document.
hi all
I need to return certain fields of all matched documents quickly.
I am now using Document.get(field), but the performance is not good
enough. Originally I used a HashMap to store these fields; it's much
faster, but I have to maintain two storage systems. Now I am
reconstructing this project.
>>> Lots of good tips in
>>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, linked from
>>> the FAQ.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Tue, May 22, 2012 at 2:08 AM, Li Li wrote:
>>> > something wrong wh
is not fully used, you can do this on one physical machine
On 2012-5-22 8:50 AM, "Li Li" wrote:
>
>
> On 2012-5-22 4:59 AM, "Yang" wrote:
>
> >
> > I'm trying to make my search faster. right now a query like
> >
> > name:Joe Moe Pizza address:77 m
On 2012-5-22 4:59 AM, "Yang" wrote:
>
> I'm trying to make my search faster. right now a query like
>
> name:Joe Moe Pizza address:77 main street city:San Francisco
>is this a conjunction query or a disjunction query?
> in a index with 20mil such short business descriptions (total size about
3GB) take
What do you mean by the performance of storage?
Lucene just stores all fields of a document (or columns of a row, in
DB terms) together. It can only store strings; you can't store int or long
(unless you convert them to strings). Retrieving a given field of a
document will cause many IO operations. it's des
> On Fri, May 11, 2012 at 11:01 AM, Li Li wrote:
>> I have some french hotels such as Elysée Etoile
>> But for many of our users, then can't type French letters, so they
>> will type Elysee Etoile
>> is there any analyzer can do this? thanks.
>>
>>
I have some French hotels such as Elysée Etoile.
But many of our users can't type French letters, so they
will type Elysee Etoile.
Is there any analyzer that can do this? Thanks.
-
To unsubscribe, e-mail: java-user-unsubscr...@
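Lucene's ASCIIFoldingFilter does this kind of folding; the core idea can be sketched with plain JDK Unicode normalization (decompose accented characters, then strip the combining marks):

```java
import java.text.Normalizer;

public class Fold {
    // Decompose accented characters (NFD) and drop the combining marks,
    // so "Elysée" becomes "Elysee".
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("Elysée Etoile")); // prints "Elysee Etoile"
    }
}
```

Applying the same folding at both index and query time makes "Elysee" match "Elysée".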
But this only gets (term1 OR term2 OR term3 ...). You can't
implement (term1 OR term2 ...) AND (term3 OR term4) by this method.
Maybe you should write your own Scorer to deal with this kind of query.
On Tue, May 8, 2012 at 9:44 PM, Li Li wrote:
> disjunction query is much slo
ohol)".So,
> the long query happens... I hope I have described the question
> clearly.
> At 2012-05-08 18:44:13,"Li Li" wrote:
>>a disjunction (or) query of so many terms is indeed slow.
>>can you describe your real problem? why do you need the disjunction
a disjunction (OR) query of so many terms is indeed slow.
Can you describe your real problem? Why do you need the disjunction
results of so many terms?
On Sun, May 6, 2012 at 9:57 PM, qibaoy...@126.com wrote:
> Hi,
> I met a problem about how to search many keywords in about 5,000,000
> do
what's your analyzer?
if you use standard analyzer, I think this won't happen.
if you want to get exact match of name field, you should index this
field but not analyze it.
On Mon, May 7, 2012 at 11:59 AM, Yogesh patel
wrote:
> Hi
>
> I am using lucene for search implementation .
> I have create
stemmer
"semantic" is a big word; be careful using it.
On Sat, Apr 28, 2012 at 11:02 AM, Kasun Perera wrote:
> I'm using Lucene's Term Freq vector to calculate cosine similarity between
> documents, Say my docments has these 3 terms, "owe" "owed" "owing". Lucene
> takes this as 3 separate terms, but
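Stemming first collapses "owe"/"owed"/"owing" into one term; the cosine similarity itself is then computed over the term-frequency vectors. A generic sketch (plain Java, not the Lucene term-vector API):

```java
import java.util.HashMap;
import java.util.Map;

public class Cosine {
    // Cosine similarity between two term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += (double) e.getValue() * e.getValue();
            Integer f = b.get(e.getKey());
            if (f != null) dot += (double) e.getValue() * f;
        }
        for (int f : b.values()) nb += (double) f * f;
        return dot == 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = new HashMap<>(), d2 = new HashMap<>();
        d1.put("owe", 3); // after stemming, all three surface forms collapse to "owe"
        d2.put("owe", 1);
        d2.put("debt", 1);
        System.out.println(cosine(d1, d2)); // > 0 because "owe" is shared
    }
}
```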
hat contain only one of them.
>
> Ákos
>
>
>
> On Fri, Apr 27, 2012 at 5:17 AM, Li Li wrote:
>
>> sorry for some typos.
>> original query +(title:hello desc:hello) +(title:world desc:world)
>> boosted one +(title:hello^2 desc:hello) +(title:world^2 d
On Thu, Apr 26, 2012 at 5:13 AM, Yang wrote:
>
> I read the paper by Doug "Space optimizations for total ranking",
>
> since it was written a long time ago, I wonder what algorithms lucene uses
> (regarding postings list traversal and score calculation, ranking)
>
>
> particularly the total rankin
has two terms. if it has more terms, the query will become too
complicated.
On Fri, Apr 27, 2012 at 11:12 AM, Li Li wrote:
> you should describe your ranking strategy more precisely.
> if the query has 2 terms, "hello" and "world" for example, and your
> search fie
you should describe your ranking strategy more precisely.
if the query has 2 terms, "hello" and "world" for example, and your search
fields are title and description. There are many possible combinations.
Here is my understanding.
Both terms should occur in title or desc
query may be +(title:
jira/browse/LUCENE-2686 is absolutely useful
> (with one small addition, I'll post it in comments soon). By using it I
> have disjunction summing query with steady subscorers.
>
> Regards
>
> On Tue, Apr 17, 2012 at 2:37 PM, Li Li wrote:
>
>> hi all,
>> I am now
s.apache.org/jira/browse/LUCENE-2686 is absolutely useful
> (with one small addition, I'll post it in comments soon). By using it I
> have disjunction summing query with steady subscorers.
>
> Regards
>
> On Tue, Apr 17, 2012 at 2:37 PM, Li Li wrote:
>
>> hi all,
some mistakes in the example:
after the first call advance(5), currentDoc=6
the first scorer's nextDoc has been called in advance; the heap is empty now.
then we call advance(6):
because scorerDocQueue.size() < minimumNrMatchers, it just returns
NO_MORE_DOCS
On Tue, Apr 17, 2012 at 6:37 P
hi all,
I am now hacking BooleanScorer2 to make it keep the docID() of the
leaf scorers (most likely TermScorers) the same as the top-level Scorer's.
The reason I want to do this: when I collect a doc, I want to know which term
matched (especially for BooleanClauses whose Occur is SHOULD). we ha
. The JoinUtil can be used to do join filtering and the
> block join is more meant for parent / child search.
>
> Martijn
>
> On 23 March 2012 11:58, Li Li wrote:
>
> > thank you. is there any search-time join example?
> > I can only find a JoinUtil in package org.
parent document
> is first, not last, in the group. Try putting the parent (shirt)
> document last in each case instead...
>
> Query-time join is already committed to trunk and 3.x, so it'll be in
> 3.6.0/4.0.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
hi all,
I read these two articles
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html,
http://blog.mikemccandless.com/2012/01/tochildblockjoinquery-in-lucene.htmland
wrote a test program. But it seems there is some problem: it ends up in an
endless loop.
Here is my pro
it's a very common problem. many of our users (including programmers
familiar with SQL) have the same question.
Compared with SQL, all queries in Lucene are based on the inverted index.
Fortunately, when searching, we can provide a Filter.
from the source code of the searchWithFilter function
we can
Document ids are subject to change: every segment's document ids start
from zero, and after a merge document ids will also change.
On Mon, Mar 5, 2012 at 12:31 AM, Benson Margulies wrote:
> I am walking down the document in an index by number, and I find that
> I want to update one. The u
if you want to identify a document, you should use a field such as url as
Unique Key in solr
On Mon, Mar 5, 2012 at 12:31 AM, Benson Margulies wrote:
> I am walking down the document in an index by number, and I find that
> I want to update one. The updateDocument API only works on queries and
>
I think many users of Lucene use large memory because a 32-bit system's memory
is too limited (Windows 1.5GB, Linux 2-3GB). The only notable thing is
*compressed oops*: some say it's useful, some not. You should give it a
try.
On Thu, Mar 1, 2012 at 4:59 PM, Ganesh wrote:
> Hello all,
>
> Is
In particular the Diagrams tab?
>
> Thanks for your time.
>
> Regards,
> Vineet
> --
> Founder, Architexa - www.architexa.com
> Understand & Document Code In Seconds
>
>
>
> On Wed, Feb 29, 2012 at 10:21 PM, Li Li wrote:
>
> > great website. though I use
Great website. Though I'm used to Eclipse, which provides good tools for
reading code, this may be useful when I don't have a development environment
outside the office.
One problem: there is no search function like Eclipse's Ctrl+Shift+T, which
searches for classes with queries like '*Reader'.
another
you can delete by query like -category:category1
On Sun, Feb 19, 2012 at 9:41 PM, Li Li wrote:
> I think you could do as follows. taking splitting it to 3 indexes for
> example.
> you can copy the index 3 times.
> for copy 1:
> for (int i = 0; i < reader1.maxDoc(); i++) {
>   if (i % 3 != 0) reader1.deleteDocument(i); // keep every third doc in copy 1
> }
I think you could do as follows, taking splitting into 3 indexes as an
example.
you can copy the index 3 times.
for copy 1
for(int i=0;i
For now Lucene doesn't provide anything like this.
Maybe you can diff each version before adding them to the index, so it just
indexes and stores the difference for each newer version.
On Wed, Feb 15, 2012 at 4:25 PM, Jamie wrote:
> Greetings All.
>
> I'd like to index data corresponding to different versions
for 2.x and 3.x you can simply use code like this:
Directory dir = FSDirectory.open(new File("./testindex"));
IndexReader reader = IndexReader.open(dir);
List<String> urls = new ArrayList<String>(reader.numDocs());
for (int i = 0; i < reader.numDocs(); i++) {
  urls.add(reader.document(i).get("url")); // assuming the URL is stored in a field named "url"
}
> Hi there,
>
> I am currently working on a search engine based on lucen
it's up to your machines. in our application, we index about
30,000,000 (30M) docs/shard, and the response time is about 150ms. our
machine has about 48GB of memory; about 25GB is allocated to Solr and the
rest is used for disk cache in Linux.
if calculated from our application, indexing 1.25T docs will
docNum * indexedFieldsNum * 1 byte
You should disable norms for indexed fields which are not used for relevance ranking.
On Sun, Sep 18, 2011 at 5:20 AM, roz dev wrote:
> Hi,
>
> I want to estimate the size of NORM file that lucene will generate for a 20
> Gb index which has 2.5 Million Docs and 50 fields
hi all,
I am using spellcheck in Solr 1.4. I found that spellcheck is not
implemented the way SolrCore is: SolrCore uses reference counting to track
the current searcher, so oldSearcher and newSearcher can both exist if oldSearcher
is servicing some query. But in FileBasedSpellChecker
public void bu
It will affect the entire index because it's a parameter of IndexWriter,
but you can modify it any time you like before IndexWriter.addDocument.
If you want to truncate different fields with different max lengths, you
should avoid race conditions between threads.
maybe you can add a TokenFilter t
hi all
I am interested in vertical crawlers. But it seems this project is not
very active; its last update was 11/16/2009.
g functions, such as BM25 or Language Model, rather
> than Lucene's original ranking function? Thank you.
>
> On Fri, Aug 19, 2011 at 2:37 PM, Li Li wrote:
>
>> if there are only text information, your "video search" is just normal full
>> text search. but
if there are only text information, your "video search" is just normal full
text search. but I think you should consider more on ranking, facet search
etc.
On Fri, Aug 19, 2011 at 1:05 PM, Lei Pang wrote:
> Hi everyone, I want to use Lucene to retrieve videos through their meta
> data: title, de
archblox
>
> On Monday, July 4, 2011, Li Li wrote:
>> hi all,
>> I want to provide full text searching for some "small" websites.
>> It seems cloud computing is popular now. And it will save costs
>> because it don't need employ engineer to maintain
hi all,
I want to provide full text searching for some "small" websites.
It seems cloud computing is popular now. And it will save costs
because it doesn't require employing an engineer to maintain
the machine.
For now, there are many services such as amazon s3, google app
engine, ms azure etc. I am
A merge will also change docids;
all segments' docids begin with 0.
2011/3/30 Trejkaz :
> On Tue, Mar 29, 2011 at 11:21 PM, Erick Erickson
> wrote:
>> I'm always skeptical of storing the doc IDs since they can
>> change out from underneath you (just delete even a single
>> document and optimize).
>
> W
and also try using compound files (cfs)
2011/3/23 Vo Nhu Tuan :
> Hi,
>
> Can someone help me with this problem please? I got these when running my
> program:
>
> java.io.FileNotFoundException:
> /Users/vonhutuan/Documents/workspace/InformationExtractor/index_wordlist/_i82.frq
> (Too many open
use lsof to count the number of open files, and
ulimit to modify the limit. You may need to ask an administrator to modify limits.conf.
2011/3/23 Vo Nhu Tuan :
> Hi,
>
> Can someone help me with this problem please? I got these when running my
> program:
>
> java.io.FileNotFoundException:
> /Users/vonhutuan/Docume
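For example (the shell's own PID is used here; in practice point lsof at the JVM process):

```shell
# Count files currently opened by this shell; in practice use the JVM's PID.
lsof -p $$ | wc -l
# Show the soft limit on open files for this session.
ulimit -n
```

A persistent increase usually goes in /etc/security/limits.conf (the nofile entry).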
I used plain text and sent successfully. thanks.
2011/3/11 Erick Erickson :
> What mail client are you using? I also had this problem and it's
> solved in Gmail by sending the mail as "plain text" rather than
> "Rich formatting".
>
> Best
> Erick
>
I don't use any client but browser.
2011/3/11 Erick Erickson
> What mail client are you using? I also had this problem and it's
> solved in Gmail by sending the mail as "plain text" rather than
> "Rich formatting".
>
> Best
> Erick
>
>
parsers when crawling, and save only the parsed result.
HtmlUnit is also a good tool for this purpose; it supports JavaScript
and can parse web pages.
2011/3/11 shrinath.m
> Thank you Li Li.
>
> Two questions :
>
> 1. Is there anything *in* *Lucene* that I need to know of ? some cont
http://java-source.net/open-source/html-parsers
2011/3/11 shrinath.m
> I am trying to index content withing certain HTML tags, how do I index it ?
> Which is the best parser/tokenizer available to do this ?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Which-is-the-
bility of master. we want to use
some synchronization mechanism so that only 1 or 2 ReplicationHandler
threads run the CMD_GET_FILE command at a time.
Is that solution feasible?
2011/3/11 Li Li
> hi
> it seems my mail is judged as spam.
> Technical details of permanent failure:
>
hi
it seems my mail is judged as spam.
Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the recipient
domain. We recommend contacting the other email provider for further
information about the cause of this error. The error that the other
it's indeed very slow, because it does collapsing over all matched documents.
We tackled this problem by collapsing only the top 100 documents.
2011/3/6 Mark
> I'm familiar with Deduplication however I do not wish to remove my
> duplicates and my needs are slightly different. I would like to mark the
it's the problem of near-duplicate detection. there are many papers
addressing this problem; methods like simhash are used.
2011/3/5 Mark
> Is there a way one could detect duplicates (say by using some unique hash
> of certain fields) and marking a document as a duplicate but not remove it.
>
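A minimal simhash sketch over whitespace tokens (illustrative only; real systems use shingling and better token hashes):

```java
public class SimHash {
    // 64-bit simhash: each token's hash votes per bit; keep the majority sign.
    static long simhash(String text) {
        int[] votes = new int[64];
        for (String tok : text.toLowerCase().split("\\s+")) {
            // Spread the 32-bit String hash over 64 bits (illustrative, not cryptographic).
            long h = tok.hashCode() * 0x9E3779B97F4A7C15L;
            for (int i = 0; i < 64; i++) {
                votes[i] += ((h >>> i) & 1) == 1 ? 1 : -1;
            }
        }
        long sig = 0;
        for (int i = 0; i < 64; i++) {
            if (votes[i] > 0) sig |= 1L << i;
        }
        return sig;
    }

    // Near-duplicate documents yield signatures with a small Hamming distance.
    static int hamming(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long a = simhash("the quick brown fox jumps over the lazy dog");
        long b = simhash("the quick brown fox jumped over the lazy dog");
        // Near-duplicates land well below the ~32 bits expected for unrelated texts.
        System.out.println(hamming(a, b));
    }
}
```

Storing the signature in a field lets you flag (rather than delete) documents whose distance to an earlier signature is below a threshold.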
thank you. I got it.
2011/2/16 Chris Hostetter :
>
> : I used to receive the email myself because I subscribe the maillist.
> : but recently if I post a email to the maillist, I can't receive the
> : email posted by me. So I thought I failed to post this email.
>
> I notice you are using gmail --
; mean? Bounced as spam? rejected for other
> reasons? This question came through so obviously you can post
> something
>
> I found that sending mail as "plain text" kept the spam filter
> from kicking in.
>
> Best
> Erick
>
> On Tue, Feb 15, 2011 at 7:
hi all
is there any limit to post email to this maillist now? thanks
Do you mean getting the tf of the hit documents when searching?
It's a difficult problem, because only TermScorer has TermDocs and uses tf
in its score() function.
But there we can't know whether a doc is selected, because we use a
priority queue in TopScoreDocCollector
public void collect(int doc) th
I don't understand your problem well, but knowing when a new
term occurs is a hard problem, because when a new document is added, it
will be added to a new segment. I think you can only do this in the
last merge of the optimization stage. You can read the code in
SegmentMerger.mergeTermInfos(). I
I think you can read the code of Solr.
I guess you can implement a Collector to get all hit docs into a
DocSet (bitset), load the facet fields into memory (doc id -> field
value), then loop over the DocSet to count.
2010/8/30 fulin tang :
> we are building a search system on top of lucene, and we are
han deserializing the index.
>
> What speed do you see if you only load 10% (7k)?
>
> Did you see the graphics in the package level javadocs?
> http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/store/instantiated/package-summary.html
>
>
> karl
>
>
I have about 70k documents; the total indexed size is about 15MB (the
original text files' size).
dir = new RAMDirectory();
IndexWriter writer = new IndexWriter(dir, ...);
for (loop) {
    writer.addDocument(doc);
}
writer
> Though, how many docs are you "typically" retrieving per search?
>
> Mike
>
> On Thu, Aug 5, 2010 at 3:37 AM, Li Li wrote:
>> hi all
>> we analyze system call of lucene and find that the fdx file is
>> always read when we get field values. In my appli
hi all
we analyzed the system calls of Lucene and found that the fdx file is
always read when we get field values. In my application the fdt is
about 50GB and the fdx is about 120MB. I think it may be beneficial to load
the fdx into memory, just like tii. Has anyone else tried this?
I found that in the system calls Java makes when reading files, the buffer size is
always 1024. Can I modify this value to reduce system calls?
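The 1024-byte reads most likely come from the default buffer of Lucene's buffered IndexInput. The general effect of a larger buffer can be shown with plain java.io (the 64 KB size is just an example):

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BigBuffer {
    // Read a file through a buffer of the given size and return total bytes read.
    static long readAll(File f, int bufferSize) throws IOException {
        long total = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(f), bufferSize)) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".bin");
        f.deleteOnExit();
        try (OutputStream out = new FileOutputStream(f)) {
            out.write(new byte[256 * 1024]); // 256 KB test file
        }
        // With a 64 KB buffer the kernel sees ~4 read() calls instead of ~256 at 1 KB.
        System.out.println(readAll(f, 64 * 1024)); // prints 262144
    }
}
```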
lucene in action 2nd ed. is a good book
2010/7/28 Yakob :
> hello everyone,
> I am starting to understand lucene in java and I am having a hard time
> in implementing it.
> I am trying to develop a java application that can do indexing,
> searching and whatnot. and using lucene framework is one of
Or where can I find any improvement proposals for Lucene?
e.g. I want to change the floating-point multiplication to integer
multiplication, or use bitmaps for high-frequency terms, or something
else like this. Is there any place where I can find resources or
people for this?
thanks.
I want to cache full text into memory to improve performance.
Full text is only used for highlighting in my application (but it's very
time-consuming: my average query time is about 250ms, and I guess it would cost
about 50ms if I just fetched the top 10 full texts. Things get worse when fetching
more full text because i
we use the distributed search so don't tend
> to need to merge very large indexes anyway.
> when your system grows / you go into production you'll probably split
> the indexes too to use solr's distributed search func. for the sake of
> query speed).
>
> hope that he
I used to store the full text in the Lucene index. But I found it's very
slow when merging the index, because merging 2 segments copies the
fdt files into a new one. So I want to only index the full text. But when
searching I need the full text for applications such as highlighting and
viewing the full text. I can s
Thank you.
On July 7, 2010 at 10:53 AM, jg lin wrote:
> Ask in QQ group 18038594; I don't know the answer to your question.
>
> Li Li wrote on July 7, 2010 at 10:48 AM:
>
>> Yes.
>> On July 7, 2010 at 10:46 AM, jg lin wrote:
>> > Can you speak Chinese? (⊙_⊙)
>> >
>> > 2010/7/7 Li Li
>> >
>> >> -- Forwarded message
Yes.
On July 7, 2010 at 10:46 AM, jg lin wrote:
> Can you speak Chinese? (⊙_⊙)
>
> 2010/7/7 Li Li
>
>> -- Forwarded message ----------
>> From: Li Li
>> Date: 2010/7/7
>> Subject: index format error because disk full
>> To: solr-u...@lucene.apache.org
>>
>>
-- Forwarded message --
From: Li Li
Date: 2010/7/7
Subject: index format error because disk full
To: solr-u...@lucene.apache.org
the index files are ill-formatted because the disk filled up during feeding. Can I
roll back to the last version? Is there any method to avoid unexpected
errors when
it is said that "At a few thousand ~160 characters long documents
InstantiatedIndex outperforms RAMDirectory some 50x, 15x at 100
documents of 2000 characters length, and is linear to RAMDirectory at
10,000 documents of 2000 characters length. ". I have an index of
about 8,000,000 documents and th
I want to use the fast highlighter in Solr 1.4 and found an issue at
https://issues.apache.org/jira/browse/SOLR-1268
Attachment: SOLR-1268.patch, attached 2010-02-05 10:32 PM by Koji Sekiguc
hi all
when using the highlighter, we must provide a TokenStream and the
original text. To get a TokenStream, we can either re-analyze the
original text or use the saved TermVector to reconstruct it.
In my application, highlighting costs 200-300ms on average, and I
want to optimize it to below 100ms.
I want to override the TermScorer.score() method to take position info
into account when scoring, e.g. any occurrence whose position is less than 100 will
get a boost
The original score method:
public float score() {
  assert doc != -1;
  int f = freqs[pointer];
  float raw = f < SCORE_CACHE_SIZE
      ? scoreCache[f]
      : getSimilarity().tf(f) * weightValue; // tf(f) * term weight
  return norms == null ? raw : raw * SIM_NORM_DECODER[norms[doc] & 0xFF];
}
.org/java/2_0_0/api/org/apache/lucene/search/Disjunction
> MaxQuery.html.
>
> Itamar.
>
> -----Original Message-
> From: Li Li [mailto:fancye...@gmail.com]
> Sent: Tuesday, June 01, 2010 11:42 AM
> To: java-user@lucene.apache.org
> Subject: What's Disjunc
olve?
>
> Best
> Erick
>
> On Wed, Jun 2, 2010 at 8:54 PM, Li Li wrote:
>
>> such as the detailed process of store data structures, index, search
>> and sort. not just apis. thanks.
>>
>> ---
such as the detailed processes of the storage data structures, indexing, search,
and sorting, not just the APIs. thanks.
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.
in javadoc
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#formula_norm
norm(t,d) = doc.getBoost() · lengthNorm(field) · ∏ f.getBoost()
            (the product runs over all fields f in d named t)
where does field come from in leng
thank you
2010/6/2 Rebecca Watson :
> Hi Li Li
>
> If you want to support some query types and not others you should
> overide/extend the queryparser so that you throw an exception / makes
> a different query type instead.
>
> Similarity doesn't do the actual scori