any issues with the *PerThread classes?

2010-12-28 xu cheng
hi all
I noticed that there are plenty of *PerThread classes in trunk
http://svn.apache.org/repos/asf/lucene/dev/trunk/
while in the realtime_search branch
http://svn.apache.org/repos/asf/lucene/dev/branches/realtime_search/
the *PerThread classes are gone!
This confused me, because I'm new here.

What's the purpose of such a design? What's the advantage? Are there any
issues related to this?

Any suggestions or references are appreciated!
regards.
xu


Re: are the classes ending with PerThread (*PerThread) multithreaded?

2010-12-28 xu cheng
hi simon

Thanks very much for replying.

After reading the source code as you suggested, here's my understanding,
though I don't know whether it's right:

DocumentsWriter doesn't actually create threads, but the code that uses
DocumentsWriter can do the multithreading (say, several threads call
updateDocument). Each thread has its own DocumentsWriterThreadState, and
each DocumentsWriterThreadState has its own objects (the *PerThread
classes such as DocFieldProcessorPerThread, DocInverterPerThread and so on).

When the methods of DocumentsWriter are called by multiple threads, for
example 4 threads, there are 4 DocumentsWriterThreadState objects and 4
index chains (each index chain has its own *PerThread objects to
process the document).

Am I right?

Thanks again for replying!
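The arrangement described in this message (application threads call into the writer, and each caller gets its own thread-private state) can be sketched in Java as below. This is a minimal illustration, not Lucene's DocumentsWriter: the class names `PerThreadStateDemo`, `Writer` and `ThreadState` are hypothetical, and binding states to threads through a synchronized `HashMap` keyed by `Thread` is an assumption made for simplicity. What it shows is that nothing inside the writer subclasses `Thread` or implements `Runnable`; the per-thread objects are ordinary objects that only one caller thread ever touches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Lucene's DocumentsWriter): caller threads get thread-private state.
public class PerThreadStateDemo {

    /** Stand-in for a thread-private state object such as DocumentsWriterThreadState. */
    static final class ThreadState {
        int docsProcessed; // only ever modified by the thread that owns this state
    }

    /** Stand-in for the writer; it never creates threads of its own. */
    static final class Writer {
        // Assumption: states are bound to caller threads via a map guarded by this object's lock.
        private final Map<Thread, ThreadState> bindings = new HashMap<>();

        /** Returns the calling thread's state, creating it on first use. */
        synchronized ThreadState getThreadState() {
            return bindings.computeIfAbsent(Thread.currentThread(), t -> new ThreadState());
        }

        /** Runs entirely on the caller's thread. */
        void updateDocument(String doc) {
            ThreadState state = getThreadState();
            state.docsProcessed++; // no lock needed: no other thread uses this state
        }

        synchronized int threadStateCount() {
            return bindings.size();
        }

        synchronized int totalDocs() {
            int total = 0;
            for (ThreadState state : bindings.values()) {
                total += state.docsProcessed;
            }
            return total;
        }
    }

    /** Starts `threads` application threads that each index `docsPerThread` documents. */
    static Writer runDemo(int threads, int docsPerThread) {
        Writer writer = new Writer();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int d = 0; d < docsPerThread; d++) {
                    writer.updateDocument("doc" + d);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() also makes the workers' updates visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return writer;
    }

    public static void main(String[] args) {
        Writer writer = runDemo(4, 10);
        System.out.println(writer.threadStateCount() + " thread states, " + writer.totalDocs() + " docs");
    }
}
```

With 4 caller threads indexing 10 documents each, the sketch ends up with 4 thread states and 40 documents: one state per caller thread, created lazily on first use. A real writer would also have to decide what to do when very many threads arrive (for example, capping and sharing states), which this sketch ignores.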



2010/12/28 Simon Willnauer 

> Hey there,
>
> What you are looking at are classes that are created per thread
> rather than shared with other threads. Lucene internally rarely
> creates threads, subclasses Thread, or implements Runnable or Callable
> (ParallelMultiSearcher and some of the merging code are exceptions).
> Inside the indexer, when you add (update) a document, Lucene
> uses the caller's thread rather than spawning a new one. When you
> look at DocumentsWriter.java there should be a method called
> getThreadState. Each indexing thread, let's say in updateDocument, gets
> its thread-private DocumentsWriterThreadState. This thread state holds
> a DocConsumerPerThread obtained from the DocumentsWriter's DocConsumer
> (see the indexing chain). DocConsumerPerThread in that case is a
> kind of decorator that holds other DocConsumerPerThread instances like
> TermsHashPerThread etc.
>
> The general pattern is that for each DocConsumer you can get a
> DocConsumerPerThread for your indexing thread, which then consumes the
> document you are currently processing.
>
> I hope that helps
>
> simon
>
>
> On Tue, Dec 28, 2010 at 4:19 AM, xu cheng  wrote:
> > hi all:
> > I'm new to Lucene development.
> > These days I'm reading the source code in the index package
> > and I'm confused.
> > There are classes with the suffix PerThread, such as DocFieldProcessorPerThread,
> > DocInverterPerThread, TermsHashPerThread, FreqProxTermsWriterPerThread.
> > On this mailing list, I was told that they are multithreaded.
> > However, I have some difficulty understanding this!
> > I see no sign that they inherit from Thread, implement
> > Runnable, or anything similar.
> > How do they map to OS threads?
> > thanks ^_^
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
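Simon's description of the DocConsumer / DocConsumerPerThread pattern can be sketched as follows. This is a minimal illustration under stated assumptions: `Consumer`, `ConsumerPerThread`, `LeafConsumer`, `CompositeConsumer` and the `addThread` factory method are hypothetical names, not Lucene's classes, and the per-thread state is reduced to a `List<String>` log. The shared consumer chain is built once; each indexing thread asks it for its own PerThread object, and the composite's PerThread object decorates the PerThread objects of its children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Consumer / ConsumerPerThread pattern; not Lucene's actual classes.
public class ConsumerChainDemo {

    /** Shared consumer, built once per writer; hands out one PerThread object per indexing thread. */
    interface Consumer {
        ConsumerPerThread addThread(List<String> threadLog);
    }

    /** Thread-private half of a consumer; processes the current document on the caller's thread. */
    interface ConsumerPerThread {
        void processDocument(String doc);
    }

    /** Leaf of the chain: records which document it saw in the thread's log. */
    static final class LeafConsumer implements Consumer {
        private final String name;

        LeafConsumer(String name) {
            this.name = name;
        }

        @Override
        public ConsumerPerThread addThread(List<String> threadLog) {
            return doc -> threadLog.add(name + ":" + doc);
        }
    }

    /** Decorator: its PerThread object wraps the PerThread objects of its children. */
    static final class CompositeConsumer implements Consumer {
        private final List<Consumer> children;

        CompositeConsumer(List<Consumer> children) {
            this.children = children;
        }

        @Override
        public ConsumerPerThread addThread(List<String> threadLog) {
            List<ConsumerPerThread> childPerThreads = new ArrayList<>();
            for (Consumer child : children) {
                childPerThreads.add(child.addThread(threadLog));
            }
            // Forward each document to every child, in chain order.
            return doc -> {
                for (ConsumerPerThread child : childPerThreads) {
                    child.processDocument(doc);
                }
            };
        }
    }

    /** Builds a two-leaf chain and feeds two documents through one thread's PerThread object. */
    static List<String> runDemo() {
        List<Consumer> leaves = List.of(new LeafConsumer("inverter"), new LeafConsumer("storedFields"));
        Consumer chain = new CompositeConsumer(leaves);

        List<String> threadLog = new ArrayList<>(); // stands in for one thread's private state
        ConsumerPerThread perThread = chain.addThread(threadLog);
        perThread.processDocument("doc1");
        perThread.processDocument("doc2");
        return threadLog;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Running it prints `[inverter:doc1, storedFields:doc1, inverter:doc2, storedFields:doc2]`: each document passes through every child of the chain in order. A second indexing thread would call `addThread` again and receive its own, independent PerThread objects.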


are the classes ending with PerThread (*PerThread) multithreaded?

2010-12-27 xu cheng
hi all:
I'm new to Lucene development.
These days I'm reading the source code in the index package
and I'm confused.
There are classes with the suffix PerThread, such as DocFieldProcessorPerThread,
DocInverterPerThread, TermsHashPerThread, FreqProxTermsWriterPerThread.

On this mailing list, I was told that they are multithreaded.
However, I have some difficulty understanding this!
I see no sign that they inherit from Thread, implement
Runnable, or anything similar.

How do they map to OS threads?

thanks ^_^


Re: difficulties understanding the index chain

2010-12-26 xu cheng
hi Li Li
Thanks very much for your answer!

> To support multithreaded indexing, the PerThread classes are used.
Multiple threads doing what? Does each thread process one file, one
field, or something else?

regards


2010/12/27 Li Li 

> I am also interested in this question.
> And my understanding may be wrong.
>
>
> 2010/12/27 xu cheng :
> > Hi all:
> > I'm new to Lucene development. These days I'm reading the Lucene source code,
> > and I'm having some difficulty understanding the index chain.
> > I cannot understand the complex relationships between the classes!
> > For example,
> > I cannot understand the relations between these classes:
> >  DocFieldConsumerPerThread, DocFieldConsumerPerField, DocInvertedPerThread,
> > DocInverterPerThread..
> Because segments often have the same fields, the PerField classes are
> used to share common things.
> To support multithreaded indexing, the PerThread classes are used.
>
> See the code in DocumentsWriter:
>
>   static final IndexingChain DefaultIndexingChain = new IndexingChain() {
>
>     DocConsumer getChain(DocumentsWriter documentsWriter) {
>       /*
>       This is the current indexing chain:
>
>       DocConsumer / DocConsumerPerThread
>         --> code: DocFieldProcessor / DocFieldProcessorPerThread
>           --> DocFieldConsumer / DocFieldConsumerPerThread / DocFieldConsumerPerField
>             --> code: DocFieldConsumers / DocFieldConsumersPerThread / DocFieldConsumersPerField
>               --> code: DocInverter / DocInverterPerThread / DocInverterPerField
>                 --> InvertedDocConsumer / InvertedDocConsumerPerThread / InvertedDocConsumerPerField
>                   --> code: TermsHash / TermsHashPerThread / TermsHashPerField
>                     --> TermsHashConsumer / TermsHashConsumerPerThread / TermsHashConsumerPerField
>                       --> code: FreqProxTermsWriter / FreqProxTermsWriterPerThread / FreqProxTermsWriterPerField
>                       --> code: TermVectorsTermsWriter / TermVectorsTermsWriterPerThread / TermVectorsTermsWriterPerField
>                 --> InvertedDocEndConsumer / InvertedDocEndConsumerPerThread / InvertedDocEndConsumerPerField
>                   --> code: NormsWriter / NormsWriterPerThread / NormsWriterPerField
>               --> code: StoredFieldsWriter / StoredFieldsWriterPerThread / StoredFieldsWriterPerField
>       */
>
>       // Build up indexing chain:
>
>       final TermsHashConsumer termVectorsWriter = new TermVectorsTermsWriter(documentsWriter);
>       final TermsHashConsumer freqProxWriter = new FreqProxTermsWriter();
>
>       final InvertedDocConsumer termsHash = new TermsHash(documentsWriter, true, freqProxWriter,
>           new TermsHash(documentsWriter, false, termVectorsWriter, null));
>       final NormsWriter normsWriter = new NormsWriter();
>       final DocInverter docInverter = new DocInverter(termsHash, normsWriter);
>       return new DocFieldProcessor(documentsWriter, docInverter);
>     }
>   };
> > BTW, what's the advantage of using such a design, the so-called index
> > chain?
> I think it's because older versions of Lucene only supported single-threaded
> indexing, and they designed this architecture to reuse the existing code.
> > Are there any docs about this?
> If you can read Chinese, you may find some useful articles here:
> http://forfuture1978.javaeye.com/
> But I think reading the code is very helpful.
> > Any suggestions or references are appreciated! Thanks.
> > regards.
>
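Li Li's point that PerField objects share common state can be illustrated with a small sketch: a thread-private object keeps one PerField object per field name and reuses it for every later document that has the same field, while the PerThread level keeps that state away from other indexing threads. The names `PerFieldReuseDemo`, `PerThread`, `PerField` and the `valuesSeen` counter are hypothetical, and documents are simplified to a `Map` from field name to value; this is not how Lucene itself represents documents.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one PerField object per field name, cached inside a thread-private object.
public class PerFieldReuseDemo {

    /** State for one field within one thread (a real indexer would keep things like buffered terms here). */
    static final class PerField {
        final String fieldName;
        int valuesSeen;

        PerField(String fieldName) {
            this.fieldName = fieldName;
        }
    }

    /** Thread-private object: creates PerField objects lazily and reuses them across documents. */
    static final class PerThread {
        private final Map<String, PerField> fields = new HashMap<>();
        int perFieldCreated;

        void processDocument(Map<String, String> doc) {
            for (String fieldName : doc.keySet()) {
                PerField perField = fields.get(fieldName);
                if (perField == null) {
                    // First time this thread sees the field: allocate its state once.
                    perField = new PerField(fieldName);
                    fields.put(fieldName, perField);
                    perFieldCreated++;
                }
                perField.valuesSeen++;
            }
        }

        int valuesSeen(String fieldName) {
            PerField perField = fields.get(fieldName);
            return perField == null ? 0 : perField.valuesSeen;
        }
    }

    /** Feeds three small documents, two of which share both field names, through one PerThread. */
    static PerThread runDemo() {
        PerThread perThread = new PerThread();
        perThread.processDocument(Map.of("title", "Lucene", "body", "index chain"));
        perThread.processDocument(Map.of("title", "PerThread", "body", "classes"));
        perThread.processDocument(Map.of("title", "PerField"));
        return perThread;
    }

    public static void main(String[] args) {
        PerThread perThread = runDemo();
        System.out.println(perThread.perFieldCreated + " PerField objects for 3 documents");
    }
}
```

For the three documents in `runDemo`, only 2 PerField objects are created (`title` and `body`); `title` is seen 3 times and `body` twice. Because the map lives inside a PerThread object, no locking is needed to look up or update a field's state.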


difficulties understanding the index chain

2010-12-26 xu cheng
Hi all:
I'm new to Lucene development. These days I'm reading the Lucene source code,
and I'm having some difficulty understanding the index chain.
I cannot understand the complex relationships between the classes!
For example, I cannot understand the relations between these classes:
 DocFieldConsumerPerThread, DocFieldConsumerPerField, DocInvertedPerThread,
DocInverterPerThread..

BTW, what's the advantage of using such a design, the so-called index
chain?

Are there any docs about this?

Any suggestions or references are appreciated! Thanks.

regards.


what's the difference between Hits and TopDocs?

2010-12-21 xu cheng
hi all:
I noticed that once upon a time, long long ago... IndexSearcher.search
returned Hits, and now the returned object is TopDocs?
What's the difference?
Any answers and references are appreciated. Thanks.