Re: is the classes ended with PerThread(*PerThread) multithread
Hey there,

what you are looking at are classes that are instantiated once per thread rather than shared between threads. Lucene internally rarely creates threads or subclasses Thread, Runnable or Callable (ParallelMultiSearcher and some of the merging code are exceptions). Instead, when you add (update) a document, the indexer uses the caller's thread rather than spawning a new one.

When you look at DocumentsWriter.java there should be a method called getThreadState. Each indexing thread, let's say in updateDocument, gets its thread-private DocumentsWriterThreadState. This thread state holds a DocConsumerPerThread obtained from the DocumentsWriter's DocConsumer (see the indexing chain). The DocConsumerPerThread in that case is a kind of decorator that holds other per-thread consumers like TermsHashPerThread etc.

The general pattern is: for each DocConsumer you can get a DocConsumerPerThread for your indexing thread, which then consumes the document you are processing right now.

I hope that helps

simon

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
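As a rough illustration of the pattern in Simon's reply (the writer creates no threads; each calling thread gets its own private state), here is a minimal Java sketch. All class and method names except getThreadState and updateDocument are hypothetical stand-ins, and the map-based lookup is an assumption made for the example, not a copy of how DocumentsWriter binds thread states.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT Lucene's actual classes.
public class PerThreadSketch {

    /** Per-thread consumer: only ever touched by the thread that owns it. */
    static class ConsumerPerThread {
        final String ownerName = Thread.currentThread().getName();
        int docsProcessed;

        void processDocument(String doc) {
            docsProcessed++; // no locking needed: this state is thread-private
        }
    }

    /** Shared writer: it starts no threads, it runs on whichever thread calls it. */
    static class Writer {
        private final Map<Thread, ConsumerPerThread> threadStates = new HashMap<>();

        /** Returns the calling thread's private state, creating it on first use. */
        synchronized ConsumerPerThread getThreadState() {
            return threadStates.computeIfAbsent(
                    Thread.currentThread(), t -> new ConsumerPerThread());
        }

        void updateDocument(String doc) {
            getThreadState().processDocument(doc);
        }

        synchronized int stateCount() {
            return threadStates.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Writer writer = new Writer();
        Thread[] callers = new Thread[4];
        for (int i = 0; i < callers.length; i++) {
            // The application, not the writer, decides how many threads exist.
            callers[i] = new Thread(() -> {
                for (int d = 0; d < 100; d++) {
                    writer.updateDocument("doc " + d);
                }
            }, "indexer-" + i);
            callers[i].start();
        }
        for (Thread t : callers) {
            t.join();
        }
        System.out.println("thread states: " + writer.stateCount());
    }
}
```

Only the lookup in getThreadState is synchronized; the per-document work runs without locks because no other thread ever sees that ConsumerPerThread instance.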
Re: is the classes ended with PerThread(*PerThread) multithread
hi simon,

thanks very much for replying. After reading the source code with your suggestion, here's my understanding, and I don't know whether it's right:

DocumentsWriter doesn't actually create threads, but the code that uses DocumentsWriter can be multithreaded (say, several threads call updateDocument). Each thread has its own DocumentsWriterThreadState, and each DocumentsWriterThreadState has its own objects (the *PerThread ones such as DocFieldProcessorPerThread, DocInverterPerThread and so on).

As the methods of DocumentsWriter are called by multiple threads, for example 4 threads, there are 4 DocumentsWriterThreadState objects and 4 index chains (each index chain has its own *PerThread objects to process the document).

am I right??

thanks for replying again!
Re: is the classes ended with PerThread(*PerThread) multithread
On Tue, Dec 28, 2010 at 10:57 AM, xu cheng xcheng@gmail.com wrote:
> as the methods of DocumentsWriter are called by multiple threads, for example 4 threads, there are 4 DocumentsWriterThreadState objects and 4 index chains (each index chain has its own *PerThread objects to process the document). am I right??

that sounds about right

simon
Re: is the classes ended with PerThread(*PerThread) multithread
There is a single index chain, with a single instance of each chain component, except those ending in -PerThread. Though that's going to change with https://issues.apache.org/jira/browse/LUCENE-2324

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785
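Kirill's point (one chain of shared components, with only the -PerThread objects multiplied per indexing thread) can be sketched roughly as follows. This is a simplified illustration under assumptions: the interfaces, class names, and the no-argument addThread method are hypothetical and only loosely modelled on the idea of a shared consumer handing out per-thread children. The two "threads" are simulated by two calls on the main thread to keep the example short.

```java
// Hypothetical, simplified stand-ins; NOT Lucene's actual classes.
public class SharedChainSketch {

    /** Per-thread part of a chain component. */
    interface ConsumerPerThread {
        void processDocument(String doc);
        int docCount();
    }

    /** Shared part: one instance per chain; hands out per-thread children. */
    interface Consumer {
        ConsumerPerThread addThread();
    }

    /** Innermost shared component. */
    static class TermsConsumer implements Consumer {
        int perThreadCreated;

        public synchronized ConsumerPerThread addThread() {
            perThreadCreated++;
            return new ConsumerPerThread() {
                private int docs;
                public void processDocument(String doc) { docs++; }
                public int docCount() { return docs; }
            };
        }
    }

    /** Outer shared component wrapping the next shared component in the chain. */
    static class ProcessorConsumer implements Consumer {
        private final Consumer next;

        ProcessorConsumer(Consumer next) {
            this.next = next;
        }

        public ConsumerPerThread addThread() {
            // The per-thread object decorates the next component's per-thread object.
            final ConsumerPerThread nextPerThread = next.addThread();
            return new ConsumerPerThread() {
                public void processDocument(String doc) {
                    nextPerThread.processDocument(doc);
                }
                public int docCount() {
                    return nextPerThread.docCount();
                }
            };
        }
    }

    public static void main(String[] args) {
        TermsConsumer terms = new TermsConsumer();
        Consumer chain = new ProcessorConsumer(terms); // the single chain, built once

        ConsumerPerThread a = chain.addThread(); // would belong to indexing thread A
        ConsumerPerThread b = chain.addThread(); // would belong to indexing thread B
        a.processDocument("doc 1");
        a.processDocument("doc 2");
        b.processDocument("doc 3");

        System.out.println(terms.perThreadCreated + " per-thread children, counts "
                + a.docCount() + " and " + b.docCount());
    }
}
```

There is still exactly one TermsConsumer and one ProcessorConsumer however many times addThread is called; only the per-thread decorators are duplicated.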
is the classes ended with PerThread(*PerThread) multithread
hi all:

I'm new to dev. These days I'm reading the source code in the index package and I was confused. There are classes with the suffix PerThread such as DocFieldProcessorPerThread, DocInverterPerThread, TermsHashPerThread, FreqProxTermsWriterPerThread. On this mailing list I was told that they are multithreaded. However, I have some difficulty understanding this: I see no sign that they inherit from Thread, or implement Runnable, or anything like that. How do they map to OS threads?

thanks ^_^