The test results seem hard to believe. Doubling the CPUs only increased
throughput by 20%? That seems rather low for a primarily "read only" test.
Peter did not seem to answer many of the follow-up questions (at least I
could not find the answers), in particular whether the CPU usage was at 100%.
If the OS cache is too small for the size of the index and the number of
queries being executed, then you will not see linear scaling with the number
of CPUs, since you will quickly become IO bound (especially if the queries
return a wide variety of documents scattered throughout the index).
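As a rough back-of-the-envelope illustration (Amdahl's law, with made-up
numbers rather than Peter's actual figures): if 70% of each request's time
is spent serialized - waiting on the disk, or queued behind a single
synchronized reader - then doubling the CPUs can improve throughput by at
most 1 / (0.7 + 0.3/2) = ~1.18x, i.e. something in the 20% range, no matter
how parallel the rest of the work is.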
Reading a document is a relatively expensive operation (especially if the
data blocks are not in the OS cache), and while it is synchronized no other
thread can read a document, or even begin to read one (on an OS/hardware
combination that supports scatter/gather, i.e. multiple outstanding IO
requests). This is not just a problem when lots of documents are being read:
since the isDeleted() method uses the same synchronized lock as document(),
all query scorers that filter out deleted documents will also be impacted,
as they will block while a document is being read.
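To make the contention concrete, here is roughly what the unpatched reader
looks like (a simplified sketch with an illustrative class name, not the
exact source - the real methods are in the attached SegmentReader): both
methods are synchronized on the same object, so a scorer calling isDeleted()
queues behind any in-progress document() call, disk IO included.

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.util.BitVector;

class SyncReaderSketch {
  private final FieldsReader fieldsReader; // reads the .fdt/.fdx stored fields
  private final BitVector deletedDocs;     // may be null if nothing is deleted

  SyncReaderSketch(FieldsReader fieldsReader, BitVector deletedDocs) {
    this.fieldsReader = fieldsReader;
    this.deletedDocs = deletedDocs;
  }

  // Holds the object monitor for the entire read, including any disk access.
  public synchronized Document document(int n) throws IOException {
    if (isDeleted(n))
      throw new IllegalArgumentException("attempt to access a deleted document");
    return fieldsReader.doc(n);
  }

  // A cheap bit test, but callers still block while document() holds the lock.
  public synchronized boolean isDeleted(int n) {
    return (deletedDocs != null && deletedDocs.get(n));
  }
}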
In order to test this, I wrote the attached test case. It uses 2 threads:
one reads every document in a segment, the other reads the same document
repeatedly (as many times as there are documents in the index). The theory
is that the "readsame" thread should execute rather quickly (since the
needed disk blocks quickly end up in the OS cache), whereas the "readall"
thread will be much slower (since almost every document retrieval requires
disk access).
I tested using a segment containing 100k documents, running on a single-CPU
machine (1.2 GHz P4).
I used the Windows "cleanmem" utility to clear the system cache before
running the tests. (It seemed unreliable at times - does anyone know a
fool-proof method of emptying the system cache on Windows?)
Running multiple tests with the unmodified (synchronized) SegmentReader and
FieldsReader, I got the following (times in milliseconds):
BEST TIME
ReadSameThread, time = 2359
ReadAllThread, time = 2469
WORST TIME
ReadSameThread, time = 2671
ReadAllThread, time = 2968
Using the modified (unsynchronized, ThreadLocal-based) classes, I got the
following:
BEST TIME
ReadSameThread, time = 1328
ReadAllThread, time = 1859
WORST TIME
ReadSameThread, time = 1671
ReadAllThread, time = 1953
So even on a single CPU, removing the lock cut the best-case readsame time
by roughly 44% and the readall time by roughly 25%.
I believe an MMap directory only improves the situation because the OS reads
the blocks much more efficiently (faster), presumably shortening the time
spent inside the lock. Imagine running Lucene on a VERY SLOW disk subsystem
- the synchronized block would have an even greater negative impact.
Hopefully, this is enough to demonstrate the value of using ThreadLocals to
support simultaneous IO.
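For reference, the core of the attached FieldsReader change is just this
pattern (a condensed sketch with an illustrative class name; the full
patched file is attached below): each thread lazily clones the shared
stream, so every thread has its own file position and the seek/read no
longer needs to be done under a lock.

import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexInput;

final class PerThreadStreamSketch {
  private IndexInput indexStream; // the shared, original .fdx stream

  // Each thread's first get() clones the original; the clone has an
  // independent file pointer, so threads cannot disturb each other's seeks.
  private final ThreadLocal indexLocal = new ThreadLocal() {
    public Object initialValue() {
      return indexStream.clone();
    }
  };

  PerThreadStreamSketch(Directory d, String segment) throws IOException {
    indexStream = d.openInput(segment + ".fdx");
  }

  // Look up the pointer into the .fdt file for document n - no synchronization.
  long fieldsPointer(int n) throws IOException {
    IndexInput in = (IndexInput) indexLocal.get();
    in.seek(n * 8L);
    return in.readLong();
  }
}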
I look forward to your thoughts, and others - hopefully someone can run the
test on a multiple CPU machine.
Robert
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 16, 2006 3:17 PM
To: [email protected]
Subject: Re: FieldsReader synchronized access vs. ThreadLocal ?
Robert Engels wrote:
> It seems that in a highly multi-threaded server this synchronized method
> could lead to significant blocking when the documents are being retrieved?
Perhaps, but I'd prefer to wait for someone to demonstrate this as a
performance bottleneck before adding another ThreadLocal.
Peter Keegan has recently demonstrated pretty good concurrency using
mmap directory on four and eight CPU systems:
http://www.mail-archive.com/[email protected]/msg05074.html
Peter also wondered if the SegmentReader.document(int) method might be a
bottleneck, and tried patching it to run unsynchronized:
http://www.mail-archive.com/[email protected]/msg05891.html
Unfortunately that did not improve his performance:
http://www.mail-archive.com/[email protected]/msg06163.html
Doug
package org.apache.lucene.index;
/**
* Copyright 2004 The Apache Software Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexInput;
/**
* Class responsible for access to stored document fields.
*
* It uses <segment>.fdt and <segment>.fdx; files.
*
* @version $Id: FieldsReader.java 329524 2005-10-30 00:38:46 -0500 (Sun, 30 Oct 2005) yonik $
*/
final class FieldsReader {
  private FieldInfos fieldInfos;
  private IndexInput fieldsStream;
  private IndexInput indexStream;

  // Per-thread clones of the two streams above. Each clone has its own file
  // pointer, so concurrent doc() calls no longer need to be synchronized on
  // the reader.
  private ThreadLocal fieldsLocal = new ThreadLocal() {
    public Object initialValue() {
      return fieldsStream.clone();
    }
  };
  private ThreadLocal indexLocal = new ThreadLocal() {
    public Object initialValue() {
      return indexStream.clone();
    }
  };

  private int size;

  FieldsReader(Directory d, String segment, FieldInfos fn) throws IOException {
    fieldInfos = fn;
    fieldsStream = d.openInput(segment + ".fdt");
    indexStream = d.openInput(segment + ".fdx");
    size = (int)(indexStream.length() / 8);
  }
  final void close() throws IOException {
    fieldsStream.close();
    indexStream.close();
  }

  final int size() {
    return size;
  }

  final Document doc(int n) throws IOException {
    // Fetch this thread's private clones of the index/fields streams; they
    // shadow the shared members above, so no lock is needed around the
    // seek/read sequence below.
    final IndexInput indexStream = (IndexInput) indexLocal.get();
    final IndexInput fieldsStream = (IndexInput) fieldsLocal.get();

    indexStream.seek(n * 8L);
    long position = indexStream.readLong();
    fieldsStream.seek(position);

    Document doc = new Document();
    int numFields = fieldsStream.readVInt();
    for (int i = 0; i < numFields; i++) {
      int fieldNumber = fieldsStream.readVInt();
      FieldInfo fi = fieldInfos.fieldInfo(fieldNumber);
      byte bits = fieldsStream.readByte();
      boolean compressed = (bits & FieldsWriter.FIELD_IS_COMPRESSED) != 0;
      boolean tokenize = (bits & FieldsWriter.FIELD_IS_TOKENIZED) != 0;
      if ((bits & FieldsWriter.FIELD_IS_BINARY) != 0) {
        final byte[] b = new byte[fieldsStream.readVInt()];
        fieldsStream.readBytes(b, 0, b.length);
        if (compressed)
          doc.add(new Field(fi.name, uncompress(b), Field.Store.COMPRESS));
        else
          doc.add(new Field(fi.name, b, Field.Store.YES));
      }
      else {
        Field.Index index;
        Field.Store store = Field.Store.YES;
        if (fi.isIndexed && tokenize)
          index = Field.Index.TOKENIZED;
        else if (fi.isIndexed && !tokenize)
          index = Field.Index.UN_TOKENIZED;
        else
          index = Field.Index.NO;
        Field.TermVector termVector = null;
        if (fi.storeTermVector) {
          if (fi.storeOffsetWithTermVector) {
            if (fi.storePositionWithTermVector) {
              termVector = Field.TermVector.WITH_POSITIONS_OFFSETS;
            }
            else {
              termVector = Field.TermVector.WITH_OFFSETS;
            }
          }
          else if (fi.storePositionWithTermVector) {
            termVector = Field.TermVector.WITH_POSITIONS;
          }
          else {
            termVector = Field.TermVector.YES;
          }
        }
        else {
          termVector = Field.TermVector.NO;
        }
        if (compressed) {
          store = Field.Store.COMPRESS;
          final byte[] b = new byte[fieldsStream.readVInt()];
          fieldsStream.readBytes(b, 0, b.length);
          Field f = new Field(fi.name,                            // field name
                              new String(uncompress(b), "UTF-8"), // uncompress the value and add as string
                              store,
                              index,
                              termVector);
          f.setOmitNorms(fi.omitNorms);
          doc.add(f);
        }
        else {
          Field f = new Field(fi.name,                   // name
                              fieldsStream.readString(), // read value
                              store,
                              index,
                              termVector);
          f.setOmitNorms(fi.omitNorms);
          doc.add(f);
        }
      }
    }
    return doc;
  }

  private final byte[] uncompress(final byte[] input)
    throws IOException
  {
    Inflater decompressor = new Inflater();
    decompressor.setInput(input);

    // Create an expandable byte array to hold the decompressed data
    ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);

    // Decompress the data
    byte[] buf = new byte[1024];
    while (!decompressor.finished()) {
      try {
        int count = decompressor.inflate(buf);
        bos.write(buf, 0, count);
      }
      catch (DataFormatException e) {
        // this will happen if the field is not compressed
        throw new IOException("field data are in wrong format: " + e.toString());
      }
    }

    decompressor.end();

    // Get the decompressed data
    return bos.toByteArray();
  }
}
package org.apache.lucene.index;
/**
* Copyright 2004 The Apache Software Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
import java.util.*;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BitVector;
import org.apache.lucene.search.DefaultSimilarity;
/**
* @version $Id: SegmentReader.java 329523 2005-10-30 00:37:11 -0500 (Sun, 30 Oct 2005) yonik $
*/
class SegmentReader extends IndexReader {
private String segment;
FieldInfos fieldInfos;
private FieldsReader fieldsReader;
TermInfosReader tis;
TermVectorsReader termVectorsReaderOrig = null;
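// NOTE: term vectors already use a per-thread clone via this ThreadLocal (see
// getTermVectorsReader() below); the FieldsReader change applies the same idea
// to the stored-fields streams.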
ThreadLocal termVectorsLocal = new ThreadLocal();
BitVector deletedDocs = null;
private boolean deletedDocsDirty = false;
private boolean normsDirty = false;
private boolean undeleteAll = false;
IndexInput freqStream;
IndexInput proxStream;
// Compound File Reader when based on a compound file segment
CompoundFileReader cfsReader = null;
private class Norm {
public Norm(IndexInput in, int number)
{
this.in = in;
this.number = number;
}
private IndexInput in;
private byte[] bytes;
private boolean dirty;
private int number;
private void reWrite() throws IOException {
// NOTE: norms are re-written in regular directory, not cfs
IndexOutput out = directory().createOutput(segment + ".tmp");
try {
out.writeBytes(bytes, maxDoc());
} finally {
out.close();
}
String fileName;
if(cfsReader == null)
fileName = segment + ".f" + number;
else{
// use a different file name if we have compound format
fileName = segment + ".s" + number;
}
directory().renameFile(segment + ".tmp", fileName);
this.dirty = false;
}
}
private Hashtable norms = new Hashtable();
/** The class which implements SegmentReader. */
private static Class IMPL;
static {
try {
String name =
System.getProperty("org.apache.lucene.SegmentReader.class",
SegmentReader.class.getName());
IMPL = Class.forName(name);
} catch (ClassNotFoundException e) {
throw new RuntimeException("cannot load SegmentReader class: " + e);
} catch (SecurityException se) {
try {
IMPL = Class.forName(SegmentReader.class.getName());
} catch (ClassNotFoundException e) {
throw new RuntimeException("cannot load default SegmentReader class: " + e);
}
}
}
protected SegmentReader() { super(null); }
public static SegmentReader get(SegmentInfo si) throws IOException {
return get(si.dir, si, null, false, false);
}
public static SegmentReader get(SegmentInfos sis, SegmentInfo si,
boolean closeDir) throws IOException {
return get(si.dir, si, sis, closeDir, true);
}
public static SegmentReader get(Directory dir, SegmentInfo si,
SegmentInfos sis,
boolean closeDir, boolean ownDir)
throws IOException {
SegmentReader instance;
try {
instance = (SegmentReader)IMPL.newInstance();
} catch (Exception e) {
throw new RuntimeException("cannot load SegmentReader class: " + e);
}
instance.init(dir, sis, closeDir, ownDir);
instance.initialize(si);
return instance;
}
private void initialize(SegmentInfo si) throws IOException {
segment = si.name;
// Use compound file directory for some files, if it exists
Directory cfsDir = directory();
if (directory().fileExists(segment + ".cfs")) {
cfsReader = new CompoundFileReader(directory(), segment + ".cfs");
cfsDir = cfsReader;
}
// No compound file exists - use the multi-file format
fieldInfos = new FieldInfos(cfsDir, segment + ".fnm");
fieldsReader = new FieldsReader(cfsDir, segment, fieldInfos);
tis = new TermInfosReader(cfsDir, segment, fieldInfos);
// NOTE: the bitvector is stored using the regular directory, not cfs
if (hasDeletions(si))
deletedDocs = new BitVector(directory(), segment + ".del");
// make sure that all index files have been read or are kept open
// so that if an index update removes them we'll still have them
freqStream = cfsDir.openInput(segment + ".frq");
proxStream = cfsDir.openInput(segment + ".prx");
openNorms(cfsDir);
if (fieldInfos.hasVectors()) { // open term vector files only as needed
termVectorsReaderOrig = new TermVectorsReader(cfsDir, segment, fieldInfos);
}
}
protected void finalize() {
// patch for pre-1.4.2 JVMs, whose ThreadLocals leak
termVectorsLocal.set(null);
super.finalize();
}
protected void doCommit() throws IOException {
if (deletedDocsDirty) { // re-write deleted
deletedDocs.write(directory(), segment + ".tmp");
directory().renameFile(segment + ".tmp", segment + ".del");
}
if(undeleteAll && directory().fileExists(segment + ".del")){
directory().deleteFile(segment + ".del");
}
if (normsDirty) { // re-write norms
Enumeration values = norms.elements();
while (values.hasMoreElements()) {
Norm norm = (Norm) values.nextElement();
if (norm.dirty) {
norm.reWrite();
}
}
}
deletedDocsDirty = false;
normsDirty = false;
undeleteAll = false;
}
protected void doClose() throws IOException {
fieldsReader.close();
tis.close();
if (freqStream != null)
freqStream.close();
if (proxStream != null)
proxStream.close();
closeNorms();
if (termVectorsReaderOrig != null)
termVectorsReaderOrig.close();
if (cfsReader != null)
cfsReader.close();
}
static boolean hasDeletions(SegmentInfo si) throws IOException {
return si.dir.fileExists(si.name + ".del");
}
public boolean hasDeletions() {
return deletedDocs != null;
}
static boolean usesCompoundFile(SegmentInfo si) throws IOException {
return si.dir.fileExists(si.name + ".cfs");
}
static boolean hasSeparateNorms(SegmentInfo si) throws IOException {
String[] result = si.dir.list();
String pattern = si.name + ".s";
int patternLength = pattern.length();
for(int i = 0; i < result.length; i++){
if(result[i].startsWith(pattern) && Character.isDigit(result[i].charAt(patternLength)))
return true;
}
return false;
}
protected void doDelete(int docNum) {
if (deletedDocs == null)
deletedDocs = new BitVector(maxDoc());
deletedDocsDirty = true;
undeleteAll = false;
deletedDocs.set(docNum);
}
protected void doUndeleteAll() {
deletedDocs = null;
deletedDocsDirty = false;
undeleteAll = true;
}
Vector files() throws IOException {
Vector files = new Vector(16);
for (int i = 0; i < IndexFileNames.INDEX_EXTENSIONS.length; i++) {
String name = segment + "." + IndexFileNames.INDEX_EXTENSIONS[i];
if (directory().fileExists(name))
files.addElement(name);
}
for (int i = 0; i < fieldInfos.size(); i++) {
FieldInfo fi = fieldInfos.fieldInfo(i);
if (fi.isIndexed && !fi.omitNorms){
String name;
if(cfsReader == null)
name = segment + ".f" + i;
else
name = segment + ".s" + i;
if (directory().fileExists(name))
files.addElement(name);
}
}
return files;
}
public TermEnum terms() {
return tis.terms();
}
public TermEnum terms(Term t) throws IOException {
return tis.terms(t);
}
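// PATCH: the synchronized modifier has been removed from document(int); the
// per-thread stream clones inside FieldsReader now provide thread safety.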
// public synchronized Document document(int n) throws IOException {
public Document document(int n) throws IOException {
if (isDeleted(n))
throw new IllegalArgumentException
("attempt to access a deleted document");
return fieldsReader.doc(n);
}
public synchronized boolean isDeleted(int n) {
return (deletedDocs != null && deletedDocs.get(n));
}
public TermDocs termDocs() throws IOException {
return new SegmentTermDocs(this);
}
public TermPositions termPositions() throws IOException {
return new SegmentTermPositions(this);
}
public int docFreq(Term t) throws IOException {
TermInfo ti = tis.get(t);
if (ti != null)
return ti.docFreq;
else
return 0;
}
public int numDocs() {
int n = maxDoc();
if (deletedDocs != null)
n -= deletedDocs.count();
return n;
}
public int maxDoc() {
return fieldsReader.size();
}
/**
* @see IndexReader#getFieldNames()
* @deprecated Replaced by {@link #getFieldNames (IndexReader.FieldOption fldOption)}
*/
public Collection getFieldNames() {
// maintain a unique set of field names
Set fieldSet = new HashSet();
for (int i = 0; i < fieldInfos.size(); i++) {
FieldInfo fi = fieldInfos.fieldInfo(i);
fieldSet.add(fi.name);
}
return fieldSet;
}
/**
* @see IndexReader#getFieldNames(boolean)
* @deprecated Replaced by {@link #getFieldNames (IndexReader.FieldOption fldOption)}
*/
public Collection getFieldNames(boolean indexed) {
// maintain a unique set of field names
Set fieldSet = new HashSet();
for (int i = 0; i < fieldInfos.size(); i++) {
FieldInfo fi = fieldInfos.fieldInfo(i);
if (fi.isIndexed == indexed)
fieldSet.add(fi.name);
}
return fieldSet;
}
/**
* @see IndexReader#getIndexedFieldNames(Field.TermVector tvSpec)
* @deprecated Replaced by {@link #getFieldNames (IndexReader.FieldOption fldOption)}
*/
public Collection getIndexedFieldNames (Field.TermVector tvSpec){
boolean storedTermVector;
boolean storePositionWithTermVector;
boolean storeOffsetWithTermVector;
if(tvSpec == Field.TermVector.NO){
storedTermVector = false;
storePositionWithTermVector = false;
storeOffsetWithTermVector = false;
}
else if(tvSpec == Field.TermVector.YES){
storedTermVector = true;
storePositionWithTermVector = false;
storeOffsetWithTermVector = false;
}
else if(tvSpec == Field.TermVector.WITH_POSITIONS){
storedTermVector = true;
storePositionWithTermVector = true;
storeOffsetWithTermVector = false;
}
else if(tvSpec == Field.TermVector.WITH_OFFSETS){
storedTermVector = true;
storePositionWithTermVector = false;
storeOffsetWithTermVector = true;
}
else if(tvSpec == Field.TermVector.WITH_POSITIONS_OFFSETS){
storedTermVector = true;
storePositionWithTermVector = true;
storeOffsetWithTermVector = true;
}
else{
throw new IllegalArgumentException("unknown termVector parameter " + tvSpec);
}
// maintain a unique set of field names
Set fieldSet = new HashSet();
for (int i = 0; i < fieldInfos.size(); i++) {
FieldInfo fi = fieldInfos.fieldInfo(i);
if (fi.isIndexed && fi.storeTermVector == storedTermVector &&
fi.storePositionWithTermVector == storePositionWithTermVector &&
fi.storeOffsetWithTermVector == storeOffsetWithTermVector){
fieldSet.add(fi.name);
}
}
return fieldSet;
}
/**
* @see IndexReader#getFieldNames(IndexReader.FieldOption fldOption)
*/
public Collection getFieldNames(IndexReader.FieldOption fieldOption) {
Set fieldSet = new HashSet();
for (int i = 0; i < fieldInfos.size(); i++) {
FieldInfo fi = fieldInfos.fieldInfo(i);
if (fieldOption == IndexReader.FieldOption.ALL) {
fieldSet.add(fi.name);
}
else if (!fi.isIndexed && fieldOption == IndexReader.FieldOption.UNINDEXED) {
fieldSet.add(fi.name);
}
else if (fi.isIndexed && fieldOption == IndexReader.FieldOption.INDEXED) {
fieldSet.add(fi.name);
}
else if (fi.isIndexed && fi.storeTermVector == false && fieldOption == IndexReader.FieldOption.INDEXED_NO_TERMVECTOR) {
fieldSet.add(fi.name);
}
else if (fi.storeTermVector == true &&
fi.storePositionWithTermVector == false &&
fi.storeOffsetWithTermVector == false &&
fieldOption == IndexReader.FieldOption.TERMVECTOR) {
fieldSet.add(fi.name);
}
else if (fi.isIndexed && fi.storeTermVector && fieldOption == IndexReader.FieldOption.INDEXED_WITH_TERMVECTOR) {
fieldSet.add(fi.name);
}
else if (fi.storePositionWithTermVector && fi.storeOffsetWithTermVector == false && fieldOption == IndexReader.FieldOption.TERMVECTOR_WITH_POSITION) {
fieldSet.add(fi.name);
}
else if (fi.storeOffsetWithTermVector && fi.storePositionWithTermVector == false && fieldOption == IndexReader.FieldOption.TERMVECTOR_WITH_OFFSET) {
fieldSet.add(fi.name);
}
else if ((fi.storeOffsetWithTermVector && fi.storePositionWithTermVector) &&
fieldOption == IndexReader.FieldOption.TERMVECTOR_WITH_POSITION_OFFSET) {
fieldSet.add(fi.name);
}
}
return fieldSet;
}
public synchronized boolean hasNorms(String field) {
return norms.containsKey(field);
}
static byte[] createFakeNorms(int size) {
byte[] ones = new byte[size];
Arrays.fill(ones, DefaultSimilarity.encodeNorm(1.0f));
return ones;
}
private byte[] ones;
private byte[] fakeNorms() {
if (ones==null) ones=createFakeNorms(maxDoc());
return ones;
}
// can return null if norms aren't stored
protected synchronized byte[] getNorms(String field) throws IOException {
Norm norm = (Norm) norms.get(field);
if (norm == null) return null; // not indexed, or norms not stored
if (norm.bytes == null) { // value not yet read
byte[] bytes = new byte[maxDoc()];
norms(field, bytes, 0);
norm.bytes = bytes; // cache it
}
return norm.bytes;
}
// returns fake norms if norms aren't available
public synchronized byte[] norms(String field) throws IOException {
byte[] bytes = getNorms(field);
if (bytes==null) bytes=fakeNorms();
return bytes;
}
protected void doSetNorm(int doc, String field, byte value)
throws IOException {
Norm norm = (Norm) norms.get(field);
if (norm == null) // not an indexed field
return;
norm.dirty = true; // mark it dirty
normsDirty = true;
norms(field)[doc] = value; // set the value
}
/** Read norms into a pre-allocated array. */
public synchronized void norms(String field, byte[] bytes, int offset)
throws IOException {
Norm norm = (Norm) norms.get(field);
if (norm == null) {
System.arraycopy(fakeNorms(), 0, bytes, offset, maxDoc());
return;
}
if (norm.bytes != null) { // can copy from cache
System.arraycopy(norm.bytes, 0, bytes, offset, maxDoc());
return;
}
IndexInput normStream = (IndexInput) norm.in.clone();
try { // read from disk
normStream.seek(0);
normStream.readBytes(bytes, offset, maxDoc());
} finally {
normStream.close();
}
}
private void openNorms(Directory cfsDir) throws IOException {
for (int i = 0; i < fieldInfos.size(); i++) {
FieldInfo fi = fieldInfos.fieldInfo(i);
if (fi.isIndexed && !fi.omitNorms) {
// look first if there are separate norms in compound format
String fileName = segment + ".s" + fi.number;
Directory d = directory();
if(!d.fileExists(fileName)){
fileName = segment + ".f" + fi.number;
d = cfsDir;
}
norms.put(fi.name, new Norm(d.openInput(fileName), fi.number));
}
}
}
private void closeNorms() throws IOException {
synchronized (norms) {
Enumeration enumerator = norms.elements();
while (enumerator.hasMoreElements()) {
Norm norm = (Norm) enumerator.nextElement();
norm.in.close();
}
}
}
/**
* Create a clone from the initial TermVectorsReader and store it in the ThreadLocal.
* @return TermVectorsReader
*/
private TermVectorsReader getTermVectorsReader() {
TermVectorsReader tvReader = (TermVectorsReader)termVectorsLocal.get();
if (tvReader == null) {
tvReader = (TermVectorsReader)termVectorsReaderOrig.clone();
termVectorsLocal.set(tvReader);
}
return tvReader;
}
/** Return a term frequency vector for the specified document and field. The
* vector returned contains term numbers and frequencies for all terms in
* the specified field of this document, if the field had storeTermVector
* flag set. If the flag was not set, the method returns null.
* @throws IOException
*/
public TermFreqVector getTermFreqVector(int docNumber, String field) throws IOException {
// Check if this field is invalid or has no stored term vector
FieldInfo fi = fieldInfos.fieldInfo(field);
if (fi == null || !fi.storeTermVector || termVectorsReaderOrig == null)
return null;
TermVectorsReader termVectorsReader = getTermVectorsReader();
if (termVectorsReader == null)
return null;
return termVectorsReader.get(docNumber, field);
}
/** Return an array of term frequency vectors for the specified document.
* The array contains a vector for each vectorized field in the document.
* Each vector vector contains term numbers and frequencies for all terms
* in a given vectorized field.
* If no such fields existed, the method returns null.
* @throws IOException
*/
public TermFreqVector[] getTermFreqVectors(int docNumber) throws IOException {
if (termVectorsReaderOrig == null)
return null;
TermVectorsReader termVectorsReader = getTermVectorsReader();
if (termVectorsReader == null)
return null;
return termVectorsReader.get(docNumber);
}
}
package org.apache.lucene.index;
import java.io.IOException;
import junit.framework.TestCase;
import org.apache.lucene.document.Document;
import org.apache.lucene.store.*;
/**
* test searching using multiple threads
*/
public class MultiThreadSegmentReaderTest extends TestCase {
  private static final int NITERATIONS = 10;

  SegmentReader sr;

  public MultiThreadSegmentReaderTest(String name) {
    super(name);
  }

  public void testMultipleThreads() throws Exception {
    Directory dir = FSDirectory.getDirectory("c:/searchdb/empire.test",false);
    sr = SegmentReader.get(new SegmentInfo("_4acp",0,dir));
    System.out.println("segment contains "+sr.maxDoc()+" documents");

    Thread t0 = new ReadAllThread();
    Thread t1 = new ReadSameThread();

    long stime = System.currentTimeMillis();

    t0.start();
    t1.start();

    t0.join();
    t1.join();

    System.out.println("time = "+(System.currentTimeMillis()-stime));
  }

  /** Reads every document in the segment once, so most retrievals need disk access. */
  private class ReadAllThread extends Thread {
    public ReadAllThread() {
    }

    public void run() {
      int maxdoc = sr.maxDoc();
      long stime = System.currentTimeMillis();
      for(int i=0;i<maxdoc-1;i++) {
        if(!sr.isDeleted(i)) {
          try {
            Document doc = sr.document(i);
          } catch (IOException e) {
            e.printStackTrace();
          }
        }
      }
      System.out.println("ReadAllThread, time = "+(System.currentTimeMillis()-stime));
    }
  }

  /** Reads the same (last) document repeatedly, so its blocks stay in the OS cache. */
  private class ReadSameThread extends Thread {
    public ReadSameThread() {
    }

    public void run() {
      long stime = System.currentTimeMillis();
      int maxdoc = sr.maxDoc();
      for(int i=0;i<maxdoc;i++) {
        if(sr.isDeleted(maxdoc-1)) {
          maxdoc = maxdoc-1;
          continue;
        }
        if(!sr.isDeleted(maxdoc-1)) {
          try {
            Document doc = sr.document(maxdoc-1);
          } catch (IOException e) {
            e.printStackTrace();
          }
        }
      }
      System.out.println("ReadSameThread, time = "+(System.currentTimeMillis()-stime));
    }
  }
}