Thanks Simon!

On 29.10.2012 г. 21:38, Simon Willnauer wrote:
you should call currDocsAndPositions.nextPosition() before you call
currDocsAndPositions.getPayload()  payloads are per positions so you
need to advance the pos first!

simon

On Mon, Oct 29, 2012 at 6:44 PM, Ivan Vasilev <ivasi...@sirma.bg> wrote:
Hi Guys,

I use the following code to index documents and set Payloads to term
positions:

public class TestPayloads_ {
     private static final String INDEX_DIR =
             "E:/Temp/Index";

     public static void main(String[] args) throws Exception {
         IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, new
MyAnalyzer_());
         iwc.setOpenMode(OpenMode.CREATE);
         IndexWriter writer = new IndexWriter(FSDirectory.open(new
File(INDEX_DIR)), iwc);

         FieldType fieldType = new FieldType();
         IndexOptions indexOptions =
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
         fieldType.setIndexOptions(indexOptions);
         fieldType.setIndexed(true);
         fieldType.setOmitNorms(true);
         fieldType.setStored(true);
         fieldType.freeze();

         Document doc = new Document();
         doc.add(new Field("content", "one two three four.", fieldType));
         writer.addDocument(doc);

         writer.addDocument(doc);
         writer.addDocument(doc);

         writer.close();

         DirectoryReader dr = DirectoryReader.open(FSDirectory.open(new
File(INDEX_DIR)));
         AtomicReader sr = dr.leaves().get(0).reader();

         Bits liveDocs = sr.getLiveDocs();
         Fields fields = sr.fields();
         for (String currFieldName : fields) {
             Terms currTerms = fields.terms(currFieldName);
             TermsEnum currTermEnum = currTerms.iterator(null);
             boolean currTermsHasPayloads = currTerms.hasPayloads();
             BytesRef currFieldValue;
             while ((currFieldValue = currTermEnum.next()) != null) {
                 String currVfieldValueStr = currFieldValue.utf8ToString();
//                    DocsEnum currDocsEnum = currTermEnum.docs(liveDocs,
null);
                 DocsAndPositionsEnum currDocsAndPositions =
                     currTermEnum.docsAndPositions(liveDocs, null,
                         DocsAndPositionsEnum.FLAG_PAYLOADS
                         | DocsAndPositionsEnum.FLAG_OFFSETS);
                 int docID;
                 while ((docID = currDocsAndPositions.nextDoc()) !=
DocsEnum.NO_MORE_DOCS) {
                     int freq = currDocsAndPositions.freq();
                     for (int i = 0; i < freq; i++) {
                         byte payload;
                         if (currTermsHasPayloads &&
currDocsAndPositions.getPayload() != null) {
                             payload =
currDocsAndPositions.getPayload().bytes[0];
                         } else {
                             payload = -1;
                         }
                         System.out.println("Term: (" + currFieldName + ":" +
currVfieldValueStr + "); doc: "
                             + docID + "; position: " +
currDocsAndPositions.nextPosition()
                             + "; payload: " + payload);
                     }
                 }
             }
         }
         dr.close();
     }

}

class MyAnalyzer_ extends Analyzer {
     @Override
     protected TokenStreamComponents createComponents(String fieldName,
Reader reader) {
         Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_40,
reader);
         return new TokenStreamComponents(tokenizer, new
MyFilter_(tokenizer));
     }

}

class MyFilter_ extends TokenFilter {
     private PayloadAttribute payloadAttr;
     private byte[] payloadVal;

     MyFilter_(TokenStream in) {
         super(in);
         payloadAttr = addAttribute(PayloadAttribute.class);
         payloadVal = new byte[1];
     }
     public final boolean incrementToken() throws IOException {
         if (input.incrementToken()) {
             payloadVal[0]++;
             payloadAttr.setPayload(new BytesRef(payloadVal));
             return true;
         } else {
             return false;
         }
     }
}




The output is the following:

Term: (content:four); doc: 0; position: 3; payload: -1
Term: (content:four); doc: 1; position: 3; payload: 4
Term: (content:four); doc: 2; position: 3; payload: 8
Term: (content:one); doc: 0; position: 0; payload: -1
Term: (content:one); doc: 1; position: 0; payload: 1
Term: (content:one); doc: 2; position: 0; payload: 5
Term: (content:three); doc: 0; position: 2; payload: -1
Term: (content:three); doc: 1; position: 2; payload: 3
Term: (content:three); doc: 2; position: 2; payload: 7
Term: (content:two); doc: 0; position: 1; payload: -1
Term: (content:two); doc: 1; position: 1; payload: 2
Term: (content:two); doc: 2; position: 1; payload: 6


The payloads of document with Lucene ID #0 were not added. Payloads that
were intended to doc #0 were added to doc #1, those intended for doc #1 were
added to doc #2.
With the debugger I see that during adding doc #0 payloadVal is incremented
form 1 to 4, and after each incrementation is invoked
payloadAttr.setPayload(..), but strangely when reading
DocsAndPositionsEnumwe see those payloads (1 to 4) belong actually to doc
#1.

Do I make some mistake with invoking setPayload(..) method or it is a bug?

Cheers,
Ivan Vasilev

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to