Re: Empty Sink Tokenizer

Grant Ingersoll Tue, 31 Mar 2009 09:27:10 -0700

Well, we don't make any guarantees about it in the docs, AFAICT, but we have in the past advertised it (via the mailing lists) as such. The Tee/Sink stuff does rely on what has been the de facto way of doing things up until 2.3, it sounds. The snippet of code I included can easily be converted to a test case if we wish to enforce it going forward.

What's the benefit of collation? I don't know if this is considered a back-compatibility break or not (likely not), but this issue does come up from time to time and there are people that have relied on our answer.

In the end, we should document whichever it is going to be and then make sure the Tee/Sink stuff documents it as well.




On Mar 31, 2009, at 10:51 AM, Michael McCandless wrote:

Uh-oh: I think this happened as part of LUCENE-843, which landed in 2.3.


IndexWriter now first collates each Field instance, by name, and then
visits those fields in sorted order.  Multiple instances of the same
field name are written in the order that they appeared in the
document.

StoredFieldsWriter taps in to the indexing chain after that per-field collation.
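The collation described above can be sketched in plain Java, with no Lucene dependency. This is only an illustration of the ordering behavior, not the actual indexing chain: `NamedField` and `collate` are hypothetical names invented for the sketch, and a stable sort by name is assumed to be an adequate model of "sorted by name, same-name instances kept in added order".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FieldCollationSketch {

    /** Hypothetical stand-in for a Lucene Field: just a name and a value. */
    static final class NamedField {
        final String name;
        final String value;

        NamedField(String name, String value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return name + "=" + value;
        }
    }

    /**
     * Returns a copy of the fields sorted by name. List.sort is stable,
     * so instances sharing a name keep the order in which they were added.
     */
    static List<NamedField> collate(List<NamedField> added) {
        List<NamedField> sorted = new ArrayList<NamedField>(added);
        sorted.sort(Comparator.comparing((NamedField f) -> f.name));
        return sorted;
    }

    public static void main(String[] args) {
        List<NamedField> doc = Arrays.asList(
            new NamedField("id", "one"),
            new NamedField("z", "document z"),
            new NamedField("a", "first a"),
            new NamedField("e", "document e"),
            new NamedField("a", "second a"));

        System.out.println("added order:    " + doc);
        System.out.println("collated order: " + collate(doc));
    }
}
```

Anything that consumes the fields after this step, as StoredFieldsWriter is said to do, sees only the collated order. In the sketch the two "a" fields stay in added order relative to each other, while "id" and "z" move behind them.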


But, if getting back to this is important, we should be able to move
StoredFieldsWriter up in the chain so that it visits the original
document, instead.  Offhand, I'm not sure if there are any tradeoffs
in doing that.

Mike

On Tue, Mar 31, 2009 at 9:30 AM, Grant Ingersoll <gsing...@apache.org> wrote:

Has the way fields get added changed recently?
 
http://www.lucidimagination.com/search/document/954555c478002a3/empty_sinktokenizer

See also:
http://www.lucidimagination.com/search/document/274ec8c1c56fdd54/order_of_field_objects_within_document#5ffce4509ed32511

http://www.lucidimagination.com/search/document/d6b19ab1bd87e30a/order_of_fields_returned_by_document_getfields#d6b19ab1bd87e30a

http://www.lucidimagination.com/search/document/deda4dd3f9041bee/the_order_of_fields_in_document_fields#bb26d84091aebcaa

The following little program confirms that they are indeed in alpha order now and not in added order:

import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.LuceneTestCase;

public class TestFieldOrdering extends LuceneTestCase {
  protected RAMDirectory dir;

  protected void setUp() throws Exception {
    super.setUp();
    dir = new RAMDirectory();
  }

  public void testAddFields() throws Exception {
    IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true,
        IndexWriter.MaxFieldLength.LIMITED);

    // Add the fields in deliberately non-alphabetical order.
    Document doc = new Document();
    doc.add(new Field("id", "one", Field.Store.YES, Field.Index.NO));
    doc.add(new Field("z", "document z", Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("a", "document a", Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("e", "document e", Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("b", "document b", Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);
    writer.close();

    // Read the document back and print its stored fields in returned order.
    IndexReader reader = IndexReader.open(dir);
    Document retrieved = reader.document(0);
    assertTrue("retrieved is null and it shouldn't be", retrieved != null);

    List fields = retrieved.getFields();
    for (Iterator iterator = fields.iterator(); iterator.hasNext();) {
      Field name = (Field) iterator.next();
      System.out.println("Name: " + name);
    }
    reader.close();
  }
}

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org


