[PATCH] Moving HitDetails construction to a HitDetails constructor (v2).
This is a fixed version of the previous patch. Please, don't ignore me =).

I'm trying to use Lucene queries with Nutch and this patch will help. This patch also removes a deprecated API usage, removes useless object creation and array copying.

Thanks!

Index: src/java/org/apache/nutch/searcher/IndexSearcher.java
===
--- src/java/org/apache/nutch/searcher/IndexSearcher.java (revision: 543252)
+++ src/java/org/apache/nutch/searcher/IndexSearcher.java (working copy)
@@ -21,6 +21,8 @@
 
 import java.util.ArrayList;
 import java.util.Enumeration;
+import java.util.Iterator;
+import java.util.List;
 
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.store.FSDirectory;
@@ -105,20 +107,8 @@
   }
 
   public HitDetails getDetails(Hit hit) throws IOException {
-    ArrayList fields = new ArrayList();
-    ArrayList values = new ArrayList();
-
     Document doc = luceneSearcher.doc(hit.getIndexDocNo());
-
-    Enumeration e = doc.fields();
-    while (e.hasMoreElements()) {
-      Field field = (Field)e.nextElement();
-      fields.add(field.name());
-      values.add(field.stringValue());
-    }
-
-    return new HitDetails((String[])fields.toArray(new String[fields.size()]),
-                          (String[])values.toArray(new String[values.size()]));
+    return new HitDetails(doc);
   }
 
   public HitDetails[] getDetails(Hit[] hits) throws IOException {
Index: src/java/org/apache/nutch/searcher/HitDetails.java
===
--- src/java/org/apache/nutch/searcher/HitDetails.java (revision: 543252)
+++ src/java/org/apache/nutch/searcher/HitDetails.java (working copy)
@@ -21,8 +21,11 @@
 import java.io.DataOutput;
 import java.io.IOException;
 import java.util.ArrayList;
+import java.util.List;
 
 import org.apache.hadoop.io.*;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
 import org.apache.nutch.html.Entities;
 
 /** Data stored in the index for a hit.
@@ -52,7 +55,23 @@
     this.fields[1] = "url";
     this.values[1] = url;
   }
 
+
+  /** Construct from Lucene document. */
+  public HitDetails(Document doc)
+  {
+    List ff = doc.getFields();
+    length = ff.size();
+
+    fields = new String[length];
+    values = new String[length];
+    for(int i = 0 ; i < length ; i++) {
+      Field field = (Field)ff.get(i);
+      fields[i] = field.name();
+      values[i] = field.stringValue();
+    }
+  }
+
   /** Returns the number of fields contained in this. */
   public int getLength() { return length; }
 
Re: [PATCH] Moving HitDetails construction to a HitDetails constructor (v2).
Nicolás Lichtmaier wrote:
> This is a fixed version of the previous patch.

In the future, please use the JIRA bug tracking system to submit patches.

> Please, don't ignore me =).

We don't - but there's only so much you can do in 24 hrs/day, and Nutch developers have their own lives to attend to ... ;)

> I'm trying to use Lucene queries with Nutch and this patch will help. This patch also removes a deprecated API usage, removes useless object creation and array copying.

I believe the conversion from Document to HitDetails was separated this way on purpose. Please note that the front-end Nutch API has no dependencies on Lucene classes. If we applied your patch, all of a sudden HitDetails would become dependent on Lucene, causing front-end applications to become dependent on Lucene, too.

We can certainly fix the use of the deprecated API as you suggested. As for the rest of the patch, in my opinion it should not be applied.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
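The layering concern raised in this reply can be illustrated with a small sketch: the deprecated Enumeration loop is replaced by a list-based loop, but the conversion stays in a back-end helper, so the front-end class only ever sees plain strings. Everything here is an assumption made for illustration: `Field` and `Document` are minimal stand-ins that mimic only the three Lucene calls used in the patch (`getFields()`, `name()`, `stringValue()`), `HitDetails` is a cut-down stand-in for the Nutch class, and `toHitDetails` is a hypothetical helper name that appears nowhere in the thread. The sketch also uses generics, which the original patch does not.

```java
import java.util.ArrayList;
import java.util.List;

class HitDetailsSketch {

  /** Stand-in for org.apache.lucene.document.Field (illustration only). */
  static final class Field {
    private final String name;
    private final String value;
    Field(String name, String value) { this.name = name; this.value = value; }
    String name() { return name; }
    String stringValue() { return value; }
  }

  /** Stand-in for org.apache.lucene.document.Document (illustration only). */
  static final class Document {
    private final List<Field> fields = new ArrayList<Field>();
    void add(Field f) { fields.add(f); }
    List<Field> getFields() { return fields; }
  }

  /** Cut-down front-end class: it holds plain strings and knows no Lucene types. */
  static final class HitDetails {
    private final String[] fields;
    private final String[] values;
    HitDetails(String[] fields, String[] values) {
      this.fields = fields;
      this.values = values;
    }
    int getLength() { return fields.length; }
    String getField(int i) { return fields[i]; }
    String getValue(int i) { return values[i]; }
    /** Returns the first value stored under the given field name, or null. */
    String getValue(String field) {
      for (int i = 0; i < fields.length; i++) {
        if (fields[i].equals(field)) return values[i];
      }
      return null;
    }
  }

  /**
   * Hypothetical back-end helper: the only place that touches both worlds.
   * It fills the arrays directly, so no intermediate lists are copied.
   */
  static HitDetails toHitDetails(Document doc) {
    List<Field> ff = doc.getFields();
    String[] fields = new String[ff.size()];
    String[] values = new String[ff.size()];
    for (int i = 0; i < ff.size(); i++) {
      Field field = ff.get(i);
      fields[i] = field.name();
      values[i] = field.stringValue();
    }
    return new HitDetails(fields, values);
  }

  public static void main(String[] args) {
    Document doc = new Document();
    doc.add(new Field("segment", "20070601"));
    doc.add(new Field("url", "http://example.org/"));
    HitDetails details = toHitDetails(doc);
    System.out.println(details.getLength() + " fields, url=" + details.getValue("url"));
  }
}
```

With this arrangement the array-filling loop from the patch is kept, while `HitDetails` itself needs no Lucene import; whether such a helper belongs in IndexSearcher or elsewhere is left open here.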
Re: [PATCH] Moving HitDetails construction to a HitDetails constructor (v2).
>> Please, don't ignore me =).
>
> We don't - but there's only so much you can do in 24 hrs/day, and Nutch developers have their own lives to attend to ... ;)

=) Sorry, I didn't mean to sound "demanding". It's just that there's a natural focus on real features, and I thought that "tidiness" patches go unnoticed.

>> I'm trying to use Lucene queries with Nutch and this patch will help. This patch also removes a deprecated API usage, removes useless object creation and array copying.
>
> I believe the conversion from Document to HitDetails was separated this way on purpose. Please note that front-end Nutch API has no dependencies on Lucene classes. If we applied your patch, all of a sudden HitDetails would become dependent on Lucene, causing front-end applications to become dependent on Lucene, too.
>
> We can certainly fix the use of deprecated API as you suggested. As for the rest of the patch, in my opinion it should not be applied.

Oh, I see... a pity. It looked cleaner to me, and I'll have to copy+paste that into my code.

What about the other patch? (Retrofit Hits to implement List)