[PATCH] Moving HitDetails construction to a HitDetails constructor (v2).

2007-06-01 Thread Nicolás Lichtmaier
This is a fixed version of the previous patch. Please, don't ignore me 
=). I'm trying to use Lucene queries with Nutch and this patch will 
help. This patch also removes a deprecated API usage, removes useless 
object creation and array copying.


Thanks!

Index: src/java/org/apache/nutch/searcher/IndexSearcher.java
===================================================================
--- src/java/org/apache/nutch/searcher/IndexSearcher.java	(revision 543252)
+++ src/java/org/apache/nutch/searcher/IndexSearcher.java	(working copy)
@@ -21,6 +21,8 @@
 
 import java.util.ArrayList;
 import java.util.Enumeration;
+import java.util.Iterator;
+import java.util.List;
 
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.store.FSDirectory;
@@ -105,20 +107,8 @@
   }
 
   public HitDetails getDetails(Hit hit) throws IOException {
-    ArrayList fields = new ArrayList();
-    ArrayList values = new ArrayList();
-
     Document doc = luceneSearcher.doc(hit.getIndexDocNo());
-
-    Enumeration e = doc.fields();
-    while (e.hasMoreElements()) {
-      Field field = (Field)e.nextElement();
-      fields.add(field.name());
-      values.add(field.stringValue());
-    }
-
-    return new HitDetails((String[])fields.toArray(new String[fields.size()]),
-                          (String[])values.toArray(new String[values.size()]));
+    return new HitDetails(doc);
   }
 
   public HitDetails[] getDetails(Hit[] hits) throws IOException {
Index: src/java/org/apache/nutch/searcher/HitDetails.java
===================================================================
--- src/java/org/apache/nutch/searcher/HitDetails.java	(revision 543252)
+++ src/java/org/apache/nutch/searcher/HitDetails.java	(working copy)
@@ -21,8 +21,11 @@
 import java.io.DataOutput;
 import java.io.IOException;
 import java.util.ArrayList;
+import java.util.List;
 
 import org.apache.hadoop.io.*;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
 import org.apache.nutch.html.Entities;
 
 /** Data stored in the index for a hit.
@@ -52,7 +55,23 @@
     this.fields[1] = "url";
     this.values[1] = url;
   }
+  
+  /** Construct from Lucene document. */
+  public HitDetails(Document doc)
+  {
+    List ff = doc.getFields();
+    length = ff.size();
+
+    fields = new String[length];
+    values = new String[length];
 
+    for(int i = 0 ; i < length ; i++) {
+      Field field = (Field)ff.get(i);
+      fields[i] = field.name();
+      values[i] = field.stringValue();
+    }
+  }
+
   /** Returns the number of fields contained in this. */
   public int getLength() { return length; }
 


Re: [PATCH] Moving HitDetails construction to a HitDetails constructor (v2).

2007-06-01 Thread Andrzej Bialecki

Nicolás Lichtmaier wrote:

> This is a fixed version of the previous patch.


In the future, please use the JIRA bug tracking system to submit patches.


> Please, don't ignore me =).


We don't - but there's only so much you can do in 24 hrs/day, and Nutch 
developers have their own lives to attend to ... ;)



> I'm trying to use Lucene queries with Nutch and this patch will 
> help. This patch also removes a deprecated API usage, removes useless 
> object creation and array copying.


I believe the conversion from Document to HitDetails was separated this 
way on purpose. Please note that the front-end Nutch API has no 
dependencies on Lucene classes. If we applied your patch, all of a sudden 
HitDetails would become dependent on Lucene, causing front-end 
applications to become dependent on Lucene, too.


We can certainly fix the use of the deprecated API as you suggested. As 
for the rest of the patch, in my opinion it should not be applied.
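
A minimal sketch of that split, using hypothetical stand-in types rather 
than the real Nutch or Lucene classes: BackendField plays the role of a 
Lucene field, Details plays the role of HitDetails, and toDetails is the 
conversion that stays on the back-end side. It sizes both arrays once 
from the field list, so it avoids the intermediate ArrayLists and the 
toArray copies while the front-end class keeps a constructor that takes 
only Strings. All names here are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HitDetailsSketch {

  /** Hypothetical stand-in for a back-end (Lucene-like) field. */
  public static final class BackendField {
    final String name;
    final String value;

    public BackendField(String name, String value) {
      this.name = name;
      this.value = value;
    }
  }

  /** Hypothetical stand-in for the front-end class: it only knows Strings. */
  public static final class Details {
    private final String[] fields;
    private final String[] values;

    public Details(String[] fields, String[] values) {
      if (fields.length != values.length) {
        throw new IllegalArgumentException("fields and values differ in length");
      }
      this.fields = fields;
      this.values = values;
    }

    public int getLength() { return fields.length; }

    /** Returns the value of the first field with this name, or null. */
    public String getValue(String field) {
      for (int i = 0; i < fields.length; i++) {
        if (fields[i].equals(field)) {
          return values[i];
        }
      }
      return null;
    }
  }

  /** Back-end conversion: the only code that sees both kinds of type. */
  public static Details toDetails(List<BackendField> docFields) {
    int length = docFields.size();
    String[] fields = new String[length];
    String[] values = new String[length];
    for (int i = 0; i < length; i++) {
      BackendField f = docFields.get(i);
      fields[i] = f.name;
      values[i] = f.value;
    }
    return new Details(fields, values);
  }

  public static void main(String[] args) {
    List<BackendField> doc = new ArrayList<BackendField>();
    doc.add(new BackendField("segment", "20070601"));
    doc.add(new BackendField("url", "http://example.com/"));

    Details details = toDetails(doc);
    System.out.println(details.getLength() + " fields, url=" + details.getValue("url"));
  }
}
```

Under this arrangement a front-end application compiles against Details 
alone; only the code that calls toDetails needs the back-end types on 
its classpath.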


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: [PATCH] Moving HitDetails construction to a HitDetails constructor (v2).

2007-06-03 Thread Nicolás Lichtmaier



>> Please, don't ignore me =).
> We don't - but there's only so much you can do in 24 hrs/day, and Nutch 
> developers have their own lives to attend to ... ;)


=) Sorry, I didn't mean to sound "demanding". It's just that there's a 
natural focus on real features and I thought that "tidiness" patches go 
unnoticed.





>> I'm trying to use Lucene queries with Nutch and this patch will help. 
>> This patch also removes a deprecated API usage, removes useless 
>> object creation and array copying.
>
> I believe the conversion from Document to HitDetails was separated 
> this way on purpose. Please note that the front-end Nutch API has no 
> dependencies on Lucene classes. If we applied your patch, all of a 
> sudden HitDetails would become dependent on Lucene, causing front-end 
> applications to become dependent on Lucene, too.
>
> We can certainly fix the use of the deprecated API as you suggested. As 
> for the rest of the patch, in my opinion it should not be applied.




Oh, I see... a pity. It looked cleaner to me, and I'll have to 
copy+paste that into my code. What about the other patch? (Retrofit Hits 
to implement List)