[ 
http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363834 ] 

Jerome Charron commented on NUTCH-139:
--------------------------------------

Andrzej,

I really don't like this "X-Nutch" naming convention. First, it is really 
protocol-level oriented, and it forces us to map "X-Nutch" values to the 
original ones (of course, a utility method can easily provide this mapping). 
But I really don't think this solution is clean (from my point of view).

We should perhaps define once more what a MetaData value is.
I suggest defining a new class to represent a metadata value instead of using 
a simple String.
Thus, we can define a class that holds both the original and the final value.
The idea is that the only way to set the original value is to construct a new 
object (I will call this class MetaValue, but native English speakers are 
encouraged to propose a better name); then, when you set the value of this 
metadata value, it never overrides the original one, only the final one.
Here is a short piece of code:

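import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetaValue {
    // Values as originally supplied; only the constructors set them
    private final String[] original;
    // Values set afterwards; stays null until setValue/addValue is called
    private List<String> actual = null;

    public MetaValue(String[] values) {
        // Constructor for multiple values
        original = values.clone();
    }

    public MetaValue(String value) {
        // Constructor for a single value
        original = new String[] { value };
    }

    public void setValue(String[] values) {
        // Copies the values into a new actual list
        actual = new ArrayList<String>(Arrays.asList(values));
    }

    public void addValue(String value) {
        // Appends this value to the list of final values
        // (assumption: the list starts empty, not as a copy of the originals)
        if (actual == null) {
            actual = new ArrayList<String>();
        }
        actual.add(value);
    }

    public String[] getOriginalValues() {
        return original.clone();
    }

    public String[] getFinalValues() {
        // Assumption: returns null while no final value has been set
        return (actual == null) ? null : actual.toArray(new String[0]);
    }

    public String[] getValues() {
        // Return the final values if the list of values is not null,
        // otherwise return the original values
        return (actual == null) ? getOriginalValues() : getFinalValues();
    }
}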

With this approach, a single key maps to a single MetaValue that holds both 
the original and the final values.
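
As a rough sketch of how this might be used: a metadata container could keep 
one MetaValue per key, so a later correction never hides what the protocol 
layer reported and no "X-Nutch" shadow key is needed. The map, the 
"Content-Type" key, the MetaValueDemo class and the buildMetadata() helper are 
only illustrative assumptions, not part of any patch, and the MetaValue here 
is a compact copy so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Compact copy of the MetaValue idea so this example is self-contained.
class MetaValue {
    private final String[] original;      // set once, by the constructor
    private List<String> actual = null;   // final values, null until set

    MetaValue(String... values) {
        original = values.clone();
    }

    void setValue(String... values) {
        actual = new ArrayList<>(Arrays.asList(values));
    }

    String[] getOriginalValues() {
        return original.clone();
    }

    String[] getValues() {
        // Final values if any were set, otherwise the original values
        return (actual == null) ? original.clone() : actual.toArray(new String[0]);
    }
}

public class MetaValueDemo {
    // Hypothetical helper: builds a metadata map with one MetaValue per key.
    static Map<String, MetaValue> buildMetadata() {
        Map<String, MetaValue> metadata = new HashMap<>();
        // Value as reported by the protocol layer (the original value)
        metadata.put("Content-Type", new MetaValue("text/html"));
        // A later stage corrects it; only the final value changes
        metadata.get("Content-Type").setValue("application/xhtml+xml");
        return metadata;
    }

    public static void main(String[] args) {
        MetaValue contentType = buildMetadata().get("Content-Type");
        System.out.println("original: " + Arrays.toString(contentType.getOriginalValues()));
        System.out.println("current:  " + Arrays.toString(contentType.getValues()));
    }
}
```

Both values stay reachable through the same key: getOriginalValues() still 
returns what was passed to the constructor, while getValues() returns the 
corrected value.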


> Standard metadata property names in the ParseData metadata
> ----------------------------------------------------------
>
>          Key: NUTCH-139
>          URL: http://issues.apache.org/jira/browse/NUTCH-139
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>     Versions: 0.7.1, 0.7, 0.6, 0.7.2-dev, 0.8-dev
>  Environment: Power Mac OS X 10.4, Dual Processor G5 2.0 Ghz, 1.5 GB  RAM, 
> although bug is independent of environment
>     Reporter: Chris A. Mattmann
>     Assignee: Chris A. Mattmann
>     Priority: Minor
>      Fix For: 0.7.2-dev, 0.8-dev, 0.7.1, 0.7, 0.6
>  Attachments: NUTCH-139.060105.patch, NUTCH-139.Mattmann.patch.txt, 
> NUTCH-139.jc.review.patch.txt
>
> Currently, people are free to name their string-based properties anything 
> that they want, such as having names of "Content-type", "content-TyPe", 
> "CONTENT_TYPE" all having the same meaning. Stefan G. I believe proposed a 
> solution in which all property names be converted to lower case, but in 
> essence this really only fixes half the problem right (the case of 
> identifying that "CONTENT_TYPE"
> and "conTeNT_TyPE" and all the permutations are really the same). What about
> if I named it "Content     Type", or "ContentType"?
>  I propose that a way to correct this would be to create a standard set of 
> named Strings in the ParseData class that the protocol framework and the 
> parsing framework could use to identify common properties such as 
> "Content-type", "Creator", "Language", etc.
>  The properties would be defined at the top of the ParseData class, something 
> like:
>  public class ParseData{
>    .....
>     public static final String CONTENT_TYPE = "content-type";
>     public static final String CREATOR = "creator";
>    ....
> }
> In this fashion, users could at least know the names of the standard 
> properties that they can obtain from the ParseData, for example by making 
> a call to ParseData.getMetadata().get(ParseData.CONTENT_TYPE) to get the 
> content type or a call to ParseData.getMetadata().set(ParseData.CONTENT_TYPE, 
> "text/xml"); Of course, this wouldn't preclude users from doing what they are 
> currently doing, it would just provide a standard method of obtaining some of 
> the more common, critical metadata without poring over the code base to 
> figure out what they are named.
> I'll contribute a patch near the end of this week, or the beginning of next week, 
> that addresses this issue.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
