Hi Armel,

On 11/20/06 1:44 PM, "Armel T. Nene" <[EMAIL PROTECTED]> wrote:

> Hi Chris,
>
> I am trying to extend parse-xml to enable the creation of Lucene fields
> straight from an XML file. For example, a database table that has been parsed
> as an XML file should be stored in the index with the relevant fields, i.e.
> customer name, address and so on. This file will not have a namespace
> associated with it and should not be stored as "xmlcontent" in the database.
> Currently, parse-xml looks for known fields in the document and stores the
> associated values with the field name. I have added an extra condition:
> if the known fields are not present in the current document, each element or
> node in the document should become a new field stored in the index with its
> value. I think that this is fine.
>
> Therefore, when parse-xml receives an XML document with no namespace
> available, it will parse the document and store each element name as a new
> field in the index, together with the element's associated value.
>
> Let me know if I am on the right track, because I know I don't have to write
> a separate plugin for this feature but can just extend (or modify)
> parse-xml.

I think that parse-xml will support what you are talking about. As for the
"check" that you are doing to see whether a field exists before adding another
value for it in the index: as I understand Lucene, you could just omit this
check and add the field regardless. If you add multiple values for the same
field in a Document, e.g.:

<snip>
Document doc = new Document();
doc.add(new Field("fieldname", "fieldvalue", ...));
doc.add(new Field("fieldname", "fieldvalue2", ...));
</snip>

then both values, "fieldvalue" and "fieldvalue2", will be stored in the index
under the key "fieldname". So, if I understand you correctly (which I may not
;) ), I think you can omit the check that you describe above and simply add
the same field name twice.

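To make the idea a bit more concrete, here is a rough sketch of the
element-name-to-field-name mapping using only the JDK's XML parser. This is
not the parse-xml code: the class name ElementFieldSketch and the method
toFields are made up for illustration, and a plain Map of lists stands in for
the Lucene Document so that the example has no dependencies. It assumes that
only leaf elements (those without child elements) should become fields.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; a Map<String, List<String>> stands in
// for a Lucene Document, and none of this is taken from parse-xml.
final class ElementFieldSketch {

    // Parses a namespace-less XML string and returns
    // field name -> every value seen for that name, in document order.
    static Map<String, List<String>> toFields(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, List<String>> fields = new LinkedHashMap<>();
            collect(root, fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Walks the tree; each non-empty leaf element becomes one field value.
    private static void collect(Element element, Map<String, List<String>> fields) {
        NodeList children = element.getChildNodes();
        boolean hasChildElements = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                collect((Element) child, fields);
            }
        }
        if (!hasChildElements) {
            String value = element.getTextContent().trim();
            if (!value.isEmpty()) {
                // No "is this field already present?" branch in the caller's
                // logic: a repeated name simply gains another value, which
                // mirrors calling doc.add(...) twice with the same name.
                fields.computeIfAbsent(element.getTagName(), k -> new ArrayList<>())
                      .add(value);
            }
        }
    }
}
```

With an input such as
<customer><name>Acme</name><phone>555-0100</phone><phone>555-0101</phone></customer>,
toFields returns a "name" entry with one value and a "phone" entry with two.
In the real plugin, each (name, value) pair would presumably become one
doc.add(new Field(name, value, ...)) call instead of a map entry.
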
HTH,
  Chris

>
> Cheers,
>
> Armel
>
>
> -----Original Message-----
> From: Chris Mattmann [mailto:[EMAIL PROTECTED]
> Sent: 20 November 2006 18:40
> To: nutch-dev@lucene.apache.org
> Subject: Re: What's the status of Nutch-GUI?
>
> Hi Sami and Scott,
>
> This is on my TO-DO list as one of the items that I will begin working on
> getting into the sources as a committer. Additionally, I plan on integrating
> and testing the parse-xml plugin into the source tree. As soon as I get my
> Apache account and SVN access, I will start working on this.
>
> Thanks!
>
> Cheers,
> Chris
>
>
>
> On 11/20/06 9:24 AM, "Sami Siren" <[EMAIL PROTECTED]> wrote:
>
>> scott green wrote:
>>> Hi
>>>
>>> Is nutch-gui dead? why i cannot find any source in svn repo?
>>
>> Unfortunately the sources for the admin gui never got into svn. It would
>> be great if someone could pick it up and bring it up to date to get it
>> integrated.
>>
>> --
>> Sami Siren
>>
>
>

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.