Re: Newbie wants to index XML content.

2011-03-25 Thread Martin Rödig
You can use the DIH (DataImportHandler) to split up and index that XML.
 http://wiki.apache.org/solr/DataImportHandler
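For reference, DIH does this kind of split declaratively: an XPathEntityProcessor entity in data-config.xml with forEach="/root/entry" and one xpath per field. The sketch below is not DIH; it only mimics that row-per-entry mapping in plain Python (standard-library xml.etree.ElementTree) to show how nested elements and attributes can end up as flat columns. The column names and paths are hypothetical choices for the sample XML in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical column -> path mapping, evaluated relative to each <entry>
# (a DIH config would use absolute paths such as /root/entry/name/fullname).
COLUMNS = {
    "fullname": "name/fullname",          # nested element
    "email": "email",                     # repeated element -> several values
    "email_work": "email[@type='work']",  # element selected by attribute value
    "organization": "organization/name",
}

# Hypothetical attribute columns: column -> (element path, attribute name).
ATTR_COLUMNS = {
    "im_carrier": ("im", "carrier"),
}


def entries_to_rows(xml_text):
    """Return one dict per /root/entry; each value is a list of strings."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("entry"):  # plays the role of forEach="/root/entry"
        row = {}
        for column, path in COLUMNS.items():
            values = [el.text for el in entry.findall(path) if el.text]
            if values:
                row[column] = values
        for column, (path, attr) in ATTR_COLUMNS.items():
            values = [el.get(attr) for el in entry.findall(path) if el.get(attr)]
            if values:
                row[column] = values
        rows.append(row)
    return rows


# Trimmed version of the sample from this thread (addresses truncated as in the archive).
SAMPLE = """<root>
  <source>manual</source>
  <entry>
    <name><fullname>John Smith</fullname></name>
    <email>jsmit...@gmail.com</email>
  </entry>
  <entry>
    <name><fullname>First Last</fullname></name>
    <organization><name>MC S.A.</name></organization>
    <email type="work" primary="true">fi...@mcdomain.com</email>
    <email>flas...@yahoo.com</email>
    <im carrier="gtalk" primary="true">fi...@mcdomain.com</im>
    <im carrier="skype">First.Last</im>
  </entry>
</root>"""

if __name__ == "__main__":
    for row in entries_to_rows(SAMPLE):
        print(row)
```

ElementTree only understands a subset of XPath (child paths and [@attr='value'] predicates work), so treat this as a rough stand-in for the idea rather than an exact model of what DIH's xpath support accepts.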


Best regards
M.Sc. Dipl.-Inf. (FH) Martin Rödig
 
SHI Elektronische Medien GmbH

-Original Message-
From: Marcelo Iturbe [mailto:marc...@santiago.cl]
Sent: Thursday, 24 March 2011 21:55
To: solr-user@lucene.apache.org
Subject: Newbie wants to index XML content.



Re: Newbie wants to index XML content.

2011-03-25 Thread Erick Erickson
Solr does not index arbitrary XML documents (but see Martin's comments
about DIH). It will, however, index XML documents that have a specific
format. The general form is:
<add>
<doc>
  <field name="...">value to index</field>
  <field name="...">value for this field</field>
</doc>
<doc>
  ...
</doc>
</add>

So you can either try DIH or parse the raw XML yourself and put it in the above
form for indexing...
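If you go the parse-it-yourself route, a minimal sketch of the conversion, assuming the root/entry layout from Marcelo's sample, could look like this. It uses only Python's standard-library xml.etree.ElementTree; the naming convention (nested paths joined with "_", each attribute turned into its own field) is a hypothetical choice, not something Solr prescribes.

```python
import xml.etree.ElementTree as ET


def flatten(elem, prefix=""):
    """Yield (field_name, value) pairs for the leaves under elem.

    Nested element paths are joined with '_' (name/fullname -> name_fullname)
    and every attribute becomes a field of its own (email@type -> email_type).
    """
    for child in elem:
        name = f"{prefix}_{child.tag}" if prefix else child.tag
        for attr, value in child.attrib.items():
            yield f"{name}_{attr}", value
        if len(child):  # has sub-elements: recurse with the longer prefix
            yield from flatten(child, name)
        elif child.text and child.text.strip():
            yield name, child.text.strip()


def to_solr_add(xml_text):
    """Build a Solr <add> message with one <doc> per /root/entry."""
    root = ET.fromstring(xml_text)
    add = ET.Element("add")
    for entry in root.findall("entry"):  # elements outside <entry> are ignored
        doc = ET.SubElement(add, "doc")
        for field_name, value in flatten(entry):
            ET.SubElement(doc, "field", name=field_name).text = value
    return ET.tostring(add, encoding="unicode")


# Trimmed version of the second entry from the question.
SAMPLE = """<root>
  <source>manual</source>
  <entry>
    <name><fullname>First Last</fullname><lastname>Last</lastname></name>
    <organization><name>MC S.A.</name><tittle>CIO</tittle></organization>
    <email type="work" primary="true">fi...@mcdomain.com</email>
    <email>flas...@yahoo.com</email>
    <custom name="blog">http://blog.mcdomain.com/</custom>
  </entry>
</root>"""

if __name__ == "__main__":
    print(to_solr_add(SAMPLE))
```

The resulting string is the kind of message you would POST to Solr's XML update handler, followed by a commit. Two caveats of this design: because attributes become separate fields, with two email elements the email_type value is no longer tied to a particular address (folding the attribute into the field name, e.g. email_work, is one alternative); and every generated name has to exist in schema.xml or match a dynamicField, with repeated ones such as email declared multiValued="true".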

Best
Erick

On Thu, Mar 24, 2011 at 4:54 PM, Marcelo Iturbe <marc...@santiago.cl> wrote:



Newbie wants to index XML content.

2011-03-24 Thread Marcelo Iturbe
Hello,
I've been reading up on how to index XML content but have a few questions.

How is data in element attributes handled or defined? How are nested
elements handled?

In the following XML structure, I want to index the content between
the entry tags.
One XML document can contain up to 100 entry tags,
so each entry tag would be equivalent to a doc tag...

Can I somehow index this XML as is, or will I have to parse it, creating
the doc tags and placing all the elements on the same level?

Thanks for your help.

<?xml version="1.0" encoding="utf-8"?>
<root>
    <source>manual</source>
    <author>
        <name>MC Anon User</name>
        <email>mca...@mcdomain.com</email>
    </author>

    <entry>
        <name>
            <fullname>John Smith</fullname>
        </name>
        <email>jsmit...@gmail.com</email>
    </entry>

    <entry>
        <name>
            <fullname>First Last</fullname>
            <firstname>First</firstname>
            <lastname>Last</lastname>
        </name>
        <organization>
            <name>MC S.A.</name>
            <tittle>CIO</tittle>
        </organization>
        <email type="work" primary="true">fi...@mcdomain.com</email>
        <email>flas...@yahoo.com</email>
        <phoneNumber type="work" primary="true">+5629460600</phoneNumber>
        <im carrier="gtalk" primary="true">fi...@mcdomain.com</im>
        <im carrier="skype">First.Last</im>
        <postalAddress>111 Bude St, Toronto</postalAddress>
        <custom name="blog">http://blog.mcdomain.com/</custom>
    </entry>
</root>

regards
Marcelo