Re: Newbie wants to index XML content.
You can use the DIH (DataImportHandler) to split up and index that XML.

http://wiki.apache.org/solr/DataImportHandler

Best regards,
M.Sc. Dipl.-Inf. (FH) Martin Rödig
SHI Elektronische Medien GmbH

-----Original Message-----
From: Marcelo Iturbe [mailto:marc...@santiago.cl]
Sent: Thursday, 24 March 2011 21:55
To: solr-user@lucene.apache.org
Subject: Newbie wants to index XML content.
Re: Newbie wants to index XML content.
Solr does not index arbitrary XML documents (but see Martin's comments about DIH). It will, however, index XML documents that follow a specific format. The general form is:

<add>
  <doc>
    <field name="...">value to index</field>
    <field name="...">value for this field</field>
  </doc>
  <doc>
  </doc>
</add>

So you can either try DIH, or parse the raw XML yourself and convert it to the above form for indexing...

Best
Erick
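
For the "parse it yourself" route, here is a minimal sketch in Python of one way the conversion could look. It is only an illustration, not an official Solr tool. It assumes each entry element becomes one doc, that nested element names are joined with underscores (for example name_fullname), and that attributes become extra fields such as email_type. The function names, field names, and sample addresses are hypothetical, and the Solr schema would need matching (probably multi-valued) fields.

```python
import xml.etree.ElementTree as ET


def flatten(elem, prefix=""):
    """Yield (field_name, value) pairs for an element, its attributes,
    and all of its descendants. Nested names are joined with "_"."""
    name = f"{prefix}_{elem.tag}" if prefix else elem.tag
    text = (elem.text or "").strip()
    if text:
        yield name, text
    # Attributes become their own fields, e.g. email_type (an assumed convention).
    for attr, value in elem.attrib.items():
        yield f"{name}_{attr}", value
    for child in elem:
        yield from flatten(child, name)


def entries_to_solr_add(source_xml):
    """Convert every <entry> in the source XML into one <doc> inside <add>."""
    root = ET.fromstring(source_xml)
    add = ET.Element("add")
    for entry in root.iter("entry"):
        doc = ET.SubElement(add, "doc")
        for child in entry:
            for field_name, value in flatten(child):
                field = ET.SubElement(doc, "field", {"name": field_name})
                field.text = value
    return ET.tostring(add, encoding="unicode")


# Hypothetical sample data, shaped like the XML in the question.
SAMPLE = """<root>
  <source>manual</source>
  <entry>
    <name><fullname>John Smith</fullname></name>
    <email type="work">john@example.com</email>
    <email>js@example.org</email>
  </entry>
</root>"""

if __name__ == "__main__":
    print(entries_to_solr_add(SAMPLE))
```

The result can then be posted to Solr's update handler like any other add message. Note that flattening loses the link between an attribute and the particular value it belonged to (which email was the "work" one), so if that matters a different field naming scheme would be needed.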
Newbie wants to index XML content.
Hello,

I've been reading up on how to index XML content but have a few questions.

How is data in element attributes handled or defined?
How are nested elements handled?

In the following XML structure, I want to index the content of what is between the <entry> tags. In one XML document, there can be up to 100 <entry> tags. So the <entry> tag would be equivalent to the <doc> tag...

Can I somehow index this XML as is, or will I have to parse it, creating the <doc> tag and placing all the elements on the same level?

Thanks for your help.

<?xml version="1.0" encoding="utf-8"?>
<root>
  <source>manual</source>
  <author>
    <name>MC Anon User</name>
    <email>mca...@mcdomain.com</email>
  </author>
  <entry>
    <name>
      <fullname>John Smith</fullname>
    </name>
    <email>jsmit...@gmail.com</email>
  </entry>
  <entry>
    <name>
      <fullname>First Last</fullname>
      <firstname>First</firstname>
      <lastname>Last</lastname>
    </name>
    <organization>
      <name>MC S.A.</name>
      <tittle>CIO</tittle>
    </organization>
    <email type="work" primary="true">fi...@mcdomain.com</email>
    <email>flas...@yahoo.com</email>
    <phoneNumber type="work" primary="true">+5629460600</phoneNumber>
    <im carrier="gtalk" primary="true">fi...@mcdomain.com</im>
    <im carrier="skype">First.Last</im>
    <postalAddress>111 Bude St, Toronto</postalAddress>
    <custom name="blog">http://blog.mcdomain.com/</custom>
  </entry>
</root>

regards
Marcelo