Re: correct XPATH syntax
The XPath implementation in DIH is very minimal: it is tuned for speed, not features. The XSL option lets you do everything you could want, at the cost of a slower engine.

On Thu, May 3, 2012 at 7:30 AM, lboutros boutr...@gmail.com wrote:
> ok, not that easy :)
>
> I did not test it myself, but it seems that you could use XSL preprocessing with the 'xsl' option in your XPathEntityProcessor:
> http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>
> You could transform the author part as you wish and then import the author field with your current configuration.
>
> Ludovic.
> - Jouve France.

--
Lance Norskog
goks...@gmail.com
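As a rough, untested sketch of how the XSL option might be wired into data-config.xml: the entity below is David's medlineFiles entity with an 'xsl' attribute added and the other fields left out for brevity. The path /path/to/medline-authors.xsl is a placeholder, and the stylesheet behind it is assumed to rewrite each Author element so that it holds only the text to index (for example, last name plus initials).

```xml
<!-- Sketch only: the xsl path is a placeholder, not a file from this thread -->
<entity name="medlineFiles"
        processor="XPathEntityProcessor"
        url="${medlineFileList.fileAbsolutePath}"
        xsl="/path/to/medline-authors.xsl"
        forEach="/MedlineCitationSet/MedlineCitation">
  <!-- After the XSL step each Author is assumed to contain plain text,
       so no flatten attribute is set on this field -->
  <field column="author"
         xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" />
</entity>
```

The trade-off is the one described above: the XSL step allows arbitrary reshaping of the input, but the transformation runs in a slower engine than the built-in XPath reader.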
Re: correct XPATH syntax
Hi David,

What do you want to do with the 'commonField' option? Could you post the part of the schema that defines the author field? Is the author field stored?

Ludovic.
- Jouve France.

--
View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959097.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: correct XPATH syntax
Is what I want even possible with XPathEntityProcessor?

It sort of works now. I didn't realize that flatten is an attribute of field rather than entity. BUT it's still not what I would like.

The XML looks like the snippet below, and it's nested within /MedlineCitationSet/MedlineCitation/Article/

<AuthorList CompleteYN="Y">
  <Author ValidYN="Y">
    <LastName>Starremans</LastName>
    <ForeName>Patrick G J F</ForeName>
    <Initials>PG</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>van der Kemp</LastName>
    <ForeName>Annemiete W C M</ForeName>
    <Initials>AW</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Knoers</LastName>
    <ForeName>Nine V A M</ForeName>
    <Initials>NV</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>van den Heuvel</LastName>
    <ForeName>Lambertus P W J</ForeName>
    <Initials>LP</Initials>
  </Author>
</AuthorList>

What I would like to see in the index author field is

<author>Starremans PG, Van der Kemp AW, etc</author>

Note: LastName and Initials, no ForeName.

When I set the XPath like this

<field column="author" xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" flatten="true" />

I get this in the index

<arr name="author">
  <str>Starremans Patrick G J F PG</str>
  <str>Van der Kemp Annemiete W C M AW</str>
  . .
</arr>

Note: the forename field is included.

My author field in schema.xml is

<field name="author" type="textgen" indexed="true" stored="true" multiValued="true" required="false"/>

So is this even possible with XPathEntityProcessor?

Thanks
David

On 5/3/12 8:40 AM, lboutros boutr...@gmail.com wrote:
> Hi David,
>
> What do you want to do with the 'commonField' option? Could you post the part of the schema that defines the author field? Is the author field stored?
>
> Ludovic.
> - Jouve France.
Re: correct XPATH syntax
ok, not that easy :)

I did not test it myself, but it seems that you could use XSL preprocessing with the 'xsl' option in your XPathEntityProcessor:
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

You could transform the author part as you wish and then import the author field with your current configuration.

Ludovic.
- Jouve France.

--
View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959397.html
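A minimal, untested sketch of a stylesheet that could serve as this preprocessing step. It assumes the MEDLINE structure David posted (Author elements with LastName and Initials children) and XSLT 1.0; nothing in it comes from the wiki page. It copies the document unchanged except that each Author's children are replaced by the text "LastName Initials".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- Identity template: copy every node and attribute not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Author: keep the attributes, replace the children with
       "LastName Initials" (ForeName is dropped) -->
  <xsl:template match="Author">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:value-of select="normalize-space(concat(LastName, ' ', Initials))"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

For the first author in David's sample this should produce <Author ValidYN="Y">Starremans PG</Author>, so the existing author xpath would pick up one value per author. An Author without a LastName child would come out with only the initials (or empty), so real data may need an extra template for such cases.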
Re: correct XPATH syntax
Hi David,

I think you should add this option: flatten="true"

and then could you try this XPath: /MedlineCitationSet/MedlineCitation/AuthorList/Author

See here for the description:
http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

I don't think that the commonField option is needed here; I think you should remove it.

Ludovic.
- Jouve France.

--
View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3952812.html
Re: correct XPATH syntax
Ludovic,

Thanks for your help. I tried your suggestion but it didn't work for Authors. Below are 3 snippets: from data-config.xml, from the XML file, and from the XML response from the DB.

Data-config:

<entity name="medlineFiles"
        processor="XPathEntityProcessor"
        url="${medlineFileList.fileAbsolutePath}"
        forEach="/MedlineCitationSet/MedlineCitation"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
        logTemplate=" processing ${medlineFileList.fileAbsolutePath}"
        logLevel="info"
        flatten="true"
        stream="true">
  <field column="pmid" xpath="/MedlineCitationSet/MedlineCitation/PMID" commonField="true" />
  <field column="journal_name" xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/Title" commonField="true" />
  <field column="title" xpath="/MedlineCitationSet/MedlineCitation/Article/ArticleTitle" commonField="true" />
  <field column="abstract" xpath="/MedlineCitationSet/MedlineCitation/Article/Abstract/AbstractText" commonField="true" />
  <field column="author" xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" commonField="false" />
  <field column="year" xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/JournalIssue/PubDate/Year" commonField="true" />
</entity>

XML Snippet for Author:

<AuthorList CompleteYN="Y">
  <Author ValidYN="Y">
    <LastName>Malathi</LastName>
    <ForeName>K</ForeName>
    <Initials>K</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Xiao</LastName>
    <ForeName>Y</ForeName>
    <Initials>Y</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Mitchell</LastName>
    <ForeName>A P</ForeName>
    <Initials>AP</Initials>
  </Author>
</AuthorList>

Response from SOLR:

<arr name="author">
  <str></str> <str></str> <str></str> <str></str> <str></str> <str></str> <str></str>
  <str></str> <str></str> <str></str> <str></str> <str></str> <str></str> <str></str>
</arr>
<str name="journal_name">Journal of cancer research and clinical oncology</str>

Thanks
David

On 5/1/12 8:05 AM, lboutros boutr...@gmail.com wrote:
> Hi David,
>
> I think you should add this option: flatten="true"
>
> and then could you try this XPath: /MedlineCitationSet/MedlineCitation/AuthorList/Author
>
> See here for the description:
> http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>
> I don't think that the commonField option is needed here; I think you should remove it.
>
> Ludovic.
> - Jouve France.
Re: correct XPATH syntax
Sorry, hit send too soon. The email continues below.

On 4/30/12 4:46 PM, Twomey, David david.two...@novartis.com wrote:
> Is this possible in DataImportHandler?
>
> I want the following XML to all collapse into one multi-valued Author field
>
> <AuthorList CompleteYN="Y">
>   <Author ValidYN="Y">
>     <LastName>Sørlie</LastName>
>     <ForeName>T</ForeName>
>     <Initials>T</Initials>
>   </Author>
>   <Author ValidYN="Y">
>     <LastName>Perou</LastName>
>     <ForeName>C M</ForeName>
>     <Initials>CM</Initials>
>   </Author>
>   <Author ValidYN="Y">
>     <LastName>Tibshirani</LastName>
>     <ForeName>R</ForeName>
>     <Initials>R</Initials>
>   </Author>
>   ...

So my XPATH is like

xpath="/MedlineCitationSet/MedlineCitation/AuthorList/??" commonField="true" />
Re: correct XPATH syntax
Answering my own question: I think I can do this by writing a script that concatenates the LastName, ForeName and Initials, and adding that to xpath = /AuthorList/Author

Yes?

On 4/30/12 4:49 PM, Twomey, David david.two...@novartis.com wrote:
> Sorry, hit send too soon. The email continues below.
>
> On 4/30/12 4:46 PM, Twomey, David david.two...@novartis.com wrote:
>> Is this possible in DataImportHandler?
>>
>> I want the following XML to all collapse into one multi-valued Author field
>>
>> <AuthorList CompleteYN="Y">
>>   <Author ValidYN="Y">
>>     <LastName>Sørlie</LastName>
>>     <ForeName>T</ForeName>
>>     <Initials>T</Initials>
>>   </Author>
>>   <Author ValidYN="Y">
>>     <LastName>Perou</LastName>
>>     <ForeName>C M</ForeName>
>>     <Initials>CM</Initials>
>>   </Author>
>>   <Author ValidYN="Y">
>>     <LastName>Tibshirani</LastName>
>>     <ForeName>R</ForeName>
>>     <Initials>R</Initials>
>>   </Author>
>>   ...
>
> So my XPATH is like
>
> xpath="/MedlineCitationSet/MedlineCitation/AuthorList/??" commonField="true" />