Re: correct XPATH syntax

2012-05-04 Thread Lance Norskog
The XPath implementation in DIH is very minimal: it is tuned for
speed, not features. The XSL option lets you do everything you could
want, at the cost of a slower engine.

-- 
Lance Norskog
goks...@gmail.com
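
A minimal sketch of how the XSL option might be wired into the entity
discussed in this thread. The stylesheet path is a hypothetical placeholder,
and only two of the fields are shown; the remaining attributes and fields
would stay as in the original data-config.xml. Because the stylesheet is
applied as a preprocessing step, the forEach and xpath values have to match
the transformed document rather than the raw file.

```xml
<!-- Sketch only: /path/to/medline-authors.xsl is a hypothetical location. -->
<entity name="medlineFiles" processor="XPathEntityProcessor"
        url="${medlineFileList.fileAbsolutePath}"
        forEach="/MedlineCitationSet/MedlineCitation"
        xsl="/path/to/medline-authors.xsl">
  <field column="pmid"
         xpath="/MedlineCitationSet/MedlineCitation/PMID" />
  <!-- Assumes the stylesheet leaves one Author element per author. -->
  <field column="author"
         xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" />
</entity>
```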


Re: correct XPATH syntax

2012-05-03 Thread lboutros
Hi David,

What do you want to do with the 'commonField' option?

Could you post the part of the schema that defines the author field?
Is the author field stored?

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959097.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: correct XPATH syntax

2012-05-03 Thread Twomey, David
Is what I want even possible with XPathEntityProcessor?

It sort of works now: I didn't realize that flatten is an attribute of
field rather than of entity.

BUT it's still not what I would like.

The XML looks like below and it's nested within 
/MedlineCitationSet/MedlineCitation/Article/

<AuthorList CompleteYN="Y">
 <Author ValidYN="Y">
  <LastName>Starremans</LastName>
  <ForeName>Patrick G J F</ForeName>
  <Initials>PG</Initials>
 </Author>
 <Author ValidYN="Y">
  <LastName>van der Kemp</LastName>
  <ForeName>Annemiete W C M</ForeName>
  <Initials>AW</Initials>
 </Author>
 <Author ValidYN="Y">
  <LastName>Knoers</LastName>
  <ForeName>Nine V A M</ForeName>
  <Initials>NV</Initials>
 </Author>
 <Author ValidYN="Y">
  <LastName>van den Heuvel</LastName>
  <ForeName>Lambertus P W J</ForeName>
  <Initials>LP</Initials>
 </Author>
</AuthorList>

What I would like to see in the index author field is
<author>Starremans PG, Van der Kemp AW, etc.</author>
(note: last name and initials, no forename).


When I set Xpath like this
<field column="author"
       xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author"
       flatten="true" />

I get this in the index
<arr name="author">
 <str>Starremans Patrick G J F PG</str>
 <str>Van der Kemp Annemiete W C M AW</str>
 .
 .
</arr>
note: the forename field is included

My author field in the schema.xml is
<field name="author" type="textgen" indexed="true" stored="true"
       multiValued="true" required="false"/>

So is this even possible with XPathEntityProcessor?

Thanks
David
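
For the output described above (last name plus initials, no forename), an
XSLT 1.0 stylesheet along these lines could serve as the XSL preprocessing
step. It is a sketch and has not been tested against real Medline files: it
copies everything unchanged except Author elements, which it rewrites to
plain text. It assumes every Author has LastName and Initials children, as in
the sample above; an author lacking both would come out empty.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute as-is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for Author: keep only "LastName Initials" as text. -->
  <xsl:template match="Author">
    <Author>
      <xsl:value-of
          select="normalize-space(concat(LastName, ' ', Initials))"/>
    </Author>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample above, the first entry becomes
<Author>Starremans PG</Author>, so an xpath ending in /AuthorList/Author
yields one "LastName Initials" value per author, and flatten is no longer
needed because each Author contains only text.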





Re: correct XPATH syntax

2012-05-03 Thread lboutros
OK, not that easy :)

I did not test it myself, but it seems that you could use XSL
preprocessing via the 'xsl' option of your XPathEntityProcessor:

http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

You could transform the author part as you wish and then import the author
field with your current configuration.

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959397.html


Re: correct XPATH syntax

2012-05-01 Thread lboutros
Hi David,

I think you should add this option: flatten=true

and then could you try this XPath:

/MedlineCitationSet/MedlineCitation/AuthorList/Author

See here for the description:

http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1

I don't think that the commonField option is needed here; I think you
should remove it.

Ludovic. 

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3952812.html


Re: correct XPATH syntax

2012-05-01 Thread Twomey, David
Ludovic,

Thanks for your help.  I tried your suggestion but it didn't work for
Authors.  Below are 3 snippets from data-config.xml, the XML file and the
XML response from the DB

Data-config:
<entity name="medlineFiles" processor="XPathEntityProcessor"
        url="${medlineFileList.fileAbsolutePath}"
        forEach="/MedlineCitationSet/MedlineCitation"
        transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,LogTransformer"
        logTemplate="   processing ${medlineFileList.fileAbsolutePath}"
        logLevel="info"
        flatten="true"
        stream="true">

  <field column="pmid"
         xpath="/MedlineCitationSet/MedlineCitation/PMID"
         commonField="true" />
  <field column="journal_name"
         xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/Title"
         commonField="true" />
  <field column="title"
         xpath="/MedlineCitationSet/MedlineCitation/Article/ArticleTitle"
         commonField="true" />
  <field column="abstract"
         xpath="/MedlineCitationSet/MedlineCitation/Article/Abstract/AbstractText"
         commonField="true" />
  <field column="author"
         xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author"
         commonField="false" />
  <field column="year"
         xpath="/MedlineCitationSet/MedlineCitation/Article/Journal/JournalIssue/PubDate/Year"
         commonField="true" />

</entity>



XML Snippet for Author:
<AuthorList CompleteYN="Y">
 <Author ValidYN="Y">
  <LastName>Malathi</LastName>
  <ForeName>K</ForeName>
  <Initials>K</Initials>
 </Author>
 <Author ValidYN="Y">
  <LastName>Xiao</LastName>
  <ForeName>Y</ForeName>
  <Initials>Y</Initials>
 </Author>
 <Author ValidYN="Y">
  <LastName>Mitchell</LastName>
  <ForeName>A P</ForeName>
  <Initials>AP</Initials>
 </Author>
</AuthorList>


Response from SOLR:

<arr name="author">
 <str></str>
 <str></str>
 <str></str>
 <str></str>
 <str></str>
 <str></str>
 <str></str>
 <str></str>
 <str></str>
 <str></str>
 <str></str>
 <str></str>
 <str></str>
 <str></str>
</arr>
<str name="journal_name">Journal of cancer research and clinical
oncology</str>




Thanks
David


Re: correct XPATH syntax

2012-04-30 Thread Twomey, David
Sorry hit send too soon.  Continued the email below

On 4/30/12 4:46 PM, Twomey, David david.two...@novartis.com wrote:


Is this possible in DataImportHandler?

I want the following XML to all collapse into one multi-valued Author field:

<AuthorList CompleteYN="Y">
 <Author ValidYN="Y">
  <LastName>Sørlie</LastName>
  <ForeName>T</ForeName>
  <Initials>T</Initials>
 </Author>
 <Author ValidYN="Y">
  <LastName>Perou</LastName>
  <ForeName>C M</ForeName>
  <Initials>CM</Initials>
 </Author>
 <Author ValidYN="Y">
  <LastName>Tibshirani</LastName>
  <ForeName>R</ForeName>
  <Initials>R</Initials>
 </Author>
...

So my XPATH is like
xpath="/MedlineCitationSet/MedlineCitation/AuthorList/??"
commonField="true" />





Re: correct XPATH syntax

2012-04-30 Thread Twomey, David
Answering my own question: I think I can do this by writing a script that
concatenates the LastName, ForeName and Initials, and adding that to xpath =
/AuthorList/Author

Yes?
