Re: xpath processing

2010-11-02 Thread Lance Norskog
The XPathEntityProcessor has the option to run a real XSL stylesheet at
some point in its processing chain. I guess you could write an XSL that
pulls your fields out into a simpler XML, in the /a/b/c format that the
XPath parser supports.
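
The flattening idea can be sketched as a preprocessing step. This is a rough illustration in Python rather than XSL, so it is a swapped-in technique and not the XSL option itself. The name/namePart elements and the type attribute come from this thread; the output element names (doc, plainName) and the cut-down sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # namespace URI from the sample record

# Hypothetical, cut-down record; the real sample in this thread lost its tags.
SAMPLE = f"""<mods xmlns="{MODS_NS}">
  <name>
    <namePart>Arlen, Harold</namePart>
    <namePart type="date">1905-1986</namePart>
  </name>
  <name>
    <namePart>Davison, R.</namePart>
  </name>
</mods>"""


def flatten_names(mods_xml: str) -> str:
    """Copy every namePart that has no type attribute into a flat document."""
    root = ET.fromstring(mods_xml)
    flat = ET.Element("doc")  # hypothetical output element name
    for part in root.iter(f"{{{MODS_NS}}}namePart"):
        if "type" not in part.attrib:
            ET.SubElement(flat, "plainName").text = part.text
    return ET.tostring(flat, encoding="unicode")


if __name__ == "__main__":
    print(flatten_names(SAMPLE))
```

With a flat file like this, a plain path such as /doc/plainName would be all the XPath parser has to handle.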



On Tue, Nov 2, 2010 at 5:37 PM,   wrote:
>
> 
> http://www.loc.gov/mods/v3";
> xmlns:xlink="http://www.w3.org/1999/xlink";
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
> xsi:schemaLocation="http://www.loc.gov/mods/v3
>  http://www.loc.gov/standards/mods/v3/mods-3-0.xsd";>
>    
>        Any place I hang my hat is home
>    
>    
>        St. Louis woman
>        Any place I hang my hat is home
>    
>    
>        Free an' easy that's my style
>    
>    
>        Arlen, Harold
>        1905-1986
>        
>             type="text">creator
>        
>    
>    
>        Mercer, Johnny
>        1909-
>    
>    
>        Davison, R.
>    
>    
>        Bontemps, Arna Wendell
>        1902-1973
>    
>    
>        Cullen, Countee
>        1903-1946
>    
>    notated music
>    
>        
>             type="code">nyu
>        
>        
>            New York
>        
>        De Sylva, Brown & Henderson,
> Inc.
>        c1946
>        1946
>        monographic
>        1946
>        1946
>    
>    
>         type="code">eng
>    
>    
>        print
>        1 vocal score (5 p.) : ill. ; 31 cm.
>    
>    music by Harold Arlen ;
> lyrics by Johnny Mercer.
>    For voice and piano.
>    Includes chord symbols.
>    Illustration by R. Davison.
>    First line: Free an' easy that's my style.
>    "Edward Gross presents St. Louis Woman ... Book by Arna
> Bontemps & Countee Cullen" -- Cover.
>    Publisher's advertising includes musical incipits.
>    
>        Motion picture music
>        Excerpts
>        Vocal scores with piano
>    
>    M1 .S8
>    1403-4 De Sylva, Brown Henderson,
> Inc.
>    
>        Lilly Library, Indiana University
> Bloomington
>    
>    
>         authority="marcorg">IUL
>         encoding="marc">990316
>        LL-SSM-ALC4888
>    
> 
>
> Above is my sample xml
>
> 
> 
> 
>  processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
> baseDir="C:\test_xml">
>  url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
> transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>
> above is the data config file
> The namePart element in the above xml may or may not have type attribute.
> How can i get data from the namePart element which has no type attribute?
> xpath="/mods/name/namepa...@type != 'date']" This is not working. I dont get
> any errors ,There is no namePart_keyword in the index.
>
>
> Quoting Ken Stanley :
>
>> On Fri, Oct 22, 2010 at 11:52 PM,  wrote:
>>
>>>
>>>
>>> 
>>> 
>>> 
>>> >> processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
>>> baseDir="C:\data\sample_records\mods\starr">
>>> >> processor="XPathEntityProcessor"
>>> url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
>>> transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> >> />
>>> 
>>> 
>>> 
>>> 
>>
>>
>> The documentation says you don't need a dataSource for your
>> XPathEntityProcessor entity; in my configuration, I have mine set to the
>> name of the top-level FileListEntityProcessor. Everything else looks fine.
>> Can you provide one record from your data? Also, are you getting any
>> errors
>> in your log?
>>
>> - Ken
>>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: xpath processing

2010-11-02 Thread pghorpade



http://www.loc.gov/mods/v3"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/mods/v3
http://www.loc.gov/standards/mods/v3/mods-3-0.xsd">


Any place I hang my hat is home


St. Louis woman
Any place I hang my hat is home


Free an' easy that's my style


Arlen, Harold
1905-1986

type="text">creator




Mercer, Johnny
1909-


Davison, R.


Bontemps, Arna Wendell
1902-1973


Cullen, Countee
1903-1946

notated music


type="code">nyu



New York

De Sylva, Brown & Henderson, Inc.
c1946
1946
monographic
1946
1946


type="code">eng



print
1 vocal score (5 p.) : ill. ; 31 cm.

music by Harold  
Arlen ; lyrics by Johnny Mercer.

For voice and piano.
Includes chord symbols.
Illustration by R. Davison.
First line: Free an' easy that's my style.
"Edward Gross presents St. Louis Woman ... Book by  
Arna Bontemps & Countee Cullen" -- Cover.

Publisher's advertising includes musical incipits.

Motion picture music
Excerpts
Vocal scores with piano

M1 .S8
1403-4 De Sylva, Brown  
Henderson, Inc.


Lilly Library, Indiana University  
Bloomington



authority="marcorg">IUL
encoding="marc">990316

LL-SSM-ALC4888



Above is my sample xml




processor="FileListEntityProcessor" fileName=".*xml" recursive="true"  
baseDir="C:\test_xml">
processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"  
stream="false" forEach="/mods"  
transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
















Above is the data config file.
The namePart element in the above XML may or may not have a type attribute.
How can I get data from a namePart element that has no type attribute?
xpath="/mods/name/namepa...@type != 'date']" is not working. I don't get
any errors, but there is no namePart_keyword in the index.



Quoting Ken Stanley :


On Fri, Oct 22, 2010 at 11:52 PM,  wrote:

The documentation says you don't need a dataSource for your
XPathEntityProcessor entity; in my configuration, I have mine set to the
name of the top-level FileListEntityProcessor. Everything else looks fine.
Can you provide one record from your data? Also, are you getting any errors
in your log?

- Ken






Re: xpath processing

2010-10-27 Thread Lance Norskog
The XPathEntityProcessor does not do full XPath. It is a very limited 
set intended to be very fast.
You can add code in any scripting language, but that is not really 
performant.
Is it possible to use the RegexTransformer to find your records with 
regular expressions?
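
A rough illustration of the regular-expression route, assuming the date-typed nameParts look like the ones in the sample record (1905-1986, 1909-). It is written in Python rather than as a RegexTransformer configuration, and the pattern and the helper names (is_date_like, split_name_parts) are assumptions made for illustration only.

```python
import re

# Assumed shape of a date-style namePart: a 4-digit year, a hyphen,
# and an optional second 4-digit year (e.g. "1905-1986" or "1909-").
DATE_LIKE = re.compile(r"^\d{4}-(\d{4})?$")


def is_date_like(value: str) -> bool:
    """Return True when the value looks like a life-date range."""
    return DATE_LIKE.match(value.strip()) is not None


def split_name_parts(values):
    """Separate plain names from date-like values, keeping their order."""
    names = [v for v in values if not is_date_like(v)]
    dates = [v for v in values if is_date_like(v)]
    return names, dates


if __name__ == "__main__":
    parts = ["Arlen, Harold", "1905-1986", "Mercer, Johnny", "1909-", "Davison, R."]
    print(split_name_parts(parts))
```

This only works if the values themselves are distinguishable; it says nothing about the type attribute, which the regex never sees.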


Ken Stanley wrote:

On Fri, Oct 22, 2010 at 11:52 PM,  wrote:

The documentation says you don't need a dataSource for your
XPathEntityProcessor entity; in my configuration, I have mine set to the
name of the top-level FileListEntityProcessor. Everything else looks fine.
Can you provide one record from your data? Also, are you getting any errors
in your log?

- Ken

   


Re: xpath processing

2010-10-23 Thread Ken Stanley
On Fri, Oct 22, 2010 at 11:52 PM,  wrote:

>
>
> 
> 
> 
>  processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
> baseDir="C:\data\sample_records\mods\starr">
>  url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
> transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
> 
> 
> 
> 
> 
> 
> 
> 
> 
>  />
> 
> 
> 
> 


The documentation says you don't need a dataSource for your
XPathEntityProcessor entity; in my configuration, I have mine set to the
name of the top-level FileListEntityProcessor. Everything else looks fine.
Can you provide one record from your data? Also, are you getting any errors
in your log?

- Ken


Re: xpath processing

2010-10-23 Thread Ben Boggess
> processor="FileListEntityProcessor" fileName=".*xml" recursive="true" 

Shouldn't this be fileName="*.xml"?

Ben
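
For what it's worth, the DIH documentation describes fileName as a regular expression rather than a shell glob, in which case .*xml is the form that parses. The difference can be checked with a small sketch. Python is used here only as a stand-in (Solr would use Java's regex engine), the file names are hypothetical, and whole-name matching is an assumption.

```python
import fnmatch
import re

FILES = ["record1.xml", "notes.txt"]  # hypothetical file names


def regex_matches(pattern: str, names):
    """Names matched by the regular expression (whole-name match assumed)."""
    compiled = re.compile(pattern)
    return [n for n in names if compiled.fullmatch(n)]


def glob_matches(pattern: str, names):
    """Names matched by a shell-style glob."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def is_valid_regex(pattern: str) -> bool:
    """True when the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(regex_matches(r".*xml", FILES))  # regex form from the config
    print(glob_matches("*.xml", FILES))    # glob form from the question
    print(is_valid_regex("*.xml"))         # a leading * has nothing to repeat
```

Both forms pick out the same file when each is read in its own syntax; the glob form simply does not compile as a regular expression.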

On Oct 22, 2010, at 10:52 PM, pghorp...@ucla.edu wrote:

> 
> 
> 
> 
> 
>  processor="FileListEntityProcessor" fileName=".*xml" recursive="true" 
> baseDir="C:\data\sample_records\mods\starr">
>  url="${f.fileAbsolutePath}" stream="false" forEach="/mods" 
> transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
> 
> 
> 
> 
> 
> 
> 
> 
> 
>  />
> 
> 
> 
> 
> 
> Quoting Ken Stanley :
> 
>> Parinita,
>> 
>> In its simplest form, what does your entity definition for DIH look like;
>> also, what does one record from your xml look like? We need more information
>> before we can really be of any help. :)
>> 
>> - Ken
>> 
>> It looked like something resembling white marble, which was
>> probably what it was: something resembling white marble.
>>-- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
>> 
>> 
>> On Fri, Oct 22, 2010 at 8:00 PM,  wrote:
>> 
>>> Quoting pghorp...@ucla.edu:
>>> Can someone help me please?
>>> 
>>> 
>>>> I am trying to import mods xml data in solr using the xml/http datasource
>>>>
>>>> This does not work with XPathEntityProcessor of the data import handler
>>>> xpath="/mods/name/namepa...@type = 'date']"
>>>>
>>>> I actually have 143 records with type attribute as 'date' for element
>>>> namePart.
>>>>
>>>> Thank you
>>>> Parinita
>>>>
>>>>
>>> 
>>> 
>> 
> 
> 


Re: xpath processing

2010-10-22 Thread pghorpade






processor="FileListEntityProcessor" fileName=".*xml" recursive="true"  
baseDir="C:\data\sample_records\mods\starr">
processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"  
stream="false" forEach="/mods"  
transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
















Quoting Ken Stanley :


Parinita,

In its simplest form, what does your entity definition for DIH look like;
also, what does one record from your xml look like? We need more information
before we can really be of any help. :)

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, "The Hitchhikers Guide to the Galaxy"


On Fri, Oct 22, 2010 at 8:00 PM,  wrote:


Quoting pghorp...@ucla.edu:
Can someone help me please?



I am trying to import mods xml data in solr using  the xml/http datasource

This does not work with XPathEntityProcessor of the data import handler
xpath="/mods/name/namepa...@type = 'date']"

I actually have 143 records with type attribute as 'date' for element
namePart.

Thank you
Parinita


Re: xpath processing

2010-10-22 Thread Ken Stanley
Parinita,

In its simplest form, what does your entity definition for DIH look like;
also, what does one record from your xml look like? We need more information
before we can really be of any help. :)

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, "The Hitchhikers Guide to the Galaxy"


On Fri, Oct 22, 2010 at 8:00 PM,  wrote:

> Quoting pghorp...@ucla.edu:
> Can someone help me please?
>
>
>> I am trying to import mods xml data in solr using  the xml/http datasource
>>
>> This does not work with XPathEntityProcessor of the data import handler
>> xpath="/mods/name/namepa...@type = 'date']"
>>
>> I actually have 143 records with type attribute as 'date' for element
>> namePart.
>>
>> Thank you
>> Parinita
>>
>>
>
>


Re: xpath processing

2010-10-22 Thread pghorpade

Quoting pghorp...@ucla.edu:
Can someone help me please?


I am trying to import MODS XML data into Solr using the XML/HTTP data source.

This does not work with the XPathEntityProcessor of the DataImportHandler:
xpath="/mods/name/namepa...@type = 'date']"

I actually have 143 records where the namePart element has the type
attribute set to 'date'.


Thank you
Parinita






RE: XPath Processing Applied to Clob

2010-03-18 Thread Craig Christman
You could also do the XPath processing on the Oracle end using the extract or
extractValue functions. Here's a good reference:
http://www.psoug.org/reference/xml_functions.html


-----Original Message-----
From: Neil Chaudhuri [mailto:nchaudh...@potomacfusion.com]
Sent: Wednesday, March 17, 2010 3:24 PM
To: solr-user@lucene.apache.org
Subject: XPath Processing Applied to Clob

I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
and the text of a document. This is an Oracle database, and the document is an 
XML document stored as Oracle's xmltype data type. Since this is nothing more 
than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. 
However, I don't want to index/store all the XML but instead just the XML 
within a set of tags. The XPath itself is trivial, but it seems like the 
XPathEntityProcessor only works for XML file content rather than the output of 
a Transformer.

Here is what I currently have that fails:





















Is there an easy way to do this without writing my own custom transformer?

Thanks.


Re: XPath Processing Applied to Clob

2010-03-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Keep in mind that the XPath is case-sensitive. Please paste a sample XML.

What is dataField="d.text"? It does not seem to refer to anything.
Where is the enclosing entity?
Did you mean dataField="doc.text"?

xpath="//BODY" is a supported syntax as long as you are using Solr 1.4 or higher.
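
To illustrate the case-sensitivity point with a generic XML library (Python's standard ElementTree, not the DIH parser), here is a small sketch. The MESSAGE and BODY element names come from this thread; the sample document and the find_text helper are made up.

```python
import xml.etree.ElementTree as ET

# Made-up document; only the MESSAGE and BODY names come from the thread.
SAMPLE = "<MESSAGE><HEADER>ignored</HEADER><BODY>text to index</BODY></MESSAGE>"


def find_text(xml_text: str, path: str):
    """Return the text of the first element matching path, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(path)
    return None if node is None else node.text


if __name__ == "__main__":
    print(find_text(SAMPLE, "BODY"))     # upper case, as in the document
    print(find_text(SAMPLE, "body"))     # lower case finds nothing
    print(find_text(SAMPLE, ".//BODY"))  # descendant search, same element
```

A path that differs from the document only in case silently matches nothing, which would look just like an import that runs without errors but leaves the field empty.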




On Thu, Mar 18, 2010 at 3:15 AM, Neil Chaudhuri
 wrote:
> Incidentally, I tried adding this:
>
> 
> 
>         dataField="d.text" forEach="/MESSAGE">
>                  
>        
> 
>
> But this didn't seem to change anything.
>
> Any insight is appreciated.
>
> Thanks.
>
>
>
> From: Neil Chaudhuri
> Sent: Wednesday, March 17, 2010 3:24 PM
> To: solr-user@lucene.apache.org
> Subject: XPath Processing Applied to Clob
>
> I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
> and the text of a document. This is an Oracle database, and the document is 
> an XML document stored as Oracle's xmltype data type. Since this is nothing 
> more than a fancy CLOB, I am using the ClobTransformer to extract the actual 
> XML. However, I don't want to index/store all the XML but instead just the 
> XML within a set of tags. The XPath itself is trivial, but it seems like the 
> XPathEntityProcessor only works for XML file content rather than the output 
> of a Transformer.
>
> Here is what I currently have that fails:
>
>
> 
>
>        
>
>            
>
>            
>
>            
>             forEach="/MESSAGE" url="${doc.text}">
>                
>
>            
>
>        
>
> 
>
>
> Is there an easy way to do this without writing my own custom transformer?
>
> Thanks.
>



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: XPath Processing Applied to Clob

2010-03-17 Thread Lance Norskog
The XPath parser in the DIH is a limited implementation. The unit test
program is the only enumeration (that I can find) of what it handles:

http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/dataimporthandler/src/test/java/org/apache/solr/handler/dataimport/TestXPathRecordReader.java

//BODY in fact is not allowed, and should throw an Exception. Or at
least some kind of error message. Perhaps there is one in the logs?


On Wed, Mar 17, 2010 at 2:45 PM, Neil Chaudhuri
 wrote:
> Incidentally, I tried adding this:
>
> 
> 
>         dataField="d.text" forEach="/MESSAGE">
>                  
>        
> 
>
> But this didn't seem to change anything.
>
> Any insight is appreciated.
>
> Thanks.
>
>
>
> From: Neil Chaudhuri
> Sent: Wednesday, March 17, 2010 3:24 PM
> To: solr-user@lucene.apache.org
> Subject: XPath Processing Applied to Clob
>
> I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
> and the text of a document. This is an Oracle database, and the document is 
> an XML document stored as Oracle's xmltype data type. Since this is nothing 
> more than a fancy CLOB, I am using the ClobTransformer to extract the actual 
> XML. However, I don't want to index/store all the XML but instead just the 
> XML within a set of tags. The XPath itself is trivial, but it seems like the 
> XPathEntityProcessor only works for XML file content rather than the output 
> of a Transformer.
>
> Here is what I currently have that fails:
>
>
> 
>
>        
>
>            
>
>            
>
>            
>             forEach="/MESSAGE" url="${doc.text}">
>                
>
>            
>
>        
>
> 
>
>
> Is there an easy way to do this without writing my own custom transformer?
>
> Thanks.
>



-- 
Lance Norskog
goks...@gmail.com


RE: XPath Processing Applied to Clob

2010-03-17 Thread Neil Chaudhuri
Incidentally, I tried adding this:




  



But this didn't seem to change anything.

Any insight is appreciated.

Thanks.



From: Neil Chaudhuri
Sent: Wednesday, March 17, 2010 3:24 PM
To: solr-user@lucene.apache.org
Subject: XPath Processing Applied to Clob

I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
and the text of a document. This is an Oracle database, and the document is an 
XML document stored as Oracle's xmltype data type. Since this is nothing more 
than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. 
However, I don't want to index/store all the XML but instead just the XML 
within a set of tags. The XPath itself is trivial, but it seems like the 
XPathEntityProcessor only works for XML file content rather than the output of 
a Transformer.

Here is what I currently have that fails:





















Is there an easy way to do this without writing my own custom transformer?

Thanks.