Re: [DIH] Multiple repeat XPath stmts

2012-06-13 Thread alesp
TNX. A lifesaver...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Multiple-repeat-XPath-stmts-tp499770p3989439.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [DIH] Multiple repeat XPath stmts

2009-09-14 Thread Grant Ingersoll

As I said, copying is not an option.  That will break everything else.

On Sep 14, 2009, at 1:07 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

The XPathRecordReader has a limit of one mapping per xpath. So copying is
the best solution.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: [DIH] Multiple repeat XPath stmts

2009-09-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
If you wish to use conditional copy, you can use a RegexTransformer:

<field column="guid" xpath="/rss/channel/guid"/>
<field column="id" regex=".*" sourceColName="guid"
       replaceWith="${entityname.guid}"/>

This means that if guid != null, 'id' will be set to guid.
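
A minimal sketch of how this conditional copy might be combined with a link fallback in a full data-config.xml. It is untested and rests on assumptions not stated in the thread: the feed URL is a placeholder, the entity name x, URLDataSource and the forEach expression are illustrative, and guid is read from /rss/channel/item/guid because RSS 2.0 puts guid inside item. The anchored pattern ^(.*)$ with replaceWith="$1" is a variation on the .* pattern above: in Java regex replace-all, .* can also match the empty string at the end of the value, so anchoring is the more cautious choice if RegexTransformer replaces every match.

```xml
<dataConfig>
  <!-- Assumption: the feeds are fetched over HTTP -->
  <dataSource type="URLDataSource" />
  <document>
    <!-- url is a placeholder; forEach assumes one Solr document per RSS item -->
    <entity name="x"
            processor="XPathEntityProcessor"
            url="http://example.com/feed.rss"
            forEach="/rss/channel/item"
            transformer="TemplateTransformer,RegexTransformer">
      <!-- Each xpath is mapped to exactly one column -->
      <field column="link" xpath="/rss/channel/item/link" />
      <field column="guid" xpath="/rss/channel/item/guid" />
      <!-- TemplateTransformer: default the id to the link -->
      <field column="id" template="${x.link}" />
      <!-- RegexTransformer: intended to copy guid over id only when guid has a value -->
      <field column="id" sourceColName="guid" regex="^(.*)$" replaceWith="$1" />
    </entity>
  </document>
</dataConfig>
```

If the transformers behave as described in this thread, items without a guid should keep the link as their id. Check this against your Solr version before relying on it.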


On Mon, Sep 14, 2009 at 4:16 PM, Grant Ingersoll gsing...@apache.org wrote:
 As I said, copying is not an option.  That will break everything else.


-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: [DIH] Multiple repeat XPath stmts

2009-09-13 Thread Fergus McMenemie
I'm trying to import several RSS feeds using DIH and running into a  
bit of a problem.  Some feeds define a GUID value that I map to my  
Solr ID, while others don't.  I also have a link field which I fill in  
with the RSS link field.  For the feeds that don't have the GUID value  
set, I want to use the link field as the id.  However, if I define the
same XPath twice but map it to two different columns, I don't get the id
value set.

For instance, I want to do:
schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="link" type="string" indexed="true" stored="false"/>

DIH config:
<field column="id" xpath="/rss/channel/item/link" />
<field column="link" xpath="/rss/channel/item/link" />

Because I am consolidating multiple fields, I'm not able to do  
copyFields, unless of course, I wanted to implement conditional copy  
fields (only copy if the field is not defined) which I would rather not.

How do I solve this?


How about:

<entity name="x" ... transformer="TemplateTransformer">
  <field column="link" xpath="/rss/channel/item/link" />
  <field column="GUID" xpath="/rss/channel/GUID" />
  <field column="id"   template="${x.link}" />
  <field column="id"   template="${x.GUID}" />
</entity>

The TemplateTransformer does nothing if its source expression is null.
So the first transform assigns the fallback value to ID; this is
overwritten by the GUID if it is defined.
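
A minimal sketch of how the fragment above might look as a complete data-config.xml. It is untested. The feed URL, the URLDataSource data source and the forEach expression are assumptions for illustration, and the lowercase item-level guid path is an assumption based on the RSS 2.0 layout rather than something stated in the thread.

```xml
<dataConfig>
  <!-- Assumption: the feeds are fetched over HTTP -->
  <dataSource type="URLDataSource" />
  <document>
    <!-- url is a placeholder; forEach assumes one Solr document per RSS item -->
    <entity name="x"
            processor="XPathEntityProcessor"
            url="http://example.com/feed.rss"
            forEach="/rss/channel/item"
            transformer="TemplateTransformer">
      <!-- Each xpath is mapped to exactly one column -->
      <field column="link" xpath="/rss/channel/item/link" />
      <!-- Assumption: in RSS 2.0 the guid element is a child of item -->
      <field column="guid" xpath="/rss/channel/item/guid" />
      <!-- Fallback first: id takes the value of link -->
      <field column="id" template="${x.link}" />
      <!-- Override second: intended to apply only when x.guid resolves to a value -->
      <field column="id" template="${x.guid}" />
    </entity>
  </document>
</dataConfig>
```

The order of the two id fields matters in this sketch: the link template comes first so that the guid template, when it resolves, is the last value written.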

You can now sort of do if-then-else using a combination of template
and regex transformers. Adding a bit of maths to the transformers and
I think we will have a Turing-complete language :-)

fergus.

Thanks,
Grant

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: [DIH] Multiple repeat XPath stmts

2009-09-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
The XPathRecordReader has a limit of one mapping per xpath. So copying is
the best solution.

-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com