Re: DIH transformers - sect 2 - SOLR-1033
I have created SOLR-1033 in JIRA to address this issue.

At 13:32 + 21/2/09, Fergus McMenemie wrote:
> > On Mon, Feb 16, 2009 at 3:22 PM, Fergus McMenemie fer...@twig.me.uk wrote:
> > > 2) Having used TemplateTransformer to assign a value to an entity column, that column cannot be used in other TemplateTransformer operations. In my project I am attempting to reuse x.fileWebPath.
> > >
> > > To fix this, the last line of transformRow() in TemplateTransformer.java needs to be replaced with the following, which, as well as 'putting' the templated string in 'row', also saves it into the 'resolver'.
> > >
> > > **originally**
> > >         row.put(column, resolver.replaceTokens(expr));
> > >     }
> > >
> > > **new**
> > >         String columnName = map.get(DataImporter.COLUMN);
> > >         expr = resolver.replaceTokens(expr);
> > >         row.put(columnName, expr);
> > >         resolverMapCopy.put(columnName, expr);
> > >     }
> >
> > Isn't it better to write a custom transformer to achieve this? I did not want a standard component to change the state of the VariableResolver. I am not sure what is the best way.
>
> Noble,
>
> (Good to have email working :-)
>
> Hmm, not sure why this requires a custom transformer. Why is this not more in the nature of a bug fix?
>
> Also, the current behavior temporarily adds all the column names into the resolver for the duration of the TemplateTransformer's operation, removing them again at the end. I do not think there is any permanent change to the state of the VariableResolver. Surely if we have defined a value for a column, that value should be temporarily available in subsequent template or regexp operations?
>
> Fergus.
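The effect of the proposed change can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not the real DIH code: the class TemplateWriteBackSketch, its methods, and the sample values (the path /content/a/rec1.xml, a vurl of "pic42", and a vdkvgwkey template that reuses ${x.fileAbsolutePath}) are all hypothetical. It only shows what happens when each templated value is also written back into the per-row variable map, so that later templates can refer to it as ${entity.column}.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, self-contained model of the proposed TemplateTransformer change.
// It does not use the real DIH classes; all names here are illustrative only.
final class TemplateWriteBackSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with its value from vars; unknown names are left untouched.
    static String replaceTokens(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value != null ? value : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Apply the templates in declaration order. With writeBack, each result is also
    // exposed to later templates as ${entity.column}, which is what the patch proposes.
    static Map<String, String> transformRow(String entity, Map<String, String> extracted,
            LinkedHashMap<String, String> templates, Map<String, String> outerVars,
            boolean writeBack) {
        Map<String, String> row = new LinkedHashMap<>(extracted);
        // Per-row copy of the variables, discarded afterwards, so no permanent state change.
        Map<String, String> vars = new HashMap<>(outerVars);
        for (Map.Entry<String, String> c : extracted.entrySet()) {
            vars.put(entity + "." + c.getKey(), c.getValue());
        }
        for (Map.Entry<String, String> t : templates.entrySet()) {
            String value = replaceTokens(t.getValue(), vars);
            row.put(t.getKey(), value);
            if (writeBack) {
                vars.put(entity + "." + t.getKey(), value);
            }
        }
        return row;
    }

    // Illustrative row: vdkvgwkey reuses the templated column x.fileAbsolutePath.
    static String demo(boolean writeBack) {
        Map<String, String> extracted = new HashMap<>();
        extracted.put("vurl", "pic42");
        Map<String, String> outer = new HashMap<>();
        outer.put("jc.fileAbsolutePath", "/content/a/rec1.xml");
        LinkedHashMap<String, String> templates = new LinkedHashMap<>();
        templates.put("fileAbsolutePath", "${jc.fileAbsolutePath}");
        templates.put("vdkvgwkey", "${x.fileAbsolutePath}#${x.vurl}");
        return transformRow("x", extracted, templates, outer, writeBack).get("vdkvgwkey");
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // /content/a/rec1.xml#pic42
        System.out.println(demo(false)); // ${x.fileAbsolutePath}#pic42
    }
}
```

In this model an unresolved token is simply left in place; the real DIH resolver may treat it differently. Because the variables live in a per-row copy, nothing leaks into the outer map between rows.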
<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="jc" processor="FileListEntityProcessor" fileName="^.*\.xml$"
            newerThan="'NOW-1000DAYS'" recursive="true" rootEntity="false"
            dataSource="null" baseDir="/Volumes/spare/ts/solr/content">
      <entity name="x" dataSource="myfilereader" processor="XPathEntityProcessor"
              url="${jc.fileAbsolutePath}" rootEntity="true" stream="false"
              forEach="/record | /record/mediaBlock"
              transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
        <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
        <field column="fileWebPath" regex="${x.test}(.*)" replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
        <field column="title" xpath="/record/title" />
        <field column="para1" name="para" xpath="/record/sect1/para" />
        <field column="para2" name="para" xpath="/record/list/listitem/para" />
        <field column="pubdate" xpath="/record/metadata/da...@qualifier='pubDate']" dateTimeFormat="MMdd" />
        <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />
        <field column="imgSrcArticle" template="${dataimporter.request.fordinstalldir}" />
        <field column="imgCpation" xpath="/record/mediaBlock/caption" />
        <field column="test" template="${dataimporter.request.contentinstalldir}" />
        <!-- **problem is that vurl is just a fragment of the info needed to access the picture. -->
        <field column="imgWebPathICON" regex="(.*)/.*" replaceWith="$1/imagery/${x.vurl}s.jpg" sourceColName="fileWebPath"/>
        <field column="imgWebPathFULL" regex="(.*)/.*" replaceWith="$1/imagery/${x.vurl}.jpg" sourceColName="fileWebPath"/>
        <field column="vdkvgwkey" template="${jc.fileAbsolutePath}#${x.vurl}" />
      </entity>
    </entity>
  </document>
</dataConfig>

--
===
Fergus McMenemie      Email:fer...@twig.me.uk
Techmore Ltd          Phone:(UK) 07721 376021
Unix/Mac/Intranets    Analyst Programmer
===
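The imgWebPathICON and imgWebPathFULL fields in this config derive their values from fileWebPath, which is itself produced by a regex over fileAbsolutePath. The Java sketch below evaluates that chain by hand. It assumes (this thread does not confirm it) that ${...} variables are substituted before the regex runs and that the replacement behaves like java.util.regex.Matcher.replaceAll; the class RegexChainSketch and the sample path and vurl values are hypothetical. Unlike the config, it wraps the ${x.test} value in Pattern.quote so characters such as '.' in the path are matched literally rather than as regex metacharacters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand evaluation of the RegexTransformer fields in the config.
// Assumes ${...} variables are substituted first and the replacement then works like
// Matcher.replaceAll; this is a model, not the DIH implementation.
final class RegexChainSketch {

    static String regexReplace(String source, String regex, String replaceWith) {
        return Pattern.compile(regex).matcher(source).replaceAll(replaceWith);
    }

    // fileWebPath: regex="${x.test}(.*)" replaceWith="/ford$1" sourceColName="fileAbsolutePath".
    // Pattern.quote (not in the config) makes the substituted path match literally.
    static String fileWebPath(String fileAbsolutePath, String test) {
        return regexReplace(fileAbsolutePath, Pattern.quote(test) + "(.*)", "/ford$1");
    }

    // imgWebPathICON / imgWebPathFULL: regex="(.*)/.*" replaceWith="$1/imagery/${x.vurl}s.jpg"
    // (or ".jpg"). The greedy (.*) captures everything before the last '/'.
    static String imgWebPath(String fileWebPath, String vurl, String suffix) {
        String replaceWith = "$1/imagery/" + Matcher.quoteReplacement(vurl) + suffix + ".jpg";
        return regexReplace(fileWebPath, "(.*)/.*", replaceWith);
    }

    public static void main(String[] args) {
        String web = fileWebPath("/Volumes/spare/ts/solr/content/2009/feb/rec1.xml",
                                 "/Volumes/spare/ts/solr/content");
        System.out.println(web);                           // /ford/2009/feb/rec1.xml
        System.out.println(imgWebPath(web, "pic42", "s")); // /ford/2009/feb/imagery/pic42s.jpg
        System.out.println(imgWebPath(web, "pic42", ""));  // /ford/2009/feb/imagery/pic42.jpg
    }
}
```

The image paths can only be built once fileWebPath exists for the row, so the regex fields depend on their declaration order as much as the template fields do.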
Re: DIH transformers - sect 2
> On Mon, Feb 16, 2009 at 3:22 PM, Fergus McMenemie fer...@twig.me.uk wrote:
> > 2) Having used TemplateTransformer to assign a value to an entity column, that column cannot be used in other TemplateTransformer operations. In my project I am attempting to reuse x.fileWebPath.
> >
> > To fix this, the last line of transformRow() in TemplateTransformer.java needs to be replaced with the following, which, as well as 'putting' the templated string in 'row', also saves it into the 'resolver'.
> >
> > **originally**
> >         row.put(column, resolver.replaceTokens(expr));
> >     }
> >
> > **new**
> >         String columnName = map.get(DataImporter.COLUMN);
> >         expr = resolver.replaceTokens(expr);
> >         row.put(columnName, expr);
> >         resolverMapCopy.put(columnName, expr);
> >     }
>
> Isn't it better to write a custom transformer to achieve this? I did not want a standard component to change the state of the VariableResolver. I am not sure what is the best way.

Noble,

(Good to have email working :-)

Hmm, not sure why this requires a custom transformer. Why is this not more in the nature of a bug fix?

Also, the current behavior temporarily adds all the column names into the resolver for the duration of the TemplateTransformer's operation, removing them again at the end. I do not think there is any permanent change to the state of the VariableResolver. Surely if we have defined a value for a column, that value should be temporarily available in subsequent template or regexp operations?

Fergus.
DIH transformers
Hello.

I have been beating my head against the data-config.xml listed at the end of this message. It breaks in a few different ways.

1) I have bodged TemplateTransformer to allow it to return when one of the variables is undefined. This ensures my uniqueKey is always defined. But thinking more on Noble's comments, there is use in having it work both ways, i.e. leaving the column undefined or replacing the variable with "". I still like my idea about using the default value of a Solr field from schema.xml, but I can't figure out how/where best to implement it.

2) Having used TemplateTransformer to assign a value to an entity column, that column cannot be used in other TemplateTransformer operations. In my project I am attempting to reuse x.fileWebPath. To fix this, the last line of transformRow() in TemplateTransformer.java needs to be replaced with the following, which, as well as 'putting' the templated string in 'row', also saves it into the 'resolver'.

**originally**
        row.put(column, resolver.replaceTokens(expr));
    }

**new**
        String columnName = map.get(DataImporter.COLUMN);
        expr = resolver.replaceTokens(expr);
        row.put(columnName, expr);
        resolverMapCopy.put(columnName, expr);
    }

As an aside, I think I ran into the issues covered by SOLR-993. It took a while to figure out that I could not add a single column name/value pair to the resolver; instead I had to add to the map that was already stored within the resolver.

3) No entity column names can be used within RegexTransformer. I guess all the stuff that was added to TemplateTransformer to allow column names to be used in templates needs to be re-added to RegexTransformer. I am doing that now, but am confused by the fragment of code which copies from resolverMap into resolverMapCopy. As best I can see, resolverMap is always empty, but I am barely able to follow the code! Can somebody explain when/why resolverMap would be populated?
Also, I begin to understand the comments made by Noble in SOLR-1001 about resolving entity attributes in ContextImpl.getEntityAttribute, and I guess Shalin was right as well. However, it also seems wrong that at the top of every transformer we are going to repeat the same code to load the resolver with information about the entity.

4) Since I am reusing template output within other templates, the order of execution becomes important. Can I assume that the explicitly listed columns in an entity are processed by the various transformers in the order they appear within data-config.xml? I *think* that the list of columns within an entity, as returned by getAllEntityFields(), is actually an ArrayList, which I think is order-dependent. Is this correct?

5) Should I raise this as a single JIRA issue?

6) Having played with this stuff, I was going to add a bit more to the wiki highlighting some of the possibilities and issues with transformers, but wanted to check with the list first!

<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="jc" processor="FileListEntityProcessor" fileName="^.*\.xml$"
            newerThan="'NOW-1000DAYS'" recursive="true" rootEntity="false"
            dataSource="null" baseDir="/Volumes/spare/ts/solr/content">
      <entity name="x" dataSource="myfilereader" processor="XPathEntityProcessor"
              url="${jc.fileAbsolutePath}" rootEntity="true" stream="false"
              forEach="/record | /record/mediaBlock"
              transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">
        <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
        <field column="fileWebPath" regex="${x.test}(.*)" replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
        <field column="title" xpath="/record/title" />
        <field column="para1" name="para" xpath="/record/sect1/para" />
        <field column="para2" name="para" xpath="/record/list/listitem/para" />
        <field column="pubdate" xpath="/record/metadata/da...@qualifier='pubDate']" dateTimeFormat="MMdd" />
        <field column="vurl" xpath="/record/mediaBlock/mediaObject/@vurl" />
        <field column="imgSrcArticle" template="${dataimporter.request.fordinstalldir}" />
        <field column="imgCpation" xpath="/record/mediaBlock/caption" />
        <field column="test" template="${dataimporter.request.contentinstalldir}" />
        <!-- **problem is that vurl is just a fragment of the info needed to access the picture. -->
        <field column="imgWebPathICON" regex="(.*)/.*" replaceWith="$1/imagery/${x.vurl}s.jpg" sourceColName="fileWebPath"/>
        <field column="imgWebPathFULL" regex="(.*)/.*"
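Item 1 weighs two behaviours for a template whose variable is undefined for the current row: leave the column undefined, or substitute an empty string. The Java sketch below models both policies. It is not the TemplateTransformer code; MissingVariableSketch, the Policy enum, and the sample values are hypothetical, and the sample template simply mirrors the vdkvgwkey field (${jc.fileAbsolutePath}#${x.vurl}).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the two behaviours discussed in item 1; not DIH code.
final class MissingVariableSketch {
    enum Policy { SKIP_COLUMN, EMPTY_STRING }

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Returns the resolved string, or null when the column should be left undefined.
    static String resolve(String template, Map<String, String> vars, Policy policy) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                if (policy == Policy.SKIP_COLUMN) {
                    return null; // avoid emitting a partially resolved string
                }
                value = "";      // EMPTY_STRING: the variable simply disappears
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("jc.fileAbsolutePath", "/content/a/rec1.xml");
        String key = "${jc.fileAbsolutePath}#${x.vurl}"; // x.vurl is missing for this row
        System.out.println(resolve(key, vars, Policy.SKIP_COLUMN));  // null
        System.out.println(resolve(key, vars, Policy.EMPTY_STRING)); // /content/a/rec1.xml#
    }
}
```

Returning null here stands in for "do not put the column into the row"; Solr can only fill in a value afterwards if schema.xml declares a default for that field.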
Re: DIH transformers
On Mon, Feb 16, 2009 at 3:22 PM, Fergus McMenemie fer...@twig.me.uk wrote:
> Hello.
>
> I have been beating my head against the data-config.xml listed at the end of this message. It breaks in a few different ways.
>
> 1) I have bodged TemplateTransformer to allow it to return when one of the variables is undefined. This ensures my uniqueKey is always defined. But thinking more on Noble's comments, there is use in having it work both ways, i.e. leaving the column undefined or replacing the variable with "". I still like my idea about using the default value of a Solr field from schema.xml, but I can't figure out how/where best to implement it.

When a value is missing from the template, we may end up constructing a partial string, which may not be desired. If we leave it out as empty, then Solr will automatically put in the default value, and that should solve it. Just in case you wish to know the default value in schema.xml, you can get it from the API:

    List<Map<String, String>> fields = context.getAllEntityFields();
    String defval = fields.get(0).get("defaultvalue");

> 2) Having used TemplateTransformer to assign a value to an entity column, that column cannot be used in other TemplateTransformer operations. In my project I am attempting to reuse x.fileWebPath. To fix this, the last line of transformRow() in TemplateTransformer.java needs to be replaced with the following, which, as well as 'putting' the templated string in 'row', also saves it into the 'resolver'.
>
> **originally**
>         row.put(column, resolver.replaceTokens(expr));
>     }
>
> **new**
>         String columnName = map.get(DataImporter.COLUMN);
>         expr = resolver.replaceTokens(expr);
>         row.put(columnName, expr);
>         resolverMapCopy.put(columnName, expr);
>     }

Isn't it better to write a custom transformer to achieve this? I did not want a standard component to change the state of the VariableResolver. I am not sure what is the best way.

> As an aside, I think I ran into the issues covered by SOLR-993. It took a while to figure out that I could not add a single column name/value pair to the resolver; instead I had to add to the map that was already stored within the resolver.
>
> 3) No entity column names can be used within RegexTransformer. I guess all the stuff that was added to TemplateTransformer to allow column names to be used in templates needs to be re-added to RegexTransformer. I am doing that now, but am confused by the fragment of code which copies from resolverMap into resolverMapCopy. As best I can see, resolverMap is always empty, but I am barely able to follow the code! Can somebody explain when/why resolverMap would be populated?

The behavior is like this: the expression ${currentEntity.colName} does not work automatically, because the row is not added to the VariableResolver. TemplateTransformer has hacked around this to make it work. We can think about modifying this behavior.

> Also, I begin to understand the comments made by Noble in SOLR-1001 about resolving entity attributes in ContextImpl.getEntityAttribute, and I guess Shalin was right as well. However, it also seems wrong that at the top of every transformer we are going to repeat the same code to load the resolver with information about the entity.
>
> 4) Since I am reusing template output within other templates, the order of execution becomes important. Can I assume that the explicitly listed columns in an entity are processed by the various transformers in the order they appear within data-config.xml? I *think* that the list of columns within an entity, as returned by getAllEntityFields(), is actually an ArrayList, which I think is order-dependent. Is this correct?

It is correct.

> 5) Should I raise this as a single JIRA issue?

Do not add ONE issue for all. If they are logically connected, put all of them into one. If not, split them into as many issues as possible.

> 6) Having played with this stuff, I was going to add a bit more to the wiki highlighting some of the possibilities and issues with transformers, but wanted to check with the list first!
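Since the field list is an ordered List, a field can only reuse the output of fields declared before it. The Java sketch below is a hypothetical model of that rule: FieldOrderSketch, its field() helper, and the "key" column are illustrative names, and the attribute maps only loosely mirror the List<Map<String, String>> shape returned by getAllEntityFields(). Fields are processed in list order and each result is exposed to later fields as ${entity.column}.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model: fields are an ordered List of attribute maps, processed in order,
// and each result is exposed to later fields as ${entity.column}. Not DIH code.
final class FieldOrderSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static Map<String, String> field(String column, String template) {
        Map<String, String> f = new HashMap<>();
        f.put("column", column);
        f.put("template", template);
        return f;
    }

    static Map<String, String> run(String entity, List<Map<String, String>> fields,
                                   Map<String, String> outerVars) {
        Map<String, String> vars = new HashMap<>(outerVars);
        Map<String, String> row = new LinkedHashMap<>();
        for (Map<String, String> f : fields) { // iterates in declaration (insertion) order
            Matcher m = VAR.matcher(f.get("template"));
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String v = vars.get(m.group(1));
                // Unknown variables are left in place so the failure is visible.
                m.appendReplacement(out, Matcher.quoteReplacement(v != null ? v : m.group(0)));
            }
            m.appendTail(out);
            row.put(f.get("column"), out.toString());
            vars.put(entity + "." + f.get("column"), out.toString());
        }
        return row;
    }

    // Builds the same two fields in either order and returns the value of "key".
    static String keyFor(boolean sourceDeclaredFirst) {
        Map<String, String> outer = new HashMap<>();
        outer.put("jc.fileAbsolutePath", "/content/a/rec1.xml");
        Map<String, String> source = field("fileAbsolutePath", "${jc.fileAbsolutePath}");
        Map<String, String> key = field("key", "${x.fileAbsolutePath}#1");
        List<Map<String, String>> fields = new ArrayList<>();
        if (sourceDeclaredFirst) {
            fields.add(source);
            fields.add(key);
        } else {
            fields.add(key);
            fields.add(source);
        }
        return run("x", fields, outer).get("key");
    }

    public static void main(String[] args) {
        System.out.println(keyFor(true));  // /content/a/rec1.xml#1
        System.out.println(keyFor(false)); // ${x.fileAbsolutePath}#1
    }
}
```

The same two definitions give a resolved key only when the source column is declared first, which is why the position of each field in data-config.xml matters once templates reuse each other's output.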