Please see my response interleaved below.

On Mon, Feb 27, 2012 at 9:53 AM, Matthew Parker <mpar...@apogeeintegration.com> wrote:
> I'm trying to push data into SOLR.
>
> Is there a way to transform the metadata coming in from different data
> sources like SharePoint, and the File Share, prior to posting it into SOLR?
>

In general, ManifoldCF does not have data transformation abilities. With Solr, we rely on Solr Cell, which is a pipeline built on Tika, to extract content from documents and to transform document metadata. ManifoldCF may gain more transformation support at some point, in order to support search engines that don't have a pipeline, but that is not currently available.

> For instance, documents have metadata specifying their file path. I need to
> transform that to a URL I can use within SOLR to retrieve that document
> through a servlet that I wrote.
>

The ManifoldCF model is that a connector creates a URL for each document that it indexes, using whatever makes sense for that particular repository to get you back to the document in question. So, for instance, Documentum documents will use URLs that point at Documentum's Webtop web application. That URL gets indexed into Solr as the "id" field.

It would be helpful to understand more precisely what you are trying to do. You could, for instance, modify your servlet to redirect to the ManifoldCF-generated URL.

> Also, based on specific metadata that I'm seeing in the documents, I might
> want to conditionally populate other fields in the SOLR index.
>

That sounds like a job for the Tika pipeline to me.

Thanks,
Karl

> ------------------------------
> This e-mail and any files transmitted with it may be proprietary. Please
> note that any views or opinions presented in this e-mail are solely those of
> the author and do not necessarily represent those of Apogee Integration.
>
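
P.S. To make the redirect suggestion above a little more concrete, here is a minimal sketch of the part of such a servlet that decides where to send the user. It assumes the servlet has already looked up the document in Solr and holds the value of its "id" field, and that this value is an http or https URL. The class name RedirectTarget and the method fromSolrId are hypothetical names made up for illustration; they are not part of ManifoldCF or Solr.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/**
 * Hypothetical helper (not part of ManifoldCF or Solr): validates the value
 * of a Solr "id" field before it is used as a redirect target.
 */
final class RedirectTarget {

    private RedirectTarget() {
        // Static helper only; not meant to be instantiated.
    }

    /**
     * Parses the "id" value and returns it as a URI.
     * Assumption: only http and https URLs are acceptable redirect targets.
     *
     * @throws IllegalArgumentException if the value is empty, is not a
     *         syntactically valid URI, or uses another scheme
     */
    static URI fromSolrId(String solrId) {
        if (solrId == null || solrId.trim().isEmpty()) {
            throw new IllegalArgumentException("Solr id is empty");
        }
        final URI uri;
        try {
            uri = new URI(solrId.trim());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Solr id is not a valid URI: " + solrId, e);
        }
        // Scheme comparison is case-insensitive, so normalize before checking.
        String scheme = uri.getScheme() == null
                ? ""
                : uri.getScheme().toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            throw new IllegalArgumentException("Unsupported scheme in Solr id: " + solrId);
        }
        return uri;
    }
}
```

Inside the servlet's doGet, the result could then be passed to HttpServletResponse.sendRedirect(...), for example response.sendRedirect(RedirectTarget.fromSolrId(id).toString()). Rejecting anything other than http/https is a design choice of this sketch, intended to keep the servlet from redirecting to arbitrary values that end up in the index.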