Hi Karl,
I'm not referring to filtering documents.
I'm referring to filtering the metadata associated with a document (which the
Solr connector maps to Solr fields).
Currently, in the job's metadata mapping screen you can select a subset of
metadata to be mapped to Solr fields, but then all the metadata associated
with the document are sent to Solr anyway (in the way I described in my e-mail).
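A minimal Java sketch of the difference, under stated assumptions. The class MetadataMappingSketch, the method buildParams and the keepOnlyMapped flag are hypothetical names for illustration only; they are not ManifoldCF's API. The "literal." prefix assumes the metadata reaches Solr as Solr Cell literal.<field> parameters. With the flag off, mapped metadata is renamed and everything else is passed through under its original name, which is the current behaviour described above. With the flag on, unmapped metadata is dropped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not ManifoldCF code): turns a document's metadata into
 * Solr "literal.*" request parameters, either renaming mapped fields and
 * passing the rest through, or keeping only the mapped fields.
 */
public class MetadataMappingSketch {

  // Assumed prefix for Solr Cell literal field values.
  static final String LITERAL = "literal.";

  /**
   * @param metadata       all metadata attached to the document (name -> values)
   * @param mapping        job-level mapping from metadata name to Solr field name
   * @param keepOnlyMapped hypothetical flag: when true, unmapped metadata is dropped
   */
  static Map<String, List<String>> buildParams(
      Map<String, List<String>> metadata,
      Map<String, String> mapping,
      boolean keepOnlyMapped) {
    Map<String, List<String>> params = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
      String target = mapping.get(e.getKey());
      if (target == null) {
        if (keepOnlyMapped) {
          continue;             // proposed behaviour: skip unmapped metadata
        }
        target = e.getKey();    // rename-only behaviour: keep the original name
      }
      // Several metadata names may map to one field, so values are merged.
      params.computeIfAbsent(LITERAL + target, k -> new ArrayList<>())
            .addAll(e.getValue());
    }
    return params;
  }

  public static void main(String[] args) {
    Map<String, List<String>> metadata = new LinkedHashMap<>();
    metadata.put("title", List.of("Report"));
    metadata.put("author", List.of("Alice"));
    metadata.put("internal-id", List.of("42"));

    // Only one metadata field is selected in the (hypothetical) job mapping.
    Map<String, String> mapping = Map.of("title", "doc_title");

    // prints {literal.doc_title=[Report], literal.author=[Alice], literal.internal-id=[42]}
    System.out.println(buildParams(metadata, mapping, false));
    // prints {literal.doc_title=[Report]}
    System.out.println(buildParams(metadata, mapping, true));
  }
}
```

With keepOnlyMapped set to true, only the selected field reaches Solr, which is the filtering behaviour being asked for. The sketch needs Java 9+ for List.of and Map.of.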

Cheers


2013/12/13 Karl Wright <[email protected]>

> Hi Alessandro,
>
> I'm not entirely sure I understand your use case, but so far in ManifoldCF
> nobody has requested that an output connector perform document filtering,
> other than to reject documents by responding with "DOCUMENT_REJECTED".
> Usually document filtering is part of the repository connector's
> functionality, since filtering is most effective when it is described in
> terms of the individual repository's constructs.  At the repository
> connector level, you can describe an appropriate set of documents to
> include, rather than crawling everything and rejecting the ones you don't
> want.  This description is called the "Document Specification".  When you
> create and edit a job in the Crawler UI some of the job's tabs modify that
> specification, and the repository connector code understands the
> specification and limits the documents being crawled using it.
>
> On the output side, e.g. in the Solr output connector, it's already too
> late to restrict which documents are crawled.  The best you can do is just
> to not send them to the index, or explicitly reject them.  This means that
> any feature to filter documents in an output connector would be of
> limited utility, compared with doing the same thing in the Document
> Specification.
>
> Hope this helps,
> Karl
>
>
>
>
> On Fri, Dec 13, 2013 at 7:12 AM, Alessandro Benedetti <
> [email protected]> wrote:
>
> > Hi guys,
> > I have one question for you.
> > Looking at the details of the Solr connector, you can see the following in:
> >
> > org.apache.manifoldcf.agents.output.solr.HttpPoster
> >
> >   writeField(out,LITERAL+newFieldName,values);
> >   // Write the commitWithin parameter
> >   if (commitWithin != null)
> >     writeField(out,COMMITWITHIN_METADATA,commitWithin);
> >   contentStreamUpdateRequest.setParams(out);
> >   contentStreamUpdateRequest.addContentStream(
> >     new RepositoryDocumentStream(is,length,contentType,contentName));
> >
> > In a job using the Solr connector, you can define a metadata mapping that
> > maps specific metadata to Solr field names.
> > But if you select only 3 mappings, what actually happens is that all the
> > metadata in the ManifoldCF document are sent as parameters of the
> > ContentStreamUpdateRequest, and the mapping is used only to rename the
> > fields we want to rename.
> >
> > In my opinion the mapping should be used as a filter as well,
> > because if the user selects only 3 metadata fields, they want to see only
> > those fields.
> > At the very least there should probably be a flag that lets the user
> > choose whether or not to filter the metadata sent to Solr.
> > It is a small change that would cover a lot of use cases where the user is
> > interested only in a subset of the metadata and does not need to send
> > everything in the HTTP POST.
> > I'm pretty new to ManifoldCF, so let me know if this feature already
> > exists and I misunderstood something.
> >
> >
> > Cheers
> >
> >
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>

