Hi Alessandro,

Thank you for the clarification.
If you believe it would be helpful to filter metadata, by all means open a
ticket and attach a patch.  But I don't quite see where there would be an
issue, since posted metadata that is not in the Solr schema is simply going
to be discarded.

Karl
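
For illustration, the difference between the rename-only behavior described in the original message and the proposed filtering could be sketched in Java as below. This is a minimal sketch under stated assumptions, not the actual HttpPoster code: the class MetadataMappingSketch, the method buildFields, and the filterUnmapped flag are hypothetical names invented for this example, and the metadata is modeled as a plain map rather than a ManifoldCF document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from ManifoldCF.
public class MetadataMappingSketch {

    /**
     * Builds the field-name -> values map that would be posted to the index.
     *
     * @param metadata       metadata attached to the document (name -> values)
     * @param mapping        user-selected renames (metadata name -> target field name)
     * @param filterUnmapped hypothetical flag: when true, metadata without a mapping
     *                       entry is dropped; when false, it is passed through under
     *                       its original name (the rename-only behavior from the thread)
     */
    public static Map<String, List<String>> buildFields(
            Map<String, List<String>> metadata,
            Map<String, String> mapping,
            boolean filterUnmapped) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String sourceName = entry.getKey();
            String targetName = mapping.get(sourceName);
            if (targetName == null) {
                if (filterUnmapped) {
                    continue; // not selected in the mapping: skip it
                }
                targetName = sourceName; // no rename requested: keep the name
            }
            // Merge values if two source names map to the same target field.
            fields.computeIfAbsent(targetName, k -> new ArrayList<>())
                  .addAll(entry.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", Arrays.asList("Report"));
        metadata.put("author", Arrays.asList("A. Writer"));
        metadata.put("lastModified", Arrays.asList("2013-12-13"));

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("title", "dc_title");

        // Rename-only: all three fields are sent, "title" as "dc_title".
        System.out.println(buildFields(metadata, mapping, false));
        // Filtering: only the mapped field is sent.
        System.out.println(buildFields(metadata, mapping, true));
    }
}
```

With the flag off, every metadata entry is still sent and the mapping only renames; with it on, the mapping doubles as a whitelist. Keeping it as an opt-in flag would leave existing jobs unchanged.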



On Fri, Dec 13, 2013 at 7:56 AM, Alessandro Benedetti <
[email protected]> wrote:

> Hi Karl,
> I'm not referring to filtering documents.
> I'm referring to filtering the metadata associated with a document (which
> will be mapped to Solr fields by the Solr connector).
> Right now, in the job metadata mapping screen, you can select a subset of
> metadata to be mapped to Solr fields, but all the metadata associated
> with the document is still sent to Solr (in the way I described in my
> email).
>
> Cheers
>
>
> 2013/12/13 Karl Wright <[email protected]>
>
> > Hi Alessandro,
> >
> > I'm not entirely sure I understand your use case, but so far in
> > ManifoldCF nobody has requested that an output connector perform
> > document filtering, other than to reject documents by responding with
> > "DOCUMENT_REJECTED".
> > Usually document filtering is part of the repository connector's
> > functionality, since filtering is most effective when it is described in
> > terms of the individual repository's constructs.  At the repository
> > connector level, you can describe an appropriate set of documents to
> > include, rather than crawling everything and rejecting the ones you don't
> > want.  This description is called the "Document Specification".  When you
> > create and edit a job in the Crawler UI, some of the job's tabs modify
> > that specification, and the repository connector code understands the
> > specification and limits the documents being crawled using it.
> >
> > On the output side, e.g. in the Solr output connector, it's already too
> > late to restrict which documents are crawled.  The best you can do is
> > just to not send them to the index, or explicitly reject them.  This
> > makes any feature that filters documents in an output connector of
> > limited utility, compared with doing the same thing in the Document
> > Specification.
> >
> > Hope this helps,
> > Karl
> >
> >
> >
> >
> > On Fri, Dec 13, 2013 at 7:12 AM, Alessandro Benedetti <
> > [email protected]> wrote:
> >
> > > Hi guys,
> > > I have one question for you.
> > > Looking into the details of the Solr connector, you can see the
> > > following in:
> > >
> > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> > >
> > >  writeField(out,LITERAL+newFieldName,values);
> > >  // Write the commitWithin parameter
> > >  if (commitWithin != null)
> > >      writeField(out,COMMITWITHIN_METADATA,commitWithin);
> > >  contentStreamUpdateRequest.setParams(out);
> > >  contentStreamUpdateRequest.addContentStream(
> > >      new RepositoryDocumentStream(is,length,contentType,contentName));
> > >
> > > In a job using a Solr connector, you can define a metadata mapping
> > > that maps specific metadata to Solr field names.
> > > But if you select only 3 mappings, all the metadata in the ManifoldCF
> > > document is still sent as params of the contentStreamUpdateRequest, and
> > > the mapping is used only to rename the fields we want to rename.
> > >
> > > In my opinion the mapping should be used as a filter as well,
> > > because if the user selects only 3 metadata fields, they want to see
> > > only those fields.
> > > There should probably be at least a flag that lets the user choose
> > > whether or not to filter the metadata sent to Solr.
> > > It is a small change that would cover many use cases where the user is
> > > interested only in a subset of the metadata and does not need to send
> > > everything in the header of the HTTP POST.
> > > I'm pretty new to ManifoldCF, so let me know if this feature is already
> > > there and I misunderstood something.
> > >
> > >
> > > Cheers
> > >
> > >
> > >
> > >
> > > --
> > > --------------------------
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
>
>
