Hi Alessandro,

The Solr metadata mapping field is described thoroughly in the end-user
documentation:

"

When you configure a job to use a Solr-type output connection, the Solr
connection type provides a tab called "Field Mapping". The purpose of this
tab is to allow you to map metadata fields as fetched by the job's
connection type to fields that Solr is set up to receive. This is necessary
because the names of the metadata items are often determined by the
repository, with no alignment to fields defined in the Solr schema. You may
also suppress specific metadata items from being sent to the index using
this tab. The tab looks like this:


 [image: Solr Specification, Field Mapping tab]


Add a new mapping by filling in the "source" with the name of the metadata
item from the repository, and "target" as the name of the output field in
Solr, and click the "Add" button. Leaving the "target" field blank will
result in all metadata items of that name not being sent to Solr."
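
As a rough illustration of the behavior quoted above, here is a minimal Java
sketch. It is not the connector's actual code: the class FieldMappingSketch,
the method applyMapping, and the keepUnmapped flag are hypothetical names made
up for this example. The handling of unmapped items is an assumption: with
keepUnmapped = true they pass through under their original names, which matches
what was reported earlier in this thread, and with keepUnmapped = false the
mapping also acts as a filter, as was proposed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names come from ManifoldCF.
final class FieldMappingSketch {

  /**
   * Applies a source-to-target field mapping to one document's metadata.
   * - source mapped to a non-blank target: values are sent under the target name
   * - source mapped to a blank target: the item is suppressed
   * - source not mapped: kept under its original name only if keepUnmapped is true
   */
  static Map<String, List<String>> applyMapping(
      Map<String, List<String>> metadata,
      Map<String, String> mapping,
      boolean keepUnmapped) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
      String source = entry.getKey();
      String outputName;
      if (mapping.containsKey(source)) {
        String target = mapping.get(source);
        if (target == null || target.trim().isEmpty()) {
          continue; // blank target: do not send this item
        }
        outputName = target;
      } else if (keepUnmapped) {
        outputName = source; // pass through unchanged
      } else {
        continue; // filter mode: drop anything without a mapping
      }
      // Merge values if several sources end up under the same output name.
      result.computeIfAbsent(outputName, k -> new ArrayList<>())
            .addAll(entry.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> metadata = new LinkedHashMap<>();
    metadata.put("dc:title", Arrays.asList("Report"));
    metadata.put("author", Arrays.asList("Ann"));
    metadata.put("internalId", Arrays.asList("42"));

    Map<String, String> mapping = new LinkedHashMap<>();
    mapping.put("dc:title", "title"); // rename
    mapping.put("internalId", "");    // suppress

    System.out.println(applyMapping(metadata, mapping, true));
    System.out.println(applyMapping(metadata, mapping, false));
  }
}
```

In this sketch the only difference between the two calls is whether "author",
which has no mapping, is still included in the output.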

Karl


On Fri, Dec 13, 2013 at 9:54 AM, Alessandro Benedetti <
[email protected]> wrote:

> But we were talking about the output connector, right?
> Maybe I want the repository connector to extract those metadata fields, and
> those metadata will be used differently by different output connectors
> (for example, 2 different jobs with different Solr mappings).
>
> Sorry if I repeat the question, but:
> What is the meaning of the Solr field mapping in a ManifoldCF job (one that
> uses a Solr connector)?
> If the meaning is to index only those fields in Solr, then there is that
> little bug :)
>
>
>
> 2013/12/13 Karl Wright <[email protected]>
>
> > Hi Alessandro,
> >
> > Usually the repository connector also specifies what metadata to include.
> > What connector are you crawling with?
> >
> > Karl
> >
> >
> >
> > On Fri, Dec 13, 2013 at 9:06 AM, Alessandro Benedetti <
> > [email protected]> wrote:
> >
> > > Actually, it can be a problem.
> > > For example, suppose your Solr is running in an application server with
> > > a limit on the HTTP request header.
> > > The server will then refuse all requests that exceed that limit.
> > >
> > > We are interested in only 3 metadata fields, but ManifoldCF extracts n
> > > (n >> 3) for each document.
> > > We can configure the mapping to map those 3 metadata fields.
> > > But the POST request is built with all the metadata from the document,
> > > so it exceeds the request header limit and the document is rejected
> > > with no apparent reason.
> > >
> > > So if the purpose of the Solr field mapping in a job with a Solr
> > > connector is to index only those fields, then the current behaviour is
> > > a bug, for the reason I explained above.
> > >
> > > Cheers
> > >
> > >
> > > 2013/12/13 Karl Wright <[email protected]>
> > >
> > > > Hi Alessandro,
> > > >
> > > > Thank you for the clarification.
> > > > If you believe it would be helpful to filter metadata, by all means
> > > > open a ticket and attach a patch.  But I don't exactly see where
> > > > there would be an issue, since metadata that is posted that is not in
> > > > the Solr schema is simply going to be discarded.
> > > >
> > > > Karl
> > > >
> > > >
> > > >
> > > > On Fri, Dec 13, 2013 at 7:56 AM, Alessandro Benedetti <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi Karl,
> > > > > I'm not referring to filtering documents.
> > > > > I'm referring to filtering the metadata associated with a document
> > > > > (which will be mapped to Solr fields by the Solr connector).
> > > > > Right now, in the job's metadata mapping screen, you can select a
> > > > > subset of metadata to be mapped to Solr fields, but then all the
> > > > > metadata associated with the document are sent to Solr (in the way I
> > > > > described in my email).
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > 2013/12/13 Karl Wright <[email protected]>
> > > > >
> > > > > > Hi Alessandro,
> > > > > >
> > > > > > I'm not entirely sure I understand your use case, but so far in
> > > > > > ManifoldCF nobody has requested that an output connector perform
> > > > > > document filtering, other than to reject documents by responding
> > > > > > with "DOCUMENT_REJECTED".
> > > > > > Usually document filtering is part of the repository connector's
> > > > > > functionality, since filtering is most effective when it is
> > > > > > described in terms of the individual repository's constructs.  At
> > > > > > the repository connector level, you can describe an appropriate set
> > > > > > of documents to include, rather than crawling everything and
> > > > > > rejecting the ones you don't want.  This description is called the
> > > > > > "Document Specification".  When you create and edit a job in the
> > > > > > Crawler UI some of the job's tabs modify that specification, and
> > > > > > the repository connector code understands the specification and
> > > > > > limits the documents being crawled using it.
> > > > > >
> > > > > > On the output side, e.g. in the Solr output connector, it's already
> > > > > > too late to restrict which documents are crawled.  The best you can
> > > > > > do is just to not send them to the index, or explicitly reject
> > > > > > them.  This makes any feature to filter documents in an output
> > > > > > connector of limited utility, compared with doing the same thing in
> > > > > > the Document Specification.
> > > > > >
> > > > > > Hope this helps,
> > > > > > Karl
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Dec 13, 2013 at 7:12 AM, Alessandro Benedetti <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > Hi guys,
> > > > > > > I have one question for you.
> > > > > > > Looking at the details of the Solr connector, you can see the
> > > > > > > following:
> > > > > > >
> > > > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> > > > > > >
> > > > > > >   writeField(out,LITERAL+newFieldName,values);
> > > > > > >   // Write the commitWithin parameter
> > > > > > >   if (commitWithin != null)
> > > > > > >     writeField(out,COMMITWITHIN_METADATA,commitWithin);
> > > > > > >   contentStreamUpdateRequest.setParams(out);
> > > > > > >   contentStreamUpdateRequest.addContentStream(
> > > > > > >     new RepositoryDocumentStream(is,length,contentType,contentName));
> > > > > > >
> > > > > > > In a job using a Solr connector, you can define a metadata
> > > > > > > mapping that maps specific metadata items to Solr field names.
> > > > > > > But if you define only 3 mappings, all the metadata in the
> > > > > > > ManifoldCF document are still sent as params of the
> > > > > > > contentStreamUpdateRequest, and the mapping is used only to rename
> > > > > > > the fields we want to rename.
> > > > > > >
> > > > > > > In my opinion, the mapping should be used as a filter as well,
> > > > > > > because if the user selects only 3 metadata items, they want to
> > > > > > > see only those metadata items.
> > > > > > > There should probably be at least a flag that lets the user
> > > > > > > choose whether or not to filter the metadata sent to Solr.
> > > > > > > It is a small change that would cover many use cases where the
> > > > > > > user is interested in only a subset of the metadata and does not
> > > > > > > need to send everything in the header of the HTTP POST.
> > > > > > > I'm pretty new to ManifoldCF, so let me know if this feature is
> > > > > > > already there and I misunderstood something.
> > > > > > >
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > --------------------------
> > > > > > >
> > > > > > > Benedetti Alessandro
> > > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > > > >
> > > > > > > "Tyger, tyger burning bright
> > > > > > > In the forests of the night,
> > > > > > > What immortal hand or eye
> > > > > > > Could frame thy fearful symmetry?"
> > > > > > >
> > > > > > > William Blake - Songs of Experience -1794 England
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --------------------------
> > > > >
> > > > > Benedetti Alessandro
> > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > >
> > > > > "Tyger, tyger burning bright
> > > > > In the forests of the night,
> > > > > What immortal hand or eye
> > > > > Could frame thy fearful symmetry?"
> > > > >
> > > > > William Blake - Songs of Experience -1794 England
> > > > >
> > > >
> > >
> >
>
