If this is not working as detailed in the documentation, please do open a
ticket and we'll look at it.
Karl



On Fri, Dec 13, 2013 at 10:13 AM, Karl Wright <[email protected]> wrote:

> Hi Alessandro,
>
> The Solr metadata mapping field is described thoroughly in the end user
> documentation:
>
> "
>
> When you configure a job to use a Solr-type output connection, the Solr
> connection type provides a tab called "Field Mapping". The purpose of this
> tab is to allow you to map metadata fields as fetched by the job's
> connection type to fields that Solr is set up to receive. This is necessary
> because the names of the metadata items are often determined by the
> repository, with no alignment to fields defined in the Solr schema. You may
> also suppress specific metadata items from being sent to the index using
> this tab. The tab looks like this:
>
>
>  [image: Solr Specification, Field Mapping tab]
>
>
> Add a new mapping by filling in the "source" with the name of the metadata
> item from the repository, and "target" as the name of the output field in
> Solr, and click the "Add" button. Leaving the "target" field blank will
> result in all metadata items of that name not being sent to Solr."
>
> Karl
>
>
> On Fri, Dec 13, 2013 at 9:54 AM, Alessandro Benedetti <
> [email protected]> wrote:
>
>> But we were talking about the output connector, right?
>> Maybe I want the repository connector to extract those metadata fields,
>> and those metadata will be used differently by different output
>> connectors (for example, 2 different jobs with different Solr mappings).
>>
>> Sorry if I repeat the question, but:
>> What is the meaning of the Solr field mapping in a ManifoldCF job (one
>> that uses a Solr connector)?
>> If the meaning is to index in Solr only those fields, then there is that
>> little bug :)
>>
>>
>>
>> 2013/12/13 Karl Wright <[email protected]>
>>
>> > Hi Alessandro,
>> >
>> > Usually the repository connector also specifies what metadata to
>> > include.
>> > What connector are you crawling with?
>> >
>> > Karl
>> >
>> >
>> >
>> > On Fri, Dec 13, 2013 at 9:06 AM, Alessandro Benedetti <
>> > [email protected]> wrote:
>> >
>> > > Actually, it can be a problem.
>> > > For example, your Solr may be running in an application server with a
>> > > limit on the HTTP request header size.
>> > > The server will then refuse all requests that exceed that limit.
>> > >
>> > > We are interested in only 3 metadata fields, but ManifoldCF extracts n
>> > > (n >> 3) for each document.
>> > > We can configure the mapping to map those 3 metadata fields.
>> > > But the POST request is built with all the metadata from the document,
>> > > so it exceeds the request header limit and the document is rejected
>> > > without a reason being given.
>> > >
>> > > So if the meaning of the Solr field mapping in a job with a Solr
>> > > connector is to index only those fields, then the current behaviour is
>> > > a bug, for the reason I explained above.
>> > >
>> > > Cheers
>> > >
>> > >
>> > > 2013/12/13 Karl Wright <[email protected]>
>> > >
>> > > > Hi Alessandro,
>> > > >
>> > > > Thank you for the clarification.
>> > > > If you believe it would be helpful to filter metadata, by all means
>> > > > open a ticket and attach a patch.  But I don't exactly see where
>> > > > there would be an issue, since metadata that is posted that is not in
>> > > > the Solr schema is simply going to be discarded.
>> > > >
>> > > > Karl
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Dec 13, 2013 at 7:56 AM, Alessandro Benedetti <
>> > > > [email protected]> wrote:
>> > > >
>> > > > > Hi Karl,
>> > > > > I'm not referring to filtering documents.
>> > > > > I'm referring to filtering the metadata associated with a document
>> > > > > (which will be mapped to Solr fields by the Solr connector).
>> > > > > Right now, in the job metadata mapping screen, you can select a
>> > > > > subset of metadata to be mapped to Solr fields, but then all the
>> > > > > metadata associated with the document is sent to Solr (in the way I
>> > > > > described in my e-mail).
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > >
>> > > > > 2013/12/13 Karl Wright <[email protected]>
>> > > > >
>> > > > > > Hi Alessandro,
>> > > > > >
>> > > > > > I'm not entirely sure I understand your use case, but so far in
>> > > > > > ManifoldCF nobody has requested that an output connector perform
>> > > > > > document filtering, other than to reject documents by responding
>> > > > > > with "DOCUMENT_REJECTED".
>> > > > > > Usually document filtering is part of the repository connector's
>> > > > > > functionality, since filtering is most effective when it is
>> > > > > > described in terms of the individual repository's constructs.  At
>> > > > > > the repository connector level, you can describe an appropriate
>> > > > > > set of documents to include, rather than crawling everything and
>> > > > > > rejecting the ones you don't want.  This description is called the
>> > > > > > "Document Specification".  When you create and edit a job in the
>> > > > > > Crawler UI, some of the job's tabs modify that specification, and
>> > > > > > the repository connector code understands the specification and
>> > > > > > limits the documents being crawled using it.
>> > > > > >
>> > > > > > On the output side, e.g. in the Solr output connector, it's
>> > > > > > already too late to restrict which documents are crawled.  The
>> > > > > > best you can do is just to not send them to the index, or
>> > > > > > explicitly reject them.  This makes any feature to filter
>> > > > > > documents in an output connector of limited utility, compared with
>> > > > > > doing the same thing in the Document Specification.
>> > > > > >
>> > > > > > Hope this helps,
>> > > > > > Karl
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Dec 13, 2013 at 7:12 AM, Alessandro Benedetti <
>> > > > > > [email protected]> wrote:
>> > > > > >
>> > > > > > > Hi guys,
>> > > > > > > I have one question for you.
>> > > > > > > Looking at the details of the Solr connector, it's possible to
>> > > > > > > see this:
>> > > > > > >
>> > > > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
>> > > > > > >
>> > > > > > >  writeField(out,LITERAL+newFieldName,values);
>> > > > > > >  // Write the commitWithin parameter
>> > > > > > >  if (commitWithin != null)
>> > > > > > >      writeField(out,COMMITWITHIN_METADATA,commitWithin);
>> > > > > > >  contentStreamUpdateRequest.setParams(out);
>> > > > > > >  contentStreamUpdateRequest.addContentStream(
>> > > > > > >      new RepositoryDocumentStream(is,length,contentType,contentName));
>> > > > > > >
>> > > > > > > In a job using a Solr connector, it's possible to define the
>> > > > > > > metadata mapping, mapping specific metadata to Solr field names.
>> > > > > > > But if you select only 3 mappings, what happens is that all the
>> > > > > > > metadata in the ManifoldCF document is sent as params of the
>> > > > > > > contentStreamUpdateRequest, and the mapping is used only to
>> > > > > > > rename the fields we want to rename.
>> > > > > > >
>> > > > > > > In my opinion the mapping should be used as a filter as well,
>> > > > > > > because if the user selects only 3 metadata fields, they want to
>> > > > > > > see only those fields.
>> > > > > > > There should probably be at least a flag that lets the user
>> > > > > > > choose whether to filter the metadata sent to Solr.
>> > > > > > > It is a small change that would cover many use cases where the
>> > > > > > > user is interested in only a subset of the metadata and does not
>> > > > > > > need to send everything in the header of the HTTP POST.
>> > > > > > > I'm pretty new to ManifoldCF, so let me know if this feature is
>> > > > > > > already there and I misunderstood something.
>> > > > > > >
>> > > > > > >
>> > > > > > > Cheers
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > --------------------------
>> > > > > > >
>> > > > > > > Benedetti Alessandro
>> > > > > > > Visiting card : http://about.me/alessandro_benedetti
>> > > > > > >
>> > > > > > > "Tyger, tyger burning bright
>> > > > > > > In the forests of the night,
>> > > > > > > What immortal hand or eye
>> > > > > > > Could frame thy fearful symmetry?"
>> > > > > > >
>> > > > > > > William Blake - Songs of Experience -1794 England
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > >
>> >
>>
>>
>>
>>
>
>
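For readers following the thread, the difference between the two behaviours discussed above can be sketched in a few lines of Java. This is an illustration only and not ManifoldCF code: the class name MetadataMappingSketch, the method applyMapping, and the keepUnmapped flag are all hypothetical. It assumes the semantics described in the thread: a mapped field is renamed, a mapping with a blank target suppresses the field, and unmapped fields are either passed through under their original names (the behaviour Alessandro reports) or dropped (the filtering he asks for).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not part of ManifoldCF.
class MetadataMappingSketch {

    /**
     * Builds the set of fields that would be sent to Solr.
     *
     * @param metadata     metadata as extracted by the repository connector
     * @param mapping      source name to target name; a blank target suppresses the field
     * @param keepUnmapped hypothetical flag: true passes unmapped fields through
     *                     under their original names, false drops them
     */
    static Map<String, List<String>> applyMapping(
            Map<String, List<String>> metadata,
            Map<String, String> mapping,
            boolean keepUnmapped) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String source = entry.getKey();
            if (mapping.containsKey(source)) {
                String target = mapping.get(source);
                if (target == null || target.trim().isEmpty()) {
                    continue; // blank target: the field is not sent at all
                }
                out.put(target, entry.getValue()); // mapped: send under the new name
            } else if (keepUnmapped) {
                out.put(source, entry.getValue()); // unmapped: pass through unchanged
            }
            // unmapped and keepUnmapped == false: the field is dropped
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", Arrays.asList("Report"));
        metadata.put("author", Arrays.asList("A. Writer"));
        metadata.put("internalId", Arrays.asList("42"));

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("title", "dc_title"); // rename
        mapping.put("internalId", "");    // suppress

        // Pass-through: renamed title plus the unmapped author field.
        System.out.println(applyMapping(metadata, mapping, true));
        // Filtering: only the explicitly mapped title field.
        System.out.println(applyMapping(metadata, mapping, false));
    }
}
```

With keepUnmapped set to false, the number of fields sent depends only on the mapping, not on how many metadata items the repository connector extracts, which is the property the request-size concern in the thread relies on.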
