But we were talking about the output connector, right?
Maybe I want the repository connector to extract all those metadata fields,
because different output connectors will use them differently (for example,
2 different Jobs with different Solr mappings).

Sorry if I repeat the question, but:
What is the meaning of the Solr field mapping in a ManifoldCF Job (one that
uses a Solr connector)?
If the meaning is to index only those fields in Solr, then there is that
little bug :)
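
To make the proposal concrete, here is a rough sketch of the behaviour I have in mind. It is not the actual HttpPoster code: the class name, the method name and the keepUnmapped flag are all hypothetical, and it only assumes that the document metadata can be seen as a map from field name to values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of ManifoldCF.
class MetadataFilterSketch {

    /**
     * Renames source fields according to the mapping.
     * With keepUnmapped = true, unmapped fields pass through unchanged
     * (the behaviour described in this thread).
     * With keepUnmapped = false, unmapped fields are dropped, so the
     * mapping also acts as a filter.
     */
    static Map<String, List<String>> mapFields(
            Map<String, List<String>> metadata,
            Map<String, String> mapping,
            boolean keepUnmapped) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String target = mapping.get(entry.getKey());
            if (target != null) {
                // Mapped field: send it under the Solr field name.
                result.put(target, entry.getValue());
            } else if (keepUnmapped) {
                // Unmapped field: only sent when filtering is disabled.
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", List.of("Annual report"));
        metadata.put("author", List.of("Jane Doe"));
        metadata.put("lastModified", List.of("2013-12-13"));

        Map<String, String> mapping = Map.of("title", "title_s");

        System.out.println(mapFields(metadata, mapping, true));
        System.out.println(mapFields(metadata, mapping, false));
    }
}
```

With the flag set to false, only the mapped fields would end up in the request, which keeps its size under control no matter how many metadata fields the repository connector extracts.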



2013/12/13 Karl Wright <[email protected]>

> Hi Alessandro,
>
> Usually the repository connector also specifies what metadata to include.
> What connector are you crawling with?
>
> Karl
>
>
>
> On Fri, Dec 13, 2013 at 9:06 AM, Alessandro Benedetti <
> [email protected]> wrote:
>
> > Actually it can be a problem.
> > For example, suppose your Solr is running in an application server with a
> > limit on the HTTP request header size.
> > The server will then refuse every request that exceeds that limit.
> >
> > We are interested in only 3 metadata fields, but ManifoldCF extracts n
> > (n >> 3) for each document.
> > We can configure the mapping to map those 3 fields.
> > But the POST request is built with all the metadata from the document, so
> > it exceeds the request header limit and the document is rejected without
> > an obvious reason.
> >
> > So if the purpose of the Solr field mapping in a Job with a Solr connector
> > is to index only those fields, then the current behaviour is a bug, for
> > the reason I explained above.
> >
> > Cheers
> >
> >
> > 2013/12/13 Karl Wright <[email protected]>
> >
> > > Hi Alessandro,
> > >
> > > Thank you for the clarification.
> > > If you believe it would be helpful to filter metadata, by all means open
> > > a ticket and attach a patch.  But I don't exactly see where there would
> > > be an issue, since metadata that is posted that is not in the Solr
> > > schema is simply going to be discarded.
> > >
> > > Karl
> > >
> > >
> > >
> > > On Fri, Dec 13, 2013 at 7:56 AM, Alessandro Benedetti <
> > > [email protected]> wrote:
> > >
> > > > Hi Karl,
> > > > I'm not referring to filtering documents.
> > > > I'm referring to filtering the metadata associated with a document
> > > > (which will be mapped to Solr fields by the Solr connector).
> > > > Right now, in the job's metadata mapping screen, you can select a
> > > > subset of metadata to be mapped to Solr fields, but then all the
> > > > metadata associated with the document is sent to Solr (in the way I
> > > > described in the e-mail).
> > > >
> > > > Cheers
> > > >
> > > >
> > > > 2013/12/13 Karl Wright <[email protected]>
> > > >
> > > > > Hi Alessandro,
> > > > >
> > > > > I'm not entirely sure I understand your use case, but so far in
> > > > > ManifoldCF nobody has requested that an output connector perform
> > > > > document filtering, other than to reject documents by responding
> > > > > with "DOCUMENT_REJECTED".  Usually document filtering is part of the
> > > > > repository connector's functionality, since filtering is most
> > > > > effective when it is described in terms of the individual
> > > > > repository's constructs.  At the repository connector level, you can
> > > > > describe an appropriate set of documents to include, rather than
> > > > > crawling everything and rejecting the ones you don't want.  This
> > > > > description is called the "Document Specification".  When you create
> > > > > and edit a job in the Crawler UI some of the job's tabs modify that
> > > > > specification, and the repository connector code understands the
> > > > > specification and limits the documents being crawled using it.
> > > > >
> > > > > On the output side, e.g. in the Solr output connector, it's already
> > > > > too late to restrict which documents are crawled.  The best you can
> > > > > do is just to not send them to the index, or explicitly reject them.
> > > > > This makes any feature to filter documents in an output connector of
> > > > > limited utility, compared with doing the same thing in the Document
> > > > > Specification.
> > > > >
> > > > > Hope this helps,
> > > > > Karl
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Dec 13, 2013 at 7:12 AM, Alessandro Benedetti <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Hi guys,
> > > > > > I have one question for you.
> > > > > > Looking at the details of the SolrConnector, it's possible to see
> > > > > > the following:
> > > > > >
> > > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> > > > > >
> > > > > >  writeField(out,LITERAL+newFieldName,values);
> > > > > >  // Write the commitWithin parameter
> > > > > >  if (commitWithin != null)
> > > > > >      writeField(out,COMMITWITHIN_METADATA,commitWithin);
> > > > > >  contentStreamUpdateRequest.setParams(out);
> > > > > >  contentStreamUpdateRequest.addContentStream(
> > > > > >      new RepositoryDocumentStream(is,length,contentType,contentName));
> > > > > >
> > > > > > In a Job using a Solr connector, it's possible to define a metadata
> > > > > > mapping, which maps specific metadata to Solr field names.
> > > > > > But if you select only 3 mappings, what happens is that all the
> > > > > > metadata in the ManifoldCF document is sent as params of the
> > > > > > contentStreamUpdateRequest, and the mapping is used only to rename
> > > > > > the fields we want to rename.
> > > > > >
> > > > > > In my opinion the mapping should be used as a filter as well,
> > > > > > because if the user selects only 3 metadata fields, they want to
> > > > > > see only those fields.
> > > > > > There should probably be at least a flag that lets the user choose
> > > > > > whether or not to filter the metadata sent to Solr.
> > > > > > It is a small change that would cover many use cases where the user
> > > > > > is interested only in a subset of the metadata and does not need to
> > > > > > send everything in the header of the HTTP POST.
> > > > > > I'm pretty new to ManifoldCF, so let me know if this feature is
> > > > > > already there and I misunderstood something.
> > > > > >
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > --------------------------
> > > > > >
> > > > > > Benedetti Alessandro
> > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > > >
> > > > > > "Tyger, tyger burning bright
> > > > > > In the forests of the night,
> > > > > > What immortal hand or eye
> > > > > > Could frame thy fearful symmetry?"
> > > > > >
> > > > > > William Blake - Songs of Experience -1794 England
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
