But we were talking about the output connector, right? Maybe I want the repository connector to extract those metadata fields, and that metadata will then be used differently by different output connectors (for example, 2 different jobs with different Solr mappings).
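To make concrete what I mean by "used differently", here is a minimal sketch. It is not ManifoldCF code: the class name, the method name, and the keepUnmapped flag are all hypothetical, invented only for illustration. The idea is that each job's mapping both renames the fields it lists and, unless the flag says otherwise, drops the fields it does not list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not actual ManifoldCF code.
public class MappingFilterSketch {

    /**
     * Renames mapped metadata fields. Fields without an entry in the
     * mapping are dropped, unless keepUnmapped is true, in which case
     * they are passed through under their original name.
     */
    static Map<String, List<String>> applyMapping(
            Map<String, List<String>> metadata,
            Map<String, String> mapping,
            boolean keepUnmapped) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            String target = mapping.get(e.getKey());
            if (target != null) {
                // Field is listed in the job's mapping: send it renamed.
                out.put(target, e.getValue());
            } else if (keepUnmapped) {
                // Field is not listed: send it only if filtering is off.
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Metadata as extracted by the repository connector (made-up values).
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", List.of("Report"));
        metadata.put("author", List.of("A. B."));
        metadata.put("internalId", List.of("42"));

        // Mapping configured on one job (made-up Solr field names).
        Map<String, String> mapping = Map.of("title", "title_s", "author", "author_s");

        // Filtering on: only the two mapped fields would be sent.
        System.out.println(applyMapping(metadata, mapping, false));
        // Filtering off: unmapped fields pass through unrenamed.
        System.out.println(applyMapping(metadata, mapping, true));
    }
}
```

With something like this, two jobs could share the same repository connector output and still send different subsets of metadata to Solr, each according to its own mapping.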
Sorry if I repeat the question, but: what is the meaning of the Solr field mapping in a ManifoldCF job (one that uses a Solr connector)? If the meaning is to index only those fields in Solr, then there is that little bug :)

2013/12/13 Karl Wright <[email protected]>

> Hi Alessandro,
>
> Usually the repository connector also specifies what metadata to include.
> What connector are you crawling with?
>
> Karl
>
>
> On Fri, Dec 13, 2013 at 9:06 AM, Alessandro Benedetti <
> [email protected]> wrote:
>
> > Actually it can be a problem.
> > For example, your Solr is running in an application server with a
> > limit on the size of the HTTP request header.
> > So the server will refuse all requests that exceed that limit.
> >
> > We are interested in only 3 metadata fields, but ManifoldCF extracts
> > n (n >> 3) for each document.
> > We can configure the mapping to map those 3 fields.
> > But the POST request is built with all the metadata from the
> > document, so it exceeds the request header limit and the document
> > will be rejected without an apparent reason.
> >
> > So if the meaning of the Solr field mapping in a job with a Solr
> > connector is to index only those fields, then the current behaviour
> > is a bug, for the reason I explained above.
> >
> > Cheers
> >
> >
> > 2013/12/13 Karl Wright <[email protected]>
> >
> > > Hi Alessandro,
> > >
> > > Thank you for the clarification.
> > > If you believe it would be helpful to filter metadata, by all means
> > > open a ticket and attach a patch.  But I don't exactly see where
> > > there would be an issue, since metadata that is posted that is not
> > > in the Solr schema is simply going to be discarded.
> > >
> > > Karl
> > >
> > >
> > > On Fri, Dec 13, 2013 at 7:56 AM, Alessandro Benedetti <
> > > [email protected]> wrote:
> > >
> > > > Hi Karl,
> > > > I'm not referring to filtering documents.
> > > > I'm referring to filtering the metadata associated with a document
> > > > (which will be mapped to Solr fields by the Solr connector).
> > > > Because now, in the job metadata mapping screen, you can select a
> > > > subset of metadata to be mapped to Solr fields, but then all the
> > > > metadata associated with the document is sent to Solr (in the way
> > > > I described in the e-mail).
> > > >
> > > > Cheers
> > > >
> > > >
> > > > 2013/12/13 Karl Wright <[email protected]>
> > > >
> > > > > Hi Alessandro,
> > > > >
> > > > > I'm not entirely sure I understand your use case, but so far in
> > > > > ManifoldCF nobody has requested that an output connector perform
> > > > > document filtering, other than to reject documents by responding
> > > > > with "DOCUMENT_REJECTED".  Usually document filtering is part of
> > > > > the repository connector's functionality, since filtering is
> > > > > most effective when it is described in terms of the individual
> > > > > repository's constructs.  At the repository connector level, you
> > > > > can describe an appropriate set of documents to include, rather
> > > > > than crawling everything and rejecting the ones you don't want.
> > > > > This description is called the "Document Specification".  When
> > > > > you create and edit a job in the Crawler UI some of the job's
> > > > > tabs modify that specification, and the repository connector
> > > > > code understands the specification and limits the documents
> > > > > being crawled using it.
> > > > >
> > > > > On the output side, e.g. in the Solr output connector, it's
> > > > > already too late to restrict which documents are crawled.  The
> > > > > best you can do is just to not send them to the index, or
> > > > > explicitly reject them.  This makes any feature to filter
> > > > > documents in an output connector of limited utility, compared
> > > > > with doing the same thing in the Document Specification.
> > > > >
> > > > > Hope this helps,
> > > > > Karl
> > > > >
> > > > >
> > > > > On Fri, Dec 13, 2013 at 7:12 AM, Alessandro Benedetti <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Hi guys,
> > > > > > I have one question for you.
> > > > > > Looking into the details of the Solr connector, you can see
> > > > > > the following in
> > > > > >
> > > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> > > > > >
> > > > > > writeField(out,LITERAL+newFieldName,values);
> > > > > > // Write the commitWithin parameter
> > > > > > if (commitWithin != null)
> > > > > >   writeField(out,COMMITWITHIN_METADATA,commitWithin);
> > > > > > contentStreamUpdateRequest.setParams(out);
> > > > > > contentStreamUpdateRequest.addContentStream(new
> > > > > >   RepositoryDocumentStream(is,length,contentType,contentName));
> > > > > >
> > > > > > In a job using a Solr connector, it's possible to define the
> > > > > > metadata mapping, mapping specific metadata to Solr field
> > > > > > names.
> > > > > > But if you select only 3 mappings, what happens is that all
> > > > > > the metadata in the ManifoldCF document is sent as params of
> > > > > > the contentStreamUpdateRequest, and the mapping is used only
> > > > > > to rename the fields we want to rename.
> > > > > >
> > > > > > In my opinion the mapping should be used as a filter as well,
> > > > > > because if the user selects only 3 metadata fields, he wants
> > > > > > to see only those.
> > > > > > There should probably be at least a flag that lets the user
> > > > > > choose whether or not to filter the metadata sent to Solr.
> > > > > > It's a small change that could cover a lot of use cases where
> > > > > > the user is interested only in a subset of metadata and does
> > > > > > not need to send everything in the header of the HTTP POST.
> > > > > > I'm pretty new to ManifoldCF, so let me know if this feature
> > > > > > is already there and I misunderstood something.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > --
> > > > > > --------------------------
> > > > > >
> > > > > > Benedetti Alessandro
> > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > > >
> > > > > > "Tyger, tyger burning bright
> > > > > > In the forests of the night,
> > > > > > What immortal hand or eye
> > > > > > Could frame thy fearful symmetry?"
> > > > > >
> > > > > > William Blake - Songs of Experience -1794 England
