Feel free to create a ticket and attach a patch if you'd like an additional feature here.

Karl

On Fri, Dec 13, 2013 at 10:33 AM, Alessandro Benedetti <[email protected]> wrote:

> Ok, thank you, now the process is clear to me.
> So if I have 100 metadata fields, and in each Solr output connector I want 3 fields to be indexed, I have to add 100 mappings, 3 with a value and 97 blank?
> Now I understand how it works, but it seems a little counterintuitive and long to configure, doesn't it?
>
> 2013/12/13 Karl Wright <[email protected]>
>
> > If this is not working as detailed in the documentation, please do open a ticket and we'll look at it.
> >
> > Karl
> >
> > On Fri, Dec 13, 2013 at 10:13 AM, Karl Wright <[email protected]> wrote:
> >
> > > Hi Alessandro,
> > >
> > > The Solr metadata mapping field is described thoroughly in the end user documentation:
> > >
> > > "When you configure a job to use a Solr-type output connection, the Solr connection type provides a tab called "Field Mapping". The purpose of this tab is to allow you to map metadata fields as fetched by the job's connection type to fields that Solr is set up to receive. This is necessary because the names of the metadata items are often determined by the repository, with no alignment to fields defined in the Solr schema. You may also suppress specific metadata items from being sent to the index using this tab. The tab looks like this:
> > >
> > > [image: Solr Specification, Field Mapping tab]
> > >
> > > Add a new mapping by filling in the "source" with the name of the metadata item from the repository, and "target" as the name of the output field in Solr, and click the "Add" button. Leaving the "target" field blank will result in all metadata items of that name not being sent to Solr."
> > >
> > > Karl
> > >
> > > On Fri, Dec 13, 2013 at 9:54 AM, Alessandro Benedetti <[email protected]> wrote:
> > >
> > >> But we were talking about the output connector, right?
> > >> Maybe I want the repository connector to extract those metadata fields, and those metadata will be used differently by different output connectors (for example, 2 different jobs with different Solr mappings).
> > >>
> > >> Sorry if I repeat the question, but:
> > >> What is the meaning of the Solr field mapping in a ManifoldCF job (one that uses a Solr connector)?
> > >> If the meaning is to index only those fields in Solr, then there is that little bug :)
> > >>
> > >> 2013/12/13 Karl Wright <[email protected]>
> > >>
> > >> > Hi Alessandro,
> > >> >
> > >> > Usually the repository connector also specifies what metadata to include.
> > >> > What connector are you crawling with?
> > >> >
> > >> > Karl
> > >> >
> > >> > On Fri, Dec 13, 2013 at 9:06 AM, Alessandro Benedetti <[email protected]> wrote:
> > >> >
> > >> > > Actually it can be a problem.
> > >> > > For example, your Solr is running in an application server with a limit on the HTTP request header.
> > >> > > So the server will refuse all the requests that exceed that limit.
> > >> > >
> > >> > > We are interested in only 3 metadata fields, but Manifold extracts n (n>>3) for each document.
> > >> > > We can configure the mapping to map those 3 metadata fields.
> > >> > > But the POST request is built with all the metadata from the document, it exceeds the request header limit, and the document will be rejected without a reason.
> > >> > >
> > >> > > So if the meaning of the Solr field mapping in a job with a Solr connector is to index only those fields, then the current behaviour is a bug, for the reason I explained before.
> > >> > >
> > >> > > Cheers
> > >> > >
> > >> > > 2013/12/13 Karl Wright <[email protected]>
> > >> > >
> > >> > > > Hi Alessandro,
> > >> > > >
> > >> > > > Thank you for the clarification.
> > >> > > > If you believe it would be helpful to filter metadata, by all means open a ticket and attach a patch. But I don't exactly see where there would be an issue, since metadata that is posted that is not in the Solr schema is simply going to be discarded.
> > >> > > >
> > >> > > > Karl
> > >> > > >
> > >> > > > On Fri, Dec 13, 2013 at 7:56 AM, Alessandro Benedetti <[email protected]> wrote:
> > >> > > >
> > >> > > > > Hi Karl,
> > >> > > > > I'm not referring to filtering documents.
> > >> > > > > I'm referring to filtering the metadata associated with a document (which will be mapped to Solr fields by the Solr connector).
> > >> > > > > Right now, in the job's metadata mapping screen, you can select a subset of metadata to be mapped to Solr fields, but then all the metadata associated with the document is sent to Solr (in the way I described in the e-mail).
> > >> > > > >
> > >> > > > > Cheers
> > >> > > > >
> > >> > > > > 2013/12/13 Karl Wright <[email protected]>
> > >> > > > >
> > >> > > > > > Hi Alessandro,
> > >> > > > > >
> > >> > > > > > I'm not entirely sure I understand your use case, but so far in ManifoldCF nobody has requested that an output connector perform document filtering, other than to reject documents by responding with "DOCUMENT_REJECTED".
> > >> > > > > > Usually document filtering is part of the repository connector's functionality, since filtering is most effective when it is described in terms of the individual repository's constructs. At the repository connector level, you can describe an appropriate set of documents to include, rather than crawling everything and rejecting the ones you don't want. This description is called the "Document Specification". When you create and edit a job in the Crawler UI, some of the job's tabs modify that specification, and the repository connector code understands the specification and limits the documents being crawled using it.
> > >> > > > > >
> > >> > > > > > On the output side, e.g. in the Solr output connector, it's already too late to restrict which documents are crawled. The best you can do is just to not send them to the index, or explicitly reject them. This makes any feature that filters documents in an output connector of limited utility, compared with doing the same thing in the Document Specification.
> > >> > > > > >
> > >> > > > > > Hope this helps,
> > >> > > > > > Karl
> > >> > > > > >
> > >> > > > > > On Fri, Dec 13, 2013 at 7:12 AM, Alessandro Benedetti <[email protected]> wrote:
> > >> > > > > >
> > >> > > > > > > Hi guys,
> > >> > > > > > > I have one question for you.
> > >> > > > > > > Looking at the details of the Solr connector, it's possible to see this:
> > >> > > > > > >
> > >> > > > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> > >> > > > > > >
> > >> > > > > > > writeField(out,LITERAL+newFieldName,values);
> > >> > > > > > > // Write the commitWithin parameter
> > >> > > > > > > if (commitWithin != null)
> > >> > > > > > >   writeField(out,COMMITWITHIN_METADATA,commitWithin);
> > >> > > > > > > contentStreamUpdateRequest.setParams(out);
> > >> > > > > > > contentStreamUpdateRequest.addContentStream(new RepositoryDocumentStream(is,length,contentType,contentName));
> > >> > > > > > >
> > >> > > > > > > In a job using a Solr connector, it's possible to define the metadata mapping, which maps specific metadata to Solr field names.
> > >> > > > > > > But if you select only 3 mappings, what happens is that all the metadata in the Manifold document is sent as params of the contentStreamUpdateRequest, and the mapping is used only to rename the fields we want to rename.
> > >> > > > > > >
> > >> > > > > > > In my opinion the mapping should be used as a filter as well, because if the user selects only 3 metadata fields, he wants to see only those.
> > >> > > > > > > There should probably be at least a flag that lets the user choose whether or not to filter the metadata sent to Solr.
> > >> > > > > > > It is a small change that can solve a lot of use cases where the user is interested only in a subset of metadata and does not need to send everything in the header of the HTTP POST.
> > >> > > > > > > I'm pretty new to ManifoldCF, so let me know if this feature is already there and I misunderstood something.
> > >> > > > > > >
> > >> > > > > > > Cheers
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > > --------------------------
> > >> > > > > > >
> > >> > > > > > > Benedetti Alessandro
> > >> > > > > > > Visiting card : http://about.me/alessandro_benedetti
> > >> > > > > > >
> > >> > > > > > > "Tyger, tyger burning bright
> > >> > > > > > > In the forests of the night,
> > >> > > > > > > What immortal hand or eye
> > >> > > > > > > Could frame thy fearful symmetry?"
> > >> > > > > > >
> > >> > > > > > > William Blake - Songs of Experience -1794 England
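
For anyone reading this thread later, the two behaviours being discussed can be made concrete with a minimal Java sketch. This is not ManifoldCF code: the class `FieldMappingSketch`, the method `applyMapping`, and the `keepUnmapped` flag are hypothetical names invented for illustration. It assumes the semantics described in the thread: a mapped source field is renamed to its target, a blank target suppresses the field, and an unmapped field is either passed through under its own name (the behaviour Alessandro observed) or dropped (the filter he proposes).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in ManifoldCF.
class FieldMappingSketch {

  /**
   * Applies a source-to-target field mapping to one document's metadata.
   *
   * - source mapped to a non-blank target: values are sent under the target name
   * - source mapped to a blank target: the field is suppressed
   * - source not mapped: sent under its own name if keepUnmapped is true,
   *   otherwise dropped (the "mapping as filter" proposal)
   */
  static Map<String, List<String>> applyMapping(
      Map<String, List<String>> metadata,
      Map<String, String> mapping,
      boolean keepUnmapped) {
    Map<String, List<String>> out = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
      String source = entry.getKey();
      String target;
      if (mapping.containsKey(source)) {
        target = mapping.get(source);
        if (target == null || target.trim().isEmpty()) {
          continue; // blank target: do not send this field
        }
      } else if (keepUnmapped) {
        target = source; // pass-through under the original name
      } else {
        continue; // filter mode: unmapped fields are not sent
      }
      // Merge values if several sources map to the same target.
      out.computeIfAbsent(target, k -> new ArrayList<>()).addAll(entry.getValue());
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, List<String>> metadata = new LinkedHashMap<>();
    metadata.put("title", Arrays.asList("Quarterly report"));
    metadata.put("author", Arrays.asList("A. Writer"));
    metadata.put("lastModified", Arrays.asList("2013-12-13"));

    Map<String, String> mapping = new LinkedHashMap<>();
    mapping.put("title", "solr_title"); // rename
    mapping.put("author", "");          // suppress

    System.out.println("pass-through: " + applyMapping(metadata, mapping, true));
    System.out.println("filter:       " + applyMapping(metadata, mapping, false));
  }
}
```

With `keepUnmapped` set to false, a job that maps 3 of 100 fields would send only those 3, without needing 97 blank mappings.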
