[ https://issues.apache.org/jira/browse/SOLR-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118594#comment-13118594 ]
Hoss Man commented on SOLR-2802:
--------------------------------

bq. I already have a FieldCopy processor which can copy/move fields,

Jan: Yeah ... I designed the base class around the assumption that we would come up with a good "clone fields" processor in SOLR-2599, so that subclasses can simply modify the values "in place" and people can clone/rename fields as needed before using them.

bq. With SOLR-2599, I imagine we could take copyField's out of schema.xml,

Erik: I actually consider them very orthogonal. Supporting cloning/copying in an update processor is a way of saying "when docs are added to the index using this Update Chain, take these actions on the fields", but copyField in schema.xml is a way of saying "no matter where this doc comes from, the value of field X should also be put in field Y".

bq. Before we get too carried away, what about making this even more general purpose with scripting, ala SOLR-1725 ?

We definitely should get the Script Processor in for people who don't know Java but have specific goals, but we shouldn't let support for scripting prevent us from implementing some of the more commonly requested actions in Java - there's a fine line between "you _can_ write scripts to do _anything_ you want" and "you _have_ to write scripts to do _everything_ you want".

bq. There's one other update processor that perhaps could fit within this framework and become something generally useful in Solr - SOLR-1280

I looked at that one before I started, actually. Because of the "modify in place" nature of this base class, it didn't really seem like a good fit to try and refactor that one to be a subclass.

bq. I think in general that processors should match nothing by default. Could lead to unexpected behaviour for users in the long run.

Martijn: I kept going back and forth on this while I was working on it.
Ultimately my thought process was that it didn't really make sense for the "default" to be a No-Op, because if that's the case then what's the point of having a default at all? And if we're going to require that they provide at least one of the field selectors, and we want to offer them syntactic sugar for "match all fields", why not make it the shortest sugar possible?

I figured it would make sense for the base class to assume that "no args" meant let the subclass see all of the fields/values -- and the subclasses could enforce their own default rules as needed, ala...

* implicitly...
** in the TrimFieldUpdateProcessorFactory attached, it ignores anything that isn't an instance of String -- regardless of how it's configured (so it doesn't call toString() on an Integer and then try to trim that)
* explicitly...
** I imagine that Date/Number parsing update processors should default to only trying to parse fields where the FieldType extends DateField/TrieField (the Concat processor should probably do the same for StrField fields configured to be multiValued=false, now that I think about it). But unlike how the Trim processor works, if they are explicitly configured to parse fields named "foo.*" they should try to do so regardless of what the field type/settings might be, because maybe a subsequent processor will rename/move those fields in the input docs to something that is expecting a Date/Number (or does support multivalued fields)

What do you think?

The scenario that still bothers me about all this is that if we put something like this in the example solrconfig.xml...

{code}
<updateRequestProcessorChain name="simple" default="true">
  <processor class="solr.TrimFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}

...(so all strings get trimmed) someone might say "Hey, stop trimming my strings!"
and it's easy for them to remove that from the example. But someone else might say: "This is exactly what I want _most_ of the time, but I've got this one field where whitespace matters, stop trimming that one." -- and now he's got to jump through a lot of hoops to keep the trim behavior on all but one field (unless we add some sort of exclusion option(s)).

Even if we make some field selection args mandatory for the processor and use this instead...

{code}
<updateRequestProcessorChain name="simple" default="true">
  <processor class="solr.TrimFieldUpdateProcessorFactory">
    <str name="fieldRegex">.*</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}

...that user still has the same amount of pain to deal with.

> Toolkit of UpdateProcessors for modifying document values
> ---------------------------------------------------------
>
>                 Key: SOLR-2802
>                 URL: https://issues.apache.org/jira/browse/SOLR-2802
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>         Attachments: SOLR-2802_update_processor_toolkit.patch
>
>
> Frequently users ask questions about things where the answer is "you
> could do it with an UpdateProcessor" but the number of out of the box
> UpdateProcessors is generally lacking and there aren't even very good base
> classes for the common case of manipulating field values when adding documents
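
For the "trim everything except one field" scenario raised in the comment, one possible shape for an exclusion option is sketched below. This is only an illustration: the "exclude" list, its "fieldName" entry, and the field name "raw_text" are assumptions made for the example and are not part of the attached patch.

{code}
<updateRequestProcessorChain name="simple" default="true">
  <processor class="solr.TrimFieldUpdateProcessorFactory">
    <!-- hypothetical: select every field by default... -->
    <str name="fieldRegex">.*</str>
    <!-- ...except the ones listed here (assumed syntax, assumed field name) -->
    <lst name="exclude">
      <str name="fieldName">raw_text</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}

With something along these lines, the default could stay "match everything" and the user with one whitespace-sensitive field would only need to add a single exclusion rather than enumerate every other field.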