[ https://issues.apache.org/jira/browse/SOLR-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083884#comment-14083884 ]
Shai Erera edited comment on SOLR-912 at 8/3/14 6:22 AM: --------------------------------------------------------- Resurrecting that issue, I reviewed NamedList and I don't understand why it has to be so complicated: * Its {{<T>}} generic results in TONS of warnings in eclipse, for no good reason. I don't buy this comment made on the issue that it's better to make it generic than not. If we have a generic API which isn't used like that, it calls for a bad API IMO. From what I can see, NamedList is not supposed to be generic at all, as its main purpose is to allow you to store a heterogeneous list of {{<name,value>}} pairs, where name is always String, and the type of {{value}} may differ. If we want to make it more convenient for people who know e.g. all values are Strings, we can add sugar methods like {{getInt(), getString()...}}. I've also briefly reviewed some classes that use NamedList (outside of tests), and none seem to rely on {{<T>}} at all. So I'd rather we remove that generic from the API signature. * There is what seems to be a totally redundant {{SimpleOrderedMap}} class, which has contradicting documentation in its class-jdocs: ** _a {{NamedList}} where access by key is more important than maintaining order_ ** _This class does not provide efficient lookup by key_ But the class doesn't add any additional data structures, contains only 3 ctors which delegate as-is to NamedList and offers a clone() which is identical to NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think this class is just confusing and has to go away. Leaving performance aside for a second, NamedList could simply hold an internal {{Map<String,List<Object>>}} to enable efficient access by key, remove all values of a key, access a key's values in order etc. It doesn't allow accessing the {{<name,value>}} pairs in any order though, i.e. {{getVal\(i\)}}. I don't know how important is this functionality though .. i.e. if we replaced it with a {{namesIterator()}}, would it not allow roughly the same thing? I'm kind of sure it does, but there are so many uses of NamedList across the Solr code base that I might be missing a case which won't like it. So I'd like to ask the Solr folks who know this code better than me -- how important is {{getName/Val\(i\)}}? Now back to performance, for a sec, in order to not always allocate a {{List<Object>}} when NamedList is used to hold only one value per parameter, we can either: * Use Collections.singletonList() on first _add_, and change to a concrete List on the second _add_ only. * Use an {{Object[]}}, it's less expensive than a List object. * Use a Map<String,Object> internally and do instanceof checks on add/get as appropriate. BUT, if accessing/removing values by name is not important and it's OK if get\(i\) is run on O(N), we can simply simplify the class, like Yonik's proposal above, to hold an Object[] array (instead of List). But I think we should remove the generic anyway. Maybe we should break this down into 3 issues: * Get rid of SimpleOrderedMap -- if it's important to keep in 4x, I can deprecate and move all uses of it to NamedList directly. * Remove the generics from NamedList's API. We can add sugar getters for specific types if we want. * Simplify NamedList internal implementation. On the performance side -- how critical is NamedList on the execution path? I don't like micro-benchmarks too much, so if NamedList is only a fraction of an entire execution path, I'd rather it's even a tad slower but readable and easier to use/maintain, than if it's overly complicated only to buy us a few nanos in the overall request. was (Author: shaie): Resurrecting that issue, I reviewed NamedList and I don't understand why it has to be so complicated: * Its {{<T>}} generic results in TONS of warnings in eclipse, for no good reason. I don't buy this comment made on the issue that it's better to make it generic than not. If we have a generic API which isn't used like that, it calls for a bad API IMO. From what I can see, NamedList is not supposed to be generic at all, as its main purpose is to allow you to store a heterogeneous list of {{<name,value>}} pairs, where name is always String, and the type of {{value}} may differ. If we want to make it more convenient for people who know e.g. all values are Strings, we can add sugar methods like {{getInt(), getString()...}}. I've also briefly reviewed some classes that use NamedList (outside of tests), and none seem to rely on {{<T>}} at all. So I'd rather we remove that generic from the API signature. * There is what seems to be a totally redundant {{SimpleOrderedMap}} class, which has contradicting documentation in its class-jdocs: ** _a {{NamedList}} where access by key is more important than maintaining order_ ** _This class does not provide efficient lookup by key_ But the class doesn't add any additional data structures, contains only 3 ctors which delegate as-is to NamedList and offers a clone() which is identical to NamedList.clone(). Yet there are 574 references to it (per-eclipse) ... I think this class is just confusing and has to go away. Leaving performance aside for a second, NamedList could simply hold an internal {{Map<String,List<Object>>}} to enable efficient access by key, remove all values of a key, access a key's values in order etc. It doesn't allow accessing the {{<name,value>}} pairs in any order though, i.e. {{getVal(i)}}. I don't know how important is this functionality though .. i.e. if we replaced it with a {{namesIterator()}}, would it not allow roughly the same thing? I'm kind of sure it does, but there are so many uses of NamedList across the Solr code base that I might be missing a case which won't like it. So I'd like to ask the Solr folks who know this code better than me -- how important is {{getName/Val(i)}}? Now back to performance, for a sec, in order to not always allocate a {{List<Object>}} when NamedList is used to hold only one value per parameter, we can either: * Use Collections.singletonList() on first _add_, and change to a concrete List on the second _add_ only. * Use an {{Object[]}}, it's less expensive than a List object. * Use a Map<String,Object> internally and do instanceof checks on add/get as appropriate. BUT, if accessing/removing values by name is not important and it's OK if get(i) is run on O(N), we can simply simplify the class, like Yonik's proposal above, to hold an Object[] array (instead of List). But I think we should remove the generic anyway. Maybe we should break this down into 3 issues: * Get rid of SimpleOrderedMap -- if it's important to keep in 4x, I can deprecate and move all uses of it to NamedList directly. * Remove the generics from NamedList's API. We can add sugar getters for specific types if we want. * Simplify NamedList internal implementation. On the performance side -- how critical is NamedList on the execution path? I don't like micro-benchmarks too much, so if NamedList is only a fraction of an entire execution path, I'd rather it's even a tad slower but readable and easier to use/maintain, than if it's overly complicated only to buy us a few nanos in the overall request. > org.apache.solr.common.util.NamedList - Typesafe efficient variant - > ModernNamedList introduced - implementing the same API as NamedList > ---------------------------------------------------------------------------------------------------------------------------------------- > > Key: SOLR-912 > URL: https://issues.apache.org/jira/browse/SOLR-912 > Project: Solr > Issue Type: Improvement > Components: search > Environment: Tomcat 6, JRE 6, Solr 1.3+ nightlies > Reporter: Karthik K > Priority: Minor > Attachments: NLProfile.java, SOLR-912.patch, SOLR-912.patch > > Original Estimate: 72h > Remaining Estimate: 72h > > The implementation of NamedList - while being fast - is not necessarily > type-safe. I have implemented an additional implementation of the same - > ModernNamedList (a type-safe variation providing the same interface as > NamedList) - while preserving the semantics in terms of ordering of elements > and allowing null elements for key and values (keys are always Strings , > while values correspond to generics ). -- This message was sent by Atlassian JIRA (v6.2#6252) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org