[jira] Updated: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1872: --- Attachment: SolrACLSecurity.java Updates a typo or two plus some misc tweaks. {code} zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2 acl.xml true 10 1 audit.log 5 15 {code} Thanks, Peter
[jira] Issue Comment Edited: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855698#action_12855698 ] Peter Sturge edited comment on SOLR-1872 at 4/11/10 6:18 AM: - This update adds in optional auditing of searches by users and failed access attempts, plus a few minor tweaks. To configure auditing, here is a sample searchComponent section from solrconfig.xml: {code} zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2 acl.xml true 10 1 audit.log 5 15 {code} was (Author: midiman): This update adds in optional auditing of searches by users and failed access attempts, plus a few minor tweaks. To configure auditing, here is a sample searchComponent section from solrconfig.xml: {{ zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2 acl.xml true 10 1 audit.log 5 15 }}
[jira] Issue Comment Edited: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855698#action_12855698 ] Peter Sturge edited comment on SOLR-1872 at 4/11/10 6:16 AM: - This update adds in optional auditing of searches by users and failed access attempts, plus a few minor tweaks. To configure auditing, here is a sample searchComponent section from solrconfig.xml: {{ zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2 acl.xml true 10 1 audit.log 5 15 }} was (Author: midiman): This update adds in optional auditing of searches by users and failed access attempts, plus a few minor tweaks. To configure auditing, here is a sample searchComponent section from solrconfig.xml: {{ zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2 acl.xml true 10 1 audit.log 5 15 }}
[jira] Updated: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1872: --- Attachment: SolrACLSecurity.java This update adds in optional auditing of searches by users and failed access attempts, plus a few minor tweaks. To configure auditing, here is a sample searchComponent section from solrconfig.xml: {{ zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2 acl.xml true 10 1 audit.log 5 15 }}
[jira] Updated: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1872: --- Attachment: SolrACLSecurity.rar
[jira] Created: (SOLR-1872) Document-level Access Control in Solr
Document-level Access Control in Solr
-------------------------------------

Key: SOLR-1872
URL: https://issues.apache.org/jira/browse/SOLR-1872
Project: Solr
Issue Type: New Feature
Components: SearchComponents - other
Affects Versions: 1.4
Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: SolrACLSecurity.rar

This issue relates to providing document-level access control for Solr index data.

A related JIRA issue is SOLR-1834. I thought it would be best to create a separate JIRA issue, rather than tack on to SOLR-1834, as the approach here is somewhat different, and I didn't want to confuse things or step on Anders' good work.

There have been lots of discussions about document-level access in Solr using LCF, custom components and the like. Access control is one of those subjects that quickly spreads into lots of 'ratholes' to dive into. Even if not everyone agrees with the approaches taken here, this issue does, at the very least, highlight some of the salient issues surrounding access control in Solr, and will hopefully initiate a healthy discussion on the range of related requirements, with the aim of finding the optimum balance among them.

The approach taken here is document and schema agnostic - i.e. the access control is independent of what is or will be in the index, and no schema changes are required. This version doesn't include LDAP/AD integration, but it could be added relatively easily (see Anders' very good work on this in SOLR-1834). Note that this version doesn't deal with /update, /replication etc.; it's currently a /select thing (but it could be used for these).

This approach uses a SearchComponent subclass called SolrACLSecurity. Its configuration is read in from solrconfig.xml in the usual way, and the allow/deny configuration is split out into a config file called acl.xml.

acl.xml defines a number of users and groups (and 1 global for 'everyone'), and assigns each of them 0 or more {{allow}} and/or {{deny}} elements.
When the SearchComponent is initialized, user objects are created and cached, including an 'allow' list and a 'deny' list.

When a request comes in, these lists are used to build filter queries ('allows' are OR'ed and 'denies' are NAND'ed), which are then added to the query request.

Because the allow and deny elements are simply subsearch queries (e.g. {{somefield:secret}}), this mechanism will work on any stored data that can be queried, including already existing data.

Authentication

One of the sticky problems with access control is how to determine who's asking for data. There are many approaches, and to stay in the generic vein the current mechanism uses HTTP parameters for this.

For an initial search, a client includes a {{username=somename}} parameter and a {{hash=pwdhash}} parameter carrying a hash of its password. If the request sends the correct parameters, the search is granted and a uuid parameter is returned in the response header. This uuid can then be used in subsequent requests from the client. If the credentials are wrong, the SearchComponent fails the request and increments the user's failed login count (if a valid user was specified). If this count exceeds the configured lockoutThreshold, no further requests are granted until the lockoutTime has elapsed.

This mechanism protects against some types of attacks (e.g. CRLF, dictionary etc.), but it really needs container HTTPS as well (as would most other auth implementations). Incorporating SSL certificates for authentication and making the authentication mechanism pluggable would be a nice improvement (i.e. separate authentication from access control).

Another issue is how internal searchers perform autowarming etc. The solution here is to use a local key called 'SolrACLSecurityKey'. This key is local and [should be] unique to that server. firstSearcher, newSearcher et al. then include this key in their parameters so they can perform autowarming without constraint. Again, there are likely many ways to achieve this; this approach is but one.
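To make the allow/deny filter-query step described above more concrete, here is a minimal sketch of how the two lists might be combined into a single filter query string. It is not taken from the attached SolrACLSecurity source: the class and method names are hypothetical, "NAND'ed" is interpreted here as excluding each deny clause with NOT, and the choice to fall back to {{*:*}} when a user has no allow clauses is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper; not part of the attached SolrACLSecurity code. */
final class AclFilterQuerySketch {

    /**
     * Builds one filter-query string from a user's cached lists:
     * allow clauses are OR'ed together, and every deny clause is
     * excluded with AND NOT.
     */
    static String buildFilterQuery(List<String> allows, List<String> denies) {
        // Assumption: with no allow clauses, start from "match everything"
        // so that the deny clauses still have something to subtract from.
        String positive = allows.isEmpty()
                ? "*:*"
                : allows.stream()
                        .map(q -> "(" + q + ")")
                        .collect(Collectors.joining(" OR ", "(", ")"));

        StringBuilder fq = new StringBuilder(positive);
        for (String deny : denies) {
            fq.append(" AND NOT (").append(deny).append(")");
        }
        return fq.toString();
    }
}
```

For example, allow clauses {{type:public}} and {{owner:bob}} with a deny clause {{somefield:secret}} produce {{((type:public) OR (owner:bob)) AND NOT (somefield:secret)}}, which a component could then add to the request as an fq parameter.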
The attached rar holds the source and associated configuration. This has been tested on the 1.4 release codebase (search in the attached solrconfig.xml for SolrACLSecurity to find the relevant sections in this file).

I hope this proves helpful for people who are looking for this sort of functionality in Solr, and more generally to address how such a mechanism could ultimately be integrated into a future Solr release.

Many thanks,
Peter

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
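The failed-login lockout described in the Authentication section of the issue above can be sketched as follows. This is an illustration only, not the attached implementation: the class name and methods are hypothetical, and the unit of lockoutTime is assumed to be milliseconds here because the description does not state it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lockout tracker; not part of the attached SolrACLSecurity code. */
final class LoginLockoutSketch {
    private final int lockoutThreshold;
    private final long lockoutTimeMillis; // assumption: milliseconds
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> lockedUntil = new HashMap<>();

    LoginLockoutSketch(int lockoutThreshold, long lockoutTimeMillis) {
        this.lockoutThreshold = lockoutThreshold;
        this.lockoutTimeMillis = lockoutTimeMillis;
    }

    /** Returns true while the user's lockout period has not yet elapsed. */
    synchronized boolean isLockedOut(String user, long nowMillis) {
        Long until = lockedUntil.get(user);
        if (until == null) {
            return false;
        }
        if (nowMillis >= until) {
            // Lockout has elapsed: clear the state so the user starts afresh.
            lockedUntil.remove(user);
            failures.remove(user);
            return false;
        }
        return true;
    }

    /** Counts a failed login; locks the user once the count exceeds the threshold. */
    synchronized void recordFailure(String user, long nowMillis) {
        int count = failures.merge(user, 1, Integer::sum);
        if (count > lockoutThreshold) {
            lockedUntil.put(user, nowMillis + lockoutTimeMillis);
        }
    }

    /** Clears the failure count after a successful login. */
    synchronized void recordSuccess(String user) {
        failures.remove(user);
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the class, is a design choice made purely so the sketch is easy to test.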
[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused
[ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853168#action_12853168 ]

Peter Sturge commented on SOLR-1143:
------------------------------------

This is a cool patch - yes, very useful. I've found a couple of issues with it, though:

1. When going through the 'waiting for shard replies' loop, because no exception is thrown on shard failure, the next block after the loop can throw a NullPointerException in {{SearchComponent.handleResponses()}} for any SearchComponent that checks shard responses. This may not always happen, but it certainly happens in FacetComponent when date_facets are turned on.
2. There's a bit of code that sets {{partialResults=true}} if there's at least one failure, but it doesn't set it to false if everything's ok. In order for the patch to operate, this parameter must have already been present and true, otherwise the patch is essentially 'disabled' anyway (the problem of using the same parameter as both input and result).

I've made some modifications to the patch for these and a couple of other things:

1. FacetComponent modified to check for a null shard response. Perhaps it would be better to check this in SearchHandler.handleResponses(), but then no SearchComponents would be contacted re failed shards, even if they don't care that it's failed (is that a good thing?).
2. Added a new CommonParams parameter called FAILED_SHARDS. {{partialResults}} is now only an input parameter to enable the feature. (Note: {{partialResults}} is referenced in RequestHandlerBase, but it's not from the patch - is this an existing parameter that is used for something else?! If so, perhaps the name should be changed to something like {{allowPartialResults}} to avoid backward-compatibility and other potential conflicts.) The output parameter that goes in the response header is now: {{failedShards=shard0;shard1;shardn}}. If everything succeeds, there will be no failedShards in the response header; otherwise, a list of failed shards is given. This is very useful to alert someone/something that a server/network needs attention (e.g. a health checker thread could run empty distributed searches solely for the purpose of checking status).
3. Changed the detection of a shard request error to be any Exception, rather than just ConnectException. This way, any failure is caught and can be actioned.

Possible TODO: it might be nice to include a short message (Exception class name?) in the FAILED_SHARDS parameter about what failed (e.g. ConnectException, IOException, etc.). If you like this idea, please say so, and I'll include it - i.e. something like:
{{failedShards=myshard:8983/solr/core0|ConnectException;myothershard:8983/solr/core0|IOException}}

I'm currently testing these changes in our internal build. In the meantime, any comments are greatly appreciated. If there are no objections, I'll add a patch update when the dev test run is complete.

> Return partial results when a connection to a shard is refused
> --------------------------------------------------------------
>
> Key: SOLR-1143
> URL: https://issues.apache.org/jira/browse/SOLR-1143
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Nicolas Dessaigne
> Assignee: Grant Ingersoll
> Fix For: 1.5
> Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch
>
> If any shard is down in a distributed search, a ConnectException is thrown.
> Here's a little patch that changes this behaviour: if we can't connect to a shard (ConnectException), we get partial results from the active shards. As for the TimeOut parameter (https://issues.apache.org/jira/browse/SOLR-502), we set the parameter "partialResults" to true.
> This patch also addresses a problem expressed on the mailing list about a year ago (http://www.nabble.com/partialResults,-distributed-search---SOLR-502-td19002610.html).
> We have a use case that needs this behaviour and we would like to know your thoughts about it. Should it be the default behaviour for distributed search?
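As an illustration of the {{failedShards}} value proposed in the comment above (shard entries separated by ';', with an optional '|' and exception class name), the following sketch shows one way such a value could be assembled on the server side and read back by a monitoring client. The class and method names are hypothetical and are not part of the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper; illustrates the proposed failedShards format only. */
final class FailedShardsSketch {

    /** Formats shard -> exception pairs as "shard|ExceptionName;shard|ExceptionName". */
    static String format(Map<String, Exception> failures) {
        StringJoiner joiner = new StringJoiner(";");
        for (Map.Entry<String, Exception> e : failures.entrySet()) {
            joiner.add(e.getKey() + "|" + e.getValue().getClass().getSimpleName());
        }
        return joiner.toString();
    }

    /** Parses a failedShards value back into shard -> exception-name pairs. */
    static Map<String, String> parse(String value) {
        Map<String, String> result = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return result; // no failedShards entry means every shard answered
        }
        for (String entry : value.split(";")) {
            int bar = entry.lastIndexOf('|');
            if (bar < 0) {
                result.put(entry, ""); // plain "shard0;shard1" form, no reason given
            } else {
                result.put(entry.substring(0, bar), entry.substring(bar + 1));
            }
        }
        return result;
    }
}
```

A health checker of the kind mentioned in the comment could call {{parse}} on the response-header value and raise an alert for each key in the returned map.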
[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries
[ https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1861: --- Attachment: SearchHandler.java A small update to this patch to support distributed searches with multiple cores. > HTTP Authentication for sharded queries > --- > > Key: SOLR-1861 > URL: https://issues.apache.org/jira/browse/SOLR-1861 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.4 > Environment: Solr 1.4 >Reporter: Peter Sturge >Priority: Minor > Attachments: SearchHandler.java, SearchHandler.java > > > This issue came out of a requirement to have HTTP authentication for queries. > Currently, HTTP authentication works for querying single servers, but it's > not possible for distributed searches across multiple shards to receive > authenticated http requests. > This patch adds the option for Solr clients to pass shard-specific http > credentials to SearchHandler, which can then use these credentials when > making http requests to shards. > Here's how the patch works: > A final constant String called {{shardcredentials}} acts as the name of the > SolrParams parameter key name. > The format for the value associated with this key is a comma-delimited list > of colon-separated tokens: > {{ > shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN > }} > A client adds these parameters to their sharded request. > In the absence of {{shardcredentials}} and/or matching credentials, the patch > reverts to the existing behaviour of using a default http client (i.e. no > credentials). This ensures b/w compatibility. > When SearchHandler receives the request, it passes the 'shardcredentials' > parameter to the HttpCommComponent via the submit() method. 
> The HttpCommComponent parses the parameter string, and when it finds matching > credentials for a given shard, it creates an HttpClient object with those > credentials, and then sends the request using this. > Note: Because the match comparison is a string compare (a.o.t. dns compare), > the host/ip names used in the shardcredentials parameters must match those > used in the shards parameter. > Impl Notes: > This patch is used and tested on the 1.4 release codebase. There weren't any > significant diffs between the 1.4 release and the latest trunk for > SearchHandler, so should be fine on other trunks, but I've only tested with > the 1.4 release code base. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
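The parsing and matching described above can be sketched as below. This is not the code from the attached SearchHandler.java: the class and method names are hypothetical, and two details are assumptions made for the example, namely that each shard is matched on the host:port prefix of its entry in the shards parameter, and that a password may contain ':' but not ','.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the shardcredentials handling; not the patch itself.
final class ShardCredentials {
    final String username;
    final String password;

    private ShardCredentials(String username, String password) {
        this.username = username;
        this.password = password;
    }

    /** Parses "host:port:user:pass,host:port:user:pass" into a map keyed by "host:port". */
    static Map<String, ShardCredentials> parse(String param) {
        Map<String, ShardCredentials> byHostPort = new HashMap<>();
        if (param == null) {
            return byHostPort;
        }
        for (String token : param.split(",")) {
            // Limit of 4 keeps any ':' inside the password intact (assumption).
            String[] parts = token.trim().split(":", 4);
            if (parts.length != 4) {
                continue; // malformed entry: skip it and fall back to the default client
            }
            byHostPort.put(parts[0] + ":" + parts[1], new ShardCredentials(parts[2], parts[3]));
        }
        return byHostPort;
    }

    /**
     * Plain string match (no DNS resolution) on the "host:port" prefix of a shard
     * such as "host0:8983/solr/core0". Returns null when nothing matches, in which
     * case the caller would use the default, unauthenticated http client.
     */
    static ShardCredentials find(Map<String, ShardCredentials> byHostPort, String shard) {
        int slash = shard.indexOf('/');
        String hostPort = slash < 0 ? shard : shard.substring(0, slash);
        return byHostPort.get(hostPort);
    }
}
```

Because the lookup is a string comparison, a shard listed as 192.168.0.10:8983/solr would not pick up credentials registered under its host name, which is the caveat the note above describes.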
[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries
[ https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1861: --- Attachment: SearchHandler.java Apologies that this is the source file and not a diff'ed patch file. I've tried so many Win doze svn products, but I just can't get them to create a patch file (I'm sure this is more down to me not configuring them correctly, rather than rapidsvn, visualsvn, Tortoisesvn etc.). If someone would like to create a patch file from this source, that would be extraordinarily kind of you! In any case, the changes to this file are quite straightforward.
[jira] Created: (SOLR-1861) HTTP Authentication for sharded queries
HTTP Authentication for sharded queries --- Key: SOLR-1861 URL: https://issues.apache.org/jira/browse/SOLR-1861 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor This issue came out of a requirement to have HTTP authentication for queries. Currently, HTTP authentication works for querying single servers, but it's not possible for distributed searches across multiple shards to receive authenticated http requests. This patch adds the option for Solr clients to pass shard-specific http credentials to SearchHandler, which can then use these credentials when making http requests to shards. Here's how the patch works: A final constant String called {{shardcredentials}} acts as the name of the SolrParams parameter key name. The format for the value associated with this key is a comma-delimited list of colon-separated tokens: {{ shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN }} A client adds these parameters to their sharded request. In the absence of {{shardcredentials}} and/or matching credentials, the patch reverts to the existing behaviour of using a default http client (i.e. no credentials). This ensures b/w compatibility. When SearchHandler receives the request, it passes the 'shardcredentials' parameter to the HttpCommComponent via the submit() method. The HttpCommComponent parses the parameter string, and when it finds matching credentials for a given shard, it creates an HttpClient object with those credentials, and then sends the request using this. Note: Because the match comparison is a string compare (a.o.t. dns compare), the host/ip names used in the shardcredentials parameters must match those used in the shards parameter. Impl Notes: This patch is used and tested on the 1.4 release codebase. 
There weren't any significant diffs between the 1.4 release and the latest trunk for SearchHandler, so should be fine on other trunks, but I've only tested with the 1.4 release code base. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850159#action_12850159 ] Peter Sturge commented on SOLR-1672: I agree there's some refactoring to do to bring it in line with current FacetParams conventions. At the same time, it would be good to look at wrapping up the functionality into a method, and covering all the code paths in the way you describe. I've been wanting to get to finishing off this patch, but I'm in the throes of a product release myself, so I've not had many spare cycles. You mention termenum, fieldcache, uninverted - presumably, these are among the code paths that need to cater for facet counts. If you know them, can you add a comment here that lists all the areas that need to be catered for, so that none are left out (if it's more than those 3). Thanks! Peter > RFE: facet reverse sort count > - > > Key: SOLR-1672 > URL: https://issues.apache.org/jira/browse/SOLR-1672 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.4 > Environment: Java, Solrj, http >Reporter: Peter Sturge >Priority: Minor > Attachments: SOLR-1672.patch > > Original Estimate: 0h > Remaining Estimate: 0h > > As suggested by Chris Hostetter, I have added an optional Comparator to the > BoundedTreeSet in the UnInvertedField class. > This optional comparator is used when a new (and also optional) field facet > parameter called 'facet.sortorder' is set to the string 'dsc' > (e.g. &f.<fieldname>.facet.sortorder=dsc for per field, or > &facet.sortorder=dsc for all facets). > Note that this parameter has no effect if facet.method=enum. > Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to > its default behaviour. > > This change affects 2 source files: > > UnInvertedField.java > [line 438] The getCounts() method signature is modified to add the > 'facetSortOrder' parameter value to the end of the argument list.
> > DIFF UnInvertedField.java:
> - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
> + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {
> [line 556] The getCounts() method is modified to create an overridden BoundedTreeSet(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.
> DIFF UnInvertedField.java:
> - final BoundedTreeSet queue = new BoundedTreeSet(maxsize);
> + final BoundedTreeSet queue = (sort.equals("count") || sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new BoundedTreeSet(maxsize, new Comparator()
> { @Override
> public int compare(Object o1, Object o2)
> {
> if (o1 == null || o2 == null)
> return 0;
> int result = ((Long) o1).compareTo((Long) o2);
> return (result != 0 ? result > 0 ? -1 : 1 : 0); //lowest number first sort
> }}) : new BoundedTreeSet(maxsize)) : null;
>
> SimpleFacets.java
> [line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to retrieve the new parameter, if present. 'asc' used as a default value.
> DIFF SimpleFacets.java:
> + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");
>
> [line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.
> DIFF SimpleFacets.java:
> - counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix);
> + counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix, facetSortOrder);
> Implementation Notes:
> I have noted in testing that I was not able to retrieve any '0' counts as I > had expected.
> I believe this could be because there appear to be some optimizations in > SimpleFacets/count caching such that zero counts are not iterated (at least > not by default) > as a performance enhancement. > I could be wrong about this, and zero counts may appear under some other as > yet untested circumstances. Perhaps an expert familiar with this part of the > code can clarify. > In fact, this is not such a bad thing (at least for my requirements), as a > whole bunch of zero counts is not necessarily useful (for my requirements, > starting at '1' is just right). > > There may, however, be instances where someone *will* want zero counts - e.g. > searching for zero product stock counts (e.g. 'what have we run out of'). I > was envisioning the facet.mincount field > being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 > or possibly higher), but because of the caching/optimizati
[jira] Updated: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1729: --- Attachment: UnInvertedField.java Hi Thomas, Thanks for catching this. I thought I'd attached that one. *sigh* Honestly, that is really slack of me - many apologies. The attached UnInvertedField.java has the updated getCounts() method. Any troubles, let me know. Thanks! Peter > Date Facet now override time parameter > -- > > Key: SOLR-1729 > URL: https://issues.apache.org/jira/browse/SOLR-1729 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.4 > Environment: Solr 1.4 >Reporter: Peter Sturge >Priority: Minor > Attachments: FacetParams.java, SimpleFacets.java, UnInvertedField.java > > > This PATCH introduces a new query parameter that tells a (typically, but not > necessarily) remote server what time to use as 'NOW' when calculating date > facets for a query (and, for the moment, date facets *only*) - overriding the > default behaviour of using the local server's current time. > This gets 'round a problem whereby an explicit time range is specified in a > query (e.g. timestamp:[then0 TO then1]), and date facets are required for the > given time range (in fact, any explicit time range). > Because DateMathParser performs all its calculations from 'NOW', remote > callers have to work out how long ago 'then0' and 'then1' are from 'now', and > use the relative-to-now values in the facet.date.xxx parameters. If a remote > server has a different opinion of NOW compared to the caller, the results > will be skewed (e.g. they are in a different time-zone, not time-synced etc.). > This becomes particularly salient when performing distributed date faceting > (see SOLR-1709), where multiple shards may all be running with different > times, and the faceting needs to be aligned. 
> The new parameter is called 'facet.date.now', and takes as a parameter a > (stringified) long that is the number of milliseconds from the epoch (1 Jan > 1970 00:00 UTC) - i.e. the returned value from a System.currentTimeMillis() call. > This was chosen over a formatted date to delineate it from a 'searchable' > time and to avoid superfluous date parsing. This makes the value generally a > programmatically-set value, but as that is where the use-case is for this type > of parameter, this should be ok. > NOTE: This parameter affects date facet timing only. If there are other areas > of a query that rely on 'NOW', these will not interpret this value. This is a > broader issue about setting a 'query-global' NOW that all parts of query > analysis can share. > Source files affected: > FacetParams.java (holds the new constant FACET_DATE_NOW) > SimpleFacets.java getFacetDateCounts() NOW parameter modified > This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as > it's a general change for date faceting, it was deemed deserving of its own > patch. I will be updating SOLR-1709 in due course to include the use of this > new parameter, after some rfc acceptance. > A possible enhancement to this is to detect facet.date fields, look for and > match these fields in queries (if they exist), and potentially determine > automatically the required time skew, if any. There are a whole host of > reasons why this could be problematic to implement, so an explicit > facet.date.now parameter is the safest route.
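To make the parameter's round trip concrete, here is a small sketch of both ends. The parameter name 'facet.date.now' and the constant name FACET_DATE_NOW come from the description above; the class FacetDateNow, its methods, and the choice to fall back to the local clock on an unparseable value are assumptions made for the example, not the behaviour of the attached patch.

```java
// Hypothetical helper illustrating the facet.date.now round trip; not the patch code.
final class FacetDateNow {
    static final String FACET_DATE_NOW = "facet.date.now";

    private FacetDateNow() {}

    /** Caller side: append the caller's idea of NOW (epoch milliseconds) to a query string. */
    static String withNow(String baseQuery, long nowMillis) {
        return baseQuery + "&" + FACET_DATE_NOW + "=" + nowMillis;
    }

    /**
     * Server side: use the supplied value if present and numeric, otherwise the
     * server's own clock. Falling back on a malformed value is an assumption.
     */
    static long resolveNow(String rawParam, long localNowMillis) {
        if (rawParam == null) {
            return localNowMillis;
        }
        try {
            return Long.parseLong(rawParam.trim());
        } catch (NumberFormatException e) {
            return localNowMillis;
        }
    }
}
```

A caller would typically compute the value once, for example FacetDateNow.withNow(query, System.currentTimeMillis()), and send the same string to every server so that all date facet calculations share one 'NOW'.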
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834222#action_12834222 ] Peter Sturge commented on SOLR-1709: Hi Thomas, Hmmm...TermsHelper is an inner class inside TermsComponent. In the code base that I have, this class exists within TermsComponent. I've just had a look on the http://mirrors.dedipower.com/ftp.apache.org/lucene/solr/1.4.0/ mirror, and the TermsComponent *doesn't* have this inner class. Not sure where the difference is, as I would have got my codebase from the same set of mirrors as you (unless some mirrors are out-of-sync?). TermsComponent hasn't changed in this patch, so I don't know much about this class. One thing to try is to diff the 2 files above with your 1.4 codebase, and merge the changes into your codebase. The differences should be very easy to see. This does highlight the very good policy of posting patch files as attachments rather than source files. This is my fault, as we don't use svn in our (win) environment, and Tortoise SVN crashes explorer64, so I'm not able to make compatible diff files - sorry. If you do create a couple of diff files, it would be very kind of you if you could post them up on this issue for others? Thanks! > Distributed Date Faceting > - > > Key: SOLR-1709 > URL: https://issues.apache.org/jira/browse/SOLR-1709 > Project: Solr > Issue Type: Improvement > Components: SearchComponents - other >Affects Versions: 1.4 >Reporter: Peter Sturge >Priority: Minor > Attachments: FacetComponent.java, FacetComponent.java, > ResponseBuilder.java > > > This patch is for adding support for date facets when using distributed > searches. > Date faceting across multiple machines exposes some time-based issues that > anyone interested in this behaviour should be aware of: > Any time and/or time-zone differences are not accounted for in the patch > (i.e.
merged date facets are at a time-of-day, not necessarily at a universal > 'instant-in-time', unless all shards are time-synced to the exact same time). > The implementation uses the first encountered shard's facet_dates as the > basis for subsequent shards' data to be merged in. > This means that if subsequent shards' facet_dates are skewed in relation to > the first by >1 'gap', these 'earlier' or 'later' facets will not be merged > in. > There are several reasons for this: > * Performance: It's faster to check facet_date lists against a single map's > data, rather than against each other, particularly if there are many shards > * If 'earlier' and/or 'later' facet_dates are added in, this will make the > time range larger than that which was requested > (e.g. a request for one hour's worth of facets could bring back 2, 3 > or more hours of data) > This could be dealt with if timezone and skew information was added, and > the dates were normalized. > One possibility for adding such support is to [optionally] add 'timezone' and > 'now' parameters to the 'facet_dates' map. This would tell requesters what > time and TZ the remote server thinks it is, and so multiple shards' time data > can be normalized. > The patch affects 2 files in the Solr core: > org.apache.solr.handler.component.FacetComponent.java > org.apache.solr.handler.component.ResponseBuilder.java > The main changes are in FacetComponent - ResponseBuilder is just to hold the > completed SimpleOrderedMap until the finishStage. > One possible enhancement is to perhaps make this an optional parameter, but > really, if facet.date parameters are specified, it is assumed they are > desired. > Comments & suggestions welcome. > As a favour to ask, if anyone could take my 2 source files and create a PATCH > file from it, it would be greatly appreciated, as I'm having a bit of trouble > with svn (don't shoot me, but my environment is a Redmond-based os company). -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
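The merge strategy described in the SOLR-1709 description above (the first shard's facet_dates form the basis, and buckets from later shards that fall outside it are not merged in) can be sketched like this. It is a simplified illustration using plain maps of bucket label to count, not the FacetComponent code from the patch; DateFacetMerger and merge are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of "first shard is the basis" merging; not the patch code.
final class DateFacetMerger {
    private DateFacetMerger() {}

    /** Each map is one shard's facet_dates for a field: bucket start -> count. */
    static Map<String, Integer> merge(List<Map<String, Integer>> shardFacets) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        boolean first = true;
        for (Map<String, Integer> shard : shardFacets) {
            if (first) {
                merged.putAll(shard); // the first shard defines the set of buckets
                first = false;
                continue;
            }
            for (Map.Entry<String, Integer> bucket : shard.entrySet()) {
                Integer current = merged.get(bucket.getKey());
                if (current != null) {
                    merged.put(bucket.getKey(), current + bucket.getValue());
                }
                // Buckets unknown to the first shard ('earlier' or 'later' ones) are
                // dropped, so the merged range never exceeds what was requested.
            }
        }
        return merged;
    }
}
```

Each later shard is checked against a single map rather than against every other shard, which is the performance argument given in the description.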
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829043#action_12829043 ] Peter Sturge commented on SOLR-1729: Hi Chris, Thanks for your comments - I hope I didn't sound like your comments were taken wrongly - I absolutely count on comments from you and other experts to make sure I'm not missing some important functionality and/or side effect. You know the code base far better than I, so it's great that you take the time to point out all the different bits and pieces that need addressing. I can certainly understand the need to address the 'core-global' issues raised by you and Yonik for storing a ThreadLocal 'query-global' 'NOW'. I suppose the main issue in implementing the thread-local route is that we'd have to make sure we found every place in the query core that references now, and point those references to the new variable? If the 'code-at-large' [hopefully] always calls the date math routines for finding 'NOW', great, it should be relatively straightforward. If there are any stray e.g. System.currentTimeMillis(), then it's a bit more fiddly, but still do-able. ??it's all handled internally by DateField?? Sounds like DateField would be the best candidate for holding the ThreadLocal? The query handler code can set the variable of its DateField instance if it's set in a query parameter, otherwise it just defaults to its own local (UTC) time. Could be done similarly to DateField.ThreadLocalDateFormat, perhaps?
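The thread-local 'query-global' NOW discussed in this comment might look something like the following. This is purely a sketch of the idea: QueryNow is a hypothetical class, not an existing Solr one, and where such a holder would actually live (DateField or elsewhere) is exactly the open question in the comment.

```java
// Hypothetical holder for a per-request NOW; a sketch of the idea, not Solr code.
final class QueryNow {
    private static final ThreadLocal<Long> NOW = new ThreadLocal<>();

    private QueryNow() {}

    /** Called by the request handler when the client supplied an explicit NOW. */
    static void set(long epochMillis) {
        NOW.set(epochMillis);
    }

    /** Called by date math code instead of System.currentTimeMillis(). */
    static long get() {
        Long explicit = NOW.get();
        return explicit != null ? explicit : System.currentTimeMillis();
    }

    /** Must be called when the request finishes, since worker threads are reused. */
    static void clear() {
        NOW.remove();
    }
}
```

The fiddly part noted in the comment remains: any code path that calls System.currentTimeMillis() directly would bypass a holder like this, so every such call would need to be found and redirected.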
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805995#action_12805995 ] Peter Sturge commented on SOLR-1729: ??...they might not all get queried at the exact same time?? I suppose this is what the explicit 'NOW' is meant to resolve - staggered/lagged receipt/response, and, in an ersatz fashion, discrepancies in local time sync. Since the passed-in 'NOW' is relative only to the epoch, network latency is handled, and time-sync on any given server is assumed to be correct. ??...multiple requests might be made to a single server for different phrases of the distributed request that expect to get the same answers.?? As long as the same code path is followed for such requests, it should honour the same (passed-in) 'NOW'. Are there scenarios where this is not the case? In which case, yes, these would need to be addressed. ??...unless filter queries that use date math also respect it the counts returned from date faceting will still potentially be non-sensical.?? Filter queries will definitely need to get/use/honour the same 'NOW' as their corresponding query, otherwise anarchy will quickly ensue. Can you point me toward the class(es) where filter queries' date math lives, and I'll have a look? As filter queries are cached separately, can you think of any potential caching issues relating to filter queries?
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803860#action_12803860 ] Peter Sturge commented on SOLR-1729: I agree there are wider issues that relate to this -- this particular patch addresses the time sync issue for allowing distributed date facets to happen. In this case, you must have multiple cores using the same NOW for all, so that your date facets are consistent. In fact, it doesn't really matter which now you use, as long as they're all the same -- the caller setting the now value makes the most sense. For other time-related queries, this might not be the case, but as you rightly pointed out, these are not addressed here.
[jira] Commented: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803858#action_12803858 ]

Peter Sturge commented on SOLR-1672:

Jan, you are absolutely correct that the parameter should (and will) be 'desc'. I have an update in my queue of things to do which changes this, but also removes the new 'facet.sortorder' parameter and instead accepts 'facet.sort desc' as a valid value for facet.sort. This keeps things nice and tidy and consistent. The 'facet.sortorder' parameter was really a POC to try out the behaviour before changing the core syntax of the existing 'facet.sort' parameter. Now that's done, the parameter will be rolled into 'facet.sort'.

Thanks, Peter

> RFE: facet reverse sort count
> -
>
> Key: SOLR-1672
> URL: https://issues.apache.org/jira/browse/SOLR-1672
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 1.4
> Environment: Java, Solrj, http
> Reporter: Peter Sturge
> Priority: Minor
> Attachments: SOLR-1672.patch
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> As suggested by Chris Hostetter, I have added an optional Comparator to the BoundedTreeSet in the UnInvertedField class.
> This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. &f.<fieldname>.facet.sortorder=dsc for per field, or &facet.sortorder=dsc for all facets).
> Note that this parameter has no effect if facet.method=enum.
> Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour.
>
> This change affects 2 source files:
>
> UnInvertedField.java
> [line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list.
>
> DIFF UnInvertedField.java:
> - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
> + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {
>
> [line 556] The getCounts() method is modified to create an overridden BoundedTreeSet(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.
>
> DIFF UnInvertedField.java:
> - final BoundedTreeSet queue = new BoundedTreeSet(maxsize);
> + final BoundedTreeSet queue = (sort.equals("count") || sort.equals("true"))
> +     ? (facetSortOrder.equals("dsc")
> +         ? new BoundedTreeSet(maxsize, new Comparator() {
> +             @Override
> +             public int compare(Object o1, Object o2) {
> +               if (o1 == null || o2 == null)
> +                 return 0;
> +               int result = ((Long) o1).compareTo((Long) o2);
> +               return (result != 0 ? result > 0 ? -1 : 1 : 0); // lowest number first sort
> +             }
> +           })
> +         : new BoundedTreeSet(maxsize))
> +     : null;
>
> SimpleFacets.java
> [line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to retrieve the new parameter, if present. 'asc' is used as the default value.
>
> DIFF SimpleFacets.java:
> + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");
>
> [line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.
>
> DIFF SimpleFacets.java:
> - counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix);
> + counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix, facetSortOrder);
>
> Implementation Notes:
> I have noted in testing that I was not able to retrieve any '0' counts as I had expected.
> I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement.
> I could be wrong about this, and zero counts may appear under some other as yet untested circumstances. Perhaps an expert familiar with this part of the code can clarify.
> In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right).
>
> There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts (e.g. 'what have we run out of'). I was envisioning the facet.mincount field being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected.
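To make the intended ordering concrete, here is a small stand-alone sketch of sorting facet counts lowest-first or highest-first. It does not use Solr's BoundedTreeSet or UnInvertedField; the class and method names are made up for the example, and the tie-break on the term is an assumption added to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Sketch: order facet counts in either direction and keep the first `limit` entries. */
class FacetCountSort {

    /**
     * Returns up to `limit` (term, count) entries sorted by count.
     * lowestFirst = true gives the reversed order discussed in this issue;
     * ties are broken by term so the result is stable across runs.
     */
    static List<Map.Entry<String, Long>> sortByCount(
            Map<String, Long> counts, final boolean lowestFirst, int limit) {
        List<Map.Entry<String, Long>> entries =
                new ArrayList<Map.Entry<String, Long>>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Long>>() {
            @Override
            public int compare(Map.Entry<String, Long> a, Map.Entry<String, Long> b) {
                int byCount = a.getValue().compareTo(b.getValue());
                if (!lowestFirst) {
                    byCount = -byCount; // flip the direction for highest-first
                }
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        int end = Math.min(Math.max(limit, 0), entries.size());
        return new ArrayList<Map.Entry<String, Long>>(entries.subList(0, end));
    }
}
```

The patch itself swaps the comparator inside the bounded queue instead of sorting a full list, which avoids holding every term in memory; the sketch only shows the ordering rule.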
[jira] Updated: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1709: --- Attachment: FacetComponent.java Updated version of FacetComponent.java after more testing and sync with FacetParams.FACET_DATE_NOW (see SOLR-1729). For use with the 1.4 trunk (along with the existing ResponseBuilder.java in this patch). > Distributed Date Faceting > - > > Key: SOLR-1709 > URL: https://issues.apache.org/jira/browse/SOLR-1709 > Project: Solr > Issue Type: Improvement > Components: SearchComponents - other >Affects Versions: 1.4 >Reporter: Peter Sturge >Priority: Minor > Attachments: FacetComponent.java, FacetComponent.java, > ResponseBuilder.java > > > This patch is for adding support for date facets when using distributed > searches. > Date faceting across multiple machines exposes some time-based issues that > anyone interested in this behaviour should be aware of: > Any time and/or time-zone differences are not accounted for in the patch > (i.e. merged date facets are at a time-of-day, not necessarily at a universal > 'instant-in-time', unless all shards are time-synced to the exact same time). > The implementation uses the first encountered shard's facet_dates as the > basis for subsequent shards' data to be merged in. > This means that if subsequent shards' facet_dates are skewed in relation to > the first by >1 'gap', these 'earlier' or 'later' facets will not be merged > in. > There are several reasons for this: > * Performance: It's faster to check facet_date lists against a single map's > data, rather than against each other, particularly if there are many shards > * If 'earlier' and/or 'later' facet_dates are added in, this will make the > time range larger than that which was requested > (e.g. a request for one hour's worth of facets could bring back 2, 3 > or more hours of data) > This could be dealt with if timezone and skew information was added, and > the dates were normalized. 
> One possibility for adding such support is to [optionally] add 'timezone' and > 'now' parameters to the 'facet_dates' map. This would tell requesters what > time and TZ the remote server thinks it is, and so multiple shards' time data > can be normalized. > The patch affects 2 files in the Solr core: > org.apache.solr.handler.component.FacetComponent.java > org.apache.solr.handler.component.ResponseBuilder.java > The main changes are in FacetComponent - ResponseBuilder is just to hold the > completed SimpleOrderedMap until the finishStage. > One possible enhancement is to perhaps make this an optional parameter, but > really, if facet.date parameters are specified, it is assumed they are > desired. > Comments & suggestions welcome. > As a favour to ask, if anyone could take my 2 source files and create a PATCH > file from it, it would be greatly appreciated, as I'm having a bit of trouble > with svn (don't shoot me, but my environment is a Redmond-based os company). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
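The merge rule described above (the first shard's facet_dates define the buckets, and later shards only add to buckets that already exist) can be sketched roughly as follows. This is not the code from FacetComponent.java; the class name and the use of plain maps keyed by the bucket's date string are assumptions made for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: merge per-shard date facet counts using the first shard as the basis. */
class DateFacetMerge {

    /**
     * Each map goes from a bucket start (as a date string) to its count.
     * Buckets that appear only in later shards are dropped, so the merged
     * result never covers a wider time range than the first shard returned.
     */
    static Map<String, Integer> merge(List<Map<String, Integer>> shardFacets) {
        Map<String, Integer> merged = new LinkedHashMap<String, Integer>();
        boolean first = true;
        for (Map<String, Integer> shard : shardFacets) {
            if (first) {
                merged.putAll(shard); // the first shard fixes the set of buckets
                first = false;
                continue;
            }
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                Integer current = merged.get(e.getKey());
                if (current != null) {
                    merged.put(e.getKey(), current + e.getValue());
                }
                // Buckets unknown to the first shard are skewed and are skipped.
            }
        }
        return merged;
    }
}
```

Each later bucket costs one map lookup, which is the performance argument given above. The trade-off is that counts from a skewed shard are silently lost unless all shards agree on NOW.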
[jira] Updated: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1729:
---
Attachment: FacetParams.java
            SimpleFacets.java

These are the source files affected by this patch. Apologies for not creating a PATCH file - my Tortoise SVN is not working for creating patch files. If anyone would like to create a patch from these, that would be extraordinarily kind of you!

Diff: (trunk: 1.4 Release)

FacetParams.java: Add at line 179:

  /**
   * String that tells the date facet counter what time to use as 'now'.
   *
   * The value of this parameter, if it exists, must be a stringified long
   * of the number of milliseconds since the epoch (milliseconds since 1 Jan 1970 00:00).
   * System.currentTimeMillis() provides this.
   *
   * The DateField and DateMathParser work out their times relative to 'now'.
   * By default, 'now' is the local machine's System.currentTimeMillis().
   * This parameter overrides the local value to use a different time.
   * This is very useful for remote server queries where the times on the querying
   * machine are skewed/different than that of the date faceting machine.
   * This is a facet.date global query parameter (i.e. not per field)
   * @see DateMathParser
   * @see DateField
   */
  public static final String FACET_DATE_NOW = "facet.date.now";

SimpleFacets.java: Change at line 551:

- final Date NOW = new Date();
+ final Date NOW = new Date(params.get(FacetParams.FACET_DATE_NOW) != null ? Long.parseLong(params.get("facet.date.now")) : System.currentTimeMillis());
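The SimpleFacets.java change passes the parameter straight to Long.parseLong(), which throws NumberFormatException if a caller sends something that is not a long. A more defensive variant might look like the sketch below. It is not part of the attached files; the class and method names are hypothetical, and falling back to the local clock (rather than rejecting the request) is an assumption about the desired behaviour.

```java
import java.util.Date;

/** Sketch: resolve the 'now' used for date faceting from an optional request parameter. */
class FacetDateNow {

    /**
     * @param paramValue  raw value of facet.date.now, or null if the caller did not send it
     * @param localMillis the local clock, normally System.currentTimeMillis()
     * @return the caller-supplied instant when it parses as a long, otherwise the local one
     */
    static Date resolveNow(String paramValue, long localMillis) {
        if (paramValue == null || paramValue.trim().isEmpty()) {
            return new Date(localMillis);
        }
        try {
            return new Date(Long.parseLong(paramValue.trim()));
        } catch (NumberFormatException e) {
            // Malformed value: fall back to the local clock instead of failing the query.
            return new Date(localMillis);
        }
    }
}
```

Whether a malformed value should fall back quietly or produce an error is a design choice; an error is arguably safer for distributed faceting, where a silent fallback would bring the skew back.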
[jira] Created: (SOLR-1729) Date Facet now override time parameter
Date Facet now override time parameter
--

Key: SOLR-1729
URL: https://issues.apache.org/jira/browse/SOLR-1729
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor

This PATCH introduces a new query parameter that tells a (typically, but not necessarily) remote server what time to use as 'NOW' when calculating date facets for a query (and, for the moment, date facets *only*) - overriding the default behaviour of using the local server's current time.

This gets 'round a problem whereby an explicit time range is specified in a query (e.g. timestamp:[then0 TO then1]), and date facets are required for the given time range (in fact, any explicit time range).

Because DateMathParser performs all its calculations from 'NOW', remote callers have to work out how long ago 'then0' and 'then1' are from 'now', and use the relative-to-now values in the facet.date.xxx parameters. If a remote server has a different opinion of NOW compared to the caller (e.g. it is in a different time-zone, or not time-synced), the results will be skewed.

This becomes particularly salient when performing distributed date faceting (see SOLR-1709), where multiple shards may all be running with different times, and the faceting needs to be aligned.

The new parameter is called 'facet.date.now', and takes as its value a (stringified) long that is the number of milliseconds since the epoch (1 Jan 1970 00:00) - i.e. the value returned by a System.currentTimeMillis() call. This was chosen over a formatted date to distinguish it from a 'searchable' time and to avoid superfluous date parsing. This makes the value generally a programmatically set value, but as that is where the use-case is for this type of parameter, this should be ok.

NOTE: This parameter affects date facet timing only. If there are other areas of a query that rely on 'NOW', these will not interpret this value. This is a broader issue about setting a 'query-global' NOW that all parts of query analysis can share.

Source files affected:
FacetParams.java (holds the new constant FACET_DATE_NOW)
SimpleFacets.java getFacetDateCounts() NOW parameter modified

This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as it's a general change for date faceting, it was deemed deserving of its own patch. I will be updating SOLR-1709 in due course to include the use of this new parameter, after some RFC acceptance.

A possible enhancement to this is to detect facet.date fields, look for and match these fields in queries (if they exist), and potentially determine automatically the required time skew, if any. There are a whole host of reasons why this could be problematic to implement, so an explicit facet.date.now parameter is the safest route.
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798411#action_12798411 ]

Peter Sturge commented on SOLR-1709:

Yonik, Yes, I can see what you mean that of course NOW will affect anything date-related to a given query. I'm wondering whether the passing of 'NOW' to shards should be a separate issue/patch from this one (e.g. something like 'Time Sync to Remote Shards'), as its scope and ramifications go far beyond simply distributed date faceting. The whole area of code relating to date math is one that I'm not familiar with, but do let me know if there's anything you'd like me to look at.
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798233#action_12798233 ]

Peter Sturge commented on SOLR-1709:

Definitely true! -- messing about with Date strings isn't great for performance. As the NOW parameter would be for internal request use only (i.e. not for the indexer, not for human consumption), could it not just be an epoch long? The adjustment math should then be nice and quick (no string/date parsing/formatting; at worst just one Date.getTime() call if the time is stored locally as a Date).
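To show how cheap the adjustment is once both sides work in epoch longs, here is a tiny worked sketch. The class and method names are hypothetical and nothing here comes from the attached source files.

```java
/** Sketch: align a shard's timestamps with the caller's clock using epoch milliseconds. */
class EpochSkew {

    /** Offset to add to local times so that they line up with the caller's NOW. */
    static long skewMillis(long callerNowMillis, long localNowMillis) {
        return callerNowMillis - localNowMillis;
    }

    /** Shifts a locally computed instant onto the caller's timeline. */
    static long toCallerTime(long localTimestampMillis, long skewMillis) {
        return localTimestampMillis + skewMillis;
    }
}
```

For example, if the caller's NOW is 1000 ms and the shard believes it is 400 ms, the skew is 600 ms, and a local instant of 100 ms maps to 700 ms on the caller's timeline: one subtraction per request and one addition per value, with no date parsing.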
[jira] Updated: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1709:
---
Attachment: ResponseBuilder.java
            FacetComponent.java

Sorry, guys, can't get svn to create a patch file correctly on windows, so I'm attaching the source files here. With some time, which at the moment I don't have, I'm sure I could get svn working. Rather than anyone have to wait for me to get the patch file created, I thought it best to get the source uploaded, so people can start using it.

Thanks, Peter
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797957#action_12797957 ]

Peter Sturge commented on SOLR-1709:

I've heard of Tortoise, I'll give that a try, thanks. On the time-zone/skew issue, perhaps a more efficient approach would be a 'push' rather than 'pull' - i.e.: Requesters would include an optional parameter that told remote shards what time to use as 'NOW', and which TZ to use for date faceting. This would avoid having to translate loads of time strings at merge time.

Thanks, Peter
[jira] Created: (SOLR-1709) Distributed Date Faceting
Distributed Date Faceting - Key: SOLR-1709 URL: https://issues.apache.org/jira/browse/SOLR-1709 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4 Reporter: Peter Sturge Priority: Minor This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time). The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by >1 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this: * Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards * If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data) This could be dealt with if timezone and skew information was added, and the dates were normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, and so multiple shards' time data can be normalized. The patch affects 2 files in the Solr core: org.apache.solr.handler.component.FacetComponent.java org.apache.solr.handler.component.ResponseBuilder.java The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage. 
One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments & suggestions welcome. As a favour to ask, if anyone could take my 2 source files and create a PATCH file from it, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge resolved SOLR-1672.
Resolution: Fixed

Marking as resolved.

> RFE: facet reverse sort count
> -----------------------------
>
> Key: SOLR-1672
> URL: https://issues.apache.org/jira/browse/SOLR-1672
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 1.4
> Environment: Java, Solrj, http
> Reporter: Peter Sturge
> Priority: Minor
> Attachments: SOLR-1672.patch
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> As suggested by Chris Hostetter, I have added an optional Comparator to the BoundedTreeSet in the UnInvertedField class.
> This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc'
> (e.g. &f.<fieldname>.facet.sortorder=dsc for per field, or &facet.sortorder=dsc for all facets).
> Note that this parameter has no effect if facet.method=enum.
> Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour.
>
> This change affects 2 source files:
> > UnInvertedField.java
> [line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list.
>
> DIFF UnInvertedField.java:
> - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
> + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {
> [line 556] The getCounts() method is modified to create an overridden BoundedTreeSet(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.
> DIFF UnInvertedField.java:
> - final BoundedTreeSet queue = new BoundedTreeSet(maxsize);
> + final BoundedTreeSet queue = (sort.equals("count") || sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new BoundedTreeSet(maxsize, new Comparator()
> { @Override
> public int compare(Object o1, Object o2)
> {
> if (o1 == null || o2 == null)
> return 0;
> int result = ((Long) o1).compareTo((Long) o2);
> return (result != 0 ? result > 0 ? -1 : 1 : 0); //lowest number first sort
> }}) : new BoundedTreeSet(maxsize)) : null;
> > SimpleFacets.java
> [line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to retrieve the new parameter, if present. 'asc' is used as the default value.
> DIFF SimpleFacets.java:
> + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");
>
> [line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.
> DIFF SimpleFacets.java:
> - counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix);
> + counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix, facetSortOrder);
>
> Implementation Notes:
> I have noted in testing that I was not able to retrieve any '0' counts as I had expected.
> I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement.
> I could be wrong about this, and zero counts may appear under some other as yet untested circumstances. Perhaps an expert familiar with this part of the code can clarify.
> In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right).
>
> There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts (e.g. 'what have we run out of'). I was envisioning the facet.mincount field being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected.
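The lookup order described in the quoted issue (a per-field value, then the global value, then the 'asc' default) can be sketched as below. This is only an illustration under stated assumptions, not Solr's code: the class `FacetParamSketch`, its `getFieldParam` helper, and the field names `category` and `price` are hypothetical, and a plain `Map` stands in for the request parameters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for request-parameter lookup; not Solr's own classes.
class FacetParamSketch {
    // Returns f.<field>.<param> if set, else <param>, else the supplied default.
    static String getFieldParam(Map<String, String> params, String field,
                                String param, String def) {
        String perField = params.get("f." + field + "." + param);
        if (perField != null) {
            return perField;
        }
        String global = params.get(param);
        return global != null ? global : def;
    }

    // Only the exact string "dsc" selects the reversed ordering;
    // any other value (including none) keeps the default behaviour.
    static boolean isReversed(String facetSortOrder) {
        return "dsc".equals(facetSortOrder);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("f.category.facet.sortorder", "dsc");
        // Per-field value wins for "category"; "price" falls back to the default.
        System.out.println(getFieldParam(params, "category", "facet.sortorder", "asc"));
        System.out.println(getFieldParam(params, "price", "facet.sortorder", "asc"));
    }
}
```

Comparing with `"dsc".equals(value)` rather than `value.equals("dsc")` keeps the check safe if the value is ever null.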
[jira] Updated: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1672:
------------------------------
Remaining Estimate: 0h (was: 24h)
Original Estimate: 0h (was: 24h)
[jira] Commented: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792424#action_12792424 ]

Peter Sturge commented on SOLR-1672:
------------------------------------
Patch SOLR-1672.patch now included for review
[jira] Updated: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1672:
------------------------------
Attachment: SOLR-1672.patch

Patch diff file for adding facet reverse sorting
[jira] Created: (SOLR-1672) RFE: facet reverse sort count
RFE: facet reverse sort count
-----------------------------

Key: SOLR-1672
URL: https://issues.apache.org/jira/browse/SOLR-1672
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor

As suggested by Chris Hostetter, I have added an optional Comparator to the BoundedTreeSet in the UnInvertedField class.
This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc'
(e.g. &f.<fieldname>.facet.sortorder=dsc for per field, or &facet.sortorder=dsc for all facets).
Note that this parameter has no effect if facet.method=enum.
Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour.

This change affects 2 source files:
> UnInvertedField.java
[line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list.

DIFF UnInvertedField.java:
- public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
+ public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {
[line 556] The getCounts() method is modified to create an overridden BoundedTreeSet(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.
DIFF UnInvertedField.java:
- final BoundedTreeSet queue = new BoundedTreeSet(maxsize);
+ final BoundedTreeSet queue = (sort.equals("count") || sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new BoundedTreeSet(maxsize, new Comparator()
{ @Override
public int compare(Object o1, Object o2)
{
if (o1 == null || o2 == null)
return 0;
int result = ((Long) o1).compareTo((Long) o2);
return (result != 0 ? result > 0 ? -1 : 1 : 0); //lowest number first sort
}}) : new BoundedTreeSet(maxsize)) : null;
> SimpleFacets.java
[line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to retrieve the new parameter, if present. 'asc' is used as the default value.
DIFF SimpleFacets.java:
+ String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");

[line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.
DIFF SimpleFacets.java:
- counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix);
+ counts = uif.getCounts(searcher, base, offset, limit, mincount,missing,sort,prefix, facetSortOrder);

Implementation Notes:
I have noted in testing that I was not able to retrieve any '0' counts as I had expected.
I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement.
I could be wrong about this, and zero counts may appear under some other as yet untested circumstances. Perhaps an expert familiar with this part of the code can clarify.
In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right).

There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts (e.g. 'what have we run out of'). I was envisioning the facet.mincount field being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected.
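To illustrate why swapping the comparator changes which entries a size-bounded sorted set retains, here is a minimal sketch. It is not Solr's BoundedTreeSet or the attached patch: the class `BoundedSetSketch` and its methods are hypothetical, it uses a plain `java.util.TreeSet`, and it stores ordinary numbers, whereas the issue does not say how the queued Long values are encoded.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of a size-bounded sorted set; not Solr's BoundedTreeSet.
class BoundedSetSketch {
    // Keeps at most maxSize values, evicting whatever sorts last under the
    // given comparator (natural ordering when the comparator is null).
    static List<Long> keepFirst(List<Long> values, int maxSize,
                                Comparator<Long> comparator) {
        TreeSet<Long> set = new TreeSet<>(comparator);
        for (Long v : values) {
            set.add(v);
            while (set.size() > maxSize) {
                set.pollLast(); // drop the entry that sorts last
            }
        }
        return new ArrayList<>(set);
    }

    // "dsc" selects the reversed ordering; any other value keeps the default.
    static Comparator<Long> comparatorFor(String facetSortOrder) {
        return "dsc".equals(facetSortOrder) ? Comparator.<Long>reverseOrder() : null;
    }

    public static void main(String[] args) {
        List<Long> values = List.of(5L, 1L, 9L, 3L, 7L);
        System.out.println(keepFirst(values, 3, comparatorFor("asc")));
        System.out.println(keepFirst(values, 3, comparatorFor("dsc")));
    }
}
```

With the default ordering the set retains the three smallest values, and with the reversed comparator it retains the three largest. Unlike the anonymous Comparator in the diff, this sketch does not special-case null elements.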