[jira] Updated: (SOLR-1872) Document-level Access Control in Solr

2010-04-12 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1872:
---

Attachment: SolrACLSecurity.java

Updates a typo or two plus some misc tweaks.

{code}
<searchComponent name="SolrACLSecurity" class="org.apache.solr.handler.security.SolrACLSecurity">
  <!-- SolrACLSecurityKey can be any alphanumeric string, the more complex the better.
       For production environments, don't use the default value - create a new value.
       This property needs to be present in all firstSearcher and newSearcher warming queries,
       otherwise those requests will be blocked.
  -->
  <str name="SolrACLSecurityKey">zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2</str>
  <str name="config-file">acl.xml</str>
  <!-- Auditing: Set audit to true to log all searches, including failed access attempts -->
  <bool name="audit">true</bool>
  <int name="maxFileSizeInMB">10</int>
  <int name="maxFileCount">1</int>
  <str name="auditFile">audit.log</str>
  <!--
    User lockout
    'lockoutThreshold' is the number of consecutive incorrect logins before locking out the account
    'lockoutTime' is the number of minutes to lock out the account
    If 'lockoutThreshold' is 0 or less, account lockout is disabled (no accounts are ever locked out)
    If not specified, the default values are: lockoutThreshold=5 lockoutTime=15
  -->
  <str name="lockoutThreshold">5</str>
  <str name="lockoutTime">15</str>
</searchComponent>
{code}

Thanks,
Peter


 Document-level Access Control in Solr
 -

 Key: SOLR-1872
 URL: https://issues.apache.org/jira/browse/SOLR-1872
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SolrACLSecurity.java, SolrACLSecurity.java, 
 SolrACLSecurity.rar



[jira] Updated: (SOLR-1872) Document-level Access Control in Solr

2010-04-11 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1872:
---

Attachment: SolrACLSecurity.java

This update adds in optional auditing of searches by users and failed access 
attempts, plus a few minor tweaks.

To configure auditing, here is a sample searchComponent section from 
solrconfig.xml:

{{
<searchComponent name="SolrACLSecurity" class="org.apache.solr.handler.security.SolrACLSecurity">
  <!-- SolrACLSecurityKey can be any alphanumeric string, the more complex the better.
       For production environments, don't use the default value - create a new value.
       This property needs to be present in all firstSearcher and newSearcher warming queries,
       otherwise those requests will be blocked.
  -->
  <str name="SolrACLSecurityKey">zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2</str>
  <str name="config-file">acl.xml</str>
  <!-- Auditing: Set audit to true to log all searches, including failed access attempts -->
  <bool name="audit">true</bool>
  <int name="maxFileSizeInMB">10</int>
  <int name="maxFileCount">1</int>
  <str name="auditFile">audit.log</str>
  <!--
    User lockout
    'lockoutThreshold' is the number of consecutive incorrect logins before locking out the account
    'lockoutTime' is the number of minutes to lock out the account
    If 'lockoutThreshold' is 0 or less, account lockout is disabled (no accounts are ever locked out)
    If not specified, the default values are: lockoutThreshold=5 lockoutTime=15
  -->
  <str name="lockoutThreshold">5</str>
  <str name="lockoutTime">15</str>
</searchComponent>
}}


 Document-level Access Control in Solr
 -

 Key: SOLR-1872
 URL: https://issues.apache.org/jira/browse/SOLR-1872
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SolrACLSecurity.java, SolrACLSecurity.rar



[jira] Issue Comment Edited: (SOLR-1872) Document-level Access Control in Solr

2010-04-11 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855698#action_12855698
 ] 

Peter Sturge edited comment on SOLR-1872 at 4/11/10 6:16 AM:
-

This update adds in optional auditing of searches by users and failed access 
attempts, plus a few minor tweaks.

To configure auditing, here is a sample searchComponent section from 
solrconfig.xml:

{{<searchComponent name="SolrACLSecurity" class="org.apache.solr.handler.security.SolrACLSecurity">
  <!-- SolrACLSecurityKey can be any alphanumeric string, the more complex the better.
       For production environments, don't use the default value - create a new value.
       This property needs to be present in all firstSearcher and newSearcher warming queries,
       otherwise those requests will be blocked.
  -->
  <str name="SolrACLSecurityKey">zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2</str>
  <str name="config-file">acl.xml</str>
  <!-- Auditing: Set audit to true to log all searches, including failed access attempts -->
  <bool name="audit">true</bool>
  <int name="maxFileSizeInMB">10</int>
  <int name="maxFileCount">1</int>
  <str name="auditFile">audit.log</str>
  <!--
    User lockout
    'lockoutThreshold' is the number of consecutive incorrect logins before locking out the account
    'lockoutTime' is the number of minutes to lock out the account
    If 'lockoutThreshold' is 0 or less, account lockout is disabled (no accounts are ever locked out)
    If not specified, the default values are: lockoutThreshold=5 lockoutTime=15
  -->
  <str name="lockoutThreshold">5</str>
  <str name="lockoutTime">15</str>
</searchComponent>}}


 Document-level Access Control in Solr
 -

 Key: SOLR-1872
 URL: https://issues.apache.org/jira/browse/SOLR-1872
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SolrACLSecurity.java, SolrACLSecurity.rar



[jira] Issue Comment Edited: (SOLR-1872) Document-level Access Control in Solr

2010-04-11 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855698#action_12855698
 ] 

Peter Sturge edited comment on SOLR-1872 at 4/11/10 6:18 AM:
-

This update adds in optional auditing of searches by users and failed access 
attempts, plus a few minor tweaks.

To configure auditing, here is a sample searchComponent section from 
solrconfig.xml:

{code}
<searchComponent name="SolrACLSecurity" class="org.apache.solr.handler.security.SolrACLSecurity">
  <!-- SolrACLSecurityKey can be any alphanumeric string, the more complex the better.
       For production environments, don't use the default value - create a new value.
       This property needs to be present in all firstSearcher and newSearcher warming queries,
       otherwise those requests will be blocked.
  -->
  <str name="SolrACLSecurityKey">zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2</str>
  <str name="config-file">acl.xml</str>
  <!-- Auditing: Set audit to true to log all searches, including failed access attempts -->
  <bool name="audit">true</bool>
  <int name="maxFileSizeInMB">10</int>
  <int name="maxFileCount">1</int>
  <str name="auditFile">audit.log</str>
  <!--
    User lockout
    'lockoutThreshold' is the number of consecutive incorrect logins before locking out the account
    'lockoutTime' is the number of minutes to lock out the account
    If 'lockoutThreshold' is 0 or less, account lockout is disabled (no accounts are ever locked out)
    If not specified, the default values are: lockoutThreshold=5 lockoutTime=15
  -->
  <str name="lockoutThreshold">5</str>
  <str name="lockoutTime">15</str>
</searchComponent>
{code}



 Document-level Access Control in Solr
 -

 Key: SOLR-1872
 URL: https://issues.apache.org/jira/browse/SOLR-1872
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SolrACLSecurity.java, SolrACLSecurity.rar



[jira] Created: (SOLR-1872) Document-level Access Control in Solr

2010-04-08 Thread Peter Sturge (JIRA)
Document-level Access Control in Solr
-

 Key: SOLR-1872
 URL: https://issues.apache.org/jira/browse/SOLR-1872
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SolrACLSecurity.rar

This issue relates to providing document-level access control for Solr index 
data.

A related JIRA issue is: SOLR-1834. I thought it would be best if I created a 
separate JIRA issue, rather than tack on to SOLR-1834, as the approach here is 
somewhat different, and I didn't want to confuse things or step on Anders' good 
work.

There have been lots of discussions about document-level access in Solr using 
LCF, custom components and the like. Access Control is one of those subjects 
that quickly spreads to lots of 'ratholes' to dive into. Even if not everyone 
agrees with the approaches taken here, it does, at the very least, highlight 
some of the salient issues surrounding access control in Solr, and will 
hopefully initiate a healthy discussion on the range of related requirements, 
with the aim of finding the optimum balance of requirements.

The approach taken here is document and schema agnostic - i.e. the access 
control is independent of what is or will be in the index, and no schema 
changes are required. This version doesn't include LDAP/AD integration, but 
it could be added relatively easily (see Anders' very good work on this in 
SOLR-1834). Note that this version doesn't deal with /update, /replication 
etc.; it's currently a /select thing (but it could be used for these).

This approach uses a SearchComponent subclass called SolrACLSecurity. Its 
configuration is read in from solrconfig.xml in the usual way, and the 
allow/deny configuration is split out into a config file called acl.xml.

acl.xml defines a number of users and groups (and 1 global for 'everyone'), and 
assigns 0 or more {{acl-allow}} and/or {{acl-deny}} elements.
When the SearchComponent is initialized, user objects are created and cached, 
including an 'allow' list and a 'deny' list.
When a request comes in, these lists are used to build filter queries ('allows' 
are OR'ed and 'denies' are NAND'ed), and then added to the query request.

Because the allow and deny elements are simply subsearch queries (e.g. 
{{<acl-allow>somefield:secret</acl-allow>}}), this mechanism will work on any 
stored data that can be queried, including already existing data.
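
The way the allow and deny lists could be combined into a filter query can be sketched roughly as follows. This is only an illustration, not the code in the attached SolrACLSecurity.java: the class and method names are hypothetical, 'NAND'ed' is read here as "AND NOT", and treating an empty allow list as "match everything" is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; not taken from SolrACLSecurity.java.
final class AclFilterSketch {

    // Combines a user's allow and deny subqueries into one filter-query string.
    static String buildFilterQuery(List<String> allows, List<String> denies) {
        // Assumption: with no allow entries, start from all documents
        // so that the deny clauses have something to subtract from.
        String allowClause = allows.isEmpty()
                ? "*:*"
                : allows.stream()
                        .map(q -> "(" + q + ")")
                        .collect(Collectors.joining(" OR "));

        StringBuilder fq = new StringBuilder("(").append(allowClause).append(")");
        // Each deny subquery is excluded from the allowed set.
        for (String deny : denies) {
            fq.append(" AND NOT (").append(deny).append(")");
        }
        return fq.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildFilterQuery(
                List.of("dept:sales", "dept:marketing"),
                List.of("somefield:secret")));
    }
}
```

The resulting string would then be added to the request as an extra filter query, so the user's own query is left untouched.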

Authentication
One of the sticky problems with access control is how to determine who's asking 
for data. There are many approaches, and to stay in the generic vein the 
current mechanism uses HTTP parameters for this.
For an initial search, a client includes a {{username=somename}} parameter and 
a {{hash=pwdhash}} hash of its password. If the request sends the correct 
parameters, the search is granted and a uuid parameter is returned in the 
response header. This uuid can then be used in subsequent requests from the 
client. If the request is wrong, the SearchComponent fails and will increment 
the user's failed login count (if a valid user was specified). If this count 
exceeds the configured lockoutThreshold, no further requests are granted until 
the lockoutTime has elapsed.
This mechanism protects against some types of attacks (e.g. CLRF, dictionary 
etc.), but it really needs container HTTPS as well (as would most other auth 
implementations). Incorporating SSL certificates for authentication and making 
the authentication mechanism pluggable would be a nice improvement (i.e. 
separate authentication from access control).
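
As a rough illustration of the lockout rule described above, here is a minimal sketch. It is not the attached implementation: the class and method names are hypothetical, and it assumes an account is locked once the number of consecutive failures reaches lockoutThreshold, and that a lockoutThreshold of 0 or less disables lockout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of failed-login counting and timed lockout.
final class LockoutSketch {
    private final int lockoutThreshold;   // consecutive failures before lockout; <= 0 disables lockout
    private final long lockoutMillis;     // lockout duration, converted from minutes
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> lockedUntil = new HashMap<>();

    LockoutSketch(int lockoutThreshold, int lockoutMinutes) {
        this.lockoutThreshold = lockoutThreshold;
        this.lockoutMillis = lockoutMinutes * 60_000L;
    }

    // Returns true while the user's lockout period has not yet elapsed.
    boolean isLockedOut(String user, long nowMillis) {
        Long until = lockedUntil.get(user);
        if (until == null) {
            return false;
        }
        if (nowMillis >= until) {
            // Lockout has expired: clear state so the user starts afresh.
            lockedUntil.remove(user);
            failures.remove(user);
            return false;
        }
        return true;
    }

    // Records one failed login and locks the account once the threshold is reached.
    void recordFailure(String user, long nowMillis) {
        if (lockoutThreshold <= 0) {
            return; // lockout disabled
        }
        int count = failures.merge(user, 1, Integer::sum);
        if (count >= lockoutThreshold) {
            lockedUntil.put(user, nowMillis + lockoutMillis);
        }
    }

    // A successful login resets the consecutive-failure count.
    void recordSuccess(String user) {
        failures.remove(user);
    }
}
```

Passing the current time in as a parameter keeps the sketch easy to test; a real component would more likely read the clock itself.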

Another issue is how internal searchers perform autowarming etc. The solution 
here is to use a local key called 'SolrACLSecurityKey'. This key is local and 
[should be] unique to that server. firstSearcher, newSearcher et al then 
include this key in their parameters so they can perform autowarming without 
constraint. Again, there are likely many ways to achieve this, this approach is 
but one.

The attached rar holds the source and associated configuration. This has been 
tested on the 1.4 release codebase (search in the attached solrconfig.xml for 
SolrACLSecurity to find the relevant sections in this file).

I hope this proves helpful for people who are looking for this sort of 
functionality in Solr, and more generally to address how such a mechanism could 
ultimately be integrated into a future Solr release.

Many thanks,
Peter





-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1872) Document-level Access Control in Solr

2010-04-08 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1872:
---

Attachment: SolrACLSecurity.rar

 Document-level Access Control in Solr
 -

 Key: SOLR-1872
 URL: https://issues.apache.org/jira/browse/SOLR-1872
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SolrACLSecurity.rar




[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

2010-04-03 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853168#action_12853168
 ] 

Peter Sturge commented on SOLR-1143:


This is a cool patch - yes, very useful.

I've found a couple of issues with it, though:

1. When going through the 'waiting for shard replies' loop, because no 
exception is thrown on shard failure, the next block after the loop can throw a 
NullPointerException in {{SearchComponent.handleResponses()}} for any 
SearchComponent that checks shard responses. It could be that this doesn't 
always happen, but it certainly happens in FacetComponent when date_facets are 
turned on.

2. There's a bit of code that sets {{partialResults=true}} if there's at least 
one failure, but it doesn't set it to false if everything's ok. In order for 
the patch to operate, this parameter must have already been present and true, 
otherwise the patch is essentially 'disabled' anyway (problem of using the same 
parameter as input and result).

I've made some modifications to the patch for these and a couple of other 
things:

1. FacetComponent modified to check for null shard response. Perhaps it would be 
better to check this in SearchHandler.handleResponses(), but then no 
SearchComponents would be contacted re failed shards, even if they don't care 
that it's failed (is that a good thing?).

2. Added a new CommonParams parameter called FAILED_SHARDS.
{{partialResults}} is now only an input parameter to enable the feature (Note: 
{{partialResults}} is referenced in RequestHandlerBase, but it's not from the 
patch - is this an existing parameter that is used for something else?! If so, 
perhaps the name should be changed to something like {{allowPartialResults}} to 
avoid b/w compat and other potential conflicts).
The output parameter that goes in the response header is now: 
{{failedShards=shard0;shard1;shardn}}. If everything succeeds, there will be no 
failedShards in the response header, otherwise, a list of failed shards is 
given. This is very useful to alert someone/something that a server/network 
needs attention (e.g. a health checker thread could run empty distributed 
searches solely for the purpose of checking status).

3. Changed the detection of a shard request error to be any Exception, rather 
than just ConnectException. This way, any failure is caught and can be 
actioned. Possible TODO: it might be nice to include a short message (Exception 
class name?) in the FAILED_SHARDS parameter about what failed (e.g. 
ConnectException, IOException, etc.). If you like this idea, please say so, and 
I'll include it - i.e. something like: 
{{failedShards=myshard:8983/solr/core0|ConnectException;myothershard:8983/solr/core0|IOException}}
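
For anyone wanting to consume this on the client side, a health checker could parse the proposed value along these lines. This is only a sketch under the format suggested above (semicolon-separated shards, with an optional {{|ExceptionName}} suffix that is still just a proposal); the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side sketch for reading a failedShards value.
final class FailedShardsSketch {

    // Parses "shardA|ConnectException;shardB" into shard -> reason.
    // The reason is an empty string when no "|reason" suffix is present.
    static Map<String, String> parse(String headerValue) {
        Map<String, String> result = new LinkedHashMap<>();
        if (headerValue == null || headerValue.isEmpty()) {
            return result; // no failedShards entry means every shard responded
        }
        for (String entry : headerValue.split(";")) {
            if (entry.isEmpty()) {
                continue;
            }
            int bar = entry.lastIndexOf('|');
            if (bar < 0) {
                result.put(entry, "");
            } else {
                result.put(entry.substring(0, bar), entry.substring(bar + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse(
                "myshard:8983/solr/core0|ConnectException;myothershard:8983/solr/core0|IOException"));
    }
}
```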

I'm currently testing these changes in our internal build. In the meantime, any 
comments are greatly appreciated. If there are no objections, I'll add a patch 
update when the dev test run is complete.




 Return partial results when a connection to a shard is refused
 --

 Key: SOLR-1143
 URL: https://issues.apache.org/jira/browse/SOLR-1143
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Nicolas Dessaigne
Assignee: Grant Ingersoll
 Fix For: 1.5

 Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch


 If any shard is down in a distributed search, a ConnectException is thrown.
 Here's a little patch that changes this behaviour: if we can't connect to a 
 shard (ConnectException), we get partial results from the active shards. As 
 for the TimeOut parameter (https://issues.apache.org/jira/browse/SOLR-502), we 
 set the parameter partialResults to true.
 This patch also addresses a problem expressed in the mailing list about a year 
 ago 
 (http://www.nabble.com/partialResults,-distributed-search---SOLR-502-td19002610.html)
 We have a use case that needs this behaviour and we would like to know your 
 thoughts about it. Should it be the default behaviour for distributed 
 search?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1861) HTTP Authentication for sharded queries

2010-04-02 Thread Peter Sturge (JIRA)
HTTP Authentication for sharded queries
---

 Key: SOLR-1861
 URL: https://issues.apache.org/jira/browse/SOLR-1861
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor


This issue came out of a requirement to have HTTP authentication for queries. 
Currently, HTTP authentication works for querying single servers, but it's not 
possible for distributed searches across multiple shards to receive 
authenticated http requests.

This patch adds the option for Solr clients to pass shard-specific http 
credentials to SearchHandler, which can then use these credentials when making 
http requests to shards.

Here's how the patch works:

A final constant String called {{shardcredentials}} holds the name of the 
SolrParams parameter key.
The format for the value associated with this key is a comma-delimited list of 
colon-separated tokens:
{{shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN}}
A client adds these parameters to their sharded request. 
In the absence of {{shardcredentials}} and/or matching credentials, the patch 
reverts to the existing behaviour of using a default http client (i.e. no 
credentials). This ensures b/w compatibility.

When SearchHandler receives the request, it passes the 'shardcredentials' 
parameter to the HttpCommComponent via the submit() method.
The HttpCommComponent parses the parameter string, and when it finds matching 
credentials for a given shard, it creates an HttpClient object with those 
credentials, and then sends the request using this.
Note: Because the match comparison is a string compare (as opposed to a DNS 
compare), the host/ip names used in the shardcredentials parameters must match 
those used in the shards parameter.
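
As an illustration of the format and matching rule described above, here is a minimal sketch of parsing a {{shardcredentials}} value and looking up credentials for a shard. The class, its fields and the choice to key on "host:port" (ignoring the path after the first '/') are assumptions made for the example, not details taken from the attached SearchHandler.java.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder and parser; names are not from the patch. */
final class ShardCredentials {

    final String username;
    final String password;

    ShardCredentials(String username, String password) {
        this.username = username;
        this.password = password;
    }

    /** Parses "host:port:user:pass,host:port:user:pass" into a map keyed by "host:port". */
    static Map<String, ShardCredentials> parse(String param) {
        Map<String, ShardCredentials> byShard = new HashMap<>();
        if (param == null) {
            return byShard;
        }
        for (String entry : param.split(",")) {
            // Limit of 4 so a password may itself contain ':' (an assumption).
            String[] t = entry.trim().split(":", 4);
            if (t.length == 4) {
                byShard.put(t[0] + ":" + t[1], new ShardCredentials(t[2], t[3]));
            } // malformed entries are skipped in this sketch
        }
        return byShard;
    }

    /** Returns credentials for a shard such as "host:8983/solr/core0", or null if none match. */
    static ShardCredentials lookup(Map<String, ShardCredentials> byShard, String shard) {
        int slash = shard.indexOf('/');
        String hostPort = slash < 0 ? shard : shard.substring(0, slash);
        // Plain, case-sensitive string compare; no DNS resolution.
        return byShard.get(hostPort);
    }
}
```

A null result corresponds to the fallback case: the request would go out through the default http client with no credentials.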

Impl Notes:
This patch is used and tested on the 1.4 release codebase. There weren't any 
significant diffs between the 1.4 release and the latest trunk for 
SearchHandler, so it should be fine on trunk, but I've only tested with the 
1.4 release code base.





[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries

2010-04-02 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1861:
---

Attachment: SearchHandler.java

Apologies that this is the source file and not a diff'ed patch file.

I've tried so many Win doze svn products, but I just can't get them to create a 
patch file (I'm sure this is more down to me not configuring them correctly, 
rather than rapidsvn, visualsvn, Tortoisesvn etc.).
If someone would like to create a patch file from this source, that would be 
extraordinarily kind of you!
In any case, the changes to this file are quite straightforward.


 HTTP Authentication for sharded queries
 ---

 Key: SOLR-1861
 URL: https://issues.apache.org/jira/browse/SOLR-1861
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SearchHandler.java






[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries

2010-04-02 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1861:
---

Attachment: SearchHandler.java

A small update to this patch to support distributed searches with multiple 
cores.


 HTTP Authentication for sharded queries
 ---

 Key: SOLR-1861
 URL: https://issues.apache.org/jira/browse/SOLR-1861
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SearchHandler.java, SearchHandler.java






[jira] Commented: (SOLR-1672) RFE: facet reverse sort count

2010-03-26 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850159#action_12850159
 ] 

Peter Sturge commented on SOLR-1672:


I agree there's some refactoring to do to bring it in line with current 
FacetParams conventions. At the same time, it would be good to look at wrapping 
up the functionality into a method, and covering all the code paths in the way 
you describe.

I've been wanting to get to finishing off this patch, but I'm in the throes of 
a product release myself, so I've not had many spare cycles.

You mention termenum, fieldcache, uninverted - presumably, these are among the 
code paths that need to cater for facet counts. If you know them, can you add a 
comment here that lists all the areas that need to be catered for, so that none 
are left out (if it's more than those 3)?

Thanks!
Peter


 RFE: facet reverse sort count
 -

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
 Attachments: SOLR-1672.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 As suggested by Chris Hostetter, I have added an optional Comparator to the 
 BoundedTreeSetLong in the UnInvertedField class.
 This optional comparator is used when a new (and also optional) field facet 
 parameter called 'facet.sortorder' is set to the string 'dsc' 
 (e.g. f.facetname.facet.sortorder=dsc for per field, or 
 facet.sortorder=dsc for all facets).
 Note that this parameter has no effect if facet.method=enum.
 Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to 
 its default behaviour.
  
 This change affects 2 source files:
  UnInvertedField.java
 [line 438] The getCounts() method signature is modified to add the 
 'facetSortOrder' parameter value to the end of the argument list.
  
 DIFF UnInvertedField.java:
 - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix) throws IOException {
 + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix, String facetSortOrder) throws IOException {
 [line 556] The getCounts() method is modified to create an overridden 
 BoundedTreeSetLong(int, Comparator) if the 'facetSortOrder' parameter 
 equals 'dsc'.
 DIFF UnInvertedField.java:
 - final BoundedTreeSetLong queue = new BoundedTreeSetLong(maxsize);
 + final BoundedTreeSetLong queue = (sort.equals("count") || 
 sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new 
 BoundedTreeSetLong(maxsize, new Comparator()
 { @Override
 public int compare(Object o1, Object o2)
 {
   if (o1 == null || o2 == null)
     return 0;
   int result = ((Long) o1).compareTo((Long) o2);
   return (result != 0 ? result > 0 ? -1 : 1 : 0); //lowest number first sort
 }}) : new BoundedTreeSetLong(maxsize)) : null;
  SimpleFacets.java
 [line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to 
 retrieve the new parameter, if present. 'asc' used as a default value.
 DIFF SimpleFacets.java:
 + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", 
 "asc");
  
 [line 253] The call to uif.getCounts() in the getTermCounts() method is 
 modified to pass the 'facetSortOrder' value string.
 DIFF SimpleFacets.java:
 - counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix);
 + counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix, facetSortOrder);
 Implementation Notes:
 I have noted in testing that I was not able to retrieve any '0' counts as I 
 had expected.
 I believe this could be because there appear to be some optimizations in 
 SimpleFacets/count caching such that zero counts are not iterated (at least 
 not by default)
 as a performance enhancement.
 I could be wrong about this, and zero counts may appear under some other as 
 yet untested circumstances. Perhaps an expert familiar with this part of the 
 code can clarify.
 In fact, this is not such a bad thing (at least for my requirements), as a 
 whole bunch of zero counts is not necessarily useful (for my requirements, 
 starting at '1' is just right).
  
 There may, however, be instances where someone *will* want zero counts - e.g. 
 searching for zero product stock counts (e.g. 'what have we run out of'). I 
 was envisioning the facet.mincount field
 being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 
 or possibly higher), but because of the caching/optimization, the behaviour 
 is somewhat different than expected.
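
To make the ordering idea concrete, here is a standalone sketch of a bounded, comparator-driven queue of facet counts. It does not use Solr's BoundedTreeSetLong or UnInvertedField; the class and method names are hypothetical, and it assumes (as the description suggests) that the default is highest count first and that 'dsc' selects lowest count first.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical illustration of a reversible, bounded facet-count queue. */
final class ReverseFacetSort {

    /** Returns up to 'limit' terms: highest count first by default, lowest first for "dsc". */
    static List<String> topTerms(Map<String, Integer> counts, int limit, String sortOrder) {
        Comparator<Map.Entry<String, Integer>> byCount =
                Map.Entry.<String, Integer>comparingByValue().reversed(); // default order
        if ("dsc".equals(sortOrder)) {
            byCount = Map.Entry.<String, Integer>comparingByValue();      // reversed order
        }
        // Tie-break on the term so entries with equal counts are not collapsed by the set.
        TreeSet<Map.Entry<String, Integer>> queue = new TreeSet<>(
                byCount.thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            queue.add(new AbstractMap.SimpleImmutableEntry<>(e));
            if (queue.size() > limit) {
                queue.pollLast(); // bounded: drop the worst-ranked entry
            }
        }
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Integer> e : queue) {
            terms.add(e.getKey());
        }
        return terms;
    }
}
```

Unlike the behaviour noted in the implementation notes, this sketch keeps zero counts; whether they should appear would be governed by something like facet.mincount.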


[jira] Updated: (SOLR-1729) Date Facet now override time parameter

2010-02-17 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1729:
---

Attachment: UnInvertedField.java

Hi Thomas,

Thanks for catching this. I thought I'd attached that one. *sigh* Honestly, 
that is really slack of me - many apologies.
The attached UnInvertedField.java has the updated getCounts() method. Any 
troubles, let me know.

Thanks!
Peter


 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java, UnInvertedField.java


 This PATCH introduces a new query parameter that tells a (typically, but not 
 necessarily) remote server what time to use as 'NOW' when calculating date 
 facets for a query (and, for the moment, date facets *only*) - overriding the 
 default behaviour of using the local server's current time.
 This gets 'round a problem whereby an explicit time range is specified in a 
 query (e.g. timestamp:[then0 TO then1]), and date facets are required for the 
 given time range (in fact, any explicit time range). 
 Because DateMathParser performs all its calculations from 'NOW', remote 
 callers have to work out how long ago 'then0' and 'then1' are from 'now', and 
 use the relative-to-now values in the facet.date.xxx parameters. If a remote 
 server has a different opinion of NOW compared to the caller, the results 
 will be skewed (e.g. they are in a different time-zone, not time-synced etc.).
 This becomes particularly salient when performing distributed date faceting 
 (see SOLR-1709), where multiple shards may all be running with different 
 times, and the faceting needs to be aligned.
 The new parameter is called 'facet.date.now', and takes as its value a 
 (stringified) long that is the number of milliseconds from the epoch (1 Jan 
 1970 00:00 UTC) - i.e. the returned value from a System.currentTimeMillis() call. 
 This was chosen over a formatted date to distinguish it from a 'searchable' 
 time and to avoid superfluous date parsing. This makes the value generally a 
 programmatically set value, but as that is where the use-case is for this type 
 of parameter, this should be ok.
 NOTE: This parameter affects date facet timing only. If there are other areas 
 of a query that rely on 'NOW', these will not interpret this value. This is a 
 broader issue about setting a 'query-global' NOW that all parts of query 
 analysis can share.
 Source files affected:
 FacetParams.java   (holds the new constant FACET_DATE_NOW)
 SimpleFacets.java  getFacetDateCounts() NOW parameter modified
 This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as 
 it's a general change for date faceting, it was deemed deserving of its own 
 patch. I will be updating SOLR-1709 in due course to include the use of this 
 new parameter, after some rfc acceptance.
 A possible enhancement to this is to detect facet.date fields, look for and 
 match these fields in queries (if they exist), and potentially determine 
 automatically the required time skew, if any. There are a whole host of 
 reasons why this could be problematic to implement, so an explicit 
 facet.date.now parameter is the safest route.
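
A rough sketch of both ends of the 'facet.date.now' idea: the caller stamps the request with its own NOW as a stringified long, and the receiving side uses that value in place of its local clock. The class and method names are invented for illustration, plain maps stand in for SolrParams, and the fallback to local time on a malformed value is an assumption (the description does not say how bad input is handled).

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration; the patch itself changes FacetParams and SimpleFacets. */
final class FacetDateNow {

    static final String FACET_DATE_NOW = "facet.date.now";

    /** Caller side: copy the params and add NOW as milliseconds since the epoch. */
    static Map<String, String> withNow(Map<String, String> params, long nowMillis) {
        Map<String, String> copy = new LinkedHashMap<>(params);
        copy.put(FACET_DATE_NOW, Long.toString(nowMillis));
        return copy;
    }

    /** Receiving side: use the supplied NOW if present and parseable, else the local clock. */
    static Date resolveNow(Map<String, String> params) {
        String raw = params.get(FACET_DATE_NOW);
        if (raw != null) {
            try {
                return new Date(Long.parseLong(raw.trim()));
            } catch (NumberFormatException ignored) {
                // Assumption: fall back to local time on malformed input.
            }
        }
        return new Date();
    }
}
```

A caller would typically compute the value once with System.currentTimeMillis() and send the same number to every shard, so that all date math starts from the same instant.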




[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-02-16 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834222#action_12834222
 ] 

Peter Sturge commented on SOLR-1709:


Hi Thomas,

Hmmm...TermsHelper is an inner class inside TermsComponent.
In the code base that I have, this class exists within TermsComponent. I've 
just had a look on the 
http://mirrors.dedipower.com/ftp.apache.org/lucene/solr/1.4.0/ mirror, and the 
TermsComponent *doesn't* have this inner class.

Not sure where the difference is, as I would have got my codebase from the same 
set of mirrors as you (unless some mirrors are out-of-sync?). 

TermsComponent hasn't changed in this patch, so I don't know much about this 
class. One thing to try is to diff the 2 files above with your 1.4 codebase, 
and merge the changes into your codebase. The differences should be very easy 
to see.

This does highlight the very good policy of putting patch files as attachments 
rather than source files. This is my fault, as we don't use svn in our (win) 
environment, and Tortoise SVN crashes explorer64, so I'm not able to make 
compatible diff files - sorry.

If you do create a couple of diff files, it would be very kind of you if you 
could post them on this issue for others.

Thanks!


 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetComponent.java, FacetComponent.java, 
 ResponseBuilder.java


 This patch is for adding support for date facets when using distributed 
 searches.
 Date faceting across multiple machines exposes some time-based issues that 
 anyone interested in this behaviour should be aware of:
 Any time and/or time-zone differences are not accounted for in the patch 
 (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
 'instant-in-time', unless all shards are time-synced to the exact same time).
 The implementation uses the first encountered shard's facet_dates as the 
 basis for subsequent shards' data to be merged in.
 This means that if subsequent shards' facet_dates are skewed in relation to 
 the first by 1 'gap', these 'earlier' or 'later' facets will not be merged 
 in.
 There are several reasons for this:
   * Performance: It's faster to check facet_date lists against a single map's 
 data, rather than against each other, particularly if there are many shards
   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
 time range larger than that which was requested
 (e.g. a request for one hour's worth of facets could bring back 2, 3 
 or more hours of data)
 This could be dealt with if timezone and skew information was added, and 
 the dates were normalized.
 One possibility for adding such support is to [optionally] add 'timezone' and 
 'now' parameters to the 'facet_dates' map. This would tell requesters what 
 time and TZ the remote server thinks it is, and so multiple shards' time data 
 can be normalized.
 The patch affects 2 files in the Solr core:
   org.apache.solr.handler.component.FacetComponent.java
   org.apache.solr.handler.component.ResponseBuilder.java
 The main changes are in FacetComponent - ResponseBuilder is just to hold the 
 completed SimpleOrderedMap until the finishStage.
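
The merge strategy described above (first shard's facet_dates as the basis, later shards merged into it) might look roughly like this. Plain maps of bucket label to count stand in for Solr's SimpleOrderedMap, and the class and method names are hypothetical; this is a sketch of the stated behaviour, not the code in FacetComponent.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of merging per-shard date facet counts. */
final class DateFacetMerge {

    /** Buckets are fixed by the first shard; later shards only add to existing buckets. */
    static Map<String, Integer> merge(List<Map<String, Integer>> shardFacets) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        boolean first = true;
        for (Map<String, Integer> shard : shardFacets) {
            if (first) {
                merged.putAll(shard); // the first shard defines the buckets
                first = false;
                continue;
            }
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                Integer current = merged.get(e.getKey());
                if (current != null) {
                    merged.put(e.getKey(), current + e.getValue());
                } // 'earlier' or 'later' buckets unknown to the first shard are dropped
            }
        }
        return merged;
    }
}
```

Each later shard is checked against a single map, which is the performance argument given above, and the result never covers a wider time range than the first shard returned.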
 One possible enhancement is to perhaps make this an optional parameter, but 
 really, if facet.date parameters are specified, it is assumed they are 
 desired.
 Comments & suggestions welcome.
 As a favour to ask, if anyone could take my 2 source files and create a PATCH 
 file from it, it would be greatly appreciated, as I'm having a bit of trouble 
 with svn (don't shoot me, but my environment is a Redmond-based os company).




[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-02-03 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829043#action_12829043
 ] 

Peter Sturge commented on SOLR-1729:


Hi Chris,
Thanks for your comments - I hope I didn't sound like your comments were taken 
wrongly - I absolutely count on comments from you and other experts to make 
sure I'm not missing some important functionality and/or side effect. You know 
the code base far better than I, so it's great that you take the time to point 
out all the different bits and pieces that need addressing.

I can certainly understand the need to address the 'core-global' issues raised 
by you and Yonik for storing a ThreadLocal 'query-global' 'NOW'.
I suppose the main issue in implementing the thread-local route is that we'd 
have to make sure we found every place in the query core that references now, 
and point those references to the new variable? If the 'code-at-large' 
[hopefully] always calls the date math routines for finding 'NOW', great, it 
should be relatively straightforward. If there are any stray e.g. 
System.currentTimeMillis(), then it's a bit more fiddly, but still do-able.

??it's all handled internally by DateField??
Sounds like DateField would be the best candidate for holding the ThreadLocal? 
The query handler code can set the variable of its DateField instance if it's 
set in a query parameter, otherwise it just defaults to its own local (UTC) time.
Could be done similarly to DateField.ThreadLocalDateFormat, perhaps?
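
One way the thread-local 'query-global' NOW could be sketched is shown below. This is a standalone illustration with invented names; it is not tied to DateField and says nothing about where such a holder would actually live in Solr.

```java
import java.util.Date;

/** Hypothetical holder for a per-request NOW override. */
final class QueryNow {

    // A null value means "no override for the current request thread".
    private static final ThreadLocal<Long> OVERRIDE = new ThreadLocal<>();

    /** Called by the request handler when the query supplies its own NOW. */
    static void set(long nowMillis) {
        OVERRIDE.set(nowMillis);
    }

    /** Called when the request finishes, so pooled threads do not leak a stale NOW. */
    static void clear() {
        OVERRIDE.remove();
    }

    /** What date math code would call instead of new Date() or System.currentTimeMillis(). */
    static Date get() {
        Long millis = OVERRIDE.get();
        return millis != null ? new Date(millis) : new Date();
    }
}
```

The handler would call set() before processing and clear() in a finally block; any code that still reads the system clock directly would bypass the override, which is the 'stray call' concern raised above.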


 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java






[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-01-28 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805995#action_12805995
 ] 

Peter Sturge commented on SOLR-1729:


??...they might not all get queried at the exact same time??

I suppose this is what the explicit 'NOW' is meant to resolve - 
staggered/lagged receipt/response, and, in an ersatz fashion, discrepancies in 
local time sync. Since the passed-in 'NOW' is relative only to the epoch, 
network latency is handled, and time-sync on any given server is assumed to be 
correct.

??...multiple requests might be made to a single server for different phases of 
the distributed request that expect to get the same answers.??

As long as the same code path is followed for such requests, it should honour 
the same (passed-in) 'NOW'. Are there scenarios where this is not the case? In 
which case, yes, these would need to be addressed.

??...unless filter queries that use date math also respect it the counts 
returned from date faceting will still potentially be nonsensical.??

Definitely, filter queries will need to get/use/honour the same 'NOW' as their 
corresponding query, otherwise anarchy will quickly ensue.
Can you point me toward the class(es) where filter queries' date math lives, 
and I'll have a look? As filter queries are cached separately, can you think of 
any potential caching issues relating to filter queries?


 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java






[jira] Commented: (SOLR-1672) RFE: facet reverse sort count

2010-01-22 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803858#action_12803858
 ] 

Peter Sturge commented on SOLR-1672:


Jan, you are absolutely correct that the parameter should (and will) be 'desc'.

I have an update in my queue of things to do which changes this, but also 
removes the new 'facet.sortorder' parameter, and includes instead 'facet.sort 
desc' as a valid parameter for facet.sort. This keeps things nice and tidy and 
consistent.

The 'facet.sortorder' parameter was really a POC to try out the behaviour 
before changing the core parameter syntax of the existing 'facet.sort' 
parameter. Now that's done, the parameter will be rolled into 'facet.sort'.

Thanks,
Peter


 RFE: facet reverse sort count
 -

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
 Attachments: SOLR-1672.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 As suggested by Chris Hostetter, I have added an optional Comparator to the 
 BoundedTreeSetLong in the UnInvertedField class.
 This optional comparator is used when a new (and also optional) field facet 
 parameter called 'facet.sortorder' is set to the string 'dsc' 
 (e.g. f.facetname.facet.sortorder=dsc for per field, or 
 facet.sortorder=dsc for all facets).
 Note that this parameter has no effect if facet.method=enum.
 Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to 
 its default behaviour.
  
 This change affects 2 source files:
  UnInvertedField.java
 [line 438] The getCounts() method signature is modified to add the 
 'facetSortOrder' parameter value to the end of the argument list.
  
 DIFF UnInvertedField.java:
 - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix) throws IOException {
 + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix, String facetSortOrder) throws IOException {
 [line 556] The getCounts() method is modified to create an overridden 
 BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter 
 equals 'dsc'.
 DIFF UnInvertedField.java:
 - final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize);
 + final BoundedTreeSet<Long> queue = (sort.equals("count") || 
 sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new 
 BoundedTreeSet<Long>(maxsize, new Comparator()
 { @Override
 public int compare(Object o1, Object o2)
 {
   if (o1 == null || o2 == null)
 return 0;
   int result = ((Long) o1).compareTo((Long) o2);
   return (result != 0 ? result > 0 ? -1 : 1 : 0); //lowest number first sort
 }}) : new BoundedTreeSet<Long>(maxsize)) : null;
  SimpleFacets.java
 [line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to 
 retrieve the new parameter, if present. 'asc' is used as a default value.
 DIFF SimpleFacets.java:
 + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", 
 "asc");
  
 [line 253] The call to uif.getCounts() in the getTermCounts() method is 
 modified to pass the 'facetSortOrder' value string.
 DIFF SimpleFacets.java:
 - counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix);
 + counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix, facetSortOrder);
 Implementation Notes:
 I have noted in testing that I was not able to retrieve any '0' counts as I 
 had expected.
 I believe this could be because there appear to be some optimizations in 
 SimpleFacets/count caching such that zero counts are not iterated (at least 
 not by default)
 as a performance enhancement.
 I could be wrong about this, and zero counts may appear under some other as 
 yet untested circumstances. Perhaps an expert familiar with this part of the 
 code can clarify.
 In fact, this is not such a bad thing (at least for my requirements), as a 
 whole bunch of zero counts is not necessarily useful (for my requirements, 
 starting at '1' is just right).
  
 There may, however, be instances where someone *will* want zero counts - e.g. 
 searching for zero product stock counts (e.g. 'what have we run out of'). I 
 was envisioning the facet.mincount field
 being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 
 or possibly higher), but because of the caching/optimization, the behaviour 
 is somewhat different than expected.
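
 To illustrate the optional-comparator idea in isolation, here is a minimal 
 sketch that does not use Solr's classes. BoundedSortedSet and 
 FacetSortOrderSketch are hypothetical names standing in for BoundedTreeSet 
 and the getCounts() logic, and plain Long values are used rather than 
 whatever the real queue stores. It only shows how a 'dsc' value swaps the 
 ordering while any other value (including none) keeps the default. It 
 assumes Java 8 or later.

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Hypothetical stand-in for a bounded sorted set: keeps at most maxSize elements. */
final class BoundedSortedSet<E> extends TreeSet<E> {
    private final int maxSize;

    BoundedSortedSet(int maxSize, Comparator<? super E> order) {
        super(order); // a null comparator means natural ordering
        this.maxSize = maxSize;
    }

    @Override
    public boolean add(E e) {
        boolean added = super.add(e);
        // Evict from the tail so only the first maxSize elements (per the ordering) survive.
        while (size() > maxSize) {
            pollLast();
        }
        return added;
    }
}

final class FacetSortOrderSketch {
    /** Natural ordering by default; reversed only when the parameter value is "dsc". */
    static BoundedSortedSet<Long> newQueue(int maxSize, String facetSortOrder) {
        Comparator<Long> order =
                "dsc".equals(facetSortOrder) ? Comparator.<Long>reverseOrder() : null;
        return new BoundedSortedSet<>(maxSize, order);
    }
}
```

 Writing "dsc".equals(facetSortOrder) rather than facetSortOrder.equals("dsc") 
 is a deliberate choice in this sketch: it tolerates a missing (null) value 
 without needing a default.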




[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-01-22 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803860#action_12803860
 ] 

Peter Sturge commented on SOLR-1729:


I agree there are wider issues that relate to this -- this particular patch 
addresses the time sync issue for allowing distributed date facets to happen.
In this case, you must have multiple cores using the same NOW for all, so that 
your date facets are consistent. In fact, it doesn't really matter which NOW 
you use, as long as they're all the same -- the caller setting the NOW value makes 
the most sense.
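
As a rough sketch of the caller setting one NOW for everyone: the requester 
reads its clock once and appends the same value to each shard request. The 
class name, method name and shard URLs are made up for illustration; only the 
'facet.date.now' parameter name comes from this patch.

```java
import java.util.ArrayList;
import java.util.List;

final class SharedNowSketch {
    /** Appends the same facet.date.now value to every shard query string. */
    static List<String> withSharedNow(List<String> shardQueries, long nowMillis) {
        List<String> out = new ArrayList<>();
        for (String query : shardQueries) {
            // Use '&' if the query already has parameters, '?' otherwise.
            String separator = query.contains("?") ? "&" : "?";
            out.add(query + separator + "facet.date.now=" + nowMillis);
        }
        return out;
    }
}
```

A caller would pass System.currentTimeMillis(), read once, as nowMillis, so 
that every shard computes its date facets from the same instant.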

For other time-related queries, this might not be the case, but as you rightly 
pointed out, these are not addressed here.


 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java





[jira] Created: (SOLR-1729) Date Facet now override time parameter

2010-01-21 Thread Peter Sturge (JIRA)
Date Facet now override time parameter
--

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor


This PATCH introduces a new query parameter that tells a (typically, but not 
necessarily) remote server what time to use as 'NOW' when calculating date 
facets for a query (and, for the moment, date facets *only*) - overriding the 
default behaviour of using the local server's current time.

This gets 'round a problem whereby an explicit time range is specified in a 
query (e.g. timestamp:[then0 TO then1]), and date facets are required for the 
given time range (in fact, any explicit time range). 
Because DateMathParser performs all its calculations from 'NOW', remote callers 
have to work out how long ago 'then0' and 'then1' are from 'now', and use the 
relative-to-now values in the facet.date.xxx parameters. If a remote server has 
a different opinion of NOW compared to the caller, the results will be skewed 
(e.g. they are in a different time-zone, not time-synced etc.).
This becomes particularly salient when performing distributed date faceting 
(see SOLR-1709), where multiple shards may all be running with different times, 
and the faceting needs to be aligned.
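
The skew can be shown with a little arithmetic. This is a sketch only: the 
class and method names are hypothetical, and times are plain epoch 
milliseconds rather than DateMathParser expressions.

```java
final class NowSkewSketch {
    /** Offset (ms) of an absolute instant relative to a given 'now'; negative means in the past. */
    static long relativeToNow(long instantMillis, long nowMillis) {
        return instantMillis - nowMillis;
    }

    /** The instant a server ends up with when it applies the caller's offset to its own 'now'. */
    static long resolveOnServer(long offsetMillis, long serverNowMillis) {
        return serverNowMillis + offsetMillis;
    }

    public static void main(String[] args) {
        long callerNow = 1_000_000L;
        long then0 = 400_000L;                    // the instant the caller actually wants
        long offset = relativeToNow(then0, callerNow);

        long skewedServerNow = callerNow + 90_000L; // server clock is 90 seconds ahead
        System.out.println(resolveOnServer(offset, skewedServerNow) - then0); // error in ms
        System.out.println(resolveOnServer(offset, callerNow) - then0);       // shared NOW
    }
}
```

With a server clock 90 seconds ahead, the resolved instant is 90 seconds off; 
when the server is told to use the caller's NOW, the error disappears.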

The new parameter is called 'facet.date.now', and takes as a parameter a 
(stringified) long that is the number of milliseconds from the epoch (1 Jan 
1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. 
This was chosen over a formatted date to distinguish it from a 'searchable' time 
and to avoid superfluous date parsing. This makes the value generally a 
programmatically set value, but as that is where the use case is for this type 
of parameter, this should be OK.

NOTE: This parameter affects date facet timing only. If there are other areas 
of a query that rely on 'NOW', these will not interpret this value. This is a 
broader issue about setting a 'query-global' NOW that all parts of query 
analysis can share.

Source files affected:
FacetParams.java   (holds the new constant FACET_DATE_NOW)
SimpleFacets.java  getFacetDateCounts() NOW parameter modified

This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as 
it's a general change for date faceting, it was deemed deserving of its own 
patch. I will be updating SOLR-1709 in due course to include the use of this 
new parameter, after some rfc acceptance.

A possible enhancement to this is to detect facet.date fields, look for and 
match these fields in queries (if they exist), and potentially determine 
automatically the required time skew, if any. There are a whole host of reasons 
why this could be problematic to implement, so an explicit facet.date.now 
parameter is the safest route.





[jira] Updated: (SOLR-1729) Date Facet now override time parameter

2010-01-21 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1729:
---

Attachment: FacetParams.java
SimpleFacets.java

These are the source files affected for this patch.
Apologies for not creating a PATCH file - my tortoise svn is not working for 
creating patch files.
If anyone would like to create a patch from these, that would be 
extraordinarily kind of you!

Diff: (trunk: 1.4 Release)
FacetParams.java:
Add at line 179:
  /**
   * String that tells the date facet counter what time to use as 'now'.
   * 
   * The value of this parameter, if it exists, must be a stringified long 
   * of the number of milliseconds since the epoch (1 Jan 1970 00:00).
   * System.currentTimeMillis() provides this.
   * 
   * The DateField and DateMathParser work out their times relative to 'now'.
   * By default, 'now' is the local machine's System.currentTimeMillis().
   * This parameter overrides the local value to use a different time.
   * This is very useful for remote server queries where the times on the 
   * querying machine are skewed/different than that of the date faceting machine.
   * This is a facet.date global query parameter (i.e. not per field)
   * @see DateMathParser
   * @see DateField
   */
  public static final String FACET_DATE_NOW = "facet.date.now";

SimpleFacets.java:
Change at line 551:
- final Date NOW = new Date();
+ final Date NOW = new Date(params.get(FacetParams.FACET_DATE_NOW) != null 
? Long.parseLong(params.get("facet.date.now")) : System.currentTimeMillis());
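
For anyone who wants to try the fallback logic outside Solr, here is a 
standalone sketch of the same idea. It is not the attached code: a plain Map 
stands in for the request parameters, ResolveNowSketch and resolveNow are 
hypothetical names, and rejecting a malformed value with an 
IllegalArgumentException is an assumption of this sketch, not something the 
patch does.

```java
import java.util.Date;
import java.util.Map;

final class ResolveNowSketch {
    static final String FACET_DATE_NOW = "facet.date.now";

    /** Uses the caller-supplied epoch milliseconds when present, else the local clock reading. */
    static Date resolveNow(Map<String, String> params, long localNowMillis) {
        String raw = params.get(FACET_DATE_NOW);
        if (raw == null) {
            return new Date(localNowMillis);
        }
        try {
            return new Date(Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            // Assumption: fail loudly rather than silently falling back to local time.
            throw new IllegalArgumentException(
                    FACET_DATE_NOW + " must be epoch milliseconds: " + raw, e);
        }
    }
}
```

Passing the local clock reading in as an argument (normally 
System.currentTimeMillis()) keeps the method easy to test.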


 Date Facet now override time parameter
 --

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetParams.java, SimpleFacets.java





[jira] Updated: (SOLR-1709) Distributed Date Faceting

2010-01-21 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1709:
---

Attachment: FacetComponent.java

Updated version of FacetComponent.java after more testing and sync with 
FacetParams.FACET_DATE_NOW (see SOLR-1729).
For use with the 1.4 trunk (along with the existing ResponseBuilder.java in 
this patch).


 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetComponent.java, FacetComponent.java, 
 ResponseBuilder.java


 This patch is for adding support for date facets when using distributed 
 searches.
 Date faceting across multiple machines exposes some time-based issues that 
 anyone interested in this behaviour should be aware of:
 Any time and/or time-zone differences are not accounted for in the patch 
 (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
 'instant-in-time', unless all shards are time-synced to the exact same time).
 The implementation uses the first encountered shard's facet_dates as the 
 basis for subsequent shards' data to be merged in.
 This means that if subsequent shards' facet_dates are skewed in relation to 
 the first by 1 'gap', these 'earlier' or 'later' facets will not be merged 
 in.
 There are several reasons for this:
   * Performance: It's faster to check facet_date lists against a single map's 
 data, rather than against each other, particularly if there are many shards
   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
 time range larger than that which was requested
 (e.g. a request for one hour's worth of facets could bring back 2, 3 
 or more hours of data)
 This could be dealt with if timezone and skew information was added, and 
 the dates were normalized.
 One possibility for adding such support is to [optionally] add 'timezone' and 
 'now' parameters to the 'facet_dates' map. This would tell requesters what 
 time and TZ the remote server thinks it is, and so multiple shards' time data 
 can be normalized.
 The patch affects 2 files in the Solr core:
   org.apache.solr.handler.component.FacetComponent.java
   org.apache.solr.handler.component.ResponseBuilder.java
 The main changes are in FacetComponent - ResponseBuilder is just to hold the 
 completed SimpleOrderedMap until the finishStage.
 One possible enhancement is to perhaps make this an optional parameter, but 
 really, if facet.date parameters are specified, it is assumed they are 
 desired.
 Comments & suggestions welcome.
 As a favour to ask, if anyone could take my 2 source files and create a PATCH 
 file from it, it would be greatly appreciated, as I'm having a bit of trouble 
 with svn (don't shoot me, but my environment is a Redmond-based os company).
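
 The merge rule described above (the first shard's facet_dates are the basis, 
 and buckets seen only on later shards are not merged in) can be sketched as 
 follows. This is not the attached FacetComponent code: plain maps of bucket 
 label to count stand in for the facet_dates structure, and the class and 
 method names are hypothetical. It assumes Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FacetDateMergeSketch {
    /**
     * Merges per-shard date-facet counts. The first shard's buckets define the
     * key set; buckets that appear only in later shards are dropped.
     */
    static Map<String, Integer> merge(List<Map<String, Integer>> shardFacetDates) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        if (shardFacetDates.isEmpty()) {
            return merged;
        }
        merged.putAll(shardFacetDates.get(0));
        for (Map<String, Integer> shard : shardFacetDates.subList(1, shardFacetDates.size())) {
            for (Map.Entry<String, Integer> entry : shard.entrySet()) {
                // computeIfPresent only touches keys the first shard already supplied.
                merged.computeIfPresent(entry.getKey(), (key, count) -> count + entry.getValue());
            }
        }
        return merged;
    }
}
```

 Each later shard is checked against a single map, which is the performance 
 point made above, and the result never covers a wider time range than the 
 first shard returned.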




[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-01-09 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798411#action_12798411
 ] 

Peter Sturge commented on SOLR-1709:


Yonik,

Yes, I can see what you mean that of course NOW will affect anything 
date-related to a given query.
I'm wondering whether the passing of 'NOW' to shards should be a separate 
issue/patch from this one (e.g. something like 'Time Sync to Remote Shards'), 
as its scope and ramifications go far beyond simply distributed date faceting.
The whole area of code relating to date math is one that I'm not familiar with, 
but do let me know if there's anything you'd like me to look at.


 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetComponent.java, ResponseBuilder.java





[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-01-08 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797957#action_12797957
 ] 

Peter Sturge commented on SOLR-1709:


I've heard of Tortoise, I'll give that a try, thanks.

On the time-zone/skew issue, perhaps a more efficient approach would be a 
'push' rather than 'pull' - i.e.:

Requesters would include an optional parameter that told remote shards what 
time to use as 'NOW', and which TZ to use for date faceting.
This would avoid having to translate loads of time strings at merge time.

Thanks,
Peter


 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor




[jira] Updated: (SOLR-1709) Distributed Date Faceting

2010-01-08 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1709:
---

Attachment: ResponseBuilder.java
FacetComponent.java

Sorry, guys, can't get svn to create a patch file correctly on windows, so I'm 
attaching the source files here. With some time, which at the moment I don't 
have, I'm sure I could get svn working. Rather than anyone have to wait for me 
to get the patch file created, I thought it best to get the source uploaded, so 
people can start using it.
Thanks, Peter


 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetComponent.java, ResponseBuilder.java





[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-01-08 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798233#action_12798233
 ] 

Peter Sturge commented on SOLR-1709:


Definitely true! -- messing about with Date strings isn't great for performance.

As the NOW parameter would be for internal request use only (i.e. not for the 
indexer, not for human consumption), could it not just be an epoch long? The 
adjustment math should then be nice and quick (no string/date 
parsing/formatting; at worst just one Date.getTime() call if the time 
is stored locally as a string).
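
Roughly what I have in mind, as a sketch only (the class and method names are 
made up, and the timestamp pattern is an assumption about how a time might be 
stored locally as a string):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

final class EpochAdjustSketch {
    /** One-off conversion for a time that is only held as a UTC string. */
    static long toEpochMillis(String utcTimestamp) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(utcTimestamp).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a UTC timestamp: " + utcTimestamp, e);
        }
    }

    /** Moves a shard-local instant onto the requester's clock with plain long arithmetic. */
    static long toRequesterClock(long shardInstantMillis, long shardNowMillis,
                                 long requesterNowMillis) {
        return shardInstantMillis + (requesterNowMillis - shardNowMillis);
    }
}
```

The string conversion happens at most once per value; everything after that 
is addition and subtraction on longs.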

 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetComponent.java, ResponseBuilder.java





[jira] Resolved: (SOLR-1672) RFE: facet reverse sort count

2010-01-07 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge resolved SOLR-1672.


Resolution: Fixed

Marking as resolved.


 RFE: facet reverse sort count
 -

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
 Attachments: SOLR-1672.patch

   Original Estimate: 0h
  Remaining Estimate: 0h





[jira] Created: (SOLR-1709) Distributed Date Faceting

2010-01-07 Thread Peter Sturge (JIRA)
Distributed Date Faceting
-

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor


This patch is for adding support for date facets when using distributed 
searches.

Date faceting across multiple machines exposes some time-based issues that 
anyone interested in this behaviour should be aware of:
Any time and/or time-zone differences are not accounted for in the patch (i.e. 
merged date facets are at a time-of-day, not necessarily at a universal 
'instant-in-time', unless all shards are time-synced to the exact same time).
The implementation uses the first encountered shard's facet_dates as the basis 
for subsequent shards' data to be merged in.
This means that if subsequent shards' facet_dates are skewed in relation to the 
first by 1 'gap', these 'earlier' or 'later' facets will not be merged in.
There are several reasons for this:
  * Performance: It's faster to check facet_date lists against a single map's 
data, rather than against each other, particularly if there are many shards
  * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
time range larger than that which was requested
(e.g. a request for one hour's worth of facets could bring back 2, 3 or 
more hours of data)
This could be dealt with if timezone and skew information were added, and 
the dates were normalized.
One possibility for adding such support is to [optionally] add 'timezone' and 
'now' parameters to the 'facet_dates' map. This would tell requesters what time 
and TZ the remote server thinks it is, and so multiple shards' time data can be 
normalized.
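
The merge behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the class and method names are hypothetical, plain java.util maps keyed by date strings stand in for the SimpleOrderedMap data handled in FacetComponent, and extra entries such as gap or end values are ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merge per-shard date facet counts, using the first
// shard's buckets as the basis for all later shards.
class DateFacetMergeSketch {

    static Map<String, Integer> merge(List<Map<String, Integer>> shardFacets) {
        Map<String, Integer> merged = new LinkedHashMap<String, Integer>();
        boolean first = true;
        for (Map<String, Integer> shard : shardFacets) {
            if (first) {
                // The first encountered shard defines which buckets exist.
                merged.putAll(shard);
                first = false;
                continue;
            }
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                Integer current = merged.get(e.getKey());
                // Buckets the first shard did not report ('earlier' or
                // 'later' ones) are skipped rather than merged in.
                if (current != null) {
                    merged.put(e.getKey(), current + e.getValue());
                }
            }
        }
        return merged;
    }
}
```

Each later shard is checked against a single map only, which is the performance point made in the list above; the price is that skewed buckets are silently dropped.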

The patch affects 2 files in the Solr core:
  org.apache.solr.handler.component.FacetComponent.java
  org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just to hold the 
completed SimpleOrderedMap until the finishStage.
One possible enhancement is to make distributed date faceting optional via a 
parameter, but really, if facet.date parameters are specified, it is assumed 
they are desired.
Comments & suggestions welcome.

As a favour, if anyone could take my 2 source files and create a PATCH file 
from them, it would be greatly appreciated, as I'm having a bit of trouble 
with svn (don't shoot me, but my environment is a Redmond-based OS company).





[jira] Created: (SOLR-1672) RFE: facet reverse sort count

2009-12-18 Thread Peter Sturge (JIRA)
RFE: facet reverse sort count
-

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor


As suggested by Chris Hostetter, I have added an optional Comparator to the 
BoundedTreeSet<Long> in the UnInvertedField class.
This optional comparator is used when a new (and also optional) field facet 
parameter called 'facet.sortorder' is set to the string 'dsc' 
(e.g. f.facetname.facet.sortorder=dsc for a single field, or 
facet.sortorder=dsc for all facets).
Note that this parameter has no effect if facet.method=enum.
Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to 
its default behaviour.
 
This change affects 2 source files:
 UnInvertedField.java
[line 438] The getCounts() method signature is modified to add the 
'facetSortOrder' parameter value to the end of the argument list.
 
DIFF UnInvertedField.java:
- public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
offset, int limit, Integer mincount, boolean missing, String sort, String 
prefix) throws IOException {

+ public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
offset, int limit, Integer mincount, boolean missing, String sort, String 
prefix, String facetSortOrder) throws IOException {

[line 556] The getCounts() method is modified to create an overridden 
BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter equals 
'dsc'.
DIFF UnInvertedField.java:
- final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize);

+ final BoundedTreeSet<Long> queue = (sort.equals("count") || 
sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new 
BoundedTreeSet<Long>(maxsize, new Comparator()
{ @Override
public int compare(Object o1, Object o2)
{
  if (o1 == null || o2 == null)
    return 0;
  int result = ((Long) o1).compareTo((Long) o2);
  return (result != 0 ? result > 0 ? -1 : 1 : 0); //lowest number first sort
}}) : new BoundedTreeSet<Long>(maxsize)) : null;

 SimpleFacets.java
[line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to 
retrieve the new parameter, if present. 'asc' is used as the default value.
DIFF SimpleFacets.java:

+ String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");
 
[line 253] The call to uif.getCounts() in the getTermCounts() method is 
modified to pass the 'facetSortOrder' value string.
DIFF SimpleFacets.java:
- counts = uif.getCounts(searcher, base, offset, limit, 
mincount,missing,sort,prefix);
+ counts = uif.getCounts(searcher, base, offset, limit, 
mincount,missing,sort,prefix, facetSortOrder);
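
To show the effect of the optional comparator in isolation, here is a minimal, self-contained sketch. It does not use Solr's classes: BoundedSet below is a hypothetical stand-in for the bounded tree set (assumed here to trim from the tail once it exceeds its maximum size), and newQueue() is an illustrative helper, not a method from the patch.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch of a size-bounded sorted set whose ordering is
// reversed only when the sort-order string equals "dsc".
class ReverseBoundedSetSketch {

    static class BoundedSet extends TreeSet<Long> {
        private final int maxSize;

        BoundedSet(int maxSize, Comparator<Long> order) {
            super(order); // a null comparator means natural ordering
            this.maxSize = maxSize;
        }

        @Override
        public boolean add(Long value) {
            boolean added = super.add(value);
            // Keep only the first maxSize elements in the current sort order.
            while (size() > maxSize) {
                pollLast();
            }
            return added;
        }
    }

    static BoundedSet newQueue(int maxSize, String facetSortOrder) {
        // Any value other than "dsc" (including null) keeps the default order.
        Comparator<Long> order =
            "dsc".equals(facetSortOrder) ? Collections.<Long>reverseOrder() : null;
        return new BoundedSet(maxSize, order);
    }
}
```

With a maximum size of 3 and the values 5, 1, 9, 3 added in that order, the default queue holds 1, 3, 5, while the 'dsc' queue holds 9, 5, 3. The comparator therefore changes both the iteration order and which elements survive the size bound.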

Implementation Notes:
I have noted in testing that I was not able to retrieve any '0' counts as I had 
expected.
I believe this could be because there appear to be some optimizations in 
SimpleFacets/count caching such that zero counts are not iterated (at least not 
by default)
as a performance enhancement.
I could be wrong about this, and zero counts may appear under some other as yet 
untested circumstances. Perhaps an expert familiar with this part of the code 
can clarify.
In fact, this is not such a bad thing (at least for my requirements), as a 
whole bunch of zero counts is not necessarily useful (for my requirements, 
starting at '1' is just right).
 
There may, however, be instances where someone *will* want zero counts - e.g. 
searching for zero product stock counts (e.g. 'what have we run out of'). I was 
envisioning the facet.mincount field
being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 
or possibly higher), but because of the caching/optimization, the behaviour is 
somewhat different than expected.





[jira] Updated: (SOLR-1672) RFE: facet reverse sort count

2009-12-18 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1672:
---

Attachment: SOLR-1672.patch

Patch diff file for adding facet reverse sorting


 RFE: facet reverse sort count
 -

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
 Attachments: SOLR-1672.patch

   Original Estimate: 24h
  Remaining Estimate: 24h





[jira] Commented: (SOLR-1672) RFE: facet reverse sort count

2009-12-18 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792424#action_12792424
 ] 

Peter Sturge commented on SOLR-1672:


Patch SOLR-1672.patch now included for review


 RFE: facet reverse sort count
 -

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
 Attachments: SOLR-1672.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

