[jira] Updated: (SOLR-1872) Document-level Access Control in Solr

2010-04-12 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1872:
---

Attachment: SolrACLSecurity.java

Fixes a typo or two, plus some misc tweaks.

{code}
  
  
  zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2
  acl.xml
  
  true
  10
  1
  audit.log
  
  5
  15

{code}

Thanks,
Peter


> Document-level Access Control in Solr
> -
>
> Key: SOLR-1872
> URL: https://issues.apache.org/jira/browse/SOLR-1872
> Project: Solr
>  Issue Type: New Feature
>  Components: SearchComponents - other
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SolrACLSecurity.java, SolrACLSecurity.java, 
> SolrACLSecurity.rar
>
>
> This issue relates to providing document-level access control for Solr index 
> data.
> A related JIRA issue is: SOLR-1834. I thought it would be best if I created a 
> separate JIRA issue, rather than tack on to SOLR-1834, as the approach here 
> is somewhat different, and I didn't want to confuse things or step on Anders' 
> good work.
> There have been lots of discussions about document-level access in Solr using 
> LCF, custom components and the like. Access Control is one of those subjects 
> that quickly spreads to lots of 'ratholes' to dive into. Even if not everyone 
> agrees with the approaches taken here, it does, at the very least, highlight 
> some of the salient issues surrounding access control in Solr, and will 
> hopefully initiate a healthy discussion on the range of related requirements, 
> with the aim of finding the optimum balance of requirements.
> The approach taken here is document and schema agnostic - i.e. the access 
> control is independent of what is or will be in the index, and no schema 
> changes are required. This version doesn't include LDAP/AD integration, but 
> it could be added relatively easily (see Anders' very good work on this in 
> SOLR-1834). Note that this version doesn't deal with /update, 
> /replication etc.; it's currently a /select thing (but it could 
> be used for these).
> This approach uses a SearchComponent subclass called SolrACLSecurity. Its 
> configuration is read in from solrconfig.xml in the usual way, and the 
> allow/deny configuration is split out into a config file called acl.xml.
> acl.xml defines a number of users and groups (and 1 global for 'everyone'), 
> and assigns 0 or more {{<allow>}} and/or {{<deny>}} elements.
> When the SearchComponent is initialized, user objects are created and cached, 
> including an 'allow' list and a 'deny' list.
> When a request comes in, these lists are used to build filter queries 
> ('allows' are OR'ed and 'denies' are NAND'ed), and then added to the query 
> request.
> Because the allow and deny elements are simply subsearch queries (e.g. 
> {{somefield:secret}}), this mechanism will work on any 
> stored data that can be queried, including already existing data.
> Authentication
> One of the sticky problems with access control is how to determine who's 
> asking for data. There are many approaches, and to stay in the generic vein 
> the current mechanism uses http parameters for this.
> For an initial search, a client includes a {{username=somename}} parameter 
> and a {{hash=pwdhash}} hash of its password. If the request sends the correct 
> parameters, the search is granted and a uuid parameter is returned in the 
> response header. This uuid can then be used in subsequent requests from the 
> client. If the request is wrong, the SearchComponent fails and will increment 
> the user's failed login count (if a valid user was specified). If this count 
> exceeds the configured lockoutThreshold, no further requests are granted 
> until the lockoutTime has elapsed.
> This mechanism protects against some types of attacks (e.g. CRLF, dictionary 
> etc.), but it really needs container HTTPS as well (as would most other auth 
> implementations). Incorporating SSL certificates for authentication and 
> making the authentication mechanism pluggable would be a nice improvement 
> (i.e. separate authentication from access control).
> Another issue is how internal searchers perform autowarming etc. The solution 
> here is to use a local key called 'SolrACLSecurityKey'. This key is local and 
> [should be] unique to that server. firstSearcher, newSearcher et al then 
> include this key in their parameters so they can perform autowarming without 
> constraint. Again, there are likely many ways to achieve this; this approach 
> is but one.
> The attached rar holds the source and associated configuration. This has been 
> tested on the 1.4 release codebase (search in the attached solrconfig.xml for 
> SolrACLSecurity to find the relevant sect

[jira] Issue Comment Edited: (SOLR-1872) Document-level Access Control in Solr

2010-04-11 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855698#action_12855698
 ] 

Peter Sturge edited comment on SOLR-1872 at 4/11/10 6:18 AM:
-

This update adds in optional auditing of searches by users and failed access 
attempts, plus a few minor tweaks.

To configure auditing, here is a sample searchComponent section from 
solrconfig.xml:

{code}
  
  
  zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2
  acl.xml
  
  true
  10
  1
  audit.log
  
  5
  15
 
{code}



  was (Author: midiman):
This update adds in optional auditing of searches by users and failed 
access attempts, plus a few minor tweaks.

To configure auditing, here is a sample searchComponent section from 
solrconfig.xml:

{{
  
  zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2
  acl.xml
  
  true
  10
  1
  audit.log
  
  5
  15
 }}

  
> Document-level Access Control in Solr
> -
>
> Key: SOLR-1872
> URL: https://issues.apache.org/jira/browse/SOLR-1872
> Project: Solr
>  Issue Type: New Feature
>  Components: SearchComponents - other
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SolrACLSecurity.java, SolrACLSecurity.rar
>
>

[jira] Issue Comment Edited: (SOLR-1872) Document-level Access Control in Solr

2010-04-11 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855698#action_12855698
 ] 

Peter Sturge edited comment on SOLR-1872 at 4/11/10 6:16 AM:
-

This update adds in optional auditing of searches by users and failed access 
attempts, plus a few minor tweaks.

To configure auditing, here is a sample searchComponent section from 
solrconfig.xml:

{{
  
  zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2
  acl.xml
  
  true
  10
  1
  audit.log
  
  5
  15
 }}


  was (Author: midiman):
This update adds in optional auditing of searches by users and failed 
access attempts, plus a few minor tweaks.

To configure auditing, here is a sample searchComponent section from 
solrconfig.xml:

{{
  
  
  zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2
  acl.xml
  
  true
  10
  1
  audit.log
  
  5
  15
 
}}

  
> Document-level Access Control in Solr
> -
>
> Key: SOLR-1872
> URL: https://issues.apache.org/jira/browse/SOLR-1872
> Project: Solr
>  Issue Type: New Feature
>  Components: SearchComponents - other
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SolrACLSecurity.java, SolrACLSecurity.rar
>
>

[jira] Updated: (SOLR-1872) Document-level Access Control in Solr

2010-04-11 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1872:
---

Attachment: SolrACLSecurity.java

This update adds in optional auditing of searches by users and failed access 
attempts, plus a few minor tweaks.

To configure auditing, here is a sample searchComponent section from 
solrconfig.xml:

{{
  
  
  zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2
  acl.xml
  
  true
  10
  1
  audit.log
  
  5
  15
 
}}


> Document-level Access Control in Solr
> -
>
> Key: SOLR-1872
> URL: https://issues.apache.org/jira/browse/SOLR-1872
> Project: Solr
>  Issue Type: New Feature
>  Components: SearchComponents - other
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SolrACLSecurity.java, SolrACLSecurity.rar
>
>

[jira] Updated: (SOLR-1872) Document-level Access Control in Solr

2010-04-08 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1872:
---

Attachment: SolrACLSecurity.rar

> Document-level Access Control in Solr
> -
>
> Key: SOLR-1872
> URL: https://issues.apache.org/jira/browse/SOLR-1872
> Project: Solr
>  Issue Type: New Feature
>  Components: SearchComponents - other
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SolrACLSecurity.rar
>
>

[jira] Created: (SOLR-1872) Document-level Access Control in Solr

2010-04-08 Thread Peter Sturge (JIRA)
Document-level Access Control in Solr
-

 Key: SOLR-1872
 URL: https://issues.apache.org/jira/browse/SOLR-1872
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SolrACLSecurity.rar

This issue relates to providing document-level access control for Solr index 
data.

A related JIRA issue is: SOLR-1834. I thought it would be best if I created a 
separate JIRA issue, rather than tack on to SOLR-1834, as the approach here is 
somewhat different, and I didn't want to confuse things or step on Anders' good 
work.

There have been lots of discussions about document-level access in Solr using 
LCF, custom components and the like. Access Control is one of those subjects 
that quickly spreads to lots of 'ratholes' to dive into. Even if not everyone 
agrees with the approaches taken here, it does, at the very least, highlight 
some of the salient issues surrounding access control in Solr, and will 
hopefully initiate a healthy discussion on the range of related requirements, 
with the aim of finding the optimum balance of requirements.

The approach taken here is document and schema agnostic - i.e. the access 
control is independent of what is or will be in the index, and no schema 
changes are required. This version doesn't include LDAP/AD integration, but 
it could be added relatively easily (see Anders' very good work on this in 
SOLR-1834). Note that this version doesn't deal with /update, 
/replication etc.; it's currently a /select thing (but it could 
be used for these).

This approach uses a SearchComponent subclass called SolrACLSecurity. Its 
configuration is read in from solrconfig.xml in the usual way, and the 
allow/deny configuration is split out into a config file called acl.xml.

acl.xml defines a number of users and groups (and 1 global for 'everyone'), and 
assigns 0 or more {{<allow>}} and/or {{<deny>}} elements.
When the SearchComponent is initialized, user objects are created and cached, 
including an 'allow' list and a 'deny' list.
When a request comes in, these lists are used to build filter queries ('allows' 
are OR'ed and 'denies' are NAND'ed), and then added to the query request.

Because the allow and deny elements are simply subsearch queries (e.g. 
{{somefield:secret}}), this mechanism will work on any 
stored data that can be queried, including already existing data.
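
As a rough illustration of how the allow and deny lists might be combined into a single filter query, here is a minimal sketch. It is not taken from SolrACLSecurity.java: the class and method names are hypothetical, 'NAND'ed' is read here as excluding each deny clause with AND NOT, and falling back to {{*:*}} when a user has no allow clauses is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of the attached source. */
final class AclFilterSketch {

    /**
     * Builds one filter-query string: allow clauses are OR'ed together,
     * then every deny clause is excluded with AND NOT.
     */
    static String buildFilter(List<String> allows, List<String> denies) {
        // Assumption: with no allow clauses, start from "match everything".
        String allowPart = allows.isEmpty()
                ? "*:*"
                : allows.stream()
                        .map(q -> "(" + q + ")")
                        .collect(Collectors.joining(" OR ", "(", ")"));

        StringBuilder fq = new StringBuilder(allowPart);
        for (String deny : denies) {
            fq.append(" AND NOT (").append(deny).append(")");
        }
        return fq.toString();
    }

    public static void main(String[] args) {
        // prints: ((department:sales) OR (department:marketing)) AND NOT (somefield:secret)
        System.out.println(buildFilter(
                List.of("department:sales", "department:marketing"),
                List.of("somefield:secret")));
    }
}
```

The resulting string would be added to the request as an extra fq parameter, so the user's own query is left untouched and only the visible document set is narrowed.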

Authentication
One of the sticky problems with access control is how to determine who's asking 
for data. There are many approaches, and to stay in the generic vein the 
current mechanism uses http parameters for this.
For an initial search, a client includes a {{username=somename}} parameter and 
a {{hash=pwdhash}} hash of its password. If the request sends the correct 
parameters, the search is granted and a uuid parameter is returned in the 
response header. This uuid can then be used in subsequent requests from the 
client. If the request is wrong, the SearchComponent fails and will increment 
the user's failed login count (if a valid user was specified). If this count 
exceeds the configured lockoutThreshold, no further requests are granted until 
the lockoutTime has elapsed.
This mechanism protects against some types of attacks (e.g. CRLF, dictionary 
etc.), but it really needs container HTTPS as well (as would most other auth 
implementations). Incorporating SSL certificates for authentication and making 
the authentication mechanism pluggable would be a nice improvement (i.e. 
separate authentication from access control).
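
The failed-login counting and lockout described above could look roughly like the following sketch. The class and method names are hypothetical and not taken from the attached source; only lockoutThreshold and lockoutTime come from the description. Resetting the counter after a successful login, and passing the clock in as a parameter, are assumptions made to keep the example small and testable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lockout tracker for illustration; not part of the attached source. */
final class LockoutSketch {
    private final int lockoutThreshold;   // failures tolerated before locking
    private final long lockoutMillis;     // how long a lockout lasts
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Long> lockedUntil = new HashMap<>();

    LockoutSketch(int lockoutThreshold, long lockoutMillis) {
        this.lockoutThreshold = lockoutThreshold;
        this.lockoutMillis = lockoutMillis;
    }

    /** Returns true if the request may proceed; records a failure otherwise. */
    synchronized boolean attempt(String user, boolean credentialsOk, long nowMillis) {
        Long until = lockedUntil.get(user);
        if (until != null) {
            if (nowMillis < until) {
                return false;             // still locked out, even with good credentials
            }
            lockedUntil.remove(user);     // lockout has elapsed, start afresh
            failures.remove(user);
        }
        if (credentialsOk) {
            failures.remove(user);        // assumption: success resets the counter
            return true;
        }
        int count = failures.merge(user, 1, Integer::sum);
        if (count > lockoutThreshold) {   // "exceeds the configured lockoutThreshold"
            lockedUntil.put(user, nowMillis + lockoutMillis);
        }
        return false;
    }
}
```

With a threshold of 2, the third consecutive failure locks the user out, and even a correct hash is refused until the lockout time has passed.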

Another issue is how internal searchers perform autowarming etc. The solution 
here is to use a local key called 'SolrACLSecurityKey'. This key is local and 
[should be] unique to that server. firstSearcher, newSearcher et al then 
include this key in their parameters so they can perform autowarming without 
constraint. Again, there are likely many ways to achieve this; this approach is 
but one.

The attached rar holds the source and associated configuration. This has been 
tested on the 1.4 release codebase (search in the attached solrconfig.xml for 
SolrACLSecurity to find the relevant sections in this file).

I hope this proves helpful for people who are looking for this sort of 
functionality in Solr, and more generally to address how such a mechanism could 
ultimately be integrated into a future Solr release.

Many thanks,
Peter





-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused

2010-04-03 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853168#action_12853168
 ] 

Peter Sturge commented on SOLR-1143:


This is a cool patch - yes, very useful.

I've found a couple of issues with it, though:

1. When going through the 'waiting for shard replies' loop, because no 
exception is thrown on shard failure, the next block after the loop can throw a 
NullPointerException in {{SearchComponent.handleResponses()}} for any 
SearchComponent that checks shard responses. It could be that this doesn't 
always happen, but it certainly happens in FacetComponent when date_facets are 
turned on.

2. There's a bit of code that sets {{partialResults=true}} if there's at least 
one failure, but it doesn't set it to false if everything's ok. In order for 
the patch to operate, this parameter must have already been present and true, 
otherwise the patch is essentially 'disabled' anyway (problem of using the same 
parameter as input and result).

I've made some modifications to the patch for these and a couple of other 
things:

1. FacetComponent modified to check for null shard response. Perhaps it would be 
better to check this in SearchHandler.handleResponses(), but then no 
SearchComponents would be contacted re failed shards, even if they don't care 
that it's failed (is that a good thing?).

2. Added a new CommonParams parameter called FAILED_SHARDS.
{{partialResults}} is now only an input parameter to enable the feature (Note: 
{{partialResults}} is referenced in RequestHandlerBase, but it's not from the 
patch - is this an existing parameter that is used for something else?! If so, 
perhaps the name should be changed to something like {{allowPartialResults}} to 
avoid b/w compat and other potential conflicts).
The output parameter that goes in the response header is now: 
{{failedShards=shard0;shard1;shardn}}. If everything succeeds, there will be no 
failedShards in the response header, otherwise, a list of failed shards is 
given. This is very useful to alert someone/something that a server/network 
needs attention (e.g. a health checker thread could run empty distributed 
searches solely for the purpose of checking status).

3. Changed the detection of a shard request error to be any Exception, rather 
than just ConnectException. This way, any failure is caught and can be 
actioned. Possible TODO: it might be nice to include a short message (Exception 
class name?) in the FAILED_SHARDS parameter about what failed (e.g. 
ConnectException, IOException, etc.). If you like this idea, please say so, and 
I'll include it - i.e. something like: 
{{
failedShards=myshard:8983/solr/core0|ConnectException;myothershard:8983/solr/core0|IOException}}
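
To show how a client such as a health checker might consume this value, here is a minimal parsing sketch. It assumes the proposed format above (entries separated by ';', with an optional '|' followed by the exception class name); the class and method names are hypothetical and nothing here is part of the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side parser for illustration; not part of the patch. */
final class FailedShardsSketch {

    /**
     * Splits a failedShards value into shard -> reason.
     * The reason is an empty string when no "|ExceptionName" suffix is present.
     */
    static Map<String, String> parse(String headerValue) {
        Map<String, String> result = new LinkedHashMap<>();
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return result;                // no failedShards entry: everything succeeded
        }
        for (String entry : headerValue.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int bar = trimmed.indexOf('|');
            if (bar < 0) {
                result.put(trimmed, "");
            } else {
                result.put(trimmed.substring(0, bar), trimmed.substring(bar + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> failed = parse(
                "myshard:8983/solr/core0|ConnectException;myothershard:8983/solr/core0|IOException");
        failed.forEach((shard, reason) -> System.out.println(shard + " failed: " + reason));
    }
}
```

The same parser accepts the plain {{shard0;shard1;shardn}} form, so a client would not need to change if the exception name is added later.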

I'm currently testing these changes in our internal build. In the meantime, any 
comments are greatly appreciated. If there are no objections, I'll add a patch 
update when the dev test run is complete.




> Return partial results when a connection to a shard is refused
> --
>
> Key: SOLR-1143
> URL: https://issues.apache.org/jira/browse/SOLR-1143
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Nicolas Dessaigne
>Assignee: Grant Ingersoll
> Fix For: 1.5
>
> Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch
>
>
> If any shard is down in a distributed search, a ConnectException is thrown.
> Here's a little patch that changes this behaviour: if we can't connect to a 
> shard (ConnectException), we get partial results from the active shards. As 
> for the TimeOut parameter (https://issues.apache.org/jira/browse/SOLR-502), we 
> set the parameter "partialResults" to true.
> This patch also addresses a problem expressed in the mailing list about a year 
> ago 
> (http://www.nabble.com/partialResults,-distributed-search---SOLR-502-td19002610.html)
> We have a use case that needs this behaviour and we would like to know your 
> thoughts about such a behaviour. Should it be the default behaviour for 
> distributed search?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries

2010-04-02 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1861:
---

Attachment: SearchHandler.java

A small update to this patch to support distributed searches with multiple 
cores.


> HTTP Authentication for sharded queries
> ---
>
> Key: SOLR-1861
> URL: https://issues.apache.org/jira/browse/SOLR-1861
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SearchHandler.java, SearchHandler.java
>
>
> This issue came out of a requirement to have HTTP authentication for queries. 
> Currently, HTTP authentication works for querying single servers, but it's 
> not possible for distributed searches across multiple shards to receive 
> authenticated http requests.
> This patch adds the option for Solr clients to pass shard-specific http 
> credentials to SearchHandler, which can then use these credentials when 
> making http requests to shards.
> Here's how the patch works:
> A final constant String called {{shardcredentials}} acts as the name of the 
> SolrParams parameter key name.
> The format for the value associated with this key is a comma-delimited list 
> of colon-separated tokens:
> {{   
> shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN
>   }}
> A client adds these parameters to their sharded request. 
> In the absence of {{shardcredentials}} and/or matching credentials, the patch 
> reverts to the existing behaviour of using a default http client (i.e. no 
> credentials). This ensures b/w compatibility.
> When SearchHandler receives the request, it passes the 'shardcredentials' 
> parameter to the HttpCommComponent via the submit() method.
> The HttpCommComponent parses the parameter string, and when it finds matching 
> credentials for a given shard, it creates an HttpClient object with those 
> credentials, and then sends the request using this.
> Note: Because the match comparison is a string compare (a.o.t. dns compare), 
> the host/ip names used in the shardcredentials parameters must match those 
> used in the shards parameter.
> Impl Notes:
> This patch is used and tested on the 1.4 release codebase. There weren't any 
> significant diffs between the 1.4 release and the latest trunk for 
> SearchHandler, so should be fine on other trunks, but I've only tested with 
> the 1.4 release code base.




[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries

2010-04-02 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1861:
---

Attachment: SearchHandler.java

Apologies that this is the source file and not a diff'ed patch file.

I've tried so many Win doze svn products, but I just can't get them to create a 
patch file (I'm sure this is more down to me not configuring them correctly, 
rather than rapidsvn, visualsvn, Tortoisesvn etc.).
If someone would like to create a patch file from this source, that would be 
extraordinarily kind of you!
In any case, the changes to this file are quite straightforward.


> HTTP Authentication for sharded queries
> ---
>
> Key: SOLR-1861
> URL: https://issues.apache.org/jira/browse/SOLR-1861
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SearchHandler.java
>
>
> This issue came out of a requirement to have HTTP authentication for queries. 
> Currently, HTTP authentication works for querying single servers, but it's 
> not possible for distributed searches across multiple shards to receive 
> authenticated http requests.
> This patch adds the option for Solr clients to pass shard-specific http 
> credentials to SearchHandler, which can then use these credentials when 
> making http requests to shards.
> Here's how the patch works:
> A final constant String called {{shardcredentials}} acts as the name of the 
> SolrParams parameter key name.
> The format for the value associated with this key is a comma-delimited list 
> of colon-separated tokens:
> {{   
> shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN
>   }}
> A client adds these parameters to their sharded request. 
> In the absence of {{shardcredentials}} and/or matching credentials, the patch 
> reverts to the existing behaviour of using a default http client (i.e. no 
> credentials). This ensures b/w compatibility.
> When SearchHandler receives the request, it passes the 'shardcredentials' 
> parameter to the HttpCommComponent via the submit() method.
> The HttpCommComponent parses the parameter string, and when it finds matching 
> credentials for a given shard, it creates an HttpClient object with those 
> credentials, and then sends the request using this.
> Note: Because the match comparison is a string compare (a.o.t. dns compare), 
> the host/ip names used in the shardcredentials parameters must match those 
> used in the shards parameter.
> Impl Notes:
> This patch is used and tested on the 1.4 release codebase. There weren't any 
> significant diffs between the 1.4 release and the latest trunk for 
> SearchHandler, so should be fine on other trunks, but I've only tested with 
> the 1.4 release code base.




[jira] Created: (SOLR-1861) HTTP Authentication for sharded queries

2010-04-02 Thread Peter Sturge (JIRA)
HTTP Authentication for sharded queries
---

 Key: SOLR-1861
 URL: https://issues.apache.org/jira/browse/SOLR-1861
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor


This issue came out of a requirement to have HTTP authentication for queries. 
Currently, HTTP authentication works for querying single servers, but it's not 
possible for distributed searches across multiple shards to receive 
authenticated http requests.

This patch adds the option for Solr clients to pass shard-specific http 
credentials to SearchHandler, which can then use these credentials when making 
http requests to shards.

Here's how the patch works:

A final constant String called {{shardcredentials}} acts as the name of the 
SolrParams parameter key name.
The format for the value associated with this key is a comma-delimited list of 
colon-separated tokens:
{{   
shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN
  }}
A client adds these parameters to their sharded request. 
In the absence of {{shardcredentials}} and/or matching credentials, the patch 
reverts to the existing behaviour of using a default http client (i.e. no 
credentials). This ensures b/w compatibility.

When SearchHandler receives the request, it passes the 'shardcredentials' 
parameter to the HttpCommComponent via the submit() method.
The HttpCommComponent parses the parameter string, and when it finds matching 
credentials for a given shard, it creates an HttpClient object with those 
credentials, and then sends the request using this.
Note: Because the match comparison is a string compare (a.o.t. dns compare), 
the host/ip names used in the shardcredentials parameters must match those used 
in the shards parameter.
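
To illustrate the parameter format described above, here is a minimal sketch of how a {{shardcredentials}} value could be parsed and matched against a shard string. It is not the code from the attached SearchHandler.java: the class, field and method names are hypothetical, and it assumes that a shard entry such as host:port/solr/core0 is matched on its host:port prefix by plain string comparison.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the shardcredentials format; not the patch code.
class ShardCredentials {
    final String username;
    final String password;

    ShardCredentials(String username, String password) {
        this.username = username;
        this.password = password;
    }

    /** Parses "host:port:user:pass,host:port:user:pass" into a map keyed by "host:port". */
    static Map<String, ShardCredentials> parse(String param) {
        Map<String, ShardCredentials> byShard = new LinkedHashMap<>();
        if (param == null || param.trim().isEmpty()) {
            return byShard; // no credentials supplied: callers fall back to the default client
        }
        for (String entry : param.split(",")) {
            // Limit of 4 keeps any further ':' characters inside the password token.
            String[] tokens = entry.trim().split(":", 4);
            if (tokens.length != 4) {
                continue; // skip malformed entries
            }
            byShard.put(tokens[0] + ":" + tokens[1],
                    new ShardCredentials(tokens[2], tokens[3]));
        }
        return byShard;
    }

    /** Returns credentials for a shard such as "host:port/solr/core0", or null if none match. */
    static ShardCredentials find(Map<String, ShardCredentials> byShard, String shard) {
        int slash = shard.indexOf('/');
        String hostPort = slash < 0 ? shard : shard.substring(0, slash);
        return byShard.get(hostPort); // exact string match, no DNS resolution
    }

    public static void main(String[] args) {
        Map<String, ShardCredentials> creds = parse("solr1:8983:alice:secret,solr2:8983:bob:pw");
        ShardCredentials c = find(creds, "solr1:8983/solr/core0");
        System.out.println(c == null ? "no credentials" : "user " + c.username);
    }
}
```

A null result from the lookup corresponds to the fallback case described above, where the request goes out with the default http client and no credentials. Note that a comma in a password would break this simple split; the format as described does not define any escaping.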

Impl Notes:
This patch is used and tested on the 1.4 release codebase. There weren't any 
significant diffs between the 1.4 release and the latest trunk for 
SearchHandler, so should be fine on other trunks, but I've only tested with the 
1.4 release code base.





[jira] Commented: (SOLR-1672) RFE: facet reverse sort count

2010-03-26 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850159#action_12850159
 ] 

Peter Sturge commented on SOLR-1672:


I agree there's some refactoring to do to bring it in line with current 
FacetParams conventions. At the same time, it would be good to look at wrapping 
up the functionality into a method, and covering all the code paths in the way 
you describe.

I've been wanting to get to finishing off this patch, but I'm in the throes of 
a product release myself, so I've not had many spare cycles.

You mention termenum, fieldcache, uninverted - presumably, these are among the 
code paths that need to cater for facet counts. If you know them, can you add a 
comment here that lists all the areas that need to be catered for, so that none 
are left out (if it's more than those 3)?

Thanks!
Peter


> RFE: facet reverse sort count
> -
>
> Key: SOLR-1672
> URL: https://issues.apache.org/jira/browse/SOLR-1672
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Java, Solrj, http
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SOLR-1672.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> As suggested by Chris Hostetter, I have added an optional Comparator to the 
> BoundedTreeSet in the UnInvertedField class.
> This optional comparator is used when a new (and also optional) field facet 
> parameter called 'facet.sortorder' is set to the string 'dsc' 
> (e.g. &f.<fieldname>.facet.sortorder=dsc for per field, or 
> &facet.sortorder=dsc for all facets).
> Note that this parameter has no effect if facet.method=enum.
> Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to 
> its default behaviour.
>  
> This change affects 2 source files:
> > UnInvertedField.java
> [line 438] The getCounts() method signature is modified to add the 
> 'facetSortOrder' parameter value to the end of the argument list.
>  
> DIFF UnInvertedField.java:
> - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
> offset, int limit, Integer mincount, boolean missing, String sort, String 
> prefix) throws IOException {
> + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
> offset, int limit, Integer mincount, boolean missing, String sort, String 
> prefix, String facetSortOrder) throws IOException {
> [line 556] The getCounts() method is modified to create an overridden 
> BoundedTreeSet(int, Comparator) if the 'facetSortOrder' parameter 
> equals 'dsc'.
> DIFF UnInvertedField.java:
> - final BoundedTreeSet queue = new BoundedTreeSet(maxsize);
> + final BoundedTreeSet queue = (sort.equals("count") || 
> sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new 
> BoundedTreeSet(maxsize, new Comparator()
> { @Override
> public int compare(Object o1, Object o2)
> {
>   if (o1 == null || o2 == null)
> return 0;
>   int result = ((Long) o1).compareTo((Long) o2);
>   return (result != 0 ? result > 0 ? -1 : 1 : 0); //lowest number first sort
> }}) : new BoundedTreeSet(maxsize)) : null;
> > SimpleFacets.java
> [line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to 
> retrieve the new parameter, if present. 'asc' used as a default value.
> DIFF SimpleFacets.java:
> + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", 
> "asc");
>  
> [line 253] The call to uif.getCounts() in the getTermCounts() method is 
> modified to pass the 'facetSortOrder' value string.
> DIFF SimpleFacets.java:
> - counts = uif.getCounts(searcher, base, offset, limit, 
> mincount,missing,sort,prefix);
> + counts = uif.getCounts(searcher, base, offset, limit, 
> mincount,missing,sort,prefix, facetSortOrder);
> Implementation Notes:
> I have noted in testing that I was not able to retrieve any '0' counts as I 
> had expected.
> I believe this could be because there appear to be some optimizations in 
> SimpleFacets/count caching such that zero counts are not iterated (at least 
> not by default)
> as a performance enhancement.
> I could be wrong about this, and zero counts may appear under some other as 
> yet untested circumstances. Perhaps an expert familiar with this part of the 
> code can clarify.
> In fact, this is not such a bad thing (at least for my requirements), as a 
> whole bunch of zero counts is not necessarily useful (for my requirements, 
> starting at '1' is just right).
>  
> There may, however, be instances where someone *will* want zero counts - e.g. 
> searching for zero product stock counts (e.g. 'what have we run out of'). I 
> was envisioning the facet.mincount field
> being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 
> or possibly higher), but because of the caching/optimizati

[jira] Updated: (SOLR-1729) Date Facet now override time parameter

2010-02-17 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1729:
---

Attachment: UnInvertedField.java

Hi Thomas,

Thanks for catching this. I thought I'd attached that one. *sigh* Honestly, 
that is really slack of me - many apologies.
The attached UnInvertedField.java has the updated getCounts() method. Any 
troubles, let me know.

Thanks!
Peter


> Date Facet now override time parameter
> --
>
> Key: SOLR-1729
> URL: https://issues.apache.org/jira/browse/SOLR-1729
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: FacetParams.java, SimpleFacets.java, UnInvertedField.java
>
>
> This PATCH introduces a new query parameter that tells a (typically, but not 
> necessarily) remote server what time to use as 'NOW' when calculating date 
> facets for a query (and, for the moment, date facets *only*) - overriding the 
> default behaviour of using the local server's current time.
> This gets 'round a problem whereby an explicit time range is specified in a 
> query (e.g. timestamp:[then0 TO then1]), and date facets are required for the 
> given time range (in fact, any explicit time range). 
> Because DateMathParser performs all its calculations from 'NOW', remote 
> callers have to work out how long ago 'then0' and 'then1' are from 'now', and 
> use the relative-to-now values in the facet.date.xxx parameters. If a remote 
> server has a different opinion of NOW compared to the caller, the results 
> will be skewed (e.g. they are in a different time-zone, not time-synced etc.).
> This becomes particularly salient when performing distributed date faceting 
> (see SOLR-1709), where multiple shards may all be running with different 
> times, and the faceting needs to be aligned.
> The new parameter is called 'facet.date.now', and takes as a parameter a 
> (stringified) long that is the number of milliseconds from the epoch (1 Jan 
> 1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. 
> This was chosen over a formatted date to delineate it from a 'searchable' 
> time and to avoid superfluous date parsing. This makes the value generally a 
> programmatically-set value, but as that is where the use-case is for this type 
> of parameter, this should be ok.
> NOTE: This parameter affects date facet timing only. If there are other areas 
> of a query that rely on 'NOW', these will not interpret this value. This is a 
> broader issue about setting a 'query-global' NOW that all parts of query 
> analysis can share.
> Source files affected:
> FacetParams.java   (holds the new constant FACET_DATE_NOW)
> SimpleFacets.java  getFacetDateCounts() NOW parameter modified
> This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as 
> it's a general change for date faceting, it was deemed deserving of its own 
> patch. I will be updating SOLR-1709 in due course to include the use of this 
> new parameter, after some rfc acceptance.
> A possible enhancement to this is to detect facet.date fields, look for and 
> match these fields in queries (if they exist), and potentially determine 
> automatically the required time skew, if any. There are a whole host of 
> reasons why this could be problematic to implement, so an explicit 
> facet.date.now parameter is the safest route.
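
As a small illustration of how a caller might use the parameter described above, the sketch below builds a query string that pins 'NOW' for date faceting. The facet.date.* names are the standard date faceting parameters and facet.date.now is the parameter introduced by this patch; the class name, method name and the field and range values in main() are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper; assumes a server with the SOLR-1729 patch applied.
class DateFacetNowQuery {

    /** Builds a URL-encoded query string that carries an explicit 'NOW' in epoch milliseconds. */
    static String build(long nowMillis, String field, String start, String end, String gap) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("facet.date", field);
        params.put("facet.date.start", start);
        params.put("facet.date.end", end);
        params.put("facet.date.gap", gap);
        // Stringified long, as returned by System.currentTimeMillis() on the caller.
        params.put("facet.date.now", Long.toString(nowMillis));

        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis(); // the caller's idea of NOW
        System.out.println(build(now, "timestamp", "NOW/DAY-1DAY", "NOW/DAY", "+1HOUR"));
    }
}
```

Sending the same facet.date.now value to every shard is what keeps the date math aligned when the servers' clocks disagree.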




[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-02-16 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834222#action_12834222
 ] 

Peter Sturge commented on SOLR-1709:


Hi Thomas,

Hmmm...TermsHelper is an inner class inside TermsComponent.
In the code base that I have, this class exists within TermsComponent. I've 
just had a look on the 
http://mirrors.dedipower.com/ftp.apache.org/lucene/solr/1.4.0/ mirror, and the 
TermsComponent *doesn't* have this inner class.

Not sure where the difference is, as I would have got my codebase from the same 
set of mirrors as you (unless some mirrors are out-of-sync?). 

TermsComponent hasn't changed in this patch, so I don't know much about this 
class. One thing to try is to diff the 2 files above with your 1.4 codebase, 
and merge the changes into your codebase. The differences should be very easy 
to see.

This does highlight the very good policy of attaching patch files rather than 
source files. This is my fault, as we don't use svn in our (win) 
environment, and Tortoise SVN crashes explorer64, so I'm not able to make 
compatible diff files - sorry.

If you do create a couple of diff files, it would be very kind of you if you 
could post them up on this issue for others?

Thanks!


> Distributed Date Faceting
> -
>
> Key: SOLR-1709
> URL: https://issues.apache.org/jira/browse/SOLR-1709
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: FacetComponent.java, FacetComponent.java, 
> ResponseBuilder.java
>
>
> This patch is for adding support for date facets when using distributed 
> searches.
> Date faceting across multiple machines exposes some time-based issues that 
> anyone interested in this behaviour should be aware of:
> Any time and/or time-zone differences are not accounted for in the patch 
> (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
> 'instant-in-time', unless all shards are time-synced to the exact same time).
> The implementation uses the first encountered shard's facet_dates as the 
> basis for subsequent shards' data to be merged in.
> This means that if subsequent shards' facet_dates are skewed in relation to 
> the first by >1 'gap', these 'earlier' or 'later' facets will not be merged 
> in.
> There are several reasons for this:
>   * Performance: It's faster to check facet_date lists against a single map's 
> data, rather than against each other, particularly if there are many shards
>   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
> time range larger than that which was requested
> (e.g. a request for one hour's worth of facets could bring back 2, 3 
> or more hours of data)
> This could be dealt with if timezone and skew information was added, and 
> the dates were normalized.
> One possibility for adding such support is to [optionally] add 'timezone' and 
> 'now' parameters to the 'facet_dates' map. This would tell requesters what 
> time and TZ the remote server thinks it is, and so multiple shards' time data 
> can be normalized.
> The patch affects 2 files in the Solr core:
>   org.apache.solr.handler.component.FacetComponent.java
>   org.apache.solr.handler.component.ResponseBuilder.java
> The main changes are in FacetComponent - ResponseBuilder is just to hold the 
> completed SimpleOrderedMap until the finishStage.
> One possible enhancement is to perhaps make this an optional parameter, but 
> really, if facet.date parameters are specified, it is assumed they are 
> desired.
> Comments & suggestions welcome.
> As a favour to ask, if anyone could take my 2 source files and create a PATCH 
> file from it, it would be greatly appreciated, as I'm having a bit of trouble 
> with svn (don't shoot me, but my environment is a Redmond-based os company).




[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-02-03 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829043#action_12829043
 ] 

Peter Sturge commented on SOLR-1729:


Hi Chris,
Thanks for your comments - I hope I didn't sound like your comments were taken 
wrongly - I absolutely count on comments from you and other experts to make 
sure I'm not missing some important functionality and/or side effect. You know 
the code base far better than I, so it's great that you take the time to point 
out all the different bits and pieces that need addressing.

I can certainly understand the need to address the 'core-global' issues raised 
by you and Yonik for storing a ThreadLocal 'query-global' 'NOW'.
I suppose the main issue in implementing the thread-local route is that we'd 
have to make sure we found every place in the query core that references now, 
and point those references to the new variable? If the 'code-at-large' 
[hopefully] always calls the date math routines for finding 'NOW', great, it 
should be relatively straightforward. If there are any stray e.g. 
System.currentTimeMillis(), then it's a bit more fiddly, but still do-able.

??it's all handled internally by DateField??
Sounds like DateField would be the best candidate for holding the ThreadLocal? The 
query handler code can set the variable of its DateField instance if it's set 
in a query parameter, otherwise it just defaults to its own local (UTC) time.
Could be done similarly to DateField.ThreadLocalDateFormat, perhaps?
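
To make the thread-local idea a bit more concrete, here is a minimal sketch of a per-request 'NOW' holder. It is only an illustration of the pattern being discussed: the class and method names are hypothetical, and nothing here is existing Solr or DateField code.

```java
// Hypothetical per-thread holder for a query-global NOW; not existing Solr code.
class QueryNow {
    private static final ThreadLocal<Long> NOW = new ThreadLocal<>();

    /** Called by the request handler when the query carries an explicit NOW (epoch millis). */
    static void set(long epochMillis) {
        NOW.set(epochMillis);
    }

    /** Returns the explicit NOW for this thread, or the local clock if none was set. */
    static long get() {
        Long explicit = NOW.get();
        return explicit != null ? explicit : System.currentTimeMillis();
    }

    /** Must be called when the request finishes, since servlet threads are pooled and reused. */
    static void clear() {
        NOW.remove();
    }

    public static void main(String[] args) {
        QueryNow.set(1265200000000L);
        try {
            System.out.println("NOW for this request: " + QueryNow.get());
        } finally {
            QueryNow.clear();
        }
        System.out.println("NOW without override: " + QueryNow.get());
    }
}
```

Any date math that asks this holder for 'NOW' would pick up the override; stray System.currentTimeMillis() calls elsewhere would not, which is the 'fiddly' part mentioned above.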


> Date Facet now override time parameter
> --
>
> Key: SOLR-1729
> URL: https://issues.apache.org/jira/browse/SOLR-1729
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: FacetParams.java, SimpleFacets.java
>
>
> This PATCH introduces a new query parameter that tells a (typically, but not 
> necessarily) remote server what time to use as 'NOW' when calculating date 
> facets for a query (and, for the moment, date facets *only*) - overriding the 
> default behaviour of using the local server's current time.
> This gets 'round a problem whereby an explicit time range is specified in a 
> query (e.g. timestamp:[then0 TO then1]), and date facets are required for the 
> given time range (in fact, any explicit time range). 
> Because DateMathParser performs all its calculations from 'NOW', remote 
> callers have to work out how long ago 'then0' and 'then1' are from 'now', and 
> use the relative-to-now values in the facet.date.xxx parameters. If a remote 
> server has a different opinion of NOW compared to the caller, the results 
> will be skewed (e.g. they are in a different time-zone, not time-synced etc.).
> This becomes particularly salient when performing distributed date faceting 
> (see SOLR-1709), where multiple shards may all be running with different 
> times, and the faceting needs to be aligned.
> The new parameter is called 'facet.date.now', and takes as a parameter a 
> (stringified) long that is the number of milliseconds from the epoch (1 Jan 
> 1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. 
> This was chosen over a formatted date to delineate it from a 'searchable' 
> time and to avoid superfluous date parsing. This makes the value generally a 
> programmatically-set value, but as that is where the use-case is for this type 
> of parameter, this should be ok.
> NOTE: This parameter affects date facet timing only. If there are other areas 
> of a query that rely on 'NOW', these will not interpret this value. This is a 
> broader issue about setting a 'query-global' NOW that all parts of query 
> analysis can share.
> Source files affected:
> FacetParams.java   (holds the new constant FACET_DATE_NOW)
> SimpleFacets.java  getFacetDateCounts() NOW parameter modified
> This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as 
> it's a general change for date faceting, it was deemed deserving of its own 
> patch. I will be updating SOLR-1709 in due course to include the use of this 
> new parameter, after some rfc acceptance.
> A possible enhancement to this is to detect facet.date fields, look for and 
> match these fields in queries (if they exist), and potentially determine 
> automatically the required time skew, if any. There are a whole host of 
> reasons why this could be problematic to implement, so an explicit 
> facet.date.now parameter is the safest route.




[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-01-28 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805995#action_12805995
 ] 

Peter Sturge commented on SOLR-1729:


??...they might not all get queried at the exact same time??

I suppose this is what the explicit 'NOW' is meant to resolve - 
staggered/lagged receipt/response, and, in an ersatz fashion, discrepancies in 
local time sync. Since the passed-in 'NOW' is relative only to the epoch, 
network latency is handled, and time-sync on any given server is assumed to be 
correct.

??...multiple requests might be made to a single server for different phrases of 
the distributed request that expect to get the same answers.??

As long as the same code path is followed for such requests, it should honour 
the same (passed-in) 'NOW'. Are there scenarios where this is not the case? In 
which case, yes, these would need to be addressed.

??...unless filter queries that use date math also respect it the counts 
returned from date faceting will still potentially be non-sensical.??

Filter queries will definitely need to get/use/honour the same 'NOW' as their 
corresponding query, otherwise anarchy will quickly ensue.
Can you point me toward the class(es) where filter queries' date math lives, 
and I'll have a look? As filter queries are cached separately, can you think of 
any potential caching issues relating to filter queries?


> Date Facet now override time parameter
> --
>
> Key: SOLR-1729
> URL: https://issues.apache.org/jira/browse/SOLR-1729
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: FacetParams.java, SimpleFacets.java
>
>
> This PATCH introduces a new query parameter that tells a (typically, but not 
> necessarily) remote server what time to use as 'NOW' when calculating date 
> facets for a query (and, for the moment, date facets *only*) - overriding the 
> default behaviour of using the local server's current time.
> This gets 'round a problem whereby an explicit time range is specified in a 
> query (e.g. timestamp:[then0 TO then1]), and date facets are required for the 
> given time range (in fact, any explicit time range). 
> Because DateMathParser performs all its calculations from 'NOW', remote 
> callers have to work out how long ago 'then0' and 'then1' are from 'now', and 
> use the relative-to-now values in the facet.date.xxx parameters. If a remote 
> server has a different opinion of NOW compared to the caller, the results 
> will be skewed (e.g. they are in a different time-zone, not time-synced etc.).
> This becomes particularly salient when performing distributed date faceting 
> (see SOLR-1709), where multiple shards may all be running with different 
> times, and the faceting needs to be aligned.
> The new parameter is called 'facet.date.now', and takes as a parameter a 
> (stringified) long that is the number of milliseconds from the epoch (1 Jan 
> 1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. 
> This was chosen over a formatted date to delineate it from a 'searchable' 
> time and to avoid superfluous date parsing. This makes the value generally a 
> programmatically-set value, but as that is where the use-case is for this type 
> of parameter, this should be ok.
> NOTE: This parameter affects date facet timing only. If there are other areas 
> of a query that rely on 'NOW', these will not interpret this value. This is a 
> broader issue about setting a 'query-global' NOW that all parts of query 
> analysis can share.
> Source files affected:
> FacetParams.java   (holds the new constant FACET_DATE_NOW)
> SimpleFacets.java  getFacetDateCounts() NOW parameter modified
> This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as 
> it's a general change for date faceting, it was deemed deserving of its own 
> patch. I will be updating SOLR-1709 in due course to include the use of this 
> new parameter, after some rfc acceptance.
> A possible enhancement to this is to detect facet.date fields, look for and 
> match these fields in queries (if they exist), and potentially determine 
> automatically the required time skew, if any. There are a whole host of 
> reasons why this could be problematic to implement, so an explicit 
> facet.date.now parameter is the safest route.




[jira] Commented: (SOLR-1729) Date Facet now override time parameter

2010-01-22 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803860#action_12803860
 ] 

Peter Sturge commented on SOLR-1729:


I agree there are wider issues that relate to this -- this particular patch 
addresses the time sync issue so that distributed date facets can work.
In this case, all cores must use the same NOW, so that 
your date facets are consistent. In fact, it doesn't really matter which NOW 
you use, as long as they're all the same -- having the caller set the NOW value 
makes the most sense.

For other time-related queries, this might not be the case, but as you rightly 
pointed out, these are not addressed here.


> Date Facet now override time parameter
> --
>
> Key: SOLR-1729
> URL: https://issues.apache.org/jira/browse/SOLR-1729
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: FacetParams.java, SimpleFacets.java
>
>
> This PATCH introduces a new query parameter that tells a (typically, but not 
> necessarily) remote server what time to use as 'NOW' when calculating date 
> facets for a query (and, for the moment, date facets *only*) - overriding the 
> default behaviour of using the local server's current time.
> This gets 'round a problem whereby an explicit time range is specified in a 
> query (e.g. timestamp:[then0 TO then1]), and date facets are required for the 
> given time range (in fact, any explicit time range). 
> Because DateMathParser performs all its calculations from 'NOW', remote 
> callers have to work out how long ago 'then0' and 'then1' are from 'now', and 
> use the relative-to-now values in the facet.date.xxx parameters. If a remote 
> server has a different opinion of NOW compared to the caller, the results 
> will be skewed (e.g. they are in a different time-zone, not time-synced etc.).
> This becomes particularly salient when performing distributed date faceting 
> (see SOLR-1709), where multiple shards may all be running with different 
> times, and the faceting needs to be aligned.
> The new parameter is called 'facet.date.now', and takes as a parameter a 
> (stringified) long that is the number of milliseconds from the epoch (1 Jan 
> 1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. 
> This was chosen over a formatted date to distinguish it from a 'searchable' 
> time and to avoid superfluous date parsing. This makes the value generally a 
> programmatically set value, but as that is the intended use case for this type 
> of parameter, this should be ok.
> NOTE: This parameter affects date facet timing only. If there are other areas 
> of a query that rely on 'NOW', these will not interpret this value. This is a 
> broader issue about setting a 'query-global' NOW that all parts of query 
> analysis can share.
> Source files affected:
> FacetParams.java   (holds the new constant FACET_DATE_NOW)
> SimpleFacets.java  getFacetDateCounts() NOW parameter modified
> This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as 
> it's a general change for date faceting, it was deemed deserving of its own 
> patch. I will be updating SOLR-1709 in due course to include the use of this 
> new parameter, after some rfc acceptance.
> A possible enhancement to this is to detect facet.date fields, look for and 
> match these fields in queries (if they exist), and potentially determine 
> automatically the required time skew, if any. There are a whole host of 
> reasons why this could be problematic to implement, so an explicit 
> facet.date.now parameter is the safest route.




[jira] Commented: (SOLR-1672) RFE: facet reverse sort count

2010-01-22 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803858#action_12803858
 ] 

Peter Sturge commented on SOLR-1672:


Jan, you are absolutely correct that the parameter should (and will) be 'desc'.

I have an update in my queue of things to do which changes this, but also 
removes the new 'facet.sortorder' parameter, and includes instead 'facet.sort 
desc' as a valid parameter for facet.sort. This keeps things nice and tidy and 
consistent.

The 'facet.sortorder' parameter was really a POC to try out the behaviour 
before changing the core parameter syntax of the existing 'facet.sort' 
parameter. Now that's done, the parameter will be rolled into 'facet.sort'.

Thanks,
Peter


> RFE: facet reverse sort count
> -
>
> Key: SOLR-1672
> URL: https://issues.apache.org/jira/browse/SOLR-1672
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Java, Solrj, http
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SOLR-1672.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> As suggested by Chris Hostetter, I have added an optional Comparator to the 
> BoundedTreeSet in the UnInvertedField class.
> This optional comparator is used when a new (and also optional) field facet 
> parameter called 'facet.sortorder' is set to the string 'dsc' 
> (e.g. &f.<fieldname>.facet.sortorder=dsc for per field, or 
> &facet.sortorder=dsc for all facets).
> Note that this parameter has no effect if facet.method=enum.
> Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to 
> its default behaviour.
>  
> This change affects 2 source files:
> > UnInvertedField.java
> [line 438] The getCounts() method signature is modified to add the 
> 'facetSortOrder' parameter value to the end of the argument list.
>  
> DIFF UnInvertedField.java:
> - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
> offset, int limit, Integer mincount, boolean missing, String sort, String 
> prefix) throws IOException {
> + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
> offset, int limit, Integer mincount, boolean missing, String sort, String 
> prefix, String facetSortOrder) throws IOException {
> [line 556] The getCounts() method is modified to create an overridden 
> BoundedTreeSet(int, Comparator) if the 'facetSortOrder' parameter 
> equals 'dsc'.
> DIFF UnInvertedField.java:
> - final BoundedTreeSet queue = new BoundedTreeSet(maxsize);
> + final BoundedTreeSet queue = (sort.equals("count") || sort.equals("true"))
> +     ? (facetSortOrder.equals("dsc")
> +         ? new BoundedTreeSet(maxsize, new Comparator() {
> +             @Override
> +             public int compare(Object o1, Object o2) {
> +               if (o1 == null || o2 == null)
> +                 return 0;
> +               int result = ((Long) o1).compareTo((Long) o2);
> +               return (result != 0 ? result > 0 ? -1 : 1 : 0); // lowest number first sort
> +             }
> +           })
> +         : new BoundedTreeSet(maxsize))
> +     : null;
> > SimpleFacets.java
> [line 221] A getFieldParam(field, "facet.sortorder", "asc"); is added to 
> retrieve the new parameter, if present. 'asc' used as a default value.
> DIFF SimpleFacets.java:
> + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", 
> "asc");
>  
> [line 253] The call to uif.getCounts() in the getTermCounts() method is 
> modified to pass the 'facetSortOrder' value string.
> DIFF SimpleFacets.java:
> - counts = uif.getCounts(searcher, base, offset, limit, 
> mincount,missing,sort,prefix);
> + counts = uif.getCounts(searcher, base, offset, limit, 
> mincount,missing,sort,prefix, facetSortOrder);
> Implementation Notes:
> I have noted in testing that I was not able to retrieve any '0' counts as I 
> had expected.
> I believe this could be because there appear to be some optimizations in 
> SimpleFacets/count caching such that zero counts are not iterated (at least 
> not by default)
> as a performance enhancement.
> I could be wrong about this, and zero counts may appear under some other as 
> yet untested circumstances. Perhaps an expert familiar with this part of the 
> code can clarify.
> In fact, this is not such a bad thing (at least for my requirements), as a 
> whole bunch of zero counts is not necessarily useful (for my requirements, 
> starting at '1' is just right).
>  
> There may, however, be instances where someone *will* want zero counts - e.g. 
> searching for zero product stock counts (e.g. 'what have we run out of'). I 
> was envisioning the facet.mincount field
> being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 
> or possibly higher), but because of the caching/optimization, the behaviour 
> is somewhat different than expected.
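
The effect of passing a reversing Comparator to a bounded sorted set can be illustrated outside Solr. The following is a minimal sketch only, assuming a plain java.util.TreeSet as a stand-in for Solr's BoundedTreeSet; the class BoundedSortedCounts and its methods are hypothetical names and are not part of the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for a bounded sorted set; this is NOT Solr's BoundedTreeSet. */
final class BoundedSortedCounts {
    private final TreeSet<Long> set;
    private final int maxSize;

    BoundedSortedCounts(int maxSize, boolean reversed) {
        this.maxSize = maxSize;
        Comparator<Long> natural = Comparator.naturalOrder();
        // The optional comparator simply inverts the natural ordering of the keys.
        this.set = new TreeSet<>(reversed ? natural.reversed() : natural);
    }

    void add(long value) {
        set.add(value);
        // Keep only the first maxSize entries in comparator order.
        if (set.size() > maxSize) {
            set.pollLast();
        }
    }

    /** Returns the retained values in comparator order. */
    List<Long> values() {
        return new ArrayList<>(set);
    }
}
```

With the natural order the set keeps the smallest keys; with the reversed comparator it keeps the largest, which is the only behavioural difference the optional comparator introduces in this sketch.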


[jira] Updated: (SOLR-1709) Distributed Date Faceting

2010-01-21 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1709:
---

Attachment: FacetComponent.java

Updated version of FacetComponent.java after more testing and sync with 
FacetParams.FACET_DATE_NOW (see SOLR-1729).
For use with the 1.4 trunk (along with the existing ResponseBuilder.java in 
this patch).


> Distributed Date Faceting
> -
>
> Key: SOLR-1709
> URL: https://issues.apache.org/jira/browse/SOLR-1709
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: FacetComponent.java, FacetComponent.java, 
> ResponseBuilder.java
>
>
> This patch is for adding support for date facets when using distributed 
> searches.
> Date faceting across multiple machines exposes some time-based issues that 
> anyone interested in this behaviour should be aware of:
> Any time and/or time-zone differences are not accounted for in the patch 
> (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
> 'instant-in-time', unless all shards are time-synced to the exact same time).
> The implementation uses the first encountered shard's facet_dates as the 
> basis for subsequent shards' data to be merged in.
> This means that if subsequent shards' facet_dates are skewed in relation to 
> the first by >1 'gap', these 'earlier' or 'later' facets will not be merged 
> in.
> There are several reasons for this:
>   * Performance: It's faster to check facet_date lists against a single map's 
> data, rather than against each other, particularly if there are many shards
>   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
> time range larger than that which was requested
> (e.g. a request for one hour's worth of facets could bring back 2, 3 
> or more hours of data)
> This could be dealt with if timezone and skew information was added, and 
> the dates were normalized.
> One possibility for adding such support is to [optionally] add 'timezone' and 
> 'now' parameters to the 'facet_dates' map. This would tell requesters what 
> time and TZ the remote server thinks it is, and so multiple shards' time data 
> can be normalized.
> The patch affects 2 files in the Solr core:
>   org.apache.solr.handler.component.FacetComponent.java
>   org.apache.solr.handler.component.ResponseBuilder.java
> The main changes are in FacetComponent - ResponseBuilder is just to hold the 
> completed SimpleOrderedMap until the finishStage.
> One possible enhancement is to perhaps make this an optional parameter, but 
> really, if facet.date parameters are specified, it is assumed they are 
> desired.
> Comments & suggestions welcome.
> As a favour to ask, if anyone could take my 2 source files and create a PATCH 
> file from it, it would be greatly appreciated, as I'm having a bit of trouble 
> with svn (don't shoot me, but my environment is a Redmond-based os company).
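
The merge rule described above (the first shard's facet_dates form the basis, and buckets that only appear on later shards are not merged in) could be sketched roughly as follows. This is an illustration under assumptions, not the code in FacetComponent.java: the class DateFacetMerge, the method merge, and the use of plain maps keyed by date string are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the merge rule; not taken from the attached source. */
final class DateFacetMerge {

    /** Merges per-shard bucket counts, using the first shard's buckets as the basis. */
    static Map<String, Integer> merge(List<Map<String, Integer>> shardFacets) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        boolean first = true;
        for (Map<String, Integer> shard : shardFacets) {
            if (first) {
                // The first encountered shard defines which buckets exist.
                merged.putAll(shard);
                first = false;
                continue;
            }
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                // Buckets missing from the first shard are skipped,
                // so the merged range never grows beyond what was requested.
                merged.computeIfPresent(e.getKey(), (k, v) -> v + e.getValue());
            }
        }
        return merged;
    }
}
```

Each later shard is checked against a single map, which is the performance point made in the description; the cost is that skewed 'earlier' or 'later' buckets are dropped silently.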




[jira] Updated: (SOLR-1729) Date Facet now override time parameter

2010-01-21 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1729:
---

Attachment: FacetParams.java
SimpleFacets.java

These are the source files affected for this patch.
Apologies for not creating a PATCH file - my TortoiseSVN is not working for 
creating patch files.
If anyone would like to create a patch from these, that would be 
extraordinarily kind of you!

Diff: (trunk: 1.4 Release)
FacetParams.java:
Add at line 179:
  /**
   * String that tells the date facet counter what time to use as 'now'.
   * 
   * The value of this parameter, if it exists, must be a stringified long 
   * of the number of milliseconds since the epoch (1 Jan 1970 00:00).
   * System.currentTimeMillis() provides this.
   * 
   * The DateField and DateMathParser work out their times relative to 'now'.
   * By default, 'now' is the local machine's System.currentTimeMillis().
   * This parameter overrides the local value to use a different time.
   * This is very useful for remote server queries where the time on the 
   * querying machine is skewed/different from that of the date faceting machine.
   * This is a facet.date global query parameter (i.e. not per field).
   * @see DateMathParser
   * @see DateField
   */
  public static final String FACET_DATE_NOW = "facet.date.now";

SimpleFacets.java:
Change at line 551:
- final Date NOW = new Date();
+ final Date NOW = new Date(params.get(FacetParams.FACET_DATE_NOW) != null 
? Long.parseLong(params.get(FacetParams.FACET_DATE_NOW)) : System.currentTimeMillis());
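
As posted, the change calls Long.parseLong() directly, so a malformed facet.date.now value would raise a NumberFormatException. A more defensive variant might look like the sketch below. The class FacetDateNow and the method resolveNow are hypothetical names, and falling back to the local clock on bad input is an assumption made here, not behaviour specified by the patch.

```java
/** Hypothetical helper; not part of SimpleFacets.java. */
final class FacetDateNow {

    /**
     * Returns the caller-supplied epoch milliseconds if present and well formed,
     * otherwise the local time passed in.
     */
    static long resolveNow(String rawParam, long localNow) {
        if (rawParam == null || rawParam.trim().isEmpty()) {
            return localNow;
        }
        try {
            return Long.parseLong(rawParam.trim());
        } catch (NumberFormatException e) {
            // Assumption: fall back to the local clock rather than fail the request.
            return localNow;
        }
    }
}
```

A caller would use it along the lines of new Date(FacetDateNow.resolveNow(rawValue, System.currentTimeMillis())), where rawValue is whatever the request supplied for facet.date.now.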


> Date Facet now override time parameter
> --
>
> Key: SOLR-1729
> URL: https://issues.apache.org/jira/browse/SOLR-1729
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Solr 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: FacetParams.java, SimpleFacets.java
>
>
> This PATCH introduces a new query parameter that tells a (typically, but not 
> necessarily) remote server what time to use as 'NOW' when calculating date 
> facets for a query (and, for the moment, date facets *only*) - overriding the 
> default behaviour of using the local server's current time.
> This gets 'round a problem whereby an explicit time range is specified in a 
> query (e.g. timestamp:[then0 TO then1]), and date facets are required for the 
> given time range (in fact, any explicit time range). 
> Because DateMathParser performs all its calculations from 'NOW', remote 
> callers have to work out how long ago 'then0' and 'then1' are from 'now', and 
> use the relative-to-now values in the facet.date.xxx parameters. If a remote 
> server has a different opinion of NOW compared to the caller, the results 
> will be skewed (e.g. they are in a different time-zone, not time-synced etc.).
> This becomes particularly salient when performing distributed date faceting 
> (see SOLR-1709), where multiple shards may all be running with different 
> times, and the faceting needs to be aligned.
> The new parameter is called 'facet.date.now', and takes as a parameter a 
> (stringified) long that is the number of milliseconds from the epoch (1 Jan 
> 1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. 
> This was chosen over a formatted date to distinguish it from a 'searchable' 
> time and to avoid superfluous date parsing. This makes the value generally a 
> programmatically set value, but as that is the intended use case for this type 
> of parameter, this should be ok.
> NOTE: This parameter affects date facet timing only. If there are other areas 
> of a query that rely on 'NOW', these will not interpret this value. This is a 
> broader issue about setting a 'query-global' NOW that all parts of query 
> analysis can share.
> Source files affected:
> FacetParams.java   (holds the new constant FACET_DATE_NOW)
> SimpleFacets.java  getFacetDateCounts() NOW parameter modified
> This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as 
> it's a general change for date faceting, it was deemed deserving of its own 
> patch. I will be updating SOLR-1709 in due course to include the use of this 
> new parameter, after some rfc acceptance.
> A possible enhancement to this is to detect facet.date fields, look for and 
> match these fields in queries (if they exist), and potentially determine 
> automatically the required time skew, if any. There are a whole host of 
> reasons why this could be problematic to implement, so an explicit 
> facet.date.now parameter is the safest route.




[jira] Created: (SOLR-1729) Date Facet now override time parameter

2010-01-21 Thread Peter Sturge (JIRA)
Date Facet now override time parameter
--

 Key: SOLR-1729
 URL: https://issues.apache.org/jira/browse/SOLR-1729
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor


This PATCH introduces a new query parameter that tells a (typically, but not 
necessarily) remote server what time to use as 'NOW' when calculating date 
facets for a query (and, for the moment, date facets *only*) - overriding the 
default behaviour of using the local server's current time.

This gets 'round a problem whereby an explicit time range is specified in a 
query (e.g. timestamp:[then0 TO then1]), and date facets are required for the 
given time range (in fact, any explicit time range). 
Because DateMathParser performs all its calculations from 'NOW', remote callers 
have to work out how long ago 'then0' and 'then1' are from 'now', and use the 
relative-to-now values in the facet.date.xxx parameters. If a remote server has 
a different opinion of NOW compared to the caller, the results will be skewed 
(e.g. they are in a different time-zone, not time-synced etc.).
This becomes particularly salient when performing distributed date faceting 
(see SOLR-1709), where multiple shards may all be running with different times, 
and the faceting needs to be aligned.

The new parameter is called 'facet.date.now', and takes as a parameter a 
(stringified) long that is the number of milliseconds from the epoch (1 Jan 
1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. 
This was chosen over a formatted date to distinguish it from a 'searchable' time 
and to avoid superfluous date parsing. This makes the value generally a 
programmatically set value, but as that is the intended use case for this type 
of parameter, this should be ok.
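
As a rough illustration of how a caller might set the parameter, the sketch below builds a query string that carries the caller's own clock as facet.date.now alongside the usual facet.date.* parameters. The class DateFacetQuery, the method build, and the field name 'timestamp' used in the usage note are hypothetical; only the parameter names come from Solr and from this patch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side helper; not part of the patch. */
final class DateFacetQuery {

    /** Builds a URL-encoded query string for a date facet request with an explicit NOW. */
    static String build(String dateField, String start, String end, String gap, long nowMillis) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("facet.date", dateField);
        params.put("facet.date.start", start);
        params.put("facet.date.end", end);
        params.put("facet.date.gap", gap);
        // The caller's clock, as a stringified epoch-millisecond long.
        params.put("facet.date.now", Long.toString(nowMillis));

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }
}
```

For example, DateFacetQuery.build("timestamp", "NOW-2HOURS", "NOW", "+1HOUR", System.currentTimeMillis()) produces one string that can be sent unchanged to every server, so each one evaluates the relative expressions against the same NOW.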

NOTE: This parameter affects date facet timing only. If there are other areas 
of a query that rely on 'NOW', these will not interpret this value. This is a 
broader issue about setting a 'query-global' NOW that all parts of query 
analysis can share.

Source files affected:
FacetParams.java   (holds the new constant FACET_DATE_NOW)
SimpleFacets.java  getFacetDateCounts() NOW parameter modified

This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as 
it's a general change for date faceting, it was deemed deserving of its own 
patch. I will be updating SOLR-1709 in due course to include the use of this 
new parameter, after some rfc acceptance.

A possible enhancement to this is to detect facet.date fields, look for and 
match these fields in queries (if they exist), and potentially determine 
automatically the required time skew, if any. There are a whole host of reasons 
why this could be problematic to implement, so an explicit facet.date.now 
parameter is the safest route.





[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-01-09 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798411#action_12798411
 ] 

Peter Sturge commented on SOLR-1709:


Yonik,

Yes, I can see what you mean: of course NOW will affect anything 
date-related in a given query.
I'm wondering whether the passing of 'NOW' to shards should be a separate 
issue/patch from this one (e.g. something like 'Time Sync to Remote Shards'), 
as its scope and ramifications go far beyond simply distributed date faceting.
The whole area of code relating to date math is one that I'm not familiar with, 
but do let me know if there's anything you'd like me to look at.


> Distributed Date Faceting
> -
>
> Key: SOLR-1709
> URL: https://issues.apache.org/jira/browse/SOLR-1709
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: FacetComponent.java, ResponseBuilder.java
>
>
> This patch is for adding support for date facets when using distributed 
> searches.
> Date faceting across multiple machines exposes some time-based issues that 
> anyone interested in this behaviour should be aware of:
> Any time and/or time-zone differences are not accounted for in the patch 
> (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
> 'instant-in-time', unless all shards are time-synced to the exact same time).
> The implementation uses the first encountered shard's facet_dates as the 
> basis for subsequent shards' data to be merged in.
> This means that if subsequent shards' facet_dates are skewed in relation to 
> the first by >1 'gap', these 'earlier' or 'later' facets will not be merged 
> in.
> There are several reasons for this:
>   * Performance: It's faster to check facet_date lists against a single map's 
> data, rather than against each other, particularly if there are many shards
>   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
> time range larger than that which was requested
> (e.g. a request for one hour's worth of facets could bring back 2, 3 
> or more hours of data)
> This could be dealt with if timezone and skew information was added, and 
> the dates were normalized.
> One possibility for adding such support is to [optionally] add 'timezone' and 
> 'now' parameters to the 'facet_dates' map. This would tell requesters what 
> time and TZ the remote server thinks it is, and so multiple shards' time data 
> can be normalized.
> The patch affects 2 files in the Solr core:
>   org.apache.solr.handler.component.FacetComponent.java
>   org.apache.solr.handler.component.ResponseBuilder.java
> The main changes are in FacetComponent - ResponseBuilder is just to hold the 
> completed SimpleOrderedMap until the finishStage.
> One possible enhancement is to perhaps make this an optional parameter, but 
> really, if facet.date parameters are specified, it is assumed they are 
> desired.
> Comments & suggestions welcome.
> As a favour to ask, if anyone could take my 2 source files and create a PATCH 
> file from it, it would be greatly appreciated, as I'm having a bit of trouble 
> with svn (don't shoot me, but my environment is a Redmond-based os company).




[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-01-08 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798233#action_12798233
 ] 

Peter Sturge commented on SOLR-1709:


Definitely true! -- messing about with Date strings isn't great for performance.

As the NOW parameter would be for internal request use only (i.e. not for the 
indexer, not for human consumption), could it not just be an epoch long? The 
adjustment math should then be nice and quick (no string/date 
parsing/formatting; at worst just one Date.getTime() call if the time 
is stored locally as a Date).
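
To show how cheap the adjustment is when NOW travels as an epoch long, here is a minimal sketch. The class NowSkew and its methods are hypothetical names for illustration and do not appear in the attached source files.

```java
import java.util.Date;

/** Hypothetical illustration of epoch-long skew arithmetic. */
final class NowSkew {

    /** Difference between the caller's clock and the local clock, in milliseconds. */
    static long skewMillis(long callerNow, long localNow) {
        return callerNow - localNow;
    }

    /** Shifts an instant computed against the local clock onto the caller's clock. */
    static long shift(long localInstant, long skewMillis) {
        return localInstant + skewMillis;
    }

    public static void main(String[] args) {
        long localNow = new Date().getTime();   // the single Date call mentioned above
        long callerNow = localNow + 5000L;      // pretend the caller is five seconds ahead
        long skew = skewMillis(callerNow, localNow);
        System.out.println("skew ms: " + skew);
        System.out.println("one hour ago on caller's clock: " + shift(localNow - 3600000L, skew));
    }
}
```

Everything here is plain long arithmetic, with no date parsing or formatting involved.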

> Distributed Date Faceting
> -
>
> Key: SOLR-1709
> URL: https://issues.apache.org/jira/browse/SOLR-1709
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: FacetComponent.java, ResponseBuilder.java
>
>
> This patch is for adding support for date facets when using distributed 
> searches.
> Date faceting across multiple machines exposes some time-based issues that 
> anyone interested in this behaviour should be aware of:
> Any time and/or time-zone differences are not accounted for in the patch 
> (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
> 'instant-in-time', unless all shards are time-synced to the exact same time).
> The implementation uses the first encountered shard's facet_dates as the 
> basis for subsequent shards' data to be merged in.
> This means that if subsequent shards' facet_dates are skewed in relation to 
> the first by >1 'gap', these 'earlier' or 'later' facets will not be merged 
> in.
> There are several reasons for this:
>   * Performance: It's faster to check facet_date lists against a single map's 
> data, rather than against each other, particularly if there are many shards
>   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
> time range larger than that which was requested
> (e.g. a request for one hour's worth of facets could bring back 2, 3 
> or more hours of data)
> This could be dealt with if timezone and skew information was added, and 
> the dates were normalized.
> One possibility for adding such support is to [optionally] add 'timezone' and 
> 'now' parameters to the 'facet_dates' map. This would tell requesters what 
> time and TZ the remote server thinks it is, and so multiple shards' time data 
> can be normalized.
> The patch affects 2 files in the Solr core:
>   org.apache.solr.handler.component.FacetComponent.java
>   org.apache.solr.handler.component.ResponseBuilder.java
> The main changes are in FacetComponent - ResponseBuilder is just to hold the 
> completed SimpleOrderedMap until the finishStage.
> One possible enhancement is to perhaps make this an optional parameter, but 
> really, if facet.date parameters are specified, it is assumed they are 
> desired.
> Comments & suggestions welcome.
> As a favour to ask, if anyone could take my 2 source files and create a PATCH 
> file from it, it would be greatly appreciated, as I'm having a bit of trouble 
> with svn (don't shoot me, but my environment is a Redmond-based os company).




[jira] Updated: (SOLR-1709) Distributed Date Faceting

2010-01-08 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1709:
---

Attachment: ResponseBuilder.java
FacetComponent.java

Sorry, guys, can't get svn to create a patch file correctly on windows, so I'm 
attaching the source files here. With some time, which at the moment I don't 
have, I'm sure I could get svn working. Rather than anyone have to wait for me 
to get the patch file created, I thought it best to get the source uploaded, so 
people can start using it.
Thanks, Peter


> Distributed Date Faceting
> -
>
> Key: SOLR-1709
> URL: https://issues.apache.org/jira/browse/SOLR-1709
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 1.4
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: FacetComponent.java, ResponseBuilder.java
>
>
> This patch is for adding support for date facets when using distributed 
> searches.
> Date faceting across multiple machines exposes some time-based issues that 
> anyone interested in this behaviour should be aware of:
> Any time and/or time-zone differences are not accounted for in the patch 
> (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
> 'instant-in-time', unless all shards are time-synced to the exact same time).
> The implementation uses the first encountered shard's facet_dates as the 
> basis for subsequent shards' data to be merged in.
> This means that if subsequent shards' facet_dates are skewed in relation to 
> the first by >1 'gap', these 'earlier' or 'later' facets will not be merged 
> in.
> There are several reasons for this:
>   * Performance: It's faster to check facet_date lists against a single map's 
> data, rather than against each other, particularly if there are many shards
>   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
> time range larger than that which was requested
> (e.g. a request for one hour's worth of facets could bring back 2, 3 
> or more hours of data)
> This could be dealt with if timezone and skew information was added, and 
> the dates were normalized.
> One possibility for adding such support is to [optionally] add 'timezone' and 
> 'now' parameters to the 'facet_dates' map. This would tell requesters what 
> time and TZ the remote server thinks it is, and so multiple shards' time data 
> can be normalized.
> The patch affects 2 files in the Solr core:
>   org.apache.solr.handler.component.FacetComponent.java
>   org.apache.solr.handler.component.ResponseBuilder.java
> The main changes are in FacetComponent - ResponseBuilder is just to hold the 
> completed SimpleOrderedMap until the finishStage.
> One possible enhancement is to perhaps make this an optional parameter, but 
> really, if facet.date parameters are specified, it is assumed they are 
> desired.
> Comments & suggestions welcome.
> As a favour to ask, if anyone could take my 2 source files and create a PATCH 
> file from it, it would be greatly appreciated, as I'm having a bit of trouble 
> with svn (don't shoot me, but my environment is a Redmond-based os company).




[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-01-08 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797957#action_12797957
 ] 

Peter Sturge commented on SOLR-1709:


I've heard of Tortoise, I'll give that a try, thanks.

On the time-zone/skew issue, perhaps a more efficient approach would be a 
'push' rather than 'pull' - i.e.:

Requesters would include an optional parameter that told remote shards what 
time to use as 'NOW', and which TZ to use for date faceting.
This would avoid having to translate loads of time strings at merge time.

Thanks,
Peter


> Distributed Date Faceting
> -
>
> Key: SOLR-1709
> URL: https://issues.apache.org/jira/browse/SOLR-1709
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Affects Versions: 1.4
>Reporter: Peter Sturge
>Priority: Minor
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1709) Distributed Date Faceting

2010-01-07 Thread Peter Sturge (JIRA)
Distributed Date Faceting
-

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor


This patch is for adding support for date facets when using distributed 
searches.

Date faceting across multiple machines exposes some time-based issues that 
anyone interested in this behaviour should be aware of:
Any time and/or time-zone differences are not accounted for in the patch (i.e. 
merged date facets are at a time-of-day, not necessarily at a universal 
'instant-in-time', unless all shards are time-synced to the exact same time).
The implementation uses the first encountered shard's facet_dates as the basis 
for subsequent shards' data to be merged in.
This means that if subsequent shards' facet_dates are skewed in relation to the 
first by >1 'gap', these 'earlier' or 'later' facets will not be merged in.
There are several reasons for this:
  * Performance: It's faster to check facet_date lists against a single map's 
    data, rather than against each other, particularly if there are many shards
  * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
    time range larger than that which was requested
    (e.g. a request for one hour's worth of facets could bring back 2, 3 or 
    more hours of data)
This could be dealt with if timezone and skew information was added, and 
the dates were normalized.
One possibility for adding such support is to [optionally] add 'timezone' and 
'now' parameters to the 'facet_dates' map. This would tell requesters what time 
and TZ the remote server thinks it is, and so multiple shards' time data can be 
normalized.
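
The merge behaviour described above can be sketched roughly as follows. This is a simplified illustration with plain maps standing in for Solr's response structures; the FacetDateMerge class and mergeInto method are hypothetical names, not code from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the merge described above, using plain maps in place of Solr's response types. */
final class FacetDateMerge {
    /**
     * Adds a later shard's counts into the map taken from the first shard.
     * Buckets the first shard did not return are skipped, not appended.
     */
    static void mergeInto(Map<String, Integer> first, Map<String, Integer> other) {
        for (Map.Entry<String, Integer> e : other.entrySet()) {
            Integer current = first.get(e.getKey());
            if (current != null) {
                first.put(e.getKey(), current + e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new LinkedHashMap<String, Integer>();
        first.put("2010-01-07T10:00:00Z", 3);
        first.put("2010-01-07T11:00:00Z", 5);

        Map<String, Integer> skewed = new LinkedHashMap<String, Integer>();
        skewed.put("2010-01-07T11:00:00Z", 2);
        skewed.put("2010-01-07T12:00:00Z", 7); // outside the first shard's range

        mergeInto(first, skewed);
        System.out.println(first);
    }
}
```

In this sketch the 12:00 bucket from the skewed shard is dropped, which keeps the merged range equal to what the first shard returned.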

The patch affects 2 files in the Solr core:
  org.apache.solr.handler.component.FacetComponent.java
  org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just to hold the 
completed SimpleOrderedMap until the finishStage.
One possible enhancement is to make the distributed merging of date facets 
switchable via an optional parameter, but really, if facet.date parameters are 
specified, it is assumed they are desired.
Comments & suggestions welcome.

As a favour, if anyone could take my 2 source files and create a PATCH file 
from them, it would be greatly appreciated, as I'm having a bit of trouble 
with svn (don't shoot me, but my environment is a Redmond-based os company).





[jira] Resolved: (SOLR-1672) RFE: facet reverse sort count

2010-01-07 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge resolved SOLR-1672.


Resolution: Fixed

Marking as resolved.


> RFE: facet reverse sort count
> -
>
> Key: SOLR-1672
> URL: https://issues.apache.org/jira/browse/SOLR-1672
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Java, Solrj, http
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SOLR-1672.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>




[jira] Updated: (SOLR-1672) RFE: facet reverse sort count

2009-12-18 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1672:
---

Remaining Estimate: 0h  (was: 24h)
 Original Estimate: 0h  (was: 24h)

> RFE: facet reverse sort count
> -
>
> Key: SOLR-1672
> URL: https://issues.apache.org/jira/browse/SOLR-1672
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Java, Solrj, http
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SOLR-1672.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>




[jira] Commented: (SOLR-1672) RFE: facet reverse sort count

2009-12-18 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792424#action_12792424
 ] 

Peter Sturge commented on SOLR-1672:


Patch SOLR-1672.patch now included for review


> RFE: facet reverse sort count
> -
>
> Key: SOLR-1672
> URL: https://issues.apache.org/jira/browse/SOLR-1672
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Java, Solrj, http
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SOLR-1672.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>




[jira] Updated: (SOLR-1672) RFE: facet reverse sort count

2009-12-18 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1672:
---

Attachment: SOLR-1672.patch

Patch diff file for adding facet reverse sorting


> RFE: facet reverse sort count
> -
>
> Key: SOLR-1672
> URL: https://issues.apache.org/jira/browse/SOLR-1672
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.4
> Environment: Java, Solrj, http
>Reporter: Peter Sturge
>Priority: Minor
> Attachments: SOLR-1672.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>




[jira] Created: (SOLR-1672) RFE: facet reverse sort count

2009-12-18 Thread Peter Sturge (JIRA)
RFE: facet reverse sort count
-

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor


As suggested by Chris Hostetter, I have added an optional Comparator to the 
BoundedTreeSet in the UnInvertedField class.
This optional comparator is used when a new (and also optional) field facet 
parameter called 'facet.sortorder' is set to the string 'dsc' 
(e.g. &f.<fieldname>.facet.sortorder=dsc for per field, or &facet.sortorder=dsc 
for all facets).
Note that this parameter has no effect if facet.method=enum.
Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to 
its default behaviour.
 
This change affects 2 source files:
> UnInvertedField.java
[line 438] The getCounts() method signature is modified to add the 
'facetSortOrder' parameter value to the end of the argument list.
 
DIFF UnInvertedField.java:
- public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs,
-     int offset, int limit, Integer mincount, boolean missing, String sort,
-     String prefix) throws IOException {

+ public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs,
+     int offset, int limit, Integer mincount, boolean missing, String sort,
+     String prefix, String facetSortOrder) throws IOException {

[line 556] The getCounts() method is modified to create an overridden 
BoundedTreeSet(int, Comparator) if the 'facetSortOrder' parameter equals 
'dsc'.
DIFF UnInvertedField.java:
- final BoundedTreeSet queue = new BoundedTreeSet(maxsize);

+ final BoundedTreeSet queue = (sort.equals("count") || sort.equals("true"))
+     ? (facetSortOrder.equals("dsc")
+         ? new BoundedTreeSet(maxsize, new Comparator() {
+             @Override
+             public int compare(Object o1, Object o2) {
+               if (o1 == null || o2 == null)
+                 return 0;
+               int result = ((Long) o1).compareTo((Long) o2);
+               return (result != 0 ? result > 0 ? -1 : 1 : 0); // lowest number first sort
+             }
+           })
+         : new BoundedTreeSet(maxsize))
+     : null;
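
To illustrate the general idea of swapping the comparator on a size-bounded sorted set, here is a small standalone sketch. BoundedSetDemo and topN are hypothetical stand-ins built on java.util.TreeSet; this is not Solr's BoundedTreeSet, and the values are plain numbers rather than whatever the real queue stores.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.TreeSet;

/** Stand-in for illustration; this is not Solr's BoundedTreeSet. */
final class BoundedSetDemo {
    /** Keeps at most maxSize elements, dropping whatever sorts last. */
    static TreeSet<Long> topN(long[] values, int maxSize, Comparator<Long> order) {
        TreeSet<Long> set = new TreeSet<Long>(order); // null means natural ordering
        for (long v : values) {
            set.add(v);
            while (set.size() > maxSize) {
                set.pollLast();
            }
        }
        return set;
    }

    public static void main(String[] args) {
        long[] counts = {5L, 1L, 9L, 3L};
        // Natural ordering keeps the two smallest values.
        System.out.println(topN(counts, 2, null));
        // A reversed comparator keeps the two largest values instead.
        System.out.println(topN(counts, 2, Collections.<Long>reverseOrder()));
    }
}
```

The same bounded structure returns the opposite end of the value range depending only on the comparator it was built with, which is why an optional comparator is enough to reverse the sort.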

> SimpleFacets.java
[line 221] A call to params.getFieldParam(field, "facet.sortorder", "asc") is 
added to retrieve the new parameter, if present. 'asc' is used as the default 
value.
DIFF SimpleFacets.java:

+ String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");
 
[line 253] The call to uif.getCounts() in the getTermCounts() method is 
modified to pass the 'facetSortOrder' value string.
DIFF SimpleFacets.java:
- counts = uif.getCounts(searcher, base, offset, limit,
-     mincount, missing, sort, prefix);
+ counts = uif.getCounts(searcher, base, offset, limit,
+     mincount, missing, sort, prefix, facetSortOrder);
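
For readers unfamiliar with per-field parameters, the lookup order used by getFieldParam (per-field override first, then the global parameter, then the default) can be sketched with a plain map. FieldParamLookup is a hypothetical stand-in, not the SolrParams class, and the field names 'category' and 'price' are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Map-based sketch of per-field parameter lookup; not the SolrParams class itself. */
final class FieldParamLookup {
    /** Returns f.<field>.<param> if set, else <param>, else the supplied default. */
    static String getFieldParam(Map<String, String> params, String field, String param, String def) {
        String value = params.get("f." + field + "." + param);
        if (value == null) {
            value = params.get(param);
        }
        return value != null ? value : def;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<String, String>();
        params.put("f.category.facet.sortorder", "dsc");
        // 'category' has a per-field override; 'price' falls back to the default.
        System.out.println(getFieldParam(params, "category", "facet.sortorder", "asc"));
        System.out.println(getFieldParam(params, "price", "facet.sortorder", "asc"));
    }
}
```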

Implementation Notes:
I have noted in testing that I was not able to retrieve any '0' counts as I had 
expected.
I believe this could be because there appear to be some optimizations in 
SimpleFacets/count caching such that zero counts are not iterated (at least not 
by default) as a performance enhancement.
I could be wrong about this, and zero counts may appear under some other as yet 
untested circumstances. Perhaps an expert familiar with this part of the code 
can clarify.
In fact, this is not such a bad thing (at least for my requirements), as a 
whole bunch of zero counts is not necessarily useful (for my requirements, 
starting at '1' is just right).
 
There may, however, be instances where someone *will* want zero counts - e.g. 
searching for zero product stock counts (e.g. 'what have we run out of'). I was 
envisioning the facet.mincount parameter being the preferred place to set where 
the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the 
caching/optimization, the behaviour is somewhat different than expected.

