LUCENE-460

2005-12-23 Thread Pasha Bizhan
Hi,

Question about latest cvs changes and hashcodes.
http://issues.apache.org/jira/browse/LUCENE-460

Could anybody explain the magic numbers? 0x6634D93C,0x2742E74A and other.
Any special meaning? Is this documented anywhere?

Pasha 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: LUCENE-460

2005-12-23 Thread Yonik Seeley
Here is my previous reply to Otis:

> python -c "import random;print hex(random.getrandbits(32))[:-1]"
> Just a way of making some things unique let me know if you have a
> better idea on that.

So those constants are just random numbers.  I thought about trying to
pick magic numbers to maximize hamming distances, etc, but it's more
work and more likely to mess things up if you get it wrong. 
getClass().hashCode() would also work, but it would be slower.

I'll add this explanation to the bug.

-Yonik

On 12/23/05, Pasha Bizhan <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Question about latest cvs changes and hashcodes.
> http://issues.apache.org/jira/browse/LUCENE-460
>
> Could anybody explain the magic numbers? 0x6634D93C,0x2742E74A and other.
> Any special meaning? Is this documented anywhere?
>
> Pasha
>
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Created: (LUCENE-460) hashCode improvements

2005-10-30 Thread Yonik Seeley (JIRA)
hashCode improvements
-

 Key: LUCENE-460
 URL: http://issues.apache.org/jira/browse/LUCENE-460
 Project: Lucene - Java
Type: Improvement
  Components: Search  
Versions: CVS Nightly - Specify date in submission
Reporter: Yonik Seeley
Priority: Minor
 Fix For: CVS Nightly - Specify date in submission


It would be nice for all Query classes to implement hashCode and equals to 
enable them to be used as keys when caching.


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-460) hashCode improvements

2005-10-30 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-460?page=comments#action_12356325 ] 

Yonik Seeley commented on LUCENE-460:
-

A couple of guidelines off the top of my head...
 - hash codes should strive to be unique across the Query hierarchy, not just 
unique within one specific subclass.  For example, TermQuery(t) and 
SpanTermQuery(t) will generate the exact same hash codes.
- mix bits between different components that have any hashCode parts in 
common... 
   for example RangeQuery will produce the same hashCode whenever 
lowerTerm==upperTerm.
   Also, field[x TO y] will produce the same hashCode for *any* field since the 
fieldname parts of the
  terms will always cancel eachother out.  This will also cause the hashCode of 
field{x TO x} to equal field:x
  The hashCode of FilteredQuery will also cause many collisions because the 
bits aren't mixed inbetween
   the query and the filter.
  Remember that every query as a boost component... never just xor two query 
hashCodes together.
- make things position dependent.
  Currently, field[x TO y] will produce the same hasCode as field[y TO x]... 
not particularly important for RangeQuery, but
   you get the idea. 
- don't be afraid of using "+" instead of "^".  They both take a single CPU 
cycle, but "+" is not quite so easily (accidentally) reversed.
- flipping more than a single bit when hashing a boolean might be a good idea - 
it will make collisions harder.

http://www.concentric.net/~Ttwang/tech/inthash.htm is an interesting link on 
integer hash codes (what we are in effect doing when we combine multiple hash 
codes).  Esp interesting is the section "Parallel Operations"

> hashCode improvements
> -
>
>  Key: LUCENE-460
>  URL: http://issues.apache.org/jira/browse/LUCENE-460
>  Project: Lucene - Java
> Type: Improvement
>   Components: Search
> Versions: CVS Nightly - Specify date in submission
> Reporter: Yonik Seeley
> Priority: Minor
>  Fix For: CVS Nightly - Specify date in submission

>
> It would be nice for all Query classes to implement hashCode and equals to 
> enable them to be used as keys when caching.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-460) hashCode improvements

2005-10-30 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-460?page=comments#action_12356326 ] 

Yonik Seeley commented on LUCENE-460:
-

Oh, and preserve entropy by using reversible integer hash functions (see the 
previous link).
 -  key ^= (key << a) | (key >>> b); for a,b in (17,16) (16,17) (14,19) (19,14) 
(13,20) (20,13) (10,23) (23,10) (8,25) (25,8)
 - multiply by an odd
 - addition
 - xor
 - rotates

> hashCode improvements
> -----
>
>  Key: LUCENE-460
>  URL: http://issues.apache.org/jira/browse/LUCENE-460
>  Project: Lucene - Java
> Type: Improvement
>   Components: Search
> Versions: CVS Nightly - Specify date in submission
> Reporter: Yonik Seeley
> Priority: Minor
>  Fix For: CVS Nightly - Specify date in submission

>
> It would be nice for all Query classes to implement hashCode and equals to 
> enable them to be used as keys when caching.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-460) hashCode improvements

2005-10-30 Thread Daniel Naber (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-460?page=comments#action_12356327 ] 

Daniel Naber commented on LUCENE-460:
-

"Currently, field[x TO y] will produce the same hasCode as field[y TO x]... not 
particularly important for RangeQuery, but  you get the idea. "

Actually it is kind of important, because while [a TO z] will give hits [z TO 
a] won't.


> hashCode improvements
> -----
>
>  Key: LUCENE-460
>  URL: http://issues.apache.org/jira/browse/LUCENE-460
>  Project: Lucene - Java
> Type: Improvement
>   Components: Search
> Versions: CVS Nightly - Specify date in submission
> Reporter: Yonik Seeley
> Priority: Minor
>  Fix For: CVS Nightly - Specify date in submission

>
> It would be nice for all Query classes to implement hashCode and equals to 
> enable them to be used as keys when caching.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-460) hashCode improvements

2005-10-30 Thread paul.elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-460?page=comments#action_12356384 ] 

paul.elschot commented on LUCENE-460:
-

>  - hash codes should strive to be unique across the Query hierarchy, not just 
> unique within one specific
> subclass. For example, TermQuery(t) and SpanTermQuery(t) will generate the 
> exact same hash codes. 

This has the disadvantage that the structure information added by the query 
parser is lost in the hash.
In case a hash is needed over the query text before parsing, one can use 
precisely that.
What's the point of hash code uniqueness over subclasses?

Regards,
Paul Elschot


> hashCode improvements
> -
>
>  Key: LUCENE-460
>  URL: http://issues.apache.org/jira/browse/LUCENE-460
>  Project: Lucene - Java
> Type: Improvement
>   Components: Search
> Versions: CVS Nightly - Specify date in submission
> Reporter: Yonik Seeley
> Priority: Minor
>  Fix For: CVS Nightly - Specify date in submission

>
> It would be nice for all Query classes to implement hashCode and equals to 
> enable them to be used as keys when caching.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-460) hashCode improvements

2005-10-31 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-460?page=comments#action_12356414 ] 

Yonik Seeley commented on LUCENE-460:
-

Paul, I'm not sure I understand your point about "structure information added 
by the query parser is lost in the hash".
Let me rephrase my statement in case you misunderstood me:

   - hash codes should strive to avoid collisions across Query hierarchy, not 
just within one specific
 subclass. An example of hashCode implementations that don't do this are 
TermQuery(t) and SpanTermQuery(t) which will generate the exact same hash 
codes. 

> What's the point of hash code uniqueness over subclasses? 

The same point as hash code uniqueness (avoiding collisions) within a 
particular class - the more collisions you have, the more it will slow down 
hash based lookups.  Things should still work if all hashCodes mapped to the 
same number, but it would be dog slow.

> hashCode improvements
> ---------
>
>  Key: LUCENE-460
>  URL: http://issues.apache.org/jira/browse/LUCENE-460
>  Project: Lucene - Java
> Type: Improvement
>   Components: Search
> Versions: CVS Nightly - Specify date in submission
> Reporter: Yonik Seeley
> Priority: Minor
>  Fix For: CVS Nightly - Specify date in submission

>
> It would be nice for all Query classes to implement hashCode and equals to 
> enable them to be used as keys when caching.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-460) hashCode improvements

2005-10-31 Thread paul.elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-460?page=comments#action_12356432 ] 

paul.elschot commented on LUCENE-460:
-

>  For example, TermQuery(t) and SpanTermQuery(t) will generate the exact same 
> hash codes. 

I'm sorry, I misread this as an example of what a hash code should be. You 
meant this
to be an example of what is wrong with current hashcodes...
 

> hashCode improvements
> -
>
>  Key: LUCENE-460
>  URL: http://issues.apache.org/jira/browse/LUCENE-460
>  Project: Lucene - Java
> Type: Improvement
>   Components: Search
> Versions: CVS Nightly - Specify date in submission
> Reporter: Yonik Seeley
> Priority: Minor
>  Fix For: CVS Nightly - Specify date in submission

>
> It would be nice for all Query classes to implement hashCode and equals to 
> enable them to be used as keys when caching.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-460) hashCode improvements

2005-12-23 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-460?page=comments#action_12361200 ] 

Yonik Seeley commented on LUCENE-460:
-

Some people have asked where some of the magic constants come from in the 
hashCodes:

> python -c "import random;print hex(random.getrandbits(32))[:-1]"
> Just a way of making some things unique let me know if you have a
> better idea on that.
> 
> So those constants are just random numbers.  I thought about trying to pick 
> magic numbers to maximize hamming distances, etc,
> but it's more work and more likely to mess things up if you get it wrong.  
> getClass().hashCode() would also work, but it would be 
> slower.


> hashCode improvements
> -----
>
>  Key: LUCENE-460
>      URL: http://issues.apache.org/jira/browse/LUCENE-460
>  Project: Lucene - Java
> Type: Improvement
>   Components: Search
> Versions: CVS Nightly - Specify date in submission
> Reporter: Yonik Seeley
> Priority: Minor
>  Fix For: CVS Nightly - Specify date in submission

>
> It would be nice for all Query classes to implement hashCode and equals to 
> enable them to be used as keys when caching.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Closed: (LUCENE-460) hashCode improvements

2006-04-05 Thread Yonik Seeley (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-460?page=all ]
 
Yonik Seeley closed LUCENE-460:
---

Resolution: Fixed
 Assign To: Yonik Seeley

closing... I think I got most of these.

> hashCode improvements
> -
>
>      Key: LUCENE-460
>  URL: http://issues.apache.org/jira/browse/LUCENE-460
>  Project: Lucene - Java
> Type: Improvement

>   Components: Search
> Versions: CVS Nightly - Specify date in submission
> Reporter: Yonik Seeley
> Assignee: Yonik Seeley
> Priority: Minor
>  Fix For: CVS Nightly - Specify date in submission

>
> It would be nice for all Query classes to implement hashCode and equals to 
> enable them to be used as keys when caching.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]