[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-08-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746845#action_12746845
 ] 

Yonik Seeley commented on LUCENE-1607:
--

{quote}
isn't it possible to make the call to
public String intern(char[] arr, int offset, int len) {}
not create a String?
{quote}

Yep - that's why I added that method (just didn't get around to implementing it 
that way).

> String.intern() faster alternative
> --
>
> Key: LUCENE-1607
> URL: https://issues.apache.org/jira/browse/LUCENE-1607
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Earwin Burrfoot
>Assignee: Yonik Seeley
> Fix For: 2.9
>
> Attachments: intern.patch, LUCENE-1607-contrib.patch, 
> LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, 
> LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, 
> LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch
>
>
> By using our own interned string pool on top of default, String.intern() can 
> be greatly optimized.
> On my setup (java 6) this alternative runs ~15.8x faster for already interned 
> strings, and ~2.2x faster for 'new String(interned)'
> For java 5 and 4 speedup is lower, but still considerable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-08-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746773#action_12746773
 ] 

Noble Paul commented on LUCENE-1607:


isn't it possible to make the call to 
{code}
public String intern(char[] arr, int offset, int len) {
}
{code}
not create a String? 
Currently, if I have a char[], it ends up creating a String (which makes a copy 
of the char[]) before it does the intern() operation.
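
A minimal sketch of what such an overload could look like, assuming a small direct-mapped cache in front of String.intern(). The class name CharArrayInterner and its layout are hypothetical and are not taken from the attached patches. On a cache hit the characters are compared in place, so no String is allocated; a String is only built on a miss.

```java
/** Hypothetical sketch; not the class from the LUCENE-1607 patches. Not thread-safe. */
final class CharArrayInterner {
  private final String[] cache;   // direct-mapped: each slot holds one interned String
  private final int mask;

  CharArrayInterner(int sizePowerOfTwo) {
    if (sizePowerOfTwo <= 0 || Integer.bitCount(sizePowerOfTwo) != 1) {
      throw new IllegalArgumentException("size must be a positive power of two");
    }
    cache = new String[sizePowerOfTwo];
    mask = sizePowerOfTwo - 1;
  }

  String intern(char[] arr, int offset, int len) {
    // Same polynomial hash as String.hashCode(), computed without building a String.
    int h = 0;
    for (int i = 0; i < len; i++) {
      h = 31 * h + arr[offset + i];
    }
    int slot = h & mask;
    String cached = cache[slot];
    if (cached != null && matches(cached, arr, offset, len)) {
      return cached;              // hit: no allocation, no copy of the char[]
    }
    // Miss: only now pay for the copy and the String.intern() call.
    String s = new String(arr, offset, len).intern();
    cache[slot] = s;
    return s;
  }

  private static boolean matches(String s, char[] arr, int offset, int len) {
    if (s.length() != len) return false;
    for (int i = 0; i < len; i++) {
      if (s.charAt(i) != arr[offset + i]) return false;
    }
    return true;
  }
}
```

Because every stored value comes from String.intern(), the result stays identity-comparable with strings interned the usual way.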





[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-08-07 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740638#action_12740638
 ] 

Uwe Schindler commented on LUCENE-1607:
---

Committed rev 802095.




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-08-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739628#action_12739628
 ] 

Yonik Seeley commented on LUCENE-1607:
--

bq. Except I'd like the javadoc demand each impl to be 
String.intern()-compatible.

If *everything* went through the same intern, it wouldn't matter. 

bq. rewrite 'for' as 'for (Entry e = first;e != null;e = e.next)' for clarity?

done.

bq. If check around 'nextToLast = e' can also be removed?

I don't see how...




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-07-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731185#action_12731185
 ] 

Mark Miller commented on LUCENE-1607:
-

looks like this is so close ...




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-23 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723352#action_12723352
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

Okay, let's have an extra class and ability to switch impls. I liked that 
static method could get inlined (at least its short-path), but that's not 
necessary.

Except I'd like the javadoc demand each impl to be String.intern()-compatible. 
There's nothing bad in it, as in any decent impl a unique string will be 
String.intern()'ed one time at most. And the case when you get an infinite flow 
of unique strings is degenerate anyway, you have to fix something, not deal 
with it. On the other hand, we can remove "This should never be changed after 
other Lucene APIs have been used" clause.

rewrite 'for' as 'for (Entry e = first;e != null;e = e.next)' for clarity?
'Entry[] arr = cache;' - this can be skipped? 'cache' is already final and 
optimizer loves finals. Plus further down the method you use both cache[slot] 
and arr[slot]. Or am I missing some voodoo?
If check around 'nextToLast = e' can also be removed?
'public String intern(char[] arr, int offset, int len)' - is this needed?




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720291#action_12720291
 ] 

Michael McCandless commented on LUCENE-1607:


bq. I've held off because of a lack of consensus

Silence = consensus!




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720274#action_12720274
 ] 

Yonik Seeley commented on LUCENE-1607:
--

I've held off because of a lack of consensus, but I suppose I can do the old 
"I'll commit in a few days" thing.




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720266#action_12720266
 ] 

Mark Miller commented on LUCENE-1607:
-

I assume we can assign this one to you Yonik?




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718650#action_12718650
 ] 

Yonik Seeley commented on LUCENE-1607:
--

bq. I still have a feeling we should expose a single static method - intern() 
and hide implementation away

While the default should be beneficial for most users, I'd hate to lock away a 
user's ability to either expand or remove the cache. 
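
As a rough illustration of that trade-off, here is a sketch with a single static entry point whose implementation can still be swapped. All names (PlainInterner, SmallCacheInterner, Interning) are hypothetical and are not claimed to match the patch. Both implementations return String.intern()'ed instances, so swapping one for the other does not change which object callers see.

```java
/** Hypothetical names throughout; a sketch of the design, not the patch itself. */
class PlainInterner {
  /** "No cache" variant: delegates straight to the JVM pool. */
  String intern(String s) {
    return s.intern();
  }
}

final class SmallCacheInterner extends PlainInterner {
  private final String[] cache;   // direct-mapped cache of already interned strings

  SmallCacheInterner(int sizePowerOfTwo) {
    if (sizePowerOfTwo <= 0 || Integer.bitCount(sizePowerOfTwo) != 1) {
      throw new IllegalArgumentException("size must be a positive power of two");
    }
    cache = new String[sizePowerOfTwo];
  }

  @Override
  String intern(String s) {
    int slot = s.hashCode() & (cache.length - 1);
    String cached = cache[slot];
    if (cached != null && (cached == s || cached.equals(s))) {
      return cached;              // short path: no call into String.intern()
    }
    String interned = s.intern();
    cache[slot] = interned;
    return interned;
  }
}

final class Interning {
  /** Replaceable default; a user can enlarge the cache or drop it entirely. */
  static volatile PlainInterner interner = new SmallCacheInterner(1024);

  static String intern(String s) {
    return interner.intern(s);
  }

  private Interning() {}
}
```

A user who wants no cache would assign `Interning.interner = new PlainInterner();`, and one who wants a bigger cache would pass a larger size to SmallCacheInterner.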




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-10 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718198#action_12718198
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

bq. but I was waiting for some kind of feedback if people in general thought it 
was the right approach. It introduces another static, and people tend to not 
like that.
Just forgot somehow about this issue.
You're right about static, it's not clear how and when to initialize it, plus 
you introduce some public classes we'll be unable to change/remove later.
I still have a feeling we should expose a single static method - intern() and 
hide implementation away, possibly tuning it to be advantageous for [...]



[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-10 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718188#action_12718188
 ] 

Yonik Seeley commented on LUCENE-1607:
--

I think so... but I was waiting for some kind of feedback if people in general 
thought it was the right approach.  It introduces another static, and people 
tend to not like that.  I accidentally didn't upload the latest version with 
the javadoc + helper method.  I'll do that now.




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-06-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718181#action_12718181
 ] 

Michael McCandless commented on LUCENE-1607:


Yonik is this ready to go in...?




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-05-05 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706129#action_12706129
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

Bug in previous algo (unbounded hash):

  // check cached hashCode first, and use compareTo to avoid
  // dynamic type checking in equals().
  if (h == other.hashCode() && s.compareTo(other)==0) {
    return s;  <--- here we should return 'other'
  }
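
To spell out why this matters, here is a small sketch, assuming 'other' is the instance already stored in the cache and 's' is the caller's argument. The class and method names are made up for the demonstration. Returning 's' hands back an equal but distinct object, which defeats the point of interning (comparison by ==).

```java
/** Hypothetical demo of the fix: return the cached instance, not the argument. */
final class ReturnCachedDemo {
  /** Returns the cached instance if it matches s, or null on a miss. */
  static String lookup(String other, String s) {
    int h = s.hashCode();
    if (h == other.hashCode() && s.compareTo(other) == 0) {
      return other;               // the canonical instance held by the cache
    }
    return null;
  }

  public static void main(String[] args) {
    String canonical = "title";            // string literals are interned by the JVM
    String copy = new String("title");     // equal content, different object
    System.out.println(lookup(canonical, copy) == canonical);  // prints "true"
    System.out.println(copy == canonical);                     // prints "false"
  }
}
```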





[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704484#action_12704484
 ] 

Shalin Shekhar Mangar commented on LUCENE-1607:
---

Yonik, the string is being interned twice in your latest patch
{code}
+  if (e==null) {
+s = s.intern();
+arr[slot] = new Entry(s.intern(), h, first);
{code}




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704315#action_12704315
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

A top bound on cache size will do? If you're fed too many unique strings it'll 
end up with the characteristics of a simple cache.
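
One way to read "a top bound", sketched under assumptions: a fixed-size table whose collision chains are capped, so the oldest entry in a full chain is dropped when a new string arrives. The names BoundedInterner and maxChainLength are illustrative only. The total number of cached strings can never exceed table size times chain cap, and each miss calls String.intern() exactly once.

```java
/** Hypothetical sketch of a size-bounded interning cache. Not thread-safe. */
final class BoundedInterner {
  private static final class Entry {
    final String str;
    final int hash;
    Entry next;

    Entry(String str, int hash, Entry next) {
      this.str = str;
      this.hash = hash;
      this.next = next;
    }
  }

  private final Entry[] cache;
  private final int maxChainLength;

  BoundedInterner(int tableSizePowerOfTwo, int maxChainLength) {
    if (tableSizePowerOfTwo <= 0 || Integer.bitCount(tableSizePowerOfTwo) != 1) {
      throw new IllegalArgumentException("table size must be a positive power of two");
    }
    if (maxChainLength < 2) {
      throw new IllegalArgumentException("maxChainLength must be at least 2");
    }
    this.cache = new Entry[tableSizePowerOfTwo];
    this.maxChainLength = maxChainLength;
  }

  String intern(String s) {
    int h = s.hashCode();
    int slot = h & (cache.length - 1);
    Entry first = cache[slot];
    Entry nextToLast = null;
    int chainLength = 0;
    for (Entry e = first; e != null; e = e.next) {
      if (e.str == s || (e.hash == h && e.str.equals(s))) {
        return e.str;                       // hit: return the cached instance
      }
      chainLength++;
      if (e.next != null) {
        nextToLast = e;                     // remember where the last entry hangs
      }
    }
    s = s.intern();                         // miss: intern exactly once
    cache[slot] = new Entry(s, h, first);   // newest entry goes to the head
    if (chainLength >= maxChainLength) {
      nextToLast.next = null;               // chain was full: drop the oldest entry
    }
    return s;
  }
}
```

With an endless stream of unique strings this degrades to one String.intern() call per string, which is the "simple cache" behaviour described above.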




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704313#action_12704313
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

Is there 'any' benefit of dumping String.intern() compatibility? All those 
interns happen at startup anyway.
If future brings us something better (which I doubt in this case), we'll just 
swap the impl, which is all-private.

All in all I consider a practice of system-property-driven pluggable 
implementations a sickening one. Take MMapDirectory that was impossible to use 
until now without twiddling with startup keys, take 
SegmentReader/ReadonlySegmentReader - resulting code is ugly and nobody's going 
to override defaults anyway, all important dependencies are package-private.




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704306#action_12704306
 ] 

Yonik Seeley commented on LUCENE-1607:
--

The last patch removed the ability to plug a different implementation, but I 
guess we don't need that until we have another implementation (and it's not 
clear if the benefits of dumping String.intern() compatibility in the future 
outweigh the disadvantages of breaking back compatibility).

I plan on committing this shortly.




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Patrick Eger (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704261#action_12704261
 ] 

Patrick Eger commented on LUCENE-1607:
--

Ah, I see, thanks. I was looking at the most recent patch (by date).




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704252#action_12704252
 ] 

Yonik Seeley commented on LUCENE-1607:
--

bq. As a quick comment on the implementation, I notice that it is possible 
(with reasonable probability) for hash collisions to result in re-interning a 
pair of strings multiple times.

For the first implementation, yes.  The latest implementation is guaranteed to 
only intern a string once (the hashtable does probing and resizing).
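
For readers without the patch at hand, here is a minimal sketch of an unbounded table with linear probing and resizing, which is one way to get the "intern only once" guarantee. It is an assumption about the general technique, not a copy of the attached code. The internCalls counter exists only to make the guarantee observable.

```java
/** Hypothetical sketch: open addressing with linear probing. Not thread-safe. */
final class ProbingInterner {
  private String[] table = new String[16];  // length is always a power of two
  private int count;
  int internCalls;                          // demo only: number of String.intern() calls

  String intern(String s) {
    int mask = table.length - 1;
    int slot = s.hashCode() & mask;
    // Probe until we find the string or an empty slot; the table is kept at
    // most half full, so an empty slot always exists.
    for (String e; (e = table[slot]) != null; slot = (slot + 1) & mask) {
      if (e == s || e.equals(s)) {
        return e;                           // seen before: never re-interned
      }
    }
    String interned = s.intern();
    internCalls++;
    table[slot] = interned;
    if (++count * 2 > table.length) {
      resize();
    }
    return interned;
  }

  private void resize() {
    String[] old = table;
    table = new String[old.length * 2];
    int mask = table.length - 1;
    for (String e : old) {
      if (e == null) continue;
      int slot = e.hashCode() & mask;
      while (table[slot] != null) {
        slot = (slot + 1) & mask;
      }
      table[slot] = e;                      // entries are moved, not re-interned
    }
  }
}
```

The price of the guarantee is that nothing is ever evicted, so memory grows with the number of distinct strings.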





[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704247#action_12704247
 ] 

Yonik Seeley commented on LUCENE-1607:
--

bq. why h >> 7?

Was copied from Solr's hashing of doc ids... we didn't want to throw away too 
many lower bits since they were likely to be the most random.  In string 
hashes, the rightmost bits also have the most entropy.
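
The exact expression around 'h >> 7' is not shown in this message, so the following only illustrates the stated rationale. Since String.hashCode() is a polynomial in 31, strings that differ only in their last character get consecutive hash codes, which differ in the low bits but usually not in the high bits. HashBitsDemo and its 16-slot table are assumptions made for the demonstration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical demo: slot spread when indexing by low bits versus high bits. */
final class HashBitsDemo {
  /** Counts how many distinct slots (out of 16) the keys land in. */
  static int distinctSlots(String[] keys, boolean useLowBits) {
    Set<Integer> slots = new HashSet<Integer>();
    for (String k : keys) {
      int h = k.hashCode();
      slots.add(useLowBits ? (h & 15) : (h >>> 28));
    }
    return slots.size();
  }

  public static void main(String[] args) {
    String[] keys = new String[8];
    for (int i = 0; i < 8; i++) {
      keys[i] = "f" + i;                    // hash codes 3210..3217, consecutive
    }
    System.out.println(distinctSlots(keys, true));   // prints 8
    System.out.println(distinctSlots(keys, false));  // prints 1
  }
}
```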

bq. Should I add a patch with Yonik's last hash impl and all calls to 
String.intern() replaced to get it moving? 

That would be helpful, thanks!




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Patrick Eger (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704246#action_12704246
 ] 

Patrick Eger commented on LUCENE-1607:
--

As a quick comment on the implementation, I notice that it is possible (with 
reasonable probability) for hash collisions to result in re-interning a pair of 
strings multiple times. For example, a code path that traverses 32 unique 
string datapoints (say, in an inner loop somewhere) would have a minimum 3% 
probability of colliding and re-interning 2 strings every time through the 
loop. Because of the birthday paradox, such a situation becomes likely 
(50% probability with ~150 unique values).

These are the probabilities of at least one collision in a table of 16*1024 
slots, assuming random distribution and perfect hashing. In real life the 
distribution will not be so random (string.hashCode() & MASK), so these are 
"best case" figures.
2 datapoints: collision prob = 0.006104%
4 datapoints: collision prob = 0.036617%
8 datapoints: collision prob = 0.170779%
16 datapoints: collision prob = 0.729976%
32 datapoints: collision prob = 2.983863%
64 datapoints: collision prob = 11.591861%
128 datapoints: collision prob = 39.188158%
256 datapoints: collision prob = 86.501947%


Practically this may or may not matter. My thought is that some sort of fast 
LRU structure would be better, but of course creating something like this 
without locking may be tricky. Another idea might be to support some form of 
limited hash-chaining or probing in the table, which would mitigate the damage 
of a collision significantly.


For reference, Python code for calculating birthday/hash collisions, 
shamelessly stolen:

def bp(n, d):
    # Probability that n random keys collide at least once in d slots.
    v = 1.0
    for i in range(n):
        v = v * (1 - float(i) / d)
    return 1 - v

for n in [2, 4, 8, 16, 32, 64, 128, 256]:
    print("%i datapoints: collision prob = %f%%" % (n, bp(n, 16 * 1024) * 100))




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-29 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704225#action_12704225
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

Mmm.. what's the status of this one?
Should I add a patch with Yonik's last hash impl and all calls to 
String.intern() replaced to get it moving?




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-22 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701626#action_12701626
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

I tried it out. It works a little better than the simple cache (avoiding stray 
interns must have paid off) and doesn't degrade at all.
I'd like to change the starting size to something in the 256-1024 range; it 
works much better for 10-20 fields.

Why h >> 7? I understand that you're sacking collision-guilty bits, but why not 
exact amount that was used (have to store it?), or a whole byte or two?




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700942#action_12700942
 ] 

Yonik Seeley commented on LUCENE-1607:
--

bq. In no-collision resolution scheme, if you detect a collision early with 
hashcode, you still call String.intern(). 

In the old code, yes... that's why I left it commented out there.  In the 
latest code, we re-probe on a collision w/o calling compareTo (assuming 
hashCodes are unequal) and only call intern() if it's really a String we 
haven't seen before.  The new code is a real hash table that will store all the 
Strings requested through it, unlike the first variant that was a simple cache.
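
To make the description above concrete, here is a minimal sketch of such a 
table. It is not the code from the patch: the class name, the initial size of 
32, the 50% load factor, the (h >> 7) | 1 probe step and the internCalls 
counter are all assumptions for illustration, and it uses equals() where the 
thread talks about compareTo. It is single-threaded; the concurrency handling 
discussed elsewhere in this thread is left out.

```java
// Hypothetical sketch; not thread-safe, not the actual LUCENE-1607 code.
final class ProbingInterner {
    private String[] table = new String[32]; // length is always a power of two
    private int size;
    int internCalls; // counts fallbacks to String.intern(), for illustration only

    String intern(String s) {
        int h = s.hashCode();
        int mask = table.length - 1;
        int slot = h & mask;
        int step = (h >> 7) | 1; // odd step reaches every slot of the table
        for (String e = table[slot]; e != null; e = table[slot]) {
            // Compare the cached hash codes first; equals() runs only if they match.
            if (e.hashCode() == h && e.equals(s)) {
                return e;
            }
            slot = (slot + step) & mask; // collision: re-probe, do not call intern()
        }
        String interned = s.intern(); // reached only for a string not seen before
        internCalls++;
        table[slot] = interned;
        if (++size * 2 > table.length) {
            grow(); // keep the table at most half full so probing always terminates
        }
        return interned;
    }

    private void grow() {
        String[] old = table;
        table = new String[old.length * 2];
        int mask = table.length - 1;
        for (String e : old) {
            if (e == null) {
                continue;
            }
            int h = e.hashCode();
            int slot = h & mask;
            int step = (h >> 7) | 1;
            while (table[slot] != null) {
                slot = (slot + step) & mask;
            }
            table[slot] = e;
        }
    }
}
```

With this shape, every distinct string costs exactly one String.intern() 
call, no matter how many collisions it runs into later.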




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700935#action_12700935
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

bq. Collisions should also be very quick to skip by comparing the hash code 
(which is cached for Strings).
In no-collision resolution scheme, if you detect a collision early with 
hashcode, you still call String.intern(). That kills any benefit gained from 
hashcode check vs compareTo. And if no collision is detected, introducing this 
check slows things down a little. Yes, that surprised me too.




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700931#action_12700931
 ] 

Yonik Seeley commented on LUCENE-1607:
--

bq. The fastest hash we can get, should have no collisions. This is achievable 
by resizing on each new collision.

Hmmm, in my quick'n'dirty tests of about 256 unique strings, a smaller hash 
table was actually quicker (initialized with 32 and let it resize vs starting 
at 1024).  I imagine that this would be due to a larger part of the table 
fitting in smaller and faster processor caches.  YMMV.  Collisions should also 
be very quick to skip by comparing the hash code (which is cached for Strings).






[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700928#action_12700928
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

Hehe, ten minute difference. Take over this issue, since you're obviously 
better at it than I?




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700827#action_12700827
 ] 

Yonik Seeley commented on LUCENE-1607:
--

bq. lack of collision resolve

My version was the most basic... a simple cache that is not guaranteed to 
always hit.  I also wanted to keep the overhead very low in case of misses 
(hence no re-probing).  In the best case, I don't think one can get much 
faster... and in the worst case it won't be much slower.  There could be other 
implementations of course...

bq. It'll break on non-power-of-two sizes. 

The size is guaranteed to be a power of two by the constructor.
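
A rough sketch of the idea, for readers without the patch at hand. The class 
and method names are invented, and equals() stands in for whatever comparison 
the patch uses; only the h & (cache.length-1) slot computation is taken from 
this thread.

```java
// Hypothetical sketch of a "simple cache" interner; not the patch itself.
final class SimpleInternCache {
    private final String[] cache;

    SimpleInternCache(int requestedSize) {
        // Round up to a power of two so that (length - 1) is an all-ones bit mask.
        int size = 1;
        while (size < requestedSize && size < (1 << 30)) {
            size <<= 1;
        }
        cache = new String[size];
    }

    int capacity() {
        return cache.length;
    }

    String intern(String s) {
        int slot = s.hashCode() & (cache.length - 1); // always within [0, length)
        String cached = cache[slot];
        if (cached != null && cached.equals(s)) {
            return cached; // hit: String.intern() is not called
        }
        String interned = s.intern(); // miss or collision: fall back to the JVM pool
        cache[slot] = interned;       // overwrite the slot, no re-probing
        return interned;
    }
}
```

The mask only works because the constructor rounds the size up; with a 
length of, say, 100, the expression h & 99 could never produce some slots.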




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700784#action_12700784
 ] 

Mark Miller commented on LUCENE-1607:
-

bq. The thread safety problem has to do with safe object publication

Oh, great, you mean basic concurrency competency :( Well, that's an embarrassing 
erosion of knowledge. Back to the books. Thread A has no happens-before 
relationship with thread B unless they share the lock. I trained myself to just 
synchronize or use volatile long ago (and then let the knowledge begin seeping 
away, I guess), but even still, every time I see one of these double-checked 
locking type tricks I go, eh, this one must work or something.




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-20 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700696#action_12700696
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

Okay, you're probably right. It's not that hashmap can be corrupted on 
subsequent write, it's that reader threads can possibly see not-yet-built 
hashmap.
And volatile works, while simple sync doesn't, because volatile also orders 
reads. 
Hm. Hm. Hm.
Then, as you suggested, all we need is our own hash implementation that remains 
usable even when partially updated. I skimmed through your patch and something 
looks suspicious to me. Like lack of collision resolve and that line:
int slot = h & (cache.length-1);
It'll break on non-power-of-two sizes.




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Derek DeMarco (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700629#action_12700629
 ] 

Derek DeMarco commented on LUCENE-1607:
---

Good point Yonik.  Making pool volatile should take care of it, but only on 
JVMs 1.5+, as before that volatile didn't prevent reordering of non-volatile 
reads/writes around it.




[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700618#action_12700618
 ] 

Yonik Seeley commented on LUCENE-1607:
--

The thread safety problem has to do with safe object publication (making an 
object visible to a different thread).  Mutable objects generally can't be 
safely published w/o synchronization; it has to do with CPU caches in 
multi-CPU systems.

IIRC, current x86 architectures will see fewer problems since read barriers are 
no-ops.. caches are guaranteed to be coherent.  But even on x86, you aren't 
guaranteed that instructions and writes won't be reordered...  so the 
assignment to pool could be visible *before* all the memory changes that 
newPool.put() could cause.




Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Chris Miller

> You were. I just wanted to point out that in real apps you're not going
> to see stale data longer than for milliseconds

Agreed, which is why this whole discussion is very theoretical anyway :)

> On cache miss, I re-retrieve pool reference after a lock (HashMap is
> no longer stale), re-read a string, and do the write only if the
> string is still not there.

Ah I see what you're saying now, I had been overlooking the re-retrieval 
of the pool reference. Thanks for clarifying that for me.

> The truth is born in argument. I reread jmm docs to be sure, but I
> can't guarantee I understood them well.

Seems like you have a pretty good understanding to me :)







Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot
> Sorry I wasn't as clear as I could have been - I realise JEE servers use a
> threadpool for handling requests, I was thinking of many other applications
> in the real world I'm aware of that don't (be that good design or
> otherwise...).
You were. I just wanted to point out that in real apps you're not going
to see stale data for longer than milliseconds (unless you're some
processor cache guru doing his black magic). Besides calling intern(),
threads synchronize in a gazillion other ways, and each sync updates
stale stuff.

>  I'm not sure I follow you though when you say "it won't even
> do a write" on a cache miss, it copies the (potentially stale) HashMap in
> that situation does it not?
On cache miss, I re-retrieve pool reference after a lock (HashMap is
no longer stale), re-read a string, and do the write only if the
string is still not there.

> I was only trying to raise the stale/visibility issue but it's clear you've
> already given that plenty of thought.
The truth is born in argument. I reread jmm docs to be sure, but I
can't guarantee I understood them well.

-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785




Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Chris Miller
Sorry I wasn't as clear as I could have been - I realise JEE servers use 
a threadpool for handling requests, I was thinking of many other applications 
in the real world I'm aware of that don't (be that good design or otherwise...). 
I'm not sure I follow you though when you say "it won't even do a write" 
on a cache miss, it copies the (potentially stale) HashMap in that situation 
does it not?


I still think there are some *theoretical* dangers (eg a thread replacing 
a well populated cache with a stale/empty copy). Just consider me pessimistic 
after having been burnt in the past by similar 'impossibilities' ;-) Overall 
though I'm in agreement that there aren't any practical issues with this 
code and in fact it'll give a good performance boost over String.intern() 
even in the worst case anyway! I was only trying to raise the stale/visibility 
issue but it's clear you've already given that plenty of thought.




> On Sun, Apr 19, 2009 at 23:42, Chris Miller wrote:
>>> As soon as all possible fields are in the pool, we're essentially
>>> readonly.
>> The problem is, there's no guarantee we will ever reach this point.
>> For example suppose you have a server app that spawns a new thread
>> per request. Each new thread might have to make all the .intern()
>> calls again because they never see anything in the cache.
>
> No sane server app is actually spawning a new thread per request, it
> takes one from pool. A spawned thread inherits visibility of its
> parent thread (at the moment of spawning). Even if a new thread still
> sees stale pool, it'll be updated on first cache miss. It won't even
> do a write. So, no "all the intern calls again", such scenario doesn't
> exist even in theory.









Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot
On Sun, Apr 19, 2009 at 23:42, Chris Miller  wrote:
>> As soon as all possible fields are in the pool, we're essentially
>> readonly.
> The problem is, there's no guarantee we will ever reach this point. For
> example suppose you have a server app that spawns a new thread per request.
> Each new thread might have to make all the .intern() calls again because
> they never see anything in the cache.

No sane server app is actually spawning a new thread per request, it
takes one from pool. A spawned thread inherits visibility of its
parent thread (at the moment of spawning). Even if a new thread still
sees stale pool, it'll be updated on first cache miss. It won't even
do a write. So, no "all the intern calls again", such scenario doesn't
exist even in theory.




Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Chris Miller

> As soon as all possible fields are in the pool, we're essentially
> readonly.


The problem is, there's no guarantee we will ever reach this point. For example 
suppose you have a server app that spawns a new thread per request. Each 
new thread might have to make all the .intern() calls again because they 
never see anything in the cache. Having said that, I agree that the code 
will still work correctly regardless, and this is a very unlikely scenario 
anyway - for most practical situations this is never going to be an issue.








[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700604#action_12700604
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

bq. What was the field count? Is it still a considerable speedup with hundreds 
of fields without slowing anything else down ?
The field count was 1 :) I rewrote the benchmark; with the extra code, the 
Java 6 speedup on a single already-interned field became 12.5x.
Results for three Java varieties and different sets of keys follow:

Speedup (interned / uninterned):

  keys    Java 6 (64, server)   Java 5 (64, server)   Java 4 (32, client)
     1    12.47x / 3.76x         9.84x / 5.03x         4.90x / 2.88x
    10     8.03x / 3.08x         7.00x / 4.61x         4.08x / 2.67x
   100     6.58x / 2.55x         6.61x / 2.28x         3.88x / 2.52x
  1000     5.39x / 2.69x         4.73x / 2.73x         3.44x / 2.31x





Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Chris Miller

> How about benchmarking with eg a ConcurrentHashMap instead?


Scratch that, I forgot about the 1.3/1.4 dependency...







Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot
On Sun, Apr 19, 2009 at 23:16, Chris Miller  wrote:
> As far as I can see, both these implementations only suffer from
> threadsafety problems in that they don't guarantee visibility across
> threads, ie it's possible for threads to see stale data.

> So the code should work fine if you can live with the consequences of stale
> data - in this case, the (remote) possibility of large performance
> differences between VMs.

I guarantee just enough visibility for this code to never ever produce
incorrect results.
As soon as all possible fields are in the pool, we're essentially readonly.
Take whatever JVM you have out there, readonly will beat anything
performance-wise :)

> How about benchmarking with eg a ConcurrentHashMap instead?
No Java 5 until 3.0. It will be slower anyway.




Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Chris Miller
As far as I can see, both these implementations only suffer from threadsafety 
problems in that they don't guarantee visibility across threads, ie it's 
possible for threads to see stale data. I don't see any prospect of corruption 
or race conditions due to out-of-order execution. So the code should work 
fine if you can live with the consequences of stale data - in this case, 
the (remote) possibility of large performance differences between VMs. Personally 
I tend to avoid such fragile and hard to maintain code unless there's a very 
good reason for it.


How about benchmarking with eg a ConcurrentHashMap instead?



> [
> https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700600#action_12700600
> ]
>
> Mark Miller commented on LUCENE-1607:
> -
> bq. Earwin, I took a quick look at your implementation just now, but
> it doesn't look thread-safe.
>
> That was my first impression too, but I couldn't pin down the issue.
> The access will either be against the old pool, or it will be against
> the new pool, and the instance switch should be atomic? I figured it
> was a clever trick of some kind (though I did wonder about the cost of
> making the new hashmap every add). The HashMaps are read only right
> (once they can be accessed by multiple threads)? And they are popped
> in with an atomic variable assignment?











[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700601#action_12700601
 ] 

Earwin Burrfoot commented on LUCENE-1607:
-

bq. This default would be more back compatible since programs may rely on
String instances being interned via String.intern().
My version is also String.intern()-compatible.

bq. Earwin, I took a quick look at your implementation just now, but it doesn't
look thread-safe.
An unlock in one thread happens-before a subsequent lock on the same monitor in
another thread, and inside each thread each action happens-before the next one.
Since I get the pool reference a second time after taking the lock, and write the
pool reference before unlocking, everything's fine. As for the other threads, if
they find what they need in the pool, it doesn't matter whether they're seeing a
stale pool or not. If they don't find what they need, they hit the lock,
re-retrieve the pool reference, get the latest one, and either find what they
need there or write.
Correct me if I'm wrong?
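
The scheme described above can be sketched roughly as follows. This is not the attached patch, which is not reproduced in this thread; the class name CopyOnWriteInterner is hypothetical. The sketch also declares the pool field volatile, which is an assumption added here so that a freshly built map is safely published to threads on the lock-free read path; the comment above argues from the monitor's happens-before edges alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a copy-on-write intern pool on top of String.intern().
final class CopyOnWriteInterner {
    // Assumption of this sketch: volatile, so readers never observe a
    // partially constructed HashMap. Each map is never mutated once published.
    private volatile Map<String, String> pool = new HashMap<>();

    String intern(String s) {
        // Fast path: lock-free lookup against whatever snapshot is visible.
        // A stale snapshot can only cause a miss, never a wrong answer.
        String interned = pool.get(s);
        if (interned != null) {
            return interned;
        }
        synchronized (this) {
            // Re-read the reference under the lock to get the latest pool.
            Map<String, String> current = pool;
            interned = current.get(s);
            if (interned != null) {
                return interned;
            }
            // Miss: intern via the JVM, copy the map, add, then swap it in.
            interned = s.intern();
            Map<String, String> copy = new HashMap<>(current);
            copy.put(interned, interned);
            pool = copy;
            return interned;
        }
    }
}
```

Each miss copies the whole map, so adding N distinct strings performs on the order of N*N/2 entry copies in total; once every field name is pooled, all calls take the lock-free path.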

bq. though I did wonder about the cost of making the new hashmap every add
It's COSTLY :) But you're going to pay all of it at startup.

I think we can introduce the class (without any interfaces, why should we need 
one here?), and then try to make it faster by switching storage. I tried GNU 
Trove THashMap, but on Java 6 it was slower than stock HashMap.





[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700600#action_12700600
 ] 

Mark Miller commented on LUCENE-1607:
-

bq. Earwin, I took a quick look at your implementation just now, but it doesn't
look thread-safe.

That was my first impression too, but I couldn't pin down the issue. The access
will either be against the old pool, or it will be against the new pool, and
the instance switch should be atomic? I figured it was a clever trick of some
kind (though I did wonder about the cost of making the new hashmap every add).
The HashMaps are read-only, right (once they can be accessed by multiple
threads)? And they are popped in with an atomic variable assignment?






[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700599#action_12700599
 ] 

Yonik Seeley commented on LUCENE-1607:
--

Earwin, I took a quick look at your implementation just now, but it doesn't 
look thread-safe.





[jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700593#action_12700593
 ] 

Mark Miller commented on LUCENE-1607:
-

What was the field count? Is it still a considerable speedup with hundreds of
fields, without slowing anything else down? (I would assume so, but it would be
nice to know, considering a new hashmap is made per add. It is a one-time hit
though, and the number of fields is not likely to exceed hundreds at the
extreme.)

Also would be great to get the speedup numbers for Java 4 and 5.

I'll relate this to the 3 or 4 other field intern issues out there in a bit.

