[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746845#action_12746845 ] Yonik Seeley commented on LUCENE-1607: -- {quote} isn't it possible to make the call to public String intern(char[] arr, int offset, int len) {} not create a String? {quote} Yep - that's why I added that method (just didn't get around to implementing it that way). > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot >Assignee: Yonik Seeley > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607-contrib.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746773#action_12746773 ] Noble Paul commented on LUCENE-1607: isn't it possible to make the call to {code} public String intern(char[] arr, int offset, int len) { } {code} not create a String? currently , if I have a char[] it ends up creating a String (which makes a copy of the char[]) before it does the intern() operation. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot >Assignee: Yonik Seeley > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607-contrib.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740638#action_12740638 ] Uwe Schindler commented on LUCENE-1607: --- Committed rev 802095. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot >Assignee: Yonik Seeley > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607-contrib.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739628#action_12739628 ] Yonik Seeley commented on LUCENE-1607: -- bq. Except I'd like the javadoc demand each impl to be String.intern()-compatible. If *everything* went through the same intern, it wouldn't matter. bq. rewrite 'for' as 'for (Entry e = first;e != null;e = e.next)' for clarity? done. bq. If check around 'nextToLast = e' can also be removed? I don't see how... > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot >Assignee: Yonik Seeley > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731185#action_12731185 ] Mark Miller commented on LUCENE-1607: - looks like this is so close ... > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot >Assignee: Yonik Seeley > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723352#action_12723352 ] Earwin Burrfoot commented on LUCENE-1607: - Okay, let's have an extra class and ability to switch impls. I liked that static method could get inlined (at least its short-path), but that's not necessary. Except I'd like the javadoc demand each impl to be String.intern()-compatible. There's nothing bad in it, as in any decent impl an unique string will be String.intern()'ed one time at most. And the case when you get an infinite flow of unique strings is degenerate anyway, you have to fix something, not deal with it. On the other hand, we can remove "This should never be changed after other Lucene APIs have been used" clause. rewrite 'for' as 'for (Entry e = first;e != null;e = e.next)' for clarity? 'Entry[] arr = cache;' - this can be skipped? 'cache' is already final and optimizer loves finals. Plus further down the method you use both cache[slot] and arr[slot]. Or am I missing some voodoo? If check around 'nextToLast = e' can also be removed? 'public String intern(char[] arr, int offset, int len)' - is this needed? > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot >Assignee: Yonik Seeley > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720291#action_12720291 ] Michael McCandless commented on LUCENE-1607: bq. I've held off because of a lack of consensus Silence = consensus! > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot >Assignee: Yonik Seeley > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720274#action_12720274 ] Yonik Seeley commented on LUCENE-1607: -- I've held off because of a lack of consensus, but I suppose I can do the old "i'll commit in a few days" thing. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720266#action_12720266 ] Mark Miller commented on LUCENE-1607: - I assume we can assign this one to you Yonik? > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718650#action_12718650 ] Yonik Seeley commented on LUCENE-1607: -- bq. I still have a feeling we should expose a single static method - intern() and hide implementation away While the default should be beneficial for most users, I'd hate to lock away a users ability to either expand or remove the cache. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718198#action_12718198 ] Earwin Burrfoot commented on LUCENE-1607: - bq. but I was waiting for some kind of feedback if people in general thought it was the right approach. It introduces another static, and people tend to not like that. Just forgot somehow about this issue. You're right about static, it's not clear how and when to initialize it, plus you introduce some public classes we'll be unable to change/remove later. I still have a feeling we should expose a single static method - intern() and hide implementation away, possibly tuning it to be advantageous for String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718188#action_12718188 ] Yonik Seeley commented on LUCENE-1607: -- I think so... but I was waiting for some kind of feedback if people in general thought it was the right approach. It introduces another static, and people tend to not like that. I accidentally didn't upload the latest version with the javadoc + helper method. I'll do that now. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718181#action_12718181 ] Michael McCandless commented on LUCENE-1607: Yonik is this ready to go in...? > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706129#action_12706129 ] Earwin Burrfoot commented on LUCENE-1607: - Bug in previous algo (unbounded hash): // check cached hashCode first, and use compareTo to avoid // dynamic type checking in equals(). if (h == other.hashCode() && s.compareTo(other)==0) { return s; <--- here we should return 'other' } > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704484#action_12704484 ] Shalin Shekhar Mangar commented on LUCENE-1607: --- Yonik, the string is being interned twice in your latest patch {code} + if (e==null) { +s = s.intern(); +arr[slot] = new Entry(s.intern(), h, first); {code} > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704315#action_12704315 ] Earwin Burrfoot commented on LUCENE-1607: - A top bound on cache size will do? If you're fed too much unique strings it'll end up with characteristics of simple cache. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704313#action_12704313 ] Earwin Burrfoot commented on LUCENE-1607: - Is there 'any' benefit of dumping String.intern() compatibility? All those interns happen at startup anyway. If future brings us something better (which I doubt in this case), we'll just swap the impl, which is all-private. All in all I consider a practice of system-property-driven pluggable implementations a sickening one. Take MMapDirectory that was impossible to use until now without twiddling with startup keys, take SegmentReader/ReadonlySegmentReader - resulting code is ugly and nobody's going to override defaults anyway, all important dependencies are package-private. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704306#action_12704306 ] Yonik Seeley commented on LUCENE-1607: -- The last patch removed the ability to plug a different implementation, but I guess we don't need that until we have another implementation (and it's not clear if the benefits of dumping String.intern() compatability in the future outweigh the disadvantages of breaking back compatibility). I plan on committing this shortly. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704261#action_12704261 ] Patrick Eger commented on LUCENE-1607: -- Ah I see thanks. Was looking at the most recent patch (by date). > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704252#action_12704252 ] Yonik Seeley commented on LUCENE-1607: -- bq. As a quick comment on the implementation, i notice that it is possible (with reasonable probability) for hash collisions to result in re-interning a pair of strings multiple times. For the first implementation, yes. The latest implementation is guaranteed to only intern a string once ( the hashtable does probing and resizing.) > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704247#action_12704247 ] Yonik Seeley commented on LUCENE-1607: -- bq. why h >> 7? Was copied from Solr's hashing of doc ids... we didn't want to throw away too many lower bits since they were likely to be the most random. In string hashes, the rightmost bits also have the most entropy. bq. Should I add a patch with Yonik's last hash impl and all calls to String.intern() replaced to get it moving? That would be helpful, thanks! > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704246#action_12704246 ] Patrick Eger commented on LUCENE-1607: -- As a quick comment on the implementation, i notice that it is possible (with reasonable probability) for hash collisions to result in re-interning a pair of strings multiple times. For example, a codepath that traverses across 32 unique string datapoints (say, in an inner loop somewhere), would have a minimum 3% probability of colliding and re-interning 2 strings every time through the loop. Because of the birthday paradox, it becomes likely to have such a situation (50% probability with ~150 unique values). These are the probabilities of collision, assuming random distribution and perfect hashing. In real life the distribution will not be so random (string.hashCode() & MASK) so these will be "best case". 2 datapoints: collision prob = 0.006104% 4 datapoints: collision prob = 0.036617% 8 datapoints: collision prob = 0.170779% 16 datapoints: collision prob = 0.729976% 32 datapoints: collision prob = 2.983863% 64 datapoints: collision prob = 11.591861% 128 datapoints: collision prob = 39.188158% 256 datapoints: collision prob = 86.501947% Practically this may or may not matter. My thought is that some sort of fast LRU structure would be better, but of course creating something like this without locking may be tricky. Another idea might be to support some form of limited hash-chaining or probing in the table, which would mitigate the damage of a collision significantly. for reference, python code for calculating birthday/hash collisions, shamelessly stolen: def bp(n, d): v = 1.0 for i in range(n): v = v * (1 - float(i)/d) return 1 - v for n in [2, 4, 8, 16, 32, 64, 128, 256]: print "%i datapoints: collision prob = %f%%" % (n, bp(n, 16*1024)*100) > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704225#action_12704225 ] Earwin Burrfoot commented on LUCENE-1607: - Mmm.. what's the status of this one? Should I add a patch with Yonik's last hash impl and all calls to String.intern() replaced to get it moving? > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701626#action_12701626 ] Earwin Burrfoot commented on LUCENE-1607: - I tried it out. Works a little bit better than simple cache (no stray interns must've paid off), doesn't degrade at all. I'd like to change starter value to something 256-1024, it works way better for 10-20 fields. Why h >> 7? I understand that you're sacking collision-guilty bits, but why not exact amount that was used (have to store it?), or a whole byte or two? > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700942#action_12700942 ] Yonik Seeley commented on LUCENE-1607: -- bq. In no-collision resolution scheme, if you detect a collision early with hashcode, you still call String.intern(). In the old code, yes... that's why I left it commented out there. In the latest code, we re-probe on a collision w/o calling compareTo (assuming hashCodes are unequal) and only call intern() if it's really a String we haven't seen before. The new code is a real hash table that will store all the Strings requested through it, unlike the first variant that was a simple cache. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700935#action_12700935 ] Earwin Burrfoot commented on LUCENE-1607: - bq. Collisions should also be very quick to skip by comparing the hash code (which is cached for Strings). In no-collision resolution scheme, if you detect a collision early with hashcode, you still call String.intern(). That kills any benefit gained from hashcode check vs compareTo. And if no collision is detected, introducing this check slows things down a little. Yes, that surprised me too. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700931#action_12700931 ] Yonik Seeley commented on LUCENE-1607: -- bq. The fastest hash we can get, should have no collisions. This is achievable by resizing on each new collision. Hmmm, in my quick'n'dirty tests of about 256 unique strings, a smaller hash table was actually quicker (initialized with 32 and let it resize vs starting at 1024). I imagine that this would be due to a larger part of the table fitting in smaller and faster processor caches. YMMV. Collisions should also be very quick to skip by comparing the hash code (which is cached for Strings). > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700928#action_12700928 ] Earwin Burrfoot commented on LUCENE-1607: - Hehe, ten minute difference. Take over this issue, since you're obviously better at it than I? > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch, LUCENE-1607.patch, > LUCENE-1607.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700827#action_12700827 ] Yonik Seeley commented on LUCENE-1607: -- bq. lack of collision resolve My version was the most basic... a simple cache that is not guaranteed to always hit. I also wanted to keep the overhead very low in case of misses (hence no re-probing). In the best case, I don't think one can get much faster... and in the worst case it won't be much slower. There could be other implementations of course... bq. It'll break on non-power-of-two sizes. The size is guaranteed to be a power of two by the constructor. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700784#action_12700784 ] Mark Miller commented on LUCENE-1607: - bq. The thread safety problem has to do with safe object publication Oh, great, you mean basic concurrency competency :( Well thats an embarrassing erosion of knowledge. Back to the books. Thread a has no happens before relationship with thread b unless they share the lock. I trained myself to just synchronize or use volatile long ago (and then let the knowledge begin seeping I guess), but even still, every time I see one of these double lock check type tricks I go, eh, this one must work or something. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700696#action_12700696 ] Earwin Burrfoot commented on LUCENE-1607: - Okay, you're probably right. It's not that hashmap can be corrupted on subsequent write, it's that reader threads can possibly see not-yet-built hashmap. And volatile works, while simple sync doesn't, because volatile also orders reads. Hm. Hm. Hm. Than, as you suggested, all we need is our own hash implementation that remains usable even being partially updated. I skipped through your patch and something looks suspicious to me. Like lack of collision resolve and that line: int slot = h & (cache.length-1); It'll break on non-power-of-two sizes. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700629#action_12700629 ] Derek DeMarco commented on LUCENE-1607: --- Good point Yonik. Making pool volatile should take care of it, but only on JVMs 1.5+, as before that volatile didn't prevent reordering of non-volatile reads/writes around it. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700618#action_12700618 ] Yonik Seeley commented on LUCENE-1607: -- The thread safety problem has to do with safe object publication (making an object visible to a different thread). Mutable objects generally can't be safely published w/o synchronization it has to do with CPU caches in multi-CPU systems. IIRC, current x86 architectures will see fewer problems since read barriers are no-ops.. caches are guaranteed to be coherent. But even on x86, you aren't guaranteed that instructions and writes won't be reordered... so the assignment to pool could be visible *before* all the memory changes that newPool.put() could cause. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative
You was. I just wanted to point out that in real apps you're not going to see stale data longer than for milliseconds Agreed, which is why this whole discussion is very theoretical anyway :) On cache miss, I re-retrieve pool reference after a lock (HashMap is no longer stale), re-read a string, and do the write only if the string is still not there. Ah I see what you're saying now, I had been overlooking the re-retrieval of the pool reference. Thanks for clarifying that for me. The truth is born in argument. I reread jmm docs to be sure, but I can't guarantee I understood them well. Seems like you have a pretty good understanding to me :) - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative
> Sorry I wasn't as clear as I could have been - I realise JEE servers use a > threadpool for handling requests, I was thinking of many other applications > in the real world I'm aware of that don't (be that good design or > otherwise...). You was. I just wanted to point out that in real apps you're not going to see stale data longer than for milliseconds (unless you're some processor cache guru doing his black magic). Besides calling intern(), threads synchronize in a gazillion other ways, and each sync updates stale stuff. > I'm not sure I follow you though when you say "it won't even > do a write" on a cache miss, it copies the (potentially stale) HashMap in > that situation does it not? On cache miss, I re-retrieve pool reference after a lock (HashMap is no longer stale), re-read a string, and do the write only if the string is still not there. > I was only trying to raise the stale/visibility issue but it's clear you've > already given that plenty of thought. The truth is born in argument. I reread jmm docs to be sure, but I can't guarantee I understood them well. -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785 - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative
Sorry I wasn't as clear as I could have been - I realise JEE servers use a threadpool for handling requests, I was thinking of many other applications in the real world I'm aware of that don't (be that good design or otherwise...). I'm not sure I follow you though when you say "it won't even do a write" on a cache miss, it copies the (potentially stale) HashMap in that situation does it not? I still think there are some *theoretical* dangers (eg a thread replacing a well populated cache with a stale/empty copy). Just consider me pessimistic after having been burnt in the past by similar 'impossibilities' ;-) Overall though I'm in agreement that there aren't any practical issues with this code and in fact it'll give a good performance boost over String.intern() even in the worst case anyway! I was only trying to raise the stale/visibility issue but it's clear you've already given that plenty of thought. On Sun, Apr 19, 2009 at 23:42, Chris Miller wrote: As soon as all possible fields are in the pool, we're essentially readonly. The problem is, there's no guarantee we will ever reach this point. For example suppose you have a server app that spawns a new thread per request. Each new thread might have to make all the .intern() calls again because they never see anything in the cache. No sane server app is actually spawning a new thread per request, it takes one from pool. A spawned thread inherits visibility of its parent thread (at the moment of spawning). Even if a new thread still sees stale pool, it'll be updated on first cache miss. It won't even do a write. So, no "all the intern calls again", such scenario doesn't exist even in theory. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative
On Sun, Apr 19, 2009 at 23:42, Chris Miller wrote: >> As soon as all possible fields are in the pool, we're essentially >> readonly. > The problem is, there's no guarantee we will ever reach this point. For > example suppose you have a server app that spawns a new thread per request. > Each new thread might have to make all the .intern() calls again because > they never see anything in the cache. No sane server app is actually spawning a new thread per request, it takes one from pool. A spawned thread inherits visibility of its parent thread (at the moment of spawning). Even if a new thread still sees stale pool, it'll be updated on first cache miss. It won't even do a write. So, no "all the intern calls again", such scenario doesn't exist even in theory. -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785 - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative
As soon as all possible fields are in the pool, we're essentially readonly. The problem is, there's no guarantee we will ever reach this point. For example suppose you have a server app that spawns a new thread per request. Each new thread might have to make all the .intern() calls again because they never see anything in the cache. Having said that, I agree that the code will still work correctly regardless, and this is a very unlikely scenario anyway - for most practical situations this is never going to be an issue. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700604#action_12700604 ] Earwin Burrfoot commented on LUCENE-1607: - bq. What was the field count? Is it still a considerable speedup with hundreds of fields without slowing anything else down ? The field count was 1 :) I rewrote the benchmark, with extra code Java6 speedup on single already interned field became 12.5x. Results for three java varieties, and different sets of keys follow: Java 6 (64, server): 1 key interned 12.47x uninterned 3.76x 10 keys interned 8.03x uninterned 3.08x 100 keys interned 6.58x uninterned 2.55x 1000 keys interned 5.39x uninterned 2.69x Java 5 (64, server): 1 key interned 9.84x uninterned 5.03x 10 keys interned 7.00x uninterned 4.61x 100 keys interned 6.61x uninterned 2.28x 1000 keys interned 4.73x uninterned 2.73x Java 4 (32, client): 1 key interned 4.90x uninterned 2.88x 10 keys interned 4.08x uninterned 2.67x 100 keys interned 3.88x uninterned 2.52x 1000 keys interned 3.44x uninterned 2.31x > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative
How about benchmarking with eg a ConcurrentHashMap instead? Scratch that, I forgot about the 1.3/1.4 dependency... - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative
On Sun, Apr 19, 2009 at 23:16, Chris Miller wrote: > As far as I can see, both these implementations only suffer from > threadsafety problems in that they don't guarantee visibility across > threads, ie it's possible for threads to see stale data. > So the code should work fine if you can live with the consequences of stale > data - in this case, the (remote) possibility of large performance > differences between VMs. I guarantee just enough visibility for this code to never ever produce incorrect results. As soon as all possible fields are in the pool, we're essentially readonly. Take whatever JVM you have out there, readonly will beat anything performance-wise :) > How about benchmarking with eg a ConcurrentHashMap instead? No Java 5 until 3.0. It will be slower anyway. -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785 - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative
As far as I can see, both these implementations only suffer from threadsafety problems in that they don't guarantee visibility across threads, ie it's possible for threads to see stale data. I don't see any prospect of corruption or race conditions due to out-of-order execution. So the code should work fine if you can live with the consequences of stale data - in this case, the (remote) possibility of large performance differences between VMs. Personally I tend to avoid such fragile and hard to maintain code unless there's a very good reason for it. How about benchmarking with eg a ConcurrentHashMap instead? [ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.j ira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=127 00600#action_12700600 ] Mark Miller commented on LUCENE-1607: - bq. Earwin, I took a quick look at your implementation just now, but it doesn't look thread-safe. That was my first impression too, but I couldnt pin down the issue. The access will either be against the old pool, or it will be against the new pool, and the instance switch should be atomic? I figured it was a clever trick of some kind (though I did wonder about the cost of making the new hashmap every add). The HashMaps are read only right (once they can be accessed by multiple threads)? And they are popped in with an atomic variable assignment? String.intern() faster alternative -- Key: LUCENE-1607 URL: https://issues.apache.org/jira/browse/LUCENE-1607 Project: Lucene - Java Issue Type: Improvement Reporter: Earwin Burrfoot Fix For: 2.9 Attachments: intern.patch, LUCENE-1607.patch By using our own interned string pool on top of default, String.intern() can be greatly optimized. On my setup (java 6) this alternative runs ~15.8x faster for already interned strings, and ~2.2x faster for 'new String(interned)' For java 5 and 4 speedup is lower, but still considerable. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700601#action_12700601 ] Earwin Burrfoot commented on LUCENE-1607: - bq. This default would be more back compatible since programs may rely on String instances being interned via String.intern(). My version is also String.intern()-compatible bq. Earwin, I took a quick look at your implementation just now, but it doesn't look thread-safe. unlock for one thread happens-before lock on the same monitor for the other thread, inside each thread each action happens-before the next one Since I get pool reference for the second time after the lock, and write pool reference before unlocking, everything's fine. As for the other threads, if they find what they need in the pool, it doesn't matter if they're seeing a stale pool, or not. If they don't find what they need, they hit the lock, re-retrieve pool reference, getting the latest one and either find what they need there, or write. Correct me if I'm wrong? bq. though I did wonder about the cost of making the new hashmap every add It's COSTLY :) But you're going to pay all of it at startup. I think we can introduce the class (without any interfaces, why should we need one here?), and then try to make it faster by switching storage. I tried GNU Trove THashMap, but on Java 6 it was slower than stock HashMap. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700600#action_12700600 ] Mark Miller commented on LUCENE-1607: - bq. Earwin, I took a quick look at your implementation just now, but it doesn't look thread-safe. That was my first impression too, but I couldnt pin down the issue. The access will either be against the old pool, or it will be against the new pool, and the instance switch should be atomic? I figured it was a clever trick of some kind (though I did wonder about the cost of making the new hashmap every add). The HashMaps are read only right (once they can be accessed by multiple threads)? And they are popped in with an atomic variable assignment? > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700599#action_12700599 ] Yonik Seeley commented on LUCENE-1607: -- Earwin, I took a quick look at your implementation just now, but it doesn't look thread-safe. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch, LUCENE-1607.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1607) String.intern() faster alternative
[ https://issues.apache.org/jira/browse/LUCENE-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700593#action_12700593 ] Mark Miller commented on LUCENE-1607: - What was the field count? Is it still a considerable speedup with hundreds of fields without slowing anything else down ?(I would assume so, but would be nice to know considering a new hashmap is made per add - it is a one time hit though and the number of fields is not likely to exceed hundreds at the extreme)? Also would be great to get the speedup numbers for Java 4 and 5. I'll relate this to the 3 or 4 other field intern issues out there in a bit. > String.intern() faster alternative > -- > > Key: LUCENE-1607 > URL: https://issues.apache.org/jira/browse/LUCENE-1607 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Earwin Burrfoot > Fix For: 2.9 > > Attachments: intern.patch > > > By using our own interned string pool on top of default, String.intern() can > be greatly optimized. > On my setup (java 6) this alternative runs ~15.8x faster for already interned > strings, and ~2.2x faster for 'new String(interned)' > For java 5 and 4 speedup is lower, but still considerable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org