[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409333#comment-17409333 ] ASF subversion and git services commented on LUCENE-9959: - Commit d4e4fe22b1826ec6e96432182a98a6e2fc9b91fc in lucene's branch refs/heads/main from zacharymorn [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d4e4fe2 ] Revert "LUCENE-9959: Add non thread local based API for term vector reader usage (#180)" (#280) This reverts commit 180cfa241b133c75c31daf3628db4979c949f7f6. > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 8.5h > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409181#comment-17409181 ] Zach Chen commented on LUCENE-9959: --- Hi [~jpountz], sorry for the delay here, somehow I missed the update earlier. +1 for reverting the changes to unblock 9.0 release, I've created a PR here https://github.com/apache/lucene/pull/280 > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 8.5h > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408921#comment-17408921 ] Adrien Grand commented on LUCENE-9959: -- [~zacharymorn] Any concerns with reverting? > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 8h 20m > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400267#comment-17400267 ] Adrien Grand commented on LUCENE-9959: -- Since this issue is proving a bit controversial, and is currently half-baked with term vectors having a new API without threadlocals but not stored fields, let's revert changes that have been done until now in order not to block the Lucene 9.0 release ([~sokolov] sent an email to the dev list about releasing soon). > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 8h 20m > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386614#comment-17386614 ] Zach Chen commented on LUCENE-9959: --- {quote}I had put it on hold to see whether we should explore changing the API like you did rather than still caching stored fields readers per thread but removing as much state as possible like my PR does. {quote} I see. Thanks for the clarification! {quote}If the new API proves controversial, I'd be open to an alternative that would consist of keeping the previous API and pulling a new TermVectorsReader (resp. StoredFieldsReader) internally every time that term vectors (resp. stored fields) are requested instead of the previous approach that consisted of caching instances in a threadlocal. {quote} +1. Do we want to try this different approach for stored field, and see how it compares with the new API for term vector (which may create inconsistency between APIs for the two, but hopefully temporarily) ? > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 8h 20m > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384068#comment-17384068 ] Adrien Grand commented on LUCENE-9959: -- I actually like the changes made in #180. It does add a new level of indirection through a new public class, but this is already what we're doing with e.g. doc values where you need to pull a reader first, then advance the iterator, and only then you can retrieve a value. It's slightly more annoying to use with this new indirection, but maybe the benefits are worth the hassle? When I'm thinking of a machine like Mike's beast with its 128 cores, it doesn't sound right to keep 128 states *per segment* for stored fields and term vectors at all times when: - these states use tens of kB, - while not entirely free to create, these states aren't super expensive to create either, - it's very unlikely that many of these states get used at the same time, because hopefully more time is spent running queries than fetching stored fields or term vectors for top hits. bq. I noticed your original PR https://github.com/apache/lucene/pull/137 that led to this Jira and also touched on stored fields has not been merged yet, do you plan to merge it any time soon, or will you have more changes for it? I had put it on hold to see whether we should explore changing the API like you did rather than still caching stored fields readers per thread but removing as much state as possible like my PR does. If the new API proves controversial, I'd be open to an alternative that would consist of keeping the previous API and pulling a new TermVectorsReader (resp. StoredFieldsReader) internally every time that term vectors (resp. stored fields) are requested instead of the previous approach that consisted of caching instances in a threadlocal. > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 7h 40m > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383314#comment-17383314 ] David Smiley commented on LUCENE-9959: -- Rob, please don't needlessly personalize your critical feedback towards me, and plus the swearing just magnifies whatever your message is against the person and not the code. And I'm sure you are smart enough to look at just about anyone's work and understand where the motivation comes from, even if you don't agree with the approach. Let's not talk to each other or each other's code in this way. I'm not pleased with the extra public class either – [https://github.com/apache/lucene/pull/180#pullrequestreview-686320076] I said as much. At least "Fields" can become purely internal and thus the net change is just one more class for TVs (Zach added "TermVectors"). In [https://github.com/apache/lucene/pull/180#issuecomment-876482149] I thought of an approach that may work but last night I had second thoughts and went for simplicity of the change (adding DocTermVectors). Based on your feedback Rob, lets ignore what I came up with last night and I'll post a different PR that doesn't introduce it. > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 7h 40m > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383126#comment-17383126 ] Robert Muir commented on LUCENE-9959: - I don't think we should do this anymore. Somehow this has morphed into a david-smiley-adds-shit-tons-of-new-public-classes-for-no-goddamn-reason without actually solving any problems. We shouldn't even tackle this issue if it can't be done in a responsible way without destroying APIs. > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 7.5h > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379654#comment-17379654 ] Zach Chen commented on LUCENE-9959: --- Hi [~jpountz], I've merged the PR for term vectors thread local removal, and plan to take on the stored fields one next. I noticed your original PR [https://github.com/apache/lucene/pull/137] that led to this Jira and also touched on stored fields has not been merged yet, do you plan to merge it any time soon, or will you have more changes for it? > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 7.5h > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379646#comment-17379646 ] ASF subversion and git services commented on LUCENE-9959: - Commit 180cfa241b133c75c31daf3628db4979c949f7f6 in lucene's branch refs/heads/main from zacharymorn [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=180cfa2 ] LUCENE-9959: Add non thread local based API for term vector reader usage (#180) > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 7h 20m > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362813#comment-17362813 ] Adrien Grand commented on LUCENE-9959: -- bq. Is the implementation what you are expecting? Yes, something along these lines indeed. I left some thoughts on the PR, the main one being that we should avoid leaking logic like checkIntegrity/clone/getMergeInstance to the IndexReader APIs. bq. add a new API to go alongside with them, and gradually phased out the use of the existing two +1 bq. I'm a bit wondering why other readers from SegmentReader don't need to use the same thread local approach Actually e.g. doc values used to leverage thread locals until Lucene 7.0. We removed the thread locals because Lucene 7 changed the doc values API from being random-access (like stored fields) to being an iterator, and the iterator API is not compatible with caching. > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors
[ https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362392#comment-17362392 ] Zach Chen commented on LUCENE-9959: --- I took a look at this issue and the idea suggested by Robert (and https://issues.apache.org/jira/browse/LUCENE-1195 that seems to introduce thread local originally), and gave it a try with this WIP PR [https://github.com/apache/lucene/pull/180] (with commit [https://github.com/apache/lucene/commit/5062e4d69938f104b461004022e19c10a65960a5] that has the most meaningful changes). Is the implementation what you are expecting? I feel since `IndexReader` already has APIs _getTermVectors_ and _getTermVector_, it might not be too bad to add a new API to go alongside with them, and gradually phased out the use of the existing two (at least for term vector)? In addition, I'm a bit wondering why other readers from SegmentReader don't need to use the same thread local approach for concurrency / caching (namely, the PointsReader, NormsProducer, DocValuesProducer, VectorReader, FieldsProducer in SegmentReader). I'm guessing these readers' operations might be much less costly compared with term vector reader and stored field reader, so their operations are made thread-safe internally? I'll dig around to understand more about the context there... > Can we remove threadlocals of stored fields and term vectors > > > Key: LUCENE-9959 > URL: https://issues.apache.org/jira/browse/LUCENE-9959 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > [~rmuir] suggested removing these threadlocals at > https://github.com/apache/lucene/pull/137#issuecomment-840111367. > These threadlocals are trappy if you manage many segments and threads within > the same JVM, or worse: non-fixed threadpools. The challenge is to keep the > API easy to use. > We could take advantage of 9.0 to change the stored fields API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org