[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-09-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409333#comment-17409333
 ] 

ASF subversion and git services commented on LUCENE-9959:
-

Commit d4e4fe22b1826ec6e96432182a98a6e2fc9b91fc in lucene's branch 
refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d4e4fe2 ]

Revert "LUCENE-9959: Add non thread local based API for term vector reader 
usage (#180)" (#280)

This reverts commit 180cfa241b133c75c31daf3628db4979c949f7f6.

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-09-02 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409181#comment-17409181
 ] 

Zach Chen commented on LUCENE-9959:
---

Hi [~jpountz], sorry for the delay here, somehow I missed the update earlier. 
+1 for reverting the changes to unblock 9.0 release, I've created a PR here 
https://github.com/apache/lucene/pull/280

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-09-02 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408921#comment-17408921
 ] 

Adrien Grand commented on LUCENE-9959:
--

[~zacharymorn] Any concerns with reverting?

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-08-17 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400267#comment-17400267
 ] 

Adrien Grand commented on LUCENE-9959:
--

Since this issue is proving a bit controversial, and is currently half-baked 
with term vectors having a new API without threadlocals but not stored fields, 
let's revert changes that have been done until now in order not to block the 
Lucene 9.0 release ([~sokolov] sent an email to the dev list about releasing 
soon).

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-07-23 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386614#comment-17386614
 ] 

Zach Chen commented on LUCENE-9959:
---

{quote}I had put it on hold to see whether we should explore changing the API 
like you did rather than still caching stored fields readers per thread but 
removing as much state as possible like my PR does.
{quote}
I see. Thanks for the clarification!
{quote}If the new API proves controversial, I'd be open to an alternative that 
would consist of keeping the previous API and pulling a new TermVectorsReader 
(resp. StoredFieldsReader) internally every time that term vectors (resp. 
stored fields) are requested instead of the previous approach that consisted of 
caching instances in a threadlocal.
{quote}
+1.  Do we want to try this different approach for stored field, and see how it 
compares with the new API for term vector (which may create inconsistency 
between APIs for the two, but hopefully temporarily) ?

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-07-20 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384068#comment-17384068
 ] 

Adrien Grand commented on LUCENE-9959:
--

I actually like the changes made in #180. It does add a new level of 
indirection through a new public class, but this is already what we're doing 
with e.g. doc values where you need to pull a reader first, then advance the 
iterator, and only then you can retrieve a value. It's slightly more annoying 
to use with this new indirection, but maybe the benefits are worth the hassle? 
When I'm thinking of a machine like Mike's beast with its 128 cores, it doesn't 
sound right to keep 128 states *per segment* for stored fields and term vectors 
at all times when:
 - these states use tens of kB,
 - while not entirely free to create, these states aren't super expensive to 
create either,
 - it's very unlikely that many of these states get used at the same time, 
because hopefully more time is spent running queries than fetching stored 
fields or term vectors for top hits.

bq. I noticed your original PR https://github.com/apache/lucene/pull/137 that 
led to this Jira and also touched on stored fields has not been merged yet, do 
you plan to merge it any time soon, or will you have more changes for it?

I had put it on hold to see whether we should explore changing the API like you 
did rather than still caching stored fields readers per thread but removing as 
much state as possible like my PR does.

If the new API proves controversial, I'd be open to an alternative that would 
consist of keeping the previous API and pulling a new TermVectorsReader (resp. 
StoredFieldsReader) internally every time that term vectors (resp. stored 
fields) are requested instead of the previous approach that consisted of 
caching instances in a threadlocal.

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-07-19 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383314#comment-17383314
 ] 

David Smiley commented on LUCENE-9959:
--

Rob, please don't needlessly personalize your critical feedback towards me, and 
plus the swearing just magnifies whatever your message is against the person 
and not the code.  And I'm sure you are smart enough to look at just about 
anyone's work and understand where the motivation comes from, even if you don't 
agree with the approach.  Let's not talk to each other or each other's code in 
this way.

I'm not pleased with the extra public class either – 
[https://github.com/apache/lucene/pull/180#pullrequestreview-686320076] I said 
as much.  At least "Fields" can become purely internal and thus the net change 
is just one more class for TVs (Zach added "TermVectors").  In 
[https://github.com/apache/lucene/pull/180#issuecomment-876482149] I thought of 
an approach that may work but last night I had second thoughts and went for 
simplicity of the change (adding DocTermVectors).  Based on your feedback Rob, 
lets ignore what I came up with last night and I'll post a different PR that 
doesn't introduce it.

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-07-19 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383126#comment-17383126
 ] 

Robert Muir commented on LUCENE-9959:
-

I don't think we should do this anymore. Somehow this has morphed into a 
david-smiley-adds-shit-tons-of-new-public-classes-for-no-goddamn-reason without 
actually solving any problems. We shouldn't even tackle this issue if it can't 
be done in a responsible way without destroying APIs.

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-07-13 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379654#comment-17379654
 ] 

Zach Chen commented on LUCENE-9959:
---

Hi [~jpountz], I've merged the PR for term vectors thread local removal, and 
plan to take on the stored fields one next. I noticed your original PR 
[https://github.com/apache/lucene/pull/137] that led to this Jira and also 
touched on stored fields has not been merged yet, do you plan to merge it any 
time soon, or will you have more changes for it?

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-07-13 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379646#comment-17379646
 ] 

ASF subversion and git services commented on LUCENE-9959:
-

Commit 180cfa241b133c75c31daf3628db4979c949f7f6 in lucene's branch 
refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=180cfa2 ]

LUCENE-9959: Add non thread local based API for term vector reader usage (#180)



> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-06-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362813#comment-17362813
 ] 

Adrien Grand commented on LUCENE-9959:
--

bq. Is the implementation what you are expecting?

Yes, something along these lines indeed. I left some thoughts on the PR, the 
main one being that we should avoid leaking logic like 
checkIntegrity/clone/getMergeInstance to the IndexReader APIs.

bq. add a new API to go alongside with them, and gradually phased out the use 
of the existing two

+1

bq. I'm a bit wondering why other readers from SegmentReader don't need to use 
the same thread local approach

Actually e.g. doc values used to leverage thread locals until Lucene 7.0. We 
removed the thread locals because Lucene 7 changed the doc values API from 
being random-access (like stored fields) to being an iterator, and the iterator 
API is not compatible with caching.

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-06-12 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362392#comment-17362392
 ] 

Zach Chen commented on LUCENE-9959:
---

I took a look at this issue and the idea suggested by Robert (and 
https://issues.apache.org/jira/browse/LUCENE-1195 that seems to introduce 
thread local originally), and gave it a try with this WIP PR 
[https://github.com/apache/lucene/pull/180] (with commit 
[https://github.com/apache/lucene/commit/5062e4d69938f104b461004022e19c10a65960a5]
 that has the most meaningful changes). Is the implementation what you are 
expecting? I feel since `IndexReader` already has APIs _getTermVectors_ and 
_getTermVector_, it might not be too bad to add a new API to go alongside with 
them, and gradually phased out the use of the existing two (at least for term 
vector)?

In addition, I'm a bit wondering why other readers from SegmentReader don't 
need to use the same thread local approach for concurrency / caching (namely, 
the PointsReader, NormsProducer, DocValuesProducer, VectorReader, 
FieldsProducer in SegmentReader). I'm guessing these readers' operations might 
be much less costly compared with term vector reader and stored field reader, 
so their operations are made thread-safe internally? I'll dig around to 
understand more about the context there...

> Can we remove threadlocals of stored fields and term vectors
> 
>
> Key: LUCENE-9959
> URL: https://issues.apache.org/jira/browse/LUCENE-9959
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> [~rmuir] suggested removing these threadlocals at 
> https://github.com/apache/lucene/pull/137#issuecomment-840111367.
> These threadlocals are trappy if you manage many segments and threads within 
> the same JVM, or worse: non-fixed threadpools. The challenge is to keep the 
> API easy to use.
> We could take advantage of 9.0 to change the stored fields API?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org