[jira] [Commented] (LUCENE-5123) invert the codec postings API

2014-11-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219969#comment-14219969
 ] 

ASF subversion and git services commented on LUCENE-5123:
-

Commit 1640807 from [~mikemccand] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1640807 ]

LUCENE-5123: fix changes

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0, Trunk

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2014-11-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219971#comment-14219971
 ] 

ASF subversion and git services commented on LUCENE-5123:
-

Commit 1640808 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1640808 ]

LUCENE-5123: fix changes

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0, Trunk

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2014-08-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108967#comment-14108967
 ] 

Michael McCandless commented on LUCENE-5123:


Thanks Rob!

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0, 4.11

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2014-08-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108669#comment-14108669
 ] 

ASF subversion and git services commented on LUCENE-5123:
-

Commit 1620250 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1620250 ]

LUCENE-5123, LUCENE-5268: invert codec postings api (backport from trunk)

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0, 4.11

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2014-08-24 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108672#comment-14108672
 ] 

ASF subversion and git services commented on LUCENE-5123:
-

Commit 1620252 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1620252 ]

LUCENE-5123, LUCENE-5268: move CHANGES 5.0 - 4.11

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0, 4.11

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-20 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772862#comment-13772862
 ] 

Han Jiang commented on LUCENE-5123:
---

Nice change! Although PushFieldsConsumer is still using the old API, I like the 
migrating
of flush() logic from FreqProxTermsWriterPerField to PushFieldsConsumer, the 
calling chain is 
more clear in codec level now. :)

Also, I'm quite curious whether StoredFields and TermVectors will get rid of 
'merge()' later.


 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772939#comment-13772939
 ] 

Michael McCandless commented on LUCENE-5123:


Thanks Han, I do like the new API better ...

I don't think we need go get rid of merge() for stored fields / term vectors, 
at least not yet ...

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773028#comment-13773028
 ] 

Robert Muir commented on LUCENE-5123:
-

The only reason merge() exists there is so they can implement some bulk 
merging optimizations?

Can we remove these optimizations? Has there ever been a benchmark showing they 
help at all?

We shouldnt have such scary code in lucene because it looks faster. Every 
time I look at infostreams from merge, its completely dominated by postings and 
other things.

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771998#comment-13771998
 ] 

Robert Muir commented on LUCENE-5123:
-

+1 to commit: patch looks awesome!

The change to TestConcurrentMergeScheduler looks leftover/accidental?


 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772087#comment-13772087
 ] 

Michael McCandless commented on LUCENE-5123:


Thanks Rob, I'll commit soon.

bq. The change to TestConcurrentMergeScheduler looks leftover/accidental?

It is separate, but intentional: it adds some fangs to that test.  I'll put a 
comment ...

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-19 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772307#comment-13772307
 ] 

ASF subversion and git services commented on LUCENE-5123:
-

Commit 1524840 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1524840 ]

LUCENE-5123: invert postings writing API

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 5.0

 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch, 
 LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-09-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13761540#comment-13761540
 ] 

Michael McCandless commented on LUCENE-5123:


{quote}
1. move write() from PostingsFormat to FieldsConsumer
2. make the push api a subclass of FieldsConsumer that has a final 
implementation of write() and exposes the abstract api it has today (e.g. 
addField)
{quote}

I started down this path (moved the write method to FieldsConsumer, and created 
a PushFieldsConsumer subclass that impls final write, exposing the current API) 
but ... this causes problems for wrapping/delegating PostingsConsumers (e.g. 
AssertingPF, BloomPF, PulsingPF) since suddenly they must be strongly typed to 
accept only PushFieldsConsumer.  Either that or I guess we could cut each of 
these over to write().

I mean, it exposes a real issue w/ the current patch: you cannot wrap 
SimpleTextPF (or any future PF that uses the pull API) inside these PFs that 
use the push API.  Not sure what to do ...



 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-08-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754835#comment-13754835
 ] 

Robert Muir commented on LUCENE-5123:
-

Maybe we should:

# move write() from PostingsFormat to FieldsConsumer
# make the push api a subclass of FieldsConsumer that has a final 
implementation of write() and exposes the abstract api it has today (e.g. 
addField)

or we can keep the names the same and have the new one be underneath 
FieldsConsumer (e.g. RawFieldsConsumer).

But I am not sure we should focus on nuking the old api or even cutting over 
impls that dont need the visitor api, since the existing api works pretty well
for a lot of implementations and if you dont need to do fancy stuff it might be 
easier and less error-prone.



 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-08-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13747034#comment-13747034
 ] 

Robert Muir commented on LUCENE-5123:
-

This is exciting! 

One idea is to just keep the old API (at least for now)?
Then, we dont have to cutover tons of code at once and we just have a new low 
level api (and back compat by accident).

I think it would be good if we wrote or converted a 'demo' codec (simpletext is 
ok for example, or a new simple one) to the new api first, just to see if we 
are happy with it.

Like maybe its just fine that if you are implementing the new API you have to 
compute the stats in your codec yourself, maybe its simple, or maybe we just 
plan on keeping the higher level API and not deprecating it.


 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Attachments: LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5123) invert the codec postings API

2013-07-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713851#comment-13713851
 ] 

Michael McCandless commented on LUCENE-5123:


+1

I think this would allow for more innovation in PostingsFormats ...

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir

 Currently FieldsConsumer/PostingsConsumer/etc is a push oriented api, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec api and filters out deleted docs and pushes via 
 same api (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just passed this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but its very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org