[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614909#action_12614909
 ] 

Paul Elschot commented on LUCENE-1340:
--

OK, OK. I'll start working on adding a Filter as a clause to BooleanQuery. It will
take some time, though; there's a holiday coming up.

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Term Frequency is typically not needed for all fields. Some CPU (reading one
> VInt less and one X>>>1...) and IO can be spared by making pure boolean fields
> possible in Lucene. This topic has already been discussed and accepted as a
> part of Flexible Indexing... This issue tries to push things forward a bit faster,
> as I have some concrete customer demands.
> Benefits can be expected for fields that are typical candidates for Filters:
> enumerations, user rights, IDs or very short "texts", phone numbers, zip
> codes, names...
> Status: just passed the standard tests (compatibility), committed for early review.
> I have not tried the new feature yet; some asserts and one or two unit tests are missing.
> Complexity: simpler than expected
> Can be used via omitTf() (anyone who has used omitNorms() will know where to find it :)
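To make the "one VInt less and one X>>>1" remark concrete, here is a minimal sketch of a doc/freq postings encoding, assuming the pre-flexible-indexing .frq layout: each doc delta is shifted left by one, an odd code means freq == 1, and an even code is followed by a separate freq VInt. With omitTf only the plain delta is written. The class and method names are hypothetical illustrations, not Lucene's own classes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a .frq-style postings encoding with and without TF.
class FreqPostingsSketch {

    // VInt: 7 data bits per byte, high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Minimal read cursor over an encoded byte array.
    static final class Cursor {
        private final byte[] buf;
        private int pos;

        Cursor(byte[] buf) { this.buf = buf; }

        boolean hasMore() { return pos < buf.length; }

        int readVInt() {
            int b = buf[pos++] & 0xFF;
            int value = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = buf[pos++] & 0xFF;
                value |= (b & 0x7F) << shift;
            }
            return value;
        }
    }

    // With TF: code = delta << 1; low bit set means freq == 1,
    // otherwise an extra VInt holding the freq follows.
    static byte[] encodeWithTf(int[] docs, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int k = 0; k < docs.length; k++) {
            int delta = docs[k] - last;
            last = docs[k];
            if (freqs[k] == 1) {
                writeVInt(out, (delta << 1) | 1);
            } else {
                writeVInt(out, delta << 1);
                writeVInt(out, freqs[k]);
            }
        }
        return out.toByteArray();
    }

    // omitTf: only the doc delta is written.
    static byte[] encodeOmitTf(int[] docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int doc : docs) {
            writeVInt(out, doc - last);
            last = doc;
        }
        return out.toByteArray();
    }

    // Decoding with TF needs the ">>> 1" and, when freq != 1, one more VInt read.
    static List<Integer> decodeWithTf(byte[] data) {
        Cursor in = new Cursor(data);
        List<Integer> docs = new ArrayList<>();
        int doc = 0;
        while (in.hasMore()) {
            int code = in.readVInt();
            doc += code >>> 1;
            if ((code & 1) == 0) {
                in.readVInt(); // the freq; discarded here because only doc ids are returned
            }
            docs.add(doc);
        }
        return docs;
    }

    static List<Integer> decodeOmitTf(byte[] data) {
        Cursor in = new Cursor(data);
        List<Integer> docs = new ArrayList<>();
        int doc = 0;
        while (in.hasMore()) {
            doc += in.readVInt();
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] docs = {3, 70, 71, 500};
        int[] freqs = {1, 2, 1, 5};
        byte[] withTf = encodeWithTf(docs, freqs);
        byte[] omitTf = encodeOmitTf(docs);
        System.out.println("with TF: " + withTf.length + " bytes, omitTf: " + omitTf.length + " bytes");
        System.out.println(decodeWithTf(withTf) + " " + decodeOmitTf(omitTf));
    }
}
```

For this input, main prints "with TF: 8 bytes, omitTf: 5 bytes" followed by the same doc list twice. The saving per posting is exactly the shift/bit test plus, for any freq other than 1, one VInt.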

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-21 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615357#action_12615357
 ] 

Eks Dev commented on LUCENE-1340:
-

Great, it is already more than I expected; even indexing is going to be
somewhat faster.

I have tried your patch on a smallish index with 8 million documents, and it
passed our regression test without problems.
It worked fine with and without omitTf(true): no performance drop or bad
surprises when we do not use it. A real test with production data (around 80
million very small documents) is scheduled for tomorrow; after some very
extensive tests I will report back.

"The one place I know of that will still waste bytes is the term dict
(TermInfo): it stores a long proxPointer on disk (in .tii,.tis) and
also in memory because we load *.tii into RAM "

About this one: it would be nice not to store this as well, but I think the
pointers are already reduced to one byte, as they are 0 for these cases (are
they?). So we have this benefit without expecting it :)

And yes, more "column stride" is great. If you followed my comments on
LUCENE-1278, that would mean we could easily "inline" very short postings into
the term dict (here I expect a huge performance benefit, as a skip() on another
large file is saved, independent of omitTf(true)), with no (or minimal) increase
in the size of the .tii (no locality penalty). If terms follow a Zipfian
distribution, there are *a lot* of terms with postings shorter than, e.g., 16...
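The inlining idea can be sketched as a term dictionary whose entries either carry their doc ids directly (when docFreq is at or below a cutoff, 16 here as in the comment) or a pointer into a separate postings file. This is a hypothetical illustration under those assumptions: the postings "file" is simulated with a map, and none of these names are Lucene classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical term dictionary that stores short postings inline.
class InlinedTermDictSketch {

    static final int INLINE_THRESHOLD = 16; // example cutoff taken from the comment

    static final class TermEntry {
        final int docFreq;
        final int[] inlineDocs;  // non-null only when the postings are inlined
        final long freqPointer;  // meaningful only when they are not

        TermEntry(int docFreq, int[] inlineDocs, long freqPointer) {
            this.docFreq = docFreq;
            this.inlineDocs = inlineDocs;
            this.freqPointer = freqPointer;
        }
    }

    private final TreeMap<String, TermEntry> dict = new TreeMap<>();
    private final Map<Long, int[]> postingsFile = new HashMap<>(); // simulated .frq
    private long nextPointer = 0;
    private int postingsSeeks = 0; // how often the separate postings "file" is touched

    void add(String term, int[] docs) {
        if (docs.length <= INLINE_THRESHOLD) {
            dict.put(term, new TermEntry(docs.length, docs.clone(), -1L));
        } else {
            long pointer = nextPointer++;
            postingsFile.put(pointer, docs.clone());
            dict.put(term, new TermEntry(docs.length, null, pointer));
        }
    }

    int[] docs(String term) {
        TermEntry e = dict.get(term);
        if (e == null) {
            return new int[0];
        }
        if (e.inlineDocs != null) {
            return e.inlineDocs.clone(); // answered from the term dict alone
        }
        postingsSeeks++; // a real index would seek into another file here
        return postingsFile.get(e.freqPointer).clone();
    }

    int seeks() { return postingsSeeks; }

    public static void main(String[] args) {
        InlinedTermDictSketch dict = new InlinedTermDictSketch();
        dict.add("zip:10115", new int[] {4, 9});
        int[] many = new int[100];
        for (int k = 0; k < many.length; k++) {
            many[k] = 3 * k;
        }
        dict.add("country:de", many);
        System.out.println(Arrays.toString(dict.docs("zip:10115")) + " seeks=" + dict.seeks());
        System.out.println(dict.docs("country:de").length + " seeks=" + dict.seeks());
    }
}
```

Only the long postings list costs a seek; with a Zipfian term distribution most lookups would hit the inline path. A real on-disk version would also have to weigh the cutoff against the growth of the term dictionary.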

Thanks again for your support; without you this patch would be just another
nice idea :)

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615446#action_12615446
 ] 

Michael McCandless commented on LUCENE-1340:


bq. About this one: it would be nice not to store this as well, but I think the
pointers are already reduced to one byte, as they are 0 for these cases (are
they?). So we have this benefit without expecting it

Ahh, right. The deltas between proxPointers are written as VLongs. Since
the delta will be zero, it's now only 1 byte; only a bit worse than 0 bytes ;)
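Why a zero delta costs exactly one byte follows from the variable-length encoding: 7 payload bits per byte, with the high bit meaning "more bytes follow". The sketch below assumes that scheme and models the omitTf case as a proxPointer that never advances (all deltas 0); the comparison pointers spaced 300 apart are made up for illustration, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: delta-encoding pointers as VLongs.
class VLongDeltaSketch {

    // Writes v with 7 data bits per byte; returns the number of bytes written.
    static int writeVLong(ByteArrayOutputStream out, long v) {
        int bytes = 0;
        while ((v & ~0x7FL) != 0L) {
            out.write((int) ((v & 0x7FL) | 0x80L));
            v >>>= 7;
            bytes++;
        }
        out.write((int) v);
        return bytes + 1;
    }

    // Writes each pointer as a delta from the previous one; returns total bytes.
    static int encodeDeltas(long[] pointers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long last = 0;
        for (long p : pointers) {
            writeVLong(out, p - last);
            last = p;
        }
        return out.size();
    }

    public static void main(String[] args) {
        // omitTf: nothing goes to .prx, so the pointer never advances (modeled as 0).
        long[] omitTfProx = new long[1000];
        // Made-up advancing pointers, 300 bytes apart, for comparison.
        long[] normalProx = new long[1000];
        for (int i = 0; i < normalProx.length; i++) {
            normalProx[i] = 300L * (i + 1);
        }
        System.out.println(encodeDeltas(omitTfProx)); // 1000
        System.out.println(encodeDeltas(normalProx)); // 2000
    }
}
```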

bq. that would mean we could easily "inline" very short postings into term dict 
(here I expect huge performance benefit, as skip() on another large file is 
going to be saved independent from omitTf(true))

Yes, this looks like it would be a win for cases that need to visit the 
postings for many small terms.

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616513#action_12616513
 ] 

Michael McCandless commented on LUCENE-1340:


bq. The delta between the proxPointers are written as vlong's. Since the delta 
will be zero it's now only 1 byte; only a bit worse than 0 bytes

One more thing here: since the .tii files are loaded into RAM, that unused
proxPointer wastes 8 bytes for each indexed term. For indices with a lot of
terms this can add up to a lot of wasted RAM. But I still think we should wait
and fix this as part of flexible indexing, when we may refactor TermInfos
to be "column stride" instead.
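A "column stride" layout keeps one primitive array per TermInfo attribute instead of one object per term, so an attribute nobody uses can be dropped as a whole column rather than costing a long (8 bytes) per indexed term. The sketch below is hypothetical: the class, its fields, and the rule "drop the prox column only when no field in the segment stores positions" are assumptions made for illustration.

```java
// Hypothetical "column stride" in-RAM term index; names are illustrative.
class ColumnStrideTermIndexSketch {

    final String[] terms;
    final int[] docFreqs;
    final long[] freqPointers;
    final long[] proxPointers; // null when no field in the segment stores positions

    ColumnStrideTermIndexSketch(String[] terms, int[] docFreqs, long[] freqPointers,
                                long[] proxPointers, boolean anyFieldHasPositions) {
        this.terms = terms;
        this.docFreqs = docFreqs;
        this.freqPointers = freqPointers;
        // With a separate column, an unused attribute can be dropped entirely
        // instead of costing 8 bytes (one long) per indexed term.
        this.proxPointers = anyFieldHasPositions ? proxPointers : null;
    }

    long proxPointer(int ord) {
        return proxPointers == null ? 0L : proxPointers[ord];
    }

    // Bytes held by the two pointer columns (ignores array headers and the terms).
    long pointerColumnBytes() {
        long bytes = 8L * freqPointers.length;
        if (proxPointers != null) {
            bytes += 8L * proxPointers.length;
        }
        return bytes;
    }

    public static void main(String[] args) {
        String[] terms = {"a", "b", "c", "d"};
        int[] docFreqs = {1, 2, 3, 4};
        long[] freqPointers = {0, 10, 20, 30};
        long[] proxPointers = new long[4]; // all zero: no positions were written

        ColumnStrideTermIndexSketch withProx = new ColumnStrideTermIndexSketch(
            terms, docFreqs, freqPointers, proxPointers, true);
        ColumnStrideTermIndexSketch noProx = new ColumnStrideTermIndexSketch(
            terms, docFreqs, freqPointers, proxPointers, false);

        System.out.println(withProx.pointerColumnBytes()); // 64
        System.out.println(noProx.pointerColumnBytes());   // 32
    }
}
```

This only removes the waste when the whole segment omits positions; a segment mixing omitTf and normal fields would need a finer-grained scheme.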

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-26 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617140#action_12617140
 ] 

Eks Dev commented on LUCENE-1340:
-

We finished our tests.

Index without omitTf():
- 87 million documents, 2 indexed fields, one stored field
- Unique terms in index: 2.5 million
- Average field lengths in tokens: 3.3 and 5.5 (very short fields)
- On-disk size: 3.8 GB total, including the stored field

Queries under test:
- BooleanQuery in all shapes and forms (disjunctive, conjunctive, nested, with
minNumberShouldMatch()), with a lot of clauses (5-100)
- Filters used: yes

Test scope: regression with 30k queries on the same index, with
omitTf(true/false).

Result:

- The queries returned 100% identical hits (full recall tested, all hits
checked)!

- Index size reduction (not including the stored field!): 7% (short documents =>
fewer positions than in Mike's case)

- Query performance: 5.2% faster, but the index was loaded as a RAMIndex (an
on-disk setup should gain even more, due to the reduced IO for reading postings)

- Indexing performance (FSDisk!): 13% faster

We also compared omitTf(false) with this patch against lucene.jar without this
patch: no changes whatsoever.

From my perspective, this is good to go into production. At least for our
usage of Lucene, there are no differences with omitTf(true)...

> One more thing here: since the .tii files are loaded into RAM, that unused
> proxPointer wastes 8 bytes for each indexed term. For indices with a lot of
> terms this can add up to a lot of wasted RAM. But I still think we should wait
> and fix this as part of flexible indexing, when we may refactor TermInfos
> to be "column stride" instead.

I am more than happy with the results; no need to squeeze the last bit out of
it right now.

Mike, thanks again for the great work! 



> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617143#action_12617143
 ] 

Michael McCandless commented on LUCENE-1340:


OK, that sounds like a healthy test.

bq. Mike, thanks again for the great work! 

Thank you for the sudden burst of effort to make this happen!

So I think this is ready to commit.  I'll wait a few days and then commit...

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-29 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617954#action_12617954
 ] 

Grant Ingersoll commented on LUCENE-1340:
-

I note a change to Fieldable... sigh... Back compatibility fails. Ugh.

Methinks we should either rework Fieldable as we've previously discussed, or
mark it as being one of the very few classes in Lucene that are subject to
change between releases.
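The back-compatibility problem is a general Java one: adding a method to an interface breaks every existing implementor (in Java of that era there were no default methods), while an abstract base class can add the same capability with a default body. The sketch uses hypothetical stand-in types, not Lucene's Fieldable or AbstractField.

```java
// Hypothetical stand-ins, not Lucene's Fieldable/AbstractField.
// "Version 2" of an interface gains a method: every existing implementor
// stops compiling until it adds the method (absent Java 8 default methods).
interface FieldLike {
    String name();
    boolean omitNorms();
    boolean omitTf(); // newly added: breaks old implementors
}

// class LegacyFieldImpl implements FieldLike {      // would no longer compile:
//     public String name() { return "id"; }          // omitTf() is not implemented
//     public boolean omitNorms() { return false; }
// }

// An abstract base class can add the capability with a default body,
// so existing subclasses keep compiling and keep their old behavior.
abstract class AbstractFieldLike {
    private boolean omitTf = false;

    abstract String name();

    boolean omitNorms() { return false; }

    boolean getOmitTf() { return omitTf; }                   // new in "version 2"

    void setOmitTf(boolean omitTf) { this.omitTf = omitTf; } // new in "version 2"
}

// Written against "version 1" of the base class; untouched, still compiles.
class LegacyField extends AbstractFieldLike {
    String name() { return "id"; }
}

class InterfaceVsAbstractSketch {
    public static void main(String[] args) {
        LegacyField f = new LegacyField();
        System.out.println(f.getOmitTf()); // false: inherited default
        f.setOmitTf(true);
        System.out.println(f.getOmitTf()); // true
    }
}
```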



> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-29 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617978#action_12617978
 ] 

Eks Dev commented on LUCENE-1340:
-

Ouch! It is kind of getting personal between me and Fieldable :) This is not
the first time I've been bugged by it!

Due to Fieldable (things really important, at least to me):
- We cannot get a binary stored Field in and out of Lucene without making gc()
go crazy
- We cannot omitTf

It would be possible somehow to do it at the AbstractField level, with instanceof
checks in a few places, but I simply hate to do it (I will patch my local copy;
this issue is worth it to me... I must branch off from the trunk for the first
time, sigh).

Funnily enough, I see no reason to have anything but AbstractField
(Field/Fieldable are just redundant).
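The workaround described above (keep the interface frozen, put the new flag only on the abstract class, and probe for it with instanceof) might look roughly like this. All types are hypothetical stand-ins. The sketch also shows the drawback: an implementation written only against the interface can never expose the flag.

```java
// Hypothetical stand-ins: the interface is frozen, so the new flag lives
// only on the abstract base class and indexing code probes for it.
interface FrozenFieldable {
    String name();
    boolean isIndexed();
}

abstract class BaseField implements FrozenFieldable {
    private boolean omitTf;

    public boolean getOmitTf() { return omitTf; }

    public void setOmitTf(boolean omitTf) { this.omitTf = omitTf; }
}

class SimpleField extends BaseField {
    private final String name;

    SimpleField(String name) { this.name = name; }

    public String name() { return name; }

    public boolean isIndexed() { return true; }
}

// A third-party implementation written only against the interface.
class CustomFieldable implements FrozenFieldable {
    public String name() { return "custom"; }

    public boolean isIndexed() { return true; }
}

class InstanceofWorkaroundSketch {
    // Every place in the indexer that needs the flag must repeat this check.
    static boolean omitTf(FrozenFieldable f) {
        return (f instanceof BaseField) && ((BaseField) f).getOmitTf();
    }

    public static void main(String[] args) {
        SimpleField id = new SimpleField("id");
        id.setOmitTf(true);
        System.out.println(omitTf(id));                    // true
        System.out.println(omitTf(new CustomFieldable())); // false: flag unreachable
    }
}
```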

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-29 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617996#action_12617996
 ] 

Grant Ingersoll commented on LUCENE-1340:
-

Yeah, it's one of my biggest regrets in Lucene (yes, I am responsible for it), 
yet I firmly believe there is a way to do interfaces and abstracts in a proper 
way in Java.

We could make LazyField extend AbstractField, I think, but it's not clear, as 
there are some differences between the two, mostly around construction.  I'd 
have to go back and review again.

That being said, I still think if there is one place where we should allow
breaking the back-compat contract, it is Fieldable! For every rule there is
an exception, right? I think we could, with sufficient warning, tell people
that we are changing the interface. I am willing to bet that the number of
people who would be affected by that is less than 10.

So, please don't give up on this patch. I am totally, 100% for it. I think it
makes total sense to do.

Another option is to speed up the move towards 3.0.

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-29 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618001#action_12618001
 ] 

Doug Cutting commented on LUCENE-1340:
--

> I firmly believe there is a way to do interfaces and abstracts in a proper 
> way in Java. 

Personally, I've given up on interfaces for stuff with more than one method 
with at most one parameter.  Ditch the interface and move on.

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-29 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618069#action_12618069
 ] 

Eks Dev commented on LUCENE-1340:
-

That sounds like consensus :) Great!

In that case LUCENE-1219 can be reworked slightly to avoid instanceof (less
code). It also opens a way to pass a reference to a byte[] for retrieving stored
fields out of Lucene and communicating the length back to the caller (now we
allocate a new byte[] every time we fetch a stored field).
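The "pass a reference to byte[] and communicate the length back" idea can be sketched as a caller-owned holder: the reader copies into it, reallocates only when the buffer is too small, and records how many bytes are valid. This is a hypothetical API shape for illustration; it is not an actual Lucene method, and the stored-fields file is simulated with an in-memory array.

```java
import java.util.Arrays;

// Hypothetical reusable-buffer API for reading stored binary values.
class ReusableBufferSketch {

    // Caller-owned buffer plus the number of valid bytes in it.
    static final class ReusableBytes {
        byte[] bytes = new byte[0];
        int length;
    }

    private final byte[][] storedValues; // stands in for the stored-fields file

    ReusableBufferSketch(byte[][] storedValues) {
        this.storedValues = storedValues;
    }

    // Copies doc's stored value into holder, reallocating only if it is too small.
    void readStored(int doc, ReusableBytes holder) {
        byte[] src = storedValues[doc];
        if (holder.bytes.length < src.length) {
            holder.bytes = new byte[src.length]; // grow once; reused on later calls
        }
        System.arraycopy(src, 0, holder.bytes, 0, src.length);
        holder.length = src.length;
    }

    public static void main(String[] args) {
        ReusableBufferSketch reader = new ReusableBufferSketch(new byte[][] {
            {1, 2, 3, 4, 5}, {9, 8}, {7, 7, 7}
        });
        ReusableBytes buf = new ReusableBytes();
        int allocations = 0;
        byte[] previous = buf.bytes;
        for (int doc = 0; doc < 3; doc++) {
            reader.readStored(doc, buf);
            if (buf.bytes != previous) {
                allocations++;
                previous = buf.bytes;
            }
            System.out.println(Arrays.toString(Arrays.copyOf(buf.bytes, buf.length)));
        }
        System.out.println("allocations: " + allocations); // 1
    }
}
```

The caller must respect `length`: bytes past it are leftovers from earlier, longer values.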

bq. it's one of my biggest regrets in Lucene (yes, I am responsible for it), 
yet I firmly believe there is a way to do interfaces and abstracts in a proper 
way in Java. 

No need to regret it, Grant; if you do nothing, you make no mistakes... Interfaces
are OK as long as you can tell what they are going to be doing in the next 5
years... This forces you to design "for the future", something we cannot
afford at places like Field in a library as popular and complex as Lucene.
Abstract* is an equally good design abstraction...

Proposal:
We could live with a statement "Fieldable changes are allowed from now on; it is
deprecated and will probably be removed in 3.0". It causes just a tiny bit of
work in case someone is really implementing it (adding a new method like
omitTf() to Fieldable costs you at most 5 minutes of work to update your
implementing class).

From 3.0 on, I could very well live without it; until then, we cause 5 minutes
of work for people who implement Fieldable on their own and want to stay up to
date with the trunk. It is a fair deal for everyone, and Lucene moves forward...

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618272#action_12618272
 ] 

Michael McCandless commented on LUCENE-1340:


Sigh, I too missed that we broke back-compatibility.

But I agree: let's mark the Fieldable interface as being allowed to change from
release to release (consciously making an exception to the back-compatibility
requirements).

Let's also transition away from an interface for Field in 3.0. We last
discussed this here:

http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200803.mbox/[EMAIL 
PROTECTED]



> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h



[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-30 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618290#action_12618290
 ] 

Grant Ingersoll commented on LUCENE-1340:
-

OK, I think we should call a vote on it, as it is significant enough in my 
mind.  I will write it up.

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Term Frequency is typically not needed  for all fields, some CPU (reading one 
> VInt less and one X>>>1...) and IO can be spared by making pure boolen fields 
> possible in Lucene. This topic has already been discussed and accepted as a 
> part of Flexible Indexing... This issue tries to push things a bit faster 
> forward as I have some concrete customer demands.
> benefits can be expected for fields that are typical candidates for Filters, 
> enumerations, user rights, IDs or very short "texts", phone  numbers, zip 
> codes, names...
> Status: just passed standard test (compatibility), commited for early review, 
> I have not tried new feature, missing some asserts and one two unit tests
> Complexity: simpler than expected
> can be used via omitTf() (who used omitNorms() will know where to find it :)  




[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-08-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619963#action_12619963
 ] 

Michael McCandless commented on LUCENE-1340:


LUCENE-1349 is in; I plan to commit this shortly...

> Make it posible not to include TF information in index
> --
>
> Key: LUCENE-1340
> URL: https://issues.apache.org/jira/browse/LUCENE-1340
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h




Re: [jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-18 Thread eks dev
cool, that will round up this story... Somehow I knew you would react to this 
"provocation" :)



- Original Message 
> From: Paul Elschot (JIRA) <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Saturday, 19 July, 2008 12:49:32 AM
> Subject: [jira] Commented: (LUCENE-1340) Make it posible not to include TF 
> information in index
> 
> 
> [ 
> https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614909#action_12614909
>  
> ] 
> 
> Paul Elschot commented on LUCENE-1340:
> --
> 
> Ok ok. I'll start working on adding a Filter as a clause to BooleanQuery. Will 
> take some time though, there's a holiday coming up.
> 
> > Make it posible not to include TF information in index
> > --
> >
> > Key: LUCENE-1340
> > URL: https://issues.apache.org/jira/browse/LUCENE-1340
> > Project: Lucene - Java
> >  Issue Type: New Feature
> >  Components: Index
> >Reporter: Eks Dev
> >Priority: Minor
> > Attachments: LUCENE-1340.patch
> >
> >   Original Estimate: 24h
> >  Remaining Estimate: 24h



  __
Not happy with your email address?.
Get the one you really want - millions of new email addresses available now at 
Yahoo! http://uk.docs.yahoo.com/ymail/new.html

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-24 Thread eks dev
Great Mike, 

I am a bit short on time to write back about our tests in detail, but we are 
getting very similar numbers on indexing speed and index size. Query performance 
is also better... but I have to clean our internal details out of the numbers 
before reporting...

The most important information is that there were no problems whatsoever with 
our regression tests: 3 queries in a complex setup with term expansion via our 
spell checker, pushing BooleanQuery to the limit in all possible variations, on 
an index of 80 million short documents with two fields, gave me 100% identical 
responses to our standard reference test! Just for info, neither phrase queries 
nor payloads were covered by our regression test.






- Original Message 
> From: Michael McCandless (JIRA) <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Thursday, 24 July, 2008 6:05:31 PM
> Subject: [jira] Commented: (LUCENE-1340) Make it posible not to include TF 
> information in index
> 
> 
> [ 
> https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616513#action_12616513
>  
> ] 
> 
> Michael McCandless commented on LUCENE-1340:
> 
> 
> bq. The deltas between the proxPointers are written as vLongs. Since the delta 
> will be zero it's now only 1 byte; only a bit worse than 0 bytes
> 
> One more thing here: since the tiis are loaded into RAM, that unused proxPointer 
> wastes 8 bytes for each indexed term. For indices with a lot of terms this can 
> add up to a lot of wasted RAM. But still I think we should wait and fix this as 
> part of flexible indexing, when we maybe refactor the TermInfos to be "column 
> stride" instead.
> 
> > Make it posible not to include TF information in index
> > --
> >
> > Key: LUCENE-1340
> > URL: https://issues.apache.org/jira/browse/LUCENE-1340
> > Project: Lucene - Java
> >  Issue Type: New Feature
> >  Components: Index
> >Reporter: Eks Dev
> >Priority: Minor
> > Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, 
> > LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch
> >
> >   Original Estimate: 24h
> >  Remaining Estimate: 24h

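Mike's quoted point about the unused proxPointer can be made concrete. The Java sketch below is an illustration, not Lucene code. It assumes that VLong uses the same 7-bits-per-byte scheme as VInt, and that each term loaded into RAM from the .tii keeps its proxPointer as a Java long (8 bytes, ignoring object overhead). The 10 million term count is a hypothetical figure, and the class and method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical illustration of the on-disk and in-RAM cost of an unused proxPointer. */
public class ProxPointerCostSketch {

    /** VLong: 7 bits per byte, low-order bytes first, high bit marks continuation. */
    static byte[] encodeVLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    /** RAM held by one unused long per term kept in memory. */
    static long wastedBytes(long indexedTerms) {
        return indexedTerms * Long.BYTES;
    }

    public static void main(String[] args) {
        // For a field without TF/positions the proxPointer never advances, so every delta is 0.
        System.out.println(encodeVLong(0L).length);     // 1
        System.out.println(encodeVLong(300L).length);   // 2, for comparison
        // Hypothetical index with 10 million terms held in RAM.
        System.out.println(wastedBytes(10_000_000L));   // 80000000
    }
}
```

The on-disk cost stays at one byte per term, while the in-RAM cost grows linearly with the number of indexed terms. That RAM cost is the overhead Mike suggests addressing later with a column-stride TermInfos layout.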

