[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688012#action_12688012
 ] 

Michael McCandless commented on LUCENE-652:
---

bq. Fine! In my opinion the small overhead of UnicodeUtil is far lower than 
that of compression and the ByteArrayStreams.

Good point...

bq. You can: with a FieldSelector that loads the fields for merge, you get the 
raw binary values (I found this in the code of FieldsReader).

Ahh, true!  In fact, I will go and deprecate that FieldSelectorResult option.

 Compressed fields should be externalized (from Fields into Document)
 --

 Key: LUCENE-652
 URL: https://issues.apache.org/jira/browse/LUCENE-652
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-652.patch, LUCENE-652.patch, LUCENE-652.patch


 Right now, as of the 2.0 release, Lucene supports compressed stored fields.  
 However, after discussion on java-dev, Robert Engels suggested that it would 
 be better to move this logic up to the Document level.  This way the indexing 
 level just stores opaque binary fields, and Document handles compressing and 
 decompressing as needed.
 This approach would have prevented issues like LUCENE-629, because merging 
 segments would never need to decompress.
 See this thread for the recent discussion:
 http://www.gossamer-threads.com/lists/lucene/java-dev/38836
 When we do this we should also work on the related issue LUCENE-648.
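
 To illustrate the proposal above, here is a minimal sketch of compression 
 done by the caller, so that the index only ever sees an opaque byte[]. 
 It uses only java.util.zip from the JDK. The class name OpaqueFieldCodec 
 and its methods are hypothetical and are not Lucene's API; the compression 
 level and buffer size are arbitrary choices for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper (not part of Lucene): the application compresses a value
// before storing it as a plain binary field and decompresses it after loading.
final class OpaqueFieldCodec {

    // Compress raw bytes; the result is what would be stored as a binary field.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    // Decompress bytes that were produced by compress().
    static byte[] decompress(byte[] stored) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            sb.append("stored field value ");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] stored = compress(raw);          // opaque to the index
        byte[] restored = decompress(stored);   // done by the caller after loading
        System.out.println("raw=" + raw.length + " stored=" + stored.length
                + " roundTripOk=" + Arrays.equals(raw, restored));
    }
}
```

 Because the stored bytes are never interpreted by the indexing code in this 
 arrangement, a segment merge can copy them as they are.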

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-20 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683962#action_12683962
 ] 

Uwe Schindler commented on LUCENE-652:
--

Is an index compressed with Store.COMPRESS still readable? Can I uncompress 
fields compressed using the old tools by retrieving the byte array and 
using CompressionTools? There should be some documentation about that.

Another question: compression was also used for string fields, so maybe 
CompressionTools could also supply a method to compress strings (converting 
them to UTF-8 along the way, to be backwards compatible). This would prevent 
people from calling String.getBytes() without a charset and then wondering why 
they cannot read their index again...
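
A small sketch of the String.getBytes() pitfall described above, using only 
the JDK. The class and method names are made up for the example. It compares 
an explicit UTF-8 encoding with an explicit ISO-8859-1 encoding, which stands 
in for whatever the platform default charset might happen to be on a given 
machine.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows why the charset must be fixed on both sides.
final class CharsetPitfallDemo {

    // Always encode with an explicit charset, never the platform default.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset used for encoding.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Z\u00fcrich"; // contains one non-ASCII character

        byte[] utf8 = encodeUtf8(original);
        // Stand-in for a platform default that is not UTF-8.
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes: " + utf8.length
                + ", ISO-8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 round trip ok: "
                + decodeUtf8(utf8).equals(original));
        System.out.println("ISO-8859-1 bytes read back as UTF-8 ok: "
                + decodeUtf8(latin1).equals(original));
    }
}
```

A string helper that always encodes as UTF-8 before compressing removes this 
choice from the caller entirely.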

 Compressed fields should be externalized (from Fields into Document)
 --

 Key: LUCENE-652
 URL: https://issues.apache.org/jira/browse/LUCENE-652
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-652.patch





[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683967#action_12683967
 ] 

Michael McCandless commented on LUCENE-652:
---

Good questions!

bq. Is an index compressed with Store.COMPRESS still readable?

Yes, we have to support that until Lucene 4.0.  But
Field.Store.COMPRESS will be removed in 3.0 (ie you can read previous
compressed fields, interact w/ an index that has compressed fields in
it, etc., just not add docs with Field.Store.COMPRESS to an index as
of 3.0).

bq. Can I uncompress fields compressed using the old tools by retrieving 
the byte array and using CompressionTools?

Well... yes, but: you can't actually get the compressed byte[]
(because Lucene will decompress it for you).

bq. Compression was also used for string fields, so maybe CompressionTools could 
also supply a method to compress strings (converting them to UTF-8 along the way, 
to be backwards compatible). This would prevent people from calling 
String.getBytes() without a charset and then wondering why they cannot read 
their index again...

OK I'll add them.  I'll name them compressString and decompressString.


 Compressed fields should be externalized (from Fields into Document)
 --

 Key: LUCENE-652
 URL: https://issues.apache.org/jira/browse/LUCENE-652
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-652.patch





[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-20 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683969#action_12683969
 ] 

Uwe Schindler commented on LUCENE-652:
--

bq. OK I'll add them. I'll name them compressString and decompressString.

Maybe it is better to use the new UTF-8 tools to encode/decode (instead of 
String.getBytes()). This would be consistent with the rest of Lucene.
But for the old deprecated Field.Store.COMPRESS, keep it as it is (for 
backwards compatibility).

 Compressed fields should be externalized (from Fields into Document)
 --

 Key: LUCENE-652
 URL: https://issues.apache.org/jira/browse/LUCENE-652
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-652.patch





[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-20 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683977#action_12683977
 ] 

Uwe Schindler commented on LUCENE-652:
--

Yes, should I prepare a patch for trunk and add these methods?

 Compressed fields should be externalized (from Fields into Document)
 --

 Key: LUCENE-652
 URL: https://issues.apache.org/jira/browse/LUCENE-652
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-652.patch





[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683985#action_12683985
 ] 

Michael McCandless commented on LUCENE-652:
---

If we switch to UnicodeUtil we may want to allow instantiation of 
CompressionTools, since UnicodeUtil is optimized for re-use.

And if we do that we have to think about thread safety and concurrency, probably 
using CloseableThreadLocal under the hood, and then add a close() method.
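
As a rough sketch of the idea above (an instantiable tool that keeps reusable 
per-thread state and releases it in close()), the following uses the JDK's 
plain java.lang.ThreadLocal as a stand-in for Lucene's CloseableThreadLocal 
and a reusable Deflater as a stand-in for the reusable UnicodeUtil buffers. 
PerThreadCompressor is a hypothetical name, and the sketch does not try to 
guard against close() racing with a compress() call on another thread.

```java
import java.io.ByteArrayOutputStream;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.Deflater;

// Hypothetical sketch: one reusable Deflater per thread, all released on close().
final class PerThreadCompressor implements AutoCloseable {

    // Every Deflater ever handed out, so close() can release them all.
    private final Set<Deflater> created = ConcurrentHashMap.newKeySet();

    // Lazily creates one Deflater per calling thread and remembers it.
    private final ThreadLocal<Deflater> perThread = ThreadLocal.withInitial(() -> {
        Deflater d = new Deflater();
        created.add(d);
        return d;
    });

    private volatile boolean closed;

    byte[] compress(byte[] raw) {
        if (closed) {
            throw new IllegalStateException("compressor is closed");
        }
        Deflater d = perThread.get();
        d.reset(); // reuse the thread's instance instead of allocating a new one
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        return out.toByteArray();
    }

    @Override
    public void close() {
        closed = true;
        for (Deflater d : created) {
            d.end(); // free native memory held by each per-thread instance
        }
        created.clear();
    }
}
```

The trade-off is the one raised in the comment: static helper methods need no 
lifecycle, while an instance that caches per-thread state needs an explicit 
close() so that the cached resources are not held indefinitely.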


 Compressed fields should be externalized (from Fields into Document)
 --

 Key: LUCENE-652
 URL: https://issues.apache.org/jira/browse/LUCENE-652
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-652.patch, LUCENE-652.patch





[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683981#action_12683981
 ] 

Michael McCandless commented on LUCENE-652:
---

bq. Yes, should I prepare a patch for trunk and add these methods?

You mean to switch to UnicodeUtil?  That would be great!


 Compressed fields should be externalized (from Fields into Document)
 --

 Key: LUCENE-652
 URL: https://issues.apache.org/jira/browse/LUCENE-652
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-652.patch, LUCENE-652.patch





[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12684067#action_12684067
 ] 

Michael McCandless commented on LUCENE-652:
---

OK thanks Uwe, it looks good.  We can leave the other changes I
suggested to future optimizations.  I'll commit soon!


 Compressed fields should be externalized (from Fields into Document)
 --

 Key: LUCENE-652
 URL: https://issues.apache.org/jira/browse/LUCENE-652
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-652.patch, LUCENE-652.patch, LUCENE-652.patch





[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2009-03-20 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12685385#action_12685385
 ] 

Uwe Schindler commented on LUCENE-652:
--

Fine! In my opinion the small overhead of UnicodeUtil is far lower than 
that of compression and the ByteArrayStreams.

{quote}
bq. Can I uncompress fields compressed using the old tools by retrieving 
the byte array and using CompressionTools?

Well... yes, but: you can't actually get the compressed byte[]
(because Lucene will decompress it for you).
{quote}

You can: with a FieldSelector that loads the fields for merge, you get the raw 
binary values (I found this in the code of FieldsReader).

 Compressed fields should be externalized (from Fields into Document)
 --

 Key: LUCENE-652
 URL: https://issues.apache.org/jira/browse/LUCENE-652
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-652.patch, LUCENE-652.patch, LUCENE-652.patch





[jira] Commented: (LUCENE-652) Compressed fields should be externalized (from Fields into Document)

2008-01-13 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558407#action_12558407
 ] 

Grant Ingersoll commented on LUCENE-652:


Implementing this would mean deprecating Field.Store.COMPRESS and the various 
other places that use/set bits marking a field as compressed.

Seems like a reasonable thing to do.  I will mark this as a 2.9 issue, so that 
we make sure we deprecate it at or before that time.

 Compressed fields should be externalized (from Fields into Document)
 --

 Key: LUCENE-652
 URL: https://issues.apache.org/jira/browse/LUCENE-652
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1
Reporter: Michael McCandless
Priority: Minor
 Fix For: 2.9

