Re: Index Optimization

2008-03-11 Thread masz-wow

thanks hossman. will post it at java-user

hossman wrote:
> 
> 
> 1)
> http://wiki.apache.org/lucene-java/LuceneFAQ#head-adee7c1d869aa20101733944da79e15a1a2e7dfa
> 
> FAQ: "Why do I have a deletable file (and old segment files remain) after
> running optimize?"
> 
> 2) http://people.apache.org/~hossman/#java-dev
> 
> Please Use "[EMAIL PROTECTED]" Not "[EMAIL PROTECTED]"
> 
> Your question is better suited for the [EMAIL PROTECTED] mailing list ...
> not the [EMAIL PROTECTED] list.  java-dev is for discussing development of
> the internals of the Lucene Java library ... it is *not* the appropriate
> place to ask questions about how to use the Lucene Java library when
> developing your own applications.  
> 
> If you have further questions about this topic, please send them to the 
> java-user mailing list, where you are likely to get more/better responses 
> since that list also has a larger number of subscribers.
> 
> : I managed to optimize my index successfully. The problem I'm having now
> : is that when I check the index using Lucene Index Toolbox, a few files
> : in the index itself are deletable. I understand that the optimize method
> : merges the index files, but how come there are still deletable index files
> : in it? What I do now is delete them manually. Is there any way I can
> : delete them automatically? Any code that I can refer to?
> 
> 
> 
> -Hoss
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Index-Optimization-tp15996107p15996877.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Index Optimization

2008-03-11 Thread Chris Hostetter

1) 
http://wiki.apache.org/lucene-java/LuceneFAQ#head-adee7c1d869aa20101733944da79e15a1a2e7dfa

FAQ: "Why do I have a deletable file (and old segment files remain) after 
running optimize?"

2) http://people.apache.org/~hossman/#java-dev

Please Use "[EMAIL PROTECTED]" Not "[EMAIL PROTECTED]"

Your question is better suited for the [EMAIL PROTECTED] mailing list ...
not the [EMAIL PROTECTED] list.  java-dev is for discussing development of
the internals of the Lucene Java library ... it is *not* the appropriate
place to ask questions about how to use the Lucene Java library when
developing your own applications.  

If you have further questions about this topic, please send them to the 
java-user mailing list, where you are likely to get more/better responses 
since that list also has a larger number of subscribers.

: I managed to optimize my index successfully. The problem I'm having now
: is that when I check the index using Lucene Index Toolbox, a few files
: in the index itself are deletable. I understand that the optimize method
: merges the index files, but how come there are still deletable index files
: in it? What I do now is delete them manually. Is there any way I can
: delete them automatically? Any code that I can refer to?



-Hoss





Index Optimization

2008-03-11 Thread masz-wow

I managed to optimize my index successfully. The problem I'm having now
is that when I check the index using Lucene Index Toolbox, a few files
in the index itself are deletable. I understand that the optimize method
merges the index files, but how come there are still deletable index files
in it? What I do now is delete them manually. Is there any way I can
delete them automatically? Any code that I can refer to?
-- 
View this message in context: 
http://www.nabble.com/Index-Optimization-tp15996107p15996107.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.





Re: How to add a jar to a contrib build.xml

2008-03-11 Thread Chris Hostetter

: Here is how the span highlighter I have been working on uses the Memory
: contrib (I think I copied this from another contrib that has a dependency):

You might want to take a look at contrib/xml-query-parser/build.xml as a 
slightly better example of this.  It uses  to test if the 
dependency has already been built to save some overhead.



-Hoss





[jira] Updated: (LUCENE-1223) lazy fields don't enforce binary vs string value

2008-03-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1223:
---

Attachment: LUCENE-1223.patch

Attached patch that just propagates the "binary" value from when we scanned the 
fields, into the LazyField, recording it as isBinary.  Then I enforce isBinary 
before returning a binaryValue() and !isBinary before returning a stringValue().

I'll commit in a day or two.
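
The enforcement described in this comment can be sketched roughly as follows. This is only a minimal, self-contained illustration and not the code in LUCENE-1223.patch: the class name LazyValueSketch, its constructor, and the choice to return null on a type mismatch are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a lazily loaded stored field; not Lucene's LazyField.
final class LazyValueSketch {
    private final byte[] raw;        // bytes as they would sit in the stored-fields file
    private final boolean isBinary;  // recorded once, when the field entry is scanned

    LazyValueSketch(byte[] raw, boolean isBinary) {
        this.raw = raw.clone();
        this.isBinary = isBinary;
    }

    // Only a binary field hands out bytes; otherwise null (assumed behaviour).
    byte[] binaryValue() {
        return isBinary ? raw.clone() : null;
    }

    // Only a non-binary field hands out a String; otherwise null (assumed behaviour).
    String stringValue() {
        return isBinary ? null : new String(raw, StandardCharsets.UTF_8);
    }
}
```

With a guard like this, asking a binary field for its string value no longer succeeds silently, and the reverse case is rejected in the same way.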

> lazy fields don't enforce binary vs string value
> 
>
> Key: LUCENE-1223
> URL: https://issues.apache.org/jira/browse/LUCENE-1223
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3, 2.3.1
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1223.patch
>
>
> If you have a binary field, and load it lazy, and then ask that field
> for its stringValue, it will incorrectly give you a String back (and
> then will refuse to give a binaryValue).  And, vice-versa.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Created: (LUCENE-1223) lazy fields don't enforce binary vs string value

2008-03-11 Thread Michael McCandless (JIRA)
lazy fields don't enforce binary vs string value


 Key: LUCENE-1223
 URL: https://issues.apache.org/jira/browse/LUCENE-1223
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3.1, 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.4


If you have a binary field, and load it lazy, and then ask that field
for its stringValue, it will incorrectly give you a String back (and
then will refuse to give a binaryValue).  And, vice-versa.




[jira] Commented: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577637#action_12577637
 ] 

Michael McCandless commented on LUCENE-1217:


OK the new patch passes all tests -- thanks!

One unrelated thing I noticed: it looks like you can get a binary LazyField and 
then ask for its stringValue(), and vice-versa.  Ie we are failing to check in 
binaryValue() that the field is in fact binary even though when we create the 
LazyField we know whether it is.  I'll open a separate issue for this.

> use isBinary cached variable instead of instanceof in Filed
> ---
>
> Key: LUCENE-1217
> URL: https://issues.apache.org/jira/browse/LUCENE-1217
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Trivial
> Attachments: Lucene-1217-take1.patch, LUCENE-1217.patch
>
>
> The Field class can hold three types of values.
> See AbstractField.java: protected Object fieldsData = null;
> Currently, RTTI (instanceof) is mainly used to determine the type of the
> value stored in a particular instance of Field, but for binary values we
> mix RTTI with the cached variable "boolean isBinary".
> This patch makes consistent use of the cached variable isBinary.
> Benefit: one consistent way to determine the run-time type in the binary
> case (reduces the chance of the cached variable getting out of sync). It should be
> slightly faster as well.
> Thinking aloud:
> Would it not make sense to maintain the type with some integer/byte "poor man's
> enum" (an interface with a couple of constants)
> {code:java}
> public static interface Type {
>   public static final byte BINARY = 0;
>   public static final byte STRING = 1;
>   public static final byte READER = 2;
> }
> {code}
> and use that instead of isBinary + instanceof?




[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-11 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1219:


Attachment: LUCENE-1219.patch

This one keeps the addition of new methods localized to AbstractField and does not
change the Fieldable interface. It looks like it could work this way with a
few instanceof checks in FieldsWriter. This one depends on LUCENE-1217.

It will not give you any benefit if you implement Fieldable directly
without extending AbstractField; therefore I would suggest eventually
changing Fieldable to support all these methods that operate with offset/length.
Or someone clever finds a way to change an interface without breaking
backwards compatibility :)
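
A rough sketch of the approach described in this comment, with the new accessors kept on the abstract base class and an instanceof check in the writer. All type and method names here (FieldableSketch, AbstractFieldSketch, SliceField, FieldsWriterSketch, getBinaryOffset, getBinaryLength) are made up for illustration and are not taken from the attached patch.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical, simplified stand-ins; these are not Lucene's real types.
interface FieldableSketch {
    byte[] binaryValue();            // existing contract: compact, zero-based array
}

abstract class AbstractFieldSketch implements FieldableSketch {
    protected byte[] data;
    protected int offset;
    protected int length;

    // The new accessors live only here, so the interface stays unchanged.
    int getBinaryOffset() { return offset; }
    int getBinaryLength() { return length; }
}

final class SliceField extends AbstractFieldSketch {
    SliceField(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public byte[] binaryValue() { return data; }   // reference only, no copy
}

final class FieldsWriterSketch {
    // Writes the field's bytes, using offset/length when the field exposes them.
    static void write(FieldableSketch f, ByteArrayOutputStream out) {
        byte[] bytes = f.binaryValue();
        if (f instanceof AbstractFieldSketch) {
            AbstractFieldSketch af = (AbstractFieldSketch) f;
            out.write(bytes, af.getBinaryOffset(), af.getBinaryLength());
        } else {
            out.write(bytes, 0, bytes.length);     // plain Fieldable: whole array
        }
    }
}
```

The else branch shows the limitation mentioned above: a class that implements the interface directly gets no benefit, because the writer cannot see an offset or length for it.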

> support array/offset/ length setters for Field with binary data
> ---
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1219.patch, LUCENE-1219.patch, LUCENE-1219.patch
>
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte
> arrays. This forces end users to create new objects and copy content into them
> before passing them to Lucene, as such fields are often of variable size.
> Depending on the use case, avoiding that copy can bring a far from negligible
> performance improvement.
> This approach extends the Fieldable interface with 3 new methods:
> getOffset(), getLength(), and getBinaryValue() (which only returns a reference
> to the array).
>




[jira] Updated: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

2008-03-11 Thread Ning Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Li updated LUCENE-1035:


Attachment: LUCENE-1035.contrib.patch

Re-do as a contrib package. Creating BufferPooledDirectory with your customized 
file name filter for readers allows you to decide which files you want to use 
the caching layer for.

The package includes some tests. I also modified and tested the core tests with 
the caching layer in a private setting and all tests passed.

> Optional Buffer Pool to Improve Search Performance
> --
>
> Key: LUCENE-1035
> URL: https://issues.apache.org/jira/browse/LUCENE-1035
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Store
>Reporter: Ning Li
> Attachments: LUCENE-1035.contrib.patch, LUCENE-1035.patch
>
>
> An index in RAMDirectory performs better than one in FSDirectory.
> But many indexes cannot fit in memory, or applications cannot afford to
> spend that much memory on the index. On the other hand, because of locality,
> a reasonably sized buffer pool may provide good improvement over FSDirectory.
> This issue aims at providing such an optional buffer pool layer. In cases
> where it fits, i.e. a reasonable hit ratio can be achieved, it should provide
> a good improvement over FSDirectory.
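
To make the idea of a bounded buffer pool concrete, here is a small sketch of a least-recently-used block cache with a hit-ratio counter. It is not the code in the attached patches: the class BlockCacheSketch, the loader function standing in for a disk read, and the LRU eviction policy are all assumptions made for the example, and the per-file name filter mentioned in the comment is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a fixed-capacity block cache; not the LUCENE-1035 patch.
final class BlockCacheSketch {
    private final Function<Long, byte[]> loader;   // stands in for a read from disk
    private final LinkedHashMap<Long, byte[]> blocks;
    private long hits;
    private long misses;

    BlockCacheSketch(final int maxBlocks, Function<Long, byte[]> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;         // evict once capacity is exceeded
            }
        };
    }

    // Returns the block, loading it through the loader on a cache miss.
    synchronized byte[] read(long blockNo) {
        byte[] block = blocks.get(blockNo);
        if (block != null) {
            hits++;
            return block;
        }
        misses++;
        block = loader.apply(blockNo);
        blocks.put(blockNo, block);
        return block;
    }

    synchronized double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Whether such a layer pays off depends on the hit ratio, which is why the sketch tracks it: with poor locality the pool only adds overhead on top of the underlying directory.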




Re: Ideas to refactor Filed

2008-03-11 Thread Chris Hostetter

: I think, if you give it the same name, it just grays out the old ones.  See
: https://issues.apache.org/jira/browse/LUCENE-550 for an example.
: 
: Thus, I prefer #3, but am fine with #2 as well.  #3 makes it easier, IMO, to
: find the latest.

use the same name if the patch serves the same purpose (in the majority of 
issues, there is a linear evolution of a single patch).  when doing this 
Jira recognizes that the patches "supersede" each other, and always 
presents the latest at the top of the list with the others greyed out.

use different names for patches that serve different purposes (ie: one 
patch may go through several iterations using one approach, someone 
may then post a different patch with a different name which attempts to 
solve the same problem with a completely different approach, someone else 
may then post a third patch with a third name which provides unit tests 
that work against both of the other patches ... at which point all three 
"different" patches may be updated many times as they evolve in attempting 
to find the best ultimate solution).

if you use different names for different iterations of the same "logical 
patch" it's not easy to see in jira which one is the "newest" because 
jira orders patches with different names lexicographically.  you have to go 
to the "Manage Attachments" screen or view the full history of the issue 
to get any sense of when each differently named patch was added.




-Hoss





[jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-11 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1217:


Attachment: Lucene-1217-take1.patch

new patch, fixes isBinary status in LazyField

> use isBinary cached variable instead of instanceof in Filed
> ---
>
> Key: LUCENE-1217
> URL: https://issues.apache.org/jira/browse/LUCENE-1217
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Trivial
> Attachments: Lucene-1217-take1.patch, LUCENE-1217.patch
>
>



[jira] Commented: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-11 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577601#action_12577601
 ] 

Eks Dev commented on LUCENE-1217:
-

hah, this bug just justified this patch :)
sorry, I should have run the tests before... nothing is trivial enough.
The problem was indeed isBinary going out of sync in LazyField; new patch
follows.

> use isBinary cached variable instead of instanceof in Filed
> ---
>
> Key: LUCENE-1217
> URL: https://issues.apache.org/jira/browse/LUCENE-1217
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Trivial
> Attachments: LUCENE-1217.patch
>
>



[jira] Commented: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577598#action_12577598
 ] 

Michael McCandless commented on LUCENE-1217:


Actually seeing a test failure with this:

[junit] Testcase: testLazyFields(org.apache.lucene.index.TestFieldsReader): FAILED
[junit] bytes is null and it shouldn't be
[junit] junit.framework.AssertionFailedError: bytes is null and it shouldn't be
[junit] at org.apache.lucene.index.TestFieldsReader.testLazyFields(TestFieldsReader.java:132)



> use isBinary cached variable instead of instanceof in Filed
> ---
>
> Key: LUCENE-1217
> URL: https://issues.apache.org/jira/browse/LUCENE-1217
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Trivial
> Attachments: LUCENE-1217.patch
>
>



[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-11 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577597#action_12577597
 ] 

Eks Dev commented on LUCENE-1219:
-

I do not know for sure if this is something we could not live with.  Adding a new 
interface sounds equally bad: it would work nicely, but I do not like it as it 
makes the code harder to follow with too many interfaces ... I'll have another 
look at it to see if there is a way to do it without interface changes. Any 
ideas?

> support array/offset/ length setters for Field with binary data
> ---
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1219.patch, LUCENE-1219.patch
>
>



[jira] Commented: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-11 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577591#action_12577591
 ] 

Eks Dev commented on LUCENE-1217:
-

thanks for looking into it!
Subclassing now with backwards compatibility would be clumsy; I was thinking 
about it but could not find a clean way to do it.

>>Or we could wait until Java 5 (3.0) and use real enums?
yes, that is the ultimate solution, but my line of thought was that a "poor man's 
enum" -> Java 5 enum migration would be trivial later... but "do not change 
working code" kicks in here :)
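
To illustrate why such a migration could be fairly mechanical, here is a sketch that puts the two styles side by side. The type names ValueTypeConstants and ValueType, the bridge method fromTag, and the constant names are assumptions for the example (the first constant is called BINARY here to line up with the isBinary flag); none of this exists in Lucene.

```java
// Hypothetical names throughout; neither type exists in Lucene.

// Pre-Java-5 style: byte constants grouped in an interface.
interface ValueTypeConstants {
    byte BINARY = 0;
    byte STRING = 1;
    byte READER = 2;
}

// Java 5 style: the same three cases as a real enum.
enum ValueType {
    BINARY, STRING, READER;

    // Bridge for the migration period: maps an old byte tag to the enum.
    static ValueType fromTag(byte tag) {
        switch (tag) {
            case ValueTypeConstants.BINARY: return BINARY;
            case ValueTypeConstants.STRING: return STRING;
            case ValueTypeConstants.READER: return READER;
            default: throw new IllegalArgumentException("unknown tag " + tag);
        }
    }
}
```

The enum adds compile-time checking that the byte constants cannot give: any byte value type-checks as a tag, while only the three enum constants type-check as a ValueType.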

> use isBinary cached variable instead of instanceof in Filed
> ---
>
> Key: LUCENE-1217
> URL: https://issues.apache.org/jira/browse/LUCENE-1217
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Trivial
> Attachments: LUCENE-1217.patch
>
>



Re: [jira] Updated: (LUCENE-1198) Exception in DocumentsWriter.ThreadState.init leads to corruption

2008-03-11 Thread Chris Hostetter

: Thanks Hoss!

I did the easy book-keeping part ... you're the guy fixing the bugs and 
merging them into the release branches :)


-Hoss





[jira] Updated: (LUCENE-1222) IndexWriter.doAfterFlush not being called when there are no deletions flushed

2008-03-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1222:
-

Fix Version/s: 2.4
   2.3.2

targeted for 2.3.2 bug fix release

> IndexWriter.doAfterFlush not being called when there are no deletions flushed
> -
>
> Key: LUCENE-1222
> URL: https://issues.apache.org/jira/browse/LUCENE-1222
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3, 2.3.1
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.3.2, 2.4
>
>
> It should be called when flushing either added docs or deletions.  The fix is 
> trivial.  I'll commit shortly to trunk & 2.3.2.
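
The intended behaviour can be pictured with a toy writer whose post-flush hook fires whenever anything was flushed. This is only a sketch under assumptions: FlushHookSketch and its counters are invented for the example and do not mirror IndexWriter's real flush logic.

```java
// Hypothetical writer; it only illustrates when the post-flush hook should run.
class FlushHookSketch {
    private int bufferedDocs;
    private int bufferedDeletes;
    int afterFlushCalls;              // counts hook invocations, for the example only

    void addDocument() { bufferedDocs++; }

    void deleteDocument() { bufferedDeletes++; }

    // Extension point, analogous in spirit to IndexWriter.doAfterFlush.
    protected void doAfterFlush() { afterFlushCalls++; }

    void flush() {
        boolean flushedDocs = bufferedDocs > 0;
        boolean flushedDeletes = bufferedDeletes > 0;
        bufferedDocs = 0;
        bufferedDeletes = 0;
        // Going by the issue title, the bug corresponds to checking only
        // flushedDeletes at this point.
        if (flushedDocs || flushedDeletes) {
            doAfterFlush();
        }
    }
}
```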




[jira] Updated: (LUCENE-1199) NullPointerException in IndexModifier.close()

2008-03-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1199:
-

Fix Version/s: 2.3.2

targeted for 2.3.2 bug fix release

> NullPointerException in IndexModifier.close()
> -
>
> Key: LUCENE-1199
> URL: https://issues.apache.org/jira/browse/LUCENE-1199
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.0.0, 2.3.1
>Reporter: James William Dumay
> Fix For: 2.3.2, 2.4
>
>
> We upgraded from Lucene 2.0.0 to 2.3.1 hoping this would resolve this issue.
> http://jira.codehaus.org/browse/MRM-715
> Trace is as below for Lucene 2.3.1:
> java.lang.NullPointerException
> at org.apache.lucene.index.IndexModifier.close(IndexModifier.java:576)
> at org.apache.maven.archiva.indexer.lucene.LuceneRepositoryContentIndex.closeQuietly(LuceneRepositoryContentIndex.java:416)
> at org.apache.maven.archiva.indexer.lucene.LuceneRepositoryContentIndex.modifyRecord(LuceneRepositoryContentIndex.java:152)
> at org.apache.maven.archiva.consumers.lucene.IndexContentConsumer.processFile(IndexContentConsumer.java:169)
> at org.apache.maven.archiva.repository.scanner.functors.ConsumerProcessFileClosure.execute(ConsumerProcessFileClosure.java:51)
> at org.apache.commons.collections.functors.IfClosure.execute(IfClosure.java:117)
> at org.apache.commons.collections.CollectionUtils.forAllDo(CollectionUtils.java:388)
> at org.apache.maven.archiva.repository.scanner.RepositoryContentConsumers.executeConsumers(RepositoryContentConsumers.java:283)
> at org.apache.maven.archiva.proxy.DefaultRepositoryProxyConnectors.transferFile(DefaultRepositoryProxyConnectors.java:597)
> at org.apache.maven.archiva.proxy.DefaultRepositoryProxyConnectors.fetchFromProxies(DefaultRepositoryProxyConnectors.java:157)
> at org.apache.maven.archiva.web.repository.ProxiedDavServer.applyServerSideRelocation(ProxiedDavServer.java:447)
> at org.apache.maven.archiva.web.repository.ProxiedDavServer.fetchContentFromProxies(ProxiedDavServer.java:354)
> at org.apache.maven.archiva.web.repository.ProxiedDavServer.process(ProxiedDavServer.java:189)
> at org.codehaus.plexus.webdav.servlet.multiplexed.MultiplexedWebDavServlet.service(MultiplexedWebDavServlet.java:119)
> at org.apache.maven.archiva.web.repository.RepositoryServlet.service(RepositoryServlet.java:155)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)




[jira] Updated: (LUCENE-1210) IndexWriter & ConcurrentMergeScheduler deadlock case if starting a merge hits an exception

2008-03-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1210:
-

Fix Version/s: 2.3.2

targeted for 2.3.2 bug fix release

> IndexWriter & ConcurrentMergeScheduler deadlock case if starting a merge hits 
> an exception
> --
>
> Key: LUCENE-1210
> URL: https://issues.apache.org/jira/browse/LUCENE-1210
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3, 2.3.1
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.3.2, 2.4
>
>
> If you're using CMS (the default) and mergeInit hits an exception (eg
> OOME), we are not properly clearing IndexWriter's internal tracking of
> running merges.  This causes IW.close() to hang while it incorrectly
> waits for these non-started merges to finish.
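
The failure mode described here, and the general shape of a fix, can be sketched with a tiny tracker that registers a merge before its init step runs and unregisters it again if that step throws. MergeTrackerSketch and its methods are hypothetical and do not correspond to IndexWriter's or ConcurrentMergeScheduler's internals.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names do not correspond to Lucene's internals.
final class MergeTrackerSketch {
    private final Set<String> runningMerges = new HashSet<>();

    // Registers a merge, then runs its init step; unregisters again if init throws.
    synchronized void startMerge(String name, Runnable init) {
        runningMerges.add(name);
        boolean success = false;
        try {
            init.run();
            success = true;
        } finally {
            if (!success) {
                runningMerges.remove(name);   // without this, close() would wait forever
                notifyAll();
            }
        }
    }

    synchronized void finishMerge(String name) {
        runningMerges.remove(name);
        notifyAll();
    }

    synchronized int pendingMerges() {
        return runningMerges.size();
    }

    // Blocks until no merge is registered as running.
    synchronized void close() throws InterruptedException {
        while (!runningMerges.isEmpty()) {
            wait();
        }
    }
}
```

The success flag combined with finally is used instead of a catch block so that any Throwable, including an OutOfMemoryError, still triggers the cleanup while the original exception keeps propagating.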




[jira] Updated: (LUCENE-1200) IndexWriter.addIndexes* can deadlock in rare cases

2008-03-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1200:
-

Fix Version/s: 2.3.2

targeted for 2.3.2 bug fix release

> IndexWriter.addIndexes* can deadlock in rare cases
> --
>
> Key: LUCENE-1200
> URL: https://issues.apache.org/jira/browse/LUCENE-1200
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.3.2, 2.4
>
> Attachments: LUCENE-1200.patch
>
>
> In somewhat rare cases it's possible for addIndexes to deadlock
> because it is a synchronized method.
> Normally the merges that are necessary for addIndexes are done
> serially (with the primary thread) because they involve segments from
> an external directory.  However, if mergeFactor of these merges
> complete then a merge becomes necessary for the merged segments, which
> are not external, and so it can run in the background.  If too many BG
> threads need to run (currently > 4) then the "pause primary thread"
> approach adopted in LUCENE-1164 will deadlock, because the addIndexes
> method is holding a lock on IndexWriter.
> This was appearing as an intermittent deadlock in the
> TestIndexWriterMerging test case.
> This issue is not present in 2.3 (it was caused by LUCENE-1164).
> The solution is to shrink the scope of synchronization: don't
> synchronize on the whole method & wrap synchronized(this) in the right
> places inside the methods.
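
As a generic illustration of shrinking the synchronization scope (not the actual LUCENE-1200 patch), the sketch below takes the lock only around updates to shared state and runs the slow work without it. The class NarrowLockSketch, its segment list of strings, and the slowMerge placeholder are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not IndexWriter's actual addIndexes code.
final class NarrowLockSketch {
    private final List<String> segments = new ArrayList<>();

    // A "synchronized void addIndexes(...)" would hold the lock for the whole
    // call, slow merge included. Here the lock is held only around state updates.
    void addIndexes(List<String> external) {
        List<String> toMerge;
        synchronized (this) {
            segments.addAll(external);
            toMerge = new ArrayList<>(segments);   // snapshot taken under the lock
        }

        String merged = slowMerge(toMerge);        // runs without holding the lock

        synchronized (this) {
            segments.removeAll(toMerge);
            segments.add(merged);
        }
    }

    private static String slowMerge(List<String> parts) {
        return String.join("+", parts);            // placeholder for real merge work
    }

    synchronized List<String> segments() {
        return new ArrayList<>(segments);
    }
}
```

Because the monitor is released during the slow step, other threads that need the same lock can make progress instead of blocking behind the caller. The sketch does not address concurrent callers merging overlapping snapshots, which a real implementation would have to handle.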




[jira] Updated: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush

2008-03-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1208:
-

Fix Version/s: 2.3.2

targeted for 2.3.2 bug fix release

> Deadlock case in IndexWriter on exception just before flush
> ---
>
> Key: LUCENE-1208
> URL: https://issues.apache.org/jira/browse/LUCENE-1208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3, 2.3.1
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.3.2, 2.4
>
> Attachments: LUCENE-1208.patch
>
>
> If a document hits a non-aborting exception, eg something goes wrong
> in tokenStream.next(), and, that document had triggered a flush
> (due to RAM or doc count) then DocumentsWriter will deadlock because
> that thread marks the flush as pending but fails to clear it on
> exception.
> I have a simple test case showing this, and a fix fixing it.
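
A rough sketch of the failure mode and the usual remedy, assuming a single
boolean "flush pending" flag. FlushGate and its methods are hypothetical names
and do not reflect the actual DocumentsWriter code or the attached patch.

```java
// Hypothetical sketch; names are illustrative, not DocumentsWriter's fields.
class FlushGate {
    private boolean flushPending;

    synchronized boolean isFlushPending() {
        return flushPending;
    }

    // `triggersFlush` and `fail` simulate the two conditions in the report:
    // the document triggers a flush, and processing it throws.
    void addDocument(boolean triggersFlush, boolean fail) {
        boolean success = false;
        if (triggersFlush) {
            synchronized (this) {
                flushPending = true;
            }
        }
        try {
            if (fail) {
                throw new IllegalStateException("simulated tokenizer failure");
            }
            success = true;
        } finally {
            // Without this cleanup the flag would stay set forever and
            // any thread waiting for the flush would never be released.
            if (!success && triggersFlush) {
                synchronized (this) {
                    flushPending = false;
                    notifyAll();
                }
            }
        }
    }
}
```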




Re: [jira] Updated: (LUCENE-1198) Exception in DocumentsWriter.ThreadState.init leads to corruption

2008-03-11 Thread Michael McCandless


Thanks Hoss!

Mike

On Mar 11, 2008, at 3:28 PM, Hoss Man (JIRA) wrote:



 [ https://issues.apache.org/jira/browse/LUCENE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hoss Man updated LUCENE-1198:
-

Fix Version/s: 2.3.2

targeted for 2.3.2 bug fix release


Exception in DocumentsWriter.ThreadState.init leads to corruption
-

Key: LUCENE-1198
URL: https://issues.apache.org/jira/browse/LUCENE-1198

Project: Lucene - Java
 Issue Type: Bug
 Components: Index
   Affects Versions: 2.3
   Reporter: Michael McCandless
   Assignee: Michael McCandless
   Priority: Minor
Fix For: 2.3.2, 2.4

Attachments: LUCENE-1198.patch


If an exception is hit in the init method, DocumentsWriter incorrectly
increments numDocsInRAM when in fact the document is not added.
Spinoff of this thread:
  http://markmail.org/message/e76hgkgldxhakuaa
The root cause that led to the exception in init was actually due to
incorrect use of Lucene's APIs (one thread still modifying the
Document while IndexWriter.addDocument is adding it) but still we
should protect against any exceptions coming out of init.





[jira] Updated: (LUCENE-1198) Exception in DocumentsWriter.ThreadState.init leads to corruption

2008-03-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1198:
-

Fix Version/s: 2.3.2

targeted for 2.3.2 bug fix release

> Exception in DocumentsWriter.ThreadState.init leads to corruption
> -
>
> Key: LUCENE-1198
> URL: https://issues.apache.org/jira/browse/LUCENE-1198
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.3.2, 2.4
>
> Attachments: LUCENE-1198.patch
>
>
> If an exception is hit in the init method, DocumentsWriter incorrectly
> increments numDocsInRAM when in fact the document is not added.
> Spinoff of this thread:
>   http://markmail.org/message/e76hgkgldxhakuaa
> The root cause that led to the exception in init was actually due to
> incorrect use of Lucene's APIs (one thread still modifying the
> Document while IndexWriter.addDocument is adding it) but still we
> should protect against any exceptions coming out of init.
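
One way to picture the protection asked for above is to count a document only
after its initialization has succeeded. The class below is a hypothetical
sketch, not the LUCENE-1198 patch; DocCounterSketch and the Runnable standing in
for init are assumptions.

```java
// Hypothetical sketch; the real counter lives inside DocumentsWriter.
class DocCounterSketch {
    private int numDocsInRAM;

    // `init` stands in for the per-document initialization that may throw.
    void addDocument(Runnable init) {
        init.run();        // may throw; nothing has been counted yet
        numDocsInRAM++;    // reached only when init succeeded
    }

    int getNumDocsInRAM() {
        return numDocsInRAM;
    }
}
```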




[jira] Updated: (LUCENE-1197) IndexWriter can flush too early when flushing by RAM usage

2008-03-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1197:
-

Fix Version/s: 2.3.2

targeted for 2.3.2 bug fix release

> IndexWriter can flush too early when flushing by RAM usage
> --
>
> Key: LUCENE-1197
> URL: https://issues.apache.org/jira/browse/LUCENE-1197
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.3.2, 2.4
>
>
> There is a silly bug in how DocumentsWriter tracks its RAM usage:
> whenever term vectors are enabled, it incorrectly counts the space
> used by term vectors towards flushing, when in fact this space is
> recycled per document.
> This is not a functionality bug.  All it causes is flushes to happen
> too frequently, and, IndexWriter will use less RAM than you asked it
> to.  To work around it you can simply give it a bigger RAM buffer.
> I will commit a fix shortly.




[jira] Updated: (LUCENE-1191) If IndexWriter hits OutOfMemoryError it should not commit

2008-03-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1191:
-

Fix Version/s: 2.3.2

targeted for 2.3.2 bug fix release

> If IndexWriter hits OutOfMemoryError it should not commit
> -
>
> Key: LUCENE-1191
> URL: https://issues.apache.org/jira/browse/LUCENE-1191
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.3.2, 2.4
>
> Attachments: LUCENE-1191.patch
>
>
> While progress has been made making IndexWriter robust to OOME, I
> think there is still a real risk that an OOME at a bad time could put
> IndexWriter into a bad state such that if close() is called and
> somehow it succeeds without hitting another OOME, it risks
> messing up the index.
> I'd like to detect if OOME has been hit in any of the methods that
> alter IW's state, and if so, do not commit changes to the index.  If
> close is called after hitting OOME, I think writer should instead
> abort.
> Attached patch just adds try/catch clauses to catch OOME, note that
> it was hit, and re-throw it.  Then, sync() refuses to commit a new
> segments_N if OOME was hit, and close instead calls abort when OOME
> was hit.  All tests pass.  I plan to commit in a day or two.
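
The approach in the description (note that an OOME was hit, re-throw it, and
abort instead of committing on close) could look roughly like the sketch below.
OomAwareWriter and its counters are hypothetical and only model the control
flow; this is not IndexWriter code.

```java
// Hypothetical sketch of "remember the OOME, then refuse to commit".
class OomAwareWriter {
    private boolean hitOOM;
    private int pending;    // changes not yet committed
    private int committed;  // changes visible in the "index"

    void addDocument(boolean simulateOom) {
        try {
            if (simulateOom) {
                throw new OutOfMemoryError("simulated");
            }
            pending++;
        } catch (OutOfMemoryError oom) {
            hitOOM = true;  // note that it was hit
            throw oom;      // and re-throw it unchanged
        }
    }

    void close() {
        if (hitOOM) {
            pending = 0;            // abort: discard uncommitted changes
        } else {
            committed += pending;   // normal close: commit
            pending = 0;
        }
    }

    int committedCount() {
        return committed;
    }
}
```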




[jira] Updated: (LUCENE-1207) Allow spell check input to be part of the results

2008-03-11 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1207:
-

Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])
Fix Version/s: (was: 2.3.1)

this was not actually part of (the already released) 2.3.1 -- removing "Fix 
Version"

> Allow spell check input to be part of the results
> -
>
> Key: LUCENE-1207
> URL: https://issues.apache.org/jira/browse/LUCENE-1207
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Karl Wettin
>Priority: Trivial
> Attachments: canSuggestSelf.patch
>
>
> As a threshold marker, to see if the word seems to exist at all, or what 
> not.




[jira] Commented: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF

2008-03-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577563#action_12577563
 ] 

Yonik Seeley commented on LUCENE-1221:
--

If there is a real character that doesn't appear in a property name, it would 
be much safer to use that.
Using non-unicode chars or reserved chars is pretty dicey since you never know 
what methods might throw an exception because of it.

> DocumentsWriter truncates term text at \uFFFF
> -
>
> Key: LUCENE-1221
> URL: https://issues.apache.org/jira/browse/LUCENE-1221
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3, 2.3.1
>Reporter: Marcel Reutegger
>Priority: Minor
> Attachments: OddTermTest.java
>
>
> When a Term text contains the unicode 'character' \uFFFF, DocumentsWriter 
> will truncate the text and only write the text up to the \uFFFF character.
> This was introduced with the changes for LUCENE-843 to reduce memory usage 
> and improve performance.
> This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.




[jira] Commented: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF

2008-03-11 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577561#action_12577561
 ] 

Marcel Reutegger commented on LUCENE-1221:
--

I'll see if I can build some kind of filter index reader that translates 
existing terms on the fly to use a new separator, while new terms are written 
with the new separator.

> DocumentsWriter truncates term text at \uFFFF
> -
>
> Key: LUCENE-1221
> URL: https://issues.apache.org/jira/browse/LUCENE-1221
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3, 2.3.1
>Reporter: Marcel Reutegger
>Priority: Minor
> Attachments: OddTermTest.java
>
>
> When a Term text contains the unicode 'character' \uFFFF, DocumentsWriter 
> will truncate the text and only write the text up to the \uFFFF character.
> This was introduced with the changes for LUCENE-843 to reduce memory usage 
> and improve performance.
> This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.




[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577551#action_12577551
 ] 

Michael McCandless commented on LUCENE-1219:


Hmm ... one problem is Fieldable is an interface, and this patch adds methods 
to the interface, which I believe breaks our backwards compatibility 
requirement.

> support array/offset/ length setters for Field with binary data
> ---
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1219.patch, LUCENE-1219.patch
>
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte 
> arrays. This forces end users to create new objects and copy content into them 
> before passing them to Lucene, as such fields are often of variable size. 
> Depending on the use case, avoiding that copy can bring a far from negligible 
> performance improvement. 
> This approach extends the Fieldable interface with 3 new methods: 
> getOffset(), getLength(), and getBinaryValue() (which only returns a reference 
> to the array).
>




[jira] Commented: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577547#action_12577547
 ] 

Michael McCandless commented on LUCENE-1217:


Patch looks good.  I will commit shortly.  Thanks Eks Dev.

{quote}
Would it not make sense to maintain type with some integer/byte "poor man's 
enum" (Interface with a couple of constants)
{quote}

Or we could wait until Java 5 (3.0) and use real enums?

Or ... maybe we should have subclasses of Field (TextField, BinaryField,
ReaderField, TokenStreamField) which override the corresponding method
(and the base Field.java would still implement these methods but
return null)?  Though this would be a rather large change...
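
As an illustration of the "real enums" option mentioned above, here is a small
sketch under the assumption of a Java 5+ compiler. ValueType and TypedFieldValue
are hypothetical names and are not part of Lucene's Field API; the accessors
return null for non-matching types, mirroring the behaviour suggested for the
base Field class.

```java
// Hypothetical type tag replacing instanceof checks and the isBinary flag.
enum ValueType { STRING, BINARY, READER, TOKEN_STREAM }

// Hypothetical holder; a simplified stand-in for a field's stored value.
class TypedFieldValue {
    private final Object data;
    private final ValueType type;

    private TypedFieldValue(Object data, ValueType type) {
        this.data = data;
        this.type = type;
    }

    static TypedFieldValue ofString(String s) {
        return new TypedFieldValue(s, ValueType.STRING);
    }

    static TypedFieldValue ofBinary(byte[] b) {
        return new TypedFieldValue(b, ValueType.BINARY);
    }

    ValueType type() {
        return type;
    }

    // Each accessor consults the cached tag instead of using instanceof.
    String stringValue() {
        return type == ValueType.STRING ? (String) data : null;
    }

    byte[] binaryValue() {
        return type == ValueType.BINARY ? (byte[]) data : null;
    }
}
```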

> use isBinary cached variable instead of instanceof in Filed
> ---
>
> Key: LUCENE-1217
> URL: https://issues.apache.org/jira/browse/LUCENE-1217
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Trivial
> Attachments: LUCENE-1217.patch
>
>
> The Field class can hold three types of values; 
> see AbstractField.java: protected Object fieldsData = null; 
> Currently, mainly RTTI (instanceof) is used to determine the type of the 
> value stored in a particular instance of Field, but for binary values we 
> have a mix of RTTI and the cached variable "boolean isBinary". 
> This patch makes consistent use of the cached variable isBinary.
> Benefit: one consistent way to determine the run-time type for the binary 
> case (reduces the chance of getting out of sync with the cached variable). It 
> should be slightly faster as well.
> Thinking aloud: 
> Would it not make sense to maintain the type with some integer/byte "poor man's 
> enum" (an interface with a couple of constants)
> {code:java}
> public interface Type {
>   public static final byte BOOLEAN = 0;
>   public static final byte STRING = 1;
>   public static final byte READER = 2;
> 
> }
> {code}
> and use that instead of isBinary + instanceof? 




[jira] Commented: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF

2008-03-11 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577544#action_12577544
 ] 

Marcel Reutegger commented on LUCENE-1221:
--

> How/why are you seeing/using this character in Jackrabbit

To avoid an excessive amount of Lucene fields we prefix term values with the 
JCR property name and put everything under the same Lucene field name. The 
0xFFFF separates the property name from the property value.

See: JCR-106. That was before Lucene 2.1, when each field had a separate norm 
file.
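
The encoding described above (property name, a separator, then the value, all
under one Lucene field) can be sketched with a configurable separator. This is
not Jackrabbit's code: NamedValueCodec is a hypothetical helper, and the choice
of separator is left to the caller, in the spirit of the suggestion to use a
real character that cannot appear in a property name.

```java
// Hypothetical helper; illustrates the term layout, not Jackrabbit's classes.
class NamedValueCodec {
    private final char separator;

    NamedValueCodec(char separator) {
        this.separator = separator;
    }

    // Builds "<propertyName><separator><value>" as a single term text.
    String encode(String propertyName, String value) {
        if (propertyName.indexOf(separator) >= 0) {
            throw new IllegalArgumentException("separator occurs in property name");
        }
        return propertyName + separator + value;
    }

    String propertyName(String term) {
        return term.substring(0, split(term));
    }

    String value(String term) {
        return term.substring(split(term) + 1);
    }

    // The first separator ends the name; later ones belong to the value.
    private int split(String term) {
        int i = term.indexOf(separator);
        if (i < 0) {
            throw new IllegalArgumentException("no separator in term");
        }
        return i;
    }
}
```

Because only the first separator is significant, the value itself may contain
the separator character; only the property name must be free of it.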

> DocumentsWriter truncates term text at \uFFFF
> -
>
> Key: LUCENE-1221
> URL: https://issues.apache.org/jira/browse/LUCENE-1221
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3, 2.3.1
>Reporter: Marcel Reutegger
>Priority: Minor
> Attachments: OddTermTest.java
>
>
> When a Term text contains the unicode 'character' \uFFFF, DocumentsWriter 
> will truncate the text and only write the text up to the \uFFFF character.
> This was introduced with the changes for LUCENE-843 to reduce memory usage 
> and improve performance.
> This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.




[jira] Resolved: (LUCENE-1222) IndexWriter.doAfterFlush not being called when there are no deletions flushed

2008-03-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1222.


Resolution: Fixed

> IndexWriter.doAfterFlush not being called when there are no deletions flushed
> -
>
> Key: LUCENE-1222
> URL: https://issues.apache.org/jira/browse/LUCENE-1222
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3, 2.3.1
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
>
> It should be called when flushing either added docs or deletions.  The fix is 
> trivial.  I'll commit shortly to trunk & 2.3.2.
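
The intended behaviour, calling the hook when either added docs or deletions
were flushed, can be sketched as follows. FlushHookSketch and its counters are
hypothetical and only model the condition; the real method is
IndexWriter.doAfterFlush.

```java
// Hypothetical model of the flush hook condition; not IndexWriter itself.
class FlushHookSketch {
    int afterFlushCalls;

    void flush(int addedDocs, int deletions) {
        boolean flushedDocs = addedDocs > 0;
        boolean flushedDeletes = deletions > 0;
        // The reported bug corresponds to checking only flushedDeletes here.
        if (flushedDocs || flushedDeletes) {
            doAfterFlush();
        }
    }

    // Subclasses would override this to react to a completed flush.
    protected void doAfterFlush() {
        afterFlushCalls++;
    }
}
```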




[jira] Created: (LUCENE-1222) IndexWriter.doAfterFlush not being called when there are no deletions flushed

2008-03-11 Thread Michael McCandless (JIRA)
IndexWriter.doAfterFlush not being called when there are no deletions flushed
-

 Key: LUCENE-1222
 URL: https://issues.apache.org/jira/browse/LUCENE-1222
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3.1, 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor


It should be called when flushing either added docs or deletions.  The fix is 
trivial.  I'll commit shortly to trunk & 2.3.2.




[jira] Commented: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF

2008-03-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577513#action_12577513
 ] 

Michael McCandless commented on LUCENE-1221:


Hmmm ... 0xFFFF is one of the "invalid for interchange but may freely
be used internal to an implementation" UTF-16 characters (from
http://unicode.org/faq/utf_bom.html#6), so I assumed it was safe to
use internally in DocumentsWriter.

But apparently you are using it.  How/why are you seeing/using this
character in Jackrabbit?

Note that with LUCENE-510 (not yet fixed but in progress), there may
be similar issues whereby the treatment of other kinds of invalid
UTF-16 strings changes.



> DocumentsWriter truncates term text at \uFFFF
> -
>
> Key: LUCENE-1221
> URL: https://issues.apache.org/jira/browse/LUCENE-1221
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3, 2.3.1
>Reporter: Marcel Reutegger
>Priority: Minor
> Attachments: OddTermTest.java
>
>
> When a Term text contains the unicode 'character' \uFFFF, DocumentsWriter 
> will truncate the text and only write the text up to the \uFFFF character.
> This was introduced with the changes for LUCENE-843 to reduce memory usage 
> and improve performance.
> This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.




[jira] Updated: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF

2008-03-11 Thread Marcel Reutegger (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger updated LUCENE-1221:
-

Attachment: OddTermTest.java

Test to reproduce the issue.

> DocumentsWriter truncates term text at \uFFFF
> -
>
> Key: LUCENE-1221
> URL: https://issues.apache.org/jira/browse/LUCENE-1221
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.3, 2.3.1
>Reporter: Marcel Reutegger
>Priority: Minor
> Attachments: OddTermTest.java
>
>
> When a Term text contains the unicode 'character' \uFFFF, DocumentsWriter 
> will truncate the text and only write the text up to the \uFFFF character.
> This was introduced with the changes for LUCENE-843 to reduce memory usage 
> and improve performance.
> This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.




[jira] Created: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF

2008-03-11 Thread Marcel Reutegger (JIRA)
DocumentsWriter truncates term text at \uFFFF
-

 Key: LUCENE-1221
 URL: https://issues.apache.org/jira/browse/LUCENE-1221
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3.1, 2.3
Reporter: Marcel Reutegger
Priority: Minor


When a Term text contains the unicode 'character' \uFFFF, DocumentsWriter will 
truncate the text and only write the text up to the \uFFFF character.

This was introduced with the changes for LUCENE-843 to reduce memory usage and 
improve performance.

This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.




Re: Ideas to refactor Filed

2008-03-11 Thread eks dev
thanks, 
I get it now, matter of taste :) 

I would opt for,
#3 if you fix bugs from previous patch, decorate javadoc..., but you leave 
things mainly as they are
#2 is better to mark interface, approach change or something more substantial 


- Original Message 
From: Grant Ingersoll <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Tuesday, 11 March, 2008 4:47:16 PM
Subject: Re: Ideas to refactor Filed

I think, if you give it the same name, it just grays out the old  
ones.  See https://issues.apache.org/jira/browse/LUCENE-550 for an  
example..

Thus, I prefer #3, but am fine with #2 as well.  #3 makes it easier,  
IMO, to find the latest.

-Grant

On Mar 11, 2008, at 10:26 AM, Michael McCandless wrote:

>
> I like #2.
>
> I don't think we should delete/replace attachments in Jira.  The  
> history can be useful..
>
> Mike
>
> eks dev wrote:
>
>> Michael, others
>>
>> what is Lucene/Jira best practice for new versions of the same patch:
>>
>> 1. delete existing / add new patch with the same name
>> 2. add new patch with some funky version e.g. "Jira-1219-take3.patch"
>> 3. just add new patch with the same name
>>
>> ?
>>
>>
>>
>>
>>
>>
>>
>
>
>

--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ















[jira] Resolved: (LUCENE-1220) PDF search is not working

2008-03-11 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved LUCENE-1220.
-

Resolution: Invalid

Lucene knows nothing about PDFs.  It is up to your application to handle PDFs.  
See Tika or PDFBox or other tools for how to do that.

> PDF search is not working
> -
>
> Key: LUCENE-1220
> URL: https://issues.apache.org/jira/browse/LUCENE-1220
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Akshya kumar
>
> I uploaded a PDF file to my repository and tried a full-text search. It is not 
> able to search in PDF, MS PowerPoint, or HTML files, while it is able to search in 
> MS Word, text, and MS Excel files. Can you suggest a solution to get results?
> Following is my XPath query.
> String str = "Documentum";
> String sQuery = "//element(*,nt:unstructured)[jcr:contains(jcr:content,' " + 
> str + " ')]/rep:excerpt(.)";
>  Query q =qm.createQuery(sQuery, Query.XPATH);




Re: Ideas to refactor Filed

2008-03-11 Thread Grant Ingersoll
I think, if you give it the same name, it just grays out the old  
ones.  See https://issues.apache.org/jira/browse/LUCENE-550 for an  
example.


Thus, I prefer #3, but am fine with #2 as well.  #3 makes it easier,  
IMO, to find the latest.


-Grant

On Mar 11, 2008, at 10:26 AM, Michael McCandless wrote:



I like #2.

I don't think we should delete/replace attachments in Jira.  The  
history can be useful.


Mike

eks dev wrote:


Michael, others

what is Lucene/Jira best practice for new versions of the same patch:

1. delete existing / add new patch with the same name
2. add new patch with some funky version e.g. "Jira-1219-take3.patch"
3. just add new patch with the same name

?













--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









[jira] Created: (LUCENE-1220) PDF search is not working

2008-03-11 Thread Akshya kumar (JIRA)
PDF search is not working
-

 Key: LUCENE-1220
 URL: https://issues.apache.org/jira/browse/LUCENE-1220
 Project: Lucene - Java
  Issue Type: Test
Reporter: Akshya kumar


I uploaded a PDF file to my repository and tried a full-text search. It is not able 
to search in PDF, MS PowerPoint, or HTML files, while it is able to search in MS 
Word, text, and MS Excel files. Can you suggest a solution to get results?
Following is my XPath query.
String str = "Documentum";

String sQuery = "//element(*,nt:unstructured)[jcr:contains(jcr:content,' " + 
str + " ')]/rep:excerpt(.)";
 Query q =qm.createQuery(sQuery, Query.XPATH);




Re: Ideas to refactor Filed

2008-03-11 Thread Michael McCandless


I like #2.

I don't think we should delete/replace attachments in Jira.  The  
history can be useful.


Mike

eks dev wrote:


Michael, others

what is Lucene/Jira best practice for new versions of the same patch:

1. delete existing / add new patch with the same name
2. add new patch with some funky version e.g. "Jira-1219-take3.patch"
3. just add new patch with the same name

?













Re: Ideas to refactor Filed

2008-03-11 Thread eks dev
Michael, others

what is Lucene/Jira best practice for new versions of the same patch:

1. delete existing / add new patch with the same name
2. add new patch with some funky version e.g. "Jira-1219-take3.patch"
3. just add new patch with the same name

?









[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-11 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1219:


Attachment: LUCENE-1219.patch

Michael McCandless had some nice ideas on how to make the performance penalty 
of the getValue() change negligible for legacy usage; this patch includes them: 
- deprecates getValue() method 
- returns direct reference if offset==0 && length == data.length
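
The second point can be sketched as below. This is a hedged illustration, not
the attached patch: BinarySliceSketch is a hypothetical class, and the method
names binaryValue(), getBinaryValue(), getOffset() and getLength() are taken
from the discussion on the list rather than from a released Lucene API.

```java
// Hypothetical holder for a byte[] slice; not Lucene's Field class.
class BinarySliceSketch {
    private final byte[] data;
    private final int offset;
    private final int length;

    BinarySliceSketch(byte[] data, int offset, int length) {
        if (offset < 0 || length < 0 || offset > data.length - length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    // New-style accessors: no copy, caller must honour offset and length.
    byte[] getBinaryValue() { return data; }
    int getOffset() { return offset; }
    int getLength() { return length; }

    // Legacy accessor (the one proposed for deprecation): callers expect a
    // compact, zero-based array.
    byte[] binaryValue() {
        if (offset == 0 && length == data.length) {
            return data; // already compact: return the direct reference
        }
        byte[] copy = new byte[length];
        System.arraycopy(data, offset, copy, 0, length);
        return copy;
    }
}
```

Legacy callers that always pass whole arrays hit the first branch and pay only
for the two comparisons; a copy is made only when a true slice is stored.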

> support array/offset/ length setters for Field with binary data
> ---
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1219.patch, LUCENE-1219.patch
>
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte 
> arrays. This forces end users to create new objects and copy content into them 
> before passing them to Lucene, as such fields are often of variable size. 
> Depending on the use case, avoiding that copy can bring a far from negligible 
> performance improvement. 
> This approach extends the Fieldable interface with 3 new methods: 
> getOffset(), getLength(), and getBinaryValue() (which only returns a reference 
> to the array).
>




Re: Ideas to refactor Filed

2008-03-11 Thread eks dev
tip with extra checks is good, deprecate even better, I will update patch

- Original Message 
From: Michael McCandless <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Tuesday, 11 March, 2008 2:45:56 PM
Subject: Re: Ideas to refactor Filed


Hello!

Responses below:

eks dev wrote:

> Moin Moin Michael,
>
> for the first issue I have created LUCENE-1217, and for the second  
> one I have some questions.
>
> if we maintain length and offset internally in Field then we have  
> one, imo, theoretical "legacy performance problem" as we need to
> create new byte[length] and copy in order to preserve compatibility  
> (users expect this method to return compact array with 0 offset)
> I am talking about.
> public byte[] binaryValue();

Actually, if offset==0 and dataLength==array.length, can't we return  
the array itself?  This way legacy apps, which will pass both these  
checks, would see tiny (because of these added checks) performance  
loss?  Also, in a search setting, where doc was created from stored  
fields, I think both those checks would be true as well (unless  
FieldsReader is changed to share byte[] arrays between fields).

I think we should then deprecate binaryValue() in favor of  
getBinaryValue()?

> would that be acceptable, it is very small penalty and there will  
> be a way to avoid it? Anyhow, if one is using
> public void setValue(byte[] value), it is to be expected that this  
> user allready has a reference to value.  This makes this
> question rather theoretical, no?
>
> we could than create new methods,  getOffset() getLength()  
> getBinaryValue() that enable full spectrum and replace all uses  
> that expect 0-offset array.

Mike










Re: Ideas to refactor Filed

2008-03-11 Thread Michael McCandless


Hello!

Responses below:

eks dev wrote:


> Moin Moin Michael,
>
> for the first issue I have created LUCENE-1217, and for the second
> one I have some questions.
>
> If we maintain length and offset internally in Field, then we have
> one, imo theoretical, "legacy performance problem", as we need to
> create a new byte[length] and copy in order to preserve compatibility
> (users expect this method to return a compact array with 0 offset).
> I am talking about
> public byte[] binaryValue();

Actually, if offset==0 and dataLength==array.length, can't we return
the array itself?  This way legacy apps, which will pass both these
checks, would see only a tiny performance loss (because of these added
checks).  Also, in a search setting, where the doc was created from stored
fields, I think both those checks would be true as well (unless
FieldsReader is changed to share byte[] arrays between fields).

I think we should then deprecate binaryValue() in favor of
getBinaryValue()?

> Would that be acceptable? It is a very small penalty and there will
> be a way to avoid it. Anyhow, if one is using
> public void setValue(byte[] value), it is to be expected that this
> user already has a reference to value.  This makes this
> question rather theoretical, no?
>
> We could then create new methods, getOffset(), getLength(),
> getBinaryValue(), that enable the full spectrum, and replace all uses
> that expect a 0-offset array.


Mike




[jira] Assigned: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1217:
--

Assignee: Michael McCandless

> use isBinary cached variable instead of instanceof in Filed
> ---
>
> Key: LUCENE-1217
> URL: https://issues.apache.org/jira/browse/LUCENE-1217
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Trivial
> Attachments: LUCENE-1217.patch
>
>
> The Field class can hold three types of values.
> See: AbstractField.java  protected Object fieldsData = null;
> Currently, mainly RTTI (instanceof) is used to determine the type of the
> value stored in a particular instance of Field, but for binary values we
> have a mix of RTTI and the cached variable "boolean isBinary".
> This patch makes consistent use of the cached variable isBinary.
> Benefit: consistent usage of one method to determine the run-time type for
> the binary case (reduces the chance of getting out of sync with the cached
> variable). It should be slightly faster as well.
> Thinking aloud:
> Would it not make sense to maintain the type with some integer/byte "poor
> man's enum" (an interface with a couple of constants)
> {code:java}
> public static interface Type {
>   public static final byte BOOLEAN = 0;
>   public static final byte STRING = 1;
>   public static final byte READER = 2;
> }
> {code}
> and use that instead of isBinary + instanceof?




[jira] Assigned: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1219:
--

Assignee: Michael McCandless

> support array/offset/ length setters for Field with binary data
> ---
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1219.patch
>
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte
> arrays. This forces end users to create new objects and copy content into them
> before passing them to Lucene, as such fields are often of variable size.
> Depending on the use case, avoiding this copy can bring a far from negligible
> performance improvement.
> This approach extends the Fieldable interface with 3 new methods:
> getOffset(), getLength(), and getBinaryValue() (which only returns a reference
> to the array).
>




[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-11 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1219:


Attachment: LUCENE-1219.patch

> support array/offset/ length setters for Field with binary data
> ---
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1219.patch
>
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte
> arrays. This forces end users to create new objects and copy content into them
> before passing them to Lucene, as such fields are often of variable size.
> Depending on the use case, avoiding this copy can bring a far from negligible
> performance improvement.
> This approach extends the Fieldable interface with 3 new methods:
> getOffset(), getLength(), and getBinaryValue() (which only returns a reference
> to the array).
>




[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-11 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1219:


Attachment: (was: LUCENE-1219.patch)

> support array/offset/ length setters for Field with binary data
> ---
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte
> arrays. This forces end users to create new objects and copy content into them
> before passing them to Lucene, as such fields are often of variable size.
> Depending on the use case, avoiding this copy can bring a far from negligible
> performance improvement.
> This approach extends the Fieldable interface with 3 new methods:
> getOffset(), getLength(), and getBinaryValue() (which only returns a reference
> to the array).
>




[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-11 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1219:


Attachment: LUCENE-1219.patch

All tests pass with this patch.
Some polish is needed and probably more testing. TODOs:

- someone pedantic should check whether these new set/get methods should be
named better
- check if there are more places where this new feature could/should be used. I
think I have changed all of them but one, the direct subclass FieldForMerge
in FieldsReader; this is code I do not know, so I did not touch it...
- javadoc is poor

This should be enough to get us started.

The only "pseudo-issue" I see is that
public byte[] binaryValue(); now creates a byte[] and copies the content into it;
a reference to the original array can now be fetched via the getBinaryValue()
method. This is to preserve compatibility, as users expect a compact, zero-based
array from this method and we now keep offset/length in Field.
It is a "pseudo-issue" because users should already have a reference to this
array, so this method is rather superfluous for end users.
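
For readers following along, here is a minimal sketch of the accessor split described above. It is not the attached patch and not Lucene's Field class: BinarySlice and BinarySliceDemo are hypothetical names, and the shortcut in binaryValue() for an already compact array is the check suggested elsewhere in this thread, assumed here rather than confirmed to be in the patch.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a binary field value; not Lucene's Field class. */
final class BinarySlice {
    private final byte[] data;
    private final int offset;
    private final int length;

    BinarySlice(byte[] data, int offset, int length) {
        // Reject slices that do not fit inside the backing array.
        if (offset < 0 || length < 0 || offset > data.length - length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    /** New-style accessors: no copying, the caller must honor offset and length. */
    byte[] getBinaryValue() { return data; }
    int getOffset() { return offset; }
    int getLength() { return length; }

    /** Legacy accessor: always yields a compact, zero-based array. */
    byte[] binaryValue() {
        if (offset == 0 && length == data.length) {
            return data; // already compact, so no copy is needed
        }
        return Arrays.copyOfRange(data, offset, offset + length);
    }
}

class BinarySliceDemo {
    public static void main(String[] args) {
        byte[] buffer = {10, 20, 30, 40, 50};
        BinarySlice whole = new BinarySlice(buffer, 0, buffer.length);
        BinarySlice part = new BinarySlice(buffer, 1, 3);
        System.out.println(whole.binaryValue() == buffer);       // true
        System.out.println(Arrays.toString(part.binaryValue())); // [20, 30, 40]
    }
}
```

Legacy callers keep getting a compact array, while new callers can read the shared buffer through getBinaryValue(), getOffset() and getLength() without any allocation.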

 




> support array/offset/ length setters for Field with binary data
> ---
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Eks Dev
>Priority: Minor
> Attachments: LUCENE-1219.patch
>
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte
> arrays. This forces end users to create new objects and copy content into them
> before passing them to Lucene, as such fields are often of variable size.
> Depending on the use case, avoiding this copy can bring a far from negligible
> performance improvement.
> This approach extends the Fieldable interface with 3 new methods:
> getOffset(), getLength(), and getBinaryValue() (which only returns a reference
> to the array).
>




Re: [jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush

2008-03-11 Thread Michael McCandless


OK I've backported fixes for these issues to the 2.3 branch!

Mike

Michael Busch wrote:


> Michael McCandless (JIRA) wrote:
> > [ https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576941#action_12576941 ]
> >
> > Michael McCandless commented on LUCENE-1208:
> >
> > Agreed.  I'm thinking these issues should be ported to 2.3.2:
> >
> >   LUCENE-1191
> >   LUCENE-1197
> >   LUCENE-1198
> >   LUCENE-1199
> >   LUCENE-1200
> >   LUCENE-1208 (this issue)
> >   LUCENE-1210
>
> +1
>
> -Michael




[jira] Created: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-11 Thread Eks Dev (JIRA)
support array/offset/ length setters for Field with binary data
---

 Key: LUCENE-1219
 URL: https://issues.apache.org/jira/browse/LUCENE-1219
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Eks Dev
Priority: Minor


Currently the Field/Fieldable interface supports only compact, zero-based byte
arrays. This forces end users to create new objects and copy content into them
before passing them to Lucene, as such fields are often of variable size.
Depending on the use case, avoiding this copy can bring a far from negligible
performance improvement.

This approach extends the Fieldable interface with 3 new methods:
getOffset(), getLength(), and getBinaryValue() (which only returns a reference
to the array).

   




Re: Ideas to refactor Filed

2008-03-11 Thread eks dev
Moin Moin Michael,

for the first issue I have created LUCENE-1217, and for the second one I have
some questions.

If we maintain length and offset internally in Field, then we have one, imo
theoretical, "legacy performance problem", as we need to
create a new byte[length] and copy in order to preserve compatibility (users
expect this method to return a compact array with 0 offset).
I am talking about
public byte[] binaryValue();

Would that be acceptable? It is a very small penalty and there will be a way to
avoid it. Anyhow, if one is using
public void setValue(byte[] value), it is to be expected that this user
already has a reference to value.  This makes this
question rather theoretical, no?

We could then create new methods, getOffset(), getLength(), getBinaryValue(),
that enable the full spectrum, and replace all uses that expect a 0-offset array.





- Original Message 
From: Michael McCandless <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Wednesday, 5 March, 2008 10:09:26 AM
Subject: Re: Ideas to refactor Filed


Good morning!

eks dev wrote:

> I have noticed two potential enhancements in Field, and I am
> not sure if I read it correctly, so better to ask before creating a
> Jira issue :)
>
> 1. Field uses two methods to determine the type of fieldsData,
> sometimes with boolean isBinary and sometimes with instanceof byte[].
> The proposal is to reduce it to one method, either by removing
> isBinary and using instanceof byte[], or by replacing the one instanceof
> with isBinary. I do not know which one would be faster?

This makes sense.  Is this for the binaryValue() method?  I would
expect the explicit isBinary would be fastest.

> 2. The second enhancement would be to add the length of the char[]/byte[]
> to the setValue(...) methods, e.g.
> public void setValue(byte[] value, int length)  // maybe offset as well?
> This would enable users to save some allocations.

This also makes sense.  I think adding offset and length makes sense.

Mike










[jira] Updated: (LUCENE-1218) PassTokenizerFilter that pass text in a Token

2008-03-11 Thread Hiroaki Kawai (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroaki Kawai updated LUCENE-1218:
--

Attachment: PassTokenizer.java

> PassTokenizerFilter that pass text in a Token
> -
>
> Key: LUCENE-1218
> URL: https://issues.apache.org/jira/browse/LUCENE-1218
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Hiroaki Kawai
>Priority: Minor
> Attachments: PassTokenizer.java
>
>
> The PassTokenizer passes the whole text through as a TokenStream that has a
> single token in its stream.
> This Tokenizer (attached) is very useful for debugging. You can use it to test
> a TokenFilter that does sub-tokenization within a token. This Tokenizer is also
> useful for debugging a tokenization process that depends on the TokenFilter
> order.
> This Tokenizer is not suitable for processing a large text field.




[jira] Created: (LUCENE-1218) PassTokenizerFilter that pass text in a Token

2008-03-11 Thread Hiroaki Kawai (JIRA)
PassTokenizerFilter that pass text in a Token
-

 Key: LUCENE-1218
 URL: https://issues.apache.org/jira/browse/LUCENE-1218
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Analysis
Reporter: Hiroaki Kawai
Priority: Minor
 Attachments: PassTokenizer.java

The PassTokenizer passes the whole text through as a TokenStream that has a
single token in its stream.

This Tokenizer (attached) is very useful for debugging. You can use it to test a
TokenFilter that does sub-tokenization within a token. This Tokenizer is also
useful for debugging a tokenization process that depends on the TokenFilter order.

This Tokenizer is not suitable for processing a large text field.




[jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-11 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1217:


Attachment: LUCENE-1217.patch

> use isBinary cached variable instead of instanceof in Filed
> ---
>
> Key: LUCENE-1217
> URL: https://issues.apache.org/jira/browse/LUCENE-1217
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Other
>Reporter: Eks Dev
>Priority: Trivial
> Attachments: LUCENE-1217.patch
>
>
> The Field class can hold three types of values.
> See: AbstractField.java  protected Object fieldsData = null;
> Currently, mainly RTTI (instanceof) is used to determine the type of the
> value stored in a particular instance of Field, but for binary values we
> have a mix of RTTI and the cached variable "boolean isBinary".
> This patch makes consistent use of the cached variable isBinary.
> Benefit: consistent usage of one method to determine the run-time type for
> the binary case (reduces the chance of getting out of sync with the cached
> variable). It should be slightly faster as well.
> Thinking aloud:
> Would it not make sense to maintain the type with some integer/byte "poor
> man's enum" (an interface with a couple of constants)
> {code:java}
> public static interface Type {
>   public static final byte BOOLEAN = 0;
>   public static final byte STRING = 1;
>   public static final byte READER = 2;
> }
> {code}
> and use that instead of isBinary + instanceof?




[jira] Created: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-11 Thread Eks Dev (JIRA)
use isBinary cached variable instead of instanceof in Filed
---

 Key: LUCENE-1217
 URL: https://issues.apache.org/jira/browse/LUCENE-1217
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Eks Dev
Priority: Trivial


The Field class can hold three types of values.
See: AbstractField.java  protected Object fieldsData = null;

Currently, mainly RTTI (instanceof) is used to determine the type of the value
stored in a particular instance of Field, but for binary values we have a mix of
RTTI and the cached variable "boolean isBinary".

This patch makes consistent use of the cached variable isBinary.

Benefit: consistent usage of one method to determine the run-time type for the
binary case (reduces the chance of getting out of sync with the cached variable).
It should be slightly faster as well.

Thinking aloud:
Would it not make sense to maintain the type with some integer/byte "poor man's
enum" (an interface with a couple of constants)
{code:java}
public static interface Type {
  public static final byte BOOLEAN = 0;
  public static final byte STRING = 1;
  public static final byte READER = 2;
}
{code}

and use that instead of isBinary + instanceof?
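
To make the "thinking aloud" idea concrete, here is a rough sketch of a value holder that records its type in a byte tag at construction time and never uses instanceof afterwards. TaggedValue is a hypothetical class, not AbstractField, and the first constant is called BINARY here on the assumption that the binary case is what was meant.

```java
import java.io.Reader;

/** Hypothetical holder illustrating a type tag; not Lucene's AbstractField. */
final class TaggedValue {
    // "Poor man's enum": byte constants replace instanceof checks.
    static final byte BINARY = 0; // assumed name for the binary case
    static final byte STRING = 1;
    static final byte READER = 2;

    private final Object fieldsData;
    private final byte type;

    TaggedValue(byte[] value) { this.fieldsData = value; this.type = BINARY; }
    TaggedValue(String value) { this.fieldsData = value; this.type = STRING; }
    TaggedValue(Reader value) { this.fieldsData = value; this.type = READER; }

    boolean isBinary() { return type == BINARY; }

    // Each accessor consults the tag only; the cast is safe because the
    // tag is set together with the value in the constructor.
    byte[] binaryValue() { return type == BINARY ? (byte[]) fieldsData : null; }
    String stringValue() { return type == STRING ? (String) fieldsData : null; }
    Reader readerValue() { return type == READER ? (Reader) fieldsData : null; }
}
```

Because the tag and the value are assigned in the same place, there is a single source of truth for the run-time type, which is the consistency benefit the issue is after.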




[jira] Resolved: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter

2008-03-11 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-1213.
-

   Resolution: Fixed
Lucene Fields: [Patch Available]  (was: [New])

Committed, thanks Trejkaz!

> MultiFieldQueryParser ignores slop parameter
> 
>
> Key: LUCENE-1213
> URL: https://issues.apache.org/jira/browse/LUCENE-1213
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Reporter: Trejkaz
>Assignee: Doron Cohen
> Attachments: multifield-fix.patch, multifield-fix.patch
>
>
> MultiFieldQueryParser.getFieldQuery(String, String, int) calls 
> super.getFieldQuery(String, String), thus obliterating any slop parameter 
> present in the query.
> It should probably be changed to call super.getFieldQuery(String, String, 
> int), except doing only that will result in a recursive loop which is a 
> side-effect of what may be a deeper problem in MultiFieldQueryParser -- 
> getFieldQuery(String, String, int) is documented as delegating to 
> getFieldQuery(String, String), yet what it actually does is the exact 
> opposite.  This also causes problems for subclasses which need to override 
> getFieldQuery(String, String) to provide different behaviour.




[jira] Issue Comment Edited: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter

2008-03-11 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576880#action_12576880
 ] 

doronc edited comment on LUCENE-1213 at 3/11/08 1:00 AM:
--

Trejkaz thanks for the patch. 

Attached a slightly compacted fix (refactoring slop-applying to a separate 
method).
Also added a test that fails without this fix.

All tests pass, if there are no comments I will commit this in a day or two.

  was (Author: doronc):
Trekaj thanks for the patch. 

Attached a slightly compacted fix (refactoring slop-applying to a separate 
method).
Also added a test that fails without this fix.

All tests pass, if there are no comments I will commit this in a day or two.
  
> MultiFieldQueryParser ignores slop parameter
> 
>
> Key: LUCENE-1213
> URL: https://issues.apache.org/jira/browse/LUCENE-1213
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Reporter: Trejkaz
>Assignee: Doron Cohen
> Attachments: multifield-fix.patch, multifield-fix.patch
>
>
> MultiFieldQueryParser.getFieldQuery(String, String, int) calls 
> super.getFieldQuery(String, String), thus obliterating any slop parameter 
> present in the query.
> It should probably be changed to call super.getFieldQuery(String, String, 
> int), except doing only that will result in a recursive loop which is a 
> side-effect of what may be a deeper problem in MultiFieldQueryParser -- 
> getFieldQuery(String, String, int) is documented as delegating to 
> getFieldQuery(String, String), yet what it actually does is the exact 
> opposite.  This also causes problems for subclasses which need to override 
> getFieldQuery(String, String) to provide different behaviour.
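
The delegation problem described above can be illustrated with toy classes. This is only a sketch of the general pattern (apply the slop in one separate helper, and call the superclass two-argument method explicitly so the overloads cannot call each other in a loop). ToyQuery, ToyParser, ToyMultiFieldParser and applySlop are hypothetical names, not the Lucene classes and not the committed patch.

```java
/** Toy query type; stands in for a phrase query with a slop setting. */
final class ToyQuery {
    final String field;
    final String text;
    int slop;

    ToyQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

/** Toy base parser: the three-argument overload builds on the two-argument one. */
class ToyParser {
    ToyQuery getFieldQuery(String field, String text) {
        return new ToyQuery(field, text);
    }

    ToyQuery getFieldQuery(String field, String text, int slop) {
        ToyQuery q = getFieldQuery(field, text); // virtual call
        q.slop = slop;
        return q;
    }
}

/** Toy subclass: the two-argument form delegates to the three-argument form. */
class ToyMultiFieldParser extends ToyParser {
    @Override
    ToyQuery getFieldQuery(String field, String text) {
        return getFieldQuery(field, text, 0);
    }

    @Override
    ToyQuery getFieldQuery(String field, String text, int slop) {
        // super.getFieldQuery(field, text) is bound to ToyParser's version,
        // so it cannot bounce back into this class and recurse.
        ToyQuery q = super.getFieldQuery(field, text);
        applySlop(q, slop);
        return q;
    }

    /** Single place where the slop is applied, so it cannot be dropped. */
    private static void applySlop(ToyQuery q, int slop) {
        q.slop = slop;
    }
}
```

Calling super.getFieldQuery(field, text, slop) from the subclass instead would loop in this toy setup, because the base three-argument method makes a virtual call to the overridden two-argument method.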




[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-03-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577326#action_12577326
 ] 

Paul Elschot commented on LUCENE-584:
-

From the traceback I suppose this happened at the end, using the ChainedFilter?
Iirc ChainedFilter is from contrib/..., and it is mentioned at LUCENE-1187 as 
one of the things to be done.
Could you contribute this code as a contrib/... test case there?
Sorry, I don't remember exactly from which contrib module ChainedFilter is.

> Decouple Filter from BitSet
> ---
>
> Key: LUCENE-584
> URL: https://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.1
>Reporter: Peter Schäfer
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 2.4
>
> Attachments: bench-diff.txt, bench-diff.txt, CHANGES.txt.patch, 
> ContribQueries20080111.patch, lucene-584-take2.patch, 
> lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
> lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, 
> lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, 
> Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
> Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch
>
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable 
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.
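
As an illustration of the proposal, the sketch below pairs the AbstractBitSet interface from the description with two implementations: one that delegates to java.util.BitSet and one that stores only the visible document ids in a sorted int array. DenseBitSet and SortedIntBitSet are hypothetical names chosen for this example, not classes from any of the attached patches.

```java
import java.util.Arrays;
import java.util.BitSet;

/** The interface proposed in the issue description. */
interface AbstractBitSet {
    boolean get(int index);
}

/** Default implementation that simply delegates to java.util.BitSet. */
final class DenseBitSet implements AbstractBitSet {
    private final BitSet bits;

    DenseBitSet(BitSet bits) {
        this.bits = bits;
    }

    public boolean get(int index) {
        return bits.get(index);
    }
}

/** Hypothetical sparse implementation: keeps only the set ids, sorted. */
final class SortedIntBitSet implements AbstractBitSet {
    private final int[] sortedDocs;

    SortedIntBitSet(int[] docs) {
        sortedDocs = docs.clone(); // defensive copy, then sort for binary search
        Arrays.sort(sortedDocs);
    }

    public boolean get(int index) {
        // binarySearch returns a non-negative position only when the id is present
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }
}
```

With few visible documents in a very large index, the sorted array needs four bytes per visible document, whereas a BitSet has to span every index up to the highest set bit. Lookups cost a binary search instead of a single bit test, which is the trade-off being accepted.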
