date:20101217

MMapDirectory speedups
--

 Key: LUCENE-2816
 URL: https://issues.apache.org/jira/browse/LUCENE-2816
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 3.1, 4.0
Reporter: Robert Muir
Assignee: Robert Muir


MMapDirectory has some performance problems:
# When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
Instead, like MMapIndexInput, it should rely upon the contract of these 
operations
in ByteBuffer (which will do a bounds check always and throw 
BufferUnderflowException).
Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and 
doing
our own bounds checks just slows things down.
# the readInt()/readLong()/readShort() are slow and should just defer to 
ByteBuffer.readInt(), etc
This isn't very important since we don't much use these, but I think there's no 
reason
users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + 
get an 
IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2816) MMapDirectory speedups


 [ 
https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2816:


Attachment: LUCENE-2816.patch

Here's the most important benchmark: speeding up the MultiMMap's readByte(s) in 
general:

MultiMMapIndexInput readByte(s) improvements [trunk, Standard codec]
||Query||QPS trunk||QPS patch||Pct diff
|spanFirst(unit, 5)|12.72|12.85|{color:green}1.0%{color}|
|+nebraska +state|137.47|139.33|{color:green}1.3%{color}|
|spanNear([unit, state], 10, true)|2.90|2.94|{color:green}1.4%{color}|
|unit state|5.88|5.99|{color:green}1.8%{color}|
|unit~2.0|7.06|7.20|{color:green}2.0%{color}|
|+unit +state|8.68|8.87|{color:green}2.2%{color}|
|unit state|8.00|8.23|{color:green}2.9%{color}|
|unit~1.0|7.19|7.41|{color:green}3.0%{color}|
|unit*|22.66|23.41|{color:green}3.3%{color}|
|uni*|12.54|13.12|{color:green}4.6%{color}|
|united~1.0|10.61|11.12|{color:green}4.8%{color}|
|united~2.0|2.52|2.65|{color:green}5.1%{color}|
|state|28.72|30.23|{color:green}5.3%{color}|
|un*d|44.84|48.06|{color:green}7.2%{color}|
|u*d|13.17|14.51|{color:green}10.2%{color}|

In the bulk postings branch, I've been experimenting with various techniques 
for FOR/PFOR 
and one thing i tried was simply decoding with readInt() from the DataInput. So 
I adapted For/PFOR
to just take DataInput and work on it directly, instead of reading into a 
byte[], wrapping it with a ByteBuffer,
and working on an IntBuffer view.

But when I did this, i found that MMap was slow for readInt(), etc. So we 
implement these primitives
with ByteBuffer.readInt(). This isn't very important since lucene doesn't much 
use these, and mostly theoretical 
but I still think things like readInt(), readShort(), readLong() should be 
fast... for example just earlier today 
someone posted an alternative PFOR implementation on LUCENE-1410 that uses 
DataInput.readInt().

MMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput codec]
||Query||QPS branch||QPS patch||Pct diff
|spanFirst(unit, 5)|12.14|11.99|{color:red}-1.2%{color}|
|united~1.0|11.32|11.33|{color:green}0.1%{color}|
|united~2.0|2.51|2.56|{color:green}2.1%{color}|
|unit~1.0|6.98|7.19|{color:green}3.0%{color}|
|unit~2.0|6.88|7.11|{color:green}3.3%{color}|
|spanNear([unit, state], 10, true)|2.81|2.96|{color:green}5.2%{color}|
|unit state|8.04|8.59|{color:green}6.8%{color}|
|+unit +state|10.97|12.12|{color:green}10.5%{color}|
|unit*|26.67|29.80|{color:green}11.7%{color}|
|unit state|5.59|6.27|{color:green}12.3%{color}|
|uni*|15.10|17.51|{color:green}15.9%{color}|
|state|33.20|38.72|{color:green}16.6%{color}|
|+nebraska +state|59.17|71.45|{color:green}20.8%{color}|
|un*d|35.98|47.14|{color:green}31.0%{color}|
|u*d|9.48|12.46|{color:green}31.4%{color}|

Here's the same benchmark of DataInput.readInt() but with the 
MultiMMapIndexInput

MultiMMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput 
codec]
||Query||QPS branch||QPS patch||Pct diff
|united~2.0|2.43|2.54|{color:green}4.3%{color}|
|united~1.0|10.78|11.39|{color:green}5.7%{color}|
|unit~1.0|6.81|7.21|{color:green}5.8%{color}|
|unit~2.0|6.62|7.05|{color:green}6.5%{color}|
|spanNear([unit, state], 10, true)|2.77|2.96|{color:green}6.6%{color}|
|unit state|7.85|8.53|{color:green}8.7%{color}|
|spanFirst(unit, 5)|10.50|11.71|{color:green}11.5%{color}|
|+unit +state|10.26|11.94|{color:green}16.3%{color}|
|unit state|5.39|6.31|{color:green}17.0%{color}|
|state|31.95|39.17|{color:green}22.6%{color}|
|unit*|24.39|31.02|{color:green}27.2%{color}|
|+nebraska +state|54.68|71.98|{color:green}31.6%{color}|
|u*d|9.53|12.62|{color:green}32.5%{color}|
|uni*|13.72|18.23|{color:green}32.9%{color}|
|un*d|35.87|48.19|{color:green}34.3%{color}|

Just to be sure, I ran this last one on sparc64 (bigendian) also.

MultiMMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput 
codec]
||Query||QPS branch||QPS patch||Pct diff
|united~2.0|2.23|2.26|{color:green}1.5%{color}|
|unit~2.0|6.37|6.47|{color:green}1.6%{color}|
|united~1.0|11.33|11.59|{color:green}2.3%{color}|
|unit~1.0|9.68|10.05|{color:green}3.7%{color}|
|spanNear([unit, state], 10, true)|15.60|17.54|{color:green}12.5%{color}|
|unit*|127.14|144.08|{color:green}13.3%{color}|
|unit state|44.93|51.30|{color:green}14.2%{color}|
|spanFirst(unit, 5)|58.42|68.37|{color:green}17.0%{color}|
|uni*|56.66|67.53|{color:green}19.2%{color}|
|+nebraska +state|215.62|262.99|{color:green}22.0%{color}|
|+unit +state|63.18|77.86|{color:green}23.2%{color}|
|unit state|32.24|40.05|{color:green}24.2%{color}|
|u*d|29.13|36.69|{color:green}26.0%{color}|
|state|145.99|188.33|{color:green}29.0%{color}|
|un*d|65.27|87.20|{color:green}33.6%{color}|

I think some of these benchmarks also show that MultiMMapIndexInput might now be
essentially just as fast as MMapIndexInput... but lets not go there yet and 
keep them separate for now.


 MMapDirectory speedups

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

2010-12-17 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972398#action_12972398
 ] 

Simon Willnauer commented on LUCENE-2816:
-

Awesome results robert!! :)

 MMapDirectory speedups
 --

 Key: LUCENE-2816
 URL: https://issues.apache.org/jira/browse/LUCENE-2816
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 3.1, 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2816.patch


 MMapDirectory has some performance problems:
 # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
 which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
 Instead, like MMapIndexInput, it should rely upon the contract of these 
 operations
 in ByteBuffer (which will do a bounds check always and throw 
 BufferUnderflowException).
 Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and 
 doing
 our own bounds checks just slows things down.
 # the readInt()/readLong()/readShort() are slow and should just defer to 
 ByteBuffer.readInt(), etc
 This isn't very important since we don't much use these, but I think there's 
 no reason
 users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + 
 get an 
 IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: QParserPlugin as jar?

2010-12-17 Thread Jan Høydahl / Cominvent

That was how I expected it to work (and yes, I registered it in solrconfig.xml).
Wonder why it did not work when I tested it, I had to apply the patch and 
recompile. Guess I'll have to give it another try.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 15. des. 2010, at 19.19, Erik Hatcher wrote:

 
 On Dec 15, 2010, at 12:47 , Jan Høydahl / Cominvent wrote:
 
 Hi,
 
 I tried to package the edismax QParser (SOLR-1553) as a .jar file for 
 inclusion in an already installed solr1.4.1, and dropped my new jar in 
 SOLR_HOME/lib.
 However it failed with an exception. It suspect because the patch modifies 
 o.a.s.s.QParserPlugin, which is already existing on the classpath.
 
 Is there a way to dynamically initialize new plugins without statically 
 updating the QParserPlugin class?
 
 Yes, you can simply register it in solrconfig.xml:
 
  queryParser name=lucene 
 class=org.apache.solr.search.LuceneQParserPlugin/
 
 The QParserPlugin statically registered qparsers are just convenience so 
 those come built-in as registered (though can be overridden by registering 
 a different class with the same name).
 
   Erik
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Created: (LUCENE-2817) SimpleText has a bulk enum buffer reuse bug

SimpleText has a bulk enum buffer reuse bug
---

 Key: LUCENE-2817
 URL: https://issues.apache.org/jira/browse/LUCENE-2817
 Project: Lucene - Java
  Issue Type: Bug
  Components: Codecs
Affects Versions: Bulk Postings branch
Reporter: Robert Muir


testBulkPostingsBufferReuse fails with SimpleText codec.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2817) SimpleText has a bulk enum buffer reuse bug


[ 
https://issues.apache.org/jira/browse/LUCENE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972410#action_12972410
 ] 

Robert Muir commented on LUCENE-2817:
-

junit-sequential:
[junit] Testsuite: org.apache.lucene.index.TestCodecs
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.492 sec
[junit]
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestCodecs 
-Dtestmethod=testBulkPostingsBufferReuse 
-Dtests.seed=8878914233730236058:-5578535381307601353
[junit] NOTE: test params are: codec=RandomCodecProvider: 
{field=SimpleText}, locale=mk, timezone=America/Indiana/Tell_City
[junit] NOTE: all tests run in this JVM:
[junit] [TestCodecs]
[junit] -  ---
[junit] Testcase: 
testBulkPostingsBufferReuse(org.apache.lucene.index.TestCodecs):  FAILED
[junit] 
expected:org.apache.lucene.index.codecs.simpletext.simpletextfieldsreader$simpletextbulkpostingse...@1a42792
 
but 
was:org.apache.lucene.index.codecs.simpletext.simpletextfieldsreader$simpletextbulkpostingse...@2200d5
[junit] junit.framework.AssertionFailedError: 
expected:org.apache.lucene.index.codecs.simpletext.simpletextfieldsreader$simpletextbulkpostingse...@1a42792
 
but 
was:org.apache.lucene.index.codecs.simpletext.simpletextfieldsreader$simpletextbulkpostingse...@2200d5
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1043)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:981)
[junit] at 
org.apache.lucene.index.TestCodecs.testBulkPostingsBufferReuse(TestCodecs.java:671)
[junit]
[junit]
[junit] Test org.apache.lucene.index.TestCodecs FAILED


 SimpleText has a bulk enum buffer reuse bug
 ---

 Key: LUCENE-2817
 URL: https://issues.apache.org/jira/browse/LUCENE-2817
 Project: Lucene - Java
  Issue Type: Bug
  Components: Codecs
Affects Versions: Bulk Postings branch
Reporter: Robert Muir

 testBulkPostingsBufferReuse fails with SimpleText codec.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

2010-12-17 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972413#action_12972413
 ] 

Uwe Schindler commented on LUCENE-2816:
---

Awesome, Ro bert is changing to the MMap Performance Policeman!

I like the idea to simply delegate the methods and catch exception to fallback 
to manual read with boundary transition! I just wanted to be sure that the 
position pointer in the buffer does not partly go forward when you read request 
fails at a buffer boundary, but that seems to be the case.

 MMapDirectory speedups
 --

 Key: LUCENE-2816
 URL: https://issues.apache.org/jira/browse/LUCENE-2816
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 3.1, 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2816.patch


 MMapDirectory has some performance problems:
 # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
 which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
 Instead, like MMapIndexInput, it should rely upon the contract of these 
 operations
 in ByteBuffer (which will do a bounds check always and throw 
 BufferUnderflowException).
 Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and 
 doing
 our own bounds checks just slows things down.
 # the readInt()/readLong()/readShort() are slow and should just defer to 
 ByteBuffer.readInt(), etc
 This isn't very important since we don't much use these, but I think there's 
 no reason
 users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + 
 get an 
 IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

2010-12-17 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972416#action_12972416
 ] 

Uwe Schindler commented on LUCENE-2816:
---

One thing to add: When using readFloat  co, we should make sure that we set 
the endianness explicitely in the ctor. I just want to explicitely make sure 
that the endianness is correct and document it that it is big endian for Lucene.

 MMapDirectory speedups
 --

 Key: LUCENE-2816
 URL: https://issues.apache.org/jira/browse/LUCENE-2816
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 3.1, 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2816.patch


 MMapDirectory has some performance problems:
 # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
 which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
 Instead, like MMapIndexInput, it should rely upon the contract of these 
 operations
 in ByteBuffer (which will do a bounds check always and throw 
 BufferUnderflowException).
 Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and 
 doing
 our own bounds checks just slows things down.
 # the readInt()/readLong()/readShort() are slow and should just defer to 
 ByteBuffer.readInt(), etc
 This isn't very important since we don't much use these, but I think there's 
 no reason
 users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + 
 get an 
 IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-2816) MMapDirectory speedups

2010-12-17 Thread Uwe Schindler (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972416#action_12972416
]

Uwe Schindler edited comment on LUCENE-2816 at 12/17/10 4:43 AM:
-

-One thing to add: When using readFloat co, we should make sure that we set
the endianness explicitely in the ctor. I just want to explicitely make sure
that the endianness is correct and document it that it is big endian for
Lucene.-

We don't need that: The initial order of a byte buffer is always BIG_ENDIAN.

was (Author: thetaphi):
One thing to add: When using readFloat co, we should make sure that we
set the endianness explicitely in the ctor. I just want to explicitely make
sure that the endianness is correct and document it that it is big endian for
Lucene.

MMapDirectory speedups
--

Key: LUCENE-2816
URL: https://issues.apache.org/jira/browse/LUCENE-2816
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Affects Versions: 3.1, 4.0
Reporter: Robert Muir
Assignee: Robert Muir
Attachments: LUCENE-2816.patch

MMapDirectory has some performance problems:
# When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput,
which does a lot of unnecessary bounds-checks for its buffer-switching etc.
Instead, like MMapIndexInput, it should rely upon the contract of these
operations
in ByteBuffer (which will do a bounds check always and throw
BufferUnderflowException).
Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and
doing
our own bounds checks just slows things down.
# the readInt()/readLong()/readShort() are slow and should just defer to
ByteBuffer.readInt(), etc
This isn't very important since we don't much use these, but I think there's
no reason
users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer +
get an
IntBuffer view when readInt() can be almost as fast...

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

[
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972417#action_12972417
]

Michael McCandless commented on LUCENE-2814:

bq. Probably still a smaller change than flex indexing

Yes, true!

bq. But yeah in general I agree that we should do things more incrementally. I
think that's a mistake I've made with the RT branch so far.

Well, not a mistake... this was unavoidable given that trunk was so far from
what DWPT needs. But with per-seg deletes (LUCENE-2680) done, and no more doc
stores (this issue), I think we've got DWPT down to about as bite sized as it
can be (it's still gonna be big!).

I can help merge too! I think coordinating on IRC #lucene is a good idea? It
seems like LUCENE-2573 needs to be incorporated into IW's new FlushControl
class (which is already coordinating flush-due-to-deletes and
flush-due-to-added-docs-of-one-DWPT).

stop writing shared doc stores across segments
--

Key: LUCENE-2814
URL: https://issues.apache.org/jira/browse/LUCENE-2814
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 3.1, 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch

Shared doc stores enables the files for stored fields and term vectors to be
shared across multiple segments. We've had this optimization since 2.1 I
think.
It works best against a new index, where you open an IW, add lots of docs,
and then close it. In that case all of the written segments will reference
slices a single shared doc store segment.
This was a good optimization because it means we never need to merge these
files. But, when you open another IW on that index, it writes a new set of
doc stores, and then whenever merges take place across doc stores, they must
now be merged.
However, since we switched to shared doc stores, there have been two
optimizations for merging the stores. First, we now bulk-copy the bytes in
these files if the field name/number assignment is congruent. Second, we
now force congruent field name/number mapping in IndexWriter. This means
this optimization is much less potent than it used to be.
Furthermore, the optimization adds *a lot* of hair to
IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over
time, and causes odd behavior like a merge possibly forcing a flush when it
starts. Finally, with DWPT (LUCENE-2324), which gets us truly concurrent
flushing, we can no longer share doc stores.
So, I think we should turn off the write-side of shared doc stores to pave
the path for DWPT to land on trunk and simplify IW/DW. We still must support
reading them (until 5.0), but the read side is far less hairy.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

[
https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972418#action_12972418
]

Robert Muir commented on LUCENE-2816:
-

bq. I just wanted to be sure that the position pointer in the buffer does not
partly go forward when you read request fails at a buffer boundary, but that
seems to be the case.

Yes, this is guaranteed in the APIs, and also tested well by TestMultiMMap,
which uses random chunk sizes between 20 and 100 (including odd numbers etc)
Though we should enhance this test, i think it just retrieves documents at the
moment... probably better if it did some searches too.

MMapDirectory speedups
--

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups


[ 
https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972420#action_12972420
 ] 

Michael McCandless commented on LUCENE-2816:


Good grief!  What amazing gains, especially w/ PFor codec which of course makes 
super heavy use of .readInt().  Awesome Robert!

This will mean w/ the cutover to FORPFOR codec for 4.0, MMapDir will likely 
have a huge edge over NIOFSDir?

 MMapDirectory speedups
 --

 Key: LUCENE-2816
 URL: https://issues.apache.org/jira/browse/LUCENE-2816
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 3.1, 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2816.patch


 MMapDirectory has some performance problems:
 # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
 which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
 Instead, like MMapIndexInput, it should rely upon the contract of these 
 operations
 in ByteBuffer (which will do a bounds check always and throw 
 BufferUnderflowException).
 Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and 
 doing
 our own bounds checks just slows things down.
 # the readInt()/readLong()/readShort() are slow and should just defer to 
 ByteBuffer.readInt(), etc
 This isn't very important since we don't much use these, but I think there's 
 no reason
 users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + 
 get an 
 IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2817) SimpleText has a bulk enum buffer reuse bug


[ 
https://issues.apache.org/jira/browse/LUCENE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972421#action_12972421
 ] 

Michael McCandless commented on LUCENE-2817:


Duh, silliness.  I just added this test (to assert BulkPostingsEnum/buffer 
reuse) but SimpleText never re-uses.  I'll add an Assume.

 SimpleText has a bulk enum buffer reuse bug
 ---

 Key: LUCENE-2817
 URL: https://issues.apache.org/jira/browse/LUCENE-2817
 Project: Lucene - Java
  Issue Type: Bug
  Components: Codecs
Affects Versions: Bulk Postings branch
Reporter: Robert Muir

 testBulkPostingsBufferReuse fails with SimpleText codec.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups


[ 
https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972422#action_12972422
 ] 

Robert Muir commented on LUCENE-2816:
-

{quote}
Good grief! What amazing gains, especially w/ PFor codec which of course makes 
super heavy use of .readInt(). Awesome Robert!
This will mean w/ the cutover to FORPFOR codec for 4.0, MMapDir will likely 
have a huge edge over NIOFSDir?
{quote}

This isn't really a 'gain' for the bulkpostings branch?
This is just making DataInput.readInt() faster.
Currently the bulkpostings branch uses readByte(byte[]), then wraps into a 
ByteBuffer and processes an IntBuffer view of that.
I switched to just using readInt() from DataInputDirectly [FrameOfRefDataInput] 
and found it to be much slower than this IntBuffer method.

this whole benchmark is just benching DataInput.readInt()...

So, we shouldn't change anything in bulkpostings, this isn't faster than the 
intbuffer method in my tests, at best its equivalent... but we should fix this 
slowdown in our APIs.


 MMapDirectory speedups
 --

 Key: LUCENE-2816
 URL: https://issues.apache.org/jira/browse/LUCENE-2816
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 3.1, 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2816.patch


 MMapDirectory has some performance problems:
 # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
 which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
 Instead, like MMapIndexInput, it should rely upon the contract of these 
 operations
 in ByteBuffer (which will do a bounds check always and throw 
 BufferUnderflowException).
 Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and 
 doing
 our own bounds checks just slows things down.
 # the readInt()/readLong()/readShort() are slow and should just defer to 
 ByteBuffer.readInt(), etc
 This isn't very important since we don't much use these, but I think there's 
 no reason
 users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + 
 get an 
 IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (LUCENE-2817) SimpleText has a bulk enum buffer reuse bug


 [ 
https://issues.apache.org/jira/browse/LUCENE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2817.


   Resolution: Fixed
Fix Version/s: Bulk Postings branch

Fixed.  In fact SimpleText does try to reuse and in fact it was buggy!  I fixed 
it.

 SimpleText has a bulk enum buffer reuse bug
 ---

 Key: LUCENE-2817
 URL: https://issues.apache.org/jira/browse/LUCENE-2817
 Project: Lucene - Java
  Issue Type: Bug
  Components: Codecs
Affects Versions: Bulk Postings branch
Reporter: Robert Muir
 Fix For: Bulk Postings branch


 testBulkPostingsBufferReuse fails with SimpleText codec.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

[
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Michael McCandless resolved LUCENE-2680.

Resolution: Fixed

Improve how IndexWriter flushes deletes against existing segments
-

Key: LUCENE-2680
URL: https://issues.apache.org/jira/browse/LUCENE-2680
Project: Lucene - Java
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 3.1, 4.0

Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
LUCENE-2680.patch, LUCENE-2680.patch

IndexWriter buffers up all deletes (by Term and Query) and only
applies them if 1) commit or NRT getReader() is called, or 2) a merge
is about to kickoff.
We do this because, for a large index, it's very costly to open a
SegmentReader for every segment in the index. So we defer as long as
we can. We do it just before merge so that the merge can eliminate
the deleted docs.
But, most merges are small, yet in a big index we apply deletes to all
of the segments, which is really very wasteful.
Instead, we should only apply the buffered deletes to the segments
that are about to be merged, and keep the buffer around for the
remaining segments.
I think it's not so hard to do; we'd have to have generations of
pending deletions, because the newly merged segment doesn't need the
same buffered deletions applied again. So every time a merge kicks
off, we pinch off the current set of buffered deletions, open a new
set (the next generation), and record which segment was created as of
which generation.
This should be a very sizable gain for large indices that mix
deletes, though, less so in flex since opening the terms index is much
faster.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Lucene-Solr-tests-only-3.x - Build # 2623 - Failure

Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/2623/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestAddIndexes.testAddIndexesWithRollback

Error Message:
ConcurrentMergeScheduler hit unhandled exceptions

Stack Trace:
junit.framework.AssertionFailedError: ConcurrentMergeScheduler hit unhandled 
exceptions
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:375)




Build Log (for compile errors):
[...truncated 4528 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Lucene-Solr-tests-only-trunk - Build # 2651 - Failure

Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2651/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

Error Message:
CheckIndex failed

Stack Trace:
java.lang.RuntimeException: CheckIndex failed
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:87)
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1048)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:986)




Build Log (for compile errors):
[...truncated 3301 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2815) MultiFields not thread safe

2010-12-17 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972505#action_12972505
 ] 

Yonik Seeley commented on LUCENE-2815:
--

Hmmm, this patch causes test failures because ConcurrentHashMap doesn't handle 
null values.


 MultiFields not thread safe
 ---

 Key: LUCENE-2815
 URL: https://issues.apache.org/jira/browse/LUCENE-2815
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: LUCENE-2815.patch


 MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2815) MultiFields not thread safe

2010-12-17 Thread Yonik Seeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2815:
-

Attachment: LUCENE-2815.patch

Here's an updated patch that avoids containsKey() followed by get() (just an 
optimization)
and avoids caching null Terms instances.  This is the right thing to do anyway, 
since one could easily blow up the cache with fields that don't exist.

 MultiFields not thread safe
 ---

 Key: LUCENE-2815
 URL: https://issues.apache.org/jira/browse/LUCENE-2815
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: LUCENE-2815.patch, LUCENE-2815.patch


 MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2723) Speed up Lucene's low level bulk postings read API

[
https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir updated LUCENE-2723:

Attachment: LUCENE-2723_wastedint.patch

patch with more refactoring of For/Pfor decompression:
* The decompressors take DataInput, but still use the IntBuffer technique for
now.
* I removed the wasted int-per-block in For.

Speed up Lucene's low level bulk postings read API
--

Key: LUCENE-2723
URL: https://issues.apache.org/jira/browse/LUCENE-2723
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0

Attachments: LUCENE-2723-termscorer.patch,
LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch,
LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch,
LUCENE-2723.patch, LUCENE-2723_termscorer.patch, LUCENE-2723_wastedint.patch

Spinoff from LUCENE-1410.
The flex DocsEnum has a simple bulk-read API that reads the next chunk
of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR
(from LUCENE-1410). This is not unlike sucking coffee through those
tiny plastic coffee stirrers they hand out airplanes that,
surprisingly, also happen to function as a straw.
As a result we see no perf gain from using FOR/PFOR.
I had hacked up a fix for this, described at in my blog post at
http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
I'm opening this issue to get that work to a committable point.
So... I've worked out a new bulk-read API to address performance
bottleneck. It has some big changes over the current bulk-read API:
* You can now also bulk-read positions (but not payloads), but, I
have yet to cutover positional queries.
* The buffer contains doc deltas, not absolute values, for docIDs
and positions (freqs are absolute).
* Deleted docs are not filtered out.
* The doc freq buffers need not be aligned. For fixed intblock
codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
Group varint, etc.) they won't be.
It's still a work in progress...

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (LUCENE-2815) MultiFields not thread safe

2010-12-17 Thread Yonik Seeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved LUCENE-2815.
--

Resolution: Fixed

committed.

 MultiFields not thread safe
 ---

 Key: LUCENE-2815
 URL: https://issues.apache.org/jira/browse/LUCENE-2815
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: LUCENE-2815.patch, LUCENE-2815.patch


 MultiFields looks like it has thread safety issues

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-17 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972531#action_12972531
]

Jason Rutherglen commented on LUCENE-2814:
--

bq. I think we've got DWPT down to about as bite sized as it can be (it's still
gonna be big!)

Indeed!

bq. I think coordinating on IRC #lucene is a good idea?

It'd be nice if there were a log of IRC #lucene, otherwise I prefer Jira.

bq. It seems like LUCENE-2573 needs to be incorporated into IW's new
FlushControl class

Right, into the DWPT branch.

stop writing shared doc stores across segments
--

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: Cannot Escape Special charectors Search with Lucene.Net 2.0

2010-12-17 Thread Granroth, Neal V.


Robert's correct the StandardAnalyzer will split the input text at the  
characters so your index will not contain them.  As in this simple example:

StandardAnalyzer aa = new StandardAnalyzer();

System.IO.StringReader srs = new System.IO.StringReader(aaa bbb testtest ccc 
ddd);

Lucene.Net.Analysis.TokenStream ts = aa.TokenStream(srs);

Lucene.Net.Analysis.Token tk;
while( (tk = ts.Next()) != null )
{
   System.Console.WriteLine(String.Format(Token: \{0}\: S:{1}, E:{2},
  tk.TermText(),tk.StartOffset(),tk.EndOffset()));
}

The output looks like this:
Token: aaa: S:0, E:3
Token: bbb: S:4, E:7
Token: test: S:8, E:12
Token: test: S:14, E:18
Token: ccc: S:19, E:22
Token: ddd: S:23, E:26

You can see that the  characters were identified as separators and two 
test tokens were emitted not the single testtest you expected.


- Neal

-Original Message-
From: Robert Jordan [mailto:robe...@gmx.net] 
Sent: Friday, December 17, 2010 6:25 AM
To: lucene-net-...@incubator.apache.org
Subject: Re: Cannot Escape Special charectors Search with Lucene.Net 2.0

On 17.12.2010 12:29, abhilash ramachandran wrote:
 q = new global::Lucene.Net.QueryParsers.QueryParser(content, new
 StandardAnalyzer()).Parse(query);

I believe the issue has nothing to do with your query
syntax. StandardAnalyzer is skipping chars like  during
the indexing process, so you can't search for them.

Robert

WARNING: re-index all trunk indices!

2010-12-17 Thread Michael McCandless

If you are using Lucene's trunk (nightly build) release, read on...

I just committed a change (for LUCENE-2811) that changes the index
format on trunk, thus breaking (w/ likely strange exceptions on
reading the segments_N file) any trunk indices created in the past
week or so.

Mike

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Resolved: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors


 [ 
https://issues.apache.org/jira/browse/LUCENE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2811.


Resolution: Fixed

 SegmentInfo should explicitly track whether that segment wrote term vectors
 ---

 Key: LUCENE-2811
 URL: https://issues.apache.org/jira/browse/LUCENE-2811
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2811.patch


 Today SegmentInfo doesn't know if it has vectors, which means its files() 
 method must check if the files exist.
 This leads to subtle bugs, because Si.files() caches the files but then we 
 fail to invalidate that later when the term vectors files are created.
 It also leads to sloppy code, eg TermVectorsReader gracefully handles being 
 opened when the files do not exist.  I don't like that; it should only be 
 opened if they exist.
 This also fixes these intermittent failures we've been seeing:
 {noformat}
 junit.framework.AssertionFailedError: IndexFileDeleter doesn't know about 
 file _1e.tvx
at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)
at 
 org.apache.lucene.index.IndexWriter.filesExist(IndexWriter.java:3633)
at 
 org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3699)
at 
 org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2407)
at 
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2478)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2460)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2444)
at 
 org.apache.lucene.index.TestIndexWriterExceptions.testRandomExceptionsThreads(TestIndexWriterExceptions.java:213)
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

[
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972538#action_12972538
]

Steven Rowe commented on LUCENE-2814:
-

{quote}
bq. I think coordinating on IRC #lucene is a good idea?

It'd be nice if there were a log of IRC #lucene, otherwise I prefer Jira.
{quote}

#lucene-dev is logged.

stop writing shared doc stores across segments
--

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: WARNING: re-index all trunk indices!

2010-12-17 Thread Yonik Seeley

On Fri, Dec 17, 2010 at 11:18 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 If you are using Lucene's trunk (nightly build) release, read on...

 I just committed a change (for LUCENE-2811) that changes the index
 format on trunk, thus breaking (w/ likely strange exceptions on
 reading the segments_N file) any trunk indices created in the past
 week or so.

For reference, the exception I got trying to start Solr with an older
index on Windows is below.

-Yonik
http://www.lucidimagination.com


SEVERE: java.lang.RuntimeException: java.io.IOException: read past EOF
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1095)
at org.apache.solr.core.SolrCore.init(SolrCore.java:587)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:660)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:412)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
at org.mortbay.jetty.Server.doStart(Server.java:224)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:194)
at org.mortbay.start.Main.start(Main.java:534)
at org.mortbay.start.Main.start(Main.java:441)
at org.mortbay.start.Main.main(Main.java:119)
Caused by: java.io.IOException: read past EOF
at 
org.apache.lucene.store.MMapDirectory$MMapIndexInput.readBytes(MMapDirectory.java:242)
at 
org.apache.lucene.store.ChecksumIndexInput.readBytes(ChecksumIndexInput.java:48)
at org.apache.lucene.store.DataInput.readString(DataInput.java:121)
at 
org.apache.lucene.store.DataInput.readStringStringMap(DataInput.java:148)
at org.apache.lucene.index.SegmentInfo.init(SegmentInfo.java:192)
at 
org.apache.lucene.index.codecs.DefaultSegmentInfosReader.read(DefaultSegmentInfosReader.java:57)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:220)
at 
org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:90)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:623)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:86)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:437)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
at 
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1084)
... 31 more

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

[
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972543#action_12972543
]

Steven Rowe commented on LUCENE-2814:
-

bq. Steven, is that on a wiki page?

bq. The usage seems a little slim?
http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2010-12-17;raw=on

Yeah, it's very rarely used.

Several Lucene people who use #lucene are strongly against logging, so I set up
#lucene-dev as a place to host on-the-record Lucene conversations. I mentioned
it because this is what you want.

stop writing shared doc stores across segments
--

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-2814) stop writing shared doc stores across segments

[
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972543#action_12972543
]

Steven Rowe edited comment on LUCENE-2814 at 12/17/10 11:49 AM:

bq. Steven, is that on a wiki page?

I don't know, I never put it anywhere, just discussed on d...@l.a.o. Feel free
to do so if you like.

bq. The usage seems a little slim?
http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2010-12-17;raw=on

Yeah, it's very rarely used.

Several Lucene people who use #lucene are strongly against logging, so I set up
#lucene-dev as a place to host on-the-record Lucene conversations. I mentioned
it because this is what you want.

was (Author: steve_rowe):
bq. Steven, is that on a wiki page?

bq. The usage seems a little slim?
http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2010-12-17;raw=on

Yeah, it's very rarely used.

Several Lucene people who use #lucene are strongly against logging, so I set up
#lucene-dev as a place to host on-the-record Lucene conversations. I mentioned
it because this is what you want.

stop writing shared doc stores across segments
--

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Created: (SOLR-2289) The example documents have the same lat/lon for the store field for several stores. Space them out.

2010-12-17 Thread Erick Erickson (JIRA)

The example documents have the same lat/lon for the store field for several 
stores. Space them out.
-

 Key: SOLR-2289
 URL: https://issues.apache.org/jira/browse/SOLR-2289
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: Next
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Trivial


Half-dozen or so of the documents in the exampledocs directory all have the 
same location, which makes it a bit confusing when playing with geospatial, at 
least I scratched my head wondering whether it was working. Note that this 
another reason to include the distance in the returned doc :).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: Cannot Escape Special charectors Search with Lucene.Net 2.0

2010-12-17 Thread Digy

 N.G -- You can see that the  characters were identified as separators
and two test tokens were emitted not the single testtest you expected.

 A.R -- The scenario is if I try search a text TestTest

But the query TestTest will also be parsed as test test by
StandardAnalyzer. Since there are 2 sucessive tests in the index, there
must be a hit.

DIGY


-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] 
Sent: Friday, December 17, 2010 6:06 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: Cannot Escape Special charectors Search with Lucene.Net 2.0


Robert's correct the StandardAnalyzer will split the input text at the 
characters so your index will not contain them.  As in this simple example:

StandardAnalyzer aa = new StandardAnalyzer();

System.IO.StringReader srs = new System.IO.StringReader(aaa bbb testtest
ccc ddd);

Lucene.Net.Analysis.TokenStream ts = aa.TokenStream(srs);

Lucene.Net.Analysis.Token tk;
while( (tk = ts.Next()) != null )
{
   System.Console.WriteLine(String.Format(Token: \{0}\: S:{1}, E:{2},
  tk.TermText(),tk.StartOffset(),tk.EndOffset()));
}

The output looks like this:
Token: aaa: S:0, E:3
Token: bbb: S:4, E:7
Token: test: S:8, E:12
Token: test: S:14, E:18
Token: ccc: S:19, E:22
Token: ddd: S:23, E:26

You can see that the  characters were identified as separators and two
test tokens were emitted not the single testtest you expected.


- Neal

-Original Message-
From: Robert Jordan [mailto:robe...@gmx.net] 
Sent: Friday, December 17, 2010 6:25 AM
To: lucene-net-...@incubator.apache.org
Subject: Re: Cannot Escape Special charectors Search with Lucene.Net 2.0

On 17.12.2010 12:29, abhilash ramachandran wrote:
 q = new global::Lucene.Net.QueryParsers.QueryParser(content, new
 StandardAnalyzer()).Parse(query);

I believe the issue has nothing to do with your query
syntax. StandardAnalyzer is skipping chars like  during
the indexing process, so you can't search for them.

Robert

[jira] Updated: (SOLR-2289) The example documents have the same lat/lon for the store field for several stores. Space them out.

2010-12-17 Thread Erick Erickson (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-2289:
-

Attachment: SOLR-2289.patch

Patch attached, moves the stores that used to be identical NW along 55. Some 
are in farm fields, but what the heck

Patch made with Tortoise SVN rather than the usual IntelliJ, but the format 
looks OK (tm).

Anybody want to pick it up and commit?



 The example documents have the same lat/lon for the store field for several 
 stores. Space them out.
 -

 Key: SOLR-2289
 URL: https://issues.apache.org/jira/browse/SOLR-2289
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: Next
 Environment: All
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Trivial
 Attachments: SOLR-2289.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Half-dozen or so of the documents in the exampledocs directory all have the 
 same location, which makes it a bit confusing when playing with geospatial, 
 at least I scratched my head wondering whether it was working. Note that this 
 another reason to include the distance in the returned doc :).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Cannot Escape Special charectors Search with Lucene.Net 2.0

2010-12-17 Thread Robert Jordan


On 17.12.2010 17:59, Digy wrote:

N.G --  You can see that the  characters were identified as separators

and two test tokens were emitted not the single testtest you expected.


A.R --  The scenario is if I try search a text TestTest


But the query TestTest will also be parsed as test test by
StandardAnalyzer. Since there are 2 sucessive tests in the index, there
must be a hit.


Or he doesn't use the same analyzer for indexing and searching.

Robert




DIGY


-Original Message-
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
Sent: Friday, December 17, 2010 6:06 PM
To: lucene-net-...@lucene.apache.org
Subject: RE: Cannot Escape Special charectors Search with Lucene.Net 2.0


Robert's correct the StandardAnalyzer will split the input text at the 
characters so your index will not contain them.  As in this simple example:

StandardAnalyzer aa = new StandardAnalyzer();

System.IO.StringReader srs = new System.IO.StringReader(aaa bbb testtest
ccc ddd);

Lucene.Net.Analysis.TokenStream ts = aa.TokenStream(srs);

Lucene.Net.Analysis.Token tk;
while( (tk = ts.Next()) != null )
{
System.Console.WriteLine(String.Format(Token: \{0}\: S:{1}, E:{2},
   tk.TermText(),tk.StartOffset(),tk.EndOffset()));
}

The output looks like this:
Token: aaa: S:0, E:3
Token: bbb: S:4, E:7
Token: test: S:8, E:12
Token: test: S:14, E:18
Token: ccc: S:19, E:22
Token: ddd: S:23, E:26

You can see that the  characters were identified as separators and two
test tokens were emitted not the single testtest you expected.


- Neal

-Original Message-
From: Robert Jordan [mailto:robe...@gmx.net]
Sent: Friday, December 17, 2010 6:25 AM
To: lucene-net-...@incubator.apache.org
Subject: Re: Cannot Escape Special charectors Search with Lucene.Net 2.0

On 17.12.2010 12:29, abhilash ramachandran wrote:

q = new global::Lucene.Net.QueryParsers.QueryParser(content, new
StandardAnalyzer()).Parse(query);


I believe the issue has nothing to do with your query
syntax. StandardAnalyzer is skipping chars like  during
the indexing process, so you can't search for them.

Robert

RE: Is it possible to set the merge policy setMaxMergeMB from Solr

2010-12-17 Thread Burton-West, Tom

I'm a bit confused.  

There are some examples in the JIRA issue for Solr 1447, but I can't tell from 
reading it what the final allowed syntax is.

I see 

!--mergePolicy class=org.apache.lucene.index.LogByteSizeMergePolicy--
  !--double name=maxMergeMB64.0/double--
!--/mergePolicy--
in the JIRA issue and in what I think is the test case config file:
http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/src/test/test-files/solr/conf/solrconfig-propinject.xml?view=log

Lance's example is 

mergePolicyorg.apache.lucene.index.LogByteSizeMergePolicy
maxMergeMB1024/maxMergeMB
/mergePolicy

Which one is correct?

Tom

-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: Tuesday, December 07, 2010 10:48 AM
To: dev@lucene.apache.org
Subject: Re: Is it possible to set the merge policy setMaxMergeMB from Solr

SOLR-1447 added this functionality.

On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom tburt...@umich.edu wrote:
 Lucene has this method to set the maximum size of a segment when merging:
 LogByteSizeMergePolicy.setMaxMergeMB
 (http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html#setMaxMergeMB%28double%29
 )

 I would like to be able to set this in my solrconfig.xml.  Is this
 possible?  If not should I open a JIRA issue or is there some gotcha I am
 unaware of?

 Tom

 Tom Burton-West


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2010-12-17 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972586#action_12972586
 ] 

Simon Willnauer commented on LUCENE-2694:
-

FYI - there is a coding error in the latest patch that causes the TermState to 
be ignored - TermWeight uses the wrong reference to the PerReaderTermState. I 
will upload a new patch later this weekend

simon

 MTQ rewrite + weight/scorer init should be single pass
 --

 Key: LUCENE-2694
 URL: https://issues.apache.org/jira/browse/LUCENE-2694
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch


 Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
 Once we fix MTQ rewrite to be per-segment, we should take it further and make 
 weight/scorer init also run in the same single pass as rewrite.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2010-12-17 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2694:


Attachment: LUCENE-2694.patch

here is a new patch. I removed the hacky TermWeight part to make only MTQ 
single pass for now. Other TermQueries will hit the TermCache for now.  No 
nocommit left. Currently there is some duplication / unncessary classes in the 
TermState hierarchy - that needs cleanup. 

BTW. I see some highlighter test failing - will look into this later...
simon

 MTQ rewrite + weight/scorer init should be single pass
 --

 Key: LUCENE-2694
 URL: https://issues.apache.org/jira/browse/LUCENE-2694
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch


 Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
 Once we fix MTQ rewrite to be per-segment, we should take it further and make 
 weight/scorer init also run in the same single pass as rewrite.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Lucene-3.x - Build # 214 - Failure

Build: https://hudson.apache.org/hudson/job/Lucene-3.x/214/

4 tests failed.
REGRESSION:  
org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull

Error Message:
addIndexes(Directory[]) + optimize() hit IOException after disk space was freed 
up

Stack Trace:
junit.framework.AssertionFailedError: addIndexes(Directory[]) + optimize() hit 
IOException after disk space was freed up
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
at 
org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull(TestIndexWriterOnDiskFull.java:323)


REGRESSION:  
org.apache.lucene.index.TestIndexWriterOnDiskFull.testCorruptionAfterDiskFullDuringMerge

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:354)


REGRESSION:  
org.apache.lucene.index.TestIndexWriterOnDiskFull.testImmediateDiskFull

Error Message:
ConcurrentMergeScheduler hit unhandled exceptions

Stack Trace:
junit.framework.AssertionFailedError: ConcurrentMergeScheduler hit unhandled 
exceptions
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:375)


REGRESSION:  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:354)




Build Log (for compile errors):
[...truncated 6950 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Created: (LUCENE-2818) abort() method for IndexOutput

2010-12-17 Thread Earwin Burrfoot (JIRA)

abort() method for IndexOutput
--

 Key: LUCENE-2818
 URL: https://issues.apache.org/jira/browse/LUCENE-2818
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot


I'd like to see abort() method on IndexOutput that silently (no exceptions) 
closes IO and then does silent papaDir.deleteFile(this.fileName()).
This will simplify a bunch of error recovery code for IndexWriter and friends, 
but constitutes an API backcompat break.

What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (SOLR-2282) Distributed Support for Search Result Clustering

2010-12-17 Thread Koji Sekiguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-2282:
-

Attachment: SOLR-2282.patch

A patch attached. Currently, carrot.produceSummary doesn't work in distributed 
mode:

{code:title=ClusteringComponent.finishStage()}
// TODO: Currently, docIds is set to null in distributed environment.
// This causes CarrotParams.PRODUCE_SUMMARY doesn't work.
// To work CarrotParams.PRODUCE_SUMMARY under distributed mode, we can choose 
either one of:
// (a) In each shard, ClusteringComponent produces summary and finishStage()
// merges these summaries.
// (b) Adding doHighlighting(SolrDocumentList, ...) method to SolrHighlighter 
and
// making SolrHighlighter uses external text rather than stored values to 
produce snippets.
{code}


 Distributed Support for Search Result Clustering
 

 Key: SOLR-2282
 URL: https://issues.apache.org/jira/browse/SOLR-2282
 Project: Solr
  Issue Type: New Feature
  Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-2282.patch


 Brad Giaccio contributed a patch for this in SOLR-769. I'd like to 
 incorporate it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-17 Thread Earwin Burrfoot (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Earwin Burrfoot updated LUCENE-2814:

Attachment: LUCENE-2814.patch

New patch. Now with even more lines removed!

DocStore-related index chain components used to track open/closed files through
DocumentsWriter.
Closed files list was unused, and is silently gone.
Open files list was used to:
* prevent not-yet-flushed shared docstores from being deleted by
IndexFileDeleter.
** no shared docstores, no need + IFD no longer requires a reference to DW
* delete already opened docstore files, when aborting.
** index chain now handles this on its own + has cleaner error handling code.

stop writing shared doc stores across segments
--

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Is it possible to set the merge policy setMaxMergeMB from Solr

2010-12-17 Thread Jason Rutherglen

I worked on the patch and I can't keep it straight either.  We need to
update the wiki?

I believe double name=maxMergeMB64.0/double is correct.

On Fri, Dec 17, 2010 at 10:25 AM, Burton-West, Tom tburt...@umich.edu wrote:
 I'm a bit confused.

 There are some examples in the JIRA issue for Solr 1447, but I can't tell 
 from reading it what the final allowed syntax is.

 I see

 !--mergePolicy class=org.apache.lucene.index.LogByteSizeMergePolicy--
      !--double name=maxMergeMB64.0/double--
    !--/mergePolicy--
 in the JIRA issue and in what I think is the test case config file:
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/src/test/test-files/solr/conf/solrconfig-propinject.xml?view=log

 Lance's example is

 mergePolicyorg.apache.lucene.index.LogByteSizeMergePolicy
    maxMergeMB1024/maxMergeMB
 /mergePolicy

 Which one is correct?

 Tom

 -Original Message-
 From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
 Sent: Tuesday, December 07, 2010 10:48 AM
 To: dev@lucene.apache.org
 Subject: Re: Is it possible to set the merge policy setMaxMergeMB from Solr

 SOLR-1447 added this functionality.

 On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom tburt...@umich.edu wrote:
 Lucene has this method to set the maximum size of a segment when merging:
 LogByteSizeMergePolicy.setMaxMergeMB
 (http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html#setMaxMergeMB%28double%29
 )

 I would like to be able to set this in my solrconfig.xml.  Is this
 possible?  If not should I open a JIRA issue or is there some gotcha I am
 unaware of?

 Tom

 Tom Burton-West


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2818) abort() method for IndexOutput

2010-12-17 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972737#action_12972737
 ] 

Shai Erera commented on LUCENE-2818:


bq. but constitutes an API backcompat break

Can abort() have a default impl in IndexOutput, such as close() followed by 
deleteFile() maybe? If so, then it won't break anything.

Anyway, I think we can make an exception in this case - only those who impl 
Directory and provide their own IndexOutput extension will be affected, which I 
think is a relatively low number of applications?

bq. What do you think?

Would abort() on Directory fit better? E.g., it can abort all currently open 
and modified files, instead of the caller calling abort() on each IndexOutput? 
Are you thinking of a case where a write failed, and the caller would call 
abort() immediately, instead of some higher-level code? If so, would rollback() 
be a better name?

I always thought of IndexOutput as a means for writing bytes, no special 
semantic logic coded and executed by it. The management code IMO should be 
maintained by higher-level code, such as Directory or even higher (today 
IndexWriter, but that's what you're trying to remove :)).

So on one hand, I'd like to see IndexWriter's code simplified (this class has 
become a monster), but on the other, it doesn't feel right to me to add this 
logic in IndexOutput. Maybe I don't understand the use case for it well though. 
I do think though, that abort() on IndexOutput has a specific, clearer, 
meaning, where on Directory it can be perceived as kinda vague (what exactly is 
it aborting, reading / writing?). And maybe aborting a Directory is not good, 
if say you want to abort/rollback the changes done to a particular file.

All in all, I'm +1 for simplifying IW, but am still +0 on transferring the 
logic to IndexOutput, unless I misunderstand the use case.

 abort() method for IndexOutput
 --

 Key: LUCENE-2818
 URL: https://issues.apache.org/jira/browse/LUCENE-2818
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot

 I'd like to see abort() method on IndexOutput that silently (no exceptions) 
 closes IO and then does silent papaDir.deleteFile(this.fileName()).
 This will simplify a bunch of error recovery code for IndexWriter and 
 friends, but constitutes an API backcompat break.
 What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Is it possible to set the merge policy setMaxMergeMB from Solr

2010-12-17 Thread Jason Rutherglen

Probably best to add something here as it currently has nothing
regarding merge policies and has a long standing TODO on the
indexDefaults.  http://wiki.apache.org/solr/SolrConfigXml

On Fri, Dec 17, 2010 at 7:22 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 I worked on the patch and I can't keep it straight either.  We need to
 update the wiki?

 I believe double name=maxMergeMB64.0/double is correct.

 On Fri, Dec 17, 2010 at 10:25 AM, Burton-West, Tom tburt...@umich.edu wrote:
 I'm a bit confused.

 There are some examples in the JIRA issue for Solr 1447, but I can't tell 
 from reading it what the final allowed syntax is.

 I see

 !--mergePolicy class=org.apache.lucene.index.LogByteSizeMergePolicy--
      !--double name=maxMergeMB64.0/double--
    !--/mergePolicy--
 in the JIRA issue and in what I think is the test case config file:
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/src/test/test-files/solr/conf/solrconfig-propinject.xml?view=log

 Lance's example is

 mergePolicyorg.apache.lucene.index.LogByteSizeMergePolicy
    maxMergeMB1024/maxMergeMB
 /mergePolicy

 Which one is correct?

 Tom

 -Original Message-
 From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
 Sent: Tuesday, December 07, 2010 10:48 AM
 To: dev@lucene.apache.org
 Subject: Re: Is it possible to set the merge policy setMaxMergeMB from Solr

 SOLR-1447 added this functionality.

 On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom tburt...@umich.edu wrote:
 Lucene has this method to set the maximum size of a segment when merging:
 LogByteSizeMergePolicy.setMaxMergeMB
 (http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html#setMaxMergeMB%28double%29
 )

 I would like to be able to set this in my solrconfig.xml.  Is this
 possible?  If not should I open a JIRA issue or is there some gotcha I am
 unaware of?

 Tom

 Tom Burton-West


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Do we want 'nocommit' to fail the commit?

2010-12-17 Thread Shai Erera

Hi

Out of curiosity, I searched if we can have a nocommit comment in the code
fail the commit. As far as I see, we try to avoid accidental commits (of say
debug messages) by putting a nocommit comment, but I don't know if svn ci
would fail in the presence of such comment - I guess not because we've seen
some accidental nocommits checked in already in the past.

So I Googled around and found that if we have control of the svn repo, we
can add a pre-commit hook that will check and fail the commit. Here is a
nice article that explains how to add pre-commit hooks in general (
http://wordaligned.org/articles/a-subversion-pre-commit-hook). I didn't try
it yet (on our local svn instance), so I cannot say how well it works, but
perhaps someone has experience with it ...

So if this is interesting, and is doable for Lucene (say, open a JIRA issue
for Infra?) I don't mind investigating it further and write the script
(which can be as simple as 'grep the changed files and fail on the presence
of nocommit string').

Shai

Lucene-trunk - Build # 1398 - Still Failing

Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1398/

All tests passed

Build Log (for compile errors):
[...truncated 18389 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly


 [ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2657:


Attachment: LUCENE-2657.patch

Updated to trunk, including addition of a POM for 
{{solr/contrib/analysis-extras/}} and upgraded dependencies.

All tests pass.

 Replace Maven POM templates with full POMs, and change documentation 
 accordingly
 

 Key: LUCENE-2657
 URL: https://issues.apache.org/jira/browse/LUCENE-2657
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch


 The current Maven POM templates only contain dependency information, the bare 
 bones necessary for uploading artifacts to the Maven repository.
 Full Maven POMs will include the information necessary to run a multi-module 
 Maven build, in addition to serving the same purpose as the current POM 
 templates.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly


 [ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2657:


Description: 
The current Maven POM templates only contain dependency information, the bare 
bones necessary for uploading artifacts to the Maven repository.

The full Maven POMs in the attached patch include the information necessary to 
run a multi-module Maven build, in addition to serving the same purpose as the 
current POM templates.

Several dependencies are not available through public maven repositories.  A 
profile in the top-level POM can be activated to install these dependencies 
from the various {{lib/}} directories into your local repository.  From the 
top-level directory:

{code}
mvn -N -Pbootstrap install
{code}

Once these non-Maven dependencies have been installed, to run all Lucene/Solr 
tests via Maven's surefire plugin, and populate your local repository with all 
artifacts, from the top level directory, run:

{code}
mvn install
{code}

When one Lucene/Solr module depends on another, the dependency is declared on 
the *artifact(s)* produced by the other module and deposited in your local 
repository, rather than on the other module's un-jarred compiler output in the 
{{build/}} directory, so you must run {{mvn install}} on the other module 
before its changes are visible to the module that depends on it.

To create all the artifacts without running tests:

{code}
mvn -DskipTests install
{code}

I almost always include the {{clean}} phase when I do a build, e.g.:

{code}
mvn -DskipTests clean install
{code}


  was:
The current Maven POM templates only contain dependency information, the bare 
bones necessary for uploading artifacts to the Maven repository.

Full Maven POMs will include the information necessary to run a multi-module 
Maven build, in addition to serving the same purpose as the current POM 
templates.


 Replace Maven POM templates with full POMs, and change documentation 
 accordingly
 

 Key: LUCENE-2657
 URL: https://issues.apache.org/jira/browse/LUCENE-2657
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch


 The current Maven POM templates only contain dependency information, the bare 
 bones necessary for uploading artifacts to the Maven repository.
 The full Maven POMs in the attached patch include the information necessary 
 to run a multi-module Maven build, in addition to serving the same purpose as 
 the current POM templates.
 Several dependencies are not available through public maven repositories.  A 
 profile in the top-level POM can be activated to install these dependencies 
 from the various {{lib/}} directories into your local repository.  From the 
 top-level directory:
 {code}
 mvn -N -Pbootstrap install
 {code}
 Once these non-Maven dependencies have been installed, to run all Lucene/Solr 
 tests via Maven's surefire plugin, and populate your local repository with 
 all artifacts, from the top level directory, run:
 {code}
 mvn install
 {code}
 When one Lucene/Solr module depends on another, the dependency is declared on 
 the *artifact(s)* produced by the other module and deposited in your local 
 repository, rather than on the other module's un-jarred compiler output in 
 the {{build/}} directory, so you must run {{mvn install}} on the other module 
 before its changes are visible to the module that depends on it.
 To create all the artifacts without running tests:
 {code}
 mvn -DskipTests install
 {code}
 I almost always include the {{clean}} phase when I do a build, e.g.:
 {code}
 mvn -DskipTests clean install
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Solr-3.x - Build # 200 - Failure