XmlCharFilter

2011-06-14 Thread Michael Sokolov
I work with a lot of XML data sources and have needed to implement an 
analysis chain for Solr/Lucene that accepts XML. In the course of doing 
that, I found I needed something very much like HTMLCharFilter, but that 
does standard XML parsing (understands XML entities defined in an 
internal or external DTD, for example).  So I wrote XmlCharFilter, which 
uses the Woodstox XML parser (already used by Solr).  I think this could 
be useful for others, and it would be nice for me if it were committed 
here, so I'd like to contribute.  Should I open a JIRA for this?  Is 
there anybody who can spare the time to review?  It is basically one 
class (plus a factory class) and has a fairly complete set of tests.
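
For readers unfamiliar with the approach, here is a minimal, hypothetical sketch 
(not Mike's patch, which is not attached here) of the general idea: a StAX parser 
such as Woodstox can strip markup and keep only character data, with entities 
resolved by the parser. A real CharFilter would additionally have to correct 
character offsets for highlighting.

{code}
// Hypothetical sketch only -- not the contributed XmlCharFilter. It shows the
// general StAX approach: pull events from a Woodstox-compatible parser and
// keep only character data, letting the parser resolve entities.
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class XmlTextExtractionSketch {

    /** Returns only the character content of the XML read from 'in'. */
    public static String extractText(Reader in) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader xml = factory.createXMLStreamReader(in);
        StringBuilder text = new StringBuilder();
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA) {
                text.append(xml.getText());
            }
        }
        xml.close();
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc>Hello <b>world</b> &amp; friends</doc>";
        System.out.println(extractText(new StringReader(xml)));  // Hello world & friends
    }
}
{code}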


-Mike Sokolov
Engineering Director
iFactory.com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Closed] (LUCENENET-425) MMapDirectory implementation

2011-06-14 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy closed LUCENENET-425.
--

Resolution: Won't Fix

> MMapDirectory implementation
> 
>
> Key: LUCENENET-425
> URL: https://issues.apache.org/jira/browse/LUCENENET-425
> Project: Lucene.Net
>  Issue Type: New Feature
>Affects Versions: Lucene.Net 2.9.4g
>Reporter: Digy
>Priority: Trivial
> Fix For: Lucene.Net 2.9.4g
>
> Attachments: MMapDirectory.patch
>
>
> Since this is not a direct port of MMapDirectory.java, I'll put it under 
> "Support" and implement MMapDirectory as 
> {code}
> public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory
> {
> }
> {code}
> If a memory map cannot be created (for example, if the file is too big to fit in 
> the 32-bit address range), it will fall back to FSDirectory.FSIndexInput.
> In my tests, I didn't see any performance gain in a 32-bit environment, but I 
> consider it better than nothing. 
> I would be happy if someone could send test results on a 64-bit platform.
> DIGY

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049482#comment-13049482
 ] 

Mark Miller commented on SOLR-1431:
---

Does this patch incorporate any of Noble's feedback/patches? Any reason we want 
to create a new ShardHandler for every request?

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts

2011-06-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved LUCENE-3204.
-

Resolution: Fixed

Committed:

- trunk: r1135801, r1135818, r1135822, r1135825
- branch_3x: r1135827

> Include maven-ant-tasks jar in the source tree and use this jar from 
> generate-maven-artifacts
> -
>
> Key: LUCENE-3204
> URL: https://issues.apache.org/jira/browse/LUCENE-3204
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 3.3, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3204.patch, LUCENE-3204.patch
>
>
> Currently, running {{ant generate-maven-artifacts}} requires the user to have 
> {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}.  
> The build should instead rely on a copy of this jar included in the source 
> tree.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure

2011-06-14 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3190:


Attachment: LUCENE-3190.patch

Here is a patch that prevents this assert when the RAM buffer is low compared to the 
size of the documents we are indexing.

For "normal" settings the assert will be executed, but for very low buffer sizes 
we simply skip it entirely.
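
A rough, hypothetical sketch of the idea described above (the threshold and the 
names are invented here, not taken from the attached patch): skip the memory 
check when the configured RAM buffer is small relative to the documents being 
indexed.

{code}
// Hypothetical sketch of the described behavior; threshold and names are
// invented and do not come from the attached LUCENE-3190 patch.
public class FlushMemoryAssertSketch {

    static void assertMemory(long actualRamBytes, long expectedRamBytes,
                             long ramBufferBytes, long largestDocBytes) {
        // For very low RAM buffers relative to document size the bound is
        // meaningless, so the check is skipped entirely.
        if (ramBufferBytes < 2 * largestDocBytes) {
            return;
        }
        assert actualRamBytes <= expectedRamBytes
                : "ram was " + actualRamBytes + " expected: " + expectedRamBytes;
    }

    public static void main(String[] args) {
        assertMemory(460908, 408216, 300000, 200000); // skipped: tiny buffer
        assertMemory(100, 200, 1000000, 10);          // checked, passes
    }
}
{code}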

> TestStressIndexing2 testMultiConfig failure
> ---
>
> Key: LUCENE-3190
> URL: https://issues.apache.org/jira/browse/LUCENE-3190
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Simon Willnauer
> Attachments: LUCENE-3190.patch
>
>
> trunk: r1134311
> reproducible
> {code}
> [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2
> [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec
> [junit] 
> [junit] - Standard Error -
> [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush 
> mem: 395100 active: 65808
> [junit] at 
> org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102)
> [junit] at 
> org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164)
> [junit] at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
> [junit] at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
> [junit] at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757)
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
> -Dtestmethod=testMultiConfig 
> -Dtests.seed=2571834029692482827:-8116419692655152763
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
> -Dtestmethod=testMultiConfig 
> -Dtests.seed=2571834029692482827:-8116419692655152763
> [junit] The following exceptions were thrown by threads:
> [junit] *** Thread: Thread-0 ***
> [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: 
> ram was 460908 expected: 408216 flush mem: 395100 active: 65808
> [junit] at junit.framework.Assert.fail(Assert.java:47)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762)
> [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, 
> f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, 
> f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, 
> f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, 
> f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), 
> f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, 
> f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, 
> timezone=Pacific/Palau
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestStressIndexing2]
> [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=133324528,total=158400512
> [junit] -  ---
> [junit] Testcase: 
> testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED
> [junit] r1.numDocs()=17 vs r2.numDocs()=16
> [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs 
> r2.numDocs()=16
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
> [junit] 
> [junit] 
> [junit] Testcase: 
> testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED
> [junit] Some threads threw uncaught exceptions!
> [junit] junit.framework.AssertionFailedError: Some threads threw uncaught 
> exceptions!
> [junit] at 
> org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)

[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation

2011-06-14 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049473#comment-13049473
 ] 

Digy commented on LUCENENET-425:


OK, I think it will be better to mark MMapDirectory as unimplemented like 
NIOFSDirectory.

DIGY

> MMapDirectory implementation
> 
>
> Key: LUCENENET-425
> URL: https://issues.apache.org/jira/browse/LUCENENET-425
> Project: Lucene.Net
>  Issue Type: New Feature
>Affects Versions: Lucene.Net 2.9.4g
>Reporter: Digy
>Priority: Trivial
> Fix For: Lucene.Net 2.9.4g
>
> Attachments: MMapDirectory.patch
>
>
> Since this is not a direct port of MMapDirectory.java, I'll put it under 
> "Support" and implement MMapDirectory as 
> {code}
> public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory
> {
> }
> {code}
> If a memory map cannot be created (for example, if the file is too big to fit in 
> the 32-bit address range), it will fall back to FSDirectory.FSIndexInput.
> In my tests, I didn't see any performance gain in a 32-bit environment, but I 
> consider it better than nothing. 
> I would be happy if someone could send test results on a 64-bit platform.
> DIGY

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049465#comment-13049465
 ] 

Mark Miller commented on SOLR-1431:
---

Hang on - I might have gotten bitten by JIRA's new patch sorting; it used to just do 
it right, and I probably had it sorting wrong or something. I just gave it one last go 
and the patch applied cleanly.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049460#comment-13049460
 ] 

Uwe Schindler edited comment on LUCENE-3201 at 6/14/11 10:07 PM:
-

Robert: Very nice. Small thing:

- NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory / 
MMapCompoundFileDirectory are non-static inner classes but still take the parent 
Directory in the ctor. This is duplicated, as javac also passes the parent around 
(the special ParentClassName.this one). I would remove the ctor param and use 
"*FSDirectory.this" as the reference to the outer class. I nitpick because in some 
places it references the parent directory without the ctor param, so it's 
inconsistent.

That's all for now, thanks for the hard work!

  was (Author: thetaphi):
Robert: Very nice. Small thing:

- NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory are non-static 
inner classes but still take the parent Directory in the ctor. This is duplicated, 
as javac also passes the parent around (the special ParentClassName.this one). I 
would remove the ctor param and use "Simple/NIO-FSDirectory.this" as the reference 
to the outer class. I nitpick because in some places it references the parent 
directory without the ctor param, so it's inconsistent.

That's all for now, thanks for the hard work!
  
> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements; I see the following 
> problems:
> * its CSIndexInput extends BufferedIndexInput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal.
> * it's not possible for a directory to override or improve the handling of 
> compound files.
> For example, if you were implementing this from scratch, you would just wrap 
> the IndexInput directly (not extend BufferedIndexInput), add the compound file 
> offset X to seek() calls, and override length(). But of course, then you 
> couldn't always throw "read past EOF" when you should, as a user could read 
> into the next file and be left unaware.
> However, some directories could handle this better. For example, MMapDirectory 
> could return an IndexInput that simply mmaps the 'slice' of the CFS file. Its 
> underlying ByteBuffer naturally does bounds checks already, so it wouldn't 
> need to be buffered, and wouldn't even need to add any offsets to seek(), as 
> its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled. The simplest case, for the least code change, 
> would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
>     return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends on the fact that compound files are implemented as a 
> Directory and are transparent; at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1431) CommComponent abstracted

2011-06-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049461#comment-13049461
 ] 

Mark Miller commented on SOLR-1431:
---

Can you update your patch to apply without the hunk failures? Tests will not 
pass for me locally with the current patch.

> CommComponent abstracted
> 
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0
>Reporter: Jason Rutherglen
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
> SOLR-1431.patch, SOLR-1431.patch
>
>
> We'll abstract CommComponent in this issue.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8841 - Still Failing

2011-06-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8841/

7 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety

Error Message:
Error occurred in thread Thread-47: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test2351223530tmp/_x_1.tiv
 (Too many open files in system)

Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-47:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test2351223530tmp/_x_1.tiv
 (Too many open files in system)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test2351223530tmp/_x_1.tiv
 (Too many open files in system)
at 
org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822)


REGRESSION:  
org.apache.lucene.search.TestSimpleExplanationsOfNonMatches.testMultiFieldBQofPQ4

Error Message:
CheckIndex failed

Stack Trace:
java.lang.RuntimeException: CheckIndex failed
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158)
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144)
at 
org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477)
at 
org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:66)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  
org.apache.lucene.search.TestSimpleExplanationsOfNonMatches.testMultiFieldBQofPQ5

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:64)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  
org.apache.lucene.search.TestSimpleExplanationsOfNonMatches.testMultiFieldBQofPQ6

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:64)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  
org.apache.lucene.search.TestSimpleExplanationsOfNonMatches.testMultiFieldBQofPQ7

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:64)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  
org.apache.lucene.search.TestSimpleExplanationsOfNonMatches.testNoop

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:64)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


FAILED:  
junit.framework.TestSuite.org.apache.lucene.search.TestSimpleExplanationsOfNonMatches

Error Message:
ensure your setUp() calls super.setUp() and your tearDown() calls 
super.tearDown()!!!

Stack Trace:
junit.framework.AssertionFailedError: ensure your setUp() calls super.setUp() 
and your tearDown() calls super.tearDown()!!!
at 
org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:401)




Build Log (for compile errors):
[...truncated 3414 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049460#comment-13049460
 ] 

Uwe Schindler commented on LUCENE-3201:
---

Robert: Very nice. Small thing:

- NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory are non-static 
inner classes but still take the parent Directory in the ctor. This is duplicated, 
as javac also passes the parent around (the special ParentClassName.this one). I 
would remove the ctor param and use "Simple/NIO-FSDirectory.this" as the reference 
to the outer class. I nitpick because in some places it references the parent 
directory without the ctor param, so it's inconsistent.

That's all for now, thanks for the hard work!
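
To illustrate the point in a hedged, generic way (the class names below are 
placeholders, not the classes in the patch): a non-static inner class already 
holds a reference to its enclosing instance, so a parent passed explicitly 
through the constructor duplicates what javac provides via EnclosingClass.this.

{code}
// Generic illustration only; placeholder classes, not the LUCENE-3201 patch.
public class OuterDirectoryExample {
    private final String path = "/tmp/index";

    // Non-static inner class: no parent field or constructor parameter needed.
    public class CompoundFileDirectoryExample {
        String describe() {
            // The implicit outer reference replaces a duplicated ctor param.
            return "compound file inside " + OuterDirectoryExample.this.path;
        }
    }

    public static void main(String[] args) {
        OuterDirectoryExample dir = new OuterDirectoryExample();
        System.out.println(dir.new CompoundFileDirectoryExample().describe());
    }
}
{code}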

> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements; I see the following 
> problems:
> * its CSIndexInput extends BufferedIndexInput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal.
> * it's not possible for a directory to override or improve the handling of 
> compound files.
> For example, if you were implementing this from scratch, you would just wrap 
> the IndexInput directly (not extend BufferedIndexInput), add the compound file 
> offset X to seek() calls, and override length(). But of course, then you 
> couldn't always throw "read past EOF" when you should, as a user could read 
> into the next file and be left unaware.
> However, some directories could handle this better. For example, MMapDirectory 
> could return an IndexInput that simply mmaps the 'slice' of the CFS file. Its 
> underlying ByteBuffer naturally does bounds checks already, so it wouldn't 
> need to be buffered, and wouldn't even need to add any offsets to seek(), as 
> its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled. The simplest case, for the least code change, 
> would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
>     return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends on the fact that compound files are implemented as a 
> Directory and are transparent; at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2548) Multithreaded faceting

2011-06-14 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049453#comment-13049453
 ] 

Adrien Grand commented on SOLR-2548:


Hoss Man,

Regarding the best value for the number of threads to spawn based on the number 
of CPUs and the traffic, one could imagine deciding whether to spawn a new 
thread or to run the task in the current thread based on the load of the 
server. This way, servers under high traffic would run every request in a 
single thread (maximizing throughput), whereas servers under low traffic would 
be able to use every processor in order to minimize response time. The load is 
easily retrievable in Java 6 using OperatingSystemMXBean; I don't know whether 
it is possible in a non-OS-specific way in Java 5.

I don't really understand what you mean by "if you really care about 
parallelizing faceting, you probably wouldn't want some other intensive 
component starving out the thread pool". Do you mean that you would expect some 
requests to be run slower with every component using a global thread pool than 
with a single thread pool dedicated to facets?

Yonik, why would you want to limit the number of threads on a per-request 
basis, if enough CPUs are available?
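
A small, hedged sketch of the heuristic described above (the threshold and the 
method name are invented for illustration): Java 6's OperatingSystemMXBean 
exposes the system load average, which a component could compare against the 
CPU count before fanning work out to a thread pool.

{code}
// Hedged sketch; threshold and method name are invented, not Solr code.
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadAwareThreadingSketch {

    /** Decide whether to fan work out to extra threads or stay on the caller. */
    public static boolean shouldUseExtraThreads() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();  // -1.0 if unavailable on this platform
        int cpus = os.getAvailableProcessors();
        // Only fan out when the machine is clearly under-utilized.
        return load >= 0 && load < 0.5 * cpus;
    }

    public static void main(String[] args) {
        System.out.println("use extra threads? " + shouldUseExtraThreads());
    }
}
{code}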


> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548.patch, SOLR-2548_for_31x.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts

2011-06-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049451#comment-13049451
 ] 

Steven Rowe commented on LUCENE-3204:
-

bq. Jenkins now complains because of missing license file: 
https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8840/console

It's the NOTICE file that's missing, and I've just added it.

bq. On Jenkins, I removed maven-ant-tasks from ~hudson/.ant/lib.

Thanks!

> Include maven-ant-tasks jar in the source tree and use this jar from 
> generate-maven-artifacts
> -
>
> Key: LUCENE-3204
> URL: https://issues.apache.org/jira/browse/LUCENE-3204
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 3.3, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3204.patch, LUCENE-3204.patch
>
>
> Currently, running {{ant generate-maven-artifacts}} requires the user to have 
> {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}.  
> The build should instead rely on a copy of this jar included in the source 
> tree.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8844 - Failure

2011-06-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8844/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety

Error Message:
Error occurred in thread Thread-149: Lock obtain timed out: 
org.apache.lucene.store.MockLockFactoryWrapper$MockLock@c3c1635

Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-149:
Lock obtain timed out: 
org.apache.lucene.store.MockLockFactoryWrapper$MockLock@c3c1635
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1268)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1186)
Lock obtain timed out: 
org.apache.lucene.store.MockLockFactoryWrapper$MockLock@c3c1635
at 
org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:820)




Build Log (for compile errors):
[...truncated 5258 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts

2011-06-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049448#comment-13049448
 ] 

Uwe Schindler commented on LUCENE-3204:
---

Jenkins now complains because of missing license file: 
[https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8840/console]

On Jenkins, I removed maven-ant-tasks from ~hudson/.ant/lib.

> Include maven-ant-tasks jar in the source tree and use this jar from 
> generate-maven-artifacts
> -
>
> Key: LUCENE-3204
> URL: https://issues.apache.org/jira/browse/LUCENE-3204
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 3.3, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3204.patch, LUCENE-3204.patch
>
>
> Currently, running {{ant generate-maven-artifacts}} requires the user to have 
> {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}.  
> The build should instead rely on a copy of this jar included in the source 
> tree.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8840 - Failure

2011-06-14 Thread Steven A Rowe
This build failed because of a missing NOTICE file for the maven-ant-tasks jar. 
 I'm adding it now. - Steve

> -Original Message-
> From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
> Sent: Tuesday, June 14, 2011 5:37 PM
> To: dev@lucene.apache.org
> Subject: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8840 - Failure
> 
> Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8840/
> 
> No tests ran.
> 
> Build Log (for compile errors):
> [...truncated 2261 lines...]
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3201:


Attachment: LUCENE-3201.patch

Here is an updated patch, including impls for SimpleFS and NIOFS, fixing the 
FileSwitchDirectory thing Uwe mentioned, and also MockDirectoryWrapper and 
NRTCachingDirectory.

All the tests pass with Simple/NIO/MMap, but we need to benchmark. I haven't had 
good luck today with luceneutil.

> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements; I see the following 
> problems:
> * its CSIndexInput extends BufferedIndexInput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal.
> * it's not possible for a directory to override or improve the handling of 
> compound files.
> For example, if you were implementing this from scratch, you would just wrap 
> the IndexInput directly (not extend BufferedIndexInput), add the compound file 
> offset X to seek() calls, and override length(). But of course, then you 
> couldn't always throw "read past EOF" when you should, as a user could read 
> into the next file and be left unaware.
> However, some directories could handle this better. For example, MMapDirectory 
> could return an IndexInput that simply mmaps the 'slice' of the CFS file. Its 
> underlying ByteBuffer naturally does bounds checks already, so it wouldn't 
> need to be buffered, and wouldn't even need to add any offsets to seek(), as 
> its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled. The simplest case, for the least code change, 
> would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
>     return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends on the fact that compound files are implemented as a 
> Directory and are transparent; at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8840 - Failure

2011-06-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8840/

No tests ran.

Build Log (for compile errors):
[...truncated 2261 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation

2011-06-14 Thread Ben West (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049435#comment-13049435
 ] 

Ben West commented on LUCENENET-425:


Unfortunately (or perhaps fortunately, in that Digy doesn't need to do more work 
:-), MMap is slower on 64-bit too. The index is 2.2 GB.

{panel}
Create index, FSDir:   419061
Create index, MMapdir: 532536
Search index, FSDir:   757
Search index, MMapdir: 2030
{panel}

Reversing order:

{panel}
Search index, FSDir: 734
Search index, MMap dir: 1934
{panel}

I have 8 GB of RAM, so I think the entire index was able to be cached in memory by 
the OS. 

> MMapDirectory implementation
> 
>
> Key: LUCENENET-425
> URL: https://issues.apache.org/jira/browse/LUCENENET-425
> Project: Lucene.Net
>  Issue Type: New Feature
>Affects Versions: Lucene.Net 2.9.4g
>Reporter: Digy
>Priority: Trivial
> Fix For: Lucene.Net 2.9.4g
>
> Attachments: MMapDirectory.patch
>
>
> Since this is not a direct port of MMapDirectory.java, I'll put it under 
> "Support" and implement MMapDirectory as 
> {code}
> public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory
> {
> }
> {code}
> If a memory map cannot be created (for example, if the file is too big to fit in 
> the 32-bit address range), it will fall back to FSDirectory.FSIndexInput.
> In my tests, I didn't see any performance gain in a 32-bit environment, but I 
> consider it better than nothing. 
> I would be happy if someone could send test results on a 64-bit platform.
> DIGY

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts

2011-06-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-3204:


Attachment: LUCENE-3204.patch

Added CHANGES.txt entries, including mention of the fact that copies of the 
maven-ant-tasks jar in the Ant classpath take precedence over the copy in the 
Lucene/Solr source tree.

Committing shortly.

> Include maven-ant-tasks jar in the source tree and use this jar from 
> generate-maven-artifacts
> -
>
> Key: LUCENE-3204
> URL: https://issues.apache.org/jira/browse/LUCENE-3204
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 3.3, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3204.patch, LUCENE-3204.patch
>
>
> Currently, running {{ant generate-maven-artifacts}} requires the user to have 
> {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}.  
> The build should instead rely on a copy of this jar included in the source 
> tree.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2452) rewrite solr build system

2011-06-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049430#comment-13049430
 ] 

Steven Rowe commented on SOLR-2452:
---

Forgot to include the issue number in the comment, so it's not showing up here, 
but I just committed a merge with trunk up to r1135758.  Here's the ViewVC 
link: http://svn.apache.org/viewvc?view=revision&revision=1135759

> rewrite solr build system
> -
>
> Key: SOLR-2452
> URL: https://issues.apache.org/jira/browse/SOLR-2452
> Project: Solr
>  Issue Type: Task
>  Components: Build
>Reporter: Robert Muir
> Fix For: 3.3
>
>
> As discussed a bit in SOLR-2002 (but that issue is long and hard to follow), I 
> think we should rewrite the Solr build system.
> It's slow, cumbersome, and messy, and makes it hard for us to improve things.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1873) Commit Solr Cloud to trunk

2011-06-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049425#comment-13049425
 ] 

Mark Miller commented on SOLR-1873:
---

If I remember right (it's been a long time since I talked about it with Jon), I 
think Loggly had to do some small custom hack for this type of thing as well - 
there's no issue that I know of - let's make a new issue.

> Commit Solr Cloud to trunk
> --
>
> Key: SOLR-1873
> URL: https://issues.apache.org/jira/browse/SOLR-1873
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, 
> SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, 
> SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, 
> SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, 
> TEST-org.apache.solr.cloud.ZkSolrClientTest.txt, log4j-over-slf4j-1.5.5.jar, 
> zookeeper-3.2.2.jar, zookeeper-3.3.1.jar
>
>
> See http://wiki.apache.org/solr/SolrCloud
> This is a real hassle - I didn't merge up to trunk before all the svn 
> scrambling, so integrating cloud is now a bit difficult. I'm running through 
> and just preparing a commit by hand though (applying changes/handling 
> conflicts a file at a time).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: commitLockTimeout in solrconfig.xml

2011-06-14 Thread Martijn v Groningen
I've created SOLR-2591 for removing the commitLockTimeout option.

Martijn

On 4 June 2011 13:47, Mark Miller  wrote:

>
> On Jun 4, 2011, at 7:42 AM, Martijn v Groningen wrote:
>
> > the commitLockTimeout option is really not used, so I think we should remove
> > this
>
> +1.
>
> - Mark Miller
> lucidimagination.com
>
> BERLIN BUZZWORDS JUNE 6-7TH, 2011
>
>
>
>
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


-- 
Met vriendelijke groet,

Martijn van Groningen


[jira] [Created] (SOLR-2591) Remove commitLockTimeout option from solrconfig.xml

2011-06-14 Thread Martijn van Groningen (JIRA)
Remove commitLockTimeout option from solrconfig.xml
---

 Key: SOLR-2591
 URL: https://issues.apache.org/jira/browse/SOLR-2591
 Project: Solr
  Issue Type: Improvement
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
 Fix For: 4.0


I've noticed that the commitLockTimeout option is loaded by the configuration but 
no longer used. This issue covers removing the option from all solrconfig.xml 
files (including the example) and from the SolrConfig class.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts

2011-06-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049386#comment-13049386
 ] 

Steven Rowe commented on LUCENE-3204:
-

I unpacked the jar, defaced the definitions file 
({{org/apache/maven/artifact/ant/antlib.xml}}), then repacked the now-mangled jar 
and put the result in {{~/.ant/lib/}}, while leaving intact the copy under 
{{lucene/lib/}}.

The result: the mangled copy under {{~/.ant/lib/}} is visited first, resulting 
in an error.  This means that the supplied version does *not* get preferred 
over what's already in {{~/.ant/lib/}}.

I don't think this is a serious problem, but I'll make mention of it in the 
CHANGES.txt entry (to be included in another iteration of the patch).

> Include maven-ant-tasks jar in the source tree and use this jar from 
> generate-maven-artifacts
> -
>
> Key: LUCENE-3204
> URL: https://issues.apache.org/jira/browse/LUCENE-3204
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 3.3, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3204.patch
>
>
> Currently, running {{ant generate-maven-artifacts}} requires the user to have 
> {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}.  
> The build should instead rely on a copy of this jar included in the source 
> tree.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts

2011-06-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049382#comment-13049382
 ] 

Steven Rowe commented on LUCENE-3204:
-

bq. Does the supplied version of maven-ant-tasks automatically get preferred 
over what's already in ~/.ant/lib ?

I'm not sure.  How can I test this?  I removed the copy in {{lucene/lib/}} and 
put a copy of the jar in {{~/.ant/lib/}}.  {{ant generate-maven-artifacts}} 
still succeeds.

> Include maven-ant-tasks jar in the source tree and use this jar from 
> generate-maven-artifacts
> -
>
> Key: LUCENE-3204
> URL: https://issues.apache.org/jira/browse/LUCENE-3204
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 3.3, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3204.patch
>
>
> Currently, running {{ant generate-maven-artifacts}} requires the user to have 
> {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}.  
> The build should instead rely on a copy of this jar included in the source 
> tree.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts

2011-06-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049374#comment-13049374
 ] 

Uwe Schindler commented on LUCENE-3204:
---

I think that's fine. Does the supplied version of maven-ant-tasks automatically 
get preferred over what's already in ~/.ant/lib?

> Include maven-ant-tasks jar in the source tree and use this jar from 
> generate-maven-artifacts
> -
>
> Key: LUCENE-3204
> URL: https://issues.apache.org/jira/browse/LUCENE-3204
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 3.3, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3204.patch
>
>
> Currently, running {{ant generate-maven-artifacts}} requires the user to have 
> {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}.  
> The build should instead rely on a copy of this jar included in the source 
> tree.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts

2011-06-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-3204:


Attachment: LUCENE-3204.patch

Patch implementing the idea.

> Include maven-ant-tasks jar in the source tree and use this jar from 
> generate-maven-artifacts
> -
>
> Key: LUCENE-3204
> URL: https://issues.apache.org/jira/browse/LUCENE-3204
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 3.3, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3204.patch
>
>
> Currently, running {{ant generate-maven-artifacts}} requires the user to have 
> {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}.  
> The build should instead rely on a copy of this jar included in the source 
> tree.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts

2011-06-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049372#comment-13049372
 ] 

Steven Rowe commented on LUCENE-3204:
-

Committing shortly.

> Include maven-ant-tasks jar in the source tree and use this jar from 
> generate-maven-artifacts
> -
>
> Key: LUCENE-3204
> URL: https://issues.apache.org/jira/browse/LUCENE-3204
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 3.3, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3204.patch
>
>
> Currently, running {{ant generate-maven-artifacts}} requires the user to have 
> {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}.  
> The build should instead rely on a copy of this jar included in the source 
> tree.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts

2011-06-14 Thread Steven Rowe (JIRA)
Include maven-ant-tasks jar in the source tree and use this jar from 
generate-maven-artifacts
-

 Key: LUCENE-3204
 URL: https://issues.apache.org/jira/browse/LUCENE-3204
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Affects Versions: 3.3, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.3, 4.0
 Attachments: LUCENE-3204.patch

Currently, running {{ant generate-maven-artifacts}} requires the user to have 
{{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}.  The 
build should instead rely on a copy of this jar included in the source tree.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Updated] (LUCENENET-425) MMapDirectory implementation

2011-06-14 Thread Christopher Currens (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Currens updated LUCENENET-425:
--

Comment: was deleted

(was: On a 1.18GB index of only text:

FS Reader: 27
MMap Reader: 90
---
FS Reader: 38
MMap Reader: 77
Press any key to continue . . .)

> MMapDirectory implementation
> 
>
> Key: LUCENENET-425
> URL: https://issues.apache.org/jira/browse/LUCENENET-425
> Project: Lucene.Net
>  Issue Type: New Feature
>Affects Versions: Lucene.Net 2.9.4g
>Reporter: Digy
>Priority: Trivial
> Fix For: Lucene.Net 2.9.4g
>
> Attachments: MMapDirectory.patch
>
>
> Since this is not a direct port of MMapDirectory.java, I'll put it under 
> "Support" and implement MMapDirectory as 
> {code}
> public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory
> {
> }
> {code}
> If a memory map cannot be created (for example, if the file is too big to fit in 
> the 32-bit address range), it will fall back to FSDirectory.FSIndexInput.
> In my tests, I didn't see any performance gain in a 32-bit environment, but I 
> consider it better than nothing. 
> I would be happy if someone could send test results on a 64-bit platform.
> DIGY

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation

2011-06-14 Thread Christopher Currens (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049359#comment-13049359
 ] 

Christopher Currens commented on LUCENENET-425:
---

On a 1.18GB index of only text:

FS Reader: 27
MMap Reader: 90
---
FS Reader: 38
MMap Reader: 77
Press any key to continue . . .

> MMapDirectory implementation
> 
>
> Key: LUCENENET-425
> URL: https://issues.apache.org/jira/browse/LUCENENET-425
> Project: Lucene.Net
>  Issue Type: New Feature
>Affects Versions: Lucene.Net 2.9.4g
>Reporter: Digy
>Priority: Trivial
> Fix For: Lucene.Net 2.9.4g
>
> Attachments: MMapDirectory.patch
>
>
> Since this is not a direct port of MMapDirectory.java, I'll put it under 
> "Support" and implement MMapDirectory as 
> {code}
> public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory
> {
> }
> {code}
> If a memory map cannot be created (for example, if the file is too big to fit in 
> the 32-bit address range), it will fall back to FSDirectory.FSIndexInput.
> In my tests, I didn't see any performance gain in a 32-bit environment, but I 
> consider it better than nothing. 
> I would be happy if someone could send test results on a 64-bit platform.
> DIGY

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-2548) Multithreaded faceting

2011-06-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049355#comment-13049355
 ] 

Yonik Seeley commented on SOLR-2548:


I think this should be configurable on a per-request basis (not the max size of 
the thread pool, but how many threads out of that to use concurrently).
For facet.method=fcs (per-segment faceting using the field cache), I did 
introduce a "threads" localParam.
Perhaps we should have a "threads" or "facet.threads" request parameter?
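
For context, a hedged illustration of the difference (the parameter spellings 
below are illustrative assumptions, not confirmed syntax): the existing 
per-segment option is a localParam on the facet field, while the proposal above 
would make it a plain request parameter.

{code}
# existing: "threads" localParam on a facet.field with facet.method=fcs (illustrative)
.../select?q=*:*&facet=true&facet.method=fcs&facet.field={!threads=4}cat

# proposed in this comment: a request-level parameter (hypothetical)
.../select?q=*:*&facet=true&facet.field=cat&facet.threads=4
{code}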

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548.patch, SOLR-2548_for_31x.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049352#comment-13049352
 ] 

Uwe Schindler commented on LUCENE-3201:
---

We have LUCENE-1743 for the small files can of worms.

> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements; I see the following 
> problems:
> * its CSIndexInput extends BufferedIndexInput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal.
> * it's not possible for a directory to override or improve the handling of 
> compound files.
> For example, if you were implementing this from scratch, you would just wrap 
> the IndexInput directly (not extend BufferedIndexInput), add the compound file 
> offset X to seek() calls, and override length(). But of course, then you 
> couldn't always throw "read past EOF" when you should, as a user could read 
> into the next file and be left unaware.
> However, some directories could handle this better. For example, MMapDirectory 
> could return an IndexInput that simply mmaps the 'slice' of the CFS file. Its 
> underlying ByteBuffer naturally does bounds checks already, so it wouldn't 
> need to be buffered, and wouldn't even need to add any offsets to seek(), as 
> its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled. The simplest case, for the least code change, 
> would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
>     return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends on the fact that compound files are implemented as a 
> Directory and are transparent; at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Commented] (LUCENENET-417) implement streams as field values

2011-06-14 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049350#comment-13049350
 ] 

Digy commented on LUCENENET-417:


Maybe this is a stupid question, but what is the reason to index a very large doc?
If I indexed a whole book as a single document, it would appear in almost every
kind of search's result set.
search "computer" --> this book.
search "sport"  --> this book.
search "politics" --> this book.

DIGY

> implement streams as field values
> -
>
> Key: LUCENENET-417
> URL: https://issues.apache.org/jira/browse/LUCENENET-417
> Project: Lucene.Net
>  Issue Type: New Feature
>  Components: Lucene.Net Core
>Reporter: Christopher Currens
> Attachments: StreamValues.patch
>
>
> Adding binary values to a field is an expensive operation, as the whole 
> binary data must be loaded into memory and then written to the index.  Adding 
> the ability to use a stream instead of a byte array could not only speed up 
> the indexing process, but reducing the memory footprint as well.
> -Java lucene has the ability to use a TextReader the both analyze and store 
> text in the index.-  Lucene.NET lacks the ability to store string data in the 
> index via streams. This should be a feature added into Lucene .NET as well.  
> My thoughts are to add another Field constructor, that is Field(string name, 
> System.IO.Stream stream, System.Text.Encoding encoding), that will allow the 
> text to be analyzed and stored into the index.
> Comments about this approach are greatly appreciated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049346#comment-13049346
 ] 

Robert Muir commented on LUCENE-3201:
-

I agree, the FileSwitchDirectory should delegate openCompoundInput.

As far as mapping small things goes, I think we should set this aside for another
issue.
As far as this issue goes, I don't mind returning the DefaultCompound impl if
unmapping isn't supported, but I'd really rather defer opening the can of
worms of 'mapping small things' to some other issue :)


> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements, i see the following 
> problems
> * its CSIndexInput extends bufferedindexinput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal
> * its not possible for a directory to override or improve the handling of 
> compound files.
> for example: it seems if you were impl'ing this thing from scratch, you would 
> just wrap the II directly (not extend BufferedIndexInput,
> and add compound file offset X to seek() calls, and override length(). But of 
> course, then you couldnt throw read past EOF always when you should,
> as a user could read into the next file and be left unaware.
> however, some directories could handle this better. for example MMapDirectory 
> could return an indexinput that simply mmaps the 'slice' of the CFS file.
> its underlying bytebuffer etc naturally does bounds checks already etc, so it 
> wouldnt need to be buffered, not even needing to add any offsets to seek(),
> as its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled, the simplest 
> case for the least code change would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
> return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends upon the fact compound files are implemented as a 
> Directory and transparent. at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2548) Multithreaded faceting

2011-06-14 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049341#comment-13049341
 ] 

Hoss Man commented on SOLR-2548:


Janne: thanks for the awesome patch!

In general I think this type of functionality is a good idea -- the real
question is how it should be configured/controlled.

Admins with many CPUs who expect low amounts of concurrent user traffic might
be OK with spawning availableProcessors() threads per request, but admins with
fewer CPUs than concurrent requests are going to prefer that individual requests
stay single threaded and take a little longer.

The suggestion to use a thread-pool-based ExecutorService definitely seems like
it makes more sense; then it just becomes a matter of asking the admin to
configure a simple number determining the size of the thread pool. We just need
to support a few sentinel values: NUM_CPUS, and NONE (always use the caller's
thread).

Since we'd want a thread pool that lives longer than a single request, this
definitely shouldn't be an option specified via SolrParams (not to mention the
risk involved if people don't lock it down with invariants).  That leaves the
question of whether this should be an "init" param on the FacetComponent, or
something more global.

My first thought was that we should bite the bullet and add a new top-level
config for a global thread pool executor service that any Solr plugin could
start using, but after thinking about it some more I think that would not only
be premature, but perhaps even a bad idea in general -- even if we assume
something like DIH, UIMA, or Highlighting could also take advantage of a shared
thread pool owned by Solr, if you really care about parallelizing faceting, you
probably wouldn't want some other intensive component starving out the thread
pool (or vice versa)
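
To make the sentinel idea concrete, here is a rough sketch of resolving such an init
param into an Executor; the param spellings and helper are illustrative assumptions,
not part of the attached patch:

{code}
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Sketch: resolve a configured value (NONE, NUM_CPUS, or a number) into an Executor
// that is created once at component init and shared across requests.
public class FacetExecutorSketch {

  static Executor resolve(String configured) {
    if (configured == null || "NONE".equals(configured)) {
      // always run in the caller's thread -- the single-threaded behavior we have today
      return new Executor() {
        public void execute(Runnable task) { task.run(); }
      };
    }
    int threads = "NUM_CPUS".equals(configured)
        ? Runtime.getRuntime().availableProcessors()
        : Integer.parseInt(configured);
    return Executors.newFixedThreadPool(threads);   // pool outlives any single request
  }

  public static void main(String[] args) {
    // NONE keeps everything on the calling thread, so this demo exits cleanly
    resolve("NONE").execute(new Runnable() {
      public void run() { System.out.println("facet one field here"); }
    });
  }
}
{code}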

> Multithreaded faceting
> --
>
> Key: SOLR-2548
> URL: https://issues.apache.org/jira/browse/SOLR-2548
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.1
>Reporter: Janne Majaranta
>Priority: Minor
>  Labels: facet
> Attachments: SOLR-2548.patch, SOLR-2548_for_31x.patch
>
>
> Add multithreading support for faceting.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Welcome Jan Høydahl as Lucene/Solr committer

2011-06-14 Thread Jan Høydahl
Hi all,

Thanks a lot to the PMC for entrusting me this role!

% whoami

I'm a hacker from Norway, soon to become 101000 summers old. Married to 
wonderful Hilde, living
outside Oslo with our cat Rosåsi (meaning "grey" in Arabic). I love 
snowboarding, kayaking,
travelling, working with immigrants and volunteering in my local church.

Started my IT career at age 12 programming Basic on the C=64 while my brother
was playing games.
Then later 68000 assembly and C on the Amiga. Sold my first program around
1993, an AREXX
script for NComm on Amiga, helping people save money placing ads faster on the
national teletext
service :) I've also programmed Turbo Pascal, C++, PLEX-C, ASA110, Python, PHP,
Ruby, and even
assembly for the HP-48 calculator :) Been a Solaris, Linux and for the last 5
years Mac user.

Fast forward to 1998 when I learnt Java and helped develop Ericsson's first IP 
telephony service
way before JIT compilers etc. I became one of FAST's first Global Services 
consultants in 2000 in
the days of AllTheWeb™ and before they even had an enterprise search product. 
The search engine
consisted of a few C++ binaries; "findex" writing the index and "fsearch" 
searching it. Then came
RealTimeSearch, FDS and finally ESP. After 5 years @ FAST I "committed" a 
software outsourcing
startup for a few years before founding Cominvent to do full-time search 
consulting on FAST
technology (or so I thought..). I played some with Lucene in 2006 but then 
picked up Solr in 
2009, and now 95% of the business is on Solr/Lucene and 5% on FAST. What a 
change!

I love Apache, Open source, the Apache License and the Lucene community. With
more than a decade of experience in Enterprise Search and well over 100 customer
projects, I've learnt a thing or two, which I'm now doing my best to share with my
customers and the community. Now hopefully more of that will be as code. One of
the first areas I'm hoping to help with is UpdateChain-related stuff as well as
Norwegian/Nordic language support.

http://no.linkedin.com/in/janhoy
http://twitter.com/cominvent

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 13. juni 2011, at 16.43, Mark Miller wrote:

> I'm happy to announce that the Lucene/Solr PMC has voted in Jan Høydahl as 
> our newest committer.
> 
> Jan, if you don't mind, could you introduce yourself with a brief bio as has 
> become our tradition?
> 
> Congratulations and welcome aboard!
> 
> 
> - Mark Miller
> lucidimagination.com
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049335#comment-13049335
 ] 

Uwe Schindler commented on LUCENE-3201:
---

Hi Robert, great patch, exactly as I would have wished for when we discussed it!

Patch looks fine; one small bug:
- FileSwitchDirectory should also override openCompoundInput() from
Directory and delegate to the correct underlying directory. Right now it always uses
the default impl, which is double buffering. So if you e.g. put MMapDirectory
as the delegate for CFS files, those files would be opened like before your
patch. Just copy'n'paste the code from one of the other FileSwitchDirectory
methods (a sketch of the delegation is below).
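
Roughly this -- a fragment meant for FileSwitchDirectory, where getDirectory(name)
stands in for whatever extension-based chooser the other methods use, so treat it as
an assumption rather than the actual method name:

{code}
// Sketch of the delegation: pick the sub-directory that owns this file's extension
// and let it open the compound input, so e.g. an MMap delegate uses its own impl.
@Override
public Directory openCompoundInput(String name) throws IOException {
  return getDirectory(name).openCompoundInput(name);
}
{code}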

Some suggestions:
We currently map the whole compound file into address space, read the
header/contents, and unmap it again. This may add some overhead, especially if
unmapping is not supported.
- We could use SimpleFSIndexInput to read the CFS contents (we only need to pass
the already open RAF there; alternatively use Dawid's new wrapper IndexInput
around a standard InputStream, obtained from the RAF -> LUCENE-3202)
- Only map the header of the CFS file; the problem: we don't know its exact size.

> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements, i see the following 
> problems
> * its CSIndexInput extends bufferedindexinput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal
> * its not possible for a directory to override or improve the handling of 
> compound files.
> for example: it seems if you were impl'ing this thing from scratch, you would 
> just wrap the II directly (not extend BufferedIndexInput,
> and add compound file offset X to seek() calls, and override length(). But of 
> course, then you couldnt throw read past EOF always when you should,
> as a user could read into the next file and be left unaware.
> however, some directories could handle this better. for example MMapDirectory 
> could return an indexinput that simply mmaps the 'slice' of the CFS file.
> its underlying bytebuffer etc naturally does bounds checks already etc, so it 
> wouldnt need to be buffered, not even needing to add any offsets to seek(),
> as its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled, the simplest 
> case for the least code change would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
> return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends upon the fact compound files are implemented as a 
> Directory and transparent. at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation

2011-06-14 Thread Ben West (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049299#comment-13049299
 ] 

Ben West commented on LUCENENET-425:


The entire "Store" set of tests (including TestWindowsMMap) passes on Windows 7 
64 bit with your patch. Let me know if there are other tests you'd like me to 
run. I'm not familiar with what mmap directories do, so I probably won't be 
able to write a perf test myself.

> MMapDirectory implementation
> 
>
> Key: LUCENENET-425
> URL: https://issues.apache.org/jira/browse/LUCENENET-425
> Project: Lucene.Net
>  Issue Type: New Feature
>Affects Versions: Lucene.Net 2.9.4g
>Reporter: Digy
>Priority: Trivial
> Fix For: Lucene.Net 2.9.4g
>
> Attachments: MMapDirectory.patch
>
>
> Since this is not a direct port of MMapDirectory.java, I'll put it under 
> "Support" and implement MMapDirectory as 
> {code}
> public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory
> {
> }
> {code}
> If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 
> bit address range), it will default to FSDirectory.FSIndexInput
> In my tests, I didn't see any performance gain in 32bit environment and I 
> consider it as better then nothing. 
> I would be happy if someone could send test results on 64bit platform.
> DIGY

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time

2011-06-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049298#comment-13049298
 ] 

Yonik Seeley commented on LUCENE-3197:
--

Regardless of whether one views this as a bug or not, I think the more useful
semantics are to at least "merge all of the current segments into 1 and remove
all *currently* deleted docs" (i.e. I agree with Mike).  The alternative is
that optimize is dangerous in the presence of index updates (i.e. applications
should discontinue updates if they call optimize).


> Optimize runs forever if you keep deleting docs at the same time
> 
>
> Key: LUCENE-3197
> URL: https://issues.apache.org/jira/browse/LUCENE-3197
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.3, 4.0
>
>
> Because we "cascade" merges for an optimize... if you also delete documents 
> while the merges are running, then the merge policy will see the resulting 
> single segment as still not optimized (since it has pending deletes) and do a 
> single-segment merge, and will repeat indefinitely (as long as your app keeps 
> deleting docs).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3201:


Fix Version/s: 4.0
   3.3

setting 3.3/4.0 as fix version, as the changes are backwards compatible 
(compoundfilereader is pkg-private still in 3.x)


> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements, i see the following 
> problems
> * its CSIndexInput extends bufferedindexinput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal
> * its not possible for a directory to override or improve the handling of 
> compound files.
> for example: it seems if you were impl'ing this thing from scratch, you would 
> just wrap the II directly (not extend BufferedIndexInput,
> and add compound file offset X to seek() calls, and override length(). But of 
> course, then you couldnt throw read past EOF always when you should,
> as a user could read into the next file and be left unaware.
> however, some directories could handle this better. for example MMapDirectory 
> could return an indexinput that simply mmaps the 'slice' of the CFS file.
> its underlying bytebuffer etc naturally does bounds checks already etc, so it 
> wouldnt need to be buffered, not even needing to add any offsets to seek(),
> as its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled, the simplest 
> case for the least code change would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
> return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends upon the fact compound files are implemented as a 
> Directory and transparent. at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049293#comment-13049293
 ] 

Michael McCandless commented on LUCENE-2793:


LUCENE-3203 is another example where a Dir needs the IOContext so it can 
optionally rate limit the bytes/second if it's a merge.
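
Something like the following shape is all a Directory would need to make that
decision; the enum and field names here are placeholders for illustration, not the
API being designed on this issue:

{code}
// Sketch only: a hint object passed to createOutput/openInput so a Directory can
// special-case merges (rate limiting, bigger buffers, direct IO, ...).
public class IOContextSketch {
  public enum Type { READ, FLUSH, MERGE }

  public final Type type;
  public final double maxMergeWriteMBPerSec;   // only meaningful when type == MERGE

  public IOContextSketch(Type type, double maxMergeWriteMBPerSec) {
    this.type = type;
    this.maxMergeWriteMBPerSec = maxMergeWriteMBPerSec;
  }
}
{code}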

> Directory createOutput and openInput should take an IOContext
> -
>
> Key: LUCENE-2793
> URL: https://issues.apache.org/jira/browse/LUCENE-2793
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Assignee: Varun Thacker
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
> LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch
>
>
> Today for merging we pass down a larger readBufferSize than for searching 
> because we get better performance.
> I think we should generalize this to a class (IOContext), which would hold 
> the buffer size, but then could hold other flags like DIRECT (bypass OS's 
> buffer cache), SEQUENTIAL, etc.
> Then, we can make the DirectIOLinuxDirectory fully usable because we would 
> only use DIRECT/SEQUENTIAL during merging.
> This will require fixing how IW pools readers, so that a reader opened for 
> merging is not then used for searching, and vice/versa.  Really, it's only 
> all the open file handles that need to be different -- we could in theory 
> share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3203) Rate-limit IO used by merging

2011-06-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3203:
---

Attachment: LUCENE-3203.patch

Patch, with a hacked-up prototype impl, but I don't think we should
commit it like this.  Instead, I think we should wait for IOContext,
and then Dir impls can allow the app to specify a max merge write rate.
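
The core idea, independent of the attached patch, is just pacing writes. A minimal
standalone sketch (not Lucene API; names and numbers are illustrative):

{code}
// Sketch: after each block of bytes written by a merge, sleep just long enough
// to keep the average write rate under a configured MB/sec target.
public class MergeRateLimiterSketch {
  private final double nsPerByte;
  private long minNextWriteNs = System.nanoTime();

  public MergeRateLimiterSketch(double mbPerSec) {
    this.nsPerByte = 1000000000.0 / (mbPerSec * 1024 * 1024);
  }

  public synchronized void pause(int bytesJustWritten) throws InterruptedException {
    minNextWriteNs += (long) (bytesJustWritten * nsPerByte);
    long now = System.nanoTime();
    if (minNextWriteNs > now) {
      long sleepNs = minNextWriteNs - now;
      Thread.sleep(sleepNs / 1000000, (int) (sleepNs % 1000000));
    } else {
      minNextWriteNs = now;   // we're behind the target; don't accumulate credit forever
    }
  }

  public static void main(String[] args) throws Exception {
    MergeRateLimiterSketch limiter = new MergeRateLimiterSketch(10.0); // ~10 MB/sec
    limiter.pause(8192);   // would be called from a wrapper around the merge's IndexOutput
    System.out.println("paced one 8 KB write");
  }
}
{code}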


> Rate-limit IO used by merging
> -
>
> Key: LUCENE-3203
> URL: https://issues.apache.org/jira/browse/LUCENE-3203
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3203.patch
>
>
> Large merges can mess up searches and increase NRT reopen time (see
> http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html).
> A simple rate limiter improves the spikey NRT reopen times during big
> merges, so I think we should somehow make this possible.  Likely this
> would reduce impact on searches as well.
> Typically apps that do indexing and searching on same box are in no
> rush to see the merges complete so this is a good tradeoff.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3203) Rate-limit IO used by merging

2011-06-14 Thread Michael McCandless (JIRA)
Rate-limit IO used by merging
-

 Key: LUCENE-3203
 URL: https://issues.apache.org/jira/browse/LUCENE-3203
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.3, 4.0


Large merges can mess up searches and increase NRT reopen time (see
http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html).

A simple rate limiter improves the spikey NRT reopen times during big
merges, so I think we should somehow make this possible.  Likely this
would reduce impact on searches as well.

Typically apps that do indexing and searching on same box are in no
rush to see the merges complete so this is a good tradeoff.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049287#comment-13049287
 ] 

Michael McCandless commented on LUCENE-3201:


Patch looks great!  Incredible that this means there's no penalty at all at 
search time when using CFS, if you use MMapDir.

I like that CFS reader is now under oal.store not .index.

> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Attachments: LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements, i see the following 
> problems
> * its CSIndexInput extends bufferedindexinput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal
> * its not possible for a directory to override or improve the handling of 
> compound files.
> for example: it seems if you were impl'ing this thing from scratch, you would 
> just wrap the II directly (not extend BufferedIndexInput,
> and add compound file offset X to seek() calls, and override length(). But of 
> course, then you couldnt throw read past EOF always when you should,
> as a user could read into the next file and be left unaware.
> however, some directories could handle this better. for example MMapDirectory 
> could return an indexinput that simply mmaps the 'slice' of the CFS file.
> its underlying bytebuffer etc naturally does bounds checks already etc, so it 
> wouldnt need to be buffered, not even needing to add any offsets to seek(),
> as its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled, the simplest 
> case for the least code change would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
> return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends upon the fact compound files are implemented as a 
> Directory and transparent. at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3201:


Attachment: LUCENE-3201.patch

Initial patch for review. In this patch I only cut over MMapDirectory to using
a special CompoundFileDirectory; all others use the default as before (but I
cleaned up some things about it).

Pretty sure I can easily improve SimpleFS and NIOFS, I'll take a look at that
now, but I wanted to get this up for review.
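
For anyone following along, the MMap trick boils down to handing out bounded slices
of one mapped buffer. A standalone sketch of the idea (the offset/length here are
placeholders for what the CFS header would supply; this is not the patch itself):

{code}
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: map the compound file once, then each sub-file becomes a slice whose
// limit enforces EOF and whose position 0 is the sub-file's start -- no seek offsets needed.
public class CfsSliceSketch {
  public static void main(String[] args) throws Exception {
    RandomAccessFile raf = new RandomAccessFile(args[0], "r");
    MappedByteBuffer whole =
        raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, raf.length());

    long offset = 0;                       // placeholder: real values come from the CFS header
    long length = raf.length();
    ByteBuffer dup = whole.duplicate();
    dup.position((int) offset);
    dup.limit((int) (offset + length));
    ByteBuffer subFile = dup.slice();      // bounds-checked view of just this entry

    System.out.println("sub-file length: " + subFile.capacity());
    raf.close();
  }
}
{code}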


> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
> Attachments: LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements, i see the following 
> problems
> * its CSIndexInput extends bufferedindexinput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal
> * its not possible for a directory to override or improve the handling of 
> compound files.
> for example: it seems if you were impl'ing this thing from scratch, you would 
> just wrap the II directly (not extend BufferedIndexInput,
> and add compound file offset X to seek() calls, and override length(). But of 
> course, then you couldnt throw read past EOF always when you should,
> as a user could read into the next file and be left unaware.
> however, some directories could handle this better. for example MMapDirectory 
> could return an indexinput that simply mmaps the 'slice' of the CFS file.
> its underlying bytebuffer etc naturally does bounds checks already etc, so it 
> wouldnt need to be buffered, not even needing to add any offsets to seek(),
> as its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled, the simplest 
> case for the least code change would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
> return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends upon the fact compound files are implemented as a 
> Directory and transparent. at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2574) upgrade SLF4J (primary motivation: simplifiy use of solrj)

2011-06-14 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-2574:


Issue Type: Wish  (was: Bug)

Since this is not a bug... let's change the status.

> upgrade SLF4J (primary motivation: simplifiy use of solrj)
> --
>
> Key: SOLR-2574
> URL: https://issues.apache.org/jira/browse/SOLR-2574
> Project: Solr
>  Issue Type: Wish
>Reporter: Gabriele Kahlout
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: solrjtest.zip
>
>
> Whatever the merits of slf4j, a quick solrj test should work. 
> I've attached a sample 1-line project with dependency on solrj-3.2 on run it 
> prints:
> {code}
> java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
>   at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.(CommonsHttpSolrServer.java:72)
>   at com.mysimpatico.solrjtest.App.main(App.java:12)
> {code}
> Uncomment the nop dependency and it will work.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8832 - Failure

2011-06-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8832/

20 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety

Error Message:
Error occurred in thread Thread-73: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test6102461040tmp/_b_1.skp
 (Too many open files in system)

Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-73:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test6102461040tmp/_b_1.skp
 (Too many open files in system)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test6102461040tmp/_b_1.skp
 (Too many open files in system)
at 
org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822)


REGRESSION:  org.apache.lucene.search.TestComplexExplanations.testCSQ4

Error Message:
CheckIndex failed

Stack Trace:
java.lang.RuntimeException: CheckIndex failed
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158)
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144)
at 
org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477)
at 
org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:66)
at 
org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:43)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  org.apache.lucene.search.TestComplexExplanations.testDMQ10

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  org.apache.lucene.search.TestComplexExplanations.testMPQ7

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  org.apache.lucene.search.TestComplexExplanations.testBQ12

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  org.apache.lucene.search.TestComplexExplanations.testBQ13

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  org.apache.lucene.search.TestComplexExplanations.testBQ18

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  org.apache.lucene.search.TestComplexExplanations.testBQ21

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)


REGRESSION:  org.apache.lucene.search.TestComplexExplanations.testBQ22

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at 

[jira] [Issue Comment Edited] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-14 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049256#comment-13049256
 ] 

Martin Grotzke edited comment on SOLR-2583 at 6/14/11 4:25 PM:
---

I just compared memory consumption of the 3 different approaches, with
different numbers of puts (number of scores) and sizes (number of docs); the
memory values are in bytes:

{noformat}
Puts  1.000, size 1.000.000:  CompactFloatArray 898.136,float[] 
4.000.016,  HashMap  72.192
Puts  10.000, size 1.000.000: CompactFloatArray 3.724.376,  float[] 
4.000.016,  HashMap  702.784
Puts  100.000, size 1.000.000:CompactFloatArray 4.016.472,  float[] 
4.000.016,  HashMap  6.607.808
Puts  1.000.000, size 1.000.000:  CompactFloatArray 4.016.472,  float[] 
4.000.016,  HashMap  44.644.032
Puts  1.000, size 5.000.000:  CompactFloatArray 1.128.536,  float[] 
20.000.016, HashMap  72.256
Puts  10.000, size 5.000.000: CompactFloatArray 8.168.536,  float[] 
20.000.016, HashMap  704.832
Puts  100.000, size 5.000.000:CompactFloatArray 20.013.144, float[] 
20.000.016, HashMap  7.385.152
Puts  1.000.000, size 5.000.000:  CompactFloatArray 20.131.160, float[] 
20.000.016, HashMap  66.395.584
Puts  1.000, size 10.000.000: CompactFloatArray 1.275.992,  float[] 
40.000.016, HashMap  72.256
Puts  10.000, size 10.000.000:CompactFloatArray 9.289.816,  float[] 
40.000.016, HashMap  705.280
Puts  100.000, size 10.000.000:   CompactFloatArray 37.130.328, float[] 
40.000.016, HashMap  7.418.112
Puts  1.000.000, size 10.000.000: CompactFloatArray 40.262.232, float[] 
40.000.016, HashMap  69.282.496
{noformat}

I want to share this intermediately, without further interpretation/conclusion 
for now (I just need to get the train).

  was (Author: martin.grotzke):
I just compared memory consumption of the 3 different approaches, with 
different number of puts (number of scores) and sizes (number of docs):

{noformat}
Puts  1.000, size 1.000.000:  CompactFloatArray 898.136,float[] 
4.000.016,  HashMap  72.192
Puts  10.000, size 1.000.000: CompactFloatArray 3.724.376,  float[] 
4.000.016,  HashMap  702.784
Puts  100.000, size 1.000.000:CompactFloatArray 4.016.472,  float[] 
4.000.016,  HashMap  6.607.808
Puts  1.000.000, size 1.000.000:  CompactFloatArray 4.016.472,  float[] 
4.000.016,  HashMap  44.644.032
Puts  1.000, size 5.000.000:  CompactFloatArray 1.128.536,  float[] 
20.000.016, HashMap  72.256
Puts  10.000, size 5.000.000: CompactFloatArray 8.168.536,  float[] 
20.000.016, HashMap  704.832
Puts  100.000, size 5.000.000:CompactFloatArray 20.013.144, float[] 
20.000.016, HashMap  7.385.152
Puts  1.000.000, size 5.000.000:  CompactFloatArray 20.131.160, float[] 
20.000.016, HashMap  66.395.584
Puts  1.000, size 10.000.000: CompactFloatArray 1.275.992,  float[] 
40.000.016, HashMap  72.256
Puts  10.000, size 10.000.000:CompactFloatArray 9.289.816,  float[] 
40.000.016, HashMap  705.280
Puts  100.000, size 10.000.000:   CompactFloatArray 37.130.328, float[] 
40.000.016, HashMap  7.418.112
Puts  1.000.000, size 10.000.000: CompactFloatArray 40.262.232, float[] 
40.000.016, HashMap  69.282.496
{noformat}

I want to share this intermediately, without further interpretation/conclusion 
for now (I just need to get the train).
  
> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are much 
> less entries in the scoring file than there are number of docs in total the 
> big float array wastes much memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-14 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049256#comment-13049256
 ] 

Martin Grotzke commented on SOLR-2583:
--

I just compared memory consumption of the 3 different approaches, with
different numbers of puts (number of scores) and sizes (number of docs):

{noformat}
Puts  1.000, size 1.000.000:  CompactFloatArray 898.136,float[] 
4.000.016,  HashMap  72.192
Puts  10.000, size 1.000.000: CompactFloatArray 3.724.376,  float[] 
4.000.016,  HashMap  702.784
Puts  100.000, size 1.000.000:CompactFloatArray 4.016.472,  float[] 
4.000.016,  HashMap  6.607.808
Puts  1.000.000, size 1.000.000:  CompactFloatArray 4.016.472,  float[] 
4.000.016,  HashMap  44.644.032
Puts  1.000, size 5.000.000:  CompactFloatArray 1.128.536,  float[] 
20.000.016, HashMap  72.256
Puts  10.000, size 5.000.000: CompactFloatArray 8.168.536,  float[] 
20.000.016, HashMap  704.832
Puts  100.000, size 5.000.000:CompactFloatArray 20.013.144, float[] 
20.000.016, HashMap  7.385.152
Puts  1.000.000, size 5.000.000:  CompactFloatArray 20.131.160, float[] 
20.000.016, HashMap  66.395.584
Puts  1.000, size 10.000.000: CompactFloatArray 1.275.992,  float[] 
40.000.016, HashMap  72.256
Puts  10.000, size 10.000.000:CompactFloatArray 9.289.816,  float[] 
40.000.016, HashMap  705.280
Puts  100.000, size 10.000.000:   CompactFloatArray 37.130.328, float[] 
40.000.016, HashMap  7.418.112
Puts  1.000.000, size 10.000.000: CompactFloatArray 40.262.232, float[] 
40.000.016, HashMap  69.282.496
{noformat}

I want to share this intermediately, without further interpretation/conclusion 
for now (I just need to get the train).
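
For readers comparing the columns above, the three shapes are roughly these; the
paged layout is only my assumption of what CompactFloatArray does internally, so the
real class may differ:

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch of the three doc->score representations being measured above.
public class ScoreStorageSketch {
  public static void main(String[] args) {
    int maxDoc = 1000000;
    int doc = 42;
    float score = 1.5f;

    // 1) dense float[]: fixed cost of 4 bytes per doc, no matter how sparse the scores are
    float[] dense = new float[maxDoc];
    dense[doc] = score;

    // 2) HashMap: cost per entry (boxing + node overhead), cheap when very sparse
    Map<Integer, Float> sparse = new HashMap<Integer, Float>();
    sparse.put(doc, score);

    // 3) paged array (assumed CompactFloatArray-like): pages allocated only when first touched
    int pageSize = 4096;
    float[][] pages = new float[(maxDoc + pageSize - 1) / pageSize][];
    if (pages[doc / pageSize] == null) pages[doc / pageSize] = new float[pageSize];
    pages[doc / pageSize][doc % pageSize] = score;

    System.out.println(dense[doc] + " " + sparse.get(doc) + " "
        + pages[doc / pageSize][doc % pageSize]);
  }
}
{code}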

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are much 
> less entries in the scoring file than there are number of docs in total the 
> big float array wastes much memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time

2011-06-14 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049254#comment-13049254
 ] 

Hoss Man commented on LUCENE-3197:
--

Is the possibility of a never-ending optimize in this situation (never-ending
deletes) really something we need to "fix"?

I mean ... isn't this what the user should expect?  They've asked for a single
segment w/o deletes, and then while we try to give it to them they keep
deleting -- how is it bad that optimize doesn't stop until it's completely
done?

> Optimize runs forever if you keep deleting docs at the same time
> 
>
> Key: LUCENE-3197
> URL: https://issues.apache.org/jira/browse/LUCENE-3197
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.3, 4.0
>
>
> Because we "cascade" merges for an optimize... if you also delete documents 
> while the merges are running, then the merge policy will see the resulting 
> single segment as still not optimized (since it has pending deletes) and do a 
> single-segment merge, and will repeat indefinitely (as long as your app keeps 
> deleting docs).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2588) Make Velocity an optional dependency in SolrCore

2011-06-14 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-2588:


  Priority: Minor  (was: Major)
Issue Type: Wish  (was: Bug)
   Summary: Make Velocity an optional dependency in SolrCore  (was: Solr 
doesn't work without Velocity on classpath)

updating the JIRA description... since this is not a 'bug'

> Make Velocity an optional dependency in SolrCore
> 
>
> Key: SOLR-2588
> URL: https://issues.apache.org/jira/browse/SOLR-2588
> Project: Solr
>  Issue Type: Wish
>Affects Versions: 3.2
>Reporter: Gunnar Wagenknecht
>Priority: Minor
> Fix For: 3.3
>
>
> In 1.4. it was fine to run Solr without Velocity on the classpath. However, 
> in 3.2. SolrCore won't load because of a hard reference to the Velocity 
> response writer in a static initializer.
> {noformat}
> ... ERROR org.apache.solr.core.CoreContainer - 
> java.lang.NoClassDefFoundError: org/apache/velocity/context/Context
>   at org.apache.solr.core.SolrCore.(SolrCore.java:1447)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463)
>   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
>   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2574) upgrade SLF4J (primary motivation: simplifiy use of solrj)

2011-06-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2574:
---

Summary: upgrade SLF4J (primary motivation: simplifiy use of solrj)  (was: 
Add SLF4J-nop dependency)

> upgrade SLF4J (primary motivation: simplifiy use of solrj)
> --
>
> Key: SOLR-2574
> URL: https://issues.apache.org/jira/browse/SOLR-2574
> Project: Solr
>  Issue Type: Bug
>Reporter: Gabriele Kahlout
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 3.3, 4.0
>
> Attachments: solrjtest.zip
>
>
> Whatever the merits of slf4j, a quick solrj test should work. 
> I've attached a sample 1-line project with dependency on solrj-3.2 on run it 
> prints:
> {code}
> java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
>   at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.(CommonsHttpSolrServer.java:72)
>   at com.mysimpatico.solrjtest.App.main(App.java:12)
> {code}
> Uncomment the nop dependency and it will work.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-2955) Add utitily class to manage NRT reopening

2011-06-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2955.


   Resolution: Fixed
Fix Version/s: 4.0

> Add utitily class to manage NRT reopening
> -
>
> Key: LUCENE-2955
> URL: https://issues.apache.org/jira/browse/LUCENE-2955
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-2955.patch, LUCENE-2955.patch, LUCENE-2955.patch
>
>
> I created a simple class, NRTManager, that tries to abstract away some
> of the reopen logic when using NRT readers.
> You give it your IW, tell it min and max nanoseconds staleness you can
> tolerate, and it privately runs a reopen thread to periodically reopen
> the searcher.
> It subsumes the SearcherManager from LIA2.  Besides running the reopen
> thread, it also adds the notion of a "generation" containing changes
> you've made.  So eg it has addDocument, returning a long.  You can
> then take that long value and pass it back to the getSearcher method
> and getSearcher will return a searcher that reflects the changes made
> in that generation.
> This gives your app the freedom to force "immediate" consistency (ie
> wait for the reopen) only for those searches that require it, like a
> verifier that adds a doc and then immediately searches for it, but
> also use "eventual consistency" for other searches.
> I want to also add support for the new "applyDeletions" option when
> pulling an NRT reader.
> Also, this is very new and I'm sure buggy -- the concurrency is either
> wrong over overly-locking.  But it's a start...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2588) Solr doesn't work without Velocity on classpath

2011-06-14 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049247#comment-13049247
 ] 

Hoss Man commented on SOLR-2588:


bq. With some small changes, Velocity could be optional.

Velocity (and the velocitywriter) were optional before, and a conscious and
deliberate choice was made to promote it into a core dependency so that admin
code (and users) could start expecting it to reliably always work.

If we want to reconsider, I'm fine with having that discussion, but it
shouldn't be an "optional core" feature ... either it's a core feature and
dependency, or it's an optional contrib.

It's not a bug that code in Solr has a direct dependency on jars in the lib dir.

> Solr doesn't work without Velocity on classpath
> ---
>
> Key: SOLR-2588
> URL: https://issues.apache.org/jira/browse/SOLR-2588
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.2
>Reporter: Gunnar Wagenknecht
> Fix For: 3.3
>
>
> In 1.4. it was fine to run Solr without Velocity on the classpath. However, 
> in 3.2. SolrCore won't load because of a hard reference to the Velocity 
> response writer in a static initializer.
> {noformat}
> ... ERROR org.apache.solr.core.CoreContainer - 
> java.lang.NoClassDefFoundError: org/apache/velocity/context/Context
>   at org.apache.solr.core.SolrCore.(SolrCore.java:1447)
>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463)
>   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
>   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3198) Change default Directory impl on 64bit linux to MMap

2011-06-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3198.


Resolution: Fixed

OK, I cut over just Linux, 64-bit, when unmap is available.

We can open new issues for other platforms when we have some data that MMap is
better...

> Change default Directory impl on 64bit linux to MMap
> 
>
> Key: LUCENE-3198
> URL: https://issues.apache.org/jira/browse/LUCENE-3198
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
> Fix For: 3.3, 4.0
>
>
> Consistently in my NRT testing on Fedora 13 Linux, 64 bit JVM (Oracle 
> 1.6.0_21) I see MMapDir getting better search and merge performance when 
> compared to NIOFSDir.
> I think we should fix the default.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure

2011-06-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049202#comment-13049202
 ] 

Simon Willnauer commented on LUCENE-3190:
-

I managed to reproduce this and catch it in a debugger. So what happens
here is bizarre :)
Due to the very tight maxBufferedDocs (3) and maxRamBufferSizeMB (0.1MB) we
have a pretty good chance that several in-flight DWPTs will trigger a flush
after the document they are indexing right now. That means that if we have,
let's say, 4 DWPTs and 3 are already flushing and memory is close to the
assert's limit, we have a problem: if there is already a 4th DWPT in flight
(past the stall check), its document can easily add enough bytes that we cross
the assert's expected max RAM and fail.

I am not sure how we can fix that right now, but at least it's not a bug in DWPT.

> TestStressIndexing2 testMultiConfig failure
> ---
>
> Key: LUCENE-3190
> URL: https://issues.apache.org/jira/browse/LUCENE-3190
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Simon Willnauer
>
> trunk: r1134311
> reproducible
> {code}
> [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2
> [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec
> [junit] 
> [junit] - Standard Error -
> [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush 
> mem: 395100 active: 65808
> [junit] at 
> org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102)
> [junit] at 
> org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164)
> [junit] at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
> [junit] at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
> [junit] at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757)
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
> -Dtestmethod=testMultiConfig 
> -Dtests.seed=2571834029692482827:-8116419692655152763
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
> -Dtestmethod=testMultiConfig 
> -Dtests.seed=2571834029692482827:-8116419692655152763
> [junit] The following exceptions were thrown by threads:
> [junit] *** Thread: Thread-0 ***
> [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: 
> ram was 460908 expected: 408216 flush mem: 395100 active: 65808
> [junit] at junit.framework.Assert.fail(Assert.java:47)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762)
> [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, 
> f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, 
> f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, 
> f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, 
> f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), 
> f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, 
> f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, 
> timezone=Pacific/Palau
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestStressIndexing2]
> [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=133324528,total=158400512
> [junit] -  ---
> [junit] Testcase: 
> testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED
> [junit] r1.numDocs()=17 vs r2.numDocs()=16
> [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs 
> r2.numDocs()=16
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
> [junit] 
> [junit] 
> [junit] Testcase: 
> testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED
> [junit] Some threads threw uncaught exceptions

[jira] [Commented] (SOLR-2305) DataImportScheduler - Marko Bonaci

2011-06-14 Thread Marko Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049180#comment-13049180
 ] 

Marko Bonaci commented on SOLR-2305:


I'll attach the patch during the following weekend.

> DataImportScheduler -  Marko Bonaci
> ---
>
> Key: SOLR-2305
> URL: https://issues.apache.org/jira/browse/SOLR-2305
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Bill Bell
> Fix For: 4.0
>
>
> Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I 
> cannot find a JIRA ticket for it?
> http://wiki.apache.org/solr/DataImportHandler
> Do we have a ticket so the code can be tracked?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Indexing slower in trunk

2011-06-14 Thread Uwe Schindler
For simply removing deletes, there is also IW.expungeDeletes(), which is
less intensive! Not sure if Solr supports this, too, but as far as I know
there is an issue open.

Also please note: as soon as a segment is selected for merging (the merge
policy may also do this depending on the number of deletes in a segment), the
merge will reclaim all resources held by deleted documents - that's what
merging does. So expunging deletes once per week is a good idea if your index
consists of very old and large segments that are rarely merged anymore and
have lots of documents deleted from them.
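A minimal Lucene 3.x sketch of the difference (the index path and analyzer
choice are placeholders):

{code}
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ExpungeDeletesExample {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_32,
        new StandardAnalyzer(Version.LUCENE_32));
    IndexWriter writer = new IndexWriter(dir, conf);

    // Merge away segments that carry deletes, without rewriting the whole index:
    writer.expungeDeletes();

    // In contrast, writer.optimize() would rewrite everything down to one segment.
    writer.close();
  }
}
{code}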

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, June 14, 2011 3:19 PM
> To: dev@lucene.apache.org
> Subject: Re: Indexing slower in trunk
> 
> Optimization used to have a very noticeable impact on search speed prior
to
> some index format changes from quite a while ago.
> 
> At this point the effect is much less noticeable, but the thing optimize
does
> do is reclaim resources from deleted documents. If you have lots of
> deletions, it's a good idea to periodically optimize, but in that case
it's often
> done pretty infrequently (once a
> day/week/month) rather than as part of any ongoing indexing process.
> 
> Best
> Erick
> 
> 2011/6/14 Yury Kats :
> > On 6/14/2011 4:28 AM, Uwe Schindler wrote:
> >> indexing and optimizing was only a
> >> good idea pre Lucene-2.9, now it's mostly obsolete)
> >
> > Could you please elaborate on this? Is optimizing obsolete in general
> > or after indexing new documents? Is it obsolete after deletions? And
> > what it "mostly"?
> >
> > Thanks!
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
> > additional commands, e-mail: dev-h...@lucene.apache.org
> >
> >
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Indexing slower in trunk

2011-06-14 Thread Uwe Schindler
Since Lucene 2.9, Lucene works on a per-segment basis when searching. Since
Lucene 3.1 it can even parallelize across multiple segments. If you optimize
your index you only have one segment. Also, when you frequently reopen your
indexes (e.g. after updates), the cost of warming the cache and FieldCache
for a very large new segment (and the CPU power spent optimizing the
whole index) does not justify the more compact representation on disk. For
search performance the impact is minor.

It's a much better idea to configure the MergePolicy so that it merges
segments optimally (Lucene 3.2 has a TieredMergePolicy that's now the
default). That keeps the number of segments small.
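For illustration, a sketch of wiring that up at the Lucene level (the specific
limits are arbitrary examples, not recommendations):

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MergePolicyConfigExample {
  public static void main(String[] args) throws Exception {
    TieredMergePolicy mp = new TieredMergePolicy();
    mp.setSegmentsPerTier(10.0);      // segments allowed per tier before a merge kicks in
    mp.setMaxMergeAtOnce(10);         // how many segments may be merged in one go
    mp.setMaxMergedSegmentMB(5000.0); // cap the size of merged segments

    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_32,
        new StandardAnalyzer(Version.LUCENE_32));
    conf.setMergePolicy(mp);

    Directory dir = new RAMDirectory(); // placeholder directory
    IndexWriter writer = new IndexWriter(dir, conf);
    // ... add/update documents; segments get merged in the background ...
    writer.close();
  }
}
{code}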

The same overhead applies to Solr when you replicate your index. If you
don't optimize, Solr only has to exchange the new segments between the
indexes. Since an optimize rebuilds the whole index, it always has to
transfer all of the index files to the replicas.

Optimizing only makes sense for e.g. read-only indexes that are built once
and never touched.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Yury Kats [mailto:yuryk...@yahoo.com]
> Sent: Tuesday, June 14, 2011 3:04 PM
> To: dev@lucene.apache.org
> Subject: Re: Indexing slower in trunk
> 
> On 6/14/2011 4:28 AM, Uwe Schindler wrote:
> > indexing and optimizing was only a
> > good idea pre Lucene-2.9, now it's mostly obsolete)
> 
> Could you please elaborate on this? Is optimizing obsolete in general or
after
> indexing new documents? Is it obsolete after deletions? And what it
> "mostly"?
> 
> Thanks!
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Indexing slower in trunk

2011-06-14 Thread Erick Erickson
Optimization used to have a very noticeable impact
on search speed prior to some index format changes
from quite a while ago.

At this point the effect is much less noticeable, but
the thing optimize does do is reclaim resources
from deleted documents. If you have lots of deletions,
it's a good idea to periodically optimize, but in that
case it's often done pretty infrequently (once a
day/week/month) rather than as part of any ongoing
indexing process.

Best
Erick

2011/6/14 Yury Kats :
> On 6/14/2011 4:28 AM, Uwe Schindler wrote:
>> indexing and optimizing was only a
>> good idea pre Lucene-2.9, now it's mostly obsolete)
>
> Could you please elaborate on this? Is optimizing obsolete
> in general or after indexing new documents? Is it obsolete
> after deletions? And what it "mostly"?
>
> Thanks!
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Indexing slower in trunk

2011-06-14 Thread Yury Kats
On 6/14/2011 4:28 AM, Uwe Schindler wrote:
> indexing and optimizing was only a
> good idea pre Lucene-2.9, now it's mostly obsolete)

Could you please elaborate on this? Is optimizing obsolete
in general or after indexing new documents? Is it obsolete
after deletions? And what does "mostly" mean?

Thanks!

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

2011-06-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049166#comment-13049166
 ] 

Uwe Schindler commented on LUCENE-3200:
---

Thanks to Robert for helping me debug my stupid + vs | problem and for lots of 
fruitful discussions about the whole thing and how to improve it :) Thanks to Mike 
for testing on beast!

Now you can refactor CFSIndexInput & Co!

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized 
> of powers of 2
> ---
>
> Key: LUCENE-3200
> URL: https://issues.apache.org/jira/browse/LUCENE-3200
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, 
> LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and I discussed a little bit after Mike's investigations that using 
> SingleMMapIndexInput together with MultiMMapIndexInput leads to hotspot 
> slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the 
> switching between buffer boundaries is done in exception catch blocks. So 
> normal code path is always the same as for Single*
> - Only the seek method uses strange calculations (the modulo is totally 
> bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very 
> strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We 
> should pass only the power of 2 to the indexinput as size. All calculations 
> in seek and anywhere else would be simple bit shifts and AND operations (the 
> and masks for the modulo can be calculated in the ctor like NumericUtils does 
> when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But that's not an 
> issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, 
> as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.
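For illustration, the power-of-2 idea boils down to replacing division and
modulo with shifts and masks (the variable names here are made up, not the
actual fields of MMapIndexInput):

{code}
// Illustrative sketch of power-of-2 chunk addressing.
public class PowerOfTwoSeekSketch {
  public static void main(String[] args) {
    final int chunkSizePower = 30;          // chunk size = 2^30 bytes
    final long chunkSize = 1L << chunkSizePower;
    final long chunkMask = chunkSize - 1;

    long pos = 3500000000L;                 // absolute file position
    int bufferIndex  = (int) (pos >>> chunkSizePower); // which mapped buffer
    int bufferOffset = (int) (pos & chunkMask);        // offset inside that buffer

    // Equivalent to pos / chunkSize and pos % chunkSize, but with cheap bit ops.
    System.out.println("buffer=" + bufferIndex + " offset=" + bufferOffset);
  }
}
{code}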

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

2011-06-14 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3200.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.3

Committed trunk revision: 1135537
Committed 3.x revision: 1135538

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized 
> of powers of 2
> ---
>
> Key: LUCENE-3200
> URL: https://issues.apache.org/jira/browse/LUCENE-3200
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, 
> LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and I discussed a little bit after Mike's investigations that using 
> SingleMMapIndexInput together with MultiMMapIndexInput leads to hotspot 
> slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the 
> switching between buffer boundaries is done in exception catch blocks. So 
> normal code path is always the same as for Single*
> - Only the seek method uses strange calculations (the modulo is totally 
> bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very 
> strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We 
> should pass only the power of 2 to the indexinput as size. All calculations 
> in seek and anywhere else would be simple bit shifts and AND operations (the 
> and masks for the modulo can be calculated in the ctor like NumericUtils does 
> when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But that's not an 
> issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, 
> as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Indexing slower in trunk

2011-06-14 Thread Erick Erickson
Thanks, guys. Yes, I am running it all locally and disk seeks
may well be the culprit. This thread is mainly to be sure that
the behavior I'm seeing is expected, or at least explainable.

Really, I don't need to pursue this further unless there's
actually data I can gather to help speed things up. If this
is just a consequence of DWPT and/or my particular
setup then that's fine. I'm mostly trying to understand
the characteristics of indexing/searching on the trunk.
This started with me exploring memory
requirements, and is really just something I noticed along
the way and wanted to get some feedback on.

So, absent the commit step, the times are reasonably
comparable. Can I impose upon one of you to give a
two-sentence summary of what DWPT buys us from a
user perspective? If memory serves it should have
background merging and other goodies.

Uwe:
Yep, I was curious about optimize but understand that it's not required
in recent code. That said, data is not searchable until a commit
happens, so just for yucks I changed the optimize to a commit. Stats
of that run below.

Simon:
OK, adjusted the ram buffer size to 512M, and it's a bit faster, but
not all that much (see stats), and the delta could well be sampling
error; one run doth not a statistical certainty make. Up until the
commit step, the admin stats page shows no documents in the index,
so I think this setting completely avoids intermediate committing,
although that says nothing about the individual writers writing lots
of segments to disk; that still happens.

Added 188 docs. Took 1437 ms. cumulative interval (seconds) = 284
Added 189 docs. Took 1285 ms. cumulative interval (seconds) = 285
Added 190 docs. Took 1182 ms. cumulative interval (seconds) = 286
Added 191 docs. Took 1675 ms. cumulative interval (seconds) = 288
About to commit, total time so far: 290
Total Time Taken-> 395 seconds***100 secs for the commit to finish.
Total documents added-> 1917728
Docs/sec-> 4855

Thanks, all
Erick


On Tue, Jun 14, 2011 at 4:39 AM, Simon Willnauer
 wrote:
> Erick, it seems you need to adjust your settings for 4.0 a little.
> When you index with DWPT it builds thread private segments which are
> independently flushed to disk. Yet, when you set your ram buffer IW
> will accumulate the ram used by all active DWPT and flush the largest
> once you reach your ram buffer. With 128M you might end up with lots of
> small segments which need to be merged in the background. Eventually
> what will happen here is that your disk is so busy that you are not
> able to flush fast enough and threads might stall.
>
> What you can try here is adjust your RAM buffer to be a little higher,
> let's say 350MB, or change the max number of thread states in
> DocumentsWriterPerThreadPool ie.
> ThreadAffinityDocumentsWriterThreadPool. The latter is unfortunately
> not exposed yet in solr so maybe for testing you just want to change
> the default value in DocumentsWriterPerThreadPool to 4. That will also
> cause segments to be bigger eventually.
>
> simon
>
> On Tue, Jun 14, 2011 at 10:28 AM, Uwe Schindler  wrote:
>> Hi Erick,
>>
>> Do you use harddisks or SSDs? I assume harddisks, which may explain what you
>> see:
>>
>> - DWPT writes lots of segments in parallel, which also explains why you are
>> seeing more files. Writing in parallel to several files, needs more head
>> movements of your harddisk and this slows down. In the past, only one
>> segment was written at the same time (sequential), so the harddisk is not so
>> stressed.
>> - Optimizing may be slower for the same reason: there are many more files to
>> merge (but optimize cost should not be counted as a problem here as normally
>> you won't need to optimize after initial indexing and optimizing was only a
>> good idea pre Lucene-2.9, now it's mostly obsolete)
>>
>> Uwe
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>>
>>> -Original Message-
>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>>> Sent: Tuesday, June 14, 2011 2:46 AM
>>> To: dev@lucene.apache.org; simon.willna...@gmail.com
>>> Subject: Re: Indexing slower in trunk
>>>
>>> Simon:
>>>
>>> Yep, I was asking to see if it was weird. Pursuant to our
>>> chat I tried some things, results below:
>>>
>>> All these are running on my local machine, same disk, same
>>> JVM settings re: memory. The SolrJ indexing is happening
>>> from IntelliJ (OK, I'm lazy).
>>>
>>> Rambuffer is set at 128M in all cases. Merge factor is 10.
>>>
>>> I'm allocation 2G to the server and 2G to the indexer
>>>
>>> Servers get started like this:
>>> _server = new StreamingUpdateSolrServer(url, 10, 4);
>>>
>>> I choose the threads and queue length semi-arbitrarily.
>>>
>>> Autocommit originally was 10,000 docs, 60,000 maxTime for
>>> all tests, but I removed that in the case of trunk , substituting
>>> a "commitWithin" in the SolrJ code of 10 minutes. That flattened
>>> out the run up

Re: svn commit: r1135526 - in /lucene/dev/branches/branch_3x: ./ lucene/ lucene/backwards/ lucene/backwards/src/test-framework/ lucene/backwards/src/test/ lucene/src/java/org/apache/lucene/store/ solr

2011-06-14 Thread Michael McCandless
Well, at some point, we'll move to Java 1.6 and then we don't have to
worry about this craziness anymore!

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 14, 2011 at 8:32 AM, Dawid Weiss  wrote:
> Thanks Mike. I keep forgetting about it.
>
> Dawid
>
> On Tue, Jun 14, 2011 at 2:28 PM,   wrote:
>> Author: mikemccand
>> Date: Tue Jun 14 12:28:16 2011
>> New Revision: 1135526
>>
>> URL: http://svn.apache.org/viewvc?rev=1135526&view=rev
>> Log:
>> can't override interface until Java 1.6
>>
>> Modified:
>>    lucene/dev/branches/branch_3x/   (props changed)
>>    lucene/dev/branches/branch_3x/lucene/   (props changed)
>>    lucene/dev/branches/branch_3x/lucene/backwards/   (props changed)
>>    lucene/dev/branches/branch_3x/lucene/backwards/src/test/   (props changed)
>>    lucene/dev/branches/branch_3x/lucene/backwards/src/test-framework/   
>> (props changed)
>>    
>> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java
>>    
>> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java
>>    lucene/dev/branches/branch_3x/solr/   (props changed)
>>
>> Modified: 
>> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java
>> URL: 
>> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java?rev=1135526&r1=1135525&r2=1135526&view=diff
>> ==
>> --- 
>> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java
>>  (original)
>> +++ 
>> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java
>>  Tue Jun 14 12:28:16 2011
>> @@ -51,7 +51,7 @@ public class InputStreamDataInput extend
>>     }
>>   }
>>
>> -  @Override
>> +  // @Override -- not until Java 1.6
>>   public void close() throws IOException {
>>     is.close();
>>   }
>>
>> Modified: 
>> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java
>> URL: 
>> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java?rev=1135526&r1=1135525&r2=1135526&view=diff
>> ==
>> --- 
>> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java
>>  (original)
>> +++ 
>> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java
>>  Tue Jun 14 12:28:16 2011
>> @@ -39,6 +39,7 @@ public class OutputStreamDataOutput exte
>>     os.write(b, offset, length);
>>   }
>>
>> +  // @Override -- not until Java 1.6
>>   public void close() throws IOException {
>>     os.close();
>>   }
>>
>>
>>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8832 - Failure

2011-06-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8832/

No tests ran.

Build Log (for compile errors):
[...truncated 15595 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1135487 - /lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java

2011-06-14 Thread Michael McCandless
On Tue, Jun 14, 2011 at 8:42 AM, Dawid Weiss
 wrote:
>> Don't forget to fix on 3.x too :)
>
> Seems like you did it on both, right?

No, I just fixed the @Override 1.6 only issue, not this one!

>> Re using up -1, I agree... but, this actually simplified the *FSTEnum
>> classes (I think?).
>
> I'll see if I can come up with something.

Cool, thanks!

Mike

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: External strings sort and case folding.

2011-06-14 Thread Michael McCandless
On Tue, Jun 14, 2011 at 8:35 AM, Dawid Weiss
 wrote:
>> Merging FSTs sounds cool!
>
> Yep, it would be quite neat and relatively simple too. I looked at the
> FST API again, after doing some work on those compounds and there are
> several things that just hurt my eyes... I'll see if I can figure out
> a nicer API... how emotionally attached are you to that simulation of
> terminal states, Mike? :)

I'm not at all!  Fix away :)  FST is very new and it needs some good
iterating...

Just make sure the *FSTEnum work ok -- I think that's why I added the END_LABEL.

Mike

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1135487 - /lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java

2011-06-14 Thread Dawid Weiss
> Don't forget to fix on 3.x too :)

Seems like you did it on both, right?

> Re using up -1, I agree... but, this actually simplified the *FSTEnum
> classes (I think?).

I'll see if I can come up with something.

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1135487 - /lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java

2011-06-14 Thread Michael McCandless
Don't forget to fix on 3.x too :)

Re using up -1, I agree... but, this actually simplified the *FSTEnum
classes (I think?).

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 14, 2011 at 7:13 AM,   wrote:
> Author: dweiss
> Date: Tue Jun 14 11:13:27 2011
> New Revision: 1135487
>
> URL: http://svn.apache.org/viewvc?rev=1135487&view=rev
> Log:
> Replaced magic constants with END_LABEL. I don't like this END_LABEL 
> thingy... it makes code more complex and, worst of all, it makes having -1 
> label on a transition impossible.
>
> Modified:
>    lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java
>
> Modified: lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java?rev=1135487&r1=1135486&r2=1135487&view=diff
> ==
> --- lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java 
> (original)
> +++ lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java Tue 
> Jun 14 11:13:27 2011
> @@ -490,7 +490,7 @@ public class FST {
>     if (!targetHasArcs(follow)) {
>       //System.out.println("  end node");
>       assert follow.isFinal();
> -      arc.label = -1;
> +      arc.label = END_LABEL;
>       arc.output = follow.nextFinalOutput;
>       arc.flags = BIT_LAST_ARC;
>       return arc;
> @@ -544,7 +544,7 @@ public class FST {
>     //System.out.println("    readFirstTarget follow.target=" + follow.target 
> + " isFinal=" + follow.isFinal());
>     if (follow.isFinal()) {
>       // Insert "fake" final first arc:
> -      arc.label = -1;
> +      arc.label = END_LABEL;
>       arc.output = follow.nextFinalOutput;
>       if (follow.target <= 0) {
>         arc.flags = BIT_LAST_ARC;
> @@ -599,7 +599,7 @@ public class FST {
>
>   /** In-place read; returns the arc. */
>   public Arc readNextArc(Arc arc) throws IOException {
> -    if (arc.label == -1) {
> +    if (arc.label == END_LABEL) {
>       // This was a fake inserted "final" arc
>       if (arc.nextArc <= 0) {
>         // This arc went to virtual final node, ie has no outgoing arcs
>
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: External strings sort and case folding.

2011-06-14 Thread Dawid Weiss
> Merging FSTs sounds cool!

Yep, it would be quite neat and relatively simple too. I looked at the
FST API again, after doing some work on those compounds and there are
several things that just hurt my eyes... I'll see if I can figure out
a nicer API... how emotionally attached are you to that simulation of
terminal states, Mike? :)

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1135526 - in /lucene/dev/branches/branch_3x: ./ lucene/ lucene/backwards/ lucene/backwards/src/test-framework/ lucene/backwards/src/test/ lucene/src/java/org/apache/lucene/store/ solr

2011-06-14 Thread Dawid Weiss
Thanks Mike. I keep forgetting about it.

Dawid

On Tue, Jun 14, 2011 at 2:28 PM,   wrote:
> Author: mikemccand
> Date: Tue Jun 14 12:28:16 2011
> New Revision: 1135526
>
> URL: http://svn.apache.org/viewvc?rev=1135526&view=rev
> Log:
> can't override interface until Java 1.6
>
> Modified:
>    lucene/dev/branches/branch_3x/   (props changed)
>    lucene/dev/branches/branch_3x/lucene/   (props changed)
>    lucene/dev/branches/branch_3x/lucene/backwards/   (props changed)
>    lucene/dev/branches/branch_3x/lucene/backwards/src/test/   (props changed)
>    lucene/dev/branches/branch_3x/lucene/backwards/src/test-framework/   
> (props changed)
>    
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java
>    
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java
>    lucene/dev/branches/branch_3x/solr/   (props changed)
>
> Modified: 
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java?rev=1135526&r1=1135525&r2=1135526&view=diff
> ==
> --- 
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java
>  (original)
> +++ 
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java
>  Tue Jun 14 12:28:16 2011
> @@ -51,7 +51,7 @@ public class InputStreamDataInput extend
>     }
>   }
>
> -  @Override
> +  // @Override -- not until Java 1.6
>   public void close() throws IOException {
>     is.close();
>   }
>
> Modified: 
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java?rev=1135526&r1=1135525&r2=1135526&view=diff
> ==
> --- 
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java
>  (original)
> +++ 
> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java
>  Tue Jun 14 12:28:16 2011
> @@ -39,6 +39,7 @@ public class OutputStreamDataOutput exte
>     os.write(b, offset, length);
>   }
>
> +  // @Override -- not until Java 1.6
>   public void close() throws IOException {
>     os.close();
>   }
>
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

2011-06-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049154#comment-13049154
 ] 

Simon Willnauer commented on LUCENE-3200:
-

+1 this looks awesome. Gute Arbeit Uwe :)

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized 
> of powers of 2
> ---
>
> Key: LUCENE-3200
> URL: https://issues.apache.org/jira/browse/LUCENE-3200
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, 
> LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and I discussed a little bit after Mike's investigations that using 
> SingleMMapIndexInput together with MultiMMapIndexInput leads to hotspot 
> slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the 
> switching between buffer boundaries is done in exception catch blocks. So 
> normal code path is always the same as for Single*
> - Only the seek method uses strange calculations (the modulo is totally 
> bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very 
> strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We 
> should pass only the power of 2 to the indexinput as size. All calculations 
> in seek and anywhere else would be simple bit shifts and AND operations (the 
> and masks for the modulo can be calculated in the ctor like NumericUtils does 
> when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But that's not an 
> issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, 
> as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2551) Checking dataimport.properties for write access during startup

2011-06-14 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-2551:


Attachment: SOLR-2551.patch

Patch to fail imports if the data config supports delta import but the 
dataimport.properties file is not writable.

Added a test to verify.
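For context, the up-front check amounts to something like this minimal sketch
(the conf-dir path and error handling are placeholders, not the actual DIH code):

{code}
import java.io.File;

public class DataImportPropertiesCheck {
  public static void main(String[] args) {
    File props = new File("solr/conf", "dataimport.properties"); // placeholder location

    // Delta imports must persist the last index time, so refuse to start if the
    // file (or, if it does not exist yet, its directory) is not writable.
    File parent = props.getParentFile();
    boolean writable = props.exists() ? props.canWrite()
                                      : (parent != null && parent.canWrite());
    if (!writable) {
      throw new RuntimeException("dataimport.properties is not writable: "
          + props.getAbsolutePath());
    }
    System.out.println("dataimport.properties is writable; delta import can proceed");
  }
}
{code}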

> Checking dataimport.properties for write access during startup
> --
>
> Key: SOLR-2551
> URL: https://issues.apache.org/jira/browse/SOLR-2551
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4.1, 3.1
>Reporter: C S
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Attachments: SOLR-2551.patch
>
>
> A common mistake is that the /conf (respectively the dataimport.properties) 
> file is not writable for solr. It would be great if that were detected on 
> starting a dataimport job. 
> Currently an import might grind away for days and fail if it can't write its 
> timestamp to the dataimport.properties file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: External strings sort and case folding.

2011-06-14 Thread Michael McCandless
In theory, you could use the codec API directly, adding "chunks" of
pre-sorted terms, and then fake up a SegmentInfo to make it look like
some kind of degenerate segment, and then merge them?

But it's gonna be a lot of work to do that :)

Merging FSTs sounds cool!

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 14, 2011 at 8:18 AM, Dawid Weiss
 wrote:
>> So actually it would work if you just enum'd the terms yourself, after
>> indexing and optimizing.  And this does amount to an external sort, I
>> think!
>
> Yep. I was just curious if there's a way to do it without the overhead
> of creating fields, documents, etc. If I have a spare minute I'll try
> to write a merge sort from disk splits. It'd be neat to write FST
> merging too (so that, given two FSTs you could merge them into one by
> creating a new FST and adding sequences in order from one or the other
> source).
>
> Dawid
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: External strings sort and case folding.

2011-06-14 Thread Dawid Weiss
> So actually it would work if you just enum'd the terms yourself, after
> indexing and optimizing.  And this does amount to an external sort, I
> think!

Yep. I was just curious if there's a way to do it without the overhead
of creating fields, documents, etc. If I have a spare minute I'll try
to write a merge sort from disk splits. It'd be neat to write FST
merging too (so that, given two FSTs you could merge them into one by
creating a new FST and adding sequences in order from one or the other
source).
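A minimal sketch of such a merge sort from disk splits (class names and the
chunk size are arbitrary; a real version should pin the charset and clean up
its temp files): sort fixed-size chunks in memory, spill them to temp files,
then do a k-way merge with a priority queue.

{code}
import java.io.*;
import java.util.*;

public class ExternalStringSort {

  static final int CHUNK_SIZE = 100000; // lines sorted in memory at a time (arbitrary)

  public static void sort(BufferedReader input, PrintWriter output) throws IOException {
    // Phase 1: spill sorted chunks to temporary files.
    List<File> chunks = new ArrayList<File>();
    List<String> buffer = new ArrayList<String>(CHUNK_SIZE);
    String line;
    while ((line = input.readLine()) != null) {
      buffer.add(line);
      if (buffer.size() >= CHUNK_SIZE) chunks.add(spill(buffer));
    }
    if (!buffer.isEmpty()) chunks.add(spill(buffer));

    // Phase 2: k-way merge of the sorted chunks using a priority queue.
    PriorityQueue<ChunkReader> pq = new PriorityQueue<ChunkReader>();
    for (File f : chunks) {
      ChunkReader reader = new ChunkReader(f);
      if (reader.current != null) pq.add(reader);
    }
    while (!pq.isEmpty()) {
      ChunkReader reader = pq.poll();
      output.println(reader.current);
      if (reader.advance()) pq.add(reader); else reader.close();
    }
    output.flush();
  }

  private static File spill(List<String> buffer) throws IOException {
    Collections.sort(buffer); // String.compareTo, i.e. UTF-16 code unit order
    File file = File.createTempFile("extsort", ".tmp");
    PrintWriter w = new PrintWriter(new FileWriter(file));
    for (String s : buffer) w.println(s);
    w.close();
    buffer.clear();
    return file;
  }

  // One open sorted chunk; ordered by its current (smallest unread) line.
  private static class ChunkReader implements Comparable<ChunkReader> {
    final BufferedReader reader;
    String current;

    ChunkReader(File f) throws IOException {
      reader = new BufferedReader(new FileReader(f));
      current = reader.readLine();
    }
    boolean advance() throws IOException { current = reader.readLine(); return current != null; }
    void close() throws IOException { reader.close(); }
    public int compareTo(ChunkReader other) { return current.compareTo(other.current); }
  }
}
{code}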

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.

2011-06-14 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-3202.
-

Resolution: Fixed

> Add DataInput/DataOutput subclasses that delegate to an 
> InputStream/OutputStream.
> -
>
> Key: LUCENE-3202
> URL: https://issues.apache.org/jira/browse/LUCENE-3202
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3202.patch, LUCENE-3202.patch
>
>
> Such classes would be handy for FST serialization/deserialization.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.

2011-06-14 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-3202:


Attachment: LUCENE-3202.patch

Updated patch, applied.

> Add DataInput/DataOutput subclasses that delegate to an 
> InputStream/OutputStream.
> -
>
> Key: LUCENE-3202
> URL: https://issues.apache.org/jira/browse/LUCENE-3202
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3202.patch, LUCENE-3202.patch
>
>
> Such classes would be handy for FST serialization/deserialization.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.

2011-06-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049146#comment-13049146
 ] 

Dawid Weiss commented on LUCENE-3202:
-

Thanks Shai. I'll add the headers, clean this up where applicable, and commit it.

> Add DataInput/DataOutput subclasses that delegate to an 
> InputStream/OutputStream.
> -
>
> Key: LUCENE-3202
> URL: https://issues.apache.org/jira/browse/LUCENE-3202
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3202.patch
>
>
> Such classes would be handy for FST serialization/deserialization.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: External strings sort and case folding.

2011-06-14 Thread Michael McCandless
On Tue, Jun 14, 2011 at 7:06 AM, Dawid Weiss
 wrote:
>> And, if you create & index such Lucene documents, and then do a
>> MatchAllDocsQuery sorting by your field, this is (unfortunately) not
>
> I was thinking about an optimized segment -- then the terms enum on a
> given field should be sorted, right?

Ahh, right.

So actually it would work if you just enum'd the terms yourself, after
indexing and optimizing.  And this does amount to an external sort, I
think!

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-14 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049143#comment-13049143
 ] 

Martin Grotzke commented on SOLR-2583:
--

{quote}
See: http://www.strchr.com/multi-stage_tables

i attached a patch, of a (not great) implementation i was sorta kinda trying to 
clean up for other reasons... maybe you can use it.
{quote}

Thanx, interesting approach!

I just tried to create a CompactFloatArray based on the CompactByteArray to be 
able to compare memory consumption. There's one change that wasn't just 
changing byte to float, and I'm not sure what the right adaptation is in this case:

{code}
diff -w solr/src/java/org/apache/solr/util/CompactByteArray.java 
solr/src/java/org/apache/solr/util/CompactFloatArray.java
57c57
...
202,203c202,203
<   private void touchBlock(int i, int value) {
< hashes[i] = (hashes[i] + (value << 1)) | 1;
---
>   private void touchBlock(int i, float value) {
> hashes[i] = (hashes[i] + (Float.floatToIntBits(value) << 1)) | 1;
{code}

The adapted test is green, so it seems to be correct at least. I'll also attach 
the full patch for CompactFloatArray.java and TestCompactFloatArray.java

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring eats a lot of memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array sized to the total number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the 
> big float array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.
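A rough sketch of the sparse alternative described above (illustrative only, not 
the FileFloatSource code): keep the scored docids sorted in an int[] with a 
parallel float[] and binary-search at lookup time, so memory scales with the 
number of scored docs instead of maxDoc.

{code}
import java.util.Arrays;

public class SparseDocScores {
  private final int[] docs;     // sorted docids that actually have an external score
  private final float[] scores; // scores[i] belongs to docs[i]
  private final float defaultScore;

  public SparseDocScores(int[] sortedDocs, float[] scores, float defaultScore) {
    this.docs = sortedDocs;
    this.scores = scores;
    this.defaultScore = defaultScore;
  }

  public float score(int doc) {
    int idx = Arrays.binarySearch(docs, doc);
    return idx >= 0 ? scores[idx] : defaultScore;
  }

  public static void main(String[] args) {
    // e.g. only 3 of millions of docs carry an external score
    SparseDocScores s = new SparseDocScores(new int[] {5, 42, 9999999},
                                            new float[] {1.5f, 0.3f, 7.0f}, 0f);
    System.out.println(s.score(42) + " " + s.score(7)); // prints 0.3 0.0
  }
}
{code}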

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.

2011-06-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049142#comment-13049142
 ] 

Robert Muir commented on LUCENE-3202:
-

I agree with moving these to .store package, sorry I forgot about this in the 
suggest refactoring.

I've had to write similar classes myself before since they were not there.


> Add DataInput/DataOutput subclasses that delegate to an 
> InputStream/OutputStream.
> -
>
> Key: LUCENE-3202
> URL: https://issues.apache.org/jira/browse/LUCENE-3202
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3202.patch
>
>
> Such classes would be handy for FST serialization/deserialization.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.

2011-06-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049138#comment-13049138
 ] 

Shai Erera commented on LUCENE-3202:


Patch looks good. Two comments:
# The files are missing the Apache License header.
# In some places you use this.is / this.out and others just is/out. Can you 
consolidate on one (I prefer w/o this.)?

> Add DataInput/DataOutput subclasses that delegate to an 
> InputStream/OutputStream.
> -
>
> Key: LUCENE-3202
> URL: https://issues.apache.org/jira/browse/LUCENE-3202
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3202.patch
>
>
> Such classes would be handy for FST serialization/deserialization.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2206) DIH MailEntityProcessor has mispelled words

2011-06-14 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-2206.
--

   Resolution: Fixed
Fix Version/s: 4.0
 Assignee: Noble Paul

corrected spelling error

> DIH MailEntityProcessor has mispelled words
> ---
>
> Key: SOLR-2206
> URL: https://issues.apache.org/jira/browse/SOLR-2206
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>Assignee: Noble Paul
> Fix For: 4.0
>
>
> The MailEntityProcessor spells the XML attribute processAttachement with an 
> extra 'e'. From the archives, it appears that the two Solr fields also 
> spelled it this way but were fixed at some point.
> Please make 'processAttachment' the standard spelling. The code should also 
> check for the extra-e version but it should not appear in the wiki or other 
> documentation.
> Thanks!

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1873) Commit Solr Cloud to trunk

2011-06-14 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049132#comment-13049132
 ] 

Noble Paul commented on SOLR-1873:
--

How can we make the logic for identifying the shards pluggable? If per-user 
data is stored in a given shard, the search should be performed only there. 
Is there an issue to track this, or shall I open one?

> Commit Solr Cloud to trunk
> --
>
> Key: SOLR-1873
> URL: https://issues.apache.org/jira/browse/SOLR-1873
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, 
> SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, 
> SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, 
> SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, 
> TEST-org.apache.solr.cloud.ZkSolrClientTest.txt, log4j-over-slf4j-1.5.5.jar, 
> zookeeper-3.2.2.jar, zookeeper-3.3.1.jar
>
>
> See http://wiki.apache.org/solr/SolrCloud
> This is a real hassle - I didn't merge up to trunk before all the svn 
> scrambling, so integrating cloud is now a bit difficult. I'm running through 
> and just preparing a commit by hand though (applying changes/handling 
> conflicts a file at a time).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: External strings sort and case folding.

2011-06-14 Thread Dawid Weiss
> And, if you create & index such Lucene documents, and then do a
> MatchAllDocsQuery sorting by your field, this is (unfortunately) not

I was thinking about an optimized segment -- then the terms enum on a
given field should be sorted, right?


Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: External strings sort and case folding.

2011-06-14 Thread Michael McCandless
Not that I know of.

And, if you create & index such Lucene documents, and then do a
MatchAllDocsQuery sorting by your field, this is (unfortunately) not
an external sort!  Ie, Lucene loads all terms data in RAM as packed
byte[], for merging the per-segment results.

It even does this, unnecessarily, for an optimized segment, even
though we only need ords in that case (there's an issue open for
this).

Doing a sort-by-String-field without loading the String data even when
there are multiple segments in the index would be a nice addition :)

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 14, 2011 at 6:31 AM, Dawid Weiss  wrote:
> Hi. While I was playing with automata recently, I had a use case
> scenario when I could really use an external sort of a large list of
> unicode strings. I know I could simply emulate this by creating
> synthetic documents, index, etc., but is there a more "direct" way of
> achieving this using Lucene's internals?
>
> Dawid
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

2011-06-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049117#comment-13049117
 ] 

Robert Muir commented on LUCENE-3200:
-

+1, great work Uwe.

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized 
> of powers of 2
> ---
>
> Key: LUCENE-3200
> URL: https://issues.apache.org/jira/browse/LUCENE-3200
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, 
> LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and I discussed a little bit after Mike's investigations that using 
> SingleMMapIndexInput together with MultiMMapIndexInput leads to hotspot 
> slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the 
> switching between buffer boundaries is done in exception catch blocks. So 
> normal code path is always the same as for Single*
> - Only the seek method uses strange calculations (the modulo is totally 
> bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very 
> strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We 
> should pass only the power of 2 to the indexinput as size. All calculations 
> in seek and anywhere else would be simple bit shifts and AND operations (the 
> and masks for the modulo can be calculated in the ctor like NumericUtils does 
> when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But that's not an 
> issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, 
> as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

2011-06-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049115#comment-13049115
 ] 

Michael McCandless commented on LUCENE-3200:


+1 to commit!

In my stress NRT test (runs optimize on a full Wiki index with ongoing
indexing / reopening), without this patch, I see performance drop
substantially (like 180 QPS down to 140 QPS) when the JVM cuts over to
the optimized segment.  With the patch I see it jump up a bit after
the optimize completes!  So this seems to make hotspot's job
easier...


> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized 
> of powers of 2
> ---
>
> Key: LUCENE-3200
> URL: https://issues.apache.org/jira/browse/LUCENE-3200
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, 
> LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and I discussed a little bit after Mike's investigations that using 
> SingleMMapIndexInput together with MultiMMapIndexInput leads to hotspot 
> slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the 
> switching between buffer boundaries is done in exception catch blocks. So 
> normal code path is always the same as for Single*
> - Only the seek method uses strange calculations (the modulo is totally 
> bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very 
> strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We 
> should pass only the power of 2 to the indexinput as size. All calculations 
> in seek and anywhere else would be simple bit shifts and AND operations (the 
> and masks for the modulo can be calculated in the ctor like NumericUtils does 
> when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But that's not an 
> issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, 
> as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time

2011-06-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049109#comment-13049109
 ] 

Michael McCandless commented on LUCENE-3197:


One simple way to fix this would be to have IW disregard the MergePolicy if 
ever it asks to do a single-segment merge of a segment that had already been 
produced by merging for the current optimize call.

But... I don't really like this, as there could be some unusual MergePolicy out 
there that sometimes wants to do such merging.

So I think a better solution, but API breaking to the MergePolicy, which is OK 
because it's @experimental, is to change the segmentsToOptimize argument; 
currently it's just a set recording which segments need to be optimized away.  
I think we should change it to a Map, where the Boolean 
indicates whether this segment had been created by a merge in the current 
optimize session.  Then I'll fix our MPs to not cascade in such a case.

> Optimize runs forever if you keep deleting docs at the same time
> 
>
> Key: LUCENE-3197
> URL: https://issues.apache.org/jira/browse/LUCENE-3197
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.3, 4.0
>
>
> Because we "cascade" merges for an optimize... if you also delete documents 
> while the merges are running, then the merge policy will see the resulting 
> single segment as still not optimized (since it has pending deletes) and do a 
> single-segment merge, and will repeat indefinitely (as long as your app keeps 
> deleting docs).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time

2011-06-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-3197:
--

Assignee: Michael McCandless

> Optimize runs forever if you keep deleting docs at the same time
> 
>
> Key: LUCENE-3197
> URL: https://issues.apache.org/jira/browse/LUCENE-3197
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.3, 4.0
>
>
> Because we "cascade" merges for an optimize... if you also delete documents 
> while the merges are running, then the merge policy will see the resulting 
> single segment as still not optimized (since it has pending deletes) and do a 
> single-segment merge, and will repeat indefinitely (as long as your app keeps 
> deleting docs).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



External strings sort and case folding.

2011-06-14 Thread Dawid Weiss
Hi. While I was playing with automata recently, I had a use case
scenario when I could really use an external sort of a large list of
unicode strings. I know I could simply emulate this by creating
synthetic documents, index, etc., but is there a more "direct" way of
achieving this using Lucene's internals?
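
For what it's worth, the "synthetic documents" emulation could look roughly like 
the sketch below -- an illustration on the 3.x API, assuming the strings fit into 
terms and that losing duplicates is acceptable, since equal strings collapse into 
a single term; the resulting order is Lucene's term order (UTF-16 in 3.x), not a 
collator sort.

{code}
// Rough sketch only (Lucene 3.x API): emulate an external sort by indexing
// each string as a single untokenized term and reading the terms back in
// sorted order, without holding the whole list in RAM.
import java.io.File;
import java.util.Arrays;
import java.util.List;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ExternalSortViaIndex {
  public static void main(String[] args) throws Exception {
    List<String> input = Arrays.asList("zebra", "apple", "mango");  // stand-in data
    Directory dir = FSDirectory.open(new File("/tmp/extsort"));

    IndexWriter w = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_32, new KeywordAnalyzer()));
    for (String s : input) {
      Document doc = new Document();
      doc.add(new Field("s", s, Field.Store.NO, Field.Index.NOT_ANALYZED));
      w.addDocument(doc);
    }
    w.close();

    IndexReader r = IndexReader.open(dir);
    TermEnum te = r.terms(new Term("s", ""));
    do {
      Term t = te.term();
      if (t == null || !"s".equals(t.field())) break;
      System.out.println(t.text());  // terms come back in sorted order
    } while (te.next());
    te.close();
    r.close();
  }
}
{code}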

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.

2011-06-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049104#comment-13049104
 ] 

Dawid Weiss edited comment on LUCENE-3202 at 6/14/11 10:25 AM:
---

A patch moving these stream delegation classes to org.apache.lucene.store. A 
potential bugfix is piggybacked (potential partial read(byte[]) was not handled 
correctly).

  was (Author: dweiss):
A patch moving these stream delegation classes to org.apache.lucene.store. 
A potential bug is piggybacked (potential partial read(byte[]) was not handled 
correctly).
  
> Add DataInput/DataOutput subclasses that delegate to an 
> InputStream/OutputStream.
> -
>
> Key: LUCENE-3202
> URL: https://issues.apache.org/jira/browse/LUCENE-3202
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3202.patch
>
>
> Such classes would be handy for FST serialization/deserialization.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.

2011-06-14 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-3202:


Attachment: LUCENE-3202.patch

A patch moving these stream delegation classes to org.apache.lucene.store. A 
potential bug is piggybacked (potential partial read(byte[]) was not handled 
correctly).
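
For readers not applying the patch: the partial-read issue comes from the 
InputStream contract, where read(byte[], int, int) may return fewer bytes than 
requested, so a delegating DataInput has to loop. A minimal sketch of that 
handling (an illustration only, not the attached patch):

{code}
// Hedged sketch, not the attached patch: a DataInput wrapping an InputStream
// that loops in readBytes() until all requested bytes have been read.
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.lucene.store.DataInput;

public class InputStreamDataInputSketch extends DataInput {
  private final InputStream is;

  public InputStreamDataInputSketch(InputStream is) {
    this.is = is;
  }

  @Override
  public byte readByte() throws IOException {
    int v = is.read();
    if (v == -1) throw new EOFException();
    return (byte) v;
  }

  @Override
  public void readBytes(byte[] b, int offset, int len) throws IOException {
    while (len > 0) {
      int cnt = is.read(b, offset, len);
      if (cnt < 0) {
        // InputStream exhausted before the requested length was read
        throw new EOFException();
      }
      offset += cnt;
      len -= cnt;
    }
  }
}
{code}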

> Add DataInput/DataOutput subclasses that delegate to an 
> InputStream/OutputStream.
> -
>
> Key: LUCENE-3202
> URL: https://issues.apache.org/jira/browse/LUCENE-3202
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3202.patch
>
>
> Such classes would be handy for FST serialization/deserialization.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure

2011-06-14 Thread selckin (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049100#comment-13049100
 ] 

selckin commented on LUCENE-3190:
-

{code}

[junit] Testsuite: org.apache.lucene.index.TestStressIndexing2
[junit] Tests run: 50, Failures: 2, Errors: 0, Time elapsed: 11.135 sec
[junit] 
[junit] - Standard Error -
[junit] java.lang.AssertionError: ram was 460248 expected: 407592 flush 
mem: 394568 active: 65680 pending: 0 flushing: 3 blocked: 0 peakDelta: 65959
[junit] at 
org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102)
[junit] at 
org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164)
[junit] at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
[junit] at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1474)
[junit] at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1446)
[junit] at 
org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723)
[junit] at 
org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757)
[junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
-Dtestmethod=testMultiConfig 
-Dtests.seed=2571834029692482827:-8116419692655152763
[junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
-Dtestmethod=testMultiConfig 
-Dtests.seed=2571834029692482827:-8116419692655152763
[junit] The following exceptions were thrown by threads:
[junit] *** Thread: Thread-793 ***
[junit] junit.framework.AssertionFailedError: java.lang.AssertionError: ram 
was 460248 expected: 407592 flush mem: 394568 active: 65680 pending: 0 
flushing: 3 blocked: 0 peakDelta: 65959
[junit] at junit.framework.Assert.fail(Assert.java:47)
[junit] at 
org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762)
[junit] NOTE: test params are: codec=RandomCodecProvider: {f34=MockRandom, 
f33=Standard, f32=Standard, f31=MockSep, f30=Pulsing(freqCutoff=7), 
f39=Standard, f38=MockSep, f37=Pulsing(freqCutoff=7), 
f36=MockFixedIntBlock(blockSize=649), 
f35=MockVariableIntBlock(baseBlockSize=9), 
f43=MockFixedIntBlock(blockSize=649), 
f42=MockVariableIntBlock(baseBlockSize=9), f45=MockSep, 
f44=Pulsing(freqCutoff=7), f41=MockRandom, f40=Standard, f47=Standard, 
f46=Standard, f49=MockVariableIntBlock(baseBlockSize=9), f48=MockRandom, 
f6=Pulsing(freqCutoff=7), f7=MockSep, f8=Standard, 
f9=MockVariableIntBlock(baseBlockSize=9), f12=Standard, f11=Standard, 
f10=MockSep, f16=Pulsing(freqCutoff=7), f15=MockFixedIntBlock(blockSize=649), 
f14=MockVariableIntBlock(baseBlockSize=9), f13=MockRandom, f19=Standard, 
f18=Standard, f17=MockSep, f1=MockFixedIntBlock(blockSize=649), 
f0=MockVariableIntBlock(baseBlockSize=9), f3=MockSep, f2=Pulsing(freqCutoff=7), 
f5=MockVariableIntBlock(baseBlockSize=9), f4=Standard, 
f21=MockVariableIntBlock(baseBlockSize=9), f20=MockRandom, 
f23=Pulsing(freqCutoff=7), f22=MockFixedIntBlock(blockSize=649), f25=Standard, 
f24=MockSep, f27=MockRandom, f26=Standard, 
f29=MockFixedIntBlock(blockSize=649), 
f28=MockVariableIntBlock(baseBlockSize=9), 
f98=MockVariableIntBlock(baseBlockSize=9), f97=MockRandom, 
f99=MockFixedIntBlock(blockSize=649), f94=MockSep, f93=Pulsing(freqCutoff=7), 
f96=Standard, f95=Standard, f79=Pulsing(freqCutoff=7), 
f77=MockVariableIntBlock(baseBlockSize=9), 
f78=MockFixedIntBlock(blockSize=649), f75=Standard, f76=MockRandom, 
f73=MockSep, f74=Standard, f71=MockFixedIntBlock(blockSize=649), 
f72=Pulsing(freqCutoff=7), f81=MockVariableIntBlock(baseBlockSize=9), 
f80=MockRandom, f86=Pulsing(freqCutoff=7), f87=MockSep, f88=Standard, 
f89=Standard, f82=Standard, f83=MockRandom, 
f84=MockVariableIntBlock(baseBlockSize=9), 
f85=MockFixedIntBlock(blockSize=649), f90=Pulsing(freqCutoff=7), f92=Standard, 
f91=MockSep, f59=MockSep, f57=MockFixedIntBlock(blockSize=649), 
f58=Pulsing(freqCutoff=7), f51=Pulsing(freqCutoff=7), f52=MockSep, 
f50=MockFixedIntBlock(blockSize=649), f55=MockRandom, 
f56=MockVariableIntBlock(baseBlockSize=9), f53=Standard, f54=Standard, 
id=MockFixedIntBlock(blockSize=649), f68=Standard, f69=MockRandom, 
f60=Standard, f61=Standard, f62=MockRandom, 
f63=MockVariableIntBlock(baseBlockSize=9), 
f64=MockFixedIntBlock(blockSize=649), f65=Pulsing(freqCutoff=7), f66=MockSep, 
f67=Standard, f70=MockSep}, locale=en_SG, timezone=Europe/Dublin
[junit] NOTE: all tests run in this JVM:
[junit] [TestStressIndexing2]
[junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_26 
(64-bit)/cpus=8,threads=1,free=61143744,total=147718144
[junit] -  ---
[junit] Testcase: 
testMult

[jira] [Created] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.

2011-06-14 Thread Dawid Weiss (JIRA)
Add DataInput/DataOutput subclasses that delegate to an 
InputStream/OutputStream.
-

 Key: LUCENE-3202
 URL: https://issues.apache.org/jira/browse/LUCENE-3202
 Project: Lucene - Java
  Issue Type: Task
  Components: core/other
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 3.3, 4.0


Such classes would be handy for FST serialization/deserialization.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure

2011-06-14 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049095#comment-13049095
 ] 

Simon Willnauer commented on LUCENE-3190:
-

{noformat}
[junit] Testcase: testRandom(org.apache.lucene.index.TestStressIndexing2):  
FAILED
[junit] Some threads threw uncaught exceptions!
[junit] junit.framework.AssertionFailedError: Some threads threw uncaught 
exceptions!
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
[junit] at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603)
[junit] 
[junit] 
[junit] Tests run: 3, Failures: 2, Errors: 0, Time elapsed: 9.243 sec
[junit] 
[junit] - Standard Error -
[junit] java.lang.AssertionError: ram was 462219 expected: 409920 flush 
mem: 396467 active: 65752
[junit] at 
org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102)
[junit] at 
org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164)
[junit] at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
[junit] at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1474)
[junit] at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1446)
[junit] at 
org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723)
[junit] at 
org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757)
[junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
-Dtestmethod=testRandom -Dtests.seed=-3081198538389112044:6990165845273194870 
-Dtests.multiplier=3
{noformat}

jenkins just tripped a similar issue... the problem here seems related to a 
very low RAM buffer together with flushing by docCount. I was not able to 
reproduce it yet. Each time this fails the RAM buffer is 0.1M and 
maxBufferedDocs is 3, so something seems to break the assert if we flush by 
docCount and don't necessarily take the largest DWPT out of the loop.

selckin, can you reproduce these errors? I just added some more info to the 
assert, so if you run into it can you paste the output?
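
For concreteness, the failing combination is roughly the writer setup below (a 
sketch of the values above only; the test randomizes them, and 'dir'/'analyzer' 
stand in for whatever it picks):

{code}
// Sketch of the configuration that appears to trip the assert: a tiny RAM
// buffer combined with flushing by doc count.
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
conf.setRAMBufferSizeMB(0.1);  // flush-by-RAM threshold of ~0.1 MB
conf.setMaxBufferedDocs(3);    // flush-by-docCount fires every 3 docs
IndexWriter writer = new IndexWriter(dir, conf);
{code}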

> TestStressIndexing2 testMultiConfig failure
> ---
>
> Key: LUCENE-3190
> URL: https://issues.apache.org/jira/browse/LUCENE-3190
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Simon Willnauer
>
> trunk: r1134311
> reproducible
> {code}
> [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2
> [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec
> [junit] 
> [junit] - Standard Error -
> [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush 
> mem: 395100 active: 65808
> [junit] at 
> org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102)
> [junit] at 
> org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164)
> [junit] at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
> [junit] at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
> [junit] at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757)
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
> -Dtestmethod=testMultiConfig 
> -Dtests.seed=2571834029692482827:-8116419692655152763
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 
> -Dtestmethod=testMultiConfig 
> -Dtests.seed=2571834029692482827:-8116419692655152763
> [junit] The following exceptions were thrown by threads:
> [junit] *** Thread: Thread-0 ***
> [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: 
> ram was 460908 expected: 408216 flush mem: 395100 active: 65808
> [junit] at junit.framework.Assert.fail(Assert.java:47)
> [junit] at 
> org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762)
> [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, 
> f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, 
> f40=Standard, f62=MockRandom, f75=Stand

Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8823 - Failure

2011-06-14 Thread Simon Willnauer
ah this is tripping an assert I added a couple of weeks ago. We
already have an issue for this here:
https://issues.apache.org/jira/browse/LUCENE-3190

the problem here seems related to a very low RAM buffer together
with flushing by docCount. I was not able to reproduce it yet.
Each time this fails the RAM buffer is 0.1M and maxBufferedDocs is 3, so
something seems to break the assert if we flush by docCount and don't
necessarily take the largest DWPT out of the loop.

I will dig further

simon

On Tue, Jun 14, 2011 at 9:16 AM, Apache Jenkins Server
 wrote:
> Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8823/
>
> 1 tests failed.
> REGRESSION:  org.apache.lucene.index.TestStressIndexing2.testRandom
>
> Error Message:
> r1.numDocs()=9 vs r2.numDocs()=8
>
> Stack Trace:
> junit.framework.AssertionFailedError: r1.numDocs()=9 vs r2.numDocs()=8
>        at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
>        at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
>        at 
> org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308)
>        at 
> org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278)
>        at 
> org.apache.lucene.index.TestStressIndexing2.testRandom(TestStressIndexing2.java:88)
>
>
>
>
> Build Log (for compile errors):
> [...truncated 3276 lines...]
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


