XmlCharFilter
I work with a lot of XML data sources and have needed to implement an analysis chain for Solr/Lucene that accepts XML. In the course of doing that, I found I needed something very much like HTMLCharFilter, but one that does standard XML parsing (it understands XML entities defined in an internal or external DTD, for example). So I wrote XmlCharFilter, which uses the Woodstox XML parser (already used by Solr). I think this could be useful for others, and it would be nice for me if it were committed here, so I'd like to contribute. Should I open a JIRA for this? Is there anybody who can spare the time to review? It is basically one class (plus a factory class) and has a fairly complete set of tests.

-Mike Sokolov
Engineering Director, iFactory.com

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
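The filter described above can be sketched in outline with the JDK's built-in StAX API (the same javax.xml.stream API that Woodstox implements). This hypothetical XmlTextExtractor is not the contributed XmlCharFilter: it only shows entity-resolving text extraction from XML, and omits the offset correction that a real Lucene CharFilter must provide for highlighting.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hedged sketch: extract only character content from XML, with entities
// declared in an internal DTD resolved by the parser.
public class XmlTextExtractor {
    public static String extractText(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Replace entity references with their replacement text. This is the
        // StAX default, but it is the point of the exercise, so set it explicitly.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder sb = new StringBuilder();
        while (reader.hasNext()) {
            int event = reader.next();
            // Keep only text content; markup is dropped, as an analysis chain would want.
            if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                sb.append(reader.getText());
            }
        }
        reader.close();
        return sb.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<!DOCTYPE doc [<!ENTITY co \"Acme Corp\">]><doc>About &co; products</doc>";
        System.out.println(extractText(xml));
    }
}
```

Swapping in Woodstox would mean putting its jar on the classpath so XMLInputFactory.newInstance() picks it up; the calling code stays identical.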
[Lucene.Net] [jira] [Closed] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy closed LUCENENET-425. -- Resolution: Won't Fix

> MMapDirectory implementation
>
> Key: LUCENENET-425
> URL: https://issues.apache.org/jira/browse/LUCENENET-425
> Project: Lucene.Net
> Issue Type: New Feature
> Affects Versions: Lucene.Net 2.9.4g
> Reporter: Digy
> Priority: Trivial
> Fix For: Lucene.Net 2.9.4g
> Attachments: MMapDirectory.patch
>
> Since this is not a direct port of MMapDirectory.java, I'll put it under "Support" and implement MMapDirectory as
> {code}
> public class MMapDirectory : Lucene.Net.Support.MemoryMappedDirectory
> {
> }
> {code}
> If a Mem-Map cannot be created (for example, if the file is too big to fit in the 32-bit address range), it will fall back to FSDirectory.FSIndexInput.
> In my tests I didn't see any performance gain in a 32-bit environment, but I consider it better than nothing.
> I would be happy if someone could send test results on a 64-bit platform.
> DIGY

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
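The fallback pattern the issue describes (try a memory map, drop back to ordinary file I/O when mapping fails) looks roughly like the following in Java terms. This is a sketch, not the Lucene.Net patch; the class name and the whole-file read are illustrative assumptions.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hedged sketch: prefer an mmap view of a file, fall back to a plain read
// when mapping fails (e.g. a file too large for a 32-bit address space).
public class MmapOrFallback {
    public static byte[] readAll(Path path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            try {
                MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                byte[] out = new byte[(int) ch.size()];
                map.get(out);
                return out;
            } catch (IOException | UnsupportedOperationException e) {
                // Mapping failed: fall back to ordinary file I/O, the way the
                // patch falls back to FSDirectory.FSIndexInput.
                return Files.readAllBytes(path);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("seg", ".bin");
        Files.write(p, new byte[] {1, 2, 3, 4});
        System.out.println(readAll(p).length);
        Files.delete(p);
    }
}
```

Note that Ben West's 64-bit numbers later in the thread suggest the mapped path is not automatically faster, which is why the fallback alone doesn't settle the performance question.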
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049482#comment-13049482 ] Mark Miller commented on SOLR-1431: --- Does this patch incorporate any of Noble's feedback/patches? Any reason we want to create a new ShardHandler on every request?

> CommComponent abstracted
>
> Key: SOLR-1431
> URL: https://issues.apache.org/jira/browse/SOLR-1431
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: 4.0
> Reporter: Jason Rutherglen
> Assignee: Mark Miller
> Fix For: 4.0
> Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch
>
> We'll abstract CommComponent in this issue.
[jira] [Resolved] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts
[ https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe resolved LUCENE-3204. - Resolution: Fixed Committed: - trunk: r1135801, r1135818, r1135822, r1135825 - branch_3x: r1135827 > Include maven-ant-tasks jar in the source tree and use this jar from > generate-maven-artifacts > - > > Key: LUCENE-3204 > URL: https://issues.apache.org/jira/browse/LUCENE-3204 > Project: Lucene - Java > Issue Type: Improvement > Components: general/build >Affects Versions: 3.3, 4.0 >Reporter: Steven Rowe >Assignee: Steven Rowe > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3204.patch, LUCENE-3204.patch > > > Currently, running {{ant generate-maven-artifacts}} requires the user to have > {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}. > The build should instead rely on a copy of this jar included in the source > tree. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure
[ https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3190: Attachment: LUCENE-3190.patch here is a patch that prevent this assert when ram buffer is low compared to the size of the documents we are indexing. for "normal" setting the assert will be executed but for very lowish documents we simply skip it entirely > TestStressIndexing2 testMultiConfig failure > --- > > Key: LUCENE-3190 > URL: https://issues.apache.org/jira/browse/LUCENE-3190 > Project: Lucene - Java > Issue Type: Bug >Reporter: selckin >Assignee: Simon Willnauer > Attachments: LUCENE-3190.patch > > > trunk: r1134311 > reproducible > {code} > [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2 > [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec > [junit] > [junit] - Standard Error - > [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush > mem: 395100 active: 65808 > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102) > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164) > [junit] at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757) > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > 
-Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] The following exceptions were thrown by threads: > [junit] *** Thread: Thread-0 *** > [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: > ram was 460908 expected: 408216 flush mem: 395100 active: 65808 > [junit] at junit.framework.Assert.fail(Assert.java:47) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762) > [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, > f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, > f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, > f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, > f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), > f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, > f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, > timezone=Pacific/Palau > [junit] NOTE: all tests run in this JVM: > [junit] [TestStressIndexing2] > [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 
1.6.0_25 > (64-bit)/cpus=8,threads=1,free=133324528,total=158400512 > [junit] - --- > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] r1.numDocs()=17 vs r2.numDocs()=16 > [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs > r2.numDocs()=16 > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308) > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278) > [junit] at > org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) > [junit] > [junit] > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! > [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049473#comment-13049473 ] Digy commented on LUCENENET-425: OK, I think it will be better to mark MMapDirectory as unimplemented like NIOFSDirectory. DIGY
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049465#comment-13049465 ] Mark Miller commented on SOLR-1431: --- Hang - might have gotten bit by JIRA's new patch sorting bs - used to just do it right and I prob had it sorting wrong or something. Just gave it one last go and the patch applied cleanly.
[jira] [Issue Comment Edited] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049460#comment-13049460 ] Uwe Schindler edited comment on LUCENE-3201 at 6/14/11 10:07 PM: - Robert: Very nice. Small thing: - NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory / MMapCompoundFileDirectory are non-static inner classes but still get the parent Directory in the ctor. This is duplicated, as javac also passes the parent around (the special ParentClassName.this one). I would remove the ctor param and use "*FSDirectory.this" as the reference to the outer class. I nitpick because in some places it references the parent directory without the ctor param, so it's inconsistent. That's all for now, thanks for the hard work!

was (Author: thetaphi): Robert: Very nice. Small thing: - NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory are non-static inner classes but still get the parent Directory in the ctor. This is duplicated, as javac also passes the parent around (the special ParentClassName.this one). I would remove the ctor param and use "Simple/NIO-FSDirectory.this" as the reference to the outer class. I nitpick because in some places it references the parent directory without the ctor param, so it's inconsistent. That's all for now, thanks for the hard work!

> improved compound file handling
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Robert Muir
> Fix For: 3.3, 4.0
> Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
> Currently CompoundFileReader could use some improvements, i see the following problems
> * its CSIndexInput extends bufferedindexinput, which is stupid for directories like mmap.
> * it seeks on every readInternal
> * its not possible for a directory to override or improve the handling of compound files.
> for example: it seems if you were impl'ing this thing from scratch, you would > just wrap the II directly (not extend BufferedIndexInput, > and add compound file offset X to seek() calls, and override length(). But of > course, then you couldnt throw read past EOF always when you should, > as a user could read into the next file and be left unaware. > however, some directories could handle this better. for example MMapDirectory > could return an indexinput that simply mmaps the 'slice' of the CFS file. > its underlying bytebuffer etc naturally does bounds checks already etc, so it > wouldnt need to be buffered, not even needing to add any offsets to seek(), > as its position would just work. > So I think we should try to refactor this so that a Directory can customize > how compound files are handled, the simplest > case for the least code change would be to add this to Directory.java: > {code} > public Directory openCompoundInput(String filename) { > return new CompoundFileReader(this, filename); > } > {code} > Because most code depends upon the fact compound files are implemented as a > Directory and transparent. at least then a subclass could override... > but the 'recursion' is a little ugly... we could still label it > expert+internal+experimental or whatever. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
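The "slice" behavior described in the issue above (wrap the underlying input directly, add the compound-file offset to every seek, and enforce EOF at the sub-file boundary so a caller cannot read into the next file) can be sketched over a plain byte array. This SliceReader is an invented illustration, not Lucene's CSIndexInput.

```java
// Hedged sketch of the compound-file slice idea: a reader over the sub-range
// [offset, offset + length) of a larger "compound file" byte array.
public class SliceReader {
    private final byte[] data;   // the whole compound file
    private final int offset;    // where this sub-file starts
    private final int length;    // length of this sub-file
    private int pos;             // current position within the slice

    public SliceReader(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    // Seeks are relative to the slice; the compound-file offset is never
    // visible to the caller.
    public void seek(int p) {
        if (p < 0 || p > length) throw new IllegalArgumentException("seek past EOF: " + p);
        pos = p;
    }

    public byte readByte() {
        // Bounds check against the slice, not the whole file, so reads cannot
        // silently run into the next sub-file.
        if (pos >= length) throw new IllegalStateException("read past EOF");
        return data[offset + pos++];  // the offset is applied here, in one place
    }

    public int length() { return length; }
}
```

An mmap-based directory gets this almost for free: a ByteBuffer slice of the mapped region already carries its own position and bounds, which is exactly the observation in the comment.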
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049461#comment-13049461 ] Mark Miller commented on SOLR-1431: --- Can you update your patch to apply without the hunk failures? Tests will not pass for me locally with the current patch.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8841 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8841/ 7 tests failed. REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety Error Message: Error occurred in thread Thread-47: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test2351223530tmp/_x_1.tiv (Too many open files in system) Stack Trace: junit.framework.AssertionFailedError: Error occurred in thread Thread-47: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test2351223530tmp/_x_1.tiv (Too many open files in system) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test2351223530tmp/_x_1.tiv (Too many open files in system) at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822) REGRESSION: org.apache.lucene.search.TestSimpleExplanationsOfNonMatches.testMultiFieldBQofPQ4 Error Message: CheckIndex failed Stack Trace: java.lang.RuntimeException: CheckIndex failed at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158) at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) at org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:66) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.search.TestSimpleExplanationsOfNonMatches.testMultiFieldBQofPQ5 Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:64) at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.search.TestSimpleExplanationsOfNonMatches.testMultiFieldBQofPQ6 Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:64) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.search.TestSimpleExplanationsOfNonMatches.testMultiFieldBQofPQ7 Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:64) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.search.TestSimpleExplanationsOfNonMatches.testNoop Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:64) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) FAILED: junit.framework.TestSuite.org.apache.lucene.search.TestSimpleExplanationsOfNonMatches Error Message: ensure your setUp() calls super.setUp() and your tearDown() calls super.tearDown()!!! Stack Trace: junit.framework.AssertionFailedError: ensure your setUp() calls super.setUp() and your tearDown() calls super.tearDown()!!! at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:401) Build Log (for compile errors): [...truncated 3414 lines...] 
[jira] [Commented] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049460#comment-13049460 ] Uwe Schindler commented on LUCENE-3201: --- Robert: Very nice. Small thing: - NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory are non-static inner classes but still get the parent Directory in the ctor. This is duplicated, as javac also passes the parent around (the special ParentClassName.this one). I would remove the ctor param and use "Simple/NIO-FSDirectory.this" as the reference to the outer class. I nitpick because in some places it references the parent directory without the ctor param, so it's inconsistent. That's all for now, thanks for the hard work!
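Uwe's inner-class point above can be illustrated with invented names (these are not the actual Lucene classes): a non-static inner class already holds a reference to its enclosing instance, so passing the parent through the constructor duplicates what javac provides as Outer.this.

```java
// Hedged illustration: two styles of giving an inner class access to its
// enclosing instance. Names are hypothetical, not Lucene's.
public class FSDir {
    private final String path;

    public FSDir(String path) { this.path = path; }

    // Redundant style: the ctor parameter repeats the implicit outer reference
    // that javac already passes to every non-static inner class instance.
    class CompoundWithParam {
        private final FSDir parent;
        CompoundWithParam(FSDir parent) { this.parent = parent; }
        String parentPath() { return parent.path; }
    }

    // Consistent style: use the implicit FSDir.this reference instead.
    class CompoundViaOuter {
        String parentPath() { return FSDir.this.path; }
    }

    public static void main(String[] args) {
        FSDir dir = new FSDir("/tmp/index");
        // Inner-class instances are created relative to an outer instance.
        System.out.println(dir.new CompoundViaOuter().parentPath());
    }
}
```

Both styles reach the same field; the second simply avoids carrying (and possibly mismatching) a second copy of the reference.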
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049453#comment-13049453 ] Adrien Grand commented on SOLR-2548: Hoss Man, Regarding the best value of the number of threads to spawn based on the number of CPUs and the traffic, one could imagine to decide whether to spawn a new thread or to run the task in the current thread based on the load of the server. This way, servers under high traffic would run every request in a single thread (maximizing throughput) whereas servers under low traffic would be able to use every processor in order to minimize response time. The load is easily retrievable in Java 6 using OperatingSystemMXBean, I don't know if it is possible in a non OS-specific way in Java 5. I don't really understand what you mean by "if you really care about parallelizing faceting, you probably wouldn't want some other intensive component starving out the thread pool". Do you mean that you would expect some requests to be run slower with every component using a global thread pool than with a single thread pool dedicated to facets? Yonik, why would you want to limit the number of threads on a per-request basis, if enough CPUs are available? > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Priority: Minor > Labels: facet > Attachments: SOLR-2548.patch, SOLR-2548_for_31x.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
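The load-based dispatch Adrien describes (run the task inline when the machine is busy, spawn a thread only when load is low) can be sketched with the Java 6 OperatingSystemMXBean he mentions. This LoadAwareDispatcher is a hypothetical illustration, not Solr code; the threshold of one unit of load per CPU is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hedged sketch: under high system load, run the task in the caller's thread
// (maximize throughput); under low load, hand it to a pool (minimize latency).
public class LoadAwareDispatcher {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final int cpus = Runtime.getRuntime().availableProcessors();

    public <T> Future<T> dispatch(Callable<T> task) throws Exception {
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        // getSystemLoadAverage() returns -1 when the platform cannot report it
        // (true of some JVMs, and of Java 5 generally): be conservative and run inline.
        if (load < 0 || load >= cpus) {
            return CompletableFuture.completedFuture(task.call());
        }
        return pool.submit(task);
    }

    public void shutdown() { pool.shutdown(); }
}
```

A per-request thread cap, as Yonik suggests, would be an additional argument to dispatch() bounding how many tasks one request may hand to the pool.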
[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts
[ https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049451#comment-13049451 ] Steven Rowe commented on LUCENE-3204: - bq. Jenkins now complains because of missing license file: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8840/console It's the NOTICE file that's missing, and I've just added it. bq. On Jenkins, I removed maven-ant-tasks from ~hudson/.ant/lib. Thanks!
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8844 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8844/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety Error Message: Error occurred in thread Thread-149: Lock obtain timed out: org.apache.lucene.store.MockLockFactoryWrapper$MockLock@c3c1635 Stack Trace: junit.framework.AssertionFailedError: Error occurred in thread Thread-149: Lock obtain timed out: org.apache.lucene.store.MockLockFactoryWrapper$MockLock@c3c1635 at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1268) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1186) Lock obtain timed out: org.apache.lucene.store.MockLockFactoryWrapper$MockLock@c3c1635 at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:820) Build Log (for compile errors): [...truncated 5258 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts
[ https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049448#comment-13049448 ] Uwe Schindler commented on LUCENE-3204: --- Jenkins now complains because of missing license file: [https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8840/console] On Jenkins, I removed maven-ant-tasks from ~hudson/.ant/lib.
RE: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8840 - Failure
This build failed because of a missing NOTICE file for the maven-ant-tasks jar. I'm adding it now. - Steve > -Original Message- > From: Apache Jenkins Server [mailto:jenk...@builds.apache.org] > Sent: Tuesday, June 14, 2011 5:37 PM > To: dev@lucene.apache.org > Subject: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8840 - Failure > > Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8840/ > > No tests ran. > > Build Log (for compile errors): > [...truncated 2261 lines...] > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3201: Attachment: LUCENE-3201.patch Here is an updated patch, including impls for SimpleFS and NIOFS, fixing the FileSwitchDirectory thing Uwe mentioned, and also MockDirectoryWrapper and NRTCachingDirectory. All the tests pass with Simple/NIO/MMap, but we need to benchmark; haven't had good luck today with luceneutil.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8840 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8840/ No tests ran. Build Log (for compile errors): [...truncated 2261 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049435#comment-13049435 ] Ben West commented on LUCENENET-425: Unfortunately (or perhaps fortunately in that Digy doesn't need to do more work :-) MMap is slower on 64 bit too. Index is 2.2gb. {panel} Create index, FSDir: 419061 Create index, MMapdir: 532536 Search index, FSDir: 757 Search index, MMapdir: 2030 {panel} Reversing order: {panel} Search index, FSDir: 734 Search index, MMap dir: 1934 {panel} I have 8gb ram, so I think the entire index was able to be cached in memory by the OS. > MMapDirectory implementation > > > Key: LUCENENET-425 > URL: https://issues.apache.org/jira/browse/LUCENENET-425 > Project: Lucene.Net > Issue Type: New Feature >Affects Versions: Lucene.Net 2.9.4g >Reporter: Digy >Priority: Trivial > Fix For: Lucene.Net 2.9.4g > > Attachments: MMapDirectory.patch > > > Since this is not a direct port of MMapDirectory.java, I'll put it under > "Support" and implement MMapDirectory as > {code} > public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory > { > } > {code} > If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 > bit address range), it will default to FSDirectory.FSIndexInput > In my tests, I didn't see any performance gain in 32bit environment and I > consider it as better then nothing. > I would be happy if someone could send test results on 64bit platform. > DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
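For anyone wanting to poke at this kind of comparison (the thread above is Lucene.Net, but the mechanics are the same on the JVM), a minimal sketch of the two read paths, buffered stream versus memory map, might look like the following. A tiny warm file says nothing about real index performance, so this only verifies that both paths return identical bytes; all names are illustrative.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class ReadCompare {
    // The "FSDirectory" style: read the file through a stream.
    static byte[] readViaStream(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // The "MMapDirectory" style: map the file and copy the bytes out.
    static byte[] readViaMmap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[(int) ch.size()];
            mb.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny stand-in for an index file; a real benchmark needs data much
        // larger than RAM, repeated runs, and warm/cold cache separation.
        Path file = Files.createTempFile("idx", ".bin");
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Files.write(file, data);

        byte[] viaStream = readViaStream(file);
        byte[] viaMmap = readViaMmap(file);
        System.out.println(Arrays.equals(viaStream, viaMmap)); // true

        // The mapped buffer may pin the file until GC, so defer deletion.
        file.toFile().deleteOnExit();
    }
}
```

The result above (everything cached by the OS, mmap still slower) is plausible when the copy-out and mapping overhead dominate, which is one reason real benchmarks need indexes larger than RAM.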
[jira] [Updated] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts
[ https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-3204: Attachment: LUCENE-3204.patch Added CHANGES.txt entries, including mention of the fact that copies of the maven-ant-tasks jar in the Ant classpath take precedence over the copy in the Lucene/Solr source tree. Committing shortly. > Include maven-ant-tasks jar in the source tree and use this jar from > generate-maven-artifacts > - > > Key: LUCENE-3204 > URL: https://issues.apache.org/jira/browse/LUCENE-3204 > Project: Lucene - Java > Issue Type: Improvement > Components: general/build >Affects Versions: 3.3, 4.0 >Reporter: Steven Rowe >Assignee: Steven Rowe > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3204.patch, LUCENE-3204.patch > > > Currently, running {{ant generate-maven-artifacts}} requires the user to have > {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}. > The build should instead rely on a copy of this jar included in the source > tree. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049430#comment-13049430 ] Steven Rowe commented on SOLR-2452: --- Forgot to include the issue number in the comment, so it's not showing up here, but I just committed a merge with trunk up to r1135758. Here's the ViewVC link: http://svn.apache.org/viewvc?view=revision&revision=1135759 > rewrite solr build system > - > > Key: SOLR-2452 > URL: https://issues.apache.org/jira/browse/SOLR-2452 > Project: Solr > Issue Type: Task > Components: Build >Reporter: Robert Muir > Fix For: 3.3 > > > As discussed some in SOLR-2002 (but that issue is long and hard to follow), I > think we should rewrite the solr build system. > Its slow, cumbersome, and messy, and makes it hard for us to improve things. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1873) Commit Solr Cloud to trunk
[ https://issues.apache.org/jira/browse/SOLR-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049425#comment-13049425 ] Mark Miller commented on SOLR-1873: --- If I remember right (been a long time since I talked about it with Jon), I think loggly had to do some small custom hack for this type of thing as well - no issue that I know of - lets make a new issue. > Commit Solr Cloud to trunk > -- > > Key: SOLR-1873 > URL: https://issues.apache.org/jira/browse/SOLR-1873 > Project: Solr > Issue Type: New Feature >Affects Versions: 1.4 >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.0 > > Attachments: SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, > SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, > SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, > SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, > TEST-org.apache.solr.cloud.ZkSolrClientTest.txt, log4j-over-slf4j-1.5.5.jar, > zookeeper-3.2.2.jar, zookeeper-3.3.1.jar > > > See http://wiki.apache.org/solr/SolrCloud > This is a real hassle - I didn't merge up to trunk before all the svn > scrambling, so integrating cloud is now a bit difficult. I'm running through > and just preparing a commit by hand though (applying changes/handling > conflicts a file at a time). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: commitLockTimeout in solrconfig.xml
I've created SOLR-2591 for removing the commitLockTimeout option. Martijn On 4 June 2011 13:47, Mark Miller wrote: > > On Jun 4, 2011, at 7:42 AM, Martijn v Groningen wrote: > > > the commitLockTimeout option is really not used I think we should remove > this > > +1. > > - Mark Miller > lucidimagination.com > > BERLIN BUZZWORDS JUNE 6-7TH, 2011 > > > > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > -- Met vriendelijke groet, Martijn van Groningen
[jira] [Created] (SOLR-2591) Remove commitLockTimeout option from solrconfig.xml
Remove commitLockTimeout option from solrconfig.xml --- Key: SOLR-2591 URL: https://issues.apache.org/jira/browse/SOLR-2591 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.0 I've noticed that the commitLockTimeout option is loaded by the configuration but no longer used. This issue covers removing the option from all solrconfig.xml files (including the example) and from the SolrConfig class. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts
[ https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049386#comment-13049386 ] Steven Rowe commented on LUCENE-3204: - I unpacked the jar, defaced the definitions file that Ant loads ({{org/apache/maven/artifact/ant/antlib.xml}}), then repacked the now-mangled jar and put the result in {{~/.ant/lib/}}, while leaving intact the copy under {{lucene/lib/}}. The result: the mangled copy under {{~/.ant/lib/}} is visited first, resulting in an error. This means that the supplied version does *not* get preferred over what's already in {{~/.ant/lib/}}. I don't think this is a serious problem, but I'll mention it in the CHANGES.txt entry (to be included in another iteration of the patch). > Include maven-ant-tasks jar in the source tree and use this jar from > generate-maven-artifacts > - > > Key: LUCENE-3204 > URL: https://issues.apache.org/jira/browse/LUCENE-3204 > Project: Lucene - Java > Issue Type: Improvement > Components: general/build >Affects Versions: 3.3, 4.0 >Reporter: Steven Rowe >Assignee: Steven Rowe > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3204.patch > > > Currently, running {{ant generate-maven-artifacts}} requires the user to have > {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}. > The build should instead rely on a copy of this jar included in the source tree. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
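For context, loading an antlib from an explicit in-tree jar usually looks like the sketch below; the jar path and version are assumptions, not copied from the actual Lucene build files. Ant consults the parent classloader (which includes ~/.ant/lib/) before a typedef's own classpath, which is consistent with the shadowing observed in the test above.

```xml
<!-- Illustrative sketch only: jar location and version are assumptions. -->
<typedef resource="org/apache/maven/artifact/ant/antlib.xml"
         uri="antlib:org.apache.maven.artifact.ant">
  <classpath>
    <pathelement location="lucene/lib/maven-ant-tasks-2.1.1.jar"/>
  </classpath>
</typedef>
```

Because of that parent-first resolution, a stale or broken copy in ~/.ant/lib/ can still win even when the build points at its own jar, hence the warning added to CHANGES.txt.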
[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts
[ https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049382#comment-13049382 ] Steven Rowe commented on LUCENE-3204: - bq. Does the supplied version of maven-ant-task automatically get preferred over whats already in ~/.ant/lib ? I'm not sure. How can I test this? I removed the copy in {{lucene/lib/}} and put a copy of the jar in {{~/.ant/lib/}}. {{ant generate-maven-artifacts}} still succeeds. > Include maven-ant-tasks jar in the source tree and use this jar from > generate-maven-artifacts > - > > Key: LUCENE-3204 > URL: https://issues.apache.org/jira/browse/LUCENE-3204 > Project: Lucene - Java > Issue Type: Improvement > Components: general/build >Affects Versions: 3.3, 4.0 >Reporter: Steven Rowe >Assignee: Steven Rowe > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3204.patch > > > Currently, running {{ant generate-maven-artifacts}} requires the user to have > {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}. > The build should instead rely on a copy of this jar included in the source > tree. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts
[ https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049374#comment-13049374 ] Uwe Schindler commented on LUCENE-3204: --- I think that's fine. Does the supplied version of maven-ant-task automatically get preferred over whats already in ~/.ant/lib ? > Include maven-ant-tasks jar in the source tree and use this jar from > generate-maven-artifacts > - > > Key: LUCENE-3204 > URL: https://issues.apache.org/jira/browse/LUCENE-3204 > Project: Lucene - Java > Issue Type: Improvement > Components: general/build >Affects Versions: 3.3, 4.0 >Reporter: Steven Rowe >Assignee: Steven Rowe > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3204.patch > > > Currently, running {{ant generate-maven-artifacts}} requires the user to have > {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}. > The build should instead rely on a copy of this jar included in the source > tree. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts
[ https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-3204: Attachment: LUCENE-3204.patch Patch implementing the idea. > Include maven-ant-tasks jar in the source tree and use this jar from > generate-maven-artifacts > - > > Key: LUCENE-3204 > URL: https://issues.apache.org/jira/browse/LUCENE-3204 > Project: Lucene - Java > Issue Type: Improvement > Components: general/build >Affects Versions: 3.3, 4.0 >Reporter: Steven Rowe >Assignee: Steven Rowe > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3204.patch > > > Currently, running {{ant generate-maven-artifacts}} requires the user to have > {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}. > The build should instead rely on a copy of this jar included in the source > tree. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts
[ https://issues.apache.org/jira/browse/LUCENE-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049372#comment-13049372 ] Steven Rowe commented on LUCENE-3204: - Committing shortly. > Include maven-ant-tasks jar in the source tree and use this jar from > generate-maven-artifacts > - > > Key: LUCENE-3204 > URL: https://issues.apache.org/jira/browse/LUCENE-3204 > Project: Lucene - Java > Issue Type: Improvement > Components: general/build >Affects Versions: 3.3, 4.0 >Reporter: Steven Rowe >Assignee: Steven Rowe > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3204.patch > > > Currently, running {{ant generate-maven-artifacts}} requires the user to have > {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}. > The build should instead rely on a copy of this jar included in the source > tree. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3204) Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts
Include maven-ant-tasks jar in the source tree and use this jar from generate-maven-artifacts - Key: LUCENE-3204 URL: https://issues.apache.org/jira/browse/LUCENE-3204 Project: Lucene - Java Issue Type: Improvement Components: general/build Affects Versions: 3.3, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.3, 4.0 Attachments: LUCENE-3204.patch Currently, running {{ant generate-maven-artifacts}} requires the user to have {{maven-ant-tasks-*.jar}} in their Ant classpath, e.g. in {{~/.ant/lib/}}. The build should instead rely on a copy of this jar included in the source tree. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] [Updated] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens updated LUCENENET-425: -- Comment: was deleted (was: On a 1.18GB index of only text: FS Reader: 27 MMap Reader: 90 --- FS Reader: 38 MMap Reader: 77 Press any key to continue . . .) > MMapDirectory implementation > > > Key: LUCENENET-425 > URL: https://issues.apache.org/jira/browse/LUCENENET-425 > Project: Lucene.Net > Issue Type: New Feature >Affects Versions: Lucene.Net 2.9.4g >Reporter: Digy >Priority: Trivial > Fix For: Lucene.Net 2.9.4g > > Attachments: MMapDirectory.patch > > > Since this is not a direct port of MMapDirectory.java, I'll put it under > "Support" and implement MMapDirectory as > {code} > public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory > { > } > {code} > If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 > bit address range), it will default to FSDirectory.FSIndexInput > In my tests, I didn't see any performance gain in 32bit environment and I > consider it as better then nothing. > I would be happy if someone could send test results on 64bit platform. > DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049359#comment-13049359 ] Christopher Currens commented on LUCENENET-425: --- On a 1.18GB index of only text: FS Reader: 27 MMap Reader: 90 --- FS Reader: 38 MMap Reader: 77 Press any key to continue . . . > MMapDirectory implementation > > > Key: LUCENENET-425 > URL: https://issues.apache.org/jira/browse/LUCENENET-425 > Project: Lucene.Net > Issue Type: New Feature >Affects Versions: Lucene.Net 2.9.4g >Reporter: Digy >Priority: Trivial > Fix For: Lucene.Net 2.9.4g > > Attachments: MMapDirectory.patch > > > Since this is not a direct port of MMapDirectory.java, I'll put it under > "Support" and implement MMapDirectory as > {code} > public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory > { > } > {code} > If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 > bit address range), it will default to FSDirectory.FSIndexInput > In my tests, I didn't see any performance gain in 32bit environment and I > consider it as better then nothing. > I would be happy if someone could send test results on 64bit platform. > DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049355#comment-13049355 ] Yonik Seeley commented on SOLR-2548: I think this should be configurable on a per-request basis (not the max size of the threadpool, but how many threads out of that to use concurrently). For facet.method=fcs (per-segment faceting using the field cache), I did introduce a "threads" localParam. Perhaps we should have a "threads" or "facet.threads" request parameter? > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Priority: Minor > Labels: facet > Attachments: SOLR-2548.patch, SOLR-2548_for_31x.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049352#comment-13049352 ] Uwe Schindler commented on LUCENE-3201: --- We have LUCENE-1743 for the small files can of worms. > improved compound file handling > --- > > Key: LUCENE-3201 > URL: https://issues.apache.org/jira/browse/LUCENE-3201 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3201.patch > > > Currently CompoundFileReader could use some improvements, i see the following > problems > * its CSIndexInput extends bufferedindexinput, which is stupid for > directories like mmap. > * it seeks on every readInternal > * its not possible for a directory to override or improve the handling of > compound files. > for example: it seems if you were impl'ing this thing from scratch, you would > just wrap the II directly (not extend BufferedIndexInput, > and add compound file offset X to seek() calls, and override length(). But of > course, then you couldnt throw read past EOF always when you should, > as a user could read into the next file and be left unaware. > however, some directories could handle this better. for example MMapDirectory > could return an indexinput that simply mmaps the 'slice' of the CFS file. > its underlying bytebuffer etc naturally does bounds checks already etc, so it > wouldnt need to be buffered, not even needing to add any offsets to seek(), > as its position would just work. > So I think we should try to refactor this so that a Directory can customize > how compound files are handled, the simplest > case for the least code change would be to add this to Directory.java: > {code} > public Directory openCompoundInput(String filename) { > return new CompoundFileReader(this, filename); > } > {code} > Because most code depends upon the fact compound files are implemented as a > Directory and transparent. 
at least then a subclass could override... > but the 'recursion' is a little ugly... we could still label it > expert+internal+experimental or whatever. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] [Commented] (LUCENENET-417) implement streams as field values
[ https://issues.apache.org/jira/browse/LUCENENET-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049350#comment-13049350 ] Digy commented on LUCENENET-417: Maybe this is a stupid question, but what is the reason to index a very large doc? If I indexed a whole book as a single document, it would appear in almost every kind of search's result set. search "computer" --> this book. search "sport" --> this book. search "politics" --> this book. DIGY > implement streams as field values > - > > Key: LUCENENET-417 > URL: https://issues.apache.org/jira/browse/LUCENENET-417 > Project: Lucene.Net > Issue Type: New Feature > Components: Lucene.Net Core >Reporter: Christopher Currens > Attachments: StreamValues.patch > > > Adding binary values to a field is an expensive operation, as the whole > binary data must be loaded into memory and then written to the index. Adding > the ability to use a stream instead of a byte array could not only speed up > the indexing process, but reducing the memory footprint as well. > -Java lucene has the ability to use a TextReader the both analyze and store > text in the index.- Lucene.NET lacks the ability to store string data in the > index via streams. This should be a feature added into Lucene .NET as well. > My thoughts are to add another Field constructor, that is Field(string name, > System.IO.Stream stream, System.Text.Encoding encoding), that will allow the > text to be analyzed and stored into the index. > Comments about this approach are greatly appreciated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049346#comment-13049346 ] Robert Muir commented on LUCENE-3201: - I agree, the FileSwitchDirectory should delegate openCompoundInput. As for mapping small things, I think we should set this aside for another issue. As far as this issue goes, I don't mind returning the DefaultCompound impl if unmapping isn't supported, but I'd really rather defer opening the can of worms of 'mapping small things' to some other issue :) > improved compound file handling > --- > > Key: LUCENE-3201 > URL: https://issues.apache.org/jira/browse/LUCENE-3201 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3201.patch > > > Currently CompoundFileReader could use some improvements, i see the following > problems > * its CSIndexInput extends bufferedindexinput, which is stupid for > directories like mmap. > * it seeks on every readInternal > * its not possible for a directory to override or improve the handling of > compound files. > for example: it seems if you were impl'ing this thing from scratch, you would > just wrap the II directly (not extend BufferedIndexInput, > and add compound file offset X to seek() calls, and override length(). But of > course, then you couldnt throw read past EOF always when you should, > as a user could read into the next file and be left unaware. > however, some directories could handle this better. for example MMapDirectory > could return an indexinput that simply mmaps the 'slice' of the CFS file. > its underlying bytebuffer etc naturally does bounds checks already etc, so it > wouldnt need to be buffered, not even needing to add any offsets to seek(), > as its position would just work. 
> So I think we should try to refactor this so that a Directory can customize > how compound files are handled, the simplest > case for the least code change would be to add this to Directory.java: > {code} > public Directory openCompoundInput(String filename) { > return new CompoundFileReader(this, filename); > } > {code} > Because most code depends upon the fact compound files are implemented as a > Directory and transparent. at least then a subclass could override... > but the 'recursion' is a little ugly... we could still label it > expert+internal+experimental or whatever. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049341#comment-13049341 ] Hoss Man commented on SOLR-2548: Janne: thanks for the awesome patch! In general I think this type of functionality is a good idea -- the real question is how it should be configured/controlled. Admins with many CPUs who expect low amounts of concurrent user traffic might be ok with spawning availableProcessors() threads per request, but admins with fewer CPUs than concurrent requests are going to prefer that individual requests stay single threaded and take a little longer. The suggestion to use a thread pool based ExecutorService definitely seems like it makes more sense; then it just becomes a matter of asking the admin to configure a simple number determining the size of the threadpool, and we just need to support a few sentinel values: NUM_CPUS, and NONE (always use the caller's thread). Since we'd want a threadpool that lives longer than a single request, this definitely shouldn't be an option specified via SolrParams (not to mention the risk involved if people don't lock it down with invariants). That leaves the question of whether this should be an "init" param on the FacetComponent, or something more global. 
My first thought was that we should bite the bullet and add a new top level config for a global thread pool executor service that any solr plugin could start using, but after thinking about it some more i think that would not only be premature, but perhaps even a bad idea in general -- even if we assume something like DIH, UIMA, or Highlighting could also take advantage of a shared thread pool owned by solr, if you really care about parallelizing faceting, you probably wouldn't want some other intensive component starving out the thread pool (or vice versa) > Multithreaded faceting > -- > > Key: SOLR-2548 > URL: https://issues.apache.org/jira/browse/SOLR-2548 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 3.1 >Reporter: Janne Majaranta >Priority: Minor > Labels: facet > Attachments: SOLR-2548.patch, SOLR-2548_for_31x.patch > > > Add multithreading support for faceting. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
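The sentinel scheme discussed above (NUM_CPUS, NONE, or a fixed pool size) might be wired up roughly as follows. Class and method names are hypothetical, not Solr's actual configuration plumbing; the point is only how the sentinels map onto an executor.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class FacetThreadsConfig {
    // Sentinel handling as sketched in the comment: "NONE" keeps the request
    // single-threaded, "NUM_CPUS" sizes the pool to the machine, and any
    // other value is taken as a fixed pool size.
    static int resolvePoolSize(String configured) {
        if ("NONE".equals(configured)) return 0; // caller's thread only
        if ("NUM_CPUS".equals(configured)) {
            return Runtime.getRuntime().availableProcessors();
        }
        return Integer.parseInt(configured);     // assumed numeric otherwise
    }

    static Executor executorFor(String configured) {
        int size = resolvePoolSize(configured);
        // Size 0 yields a direct executor, so facet work runs inline in the
        // request thread; otherwise a long-lived fixed pool is shared.
        return size == 0 ? Runnable::run : Executors.newFixedThreadPool(size);
    }

    public static void main(String[] args) {
        System.out.println(resolvePoolSize("NONE")); // 0
        System.out.println(resolvePoolSize("4"));    // 4

        AtomicInteger counted = new AtomicInteger();
        Executor exec = executorFor("NONE");
        exec.execute(counted::incrementAndGet);      // runs inline
        System.out.println(counted.get());           // 1
    }
}
```

Because the pool outlives any single request, it would be built once from init config, which matches the argument above against exposing it via SolrParams.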
Re: Welcome Jan Høydahl as Lucene/Solr committer
Hi all, Thanks a lot to the PMC for entrusting me with this role! % whoami I'm a hacker from Norway, soon to become 101000 summers old. Married to wonderful Hilde, living outside Oslo with our cat Rosåsi (meaning "grey" in Arabic). I love snowboarding, kayaking, travelling, working with immigrants and volunteering in my local church. Started my IT career at age 12 programming Basic on the C=64 while my brother was playing games. Then later 68000 assembly and C on the Amiga. Sold my first program around 1993, an AREXX script for NComm on Amiga, helping people save money placing ads faster on the national teletext service :) I've also programmed Turbo Pascal, C++, PLEX-C, ASA110, Python, PHP, Ruby, and even assembly for the HP-48 calculator :) Been a Solaris, Linux and for the last 5 years Mac user. Fast forward to 1998 when I learnt Java and helped develop Ericsson's first IP telephony service way before JIT compilers etc. I became one of FAST's first Global Services consultants in 2000 in the days of AllTheWeb™ and before they even had an enterprise search product. The search engine consisted of a few C++ binaries; "findex" writing the index and "fsearch" searching it. Then came RealTimeSearch, FDS and finally ESP. After 5 years @ FAST I "committed" a software outsourcing startup for a few years before founding Cominvent to do full-time search consulting on FAST technology (or so I thought..). I played some with Lucene in 2006 but then picked up Solr in 2009, and now 95% of the business is on Solr/Lucene and 5% on FAST. What a change! I love Apache, open source, the Apache License and the Lucene community. With more than a decade of experience from Enterprise Search and well over 100 customer projects, I've learnt a thing or two which I'm now doing my best to share with my customers and the community. Now hopefully more of that will be as code. One of the first areas I'm hoping to help with is UpdateChain-related stuff as well as Norwegian/Nordic language support. 
http://no.linkedin.com/in/janhoy http://twitter.com/cominvent -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 13. juni 2011, at 16.43, Mark Miller wrote: > I'm happy to announce that the Lucene/Solr PMC has voted in Jan Høydahl as > our newest committer. > > Jan, if you don't mind, could you introduce yourself with a brief bio as has > become our tradition? > > Congratulations and welcome aboard! > > > - Mark Miller > lucidimagination.com > > > > > > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049335#comment-13049335 ] Uwe Schindler commented on LUCENE-3201: --- Hi Robert, great patch, exactly as I would have wished to have it when we discussed it! Patch looks fine; one small bug: - FileSwitchDirectory should also override the openCompoundInput() from Directory and delegate to the correct underlying directory. Now it always uses the default impl, which is double buffering. So if you e.g. put MMapDirectory as a delegate for CFS files, those files would be opened like before your patch. Just copy'n'paste the code from one of the other FileSwitchDirectory methods. Some suggestions: We currently map the whole compound file into address space, read the header/contents and unmap it again. This may be some overhead, especially if unmapping is not supported. - We could use SimpleFSIndexInput to read CFS contents (we only need to pass the already open RAF there; alternatively use Dawid's new wrapper IndexInput around a standard InputStream obtained from the RAF -> LUCENE-3202) - Only map the header of the CFS file; the problem: we don't know the exact size. > improved compound file handling > --- > > Key: LUCENE-3201 > URL: https://issues.apache.org/jira/browse/LUCENE-3201 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3201.patch > > > Currently CompoundFileReader could use some improvements, i see the following > problems > * its CSIndexInput extends bufferedindexinput, which is stupid for > directories like mmap. > * it seeks on every readInternal > * its not possible for a directory to override or improve the handling of > compound files. > for example: it seems if you were impl'ing this thing from scratch, you would > just wrap the II directly (not extend BufferedIndexInput, > and add compound file offset X to seek() calls, and override length(). 
But of > course, then you couldnt throw read past EOF always when you should, > as a user could read into the next file and be left unaware. > however, some directories could handle this better. for example MMapDirectory > could return an indexinput that simply mmaps the 'slice' of the CFS file. > its underlying bytebuffer etc naturally does bounds checks already etc, so it > wouldnt need to be buffered, not even needing to add any offsets to seek(), > as its position would just work. > So I think we should try to refactor this so that a Directory can customize > how compound files are handled, the simplest > case for the least code change would be to add this to Directory.java: > {code} > public Directory openCompoundInput(String filename) { > return new CompoundFileReader(this, filename); > } > {code} > Because most code depends upon the fact compound files are implemented as a > Directory and transparent. at least then a subclass could override... > but the 'recursion' is a little ugly... we could still label it > expert+internal+experimental or whatever. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
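The mmap idea in the issue description, that a CFS sub-file can be exposed as a bounds-checked slice of the larger file with no extra buffering and no offset arithmetic in seek(), can be sketched without any Lucene types. The class below is purely illustrative (not the patch's actual CompoundFileDirectory code):

```java
// Hypothetical sketch (not Lucene's API): a bounds-checked "slice" reader
// over a larger buffer, illustrating why an mmap'd CFS entry needs no
// extra buffering or seek-offset arithmetic in the reader itself.
import java.nio.ByteBuffer;

public class SliceInput {
    private final ByteBuffer slice;

    // offset/length delimit one sub-file inside the compound file.
    public SliceInput(ByteBuffer whole, int offset, int length) {
        ByteBuffer dup = whole.duplicate();
        dup.position(offset);
        dup.limit(offset + length);
        this.slice = dup.slice(); // position 0 == start of the sub-file
    }

    public byte readByte() {
        // ByteBuffer does the bounds check: reading past the sub-file's
        // end throws BufferUnderflowException, i.e. "read past EOF".
        return slice.get();
    }

    public long length() { return slice.capacity(); }

    public void seek(int pos) { slice.position(pos); } // no offset math needed
}
```

Because the slice's own position starts at 0 and its capacity is the sub-file length, seek() and EOF behavior fall out of ByteBuffer for free, which is exactly why the mmap case needs no BufferedIndexInput.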
[Lucene.Net] [jira] [Commented] (LUCENENET-425) MMapDirectory implementation
[ https://issues.apache.org/jira/browse/LUCENENET-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049299#comment-13049299 ] Ben West commented on LUCENENET-425: The entire "Store" set of tests (including TestWindowsMMap) passes on Windows 7 64 bit with your patch. Let me know if there are other tests you'd like me to run. I'm not familiar with what mmap directories do, so I probably won't be able to write a perf test myself. > MMapDirectory implementation > > > Key: LUCENENET-425 > URL: https://issues.apache.org/jira/browse/LUCENENET-425 > Project: Lucene.Net > Issue Type: New Feature >Affects Versions: Lucene.Net 2.9.4g >Reporter: Digy >Priority: Trivial > Fix For: Lucene.Net 2.9.4g > > Attachments: MMapDirectory.patch > > > Since this is not a direct port of MMapDirectory.java, I'll put it under > "Support" and implement MMapDirectory as > {code} > public class MMapDirectory:Lucene.Net.Support.MemoryMappedDirectory > { > } > {code} > If a Mem-Map can not be created(for ex, if the file is too big to fit in 32 > bit address range), it will default to FSDirectory.FSIndexInput > In my tests, I didn't see any performance gain in 32bit environment and I > consider it as better then nothing. > I would be happy if someone could send test results on 64bit platform. > DIGY -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
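The fallback Digy describes (use a memory map when the file fits the address range, otherwise fall back to FSDirectory.FSIndexInput) hinges on one size check that is easy to isolate. This helper is a hypothetical illustration, not code from the patch:

```java
// Hypothetical helper (not part of the patch): decide whether a file of a
// given size can be memory-mapped as one region. This is the case where
// the patch would fall back to FSDirectory.FSIndexInput when it fails.
public class MMapSupport {
    public static boolean canMapInOneChunk(long fileSize) {
        // A mapped buffer is indexed by int, so a single map() call is
        // limited to Integer.MAX_VALUE bytes; on a 32-bit process the
        // practical limit is lower still, so mapping can fail even below
        // this bound and the caller must be prepared to fall back.
        return fileSize >= 0 && fileSize <= Integer.MAX_VALUE;
    }
}
```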
[jira] [Commented] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time
[ https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049298#comment-13049298 ] Yonik Seeley commented on LUCENE-3197: -- Regardless of whether one views this as a bug or not, I think the more useful semantics are to at least "merge all of the current segments into 1 and remove all *currently* deleted docs" (i.e. I agree with Mike). The alternative is that optimize is dangerous in the presence of index updates (i.e. applications should discontinue updates if they call optimize). > Optimize runs forever if you keep deleting docs at the same time > > > Key: LUCENE-3197 > URL: https://issues.apache.org/jira/browse/LUCENE-3197 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 3.3, 4.0 > > > Because we "cascade" merges for an optimize... if you also delete documents > while the merges are running, then the merge policy will see the resulting > single segment as still not optimized (since it has pending deletes) and do a > single-segment merge, and will repeat indefinitely (as long as your app keeps > deleting docs). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3201: Fix Version/s: 4.0 3.3 setting 3.3/4.0 as fix version, as the changes are backwards compatible (compoundfilereader is pkg-private still in 3.x) > improved compound file handling > --- > > Key: LUCENE-3201 > URL: https://issues.apache.org/jira/browse/LUCENE-3201 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3201.patch > > > Currently CompoundFileReader could use some improvements, i see the following > problems > * its CSIndexInput extends bufferedindexinput, which is stupid for > directories like mmap. > * it seeks on every readInternal > * its not possible for a directory to override or improve the handling of > compound files. > for example: it seems if you were impl'ing this thing from scratch, you would > just wrap the II directly (not extend BufferedIndexInput, > and add compound file offset X to seek() calls, and override length(). But of > course, then you couldnt throw read past EOF always when you should, > as a user could read into the next file and be left unaware. > however, some directories could handle this better. for example MMapDirectory > could return an indexinput that simply mmaps the 'slice' of the CFS file. > its underlying bytebuffer etc naturally does bounds checks already etc, so it > wouldnt need to be buffered, not even needing to add any offsets to seek(), > as its position would just work. > So I think we should try to refactor this so that a Directory can customize > how compound files are handled, the simplest > case for the least code change would be to add this to Directory.java: > {code} > public Directory openCompoundInput(String filename) { > return new CompoundFileReader(this, filename); > } > {code} > Because most code depends upon the fact compound files are implemented as a > Directory and transparent. 
at least then a subclass could override... > but the 'recursion' is a little ugly... we could still label it > expert+internal+experimental or whatever. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049293#comment-13049293 ] Michael McCandless commented on LUCENE-2793: LUCENE-3203 is another example where a Dir needs the IOContext so it can optionally rate limit the bytes/second if it's a merge. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
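IOContext does not exist yet at this point in the thread; as a rough illustration of the idea (a hint object passed to createOutput/openInput that a Directory can consult), here is a minimal sketch with invented names and defaults, not Lucene's eventual API:

```java
// Hedged sketch of the IOContext idea from the issue description: a small
// value class carrying hints (merge vs. search, buffer size) that a
// Directory could consult. Names and defaults are illustrative only.
public class IOContextSketch {
    public enum Type { READ, MERGE, FLUSH }

    public final Type type;
    public final int bufferSize;

    public IOContextSketch(Type type, int bufferSize) {
        this.type = type;
        this.bufferSize = bufferSize;
    }

    // A directory impl might pick a larger buffer for merges than for
    // searches -- the distinction Lucene currently hard-codes by passing
    // a bigger readBufferSize down for merging.
    public static int effectiveBufferSize(IOContextSketch ctx) {
        if (ctx.type == Type.MERGE) return Math.max(ctx.bufferSize, 4096);
        return ctx.bufferSize;
    }
}
```

With such a context available, a rate limiter (as in LUCENE-3203) could likewise key off `Type.MERGE` without the directory guessing the caller's intent.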
[jira] [Updated] (LUCENE-3203) Rate-limit IO used by merging
[ https://issues.apache.org/jira/browse/LUCENE-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3203: --- Attachment: LUCENE-3203.patch Patch with a hacked-up prototype impl, but I don't think we should commit it like this. Instead, I think we should wait for IOContext, and then Dir impls can allow the app to specify a max merge write rate. > Rate-limit IO used by merging > - > > Key: LUCENE-3203 > URL: https://issues.apache.org/jira/browse/LUCENE-3203 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3203.patch > > > Large merges can mess up searches and increase NRT reopen time (see > http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html). > A simple rate limiter improves the spikey NRT reopen times during big > merges, so I think we should somehow make this possible. Likely this > would reduce impact on searches as well. > Typically apps that do indexing and searching on same box are in no > rush to see the merges complete so this is a good tradeoff. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3203) Rate-limit IO used by merging
Rate-limit IO used by merging - Key: LUCENE-3203 URL: https://issues.apache.org/jira/browse/LUCENE-3203 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.3, 4.0 Large merges can mess up searches and increase NRT reopen time (see http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html). A simple rate limiter improves the spikey NRT reopen times during big merges, so I think we should somehow make this possible. Likely this would reduce impact on searches as well. Typically apps that do indexing and searching on same box are in no rush to see the merges complete so this is a good tradeoff. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
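The core of a rate limiter like the one this issue proposes is a small piece of arithmetic: how long to pause after writing a chunk so the average rate stays at the target. A self-contained sketch follows; the class and method names are invented for illustration and are not the committed Lucene API:

```java
// Illustrative sketch of the rate-limiting idea: given a target write
// rate, compute how long a writer should pause after writing a chunk.
// A merge thread would call this after each block of bytes written.
public class MergeRateLimiter {
    private final double bytesPerNano;

    public MergeRateLimiter(double mbPerSec) {
        this.bytesPerNano = mbPerSec * 1024 * 1024 / 1e9;
    }

    // Nanoseconds the caller should sleep so that writing `bytes` bytes
    // takes at least bytes/rate seconds overall; elapsedNanos is the
    // time the write itself already took.
    public long nanosToPause(long bytes, long elapsedNanos) {
        long targetNanos = (long) (bytes / bytesPerNano);
        return Math.max(0, targetNanos - elapsedNanos);
    }
}
```

Capping merge IO this way trades merge completion time for steadier search latency, which matches the issue's observation that co-located indexing/search apps are rarely in a rush for merges to finish.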
[jira] [Commented] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049287#comment-13049287 ] Michael McCandless commented on LUCENE-3201: Patch looks great! Incredible that this means there's no penalty at all at search time when using CFS, if you use MMapDir. I like that CFS reader is now under oal.store not .index. > improved compound file handling > --- > > Key: LUCENE-3201 > URL: https://issues.apache.org/jira/browse/LUCENE-3201 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Attachments: LUCENE-3201.patch > > > Currently CompoundFileReader could use some improvements, i see the following > problems > * its CSIndexInput extends bufferedindexinput, which is stupid for > directories like mmap. > * it seeks on every readInternal > * its not possible for a directory to override or improve the handling of > compound files. > for example: it seems if you were impl'ing this thing from scratch, you would > just wrap the II directly (not extend BufferedIndexInput, > and add compound file offset X to seek() calls, and override length(). But of > course, then you couldnt throw read past EOF always when you should, > as a user could read into the next file and be left unaware. > however, some directories could handle this better. for example MMapDirectory > could return an indexinput that simply mmaps the 'slice' of the CFS file. > its underlying bytebuffer etc naturally does bounds checks already etc, so it > wouldnt need to be buffered, not even needing to add any offsets to seek(), > as its position would just work. 
> So I think we should try to refactor this so that a Directory can customize > how compound files are handled, the simplest > case for the least code change would be to add this to Directory.java: > {code} > public Directory openCompoundInput(String filename) { > return new CompoundFileReader(this, filename); > } > {code} > Because most code depends upon the fact compound files are implemented as a > Directory and transparent. at least then a subclass could override... > but the 'recursion' is a little ugly... we could still label it > expert+internal+experimental or whatever. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3201) improved compound file handling
[ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3201: Attachment: LUCENE-3201.patch Initial patch for review. In this patch I only cut over MMapDirectory to using a special CompoundFileDirectory, all others use the default as before (but i cleaned up some things about it). Pretty sure i can easily improve SimpleFS and NIOFS, i'll take a look at that now, but I wanted to get this up for review. > improved compound file handling > --- > > Key: LUCENE-3201 > URL: https://issues.apache.org/jira/browse/LUCENE-3201 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Attachments: LUCENE-3201.patch > > > Currently CompoundFileReader could use some improvements, i see the following > problems > * its CSIndexInput extends bufferedindexinput, which is stupid for > directories like mmap. > * it seeks on every readInternal > * its not possible for a directory to override or improve the handling of > compound files. > for example: it seems if you were impl'ing this thing from scratch, you would > just wrap the II directly (not extend BufferedIndexInput, > and add compound file offset X to seek() calls, and override length(). But of > course, then you couldnt throw read past EOF always when you should, > as a user could read into the next file and be left unaware. > however, some directories could handle this better. for example MMapDirectory > could return an indexinput that simply mmaps the 'slice' of the CFS file. > its underlying bytebuffer etc naturally does bounds checks already etc, so it > wouldnt need to be buffered, not even needing to add any offsets to seek(), > as its position would just work. 
> So I think we should try to refactor this so that a Directory can customize > how compound files are handled, the simplest > case for the least code change would be to add this to Directory.java: > {code} > public Directory openCompoundInput(String filename) { > return new CompoundFileReader(this, filename); > } > {code} > Because most code depends upon the fact compound files are implemented as a > Directory and transparent. at least then a subclass could override... > but the 'recursion' is a little ugly... we could still label it > expert+internal+experimental or whatever. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2574) upgrade SLF4J (primary motivation: simplify use of solrj)
[ https://issues.apache.org/jira/browse/SOLR-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-2574: Issue Type: Wish (was: Bug) since this is not a bug... lets change the status > upgrade SLF4J (primary motivation: simplifiy use of solrj) > -- > > Key: SOLR-2574 > URL: https://issues.apache.org/jira/browse/SOLR-2574 > Project: Solr > Issue Type: Wish >Reporter: Gabriele Kahlout >Assignee: Shalin Shekhar Mangar >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: solrjtest.zip > > > Whatever the merits of slf4j, a quick solrj test should work. > I've attached a sample 1-line project with dependency on solrj-3.2 on run it > prints: > {code} > java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.(CommonsHttpSolrServer.java:72) > at com.mysimpatico.solrjtest.App.main(App.java:12) > {code} > Uncomment the nop dependency and it will work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8832 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8832/ 20 tests failed. REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety Error Message: Error occurred in thread Thread-73: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test6102461040tmp/_b_1.skp (Too many open files in system) Stack Trace: junit.framework.AssertionFailedError: Error occurred in thread Thread-73: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test6102461040tmp/_b_1.skp (Too many open files in system) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/4/test6102461040tmp/_b_1.skp (Too many open files in system) at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822) REGRESSION: org.apache.lucene.search.TestComplexExplanations.testCSQ4 Error Message: CheckIndex failed Stack Trace: java.lang.RuntimeException: CheckIndex failed at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158) at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) at org.apache.lucene.search.TestExplanations.tearDown(TestExplanations.java:66) at org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:43) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.search.TestComplexExplanations.testDMQ10 Error Message: null Stack Trace: java.lang.NullPointerException at 
org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.search.TestComplexExplanations.testMPQ7 Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.search.TestComplexExplanations.testBQ12 Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.search.TestComplexExplanations.testBQ13 Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.search.TestComplexExplanations.testBQ18 Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: 
org.apache.lucene.search.TestComplexExplanations.testBQ21 Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.search.TestComplexExplanations.testBQ22 Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.TestComplexExplanations.tearDown(TestComplexExplanations.java:42) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at
[jira] [Issue Comment Edited] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049256#comment-13049256 ] Martin Grotzke edited comment on SOLR-2583 at 6/14/11 4:25 PM: --- I just compared the memory consumption of the 3 different approaches, with different numbers of puts (number of scores) and sizes (number of docs); memory is in bytes:
{noformat}
Puts 1.000, size 1.000.000:      CompactFloatArray 898.136,    float[] 4.000.016,  HashMap 72.192
Puts 10.000, size 1.000.000:     CompactFloatArray 3.724.376,  float[] 4.000.016,  HashMap 702.784
Puts 100.000, size 1.000.000:    CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 6.607.808
Puts 1.000.000, size 1.000.000:  CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 44.644.032
Puts 1.000, size 5.000.000:      CompactFloatArray 1.128.536,  float[] 20.000.016, HashMap 72.256
Puts 10.000, size 5.000.000:     CompactFloatArray 8.168.536,  float[] 20.000.016, HashMap 704.832
Puts 100.000, size 5.000.000:    CompactFloatArray 20.013.144, float[] 20.000.016, HashMap 7.385.152
Puts 1.000.000, size 5.000.000:  CompactFloatArray 20.131.160, float[] 20.000.016, HashMap 66.395.584
Puts 1.000, size 10.000.000:     CompactFloatArray 1.275.992,  float[] 40.000.016, HashMap 72.256
Puts 10.000, size 10.000.000:    CompactFloatArray 9.289.816,  float[] 40.000.016, HashMap 705.280
Puts 100.000, size 10.000.000:   CompactFloatArray 37.130.328, float[] 40.000.016, HashMap 7.418.112
Puts 1.000.000, size 10.000.000: CompactFloatArray 40.262.232, float[] 40.000.016, HashMap 69.282.496
{noformat}
I want to share this interim result without further interpretation/conclusion for now (I just need to catch my train).
was (Author: martin.grotzke): I just compared memory consumption of the 3 different approaches, with different number of puts (number of scores) and sizes (number of docs): {noformat} Puts 1.000, size 1.000.000: CompactFloatArray 898.136,float[] 4.000.016, HashMap 72.192 Puts 10.000, size 1.000.000: CompactFloatArray 3.724.376, float[] 4.000.016, HashMap 702.784 Puts 100.000, size 1.000.000:CompactFloatArray 4.016.472, float[] 4.000.016, HashMap 6.607.808 Puts 1.000.000, size 1.000.000: CompactFloatArray 4.016.472, float[] 4.000.016, HashMap 44.644.032 Puts 1.000, size 5.000.000: CompactFloatArray 1.128.536, float[] 20.000.016, HashMap 72.256 Puts 10.000, size 5.000.000: CompactFloatArray 8.168.536, float[] 20.000.016, HashMap 704.832 Puts 100.000, size 5.000.000:CompactFloatArray 20.013.144, float[] 20.000.016, HashMap 7.385.152 Puts 1.000.000, size 5.000.000: CompactFloatArray 20.131.160, float[] 20.000.016, HashMap 66.395.584 Puts 1.000, size 10.000.000: CompactFloatArray 1.275.992, float[] 40.000.016, HashMap 72.256 Puts 10.000, size 10.000.000:CompactFloatArray 9.289.816, float[] 40.000.016, HashMap 705.280 Puts 100.000, size 10.000.000: CompactFloatArray 37.130.328, float[] 40.000.016, HashMap 7.418.112 Puts 1.000.000, size 10.000.000: CompactFloatArray 40.262.232, float[] 40.000.016, HashMap 69.282.496 {noformat} I want to share this intermediately, without further interpretation/conclusion for now (I just need to get the train). > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch, patch.txt > > > External scoring eats much memory, depending on the number of documents in > the index. 
The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049256#comment-13049256 ] Martin Grotzke commented on SOLR-2583: -- I just compared memory consumption of the 3 different approaches, with different number of puts (number of scores) and sizes (number of docs):
{noformat}
Puts 1.000, size 1.000.000:      CompactFloatArray 898.136,    float[] 4.000.016,  HashMap 72.192
Puts 10.000, size 1.000.000:     CompactFloatArray 3.724.376,  float[] 4.000.016,  HashMap 702.784
Puts 100.000, size 1.000.000:    CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 6.607.808
Puts 1.000.000, size 1.000.000:  CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 44.644.032
Puts 1.000, size 5.000.000:      CompactFloatArray 1.128.536,  float[] 20.000.016, HashMap 72.256
Puts 10.000, size 5.000.000:     CompactFloatArray 8.168.536,  float[] 20.000.016, HashMap 704.832
Puts 100.000, size 5.000.000:    CompactFloatArray 20.013.144, float[] 20.000.016, HashMap 7.385.152
Puts 1.000.000, size 5.000.000:  CompactFloatArray 20.131.160, float[] 20.000.016, HashMap 66.395.584
Puts 1.000, size 10.000.000:     CompactFloatArray 1.275.992,  float[] 40.000.016, HashMap 72.256
Puts 10.000, size 10.000.000:    CompactFloatArray 9.289.816,  float[] 40.000.016, HashMap 705.280
Puts 100.000, size 10.000.000:   CompactFloatArray 37.130.328, float[] 40.000.016, HashMap 7.418.112
Puts 1.000.000, size 10.000.000: CompactFloatArray 40.262.232, float[] 40.000.016, HashMap 69.282.496
{noformat}
I want to share this intermediately, without further interpretation/conclusion for now (I just need to get the train). 
> Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch, patch.txt > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
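The HashMap column in the measurements wins by orders of magnitude when few docs carry scores and loses badly once most do (boxing overhead). A minimal sketch of the sparse doc->score idea from the issue description follows; the class name is illustrative and this is not the attached patch:

```java
// Minimal sketch of the sparse alternative: store only the docs that
// actually have an external score and return a default for the rest,
// instead of allocating float[maxDoc] up front.
import java.util.HashMap;
import java.util.Map;

public class SparseScores {
    private final Map<Integer, Float> scores = new HashMap<>();
    private final float defaultScore;

    public SparseScores(float defaultScore) {
        this.defaultScore = defaultScore;
    }

    public void put(int doc, float score) {
        scores.put(doc, score);
    }

    public float get(int doc) {
        Float s = scores.get(doc);
        return s != null ? s : defaultScore;
    }

    // Memory grows with the number of entries, not with maxDoc.
    public int size() { return scores.size(); }
}
```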
[jira] [Commented] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time
[ https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049254#comment-13049254 ] Hoss Man commented on LUCENE-3197: -- is the possibility of a never-ending optimize in this situation (never-ending deletes) really something we need to "fix" ? i mean ... isn't this what the user should expect? they've asked for a single segment w/o deletes, and then while we try to give it to them they keep deleting -- how is it bad that optimize doesn't stop until it's completely done ? > Optimize runs forever if you keep deleting docs at the same time > > > Key: LUCENE-3197 > URL: https://issues.apache.org/jira/browse/LUCENE-3197 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 3.3, 4.0 > > > Because we "cascade" merges for an optimize... if you also delete documents > while the merges are running, then the merge policy will see the resulting > single segment as still not optimized (since it has pending deletes) and do a > single-segment merge, and will repeat indefinitely (as long as your app keeps > deleting docs). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2588) Make Velocity an optional dependency in SolrCore
[ https://issues.apache.org/jira/browse/SOLR-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-2588: Priority: Minor (was: Major) Issue Type: Wish (was: Bug) Summary: Make Velocity an optional dependency in SolrCore (was: Solr doesn't work without Velocity on classpath) updating the JIRA description... since this is not a 'bug' > Make Velocity an optional dependency in SolrCore > > > Key: SOLR-2588 > URL: https://issues.apache.org/jira/browse/SOLR-2588 > Project: Solr > Issue Type: Wish >Affects Versions: 3.2 >Reporter: Gunnar Wagenknecht >Priority: Minor > Fix For: 3.3 > > > In 1.4. it was fine to run Solr without Velocity on the classpath. However, > in 3.2. SolrCore won't load because of a hard reference to the Velocity > response writer in a static initializer. > {noformat} > ... ERROR org.apache.solr.core.CoreContainer - > java.lang.NoClassDefFoundError: org/apache/velocity/context/Context > at org.apache.solr.core.SolrCore.(SolrCore.java:1447) > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207) > {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
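One common way to make such a dependency optional is to probe for the class by name at startup instead of referencing it from a static initializer, so a missing jar disables the feature rather than killing core loading with a NoClassDefFoundError. A hedged sketch of the technique (not Solr's actual fix):

```java
// Hypothetical helper illustrating the optional-dependency probe: check
// for a class by name without initializing it, and let the caller skip
// registration of the dependent feature when the jar is absent.
public class OptionalClass {
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OptionalClass.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

A core container could then register the Velocity response writer only when `isPresent("org.apache.velocity.context.Context")` returns true.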
[jira] [Updated] (SOLR-2574) upgrade SLF4J (primary motivation: simplify use of solrj)
[ https://issues.apache.org/jira/browse/SOLR-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2574: --- Summary: upgrade SLF4J (primary motivation: simplify use of solrj) (was: Add SLF4J-nop dependency) > upgrade SLF4J (primary motivation: simplify use of solrj) > -- > > Key: SOLR-2574 > URL: https://issues.apache.org/jira/browse/SOLR-2574 > Project: Solr > Issue Type: Bug >Reporter: Gabriele Kahlout >Assignee: Shalin Shekhar Mangar >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: solrjtest.zip > > > Whatever the merits of slf4j, a quick solrj test should work. > I've attached a sample 1-line project with a dependency on solrj-3.2; on run it > prints: > {code} > java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.(CommonsHttpSolrServer.java:72) > at com.mysimpatico.solrjtest.App.main(App.java:12) > {code} > Uncomment the nop dependency and it will work. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-2955) Add utility class to manage NRT reopening
[ https://issues.apache.org/jira/browse/LUCENE-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2955. Resolution: Fixed Fix Version/s: 4.0 > Add utility class to manage NRT reopening > - > > Key: LUCENE-2955 > URL: https://issues.apache.org/jira/browse/LUCENE-2955 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-2955.patch, LUCENE-2955.patch, LUCENE-2955.patch > > > I created a simple class, NRTManager, that tries to abstract away some > of the reopen logic when using NRT readers. > You give it your IW, tell it min and max nanoseconds staleness you can > tolerate, and it privately runs a reopen thread to periodically reopen > the searcher. > It subsumes the SearcherManager from LIA2. Besides running the reopen > thread, it also adds the notion of a "generation" containing changes > you've made. So eg it has addDocument, returning a long. You can > then take that long value and pass it back to the getSearcher method > and getSearcher will return a searcher that reflects the changes made > in that generation. > This gives your app the freedom to force "immediate" consistency (ie > wait for the reopen) only for those searches that require it, like a > verifier that adds a doc and then immediately searches for it, but > also use "eventual consistency" for other searches. > I want to also add support for the new "applyDeletions" option when > pulling an NRT reader. > Also, this is very new and I'm sure buggy -- the concurrency is either > wrong or overly-locking. But it's a start... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
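The "generation" idea described in LUCENE-2955 can be illustrated with a minimal stdlib-only sketch: every write returns a generation number, and a caller who needs immediate consistency waits until a reopen has made that generation visible. The class and method names below are invented for illustration; they are not NRTManager's actual API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hedged sketch of generation-based NRT consistency (illustrative names,
// not Lucene's API): writers get a generation back; searchers that need
// "immediate" consistency wait for a reopen covering that generation,
// while everyone else just uses whatever searcher is current.
public class GenerationManager {
    private final AtomicLong writeGen = new AtomicLong(); // last generation handed to a writer
    private long searchGen;                               // generation the current searcher covers

    /** Real code would call IndexWriter.addDocument here. */
    public long addDocument(String doc) {
        return writeGen.incrementAndGet();
    }

    /** The background reopen thread would call this periodically. */
    public synchronized void reopenIfNeeded() {
        // real code would reopen the IndexReader; we just publish the generation
        searchGen = writeGen.get();
        notifyAll();
    }

    /** Blocks until a searcher reflecting {@code gen} is available. */
    public synchronized void waitForGeneration(long gen) throws InterruptedException {
        while (searchGen < gen) {
            wait();
        }
    }

    public synchronized long currentSearchGen() {
        return searchGen;
    }

    public static void main(String[] args) throws Exception {
        GenerationManager mgr = new GenerationManager();
        long gen = mgr.addDocument("new doc");
        mgr.reopenIfNeeded();        // normally done by the reopen thread
        mgr.waitForGeneration(gen);  // returns immediately: gen is already visible
        System.out.println(mgr.currentSearchGen()); // 1
    }
}
```

The verifier-that-adds-then-searches use case maps to `waitForGeneration(addDocument(doc))`; eventually-consistent searches simply skip the wait.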
[jira] [Commented] (SOLR-2588) Solr doesn't work without Velocity on classpath
[ https://issues.apache.org/jira/browse/SOLR-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049247#comment-13049247 ] Hoss Man commented on SOLR-2588: bq. With some small changes, Velocity could be optional. Velocity (and the velocitywriter) were optional before, and a conscious and deliberate choice was made to promote it into a core dependency so that admin code (and users) could start expecting it to reliably always work. if we want to re-consider i'm fine with having that discussion, but it shouldn't be an "optional core" feature ... either it's a core feature and dependency, or it's an optional contrib. It's not a bug that code in solr has a direct dependency on jars in the lib dir. > Solr doesn't work without Velocity on classpath > --- > > Key: SOLR-2588 > URL: https://issues.apache.org/jira/browse/SOLR-2588 > Project: Solr > Issue Type: Bug >Affects Versions: 3.2 >Reporter: Gunnar Wagenknecht > Fix For: 3.3 > > > In 1.4. it was fine to run Solr without Velocity on the classpath. However, > in 3.2. SolrCore won't load because of a hard reference to the Velocity > response writer in a static initializer. > {noformat} > ... ERROR org.apache.solr.core.CoreContainer - > java.lang.NoClassDefFoundError: org/apache/velocity/context/Context > at org.apache.solr.core.SolrCore.(SolrCore.java:1447) > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:463) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207) > {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
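The underlying failure mode, a `NoClassDefFoundError` thrown from a static initializer that references an optional class directly, has a common workaround: probe for the class reflectively and skip registration when the jar is absent. The sketch below is an assumed pattern with invented names, not Solr's actual registration code.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of making a dependency optional (illustrative, not Solr's
// code): a direct class reference in a static initializer fails the whole
// class load when the jar is missing; Class.forName lets us probe safely
// and leave the feature unregistered instead.
public class OptionalWriterRegistry {
    static final Map<String, Object> WRITERS = new HashMap<>();

    static {
        // probe class chosen from the stack trace in the report
        register("velocity", "org.apache.velocity.context.Context");
    }

    static void register(String name, String probeClass) {
        try {
            Class.forName(probeClass);        // loads only if the jar is on the classpath
            WRITERS.put(name, new Object());  // real code would instantiate the writer here
        } catch (ClassNotFoundException | LinkageError e) {
            // dependency missing: skip this writer, but the core still loads
        }
    }

    public static void main(String[] args) {
        // false unless a Velocity jar happens to be on the classpath
        System.out.println(WRITERS.containsKey("velocity"));
    }
}
```

Whether Solr should do this, or keep Velocity as a hard core dependency, is exactly the policy question Hoss raises above; the sketch only shows that the mechanism is cheap.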
[jira] [Resolved] (LUCENE-3198) Change default Directory impl on 64bit linux to MMap
[ https://issues.apache.org/jira/browse/LUCENE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3198. Resolution: Fixed OK I cutover just Linux, 64 bit, when unmap is available. We can open new issues for other platforms when we have some data that MMap is better... > Change default Directory impl on 64bit linux to MMap > > > Key: LUCENE-3198 > URL: https://issues.apache.org/jira/browse/LUCENE-3198 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless > Fix For: 3.3, 4.0 > > > Consistently in my NRT testing on Fedora 13 Linux, 64 bit JVM (Oracle > 1.6.0_21) I see MMapDir getting better search and merge performance when > compared to NIOFSDir. > I think we should fix the default. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure
[ https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049202#comment-13049202 ] Simon Willnauer commented on LUCENE-3190: - I managed to reproduce this and trip into it with a debugger. So what happens here is bizarre :) Due to the very tight maxBufferedDocs (3) and maxRamBufferSizeMB (0.1MB) we have a pretty good chance that several in flight DWPT will trigger a flush after the document they are indexing right now. That means that if we have lets say 4 DWPT and 3 are already flushing and memory is close to the asserts limit we got a problem if there is already a 4th DWPT in flight (passed the stall check) the document can easily add enough bytes that we cross the asserts max expected ram and fail then. I am not sure how we can fix that right now but at least its not a bug in DWPT. > TestStressIndexing2 testMultiConfig failure > --- > > Key: LUCENE-3190 > URL: https://issues.apache.org/jira/browse/LUCENE-3190 > Project: Lucene - Java > Issue Type: Bug >Reporter: selckin >Assignee: Simon Willnauer > > trunk: r1134311 > reproducible > {code} > [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2 > [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec > [junit] > [junit] - Standard Error - > [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush > mem: 395100 active: 65808 > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102) > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164) > [junit] at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445) > [junit] at > 
org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757) > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] The following exceptions were thrown by threads: > [junit] *** Thread: Thread-0 *** > [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: > ram was 460908 expected: 408216 flush mem: 395100 active: 65808 > [junit] at junit.framework.Assert.fail(Assert.java:47) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762) > [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, > f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, > f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, > f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, > f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), > f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, > f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, > timezone=Pacific/Palau > [junit] NOTE: all tests run in this JVM: > [junit] [TestStressIndexing2] > [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 
1.6.0_25 > (64-bit)/cpus=8,threads=1,free=133324528,total=158400512 > [junit] - --- > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] r1.numDocs()=17 vs r2.numDocs()=16 > [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs > r2.numDocs()=16 > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308) > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278) > [junit] at > org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) > [junit] > [junit] > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] Some threads threw uncaught exceptions
[jira] [Commented] (SOLR-2305) DataImportScheduler - Marko Bonaci
[ https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049180#comment-13049180 ] Marko Bonaci commented on SOLR-2305: I'll attach the patch during the following weekend. > DataImportScheduler - Marko Bonaci > --- > > Key: SOLR-2305 > URL: https://issues.apache.org/jira/browse/SOLR-2305 > Project: Solr > Issue Type: New Feature >Affects Versions: 4.0 >Reporter: Bill Bell > Fix For: 4.0 > > > Marko Bonaci has updated the WIKI page to add the DataImportScheduler, but I > cannot find a JIRA ticket for it? > http://wiki.apache.org/solr/DataImportHandler > Do we have a ticket so the code can be tracked? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Indexing slower in trunk
For simply removing deletes, there is also IW.expungeDeletes(), which is less intensive! Not sure if Solr supports this, too, but as far as I know there is an issue open. Also please note: As soon as one segment is selected for merging (the merge policy may also do this dependent on the number of deletes in a segment), it will reclaim all deleted resources - that's what merging does. So expunging deletes once per week is a good idea if your index consists of very old and large segments that are rarely merged anymore and lots of documents are deleted from them. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Tuesday, June 14, 2011 3:19 PM > To: dev@lucene.apache.org > Subject: Re: Indexing slower in trunk > > Optimization used to have a very noticeable impact on search speed prior to > some index format changes from quite a while ago. > > At this point the effect is much less noticeable, but the thing optimize does > do is reclaim resources from deleted documents. If you have lots of > deletions, it's a good idea to periodically optimize, but in that case it's often > done pretty infrequently (once a > day/week/month) rather than as part of any ongoing indexing process. > > Best > Erick > > 2011/6/14 Yury Kats : > > On 6/14/2011 4:28 AM, Uwe Schindler wrote: > >> indexing and optimizing was only a > >> good idea pre Lucene-2.9, now it's mostly obsolete) > > > > Could you please elaborate on this? Is optimizing obsolete in general > > or after indexing new documents? Is it obsolete after deletions? And > > what it "mostly"? > > > > Thanks!
> > > > - > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For > > additional commands, e-mail: dev-h...@lucene.apache.org > > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional > commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Indexing slower in trunk
Since Lucene 2.9, Lucene works on a per-segment basis when searching. Since Lucene 3.1 it can even parallelize across multiple segments. If you optimize your index you only have one segment. Also, when you frequently reopen your indexes (e.g. after updates), the cost of warming the cache and FieldCache for a very large new segment (and the CPU power lost for optimizing the whole index) does not justify a more compact representation on disk. For search performance it has a minor impact. It's a much better idea to configure the MergePolicy to merge segments optimally (Lucene 3.2 has a TieredMergePolicy that's now the default). This will minimize the number of segments. The same overhead applies to Solr when you replicate your index. If you don't optimize, Solr will only exchange new segments between the indexes. As an optimize rebuilds the whole index, it always has to transfer all index files to the replicas. Optimizing only makes sense for e.g. read-only indexes that are built one time and never touched. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Yury Kats [mailto:yuryk...@yahoo.com] > Sent: Tuesday, June 14, 2011 3:04 PM > To: dev@lucene.apache.org > Subject: Re: Indexing slower in trunk > > On 6/14/2011 4:28 AM, Uwe Schindler wrote: > > indexing and optimizing was only a > > good idea pre Lucene-2.9, now it's mostly obsolete) > > Could you please elaborate on this? Is optimizing obsolete in general or after > indexing new documents? Is it obsolete after deletions? And what it > "mostly"? > > Thanks! > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional > commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
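The per-segment argument above can be illustrated with a stdlib-only analogy: when the index is several segments, each can be scanned by its own thread and the per-segment hit counts merged; a fully optimized index is one giant segment with nothing to parallelize. This is plain Java over string lists, not Lucene's searcher code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy illustration of per-segment searching (not Lucene code): each
// "segment" is scanned in parallel and partial results are merged.
public class PerSegmentSearch {
    static int countHits(List<List<String>> segments, String term) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, segments.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> segment : segments) {
                futures.add(pool.submit(() -> {
                    int hits = 0;
                    for (String doc : segment) {
                        if (doc.contains(term)) hits++;
                    }
                    return hits;            // per-segment result, merged below
                }));
            }
            int total = 0;
            for (Future<Integer> f : futures) total += f.get();
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("lucene index", "solr search"),
            Arrays.asList("merge policy", "lucene segment"));
        System.out.println(countHits(segments, "lucene")); // 2
    }
}
```

With one segment the pool degenerates to a single worker, which is the cost of optimizing that Uwe describes.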
Re: Indexing slower in trunk
Optimization used to have a very noticeable impact on search speed prior to some index format changes from quite a while ago. At this point the effect is much less noticeable, but the thing optimize does do is reclaim resources from deleted documents. If you have lots of deletions, it's a good idea to periodically optimize, but in that case it's often done pretty infrequently (once a day/week/month) rather than as part of any ongoing indexing process. Best Erick 2011/6/14 Yury Kats : > On 6/14/2011 4:28 AM, Uwe Schindler wrote: >> indexing and optimizing was only a >> good idea pre Lucene-2.9, now it's mostly obsolete) > > Could you please elaborate on this? Is optimizing obsolete > in general or after indexing new documents? Is it obsolete > after deletions? And what it "mostly"? > > Thanks! > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Indexing slower in trunk
On 6/14/2011 4:28 AM, Uwe Schindler wrote: > indexing and optimizing was only a > good idea pre Lucene-2.9, now it's mostly obsolete) Could you please elaborate on this? Is optimizing obsolete in general or after indexing new documents? Is it obsolete after deletions? And what it "mostly"? Thanks! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
[ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049166#comment-13049166 ] Uwe Schindler commented on LUCENE-3200: --- Thanks to Robert for help debugging my stupid + vs | problem and lots of fruitful discussions about the whole stuff and how to improve :) Thanks to Mike for testing on beast! Now you can refactor CFSIndexInput & Co! > Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized > of powers of 2 > --- > > Key: LUCENE-3200 > URL: https://issues.apache.org/jira/browse/LUCENE-3200 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, > LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch > > > Robert and me discussed a little bit after Mike's investigations, that using > SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot > slowdowns sometimes. > We had the following ideas: > - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the > switching between buffer boundaries is done in exception catch blocks. So > normal code path is always the same like for Single* > - Only the seek method uses strange calculations (the modulo is totally > bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very > strange way of calculating modulo in the original code) > - Because of speed we suggest to no longer use arbitrary buffer sizes. We > should pass only the power of 2 to the indexinput as size. All calculations > in seek and anywhere else would be simple bit shifts and AND operations (the > and masks for the modulo can be calculated in the ctor like NumericUtils does > when calculating precisionSteps). > - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an > issue at all. 
In my opinion, a buffer size of 2^31-1 is stupid in all cases, > as it will no longer fit page boundaries and mmapping gets harder for the O/S. > We will provide a patch with those cleanups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
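The bit-shift/AND arithmetic proposed in LUCENE-3200 is easy to show concretely. With a power-of-2 chunk size, `pos / chunkSize` becomes an unsigned shift and `pos % chunkSize` becomes an AND with a precomputed mask, replacing the "totally bogus" modulo calculation. Field names below are illustrative, not necessarily what the committed patch uses.

```java
// Sketch of the power-of-2 mapping arithmetic (illustrative names):
// the mask is computed once in the ctor, NumericUtils-style, and all
// seek math reduces to shifts and ANDs.
public class ChunkMath {
    final int chunkSizePower;  // e.g. 30 for 1 GiB chunks (max is 2^30, not 2^31-1)
    final long chunkSizeMask;  // (1L << power) - 1

    ChunkMath(int chunkSizePower) {
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1;
    }

    int bufferIndex(long pos) {
        return (int) (pos >>> chunkSizePower); // pos / chunkSize
    }

    int bufferOffset(long pos) {
        return (int) (pos & chunkSizeMask);    // pos % chunkSize
    }

    public static void main(String[] args) {
        ChunkMath m = new ChunkMath(30);
        long pos = (3L << 30) + 17;            // 17 bytes into the 4th chunk
        System.out.println(m.bufferIndex(pos));  // 3
        System.out.println(m.bufferOffset(pos)); // 17
    }
}
```

This is also why 2^30 is the natural cap: a 2^31-1 chunk has no power-of-2 structure, so neither the mask trick nor OS page alignment works.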
[jira] [Resolved] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
[ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-3200. --- Resolution: Fixed Fix Version/s: 4.0 3.3 Committed trunk revision: 1135537 Committed 3.x revision: 1135538 > Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized > of powers of 2 > --- > > Key: LUCENE-3200 > URL: https://issues.apache.org/jira/browse/LUCENE-3200 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, > LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch > > > Robert and me discussed a little bit after Mike's investigations, that using > SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot > slowdowns sometimes. > We had the following ideas: > - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the > switching between buffer boundaries is done in exception catch blocks. So > normal code path is always the same like for Single* > - Only the seek method uses strange calculations (the modulo is totally > bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very > strange way of calculating modulo in the original code) > - Because of speed we suggest to no longer use arbitrary buffer sizes. We > should pass only the power of 2 to the indexinput as size. All calculations > in seek and anywhere else would be simple bit shifts and AND operations (the > and masks for the modulo can be calculated in the ctor like NumericUtils does > when calculating precisionSteps). > - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an > issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, > as it will no longer fit page boundaries and mmapping gets harder for the O/S. > We will provide a patch with those cleanups. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Indexing slower in trunk
Thanks, guys. Yes, I am running it all locally and disk seeks may well be the culprit. This thread is mainly to be sure that the behavior I'm seeing is expected, or at least explainable. Really, I don't need to pursue this further unless there's actually data I can gather to help speed things up. If this is just a consequence of DWPT and/or my particular setup then that's fine. I'm mostly trying to understand the characteristics of indexing/searching on the trunk. This started with me exploring memory requirements, and is really just something I noticed along the way and wanted to get some feedback on. So, absent the commit step, the times are reasonably comparable. Can I impose upon one of you to give a two-sentence summary of what DWPT buys us from a user perspective? If memory serves it should have background merging and other goodies. Uwe: Yep, I was curious about optimize but understand that it's not required in recent code. That said, data is not searchable until a commit happens, so just for yucks I changed the optimize to a commit. Stats of that run below. Simon: OK, adjusted the ram buffer size to 512M, and it's a bit faster, but not all that much, see stats, and the delta could well be sampling errors, one run doth not a statistical certainty make. Up until the commit step, the admin stats page is showing no documents in the index so I think this setting completely avoids intermediate committing although that says nothing about the individual writers writing lots of segments to disk, that still happens. Added 188 docs. Took 1437 ms. cumulative interval (seconds) = 284 Added 189 docs. Took 1285 ms. cumulative interval (seconds) = 285 Added 190 docs. Took 1182 ms. cumulative interval (seconds) = 286 Added 191 docs. Took 1675 ms. cumulative interval (seconds) = 288 About to commit, total time so far: 290 Total Time Taken-> 395 seconds***100 secs for the commit to finish. 
Total documents added-> 1917728 Docs/sec-> 4855 Thanks, all Erick On Tue, Jun 14, 2011 at 4:39 AM, Simon Willnauer wrote: > Erick, it seems you need to adjust your settings for 4.0 a little. > When you index with DWPT it builds thread private segments which are > independently flushed to disk. Yet, when you set your ram buffer IW > will accumulate the ram used by all active DWPT and flush the largest > once you reach your ram buffer. with 128M you might end up wil lots of > small segments which need to be merged in the background. Eventually > what will happen here is that your disk is so busy that you are not > able to flush fast enough and threads might stall. > > What you can try here is adjust your RAM buffer to be a little higher, > lets say 350MB or change the max number of thread states in > DocumentsWriterPerThreadPool ie. > ThreadAffinityDocumentsWriterThreadPool. The latter is unfortunately > not exposed yet in solr so maybe for testing you just want to change > the default value in DocumentsWriterPerThreadPool to 4. That will also > cause segments to be bigger eventually. > > simon > > On Tue, Jun 14, 2011 at 10:28 AM, Uwe Schindler wrote: >> Hi Erick, >> >> Do you use harddisks or SSDs? I assume harddisks, which may explain what you >> see: >> >> - DWPT writes lots of segments in parallel, which also explains why you are >> seeing more files. Writing in parallel to several files, needs more head >> movements of your harddisk and this slows down. In the past, only one >> segment was written at the same time (sequential), so the harddisk is not so >> stressed. 
>> - Optimizing may be slower for the same reason: there are many more files to >> merge (but optimize cost should not be counted as a problem here as normally >> you won't need to optimize after initial indexing and optimizing was only a >> good idea pre Lucene-2.9, now it's mostly obsolete) >> >> Uwe >> >> - >> Uwe Schindler >> H.-H.-Meier-Allee 63, D-28213 Bremen >> http://www.thetaphi.de >> eMail: u...@thetaphi.de >> >> >>> -Original Message- >>> From: Erick Erickson [mailto:erickerick...@gmail.com] >>> Sent: Tuesday, June 14, 2011 2:46 AM >>> To: dev@lucene.apache.org; simon.willna...@gmail.com >>> Subject: Re: Indexing slower in trunk >>> >>> Simon: >>> >>> Yep, I was asking to see if it was weird. Pursuant to our >>> chat I tried some things, results below: >>> >>> All these are running on my local machine, same disk, same >>> JVM settings re: memory. The SolrJ indexing is happening >>> from IntelliJ (OK, I'm lazy). >>> >>> Rambuffer is set at 128M in all cases. Merge factor is 10. >>> >>> I'm allocation 2G to the server and 2G to the indexer >>> >>> Servers get started like this: >>> _server = new StreamingUpdateSolrServer(url, 10, 4); >>> >>> I choose the threads and queue length semi-arbitrarily. >>> >>> Autocommit originally was 10,000 docs, 60,000 maxTime for >>> all tests, but I removed that in the case of trunk , substituting >>> a "commitWithin" in the SolrJ code of 10 minutes. That flattened >>> out the run up
Re: svn commit: r1135526 - in /lucene/dev/branches/branch_3x: ./ lucene/ lucene/backwards/ lucene/backwards/src/test-framework/ lucene/backwards/src/test/ lucene/src/java/org/apache/lucene/store/ solr
Well, at some point, we'll move to Java 1.6 and then we don't have to worry about this craziness anymore! Mike McCandless http://blog.mikemccandless.com On Tue, Jun 14, 2011 at 8:32 AM, Dawid Weiss wrote: > Thanks Mike. I keep forgetting about it. > > Dawid > > On Tue, Jun 14, 2011 at 2:28 PM, wrote: >> Author: mikemccand >> Date: Tue Jun 14 12:28:16 2011 >> New Revision: 1135526 >> >> URL: http://svn.apache.org/viewvc?rev=1135526&view=rev >> Log: >> can't override interface until Java 1.6 >> >> Modified: >> lucene/dev/branches/branch_3x/ (props changed) >> lucene/dev/branches/branch_3x/lucene/ (props changed) >> lucene/dev/branches/branch_3x/lucene/backwards/ (props changed) >> lucene/dev/branches/branch_3x/lucene/backwards/src/test/ (props changed) >> lucene/dev/branches/branch_3x/lucene/backwards/src/test-framework/ >> (props changed) >> >> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java >> >> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java >> lucene/dev/branches/branch_3x/solr/ (props changed) >> >> Modified: >> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java >> URL: >> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java?rev=1135526&r1=1135525&r2=1135526&view=diff >> == >> --- >> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java >> (original) >> +++ >> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java >> Tue Jun 14 12:28:16 2011 >> @@ -51,7 +51,7 @@ public class InputStreamDataInput extend >> } >> } >> >> - @Override >> + // @Override -- not until Java 1.6 >> public void close() throws IOException { >> is.close(); >> } >> >> Modified: >> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java >> URL: >> 
http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java?rev=1135526&r1=1135525&r2=1135526&view=diff >> == >> --- >> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java >> (original) >> +++ >> lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java >> Tue Jun 14 12:28:16 2011 >> @@ -39,6 +39,7 @@ public class OutputStreamDataOutput exte >> os.write(b, offset, length); >> } >> >> + // @Override -- not until Java 1.6 >> public void close() throws IOException { >> os.close(); >> } >> >> >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8832 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8832/ No tests ran. Build Log (for compile errors): [...truncated 15595 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1135487 - /lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java
On Tue, Jun 14, 2011 at 8:42 AM, Dawid Weiss wrote: >> Don't forget to fix on 3.x too :) > > Seems like you did it on both, right? No, I just fixed the @Override 1.6 only issue, not this one! >> Re using up -1, I agree... but, this actually simplified the *FSTEnum >> classes (I think?). > > I'll see if I can come up with something. Cool, thanks! Mike - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: External strings sort and case folding.
On Tue, Jun 14, 2011 at 8:35 AM, Dawid Weiss wrote: >> Merging FSTs sounds cool! > > Yep, it would be quite neat and relatively simple too. I looked at the > FST API again, after doing some work on those compounds and there are > several things that just hurt my eyes... I'll see if I can figure out > a nicer API... how emotionally attached are you to that simulation of > terminal states, Mike? :) I'm not at all! Fix away :) FST is very new and it needs some good iterating... Just make sure the *FSTEnum work ok -- I think that's why I added the END_LABEL. Mike - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1135487 - /lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java
> Don't forget to fix on 3.x too :) Seems like you did it on both, right? > Re using up -1, I agree... but, this actually simplified the *FSTEnum > classes (I think?). I'll see if I can come up with something. Dawid
Re: svn commit: r1135487 - /lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java
Don't forget to fix on 3.x too :) Re using up -1, I agree... but, this actually simplified the *FSTEnum classes (I think?). Mike McCandless http://blog.mikemccandless.com On Tue, Jun 14, 2011 at 7:13 AM, wrote: > Author: dweiss > Date: Tue Jun 14 11:13:27 2011 > New Revision: 1135487 > > URL: http://svn.apache.org/viewvc?rev=1135487&view=rev > Log: > Replaced magic constants with END_LABEL. I don't like this END_LABEL > thingy... it makes code more complex and, worse of all, it makes having -1 > label on a transition impossible. > > Modified: > lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java > > Modified: lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java > URL: > http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java?rev=1135487&r1=1135486&r2=1135487&view=diff > == > --- lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java > (original) > +++ lucene/dev/trunk/lucene/src/java/org/apache/lucene/util/fst/FST.java Tue > Jun 14 11:13:27 2011 > @@ -490,7 +490,7 @@ public class FST { > if (!targetHasArcs(follow)) { > //System.out.println(" end node"); > assert follow.isFinal(); > - arc.label = -1; > + arc.label = END_LABEL; > arc.output = follow.nextFinalOutput; > arc.flags = BIT_LAST_ARC; > return arc; > @@ -544,7 +544,7 @@ public class FST { > //System.out.println(" readFirstTarget follow.target=" + follow.target > + " isFinal=" + follow.isFinal()); > if (follow.isFinal()) { > // Insert "fake" final first arc: > - arc.label = -1; > + arc.label = END_LABEL; > arc.output = follow.nextFinalOutput; > if (follow.target <= 0) { > arc.flags = BIT_LAST_ARC; > @@ -599,7 +599,7 @@ public class FST { > > /** In-place read; returns the arc. 
*/ > public Arc readNextArc(Arc arc) throws IOException { > - if (arc.label == -1) { > + if (arc.label == END_LABEL) { > // This was a fake inserted "final" arc > if (arc.nextArc <= 0) { > // This arc went to virtual final node, ie has no outgoing arcs > > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
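The diff above replaces a scattered magic `-1` with a named `END_LABEL` constant. The tension Dawid raises is that the value doubles as a reserved sentinel: the "fake" final arc carries it, so no real transition may ever use that label. A minimal sketch (illustrative class names, not Lucene's actual FST code) of the pattern:

```java
// Sketch of the sentinel-label pattern from the commit above.
// ArcDemo and its members are hypothetical; only the idea -- a named
// constant marking the inserted "fake" final arc -- mirrors the patch.
public class ArcDemo {
    // One named constant instead of a magic -1 repeated through the code.
    static final int END_LABEL = -1;

    static class Arc {
        int label;

        // Because END_LABEL is reserved for the fake final arc, a real
        // transition can never legally carry this label value.
        boolean isFinalSentinel() {
            return label == END_LABEL;
        }
    }

    public static void main(String[] args) {
        Arc arc = new Arc();
        arc.label = END_LABEL;   // the inserted "fake" final arc
        System.out.println(arc.isFinalSentinel());
        arc.label = 'a';         // an ordinary label
        System.out.println(arc.isFinalSentinel());
    }
}
```

The named constant makes the reservation searchable and explicit, which is exactly what makes the cost (losing `-1` as a usable label) visible in review.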
Re: External strings sort and case folding.
> Merging FSTs sounds cool! Yep, it would be quite neat and relatively simple too. I looked at the FST API again, after doing some work on those compounds and there are several things that just hurt my eyes... I'll see if I can figure out a nicer API... how emotionally attached are you to that simulation of terminal states, Mike? :) Dawid
Re: svn commit: r1135526 - in /lucene/dev/branches/branch_3x: ./ lucene/ lucene/backwards/ lucene/backwards/src/test-framework/ lucene/backwards/src/test/ lucene/src/java/org/apache/lucene/store/ solr
Thanks Mike. I keep forgetting about it. Dawid On Tue, Jun 14, 2011 at 2:28 PM, wrote: > Author: mikemccand > Date: Tue Jun 14 12:28:16 2011 > New Revision: 1135526 > > URL: http://svn.apache.org/viewvc?rev=1135526&view=rev > Log: > can't override interface until Java 1.6 > > Modified: > lucene/dev/branches/branch_3x/ (props changed) > lucene/dev/branches/branch_3x/lucene/ (props changed) > lucene/dev/branches/branch_3x/lucene/backwards/ (props changed) > lucene/dev/branches/branch_3x/lucene/backwards/src/test/ (props changed) > lucene/dev/branches/branch_3x/lucene/backwards/src/test-framework/ > (props changed) > > lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java > > lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java > lucene/dev/branches/branch_3x/solr/ (props changed) > > Modified: > lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java > URL: > http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java?rev=1135526&r1=1135525&r2=1135526&view=diff > == > --- > lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java > (original) > +++ > lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/InputStreamDataInput.java > Tue Jun 14 12:28:16 2011 > @@ -51,7 +51,7 @@ public class InputStreamDataInput extend > } > } > > - @Override > + // @Override -- not until Java 1.6 > public void close() throws IOException { > is.close(); > } > > Modified: > lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java > URL: > http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java?rev=1135526&r1=1135525&r2=1135526&view=diff > == > --- > 
lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java > (original) > +++ > lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/store/OutputStreamDataOutput.java > Tue Jun 14 12:28:16 2011 > @@ -39,6 +39,7 @@ public class OutputStreamDataOutput exte > os.write(b, offset, length); > } > > + // @Override -- not until Java 1.6 > public void close() throws IOException { > os.close(); > } > > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
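The commit above comments out `@Override` because, under javac 1.5, annotating a method that implements an *interface* method is a compile error; that usage only became legal in Java 6. The 3.x branch still had to compile with Java 5, hence the comment. A sketch of the shape of the affected code (hypothetical class name; the pattern matches the patched delegating wrappers):

```java
import java.io.Closeable;
import java.io.IOException;

// Under -source 1.5, putting @Override on close() below fails to compile,
// because close() implements an interface method rather than overriding a
// superclass method. From Java 6 onward the annotation is accepted.
public class StreamWrapper implements Closeable {
    private final Closeable delegate;

    public StreamWrapper(Closeable delegate) {
        this.delegate = delegate;
    }

    // @Override  -- uncomment only when compiling with -source 1.6 or later
    public void close() throws IOException {
        delegate.close();
    }
}
```

Leaving the annotation as a comment keeps the documentation value without breaking the Java 5 build.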
[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
[ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049154#comment-13049154 ] Simon Willnauer commented on LUCENE-3200: - +1 this looks awesome. Gute Arbeit Uwe :) > Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized > of powers of 2 > --- > > Key: LUCENE-3200 > URL: https://issues.apache.org/jira/browse/LUCENE-3200 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, > LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch > > > Robert and me discussed a little bit after Mike's investigations, that using > SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot > slowdowns sometimes. > We had the following ideas: > - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the > switching between buffer boundaries is done in exception catch blocks. So > normal code path is always the same like for Single* > - Only the seek method uses strange calculations (the modulo is totally > bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very > strange way of calculating modulo in the original code) > - Because of speed we suggest to no longer use arbitrary buffer sizes. We > should pass only the power of 2 to the indexinput as size. All calculations > in seek and anywhere else would be simple bit shifts and AND operations (the > and masks for the modulo can be calculated in the ctor like NumericUtils does > when calculating precisionSteps). > - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an > issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, > as it will no longer fit page boundaries and mmapping gets harder for the O/S. > We will provide a patch with those cleanups. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2551) Checking dataimport.properties for write access during startup
[ https://issues.apache.org/jira/browse/SOLR-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-2551: Attachment: SOLR-2551.patch Patch to fail imports if the data config supports delta import but the dataimport.properties file is not writable. Added a test to verify. > Checking dataimport.properties for write access during startup > -- > > Key: SOLR-2551 > URL: https://issues.apache.org/jira/browse/SOLR-2551 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Affects Versions: 1.4.1, 3.1 >Reporter: C S >Assignee: Shalin Shekhar Mangar >Priority: Minor > Attachments: SOLR-2551.patch > > > A common mistake is that the /conf (respectively the dataimport.properties) > file is not writable for solr. It would be great if that were detected on > starting a dataimport job. > Currently and import might grind away for days and fail if it can't write its > timestamp to the dataimport.properties file. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: External strings sort and case folding.
In theory, you could use the codec API directly, adding "chunks" of pre-sorted terms, and then fake up a SegmentInfo to make it look like some kind of degenerate segment, and then merge them? But it's gonna be a lot of work to do that :) Merging FSTs sounds cool! Mike McCandless http://blog.mikemccandless.com On Tue, Jun 14, 2011 at 8:18 AM, Dawid Weiss wrote: >> So actually it would work if you just enum'd the terms yourself, after >> indexing and optimizing. And this does amount to an external sort, I >> think! > > Yep. I was just curious if there's a way to do it without the overhead > of creating fields, documents, etc. If I have a spare minute I'll try > to write a merge sort from disk splits. It'd be neat to write FST > merging too (so that, given to FSTs you could merge them into one by > creating a new FST and adding sequences in order from one or the other > source). > > Dawid > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: External strings sort and case folding.
> So actually it would work if you just enum'd the terms yourself, after > indexing and optimizing. And this does amount to an external sort, I > think! Yep. I was just curious if there's a way to do it without the overhead of creating fields, documents, etc. If I have a spare minute I'll try to write a merge sort from disk splits. It'd be neat to write FST merging too (so that, given two FSTs you could merge them into one by creating a new FST and adding sequences in order from one or the other source). Dawid
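The kernel of both ideas in this thread -- a merge sort over disk splits, and merging two FSTs by interleaving their in-order enumerations -- is a plain two-way merge of sorted streams. A minimal sketch (lists stand in for the disk splits or FST enumerations; real FST merging would feed the output back into an FST builder):

```java
import java.util.ArrayList;
import java.util.List;

// Two-way merge of sorted string sources: always take the smaller head.
// TwoWayMerge is an illustrative name, not part of Lucene.
public class TwoWayMerge {
    public static List<String> merge(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            // Ties take from 'a' first, keeping the merge stable.
            if (a.get(i).compareTo(b.get(j)) <= 0) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        // Drain whichever source still has elements.
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }
}
```

For an external sort, each disk split is sorted in memory, spilled, and then the splits are merged pairwise (or k-way with a heap) exactly like this.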
[jira] [Resolved] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.
[ https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-3202. - Resolution: Fixed > Add DataInput/DataOutput subclasses that delegate to an > InputStream/OutputStream. > - > > Key: LUCENE-3202 > URL: https://issues.apache.org/jira/browse/LUCENE-3202 > Project: Lucene - Java > Issue Type: Task > Components: core/other >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3202.patch, LUCENE-3202.patch > > > Such classes would be handy for FST serialization/deserialization. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.
[ https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-3202: Attachment: LUCENE-3202.patch Updated patch, applied. > Add DataInput/DataOutput subclasses that delegate to an > InputStream/OutputStream. > - > > Key: LUCENE-3202 > URL: https://issues.apache.org/jira/browse/LUCENE-3202 > Project: Lucene - Java > Issue Type: Task > Components: core/other >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3202.patch, LUCENE-3202.patch > > > Such classes would be handy for FST serialization/deserialization. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.
[ https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049146#comment-13049146 ] Dawid Weiss commented on LUCENE-3202: - Thanks Shai. I'll add the headers, clean up this where applicable and commit in. > Add DataInput/DataOutput subclasses that delegate to an > InputStream/OutputStream. > - > > Key: LUCENE-3202 > URL: https://issues.apache.org/jira/browse/LUCENE-3202 > Project: Lucene - Java > Issue Type: Task > Components: core/other >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3202.patch > > > Such classes would be handy for FST serialization/deserialization. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: External strings sort and case folding.
On Tue, Jun 14, 2011 at 7:06 AM, Dawid Weiss wrote: >> And, if you create & index such Lucene documents, and then do a >> MatchAllDocsQuery sorting by your field, this is (unfortunately) not > > I was thinking about an optimized segment -- then the terms enum on a > given field should be sorted, right? Ahh, right. So actually it would work if you just enum'd the terms yourself, after indexing and optimizing. And this does amount to an external sort, I think! Mike McCandless http://blog.mikemccandless.com
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049143#comment-13049143 ] Martin Grotzke commented on SOLR-2583: -- {quote} See: http://www.strchr.com/multi-stage_tables i attached a patch, of a (not great) implementation i was sorta kinda trying to clean up for other reasons... maybe you can use it. {quote} Thanx, interesting approach! I just tried to create a CompactFloatArray based on the CompactByteArray to be able to compare memory consumptions. There's one change that wasn't just changing byte to float, and I'm not sure what's the right adaption in this case: {code} diff -w solr/src/java/org/apache/solr/util/CompactByteArray.java solr/src/java/org/apache/solr/util/CompactFloatArray.java 57c57 ... 202,203c202,203 < private void touchBlock(int i, int value) { < hashes[i] = (hashes[i] + (value << 1)) | 1; --- > private void touchBlock(int i, float value) { > hashes[i] = (hashes[i] + (Float.floatToIntBits(value) << 1)) | 1; {code} The adapted test is green, so it seems to be correct at least. I'll also attach the full patch for CompactFloatArray.java and TestCompactFloatArray.java > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch, patch.txt > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). 
If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
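The one non-mechanical change in Martin's byte-to-float adaptation is the block-hash update: the byte version mixes the stored value directly via `(value << 1) | 1`, but a `float` cannot be shifted, so its raw IEEE-754 bit pattern is substituted. A sketch of just that adaptation (class and method names here are illustrative, not Solr's actual `CompactByteArray` code):

```java
// Mixing a float into an int block hash, as in the adapted touchBlock().
// Float.floatToIntBits returns a stable int for every float value, so
// equal floats always perturb the hash identically -- which is all the
// block-compaction comparison needs.
public class FloatBlockHash {
    static int touch(int hash, float value) {
        return (hash + (Float.floatToIntBits(value) << 1)) | 1;
    }
}
```

Because `floatToIntBits` canonicalizes NaN to a single bit pattern, even NaN entries hash consistently, so the green test result above is plausible rather than accidental.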
[jira] [Commented] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.
[ https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049142#comment-13049142 ] Robert Muir commented on LUCENE-3202: - I agree with moving these to .store package, sorry I forgot about this in the suggest refactoring. I've had to write similar classes myself before since they were not there. > Add DataInput/DataOutput subclasses that delegate to an > InputStream/OutputStream. > - > > Key: LUCENE-3202 > URL: https://issues.apache.org/jira/browse/LUCENE-3202 > Project: Lucene - Java > Issue Type: Task > Components: core/other >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3202.patch > > > Such classes would be handy for FST serialization/deserialization. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.
[ https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049138#comment-13049138 ] Shai Erera commented on LUCENE-3202: Patch looks good. Two comments: # The files are missing the Apache License header. # In some places you use this.is / this.out and others just is/out. Can you consolidate on one (I prefer w/o this.)? > Add DataInput/DataOutput subclasses that delegate to an > InputStream/OutputStream. > - > > Key: LUCENE-3202 > URL: https://issues.apache.org/jira/browse/LUCENE-3202 > Project: Lucene - Java > Issue Type: Task > Components: core/other >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3202.patch > > > Such classes would be handy for FST serialization/deserialization. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2206) DIH MailEntityProcessor has mispelled words
[ https://issues.apache.org/jira/browse/SOLR-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-2206. -- Resolution: Fixed Fix Version/s: 4.0 Assignee: Noble Paul corrected spelling error > DIH MailEntityProcessor has mispelled words > --- > > Key: SOLR-2206 > URL: https://issues.apache.org/jira/browse/SOLR-2206 > Project: Solr > Issue Type: Improvement > Components: contrib - DataImportHandler >Reporter: Lance Norskog >Assignee: Noble Paul > Fix For: 4.0 > > > The MailEntityProcessor spells the XML attribute processAttachement with an > extra 'e'. From the archives, it appears that the two Solr fields also > spelled it this way but were fixed at some point. > Please make 'processAttachment' the standard spelling. The code should also > check for the extra-e version but it should not appear in the wiki or other > documentation. > Thanks! -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1873) Commit Solr Cloud to trunk
[ https://issues.apache.org/jira/browse/SOLR-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049132#comment-13049132 ] Noble Paul commented on SOLR-1873: -- How can we make the logic for identifying the shards pluggable? if I have a per user data stored in a given shard, the search should be performed only there. Is there an issue to track this or shall I open one? > Commit Solr Cloud to trunk > -- > > Key: SOLR-1873 > URL: https://issues.apache.org/jira/browse/SOLR-1873 > Project: Solr > Issue Type: New Feature >Affects Versions: 1.4 >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 4.0 > > Attachments: SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, > SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, > SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, > SOLR-1873.patch, SOLR-1873.patch, SOLR-1873.patch, > TEST-org.apache.solr.cloud.ZkSolrClientTest.txt, log4j-over-slf4j-1.5.5.jar, > zookeeper-3.2.2.jar, zookeeper-3.3.1.jar > > > See http://wiki.apache.org/solr/SolrCloud > This is a real hassle - I didn't merge up to trunk before all the svn > scrambling, so integrating cloud is now a bit difficult. I'm running through > and just preparing a commit by hand though (applying changes/handling > conflicts a file at a time). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: External strings sort and case folding.
> And, if you create & index such Lucene documents, and then do a > MatchAllDocsQuery sorting by your field, this is (unfortunately) not I was thinking about an optimized segment -- then the terms enum on a given field should be sorted, right? Dawid
Re: External strings sort and case folding.
Not that I know of. And, if you create & index such Lucene documents, and then do a MatchAllDocsQuery sorting by your field, this is (unfortunately) not an external sort! Ie, Lucene loads all terms data in RAM as packed byte[], for merging the per-segment results. It even does this, unnecessarily, for an optimized segment, even though we only need ords in that case (there's an issue open for this). Doing a sort-by-String-field without loading the String data even when there are multiple segments in the index would be a nice addition :) Mike McCandless http://blog.mikemccandless.com On Tue, Jun 14, 2011 at 6:31 AM, Dawid Weiss wrote: > Hi. While I was playing with automata recently, I had a use case > scenario when I could really use an external sort of a large list of > unicode strings. I know I could simply emulate this by creating > synthetic documents, index, etc., but is there a more "direct" way of > achieving this using Lucene's internals? > > Dawid > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
[ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049117#comment-13049117 ] Robert Muir commented on LUCENE-3200: - +1, great work Uwe. > Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized > of powers of 2 > --- > > Key: LUCENE-3200 > URL: https://issues.apache.org/jira/browse/LUCENE-3200 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, > LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch > > > Robert and me discussed a little bit after Mike's investigations, that using > SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot > slowdowns sometimes. > We had the following ideas: > - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the > switching between buffer boundaries is done in exception catch blocks. So > normal code path is always the same like for Single* > - Only the seek method uses strange calculations (the modulo is totally > bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very > strange way of calculating modulo in the original code) > - Because of speed we suggest to no longer use arbitrary buffer sizes. We > should pass only the power of 2 to the indexinput as size. All calculations > in seek and anywhere else would be simple bit shifts and AND operations (the > and masks for the modulo can be calculated in the ctor like NumericUtils does > when calculating precisionSteps). > - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an > issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, > as it will no longer fit page boundaries and mmapping gets harder for the O/S. > We will provide a patch with those cleanups. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
[ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049115#comment-13049115 ] Michael McCandless commented on LUCENE-3200: +1 to commit! In my stress NRT test (runs optimize on a full Wiki index with ongoing indexing / reopening), without this patch, I see performance drop substantially (like 180 QPS down to 140 QPS) when the JVM cuts over to the optimized segment. With the patch I see it jump up a bit after the optimize completes! So this seems to make hotspot's job easier... > Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized > of powers of 2 > --- > > Key: LUCENE-3200 > URL: https://issues.apache.org/jira/browse/LUCENE-3200 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, > LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch > > > Robert and me discussed a little bit after Mike's investigations, that using > SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot > slowdowns sometimes. > We had the following ideas: > - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the > switching between buffer boundaries is done in exception catch blocks. So > normal code path is always the same like for Single* > - Only the seek method uses strange calculations (the modulo is totally > bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very > strange way of calculating modulo in the original code) > - Because of speed we suggest to no longer use arbitrary buffer sizes. We > should pass only the power of 2 to the indexinput as size. All calculations > in seek and anywhere else would be simple bit shifts and AND operations (the > and masks for the modulo can be calculated in the ctor like NumericUtils does > when calculating precisionSteps). 
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an > issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, > as it will no longer fit page boundaries and mmapping gets harder for the O/S. > We will provide a patch with those cleanups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
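The arithmetic payoff described in the issue -- restricting chunk sizes to powers of two so the hot seek path needs no general modulo or division -- can be sketched in a few lines (field names are illustrative; the masks are precomputed once, as the description suggests, in the constructor):

```java
// With a power-of-two chunk size, pos % chunkSize becomes an AND against
// a precomputed mask and pos / chunkSize becomes a right shift.
// ChunkMath is a hypothetical name for illustration.
public class ChunkMath {
    final int chunkSizePower;   // e.g. 30 for 1 GiB (2^30) chunks
    final long chunkSizeMask;   // chunkSize - 1, all low bits set

    ChunkMath(int chunkSizePower) {
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1;
    }

    int bufferIndex(long pos) {
        return (int) (pos >>> chunkSizePower);   // pos / chunkSize
    }

    int bufferOffset(long pos) {
        return (int) (pos & chunkSizeMask);      // pos % chunkSize
    }
}
```

This is also why 2^31-1 was a poor maximum: it is not a power of two, so neither trick applies and mappings stop aligning with page boundaries.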
[jira] [Commented] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time
[ https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049109#comment-13049109 ] Michael McCandless commented on LUCENE-3197: One simple way to fix this would be to have IW disregard the MergePolicy if ever it asks to do a single-segment merge of a segment that had already been produced by merging for the current optimize call. But... I don't really like this, as it could be some unusual MergePolicy out there sometimes wants to do such merging. So I think a better solution, but API breaking to the MergePolicy, which is OK because it's @experimental, is to change the segmentsToOptimize argument; currently it's just a set recording which segments need to be optimized away. I think we should change it to a Map, where the Boolean indicates whether this segment had been created by a merge in the current optimize session. Then I'll fix our MPs to not cascade in such a case. > Optimize runs forever if you keep deleting docs at the same time > > > Key: LUCENE-3197 > URL: https://issues.apache.org/jira/browse/LUCENE-3197 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 3.3, 4.0 > > > Because we "cascade" merges for an optimize... if you also delete documents > while the merges are running, then the merge policy will see the resulting > single segment as still not optimized (since it has pending deletes) and do a > single-segment merge, and will repeat indefinitely (as long as your app keeps > deleting docs). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
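The fix Mike sketches -- turning `segmentsToOptimize` from a set into a map that also records whether each segment was itself produced by a merge during the current optimize call -- can be illustrated in isolation (this is NOT Lucene's MergePolicy API, just hypothetical bookkeeping showing the decision it enables):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed segmentsToOptimize bookkeeping:
// the Boolean records whether the segment was created by a merge in the
// current optimize session, so the policy can decline to cascade on it
// even though pending deletes make it look "not yet optimized".
public class OptimizeBookkeeping {
    final Map<String, Boolean> segmentsToOptimize = new HashMap<>();

    void register(String segment, boolean producedByThisOptimize) {
        segmentsToOptimize.put(segment, producedByThisOptimize);
    }

    boolean shouldCascade(String segment) {
        // A segment already produced by this optimize is never re-merged
        // by itself; otherwise concurrent deletes would loop forever.
        return !segmentsToOptimize.getOrDefault(segment, false);
    }
}
```

Keeping the decision in the policy (rather than hard-coding it in IndexWriter) preserves the flexibility Mike notes: an unusual MergePolicy that genuinely wants single-segment re-merges can still opt in.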
[jira] [Assigned] (LUCENE-3197) Optimize runs forever if you keep deleting docs at the same time
[ https://issues.apache.org/jira/browse/LUCENE-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-3197: -- Assignee: Michael McCandless > Optimize runs forever if you keep deleting docs at the same time > > > Key: LUCENE-3197 > URL: https://issues.apache.org/jira/browse/LUCENE-3197 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 3.3, 4.0 > > > Because we "cascade" merges for an optimize... if you also delete documents > while the merges are running, then the merge policy will see the resulting > single segment as still not optimized (since it has pending deletes) and do a > single-segment merge, and will repeat indefinitely (as long as your app keeps > deleting docs). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
External strings sort and case folding.
Hi. While I was playing with automata recently, I hit a use case where I could really use an external sort of a large list of Unicode strings. I know I could simply emulate this by creating synthetic documents, indexing, etc., but is there a more "direct" way of achieving this using Lucene's internals?

Dawid
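[Editor's note] The "synthetic documents" workaround can also be avoided with a plain external merge sort outside Lucene entirely. A minimal sketch follows; all class and method names here are invented for illustration, the sort order is raw UTF-16 binary order (so the case-folding part of the question would still need a Collator or collation-key step), and a real implementation would stream the merged output to a file instead of collecting it in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative external merge sort: sort bounded chunks in memory, spill each
// sorted chunk (a "run") to a temp file, then k-way merge the runs with a
// priority queue. Assumes the strings contain no newline characters.
class ExternalStringSort {

    // Reads the input, spilling one sorted run file per chunkSize strings.
    static List<Path> spillSortedRuns(Iterator<String> input, int chunkSize) {
        try {
            List<Path> runs = new ArrayList<>();
            List<String> chunk = new ArrayList<>(chunkSize);
            while (input.hasNext()) {
                chunk.add(input.next());
                if (chunk.size() == chunkSize || !input.hasNext()) {
                    Collections.sort(chunk);  // binary order; swap in a Collator for case folding
                    Path run = Files.createTempFile("sort-run", ".txt");
                    Files.write(run, chunk, StandardCharsets.UTF_8);
                    runs.add(run);
                    chunk.clear();
                }
            }
            return runs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Merges the sorted runs; the heap holds one cursor per run, ordered by
    // the cursor's current line.
    static List<String> mergeRuns(List<Path> runs) {
        try {
            PriorityQueue<RunCursor> heap = new PriorityQueue<>();
            for (Path run : runs) {
                RunCursor c = new RunCursor(Files.newBufferedReader(run, StandardCharsets.UTF_8));
                if (c.current != null) {
                    heap.add(c);
                }
            }
            List<String> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                RunCursor c = heap.poll();
                out.add(c.current);
                if (c.advance()) {
                    heap.add(c);  // re-enter the heap keyed on the new current line
                }
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One open run file plus its next pending line.
    static final class RunCursor implements Comparable<RunCursor> {
        final BufferedReader reader;
        String current;

        RunCursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.current = reader.readLine();
        }

        boolean advance() throws IOException {
            current = reader.readLine();
            return current != null;
        }

        @Override
        public int compareTo(RunCursor other) {
            return current.compareTo(other.current);
        }
    }
}
```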
[jira] [Issue Comment Edited] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.
[ https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049104#comment-13049104 ] Dawid Weiss edited comment on LUCENE-3202 at 6/14/11 10:25 AM:

A patch moving these stream delegation classes to org.apache.lucene.store. A potential bugfix is piggybacked (a potential partial read(byte[]) was not handled correctly).

was (Author: dweiss):
A patch moving these stream delegation classes to org.apache.lucene.store. A potential bug is piggybacked (potential partial read(byte[]) was not handled correctly).

> Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.
>
> Key: LUCENE-3202
> URL: https://issues.apache.org/jira/browse/LUCENE-3202
> Project: Lucene - Java
> Issue Type: Task
> Components: core/other
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Trivial
> Fix For: 3.3, 4.0
>
> Attachments: LUCENE-3202.patch
>
> Such classes would be handy for FST serialization/deserialization.
[jira] [Updated] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.
[ https://issues.apache.org/jira/browse/LUCENE-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-3202:

Attachment: LUCENE-3202.patch

A patch moving these stream delegation classes to org.apache.lucene.store. A potential bug is piggybacked (potential partial read(byte[]) was not handled correctly).
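[Editor's note] The partial-read pitfall mentioned in the patch note is easy to trip over: InputStream.read(byte[], int, int) may legally return fewer bytes than requested, so a delegating readBytes must loop until the buffer is filled. A minimal sketch of the idea follows; this is not the attached patch, and the class name and method signatures here are assumptions loosely modeled on Lucene's DataInput.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch of a DataInput-style wrapper over an InputStream. The key detail is
// that a single InputStream.read call may return fewer bytes than asked for,
// so readBytes loops until the requested length is consumed or EOF is hit.
class InputStreamDataInput {
    private final InputStream in;

    InputStreamDataInput(InputStream in) {
        this.in = in;
    }

    byte readByte() throws IOException {
        int v = in.read();
        if (v == -1) {
            throw new EOFException();
        }
        return (byte) v;
    }

    void readBytes(byte[] b, int offset, int len) throws IOException {
        while (len > 0) {
            int cnt = in.read(b, offset, len);
            if (cnt < 0) {
                throw new EOFException();  // stream ended before len bytes arrived
            }
            offset += cnt;
            len -= cnt;
        }
    }
}
```

A naive single `in.read(b, offset, len)` call would silently leave the tail of the buffer unfilled whenever the underlying stream returns a short read, which matches the bug description in the comment above.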
[jira] [Commented] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure
[ https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049100#comment-13049100 ] selckin commented on LUCENE-3190:

{code}
[junit] Testsuite: org.apache.lucene.index.TestStressIndexing2
[junit] Tests run: 50, Failures: 2, Errors: 0, Time elapsed: 11.135 sec
[junit]
[junit] - Standard Error -
[junit] java.lang.AssertionError: ram was 460248 expected: 407592 flush mem: 394568 active: 65680 pending: 0 flushing: 3 blocked: 0 peakDelta: 65959
[junit] at org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102)
[junit] at org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164)
[junit] at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
[junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1474)
[junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1446)
[junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723)
[junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757)
[junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testMultiConfig -Dtests.seed=2571834029692482827:-8116419692655152763
[junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testMultiConfig -Dtests.seed=2571834029692482827:-8116419692655152763
[junit] The following exceptions were thrown by threads:
[junit] *** Thread: Thread-793 ***
[junit] junit.framework.AssertionFailedError: java.lang.AssertionError: ram was 460248 expected: 407592 flush mem: 394568 active: 65680 pending: 0 flushing: 3 blocked: 0 peakDelta: 65959
[junit] at junit.framework.Assert.fail(Assert.java:47)
[junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762)
[junit] NOTE: test params are: codec=RandomCodecProvider: {f34=MockRandom, f33=Standard, f32=Standard, f31=MockSep, f30=Pulsing(freqCutoff=7),
[junit]   f39=Standard, f38=MockSep, f37=Pulsing(freqCutoff=7), f36=MockFixedIntBlock(blockSize=649), f35=MockVariableIntBlock(baseBlockSize=9),
[junit]   f43=MockFixedIntBlock(blockSize=649), f42=MockVariableIntBlock(baseBlockSize=9), f45=MockSep, f44=Pulsing(freqCutoff=7), f41=MockRandom,
[junit]   f40=Standard, f47=Standard, f46=Standard, f49=MockVariableIntBlock(baseBlockSize=9), f48=MockRandom, f6=Pulsing(freqCutoff=7), f7=MockSep,
[junit]   f8=Standard, f9=MockVariableIntBlock(baseBlockSize=9), f12=Standard, f11=Standard, f10=MockSep, f16=Pulsing(freqCutoff=7),
[junit]   f15=MockFixedIntBlock(blockSize=649), f14=MockVariableIntBlock(baseBlockSize=9), f13=MockRandom, f19=Standard, f18=Standard, f17=MockSep,
[junit]   f1=MockFixedIntBlock(blockSize=649), f0=MockVariableIntBlock(baseBlockSize=9), f3=MockSep, f2=Pulsing(freqCutoff=7),
[junit]   f5=MockVariableIntBlock(baseBlockSize=9), f4=Standard, f21=MockVariableIntBlock(baseBlockSize=9), f20=MockRandom, f23=Pulsing(freqCutoff=7),
[junit]   f22=MockFixedIntBlock(blockSize=649), f25=Standard, f24=MockSep, f27=MockRandom, f26=Standard, f29=MockFixedIntBlock(blockSize=649),
[junit]   f28=MockVariableIntBlock(baseBlockSize=9), f98=MockVariableIntBlock(baseBlockSize=9), f97=MockRandom, f99=MockFixedIntBlock(blockSize=649),
[junit]   f94=MockSep, f93=Pulsing(freqCutoff=7), f96=Standard, f95=Standard, f79=Pulsing(freqCutoff=7), f77=MockVariableIntBlock(baseBlockSize=9),
[junit]   f78=MockFixedIntBlock(blockSize=649), f75=Standard, f76=MockRandom, f73=MockSep, f74=Standard, f71=MockFixedIntBlock(blockSize=649),
[junit]   f72=Pulsing(freqCutoff=7), f81=MockVariableIntBlock(baseBlockSize=9), f80=MockRandom, f86=Pulsing(freqCutoff=7), f87=MockSep, f88=Standard,
[junit]   f89=Standard, f82=Standard, f83=MockRandom, f84=MockVariableIntBlock(baseBlockSize=9), f85=MockFixedIntBlock(blockSize=649),
[junit]   f90=Pulsing(freqCutoff=7), f92=Standard, f91=MockSep, f59=MockSep, f57=MockFixedIntBlock(blockSize=649), f58=Pulsing(freqCutoff=7),
[junit]   f51=Pulsing(freqCutoff=7), f52=MockSep, f50=MockFixedIntBlock(blockSize=649), f55=MockRandom, f56=MockVariableIntBlock(baseBlockSize=9),
[junit]   f53=Standard, f54=Standard, id=MockFixedIntBlock(blockSize=649), f68=Standard, f69=MockRandom, f60=Standard, f61=Standard, f62=MockRandom,
[junit]   f63=MockVariableIntBlock(baseBlockSize=9), f64=MockFixedIntBlock(blockSize=649), f65=Pulsing(freqCutoff=7), f66=MockSep, f67=Standard,
[junit]   f70=MockSep}, locale=en_SG, timezone=Europe/Dublin
[junit] NOTE: all tests run in this JVM:
[junit] [TestStressIndexing2]
[junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_26 (64-bit)/cpus=8,threads=1,free=61143744,total=147718144
[junit] - ---
[junit] Testcase: testMult
{code}
[jira] [Created] (LUCENE-3202) Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.
Add DataInput/DataOutput subclasses that delegate to an InputStream/OutputStream.

Key: LUCENE-3202
URL: https://issues.apache.org/jira/browse/LUCENE-3202
Project: Lucene - Java
Issue Type: Task
Components: core/other
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
Fix For: 3.3, 4.0

Such classes would be handy for FST serialization/deserialization.
[jira] [Commented] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure
[ https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049095#comment-13049095 ] Simon Willnauer commented on LUCENE-3190:

{noformat}
[junit] Testcase: testRandom(org.apache.lucene.index.TestStressIndexing2): FAILED
[junit] Some threads threw uncaught exceptions!
[junit] junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
[junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603)
[junit]
[junit]
[junit] Tests run: 3, Failures: 2, Errors: 0, Time elapsed: 9.243 sec
[junit]
[junit] - Standard Error -
[junit] java.lang.AssertionError: ram was 462219 expected: 409920 flush mem: 396467 active: 65752
[junit] at org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102)
[junit] at org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164)
[junit] at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
[junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1474)
[junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1446)
[junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723)
[junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757)
[junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testRandom -Dtests.seed=-3081198538389112044:6990165845273194870 -Dtests.multiplier=3
{noformat}

Jenkins just tripped a similar issue. The problem here seems related to a very low RAM buffer together with flushing by docCount. I have not been able to reproduce it yet; each time this fails, the RAM buffer is 0.1 MB and maxBufferedDocs is 3, so something seems to break the assert if we flush by doc count and don't necessarily take the largest DWPT out of the loop. selckin, can you reproduce these errors? I just added some more info to the assert, so if you run into it, can you paste the output?

> TestStressIndexing2 testMultiConfig failure
>
> Key: LUCENE-3190
> URL: https://issues.apache.org/jira/browse/LUCENE-3190
> Project: Lucene - Java
> Issue Type: Bug
> Reporter: selckin
> Assignee: Simon Willnauer
>
> trunk: r1134311
> reproducible
> {code}
> [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2
> [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec
> [junit]
> [junit] - Standard Error -
> [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush mem: 395100 active: 65808
> [junit] at org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102)
> [junit] at org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164)
> [junit] at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380)
> [junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
> [junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445)
> [junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723)
> [junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757)
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testMultiConfig -Dtests.seed=2571834029692482827:-8116419692655152763
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testMultiConfig -Dtests.seed=2571834029692482827:-8116419692655152763
> [junit] The following exceptions were thrown by threads:
> [junit] *** Thread: Thread-0 ***
> [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: ram was 460908 expected: 408216 flush mem: 395100 active: 65808
> [junit] at junit.framework.Assert.fail(Assert.java:47)
> [junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762)
> [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, f40=Standard, f62=MockRandom, f75=Stand
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 8823 - Failure
Ah, this is tripping an assert I added a couple of weeks ago. We already have an issue for this here: https://issues.apache.org/jira/browse/LUCENE-3190

The problem here seems related to a very low RAM buffer together with flushing by docCount. I have not been able to reproduce it yet; each time this fails, the RAM buffer is 0.1 MB and maxBufferedDocs is 3, so something seems to break the assert if we flush by doc count and don't necessarily take the largest DWPT out of the loop. I will dig further.

simon

On Tue, Jun 14, 2011 at 9:16 AM, Apache Jenkins Server wrote:
> Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8823/
>
> 1 tests failed.
> REGRESSION: org.apache.lucene.index.TestStressIndexing2.testRandom
>
> Error Message:
> r1.numDocs()=9 vs r2.numDocs()=8
>
> Stack Trace:
> junit.framework.AssertionFailedError: r1.numDocs()=9 vs r2.numDocs()=8
> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
> at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
> at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308)
> at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278)
> at org.apache.lucene.index.TestStressIndexing2.testRandom(TestStressIndexing2.java:88)
>
> Build Log (for compile errors):
> [...truncated 3276 lines...]