Re: set PYTHONPATH programmatically from Java?
hi,

so after reading http://docs.python.org/c-api/init.html#PySys_SetArgvEx and the source code for _PythonVM_init I figured it out. I have to do:

    PythonVM.start("/dvt/workspace/montysolr/src/python/montysolr");

and sys.path then contains the parent folder (above montysolr), and I can then set more things by loading some bootstrap module. But something like http://docs.python.org/c-api/veryhigh.html#PyRun_SimpleString would be much more flexible. Is it something that could be added? I can prepare a patch (as it seems really trivial, my knowledge might be sufficient for this :))

roman

On Mon, Nov 14, 2011 at 1:12 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> On Mon, Nov 14, 2011 at 4:25 AM, Andi Vajda <va...@apache.org> wrote:
>> On Sun, 13 Nov 2011, Roman Chyla wrote:
>>> I am using JCC to run Python inside Java. For unit tests, I'd like to set
>>> the PYTHONPATH environment variable programmatically. I can change env vars
>>> inside Java (using
>>> http://stackoverflow.com/questions/318239/how-do-i-set-environment-variables-from-java)
>>> and System.getenv("PYTHONPATH") shows the correct values. However, I am
>>> still getting "ImportError: no module named". If I set PYTHONPATH before
>>> starting the unit test, it works fine. Is what I would like to do possible?
>>
>> Why mess with the environment instead of setting sys.path directly instead?
>
> That would be great, but I don't know how. I am doing roughly this:
>
>     PythonVM.start(programName);
>     vm = PythonVM.get();
>     vm.instantiate(moduleName, className);
>
> I tried also:
>
>     PythonVM.start(programName, new String[]{"-c",
>         "import sys;sys.path.insert(0, '/dvt/workspace/montysolr/src/python')"});
>
> but it is failing on vm.instantiate when Python cannot find the module.
>
>>> Alternatively, if JCC could execute/eval a Python string, I could set
>>> sys.argv that way.
>>
>> I'm not sure what you mean here but JCC's Java PythonVM.init() method takes
>> an array of strings that is fed into sys.argv. See _PythonVM_Init() sources
>> in jcc.cpp for details.
>
> sorry, I meant sys.path, not sys.argv
>
> roman
>
>> Andi..
[jira] [Updated] (LUCENE-3269) Speed up Top-K sampling tests
[ https://issues.apache.org/jira/browse/LUCENE-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3269:
-------------------------------

    Attachment: LUCENE-3269.patch

Patch introduces the following:
* A HashMap<Integer, SearchTaxoDirPair> which is initialized in beforeClass and maps a partition size to the pair of Directories.
* initIndex first checks the map for the partition size, and creates the indexes only if no matching pair is found.

The sampling tests do not benefit from that directly, as they only run one test method; however, if they run in the same JVM, they will reuse the already created indexes.

Patch is against 3x and seems trivial, so I intend to commit this later today or tomorrow if there are no objections.

> Speed up Top-K sampling tests
> -----------------------------
>
>         Key: LUCENE-3269
>         URL: https://issues.apache.org/jira/browse/LUCENE-3269
>     Project: Lucene - Java
>  Issue Type: Test
>  Components: modules/facet
>    Reporter: Robert Muir
>     Fix For: 3.5, 4.0
> Attachments: LUCENE-3269.patch, LUCENE-3269.patch, LUCENE-3269.patch, LUCENE-3269.patch
>
> speed up the top-k sampling tests (but make sure they are still thorough on nightly etc). usually we would do this with use of atLeast(), but these tests are somewhat tricky, so maybe a different approach is needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-2571) Indexing performance tests with realtime branch
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-2571.
-------------------------------------

    Resolution: Fixed

> Indexing performance tests with realtime branch
> -----------------------------------------------
>
>         Key: LUCENE-2571
>         URL: https://issues.apache.org/jira/browse/LUCENE-2571
>     Project: Lucene - Java
>  Issue Type: Task
>  Components: core/index
>    Reporter: Michael Busch
>    Assignee: Simon Willnauer
>    Priority: Minor
>     Fix For: Realtime Branch
> Attachments: wikimedium.realtime.Standard.nd10M_dps.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, wikimedium.trunk.Standard.nd10M_dps.png, wikimedium.trunk.Standard.nd10M_dps_BalancedSegmentMergePolicy.png, wikimedium.trunk.Standard.nd10M_dps_addDocuments.png
>
> We should run indexing performance tests with the DWPT changes and compare to trunk. We need to test both single-threaded and multi-threaded performance. NOTE: flush by RAM isn't implemented just yet, so either we wait with the tests or flush by doc count.
[jira] [Commented] (LUCENE-3562) Stop storing TermsEnum in CloseableThreadLocal inside Terms instance
[ https://issues.apache.org/jira/browse/LUCENE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149531#comment-13149531 ]

Simon Willnauer commented on LUCENE-3562:
-----------------------------------------

mike, I think you should commit this - patch looks good to me

> Stop storing TermsEnum in CloseableThreadLocal inside Terms instance
> --------------------------------------------------------------------
>
>         Key: LUCENE-3562
>         URL: https://issues.apache.org/jira/browse/LUCENE-3562
>     Project: Lucene - Java
>  Issue Type: Improvement
>    Reporter: Michael McCandless
>    Assignee: Michael McCandless
>     Fix For: 4.0
> Attachments: LUCENE-3562.patch
>
> We have sugar methods in Terms.java (docFreq, totalTermFreq, docs, docsAndPositions) that use a saved thread-private TermsEnum to do the lookups. But on apps that send many threads through Lucene, and/or have many segments, this can add up to a lot of RAM, especially if the codec's impl holds onto stuff. Also, Terms has a close method (closes the CloseableThreadLocal) which must be called, but we fail to do so in some places.
> These saved enums are the cause of the recent OOME in TestNRTManager (TestNRTManager.testNRTManager -seed 2aa27e1aec20c4a2:-4a5a5ecf46837d0e:-7c4f651f1f0b75d7 -mult 3 -nightly).
> Really, sharing these enums is a holdover from before Lucene queries would share state (i.e., save the TermState from the first pass, and use it later to pull enums, get docFreq, etc.). It's not helpful anymore, and it can use gobs of RAM, so I'd like to remove it.
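The multiplication Mike describes (threads × segments, each combination holding its own cached enum) is easy to demonstrate outside Lucene. A toy sketch, not Lucene code — the `byte[1024]` stands in for whatever per-thread state a codec implementation might hold:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalBlowup {
    static final AtomicInteger ALLOCATIONS = new AtomicInteger();

    // Stands in for a Terms instance caching a thread-private enum.
    static class Terms {
        final ThreadLocal<byte[]> cachedEnum = ThreadLocal.withInitial(() -> {
            ALLOCATIONS.incrementAndGet();
            return new byte[1024]; // stand-in for codec state held per thread
        });
        void lookup() { cachedEnum.get(); } // sugar method à la docFreq()
    }

    /** Returns how many cached values were created: one per (thread, instance) pair. */
    public static int cachedBuffers(int segments, int threads) {
        ALLOCATIONS.set(0);
        List<Terms> perSegment = new ArrayList<>();
        for (int i = 0; i < segments; i++) perSegment.add(new Terms());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread th = new Thread(() -> perSegment.forEach(Terms::lookup));
            workers.add(th);
            th.start();
        }
        for (Thread th : workers) {
            try { th.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return ALLOCATIONS.get(); // = segments * threads
    }
}
```

With 20 segments and 50 search threads that is already 1000 cached objects, which is why removing the saved enums helps.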
[jira] [Resolved] (LUCENE-1271) ClassCastException when using ParallelMultiSearcher.search(Query query, Filter filter, int n, Sort sort)
[ https://issues.apache.org/jira/browse/LUCENE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-1271.
-------------------------------------

    Resolution: Won't Fix

ParallelMultiSearcher is deprecated; use IndexSearcher instead.

> ClassCastException when using ParallelMultiSearcher.search(Query query, Filter filter, int n, Sort sort)
> ---------------------------------------------------------------------------------------------------------
>
>              Key: LUCENE-1271
>              URL: https://issues.apache.org/jira/browse/LUCENE-1271
>          Project: Lucene - Java
>       Issue Type: Bug
>       Components: core/search
> Affects Versions: 2.3, 2.3.1
>      Environment: MS Windows XP (SP 2), JDK 1.5.0 Update 12
>         Reporter: Kai Burjack
>         Priority: Minor
>          Fix For: 4.0
>
> Stacktrace output in console:
>
> Exception in thread "MultiSearcher thread #1" java.lang.ClassCastException: org.apache.lucene.search.ScoreDoc
>     at org.apache.lucene.search.FieldDocSortedHitQueue.lessThan(FieldDocSortedHitQueue.java:105)
>     at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:139)
>     at org.apache.lucene.util.PriorityQueue.put(PriorityQueue.java:53)
>     at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:78)
>     at org.apache.lucene.util.PriorityQueue.insert(PriorityQueue.java:63)
>     at org.apache.lucene.search.MultiSearcherThread.run(ParallelMultiSearcher.java:272)
> Exception in thread "MultiSearcher thread #2" java.lang.ClassCastException: org.apache.lucene.search.ScoreDoc
>     at org.apache.lucene.search.FieldDocSortedHitQueue.lessThan(FieldDocSortedHitQueue.java:105)
>     at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:139)
>     at org.apache.lucene.util.PriorityQueue.put(PriorityQueue.java:53)
>     at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:78)
>     at org.apache.lucene.util.PriorityQueue.insert(PriorityQueue.java:63)
>     at org.apache.lucene.search.MultiSearcherThread.run(ParallelMultiSearcher.java:272)
>
> Stack trace in the resulting exception while performing the JUnit test:
>
> java.lang.ClassCastException: org.apache.lucene.search.ScoreDoc
>     at org.apache.lucene.search.FieldDocSortedHitQueue.lessThan(FieldDocSortedHitQueue.java:105)
>     at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:155)
>     at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:106)
>     at org.apache.lucene.search.ParallelMultiSearcher.search(ParallelMultiSearcher.java:146)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:78)
>     at [class calling the Searcher.search(Query query, Filter filter, int n, Sort sort) method with filter:null and sort:null]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>     at java.lang.reflect.Method.invoke(Unknown Source)
>     at junit.framework.TestCase.runTest(TestCase.java:154)
>     at junit.framework.TestCase.runBare(TestCase.java:127)
>     at junit.framework.TestResult$1.protect(TestResult.java:106)
>     at junit.framework.TestResult.runProtected(TestResult.java:124)
>     at junit.framework.TestResult.run(TestResult.java:109)
>     at junit.framework.TestCase.run(TestCase.java:118)
>     at junit.framework.TestSuite.runTest(TestSuite.java:208)
>     at junit.framework.TestSuite.run(TestSuite.java:203)
>     at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
>     at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
[jira] [Resolved] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads
[ https://issues.apache.org/jira/browse/LUCENE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3428.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
    Lucene Fields: New, Patch Available (was: New)

fixed

> trunk tests hang/deadlock TestIndexWriterWithThreads
> ----------------------------------------------------
>
>         Key: LUCENE-3428
>         URL: https://issues.apache.org/jira/browse/LUCENE-3428
>     Project: Lucene - Java
>  Issue Type: Bug
>    Reporter: Robert Muir
>    Assignee: Simon Willnauer
>     Fix For: 4.0
> Attachments: LUCENE-3428.patch
>
> trunk tests have been hanging often lately in hudson; this time I was careful to kill and get a good stacktrace:
[jira] [Updated] (LUCENE-3425) NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control
[ https://issues.apache.org/jira/browse/LUCENE-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3425:
------------------------------------

    Affects Version/s: 4.0
                       3.4
        Fix Version/s: 4.0
                       3.5

> NRT Caching Dir to allow for exact memory usage, better buffer allocation and global cross indices control
> -----------------------------------------------------------------------------------------------------------
>
>              Key: LUCENE-3425
>              URL: https://issues.apache.org/jira/browse/LUCENE-3425
>          Project: Lucene - Java
>       Issue Type: Improvement
>       Components: core/index
> Affects Versions: 3.4, 4.0
>         Reporter: Shay Banon
>          Fix For: 3.5, 4.0
>
> A discussion on IRC raised several improvements that can be made to the NRT caching dir. Some of the problems it currently has are:
> 1. Not explicitly controlling the memory usage, which can result in overusing memory (for example, large new segments being committed because refreshing is too far behind).
> 2. Heap fragmentation because of constant allocation of (probably promoted to old gen) byte buffers.
> 3. Not being able to control the memory usage across indices for multi-index usage within a single JVM.
> A suggested solution (which still needs to be ironed out) is to have a BufferAllocator that controls allocation of byte[], and allows returning unused byte[] to it. It will have a cap on the size of memory it allows to be allocated. The NRT caching dir will use the allocator, which can either be provided (for usage across several indices) or created internally. The caching dir will also create a wrapped IndexOutput that will flush to the main dir if the allocator can no longer provide byte[] (exhausted). When a file is flushed from the cache to the main directory, it will return all the currently allocated byte[] to the BufferAllocator to be reused by other files.
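As a rough illustration of the suggested (still-to-be-ironed-out) solution, a minimal BufferAllocator might look like the sketch below. The class and method names here are assumptions, not the final API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the proposed BufferAllocator: hands out fixed-size byte[] blocks
// up to a hard cap, and lets callers return blocks for reuse, so the caching
// dir stops churning the old generation with fresh byte buffers.
public class BufferAllocator {
    private final int blockSize;
    private final long maxBytes;
    private long allocatedBytes;
    private final Deque<byte[]> freeList = new ArrayDeque<>();

    public BufferAllocator(int blockSize, long maxBytes) {
        this.blockSize = blockSize;
        this.maxBytes = maxBytes;
    }

    /** Returns a block, or null if the cap is exhausted -- the caller would
     *  then flush to the main directory, as the issue suggests. */
    public synchronized byte[] allocate() {
        byte[] reused = freeList.pollFirst();
        if (reused != null) return reused;
        if (allocatedBytes + blockSize > maxBytes) return null; // exhausted
        allocatedBytes += blockSize;
        return new byte[blockSize];
    }

    /** Returns a block to the pool so other files can reuse it. */
    public synchronized void release(byte[] block) {
        freeList.addFirst(block);
    }
}
```

Sharing one such instance across several NRT caching directories is what would give the cross-indices memory control the issue asks for.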
[jira] [Commented] (LUCENE-3453) remove IndexDocValuesField
[ https://issues.apache.org/jira/browse/LUCENE-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149536#comment-13149536 ]

Simon Willnauer commented on LUCENE-3453:
-----------------------------------------

hey chris, what is the status here?

> remove IndexDocValuesField
> --------------------------
>
>              Key: LUCENE-3453
>              URL: https://issues.apache.org/jira/browse/LUCENE-3453
>          Project: Lucene - Java
>       Issue Type: Bug
> Affects Versions: 4.0
>         Reporter: Robert Muir
>         Assignee: Chris Male
>          Fix For: 4.0
>
> It's confusing how we present CSF functionality to the user; it's actually not a field but an attribute of a field, like STORED or INDEXED. Otherwise, it's really hard to think about CSF because there is a mismatch between the APIs and the index format.
[jira] [Updated] (LUCENE-3453) remove IndexDocValuesField
[ https://issues.apache.org/jira/browse/LUCENE-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3453:
------------------------------------

    Fix Version/s: 4.0

> remove IndexDocValuesField
> --------------------------
>
>              Key: LUCENE-3453
>              URL: https://issues.apache.org/jira/browse/LUCENE-3453
>          Project: Lucene - Java
>       Issue Type: Bug
> Affects Versions: 4.0
>         Reporter: Robert Muir
>         Assignee: Chris Male
>          Fix For: 4.0
>
> It's confusing how we present CSF functionality to the user; it's actually not a field but an attribute of a field, like STORED or INDEXED. Otherwise, it's really hard to think about CSF because there is a mismatch between the APIs and the index format.
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149540#comment-13149540 ]

Noble Paul commented on SOLR-2382:
----------------------------------

committed, svn revision 121659. Thanks James

> DIH Cache Improvements
> ----------------------
>
>         Key: SOLR-2382
>         URL: https://issues.apache.org/jira/browse/SOLR-2382
>     Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>    Reporter: James Dyer
>    Priority: Minor
> Attachments: SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-properties.patch, SOLR-2382-properties.patch, SOLR-2382-solrwriter-verbose-fix.patch, SOLR-2382-solrwriter.patch, SOLR-2382-solrwriter.patch, SOLR-2382-solrwriter.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch
>
> Functionality:
> 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application.
> 2. Provide a means to temporarily cache a child entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor).
> 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an entity input. Also provide the ability to do delta updates on such persistent caches.
> 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr shards, or to the same core in parallel.
>
> Use Cases:
> 1. We needed a flexible, scalable way to temporarily cache child-entity data prior to joining to parent entities.
>    - Using SqlEntityProcessor with child entities can cause an n+1 select problem.
>    - CachedSqlEntityProcessor only supports an in-memory HashMap as a caching mechanism and does not scale.
>    - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
> 2. We needed the ability to gather data from long-running entities by a process that runs separately from our main indexing process.
> 3. We wanted the ability to do a delta import of only the entities that changed.
>    - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed.
>    - Our data comes from 50+ complex sql queries and/or flat files.
>    - We do not want to incur the overhead of re-gathering all of this data if only 1 entity's data changed.
>    - Persistent DIH caches solve this problem.
> 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the "threads" parameter).
> 5. In the future, we may need to use shards, creating a need to easily partition our source data into shards.
>
> Implementation Details:
> 1. De-couple EntityProcessorBase from caching.
>    - Created a new interface, DIHCache, and two implementations:
>      - SortedMapBackedCache - an in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated).
>      - BerkleyBackedCache - a disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar
>        - NOTE: the existing Lucene contrib "db" project uses je-3.3.93.jar. I believe this may be incompatible due to generics usage.
>        - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html
> 2. Allow entity processors to take a "cacheImpl" parameter to cause the entity data to be cached (see EntityProcessorBase & DIHCacheProperties).
> 3. Partially de-couple SolrWriter from DocBuilder.
>    - Created a new interface, DIHWriter, and two implementations:
>      - SolrWriter (refactored)
>      - DIHCacheWriter (allows DIH to write ultimately to a cache).
> 4. Create a new entity processor, DIHCacheProcessor, which reads a persistent cache as DIH entity input.
> 5. Support a "partition" parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data.
> 6. Change the semantics of entity.destroy().
>    - Previously, it was being called on each iteration of DocBuilder.buildDocument().
>    - Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed.
>    - The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change.
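A minimal sketch of what the de-coupled cache abstraction could look like. The names DIHCache and SortedMapBackedCache follow the description above, but the method signatures here are my guesses, not the actual patch API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a pluggable DIH-style cache: a key maps to one or more row maps
// (field name -> value), and the whole cache can be iterated as entity input.
interface DIHCache extends Iterable<Map<String, Object>> {
    void add(Object key, Map<String, Object> row);
    List<Map<String, Object>> lookup(Object key);
    void close(); // one-time cleanup, e.g. deleting a disk-backed store
}

// In-memory implementation in the spirit of SortedMapBackedCache.
class SortedMapBackedCache implements DIHCache {
    private final TreeMap<Object, List<Map<String, Object>>> data = new TreeMap<>();

    public void add(Object key, Map<String, Object> row) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
    }

    public List<Map<String, Object>> lookup(Object key) {
        List<Map<String, Object>> rows = data.get(key);
        return rows == null ? new ArrayList<>() : rows;
    }

    public Iterator<Map<String, Object>> iterator() {
        List<Map<String, Object>> all = new ArrayList<>();
        for (List<Map<String, Object>> rows : data.values()) all.addAll(rows);
        return all.iterator();
    }

    public void close() { data.clear(); }
}
```

A disk-backed implementation (the BerkleyBackedCache of the patch) would plug in behind the same interface, which is the point of the de-coupling.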
[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149541#comment-13149541 ]

Simon Willnauer commented on LUCENE-3396:
-----------------------------------------

chris, this seems to be done, no? Can you close it?

> Make TokenStream Reuse Mandatory for Analyzers
> ----------------------------------------------
>
>         Key: LUCENE-3396
>         URL: https://issues.apache.org/jira/browse/LUCENE-3396
>     Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>    Reporter: Chris Male
> Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch, LUCENE-3396-remaining-merging.patch
>
> In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams. This is a big chunk of work, but it's time to bite the bullet. I plan to attack this in the following way:
> - Collapse the logic of ReusableAnalyzerBase into Analyzer
> - Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
> - Convert all Analyzers over to using TokenStreamComponents. I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
> - Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).
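The ReuseStrategy abstraction in the plan above could be sketched roughly as follows. This is heavily simplified and the names of the two concrete strategies are assumptions: real TokenStreamComponents carry analysis state, and the actual Lucene API may differ:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified sketch of the ReuseStrategy idea: decide where the cached
// per-thread components live -- one slot shared by all fields, or one
// slot per field name.
abstract class ReuseStrategy<T> {
    abstract T get(String fieldName, Supplier<T> factory);
}

// Global reuse: every field on a thread shares the same components.
class GlobalReuseStrategy<T> extends ReuseStrategy<T> {
    private final ThreadLocal<T> slot = new ThreadLocal<>();
    T get(String fieldName, Supplier<T> factory) {
        T components = slot.get();
        if (components == null) {
            components = factory.get();
            slot.set(components);
        }
        return components;
    }
}

// Per-field reuse: components are created once per (thread, field) pair.
class PerFieldReuseStrategy<T> extends ReuseStrategy<T> {
    private final ThreadLocal<Map<String, T>> slots = ThreadLocal.withInitial(HashMap::new);
    T get(String fieldName, Supplier<T> factory) {
        return slots.get().computeIfAbsent(fieldName, f -> factory.get());
    }
}
```

The per-field variant matters for analyzers whose token streams are stateful per field (e.g. different stemming per field), which the global strategy cannot express.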
[jira] [Updated] (LUCENE-2949) FastVectorHighlighter FieldTermStack could likely benefit from using TermVectorMapper
[ https://issues.apache.org/jira/browse/LUCENE-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-2949:
-----------------------------------

    Assignee: (was: Koji Sekiguchi)

> FastVectorHighlighter FieldTermStack could likely benefit from using TermVectorMapper
> -------------------------------------------------------------------------------------
>
>              Key: LUCENE-2949
>              URL: https://issues.apache.org/jira/browse/LUCENE-2949
>          Project: Lucene - Java
>       Issue Type: Improvement
> Affects Versions: 3.0.3, 4.0
>         Reporter: Grant Ingersoll
>         Priority: Minor
>           Labels: FastVectorHighlighter, Highlighter
>          Fix For: 3.5, 4.0
>      Attachments: LUCENE-2949.patch
>
> Based on my reading of the FieldTermStack constructor that loads the vector from disk, we could probably save a bunch of time and memory by using the TermVectorMapper callback mechanism instead of materializing the full array of terms into memory and then throwing most of them out.
[jira] [Commented] (LUCENE-2949) FastVectorHighlighter FieldTermStack could likely benefit from using TermVectorMapper
[ https://issues.apache.org/jira/browse/LUCENE-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149542#comment-13149542 ]

Koji Sekiguchi commented on LUCENE-2949:
----------------------------------------

Cool, I like the idea! But I don't have much time to try it now, so I'll unassign myself.

> FastVectorHighlighter FieldTermStack could likely benefit from using TermVectorMapper
> -------------------------------------------------------------------------------------
>
>              Key: LUCENE-2949
>              URL: https://issues.apache.org/jira/browse/LUCENE-2949
>          Project: Lucene - Java
>       Issue Type: Improvement
> Affects Versions: 3.0.3, 4.0
>         Reporter: Grant Ingersoll
>         Assignee: Koji Sekiguchi
>         Priority: Minor
>           Labels: FastVectorHighlighter, Highlighter
>          Fix For: 3.5, 4.0
>      Attachments: LUCENE-2949.patch
>
> Based on my reading of the FieldTermStack constructor that loads the vector from disk, we could probably save a bunch of time and memory by using the TermVectorMapper callback mechanism instead of materializing the full array of terms into memory and then throwing most of them out.
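The callback idea amounts to filtering terms while the vector is being read, instead of materializing the full array first and discarding most of it. A schematic sketch with a toy visitor interface (this is not the actual TermVectorMapper API, just the shape of the pattern):

```java
import java.util.ArrayList;
import java.util.List;

public class VectorCallback {
    // Toy stand-in for a TermVectorMapper-style callback.
    interface TermVisitor {
        void onTerm(String term, int position); // invoked per term while reading
    }

    // The "reader" streams terms to the visitor; nothing is materialized here.
    static void readVector(String[] storedVector, TermVisitor visitor) {
        for (int pos = 0; pos < storedVector.length; pos++) {
            visitor.onTerm(storedVector[pos], pos);
        }
    }

    // The caller keeps only what it needs -- e.g. positions of one query term --
    // rather than receiving the whole term array and throwing most of it out.
    public static List<Integer> positionsOf(String[] vector, String wanted) {
        List<Integer> hits = new ArrayList<>();
        readVector(vector, (term, pos) -> {
            if (term.equals(wanted)) hits.add(pos);
        });
        return hits;
    }
}
```

For a highlighter that only cares about query terms, the memory saving is the difference between the full vector and the handful of matching positions.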
[jira] [Commented] (LUCENE-3496) Support grouping by IndexDocValues
[ https://issues.apache.org/jira/browse/LUCENE-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149543#comment-13149543 ]

Simon Willnauer commented on LUCENE-3496:
-----------------------------------------

Martijn, the last patch looks OK to me. You should go ahead and commit this...

> Support grouping by IndexDocValues
> ----------------------------------
>
>         Key: LUCENE-3496
>         URL: https://issues.apache.org/jira/browse/LUCENE-3496
>     Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/grouping
>    Reporter: Martijn van Groningen
>    Assignee: Martijn van Groningen
>     Fix For: 4.0
> Attachments: LUCENE-3496.patch, LUCENE-3496.patch, LUCENE-3496.patch, LUCENE-3496.patch, LUCENE-3496.patch, LUCENE-3496.patch, LUCENE-3496.patch
>
> Although IDV is not yet finalized (more particularly the SortedSource), I think we can already discuss / investigate implementing grouping by IDV.
[jira] [Commented] (LUCENE-3509) Add settings to IWC to optimize IDV indices for CPU or RAM respectively
[ https://issues.apache.org/jira/browse/LUCENE-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149545#comment-13149545 ]

Simon Willnauer commented on LUCENE-3509:
-----------------------------------------

bq. I think fasterButMoreRam is fine, since it is a dv codec parameter now.

+1, go ahead

> Add settings to IWC to optimize IDV indices for CPU or RAM respectively
> -----------------------------------------------------------------------
>
>              Key: LUCENE-3509
>              URL: https://issues.apache.org/jira/browse/LUCENE-3509
>          Project: Lucene - Java
>       Issue Type: Improvement
>       Components: core/index
> Affects Versions: 4.0
>         Reporter: Simon Willnauer
>         Priority: Minor
>          Fix For: 4.0
>      Attachments: LUCENE-3509.patch, LUCENE-3509.patch
>
> spinoff from LUCENE-3496 - we are seeing much better performance if the required bits for PackedInts are rounded up to 8/16/32/64. We should add this option to IWC and default to round up, i.e. more RAM, faster lookups.
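The rounding itself is simple arithmetic: take the number of bits a PackedInts structure actually needs and bump it to the next native width, so reads land on byte/short/int/long boundaries instead of straddling them. A sketch (the class and method names are mine, not Lucene's):

```java
public class PackedBits {
    // Round a required bit width up to the nearest native width (8/16/32/64),
    // trading extra RAM for lookups that avoid cross-boundary bit shifting.
    public static int roundUp(int bitsRequired) {
        if (bitsRequired <= 8) return 8;
        if (bitsRequired <= 16) return 16;
        if (bitsRequired <= 32) return 32;
        return 64;
    }
}
```

So a field needing, say, 11 bits per value would be stored at 16 bits under the fasterButMoreRam default: roughly 45% more RAM for that field, but boundary-aligned reads.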
[jira] [Closed] (LUCENE-3379) jre crashes in ArrayUtil mergeSort
[ https://issues.apache.org/jira/browse/LUCENE-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer closed LUCENE-3379.
-----------------------------------

    Resolution: Not A Problem

closing this - nobody seemed to hit this again

> jre crashes in ArrayUtil mergeSort
> ----------------------------------
>
>          Key: LUCENE-3379
>          URL: https://issues.apache.org/jira/browse/LUCENE-3379
>      Project: Lucene - Java
>   Issue Type: Bug
>  Environment: 1.6.0_24
>     Reporter: Robert Muir
> Attachments: hs_err_pid25327.log, hs_err_pid4624.log
>
> while running the analyzers test, i got a JRE crash with 1.6.0_24 in
> {noformat}
> Current CompileTask: C2: 54 org.apache.lucene.util.SorterTemplate.merge(I)V (151 bytes)
> {noformat}
> {noformat}
> [junit] #
> [junit] # A fatal error has been detected by the Java Runtime Environment:
> [junit] #
> [junit] # SIGSEGV (0xb) at pc=0x7f768cc2f0ec, pid=4624, tid=140147041961728
> [junit] #
> [junit] # JRE version: 6.0_24-b07
> [junit] # Java VM: Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode linux-amd64 compressed oops)
> [junit] # Problematic frame:
> [junit] # V [libjvm.so+0x3eb0ec]
> [junit] #
> [junit] # An error report file with more information is saved as:
> [junit] # /home/rmuir/workspace/lucene-trunk/modules/analysis/build/common/test/8/hs_err_pid4624.log
> [junit] #
> [junit] # If you would like to submit a bug report, please visit:
> [junit] # http://java.sun.com/webapps/bugreport/crash.jsp
> [junit] #
> {noformat}
[jira] [Commented] (LUCENE-3270) additional tests enhancements to faceting module
[ https://issues.apache.org/jira/browse/LUCENE-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149553#comment-13149553 ]

Shai Erera commented on LUCENE-3270:
------------------------------------

I searched for "static final" under facet/src/test and scanned all the results - nothing there seems worth randomizing. Also, I thought about RandomTaxonomyWriter, and I'm not sure it's worth the effort, since I'm afraid randomization will affect the strict behavior required by TW and we'll just chase ourselves. Perhaps we should just close this issue and handle things on a per-case basis when we encounter them?

> additional tests enhancements to faceting module
> -------------------------------------------------
>
>         Key: LUCENE-3270
>         URL: https://issues.apache.org/jira/browse/LUCENE-3270
>     Project: Lucene - Java
>  Issue Type: Test
>  Components: modules/facet
>    Reporter: Robert Muir
>
> Some ideas from LUCENE-3264:
> * make a RandomTaxonomyWriter
> * look at any hardcoded constants like #docs etc. and see if we can in general add randomization.
[jira] [Commented] (LUCENE-3237) FSDirectory.fsync() may not work properly
[ https://issues.apache.org/jira/browse/LUCENE-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149555#comment-13149555 ]

Simon Willnauer commented on LUCENE-3237:

Shai, I think we should close this. We can still reopen if we run into issues?

FSDirectory.fsync() may not work properly
Key: LUCENE-3237
URL: https://issues.apache.org/jira/browse/LUCENE-3237
Project: Lucene - Java
Issue Type: Bug
Components: core/store
Reporter: Shai Erera
Fix For: 3.5, 4.0

Spinoff from LUCENE-3230. FSDirectory.fsync() opens a new RAF, sync()s its FileDescriptor and closes the RAF. It is not clear that this syncs whatever was written to the file by other FileDescriptors. It would be better to perform this operation on the actual RAF/FileOS that wrote the data. We can add sync() to IndexOutput, and FSIndexOutput will implement it. Directory-wise, we should stop syncing on file names and instead sync on the IOs that performed the write operations.
[jira] [Resolved] (LUCENE-3237) FSDirectory.fsync() may not work properly
[ https://issues.apache.org/jira/browse/LUCENE-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-3237.
Resolution: Won't Fix
Fix Version/s: (was: 3.5) (was: 4.0)

Closing. If we ever see that this actually is a problem, we can reopen.
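The concern in LUCENE-3237 - that syncing the FileDescriptor of a freshly opened RandomAccessFile may not guarantee durability for bytes written through a different descriptor - is easiest to see by putting the two patterns side by side. This is a hedged sketch in plain Java, not Lucene's FSDirectory code; method names are made up for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SyncPatterns {
    // Pattern the issue argues against: open a *new* descriptor just to sync.
    // sync() flushes that descriptor, but it is unclear whether it guarantees
    // durability for data written earlier through a different stream.
    static void syncByName(File f) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        try {
            raf.getFD().sync();
        } finally {
            raf.close();
        }
    }

    // Pattern the issue proposes: sync the descriptor that actually wrote.
    static void writeAndSync(File f, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(f);
        try {
            out.write(data);
            out.getFD().sync(); // same descriptor that performed the write
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sync", ".bin");
        f.deleteOnExit();
        writeAndSync(f, new byte[] {1, 2, 3});
        syncByName(f); // still legal, just of questionable value
        System.out.println("length=" + f.length()); // prints length=3
    }
}
```

The try/finally form is used here because the thread discusses the Java 5/6 era, before try-with-resources was available.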
[jira] [Resolved] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3235.
Resolution: Won't Fix

We moved to 1.6 on trunk; it seems we can't do much about it on 3.x - folks should run their stuff on 1.6 JVMs or newer.

TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
Key: LUCENE-3235
URL: https://issues.apache.org/jira/browse/LUCENE-3235
Project: Lucene - Java
Issue Type: Bug
Reporter: Michael McCandless

Not sure what's going on yet... but under Java 1.6 it seems not to hang, while under Java 1.5 it hangs fairly easily, on Linux. Java is 1.5.0_22. I suspect this is relevant: http://stackoverflow.com/questions/3292577/is-it-possible-for-concurrenthashmap-to-deadlock which refers to this JVM bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6865591 which then refers to this one http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6822370. It looks like that last bug was fixed in Java 1.6 but not 1.5.
[jira] [Resolved] (LUCENE-3176) TestNRTThreads test failure
[ https://issues.apache.org/jira/browse/LUCENE-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3176.
Resolution: Fixed

This was a temp file issue - fixed.

TestNRTThreads test failure
Key: LUCENE-3176
URL: https://issues.apache.org/jira/browse/LUCENE-3176
Project: Lucene - Java
Issue Type: Bug
Environment: trunk
Reporter: Robert Muir
Assignee: Michael McCandless

Hit a failure in TestNRTThreads running the tests over and over:
[jira] [Commented] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149562#comment-13149562 ]

Robert Muir commented on LUCENE-3235:

Wait, this statement makes no sense. If 1.5 is no longer supported, then 1.5 should no longer be supported, and we should be free to use 1.6 code everywhere.
[jira] [Reopened] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reopened LUCENE-3235:
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149564#comment-13149564 ]

Simon Willnauer commented on LUCENE-3089:

Robert, since TokenStream implements Closeable, we should be able to call close() as often as we want. We should actually verify in our tests that we do that, to make sure nothing fails.

CachingTokenFilter can cause close() to be called twice.
Key: LUCENE-3089
URL: https://issues.apache.org/jira/browse/LUCENE-3089
Project: Lucene - Java
Issue Type: Bug
Reporter: Robert Muir

In LUCENE-3064, we added some state and checks to MockTokenizer to validate that consumers are properly using the tokenstream workflow (described here: http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/analysis/TokenStream.html). One problem I noticed in TestTermVectorsWriter.testEndOffsetPositionWithCachingTokenFilter is that providing a CachingTokenFilter directly will result in close() being called twice on the underlying tokenstream... this seems wrong. Some ideas to fix this could be:
# CachingTokenFilter overrides close() and we document that you must close the underlying stream yourself. I think this is what the queryparser does anyway.
# CachingTokenFilter does something tricky to ensure it only closes the underlying stream once.
[jira] [Commented] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149565#comment-13149565 ]

Uwe Schindler commented on LUCENE-3235:

I agree with Robert. This issue still exists in 3.x, as we officially support Java 5 there.
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149566#comment-13149566 ]

Uwe Schindler commented on LUCENE-3089:

Yes, the java.io.Closeable interface requires the underlying implementation to ignore additional close calls. But we should still fix our code to actually call it only once.
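The Closeable contract Uwe refers to ("if the stream is already closed then invoking this method has no effect") is usually implemented with a guard flag. A minimal sketch of option 2 from the issue description, not the actual CachingTokenFilter code; the class name is invented for the example.

```java
import java.io.Closeable;
import java.io.IOException;

// Wrapper that forwards close() to a delegate exactly once, so calling
// close() repeatedly is harmless - the idempotence Closeable asks for.
public class IdempotentCloseable implements Closeable {
    private final Closeable delegate;
    private boolean closed = false;
    private int delegateCloseCalls = 0; // for demonstration only

    public IdempotentCloseable(Closeable delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void close() throws IOException {
        if (closed) {
            return; // already closed: ignore, per the Closeable contract
        }
        closed = true;
        delegateCloseCalls++;
        delegate.close();
    }

    public synchronized int delegateCloseCalls() {
        return delegateCloseCalls;
    }

    public static void main(String[] args) throws IOException {
        IdempotentCloseable c = new IdempotentCloseable(() -> {});
        c.close();
        c.close(); // second call is a no-op
        System.out.println(c.delegateCloseCalls()); // prints 1
    }
}
```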
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149568#comment-13149568 ]

Robert Muir commented on LUCENE-3089:

Hmm, I'm not sure I like that... perhaps it's not appropriate to implement Closeable. Lots of people seem to have problems with the analysis workflow, and I think this adds confusion.
[jira] [Resolved] (LUCENE-3397) Cleanup Test TokenStreams so they are reusable
[ https://issues.apache.org/jira/browse/LUCENE-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male resolved LUCENE-3397.
Resolution: Fixed

All TokenStreams are now reusable.

Cleanup Test TokenStreams so they are reusable
Key: LUCENE-3397
URL: https://issues.apache.org/jira/browse/LUCENE-3397
Project: Lucene - Java
Issue Type: Sub-task
Components: modules/analysis
Reporter: Chris Male
Assignee: Chris Male
Fix For: 4.0
Attachments: LUCENE-3397-highlighter.patch, LUCENE-3397-more.patch, LUCENE-3397.patch, LUCENE-3397.patch

Many TokenStreams created in tests are not reusable. Some do some really messy things which prevent their reuse, so we may have to change the tests themselves. We'll target back-porting this to 3.x.
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149571#comment-13149571 ]

Uwe Schindler commented on LUCENE-3089:

I disagree, removing the Closeable interface makes it stupid to use in Java 7 (close-with-resources).
[jira] [Resolved] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male resolved LUCENE-3396.
Resolution: Fixed
Fix Version/s: 4.0
Assignee: Chris Male

TokenStream reuse is now mandatory.

Make TokenStream Reuse Mandatory for Analyzers
Key: LUCENE-3396
URL: https://issues.apache.org/jira/browse/LUCENE-3396
Project: Lucene - Java
Issue Type: Improvement
Components: modules/analysis
Reporter: Chris Male
Assignee: Chris Male
Fix For: 4.0
Attachments: LUCENE-3396-forgotten.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-remaining-analyzers.patch, LUCENE-3396-remaining-merging.patch

In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams. This is a big chunk of work, but it's time to bite the bullet. I plan to attack this in the following way:
- Collapse the logic of ReusableAnalyzerBase into Analyzer
- Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per field
- Convert all Analyzers over to using TokenStreamComponents. I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
- Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149570#comment-13149570 ]

Robert Muir commented on LUCENE-3089:

{quote}
Yes, the java.io.Closeable interface requires the underlying implementation to ignore additional close calls.
{quote}
Just because java.io.Closeable exists doesn't mean we must use it everywhere: if these semantics are inappropriate, we can simply have .close() ourselves.
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149572#comment-13149572 ]

Robert Muir commented on LUCENE-3089:

I think Java 7 close-with-resources is stupid too.
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149573#comment-13149573 ]

Uwe Schindler commented on LUCENE-3089:

Why? For TokenStreams close-with-resources is great.
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149575#comment-13149575 ]

Simon Willnauer commented on LUCENE-3089:

bq. Why? For TokenStreams close-with-resources is great.

+1
[jira] [Issue Comment Edited] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149571#comment-13149571 ]

Uwe Schindler edited comment on LUCENE-3089 at 11/14/11 11:11 AM:

I disagree, removing the Closeable interface makes it stupid to use in Java 7 (try-with-resources).

was (Author: thetaphi): I disagree, removing the Closeable interface makes it stupid to use in Java 7 (close-with-resources).
[jira] [Issue Comment Edited] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149573#comment-13149573 ]

Uwe Schindler edited comment on LUCENE-3089 at 11/14/11 11:11 AM:

Why? For TokenStreams try-with-resources is great.

was (Author: thetaphi): Why? For TokenStreams close-with-resources is great.
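The try-with-resources form Uwe has in mind (Java 7+) closes the resource automatically at the end of the block, calling close() exactly once on the normal path. A generic sketch with a toy Closeable standing in for a TokenStream-like object; the Lucene-specific workflow calls (reset/incrementToken/end) are omitted and the names here are invented for the example.

```java
import java.io.Closeable;
import java.io.IOException;

public class TryWithResourcesDemo {
    // A toy resource standing in for a TokenStream-like object.
    static class Resource implements Closeable {
        boolean closed = false;
        String next() { return "token"; }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) throws IOException {
        Resource r = new Resource();
        // The compiler inserts the close() call in an implicit finally,
        // so the resource is closed even if the body throws.
        try (Resource inScope = r) {
            System.out.println(inScope.next()); // prints token
        }
        System.out.println(r.closed); // prints true
    }
}
```

This is why implementing Closeable matters for the Java 7 syntax: only expressions whose type implements AutoCloseable (which Closeable extends) may appear in the try header.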
[jira] [Updated] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3235:
Affects Version/s: 3.0, 3.1, 3.2, 3.3, 3.4
Fix Version/s: 3.5
[jira] [Commented] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149577#comment-13149577 ] Simon Willnauer commented on LUCENE-3235:
- well then we should fix it - I will mark it as 3.5
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 11324 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/11324/
1 tests failed.
REGRESSION: org.apache.lucene.search.TestSort.testReverseSort
Error Message: expected:[CEGIA] but was:[ACEGI]
Stack Trace:
at org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1234)
at org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1215)
at org.apache.lucene.search.TestSort.testReverseSort(TestSort.java:758)
at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:523)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51)
Build Log (for compile errors): [...truncated 1331 lines...]
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149580#comment-13149580 ] Robert Muir commented on LUCENE-3089:
- I just don't think it should be blanket policy without thinking things through. For example: lots of code you see on the internet opens a new IndexReader for every search and closes it. Should we seriously encourage this?! If someone seriously needs to do this, that's an expert case and they can use try + finally and close it themselves.
So for example, there I think it makes sense for IndexReader to not support AutoCloseable, and separately to remove the stupid IndexSearcher(Directory) so that IndexSearcher only takes IndexReader, so it's *always* a thin wrapper like we claim it is (which is an outright lie today). Then IndexSearcher would implement [Auto]Closeable since closing it is cheap.
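Robert's point that the searcher should be a thin, cheap-to-close wrapper while the heavyweight reader is owned and closed by the caller can be sketched in plain Java. ThinSearcher and its ownership rule are illustrative stand-ins, not Lucene's actual classes:

```java
// Illustrative sketch only: "ThinSearcher" is a hypothetical stand-in for
// an IndexSearcher that is a pure wrapper. Closing it is cheap and never
// touches the heavyweight resource, which the caller opens and closes.
public class ThinSearcher implements AutoCloseable {
    private final AutoCloseable reader;  // heavyweight, owned by the caller

    public ThinSearcher(AutoCloseable reader) {
        this.reader = reader;
    }

    public AutoCloseable reader() {
        return reader;
    }

    @Override
    public void close() {
        // cheap: release only wrapper-local state; the reader stays open
    }
}
```

Because close() here never closes the reader, try-with-resources on the wrapper is always safe, and the reader's lifecycle stays explicit in the caller's hands.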
[jira] [Commented] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149581#comment-13149581 ] Uwe Schindler commented on LUCENE-3235:
--- An easy fix would be to use Collections.synchronizedMap(new HashMap()) in the ctor to initialize cache1 and cache2 (if Java 5 is detected)? If people are using Java 5 they get not-the-best performance.
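Uwe's suggestion can be sketched as a small factory: use Collections.synchronizedMap(new HashMap()) when the runtime reports Java 1.5 (where ConcurrentHashMap can deadlock per Sun bug 6822370), else ConcurrentHashMap. The class and method names below are hypothetical, not from the actual patch:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed workaround: fall back to a
// synchronized HashMap on Java 5, where ConcurrentHashMap could
// deadlock. Slower, but deadlock-free.
public class CacheMapFactory {
    /** Returns a thread-safe map, chosen by the detected Java version. */
    public static <K, V> Map<K, V> newCacheMap(String javaVersion) {
        if (javaVersion.startsWith("1.5")) {
            return Collections.synchronizedMap(new HashMap<K, V>());
        }
        return new ConcurrentHashMap<K, V>();
    }
}
```

In real code the version string would come from System.getProperty("java.version") once at startup.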
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 11324 - Failure
I'll dig...

Mike McCandless http://blog.mikemccandless.com

On Mon, Nov 14, 2011 at 6:28 AM, Apache Jenkins Server jenk...@builds.apache.org wrote:
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/11324/
1 tests failed.
REGRESSION: org.apache.lucene.search.TestSort.testReverseSort
[jira] [Commented] (LUCENE-3269) Speed up Top-K sampling tests
[ https://issues.apache.org/jira/browse/LUCENE-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149583#comment-13149583 ] Robert Muir commented on LUCENE-3269:
- Hi Shai: a couple of suggestions. With the current patch we will never close these directories, so we lose some test coverage like the CheckIndex at the end... I think these tests caught a serious JRE bug in this checkindex so I'd like to keep it. Additionally I think we have a problem if we randomly get an FSDirectory, especially on Windows. So how about we build up a RAMDirectory and cache it? When the top-K tests start up they could do something like this:
{noformat}
Directory dir = newDirectory(random, getCachedDir());
...
dir.close();
{noformat}
where getCachedDir is the access to the cache (if it doesn't exist, it builds it, and it's always a RAMDirectory). (LuceneTestCase already has newDirectory(random, Directory) that copies from an existing directory.)

Speed up Top-K sampling tests
Key: LUCENE-3269
URL: https://issues.apache.org/jira/browse/LUCENE-3269
Project: Lucene - Java
Issue Type: Test
Components: modules/facet
Reporter: Robert Muir
Fix For: 3.5, 4.0
Attachments: LUCENE-3269.patch, LUCENE-3269.patch, LUCENE-3269.patch, LUCENE-3269.patch

Speed up the top-k sampling tests (but make sure they are still thorough on nightly etc.). Usually we would do this with use of atLeast(), but these tests are somewhat tricky, so maybe a different approach is needed.
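The cache-and-copy idea behind getCachedDir() can be illustrated with a plain-Java sketch: build the expensive artifact once, then hand each test its own copy so the test can freely close or modify it. A Map stands in for the cached RAMDirectory here; all names are hypothetical, and the real code would use LuceneTestCase.newDirectory(random, Directory):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the cache-and-copy pattern. The Map stands in
// for a RAMDirectory that is expensive to build (indexing many docs).
public class DirCache {
    private static Map<String, Integer> cached;  // built at most once

    static synchronized Map<String, Integer> getCachedDir() {
        if (cached == null) {
            cached = new HashMap<>();
            cached.put("docs", 1000);  // the expensive indexing happens here, once
        }
        return cached;
    }

    /** Each caller gets an independent copy, like newDirectory(random, dir). */
    public static Map<String, Integer> newCopy() {
        return new HashMap<>(getCachedDir());
    }
}
```

Each test closes or mutates only its own copy, so the shared cache is never invalidated and nothing leaks across test methods.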
[jira] [Commented] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149584#comment-13149584 ] Robert Muir commented on LUCENE-3235:
- I like Uwe's idea: not-the-best performance is far preferable to a hang/deadlock!
[jira] [Commented] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149585#comment-13149585 ] Uwe Schindler commented on LUCENE-3235:
--- I am currently preparing a patch.
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149586#comment-13149586 ] Michael McCandless commented on LUCENE-3089:
{quote} So for example, there I think it makes sense for IndexReader to not support AutoCloseable, and separately to remove the stupid IndexSearcher(Directory) so that IndexSearcher only takes IndexReader, so its always a thin wrapper like we claim it is (which is an outright lie today). {quote}
+1 We should deprecate/remove the IS ctor that takes a Directory. It's trappy.
[jira] [Updated] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3235:
-- Attachment: LUCENE-3235.patch
Patch. We should forward-port the deprecation/removal of useless Constants.
[jira] [Commented] (LUCENE-3269) Speed up Top-K sampling tests
[ https://issues.apache.org/jira/browse/LUCENE-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149594#comment-13149594 ] Robert Muir commented on LUCENE-3269:
- Sorry Shai, I got myself confused and thought you were trying to cache across tests... this patch is good in case a test has multiple methods...!
[jira] [Commented] (LUCENE-3089) CachingTokenFilter can cause close() to be called twice.
[ https://issues.apache.org/jira/browse/LUCENE-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149595#comment-13149595 ] Uwe Schindler commented on LUCENE-3089:
--- Then we should also rename close() to something else: closeThisIfYouAreReallySure() - implementing Closeable is then already out of scope. Adding a close method to a class leads users to take care of closing it after use. Also everybody expects what the Closeable interface defines: you can call close() multiple times. For TokenStreams that's fine, as close() is just a cleanup and is not even required if you don't have a Tokenizer with a Reader.
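Option 2 from the issue (ensuring the underlying stream's close() runs only once) can be sketched as a guard that makes close() idempotent, matching the Closeable contract Uwe describes. CloseOnceWrapper is a hypothetical name, not Lucene's class:

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of the "close only once" idea: a wrapper that guarantees the
// underlying stream's close() runs at most once, no matter how many
// times the consumer calls close() on the wrapper.
public class CloseOnceWrapper implements Closeable {
    private final Closeable input;
    private boolean closed = false;

    public CloseOnceWrapper(Closeable input) {
        this.input = input;
    }

    @Override
    public synchronized void close() throws IOException {
        if (!closed) {       // repeated close() calls become no-ops
            closed = true;
            input.close();
        }
    }
}
```

With this guard, a consumer following the normal workflow and a caching filter doing its own cleanup can both call close() without the underlying stream seeing a double close.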
Re: set PYTHONPATH programmatically from Java?
On Mon, Nov 14, 2011 at 4:25 AM, Andi Vajda va...@apache.org wrote:

On Sun, 13 Nov 2011, Roman Chyla wrote:

I am using JCC to run Python inside Java. For unittests, I'd like to set the PYTHONPATH environment variable programmatically. I can change env vars inside Java (using http://stackoverflow.com/questions/318239/how-do-i-set-environment-variables-from-java) and System.getenv("PYTHONPATH") shows the correct values. However, I am still getting "ImportError: no module named". If I set PYTHONPATH before starting the unittest, it works fine. Is what I would like to do possible?

Why mess with the environment instead of setting sys.path directly?

That would be great, but I don't know how. I am doing roughly this:

PythonVM.start(programName);
vm = PythonVM.get();
vm.instantiate(moduleName, className);

I tried also:

PythonVM.start(programName, new String[]{"-c", "import sys;sys.path.insert(0, '/dvt/workspace/montysolr/src/python')"});

but it is failing on vm.instantiate when Python cannot find the module. Alternatively, if JCC could execute/eval a Python string, I could set sys.argv that way.

I'm not sure what you mean here but JCC's Java PythonVM.init() method takes an array of strings that is fed into sys.argv. See the _PythonVM_Init() sources in jcc.cpp for details.

sorry, i meant sys.path, not sys.argv

roman

Andi..
[jira] [Commented] (LUCENE-3269) Speed up Top-K sampling tests
[ https://issues.apache.org/jira/browse/LUCENE-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149597#comment-13149597 ] Shai Erera commented on LUCENE-3269:
Right. Caching across tests is very tricky since they can run in different JVMs anyway (with parallel testing) and so we'd gain nothing. And the tests are not really slow - the sampling tests run for 12 seconds on my laptop... not a big deal. I'll commit shortly.
[jira] [Resolved] (LUCENE-3269) Speed up Top-K sampling tests
[ https://issues.apache.org/jira/browse/LUCENE-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-3269.
Resolution: Fixed
Assignee: Shai Erera
Lucene Fields: New,Patch Available (was: New)
Committed revisions 1201677 (3x) and 1201678 (trunk). Thanks Robert!
[jira] [Commented] (LUCENE-3269) Speed up Top-K sampling tests
[ https://issues.apache.org/jira/browse/LUCENE-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149600#comment-13149600 ] Shai Erera commented on LUCENE-3269:
I see what got you confused (it was me, not you):
{quote} however, if they will run in the same JVM, then they will reuse the already created indexes {quote}
What I wrote is wrong (I got myself confused!) -- whatever you do in beforeClass affects only that test case, not all the ones that will run in the JVM. Perhaps JUnit needs to invent two more concepts, @StartJVM and @EndJVM, for this to happen :)
[jira] [Issue Comment Edited] (LUCENE-3269) Speed up Top-K sampling tests
[ https://issues.apache.org/jira/browse/LUCENE-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149600#comment-13149600 ] Shai Erera edited comment on LUCENE-3269 at 11/14/11 12:20 PM:
--- I see what got you confused (it was me, not you): {quote} however, if they will run in the same JVM, then they will reuse the already created indexes {quote} what I wrote is wrong (I got myself confused !) -- whatever you do in beforeClass affects only that testcase, not all the ones that will run in the JVM. Perhaps JUnit need to invent two more concepts @StartJVM and @EndJVM, for this to happen :)

was (Author: shaie): I see what got you confused (it was me, not you): {quote} however, if they will run in the same JVM, then they will reuse the already created indexes {quote} what I wrote is wrong (I got myself confused (! -- whatever you do in beforeClass affects only that testcase, not all the ones that will run in the JVM. Perhaps JUnit need to invent two more concepts @StartJVM and @EndJVM, for this to happen :)
[jira] [Resolved] (LUCENE-3097) Post grouping faceting
[ https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen resolved LUCENE-3097.
--- Resolution: Fixed
Lucene Fields: Patch Available (was: New)
The support for real grouped faceting (matrix counts) needs to be added to Solr or the faceting module.

Post grouping faceting
Key: LUCENE-3097
URL: https://issues.apache.org/jira/browse/LUCENE-3097
Project: Lucene - Java
Issue Type: New Feature
Components: modules/grouping
Reporter: Martijn van Groningen
Assignee: Martijn van Groningen
Priority: Minor
Fix For: 4.0, 3.4
Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch

This issue focuses on implementing post-grouping faceting.
* How to handle multivalued fields. What field value to show with the facet.
* What the facet counts should be based on:
** Facet counts can be based on the normal documents. Ungrouped counts.
** Facet counts can be based on the groups. Grouped counts.
** Facet counts can be based on the combination of group value and facet value. Matrix counts.
And probably more implementation options. The first two methods are implemented in the SOLR-236 patch. For the first option it calculates a DocSet based on the individual documents from the query result. For the second option it calculates a DocSet for all the most relevant documents of a group. Once the DocSet is computed, the FacetComponent and StatsComponent use it to create facets and statistics.
The last one is a bit more complex. I think it is best explained with an example. Let's say we search on travel offers:
||hotel||departure_airport||duration||
|Hotel a|AMS|5|
|Hotel a|DUS|10|
|Hotel b|AMS|5|
|Hotel b|AMS|10|
If we group by hotel and have a facet for airport, most end users expect (according to my experience of course) the following airport facet:
AMS: 2
DUS: 1
The above result can't be achieved by the first two methods. You either get counts AMS:3 and DUS:1, or 1 for both airports.
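The "matrix counts" semantics from the hotel example boil down to counting each facet value once per distinct (group, facet value) pair. The few lines below illustrate that counting rule only; MatrixCounts is a hypothetical name and this is not Solr's implementation:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch of "matrix counts": a facet value is counted once per distinct
// (group, facetValue) pair, not once per document.
public class MatrixCounts {
    /** rows: each entry is {groupValue, facetValue}. */
    public static Map<String, Integer> count(List<String[]> rows) {
        Set<String> seen = new HashSet<>();           // distinct (group, facet) pairs
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] row : rows) {
            if (seen.add(row[0] + "\u0000" + row[1])) {
                counts.merge(row[1], 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Run on the four hotel rows from the description, this yields {AMS=2, DUS=1}: the duplicate (Hotel b, AMS) pair is counted once, which is exactly what neither ungrouped counts (AMS:3) nor grouped counts (AMS:1) produce.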
[jira] [Updated] (LUCENE-3097) Post grouping faceting
[ https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated LUCENE-3097: -- Fix Version/s: (was: 3.5) 3.4 Post grouping faceting -- Key: LUCENE-3097 URL: https://issues.apache.org/jira/browse/LUCENE-3097 Project: Lucene - Java Issue Type: New Feature Components: modules/grouping Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 3.4, 4.0 Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch This issues focuses on implementing post grouping faceting. * How to handle multivalued fields. What field value to show with the facet. * Where the facet counts should be based on ** Facet counts can be based on the normal documents. Ungrouped counts. ** Facet counts can be based on the groups. Grouped counts. ** Facet counts can be based on the combination of group value and facet value. Matrix counts. And properly more implementation options. The first two methods are implemented in the SOLR-236 patch. For the first option it calculates a DocSet based on the individual documents from the query result. For the second option it calculates a DocSet for all the most relevant documents of a group. Once the DocSet is computed the FacetComponent and StatsComponent use one the DocSet to create facets and statistics. This last one is a bit more complex. I think it is best explained with an example. Lets say we search on travel offers: |||hotel||departure_airport||duration|| |Hotel a|AMS|5 |Hotel a|DUS|10 |Hotel b|AMS|5 |Hotel b|AMS|10 If we group by hotel and have a facet for airport. Most end users expect (according to my experience off course) the following airport facet: AMS: 2 DUS: 1 The above result can't be achieved by the first two methods. You either get counts AMS:3 and DUS:1 or 1 for both airports. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
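The "matrix counts" behavior from the hotel example above (count each facet value once per group, rather than once per document) can be sketched in plain Java. This is only an illustration of the counting rule, not Solr's DocSet-based implementation; all names here are made up:

```java
import java.util.*;

// Hypothetical sketch of "matrix counts": a facet value is counted once per
// distinct group value it occurs with, not once per document.
public class MatrixFacetSketch {
    // Each row is {groupValue, facetValue}, e.g. {"Hotel a", "AMS"}.
    static Map<String, Integer> matrixCounts(List<String[]> rows) {
        Map<String, Set<String>> groupsPerFacet = new LinkedHashMap<>();
        for (String[] row : rows) {
            groupsPerFacet.computeIfAbsent(row[1], k -> new LinkedHashSet<>()).add(row[0]);
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        groupsPerFacet.forEach((facet, groups) -> counts.put(facet, groups.size()));
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> offers = Arrays.asList(
            new String[]{"Hotel a", "AMS"},
            new String[]{"Hotel a", "DUS"},
            new String[]{"Hotel b", "AMS"},
            new String[]{"Hotel b", "AMS"});
        // Hotel b's two AMS offers count AMS only once, so AMS=2, DUS=1,
        // matching the airport facet the issue says end users expect.
        System.out.println(matrixCounts(offers));
    }
}
```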
[jira] [Created] (SOLR-2898) Support grouped faceting
Support grouped faceting Key: SOLR-2898 URL: https://issues.apache.org/jira/browse/SOLR-2898 Project: Solr Issue Type: New Feature Reporter: Martijn van Groningen Support grouped faceting. As described in LUCENE-3097 (matrix counts). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 11325 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/11325/ All tests passed Build Log (for compile errors): [...truncated 14647 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3571) nuke IndexSearcher(directory)
nuke IndexSearcher(directory) - Key: LUCENE-3571 URL: https://issues.apache.org/jira/browse/LUCENE-3571 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 IndexSearcher is supposed to be a cheap wrapper around a reader, but sometimes it is, sometimes it isn't. I think its confusing tangling of a heavyweight and lightweight object that it sometimes 'houses' a reader and must close it in that case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3571) nuke IndexSearcher(directory)
[ https://issues.apache.org/jira/browse/LUCENE-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3571: Attachment: LUCENE-3571.patch nuke IndexSearcher(directory) - Key: LUCENE-3571 URL: https://issues.apache.org/jira/browse/LUCENE-3571 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-3571.patch IndexSearcher is supposed to be a cheap wrapper around a reader, but sometimes it is, sometimes it isn't. I think its confusing tangling of a heavyweight and lightweight object that it sometimes 'houses' a reader and must close it in that case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3572) MultiIndexDocValues pretends it can merge sorted sources
MultiIndexDocValues pretends it can merge sorted sources Key: LUCENE-3572 URL: https://issues.apache.org/jira/browse/LUCENE-3572 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 4.0 Nightly build hit this failure: {noformat} ant test-core -Dtestcase=TestSort -Dtestmethod=testReverseSort -Dtests.seed=791b126576b0cfab:-48895c7243ecc5d0:743c683d1c9f7768 -Dtests.multiplier=3 -Dargs=-Dfile.encoding=ISO8859-1 [junit] Testcase: testReverseSort(org.apache.lucene.search.TestSort): Caused an ERROR [junit] expected:[CEGIA] but was:[ACEGI] [junit] at org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1248) [junit] at org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1216) [junit] at org.apache.lucene.search.TestSort.testReverseSort(TestSort.java:759) [junit] at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:523) [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149) [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51) {noformat} It's happening in the test for reverse-sort of a string field with DocValues, when the test had gotten SlowMultiReaderWrapper. I committed a fix to the test to avoid testing this case, but we need a better fix to the underlying bug. MultiIndexDocValues cannot merge sorted sources (I think?), yet somehow it's pretending it can (in the above test, the three subs had BYTES_FIXED_SORTED type, and the TypePromoter happily claims to merge these to BYTES_FIXED_SORTED; I think MultiIndexDocValues should return null for the sorted source in this case? -- This message is automatically generated by JIRA. 
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 11324 - Failure
OK I committed a fix to the test, but also opened LUCENE-3572 to get to the root cause... Mike McCandless http://blog.mikemccandless.com On Mon, Nov 14, 2011 at 6:39 AM, Michael McCandless luc...@mikemccandless.com wrote: I'll dig... Mike McCandless http://blog.mikemccandless.com On Mon, Nov 14, 2011 at 6:28 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/11324/ 1 tests failed. REGRESSION: org.apache.lucene.search.TestSort.testReverseSort Error Message: expected:[CEGIA] but was:[ACEGI] Stack Trace: at org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1234) at org.apache.lucene.search.TestSort.assertMatches(TestSort.java:1215) at org.apache.lucene.search.TestSort.testReverseSort(TestSort.java:758) at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:523) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51) Build Log (for compile errors): [...truncated 1331 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 940 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/940/ 1 tests failed. REGRESSION: org.apache.solr.update.AutoCommitTest.testMaxDocs Error Message: should find one query failed XPath: //result[@numFound=1] xml response was: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status0/intint name=QTime1/int/lstresult name=response numFound=0 start=0/result /response request was: start=0q=id:14qt=standardrows=20version=2.2 Stack Trace: junit.framework.AssertionFailedError: should find one query failed XPath: //result[@numFound=1] xml response was: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status0/intint name=QTime1/int/lstresult name=response numFound=0 start=0/result /response request was: start=0q=id:14qt=standardrows=20version=2.2 at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51) xml response was: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status0/intint name=QTime1/int/lstresult name=response numFound=0 start=0/result /response request was: start=0q=id:14qt=standardrows=20version=2.2 at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:260) at org.apache.solr.update.AutoCommitTest.testMaxDocs(AutoCommitTest.java:181) at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:523) Build Log (for compile errors): [...truncated 10996 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements
[ https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149629#comment-13149629 ] Manojkumar Rangasamy Kannadasan commented on SOLR-1726: --- hi, I am working to insert a new type of query for the issue 1726 by including the lastpageScore and lastDoc in the query as stated by Grant. Can anyone please let me know the place of code where i can insert a new mapping rule for this query to a new function in SolrIndexSearcher. Kindly reply. Deep Paging and Large Results Improvements -- Key: SOLR-1726 URL: https://issues.apache.org/jira/browse/SOLR-1726 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.5, 4.0 There are possibly ways to improve collections of deep paging by passing Solr/Lucene more information about the last page of results seen, thereby saving priority queue operations. See LUCENE-2215. There may also be better options for retrieving large numbers of rows at a time that are worth exploring. LUCENE-2127. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
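The last-page hint Grant mentions (pass lastpageScore and lastDoc so already-seen hits never enter the priority queue) can be sketched in plain Java. This is a hypothetical illustration of the idea, not SolrIndexSearcher code; the tie-break (equal score, then ascending doc id) is an assumption matching Lucene's usual sort:

```java
import java.util.*;

// Hypothetical sketch of deep paging with a "search after" hint: only hits
// that sort strictly after the last hit of the previous page are collected.
public class DeepPagingSketch {
    static final class Hit {
        final int doc; final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    static List<Hit> nextPage(List<Hit> hits, float lastScore, int lastDoc, int rows) {
        List<Hit> page = new ArrayList<>();
        for (Hit h : hits) {
            // Skip anything on or before the previous page instead of
            // re-collecting it and throwing it away.
            boolean afterLast = h.score < lastScore
                || (h.score == lastScore && h.doc > lastDoc);
            if (afterLast) page.add(h);
        }
        page.sort((a, b) -> a.score != b.score
            ? Float.compare(b.score, a.score)      // score descending
            : Integer.compare(a.doc, b.doc));      // then doc id ascending
        return page.subList(0, Math.min(rows, page.size()));
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit(1, 3f), new Hit(2, 2f), new Hit(3, 2f), new Hit(4, 1f));
        // Page 1 ended at doc 2 with score 2.0, so page 2 starts at doc 3.
        for (Hit h : nextPage(hits, 2f, 2, 2)) System.out.println(h.doc);
    }
}
```

The payoff is that the priority queue for page N only ever holds `rows` entries instead of `N * rows`, which is the improvement the issue is after.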
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 11326 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/11326/ 1 tests failed. REGRESSION: org.apache.solr.update.AutoCommitTest.testMaxDocs Error Message: should find one query failed XPath: //result[@numFound=1] xml response was: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status0/intint name=QTime3/int/lstresult name=response numFound=0 start=0/result /response request was: start=0q=id:14qt=standardrows=20version=2.2 Stack Trace: junit.framework.AssertionFailedError: should find one query failed XPath: //result[@numFound=1] xml response was: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status0/intint name=QTime3/int/lstresult name=response numFound=0 start=0/result /response request was: start=0q=id:14qt=standardrows=20version=2.2 at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51) xml response was: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeaderint name=status0/intint name=QTime3/int/lstresult name=response numFound=0 start=0/result /response request was: start=0q=id:14qt=standardrows=20version=2.2 at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:260) at org.apache.solr.update.AutoCommitTest.testMaxDocs(AutoCommitTest.java:181) at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:523) Build Log (for compile errors): [...truncated 7847 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149631#comment-13149631 ] Simon Willnauer commented on LUCENE-3235: - bq. An easy fix would be to use Collections.synchronizedMap(new HashMap()) in the ctor to initialize cache1 and cache2 (if Java 5 is detected)? If people are using Java 5 they get not-the-best performance. I like that too... TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug Key: LUCENE-3235 URL: https://issues.apache.org/jira/browse/LUCENE-3235 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0, 3.1, 3.2, 3.3, 3.4 Reporter: Michael McCandless Fix For: 3.5 Attachments: LUCENE-3235.patch Not sure what's going on yet... but under Java 1.6 it seems not to hang, while under Java 1.5 it hangs fairly easily, on Linux. Java is 1.5.0_22. I suspect this is relevant: http://stackoverflow.com/questions/3292577/is-it-possible-for-concurrenthashmap-to-deadlock which refers to this JVM bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6865591 which then refers to this one http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6822370 It looks like that last bug was fixed in Java 1.6 but not 1.5. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
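The fallback being discussed (synchronizedMap on Java 5, ConcurrentHashMap otherwise) could look roughly like the sketch below. This is not the LUCENE-3235 patch; the class name and detection via the `java.specification.version` system property are assumptions for illustration:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: avoid ConcurrentHashMap on Java 1.5, where JDK bug
// 6822370 (fixed in 1.6) could make it deadlock, by falling back to a
// synchronized HashMap at the cost of some concurrency.
public class CacheMapFactory {
    static <K, V> Map<K, V> newCacheMap() {
        String spec = System.getProperty("java.specification.version", "1.5");
        boolean atLeastJava6 = !spec.startsWith("1.5");
        return atLeastJava6
            ? new ConcurrentHashMap<K, V>()
            : Collections.synchronizedMap(new HashMap<K, V>());
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = newCacheMap();
        cache.put("hits", 1);
        System.out.println(cache.getClass().getName());
    }
}
```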
Casesensitive search problem
Hi, whenever I search with the words OfficeJet, officejet, Officejet, or oFiiIcejET, I get different results for each search. I am not able to understand why this is happening. I want to make the search case-insensitive, so that any combination of capital and small letters gives the same result. Please let me know how I can solve this problem. -- Jayanta Sahoo
[jira] [Commented] (LUCENE-3571) nuke IndexSearcher(directory)
[ https://issues.apache.org/jira/browse/LUCENE-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149634#comment-13149634 ] Simon Willnauer commented on LUCENE-3571: - +1 nuke IndexSearcher(directory) - Key: LUCENE-3571 URL: https://issues.apache.org/jira/browse/LUCENE-3571 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-3571.patch IndexSearcher is supposed to be a cheap wrapper around a reader, but sometimes it is, sometimes it isn't. I think its confusing tangling of a heavyweight and lightweight object that it sometimes 'houses' a reader and must close it in that case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3571) nuke IndexSearcher(directory)
[ https://issues.apache.org/jira/browse/LUCENE-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149634#comment-13149634 ] Simon Willnauer edited comment on LUCENE-3571 at 11/14/11 2:12 PM: --- +1 - actually I think we should deprecate this ctor in 3.x - nobody should use that really was (Author: simonw): +1 nuke IndexSearcher(directory) - Key: LUCENE-3571 URL: https://issues.apache.org/jira/browse/LUCENE-3571 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-3571.patch IndexSearcher is supposed to be a cheap wrapper around a reader, but sometimes it is, sometimes it isn't. I think its confusing tangling of a heavyweight and lightweight object that it sometimes 'houses' a reader and must close it in that case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Query mapping for Issue 1726
Hi, I would like to insert a new type of query for issue 1726 by including the lastpageScore and lastDoc in the query. Can anyone please let me know where in the code I can insert a new mapping rule for this query to a new function in SolrIndexSearcher? Kindly reply. Thanks Regards, Manoj Kumar.R.K Graduate Student, MS Computer Science University at Buffalo Buffalo, New York (413) 461-8938|www.rkmanojkumar.co.nr
[jira] [Commented] (LUCENE-3305) Kuromoji code donation - a new Japanese morphological analyzer
[ https://issues.apache.org/jira/browse/LUCENE-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149642#comment-13149642 ] Christian Moen commented on LUCENE-3305: Thanks a lot, Simon! Robert, I agree completely with your comments. The Unicode normalization is only done at dictionary build time. Simon has turned it on by default -- its previous default was off. Perhaps it makes sense to have it on in Lucene's case... Simon, the TokenizerRunner class doesn't seem to be included in the patch, which might be fine. It's not strictly necessary for Lucene, but I think it's useful to keep it there so the analyzer can easily be run from the command line. The DebugTokenizer and GraphvizFormatter are there already, which aren't strictly necessary either, but sometimes quite useful, so I think we should add the TokenizerRunner as well -- at least for now. Tests didn't pass in my case, but I'll look more into this soon. My tomorrow is very busy, but I'll have time for this on Wednesday. Kuromoji code donation - a new Japanese morphological analyzer -- Key: LUCENE-3305 URL: https://issues.apache.org/jira/browse/LUCENE-3305 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Christian Moen Assignee: Simon Willnauer Fix For: 4.0 Attachments: Kuromoji short overview .pdf, LUCENE-3305.patch, ip-clearance-Kuromoji.xml, ip-clearance-Kuromoji.xml, kuromoji-0.7.6-asf.tar.gz, kuromoji-0.7.6.tar.gz, kuromoji-solr-0.5.3-asf.tar.gz, kuromoji-solr-0.5.3.tar.gz Atilika Inc. (アティリカ株式会社) would like to donate the Kuromoji Japanese morphological analyzer to the Apache Software Foundation in the hope that it will be useful to Lucene and Solr users in Japan and elsewhere. The project was started in 2010 since we couldn't find any high-quality, actively maintained and easy-to-use Java-based Japanese morphological analyzers, and these became many of our design goals for Kuromoji. 
Kuromoji also has a segmentation mode that is particularly useful for search, which we hope will interest Lucene and Solr users. Compound-nouns, such as 関西国際空港 (Kansai International Airport) and 日本経済新聞 (Nikkei Newspaper), are segmented as one token with most analyzers. As a result, a search for 空港 (airport) or 新聞 (newspaper) will not give you a hit for these words. Kuromoji can segment these words into 関西 国際 空港 and 日本 経済 新聞, which is generally what you would want for search, and you'll get a hit. We also wanted to make sure the technology has a license that makes it compatible with other Apache Software Foundation software to maximize its usefulness. Kuromoji has an Apache License 2.0 and all code is currently owned by Atilika Inc. The software has been developed by my good friend and ex-colleague Masaru Hasegawa and myself. Kuromoji uses the so-called IPADIC for its dictionary/statistical model and its license terms are described in NOTICE.txt. I'll upload code distributions and their corresponding hashes and I'd very much like to start the code grant process. I'm also happy to provide patches to integrate Kuromoji into the codebase, if you prefer that. Please advise on how you'd like me to proceed with this. Thank you. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3571) nuke IndexSearcher(directory)
[ https://issues.apache.org/jira/browse/LUCENE-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3571: Fix Version/s: 3.5 setting fix version 3.x for the @deprecated nuke IndexSearcher(directory) - Key: LUCENE-3571 URL: https://issues.apache.org/jira/browse/LUCENE-3571 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 3.5, 4.0 Attachments: LUCENE-3571.patch IndexSearcher is supposed to be a cheap wrapper around a reader, but sometimes it is, sometimes it isn't. I think its confusing tangling of a heavyweight and lightweight object that it sometimes 'houses' a reader and must close it in that case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern Key: LUCENE-3573 URL: https://issues.apache.org/jira/browse/LUCENE-3573 Project: Lucene - Java Issue Type: Bug Components: modules/facet Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore. As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3571) nuke IndexSearcher(directory)
[ https://issues.apache.org/jira/browse/LUCENE-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149649#comment-13149649 ] Uwe Schindler commented on LUCENE-3571: --- +1 nuke IndexSearcher(directory) - Key: LUCENE-3571 URL: https://issues.apache.org/jira/browse/LUCENE-3571 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 3.5, 4.0 Attachments: LUCENE-3571.patch IndexSearcher is supposed to be a cheap wrapper around a reader, but sometimes it is, sometimes it isn't. I think its confusing tangling of a heavyweight and lightweight object that it sometimes 'houses' a reader and must close it in that case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
[ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3573: Attachment: LUCENE-3573.patch Attached patch for trunk adds two tests: * one of them is opening a new TR and passes * the other is refreshing the TR and fails. TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern Key: LUCENE-3573 URL: https://issues.apache.org/jira/browse/LUCENE-3573 Project: Lucene - Java Issue Type: Bug Components: modules/facet Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Attachments: LUCENE-3573.patch When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore. As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Casesensitive search problem
If you're using the example schema, your problem is probably WordDelimiterFilterFactory, which splits the input into separate tokens if the case changes. See admin/analysis for a great way to see what your analysis chain does at every step. Click the verbose mode... Best Erick On Mon, Nov 14, 2011 at 8:22 AM, jayanta sahoo jsahoo1...@gmail.com wrote: HI Whenever I am searching with the words OfficeJet or officejet or Officejet or oFiiIcejET. I am getting the different results for each search respectively. I am not able to understand why this is happening? I want to solve this problem such a way that search will become case insensitive and I will get same result for any combination of capital and small letters. Please let me know How i will solve this problem -- Jayanta Sahoo - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
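The underlying principle in Erick's answer is that case-insensitive matching only works when the same normalization runs at both index time and query time (in Solr this is typically LowerCaseFilterFactory in both the index and query analyzer chains). A plain-Java illustration of that principle, not Solr's analysis code:

```java
import java.util.Locale;

// Hypothetical illustration: a term matches case-insensitively only if the
// identical lowercasing step is applied on both the indexed side and the
// query side of the comparison.
public class CaseFoldingSketch {
    static String normalize(String term) {
        // Locale.ROOT avoids surprises like the Turkish dotless-i rules.
        return term.toLowerCase(Locale.ROOT);
    }

    static boolean matches(String indexedTerm, String queryTerm) {
        return normalize(indexedTerm).equals(normalize(queryTerm));
    }

    public static void main(String[] args) {
        System.out.println(matches("OfficeJet", "officejet")); // same letters, different case
        System.out.println(matches("OfficeJet", "DeskJet"));   // genuinely different terms
    }
}
```

If WordDelimiterFilterFactory is also in the chain, the admin/analysis page will show exactly where the intra-word case change splits "OfficeJet" into separate tokens before any lowercasing happens.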
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #296: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/296/ No tests ran. Build Log (for compile errors): [...truncated 16206 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
[ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149661#comment-13149661 ] Shai Erera commented on LUCENE-3573: +1. I think that we should nuke refresh() and adopt the IR approach, even though I don't like the 'maybe' and 'if', might as well make the API consistent. So instead of refresh() we'll have a static TR.openIfChanged that either returns null (no changes, or the taxonomy wasn't recreated) or a new instance in case it was recreated. Note that unlike IndexReader, if the taxonomy index wasn't recreated, openIfChanged will modify the internal state of TR. That's ok since the taxonomy index was built for it: existing TR instances (that weren't refreshed) won't be affected as they won't know about the new categories (and taxonomy index doesn't support deletes) and the caller can use the same TR instance in that case. Whatever we end up doing, we should remove refresh(). Even though we're not committed to back-compat yet (it's all experimental), I think it is dangerous if we'll simply modify refresh() behavior, because users may not be aware of the change. So a new method is a must. Besides that, the test looks good. Was there any reason to add it to TestTaxonomyCombined? TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern Key: LUCENE-3573 URL: https://issues.apache.org/jira/browse/LUCENE-3573 Project: Lucene - Java Issue Type: Bug Components: modules/facet Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Attachments: LUCENE-3573.patch When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore. As result, calling TR.refresh() will be incorrect at best, but usually throw an AIOOBE. -- This message is automatically generated by JIRA. 
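The openIfChanged contract Shai describes (return null when nothing changed, a new reader when the taxonomy was recreated) can be sketched as below. All names here are hypothetical stand-ins, not the actual TaxonomyReader API; a "generation" counter stands in for the state of the taxonomy index on disk:

```java
// Hypothetical sketch of the static-openIfChanged pattern borrowed from
// IndexReader: the caller keeps its reader when null comes back, and
// switches to the returned instance otherwise.
public class TaxonomyReaderSketch {
    static long currentGeneration = 0;   // stands in for the index on disk

    final long generation;               // what this reader was opened against
    TaxonomyReaderSketch(long generation) { this.generation = generation; }

    // null -> no change, keep using 'current'; non-null -> index was
    // recreated, use the new reader instead.
    static TaxonomyReaderSketch openIfChanged(TaxonomyReaderSketch current) {
        if (current.generation == currentGeneration) return null;
        return new TaxonomyReaderSketch(currentGeneration);
    }

    public static void main(String[] args) {
        TaxonomyReaderSketch r = new TaxonomyReaderSketch(currentGeneration);
        System.out.println(openIfChanged(r) == null);  // nothing changed yet
        currentGeneration++;                           // simulate recreation
        System.out.println(openIfChanged(r) != null);  // caller must reopen
    }
}
```

Returning null rather than the same instance makes the "taxonomy was recreated" case impossible to miss, which is exactly why refresh() should go away.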
[jira] [Created] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.
Add some more constants for newer Java versions to Constants.class, remove outdated ones. - Key: LUCENE-3574 URL: https://issues.apache.org/jira/browse/LUCENE-3574 Project: Lucene - Java Issue Type: New Feature Components: core/other Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.5, 4.0 Preparation for LUCENE-3235: This adds constants to quickly detect Java6 and Java7 to Constants.java. It also deprecated and removes the outdated historical Java versions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.
[ https://issues.apache.org/jira/browse/LUCENE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3574: -- Attachment: LUCENE-3574-3x.patch Patch for Lucene 3.x will remove deprecations in trunk and make JRE_IS_MINIMUM_JRE6 = true (+ deprecate it there) Add some more constants for newer Java versions to Constants.class, remove outdated ones. - Key: LUCENE-3574 URL: https://issues.apache.org/jira/browse/LUCENE-3574 Project: Lucene - Java Issue Type: New Feature Components: core/other Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.5, 4.0 Attachments: LUCENE-3574-3x.patch Preparation for LUCENE-3235: This adds constants to quickly detect Java6 and Java7 to Constants.java. It also deprecated and removes the outdated historical Java versions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.
[ https://issues.apache.org/jira/browse/LUCENE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149665#comment-13149665 ] Uwe Schindler commented on LUCENE-3574: --- Committed 3.x revision: 1201739 Add some more constants for newer Java versions to Constants.class, remove outdated ones. - Key: LUCENE-3574 URL: https://issues.apache.org/jira/browse/LUCENE-3574 Project: Lucene - Java Issue Type: New Feature Components: core/other Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.5, 4.0 Attachments: LUCENE-3574-3x.patch Preparation for LUCENE-3235: This adds constants to quickly detect Java6 and Java7 to Constants.java. It also deprecated and removes the outdated historical Java versions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.
[ https://issues.apache.org/jira/browse/LUCENE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149667#comment-13149667 ]

Shai Erera commented on LUCENE-3574:
------------------------------------
One typo: nsme - name

Also, not sure if it's worth it, but perhaps instead of constants like MINIMUM_JAVA_X we can have a class JavaVersion that follows the same logic we have in Version and can compare itself to other JavaVersions? Then we could have constants JAVA_6 = new JavaVersion(6) and similar for JAVA_7, and another CURRENT_JAVA_VER that is initialized with the code you wrote. You could then compare CURRENT to JAVA_6/7. Just an idea.
[jira] [Resolved] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.
[ https://issues.apache.org/jira/browse/LUCENE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3574.
-----------------------------------
    Resolution: Fixed

Committed trunk revision: 1201741
[jira] [Commented] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.
[ https://issues.apache.org/jira/browse/LUCENE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149671#comment-13149671 ]

Robert Muir commented on LUCENE-3574:
-------------------------------------
{quote}
Also, not sure if it's worth it, but perhaps instead of constants like MINIMUM_JAVA_X we can have a class JavaVersion that follows the same logic we have in Version
{quote}

I think the problem here would be that, say, we release 3.5 in a week. Then two years later Java 8 comes out... we can't know today how to detect it. So all we can do is say that we are 'at least' Java 7 because we have XYZ.
[jira] [Updated] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3235:
----------------------------------
    Attachment: LUCENE-3235.patch

Updated patch after LUCENE-3574 was committed. I also added a System.out.println to the test (VERBOSE only).

> TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
> ----------------------------------------------------------------------------
>                 Key: LUCENE-3235
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3235
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.0, 3.1, 3.2, 3.3, 3.4
>            Reporter: Michael McCandless
>             Fix For: 3.5
>         Attachments: LUCENE-3235.patch, LUCENE-3235.patch
>
> Not sure what's going on yet... but under Java 1.6 it seems not to hang, but under Java 1.5 it hangs fairly easily, on Linux. Java is 1.5.0_22. I suspect this is relevant: http://stackoverflow.com/questions/3292577/is-it-possible-for-concurrenthashmap-to-deadlock which refers to this JVM bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6865591 which then refers to this one http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6822370 It looks like that last bug was fixed in Java 1.6 but not 1.5.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 11327 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/11327/

All tests passed

Build Log (for compile errors):
[...truncated 14675 lines...]
[jira] [Commented] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.
[ https://issues.apache.org/jira/browse/LUCENE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149677#comment-13149677 ]

Uwe Schindler commented on LUCENE-3574:
---------------------------------------
bq. One typo: nsme - name

nsme = NoSuchMethodException
[jira] [Updated] (SOLR-2898) Support grouped faceting
[ https://issues.apache.org/jira/browse/SOLR-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-2898:
----------------------------------------
    Attachment: SOLR-2898.patch

Attached an initial patch that supports rudimentary grouped field facets for single-valued and non-tokenized string fields. Grouped faceting isn't yet implemented for query/range and pivot facets. This patch is compatible with trunk. To use it for all field facets use group.facet=true, or specify it per field. See the test in the patch for more details. I just hacked some code into the SimpleFacets class. Supporting it for all types of facets will require a lot of changes in many places in this class. Currently I don't see another way...

> Support grouped faceting
> ------------------------
>                 Key: SOLR-2898
>                 URL: https://issues.apache.org/jira/browse/SOLR-2898
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Martijn van Groningen
>         Attachments: SOLR-2898.patch
>
> Support grouped faceting, as described in LUCENE-3097 (matrix counts).
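As background, the grouped ("matrix") facet counting that SOLR-2898 implements can be illustrated with a toy sketch: a facet value is counted once per distinct group rather than once per matching document. GroupedFacetCount below is a hypothetical stand-in for illustration only, not the patch's code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration of grouped ("matrix") facet counting: each facet value is
// counted once per distinct group, not once per matching document.
class GroupedFacetCount {
  /** docs: each entry is {groupValue, facetValue}; returns facetValue -> number of distinct groups. */
  public static Map<String, Integer> count(List<String[]> docs) {
    // Collect the distinct groups seen for each facet value.
    Map<String, Set<String>> groupsPerFacet = new HashMap<String, Set<String>>();
    for (String[] doc : docs) {
      String group = doc[0];
      String facet = doc[1];
      Set<String> groups = groupsPerFacet.get(facet);
      if (groups == null) {
        groups = new HashSet<String>();
        groupsPerFacet.put(facet, groups);
      }
      groups.add(group);
    }
    // The grouped facet count is the number of distinct groups, not documents.
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (Map.Entry<String, Set<String>> e : groupsPerFacet.entrySet()) {
      counts.put(e.getKey(), e.getValue().size());
    }
    return counts;
  }
}
```

With two documents for group g1 and one for g2 all faceting on "red", a plain facet count would report 3, while the grouped count reports 2.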
[jira] [Commented] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.
[ https://issues.apache.org/jira/browse/LUCENE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149679#comment-13149679 ]

Shai Erera commented on LUCENE-3574:
------------------------------------
Exactly (I think that's what I meant) -- we detect the Java version as best we can and store it in a constant JAVA_VERSION. It can be compared to JAVA_6/7 thru an atLeast() API, like JAVA_VERSION.atLeast(JAVA_7). The code in 3.5 will only know to detect up to Java 7, while the code in 5.2 will know to detect Java 8. Wouldn't that work?
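For illustration, the JavaVersion idea floated in this thread could be sketched roughly as follows. The class, its constants, the atLeast() method, and the property-based detection are all hypothetical, not the API that was actually committed:

```java
// Hypothetical sketch of the JavaVersion idea discussed above; NOT the
// Constants-based approach that was committed to Lucene.
class JavaVersion implements Comparable<JavaVersion> {
  public static final JavaVersion JAVA_6 = new JavaVersion(6);
  public static final JavaVersion JAVA_7 = new JavaVersion(7);

  // Best-effort detection of the running JVM; as noted in the thread, an old
  // release can only ever report "at least" the newest version it knows about.
  public static final JavaVersion CURRENT = detect();

  private final int major;

  JavaVersion(int major) {
    this.major = major;
  }

  public boolean atLeast(JavaVersion other) {
    return compareTo(other) >= 0;
  }

  public int compareTo(JavaVersion other) {
    return major - other.major;
  }

  private static JavaVersion detect() {
    // "java.specification.version" is "1.6"/"1.7" on old JVMs, "17" etc. on newer ones
    String spec = System.getProperty("java.specification.version", "1.6");
    String[] parts = spec.split("\\.");
    int major = "1".equals(parts[0]) ? Integer.parseInt(parts[1]) : Integer.parseInt(parts[0]);
    return new JavaVersion(major);
  }
}
```

A caller would then write JavaVersion.CURRENT.atLeast(JavaVersion.JAVA_7) instead of checking a boolean constant.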
[jira] [Commented] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.
[ https://issues.apache.org/jira/browse/LUCENE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149682#comment-13149682 ]

Shai Erera commented on LUCENE-3574:
------------------------------------
bq. nsme - NoSuchMethodException

ah, ok :).
[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
[ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149684#comment-13149684 ]

Doron Cohen commented on LUCENE-3573:
-------------------------------------
I agree about keeping the same notions as IR.

bq. returns null (no changes, or the taxonomy wasn't recreated)

In fact I was thinking of a different contract. So we have two approaches here for the returned value:
* Option A:
## *new TR* - if the taxonomy was recreated.
## *null* - if the taxonomy was either not modified or just grew.
* Option B:
## *new TR* - if the taxonomy was modified (either recreated or just grew).
## *null* - if the taxonomy was not modified.

Option A is simpler to implement, but I think it has two drawbacks:
* it is confusingly different from that of IR
* the fact that the TR was refreshed is hidden from the caller.

Option B is a bit more involved to implement:
* it would need to copy arrays' data from the old TR to the new one in case the taxonomy only grew.

I started to implement option B but am now rethinking this...

bq. Was there any reason to add it to TestTaxonomyCombined?

Good point, should probably move this to TestDirectoryTaxonomyReader.

> TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
> ------------------------------------------------------------------------------------------------
>                 Key: LUCENE-3573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3573
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3573.patch
>
> When recreating the taxonomy index, TR's assumption that categories are only added does not hold anymore. As a result, calling TR.refresh() will be incorrect at best, but will usually throw an AIOOBE.
[jira] [Commented] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern
[ https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149687#comment-13149687 ]

Doron Cohen commented on LUCENE-3573:
-------------------------------------
One more thing:
- In approach B, the fact that the taxonomy just grew simply allows an optimization (read only the new ordinals), so it is not part of the API logic; the only logic is whether the taxonomy was modified or not.
- In approach A, this fact is part of the API logic.
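The difference between the two contracts can be made concrete with a toy sketch. TaxoState, reopenA, and reopenB are illustrative stand-ins, not the facet module's API:

```java
// Toy illustration of the two reopen() contracts discussed above; the types
// and method names are hypothetical, not the Lucene facet API.
class ReopenContracts {
  /** Minimal stand-in for taxonomy state: a creation stamp plus a category count. */
  static final class TaxoState {
    final long createTime;
    final int size;
    TaxoState(long createTime, int size) {
      this.createTime = createTime;
      this.size = size;
    }
  }

  /** Option A: a new TR only if the taxonomy was recreated; null if unchanged OR it just grew. */
  static TaxoState reopenA(TaxoState current, TaxoState latest) {
    return latest.createTime != current.createTime ? latest : null;
  }

  /** Option B: a new TR if anything changed (recreated or grew); null only if unmodified. */
  static TaxoState reopenB(TaxoState current, TaxoState latest) {
    if (latest.createTime != current.createTime) {
      return latest; // recreated: caller must drop everything it cached
    }
    if (latest.size > current.size) {
      return latest; // grew: an implementation could copy the old arrays and read only new ordinals
    }
    return null; // not modified
  }
}
```

Under option A a caller that receives null cannot tell whether the taxonomy grew; under option B null always means "nothing changed".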
[jira] [Commented] (LUCENE-3496) Support grouping by IndexDocValues
[ https://issues.apache.org/jira/browse/LUCENE-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149689#comment-13149689 ]

Martijn van Groningen commented on LUCENE-3496:
-----------------------------------------------
I was planning on doing this. I'm almost ready to commit it. I'm only a bit stuck on documents that don't have a value for a group field. The random grouping tests also add documents with a null value for the group field and an empty string for the group field. This works fine with the term-based implementations, but not the DV-based implementations (the random test fails). Should we not use null as a group value when the DV-based implementations are used during the test?

> Support grouping by IndexDocValues
> ----------------------------------
>                 Key: LUCENE-3496
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3496
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/grouping
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>             Fix For: 4.0
>         Attachments: LUCENE-3496.patch, LUCENE-3496.patch, LUCENE-3496.patch, LUCENE-3496.patch, LUCENE-3496.patch, LUCENE-3496.patch, LUCENE-3496.patch
>
> Although IDV is not yet finalized (more particularly the SortedSource), I think we can already discuss / investigate implementing grouping by IDV.
[jira] [Commented] (LUCENE-3574) Add some more constants for newer Java versions to Constants.class, remove outdated ones.
[ https://issues.apache.org/jira/browse/LUCENE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149697#comment-13149697 ]

Uwe Schindler commented on LUCENE-3574:
---------------------------------------
One example where it might be bad: if it's an enum, you can also do if (JAVA_VERSION == JAVA_7), so the enum constants are not named like the fact they represent. I think that's all too much logic for something simple. For one major version we will mostly have 2 or 3 constants. In trunk we currently only have Java 7 and a deprecated one which is always true. New constants are only added on request, when we want to test for features/bugs.
[jira] [Commented] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149701#comment-13149701 ]

Uwe Schindler commented on LUCENE-3235:
---------------------------------------
I'll wait until tomorrow before I commit this safe-but-slow fix.
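The committed patch itself isn't shown in this thread; a generic version-gated fallback of the kind being discussed (fast ConcurrentHashMap on Java 6+, a fully synchronized map on Java 5 JVMs affected by bug 6822370) might look like the following sketch. VersionGatedMapFactory and the class-probe detection are hypothetical; the real LUCENE-3235 fix may differ:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a version-gated "safe but slow" fallback; NOT the
// actual LUCENE-3235 patch. The constant name mirrors the style of the
// JRE_IS_MINIMUM_* constants added in LUCENE-3574.
class VersionGatedMapFactory {
  static final boolean JRE_IS_MINIMUM_JAVA6 = detectJava6();

  private static boolean detectJava6() {
    try {
      // java.util.ArrayDeque only exists since Java 6, so its presence
      // proves we are on at least a Java 6 runtime.
      Class.forName("java.util.ArrayDeque");
      return true;
    } catch (ClassNotFoundException cnfe) {
      return false;
    }
  }

  /** Fast ConcurrentHashMap on Java 6+; a fully synchronized map on Java 5, where CHM can deadlock. */
  public static <K, V> Map<K, V> newConcurrentMap() {
    return JRE_IS_MINIMUM_JAVA6
        ? new ConcurrentHashMap<K, V>()
        : Collections.synchronizedMap(new HashMap<K, V>());
  }
}
```

The synchronized fallback serializes all access (hence "safe but slow"), but it cannot hit the ConcurrentHashMap deadlock described in the issue.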
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 11328 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/11328/

No tests ran.

Build Log (for compile errors):
[...truncated 1312 lines...]
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149711#comment-13149711 ]

Steven Rowe commented on SOLR-2382:
-----------------------------------
Hi Noble,

In {{DIHCache.java}}, you used the javadoc tag {{@solr.experimental}}, but there is no support in the build system for this tag, so it causes javadoc warnings, which fail the build, e.g.: [https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/11327/consoleText] (scroll down to the bottom to see the warning):

{noformat}
[javadoc] [...]/DIHCache.java:14: warning - @solr.experimental is an unknown tag.
{noformat}

Would you mind if I switch {{@solr.experimental}} to {{@lucene.experimental}}?

> DIH Cache Improvements
> ----------------------
>                 Key: SOLR-2382
>                 URL: https://issues.apache.org/jira/browse/SOLR-2382
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>            Reporter: James Dyer
>            Priority: Minor
>         Attachments: SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-dihwriter.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-entities.patch, SOLR-2382-properties.patch, SOLR-2382-properties.patch, SOLR-2382-solrwriter-verbose-fix.patch, SOLR-2382-solrwriter.patch, SOLR-2382-solrwriter.patch, SOLR-2382-solrwriter.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch, SOLR-2382.patch
>
> Functionality:
> 1. Provide a pluggable caching framework for DIH so that users can choose a cache implementation that best suits their data and application.
> 2. Provide a means to temporarily cache a child entity's data without needing to create a special cached implementation of the Entity Processor (such as CachedSqlEntityProcessor).
> 3. Provide a means to write the final (root entity) DIH output to a cache rather than to Solr. Then provide a way for a subsequent DIH call to use the cache as an Entity input. Also provide the ability to do delta updates on such persistent caches.
> 4. Provide the ability to partition data across multiple caches that can then be fed back into DIH and indexed either to varying Solr Shards, or to the same Core in parallel.
>
> Use Cases:
> 1. We needed a flexible, scalable way to temporarily cache child-entity data prior to joining to parent entities.
> - Using SqlEntityProcessor with Child Entities can cause an n+1 select problem.
> - CachedSqlEntityProcessor only supports an in-memory HashMap as a caching mechanism and does not scale.
> - There is no way to cache non-SQL inputs (ex: flat files, xml, etc).
> 2. We needed the ability to gather data from long-running entities by a process that runs separate from our main indexing process.
> 3. We wanted the ability to do a delta import of only the entities that changed.
> - Lucene/Solr requires entire documents to be re-indexed, even if only a few fields changed.
> - Our data comes from 50+ complex sql queries and/or flat files.
> - We do not want to incur overhead re-gathering all of this data if only 1 entity's data changed.
> - Persistent DIH caches solve this problem.
> 4. We want the ability to index several documents in parallel (using 1.4.1, which did not have the threads parameter).
> 5. In the future, we may need to use Shards, creating a need to easily partition our source data into Shards.
>
> Implementation Details:
> 1. De-couple EntityProcessorBase from caching.
> - Created a new interface, DIHCache, and two implementations:
> - SortedMapBackedCache - an in-memory cache, used as default with CachedSqlEntityProcessor (now deprecated).
> - BerkleyBackedCache - a disk-backed cache, dependent on bdb-je, tested with je-4.1.6.jar
> - NOTE: the existing Lucene Contrib db project uses je-3.3.93.jar. I believe this may be incompatible due to Generic Usage.
> - NOTE: I did not modify the ant script to automatically get this jar, so to use or evaluate this patch, download bdb-je from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html
> 2. Allow Entity Processors to take a cacheImpl parameter to cause the entity data to be cached (see EntityProcessorBase and DIHCacheProperties).
> 3. Partially de-couple SolrWriter from DocBuilder.
> - Created a new interface, DIHWriter, and two implementations:
> - SolrWriter (refactored)
> - DIHCacheWriter (allows DIH to write ultimately to a Cache).
> 4. Create a new Entity Processor, DIHCacheProcessor, which reads a persistent Cache as DIH Entity Input.
> 5. Support a partition parameter with both DIHCacheWriter and DIHCacheProcessor to allow for easy partitioning of source entity data.
> 6. Change the semantics of entity.destroy():
> - Previously, it was being called on each iteration of DocBuilder.buildDocument().
> - Now it does one-time cleanup tasks (like closing or deleting a disk-backed cache) once the entity processor is completed.
> - The only out-of-the-box entity processor that previously implemented destroy() was LineEntityProcessor, so this is not a very invasive change.
[jira] [Commented] (SOLR-1726) Deep Paging and Large Results Improvements
[ https://issues.apache.org/jira/browse/SOLR-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149723#comment-13149723 ]

Grant Ingersoll commented on SOLR-1726:
---------------------------------------
Hi Manoj,

This shouldn't require a new query, since it should work with all queries; instead it needs new parameters that get passed in alongside the query (see the earlier comments that lay out what the parameter names are). You might start by looking at how something like the rows parameter or the start parameter is handled and passed down to the SolrIndexSearcher.

> Deep Paging and Large Results Improvements
> ------------------------------------------
>                 Key: SOLR-1726
>                 URL: https://issues.apache.org/jira/browse/SOLR-1726
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 3.5, 4.0
>
> There are possibly ways to improve collection of deep-paging results by passing Solr/Lucene more information about the last page of results seen, thereby saving priority queue operations. See LUCENE-2215. There may also be better options for retrieving large numbers of rows at a time that are worth exploring. See LUCENE-2127.
[jira] [Commented] (LUCENE-3235) TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug
[ https://issues.apache.org/jira/browse/LUCENE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149728#comment-13149728 ]

Michael McCandless commented on LUCENE-3235:
--------------------------------------------
+1 for the safe-but-slow Java 5-only workaround
[jira] [Commented] (SOLR-2382) DIH Cache Improvements
[ https://issues.apache.org/jira/browse/SOLR-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149731#comment-13149731 ]

Noble Paul commented on SOLR-2382:
----------------------------------
please go ahead