[jira] Commented: (LUCENE-2235) implement PerFieldAnalyzerWrapper.getOffsetGap
[ https://issues.apache.org/jira/browse/LUCENE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967121#action_12967121 ] Uwe Schindler commented on LUCENE-2235:

To come back to the original issue:

bq. Should this be checking that a field is indeed analyzed before calling getOffsetGap?

In my opinion this should be done (and so this issue would disappear). Can you open another issue requesting this check and link it to this one? One problem coming from not checking for analyzed is this: you add a field indexed and it gets analyzed by PFAW. After that, you add the same field name stored-only (which is perfectly legal and often used, e.g. when the stored value is binary or in some other format and does not correspond to the indexed text), and the positionIncrement is increased. After that, you again add another instance of the same field as indexed-only, which also increases posIncr. So you have twice the gap between the two indexed sub-fields. This is definitely wrong.

implement PerFieldAnalyzerWrapper.getOffsetGap
Key: LUCENE-2235
URL: https://issues.apache.org/jira/browse/LUCENE-2235
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 3.0
Environment: Any
Reporter: Javier Godoy
Assignee: Uwe Schindler
Priority: Minor
Fix For: 2.9.4, 3.0.3, 3.1, 4.0
Attachments: LUCENE-2235.patch, PerFieldAnalyzerWrapper.patch

PerFieldAnalyzerWrapper does not delegate calls to getOffsetGap(Fieldable); instead it returns the default values from the implementation of Analyzer. (Similar to LUCENE-659, PerFieldAnalyzerWrapper fails to implement getPositionIncrementGap.)
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967125#action_12967125 ] JohnWu commented on SOLR-1395:

Tomliu: so in the proxy, not in the sub-proxy, does katta startNode need to add the class org.apache.solr.katta.DeployableSolrKattaServer? In katta's lib there are too many version differences: solr, lucene, zookeeper - the worst is lucene! Can you give me a mailbox so I can contact you directly? (Mine is pangla...@gmail.com.) Now, in katta's work queue, at NodeInteraction line 135:

{code}
T result = (T) _method.invoke(proxy, _args);
{code}

proxy is a Hadoop IPC proxy, and it cannot find pc-slavo2:2. Am I missing some config in Hadoop, or do I need to patch Hadoop with your https://issues.apache.org/jira/browse/HADOOP-7017? Please reply, thanks!

JohnWu

Integrate Katta
Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop
[jira] Commented: (LUCENE-2790) IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
[ https://issues.apache.org/jira/browse/LUCENE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967127#action_12967127 ] Shai Erera commented on LUCENE-2790:

bq. How Lucene manages the index files is under-the-hood so we are free to change it.

That's correct. However, sadly, the backwards tests do not agree with you :). Because the runtime behavior has changed, the tests fail. If you try to call LMP.setNoCFSRatio, you get a NoSuchMethodError because the tests are compiled against 3.0's source, where it indeed does not exist. I'm trying to resolve it by fetching the method using reflection, but this shows another problem with how we maintain the backwards tests.

IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
Key: LUCENE-2790
URL: https://issues.apache.org/jira/browse/LUCENE-2790
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 3.1, 4.0
Attachments: LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch

Spin-off from here: http://www.gossamer-threads.com/lists/lucene/java-dev/112311. I will attach a patch shortly that addresses the issue on trunk.
Re: API Semantics and Backwards
I've hit another backwards-tests problem. Over in LUCENE-2790 we've changed LogMergePolicy.useCompoundFile's behavior to factor in the newly added noCFSRatio. After some discussion we decided that even though it breaks back-compat's runtime behavior, it's ok in this case, because how Lucene manages the internal representation of segments (compound or not) is up to it. And you can override it via the noCFSRatio setting. Indeed some tests failed (backwards as well as core) and the way to fix them was to force CFS creation. However, on backwards this is not doable, because the tests are compiled against 3.0's source, where setNoCFSRatio does not exist on LogMergePolicy, even though we agree this change is allowed back-compat-wise. I ended up fixing it by querying for the method using reflection, and the tests now pass.

Now, regardless of this change (whether it's ok or not), I think this shows another problem with how we maintain backwards tests. Internal changes like this, especially to @experimental / @internal classes, are allowed, but we need to resort to reflection hacks to fix the tests. So either we delete the offending tests - because, like Uwe says, they duplicate the test effort - or we maintain a source tree for backwards. I personally am in favor of removing all non-backwards tests and keeping those that actually test backwards behavior. But I know the opinions are divided here.

Shai

On Wed, Dec 1, 2010 at 4:48 PM, Shai Erera ser...@gmail.com wrote:
> While I'm not against going back towards a checkout of backwards that we can modify, I wonder whether all the tests there should be there, and how much we actually duplicate. Lucene 3x should include all of 3.0's tests + new ones that test new functionality, assert bug fixes, etc. There shouldn't be a test in 3.0 that does not exist in 3x, unless the missing test/feature was an agreed-upon backwards break. So I think it would be really nice if backwards tested exactly what it should. For example, testing index-format back-compat is done twice today, in test-core and test-backwards, while it should only be run by backwards. There are a bunch of test classes I've created that impl/extend 'search'-related classes, for back-compat compilation only. They should also run in backwards only. The downside is that maintenance is going to be difficult - it's much easier to copy tests over to backwards than to decide which ones should go there and which shouldn't. Also, adding new API requires a matching backwards test, etc. Not undoable, but difficult - it requires discipline.
>
> Shai

On Tue, Nov 30, 2010 at 2:02 PM, Robert Muir rcm...@gmail.com wrote:
> On Tue, Nov 30, 2010 at 4:47 AM, Shai Erera ser...@gmail.com wrote:
>> Like you said, the rest of the tests just increase the test running time.
>
> I'm not completely sure about this: do we always switch our tests over to do the equivalent checks both against the new API and the old API when we make API changes? There could be bugs in our 'backwards handling' that are actually logic bugs that the new tests don't detect. So I'm a little concerned about only running pure simplistic API tests in backwards. On the other hand, I'm really worried about what Shai brings up here: we are doing some refactoring of the test system and there is more shared code at the moment, similar to MockRAMDirectory.
Because we worry about preventing things like index corruption, it's my opinion that we need things like MockRAMDirectory, and they should be able to break all the rules etc. (use pkg-private APIs) if it prevents bugs. Just look at our trunk or 3.x tests and imagine them as backwards tests... utilities like RandomIndexWriter will be more fragile to internal/experimental/pkg-private changes, but as mentioned above I think these are good to have in backwards tests. So, I think at the moment I'm leaning towards the idea of going back towards a checkout that we can modify, in combination with us all soliciting more reviews / longer time for any backports to stable branches that require backwards-tests modifications. I understand Uwe's point too - it's dangerous to modify the code and seems to defeat the purpose of backwards - but I think this is going to be a more serious problem after releasing 3.1!
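A minimal sketch of the reflection workaround Shai describes above, assuming the setNoCFSRatio(double) setter added to LogMergePolicy in LUCENE-2790; error handling is elided and the 1.0 value (always permit CFS) is illustrative:

{code}
// Backwards tests compile against 3.0 sources, so resolve the 3.x-only setter at runtime.
java.lang.reflect.Method m = mergePolicy.getClass().getMethod("setNoCFSRatio", double.class);
m.invoke(mergePolicy, 1.0); // force compound-file creation regardless of segment size
{code}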
SearchBlox is now FREE. No limitations!
SearchBlox is pleased to announce the availability of SearchBlox Search Software as a completely FREE product. The product is now available with no limitations on the number of documents indexed and no restrictions on product functionality. SearchBlox will support the free product with a number of new paid support packages and free forum-based support. SearchBlox is an Enterprise Search Server built on top of Apache Lucene and includes:
- Integrated crawlers for HTTP/HTTPS, filesystems and feeds
- Web-based Admin Console to configure and manage up to 250 indexes
- REST API
- Multilingual support to index content in 37 languages
- Packaged for deployment to Linux/Unix, Windows, Mac OS X and Amazon Web Services (AWS)

Since 2003, SearchBlox has been continually enhanced, leveraging new features in Apache Lucene, and has been deployed by more than 300 customers in 30 countries. The product can be downloaded from www.searchblox.com

Best regards,
The SearchBlox Team
www.searchblox.com
http://twitter.com/search_software
Lucene-trunk - Build # 1384 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1384/

All tests passed

Build Log (for compile errors):
[...truncated 18318 lines...]
[jira] Created: (LUCENE-2801) getOffsetGap should not be called for non-analyzed fields
getOffsetGap should not be called for non-analyzed fields
Key: LUCENE-2801
URL: https://issues.apache.org/jira/browse/LUCENE-2801
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 3.0.3
Reporter: Nick Pellow

From LUCENE-2235: since Lucene 3.0.3, when a PerFieldAnalyzerWrapper is constructed with a null defaultAnalyzer, it will NPE when DocInverterPerField calls:

{code}
fieldState.offset += docState.analyzer.getOffsetGap(field);
{code}

This block should first check that the field is analyzed, or the javadoc on PerFieldAnalyzerWrapper could mention that a null defaultAnalyzer is disallowed. Also, the main reason for checking isAnalyzed, from Uwe Schindler in LUCENE-2235:

{quote}
One problem coming from not checking for analyzed is this: You add a field indexed and it gets analyzed by PFAW. After that you add the same field name stored-only (which is perfectly legal and often used, e.g. when the stored value is binary or in some other format and does not correspond to the indexed text), and the positionIncrement is increased. After that you again add another instance of the same field as indexed-only, which also increases posIncr. So you have twice the gap between the two indexed sub-fields. This is definitely wrong.
{quote}
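A minimal sketch of the guard this issue asks for, assuming the Fieldable.isIndexed()/isTokenized() accessors available in 3.x; this illustrates the proposed check, not the committed fix:

{code}
// Only analyzed fields produce tokens with offsets, so only they
// should consult the analyzer for an offset gap.
if (field.isIndexed() && field.isTokenized()) {
  fieldState.offset += docState.analyzer.getOffsetGap(field);
}
{code}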
[jira] Commented: (LUCENE-2235) implement PerFieldAnalyzerWrapper.getOffsetGap
[ https://issues.apache.org/jira/browse/LUCENE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967148#action_12967148 ] Nick Pellow commented on LUCENE-2235:

Thanks for the clarification, Uwe. I wasn't sure if null Analyzers were meant to be accepted or not. I was upgrading some existing code from 3.0.2 to 3.0.3 and stumbled across that, so it's good to know. I've created LUCENE-2801 to track the real reason the check should be done too!
[jira] Commented: (LUCENE-2235) implement PerFieldAnalyzerWrapper.getOffsetGap
[ https://issues.apache.org/jira/browse/LUCENE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967151#action_12967151 ] Uwe Schindler commented on LUCENE-2235:

Thanks, Nick!
[jira] Updated: (LUCENE-2790) IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
[ https://issues.apache.org/jira/browse/LUCENE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2790:

Attachment: LUCENE-2790-3x.patch

Backport to 3x. Note the reflection hack I had to use to make the backwards tests run. I'm not committing yet - waiting for some response about the backwards tests. If you're ok with it, I'll commit.
[jira] Commented: (LUCENE-2790) IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
[ https://issues.apache.org/jira/browse/LUCENE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967183#action_12967183 ] Uwe Schindler commented on LUCENE-2790:

I would simply disable the tests. Reflection should only be used when mock classes are used that affect thousands of tests. There are already lots of tests disabled.
[jira] Created: (SOLR-2267) Using query function in bf parameter in the DisMaxQParser forces the use of parameter dereferencing
Using query function in bf parameter in the DisMaxQParser forces the use of parameter dereferencing
Key: SOLR-2267
URL: https://issues.apache.org/jira/browse/SOLR-2267
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 3.1
Reporter: Uri Boness
Fix For: 3.1

The DisMaxQParser parses the bf parameter using the {{SolrPluginUtils.parseFieldBoosts(...)}} function. This function tokenizes the string on whitespace and then builds a map from fields to their boost values. Unfortunately, the *{!...}* form of a query contains whitespace, and therefore the parsing of the boost function fails. This should be considered a bug, as it effectively forces the use of parameter dereferencing, which in many cases is not ideal.
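To make the failure mode concrete, a hedged illustration (the field and parameter names here are made up): because parseFieldBoosts splits on whitespace, an inline nested query in bf is torn apart at its spaces, while dereferencing moves the whitespace into a separate parameter:

{code}
# Fails: the spaces inside {!...} are treated as field-boost separators
bf=query({!dismax qf=title v='some text'})

# Workaround via parameter dereferencing
bf=query($qq)&qq={!dismax qf=title}some text
{code}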
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967184#action_12967184 ] Eric Pugh commented on SOLR-1395:

Tom, John, just wanted to say that having your conversation on this ticket in public has been great! I am a couple of steps behind you, having started up Katta, and started Solr with the patch, but without success on searching. My current error is that Solr can't find the katta.zk.properties file. Where did you put it so it would be found on the classpath?

Eric
[jira] Created: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()
DirectoryReader ignores NRT SegmentInfos in #isOptimized()
Key: LUCENE-2802
URL: https://issues.apache.org/jira/browse/LUCENE-2802
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer

DirectoryReader only takes the SegmentInfos shared with IW into account in DirectoryReader#isOptimized(). This can return true even if the actual realtime reader sees more than one segment.

{code}
public boolean isOptimized() {
  ensureOpen();
  // if segmentInfos changes in IW this can return a false positive
  return segmentInfos.size() == 1 && !hasDeletions();
}
{code}

DirectoryReader should check if this reader has a non-null segmentInfosStart and use that instead.
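A sketch of the fix the description suggests, assuming the reader-private NRT infos live in a field named segmentInfosStart as the issue text implies; the field name and the null check are assumptions, not the committed patch:

{code}
public boolean isOptimized() {
  ensureOpen();
  // Prefer the reader-private clone in NRT mode; otherwise fall back to the shared infos.
  SegmentInfos infos = segmentInfosStart != null ? segmentInfosStart : segmentInfos;
  return infos.size() == 1 && !hasDeletions();
}
{code}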
[jira] Commented: (LUCENE-2790) IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
[ https://issues.apache.org/jira/browse/LUCENE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967208#action_12967208 ] Shai Erera commented on LUCENE-2790:

I don't mind disabling the tests, but I think we should discuss the bigger issue (on that thread on the mailing list). If we decide to make it a 'policy' to disable backwards tests that break due to legal changes to the API and behavior, let's at least reach a consensus.
Re: Changes Mess
> CHANGES file:
> LUCENE-2658: Exceptions while processing term vectors enabled for multiple fields could lead to invalid ArrayIndexOutOfBoundsExceptions.
>
> JIRA description:
> LUCENE-2658: TestIndexWriterExceptions random failure: AIOOBE in ByteBlockPool.allocSlice
>
> So you see the story: I hit a random test failure and just opened an issue describing that the test randomly failed. Mike then went and fixed it and wrote up a CHANGES.txt entry that's significantly better for users. In order for us to use JIRA here, we would have to do a lot of JIRA editing and reorganizing, I think, and probably create a lot of unnecessary issues.

What's the difference between Mike going and writing up a more informative CHANGES.txt entry and, say, updating JIRA with the information from that entry to have a more descriptive title? Also, besides issue titles, there is a way to capture anyone who's been involved in the JIRA cycle (comment, issue created, etc.) as part of the contribution report, which is probably *even more* inclusive than what you guys are currently doing.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory, Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
Re: Changes Mess
> Would you mind naming these Apache projects? I'd like to take a look.

Tika, Nutch, OODT.

Cheers,
Chris
[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory
[ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967210#action_12967210 ] Shai Erera commented on LUCENE-2471:

bq. I think the problem actually wasn't interrupting but some sort of race condition?

Could be, I don't remember the exact details. I totally agree with you, though it's like a chicken-and-egg situation - we cannot develop anything safe until we have good threaded unit tests, and we can never know we have those until we have an implementation that might break. So I personally don't mind if we pursue an implementation of FileChannel copying, in NIOFSDirectory only, and then investigate the current threaded indexing/search tests and add some if we think something's missing. But currently we're in a sort of limbo :). Anyway, I don't think it's related to this issue and it can be handled separately. If you agree, and assuming nothing more should be done here, we can close this one.

Supporting bulk copies in Directory
Key: LUCENE-2471
URL: https://issues.apache.org/jira/browse/LUCENE-2471
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Earwin Burrfoot
Fix For: 3.1, 4.0

A method can be added to IndexOutput that accepts IndexInput, and writes bytes using it as a source. This should be used for bulk-merge cases (offhand: norms, docstores?). Some Directories can then override the default impl and skip intermediate buffers (NIO, MMap, RAM?).
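A minimal sketch of the bulk-copy method the description proposes: a buffered default on IndexOutput that subclasses could override to skip the intermediate buffer. The exact signature is an assumption, not what was committed:

{code}
public void copyBytes(IndexInput input, long numBytes) throws IOException {
  byte[] buffer = new byte[4096]; // the intermediate buffer an NIO/MMap/RAM override would avoid
  while (numBytes > 0) {
    int chunk = (int) Math.min(buffer.length, numBytes);
    input.readBytes(buffer, 0, chunk);
    writeBytes(buffer, 0, chunk);
    numBytes -= chunk;
  }
}
{code}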
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967211#action_12967211 ] Jan Høydahl commented on SOLR-1979:

@Grant: I dropped the outputField setting and a number of other settings. There should be a way to output the language for the whole document to some field, as some applications need to filter on language. I like making most things configurable, but with good defaults that fit most needs. The default could be to detect a document-wide language from all input fields and output it to a language_s field, unless you specify params docLangInputFields=f1,f2,... and docLangOutputField=nn. Likewise, make it easy to disable field renaming.

Create LanguageIdentifierUpdateProcessor
Key: SOLR-1979
URL: https://issues.apache.org/jira/browse/SOLR-1979
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch

We need the ability to detect the language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this:

{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code}

It will then read the text from inputFields name and subject, perform language identification, and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used.
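A sketch of the defaults Jan proposes in his comment on this issue; docLangInputFields and docLangOutputField are names proposed in that comment, not parameters from any attached patch:

{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <!-- proposed: detect one document-wide language from these fields -->
  <str name="docLangInputFields">f1,f2</str>
  <!-- proposed: write the detected language here for filtering/faceting -->
  <str name="docLangOutputField">language_s</str>
</processor>
{code}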
Re: Changes Mess
On Mon, Dec 6, 2010 at 9:56 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:
> What's the difference between Mike going and writing up a more informative CHANGES.txt entry than say updating JIRA with the information from that entry to have a more descriptive title?

Well, you are right, but it's another modification to JIRA (an edit). And then there are more examples like this:

CHANGES:
* LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent accessing an unmapped buffer if the input is closed

JIRA:
* LUCENE-2650: improve windows defaults in FSDirectory

The JIRA is *CORRECT*. While working on the issue I discovered we could trivially add some extra safety, so I backported the extra safety to all branches. In this case, would I have to split my patch in half and create another JIRA issue for this very trivial change? Just saying: to do what you are suggesting (by the way, I'm not opposed to the idea!), we would have to change the way we use JIRA and increase noise on the mailing list. There are quite a few examples like this. E.g., the JIRA release notes say this:

[LUCENE-2055] - Fix buggy stemmers and Remove duplicate analysis functionality.

But I certainly didn't do this in a bugfix release! What actually happened is in contrib/CHANGES.txt:

* LUCENE-2055: Add documentation noting that the Dutch and French stemmers in contrib/analyzers do not implement the Snowball algorithm correctly, and recommend to use the equivalents in contrib/snowball if possible.

So I don't know how JIRA would handle this case. Because we merged contrib/snowball with contrib/analyzers in 3.1, would I have to create a separate JIRA issue just so that 3.1 has the correct description/path name in its release notes? And in 4.0 would I have to create a third duplicate JIRA issue, because we merged all the analyzers, so there it needs to refer to modules/analysis?
[jira] Commented: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()
[ https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967213#action_12967213 ] Michael McCandless commented on LUCENE-2802:

Nice catch Simon! This is also a thread safety issue since IR should not touch the writer's segmentInfos outside of sync(IW).
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967214#action_12967214 ] Grant Ingersoll commented on SOLR-1979:

bq. There should be a way to output the language for the whole document to some field as some applications need to filter on language.

There is. It's the langField.

bq. Can't we validate the output mapping (and log it!) at initialization time?

To some extent, but users can also pass it in.

bq. We should not be using 639-1 codes in any APIs!!!

I'll update the patch.
[jira] Updated: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1979:

Attachment: SOLR-1979.patch

Removes mentions of ISO 639.
Re: Changes Mess
On Dec 6, 2010, at 9:58 AM, Mattmann, Chris A (388J) wrote:
>> Would you mind naming these Apache projects? I'd like to take a look.
> Tika, Nutch, OODT.

Add in Mahout. I believe Hadoop does too.
[jira] Created: (LUCENE-2803) FieldCache should not pay attention to deleted docs when creating entries
FieldCache should not pay attention to deleted docs when creating entries
Key: LUCENE-2803
URL: https://issues.apache.org/jira/browse/LUCENE-2803
Project: Lucene - Java
Issue Type: Bug
Reporter: Yonik Seeley

The FieldCache uses a key that ignores deleted docs, so it's actually a bug to use deleted docs when creating an entry. It can lead to incorrect values when the same entry is used with a different reader.
[jira] Updated: (LUCENE-2763) Swap URL+Email recognizing StandardTokenizer and UAX29Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-2763:

Attachment: LUCENE-2763.patch

Updated patch to fix {{solr/CHANGES.txt}}, {{lucene/CHANGES.txt}}, and {{analysis/standard/READ_BEFORE_REGENERATING.txt}}. I will commit later today if there are no objections.

Swap URL+Email recognizing StandardTokenizer and UAX29Tokenizer
Key: LUCENE-2763
URL: https://issues.apache.org/jira/browse/LUCENE-2763
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
Fix For: 3.1, 4.0
Attachments: LUCENE-2763.patch, LUCENE-2763.patch

Currently, in addition to implementing the UAX#29 word boundary rules, StandardTokenizer recognizes email addresses and URLs, but doesn't provide a way to turn this behavior off and/or provide overlapping tokens with the components (username from an email address, hostname from a URL, etc.). UAX29Tokenizer should become StandardTokenizer, and the current StandardTokenizer should be renamed to something like UAX29TokenizerPlusPlus (or something like that). For the rationale, see [the discussion at the reopened LUCENE-2167|https://issues.apache.org/jira/browse/LUCENE-2167?focusedCommentId=12929325&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12929325].
Re: Changes Mess
On Dec 5, 2010, at 12:18 PM, Robert Muir wrote:
> On Sun, Dec 5, 2010 at 12:08 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:
>> Hi Mark, RE: the credit system. JIRA provides a contribution report here, like this one that I generated for Lucene 3.1:
>
> My concern with this is that it leaves out important email contributors.

I think we probably miss these as-is too. Note, however, that in my proposal one can still call out specific things. We could for instance have a Contributors section and just add names to it. I just think we put too much minutiae in CHANGES, and it is a real burden to deal with it across branches b/c there are always massive conflicts and it requires you to look up every last change to recall which version it is in. IMO, JIRA should be the system of record for all bug discussions. Discussions that happen on email can easily be pointed to using any one of our many mail-archive systems. Our new Changes could be structured like below. The important thing about this approach is that it can all more or less be written at release time, other than the contributor list and perhaps the back-compat section.

<snip>
= Version X.Y =
Brief Intro

== Dependencies ==
JUnit 4.4

== New Features ==
* Magic search was implemented

== Backward Compatibility Breaks ==
* Blah, blah, blah

== Significant Changes ==
* We've replaced the inverted index with a giant array

== Contributors ==
(alphabetical order)
Joe Schmoe (optionally cite an issue number)
Jane Doe
Optionally paste in the list from JIRA

== Full Changes List ==
* LINK TO JIRA
</snip>

-Grant
[jira] Commented: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()
[ https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967227#action_12967227 ] Simon Willnauer commented on LUCENE-2802:

bq. Nice catch Simon! This is also a thread safety issue since IR should not touch the writer's segmentInfos outside of sync(IW).

It seems like there is more to all that in DR - we should really only use the uncloned SegmentInfos if we are not in NRT mode. #getVersion uses it too, which is wrong. I actually rely on isOptimized in several tests and ran into an NPE due to that, so we should really fix DR to use a private SegmentInfos, or restrict the uncloned one to the isCurrent comparison.
[jira] Created: (LUCENE-2804) check all tests that use FSDirectory.open
check all tests that use FSDirectory.open
Key: LUCENE-2804
URL: https://issues.apache.org/jira/browse/LUCENE-2804
Project: Lucene - Java
Issue Type: Test
Reporter: Robert Muir

In LUCENE-2471 we were discussing the copyBytes issue, and Shai and I had a discussion about how we could prevent such bugs in the future. One thing that led to the bug existing in our code for so long was that it only happened on Windows (e.g. it never failed in Hudson!). This was because the bug only happened if you were copying from SimpleFSDirectory, and the test used FSDirectory.open. Today the situation is improving: most tests use newDirectory(), which is random by default and never uses FSDir.open - it always uses SimpleFS or NIOFS so that the same random seed will reproduce across both Windows and Unix. So I think we need to review all uses of FSDirectory.open in our tests, and minimize them. In general, tests should use newDirectory(). If a test comes with, say, a zip file and wants to explicitly open stuff from disk, I think it should open the contents with, say, SimpleFSDir, and then call newDirectory(Directory) to copy into a new random implementation for actual testing. This method already exists:

{noformat}
/**
 * Returns a new Directory instance, with contents copied from the
 * provided directory. See {@link #newDirectory()} for more
 * information.
 */
public static MockDirectoryWrapper newDirectory(Directory d) throws IOException {
{noformat}
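A minimal sketch of the pattern the issue recommends for tests that ship an index on disk; the path is a placeholder and the surrounding test body is elided:

{code}
// Open the on-disk index with a fixed impl so behavior does not depend on the OS...
Directory raw = new SimpleFSDirectory(new File("/path/to/unzipped/index"));
// ...then copy into a random test Directory so the random seed reproduces everywhere.
Directory dir = newDirectory(raw);
raw.close();
// ... run the actual assertions against dir ...
dir.close();
{code}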
[jira] Issue Comment Edited: (LUCENE-2790) IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
[ https://issues.apache.org/jira/browse/LUCENE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967183#action_12967183 ] Uwe Schindler edited comment on LUCENE-2790 at 12/6/10 10:54 AM:

I would simply disable the tests. Reflection should only be used when mock classes are used that affect thousands of tests. There are already lots of tests disabled.

was (Author: thetaphi): I would supply disable the tests. Reflection should only be used when mock classes are used that affect thousands of tests. There are already lots of tests disabled.
[jira] Commented: (LUCENENET-383) System.IO.IOException: read past EOF while deleting the file from upload folder of filemanager.
[ https://issues.apache.org/jira/browse/LUCENENET-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968383#action_12968383 ] Digy commented on LUCENENET-383:

Hi Chaitanya, even though it seems to be a Lucene.Net bug, I don't think you will find anyone willing to fix that 4-year-old version. It is probably fixed in 2.9.2.

DIGY

System.IO.IOException: read past EOF while deleting the file from upload folder of filemanager.
Key: LUCENENET-383
URL: https://issues.apache.org/jira/browse/LUCENENET-383
Project: Lucene.Net
Issue Type: Bug
Environment: production
Reporter: chaitanya

We are getting System.IO.IOException: read past EOF when deleting the file from the upload folder of the filemanager. It used to work fine earlier, but for the past few days we have been getting this error. We are using the EPiServer content management system, and EPiServer in turn uses Lucene for indexing. Please find the following stack trace of the error. Help me to overcome this error. Thanks in advance.

[IOException: read past EOF]
Lucene.Net.Store.BufferedIndexInput.Refill() +233
Lucene.Net.Store.BufferedIndexInput.ReadByte() +21
Lucene.Net.Store.IndexInput.ReadInt() +13
Lucene.Net.Index.SegmentInfos.Read(Directory directory) +60
Lucene.Net.Index.AnonymousClassWith.DoBody() +45
Lucene.Net.Store.With.Run() +67
Lucene.Net.Index.IndexReader.Open(Directory directory, Boolean closeDirectory) +110
Lucene.Net.Index.IndexReader.Open(String path) +65
EPiServer.Web.Hosting.Versioning.Store.FileOperations.DeleteItemIdFromIndex(String filePath, Object fileId) +78
EPiServer.Web.Hosting.Versioning.Store.FileOperations.DeleteFile(Object dirId, Object fileId) +118
EPiServer.Web.Hosting.Versioning.VersioningFileHandler.Delete() +28
EPiServer.Web.Hosting.VersioningFile.Delete() +118
EPiServer.UI.Hosting.UploadFile.ConfirmReplaceButton_Click(Object sender, EventArgs e) +578
EPiServer.UI.WebControls.ToolButton.OnClick(EventArgs e) +107
EPiServer.UI.WebControls.ToolButton.RaisePostBackEvent(String eventArgument) +135
System.Web.UI.Page.RaisePostBackEvent(IPostBackEventHandler sourceControl, String eventArgument) +13
System.Web.UI.Page.RaisePostBackEvent(NameValueCollection postData) +36
System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +1565
Re: FieldCache usage for custom field collapse in solr 1.4
Hey Yonik, thanks for clarifying. The reason I went rolling my own way: I asked previously whether there's any plan to back-port field collapse to Solr 1.4, and I understood that it's not at all straightforward. If you think it'll be fairly easy, I'd go ahead and look at the new code in Solr 4.0 trunk and use that as a basis, for example. Q: does the field collapse component expect the field to collapse on to be stored, or does it also try to use field cache trickery?

Thanks, Adam

On Mon, Dec 6, 2010 at 9:42 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Sun, Dec 5, 2010 at 6:12 PM, Adam H. jimmoe...@gmail.com wrote:
>> StringIndex fieldCacheVals = FieldCache.DEFAULT.getStringIndex(reader, collapseField);
>> where 'reader' is the instance of the SolrIndexReader passed along to the component with the ResponseBuilder.SolrQueryRequest object. As I understand, this can double memory usage due to (re)loading this fieldcache on a reader-wide basis rather than on a per-segment basis?
>
> Yep. Sorting and function queries use per-segment FieldCache entries. So if you also request a FieldCache from the top-level reader, it won't reuse the per-segment caches and hence will take up 2x memory over just using per-segment. Solr's field collapsing already works on a per-segment basis... if your needs are at all general, it could make sense to try to get it rolled into Solr rather than implementing custom code.
>
> -Yonik
> http://www.lucidimagination.com
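A sketch of the per-segment access Yonik describes, using Lucene 2.9-era APIs (IndexReader.getSequentialSubReaders() and FieldCache.getStringIndex()); the docBase bookkeeping is included because per-segment entries are addressed by segment-local docids:

{code}
IndexReader[] subReaders = topReader.getSequentialSubReaders();
int docBase = 0;
for (IndexReader sub : subReaders) {
  // Reuses the same per-segment entries that sorting and function queries populate.
  FieldCache.StringIndex idx = FieldCache.DEFAULT.getStringIndex(sub, collapseField);
  // A global docid maps to (docid - docBase) within this segment.
  docBase += sub.maxDoc();
}
{code}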
Re: FieldCache usage for custom field collapse in solr 1.4
On Mon, Dec 6, 2010 at 3:24 PM, Adam H. jimmoe...@gmail.com wrote:
> Hey Yonik, thanks for clarifying. The reason I went rolling my own way: I asked previously whether there's any plan to back-port field collapse to Solr 1.4, and I understood that it's not at all straightforward.

Ahhh... I'd just use trunk if possible ;-) The risks of being in production on custom code that no one else uses are perhaps greater than running on a widely used development version. But yes... I don't see a backport happening for 1.4.

-Yonik
http://www.lucidimagination.com
Re: FieldCache usage for custom field collapse in solr 1.4
Fair enough - I might give it a shot if most functionality is compatible with Solr 1.4.1, to your mind, and fairly stable? One last Q regarding correct usage of the per-segment FieldCache in Solr components: since this is something I might also have issues with elsewhere, and I suspect other people who work on custom logic will as well, I think it might be useful to have some documentation and/or a simple programmatic interface for implementing the correct access path to these inside a custom SolrComponent. I looked around the grouping code a bit and have yet to fully understand what's going on, but is the ValueSource supposed to take care of access to the underlying field?

On Mon, Dec 6, 2010 at 12:34 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Mon, Dec 6, 2010 at 3:24 PM, Adam H. jimmoe...@gmail.com wrote:
>> Hey Yonik, thanks for clarifying. The reason I went rolling my own way: I asked previously whether there's any plan to back-port field collapse to Solr 1.4, and I understood that it's not at all straightforward.
>
> Ahhh... I'd just use trunk if possible ;-) The risks of being in production on custom code that no one else uses are perhaps greater than running on a widely used development version. But yes... I don't see a backport happening for 1.4.
>
> -Yonik
> http://www.lucidimagination.com
Re: FieldCache usage for custom field collapse in solr 1.4
On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote:
> Fair enough - I might give it a shot if most functionality is compatible with Solr 1.4.1, to your mind, and fairly stable?

Yes, the external APIs are very compatible. The internal APIs - not so much. You should reindex also.

> One last Q regarding correct usage of the per-segment FieldCache in Solr components: since this is something I might also have issues with elsewhere, and I suspect other people who work on custom logic will as well, I think it might be useful to have some documentation and/or a simple programmatic interface for implementing the correct access path to these inside a custom SolrComponent. I looked around the grouping code a bit and have yet to fully understand what's going on, but is the ValueSource supposed to take care of access to the underlying field?

Yes - you can actually group on arbitrary function queries, even. That will be more useful when we add some bucketing functions.

-Yonik
http://www.lucidimagination.com
[jira] Commented: (LUCENE-2186) First cut at column-stride fields (index values storage)
[ https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968416#action_12968416 ] Simon Willnauer commented on LUCENE-2186:

bq. Whew... this interface is more expansive than I thought it would be (but I guess it's really many issues rolled into one... like sorting, caching, etc).

Sorry about that :)

bq. So it seems like DocValuesEnum is the traditional lowest level read the index, and Source is a cached version of that?

Not quite. DocValuesEnum is iterator-based access to the DocValues which does not load everything into memory, while Source is entirely RAM-resident, offering random access to values, similar to the field cache. Yet you can also obtain a DocValuesEnum from a Source, since it's already in memory.

bq. A higher level question I have is why we're not reusing the FieldCache for caching/sorting?

You mean as a replacement for Source? For caching, what we did here is leave it to the user to do the caching, or cache based on the Source instance - how would that relate to FieldCache in your opinion?

First cut at column-stride fields (index values storage)
Key: LUCENE-2186
URL: https://issues.apache.org/jira/browse/LUCENE-2186
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Reporter: Michael McCandless
Assignee: Simon Willnauer
Fix For: CSF branch, 4.0
Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, mem.py

I created an initial basic impl for storing index values (ie column-stride value storage). This is still a work in progress... but the approach looks compelling. I'm posting my current status/patch here to get feedback/iterate, etc. The code is standalone now, and lives under the new package oal.index.values (plus some util changes, refactorings) -- I have yet to integrate into Lucene so eg you can mark that a given Field's value should be stored into the index values, sorting will use these values instead of the field cache, etc. It handles 3 types of values:
* Six variants of byte[] per doc, all combinations of fixed vs variable length, and stored either straight (good for eg a title field), deref (good when many docs share the same value, but you won't do any sorting) or sorted.
* Integers (variable bit precision used as necessary, ie this can store byte/short/int/long, and all precisions in between)
* Floats (4 or 8 byte precision)

String fields are stored as the UTF8 byte[]. This patch adds a BytesRef, which does the same thing as flex's TermRef (we should merge them). This patch also adds a basic initial impl of PackedInts (LUCENE-1990); we can swap that out if/when we get a better impl. This storage is dense (like field cache), so it's appropriate when the field occurs in all/most docs. It's just like field cache, except the reading API is a get() method invocation, per document. Next step is to do basic integration with Lucene, and then compare sort performance of this vs field cache. For the sort-by-String-value case, I think RAM usage & GC load of this index values API should be much better than field cache, since it does not create an object per document (instead it shares big long[] and byte[] across all docs), and because the values are stored in RAM as their UTF8 bytes. There are abstract Writer/Reader classes. The current reader impls are entirely RAM-resident (like field cache), but the API is (I think) agnostic, ie, one could make an MMAP impl instead. I think this is the first baby step towards LUCENE-1231.
Ie, it cannot yet update values, and the reading API is fully random-access by docID (like field cache), not like a posting list, though I do think we should add an iterator() api (to return flex's DocsEnum) -- eg I think this would be a good way to track avg doc/field length for BM25/lnu.ltc scoring. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
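Since the DocValuesEnum vs. Source distinction above is easy to miss, here is a minimal sketch of the two access styles. The exact signatures below (docValues(), getEnum(), getSource(), getInt(), advance()) are assumptions for illustration only; the CSF-branch API is still in flux and may not match the patch.
{code}
// Sketch only -- method names are assumed, not taken from the patch.
DocValues values = reader.docValues("price");

// Iterator-based access: streams values, nothing is pre-loaded into RAM.
DocValuesEnum docValuesEnum = values.getEnum();
while (docValuesEnum.advance() != DocValuesEnum.NO_MORE_DOCS) {
  long v = docValuesEnum.getInt(); // value for the current doc
}

// Source: fully RAM-resident, random access by docID (like field cache).
Source source = values.getSource();
long v42 = source.getInt(42);

// And since a Source is already in memory, it can hand out an enum too.
DocValuesEnum cachedEnum = source.getEnum();
{code}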
[jira] Commented: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()
[ https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968418#action_12968418 ] Simon Willnauer commented on LUCENE-2802: - I changed DirectoryReader to use the cloned version of SegmentInfos instead of using two of them inconsistently. The only failure I get is on TestIndexWriterReader line 105 {code} r1.close(); writer.close(); assertTrue(r2.isCurrent()); {code} where the writer is closed and afterwards it checks if the r2 reader is still the current one, which fails since the writer.close() method changes the version of the SegmentInfos. In my opinion this is actually the semantics I would expect from #isCurrent(); the question is whether we want to change the semantics to return false from #isCurrent() if the writer we used to obtain the reader from is closed. I think we should consider it for consistency and simplicity though. DirectoryReader ignores NRT SegmentInfos in #isOptimized() -- Key: LUCENE-2802 URL: https://issues.apache.org/jira/browse/LUCENE-2802 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 4.0 Reporter: Simon Willnauer Attachments: LUCENE-2802.patch DirectoryReader only takes shared (with IW) SegmentInfos into account in DirectoryReader#isOptimized(). This can return true even if the actual realtime reader sees more than one segment. {code} public boolean isOptimized() { ensureOpen(); // if segmentsInfos changes in IW this can return a false positive return segmentInfos.size() == 1 && !hasDeletions(); } {code} DirectoryReader should check if this reader has a non-null segmentInfosStart and use that instead -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
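For concreteness, a hedged sketch of the fix the description asks for: prefer the SegmentInfos snapshot the NRT reader was opened against, falling back to the shared instance for non-NRT readers. The field name segmentInfosStart comes from the description; its type and the exact null check are assumptions, not the committed patch.
{code}
// Sketch only: use the reader's own infos snapshot if it has one.
public boolean isOptimized() {
  ensureOpen();
  SegmentInfos infos = (segmentInfosStart != null) ? segmentInfosStart : segmentInfos;
  return infos.size() == 1 && !hasDeletions();
}
{code}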
Re: FieldCache usage for custom field collapse in solr 1.4
On Mon, Dec 6, 2010 at 4:02 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote: Fair enough - I might give it a shot if most functionality is compatible to solr 1.4.1 to your mind? and is fairly stable? Yes, the external APIs are very compatible. The internal APIs - not so much. You should reindex also. And not be (too) surprised if things change before the official 4.x release -- the chances are good that something will change that may require reindexing. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception
[ https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968439#action_12968439 ] Grant Ingersoll commented on SOLR-2186: --- Lance, can you update this patch and add a unit test? DataImportHandler multi-threaded option throws exception Key: SOLR-2186 URL: https://issues.apache.org/jira/browse/SOLR-2186 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Reporter: Lance Norskog Assignee: Grant Ingersoll Attachments: TikaResolver.patch The multi-threaded option for the DataImportHandler throws an exception and the entire operation fails. This is true even if only 1 thread is configured via *threads='1'* -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968445#action_12968445 ] Yonik Seeley commented on SOLR-1979: bq. In skimming the current patch, it looks like fields get mapped no matter what. What if I just want the language detected and added as another field, but no field mapping desired? Yeah, that's sort of in line with my: bq. And just because you can detect a language doesn't mean you know how to handle it differently... so also have an optional catchall that handles all languages not specifically mapped. So for all unmapped languages, you may want to map to a single generic field, or not map at all (leave the field as is). I guess it also depends on the general strategy... if you are detecting language on the body field, are we using a copyField type approach and only storing the body field while indexing as body_enText, or are we moving the field from body to body_enText? bq. Also, if there are multiple input fields, the current patch would create multiple language field values, requiring that field to be multi-valued. Is the goal here to identify a single language for a document? I could see both making sense. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Grant Ingersoll Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch We need the ability to detect the language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this:
{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code}
It will then read the text from the inputFields name and subject, perform language identification and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968451#action_12968451 ] Yonik Seeley commented on LUCENE-2649: -- For the sort-missing-last type of functionality, the current comparator code looks like this (see IntComparator for more context): {code} final int v2 = (checkMissing && !cached.valid.get(doc)) ? missingValue : cached.values[doc]; {code} And I was thinking of changing it to this: {code} int v2 = cached.values[doc]; if (valid != null && v2==0 && !valid.get(doc)) v2 = missingValue; {code} This should make the common case faster by both eliminating an unneeded variable (checkMissing) and checking that the value is the Java default value before checking the bitset. Thoughts? FieldCache should include a BitSet for matching docs Key: LUCENE-2649 URL: https://issues.apache.org/jira/browse/LUCENE-2649 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 4.0 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch The FieldCache returns an array representing the values for each doc. However there is no way to know if the doc actually has a value. This should be changed to return an object representing the values *and* a BitSet for all valid docs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
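To see where those two lines live, here is a sketch of the assumed surrounding comparator method. The shape of compareBottom() is inferred from the "see IntComparator" pointer; only the missing-value line comes from the comment above, the rest is an assumption.
{code}
// Sketch of the assumed context, not the actual IntComparator source.
public int compareBottom(int doc) {
  int v2 = cached.values[doc];
  // Only consult the (possibly large) bitset when the slot holds the Java
  // default 0, the only value a doc with no stored value can have.
  if (valid != null && v2 == 0 && !valid.get(doc)) v2 = missingValue;
  return bottom < v2 ? -1 : (bottom > v2 ? 1 : 0);
}
{code}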
Is it possible to set the merge policy setMaxMergeMB from Solr
Lucene has this method to set the maximum size of a segment when merging: LogByteSizeMergePolicy.setMaxMergeMB (http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html#setMaxMergeMB%28double%29 ) I would like to be able to set this in my solrconfig.xml. Is this possible? If not should I open a JIRA issue or is there some gotcha I am unaware of? Tom Tom Burton-West
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968458#action_12968458 ] Ryan McKinley commented on LUCENE-2649: --- looks good to me bq. we instantiate vals.values lazily for some reason... and then at the end, if it still hasn't been instantiated, we do it anyway? I don't know about this, I just copied from the existing code... We could make the case where the valid bits are Bits.MatchNoBits(maxDoc) have a null array. This would make your proposed change invalid though, since it checks the array first. bq. I'm still trying to grok the logic of calling checkMatchAllBits only if vals.valid == null... seems like it will always return null in that case? The assumption is that once vals.valid is set, it should not be recalculated. The reasons for the if vals.valid == null check in the validate function are: - the vals.valid Bits may have been set in fillXXXValues - the first call may have excluded checkMatchAllBits, and a subsequent call has it set Are you asking about the validate function? If so, fillXXXValues can set the vals.valid, so it does not do it again. FieldCache should include a BitSet for matching docs Key: LUCENE-2649 URL: https://issues.apache.org/jira/browse/LUCENE-2649 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 4.0 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch The FieldCache returns an array representing the values for each doc. However there is no way to know if the doc actually has a value. This should be changed to return an object representing the values *and* a BitSet for all valid docs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: FieldCache usage for custom field collapse in solr 1.4
So, summing up all the information I now have, and the fact I have some additional custom components that use fieldcache, such that the specific answer for field collapsing by migrating to solr 4.0 is not a complete solution to my problems, it seems to me more and more like I might have to actually implement a custom solr QueryComponent, whereby I will pass it multiple collectors (perhaps via some kind of MultiCollector interface, similar to what Grouping uses) which will do their appropriate field value collection/aggregation as results are being fetched. In other words, using a per-segment fieldcache collection as a post-processing step (e.g. after QueryComponent did its collection) does not seem at all trivial, if at all possible (is it possible?) Is this accurate? Thanks again for all the info here.. Adam On Mon, Dec 6, 2010 at 1:48 PM, Ryan McKinley ryan...@gmail.com wrote: On Mon, Dec 6, 2010 at 4:02 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote: Fair enough - I might give it a shot if most functionality is compatible to solr 1.4.1 to your mind? and is fairly stable? Yes, the external APIs are very compatible. The internal APIs - not so much. You should reindex also. And not be (too) surprised if things change before the official 4.x release -- the chances are good that something will change that may require reindexing. ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: FieldCache usage for custom field collapse in solr 1.4
On Mon, Dec 6, 2010 at 5:48 PM, Adam H. jimmoe...@gmail.com wrote: In other words, using a per-segment fieldcache collection as a post-processing step (e.g. after QueryComponent did its collection) does not seem at all trivial, if at all possible (is it possible?) Sure, it's possible, and not too hard (as long as no sort field involves score). Just instruct the QueryComponent to retrieve the set of all matching documents, then you can use that to run them through whatever collectors you want again. I've been meaning to implement this optimization to field collapsing... Depending on the details, either replacing the QueryComponent with your custom one, or inserting an additional component after the query component could make sense. -Yonik http://www.lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
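For reference, one way to "instruct the QueryComponent to retrieve the set of all matching documents" from a later component; a sketch against the Solr 1.4-era SearchComponent API, on the assumption that the standard ResponseBuilder plumbing applies unchanged here:

// In prepare(): ask QueryComponent to also compute the full DocSet.
public void prepare(ResponseBuilder rb) throws IOException {
  rb.setNeedDocSet(true);
}

// In process(): every matching doc is now available, not just the page.
public void process(ResponseBuilder rb) throws IOException {
  DocSet matches = rb.getResults().docSet;
  DocIterator it = matches.iterator();
  while (it.hasNext()) {
    int docId = it.nextDoc();
    // run docId through your own collectors / aggregation here
  }
}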
[jira] Updated: (LUCENE-2803) FieldCache should not pay attention to deleted docs when creating entries
[ https://issues.apache.org/jira/browse/LUCENE-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-2803: - Attachment: LUCENE-2803.patch Here's the patch... pretty simple, so I plan on committing shortly. FieldCache should not pay attention to deleted docs when creating entries - Key: LUCENE-2803 URL: https://issues.apache.org/jira/browse/LUCENE-2803 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Attachments: LUCENE-2803.patch The FieldCache uses a key that ignores deleted docs, so it's actually a bug to use deleted docs when creating an entry. It can lead to incorrect values when the same entry is used with a different reader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: FieldCache usage for custom field collapse in solr 1.4
ah! so just so I can get cracking on this - Can you be a little more specific? e.g. in my component implementation that runs in the request handling after the normal QueryComponent, how would I access the specific field value for the documents that were retrieved? i.e. how would it fit in code like this, if at all:

// docList is the matching documents for given offset/rows/query
DocIterator it = docList.iterator();
while (it.hasNext()) {
  int docId = it.nextDoc();
  float score = it.score();
  // this would've worked if this was a stored field:
  // reader.document(docId).get(fieldName) ??
}

On Mon, Dec 6, 2010 at 2:57 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Dec 6, 2010 at 5:48 PM, Adam H. jimmoe...@gmail.com wrote: In other words, using a per-segment fieldcache collection as a post-processing step (e.g. after QueryComponent did its collection) does not seem at all trivial, if at all possible (is it possible?) Sure, it's possible, and not too hard (as long as no sort field involves score). Just instruct the QueryComponent to retrieve the set of all matching documents, then you can use that to run them through whatever collectors you want again. I've been meaning to implement this optimization to field collapsing... Depending on the details, either replacing the QueryComponent with your custom one, or inserting an additional component after the query component could make sense. -Yonik http://www.lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
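A sketch of the FieldCache route for the loop above, assuming the Lucene 3.x-era getStrings API and a top-level (not per-segment) reader; rb is assumed to be the ResponseBuilder handed to your component, and fieldName is whatever field you are collapsing on:

// Sketch: FieldCache.DEFAULT.getStrings returns a String[] indexed by docId
// (null entries for docs without a value), cached per reader.
IndexReader reader = rb.req.getSearcher().getIndexReader();
String[] values = FieldCache.DEFAULT.getStrings(reader, fieldName);

DocIterator it = docList.iterator();
while (it.hasNext()) {
  int docId = it.nextDoc();
  String value = values[docId]; // works for indexed-but-unstored fields too
  // aggregate by value here
}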
[jira] Commented: (LUCENE-2803) FieldCache should not pay attention to deleted docs when creating entries
[ https://issues.apache.org/jira/browse/LUCENE-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968488#action_12968488 ] Ryan McKinley commented on LUCENE-2803: --- if checkMatchAllBits always has a null first parameter, should we just take it out? FieldCache should not pay attention to deleted docs when creating entries - Key: LUCENE-2803 URL: https://issues.apache.org/jira/browse/LUCENE-2803 Project: Lucene - Java Issue Type: Bug Reporter: Yonik Seeley Attachments: LUCENE-2803.patch The FieldCache uses a key that ignores deleted docs, so it's actually a bug to use deleted docs when creating an entry. It can lead to incorrect values when the same entry is used with a different reader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Assigned: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()
[ https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-2802: --- Assignee: Simon Willnauer DirectoryReader ignores NRT SegmentInfos in #isOptimized() -- Key: LUCENE-2802 URL: https://issues.apache.org/jira/browse/LUCENE-2802 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Attachments: LUCENE-2802.patch DirectoryReader only takes shared (with IW) SegmentInfos into account in DirectoryReader#isOptimized(). This can return true even if the actual realtime reader sees more than one segment. {code} public boolean isOptimized() { ensureOpen(); // if segmentsInfos changes in IW this can return a false positive return segmentInfos.size() == 1 && !hasDeletions(); } {code} DirectoryReader should check if this reader has a non-null segmentInfosStart and use that instead -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()
[ https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2802: Attachment: LUCENE-2802.patch here is a patch that removes the mutable state from DirectoryReader in the NRT case. The actual reason IMO why this was introduced is that the NRT reader returns true from #isCurrent() if the writer was closed, which is actually wrong since closing a writer changes the index and the reader should see that change. I also added a test case for #isCurrent() to check the semantics. DirectoryReader ignores NRT SegmentInfos in #isOptimized() -- Key: LUCENE-2802 URL: https://issues.apache.org/jira/browse/LUCENE-2802 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Attachments: LUCENE-2802.patch, LUCENE-2802.patch DirectoryReader only takes shared (with IW) SegmentInfos into account in DirectoryReader#isOptimized(). This can return true even if the actual realtime reader sees more than one segment. {code} public boolean isOptimized() { ensureOpen(); // if segmentsInfos changes in IW this can return a false positive return segmentInfos.size() == 1 && !hasDeletions(); } {code} DirectoryReader should check if this reader has a non-null segmentInfosStart and use that instead -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Changes Mess
This is out of thread, but I realized that some entries for DIH are in solr/CHANGES.txt. These should go to solr/contrib/dataimporthandler/CHANGES.txt (Some of them are my fault). I also found that solr/contrib/*/CHANGES.txt have a 1.5-dev title. These should be 4.0-dev or 3.1-dev. I'll open a ticket. Koji -- http://www.rondhuit.com/en/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()
[ https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2802: Affects Version/s: 3.1 we need to backport to 3.x too DirectoryReader ignores NRT SegmentInfos in #isOptimized() -- Key: LUCENE-2802 URL: https://issues.apache.org/jira/browse/LUCENE-2802 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 3.1, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Attachments: LUCENE-2802.patch, LUCENE-2802.patch DirectoryReader only takes shared (with IW) SegmentInfos into account in DirectoryReader#isOptimized(). This can return true even if the actual realtime reader sees more than one segment. {code} public boolean isOptimized() { ensureOpen(); // if segmentsInfos changes in IW this can return a false positive return segmentInfos.size() == 1 && !hasDeletions(); } {code} DirectoryReader should check if this reader has a non-null segmentInfosStart and use that instead -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()
[ https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968503#action_12968503 ] Earwin Burrfoot commented on LUCENE-2802: - Patch looks cool. DirectoryReader ignores NRT SegmentInfos in #isOptimized() -- Key: LUCENE-2802 URL: https://issues.apache.org/jira/browse/LUCENE-2802 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 3.1, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Attachments: LUCENE-2802.patch, LUCENE-2802.patch DirectoryReader only takes shared (with IW) SegmentInfos into account in DirectoryReader#isOptimized(). This can return true even if the actual realtime reader sees more than one segment. {code} public boolean isOptimized() { ensureOpen(); // if segmentsInfos changes in IW this can return a false positive return segmentInfos.size() == 1 && !hasDeletions(); } {code} DirectoryReader should check if this reader has a non-null segmentInfosStart and use that instead -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: FieldCache usage for custom field collapse in solr 1.4
One more comment/question - Having looked at the Solr stats panel, I do not see detailed memory usage for the field I'm collapsing on in the lucene FieldCache entries listings. As I understand (after having looked through this ticket: https://issues.apache.org/jira/browse/SOLR-1292 ), this means that it's not an 'insanity' instance, and so actually I am not using double the memory, but rather only have this field in the FieldCache on the whole-index level. This got me thinking - If I'm not using any segment-level fieldcaching for this field, there's no reason not to use an index-wide one, as long as I can guarantee that's the only use case for this field in the fieldcache... is this correct? Thanks again for helping me out with this delicate subject :) Adam On Mon, Dec 6, 2010 at 3:21 PM, Adam H. jimmoe...@gmail.com wrote: ah! so just so I can get cracking on this - Can you be a little more specific? e.g. in my component implementation that runs in the request handling after the normal QueryComponent, how would I access the specific field value for the documents that were retrieved? i.e. how would it fit in code like this, if at all: // docList is the matching documents for given offset/rows/query DocIterator it = docList.iterator(); while (it.hasNext()) { int docId = it.nextDoc(); float score = it.score(); // this would've worked if this was a stored field: // reader.document(docId).get(fieldName) ?? } On Mon, Dec 6, 2010 at 2:57 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Dec 6, 2010 at 5:48 PM, Adam H. jimmoe...@gmail.com wrote: In other words, using a per-segment fieldcache collection as a post-processing step (e.g. after QueryComponent did its collection) does not seem at all trivial, if at all possible (is it possible?) Sure, it's possible, and not too hard (as long as no sort field involves score). Just instruct the QueryComponent to retrieve the set of all matching documents, then you can use that to run them through whatever collectors you want again. I've been meaning to implement this optimization to field collapsing... Depending on the details, either replacing the QueryComponent with your custom one, or inserting an additional component after the query component could make sense. -Yonik http://www.lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2805) SegmentInfos shouldn't blindly increment version on commit
SegmentInfos shouldn't blindly increment version on commit -- Key: LUCENE-2805 URL: https://issues.apache.org/jira/browse/LUCENE-2805 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Fix For: 3.1, 4.0 SegmentInfos currently increments version on the assumption that there are always changes. But, both DirReader and IW are more careful about tracking whether there are changes. DirReader has hasChanges and IW has changeCount. I think these classes should notify the SIS when there are in fact changes; this will fix the case Simon hit on fixing LUCENE-2082 when the NRT reader thought there were changes, but in fact there weren't because IW simply committed the exact SIS it already had. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (SOLR-2269) contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt
contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt Key: SOLR-2269 URL: https://issues.apache.org/jira/browse/SOLR-2269 Project: Solr Issue Type: Task Components: contrib - Clustering, contrib - DataImportHandler, contrib - Solr Cell (Tika extraction) Affects Versions: 3.1, 4.0 Reporter: Koji Sekiguchi Priority: Minor Fix For: 3.1, 4.0 http://www.lucidimagination.com/search/document/b8c19488a691265c/changes_mess {quote} I realized that some entries for DIH are in solr/CHANGES.txt. These should go solr/contrib/dataimporthandler/CHANGES.txt (Some of them are my fault). I also found that solr/contrib/*/CHANGES.txt have 1.5-dev title. These should be 4.0-dev or 3.1-dev. {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2805) SegmentInfos shouldn't blindly increment version on commit
[ https://issues.apache.org/jira/browse/LUCENE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968521#action_12968521 ] Michael McCandless commented on LUCENE-2805: Duh, make that LUCENE-2802. SegmentInfos shouldn't blindly increment version on commit -- Key: LUCENE-2805 URL: https://issues.apache.org/jira/browse/LUCENE-2805 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2805.patch SegmentInfos currently increments version on the assumption that there are always changes. But, both DirReader and IW are more careful about tracking whether there are changes. DirReader has hasChanges and IW has changeCount. I think these classes should notify the SIS when there are in fact changes; this will fix the case Simon hit on fixing LUCENE-2082 when the NRT reader thought there were changes, but in fact there weren't because IW simply committed the exact SIS it already had. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2805) SegmentInfos shouldn't blindly increment version on commit
[ https://issues.apache.org/jira/browse/LUCENE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2805: --- Attachment: LUCENE-2805.patch Attached first cut patch, just moving the .version++ responsibility into DirReader/IW. But I haven't verified if it fixes the case in LUCENE-2802. SegmentInfos shouldn't blindly increment version on commit -- Key: LUCENE-2805 URL: https://issues.apache.org/jira/browse/LUCENE-2805 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2805.patch SegmentInfos currently increments version on the assumption that there are always changes. But, both DirReader and IW are more careful about tracking whether there are changes. DirReader has hasChanges and IW has changeCount. I think these classes should notify the SIS when there are in fact changes; this will fix the case Simon hit on fixing LUCENE-2082 when the NRT reader thought there were changes, but in fact there weren't because IW simply committed the exact SIS it already had. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968528#action_12968528 ] Grant Ingersoll commented on SOLR-1979: --- bq. So for all unmapped languages, you may want to map to a single generic field, or not map at all (leave field as is). It currently leaves it in the original field. bq. Also, if there are multiple input fields, the current patch would create multiple language field values requiring that field to be multi-valued. Is the goal here to identify a single language for a document? Or a separate language value for each of the input fields (which seems odd to me)? The current patch requires a multivalued language field. I figure the main thing you want the language field for is faceting and filtering, but it can be changed. As for the broader goal, I think it makes sense to detect languages per field and not per document. In other words, you can have multiple languages in a single document. Create LanguageIdentifierUpdateProcessor Key: SOLR-1979 URL: https://issues.apache.org/jira/browse/SOLR-1979 Project: Solr Issue Type: New Feature Components: update Reporter: Jan Høydahl Assignee: Grant Ingersoll Priority: Minor Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch We need the ability to detect the language of some random text in order to act upon it, such as indexing the content into language-aware fields. Another use case is to be able to filter/facet on language on random unstructured content. To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The processor is configurable like this:
{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">name,subject</str>
  <str name="outputField">language_s</str>
  <str name="idField">id</str>
  <str name="fallback">en</str>
</processor>
{code}
It will then read the text from the inputFields name and subject, perform language identification and output the ISO code for the detected language in the outputField. If no language was detected, the fallback language is used. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-1721) Add explicit option to run DataImportHandler in synchronous mode
[ https://issues.apache.org/jira/browse/SOLR-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1721: - Affects Version/s: 1.3 1.4 Fix Version/s: 4.0 3.1 Add explicit option to run DataImportHandler in synchronous mode Key: SOLR-1721 URL: https://issues.apache.org/jira/browse/SOLR-1721 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Affects Versions: 1.3, 1.4 Reporter: Alexey Serba Assignee: Noble Paul Priority: Trivial Fix For: 3.1, 4.0 Attachments: SOLR-1721.patch There's no explicit option to run DataImportHandler in a synchronous mode / blocking call. It could be useful to run DIH from SolrJ ( EmbeddedSolrServer ) in the same thread. Currently one can pass dummy stream (or enable debug mode) as a workaround to achieve the same behavior, but I think it makes sense to add specific option for that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2268) Add support for Point in Polygon searches
[ https://issues.apache.org/jira/browse/SOLR-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968545#action_12968545 ] Grant Ingersoll commented on SOLR-2268: --- This is a work in progress. Here are a few ideas: I think this can all be accomplished via a few things: For the case where the field is a polygon and the user supplies a point, we need a new FieldType, PolygonType. I would propose the following format: vertices are separated by semi-colons and the coordinates of a point are separated by commas, just as they are for the other capabilities, i.e.: 1.0,1.0;0.0,0.0;3.0,3.0 gives the vertices (1.0,1.0), (0.0,0.0), (3.0,3.0). Lines are assumed between each point. See the java.awt.Polygon class. Next, I think we can cover everything else through some function queries: For case one above {code} pip(pt, dimension, boost) -- pt can be a PointType or a Vector. Boost says how much score to give if a point is in a polygon pipll(latlonPt, boost) -- Use spherical calculations to determine if the lat/lon point is in the polygon, as it is laid on a sphere //Note, we may just fold this into the one above, but I think the calculations could be different enough that we would want to avoid instanceof checks. Plus the parsing is simpler {code} For case two above, the user would pass in a polygon as defined above for the PolygonType. In this case, we still need a function query: {code} pip(poly, boost) -- poly is the passed in polygon, boost is the value to give if the point is in a polygon {code} For PointType, we can just use the capabilities of java.awt.Polygon; for lat/lon, I'm still investigating. It could be we still use Polygon, but maybe we can just scale it a little bit bigger and live with some error. Otherwise, there seem to be some decent algorithms for doing it w/ lat/lon (http://msdn.microsoft.com/en-us/library/cc451895.aspx for one). Not sure that one is practical at scale, but it could be a start. While we are at it, it shouldn't be that hard to do the same for lines, i.e. is the point on a line. Add support for Point in Polygon searches - Key: SOLR-2268 URL: https://issues.apache.org/jira/browse/SOLR-2268 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll In spatial applications, it is common to ask whether a point is inside of a polygon. Solr could support two forms of this: # A field contains a polygon and the user supplies a point. If the polygon contains the point, the doc is returned. # A document contains a point and the user supplies a polygon. If the point is in the polygon, return the document. With both of these cases, it would be good to support the negative assertion, too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
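Since the comment leans on java.awt.Polygon, here is a minimal sketch of parsing the proposed vertex format and testing point-in-polygon with it. Polygon stores int coordinates, so the sketch scales doubles by 10^6 as a stand-in for the "scale it a little bit bigger" idea; the scale factor, class, and method names are assumptions, not part of any patch.
{code}
import java.awt.Polygon;

public class PolygonSketch {
  static final int SCALE = 1000000; // assumed precision, roughly 0.1m of longitude at the equator

  // Parse "1.0,1.0;0.0,0.0;3.0,3.0" into an int-scaled awt Polygon.
  static Polygon parse(String spec) {
    Polygon poly = new Polygon();
    for (String vertex : spec.split(";")) {
      String[] xy = vertex.split(",");
      poly.addPoint((int) (Double.parseDouble(xy[0]) * SCALE),
                    (int) (Double.parseDouble(xy[1]) * SCALE));
    }
    return poly;
  }

  // Planar containment test; no spherical correction (that is the pipll case).
  static boolean contains(Polygon poly, double x, double y) {
    return poly.contains((int) (x * SCALE), (int) (y * SCALE));
  }
}
{code}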
[jira] Commented: (SOLR-2268) Add support for Point in Polygon searches
[ https://issues.apache.org/jira/browse/SOLR-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968553#action_12968553 ] Lance Norskog commented on SOLR-2268: - 2 tricks for speeding up "document holds polygons" searches, using vertex-based hashing of lat/long values. (It's a variation on a kind of bitwise filtering whose name I cannot remember: if the bit is off, there is no match, but if the bit is on there may be a match.) Master data: A field with one or more polygon descriptions. Bitwise data: Two bit fields, latitude and longitude, with a string of bits for each vertex. For example, given a Level Of Detail (LOD) of 1 degree, there would be 360 bits in either bitfield. The document would have one of each bitfield. Each degree's bit is true if any polygon has area within that bit's degree. The first phase of searching for point in all polygons is to check the latitude and longitude bitfields for that point. Add support for Point in Polygon searches - Key: SOLR-2268 URL: https://issues.apache.org/jira/browse/SOLR-2268 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll In spatial applications, it is common to ask whether a point is inside of a polygon. Solr could support two forms of this: # A field contains a polygon and the user supplies a point. If the polygon contains the point, the doc is returned. # A document contains a point and the user supplies a polygon. If the point is in the polygon, return the document. With both of these cases, it would be good to support the negative assertion, too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
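The degree-bucket pre-filter above is easy to picture in code; a hedged sketch, assuming one bit per degree and vertex-based bit setting (the comment says "has area within that bit's degree", so marking only vertices, as below, is an approximation):
{code}
import java.util.BitSet;

// One bit per degree: a clear bit proves "point cannot be in any polygon";
// a set bit only means "maybe", so an exact test still runs afterwards.
class DegreePreFilter {
  final BitSet latBits = new BitSet(180); // degrees -90..89 -> bits 0..179
  final BitSet lonBits = new BitSet(360); // degrees -180..179 -> bits 0..359

  void markVertex(double lat, double lon) {
    latBits.set((int) Math.floor(lat) + 90);
    lonBits.set((int) Math.floor(lon) + 180);
  }

  boolean maybeContains(double lat, double lon) {
    return latBits.get((int) Math.floor(lat) + 90)
        && lonBits.get((int) Math.floor(lon) + 180);
  }
}
{code}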
Lucene-trunk - Build # 1385 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1385/ All tests passed Build Log (for compile errors): [...truncated 18318 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Is it possible to set the merge policy setMaxMergeMB from Solr
I have not tried this, but some solrconfig elements support setters for sub-elements. So, this might work but probably won't.
<mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy
  <maxMergeMB>1024</maxMergeMB>
</mergePolicy>
On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom tburt...@umich.edu wrote: Lucene has this method to set the maximum size of a segment when merging: LogByteSizeMergePolicy.setMaxMergeMB (http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html#setMaxMergeMB%28double%29 ) I would like to be able to set this in my solrconfig.xml. Is this possible? If not should I open a JIRA issue or is there some gotcha I am unaware of? Tom Tom Burton-West -- Lance Norskog goks...@gmail.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (SOLR-2268) Add support for Point in Polygon searches
[ https://issues.apache.org/jira/browse/SOLR-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968553#action_12968553 ] Lance Norskog edited comment on SOLR-2268 at 12/6/10 10:08 PM: --- 1 trick for speeding up "document holds polygons" searches, using vertex-based hashing of lat/long values. (It's a variation on a kind of bitwise filtering whose name I cannot remember: if the bit is off, there is no match, but if the bit is on there may be a match.) Master data: A field with one or more polygon descriptions. Bitwise data: Two bit fields, latitude and longitude, with a string of bits for each vertex. For example, given a Level Of Detail (LOD) of 1 degree, there would be 360 bits in either bitfield. The document would have one of each bitfield. Each degree's bit is true if any polygon has area within that bit's degree. The first phase of searching for point in all polygons is to check the latitude and longitude bitfields for that point. was (Author: lancenorskog): 2 tricks for speeding up "document holds polygons" searches, using vertex-based hashing of lat/long values. (It's a variation on a kind of bitwise filtering whose name I cannot remember: if the bit is off, there is no match, but if the bit is on there may be a match.) Master data: A field with one or more polygon descriptions. Bitwise data: Two bit fields, latitude and longitude, with a string of bits for each vertex. For example, given a Level Of Detail (LOD) of 1 degree, there would be 360 bits in either bitfield. The document would have one of each bitfield. Each degree's bit is true if any polygon has area within that bit's degree. The first phase of searching for point in all polygons is to check the latitude and longitude bitfields for that point. Add support for Point in Polygon searches - Key: SOLR-2268 URL: https://issues.apache.org/jira/browse/SOLR-2268 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll In spatial applications, it is common to ask whether a point is inside of a polygon. Solr could support two forms of this: # A field contains a polygon and the user supplies a point. If the polygon contains the point, the doc is returned. # A document contains a point and the user supplies a polygon. If the point is in the polygon, return the document. With both of these cases, it would be good to support the negative assertion, too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2268) Add support for Point in Polygon searches
[ https://issues.apache.org/jira/browse/SOLR-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968556#action_12968556 ] Lance Norskog commented on SOLR-2268: - A second variation: a multiValued field of vertex pairs which describe a polygon. The incoming point is searched against the vertex points. This is faster than the bitwise filter, but uses more space for larger polygons. The bitwise filter uses constant memory for each document. Add support for Point in Polygon searches - Key: SOLR-2268 URL: https://issues.apache.org/jira/browse/SOLR-2268 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll In spatial applications, it is common to ask whether a point is inside of a polygon. Solr could support two forms of this: # A field contains a polygon and the user supplies a point. If the polygon contains the point, the doc is returned. # A document contains a point and the user supplies a polygon. If the point is in the polygon, return the document. With both of these cases, it would be good to support the negative assertion, too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Lucene-Solr-tests-only-trunk - Build # 2255 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2255/ 15 tests failed. REGRESSION: org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration Error Message: null Stack Trace: org.apache.solr.common.cloud.ZooKeeperException: at org.apache.solr.core.CoreContainer.load(CoreContainer.java:441) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243) at org.apache.solr.cloud.CloudStateUpdateTest.setUp(CloudStateUpdateTest.java:131) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917) Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243) at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:199) at org.apache.solr.common.cloud.ZkStateReader.makeShardZkNodeWatches(ZkStateReader.java:184) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:430) FAILED: junit.framework.TestSuite.org.apache.solr.cloud.CloudStateUpdateTest Error Message: ERROR: SolrIndexSearcher opens=24 closes=23 Stack Trace: junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher opens=24 closes=23 at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:128) at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:302) at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:79) REGRESSION: org.apache.solr.handler.TestReplicationHandler.testReplicateAfterWrite2Slave Error Message: Error executing query Stack Trace: org.apache.solr.client.solrj.SolrServerException: Error executing query at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119) at org.apache.solr.handler.TestReplicationHandler.query(TestReplicationHandler.java:142) at org.apache.solr.handler.TestReplicationHandler.clearIndexWithReplication(TestReplicationHandler.java:85) at org.apache.solr.handler.TestReplicationHandler.testReplicateAfterWrite2Slave(TestReplicationHandler.java:165) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917) request: http://localhost:31325/solr/select?q=*:*wt=javabinversion=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) REGRESSION: org.apache.solr.handler.TestReplicationHandler.testIndexAndConfigReplication Error Message: Error executing query Stack Trace: org.apache.solr.client.solrj.SolrServerException: Error executing query at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119) at org.apache.solr.handler.TestReplicationHandler.query(TestReplicationHandler.java:142) at 
org.apache.solr.handler.TestReplicationHandler.clearIndexWithReplication(TestReplicationHandler.java:85) at org.apache.solr.handler.TestReplicationHandler.testIndexAndConfigReplication(TestReplicationHandler.java:230) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917) request: http://localhost:31325/solr/select?q=*:*wt=javabinversion=2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) REGRESSION: org.apache.solr.handler.TestReplicationHandler.testStopPoll Error Message: Error executing query Stack Trace: org.apache.solr.client.solrj.SolrServerException: Error executing query at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119) at
[jira] Updated: (LUCENE-2805) SegmentInfos shouldn't blindly increment version on commit
[ https://issues.apache.org/jira/browse/LUCENE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2805: Attachment: LUCENE-2805.patch here is a slightly updated patch that removes the blind increment from DefaultSegmentInfosWriter, adds #changed() calls to contrib classes, and adds a missing #changed() call to IW#deleteAll(). Tests pass also for LUCENE-2802. SegmentInfos shouldn't blindly increment version on commit -- Key: LUCENE-2805 URL: https://issues.apache.org/jira/browse/LUCENE-2805 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2805.patch, LUCENE-2805.patch SegmentInfos currently increments version on the assumption that there are always changes. But, both DirReader and IW are more careful about tracking whether there are changes. DirReader has hasChanges and IW has changeCount. I think these classes should notify the SIS when there are in fact changes; this will fix the case Simon hit on fixing LUCENE-2082 when the NRT reader thought there were changes, but in fact there weren't because IW simply committed the exact SIS it already had. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
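To make the #changed() pattern concrete, a small sketch; SegmentInfos#changed() and the IW#deleteAll() call site are named in the comment above, but the method bodies shown are assumptions, not the patch:
{code}
// In SegmentInfos (sketch): version is bumped only on explicit notification,
// rather than blindly on every commit.
public void changed() {
  version++;
}

// In IndexWriter#deleteAll() (sketch): notify the SIS that the index changed.
public synchronized void deleteAll() throws IOException {
  // ... drop segments, pending deletes, and buffered docs ...
  segmentInfos.changed();
}
{code}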
Re: Changes Mess
Jumping in late to this thread, though I've read most of it. As a user and committer, I find the current CHANGES very convenient! It's very easy for me to read what has changed in 3.0, and very easy for me to put a CHANGES entry whenever I work on something that warrants such an entry. And if an issue is backported all the way to 1.4, then IMO it should contain an entry in each CHANGES (of each release). Users who download 2.9.4 need to be able to read what has changed since 2.9.3, in a clear and concise way. Which, as far as I'm concerned, is the current situation, and I'm happy with it. Backporting issues is usually a simple svn merge, and in more complex cases, even if it's done manually, the CHANGES entry is the easiest to copy over. I don't think we should work hard to make JIRA produce the CHANGES for us - at the end of the day, JIRA is our bug tracking system, and it should remain like that. The CHANGES entry needs to summarize the change to the reader, and combined with the issue number it gives enough info. If one wants, one can load the issue in JIRA and read the full correspondence. So I'm +1 for keeping things as they are, and paying attention to put the entries in all applicable CHANGES. Shai On Monday, December 6, 2010, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Robert, I feel ya. +1 to releasing more often! :) Cheers, Chris On Dec 6, 2010, at 8:31 AM, Robert Muir wrote: On Mon, Dec 6, 2010 at 11:20 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Yeah in the end all I can say is that you basically get out of JIRA what you put into it. What you call extra work is just something that I would do anyways working on some of my projects. I'm not saying it's *not difficult* and super easy, but we've just decided in those cases to invest time into the issue management system so that we can get the reports we want out of it. I've seen this work both ways: in the early days of Nutch there were intense debates on simply moving everything to JIRA versus maintaining a disconnected CHANGES.txt file. I've heard all the arguments (many times over) on both sides including tidbits like "oh I don't want to go to a separate URL as a consumer of software just to see what changed in it" to "what's so hard about doing a curl or wget on an Internet-connected system which most of our software assumes nowadays?", those types of things. When the dust cleared, I think I like the approach we use in Tika (and that I use in a number of projects at JPL) which is just to maintain the information in JIRA. It's worked for us since it's a single source to curate that type of information; it produces very useable reports (not perfect, but useable) that are good enough for us in terms of trading between the different properties we want to maximize (user contribution acknowledgement, change history, etc.) I agree with what you said, and as I mentioned before I'm not opposed to the idea at all. But if we are going to rely on JIRA more to produce this documentation, we need to make some major changes to how we use it, to avoid some of the problems I mentioned... The scariest part to me about this approach is that we unfortunately have very long release cycles. So I'm worried about this documentation being generated and fixed at release time versus incrementally where it's fresh in our mind... that's a lot of editing and filtering to do. 
Obviously I feel this would be mitigated and other things much better if we released more often, but that's a harder problem; this is just the situation as it is now. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org