[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website
[ https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995115#comment-12995115 ] Prescott Nasser commented on LUCENENET-379: --- I like it better than the current one. What are people's feelings on moving away from the Lucene logo? We aren't a different product like Solr, nor are we a loose port like Lucy. Clean up Lucene.Net website --- Key: LUCENENET-379 URL: https://issues.apache.org/jira/browse/LUCENENET-379 Project: Lucene.Net Issue Type: Task Reporter: George Aroush Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is still based on the incubation-era, out-of-date design. This JIRA task is to bring it up to date with other ASF projects' web pages. The existing website is here: https://svn.apache.org/repos/asf/lucene/lucene.net/site/ See http://www.apache.org/dev/project-site.html to get started. It would be best to start by cloning an existing ASF project's website and adapting it for Lucene.Net. Some examples: https://svn.apache.org/repos/asf/lucene/pylucene/site/ and https://svn.apache.org/repos/asf/lucene/java/site/ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website
[ https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995120#comment-12995120 ] Christopher Currens commented on LUCENENET-379: --- I would be happy to move away from the old logo. The project's goals have certainly changed from the previous project, and I think it deserves to look new. Not to mention that in the old Lucene.Net logo, the .NET part looks funky.
Re: Problem loading jcc from java : undefined symbol: PyExc_IOError
On Tue, Feb 15, 2011 at 4:22 AM, Andi Vajda va...@apache.org wrote:

> On Tue, 15 Feb 2011, Roman Chyla wrote:
>> from: http://realmike.org/blog/2010/07/18/python-extensions-in-cpp-using-swig/
>> Q. "Fatal Python error: Interpreter not initialized (version mismatch?)"
>> A. This error occurs when the version of the Python interpreter for which the extension module has been built is different from the version of the interpreter that attempts to import the module.
>>
>> Is there a way to find out which Python interpreter version is inside JCC? Also, is it somehow possible that the java process that loads the jcc library will be picking up the default python (2.4) instead of python (2.5)? PATH is set to python2.5.
>
> There is no Python interpreter inside jcc. It's dynamically linked. To know which version of the shared library is looked for and expected, use the 'ldd' utility against the various shared libraries involved to tell you. That version is selected at build time, when you run 'python setup.py ...'. That version of python determines the version of libpython.so used.
This will probably be the problem (as you said before) - the libjcc.so shows no python:

bash-3.2$ ldd build/lib.linux-x86_64-2.5/libjcc.so
        linux-vdso.so.1 => (0x7fff7affc000)
        /$LIB/snoopy.so => /lib64/snoopy.so (0x2b8ed0e74000)
        libjava.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/libjava.so (0x2b8ed1076000)
        libjvm.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so (0x2b8ed11a5000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b8ed1c3f000)
        libm.so.6 => /lib64/libm.so.6 (0x2b8ed1f3f000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b8ed21c2000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x2b8ed23cf000)
        libc.so.6 => /lib64/libc.so.6 (0x2b8ed25eb000)
        libdl.so.2 => /lib64/libdl.so.2 (0x2b8ed2943000)
        libverify.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/libverify.so (0x2b8ed2b47000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x2b8ed2c57000)
        /lib64/ld-linux-x86-64.so.2 (0x2b8ed08c9000)

And I think the python2.4 (the default on the system) is being loaded -- but how to force loading of python2.5 (if that is possible at all) I don't know. Compilation is definitely done with -lpython2.5.

Cheers,
roman

> Andi..
>
> Cheers, roman
>
> On Tue, Feb 15, 2011 at 2:40 AM, Roman Chyla roman.ch...@gmail.com wrote:
> On Tue, Feb 15, 2011 at 1:32 AM, Andi Vajda va...@apache.org wrote:
> On Tue, 15 Feb 2011, Roman Chyla wrote:
> The python embedded in Java works really well on MacOsX and also Ubuntu. But I am trying hard to make it work also on Scientific Linux (SLC5) with *statically* built Python. The python is a build from ActiveState.
> You mean you're going to try to dynamically load libpython.a into a JVM ? I have no idea if this can work at all.
I am very ignorant as far as the difference between statically and dynamically linked libraries goes - I just wanted to use JCC-wrapped code with this particular statically linked python. I got a little bit further, but just a little: after I changed -Xlinker --export-dynamic into -Xlinker -export-dynamic (and installed python into /opt...) I am getting a different error:

SEVERE: org.apache.jcc.PythonException: No module named solrpie.java_bridge
null
        at org.apache.jcc.PythonVM.instantiate(Native Method)
        at rca.python.jni.PythonVMBridge.start(Unknown Source)
        at rca.python.jni.PythonVMBridge.start(Unknown Source)
        at rca.python.jni.PythonVMBridge.start(Unknown Source)
        at rca.python.jni.SolrpieVM.getBridge(Unknown Source)

My understanding is that the previous error has gone (and the python module time is loaded), because if I set PYTHONPATH incorrectly, I get: This message is IMHO coming from Python. But when I correct the PYTHONPATH, I am getting only this:

[java] Fatal Python error: Interpreter not initialized (version mismatch?)
[java] Java Result: 134

> If my understanding of static builds is correct, I'd imagine the only way for this to work would be to statically compile the JVM (hotspot) and python together.

oooups, that is way over my head. But why all this? Because on the grid, we already had a statically linked python and it was working very well with pylucene (and after all, I managed to make it work also for solr and other packages). But if you think that it is not possible, I should do something else :) But it was fun trying; if you get some idea, please let me know.

Thank you,
Roman

> Andi..
>
> So far, I managed to build all the needed extensions (jcc, lucene, solr) and I can run them in python, but when I try to start the java app and use python, I get: SEVERE: org.apache.jcc.PythonException:
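The diagnostic used in this thread boils down to checking whether any libpython appears in a module's `ldd` dependencies: a statically linked Python build leaves no libpython line at all, which is exactly the symptom shown for libjcc.so. A minimal sketch of that check, parsing captured `ldd` output (the helper name and sample paths are illustrative, not part of JCC):

```python
def linked_libpython(ldd_output: str):
    """Return the libpython entry from captured `ldd` output, or None.

    A shared object built against a statically linked Python will
    typically show no libpython line at all - the symptom reported
    above for libjcc.so.
    """
    for line in ldd_output.splitlines():
        if "libpython" in line:
            return line.strip()
    return None


# Abridged, hypothetical output for the static case: no libpython line.
static_case = """\
libjava.so => /usr/java/jre/lib/amd64/libjava.so (0x2b8ed1076000)
libm.so.6 => /lib64/libm.so.6 (0x2b8ed1f3f000)
"""

# What a dynamically linked build would be expected to show instead.
dynamic_case = static_case + \
    "libpython2.5.so.1.0 => /opt/python/lib/libpython2.5.so.1.0 (0x2b8ed2000000)\n"

print(linked_libpython(static_case))   # None: no interpreter linked in
print(linked_libpython(dynamic_case))
```

In practice one would feed this the output of `ldd` run against the built extension, as done earlier in the thread.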
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994688#comment-12994688 ] tom liu commented on SOLR-1395: --- ISolrServer's config is set by the katta script. QueryCore's config will be set automatically. The sub-proxy solr is just a proxy, which does not process any request itself; the sub-proxy dispatches the request to the QueryCore, and the QueryCore processes the request and returns SolrDocLists. But you get an exception where the object type cannot be cast, so I think the QueryCore may be wrong. Integrate Katta --- Key: SOLR-1395 URL: https://issues.apache.org/jira/browse/SOLR-1395 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: Next Attachments: SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar Original Estimate: 336h Remaining Estimate: 336h We'll integrate Katta into Solr so that: * Distributed search uses Hadoop RPC * Shard/SolrCore distribution and management * Zookeeper based failover * Indexes may be built using Hadoop -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-1969) Make MMapDirectory configurable in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-1969. --- Resolution: Duplicate MMapDirectory support was added in SOLR-2187. Make MMapDirectory configurable in solrconfig.xml - Key: SOLR-1969 URL: https://issues.apache.org/jira/browse/SOLR-1969 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Stephen Bochinski Attachments: mmap_upd.patch, mmap_upd.patch, mmap_upd.txt, mmap_upd.txt Original Estimate: 102.5h Remaining Estimate: 102.5h This makes it possible to enable MMapDirectory from the solrconfig.xml file. There are also several configurations you can specify in the solrconfig.xml file. You can enable or disable the unmapping of files which have been closed by Solr; this is almost necessary for an index which is being optimized. You also have the option to not mmap certain files, in which case FSDirectory will be used to manage those particular files. This is particularly useful if you are using FieldCache (SOLR-1961): having that enabled makes it useless to memory-map the .fdt and .fdx files, considering they are already in memory. The configurations are specified as follows:

<directoryFactory class="solr.MMapDirectoryFactory">
  <str name="unmap">true</str>
  <lst name="filetypes">
    <bool name="fdt">false</bool>
    <bool name="fdx">false</bool>
  </lst>
</directoryFactory>

This would enable unmapping of closed files and would not memory-map files ending with .fdt and .fdx.
[jira] Commented: (LUCENE-1391) Token type and flags values get lost when using ShingleMatrixFilter
[ https://issues.apache.org/jira/browse/LUCENE-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994700#comment-12994700 ] Uwe Schindler commented on LUCENE-1391: --- As nobody seems to be interested in or understand this Filter and want to maintain it, I will deprecate it in the 3.x branch and remove it in trunk. It's only deprecated, so we can easily un-deprecate it after the release of 3.1 if somebody rewrites it to be more generic and to work with attributes. Token type and flags values get lost when using ShingleMatrixFilter --- Key: LUCENE-1391 URL: https://issues.apache.org/jira/browse/LUCENE-1391 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 2.4, 2.9, 3.0 Reporter: Wouter Heijke Assignee: Uwe Schindler Fix For: 3.1, 4.0 Attachments: LUCENE-1391.patch While using the new ShingleMatrixFilter I noticed that a token's type and flags get lost while using this filter. ShingleFilter does respect these values like the other filters I know.
[jira] Updated: (LUCENE-2920) Deprecate and remove ShingleMatrixFilter
[ https://issues.apache.org/jira/browse/LUCENE-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2920: -- Fix Version/s: 4.0 3.1 Deprecate and remove ShingleMatrixFilter Key: LUCENE-2920 URL: https://issues.apache.org/jira/browse/LUCENE-2920 Project: Lucene - Java Issue Type: Task Components: contrib/analyzers Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0 Spin-off from LUCENE-1391: This filter is unmaintained and no longer up-to-date, has bugs nobody understands, and does not work with attributes. This issue deprecates it as of Lucene 3.1 and removes it from trunk.
[jira] Created: (LUCENE-2920) Deprecate and remove ShingleMatrixFilter
Deprecate and remove ShingleMatrixFilter Key: LUCENE-2920 URL: https://issues.apache.org/jira/browse/LUCENE-2920 Project: Lucene - Java Issue Type: Task Reporter: Uwe Schindler Assignee: Uwe Schindler Spin-off from LUCENE-1391: This filter is unmaintained and no longer up-to-date, has bugs nobody understands, and does not work with attributes. This issue deprecates it as of Lucene 3.1 and removes it from trunk.
[jira] Updated: (LUCENE-2920) Deprecate and remove ShingleMatrixFilter
[ https://issues.apache.org/jira/browse/LUCENE-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2920: -- Component/s: contrib/analyzers Deprecate and remove ShingleMatrixFilter Key: LUCENE-2920 URL: https://issues.apache.org/jira/browse/LUCENE-2920 Project: Lucene - Java Issue Type: Task Components: contrib/analyzers Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0
[jira] Closed: (LUCENE-1391) Token type and flags values get lost when using ShingleMatrixFilter
[ https://issues.apache.org/jira/browse/LUCENE-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed LUCENE-1391. - Resolution: Won't Fix See LUCENE-2920. Token type and flags values get lost when using ShingleMatrixFilter --- Key: LUCENE-1391 URL: https://issues.apache.org/jira/browse/LUCENE-1391 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 2.4, 2.9, 3.0 Reporter: Wouter Heijke Assignee: Uwe Schindler Fix For: 3.1, 4.0 Attachments: LUCENE-1391.patch
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994716#comment-12994716 ] Peter Sturge commented on SOLR-1709: Hi David, Thank you thank you thank you for working on this and providing tests - your efforts are very much appreciated! For deprecation of facet.date, I suspect it probably shouldn't be deprecated until a fully-fledged replacement is ready, ported and committed, but if SOLR-1240 can functionally slot-in (including the 'NOW' stuff in SOLR-1729), that's great. Many thanks, Peter Distributed Date Faceting - Key: SOLR-1709 URL: https://issues.apache.org/jira/browse/SOLR-1709 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, SOLR-1709_distributed_date_faceting_v3x.patch, solr-1.4.0-solr-1709.patch This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time). The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in. 
There are several reasons for this:

* Performance: it's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards.
* If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data).

This could be dealt with if timezone and skew information was added, and the dates were normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, and so multiple shards' time data can be normalized.

The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments and suggestions welcome. As a favour to ask, if anyone could take my 2 source files and create a PATCH file from it, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company).
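The merge rule described above - the first shard's facet_dates define the buckets, and skewed buckets from later shards are dropped rather than merged in - can be sketched roughly as follows. This is an illustration of the behaviour, not the patch's actual Java code; the per-shard responses are simplified to plain date-to-count maps with the gap/start/end metadata omitted:

```python
def merge_facet_dates(shard_responses):
    """Merge per-shard facet_date counts.

    The first shard's buckets define the result set; buckets from later
    shards that fall outside it (e.g. skewed by one gap) are not merged
    in, matching the behaviour described above.
    """
    merged = {}
    for i, counts in enumerate(shard_responses):
        if i == 0:
            merged.update(counts)      # basis: first shard encountered
        else:
            for date, n in counts.items():
                if date in merged:     # skewed buckets are dropped
                    merged[date] += n
    return merged


shard1 = {"2011-02-15T10:00:00Z": 3, "2011-02-15T11:00:00Z": 1}
shard2 = {"2011-02-15T11:00:00Z": 4, "2011-02-15T12:00:00Z": 9}  # skewed
print(merge_facet_dates([shard1, shard2]))
```

The skewed 12:00 bucket from shard2 is silently lost, which is exactly the trade-off (performance and bounded time range vs. completeness) the reasons above describe.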
[jira] Resolved: (LUCENE-2920) Deprecate and remove ShingleMatrixFilter
[ https://issues.apache.org/jira/browse/LUCENE-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2920. --- Resolution: Fixed. Deprecated in 3.x, revision 1070818; removed from trunk, revision 1070821. Deprecate and remove ShingleMatrixFilter Key: LUCENE-2920 URL: https://issues.apache.org/jira/browse/LUCENE-2920 Project: Lucene - Java Issue Type: Task Components: contrib/analyzers Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0
inverted index pruning
hi all,

I recently read a paper, "Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee". Its idea is interesting and I have some questions I'd like to share with you.

The idea is pruning unlikely documents from certain terms' posting lists, e.g.

term1: d1 d3 d6 | d9 d7 d8
term2: d1 d6 d8 | d3 d4 d5

We have 2 terms here, term1 and term2, and we perform an AND query searching for documents that contain both terms. Suppose our score function considers 2 types of scores: a document's static score and a term-related document score. For simplicity, the static score is the page_rank of a document, the term-related score is tf*idf, and the final score is a linear combination such as page_rank + tf*idf.

The pruning criterion is: for each term, we only keep the documents whose page_rank is in the top N of that term's docList and whose tf*idf is also in the top N. In the above example, d1, d3 and d6's page_rank is in the top 3 of the documents containing term1, and d1's tf*idf is also in the top 3.

Now suppose we want the top 3 documents. We evaluate d1, d3, d6 and d8 because they are in the pruned docLists. d1 and d6's information is complete, so we can calculate their accurate scores; d3 and d8's information is incomplete, so we can only calculate upper bounds for them. We select the top 3 documents by score; suppose the result is d6, d1, d8, d3. Because d8 and d3 only have upper-bound scores, we cannot compare the real scores of d8 and d3, so the search fails and we have to search the un-pruned index for the answer. But if we want only the top 2 documents, we know d1 and d6 are the answer, because d3, d8 and the other documents' scores are less than theirs.

The experiments in this paper show that we can use the pruned index (30% the size of the full index) to answer 90%+ of queries. This result is exciting, but we have a problem using it in Lucene. For AND queries, Lucene uses skip lists to speed up evaluation, but this pruning algorithm cannot use skip lists on the pruned index. We can check this with the above example: if we use the skip list, we can skip d3, but d3's upper-bound score may be larger than d1's, and if we don't score d3, we cannot know it. So although we only need to score the docLists in the pruned index (30%), it may be slower than the AND query on the full index, where skip lists let us skip many docs.

Anyone have a good idea for this problem? If we can solve it, we can improve performance well.
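The correctness check described above can be sketched as: answer from the pruned index only when every document in the provisional top-k has an exact score, since an upper-bound document in the top-k might really score lower. Names and the tuple layout below are illustrative, not from the paper:

```python
def topk_from_pruned(candidates, k):
    """candidates: (doc_id, score, exact) triples from the pruned index.

    exact=False means `score` is only an upper bound (the doc's posting
    was pruned from at least one term's list). If any upper-bound doc
    lands in the provisional top-k, we cannot guarantee the ranking, so
    return None and fall back to the full (un-pruned) index.
    """
    ranked = sorted(candidates, key=lambda t: t[1], reverse=True)
    top = ranked[:k]
    if all(exact for _, _, exact in top):
        # Every other doc's true score <= its upper bound <= the k-th
        # exact score, so the pruned index answers this query correctly.
        return [doc for doc, _, _ in top]
    return None


# The example above: d1, d6 have complete info; d3, d8 only upper bounds.
cands = [("d6", 0.9, True), ("d1", 0.8, True),
         ("d8", 0.7, False), ("d3", 0.6, False)]
print(topk_from_pruned(cands, 3))  # None: must re-run on the full index
print(topk_from_pruned(cands, 2))  # ['d6', 'd1']
```

This mirrors the email's observation: top-3 fails because d8 made the cut with only an upper bound, while top-2 succeeds with d6 and d1.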
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994757#comment-12994757 ] JohnWu commented on SOLR-1395: -- tomliu: So the solrhome of ISolrServer needs to be configured in multi-core style? In solr.xml:

<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="queryCore" instanceDir="queryCore"/>
  </cores>
</solr>

But how do we set the handler for each role of a katta slave? Can you show the solr home folder hierarchy and the config content of a katta slave node?
[jira] Commented: (LUCENE-2894) Use of google-code-prettify for Lucene/Solr Javadoc
[ https://issues.apache.org/jira/browse/LUCENE-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994759#comment-12994759 ] Koji Sekiguchi commented on LUCENE-2894: {quote} but Solr javadoc on hudson looks not good: https://hudson.apache.org/hudson/job/Solr-trunk/javadoc/org/apache/solr/handler/component/TermsComponent.html {quote} The problem was gone. Use of google-code-prettify for Lucene/Solr Javadoc --- Key: LUCENE-2894 URL: https://issues.apache.org/jira/browse/LUCENE-2894 Project: Lucene - Java Issue Type: Improvement Components: Javadocs Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2894.patch, LUCENE-2894.patch, LUCENE-2894.patch, LUCENE-2894.patch My company, RONDHUIT uses google-code-prettify (Apache License 2.0) in Javadoc for syntax highlighting: http://www.rondhuit-demo.com/RCSS/api/com/rondhuit/solr/analysis/JaReadingSynonymFilterFactory.html I think we can use it for Lucene javadoc (java sample code in overview.html etc) and Solr javadoc (Analyzer Factories etc) to improve or simplify our life.
[jira] Commented: (SOLR-2272) Join
[ https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994762#comment-12994762 ] Bojan Smid commented on SOLR-2272: -- Very nice patch, Yonik. However, it doesn't apply to current trunk any more. Does anyone, by any chance, have a fresh version of this patch? Join Key: SOLR-2272 URL: https://issues.apache.org/jira/browse/SOLR-2272 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Fix For: 4.0 Attachments: SOLR-2272.patch Limited join functionality for Solr, mapping one set of IDs matching a query to another set of IDs, based on the indexed tokens of the fields. Example: fq={!join from=parent_ptr to=parent_id}child_doc:query
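Functionally, the join described in SOLR-2272 maps the documents matched by the inner query, through their "from" field values, onto the documents whose "to" field holds one of those values. A rough single-valued sketch of those semantics (the field names and document layout here are hypothetical, and this is not Solr's actual token-based implementation):

```python
def join(docs, inner_match, from_field, to_field):
    """Return ids of docs whose `to_field` value appears among the
    `from_field` values of documents matching the inner query."""
    keys = {d[from_field] for d in docs
            if inner_match(d) and from_field in d}
    return [d["id"] for d in docs if d.get(to_field) in keys]


docs = [
    {"id": "c1", "kind": "child", "parent_ptr": "p1"},
    {"id": "c2", "kind": "child", "parent_ptr": "p2"},
    {"id": "p1", "parent_id": "p1"},
    {"id": "p2", "parent_id": "p2"},
    {"id": "p3", "parent_id": "p3"},
]

# Roughly fq={!join from=parent_ptr to=parent_id}<inner query>, with an
# inner query matching only the child document c1:
print(join(docs, lambda d: d.get("kind") == "child" and d["id"] == "c1",
           "parent_ptr", "parent_id"))  # ['p1']
```

The inner query selects documents on one side of the relation; the join then returns the documents they point to, which is what makes parent/child-style filtering expressible in a single fq.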
[jira] Closed: (SOLR-2293) SolrCloud distributed indexing
[ https://issues.apache.org/jira/browse/SOLR-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl closed SOLR-2293. - Resolution: Duplicate Use SOLR-2358 SolrCloud distributed indexing -- Key: SOLR-2293 URL: https://issues.apache.org/jira/browse/SOLR-2293 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Jan Høydahl Add SolrCloud support for distributed indexing, as described in http://wiki.apache.org/solr/DistributedSearch#Distributed_Indexing and the Support user specified partitioning paragraph of http://wiki.apache.org/solr/SolrCloud#High_level_design_goals Currently, the client needs to decide what shard indexer to talk to for each document. Common partitioning strategies include hash-based, date-based and custom. Solr should have the capability of accepting a document update on any of the nodes in a cluster, and perform partitioning and distribution of updates to the correct shard, based on current ZK config. The ShardDistributionPolicy should be pluggable, with the most common provided out of the box.
[jira] Commented: (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994766#comment-12994766 ] Jan Høydahl commented on SOLR-2358: --- See SOLR-2293 for some thoughts. Since this functionality is core to Solr and should always be present, it would be natural to either build it into the DirectUpdateHandler2 or to add this processor to the set of default UpdateProcessors that are executed if no update.processor parameter is specified. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Reporter: William Mayor Priority: Minor Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr
[jira] Commented: (LUCENE-2908) clean up serialization in the codebase
[ https://issues.apache.org/jira/browse/LUCENE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994769#comment-12994769 ] Earwin Burrfoot commented on LUCENE-2908: - Oh, damn :) On my project, we specifically use java-serialization to pass configured Queries/Filters between cluster nodes, as it saves us HEAPS of wrapping/unwrapping them into some parallel serializable classes. clean up serialization in the codebase -- Key: LUCENE-2908 URL: https://issues.apache.org/jira/browse/LUCENE-2908 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-2908.patch We removed contrib/remote, but forgot to cleanup serialization hell everywhere. this is no longer needed, never really worked (e.g. across versions), and slows development (e.g. i wasted a long time debugging stupid serialization of Similarity.idfExplain when trying to make a patch for the scoring system).
[jira] Commented: (SOLR-756) Make DisjunctionMaxQueryParser generally useful by supporting all query types.
[ https://issues.apache.org/jira/browse/SOLR-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994772#comment-12994772 ] Jan Høydahl commented on SOLR-756: -- I think this issue can be closed as it duplicates SOLR-1553, no? Make DisjunctionMaxQueryParser generally useful by supporting all query types. -- Key: SOLR-756 URL: https://issues.apache.org/jira/browse/SOLR-756 Project: Solr Issue Type: Improvement Affects Versions: 1.3 Reporter: David Smiley Fix For: Next Attachments: SolrPluginUtilsDisMax.patch This is an enhancement to the DisjunctionMaxQueryParser to work on all the query variants such as wildcard, prefix, and fuzzy queries, and to support working in AND scenarios that are not processed by the min-should-match DisMax QParser. This was not in Solr already because DisMax was only used for a very limited syntax that didn't use those features. In my opinion, this makes a more suitable base parser for general use because unlike the Lucene/Solr parser, this one supports multiple default fields whereas other ones (say Yonik's {!prefix} one for example, can't do dismax). The notion of a single default field is antiquated and a technical under-the-hood detail of Lucene that I think Solr should shield the user from by on-the-fly using a DisMax when multiple fields are used. (patch to be attached soon)
[jira] Created: (SOLR-2363) Rename the dismax request handler
Rename the dismax request handler --- Key: SOLR-2363 URL: https://issues.apache.org/jira/browse/SOLR-2363 Project: Solr Issue Type: Bug Components: Schema and Analysis Reporter: Jan Høydahl It is misleading that one of the requestHandlers in the example schema is named the same as the queryParser dismax. It creates confusion as to whether to use defType=dismax or qt=dismax. It would be better if the example requestHandler were named e.g. dismaxexample -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: inverted index pruning
On 2/15/11 11:57 AM, Li Li wrote: hi all, I recently read a paper, Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee. Its idea is interesting and I have some questions I'd like to share with you. Please take a look at LUCENE-1812, LUCENE-2632 and my presentation from Apache EuroCon 2010 in Prague, Munching and Crunching. -- Best regards, Andrzej Bialecki - Information Retrieval, Semantic Web, Embedded Unix, System Integration - http://www.sigram.com Contact: info at sigram dot com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1581) Facet by Function
[ https://issues.apache.org/jira/browse/SOLR-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994774#comment-12994774 ] Grant Ingersoll commented on SOLR-1581: --- I would agree it relates to SOLR-1240. In fact, API wise, I think we could just add facet.range.function= (or some abbreviation like facet.range.fn). The key here is that we don't want to have to run the query multiple times like one has to do w/ frange Facet by Function - Key: SOLR-1581 URL: https://issues.apache.org/jira/browse/SOLR-1581 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Fix For: Next It would be really great if we could execute a function and quantize it into buckets that could then be returned as facets. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
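The quantize-into-buckets idea from the issue can be sketched as a single pass over precomputed function values — an illustration with invented names, not Solr's implementation. The single pass is the point of the comment above: unlike repeated frange queries, the query need not be re-run once per bucket.

```java
public class FacetByFunctionSketch {
    // Count how many function values fall into each [start, start+gap),
    // [start+gap, start+2*gap), ... bucket up to `end` -- the facet.range
    // style quantization, computed in one pass over the values.
    static int[] bucketCounts(double[] values, double start, double end, double gap) {
        int nBuckets = (int) Math.ceil((end - start) / gap);
        int[] counts = new int[nBuckets];
        for (double v : values) {
            if (v < start || v >= end) continue; // out of range: not faceted
            counts[(int) ((v - start) / gap)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // e.g. per-document geodist() results, bucketed into 1-unit ranges
        double[] distances = {0.5, 1.2, 1.9, 3.7, 4.4};
        int[] counts = bucketCounts(distances, 0.0, 5.0, 1.0);
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```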
[jira] Updated: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work
[ https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2348: -- Fix Version/s: (was: 3.1) 3.2 moving to 3.2 No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work --- Key: SOLR-2348 URL: https://issues.apache.org/jira/browse/SOLR-2348 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 3.2, 4.0 For the same reasons outlined in SOLR-2339, Solr FieldTypes that return FieldCache-backed ValueSources should explicitly check for situations where it knows the FieldCache is meaningless. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2921) Now that we track the code version at the segment level, we can stop tracking it also in each file level
Now that we track the code version at the segment level, we can stop tracking it also in each file level Key: LUCENE-2921 URL: https://issues.apache.org/jira/browse/LUCENE-2921 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shai Erera Fix For: 3.2, 4.0 Now that we track the code version that created the segment at the segment level, we can stop tracking versions in each file. This has several major benefits: # Today the constant names used to track versions are confusing - they do not state which version they apply from, and so it's harder to determine which formats we can stop supporting when working on the next major release. # Those format numbers are usually negative, but in some cases positive (inconsistency) -- we need to remember to increase it one down for the negative ones, which I always find confusing. # It will remove the format tracking from all the *Writers, and the *Reader will receive the code format (String) and work w/ the appropriate constant (e.g. Constants.LUCENE_30). Centralizing version tracking to SegmentInfo is an advantage IMO. It's not urgent that we do it for 3.1 (though it requires an index format change), because starting from 3.1 all segments track their version number anyway (or migrated to track it), so we can safely release it in a follow-on 3.x release. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994781#comment-12994781 ] Robert Muir commented on SOLR-1553: --- I marked this experimental in trunk. I'll keep the issue open in 3.1 for a few more days as discussed, then i'm moving it out. extended dismax query parser Key: SOLR-1553 URL: https://issues.apache.org/jira/browse/SOLR-1553 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Assignee: Yonik Seeley Fix For: 1.5, 3.1, 4.0 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, edismax.userFields.patch An improved user-facing query parser based on dismax -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-2363) Rename the dismax request handler
[ https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley resolved SOLR-2363. - Resolution: Duplicate The DismaxRequestHandler and StandardRequestHandler are both deprecated and replaced with SearchHandler. Rename the dismax request handler --- Key: SOLR-2363 URL: https://issues.apache.org/jira/browse/SOLR-2363 Project: Solr Issue Type: Bug Components: Schema and Analysis Reporter: Jan Høydahl Labels: dismax, example-schema It is misleading that one of the requestHandlers in the example schema is named the same as the queryParser dismax. It creates confusion as to whether the use of defType=dismax vs qt=dismax. It would be better if the example requestHandler was named e.g. dismaxexample -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] Created: (SOLR-2363) Rename the dismax request handler
+1 2011/2/15 Jan Høydahl (JIRA) j...@apache.org: Rename the dismax request handler --- Key: SOLR-2363 URL: https://issues.apache.org/jira/browse/SOLR-2363 Project: Solr Issue Type: Bug Components: Schema and Analysis Reporter: Jan Høydahl It is misleading that one of the requestHandlers in the example schema is named the same as the queryParser dismax. It creates confusion as to whether the use of defType=dismax vs qt=dismax. It would be better if the example requestHandler was named e.g. dismaxexample -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Reopened: (SOLR-2363) Rename the dismax request handler
[ https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reopened SOLR-2363: --- Reopening. This issue is not talking about the old DisMaxRequestHandler but the example SearchHandler config named dismax. We should probably start using the name RequestHandler instance or similar for these entries. Rename the dismax request handler --- Key: SOLR-2363 URL: https://issues.apache.org/jira/browse/SOLR-2363 Project: Solr Issue Type: Bug Components: Schema and Analysis Reporter: Jan Høydahl Labels: dismax, example-schema Attachments: SOLR-2363.patch It is misleading that one of the requestHandlers in the example schema is named the same as the queryParser dismax. It creates confusion as to whether the use of defType=dismax vs qt=dismax. It would be better if the example requestHandler was named e.g. dismaxexample -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2363) Rename the dismax request handler
[ https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2363: -- Attachment: SOLR-2363.patch This patch renames the example requesthandler to dismaxexample, updates the outdated comment with more proper reference to DisMaxQParser and switches to edismax as default. Rename the dismax request handler --- Key: SOLR-2363 URL: https://issues.apache.org/jira/browse/SOLR-2363 Project: Solr Issue Type: Bug Components: Schema and Analysis Reporter: Jan Høydahl Labels: dismax, example-schema Attachments: SOLR-2363.patch It is misleading that one of the requestHandlers in the example schema is named the same as the queryParser dismax. It creates confusion as to whether the use of defType=dismax vs qt=dismax. It would be better if the example requestHandler was named e.g. dismaxexample -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2363) Rename the example dismax request handler instance
[ https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2363: -- Summary: Rename the example dismax request handler instance (was: Rename the dismax request handler) Just renaming issue to reflect that it's about a requesthandler instance Rename the example dismax request handler instance Key: SOLR-2363 URL: https://issues.apache.org/jira/browse/SOLR-2363 Project: Solr Issue Type: Bug Components: Schema and Analysis Reporter: Jan Høydahl Labels: dismax, example-schema Attachments: SOLR-2363.patch It is misleading that one of the requestHandlers in the example schema is named the same as the queryParser dismax. It creates confusion as to whether the use of defType=dismax vs qt=dismax. It would be better if the example requestHandler was named e.g. dismaxexample -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2363) Rename the example dismax request handler instance
[ https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994793#comment-12994793 ] Ryan McKinley commented on SOLR-2363: - ah - my bad. what about something more descriptive? dismax is kinda cryptic. maybe 'escaped', 'safe', or just 'query' Though i'm not convinced it really needs changing -- we would also need to update all the documentation that refers to ?qt=dismax Rename the example dismax request handler instance Key: SOLR-2363 URL: https://issues.apache.org/jira/browse/SOLR-2363 Project: Solr Issue Type: Bug Components: Schema and Analysis Reporter: Jan Høydahl Labels: dismax, example-schema Attachments: SOLR-2363.patch It is misleading that one of the requestHandlers in the example schema is named the same as the queryParser dismax. It creates confusion as to whether the use of defType=dismax vs qt=dismax. It would be better if the example requestHandler was named e.g. dismaxexample -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2351) Allow the MoreLikeThis component to accept filters and use the already parsed query from previous stages (if applicable) as seed.
[ https://issues.apache.org/jira/browse/SOLR-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-2351: --- Affects Version/s: (was: 1.5) Fix Version/s: (was: 1.5) (was: 1.3) Next Allow the MoreLikeThis component to accept filters and use the already parsed query from previous stages (if applicable) as seed. - Key: SOLR-2351 URL: https://issues.apache.org/jira/browse/SOLR-2351 Project: Solr Issue Type: Improvement Components: MoreLikeThis Reporter: Amit Nithian Priority: Minor Fix For: Next Attachments: mlt.patch Currently the MLT component doesn't accept filter queries specified on the URL which my application needed (I needed to restrict similar results by a lat/long bounding box). This patch also attempts to solve the issue of allowing the boost functions of the dismax to be used in the MLT component by using the query object created by the QueryComponent to OR with the query created by the MLT as part of the final query. In a blank dismax query with no query/phrase clauses, this works although a separate BF definition/parsing would be ideal. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1191) NullPointerException in delta import
[ https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994795#comment-12994795 ] Yonik Seeley commented on SOLR-1191: If someone could whip up a test for this, we could get this fix into the upcoming 3.1 release. NullPointerException in delta import Key: SOLR-1191 URL: https://issues.apache.org/jira/browse/SOLR-1191 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3, 1.4 Environment: OS: Windows Linux. Java: 1.6 DB: MySQL SQL Server Reporter: Ali Syed Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1191.patch Seeing few of these NullPointerException during delta imports. Once this happens delta import stops working and keeps giving the same error. java.lang.NullPointerException at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355) Running delta import for a particular entity fixes the problem and delta import start working again. 
Here is the log just before after the exception 05/27 11:59:29 86987686 INFO btpool0-538 org.apache.solr.core.SolrCore - [localhost] webapp=/solr path=/dataimport params={command=delta-importoptimize=false} status=0 QTime=0 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DataImporter - Starting Delta Import 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: content 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: content rows obtained : 0 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: content rows obtained : 0 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: content 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: job 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB 05/27 11:59:29 86987704 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 12 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: job rows obtained : 0 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: job rows obtained : 0 
05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: job 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Delta Import completed successfully 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: user 05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB 05/27 11:59:29 86987716 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 7 05/27 11:59:29 86987873 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: user rows obtained : 46 05/27 11:59:29 86987873 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: user rows obtained : 0 05/27 11:59:29 86987873 INFO
[jira] Commented: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994797#comment-12994797 ] Yonik Seeley commented on SOLR-2245: Thanks Peter, If we can get someone who knows more DIH stuff to add some tests, we can get this committed! MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip This patch addresses a number of issues in the MailEntityProcessor contrib-extras module. The changes are outlined here: * Added an 'includeContent' entity attribute to allow specifying content to be included independently of processing attachments e.g. entity includeContent=true processAttachments=false . . . / would include message content, but not attachment content * Added a synonym called 'processAttachments', which is synonymous with the mis-spelled (and singular) 'processAttachement' property. This property functions the same as processAttachement. Default= 'true' - if either is false, then attachments are not processed. Note that only one of these should really be specified in a given entity tag. * Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is unread, not deleted etc.), there is still a property value stored in the 'flags' field (the value is the string none) Note: there is a potential backward compat issue with FLAGS.NONE for clients that expect the absence of the 'flags' field to mean 'Not read'. I'm calculating this would be extremely rare, and is inadvisable in any case as user flags can be arbitrarily set, so fixing it up now will ensure future client access will be consistent. * The folder name of an email is now included as a field called 'folder' (e.g. folder=INBOX.Sent). 
This is quite handy in search/post-indexing processing * The addPartToDocument() method that processes attachments is significantly re-written, as there looked to be no real way the existing code would ever actually process attachment content and add it to the row data Tested on the 3.x trunk with a number of popular imap servers. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1191) NullPointerException in delta import
[ https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994803#comment-12994803 ] Gunnlaugur Thor Briem commented on SOLR-1191: - I'll make one later today or tomorrow. NullPointerException in delta import Key: SOLR-1191 URL: https://issues.apache.org/jira/browse/SOLR-1191 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3, 1.4 Environment: OS: Windows Linux. Java: 1.6 DB: MySQL SQL Server Reporter: Ali Syed Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1191.patch Seeing few of these NullPointerException during delta imports. Once this happens delta import stops working and keeps giving the same error. java.lang.NullPointerException at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355) Running delta import for a particular entity fixes the problem and delta import start working again. 
Here is the log just before after the exception 05/27 11:59:29 86987686 INFO btpool0-538 org.apache.solr.core.SolrCore - [localhost] webapp=/solr path=/dataimport params={command=delta-importoptimize=false} status=0 QTime=0 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DataImporter - Starting Delta Import 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: content 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: content rows obtained : 0 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: content rows obtained : 0 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: content 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: job 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB 05/27 11:59:29 86987704 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 12 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: job rows obtained : 0 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: job rows obtained : 0 
05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: job 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Delta Import completed successfully 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: user 05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB 05/27 11:59:29 86987716 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 7 05/27 11:59:29 86987873 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: user rows obtained : 46 05/27 11:59:29 86987873 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: user rows obtained : 0 05/27 11:59:29 86987873 INFO Thread-4162
Re: [jira] Commented: (SOLR-2363) Rename the example dismax request handler instance
Since the qt=dismax has specific qt fields, I would suggest we have a qt=dismax that is plain vanilla, and one that is called qt=dismaxexample with the fields. On 2/15/11 7:01 AM, Ryan McKinley (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994793#comment-12994793 ] Ryan McKinley commented on SOLR-2363: - ah - my bad. what about something more descriptive? dismax is kinda cryptic. maybe 'escaped', 'safe', or just 'query' Though i'm not convinced it really needs changing -- we would also need to update all the documentation that refers to ?qt=dismax Rename the example dismax request handler instance Key: SOLR-2363 URL: https://issues.apache.org/jira/browse/SOLR-2363 Project: Solr Issue Type: Bug Components: Schema and Analysis Reporter: Jan Høydahl Labels: dismax, example-schema Attachments: SOLR-2363.patch It is misleading that one of the requestHandlers in the example schema is named the same as the queryParser dismax. It creates confusion as to whether the use of defType=dismax vs qt=dismax. It would be better if the example requestHandler was named e.g. dismaxexample -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2363) Rename the example dismax request handler instance
[ https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994817#comment-12994817 ] Erick Erickson commented on SOLR-2363: -- bq: Though i'm not convinced it really needs changing – we would also need to update all the documentation that refers to ?qt=dismax I agree with Jan on this one. I distinctly remember having this confusion, and I've seen it go round multiple times on the user's list. Interestingly, I can't find anything on the Wiki where qt=dismax is in the text.. bq: I would suggest we have a qt=dismax that is plain vanilla, Please no! The whole point is to avoid the confusion over qt=dismax and defType=dismax. Rename the example dismax request handler instance Key: SOLR-2363 URL: https://issues.apache.org/jira/browse/SOLR-2363 Project: Solr Issue Type: Bug Components: Schema and Analysis Reporter: Jan Høydahl Labels: dismax, example-schema Attachments: SOLR-2363.patch It is misleading that one of the requestHandlers in the example schema is named the same as the queryParser dismax. It creates confusion as to whether the use of defType=dismax vs qt=dismax. It would be better if the example requestHandler was named e.g. dismaxexample -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1581) Facet by Function
[ https://issues.apache.org/jira/browse/SOLR-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994828#comment-12994828 ] Grant Ingersoll commented on SOLR-1581: --- actually, in looking at this, we don't need facet.range.function, we just need facet.range to take functions Facet by Function - Key: SOLR-1581 URL: https://issues.apache.org/jira/browse/SOLR-1581 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Fix For: Next It would be really great if we could execute a function and quantize it into buckets that could then be returned as facets. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Please mark distributed date faceting for 3.1
Distributed date faceting now has a patch and is tested: https://issues.apache.org/jira/browse/SOLR-1709 I'm posting to the dev list because I want a committer to mark this for 3.1. I don't want to assume any of you guys see the comment activity. ~ David
[jira] Commented: (SOLR-2363) Rename the example dismax request handler instance
[ https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994837#comment-12994837 ] Jan Høydahl commented on SOLR-2363: --- Also we must remember that it's only (supposed to be) an EXAMPLE schema. It's where most people start learning about Solr, request handlers and the like - thus it should not be confusing, but rather have super-clear comments helping the user get going. Also, per definition, changing this in the example schema will not break anything anywhere :) qt=robust or qt=userfriendly or qt=onesearchbox could be other alternatives? Rename the example dismax request handler instance Key: SOLR-2363 URL: https://issues.apache.org/jira/browse/SOLR-2363 Project: Solr Issue Type: Bug Components: Schema and Analysis Reporter: Jan Høydahl Labels: dismax, example-schema Attachments: SOLR-2363.patch It is misleading that one of the requestHandlers in the example schema is named the same as the queryParser dismax. It creates confusion as to whether the use of defType=dismax vs qt=dismax. It would be better if the example requestHandler was named e.g. dismaxexample -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Please mark distributed date faceting for 3.1
On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W. dsmi...@mitre.org wrote: Distributed date faceting now has a patch and is tested: https://issues.apache.org/jira/browse/SOLR-1709 I’m posting to the dev list because I want a committer to mark this for 3.1. I don’t want to assume any of you guys see the comment activity. Thanks very much for adding a test! But, can't we just do this for 3.2 instead? I don't like the idea of rushing features into 3.1 at the last minute because we are nearing a release (0 open lucene issues, 2 open solr ones). Right now the 3.x branch is feature-frozen for 3.1 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2105) RequestHandler param update.processor is confusing
[ https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994841#comment-12994841 ] Jan Høydahl commented on SOLR-2105: --- 5-minute fix candidate for 3.1 Anyone vote for including this name change fix in the 3.1 release? Custom update chains are very little in use out there so it's easier to change the name of the parameter now than later. Marking this change clearly in CHANGES.TXT should let anyone be able to catch up. A softer option is to leave the old param in there but deprecated. RequestHandler param update.processor is confusing -- Key: SOLR-2105 URL: https://issues.apache.org/jira/browse/SOLR-2105 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4.1 Reporter: Jan Høydahl Priority: Minor Attachments: SOLR-2105.patch Today we reference a custom updateRequestProcessorChain using the update request parameter update.processor. See http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section This is confusing, since what we are really referencing is not an UpdateProcessor, but an updateRequestProcessorChain. I propose that update.processor is renamed as update.chain or similar -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Problem loading jcc from java : undefined symbol: PyExc_IOError
In the end, I compiled a new python with the necessary modules, and that works just fine. But it was an interesting experience. Thank you Andi, your help is always great. Cheers, roman On Tue, Feb 15, 2011 at 9:22 AM, Roman Chyla roman.ch...@gmail.com wrote: On Tue, Feb 15, 2011 at 4:22 AM, Andi Vajda va...@apache.org wrote: On Tue, 15 Feb 2011, Roman Chyla wrote: from: http://realmike.org/blog/2010/07/18/python-extensions-in-cpp-using-swig/ Q. ?Fatal Python error: Interpreter not initialized (version mismatch?)? A. This error occurs when the version of the Python interpreter for which the extension module has been built is different from the version of the interpreter that attempts to import the module. Is there a way to find out which python interpreter version is inside JCC? Also, Is it somehow possible that the java process that load jcc library will be picking the default python (2.4) instead of the python (2.5)? PATH is set to python2.5. There is no Python interpreter inside jcc. It's dynamically linked. To know which version of the shared library is looked for and expected, use the 'ldd' utility against the various shared libraries involved to tell you. That version is selected at build time, when you run 'python setup.py ...' That version of python determines the version of libpython.so used. 
This will probably be the problem (as you said before), the libjcc.so shows no python -

bash-3.2$ ldd build/lib.linux-x86_64-2.5/libjcc.so
        linux-vdso.so.1 => (0x7fff7affc000)
        /$LIB/snoopy.so => /lib64/snoopy.so (0x2b8ed0e74000)
        libjava.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/libjava.so (0x2b8ed1076000)
        libjvm.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so (0x2b8ed11a5000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b8ed1c3f000)
        libm.so.6 => /lib64/libm.so.6 (0x2b8ed1f3f000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b8ed21c2000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x2b8ed23cf000)
        libc.so.6 => /lib64/libc.so.6 (0x2b8ed25eb000)
        libdl.so.2 => /lib64/libdl.so.2 (0x2b8ed2943000)
        libverify.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/libverify.so (0x2b8ed2b47000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x2b8ed2c57000)
        /lib64/ld-linux-x86-64.so.2 (0x2b8ed08c9000)

And I think, the python2.4 (the default on the system) is being loaded -- but how to force loading of python2.5 (if that was possible at all) I don't know. Compilation is definitely done with -lpython2.5 Cheers, roman Andi.. Cheers, roman On Tue, Feb 15, 2011 at 2:40 AM, Roman Chyla roman.ch...@gmail.com wrote: On Tue, Feb 15, 2011 at 1:32 AM, Andi Vajda va...@apache.org wrote: On Tue, 15 Feb 2011, Roman Chyla wrote: The python embedded in Java works really well on MacOsX and also Ubuntu. But I am trying hard to make it work also on Scientific Linux (SLC5) with *statically* built Python. The python is a build from ActiveState. You mean you're going to try to dynamically load libpython.a into a JVM ? I have no idea if this can work at all. 
I am very ignorant as far as the difference between statically and dynamically linked libraries go - I just wanted to use JCC wrapped code with this particular statically linked python I got little bit further, but just little: after I changed -Xlinker --export-dynamic into -Xlinker -export-dynamic (and installed python into /opt...) I am getting a different error: SEVERE: org.apache.jcc.PythonException: No module named solrpie.java_bridge null at org.apache.jcc.PythonVM.instantiate(Native Method) at rca.python.jni.PythonVMBridge.start(Unknown Source) at rca.python.jni.PythonVMBridge.start(Unknown Source) at rca.python.jni.PythonVMBridge.start(Unknown Source) at rca.python.jni.SolrpieVM.getBridge(Unknown Source) My understanding is that the previous error has gone (and the python module time is loaded), because if I set PYTHONPATH incorrectly, I get: This message is IMHO coming from Python But when I correct the PYTHONPATH, I am getting only this: [java] Fatal Python error: Interpreter not initialized (version mismatch?) [java] Java Result: 134 If my understanding of static builds is correct, I'd imagine the only way for this to work would be to statically compile the JVM (hotspot) and python together. oooups, that is way over my head But why all this ? Because on the grid, we already had a statically linked python and it was working very well with pylucene (and after all, I managed to make it work also for solr and other packages) But if you think that it is not possible, I should do something else :) But it was fun trying, if you get some idea, please let me know. Thank you, Roman Andi.. So far, I managed to build all the
Fwd: Any contribs available for Range field type?
-- Forwarded message -- From: kenf_nc ken.fos...@realestate.com Date: Tue, Feb 15, 2011 at 10:49 AM Subject: Re: Any contribs available for Range field type? To: solr-u...@lucene.apache.org I've tried several times to get an active account on solr-...@lucene.apache.org and the mailing list won't send me a confirmation email, and therefore won't let me post because I'm not confirmed. Could I get someone that is a member of Solr-Dev to post either my original request in this thread, or a link to this thread on the Dev mailing list? I really was hoping for more response than this to this question. This would be a terrifically useful field type to just about any solr index. Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2502203.html Sent from the Solr - User mailing list archive at Nabble.com.
Fwd: Any contribs available for Range field type?
-- Forwarded message -- From: kenf_nc ken.fos...@realestate.com Date: Fri, Feb 11, 2011 at 8:49 AM Subject: Any contribs available for Range field type? To: solr-u...@lucene.apache.org I have a huge need for a new field type. It would be a Poly field, similar to Point or Payload. It would take 2 data elements and a search would return a hit if the search term fell within the range of the elements. For example let's say I have a document representing an Employment record. I may want to create a field for years_of_service where it would take values 1999,2004. Then a query q=years_of_service:2001 would be a hit, q=years_of_service:2010 would not. The field would need to take a data type attribute as a parameter. I may need to do integer ranges, float/double ranges, date ranges. I don't see the need now, but heck maybe even a string range. This would be useful for things like Event dates. An event often occurs between several days (or hours) but the query is something like what events are happening today. If I did q=event_date:NOW (or similar) it should hit all documents where event_date has a range that is inclusive of today. Another example would be a product category document. A specific automobile may have a fixed price, but a category of auto (2010 BMW 3-series for example) would have a price range. I hope you get the point. My question (finally) is, does anyone know of an existing contribution to the public domain that already does this? I'm more of a .Net/C# developer than a Java developer. I know my way around Java, but don't really have the right tools to build/test/etc. So was hoping to borrow rather than build if I could. Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2473601.html Sent from the Solr - User mailing list archive at Nabble.com.
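For clarity, the matching rule being asked for can be sketched in a few lines of Python (a hypothetical illustration, not an existing Solr contrib; the function name and the comma-separated storage format are assumptions taken from the years_of_service example):

```python
# Hypothetical sketch of the requested "range" poly field semantics: a stored
# pair of values defines an inclusive range, and a single-valued query term
# hits when it falls inside it. The parse parameter stands in for the data
# type attribute the poster mentions (int, float, date, ...).

def range_field_matches(stored, query, parse=float):
    """stored is a raw pair like '1999,2004'; query is one value."""
    low_raw, high_raw = stored.split(",")
    low, high = parse(low_raw), parse(high_raw)
    return low <= parse(query) <= high

# The example from the mail:
#   range_field_matches("1999,2004", "2001")  -> True  (hit)
#   range_field_matches("1999,2004", "2010")  -> False (no hit)
```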
[jira] Commented: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994847#comment-12994847 ] Peter Sturge commented on SOLR-2245: I've been meaning to get back to this, as I have made some local updates to this that help performance. Could you give me some feedback on these 2 questions please - it would be really useful: * Is there a committer's standard or similar spec that describes what tests should be included, and if so, could you point me to it please? I can then make sure I include appropriate tests * Is there a time-frame for committing for this or next release? I have a product release of my own coming up for beg-March, so if I know the time-scales, I can plan accordingly. Thanks! Peter MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip This patch addresses a number of issues in the MailEntityProcessor contrib-extras module. The changes are outlined here: * Added an 'includeContent' entity attribute to allow specifying content to be included independently of processing attachments e.g. <entity includeContent="true" processAttachments="false" . . . /> would include message content, but not attachment content * Added a synonym called 'processAttachments', which is synonymous to the mis-spelled (and singular) 'processAttachement' property. This property functions the same as processAttachement. Default= 'true' - if either is false, then attachments are not processed. Note that only one of these should really be specified in a given entity tag. * Added a FLAGS.NONE value, so that if an email has no flags (i.e. 
it is unread, not deleted etc.), there is still a property value stored in the 'flags' field (the value is the string none) Note: there is a potential backward compat issue with FLAGS.NONE for clients that expect the absence of the 'flags' field to mean 'Not read'. I'm calculating this would be extremely rare, and is inadvisable in any case as user flags can be arbitrarily set, so fixing it up now will ensure future client access will be consistent. * The folder name of an email is now included as a field called 'folder' (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing processing * The addPartToDocument() method that processes attachments is significantly re-written, as there looked to be no real way the existing code would ever actually process attachment content and add it to the row data Tested on the 3.x trunk with a number of popular imap servers. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Release 3.2 (was Re: Please mark distributed date faceting for 3.1)
Can we see more frequent releases? Can we look forward to a 3.2 release in a few months? Say May 15? That'd be a quarterly release cycle. (Personally, I'd like to see Robert's improvement to the handling of Chinese as soon as possible.) -- DM On 02/15/2011 10:24 AM, Robert Muir wrote: On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W.dsmi...@mitre.org wrote: Distributed date faceting now has a patch and is tested: https://issues.apache.org/jira/browse/SOLR-1709 I’m posting to the dev list because I want a committer to mark this for 3.1. I don’t want to assume any of you guys see the comment activity. Thanks very much for adding a test! But, can't we just do this for 3.2 instead? I don't like the idea of rushing features into 3.1 at the last minute because we are nearing a release (0 open lucene issues, 2 open solr ones). Right now the 3.x branch is feature-frozen for 3.1 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994853#comment-12994853 ] Bill Bell commented on SOLR-1709: - 1 vote for 3.1 Distributed Date Faceting - Key: SOLR-1709 URL: https://issues.apache.org/jira/browse/SOLR-1709 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, SOLR-1709_distributed_date_faceting_v3x.patch, solr-1.4.0-solr-1709.patch This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time). The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this: * Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards * If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data) This could be dealt with if timezone and skew information was added, and the dates were normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. 
This would tell requesters what time and TZ the remote server thinks it is, and so multiple shards' time data can be normalized. The patch affects 2 files in the Solr core: org.apache.solr.handler.component.FacetComponent.java org.apache.solr.handler.component.ResponseBuilder.java The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments suggestions welcome. As a favour to ask, if anyone could take my 2 source files and create a PATCH file from it, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company). -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
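The merge behaviour the patch describes can be sketched like this (a rough illustration with assumed data shapes: plain dicts of date bucket to count, not the actual FacetComponent structures):

```python
# Illustrative only: shard responses are modeled as {date_bucket: count}
# dicts. The first shard's facet_dates fixes the bucket set; later shards
# add counts only for buckets the basis already contains, so 'earlier' or
# 'later' skewed buckets are dropped, as the issue description explains.

def merge_facet_dates(shard_responses):
    merged = dict(shard_responses[0])          # first shard is the basis
    for response in shard_responses[1:]:
        for bucket, count in response.items():
            if bucket in merged:               # buckets outside the basis
                merged[bucket] += count        # are silently ignored
    return merged
```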
Re: [jira] Commented: (SOLR-2245) MailEntityProcessor Update
3.1 may be too late Bill Bell Sent from mobile On Feb 15, 2011, at 8:52 AM, Peter Sturge (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994847#comment-12994847 ] Peter Sturge commented on SOLR-2245: I've been meaning to get back to this, as I have made some local updates to this that help performance. Could you give me some feedback on these 2 questions please - it would be really useful: * Is there a committer's standard or similar spec that describes what tests should be included, and if so, could you point me to it please? I can then make sure I include appropriate tests * Is there a time-frame for committing for this or next release? I have a product release of my own coming up for beg-March, so if I know the time-scales, I can plan accordingly. Thanks! Peter MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip This patch addresses a number of issues in the MailEntityProcessor contrib-extras module. The changes are outlined here: * Added an 'includeContent' entity attribute to allow specifying content to be included independently of processing attachments e.g. <entity includeContent="true" processAttachments="false" . . . /> would include message content, but not attachment content * Added a synonym called 'processAttachments', which is synonymous to the mis-spelled (and singular) 'processAttachement' property. This property functions the same as processAttachement. Default= 'true' - if either is false, then attachments are not processed. Note that only one of these should really be specified in a given entity tag. * Added a FLAGS.NONE value, so that if an email has no flags (i.e. 
it is unread, not deleted etc.), there is still a property value stored in the 'flags' field (the value is the string none) Note: there is a potential backward compat issue with FLAGS.NONE for clients that expect the absence of the 'flags' field to mean 'Not read'. I'm calculating this would be extremely rare, and is inadvisable in any case as user flags can be arbitrarily set, so fixing it up now will ensure future client access will be consistent. * The folder name of an email is now included as a field called 'folder' (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing processing * The addPartToDocument() method that processes attachments is significantly re-written, as there looked to be no real way the existing code would ever actually process attachment content and add it to the row data Tested on the 3.x trunk with a number of popular imap servers. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)
I would love to see a release every 3 to 6 months too Bill Bell Sent from mobile On Feb 15, 2011, at 8:55 AM, DM Smith dmsmith...@gmail.com wrote: Can we see more frequent releases? Can we look forward to a 3.2 release in a few months? Say May 15? That'd be a quarterly release cycle. (Personally, I'd like to see Robert's improvement to the handling of Chinese as soon as possible.) -- DM On 02/15/2011 10:24 AM, Robert Muir wrote: On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W.dsmi...@mitre.org wrote: Distributed date faceting now has a patch and is tested: https://issues.apache.org/jira/browse/SOLR-1709 I’m posting to the dev list because I want a committer to mark this for 3.1. I don’t want to assume any of you guys see the comment activity. Thanks very much for adding a test! But, can't we just do this for 3.2 instead? I don't like the idea of rushing features into 3.1 at the last minute because we are nearing a release (0 open lucene issues, 2 open solr ones). Right now the 3.x branch is feature-frozen for 3.1 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2922) Optimize BlockTermsReader.seek
Optimize BlockTermsReader.seek -- Key: LUCENE-2922 URL: https://issues.apache.org/jira/browse/LUCENE-2922 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 4.0 When we seek, we first consult the terms index to find the right block of 32 (default) terms that may hold the target term. Then, we scan that block looking for an exact match. The scanning just uses next() and then compares the full term, but this is actually rather wasteful. First off, since all terms in the block share a common prefix, we should compare the target against that common prefix once, and then only compare the new suffix of each term. Second, since the term suffixes have already been read up front into a byte[], we should do a no-copy comparison (vs today, where we first read a copy into the local BytesRef and then compare). With this opto, I removed the ability for BlockTermsWriter/Reader to support arbitrary term sort order -- it's now hardwired to BytesRef.utf8SortedAsUnicode. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
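The two optimizations described above can be sketched as follows (a Python stand-in for the Java code, with assumed inputs: the block's shared prefix and the per-term suffix bytes already loaded into memory):

```python
# Sketch of the optimized block scan: compare the target against the shared
# prefix once, then scan comparing only each term's suffix, instead of
# rebuilding and comparing the full term on every next() call.

def seek_in_block(target, common_prefix, suffixes):
    """Return the index of an exact match within the block, or -1."""
    if not target.startswith(common_prefix):
        return -1                         # no term in this block can match
    target_suffix = target[len(common_prefix):]
    for i, suffix in enumerate(suffixes):
        if suffix == target_suffix:       # suffix-only comparison, no copies
            return i
    return -1
```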
[jira] Updated: (LUCENE-2922) Optimize BlockTermsReader.seek
[ https://issues.apache.org/jira/browse/LUCENE-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2922: --- Attachment: LUCENE-2922.patch Patch. Optimize BlockTermsReader.seek -- Key: LUCENE-2922 URL: https://issues.apache.org/jira/browse/LUCENE-2922 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 4.0 Attachments: LUCENE-2922.patch When we seek, we first consult the terms index to find the right block of 32 (default) terms that may hold the target term. Then, we scan that block looking for an exact match. The scanning just uses next() and then compares the full term, but this is actually rather wasteful. First off, since all terms in the block share a common prefix, we should compare the target against that common prefix once, and then only compare the new suffix of each term. Second, since the term suffixes have already been read up front into a byte[], we should do a no-copy comparison (vs today, where we first read a copy into the local BytesRef and then compare). With this opto, I removed the ability for BlockTermsWriter/Reader to support arbitrary term sort order -- it's now hardwired to BytesRef.utf8SortedAsUnicode. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2922) Optimize BlockTermsReader.seek
[ https://issues.apache.org/jira/browse/LUCENE-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994865#comment-12994865 ] Michael McCandless commented on LUCENE-2922: The opto is a big win for FuzzyQuery (and, automaton respeller):
||Query||QPS base||QPS opto||Pct diff||
|united states|13.92|13.81|{color:red}-0.8%{color}|
|+united +states|20.59|20.55|{color:red}-0.2%{color}|
|united states|20.06|20.03|{color:red}-0.1%{color}|
|states|56.67|56.68|{color:green}0.0%{color}|
|united states~3|9.55|9.55|{color:green}0.0%{color}|
|uni*|17.67|17.71|{color:green}0.2%{color}|
|spanNear([unit, state], 10, true)|65.84|66.03|{color:green}0.3%{color}|
|unit*|31.50|31.62|{color:green}0.4%{color}|
|timesecnum:[1 TO 6]|10.88|10.93|{color:green}0.4%{color}|
|un*d|19.64|19.74|{color:green}0.5%{color}|
|title:.*[Uu]nited.*|1.48|1.49|{color:green}0.9%{color}|
|u*d|8.52|8.63|{color:green}1.3%{color}|
|+nebraska +states|230.99|235.15|{color:green}1.8%{color}|
|spanFirst(unit, 5)|289.74|300.65|{color:green}3.8%{color}|
|united~0.75|18.01|19.26|{color:green}7.0%{color}|
|unit~0.7|36.39|40.33|{color:green}10.8%{color}|
|united~0.6|14.15|15.73|{color:green}11.1%{color}|
|unit~0.5|24.99|29.82|{color:green}19.3%{color}|
Optimize BlockTermsReader.seek -- Key: LUCENE-2922 URL: https://issues.apache.org/jira/browse/LUCENE-2922 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 4.0 Attachments: LUCENE-2922.patch When we seek, we first consult the terms index to find the right block of 32 (default) terms that may hold the target term. Then, we scan that block looking for an exact match. The scanning just uses next() and then compares the full term, but this is actually rather wasteful. First off, since all terms in the block share a common prefix, we should compare the target against that common prefix once, and then only compare the new suffix of each term. 
Second, since the term suffixes have already been read up front into a byte[], we should do a no-copy comparison (vs today, where we first read a copy into the local BytesRef and then compare). With this opto, I removed the ability for BlockTermsWriter/Reader to support arbitrary term sort order -- it's now hardwired to BytesRef.utf8SortedAsUnicode. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)
More contributors contributin' will help us get there! Release work is not glorious. Release work is not fun (most of it). Release discussions involve...*cough*...Maven... Been there. Many hands make light work or something though. Many want more releases - few have more time to give - that's my impression. Open Source - help scratch your itch is the best advice I can give. - Mark On Feb 15, 2011, at 11:04 AM, Bill Bell wrote: I would love to see a release every 3 to 6 months too Bill Bell Sent from mobile On Feb 15, 2011, at 8:55 AM, DM Smith dmsmith...@gmail.com wrote: Can we see more frequent releases? Can we look forward to a 3.2 release in a few months? Say May 15? That'd be a quarterly release cycle. (Personally, I'd like to see Robert's improvement to the handling of Chinese as soon as possible.) -- DM On 02/15/2011 10:24 AM, Robert Muir wrote: On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W.dsmi...@mitre.org wrote: Distributed date faceting now has a patch and is tested: https://issues.apache.org/jira/browse/SOLR-1709 I’m posting to the dev list because I want a committer to mark this for 3.1. I don’t want to assume any of you guys see the comment activity. Thanks very much for adding a test! But, can't we just do this for 3.2 instead? I don't like the idea of rushing features into 3.1 at the last minute because we are nearing a release (0 open lucene issues, 2 open solr ones). Right now the 3.x branch is feature-frozen for 3.1 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - Mark Miller lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: wind down for 3.1?
On Feb 12, 2011, at 7:38 PM, David Smiley (@MITRE.org) wrote: I don't want to overstep my role in this conversation (not being a committer as much as I want to be), My advice? Purge both of these ideas from your head. We don't like to talk about this subject around here much, but rebel that I am: Mark Miller's guide to becoming a Committer - The simple answer: Act like a Committer. The long answer: Lucene/Solr is not developed by Committers IMO. It's developed by contributors. It's measured by its contributors. Great contributors - great stewards - they will all become Committers over time. I don't think a lot of us really care about the time tables. Sometimes a name is nominated and some of us think - oh, I already thought he was a committer - or wow, it's about time. What prompts the creation of a Committer is wide and varied. It might be as simple as someone is sick of committing all of your work. Committing others' work takes time - and the shouldering of some responsibility. Being a Committer is more work than being a contributor in this way. In a lot of ways, it's an added burden - it's not just the convenience of being able to commit straight to svn. That is not really a convenience if you ask me. But honestly, a committer has no true weight over a regular contributor in Apache land. A respected member of the community can easily have the same influence as a respected committer IMO. Only PMC members have binding votes when lines are drawn in the sand. But again - great contributors - great stewards - they will all become PMC members too. And I don't think most of us are too worried about the time table. Great contributors will continue to contribute regardless of that time table in my experience. And over time, things are brought into line as they should be. When the nominee is ready - when he shows that he gets the Apache way - that he fits into the community - that he has demonstrated enough merit - that's point in time one. 
When the nominator is ready - when he sees or is prompted to act - when he feels comfortable putting his name out there for someone - that's point in time two. These two points don't always coincide, much as we would like them to. Persistence - it's the key to so many things. Lucene/Solr is like a cat farm, if such a thing existed. - Mark Miller lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1711) Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
[ https://issues.apache.org/jira/browse/SOLR-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994903#comment-12994903 ] Yonik Seeley commented on SOLR-1711: bq. What about moving the queue.put() inside the synchronized(runners) block to fix this? On second thought, that looks like a pretty bad idea ;-) Looks like a recipe for deadlock since the runners lock will be held if put then blocks. Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java -- Key: SOLR-1711 URL: https://issues.apache.org/jira/browse/SOLR-1711 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 1.4, 1.5 Reporter: Attila Babo Assignee: Yonik Seeley Priority: Critical Fix For: 1.4.1, 1.5, 3.1, 4.0 Attachments: StreamingUpdateSolrServer.patch Original Estimate: 1h Remaining Estimate: 1h While inserting a large pile of documents using StreamingUpdateSolrServer there is a race condition where all Runner instances stop processing while the blocking queue is full. With a high-performance client this can happen quite often, and there is no way to recover from it at the client side. In StreamingUpdateSolrServer there is a BlockingQueue called queue to store UpdateRequests, and up to threadCount worker threads from StreamingUpdateSolrServer.Runner read that queue and push requests to a Solr instance. If at one point the BlockingQueue is empty, all workers stop processing it and push the collected content to Solr, which can be a time-consuming process; sometimes all worker threads are waiting for Solr. If at this time the client fills the BlockingQueue, all worker threads will quit without processing any further and the main thread will block forever. There is a simple, well-tested patch attached to handle this situation. -- This message is automatically generated by JIRA. 
- For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
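The failure mode Attila describes can be reproduced outside Solr with a few lines of java.util.concurrent. This is a minimal sketch under stated assumptions, not Solr's actual code: a Runner-style worker exits as soon as the queue momentarily goes empty, after which the producer fills the bounded queue and blocks forever in put().

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class StuckProducerDemo {
    public static void main(String[] args) throws Exception {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1);

        Thread worker = new Thread(() -> {
            // Runner-style loop: quits as soon as the queue is empty.
            while (queue.poll() != null) { /* "send to Solr" */ }
        });
        worker.start();
        worker.join(); // worker saw an empty queue and exited

        Thread producer = new Thread(() -> {
            try {
                queue.put(1); // fills the bounded queue
                queue.put(2); // blocks forever: no consumer remains
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        producer.join(500); // give it half a second to (not) finish
        System.out.println("producer stuck: " + producer.isAlive());
        producer.interrupt(); // unblock put() so the JVM can exit
    }
}
```

Yonik's point about synchronized(runners) is the dual hazard: if put() were called while holding the runners lock, a full queue would block the producer inside the lock that a worker needs in order to drain it.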
Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)
Mark, I understand what you are saying. In this case, there are two issues that are not making it into 3.1 because they landed too late. After the freeze. The contributions appear to be done. So, the itch at this point needs to be scratched by one or more committers, to commit the changes and to act as release manager. It appears to me that the effort to commit the contributions is minimal, and that in this case the true cost is that of doing the release. As to release discussions involving maven: if the next release were in a couple of months and nothing had been contributed to make maven better, why would it even need to be discussed? The last decision could still stand. I think it is the long time between releases that brings up the same intensity on the maven discussion. -- DM On 02/15/2011 12:08 PM, Mark Miller wrote: More contributors contributin' will help us get there! Release work is not glorious. Release work is not fun (most of it). Release discussions involve...*cough*...Maven... Been there. Many hands make light work or something though. Many want more releases - few have more time to give - that's my impression. Open Source - help scratch your itch is the best advice I can give. - Mark On Feb 15, 2011, at 11:04 AM, Bill Bell wrote: I would love to see a release every 3 to 6 months too Bill Bell Sent from mobile On Feb 15, 2011, at 8:55 AM, DM Smith dmsmith...@gmail.com wrote: Can we see more frequent releases? Can we look forward to a 3.2 release in a few months? Say May 15? That'd be a quarterly release cycle. (Personally, I'd like to see Robert's improvement to the handling of Chinese as soon as possible.) -- DM On 02/15/2011 10:24 AM, Robert Muir wrote: On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W. dsmi...@mitre.org wrote: Distributed date faceting now has a patch and is tested: https://issues.apache.org/jira/browse/SOLR-1709 I’m posting to the dev list because I want a committer to mark this for 3.1. 
I don’t want to assume any of you guys see the comment activity. Thanks very much for adding a test! But, can't we just do this for 3.2 instead? I don't like the idea of rushing features into 3.1 at the last minute because we are nearing a release (0 open lucene issues, 2 open solr ones). Right now the 3.x branch is feature-frozen for 3.1 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2105) RequestHandler param update.processor is confusing
[ https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994914#comment-12994914 ] Mark Miller commented on SOLR-2105: --- I like this change. Can you leave update.processor in but deprecated? Perhaps print a log warning if it's detected. Then we could make a hard change in 4.X perhaps? RequestHandler param update.processor is confusing -- Key: SOLR-2105 URL: https://issues.apache.org/jira/browse/SOLR-2105 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4.1 Reporter: Jan Høydahl Priority: Minor Attachments: SOLR-2105.patch Today we reference a custom updateRequestProcessorChain using the update request parameter update.processor. See http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section This is confusing, since what we are really referencing is not an UpdateProcessor, but an updateRequestProcessorChain. I propose that update.processor is renamed to update.chain or similar -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
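Mark's back-compat suggestion can be sketched as a simple parameter lookup: prefer the new update.chain name and fall back to update.processor with a deprecation warning. This is a hypothetical illustration only (the method and map here are not Solr's API):

```java
import java.util.HashMap;
import java.util.Map;

public class ChainParamDemo {
    // Hypothetical lookup: new name wins; old name still works but warns.
    static String chainName(Map<String, String> params) {
        String chain = params.get("update.chain");
        if (chain == null) {
            chain = params.get("update.processor");
            if (chain != null) {
                System.err.println("WARN: update.processor is deprecated; use update.chain");
            }
        }
        return chain;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("update.processor", "mychain"); // legacy client request
        System.out.println(chainName(params));     // resolves, but warns on stderr
    }
}
```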
[jira] Resolved: (SOLR-2249) ArrayIndexOutOfBoundsException thrown instead of useful FieldCache exception when too many terms
[ https://issues.apache.org/jira/browse/SOLR-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2249. Resolution: Fixed Fix Version/s: (was: 4.0) 3.1 strictly speaking, this has already been fixed in 3.1 - AIOOBE is no longer thrown when using field cache. the related issues track the more specific tasks of dealing with the various uses of FieldCache in solr to throw errors when appropriate. ArrayIndexOutOfBoundsException thrown instead of useful FieldCache exception when too many terms - Key: SOLR-2249 URL: https://issues.apache.org/jira/browse/SOLR-2249 Project: Solr Issue Type: Bug Components: clients - php Affects Versions: 1.4.1 Environment: Windows 7 Reporter: Anees shoukat Assignee: Hoss Man Fix For: 3.1 when attempting to sort, or otherwise use the FieldCache on, a field that has more terms than documents, Solr currently propagates an AIOOBE (ArrayIndexOutOfBoundsException) all the way to the user -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)
On Feb 15, 2011, at 1:00 PM, DM Smith wrote: Mark, I understand what you are saying. In this case, there are two issues that are not making it into 3.1 because they landed too late. After the freeze. The contributions appear to be done. So, the itch at this point needs to be scratched by one or more committers, to commit the changes and to act as release manager. But it's after the freeze? I'm not sure the contributions are 100% done either. Often these things need to be iterated on a bit once a committer takes a look. And are we sure we are happy with the level of the tests? If these are coming up as candidates after the freeze, I lean towards Robert's line of thinking... By all means, shape them up, add tests, etc - that's the only hope they have - but I wouldn't expect them to get in. Many feel that nailing a release as soon as can be done is more important than last minute additions. If you can't find a sympathetic committer, sometimes, them is indeed the breaks. A feature freeze got a lazy consensus go ahead - I'm not sure we want to consider much more than bugs at this point...but that's just me. It appears to me, that the effort to commit the contributions are minimal, and that in this case the true cost is that of doing the release. Heh. I think looks can be deceiving sometimes. I'm not sure I'm willing to hold the responsibility of those commits right now. If someone else is, that's great ... but I don't find them minimal enough for my taste I suppose ;) Depends on what areas you feel comfortable with I guess. As to release discussions involving maven: if the next release were in a couple of months and nothing had been contributed to make maven better, why would it even need to be discussed. The last decision could still stand. I think it is the long time between releases that bring up the same intensity on the maven discussion. Heh - I wish things were that simple. 
- Mark Miller lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Reopened: (SOLR-1711) Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
[ https://issues.apache.org/jira/browse/SOLR-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reopened SOLR-1711: Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java -- Key: SOLR-1711 URL: https://issues.apache.org/jira/browse/SOLR-1711 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 1.4, 1.5 Reporter: Attila Babo Assignee: Yonik Seeley Priority: Critical Fix For: 1.4.1, 1.5, 3.1, 4.0 Attachments: StreamingUpdateSolrServer.patch Original Estimate: 1h Remaining Estimate: 1h -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)
On Tue, Feb 15, 2011 at 1:33 PM, Mark Miller markrmil...@gmail.com wrote: It appears to me, that the effort to commit the contributions are minimal, and that in this case the true cost is that of doing the release. Heh. I think looks can be deceiving sometimes. I'm not sure I'm willing to hold the responsibility of those commits right now. If someone else is, that's great ... but I don't find them minimal enough for my taste I suppose ;) Depends on what areas you feel comfortable with I guess. Right, this is why some features with functional patches are sitting targeted at 3.2 instead of 3.1. Is it possible that we could put distributed date faceting (SOLR-1709), better cjk handling out of box (LUCENE-2906), and a better default merge policy (LUCENE-854) all in 3.1 right now? Sure it is. But is this the best decision... I don't think it is. I think as far as 3.1 goes we already have a great set of features that have baked for some time, including some rather serious performance improvements (Mike and I have done some benchmarking against 3.0)... and it's already going to be a more challenging release since it's the first one since we merged lucene and solr. For these newer features, it's not that we are lazy... it's that sometimes you want more tests, want things to bake for a while with Hudson's random testing, perhaps want some reviews/second pairs of eyes on the code, or maybe even just some more time to think about the change before committing to it. When we commit it and release it, we are signing up for some degree of support in the future. Also, personally I think it's better to put out a good release with solid code and a few less features, than a more buggy release that has a couple of extra features. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-1711) Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
[ https://issues.apache.org/jira/browse/SOLR-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-1711: --- Attachment: SOLR-1711.patch Here's a patch that uses offer instead of put in a retry loop. Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java -- Key: SOLR-1711 URL: https://issues.apache.org/jira/browse/SOLR-1711 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 1.4, 1.5 Reporter: Attila Babo Assignee: Yonik Seeley Priority: Critical Fix For: 1.4.1, 1.5, 3.1, 4.0 Attachments: SOLR-1711.patch, StreamingUpdateSolrServer.patch Original Estimate: 1h Remaining Estimate: 1h -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
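The approach Yonik describes - offer with a timeout in a retry loop instead of an unbounded put - lets the producer periodically regain control and notice that no worker is left, instead of blocking forever. A minimal sketch of that idea, with a hypothetical liveness check and recovery step standing in for the real patch's logic:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class OfferRetryDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        queue.put("a");
        queue.put("b"); // queue is now full and no worker is draining it

        boolean workerAlive = false; // hypothetical liveness check
        String req = "c";
        boolean queued = false;
        // offer() with a timeout returns instead of blocking forever,
        // giving the producer a chance to recover between attempts.
        for (int attempt = 0; attempt < 3 && !queued; attempt++) {
            queued = queue.offer(req, 100, TimeUnit.MILLISECONDS);
            if (!queued && !workerAlive) {
                // Real fix would restart a Runner here; for the demo we
                // just drain one element so the loop can terminate.
                queue.poll();
            }
        }
        System.out.println("queued=" + queued);
    }
}
```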
[jira] Updated: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work
[ https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2348: --- Attachment: SOLR-2348.patch patch with needed functionality, breaks some tests (most likely tests abusing multiValued field types)... {noformat} hossman@bester:~/lucene/dev/solr$ grep -L "Failures: 0, Errors: 0" build/test-results/TEST-org.apache.solr.* build/test-results/TEST-org.apache.solr.schema.PolyFieldTest.txt build/test-results/TEST-org.apache.solr.search.function.distance.DistanceFunctionTest.txt build/test-results/TEST-org.apache.solr.search.function.SortByFunctionTest.txt build/test-results/TEST-org.apache.solr.search.QueryParsingTest.txt build/test-results/TEST-org.apache.solr.search.SpatialFilterTest.txt build/test-results/TEST-org.apache.solr.search.TestIndexSearcher.txt build/test-results/TEST-org.apache.solr.search.TestQueryTypes.txt {noformat} No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work --- Key: SOLR-2348 URL: https://issues.apache.org/jira/browse/SOLR-2348 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 3.1, 4.0 Attachments: SOLR-2348.patch For the same reasons outlined in SOLR-2339, Solr FieldTypes that return FieldCached backed ValueSources should explicitly check for situations where it knows the FieldCache is meaningless. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work
[ https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2348: --- Fix Version/s: (was: 3.2) 3.1 i'm actively working on this today .. moving back in line for 3.1 No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work --- Key: SOLR-2348 URL: https://issues.apache.org/jira/browse/SOLR-2348 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 3.1, 4.0 Attachments: SOLR-2348.patch -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)
On 02/15/2011 02:07 PM, Robert Muir wrote: On Tue, Feb 15, 2011 at 1:33 PM, Mark Millermarkrmil...@gmail.com wrote: It appears to me, that the effort to commit the contributions are minimal, and that in this case the true cost is that of doing the release. Heh. I think looks can be deceiving sometimes. I'm not sure I'm willing to hold the responsibility of those commits right now. If someone else is, that's great ... but I don't find them minimal enough for my taste I suppose ;) Depends on what areas you feel comfortable with I guess. Right, this is why some features with functional patches are sitting targeted at 3.2 instead of 3.1. Is it possible that we could put distributed date faceting (SOLR-1709), better cjk handling out of box (LUCENE-2906), and a better default merge policy (LUCENE-854) all in 3.1 right now? sure it is. But is this the best decision... I don't think it is. Nor do I. I'm fine with the freeze. I think as far as 3.1 goes we already have a great set of features that have baked for some time, including some rather serious performance improvements (Mike and I have done some benchmarking against 3.0)... and its already going to be a more challenging release since its the first one since we merged lucene and solr. For these newer features, its not that we are lazy... I did not mean to suggest that anyone is lazy. Far from it, the effort that goes into this project is impressive. its that sometimes you want more tests, want things to bake for a while with hudson's random testing, perhaps want some reviews/second pairs of eyes on the code, or maybe even just some more time to think about the change before committing to it. I have a personal interest in LUCENE-2906. If there is anything I can do to help it along, I'll be glad to do that. I'll take it up on that issue. When we commit it and release it, we are signing up for some degree of support in the future. 
Also, personally I think its better to put out a good release with solid code and a few less features, than a more buggy release that has a couple of extra features. As I said, I'm happy with 3.1 being frozen. This release is much more timely. :) In the past, I saw releases being repeatedly pushed out to get one last thing in. (Maybe it just appeared that way to me.) -- DM - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)
On Tue, Feb 15, 2011 at 2:34 PM, DM Smith dmsmith...@gmail.com wrote: I have a personal interest in LUCENE-2906. If there is anything I can do to help it along, I'll be glad to do that. I'll take it up on that issue. thanks DM, I know I promised to update the patch after solving the subtask, and haven't yet done this. I'll try to do this tonight. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2272) Join
[ https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-2272: --- Attachment: SOLR-2272.patch bq. However, it doesn't apply on current trunk any more. Here's a refresh. Join Key: SOLR-2272 URL: https://issues.apache.org/jira/browse/SOLR-2272 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Fix For: 4.0 Attachments: SOLR-2272.patch, SOLR-2272.patch Limited join functionality for Solr, mapping one set of IDs matching a query to another set of IDs, based on the indexed tokens of the fields. Example: fq={!join from=parent_ptr to=parent_id}child_doc:query -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2105) RequestHandler param update.processor is confusing
[ https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994983#comment-12994983 ] Ryan McKinley commented on SOLR-2105: - +1 RequestHandler param update.processor is confusing -- Key: SOLR-2105 URL: https://issues.apache.org/jira/browse/SOLR-2105 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4.1 Reporter: Jan Høydahl Priority: Minor Attachments: SOLR-2105.patch -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2272) Join
[ https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994984#comment-12994984 ] Bojan Smid commented on SOLR-2272: -- Great, thx a lot Yonik :). Join Key: SOLR-2272 URL: https://issues.apache.org/jira/browse/SOLR-2272 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Fix For: 4.0 Attachments: SOLR-2272.patch, SOLR-2272.patch -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Any contribs available for Range field type?
solr-dev is the old list; it's now just dev. The old one forwards to the new list though. ~ David From: mike anderson [mailto:saidthero...@gmail.com] Sent: Tuesday, February 15, 2011 10:51 AM To: solr-...@lucene.apache.org Cc: ken.fos...@realestate.com Subject: Fwd: Any contribs available for Range field type? -- Forwarded message -- From: kenf_nc ken.fos...@realestate.com Date: Tue, Feb 15, 2011 at 10:49 AM Subject: Re: Any contribs available for Range field type? To: solr-u...@lucene.apache.org I've tried several times to get an active account on solr-...@lucene.apache.org and the mailing list won't send me a confirmation email, and therefore won't let me post because I'm not confirmed. Could I get someone that is a member of Solr-Dev to post either my original request in this thread, or a link to this thread on the Dev mailing list? I really was hoping for more response than this to this question. This would be a terrifically useful field type to just about any solr index. Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2502203.html Sent from the Solr - User mailing list archive at Nabble.com.
[jira] Resolved: (SOLR-1711) Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
[ https://issues.apache.org/jira/browse/SOLR-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-1711. Resolution: Fixed Fix Version/s: (was: 1.4.1) (was: 1.5) Committed the latest patch - hopefully that finally fixes this issue! Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java -- Key: SOLR-1711 URL: https://issues.apache.org/jira/browse/SOLR-1711 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 1.4, 1.5 Reporter: Attila Babo Assignee: Yonik Seeley Priority: Critical Fix For: 3.1, 4.0 Attachments: SOLR-1711.patch, StreamingUpdateSolrServer.patch Original Estimate: 1h Remaining Estimate: 1h -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Bell updated SOLR-2155: Attachment: SOLR.2155.p3tests.patch Test cases for geomultidist() function. Add this and SOLR.2155.p3.patch Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document, and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4, depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. 
-- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
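For readers unfamiliar with geohashes, the prefix property David's filter relies on can be shown with a minimal standalone encoder (this is an illustrative sketch, not the patch's GeoHashUtils): bits of longitude and latitude are interleaved by repeated range bisection and emitted in base32, so each added character subdivides the parent cell and every point inside a cell shares that cell's geohash as a prefix. The coordinates below are the well-known (42.605, -5.603) -> ezs42 example.

```java
public class GeoHashDemo {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        double[] latRange = {-90, 90}, lonRange = {-180, 180};
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // even-numbered bits encode longitude
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            double[] range = evenBit ? lonRange : latRange;
            double v = evenBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            if (v >= mid) { ch = (ch << 1) | 1; range[0] = mid; } // upper half
            else          { ch = ch << 1;       range[1] = mid; } // lower half
            evenBit = !evenBit;
            if (++bit == 5) { hash.append(BASE32.charAt(ch)); bit = 0; ch = 0; }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        String h = encode(42.605, -5.603, 5);
        System.out.println(h);
        // A filter seeking to prefix "ezs4" matches every term in that cell:
        System.out.println(h.startsWith("ezs4"));
    }
}
```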
[jira] Updated: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work
[ https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2348: --- Attachment: SOLR-2348.patch Updated patch that fixes the test failures. For the most part, this is fairly straightforward: tests that were abusing multiValued fields as if they were single valued. The one situation where i made a genuine code change was in AbstractSubTypeFieldType and the way it deals with the subFieldType attribute. When it's used, the registerPolyFieldDynamicPrototype function registers a new dynamic field based on the specified fieldType instance. I updated the properties used to generate these dynamicFields so that it explicitly specifies multiValued=false (it was already specifying indexed=true and stored=false). I could have just updated the test schemas so that the fieldType specified was already multiValued, but i think this makes more sense from a functional standpoint. the existing code already enabled a use case like this... {noformat} <fieldType name="double" class="solr.TrieDoubleField" indexed="false" multiValued="false" ... /> <fieldType name="xy" class="solr.PointType" dimension="2" subFieldType="double"/> {noformat} ...so it makes sense that this should work equally well automatically... {noformat} <fieldType name="double" class="solr.TrieDoubleField" indexed="true" multiValued="true" ... /> <fieldType name="xy" class="solr.PointType" dimension="2" subFieldType="double"/> {noformat} No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work --- Key: SOLR-2348 URL: https://issues.apache.org/jira/browse/SOLR-2348 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 3.1, 4.0 Attachments: SOLR-2348.patch, SOLR-2348.patch -- This message is automatically generated by JIRA. 
- For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
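The fix described above amounts to pinning one flag when the dynamic sub-field's properties are derived from the sub-field type: whatever the type says, the registered sub-field must be single-valued. A minimal sketch of that property-merging idea, with hypothetical names (this is not Solr's actual AbstractSubTypeFieldType code):

```java
// Hypothetical sketch: derive dynamic sub-field properties from the
// subFieldType's own defaults, then pin the flags the poly field requires
// (indexed on, stored off, single-valued). Names are illustrative only.
import java.util.EnumSet;

public class SubFieldProps {
    enum Prop { INDEXED, STORED, MULTI_VALUED }

    static EnumSet<Prop> forSubField(EnumSet<Prop> typeDefaults) {
        EnumSet<Prop> props = EnumSet.copyOf(typeDefaults);
        props.add(Prop.INDEXED);            // was already forced on
        props.remove(Prop.STORED);          // was already forced off
        props.remove(Prop.MULTI_VALUED);    // the new explicit override
        return props;
    }

    public static void main(String[] args) {
        // Even a multiValued, stored sub-field type yields a
        // single-valued, unstored dynamic sub-field.
        EnumSet<Prop> fromType = EnumSet.of(Prop.MULTI_VALUED, Prop.STORED);
        EnumSet<Prop> actual = forSubField(fromType);
        System.out.println(actual.contains(Prop.MULTI_VALUED)); // false
        System.out.println(actual.contains(Prop.INDEXED));      // true
    }
}
```

This mirrors why the second schema snippet above (a multiValued=true sub-field type) can still work: the poly field's registration, not the type, decides the sub-field's cardinality.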
[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec
[ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hao yan updated LUCENE-2903: Attachment: LUCENE-2903.patch This new patch provides PForDeltaFixedIntBlockWithIntBufferCodec (PatchedFrameOfRef4), which improves on the performance of its previous counterparts (PatchedFrameOfRef4, 5, and 6). Note that this PatchedFrameOfRef4 is different from the previous PatchedFrameOfRef4. Improvement of PForDelta Codec -- Key: LUCENE-2903 URL: https://issues.apache.org/jira/browse/LUCENE-2903 Project: Lucene - Java Issue Type: Improvement Reporter: hao yan Attachments: LUCENE-2903.patch, LUCENE-2903.patch, LUCENE_2903.patch, LUCENE_2903.patch There are 3 versions of PForDelta implementations in the Bulk Branch: FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2. FrameOfRef is a very basic one, essentially a plain binary encoding (which may result in a huge index size). PatchedFrameOfRef is the implementation based on the original version of PForDelta in the literature. PatchedFrameOfRef2 is my previous implementation, which is improved this time. (The codec name is changed to NewPForDelta.) In particular, the changes are: 1. I fixed the bug in my previous version (in Lucene-1410.patch), where the old PForDelta did not support very large exceptions (since Simple16 does not support very large numbers). This has now been fixed in the new LCPForDelta. 2. I changed the PForDeltaFixedIntBlockCodec. It is now faster than the other two PForDelta implementations in the bulk branch (FrameOfRef and PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the CodecProvider and PForDeltaFixedIntBlockCodec. 3. The performance test results are: 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef for almost all kinds of queries, and slightly worse than BulkVInt. 2) My NewPForDelta codec results in the smallest index size among all 4 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself). 3) All performance test results were obtained by running with -server instead of -client.
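The frame-of-reference family of codecs discussed above shares one core idea: delta-encode the sorted doc IDs of a posting list, then store every delta of a block in a single fixed bit width. A minimal sketch of that basic idea (illustrative names, not the patch's actual code; the PForDelta variants additionally "patch" exception values so one large delta does not inflate the bit width of the whole block):

```java
// Minimal frame-of-reference (FOR) sketch: delta-encode a sorted list of
// doc IDs and record the fixed bit width a real codec would pack them at.
// Deltas are left unpacked here to keep the sketch short.
public class ForSketch {
    static int bitsNeeded(int v) {
        return v == 0 ? 1 : 32 - Integer.numberOfLeadingZeros(v);
    }

    // Encode as [bitWidth, delta0, delta1, ...]: one width for the frame.
    static int[] encode(int[] sortedDocs) {
        int[] out = new int[sortedDocs.length + 1];
        int prev = 0, maxBits = 1;
        for (int i = 0; i < sortedDocs.length; i++) {
            int delta = sortedDocs[i] - prev;
            prev = sortedDocs[i];
            out[i + 1] = delta;
            maxBits = Math.max(maxBits, bitsNeeded(delta));
        }
        out[0] = maxBits;
        return out;
    }

    // Decode by accumulating the deltas back into absolute doc IDs.
    static int[] decode(int[] frame) {
        int[] docs = new int[frame.length - 1];
        int acc = 0;
        for (int i = 0; i < docs.length; i++) {
            acc += frame[i + 1];
            docs[i] = acc;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 8, 42, 120};
        System.out.println(java.util.Arrays.equals(decode(encode(docs)), docs)); // true
    }
}
```

In this sketch a single outlier (say, one delta of 78) forces 7 bits on every entry; the "patched" variants keep a small width for the common case and store such outliers separately, which is exactly the exception handling the Simple16 bug above concerned.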
[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec
[ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hao yan updated LUCENE-2903: Attachment: LUCENE-2903.patch This patch improves the performance of the previous PatchedFrameOfRef4 and removes PatchedFrameOfRef5 and PatchedFrameOfRef6. The performance of PatchedFrameOfRef4 is now better than BulkVInt and comparable to PatchedFrameOfRef in my tests. Improvement of PForDelta Codec -- Key: LUCENE-2903 URL: https://issues.apache.org/jira/browse/LUCENE-2903 Project: Lucene - Java Issue Type: Improvement Reporter: hao yan Attachments: LUCENE-2903.patch
[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec
[ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hao yan updated LUCENE-2903: Attachment: (was: LUCENE_2903.patch) Improvement of PForDelta Codec -- Key: LUCENE-2903 URL: https://issues.apache.org/jira/browse/LUCENE-2903 Project: Lucene - Java Issue Type: Improvement Reporter: hao yan Attachments: LUCENE-2903.patch
[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec
[ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hao yan updated LUCENE-2903: Attachment: (was: LUCENE-2903.patch) Improvement of PForDelta Codec -- Key: LUCENE-2903 URL: https://issues.apache.org/jira/browse/LUCENE-2903 Project: Lucene - Java Issue Type: Improvement Reporter: hao yan Attachments: LUCENE-2903.patch
[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec
[ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hao yan updated LUCENE-2903: Attachment: (was: LUCENE-2903.patch) Improvement of PForDelta Codec -- Key: LUCENE-2903 URL: https://issues.apache.org/jira/browse/LUCENE-2903 Project: Lucene - Java Issue Type: Improvement Reporter: hao yan Attachments: LUCENE-2903.patch
[jira] Created: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be
<lib dir=.../> directives are logging serious errors when they should not be -- Key: SOLR-2364 URL: https://issues.apache.org/jira/browse/SOLR-2364 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 3.1, 4.0 The {{<lib dir="foo" ... />}} syntax for solrconfig.xml was specifically designed so that it would *not* log errors if the directory (or the jars in that directory) didn't exist -- this was done to make it possible to have a {{<lib/>}} directive that optionally includes jars if they are there and ignores them if they can't be found ({{<lib path="foo/bar.jar" .../>}} can be used when you have an explicit jar you want to load and you want an error if it's not there). At some point in the not too distant past, something changed on both the 3x and trunk branches in how SolrResourceLoader.replaceClassLoader works, such that in the example you get errors logged like this...
{noformat}
Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader addToClassLoader
SEVERE: Can't find (or read) file to add to classloader: /total/crap/dir/ignored
{noformat}
This is in spite of the fact that the solrconfig.xml says...
{noformat}
<!-- If a dir option (with or without a regex) is used and nothing is found that matches, it will be ignored -->
<lib dir="../../contrib/clustering/lib/downloads/" />
<lib dir="../../contrib/clustering/lib/" />
<lib dir="/total/crap/dir/ignored" />
{noformat}
Note these errors are also logged when running the example, even though there are no {{<lib/>}} declarations that correspond to them -- they seem to be errors coming from the default behavior of looking for $solr_home/lib (which is evidently happening twice?)...
{noformat}
Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/'
Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader addToClassLoader
SEVERE: Can't find (or read) file to add to classloader: solr/./lib
Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/./'
Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader addToClassLoader
SEVERE: Can't find (or read) file to add to classloader: solr/././lib
{noformat}
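The intended contract this issue describes can be summed up as: a {{<lib dir=...>}} directive is best-effort and stays silent when nothing matches, while an explicit {{<lib path=...>}} should fail loudly. A sketch of that distinction, with illustrative names (this is not the actual SolrResourceLoader API):

```java
// Hypothetical sketch of the intended <lib> semantics: dir="..." is
// optional and silently skipped when absent; path="..." names one
// explicit jar and complains if it is missing. Illustrative names only.
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LibDirectives {
    // <lib dir="..."/>: collect jars if the directory exists,
    // otherwise return an empty list without logging an error.
    static List<File> resolveDir(String dir) {
        File d = new File(dir);
        List<File> jars = new ArrayList<>();
        if (!d.isDirectory()) {
            return jars;                 // optional: missing dir is ignored
        }
        for (File f : d.listFiles()) {
            if (f.getName().endsWith(".jar")) jars.add(f);
        }
        return jars;
    }

    // <lib path="..."/>: the jar was asked for by name, so its
    // absence is a genuine configuration error.
    static File resolvePath(String path) {
        File f = new File(path);
        if (!f.isFile()) {
            throw new IllegalArgumentException("Can't find lib: " + path);
        }
        return f;
    }

    public static void main(String[] args) {
        // No SEVERE log, no exception: the dir simply contributes nothing.
        System.out.println(resolveDir("/total/crap/dir/ignored").isEmpty());
    }
}
```

Under this reading, the Feb 11 commits effectively moved the dir="..." case from the first method's behavior to the second's, which is why Hoss treats it as a compatibility break rather than a logging nicety.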
[jira] Commented: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments
[ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995098#comment-12995098 ] Jon Druse commented on LUCENE-1824: --- Has this had any progress? I'm dealing with the same issues. Or is there a workaround? Thanks! FastVectorHighlighter truncates words at beginning and end of fragments --- Key: LUCENE-1824 URL: https://issues.apache.org/jira/browse/LUCENE-1824 Project: Lucene - Java Issue Type: Improvement Components: contrib/highlighter Environment: any Reporter: Alex Vigdor Priority: Minor Fix For: 4.0 Attachments: LUCENE-1824.patch FastVectorHighlighter does not take word boundaries into consideration when building fragments, so that in most cases the first and last word of a fragment are truncated. This makes the highlights less legible than they should be. I will attach a patch to BaseFragmentBuilder that resolves this by expanding the start and end boundaries of the fragment to the first whitespace character on either side of the fragment, or the beginning or end of the source text, whichever comes first. This significantly improves legibility, at the cost of returning a slightly larger number of characters than specified for the fragment size. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be
[ https://issues.apache.org/jira/browse/SOLR-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995102#comment-12995102 ] Koji Sekiguchi commented on SOLR-2364: -- Ah, sorry. I've committed the change. http://svn.apache.org/viewvc?view=revision&revision=1069656 http://svn.apache.org/viewvc?view=revision&revision=1069657 I didn't know the background. I'll see now if I can revert it... <lib dir=.../> directives are logging serious errors when they should not be -- Key: SOLR-2364 URL: https://issues.apache.org/jira/browse/SOLR-2364 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 3.1, 4.0
[jira] Commented: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be
[ https://issues.apache.org/jira/browse/SOLR-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995103#comment-12995103 ] Hoss Man commented on SOLR-2364: This seems to have been caused by the following commits on Feb 11... http://svn.apache.org/viewvc?view=revision&revision=1069656 http://svn.apache.org/viewvc?view=revision&revision=1069657 ...which Koji attributed to SOLR-1449, even though that issue (which added the {{<lib/>}} feature) was resolved back in 2009 and was included in Solr 1.4.1. I really don't know why Koji did that ... as far as I'm concerned this is a break in compatibility: the whole point of how these directives were set up was to support the possibility of directories not existing (and the examples documented them as working that way). Unless I hear a strong reason to the contrary, I plan to revert those commits. <lib dir=.../> directives are logging serious errors when they should not be -- Key: SOLR-2364 URL: https://issues.apache.org/jira/browse/SOLR-2364 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 3.1, 4.0
[jira] Assigned: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be
[ https://issues.apache.org/jira/browse/SOLR-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reassigned SOLR-2364: -- Assignee: Koji Sekiguchi (was: Hoss Man) <lib dir=.../> directives are logging serious errors when they should not be -- Key: SOLR-2364 URL: https://issues.apache.org/jira/browse/SOLR-2364 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Koji Sekiguchi Fix For: 3.1, 4.0
[jira] Commented: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be
[ https://issues.apache.org/jira/browse/SOLR-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995105#comment-12995105 ] Hoss Man commented on SOLR-2364: Koji: thanks. FWIW: attributing a commit to an issue that was resolved two years ago doesn't seem like a good idea in any situation -- filing a new bug to track the change (whether you considered it a bug or an improvement) would have made this more noticeable. If you think we should have an option to control whether it complains or not when trying to load libs out of a dir, I'm open to suggestions -- but let's track that as a new issue. <lib dir=.../> directives are logging serious errors when they should not be -- Key: SOLR-2364 URL: https://issues.apache.org/jira/browse/SOLR-2364 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 3.1, 4.0
Re: wind down for 3.1?
: 1. javadocs warnings/errors: this is a constant battle, its worth
: considering if the build should actually fail if you get one of these,
: in my opinion if we can do this we really should. its frustrating to

for a brief period we did, and then we rolled it back...
https://issues.apache.org/jira/browse/LUCENE-875

: 2. introducing new compiler warnings: another problem just being left
: for someone else to clean up later, another constant losing battle.
: 99% of the time (for non-autogenerated code) the warnings are
: useful... in my opinion we should not commit patches that create new
: warnings.

it's hard to spot new compiler warnings when there are already so many ... if we can get down to 0 then we can add hacks to make the build fail if someone adds 1, but until then we have an uphill battle.

-Hoss
[jira] Resolved: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be
[ https://issues.apache.org/jira/browse/SOLR-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-2364. -- Resolution: Fixed The reverts were committed: trunk r1071121, 3x r1071122. Thanks, Hoss, for taking the time on this issue! <lib dir=.../> directives are logging serious errors when they should not be -- Key: SOLR-2364 URL: https://issues.apache.org/jira/browse/SOLR-2364 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Koji Sekiguchi Fix For: 3.1, 4.0
[jira] Assigned: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments
[ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned LUCENE-1824: -- Assignee: Koji Sekiguchi FastVectorHighlighter truncates words at beginning and end of fragments --- Key: LUCENE-1824 URL: https://issues.apache.org/jira/browse/LUCENE-1824 Project: Lucene - Java Issue Type: Improvement Components: contrib/highlighter Reporter: Alex Vigdor Assignee: Koji Sekiguchi Priority: Minor Fix For: 4.0 Attachments: LUCENE-1824.patch
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995125#comment-12995125 ] Hoss Man commented on SOLR-1553: bq. I'll keep the issue open in 3.1 for a few more days as discussed, then i'm moving it out. it would be less confusing to just resolve it as fixed, and open new issues to track the outstanding problems/bugs/questions. extended dismax query parser Key: SOLR-1553 URL: https://issues.apache.org/jira/browse/SOLR-1553 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Assignee: Yonik Seeley Fix For: 1.5, 3.1, 4.0 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, edismax.userFields.patch An improved user-facing query parser based on dismax -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Any contribs available for Range field type?
I did a similar thing at Kaango.com (classified system). The idea that I used was to use dynamic fields based on type, and load them into SOLR. For example:

Autos:
s_auto_make - String
s_auto_model - String
l_auto_year - Long

Real Estate:
l_real_estate_bedrooms - Long
l_real_estate_baths - Long

You get the idea. I created these by using the DIH handler and adding a script at the top of the file that would take the field from the database, and rename it based on what it was. Then I would load it into SOLR as a dynamic field. Then for the facets, I would configure the name of the dynamic fields that need to be pulled (with facet.field or query). For ranges: facet.query=l_years_of_service:[1999 TO 2004] or [1999 TO *]. That is how I solved a similar problem (if I understand the issue). Bill

From: mike anderson saidthero...@gmail.com Reply-To: dev@lucene.apache.org Date: Tue, 15 Feb 2011 10:51:43 -0500 To: solr-...@lucene.apache.org Cc: ken.fos...@realestate.com Subject: Fwd: Any contribs available for Range field type?

-- Forwarded message -- From: kenf_nc ken.fos...@realestate.com Date: Fri, Feb 11, 2011 at 8:49 AM Subject: Any contribs available for Range field type? To: solr-u...@lucene.apache.org

I have a huge need for a new field type. It would be a Poly field, similar to Point or Payload. It would take 2 data elements and a search would return a hit if the search term fell within the range of the elements. For example, let's say I have a document representing an Employment record. I may want to create a field for years_of_service where it would take values 1999,2004. Then in a query q=years_of_service:2001 would be a hit, q=years_of_service:2010 would not. The field would need to take a data type attribute as a parameter. I may need to do integer ranges, float/double ranges, date ranges. I don't see the need now, but heck maybe even a string range. This would be useful for things like Event dates. 
An event often spans several days (or hours), but the query is something like what events are happening today. If I did q=event_date:NOW (or similar) it should hit all documents where event_date has a range that is inclusive of today. Another example would be a product category document. A specific automobile may have a fixed price, but a category of auto (2010 BMW 3-series for example) would have a price range. I hope you get the point. My question (finally) is, does anyone know of an existing contribution to the public domain that already does this? I'm more of a .Net/C# developer than a Java developer. I know my way around Java, but don't really have the right tools to build/test/etc. So I was hoping to borrow rather than build if I could. Thanks, Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2473601.html Sent from the Solr - User mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
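For the within-range query Ken asks about, a common workaround (not something from this thread's patches; the l_<name>_min / l_<name>_max field names are made up for this example) is to index the two range endpoints as separate single-valued fields and intersect two open-ended range queries:

```java
// Sketch of a within-range query built from two endpoint fields.
// The l_<name>_min / l_<name>_max field names are hypothetical.
public class RangeQueryBuilder {

    // Matches documents whose stored [min, max] range contains `value`:
    // min <= value AND max >= value.
    public static String containsPoint(String field, long value) {
        return "l_" + field + "_min:[* TO " + value + "] AND "
             + "l_" + field + "_max:[" + value + " TO *]";
    }

    public static void main(String[] args) {
        // A 1999-2004 years_of_service range matches 2001 but not 2010.
        System.out.println(containsPoint("years_of_service", 2001));
    }
}
```

A document indexed with min 1999 and max 2004 then matches the generated query for 2001 but not the one for 2010, mirroring Ken's example.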
Re: MultiValued FC question
I'll ask another way. If I use termsIndex, there does not appear to be a way to get a list of terms for a document and field easily using ValueSource. Is that right? If not, how would I go about getting a list of terms for a field in a document? When I do fq=ids:56 it appears to work on multivalued fields. How does it work? Thanks.

From: Bill Bell billnb...@gmail.com Reply-To: dev@lucene.apache.org Date: Sun, 13 Feb 2011 02:31:57 -0700 To: dev@lucene.apache.org Subject: MultiValued FC question

(I posted on solr-user by mistake) I am working on https://issues.apache.org/jira/browse/SOLR-2155 Trying to get a list of multiValued fields from the cache:

{noformat}
ValueSource vs = sf.getType().getValueSource(sf, fp);
DocValues llVals = vs.getValues(context, reader);
org.apache.lucene.spatial.geohash.GeoHashUtils.decode(llVals.strVal(doc));

public String strVal(int doc) {
  int ord = termsIndex.getOrd(doc);
  if (ord == 0) {
    return null;
  } else {
    return termsIndex.lookup(ord, new BytesRef()).utf8ToString();
  }
}
{noformat}

I figure the problem is that lookup only returns one term; I need more than one. I thought ./lucene/src/java/org/apache/lucene/document/Document.java would help me, but it didn't much. Would I want to call getFieldables(name)? Or would that slow down the caching? Thoughts?

1. What is termsIndex? Why does ord() matter?
2. Is there a helper for getting a multiValued field from the FieldCache? The strVal(doc) only returns one of the multiValues.

Thought one of you gurus might know the answer. Thanks. Bill
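Bill's strVal(doc) returns a single term because the term index maps each document to one ord. To get every term of a multivalued field you have to invert the postings yourself (term → docs becomes doc → terms). The sketch below shows the shape of that inversion with plain collections standing in for Lucene's term dictionary and postings; it is not the SOLR-2155 approach and the geohash-like terms are made up:

```java
import java.util.*;

// Toy inversion: term -> sorted doc ids (a stand-in for the term
// dictionary and postings) is turned into doc -> all terms, which is
// what a single ord lookup cannot give you for a multivalued field.
public class MultiValueInvert {

    public static Map<Integer, List<String>> invert(Map<String, int[]> postings) {
        Map<Integer, List<String>> docTerms = new TreeMap<>();
        // Iterate terms in sorted order, as a real term enum would.
        for (Map.Entry<String, int[]> e : new TreeMap<>(postings).entrySet()) {
            for (int doc : e.getValue()) {
                docTerms.computeIfAbsent(doc, d -> new ArrayList<>()).add(e.getKey());
            }
        }
        return docTerms;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("9q8y", new int[]{0, 2}); // geohash-like terms, made up
        postings.put("dr5r", new int[]{0});
        System.out.println(invert(postings).get(0)); // prints "[9q8y, dr5r]"
    }
}
```

With real Lucene APIs the same walk would enumerate the field's terms and their postings instead of a map, but the doc → terms accumulation is identical.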
RE: Please mark distributed date faceting for 3.1
I may have added a test just now, but I and others have been using this [simple] code for some time now. It has baked; it doesn't need more baking IMO. If this patch wasn't the biggest reason not to use distributed search (a key feature) then I wouldn't be here arguing my point. But I've apparently lost this argument already, so I give up... Assign it for 3.2 if that's the best you can do, Rob. It's better than being unassigned, which is what it is now. ~ David From: Robert Muir [rcm...@gmail.com] Sent: Tuesday, February 15, 2011 10:24 AM To: dev@lucene.apache.org Subject: Re: Please mark distributed date faceting for 3.1 On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W. dsmi...@mitre.org wrote: Distributed date faceting now has a patch and is tested: https://issues.apache.org/jira/browse/SOLR-1709 I’m posting to the dev list because I want a committer to mark this for 3.1. I don’t want to assume any of you guys see the comment activity. Thanks very much for adding a test! But, can't we just do this for 3.2 instead? I don't like the idea of rushing features into 3.1 at the last minute because we are nearing a release (0 open lucene issues, 2 open solr ones). Right now the 3.x branch is feature-frozen for 3.1. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Why does DIH jar end up in Solr war?
I noticed that the DIH .jar file ends up in the .war file. It ends up this way because the DIH's build.xml copies it into a place so that it ultimately winds up there. This seems like an odd thing because no other contrib module gets this special treatment. I noticed that the dataimport.jsp has a trivial dependency on the DataImportHandler class for an instanceof check that could be replaced with a string comparison of the class name. With that in place, this JSP won't error out if the DIH is not included. So does someone have a reason? In the absence of a good one, I suggest this needless exception be removed on the basis of consistency. ~ David Smiley - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
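The instanceof-to-string-comparison change David suggests could look roughly like this; the class name in the string is the real DIH handler class, but the surrounding method is an illustrative sketch, not the actual dataimport.jsp code:

```java
// Sketch of replacing a compile-time `instanceof DataImportHandler`
// check with a class-name string comparison, so the page would still
// compile and run when the DIH jar is absent from the war.
public class HandlerCheck {

    public static boolean isDataImportHandler(Object handler) {
        return handler != null && handler.getClass().getName()
            .equals("org.apache.solr.handler.dataimport.DataImportHandler");
    }

    public static void main(String[] args) {
        System.out.println(isDataImportHandler(new Object())); // prints "false"
    }
}
```

The key difference is that the string comparison needs no reference to the DataImportHandler type at compile time, so the JSP has no hard dependency on the DIH jar.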
Re: Why does DIH jar end up in Solr war?
Well, it doesn't really make any sense for dataimport.jsp to be in the WAR file if DIH isn't there (will it really work being loaded, and friends, by SolrResourceLoader)? Erik On Feb 16, 2011, at 00:57 , Smiley, David W. wrote: I noticed that the DIH .jar file ends up in the .war file. It ends up this way because the DIH's build.xml copies it into a place so that it ultimately winds up there. This seems like an odd thing because no other contrib module gets this special treatment. I noticed that the dataimport.jsp has a trivial dependency on the DataImportHandler class for an instanceof check that could be replaced with a string comparison of the class name. With that in place, this JSP won't error out if the DIH is not included. So does someone have a reason? In the absence of a good one, I suggest this needless exception be removed on the basis of consistency. ~ David Smiley - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Why does DIH jar end up in Solr war?
Yes it would be a slight anomaly for this .jsp file (a small text file) to be there but not the jar file. But that feels like a better trade than this contrib module being the only contrib module that has its jar file within Solr's war. I don't see what issue there would be regarding SolrResourceLoader. I tried out what I'm talking about, used example-DIH with the DIH .jar file in the multicore lib directory of that example, and I used the db core fine. I can submit a patch in JIRA if you're agreeable. ~ David From: Erik Hatcher [erik.hatc...@gmail.com] Sent: Wednesday, February 16, 2011 1:06 AM To: dev@lucene.apache.org Subject: Re: Why does DIH jar end up in Solr war? Well, it doesn't really make any sense for dataimport.jsp to be in the WAR file if DIH isn't there (will it really work being loaded, and friends, by SolrResourceLoader)? Erik On Feb 16, 2011, at 00:57 , Smiley, David W. wrote: I noticed that the DIH .jar file ends up in the .war file. It ends up this way because the DIH's build.xml copies it into a place so that it ultimately winds up there. This seems like an odd thing because no other contrib module gets this special treatment. I noticed that the dataimport.jsp has a trivial dependency on the DataImportHandler class for an instanceof check that could be replaced with a string comparison of the class name. With that in place, this JSP won't error out if the DIH is not included. So does someone have a reason? In the absence of a good one, I suggest this needless exception be removed on the basis of consistency. ~ David Smiley - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (SOLR-2365) DIH should not be in the Solr war
DIH should not be in the Solr war - Key: SOLR-2365 URL: https://issues.apache.org/jira/browse/SOLR-2365 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor The DIH has a build.xml that puts itself into the Solr war file. This is the only contrib module that does this, and I don't think it should be this way. Granted there is a small dataimport.jsp file that would be most convenient to remain included, but the jar should not be. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2365) DIH should not be in the Solr war
[ https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-2365: --- Attachment: SOLR-2365_DIH_should_not_be_in_war.patch This patch removes the line in the DIH build.xml that includes its jar file into the war. It makes a simple fix to dataimport.jsp so that it does not have a compile-time dependency on the DIH. And in example-DIH, adds some dih jar file references via lib directives. DIH should not be in the Solr war - Key: SOLR-2365 URL: https://issues.apache.org/jira/browse/SOLR-2365 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Attachments: SOLR-2365_DIH_should_not_be_in_war.patch The DIH has a build.xml that puts itself into the Solr war file. This is the only contrib module that does this, and I don't think it should be this way. Granted there is a small dataimport.jsp file that would be most convenient to remain included, but the jar should not be. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2365) DIH should not be in the Solr war
[ https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995198#comment-12995198 ] Erik Hatcher commented on SOLR-2365: Since DIH worked out of the box with Solr 1.4.x, we probably want to keep it that way moving forward (for now). We should put the lib directive into Solr's main example solrconfig.xml (just as we do with clustering, Solr Cell, etc.) also. Other than that, no objections to this. [tangent, but ideally we can eventually get all Solr UI to be Velocity-generated, and plugins can then ship with their own .vm files in the JAR file to add in something like a dataimport.jsp] DIH should not be in the Solr war - Key: SOLR-2365 URL: https://issues.apache.org/jira/browse/SOLR-2365 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Attachments: SOLR-2365_DIH_should_not_be_in_war.patch The DIH has a build.xml that puts itself into the Solr war file. This is the only contrib module that does this, and I don't think it should be this way. Granted there is a small dataimport.jsp file that would be most convenient to remain included, but the jar should not be. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2881) Track FieldInfo per segment instead of per-IW-session
[ https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-2881: -- Attachment: lucene-2881.patch I fixed a bug in FieldInfos that could lead to wrong field numbers, which might have been related to the wrong behavior you're seeing, Simon. About codecIds: I made the fix to FieldInfo.clone() to set the codecId on the clone. I also made FieldInfo.codecId private and added a getter and setter. The setter checks whether the new value for codecId is different from the previous one, and throws an exception in that case (unless it was set to the default 0 before, which I think means Preflex codec). All tests pass. Please let me know if that fixes your problem. If not, then you should at least see the new exception that I added, which might make debugging easier. Track FieldInfo per segment instead of per-IW-session - Key: LUCENE-2881 URL: https://issues.apache.org/jira/browse/LUCENE-2881 Project: Lucene - Java Issue Type: Improvement Affects Versions: Realtime Branch, CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Michael Busch Fix For: Realtime Branch, CSF branch, 4.0 Attachments: lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch Currently FieldInfo is tracked per IW session to guarantee consistent global field naming / ordering. IW carries FI instances over from previous segments, which also carries over field properties like isIndexed etc. While having consistent field ordering per IW session appears to be important due to bulk merging of stored fields etc., carrying over other properties might become problematic with Lucene's Codec support. Codecs that rely on consistent properties in FI will fail if FI properties are carried over. The DocValuesCodec (DocValuesBranch) for instance writes files per segment and field (using the field id within the file name). 
Yet, if a particular segment has no DocValues indexed but a previous segment in the same IW session had DocValues, FieldInfo#docValues will be true, since those values are reused from previous segments. We already work around this limitation in SegmentInfo with properties like hasVectors or hasProx, which is really something we should manage per codec and segment. Ideally FieldInfo would be managed per segment and codec such that its properties are valid per segment. It also seems to be necessary to bind FieldInfos to SegmentInfo logically, since it's really just per-segment metadata. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
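The guarded codecId setter Michael describes might look like the following sketch; the names and the choice of 0 as the default are assumptions taken from his comment, not the actual lucene-2881.patch:

```java
// Sketch of a guarded codecId setter as described in the comment.
// The value 0 is assumed here to mean "not yet assigned".
public class FieldInfoSketch {
    private int codecId = 0;

    public int getCodecId() {
        return codecId;
    }

    // Changing an already-assigned codec id would indicate a bug
    // (e.g. a FieldInfo carried across codecs), so throw instead.
    public void setCodecId(int codecId) {
        if (this.codecId != 0 && this.codecId != codecId) {
            throw new IllegalStateException("codec id already set to "
                + this.codecId + "; cannot change to " + codecId);
        }
        this.codecId = codecId;
    }

    public static void main(String[] args) {
        FieldInfoSketch fi = new FieldInfoSketch();
        fi.setCodecId(2); // allowed: still at the default
        fi.setCodecId(2); // allowed: same value, no change
        try {
            fi.setCodecId(3); // changing an assigned id throws
        } catch (IllegalStateException expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```

The point of the guard is exactly the debugging aid mentioned in the comment: instead of silently producing wrong field/codec associations, the mismatch surfaces as an exception at the assignment site.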