RE: duplicate records in index
You are adding the same doc twice. (See how you add acttime: writer.AddDocument(doc) is called once before the acttime field is added and once after it.)

DIGY

-----Original Message-----
From: Wen Gao [mailto:samuel.gao...@gmail.com]
Sent: Wednesday, February 16, 2011 11:35 AM
To: lucene-net-dev@lucene.apache.org
Subject: duplicate records in index

Hi,

I am creating an index from my database; however, the .cfs files contain duplicate records, e.g.:

    book1, 1, susan, 1
    book1, 1, susan, 1, 03/01/2010
    book2, 2, tom
    book2, 2, tom, 2, 03/02/2010
    ...

I got the data from several tables, and I am sure that the SQL generates only one record. Also, when I debug the code, each record is only added once, so I am confused about why the data is duplicated in the index. I build my index as follows:

    doc.Add(new Lucene.Net.Documents.Field(
        "lmname", readerreader1["lmname"].ToString(),
        //new System.IO.StringReader(readerreader["cname"].ToString()),
        Lucene.Net.Documents.Field.Store.YES,
        Lucene.Net.Documents.Field.Index.TOKENIZED));
    // lmid
    doc.Add(new Lucene.Net.Documents.Field(
        "lmid", readerreader1["lmid"].ToString(),
        Lucene.Net.Documents.Field.Store.YES,
        Lucene.Net.Documents.Field.Index.UN_TOKENIZED));
    // nick name of user
    doc.Add(new Lucene.Net.Documents.Field(
        "nickName", readerreader1["nickName"].ToString(),
        Lucene.Net.Documents.Field.Store.YES,
        Lucene.Net.Documents.Field.Index.UN_TOKENIZED));
    // uid
    doc.Add(new Lucene.Net.Documents.Field(
        "uid", readerreader1["uid"].ToString(),
        Lucene.Net.Documents.Field.Store.YES,
        Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

    writer.AddDocument(doc);

    // acttime
    doc.Add(new Lucene.Net.Documents.Field(
        "acttime", readerreader1["acttime"].ToString(),
        Lucene.Net.Documents.Field.Store.YES,
        Lucene.Net.Documents.Field.Index.UN_TOKENIZED));

    writer.AddDocument(doc);

Any ideas?

Thanks,
Wen Gao
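DIGY's diagnosis can be reproduced with a minimal stand-in (plain Python rather than the Lucene.Net API): an AddDocument-style call appends a record every time it is called, so calling it both before and after the acttime field is added produces two records for one logical document.

```python
# Toy model of an index: an AddDocument-style call simply appends a
# record; Lucene's IndexWriter.AddDocument has no duplicate detection
# either.
index = []

def add_document(index, doc):
    index.append(dict(doc))  # snapshot the fields at the time of the call

doc = {"lmname": "book1", "lmid": "1", "nickName": "susan", "uid": "1"}
add_document(index, doc)        # first call (before acttime): record 1
doc["acttime"] = "03/01/2010"
add_document(index, doc)        # second call: record 2, now with acttime

print(len(index))               # 2 records for one logical document
```

The fix is to add all fields first and call AddDocument exactly once per document.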
[jira] Issue Comment Edited: (LUCENENET-379) Clean up Lucene.Net website
[ https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995283#comment-12995283 ] michael herndon edited comment on LUCENENET-379 at 2/16/11 1:32 PM:

I think anything would be better than the current one (which would look cool if it was cleaned up and put on the side of a Chevelle, but I don't know how it would help brand Lucene.Net). I'd say keep doing a few more variations, and open it up for the public to make some submissions as well (giving credit to whoever's design is chosen, maybe even give them some social media love). The final one needs to work well in both RGB and CMYK color formats and in a scalable graphics format so that it can be resized cleanly. It should also have a visual element that can be turned into a decent 16x16 favicon (like the 3 yellow hexagons in the jpg). Keep basic color theory in mind, though: yellow is irritating on the eyes. It definitely grabs attention, but it's harder on the eyes over an extended period of time; green is the most relaxing. But above all else: keep moving forward towards something new.

:: edited due to posting this while on an empty stomach, never wise ::

Clean up Lucene.Net website
---
Key: LUCENENET-379
URL: https://issues.apache.org/jira/browse/LUCENENET-379
Project: Lucene.Net
Issue Type: Task
Reporter: George Aroush
Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch

The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is still based on the out-of-date incubation design. This JIRA task is to bring it up to date with other ASF projects' web pages. The existing website is here: https://svn.apache.org/repos/asf/lucene/lucene.net/site/ See http://www.apache.org/dev/project-site.html to get started. It would be best to start by cloning an existing ASF project's website and adapting it for Lucene.Net. Some examples: https://svn.apache.org/repos/asf/lucene/pylucene/site/ and https://svn.apache.org/repos/asf/lucene/java/site/

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: how can I get the similarity in fuzzy query
As far as I know, you'll need to calculate that manually; FuzzyQuery searches don't return any results like that.

On Wed, Feb 16, 2011 at 11:47 AM, Wen Gao samuel.gao...@gmail.com wrote:
> Hi,
> I think my situation is just to compare the similarity of strings: I want to calculate the similarity between the typed query and the returned results using *FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery to 0.5f; what I want to do is get the similarity instead of the score for every result that returns.
> Thanks for your time.
> Wen

2011/2/16 Christopher Currens currens.ch...@gmail.com
> I was going to post the link that Digy posted, which suggests not to determine a match that way. If my understanding is correct, the scores returned for a query are relative to which documents were retrieved by the search: if a document is deleted from the index, the scores will change even though the query did not, because the number of returned documents is different. If the only thing you want to do is calculate how similar a resulting string is to a search string, I suggest the Levenshtein distance algorithm (http://en.wikipedia.org/wiki/Levenshtein_distance)... but it doesn't seem like that's quite what you want to accomplish based on your question.
> Christopher

On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com wrote:
> Hi,
> I am using FuzzyQuery to get fuzzy matched results, and I want to get the similarity in percent for every matched record. For example, if I search for "databasd", it will return results such as "database", "database1", and "database11". I want to get the similarity in percent for every record, such as 87.5%, 75%, and 62.5%. How can I do this? Any ideas?
> Wen Gao
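To get the percentages Wen asked for, the Levenshtein distance Christopher mentions can be combined with the similarity formula Lucene's FuzzyTermEnum uses internally, 1 - distance / min(len(query), len(term)). A minimal sketch (plain Python, independent of Lucene):

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance, computed row by row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def similarity(query, term):
    # FuzzyTermEnum-style similarity: 1 - distance / min length
    return 1.0 - levenshtein(query, term) / min(len(query), len(term))

# reproduces the percentages from the question
for term in ("database", "database1", "database11"):
    print(term, similarity("databasd", term))   # 0.875, 0.75, 0.625
```

This only compares the strings themselves, which sidesteps the score-relativity problem Christopher describes.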
RE: how can I get the similarity in fuzzy query
Whether *fuzzy* or not, all queries are simple term queries in the end, and Lucene does not keep anything like a *similarity* per hit, just scores.

DIGY

-----Original Message-----
From: Wen Gao [mailto:samuel.gao...@gmail.com]
Sent: Wednesday, February 16, 2011 9:47 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: how can I get the similarity in fuzzy query

Hi,
I think my situation is just to compare the similarity of strings: I want to calculate the similarity between the typed query and the returned results using *FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery to 0.5f; what I want to do is get the similarity instead of the score for every result that returns.
Thanks for your time.
Wen
RE: how can I get the similarity in fuzzy query
Download the source from https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_2 using an SVN client (like TortoiseSVN), and open the project file with VS20XX.

DIGY

-----Original Message-----
From: Wen Gao [mailto:samuel.gao...@gmail.com]
Sent: Wednesday, February 16, 2011 9:58 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: how can I get the similarity in fuzzy query

OK, I get it. How can I recompile the Lucene source on Windows?
Thanks.
Wen

2011/2/16 Christopher Currens currens.ch...@gmail.com
> As far as I know, you'll need to calculate that manually. FuzzyQuery searches don't return any results like that.
Re: how can I get the similarity in fuzzy query
Thank you.
Wen

2011/2/16 Digy digyd...@gmail.com
> Download the source from https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_2 using an SVN client (like TortoiseSVN), and open the project file with VS20XX.
> DIGY
Re: Site
Off topic: can we get a [Lucene.NET] prefix for messages to the list?

On Wed, Feb 16, 2011 at 11:05 PM, Prescott Nasser geobmx...@hotmail.com wrote:
> Where does that site compile to? The incubator lucene.net site appears to be the older one.
Re: Site
So, currently we are only set up for working in the staging environment. Once we are ready to publish, we'll need to enter a new JIRA ticket for the infrastructure project and ask for the site to be set up for publishing. Once that's done, we will be able to self-publish whenever we'd like, either through the web UI for the CMS or by running the publish script on the server. Each time we publish, the changes will build and go public immediately.

The current staging site is here: http://lucene.net.staging.apache.org/lucene.net/
The CMS Web UI for our site is: https://cms.apache.org/lucene.net/

You can use the web-based editors to do most everything, and that's the preferred method for making site modifications. This provides a controlled, semi-WYSIWYG environment for editing and will perform SVN commits for you when you save. It's a pretty easy system to work with.

At first there were some issues with building the site and web UI, but Joe S in infrastructure got those taken care of today. I've cleaned up the other issues with the markdown and we've got a functioning version available at the staging site. Next steps are to edit content as a group and get it to where we are comfortable publishing it. Once we do that, we'll get set up for public publishing.

I found the #asfinfra IRC channel very helpful, as it allowed me to work with Joe in real time to get the issues resolved and get my questions answered. I suggest looking there for help on the site, as the documentation is a bit sparse and a number of aspects of the CMS design are shrouded in mystery at first because of that. Hopefully they'll get the documentation updated soon; til then, IRC and mailing lists... :)

Thanks,
Troy

On Wed, Feb 16, 2011 at 1:05 PM, Prescott Nasser geobmx...@hotmail.com wrote:
> Where does that site compile to? The incubator lucene.net site appears to be the older one.
[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website
[ https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995595#comment-12995595 ] Troy Howard commented on LUCENENET-379:
---
The staging site and CMS Web UI are working now and are ready for us to get in there and edit content/layout/etc. I set this up with a really basic template copied from the Lucy project, which is itself copied from the default Apache site.

Browse here to see the staging site: http://lucene.net.staging.apache.org/lucene.net/
And here to edit content using the CMS Web UI: https://cms.apache.org/lucene.net/

Clean up Lucene.Net website
---
Key: LUCENENET-379
URL: https://issues.apache.org/jira/browse/LUCENENET-379
Project: Lucene.Net
Issue Type: Task
Reporter: George Aroush
Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch
Re: subclassing Python classes in Java
On Feb 16, 2011, at 9:39, Bill Janssen jans...@parc.com wrote:
> How do I subclass a Python class in a JCC-wrapped Java module?

- define a Java class with native methods
- using the usual extension tricks, have a Python class implement these native methods
- define a subclass of that Java class so as to inherit these native implementations

Andi..

> In UpLib, I've got a class, uplib.ripper.Ripper, and I'd like to be able to create a Java subclass for that in my module. I presume I need a Java interface for that Python class, but how do I hook the two together so that the Java subclass can inherit from the Python class?
> Bill
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995212#comment-12995212 ] tom liu commented on SOLR-1395:
---
On the Katta slave node, my folder hierarchy is:
|/var/data|root|
|/var/data/hadoop|store hadoop data|
|/var/data/hdfszips|store zip tmp data, which is fetched from HDFS, then moved to katta's shards|
|/var/data/solr|root of solr core configs|
|/var/data/solr/seoproxy|store seoproxy's solr config, which is used by the sub-proxy|
|/var/data/katta/shards/nodename_2/seo0#seo0|store the seo0 shard, which is deployed from the master node|
|/var/data/zkdata|store zkserver data, which is zk logs and snapshots|

On the Katta master node, my folder hierarchy is:
|/var/data|root|
|/var/data/hadoop|store hadoop data|
|/var/data/hdfsfile|store solr tmp data, which is fetched from the solr dataimporter, then zipped and put to HDFS|
|/var/data/solr|root of solr core configs|
|/var/data/solr/seo|store seo's solr config, which is used by tomcat's webapp|
|/var/data/zkdata|store zkserver data, which is zk logs and snapshots|

So, my config comes from five folders:
|Master|/var/data/solr/seo|tomcat webapp's solrcore config|
|Slave|/var/data/solr/seoproxy|sub-proxy's solrcore config|
|Master|/var/data/hdfsfile|query-core's config, which is the config template|
|HDFS|http://hdfsname:9000/seo/seo0.zip|query-core seo0's zip file, which holds conf|
|Slave|/var/data/katta/shards/nodename_2/seo0#seo0/conf|query-core seo0's config, which is unzipped from seo0.zip on HDFS|

And /var/data/hdfsfile's structure is:
{noformat}
seo@seo-solr1:/var/data/hdfsfile$ ll
total 28
drwxr-xr-x 6 seo seo 4096 Oct 21 15:21 ./
drwxr-xr-x 4 seo seo 4096 Feb 16 15:49 ../
drwxr-xr-x 2 seo seo 4096 Oct  8 09:17 bin/
drwxr-xr-x 4 seo seo 4096 Jan 21 18:22 conf/
drwxr-xr-x 3 seo seo 4096 Oct 21 15:21 data/
drwxr-xr-x 2 seo seo 4096 Sep 29 14:01 lib/
-rw-r--r-- 1 seo seo 1320 Oct  8 09:20 solr.xml
{noformat}

Integrate Katta
---
Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Please mark distributed date faceting for 3.1
On Wed, Feb 16, 2011 at 12:06 AM, Smiley, David W. dsmi...@mitre.org wrote:
> I may have added a test just now, but I and others have been using this [simple] code for some time now. It has baked; it doesn't need more baking IMO.

I am sure people will say I am just being silly, but Hudson does a better job of testing these things than people playing with the code. For example, Hudson randomizes external variables (locale x timezone)... on the latest 1.6u23 there are 152 locales and 609 timezones (only 424 unique according to raw offset + rules). With Hudson selecting 1 of these ~65K possibilities 96 times a day, you can start to calculate how long a good baking takes for date-related functionality.

Someone can argue that because Solr insists on treating dates internally, this does not matter, but I have found and fixed timezone- and localization-related bugs in Lucene and Solr before, so that argument fails... not knowing the surrounding code, nothing makes me feel better than a couple of weeks of Hudson grinding on the code. Even then, sometimes a few weeks isn't enough... for example, if I remember right, SOLR-1821 was daylight-savings related (note: the issue was reported the very day daylight savings started in the United States, but in other timezones it had not yet, and would fail for some developers but not others).

> If this patch wasn't the biggest reason to not use distributed search (a key feature) then I wouldn't be here arguing my point. But I've apparently lost this argument already, so I give up... assign it for 3.2 if that's the best you can do, Rob. It's better than being unassigned, which is what it is now.

I don't think that would be the best, as it's not my area of expertise. If I see good patches being ignored because other devs are time-constrained, sometimes I will take the time to bring myself up to speed to get them committed, but I haven't yet given up on this patch :) Just so you know, it's nothing about your patch at all; I am just against any new features of any sort being added to 3.1 at this point.
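The locale/timezone arithmetic above can be made concrete. A back-of-the-envelope sketch (the counts come from the post; the coupon-collector estimate of exhaustive coverage is my own hypothetical addition):

```python
import math

locales = 152        # locales in 1.6u23, per the post
timezones = 424      # timezones unique by raw offset + rules
combos = locales * timezones
runs_per_day = 96    # Hudson runs per day, per the post

print(combos)        # 64448, i.e. the "~65K possibilities"

# Expected number of random draws to hit every (locale, timezone) pair
# at least once is about n * ln(n) (coupon collector), so exhaustive
# coverage takes years; a few weeks of runs samples only a fraction,
# which is why longer baking keeps turning up new combinations.
days = combos * math.log(combos) / runs_per_day
print(days > 365 * 10)   # True
```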
Re: strange problem of PForDelta decoder
Our recent experiments show that PFOR is not a good solution for AND queries. We tested it with our dataset and users' queries; in most cases PFOR is slower than VInt. We think the reason may be that most queries are very likely to contain a low-frequency term, so scoring time dominates while decoding does not. E.g., in our index the term "beijing"'s df is 2557916 and "park"'s is 2313201; both are high-frequency terms, but the count of documents containing both is only 1552. With VInt we only need to decode 1552 documents, while with PFOR we may need to decode many whole blocks. Most search engines use AND queries, so PFOR is only good for OR queries, and for AND queries whose terms are all high-frequency. So we have to give up on this in our application.

A partial decoder for PFOR? For all-high-frequency terms, use the normal PFOR decoder; for queries with low-frequency terms, use a partial decoder? A partial PFOR decoder may need many if/else branches and would be slower. Does anyone have a solution for this?

2010/12/27 Li Li fancye...@gmail.com:
> I integrated the PFOR codec into Lucene 2.9.3 and the search time comparison is as follows:
>
>                             single term (ms)   and query (ms)   or query (ms)
>   VINT in Lucene 2.9.3      11.2               36.5             38.6
>   PFor in Lucene 2.9.3       8.7               27.6             33.4
>   VINT in Lucene 4 branch   10.6               26.5             35.4
>   PFor in Lucene 4 branch    8.1               22.5             30.7
>
> My test terms are high-frequency terms because we are interested in the bad case. It seems the Lucene 4 branch's implementation of AND queries (conjunction queries) is well optimized: even with the VInt codec it's faster than PFor in Lucene 2.9.3. Could anyone tell me what optimization was done? Is storing docIDs and freqs separately making it faster? Or anything else?
>
> Another question: is anyone else interested in integrating the PFOR codec into Lucene 2.9.3 as I did (we have to use Lucene 2.9 and Solr 1.4)? And how do I contribute this patch?
2010/12/24 Michael McCandless luc...@mikemccandless.com:
> Well, an early patch somewhere was able to run PFor on trunk, but the performance wasn't great because the trunk bulk-read API is a bottleneck (this is why the bulk postings branch was created).
> Mike

On Wed, Dec 22, 2010 at 9:45 PM, Li Li fancye...@gmail.com wrote:
> I used the bulkpostings branch (https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene). Does trunk have a PForDelta decoder/encoder?

2010/12/23 Michael McCandless luc...@mikemccandless.com:
> Those are nice speedups! Did you use the 4.0 branch (i.e. trunk) or the bulkpostings branch for this test?
> Mike

On Tue, Dec 21, 2010 at 9:59 PM, Li Li fancye...@gmail.com wrote:
> Great improvement! I did a test on our data set. Doc count is about 2M+ and index size after optimization is about 13.3GB (including fdt). It seems Lucene 4's index format is better than Lucene 2.9.3's, and PFor gives good results. Besides the BlockEncoder for frq and pos, is there any other modification in Lucene 4?
>
>   decoder \ avg time        single word (ms)   and query (ms)   or query (ms)
>   VINT in Lucene 2.9        11.2               36.5             38.6
>   VINT in Lucene 4 branch   10.6               26.5             35.4
>   PFor in Lucene 4 branch    8.1               22.5             30.7

2010/12/21 Li Li fancye...@gmail.com:
> > OK we should have a look at that one still. We need to converge on a good default codec for 4.0. Fortunately it's trivial to take any int block encoder (fixed or variable block) and make a Lucene codec out of it!
>
> I suggest you not use this one; I fixed dozens of bugs but it still failed the random tests. Its code is hand-coded rather than generated by a program. But we may learn something from it.
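Li Li's decode-count argument can be sketched numerically (the df values come from the thread; the 128-integer block size and the simplified cost model are my assumptions):

```python
import math

BLOCK = 128                   # assumed PFOR block size
df_beijing = 2_557_916        # docs containing "beijing" (from the thread)
df_both = 1552                # docs containing both "beijing" and "park"

# VInt + skip lists: roughly one posting decoded per surviving candidate
vint_decoded = df_both

# Block codec: each candidate forces its entire block to be decoded;
# in the worst case every candidate lands in a distinct block
blocks_touched = min(df_both, math.ceil(df_beijing / BLOCK))
pfor_decoded = blocks_touched * BLOCK

print(vint_decoded, pfor_decoded)   # 1552 vs 198656 integers decoded
```

Under these assumptions the block codec decodes over a hundred times more integers for this conjunction, which matches the observation that PFOR only pays off when all query terms are high-frequency.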
Re: inverted index pruning
Great, but I think the patch is different from the method in that paper. My colleague tested this patch but didn't get good results (I don't know the details well; he just told me his experience).

2011/2/15 Andrzej Bialecki a...@getopt.org:
> On 2/15/11 11:57 AM, Li Li wrote:
> > hi all,
> > I recently read a paper, "Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee". Its idea is interesting and I have some questions I'd like to share with you.
>
> Please take a look at LUCENE-1812, LUCENE-2632 and my presentation from Apache EuroCon 2010 in Prague, "Munching and Crunching".
>
> --
> Best regards,
> Andrzej Bialecki
> http://www.sigram.com  Contact: info at sigram dot com
[jira] Assigned: (LUCENE-1812) Static index pruning by in-document term frequency (Carmel pruning)
[ https://issues.apache.org/jira/browse/LUCENE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-1812:
---
Assignee: Doron Cohen

Static index pruning by in-document term frequency (Carmel pruning)
---
Key: LUCENE-1812
URL: https://issues.apache.org/jira/browse/LUCENE-1812
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Affects Versions: 2.9, 3.1
Reporter: Andrzej Bialecki
Assignee: Doron Cohen
Attachments: pruning.patch, pruning.patch, pruning.patch, pruning.patch

This module provides tools to produce a subset of input indexes by removing postings data for those terms whose in-document frequency is below a specified threshold. The net effect of this processing is a much smaller index that, for common types of queries, returns nearly identical top-N results as compared with the original index, but with increased performance. Optionally, stored values and term vectors can also be removed; this functionality is largely independent, so it can be used without term pruning (when the term freq. threshold is set to 1).

As the threshold value increases, the total size of the index decreases, search performance increases, and recall decreases (i.e. search quality deteriorates). NOTE: phrase recall in particular deteriorates significantly at higher threshold values.

The primary purpose of this class is to produce small first-tier indexes that fit completely in RAM, and to store these indexes using IndexWriter.addIndexes(IndexReader[]). Usually the performance of this class will not be sufficient to use the resulting index view for on-the-fly pruning and searching.

NOTE: If the input index is optimized (i.e. doesn't contain deletions) then the index produced via IndexWriter.addIndexes(IndexReader[]) will preserve internal document ids so that they stay in sync with the original index. This means that all other auxiliary information not necessary for first-tier processing, such as some stored fields, can also be removed, to be quickly retrieved on-demand from the original index using the same internal document id.

Threshold values can be specified globally (for terms in all fields) using the defaultThreshold parameter, and can be overridden using per-field or per-term values supplied in a thresholds map. Keys in this map are either field names or terms in field:text format. The precedence of these values is the following: first a per-term threshold is used if present, then a per-field threshold if present, and finally the default threshold.

A command-line tool (PruningTool) is provided for convenience. At this moment it doesn't support all functionality available through the API.
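The thresholding described in the issue can be sketched in a few lines (plain Python, not the actual patch; per-field overrides and the field:text key format are omitted for brevity):

```python
def prune(postings, default_threshold, per_term=None):
    """Drop postings whose in-document term frequency is below the
    applicable threshold: a per-term override first, then the default.
    With the default threshold at 1, nothing is pruned.

    postings: {term: [(doc_id, tf), ...]}"""
    per_term = per_term or {}
    pruned = {}
    for term, plist in postings.items():
        threshold = per_term.get(term, default_threshold)
        kept = [(doc, tf) for doc, tf in plist if tf >= threshold]
        if kept:
            pruned[term] = kept
    return pruned

postings = {
    "apache": [(1, 5), (2, 1), (3, 2)],
    "lucene": [(1, 1), (2, 1)],
}
# with the default threshold at 2, "lucene" disappears entirely and
# the (2, 1) posting for "apache" is dropped
print(prune(postings, 2))
```

A per-term override, e.g. `prune(postings, 2, {"lucene": 1})`, keeps the "lucene" postings while the default still applies to "apache", mirroring the precedence order described above.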
[jira] Updated: (LUCENE-1812) Static index pruning by in-document term frequency (Carmel pruning)
[ https://issues.apache.org/jira/browse/LUCENE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-1812: Affects Version/s: (was: 3.1) (was: 2.9) Fix Version/s: 4.0 3.2 Static index pruning by in-document term frequency (Carmel pruning) --- Key: LUCENE-1812 URL: https://issues.apache.org/jira/browse/LUCENE-1812 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Reporter: Andrzej Bialecki Assignee: Doron Cohen Fix For: 3.2, 4.0 Attachments: pruning.patch, pruning.patch, pruning.patch, pruning.patch This module provides tools to produce a subset of input indexes by removing postings data for those terms where their in-document frequency is below a specified threshold. The net effect of this processing is a much smaller index that for common types of queries returns nearly identical top-N results as compared with the original index, but with increased performance. Optionally, stored values and term vectors can also be removed. This functionality is largely independent, so it can be used without term pruning (when term freq. threshold is set to 1). As the threshold value increases, the total size of the index decreases, search performance increases, and recall decreases (i.e. search quality deteriorates). NOTE: especially phrase recall deteriorates significantly at higher threshold values. Primary purpose of this class is to produce small first-tier indexes that fit completely in RAM, and store these indexes using IndexWriter.addIndexes(IndexReader[]). Usually the performance of this class will not be sufficient to use the resulting index view for on-the-fly pruning and searching. NOTE: If the input index is optimized (i.e. doesn't contain deletions) then the index produced via IndexWriter.addIndexes(IndexReader[]) will preserve internal document id-s so that they are in sync with the original index. 
This means that all other auxiliary information not necessary for first-tier processing, such as some stored fields, can also be removed, to be quickly retrieved on-demand from the original index using the same internal document id. Threshold values can be specified globally (for terms in all fields) using the defaultThreshold parameter, and can be overridden using per-field or per-term values supplied in a thresholds map. Keys in this map are either field names, or terms in field:text format. The precedence of these values is the following: first a per-term threshold is used if present, then the per-field threshold if present, and finally the default threshold. A command-line tool (PruningTool) is provided for convenience. At this moment it doesn't support all functionality available through the API. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
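The per-term / per-field / default precedence described above can be sketched as a plain lookup (a minimal, hypothetical helper for illustration, not the pruning module's actual API; only the key shapes, "field" and "field:text", and the defaultThreshold fallback come from the description):

```java
import java.util.Map;

// Hypothetical sketch of the documented threshold precedence:
// per-term ("field:text") wins over per-field ("field"), which wins
// over the global defaultThreshold.
public class ThresholdResolver {
    static int resolve(Map<String, Integer> thresholds, int defaultThreshold,
                       String field, String text) {
        Integer t = thresholds.get(field + ":" + text); // per-term key
        if (t != null) return t;
        t = thresholds.get(field);                      // per-field key
        if (t != null) return t;
        return defaultThreshold;                        // global fallback
    }

    public static void main(String[] args) {
        Map<String, Integer> thresholds = Map.of("title", 2, "title:lucene", 5);
        if (resolve(thresholds, 1, "title", "lucene") != 5) // per-term wins
            throw new AssertionError();
        if (resolve(thresholds, 1, "title", "index") != 2)  // per-field next
            throw new AssertionError();
        if (resolve(thresholds, 1, "body", "index") != 1)   // default last
            throw new AssertionError();
        System.out.println("ok");
    }
}
```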
[jira] Updated: (SOLR-2105) RequestHandler param update.processor is confusing
[ https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-2105: -- Attachment: SOLR-2105.patch Updated patch attached. * Use of update.processor is now deprecated but still works, logging a warning * Added test case which tests that both params work Patch is for trunk. RequestHandler param update.processor is confusing -- Key: SOLR-2105 URL: https://issues.apache.org/jira/browse/SOLR-2105 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4.1 Reporter: Jan Høydahl Priority: Minor Attachments: SOLR-2105.patch, SOLR-2105.patch Today we reference a custom updateRequestProcessorChain using the update request parameter update.processor. See http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section This is confusing, since what we are really referencing is not an UpdateProcessor, but an updateRequestProcessorChain. I propose that update.processor is renamed as update.chain or similar -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
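The back-compat behavior the patch describes (new name preferred, old name still working but logging a warning) can be sketched like this. The helper is hypothetical, not Solr's actual SolrParams API; only the two parameter names, update.chain and update.processor, come from the issue:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed rename with a deprecation fallback:
// read "update.chain" first, fall back to the legacy "update.processor"
// with a warning, else return null (meaning: use the default chain).
public class ChainParamLookup {
    static String chainName(Map<String, String> params) {
        String chain = params.get("update.chain");      // new, preferred name
        if (chain != null) return chain;
        String legacy = params.get("update.processor"); // deprecated alias
        if (legacy != null) {
            System.err.println("WARN: update.processor is deprecated; use update.chain");
            return legacy;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> p = new HashMap<>();
        p.put("update.processor", "dedupe");            // legacy still works
        if (!"dedupe".equals(chainName(p))) throw new AssertionError();
        p.put("update.chain", "mychain");               // new name wins
        if (!"mychain".equals(chainName(p))) throw new AssertionError();
        System.out.println("ok");
    }
}
```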
[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website
[ https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995283#comment-12995283 ] michael herndon commented on LUCENENET-379: --- I think anything would be better than the current one (which would look cool if it was cleaned up and put on the side of a Chevelle, but I don't know how it would help brand lucene.net). I'd say keep doing a few more variations. Open it up for the public to make some submissions as well. (giving credit to whoever's design is chosen maybe even give them some social media love). The final one needs to work well with both RGB and CMYK color formats and in a scalable graphics format so that it can be resized cleanly. Also it should have a visual aspect of it that can be turned into a decent 16 x 16 favicon. (like the 3 yellow hexagons that are in the jpg). Though keep in mind basic color theory. Yellow is irritating on the eyes. It definitely grabs attention, but it's harder on the eyes for an extended period of time. Green is the most relaxing. But above all else: keep moving forward towards something new. Clean up Lucene.Net website --- Key: LUCENENET-379 URL: https://issues.apache.org/jira/browse/LUCENENET-379 Project: Lucene.Net Issue Type: Task Reporter: George Aroush Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is still based on the incubation, out-of-date design. This JIRA task is to bring it up to date with other ASF projects' web pages. The existing website is here: https://svn.apache.org/repos/asf/lucene/lucene.net/site/ See http://www.apache.org/dev/project-site.html to get started. It would be best to start by cloning an existing ASF project's website and adapting it for Lucene.Net. Some examples, https://svn.apache.org/repos/asf/lucene/pylucene/site/ and https://svn.apache.org/repos/asf/lucene/java/site/ -- This message is automatically generated by JIRA.
- For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Issue Comment Edited: (LUCENENET-379) Clean up Lucene.Net website
[ https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995283#comment-12995283 ] michael herndon edited comment on LUCENENET-379 at 2/16/11 1:21 PM: I think anything would be better than the current one (which would look cool if it was cleaned up and put on the side of a Chevelle, but I don't know how it would help brand lucene.net). I'd say keep doing a few more variations. Open it up for the public to make some submissions as well. (giving credit to whoever's design is chosen maybe even give them some social media love). The final one needs to work well with both RGB and CMYK color formats and in a scalable graphics format so that it can be resized cleanly. Also it should have a visual aspect of it that can be turned into a decent 16 x 16 favicon. (like the 3 yellow hexagons that are in the jpg). Though keep in mind basic color theory. Yellow is irritating on the eyes. It definitely grabs attention, but it's harder on the eyes for an extended period of time. Green is the most relaxing. But above all else: keep moving forward towards something new. was (Author: michaelherndon): I think anything would be better than the current one (which would look cool if was cleaned up and put on the side of a chevelle, but I don't know how would help brand lucene.net). I'd say keep doing a few more variations. open it up for the public to make some submissions as well. (giving credit to whoever's design is chosen maybe even give them some social media love). The final one needs to work well with both rgb and cymk color formats and in a scalable graphics format so that it can be resized cleanly. Also it should have a visual aspect of it that can be turned into a decent 16 x 16 favicon. (like the 3 yellow hexagons that is in the jpg). Though keep in mind basic color theory. Yellow is irritating on the eyes. Its definitely grabs attention, but its arder on the eyes for an extended period of time.
Green is the most relaxing. But above all else keep moving forward towards something new. Clean up Lucene.Net website --- Key: LUCENENET-379 URL: https://issues.apache.org/jira/browse/LUCENENET-379 Project: Lucene.Net Issue Type: Task Reporter: George Aroush Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is still based on the incubation, out-of-date design. This JIRA task is to bring it up to date with other ASF projects' web pages. The existing website is here: https://svn.apache.org/repos/asf/lucene/lucene.net/site/ See http://www.apache.org/dev/project-site.html to get started. It would be best to start by cloning an existing ASF project's website and adapting it for Lucene.Net. Some examples, https://svn.apache.org/repos/asf/lucene/pylucene/site/ and https://svn.apache.org/repos/asf/lucene/java/site/ -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Issue Comment Edited: (SOLR-2105) RequestHandler param update.processor is confusing
[ https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995282#comment-12995282 ] Jan Høydahl edited comment on SOLR-2105 at 2/16/11 1:24 PM: Updated patch attached. * Use of update.processor is now deprecated, logging a warning (instead of removing as in previous patch) * Added test case which tests that both params work Patch is for trunk. was (Author: janhoy): Updated patch attached. * Use of update.processor is not deprecated but still works, logging a warning * Added test case which tests that both params work Patch is for trunk. RequestHandler param update.processor is confusing -- Key: SOLR-2105 URL: https://issues.apache.org/jira/browse/SOLR-2105 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4.1 Reporter: Jan Høydahl Priority: Minor Attachments: SOLR-2105.patch, SOLR-2105.patch Today we reference a custom updateRequestProcessorChain using the update request parameter update.processor. See http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section This is confusing, since what we are really referencing is not an UpdateProcessor, but an updateRequestProcessorChain. I propose that update.processor is renamed as update.chain or similar -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec
[ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2903: --- Attachment: LUCENE-2903.patch Thanks Hao! The new patch looks great -- much leaner. I fixed a few things... new patch attached. To keep the comparison fair, I cutover BulkVInt back to Sep (it was Fixed (interleaved)). I also impl'd skipBlock in PFor4 (though this method is never called by Sep). I cutover PFor4 to var gap terms index. Finally I added back copyright headers (Simple16.java's had been stripped but other new sources were missing...). Also, we need to eventually remove the @author tags. One question: it looks like this PFOR impl can only handle up to 28 bit wide ints? Which means... it could fail in some cases? Though I suppose you would never see too many of these immense ints in one block, and so they'd always be encoded as exceptions and so it's actually safe...? Here are the results on Linux, MMapDir, 10M docs, unshuffled: ||Query||QPS BulkVInt||QPS PFor4||Pct diff|| |united states|13.66|11.63|{color:red}-14.9%{color}| |u*d|12.75|11.55|{color:red}-9.4%{color}| |un*d|24.71|22.46|{color:red}-9.1%{color}| |uni*|24.68|22.85|{color:red}-7.4%{color}| |unit*|41.22|39.25|{color:red}-4.8%{color}| |+nebraska +states|128.41|123.73|{color:red}-3.6%{color}| |spanFirst(unit, 5)|263.41|258.27|{color:red}-1.9%{color}| |+united +states|21.37|21.09|{color:red}-1.3%{color}| |title:.*[Uu]nited.*|5.70|5.66|{color:red}-0.6%{color}| |timesecnum:[1 TO 6]|15.01|14.96|{color:red}-0.4%{color}| |unit~0.7|41.78|43.44|{color:green}4.0%{color}| |united states~3|6.48|6.79|{color:green}4.8%{color}| |unit~0.5|24.61|25.83|{color:green}4.9%{color}| |spanNear([unit, state], 10, true)|52.34|55.67|{color:green}6.4%{color}| |united~0.6|11.36|12.18|{color:green}7.1%{color}| |united~0.75|15.96|17.58|{color:green}10.2%{color}| |states|53.41|61.03|{color:green}14.3%{color}| |united states|16.87|20.62|{color:green}22.2%{color}|
Very nice! Improvement of PForDelta Codec -- Key: LUCENE-2903 URL: https://issues.apache.org/jira/browse/LUCENE-2903 Project: Lucene - Java Issue Type: Improvement Reporter: hao yan Attachments: LUCENE-2903.patch, LUCENE-2903.patch There are 3 versions of PForDelta implementations in the Bulk Branch: FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2. The FrameOfRef is a very basic one which is essentially a binary encoding (may result in huge index size). The PatchedFrameOfRef is the implementation based on the original version of PForDelta in the literature. The PatchedFrameOfRef2 is my previous implementation which is improved this time. (The Codec name is changed to NewPForDelta.) In particular, the changes are: 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the old PForDelta does not support very large exceptions (since the Simple16 does not support very large numbers). Now this has been fixed in the new LCPForDelta. 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other two PForDelta implementations in the bulk branch (FrameOfRef and PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the CodecProvider and PForDeltaFixedIntBlockCodec. 3. The performance test results are: 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef for almost all kinds of queries, slightly worse than BulkVInt. 2) My NewPForDelta codec can result in the smallest index size among all 4 methods (including FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself). 3) All performance test results are achieved by running with -server instead of -client -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
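The safety question in the comment (can values wider than 28 bits break the encoder, or are they always spilled as exceptions?) hinges on the "patched" idea itself. A toy sketch of that idea, assumed for illustration only and not the LCPForDelta code from the patch: values that fit the block's chosen bit width are stored inline, and the rare wide ones go to a side list at full width.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration (not the patch's code) of patched frame-of-reference:
// pick a small bit width b for the block, store values < 2^b inline, and
// spill larger values ("exceptions") with their positions to side lists.
public class PatchedBlockSketch {
    static int b = 4; // pretend 4-bit frame for the demo

    static int[] inline;                                 // low-width slots
    static List<Integer> exceptions = new ArrayList<>(); // full-width values
    static List<Integer> positions = new ArrayList<>();  // where they go back

    static void encode(int[] values) {
        inline = new int[values.length];
        exceptions.clear();
        positions.clear();
        for (int i = 0; i < values.length; i++) {
            if (values[i] < (1 << b)) {
                inline[i] = values[i];
            } else {            // too wide for the frame: record an exception
                inline[i] = 0;
                positions.add(i);
                exceptions.add(values[i]);
            }
        }
    }

    static int[] decode() {
        int[] out = inline.clone();
        for (int k = 0; k < positions.size(); k++)
            out[positions.get(k)] = exceptions.get(k);   // patch them back in
        return out;
    }

    public static void main(String[] args) {
        int[] vals = {3, 7, 1, 300_000_000, 2}; // one immense value
        encode(vals);
        int[] back = decode();
        for (int i = 0; i < vals.length; i++)
            if (back[i] != vals[i]) throw new AssertionError();
        System.out.println("exceptions=" + exceptions.size()); // prints exceptions=1
    }
}
```

The hunch in the comment corresponds to the exception path here: a value too wide for the frame never needs to fit the inline width, so the width limit only matters if the exception mechanism itself were bypassed.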
[jira] Created: (SOLR-2366) Facet Range Gaps
Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Fix For: 3.2, 4.0 There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. I'd propose the syntax to be a comma separated list of sizes for each bucket. If only one value is specified, then it behaves as it currently does. Otherwise, it creates the different size buckets. If the number of buckets doesn't evenly divide up the space, then the size of the last bucket specified is used to fill out the remaining space (not sure on this) For instance, facet.range.start=0 facet.range.end=400 facet.range.gap=5,25,50,100 would yield buckets of: 0-5,5-30,30-80,80-180,180-280,280-380,380-400 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
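The bucket arithmetic proposed above can be sketched as follows. The helper is hypothetical, not Solr code; it assumes the last listed gap repeats until the end is reached and the final bucket is clipped at facet.range.end, which reproduces the 0-5,5-30,30-80,80-180,180-280,280-380,380-400 example from the issue:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of variable-width facet range gaps: consume the gap
// list in order, repeat the last gap once the list is exhausted, and clip
// the final bucket at the range end.
public class VariableGaps {
    static List<int[]> buckets(int start, int end, int... gaps) {
        List<int[]> out = new ArrayList<>();
        int lo = start, i = 0;
        while (lo < end) {
            int gap = gaps[Math.min(i++, gaps.length - 1)]; // last gap repeats
            int hi = Math.min(lo + gap, end);               // clip at end
            out.add(new int[]{lo, hi});
            lo = hi;
        }
        return out;
    }

    public static void main(String[] args) {
        // The example from the issue: start=0, end=400, gap=5,25,50,100
        List<int[]> b = buckets(0, 400, 5, 25, 50, 100);
        int[][] expect = {{0,5},{5,30},{30,80},{80,180},{180,280},{280,380},{380,400}};
        if (b.size() != expect.length) throw new AssertionError();
        for (int i = 0; i < expect.length; i++)
            if (b.get(i)[0] != expect[i][0] || b.get(i)[1] != expect[i][1])
                throw new AssertionError();
        System.out.println("ok");
    }
}
```

The "not sure on this" question in the issue (what fills the remaining space) is resolved here by repeating the last gap, which is one of the plausible readings.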
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 4967 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/4967/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting Error Message: flush happened too quickly during deleting count=1155 Stack Trace: junit.framework.AssertionFailedError: flush happened too quickly during deleting count=1155 at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1183) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1115) at org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting(TestIndexWriter.java:2579) Build Log (for compile errors): [...truncated 3048 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec
[ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2903: Attachment: for_pfor.patch Nice results Hao! One idea for the low-frequency multi-term queries (foo* etc) could be in the attached patch: I only implemented this for the existing FrameOfRef and PatchedFrameOfRef but perhaps you could steal/test the idea with your implementation. In these cases I switched them over to a single byte header instead of an int. This means less overhead per-block, a slightly smaller (maybe 1-2%?) index. It might be more useful if we switch your codec over from Sep layout to interleaved (Fixed) layout, to make a more efficient skipBlock()... but this interleaved layout is still a work in progress. Improvement of PForDelta Codec -- Key: LUCENE-2903 URL: https://issues.apache.org/jira/browse/LUCENE-2903 Project: Lucene - Java Issue Type: Improvement Reporter: hao yan Attachments: LUCENE-2903.patch, LUCENE-2903.patch, for_pfor.patch There are 3 versions of PForDelta implementations in the Bulk Branch: FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2. The FrameOfRef is a very basic one which is essentially a binary encoding (may result in huge index size). The PatchedFrameOfRef is the implementation based on the original version of PForDelta in the literature. The PatchedFrameOfRef2 is my previous implementation which is improved this time. (The Codec name is changed to NewPForDelta.) In particular, the changes are: 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the old PForDelta does not support very large exceptions (since the Simple16 does not support very large numbers). Now this has been fixed in the new LCPForDelta. 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other two PForDelta implementations in the bulk branch (FrameOfRef and PatchedFrameOfRef).
The codec's name is NewPForDelta, as you can see in the CodecProvider and PForDeltaFixedIntBlockCodec. 3. The performance test results are: 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef for almost all kinds of queries, slightly worse than BulkVInt. 2) My NewPForDelta codec can result in the smallest index size among all 4 methods (including FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself). 3) All performance test results are achieved by running with -server instead of -client -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-2366: -- Attachment: SOLR-2366.patch Adds variable width gap capabilities and some tests. Still needs some more tests for edge conditions, etc. but it is something that others can look at and comment on. Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. I'd propose the syntax to be a comma separated list of sizes for each bucket. If only one value is specified, then it behaves as it currently does. Otherwise, it creates the different size buckets. If the number of buckets doesn't evenly divide up the space, then the size of the last bucket specified is used to fill out the remaining space (not sure on this) For instance, facet.range.start=0 facet.range.end=400 facet.range.gap=5,25,50,100 would yield buckets of: 0-5,5-30,30-80,80-180,180-280,280-380,380-400 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995362#comment-12995362 ] Doug Steigerwald commented on SOLR-236: --- Has anyone successfully applied field collapsing to the branch_3x branch? Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: Next Attachments: DocSetScoreCollector.java, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
quasidistributed.additional.patch, solr-236.patch This patch includes a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48&amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results collapse.type normal (default value) or adjacent collapse.max to select how many continuous results are allowed before collapsing TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Assigned: (SOLR-2105) RequestHandler param update.processor is confusing
[ https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-2105: - Assignee: Mark Miller RequestHandler param update.processor is confusing -- Key: SOLR-2105 URL: https://issues.apache.org/jira/browse/SOLR-2105 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4.1 Reporter: Jan Høydahl Assignee: Mark Miller Priority: Minor Attachments: SOLR-2105.patch, SOLR-2105.patch Today we reference a custom updateRequestProcessorChain using the update request parameter update.processor. See http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section This is confusing, since what we are really referencing is not an UpdateProcessor, but an updateRequestProcessorChain. I propose that update.processor is renamed as update.chain or similar -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-1191) NullPointerException in delta import
[ https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunnlaugur Thor Briem updated SOLR-1191: Attachment: SOLR-1191.patch Updated patch with unit test. NullPointerException in delta import Key: SOLR-1191 URL: https://issues.apache.org/jira/browse/SOLR-1191 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3, 1.4 Environment: OS: Windows Linux. Java: 1.6 DB: MySQL SQL Server Reporter: Ali Syed Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1191.patch, SOLR-1191.patch Seeing a few of these NullPointerExceptions during delta imports. Once this happens, delta import stops working and keeps giving the same error. java.lang.NullPointerException at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355) Running delta import for a particular entity fixes the problem and delta import starts working again.
Here is the log just before and after the exception 05/27 11:59:29 86987686 INFO btpool0-538 org.apache.solr.core.SolrCore - [localhost] webapp=/solr path=/dataimport params={command=delta-import&optimize=false} status=0 QTime=0 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DataImporter - Starting Delta Import 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: content 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: content rows obtained : 0 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: content rows obtained : 0 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: content 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: job 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB 05/27 11:59:29 86987704 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 12 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: job rows obtained : 0 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: job rows obtained : 0
05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: job 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Delta Import completed successfully 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: user 05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB 05/27 11:59:29 86987716 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 7 05/27 11:59:29 86987873 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: user rows obtained : 46 05/27 11:59:29 86987873 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: user rows obtained : 0 05/27 11:59:29 86987873 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder
[jira] Created: (SOLR-2367) DataImportHandler unit tests are very noisy
DataImportHandler unit tests are very noisy --- Key: SOLR-2367 URL: https://issues.apache.org/jira/browse/SOLR-2367 Project: Solr Issue Type: Improvement Components: Build, contrib - DataImportHandler Reporter: Gunnlaugur Thor Briem Priority: Trivial Running DataImportHandler unit tests emits a lot of console noise, mainly stacktraces because dataimport.properties can't be written. This makes it hard to scan the output for useful information. I'm attaching a patch to get rid of most of the noise by creating the conf directory before test runs so that the properties file write doesn't fail. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2367) DataImportHandler unit tests are very noisy
[ https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunnlaugur Thor Briem updated SOLR-2367: Attachment: SOLR-2367.patch Patch to address this issue. Creates conf directories under work directory before test runs, and suppresses a warning. The console noise that remains is some XML parsing failure, which may or may not be meaningful (I don't know) — at least now it is visible. :) DataImportHandler unit tests are very noisy --- Key: SOLR-2367 URL: https://issues.apache.org/jira/browse/SOLR-2367 Project: Solr Issue Type: Improvement Components: Build, contrib - DataImportHandler Reporter: Gunnlaugur Thor Briem Priority: Trivial Attachments: SOLR-2367.patch Original Estimate: 5m Remaining Estimate: 5m Running DataImportHandler unit tests emits a lot of console noise, mainly stacktraces because dataimport.properties can't be written. This makes it hard to scan the output for useful information. I'm attaching a patch to get rid of most of the noise by creating the conf directory before test runs so that the properties file write doesn't fail. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
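The fix described above amounts to pre-creating the conf directory so the dataimport.properties write cannot fail and spam the test log. A minimal sketch of that idea, with paths invented for the demo (this is not the actual test harness layout or the patch's code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the SOLR-2367 idea: create the conf directory under the test
// work directory before the test runs, so the properties write succeeds
// instead of throwing and printing a stack trace.
public class PrepareTestConf {
    static Path ensureConfDir(Path workDir) throws IOException {
        // createDirectories is a no-op if the directory already exists
        return Files.createDirectories(workDir.resolve("conf"));
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("dih-test");
        Path conf = ensureConfDir(work);
        if (!Files.isDirectory(conf)) throw new AssertionError();
        // now a write like this one no longer fails noisily
        Files.writeString(conf.resolve("dataimport.properties"), "last_index_time=\n");
        System.out.println("ok");
    }
}
```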
[jira] Resolved: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-1553. Resolution: Fixed Fix Version/s: (was: 4.0) (was: 1.5) Resolving. Improvements can be tracked in a new issue. extended dismax query parser Key: SOLR-1553 URL: https://issues.apache.org/jira/browse/SOLR-1553 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Assignee: Yonik Seeley Fix For: 3.1 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, edismax.userFields.patch An improved user-facing query parser based on dismax -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (SOLR-2368) Improve extended dismax (edismax) parser
Improve extended dismax (edismax) parser Key: SOLR-2368 URL: https://issues.apache.org/jira/browse/SOLR-2368 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Improve edismax and replace dismax once it has all of the needed features. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (SOLR-2367) DataImportHandler unit tests are very noisy
[ https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995387#comment-12995387 ] Gunnlaugur Thor Briem edited comment on SOLR-2367 at 2/16/11 5:09 PM: -- Patch to address this issue. Creates conf directories under work directory before test runs, and suppresses a warning. The console noise that remains is some XML parsing failure, which may or may not be meaningful (I don't know) — at least now it is visible. :) This patch is against branch_3x as of just now. was (Author: gthb): Patch to address this issue. Creates conf directories under work directory before test runs, and suppresses a warning. The console noise that remains is some XML parsing failure, which may or may not be meaningful (I don't know) — at least now it is visible. :) DataImportHandler unit tests are very noisy --- Key: SOLR-2367 URL: https://issues.apache.org/jira/browse/SOLR-2367 Project: Solr Issue Type: Improvement Components: Build, contrib - DataImportHandler Reporter: Gunnlaugur Thor Briem Priority: Trivial Attachments: SOLR-2367.patch Original Estimate: 5m Remaining Estimate: 5m Running DataImportHandler unit tests emits a lot of console noise, mainly stacktraces because dataimport.properties can't be written. This makes it hard to scan the output for useful information. I'm attaching a patch to get rid of most of the noise by creating the conf directory before test runs so that the properties file write doesn't fail. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (SOLR-1191) NullPointerException in delta import
[ https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995381#comment-12995381 ] Gunnlaugur Thor Briem edited comment on SOLR-1191 at 2/16/11 5:09 PM: -- Updated patch with unit test, against current branch_3x. was (Author: gthb): Updated patch with unit test. NullPointerException in delta import Key: SOLR-1191 URL: https://issues.apache.org/jira/browse/SOLR-1191 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3, 1.4 Environment: OS: Windows Linux. Java: 1.6 DB: MySQL SQL Server Reporter: Ali Syed Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1191.patch, SOLR-1191.patch Seeing few of these NullPointerException during delta imports. Once this happens delta import stops working and keeps giving the same error. java.lang.NullPointerException at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355) Running delta import for a particular entity fixes the problem and delta import start working again. 
Here is the log just before after the exception 05/27 11:59:29 86987686 INFO btpool0-538 org.apache.solr.core.SolrCore - [localhost] webapp=/solr path=/dataimport params={command=delta-importoptimize=false} status=0 QTime=0 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DataImporter - Starting Delta Import 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: content 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: content rows obtained : 0 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: content rows obtained : 0 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: content 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: job 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB 05/27 11:59:29 86987704 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 12 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: job rows obtained : 0 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: job rows obtained : 0 
05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: job 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Delta Import completed successfully 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: user 05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB 05/27 11:59:29 86987716 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 7 05/27 11:59:29 86987873 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: user rows obtained : 46 05/27 11:59:29 86987873 INFO Thread-4162
Re: duplicate records in index
I saw that. so careless.. Thanks. Wen Gao 2011/2/16 Digy digyd...@gmail.com You are adding the same doc twice. (See how you add acttime ) DIGY -Original Message- From: Wen Gao [mailto:samuel.gao...@gmail.com] Sent: Wednesday, February 16, 2011 11:35 AM To: lucene-net-...@lucene.apache.org Subject: duplicate records in index Hi, I am creating an index from my database, however, the record in .cfs files contains duplicate records, e.g. book1, 1, susan, 1 book1, 1,susan,1, 03/01/2010 book2, 2,tom, book2,2,tom, 2,03/02/2010 .. I got the data from several tables, and am sure that the sql only generate one record. Also, when I debug the code, the record is only added once. So I am confused whether data replicate in idex. I define my index as following format: doc.Add(new Lucene.Net.Documents.Field( lmname, readerreader1[lmname].ToString(), //new System.IO.StringReader(readerreader[cname].ToString()), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.TOKENIZED) ); //lmid doc.Add(new Lucene.Net.Documents.Field( lmid, readerreader1[lmid].ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.UN_TOKENIZED)); // nick name of user doc.Add(new Lucene.Net.Documents.Field( nickName, readerreader1[nickName].ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.UN_TOKENIZED)); // uid doc.Add(new Lucene.Net.Documents.Field( uid, readerreader1[uid].ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.UN_TOKENIZED)); writer.AddDocument(doc); // acttime doc.Add(new Lucene.Net.Documents.Field( acttime, readerreader1[acttime].ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.UN_TOKENIZED)); writer.AddDocument(doc); // Any ideas? Thanks, Wen Gao
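The fix DIGY points at can be sketched generically: add every field to the document first, then call AddDocument exactly once per record (the thread's code called it once before and once after adding acttime, hence the duplicates). This sketch uses a plain Java list as a stand-in for the real IndexWriter, with the field names from the message, just to show the corrected ordering:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SingleAddDemo {
    // Stand-in for IndexWriter.addDocument: simply collects documents.
    static final List<Map<String, String>> index = new ArrayList<>();

    static void addDocument(Map<String, String> doc) {
        index.add(doc);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("lmname", "book1");
        doc.put("lmid", "1");
        doc.put("nickName", "susan");
        doc.put("uid", "1");
        doc.put("acttime", "03/01/2010"); // add acttime BEFORE the add
        addDocument(doc);                 // call addDocument exactly once

        System.out.println(index.size()); // one record per row, no duplicate
    }
}
```

In the original C# code, this means moving `writer.AddDocument(doc);` so it appears only once, after the acttime field has been added.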
[jira] Updated: (SOLR-2367) DataImportHandler unit tests are very noisy
[ https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2367: -- Attachment: SOLR-2367.patch Thanks for the patch. I modified it, to just specify the absolute path to these directories. this way we don't have to make any useless directories underneath the CWD. Separately, as far as the exceptions, this is in the test TestErrorHandling, its 'expected exceptions'. I tried to modify this test to use the 'expected exception' logic in SolrTestCaseJ4, etc, but I could not make it work. I think this is because DIH throws DataImportHandlerExceptions (extends RuntimeException) instead of ones that extend SolrException? DataImportHandler unit tests are very noisy --- Key: SOLR-2367 URL: https://issues.apache.org/jira/browse/SOLR-2367 Project: Solr Issue Type: Improvement Components: Build, contrib - DataImportHandler Reporter: Gunnlaugur Thor Briem Assignee: Robert Muir Priority: Trivial Attachments: SOLR-2367.patch, SOLR-2367.patch Original Estimate: 5m Remaining Estimate: 5m Running DataImportHandler unit tests emits a lot of console noise, mainly stacktraces because dataimport.properties can't be written. This makes it hard to scan the output for useful information. I'm attaching a patch to get rid of most of the noise by creating the conf directory before test runs so that the properties file write doesn't fail. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website
[ https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995405#comment-12995405 ] Alex Thompson commented on LUCENENET-379: - The concept of the current logo isn't that bad, its just executed poorly (looks like someone did it in Paint). I don't mind if it changes but maybe keep a green color scheme to imply our loose connection with java lucene. Clean up Lucene.Net website --- Key: LUCENENET-379 URL: https://issues.apache.org/jira/browse/LUCENENET-379 Project: Lucene.Net Issue Type: Task Reporter: George Aroush Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is still based on the incubation, out of date design. This JIRA task is to bring it up to date with other ASF project's web page. The existing website is here: https://svn.apache.org/repos/asf/lucene/lucene.net/site/ See http://www.apache.org/dev/project-site.html to get started. It would be best to start by cloning an existing ASF project's website and adopting it for Lucene.Net. Some examples, https://svn.apache.org/repos/asf/lucene/pylucene/site/ and https://svn.apache.org/repos/asf/lucene/java/site/ -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
subclassing Python classes in Java
How do I subclass a Python class in a JCC-wrapped Java module? In UpLib, I've got a class, uplib.ripper.Ripper, and I'd like to be able to create a Java subclass for that in my module. I presume I need a Java interface for that Python class, but how do I hook the two together so that the Java subclass can inherit from the Python class? Bill
[jira] Commented: (LUCENE-2903) Improvement of PForDelta Codec
[ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995436#comment-12995436 ] hao yan commented on LUCENE-2903: - Thank you both! Thanks for testing my codec so quickly, Michael! RE: One question: it looks like this PFOR impl can only handle up to 28-bit-wide ints? Which means... could it fail on some cases? Though I suppose you would never see too many of these immense ints in one block, and so they'd always be encoded as exceptions and so it's actually safe...? Hao: This won't fail. In my PFOR impl, I first call checkBigNumbers() to see if there is any number >= 2^28; if there is, I force encoding of the lower 4 bits using the 128 4-bit slots. Thus, all exceptions left to Simple16 are < 2^28, which it can definitely handle. So there are no failure cases! :) BTW, my PFOR impl saves more index size than VInt and the other PFOR impls. Thus, if the use case is real-time search, which requires loading the index from disk to memory frequently, my PFOR impl may save even more. Improvement of PForDelta Codec -- Key: LUCENE-2903 URL: https://issues.apache.org/jira/browse/LUCENE-2903 Project: Lucene - Java Issue Type: Improvement Reporter: hao yan Attachments: LUCENE-2903.patch, LUCENE-2903.patch, for_pfor.patch There are 3 versions of PForDelta implementations in the Bulk Branch: FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2. The FrameOfRef is a very basic one, essentially a plain binary encoding (which may result in huge index size). The PatchedFrameOfRef is the implementation based on the original version of PForDelta in the literature. The PatchedFrameOfRef2 is my previous implementation, which is improved this time. (The codec name is changed to NewPForDelta.) In particular, the changes are: 1.
I fixed the bug of my previous version (in Lucene-1410.patch), where the old PForDelta did not support very large exceptions (since Simple16 does not support very large numbers). This has now been fixed in the new LCPForDelta. 2. I changed the PForDeltaFixedIntBlockCodec. It is now faster than the other two PForDelta implementations in the bulk branch (FrameOfRef and PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the CodecProvider and PForDeltaFixedIntBlockCodec. 3. The performance test results are: 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef for almost all kinds of queries, and only slightly worse than BulkVInt. 2) My NewPForDelta codec results in the smallest index size among all 4 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself). 3) All performance test results were achieved by running with -server instead of -client.
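Hao's trick (force-encoding the low bits whenever any value reaches 2^28, so that Simple16 only ever sees small exceptions) sits on top of plain frame-of-reference bit packing. As a hedged sketch of that shared core step only (not the patch's actual code, and with simplified method names), here is how a block of small ints is packed into a fixed bit width and unpacked again; a real PForDelta codec would additionally record the positions of values that do not fit and patch them from an exception list:

```java
public class ForDemo {
    // Pack each value (all assumed < 2^bits) into a long[] at a fixed bit width.
    static long[] pack(int[] values, int bits) {
        long[] out = new long[(values.length * bits + 63) / 64];
        int bitPos = 0;
        for (int v : values) {
            int word = bitPos / 64, off = bitPos % 64;
            out[word] |= ((long) v) << off;
            if (off + bits > 64) { // value straddles a word boundary
                out[word + 1] |= ((long) v) >>> (64 - off);
            }
            bitPos += bits;
        }
        return out;
    }

    static int[] unpack(long[] packed, int count, int bits) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int word = bitPos / 64, off = bitPos % 64;
            long v = packed[word] >>> off;
            if (off + bits > 64) {
                v |= packed[word + 1] << (64 - off);
            }
            out[i] = (int) (v & mask);
            bitPos += bits;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docDeltas = {3, 7, 1, 15, 2, 9, 4, 11};
        int bits = 4; // the largest value (15) fits in 4 bits
        int[] restored = unpack(pack(docDeltas, bits), docDeltas.length, bits);
        System.out.println(java.util.Arrays.equals(docDeltas, restored));
    }
}
```

The index-size win of such codecs comes from choosing `bits` per block: 128 doc-delta values at 4 bits occupy 64 bytes, versus at least 128 bytes for the same values as VInts.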
how can I get the similarity in fuzzy query
Hi, I am using FuzzyQuery to get fuzzy matched results. I want to get the similarity in percent for every matched record. For example, if I search for databasd, it will return results such as database, database1, and database11. I want to get the similarity in percent for every record, such as 87.5%, 75%, and 62.5%. How can I do this? Any ideas? Wen Gao
[jira] Commented: (SOLR-2367) DataImportHandler unit tests are very noisy
[ https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995466#comment-12995466 ] Gunnlaugur Thor Briem commented on SOLR-2367: - Oh, right, much neater. DataImportHandler unit tests are very noisy --- Key: SOLR-2367 URL: https://issues.apache.org/jira/browse/SOLR-2367 Project: Solr Issue Type: Improvement Components: Build, contrib - DataImportHandler Reporter: Gunnlaugur Thor Briem Assignee: Robert Muir Priority: Trivial Attachments: SOLR-2367.patch, SOLR-2367.patch Original Estimate: 5m Remaining Estimate: 5m Running DataImportHandler unit tests emits a lot of console noise, mainly stacktraces because dataimport.properties can't be written. This makes it hard to scan the output for useful information. I'm attaching a patch to get rid of most of the noise by creating the conf directory before test runs so that the properties file write doesn't fail. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: how can I get the similarity in fuzzy query
http://wiki.apache.org/lucene-java/ScoresAsPercentages DIGY -Original Message- From: Wen Gao [mailto:samuel.gao...@gmail.com] Sent: Wednesday, February 16, 2011 8:55 PM To: lucene-net-...@lucene.apache.org Subject: how can I get the similarity in fuzzy query Hi, I am using FuzzyQuery to get fuzzy mathed results. I want to get the similarity in percent for every matched record. for example, if i search for databasd, and it will return results such as database, database1, and database11. I want to get the similarity in percent for evey record, such as 87.5%, 75%, and 62.5%. How can I do this? Any ideas? Wen Gao
[jira] Updated: (SOLR-2367) DataImportHandler unit tests are very noisy
[ https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunnlaugur Thor Briem updated SOLR-2367: Attachment: SOLR-2367-extend-SolrException.patch If it helps, here's a patch that makes DataImportHandlerException extend SolrException (and deprecates a constructor that seems not to be used anywhere). All tests pass, but beyond that this has not been tried out at runtime (and maybe the change isn't even appropriate?) ... does this make the exception silencing work? DataImportHandler unit tests are very noisy --- Key: SOLR-2367 URL: https://issues.apache.org/jira/browse/SOLR-2367 Project: Solr Issue Type: Improvement Components: Build, contrib - DataImportHandler Reporter: Gunnlaugur Thor Briem Assignee: Robert Muir Priority: Trivial Attachments: SOLR-2367-extend-SolrException.patch, SOLR-2367.patch, SOLR-2367.patch Original Estimate: 5m Remaining Estimate: 5m Running DataImportHandler unit tests emits a lot of console noise, mainly stacktraces because dataimport.properties can't be written. This makes it hard to scan the output for useful information. I'm attaching a patch to get rid of most of the noise by creating the conf directory before test runs so that the properties file write doesn't fail. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2923) remove writer.optimize() from contrib/demo
remove writer.optimize() from contrib/demo -- Key: LUCENE-2923 URL: https://issues.apache.org/jira/browse/LUCENE-2923 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.1, 4.0 I don't think we should include optimize in the demo; many people start from the demo and may think you must optimize to do searching, and that's clearly not the case. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2367) DataImportHandler unit tests are very noisy
[ https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995479#comment-12995479 ] Robert Muir commented on SOLR-2367: --- Thanks for the followup patch, I will try and see if i can use the exception ignores mechanism now with it... maybe this time it will work. DataImportHandler unit tests are very noisy --- Key: SOLR-2367 URL: https://issues.apache.org/jira/browse/SOLR-2367 Project: Solr Issue Type: Improvement Components: Build, contrib - DataImportHandler Reporter: Gunnlaugur Thor Briem Assignee: Robert Muir Priority: Trivial Attachments: SOLR-2367-extend-SolrException.patch, SOLR-2367.patch, SOLR-2367.patch Original Estimate: 5m Remaining Estimate: 5m Running DataImportHandler unit tests emits a lot of console noise, mainly stacktraces because dataimport.properties can't be written. This makes it hard to scan the output for useful information. I'm attaching a patch to get rid of most of the noise by creating the conf directory before test runs so that the properties file write doesn't fail. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: how can I get the similarity in fuzzy query
I was going to post the link that Digy posted, which suggests not to determine a match that way. If my understanding is correct, the scores returned for a query are relative to which documents were retrieved by the search: if a document is deleted from the index, the scores will change even though the query did not, because the number of returned documents is different. If the only thing you wanted to do was to calculate how close a resulting string is to a search string, I suggest the Levenshtein distance algorithm http://en.wikipedia.org/wiki/Levenshtein_distance ...but it doesn't seem like that's quite what you want to accomplish based on your question. Christopher On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com wrote: Hi, I am using FuzzyQuery to get fuzzy matched results. I want to get the similarity in percent for every matched record. For example, if I search for databasd, it will return results such as database, database1, and database11. I want to get the similarity in percent for every record, such as 87.5%, 75%, and 62.5%. How can I do this? Any ideas? Wen Gao
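The Levenshtein suggestion maps directly onto the percentages in the question. Assuming similarity is computed as 1 - editDistance / min(queryLength, termLength), which is roughly how Lucene's fuzzy matching scales edit distance (the exact normalization may differ by version), a small self-contained sketch reproduces 87.5%, 75%, and 62.5% for databasd against the three results:

```java
public class FuzzySimilarity {
    // Classic dynamic-programming Levenshtein edit distance.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Assumed normalization: 1 - editDistance / min(len(query), len(term)).
    static double similarity(String query, String term) {
        return 1.0 - (double) distance(query, term)
                     / Math.min(query.length(), term.length());
    }

    public static void main(String[] args) {
        for (String term : new String[] {"database", "database1", "database11"}) {
            System.out.println(term + " " + similarity("databasd", term));
        }
    }
}
```

This computes the string-to-string similarity independently of the hit list, so it avoids the relative-scoring caveat from the ScoresAsPercentages page.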
Re: how can I get the similarity in fuzzy query
Hi, I think my situation is just comparing the similarity of strings: I want to calculate the similarity between the typed query and the returned results using *FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery to 0.5f; what I want to do is get the similarity, instead of the score, for every result that is returned. Thanks for your time. Wen 2011/2/16 Christopher Currens currens.ch...@gmail.com I was going to post the link that Digy posted, which suggests not to determine a match that way. If my understanding is correct, the scores returned for a query are relative to which documents were retrieved by the search: if a document is deleted from the index, the scores will change even though the query did not, because the number of returned documents is different. If the only thing you wanted to do was to calculate how close a resulting string is to a search string, I suggest the Levenshtein distance algorithm http://en.wikipedia.org/wiki/Levenshtein_distance ...but it doesn't seem like that's quite what you want to accomplish based on your question. Christopher On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com wrote: Hi, I am using FuzzyQuery to get fuzzy matched results. I want to get the similarity in percent for every matched record. For example, if I search for databasd, it will return results such as database, database1, and database11. I want to get the similarity in percent for every record, such as 87.5%, 75%, and 62.5%. How can I do this? Any ideas? Wen Gao
[jira] Commented: (LUCENE-2923) cleanup contrib/demo
[ https://issues.apache.org/jira/browse/LUCENE-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995490#comment-12995490 ] Uwe Schindler commented on LUCENE-2923: --- Yeah, we should remove the optimize. Too many people tell me exactly that: they think they should optimize because they see it in almost every piece of demo code. With recent Lucene versions, optimizing is not needed anymore. It's hard to explain to people, so example code and books should never tell them to optimize. Books about Lucene should instead explain when optimizing is needed or useful, to keep people from always doing it. cleanup contrib/demo Key: LUCENE-2923 URL: https://issues.apache.org/jira/browse/LUCENE-2923 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.1, 4.0 I don't think we should include optimize in the demo; many people start from the demo and may think you must optimize to do searching, and that's clearly not the case. I think we should also use a buffered reader in FileDocument? And... I'm tempted to remove IndexHTML (and the html parser) entirely. It's ancient, and we now have Tika to extract text from many doc formats.
[jira] Updated: (LUCENE-2923) cleanup contrib/demo
[ https://issues.apache.org/jira/browse/LUCENE-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2923: --- Attachment: LUCENE-2923.patch Patch. cleanup contrib/demo Key: LUCENE-2923 URL: https://issues.apache.org/jira/browse/LUCENE-2923 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2923.patch I don't think we should include optimize in the demo; many people start from the demo and may think you must optimize to do searching, and that's clearly not the case. I think we should also use a buffered reader in FileDocument? And... I'm tempted to remove IndexHTML (and the html parser) entirely. It's ancient, and we now have Tika to extract text from many doc formats. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2923) cleanup contrib/demo
[ https://issues.apache.org/jira/browse/LUCENE-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995513#comment-12995513 ] Mark Miller commented on LUCENE-2923: - bq. I think we should also use a buffered reader in FileDocument? And close the reader... cleanup contrib/demo Key: LUCENE-2923 URL: https://issues.apache.org/jira/browse/LUCENE-2923 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2923.patch I don't think we should include optimize in the demo; many people start from the demo and may think you must optimize to do searching, and that's clearly not the case. I think we should also use a buffered reader in FileDocument? And... I'm tempted to remove IndexHTML (and the html parser) entirely. It's ancient, and we now have Tika to extract text from many doc formats. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2923) cleanup contrib/demo
[ https://issues.apache.org/jira/browse/LUCENE-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995518#comment-12995518 ] Mark Miller commented on LUCENE-2923: - bq. I don't think we should include optimize in the demo; I wonder if it wouldn't be better to leave it, but commented out, with a short explanation. Optimizing is not necessary, but it clearly has benefits for query perf! If you are not updating often, I think it can make perfect sense. So I'm fine with just dropping it, but I'm not sure whether commenting it out and putting in something like: // for an index that is not updated often, we might optimize now (or some variation) wouldn't be better... cleanup contrib/demo Key: LUCENE-2923 URL: https://issues.apache.org/jira/browse/LUCENE-2923 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2923.patch I don't think we should include optimize in the demo; many people start from the demo and may think you must optimize to do searching, and that's clearly not the case. I think we should also use a buffered reader in FileDocument? And... I'm tempted to remove IndexHTML (and the html parser) entirely. It's ancient, and we now have Tika to extract text from many doc formats.
Re: how can I get the similarity in fuzzy query
If you are running in VS 2010, I'd advise saving yourself some trouble and just grabbing the 2.9.2 package off nuget. On Wed, Feb 16, 2011 at 3:13 PM, Wen Gao samuel.gao...@gmail.com wrote: Thanks you. Wen 2011/2/16 Digy digyd...@gmail.com Download the source from https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_2 using a svn client(like TortoiseSVN), and open the project file with VS20XX. DIGY -Original Message- From: Wen Gao [mailto:samuel.gao...@gmail.com] Sent: Wednesday, February 16, 2011 9:58 PM To: lucene-net-...@lucene.apache.org Subject: Re: how can I get the similarity in fuzzy query OK. i get it. how can I recompile a Lucene_src on Windows? Thanks. Wen 2011/2/16 Christopher Currens currens.ch...@gmail.com As far as i know, you'll need to calculate that manually. FuzzyQuery searches don't return any results like that. On Wed, Feb 16, 2011 at 11:47 AM, Wen Gao samuel.gao...@gmail.com wrote: Hi, I think my situation is just to compare the similarity of strings: I want to calculate the similarity between the typed results and the returned results using *FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery as 0.5f, what i want to do is get the similariy instead of score for every result that returns. Thanks for your time. Wen 2011/2/16 Christopher Currens currens.ch...@gmail.com I was going to post the link that Digy posted, which suggests not to determine a match that way. If my understanding is correct, the scores returned for a query are relative to which documents were retrieved by the search, in that if a document is deleted from the index, the scores will change even though the query did not, because the number of returned documents are different. 
If the only thing you wanted to do was to calculate how a resulting string was to a search string, I suggest the Levenshtein Distance algorithm http://en.wikipedia.org/wiki/Levenshtein_distance...but it doesn't seem like that's quite what you want to accomplish based on your question. Christopher On Wed, Feb 16, 2011 at 10:55 AM, Wen Gao samuel.gao...@gmail.com wrote: Hi, I am using FuzzyQuery to get fuzzy mathed results. I want to get the similarity in percent for every matched record. for example, if i search for databasd, and it will return results such as database, database1, and database11. I want to get the similarity in percent for evey record, such as 87.5%, 75%, and 62.5%. How can I do this? Any ideas? Wen Gao
[jira] Commented: (SOLR-2367) DataImportHandler unit tests are very noisy
[ https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995537#comment-12995537 ] Robert Muir commented on SOLR-2367: --- I tried to use your patch and silence the tests in various ways... I was unsuccessful. It's a mystery to me, really (because I don't understand the code that well), that all these exceptions are being thrown and nothing is failing... so I'm not sure how to silence them. Let's commit the first patch and fix 80% of the problem... maybe we can figure out the other exceptions in the future. I'll keep the issue open. DataImportHandler unit tests are very noisy --- Key: SOLR-2367 URL: https://issues.apache.org/jira/browse/SOLR-2367 Project: Solr Issue Type: Improvement Components: Build, contrib - DataImportHandler Reporter: Gunnlaugur Thor Briem Assignee: Robert Muir Priority: Trivial Attachments: SOLR-2367-extend-SolrException.patch, SOLR-2367.patch, SOLR-2367.patch Original Estimate: 5m Remaining Estimate: 5m Running DataImportHandler unit tests emits a lot of console noise, mainly stacktraces because dataimport.properties can't be written. This makes it hard to scan the output for useful information. I'm attaching a patch to get rid of most of the noise by creating the conf directory before test runs so that the properties file write doesn't fail. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2365) DIH should not be in the Solr war
[ https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995540#comment-12995540 ] David Smiley commented on SOLR-2365: Uwe; are you willing to put fix-for of 3.1 on this or is that a touchy subject? ;-P DIH should not be in the Solr war - Key: SOLR-2365 URL: https://issues.apache.org/jira/browse/SOLR-2365 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Attachments: SOLR-2365_DIH_should_not_be_in_war.patch The DIH has a build.xml that puts itself into the Solr war file. This is the only contrib module that does this, and I don't think it should be this way. Granted there is a small dataimport.jsp file that would be most convenient to remain included, but the jar should not be. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (SOLR-2369) Zookeeper depends on log4j, thus also SolrCloud does
Zookeeper depends on log4j, thus also SolrCloud does Key: SOLR-2369 URL: https://issues.apache.org/jira/browse/SOLR-2369 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 3.1 Reporter: Jan Høydahl Reproduce: 1. Use default Solr example build (with JDK logging) 2. Run example C on http://wiki.apache.org/solr/SolrCloud 3. You get Exception: java.lang.NoClassDefFoundError: org/apache/log4j/jmx/HierarchyDynamicMBean at org.apache.zookeeper.jmx.ManagedUtil.registerLog4jMBeans(ManagedUtil.java:51) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:114) at org.apache.solr.cloud.SolrZkServer$1.run(SolrZkServer.java:111) Probable reason: Zookeeper depends on log4j Quickfix: Switch to log4j logging (as you cannot include both the log4j bridge and log4j itself): * Remove log4j-over-slf4j-1.5.5.jar and slf4j-jdk14-1.5.5.jar * Add slf4j-log4j12.jar and log4j-1.2.16.jar Document the shortcoming in release notes Long term fix: Vote for the resolution of ZOOKEEPER-850, which switches ZK to slf4j logging
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1299#comment-1299 ] David Smiley commented on SOLR-1553: I'm confused about why this cool query parser I've been using is experimental. Sure, there are opportunities for improvement, but it's already better than the original dismax which this makes obsolete. No? extended dismax query parser Key: SOLR-1553 URL: https://issues.apache.org/jira/browse/SOLR-1553 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Assignee: Yonik Seeley Fix For: 3.1 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, edismax.userFields.patch An improved user-facing query parser based on dismax -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-2366: -- Attachment: SOLR-2366.patch Added more tests, cleaned up the patch, all tests pass. I think it is ready to commit and will do so in a day or two or maybe this weekend. Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. I'd propose the syntax to be a comma separated list of sizes for each bucket. If only one value is specified, then it behaves as it currently does. Otherwise, it creates the different size buckets. If the number of buckets doesn't evenly divide up the space, then the size of the last bucket specified is used to fill out the remaining space (not sure on this) For instance, facet.range.start=0 facet.range.end=400 facet.range.gap=5,25,50,100 would yield buckets of: 0-5,5-30,30-80,80-180,180-280,280-380,380-400 -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
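The variable-gap semantics proposed in the issue description (repeat the last gap until the end, clamping the final bucket) can be sketched outside Solr. This is a hypothetical helper written for this example, not actual Solr code or the committed patch.

```java
// Sketch of the proposed facet.range.gap=5,25,50,100 semantics:
// consume the listed gaps in order, then reuse the last gap to fill the
// remaining space, clamping the final bucket at `end`. Hypothetical code,
// not part of Solr.
import java.util.ArrayList;
import java.util.List;

public class RangeGaps {
    static List<int[]> buckets(int start, int end, int[] gaps) {
        List<int[]> out = new ArrayList<>();
        int lo = start, i = 0;
        while (lo < end) {
            int gap = gaps[Math.min(i++, gaps.length - 1)]; // last gap repeats
            int hi = Math.min(lo + gap, end);               // clamp at end
            out.add(new int[] { lo, hi });
            lo = hi;
        }
        return out;
    }

    public static void main(String[] args) {
        // Reproduces the example: 0-5,5-30,30-80,80-180,180-280,280-380,380-400
        for (int[] b : RangeGaps.buckets(0, 400, new int[] { 5, 25, 50, 100 })) {
            System.out.println(b[0] + "-" + b[1]);
        }
    }
}
```

With start=0, end=400, gaps=5,25,50,100 this yields the seven buckets listed in the issue, including the clamped final 380-400 bucket.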
[jira] Commented: (SOLR-756) Make DisjunctionMaxQueryParser generally useful by supporting all query types.
[ https://issues.apache.org/jira/browse/SOLR-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995567#comment-12995567 ] David Smiley commented on SOLR-756: --- Jan, you refer to the Extended Dismax QParser -- and the answer is no. I think you intended to comment on SOLR-758. This patch here, as I said in a comment above here https://issues.apache.org/jira/browse/SOLR-756?focusedCommentId=12630223page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12630223 only has to do with a specific improvement to SolrPluginUtils.java, that is an enabler for other improvements to DismaxQParser. According to Hoss, I need to add tests for this issue. Make DisjunctionMaxQueryParser generally useful by supporting all query types. -- Key: SOLR-756 URL: https://issues.apache.org/jira/browse/SOLR-756 Project: Solr Issue Type: Improvement Affects Versions: 1.3 Reporter: David Smiley Fix For: Next Attachments: SolrPluginUtilsDisMax.patch This is an enhancement to the DisjunctionMaxQueryParser to work on all the query variants such as wildcard, prefix, and fuzzy queries, and to support working in AND scenarios that are not processed by the min-should-match DisMax QParser. This was not in Solr already because DisMax was only used for a very limited syntax that didn't use those features. In my opinion, this makes a more suitable base parser for general use because unlike the Lucene/Solr parser, this one supports multiple default fields whereas other ones (say Yonik's {!prefix} one for example, can't do dismax). The notion of a single default field is antiquated and a technical under-the-hood detail of Lucene that I think Solr should shield the user from by on-the-fly using a DisMax when multiple fields are used. (patch to be attached soon) -- This message is automatically generated by JIRA. 
[jira] Commented: (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995570#comment-12995570 ] Alex Cowell commented on SOLR-2358: --- bq. Since this functionality is core to Solr and should always be present, it would be natural to either build it into the DirectUpdateHandler2 or to add this processor to the set of default UpdateProcessors that are executed if no update.processor parameter is specified. What advantage would we gain from moving this functionality into DirectUpdateHandler2? From what I understand, the UpdateHandler deals directly with the index whereas the DistributedUpdateRequestProcessor merely takes requests deemed to be distributed by the request handler and distributes them to a list of shards based on a distribution policy. Distributing Indexing - Key: SOLR-2358 URL: https://issues.apache.org/jira/browse/SOLR-2358 Project: Solr Issue Type: New Feature Reporter: William Mayor Priority: Minor Attachments: SOLR-2358.patch The first steps towards creating distributed indexing functionality in Solr -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-1191) NullPointerException in delta import
[ https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-1191. Resolution: Fixed Fix Version/s: (was: 1.4) 3.1 Assignee: (was: Noble Paul) Thanks Gunnlaugur, I committed to trunk and 3x. NullPointerException in delta import Key: SOLR-1191 URL: https://issues.apache.org/jira/browse/SOLR-1191 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3, 1.4 Environment: OS: Windows Linux. Java: 1.6 DB: MySQL SQL Server Reporter: Ali Syed Fix For: 3.1 Attachments: SOLR-1191.patch, SOLR-1191.patch Seeing few of these NullPointerException during delta imports. Once this happens delta import stops working and keeps giving the same error. java.lang.NullPointerException at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355) Running delta import for a particular entity fixes the problem and delta import start working again. 
Here is the log just before after the exception 05/27 11:59:29 86987686 INFO btpool0-538 org.apache.solr.core.SolrCore - [localhost] webapp=/solr path=/dataimport params={command=delta-importoptimize=false} status=0 QTime=0 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DataImporter - Starting Delta Import 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.SolrWriter - Read dataimport.properties 05/27 11:59:29 86987687 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: content 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: content rows obtained : 0 05/27 11:59:29 86987690 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: content rows obtained : 0 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: content 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: job 05/27 11:59:29 86987692 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB 05/27 11:59:29 86987704 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 12 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: job rows obtained : 0 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: job rows obtained : 0 
05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed parentDeltaQuery for Entity: job 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Delta Import completed successfully 05/27 11:59:29 86987707 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Starting delta collection. 05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Running ModifiedRowKey() for Entity: user 05/27 11:59:29 86987709 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB 05/27 11:59:29 86987716 INFO Thread-4162 org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 7 05/27 11:59:29 86987873 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed ModifiedRowKey for Entity: user rows obtained : 46 05/27 11:59:29 86987873 INFO Thread-4162 org.apache.solr.handler.dataimport.DocBuilder - Completed DeletedRowKey for Entity: user rows obtained : 0 05/27 11:59:29 86987873 INFO
[jira] Updated: (SOLR-2367) DataImportHandler unit tests are very noisy
[ https://issues.apache.org/jira/browse/SOLR-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunnlaugur Thor Briem updated SOLR-2367: Attachment: SOLR-2367-log-exceptions-through-SolrException.patch Here goes the remaining 20% — I'm attaching SOLR-2367-log-exceptions-through-SolrException.patch which makes {{DataImportHandler}} log exceptions through {{SolrException.log()}} instead of directly into the logger. This way the exception-ignoring mechanism gets a say in matters. Test output is nice and clean now. I addressed only those logger calls that were emitting exceptions in unit test runs. Note: this does *not* require {{DataImportHandlerException}} to extend {{SolrException}}, so the earlier SOLR-2367-extend-SolrException.patch is not needed. (Might still be worthwhile, I don't know — but not needed for this fix). DataImportHandler unit tests are very noisy --- Key: SOLR-2367 URL: https://issues.apache.org/jira/browse/SOLR-2367 Project: Solr Issue Type: Improvement Components: Build, contrib - DataImportHandler Reporter: Gunnlaugur Thor Briem Assignee: Robert Muir Priority: Trivial Attachments: SOLR-2367-extend-SolrException.patch, SOLR-2367-log-exceptions-through-SolrException.patch, SOLR-2367.patch, SOLR-2367.patch Original Estimate: 5m Remaining Estimate: 5m Running DataImportHandler unit tests emits a lot of console noise, mainly stacktraces because dataimport.properties can't be written. This makes it hard to scan the output for useful information. I'm attaching a patch to get rid of most of the noise by creating the conf directory before test runs so that the properties file write doesn't fail. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2365) DIH should not be in the Solr war
[ https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995583#comment-12995583 ] Uwe Schindler commented on SOLR-2365: - +1; who wants to set the touchy fix version? DIH should not be in the Solr war - Key: SOLR-2365 URL: https://issues.apache.org/jira/browse/SOLR-2365 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Attachments: SOLR-2365_DIH_should_not_be_in_war.patch The DIH has a build.xml that puts itself into the Solr war file. This is the only contrib module that does this, and I don't think it should be this way. Granted there is a small dataimport.jsp file that would be most convenient to remain included, but the jar should not be. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2365) DIH should not be in the Solr war
[ https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2365: Fix Version/s: 4.0 3.1 DIH should not be in the Solr war - Key: SOLR-2365 URL: https://issues.apache.org/jira/browse/SOLR-2365 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2365_DIH_should_not_be_in_war.patch The DIH has a build.xml that puts itself into the Solr war file. This is the only contrib module that does this, and I don't think it should be this way. Granted there is a small dataimport.jsp file that would be most convenient to remain included, but the jar should not be. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995585#comment-12995585 ] Hoss Man commented on SOLR-2366: The use case of facet.range (and facet.date before it) was always about having ranges generated for you automatically using a fixed gap size. If you want variable gap sizes, it's just as easy to specify them using facet.query. I don't really understand how your proposal adds value over using facet.query for the ranges you want to have specific widths, and then using facet.range for the rest of the ranges you want generated automatically with a specific gap. It just seems like a more confusing way of expressing the same thing. Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. I'd propose the syntax to be a comma separated list of sizes for each bucket. If only one value is specified, then it behaves as it currently does. Otherwise, it creates the different size buckets. If the number of buckets doesn't evenly divide up the space, then the size of the last bucket specified is used to fill out the remaining space (not sure on this) For instance, facet.range.start=0 facet.range.end=400 facet.range.gap=5,25,50,100 would yield buckets of: 0-5,5-30,30-80,80-180,180-280,280-380,380-400
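Hoss's facet.query alternative can be written directly as request parameters. The field name `dist` and the bucket edges below are illustrative, taken from the walking/driving example in the issue description:

```text
q=*:*&facet=true
&facet.query=dist:[0 TO 5]
&facet.query=dist:[5 TO 150]
&facet.query=dist:[150 TO *]
```

Each facet.query returns its own count, so arbitrarily sized buckets come for free; facet.range can still be used alongside for the evenly spaced remainder.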
[jira] Commented: (SOLR-2368) Improve extended dismax (edismax) parser
[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995587#comment-12995587 ] Jan Høydahl commented on SOLR-2368: --- I agree with David's comments on SOLR-1553 that edismax is already good enough to replace dismax, as it is clearly better, more useful and also backward compatible. It may still need some tuning, but not replacing dismax now in 3.1 could be an example of perfect being the enemy of good :) In Cominvent, we've been using edismax as the main query parser on all customer projects for several months now, and it is clearly much better than the old dismax, which is not robust enough, nor does it allow the syntaxes which people have come to expect. We have not seen any bugs or instabilities on any of the sites where it is live: www.dn.no, www.libris.no, http://www.rechargenews.com/search?q=oil+AND+(usa+OR+eu) and many more. May I suggest the following for 3.1: * defType=dismax is changed to point to Extended DisMax * defType=basicdismax is pointed to the old Basic DisMax (to give people a way to revert if needed) * defType=edismax is dropped (or added as a temporary alias to dismax) * The wiki page http://wiki.apache.org/solr/DisMaxQParserPlugin is edited to reflect the changes, and specific parameters or features which are likely to change in the future are marked as experimental, may change to warn people. Improve extended dismax (edismax) parser Key: SOLR-2368 URL: https://issues.apache.org/jira/browse/SOLR-2368 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Improve edismax and replace dismax once it has all of the needed features.
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995588#comment-12995588 ] Hoss Man commented on SOLR-1553: bq. I'm confused about why this cool query parser I've been using is experimental Because some of its current default behavior is less than ideal, particularly for people migrating from dismax (i.e. see comments about making field queries configurable), and in a few cases even broken compared to how it worked when the patch was initially committed (see recent comments about foo:bar when foo is *not* a field). In general, marking it experimental is a way to allow us to leave it in the 3.1 release but still have the flexibility to modify the default behavior moving forward. extended dismax query parser Key: SOLR-1553 URL: https://issues.apache.org/jira/browse/SOLR-1553 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Assignee: Yonik Seeley Fix For: 3.1 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, edismax.userFields.patch An improved user-facing query parser based on dismax
[jira] Commented: (SOLR-2365) DIH should not be in the Solr war
[ https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995591#comment-12995591 ] Hoss Man commented on SOLR-2365: +1 We need to make sure to call this out at the top of CHANGES.txt so people upgrading from 1.x know they *must* modify their solrconfig.xml (to add the {{lib/}} directive) if they use DIH ... but yeah, if it doesn't need to be in the war for that JSP to work, then let's keep it as an isolated contrib jar. DIH should not be in the Solr war - Key: SOLR-2365 URL: https://issues.apache.org/jira/browse/SOLR-2365 Project: Solr Issue Type: Improvement Components: Build Reporter: David Smiley Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2365_DIH_should_not_be_in_war.patch The DIH has a build.xml that puts itself into the Solr war file. This is the only contrib module that does this, and I don't think it should be this way. Granted there is a small dataimport.jsp file that would be most convenient to remain included, but the jar should not be.
[jira] Commented: (SOLR-1581) Facet by Function
[ https://issues.apache.org/jira/browse/SOLR-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995594#comment-12995594 ] Hoss Man commented on SOLR-1581: could probably reuse the sort parsing code for this ... it does a pretty good job of doing a quick test for field names, then looking for a matching function, then falling back to an assumption of esoteric field names Facet by Function - Key: SOLR-1581 URL: https://issues.apache.org/jira/browse/SOLR-1581 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Fix For: Next It would be really great if we could execute a function and quantize it into buckets that could then be returned as facets. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2368) Improve extended dismax (edismax) parser
[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995596#comment-12995596 ] Hoss Man commented on SOLR-2368: {quote} May I suggest the following for 3.1: * defType=dismax is changed to point to Extended DisMax {quote} -1 Beyond the key value of don't break on malformed input that using edismax would bring to existing dismax users, edismax's default behavior changes too many things for me to want to recommend it to existing dismax users (or change the default out from under them). The code will be there in 3.1, and savvy users can use it, and we can fix the bugs and defaults as we move forward. Improve extended dismax (edismax) parser Key: SOLR-2368 URL: https://issues.apache.org/jira/browse/SOLR-2368 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Improve edismax and replace dismax once it has all of the needed features.
[jira] Commented: (SOLR-2368) Improve extended dismax (edismax) parser
[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995606#comment-12995606 ] Jan Høydahl commented on SOLR-2368: --- As much as I believe the known issues will affect only a tiny percentage of existing (or new) dismax users, I have no problem with a more phased approach. Trying to see what's best for the user community. On a humorous note, if I were the non-savvy user upgrading Solr from 1.4.1 to 3.1, I'd for sure read those release notes carefully and test it all, given the huge version leap :) It would really help a quicker resolution of this long-running issue if the current edismax features and params were documented on the Wiki for others to test, and if all known bugs and planned improvements were detailed here or linked to this issue, so that I and others may know how to contribute. Improve extended dismax (edismax) parser Key: SOLR-2368 URL: https://issues.apache.org/jira/browse/SOLR-2368 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Improve edismax and replace dismax once it has all of the needed features.
[jira] Commented: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work
[ https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995612#comment-12995612 ] Hoss Man commented on SOLR-2348: Committed revision 1071459. - trunk Working on 3x backporting now. No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work --- Key: SOLR-2348 URL: https://issues.apache.org/jira/browse/SOLR-2348 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 3.1, 4.0 Attachments: SOLR-2348.patch, SOLR-2348.patch For the same reasons outlined in SOLR-2339, Solr FieldTypes that return FieldCache backed ValueSources should explicitly check for situations where it knows the FieldCache is meaningless.
Re: Solr-dev mailing list on Nabble
: As I mentioned on the solr-dev mailing list : http://lucene.472066.n3.nabble.com/wind-down-for-3-1-tp2414923p2483929.html, : David Smiley's responses to emails on dev@l.a.o have been going to : solr-dev@l.a.o. This is a problem, and it's not restricted to David's : emails. What problem does it cause? : I put up a support request on Nabble: : http://nabble-support.1.n2.nabble.com/solr-dev-mailing-list-td6023495.html : and the only response so far seems to indicate that mailing lists are : managed by admins associated with the project with which each mailing : list is associated. This is relatively new -- it's a change they made several years ago, but at the time the solr-dev and java-dev archives were set up, anyone could add/configure a list archive forum -- even the description was community editable (I know I remember writing the description on that page, but I just checked and I don't have a Nabble account). I've even received emails from Nabble telling me that forums I'm the admin of (i.e. I asked them to start archiving a mailing list) are scheduled for deletion due to inactivity, but when I try to log in or recover the password for the account they emailed, their system says I have no account. According to the People pages for solr-dev and java-dev, some guy named Hugo is the only administrator of those forums... http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=app_people&node=506503&filter=Administrators http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=app_people&node=564358&filter=Administrators -Hoss
[jira] Created: (SOLR-2370) Let some UpdateProcessors be default without explicitly configuring them
Let some UpdateProcessors be default without explicitly configuring them
------------------------------------------------------------------------

         Key: SOLR-2370
         URL: https://issues.apache.org/jira/browse/SOLR-2370
     Project: Solr
  Issue Type: Improvement
  Components: update
    Reporter: Jan Høydahl

Problem: Today the user needs to make sure that crucial UpdateProcessors like the Log- and Run UpdateProcessors are present when creating a new UpdateRequestProcessorChain. This is error prone, and when introducing a new core UpdateProcessor, as in SOLR-2358, all existing users need to insert the changes into all their pipelines. A custom-made pipeline should not need to care about distributed indexing, logging or anything else, and should be as slim as possible.

Proposal: The proposal is to borrow the first-components and last-components pattern used in RequestHandler configs. That way, we could let all core processors be included either first or last by default in all UpdateChains. To do this, we need a place to configure the defaults, e.g. via a default=true param:

{code:xml}
<updateRequestProcessorChain name="default" default="true">
  <first-processors>
    <processor class="solr.DistributedUpdateRequestProcessor"/>
  </first-processors>
  <last-processors>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </last-processors>
</updateRequestProcessorChain>
{code}

Next, the custom-made chain will be only the center part:

{code:xml}
<updateRequestProcessorChain name="mychain">
  <processor class="my.nice.DoSomethingProcessor"/>
  <processor class="my.nice.DoAnotherThingProcessor"/>
</updateRequestProcessorChain>
{code}

To override the core processors config for a particular chain, you would start a clean chain with the parameter reset=true:

{code:xml}
<updateRequestProcessorChain name="mychain" reset="true">
  <processor class="my.nice.DoSomethingProcessor"/>
  <processor class="my.nice.DoAnotherThingProcessor"/>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}

If you only need to make sure that one of your custom processors runs at the very beginning or the very end, you could use:

{code:xml}
<updateRequestProcessorChain name="mychain">
  <processor class="my.nice.DoSomethingProcessor"/>
  <processor class="my.nice.DoAnotherThingProcessor"/>
  <last-processors>
    <processor class="solr.MySpecialDebugProcessor" />
  </last-processors>
</updateRequestProcessorChain>
{code}

The default should be reset=false, but the example schema could keep the default chain commented out to provide backward compatibility for upgraders.
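The merge semantics proposed above can be sketched in a few lines of Java. This is a hypothetical illustration only: the `assemble` function and its string-based processor lists are invented for clarity, not part of Solr; `first-processors`, `last-processors`, and `reset` are the names used in the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChainAssembly {
    // Sketch of the proposed semantics: a custom chain inherits the default
    // chain's first-processors and last-processors unless it sets reset=true,
    // in which case only the processors it declares itself are used.
    static List<String> assemble(List<String> defaultFirst, List<String> defaultLast,
                                 List<String> custom, boolean reset) {
        if (reset) {
            return new ArrayList<>(custom); // reset=true: a clean chain
        }
        List<String> chain = new ArrayList<>(defaultFirst); // core processors first
        chain.addAll(custom);                               // the "center part"
        chain.addAll(defaultLast);                          // core processors last
        return chain;
    }

    public static void main(String[] args) {
        List<String> chain = assemble(
            Arrays.asList("solr.DistributedUpdateRequestProcessor"),
            Arrays.asList("solr.LogUpdateProcessorFactory", "solr.RunUpdateProcessorFactory"),
            Arrays.asList("my.nice.DoSomethingProcessor", "my.nice.DoAnotherThingProcessor"),
            false);
        System.out.println(chain);
    }
}
```

With reset=false the custom chain ends up wrapped by the defaults; with reset=true it is used verbatim, which is why the reset chain in the example above has to re-declare solr.RunUpdateProcessorFactory itself.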
[jira] Commented: (SOLR-2358) Distributing Indexing
[ https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995628#comment-12995628 ]

Jan Høydahl commented on SOLR-2358:
-----------------------------------

I'm not sure if DirectUpdateHandler2 is the right location either. My point is that the user should not need to manually make sure that the UpdateProcessor is present in all his UpdateChains for distributed indexing to work. See new issue SOLR-2370 for a suggestion on how to tackle this.

> Distributing Indexing
> ---------------------
>
>          Key: SOLR-2358
>          URL: https://issues.apache.org/jira/browse/SOLR-2358
>      Project: Solr
>   Issue Type: New Feature
>     Reporter: William Mayor
>     Priority: Minor
>  Attachments: SOLR-2358.patch
>
> The first steps towards creating distributed indexing functionality in Solr
[jira] Resolved: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work
[ https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-2348.
----------------------------
    Resolution: Fixed

Committed revision 1071480. - 3x

> No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work
> -----------------------------------------------------------------------------------------------
>
>          Key: SOLR-2348
>          URL: https://issues.apache.org/jira/browse/SOLR-2348
>      Project: Solr
>   Issue Type: Bug
>     Reporter: Hoss Man
>     Assignee: Hoss Man
>      Fix For: 3.1, 4.0
>  Attachments: SOLR-2348.patch, SOLR-2348.patch
>
> For the same reasons outlined in SOLR-2339, Solr FieldTypes that return FieldCache-backed ValueSources should explicitly check for situations where it knows the FieldCache is meaningless.
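The kind of explicit check SOLR-2348 calls for can be sketched as a fail-fast guard run before handing out a FieldCache-backed ValueSource. This is a hedged illustration, not Solr's actual API: the class and method names here are invented, and the two conditions (unindexed, multivalued) are common examples of fields the FieldCache cannot meaningfully serve.

```java
public class FieldCacheGuard {
    // Illustrative guard: reject field configurations for which a
    // FieldCache-backed ValueSource is known to be meaningless, instead of
    // silently returning wrong or empty values at query time.
    static void checkFieldCacheSource(String field, boolean indexed, boolean multiValued) {
        if (!indexed) {
            throw new IllegalStateException(
                "can not use FieldCache on a field which is not indexed: " + field);
        }
        if (multiValued) {
            throw new IllegalStateException(
                "can not use FieldCache on a multivalued field: " + field);
        }
    }

    public static void main(String[] args) {
        checkFieldCacheSource("price", true, false); // indexed, single-valued: accepted
        try {
            checkFieldCacheSource("tags", true, true); // multivalued: rejected up front
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the issue is exactly this trade-off: an immediate, descriptive error at ValueSource creation time is far easier to debug than meaningless FieldCache contents surfacing later in function queries.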
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995642#comment-12995642 ]

Ryan McKinley commented on SOLR-1553:
-------------------------------------

The 'experimental' label is a flag to say that the behavior will likely change in the future -- since back compatibility is taken so seriously, this allows a way to add features before they are 100% cooked.

> extended dismax query parser
> ----------------------------
>
>          Key: SOLR-1553
>          URL: https://issues.apache.org/jira/browse/SOLR-1553
>      Project: Solr
>   Issue Type: New Feature
>     Reporter: Yonik Seeley
>     Assignee: Yonik Seeley
>      Fix For: 3.1
>  Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, edismax.userFields.patch
>
> An improved user-facing query parser based on dismax
[jira] Commented: (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995652#comment-12995652 ]

Grant Ingersoll commented on SOLR-2366:
---------------------------------------

bq. it just seems like a more confusing way of expressing the same thing

I think it's a lot less confusing. You only have to express start, end and the size of the buckets you want. With facet.query, you have to write out each expression for every bucket and do the math on all the boundaries. I don't think it is just as easy to specify using facet.query. Not to mention that facet.query also involves a lot more parsing.

> Facet Range Gaps
> ----------------
>
>          Key: SOLR-2366
>          URL: https://issues.apache.org/jira/browse/SOLR-2366
>      Project: Solr
>   Issue Type: Improvement
>     Reporter: Grant Ingersoll
>     Priority: Minor
>      Fix For: 3.2, 4.0
>  Attachments: SOLR-2366.patch, SOLR-2366.patch
>
> There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different-sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets.
>
> I'd propose the syntax to be a comma-separated list of sizes for each bucket. If only one value is specified, then it behaves as it currently does. Otherwise, it creates the different-sized buckets. If the buckets don't evenly divide up the space, then the size of the last bucket specified is used to fill out the remaining space (not sure on this).
>
> For instance:
>   facet.range.start=0
>   facet.range.end=400
>   facet.range.gap=5,25,50,100
> would yield buckets of: 0-5, 5-30, 30-80, 80-180, 180-280, 280-380, 380-400
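The comma-separated gap proposal in the issue description can be sanity-checked with a short sketch (a hypothetical helper, not Solr code): each gap sizes one bucket, the last gap repeats once the list is exhausted, and the final bucket is clipped at facet.range.end.

```java
import java.util.ArrayList;
import java.util.List;

public class VariableGapBuckets {
    // Sketch of the proposed facet.range.gap=5,25,50,100 semantics from
    // SOLR-2366: consume the gap list one bucket at a time, repeat the last
    // gap for the remaining space, and clip the final bucket at 'end'.
    static List<int[]> buckets(int start, int end, int[] gaps) {
        List<int[]> out = new ArrayList<>();
        int lo = start;
        int i = 0;
        while (lo < end) {
            int gap = gaps[Math.min(i, gaps.length - 1)]; // last gap repeats
            int hi = Math.min(lo + gap, end);             // clip at range end
            out.add(new int[]{lo, hi});
            lo = hi;
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        // Reproduces the example from the issue: start=0, end=400, gap=5,25,50,100
        for (int[] b : buckets(0, 400, new int[]{5, 25, 50, 100})) {
            System.out.println(b[0] + "-" + b[1]);
        }
        // prints 0-5, 5-30, 30-80, 80-180, 180-280, 280-380, 380-400
    }
}
```

Under this reading the arithmetic in the issue checks out: the 100-wide last gap repeats twice after 80-180, and the remaining 20 units form the clipped 380-400 bucket.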