Getting hits in RequestHandler
Hi, I am writing my own request handler and I was wondering how I go about getting a list of hits back. Thanks. -- View this message in context: http://www.nabble.com/Getting-hits-in-RequestHandler-tp24248810p24248810.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: nested dismax queries
Ensdorf Ken wrote: For example, a user might enter Alabama Biotechnology in the main search box, triggering a dismax request which returns lots of different types of results. They may then want to refine their search by selecting a specific industry from a drop-down box. We handle this by adding a filter query (fq=) to the original query. We have dozens of additional fields like this - some with a finite set of discrete values, some with arbitrary text values. The combinations are infinite, and I'm worried we will overwhelm the filterCache by supporting all of these cases as filter queries. Filter queries with arbitrary text values may swamp the cache in 1.3. Otherwise, the combinations aren't infinite. Keep the filters separate in order to limit their number. Specify two simple filters instead of one composite filter, fq=x:bla and fq=y:blub instead of fq=x:bla AND y:blub. See: filterCache/@size, queryResultCache/@size, documentCache/@size http://markmail.org/thread/tb6aanicpt43okcm Michael Ludwig
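To make the separate-filters advice concrete, here is what the two styles look like on the request line (field names and values are invented for illustration):

```
# composite filter: one cache entry per value combination
/select?q=Alabama+Biotechnology&defType=dismax&fq=industry:biotech+AND+state:AL

# separate filters: each fq is cached on its own and reused across queries
/select?q=Alabama+Biotechnology&defType=dismax&fq=industry:biotech&fq=state:AL
```

With separate filters, a later query for a different industry in the same state can reuse the cached state:AL bitset instead of creating a new combined entry.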
Re: plans for switching to maven2 (after 1.4 release)?
I'm not particularly opposed to it, but I'm not exactly for it either. I very much have a love-hate relationship with Maven. The simple things work fine w/ Maven and the power of pointing Eclipse or IntelliJ at a POM file and having the whole project imported and ready to work on w/o one iota of setup is something that the proponents of Ant just don't get, especially when it comes to multiple module builds like Solr and Lucene have. That being said, there are a lot of headaches with Maven, number one being releases, number two being anything custom and number three being the constant instability of the magic happening behind the scenes with it upgrading dependencies, etc. automatically. Finally, I've always had a hard time getting help in Maven land. It always seemed to me the number of incoming questions outweighed the number of answers about 10 to 1. I converted Mahout to Maven and it was a pain. I also use Maven for personal development as well. It is much easier to start fresh on Maven than it is to add it in later. And, there is something to be said for the Maven Ant plugin, but even that is clunky. In the end, I think I'd be +0 on it. It's also come up in the past on the lists and there never is a clear consensus. -Grant On Jun 28, 2009, at 12:33 PM, aldana wrote: hi, are there plans to migrate from ant to maven2? maybe not for the current trunk (mainline for 1.4), but maybe for the trunk after releasing solr 1.4. it makes the build more standard and easier to import to IDEs. - manuel aldana aldana((at))gmx.de software-engineering blog: http://www.aldana-online.de
Re: plans for switching to maven2 (after 1.4 release)?
I'll weigh in and throw a -1 to a Maven-only build system for Solr. If there is still a functioning Ant build, but Mavenites have a parallel setup, that's fine by me and I'd be -0 on that. These days, Buildr has my attention as a way to get the best of all worlds: access to Ant's powerful task library, POM/repo handling, AND Ruby :) Erik On Jun 29, 2009, at 9:01 AM, Grant Ingersoll wrote: [...]
Re: facets: case and accent insensitive sort
Thanks for your reply. I will have a look at this. Peter Wolanin wrote: Seems like this might be approached using a Lucene payload? For example where the original string is stored as the payload and available in the returned facets for display purposes? Payloads are byte arrays stored with Terms on Fields. See https://issues.apache.org/jira/browse/LUCENE-755 Solr seems to have support for a few example payloads already, like NumericPayloadTokenFilter. Almost any way you approach this it seems like there are potentially problems, since you might have multiple combinations of case and accent mapping to the same case-less accent-less value that you want to use for sorting (and I assume for counting) your facets? -Peter On Fri, Jun 26, 2009 at 9:02 AM, Sébastien Lamy lamys...@free.fr wrote: Shalin Shekhar Mangar wrote: On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy lamys...@free.fr wrote: If I use a copyField to store into a string type, and facet on that, my problem remains: the facets are sorted case- and accent-sensitive. And I want an *insensitive* sort. If I use a copyField to store into a type with no accents and case (e.g. alphaOnlySort), then Solr returns facet values with no accents and no case. And I want the facet values returned by Solr to *have accents and case*. Ah, of course you are right. There is no way to do this right now except at the client side. Thank you for your response. Would it be easy to modify Solr to behave as I want? Where should I start to investigate?
Excluding Characters and SubStrings in a Faceted Wildcard Query
Hello, I've been using SOLR for a while now, but am stuck for information on two issues: 1) Is it possible to exclude characters in a SOLR facet wildcard query? e.g. [^,]* to match any character except a comma? 2) Can one set up the facet wildcard query to return the exact substrings it matched of the queried facet, rather than the whole string? I hope somebody can help :) Thanks, Ben
Re: Excluding Characters and SubStrings in a Faceted Wildcard Query
Ben, Could you post an example of the type of data you're dealing with and how you want it handled? I suspect there is a way to accomplish what you want using an analyzed field, or by preprocessing the data you're indexing. Erik On Jun 29, 2009, at 9:29 AM, Ben wrote: [...]
Re: plans for switching to maven2 (after 1.4 release)?
On Jun 29, 2009, at 9:01 AM, Grant Ingersoll wrote: I converted Mahout to Maven and it was a pain. I'd add, however, that now that it is done, it is fine, except of course, that Maven 2.1.0 doesn't work with it apparently because of upgrades.
RE: nested dismax queries
Filter queries with arbitrary text values may swamp the cache in 1.3. Are you implying this won't happen in 1.4? Can you point me to the feature that would mitigate this? Otherwise, the combinations aren't infinite. Keep the filters separate in order to limit their number. Specify two simple filters instead of one composite filter, fq=x:bla and fq=y:blub instead of fq=x:bla AND y:blub. See: filterCache/@size, queryResultCache/@size, documentCache/@size http://markmail.org/thread/tb6aanicpt43okcm Michael Ludwig That's what I was thinking would make the most sense, assuming the intersection of the cached bitmaps is efficient enough. Thanks for the reply. -Ken
Re: Excluding Characters and SubStrings in a Faceted Wildcard Query
Hi Erik, I'm not sure exactly how much context you need here, so I'll try to keep it short and expand as needed. The column I am faceting on contains a comma-delimited set of vectors. Each vector is made up of {Make,Year,Model}, e.g. _ford_1996_focus,mercedes_1996_clk,ford_2000_focus. I have a custom request handler, where if I want to find all the cars from 1996 I pass in a facet query for the Year (1996), which is transformed to a wildcard facet query: _*_1996_*. In other words, it'll match any records whose vector column contains a string which somewhere has a car from 1996. Why not put the Make, Year and Model in separate columns and do a facet query over multiple columns? Because once we've selected 1996, we should (in the above example) then be offering ford and mercedes as further facet choices, and nothing more. If the parts were in their own columns, there would be no way to tie the Makes and Models to specific years, for example. At any rate, the wildcard search returns the entire match (_ford_1996_focus,mercedes_1996_clk,ford_2000_focus). I then have to do another RegExp over it to extract only the two parts (the first ford and mercedes) that were from 1996. This isn't using SOLR's cache very effectively. It would be excellent if SOLR could break up that comma-separated list into three different parts, and run the RegExp over each, returning only those which match. Is that what you're implying with Analysis? If that were the case, I'd not need to worry about character exclusion. Sorry if that's a bit fuzzy... it's hard trying to explain enough to be useful, but not too much that it turns into an essay!!! Thanks, Ben Erik Hatcher wrote: Ben, Could you post an example of the type of data you're dealing with and how you want it handled? I suspect there is a way to accomplish what you want using an analyzed field, or by preprocessing the data you're indexing.
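One hedged sketch of the analyzed-field idea Erik raises, assuming the comma only ever separates vectors: a field type that tokenizes the comma-delimited string into one token per {Make,Year,Model} triple, so facet counts and pattern matching operate on individual triples rather than the whole stored string (type and field names are invented; check that PatternTokenizerFactory is available in your Solr version):

```
<!-- hypothetical schema.xml fragment: "a,b,c" is indexed as tokens a / b / c -->
<fieldType name="carVector" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=","/>
  </analyzer>
</fieldType>
<field name="vectors" type="carVector" indexed="true" stored="true"/>
```

Faceting on such a field would then return _ford_1996_focus and _mercedes_1996_clk as separate values, so a _*_1996_* facet query no longer needs a second RegExp pass to split the match apart.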
Re: nested dismax queries
Ensdorf Ken wrote: Filter queries with arbitrary text values may swamp the cache in 1.3. Are you implying this won't happen in 1.4? I intended to say just this, but I was on the wrong track. Can you point me to the feature that would mitigate this? What I was thinking of is the following: [#SOLR-475] multi-valued faceting via un-inverted field https://issues.apache.org/jira/browse/SOLR-475 But as you can see, this refers to faceting on multi-valued fields, not to filter queries with arbitrary text. I was off on a tangent. Sorry. To get back to your initial mail, I tend to think that drop-down boxes (the values of which you control) are a nice match for the filter query, whereas user-entered text is more likely to be a candidate for the main query. Michael Ludwig
RE: nested dismax queries
Michael Ludwig wrote: [...] To get back to your initial mail, I tend to think that drop-down boxes (the values of which you control) are a nice match for the filter query, whereas user-entered text is more likely to be a candidate for the main query. I agree, which brings me back to the issue of combining dismax with standard queries. It looks like we may need to create a custom query parser to get optimal performance. Thanks again.
Entire heap consumed to answer initial ping()
JConsole shows the entire 2.1g heap consumed on the first request (a simple ping) to Solr after a Tomcat restart. After a Tomcat restart: 13140 tomcat virtual=2255m resident=183m ... jsvc After the ping(): 13140 tomcat virtual=2255m resident=2.0g ... jsvc JConsole says my Tenured Gen heap is at 100%. 08:06:02 Lucene Implementation Version: 2.9-dev 719313 - 2008-11-20 23:51:24 Java: JAVA_OPTS=-Xmx2048M -Xms2048M -XX:MaxPermSize=128M -Xshare:off i.e. about 2.1g I have several Solr test instances under this Tomcat. When one gets all the heap after a restart, I can't even open the admin interface to the others. Can someone advise me as to whether this is a Tomcat issue or a Solr issue? And an approach to fixing this? Thanks! Phil Farber
Re: plans for switching to maven2 (after 1.4 release)?
I know migrating to maven2 has its pain points, but in my view it is worth it if one sees it as a long-run investment. It follows standards/conventions, and importing projects to IDEs like Eclipse or IntelliJ is much more straightforward. When using Maven, getting used to a new project is also much quicker than grasping proprietary builds that reinvent the wheel. After having used maven2 for three years now I really couldn't live without it (though in the beginning, when migrating builds, I was swearing at its evil details). Support (documentation + mailing list) has also greatly improved since then. Because a smooth migration is not that easy, one should maybe make the cut after release 1.4 or 1.5? Though I am not so much into the codebase history, I would like to help out. Grant Ingersoll wrote: [...]
-- manuel aldana ald...@gmx.de software-engineering blog: http://www.aldana-online.de
RE: plans for switching to maven2 (after 1.4 release)?
FWIW I strongly agree with your sentiments, Manuel. One of the neat Maven features that isn't well known is just being able to do mvn jetty:run and have Jetty load up right away (no creating of a web-app directory or packaging of a war or anything like that). What I hate about Ant-based projects is that each Ant file is yet another build script to figure out. That, and dealing with .jars of course. Yeah, Maven can be annoying at times. ~ David Smiley From: manuel aldana [ald...@gmx.de] Sent: Monday, June 29, 2009 5:36 PM To: solr-user@lucene.apache.org Subject: Re: plans for switching to maven2 (after 1.4 release)? [...]
Re: Reverse querying
Any other suggestions? This suggestion doesn't look to work. AlexElba wrote: Otis Gospodnetic wrote: Alex Oleg, Look at MemoryIndex in Lucene's contrib. It's the closest thing to what you are looking for. What you are describing is sometimes referred to as prospective search, sometimes saved searches, and a few other names. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: AlexElba ramal...@yahoo.com To: solr-user@lucene.apache.org Sent: Wednesday, June 24, 2009 7:47:20 PM Subject: Reverse querying Hello, I have a problem which I am trying to solve using Solr. I have search text (a term) and I have an index full of words which are mapped to ids. Is there any query that I can run to do this? Example: Term: 3) A recommendation to use VAR=value in the configure command line will not work with some 'configure' scripts that comply to GNU standards but are not generated by autoconf. Index docs: id:1 name:recommendation ... id:3 name:GNU id:4 name:food After running the query I want to get 1 and 3 as results. Thanks Hello, I looked into MemoryIndex; its search returns only a score, which just tells whether there was a match or not. I built a test method based on this example. Term: On my last night in the Silicon Valley area, I decided to head up the east side of San Francisco Bay to visit Vito’s Pizzeria located in Newark, California. I have to say it was excellent! I met the owner (Vito!) and after eating a couple slices I introduced myself. I was happy to know he was familiar with the New York Pizza Blog and the New York Pizza Finder directory. Once we got to talking he decided I NEEDED to try some bread sticks and home-made marinara sauce and they were muy delicioso. I finished off my late night snack with a meatball dipped in the same marinara.
Data: {Silicon Valley, New York, Chicago}

public static void find(String term, Set<String> data) throws Exception {
    Analyzer analyzer = PatternAnalyzer.EXTENDED_ANALYZER;
    MemoryIndex index = new MemoryIndex();
    int i = 0;
    for (String str : data) {
        index.addField("bn" + i, str, analyzer);
        i++;
    }
    QueryParser parser = new QueryParser("bn*", analyzer);
    Query query = parser.parse(URLEncoder.encode(term, "UTF-8"));
    float score = index.search(query);
    if (score > 0.0f) {
        System.out.println("it's a match");
    } else {
        System.out.println("no match found");
    }
    // System.out.println("indexData=" + index.toString());
}

Output: no match found

What am I doing wrong? Thanks, Alex
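One detail worth double-checking in the snippet above (an observation about the code, not a confirmed diagnosis): URLEncoder.encode is designed for HTTP form data, not Lucene query strings, so a multi-word term is mangled before it ever reaches the QueryParser. A minimal, self-contained illustration:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeCheck {
    public static void main(String[] args) {
        // URLEncoder produces application/x-www-form-urlencoded output,
        // so the space in a multi-word term becomes '+', and Lucene's
        // QueryParser then sees one odd token instead of two words.
        String term = "Silicon Valley";
        String encoded = URLEncoder.encode(term, StandardCharsets.UTF_8);
        System.out.println(encoded); // prints: Silicon+Valley
    }
}
```

Passing the raw term string straight to parser.parse(term) would be the first thing to try.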
Re: Entire heap consumed to answer initial ping()
Hello, If Solr is your only webapp in that container, then this is probably a Solr issue. Note that a Solr issue could also mean an issue with your ping query. Perhaps you can provide some more information about the size of your index, the number of docs, your ping query, the relevant piece of the config, and such. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Phillip Farber pfar...@umich.edu To: solr-user solr-user@lucene.apache.org Sent: Monday, June 29, 2009 4:20:26 PM Subject: Entire heap consumed to answer initial ping() [...]
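For reference, the cache configuration that usually gets inspected first in cases like this lives in solrconfig.xml; a sketch with placeholder sizes (illustrative values only, not recommendations):

```
<!-- illustrative solrconfig.xml fragment; sizes are placeholders -->
<filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
<documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```

Large autowarm counts, or firstSearcher/newSearcher warming queries that sort or facet on big fields, can populate the Lucene FieldCache on the very first request, which is one plausible way a ping could appear to consume the whole heap.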