Re: SolrCloud new....
Hi, I'm busy doing the exact same thing. I figured things out all by myself - the wiki page is a nice 'first view', but doesn't go into depth... Let's go ahead:

1) Should I copy the libraries from cloud to trunk? 2) Should I keep the cloud module in every system?
A: Yes, you should. You should get yourself the latest dev trunk and compile it. The steps I followed:
+ grab the latest trunk and build solr
+ back up all solr config files
+ in the dir tomcat6/webapps/ remove the dir 'solr'
+ copy the new solr.war (which you built in the first step) to tomcat6/webapps
+ in your Solr_home/conf dir, solrconfig.xml needs to be replaced by a new one (take it from the example dir of your build) -- some other config files (like schema.xml) you may keep using the old ones
+ adapt the new files to represent the old configuration
+ restart tomcat and it will install the new version of solr
It seems the index isn't compatible - so you need to flush your whole index and re-index all data. And finally you have your solr system back, with zookeeper integrated in the /admin zone :)

3) I am not using any cores in solr. It is a single solr in every system. Can solrcloud support it?
A: Actually you are using one core - so that gives no problem. But be sure to check that you have a solr.xml file in your solr_home dir. This file just mentions all cores - in your case just one core. (You can find examples of the layout of this file easily on http://wiki.apache.org/solr/CoreAdmin )

4) The example is given in jetty. Is it the same way to make it in tomcat?
A: Right now - it is the same way. You have to edit your /etc/init.d/tomcat6 startup script. In the start) section you can specify all the JAVA_OPTS (the ones the solrcloud wiki mentions). Be sure to set the following one: export JAVA_OPTS="$JAVA_OPTS -DhostPort=8080" (if tomcat runs on port 8080). At first I didn't -- my zookeeper pointed to the standard 8983 port, which gave errors.

In the above I gave you a quick peek at how to get the SolrCloud feature.
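For question 4, the relevant lines in the start) section of /etc/init.d/tomcat6 might look roughly like this (a sketch only: the bootstrap_confdir path is invented, the property names are the ones the SolrCloud wiki mentions, and only hostPort strictly has to match Tomcat's port):

```sh
# start) section of /etc/init.d/tomcat6
# -DzkRun         : run the embedded ZooKeeper inside this Solr node
# -DhostPort=8080 : must match the port Tomcat listens on (not Solr's 8983 default)
JAVA_OPTS="$JAVA_OPTS -DzkRun -Dbootstrap_confdir=/opt/solr/conf -DhostPort=8080"
export JAVA_OPTS
```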
In the above, ZooKeeper is embedded in one of your solr machines. If you don't want this, you may place ZooKeeper on a different machine (like I'm doing right now). If you need more help - you can contact me. Stijn Vanhoorelbeke, -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-new-tp1528872p2526080.html Sent from the Solr - User mailing list archive at Nabble.com.
[solrCloud] Distributed IDF - scoring in the cloud
Hi all, doing the solrCloud examples, one thing I am not clear about is the scoring in a distributed search. I did a small test where I used the "Example A: Simple two shard cluster" from wiki:SolrCloud and additionally added

java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar ipod_other.xml
java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar monitor2.xml

Now requesting

http://localhost:8983/solr/collection1/select?distrib=true&q=electronics&fl=score&shards=localhost:8983/solr,localhost:7574/solr

for both hosts will return the same result. Here we get the score for each hit based on the shard-specific score and merge them into one result doc. However, when I add monitor2.xml to 7574 as well, which previously did not contain it, the scoring changes depending on the server I request.

The score returned for 8983 is always <float name="score">0.09289607</float>, being distrib=true|false
The score returned for 7574 is always <float name="score">0.121383816</float>, being distrib=true|false

So is it correct to assume that if a document is indexed in both shards, the score which will predominate is the one from the host which has been requested?

My client plans to distribute the current index into different shards. For example each Consejería (counseling) should be hosted in a shard. The critical point for the client is that, for a distributed search, the scoring is the same as in the big unique index they use right now. As I understand the current solrCloud implementation, there is no concern about harmonizing the score. In my research I came across http://markmail.org/message/bhhfwymz5y7lvoj7

"The IDF part of the relevancy score is the only place that distributed search scoring won't match up with non-distributed scoring, because the document frequency used for the term is local to every core instead of global. If you distribute your documents fairly randomly to the different shards, this won't matter."
"There is a patch in the works to add global idf, but I think that even when it's committed, it will default to off because of the higher cost associated with it." The patch is https://issues.apache.org/jira/browse/SOLR-1632. However, the last comment is from 26/Jul/10 reporting that the patch failed, and a comment from Yonik gives the impression that it is not ready to use: "It looks like the issue is this: rewrite() doesn't work for function queries (there is no propagation mechanism to go through value sources). This is a problem when real queries are embedded in function queries." Is there a general interest in bringing 1632 to the trunk (especially for solrCloud)? Or may it be better to look into something that aims to scale the index into hbase, so they do not lose the scoring? TIA for your feedback -- Thorsten Scherler thorsten.at.apache.org codeBusters S.L. - web based systems consulting, training and solutions http://www.codebusters.es/
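To make the quoted point about local document frequencies concrete, here is a toy calculation (plain Python, not Solr code, using the classic Lucene idf formula 1 + ln(numDocs / (docFreq + 1)) with made-up shard counts):

```python
import math

def idf(num_docs, doc_freq):
    """Classic Lucene DefaultSimilarity idf term."""
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

# Two made-up shards of 10 docs each; "electronics" is common on A, rare on B.
idf_a = idf(10, 8)        # shard A's local view
idf_b = idf(10, 2)        # shard B's local view
idf_global = idf(20, 10)  # what one big index would compute

print(round(idf_a, 3), round(idf_b, 3), round(idf_global, 3))
# → 1.105 2.204 1.598

# A doc indexed in both shards is scored with whichever shard's local idf
# applies to the request, and neither value matches the single-index one.
assert idf_a < idf_global < idf_b
```

This matches the behavior observed above: the score of the doubly indexed document depends on which shard answers, and without global idf neither score equals the one from the unsharded index.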
Re: GET or POST for large queries?
Thanks for the tip. No, I did not know about that. Unfortunately, we use Oracle OLS which does not appear to be supported. Jan Høydahl / Cominvent wrote: Hi, There are better ways to combat row level security in search than sending huge lists of users over the wire. Have you checked out the ManifoldCF project with which you can integrate security to Solr? http://incubator.apache.org/connectors/ -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com -- View this message in context: http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2527765.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: GET or POST for large queries?
OK. I would ask on the mailing list of ManifoldCF to see if they have some experience with OLS. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 18. feb. 2011, at 17.29, mrw wrote: Thanks for the tip. No, I did not know about that. Unfortunately, we use Oracle OLS which does not appear to be supported. Jan Høydahl / Cominvent wrote: Hi, There are better ways to combat row level security in search than sending huge lists of users over the wire. Have you checked out the ManifoldCF project with which you can integrate security to Solr? http://incubator.apache.org/connectors/ -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com -- View this message in context: http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2527765.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Validate Query Syntax of Solr Request Before Sending
Hi, FYI, I found out. I'm using the SolrQueryParser (tadaa...). It needs the solrconfig.xml and solr.xml files in order to validate the query. Then I'm able to validate any query before sending it to the Solr server, thereby preventing unnecessary requests. /Christian -- View this message in context: http://lucene.472066.n3.nabble.com/Validate-Query-Syntax-of-Solr-Request-Before-Sending-tp2515797p2528183.html Sent from the Solr - User mailing list archive at Nabble.com.
Best way for a query-expander?
Hello Solr-friends, I want to implement a query-expander, one that enriches the input by the usage of extra parameters that, for example, a form may provide. Is the right way to subclass SearchHandler? Or rather to subclass QueryComponent? thanks in advance paul
Dih sproc call
I am trying to call a stored procedure using query= in DIH. I tried 'exec name', 'call name', and 'name', and none works. This is SQL Server 2008. Bill Bell Sent from mobile On Feb 18, 2011, at 10:27 AM, Paul Libbrecht p...@hoplahup.net wrote: Hello Solr-friends, I want to implement a query-expander, one that enriches the input by the usage of extra parameters that, for example, a form may provide. Is the right way to subclass SearchHandler? Or rather to subclass QueryComponent? thanks in advance paul
Re: Best way for a query-expander?
Hi Paul, what do you understand by saying extra parameters? Regards Paul Libbrecht-4 wrote: Hello Solr-friends, I want to implement a query-expander, one that enriches the input by the usage of extra parameters that, for example, a form may provide. Is the right way to subclass SearchHandler? Or rather to subclass QueryComponent? thanks in advance paul -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-for-a-query-expander-tp2528194p2528736.html Sent from the Solr - User mailing list archive at Nabble.com.
Dih sproc does not work
I am trying to call a stored procedure using query= in DIH. I tried 'exec name', 'call name', and 'name', and none works. This is SQL Server 2008. Bill Bell Sent from mobile
Re: Best way for a query-expander?
Erm... extra web-request-parameters simply. paul Le 18 févr. 2011 à 19:37, Em a écrit : Hi Paul, what do you understand by saying extra parameters? Regards Paul Libbrecht-4 wrote: Hello Solr-friends, I want to implement a query-expander, one that enriches the input by the usage of extra parameters that, for example, a form may provide. Is the right way to subclass SearchHandler? Or rather to subclass QueryComponent? thanks in advance
Re: Best way for a query-expander?
Hi Paul, a colleague and I worked on a QParserPlugin to expand alias field names to many existing field names, e.g. q=mockfield:val ==> q=actualfield1:val OR actualfield2:val, but if you want to be able to use other params that come from the HTTP request, you should use a custom RequestHandler, I think. My 2 cents, Tommaso 2011/2/18 Em mailformailingli...@yahoo.de Hi Paul, what do you understand by saying extra parameters? Regards Paul Libbrecht-4 wrote: Hello Solr-friends, I want to implement a query-expander, one that enriches the input by the usage of extra parameters that, for example, a form may provide. Is the right way to subclass SearchHandler? Or rather to subclass QueryComponent? thanks in advance paul -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-for-a-query-expander-tp2528194p2528736.html Sent from the Solr - User mailing list archive at Nabble.com.
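For anyone curious, the rewriting Tommaso describes is, at its core, simple clause expansion. A rough sketch of the idea outside Solr (plain Python with a made-up alias table; a real QParserPlugin works on parsed queries rather than raw strings):

```python
import re

# Hypothetical alias table; the field names are invented for illustration.
ALIASES = {"mockfield": ["actualfield1", "actualfield2"]}

def expand_aliases(q):
    """Rewrite alias:value clauses into an OR over the real fields."""
    def repl(m):
        field, value = m.group(1), m.group(2)
        if field not in ALIASES:
            return m.group(0)  # not an alias: leave the clause alone
        return "(" + " OR ".join(f"{f}:{value}" for f in ALIASES[field]) + ")"
    return re.sub(r"(\w+):(\w+)", repl, q)

print(expand_aliases("mockfield:val"))  # → (actualfield1:val OR actualfield2:val)
print(expand_aliases("title:val"))      # → title:val
```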
Understanding multi-field queries with q and fq
After searching this list, Google, and looking through the Pugh book, I am a little confused about the right way to structure a query. The Packt book uses the example of the MusicBrainz DB full of song metadata. What if they also had the song lyrics in English and German as files on disk, and wanted to index them along with the metadata, so that each document would basically have song title, artist, publisher, date, ..., All_Metadata (copy field of all metadata fields), Text_English, and Text_German fields? There can only be one default field, correct? So if we want to search for all songs containing (zeppelin AND (dog OR merle)), do we repeat the entire query text for all three major fields in the 'q' clause (assuming we don't want to use the cache):

q=(+All_Metadata:(zeppelin AND (dog OR merle)) +Text_English:(zeppelin AND (dog OR merle)) +Text_German:(zeppelin AND (dog OR merle)))

or repeat the entire query text for all three major fields in the 'fq' clause (assuming we want to use the cache):

q=*:*&fq=(+All_Metadata:(zeppelin AND (dog OR merle)) +Text_English:(zeppelin AND (dog OR merle)) +Text_German:(zeppelin AND (dog OR merle)))

? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Understanding-multi-field-queries-with-q-and-fq-tp2528866p2528866.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dih sproc does not work
When I use 'call sprocname' it does call the procedure, but I am not getting the select into Solr. It shows 0 docs added. I am only returning 1 result set. Bill Bell Sent from mobile On Feb 18, 2011, at 11:49 AM, Bill Bell billnb...@gmail.com wrote: I am trying to call a stored procedure using query= in DIH. I tried 'exec name', 'call name', and 'name', and none works. This is SQL Server 2008. Bill Bell Sent from mobile
Re: Best way for a query-expander?
Using rb.req.getParams().get("blip") inside prepare(ResponseBuilder) of a subclass of QueryComponent, I could easily get the extra HTTP request param. However, how would I change the query? Using rb.setQuery(xxx) within that same prepare method seems to have no effect. paul Le 18 févr. 2011 à 19:51, Tommaso Teofili a écrit : Hi Paul, me and a colleague worked on a QParserPlugin to expand alias field names to many existing field names ex: q=mockfield:val == q=actualfield1:val OR actualfield2:val but if you want to be able to use other params that come from the HTTP request you should use a custom RequestHandler I think, My 2 cents, Tommaso 2011/2/18 Em mailformailingli...@yahoo.de Hi Paul, what do you understand by saying extra parameters? Regards Paul Libbrecht-4 wrote: Hello Solr-friends, I want to implement a query-expander, one that enriches the input by the usage of extra parameters that, for example, a form may provide. Is the right way to subclass SearchHandler? Or rather to subclass QueryComponent? thanks in advance paul -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-for-a-query-expander-tp2528194p2528736.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr.KeepWordsFilterFactory confusion
Thanks for your response. After making that change it seemed at first like it made no difference; after restarting the jetty server and reindexing the test object, the display still shows:

<arr name="format_facet">
  <str>Video</str>
  <str>Streaming Video</str>
  <str>Online</str>
  <str>Gooberhead</str>
  <str>Book of the Month</str>
</arr>

But it turns out that I had been making an incorrect assumption. I was looking at the returned stored values for the solr document, seeing the Gooberhead entry listed, and thinking that the analyzer wasn't running. However, as I have subsequently figured out, the analyzers are not run on the data that is to be stored, only on the data that is to be indexed. So after making your change to that field type, if I search for format_facet:Gooberhead I get results = 0, which is what I'd expect. But seeing that the unexpected values are still stored with the solr document, it seems that I will have to take a different approach. Thanks again. -Bob Haschart

Ahmet Arslan wrote: I've added a new field type in schema.xml:

<fieldType name="formatFacet" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="format_facet.txt" ignoreCase="false"/>
  </analyzer>
</fieldType>

class="solr.StrField" should be class="solr.TextField"
Re: Best way for a query-expander?
it does work! Le 18 févr. 2011 à 20:48, Paul Libbrecht a écrit : using rb.req.getParams().get(blip) inside prepare(ResponseBuilder)'s subclass of QueryComponent I could easily get the extra http request param. However, how would I change the query? using rb.setQuery(xxx) within that same prepare method seems to have no effect. Sorry for the noise, it does have the exact desired effect. Nice pattern. I believe everyone needs query expansion except maybe if using Dismax. paul Le 18 févr. 2011 à 19:51, Tommaso Teofili a écrit : Hi Paul, me and a colleague worked on a QParserPlugin to expand alias field names to many existing field names ex: q=mockfield:val == q=actualfield1:val OR actualfield2:val but if you want to be able to use other params that come from the HTTP request you should use a custom RequestHandler I think, My 2 cents, Tommaso
XML Stripping from DIH
Hi all- I have some XML in a database that I am trying to index and store; I am interested in the various pieces of text, but none of the tags. I've been trying to figure out a way to strip all the tags out, but haven't found anything within Solr to do so; the XML parser seems to want XPath to get the various element values, when all I want is to turn the whole thing into one blob of text, regardless of whether it makes any contextual sense. Is there something in Solr to do this, or is it something I'd have to write myself (which I'm willing to do if necessary)? Thanks for any info, Ron
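If you do end up writing it yourself, flattening the XML before (or while) it reaches Solr is only a few lines. A sketch with Python's standard library (assuming the column holds well-formed XML; the sample markup is invented):

```python
from xml.etree import ElementTree

def strip_tags(xml_blob):
    """Flatten an XML fragment into one blob of text, discarding all tags."""
    root = ElementTree.fromstring(xml_blob)
    # itertext() walks every text node in document order, ignoring markup.
    return " ".join(t.strip() for t in root.itertext() if t.strip())

blob = "<doc><title>Hello</title><body>world <b>again</b></body></doc>"
print(strip_tags(blob))  # → Hello world again
```

The same logic could presumably also live in a DIH ScriptTransformer if you want to keep it inside the import config rather than preprocessing the database column.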
Re: solr.KeepWordsFilterFactory confusion
--- On Fri, 2/18/11, Robert Haschart rh...@virginia.edu wrote: From: Robert Haschart rh...@virginia.edu Subject: Re: solr.KeepWordsFilterFactory confusion To: solr-user@lucene.apache.org Date: Friday, February 18, 2011, 10:19 PM Thanks for your response. After making that change it seemed at first like it made no difference; after restarting the jetty server and reindexing the test object, the display still shows: <arr name="format_facet"><str>Video</str><str>Streaming Video</str><str>Online</str><str>Gooberhead</str><str>Book of the Month</str></arr> But it turns out that I had been making an incorrect assumption. I was looking at the returned stored values for the solr document, seeing the Gooberhead entry listed, and thinking that the analyzer wasn't running. However, as I have subsequently figured out, the analyzers are not run on the data that is to be stored, only on the data that is to be indexed. So after making your change to that field type, if I search for format_facet:Gooberhead I get results = 0, which is what I'd expect. But seeing that the unexpected values are still stored with the solr document, it seems that I will have to take a different approach. Facets are populated from indexed values. However deleted documents (and their terms) are not really deleted until an optimize. Issuing an optimize may help in your case.
Re: Index Design Question
Thank you. These are good general suggestions. Regarding the optimization for indexing vs. querying: are there any specific recommendations for each of those cases available somewhere? A link, for example, would be fabulous. I'm also still curious about solutions that go further. For example, there is a 2007 Lucene overview presentation by Aaron Bannert claiming that "Lucene provides built-in methods to allow queries to span multiple remote Lucene indexes." and "A much more involved way to achieving high levels of update performance can be had by dividing the data into separate 'columns', or 'silos'. Each column will hold a subset of the overall data, and will only receive updates for data that it controls. By taking advantage of the remote index merging query utility mentioned on an earlier slide, the data can still be searched in its entirety without any loss of accuracy and with negligible performance impact." Is this possible using Solr? How could this be accomplished? Again, any link would be fabulous. The wiki page http://wiki.apache.org/solr/MergingSolrIndexes seems to describe a somewhat different approach to merging. Is this something that could be integrated into master/slave replication by having two masters and one merged slave (in the above sense of separate 'columns', or 'silos')? If yes, what are the performance considerations when using it?
DIH threads
Has anyone applied the DIH threads patch on 1.4.1 (https://issues.apache.org/jira/browse/SOLR-1352)? Does anyone know if this works and/or does it improve performance? Thanks
Removing duplicates
I know that I can use the SignatureUpdateProcessorFactory to remove duplicates but I would like the duplicates in the index but remove them conditionally at query time. Is there any easy way I could accomplish this?
Re: solr current workding directory or reading config files
: I have a class (in a jar) that reads from properties (text) files. I have these : files in the same jar file as the class. : : However, when my class reads those properties files, those files cannot be found : since solr reads from tomcat's bin directory. Can you elaborate a bit more on what these jars are? ... are these Solr plugins you've written (ie: that know about the internal Solr APIs)? ... how does your jar relate to solr? are you building your own solr.war containing those jars, or are you loading it using a solr plugin lib directory? ... what do you mean by "my class reads those properties files"? ... what code are you using to read them? what log/error messages are you getting? : I don't really want to put the config files in tomcat's bin directory. in an ideal world, solr would never use the current working directory, and would only ever pay attention to the Solr Home dir and paths specifically mentioned by config directives -- but the world is not ideal, and solr definitely has some historic behavior that does utilize the CWD. But if you are using Solr's ResourceLoader API in your plugin, it should actively try to find your resource in a multitude of places (if it's not an absolute path). need more specifics to understand exactly what is going wrong for you though. -Hoss
Re: Help migrating from Lucene
: to our indexing service are defined in a central interface. Here is an : example of a query executed from a programmatically constructed Lucene : query. ... : solrQuery.setQuery(query.toString()); first of all, be advised that Query.toString() is not guaranteed to produce a string that the Lucene QueryParser can parse back into a real query. If you are programmatically building up a Lucene query just to format it back as a string, you should probably consider just programmatically building up the Solr query string. Second: you should also consider the fact that there may be better ways to express your query to solr that are more efficient, or do what you want better than what you had before (ie: some of those MUST clauses you had are probably meant to act as filters, which don't need to influence the scores, and are most likely reused on many queries -- in which case specifying them using fq instead of q is going to make things simpler/faster and give you better relevancy scores on your real user input). : How can I set the sort into the java client? Did you look at the SolrQuery.addSortField method? : Also, with the annotations of Pojo's outlined here. ... : How are sets handled? For instance, how are Lists of other POJO's added to : the document? i had no idea, but a google search for solrj annotation beans led me... http://lucene.472066.n3.nabble.com/Does-SolrJ-support-nested-annotated-beans-td868375.html ...and then to... https://issues.apache.org/jira/browse/SOLR-1945 -Hoss
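Hoss's two points (build the request parameters directly instead of round-tripping through Query.toString(), and move reusable MUST clauses into fq) can be sketched outside of SolrJ like this (plain Python with invented field names, just to show the shape of the request):

```python
from urllib.parse import urlencode

def build_params(user_input, filters):
    """Keep the user's words in q (they should influence scoring) and
    push the reusable MUST clauses into fq (cached, no score impact)."""
    params = [("q", user_input)]
    for field, value in filters.items():
        params.append(("fq", f"{field}:{value}"))
    return urlencode(params)

print(build_params("zeppelin dog", {"type": "song", "lang": "en"}))
# → q=zeppelin+dog&fq=type%3Asong&fq=lang%3Aen
```

In SolrJ the same split is a matter of calling setQuery for the user input and addFilterQuery for each filter clause, rather than concatenating everything into one parsed string.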
Re: Dih sproc call
: References: ce2ecd6b-7a3f-4669-972d-492ab89c8...@hoplahup.net : In-Reply-To: ce2ecd6b-7a3f-4669-972d-492ab89c8...@hoplahup.net : Subject: Dih sproc call http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Re: Best way for a query-expander?
: I want to implement a query-expander, one that enriches the input by the : usage of extra parameters that, for example, a form may provide. : : Is the right way to subclass SearchHandler? : Or rather to subclass QueryComponent? This smells like the poster child for an X/Y problem (or maybe an X/(Y OR Z) problem)... if you can elaborate a bit more on the type of enrichment you want to do, it's highly likely that your goal can be met w/o needing to write a custom plugin (i'm thinking particularly of the multitudes of parsers solr already has, local params, and variable substitution) http://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: DIH threads
I used it on 4.0 and it did not help us. We were bound on SQL I/O. Bill Bell Sent from mobile On Feb 18, 2011, at 4:47 PM, Mark static.void@gmail.com wrote: Has anyone applied the DIH threads patch on 1.4.1 (https://issues.apache.org/jira/browse/SOLR-1352)? Does anyone know if this works and/or does it improve performance? Thanks
adding a TimerTask
Hi, How can I add a TimerTask to Solr? Tri
Remove part of keywords from existing index and merging new index
Hello, I am not sure if it is possible. 1. I have a document of 100MB. I want to remove keywords starting with a specific pattern, e.g. abc*, so all keywords starting with abc* in the index will be removed, and I don't need to reindex the document again. 2. I have another document of 100KB. I want to append the new document to an existing one, without the need to reindex the existing document again. I believe (2) is possible, but not sure about (1). Thanks.
Indexing AutoCAD files
Hi team, Is there a way lucene can index AutoCAD files - *.dwg files? If so, please let me know. Can you please provide some insight on the same? Thanks in advance.. Regards Vignesh
Index Autocad
Hi team, Is there a way lucene can index AutoCAD files - "*.dwg" files? If so, please let me know. Can you please provide some insight on the same? Thanks in advance.. Regards Vignesh