Indexing AutoCAD files
Hi team, Is there a way Lucene can index AutoCAD files ("*.dwg" files)? If so, please let me know. Can you please provide some insight on this? Thanks in advance. Regards, Vignesh
Remove part of keywords from existing index and merging new index
Hello, I am not sure if this is possible. 1. I have a document of 100MB; I want to remove all keywords starting with a specific pattern, e.g. abc*, from the index, without having to reindex the document. 2. I have another document of 100KB; I want to append the new document to an existing index, without having to reindex the existing document. I believe (2) is possible, but I am not sure about (1). Thanks.
adding a TimerTask
Hi, How can I add a TimerTask to Solr? Tri
Re: DIH threads
I used it on 4.0 and it did not help us. We were bound by SQL I/O. Bill Bell Sent from mobile On Feb 18, 2011, at 4:47 PM, Mark wrote: > Has anyone applied the DIH threads patch on 1.4.1 > (https://issues.apache.org/jira/browse/SOLR-1352)? > > Does anyone know if this works and/or does it improve performance? > > Thanks > >
Re: Best way for a query-expander?
: I want to implement a query-expander, one that enriches the input by the : usage of extra parameters that, for example, a form may provide. : : Is the right way to subclass SearchHandler? : Or rather to subclass QueryComponent? This smells like the poster child for an X/Y problem (or maybe an "X/(Y OR Z)" problem)... if you can elaborate a bit more on the type of enrichment you want to do, it's highly likely that your goal can be met w/o needing to write a custom plugin (i'm thinking particularly of the multitudes of parsers solr already has, local params, and variable substitution) http://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Dih sproc call
: References: : In-Reply-To: : Subject: Dih sproc call http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Re: Help migrating from Lucene
: to our indexing service are defined in a central interface. Here is an : example of a query executed from a programmatically constructed Lucene : query. ... : solrQuery.setQuery(query.toString()); first of all, be advised that Query.toString() is not guaranteed to produce a string that the Lucene QueryParser can parse back into a real query. If you are programmatically building up a Lucene query just to format it back as a string, you should probably consider just programmatically building up the Solr query string. Second: you should also consider the fact that there may be better ways to express your query to solr that are more efficient, or do what you want better than what you had before (ie: some of those MUST clauses you had are probably meant to act as "filters", which don't need to influence the scores, and are most likely reused on many queries -- in which case specifying them using "fq" instead of "q" is going to make things simpler/faster and give you better relevancy scores on your real user input). : How can I set the sort into the java client? Did you look at the "SolrQuery.addSortField" method? : Also, with the annotations of Pojo's outlined here. ... : How are sets handled? For instance, how are Lists of other POJO's added to : the document? i had no idea, but a google search for "solrj annotation beans" led me... http://lucene.472066.n3.nabble.com/Does-SolrJ-support-nested-annotated-beans-td868375.html ...and then to... https://issues.apache.org/jira/browse/SOLR-1945 -Hoss
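Hoss's suggestion above -- build the Solr request directly, and move the reusable MUST clauses into fq -- can be sketched in plain Java. The field names and values below are hypothetical, and a real client would more likely use SolrJ's SolrQuery (setQuery, addFilterQuery, addSortField) instead of hand-encoding parameters:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class SolrParams {
    // Build the raw Solr request parameters directly instead of round-tripping
    // through Lucene's Query.toString(). Reusable restriction clauses go into
    // fq, where they are cached separately and do not influence scoring.
    static String build(String q, List<String> filters, String sort) {
        StringBuilder sb = new StringBuilder("q=").append(enc(q));
        for (String fq : filters) {
            sb.append("&fq=").append(enc(fq));
        }
        if (sort != null) {
            sb.append("&sort=").append(enc(sort));
        }
        return sb.toString();
    }

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical fields: the scoring query stays in q, the
        // reusable MUST clauses become filter queries.
        System.out.println(build("title:lucene",
                List.of("status:active", "type:book"), "price asc"));
    }
}
```

The point is the split, not the encoding: only the user's real input participates in scoring, while the repeated restrictions hit the filter cache.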
Re: solr current workding directory or reading config files
: I have a class (in a jar) that reads from properties (text) files. I have these : files in the same jar file as the class. : : However, when my class reads those properties files, those files cannot be found : since solr reads from tomcat's bin directory. Can you elaborate a bit more on what these Jars are? ... are these Solr Plugins you've written (ie: that know about the internal Solr APIs)? ... how does your jar relate to solr? are you building your own solr.war containing those jars, or are you loading it using a solr plugin "lib" directory? ... what do you mean by "my class reads those properties files"? ... what code are you using to "read" them? what log/error messages are you getting? : I don't really want to put the config files in tomcat's bin directory. in an ideal world, solr would never use the current working directory, and would only ever pay attention to the Solr Home dir and paths specifically mentioned by config directives -- but the world is not ideal, and solr definitely has some historic behavior that does utilize the CWD. But if you are using Solr's ResourceLoader API in your plugin, it should actively try to find your resource in a multitude of places (if it's not an absolute path). need more specifics to understand exactly what is going wrong for you though. -Hoss
Removing duplicates
I know that I can use the SignatureUpdateProcessorFactory to remove duplicates but I would like the duplicates in the index but remove them conditionally at query time. Is there any easy way I could accomplish this?
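One possible approach, assuming a Solr version with field collapsing/grouping available at query time: index a content signature next to each document (the same idea SignatureUpdateProcessorFactory uses for its hash), then collapse on that field only in the queries that should deduplicate. A sketch of such a signature with made-up normalization rules:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Signature {
    // MD5-based content signature over selected field values, similar in
    // spirit to SignatureUpdateProcessorFactory. Indexed into its own field,
    // it lets duplicates stay in the index and be collapsed only by the
    // queries that opt in (e.g. by grouping on the signature field).
    static String signature(String... fieldValues) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String v : fieldValues) {
                // light normalization so trivially different copies match
                md5.update(v.trim().toLowerCase().getBytes(StandardCharsets.UTF_8));
            }
            return new BigInteger(1, md5.digest()).toString(16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // hypothetical field values for two near-identical documents
        String a = signature("Some Title", "some body text");
        String b = signature("some title", "Some Body Text");
        System.out.println(a.equals(b)); // duplicates share one signature
    }
}
```

Queries that want duplicates simply skip the grouping parameter; the documents themselves are never removed from the index.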
DIH threads
Has anyone applied the DIH threads patch on 1.4.1 (https://issues.apache.org/jira/browse/SOLR-1352)? Does anyone know if this works and/or does it improve performance? Thanks
Re: Index Design Question
Thank you. These are good general suggestions. Regarding the optimization for indexing vs. querying: are there any specific recommendations for each of those cases available somewhere? A link, for example, would be fabulous. I'm also still curious about solutions that go further. For example, there is a 2007 Lucene Overview presentation by Aaron Bannert claiming that "Lucene provides built-in methods to allow queries to span multiple remote Lucene indexes." and "A much more involved way to achieving high levels of update performance can be had by dividing the data into separate “columns”, or “silos”. Each column will hold a subset of the overall data, and will only receive updates for data that it controls. By taking advantage of the remote index merging query utility mentioned on an earlier slide, the data can still be searched in its entirety without any loss of accuracy and with negligible performance impact." Is this possible using Solr? How could this be accomplished? Again, any link would be fabulous. The wiki page http://wiki.apache.org/solr/MergingSolrIndexes seems to describe a somewhat different approach to merging. Is this something that could be integrated into master/slave replication by having two masters and one merged slave (in the above sense of separate “columns”, or “silos”)? If yes, what are the performance considerations when using it?
Re: solr.KeepWordsFilterFactory confusion
--- On Fri, 2/18/11, Robert Haschart wrote: > From: Robert Haschart > Subject: Re: solr.KeepWordsFilterFactory confusion > To: solr-user@lucene.apache.org > Date: Friday, February 18, 2011, 10:19 PM > Thanks for your response. After > making that change it seemed at first like it made no > difference, after restarting the jetty server, and > reindexing the test object, the display still shows: > > > Video > Streaming Video > Online > Gooberhead > Book of the Month > > > But it turns out that I had been making an incorrect > assumption. I was looking at the returned stored > values for the solr document, and seeing the "Gooberhead" > entry listed, and thinking that the analyzer wasn't > running. However as I have subsequently figured out, > the analyzers are not run on the data that is to be stored, > only on the data that is being indexed. > So after making your change to that field type statement, > if I search > for format_facet:Gooberhead I > get results = 0 which is what I'd expect. But seeing > that the unexpected values are still stored with the solr > document, it seems that I will have to take a different > approach. Facets are populated from indexed values. However deleted documents (and their terms) are not really deleted until an optimize. Issuing an optimize may help in your case.
XML Stripping from DIH
Hi all- I have some XML in a database that I am trying to index and store; I am interested in the various pieces of text, but none of the tags. I've been trying to figure out a way to strip all the tags out, but haven't found anything within Solr to do so; the XML parser seems to want XPath to get the various element values, when all I want is to turn the whole thing into one blob of text, regardless of whether it makes any "contextual" sense. Is there something in Solr to do this, or is it something I'd have to write myself (which I'm willing to do if necessary)? Thanks for any info, Ron
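If nothing built in fits, a small event-based pass can flatten the XML into one text blob before it is handed to Solr. This is a plain-Java sketch using the JDK's StAX parser; note that analysis-time filters such as HTMLStripCharFilterFactory would only affect the indexed terms, not the stored value:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StripTags {
    // Flatten an XML blob into a single whitespace-joined text string,
    // discarding every tag and attribute.
    static String textOf(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            StringBuilder sb = new StringBuilder();
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.CHARACTERS && !r.getText().isBlank()) {
                    if (sb.length() > 0) sb.append(' ');
                    sb.append(r.getText().trim());
                }
            }
            return sb.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<doc><a>hello</a><b attr=\"x\">world</b></doc>"));
        // hello world
    }
}
```

Run as a transformer step before indexing, this produces exactly the "one blob of text" described above, with no XPath configuration per element.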
Re: Best way for a query-expander?
it does work! Le 18 févr. 2011 à 20:48, Paul Libbrecht a écrit : > using rb.req.getParams().get("blip") inside prepare(ResponseBuilder)'s > subclass of QueryComponent I could easily get the extra http request param. > > However, how would I change the query? > using rb.setQuery(xxx) within that same prepare method seems to have no > effect. Sorry for the noise, it does have the exact desired effect. Nice pattern. I believe everyone needs query expansion except maybe if using Dismax. paul > > Le 18 févr. 2011 à 19:51, Tommaso Teofili a écrit : > >> Hi Paul, >> me and a colleague worked on a QParserPlugin to "expand" alias field names >> to many existing field names >> ex: q=mockfield:val ==> q=actualfield1:val OR actualfield2:val >> but if you want to be able to use other params that come from the HTTP >> request you should use a custom RequestHandler I think, >> My 2 cents, >> Tommaso >
Re: solr.KeepWordsFilterFactory confusion
Thanks for your response. After making that change it seemed at first like it made no difference, after restarting the jetty server, and reindexing the test object, the display still shows: Video Streaming Video Online Gooberhead Book of the Month But it turns out that I had been making an incorrect assumption. I was looking at the returned stored values for the solr document, and seeing the "Gooberhead" entry listed, and thinking that the analyzer wasn't running. However as I have subsequently figured out, the analyzers are not run on the data that is to be stored, only on the data that is being indexed. So after making your change to that field type statement, if I search for format_facet:Gooberhead I get results = 0 which is what I'd expect. But seeing that the unexpected values are still stored with the solr document, it seems that I will have to take a different approach. Thanks again. -Bob Haschart Ahmet Arslan wrote: I've added a new field type in schema.xml: class="solr.StrField" should be class="solr.TextField"
Re: Best way for a query-expander?
using rb.req.getParams().get("blip") inside prepare(ResponseBuilder)'s subclass of QueryComponent I could easily get the extra http request param. However, how would I change the query? using rb.setQuery(xxx) within that same prepare method seems to have no effect. paul Le 18 févr. 2011 à 19:51, Tommaso Teofili a écrit : > Hi Paul, > me and a colleague worked on a QParserPlugin to "expand" alias field names > to many existing field names > ex: q=mockfield:val ==> q=actualfield1:val OR actualfield2:val > but if you want to be able to use other params that come from the HTTP > request you should use a custom RequestHandler I think, > My 2 cents, > Tommaso > > > 2011/2/18 Em > >> >> Hi Paul, >> >> what do you understand by saying "extra parameters"? >> >> Regards >> >> >> Paul Libbrecht-4 wrote: >>> >>> >>> Hello Solr-friends, >>> >>> I want to implement a query-expander, one that enriches the input by the >>> usage of extra parameters that, for example, a form may provide. >>> >>> Is the right way to subclass SearchHandler? >>> Or rather to subclass QueryComponent? >>> >>> thanks in advance >>> >>> paul >>> >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Best-way-for-a-query-expander-tp2528194p2528736.html >> Sent from the Solr - User mailing list archive at Nabble.com. >>
Re: Dih sproc does not work
When I use 'call sprocname' it does call the procedure, but I am not getting the select into Solr. It shows 0 docs added. I am only returning 1 result set. Bill Bell Sent from mobile On Feb 18, 2011, at 11:49 AM, Bill Bell wrote: > I am trying to call a stored procedure using query= in DIH. I tried exec > name, call name, and name, and none work. > > This is SQL Server 2008. > > Bill Bell > Sent from mobile >
Understanding multi-field queries with q and fq
After searching this list, Google, and looking through the Pugh book, I am a little confused about the right way to structure a query. The Packt book uses the example of the MusicBrainz DB full of song metadata. What if they also had the song lyrics in English and German as files on disk, and wanted to index them along with the metadata, so that each document would basically have song title, artist, publisher, date, ..., All_Metadata (copy field of all metadata fields), Text_English, and Text_German fields? There can only be one default field, correct? So if we want to search for all songs containing (zeppelin AND (dog OR merle)) do we repeat the entire query text for all three major fields in the 'q' clause (assuming we don't want to use the cache): q=(+All_Metadata:(zeppelin AND (dog OR merle)) +Text_English:(zeppelin AND (dog OR merle)) +Text_German:(zeppelin AND (dog OR merle))) or repeat the entire query text for all three major fields in the 'fq' clause (assuming we want to use the cache): q=*:*&fq=(+All_Metadata:(zeppelin AND (dog OR merle)) +Text_English:(zeppelin AND (dog OR merle)) +Text_German:(zeppelin AND (dog OR merle))) ? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Understanding-multi-field-queries-with-q-and-fq-tp2528866p2528866.html Sent from the Solr - User mailing list archive at Nabble.com.
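Whichever of q or fq is used, repeating the expression by hand invites mismatched parentheses, so one option is to generate the per-field clauses from a single expression string. This is a hypothetical helper, not a Solr API; for simple queries, the dismax handler is Solr's built-in way to search one expression across several fields:

```java
import java.util.List;
import java.util.stream.Collectors;

public class FieldExpand {
    // Apply one boolean expression to several fields by generating the
    // per-field clauses from a single string, instead of hand-repeating
    // (and mis-parenthesizing) the expression.
    static String expand(String expr, List<String> fields) {
        return fields.stream()
                .map(f -> f + ":(" + expr + ")")
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        System.out.println(expand("zeppelin AND (dog OR merle)",
                List.of("All_Metadata", "Text_English", "Text_German")));
    }
}
```

Joining with OR means a song matches if the expression holds in any of the three fields; swap in AND if it must hold in all of them.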
Re: Best way for a query-expander?
Hi Paul, me and a colleague worked on a QParserPlugin to "expand" alias field names to many existing field names ex: q=mockfield:val ==> q=actualfield1:val OR actualfield2:val but if you want to be able to use other params that come from the HTTP request you should use a custom RequestHandler I think, My 2 cents, Tommaso 2011/2/18 Em > > Hi Paul, > > what do you understand by saying "extra parameters"? > > Regards > > > Paul Libbrecht-4 wrote: > > > > > > Hello Solr-friends, > > > > I want to implement a query-expander, one that enriches the input by the > > usage of extra parameters that, for example, a form may provide. > > > > Is the right way to subclass SearchHandler? > > Or rather to subclass QueryComponent? > > > > thanks in advance > > > > paul > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Best-way-for-a-query-expander-tp2528194p2528736.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Best way for a query-expander?
Erm... extra web-request-parameters simply. paul Le 18 févr. 2011 à 19:37, Em a écrit : > > Hi Paul, > > what do you understand by saying "extra parameters"? > > Regards > > > Paul Libbrecht-4 wrote: >> >> >> Hello Solr-friends, >> >> I want to implement a query-expander, one that enriches the input by the >> usage of extra parameters that, for example, a form may provide. >> >> Is the right way to subclass SearchHandler? >> Or rather to subclass QueryComponent? >> >> thanks in advance
Dih sproc does not work
I am trying to call a stored procedure using query= in DIH. I tried exec name, call name, and name, and none work. This is SQL Server 2008. Bill Bell Sent from mobile
Re: Best way for a query-expander?
Hi Paul, what do you understand by saying "extra parameters"? Regards Paul Libbrecht-4 wrote: > > > Hello Solr-friends, > > I want to implement a query-expander, one that enriches the input by the > usage of extra parameters that, for example, a form may provide. > > Is the right way to subclass SearchHandler? > Or rather to subclass QueryComponent? > > thanks in advance > > paul > -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-for-a-query-expander-tp2528194p2528736.html Sent from the Solr - User mailing list archive at Nabble.com.
Dih sproc call
I am trying to call a stored procedure using query= in DIH. I tried exec name, call name, and name, and none work. This is SQL Server 2008. Bill Bell Sent from mobile On Feb 18, 2011, at 10:27 AM, Paul Libbrecht wrote: > > Hello Solr-friends, > > I want to implement a query-expander, one that enriches the input by the > usage of extra parameters that, for example, a form may provide. > > Is the right way to subclass SearchHandler? > Or rather to subclass QueryComponent? > > thanks in advance > > paul
Best way for a query-expander?
Hello Solr-friends, I want to implement a query-expander, one that enriches the input by the usage of extra parameters that, for example, a form may provide. Is the right way to subclass SearchHandler? Or rather to subclass QueryComponent? thanks in advance paul
Re: Validate Query Syntax of Solr Request Before Sending
Hi, FYI, I found out. I'm using the SolrQueryParser (tadaa...). It needs the solrconfig.xml and solr.xml files in order to validate the query. Then I'm able to validate any query before sending it to the Solr server, thereby preventing unnecessary requests. /Christian -- View this message in context: http://lucene.472066.n3.nabble.com/Validate-Query-Syntax-of-Solr-Request-Before-Sending-tp2515797p2528183.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: GET or POST for large queries?
OK. I would ask on the mailing list of ManifoldCF to see if they have some experience with OLS. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 18. feb. 2011, at 17.29, mrw wrote: > > Thanks for the tip. No, I did not know about that. Unfortunately, we use > Oracle OLS which does not appear to be supported. > > > Jan Høydahl / Cominvent wrote: >> >> Hi, >> >> There are better ways to combat row level security in search than sending >> huge lists of users over the wire. >> >> Have you checked out the ManifoldCF project with which you can integrate >> security to Solr? http://incubator.apache.org/connectors/ >> >> -- >> Jan Høydahl, search solution architect >> Cominvent AS - www.cominvent.com >> >> >> > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2527765.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: GET or POST for large queries?
Thanks for the tip. No, I did not know about that. Unfortunately, we use Oracle OLS which does not appear to be supported. Jan Høydahl / Cominvent wrote: > > Hi, > > There are better ways to combat row level security in search than sending > huge lists of users over the wire. > > Have you checked out the ManifoldCF project with which you can integrate > security to Solr? http://incubator.apache.org/connectors/ > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > > -- View this message in context: http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2527765.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: GET or POST for large queries?
Hi, There are better ways to combat row level security in search than sending huge lists of users over the wire. Have you checked out the ManifoldCF project with which you can integrate security to Solr? http://incubator.apache.org/connectors/ -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 18. feb. 2011, at 15.30, mrw wrote: > > Thanks for the response. > > Yes, the queries are fairly large. Basically, the corporate security policy > dictates that we use row-level security attributes from the DB for access > control to Solr. So, we bake row-level security attributes from the > database into the index, and then, at query time, ask for those same > attributes from the DB and pass them as part of the Solr query. So, imagine > a bank VP with access to tens of thousands of customer records and > transactions, and all those access attributes get sent to Solr. The system > works well for the low-level account managers and low-entitlement users, but > cannot scale for the high-level folks. > > POSTing the data appears to avoid the header threshold issue, but it breaks > because of the "too many boolean clauses" error. > > > > > gearond wrote: >> >> Probably you could do it, and solving a problem in business supersedes >> 'rightness' concerns, much to the dismay of geeks and 'those who like >> rightness >> and say the word "Neemph!" '. >> >> >> the not rightness about this is that: >> POST, PUT, DELETE are assumed to make changes to the URL's backend. >> GET is assumed NOT to make changes. >> >> So if your POST does not make a change . . . it breaks convention. But if >> it >> solves the problem . . . :-) >> >> Another way would be to GET with a 'query file' location, and then have >> the >> server fetch that query and execute it. >> >> Boy!!! I'd love to see one of your queries!!! You must have a few ANDs/ORs in them :-) >> >> Dennis Gearon >> > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2526934.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: My Plan to Scale Solr
He misspelled it as "LSA". The original post says "I'm not sure if it will work out in a real production environment, which has a tight SLA pending." Clearly a Service Level Agreement, not Latent Semantic Analysis. Since we're working on search engines, let's all try to figure stuff out for ourselves at least once, before we interrupt a few hundred people with questions. wunder On Feb 17, 2011, at 11:47 PM, Lance Norskog wrote: > Or even better, search with 'LSA'. > > On Thu, Feb 17, 2011 at 9:22 AM, Walter Underwood > wrote: >> http://lmgtfy.com/?q=SLA >> >> wunder >> >> On Feb 17, 2011, at 11:04 AM, Dennis Gearon wrote: >> >>> What's an 'LSA' >>> >>> Dennis Gearon >>> >>> >>> Signature Warning >>> >>> It is always a good idea to learn from your own mistakes. It is usually a >>> better >>> idea to learn from others’ mistakes, so you do not have to make them >>> yourself. >>> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' >>> >>> >>> EARTH has a Right To Life, >>> otherwise we all die. >>> >>> >>> >>> >>> >>> From: Stijn Vanhoorelbeke >>> To: solr-user@lucene.apache.org; bing...@asu.edu >>> Sent: Thu, February 17, 2011 4:28:13 AM >>> Subject: Re: My Plan to Scale Solr >>> >>> Hi, >>> >>> I'm currently looking at SolrCloud. I've managed to set up a scalable >>> cluster with ZooKeeper. >>> ( see the examples in http://wiki.apache.org/solr/SolrCloud for a quick >>> understanding ) >>> This way, all different shards / replicas are stored in a centralised >>> configuration. >>> >>> Moreover the ZooKeeper contains out-of-the-box loadbalancing. >>> So, lets say - you have 2 different shards and each is replicated 2 times. >>> Your zookeeper config will look like this: >>> >>> \config >>> ...
>>> /live_nodes (v=6 children=4) >>> lP_Port:7500_solr (ephemeral v=0) >>> lP_Port:7574_solr (ephemeral v=0) >>> lP_Port:8900_solr (ephemeral v=0) >>> lP_Port:8983_solr (ephemeral v=0) >>> /collections (v=20 children=1) >>> collection1 (v=0 children=1) "configName=myconf" >>> shards (v=0 children=2) >>>shard1 (v=0 children=3) >>> lP_Port:8983_solr_ (v=4) >>> "node_name=lP_Port:8983_solr url=http://lP_Port:8983/solr/"; >>> lP_Port:7574_solr_ (v=1) >>> "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"; >>> lP_Port:8900_solr_ (v=1) >>> "node_name=lP_Port:8900_solr url=http://lP_Port:8900/solr/"; >>>shard2 (v=0 children=2) >>> lP_Port:7500_solr_ (v=0) >>> "node_name=lP_Port:7500_solr url=http://lP_Port:7500/solr/"; >>> lP_Port:7574_solr_ (v=1) >>> "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"; >>> >>> --> This setup can be realised, by 1 ZooKeeper module - the other solr >>> machines need just to know the IP_Port were the zookeeper is active & that's >>> it. >>> --> So no configuration / installing is needed to realise quick a scalable / >>> load balanced cluster. >>> >>> Disclaimer: >>> ZooKeeper is a relative new feature - I'm not sure if it will work out in a >>> real production environment, which has a tight SLA pending. >>> But - definitely keep your eyes on this stuff - this will mature quickly! >>> >>> Stijn Vanhoorelbeke >>
Re: [solrCloud] Distributed IDF - scoring in the cloud
On Fri, Feb 18, 2011 at 7:07 AM, Thorsten Scherler wrote: > Is there a general interest to bring 1632 to the trunk (especially for > solrCloud)? Definitely - distributed idf is needed (as an option). -Yonik http://lucidimagination.com
Re: Solr multi cores or not
Multi-core was first added in 1.3 version and matured in 1.4. And as far as I understand the Solr team encourages the use of multi-core. Marc. On Fri, Feb 18, 2011 at 3:04 PM, Thumuluri, Sai < sai.thumul...@verizonwireless.com> wrote: > Thank you, I will go the multi-core route and see how that works out. I > guess, if we have to run queries across the cores, I may have to just > run separate queries. > > -Original Message- > From: Marc SCHNEIDER [mailto:marc.schneide...@gmail.com] > Sent: Friday, February 18, 2011 8:01 AM > To: solr-user@lucene.apache.org > Subject: Re: Solr multi cores or not > > Hi, > > It depends on what kind of data you are indexing between your multiple > applications. > If app1 has many fields to be indexed and app2 too and if theses fields > are > differents then it would probably be better to have multi cores. > If you have a lot of common fields between app1 and app2 then one index > is > probably the best choice as it will avoid you configuring / implementing > several indexes. In this case you can also have a differentiating field > (like 'type') so that you can get data corresponding to your app. > It really depends on your data structure. > > Hope this helps, > Marc. > > On Wed, Feb 16, 2011 at 9:45 PM, Thumuluri, Sai < > sai.thumul...@verizonwireless.com> wrote: > > > Hi, > > > > I have a need to index multiple applications using Solr, I also have > the > > need to share indexes or run a search query across these application > > indexes. Is solr multi-core - the way to go? My server config is > > 2virtual CPUs @ 1.8 GHz and has about 32GB of memory. What is the > > recommendation? > > > > Thanks, > > Sai Thumuluri > > > > > > >
Re: GET or POST for large queries?
Increase the maxBooleanClauses setting in solrconfig.xml. On Friday 18 February 2011 15:30:11 mrw wrote: > Thanks for the response. > > POSTing the data appears to avoid the header threshold issue, but it breaks > because of the "too many boolean clauses" error. > > gearond wrote: > > Probably you could do it, and solving a problem in business supersedes > > 'rightness' concerns, much to the dismay of geeks and 'those who like > > rightness > > and say the word "Neemph!" '. > > > > > > the not rightness about this is that: > > POST, PUT, DELETE are assumed to make changes to the URL's backend. > > GET is assumed NOT to make changes. > > > > So if your POST does not make a change . . . it breaks convention. But if > > it > > solves the problem . . . :-) > > > > Another way would be to GET with a 'query file' location, and then have > > the > > server fetch that query and execute it. > > > > Boy!!! I'd love to see one of your queries!!! You must have a few > > ANDs/ORs in > > them :-) > > > > Dennis Gearon -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
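For reference, the clause limit lives in solrconfig.xml; the element name and default (1024) match the stock config, and the raised value here is only illustrative:

```xml
<!-- inside the <query> section of solrconfig.xml -->
<maxBooleanClauses>4096</maxBooleanClauses>
```

Raising it buys headroom at the cost of memory and query time, so it is usually paired with shrinking the clause lists themselves.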
Re: GET or POST for large queries?
Thanks for the response and info. I'll try that. Jonathan Rochkind wrote: > > Yes, I think it's 1024 by default. I think you can raise it in your > config. But your performance may suffer. > > Best would be to try and find a better way to do what you want without > using thousands of clauses. This might require some custom Java plugins > to Solr though. > > > -- View this message in context: http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2526950.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: GET or POST for large queries?
Thanks for the response. Yes, the queries are fairly large. Basically, the corporate security policy dictates that we use row-level security attributes from the DB for access control to Solr. So, we bake row-level security attributes from the database into the index, and then, at query time, ask for those same attributes from the DB and pass them as part of the Solr query. So, imagine a bank VP with access to tens of thousands of customer records and transactions, and all those access attributes get sent to Solr. The system works well for the low-level account managers and low-entitlement users, but cannot scale for the high-level folks. POSTing the data appears to avoid the header threshold issue, but it breaks because of the "too many boolean clauses" error. gearond wrote: > > Probably you could do it, and solving a problem in business supersedes > 'rightness' concerns, much to the dismay of geeks and 'those who like > rightness > and say the word "Neemph!" '. > > > the not rightness about this is that: > POST, PUT, DELETE are assumed to make changes to the URL's backend. > GET is assumed NOT to make changes. > > So if your POST does not make a change . . . it breaks convention. But if > it > solves the problem . . . :-) > > Another way would be to GET with a 'query file' location, and then have > the > server fetch that query and execute it. > > Boy!!! I'd love to see one of your queries!!! You must have a few ANDs/ORs > in > them :-) > > Dennis Gearon > -- View this message in context: http://lucene.472066.n3.nabble.com/GET-or-POST-for-large-queries-tp2521700p2526934.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr multi cores or not
Thank you, I will go the multi-core route and see how that works out. I guess, if we have to run queries across the cores, I may have to just run separate queries. -Original Message- From: Marc SCHNEIDER [mailto:marc.schneide...@gmail.com] Sent: Friday, February 18, 2011 8:01 AM To: solr-user@lucene.apache.org Subject: Re: Solr multi cores or not Hi, It depends on what kind of data you are indexing between your multiple applications. If app1 has many fields to be indexed and app2 too and if theses fields are differents then it would probably be better to have multi cores. If you have a lot of common fields between app1 and app2 then one index is probably the best choice as it will avoid you configuring / implementing several indexes. In this case you can also have a differentiating field (like 'type') so that you can get data corresponding to your app. It really depends on your data structure. Hope this helps, Marc. On Wed, Feb 16, 2011 at 9:45 PM, Thumuluri, Sai < sai.thumul...@verizonwireless.com> wrote: > Hi, > > I have a need to index multiple applications using Solr, I also have the > need to share indexes or run a search query across these application > indexes. Is solr multi-core - the way to go? My server config is > 2virtual CPUs @ 1.8 GHz and has about 32GB of memory. What is the > recommendation? > > Thanks, > Sai Thumuluri > > >
Re: Solr multi cores or not
Hi,

It depends on what kind of data you are indexing across your multiple applications. If app1 and app2 each have many fields to be indexed, and these fields are different, then it would probably be better to have multiple cores. If you have a lot of common fields between app1 and app2, then one index is probably the best choice, as it saves you from configuring and implementing several indexes. In this case you can also add a differentiating field (like 'type') so that you can retrieve the data corresponding to each app. It really depends on your data structure.

Hope this helps,
Marc.

On Wed, Feb 16, 2011 at 9:45 PM, Thumuluri, Sai <sai.thumul...@verizonwireless.com> wrote:
> Hi,
>
> I have a need to index multiple applications using Solr, and I also have
> the need to share indexes or run a search query across these application
> indexes. Is Solr multi-core the way to go? My server config is
> 2 virtual CPUs @ 1.8 GHz with about 32 GB of memory. What is the
> recommendation?
>
> Thanks,
> Sai Thumuluri
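For the multi-core route discussed above, the cores are declared in a solr.xml file under the Solr home directory, following the layout on the CoreAdmin wiki page. A minimal sketch with two hypothetical cores; the core names and instance directories here are illustrative, not prescribed:

```xml
<!-- solr.xml (Solr 1.4/3.x style): one <core> entry per application index -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="app1" instanceDir="app1" />
    <core name="app2" instanceDir="app2" />
  </cores>
</solr>
```

Each instanceDir holds its own conf/ (solrconfig.xml, schema.xml), which is what lets each application keep its own field definitions.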
string field_type query
I declared a field named "category" with field_type "string". Now, when I query category:Crime, it does not show any results. But when I query *:*, it shows values related to this category. Can anyone tell me the problem?
[solrCloud] Distributed IDF - scoring in the cloud
Hi all,

Doing the SolrCloud examples, one thing I am not clear about is the scoring in a distributed search. I did a small test where I used "Example A: Simple two shard cluster" from wiki:SolrCloud and additionally ran

java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar ipod_other.xml
java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar monitor2.xml

Now requesting

http://localhost:8983/solr/collection1/select?distrib=true&q=electronics&fl=score&shards=localhost:8983/solr,localhost:7574/solr

on either host returns the same result. Here we get the score for each hit based on the shard-specific score and merge them into one result doc. However, when I add monitor2.xml to 7574 as well, which previously did not contain it, the scoring changes depending on which server I request: the score returned by 8983 is always 0.09289607, whether distrib=true or false, and the score returned by 7574 is always 0.121383816, whether distrib=true or false. So is it correct to assume that if a document is indexed in both shards, the score that predominates is the one from the host that received the request?

My client plans to distribute the current index into different shards. For example, each "Consejería" (regional ministry) should be hosted in a shard. The critical point for the client is that the scoring of a distributed search stays the same as in the big single index they use right now. As I understand the current SolrCloud implementation, there is no attempt to harmonize the score. In my research I came across http://markmail.org/message/bhhfwymz5y7lvoj7:

"The 'IDF' part of the relevancy score is the only place that distributed search scoring won't 'match up' with non-distributed scoring, because the document frequency used for the term is local to every core instead of global. If you distribute your documents fairly randomly to the different shards, this won't matter. There is a patch in the works to add global IDF, but I think that even when it's committed, it will default to off because of the higher cost associated with it."

The patch is https://issues.apache.org/jira/browse/SOLR-1632. However, the last comment is from 26/Jul/10, reporting that the patch failed, and a comment from Yonik gives the impression that it is not ready to use:

"It looks like the issue is this: rewrite() doesn't work for function queries (there is no propagation mechanism to go through value sources). This is a problem when real queries are embedded in function queries."

Is there a general interest in bringing SOLR-1632 to the trunk (especially for SolrCloud)? Or might it be better to look into something that scales the index into HBase, so the client does not lose the scoring?

TIA for your feedback
--
Thorsten Scherler
codeBusters S.L. - web based systems
http://www.codebusters.es/
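The per-shard scoring drift described in that quote can be illustrated with the classic Lucene IDF formula, idf(t) = 1 + ln(numDocs / (docFreq + 1)). A toy sketch, where the shard sizes and document frequencies are made up for illustration, showing that a term's local IDF on each shard differs from what it would be against the merged index:

```python
import math

def lucene_idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical shards: (total docs, docs containing the term "electronics")
shard_a = (10, 3)
shard_b = (20, 1)

idf_a = lucene_idf(*shard_a)                 # local IDF on shard A
idf_b = lucene_idf(*shard_b)                 # local IDF on shard B
idf_global = lucene_idf(10 + 20, 3 + 1)      # IDF over the merged index

# All three values differ, which is why a document indexed in both
# shards scores differently depending on which shard computes it.
print(idf_a, idf_b, idf_global)
```

Only when documents are spread evenly enough that per-shard document frequencies track the global ratio do the local values converge on the global one, which is the "distribute fairly randomly" caveat from the quote.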
Re: SolrCloud new....
Hi,

I'm busy doing the exact same thing. I figured things out all by myself; the wiki page is a nice first view, but doesn't go into depth. Let's go ahead:

1) Should I copy the libraries from cloud to trunk?
2) Should I keep the cloud module in every system?

A: Yes, you should. You should get yourself the latest dev trunk and compile it. The steps I followed:

+ grab the latest trunk & build Solr
+ back up all Solr config files
+ in dir tomcat6/webapps/ remove the dir 'solr'
+ copy the new solr.war (which you built in the first step) to tomcat6/webapps
+ in your Solr_home/conf dir, solrconfig.xml needs to be replaced by a new one (take it from the example dir of your build) -- some other config files (like schema.xml) you may keep using as-is
+ adapt the new files to reproduce the old configuration
+ restart tomcat and it will install the new version of Solr

It seems the index isn't compatible, so you need to flush your whole index and re-index all data. And finally you have your Solr system back, with ZooKeeper integrated in the /admin zone :)

3) I am not using any cores in Solr. It is a single Solr instance in every system. Can SolrCloud support that?

A: Actually you are using one core, so that's no problem. But be sure to check that you have a solr.xml file in your solr_home dir. This file just lists all cores -- in your case just one core (you can find example layouts of this file easily on http://wiki.apache.org/solr/CoreAdmin).

4) The example is given with Jetty. Is it done the same way with Tomcat?

A: Right now, it is the same way. You have to edit your /etc/init.d/tomcat6 startup script. In the start) section you can specify all the JAVA_OPTS (the ones the SolrCloud wiki mentions). Be sure to set the following:

export JAVA_OPTS="$JAVA_OPTS -DhostPort=8080" (if tomcat runs on port 8080)

At first I didn't --> my ZooKeeper pointed to the standard 8983 port, which gave errors. In the above I gave you a quick peek at how to get the SolrCloud feature.
In the above, ZooKeeper is embedded in one of your Solr machines. If you don't want this, you may place ZooKeeper on a different machine (as I'm doing right now). If you need more help, you can contact me.

Stijn Vanhoorelbeke,
--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-new-tp1528872p2526080.html
Sent from the Solr - User mailing list archive at Nabble.com.
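The startup-script tweak described above can be sketched as follows, following the embedded-ZooKeeper layout from the SolrCloud wiki of the time. The flag names (-DzkRun, -Dbootstrap_confdir, -DzkHost) follow that wiki page; the ports, hostname, and config path are illustrative assumptions:

```shell
# /etc/init.d/tomcat6, inside the start) section -- illustrative values only.

# Node that embeds ZooKeeper (bootstrap_confdir uploads the config on first start):
export JAVA_OPTS="$JAVA_OPTS -DhostPort=8080 -DzkRun -Dbootstrap_confdir=/path/to/solr/conf"

# Other nodes only point at the ZooKeeper host instead of running one:
# export JAVA_OPTS="$JAVA_OPTS -DhostPort=8080 -DzkHost=zkhost:9983"
```

Setting -DhostPort to match Tomcat's actual port is the step the poster flags as easy to miss; without it, the node registers itself in ZooKeeper under the default 8983.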