More characters???
Is it possible to get more characters? I have a problem with too many characters in the search: my "Think Tank" is very long, but it has to be that way. Unfortunately I cannot find the setting that is responsible.
Re: Currency field type
Thank you Erik. I will think about taking the time to get more involved in Solr development. In the meantime, I will choose to store prices and currency in a normalized way.
Re: Query for documents that have ONLY a certain value in a multivalued field
I am having a similar problem and would appreciate any useful explanation of this topic. I couldn't find a way of querying for an exact match in multivalued or normal text fields.

On Thu, Jan 26, 2012 at 3:14 AM, Garrett Conaty gcon...@gmail.com wrote: Does anyone know if there's a way using the Solr query syntax to filter documents that have only a certain value in a multivalued field? As an example, if I have some multivalued field 'country' and I want q=id:[* TO *]&fq=country:brazil where 'brazil' is the only value present. I've run through a few possibilities to do this, but I think it would be more common and a better solution would exist:

1) At index creation time, aggregate my source data and create a count_country field that contains the number of terms in the country field. Then the query would be q=id:[* TO *]&fq=country:brazil&fq=count_country:1

2) In the search client, use the terms component to retrieve all terms for country, then do the exclusions in the client and construct the query as follows: q=id:[* TO *]&fq=country:brazil&fq=-country:canada&fq=-country:us etc.

3) Write a function query or similar that could capture the info.

Thanks in advance, Garrett Conaty

-- Bilal Dadanlar
is it possible to get more characters?
Is it possible to get more characters? I have a problem with too many characters in the search: my "Think Tank" is very long, but it has to be that way. Unfortunately I cannot find the setting that is responsible.
Re: is it possible to get more characters?
Hi Jörg, Hmmm, do you mind rephrasing the question? Otis

Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

----- Original Message ----- From: Jörg Agatz joerg.ag...@googlemail.com To: solr-user@lucene.apache.org Cc: Sent: Thursday, January 26, 2012 5:23 AM Subject: is it possible to get more characters?

Is it possible to get more characters? I have a problem with too many characters in the search: my "Think Tank" is very long, but it has to be that way. Unfortunately I cannot find the setting that is responsible.
Re: Size of index to use shard
@Erick: Thanks for the detailed explanation. On this note, we have 75GB for *.fdt and *.fdx out of a 99GB index. The search is still not that fast if the cache size is small, but giving more cache led to OOMs. Partitioning into shards is not an option either, as at the moment we try to run on as few machines as possible. @Vadim: Thanks for the info! For the 6GB of heap size I assume your caches are not that big? We had a filterCache (used heavily compared to other cache types in facet and non-facet queries, according to our measurements) on the order of 20 thousand entries and a heap size of 22GB, and observed OOMs. So we decided to lower the cache params substantially. Dmitry

On Tue, Jan 24, 2012 at 10:25 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: @Erick thanks :) I share your opinion; my load tests show the same. @Dmitry my docs are small too, I think about 3-15KB per doc. I update my index all the time and I have an average of 20-50 requests per minute (20% facet queries, 80% large boolean queries with wildcard/fuzzy). How many docs at a time? It depends on the chosen filters, from 10 to all 100 mio. I work with very small caches (strangely, if my index is under 100GB I need larger caches; over 100GB, smaller caches...). My JVM has 6GB, 18GB for I/O. With few updates a day I would configure very big caches, like Tom Burton-West (see HathiTrust's blog). Regards Vadim

2012/1/24 Anderson vasconcelos anderson.v...@gmail.com: Thanks for the explanation Erick :)

2012/1/24, Erick Erickson erickerick...@gmail.com: Talking about index size can be very misleading. Take a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names. Note that the *.fdt and *.fdx files are used for stored fields, i.e. the verbatim copy of data put in the index when you specify stored="true". These files have virtually no impact on search speed. So, if your *.fdx and *.fdt files are 90G out of a 100G index, it is a much different thing than if these files are 10G out of a 100G index. And this doesn't even mention the peculiarities of your query mix, nor does it say a thing about whether your cheapest alternative is to add more memory. Anderson's method is about the only reliable one: you just have to test with your index and real queries. At some point you'll find your tipping point, typically when you come under memory pressure. And it's a balancing act between how much memory you allocate to the JVM and how much you leave for the op system. Bottom line: no hard and fast numbers. And you should periodically re-test the empirical numbers you *do* arrive at... Best Erick

On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos anderson.v...@gmail.com wrote: Apparently, it's not so easy to determine when to break the content into pieces. I'll investigate further the amount of documents, the size of each document, and what kind of search is being used. It seems I will have to do a load test to identify the cutoff point at which to begin using the strategy of shards. Thanks

2012/1/24, Dmitry Kan dmitry@gmail.com: Hi, the article you gave mentions 13GB of index size. That is quite a small index from our perspective. We have noticed that at least Solr 3.4 has some sort of choking point with respect to growing index size. It just becomes substantially slower than what we need (a query on avg taking more than 3-4 seconds) once the index size crosses a magic level (about 80GB following our practical observations). We try to keep our indices at around 60-70GB for fast searches and above 100GB for slow ones.
We also route the majority of user queries to the fast indices. Yes, caching may help, but we cannot necessarily afford adding more RAM for bigger indices. BTW, our documents are very small, thus in a 100GB index we can have around 200 mil. documents. It would be interesting to see how you manage to ensure q-times under 1 sec with an index of 250GB. How many documents / facets do you ask for max. at a time? FYI, we ask for a thousand facets in one go. Regards, Dmitry

On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hi, it depends on your hardware. Read this: http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/ Think about your cache config (few updates, big caches) and a good HW infrastructure. In my case I can handle a 250GB index with 100 mil. docs on an i7 machine with RAID10 and 24GB RAM, with q-times under 1 sec. Regards Vadim

2012/1/24 Anderson vasconcelos anderson.v...@gmail.com: Hi, is there some index size (or number of docs) at which it becomes necessary to break the index into shards? I have an index 100GB in size. This index grows by 10GB per year (I don't have information on how many docs it has) and the docs will never be deleted. Thinking in 30
Re: Query for documents that have ONLY a certain value in a multivalued field
Thought of another way to do this which will at least work for one field: map all of the values into a simple string field and then query for an exact match in the string (one term). This is similar to having a 'count' field, but for our index creation process we could reuse a string field we had already made (for sorting). Still, I'd like to see if the community has any other options from within Solr itself.

On Thu, Jan 26, 2012 at 2:05 AM, bilal dadanlar bi...@fizy.com wrote: I am having a similar problem and would appreciate any useful explanation of this topic. [...]
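For reference, option (1) from Garrett's list can be wired up with an integer count field plus a filter query along these lines (a sketch only; field and type names are illustrative, per the example schema):

  <field name="count_country" type="int" indexed="true" stored="false"/>

  q=id:[* TO *]&fq=country:brazil&fq=count_country:1

The indexing client computes count_country from the number of values it writes into country; the second fq then restricts matches to documents where 'brazil' is the only value present.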
RE: Using multiple DirectSolrSpellcheckers for a query
Nalini, right now the best you can do is to use copyField to combine everything into a catch-all field for spellchecking purposes. While this seems wasteful, it often has to be done anyhow because you typically need less/different analysis for spellchecking than for searching. But rather than having separate copyFields to create multiple dictionaries, put everything into one field to create a single master dictionary. From there, you need to set spellcheck.collate to true and also spellcheck.maxCollationTries greater than zero (5-10 usually works). The first parameter tells it to generate re-written queries with spelling suggestions (collations). The second parameter tells it to weed out any collations that won't generate hits if you re-query them. This is important because having unrelated keywords in your master dictionary will increase the chances that the spellchecker will pick the wrong words as corrections. There is a significant caveat to this: the spellchecker typically only suggests for words in the dictionary, so by creating a huge master dictionary you might find that many misspelled words won't generate suggestions. See this thread for some workarounds: http://lucene.472066.n3.nabble.com/Improving-Solr-Spell-Checker-Results-td3658411.html I think having multiple per-field dictionaries as you suggest might be a good way to go. While this is not supported, I don't think it's because of performance concerns. (There would be an overhead cost to this, but I think it would still be practical.) It just hasn't been implemented yet. But we might be getting a possible start to this type of functionality: in https://issues.apache.org/jira/browse/SOLR-2585 a separate spellchecker is added that just corrects wordbreak (or is it word break?) problems, then a ConjunctionSolrSpellChecker combines the results from the main spellchecker and the wordbreak spellchecker. I could see a next step beyond this being to support per-field dictionaries, checking them separately, then combining the results. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-----Original Message----- From: Nalini Kartha [mailto:nalinikar...@gmail.com] Sent: Wednesday, January 25, 2012 11:56 AM To: solr-user@lucene.apache.org Subject: Using multiple DirectSolrSpellcheckers for a query

Hi, we are trying to use the DirectSolrSpellChecker to get corrections for misspelled query terms directly from fields in the Solr index. However, we need to use multiple fields for spellchecking a query. It looks like you can only use one spellchecker for a request, so the workaround for this is to create a copy field from the fields required for spell correction? We'd like to avoid this because we allow users to perform different kinds of queries on different sets of fields, and so to provide meaningful corrections we'd have to create multiple copy fields, one for each query type. Is there any reason why Solr doesn't support using multiple spellcheckers for a query? Is it because of performance overhead? Thanks, Nalini
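As a concrete sketch of the copyField approach (field, type, and source names are illustrative, not from Nalini's setup):

  <!-- schema.xml: funnel the searchable fields into one spellcheck dictionary field -->
  <field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
  <copyField source="title" dest="spell"/>
  <copyField source="author" dest="spell"/>
  <copyField source="description" dest="spell"/>

The query side would then add something like spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=5, per the parameters discussed above.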
Re: is it possible to get more characters?
No, I have a lot of characters in my URL. It looks like it stops at XYZ characters, so I hope to find a way to use more characters.

On Thu, Jan 26, 2012 at 3:11 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Jörg, Hmmm, do you mind rephrasing the question? [...]
decreasing of maxFieldLength in solrconfig.xml doesn't work
Hello folks, I want to decrease the max number of terms for my fields to 500. I thought the maxFieldLength parameter in solrconfig.xml was intended for this, but in my case it doesn't work. Half of my text fields contain long text. With 100 docs in my index I had a segment size of 1140KB for indexed data and 270KB for stored data (.fdx, .fdt). After a change from the default <maxFieldLength> to <maxFieldLength>500</maxFieldLength>, deleting the index folder, restarting Tomcat and reindexing, I see the same segment sizes (1140KB for indexed and 270KB for stored data). Please tell me if I made an error in reasoning. Regards Vadim
Re: decreasing of maxFieldLength in solrconfig.xml doesn't work
P.S.: I use Solr 4.0 from trunk. Is maxFieldLength deprecated in Solr 4.0? If so, do I have an alternative to decrease the number of terms during indexing? Regards Vadim

2012/1/26 Vadim Kisselmann v.kisselm...@googlemail.com: Hello folks, I want to decrease the max number of terms for my fields to 500. [...]
Re: decreasing of maxFieldLength in solrconfig.xml doesn't work
I want to decrease the max number of terms for my fields to 500. I thought the maxFieldLength parameter in solrconfig.xml was intended for this, but in my case it doesn't work. [...]

What version of Solr are you using? Could it be http://lucene.apache.org/solr/api/org/apache/solr/analysis/LimitTokenCountFilterFactory.html? http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/analysis/LimitTokenCountFilter.html
Re: decreasing of maxFieldLength in solrconfig.xml doesn't work
Vadim, is it possible that your solrconfig.xml is using maxFieldLength in both <indexDefaults> and <mainIndex>? If so, the <mainIndex> config overwrites the other. See this issue: http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html Sean

On Thu, Jan 26, 2012 at 10:15 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: P.S.: I use Solr 4.0 from trunk. Is maxFieldLength deprecated in Solr 4.0? [...]

-- Sean Adams-Hiett Owner, Web Geeks For Hire phone: (361) 433.5748 email: s...@webgeeksforhire.com web: www.webgeeksforhire.com twitter: @geekbusiness http://twitter.com/geekbusiness
Re: decreasing of maxFieldLength in solrconfig.xml doesn't work
Sean, Ahmet, thanks for the responses :) I use Solr 4.0 from trunk. In my solrconfig.xml there is only one maxFieldLength param; I think it is deprecated in Solr versions 3.5+... But LimitTokenCountFilterFactory works in my case :) Thanks! Regards Vadim

2012/1/26 Ahmet Arslan iori...@yahoo.com: What version of Solr are you using? Could it be http://lucene.apache.org/solr/api/org/apache/solr/analysis/LimitTokenCountFilterFactory.html? [...]
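For reference, the factory Ahmet pointed to slots into the analyzer chain like this (a minimal sketch; the field type name and tokenizer are illustrative):

  <fieldType name="text_limited" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="500"/>
    </analyzer>
  </fieldType>

With maxTokenCount="500", only the first 500 tokens of each field value are indexed, which is the per-field equivalent of the old maxFieldLength setting.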
Solr Join query with fq not correctly filtering results?
Hello, I'm trying out the Solr JOIN query functionality on trunk. I have the latest checkout, revision #1236272. I did the following steps to get the example up and running:

cd solr
ant example
java -jar start.jar
cd exampledocs
java -jar post.jar *.xml

Then I tried a few of the sample queries on the wiki page http://wiki.apache.org/solr/Join. In particular, this is one that I'm interested in: "Find all manufacturer docs named belkin, then join them against (product) docs and filter that list to only products with a price less than 12 dollars":

http://localhost:8983/solr/select?q={!join+from=id+to=manu_id_s}compName_s:Belkin&fq=price:%5B%2A+TO+12%5D

However, when I run that query, I get two results, one with a price of 19.95 and another with a price of 11.5. Because of the filter query, I'm only expecting to see one result: the one with a price of 11.5. I was also able to replicate this in a unit test added to org.apache.solr.TestJoin:

@Test
public void testJoin_withFilterQuery() throws Exception {
    assertU(add(doc("id", "1", "name", "john", "title", "Director", "dept_s", "Engineering")));
    assertU(add(doc("id", "2", "name", "mark", "title", "VP", "dept_s", "Marketing")));
    assertU(add(doc("id", "3", "name", "nancy", "title", "MTS", "dept_s", "Sales")));
    assertU(add(doc("id", "4", "name", "dave", "title", "MTS", "dept_s", "Support", "dept_s", "Engineering")));
    assertU(add(doc("id", "5", "name", "tina", "title", "VP", "dept_s", "Engineering")));
    assertU(add(doc("id", "10", "dept_id_s", "Engineering", "text", "These guys develop stuff")));
    assertU(add(doc("id", "11", "dept_id_s", "Marketing", "text", "These guys make you look good")));
    assertU(add(doc("id", "12", "dept_id_s", "Sales", "text", "These guys sell stuff")));
    assertU(add(doc("id", "13", "dept_id_s", "Support", "text", "These guys help customers")));
    assertU(commit());

    // This works as expected - the correct number of results is found.
    // Find people that develop stuff.
    assertJQ(req("q", "{!join from=dept_id_s to=dept_s}text:develop", "fl", "id")
        , "/response=={'numFound':3,'start':0,'docs':[{'id':'1'},{'id':'4'},{'id':'5'}]}"
    );

    // This fails - the response finds all three people; it should only find John.
    // expected = /response=={"numFound":1,"start":0,"docs":[{"id":"1"}]}
    // response = {"responseHeader":{"status":0,"QTime":4},
    //             "response":{"numFound":3,"start":0,"docs":[{"id":"1"},{"id":"4"},{"id":"5"}]}}
    // Find people that develop stuff, but limit via filter query to a name of john.
    assertJQ(req("q", "{!join from=dept_id_s to=dept_s}text:develop", "fl", "id", "fq", "name:john")
        , "/response=={'numFound':1,'start':0,'docs':[{'id':'1'}]}"
    );
}

Interestingly, I know this worked at some point. I had a snapshot build in my ivy cache from 10/2/2011 and it was working with that build: maven_artifacts/org/apache/solr/solr/4.0-SNAPSHOT/solr-4.0-20111002.161157-1.pom

Mike
Solr and TF-IDF
Hey there, I'm using Solr for my thesis, where I have to implement a content-based recommender system for movies. I have indexed about 20,000 movies with their information: movie-id, title, genre, plot/movie-description (!!!), and cast. I've enabled the TermVectorComponent for the fields genre, description and cast, so I can get the tf-idf values for the terms of every movie. With these term/tf-idf-value pairs I have to compute the similarities between movies using the cosine similarity. I know about the Solr feature MLT (MoreLikeThis), but that's not the solution; I have to implement the cosine similarity in Java myself. Now I have some problems/questions:

I get the responses in XML format, which I read out with an XML reader in Java, working through every child node in order to reach the right node. Is there a better way to get these values, in node attributes or node texts? I have tried it with wt=csv, but for the requests I get responses only with the movie IDs, nothing more. With the XML response writer my request is, for example:
http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
I get the right response with all terms and tf-idfs, in XML. And if I add CSV notation:
http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
I get only this: id 1800180382. Maybe my request is wrong?

Another problem: when I get the terms and their tf-idf values, I store them in a map, but the values are not in any order. I want to store, e.g., only the 10 top terms, i.e. the 10 terms with the highest tf-idf values. Can I sort them in descending order? I haven't found anything for that. If it's not possible, I must sort them later in the map.

My last question: every movie has a genre, often more than one. It's like the cat field (category) in the exampledocs with ipod/monitor etc., and it's an important point for the movies. How can I integrate this factor? I changed the boost attribute in the Solr XML schema like this:
<field name="genre" type="string" indexed="true" stored="true" multiValued="true" omitNorms="false" boost="3" termVectors="true" termPositions="true" termOffsets="true"/>
Is that enough, or is there any other possibility?

Perhaps you can tell that I am a beginner in Solr; at the beginning a few weeks ago it was even more difficult for me, but now it goes better. I would be very grateful for any help, ideas, tips or suggestions! Many regards Nejla
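For the cosine step itself, a minimal Java sketch over two term -> tf-idf maps could look like this (a reference implementation of the formula only, not tied to any Solr API):

import java.util.Map;

public class CosineSimilarity {
    // Cosine similarity of two sparse term -> tf-idf vectors:
    // dot(a,b) / (|a| * |b|), iterating only over the terms actually present.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w; // only shared terms add to the dot product
            normA += e.getValue() * e.getValue();
        }
        for (double w : b.values()) normB += w * w;
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}

Picking the 10 top terms is then just a matter of sorting the map entries by value in descending order before truncating.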
Re: Solr and TF-IDF
Why are you using a search engine to build a recommender? None of the leading teams in the Netflix Prize used search engines as a base technology. Start with the recommender algorithms in Mahout: http://mahout.apache.org/ wunder

On Jan 26, 2012, at 9:18 AM, Nejla Karacan wrote: Hey there, I'm using Solr for my thesis, where I have to implement a content-based recommender system for movies. [...]
Re: is it possible to get more characters?
You still haven't given us much to go on. It would be helpful to give some sample inputs, what you see when you query (the output after adding &debugQuery=on is helpful), and the <fieldType> definition from schema.xml for the field in question. You might also try looking at the admin/analysis page to see how your analysis chain breaks up the incoming stream into tokens; that's often helpful. Best Erick

On Thu, Jan 26, 2012 at 7:24 AM, Jörg Agatz joerg.ag...@googlemail.com wrote: No, I have a lot of characters in my URL. It looks like it stops at XYZ characters, so I hope to find a way to use more characters. [...]
Re: WARNING: Unable to read: dataimport.properties DIH issue
Nothing jumps out at me, but you might get some insight from http://wiki.apache.org/solr/DataImportHandler; see the interactive development mode section. The dataimport.jsp page is helpful. It *looks* like your SQL statement is having problems, but I confess I only glanced at the output... Best Erick

On Wed, Jan 25, 2012 at 2:17 PM, Egonsith egons...@gmail.com wrote: I have tried to search for my specific problem but have not found a solution. I have also read the wiki on the DIH and seem to have everything set up right, but my query still fails. Thank you for your help. I am running Solr 3.1 with Tomcat 6.0 on Windows Server 2003 R2 and SQL 2008. I have sqljdbc4.jar sitting in C:\Program Files\Apache Software Foundation\Tomcat 6.0\lib

My solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

My db-data-config.xml:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="://localhost:1433;DatabaseName=KnowledgeBase_DM" user="user" password="password" />
  <document>
    <entity dataSource="ds1" name="Titles" query="SELECT mrID, mrTitle from KnowledgeBase_DM.dbo.AskMe_Data">
      <field column="mrID" name="id" />
      <field column="mrTitle" name="title" />
      <entity name="Desc" query="select meDescription from KnowledgeBase_DM.dbo.AskMe_Data">
        <field column="meDescription" name="description" />
      </entity>
    </entity>
  </document>
</dataConfig>

My log file output:

Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImportHandler processConfiguration
INFO: Processing configuration from solrconfig.xml: {config=db-data-config.xml}
Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImporter loadDataConfig
INFO: Data Configuration loaded successfully
Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
*WARNING: Unable to read: dataimport.properties*
Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity Titles with URL: ://localhost:1433;DatabaseName=KnowledgeBase_DM
Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 0
Jan 25, 2012 2:17:37 PM org.apache.solr.common.SolrException log
*SEVERE: Exception while processing: Titles document : SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT mrID, mrTitle from KnowledgeBase_DM.dbo.AskMe_Data Processing Document # 1*
  at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
  at org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:188)
  at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
  at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
  at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
  at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
  at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:205)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
ord/rord with a function
Is it possible for ord/rord to work with a function? I'm attempting to use rord with a spatial function like the following as a bf: bf=rord(geodist()) If there's no way for this to work, is there a way to simulate the same behavior? For some background, I have two sets of documents: one set applies to a location in NY and another in LA. I want to boost documents that are closer to where the user is searching from. But I only need these sets to be ranked 1 and 2. In other words, the actual distance should not be used to boost the documents, just whether you are closer or farther. We may add more locations in the future, so I'd like to be able to rank the locations from closest to furthest. I need some way to rank the distances, and rord is the right idea, but it doesn't seem to work with functions. I'm running Solr 3.4, btw.
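One possible workaround to sketch, since ord/rord operate on indexed field values rather than arbitrary functions: bucket the raw distance into rank values with the map() function instead (assuming the five-argument form of map() is available in your 3.x release, and with sfield/pt set as usual for geodist()):

  bf=map(geodist(),0,100,2,1)

This would boost documents within 100 km (an illustrative cutoff) over everything else, and additional map() buckets could be chained as more locations are added. Untested here, so treat it as a starting point rather than a confirmed fix.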
Re: Advice - evaluating Solr for categorization keyword search
See below... On Wed, Jan 25, 2012 at 2:38 PM, Becky Neil be...@lovemachineinc.com wrote:

Hi all, I've been tasked with evaluating whether Solr is the right solution for my company's search needs. If this isn't the right forum for this kind of question, please let me know where to go instead! We are currently using sql queries to find mysql db results that match a single keyword in one short text field, so our search is pretty crude.

Be a little careful here. Often, when people come from a DB background they think in terms of normalized data. If each of your tables is independent of all other tables, then the simple "map the rows into documents" approach works. More likely, you'll combine bits from several tables into each Solr document, and your reflexive distaste for de-normalizing data will trip you up. Get over it <G>.

What we hope that Solr can do initially is: 1. enable more flexible search (booleans, more than one field searched/matched, etc.)

This is OOB functionality. But do note that Solr/Lucene query parsing is not a true boolean process; see: http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

2. live search results (e.g. new records get added to the index upon creation)

As you indicated below, you'd need some process that notices that your DB changed and then indexes the changed records. Once the records are indexed, Solr will pick up the changes automatically, but you have to control the indexing process from outside.

3. search rankings (e.g. most relevant to least relevant)

OOB functionality with lots of knobs to turn for tuning. See edismax.

4. categorize our db (take records and at least group them, better if it could assign a label to each record)

Depending on what the details are here, this may be OOB. See faceting and grouping/field collapsing: http://wiki.apache.org/solr/SolrFacetingOverview http://wiki.apache.org/solr/FieldCollapsing

5. locate nearby results (geospatial search)

OOB, although you need to store the lat/lon. See: http://wiki.apache.org/solr/SpatialSearch

What I hope you can advise on is: A. How would you go about #2 - making sure that new documents are added/indexed asap, based on new rows in the db? Is that as simple as a setting in Solr, or does it take some coding (e.g. a listener object, a cron job, etc.)? I tried looking at the wiki tutorial but wasn't able to find answers - I couldn't make sense of how to use UpdateRequestProcessor to do it. (http://wiki.apache.org/solr/UpdateRequestProcessor)

What you'll be doing here is either using the Data Import Handler or SolrJ (the Java client) to push Solr documents into Solr. This is straightforward once you know the magic. A trivial SolrJ program that indexes documents from a DB is maybe 100 lines, including imports. It *uses* the update handler, but you don't see that; you see something like solrServer.add(listOfSolrInputDocuments);

B. What's the status of document clustering? The wiki says it's not been fully implemented. Would we be able to achieve any of #4 yet? If not, what else should we consider?

I don't think you're really thinking about document clustering here. I suspect that grouping and/or faceting will be where you start; at least I'd look at that first, although clustering may be exactly what you want. Half the battle is learning the right vocabulary <G>.

C. Would you use Solr over, say, the Google Maps api to run location aware searches?

*shrugs*

D. How long should we expect it to take to configure Solr on our servers with our db, get the initial index set up, and enable live search results?
Are we talking one week, or one month? Our db is not tiny, but it's not huge - say around 8k records in each of ~20 tables. Most tables have around 10 fields, including at least one large text field and then a variety of dates, numbers, and small text.

Too many variables for you to count on this estimate, but: *if* you can use the Data Import Handler and are starting from scratch, probably a week. Someone who already knows Solr, maybe a day. But whenever I start something new, I usually chase a number of blind alleys. Once set up, indexing your entire corpus will probably be a matter of less than an hour (and I'm being quite conservative here; on my laptop, Solr can index 7K documents/second from the English wiki dump). But at times the database connection is the limiting factor. By the way, I recommend that if DIH starts getting hard to use, especially due to the relationships between tables, you consider jumping to SolrJ earlier rather than later. Your index size is pretty small by Solr standards, so you probably won't have to shard or do some of the other complex kinds of things that come up when you have lots of data. Note that this is *just* for setting up Solr and being able to query through, say, the admin page. It does not exclude all the work for the UI you'll need to front the app. Count on tweaking your configuration files (e.g. schema.xml and solrconfig.xml) and
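For a sense of scale, a trivial SolrJ loader is on the order of the following (a sketch against the Solr 3.x SolrJ API; the connection strings, SQL query, and field names are illustrative, not from Becky's setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DbIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT id, title, body FROM articles");
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        while (rs.next()) {
            // Map each DB row to one Solr document (de-normalize joins here as needed).
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rs.getString("id"));
            doc.addField("title", rs.getString("title"));
            doc.addField("body", rs.getString("body"));
            docs.add(doc);
        }
        server.add(docs);   // push the batch to Solr
        server.commit();    // make it searchable
        conn.close();
    }
}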
Shard timeouts on large (1B docs) Solr cluster
I'm on a project where we have 1B docs sharded across 20 servers. We're not in production yet and we're doing load tests now. We're sending load to hit 100 qps per server. As the load increases we're seeing query times sporadically increasing to 10 seconds, 20 seconds, etc. at times. What we're trying to do is set a shard timeout so that responses longer than 2 seconds are discarded. We can live with fewer results in these cases. We're not replicating yet, as we want to see how the 20 shards perform first (plus we're waiting on the massive amount of hardware). I've tried setting the following config in our default request handler:

<int name="shard-socket-timeout">2000</int>
<int name="shard-connection-timeout">2000</int>

I've just added these, and am testing now, but this doesn't look promising either:

<int name="timeAllowed">2000</int>
<bool name="partialResults">true</bool>

Couldn't find much on the wiki about these params - I'm looking for more details about how they work. I'll be happy to update the wiki with more details based on the discussion here. Any details about exactly how I can achieve my goal of timing out and disregarding queries longer than 2 seconds would be greatly appreciated. The index is insanely lean - no stored fields, no norms, no stop words, etc. RAM buffer is 128, and we're using the standard search request handler. Essentially we're running Solr as a NoSQL data store, which suits this project, but we need responses to be no longer than 2 seconds at the max. Thanks, -Jay
social123 Data Appending Service
Hi there- I was on your site today and was not sure who to reach out to. My company, Social123, provides social data appending for companies that provide lists. In a nutshell, we add Facebook, LinkedIn and Twitter contact information to your current lists. It's a great way to easily offer a new service or add on to your current offerings. Providing social media contact information to your customers will allow them to interact with their customers on a whole new level. If you are the right person to speak with, please let me know your availability for a quick 5-minute demo, or check out our tour at www.social123.com. If you are not the right person, would you mind passing this e-mail along? Thanks in advance. -- Aaron Biddar Founder, CEO aaron.bid...@social123.com www.social123.com 78 Alexander St. #K Charleston SC 29403 M 678 925 3556 P 800.505.7295 ex101
Re: WARNING: Unable to read: dataimport.properties DIH issue
Erick, thanks for the reply. I'm a bit embarrassed to say this is a classic example of a way too messy development environment; these errors were due to many different drivers and XML files that were edited way too many times. I have cleaned my dev environment and reinstalled Tomcat and Solr, and am now getting past this error. Thank you for the help. Mike
Re: social123 Data Appending Service
No thanks; not sure which site you're talking about, by the way. But anyway, no thanks.

On 26 January 2012 19:41, Aaron Biddar aaron.bid...@social123.com wrote: Hi there- I was on your site today and was not sure who to reach out to. [...]
Re: Solr 3.5.0 can't find Carrot classes
Hi, can you paste the logs from the second run? Thanks, Staszek

On Wed, Jan 25, 2012 at 00:12, Christopher J. Bottaro cjbott...@onespot.com wrote: On Tuesday, January 24, 2012 at 3:07 PM, Christopher J. Bottaro wrote:

SEVERE: java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory
  at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.<init>(CarrotClusteringEngine.java:102)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
  at java.lang.reflect.Constructor.newInstance(Unknown Source)
  at java.lang.Class.newInstance0(Unknown Source)
  at java.lang.Class.newInstance(Unknown Source)
  ...

I'm starting Solr with -Dsolr.clustering.enabled=true and I can see that the Carrot jars in contrib are getting loaded. Full log file is here: http://onespot-development.s3.amazonaws.com/solr.log Any ideas? Thanks for the help.

Ok, got a little further. It seems that Solr doesn't like it if you include jars more than once (I had a lib dir and also <lib> directives in solrconfig.xml, which ended up loading the same jars twice). But now I'm getting these errors:

java.lang.NoClassDefFoundError: org/apache/solr/handler/clustering/SearchClusteringEngine

Any help? Thanks.
Re: Solr and TF-IDF
It's a content-based recommender, so it's not CF etc., and it's a project, so it's whatever his supervisor wants. Take a look at SolrJ; it should be more natural to integrate your Java code with. (Although I'm not sure if it supports the term vector component.) Good luck.

On 26 January 2012 17:27, Walter Underwood wun...@wunderwood.org wrote: Why are you using a search engine to build a recommender? None of the leading teams in the Netflix Prize used search engines as a base technology. Start with the recommender algorithms in Mahout: http://mahout.apache.org/ wunder

On Jan 26, 2012, at 9:18 AM, Nejla Karacan wrote: Hey there, I'm using Solr for my thesis, where I have to implement a content-based recommender system for movies. [...]
solr shards
Hello, I've gone through the list and have not found the answer, but if it is a repetitive question, my apologies. I have a 3-shard Solr cluster. If I send a query to each of the shards individually, I get the result with a list of relevant docs. However, if I send the query to the main Solr server (dispatcher), it only returns the value for numFound but there is no list of docs. Since I seem to be the only one having this issue, it is probably a misconfiguration, for which I couldn't find an answer in the documentation. Can someone please help? Also, the sum of all the individual numFound values seems not to match the numFound I get from the main Solr server, given that I do not have any duplicates on the unique key. Thanks in advance, Ramin
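For reference, a distributed request is normally sent to one core with an explicit shards parameter listing every shard (host names and ports illustrative):

  http://mainhost:8983/solr/select?q=*:*&shards=shard1:8983/solr,shard2:8983/solr,shard3:8983/solr

If the dispatcher returns counts but no documents, one thing worth checking is that the unique key field is stored and defined identically on every shard, since the second phase of a distributed search fetches the documents by id.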
Re: WARNING: Unable to read: dataimport.properties DIH issue
Yeah, that happens. Glad you're past this issue; thanks for closing it out. Erick

On Thu, Jan 26, 2012 at 10:45 AM, Egonsith egons...@gmail.com wrote: Erick, thanks for the reply. I'm a bit embarrassed to say this is a classic example of a way too messy development environment. [...]
Multiple Data Directories and 1 SOLR instance
Hi, we are using Solr/Lucene to index/search data about the users of an organization. The nature of the data is brief information about each user's work. Our data index requirement is to have segregated stores for each organization; currently we have 10 organizations, and we have to run 10 different instances of Solr to serve search results per organization. As new organizations join, it is getting difficult to manage this many instances. I think there is now a need to use one Solr instance and have 10/multiple different data directories, one for each organization. When an index/search request is received in Solr, we decide the data directory based on the organization.

1. Is it possible to do this in Solr, and how can we achieve it?
2. Will it be a good design to use Solr like this?
3. Is there any impact on scalability if we manage separate data directories inside Solr?

Thanks in advance, Nitin
Re: Multiple Data Directories and 1 SOLR instance
I wish I had the link for you, but it sounds like you are looking to use Solr cores. They are separate indexes all under one Solr instance. Check out the Solr 3.5 example, as I believe cores are now used and suggested as the default configuration even if you only want to use one core. Cameron

On Jan 26, 2012 4:18 PM, Nitin Arora aro_ni...@yahoo.com wrote: Hi, we are using Solr/Lucene to index/search data about the users of an organization. [...]
Re: Multiple Data Directories and 1 SOLR instance
Hey, sounds like what you need to set up is a multiple cores configuration. At first I confused this with multi-core CPUs, but that's not what it's about. Basically it's a way to run multiple 'solr' cores/indexes/configurations from a single Solr instance (which will scale better, as the resources will be shared). Have a read anyway: http://wiki.apache.org/solr/CoreAdmin Cheers, David

On 27/01/2012 8:18 AM, Nitin Arora wrote: Hi, we are using Solr/Lucene to index/search data about the users of an organization. [...]
Re: Shard timeouts on large (1B docs) Solr cluster
On Jan 26, 2012, at 1:28 PM, Jay Hill wrote: I've tried setting the following config in our default request handler:

<int name="shard-socket-timeout">2000</int>
<int name="shard-connection-timeout">2000</int>

What version are you using, Jay? At least on trunk, I took a look and it appears at some point these were renamed to socketTimeout and connTimeout. What about a timeout on your clients? - Mark Miller lucidimagination.com
Re: Shard timeouts on large (1B docs) Solr cluster
We're on the trunk: 4.0-2011-10-26_08-46-59 1189079 - hudson - 2011-10-26 08:51:47. Client timeouts are set to 4 seconds. Thanks, -Jay

On Thu, Jan 26, 2012 at 1:40 PM, Mark Miller markrmil...@gmail.com wrote: What version are you using, Jay? At least on trunk, I took a look and it appears at some point these were renamed to socketTimeout and connTimeout. [...]
Re: Shard timeouts on large (1B docs) Solr cluster
I'm changing the params to socketTimeout and connTimeout and will test this afternoon. The client timeout was actually removed today, which helped a bit. What about the other params, timeAllowed and partialResults? My expectation was that these are specifically for distributed search, meaning that if a response isn't received within timeAllowed, and if partialResults is true, then that shard would not be waited on for results. Is that correct? Thanks, -jay

On Thu, Jan 26, 2012 at 2:23 PM, Jay Hill jayallenh...@gmail.com wrote: We're on the trunk: 4.0-2011-10-26_08-46-59. Client timeouts are set to 4 seconds. [...]
Re: Multiple Data Directories and 1 SOLR instance
Nitin, use a multicore configuration. For each organization, you create a new core with specific configurations. You will have one Solr instance and one Solr admin tool to manage all cores. The configuration is simple. Good luck. Regards, Anderson

2012/1/26 David Radunz da...@boxen.net: Hey, sounds like what you need to set up is a multiple cores configuration. [...]
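For reference, a minimal multicore solr.xml might look like this (a sketch for a 3.x-style setup; core names and directories are illustrative):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core per organization, each with its own conf/ and data/ directory -->
    <core name="org1" instanceDir="org1" />
    <core name="org2" instanceDir="org2" />
  </cores>
</solr>

Requests are then routed per organization via the core name in the URL, e.g. http://localhost:8983/solr/org1/select?q=...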
Re: WARNING: Unable to read: dataimport.properties DIH issue
On Thu, Jan 26, 2012 at 3:47 AM, Egonsith egons...@gmail.com wrote: I have tried to search for my specific problem but have not found a solution. I have also read the wiki on DIH and seem to have everything set up right, but my query still fails. Thank you for your help [...]

This has nothing to do with the warning in the title of your message. That is very likely because the user running DIH (typically the Jetty/Tomcat user) does not have permission to read/write the dataimport.properties file in your Solr conf/ directory. The relevant error in your log is the following one:

SEVERE: Exception while processing: Titles document : SolrInputDocument[{}]: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT mrID, mrTitle from KnowledgeBase_DM.dbo.AskMe_Data Processing Document # 1
at [...]
Caused by: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:241)
[...]

Your SQL SELECT is failing for some reason. Please check the setup there. E.g., one item that is incorrect is the url attribute in:

<dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="://localhost:1433;DatabaseName=KnowledgeBase_DM" user="user" password="password"/>

It should be something like url="jdbc:sqlserver://localhost:1433;DatabaseName=KnowledgeBase_DM". Regards, Gora
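Putting that correction together, the fixed dataSource element would look something like the sketch below (user, password, and database name are the poster's own placeholders; this also assumes the Microsoft JDBC driver jar is on Solr's classpath):

<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://localhost:1433;DatabaseName=KnowledgeBase_DM"
            user="user"
            password="password"/>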
Re: Solr Join query with fq not correctly filtering results?
I created issue https://issues.apache.org/jira/browse/SOLR-3062 for this problem. I was able to track it down to something in this commit: http://svn.apache.org/viewvc?view=revision&revision=1188624 (LUCENE-1536: Filters can now be applied down-low, if their DocIdSet implements a new bits() method, returning all documents in a random-access way). Before that commit the join / fq functionality works as expected and as documented on the wiki page; after that commit it's broken. Any assistance is greatly appreciated! Thanks, Mike

On Thu, Jan 26, 2012 at 11:04 AM, Mike Hugo m...@piragua.com wrote:

Hello, I'm trying out the Solr JOIN query functionality on trunk. I have the latest checkout, revision #1236272. I did the following steps to get the example up and running:

cd solr
ant example
java -jar start.jar
cd exampledocs
java -jar post.jar *.xml

Then I tried a few of the sample queries on the wiki page http://wiki.apache.org/solr/Join. In particular, this is the one I'm interested in: "Find all manufacturer docs named belkin, then join them against (product) docs and filter that list to only products with a price less than 12 dollars":

http://localhost:8983/solr/select?q={!join+from=id+to=manu_id_s}compName_s:Belkin&fq=price:[*+TO+12]

However, when I run that query, I get two results: one with a price of 19.95 and another with a price of 11.5. Because of the filter query, I'm only expecting to see one result - the one with a price of 11.5. I was also able to replicate this in a unit test added to org.apache.solr.TestJoin:

@Test
public void testJoin_withFilterQuery() throws Exception {
  assertU(add(doc("id", "1", "name", "john", "title", "Director", "dept_s", "Engineering")));
  assertU(add(doc("id", "2", "name", "mark", "title", "VP", "dept_s", "Marketing")));
  assertU(add(doc("id", "3", "name", "nancy", "title", "MTS", "dept_s", "Sales")));
  assertU(add(doc("id", "4", "name", "dave", "title", "MTS", "dept_s", "Support", "dept_s", "Engineering")));
  assertU(add(doc("id", "5", "name", "tina", "title", "VP", "dept_s", "Engineering")));
  assertU(add(doc("id", "10", "dept_id_s", "Engineering", "text", "These guys develop stuff")));
  assertU(add(doc("id", "11", "dept_id_s", "Marketing", "text", "These guys make you look good")));
  assertU(add(doc("id", "12", "dept_id_s", "Sales", "text", "These guys sell stuff")));
  assertU(add(doc("id", "13", "dept_id_s", "Support", "text", "These guys help customers")));
  assertU(commit());

  // This works as expected - the correct number of results is found:
  // find people that develop stuff.
  assertJQ(req("q", "{!join from=dept_id_s to=dept_s}text:develop", "fl", "id"),
      "/response=={'numFound':3,'start':0,'docs':[{'id':'1'},{'id':'4'},{'id':'5'}]}");

  // This fails - the response finds all three people (ids 1, 4, 5) when it
  // should only find John, i.e. /response=={'numFound':1,'start':0,'docs':[{'id':'1'}]}.
  // Find people that develop stuff, but limit via filter query to the name john.
  assertJQ(req("q", "{!join from=dept_id_s to=dept_s}text:develop", "fl", "id", "fq", "name:john"),
      "/response=={'numFound':1,'start':0,'docs':[{'id':'1'}]}");
}

Interestingly, I know this worked at some point. I had a snapshot build in my Ivy cache from 10/2/2011 and it was working with that build: maven_artifacts/org/apache/solr/solr/4.0-SNAPSHOT/solr-4.0-20111002.161157-1.pom

Mike
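Until the regression is resolved, one possible workaround (a sketch only, untested against the affected revisions) is to invert the clauses, so the join runs as the filter and the price constraint becomes the main query, avoiding a plain fq applied on top of the join's result set:

http://localhost:8983/solr/select?q=price:[*+TO+12]&fq={!join+from=id+to=manu_id_s}compName_s:Belkin

Whether this actually sidesteps the down-low filter path introduced by the LUCENE-1536 commit would need to be verified against SOLR-3062.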
addBean method inserting multivalued values
Hi, I have annotated the setter methods with @Field annotations, and I am using the addBean method to add Solr documents. But all fields are being indexed as multivalued:

<doc>
  <float name="score">1.0</float>
  <arr name="id">
    <str>1</str>
  </arr>
  <arr name="name">
    <str>siddharth 0</str>
  </arr>
  <arr name="updated_dt">
    <date>2012-01-28T06:22:19.946Z</date>
  </arr>
</doc>

How to avoid this?
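For context, a minimal sketch of the annotated-bean setup being described (the bean and its field names are illustrative, not the poster's actual code; SolrJ 3.x-era API):

import org.apache.solr.client.solrj.beans.Field;

public class Item {
    private String id;
    private String name;

    @Field("id")    // binds this setter to the Solr field "id"
    public void setId(String id) { this.id = id; }

    @Field("name")
    public void setName(String name) { this.name = name; }
}

// Usage:
// SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
// Item item = new Item();
// item.setId("1");
// item.setName("siddharth 0");
// server.addBean(item);
// server.commit();

One thing worth checking: whether a value comes back wrapped in <arr> is controlled by the multiValued setting in schema.xml (a catch-all dynamic field declared multiValued="true" is a common culprit), not by the @Field annotation itself, so the schema is the first place to look.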
Re: SpellCheck Help
Downloaded Apache Solr from the URL http://apache.dattatec.com//lucene/solr/ and extracted it on my Windows machine. Then I started Solr from [solr-path]/example by typing the following in a terminal: java -jar start.jar. It started, and I can see the Solr page at http://localhost:8983/solr/admin/. I then copied Magento's [magento-instance-root]/lib/Apache/Solr/conf to [Solr-instance-root]/example/solr/conf and restarted Solr again; lots of activity was going on there. Then I ran System > Index Management, and in the front-end search box I tried to search for a product with an incorrect spelling. In the Solr console I can see some activity, but at the Magento front end I couldn't get any result - why? I followed the steps given at this URL: http://www.summasolutions.net/blogposts/magento-apache-solr-set#comment-615 Please look into it and let me know any other information you require. I also want to know how I can implement faceted and highlighted search in the results. -- View this message in context: http://lucene.472066.n3.nabble.com/SpellCheck-Help-tp3648589p3692518.html Sent from the Solr - User mailing list archive at Nabble.com.
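On the facet and highlight part of the question: independent of Magento, both are enabled with standard Solr query parameters. A minimal sketch (category and name are placeholder field names; the real ones depend on the schema Magento ships):

http://localhost:8983/solr/select?q=shirt&facet=true&facet.field=category&hl=true&hl.fl=name

facet.field must name an indexed field, and hl.fl lists the stored fields from which highlighted snippets are generated.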
Re: SpellCheck Help
Hey, I really recommend you contact Magento pre-sales to find out why THEIR stuff doesn't work. The information you have provided is specific to Magento... You can't expect people on a Solr mailing list to help you with a Magento problem. The issue is almost certainly something Magento is doing, so try seeking support there first (try their mailing lists if they have any, or IRC: irc.freenode.org #magento). I am not trying to be rude, rather to save you time and others' effort. Cheers, David

On 27/01/2012 5:37 PM, vishal_asc wrote: [...]