in Problem
Hi, When I type a query string without "in" it gives proper results, but when I try the same query string with "in" it does not display proper results. May I know what the problem is? I have listed "in" as a stopword; if I remove "in" from the stopwords it still does not show relevant results. Ex: "used computers chennai" -- shows good results; "used computer in chennai" -- does not show proper results. Can anybody tell me what the problem is? -- View this message in context: http://lucene.472066.n3.nabble.com/in-Problem-tp4092866.html Sent from the Solr - User mailing list archive at Nabble.com.
Not able to run sample solr examples
Hi, I am running Solr on a Tomcat server and am able to get to the Solr link from my Tomcat manager. I want to try running queries through the Solr admin page against the Solr examples which come built in when I install Solr. How can I run queries on those examples? Thanks, Mamta -- View this message in context: http://lucene.472066.n3.nabble.com/Not-able-to-run-sample-solr-examples-tp4092872.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Issue in parallel Indexing using multiple csv files
Ran more tests. It works. -- View this message in context: http://lucene.472066.n3.nabble.com/Issue-in-parallel-Indexing-using-multiple-csv-files-tp4092452p4092873.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: in Problem
Hi, See here, hope it helps. http://stackoverflow.com/questions/2681393/solr-is-there-a-way-to-include-stopwords-when-searching-exact-phrases On Tue, Oct 1, 2013 at 9:34 AM, PAVAN pavans2...@gmail.com wrote: Hi, When i type any query string without in it is giving proper results. But when i try same query string using in then it is not displaying the proper results. May i know what is the problem. And i mentioned in as a stopword. If remove in from the stop words it is not showing relevant results. Ex : used computers chennai -- showing good results used computer in chennai -- Not showing proper results Can anybody tell me what is the problem? -- View this message in context: http://lucene.472066.n3.nabble.com/in-Problem-tp4092866.html Sent from the Solr - User mailing list archive at Nabble.com.
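One approach from that Stack Overflow thread, sketched here only as an illustration (the fieldType name is invented, not the poster's actual configuration): drop the StopFilterFactory for the field you search, or use CommonGramsFilterFactory so that phrases containing stopwords still match efficiently. A CommonGrams-based chain might look roughly like:

  <fieldType name="text_nostop" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- no StopFilterFactory, so "in" stays searchable; common words are paired into bigrams -->
      <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>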
Re: Problem regarding queries enclosed in double quotes in Solr 3.4
Perhaps you can make a query parser to fix this? It would parse the incoming query and substitute "some_terms" with "some_terms"~0. On Tue, Oct 1, 2013 at 7:43 AM, Kunal Mittal kunalmitta...@gmail.com wrote: We have a Solr 3.4 setup. When we try to do queries with double quotes, like "semantic web", the query takes a long time to execute. One solution we are thinking about is to make the same query without the quotes and set the phrase slop (ps) parameter to 0. That is quite a bit quicker than the query with the quotes and gives similar results. Is there a way to fix this by modifying the schema.xml file? Any suggestions would be appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-regarding-queries-enclosed-in-double-quotes-in-Solr-3-4-tp4092856.html Sent from the Solr - User mailing list archive at Nabble.com.
Newbie to Solr
Hi, I want to know: if I have to fire some query through the Solr admin, do I need to create a new schema.xml? Where do I place it in case I have to create a new one? In case I can edit the original schema.xml, can there be two fields named id in my schema.xml? I desperately need help running queries on the Solr admin which is configured on a Tomcat server. What preparation will I need to do? A schema.xml? Any docs? Any help will be highly appreciated. Thanks, Mamta -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: OpenJDK or OracleJDK
This sounds interesting... Thanks guys for the replies.. :) On Tue, Oct 1, 2013 at 8:07 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, A while back I remember we noticed some SPM users were having issues with OpenJDK. Since then we've been recommending Oracle's implementation to our Solr and SPM users. At the same time, we haven't seen any issues with OpenJDK in the last ~6 months. Oracle JDK is not slow. :) Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Mon, Sep 30, 2013 at 11:02 PM, Shawn Heisey s...@elyograg.org wrote: On 9/30/2013 9:28 AM, Raheel Hasan wrote: hmm why is that so? Isn't Oracle's version a bit slow? For Java 6, the Sun JDK is the reference implementation. For Java 7, OpenJDK is the reference implementation. http://en.wikipedia.org/wiki/Reference_implementation I don't think Oracle's version could really be called slow. Sun invented Java. Sun open sourced Java. Oracle bought Sun. The Oracle implementation is likely more conservative than some of the other implementations, like the one by IBM. The IBM implementation is pretty aggressive with optimization, so aggressive that Solr and Lucene have a history of revealing bugs that only exist in that implementation. Thanks, Shawn -- Regards, Raheel Hasan
{soft}Commit and cache flushing
Hello! This is a minor thing, perhaps, but I thought I would ask / share: if there are no modifications to an index and a softCommit or hardCommit is issued, then Solr flushes the cache. Is this designed on purpose? Regards, Dmitry
Re: Problem regarding queries enclosed in double quotes in Solr 3.4
Which query parser are you using? It seems you are mixing them up. As far as I know, edismax doesn't support quoted phrases; it uses the pf param to invoke phrase queries. Likewise, the lucene query parser doesn't support a phrase slop param; it uses a "phrase slop"~2 syntax. Upayavira On Tue, Oct 1, 2013, at 05:43 AM, Kunal Mittal wrote: We have a Solr 3.4 setup. When we try to do queries with double quotes, like "semantic web", the query takes a long time to execute. One solution we are thinking about is to make the same query without the quotes and set the phrase slop (ps) parameter to 0. That is quite a bit quicker than the query with the quotes and gives similar results. Is there a way to fix this by modifying the schema.xml file? Any suggestions would be appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-regarding-queries-enclosed-in-double-quotes-in-Solr-3-4-tp4092856.html Sent from the Solr - User mailing list archive at Nabble.com.
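For illustration, the two syntaxes mentioned above look roughly like this against the stock example handlers (parameter and field values are examples only, not the poster's actual configuration):

  # lucene/standard query parser: the slop goes on the quoted phrase itself
  q="semantic web"~0

  # edismax: unquoted terms, phrase matching driven by pf/ps
  defType=edismax&q=semantic web&qf=text&pf=text&ps=0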
Re: in Problem
Hi Dmitry, I have already defined it in the following way:

  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/in-Problem-tp4092866p4092899.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr doesn't return TermVectors
Nope, it's not the last-components problem, but it's definitely a request handler problem; it was the same for me... Switching to the /tvrh request handler solved my problem. We should update the wiki! 2013/9/27 Shawn Heisey s...@elyograg.org On 9/27/2013 4:02 PM, Jack Krupansky wrote: You are using components instead of last-components, so you have to list all search components, including the QueryComponent. Better to use last-components. That did it. Thank you! I didn't know why this was a problem even with your note, until I read the last part of this page, which says that using components will entirely replace the default component list with what you specify: http://wiki.apache.org/solr/SearchComponent I copied and modified the handler from one I've already got that's using TermsComponent, which was using components instead of last-components. That handler works, so I figured it would for /tv as well. :) Thanks, Shawn -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
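For anyone hitting the same issue, a /tvrh handler wired with last-components (adapted from the stock 4.x example solrconfig.xml) looks roughly like this:

  <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>

  <requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="df">text</str>
      <bool name="tv">true</bool>
    </lst>
    <arr name="last-components">
      <str>tvComponent</str>
    </arr>
  </requestHandler>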
Re: Not able to run sample solr examples
http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! On Tue, Oct 1, 2013 at 12:48 AM, mamta mamta.al...@gmail.com wrote: Hi, I am running Solr on Tomcat server and am able to go to the solr link from my Tomcat manager. I want to try running quieries through the solr admin page on the solr examples which come built-in when i install solr. How can i run queries on those examples? Thanks, Mamta -- View this message in context: http://lucene.472066.n3.nabble.com/Not-able-to-run-sample-solr-examples-tp4092872.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Not able to run sample solr examples
Hi, My problem is that I am not able to run the sample examples given in Solr. I cannot run them through the Solr admin console; it doesn't give me any results. I have already indexed the documents. Appreciate your help! Thanks, Mamta On Tue, Oct 1, 2013 at 3:08 PM, Kishan Parmar kishan@gmail.com wrote: http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! On Tue, Oct 1, 2013 at 12:48 AM, mamta mamta.al...@gmail.com wrote: Hi, I am running Solr on Tomcat server and am able to go to the solr link from my Tomcat manager. I want to try running quieries through the solr admin page on the solr examples which come built-in when i install solr. How can i run queries on those examples? Thanks, Mamta -- View this message in context: http://lucene.472066.n3.nabble.com/Not-able-to-run-sample-solr-examples-tp4092872.html Sent from the Solr - User mailing list archive at Nabble.com.
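If the example documents really are indexed, a request like the following (stock example core name and port; a Tomcat install may differ) should return them; numFound=0 would suggest the documents went to a different core than the one being queried:

  http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true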
Re: Newbie to Solr
Yes, you have to create your own schema, but in the schema file you have to add the field names from your XML files. You can add your field names to it, or you can add your fields to the default schema file. Without a schema you cannot add your XML file to Solr. My schema is like this:

  <?xml version="1.0" encoding="UTF-8" ?>
  <schema name="example" version="1.5">
    <fields>
      <field name="No" type="string" indexed="true" stored="true" required="true" multiValued="false" />
      <field name="Name" type="string" indexed="true" stored="true" required="true" multiValued="false" />
      <field name="Address" type="string" indexed="true" stored="true" required="true" multiValued="false" />
      <field name="Mobile" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    </fields>
    <uniqueKey>No</uniqueKey>
    <types>
      <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
      <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0" />
    </types>
  </schema>

and my file is like this:

  <add>
    <doc>
      <field name="No">100120107088</field>
      <field name="Name">kishan</field>
      <field name="Address">ghatlodia</field>
      <field name="Mobile">9510077394</field>
    </doc>
  </add>

Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote: Hi, I want to know that if i have to fire some query through the Solr admin, do i need to create a new schema.xml? Where do i place it incase iahve to create a new one. Incase i can edit the original schema.xml can there be two fields named id in my schema.xml? I desperately need help in running queries on the Solr admin which is configured on a Tomcat server. What all preparation will i need to do? Schema.xml any docs? Any help will be highly appreciated. Thanks, Mamta -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Newbie to Solr
Can you tell me what all docs I need to create...there needs to be a schema.xml and what else? A document having my data? Also, where these should be placed. There's already a schema.xml Thanks for the prompt response. Mamta. -Original Message- From: Kishan Parmar [mailto:kishan@gmail.com] Sent: 01 October, 2013 03:16 PM To: solr-user@lucene.apache.org Subject: Re: Newbie to Solr yes you have to create your own schema but in schema file you have to add your xml files field name in it like wise you can add your field name in it or you can add your filed in the default schema file whiithout schema you can not add your xml file to solr my schema is like this -- ?xml version=1.0 encoding=UTF-8 ? schema name=example version=1.5 fields field name=No type=string indexed=true stored=true required=true multiValued=false / field name=Name type=string indexed=true stored=true required=true multiValued=false / field name=Address type=string indexed=true stored=true required=true multiValued=false / field name=Mobile type=string indexed=true stored=true required=true multiValued=false / /fields uniqueKeyNo/uniqueKey types fieldType name=string class=solr.StrField sortMissingLast=true / fieldType name=int class=solr.TrieIntField precisionStep=0 positionIncrementGap=0 / /types /schema - and my file is like this ,,.,.,.,. - add doc field name=No100120107088/field field name=Namekishan/field field name=Addressghatlodia/field field name=Mobile9510077394/field /doc /add Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote: Hi, I want to know that if i have to fire some query through the Solr admin, do i need to create a new schema.xml? Where do i place it incase iahve to create a new one. Incase i can edit the original schema.xml can there be two fields named id in my schema.xml? I desperately need help in running queries on the Solr admin which is configured on a Tomcat server. What all preparation will i need to do? Schema.xml any docs? Any help will be highly appreciated. Thanks, Mamta -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html Sent from the Solr - User mailing list archive at Nabble.com. The content of this email together with any attachments, statements and opinions expressed herein contains information that is private and confidential are intended for the named addressee(s) only. If you are not the addressee of this email you may not copy, forward, disclose or otherwise use it or any part of it in any form whatsoever. If you have received this message in error please notify postmas...@etisalat.ae by email immediately and delete the message without making any copies.
Re: in Problem
can you run both examples you provided through the query analysis of solr admin and see if there is any difference with term positions? On Tue, Oct 1, 2013 at 1:36 PM, PAVAN pavans2...@gmail.com wrote: Hi Dmitry, I already defined in the following way filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ -- View this message in context: http://lucene.472066.n3.nabble.com/in-Problem-tp4092866p4092899.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie to Solr
you have to create only schema file dont change anything in solr config file,, and your xml file which you want to index from solr if you are new in solr then there is core named collection1 you have to add thee schema file in that collection conf folder C:\solr\example\solr\collection1\conf your schema file is should be in c: solr - examples example docs folder in that folder post.jar and post.sh file there so that you can add yu Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! On Tue, Oct 1, 2013 at 4:19 AM, Mamta S Kanade mkan...@etisalat.ae wrote: Can you tell me what all docs I need to create...there needs to be a schema.xml and what else? A document having my data? Also, where these should be placed. There's already a schema.xml Thanks for the prompt response. Mamta. -Original Message- From: Kishan Parmar [mailto:kishan@gmail.com] Sent: 01 October, 2013 03:16 PM To: solr-user@lucene.apache.org Subject: Re: Newbie to Solr yes you have to create your own schema but in schema file you have to add your xml files field name in it like wise you can add your field name in it or you can add your filed in the default schema file whiithout schema you can not add your xml file to solr my schema is like this -- ?xml version=1.0 encoding=UTF-8 ? schema name=example version=1.5 fields field name=No type=string indexed=true stored=true required=true multiValued=false / field name=Name type=string indexed=true stored=true required=true multiValued=false / field name=Address type=string indexed=true stored=true required=true multiValued=false / field name=Mobile type=string indexed=true stored=true required=true multiValued=false / /fields uniqueKeyNo/uniqueKey types fieldType name=string class=solr.StrField sortMissingLast=true / fieldType name=int class=solr.TrieIntField precisionStep=0 positionIncrementGap=0 / /types /schema - and my file is like this ,,.,.,.,. - add doc field name=No100120107088/field field name=Namekishan/field field name=Addressghatlodia/field field name=Mobile9510077394/field /doc /add Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote: Hi, I want to know that if i have to fire some query through the Solr admin, do i need to create a new schema.xml? Where do i place it incase iahve to create a new one. Incase i can edit the original schema.xml can there be two fields named id in my schema.xml? I desperately need help in running queries on the Solr admin which is configured on a Tomcat server. What all preparation will i need to do? Schema.xml any docs? Any help will be highly appreciated. Thanks, Mamta -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html Sent from the Solr - User mailing list archive at Nabble.com. The content of this email together with any attachments, statements and opinions expressed herein contains information that is private and confidential are intended for the named addressee(s) only. If you are not the addressee of this email you may not copy, forward, disclose or otherwise use it or any part of it in any form whatsoever. If you have received this message in error please notify postmas...@etisalat.ae by email immediately and delete the message without making any copies.
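For reference, the posting step described above usually looks something like this from the exampledocs folder of the download (paths, port, core name and the data file name are from the stock example and will differ for a Tomcat install):

  cd C:\solr\example\exampledocs
  java -jar post.jar mydata.xml
  rem or point post.jar at an explicit update URL, e.g. a Tomcat-hosted core:
  java -Durl=http://localhost:8080/solr/collection1/update -jar post.jar mydata.xml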
Re: Sorting dependent on user preferences with FunctionQuery
Hello, thanks for your answers. I checked your suggestions, but I'm not quite there yet. With field collapsing, I only get the top result per category, which is not what I want; I want to have all results! Boosting is quite an interesting idea, though. With the following I get what I need, all results but with Books at the top: q=+*:* category:Book^2.2&q.op=OR Unfortunately, our default operator is AND, and I'm not sure our customers would be OK with the results if we change that. How can I do it with AND? Best regards, Nikola -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-dependent-on-user-preferences-with-FunctionQuery-tp4092119p4092912.html Sent from the Solr - User mailing list archive at Nabble.com.
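One way to keep AND as the default operator is to move the category preference out of q and into an additive boost query with edismax (field name and boost value are just the example values above, to be tuned):

  defType=edismax&q=some query&q.op=AND&bq=category:Book^2.2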
Re: Newbie to Solr
I can have only one schema.xml file right? Can i over-write the one which originally comes with solr set-up? the original schema.xml is @ C:\solr\solr\solr\conf along with post.sh et all..where should my other document be? i need to run post.jar on my doc file (xml) to index it right? I could unfortunately not get any document which tells me how to run solr queries through my tomcat..do you know of any link/books? Thank you! Kishan. Thanks, Mamta On Tue, Oct 1, 2013 at 3:30 PM, Kishan Parmar kishan@gmail.com wrote: you have to create only schema file dont change anything in solr config file,, and your xml file which you want to index from solr if you are new in solr then there is core named collection1 you have to add thee schema file in that collection conf folder C:\solr\example\solr\collection1\conf your schema file is should be in c: solr - examples example docs folder in that folder post.jar and post.sh file there so that you can add yu Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! On Tue, Oct 1, 2013 at 4:19 AM, Mamta S Kanade mkan...@etisalat.ae wrote: Can you tell me what all docs I need to create...there needs to be a schema.xml and what else? A document having my data? Also, where these should be placed. There's already a schema.xml Thanks for the prompt response. Mamta. -Original Message- From: Kishan Parmar [mailto:kishan@gmail.com] Sent: 01 October, 2013 03:16 PM To: solr-user@lucene.apache.org Subject: Re: Newbie to Solr yes you have to create your own schema but in schema file you have to add your xml files field name in it like wise you can add your field name in it or you can add your filed in the default schema file whiithout schema you can not add your xml file to solr my schema is like this -- ?xml version=1.0 encoding=UTF-8 ? schema name=example version=1.5 fields field name=No type=string indexed=true stored=true required=true multiValued=false / field name=Name type=string indexed=true stored=true required=true multiValued=false / field name=Address type=string indexed=true stored=true required=true multiValued=false / field name=Mobile type=string indexed=true stored=true required=true multiValued=false / /fields uniqueKeyNo/uniqueKey types fieldType name=string class=solr.StrField sortMissingLast=true / fieldType name=int class=solr.TrieIntField precisionStep=0 positionIncrementGap=0 / /types /schema - and my file is like this ,,.,.,.,. - add doc field name=No100120107088/field field name=Namekishan/field field name=Addressghatlodia/field field name=Mobile9510077394/field /doc /add Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote: Hi, I want to know that if i have to fire some query through the Solr admin, do i need to create a new schema.xml? Where do i place it incase iahve to create a new one. Incase i can edit the original schema.xml can there be two fields named id in my schema.xml? I desperately need help in running queries on the Solr admin which is configured on a Tomcat server. What all preparation will i need to do? Schema.xml any docs? Any help will be highly appreciated. Thanks, Mamta -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-Solr-tp4092876.html Sent from the Solr - User mailing list archive at Nabble.com. 
SolrCloud. Scale-test by duplicating same index to the shards and make it behave each index is different (uniqueId).
Hello everyone, I have a small challenge in performance-testing a SolrCloud setup. I have 10 shards, and each shard is supposed to have an index size of ~200GB. However, I only have a single index of 200GB, because it would take too long to build another index with different data, and I hope to somehow use this index on all 10 shards and make it behave as if all documents were different on each shard. So building more indexes from new data is not an option. Making a query to SolrCloud is a two-phase operation. First all shards receive the query and return IDs and ranking. The merger will then remove duplicate IDs, and then the full documents will be retrieved. When I copy this index to all shards and make a request, the following will happen: Phase one: all shards will receive the query and return ids+ranking (actually the same set from all shards). This part is realistic enough. Phase two: IDs will be merged, and retrieving the documents is not as realistic as if they were spread out between shards (IO wise). Is there any way I can 'fake' this somehow and have the shards return a prefixed_id for phase one etc., which then also has to be undone when retrieving the documents for phase two? I have tried making the hack in org.apache.solr.handler.component.QueryComponent and a few other classes, but no success (the result sets are always empty). I do not need to index any new documents, which would also be a challenge due to the ID hash intervals for the shards with this hack. Does anyone have a good idea how to make this hack work? From, Thomas Egense
solr cpu usage
Hi, We're building a spec for a machine to purchase. We're going to buy 10 machines, and we aren't sure yet how many processes we will run per machine. The question is: should we buy faster CPUs with fewer cores or slower CPUs with more cores? In any case we will have 2 CPUs in each machine. Should we buy a 2.6GHz CPU with 8 cores or a 3.5GHz CPU with 4 cores? What would we gain by having many cores? What kinds of usage would make the CPU the bottleneck? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: {soft}Commit and cache flushing
On 10/1/2013 2:48 AM, Dmitry Kan wrote: This is a minor thing, perhaps, but thought to ask / share: if there are no modifications to an index and a softCommit or hardCommit issued, then solr flushes the cache. Any time you do a commit that opens a new Searcher object (openSearcher=true, which is required if you want index changes to be visible to people making queries), the caches are invalidated. This is because the layout of the index (and therefore the Lucene internal IDs) can completely change with *any* commit/merge, and there is no easy and reliable way to determine when those numbers have NOT changed. If you have warming queries configured, those happen on the new searcher, populating the new cache. If you have cache autoWarming configured, then keys from the old caches are re-queried against the new index and used to populate the new cache. I do not understand deep Lucene internals, but what I've seen come through Jira activity and commits over the last year or two has been a strong move towards per-segment thinking instead of whole-index thinking. If this idea becomes applicable to all aspects of Lucene, then perhaps Solr caches can also become per-segment, and will not need to be completely invalidated except in the case of a major merge or forceMerge. Thanks, Shawn
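For reference, the warming Shawn mentions is configured in solrconfig.xml along these lines (cache sizes and the warming query are placeholders, not recommendations):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">*:*</str><str name="sort">price asc</str></lst>
    </arr>
  </listener>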
how to manually update a field in the index without re-crawling?
Good morning, I'm currently using Solr 4.0 FINAL. I indexed a website and it took over 24 hours to crawl. I just realized I need to rename one of the fields (or add a new one), so I added the new field to the schema. But how do I copy the data over from the old field to the new field without recrawling everything? Is this possible? I was thinking about maybe putting an update processor chain on the /update handler, but I'm not sure that will work. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-manually-update-a-field-in-the-index-without-re-crawling-tp4092955.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Doing time sensitive search in solr
Try it and see :). Dynamic fields are just like regular fields once you index a document that uses one. After that, they should behave just like regular. If you're asking if you can create a query like *_txt:text meaning search all the fields that end with _txt for the word text, I don't think so. An alternative is to copy all the fields into a catch-all field... Best, Erick On Mon, Sep 30, 2013 at 3:41 PM, Darniz rnizamud...@edmunds.com wrote: Hello i just wanted to make sure can we query dynamic fields using wildcard well if not then i dont think this solution might work, since i dont know the exact concrete name of the field. -- View this message in context: http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4092830.html Sent from the Solr - User mailing list archive at Nabble.com.
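A minimal sketch of the catch-all idea Erick mentions (field and type names are invented):

  <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
  <field name="all_text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*_txt" dest="all_text"/>

A query then targets the single field, e.g. q=all_text:text, instead of a wildcarded field name.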
Re: Solr 4.0 is stripping XML format from RSS content field
If anyone is interested, I managed to resolve this a long time ago. I used a Data Import Handler instead and it worked beautifully. DIH is very forgiving; it takes whatever XML data is there and injects it into the Solr index. It's a lot faster than crawling too. You use XPath to map the fields to your schema. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-is-stripping-XML-format-from-RSS-content-field-tp4039809p4092961.html Sent from the Solr - User mailing list archive at Nabble.com.
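A rough sketch of such a DIH data-config.xml for an RSS feed (the URL, xpaths and columns are placeholders, not the poster's actual setup):

  <dataConfig>
    <dataSource type="URLDataSource"/>
    <document>
      <entity name="rss"
              processor="XPathEntityProcessor"
              url="http://example.com/feed.rss"
              forEach="/rss/channel/item">
        <field column="title" xpath="/rss/channel/item/title"/>
        <field column="link" xpath="/rss/channel/item/link"/>
        <field column="description" xpath="/rss/channel/item/description"/>
      </entity>
    </document>
  </dataConfig>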
Auto Suggest - Time decay
I am trying to implement auto suggest based on a time decay function. I have a separate index just to store auto suggest keywords. I would be calculating the frequency over time rather than calculating based on frequency alone. I am thinking of using a database to perform the calculation and update the Solr index with the boost calculated from the time decay function, but I am not sure if there is a better way to do this... I need to boost the terms based on frequency over time. Ex: someone searching for 'apple' many times during an iPhone launch (one particular day) shouldn't really make apple come up in the auto suggestions forever after when someone types in the keyword 'a'; rather it should lose its popularity exponentially.. Anyone have any suggestions? -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Auto Suggest - Time decay
Are you using the suggester component, or a separate core? I've used a separate core to store suggestions and order those suggestions (queries performed on the frontend) using a time decay function, and it works great for me. Regards, - Original Message - From: SolrLover bbar...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, 1 October 2013 12:12:13 Subject: Auto Suggest - Time decay I am trying to implement an auto suggest based on time decay function. I have a separate index just to store auto suggest keywords. I would be calculating the frequency over time rather than just calculating just based on frequency alone. I am thinking of using a database to perform the calculation and update the SOLR index with the boost calculated based on time decay function. I am not sure if there is a better way to do this... I need to boost the terms based on the frequency over time, Ex: when someone searches for 'apple' 1 times during a iphone launch (one particular day) shouldn't really make apple come up in the auto suggestion always when someone types in the keyword 'a' rather it should lose its popularity exponentially.. Anyone has any suggestions? -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Percolate feature?
On 01/10/2013 04:12, Otis Gospodnetic wrote: Just came across this ancient thread. Charlie, did this end up happening? I suspect Wolfgang may be interested, but that's just a wild guess. Hi Otis all, Yes we're actually planning to talk about it at Lucene Revolution in November and open source it around then - it's called 'Luwak' and we're working on a live customer implementation based on it currently. I was curious about your feeling that what you were open-sourcing might be a lot faster and more flexible than ES's percolator - can you share more about why do you have that feeling and whether you've confirmed this? Difficult to say at present - we've not done a direct comparative test yet and obviously we like our own implementation! It works very well for our clients' use case. Cheers Charlie Thanks, Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Mon, Aug 5, 2013 at 6:34 AM, Charlie Hull char...@flax.co.uk wrote: On 03/08/2013 00:50, Mark wrote: We have a set number of known terms we want to match against. In Index: term one term two term three I know how to match all terms of a user query against the index but we would like to know how/if we can match a user's query against all the terms in the index? Search Queries: my search term = 0 matches my term search one = 1 match (term one) some prefix term two = 1 match (term two) one two three = 0 matches I can only explain this is almost a reverse search??? I came across the following from ElasticSearch (http://www.elasticsearch.org/guide/reference/api/percolate/) and it sounds like this may accomplish the above but haven't tested. I was wondering if Solr had something similar or an alternative way of accomplishing this? Thanks Hi Mark, We've built something that implements this kind of reverse search for our clients in the media monitoring sector - we're working on releasing the core of this as open source very soon, hopefully in a month or two. It's based on Lucene. Just for reference it's able to apply tens of thousands of stored queries to a document per second (our clients often have very large and complex Boolean strings representing their clients' interests and may monitor hundreds of thousands of news stories every day). It also records the positions of every match. We suspect it's a lot faster and more flexible than Elasticsearch's Percolate feature. Cheers Charlie -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception
i'm already using URLDataSource On 30. Sep 2013, at 5:41 PM, P Williams wrote: Hi Andreas, When using XPathEntityProcessorhttp://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessoryour DataSource must be of type DataSourceReader. You shouldn't be using BinURLDataSource, it's giving you the cast exception. Use URLDataSourcehttps://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/URLDataSource.html or FileDataSourcehttps://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/FileDataSource.htmlinstead. I don't think you need to specify namespaces, at least you didn't used to. The other thing that I've noticed is that the anywhere xpath expression // doesn't always work in DIH. You might have to be more specific. Cheers, Tricia On Sun, Sep 29, 2013 at 9:47 AM, Andreas Owen a...@conx.ch wrote: how dum can you get. obviously quite dum... i would have to analyze the html-pages with a nested instance like this: entity name=rec processor=XPathEntityProcessor url=file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml forEach=/docs/doc dataSource=main entity name=htm processor=XPathEntityProcessor url=${rec.urlParse} forEach=/xhtml:html dataSource=dataUrl field column=text xpath=//content / field column=h_2 xpath=//body / field column=text_nohtml xpath=//text / field column=h_1 xpath=//h:h1 / /entity /entity but i'm pretty sure the foreach is wrong and the xpath expressions. in the moment i getting the following error: Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast to java.io.Reader On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote: ok i see what your getting at but why doesn't the following work: field xpath=//h:h1 column=h_1 / field column=text xpath=/xhtml:html/xhtml:body / i removed the tiki-processor. what am i missing, i haven't found anything in the wiki? On 28. Sep 2013, at 12:28 AM, P Williams wrote: I spent some more time thinking about this. Do you really need to use the TikaEntityProcessor? It doesn't offer anything new to the document you are building that couldn't be accomplished by the XPathEntityProcessor alone from what I can tell. I also tried to get the Advanced Parsinghttp://wiki.apache.org/solr/TikaEntityProcessorexample to work without success. There are some obvious typos (document instead of /document) and an odd order to the pieces (dataSources is enclosed by document). It also looks like FieldStreamDataSource http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html is the one that is meant to work in this context. If Koji is still around maybe he could offer some help? Otherwise this bit of erroneous instruction should probably be removed from the wiki. 
Cheers, Tricia $ svn diff Index: solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java === --- solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java (revision 1526990) +++ solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java (working copy) @@ -99,13 +99,13 @@ runFullImport(getConfigHTML(identity)); assertQ(req(*:*), testsHTMLIdentity); } - + private String getConfigHTML(String htmlMapper) { return dataConfig + dataSource type='BinFileDataSource'/ + document + -entity name='Tika' format='xml' processor='TikaEntityProcessor' + +entity name='Tika' format='html' processor='TikaEntityProcessor' + url=' + getFile(dihextras/structured.html).getAbsolutePath() + ' + ((htmlMapper == null) ? : ( htmlMapper=' + htmlMapper + ')) + + field column='text'/ + @@ -114,4 +114,36 @@ /dataConfig; } + private String[] testsHTMLH1 = { + //*[@numFound='1'] + , //str[@name='h1'][contains(.,'H1 Header')] + }; + + @Test + public void testTikaHTMLMapperSubEntity() throws Exception { +runFullImport(getConfigSubEntity(identity)); +assertQ(req(*:*), testsHTMLH1); + } + + private String getConfigSubEntity(String htmlMapper) { +return +dataConfig + +dataSource type='BinFileDataSource' name='bin'/ + +dataSource type='FieldStreamDataSource' name='fld'/ + +document + +entity
Re: Auto Suggest - Time decay
I am using a totally separate core for storing the auto suggest keywords. Would you be able to send me some more details on your implementation? -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html Sent from the Solr - User mailing list archive at Nabble.com.
Autosuggest - Custom sorting
Is there a way to sort the returned Autosuggest list based on a particular value (ex: score)? I am trying to sort the returned suggestions based on a field that has been calculated manually but not sure how to use that field for sorting suggestions. -- View this message in context: http://lucene.472066.n3.nabble.com/Autosuggest-Custom-sorting-tp4092980.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to manually update a field in the index without re-crawling?
On 10/1/2013 9:03 AM, eShard wrote: I'm currently using Solr 4.0 FINAL. I indexed a website and it took over 24 hours to crawl. I just realized I need to rename one of the fields (or add a new one). so I added the new field to the schema, But how do I copy the data over from the old field to the new field without recrawling everything? Is this possible? I was thinking about maybe putting an update chain processor in the /update handler but I'm not sure that will work. If you meet all the caveats and limitations, then you can use the atomic update functionality to add the new field and delete the old field. For each document, you'll need the value of the uniqueKey and the value of the field that you want to essentially rename. http://wiki.apache.org/solr/Atomic_Updates If you have not configured your fields in the way described by the caveats and limitations section of that wiki page, then you will have to reindex. There is no way around that requirement. Final comment, unrelated to your question: 4.0 is ancient and buggy. You're going to need to upgrade before too long. Thanks, Shawn
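Assuming the uniqueKey is id and the field names are placeholders, an atomic update that copies a value into the new field and removes the old one looks roughly like this per document (all fields must be stored, per the caveats above):

  curl 'http://localhost:8983/solr/update?commit=true' \
       -H 'Content-type:application/json' -d '
  [{"id": "doc1",
    "new_field": {"set": "value read from old_field"},
    "old_field": {"set": null}}]'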
Re: Auto Suggest - Time decay
For that core just use a boost factor as explained in [1]. You could use a query like the one below to see (before making any change) how your suggestions will be retrieved; in this case a query for goog has been made, and recent documents will be boosted (an extra bonus will be given to the newer documents). http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}goog If this is enough for you, you could put the boost parameter in your request handler and make it even simpler, so any query against this particular request handler will be automatically boosted by date. PS: You could tweak the formula used in the boost parameter to better suit your needs. - Original Message - From: SolrLover bbar...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, 1 October 2013 12:19:51 Subject: Re: Auto Suggest - Time decay I am using a totally separate core for storing the auto suggest keywords. Would you be able to send me some more details on your implementation? -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html Sent from the Solr - User mailing list archive at Nabble.com.
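Putting that boost into the suggest core's request handler, as suggested, could look roughly like this (handler name, field names and constants are placeholders to be tuned):

  <requestHandler name="/suggest" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">suggestion</str>
      <str name="boost">recip(ms(NOW/HOUR,last_searched_dt),3.16e-11,1,1)</str>
    </lst>
  </requestHandler>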
Re: Auto Suggest - Time decay
Sorry, I forgot the link: [1] - http://wiki.apache.org/solr/SolrRelevancyFAQ - Mensaje original - De: Ing. Jorge Luis Betancourt Gonzalez jlbetanco...@uci.cu Para: solr-user@lucene.apache.org Enviados: Martes, 1 de Octubre 2013 13:34:03 Asunto: Re: Auto Suggest - Time decay For that core just use a boost factor as explained on [1]: You could use a query like this to see (before make any change) how your suggestions will be retrieved, in this case a query for goog has been made, and recent documents will be boosted (an extra bonus will be given for the newer documents). http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}goog If this is enough for you you could poot the boost parameter in your request handler and make it even simpler so any query againsta this particular request handler will be automatically boosted by date. PS: You could tweak the above formula used in the boost parameter for a more suitable to your needs. - Mensaje original - De: SolrLover bbar...@gmail.com Para: solr-user@lucene.apache.org Enviados: Martes, 1 de Octubre 2013 12:19:51 Asunto: Re: Auto Suggest - Time decay I am using a totally separate core for storing the auto suggest keywords. Would you be able to send me some more details on your implementation? -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html Sent from the Solr - User mailing list archive at Nabble.com. III Escuela Internacional de Invierno en la UCI del 17 al 28 de febrero del 2014. Ver www.uci.cu III Escuela Internacional de Invierno en la UCI del 17 al 28 de febrero del 2014. Ver www.uci.cu
Re: {soft}Commit and cache flushing
Thanks a lot Shawn for an exhaustive reply! Regards, Dmitry On Tue, Oct 1, 2013 at 5:37 PM, Shawn Heisey s...@elyograg.org wrote: On 10/1/2013 2:48 AM, Dmitry Kan wrote: This is a minor thing, perhaps, but thought to ask / share: if there are no modifications to an index and a softCommit or hardCommit issued, then solr flushes the cache. Any time you do a commit that opens a new Searcher object (openSearcher=true, which is required if you want index changes to be visible to people making queries), the caches are invalidated. This is because the layout of the index (and therefore the Lucene internal IDs) can completely change with *any* commit/merge, and there is no easy and reliable way to determine when the those numbers have NOT changed. If you have warming queries configured, those happen on the new searcher, populating the new cache. If you have cache autoWarming configured, then keys from the old caches are re-queried against the new index and used to populate the new cache. I do not understand deep Lucene internals, but what I've seen come through Jira activity and commits over the last year or two has been a strong move towards per-segment thinking instead of whole-index thinking. If this idea becomes applicable to all aspects of Lucene, then perhaps Solr caches can also become per-segment, and will not need to be completely invalidated except in the case of a major merge or forceMerge. Thanks, Shawn
Re: Doing time sensitive search in solr
Thanks Erick. When I did Solr in 2010 I thought they might have evolved by now to allow querying with a wildcard in the field name, but it looks like I have to provide a concrete dynamic field name to query. Anyway, I will look into the catch-all fields. Do you have any examples of how a catch-all field would help with this, or of how my doc would look and how I can query it? darniz -- View this message in context: http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4092989.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting dependent on user preferences with FunctionQuery
: select?q=*%3A*&sort=query(qf=category v='Book')desc : : but Solr returns Can't determine a Sort Order (asc or desc) in sort. the root cause of that error is that you don't have any whitespace between your query function and desc as for your broader goal: doing a straight sort on the user's prefs is probably not the best idea -- it's better to incorporate user preferences into boosting functions so you still retain the benefits of the relevancy score based on what the user searched for --- even if you know someone generally buys a lot of books, if they search for "the beatles white album" you probably don't want all the books that mention the white album, even just tangentially, to appear before the album itself. I did a talk last year on boosting & biasing that introduces a lot of the concepts to think about and the basics of how to approach problems like this in solr... https://people.apache.org/~hossman/ac2012eu/ http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630 -Hoss
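A sketch of folding such a preference into a multiplicative boost rather than a sort (the function and constant are illustrative only and worth testing on your own Solr version):

  defType=edismax&q=the beatles white album&boost=if(exists(query({!v='category:Book'})),1.4,1)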
Advice for using Solr 4.5 custom sharding to handle rolling time-oriented event data
I'm interested in using the new custom sharding features in the collections API to search a rolling window of event data. I'd appreciate a spot/sanity check of my plan/understanding. Say I only care about the last 7 days of events and I have thousands per second (billions per week). Am I correct that I could create a new shard for each hour, and send events that happen in that hour with an ID (uniqueKey) of `new_event_hour!event_id`, so that each hour block of events goes into one shard? I *always* query these events by the time at which they occurred, which is another TrieInt field that I index with every document. So at query time I would need to calculate the range the user cared about and send something like _route_=hour1&_route_=hour2 if I wanted to only query those two shards. (I *can* set multiple _route_ arguments in one query, right? And Solr will handle merging results like it would with any other cores?) Some scheduled task would drop and delete shards after they were more than 7 days old. Does all of that make sense? Do you see a smarter way to do large time-oriented search in SolrCloud? Thanks!
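A sketch of what the query side might look like (collection, shard and field names are invented, and whether repeating _route_ works in 4.5 should be verified; listing the shards explicitly is the fallback):

  /solr/events/select?q=*:*&fq=event_time:[1380585600 TO 1380592800]&_route_=2013-10-01T00!&_route_=2013-10-01T01!
  /solr/events/select?q=*:*&shards=events_2013100100,events_2013100101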
Re: Profiling Solr Lucene for query
Hi Dmitry, I'm trying to examine your suggestion to create a frontend node. It sounds pretty usefull. I saw that every node in solr cluster can serve request for any collection, even if it does not hold a core of that collection. because of that, I thought that adding a new node to the cluster (aka, the frontend/gateway server), and creating a dummy collection (with 1 dummy core), will solve the problem. But, I see that a request which sent to the gateway node, is not then sent to the shards. Instead, the request is proxyed to a (random) core of the requested collection, and from there it is sent to the shards. (It is reasonable, because the SolrCore on the gateway might run with different configuration, etc). This means that my new node isn't functioning as a frontend (which responsible for sorting, etc.), but as a poor load balancer. No performance improvement will come from this implementation. So, how do you suggest to implement a frontend? On the one hand, it has to run a core of the target collection, but on the other hand, we don't want it to hold any shard contents. On Fri, Sep 13, 2013 at 1:08 PM, Dmitry Kan solrexp...@gmail.com wrote: Manuel, Whether to have the front end solr as aggregator of shard results depends on your requirements. To repeat, we found merging from many shards very inefficient fo our use case. It can be the opposite for you (i.e. requires testing). There are some limitations with distributed search, see here: http://docs.lucidworks.com/display/solr/Distributed+Search+with+Index+Sharding On Wed, Sep 11, 2013 at 3:35 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Dmitry - currently we don't have such a front end, this sounds like a good idea creating it. And yes, we do query all 36 shards every query. Mikhail - I do think 1 minute is enough data, as during this exact minute I had a single query running (that took a qtime of 1 minute). I wanted to isolate these hard queries. I repeated this profiling few times. I think I will take the termInterval from 128 to 32 and check the results. I'm currently using NRTCachingDirectoryFactory On Mon, Sep 9, 2013 at 11:29 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi Manuel, The frontend solr instance is the one that does not have its own index and is doing merging of the results. Is this the case? If yes, are all 36 shards always queried? Dmitry On Mon, Sep 9, 2013 at 10:11 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Hi Dmitry, I have solr 4.3 and every query is distributed and merged back for ranking purpose. What do you mean by frontend solr? On Mon, Sep 9, 2013 at 2:12 PM, Dmitry Kan solrexp...@gmail.com wrote: are you querying your shards via a frontend solr? We have noticed, that querying becomes much faster if results merging can be avoided. Dmitry On Sun, Sep 8, 2013 at 6:56 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Hello all Looking on the 10% slowest queries, I get very bad performances (~60 sec per query). These queries have lots of conditions on my main field (more than a hundred), including phrase queries and rows=1000. I do return only id's though. I can quite firmly say that this bad performance is due to slow storage issue (that are beyond my control for now). Despite this I want to improve my performances. 
As tought in school, I started profiling these queries and the data of ~1 minute profile is located here: http://picpaste.com/pics/IMG_20130908_132441-ZyrfXeTY.1378637843.jpg Main observation: most of the time I do wait for readVInt, who's stacktrace (2 out of 2 thread dumps) is: catalina-exec-3870 - Thread t@6615 java.lang.Thread.State: RUNNABLE at org.apadhe.lucene.store.DataInput.readVInt(DataInput.java:108) at org.apaChe.lucene.codeosAockTreeIermsReade$FieldReader$SegmentTermsEnumFrame.loadBlock(BlockTreeTermsReader.java: 2357) at ora.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BlockTreeTermsReader.java:1745) at org.apadhe.lucene.index.TermContext.build(TermContext.java:95) at org.apache.lucene.search.PhraseQuery$PhraseWeight.init(PhraseQuery.java:221) at org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:326) at org.apache.lucene.search.BooleanQuery$BooleanWeight.init(BooleanQuery.java:183) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384) at org.apache.lucene.searth.BooleanQuery$BooleanWeight.init(BooleanQuery.java:183) at
Re: Profiling Solr Lucene for query
On 10/1/2013 2:35 PM, Isaac Hebsh wrote: Hi Dmitry, I'm trying to examine your suggestion to create a frontend node. It sounds pretty usefull. I saw that every node in solr cluster can serve request for any collection, even if it does not hold a core of that collection. because of that, I thought that adding a new node to the cluster (aka, the frontend/gateway server), and creating a dummy collection (with 1 dummy core), will solve the problem. But, I see that a request which sent to the gateway node, is not then sent to the shards. Instead, the request is proxyed to a (random) core of the requested collection, and from there it is sent to the shards. (It is reasonable, because the SolrCore on the gateway might run with different configuration, etc). This means that my new node isn't functioning as a frontend (which responsible for sorting, etc.), but as a poor load balancer. No performance improvement will come from this implementation. So, how do you suggest to implement a frontend? On the one hand, it has to run a core of the target collection, but on the other hand, we don't want it to hold any shard contents. With SolrCloud, every node is a frontend node. If you're running SolrCloud, then it doesn't make sense to try and use that concept. It only makes sense to create a frontend node (or core) if you are using traditional distributed search, where you need to include a shards parameter. http://wiki.apache.org/solr/DistributedSearch Thanks, Shawn
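For completeness, the non-SolrCloud arrangement Shawn refers to is just a plain core on the frontend box that fans the query out via an explicit shards parameter (hosts and core names are placeholders):

  http://frontend:8983/solr/aggregator/select?q=foo&shards=host1:8983/solr/core1,host2:8983/solr/core2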
Accent insensitive multi-words suggester
Hi, Up to now, the best solution I found for implementing a multi-word suggester was to use the ShingleFilterFactory filter at index time and the TermsComponent. At index time the analyzer was:

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true"/>
  </analyzer>

With the ASCIIFoldingFilter, it works fine if the user does not use accents in the query terms, and all suggestions come back without accents. Without the ASCIIFoldingFilter, it works fine if the user does not forget the accents in the query terms, and all suggestions come back with accents. Note: I use the StopFilter to avoid suggestions including stop words, and particularly suggestions starting or ending with stop words. What I need is a suggester where the user can type the query terms with or without accents and the suggestions are returned with their accents. For example, if the user types éco or eco, the suggester should return: école, école primaire, école publique, école privée, école primaire privée. I think it is impossible to achieve this with the TermsComponent and I should use the SpellCheckComponent instead. However, I don't see how to make the suggester accent-insensitive and still return the suggestions with accents. Has somebody already achieved that? Thank you. Dominique
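One workaround, offered only as an untested sketch: put each suggestion phrase in its own document in a dedicated suggest core, with one stored field keeping the accents for display and one accent-folded field for matching, then query the folded field and return the stored one instead of relying on TermsComponent (field and type names are invented):

  <field name="suggest_display" type="string" indexed="false" stored="true"/>
  <field name="suggest_folded" type="text_folded" indexed="true" stored="false"/>
  <copyField source="suggest_display" dest="suggest_folded"/>

where text_folded reuses the shingle chain above with ASCIIFoldingFilterFactory kept in both the index and query analyzers. A prefix query on the folded field then returns the accented original for display:

  /select?q=suggest_folded:eco*&fl=suggest_display&rows=10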
Re: Profiling Solr Lucene for query
Hi Shawn, I know that every node operates as a frontend. This is the way our cluster currently run. If I seperate the frontend from the nodes which hold the shards, I can let him different amount of CPUs as RAM. (e.g. large amount of RAM to JVM, because this server won't need the OS cache for reading the index, or more CPUs because the merging process might be more CPU intensive). Isn't it possible? On Wed, Oct 2, 2013 at 12:42 AM, Shawn Heisey s...@elyograg.org wrote: On 10/1/2013 2:35 PM, Isaac Hebsh wrote: Hi Dmitry, I'm trying to examine your suggestion to create a frontend node. It sounds pretty usefull. I saw that every node in solr cluster can serve request for any collection, even if it does not hold a core of that collection. because of that, I thought that adding a new node to the cluster (aka, the frontend/gateway server), and creating a dummy collection (with 1 dummy core), will solve the problem. But, I see that a request which sent to the gateway node, is not then sent to the shards. Instead, the request is proxyed to a (random) core of the requested collection, and from there it is sent to the shards. (It is reasonable, because the SolrCore on the gateway might run with different configuration, etc). This means that my new node isn't functioning as a frontend (which responsible for sorting, etc.), but as a poor load balancer. No performance improvement will come from this implementation. So, how do you suggest to implement a frontend? On the one hand, it has to run a core of the target collection, but on the other hand, we don't want it to hold any shard contents. With SolrCloud, every node is a frontend node. If you're running SolrCloud, then it doesn't make sense to try and use that concept. It only makes sense to create a frontend node (or core) if you are using traditional distributed search, where you need to include a shards parameter. http://wiki.apache.org/solr/**DistributedSearchhttp://wiki.apache.org/solr/DistributedSearch Thanks, Shawn
Re: Profiling Solr Lucene for query
On 10/1/2013 4:04 PM, Isaac Hebsh wrote: Hi Shawn, I know that every node operates as a frontend. This is the way our cluster currently runs. If I separate the frontend from the nodes which hold the shards, I can give it a different amount of CPUs and RAM (e.g. a large amount of RAM for the JVM, because this server won't need the OS cache for reading the index, or more CPUs because the merging process might be more CPU intensive). Isn't that possible? Not with SolrCloud. If you manage all your shards and replicas yourself and use manual distributed search, then you can do what you're trying to do. You lose a *LOT* of automation that SolrCloud handles for you if you follow this route, though. I can't find an existing feature request issue for doing this with SolrCloud. It's a good idea, just not possible currently. Thanks, Shawn
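For reference, a minimal sketch of what such a manual distributed-search setup could look like, assuming an empty aggregator core whose solrconfig.xml carries a default shards parameter so queries sent to it fan out to the data-holding cores; all host and core names below are placeholders:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- placeholder shard hosts; the aggregator core itself holds no data -->
    <str name="shards">host1:8983/solr/coll_shard1,host2:8983/solr/coll_shard2,host3:8983/solr/coll_shard3</str>
  </lst>
</requestHandler>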
Re: Newbie to Solr
Mamta, You are trying to do multiple things at once. Slow down before you drown. Use the default Solr distribution. That runs an embedded server. Do not switch to Tomcat. Do it on your personal machine if you need to (it's just unzip and run). Then, go through the Solr tutorial. That will answer some of the questions you are trying to ask here. Then, if you are still confused, maybe read one of the many books on Solr. I wrote one specifically for people with problems that sound exactly like yours (starting from the basics and going on a learning journey): http://www.packtpub.com/apache-solr-for-indexing-data/book . But there are many others. Then, once you understand what those files and handlers and things are, use Tomcat if you have to. There is an extra issue with the latest Solr and Tomcat due to logging jar requirements, so make sure to consult the wiki on that and not just some random old internet page. This will not take long. You just need to stop randomly poking in all possible directions and do it systematically. Good luck, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Oct 1, 2013 at 6:50 PM, Mamta Alshi mamta.al...@gmail.com wrote: I can have only one schema.xml file, right? Can I overwrite the one which originally comes with the Solr setup? The original schema.xml is at C:\solr\solr\solr\conf along with post.sh et al. Where should my other document be? I need to run post.jar on my doc file (xml) to index it, right? I unfortunately could not find any document which tells me how to run Solr queries through my Tomcat. Do you know of any link/books? Thank you! Kishan. Thanks, Mamta On Tue, Oct 1, 2013 at 3:30 PM, Kishan Parmar kishan@gmail.com wrote: You only have to create the schema file; don't change anything in the solrconfig file. If you are new to Solr, there is a core named collection1, and you have to put the schema file in that collection's conf folder: C:\solr\example\solr\collection1\conf. The xml file you want to index should be in the example docs folder under c:\solr\example; post.jar and post.sh are in that folder so that you can add your documents. Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! On Tue, Oct 1, 2013 at 4:19 AM, Mamta S Kanade mkan...@etisalat.ae wrote: Can you tell me what all docs I need to create... there needs to be a schema.xml and what else? A document having my data? Also, where should these be placed? There's already a schema.xml. Thanks for the prompt response. Mamta. -----Original Message----- From: Kishan Parmar [mailto:kishan@gmail.com] Sent: 01 October, 2013 03:16 PM To: solr-user@lucene.apache.org Subject: Re: Newbie to Solr Yes, you have to create your own schema, but in the schema file you have to add the field names from your xml file; you can add your own field names to it, or add your fields to the default schema file. Without a schema you cannot add your xml file to Solr. My schema is like this:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <fields>
    <field name="No" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="Name" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="Address" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="Mobile" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  </fields>
  <uniqueKey>No</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0" />
  </types>
</schema>

and my data file is like this:

<add>
  <doc>
    <field name="No">100120107088</field>
    <field name="Name">kishan</field>
    <field name="Address">ghatlodia</field>
    <field name="Mobile">9510077394</field>
  </doc>
</add>

Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! On Tue, Oct 1, 2013 at 1:11 AM, mamta mamta.al...@gmail.com wrote: Hi, I want to know that if i
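As a side note for anyone following this thread: with the stock Solr 4.x example, a data file like the one above is typically indexed from the example/exampledocs directory with the bundled post tool, and the result can then be checked from the admin UI or with a plain select query. The file name and core name below are only illustrative:

cd example/exampledocs
java -jar post.jar mydata.xml
http://localhost:8983/solr/collection1/select?q=*:*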
Problems with maxShardsPerNode in 4.5
It seems that changes in 4.5 collection configuration now require users to set maxShardsPerNode (or it defaults to 1). Maybe this was the case before, but with the new CREATESHARD API it seems very restrictive. I've just created a very simple test collection on 3 machines where I set maxShardsPerNode to 1 at collection creation time, and I made 3 shards. Everything is good. Now I want a 4th shard, but it seems impossible to create because the cluster knows I should only have 1 shard per node. Yet my problem doesn't require more hardware; I just want my new shard to exist on one of the existing servers. So I try again -- I create a collection with 3 shards and set maxShardsPerNode to 1000 (just as a silly test). Everything is good. Now I add shard4 and it immediately tries to add 1000 replicas of shard4... You can see my earlier email today about time-oriented data in 4.5 to see what I'm trying to do. I was hoping to have 1 shard per hour/day with the ability to easily add/drop them as I move the time window (say, a week of data, 1 shard per day). Am I missing something? Thanks!
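For concreteness, the call being described here is the Collections API CREATESHARD action, which in 4.5 applies to collections using the implicit router; the host, collection, and shard names below are placeholders only:

http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=test&shard=hour4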
Re: Problems with maxShardsPerNode in 4.5
Related, one more try: I created a collection starting with 4 shards on 1 box. I had to set maxShardsPerNode to 4 to do this. Now I want to roll over my time window, so to attempt to deal with the problems noted above I delete the oldest shard first. That works fine. Then I try to add my new shard, which works, but again it defaults to maxShardsPerNode # of replicas, so I'm left with:
* [deleted by me] hour0
* hour1 - 1 replica
* hour2 - 1 replica
* hour3 - 1 replica
* hour4 - 4 replicas [the one I created after deleting hour0]
Still at a loss as to how I would create 1 new shard with 1 replica on any server in 4.5? Thanks! On Tue, Oct 1, 2013 at 8:14 PM, Brett Hoerner br...@bretthoerner.com wrote: It seems that changes in 4.5 collection configuration now require users to set maxShardsPerNode (or it defaults to 1). Maybe this was the case before, but with the new CREATESHARD API it seems very restrictive. I've just created a very simple test collection on 3 machines where I set maxShardsPerNode to 1 at collection creation time, and I made 3 shards. Everything is good. Now I want a 4th shard, but it seems impossible to create because the cluster knows I should only have 1 shard per node. Yet my problem doesn't require more hardware; I just want my new shard to exist on one of the existing servers. So I try again -- I create a collection with 3 shards and set maxShardsPerNode to 1000 (just as a silly test). Everything is good. Now I add shard4 and it immediately tries to add 1000 replicas of shard4... You can see my earlier email today about time-oriented data in 4.5 to see what I'm trying to do. I was hoping to have 1 shard per hour/day with the ability to easily add/drop them as I move the time window (say, a week of data, 1 shard per day). Am I missing something? Thanks!
Re: Problems with maxShardsPerNode in 4.5
Thanks for reporting this, Brett. This is indeed a bug. A workaround is to specify replicationFactor=1 with the createShard command, which will create only one replica even if maxShardsPerNode=1000 at the collection level. I'll open an issue. On Wed, Oct 2, 2013 at 7:25 AM, Brett Hoerner br...@bretthoerner.com wrote: Related, one more try: I created a collection starting with 4 shards on 1 box. I had to set maxShardsPerNode to 4 to do this. Now I want to roll over my time window, so to attempt to deal with the problems noted above I delete the oldest shard first. That works fine. Then I try to add my new shard, which works, but again it defaults to maxShardsPerNode # of replicas, so I'm left with:
* [deleted by me] hour0
* hour1 - 1 replica
* hour2 - 1 replica
* hour3 - 1 replica
* hour4 - 4 replicas [the one I created after deleting hour0]
Still at a loss as to how I would create 1 new shard with 1 replica on any server in 4.5? Thanks! On Tue, Oct 1, 2013 at 8:14 PM, Brett Hoerner br...@bretthoerner.com wrote: It seems that changes in 4.5 collection configuration now require users to set maxShardsPerNode (or it defaults to 1). Maybe this was the case before, but with the new CREATESHARD API it seems very restrictive. I've just created a very simple test collection on 3 machines where I set maxShardsPerNode to 1 at collection creation time, and I made 3 shards. Everything is good. Now I want a 4th shard, but it seems impossible to create because the cluster knows I should only have 1 shard per node. Yet my problem doesn't require more hardware; I just want my new shard to exist on one of the existing servers. So I try again -- I create a collection with 3 shards and set maxShardsPerNode to 1000 (just as a silly test). Everything is good. Now I add shard4 and it immediately tries to add 1000 replicas of shard4... You can see my earlier email today about time-oriented data in 4.5 to see what I'm trying to do. I was hoping to have 1 shard per hour/day with the ability to easily add/drop them as I move the time window (say, a week of data, 1 shard per day). Am I missing something? Thanks! -- Regards, Shalin Shekhar Mangar.
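For anyone else hitting this before a fix lands, the workaround described above would look roughly like this; the host, collection, and shard names are placeholders only:

http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=test&shard=hour5&replicationFactor=1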