RE: SOLR - Unable to execute query error - DIH
Thanks James. We have tried the following options (individually), including the one you suggested:

1. selectMethod=cursor
2. batchSize=-1
3. responseBuffering=adaptive

But the indexing process doesn't seem to be improving at all. When we index a set of 500 rows it works well and completes in 18 minutes. For 1000K rows it took 22 hours. But when we try to index the complete set of 750K rows, it shows no progress and just keeps executing.

Currently both the SQL Server machine and the Solr machine are running on 4 GB RAM. With this configuration, is the behavior above expected? If we upgrade the RAM, which machine should it be: the Solr machine or the SQL Server machine? Are there any other, more efficient methods to import/index data from SQL Server into Solr?

Thanks!
--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051981.html
Sent from the Solr - User mailing list archive at Nabble.com.
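For anyone else tuning DIH against SQL Server: the options above go into the dataSource element of data-config.xml. A minimal sketch follows; the driver class, URL, credentials, table and field names are illustrative placeholders, not taken from this thread.

```xml
<!-- Hypothetical data-config.xml; all connection details are placeholders. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://dbhost;databaseName=mydb;selectMethod=cursor;responseBuffering=adaptive"
              user="solr" password="secret"
              batchSize="10000"/>
  <document>
    <entity name="item" query="SELECT id, title FROM item_table">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

Note that batchSize="-1" is, as far as I know, a MySQL-specific streaming hint (it maps to a fetch size of Integer.MIN_VALUE); for SQL Server, a moderate positive batchSize together with selectMethod=cursor is usually the combination to try.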
solrj sample code for solrcloud
Does anyone have solrj indexing and searching sample code? I could not find it on the internet. Thanks.
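Not official sample code, but here is a minimal SolrJ 4.x sketch of indexing and searching against SolrCloud. The ZooKeeper address, collection name, and field names are assumptions, and it needs a running cluster with the SolrJ jars on the classpath.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class SolrCloudExample {
    public static void main(String[] args) throws Exception {
        // Connect through ZooKeeper; "localhost:2181" is an assumption.
        CloudSolrServer server = new CloudSolrServer("localhost:2181");
        server.setDefaultCollection("collection1");

        // Index one document.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "hello solrcloud");
        server.add(doc);
        server.commit();

        // Search for it and print the matching ids.
        QueryResponse rsp = server.query(new SolrQuery("title:solrcloud"));
        for (SolrDocument d : rsp.getResults()) {
            System.out.println(d.getFieldValue("id"));
        }
        server.shutdown();
    }
}
```

(In later SolrJ versions CloudSolrServer was replaced by CloudSolrClient, so adjust the class name accordingly.)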
Re: Querying a transitive closure?
Why don't you index all ancestor classes with the document, as a multivalued field? Then you could get it in one hit. Am I missing something?

Upayavira

On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote:

Hi Otis, That's essentially the answer I was looking for: each shard (are we talking master + replicas?) has the plug-in custom query handler. I need to build it to find out. What I mean is that there is a taxonomy, say one with a single root for the sake of illustration, which grows all the classes, subclasses, and instances. If I have an object that is somewhere in that taxonomy, then it has a zigzag chain of parents up that tree (I've seen that called a transitive closure). If class B is way up that tree from M, there is no telling how many queries it will take to find it. Hmmm... recursive ascent, I suppose.

Many thanks
Jack

On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Jack, I don't fully understand the exact taxonomy structure and your needs, but in terms of reducing the number of HTTP round trips, you can do it by writing a custom SearchComponent that, upon getting the initial request, does everything locally, meaning that it talks to the local/specified shard before returning to the caller. In a SolrCloud setup with N shards, each of these N shards could be queried in such a way in parallel, running query/queries on their local shards.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Mar 27, 2013 at 3:11 PM, Jack Park jackp...@topicquests.org wrote:

Hi Otis, I fully expect to grow to SolrCloud -- many shards. For now, it's solo. But my thinking relates to cloud. I look for ways to reduce the number of HTTP round trips through SolrJ. Maybe you have some ideas?

Thanks
Jack

On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Jack, Is this really about HTTP and Solr vs. SolrCloud, or more whether Solr(Cloud) is the right tool for the job and, if so, how to structure the schema and queries to make such lookups efficient?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote:

This is a question about isA? We want to know if M isA B: isA?(M,B). For some M, one might be able to look into M to see its type, or which class(es) it is a subclass of. We're talking taxonomic queries now. But for some M, one might need to ripple up the transitive closure, looking at all the superclasses, etc., recursively. It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But how do you do that in a SolrCloud? Really curious...

Many thanks in advance for ideas.
Jack
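Upayavira's suggestion, precomputing the transitive closure at indexing time and storing it as a multivalued field, can be sketched in a few lines. The taxonomy contents and the field name ancestors_ss below are illustrative, not from the thread.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class Ancestors {
    // child -> direct parents; contents are illustrative
    static final Map<String, List<String>> TAXONOMY = new HashMap<>();
    static {
        TAXONOMY.put("M", Arrays.asList("C"));
        TAXONOMY.put("C", Arrays.asList("B"));
        TAXONOMY.put("B", Arrays.asList("A"));
    }

    // Walk every parent chain and collect the full set of ancestors.
    static Set<String> ancestors(String node) {
        Set<String> seen = new TreeSet<>();
        Deque<String> stack =
            new ArrayDeque<>(TAXONOMY.getOrDefault(node, Collections.<String>emptyList()));
        while (!stack.isEmpty()) {
            String parent = stack.pop();
            if (seen.add(parent)) {
                stack.addAll(TAXONOMY.getOrDefault(parent, Collections.<String>emptyList()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // At indexing time, store this in a multivalued field such as ancestors_ss;
        // isA?(M, B) then becomes the single filter query ancestors_ss:B
        System.out.println(ancestors("M")); // prints [A, B, C]
    }
}
```

With the field in place, the recursive ascent happens once at indexing time instead of at query time, and no HTTP round trips are needed to answer isA.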
Re: Setup solrcloud on tomcat
First of all, check your catalina.out log; it gives details about what is wrong.

Secondly, you can separate such JVM parameters from solr.xml and put them into a file setenv.sh (create it under the bin folder of Tomcat). Here is what it should contain:

    #!/bin/sh
    JAVA_OPTS="$JAVA_OPTS -Dbootstrap_confdir=/usr/share/solrhome/collection1/conf -Dcollection.configName=custom_conf -DnumShards=2 -DzkRun"
    export JAVA_OPTS

You should change /usr/share/solrhome to wherever your Solr home is. That should start up an embedded ZooKeeper. On the other hand, a client that will connect to the embedded ZooKeeper should have this setenv.sh:

    #!/bin/sh
    JAVA_OPTS="$JAVA_OPTS -DzkHost=**.**.***.**:2181"
    export JAVA_OPTS

I have masked the IP address; you should put in yours.

2013/3/28 하정대 jungdae...@ahnlab.com

Hi, all

I tried to set up SolrCloud on Tomcat, but I couldn't see the Cloud section in the Solr menu. I think the embedded ZooKeeper might not have been loaded. This is my solr.xml file that was supposed to run ZooKeeper:

    <solr persistent="true">
      <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="8080" hostContext="${hostContext:}" numShards="2" zkRun="http://localhost:9081" zkClientTimeout="${zkClientTimeout:15000}">
        <core name="collection1" instanceDir="collection1" />
      </cores>
    </solr>

What am I missing? I need your help. Also, an example file or tutorial would be a great help for me. I am working through the SolrCloud wiki.

Thanks, all.

"The safest name in the world - AhnLab"
Jungdae Ha, Senior Researcher / ASD Division
Tel: 031-722-8338
e-mail: jungdae...@ahnlab.com
http://www.ahnlab.com
673 Sampyeong-dong, Bundang-gu, Seongnam-si, Gyeonggi-do 463-400, Korea
How to update synonyms.txt without restart?
Dear all,

I am investigating how to update synonyms.txt. Some people say a core RELOAD will reload synonyms.txt, but the Solr wiki says:

```
Starting with Solr4.0, the RELOAD command is implemented in a way that results a live reloads of the SolrCore, reusing the existing various objects such as the SolrIndexWriter. As a result, some configuration options can not be changed and made active with a simple RELOAD...
```

http://wiki.apache.org/solr/CoreAdmin#RELOAD

And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved.

The problem is: how can I update synonyms.txt in a production environment? A workaround is restarting the Solr process, but that does not look good to me. Will someone tell me what the best practice for updating synonyms.txt is?

Thanks in advance.
Re: multicore vs multi collection
Does that mean I can create multiple collections with different configurations? Can you please outline the basic steps to create multiple collections? I am not able to create them on Solr 4.0.
--
View this message in context: http://lucene.472066.n3.nabble.com/multicore-vs-multi-collection-tp4051352p4052002.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Querying a transitive closure?
Exactly. You should usually design your schema to fit your queries, and if you need to retrieve all ancestors then you should index all ancestors so you can query for them easily. If that doesn't work for you, then either Solr is not the right tool for the job, or you need to rethink your schema.

The description of doing lookups within a tree structure doesn't sound at all like what you would use a text retrieval engine for, so you might want to rethink why you want to use Solr for this. But if that transitive closure is something you can calculate at indexing time, then the correct solution is the one Upayavira provided.

If you want people to be able to help you, you need to actually describe your problem (i.e. what is my data, and what are my queries) instead of diving into technical details like reducing HTTP round trips. My guess is that if you need to reduce HTTP round trips, you're probably doing it wrong.

HTH,
Jens

On 03/28/2013 08:15 AM, Upayavira wrote:

Why don't you index all ancestor classes with the document, as a multivalued field? Then you could get it in one hit. Am I missing something?

Upayavira

On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote:

Hi Otis, That's essentially the answer I was looking for: each shard (are we talking master + replicas?) has the plug-in custom query handler. I need to build it to find out. What I mean is that there is a taxonomy, say one with a single root for the sake of illustration, which grows all the classes, subclasses, and instances. If I have an object that is somewhere in that taxonomy, then it has a zigzag chain of parents up that tree (I've seen that called a transitive closure). If class B is way up that tree from M, there is no telling how many queries it will take to find it. Hmmm... recursive ascent, I suppose.

Many thanks
Jack

On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Jack, I don't fully understand the exact taxonomy structure and your needs, but in terms of reducing the number of HTTP round trips, you can do it by writing a custom SearchComponent that, upon getting the initial request, does everything locally, meaning that it talks to the local/specified shard before returning to the caller. In a SolrCloud setup with N shards, each of these N shards could be queried in such a way in parallel, running query/queries on their local shards.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Mar 27, 2013 at 3:11 PM, Jack Park jackp...@topicquests.org wrote:

Hi Otis, I fully expect to grow to SolrCloud -- many shards. For now, it's solo. But my thinking relates to cloud. I look for ways to reduce the number of HTTP round trips through SolrJ. Maybe you have some ideas?

Thanks
Jack

On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Jack, Is this really about HTTP and Solr vs. SolrCloud, or more whether Solr(Cloud) is the right tool for the job and, if so, how to structure the schema and queries to make such lookups efficient?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote:

This is a question about isA? We want to know if M isA B: isA?(M,B). For some M, one might be able to look into M to see its type, or which class(es) it is a subclass of. We're talking taxonomic queries now. But for some M, one might need to ripple up the transitive closure, looking at all the superclasses, etc., recursively. It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But how do you do that in a SolrCloud? Really curious...

Many thanks in advance for ideas.
Jack
Re: How to update synonyms.txt without restart?
You should be fine for synonym and other schema changes since they are unrelated to the IndexWriter. But... if you are using synonyms in your index analyzer, as opposed to in your query analyzer, then you need to do a full reindex anyway, which is best done by deleting the contents of the Solr data directory for the collection, restarting Solr, and resending all of the source documents.

-- Jack Krupansky

-----Original Message----- From: Kaneyama Genta Sent: Thursday, March 28, 2013 5:11 AM To: solr-user@lucene.apache.org Subject: How to update synonyms.txt without restart?

Dear all, I investigating how to update synonyms.txt. Some people says CORE RELOAD will reload synonyms.txt. But solr wiki says:

```
Starting with Solr4.0, the RELOAD command is implemented in a way that results a live reloads of the SolrCore, reusing the existing various objects such as the SolrIndexWriter. As a result, some configuration options can not be changed and made active with a simple RELOAD...
```

http://wiki.apache.org/solr/CoreAdmin#RELOAD

And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved. Problem is How can I update synonyms.txt in production environment? Workaround is restart Solr process. But it is not looks good for me. Will someone tell me what is the best practice of synonyms.txt updating? Thanks in advance.
Re: How to update synonyms.txt without restart?
https://issues.apache.org/jira/browse/SOLR-3587 (pointed to from SOLR-3592) indicates it is resolved. I just tried it on my local 4x branch checkout, using the analysis page (text_general analyzing "foo"): added a synonym, went to core admin, clicked reload, and saw the synonym appear afterwards.

Erik

On Mar 28, 2013, at 05:11, Kaneyama Genta wrote:

Dear all, I investigating how to update synonyms.txt. Some people says CORE RELOAD will reload synonyms.txt. But solr wiki says:

```
Starting with Solr4.0, the RELOAD command is implemented in a way that results a live reloads of the SolrCore, reusing the existing various objects such as the SolrIndexWriter. As a result, some configuration options can not be changed and made active with a simple RELOAD...
```

http://wiki.apache.org/solr/CoreAdmin#RELOAD

And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved. Problem is How can I update synonyms.txt in production environment? Workaround is restart Solr process. But it is not looks good for me. Will someone tell me what is the best practice of synonyms.txt updating? Thanks in advance.
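For those finding this thread later: the same reload can be triggered over HTTP instead of through the admin UI. A sketch, assuming the default port and core name:

```
# Edit synonyms.txt under the core's conf directory first, then:
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"
```

The index-analyzer caveat still applies: synonyms applied at index time require a full reindex, not just a reload.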
Re: Too many fields to Sort in Solr
Hi,

I tested this config on Solr 4.2 this morning and it worked:

    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" docValuesFormat="Disk" positionIncrementGap="0"/>
    <field name="MMDDhh" type="long" indexed="true" stored="true" required="true" docValues="true" multiValued="false" />

I also loaded data, ran a sort, and looked at the heap with jvisualvm, and the longs were not loaded into the JVM's heap. The sort was also very fast, although only on 600,000 records.

Possibly you are not on Solr 4.2? Can you post both your fieldType definition and your field definition?

Joel

On Thu, Mar 28, 2013 at 12:57 AM, adityab aditya_ba...@yahoo.com wrote:

Hi Joel, you are correct, the boost function populates the field cache. I was not aware of docValues, so while trying the example you provided I see this error when I define the field type:

    Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
        at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719)
        ... 13 more

My field definition:

    <fieldType name="dvLong" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/>

What am I missing here? thanks

-- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4051960.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Joel Bernstein Professional Services LucidWorks
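One configuration detail that may account for the difference between the two setups: as far as I recall, using a per-field docValuesFormat in schema.xml on Solr 4.x also requires a schema-aware codec factory in solrconfig.xml; without it, the default codec produces exactly the "codec does not support it" error quoted above.

```xml
<!-- In solrconfig.xml: lets the codec honor per-field docValuesFormat settings. -->
<codecFactory class="solr.SchemaCodecFactory"/>
```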
Re: Urgent:Solr cloud issue
Waiting for your assistance to get config entries for a 3-server SolrCloud setup. Thanks in advance.

Anuj

From: anuj vats <vats_a...@rediffmail.com> Sent: Fri, 22 Mar 2013 17:32:10 To: solr-user@lucene.apache.org Cc: mayank...@gmail.com Subject: Urgent:Solr cloud issue

Hi Shawan,

I have seen your post on Solr cloud master-master configuration on two servers. I have to use the same Solr structure, but for a long time I have not been able to configure it to communicate between two servers; on a single server it works fine. Can you please help me out with the required config changes so that Solr can communicate between two servers?

http://grokbase.com/t/lucene/solr-user/132pb1pe34/solrcloud-master-master

Regards
Anuj Vats
Re: multicore vs multi collection
Unable? In what way?

Did you look at the Solr example?
Did you look at solr.xml? Did you see the core element? (Needs to be one per core/collection.)
Did you see the multicore directory in the example? Did you look at the solr.xml file in multicore?
Did you see how there are separate directories for each collection/core in multicore?
Did you see how there is a core element in solr.xml in multicore, one for each collection directory (instance)?
Did you try setting up your own test directory parallel to multicore in example?
Did you read the README.txt files in the Solr example directories?
Did you see the command to start Solr with a specific Solr home directory?

    java -Dsolr.solr.home=multicore -jar start.jar

Did you try that for your own test solr home directory created above?

So... what exactly was the problem you were encountering? Be specific. My guess is that you simply need to re-read the README.txt files more carefully in the Solr example directories. If you have questions about what the README.txt files say, please ask them, but please be specific.

-- Jack Krupansky

-----Original Message----- From: hupadhyay Sent: Thursday, March 28, 2013 5:35 AM To: solr-user@lucene.apache.org Subject: Re: multicore vs multi collection

Does that means i can create multiple collections with different configurations ? can you please outline basic steps to create multiple collections, cause i am not able to create them on solr 4.0

-- View this message in context: http://lucene.472066.n3.nabble.com/multicore-vs-multi-collection-tp4051352p4052002.html Sent from the Solr - User mailing list archive at Nabble.com.
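To make the steps above concrete, here is a minimal multicore solr.xml along the lines of the one shipped in the example directory; the core names are placeholders, and each instanceDir must contain its own conf directory with schema.xml and solrconfig.xml.

```xml
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```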
How to make Solr complex Join Query
Hi,

I need to do complex joins in a single core with multiple tables: inner, outer, left, right, and so on. I am working with Solr 4. Can I work with any of these join types in Solr 4? Is there any way to do so? Please give your suggestions; it's very important. Please help me.

Thanks in advance.
Ashim

-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-male-Solr-complex-Join-Query-tp4052023.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to make Solr complex Join Query
Hi Ashim,

You are probably doing something the wrong way if you need such complex joins. Remember that Solr isn't a relational database. You should probably revisit your schema and flatten your data structure.

Regards,
Karol

On 28.03.2013 13:45, ashimbose wrote:

Hi, I need to do complex joins in a single core with multiple tables: inner, outer, left, right, and so on. I am working with Solr 4. Can I work with any of these join types in Solr 4? Is there any way to do so? Please give your suggestions; it's very important. Please help me. Thanks in advance. Ashim

-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-male-Solr-complex-Join-Query-tp4052023.html Sent from the Solr - User mailing list archive at Nabble.com.
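For completeness: Solr 4 does ship a join query parser, but it behaves only like a restricted inner join within a single index, returning fields from the "to" side only. The field names below are from the Solr example schema, so treat this as a sketch:

```
q={!join from=manu_id_s to=id}ipod
```

Outer, left, and right joins have no equivalent, which is why flattening the data at indexing time is usually the better answer.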
Re: Urgent:Solr cloud issue
Could you give more details on what's not working? Have you followed the instructions here: http://wiki.apache.org/solr/SolrCloud#Getting_Started

Are you using an embedded ZooKeeper or an external server? How many of them? Are you using numShards=1? 2? What do you see in the Solr UI, in the cloud section?

Tomás

On Thu, Mar 28, 2013 at 8:44 AM, anuj vats vats_a...@rediffmail.com wrote:

Waiting for your assistance to get config entries for a 3-server SolrCloud setup. Thanks in advance.

Anuj

From: anuj vats <vats_a...@rediffmail.com> Sent: Fri, 22 Mar 2013 17:32:10 To: solr-user@lucene.apache.org Cc: mayank...@gmail.com Subject: Urgent:Solr cloud issue

Hi Shawan, I have seen your post on Solr cloud master-master configuration on two servers. I have to use the same Solr structure, but for a long time I have not been able to configure it to communicate between two servers; on a single server it works fine. Can you please help me out with the required config changes so that Solr can communicate between two servers?

http://grokbase.com/t/lucene/solr-user/132pb1pe34/solrcloud-master-master

Regards
Anuj Vats
Re: Too many fields to Sort in Solr
Here is the field type definition, same as what you posted yesterday, just with a different name:

    <fieldType name="dvLong" class="solr.TrieLongField" precisionStep="0" docValuesFormat="Disk" positionIncrementGap="0"/>

And the field definition:

    <field name="lcontNumOfDownloads" type="dvLong" indexed="true" stored="true" default="0" docValues="true"/>

As soon as I restart the server I see the exception below in the log. After removing docValuesFormat="Disk" from the field type, I don't see this exception.

    01:49:37,177 ERROR [org.apache.solr.core.CoreContainer] (coreLoadExecutor-3-thread-1) Unable to create core: collection1: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_09]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_09]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_09]
        at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_09]
    Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
        at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13]
        ... 13 more
    01:49:37,202 ERROR [org.apache.solr.core.CoreContainer] (coreLoadExecutor-3-thread-1) null:org.apache.solr.common.SolrException: Unable to create core: collection1
        at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
    Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
        at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
        ... 10 more
    Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
        at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719)
        ... 13 more

-- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052036.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too many fields to Sort in Solr
OK, you'll need to re-index. Shutdown, delete the data, re-index. On Thu, Mar 28, 2013 at 9:12 AM, adityab aditya_ba...@yahoo.com wrote: Here is the field type definition. same as what you posted yesterday just a different name. fieldType name=dvLong class=solr.TrieLongField precisionStep=0 docValuesFormat=Disk positionIncrementGap=0/ And Field Definition field name=lcontNumOfDownloads type=dvLong indexed=true stored=true default=0 docValues=true/ as soon as i restart the server i see the exception in log. removing the *docValuesFormat=Disk* from the field type i don't see this exception. 01:49:37,177 ERROR [org.apache.solr.core.CoreContainer] (coreLoadExecutor-3-thread-1) Unable to create core: collection1: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.init(SolrCore.java:806) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.SolrCore.init(SolrCore.java:619) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09] at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_09] at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_09] at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_09] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_09] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_09] at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_09] Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] at org.apache.solr.core.SolrCore.init(SolrCore.java:719) [solr-core-4.2.0.jar:4.2.0 1453694 - rmuir - 2013-03-06 22:32:13] ... 13 more 01:49:37,202 ERROR [org.apache.solr.core.CoreContainer] (coreLoadExecutor-3-thread-1) null:org.apache.solr.common.SolrException: Unable to create core: collection1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support 
it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.init(SolrCore.java:806) at org.apache.solr.core.SolrCore.init(SolrCore.java:619) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 10 more Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854) at org.apache.solr.core.SolrCore.init(SolrCore.java:719) ... 13 more -- View
Re: [ANNOUNCE] Solr wiki editing change
On Mar 24, 2013, at 10:18 PM, Steve Rowe sar...@gmail.com wrote:

The wiki at http://wiki.apache.org/solr/ has come under attack by spammers more frequently of late, so the PMC has decided to lock it down in an attempt to reduce the work involved in tracking and removing spam. From now on, only people who appear on http://wiki.apache.org/solr/ContributorsGroup will be able to create/modify/delete wiki pages. Please request either on the solr-user@lucene.apache.org or on d...@lucene.apache.org to have your wiki username added to the ContributorsGroup page - this is a one-time step.

Please add my username, AndyLester, to the approved editors list. Thanks.

-- Andy Lester = a...@petdance.com = www.petdance.com = AIM:petdance
Re: [ANNOUNCE] Solr wiki editing change
On Mar 28, 2013, at 9:25 AM, Andy Lester a...@petdance.com wrote:

On Mar 24, 2013, at 10:18 PM, Steve Rowe sar...@gmail.com wrote: Please request either on the solr-user@lucene.apache.org or on d...@lucene.apache.org to have your wiki username added to the ContributorsGroup page - this is a one-time step.

Please add my username, AndyLester, to the approved editors list. Thanks.

Added to solr ContributorsGroup.
Is deltaQuery mandatory?
Is deltaQuery mandatory in data-config.xml? I did it like this:

    <entity name="residential"
            query="select * from tsunami.consumer_data_01 where state='MA' and rownum = 5000"
            deltaQuery="select LEMSMATCHCODE, STREETNAME from residential where last_modified '${dataimporter.last_index_time}'"/>

Then my manager came and said we don't need it; it is only for incremental imports. I took off the line that starts with deltaQuery. Now, in http://localhost:8983/solr/#/db/dataimport//dataimport, the entity is empty, and when I click the Execute button, nothing happens.

Thanks.
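For reference, deltaQuery is optional; it is only consulted by the delta-import command, so a full-import-only entity can drop it entirely. A sketch reusing the table from the message (whether this addresses the empty-entity symptom is not confirmed in the thread, and note that the archive appears to have stripped the XML angle brackets and possibly comparison operators from the original):

```xml
<!-- Minimal full-import-only entity; no deltaQuery required. -->
<document>
  <entity name="residential"
          query="select * from tsunami.consumer_data_01 where state='MA'"/>
</document>
```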
Re: Querying a transitive closure?
Thank you for this. I had thought about it but reasoned in a naive way: who would do such a thing? Doing so makes the query local: once the object has been retrieved, no further HTTP queries are required. Implementation perhaps entails one request to fetch the presumed parent in order to harvest its transitive closure. I need to think about that. Many thanks Jack On Thu, Mar 28, 2013 at 5:06 AM, Jens Grivolla j+...@grivolla.net wrote: Exactly, you should usually design your schema to fit your queries, and if you need to retrieve all ancestors then you should index all ancestors so you can query for them easily. If that doesn't work for you then either Solr is not the right tool for the job, or you need to rethink your schema. The description of doing lookups within a tree structure doesn't sound at all like what you would use a text retrieval engine for, so you might want to rethink why you want to use Solr for this. But if that transitive closure is something you can calculate at indexing time then the correct solution is the one Upayavira provided. If you want people to be able to help you you need to actually describe your problem (i.e. what is my data, and what are my queries) instead of diving into technical details like reducing HTTP roundtrips. My guess is that if you need to reduce HTTP roundtrips you're probably doing it wrong. HTH, Jens On 03/28/2013 08:15 AM, Upayavira wrote: Why don't you index all ancestor classes with the document, as a multivalued field, then you could get it in one hit. Am I missing something? Upayavira On Thu, Mar 28, 2013, at 01:59 AM, Jack Park wrote: Hi Otis, That's essentially the answer I was looking for: each shard (are we talking master + replicas?) has the plug-in custom query handler. I need to build it to find out. What I mean is that there is a taxonomy, say one with a single root for sake of illustration, which grows all the classes, subclasses, and instances. 
If I have an object that is somewhere in that taxonomy, then it has a zigzag chain of parents up that tree (I've seen that called a transitive closure). If class B is way up that tree from M, no telling how many queries it will take to find it. Hmmm... recursive ascent, I suppose. Many thanks Jack On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Jack, I don't fully understand the exact taxonomy structure and your needs, but in terms of reducing the number of HTTP round trips, you can do it by writing a custom SearchComponent that, upon getting the initial request, does everything locally, meaning that it talks to the local/specified shard before returning to the caller. In a SolrCloud setup with N shards, each of these N shards could be queried in such a way in parallel, running query/queries on their local shards. Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 3:11 PM, Jack Park jackp...@topicquests.org wrote: Hi Otis, I fully expect to grow to SolrCloud -- many shards. For now, it's solo. But, my thinking relates to cloud. I look for ways to reduce the number of HTTP round trips through SolrJ. Maybe you have some ideas? Thanks Jack On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Jack, Is this really about HTTP and Solr vs. SolrCloud, or more whether Solr(Cloud) is the right tool for the job and, if so, how to structure the schema and queries to make such lookups efficient? Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote: This is a question about isA: we want to know if M isA B, i.e., isA?(M,B). For some M, one might be able to look into M to see its type or which class(es) for which it is a subClass. We're talking taxonomic queries now. But, for some M, one might need to ripple up the transitive closure, looking at all the super classes, etc, recursively.
It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But, how do you do that in a SolrCloud? Really curious... Many thanks in advance for ideas. Jack
RE: Is deltaQuery mandatory ?
No, it's not mandatory. You can't do delta imports without delta queries, though; you'd need to do a full-import. Per your query, you'd only ever do objects with rownum <= 5000. -Original Message- From: A. Lotfi [mailto:majidna...@yahoo.com] Sent: Thursday, March 28, 2013 10:07 AM To: gene...@lucene.apache.org; solr-user@lucene.apache.org Subject: Is deltaQuery mandatory ? Is deltaQuery mandatory in data-config.xml? I did it like this: <entity name="residential" query="select * from tsunami.consumer_data_01 where state='MA' and rownum <= 5000" deltaQuery="select LEMSMATCHCODE, STREETNAME from residential where last_modified > '${dataimporter.last_index_time}'"> Then my manager came and said we don't need it, that this is only for incremental imports. I took off the line that starts with deltaQuery. Now, at http://localhost:8983/solr/#/db/dataimport//dataimport, the entity is empty, and when I click the Execute button nothing happens. Thanks.
RE: SOLR - Unable to execute query error - DIH
You may want to run your jdbc driver in trace mode just to see if it is picking up these different options. I know from experience that the selectMethod parameter can sometimes be important to prevent SQLServer drivers from caching the entire resultset in memory. But something seems very wrong here and maybe driver tuning is really not what you need. 18 minutes to index 500 documents is extreme. Unless the documents were huge or you were doing something very unusual, I'd expect this to happen in seconds (1 second?). Are you indexing on a Raspberry Pi? Possibly you have a cartesian join somewhere in your sql, or some other little mistake? If you post your entire data-config.xml, possibly someone will see the error. Or, could you be extremely memory constrained because of bad JVM heap choices? Do your logs show the jvm constantly in GC cycles? Just a little note: batchSize goes on the <dataSource /> tag, not on <document />. I really don't think tweaking batchSize is going to fix this, though. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: kobe.free.wo...@gmail.com [mailto:kobe.free.wo...@gmail.com] Sent: Thursday, March 28, 2013 1:43 AM To: solr-user@lucene.apache.org Subject: RE: SOLR - Unable to execute query error - DIH Thanks James. We have tried the following options *(individually)* including the one you suggested: 1. selectMethod=cursor 2. batchSize=-1 3. responseBuffering=adaptive But the indexing process doesn't seem to be improving at all. When we try to index a set of 500 rows it works well and gets completed in 18 min. For 1000K rows it took 22 hours (long) for indexing. But, when we try to index the complete set of 750K rows it doesn't show any progress and keeps on executing. Currently both the SQL Server as well as the SOLR machine are running on 4 GB RAM. With this configuration, does the above scenario stand justified? If we think of upgrading the RAM, which machine should that be, the SOLR machine or the SQL Server machine?
Are there any other efficient methods to import/index data from SQL Server to SOLR? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051981.html Sent from the Solr - User mailing list archive at Nabble.com.
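[Editor's note: as James points out above, batchSize belongs on the dataSource tag. A sketch of that placement in data-config.xml for a SQL Server source; the driver class, URL, and credentials below are illustrative placeholders, not values taken from this thread:]

```xml
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://dbhost;databaseName=mydb"
            user="user" password="pass"
            batchSize="-1" selectMethod="cursor" responseBuffering="adaptive"/>
```

With the attribute here rather than on document, the driver settings are actually passed through to the JDBC connection.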
RE: Is deltaQuery mandatory ?
You do not need deltaQuery unless you're doing delta (incremental) updates. To configure a full import, try starting with this example: http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: A. Lotfi [mailto:majidna...@yahoo.com] Sent: Thursday, March 28, 2013 9:07 AM To: gene...@lucene.apache.org; solr-user@lucene.apache.org Subject: Is deltaQuery mandatory ? Is deltaQuery mandatory in data-config.xml? I did it like this: <entity name="residential" query="select * from tsunami.consumer_data_01 where state='MA' and rownum <= 5000" deltaQuery="select LEMSMATCHCODE, STREETNAME from residential where last_modified > '${dataimporter.last_index_time}'"> Then my manager came and said we don't need it, that this is only for incremental imports. I took off the line that starts with deltaQuery. Now, at http://localhost:8983/solr/#/db/dataimport//dataimport, the entity is empty, and when I click the Execute button nothing happens. Thanks.
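[Editor's note: following the "shorter data-config" example linked above, a minimal full-import-only config needs no deltaQuery at all. A sketch, where the JDBC driver, URL, and credentials are placeholder assumptions and the query is the one from the original message (note the escaped <= in the XML attribute):]

```xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/XE"
              user="user" password="pass"/>
  <document>
    <entity name="residential"
            query="select * from tsunami.consumer_data_01 where state='MA' and rownum &lt;= 5000"/>
  </document>
</dataConfig>
```

With this in place, the dataimport screen should list the entity and full-import should run.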
RE: SOLR - Unable to execute query error - DIH
What version of Solr4 are you running? We are on 3.6.2 so I can't be confident whether these settings still exist (they probably do...), but here is what we do to speed up full-indexing: In solrconfig.xml, increase your ramBufferSizeMB to 128. Increase mergeFactor to 20. Make sure autoCommit is disabled. Basically, you want to minimize how often Lucene/Solr flushes (as that is very time consuming). Merging is also very time consuming, so you want large segments and fewer merges (hence the merge factor increase). We use these settings when we are doing our initial full-indexing and then switch them over to saner defaults to do our regular/delta indexing. Roll-backs concern me; why did your query roll back? Did it give an error -- it should have. It should be in your solr log file. Was it because the connection timed out? It's important to find out. We prevented roll-backs by effectively splitting our data across entities and then indexing one entity at a time. This allowed us to make sure that if one sector failed, it didn't impact the entire process. (This can be done by using autoCommit, but that slows down indexing.) If you're getting OOM errors, be sure that your Xmx value is set high enough (and that you have enough memory). You may be able to increase ramBufferSizeMB depending on how much memory you have (we didn't have much). Hope this helps. Swati -Original Message- From: kobe.free.wo...@gmail.com [mailto:kobe.free.wo...@gmail.com] Sent: Thursday, March 28, 2013 2:43 AM To: solr-user@lucene.apache.org Subject: RE: SOLR - Unable to execute query error - DIH Thanks James. We have tried the following options *(individually)* including the one you suggested: 1. selectMethod=cursor 2. batchSize=-1 3. responseBuffering=adaptive But the indexing process doesn't seem to be improving at all. When we try to index a set of 500 rows it works well and gets completed in 18 min. For 1000K rows it took 22 hours (long) for indexing.
But, when we try to index the complete set of 750K rows it doesn't show any progress and keeps on executing. Currently both the SQL Server as well as the SOLR machine are running on 4 GB RAM. With this configuration, does the above scenario stand justified? If we think of upgrading the RAM, which machine should that be, the SOLR machine or the SQL Server machine? Are there any other efficient methods to import/index data from SQL Server to SOLR? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Unable-to-execute-query-error-DIH-tp4051028p4051981.html Sent from the Solr - User mailing list archive at Nabble.com.
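[Editor's note: the solrconfig.xml knobs Swati describes above would look roughly as below. This is a hedged sketch: on 3.x these elements lived under indexDefaults/mainIndex, on 4.x under indexConfig, and the values are the ones from her message, not universal recommendations:]

```xml
<indexConfig>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>20</mergeFactor>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- autoCommit left out/commented during bulk full-indexing;
       restore it for regular delta indexing -->
</updateHandler>
```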
Re: Solr 4.2 - Slave Index version is higher than Master
Yes. The only thing is, on the master a delta import runs every half hour, but as there has been no data change in the last 24 hours I think the index version still remains the same. Another thing I notice is that after a full import the index Gen is bumped directly higher than the slave's. Can that mean the master is not increasing Version and Gen with delta-import correctly? See below. *Before Full Import* Master: 1364331607690 154 88.28 KB Slave: 1364395321127 241 98.75 KB *After Full Import* Master: 1364395566324 242 88.28 KB Slave: 1364395321127 241 98.75 KB On Tue, Mar 26, 2013 at 1:05 PM, Mark Miller-3 [via Lucene] ml-node+s472066n4051477...@n3.nabble.com wrote: That's pretty interesting. The slave should have no way of doing this without a commit… - Mark On Mar 26, 2013, at 11:07 AM, Uomesh [hidden email] http://user/SendEmail.jtp?type=nodenode=4051477i=0 wrote: Hi Mark, Further details: My master details have not changed in the last 24 hours but the slave index version and Gen have increased. If I do the full import the slave is replicated and Version and Gen are reset. Version Gen Size Master: 1364238678758 111 768.23 KB Slave: 1364299206396 155 768.02 KB On Fri, Mar 22, 2013 at 3:32 PM, Mark Miller-3 [via Lucene] [hidden email] http://user/SendEmail.jtp?type=nodenode=4051477i=1 wrote: That was to you, Phil. So it seems this is a problem with the configuration replication case, I would guess - I didn't really look at that path in the 4.2 fixes I worked on. I did add it to the new testing I'm doing since I've suspected it (it will prompt a core reload that doesn't happen when configs don't replicate). I'll see what I can do to try and get a test to catch it. - mark On Mar 22, 2013, at 1:49 PM, Mark Miller [hidden email] http://user/SendEmail.jtp?type=nodenode=4050577i=0 wrote: And you're also on 4.2? - Mark On Mar 22, 2013, at 12:41 PM, Uomesh [hidden email] http://user/SendEmail.jtp?type=nodenode=4050577i=1 wrote: Also, I am replicating only on commit and startup.
Thanks, Umesh On Fri, Mar 22, 2013 at 11:23 AM, Umesh Sharma [hidden email] http://user/SendEmail.jtp?type=nodenode=4050577i=2 wrote: Hi Mark, I am replicating the below config files but not solrconfig.xml. confFiles: schema.xml, elevate.xml, stopwords.txt, mapping-FoldToASCII.txt, mapping-ISOLatin1Accent.txt, protwords.txt, spellings.txt, synonyms.txt. Also, strangely, I am seeing a big Gen difference between master and slave. My master Gen is 2 while the slave's is 56. If I do the full import then the Gen gets higher than the slave's and it replicates. I have more than 30 cores on my solr instance and all are scheduled to replicate at the same time. Index Version Gen Size Master: 1363903243590 2 94 bytes Slave: 1363967579193 56 94 bytes Thanks, Umesh On Fri, Mar 22, 2013 at 10:42 AM, Mark Miller-3 [via Lucene] [hidden email] http://user/SendEmail.jtp?type=nodenode=4050577i=3 wrote: Are you replicating configuration files as well? - Mark On Mar 22, 2013, at 6:38 AM, John, Phil (CSS) [hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=0 wrote: To add to the discussion: we're running classic master/slave replication (not solrcloud) with 1 master and 2 slaves, and I noticed the slave having a higher version number than the master the other day as well. In our case, knock on wood, it hasn't stopped replication. If you'd like a copy of our config I can provide it off-list. Regards, Phil. From: Mark Miller [mailto:[hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=1] Sent: Fri 22/03/2013 06:32 To: [hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=2 Subject: Re: Solr 4.2 - Slave Index version is higher than Master The other odd thing here is that this should not stop replication at all. When the slave is ahead, it will still have its index replaced.
- Mark On Mar 22, 2013, at 1:26 AM, Mark Miller [hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=3 wrote: I'm working on testing to try and catch what you are seeing here: https://issues.apache.org/jira/browse/SOLR-4629 - Mark On Mar 22, 2013, at 12:23 AM, Mark Miller [hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=4 wrote: Let me know if there is anything else you can add. A test with your setup that indexes n docs randomly, commits, randomly updates a conf file or not, and then replicates and repeats x times does not seem to fail, even with very high values for n and x. On every replication, the versions are compared. Is there anything else you are putting into this mix? - Mark On Mar 21, 2013, at 11:28 PM, Uomesh [hidden email] http://user/SendEmail.jtp?type=nodenode=4050075i=5 wrote:
Re: Too many fields to Sort in Solr
Still no luck. Steps performed: 1. Stop the application server (JBoss) 2. Delete everything under data 3. Start the server 4. Observe exception in log (I have uploaded the file). On a side note, do I need to have any additional jar files in the solr home lib folder? Currently it's empty. docValueException.log http://lucene.472066.n3.nabble.com/file/n4052070/docValueException.log -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052070.html Sent from the Solr - User mailing list archive at Nabble.com.
Batch Search Query
Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
Re: Batch Search Query
Hi Mike, Interesting problem - here's some pointers on where to get started. For finding similar segments, check out Solr's More Like This support - it's built in to the query request processing so you just need to enable it with query params. There's nothing built in for doing batch queries from the client side. You might look into implementing a custom search component and register it as a first-component in your search handler (take a look at solrconfig.xml for how search handlers are configured, e.g. /browse). Cheers, Tim On Thu, Mar 28, 2013 at 9:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
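[Editor's note: pending a custom search component, one client-side stopgap is to fold many segment lookups into far fewer HTTP requests by OR-ing groups of segments into a single boolean query and attributing hits back to their segments afterwards. A rough, Solr-agnostic sketch of the batching logic; the chunk size and phrase quoting are illustrative assumptions, not from this thread:]

```python
def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_batch_queries(segments, per_request=50):
    """Fold many segment lookups into one OR'ed phrase query per request."""
    queries = []
    for group in chunks(segments, per_request):
        # Quote each segment as a phrase; escape any embedded quotes.
        clauses = ['"%s"' % s.replace('"', '\\"') for s in group]
        queries.append(" OR ".join(clauses))
    return queries

segments = ["segment %d" % i for i in range(5000)]
print(len(build_batch_queries(segments)))  # 100 requests instead of 5000
```

Each response then needs a second pass to decide which of the grouped segments a given hit actually matched; highlighting or per-clause re-checking can serve for that.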
Re: [ANNOUNCE] Solr wiki editing change
Please add OussamaJilal to the group. Thank you. 2013/3/28 Steve Rowe sar...@gmail.com On Mar 28, 2013, at 9:25 AM, Andy Lester a...@petdance.com wrote: On Mar 24, 2013, at 10:18 PM, Steve Rowe sar...@gmail.com wrote: Please request either on the solr-user@lucene.apache.org or on d...@lucene.apache.org to have your wiki username added to the ContributorsGroup page - this is a one-time step. Please add my username, AndyLester, to the approved editors list. Thanks. Added to solr ContributorsGroup.
Re: Too many fields to Sort in Solr
Update --- I was able to fix the exception by adding the following line in solrconfig.xml: <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" /> Not sure if it's mentioned in any document to have this declared in the config file. I am now re-indexing the data on the master and will perform tests to see if it works as expected. Thanks for your support. Aditya -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052091.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [ANNOUNCE] Solr wiki editing change
On Mar 28, 2013, at 11:57 AM, Jilal Oussama jilal.ouss...@gmail.com wrote: Please add OussamaJilal to the group. Added to solr ContributorsGroup.
Re: Solr and OpenPipe
git clone https://github.com/kolstae/openpipe cd openpipe mvn install regards -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-OpenPipe-tp484777p4052079.html Sent from the Solr - User mailing list archive at Nabble.com.
bootstrap_conf without restarting
I'm doing fairly frequent changes to my data-config.xml files on some of my cores in a SolrCloud setup. Is there any way to get these files active and up to Zookeeper without restarting the instance? I've noticed that if I just launch another instance of solr with the bootstrap_conf flag set to true, it uploads the new settings, but it dies because there's already a solr instance running on that port. It also seems to make the original one unresponsive, or at least down in Zookeeper's eyes. I then just restart that instance and everything is back up. It'd be nice if I could bootstrap without actually starting solr. What's the best practice for deploying changes to data-config.xml? Thanks, Jim -- View this message in context: http://lucene.472066.n3.nabble.com/bootstrap-conf-without-restarting-tp4052092.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Batch Search Query
Apologies if you already do something similar, but perhaps of general interest... One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you create/search for every document http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf Then you simply end up indexing each document, and upon submission: computing fingerprints and querying for them. I don't know (ie. remember) exact numbers, but my feeling is that you end up storing ~13% of document text (besides, it is a one token fingerprint, therefore quite fast to search for - you could even try one huge boolean query with 1024 clauses, ouch... :)) roman On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? 
If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
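[Editor's note: the winnowing scheme from the Schleimer/Wilkerson/Aiken paper Roman links can be sketched in a few lines: hash every k-gram of the text, then keep the minimum hash of each sliding window of w consecutive k-gram hashes. Any identical run of at least w + k - 1 characters in two texts is then guaranteed to yield at least one shared fingerprint. The k and w values below are arbitrary illustrations, not the paper's tuned parameters:]

```python
def winnow(text, k=5, w=4):
    """Return the winnowing fingerprint set of `text`:
    the minimum hash of each window of w consecutive k-gram hashes."""
    grams = [text[i:i + k] for i in range(len(text) - k + 1)]
    hashes = [hash(g) for g in grams]  # a rolling hash would be faster
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}

a = winnow("the quick brown fox jumps over the lazy dog")
b = winnow("xx the quick brown fox jumps yy")
print(bool(a & b))  # prints True: a run of 25+ shared chars guarantees overlap
```

In the segment-matching scenario above, one would index each fingerprint as a term and query for a new segment's fingerprints, which is what keeps the index at a small fraction of the raw text.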
Re: Solr and OpenPipe
Nice! I see we're having fun. On 28 Mar 2013 at 17:11, Fabio Curti fabio.cu...@gmail.com wrote: git clone https://github.com/kolstae/openpipe cd openpipe mvn install regards -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-OpenPipe-tp484777p4052079.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Batch Search Query
Thanks for your reply, Roman. Unfortunately, the business has been running this way forever so I don't think it would be feasible to switch to a whole document store versus segments store. Even then, if I understand you correctly it would not work for our needs. I'm thinking because we don't care about any other parts of the document, just the segment. If a similar segment is in an entirely different document, we want that segment. I'll keep taking any and all feedback however so that I can develop an idea and present it to my manager. On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla roman.ch...@gmail.com wrote: Apologies if you already do something similar, but perhaps of general interest... One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you create/search for every document http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf Then you simply end up indexing each document, and upon submission: computing fingerprints and querying for them. I don't know (ie. remember) exact numbers, but my feeling is that you end up storing ~13% of document text (besides, it is a one token fingerprint, therefore quite fast to search for - you could even try one huge boolean query with 1024 clauses, ouch... :)) roman On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. 
Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
Re: Batch Search Query
This might not be a good match for Solr, or for many other systems. It does seem like a natural fit for MarkLogic. That natively searches and selects over XML documents. Disclaimer: I worked at MarkLogic for a couple of years. wunder On Mar 28, 2013, at 9:27 AM, Mike Haas wrote: Thanks for your reply, Roman. Unfortunately, the business has been running this way forever so I don't think it would be feasible to switch to a whole document store versus segments store. Even then, if I understand you correctly it would not work for our needs. I'm thinking because we don't care about any other parts of the document, just the segment. If a similar segment is in an entirely different document, we want that segment. I'll keep taking any and all feedback however so that I can develop an idea and present it to my manager. On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla roman.ch...@gmail.com wrote: Apologies if you already do something similar, but perhaps of general interest... One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you create/search for every document http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf Then you simply end up indexing each document, and upon submission: computing fingerprints and querying for them. I don't know (ie. remember) exact numbers, but my feeling is that you end up storing ~13% of document text (besides, it is a one token fingerprint, therefore quite fast to search for - you could even try one huge boolean query with 1024 clauses, ouch... :)) roman On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. 
These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike -- Walter Underwood wun...@wunderwood.org
Re: Batch Search Query
Thanks Timothy, In regards to you mentioning using MoreLikeThis, do you know what kind of algorithm it uses? My searching didn't reveal anything. On Thu, Mar 28, 2013 at 10:51 AM, Timothy Potter thelabd...@gmail.comwrote: Hi Mike, Interesting problem - here's some pointers on where to get started. For finding similar segments, check out Solr's More Like This support - it's built in to the query request processing so you just need to enable it with query params. There's nothing built in for doing batch queries from the client side. You might look into implementing a custom search component and register it as a first-component in your search handler (take a look at solrconfig.xml for how search handlers are configured, e.g. /browse). Cheers, Tim On Thu, Mar 28, 2013 at 9:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? 
If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
Re: Batch Search Query
On Thu, Mar 28, 2013 at 12:27 PM, Mike Haas mikehaas...@gmail.com wrote: Thanks for your reply, Roman. Unfortunately, the business has been running this way forever so I don't think it would be feasible to switch to a whole sure, no arguing against that :) document store versus segments store. Even then, if I understand you correctly it would not work for our needs. I'm thinking because we don't care about any other parts of the document, just the segment. If a similar segment is in an entirely different document, we want that segment. the algo should work for this case - the beauty of the local winnowing is that it is *local*, ie it tends to select the same segments from the text (ie. you process two documents, written by two different people - but if they cited the same thing, and it is longer than 'm' tokens, you will have at least one identical fingerprints from both documents - which means: match!) then of course, you can store the position offset of the original words of the fingerprint and retrieve the original, compute ratio of overlap etc... but a database seems to be better suited for these kind of jobs... let us know what you adopt! ps: MoreLikeThis selects 'significant' tokens from the document you selected and then constructs a new boolean query searching for those. http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ I'll keep taking any and all feedback however so that I can develop an idea and present it to my manager. On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla roman.ch...@gmail.com wrote: Apologies if you already do something similar, but perhaps of general interest... 
One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you create/search for every document http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf Then you simply end up indexing each document, and upon submission: computing fingerprints and querying for them. I don't know (ie. remember) exact numbers, but my feeling is that you end up storing ~13% of document text (besides, it is a one token fingerprint, therefore quite fast to search for - you could even try one huge boolean query with 1024 clauses, ouch... :)) roman On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? 
Thanks, Mike
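To make Roman's fingerprint suggestion concrete, here is a hedged Python sketch of local winnowing in the spirit of the Stanford paper he links; the k-gram size `k` and window size `w` are made-up parameters, and Python's built-in `hash()` stands in for a real rolling hash:

```python
# Sketch of local winnowing (Schleimer/Wilkerson/Aiken, SIGMOD'03).
# Parameters k and w are illustrative, not from the thread.
def kgram_hashes(text, k):
    """Hash every k-gram of the crudely normalized text."""
    text = "".join(text.lower().split())  # strip case and whitespace
    return [hash(text[i:i + k]) for i in range(len(text) - k + 1)]

def winnow(text, k=5, w=4):
    """Keep the minimum hash of each sliding window as a fingerprint."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    fingerprints = set()
    for i in range(max(len(hashes) - w + 1, 1)):
        fingerprints.add(min(hashes[i:i + w]))
    return fingerprints

# Two texts sharing a run of at least w + k - 1 characters are guaranteed
# at least one common fingerprint, so one fingerprint query can stand in
# for many per-segment queries.
a = winnow("the quick brown fox jumps over the lazy dog")
b = winnow("he said the quick brown fox jumps over the lazy dog loudly")
shared = a & b  # non-empty: both contain the same long phrase
```

This is what makes the approach cheap: instead of 5000 similarity queries per document, you index fingerprints and query only for fingerprint matches.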
multiple SolrCloud clusters with one ZooKeeper ensemble?
Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters, or would each SolrCloud cluster require its own ZooKeeper ensemble? Bill
Re: multiple SolrCloud clusters with one ZooKeeper ensemble?
: Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or : would each SolrCloud cluster requires its own ZooKeeper ensemble? https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot (I'm going to FAQ this) -Hoss
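The linked wiki section boils down to giving each cluster its own chroot path in the zkHost string, so one ensemble can keep each cluster's state under a separate ZooKeeper node. A hedged sketch (ensemble hosts and chroot names are invented):

```shell
# One ZooKeeper ensemble, two SolrCloud clusters separated by chroot paths.
# Cluster A:
java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr-cluster-a -jar start.jar
# Cluster B:
java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr-cluster-b -jar start.jar
```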
Re: Too many fields to Sort in Solr
I didn't have to do anything with the codecs to make it work. I checked my solrconfig.xml and the codecFactory element is not present. I'm running the out-of-the-box Jetty setup. On Thu, Mar 28, 2013 at 11:58 AM, adityab aditya_ba...@yahoo.com wrote: Update --- I was able to fix the exception by adding the following line in solrconfig.xml: <codecFactory name="CodecFactory" class="solr.SchemaCodecFactory"/> Not sure if it's mentioned in any document that this needs to be declared in the config file. I am now re-indexing the data on the master and will perform tests to see if it works as expected. thanks for your support. Aditya -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052091.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
Re: Batch Search Query
I will definitely let you all know what we end up doing. I realized I forgot to mention something that might make what we do more clear. Right now we use sql server full text to get back fairly similar matches for each segment. We do this with some funky sql stuff which I didn't write and haven't even looked at. It gives us back 100 results. They are not really all that good of matches though, it just gives us something to work with. So although some results are good, some are horrible. Then, to truly make sure we have a good match we take each one of those ~100 results and run it through a levenshtein algorithm implemented in c# code. Levenshtein gives back a % match. We then use the highest match so long as it is above 85% Hope this makes it a little more clear what we are doing. On Thu, Mar 28, 2013 at 11:39 AM, Roman Chyla roman.ch...@gmail.com wrote: On Thu, Mar 28, 2013 at 12:27 PM, Mike Haas mikehaas...@gmail.com wrote: Thanks for your reply, Roman. Unfortunately, the business has been running this way forever so I don't think it would be feasible to switch to a whole sure, no arguing against that :) document store versus segments store. Even then, if I understand you correctly it would not work for our needs. I'm thinking because we don't care about any other parts of the document, just the segment. If a similar segment is in an entirely different document, we want that segment. the algo should work for this case - the beauty of the local winnowing is that it is *local*, ie it tends to select the same segments from the text (ie. you process two documents, written by two different people - but if they cited the same thing, and it is longer than 'm' tokens, you will have at least one identical fingerprints from both documents - which means: match!) then of course, you can store the position offset of the original words of the fingerprint and retrieve the original, compute ratio of overlap etc... 
but a database seems to be better suited for these kind of jobs... let us know what you adopt! ps: MoreLikeThis selects 'significant' tokens from the document you selected and then constructs a new boolean query searching for those. http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ I'll keep taking any and all feedback however so that I can develop an idea and present it to my manager. On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla roman.ch...@gmail.com wrote: Apologies if you already do something similar, but perhaps of general interest... One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you create/search for every document http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf Then you simply end up indexing each document, and upon submission: computing fingerprints and querying for them. I don't know (ie. remember) exact numbers, but my feeling is that you end up storing ~13% of document text (besides, it is a one token fingerprint, therefore quite fast to search for - you could even try one huge boolean query with 1024 clauses, ouch... :)) roman On Thu, Mar 28, 2013 at 11:43 AM, Mike Haas mikehaas...@gmail.com wrote: Hello. My company is currently thinking of switching over to Solr 4.2, coming off of SQL Server. However, what we need to do is a bit weird. Right now, we have ~12 million segments and growing. Usually these are sentences but can be other things. These segments are what will be stored in Solr. I’ve already done that. Now, what happens is a user will upload say a word document to us. We then parse it and process it into segments. It very well could be 5000 segments or even more in that word document. Each one of those ~5000 segments needs to be searched for similar segments in solr. I’m not quite sure how I will do the query (whether proximate or something else). 
The point though, is to get back similar results for each segment. However, I think I’m seeing a bigger problem first. I have to search against ~5000 segments. That would be 5000 http requests. That’s a lot! I’m pretty sure that would take a LOT of hardware. Keep in mind this could be happening with maybe 4 different users at once right now (and of course more in the future). Is there a good way to send a batch query over one (or at least a lot fewer) http requests? If not, what kinds of things could I do to implement such a feature (if feasible, of course)? Thanks, Mike
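Since the thread leans on a Levenshtein percent match with an 85% cutoff, here is a minimal Python sketch of that scoring step (the sample strings are illustrative; the original is C# code the poster hasn't shared):

```python
# Classic Levenshtein distance, two-row dynamic-programming variant.
def levenshtein(a, b):
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def percent_match(a, b):
    """1.0 minus normalized edit distance, as used for the 85% cutoff."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# Keep only candidates clearing the 85% threshold mentioned in the thread.
candidates = ["the cat sat on the mat", "a dog ran in the park"]
best = [s for s in candidates
        if percent_match("the cat sat on a mat", s) >= 0.85]
```

The same filtering could run over the top-N results of each Solr query, mirroring the existing SQL-full-text-then-Levenshtein pipeline.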
Re: Solr sorting and relevance
Thanks for the fast response. I am still just learning Solr, so please bear with me. This still sounds like the wrong products would appear at the top if they have more inventory, unless I am misunderstanding. High boost/low boost seems to make sense to me. That alone would return the more relevant items at the top, but once we do a query boost on inventory, wouldn't jeans (using the aforementioned example) with more inventory than boots appear at the top? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918p4052122.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too many fields to Sort in Solr
Wow, that's strange. I tried toggling the codecFactory line in solrconfig.xml (attached in this post): commenting it out gives me an error, whereas un-commenting it works. Can you please take a look at the config and let me know if anything is wrong there? thanks Aditya solrconfig.xml http://lucene.472066.n3.nabble.com/file/n4052131/solrconfig.xml -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052131.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr sorting and relevance
If you had a high boost on the title with a moderate boost on the inventory it sounds like you'd get boots first ordered by inventory followed by jeans ordered by inventory. Because the heavy title boost would move the boots to the top. You can play with the boost factors to try and get the mix you're looking for. On Thu, Mar 28, 2013 at 1:20 PM, scallawa dami...@altrec.com wrote: Thanks for the fast response. I am still just learning solr so please bear with me. This still sounds like the wrong products would appear at the top if they have more inventory unless I am misunderstanding. High boost low boost seems to make sense to me. That alone would return the more relevant items at the top but once we do a query boost on inventory, wouldn't jeans (using the aforementioned example) with more inventory that boots appear at top. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918p4052122.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
Re: multiple SolrCloud clusters with one ZooKeeper ensemble?
Thanks. Now I have to go back and re-read the entire SolrCloud Wiki to see what other info I missed and/or forgot. Bill On Thu, Mar 28, 2013 at 12:48 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : Can I use a single ZooKeeper ensemble for multiple SolrCloud clusters or : would each SolrCloud cluster requires its own ZooKeeper ensemble? https://wiki.apache.org/solr/SolrCloud#Zookeeper_chroot (I'm going to FAQ this) -Hoss
Could not load config for solrconfig.xml
Hi, the Solr setup on Windows worked fine. I tried to follow the steps for installing Solr on Unix, and when I started Tomcat I got this exception: SEVERE: Unable to create core: collection1 org.apache.solr.common.SolrException: Could not load config for solrconfig.xml at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:318) at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:283) at org.apache.solr.core.Config.init(Config.java:103) at org.apache.solr.core.Config.init(Config.java:73) at org.apache.solr.core.SolrConfig.init(SolrConfig.java:117) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:989) ...
11 more Mar 28, 2013 1:39:43 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: collection1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: org.apache.solr.common.SolrException: Could not load config for solrconfig.xml at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 10 more Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:318) at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:283) at org.apache.solr.core.Config.init(Config.java:103) at org.apache.solr.core.Config.init(Config.java:73) at org.apache.solr.core.SolrConfig.init(SolrConfig.java:117) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:989) ...
11 more Mar 28, 2013 1:39:43 PM org.apache.solr.servlet.SolrDispatchFilter init INFO: user.dir=/home/spbear/javaguys/apache-tomcat-7.0.39/bin Mar 28, 2013 1:39:43 PM org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init() done Mar 28, 2013 1:39:43 PM org.apache.catalina.startup.HostConfig deployDirectory INFO: Deploying web application directory /home/spbear/javaguys/apache-tomcat-7. INFO: Registering Log Listener Mar 28, 2013 1:39:42 PM org.apache.solr.core.CoreContainer create INFO: Creating SolrCore 'collection1' using instanceDir: /home/javaguys/solr-home/collection1 Mar 28, 2013 1:39:42 PM org.apache.solr.core.SolrResourceLoader init INFO: new SolrResourceLoader for directory: '/home/javaguys/solr-home/collection1/' Mar 28, 2013 1:39:43 PM org.apache.solr.core.CoreContainer recordAndThrow SEVERE: Unable to create core: collection1 org.apache.solr.common.SolrException: Could not load config for solrconfig.xml at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:991) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
Re: bootstrap_conf without restarting
Couple notes though: java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 -solrhome example/solr I don't think you want that -solrhome - if I remember right, that's for testing/local purposes and is just for when you want to run zk internally from the cmd. Generally that should be ignored. I think you also might want to put the -classpath value in quotes, or your OS can do some auto expanding that causes issues…so I think it might be better to do like: java -classpath "example/solr-webapp/WEB-INF/lib/*" org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 I think the examples on the wiki should probably be updated. -solrhome is only needed with the bootstrap option, I believe. - Mark On Mar 28, 2013, at 1:14 PM, Joel Bernstein joels...@gmail.com wrote: You can use the upconfig command, which is described on the Solr Cloud wiki page, followed by a collection reload, also described on the wiki. Here is a sample upconfig command: java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 -solrhome example/solr On Thu, Mar 28, 2013 at 12:05 PM, jimtronic jimtro...@gmail.com wrote: I'm doing fairly frequent changes to my data-config.xml files on some of my cores in a solr cloud setup. Is there any way to get these files active and up to Zookeeper without restarting the instance? I've noticed that if I just launch another instance of solr with the bootstrap_conf flag set to true, it uploads the new settings, but it dies because there's already a solr instance running on that port. It also seems to make the original one unresponsive or at least down in zookeeper's eyes. I then just restart that instance and everything is back up.
It'd be nice if I could bootstrap without actually starting solr. What's the best practice for deploying changes to data-config.xml? Thanks, Jim -- View this message in context: http://lucene.472066.n3.nabble.com/bootstrap-conf-without-restarting-tp4052092.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
Re: Could not load config for solrconfig.xml
On 29 March 2013 00:19, A. Lotfi majidna...@yahoo.com wrote: Hi, solr setup in windows worked fine, I tried to follow installing solr in unix, when I started tomcat I got this exception : [...] Seems it cannot find solrconfig.xml. The relevant part from the logs is: Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin Have you defined the solr/home property properly in your Solr configuration file? Regards, Gora
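For the archives: the solr/home property Gora mentions can be supplied in (at least) two ways on Tomcat. A hedged sketch, reusing the path visible in the stack trace:

```shell
# Option 1: JVM system property (e.g. in Tomcat's bin/setenv.sh)
export CATALINA_OPTS="$CATALINA_OPTS -Dsolr.solr.home=/home/javaguys/solr-home"

# Option 2: JNDI entry in the webapp's context file
# (conf/Catalina/localhost/solr.xml):
#   <Environment name="solr/home" type="java.lang.String"
#                value="/home/javaguys/solr-home" override="true"/>
```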
Re: SOLR - Documents with large number of fields ~ 450
Hi John, Mark is right. DocValues can be enabled in two ways: RAM resident (default) or on-disk. You can read more here: http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues Regards. On 22 March 2013 16:55, John Nielsen j...@mcb.dk wrote: with the on disk option. Could you elaborate on that? On 22/03/2013 05.25, Mark Miller markrmil...@gmail.com wrote: You might try using docvalues with the on disk option and try and let the OS manage all the memory needed for all the faceting/sorting. This would require Solr 4.2. - Mark On Mar 21, 2013, at 2:56 AM, kobe.free.wo...@gmail.com wrote: Hello All, Scenario: My data model consists of approx. 450 fields with different types of data. We want to include each field for indexing; as a result it will create a single SOLR document with *450 fields*. The total number of records in the data set is *755K*. We will be using features like faceting and sorting on approx. 50 fields. We are planning to use SOLR 4.1. Following is the hardware configuration of the web server that we plan to install SOLR on:- CPU: 2 x Dual Core (4 cores) | RAM: 12GB | Storage: 212 GB Questions : 1) What's the best approach when dealing with documents with a large number of fields? What's the drawback of having a single document with a very large number of fields? Does SOLR support documents with a large number of fields, as in my case? 2) Will there be any performance issue if I define all of the 450 fields for indexing? Also if faceting is done on 50 fields, with documents having a large number of fields and a huge number of records? 3) The names of the fields in the data set are quite lengthy, around 60 characters. Will it be a problem defining fields with such long names in the schema file? Is there any best practice to be followed related to naming conventions? Will big field names create problems during querying? Thanks!
-- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Documents-with-large-number-of-fields-450-tp4049633.html Sent from the Solr - User mailing list archive at Nabble.com.
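A rough sketch of what the schema.xml side of on-disk DocValues could look like in Solr 4.2. The field and type names here are invented, and whether `docValuesFormat` belongs on the fieldType or elsewhere may differ by version, so treat this as a starting point and check the slides/docs:

```xml
<!-- Hypothetical: string type whose DocValues live on disk -->
<fieldType name="string_dv_disk" class="solr.StrField" docValuesFormat="Disk"/>
<!-- One of the ~50 facet/sort fields, with DocValues enabled -->
<field name="facet_field_01" type="string_dv_disk" indexed="true"
       stored="false" docValues="true"/>
```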
Re: Solr sorting and relevance
Otis brings up a good point. Possibly you could put logic in your function query to account for this. But it may be that you can't achieve the mix you're looking for without taking direct control. That is the main reason that SOLR-4465 was put out there, for cases where direct control is needed. I have to reiterate that SOLR-4465 is experimental at this point and subject to change. On Thu, Mar 28, 2013 at 3:00 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, But can you ever get this universally right? In some cases there is very little inventory and in some case there is a ton of inventory, so even if you use a small boost for inventory, when the intentory is very large, that will overpower the title boost, no? Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Mar 28, 2013 at 2:27 PM, Joel Bernstein joels...@gmail.com wrote: If you had a high boost on the title with a moderate boost on the inventory it sounds like you'd get boots first ordered by inventory followed by jeans ordered by inventory. Because the heavy title boost would move the boots to the top. You can play with the boost factors to try and get the mix you're looking for. On Thu, Mar 28, 2013 at 1:20 PM, scallawa dami...@altrec.com wrote: Thanks for the fast response. I am still just learning solr so please bear with me. This still sounds like the wrong products would appear at the top if they have more inventory unless I am misunderstanding. High boost low boost seems to make sense to me. That alone would return the more relevant items at the top but once we do a query boost on inventory, wouldn't jeans (using the aforementioned example) with more inventory that boots appear at top. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918p4052122.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks -- Joel Bernstein Professional Services LucidWorks
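To illustrate the dampening idea behind Otis's objection: a hypothetical edismax request where wrapping inventory in log() keeps a very large inventory from swamping the title boost (field names and boost values are invented):

```
q=boots&defType=edismax&qf=title^10.0 description^2.0&boost=log(sum(inventory,1))
```

`sum(inventory,1)` avoids log(0) for out-of-stock items; log() grows slowly, so a 10x inventory difference only nudges the score rather than dominating it.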
Re: Could not load config for solrconfig.xml
Thanks, my path to solr home was missing something; it's working now, but I get no results. The same Solr app with the same configuration files worked on Windows. Abdel From: Gora Mohanty g...@mimirtech.com To: solr-user@lucene.apache.org; A. Lotfi majidna...@yahoo.com Cc: gene...@lucene.apache.org gene...@lucene.apache.org Sent: Thursday, March 28, 2013 3:22 PM Subject: Re: Could not load config for solrconfig.xml On 29 March 2013 00:19, A. Lotfi majidna...@yahoo.com wrote: Hi, solr setup in windows worked fine, I tried to follow installing solr in unix, when I started tomcat I got this exception : [...] Seems it cannot find solrconfig.xml. The relevant part from the logs is: Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/home/javaguys/solr-home/collection1/conf/', cwd=/home/spbear/javaguys/apache-tomcat-7.0.39/bin Have you defined the solr/home property properly in your Solr configuration file? Regards, Gora
Re: Too many fields to Sort in Solr
Not sure that making changes to solrconfig.xml is the right path here. There might be something else with your setup that's causing this issue; I'm not sure what it would be, though. On Thu, Mar 28, 2013 at 1:38 PM, adityab aditya_ba...@yahoo.com wrote: Wow, that's strange. I tried toggling the codecFactory line in solrconfig.xml (attached in this post): commenting it out gives me an error, whereas un-commenting it works. Can you please take a look at the config and let me know if anything is wrong there? thanks Aditya solrconfig.xml http://lucene.472066.n3.nabble.com/file/n4052131/solrconfig.xml -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052131.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
Re: bootstrap_conf without restarting
I do this frequently, but use the scripts provided in cloud-scripts, e.g. export ZK_HOST=... cloud-scripts/zkcli.sh -zkhost $ZK_HOST -cmd upconfig -confdir $COLLECTION_INSTANCE_DIR/conf -confname $COLLECTION_NAME Also, once you do this, you still have to reload the collection so that it picks up the change: curl -i -v "http://URL/solr/admin/collections?action=RELOAD&name=COLLECTION_NAME" On Thu, Mar 28, 2013 at 1:03 PM, Mark Miller markrmil...@gmail.com wrote: Couple notes though: java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 -solrhome example/solr I don't think you want that -solrhome - if I remember right, that's for testing/local purposes and is just for when you want to run zk internally from the cmd. Generally that should be ignored. I think you also might want to put the -classpath value in quotes, or your OS can do some auto expanding that causes issues…so I think it might be better to do like: java -classpath "example/solr-webapp/WEB-INF/lib/*" org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 I think the examples on the wiki should probably be updated. -solrhome is only needed with the bootstrap option, I believe. - Mark On Mar 28, 2013, at 1:14 PM, Joel Bernstein joels...@gmail.com wrote: You can use the upconfig command, which is described on the Solr Cloud wiki page, followed by a collection reload, also described on the wiki. Here is a sample upconfig command: java -classpath example/solr-webapp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 127.0.0.1:9983 -confdir example/solr/collection1/conf -confname conf1 -solrhome example/solr On Thu, Mar 28, 2013 at 12:05 PM, jimtronic jimtro...@gmail.com wrote: I'm doing fairly frequent changes to my data-config.xml files on some of my cores in a solr cloud setup.
Is there any way to get these files active and up to Zookeeper without restarting the instance? I've noticed that if I just launch another instance of solr with the bootstrap_conf flag set to true, it uploads the new settings, but it dies because there's already a solr instance running on that port. It also seems to make the original one unresponsive or at least down in zookeeper's eyes. I then just restart that instance and everything is back up. It'd be nice if I could bootstrap without actually starting solr. What's the best practice for deploying changes to data-config.xml? Thanks, Jim -- View this message in context: http://lucene.472066.n3.nabble.com/bootstrap-conf-without-restarting-tp4052092.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
Re: Could not load config for solrconfig.xml
On 29 March 2013 01:59, A. Lotfi majidna...@yahoo.com wrote: Thanks, my path to solr home was missing something, it's worlking, but no results, the same solr app with same configuration files worked in windows. What do you mean by no results? Have you indexed stuff, and are not able to search for it? Are you expecting to copy Solr files from an old setup with an index, and have things work? That would be OK, provided that the Solr index formats were compatible, but you would also need to copy the index, and define dataDir properly in solrconfig.xml. Regards, Gora
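For the dataDir point Gora raises, this is a one-line solrconfig.xml setting; a sketch (the path is illustrative, built from the solr home seen earlier in the thread):

```xml
<!-- solrconfig.xml: point the core at the directory containing index/ -->
<dataDir>/home/javaguys/solr-home/collection1/data</dataDir>
```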
Re: Solr Cloud update process
There are lots of small issues, though. 1. Is Solr tested with a mix of current and previous versions? Is it safe to run a cluster that is a mix of 4.1 and 4.2, even for a little bit? 2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/, not just the main XML files. 3. We don't want a cluster with config files that are ahead of the software version, so I think we need to: * Update all the war files and restart each Solr process. * Upload the new config files. * Reload each collection on each Solr process. But this requires that Solr 4.2 be able to start with Solr 4.1 config files. 4. Do we need to stop updates, wait for all nodes to sync, and not restart until the whole cluster is updated? 5. I'd like a bit more detail about exactly what upconfig is supposed to do, because I spent a lot of time with it doing things that did not result in a working Solr cluster. For example, for files in the directory argument, where exactly do they end up in the Zookeeper space? Currently, I've been doing updates with bootstrap, because it was the only thing I could get to work. wunder On Mar 27, 2013, at 11:56 AM, Shawn Heisey wrote: On 3/27/2013 12:34 PM, Walter Underwood wrote: What do people do for updating, say from 4.1 to 4.2.1, on a live cluster? I need to help our release engineering team create the Jenkins scripts for deployment. Aside from replacing the .war file and restarting your container, there hopefully won't be anything additional required. The subject says SolrCloud, so your config(s) should be in zookeeper. It would generally be a good idea to update luceneMatchVersion to LUCENE_42 in the config(s), unless you happen to know that you're relying on behavior from the old version that changed in the new version. I also make a point of deleting the old extracted version of the .war before restarting, just to be sure there won't be any problems.
In theory a servlet container should be able to handle this without intervention, but I don't like taking the chance. Thanks, Shawn
Re: Solr Cloud update process
Hi Walter, I just did our upgrade from a nightly build of 4.1 (a few weeks before the release) and 4.2 - thankfully it went off with 0 downtime and no issues ;-) First and foremost, I had a staging environment that I upgraded first so I already had a good feeling that things would be fine. Hopefully you have a sandbox environment where you can mess around with the upgrade first. On Thu, Mar 28, 2013 at 3:01 PM, Walter Underwood wun...@wunderwood.orgwrote: There are lots of small issues, though. 1. Is Solr tested with a mix of current and previous versions? It is safe to run a cluster that is a mix of 4.1 and 4.2, even for a little bit? I did a rolling upgrade and no issues. So I dropped a node, waited until that was noticed by Zk (almost instant). This left me with a new leader still on 4.1 and then I brought up a replica on 4.2. Then I took down the leader on 4.1 (so Solr failed over to my 4.2 node) and brought it up to 4.2 2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/, not just the main XML files. Afaik yes - I didn't change any configuration between 4.1 and 4.2 other than some newSearcher warming queries and cache settings 3. We don't want a cluster with config files that are ahead of the software version, so I think we need: * Update all the war files and restart each Solr process. * Upload the new config files * Reload each collection on each Solr process But this requires that Solr 4.2 be able to start with Solr 4.1 config files. This is what I did too. 4. Do we need to stop updates, wait for all nodes to sync, and not restart until the whole cluster is uploaded. Can't help you on this one as I was not accepting updates during the upgrade. 5. I'd like a bit more detail about exactly what upconfig is supposed to do, because I spent a lot of time with it doing things that did not result in a working Solr cluster. For example, for files in the directory argument, where exactly do they end up in the Zookeeper space? 
Currently, I've been doing updates with bootstrap, because it was the only thing I could get to work. So when you do upconfig, you pass the collection name, so the files get put under: /configs/COLLECTION_NAME You can test this by doing the upconfig and then going into the admin console: Cloud Tree /configs and verifying your updates are correct. wunder On Mar 27, 2013, at 11:56 AM, Shawn Heisey wrote: On 3/27/2013 12:34 PM, Walter Underwood wrote: What do people do for updating, say from 4.1 to 4.2.1, on a live cluster? I need to help our release engineering team create the Jenkins scripts for deployment. Aside from replacing the .war file and restarting your container, there hopefully won't be anything additional required. The subject says SolrCloud, so your config(s) should be in zookeeper. It would generally be a good idea to update luceneMatchVersion to LUCENE_42 in the config(s), unless you happen to know that you're relying on behavior from the old version that changed in the new version. I also make a point of deleting the old extracted version of the .war before restarting, just to be sure there won't be any problems. In theory a servlet container should be able to handle this without intervention, but I don't like taking the chance. Thanks, Shawn
Re: Solr Cloud update process
Comments hidden inline below. Overall - we need to focus on upgrades at some point, but there is little that should stop the old distrib update process from working (multi-node clusters pre-SolrCloud). However, we should have tests and stuff. If only the days were twice as long. On Mar 28, 2013, at 5:27 PM, Timothy Potter thelabd...@gmail.com wrote: Hi Walter, I just did our upgrade from a nightly build of 4.1 (a few weeks before the release) to 4.2 - thankfully it went off with 0 downtime and no issues ;-) First and foremost, I had a staging environment that I upgraded first, so I already had a good feeling that things would be fine. Hopefully you have a sandbox environment where you can mess around with the upgrade first. On Thu, Mar 28, 2013 at 3:01 PM, Walter Underwood wun...@wunderwood.org wrote: There are lots of small issues, though. 1. Is Solr tested with a mix of current and previous versions? Is it safe to run a cluster that is a mix of 4.1 and 4.2, even for a little bit? I did a rolling upgrade with no issues. I dropped a node and waited until that was noticed by Zk (almost instant). This left me with a new leader still on 4.1, and then I brought up a replica on 4.2. Then I took down the leader on 4.1 (so Solr failed over to my 4.2 node) and brought it back up on 4.2. 2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/, not just the main XML files. Afaik yes - I didn't change any configuration between 4.1 and 4.2 other than some newSearcher warming queries and cache settings. That's generally been how things work - old config works with new versions. Occasionally, things might get deprecated. That's why there is the version thing in solrconfig.xml. 3. We don't want a cluster with config files that are ahead of the software version, so I think we need: * Update all the war files and restart each Solr process.
* Upload the new config files * Reload each collection on each Solr process But this requires that Solr 4.2 be able to start with Solr 4.1 config files. This is what I did too. 4. Do we need to stop updates, wait for all nodes to sync, and not restart until the whole cluster is uploaded? Can't help you on this one as I was not accepting updates during the upgrade. This should generally work fine. 5. I'd like a bit more detail about exactly what upconfig is supposed to do, because I spent a lot of time with it doing things that did not result in a working Solr cluster. For example, for files in the directory argument, where exactly do they end up in the Zookeeper space? Currently, I've been doing updates with bootstrap, because it was the only thing I could get to work. So when you do upconfig, you pass the collection name, so the files get put under: /configs/COLLECTION_NAME You can test this by doing the upconfig and then going into the admin console: Cloud -> Tree -> /configs and verifying your updates are correct. The main difference between using bootstrap and upconfig is that upconfig does not link a collection to a config set. You must have a link from a collection to a config set. The following rules apply for this: 1. If there is only one config set, when you start a new collection without an explicit link, it will link to it. 2. If a collection does not have an explicit link, but shares the name of a config set, it will link to it. 3. You can set an explicit link. Also, you can link before creating the collection - it will sit in zk waiting for the collection to find it. - Mark wunder On Mar 27, 2013, at 11:56 AM, Shawn Heisey wrote: On 3/27/2013 12:34 PM, Walter Underwood wrote: What do people do for updating, say from 4.1 to 4.2.1, on a live cluster? I need to help our release engineering team create the Jenkins scripts for deployment. Aside from replacing the .war file and restarting your container, there hopefully won't be anything additional required.
The subject says SolrCloud, so your config(s) should be in zookeeper. It would generally be a good idea to update luceneMatchVersion to LUCENE_42 in the config(s), unless you happen to know that you're relying on behavior from the old version that changed in the new version. I also make a point of deleting the old extracted version of the .war before restarting, just to be sure there won't be any problems. In theory a servlet container should be able to handle this without intervention, but I don't like taking the chance. Thanks, Shawn
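[Editor's note] Mark's three collection-to-config-set linking rules can be sketched as a small resolver. This is only an illustrative model of the behavior he describes, not Solr's actual implementation; the function name and data shapes are invented for the example.

```python
def resolve_config_set(collection, explicit_links, config_sets):
    """Illustrative model of the collection -> config-set linking rules
    described above (not Solr's actual code)."""
    # Rule 3: an explicit link always wins.
    if collection in explicit_links:
        return explicit_links[collection]
    # Rule 2: a config set sharing the collection's name gets linked.
    if collection in config_sets:
        return collection
    # Rule 1: with exactly one config set, new collections link to it.
    if len(config_sets) == 1:
        return next(iter(config_sets))
    return None  # ambiguous - no link can be inferred

# Hypothetical contents of ZooKeeper's /configs node:
sets = {"conf1", "col201301"}
print(resolve_config_set("col201301", {}, sets))              # -> col201301 (name match)
print(resolve_config_set("other", {"other": "conf1"}, sets))  # -> conf1 (explicit link)
```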
Re: [ANNOUNCE] Solr wiki editing change
Steve, could you add me to the contrib group? TomasFernandezLobbe Thanks! Tomás On Thu, Mar 28, 2013 at 1:04 PM, Steve Rowe sar...@gmail.com wrote: On Mar 28, 2013, at 11:57 AM, Jilal Oussama jilal.ouss...@gmail.com wrote: Please add OussamaJilal to the group. Added to solr ContributorsGroup.
Re: Solrcloud 4.1 Collection with multiple slices only use
So, by using numShards at initialization time with the sample collection1 solr.xml, I'm able to create a sharded and distributed index. Also, by removing any initial cores from the solr.xml file, I'm able to use the collections API via the web to create multiple collections with sharded indexes that work correctly; however, I can't create distributed collections by using the solr.xml alone. Adding the numShards parameter to the first instance of a collection core in the solr.xml file is ignored; cores are created, but update distribution doesn't happen. When booting up Solr, the config INFO messages show numShards=null. I get the impression from the documentation that you should be able to do this, but I haven't seen a specific example. Without that, it seems that I'm relegated to the shard names, locations, etc. provided by the collections API. I've done this testing under 4.1. True or False? Chris On Mar 27, 2013 9:46 PM, corg...@gmail.com corg...@gmail.com wrote: I realized my error shortly: more docs, better spread. I continued to do some testing to see how I could manually lay out the shards in what I thought was a more organized manner, and with more descriptive names than the numShards parameter alone produced. I also gen'd up a few thousand docs and a schema to test with. Appreciate the help. - Reply message - From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Subject: Solrcloud 4.1 Collection with multiple slices only use Date: Wed, Mar 27, 2013 9:30 pm First, three documents isn't enough to really test. The formula for assigning shards is to hash on the unique ID. It _is_ possible that all three just happened to land on the same shard. If you index all 32 docs in the example dir and they're all on the same shard, we should talk. Second, a regular query to the cluster will always search all the shards. Use distrib=false on the URL to restrict the search to just the node you fire the request at.
Let us know if you index more docs and still see the problem. Best, Erick On Wed, Mar 27, 2013 at 9:39 AM, Chris R corg...@gmail.com wrote: So - I must be missing something very basic here, and I've gone back to the Wiki example. After setting up the two-shard example in the first tutorial and indexing the three example documents, look at the shards in the Admin UI: the documents are stored in the index where the update was directed - they aren't distributed across both shards. Release notes state that the compositeId router is the default when using the numShards parameter? I want an even distribution of documents based on ID across all shards. Suggestions on what I'm screwing up? Chris On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller markrmil...@gmail.com wrote: I'm guessing you didn't specify numShards. Things changed in 4.1 - if you don't specify numShards, it goes into a mode where it's up to you to distribute updates. - Mark On Mar 25, 2013, at 10:29 PM, Chris R corg...@gmail.com wrote: I have two issues and I'm unsure if they are related: Problem: After setting up a multiple-collection Solrcloud 4.1 instance on seven servers, when I index the documents they aren't distributed across the index slices. It feels as though I don't actually have a cloud implementation, yet everything I see in the admin interface and zookeeper implies I do. I feel as if I'm overlooking something obvious, but have not been able to figure out what. Configuration: Seven servers and four collections, each with 12 slices (no replica shards yet). Zookeeper configured in a three-node ensemble. When I send documents to Server1/Collection1 (which holds two slices of collection1), all the documents show up in a single index shard (core). Perhaps related, I have found it impossible to get Solr to recognize the server names with anything but a literal host=servername parameter in the solr.xml.
The hostname parameters, host files, network, and DNS are all configured correctly. I have a Solr 4.0 single-collection setup configured similarly and it works just fine. I'm using the same schema.xml and solrconfig.xml files on the 4.1 implementation, with only the luceneMatchVersion changed to LUCENE_41. Sample solr.xml from server1:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="8080" host="server1" shareSchema="true" zkClientTimeout="6">
    <core collection="col201301" shard="col201301s04" instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01" dataDir="/solr/col201301/col201301s04sh01/data/"/>
    <core collection="col201301" shard="col201301s11" instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01" dataDir="/solr/col201301/col201301s11sh01/data/"/>
    <core collection="col201302" shard="col201302s06" instanceDir="/solr/col201302/col201302s06sh01"
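[Editor's note] Erick's point that "the formula for assigning shards is to hash on the unique ID" can be illustrated with a toy router. Solr's actual compositeId router uses MurmurHash over a hash ring; the md5-based sketch below only models the key property that the same ID always routes to the same shard, which is why a handful of documents can plausibly all land on one shard.

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    # Toy stand-in for hash-based document routing: a stable hash of the
    # unique ID, reduced to a shard number. (Solr's real compositeId
    # router uses MurmurHash over a hash ring, not md5-mod-N.)
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Deterministic: the same ID always maps to the same shard, so with
# only three documents it is quite possible all three collide.
ids = ["doc%d" % i for i in range(32)]
print({i: shard_for(i, 2) for i in ids[:3]})
```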
Re: Solrcloud 4.1 Collection with multiple slices only use
True - though I think for 4.2. numShards has never been respected in the cores defs for various reasons. In 4.0 and 4.1, things should have still worked though - you didn't need to give numShards, and everything should work just based on configuring different shard names for each core or accepting the default shard names. In 4.2 this went away - not passing numShards now means that you must distrib updates yourself. There are various technical reasons for this given new features that are being added. So, you can only really pre-configure *one* collection in solr.xml and then use the numShards sys prop. If you wanted to create another collection the same way with a *different* number of shards, you would have to stop Solr, set a new numShards sys prop after pre-configuring the next collection, then start Solr. Not really a good option. And so, the collections API is the way to go - and it's fairly poor in 4.1 due to its lack of result responses (you have to search the overseer logs). It's slightly better in 4.2 (you will get some response) and much better in 4.2.1 (you will get decent responses). Now that it's much more central, it will continue to improve rapidly. - Mark On Mar 28, 2013, at 6:08 PM, Chris R corg...@gmail.com wrote: So, by using numShards at initialization time with the sample collection1 solr.xml, I'm able to create a sharded and distributed index. Also, by removing any initial cores from the solr.xml file, I'm able to use the collections API via the web to create multiple collections with sharded indexes that work correctly; however, I can't create distributed collections by using the solr.xml alone. Adding the numShards parameter to the first instance of a collection core in the solr.xml file is ignored; cores are created, but update distribution doesn't happen. When booting up Solr, the config INFO messages show numShards=null. I get the impression from the documentation that you should be able to do this, but I haven't seen a specific example.
Re: How to shut down the SolrCloud?
Currently, yes. Stop each web container in the normal fashion. That will do a clean shutdown. - Mark On Mar 28, 2013, at 5:48 PM, Li, Qiang qiang...@msci.com wrote: How to shut down the SolrCloud? Just kill all nodes? Regards, Ivan
Re: Solrcloud 4.1 Collection with multiple slices only use
Interesting, I've been doing battle with it while coming from a 4.0 environment. I only had a single collection then and just created the solr.xml files for each server up front. They each supported a half dozen cores for a single collection. As for 4.1 and the collections API, the only issue I've had is with maxCoresPerNode. As you said, the responses all say ok even when it's not. I'll probably move up to 4.2 tomorrow. Thanks for the reply. On Mar 28, 2013 6:23 PM, Mark Miller markrmil...@gmail.com wrote: True - though I think for 4.2. numShards has never been respected in the cores defs for various reasons. In 4.0 and 4.1, things should have still worked though - you didn't need to give numShards, and everything should work just based on configuring different shard names for each core or accepting the default shard names. In 4.2 this went away - not passing numShards now means that you must distrib updates yourself. There are various technical reasons for this given new features that are being added. So, you can only really pre-configure *one* collection in solr.xml and then use the numShards sys prop. If you wanted to create another collection the same way with a *different* number of shards, you would have to stop Solr, set a new numShards sys prop after pre-configuring the next collection, then start Solr. Not really a good option. And so, the collections API is the way to go - and it's fairly poor in 4.1 due to its lack of result responses (you have to search the overseer logs). It's slightly better in 4.2 (you will get some response) and much better in 4.2.1 (you will get decent responses). Now that it's much more central, it will continue to improve rapidly. - Mark On Mar 28, 2013, at 6:08 PM, Chris R corg...@gmail.com wrote: So, by using numShards at initialization time with the sample collection1 solr.xml, I'm able to create a sharded and distributed index.
Re: Solrcloud 4.1 Collection with multiple slices only use
On Mar 28, 2013, at 6:30 PM, Chris R corg...@gmail.com wrote: I'll probably move up to 4.2 tomorrow. 4.2.1 should be ready as soon as I have time to publish it - we have a passing vote and I think we are close to 72 hours after. I just have to stock up on some beer first - Robert tells me it's like a 20 beer event… - Mark
Re: Batch Search Query
: Now, what happens is a user will upload say a word document to us. We then : parse it and process it into segments. It very well could be 5000 segments : or even more in that word document. Each one of those ~5000 segments needs : to be searched for similar segments in solr. I'm not quite sure how I will : do the query (whether proximate or something else). The point though, is to : get back similar results for each segment. You've described your black box (an index of small textual documents) and you've described your input (a large document that will be broken down into N=~5000 small textual snippets), but you haven't really clarified what your desired output should be... * N textual documents from your index, where each doc is the 1 'best' match to 1 of the N textual input snippets. * Some fixed number Y of textual documents from your index representing the best of the best matches against your textual input snippets (ie: if one input snippet is a really good match for multiple indexed docs, return all of those really good matches, but don't return any matches from other snippets if the only matches are poor.) * Some variable number Y of textual documents from your index representing the best of the best matches against your textual input snippets based on some minimum threshold of matching criteria. * etc... Forget for a moment that we are talking about Solr at all -- describe some hypothetical data, some hypothetical query examples, and some hypothetical results you would like to get back (or not get back) from each of those query examples (ideally in pseudo-code) and let's see if that doesn't help suggest an implementation strategy. -Hoss
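[Editor's note] Hoss's first option (one best indexed doc per input snippet) can be prototyped client-side before worrying about Solr query syntax. The word-overlap scorer below is purely a stand-in for a real relevance query; all names and data here are invented for illustration.

```python
def overlap(a: str, b: str) -> int:
    # Stand-in relevance score: count of shared lowercase words.
    return len(set(a.lower().split()) & set(b.lower().split()))

def best_match_per_segment(segments, indexed_docs):
    # Option 1 from the thread: for each of the N input segments,
    # return the single best-matching indexed document.
    return [max(indexed_docs, key=lambda d: overlap(seg, d)) for seg in segments]

index = ["the quick brown fox", "solr indexes documents", "batch query handling"]
print(best_match_per_segment(["brown fox jumped", "run a batch query"], index))
# -> ['the quick brown fox', 'batch query handling']
```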
Re: Solrcloud 4.1 Collection with multiple slices only use
That's my kind of release! Sent from my Verizon Wireless Phone - Reply message - From: Mark Miller markrmil...@gmail.com To: solr-user@lucene.apache.org Subject: Solrcloud 4.1 Collection with multiple slices only use Date: Thu, Mar 28, 2013 6:34 pm On Mar 28, 2013, at 6:30 PM, Chris R corg...@gmail.com wrote: I'll probably move up to 4.2 tomorrow. 4.2.1 should be ready as soon as I have time to publish it - we have a passing vote and I think we are close to 72 hours after. I just have to stock up on some beer first - Robert tells me it's like a 20 beer event… - Mark
Re: How to update synonyms.txt without restart?
: But solr wiki says: : ``` : Starting with Solr4.0, the RELOAD command is implemented in a way that : results a live reloads of the SolrCore, reusing the existing various : objects such as the SolrIndexWriter. As a result, some configuration : options can not be changed and made active with a simple RELOAD... Directly below that sentence are bullet points listing exactly which config options can't be changed with a simple reload... * IndexWriter related settings in indexConfig * dataDir location : http://wiki.apache.org/solr/CoreAdmin#RELOAD -Hoss
Re: Solr Cloud update process
On 3/28/2013 3:01 PM, Walter Underwood wrote: There are lots of small issues, though. 1. Is Solr tested with a mix of current and previous versions? Is it safe to run a cluster that is a mix of 4.1 and 4.2, even for a little bit? 2. Can Solr 4.2 run with Solr 4.1 config files? This means all of conf/, not just the main XML files. 3. We don't want a cluster with config files that are ahead of the software version, so I think we need: * Update all the war files and restart each Solr process. * Upload the new config files * Reload each collection on each Solr process But this requires that Solr 4.2 be able to start with Solr 4.1 config files. 4. Do we need to stop updates, wait for all nodes to sync, and not restart until the whole cluster is uploaded? 5. I'd like a bit more detail about exactly what upconfig is supposed to do, because I spent a lot of time with it doing things that did not result in a working Solr cluster. For example, for files in the directory argument, where exactly do they end up in the Zookeeper space? Currently, I've been doing updates with bootstrap, because it was the only thing I could get to work. Solr 4.2 will work just fine with config files from 4.1. I have a SolrCloud that was running a 4.1 snapshot. I upgraded it to 4.2.1 built from source with no problem. The exact steps that I did were: 1) Replace solr.war. 2) Replace lucene-analyzers-icu-4.1-SNAPSHOT.jar with lucene-analyzers-icu-4.2.1-SNAPSHOT.jar 3) Upgrade all of my jetty jars from 8.1.7 to 8.1.9. 4) Repeat the steps above on the other server. 5) Use zkcli.sh to 'upconfig' a replacement config set with only one change - luceneMatchVersion went from LUCENE_40 to LUCENE_42. 6) Restart both Solr instances. Upgrading jetty is something applicable to only my install, and was not a necessary step. The jetty version currently included in Solr as of 4.1 is 8.1.8 - see SOLR-4155. The upconfig command on zkcli.sh will add/replace the config set with the one that you specify.
It will go into /configs in your zookeeper ensemble. If you specify a chroot on your zkhost parameter, then it will go into /path/to/chroot/configs instead. Most of the time a chroot will only have one element, so /chroot/configs would be the most likely location. I actually would like more detail on upconfig myself - what if you delete files from the config directory on disk? Will they be deleted from zookeeper? I use a solrconfig that has xinclude statements, and occasionally those files do get deleted or renamed. Thanks, Shawn
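[Editor's note] Shawn's description of upconfig can be summarized in a small sketch that builds the zkcli.sh invocation and the resulting ZooKeeper path. The -zkhost/-cmd/-confdir/-confname flags match Solr 4.x's cloud-scripts tool, but the host names, config names, and helper functions here are illustrative assumptions.

```python
def upconfig_command(zkhost: str, confdir: str, confname: str) -> str:
    # Solr 4.x ships cloud-scripts/zkcli.sh; upconfig uploads every file
    # under confdir into ZooKeeper. Hosts and paths here are examples.
    return ("sh zkcli.sh -zkhost %s -cmd upconfig -confdir %s -confname %s"
            % (zkhost, confdir, confname))

def zk_path(confname: str, filename: str, chroot: str = "") -> str:
    # As noted above: configs land under /configs (or <chroot>/configs
    # when a chroot is given on the zkhost string).
    return "%s/configs/%s/%s" % (chroot, confname, filename)

print(upconfig_command("zk1:2181,zk2:2181", "./conf", "mainconf"))
print(zk_path("mainconf", "solrconfig.xml"))           # /configs/mainconf/solrconfig.xml
print(zk_path("mainconf", "solrconfig.xml", "/solr"))  # /solr/configs/mainconf/solrconfig.xml
```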
Re: Solrcloud 4.1 Collection with multiple slices only use
On 3/28/2013 4:23 PM, Mark Miller wrote: True - though I think for 4.2. numShards has never been respected in the cores defs for various reasons. In 4.0 and 4.1, things should have still worked though - you didn't need to give numShards, and everything should work just based on configuring different shard names for each core or accepting the default shard names. In 4.2 this went away - not passing numShards now means that you must distrib updates yourself. There are various technical reasons for this given new features that are being added. So, you can only really pre-configure *one* collection in solr.xml and then use the numShards sys prop. If you wanted to create another collection the same way with a *different* number of shards, you would have to stop Solr, set a new numShards sys prop after pre-configuring the next collection, then start Solr. Not really a good option. And so, the collections API is the way to go - and it's fairly poor in 4.1 due to its lack of result responses (you have to search the overseer logs). It's slightly better in 4.2 (you will get some response) and much better in 4.2.1 (you will get decent responses). Now that it's much more central, it will continue to improve rapidly. Can't you leave numShards out completely, then include a numShards parameter on a Collections API CREATE URL, possibly giving a different numShards to each collection? Thanks, Shawn
Re: SOLR - Unable to execute query error - DIH
: I am trying to index data from SQL Server view to the SOLR using the DIH Have you ruled out the view itself being the bottleneck? Try running whatever command-line SQL Server client exists on your Solr server to connect remotely to your existing SQL Server, run select * from view, and redirect the output to a file. That will give you a minimal, absolute baseline for the best possible performance you could expect to hope for when indexing into Solr -- and tip you off to whether the view is the problem when asking for more than a handful of documents. -Hoss
Re: Solrcloud 4.1 Collection with multiple slices only use
On Mar 28, 2013, at 7:30 PM, Shawn Heisey s...@elyograg.org wrote: Can't you leave numShards out completely, then include a numShards parameter on a collection api CREATE url, possibly giving a different numShards to each collection? Thanks, Shawn Yes - that's why I say the collections API is the way forward - it has none of these limitations. The limitations are all around pre-configuring everything in solr.xml and not using the collections API. - Mark
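[Editor's note] The CREATE call Shawn and Mark are discussing takes per-collection numShards on the URL. A sketch of building that request; the action/name/numShards/replicationFactor parameters are the documented Collections API ones, while the host and collection names are made up.

```python
from urllib.parse import urlencode

def create_collection_url(base: str, name: str, num_shards: int,
                          replication_factor: int = 1) -> str:
    # Collections API CREATE: each collection gets its own numShards,
    # independent of anything preconfigured in solr.xml.
    params = {"action": "CREATE", "name": name,
              "numShards": num_shards, "replicationFactor": replication_factor}
    return "%s/admin/collections?%s" % (base, urlencode(params))

print(create_collection_url("http://server1:8080/solr", "col201303", 12))
```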
Re: Solr Cloud update process
On Mar 28, 2013, at 7:27 PM, Shawn Heisey s...@elyograg.org wrote: I actually would like more detail on upconfig myself - what if you delete files from the config directory on disk? Will they be deleted from zookeeper? I use a solrconfig that has xinclude statements, and occasionally those files do get deleted or renamed. Thanks, Shawn Currently, it's a straight upload - if files went away locally, they will stay in zk. It will just replace what you upload. Happy to help implement a sync option or something if you create a JIRA for it. - mark
Re: How to update synonyms.txt without restart?
Not sure, but if you put it in the data dir, I think it picks it up and reloads on commit. Upayavira On Thu, Mar 28, 2013, at 09:11 AM, Kaneyama Genta wrote: Dear all, I'm investigating how to update synonyms.txt. Some people say a CORE RELOAD will reload synonyms.txt. But the Solr wiki says: ``` Starting with Solr4.0, the RELOAD command is implemented in a way that results a live reloads of the SolrCore, reusing the existing various objects such as the SolrIndexWriter. As a result, some configuration options can not be changed and made active with a simple RELOAD... ``` http://wiki.apache.org/solr/CoreAdmin#RELOAD And https://issues.apache.org/jira/browse/SOLR-3592 is marked as unresolved. The problem is: how can I update synonyms.txt in a production environment? A workaround is to restart the Solr process, but that does not look good to me. Will someone tell me what the best practice for updating synonyms.txt is? Thanks in advance.
Re: How to update synonyms.txt without restart?
But this is fixed in 4.2 - now the index writer is rebooted on core reload. So that's just 4.0 and 4.1. - Mark On Mar 28, 2013, at 6:48 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : But solr wiki says: : ``` : Starting with Solr4.0, the RELOAD command is implemented in a way that : results a live reloads of the SolrCore, reusing the existing various : objects such as the SolrIndexWriter. As a result, some configuration : options can not be changed and made active with a simple RELOAD... Directly below that sentence are bullet points listing exactly which config options can't be changed with a simple reload... * IndexWriter related settings in indexConfig * dataDir location : http://wiki.apache.org/solr/CoreAdmin#RELOAD -Hoss
Re: How to update synonyms.txt without restart?
Though I think *another* JIRA made data dir not changeable over core reload for some reason I don't recall exactly. But the other stuff is back to being changeable :) - Mark On Mar 28, 2013, at 8:04 PM, Mark Miller markrmil...@gmail.com wrote: But this is fixed in 4.2 - now the index writer is rebooted on core reload. So that's just 4.0 and 4.1. - Mark On Mar 28, 2013, at 6:48 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : But solr wiki says: : ``` : Starting with Solr4.0, the RELOAD command is implemented in a way that : results a live reloads of the SolrCore, reusing the existing various : objects such as the SolrIndexWriter. As a result, some configuration : options can not be changed and made active with a simple RELOAD... Directly below that sentence are bullet points listing exactly which config options can't be changed with a simple reload... * IndexWriter related settings in indexConfig * dataDir location : http://wiki.apache.org/solr/CoreAdmin#RELOAD -Hoss
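[Editor's note] Given Mark's note that reload works again from 4.2, the usual production answer is to replace synonyms.txt and issue a CoreAdmin RELOAD. A sketch of building that call; action=RELOAD and the core parameter are the documented CoreAdmin names, while the host and core names below are examples.

```python
from urllib.parse import urlencode

def reload_core_url(base: str, core: str) -> str:
    # CoreAdmin RELOAD re-reads config resources such as synonyms.txt
    # without restarting the Solr process.
    return "%s/admin/cores?%s" % (base, urlencode({"action": "RELOAD", "core": core}))

print(reload_core_url("http://localhost:8983/solr", "collection1"))
# -> http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1
```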
Basic auth on SolrCloud /admin/* calls
Hey guys, I've recently set up basic auth under Jetty 8 for all my Solr 4.x '/admin/*' calls, in order to protect my Collections and Cores API. Although the security constraint is working as expected ('/admin/*' calls require Basic Auth or return 401), when I use the Collections API to create a collection, I receive a 200 OK to the Collections API CREATE call, but the background Cores API calls that are run on the Collections API's behalf fail basic auth on the other nodes with a 401 code, as I should have foreseen, but didn't. Is there a way to tell SolrCloud to use authentication on internal Cores API calls that are spawned on the Collections API's behalf, or is this a new feature request? To reproduce: 1. Implement basic auth on '/admin/*' URIs. 2. Perform a CREATE Collections API call to a node (which will return 200 OK). 3. Notice all Cores API calls fail (the collection isn't created). See the stack trace below from the node that was issued the CREATE call. The stack trace I get is: org.apache.solr.common.SolrException: Server at http://<HOST HERE>:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:169) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) Cheers! Tim
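For anyone reproducing this setup, the constraint described above corresponds to standard servlet security config in the webapp's web.xml. This is a sketch under assumptions (the role and realm names are made up, not taken from Tim's actual config):

```xml
<!-- web.xml fragment: require basic auth on /admin/* (illustrative) -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr admin</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>
```

The failure mode in the thread follows from this: the Collections API CREATE is authenticated by the caller, but the node-to-node Cores API requests that SolrCloud issues internally do not carry those credentials, so they hit the same constraint and get 401s.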
Re: Could not load config for solrconfig.xml
On Windows, when I hit the Execute Query button I got these results:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">181</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="q">streetname:mdw</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="13674" start="0">
    <doc>
      <str name="streetname">MEADOW</str>
      <str name="lemsmatchcode">2501001ABN 1MD 262</str>
    </doc>
    <doc>
      <str name="streetname">MEADOW</str>
      <str name="lemsmatchcode">2501001ABRM1MD 472</str>
    </doc>
    <doc>
      <str name="streetname">MEADOW</str>
      <str name="lemsmatchcode">2501001ADMS1MD 350</str>
    </doc>
    ...
  </result>
</response>

On Unix, with the same setup, I got this result:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="q">*:*</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

I did not understand why. Thanks, your help is appreciated.

From: Gora Mohanty g...@mimirtech.com
To: solr-user@lucene.apache.org; A. Lotfi majidna...@yahoo.com
Sent: Thursday, March 28, 2013 4:40 PM
Subject: Re: Could not load config for solrconfig.xml

On 29 March 2013 01:59, A. Lotfi majidna...@yahoo.com wrote:

Thanks, my path to solr home was missing something, it's working, but no results. The same Solr app with the same configuration files worked on Windows.

What do you mean by no results? Have you indexed stuff, and are not able to search for it? Are you expecting to copy Solr files from an old setup with an index, and have things work? That would be OK, provided that the Solr index formats were compatible, but you would also need to copy the index, and define dataDir properly in solrconfig.xml.

Regards,
Gora
Re: Could not load config for solrconfig.xml
On 29 March 2013 07:23, A. Lotfi majidna...@yahoo.com wrote:

On Windows, when I hit the Execute Query button I got these results:
[...]

There seem to be no documents in your Solr index on the UNIX system. As I mentioned in my previous message, you either need to copy the index files from the Windows system (provided that the Solr index format has not changed, this will work), or reindex on the UNIX system.

Regards,
Gora
Re: Could not load config for solrconfig.xml
On Unix, in data/index there is:

segments.gen    20 B    3/28/2013    rw-r--r--
segments_1      45 B    3/28/2013    rw-r--r--

I don't know how this was generated. Should I delete them from this directory, or from some other place? If so, how do I reindex on the UNIX system?

Thanks a lot.

From: Gora Mohanty g...@mimirtech.com
To: A. Lotfi majidna...@yahoo.com
Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Thursday, March 28, 2013 9:59 PM
Subject: Re: Could not load config for solrconfig.xml

On 29 March 2013 07:23, A. Lotfi majidna...@yahoo.com wrote:

On Windows, when I hit the Execute Query button I got these results:
[...]

There seem to be no documents in your Solr index on the UNIX system. As I mentioned in my previous message, you either need to copy the index files from the Windows system (provided that the Solr index format has not changed, this will work), or reindex on the UNIX system.

Regards,
Gora
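As a concrete illustration of the "reindex" Gora suggests: the segments.gen/segments_1 files above are just an empty index that Solr created on startup, so documents need to be posted to the core's /update handler on the UNIX machine. A minimal XML update message looks like the sketch below (field names are taken from the query results earlier in this thread; the values are examples, not prescribed):

```xml
<!-- Body of a POST to http://host:8983/solr/update (illustrative);
     follow the adds with a <commit/> so they become searchable. -->
<add>
  <doc>
    <field name="streetname">MEADOW</field>
    <field name="lemsmatchcode">2501001ABN 1MD 262</field>
  </doc>
</add>
```

Deleting the empty segments files is not necessary; posting and committing documents is what makes numFound non-zero.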