Re: Schema Change: Int -> String (I am the original poster, new email address)
Maybe if I were to say that the column user_id will become user_ids, that would clarify things?

user_id:2002+AND+created:[${from}+TO+${until}]+data:more

becomes

user_ids:2002+AND+created:[${from}+TO+${until}]+data:more

where I want 2002 to be an exact positive match on one of the user_ids embedded in the TEXT ... not string :) If I am totally off or making no sense, feedback is very welcome. I am just seeing lots of similar data going into my db and it feels like Solr should be able to handle this. I just want to know if transforming the data like that will still allow exact searches against a user_id. My language from a Solr guru's point of view is probably *very* poorly phrased ... exact and TEXT might not go hand in hand.

Is the TEXT "20 1442 35" parsed as "20", "1442", "35" so that a search against it for 1442 will yield exact results? A search against 442 won't match, right?

1. 20 1442 35
2. 20 442 35
3. 20 1442

user_ids:1442 - yields #1 and #3 always?
user_ids:442 - yields only #2 always?

My lack of understanding about what Solr does when it indexes is shining through :)

On Fri, Jun 7, 2013 at 1:43 PM, z z zenlok.testi...@gmail.com wrote:

My language might be a bit off (I am saying string when I probably mean text in the context of Solr), but I'm pretty sure that my story is unwavering ;)

`id` int(11) NOT NULL AUTO_INCREMENT
`created` int(10)
`data` varbinary(255)
`user_id` int(11)

So, imagine that we have 1000 entries come in where data above is exactly the same for all 1000 entries, but user_id is different (id and created being different is irrelevant). I am thinking that prior to inserting into MySQL, I should be able to concatenate the user_ids together with whitespace and then insert them into something like:

`id` int(11) NOT NULL AUTO_INCREMENT
`created` int(10)
`data` varbinary(255)
`user_id` blob

Then on Solr's end it will treat the user_id as Text and parse it (I want to say tokenize, but maybe my language is incorrect here?). Then when I search

user_id:2002+AND+created:[${from}+TO+${until}]+data:more

I want to be sure that if I look for user_id 2002, I will get data that only has a value 2002 in the user_id column, and that a separate user with id 20 cannot accidentally pull data for user_id 2002 as a result of a fuzzy (my language ok?) match of 20 against (20)02.

Current schema definition:

<field name="user_id" type="int" indexed="true" stored="true"/>

New schema definition:

<field name="user_id" type="user_id_string" indexed="true" stored="true"/>
...
<fieldType name="user_id_string" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" maxTokenLength="120"/>
  </analyzer>
</fieldType>
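For what it's worth, a minimal sketch of the behaviour being asked about, assuming the whitespace-tokenized user_ids field from the schema above (host, core name and values are placeholders): the indexed value "20 1442 35" is split by WhitespaceTokenizerFactory into the tokens 20, 1442 and 35, and a term query only matches whole tokens:

  curl "http://localhost:8983/solr/collection1/select?q=user_ids:1442"
  # matches docs #1 and #3: 1442 is a complete token in both

  curl "http://localhost:8983/solr/collection1/select?q=user_ids:442"
  # matches only doc #2: 442 is not a whole token of "20 1442 35"

Since the analyzer chain has no stemming or n-gram filters, no partial-token matching can occur.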
Re: Solr 4.2.1 higher memory footprint vs Solr 3.5
Hi Shawn,

I also had CMS with tons of tuning options but still had a bigger GC pause once in a while. After switching to JDK 7 I tried G1GC with no other options and it runs perfectly. With CMS I saw that the old and young generations were growing until they had to do a GC. This produces the sawtooth and also takes a longer GC pause. With G1GC the GC is more frequent and better timed; it is softer, more flexible. I just removed any old tuning and old GC settings and have only the G1GC option.

ulimit -c unlimited
ulimit -l 256
ulimit -m unlimited
ulimit -n 8192
ulimit -s unlimited
ulimit -v unlimited

JAVA_OPTS="-server -d64 -Xmx20g -Xms20g -XX:+UseG1GC -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc.log"

java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

Maybe I have just been lucky with it, but for big heaps it works fine.

Regards
Bernd

On 06.06.2013 16:23, Shawn Heisey wrote:

On 6/6/2013 3:50 AM, Bernd Fehling wrote: What helped me a lot was switching to G1GC. Faster, smoother, very little ripple, nearly no sawtooth.

When I tried G1, it did indeed produce a better looking memory graph, but it didn't do anything about my GC pauses. They were several seconds with just CMS and NewRatio, and they actually seemed to get slightly worse when I tried G1 instead. To solve the GC pause problem, I've had to switch back to CMS and tack on several more tuning options, most of which are CMS-specific. I'm not sure how to tune G1. Have you done any additional tuning?

Thanks,
Shawn
Re: [blogpost] Memory is overrated, use SSDs
On Fri, 2013-06-07 at 07:15 +0200, Andy wrote: One question I have is did you precondition the SSD (http://www.sandforce.com/userfiles/file/downloads/FMS2009_F2A_Smith.pdf)? SSD performance tends to take a very deep dive once all blocks are written at least once and the garbage collector kicks in.

Not explicitly, no. The machine is our test server with the SSDs in RAID 0 with - to my knowledge - no TRIM support. They are 2½ years old, have had a fair amount of data written, and have been 3/4 full most of the time. At one point in time we experimented with 10M+ relatively small files and a couple of 40GB databases, so the drives are definitely not in pristine condition.

Anyway, as Solr searching is heavy on tiny random reads, I suspect that search performance will be largely unaffected by SSD fragmentation. It would be interesting to examine, but for now I cannot prioritize another large performance test.

Thank you for your input. I will update the blog post accordingly,
Toke Eskildsen, State and University Library, Denmark
Re: nutch 1.4, solr 3.4 configuration error
I had a similar error. I couldn't find any documentation on which Nutch and Solr versions are compatible. For instance, we're using Nutch 1.6 on Hadoop 1.0.4 with SolrJ 3.4.0 and index crawled segments to Solr 4.2.0. But I remember that I could find a compatible version of SolrJ for Nutch 1.4 (because of using Hadoop). You can upgrade your Nutch from 1.4 to 1.6 easily. And I also suggest you check your solrindex-mapping.xml in your /conf directory.

Best,
Tugcem.

On Fri, Jun 7, 2013 at 12:58 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: ./nutch crawl urls -dir myCrawl2 -solr http://localhost:8080 -depth 2 -topN
...
: Caused by: org.apache.solr.common.SolrException: Not Found
: Not Found
: request: http://localhost:8080/select?q=id:[*+TO+*]&fl=id&rows=1&wt=javabin&version=2
...
: Other possibly helpful information:
: 1) The solr admin screen comes up fine in the browser.

At which URL does the Solr admin screen come up fine in your browser? Best guess...

1) you have Solr installed such that it uses the web context /solr, but you gave the wrong URL to Nutch (ie: try -solr http://localhost:8080/solr)

2) you are using multiple collections, and you may need to configure Nutch to know about which collection you are using (ie: try -solr http://localhost:8080/solr/collection1)

...if neither of those helps, I would suggest you follow up with the nutch-user list, as the Nutch community is probably in the best position to help you configure Nutch to work with Solr (and vice versa)

-Hoss
Clear cache used by Solr
Hi, I'm trying to compare the performance of different Solr queries. In order to get a fair test, I want to clear the cache between queries. How is this done? Of course, one can restart the server; I wanted to know if there is a quicker way.
solr.NoOpDistributingUpdateProcessorFactory in SOLR CLOUD
Hi,

I need more information on how NoOpDistributingUpdateProcessorFactory works. Below is the cloud setup:

collection1
  shard1: node1:8983 (leader), node2:8984
  shard2: node3:7585 (leader), node4:7586

node1, node2, node3 and node4 are four separate Solr instances running on four Tomcat containers. We have included the following tag in solrconfig.xml, so as not to distribute the index across shards:

<updateRequestProcessorChain>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
  <processor class="solr.NoOpDistributingUpdateProcessorFactory" />
</updateRequestProcessorChain>

We were able to accomplish the task of loading an index onto only a single shard by using NoOpDistributingUpdateProcessorFactory. We loaded data into node 8984 of shard 1. After indexing, the size of the index on node 8984 was 94MB, whereas the index size on the leader node for shard 1 was 4KB. It seems that for shard 1 the leader is not performing the index building, and replication is not working. But on a good note, the index was not distributed to shard 2 (node 3, node 4). When I removed the updateRequestProcessorChain tag above, the index is distributed across shards and replication works fine.

My requirement is to store a specific region's index in a single shard, so that the region data is not distributed across shards. Can you provide some help on this?

Thanks,
Sathish
Re: Configuring separate db-data-config.xml per shard
Hi, we were able to accomplish this with a single collection.

ZooKeeper: create a separate node for each shard, and upload the dbconfig file under it, e.g.:

/config/config1/shard1
/config/config1/shard2
/config/config1/shard3

In solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">${dbconfig}</str>
  </lst>
</requestHandler>

In solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" zkHost="localhost:2181">
  <cores defaultCoreName="core1" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" host="${host:}" hostPort="9985" hostContext="${hostContext:}">
    <core loadOnStartup="true" instanceDir="core1" transient="false" name="core1">
      <property name="dbconfig" value="shard1/db-data-config.xml" />
    </core>
  </cores>
</solr>

This way you can configure the dbconfig file per shard.

Thanks,
Sathish
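For reference, a rough sketch of one way to push such a per-shard config directory into ZooKeeper, using the zkcli tool that ships with Solr (paths and names below are illustrative; note that upconfig uploads under /configs/<confname>, so a custom layout like the /config/config1/shardN paths above would need its znodes created to match):

  cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig \
      -confdir ./shard1-conf -confname shard1

Repeat once per shard; the ${dbconfig} property in solr.xml then selects which file each core's /dataimport handler reads.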
Re: Is there a way to load multiple schema when using zookeeper?
Hi, we were able to accomplish this with a single collection.

ZooKeeper: create a separate node for each shard, and upload the dbconfig file under it, e.g.:

/config/config1/shard1
/config/config1/shard2
/config/config1/shard3

In solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">${dbconfig}</str>
  </lst>
</requestHandler>

In solr.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" zkHost="localhost:2181">
  <cores defaultCoreName="core1" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" host="${host:}" hostPort="9985" hostContext="${hostContext:}">
    <core loadOnStartup="true" instanceDir="core1" transient="false" name="core1">
      <property name="dbconfig" value="shard1/db-data-config.xml" />
    </core>
  </cores>
</solr>

This way you can configure the dbconfig file per shard.

Thanks,
Sathish
Re: Clear cache used by Solr
On Fri, 2013-06-07 at 09:24 +0200, Varsha Rani wrote: I'm trying to compare the performance of different Solr queries. In order to get a fair test, I want to clear the cache between queries. How is this done? Of course, one can restart the server; I wanted to know if there is a quicker way.

That depends on your system. If you are using Linux or OSX, this should work:

sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'

For Windows, CacheSet seems to provide the functionality: http://technet.microsoft.com/en-us/sysinternals/bb897561.aspx

To avoid any leftovers from memory mapping vs. cache trickery, I stop Solr, issue the drop_caches call and start Solr again.

- Toke Eskildsen
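A sketch of that full cycle on Linux, assuming Solr runs as a service named solr (service names and paths are placeholders):

  sudo service solr stop
  sync                                            # flush dirty pages first
  sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'  # drop the OS page cache
  sudo service solr start

Note the sh -c wrapper: a plain `sudo echo 1 > /proc/sys/vm/drop_caches` would perform the redirection as the unprivileged user and fail with permission denied.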
Re: LotsOfCores feature
A use case would be a web site or service that had millions of users, each of whom would have an active Solr core when they are active, but inactive otherwise. Of course those cores would not all reside on one node, and ZooKeeper is out of the question for managing anything that is in the millions. This would be a true cloud or data center, or even multi-data-center app, not a cluster app.

I am getting a little bit confused again. It seems now the answer to my question is a clear no? Also, instead of managing cores, is it not possible to manage servers, which will be in the tens and hundreds? As far as which core goes to which server, that could be based on some hashing scheme.
Using Solr Scripts
I have a SolrCloud and I want to maintain some important things on it, i.e. I will back up indexes, start and stop Solr nodes individually, send an optimize request to the cloud, etc. However, I see that there is a scripts folder that comes with Solr. Can I use some of those scripts for my purposes, or should I implement something that connects to the ZooKeeper quorum via SolrJ and does what I want?
How to stop index distribution among shards in solr cloud
Hi, I have two shards; logically, each shard corresponds to a region. Currently the index is distributed among the shards in SolrCloud. How do I load an index into a specific shard in SolrCloud? Any thoughts?

Thanks,
Sathish
Solr4.3 Internationalization.
Guys, please clarify the following questions regarding Solr internationalization.

1) Initially my requirement is to support 2 languages (English & French) for a web application. And we are using a MySQL DB.
2) So please share a good and easy approach to achieve it, with some sample configs.
3) And my question is whether I need to index the data in both languages (English & French) in different cores?
4) Or is indexing in English alone enough? Does Solr have any mechanism to handle multiple languages while retrieving? If there is anything, please share some sample configs.

Thanks
Guru
Re: LotsOfCores feature
The Wiki page was not built for Cloud Solr. We have done such a deployment where less than a tenth of the cores were active at any given point in time. Though there were tens of millions of indices, they were split among a large number of hosts. If you don't insist on a Cloud deployment it is possible. I'm not sure if it is possible with Cloud.

On Fri, Jun 7, 2013 at 12:38 AM, Aleksey bitterc...@gmail.com wrote:

I was looking at this wiki and linked issues: http://wiki.apache.org/solr/LotsOfCores They talk about a limit of 100K cores. Is that per server, or per entire fleet because ZooKeeper needs to manage that? I was considering a use case where I have tens of millions of indices but less than a million need to be active at any time, so they need to be loaded on demand and evicted when not used for a while. Also, since the number one requirement is efficient loading, of course I assume I will store a prebuilt index somewhere so Solr will just download it and strap it in, right? The root issue is marked as won't fix but some other important subissues are marked as resolved. What's the overall status of the effort?

Thank you in advance,
Aleksey

--
- Noble Paul
Re: SOLR CSV output in custom order
Have you tried explicitly giving the field names (fl) as a parameter? http://wiki.apache.org/solr/CommonQueryParameters#fl

On Thu, Jun 6, 2013 at 12:41 PM, anurag.jain anurag.k...@gmail.com wrote:

I want the output of the CSV file in a proper order. When I use wt=csv it gives output in random order. Is there any way to get output in the proper format?

Thanks

--
- Noble Paul
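A minimal sketch of that suggestion (host and field names are made up):

  http://localhost:8983/solr/select?q=*:*&wt=csv&fl=id,name,price

The CSV response writer emits columns in the order they appear in fl, so the output here would have the columns id, name, price in exactly that order.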
Re: [blogpost] Memory is overrated, use SSDs
Thanks for this, hard data is always welcome! Another blog post for my reference list!

Erick

On Fri, Jun 7, 2013 at 2:59 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

On Fri, 2013-06-07 at 07:15 +0200, Andy wrote: One question I have is did you precondition the SSD (http://www.sandforce.com/userfiles/file/downloads/FMS2009_F2A_Smith.pdf)? SSD performance tends to take a very deep dive once all blocks are written at least once and the garbage collector kicks in.

Not explicitly, no. The machine is our test server with the SSDs in RAID 0 with - to my knowledge - no TRIM support. They are 2½ years old, have had a fair amount of data written, and have been 3/4 full most of the time. At one point in time we experimented with 10M+ relatively small files and a couple of 40GB databases, so the drives are definitely not in pristine condition.

Anyway, as Solr searching is heavy on tiny random reads, I suspect that search performance will be largely unaffected by SSD fragmentation. It would be interesting to examine, but for now I cannot prioritize another large performance test.

Thank you for your input. I will update the blog post accordingly,
Toke Eskildsen, State and University Library, Denmark
Re: solr.NoOpDistributingUpdateProcessorFactory in SOLR CLOUD
I don't think you want the noop bits; I'd go back to the standard definitions here. What you _do_ want, I think, is the custom hashing option, see: https://issues.apache.org/jira/browse/SOLR-2592 which has been in place since Solr 4.1. It allows you to send documents to the shard of your choice, which I believe is what you're really after here.

Best
Erick

On Fri, Jun 7, 2013 at 3:31 AM, sathish_ix skandhasw...@inautix.co.in wrote:

Hi, I need more information on how NoOpDistributingUpdateProcessorFactory works. Below is the cloud setup:

collection1
  shard1: node1:8983 (leader), node2:8984
  shard2: node3:7585 (leader), node4:7586

node1, node2, node3 and node4 are four separate Solr instances running on four Tomcat containers. We have included the following tag in solrconfig.xml, so as not to distribute the index across shards:

<updateRequestProcessorChain>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
  <processor class="solr.NoOpDistributingUpdateProcessorFactory" />
</updateRequestProcessorChain>

We were able to accomplish the task of loading an index onto only a single shard by using NoOpDistributingUpdateProcessorFactory. We loaded data into node 8984 of shard 1. After indexing, the size of the index on node 8984 was 94MB, whereas the index size on the leader node for shard 1 was 4KB. It seems that for shard 1 the leader is not performing the index building, and replication is not working. But on a good note, the index was not distributed to shard 2 (node 3, node 4). When I removed the updateRequestProcessorChain tag above, the index is distributed across shards and replication works fine.

My requirement is to store a specific region's index in a single shard, so that the region data is not distributed across shards. Can you provide some help on this?

Thanks,
Sathish
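For reference, a sketch of how the composite-id routing that came out of SOLR-2592 is typically used (ids and field names here are made up): prefix each document id with a shard key and the "!" separator, and all documents sharing a prefix hash to the same shard:

  <add>
    <doc>
      <field name="id">region1!doc42</field>
      <field name="region">region1</field>
    </doc>
  </add>

As I understand it, the same prefix can then be supplied at query time (via the shard.keys request parameter in 4.x) to restrict a query to that shard, which keeps each region's data together without disabling the distributed update chain.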
Re: Clear cache used by Solr
I really question whether this is valuable. Much of Solr's performance is there explicitly because of caches, so what you're measuring is disk I/O to fill caches and any other latency. I'm just not sure what operational information you'll get here.

But assuming that you're really getting actionable data, you can comment out all of the caches in the solrconfig.xml file to at least remove those. The underlying Lucene caches will not be emptied, but they'll always be filled anyway for all the queries after the first few; you can't avoid them.

Best
Erick

On Fri, Jun 7, 2013 at 3:24 AM, Varsha Rani varsha.ya...@orkash.com wrote:

Hi, I'm trying to compare the performance of different Solr queries. In order to get a fair test, I want to clear the cache between queries. How is this done? Of course, one can restart the server; I wanted to know if there is a quicker way.
solr facet query on multiple search term
Hello All,

I require facet counts for multiple search terms. Currently I am doing two separate facet queries, one per search term, with facet.range=dateField, e.g.:

http://solrserver/select?q=1stsearchTerm&fq=on&facet-parameters
http://solrserver/select?q=2ndsearchTerm&fq=on&facet-parameters

Note: the SearchTerm field will be text_en_splitting.

Now I have found another way to do a facet query on multiple search terms, by tagging and excluding, e.g.:

http://solrurl/select?start=0&rows=10&hl=off
  &facet=on
  &facet.range.start=2013-06-06T16%3a00%3a00Z
  &facet.range.end=2013-06-07T16%3a00%3a01Z
  &facet.range.gap=%2B1HOUR
  &wt=xml
  &sort=dateField+desc
  &facet.range={!key=music+ex=movie}dateField
  &fq={!tag=music}content:music
  &facet.range={!key=movie+ex=music}dateField
  &fq={!tag=movie}content:movie
  &q=(col2:1+)
  &fq=+dateField:[2013-06-05T16:00:00Z+TO+2013-06-07T16:00:00Z]+AND+(+Col1:test+)
  &fl=col1,col2,col3

I have tested this for a few search terms, and it provides the same results as separate queries for each search term. Is this the proper way (in terms of results and performance)?
Re: LotsOfCores feature
I should have been clearer, and others have mentioned... the lots of cores stuff is really outside ZooKeeper/SolrCloud at present. I don't think it's incompatible, but it wasn't part of the design, so it'll need some effort to make it play nice with SolrCloud. I'm not sure there's actually a compelling use-case for combining the two.

bq: Also, instead of managing cores is it not possible to manage servers which will be in tens and hundreds?

Well, tens to hundreds of servers will work with SolrCloud. You could theoretically take over routing documents (i.e. custom hashing) and simply use SolrCloud without the lots of cores stuff. So the scenario is that you have, say, 250 machines that will hold all your data and use custom routing to get the right docs to the right core. The upcoming SolrJ capability of sending requests only to the proper shard would certainly help here. But this too is rather unexplored territory. I don't think ZooKeeper would really have a problem here because it's not moving much data back and forth; the 1M limitation for data in ZK is on a per-core basis and really applies only to the conf data, NOT the index.

But the current approach does lend itself to Jack's scenario. Essentially your ClusterKeeper could send the index to one of the machines and create the core there. The current approach addresses the case where you are essentially doing what Jack outlined semi-manually. That is, you're distributing your cores around your cluster based on historical access patterns. It's pretty easy to move the cores around by copying the dirs and using the auto-discovery stuff to keep things in balance, but it's in no way automatic and probably requires a restart (or at least a core unload/load).

Jack's idea of doing this dynamically should work in that kind of scenario. I can imagine, for instance, some relatively small number of physical machines and all the users' indexes actually being kept on a networked filesystem. The startup process is simply finding a machine with spare capacity, telling it to create the core, and pointing it at the pre-existing index. On the assumption that the indexes fit into memory, you'd pay a small penalty for start-up but wouldn't need to copy indexes around. You could elaborate this as necessary, tuning the transient caches such that you fit the number/size of users to particular hardware. If the store were an HDFS file system, redundancy/backup/error recovery would come along for free.

But under any scenario, one of the hurdles will be figuring out how many simultaneous users of whatever size can actually be comfortably handled by a particular piece of hardware. And usually there's some kind of long tail just to make it worse. Most of your users will be under X documents, and some users will be 100X. And updating would be interesting.

But I should emphasize that anything elaborate like this dynamic shuffling is kind of theoretical at this point, meaning we haven't actually tested it. It _should_ work, but I'm sure there will be some issues to flush out.

Best
Erick

On Fri, Jun 7, 2013 at 6:38 AM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: The Wiki page was not built for Cloud Solr. We have done such a deployment where less than a tenth of the cores were active at any given point in time. Though there were tens of millions of indices, they were split among a large number of hosts. If you don't insist on a Cloud deployment it is possible.
I'm not sure if it is possible with Cloud.

On Fri, Jun 7, 2013 at 12:38 AM, Aleksey bitterc...@gmail.com wrote: I was looking at this wiki and linked issues: http://wiki.apache.org/solr/LotsOfCores They talk about a limit of 100K cores. Is that per server, or per entire fleet because ZooKeeper needs to manage that? I was considering a use case where I have tens of millions of indices but less than a million need to be active at any time, so they need to be loaded on demand and evicted when not used for a while. Also, since the number one requirement is efficient loading, of course I assume I will store a prebuilt index somewhere so Solr will just download it and strap it in, right? The root issue is marked as won't fix but some other important subissues are marked as resolved. What's the overall status of the effort? Thank you in advance, Aleksey

--
- Noble Paul
Documents
Good morning, I would like to know how I can modify an XML file to access my information rather than the example information. I have one file from which I obtain the information that I show to the user with Blacklight. Sorry about my English,

Alex
Re: Documents
Hi, you need to parse your custom XML file and transform it into XML in the format that Solr understands. If you are familiar with XSLT, you could do that in a few lines, depending on the complexity of the input XML file.

Dmitry

On Fri, Jun 7, 2013 at 3:34 PM, acas...@greendata.com wrote:

Good morning, I would like to know how I can modify an XML file to access my information rather than the example information. I have one file from which I obtain the information that I show to the user with Blacklight. Sorry about my English,

Alex
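If it helps, a minimal XSLT sketch of that transformation, assuming a made-up input format of <records><record> elements with <id> and <title> children (adapt the element names to your real file):

  <?xml version="1.0"?>
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- turn each input <record> into a Solr <doc> -->
    <xsl:template match="/records">
      <add>
        <xsl:for-each select="record">
          <doc>
            <field name="id"><xsl:value-of select="id"/></field>
            <field name="title"><xsl:value-of select="title"/></field>
          </doc>
        </xsl:for-each>
      </add>
    </xsl:template>
  </xsl:stylesheet>

The output is the <add><doc><field .../> format that Solr's /update handler accepts.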
Re: Doubt Regarding Shards Index
Hi, how did you distribute the index by year to different shards? Do we need to write any code?

Thanks,
Sathish
[CROSS-POSTING] SOLR-4903 and SOLR-4904
CROSS-POSTING from the dev list. Hi guys, as discussed with Grant and Andrzej, I have created two JIRAs related to an inefficiency in distributed faceting. This affects 3.4, but my gut feeling tells me 4.x is affected as well.

Regards,
Dmitry Kan

P.S. Asking this question won yours truly second prize at Stump the Chump this year. :)
Re: HdfsDirectoryFactory
Eagle eye man. Yeah, we plan on contributing hdfs support for Solr. I'm flying home today and will create a JIRA issue for it shortly after I get there. - Mark On Jun 6, 2013, at 6:16 PM, Jamie Johnson jej2...@gmail.com wrote: I've seen reference to an HdfsDirectoryFactory in the new Cloudera Search along with a commit in the Solr SVN ( http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml?view=markup), is this something that is being made part of the core? I've seen discussions in the past where folks have recommended not using an HDFS based DirectoryFactory for reasons like speed, any details/information that can be provided would be really appreciated.
Re: Doubt Regarding Shards Index
Hi, sharding by time by itself does not need any custom code on the Solr side: index your data to a shard depending on the timestamp of the document. The querying part is trickier if you want to have one front-end Solr: it should know which shards to query. If querying all shards for each query is fine for you, then you are good and no changes are needed. Alternatively, you can shoot a query to a particular year's shard, knowing the year of the user query (see the sketch below).

Dmitry

On Fri, Jun 7, 2013 at 3:54 PM, sathish_ix skandhasw...@inautix.co.in wrote:

Hi, how did you distribute the index by year to different shards? Do we need to write any code?

Thanks,
Sathish
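A sketch of those two options for classic (non-Cloud) distributed search, assuming one core per year and made-up host and core names: target a single year's shard explicitly,

  http://frontend:8983/solr/select?q=test&shards=host1:8983/solr/shard2012

or list several year shards, comma-separated, to fan out to just that subset:

  http://frontend:8983/solr/select?q=test&shards=host1:8983/solr/shard2012,host2:8983/solr/shard2013

Listing every year's core in the shards parameter reproduces the query-all-years behaviour.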
Re: Solr 4.2.1 higher memory footprint vs Solr 3.5
This is exactly what we did for a client (alas, using Elasticsearch). We then observed better performance through SPM. We used the latest Oracle JVM.

Otis
Solr & ElasticSearch Support
http://sematext.com/

On Jun 7, 2013 2:55 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

Hi Shawn, I also had CMS with tons of tuning options but still had a bigger GC pause once in a while. After switching to JDK 7 I tried G1GC with no other options and it runs perfectly. With CMS I saw that the old and young generations were growing until they had to do a GC. This produces the sawtooth and also takes a longer GC pause. With G1GC the GC is more frequent and better timed; it is softer, more flexible. I just removed any old tuning and old GC settings and have only the G1GC option.

ulimit -c unlimited
ulimit -l 256
ulimit -m unlimited
ulimit -n 8192
ulimit -s unlimited
ulimit -v unlimited

JAVA_OPTS="-server -d64 -Xmx20g -Xms20g -XX:+UseG1GC -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc.log"

java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

Maybe I have just been lucky with it, but for big heaps it works fine.

Regards
Bernd

On 06.06.2013 16:23, Shawn Heisey wrote: On 6/6/2013 3:50 AM, Bernd Fehling wrote: What helped me a lot was switching to G1GC. Faster, smoother, very little ripple, nearly no sawtooth.

When I tried G1, it did indeed produce a better looking memory graph, but it didn't do anything about my GC pauses. They were several seconds with just CMS and NewRatio, and they actually seemed to get slightly worse when I tried G1 instead. To solve the GC pause problem, I've had to switch back to CMS and tack on several more tuning options, most of which are CMS-specific. I'm not sure how to tune G1. Have you done any additional tuning?

Thanks,
Shawn
Re: Clear cache used by Solr
On Fri, Jun 7, 2013 at 7:32 AM, Erick Erickson erickerick...@gmail.com wrote: I really question whether this is valuable. Much of Solr performance is there explicitly because of caches

Right, and it's also the case that certain Solr features are coded with the cache in mind (i.e. it will be utilized within a single request for things like highlighting, multi-select faceting, etc.).

On Fri, Jun 7, 2013 at 3:24 AM, Varsha Rani varsha.ya...@orkash.com wrote: I'm trying to compare the performance of different Solr queries. In order to get a fair test, I want to clear the cache between queries.

If you are using/testing Lucene query syntax, you can just add an additional term that doesn't match anything and then keep changing it... that will prevent the query/filter cache from recognizing it as the same query:

q=(my big query I'm testing) ab

And then next time change the b to a c, etc. Or you could explicitly tell Solr not to cache it: http://yonik.com/posts/advanced-filter-caching-in-solr/

q={!cache=false}(my big query I'm testing)

-Yonik
http://lucidworks.com
Re: LotsOfCores feature
AFAICT, SolrCloud addresses the use case of distributed update for a relatively smaller number of collections (dozens?) that have a relatively larger number of rows - billions - over a modest to moderate number of nodes (a handful to a dozen or dozens). So, maybe dozens of collections (some people still call these cores) that distribute hundreds of millions if not billions of rows over dozens (or potentially low hundreds) of nodes. Technically, ZK was designed for thousands of nodes, but I don't think that was for the use case of distributed query that constantly fans out to all shards.

Aleksey: What would you say is the average core size for your use case - thousands or millions of rows? And how sharded would each of your collections be, if at all?

-- Jack Krupansky

-Original Message- From: Noble Paul നോബിള് नोब्ळ् Sent: Friday, June 07, 2013 6:38 AM To: solr-user@lucene.apache.org Subject: Re: LotsOfCores feature

The Wiki page was not built for Cloud Solr. We have done such a deployment where less than a tenth of the cores were active at any given point in time. Though there were tens of millions of indices, they were split among a large number of hosts. If you don't insist on a Cloud deployment it is possible. I'm not sure if it is possible with Cloud.

On Fri, Jun 7, 2013 at 12:38 AM, Aleksey bitterc...@gmail.com wrote: I was looking at this wiki and linked issues: http://wiki.apache.org/solr/LotsOfCores They talk about a limit of 100K cores. Is that per server, or per entire fleet because ZooKeeper needs to manage that? I was considering a use case where I have tens of millions of indices but less than a million need to be active at any time, so they need to be loaded on demand and evicted when not used for a while. Also, since the number one requirement is efficient loading, of course I assume I will store a prebuilt index somewhere so Solr will just download it and strap it in, right? The root issue is marked as won't fix but some other important subissues are marked as resolved. What's the overall status of the effort? Thank you in advance, Aleksey

--
- Noble Paul
Re: Schema Change: Int -> String (I am the original poster, new email address)
Right, a search for 442 would not match 1442.

-- Jack Krupansky

-Original Message- From: z z Sent: Friday, June 07, 2013 2:18 AM To: solr-user@lucene.apache.org Subject: Re: Schema Change: Int -> String (I am the original poster, new email address)

Maybe if I were to say that the column user_id will become user_ids, that would clarify things? user_id:2002+AND+created:[${from}+TO+${until}]+data:more becomes user_ids:2002+AND+created:[${from}+TO+${until}]+data:more where I want 2002 to be an exact positive match on one of the user_ids embedded in the TEXT ... not string :) If I am totally off or making no sense, feedback is very welcome. I am just seeing lots of similar data going into my db and it feels like Solr should be able to handle this. I just want to know if transforming the data like that will still allow exact searches against a user_id. My language from a Solr guru's point of view is probably *very* poorly phrased ... exact and TEXT might not go hand in hand. Is the TEXT "20 1442 35" parsed as "20", "1442", "35" so that a search against it for 1442 will yield exact results? A search against 442 won't match, right?

1. 20 1442 35
2. 20 442 35
3. 20 1442

user_ids:1442 - yields #1 and #3 always?
user_ids:442 - yields only #2 always?

My lack of understanding about what Solr does when it indexes is shining through :)

On Fri, Jun 7, 2013 at 1:43 PM, z z zenlok.testi...@gmail.com wrote:

My language might be a bit off (I am saying string when I probably mean text in the context of Solr), but I'm pretty sure that my story is unwavering ;)

`id` int(11) NOT NULL AUTO_INCREMENT
`created` int(10)
`data` varbinary(255)
`user_id` int(11)

So, imagine that we have 1000 entries come in where data above is exactly the same for all 1000 entries, but user_id is different (id and created being different is irrelevant). I am thinking that prior to inserting into MySQL, I should be able to concatenate the user_ids together with whitespace and then insert them into something like:

`id` int(11) NOT NULL AUTO_INCREMENT
`created` int(10)
`data` varbinary(255)
`user_id` blob

Then on Solr's end it will treat the user_id as Text and parse it (I want to say tokenize, but maybe my language is incorrect here?). Then when I search user_id:2002+AND+created:[${from}+TO+${until}]+data:more I want to be sure that if I look for user_id 2002, I will get data that only has a value 2002 in the user_id column, and that a separate user with id 20 cannot accidentally pull data for user_id 2002 as a result of a fuzzy (my language ok?) match of 20 against (20)02.

Current schema definition:

<field name="user_id" type="int" indexed="true" stored="true"/>

New schema definition:

<field name="user_id" type="user_id_string" indexed="true" stored="true"/>
...
<fieldType name="user_id_string" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" maxTokenLength="120"/>
  </analyzer>
</fieldType>
Re: OR query with null value and non-null value(s)
Yes, it SHOULD! And in the LucidWorks Search query parser it does. Why doesn't it in Solr? Ask Yonik to explain that!

-- Jack Krupansky

-Original Message- From: Rahul R Sent: Friday, June 07, 2013 1:21 AM To: solr-user@lucene.apache.org Subject: Re: OR query with null value and non-null value(s)

Thank you Shawn. This does work. To help me understand better, why do we need the *:* ? Shouldn't it be implicit? Shouldn't

fq=(price:4+OR+(-price:[* TO *]))      // does not work

mean the same as

fq=(price:4+OR+(*:* -price:[* TO *]))  // works

Why does Solr need the *:* there?

On Fri, Jun 7, 2013 at 12:07 AM, Shawn Heisey s...@elyograg.org wrote:

On 6/6/2013 12:28 PM, Rahul R wrote: I have recently enabled facet.missing=true in solrconfig.xml, which gives null facet values also. As I understand it, the syntax to do a faceted search on a null value is something like this: fq=-price:[* TO *] So when I want to search on a particular value (for example: 4) OR a null value, I would expect the syntax to be something like this: fq=(price:4+OR+(-price:[* TO *])) But this does not work. After searching around some more, I read somewhere that the right way to achieve this would be: fq=-(-price:4+AND+price:[*+TO+*]) Now this does work but seems like a very roundabout way. Is there a better way to achieve this?

Pure negative queries don't work -- you have to have results in the query before you can subtract. For some top-level queries, Solr is able to detect this situation and fix it internally, but on inner queries you must explicitly state your intentions. It is best if you always use '*:* -query' syntax, just to be safe.

fq=(price:4+OR+(*:* -price:[* TO *]))

Thanks,
Shawn
Re: Solr 4.2.1 higher memory footprint vs Solr 3.5
Hi All, I work with Sandeep M, so this continues his comments. We did observe memory growth. We use JDK 1.6.0_45 with CMS. We see this issue because of large document size; by "large" I mean our single documents have large multivalued fields. We found that JIRA LUCENE-4995 https://issues.apache.org/jira/browse/LUCENE-4995 is what we experienced, and the patch seems to resolve our issue. We are performing more tests around it.
Re: Documents
If you are trying to import an external XML file into your system, you may want to look at DataImportHandler. It is a good way to start. Look at the Wikipedia examples.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Fri, Jun 7, 2013 at 8:34 AM, acas...@greendata.com wrote:

Good morning, I would like to know how I can modify an XML file to access my information rather than the example information. I have one file from which I obtain the information that I show to the user with Blacklight. Sorry about my English,

Alex
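A rough sketch of a DataImportHandler config for pulling documents out of a local XML file, assuming made-up element names (/records/record with id and title children — adapt to the real file):

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <!-- each /records/record element becomes one Solr document -->
      <entity name="rec"
              processor="XPathEntityProcessor"
              url="/path/to/data.xml"
              forEach="/records/record">
        <field column="id"    xpath="/records/record/id"/>
        <field column="title" xpath="/records/record/title"/>
      </entity>
    </document>
  </dataConfig>

Wired into solrconfig.xml as a /dataimport request handler, running a full-import then indexes each record.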
Re: Solr4.3 Internationalization.
It may be helpful to approach this from the other side, specifically search. Are you:

1) Expecting to search across both French and English content (e.g. French, but falling back to English if a translation is missing)? If yes, you want a single collection.
2) Is French content completely separate from English content, or are they just a couple of translated fields in an otherwise shared entity? If the latter, you want a single collection.
3) Are you accessing all languages at once when you retrieve a record, or just one language at a time? If all languages at once, you want a single collection.

And so on. If your content is completely separate, you could do different collections. Otherwise, you probably want the same collection. If you do want a single collection, there are a couple of things you can do to make it transparent for the frontend code to switch between languages and make search transparent. While not a production use, it is explored in detail in my just-released book: http://www.packtpub.com/apache-solr-for-indexing-data/book . The corresponding example is at: https://github.com/arafalov/solr-indexing-book/tree/master/published/languages but I am not sure how easy it is to understand without the walkthrough in the book.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Fri, Jun 7, 2013 at 6:08 AM, bsargurunathan bsargurunat...@gmail.com wrote:

Guys, please clarify the following questions regarding Solr internationalization.

1) Initially my requirement is to support 2 languages (English & French) for a web application. And we are using a MySQL DB.
2) So please share a good and easy approach to achieve it, with some sample configs.
3) And my question is whether I need to index the data in both languages (English & French) in different cores?
4) Or is indexing in English alone enough? Does Solr have any mechanism to handle multiple languages while retrieving? If there is anything, please share some sample configs.

Thanks
Guru
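For the single-collection route, the usual pattern is one field per language so each gets its own analysis chain; a minimal schema sketch (the field names are only an example — text_en/text_fr style types ship with the stock 4.x example schema):

  <field name="title_en" type="text_en" indexed="true" stored="true"/>
  <field name="title_fr" type="text_fr" indexed="true" stored="true"/>

The frontend then queries title_en or title_fr depending on the user's language, optionally falling back to the other field when a translation is missing.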
Solr 4.3.0 Cloud Issue indexing pdf documents
Hi, I am having an issue with adding PDF documents to a SolrCloud index I have set up. I can index PDF documents fine using 4.3.0 on my local box, but I have a SolrCloud instance set up on the Amazon cloud (using 2 servers) and I get an error. It seems that it is not loading org.apache.pdfbox.pdmodel.PDPage. However, the jar is in the directory, and referenced in the solrconfig.xml file:

<lib dir="/www/solr/lib/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/www/solr/lib/" regex="solr-cell-\d.*\.jar" />
<lib dir="/www/solr/lib/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="/www/solr/lib/" regex="solr-clustering-\d.*\.jar" />
<lib dir="/www/solr/lib/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="/www/solr/lib/" regex="solr-langid-\d.*\.jar" />
<lib dir="/www/solr/lib/contrib/velocity/lib" regex=".*\.jar" />
<lib dir="/www/solr/lib/" regex="solr-velocity-\d.*\.jar" />

When I start Tomcat, I can see that the file has loaded:

2705 [coreLoadExecutor-4-thread-3] INFO org.apache.solr.core.SolrResourceLoader Adding 'file:/www/solr/lib/contrib/extraction/lib/pdfbox-1.7.1.jar' to classloader

But when I try to add a document:

java -Durl=http://ec2-blah-blah.eu-west-1.compute.amazonaws.com:8080/solr/quosa2-collection/update/extract -Dparams=literal.id=pdf1 -Dtype=text/pdf -jar post.jar 2008.Genomics.pdf

I get this error. I'm running on an Ubuntu machine:

Linux ip-10-229-125-163 3.5.0-21-generic #32-Ubuntu SMP Tue Dec 11 18:51:59 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Error log:

88168 [http-bio-8080-exec-1] INFO org.apache.solr.update.processor.LogUpdateProcessor [quosa2-collection_shard1_replica1] webapp=/solr path=/update/extract params={literal.id=pdf1} {} 0 1534
88180 [http-bio-8080-exec-1] ERROR org.apache.solr.servlet.SolrDispatchFilter null:java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: /usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so: libXrender.so.1: cannot open shared object file: No such file or directory
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.UnsatisfiedLinkError: /usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so: libXrender.so.1: cannot open shared object file: No such file or directory
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1939)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1864)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1825)
    at java.lang.Runtime.load0(Runtime.java:792)
    at java.lang.System.load(System.java:1059)
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1939)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1864)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1846)
    at java.lang.Runtime.loadLibrary0(Runtime.java:845)
    at java.lang.System.loadLibrary(System.java:1084)
    at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:67)
    at sun.security.action.LoadLibraryAction.run(LoadLibraryAction.java:47)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.awt.Toolkit.loadLibraries(Toolkit.java:1648)
    at java.awt.Toolkit.<clinit>(Toolkit.java:1670)
    at java.awt.Color.<clinit>(Color.java:275)
    at org.apache.pdfbox.pdmodel.PDPage.<clinit>(PDPage.java:72)
    at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:212)
    at
Custom Data Clustering
Hi, can someone please tell me if there is a way to have custom clustering of the data from Solr query results? I am facing 2 issues currently:

1. The Carrot clustering only applies clustering to the paged results (i.e. the current pagination page's results).
2. I need to have custom clustering and classify results into certain classes only (i.e. only a few very specific words in the search results) - for example Red, Green, Blue, etc., and not hello World, Known World, green world, etc. (if you know what I mean here), where all these words, in both the do and do-not lists, exist in the search results.

Please tell me how to achieve this. Perhaps Carrot/clustering is not needed here and some other classifier is needed. So what to do here? Basically, I cannot receive 1 million results and then process them via a PHP array to classify them as per the need. The classification must be done in Solr only.

Thanks

--
Regards,
Raheel Hasan
RE: How to stop index distribution among shards in solr cloud
This may help: http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud - see the Document Routing section.

-Original Message- From: sathish_ix [mailto:skandhasw...@inautix.co.in] Sent: Friday, June 07, 2013 5:27 AM To: solr-user@lucene.apache.org Subject: How to stop index distribution among shards in solr cloud

Hi, I have two shards; logically, each shard corresponds to a region. Currently the index is distributed among the shards in SolrCloud. How do I load an index into a specific shard in SolrCloud? Any thoughts?

Thanks,
Sathish
Re: solr.NoOpDistributingUpdateProcessorFactory in SOLR CLOUD
: I don't think you want the noop bits, I'd go back to the
: standard definitions here.

Correct. The NoOpDistributingUpdateProcessorFactory is for telling the update processor chain that you do not want it to do any distribution of updates at all -- whatever SolrCore you send the doc to is the only one that gets it, and RunUpdateProcessor will write it to its local index.

-Hoss
Re: Solr 4.3.0 Cloud Issue indexing pdf documents
Hi Mark,

This is a total shot in the dark, but does passing -Djava.awt.headless=true when you run the server help at all? More on AWT headless mode: http://www.oracle.com/technetwork/articles/javase/headless-136834.html

Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc. "The Science of Influence Marketing"
18 East 41st Street New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions
w: appinions.com http://www.appinions.com/

On Fri, Jun 7, 2013 at 11:31 AM, Mark Wilson m...@sanger.ac.uk wrote:

Hi, I am having an issue with adding PDF documents to a SolrCloud index I have set up. I can index PDF documents fine using 4.3.0 on my local box, but I have a SolrCloud instance set up on the Amazon cloud (using 2 servers) and I get an error. It seems that it is not loading org.apache.pdfbox.pdmodel.PDPage. However, the jar is in the directory, and referenced in the solrconfig.xml file:

<lib dir="/www/solr/lib/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/www/solr/lib/" regex="solr-cell-\d.*\.jar" />
<lib dir="/www/solr/lib/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="/www/solr/lib/" regex="solr-clustering-\d.*\.jar" />
<lib dir="/www/solr/lib/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="/www/solr/lib/" regex="solr-langid-\d.*\.jar" />
<lib dir="/www/solr/lib/contrib/velocity/lib" regex=".*\.jar" />
<lib dir="/www/solr/lib/" regex="solr-velocity-\d.*\.jar" />

When I start Tomcat, I can see that the file has loaded:

2705 [coreLoadExecutor-4-thread-3] INFO org.apache.solr.core.SolrResourceLoader Adding 'file:/www/solr/lib/contrib/extraction/lib/pdfbox-1.7.1.jar' to classloader

But when I try to add a document:

java -Durl=http://ec2-blah-blah.eu-west-1.compute.amazonaws.com:8080/solr/quosa2-collection/update/extract -Dparams=literal.id=pdf1 -Dtype=text/pdf -jar post.jar 2008.Genomics.pdf

I get this error. I'm running on an Ubuntu machine:

Linux ip-10-229-125-163 3.5.0-21-generic #32-Ubuntu SMP Tue Dec 11 18:51:59 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Error log:

88168 [http-bio-8080-exec-1] INFO org.apache.solr.update.processor.LogUpdateProcessor [quosa2-collection_shard1_replica1] webapp=/solr path=/update/extract params={literal.id=pdf1} {} 0 1534
88180 [http-bio-8080-exec-1] ERROR org.apache.solr.servlet.SolrDispatchFilter null:java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: /usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so: libXrender.so.1: cannot open shared object file: No such file or directory
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.UnsatisfiedLinkError: /usr/lib/jvm/java-7-oracle/jre/lib/amd64/xawt/libmawt.so: libXrender.so.1: cannot open shared object file: No such file or directory
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1939)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1864)
    at
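A sketch of where that flag typically goes for a Tomcat install (file name and location vary by setup):

  # e.g. in $CATALINA_HOME/bin/setenv.sh
  export CATALINA_OPTS="$CATALINA_OPTS -Djava.awt.headless=true"

Alternatively, since the missing library in the trace is libXrender.so.1, installing the X render package (libxrender1 on Ubuntu) should also let the AWT native library load; headless mode simply avoids loading it at all.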
Re: OR query with null value and non-null value(s)
Thank you for the clarification, Shawn.

On Fri, Jun 7, 2013 at 7:34 PM, Jack Krupansky j...@basetechnology.com wrote: Yes, it SHOULD! And in the LucidWorks Search query parser it does. Why doesn't it in Solr? Ask Yonik to explain that!

-- Jack Krupansky

-Original Message- From: Rahul R Sent: Friday, June 07, 2013 1:21 AM To: solr-user@lucene.apache.org Subject: Re: OR query with null value and non-null value(s)

Thank you Shawn. This does work. To help me understand better, why do we need the *:* ? Shouldn't it be implicit? Shouldn't fq=(price:4+OR+(-price:[* TO *])) // does not work mean the same as fq=(price:4+OR+(*:* -price:[* TO *])) // works Why does Solr need the *:* there?

On Fri, Jun 7, 2013 at 12:07 AM, Shawn Heisey s...@elyograg.org wrote:

On 6/6/2013 12:28 PM, Rahul R wrote: I have recently enabled facet.missing=true in solrconfig.xml, which gives null facet values also. As I understand it, the syntax to do a faceted search on a null value is something like this: fq=-price:[* TO *] So when I want to search on a particular value (for example: 4) OR a null value, I would expect the syntax to be something like this: fq=(price:4+OR+(-price:[* TO *])) But this does not work. After searching around some more, I read somewhere that the right way to achieve this would be: fq=-(-price:4+AND+price:[*+TO+*]) Now this does work but seems like a very roundabout way. Is there a better way to achieve this?

Pure negative queries don't work -- you have to have results in the query before you can subtract. For some top-level queries, Solr is able to detect this situation and fix it internally, but on inner queries you must explicitly state your intentions. It is best if you always use '*:* -query' syntax, just to be safe.

fq=(price:4+OR+(*:* -price:[* TO *]))

Thanks,
Shawn
Re: LotsOfCores feature
Aleksey: What would you say is the average core size for your use case - thousands or millions of rows? And how sharded would each of your collections be, if at all? Average core/collection size wouldn't even be thousands; hundreds, more like. And the largest would be half a million or so, but that's a pathological case. I don't need sharding or queries that fan out to different machines. In fact I'd like to avoid that so I don't have to collate the results. The Wiki page was not built for Cloud Solr. We have done such a deployment where less than a tenth of the cores were active at any given point in time. Though there were tens of millions of indices, they were split among a large number of hosts. If you don't insist on a Cloud deployment it is possible. I'm not sure if it is possible with cloud. By Cloud you mean specifically SolrCloud? I don't have to have it if I can do without it. Bottom line is I want a bunch of small cores to be distributed over a fleet, each core completely fitting on one server. Would you be willing to provide a little more detail on your setup? In particular, how are you managing the cores? How do you route requests to the proper server? If you scale the fleet up and down, does reshuffling of the cores happen automatically, or is it an involved manual process? Thanks, Aleksey
Re: LotsOfCores feature
Thanks. That's what I suspected. Yes, MegaMiniCores. My scenario is purely hypothetical. But it is also relevant for multi-tenant use cases, where the users and schemas are not known in advance and are only online intermittently. Users could fit into three rough size categories: very small, medium, and very large. Over time a user might move from very small to medium to very large. Very large users could require their own dedicated clusters. Medium size could occasionally require a dedicated node, but not always. And very small is mostly offline, but occasionally a fair number are online for short periods of time. -- Jack Krupansky -Original Message- From: Aleksey Sent: Friday, June 07, 2013 3:44 PM To: solr-user Subject: Re: LotsOfCores feature Aleksey: What would you say is the average core size for your use case - thousands or millions of rows? And how sharded would each of your collections be, if at all? Average core/collection size wouldn't even be thousands; hundreds, more like. And the largest would be half a million or so, but that's a pathological case. I don't need sharding or queries that fan out to different machines. In fact I'd like to avoid that so I don't have to collate the results. The Wiki page was not built for Cloud Solr. We have done such a deployment where less than a tenth of the cores were active at any given point in time. Though there were tens of millions of indices, they were split among a large number of hosts. If you don't insist on a Cloud deployment it is possible. I'm not sure if it is possible with cloud. By Cloud you mean specifically SolrCloud? I don't have to have it if I can do without it. Bottom line is I want a bunch of small cores to be distributed over a fleet, each core completely fitting on one server. Would you be willing to provide a little more detail on your setup? In particular, how are you managing the cores? How do you route requests to the proper server? If you scale the fleet up and down, does reshuffling of the cores happen automatically, or is it an involved manual process? Thanks, Aleksey
RE: SolrCloud Load Balancer weight
Cool! Having those values influenced by stats is a neat idea too. I'll get on that soon. Tim -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 03, 2013 5:07 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud Load Balancer weight On Jun 3, 2013, at 3:33 PM, Tim Vaillancourt t...@elementspace.com wrote: Should I JIRA this? Thoughts? Yeah - it's always been in the back of my mind - it's come up a few times - eventually we would like nodes to report some stats to zk to influence load balancing. - mark
translating a character code to an ordinal?
hello all, environment: solr 3.5, centos problem statement: i have several character codes that i want to translate to ordinal (integer) values (for sorting), while retaining the original code field in the document. i was thinking that i could use a copyField from my code field to my ord field - then employ a pattern replace filter factory during indexing. but won't the copyField fail because the two field types are different? ps: i also read the wiki about http://wiki.apache.org/solr/DataImportHandler#Transformer - the script transformer and regex transformer - but was hoping to avoid this if i could. thx mark
Re: solr facet query on multiple search term
I'm a little confused here. Faceting is about counting docs that meet your query restrictions, i.e. the q= and fq= clauses. So your original problem statement simply cannot be combined into a single query, since your q= clauses are different. You could do something like q=(firstterm OR secondterm)&facet.query=firstterm&facet.query=secondTerm That would give you accurate facet counts for the terms, but it certainly doesn't preserve the original intent of q=firstterm&facet.query=blahblah. But facet.query is only counted over the docs that match the q= clause (well, the q= clause and any fq clauses). So perhaps you can supply a few example input docs and desired counts on the other side. Best Erick On Fri, Jun 7, 2013 at 8:01 AM, vrparekh vrpar...@gmail.com wrote: Hello All, I require facet counts for multiple SearchTerms. Currently I am doing two separate facet queries, one per search term, with facet.range=dateField, e.g. http://solrserver/select?q=1stsearchTerm&fq=on&facet-parameters http://solrserver/select?q=2ndsearchTerm&fq=on&facet-parameters Note: the SearchTerm field will be text_en_splitting. Now I have found another way to do a facet query on multiple search terms by tagging and excluding, e.g. http://solrurl/select?start=0&rows=10&hl=off&facet=on&facet.range.start=2013-06-06T16%3a00%3a00Z&facet.range.end=2013-06-07T16%3a00%3a01Z&facet.range.gap=%2B1HOUR&wt=xml&sort=dateField+desc&facet.range={!key=music+ex=movie}dateField&fq={!tag=music}content:music&facet.range={!key=movie+ex=music}dateField&fq={!tag=movie}content:movie&q=(col2:1+)&fq=+dateField:[2013-06-05T16:00:00Z+TO+2013-06-07T16:00:00Z]+AND+(+Col1:test+)&fl=col1,col2,col3 I have tested it for a few search terms; it provides the same results as a separate query for each search term. Is this the proper way (with respect to results and performance)?
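To make Erick's first suggestion concrete, here is what the combined request might look like (parameters split one per line for readability; the field and term names are just the examples from the thread):

q=content:music OR content:movie
facet=true
facet.query=content:music
facet.query=content:movie

Each facet.query is counted over the docs matching q plus any fq filters, so both counts come back in a single request.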
Re: translating a character code to an ordinal?
This won't help you unless you move to Solr 4.0, but here's an update processor script from the book that can take the first character of a string field and add it as an integer value for another field:

<updateRequestProcessorChain name="script-add-char-code">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">add-char-code.js</str>
    <lst name="params">
      <str name="fieldName">content</str>
      <str name="codeFieldName">content_code_i</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Here is the JavaScript script that should be placed in the add-char-code.js file in the conf directory for the Solr collection:

function processAdd(cmd) {
  var fieldName;
  var codeFieldName;
  if (typeof params !== "undefined") {
    fieldName = params.get("fieldName");
    codeFieldName = params.get("codeFieldName");
  }
  if (fieldName == null) fieldName = "content";
  if (codeFieldName == null) codeFieldName = "content_code_i";
  // Get value for named field, no-op if empty
  var value = cmd.getSolrInputDocument().getField(fieldName);
  if (value != null) {
    var str = value.getFirstValue();
    // No-op if string is empty
    if (str != null && str.length() != 0) {
      // Get code for first character
      var code = str.charCodeAt(0);
      logger.info("String: \"" + str + "\" len: " + str.length() + " code: " + code);
      // Set the character code output field value
      cmd.getSolrInputDocument().addField(codeFieldName, code);
    }
  }
}

function processDelete() {
  // Dummy - add if needed
}
function processCommit() {
  // Dummy - add if needed
}
function processRollback() {
  // Dummy - add if needed
}
function processMergeIndexes() {
  // Dummy - add if needed
}
function finish() {
  // Dummy - add if needed
}

Test it:

curl "http://localhost:8983/solr/update?commit=true&update.chain=script-add-char-code" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1", "content": "abc"},
 {"id": "doc-2", "content": "1"},
 {"id": "doc-3", "content": ""},
 {"id": "doc-4"},
 {"id": "doc-5", "content": "\u0002 abc"},
 {"id": "doc-6", "content": ["And", "this", "is the end", "of this test."]}]'

Results:

{"id":"doc-1", "content":["abc"], "content_code_i":97},
{"id":"doc-2", "content":["1"], "content_code_i":49},
{"id":"doc-3", "content":[""]},
{"id":"doc-4"},
{"id":"doc-5", "content":["\u0002 abc"], "content_code_i":2},
{"id":"doc-6", "content":["And", "this", "is the end", "of this test."], "content_code_i":65}

-- Jack Krupansky -Original Message- From: geeky2 Sent: Friday, June 07, 2013 6:27 PM To: solr-user@lucene.apache.org Subject: translating a character code to an ordinal? hello all, environment: solr 3.5, centos problem statement: i have several character codes that i want to translate to ordinal (integer) values (for sorting), while retaining the original code field in the document. i was thinking that i could use a copyField from my code field to my ord field - then employ a pattern replace filter factory during indexing. but won't the copyField fail because the two field types are different? ps: i also read the wiki about http://wiki.apache.org/solr/DataImportHandler#Transformer - the script transformer and regex transformer - but was hoping to avoid this if i could. thx mark
Re: Filtering on results with more than N words.
Also from the book, here's an alternative update request processor that uses a JavaScript script to do the counting and field creation:

<updateRequestProcessorChain name="script-add-word-count">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">add-word-count.js</str>
    <lst name="params">
      <str name="fieldName">content</str>
      <str name="wordCountFieldName">content_wc_i</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Here is the JavaScript script that should be placed in the add-word-count.js file in the conf directory for the Solr collection:

function processAdd(cmd) {
  var fieldName;
  var wordCountFieldName;
  if (typeof params !== "undefined") {
    fieldName = params.get("fieldName");
    wordCountFieldName = params.get("wordCountFieldName");
  }
  if (fieldName == null) fieldName = "content";
  if (wordCountFieldName == null) wordCountFieldName = "content_wc_i";
  // Get value(s) for named field
  var values = cmd.getSolrInputDocument().getField(fieldName).getValues();
  // Combine values into one string
  var str = "";
  var n = values.size();
  for (i = 0; i < n; i++)
    str += ' ' + values.get(i);
  // Compress out hyphens and underscores to join words
  var str_no_dash = str.replace(/-|_/g, '');
  // Replace words with simply X
  var str_x_words = str_no_dash.replace(/\w+/g, 'X');
  // Remove punctuation and white space, leaving just the Xes.
  var str_final = str_x_words.replace(/[^X]+/g, '');
  // A count of the Xes is a good proxy for the word count.
  var wordCount = str_final.length;
  // Set the word count output field value
  cmd.getSolrInputDocument().addField(wordCountFieldName, wordCount);
}

function processDelete() {
  // Dummy - add if needed
}
function processCommit() {
  // Dummy - add if needed
}
function processRollback() {
  // Dummy - add if needed
}
function processMergeIndexes() {
  // Dummy - add if needed
}
function finish() {
  // Dummy - add if needed
}

A test:

curl "http://localhost:8983/solr/update?commit=true&update.chain=script-add-word-count" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1", "content": "Hello World"},
 {"id": "doc-2", "content": ""},
 {"id": "doc-3", "content": " -- --- !"},
 {"id": "doc-4", "content": "This is some more."},
 {"id": "doc-5", "content": "The CD-ROM, (and num_events_seen.)"},
 {"id": "doc-6", "content": "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. "},
 {"id": "doc-7", "content": "401(k)"},
 {"id": "doc-8", "content": ["And", "this", "is the end", "of this test."]}]'

Results:

{"id":"doc-1", "content":["Hello World"], "content_wc_i":2},
{"id":"doc-2", "content":[""], "content_wc_i":0},
{"id":"doc-3", "content":[" -- --- !"], "content_wc_i":0},
{"id":"doc-4", "content":["This is some more."], "content_wc_i":4},
{"id":"doc-5", "content":["The CD-ROM, (and num_events_seen.)"], "content_wc_i":4},
{"id":"doc-6", "content":["Four score and seven years ago our fathers\n brought forth on this continent a new nation, conceived in liberty,\n and dedicated to the proposition that all men are created equal.\n Now we are engaged in a great civil war, testing whether that nation,\n or any nation so conceived and so dedicated, can long endure. "], "content_wc_i":54},
{"id":"doc-7", "content":["401(k)"], "content_wc_i":2},
{"id":"doc-8", "content":["And", "this", "is the end", "of this test."], "content_wc_i":8}

-- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Thursday, June 06, 2013 5:07 PM To: solr-user@lucene.apache.org Subject: Re: Filtering on results with more than N words. From the book, here's an update request processor chain which will count the words in the content field and place it in the content_len_i field. Then you could do a range query on that count.

<updateRequestProcessorChain name="regex-count-words">
  <!-- Start with a copy of the content field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">content</str>
    <str name="dest">content_len_i</str>
  </processor>
  <!-- Combine multivalued input into a single string -->
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">content_len_i</str>
    <str name="delimiter"> </str>
  </processor>
  <!-- Remove hyphens and underscores - join parts into single word -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content_len_i</str>
    <str name="pattern">-|_</str>
    <str name="replacement"></str>
  </processor>
  <!-- Reduce words into a single letter X -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content_len_i</str>
    <str name="pattern">\w+</str>
    <str name="replacement">X</str>
  </processor>
  <!-- Remove punctuation
Re: translating a character code to an ordinal?
hello jack, thank you for the code ;) what book are you referring to? AFAICT - all of the 4.0 books are still on future order. we won't be moving to 4.0 (soon enough). so i take it - copyField will not work, eg - i cannot take a code like ABC and copy it to an int field and then use a regex to turn it into an ordinal? thx mark
Re: translating a character code to an ordinal?
Correct, you need either an update request processor, a custom field type, or to preprocess your input before you give it to Solr. You can't do analysis on a non-text field. The book is my new Solr reference/guide that I will be self-publishing. We hope to make an Alpha draft available later next week. -- Jack Krupansky -Original Message- From: geeky2 Sent: Friday, June 07, 2013 8:08 PM To: solr-user@lucene.apache.org Subject: Re: translating a character code to an ordinal? hello jack, thank you for the code ;) what book are you referring to? AFAICT - all of the 4.0 books are still on future order. we won't be moving to 4.0 (soon enough). so i take it - copyField will not work, eg - i cannot take a code like ABC and copy it to an int field and then use a regex to turn it into an ordinal? thx mark
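For what it's worth, there is a 3.5-friendly variation the thread doesn't spell out: copyField copies the raw source value before analysis, so the source and destination types don't have to match - what fails is asking an int field to run an analyzer. A sketch using a sortable single-token text field instead (field names and the boo/baz/bar codes are just the examples from this thread; code-ordinals.txt is a file you would create yourself):

<fieldType name="code_ordinal" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- keep the whole code as a single token so the field stays sortable -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- code-ordinals.txt maps each code to its ordinal:
         boo => 1
         baz => 2
         bar => 3 -->
    <filter class="solr.SynonymFilterFactory" synonyms="code-ordinals.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>

<field name="code_ord" type="code_ordinal" indexed="true" stored="false"/>
<copyField source="code" dest="code_ord"/>

One caveat: sorting on a text field is lexicographic ("10" sorts before "2"), so zero-pad the ordinals once you have more than nine codes.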
Re: Lucene/Solr Filesystem tunings
I figured as much for atime, thanks Otis! I haven't run benchmarks just yet, but I'll be sure to share whatever I find. I plan to try ext4 vs xfs. I am also curious what effect disabling journaling (ext2) would have, relying on SolrCloud to manage 'consistency' over many instances vs FS journaling. Anyone have opinions there? If I test I'll share the results. Cheers, Tim On 4 June 2013 16:11, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, You can use noatime, nodiratime, nothing in Solr depends on that as far as I know. We tend to use ext4. Some people love xfs. Want to run some benchmarks and publish the results? :) Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Jun 4, 2013 at 6:48 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey all, Does anyone have any advice or special filesystem tuning to share for Lucene/Solr, and which file systems they like more? Also, does Lucene/Solr care about access times if I turn them off (I think it doesn't care)? A bit unrelated: What are people's opinions on reducing some consistency things like filesystem journaling, etc (ext2?) due to SolrCloud's additional HA with replicas? How about RAID 0 x 3 replicas or so? Thanks! Tim Vaillancourt
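For reference, the noatime/nodiratime options Otis mentions are set per mount, e.g. in /etc/fstab; the device and mount point below are placeholders:

/dev/sdb1   /var/solr/data   ext4   defaults,noatime,nodiratime   0 2

On recent kernels noatime already implies nodiratime, so listing both is harmless but redundant.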
Re: Two instances of solr - the same datadir?
If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one physical Linux machine. My main concern was re-using the FS cache between both instances - if I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is, for the record, how I am solving it right now:

Write-master is started with:
-Dmontysolr.warming.enabled=false
-Dmontysolr.write.master=true
-Dmontysolr.read.master=http://localhost:5005

Read-master is started with:
-Dmontysolr.warming.enabled=true
-Dmontysolr.write.master=false

solrconfig.xml changes:

1. all index-changing components get this bit, enable="${montysolr.master:true}" - i.e.

<updateHandler class="solr.DirectUpdateHandler2" enable="${montysolr.master:true}">

2. for cache warming de/activation:

<listener event="newSearcher" class="solr.QuerySenderListener" enable="${montysolr.enable.warming:true}">...

3. to trigger refresh of the read-only master (from the write-master):

<listener event="postCommit" class="solr.RunExecutableListener" enable="${montysolr.master:true}">
  <str name="exe">curl</str>
  <str name="dir">.</str>
  <bool name="wait">false</bool>
  <arr name="args"><str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
</listener>

This works. I still don't like the reload of the whole core, but it seems like the easiest thing to do for now. -- roman On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Peter, Thank you, I am glad to read that this use case is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write. (being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes. The problem with calling the 'core reload' is that it seems like a lot of work for just opening a new searcher - eeekkk... somewhere I read that it is cheap to reload a core, but re-opening the index searcher must definitely be cheaper... roman On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote: Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use <lockType>native</lockType> in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit: Call commit() periodically within your own code. Use autoCommit in solrconfig.xml. Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis). Note, doing things this way isn't really suitable for an NRT environment.
HTH, Peter On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote: Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits: 1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB) 2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges) 3) saving disk space and better OS caching (OS should be able to use more RAM for the caching, which should result in faster operations - the two processes are accessing the same index) Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged) Am I not seeing something? roman On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Roman, Could you be more specific as to
Re: Two instances of solr - the same datadir?
I have autoCommit set at 40k recs/1800 secs. But I have only tested with manual commits; I don't see why it should work differently, though. Roman On 7 Jun 2013 20:52, Tim Vaillancourt t...@elementspace.com wrote: If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one physical Linux machine. My main concern was re-using the FS cache between both instances - if I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is, for the record, how I am solving it right now:

Write-master is started with:
-Dmontysolr.warming.enabled=false
-Dmontysolr.write.master=true
-Dmontysolr.read.master=http://localhost:5005

Read-master is started with:
-Dmontysolr.warming.enabled=true
-Dmontysolr.write.master=false

solrconfig.xml changes:

1. all index-changing components get this bit, enable="${montysolr.master:true}" - i.e.

<updateHandler class="solr.DirectUpdateHandler2" enable="${montysolr.master:true}">

2. for cache warming de/activation:

<listener event="newSearcher" class="solr.QuerySenderListener" enable="${montysolr.enable.warming:true}">...

3. to trigger refresh of the read-only master (from the write-master):

<listener event="postCommit" class="solr.RunExecutableListener" enable="${montysolr.master:true}">
  <str name="exe">curl</str>
  <str name="dir">.</str>
  <bool name="wait">false</bool>
  <arr name="args"><str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
</listener>

This works. I still don't like the reload of the whole core, but it seems like the easiest thing to do for now. -- roman On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Peter, Thank you, I am glad to read that this use case is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write. (being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes. The problem with calling the 'core reload' is that it seems like a lot of work for just opening a new searcher - eeekkk... somewhere I read that it is cheap to reload a core, but re-opening the index searcher must definitely be cheaper... roman On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote: Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use <lockType>native</lockType> in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit: Call commit() periodically within your own code. Use autoCommit in solrconfig.xml.
Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis). Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote: Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits: 1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB) 2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges) 3) saving disk space and better OS caching (OS should be able to use more RAM for the caching, which should result in faster operations - the two processes are accessing the same index) Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the
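A minimal way to fire the 'empty commit' Peter describes, assuming the stock example port and core name from the Solr distribution:

curl "http://localhost:8983/solr/collection1/update?commit=true"

No documents are sent, so the only effect is that the searcher instance reopens its searcher (and runs autowarming) against whatever segments the writer has committed.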
Re: translating a character code to an ordinal?
thx, please send me a link to the book so i can get/purchase it. thx mark
custom field tutorial
can someone point me to a custom field tutorial. i checked the wiki and this list - but still a little hazy on how i would do this. essentially - when the user issues a query, i want my class to interrogate a string field (containing several codes - example boo, baz, bar) and return a single integer field that maps to the string field (containing the code). example: boo=1 baz=2 bar=3 thx mark
Re: LotsOfCores feature
We set it up like this:
+ individual solr instances are set up
+ external mapping/routing to allocate users to instances; this information can be stored in an external data store
+ all cores are created as transient and loadOnStartup as false
+ cores come online on demand
+ as and when users' data gets bigger (or hosts run hot) they are migrated to less-loaded hosts using the built-in replication
Keep in mind we had the same schema for all users. Currently there is no way to upload a new schema to Solr. On Jun 8, 2013 1:15 AM, Aleksey bitterc...@gmail.com wrote: Aleksey: What would you say is the average core size for your use case - thousands or millions of rows? And how sharded would each of your collections be, if at all? Average core/collection size wouldn't even be thousands; hundreds, more like. And the largest would be half a million or so, but that's a pathological case. I don't need sharding or queries that fan out to different machines. In fact I'd like to avoid that so I don't have to collate the results. The Wiki page was not built for Cloud Solr. We have done such a deployment where less than a tenth of the cores were active at any given point in time. Though there were tens of millions of indices, they were split among a large number of hosts. If you don't insist on a Cloud deployment it is possible. I'm not sure if it is possible with cloud. By Cloud you mean specifically SolrCloud? I don't have to have it if I can do without it. Bottom line is I want a bunch of small cores to be distributed over a fleet, each core completely fitting on one server. Would you be willing to provide a little more detail on your setup? In particular, how are you managing the cores? How do you route requests to the proper server? If you scale the fleet up and down, does reshuffling of the cores happen automatically, or is it an involved manual process? Thanks, Aleksey
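For reference, the transient/loadOnStartup setup described above maps onto per-core flags in the legacy 4.x solr.xml, plus a cap on how many transient cores may stay loaded at once; the names and numbers here are placeholders:

<solr persistent="true">
  <cores adminPath="/admin/cores" transientCacheSize="50">
    <!-- loaded lazily on first request, evictable once 50 transient cores are loaded -->
    <core name="user-12345" instanceDir="cores/user-12345" transient="true" loadOnStartup="false"/>
  </cores>
</solr>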
Re: custom field tutorial
What are you trying to do? This seems really odd. I've been working in search for fifteen years and I've never heard this request. You could always return all the fields to the client and ignore the ones you don't want. wunder On Jun 7, 2013, at 8:24 PM, geeky2 wrote: can someone point me to a custom field tutorial. i checked the wiki and this list - but still a little hazy on how i would do this. essentially - when the user issues a query, i want my class to interrogate a string field (containing several codes - example boo, baz, bar) and return a single integer field that maps to the string field (containing the code). example: boo=1 baz=2 bar=3 thx mark
Re: Multitable import - uniqueKey
Thank you for all reply members. Solve the issue. -- View this message in context: http://lucene.472066.n3.nabble.com/Multitable-import-uniqueKey-tp4067796p4069007.html Sent from the Solr - User mailing list archive at Nabble.com.