Re: using HttpSolrServer with PoolingHttpClientConnectionManager
Thank you Shawn! This is very helpful. Renee -- View this message in context: http://lucene.472066.n3.nabble.com/using-HttpSolrServer-with-PoolingHttpClientConnectionManager-tp4322905p4322972.html Sent from the Solr - User mailing list archive at Nabble.com.
using HttpSolrServer with PoolingHttpClientConnectionManager
First of all, I apologize for the length of this message... there are a few questions I would appreciate your help with, please:

1. Originally I wanted to use SolrJ in my application layer (a webapp deployed with Tomcat) to query the Solr server(s) in a multi-core, non-cloud setup. Since I need to send XML back to my client, I realize this is not a use case for SolrJ, so I should abandon the idea (correct?)

2. I also looked into CommonsHttpSolrServer for querying Solr directly, which supposedly allows me to set XMLResponseParser as the ResponseParser. However, it seems CommonsHttpSolrServer is deprecated; with HttpClient 4.x I think I should use HttpSolrServer. I do need a way to get the returned data in XML format, and I want to use a pooling HTTP connection manager to support multiple query threads. I thought I could do all this with HttpSolrServer (yes?), as below:

    PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager();
    connManager.setMaxTotal(5);
    connManager.setDefaultMaxPerRoute(4);
    ...
    CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(connManager).build();
    ...
    ResponseParser parser = new XMLResponseParser();
    ...
    HttpSolrServer server = new HttpSolrServer(myUrl, httpclient, parser);
    ...
    SolrQuery query = new SolrQuery();
    query.setQuery(q);
    query.setParam("wt", "xml"); // not needed?
    ...
    QueryResponse response = server.query(query);
    SolrDocumentList sdl = response.getResults();

At this point, will the documents in sdl be in XML format if I call toString() while looping through them? Will there be overhead, if this works at all? Will SolrJ skip the XML parsing and simply return the results as-is, since I requested the XML parser? I somehow feel this is fishy, and I might be better off not using SolrJ at all. What is the best practice here?

3.
I think my next question is more of an HttpClient question, but it does relate to Solr / cores, so I hope someone here can help: when I configure the per-route connection limits of PoolingHttpClientConnectionManager, will the following URLs be considered different routes, or, since they hit the same server, will the collection/core part be ignored?

    String myUrl = "http://localhost:8983/solr/core1";

and

    String myUrl = "http://localhost:8983/solr/core2";

Thanks! Renee -- View this message in context: http://lucene.472066.n3.nabble.com/using-HttpSolrServer-with-PoolingHttpClientConnectionManager-tp4322905.html
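On the route question: in Apache HttpClient, a route is determined by scheme, host, and port (plus any proxy hops), not by the URL path, so /solr/core1 and /solr/core2 on the same host share one route and one per-route pool. A small stdlib sketch of the components that matter (java.net.URI is used only for illustration; it is not part of HttpClient's routing code):

```java
import java.net.URI;

// Sketch: the path (/solr/core1 vs /solr/core2) is not part of an HTTP route;
// scheme + host + port are, so these two URLs share one connection-pool route.
public class RouteCheck {
    static String routeKey(String url) {
        URI u = URI.create(url);
        return u.getScheme() + "://" + u.getHost() + ":" + u.getPort();
    }

    public static void main(String[] args) {
        String core1 = "http://localhost:8983/solr/core1";
        String core2 = "http://localhost:8983/solr/core2";
        System.out.println(routeKey(core1).equals(routeKey(core2))); // same route
    }
}
```

So with setDefaultMaxPerRoute(4) above, queries to core1 and core2 on localhost:8983 would compete for the same 4 pooled connections.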
is there a way to match related multivalued fields of different types
Hi - I have a schema that looks like: (text_nost and text_st are just field types defined without/with stopwords... irrelevant to the issue here). These three fields are parallel in terms of their values. I want to be able to match up these values and search for something like: give me all attachment_names whose corresponding attachment_size > 5000. I googled and saw someone mention using dynamic fields, but I think dynamic fields are better suited to 'type'-style values, whereas my attachment_names are just individual values. Please advise on the best way to achieve this. Thanks in advance! Renee -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-match-related-multivalued-fields-of-different-types-tp4319342.html
Re: project related configsets need to be deployed in both data and solr install folders ?
Thanks for your time! -- View this message in context: http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897p4318382.html
Re: project related configsets need to be deployed in both data and solr install folders ?
Hi Chris, since I have been playing with this install, I am not certain whether I have unknowingly messed up some other settings, and I want to avoid filing a false Jira and wasting your time. So I wiped out everything on my Solr box and did a fresh install of Solr 6.4.0, making sure my config set is placed in the data folder (/myprojectdata/solr/data/configsets/myproject_configs). My Solr home is set to /myprojectdata/solr/data, and it is WORKING now. I did not have to specify configSetBaseDir in solr.xml (the one in the data folder /myprojectdata/solr/data/solr.xml, NOT the one in the install folder /opt/solr/server/solr/solr.xml); the default correctly points at the Solr home, which is my data folder, and finds the config set. So there is no problem; everything works fine, and I can create new cores without any issue. There is no bug whatsoever. Thank you for all your help! -- View this message in context: http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897p4318369.html
Re: project related configsets need to be deployed in both data and solr install folders ?
Thanks Erick! I looked at the Solr wiki, though: if configSetBaseDir is not set, the default should be SOLR_HOME/configsets ("configSetBaseDir: The directory under which configsets for solr cores can be found. Defaults to SOLR_HOME/configsets"). And I do start Solr with -Dsolr.solr.home=/myprojectdata/solr/data and deploy my config into /myprojectdata/solr/data/configsets/myproject_configs. Anyway, it looks like the default is not working? I found https://issues.apache.org/jira/browse/SOLR-6158, which seems to discuss the configSetBaseDir issue... I now set configSetBaseDir explicitly in solr.xml and it works. I just wonder why the default won't work, or whether I did something else wrong. -- View this message in context: http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897p4318163.html
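For reference, the explicit override that made it work is a top-level entry in solr.xml; a minimal fragment, assuming the data-folder layout described above:

```xml
<solr>
  <!-- explicit configset location; if omitted, SOLR_HOME/configsets should be used -->
  <str name="configSetBaseDir">/myprojectdata/solr/data/configsets</str>
</solr>
```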
project related configsets need to be deployed in both data and solr install folders ?
Hi - We use separate Solr install and data folders with a shared schema/config (configsets) in a multi-core setup, and it seems the configsets need to be deployed in both places (we are running Solr 6.4.0)? For example, Solr is installed in /opt/solr, so there is a folder /opt/solr/server/solr/configsets. We keep the data on a different partition, so there is also /mysolrdata/solr/data/configsets. At first, I deployed the project configsets only to the Solr install folder, /opt/solr/server/solr/configsets/myproject_configs. Then, when I create a core, Solr complains it could not load the config from /mysolrdata/solr/data/configsets/myproject_configs (the data folder):

    curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=abc&instanceDir=abc&configSet=myproject_configs'

returns a 400 with an org.apache.solr.common.SolrException: Error CREATEing SolrCore 'abc': Unable to create core [abc] Caused by: Could not load configuration from directory /mysolrdata/solr/data/configsets/myproject_configs

So next I moved the configs to /mysolrdata/solr/data/configsets, but then it complains it could not load the config from the install folder, /opt/solr/server/solr/configsets/myproject_configs, with the same error. I had to copy the same config set to both folders (I eventually made a symlink from /opt/solr/server/solr/configsets/myproject_configs to /mysolrdata/solr/data/configsets), and it worked. I wonder if I have missed a setting that would let me deploy the configset in only one place, either my data folder or the install folder; I would assume the benefit is obvious. Thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897.html
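The symlink workaround described above, as a small shell sketch (the function name is mine; the paths are the ones from this setup):

```shell
# Keep the real configset on the data partition and link it into the install
# tree, so both lookup locations resolve to the same files.
link_configset() {
  src="$1"   # e.g. /mysolrdata/solr/data/configsets/myproject_configs
  dest="$2"  # e.g. /opt/solr/server/solr/configsets/myproject_configs
  ln -s "$src" "$dest"
}
```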
Re: solr 5 leaving tomcat, will I be the only one fearing about this?
Thanks John... yes, that was the first idea that came to our mind, but it would require doubling our servers (in the replica data centers as well), and we definitely can't afford the cost. We have thought of first establishing a small pool of 'hot' servers running the upgraded Solr version to take incoming new index data (a relatively much smaller resource pool), and meanwhile taking one existing server (and its replicas) at a time to upgrade, one by one. Although most (99%) of indexing will happen on the small hot pool, there are still some updates to the 'cold' servers at all times, so we will also need to introduce a write lockdown on the impacted servers... with one server at a time, the scope of impact will be reduced to its minimum... I am pretty sure we are not the only ones who have to face the re-index issue with a large data set... am I correct? If there is a better approach, please share... thanks a lot! Renee -- View this message in context: http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300550.html
Re: solr 5 leaving tomcat, will I be the only one fearing about this?
Shawn and Ari, the 3rd-party jars are exactly one of the concerns I have. We have more than just a multi-lingual integration; we have to integrate with many other 3rd-party tools. We basically deploy all those jars into an 'external' lib extension path in production; then, for each 3rd-party tool involved, we have to follow its instructions to integrate with Solr/Tomcat by symlinking the jars into either tomcat/lib or ourwebapps/WEB-INF/lib, etc. I imagine we will need to rewrite these integrations in the build process. I am sure there will be a lot of other work if we go for this 'upgrade'... I bet we will need to re-index the data... for each major Solr version upgrade (like from 3.0 to 4.0) the data needs to be re-indexed, and this is another huge concern. Our data is not that BIG considering what others have nowadays, but a few hundred terabytes are still a pain in the neck to push through a re-index, resource-wise and time-wise. Almost a roadblock. I have tested that Solr shard queries work across Solr servers with different versions, so we could upgrade our production Solr servers and re-index the data on them one by one... but still, it is almost impractical. Similar to how Ari feels, I also would rather not upgrade... I guess we fortunately started using Solr a long time ago (its key feature of searching email content benefited our SAAS services), but on the other hand we have become so dependent on the old version of Solr that, given how Solr evolves, it is hard for us to keep up with the upgrades... a few years ago, scalability became a major issue in our system. I did a lot of experiments with Solr 4.0, but unfortunately the lack of multi-tenancy support, as well as other fundamental flaws, drove it out of our choices back then, so our team ended up developing a lightweight, scalable layer wrapping on top of Solr, which worked very well. But here we are...
From all angles (build process, architecture, data migration, etc.) it is scary. Good discussion, and it is a great help... :-) Renee -- View this message in context: http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300523.html
Re: solr 5 leaving tomcat, will I be the only one fearing about this?
I just read through the link Shawn shared in his reply: https://wiki.apache.org/solr/WhyNoWar. While the following statement is true: "Supporting a single set of binary bits is FAR easier than worrying about what kind of customized environment the user has chosen for their deployment.", it also probably reduces flexibility... for example, we tune for scalability at the Tomcat level, such as its thread pool. I assume the standalone Solr (which still uses Jetty underneath) will expose enough configurable 'knobs' to let me tune Solr for our data workload. If we want to minimize the migration work, our existing business-logic components will remain in Tomcat; then the fact that we will have Jetty and Tomcat co-existing in the production system is a bit strange... or is it? Even if I could port our webapps to Jetty, I assume that with the way Solr embeds Jetty I would not be able to integrate at that level, so I would probably end up with two Jetty container instances running on the same server, correct? It is still too early for me to be sure how this will impact our system, but I am a little worried. Renee -- View this message in context: http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300259.html
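On the 'knobs': in Solr 5/6 the bundled Jetty is configured through server/etc/jetty.xml, where the request thread pool can be bounded via system properties. A sketch of the relevant stanza (property names and defaults are from memory, so verify against the jetty.xml in your install):

```xml
<!-- request thread-pool stanza in server/etc/jetty.xml (values illustrative) -->
<New id="threadPool" class="org.eclipse.jetty.util.thread.QueuedThreadPool">
  <Set name="minThreads" type="int"><Property name="solr.jetty.threads.min" default="10"/></Set>
  <Set name="maxThreads" type="int"><Property name="solr.jetty.threads.max" default="10000"/></Set>
</New>
```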
Re: solr 5 leaving tomcat, will I be the only one fearing about this?
Thanks everyone, I think this is very helpful... I will post more specific questions once we get more familiar with Solr 6. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300253.html
Re: solr 5 leaving tomcat, will I be the only one fearing about this?
Thanks... but that is an extremely simplified situation. We are not just picking up Solr as a new tool. In our production, we have done cloud-based big-data indexing with Solr for many years. We have developed lots of business-related logic/components deployed as webapps working seamlessly with Solr. A simple example: we purchased multi-lingual processors (and many other 3rd-party tools) which we integrated with Solr by carefully deploying their libraries in the Tomcat container so they work together. This basically means we would have to rewrite all those components to make them work with Solr 5 or 6. In my opinion, for Solr users like our company, it would really be beneficial if Solr could keep supporting WAR deployment in parallel with its new standalone release, although this might be too much work? Thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300202.html
solr 5 leaving tomcat, will I be the only one fearing about this?
Need some general advice please... our infrastructure is built with multiple webapps on Tomcat... the scaling layer is achieved on top of those webapps, which work hand-in-hand with the Solr admin APIs / shard queries / commit or optimize / core management, etc. While I have not had a chance to actually play with Solr 5 yet, just from imagination, we will be facing some huge changes in our infrastructure to be able to upgrade to Solr 5, yes? Thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065.html
Re: how to efficiently get sum of an int field
Thanks Yonik... I bet with Solr 3.5 we do not have the JSON facet API support yet... -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238522.html
Re: how to efficiently get sum of an int field
Also, Yonik, out of curiosity... when I run stats on a large message set (such as 200 million messages), it tends to use a lot of memory; this is to be expected, correct? If I were able to use !sum=true to get only the sum, a clever implementation could tell that only the sum is required and avoid the memory overhead; is it implemented that way? Anyway, I was only trying to avoid running these stats across thousands of customers, which kills our Solr servers. thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238520.html
Re: how to efficiently get sum of an int field
Now I think that with Solr 3.5 (which we are using), !sum=true (overriding the defaults) is probably not supported yet :-( thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238519.html
Re: how to efficiently get sum of an int field
I did try single quotes with a backslash before the bang, and also tried disabling history expansion... neither worked for me. Unfortunately, we are using Solr 3.5, which probably does not support the JSON format? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238497.html
Re: how to efficiently get sum of an int field
thanks! But it is silly that I can't seem to escape the {!sum=true} properly to make it work in my curl :-(

    time curl -d 'q=*:*&rows=0&shards=solrhostname:8080/solr/413-1,anothersolrhost:8080/solr/413-2&stats=true&stats.field={!sum=true}myfieldname' http://localhost:8080/solr/413-1/select/? | xmllint --format -

Double quotes or single quotes, escaping only the ! or escaping both { and !, nothing makes it work. -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238478.html
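For what it's worth, a quick sanity check of the quoting itself: inside single quotes the bang needs no escaping at all (history expansion only applies to interactive shells), so the parameter should reach curl intact. A minimal sketch, with the curl line commented out since it assumes a live Solr host:

```shell
# Single quotes pass the bang through literally in a script; no backslash needed.
param='stats.field={!sum=true}myfieldname'
printf '%s\n' "$param"

# Hypothetical invocation against the hosts above; --data-urlencode also keeps
# the braces and bang intact on the wire:
# curl --data-urlencode "$param" -d 'q=*:*&rows=0&stats=true' 'http://localhost:8080/solr/413-1/select'
```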
how to efficiently get sum of an int field
Hi - I have been using stats to get the sum of an int field, like: &stats=true&stats.field=my_field_name&rows=0. It works fine, but when the index has hundreds of millions of messages across sharded indices, it takes a long time. I noticed that 'stats' gives out more information than I need (just the sum); I suspect the min/max/mean etc. are what cost the time. Is there a simple way to get just the sum, without the other statistics, in a way that runs faster and puts less stress on the Solr server? Thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464.html
Re: any easy way to find out when a core's index physical file has been last updated?
Thanks a lot Shawn for the details, it is very helpful! -- View this message in context: http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227274.html
Re: any easy way to find out when a core's index physical file has been last updated?
Shawn, thanks so much; this user forum is so helpful! I will start using autoCommit with confidence that it will greatly reduce the false commit requests (a lot) from processes in our system. Regarding the Solr version, it is actually a big problem we have to resolve sooner or later. When we upgraded to Solr 3.5 about 2 years ago, to avoid re-indexing our large data set, we kept the match version at LUCENE_29, which seems to work fine except for a lot of warnings like this in catalina.out:

    WARNING: StopFilterFactory is using deprecated LUCENE_29 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0

We have built an infrastructure that scales well using Solr; is it good practice to upgrade to Solr 4.x without using SolrCloud, if that is possible at all? thanks! Renee -- View this message in context: http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227220.html
Re: any easy way to find out when a core's index physical file has been last updated?
Unfortunately we are still using Solr 3.5 with Lucene 2.9.3 :-( If we upgrade to Solr 4.x, it will require upgrading Lucene away from 2.x.x, which will require re-indexing all our data. By current measures, it might take about 8-9 for the data we have to be re-indexed, a big concern. So, to understand autoCommit better, since it says: 30, I want to know: 1) If I have a batch of 2000 documents being added to the index, it may take 3 minutes to index all 2000 documents. Will the autoCommit defined above kick off a commit 5 minutes after the first of the 2000 documents is indexed? 2) Will autoCommit NOT commit if there has been no update in the last 5 minutes? 3) Does maxTime count document deletions, or does it only care about added documents? In other words, should I use maxPendingDeletes for document deletions? thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227132.html
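For context, a maxTime-based autoCommit stanza in solrconfig.xml generally looks like the following; the 5-minute value is an assumption matching the questions above, not necessarily the actual config:

```xml
<!-- commit automatically at most 5 minutes (300000 ms) after the first
     uncommitted update arrives -->
<autoCommit>
  <maxTime>300000</maxTime>
</autoCommit>
```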
Re: any easy way to find out when a core's index physical file has been last updated?
thank you! I will look into that. Also, I came across autoSoftCommit; it seems useful... we are still using Solr 3.5, and I hope autoSoftCommit is included in Solr 3.5... -- View this message in context: http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227098.html
Re: any easy way to find out when a core's index physical file has been last updated?
Walter, thanks! I will do some tests using autoCommit. I guess that if the console UI requires documents to become searchable within 10 minutes, we will need to use autoCommit with maxTime instead of maxDocs. I wonder, in case we need to do a 'force commit', whether the autoCommit will get in the way when its maxTime has not yet elapsed, as long as there are updates? thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227091.html
Re: any easy way to find out when a core's index physical file has been last updated?
This makes sense now. Thanks! Why I got this idea: in our system we have a large customer base and lots of cores; each customer may have multiple cores. There are also many processes running in our system processing data for these customers, and once in a while they ask a central webapp that we wrote to commit on a core. This central webapp is deployed alongside Solr in the same Tomcat container; its task is mainly to be a wrapper around the local cores, managing monitoring of core size, merging cores if needed, etc. I also control the commit requests this webapp receives from time to time, trying to space the commits out. In the case where multiple processes request commits on the same core, my webapp guarantees that only one commit per x-minute interval gets executed and drops the other commit requests. Now I have discovered that some of the processes send large numbers of commit requests for many cores that never had any changes in the last interval. This was due to a bug in those other processes, but the programmers there are behind on fixing the issue. This led me to the idea of verifying incoming commit requests by checking the physical index files to see whether any updates really occurred in the last interval. I was searching for a Solr core admin RESTful API to get metadata about the core, such as a 'last modified' timestamp... but did not have any luck. I thought I could use the 'index' folder's timestamp to get an accurate last-modified time, but with what you just explained, that would not be the case. I will have to traverse the files in the folder and find the last-modified file. Any input will be appreciated. Thanks a lot! Renee -- View this message in context: http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227084.html
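Traversing the folder for the newest file can be sketched with plain java.io (the class name, and pointing it at [core]/data/index, are my own assumptions, not from the thread):

```java
import java.io.File;

// Sketch: the directory mtime misses in-place file growth, so scan the files
// themselves and take the newest lastModified as "last index activity".
public class NewestIndexFile {
    public static long newestModified(File indexDir) {
        long newest = 0L;
        File[] files = indexDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && f.lastModified() > newest) {
                    newest = f.lastModified();
                }
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        // Hypothetical path; point at the core's data/index directory.
        File dir = new File(args.length > 0 ? args[0] : "/mysolrdata/solr/data/core1/data/index");
        System.out.println("newest index file mtime: " + newestModified(dir));
    }
}
```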
Re: any easy way to find out when a core's index physical file has been last updated?
[core]/index is a folder holding the index files. But the index files in that folder are not just being deleted or added; they are also being updated. On a Linux file system, the folder's timestamp is only updated when files in it are added or deleted, NOT when they are modified. So if I check the index folder's timestamp, it will not accurately reflect the last time the index files were updated. -- View this message in context: http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227058.html
Re: any easy way to find out when a core's index physical file has been last updated?
Hmm... at the beginning I also assumed segment index files would only be deleted or added, not modified. But I ran a test with heavy indexing ongoing, and observed an index file in [core]/index with the latest updated timestamp keep growing for about 7 minutes... I am not sure whether the new writes caused a merge; the file being updated is pretty big, so it could be merging... but that does mean index files can be modified. thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227049.html
any easy way to find out when a core's index physical file has been last updated?
I need to figure out when the last index activity on a core was. I can't use the [corename]/index folder timestamp, because it only reflects file deletion or addition, not file updates. I am curious whether there is any Solr core admin RESTful API sort of thing I can use to get the last-modified timestamp of the physical index... Thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044.html
Re: is there any way to tell delete by query actually deleted anything?
thanks Shawn... on another note, I have just created a thin-layer webapp that I deploy alongside Solr/Tomcat. This webapp provides a RESTful API that allows all kinds of clients in our system to request a commit on a certain core on that Solr server. I put it in place to have a central/final place to control commits on the cores of the local Solr server. So far it works by reducing arbitrary requests; for example, I will not allow two commit requests from different clients on the same core to happen too close to each other: I disregard the second request if the first was done less than, say, 5 minutes ago. I am thinking of enhancing this webapp to check the physical index directory's timestamp and drop the request if the core has not changed since the last commit. This would prevent clients from blindly trying to commit on all cold cores when only one of them was actually updated. What I mean to ask: is there any Solr admin metadata I can fetch through a RESTful API, such as the index's last-updated time, or something like that? -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226818.html
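The "one commit per core per interval" gate described above can be sketched as follows (all names are mine; a minimal thread-safe version, assuming wall-clock time is good enough):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: allow at most one commit per core per interval; requests that
// arrive inside the window are dropped.
public class CommitThrottle {
    private final long intervalMillis;
    private final Map<String, Long> lastCommit = new ConcurrentHashMap<>();

    public CommitThrottle(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Returns true if the caller should go ahead and commit this core now. */
    public boolean tryAcquire(String coreName, long nowMillis) {
        Long prev = lastCommit.get(coreName);
        if (prev != null && nowMillis - prev < intervalMillis) {
            return false; // a commit ran recently; drop this request
        }
        // A race between two concurrent requests is possible but harmless here:
        // worst case is one extra commit. Use compute() if strictness matters.
        lastCommit.put(coreName, nowMillis);
        return true;
    }
}
```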
Re: is there any way to tell delete by query actually deleted anything?
Hi Erick... as Shawn pointed out, I am not using SolrCloud; I am using a more complicated, home-grown sharding scheme... thanks for your response :-) Renee -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226806.html
Re: is there any way to tell delete by query actually deleted anything?
Hi Shawn, I think we have a similar structure, where we use frontier/back instead of hot/cold :-) so yes, we will probably have to do the same. Since we have large customers, some of whom may have terabytes of data and end up with hundreds of cold cores, blindly broadcasting the delete to all of them is a performance killer. I am thinking of adding an in-memory inventory of coreID : docID so I can efficiently identify which core a document is in... what do you think about that? thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226805.html
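The in-memory inventory idea could be sketched like this (all names are mine; it assumes docIDs are unique across a customer's cores and that the map can be rebuilt from the indexes on restart):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: remember which core each document landed in, so a re-index can
// delete the old copy from exactly one core instead of broadcasting the
// delete to every shard.
public class DocCoreInventory {
    private final Map<String, String> docToCore = new ConcurrentHashMap<>();

    public void recordIndexed(String docId, String coreId) {
        docToCore.put(docId, coreId);
    }

    /** The core holding this doc, or null if unknown (fall back to broadcast). */
    public String coreFor(String docId) {
        return docToCore.get(docId);
    }

    public void recordDeleted(String docId) {
        docToCore.remove(docId);
    }
}
```

The memory cost is roughly one map entry per live document, which is worth weighing against the cost of broadcasting deletes to hundreds of cold cores.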
Re: is there any way to tell delete by query actually deleted anything?
Shawn, thanks for the reply. I have a sharded index. When I re-index a document (as opposed to indexing a new one, which is a different process), I need to delete the old copy first to avoid duplicates. We all know that with a single core the newly added document replaces the old one, but with multi-core indexes we have to issue the delete command to ALL shards first, since we do NOT know/remember which core the old document was indexed to... I also wanted to know if there is a better way to handle this efficiently. Anyway, we send the delete to all of this customer's cores; one of them hits, the others do not. But consequently, when I need to decide about the commit, I do NOT want to blindly commit to all cores; I want to know which one actually had the old doc so I only send the commit to that core. I could alternatively query first, then skip if there is no hit and delete if there is, but I can't short-circuit, since we have duplicates :-( for historical reasons. Any suggestion on how to make this more efficient? thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226788.html
is there any way to tell delete by query actually deleted anything?
I run curl commands like these, trying to delete some messages:

    curl 'http://localhost:8080/solr/mycore/update?commit=true&stream.body=<delete><id>abacd</id></delete>' | xmllint --format -

or

    curl 'http://localhost:8080/solr/mycore/update?commit=true&stream.body=<delete><query>myfield:mycriteria</query></delete>' | xmllint --format -

The result I get is just the usual response header (status 0, QTime 10), plus curl's progress meter. Is there an easy way for me to get the number of documents actually deleted? I mean, if the query did not hit any documents, I want to know that nothing got deleted; and if it did hit documents, I would like to know how many were deleted... thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776.html
Re: partial optimize does not reduce the segment number to maxNumSegments
Sorry, I should have elaborated on that earlier... in our production environment we have multiple cores, and they ingest continuously all day long; we only optimize periodically, once a day at midnight. So sometimes we see 'too many open files' errors. To prevent that from happening, in production we maintain a script that monitors the total number of segment files across all cores and sends out warnings if that number exceeds a threshold... a kind of preventive measure. Currently we use a Linux command to count the files. We are wondering if we can simply use a formula to figure out this number; it would be better that way. It seems we could use the stats URL to get the segment count and multiply it by 8 (that is what we get, given our schema). Any better way to approach this? thanks a lot! Renee -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2825736.html
Re: partial optimize does not reduce the segment number to maxNumSegments
yeah, I can figure out the segment number by going to the stats page of solr... but my question was how to figure out the exact total number of files in the 'index' folder for each core. Like I mentioned in a previous message, I currently have 8 files per segment (.prx, .tii, etc.), but it seems this might change if I use term vectors, for example. So I need suggestions on how to accurately figure out the total file number. thanks -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2817912.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: partial optimize does not reduce the segment number to maxNumSegments
thanks! It seems the file count in the index directory is segment# * 8 in my dev environment... I see there are .fnm .frq .fdt .fdx .nrm .prx .tii .tis (8) file extensions, and there are as many files of each extension as there are segments. Is it always safe to calculate the file count by multiplying the segment number by 8? Of course this excludes the segments_N, segments.gen and .del files. I found most of the cores have a file count that can be calculated with the above formula, but a few cores do not have a matching number... thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2813419.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: partial optimize does not reduce the segment number to maxNumSegments
ok, I dug more into this and realized the file extensions can vary depending on the schema, right? For instance we don't have *.tvx, *.tvd, *.tvf (we are not using term vectors)... and I suspect the file extensions may change with future lucene releases? So now it seems we can't count the files with a fixed formula; we have to list all the files in that directory and count them that way... any insight will be appreciated. thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2813561.html Sent from the Solr - User mailing list archive at Nabble.com.
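Following up on the listing approach: since the set of extensions depends on the schema and the Lucene version, the monitoring script (or a small utility) can simply list each core's index directory and count files grouped by extension, instead of assuming segments * 8. A minimal Java sketch; the class name and the /xxx/... path are illustrative:

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

public class IndexFileCount {
    // Group file names by extension; names without a '.' (e.g. segments_2)
    // count under their full name. No segments*N assumption needed.
    static Map<String, Integer> countByExtension(String[] names) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String name : names) {
            int dot = name.lastIndexOf('.');
            String ext = (dot < 0) ? name : name.substring(dot);
            Integer c = counts.get(ext);
            counts.put(ext, c == null ? 1 : c + 1);
        }
        return counts;
    }

    // Convenience wrapper over a core's index directory.
    static Map<String, Integer> countByExtension(File indexDir) {
        String[] names = indexDir.list();
        return countByExtension(names == null ? new String[0] : names);
    }

    public static void main(String[] args) {
        // Illustrative path; substitute the real per-core index directory.
        Map<String, Integer> counts =
                countByExtension(new File("/xxx/solr/data/mycore/index"));
        int total = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
            total += e.getValue();
        }
        System.out.println("total files: " + total);
    }
}
```

This keeps the open-file monitoring independent of how many extensions the current Lucene codec happens to write per segment.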
Re: partial optimize does not reduce the segment number to maxNumSegments
Hi Hoss, thanks for your response... you are right, I had a typo in my question, but I did use maxSegments, and here is the exact URL I used:

curl 'http://localhost:8080/solr/97/update?optimize=true&maxSegments=10&waitFlush=true'

I used jconsole and du -sk to monitor each partial optimize, and I am sure the optimize was done: it always reduced the segment files from 130+ to 65+ when I started with maxSegments=10; when I ran again with maxSegments=9, it reduced to somewhere in the 50s; when I used maxSegments=2, it always reduced to 18; and maxSegments=1 (full optimize) always reduced the core to 10 segment files. This was repeated about a dozen times. I think the resulting file number depends on the size of the core. I have a core that takes 10GB of disk space and has 4 million documents. It perhaps also depends on other solr/lucene configurations? Let me know if I should give you any data from our solr config. Here is the actual data from the test I ran lately for your reference; you can see it definitely finished each partial optimize, and the time spent is also included (please note I am using a core id there which is different from yours):

/tmp # ls /xxx/solr/data/32455077/index | wc    ---> this is the start point, 150 seg files
    150     150     946
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=10&waitFlush=true'
real    0m36.050s
user    0m0.002s
sys     0m0.003s
/tmp # ls /xxx/solr/data/32455077/index | wc    ---> after first partial optimize (10), reduced to 82
     82      82     746
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=9&waitFlush=true'
real    1m54.364s
user    0m0.003s
sys     0m0.002s
/tmp # ls /xxx/solr/data/32455077/index | wc
     74      74     674
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=8&waitFlush=true'
real    2m0.443s
user    0m0.002s
sys     0m0.003s
/tmp # ls /xxx/solr/data/32455077/index | wc
     66      66     602
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=7&waitFlush=true'
real    3m22.201s
user    0m0.002s
sys     0m0
/tmp # ls /xxx/solr/data/32455077/index | wc
     58      58     530
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=6&w
real    3m29.277s
user    0m0.001s
sys     0m0.004s
/tmp # ls /xxx/solr/data/32455077/index | wc
     50      50     458
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=5&w
real    3m41.514s
user    0m0.003s
sys     0m0.003s
/tmp # ls /xxx/solr/data/32455077/index | wc
     42      42     386
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=4&w
real    5m35.697s
user    0m0.003s
sys     0m0.004s
/tmp # ls /xxx/solr/data/32455077/index | wc
     34      34     314
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=3wa
real    7m8.773s
user    0m0.003s
sys     0m0.002s
/tmp # ls /xxx/solr/data/32455077/index | wc
     26      26     242
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=2&w
real    9m18.814s
user    0m0.004s
sys     0m0.001s
/tmp # ls /xxx/solr/data/32455077/index | wc
     18      18     170
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=1&w    (full optimize)
real    16m6.599s
user    0m0.003s
sys     0m0.004s

Disk space usage: the first 3 runs took about 20% extra; the middle couple of runs took about 50% extra; the last full optimize took 100% extra.

-- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2812415.html Sent from the Solr - User mailing list archive at Nabble.com.
partial optimize does not reduce the segment number to maxNumSegments
I have a core with 120+ segment files and I tried a partial optimize specifying maxNumSegments=10; after the optimize the segment files were reduced to 64. I did the same optimize again, and it reduced to 30-something; this keeps going and eventually it drops to a teen number. I was expecting the optimize to result in exactly 10 segment files, or somewhere near that. Why do I have to manually repeat the optimize to reach that number? thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2682195.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrade to Solr 1.4, very slow at start up when loading all cores
just an update on this issue... we turned off the new/first searchers (after upgrading to Solr 1.4.1) and ran benchmark tests; there is no noticeable performance impact on the queries we perform compared with the Solr 1.3 benchmark tests WITH new/first searchers. Also, memory usage after loading the cores with our data volume was reduced by 5.5 GB by turning these static warm caches off. We will take this approach in our production environment, but meanwhile I am curious whether this issue will be addressed: it seems the new/first searchers do not really buy any performance benefit given how much memory they use, especially at core loading time. thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1697609.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: using HTTPClient sending solr ping request wont timeout as specified
Ken, looks like we posted at the same time :-) thanks very much! Renee -- View this message in context: http://lucene.472066.n3.nabble.com/using-HTTPClient-sending-solr-ping-request-wont-timeout-as-specified-tp1691292p1695584.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: using HTTPClient sending solr ping request wont timeout as specified
thanks Michael, I got it resolved last night... you are right, it is more of an HttpClient issue, which I confirmed after trying another link unrelated to Solr. If anyone is interested, here is the working code:

HttpClientParams httpClientParams = new HttpClientParams();
httpClientParams.setSoTimeout(timeout); // socket (read) timeout

// set connection parameters
HttpConnectionManagerParams httpConnectionMgrParams = new HttpConnectionManagerParams();
httpConnectionMgrParams.setConnectionTimeout(timeout); // connection timeout

HttpConnectionManager httpConnectionMgr = new SimpleHttpConnectionManager();
httpConnectionMgr.setParams(httpConnectionMgrParams);

// create httpclient
HttpClient client = new HttpClient(httpClientParams, httpConnectionMgr);
HttpMethod method = new GetMethod(solrReq);

thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/using-HTTPClient-sending-solr-ping-request-wont-timeout-as-specified-tp1691292p1695551.html Sent from the Solr - User mailing list archive at Nabble.com.
using HTTPClient sending solr ping request wont timeout as specified
I am using the following code to send a solr request from a webapp; please note the timeout setting:

HttpClient client = new HttpClient();
HttpMethod method = new GetMethod(solrReq);
method.getParams().setParameter(HttpConnectionParams.SO_TIMEOUT, new Integer(15000));
client.executeMethod(method);
int statcode = method.getStatusCode();
if (statcode == HttpStatus.SC_OK) {
    ... ...

When 'solrReq' is a solr query URL (such as http://[host]:8080/solr/blah/select?q=x) and the server is not responsive, it times out in 15 seconds; however, when 'solrReq' is a solr ping (such as http://[host]:8080/solr/default/admin/ping), it won't time out in 15 seconds; it seems to time out in a few minutes instead. Any ideas or suggestions? thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/using-HTTPClient-sending-solr-ping-request-wont-timeout-as-specified-tp1691292p1691292.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: using HTTPClient sending solr ping request wont timeout as specified
I also added the following timeouts on the connection, still not working:

client.getParams().setSoTimeout(httpClientPingTimeout);
client.getParams().setConnectionManagerTimeout(httpClientPingTimeout);

-- View this message in context: http://lucene.472066.n3.nabble.com/using-HTTPClient-sending-solr-ping-request-wont-timeout-as-specified-tp1691292p1691355.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrade to Solr 1.4, very slow at start up when loading all cores
Hi Yonik, I tried the fix suggested in your comments (using "solr.TrieDateField"), and it loaded up all 130 cores in 1 minute with 1.3GB of memory (a little more than the 1GB when the static warm cache is turned off, and much less than the 6.5GB when using 'solr.DateField'). Will this have any impact on the first query or on performance? I am about to run some benchmark tests and compare with the old data; I will update you. Renee -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1637176.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrade to Solr 1.4, very slow at start up when loading all cores
http://lucene.472066.n3.nabble.com/file/n1617135/solrconfig.xml solrconfig.xml Hi Yonik, I have uploaded our solrconfig.xml file for your reference. We also tried 1.4.1 with the same index data; it took about 30-55 minutes to load up all 130 cores, so it did not help at all. There were no queries running when we tried to load the cores. Since JConsole is not responding at all when this happens, I am not sure if there is any command-line memory profiler I can use to collect information; any suggestions? thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1617135.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrade to Solr 1.4, very slow at start up when loading all cores
Hi Yonik, I attached the solrconfig.xml in my previous post, and we do have firstSearcher and newSearcher hooks. I commented them out, and all 130 cores loaded up in 1 minute, same as in solr 1.3, with total memory at about 1GB. Whereas in 1.3, with the hooks, it took about 6.5GB for the same amount of data. I assume the consequence of commenting out the static warming requests is that the first query against each core will be slower? thanks Renee -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1617263.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrade to Solr 1.4, very slow at start up when loading all cores
Hi Yonik, thanks for your reply. I entered a bug for this at https://issues.apache.org/jira/browse/SOLR-2138. To answer your questions here:

- do you have any warming queries configured?
> no, all autowarmCount values are set to 0 for all caches

- do the cores have documents already, and if so, how many per core?
> yes, 130 cores total; 2 or 3 of them already have 1-2.4 million documents, the others have about 50,000 documents

- are you using the same schema & solrconfig, or did you upgrade?
> yes, absolutely no change

- have you tried finding out what is taking up all the memory (or all the CPU time)?
> yes, JConsole shows that after 70 cores are loaded in about 4 minutes, all 16GB of memory is taken and the rest of the cores load extremely slowly. The memory remains high and never drops. We are in the process of upgrading to 1.4.1

-- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1611030.html Sent from the Solr - User mailing list archive at Nabble.com.
Upgrade to Solr 1.4, very slow at start up when loading all cores
Hi - I posted this problem before but got no response, so I guess I need to post it in the Solr-User forum. Hopefully you can help me with this. We had been running Solr 1.3 for a long time with 130 cores. We just upgraded to Solr 1.4, and now starting Solr takes about 45 minutes. The catalina.log shows Solr loading all the cores very slowly. We optimized; it did not help at all. I ran JConsole to monitor the memory. I noticed the first 70 cores were loaded pretty fast, in 3-4 minutes. But after that, the memory went all the way up to about 15GB (we allocated 16GB to solr), and loading slowed down right there, slower and slower. We use the concurrent GC; JConsole shows only ParNew GCs kicking off, but they don't bring the memory down. With Solr 1.3, all 130 cores loaded in 5-6 minutes. Please let me know if there is a known memory issue with Solr 1.4, or if there is something (configuration) we need to tweak to make it work efficiently in 1.4? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1608728.html Sent from the Solr - User mailing list archive at Nabble.com.