Re: how many shards required to search data
As you suggested, I have indexed 12 million sample records in Solr on hardware with 8GB RAM. The size of the index is 3GB. Can I extrapolate this to predict the actual size of the full index? -- View this message in context: http://lucene.472066.n3.nabble.com/how-many-shards-required-to-search-data-tp4118715p4118753.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how many shards required to search data
On 2/21/2014 1:39 AM, search engn dev wrote:
> As you suggested, I have indexed 12 million sample records in Solr on hardware with 8GB RAM. The size of the index is 3GB. Can I extrapolate this to predict the actual size of the full index?

If the sizes of those records are about the same as the records in the system as a whole, you can probably use that to extrapolate. Based on that, I would guess that the index is probably going to be about 85GB. That's a lot less than I would have guessed, so perhaps there's a lot of extra stuff in that 250GB that doesn't actually get sent to Solr. Even though the documents are small, their sheer number will probably require a larger Java heap than the relatively small index size would normally suggest.

Do you have any notion of what kind of query volume you're going to have? If it's low, you can put multiple shards on your multi-CPU machines and take advantage of parallel processing. If the query volume is high, you'll need all those CPUs to handle the load of one shard, and you might need more than two machines for each shard.

You'll want to shard your index even though it's relatively small in terms of disk space, because a billion documents is a LOT. If you're just starting out, SolrCloud is probably a good way to go: it handles document routing across shards for you. You didn't say whether that was your plan or not.

Thanks, Shawn
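The extrapolation discussed above is just linear scaling by document count, which only holds if the sample records are representative of the whole corpus. A minimal sketch (the 12M-doc/3GB sample figures come from the thread; the 1-billion-doc total is the corpus size mentioned later; real index sizes often diverge from this naive estimate, as Shawn's lower 85GB guess suggests):

```python
# Naive linear extrapolation of Solr index size from a sample index.
def estimate_index_gb(sample_docs: int, sample_gb: float, total_docs: int) -> float:
    """Scale the sample index size by the ratio of total to sample docs."""
    return sample_gb * (total_docs / sample_docs)

# 12 million sample docs produced a 3GB index; assume ~1 billion docs total.
print(estimate_index_gb(12_000_000, 3.0, 1_000_000_000))  # 250.0 (GB)
```

Treat the result as an upper-bound starting point for capacity planning, not a prediction.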
RE: Grouping performance improvement
Thanks Alexey for giving some really good points. Just to make sure I get it right, are you suggesting: 1. facet on category first; let's say I get 10 distinct categories. 2. do another query where q is the search query and fq is set to one of the facet category values. Maybe I'm missing something, but I'm not sure how to get facets along with, let's say, 5 documents under each facet value.
in XML Node Getting Error
Hi, I am getting an error if any field in the XML file has "&" as a value. How can I fix this issue? FYI, I changed "&" to "&amp;" in the field but it still has issues, e.g.

  <field name="Name">AT&T</field>
  <field name="Name">AT&amp;T</field>

Both of the above give errors. Do I need to change something in the configuration? Thanks, Ravi
ZK connection problems
I’ve been experimenting with SolrCloud configurations in AWS. One issue I’ve been plagued with is that during indexing, occasionally a node decides it can’t talk to ZK, and this disables updates in the pool. The node usually recovers within a second or two. It’s possible this happens when I’m not indexing too, but I’m much less likely to notice. I’ve seen this with multiple sharding configurations and multiple cluster sizes. I’ve searched around, and I think I’ve addressed the usual resolutions when someone complains about ZK and Solr. I’m using: * 60-sec ZK connection timeout (although this seems like a pretty terrible requirement) * Independent 3-node ZK cluster, also in AWS. * Solr 4.6.1 * Optimized GC settings (and I’ve confirmed no GC pauses are occurring) * 5-min auto-hard-commit with openSearcher=false I’m indexing some 10K docs/sec using CloudSolrServer, but the CPU usage on the nodes doesn’t exceed 20%, typically it’s around 5%. Here is the relevant section of logs from one of the nodes when this happened: http://pastebin.com/K0ZdKmL4 It looks like it had a connection timeout, and tried to re-establish the same session on a connection to a new ZK node, except the session had also expired. It then closes *that* connection, changes to read-only mode, and eventually creates a new connection and new session which allows writes again. Can anyone familiar with the ZK connection/session stuff comment on whether this is a bug? I really know nothing about proper ZK client behaviour. Thanks.
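For reference, the 60-second ZK connection timeout mentioned above is normally controlled by zkClientTimeout. In a new-style Solr 4.x solr.xml it looks roughly like this (a sketch, not the full file; the 60000ms value is taken from the post):

```xml
<solr>
  <solrcloud>
    <!-- ZooKeeper session timeout in ms; overridable with -DzkClientTimeout -->
    <int name="zkClientTimeout">${zkClientTimeout:60000}</int>
  </solrcloud>
</solr>
```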
Re: in XML Node Getting Error
Ravi, What's the error you're getting? Thanks, Greg

On Feb 21, 2014, at 11:08 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:
> Hi, I am getting an error if any field in the XML file has "&" as a value. How can I fix this issue? FYI, I changed "&" to "&amp;" in the field but it still has issues, e.g. <field name="Name">AT&T</field> or <field name="Name">AT&amp;T</field>. Both of the above give errors. Do I need to change something in the configuration? Thanks, Ravi
RE: in XML Node Getting Error
I am getting something like:

  ERROR org.apache.solr.core.SolrCore [com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp"

The field content is "&nbsp;" or "&amp;".

-----Original Message----- From: Greg Walters [mailto:greg.walt...@answers.com] Sent: Friday, February 21, 2014 12:16 PM To: solr-user@lucene.apache.org Subject: Re: in XML Node Getting Error

> Ravi, What's the error you're getting? Thanks, Greg
RE: in XML Node Getting Error
: ERROR org.apache.solr.core.SolrCore [com.ctc.wstx.exc.WstxLazyException]
: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp"

&nbsp; is not a legal XML entity unless you have an entity declaration that defines it. It sounds like you don't have valid XML -- maybe you have some HTML that someone cut/pasted into a file that they called XML but isn't really XML.

You said "the field in the xml file", suggesting that someone/something attempted to build up a file containing the XML messages for adding documents to Solr -- what software created this file? If it's just doing string manipulation to try to hack together some XML, you're going to keep running into pain. You really want to be using a true XML library to generate correct XML.

Alternatively: don't generate files, or even XML at all -- make that software use a client API to talk directly to Solr via Java objects, or JSON, or CSV, etc.

-Hoss http://www.lucidworks.com/
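To illustrate the "use a true XML library" advice, here is a minimal sketch (in Python, using the standard library; any real XML library in any language behaves the same way) that builds a Solr add document -- note that the "&" in the field value is escaped automatically, so the AT&T problem from this thread never arises:

```python
# Build a Solr <add> document with a real XML library instead of
# string concatenation; special characters are escaped for us.
import xml.etree.ElementTree as ET

add = ET.Element("add")
doc = ET.SubElement(add, "doc")
field = ET.SubElement(doc, "field", name="Name")
field.text = "AT&T"  # the serializer emits AT&amp;T automatically

xml_str = ET.tostring(add, encoding="unicode")
print(xml_str)  # <add><doc><field name="Name">AT&amp;T</field></doc></add>
```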
Re: in XML Node Getting Error
On 2/21/2014 10:31 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) wrote:
> I am getting something like: ERROR org.apache.solr.core.SolrCore [com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxParsingException: Undeclared general entity "nbsp". The field content is "&nbsp;" or "&amp;".

If you have &nbsp; entities, then it's not actually XML; it's a hybrid of HTML and XML. There are exactly five predefined entities in XML, and nbsp isn't one of them: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

You'll need to clean up the XML. As far as I know, there is no way to declare a permissive mode; Solr uses standard and common XML libraries.

Thanks, Shawn
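One way to do the cleanup Shawn describes is to decode the stray HTML entities into plain characters and then re-escape the text using only XML's five predefined entities. A sketch with the Python standard library (the function name is hypothetical; the input string is modeled on this thread's data):

```python
# Turn HTML-entity-laden text into text that is safe inside XML:
# html.unescape resolves entities like &nbsp; into real characters,
# and saxutils.escape re-escapes only what XML actually requires.
import html
from xml.sax.saxutils import escape

def html_to_xml_text(s: str) -> str:
    return escape(html.unescape(s))

cleaned = html_to_xml_text("AT&amp;T&nbsp;stores")
# &nbsp; becomes a literal non-breaking space (U+00A0), which is legal XML.
```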
Best way to get results ordered
Hi all, we are using SolR 4.4.0 and planning to migrate to 4.6.1 very soon. We are looking for a way to get results ordered in a certain way. For example, we are querying by IDs this way: q=id:A OR id:C OR id:B and we want the results to be sorted as A, C, B. Is there a good way to do this with SolR, or should we sort the items on the client application side? Regards, Metin
Re: Best way to get results ordered
Hi Metin, How many IDs are you supplying in a single query? You could probably accomplish this easily with boosts if there were few.

Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. The Science of Influence Marketing 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/

On Fri, Feb 21, 2014 at 1:25 PM, OSMAN Metin metin.os...@canal-plus.com wrote:
> Hi all, we are using SolR 4.4.0 and planning to migrate to 4.6.1 very soon. We are looking for a way to get results ordered in a certain way. For example, we are querying by IDs this way: q=id:A OR id:C OR id:B and we want the results to be sorted as A, C, B. Is there a good way to do this with SolR, or should we sort the items on the client application side? Regards, Metin
RE: Best way to get results ordered
Thank you Michael, this applies to anywhere from 5 to about 60 contents. We have already tried boosts, but the results were not sorted correctly every time. Maybe our boost coefficients were not set properly, but I thought there would be a proper way to do this.

Metin OSMAN Canal+ || DTD - VOD 01 71 35 02 70

-----Original Message----- From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] Sent: Friday, February 21, 2014 19:28 To: solr-user@lucene.apache.org Subject: Re: Best way to get results ordered

> Hi Metin, How many IDs are you supplying in a single query? You could probably accomplish this easily with boosts if there were few.
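The client-side alternative Metin mentions is straightforward and deterministic for 5-60 IDs: keep the requested ID list and reorder the returned documents against it. A minimal sketch (function and field names are illustrative; docs are shown as plain dicts rather than a specific Solr client's response type):

```python
# Reorder Solr results to match the order of the requested IDs.
def reorder(docs, requested_ids, key="id"):
    rank = {doc_id: i for i, doc_id in enumerate(requested_ids)}
    # Unknown IDs (if any) sort to the end rather than raising.
    return sorted(docs, key=lambda d: rank.get(d[key], len(rank)))

docs = [{"id": "B"}, {"id": "A"}, {"id": "C"}]
print(reorder(docs, ["A", "C", "B"]))  # [{'id': 'A'}, {'id': 'C'}, {'id': 'B'}]
```

This avoids tuning boost coefficients entirely, at the cost of one extra pass over the result page.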
search across cores
If I want to search across cores, can I use (abuse?) the distributed search? My simple experiment seems to confirm this, but I'd like to know if there are any drawbacks other than those of distributed search listed here: https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations If all cores are served by the same machine, does a distributed search actually make sub-search requests over HTTP? Or is it clever enough to skip the HTTP connection? Kuro
hardcommit setting in solrconfig
Hello, We have the following hard commit setting in solrconfig.xml:

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
      <maxDocs>100000</maxDocs>
      <openSearcher>true</openSearcher>
    </autoCommit>
  </updateHandler>

Shouldn't we see "DirectUpdateHandler2; start commit" and "DirectUpdateHandler2; end_commit_flush" messages in our log at least every ten minutes? I understand that if we have more than 100K documents to commit, the hard commit could happen earlier than 10 minutes. But we see hard commits spaced out by more than 20 to 30 minutes and sometimes a couple of hours. Can you please explain this behavior? Thanks!
Re: hardcommit setting in solrconfig
: Shouldn't we see "DirectUpdateHandler2; start commit" and
: "DirectUpdateHandler2; end_commit_flush" messages in our log at least every
: ten minutes? I understand that if we have more than 100K documents to
: commit, the hard commit could happen earlier than 10 minutes. But we see
: hard commits spaced out by more than 20 to 30 minutes and sometimes a
: couple of hours. Can you please explain this behavior?

autoCommits only happen if needed -- if you start up your server and 20 minutes go by without any updates that need to be committed, there won't be a commit. If, after 20 minutes of uptime, you send a single document, then with your autoCommit setting of 10 minutes, at most 10 more minutes will elapse before a commit happens automatically. If you explicitly commit before the 10 minutes are up, no auto committing will happen.

-Hoss http://www.lucidworks.com/
Re: search across cores
On 2/21/2014 2:15 PM, T. Kuro Kurosaka wrote: If I want to search across cores, can I use (abuse?) the distributed search? My simple experiment seems to confirm this but I'd like to know if there is any drawbacks other than those of distributed search listed here? https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations If all cores are served by the same machine, does a distributed search actually make sub-search requests over HTTP? Or is it clever enough to skip the HTTP connection? As long as the cores use the same schema, or at least have enough fields in common, searching across multiple cores with the shards parameter will work just fine. You would need the uniqueKey field to have the same name and underlying type on all cores, and any fields that you are searching would also have to be in all the cores. It does make subrequests with HTTP. If the address that is being contacted is local, the connection is very fast and does not actually go out on the network, so it has very low overhead. Thanks, Shawn
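A cross-core search with the shards parameter is just an ordinary select request listing each core's endpoint. A sketch building such a URL (host, port, and core names are hypothetical; shard entries omit the http:// scheme, per the distributed search wiki page linked above):

```python
# Construct a distributed search request across two local cores.
from urllib.parse import urlencode

shards = ",".join([
    "localhost:8983/solr/core1",
    "localhost:8983/solr/core2",
])
params = urlencode({"q": "title:solr", "shards": shards})
url = "http://localhost:8983/solr/core1/select?" + params
print(url)
```

The core the request is sent to merely coordinates; results come back merged from both cores.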
Re: hardcommit setting in solrconfig
On 2/21/2014 2:34 PM, Joshi, Shital wrote:
> <updateHandler class="solr.DirectUpdateHandler2"> <updateLog> <str name="dir">${solr.ulog.dir:}</str> </updateLog> <autoCommit> <maxTime>${solr.autoCommit.maxTime:600000}</maxTime> <maxDocs>100000</maxDocs> <openSearcher>true</openSearcher> </autoCommit> </updateHandler>
> Shouldn't we see "DirectUpdateHandler2; start commit" and "DirectUpdateHandler2; end_commit_flush" messages in our log at least every ten minutes? I understand that if we have more than 100K documents to commit, the hard commit could happen earlier than 10 minutes. But we see hard commits spaced out by more than 20 to 30 minutes and sometimes a couple of hours. Can you please explain this behavior?

The autoCommit will not happen if you haven't indexed anything since the last commit. As I understand it, the timer and document counter don't actually start until the moment you send an update request (add, update, or delete). If no updates have come in, they are turned off once the commit completes. Are you seeing this happen even when there are no delays between updates?

Thanks, Shawn
How long do commits take?
In solr 3.6, strictly using log files (catalina.out), how can I determine how long a commit operation takes? I don’t see a QTime to help me out as in optimize… No doubt, it’s staring me in the face but I can’t figure it out. Thanks in advance, Bill
Re: How long do commits take?
On 2/21/2014 4:26 PM, William Tantzen wrote:
> In solr 3.6, strictly using log files (catalina.out), how can I determine how long a commit operation takes? I don’t see a QTime to help me out as with optimize… No doubt, it’s staring me in the face but I can’t figure it out.

Here's a log entry from Solr 4.6.1:

  INFO - 2014-02-21 17:09:04.837; org.apache.solr.update.processor.LogUpdateProcessor; [s1live] webapp=/solr path=/update params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=true} {commit=} 0 4698

The QTime value here is 4698 milliseconds. I no longer have a 3.x server I can look at.

Thanks, Shawn
Re: How long do commits take?
On 2/21/2014 5:15 PM, Shawn Heisey wrote:
> Here's a log entry from Solr 4.6.1: INFO - 2014-02-21 17:09:04.837; org.apache.solr.update.processor.LogUpdateProcessor; [s1live] webapp=/solr path=/update params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=true} {commit=} 0 4698 The QTime value here is 4698 milliseconds. I no longer have a 3.x server I can look at.

It was bugging me, not knowing what 3.x says. I pulled down the lucene_solr_3_6 branch, built the example, fired it up, and then sent a commit request to the update handler on collection1:

  http://server:8983/solr/collection1/update?commit=true

I got the following in the logs:

  Feb 21, 2014 5:25:17 PM org.apache.solr.update.processor.LogUpdateProcessor finish
  INFO: {commit=} 0 12
  Feb 21, 2014 5:25:17 PM org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/update params={commit=true} status=0 QTime=12

Thanks, Shawn
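Since the original question was about answering this strictly from log files, the QTime can be pulled out of lines like the ones above with a one-line regex (a sketch; the sample line is the Solr 3.6 log entry from this message):

```python
# Extract the commit duration (QTime, in ms) from a Solr 3.x log line.
import re

line = 'INFO: [] webapp=/solr path=/update params={commit=true} status=0 QTime=12'
m = re.search(r'QTime=(\d+)', line)
print(int(m.group(1)))  # 12
```

The same pattern works on grep output over catalina.out, e.g. filtering first for path=/update lines.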
Fwd: help on edismax_dynamic fields
Hello, I am using the edismax parser in my project. I just wanted to confirm whether we can use dynamic fields with edismax or not. When I use a specific dynamic field in the qf or pf parameter, it works. But when I use dynamic fields with *, like this:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">text</str>
      <str name="defType">edismax</str>
      <str name="qf">
        *_nlp_new_sv^0.8
        *_nlp_copy_sv^0.2
      </str>
    </lst>
  </requestHandler>

it does not work. Is it possible to use dynamic fields with *, as above, with edismax? Please give me some pointers on this. Thanks in advance.