Re: Solr Replication is not Possible on RAMDirectory?
There's an open issue (with a patch!) that enables this, it seems: https://issues.apache.org/jira/browse/SOLR-3911

Erik

On Nov 5, 2012, at 07:41, deniz wrote:

Michael Della Bitta-2 wrote: No, RAMDirectory doesn't work for replication. Use MMapDirectory... it ends up storing the index in RAM, and more efficiently so, plus it's backed by disk. Just be sure not to set a big heap, because MMapDirectory works outside of the heap.

For my tests, I don't think the index ends up in RAM with MMap... I gave 4 GB of heap while using MMap and got a mapping error while indexing... while the index should be around 2 GB, RAM consumption was around 300 MB...

Can anyone explain why RAMDirectory can't be used for replication? I can't see why the master would be set to use RAMDirectory while the replica uses MMap or some other implementation. As far as I understand, SolrCloud does some kind of push from master to replica/slave... so why is it not possible to push from RAM to HDD? If my logic is wrong, can someone please explain all this to me?

- Zeki ama calismiyor... Calissa yapar... (Smart, but he doesn't work... He could if he tried...)

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Replication-is-not-Possible-on-RAMDirectory-tp4017766p4018198.html
Sent from the Solr - User mailing list archive at Nabble.com.
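Michael's advice above boils down to a small solrconfig.xml change. A sketch, assuming the stock solr.MMapDirectoryFactory that ships with Solr 4.x (check the factory names available in your version before copying):

```xml
<!-- Sketch: directoryFactory in solrconfig.xml. MMapDirectory maps the
     on-disk index into virtual memory, so the OS page cache, not the JVM
     heap, holds the hot parts of the index. Keep the heap modest, since
     this memory lives outside it. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>
```

The system-property form leaves the choice overridable at startup (e.g. -Dsolr.directoryFactory=solr.RAMDirectoryFactory for experiments) without editing the config.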
RE: trunk is unable to replicate between nodes ( Unable to download ... completely)
https://issues.apache.org/jira/browse/SOLR-4032

-Original message-
From: Mark Miller markrmil...@gmail.com
Sent: Sat 03-Nov-2012 14:25
To: solr-user@lucene.apache.org
Subject: Re: trunk is unable to replicate between nodes ( Unable to download ... completely)

Likely some of the trunk work around allowing any Directory impl to replicate. JIRA pls :)

- Mark

On Oct 30, 2012, at 12:29 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi, We're testing again with today's trunk and using the new Lucene 4.1 format by default. When nodes are not restarted things are kind of stable, but restarting nodes leads to a lot of mayhem. It seems we can get the cluster back up and running by clearing ZK and restarting everything (another issue), but replication becomes impossible for some nodes, leading to a continuous state of failing recovery etc. Here are some excerpts from the logs:

2012-10-30 16:12:39,674 ERROR [solr.servlet.SolrDispatchFilter] - [http-8080-exec-5] - : null:java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkBounds(Buffer.java:530)
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:218)
at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:91)
at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1065)
at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:932)

2012-10-30 16:10:32,220 ERROR [solr.handler.ReplicationHandler] - [RecoveryThread] - : SnapPull failed :org.apache.solr.common.SolrException: Unable to download _x.fdt completely.
Downloaded 13631488!=13843504
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1237)
at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1118)
at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:716)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:273)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)

2012-10-30 16:12:51,061 WARN [solr.handler.ReplicationHandler] - [http-8080-exec-3] - : Exception while writing response for params: file=_p_Lucene41_0.doc&command=filecontent&checksum=true&generation=6&qt=/replication&wt=filestream
java.io.EOFException: read past EOF: MMapIndexInput(path=/opt/solr/cores/openindex_h/data/index.20121030152234973/_p_Lucene41_0.doc)
at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:100)
at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1065)
at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:932)

Needless to say I'm puzzled, so I'm wondering if anyone has seen this before or has some hints that might help dig further. Thanks, Markus
RE: No lockType configured for NRTCachingDirectory
https://issues.apache.org/jira/browse/SOLR-4033

-Original message-
From: Mark Miller markrmil...@gmail.com
Sent: Sat 03-Nov-2012 14:26
To: solr-user@lucene.apache.org
Subject: Re: No lockType configured for NRTCachingDirectory

I think I've seen it on 4.X as well yesterday. Let's file a JIRA to track looking into it. - Mark

On Oct 31, 2012, at 11:30 AM, Markus Jelsma markus.jel...@openindex.io wrote: That's 5, the actual trunk/

-Original message-
From: Mark Miller markrmil...@gmail.com
Sent: Wed 31-Oct-2012 16:29
To: solr-user@lucene.apache.org
Subject: Re: No lockType configured for NRTCachingDirectory

By trunk do you mean 4X or 5X?

On Wed, Oct 31, 2012 at 7:47 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi, Besides replication issues (see other thread) we're also seeing these warnings in the logs on all 10 nodes and for all cores, using today's or yesterday's trunk.

2012-10-31 11:01:03,328 WARN [solr.core.CachingDirectoryFactory] - [main] - : No lockType configured for NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/cores/shard_h/data lockFactory=org.apache.lucene.store.NativeFSLockFactory@5dd183b7; maxCacheMB=48.0 maxMergeSizeMB=4.0) assuming 'simple'

The factory is configured like:

<config>
  <luceneMatchVersion>LUCENE_50</luceneMatchVersion>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
  ..
</config>

And the locking mechanism is configured like:

<indexConfig>
  ..
  <lockType>native</lockType>
  ..
</indexConfig>

Any idea why it doesn't seem to see my lockType? Thanks, Markus

-- - Mark
RE: trouble instantiating CloudSolrServer
Something was wrong on my machine; removing Nutch's build dir and cleanly rebuilding everything fixed the issue. Thanks

-Original message-
From: Mark Miller markrmil...@gmail.com
Sent: Sat 03-Nov-2012 02:57
To: solr-user@lucene.apache.org
Subject: Re: trouble instantiating CloudSolrServer

I think the maven jars must be out of whack?

On Fri, Nov 2, 2012 at 6:38 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi, We use trunk but got SolrJ 4.0 from Maven. Creating an instance of CloudSolrServer fails because its constructor calls a non-existent LBHttpSolrServer constructor: it attempts to create an instance by passing only an HttpClient. How is LBHttpSolrServer supposed to work without passing a SolrServer URL to it?

public CloudSolrServer(String zkHost) throws MalformedURLException {
  this.zkHost = zkHost;
  this.myClient = HttpClientUtil.createClient(null);
  this.lbServer = new LBHttpSolrServer(myClient);
  this.updatesToLeaders = true;
}

java.lang.NoSuchMethodError: org.apache.solr.client.solrj.impl.LBHttpSolrServer.<init>(Lorg/apache/http/client/HttpClient;[Ljava/lang/String;)V
at org.apache.solr.client.solrj.impl.CloudSolrServer.<init>(CloudSolrServer.java:84)

Thanks, Markus

-- - Mark
RE: Running solr on apache tomcat
I need both the LucidWorks admin and Solr to run on Apache Tomcat.

Thanks & Regards,
Leena Jawale
Software Engineer Trainee BFS BU
Phone No. - 9762658130
Email - leena.jaw...@lntinfotech.com

From: Leena Jawale
Sent: Monday, November 05, 2012 5:37 PM
To: 'solr-user@lucene.apache.org'
Subject: Running solr on apache tomcat

Hi, I have installed LucidWorks Enterprise 2.1.1 and Apache Tomcat 6. I want to run Solr on the Tomcat server. For that I need to deploy the solr.war file via the Tomcat Web Application Manager. But where can I find the .war in the LucidWorks Enterprise installation? Or do I need to make a .war file? How do I make it? Do you have any solution for this?

Thanks & Regards,
Leena Jawale

The contents of this e-mail and any attachment(s) may contain confidential or privileged information for the intended recipient(s). Unintended recipients are prohibited from taking action on the basis of information in this e-mail and using or disseminating the information, and must notify the sender and delete it from their system. L&T Infotech will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in, this e-mail.
Re: Running solr on apache tomcat
You need to contact LucidWorks for support and documentation for their products. This mailing list is for Apache Solr only.

-- Jack Krupansky

-Original Message-
From: Leena Jawale
Sent: Monday, November 05, 2012 7:07 AM
To: solr-user@lucene.apache.org
Subject: Running solr on apache tomcat

Hi, I have installed LucidWorks Enterprise 2.1.1 and Apache Tomcat 6. I want to run Solr on the Tomcat server. For that I need to deploy the solr.war file via the Tomcat Web Application Manager. But where can I find the .war in the LucidWorks Enterprise installation? Or do I need to make a .war file? How do I make it? Do you have any solution for this?

Thanks & Regards,
Leena Jawale
Re: Add new shard will be treated as replicas in Solr4.0?
Not at present. What you're interested in is shard splitting, which is certainly on the roadmap but not implemented yet. To expand the number of shards you'll have to reconfigure, then re-index.

Best, Erick

On Mon, Nov 5, 2012 at 4:09 AM, Zeng Lames lezhi.z...@gmail.com wrote:

Dear All, we have an existing Solr collection, 2 shards, numOfShard is 2, and there are already records in the index files. Now we start another Solr instance with shardId=shard3, and found that Solr treats it as a replica. Checking the ZooKeeper data, we found that the ranges of the shards don't change correspondingly: shard 1 is 0-7fff, while shard 2 is 8000-. Is there any way to add a new shard to an existing collection? thanks a lot! Lames
How to request multiple snippet lengths
Hi, I am trying to find a way to request multiple snippets of different lengths for a single field. I know that this could be accomplished by issuing two separate queries or duplicating the field, but I want to avoid doubling the query load or index size. I'd also rather not resort to writing a custom snippet generator. I am using Solr 4.0 and I thought perhaps the new field aliasing feature would be an ideal solution. However, it appears that the highlighting component doesn't understand this new syntax. For instance, the following request would ideally return two highlighted snippets: a short snippet and the full field:

http://localhost:8983/solr/select?defType=edismax&q=foo&qf=content&fl=id&hl=true&hl.fl=content,snip:content&f.content.hl.fragsize=5&f.snip.hl.fragsize=200&wt=json&indent=true

This results in something like the following:

"highlighting": {
  "12345": {
    "content": { "snippet": ["<em>foo</em> bar baz"] },
    "snip:content": { "snippet": null }
  }
}

From this I gather that the highlighter looks for a field named snip:content and fails. I tried a number of variations with the field alias specified in the fl, qf and f.myalias.qf params, but to no avail. I've been poring over JIRA issues, patches and source code but I can't determine what the proper syntax should be. Does anyone have any ideas on how to go about this?

Thanks, Dave

-- Where there is love, nothing is too much trouble and there is always time. ~ 'Abdu'l-Bahá
Re: Where to get more documents or references about Solr cloud?
Is most of the Web blocked in your location? When I Google SolrCloud, Google says that there are About 61,400 results with LOTS of informative links, including blogs, videos, slideshares, etc. just on the first two pages of search results alone. If you have specific questions, please ask them with specific detail, but try reading a few of the many sources of information available on the Web first.

-- Jack Krupansky

-Original Message-
From: SuoNayi
Sent: Monday, November 05, 2012 3:32 AM
To: solr-user@lucene.apache.org
Subject: Where to get more documents or references about Solr cloud?

Hi all, there is only one entry about Solr cloud on the wiki, http://wiki.apache.org/solr/SolrCloud. I have googled a lot and found no more details about Solr cloud. Or maybe I missed something?
Re: Grouping for categories / performance
Maybe you simply don't have enough heap memory space available to the Java JVM for Solr to do large groups.

-- Jack Krupansky

-Original Message-
From: ilay
Sent: Monday, November 05, 2012 2:20 AM
To: solr-user@lucene.apache.org
Subject: Grouping for categories / performance

Hello all, I have a situation with Solr grouping where I want to group my products into top categories for an e-commerce application. The number of groups here is less than 10 and the total number of docs in the index is 10 million. Will Solr grouping be an issue here? We have seen OOM issues when we tried grouping books by similar editions against the same index. However, if we are grouping by categories, where the number of groups is less than 10, will it still be a problem? Any thoughts on this would be greatly appreciated.

- -Ilay
RE: Does SolrCloud support MoreLikeThis?
There is a ticket for that with some recent activity (sorry I don't have it handy right now), but I'm not sure if that work made it into the trunk, so probably SolrCloud does not support MLT... yet. Would love an update from the dev team though!

--- Original Message ---
On 11/5/2012 10:37 AM Luis Cappa Banda wrote:

That's the question, :-)

Regards,

Luis Cappa.
Re: Does SolrCloud support MoreLikeThis?
Thanks for the answer, Darren! I still have the hope that MLT is supported in the current version. An important feature of the product that I'm developing depends on it, and even though I could emulate MLT with a Dismax or E-dismax component, the thing is that MLT fits and works perfectly...

Regards, Luis Cappa.

2012/11/5 Darren Govoni dar...@ontrenet.com

There is a ticket for that with some recent activity (sorry I don't have it handy right now), but I'm not sure if that work made it into the trunk, so probably SolrCloud does not support MLT... yet. Would love an update from the dev team though!

--- Original Message ---
On 11/5/2012 10:37 AM Luis Cappa Banda wrote:

That's the question, :-)

Regards,

Luis Cappa.
Re: Solr question
It seems this is exactly what I need. From this tutorial http://www.solrtutorial.com/custom-solr-functionquery.html it would seem the main thing I have to do is implement a DocValues class, which gets (Map context, IndexReader reader) and provides methods that return a score for the document given an id. The only thing I am still unsure of is whether I can get the text of the documents by id, but I'm fairly sure I can do that through the reader.

Thanks/hvala, Mladen.
Re: lukeall.jar for Solr4r?
I checked out luke-src-4.0.0-ALPHA.tgz, the most recent I could find, and compiled, but I still get the error Format version not supported (resource MMapIndexInput(path=/var/lib/tomcat6/solr/apache-solr-4.0.0-/core1/data/index/_7.tvx)): 1 needs to be between 0 and 0 Can anyone post a luke.jar capable of reading 4.0 indexes? On 10/27/2012 09:17 PM, Lance Norskog wrote: Aha! Andrzej has not built a 4.0 release version. You need to check out the source and compile your own. http://code.google.com/p/luke/downloads/list - Original Message - | From: Carrie Coyc...@ssww.com | To: solr-user@lucene.apache.org | Sent: Friday, October 26, 2012 7:33:45 AM | Subject: lukeall.jar for Solr4r? | | Where can I get a copy of Luke capable of reading Solr4 indexes? My | lukeall-4.0.0-ALPHA.jar no longer works. | | Thx, | Carrie Coy |
More Like this without a document?
Hi, I'm designing a k-nearest-neighbors classifier for Solr. I am taking information from IMDB and creating a set of documents with the description of each movie and the categories selected for each document. To validate whether the classification is correct I'm using cross-validation, so I do not include in the index the documents whose categories I want to guess. If I want to use the MoreLikeThis algorithm, do I need to add these documents to the index? Will MoreLikeThis work with soft commits? Is there a solution for doing a MoreLikeThis without adding the document to the index?

Thanks, Raimon Bosch.
ANNOUNCE: Stump The Chump @ ApacheCon EU
Hey folks, In 2 days, I'll be doing a Stump The Chump session at ApacheCon EU in Sinsheim, Germany http://www.apachecon.eu/schedule/presentation/170/ If you aren't familiar with Stump The Chump, it is a Q&A-style session where I (the Chump) get put on the hot seat to answer tough / interesting / unusual questions about Lucene & Solr -- live, on stage, in front of hundreds of people who are laughing at me, with judges who have all seen and thought about the questions in advance and get to mock me and make me look bad. It's really a lot of fun. Even if you won't be at the conference, you can still participate by emailing your challenging question to st...@lucene-eurocon.org. (Regardless of whether you already found a solution to a tough problem, you can still submit it and see what kind of creative solution I might come up with under pressure.) Prizes will be awarded at the discretion of the judges, and video should be posted online at some point soon after the con -- more details and links to videos of past sessions are in my recent blog post(s)...

http://searchhub.org/dev/tag/chump/

-Hoss
Re: large text blobs in string field
Gora, currently our core does use multi-valued fields; however, the existing multi-valued fields in the schema will only have 3-10 values. We are thinking of using the text-blob approach primarily because of the large number of possible values in this field. If we were to use a multi-valued field, it is likely that the MV field would have 200+ values and in some edge cases 400+ values. Are you saying that the MV field approach to representing the data (given the scale previously indicated) is the best design solution?
Re: large text blobs in string field
On 5 November 2012 22:26, geeky2 gee...@hotmail.com wrote:

Gora, currently our core does use multi-valued fields; however, the existing multi-valued fields in the schema will only have 3-10 values. We are thinking of using the text-blob approach primarily because of the large number of possible values in this field. If we were to use a multi-valued field, it is likely that the MV field would have 200+ values and in some edge cases 400+ values. Are you saying that the MV field approach to representing the data (given the scale previously indicated) is the best design solution?

Yes. I do not have direct experience with so many values per multi-valued field, but per people who know better, 400-odd values should not be a problem. This is probably better than indexing, retrieving, and parsing a text blob.

Regards, Gora
Re: More Like this without a document?
I wrote something like this for Ultraseek. After the document was parsed and analyzed, I took the top terms (by tf.idf) and did a search, then added fields with the categories. You might be able to use the document analysis request handler for this. Analyze it, then choose terms, do the search, modify the doc, then submit it for indexing. It would get parsed twice, but that might not be a big deal. Warning, this could put a big load on Solr. My implementation really pounded Ultraseek. The queries are long and they don't really match what is in the caches. wunder On Nov 5, 2012, at 8:40 AM, Raimon Bosch wrote: Hi, I'm designing a K-nearest neighbors classifier for Solr. So I am taking information IMDB and creating a set of documents with the description of each movie and the categories selected for each document. To validate if the classification is correct I'm using cross-validation. So I do not include in the index the documents that I want to guess. If I want to use MoreLikeThis algorithm I need to add this documents in the index? The MoreLikeThis will work with soft commits? Is there a solution to do a MoreLikeThis without adding the document in the index? Thanks, Raimon Bosch.
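Wunder's Ultraseek-era approach can be sketched independently of Solr. The snippet below is a toy illustration only: the term counts and document frequencies are passed in as plain objects (a hypothetical stand-in for real index statistics), and the top-k terms by tf·idf are joined into an OR query string:

```javascript
// Toy sketch of "take the top terms by tf.idf and do a search":
// termCounts: {term: tf in the new document}
// docFreq:    {term: number of indexed docs containing the term}
// numDocs:    total docs in the index; k: how many terms to keep.
function topTermsQuery(termCounts, docFreq, numDocs, k) {
  var scored = Object.keys(termCounts).map(function (t) {
    var idf = Math.log(numDocs / (docFreq[t] || 1)); // rare terms score high
    return { term: t, score: termCounts[t] * idf };
  });
  scored.sort(function (a, b) { return b.score - a.score; });
  return scored.slice(0, k).map(function (s) { return s.term; }).join(' OR ');
}
```

As wunder notes, the resulting queries are long and cache-unfriendly, which is exactly why this kind of classification load can pound a search engine.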
RE: DataImport Handler : Transformer Function Eval Failed Error
Looks like it will be helpful. I'm going to give it a shot. Thanks, Otis. Shikhar From: Otis Gospodnetic [otis.gospodne...@gmail.com] Sent: Friday, November 02, 2012 4:36 PM To: solr-user@lucene.apache.org Subject: Re: DataImport Handler : Transformer Function Eval Failed Error Would http://wiki.apache.org/solr/Join do anything for you? Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Fri, Nov 2, 2012 at 10:06 AM, Mishra, Shikhar shikhar.mis...@telcobuy.com wrote: We have a scenario where the same products are available from multiple vendors at different prices. We want to store these prices along with the products in the index (product has many prices), so that we can apply dynamic filtering on the prices at the time of search. Thanks, Shikhar -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Thursday, November 01, 2012 8:13 PM To: solr-user@lucene.apache.org Subject: Re: DataImport Handler : Transformer Function Eval Failed Error Hi, That looks a little painful... what are you trying to achieve by storing JSON in there? Maybe there's a simpler way to get there... Otis -- Performance Monitoring - http://sematext.com/spm On Nov 1, 2012 6:16 PM, Mishra, Shikhar shikhar.mis...@telcobuy.com wrote: Hi, I'm trying to store a list of JSON objects as stored value for the field prices (see below). I'm getting the following error from the custom transformer function (see the data-config file at the end) of data import handler. 
Error Message
-------------

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 'eval' failed with language: JavaScript and script:

function vendorPrices(row){
  var wwtCost = row.get('WWT_COST');
  var listPrice = row.get('LIST_PRICE');
  var vendorName = row.get('VENDOR_NAME');
  //Below approach fails
  var prices = [];
  prices.push({'vendor':vendorName});
  prices.push({'wwtCost':wwtCost});
  prices.push({'listPrice':listPrice});
  row.put('prices':prices);
  //Below approach works
  //row.put('prices', '{' + 'vendor:' + vendorName + ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
  return row;
}

Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)

Data Import Handler Configuration File

<dataConfig>
  <script><![CDATA[
    function vendorPrices(row){
      var wwtCost = row.get('WWT_COST');
      var listPrice = row.get('LIST_PRICE');
      var vendorName = row.get('VENDOR_NAME');
      //Below approach fails
      var prices = [];
      prices.push({'vendor':vendorName});
      prices.push({'wwtCost':wwtCost});
      prices.push({'listPrice':listPrice});
      row.put('prices':prices);
      //Below approach works
      //row.put('prices', '{' + 'vendor:' + vendorName + ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
      return row;
    }
  ]]></script>
  <dataSource driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac-scan.somr.com)(PORT=3465))(CONNECT_DATA=(SERVICE_NAME=ERP_GENERAL.SOMR.ORG)))" user="dummy" password="xx"/>
  <document>
    <entity name="item" query="select * from wwt_catalog.wwt_product prod, wwt_catalog.wwt_manufacturer mfg where prod.mfg_id = mfg.mfg_id and prod.mfg_product_number = 'CON-CBO2-B22HPF'">
      <field column="PRODUCT_ID" name="id" />
      <field column="MFG_PRODUCT_NUMBER" name="name" />
      <field column="MFG_PRODUCT_NUMBER" name="nameSort" />
      <field column="MFG_NAME" name="manu" />
      <field column="MFG_ITEM_NUMBER" name="alphaNameSort" />
      <field column="DESCRIPTION" name="features" />
      <field column="DESCRIPTION" name="description" />
      <entity name="vendor_sources" transformer="script:vendorPrices" query="SELECT PRICE.WWT_COST, PRICE.LIST_PRICE, VEND.VENDOR_NAME, AVAIL.LEAD_TIME, AVAIL.QTY_AVAILABLE FROM wwt_catalog.wwt_product prod, wwt_catalog.wwt_product_pricing price, wwt_catalog.wwt_vendor vend, wwt_catalog.wwt_product_availability avail WHERE PROD.PRODUCT_ID = price.product_id(+) AND price.vendor_id =
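The 'eval' failure above is almost certainly the line row.put('prices':prices) — that colon is object-literal syntax and is invalid inside an argument list, so the script fails to parse. A hedged sketch of a fix: call put with two arguments and serialize the array to a JSON string, since a DIH field value is ultimately stored as text (JSON.stringify assumes the script engine provides the JSON object):

```javascript
// Sketch of a corrected DIH transformer. In Solr the row is a Java Map;
// for this sketch it only needs get() and put(). The key fix: put takes
// (key, value), not the 'key':value form that made eval fail.
function vendorPrices(row) {
  var prices = [
    { vendor: row.get('VENDOR_NAME') },
    { wwtCost: row.get('WWT_COST') },
    { listPrice: row.get('LIST_PRICE') }
  ];
  row.put('prices', JSON.stringify(prices)); // two arguments, not a colon
  return row;
}
```

This keeps the structured data the string-concatenation workaround was approximating, while producing valid JSON that the search client can parse back.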
Re: large text blobs in string field
The only thing special about a multiValued field is that it can have non-consecutive positions due to the increment gap. So, if you set incrementGap=1, adding 10,000,000 words in one go is the same as adding 1 word at a time 10,000,000 times to a multiValued field. I think the only practical limit is that you're _probably_ going to have problems if (total tokens added) + (increment_gap * number of entries) > 2B or so...

FWIW, Erick

On Mon, Nov 5, 2012 at 1:40 PM, geeky2 gee...@hotmail.com wrote:

is there any documented limit (or practical limit) on how many values can go in a multi-valued field?
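Erick's bound is easy to sanity-check with arithmetic. A toy helper (the ~2B ceiling is 2^31 - 1, the Java int position limit; the per-document numbers below are invented for illustration):

```javascript
// Positions consumed by one multiValued field, per Erick's formula:
// total tokens plus one increment gap per value.
function positionsUsed(totalTokens, incrementGap, numValues) {
  return totalTokens + incrementGap * numValues;
}
```

So 400 values of ~50 tokens each, with a gap of 100 (a common schema setting), consume about 60,000 positions per document — nowhere near the limit.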
Re: large text blobs in string field
Erick, thanks for the insight. FWIW, and to add context to this discussion: if we do decide to add the previously mentioned content as a multivalued field, we would likely use a DIH hooked to our database schema (this is currently how we add ALL content to our core) and, within the DIH, use a sub-entity to pull the many rows for each parent row.

thx, mark
Re: Solr 4.0 simultaneous query problem
Hi, So it seems that when I query multiple shards with sort criteria for 5000 documents, Solr queries all shards and gets a list of document ids, and then adds the document ids to the original query and queries all the shards again. This process of joining the query results on the unique ids and getting the remaining fields is turning out to be really slow; it takes a while to search for a list of unique ids. Is there any config change to make this process faster? Also, what does isDistrib=false mean when Solr generates the queries internally?

Thanks, Rohit

On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani rhar...@gmail.com wrote:

Hi, The same query is fired always for 500 rows. The only thing different is the start parameter. The 3 shards are in the same instance on the same server. They all have the same schema, but the inherent type of the documents is different. Also, most of the app's queries go to shard A, which has the smallest index size (4 GB). The query is made to a master shard which by default goes to all 3 shards for results. (Also, the query that I am trying matches documents only in shard A, mentioned above.) Will try debugQuery now and post it here.

Thanks, Rohit

On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi, Maybe you can narrow this down a little further. Are there some queries that are faster and some slower? Is there a pattern? Can you share examples of slow queries? Have you tried debugQuery=true? These 3 shards, is each of them on its own server? Is the slow one always the one that hits the biggest shard? Do they hold the same type of data? How come their sizes are so different?
Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani rhar...@gmail.com wrote: Hi all, I have an application which queries a solr instance having 3 shards(4gb, 13gb and 30gb index size respectively) having 6 million documents in all. When I start 10 threads in my app to make simultaneous queries (with rows=500 and different start parameter, sort on 1 field and no facets) to solr to return 500 different documents in each query, sometimes I see that most of the responses come back within no time (500ms-1000ms), but the last response takes close to 50 seconds (Qtime). I am using the latest 4.0 release. What is the reason for this delay? Is there a way to prevent this? Thanks and regards, Rohit
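The two-phase behavior described above is standard distributed search: phase one collects sorted ids from every shard, phase two fetches stored fields for the merged page. A rough, illustrative cost model (not Solr code) of why large start/rows values hurt — each shard must return start+rows candidates so the global merge can be correct:

```javascript
// Rough cost model for phase one of a distributed query: the number of
// ids the coordinator must merge before it can fetch any stored fields.
function phaseOneIds(numShards, start, rows) {
  return numShards * (start + rows);
}
```

For 3 shards and a deep page (start=5000, rows=500) that is 16,500 ids merged per request, and the cost keeps growing linearly with the start offset.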
Re: Solr 4.0 simultaneous query problem
Don't query for 5000 documents. That is going to be slow no matter how it is implemented. wunder

On Nov 5, 2012, at 1:00 PM, Rohit Harchandani wrote:

Hi, So it seems that when I query multiple shards with sort criteria for 5000 documents, Solr queries all shards and gets a list of document ids, and then adds the document ids to the original query and queries all the shards again. This process of joining the query results on the unique ids and getting the remaining fields is turning out to be really slow; it takes a while to search for a list of unique ids. Is there any config change to make this process faster? Also, what does isDistrib=false mean when Solr generates the queries internally?

Thanks, Rohit

On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani rhar...@gmail.com wrote:

Hi, The same query is fired always for 500 rows. The only thing different is the start parameter. The 3 shards are in the same instance on the same server. They all have the same schema, but the inherent type of the documents is different. Also, most of the app's queries go to shard A, which has the smallest index size (4 GB). The query is made to a master shard which by default goes to all 3 shards for results. (Also, the query that I am trying matches documents only in shard A, mentioned above.) Will try debugQuery now and post it here.

Thanks, Rohit

On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi, Maybe you can narrow this down a little further. Are there some queries that are faster and some slower? Is there a pattern? Can you share examples of slow queries? Have you tried debugQuery=true? These 3 shards, is each of them on its own server? Is the slow one always the one that hits the biggest shard? Do they hold the same type of data? How come their sizes are so different?
Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani rhar...@gmail.com wrote: Hi all, I have an application which queries a solr instance having 3 shards(4gb, 13gb and 30gb index size respectively) having 6 million documents in all. When I start 10 threads in my app to make simultaneous queries (with rows=500 and different start parameter, sort on 1 field and no facets) to solr to return 500 different documents in each query, sometimes I see that most of the responses come back within no time (500ms-1000ms), but the last response takes close to 50 seconds (Qtime). I am using the latest 4.0 release. What is the reason for this delay? Is there a way to prevent this? Thanks and regards, Rohit -- Walter Underwood wun...@wunderwood.org
load balance with SolrCloud
We are using Solr 3.5 in production and we deal with customer data in the terabytes. We use shards for large customers and wrote our own replica management in our software. Now, with the rapid growth of data, we are looking into SolrCloud for the robustness of its sharding and replication. I understand from reading some documents online that there is no SPOF using SolrCloud, so any instance in the cluster can serve queries/indexing. However, is it true that we need to write our own load balancer in front of SolrCloud?

For example, suppose we want to implement a model similar to Loggly's: each customer starts indexing into a small shard of its own; if a customer grows beyond the small shard's limit, we switch to indexing into another small shard (we call it a front-end shard) and meanwhile merge the just-released small shard into the next-level larger shard. Since the merge can happen between two instances on different servers, we probably end up syncing the index files for the merging shards and then using Solr merge. I am curious whether Solr provides anything to help with this kind of strategy for dealing with unevenly growing, large customer data (a core), or do we have to write all of this in our software layer from scratch?

thanks, Jie
Re: lukeall.jar for Solr4r?
On 11/5/2012 9:19 AM, Carrie Coy wrote:

I checked out luke-src-4.0.0-ALPHA.tgz, the most recent I could find, and compiled, but I still get the error "Format version not supported (resource MMapIndexInput(path=/var/lib/tomcat6/solr/apache-solr-4.0.0-/core1/data/index/_7.tvx)): 1 needs to be between 0 and 0". Can anyone post a luke.jar capable of reading 4.0 indexes?

Just for giggles, I downloaded the source for the latest Luke, pulled it into Eclipse, replaced all the 4.0.0-ALPHA jars with 4.0 release jars, then attempted to fix the resulting code problems. By comparing javadocs for 4.0.0-ALPHA, 4.0.0-BETA, and 4.0.0, I was able to get rid of all the error flags. No idea whether I did it right, or even whether it works. All my indexes are either 3.5 or 4.1-SNAPSHOT, so I can't actually test it.

You can get the resulting jar and my patch against the luke-4.0.0-ALPHA source here:
https://dl.dropbox.com/u/97770508/luke-4.0.0-unofficial.patch
https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial.jar

If you have an immediate need for 4.0.0 support in Luke, please try it out and let me know whether it works. If it doesn't work, or when the official Luke 4.0.0 is released, I will remove those files from my Dropbox.

Thanks,
Shawn
Re: lukeall.jar for Solr4r?
On 11/5/2012 2:52 PM, Shawn Heisey wrote:

No idea whether I did it right, or even whether it works. All my indexes are either 3.5 or 4.1-SNAPSHOT, so I can't actually test it. You can get the resulting jar and my patch against the luke-4.0.0-ALPHA source here:
https://dl.dropbox.com/u/97770508/luke-4.0.0-unofficial.patch
https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial.jar
If you have an immediate need for 4.0.0 support in Luke, please try it out and let me know whether it works. If it doesn't work, or when the official Luke 4.0.0 is released, I will remove those files from my dropbox.

I just realized that the version I uploaded there was compiled with Java 1.7.0_09. I don't know if this is actually a problem, but just in case, I re-did the compile on a machine with 1.6.0_29. The filename referenced above now points to the 1.6 build, and I have uploaded a separate file whose name indicates its Java 7 origins:

https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial-java7.jar

Thanks,
Shawn
Re: Add new shard will be treated as replicas in Solr4.0?
hi Erick, thanks for your kind response. I got the information from the SolrCloud wiki; I think we will need to define the shard count up front when we really roll it out. thanks again

On Mon, Nov 5, 2012 at 8:40 PM, Erick Erickson erickerick...@gmail.com wrote:

Not at present. What you're interested in is shard splitting, which is certainly on the roadmap but not implemented yet. To expand the number of shards you'll have to reconfigure, then re-index.

Best
Erick

On Mon, Nov 5, 2012 at 4:09 AM, Zeng Lames lezhi.z...@gmail.com wrote:

Dear All, we have an existing Solr collection with 2 shards (numShards=2), and there are already records in the index files. Now we started another Solr instance with shardId=shard3 and found that Solr treats it as a replica. Checking the ZooKeeper data, we found that the shard ranges don't change correspondingly: shard1 is 0-7fff, while shard2 is 8000-. Is there any way to add a new shard to an existing collection? thanks a lot!

Lames
Re: Add new shard will be treated as replicas in Solr4.0?
btw, where can I find all the items on the roadmap? thanks!

On Tue, Nov 6, 2012 at 8:55 AM, Zeng Lames lezhi.z...@gmail.com wrote:

hi Erick, thanks for your kind response. I got the information from the SolrCloud wiki; I think we will need to define the shard count up front when we really roll it out. thanks again

On Mon, Nov 5, 2012 at 8:40 PM, Erick Erickson erickerick...@gmail.com wrote:

Not at present. What you're interested in is shard splitting, which is certainly on the roadmap but not implemented yet. To expand the number of shards you'll have to reconfigure, then re-index.

Best
Erick

On Mon, Nov 5, 2012 at 4:09 AM, Zeng Lames lezhi.z...@gmail.com wrote:

Dear All, we have an existing Solr collection with 2 shards (numShards=2), and there are already records in the index files. Now we started another Solr instance with shardId=shard3 and found that Solr treats it as a replica. Checking the ZooKeeper data, we found that the shard ranges don't change correspondingly: shard1 is 0-7fff, while shard2 is 8000-. Is there any way to add a new shard to an existing collection? thanks a lot!

Lames
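The ranges in the ZooKeeper data quoted above (shard1 0-7fff, shard2 8000-) fall out of dividing the router's hash space evenly among the shards when the collection is created, which is also why a new shard can't simply be appended later: every existing range would have to move, and the already-indexed documents with them. A sketch of that split (illustrative only; a 16-bit space is used here for readability):

```java
// Illustrative only: how a hash space is split evenly among N shards at
// collection creation. A 16-bit space keeps the hex short; Solr's real
// ranges are wider. With numShards=2 this yields 0-7fff and 8000-ffff,
// matching the style of the ZooKeeper data above -- and appending a third
// shard would force every existing range (and its documents) to change.
public class HashRanges {
    static String[] split(int numShards) {
        String[] ranges = new String[numShards];
        long size = 0x10000L;                            // 16-bit hash space
        for (int i = 0; i < numShards; i++) {
            long start = size * i / numShards;
            long end = size * (i + 1) / numShards - 1;
            ranges[i] = Long.toHexString(start) + "-" + Long.toHexString(end);
        }
        return ranges;
    }
}
```

With numShards=4 the same split yields 0-3fff, 4000-7fff, 8000-bfff, c000-ffff, which is why the shard count has to be decided before indexing (until shard splitting lands).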
Re: Solr Replication is not Possible on RAMDirectory?
Erik Hatcher-4 wrote:
There's an open issue (with a patch!) that enables this, it seems: <https://issues.apache.org/jira/browse/SOLR-3911>

i will check it for sure, thank you Erik :)

Shawn Heisey-4 wrote:
... transparently mapping the files on disk to a virtual memory space and using excess RAM to cache that data and make it fast. If you have enough extra memory (disk cache) to fit the entire index, the OS will never have to read any part of the index from disk more than once

so for disk cache, are there any disks with 1 GB or more of cache? if I'm not wrong, there are mostly 16 or 32 MB cache disks around (or am I checking the wrong stuff?). if so, that amount is definitely too small...

-
Zeki ama calismiyor... Calissa yapar...

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Replication-is-not-Possible-on-RAMDirectory-tp4017766p4018396.html
Re: need help on solr search
I used the mm parameter and it works! Right now I am preparing a perf test. Please share if anybody has a method for optimizing dismax queries.

Thanks!
Jeremy

Otis Gospodnetic-5 wrote:
Hi, Have a look at your solrconfig.xml and look for your default operator. Also look at the docs for the mm parameter on the Wiki. Let us know if that does it for you.

--
View this message in context: http://lucene.472066.n3.nabble.com/need-help-on-solr-search-tp4017191p4018397.html
Re: Solr Replication is not Possible on RAMDirectory?
Shawn Heisey-4 wrote:
... transparently mapping the files on disk to a virtual memory space and using excess RAM to cache that data and make it fast. If you have enough extra memory (disk cache) to fit the entire index, the OS will never have to read any part of the index from disk more than once

so for disk cache, are there any disks with 1 GB or more of cache? if I'm not wrong, there are mostly 16 or 32 MB cache disks around (or am I checking the wrong stuff?). if so, that amount is definitely too small...

I am not talking about the cache on the actual disk drive, or even cache on your hard drive controller. I am talking about the operating system using RAM, specifically RAM not being used by programs, to cache data on your hard drive. All modern operating systems do it, even the one made in Redmond that people love to hate.

If you have 16 GB of RAM and all your programs use up 4.5 GB, you can count on the OS using at least another half GB, so you have about 11 GB left. The OS is going to put data that it reads and writes to/from your disk in this space. If you start up another program that wants 2 GB, the OS will simply throw away 2 GB of data in its cache (it's still on the disk, after all) and give that RAM to the new program. Solr counts on this OS capability for good performance.

Thanks,
Shawn
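Shawn's arithmetic above, written out as a trivial sketch (the 0.5 GB OS overhead is his rule-of-thumb estimate, not a fixed number):

```java
// The page-cache arithmetic from the example above: RAM left for the OS
// disk cache is whatever programs (plus a little OS overhead) don't claim.
public class PageCacheEstimate {
    static double cacheAvailableGb(double totalGb, double programsGb, double osOverheadGb) {
        return totalGb - programsGb - osOverheadGb;
    }

    public static void main(String[] args) {
        System.out.println(cacheAvailableGb(16, 4.5, 0.5)); // 11.0
    }
}
```

So an index of ~11 GB or less could, in this scenario, end up entirely cached in RAM without any RAMDirectory at all.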
Re: Solr Replication is not Possible on RAMDirectory?
Here's some reading: http://en.wikipedia.org/wiki/Page_cache

Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn’t a Game

On Mon, Nov 5, 2012 at 8:02 PM, deniz denizdurmu...@gmail.com wrote:

Erik Hatcher-4 wrote:
There's an open issue (with a patch!) that enables this, it seems: <https://issues.apache.org/jira/browse/SOLR-3911>

i will check it for sure, thank you Erik :)

Shawn Heisey-4 wrote:
... transparently mapping the files on disk to a virtual memory space and using excess RAM to cache that data and make it fast. If you have enough extra memory (disk cache) to fit the entire index, the OS will never have to read any part of the index from disk more than once

so for disk cache, are there any disks with 1 GB or more of cache? if I'm not wrong, there are mostly 16 or 32 MB cache disks around (or am I checking the wrong stuff?). if so, that amount is definitely too small...

-
Zeki ama calismiyor... Calissa yapar...

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Replication-is-not-Possible-on-RAMDirectory-tp4017766p4018396.html
Reply:Re: Where to get more documents or references about solr cloud?
Thanks Jack, and thanks to the great country: all the big famous websites such as Google, Slideshare, Blogspot etc. are blocked. What I want to know is more detail about SolrCloud. Here are my questions:

1. Can we control the relocation of a shard/replica dynamically?
2. Can we move a shard between Solr instances?
3. Is one Solr instance tied to one shard/replica?
4. What is the sharding-key algorithm?
5. Does it support custom sharding keys?

At 2012-11-05 20:44:46, Jack Krupansky j...@basetechnology.com wrote:

Is most of the Web blocked in your location? When I Google SolrCloud, Google says that there are about 61,400 results with LOTS of informative links, including blogs, videos, slideshares, etc., just on the first two pages of search results alone. If you have specific questions, please ask them with specific detail, but try reading a few of the many sources of information available on the Web first.

-- Jack Krupansky

-Original Message- From: SuoNayi Sent: Monday, November 05, 2012 3:32 AM To: solr-user@lucene.apache.org Subject: Where to get more documents or references about solr cloud?

Hi all, there is only one entry about SolrCloud on the wiki: http://wiki.apache.org/solr/SolrCloud. I have googled a lot and found no more details about SolrCloud; maybe I am missing something?
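On question 4: in Solr 4.0 a document is routed by hashing its uniqueKey and picking the shard whose range (as stored in ZooKeeper) contains the hash. The principle can be sketched as follows; the hash below is a deliberate stand-in, NOT Solr's actual hash function, and the 16-bit space is likewise only illustrative:

```java
// Illustration of hash-based document routing (question 4). The hash here
// is a stand-in for demonstration only -- Solr uses its own internal hash.
public class Router {
    static int shardFor(String uniqueKey, int numShards) {
        int hash = uniqueKey.hashCode() & 0xFFFF;        // stand-in 16-bit hash
        long size = 0x10000L;
        // Map the hash onto [0, numShards): the shard whose range contains it.
        return (int) (hash * (long) numShards / size);
    }
}
```

The important property is that the assignment is deterministic: the same key always lands on the same shard, so queries and updates for a document agree on where it lives without any lookup table.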
Re: How to change the boost of fields in edismx at runtime
Thanks Hoss. Yes, that approach would work, since I can change the query. Is there a way to extend the edismax handler to read a config file at startup, and then use an event like commit to instruct it to re-read the config file? That way my boost params would live only in the Solr servers' config files, and if I needed to change them I would just change the file and wait for a commit to re-read it. Any inputs?

-Saroj

On Thu, Nov 1, 2012 at 2:50 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Then, if I find that results are not to my liking, I would like to
: change the boosts as follows:
:
: - Title - boosted to 2
: - Keyword - boosted to 10
:
: Is there any way to change these boosts at run time, without having to
: restart Solr with new boosts in edismax?

edismax field boosts (specified in the qf and pf params) can always be specified at runtime -- first and foremost they are query params. When you put them in your solrconfig.xml file, they just act as defaults (or invariants, or appends) for those query params.

-Hoss
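Since qf is an ordinary request param, as Hoss notes, one alternative to extending the handler is to keep the boosts on the client side and send them with every request. A sketch of building such a request string with only the standard library; the handler path and field names are just examples:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Sketch: send edismax field boosts as query params instead of baking them
// into solrconfig.xml. Field names and the /select path are examples only.
public class EdismaxBoosts {
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e);                 // UTF-8 always exists
        }
    }

    static String buildQuery(String q, Map<String, String> boosts) {
        StringBuilder qf = new StringBuilder();
        for (Map.Entry<String, String> e : boosts.entrySet()) {
            if (qf.length() > 0) qf.append(' ');
            qf.append(e.getKey()).append('^').append(e.getValue());
        }
        return "/select?defType=edismax"
                + "&q=" + enc(q)
                + "&qf=" + enc(qf.toString());
    }
}
```

With boosts {title=2, keyword=10} this produces `/select?defType=edismax&q=solr&qf=title%5E2+keyword%5E10`, so changing the boosts means changing one client-side map rather than touching solrconfig.xml at all.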
Re: Splitting index created from a csv using solr
Thanks for the reply, Gora. I just wanted to know if Solr could do it by itself; from your answer I can see it's not possible. So what do you think is the best way to split it? I mean, should I use Luke to split the index, or should I split the CSV and index that?

@Walter: thank you sir, I don't have a unix environment though.

--
View this message in context: http://lucene.472066.n3.nabble.com/Splitting-index-created-from-a-csv-using-solr-tp4018195p4018427.html
Re: Splitting index created from a csv using solr
On 6 November 2012 10:52, mitra mitra.re...@ornext.com wrote:

Thanks for the reply Gora. I just wanted to know if Solr could do it by itself; from your answer I can see it's not possible.

Yes, this is not a common use case.

So what do you think is the best way to split it? I mean, should I use Luke to split the index or should I split the CSV and index it? [...]

Walter already covered that. It is better to split the CSV.

What OS are you using? I am not familiar with Windows, but I am sure there are tools that do the equivalent of split. You would have better luck asking elsewhere.

Regards,
Gora
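Since splitting the CSV is the recommended route and there is no unix split(1) on Windows, a few lines of code work too. A sketch that chunks the rows while repeating the header line in every chunk, so each piece can be posted to Solr independently (assumes a simple CSV with no newlines embedded in quoted fields):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a split(1) substitute for CSV: cut the rows into fixed-size
// chunks, copying the header row into each chunk so every piece remains a
// valid CSV that Solr's CSV handler can ingest on its own.
public class CsvSplitter {
    static List<List<String>> split(List<String> lines, int rowsPerChunk) {
        String header = lines.get(0);
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 1; i < lines.size(); i += rowsPerChunk) {
            List<String> chunk = new ArrayList<>();
            chunk.add(header);
            chunk.addAll(lines.subList(i, Math.min(i + rowsPerChunk, lines.size())));
            chunks.add(chunk);
        }
        return chunks;
    }
}
```

Each returned chunk would then be written to its own file and indexed separately; quoted fields containing newlines would need a real CSV parser instead of line-based splitting.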
SolrCloud - configuration management in ZooKeeper
ZooKeeper manages not only the cluster state, but also the common configuration files. My question is, what are the exact rules of precedence? That is, when will a Solr node decide to download new configuration files?

- Will configuration files be updated from ZooKeeper every time the core is reloaded?
- What if bootstrapping is configured (bootstrap_confdir)? Will the node always try to upload?
- What are the best practices for a production environment? Is it better to use an external tool (ZkCLI) to trigger configuration changes?

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-configuration-management-in-ZooKeeper-tp4018432.html
Problem with ping handler, SolrJ 4.1-SNAPSHOT, Solr 3.5.0
If I send a ping request to Solr 4.1 from SolrJ 4.1, it works. I don't have an exact revision number from branch_4x; I don't know how to get it from SolrJ. The 4.1 server is running solr-impl 4.1-SNAPSHOT 1401798M with the patch from SOLR-1972 applied, and it's somewhat newer than SolrJ. Java code snippets:

private static final String PING_HANDLER = "/admin/ping";
query.setRequestHandler(PING_HANDLER);
response = _querySolr.query(query);

If I use this exact same code to talk to a Solr 3.5.0 server (older version of the SOLR-1972 patch applied) with the ping handler in the enabled state, I get the following exception. The /admin/ping handler works in a browser on both Solr versions:

Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:98)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
    at com.newscom.common.solr.Core.ping(Core.java:396)
    ... 4 more
Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109)
    at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:384)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
    ... 6 more

If I use the XMLResponseParser instead, then I get a different exception:

Caused by: org.apache.solr.common.SolrException: parsing error
    at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:143)
    at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:104)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:384)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
    at com.newscom.common.solr.Core.ping(Core.java:398)
    ... 4 more
Caused by: java.lang.Exception: really needs to be response or result. not:html
    at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:134)
    ... 10 more

I'm already dealing with the expected Service Unavailable exception; this is something different. Is it going to be possible to make this work? Should I file an issue in Jira? Is the problem in the newer SolrJ or in the older Solr? At this time I do not really need the information in the response; I just need to be able to judge success (Solr is up and working) by nothing being thrown, and to be able to look into any exception thrown to see whether I've got a disabled handler or an error condition.

Thanks,
Shawn
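One hint hiding in the first exception above: a javabin response begins with a version byte (2 for the current format), and 60 is the ASCII code of '<'. In other words, the 3.5 server is answering with an HTML or XML page rather than a javabin stream at all, which also matches the XMLResponseParser complaint about "not:html". A sketch of that first-byte check:

```java
import java.nio.charset.StandardCharsets;

// Sketch: what "Invalid version (expected 2, but 60)" is really telling you.
// Javabin streams start with version byte 2; byte 60 is '<', i.e. the server
// sent markup (an HTML error page or XML response) instead of javabin.
public class ResponseSniffer {
    static String classify(byte[] response) {
        if (response.length == 0) return "empty";
        if (response[0] == 2) return "javabin v2";
        if (response[0] == '<') return "markup (HTML/XML)";
        return "unknown (first byte " + response[0] + ")";
    }

    public static void main(String[] args) {
        System.out.println(classify("<html><body>error</body></html>"
                .getBytes(StandardCharsets.UTF_8))); // prints: markup (HTML/XML)
    }
}
```

So the question becomes why the 3.5 server returns a markup page for this request; capturing the raw response body (rather than letting the parser fail) would show whether it is an error page or a well-formed XML response the binary parser simply cannot read.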