Order by an expression in Solr
In SQL you can order by an expression like:

SELECT * FROM TABLE1
ORDER BY (
  CASE
    WHEN COL1 BETWEEN ${PARAM1} - 10 AND ${PARAM1} + 10 AND COL2=${PARAM2} THEN 1
    WHEN COL1 BETWEEN ${PARAM1} - 10 AND ${PARAM1} + 10 AND COL3=${PARAM2} THEN 2
    WHEN COL1 - COL3=${PARAM3} THEN 3
    WHEN COL4 LIKE '${PARAM4}%' THEN 4
  END
)

How can I do that in Solr? Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Order-by-an-expression-in-Solr-tp4079270.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: dataimporter, custom fields and parsing error
they are in my schema, path is typed correctly, the others are default fields which already exist. all the other fields are populated and i can search for them, just path and text aren't.

On 19. Jul 2013, at 6:16 PM, Alexandre Rafalovitch wrote:

Dumb question: they are in your schema? Spelled right, in the right section, using types also defined? Can you populate them by hand with a CSV file and post.jar?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Fri, Jul 19, 2013 at 12:09 PM, Andreas Owen <a...@conx.ch> wrote:

i'm using solr 4.3 which i just downloaded today and am using only jars that came with it. i have enabled the dataimporter and it runs without error. but the field path (included in schema.xml) and text (file content) aren't indexed. what am i doing wrong?

solr-path: C:\ColdFusion10\cfusion\jetty-new
collection-path: C:\ColdFusion10\cfusion\jetty-new\solr\collection1
pdf-doc-path: C:\web\development\tkb\internet\public

data-config.xml:

<dataConfig>
  <dataSource type="BinFileDataSource" name="data"/>
  <dataSource type="BinURLDataSource" name="dataUrl"/>
  <dataSource type="URLDataSource" baseUrl="http://127.0.0.1/tkb/internet/" name="main"/>
  <document>
    <entity name="rec" processor="XPathEntityProcessor" url="docImportUrl.xml" forEach="/albums/album" dataSource="main"> <!-- transformer="script:GenerateId" -->
      <field column="title" xpath="//title" />
      <field column="id" xpath="//file" />
      <field column="path" xpath="//path" />
      <field column="Author" xpath="//author" />
      <!-- <field column="tstamp">2013-07-05T14:59:46.889Z</field> -->
      <entity name="tika" processor="TikaEntityProcessor" url="../../../../../web/development/tkb/internet/public/${rec.path}/${rec.id}" dataSource="data">
        <field column="text" />
      </entity>
    </entity>
  </document>
</dataConfig>

docImportUrl.xml:

<?xml version="1.0" encoding="utf-8"?>
<albums>
  <album>
    <author>Peter Z.</author>
    <title>Beratungsseminar kundenbrief</title>
    <description>wie kommuniziert man</description>
    <file>0226520141_e-banking_Checkliste_CLX.Sentinel.pdf</file>
    <path>download/online</path>
  </album>
  <album>
    <author>Marcel X.</author>
    <title>kuchen backen</title>
    <description>torten, kuchen, gebäck ...</description>
    <file>Kundenbrief.pdf</file>
    <path>download/online</path>
  </album>
</albums>
Re: dataimporter, custom fields and parsing error
Are the path and text fields set to stored in the schema.xml?

On Sat, Jul 20, 2013 at 3:37 PM, Andreas Owen <a...@conx.ch> wrote:

they are in my schema, path is typed correctly, the others are default fields which already exist. all the other fields are populated and i can search for them, just path and text aren't.

--
Regards,
Shalin Shekhar Mangar.
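For comparison, the path and text entries in schema.xml would need at least indexed="true" to be searchable (and stored="true" to show up in results). A minimal sketch only; the string and text_general field types are assumptions, not taken from the poster's schema:

  <field name="path" type="string" indexed="true" stored="true"/>
  <field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>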
Re: Order by an expression in Solr
Sorry, but Solr doesn't perform SQL-like operations. But please rephrase your query in simple, plain English, and we'll be happy to suggest approaches using Solr.

Starting principle: When you think Solr, think NoSQL. You're still thinking SQL!

That said, generally, SQL "order by" refers to sorting, and Solr does have a sort parameter:
http://wiki.apache.org/solr/CommonQueryParameters#sort

-- Jack Krupansky

-----Original Message-----
From: cmd.ares
Sent: Saturday, July 20, 2013 2:51 AM
To: solr-user@lucene.apache.org
Subject: Order by an expression in Solr
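As a rough illustration of sorting by an expression, the sort parameter accepts function queries. The sketch below covers only the numeric-range part of the original CASE; the field name col1 and the literal 100 (standing in for ${PARAM1}) are placeholder assumptions, not a full translation of the SQL:

  q=*:*&sort=map(abs(sub(col1,100)),0,10,1,0) desc,score desc

Here map(x,min,max,target,default) yields 1 when abs(col1-100) falls within [0,10] and 0 otherwise, so the documents in the range sort first.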
RE: custom field type plugin
Sorry, I accidentally hit send somehow. Here is my ant build.xml for building the CustomPlugins jar file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project basedir="." default="jar" name="CustomPlugins">
  <property environment="env"/>
  <property name="debuglevel" value="source,lines,vars"/>
  <property name="target" value="1.6"/>
  <property name="source" value="1.6"/>
  <path id="CustomPlugins.classpath">
    <pathelement location="bin"/>
    <pathelement location="lib/apache-solr-core-4.0.0.jar"/>
    <pathelement location="lib/lucene-core-4.0.0.jar"/>
    <pathelement location="lib/lucene-queries-4.0.0.jar"/>
    <pathelement location="lib/apache-solr-solrj-4.0.0.jar"/>
  </path>
  <target name="init">
    <mkdir dir="bin"/>
    <copy includeemptydirs="false" todir="bin">
      <fileset dir="src">
        <exclude name="**/*.launch"/>
        <exclude name="**/*.java"/>
      </fileset>
    </copy>
  </target>
  <target name="clean">
    <delete dir="bin"/>
    <delete dir="dist"/>
  </target>
  <target depends="clean" name="cleanall"/>
  <target depends="build-subprojects,build-project" name="build"/>
  <target name="build-subprojects"/>
  <target depends="init" name="build-project">
    <echo message="building project ${ant.project.name}: ${ant.file}"/>
    <javac debug="true" debuglevel="DEBUG" destdir="bin" source="${source}" target="${target}">
      <src path="src"/>
      <classpath refid="CustomPlugins.classpath"/>
    </javac>
  </target>
  <target depends="build" name="jar">
    <echo message="${ant.project.name}: ${ant.file}"/>
    <mkdir dir="dist"/>
    <jar jarfile="dist/CustomPlugins.jar" basedir="bin" includes="**/*.class"/>
  </target>
</project>

Here is the code for the GeneticLocation.java file. It is not complete, and might have errors in it. I used PointType as my starting point, and trimmed out what I didn't think I needed. I want to verify that I can load it now before I muck with it any further.

package org.jax.mgi.fe.solrplugin;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.MapSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.response.TextResponseWriter;
import org.apache.solr.schema.AbstractSubTypeFieldType;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.QParser;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.VectorValueSource;

/**
 * Custom solr field type to support querying against genetic location data.
 */
public class GeneticLocation extends AbstractSubTypeFieldType {

  @Override
  protected void init(IndexSchema schema, Map<String, String> args) {
    SolrParams p = new MapSolrParams(args);
    this.schema = schema;
    super.init(schema, args);
    // cache suffixes
    createSuffixCache(2);
  }

  @Override
  public boolean isPolyField() {
    return true; // really only true if the field is indexed
  }

  @Override
  public IndexableField[] createFields(SchemaField field, Object value, float boost) {
    String externalVal = value.toString();
    String[] coords = externalVal.split("-");
    if (coords.length != 2) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
          "Invalid coordinate format for " + externalVal);
    }
    List<IndexableField> f = new ArrayList<IndexableField>();
    if (field.indexed()) {
      SchemaField coordField1 = subField(field, 0);
      SchemaField coordField2 = subField(field, 1);
      f.add(coordField1.createField(coords[0],
          coordField1.indexed() && !coordField1.omitNorms() ? boost : 1f));
      f.add(coordField2.createField(coords[1],
          coordField2.indexed() && !coordField2.omitNorms() ? boost : 1f));
    }
    if (field.stored()) {
      String storedVal = externalVal; // normalize or not?
      FieldType customType = new FieldType();
      customType.setStored(true);
      f.add(createField(field.getName(), storedVal, customType, 1f));
    }
RE: custom field type plugin
Thank you for the links, they really helped me understand. I see how the spatial solution works now. I think this could work as a good backup if I cannot get the custom field type working. The custom field would ideally be a bit more robust than what I mentioned before, because a region really means four pieces: a chromosome (e.g. 1-22), a start base pair, an end base pair, and the direction (forward or reverse). But if need be, the chromosome and direction can be multiplied into the base pairs to get it down to two translated numbers. As for the upper bounds, I do have an idea, but it would be a large number, say between 1 and 10 billion depending on how I translate the values. I'll just have to try it out I guess.

Ok, now back to the custom field problem. From here on I'll spam source code and stack traces. I started fresh, removing all places where I may have had my jar file, and popped in a fresh solr.war.

I define the plugin class in my schema like this:

<fieldType name="geneticLocation" class="org.jax.mgi.fe.solrplugin.GeneticLocation" omitNorms="true"/>

and use it here:

<field name="coordinate" type="geneticLocation" indexed="true" stored="true" />

Ok, when I start solr, I get this error saying it can't find the plugin class that is defined in my schema:

org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType geneticLocation: Error loading class 'org.jax.mgi.fe.solrplugin.GeneticLocation'
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
        ...etc...
Caused by: org.apache.solr.common.SolrException: Error loading class 'org.jax.mgi.fe.solrplugin.GeneticLocation'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:436)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:457)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:453)
        ...etc...

So, that's all fine. In my solr.xml, I define this sharedLib folder:

<solr persistent="true" sharedLib="../lib">

I shut the server down, drop in my CustomPlugins.jar file and start the server back up. And... I got a different error! It said I was missing the subFieldType or subFieldSuffix in my fieldType definition. So I added one: subFieldSuffix="_gl". Then I restart the server thinking that I'm making progress, and I get the old error again. I pulled out the jar, did the above test again to verify that it couldn't find my plugin. Then I re-add it and restart. Nope, still this error about AbstractSubTypeFieldType.
Here is the full stack trace:

SEVERE: null:java.lang.NoClassDefFoundError: org/apache/solr/schema/AbstractSubTypeFieldType
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:401)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:420)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:457)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:453)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:81)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
        at
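A NoClassDefFoundError on org/apache/solr/schema/AbstractSubTypeFieldType at plugin-load time usually means the classloader that loaded the custom class cannot see the Solr classes it extends, often because the jar (or a duplicate copy of the Solr jars) sits on the wrong classpath. One alternative to the sharedLib folder in solr.xml is to let Solr's own resource loader pull the jar in via solrconfig.xml. A minimal sketch; the directory path is an assumption about where CustomPlugins.jar is dropped:

  <!-- in solrconfig.xml, path relative to the core's instanceDir -->
  <lib dir="./lib" regex="CustomPlugins\.jar" />

Separately, a subFieldSuffix such as "_gl" generally needs a matching dynamicField in schema.xml for the generated sub-fields, e.g. <dynamicField name="*_gl" type="tdouble" indexed="true" stored="false"/>, where the tdouble type is only an illustrative guess.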
Re: Order by an expression in Solr
Also, when you import, you can set a field that does this in SQL:

select (CASE
    WHEN COL1 BETWEEN ${PARAM1} - 10 AND ${PARAM1} + 10 AND COL2=${PARAM2} THEN 1
    WHEN COL1 BETWEEN ${PARAM1} - 10 AND ${PARAM1} + 10 AND COL3=${PARAM2} THEN 2
    WHEN COL1 - COL3=${PARAM3} THEN 3
    WHEN COL4 LIKE '${PARAM4}%' THEN 4
  END) as whenField
from ...

On Sat, Jul 20, 2013 at 6:28 AM, Jack Krupansky <j...@basetechnology.com> wrote:

Sorry, but Solr doesn't perform SQL-like operations. But please rephrase your query in simple, plain English, and we'll be happy to suggest approaches using Solr.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
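Spelling that suggestion out: with the DataImportHandler the expression becomes an ordinary indexed field that can be sorted on, which only works if the parameters are fixed at import time rather than supplied per query. A sketch only; the datasource name, table, and literal values are placeholders:

  <!-- data-config.xml (sketch) -->
  <entity name="t1" dataSource="jdbc"
          query="SELECT *,
                        (CASE WHEN COL1 BETWEEN 90 AND 110 AND COL2 = 'x' THEN 1
                              WHEN COL4 LIKE 'abc%' THEN 4
                              ELSE 9 END) AS whenField
                 FROM TABLE1"/>

At query time you would then sort on it with &sort=whenField asc,score desc; whenField must be indexed and single-valued to be sortable.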
Re: dataimporter, custom fields and parsing error
path was set, text wasn't, but it doesn't make a difference. my importer says "1 row fetched, 0 docs processed, 0 docs skipped". i don't understand how it can have 2 docs indexed with such an output.

On 20. Jul 2013, at 12:47 PM, Shalin Shekhar Mangar wrote:

Are the path and text fields set to stored in the schema.xml?
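One way to see why a row is fetched but no document gets processed is to run the import with debug output and inspect the response and logs for transformer or Tika errors. A sketch, assuming the handler is mapped to /dataimport on core collection1:

  http://localhost:8983/solr/collection1/dataimport?command=full-import&clean=false&commit=true&debug=true&verbose=true

The debug response lists each entity and the fields it produced, which usually shows whether the XPathEntityProcessor or the nested TikaEntityProcessor is the one dropping the document.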
Re: Auto-sharding and numShard parameter
Flavio:

One of the great things about having people continually using Solr (and SolrCloud) for the first time is the opportunity to improve the docs. Anyone can update/add to the docs, all it takes is a signon. Unfortunately we had a bunch of spam bots a while ago, so it's now a two-step process:

1. create a login on the Solr wiki
2. post a message on this list indicating that you'd like to help improve the Wiki and give us your Solr login.

We'll add you to the list of people who can edit the wiki and you can help the community by improving the documentation.

Best
Erick

On Fri, Jul 19, 2013 at 8:46 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

Thank you for the reply Erick, I was facing exactly that problem.. from the documentation it seems that those parameters are required to run SolrCloud, whereas they are just used to initialize a sample collection. I think that in the examples in the user doc it would be better to separate those two concepts: one is starting the server, the other is creating/managing collections.

Best,
Flavio

On Fri, Jul 19, 2013 at 2:13 PM, Erick Erickson <erickerick...@gmail.com> wrote:

First, the numShards parameter is only relevant the very first time you create your collection. It's a little confusing because in the SolrCloud examples you're getting collection1 by default. Look further down the SolrCloud Wiki page, the section titled "Managing Collections via the Collections API", for creating collections with a different name. Either way, either when you run the bootstrap command or when you create a new collection, that's the only time numShards counts. It's ignored the rest of the time.

As far as data growing, you need to either
1. create enough shards to handle the eventual size things will be, sometimes called "oversharding", or
2. use the splitShard capabilities in very recent Solrs to expand capacity.

Best
Erick

On Thu, Jul 18, 2013 at 4:52 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

Hi to all,
Probably this question has a simple answer but I just want to be sure of the potential drawbacks. When I run SolrCloud I run the main solr instance with the -numShards option (e.g. 2). Then, as data grows, shards could potentially become a huge number. If I had to restart all nodes and re-ran the master with numShards=2, what would happen? Would it just be ignored, or would Solr try to reduce the shards...?

Another question: in SolrCloud, how do I restart the whole cloud at once? Is it possible?

Best,
Flavio
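For reference, the Collections API operations mentioned above look roughly like this; the host, port, and collection/shard names are placeholders:

  http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2

  http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1

CREATE fixes numShards once, at collection-creation time; SPLITSHARD (available only in very recent Solr versions at the time of this thread) is the way to add capacity afterwards.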
Re: Indexing into SolrCloud
NP, glad I was able to help!

Erick

On Fri, Jul 19, 2013 at 11:07 AM, Beale, Jim (US-KOP) <jim.be...@hibu.com> wrote:

Hi Erick! Thanks for the reply.

When I call server.add() it is just to add a single document. But, still, I think you might be correct about the size of the ultimate request. I decided to grab the bull by the horns by instantiating my own HttpClient and, in so doing, my first run changed the following parameters:

SOLR_HTTP_THREAD_COUNT=4
SOLR_MAX_BUFFERED_DOCS=1
SOLR_MAX_CONNECTIONS=256
SOLR_MAX_CONNECTIONS_PER_HOST=128
SOLR_CONNECTION_TIMEOUT=0
SOLR_SO_TIMEOUT=0

I doubled the number of emptying threads, reduced the size of the request buffer 5x, increased the connection limits and set the timeouts to infinite. (I'm not actually sure what the defaults for the timeouts were since I didn't see them in the Solr code and didn't track it down.)

Anyway, the good news is that this combination of parameters worked. The bad news is that I don't know whether it was resolved by changing one or more of the parameters. But, regardless, I think the whole experiment verifies your thinking that the request was too big!

Thanks again!! :)

Jim Beale
Lead Developer
hibu.com
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Office: 610-879-3864
Mobile: 610-220-3067

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, July 19, 2013 8:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing into SolrCloud

Usually EOF errors indicate that the packets you're sending are too big. Wait, though. 50K is not buffered docs, I think it's buffered _requests_. So you're creating a queue that's ginormous and asking 2 threads to empty it. But that's not really the issue I suspect.

How many documents are you adding at a time when you call server.add? I.e. are you using server.add(doc) or server.add(doclist)? If the latter and you're adding a bunch of docs, try lowering that number. If you're sending one doc at a time I'm on the wrong track.

Best
Erick

On Thu, Jul 18, 2013 at 2:51 PM, Beale, Jim (US-KOP) <jim.be...@hibu.com> wrote:

Hey folks,

I've been migrating an application which indexes about 15M documents from straight-up Lucene into SolrCloud. We've set up 5 Solr instances with a 3 zookeeper ensemble using HAProxy for load balancing. The documents are processed on a quad core machine with 6 threads and indexed into SolrCloud through HAProxy using ConcurrentUpdateSolrServer in order to batch the updates. The indexing box is heavily-loaded during indexing but I don't think it is so bad that it would cause issues. I'm using Solr 4.3.1 on client and server side, zookeeper 3.4.5 and HAProxy 1.4.22.
I've been accepting the default HttpClient with 50K buffered docs and 2 threads, i.e.,

int solrMaxBufferedDocs = 5;
int solrThreadCount = 2;
solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress, solrMaxBufferedDocs, solrThreadCount);

autoCommit is configured in the solrconfig as follows:

<autoCommit>
  <maxTime>60</maxTime>
  <maxDocs>50</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

I'm getting the following errors on the client and server sides respectively:

Client side:
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: socket write error
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO SystemDefaultHttpClient - Retrying request
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: socket write error
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO SystemDefaultHttpClient - Retrying request

Server side:
7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore - java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
        at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
        at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
        at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)

When I disabled autoCommit on the server side, I didn't see any errors there, but I still get the issue client-side after about 2 million documents - which is about 45 minutes. Has anyone seen this issue before? I couldn't find anything useful in the usual places. I suppose I could set up wireshark to see what is happening, but I'm hoping that someone has a better suggestion.
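For comparison, a much smaller queue with explicit batching and a final flush tends to keep each request modest and avoid the oversized-packet problem discussed above. A minimal SolrJ 4.x sketch; the URL, queue size, thread count, and batch size are illustrative assumptions, not values taken from this thread:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexerSketch {
    public static void main(String[] args) throws Exception {
        // Small queue (1000 buffered requests) drained by 4 background threads; the URL is a placeholder.
        ConcurrentUpdateSolrServer server =
                new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 1000, 4);

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            batch.add(doc);
            if (batch.size() == 500) {   // keep each request well below proxy/servlet limits
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);           // flush the remainder
        }
        server.blockUntilFinished();     // wait for the background threads to drain the queue
        server.commit();
        server.shutdown();
    }
}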
Re: Auto-sharding and numShard parameter
A lot has changed since those examples were written - in general, we are moving away from that type of collection initialization and towards using the Collections API. Eventually, I'd personally like SolrCloud to ship with no predefined collections and have users simply start it and then start using the Collections API - preconfigured collections will be second class and possibly deprecated at some point.

- Mark

On Jul 20, 2013, at 10:13 PM, Erick Erickson <erickerick...@gmail.com> wrote: