Order by an expression in Solr

2013-07-20 Thread cmd.ares
In SQL you can order by an expression like:
SELECT * FROM TABLE1 ORDER BY
(
    CASE WHEN COL1 BETWEEN ${PARAM1} - 10 AND ${PARAM1} + 10 AND COL2 = ${PARAM2} THEN 1
         WHEN COL1 BETWEEN ${PARAM1} - 10 AND ${PARAM1} + 10 AND COL3 = ${PARAM2} THEN 2
         WHEN COL1 - COL3 = ${PARAM3} THEN 3
         WHEN COL4 LIKE '${PARAM4}%' THEN 4
    END
)

How can I do that in Solr?
Thanks!





Re: dataimporter, custom fields and parsing error

2013-07-20 Thread Andreas Owen
They are in my schema; path is typed correctly, and the others are default fields 
which already exist. All the other fields are populated and I can search 
them; just path and text aren't.


On 19. Jul 2013, at 6:16 PM, Alexandre Rafalovitch wrote:

 Dumb question: are they in your schema? Spelled right, in the right
 section, using types that are also defined? Can you populate them by hand with a CSV
 file and post.jar?
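 For example (a sketch; the file name is a placeholder, and this assumes the
 stock post.jar shipped in example/exampledocs):
 
 java -Dtype=text/csv -Durl=http://localhost:8983/solr/update -jar post.jar test.csv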
 
 Regards,
   Alex.
 
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
 On Fri, Jul 19, 2013 at 12:09 PM, Andreas Owen a...@conx.ch wrote:
 
 I'm using Solr 4.3, which I just downloaded today, and am using only jars
 that came with it. I have enabled the dataimporter and it runs without
 error, but the field path (included in schema.xml) and text (file
 content) aren't indexed. What am I doing wrong?
 
 solr-path: C:\ColdFusion10\cfusion\jetty-new
 collection-path: C:\ColdFusion10\cfusion\jetty-new\solr\collection1
 pdf-doc-path: C:\web\development\tkb\internet\public
 
 
 data-config.xml:
 
 <dataConfig>
   <dataSource type="BinFileDataSource" name="data"/>
   <dataSource type="BinURLDataSource" name="dataUrl"/>
   <dataSource type="URLDataSource" baseUrl="http://127.0.0.1/tkb/internet/" name="main"/>
   <document>
     <entity name="rec" processor="XPathEntityProcessor"
             url="docImportUrl.xml" forEach="/albums/album" dataSource="main">
       <!-- transformer="script:GenerateId" -->
       <field column="title" xpath="//title" />
       <field column="id" xpath="//file" />
       <field column="path" xpath="//path" />
       <field column="Author" xpath="//author" />
 
       <!-- <field column="tstamp">2013-07-05T14:59:46.889Z</field> -->
 
       <entity name="tika" processor="TikaEntityProcessor"
               url="../../../../../web/development/tkb/internet/public/${rec.path}/${rec.id}"
               dataSource="data">
         <field column="text" />
       </entity>
     </entity>
   </document>
 </dataConfig>
 
 
 docImportUrl.xml:
 
 <?xml version="1.0" encoding="utf-8"?>
 <albums>
   <album>
     <author>Peter Z.</author>
     <title>Beratungsseminar kundenbrief</title>
     <description>wie kommuniziert man</description>
     <file>0226520141_e-banking_Checkliste_CLX.Sentinel.pdf</file>
     <path>download/online</path>
   </album>
   <album>
     <author>Marcel X.</author>
     <title>kuchen backen</title>
     <description>torten, kuchen, gebäck ...</description>
     <file>Kundenbrief.pdf</file>
     <path>download/online</path>
   </album>
 </albums>



Re: dataimporter, custom fields and parsing error

2013-07-20 Thread Shalin Shekhar Mangar
Are the path and text fields set to stored="true" in schema.xml?
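For instance, stored fields would look something like this in schema.xml (a 
sketch; the field types are placeholders for whatever your schema actually 
uses):

<field name="path" type="string" indexed="true" stored="true" />
<field name="text" type="text_general" indexed="true" stored="true" />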




-- 
Regards,
Shalin Shekhar Mangar.


Re: Order by an expression in Solr

2013-07-20 Thread Jack Krupansky

Sorry, but Solr doesn't perform SQL-like operations.

But, please rephrase your query in simple, plain English, and we'll be 
happy to suggest approaches using Solr.


Starting principle: When you think Solr, think NoSQL. You're still 
thinking SQL!


That said, generally, SQL "ORDER BY" refers to sorting, and Solr does have a 
sort parameter:


http://wiki.apache.org/solr/CommonQueryParameters#sort
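
For example (an illustrative sketch, not your exact case; "price" and the 
col1/col3 names are placeholder fields), sort accepts a field or a function 
query:

q=*:*&sort=price desc
q=*:*&sort=sub(col1,col3) asc

Function-query sorting can express some of the arithmetic in your CASE 
expression; the conditional buckets themselves would have to be computed at 
index time or approximated with functions such as map() and if().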

-- Jack Krupansky




RE: custom field type plugin

2013-07-20 Thread Kevin Stone
Sorry, I accidentally hit send somehow.


Here is my ant build.xml for building the CustomPlugins jar file:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project basedir="." default="jar" name="CustomPlugins">
    <property environment="env"/>
    <property name="debuglevel" value="source,lines,vars"/>
    <property name="target" value="1.6"/>
    <property name="source" value="1.6"/>
    <path id="CustomPlugins.classpath">
        <pathelement location="bin"/>
        <pathelement location="lib/apache-solr-core-4.0.0.jar"/>
        <pathelement location="lib/lucene-core-4.0.0.jar"/>
        <pathelement location="lib/lucene-queries-4.0.0.jar"/>
        <pathelement location="lib/apache-solr-solrj-4.0.0.jar"/>
    </path>
    <target name="init">
        <mkdir dir="bin"/>
        <copy includeemptydirs="false" todir="bin">
            <fileset dir="src">
                <exclude name="**/*.launch"/>
                <exclude name="**/*.java"/>
            </fileset>
        </copy>
    </target>
    <target name="clean">
        <delete dir="bin"/>
        <delete dir="dist"/>
    </target>
    <target depends="clean" name="cleanall"/>
    <target depends="build-subprojects,build-project" name="build"/>
    <target name="build-subprojects"/>
    <target depends="init" name="build-project">
        <echo message="building project ${ant.project.name}: ${ant.file}"/>
        <javac debug="true" debuglevel="DEBUG" destdir="bin" source="${source}" target="${target}">
            <src path="src"/>
            <classpath refid="CustomPlugins.classpath"/>
        </javac>
    </target>
    <target depends="build" name="jar">
        <echo message="${ant.project.name}: ${ant.file}"/>
        <mkdir dir="dist"/>
        <jar jarfile="dist/CustomPlugins.jar" basedir="bin" includes="**/*.class"/>
    </target>
</project>


Here is the code for the GeneticLocation.java file. It is not complete and 
might have errors in it. I used PointType as my starting point and trimmed out 
what I didn't think I needed. I want to verify that I can load it before I 
muck with it any further.


package org.jax.mgi.fe.solrplugin;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.MapSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.response.TextResponseWriter;
import org.apache.solr.schema.AbstractSubTypeFieldType;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.QParser;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.VectorValueSource;


/**
 * Custom solr field type to support querying against genetic location data.
 */
public class GeneticLocation extends AbstractSubTypeFieldType
{
  @Override
  protected void init(IndexSchema schema, Map<String, String> args) {
    SolrParams p = new MapSolrParams(args);

    this.schema = schema;
    super.init(schema, args);

    // cache suffixes
    createSuffixCache(2);
  }

  @Override
  public boolean isPolyField() {
    return true;   // really only true if the field is indexed
  }

  @Override
  public IndexableField[] createFields(SchemaField field, Object value, float boost)
  {
    String externalVal = value.toString();
    String[] coords = externalVal.split("-");

    if (coords.length != 2)
    {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
          "Invalid coordinate format for " + externalVal);
    }

    List<IndexableField> f = new ArrayList<IndexableField>();

    if (field.indexed())
    {
      SchemaField coordField1 = subField(field, 0);
      SchemaField coordField2 = subField(field, 1);

      f.add(coordField1.createField(coords[0],
          coordField1.indexed() && !coordField1.omitNorms() ? boost : 1f));

      f.add(coordField2.createField(coords[1],
          coordField2.indexed() && !coordField2.omitNorms() ? boost : 1f));
    }

    if (field.stored())
    {
      String storedVal = externalVal;  // normalize or not?
      FieldType customType = new FieldType();
      customType.setStored(true);
      f.add(createField(field.getName(), storedVal, customType, 1f));
    }


RE: custom field type plugin

2013-07-20 Thread Kevin Stone
Thank you for the links; they really helped me understand. I see how the 
spatial solution works now. I think this could work as a good backup if I 
cannot get the custom field type working. The custom field would ideally be a 
bit more robust than what I mentioned before, because a region really means 
four pieces: a chromosome (e.g. 1-22), a start base pair, an end base pair, and 
the direction (forward or reverse). But if need be, the chromosome and direction 
can be multiplied into the base pairs to get it down to two translated numbers.
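
As a sketch of that packing idea (the class name, method, and stride constant 
are illustrative assumptions, not settled code):

public class CoordEncoder {
    // Assumes no chromosome is longer than 1 billion base pairs.
    private static final long CHR_STRIDE = 1000000000L;

    // Pack (chromosome, base pair) into one long so each chromosome gets its
    // own contiguous numeric range; a region on one chromosome then becomes a
    // plain numeric range query between encode(chr, start) and encode(chr, end).
    public static long encode(int chromosome, long basePair) {
        return chromosome * CHR_STRIDE + basePair;
    }
}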

As for the upper bounds, I do have an idea, but it would be a large number, 
say between 1 and 10 billion depending on how I translate the values. I'll just 
have to try it out, I guess.


Ok, now back to the custom field problem. From here on I'll spam source code 
and stack traces.

I started fresh, removing all places where I may have had my jar file, and 
popped in a fresh solr.war.
I define the plugin class in my schema like this:

 <fieldType name="geneticLocation" class="org.jax.mgi.fe.solrplugin.GeneticLocation" omitNorms="true"/>

and use it here:

 <field name="coordinate" type="geneticLocation" indexed="true" stored="true" />


Ok, when I start solr, I get this error saying it can't find the plugin class 
that is defined in my schema.
org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType geneticLocation: Error loading class 'org.jax.mgi.fe.solrplugin.GeneticLocation'
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
...etc...
Caused by: org.apache.solr.common.SolrException: Error loading class 'org.jax.mgi.fe.solrplugin.GeneticLocation'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:436)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:457)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:453)
...etc...

So, that's all fine. 

In my solr.xml, I define this sharedLib folder:
<solr persistent="true" sharedLib="../lib">

I shut the server down, drop in my CustomPlugins.jar file, and start the server 
back up. And... I got a different error! It said I was missing the subFieldType 
or subFieldSuffix in my fieldType definition, so I added subFieldSuffix="_gl". 
Then I restart the server thinking that I'm making progress, and I get the old 
error again. I pulled out the jar and did the above test again to verify that 
it couldn't find my plugin. Then I re-added it and restarted. Nope, still this 
error about AbstractSubTypeFieldType. Here is the full stack trace:

SEVERE: null:java.lang.NoClassDefFoundError: org/apache/solr/schema/AbstractSubTypeFieldType
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:401)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:410)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:420)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:457)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:453)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:81)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
        at

Re: Order by an expression in Solr

2013-07-20 Thread William Bell
Also, when you import, set a field that does this in SQL:

SELECT (CASE WHEN COL1 BETWEEN ${PARAM1} - 10 AND ${PARAM1} + 10 AND COL2 = ${PARAM2} THEN 1
             WHEN COL1 BETWEEN ${PARAM1} - 10 AND ${PARAM1} + 10 AND COL3 = ${PARAM2} THEN 2
             WHEN COL1 - COL3 = ${PARAM3} THEN 3
             WHEN COL4 LIKE '${PARAM4}%' THEN 4
        END) AS whenField
FROM ...
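
Then at query time you simply sort on the precomputed field (a sketch; 
whenField matches the alias above, and this assumes the parameters are known 
at import time):

q=*:*&sort=whenField asc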







-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: dataimporter, custom fields and parsing error

2013-07-20 Thread Andreas Owen
path was set, text wasn't, but it doesn't make a difference. My importer says "1 
row fetched, 0 docs processed, 0 docs skipped". I don't understand how it can 
have 2 docs indexed with such output.





Re: Auto-sharding and numShard parameter

2013-07-20 Thread Erick Erickson
Flavio:

One of the great things about having people continually using Solr
(and SolrCloud) for the first time is the opportunity to improve the
docs. Anyone can update/add to the docs; all it takes is a signon.
Unfortunately we had a bunch of spam bots a while ago, so it's now a
two-step process:
1> create a login on the Solr wiki
2> post a message on this list indicating that you'd like to help
improve the wiki and give us your Solr login. We'll add you to the
list of people who can edit the wiki, and you can help the community by
improving the documentation.

Best
Erick

On Fri, Jul 19, 2013 at 8:46 AM, Flavio Pompermaier
pomperma...@okkam.it wrote:
 Thank you for the reply, Erick.
 I was facing exactly that problem... from the documentation it seems
 that those parameters are required to run SolrCloud, when in fact they
 are just used to initialize a sample collection.
 I think the examples in the user doc should separate those two concepts:
 one is starting the server, the other is creating/managing collections.

 Best,
 Flavio


 On Fri, Jul 19, 2013 at 2:13 PM, Erick Erickson erickerick...@gmail.com wrote:

 First, the numShards parameter is only relevant the very first time you
 create your collection. It's a little confusing because in the SolrCloud
 examples you're getting collection1 by default. Look further down the
 SolrCloud wiki page for the section titled "Managing Collections via the
 Collections API" on creating collections with a different name.

 Either way, whether you run the bootstrap command or you
 create a new collection, that's the only time numShards counts. It's
 ignored the rest of the time.

 As far as data growing, you need to either
 1> create enough shards to handle the eventual size things will be,
 sometimes called "oversharding", or
 2> use the splitShard capability in very recent Solrs to expand capacity.
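
 For 2>, the Collections API call looks roughly like this (a sketch; the
 collection and shard names are placeholders):

 http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1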

 Best
 Erick

 On Thu, Jul 18, 2013 at 4:52 PM, Flavio Pompermaier
 pomperma...@okkam.it wrote:
  Hi to all,
  Probably this question has a simple answer, but I just want to be sure of
  the potential drawbacks. When I run SolrCloud, I run the main Solr instance
  with the -numShards option (e.g. 2).
  Then as data grows, shards could potentially become a huge number. If I
  had to restart all nodes and re-run the master with numShards=2, what
  would happen? Would it just be ignored, or would Solr try to reduce the
  shards...?

  Another question: in SolrCloud, how do I restart the whole cloud at once? Is
  it possible?

  Best,
  Flavio



Re: Indexing into SolrCloud

2013-07-20 Thread Erick Erickson
NP, glad I was able to help!

Erick

On Fri, Jul 19, 2013 at 11:07 AM, Beale, Jim (US-KOP)
jim.be...@hibu.com wrote:
 Hi Erick!

 Thanks for the reply.  When I call server.add() it is just to add a single 
 document.

 But, still, I think you might be correct about the size of the ultimate 
 request.  I decided to grab the bull by the horns by instantiating my own 
 HttpClient and, in so doing, my first run changed the following parameters:

 SOLR_HTTP_THREAD_COUNT=4
 SOLR_MAX_BUFFERED_DOCS=1
 SOLR_MAX_CONNECTIONS=256
 SOLR_MAX_CONNECTIONS_PER_HOST=128
 SOLR_CONNECTION_TIMEOUT=0
 SOLR_SO_TIMEOUT=0

 I doubled the number of emptying threads, reduced the size of the request 
 buffer 5x, increased the connection limits and set the timeouts to infinite.  
 (I'm not actually sure what the defaults for the timeouts were since I didn't 
 see them in the Solr code and didn't track it down.)

 Anyway, the good news is that this combination of parameters worked.  The bad 
 news is that I don't know which of the parameter changes resolved it.

 But, regardless, I think the whole experiment verifies your thinking that the 
 request was too big!

 Thanks again!! :)


 Jim Beale
 Lead Developer
 hibu.com
 2201 Renaissance Boulevard, King of Prussia, PA, 19406
 Office: 610-879-3864
 Mobile: 610-220-3067




 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Friday, July 19, 2013 8:08 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Indexing into SolrCloud

 Usually EOF errors indicate that the packets you're sending are too big.

 Wait, though. 50K is not buffered docs, I think it's buffered _requests_.
 So you're creating a queue that's ginormous and asking 2 threads to empty it.

 But that's not really the issue, I suspect. How many documents are you adding
 at a time when you call server.add? I.e., are you using server.add(doc) or
 server.add(doclist)? If the latter and you're adding a bunch of docs, try
 lowering that number. If you're sending one doc at a time, I'm on the
 wrong track.

 Best
 Erick
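
 If you are batching, a shape like the following keeps each request modest (an
 untested sketch against SolrJ 4.x; the URL, queue size, thread count, and
 batch size are placeholder values):

 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;
 import org.apache.solr.client.solrj.SolrServerException;
 import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
 import org.apache.solr.common.SolrInputDocument;

 public class BatchIndexer {
     public static void index(Iterable<SolrInputDocument> docs)
             throws SolrServerException, IOException {
         // Queue of 1000 buffered requests, emptied by 4 threads.
         ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
                 "http://localhost:8983/solr/collection1", 1000, 4);
         List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
         for (SolrInputDocument doc : docs) {
             batch.add(doc);
             if (batch.size() >= 500) {  // flush in small batches so no request gets huge
                 server.add(batch);
                 batch.clear();
             }
         }
         if (!batch.isEmpty()) server.add(batch);
         server.blockUntilFinished();    // drain the client's internal queue
         server.commit();
         server.shutdown();
     }
 }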

 On Thu, Jul 18, 2013 at 2:51 PM, Beale, Jim (US-KOP) jim.be...@hibu.com 
 wrote:
 Hey folks,

 I've been migrating an application which indexes about 15M documents from 
 straight-up Lucene into SolrCloud.  We've set up 5 Solr instances with a 3 
 zookeeper ensemble using HAProxy for load balancing. The documents are 
 processed on a quad core machine with 6 threads and indexed into SolrCloud 
 through HAProxy using ConcurrentUpdateSolrServer in order to batch the 
 updates.  The indexing box is heavily-loaded during indexing but I don't 
 think it is so bad that it would cause issues.

 I'm using Solr 4.3.1 on client and server side, zookeeper 3.4.5 and HAProxy 
 1.4.22.

 I've been accepting the default HttpClient with 50K buffered docs and 2 
 threads, i.e.,

 int solrMaxBufferedDocs = 5;
 int solrThreadCount = 2;
 solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress, solrMaxBufferedDocs, solrThreadCount);

 autoCommit is configured in the solrconfig as follows:

  <autoCommit>
    <maxTime>60</maxTime>
    <maxDocs>50</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

 I'm getting the following errors on the client and server sides respectively:

 Client side:

 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: socket write error
 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  SystemDefaultHttpClient - Retrying request
 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: socket write error
 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  SystemDefaultHttpClient - Retrying request

 Server side:

 7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore - java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
         at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
         at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
         at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
         at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
         at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)

 When I disabled autoCommit on the server side, I didn't see any errors there, 
 but I still get the issue client-side after about 2 million documents, which 
 is about 45 minutes in.

 Has anyone seen this issue before?  I couldn't find anything useful in the 
 usual places.

 I suppose I could set up wireshark to see what is happening, but I'm hoping 
 that someone has a better suggestion.

 

Re: Auto-sharding and numShard parameter

2013-07-20 Thread Mark Miller
A lot has changed since those examples were written - in general, we are moving 
away from that type of collection initialization and towards using the 
Collections API. Eventually, I'd personally like SolrCloud to ship with no 
predefined collections and have users simply start it and then start using the 
Collections API - preconfigured collections will be second class and possibly 
deprecated at some point.
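
For reference, creating a collection through the Collections API looks like 
this (a sketch; the name and the shard/replica counts are placeholders):

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2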

- Mark
