Re: how to change configurations in solrcloud setup

2015-03-10 Thread Aman Tandon
Hi,

Thanks Nitin for replying. Wouldn't it be a costly operation to restart all
nodes?

What I am doing instead is uploading the configurations again to ZooKeeper
and then reloading my core, and it is working well. So am I missing
something?

With Regards
Aman Tandon

On Wed, Mar 11, 2015 at 11:21 AM, Nitin Solanki 
wrote:

> Hi Aman,
>  You can apply configuration on solr cloud by using this
> command -
>
> sudo
> //example/scripts/cloud-scripts/zkcli.sh
> -zkhost localhost:9983 -cmd upconfig -confdir
> //example/solr/collection1/conf -confname
> default
>
> and then restart all nodes of solrcloud.
>
> On Mon, Mar 9, 2015 at 11:43 AM, Aman Tandon 
> wrote:
>
> > Please help.
> >
> > With Regards
> > Aman Tandon
> >
> > On Sat, Mar 7, 2015 at 9:58 PM, Aman Tandon 
> > wrote:
> >
> > > Hi,
> > >
> > > Please tell me what is best way to apply configuration changes in solr
> > > cloud and how to do that.
> > >
> > > Thanks in advance.
> > >
> > > With Regards
> > > Aman Tandon
> > >
> >
>


Re: how to change configurations in solrcloud setup

2015-03-10 Thread Nitin Solanki
Hi Aman,
You can apply a configuration on SolrCloud by using this
command:

sudo
//example/scripts/cloud-scripts/zkcli.sh
-zkhost localhost:9983 -cmd upconfig -confdir
//example/solr/collection1/conf -confname
default

and then restart all nodes of SolrCloud.

On Mon, Mar 9, 2015 at 11:43 AM, Aman Tandon 
wrote:

> Please help.
>
> With Regards
> Aman Tandon
>
> On Sat, Mar 7, 2015 at 9:58 PM, Aman Tandon 
> wrote:
>
> > Hi,
> >
> > Please tell me what is best way to apply configuration changes in solr
> > cloud and how to do that.
> >
> > Thanks in advance.
> >
> > With Regards
> > Aman Tandon
> >
>


Re: Import Feed rss delta-import

2015-03-10 Thread Alexandre Rafalovitch
I don't think you can, since you can't query an RSS feed for changes. You just
do a full import and override on ids.
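A minimal data-config sketch of that full-import-and-override pattern — the
feed URL and field names here are placeholders, and this assumes DIH's stock
URLDataSource and XPathEntityProcessor:

```xml
<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <entity name="rss"
            processor="XPathEntityProcessor"
            url="http://example.com/feed.rss"
            forEach="/rss/channel/item">
      <!-- Using the item link as the uniqueKey id means each full import
           overwrites previously imported entries instead of duplicating them -->
      <field column="id"    xpath="/rss/channel/item/link"/>
      <field column="title" xpath="/rss/channel/item/title"/>
      <field column="date"  xpath="/rss/channel/item/pubDate"/>
    </entity>
  </document>
</dataConfig>
```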

Regards,
Alex
On 10 Mar 2015 7:16 pm, "Ednardo"  wrote:

> Hi,
>
> How do I create a DataImportHandler using delta-import for rss feeds?
>
> Thanks!!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Import-Feed-rss-delta-import-tp4192257.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Import Feed rss delta-import

2015-03-10 Thread Ednardo
Hi,

How do I create a DataImportHandler using delta-import for rss feeds?

Thanks!!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Import-Feed-rss-delta-import-tp4192257.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cores and and ranking (search quality)

2015-03-10 Thread Walter Underwood
If the documents are distributed randomly across shards/cores, then the 
statistics will be similar in each core and the results will be similar.

If the documents are distributed semantically (say, by topic or type), the 
statistics of each core will be skewed towards that set of documents and the 
results could be quite different.

Assume I have tech support documents and I put all the LaserJet docs in one 
core. That term is very common in that core (poor idf) and rare in other cores 
(strong idf). But for the query “laserjet”, all the good answers are in the 
LaserJet-specific core, where they will be scored low.

An identical document that mentions “LaserJet” once will score fairly low in 
the LaserJet-specific collection and fairly high in the other collection.

Global IDF fixes this, by using corpus-wide statistics. That’s how we ran 
Infoseek and Ultraseek in the late 1990’s.

Random allocation to cores avoids it.
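To make the LaserJet example concrete, here is a quick sketch of Lucene's
classic idf term (1 + ln(N/(df+1)), as in DefaultSimilarity) with made-up
document counts — the numbers are illustrative assumptions, not measurements:

```java
public class IdfDemo {
    // Classic DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1))
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // "laserjet" is common inside the LaserJet-specific core...
        System.out.printf("idf in LaserJet core: %.3f%n", idf(9_000, 10_000));
        // ...and rare in a mixed core of the same size
        System.out.printf("idf in mixed core:    %.3f%n", idf(50, 10_000));
    }
}
```

The same term carries several times more weight in the core where it is rare,
which is why the good answers in the topical core get scored low.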

If you have significant traffic directed to one object type AND you need peak 
performance, you may want to segregate your cores by object type. Otherwise, 
I’d let SolrCloud spread them around randomly and filter based on an object 
type field. That should work well for most purposes.

Any core with less than 1000 records is likely to give somewhat mysterious 
results. A word that is common in English, like “next”, will only be in one 
document and will score too high. A less-common word, like “unreasonably”, will 
be in 20 and will score low. You need lots of docs for the language statistics 
to even out.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Mar 10, 2015, at 1:23 PM, johnmu...@aol.com wrote:

> Thanks Walter.
> 
> The design decision I'm trying to solve is this: using multiple cores, will 
> my ranking be impacted vs. using single core?
> 
> I have records to index and each record can be grouped into object-types, 
> such as object-A, object-B, object-C, etc.  I have a total of 30 (maybe more) 
> object-types.  There may be only 10 records of object-A, but 10 million 
> records of object-B or 1 million of object-C, etc.  I need to be able to 
> search against a single object-type and / or across all object-types.
> 
> From my past experience, in a single core setup, if I have two identical 
> records, and I search on the term " XYZ" that matches one of the records, the 
> second record ranks right next to the other (because it too contains "XYZ").  
> This is good and is the expected behavior.  If I want to limit my search to 
> an object-type, I AND "XYZ" with that object-type.  So all is well.
> 
> What I'm considering to do for my new design is use multi-cores and 
> distributed search.  I am considering to create a core for each object-type: 
> core-A will hold records from object-A, core-B will hold records from 
> object-B, etc.  Before I can make a decision on this design, I need to know 
> how ranking will be impacted.
> 
> Going back to my earlier example: if I have 2 identical records, one of them 
> went to core-A which has 10 records, and the other went to core-B which has 
> 10 million records, using distributed search, if I now search across all 
> cores on the term " XYZ" (just like in the single core case), it will match 
> both of those records all right, but will those two records be ranked next to 
> each other just like in the single core case?  If not, which will rank 
> higher, the one from core-A or the one from core-B?
> 
> My concern is, using multi-cores and distributed search means I will give up 
> on rank quality when records are not distributed across cores evenly.  If so, 
> then maybe this is not a design I can use.
> 
> - MJ
> 
> -Original Message-
> From: Walter Underwood [mailto:wun...@wunderwood.org] 
> Sent: Tuesday, March 10, 2015 2:39 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Cores and and ranking (search quality)
> 
> On Mar 10, 2015, at 10:17 AM, johnmu...@aol.com wrote:
> 
>> If I have two cores, one core has 10 docs another has 100,000 docs.  I then 
>> submit two docs that are 100% identical (with the exception of the unique-ID 
>> fields, which is stored but not indexed) one to each core.  The question is, 
>> during search, will both of those docs rank near each other or not? […]
>> 
>> Put another way: are docs from the smaller core (the one has 10 docs only) 
>> rank higher or lower compared to docs from the larger core (the one with 
>> 100,000) docs?
> 
> These are not quite the same question.
> 
> tf.idf ranking depends on the other documents in the collection (the idf 
> term). With 10 docs, the document frequency statistics are effectively random 
> noise, so the ranking is unpredictable.
> 
> Identical documents should rank identically, but whether they are higher or 
> lower in the two cores depends on the rest of the docs.
> 
> idf statistics don’t settle down until at least 10K docs. You still sometimes 
> see anomalies under a million documents. 
> 
What design decision do you need to make? We can probably answer that for you.

Re: Invalid Date String:'1992-07-10T17'

2015-03-10 Thread Chris Hostetter

":" is a syntactically significant character to the query parser, so it's 
getting confused by it in the text of your query.

you're seeing the same problem as if you tried to search for "foo:bar" in 
the "yak" field using q=yak:foo:bar

you either need to backslash-escape the ":" characters, or wrap the date 
in quotes, or use a different parser that doesn't treat colons as special 
characters (but remember that since you are building this up as a java 
string, you have to deal with *java* string escaping as well)...

   String a = "speechDate:1992-07-10T17\\:33\\:18Z";
   String a = "speechDate:\"1992-07-10T17:33:18Z\"";
   String a = "speechDate:" + 
ClientUtils.escapeQueryChars("1992-07-10T17:33:18Z");
   String a = "{!field f=speechDate}1992-07-10T17:33:18Z";
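For reference, the ClientUtils option above can be sketched without SolrJ on
the classpath. The escape set below mirrors what I believe
ClientUtils.escapeQueryChars covers, so treat it as an approximation:

```java
public class EscapeDemo {
    // Characters the Lucene query parser treats as special; this set is
    // modeled on SolrJ's ClientUtils.escapeQueryChars (an approximation).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');  // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The colons get backslash-escaped, matching the first example above
        System.out.println("speechDate:" + escapeQueryChars("1992-07-10T17:33:18Z"));
    }
}
```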

: My goal is to group these speeches (hopefully using date math syntax). I would

Unless you are truly searching for only documents that have an *exact* 
date value matching your input (down to the millisecond), then searching for 
a single date value is almost certainly not what you want -- you most 
likely want to do a range search...

  String a = "speechDate:[1992-07-10T00:00:00Z TO 1992-07-11T00:00:00Z]";

(which doesn't require special escaping, because the query parser is smart 
enough to know that ":" aren't special inside of the "[..]")
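Since the original question mentioned date math syntax, the same one-day
window can also be expressed with Solr date-math rounding (/DAY truncates to
midnight, +1DAY adds a day). A small sketch that only builds the query string,
reusing the speechDate field from the thread:

```java
public class DateMathRange {
    // Build a clause matching any document whose date falls on the same day
    // as the given timestamp, via Solr date math (/DAY rounding, +1DAY).
    static String sameDay(String field, String isoTimestamp) {
        return field + ":[" + isoTimestamp + "/DAY TO " + isoTimestamp + "/DAY+1DAY]";
    }

    public static void main(String[] args) {
        System.out.println(sameDay("speechDate", "1992-07-10T17:33:18Z"));
    }
}
```

As with the explicit range above, note that the upper bound (the next
midnight) is inclusive.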

: like to know if you suggest me to use date or tdate or other because I have
: not understood the difference.

the difference between date and tdate has to do with how you want to trade 
index size (on disk & in RAM) against search speed for range queries like 
these -- tdate takes up a little more room in the index, but can make 
range queries faster.


-Hoss
http://www.lucidworks.com/


Re: Cores and and ranking (search quality)

2015-03-10 Thread johnmunir
Thanks Walter.

The design decision I'm trying to solve is this: using multiple cores, will my 
ranking be impacted vs. using single core?

I have records to index and each record can be grouped into object-types, such 
as object-A, object-B, object-C, etc.  I have a total of 30 (maybe more) 
object-types.  There may be only 10 records of object-A, but 10 million records 
of object-B or 1 million of object-C, etc.  I need to be able to search against 
a single object-type and / or across all object-types.

From my past experience, in a single core setup, if I have two identical 
records, and I search on the term " XYZ" that matches one of the records, the 
second record ranks right next to the other (because it too contains "XYZ").  
This is good and is the expected behavior.  If I want to limit my search to an 
object-type, I AND "XYZ" with that object-type.  So all is well.

What I'm considering to do for my new design is use multi-cores and distributed 
search.  I am considering to create a core for each object-type: core-A will 
hold records from object-A, core-B will hold records from object-B, etc.  
Before I can make a decision on this design, I need to know how ranking will be 
impacted.

Going back to my earlier example: if I have 2 identical records, one of them 
went to core-A which has 10 records, and the other went to core-B which has 10 
million records, using distributed search, if I now search across all cores on 
the term " XYZ" (just like in the single core case), it will match both of 
those records all right, but will those two records be ranked next to each 
other just like in the single core case?  If not, which will rank higher, the 
one from core-A or the one from core-B?

My concern is, using multi-cores and distributed search means I will give up on 
rank quality when records are not distributed across cores evenly.  If so, then 
maybe this is not a design I can use.

- MJ

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Tuesday, March 10, 2015 2:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

On Mar 10, 2015, at 10:17 AM, johnmu...@aol.com wrote:

> If I have two cores, one core has 10 docs another has 100,000 docs.  I then 
> submit two docs that are 100% identical (with the exception of the unique-ID 
> fields, which is stored but not indexed) one to each core.  The question is, 
> during search, will both of those docs rank near each other or not? […]
> 
> Put another way: are docs from the smaller core (the one has 10 docs only) 
> rank higher or lower compared to docs from the larger core (the one with 
> 100,000) docs?

These are not quite the same question.

tf.idf ranking depends on the other documents in the collection (the idf term). 
With 10 docs, the document frequency statistics are effectively random noise, 
so the ranking is unpredictable.

Identical documents should rank identically, but whether they are higher or 
lower in the two cores depends on the rest of the docs.

idf statistics don’t settle down until at least 10K docs. You still sometimes 
see anomalies under a million documents. 

What design decision do you need to make? We can probably answer that for you.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



RE: Invalid Date String:'1992-07-10T17'

2015-03-10 Thread Ryan, Michael F. (LNG-DAY)
You'll need to wrap the date in quotes, since it contains a colon:

String a = "speechDate:\"1992-07-10T17:33:18Z\"";

-Michael

-Original Message-
From: Mirko Torrisi [mailto:mirko.torr...@ucdconnect.ie] 
Sent: Tuesday, March 10, 2015 3:34 PM
To: solr-user@lucene.apache.org
Subject: Invalid Date String:'1992-07-10T17'

Hi all,

I am very new to Solr (and Lucene) and I use the latest version of it.
I do not understand why I obtain this:

Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/Collection1: Invalid Date
String:'1992-07-10T17'
 at

org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:558)
 at

org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:214)
 at

org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:210)
 at

org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
 at
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:302)
 at Update.main(Update.java:18)


Here the code that creates this error:

 SolrQuery query = new SolrQuery();
 String a = "speechDate:1992-07-10T17:33:18Z";
 query.set("fq", a);
 //query.setQuery( a );  <-- I also tried using this one.



According to
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates, it should 
be right. I tried with other dates, or just YYYY-MM-DD, with no success.


My goal is to group these speeches (hopefully using date math syntax). I would 
like to know if you suggest me to use date or tdate or other because I have not 
understood the difference.


Thanks in advance,

Mirko


Invalid Date String:'1992-07-10T17'

2015-03-10 Thread Mirko Torrisi

Hi all,

I am very new to Solr (and Lucene) and I use the latest version of it.
I do not understand why I obtain this:

   Exception in thread "main"
   org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
   from server at http://localhost:8983/solr/Collection1: Invalid Date
   String:'1992-07-10T17'
at
   
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:558)
at
   
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:214)
at
   
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:210)
at
   
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at
   org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:302)
at Update.main(Update.java:18)


Here the code that creates this error:

SolrQuery query = new SolrQuery();
String a = "speechDate:1992-07-10T17:33:18Z";
query.set("fq", a);
//query.setQuery( a );  <-- I also tried using this one.



According to 
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates, it 
should be right. I tried with other dates, or just YYYY-MM-DD, with no 
success.



My goal is to group these speeches (hopefully using date math syntax). I 
would like to know if you suggest me to use date or tdate or other 
because I have not understood the difference.



Thanks in advance,

Mirko


Re: Num docs, block join, and dupes?

2015-03-10 Thread Mikhail Khludnev
On Tue, Mar 10, 2015 at 7:09 PM, Timothy Potter 
wrote:

> So I guess my question is why doesn't the non-distrib query do
> de-duping?
>

Tim,
That's by-design behavior. The special _root_ field is used as the delete
term when a block update is applied, i.e., in the block case the uniqueKey is
not used. See
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L224
I agree that's one of the issues of the current block update
implementation, but frankly speaking, I didn't consider it as an oddity. Do
you? What do you want to achieve?

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Cores and and ranking (search quality)

2015-03-10 Thread Walter Underwood
On Mar 10, 2015, at 10:17 AM, johnmu...@aol.com wrote:

> If I have two cores, one core has 10 docs another has 100,000 docs.  I then 
> submit two docs that are 100% identical (with the exception of the unique-ID 
> fields, which is stored but not indexed) one to each core.  The question is, 
> during search, will both of those docs rank near each other or not? […]
> 
> Put another way: are docs from the smaller core (the one has 10 docs only) 
> rank higher or lower compared to docs from the larger core (the one with 
> 100,000) docs?

These are not quite the same question.

tf.idf ranking depends on the other documents in the collection (the idf term). 
With 10 docs, the document frequency statistics are effectively random noise, 
so the ranking is unpredictable.

Identical documents should rank identically, but whether they are higher or 
lower in the two cores depends on the rest of the docs.

idf statistics don’t settle down until at least 10K docs. You still sometimes 
see anomalies under a million documents. 

What design decision do you need to make? We can probably answer that for you.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Num docs, block join, and dupes?

2015-03-10 Thread Jessica Mallet
We've seen this as well. Before we understood the cause, it seemed very
bizarre that hitting different nodes would yield different numFound, as
well as using different rows=N (since the proxying node only de-dupes the
documents that are returned in the response).

I think "consistency" and "correctness" should be clearly delineated. Of
course we'd rather have consistently correct results, but failing that, I'd
rather have consistently incorrect results than inconsistent results,
because otherwise it's even harder to debug, as was the case here.

I think either the node hosting the shard should also do the de-duping, or
no one should. It's strange that the proxying node decides to do some
sketchy limited result set de-dupe.

On Tue, Mar 10, 2015 at 9:09 AM, Timothy Potter 
wrote:
>
> Before I open a JIRA, I wanted to put this out to solicit feedback on what
> I'm seeing and what Solr should be doing. So I've indexed the following 8
> docs into a 2-shard collection (Solr 4.8'ish - internal custom branch
> roughly based on 4.8) ... notice that the 3 grand-children of 2-1 have
> dup'd keys:
>
> [
>   {
> "id":"1",
> "name":"parent",
> "_childDocuments_":[
>   {
> "id":"1-1",
> "name":"child"
>   },
>   {
> "id":"1-2",
> "name":"child"
>   }
> ]
>   },
>   {
> "id":"2",
> "name":"parent",
> "_childDocuments_":[
>   {
> "id":"2-1",
> "name":"child",
> "_childDocuments_":[
>   {
> "id":"2-1-1",
> "name":"grandchild"
>   },
>   {
> "id":"2-1-1",
> "name":"grandchild2"
>   },
>   {
> "id":"2-1-1",
> "name":"grandchild3"
>   }
> ]
>   }
> ]
>   }
> ]
>
> When I query this collection, using:
>
>
http://localhost:8984/solr/blockjoin2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards.info=true&rows=10
>
> I get:
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":9,
> "params":{
>   "indent":"true",
>   "q":"*:*",
>   "shards.info":"true",
>   "wt":"json",
>   "rows":"10"}},
>   "shards.info":{
> "
http://localhost:8984/solr/blockjoin2_shard1_replica1/|http://localhost:8985/solr/blockjoin2_shard1_replica2/
":{
>   "numFound":3,
>   "maxScore":1.0,
>   "shardAddress":"
http://localhost:8984/solr/blockjoin2_shard1_replica1";,
>   "time":4},
> "
http://localhost:8984/solr/blockjoin2_shard2_replica1/|http://localhost:8985/solr/blockjoin2_shard2_replica2/
":{
>   "numFound":5,
>   "maxScore":1.0,
>   "shardAddress":"
http://localhost:8985/solr/blockjoin2_shard2_replica2";,
>   "time":4}},
>   "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[
>   {
> "id":"1-1",
> "name":"child"},
>   {
> "id":"1-2",
> "name":"child"},
>   {
> "id":"1",
> "name":"parent",
> "_version_":1495272401329455104},
>   {
> "id":"2-1-1",
> "name":"grandchild"},
>   {
> "id":"2-1",
> "name":"child"},
>   {
> "id":"2",
> "name":"parent",
> "_version_":1495272401361960960}]
>   }}
>
>
> So Solr has de-duped the results.
>
> If I execute this query against the shard that has the dupes
(distrib=false):
>
>
http://localhost:8984/solr/blockjoin2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards.info=true&rows=10&distrib=false
>
> Then the dupes are returned:
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":0,
> "params":{
>   "indent":"true",
>   "q":"*:*",
>   "shards.info":"true",
>   "distrib":"false",
>   "wt":"json",
>   "rows":"10"}},
>   "response":{"numFound":5,"start":0,"docs":[
>   {
> "id":"2-1-1",
> "name":"grandchild"},
>   {
> "id":"2-1-1",
> "name":"grandchild2"},
>   {
> "id":"2-1-1",
> "name":"grandchild3"},
>   {
> "id":"2-1",
> "name":"child"},
>   {
> "id":"2",
> "name":"parent",
> "_version_":1495272401361960960}]
>   }}
>
> So I guess my question is why doesn't the non-distrib query do
> de-duping? Mainly confirming this is how it's supposed to work and
> this behavior doesn't strike anyone else as odd ;-)
>
> Cheers,
>
> Tim


Re: Solr TCP layer

2015-03-10 Thread Shawn Heisey
On 3/10/2015 12:13 PM, Saumitra Srivastav wrote:
> Now we want to do the same with Solr. While I do realize that this is going
> to be a lot of work, but if its something that will reap benefit in long
> run, then so be it. Datastax provides a netty based layer in their
> enterprise version which folks have reported to be faster.

Netty has been discussed as a replacement for the Servlet API, as one
pathway towards Solr becoming a standalone application.  I'm pretty sure
that the general thinking within the project is to keep using HTTP (that
is one of the protocols that Netty implements) but the hope is that it
would be more efficient than a servlet container.  There is a lot of
evidence that Netty implements network communication much more
efficiently than other libraries.

If you have the experience to do work like that, user contributions are
always welcome.

Thanks,
Shawn



Re: Solr TCP layer

2015-03-10 Thread Walter Underwood
I would strongly recommend taking a look at HTTP/2. It might not be fast enough 
for you, but it is fast enough for Google and there are already implementations.

http://http2.github.io/faq/

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Mar 10, 2015, at 11:18 AM, Erick Erickson  wrote:

> Saumitra:
> 
> We certainly don't mean to be overly discouraging, so have at it!
> There has been some talk of using Netty in the future as we pull the
> war-file distribution out of the distro. Now, I have no technical clue
> about the merits .vs. TCP. But that's another possibility you might
> want to put into your analysis.
> 
> Best,
> Erick
> 
> On Tue, Mar 10, 2015 at 11:13 AM, Saumitra Srivastav
>  wrote:
>> Thanks everyone for the responses.
>> 
>> My motivation for TCP is coming from a very heavy indexing pipeline where
>> the smallest of optimization matters. I am working on a machine data parser
>> which feeds data into Cassandra and Solr and we have SLAs based on how fast
>> we can make data available in both the sources. We used to have issues with
>> Cassandra as well but we optimized the s**t out of it.
>> 
>> Now we want to do the same with Solr. While I do realize that this is going
>> to be a lot of work, but if its something that will reap benefit in long
>> run, then so be it. Datastax provides a netty based layer in their
>> enterprise version which folks have reported to be faster. Now just because
>> a commercial vendor ships it, doesn't mean we will jump into it without
>> thinking. We will definitely do a effect-vs-effort analysis before
>> committing to this.
>> 
>> For majority of users, such high performance might not be a
>> requirement/priority, so I understand the reluctance to go down this path.
>> 
>> I think it would be best at this time that I start exploring this option and
>> get back with my analysis.
>> 
>> Thanks again.
>> 
>> Saumitra
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715p4192176.html
>> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr TCP layer

2015-03-10 Thread Erick Erickson
Saumitra:

We certainly don't mean to be overly discouraging, so have at it!
There has been some talk of using Netty in the future as we pull the
war-file distribution out of the distro. Now, I have no technical clue
about the merits .vs. TCP. But that's another possibility you might
want to put into your analysis.

Best,
Erick

On Tue, Mar 10, 2015 at 11:13 AM, Saumitra Srivastav
 wrote:
> Thanks everyone for the responses.
>
> My motivation for TCP is coming from a very heavy indexing pipeline where
> the smallest of optimization matters. I am working on a machine data parser
> which feeds data into Cassandra and Solr and we have SLAs based on how fast
> we can make data available in both the sources. We used to have issues with
> Cassandra as well but we optimized the s**t out of it.
>
> Now we want to do the same with Solr. While I do realize that this is going
> to be a lot of work, but if its something that will reap benefit in long
> run, then so be it. Datastax provides a netty based layer in their
> enterprise version which folks have reported to be faster. Now just because
> a commercial vendor ships it, doesn't mean we will jump into it without
> thinking. We will definitely do a effect-vs-effort analysis before
> committing to this.
>
> For majority of users, such high performance might not be a
> requirement/priority, so I understand the reluctance to go down this path.
>
> I think it would be best at this time that I start exploring this option and
> get back with my analysis.
>
> Thanks again.
>
> Saumitra
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715p4192176.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr TCP layer

2015-03-10 Thread Saumitra Srivastav
Thanks everyone for the responses.

My motivation for TCP is coming from a very heavy indexing pipeline where
the smallest of optimization matters. I am working on a machine data parser
which feeds data into Cassandra and Solr and we have SLAs based on how fast
we can make data available in both the sources. We used to have issues with
Cassandra as well but we optimized the s**t out of it.

Now we want to do the same with Solr. While I do realize that this is going
to be a lot of work, if it's something that will reap benefits in the long
run, then so be it. Datastax provides a netty-based layer in their
enterprise version which folks have reported to be faster. Now just because
a commercial vendor ships it doesn't mean we will jump into it without
thinking. We will definitely do an effect-vs-effort analysis before
committing to this. 

For the majority of users, such high performance might not be a
requirement/priority, so I understand the reluctance to go down this path.

I think it would be best at this time that I start exploring this option and
get back with my analysis.

Thanks again.

Saumitra



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715p4192176.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Chaining components in request handler

2015-03-10 Thread Alexandre Rafalovitch
Ok. Components then. Defined in solrconfig.xml. You can
prepend/append/replace the standard list.

Try that and see if that's enough.
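A hedged solrconfig.xml sketch of what that looks like — the component name,
class, and handler path below are placeholders:

```xml
<searchComponent name="myComponent" class="com.example.MyComponent"/>

<requestHandler name="/chained" class="solr.SearchHandler">
  <!-- last-components appends after the standard query/facet/etc. chain;
       use first-components to prepend, or components to replace the list -->
  <arr name="last-components">
    <str>myComponent</str>
  </arr>
</requestHandler>
```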

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 10 March 2015 at 14:03, Ashish Mukherjee  wrote:
> Would like to do it during querying.
>
> Thanks,
> Ashish
>
> On Tue, Mar 10, 2015 at 11:07 PM, Alexandre Rafalovitch 
> wrote:
>
>> Is that during indexing or during query phase?
>>
>> Indexing has UpdateRequestProcessors (e.g.
>> http://www.solr-start.com/info/update-request-processors/ )
>> Query has Components (e.g. Faceting, MoreLIkeThis, etc)
>>
>> Or something different?
>>
>> Regards,
>>Alex.
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 10 March 2015 at 13:34, Ashish Mukherjee 
>> wrote:
>> > Hello,
>> >
>> > I would like to create a request handler which chains components in a
>> > particular sequence to return the result, similar to a Unix pipe.
>> >
>> > eg. Component 1 -> result1 -> Component 2 -> result2
>> >
>> > result2 is final result returned.
>> >
>> > Component 1 may be a standard component, Component 2 may be out of the
>> box.
>> >
>> > Is there any tutorial which describes how to wire together components
>> like
>> > this in a single handler?
>> >
>> > Regards,
>> > Ashish
>>


Re: Solr 5.0.0 - Multiple instances sharing Solr server *read-only* dir

2015-03-10 Thread Damien Dykman
Thanks Timothy for the pointer to the Jira ticket. That's exactly it :-)

Erick, the main reason why I would run multiple instances on the same
machine is to simulate a multi node environment. But beyond that, I like
the idea of being able to clearly separate the server dir and the data
dirs. That way the server dir could be deployed by root. Yet Solr
instances could run in userland.

Damien

On 03/10/2015 09:31 AM, Timothy Potter wrote:
> I think the next step here is to ship Solr with the war already extracted
> so that Jetty doesn't need to extract it on first startup -
> https://issues.apache.org/jira/browse/SOLR-7227
>
> On Tue, Mar 10, 2015 at 10:15 AM, Erick Erickson 
> wrote:
>
>> If I'm understanding your problem correctly, I think you want the -d
>> option,
>> then all the -s guys would be under that.
>>
>> Just to check, though, why are you running multiple Solrs? There are
>> sometimes
>> very good reasons, just checking that you're not making things more
>> difficult
>> than necessary
>>
>> Best,
>> Erick
>>
>> On Mon, Mar 9, 2015 at 4:59 PM, Damien Dykman 
>> wrote:
>>> Hi all,
>>>
>>> Quoted from
>>>
>> https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
>>> "When running multiple instances of Solr on the same host, it is more
>>> common to use the same server directory for each instance and use a
>>> unique Solr home directory using the -s option."
>>>
>>> Is there a way to achieve this without making *any* changes to the
>>> extracted content of solr-5.0.0.tgz and only use runtime parameters? In
>>> other words, make the extracted folder solr-5.0.0 strictly read-only?
>>>
>>> By default, the Solr web app is deployed under server/solr-webapp, as
>>> per solr-jetty-context.xml. So unless I change solr-jetty-context.xml, I
>>> cannot make folder solr-5.0.0 read-only to my Solr instances.
>>>
>>> I've figured out how to make the log files and pid file to be located
>>> under the Solr data dir by doing:
>>>
>>> export SOLR_PID_DIR=mySolrDataDir/logs; \
>>> export SOLR_LOGS_DIR=mySolrDataDir/logs; \
>>> bin/solr start -c -z localhost:32101/solr \
>>>  -s mySolrDataDir \
>>>  -a "-Dsolr.log=mySolrDataDir/logs" \
>>>  -p 31100 -h localhost
>>>
>>> But if there was a way to not have to change solr-jetty-context.xml that
>>> would be awesome! Thoughts?
>>>
>>> Thanks,
>>> Damien



Re: Cores and and ranking (search quality)

2015-03-10 Thread Shawn Heisey
On 3/10/2015 11:17 AM, johnmu...@aol.com wrote:
> If I have two cores: one core has 10 docs, the other has 100,000 docs.  I then 
> submit two docs that are 100% identical (with the exception of the unique-ID 
> field, which is stored but not indexed), one to each core.  The question is, 
> during search, will both of those docs rank near each other or not?  If so, 
> this is great because it will behave the same as if I had one core and indexed 
> both docs into that single core.  If not, which core's doc will rank higher, 
> and how far apart will the two docs be from each other in the ranking?
>
> Put another way: do docs from the smaller core (the one with only 10 docs) 
> rank higher or lower compared to docs from the larger core (the one with 
> 100,000 docs)?

Without specific knowledge about the document in question as well as all
the other documents, this is impossible to answer, except to say that
the relative ranking position is likely to be different.  Dropping back
to general info:

The overall term frequency and inverse document frequency (TF-IDF) in
the 100,000 document index will very likely be quite a lot different
than in the 10 document index.  That will affect ranking order. 
Sometimes users are surprised by the results they get, but it is very
rare to find a bug in Lucene scoring.

In addition to the debug parameter that Erick told you about, here are a
couple of classes you could investigate at the source code level for
more information about ranking:

http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/Similarity.html
http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/DefaultSimilarity.html

Here's info that is more general, and from a much earlier Lucene version:

https://lucene.apache.org/core/3_6_2/scoring.html

I have my Solr install configured to use the BM25 similarity.

http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/BM25Similarity.html
http://en.wikipedia.org/wiki/Okapi_BM25

SOLR-1632 aims to make TF-IDF the same across multiple cores as you
would get if you only had one core.  I do not know enough about it to
know whether it is EXACTLY the same, or only an approximation ... but in
a search context, 100 percent precise calculation is rarely required. 
When you drop that as a requirement, search becomes easier and a LOT faster.
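
As a rough illustration of the size effect, Lucene's classic idf (as used by DefaultSimilarity) grows with the total document count, so the same term contributes a different weight in a 10-doc index than in a 100,000-doc one. A sketch of just the idf term, not Solr's full scoring:

```python
import math

def idf(doc_freq, num_docs):
    # Lucene's classic (TF-IDF) inverse document frequency:
    # idf = 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# A term appearing in 2 docs weighs very differently per index size:
print(round(idf(2, 10), 2))       # small 10-doc core  -> ~2.2
print(round(idf(2, 100_000), 2))  # large 100,000-doc core -> ~11.41
```

Since idf multiplies into every matching term's score, two byte-identical documents in cores of different sizes generally end up with different absolute scores, which is exactly why merged rankings can shift.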

Thanks,
Shawn



Re: Chaining components in request handler

2015-03-10 Thread Alexandre Rafalovitch
Is that during indexing or during query phase?

Indexing has UpdateRequestProcessors (e.g.
http://www.solr-start.com/info/update-request-processors/ )
Query has Components (e.g. Faceting, MoreLIkeThis, etc)

Or something different?
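
To add a bit of concreteness on the query side: SearchHandler already runs its components as a pipeline in the order they are listed, each one reading and writing the shared response state. A minimal solrconfig.xml sketch (the handler name and component choice are illustrative, not from this thread):

```xml
<!-- Hypothetical handler: components execute in the listed order,
     each passing its state to the next -->
<requestHandler name="/chained" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>   <!-- Component 1: produces result1 -->
    <str>facet</str>   <!-- Component 2: consumes result1, adds result2 -->
    <str>debug</str>
  </arr>
</requestHandler>
```

A custom SearchComponent can be dropped into this list the same way; the first-components/last-components arrays let you prepend or append to the default chain instead of replacing it.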

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 10 March 2015 at 13:34, Ashish Mukherjee  wrote:
> Hello,
>
> I would like to create a request handler which chains components in a
> particular sequence to return the result, similar to a Unix pipe.
>
> eg. Component 1 -> result1 -> Component 2 -> result2
>
> result2 is final result returned.
>
> Component 1 may be a standard component, Component 2 may be out of the box.
>
> Is there any tutorial which describes how to wire together components like
> this in a single handler?
>
> Regards,
> Ashish


Chaining components in request handler

2015-03-10 Thread Ashish Mukherjee
Hello,

I would like to create a request handler which chains components in a
particular sequence to return the result, similar to a Unix pipe.

eg. Component 1 -> result1 -> Component 2 -> result2

result2 is final result returned.

Component 1 may be a standard component, Component 2 may be out of the box.

Is there any tutorial which describes how to wire together components like
this in a single handler?

Regards,
Ashish


Re: Cores and and ranking (search quality)

2015-03-10 Thread johnmunir
Thanks Erick for trying to help, I really appreciate it.  Unfortunately, I'm 
still stuck.

There are times when one must know the inner workings and behavior of the 
software to make a design decision, and this is one of them.  If I knew the 
inner workings of Solr, I would not be asking.  In addition, I'm in the design 
process, so I'm not able to fully test.  Besides, my test could be invalid 
because I may not set it up right, due to my lack of understanding of Solr's 
inner workings.

Given this, I hope you don't mind me asking again.

If I have two cores: one core has 10 docs, the other has 100,000 docs.  I then 
submit two docs that are 100% identical (with the exception of the unique-ID 
field, which is stored but not indexed), one to each core.  The question is, 
during search, will both of those docs rank near each other or not?  If so, 
this is great because it will behave the same as if I had one core and indexed 
both docs into that single core.  If not, which core's doc will rank higher, 
and how far apart will the two docs be from each other in the ranking?

Put another way: do docs from the smaller core (the one with only 10 docs) rank 
higher or lower compared to docs from the larger core (the one with 100,000 
docs)?

Thanks!

-- MJ

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, March 10, 2015 11:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

SOLR-1632 will certainly help. But trying to predict whether your core A or 
core B will appear first doesn't really seem like a good use of time. If you 
actually have a setup like you describe, add &debug=all to your query on both 
cores and you'll see all the gory detail of how the scores are calculated, 
providing a definitive answer in _your_ situation.

Best,
Erick

On Mon, Mar 9, 2015 at 5:44 AM,   wrote:
> (reposing this to see if anyone can help)
>
>
> Help me understand this better (regarding ranking).
>
> If I have two docs that are 100% identical with the exception of uid (which 
> is stored but not indexed).  In a single core setup, if I search "xyz" such 
> that those 2 docs end up ranking as #1 and #2.  When I switch over to two 
> core setup, doc-A goes to core-A (which has 10 records) and doc-B goes to 
> core-B (which has 100,000 records).
>
> Now, are you saying that in a 2-core setup, if I search on "xyz" (just like in a single 
> core setup) this time I will not see doc-A and doc-B as #1 and #2 in ranking? 
>  That is, are you saying doc-A may now be somewhere at the top / bottom far 
> away from doc-B?  If so, which will be #1: the doc off core-A (that has 10 
> records) or doc-B off core-B (that has 100,000 records)?
>
> If I got all this right, are you saying SOLR-1632 will fix this issue such 
> that the end result will now be as if I had 1 core?
>
> - MJ
>
>
> -Original Message-
> From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
> Sent: Thursday, March 5, 2015 9:06 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Cores and and ranking (search quality)
>
> On Thu, 2015-03-05 at 14:34 +0100, johnmu...@aol.com wrote:
>> My question is this: if I put my data in multiple cores and use 
>> distributed search will the ranking be different if I had all my data 
>> in a single core?
>
> Yes, it will be different. The practical impact depends on how homogeneous 
> your data are across the shards and how large your shards are. If you have 
> small and dissimilar shards, your ranking will suffer a lot.
>
> Work is being done to remedy this:
> https://issues.apache.org/jira/browse/SOLR-1632
>
>> Also, will facet and more-like-this quality / result be the same?
>
> It is not formally guaranteed, but for most practical purposes, faceting on 
> multi-shards will give you the same results as single-shards.
>
> I don't know about more-like-this. My guess is that it will be affected in 
> the same way that standard searches are.
>
>> Also, reading the distributed search wiki
>> (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr 
>> does the search and result merging (all I have to do is issue a 
>> search), is this correct?
>
> Yes. From a user-perspective, searches are no different.
>
> - Toke Eskildsen, State and University Library, Denmark
>



Re: Solr 5.0.0 - Multiple instances sharing Solr server *read-only* dir

2015-03-10 Thread Timothy Potter
I think the next step here is to ship Solr with the war already extracted
so that Jetty doesn't need to extract it on first startup -
https://issues.apache.org/jira/browse/SOLR-7227

On Tue, Mar 10, 2015 at 10:15 AM, Erick Erickson 
wrote:

> If I'm understanding your problem correctly, I think you want the -d
> option,
> then all the -s guys would be under that.
>
> Just to check, though, why are you running multiple Solrs? There are
> sometimes
> very good reasons, just checking that you're not making things more
> difficult
> than necessary
>
> Best,
> Erick
>
> On Mon, Mar 9, 2015 at 4:59 PM, Damien Dykman 
> wrote:
> > Hi all,
> >
> > Quoted from
> >
> https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
> >
> > "When running multiple instances of Solr on the same host, it is more
> > common to use the same server directory for each instance and use a
> > unique Solr home directory using the -s option."
> >
> > Is there a way to achieve this without making *any* changes to the
> > extracted content of solr-5.0.0.tgz and only use runtime parameters? In
> > other words, make the extracted folder solr-5.0.0 strictly read-only?
> >
> > By default, the Solr web app is deployed under server/solr-webapp, as
> > per solr-jetty-context.xml. So unless I change solr-jetty-context.xml, I
> > cannot make folder solr-5.0.0 read-only to my Solr instances.
> >
> > I've figured out how to make the log files and pid file to be located
> > under the Solr data dir by doing:
> >
> > export SOLR_PID_DIR=mySolrDataDir/logs; \
> > export SOLR_LOGS_DIR=mySolrDataDir/logs; \
> > bin/solr start -c -z localhost:32101/solr \
> >  -s mySolrDataDir \
> >  -a "-Dsolr.log=mySolrDataDir/logs" \
> >  -p 31100 -h localhost
> >
> > But if there was a way to not have to change solr-jetty-context.xml that
> > would be awesome! Thoughts?
> >
> > Thanks,
> > Damien
>


Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-03-10 Thread Erick Erickson
Thanks for letting us know!

Erick

On Tue, Mar 10, 2015 at 5:20 AM, Dmitry Kan  wrote:
> For the sake of completeness, just wanted to confirm these params
> had a positive effect:
>
> -Dsolr.solr.home=cores -Xmx12000m -Djava.awt.headless=true -XX:+UseParNewGC
> -XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC
> -XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40
>
> This freed up a couple dozen GBs on the Solr server!
>
> On Tue, Feb 17, 2015 at 1:47 PM, Dmitry Kan  wrote:
>
>> Thanks Toke!
>>
>> Now I consistently see the saw-tooth pattern on two shards with new GC
>> parameters, next I will try your suggestion.
>>
>> The current params are:
>>
>> -Xmx25600m -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent
>> -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=8
>> -XX:CMSInitiatingOccupancyFraction=40
>>
>> Dmitry
>>
>> On Tue, Feb 17, 2015 at 1:34 PM, Toke Eskildsen 
>> wrote:
>>
>>> On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote:
>>> > Solr: 4.10.2 (high load, mass indexing)
>>> > Java: 1.7.0_76 (Oracle)
>>> > -Xmx25600m
>>> >
>>> >
>>> > Solr: 4.3.1 (normal load, no mass indexing)
>>> > Java: 1.7.0_11 (Oracle)
>>> > -Xmx25600m
>>> >
>>> > The RAM consumption remained the same after the load has stopped on the
>>> > 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
>>> > jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM
>>> as
>>> > seen by top remained at 9G level.
>>>
>>> As the JVM does not free OS memory once allocated, top just shows
>>> whatever peak it reached at some point. When you tell the JVM that it is
>>> free to use 25GB, it makes a lot of sense to allocate a fair chunk of
>>> that instead of garbage collecting if there is a period of high usage
>>> (mass indexing for example).
>>>
>>> > What else could be the artifact of such a difference -- Solr or JVM?
>>> Can it
>>> > only be explained by the mass indexing? What is worrisome is that the
>>> > 4.10.2 shard reserves 8x times it uses.
>>>
>>> If you set your Xmx to a lot less, the JVM will probably favour more
>>> frequent garbage collections over extra heap allocation.
>>>
>>> - Toke Eskildsen, State and University Library, Denmark
>>>
>>>
>>>
>>
>>
>> --
>> Dmitry Kan
>> Luke Toolbox: http://github.com/DmitryKey/luke
>> Blog: http://dmitrykan.blogspot.com
>> Twitter: http://twitter.com/dmitrykan
>> SemanticAnalyzer: www.semanticanalyzer.info
>>
>>
>
>
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info


Re: Solrcloud Index corruption

2015-03-10 Thread Erick Erickson
Ahhh, ok. When you reloaded the cores, did you do it core-by-core?
I can see how something could get dropped in that case.

However, if you used the Collections API and two cores mysteriously
failed to reload that would be a bug. Assuming the replicas in question
were up and running at the time you reloaded.

Thanks for letting us know what's going on.
Erick

On Tue, Mar 10, 2015 at 4:34 AM, Martin de Vries
 wrote:
> Hi,
>
>> this _sounds_ like you somehow don't have indexed="true" set for the
>> field in question.
>
>
> We investigated a lot more. The CheckIndex tool didn't find any error. We
> now think the following happened:
> - We changed the schema two months ago: we changed a field to
> indexed="true". We reloaded the cores, but two of them don't seem to have
> been reloaded (maybe we forgot).
> - We reindexed all content. The new field worked fine.
> - We think the leader changed to a server that didn't reload the core
> - After that, the field stopped working for newly indexed documents
>
> Thanks for your help.
>
>
> Martin
>
>
>
>
> Erick Erickson schreef op 06.03.2015 17:02:
>
>> bq: You say in our case some docs didn't made it to the node, but
>> that's not really true: the docs can be found on the corrupted nodes
>> when I search on ID. The docs are also complete. The problem is that
>> the docs do not appear when I filter on certain fields
>>
>> this _sounds_ like you somehow don't have indexed="true" set for the
>> field in question. But it also sounds like you're saying that search
>> on that field works on some nodes but not on others, I'm assuming
>> you're adding "&distrib=false" to verify this. It shouldn't be
>> possible to have different schema.xml files on the different nodes,
>> but you might try checking through the admin UI.
>>
>> Network burps shouldn't be related here. If the content is stored,
>> then the info made it to Solr intact, so this issue shouldn't be
>> related to that.
>>
>> Sounds like it may just be the bugs Mark is referencing, sorry I don't
>> have the JIRA numbers right off.
>>
>> Best,
>> Erick
>>
>> On Thu, Mar 5, 2015 at 4:46 PM, Shawn Heisey  wrote:
>>
>>> On 3/5/2015 3:13 PM, Martin de Vries wrote:
>>>
 I understand there is not a "master" in SolrCloud. In our case we use
 haproxy as a load balancer for every request. So when indexing every
 document will be sent to a different solr server, immediately after
 each other. Maybe SolrCloud is not able to handle that correctly?
>>>
>>> SolrCloud can handle that correctly, but currently sending index
>>> updates to a core that is not the leader of the shard will incur a
>>> significant performance hit, compared to always sending updates to the
>>> correct core. A small performance penalty would be understandable,
>>> because the request must be redirected, but what actually happens is a
>>> much larger penalty than anyone expected. We have an issue in Jira to
>>> investigate that performance issue and make it work as efficiently as
>>> possible. Indexing batches of documents is recommended, not sending one
>>> document per update request.
>>>
>>> General performance problems with Solr itself can lead to extremely odd
>>> and unpredictable behavior from SolrCloud. Most often these kinds of
>>> performance problems are related in some way to memory, either the java
>>> heap or available memory in the system.
>>>
>>> http://wiki.apache.org/solr/SolrPerformanceProblems [1]
>>>
>>> Thanks,
>>> Shawn
>
>
>
>
> Links:
> --
> [1] http://wiki.apache.org/solr/SolrPerformanceProblems


Re: Solr 5.0.0 - Multiple instances sharing Solr server *read-only* dir

2015-03-10 Thread Erick Erickson
If I'm understanding your problem correctly, I think you want the -d option,
then all the -s guys would be under that.
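
For concreteness, a sketch of what that would look like (paths and ports are illustrative):

```shell
# Shared, read-only install dir via -d; per-instance writable homes via -s
bin/solr start -p 31100 -d /opt/solr-5.0.0/server -s /var/solr/node1
bin/solr start -p 31101 -d /opt/solr-5.0.0/server -s /var/solr/node2
```

Each instance then keeps its cores and solr.xml under its own -s directory while sharing one server install.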

Just to check, though, why are you running multiple Solrs? There are sometimes
very good reasons, just checking that you're not making things more difficult
than necessary

Best,
Erick

On Mon, Mar 9, 2015 at 4:59 PM, Damien Dykman  wrote:
> Hi all,
>
> Quoted from
> https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
>
> "When running multiple instances of Solr on the same host, it is more
> common to use the same server directory for each instance and use a
> unique Solr home directory using the -s option."
>
> Is there a way to achieve this without making *any* changes to the
> extracted content of solr-5.0.0.tgz and only use runtime parameters? In
> other words, make the extracted folder solr-5.0.0 strictly read-only?
>
> By default, the Solr web app is deployed under server/solr-webapp, as
> per solr-jetty-context.xml. So unless I change solr-jetty-context.xml, I
> cannot make folder solr-5.0.0 read-only to my Solr instances.
>
> I've figured out how to make the log files and pid file to be located
> under the Solr data dir by doing:
>
> export SOLR_PID_DIR=mySolrDataDir/logs; \
> export SOLR_LOGS_DIR=mySolrDataDir/logs; \
> bin/solr start -c -z localhost:32101/solr \
>  -s mySolrDataDir \
>  -a "-Dsolr.log=mySolrDataDir/logs" \
>  -p 31100 -h localhost
>
> But if there was a way to not have to change solr-jetty-context.xml that
> would be awesome! Thoughts?
>
> Thanks,
> Damien


Re: Field Rename in SOLR

2015-03-10 Thread Erick Erickson
What do you mean "rename field"? It _looks_ like you're trying to return a
field from your document while changing its name _in the results_.
I.e. you have "ProductName" in your document, but want to see
Name_en-US in your output.

My guess is that the hyphen is the problem. Does it work if you try to
get Name_en_US? Generally, hyphens are a bad idea in field names.

Best,
Erick

On Mon, Mar 9, 2015 at 2:38 PM, EXTERNAL Taminidi Ravi (ETI,
AA-AS/PAS-PTS)  wrote:
> Hello, does anyone know how to rename a field with the below field name? When 
> I try the method below, it says undefined field "Name_en"
>
> &fl=ProductName:Name_en-US
>
> It throws an error saying undefined field 'Name_en'; it is not recognizing the 
> full field name 'Name_en-US'.
>
> Is there any workaround?
>
> Thanks
>
> Ravi
>


Num docs, block join, and dupes?

2015-03-10 Thread Timothy Potter
Before I open a JIRA, I wanted to put this out to solicit feedback on what
I'm seeing and what Solr should be doing. So I've indexed the following 8
docs into a 2-shard collection (Solr 4.8'ish - internal custom branch
roughly based on 4.8) ... notice that the 3 grand-children of 2-1 have
dup'd keys:

[
  {
"id":"1",
"name":"parent",
"_childDocuments_":[
  {
"id":"1-1",
"name":"child"
  },
  {
"id":"1-2",
"name":"child"
  }
]
  },
  {
"id":"2",
"name":"parent",
"_childDocuments_":[
  {
"id":"2-1",
"name":"child",
"_childDocuments_":[
  {
"id":"2-1-1",
"name":"grandchild"
  },
  {
"id":"2-1-1",
"name":"grandchild2"
  },
  {
"id":"2-1-1",
"name":"grandchild3"
  }
]
  }
]
  }
]

When I query this collection, using:

http://localhost:8984/solr/blockjoin2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards.info=true&rows=10

I get:

{
  "responseHeader":{
"status":0,
"QTime":9,
"params":{
  "indent":"true",
  "q":"*:*",
  "shards.info":"true",
  "wt":"json",
  "rows":"10"}},
  "shards.info":{

"http://localhost:8984/solr/blockjoin2_shard1_replica1/|http://localhost:8985/solr/blockjoin2_shard1_replica2/":{
  "numFound":3,
  "maxScore":1.0,
  "shardAddress":"http://localhost:8984/solr/blockjoin2_shard1_replica1";,
  "time":4},

"http://localhost:8984/solr/blockjoin2_shard2_replica1/|http://localhost:8985/solr/blockjoin2_shard2_replica2/":{
  "numFound":5,
  "maxScore":1.0,
  "shardAddress":"http://localhost:8985/solr/blockjoin2_shard2_replica2";,
  "time":4}},
  "response":{"numFound":6,"start":0,"maxScore":1.0,"docs":[
  {
"id":"1-1",
"name":"child"},
  {
"id":"1-2",
"name":"child"},
  {
"id":"1",
"name":"parent",
"_version_":1495272401329455104},
  {
"id":"2-1-1",
"name":"grandchild"},
  {
"id":"2-1",
"name":"child"},
  {
"id":"2",
"name":"parent",
"_version_":1495272401361960960}]
  }}


So Solr has de-duped the results.

If I execute this query against the shard that has the dupes (distrib=false):

http://localhost:8984/solr/blockjoin2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards.info=true&rows=10&distrib=false

Then the dupes are returned:

{
  "responseHeader":{
"status":0,
"QTime":0,
"params":{
  "indent":"true",
  "q":"*:*",
  "shards.info":"true",
  "distrib":"false",
  "wt":"json",
  "rows":"10"}},
  "response":{"numFound":5,"start":0,"docs":[
  {
"id":"2-1-1",
"name":"grandchild"},
  {
"id":"2-1-1",
"name":"grandchild2"},
  {
"id":"2-1-1",
"name":"grandchild3"},
  {
"id":"2-1",
"name":"child"},
  {
"id":"2",
"name":"parent",
"_version_":1495272401361960960}]
  }}

So I guess my question is: why doesn't the non-distrib query do
de-duping? I'm mainly confirming this is how it's supposed to work, and
that this behavior doesn't strike anyone else as odd ;-)
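
For what it's worth, the de-duping observed in the distributed case behaves as if the aggregator keeps only the first document it sees per uniqueKey while merging shard responses. A toy sketch of that merge step (an illustration of the observed behavior, not Solr's actual code):

```python
def merge_shard_docs(shard_responses):
    """Merge per-shard doc lists, keeping the first doc seen per unique id."""
    seen = set()
    merged = []
    for docs in shard_responses:
        for doc in docs:
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc)
    return merged

shard1 = [{"id": "1-1"}, {"id": "1-2"}, {"id": "1"}]
shard2 = [{"id": "2-1-1"}, {"id": "2-1-1"}, {"id": "2-1-1"},
          {"id": "2-1"}, {"id": "2"}]
# 3 + 5 shard hits collapse to 6 merged docs, matching the distributed numFound
print(len(merge_shard_docs([shard1, shard2])))
```

The non-distrib query skips this merge step entirely, which is consistent with the 5 docs (dupes included) coming straight back from the single shard.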

Cheers,

Tim


Re: Parsing cluster result's docs

2015-03-10 Thread Erick Erickson
You can get some fields back besides ID, see the carrot.title and
carrot.snippet params. I don't know a good way to get the full
underlying documents though.

Best,
Erick

On Mon, Mar 9, 2015 at 9:33 AM, Jorge Luis Lazo  wrote:
> Hi,
>
> I have a Solr instance using the clustering component (with the Lingo
> algorithm) working perfectly. However when I get back the cluster results
> only the ID's of these come back with it. What is the easiest way to
> retrieve full documents instead? Should I parse these IDs into a new query
> to Solr, or is there some configuration I am missing to return full docs
> instead of IDs?
>
> If it matters, I am using Solr 4.10.
>
> Thanks.


Re: Solr TCP layer

2015-03-10 Thread Erick Erickson
Just to pile on:

I admire your bravery! I'll add to the other comments only by saying
that _before_ you start down this path, you really need to articulate
the benefit/cost analysis. "to gain a little more communications
efficiency" will be a pretty hard sell due to the reasons Shawn
outlined. This is hugely risky and would require a lot of work for
as-yet-unarticulated benefits.

There are lots and lots of other things to work on of significantly
greater impact IMO. How would you like to work on something to help
manage Solr's memory usage for instance ;)?

Best,
Erick

On Mon, Mar 9, 2015 at 9:24 AM, Reitzel, Charles
 wrote:
> A couple thoughts:
> 0. Interesting topic.
> 1. But perhaps better suited to the dev list.
> 2. Given the existing architecture, shouldn't we be looking to transport 
> projects, e.g. Jetty, Apache HttpComponents, for support of new socket or 
> even HTTP layer protocols?
> 3. To the extent such support exists, then integration work is still needed 
> at the solr level.  Shalin, is this your intention?
>
> Also, for those of us not tracking protocol standards in detail, can you 
> describe the benefits to Solr users of http/2?
>
> Do you expect HTTP/2 to be transparent at the application layer?
>
> -Original Message-
> From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
> Sent: Monday, March 09, 2015 6:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr TCP layer
>
> Hi Saumitra,
>
> I've been thinking of adding http/2 support for inter node communication 
> initially and client server communication next in Solr. There's a patch for 
> SPDY support but now that spdy is deprecated and http/2 is the new standard 
> we need to wait for Jetty 9.3 to release. That will take care of many 
> bottlenecks in solrcloud communication. The current trunk is already using 
> jetty 9.2.x which has support for the draft http/2 spec.
>
> A brand new async TCP layer based on netty can be considered but that's a 
> huge amount of work considering our need to still support simple http, SSL 
> etc. Frankly for me that effort is better spent optimizing the routing layer.
> On 09-Mar-2015 1:37 am, "Saumitra Srivastav" 
> wrote:
>
>> Dear Solr Contributors,
>>
>> I want to start working on adding a TCP layer for client to node and
>> inter-node communication.
>>
>> I am not up to date on recent changes happening to Solr. So before I
>> start looking into code, I would like to know if there is already some
>> work done in this direction, which I can reuse. Are there any know
>> challenges/complexities?
>>
>> I would appreciate any help to kick start this effort. Also, what
>> would be the best way to discuss and get feedback on design from
>> contributors? Open a JIRA??
>>
>> Regards,
>> Saumitra
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> *
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender immediately 
> and then delete it.
>
> TIAA-CREF
> *


Solr 5 upgrade

2015-03-10 Thread richardg
Ubuntu 14.04.02
Trying to install solr 5 following this:
https://cwiki.apache.org/confluence/display/solr/Upgrading+a+Solr+4.x+Cluster+to+Solr+5.0

I keep getting "this script requires extracting a war file with either the
jar or unzip utility, please install these utilities or contact your
administrator for assistance." after running install_solr_service.sh.  It
says "Service solr installed." but when I try to run the service I get the
above error.  Not sure the resolution.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-upgrade-tp4192127.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 5: data_driven_schema_config's solrconfig causing error

2015-03-10 Thread Steve Rowe
Hi Aman,

The stack trace shows that the AddSchemaFieldsUpdateProcessorFactory specified 
in data_driven_schema_configs’s solrconfig.xml expects the “booleans” field 
type to exist.

Solr 5’s data_driven_schema_configs includes the “booleans” field type:



So you must have removed it when you modified the schema?  Did you do this 
intentionally?  If so, why?

Steve

> On Mar 10, 2015, at 5:25 AM, Aman Tandon  wrote:
> 
> Hi,
> 
> For the sake of using the new schema.xml and solrconfig.xml with Solr 5, I
> put my old required field types & field names (used with Solr 4.8.1)
> into the schema.xml given in *basic_configs*, plus the configuration settings
> given in the solrconfig.xml present in *data_driven_schema_configs*, and then
> I uploaded these configuration files to the configs in ZooKeeper.
> 
> But when I create the core, it gives the error that the 'booleans'
> fieldType is not found in the schema. Please correct me if I am doing
> something wrong.
> 
> ERROR - 2015-03-10 08:20:16.788; org.apache.solr.core.CoreContainer; Error
>> creating core [core1]: fieldType 'booleans' not found in the schema
>> org.apache.solr.common.SolrException: fieldType 'booleans' not found in
>> the schema
>> at org.apache.solr.core.SolrCore.(SolrCore.java:896)
>> at org.apache.solr.core.SolrCore.(SolrCore.java:662)
>> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
>> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
>> at
>> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:573)
>> at
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
>> at
>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
>> at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
>> at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
>> at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>> at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>> at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>> at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>> at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>> at
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>> at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>> at org.eclipse.jetty.server.Server.handle(Server.java:368)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>> at
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>> at
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>> at
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>> at
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>> at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.solr.common.SolrException: fieldType 'booleans' not
>> found in the schema
>> at
>> org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$TypeMapping.populateValueClasses(AddSchemaFieldsUpdateProcessorFactory.java:244)
>> at
>> org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory.inform(AddSchemaFieldsUpdateProcessorFactory.java:170)
>> at
>> org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.j

Re: SolrCloud: Chroot error

2015-03-10 Thread Shawn Heisey
On 3/10/2015 6:10 AM, Aman Tandon wrote:
> Thanks Shawn, I tried it with single string but still no success.
> 
> So currently i am running it without chroot and it is working fine.

That brings up something for me or you to try.  I wonder if perhaps
there is a bug that will prevent the "directory" creation from
happening.  I would imagine that if you create the directory manually,
Solr would work just fine.  My production cloud is running a very old
release - 4.2.1 - and I find it difficult to set up a full SolrCloud
test environment because I don't have a lot of hardware.

Thanks,
Shawn
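
Shawn's idea of creating the chroot node manually can be sketched like
this (hostnames and the /solr path are examples, not taken from the
thread). Solr's own zkcli.sh has a makepath command for exactly this:

```shell
# Create the /solr chroot znode by hand before the first Solr start
./example/scripts/cloud-scripts/zkcli.sh -zkhost zk1.example.com:2181 -cmd makepath /solr

# Then start every Solr node with the same zkHost string, chroot included
bin/solr start -c -z "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"
```

Once the /solr node exists in ZooKeeper, a failure to auto-create the
chroot at startup would no longer matter.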



RE: Solr phonetics with spelling

2015-03-10 Thread Dyer, James
Ashish,

I would not recommend using spellcheck against a phonetic-analyzed field.  
Instead, you can use <copyField> to create a separate field that is lightly 
analyzed and use the copy for spelling.  
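
A sketch of that setup in schema.xml (all names here are invented for
illustration): the phonetic field keeps its analysis for matching, while
a copyField feeds a lightly analyzed sibling that the spellcheck
component is pointed at.

```xml
<!-- searched field: phonetic analysis, as already configured -->
<field name="name_phonetic" type="text_phonetic" indexed="true" stored="false"/>

<!-- lightly analyzed copy, used only for spelling -->
<field name="name_spell" type="text_spell" indexed="true" stored="false"/>
<copyField source="name_phonetic" dest="name_spell"/>

<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

In solrconfig.xml the spellcheck component would then name the copy,
e.g. <str name="field">name_spell</str>.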

James Dyer
Ingram Content Group


-Original Message-
From: Ashish Mukherjee [mailto:ashish.mukher...@gmail.com] 
Sent: Tuesday, March 10, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Solr phonetics with spelling

Hello,

Couple of questions related to phonetics -

1. If I enable the phonetic filter in managed-schema file for a particular
field, how does it affect the spell handler?

2. What is the meaning of the inject attribute within <filter> in
managed-schema? The documentation is not very clear about it.

Regards,
Ashish


Re: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-03-10 Thread Dmitry Kan
For the sake of the story's completeness, just wanted to confirm that
these params had a positive effect:

-Dsolr.solr.home=cores -Xmx12000m -Djava.awt.headless=true -XX:+UseParNewGC
-XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC
-XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40

This freed up a couple dozen GBs on the Solr server!

On Tue, Feb 17, 2015 at 1:47 PM, Dmitry Kan  wrote:

> Thanks Toke!
>
> Now I consistently see the saw-tooth pattern on two shards with new GC
> parameters, next I will try your suggestion.
>
> The current params are:
>
> -Xmx25600m -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent
> -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=8
> -XX:CMSInitiatingOccupancyFraction=40
>
> Dmitry
>
> On Tue, Feb 17, 2015 at 1:34 PM, Toke Eskildsen 
> wrote:
>
>> On Tue, 2015-02-17 at 11:05 +0100, Dmitry Kan wrote:
>> > Solr: 4.10.2 (high load, mass indexing)
>> > Java: 1.7.0_76 (Oracle)
>> > -Xmx25600m
>> >
>> >
>> > Solr: 4.3.1 (normal load, no mass indexing)
>> > Java: 1.7.0_11 (Oracle)
>> > -Xmx25600m
>> >
>> > The RAM consumption remained the same after the load has stopped on the
>> > 4.10.2 cluster. Manually collecting the memory on a 4.10.2 shard via
>> > jvisualvm dropped the used RAM from 8,5G to 0,5G. But the reserved RAM
>> > as seen by top remained at 9G level.
>>
>> As the JVM does not free OS memory once allocated, top just shows
>> whatever peak it reached at some point. When you tell the JVM that it is
>> free to use 25GB, it makes a lot of sense to allocate a fair chunk of
>> that instead of garbage collecting if there is a period of high usage
>> (mass indexing for example).
>>
>> > What else could be the artifact of such a difference -- Solr or JVM?
>> > Can it only be explained by the mass indexing? What is worrisome is
>> > that the 4.10.2 shard reserves 8x what it uses.
>>
>> If you set your Xmx to a lot less, the JVM will probably favour more
>> frequent garbage collections over extra heap allocation.
>>
>> - Toke Eskildsen, State and University Library, Denmark
>>
>>
>>
>
>
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
>
>


-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: SolrCloud: Chroot error

2015-03-10 Thread Aman Tandon
Thanks Shawn, I tried it with a single string but still no success.

So currently I am running it without a chroot and it is working fine.

With Regards
Aman Tandon

On Mon, Mar 9, 2015 at 9:46 PM, Shawn Heisey  wrote:

> On 3/9/2015 10:03 AM, Aman Tandon wrote:
> > Thanks for replying, Just to send the mail, I replaced the IP addresses
> > with the imaginary hostname, now the command is
> >
> > *./solr start -c -z localhost:2181,abc.com:2181,xyz.com:2181/home/aman/solrcloud/solr_zoo -p 4567*
>
> The same URL replacement is still happening.  I think I know what you
> are doing, but I was hoping to have a clean string just to make sure.
>
> You should not be using "localhost" in the zkHost string unless there is
> only one zk server, or you are trying to start the entire cluster on one
> machine.  All of your Solr machines should have identical zkHost
> parameters.  That is not possible if they are separate machines and you
> use localhost.
>
> Your chroot should be very simple, as I mentioned in the other email.
> Using "/solr" is appropriate if you won't be sharing the zookeeper
> ensemble with multiple SolrCloud clusters.  The filesystem layout of
> your zookeeper install (bin, data, logs, etc) is NOT relevant for this
> chroot.  It exists only within the zookeeper database.
>
> Thanks,
> Shawn
>
>


Solr phonetics with spelling

2015-03-10 Thread Ashish Mukherjee
Hello,

Couple of questions related to phonetics -

1. If I enable the phonetic filter in managed-schema file for a particular
field, how does it affect the spell handler?

2. What is the meaning of the inject attribute within <filter> in
managed-schema? The documentation is not very clear about it.
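
For reference, the attribute in question sits on the phonetic filter
declaration in the field type's analyzer chain (a sketch; the encoder
choice is just an example). With inject="true" the filter emits the
phonetic codes alongside the original tokens; with inject="false" it
replaces them with the codes.

```xml
<filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
```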

Regards,
Ashish


Re: Solrcloud Index corruption

2015-03-10 Thread Martin de Vries

Hi,


> this _sounds_ like you somehow don't have indexed="true" set for the
> field in question.

We investigated a lot more. The CheckIndex tool didn't find any error.
We now think the following happened:
- We changed the schema two months ago: we changed a field to
indexed="true". We reloaded the cores, but two of them don't seem to
have been reloaded (maybe we forgot).
- We reindexed all content. The new field worked fine.
- We think the leader changed to a server that didn't reload the core.
- After that the field stopped working for newly indexed documents.
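
Martin's "reload the cores" step can be done for the whole collection in
one call via the Collections API, which reloads every replica on every
node (host and collection name are examples):

```shell
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1"
```

That avoids hitting each node's Core Admin API separately and missing
one, which is what appears to have happened here.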

Thanks for your help.


Martin




Erick Erickson schreef op 06.03.2015 17:02:


bq: You say in our case some docs didn't made it to the node, but
that's not really true: the docs can be found on the corrupted nodes
when I search on ID. The docs are also complete. The problem is that
the docs do not appear when I filter on certain fields

this _sounds_ like you somehow don't have indexed="true" set for the
field in question. But it also sounds like you're saying that search
on that field works on some nodes but not on others, I'm assuming
you're adding "&distrib=false" to verify this. It shouldn't be
possible to have different schema.xml files on the different nodes,
but you might try checking through the admin UI.

Network burps shouldn't be related here. If the content is stored,
then the info made it to Solr intact, so this issue shouldn't be
related to that.

Sounds like it may just be the bugs Mark is referencing; sorry, I don't
have the JIRA numbers right off.

Best,
Erick

On Thu, Mar 5, 2015 at 4:46 PM, Shawn Heisey  
wrote:



On 3/5/2015 3:13 PM, Martin de Vries wrote:

I understand there is not a "master" in SolrCloud. In our case we 
use
haproxy as a load balancer for every request. So when indexing 
every

document will be sent to a different solr server, immediately after
each other. Maybe SolrCloud is not able to handle that correctly?

SolrCloud can handle that correctly, but currently sending index
updates to a core that is not the leader of the shard will incur a
significant performance hit, compared to always sending updates to 
the

correct core. A small performance penalty would be understandable,
because the request must be redirected, but what actually happens is 
a
much larger penalty than anyone expected. We have an issue in Jira 
to
investigate that performance issue and make it work as efficiently 
as
possible. Indexing batches of documents is recommended, not sending 
one

document per update request. General performance problems with Solr
itself can lead to extremely odd and unpredictable behavior from
SolrCloud. Most often these kinds of performance problems are 
related
in some way to memory, either the java heap or available memory in 
the
system. http://wiki.apache.org/solr/SolrPerformanceProblems [1] 
Thanks,

Shawn




Links:
--
[1] http://wiki.apache.org/solr/SolrPerformanceProblems
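
Shawn's advice to index batches of documents rather than one document
per update request amounts to chunking the document stream on the client
side. A minimal, library-free sketch of the chunking step (the actual
send, e.g. SolrJ's add(Collection) or a single JSON POST to /update, is
only hinted at in the comments):

```python
import itertools
from typing import Iterable, Iterator, List

def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents from a document stream."""
    it = iter(docs)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

# Each batch then goes out as ONE update request, e.g. solr.add(batch),
# instead of one request per document.
for batch in batched(({"id": str(i)} for i in range(2500)), 1000):
    pass  # send `batch` to Solr here
```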


Solr 5: data_driven_schema_config's solrconfig causing error

2015-03-10 Thread Aman Tandon
Hi,

To use the new schema.xml and solrconfig.xml with Solr 5, I put my old
required field types & field names (used with Solr 4.8.1) into the
schema.xml from *basic_configs*, took the configuration settings from
the solrconfig.xml in *data_driven_schema_configs*, and uploaded these
configuration files to the configs in ZooKeeper.

But when I create the core it gives the error that the booleans
fieldType is not found in the schema. Correct me if I am doing something
wrong.
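
A likely cause (my reading of the stack trace, not something confirmed
in this thread): the solrconfig.xml from data_driven_schema_configs
contains an AddSchemaFieldsUpdateProcessorFactory whose type mappings
reference fieldTypes such as booleans that exist in the data-driven
schema but not in the basic_configs schema.xml that was swapped in. The
mapping looks roughly like this; either those fieldTypes have to be
added to the schema, or the mappings removed/adjusted:

```xml
<!-- sketch of the typeMapping section in data_driven_schema_configs -->
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">booleans</str>  <!-- must exist in schema.xml -->
  </lst>
  <!-- further mappings for Date, Long, Number, ... -->
</processor>
```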

ERROR - 2015-03-10 08:20:16.788; org.apache.solr.core.CoreContainer; Error
> creating core [core1]: fieldType 'booleans' not found in the schema
> org.apache.solr.common.SolrException: fieldType 'booleans' not found in
> the schema
> at org.apache.solr.core.SolrCore.(SolrCore.java:896)
> at org.apache.solr.core.SolrCore.(SolrCore.java:662)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:573)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
> at
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:736)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: fieldType 'booleans' not
> found in the schema
> at
> org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$TypeMapping.populateValueClasses(AddSchemaFieldsUpdateProcessorFactory.java:244)
> at
> org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory.inform(AddSchemaFieldsUpdateProcessorFactory.java:170)
> at
> org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:620)
> at org.apache.solr.core.SolrCore.(SolrCore.java:879)
> ... 35 more
> ERROR - 2015-03-10 08:20:16.825; org.apache.solr.common.SolrException;
> org.apache.solr.common.SolrException: Error CREATEing SolrCore 'core1':
> Unable to create core [core1] Caused by: fieldType 'booleans' not found in
> the schema
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:606)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:197)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(