Re: Data Import Handler - autoSoftCommit and autoCommit

2016-02-08 Thread Rajesh Hazari
we have this for a collection which is updated every 3 mins with a minimum of 500
documents, plus about 10k documents once a day at the start of the day:


   <autoCommit>
     <maxTime>${solr.autoCommit.maxTime:30}</maxTime>
     <maxDocs>1</maxDocs>
     <openSearcher>true</openSearcher>
   </autoCommit>
   <commitWithin>
     <softCommit>true</softCommit>
   </commitWithin>
   <autoSoftCommit>
     <maxTime>${solr.autoSoftCommit.maxTime:6000}</maxTime>
   </autoSoftCommit>

As per the Solr documentation, if you have a Solr client indexing documents,
it is not recommended to use commit=true and optimize=true explicitly.

We have not tested the data import handler with 10 million records.

We settled on this config after many tests, once we understood our
needs and requirements.
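
As a hedged illustration of that advice (URL, collection, and field names are
placeholders): a minimal SolrJ sketch that adds documents with no explicit
commit or optimize, leaving visibility to the autoCommit/autoSoftCommit
settings above:

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("title_s", "hello");
    client.add(doc);   // note: no client.commit() and no optimize here;
                       // the server-side auto commits make the doc visible
    client.close();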


Rajesh.

On Mon, Feb 8, 2016 at 10:15 AM, Troy Edwards 
wrote:

> We are running the data import handler to retrieve about 10 million records
> during work hours every day of the week. We are using Clean = true, Commit
> = true and Optimize = true. The entire process takes about 1 hour.
>
> What would be a good setting for autoCommit and autoSoftCommit?
>
> Thanks
>


Re: Data Import Handler - autoSoftCommit and autoCommit

2016-02-08 Thread Susheel Kumar
You can start with one of the suggestions from this link based on your
indexing and query load.


https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
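
For reference, a hedged sketch of the bulk-indexing pattern that article
describes (the intervals below are placeholders to tune, not the article's
prescriptions): hard commit fairly often with openSearcher=false to truncate
the transaction log, and soft commit at the longest visibility delay you can
tolerate:

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit every 60s; flushes the tlog -->
      <openSearcher>false</openSearcher>  <!-- do not open a new searcher on hard commit -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>300000</maxTime>           <!-- new searcher every 5 min; tune to need -->
    </autoSoftCommit>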


Thanks,
Susheel

On Mon, Feb 8, 2016 at 10:15 AM, Troy Edwards 
wrote:

> We are running the data import handler to retrieve about 10 million records
> during work hours every day of the week. We are using Clean = true, Commit
> = true and Optimize = true. The entire process takes about 1 hour.
>
> What would be a good setting for autoCommit and autoSoftCommit?
>
> Thanks
>


Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7

2016-02-08 Thread Shahzad Masud
Thank you Shawn for your reply. Here is my structure of cores and shards

Shard 1 = localhost:8983/solr_2014 [3 Core  - Employee, Service Tickets,
Departments]
Shard 2 = localhost:8983/solr_2015 [3 Core  - Employee, Service Tickets,
Departments]
Shard 3 = localhost:8983/solr_2016 [3 Core  - Employee, Service Tickets,
Departments]

While searching, I use the distributed search feature to search data from all
three shards in their respective cores, e.g., if I want to search Employee
data for all three years, I search the Employee core of each of the three contexts.
This is a legacy design; do you think it is okay, or does it require immediate
restructuring? I am going to try this:

Context = localhost:8982/solr (9 cores - Employee-2014, Employee-2015,
Employee-2016, ServiceTickets-2014, ServiceTickets-2015,
ServiceTickets-2016, Department-2014, Department-2015, Department-2016);
distributed search would go to all three cores of the same data category
(i.e., for an Employee search, it would query Employee-2014, Employee-2015,
and Employee-2016).
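
A minimal SolrJ sketch of that per-category distributed search (hosts, core
names, and the query are placeholders for illustration):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // Send the query to one Employee core and fan out to all three via shards.
    HttpSolrClient client = new HttpSolrClient("http://localhost:8982/solr/Employee-2014");
    SolrQuery q = new SolrQuery("name:smith");
    q.set("shards",
          "localhost:8982/solr/Employee-2014,"
        + "localhost:8982/solr/Employee-2015,"
        + "localhost:8982/solr/Employee-2016");
    QueryResponse rsp = client.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
    client.close();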

Regarding one Solr context per Jetty: I cannot run two Solr contexts
pointing to different data in Jetty, as when starting Jetty I have to
provide the -Dsolr.solr.home variable, which ends up pointing to only one
data folder (the 2014 data).

Shahzad


On Thu, Feb 4, 2016 at 10:25 PM, Shawn Heisey  wrote:

> On 2/4/2016 9:48 AM, Shahzad Masud wrote:
> > Thank you Shawn for your response. I have been using manual shards (old
> > mechanism) i.e. seperate context for each shard and each shard pointing
> to
> > seperate data and indexing folder.
> >
> > Shard 1 = localhost:8983/solr_2014
> > Shard 2 = localhost:8983/solr_2015
> > Shard 3 = localhost:8983/solr_2016
> >
> > Do you think this is a good design practise? Can you share an example
> which
> > may help me deploy two shards in one jetty?
>
> Manual sharding typically does *not* involve multiple contexts (webapps)
> in your container.
>
> One instance of Solr (using, for example, the /solr context) can handle
> many cores.
>
> https://cwiki.apache.org/confluence/display/solr/Solr+Cores+and+solr.xml
>
> This functionality is available in *any* container that Solr supports,
> including both Tomcat and Jetty.
>
> Thanks,
> Shawn
>
>


RE: Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-08 Thread Gian Maria Ricci - aka Alkampfer
Perfect, Thanks again to everyone.

--
Gian Maria Ricci
Cell: +39 320 0136949


-Original Message-
From: Binoy Dalal [mailto:binoydala...@gmail.com] 
Sent: giovedì 4 febbraio 2016 15:07
To: solr-user@lucene.apache.org
Subject: Re: Tutorial or Code Samples to explain how to Write Solr Plugins

I used those links to learn to write my first plugin as well.
I might have that code still lying around somewhere. Let me take a look and get 
back.

On Thu, 4 Feb 2016, 19:32 Gian Maria Ricci - aka Alkampfer < 
alkamp...@nablasoft.com> wrote:

> I've already found these two presentation, sadly enough link for 
> source code is broken, it seems that the domain www.searchbox.com is 
> completely down :|
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
> -Original Message-
> From: Binoy Dalal [mailto:binoydala...@gmail.com]
> Sent: mercoledì 3 febbraio 2016 17:46
> To: solr-user@lucene.apache.org
> Subject: Re: Tutorial or Code Samples to explain how to Write Solr 
> Plugins
>
> Here's a couple of links you can follow to get started:
>
> https://www.slideshare.net/mobile/searchbox-com/tutorial-on-developin-
> a-solr-search-component-plugin
>
> https://www.slideshare.net/mobile/searchbox-com/develop-a-solr-request
> -handler-plugin These are to write a search component and a request 
> handler respectively.
> They are on older solr versions but they should work with 5.x as well.
> I used these to get started when I was trying to write my first plugin.
> Once you get a hang of how it's to be done it's really not that difficult.
>
> On Wed, 3 Feb 2016, 21:59 Gian Maria Ricci - aka Alkampfer < 
> alkamp...@nablasoft.com> wrote:
>
> > Hi,
> >
> >
> >
> > I wonder if there is some code samples or tutorial (updated to work 
> > with version 5) to help users writing plugins.
> >
> >
> >
> > I’ve found lots of difficulties on the past to find such kind of 
> > information when I needed to write some plugins, and I wonder if I 
> > missed some site or link that does a better explanation than 
> > official page http://wiki.apache.org/solr/SolrPlugins that is really old.
> >
> >
> >
> > Thanks in advance.
> >
> >
> >
> > --
> > Gian Maria Ricci
> > Cell: +39 320 0136949
> >
> >
> >
> >
> --
> Regards,
> Binoy Dalal
>
--
Regards,
Binoy Dalal


Re: Solr architecture

2016-02-08 Thread Emir Arnautovic

Hi Mark,
Can you give us bit more details: size of docs, query types, are docs 
grouped somehow, are they time sensitive, will they update or it is 
rebuild every time, etc.


Thanks,
Emir

On 08.02.2016 16:56, Mark Robinson wrote:

Hi,
We have a requirement where we would need to index around 2 Billion docs in
a day.
The queries against this indexed data set can be around 80K queries per
second during peak time and during non peak hours around 12K queries per
second.

Can Solr realize this huge volumes.

If so, assuming we have no constraints for budget what would be a
recommended Solr set up (number of shards, number of Solr instances etc...)

Thanks!
Mark



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Tutorial or Code Samples to explain how to Write Solr Plugins

2016-02-08 Thread Binoy Dalal
I've compiled a sample search component example working on Solr 5.4.1.
The code is ready to run. Find it here:
https://github.com/lttazz99/SolrPluginsExamples.git

On Mon, Feb 8, 2016 at 1:51 PM Gian Maria Ricci - aka Alkampfer <
alkamp...@nablasoft.com> wrote:

> Perfect, Thanks again to everyone.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
> -Original Message-
> From: Binoy Dalal [mailto:binoydala...@gmail.com]
> Sent: giovedì 4 febbraio 2016 15:07
> To: solr-user@lucene.apache.org
> Subject: Re: Tutorial or Code Samples to explain how to Write Solr Plugins
>
> I used those links to learn to write my first plugin as well.
> I might have that code still lying around somewhere. Let me take a look
> and get back.
>
> On Thu, 4 Feb 2016, 19:32 Gian Maria Ricci - aka Alkampfer <
> alkamp...@nablasoft.com> wrote:
>
> > I've already found these two presentation, sadly enough link for
> > source code is broken, it seems that the domain www.searchbox.com is
> > completely down :|
> >
> > --
> > Gian Maria Ricci
> > Cell: +39 320 0136949
> >
> >
> > -Original Message-
> > From: Binoy Dalal [mailto:binoydala...@gmail.com]
> > Sent: mercoledì 3 febbraio 2016 17:46
> > To: solr-user@lucene.apache.org
> > Subject: Re: Tutorial or Code Samples to explain how to Write Solr
> > Plugins
> >
> > Here's a couple of links you can follow to get started:
> >
> > https://www.slideshare.net/mobile/searchbox-com/tutorial-on-developin-
> > a-solr-search-component-plugin
> >
> > https://www.slideshare.net/mobile/searchbox-com/develop-a-solr-request
> > -handler-plugin These are to write a search component and a request
> > handler respectively.
> > They are on older solr versions but they should work with 5.x as well.
> > I used these to get started when I was trying to write my first plugin.
> > Once you get a hang of how it's to be done it's really not that
> difficult.
> >
> > On Wed, 3 Feb 2016, 21:59 Gian Maria Ricci - aka Alkampfer <
> > alkamp...@nablasoft.com> wrote:
> >
> > > Hi,
> > >
> > >
> > >
> > > I wonder if there is some code samples or tutorial (updated to work
> > > with version 5) to help users writing plugins.
> > >
> > >
> > >
> > > I’ve found lots of difficulties on the past to find such kind of
> > > information when I needed to write some plugins, and I wonder if I
> > > missed some site or link that does a better explanation than
> > > official page http://wiki.apache.org/solr/SolrPlugins that is really
> old.
> > >
> > >
> > >
> > > Thanks in advance.
> > >
> > >
> > >
> > > --
> > > Gian Maria Ricci
> > > Cell: +39 320 0136949
> > >
> > >
> > >
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
> --
> Regards,
> Binoy Dalal
>
-- 
Regards,
Binoy Dalal


Bulk delete of Solr documents

2016-02-08 Thread Anil
Hi ,

Can we delete Solr documents from a collection in bulk?

Regards,
Anil


Re: Bulk delete of Solr documents

2016-02-08 Thread Yago Riveiro
Yes.

You can delete using a query:

http://blog.dileno.com/archive/201106/delete-documents-from-solr-index-by-query/

--
Yago Riveiro
> On Feb 8 2016, at 4:35 pm, Anil anilk...@gmail.com wrote:
>
> Hi ,
>
> Can we delete solr documents from a collection in a bulk ?
>
> Regards,
> Anil



Re: Bulk delete of Solr documents

2016-02-08 Thread Susheel Kumar
Yes, use the URL below (substitute your collection name):

http://localhost:8983/solr/<collection>/update?stream.body=<delete><query>*:*</query></delete>&commit=true
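
For reference, a minimal SolrJ sketch of the same operation (the collection
URL and the query are placeholders):

    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
    client.deleteByQuery("*:*");   // any query works, e.g. "type:obsolete"
    client.commit();               // make the deletions visible
    client.close();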

On Mon, Feb 8, 2016 at 11:33 AM, Anil  wrote:

> Hi ,
>
> Can we delete solr documents from a collection in a bulk ?
>
> Regards,
> Anil
>


Re: Solr architecture

2016-02-08 Thread Susheel Kumar
Also consider whether you are expecting indexing of the 2 billion docs as NRT
or offline (during off hours etc.). For more accurate sizing you may also
want to index, say, 10 million documents, which will give you an idea of
your index size; you can then extrapolate from that to come up with memory
requirements.
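
As a purely illustrative extrapolation (all numbers below are made up):

    10M docs indexed -> 15 GB index (measured)
    2B docs          -> 15 GB x 200 = ~3 TB of index per replica (extrapolated)

which in turn drives the shard count and how much RAM you want available for
OS page caching.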

Thanks,
Susheel

On Mon, Feb 8, 2016 at 11:00 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Mark,
> Can you give us bit more details: size of docs, query types, are docs
> grouped somehow, are they time sensitive, will they update or it is rebuild
> every time, etc.
>
> Thanks,
> Emir
>
>
> On 08.02.2016 16:56, Mark Robinson wrote:
>
>> Hi,
>> We have a requirement where we would need to index around 2 Billion docs
>> in
>> a day.
>> The queries against this indexed data set can be around 80K queries per
>> second during peak time and during non peak hours around 12K queries per
>> second.
>>
>> Can Solr realize this huge volumes.
>>
>> If so, assuming we have no constraints for budget what would be a
>> recommended Solr set up (number of shards, number of Solr instances
>> etc...)
>>
>> Thanks!
>> Mark
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Re: Solr architecture

2016-02-08 Thread Erick Erickson
Short form: You really have to prototype. Here's the long form:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

I've seen between 20M and 200M docs fit on a single piece of hardware,
so you'll absolutely have to shard.

And the other thing you haven't told us is whether you plan on
_adding_ 2B docs a day or whether that number is the total corpus size
and you are re-indexing the 2B docs/day. IOW, if you are adding 2B
docs/day, 30 days later do you have 2B docs or 60B docs in your
corpus?

Best,
Erick

On Mon, Feb 8, 2016 at 8:09 AM, Susheel Kumar  wrote:
> Also if you are expecting indexing of 2 billion docs as NRT or if it will
> be offline (during off hours etc).  For more accurate sizing you may also
> want to index say 10 million documents which may give you idea how much is
> your index size and then use that for extrapolation to come up with memory
> requirements.
>
> Thanks,
> Susheel
>
> On Mon, Feb 8, 2016 at 11:00 AM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Mark,
>> Can you give us bit more details: size of docs, query types, are docs
>> grouped somehow, are they time sensitive, will they update or it is rebuild
>> every time, etc.
>>
>> Thanks,
>> Emir
>>
>>
>> On 08.02.2016 16:56, Mark Robinson wrote:
>>
>>> Hi,
>>> We have a requirement where we would need to index around 2 Billion docs
>>> in
>>> a day.
>>> The queries against this indexed data set can be around 80K queries per
>>> second during peak time and during non peak hours around 12K queries per
>>> second.
>>>
>>> Can Solr realize this huge volumes.
>>>
>>> If so, assuming we have no constraints for budget what would be a
>>> recommended Solr set up (number of shards, number of Solr instances
>>> etc...)
>>>
>>> Thanks!
>>> Mark
>>>
>>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>


Re: solr performance issue

2016-02-08 Thread Susheel Kumar
1 million documents shouldn't pose any issues at all.  Something else is
wrong with your hw/system configuration.

Thanks,
Susheel

On Mon, Feb 8, 2016 at 6:45 AM, sara hajili  wrote:

> On Mon, Feb 8, 2016 at 3:04 AM, sara hajili  wrote:
>
> > sorry i made a mistake i have a bout 1000 K doc.
> > i mean about 100 doc.
> >
> > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Sara,
> >> Not sure if I am reading this right, but I read it as you have 1000 doc
> >> index and issues? Can you tell us bit more about your setup: number of
> >> servers, hw, index size, number of shards, queries that you run, do you
> >> index at the same time...
> >>
> >> It seems to me that you are running Solr on server with limited RAM and
> >> probably small heap. Swapping for sure will slow things down and GC is
> most
> >> likely reason for high CPU.
> >>
> >> You can use http://sematext.com/spm to collect Solr and host metrics
> and
> >> see where the issue is.
> >>
> >> Thanks,
> >> Emir
> >>
> >> --
> >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >> Solr & Elasticsearch Support * http://sematext.com/
> >>
> >>
> >>
> >> On 08.02.2016 10:27, sara hajili wrote:
> >>
> >>> hi all.
> >>> i have a problem with my solr performance and usage hardware like a
> >>> ram,cup...
> >>> i have a lot of document and so indexed file about 1000 doc in solr
> that
> >>> every doc has about 8 field in average.
> >>> and each field has about 60 char.
> >>> i set my field as a storedfield = "false" except of  1 field. // i read
> >>> that this help performance.
> >>> i used copy field and dynamic field if it was necessary . // i read
> that
> >>> this help performance.
> >>> and now my question is that when i run a lot of query on solr i faced
> >>> with
> >>> a problem solr use more cpu and ram and after that filled ,it use a lot
> >>>   swapped storage and then use hard,but doesn't create a system file!
> >>> solr
> >>> fill hard until i forced to restart server to release hard disk.
> >>> and now my question is why solr treat in this way? and how i can avoid
> >>> solr
> >>> to use huge cpu space?
> >>> any config need?!
> >>>
> >>>
> >>
> >
>


Solr architecture

2016-02-08 Thread Mark Robinson
Hi,
We have a requirement where we would need to index around 2 Billion docs in
a day.
The queries against this indexed data set can be around 80K queries per
second during peak time and during non peak hours around 12K queries per
second.

Can Solr handle such huge volumes?

If so, assuming we have no budget constraints, what would be a
recommended Solr setup (number of shards, number of Solr instances, etc.)?

Thanks!
Mark


solr performance issue

2016-02-08 Thread sara hajili
hi all.
i have a problem with my solr performance and hardware usage (ram, cpu...).
i have a lot of documents indexed, about 1000 docs in solr, where every doc
has about 8 fields on average and each field has about 60 chars.
i set my fields as stored="false" except for 1 field. // i read that this
helps performance.
i used copyField and dynamicField only where necessary. // i read that this
helps performance.
now my question: when i run a lot of queries, solr uses more and more cpu
and ram, and once ram is filled it uses a lot of swap space and then fills
the disk, but doesn't create a system file! solr fills the disk until i am
forced to restart the server to release it.
so why does solr behave this way, and how can i keep solr from using this
much cpu?
any config needed?!


Re: solr performance issue

2016-02-08 Thread Emir Arnautovic

Hi Sara,
It is still considered to be a small index. Can you give us a bit more detail
about your setup?


Thanks,
Emir

On 08.02.2016 12:04, sara hajili wrote:

sorry i made a mistake i have a bout 1000 K doc.
i mean about 100 doc.

On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Sara,
Not sure if I am reading this right, but I read it as you have 1000 doc
index and issues? Can you tell us bit more about your setup: number of
servers, hw, index size, number of shards, queries that you run, do you
index at the same time...

It seems to me that you are running Solr on server with limited RAM and
probably small heap. Swapping for sure will slow things down and GC is most
likely reason for high CPU.

You can use http://sematext.com/spm to collect Solr and host metrics and
see where the issue is.

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 08.02.2016 10:27, sara hajili wrote:


hi all.
i have a problem with my solr performance and usage hardware like a
ram,cup...
i have a lot of document and so indexed file about 1000 doc in solr that
every doc has about 8 field in average.
and each field has about 60 char.
i set my field as a storedfield = "false" except of  1 field. // i read
that this help performance.
i used copy field and dynamic field if it was necessary . // i read that
this help performance.
and now my question is that when i run a lot of query on solr i faced with
a problem solr use more cpu and ram and after that filled ,it use a lot
   swapped storage and then use hard,but doesn't create a system file! solr
fill hard until i forced to restart server to release hard disk.
and now my question is why solr treat in this way? and how i can avoid
solr
to use huge cpu space?
any config need?!




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: solr performance issue

2016-02-08 Thread Emir Arnautovic

Hi Sara,
Not sure if I am reading this right, but I read it as you have 1000 doc 
index and issues? Can you tell us bit more about your setup: number of 
servers, hw, index size, number of shards, queries that you run, do you 
index at the same time...


It seems to me that you are running Solr on a server with limited RAM and
probably a small heap. Swapping will for sure slow things down, and GC is
the most likely reason for the high CPU.
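
For reference, a hedged example of giving Solr a bigger heap at startup (the
size is a placeholder; leave enough free RAM for the OS page cache):

    bin/solr start -m 4g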


You can use http://sematext.com/spm to collect Solr and host metrics and 
see where the issue is.


Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 08.02.2016 10:27, sara hajili wrote:

hi all.
i have a problem with my solr performance and usage hardware like a
ram,cup...
i have a lot of document and so indexed file about 1000 doc in solr that
every doc has about 8 field in average.
and each field has about 60 char.
i set my field as a storedfield = "false" except of  1 field. // i read
that this help performance.
i used copy field and dynamic field if it was necessary . // i read that
this help performance.
and now my question is that when i run a lot of query on solr i faced with
a problem solr use more cpu and ram and after that filled ,it use a lot
  swapped storage and then use hard,but doesn't create a system file! solr
fill hard until i forced to restart server to release hard disk.
and now my question is why solr treat in this way? and how i can avoid solr
to use huge cpu space?
any config need?!





Re: solr performance issue

2016-02-08 Thread sara hajili
sorry i made a mistake, i have about 1000 K doc.
i mean about 100 doc.

On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Sara,
> Not sure if I am reading this right, but I read it as you have 1000 doc
> index and issues? Can you tell us bit more about your setup: number of
> servers, hw, index size, number of shards, queries that you run, do you
> index at the same time...
>
> It seems to me that you are running Solr on server with limited RAM and
> probably small heap. Swapping for sure will slow things down and GC is most
> likely reason for high CPU.
>
> You can use http://sematext.com/spm to collect Solr and host metrics and
> see where the issue is.
>
> Thanks,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 08.02.2016 10:27, sara hajili wrote:
>
>> hi all.
>> i have a problem with my solr performance and usage hardware like a
>> ram,cup...
>> i have a lot of document and so indexed file about 1000 doc in solr that
>> every doc has about 8 field in average.
>> and each field has about 60 char.
>> i set my field as a storedfield = "false" except of  1 field. // i read
>> that this help performance.
>> i used copy field and dynamic field if it was necessary . // i read that
>> this help performance.
>> and now my question is that when i run a lot of query on solr i faced with
>> a problem solr use more cpu and ram and after that filled ,it use a lot
>>   swapped storage and then use hard,but doesn't create a system file! solr
>> fill hard until i forced to restart server to release hard disk.
>> and now my question is why solr treat in this way? and how i can avoid
>> solr
>> to use huge cpu space?
>> any config need?!
>>
>>
>


Re: solr performance issue

2016-02-08 Thread sara hajili
On Mon, Feb 8, 2016 at 3:04 AM, sara hajili  wrote:

> sorry i made a mistake i have a bout 1000 K doc.
> i mean about 100 doc.
>
> On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Sara,
>> Not sure if I am reading this right, but I read it as you have 1000 doc
>> index and issues? Can you tell us bit more about your setup: number of
>> servers, hw, index size, number of shards, queries that you run, do you
>> index at the same time...
>>
>> It seems to me that you are running Solr on server with limited RAM and
>> probably small heap. Swapping for sure will slow things down and GC is most
>> likely reason for high CPU.
>>
>> You can use http://sematext.com/spm to collect Solr and host metrics and
>> see where the issue is.
>>
>> Thanks,
>> Emir
>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>>
>> On 08.02.2016 10:27, sara hajili wrote:
>>
>>> hi all.
>>> i have a problem with my solr performance and usage hardware like a
>>> ram,cup...
>>> i have a lot of document and so indexed file about 1000 doc in solr that
>>> every doc has about 8 field in average.
>>> and each field has about 60 char.
>>> i set my field as a storedfield = "false" except of  1 field. // i read
>>> that this help performance.
>>> i used copy field and dynamic field if it was necessary . // i read that
>>> this help performance.
>>> and now my question is that when i run a lot of query on solr i faced
>>> with
>>> a problem solr use more cpu and ram and after that filled ,it use a lot
>>>   swapped storage and then use hard,but doesn't create a system file!
>>> solr
>>> fill hard until i forced to restart server to release hard disk.
>>> and now my question is why solr treat in this way? and how i can avoid
>>> solr
>>> to use huge cpu space?
>>> any config need?!
>>>
>>>
>>
>


RE: Multi-lingual search

2016-02-08 Thread vidya
Hi,
I need to search in the following languages, and this includes proximity search:
1. Malay
2. Tamil
3. Bahasa Indonesia
4. Vietnamese
5. Cantonese
Will IndicNormalizationFilter work, or is another filter needed? Please help
if you have already worked on this or have any ideas.


Thanks in advance






Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7

2016-02-08 Thread Shahzad Masud
Thank you Shawn for your response. I will shortly be running some performance
tests on this structure (one JVM with multiple cores), and will
share feedback on this thread.

>There IS a way to specify the solr home for a specific context, but keep
>in mind that I definitely DO NOT recommend doing this.  There is
>resource and administrative overhead to running multiple copies of Solr
>in one JVM.  Simply run one context and let it handle multiple shards,
>whether you choose SolrCloud or not.
Due to the distributed search feature, I might not be able to run SolrCloud. I
would appreciate it if you could share that way of setting solr home for a
specific context in Jetty; it is good to have more information for
comparison purposes. Do you think having multiple JVMs would increase or
decrease performance? My document base is around 20 million rows (in 24
shards), with document sizes ranging from 100 KB to 400 MB.

SM

On Mon, Feb 8, 2016 at 8:09 PM, Shawn Heisey  wrote:

> On 2/8/2016 1:14 AM, Shahzad Masud wrote:
> > Thank you Shawn for your reply. Here is my structure of cores and shards
> >
> > Shard 1 = localhost:8983/solr_2014 [3 Core  - Employee, Service Tickets,
> > Departments]
> > Shard 2 = localhost:8983/solr_2015 [3 Core  - Employee, Service Tickets,
> > Departments]
> > Shard 3 = localhost:8983/solr_2016 [3 Core  - Employee, Service Tickets,
> > Departments]
> >
> > While searching, I use distributed search feature to search data from all
> > three shards in respective cores e.g. If I want to search from Employee
> > data for all three years, I search from Employee core of three contexts.
> > This is legacy design, do you think this is okay, or this require
> immediate
> > restructure / design? I am going to try this,
> >
> > Context = localhost:8982/solr (9 cores - Employee-2014, Employee-2015,
> > Employee-2016, ServiceTickets-2014, ServiceTickets-2015,
> > ServiceTickets-2016, Department-2014, Department-2015, Department-2016]
> > distributed search would be from all three cores of same data category
> > (i.e. For Employee search, it would be from Employee-2014, Employee-2015,
> > Employee-2016).
>
> With SolrCloud, you can have multiple collections for each of these
> types and alias them together.  Or you can simply have one collection
> for employee, one for servicetickets, and one for department, with
> SolrCloud automatically handling splitting those documents into the
> number of shardsthat you specify when you create the collection.  You
> can also do manual sharding and split each collection on a time basis
> like you have been doing, but then you lose some of the automation that
> SolrCloud provides, so I do not recommend handling it that way.
>
> > Regarding one Solr context per jetty; I cannot run two solr contexts
> > pointing to different data in Jetty, as while starting jetty I have to
> > provide -Dsolr.solr.home variable - which ends up pointing to one data
> > folder (2014 data) only.
>
> You do not need multiple contexts to have multiple indexes.
>
> My dev Solr server has exactly one Solr JVM, with exactly one context --
> /solr.  That instance of Solr has 45 indexes (cores) on it.  These 45
> cores are various shards for three larger indexes.  I am not running
> SolrCloud, but I certainly could.
>
> You can see 25 of the 45 cores in my Solr instance in this screenshot of
> the admin UI for this server:
>
> https://www.dropbox.com/s/v87mxvkdejvd92h/solr-with-45-cores.png?dl=0
>
> There IS a way to specify the solr home for a specific context, but keep
> in mind that I definitely DO NOT recommend doing this.  There is
> resource and administrative overhead to running multiple copies of Solr
> in one JVM.  Simply run one context and let it handle multiple shards,
> whether you choose SolrCloud or not.
>
> Thanks,
> Shawn
>
>


RE: Multi-lingual search

2016-02-08 Thread vidya
Hi,
Can I implement proximity search if I use:
1. a separate core per language
2. a field per language
3. a multilingual field that supports all languages

And what does proximity search exactly mean?

Should searching for the word "walk" when "walking" is indexed fetch and
display the record? That would be covered by a stemming filter, right?
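
For reference: proximity search in Lucene/Solr query syntax means a phrase
with slop, i.e. terms occurring within N positions of each other. A sketch
(the field name is a placeholder):

    q=text:"walk store"~3    (matches docs where "walk" and "store" occur
                              within 3 positions of each other)

Matching "walk" against an indexed "walking" is a separate concern: that is
stemming, handled at analysis time by filters such as PorterStemFilterFactory,
not by proximity.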

Thanks in advance





Re: Tesseract command-line OCR engine has stopped working

2016-02-08 Thread Zheng Lin Edwin Yeo
Has anyone experienced this before during indexing of EML files?

Regards,
Edwin

On 5 February 2016 at 17:30, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> I am indexing EML files (emails) into Solr, and some of those emails has
> attachment.
>
> During the indexing, I encountered this "*Tesseract command-line OCR
> engine has stopped working*" message that come out from the server.
> However, I did not see any error with the indexing, and all the EML files
> are indexed successfully.
>
> Does anyone knows what could be the reason? I am using Solr 5.4.0
>
> Regards,
> Edwin
>


Sequential Documents Ids

2016-02-08 Thread Shai Rubin
Hi,


Recently I've read Michael McCandless' article
(http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html)
and made some changes to the id-assigning mechanism in my project.

I'm a newbie to Solr / Lucene and I'm trying to figure out how these changes
affect indexing / searching performance.
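
For context, a minimal Java sketch of the contrast the article draws (not your
exact change): random UUIDs scatter across the id term space, while
roughly-sequential ids keep new ids clustered, which the article found cheaper
for indexing-time id lookups:

    import java.util.UUID;
    import java.util.concurrent.atomic.AtomicLong;

    AtomicLong counter = new AtomicLong();

    // Version-4 UUID: uniformly random, lands anywhere in the terms index.
    String randomId = UUID.randomUUID().toString();

    // Roughly sequential: a timestamp prefix keeps consecutive ids adjacent.
    String sequentialId = Long.toHexString(System.currentTimeMillis())
                        + "-" + counter.incrementAndGet();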

Please assist.

Thanks,
Shai



Re: Solr 4.10 with Jetty 8.1.10 & Tomcat 7

2016-02-08 Thread Shawn Heisey
On 2/8/2016 1:14 AM, Shahzad Masud wrote:
> Thank you Shawn for your reply. Here is my structure of cores and shards
>
> Shard 1 = localhost:8983/solr_2014 [3 Core  - Employee, Service Tickets,
> Departments]
> Shard 2 = localhost:8983/solr_2015 [3 Core  - Employee, Service Tickets,
> Departments]
> Shard 3 = localhost:8983/solr_2016 [3 Core  - Employee, Service Tickets,
> Departments]
>
> While searching, I use distributed search feature to search data from all
> three shards in respective cores e.g. If I want to search from Employee
> data for all three years, I search from Employee core of three contexts.
> This is legacy design, do you think this is okay, or this require immediate
> restructure / design? I am going to try this,
>
> Context = localhost:8982/solr (9 cores - Employee-2014, Employee-2015,
> Employee-2016, ServiceTickets-2014, ServiceTickets-2015,
> ServiceTickets-2016, Department-2014, Department-2015, Department-2016]
> distributed search would be from all three cores of same data category
> (i.e. For Employee search, it would be from Employee-2014, Employee-2015,
> Employee-2016).

With SolrCloud, you can have multiple collections for each of these
types and alias them together.  Or you can simply have one collection
for employee, one for servicetickets, and one for department, with
SolrCloud automatically handling splitting those documents into the
number of shards that you specify when you create the collection.  You
can also do manual sharding and split each collection on a time basis
like you have been doing, but then you lose some of the automation that
SolrCloud provides, so I do not recommend handling it that way.
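
For what it's worth, a sketch of that aliasing via the Collections API (host
and names are placeholders): one alias per data category spanning the yearly
collections, which clients then query as if it were a single collection:

    http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=employee&collections=Employee-2014,Employee-2015,Employee-2016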

> Regarding one Solr context per jetty; I cannot run two solr contexts
> pointing to different data in Jetty, as while starting jetty I have to
> provide -Dsolr.solr.home variable - which ends up pointing to one data
> folder (2014 data) only.

You do not need multiple contexts to have multiple indexes.

My dev Solr server has exactly one Solr JVM, with exactly one context --
/solr.  That instance of Solr has 45 indexes (cores) on it.  These 45
cores are various shards for three larger indexes.  I am not running
SolrCloud, but I certainly could.

You can see 25 of the 45 cores in my Solr instance in this screenshot of
the admin UI for this server:

https://www.dropbox.com/s/v87mxvkdejvd92h/solr-with-45-cores.png?dl=0

There IS a way to specify the solr home for a specific context, but keep
in mind that I definitely DO NOT recommend doing this.  There is
resource and administrative overhead to running multiple copies of Solr
in one JVM.  Simply run one context and let it handle multiple shards,
whether you choose SolrCloud or not.

Thanks,
Shawn



Data Import Handler - autoSoftCommit and autoCommit

2016-02-08 Thread Troy Edwards
We are running the data import handler to retrieve about 10 million records
during work hours every day of the week. We are using Clean = true, Commit
= true and Optimize = true. The entire process takes about 1 hour.

What would be a good setting for autoCommit and autoSoftCommit?

Thanks


Re: Solr architecture

2016-02-08 Thread Jack Krupansky
So is there any aging or TTL (in database terminology) of older docs?

And do all of your queries need to query all of the older documents all of
the time or is there a clear hierarchy of querying for aged documents, like
past 24-hours vs. past week vs. past year vs. older than a year? Sure, you
can always use a function query to boost by the inverse of document age,
but Solr would be more efficient with filter queries or separate indexes
for different time scales.

Are documents ever updated or are they write-once?

Are documents explicitly deleted?

Technically you probably could meet those specs, but... how many
organizations have the resources and the energy to do so?

As a back of the envelope calculation, if Solr gave you 100 queries per
second per node, that would mean you would need 1,200 nodes. It would also
depend on whether those queries are very narrow so that a single node can
execute them or if they require fanout to other shards and then aggregation
of results from those other shards.

-- Jack Krupansky

On Mon, Feb 8, 2016 at 11:24 AM, Erick Erickson 
wrote:

> Short form: You really have to prototype. Here's the long form:
>
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> I've seen between 20M and 200M docs fit on a single piece of hardware,
> so you'll absolutely have to shard.
>
> And the other thing you haven't told us is whether you plan on
> _adding_ 2B docs a day or whether that number is the total corpus size
> and you are re-indexing the 2B docs/day. IOW, if you are adding 2B
> docs/day, 30 days later do you have 2B docs or 60B docs in your
> corpus?
>
> Best,
> Erick
>
> On Mon, Feb 8, 2016 at 8:09 AM, Susheel Kumar 
> wrote:
> > Also if you are expecting indexing of 2 billion docs as NRT or if it will
> > be offline (during off hours etc).  For more accurate sizing you may also
> > want to index say 10 million documents which may give you idea how much
> is
> > your index size and then use that for extrapolation to come up with
> memory
> > requirements.
> >
> > Thanks,
> > Susheel
> >
> > On Mon, Feb 8, 2016 at 11:00 AM, Emir Arnautovic <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Mark,
> >> Can you give us bit more details: size of docs, query types, are docs
> >> grouped somehow, are they time sensitive, will they update or it is
> rebuild
> >> every time, etc.
> >>
> >> Thanks,
> >> Emir
> >>
> >>
> >> On 08.02.2016 16:56, Mark Robinson wrote:
> >>
> >>> Hi,
> >>> We have a requirement where we would need to index around 2 Billion
> docs
> >>> in
> >>> a day.
> >>> The queries against this indexed data set can be around 80K queries per
> >>> second during peak time and during non peak hours around 12K queries
> per
> >>> second.
> >>>
> >>> Can Solr realize this huge volumes.
> >>>
> >>> If so, assuming we have no constraints for budget what would be a
> >>> recommended Solr set up (number of shards, number of Solr instances
> >>> etc...)
> >>>
> >>> Thanks!
> >>> Mark
> >>>
> >>>
> >> --
> >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >> Solr & Elasticsearch Support * http://sematext.com/
> >>
> >>
>


Re: Request for SOLR-wiki edit permissions

2016-02-08 Thread Anshum Gupta
Done.

On Mon, Feb 8, 2016 at 9:55 AM, Jason Gerlowski 
wrote:

> Hi all,
>
> Can someone please give me edit permissions for the Solr wiki.  Is
> there anything I should or need to do to get these permissions?  My
> wiki username is "Jason.Gerlowski", and my wiki email is
> "gerlowsk...@gmail.com".
>
> I spotted a few things that could use some clarification on the
> HowToContribute page (https://wiki.apache.org/solr/HowToContribute)
> and wanted to make them a bit clearer.
>
> Jason
>



-- 
Anshum Gupta


Re: Solr architecture

2016-02-08 Thread Jack Krupansky
Oops... at 100 qps for a single node you would need 120 nodes to get to 12K
qps and 800 nodes to get 80K qps, but that is just an extremely rough
ballpark estimate, not some precise and firm number. And that's if all the
queries can be evenly distributed throughout the cluster and don't require
fanout to other shards, which effectively turns each incoming query into n
queries where n is the number of shards.

-- Jack Krupansky

On Mon, Feb 8, 2016 at 12:07 PM, Jack Krupansky 
wrote:

> So is there any aging or TTL (in database terminology) of older docs?
>
> And do all of your queries need to query all of the older documents all of
> the time or is there a clear hierarchy of querying for aged documents, like
> past 24-hours vs. past week vs. past year vs. older than a year? Sure, you
> can always use a function query to boost by the inverse of document age,
> but Solr would be more efficient with filter queries or separate indexes
> for different time scales.
>
> Are documents ever updated or are they write-once?
>
> Are documents explicitly deleted?
>
> Technically you probably could meet those specs, but... how many
> organizations have the resources and the energy to do so?
>
> As a back of the envelope calculation, if Solr gave you 100 queries per
> second per node, that would mean you would need 1,200 nodes. It would also
> depend on whether those queries are very narrow so that a single node can
> execute them or if they require fanout to other shards and then aggregation
> of results from those other shards.
>
> -- Jack Krupansky
>
> On Mon, Feb 8, 2016 at 11:24 AM, Erick Erickson 
> wrote:
>
>> Short form: You really have to prototype. Here's the long form:
>>
>>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> I've seen between 20M and 200M docs fit on a single piece of hardware,
>> so you'll absolutely have to shard.
>>
>> And the other thing you haven't told us is whether you plan on
>> _adding_ 2B docs a day or whether that number is the total corpus size
>> and you are re-indexing the 2B docs/day. IOW, if you are adding 2B
>> docs/day, 30 days later do you have 2B docs or 60B docs in your
>> corpus?
>>
>> Best,
>> Erick
>>
>> On Mon, Feb 8, 2016 at 8:09 AM, Susheel Kumar 
>> wrote:
>> > Also if you are expecting indexing of 2 billion docs as NRT or if it
>> will
>> > be offline (during off hours etc).  For more accurate sizing you may
>> also
>> > want to index say 10 million documents which may give you idea how much
>> is
>> > your index size and then use that for extrapolation to come up with
>> memory
>> > requirements.
>> >
>> > Thanks,
>> > Susheel
>> >
>> > On Mon, Feb 8, 2016 at 11:00 AM, Emir Arnautovic <
>> > emir.arnauto...@sematext.com> wrote:
>> >
>> >> Hi Mark,
>> >> Can you give us bit more details: size of docs, query types, are docs
>> >> grouped somehow, are they time sensitive, will they update or it is
>> rebuild
>> >> every time, etc.
>> >>
>> >> Thanks,
>> >> Emir
>> >>
>> >>
>> >> On 08.02.2016 16:56, Mark Robinson wrote:
>> >>
>> >>> Hi,
>> >>> We have a requirement where we would need to index around 2 Billion
>> docs
>> >>> in
>> >>> a day.
>> >>> The queries against this indexed data set can be around 80K queries
>> per
>> >>> second during peak time and during non peak hours around 12K queries
>> per
>> >>> second.
>> >>>
>> >>> Can Solr realize this huge volumes.
>> >>>
>> >>> If so, assuming we have no constraints for budget what would be a
>> >>> recommended Solr set up (number of shards, number of Solr instances
>> >>> etc...)
>> >>>
>> >>> Thanks!
>> >>> Mark
>> >>>
>> >>>
>> >> --
>> >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> >> Solr & Elasticsearch Support * http://sematext.com/
>> >>
>> >>
>>
>
>


online scoring explanation

2016-02-08 Thread Doug Turnbull
Splainer maybe ;) http://splainer.io

Hope it's useful to you. Let us know if you have suggestions/ideas/bugs

 http://github.com/o19s/splainer

On Monday, February 8, 2016, John Blythe wrote:

> hi all,
>
> last year i had gotten a site recommended to me on this forum. it helped
> you break down the results/score you were getting from your queries. it
> isn't explain.solr.pl, but another one that seemed a bit more robust if my
> memory serves me correctly. i want to say a member of the thread not only
> suggested it to me but built it.
>
> anyone know of one such site?
>
> thanks-
>


-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
, LLC | 240.476.9983
Author: Relevant Search 
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


SolrCloud behavior when a ZooKeeper node goes down

2016-02-08 Thread Kelly, Frank
We are running a small SolrCloud instance on AWS

Solr : Version 5.3.1
ZooKeeper: Version 3.4.6

3 x ZooKeeper nodes (with higher limits and timeouts due to being on AWS)
3 x Solr Nodes (8 GB of memory each - 2 collections with 3 shards for each 
collection)

Let's call the ZooKeeper nodes A, B and C.
One of our ZooKeeper nodes (B) failed a health check and was replaced due to
autoscaling, but during this failover window our SolrCloud cluster became
unavailable. All new connections to Solr were unable to connect, complaining
about connectivity issues, and preexisting connections also had errors.

These errors happened for both queries and adds:

org.apache.solr.common.SolrException: Could not load collection from 
ZK:qa_us-east-1_here_account

at 
org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)

at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)

at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)

at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)

at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)

at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)

at 
com.here.scbe.search.solr.SolrFacadeImpl.querySearchIndex(SolrFacadeImpl.java:183)

at 
com.ovi.scbe.search.search.impl.SolrSearcher.searchInner(SolrSearcher.java:69)

at com.ovi.scbe.search.search.impl.SolrSearcher.search(SolrSearcher.java:56)

at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)

at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)

at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)



org.apache.solr.common.SolrException: Could not load collection from 
ZK:qa_us-east-1_public_index

at 
org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)

at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)

at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)

at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)

at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)

at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)

at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)

at 
com.here.scbe.search.solr.SolrFacadeImpl.addToSearchIndex(SolrFacadeImpl.java:108)

at com.ovi.scbe.search.index.impl.SolrIndexer.index(SolrIndexer.java:72)

at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)

at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)

at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)

I thought because we had configured SolrCloud to point at all three ZK nodes 
that the failure of one ZK node would be OK (since we still had a quorum).
 Did I misunderstand something about SolrCloud and its relationship with ZK?

The weird thing now is that when the new ZooKeeper node (D) started up - after 
a few minutes we could connect to SolrCloud again even though we were still 
only pointing to A,B and C (not D).
Any thoughts on why this also happened?

Best,

-Frank

Frank Kelly
Principal Software Engineer
Predictive Analytics Team (SCBE/HAC/CDA)






HERE
5 Wayside Rd, Burlington, MA 01803, USA
42° 29' 7" N 71° 11' 32" W








Solr 5.3 SSL

2016-02-08 Thread Jian Zhang
Hi, Solr guru

We have been working well with Solr 5.3 without SSL. Now we are enabling SSL
by following https://cwiki.apache.org/confluence/display/solr/Enabling+SSL


The cluster is up and can be accessed via https://:18983/solr/#

When we create a collection, it redirects the request to a non-SSL port and
errors out.

$  solr-5.3.1_8983/solr-5.3.1-SNAPSHOT/bin/solr create -c collection4 -n 
scconfig -shards 5 -replicationFactor 1

Connecting to ZooKeeper at ma1-solrt-lcb04:8925,ma1-solrt-lcb05:8925 ...
Re-using existing configuration directory scconfig

Creating new collection 'collection4' using command:
https://:18985/solr/admin/collections?action=CREATE&name=collection4&numShards=5&replicationFactor=1&maxShardsPerNode=1&collection.configName=scconfig


ERROR: Failed to create collection 'collection4' due to: 
org.apache.solr.client.solrj.SolrServerException:IOException occured when 
talking to server at: http://:18983/solr


Could you please shed some light?

Thanks a lot 
-Jian 
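
For reference, the Enabling SSL guide's SolrCloud section sets the urlScheme
cluster property in ZooKeeper so that nodes register themselves as https
rather than http, which matches this symptom; a sketch of that documented
zkcli step (the zkhost is a placeholder):

    server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd clusterprop -name urlScheme -val https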

Re: SolrCloud behavior when a ZooKeeper node goes down

2016-02-08 Thread Erick Erickson
My first guess would be to check that all of the ZK nodes are configured
with each other's addresses.

Or perhaps AWS is messing with your machine addresses.


On Mon, Feb 8, 2016 at 12:09 PM, Kelly, Frank  wrote:

> We are running a small SolrCloud instance on AWS
>
> Solr : Version 5.3.1
> ZooKeeper: Version 3.4.6
>
> 3 x ZooKeeper nodes (with higher limits and timeouts due to being on AWS)
> 3 x Solr Nodes (8 GB of memory each – 2 collections with 3 shards for each
> collection)
>
> Let’s call the ZooKeeper nodes A, B and C.
> One of our ZooKeeper nodes (B) failed a health check and was replaced due
> to autoscaling , but during this time of failover
> our SolrCloud cluster became unavailable. All new connections to Solr were
> unable to connect complaining about connectivity issues
> and preexisting connections also had errors
>
> These errors happened for both querys and adds
>
> org.apache.solr.common.SolrException: Could not load collection from
> ZK:qa_us-east-1_here_account
>
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
>
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
>
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
>
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
>
> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
>
> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)
>
> at
> com.here.scbe.search.solr.SolrFacadeImpl.querySearchIndex(SolrFacadeImpl.java:183)
>
> at
> com.ovi.scbe.search.search.impl.SolrSearcher.searchInner(SolrSearcher.java:69)
>
> at
> com.ovi.scbe.search.search.impl.SolrSearcher.search(SolrSearcher.java:56)
>
> at
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
>
> at
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
>
> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
>
>
> org.apache.solr.common.SolrException: Could not load collection from
> ZK:qa_us-east-1_public_index
>
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
>
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
>
> at
> org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
>
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
>
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
>
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
>
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
>
> at
> com.here.scbe.search.solr.SolrFacadeImpl.addToSearchIndex(SolrFacadeImpl.java:108)
>
> at com.ovi.scbe.search.index.impl.SolrIndexer.index(SolrIndexer.java:72)
>
> at
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
>
> at
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
>
> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
>
> I thought because we had configured SolrCloud to point at all three ZK
> nodes that the failure of one ZK node would be OK (since we still had a
> quorum).
>  Did I misunderstand something about SolrCloud and its relationship with
> ZK?
>
> The weird thing now is that when the new ZooKeeper node (D) started up –
> after a few minutes we could connect to SolrCloud again even though we were
> still only pointing to A,B and C (not D).
> Any thoughts on why this also happened?
>
> Best,
>
> -Frank
>
> *Frank Kelly*
>
> Principal Software Engineer
>
> Predictive Analytics Team (SCBE/HAC/CDA)
>
>
> *HERE *
>
> 5 Wayside Rd, Burlington, MA 01803, USA
>
> *42° 29' 7" N 71° 11' 32” W*
>
>
>    
> 
>   
>
>


Request for SOLR-wiki edit permissions

2016-02-08 Thread Jason Gerlowski
Hi all,

Can someone please give me edit permissions for the Solr wiki.  Is
there anything I should or need to do to get these permissions?  My
wiki username is "Jason.Gerlowski", and my wiki email is
"gerlowsk...@gmail.com".

I spotted a few things that could use some clarification on the
HowToContribute page (https://wiki.apache.org/solr/HowToContribute)
and wanted to make them a bit clearer.

Jason


online scoring explanation

2016-02-08 Thread John Blythe
hi all,

last year i had gotten a site recommended to me on this forum. it helped
you break down the results/score you were getting from your queries. it
isn't explain.solr.pl, but another one that seemed a bit more robust if my
memory serves me correctly. i want to say a member of the thread not only
suggested it to me but built it.

anyone know of one such site?

thanks-


Re: online scoring explanation

2016-02-08 Thread Toke Eskildsen
John Blythe  wrote:
> last year i had gotten a site recommended to me on this forum. it helped
> you break down the results/score you were getting from your queries.

http://splainer.io/ perhaps?

- Toke Eskildsen


Re: Request for SOLR-wiki edit permissions

2016-02-08 Thread Jason Gerlowski
Thanks Anshum!

On Mon, Feb 8, 2016 at 1:01 PM, Anshum Gupta  wrote:
> Done.
>
> On Mon, Feb 8, 2016 at 9:55 AM, Jason Gerlowski 
> wrote:
>
>> Hi all,
>>
>> Can someone please give me edit permissions for the Solr wiki.  Is
>> there anything I should or need to do to get these permissions?  My
>> wiki username is "Jason.Gerlowski", and my wiki email is
>> "gerlowsk...@gmail.com".
>>
>> I spotted a few things that could use some clarification on the
>> HowToContribute page (https://wiki.apache.org/solr/HowToContribute)
>> and wanted to make them a bit clearer.
>>
>> Jason
>>
>
>
>
> --
> Anshum Gupta


Leader election issues after upgrade from 4.10.4 to 5.4.1

2016-02-08 Thread Mike Thomsen
We get this error on one of our nodes:

Caused by: org.apache.solr.common.SolrException: There is conflicting
information about the leader of shard: shard2 our state says:
http://server01:8983/solr/collection/ but zookeeper says:
http://server02:8983/collection/


Then I noticed this in the log:

] o.a.s.c.c.ZkStateReader Load collection config
from:/collections/collection
2016-02-09 00:09:56.763 INFO  (qtp1037197792-12) [   ]
o.a.s.c.c.ZkStateReader path=/collections/collection configName=collection
specified config exists in ZooKeeper

We have a clusterstate.json file left over from 4.X. I read this thread and
the first comment or two suggested that clusterstate.json is now broken up
and refactored into the collections' configuration:

http://grokbase.com/t/lucene/solr-user/152v8bab2z/solr-cloud-does-not-start-with-many-collections

So should we get rid of the clusterstate.json file or keep it? We have 4
Solr VMs in our devops environment. They have 2 CPUs and 4GB of RAM each.
There are about 7 collections shared between them, but all are negligible
(a few hundred KB each) except for one, which is about 22GB.

Thanks,

Mike