Re: Monitoring Solr for currently running queries

2020-12-29 Thread Markus Jelsma
Hello Ufuk,

You can log slow queries [1].
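For reference, that is the slowQueryThresholdMillis setting in the <query>
section of solrconfig.xml, e.g.:

  <slowQueryThresholdMillis>1000</slowQueryThresholdMillis>

Requests that take longer than the threshold are then logged at WARN level
(1000 here is just an example value).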

If you want to see currently running queries, you would have to extend
SearchHandler and build the custom logic yourself. Watch out for SolrCloud,
because the main query as well as the per-shard queries can pass through
that same SearchHandler. You can distinguish between them by reading the
isShard=true parameter.
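
A rough, untested sketch of that idea (the class and field names are made up;
you would register it in solrconfig.xml in place of the stock handler and
still need to expose the map somewhere, e.g. via another handler or JMX):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.solr.handler.component.SearchHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

public class TrackingSearchHandler extends SearchHandler {

  // requests currently being executed, keyed by an internal sequence id
  private static final Map<Long, String> RUNNING = new ConcurrentHashMap<>();
  private static final AtomicLong SEQ = new AtomicLong();

  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    // per-shard sub-requests of a distributed query carry isShard=true
    boolean isShard = req.getParams().getBool("isShard", false);
    long id = SEQ.incrementAndGet();
    RUNNING.put(id, (isShard ? "[shard] " : "[top] ") + req.getParams());
    try {
      super.handleRequestBody(req, rsp);
    } finally {
      RUNNING.remove(id);
    }
  }
}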

Regards,
Markus

[1] https://lucene.apache.org/solr/guide/6_6/configuring-logging.html

On Tue, 29 Dec 2020 at 16:49, ufuk yılmaz wrote:

> Hello All,
>
> Is there a way to see currently executing queries in a SolrCloud? Or a
> general strategy to detect a query using an absurd amount of resources?
>
> We are using Solr not only for simple querying, but for running complex
> streaming expressions, facets with large data, etc. Sometimes, randomly, CPU
> usage gets so high that it starts to respond very slowly to even simple
> queries, or doesn’t respond at all. I’m trying to determine if it’s a result
> of simple overloading of the system by many “normal” queries, or of someone
> sending Solr an unreasonably compute-heavy request.
>
> A few days ago when this occurred, I stopped every service that can send
> Solr a query. After that, for about an hour, nodes were reading from the
> disk at 1GB/s, which is the maximum of our disks. Then everything went back
> to normal as I started the other services.
>
> One (bad) idea I had is to build a proxy service which proxies every
> request to our SolrCloud and monitors currently running requests, but
> scaling this to the size of the SolrCloud may be reinventing the wheel.
>
> For now all I can detect is that Solr is struggling, but I have no idea
> what causes it or when.
>
> -Cheers and happy new year
>


Monitoring Solr for currently running queries

2020-12-29 Thread ufuk yılmaz
Hello All,

Is there a way to see currently executing queries in a SolrCloud? Or a general
strategy to detect a query using an absurd amount of resources?

We are using Solr not only for simple querying, but for running complex streaming
expressions, facets with large data, etc. Sometimes, randomly, CPU usage gets so
high that it starts to respond very slowly to even simple queries, or doesn’t
respond at all. I’m trying to determine if it’s a result of simple overloading
of the system by many “normal” queries, or of someone sending Solr an
unreasonably compute-heavy request.

A few days ago when this occurred, I stopped every service that can send Solr a
query. After that, for about an hour, nodes were reading from the disk at 1GB/s,
which is the maximum of our disks. Then everything went back to normal as I
started the other services.

One (bad) idea I had is to build a proxy service which proxies every request to
our SolrCloud and monitors currently running requests, but scaling this to the
size of the SolrCloud may be reinventing the wheel.

For now all I can detect is that Solr is struggling, but I have no idea what
causes it or when.

-Cheers and happy new year


Re: Search issue in the SOLR for few words

2020-11-03 Thread Erick Erickson
There is not nearly enough information here to begin
to help you.

At minimum we need:
1> your field definition
2> the text you index
3> the query you send

You might want to review: 
https://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

> On Nov 3, 2020, at 1:08 AM, Viresh Sasalawad 
>  wrote:
> 
> Hi Sir/Madam,
> 
> I am facing an issue with a few keyword searches (like gazing, one) in Solr.
> Can you please help me understand why these words are not listed in the Solr results?
> 
> Indexing is done properly.
> 
> 
> -- 
> Thanks and Regards
> Veeresh Sasalawad



Search issue in the SOLR for few words

2020-11-02 Thread Viresh Sasalawad
Hi Sir/Madam,

I am facing an issue with a few keyword searches (like gazing, one) in Solr.
Can you please help me understand why these words are not listed in the Solr results?

Indexing is done properly.


-- 
Thanks and Regards
Veeresh Sasalawad


Re: support need in solr for min and max

2020-01-08 Thread Mel Mason
Try looking at range JSON facets: 
https://lucene.apache.org/solr/guide/8_2/json-facet-api.html#range-facet. 
If you facet over the eventTimeStamp with a gap of 1 day, you should 
then be able to use a sub facet to return a min and max value 
(https://lucene.apache.org/solr/guide/8_2/json-facet-api.html#stat-facet-functions) 
for each day bucket.
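
Roughly something like this as the request body to the JSON query API
(untested sketch; the field names and date range come from your example, the
bucket and stat labels are made up, so adjust them to your schema):

{
  "query": "*:*",
  "limit": 0,
  "facet": {
    "per_day": {
      "type": "range",
      "field": "eventTimeStamp",
      "start": "2019-12-11T00:00:00Z",
      "end": "2019-12-13T00:00:00Z",
      "gap": "+1DAY",
      "facet": {
        "min_val": "min(field)",
        "max_val": "max(field)"
      }
    }
  }
}

Each day bucket then contains min_val and max_val, and the max - min
difference can be computed on the client side.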


On 08/01/2020 11:07, Mohamed Azharuddin wrote:

Hi team,

We are migrating from MySQL to Apache Solr since Solr is fast at
searching. Thank you. We have a scenario where we need to

1) find the difference (max - min)
2) grouped by date(timeStamp)

Given below is our MySQL table:
Untitled.png

And the MySQL query is,
SELECT Date(eventTimeStamp), MAX(field) - MIN(field) AS Energy FROM
PowerTable GROUP BY DATE(eventTimeStamp);

which results in,
Untitled2.png

So we have to calculate the difference per day, where the date column is in
datetime format. We are currently using result grouping:
group=true&group.query=eventTimeStamp:[2019-12-11T00:00:00Z TO
2019-12-11T23:59:59Z]&group.query=eventTimeStamp:[2019-12-12T00:00:00Z
TO 2019-12-12T23:59:59Z]

Using the Apache Solr statistics option, we are able to calculate max and
min for the whole result, but we need the max and min values on a per-day basis.

Untitled31.png

When we try to get the max and min values per day, we are able to
fetch either min or max using the following query:

&group.sort=event1 desc or &group.sort=event1 asc
Untitled6.png

But we need both min and max in a single query.

So kindly help us to go ahead.

--

Regards,
Azar@EJ



Re: support need in solr for min and max

2020-01-08 Thread Walter Underwood
I hope you do not plan to use Solr as a primary repository. Solr is NOT a 
database. If you use Solr as a database, you will lose data at some point.

The Solr feature set is very different from MySQL. There is no guarantee that a 
SQL query can be translated into a Solr query.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 8, 2020, at 3:07 AM, Mohamed Azharuddin  wrote:
> 
> Hi team,
> 
> We are migrating from MySQL to Apache Solr since Solr is fast at searching.
> Thank you. We have a scenario where we need to
>
> 1) find the difference (max - min)
> 2) grouped by date(timeStamp)
>
> Given below is our MySQL table:
>
>
> And the MySQL query is,
> SELECT Date(eventTimeStamp), MAX(field) - MIN(field) AS Energy FROM
> PowerTable GROUP BY DATE(eventTimeStamp);
>
> which results in,
>
>
> So we have to calculate the difference per day, where the date column is in
> datetime format. We are currently using result grouping:
> group=true&group.query=eventTimeStamp:[2019-12-11T00:00:00Z TO
> 2019-12-11T23:59:59Z]&group.query=eventTimeStamp:[2019-12-12T00:00:00Z TO
> 2019-12-12T23:59:59Z]
>
> Using the Apache Solr statistics option, we are able to calculate max and min for
> the whole result, but we need the max and min values on a per-day basis.
>
>
> When we try to get the max and min values per day, we are able to fetch
> either min or max using the following query:
> &group.sort=event1 desc or &group.sort=event1 asc
>
>
>
> But we need both min and max in a single query.
>
> So kindly help us to go ahead.
> 
> -- 
> Regards,
> Azar@EJ



support need in solr for min and max

2020-01-08 Thread Mohamed Azharuddin
Hi team,

We are migrating from MySQL to Apache Solr since Solr is fast at searching.
Thank you. We have a scenario where we need to

1) find the difference (max - min)
2) grouped by date(timeStamp)

Given below is our MySQL table:
[image: Untitled.png]

And the MySQL query is,
SELECT Date(eventTimeStamp), MAX(field) - MIN(field) AS Energy FROM
PowerTable GROUP BY DATE(eventTimeStamp);

which results in,
[image: Untitled2.png]

So we have to calculate the difference per day, where the date column is in
datetime format. We are currently using result grouping:
group=true&group.query=eventTimeStamp:[2019-12-11T00:00:00Z TO
2019-12-11T23:59:59Z]&group.query=eventTimeStamp:[2019-12-12T00:00:00Z TO
2019-12-12T23:59:59Z]

Using the Apache Solr statistics option, we are able to calculate max and min
for the whole result, but we need the max and min values on a per-day basis.
[image: Untitled31.png]

When we try to get the max and min values per day, we are able to fetch
either min or max using the following query:
&group.sort=event1 desc or &group.sort=event1 asc

[image: Untitled6.png]

But we need both min and max in a single query.

So kindly help us to go ahead.

-- 

Regards,
Azar@EJ


Re: Solr for Content Management

2018-06-10 Thread Shawn Heisey
On 6/7/2018 12:10 PM, Moenieb Davids wrote:
> Challenges:
> When performing full text searches without concurrently executing updates,
> solr seems to be doing well. Running updates also does okish given the
> nature of the transaction. However, when I run search and updates
> simultaneously, performance drops quite significantly. I have played with
> field properties, analyzers, tokenizers, shafting sizes etc.

I have absolutely no idea what a shafting size is.  If I google for it,
the only relevant thing that comes up is your message on this list.

Doing updates at the same time as queries will always have an impact on
query performance.  But if that impact is very significant, then it
sounds like the machine doesn't have enough memory to allow the OS to
effectively cache the index data.  When updates are made, all the data
that is written will end up in the disk cache, and if the cache is already
as big as it can get, it will push older data out of the cache.

Disks are very slow compared to memory, so if the index data required to
complete a query must be read from the disk, performance is adversely
affected.

A page discussing OS disk cache requirements:

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Thanks,
Shawn



Re: Solr for Content Management

2018-06-08 Thread Emir Arnautović
Hi,
It is also likely that your indexing is using resources and that there are not
enough resources left to process queries. Indexing can put stress on the heap, and
GCs might be slowing Solr down, resulting in the observed latency. Can you tell us a
bit more about the size of your index, server configs, heap size, indexing rate, how
you do indexing (batch size), and query rate? This might give us better ideas
to point you in the right direction.
Do you use anything to monitor your Solr/host? Does the monitoring tool suggest
that there is some bottleneck?

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 8 Jun 2018, at 09:06, Alexandre Rafalovitch  wrote:
> 
> And in solrconfig.xml, it is possible to configure the searches to warm the
> index up before the users see it.
> 
> Regards,
>Alex
> 
> On Thu, Jun 7, 2018, 21:27 David Hastings, 
> wrote:
> 
>> When you are sending updates you are adjusting the segments which take them
>> out of memory and the index becomes "cold" until it gets enough searches to
>> cache the various aspects of the index.
>> 
>> On Thu, Jun 7, 2018 at 2:10 PM, Moenieb Davids 
>> wrote:
>> 
>>> Hi All,
>>> 
>>> Background:
>>> I am currently testing a deployment of a content management framework
>> where
>>> I am trying to punt Solr as the tool of choice for ingestion and
>> searching.
>>> 
>>> Current status:
>>> I have deployed SolrCloud across multiple servers with multiple shards
>> and
>>> a replication factor of 2.
>>> In terms of collections, I have a person collection that contains details
>>> individuals including address and high level portfolio info.
>> Structurally,
>>> this collection contains great grandchildren.
>>> Then I have a few collections that deals with content. For now, content
>> is
>>> just emails and document with a max size of 2MB, with certain user
>>> exceptions that can go higher than 2MB.
>>> Content is indexed twice in terms of the actual content, firstly as
>>> binary/stream and then as readable text. Metadata is negligible
>>> 
>>> 
>>> Challenges:
>>> When performing full text searches without concurrently executing
>> updates,
>>> solr seems to be doing well. Running updates also does okish given the
>>> nature of the transaction. However, when I run search and updates
>>> simultaneously, performance drops quite significantly. I have played with
>>> field properties, analyzers, tokenizers, shafting sizes etc.
>>> Any advice?
>>> Would like to know if anyone has done something similar. Please excuse
>> the
>>> long winded message
>>> 
>>> 
>>> --
>>> Sent from Gmail Mobile
>>> 
>>> 
>>> 
>>> --
>>> Sent from Gmail Mobile
>>> 
>> 



Re: Solr for Content Management

2018-06-08 Thread Alexandre Rafalovitch
And in solrconfig.xml, it is possible to configure the searches to warm the
index up before the users see it.
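
For example, a newSearcher listener (sketch only; the query and sort values
are placeholders you would replace with your own common searches):

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">some common query</str>
        <str name="sort">score desc</str>
      </lst>
    </arr>
  </listener>

There is also a firstSearcher event for warming a freshly started core.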

Regards,
Alex

On Thu, Jun 7, 2018, 21:27 David Hastings, 
wrote:

> When you are sending updates you are adjusting the segments which take them
> out of memory and the index becomes "cold" until it gets enough searches to
> cache the various aspects of the index.
>
> On Thu, Jun 7, 2018 at 2:10 PM, Moenieb Davids 
> wrote:
>
> > Hi All,
> >
> > Background:
> > I am currently testing a deployment of a content management framework
> where
> > I am trying to punt Solr as the tool of choice for ingestion and
> searching.
> >
> > Current status:
> > I have deployed SolrCloud across multiple servers with multiple shards
> and
> > a replication factor of 2.
> > In terms of collections, I have a person collection that contains details
> > individuals including address and high level portfolio info.
> Structurally,
> > this collection contains great grandchildren.
> > Then I have a few collections that deals with content. For now, content
> is
> > just emails and document with a max size of 2MB, with certain user
> > exceptions that can go higher than 2MB.
> > Content is indexed twice in terms of the actual content, firstly as
> > binary/stream and then as readable text. Metadata is negligible
> >
> >
> > Challenges:
> > When performing full text searches without concurrently executing
> updates,
> > solr seems to be doing well. Running updates also does okish given the
> > nature of the transaction. However, when I run search and updates
> > simultaneously, performance drops quite significantly. I have played with
> > field properties, analyzers, tokenizers, shafting sizes etc.
> > Any advice?
> > Would like to know if anyone has done something similar. Please excuse
> the
> > long winded message
> >
> >
> > --
> > Sent from Gmail Mobile
> >
> >
> >
> > --
> > Sent from Gmail Mobile
> >
>


Re: Solr for Content Management

2018-06-07 Thread David Hastings
When you are sending updates you are adjusting the segments which take them
out of memory and the index becomes "cold" until it gets enough searches to
cache the various aspects of the index.

On Thu, Jun 7, 2018 at 2:10 PM, Moenieb Davids 
wrote:

> Hi All,
>
> Background:
> I am currently testing a deployment of a content management framework where
> I am trying to punt Solr as the tool of choice for ingestion and searching.
>
> Current status:
> I have deployed SolrCloud across multiple servers with multiple shards and
> a replication factor of 2.
> In terms of collections, I have a person collection that contains details
> individuals including address and high level portfolio info. Structurally,
> this collection contains great grandchildren.
> Then I have a few collections that deals with content. For now, content is
> just emails and document with a max size of 2MB, with certain user
> exceptions that can go higher than 2MB.
> Content is indexed twice in terms of the actual content, firstly as
> binary/stream and then as readable text. Metadata is negligible
>
>
> Challenges:
> When performing full text searches without concurrently executing updates,
> solr seems to be doing well. Running updates also does okish given the
> nature of the transaction. However, when I run search and updates
> simultaneously, performance drops quite significantly. I have played with
> field properties, analyzers, tokenizers, shafting sizes etc.
> Any advice?
> Would like to know if anyone has done something similar. Please excuse the
> long winded message
>
>
> --
> Sent from Gmail Mobile
>
>
>
> --
> Sent from Gmail Mobile
>


Solr for Content Management

2018-06-07 Thread Moenieb Davids
Hi All,

Background:
I am currently testing a deployment of a content management framework where
I am trying to punt Solr as the tool of choice for ingestion and searching.

Current status:
I have deployed SolrCloud across multiple servers with multiple shards and
a replication factor of 2.
In terms of collections, I have a person collection that contains details
individuals including address and high level portfolio info. Structurally,
this collection contains great grandchildren.
Then I have a few collections that deals with content. For now, content is
just emails and document with a max size of 2MB, with certain user
exceptions that can go higher than 2MB.
Content is indexed twice in terms of the actual content, firstly as
binary/stream and then as readable text. Metadata is negligible


Challenges:
When performing full text searches without concurrently executing updates,
solr seems to be doing well. Running updates also does okish given the
nature of the transaction. However, when I run search and updates
simultaneously, performance drops quite significantly. I have played with
field properties, analyzers, tokenizers, shafting sizes etc.
Any advice?
Would like to know if anyone has done something similar. Please excuse the
long winded message


-- 
Sent from Gmail Mobile



-- 
Sent from Gmail Mobile


Need Help on solr for Email Search

2017-05-08 Thread Udaya Ganga Santosh Kumar Palivela
HI Team,

We are using solr for Quick retrieval of search result.
Recently we are encountered with a problem while searching for Email  in
solr.
search is performing well when i enter simple text ,but When i enter any
special characters (Like @ ,(comma))  it is not returning any results.

i have attached the schema file once please verify and let us know how to
perform search on solr for email address .

please get back to me as soon as possible.
-- 

*Thanks & Regards,*

*Santosh Palivela.*


Re: Configuring Solr for Maximum Concurrency

2016-12-29 Thread Dave Seltzer
Just a little update on my concurrency issue.

The problem I was having was that under heavy load individual Solr
instances would be slow to respond, eventually leading to flapping cluster
membership.

I tweaked a bunch of settings in Linux, Jetty, Solr and within my
application but in the end none of these changes prevented the stability
issues I was having.

Instead, I modified my HAProxy config to limit the maximum simultaneous
number of connections on a per-server basis. By capping the number of
simultaneous queries being handled by Solr at 30, I've effectively prevented
long-running queries from stacking up and getting continually slower.
Instead, HAProxy is now queueing up the pending requests and letting them
in whenever there's available capacity. As a result, Solr behaves normally
under intense load, and even though queries perform more slowly during these
times, it never results in runaway slowness.
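
The relevant part is just the per-server maxconn in the backend; roughly
(server names and addresses are made up here):

  backend solr
      balance leastconn
      server solr1 10.0.0.11:8983 check maxconn 30
      server solr2 10.0.0.12:8983 check maxconn 30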

My best guess as to why I ran into this issue is that perhaps my query
volume was large relative to the on-disk index size. As a result, Solr
spends almost no time waiting on disk IO. This, perhaps, leaves the door
open for query-driven CPU utilization to cause more fundamental issues in
Solr's performance.

Or maybe I missed something stupid at the OS level.

Sigh.

Many thanks for all the help!

-Dave

On Wed, Dec 28, 2016 at 7:11 PM, Erick Erickson 
wrote:

> You'll see some lines with three different times in them, "user" "sys"
> and "real".
> The one that really counts is "real", that's the time that the process was
> stopped while GC went on. The "stop" in "Stop the world" (STW) GC
>
> What you're looking for is two things:
>
> 1> outrageously long times
> and/or
> 2> these happening one right after the other.
>
> For <2> I've seen situations where you go into a STW pause, collect
> a tiny bit of memory (say a few meg) and try to continue, only to go
> right back into another. It might take, say, 2 seconds of "real" time to
> do the GC then go back into another 2 second cycle 500ms later. That kind
> of thing.
>
> GCViewer can help you make sense of the GC logs
> https://sourceforge.net/projects/gcviewer/
>
> Unfortunately GC tuning is "more art than science" ;(
>
> Best,
> Erick
>
> Best,
> Erick
>
> On Wed, Dec 28, 2016 at 10:57 AM, Dave Seltzer 
> wrote:
> > Hi Erick,
> >
> > You're probably right about it not being a threading issue. In general it
> > seems that CPU contention could indeed be the issue.
> >
> > Most of the settings we're using in Solr came "right out of the box"
> > including Jetty's configuration which specifies:
> >
> > solr.jetty.threads.min: 10
> > solr.jetty.threads.max: 1
> > solr.jetty.threads.idle.timeout: 5000
> > solr.jetty.threads.stop.timeout: 6
> >
> > The only interesting thing we're doing is disabling the query cache. This
> > is because individual hash-matching queries tend to be unique and
> therefore
> > don't benefit significantly from query caching.
> >
> > On the GC side, I'm not really sure what to look for. Here's an example
> > message from /solr/logs/solr_gc.log
> >
> > 2016-12-28T13:48:56.872-0500: 9453.890: Total time for which application
> > threads were stopped: 0.8394383 seconds, Stopping threads took: 0.0004007
> > seconds
> > {Heap before GC invocations=8169 (full 124):
> >  par new generation   total 3495296K, used 3495296K [0x0003c000,
> > 0x0004c000, 0x0004c000)
> >   eden space 2796288K, 100% used [0x0003c000, 0x00046aac,
> > 0x00046aac)
> >   from space 699008K, 100% used [0x00049556, 0x0004c000,
> > 0x0004c000)
> >   to   space 699008K,   0% used [0x00046aac, 0x00046aac,
> > 0x00049556)
> >  concurrent mark-sweep generation total 12582912K, used 1253K
> > [0x0004c000, 0x0007c000, 0x0007c000)
> >  Metaspace   used 33470K, capacity 33998K, committed 34360K, reserved
> > 1079296K
> >   class spaceused 3716K, capacity 3888K, committed 3960K, reserved
> > 1048576K
> > 2016-12-28T13:48:57.415-0500: 9454.434: [GC (Allocation Failure)
> > 2016-12-28T13:48:57.415-0500: 9454.434: [ParNew
> > Desired survivor size 644205768 bytes, new threshold 3 (max 8)
> > - age   1:  284566200 bytes,  284566200 total
> > - age   2:  197448288 bytes,  482014488 total
> > - age   3:  168306328 bytes,  650320816 total
> > - age   4:   48423744 bytes,  698744560 total
> > - age   5:   17038920 bytes,  715783480 total
> > : 3495296K->699008K(3495296K), 1.2399730 secs]
> > 15606449K->13188910K(16078208K), 1.2403791 secs] [Times: user=4.60
> > sys=0.00, real=1.24 secs]
> >
> > Is there something I should be grepping for in this enormous file?
> >
> > Many thanks!
> >
> > -Dave
> >
> > On Wed, Dec 28, 2016 at 12:44 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> Threads are usually a container parameter I think. True, Solr wants
> >> lots of threads. My return volley would be how busy is your CPU when
> >> t

Re: Configuring Solr for Maximum Concurrency

2016-12-28 Thread Erick Erickson
You'll see some lines with three different times in them, "user" "sys"
and "real".
The one that really counts is "real", that's the time that the process was
stopped while GC went on. The "stop" in "Stop the world" (STW) GC

What you're looking for is two things:

1> outrageously long times
and/or
2> these happening one right after the other.

For <2> I've seen situations where you go into a STW pause, collect
a tiny bit of memory (say a few meg) and try to continue, only to go
right back into another. It might take, say, 2 seconds of "real" time to
do the GC then go back into another 2 second cycle 500ms later. That kind
of thing.

GCViewer can help you make sense of the GC logs
https://sourceforge.net/projects/gcviewer/

Unfortunately GC tuning is "more art than science" ;(

Best,
Erick

Best,
Erick

On Wed, Dec 28, 2016 at 10:57 AM, Dave Seltzer  wrote:
> Hi Erick,
>
> You're probably right about it not being a threading issue. In general it
> seems that CPU contention could indeed be the issue.
>
> Most of the settings we're using in Solr came "right out of the box"
> including Jetty's configuration which specifies:
>
> solr.jetty.threads.min: 10
> solr.jetty.threads.max: 1
> solr.jetty.threads.idle.timeout: 5000
> solr.jetty.threads.stop.timeout: 6
>
> The only interesting thing we're doing is disabling the query cache. This
> is because individual hash-matching queries tend to be unique and therefore
> don't benefit significantly from query caching.
>
> On the GC side, I'm not really sure what to look for. Here's an example
> message from /solr/logs/solr_gc.log
>
> 2016-12-28T13:48:56.872-0500: 9453.890: Total time for which application
> threads were stopped: 0.8394383 seconds, Stopping threads took: 0.0004007
> seconds
> {Heap before GC invocations=8169 (full 124):
>  par new generation   total 3495296K, used 3495296K [0x0003c000,
> 0x0004c000, 0x0004c000)
>   eden space 2796288K, 100% used [0x0003c000, 0x00046aac,
> 0x00046aac)
>   from space 699008K, 100% used [0x00049556, 0x0004c000,
> 0x0004c000)
>   to   space 699008K,   0% used [0x00046aac, 0x00046aac,
> 0x00049556)
>  concurrent mark-sweep generation total 12582912K, used 1253K
> [0x0004c000, 0x0007c000, 0x0007c000)
>  Metaspace   used 33470K, capacity 33998K, committed 34360K, reserved
> 1079296K
>   class spaceused 3716K, capacity 3888K, committed 3960K, reserved
> 1048576K
> 2016-12-28T13:48:57.415-0500: 9454.434: [GC (Allocation Failure)
> 2016-12-28T13:48:57.415-0500: 9454.434: [ParNew
> Desired survivor size 644205768 bytes, new threshold 3 (max 8)
> - age   1:  284566200 bytes,  284566200 total
> - age   2:  197448288 bytes,  482014488 total
> - age   3:  168306328 bytes,  650320816 total
> - age   4:   48423744 bytes,  698744560 total
> - age   5:   17038920 bytes,  715783480 total
> : 3495296K->699008K(3495296K), 1.2399730 secs]
> 15606449K->13188910K(16078208K), 1.2403791 secs] [Times: user=4.60
> sys=0.00, real=1.24 secs]
>
> Is there something I should be grepping for in this enormous file?
>
> Many thanks!
>
> -Dave
>
> On Wed, Dec 28, 2016 at 12:44 PM, Erick Erickson 
> wrote:
>
>> Threads are usually a container parameter I think. True, Solr wants
>> lots of threads. My return volley would be how busy is your CPU when
>> this happens? If it's pegged more threads probably aren't really going
>> to help. And if it's a GC issue then more threads would probably hurt.
>>
>> Best,
>> Erick
>>
>> On Wed, Dec 28, 2016 at 9:14 AM, Dave Seltzer  wrote:
>> > Hi Erick,
>> >
>> > I'll dig in on these timeout settings and see how changes affect
>> behavior.
>> >
>> > One interesting aspect is that we're not indexing any content at the
>> > moment. The rate of ingress is something like 10 to 20 documents per day.
>> >
>> > So my guess is that ZK simply is deciding that these servers are dead
>> based
>> > on the fact that responses are so very sluggish.
>> >
>> > You've mentioned lots of timeouts, but are there any settings which
>> control
>> > the number of available threads? Or is this something which is largely
>> > handled automagically?
>> >
>> > Many thanks!
>> >
>> > -Dave
>> >
>> > On Wed, Dec 28, 2016 at 11:56 AM, Erick Erickson <
>> erickerick...@gmail.com>
>> > wrote:
>> >
>> >> Dave:
>> >>
>> >> There are at least 4 timeouts (not even including ZK) that can
>> >> be relevant, defined in solr.xml:
>> >> socketTimeout
>> >> connTimeout
>> >> distribUpdateConnTimeout
>> >> distribUpdateSoTimeout
>> >>
>> >> Plus the ZK timeout
>> >> zkClientTimeout
>> >>
>> >> Plus the ZK configurations.
>> >>
>> >> So it would help narrow down what's going on if we knew why the nodes
>> >> dropped out. There are indeed a lot of messages dumped, but somewhere
>> >> in the logs there should be a root cause.
>> >>
>> >> You might see Leader Initiated Recovery (LIR) which can indicate that
>> >> an update operati

Re: Configuring Solr for Maximum Concurrency

2016-12-28 Thread Dave Seltzer
Hi Erick,

You're probably right about it not being a threading issue. In general it
seems that CPU contention could indeed be the issue.

Most of the settings we're using in Solr came "right out of the box"
including Jetty's configuration which specifies:

solr.jetty.threads.min: 10
solr.jetty.threads.max: 1
solr.jetty.threads.idle.timeout: 5000
solr.jetty.threads.stop.timeout: 6

The only interesting thing we're doing is disabling the query cache. This
is because individual hash-matching queries tend to be unique and therefore
don't benefit significantly from query caching.

On the GC side, I'm not really sure what to look for. Here's an example
message from /solr/logs/solr_gc.log

2016-12-28T13:48:56.872-0500: 9453.890: Total time for which application
threads were stopped: 0.8394383 seconds, Stopping threads took: 0.0004007
seconds
{Heap before GC invocations=8169 (full 124):
 par new generation   total 3495296K, used 3495296K [0x0003c000,
0x0004c000, 0x0004c000)
  eden space 2796288K, 100% used [0x0003c000, 0x00046aac,
0x00046aac)
  from space 699008K, 100% used [0x00049556, 0x0004c000,
0x0004c000)
  to   space 699008K,   0% used [0x00046aac, 0x00046aac,
0x00049556)
 concurrent mark-sweep generation total 12582912K, used 1253K
[0x0004c000, 0x0007c000, 0x0007c000)
 Metaspace   used 33470K, capacity 33998K, committed 34360K, reserved
1079296K
  class spaceused 3716K, capacity 3888K, committed 3960K, reserved
1048576K
2016-12-28T13:48:57.415-0500: 9454.434: [GC (Allocation Failure)
2016-12-28T13:48:57.415-0500: 9454.434: [ParNew
Desired survivor size 644205768 bytes, new threshold 3 (max 8)
- age   1:  284566200 bytes,  284566200 total
- age   2:  197448288 bytes,  482014488 total
- age   3:  168306328 bytes,  650320816 total
- age   4:   48423744 bytes,  698744560 total
- age   5:   17038920 bytes,  715783480 total
: 3495296K->699008K(3495296K), 1.2399730 secs]
15606449K->13188910K(16078208K), 1.2403791 secs] [Times: user=4.60
sys=0.00, real=1.24 secs]

Is there something I should be grepping for in this enormous file?

Many thanks!

-Dave

On Wed, Dec 28, 2016 at 12:44 PM, Erick Erickson 
wrote:

> Threads are usually a container parameter I think. True, Solr wants
> lots of threads. My return volley would be how busy is your CPU when
> this happens? If it's pegged more threads probably aren't really going
> to help. And if it's a GC issue then more threads would probably hurt.
>
> Best,
> Erick
>
> On Wed, Dec 28, 2016 at 9:14 AM, Dave Seltzer  wrote:
> > Hi Erick,
> >
> > I'll dig in on these timeout settings and see how changes affect
> behavior.
> >
> > One interesting aspect is that we're not indexing any content at the
> > moment. The rate of ingress is something like 10 to 20 documents per day.
> >
> > So my guess is that ZK simply is deciding that these servers are dead
> based
> > on the fact that responses are so very sluggish.
> >
> > You've mentioned lots of timeouts, but are there any settings which
> control
> > the number of available threads? Or is this something which is largely
> > handled automagically?
> >
> > Many thanks!
> >
> > -Dave
> >
> > On Wed, Dec 28, 2016 at 11:56 AM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> >> Dave:
> >>
> >> There are at least 4 timeouts (not even including ZK) that can
> >> be relevant, defined in solr.xml:
> >> socketTimeout
> >> connTimeout
> >> distribUpdateConnTimeout
> >> distribUpdateSoTimeout
> >>
> >> Plus the ZK timeout
> >> zkClientTimeout
> >>
> >> Plus the ZK configurations.
> >>
> >> So it would help narrow down what's going on if we knew why the nodes
> >> dropped out. There are indeed a lot of messages dumped, but somewhere
> >> in the logs there should be a root cause.
> >>
> >> You might see Leader Initiated Recovery (LIR) which can indicate that
> >> an update operation from the leader took too long, the timeouts above
> >> can be adjusted in this case.
> >>
> >> You might see evidence that ZK couldn't get a response from Solr in
> >> "too long" and decided it was gone.
> >>
> >> You might see...
> >>
> >> One thing I'd look at very closely is GC processing. One of the
> >> culprits for this behavior I've seen is a very long GC stop-the-world
> >> pause leading to ZK thinking the node is dead and tripping this chain.
> >> Depending on the timeouts, "very long" might be a few seconds.
> >>
> >> Not entirely helpful, but until you pinpoint why the node goes into
> >> recovery it's throwing darts at the wall. GC and log messages might
> >> give some insight into the root cause.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Dec 28, 2016 at 8:26 AM, Dave Seltzer 
> wrote:
> >> > Hello Everyone,
> >> >
> >> > I'm working on a Solr Cloud cluster which is used in a hash matching
> >> > application.
> >> >
> >> > For performance reasons we've opted to batch-execute hash matching
> >> quer

Re: Configuring Solr for Maximum Concurrency

2016-12-28 Thread Dave Seltzer
Hi Pablo,

I'm not sure what settings govern Solr's jetty container.

/opt/solr/server/etc/jetty.xml includes the following:

solr.jetty.threads.min: 10
solr.jetty.threads.max: 1
solr.jetty.threads.idle.timeout: 5000
solr.jetty.threads.stop.timeout: 6

MAX_CONNECTIONS_PER_HOST could certainly be an issue, but I'm not sure
where that would be configured.

I'm not sure what you're asking about with respect to a singleton pattern.
My application is highly distributed, with over 900 agent applications
making queries via a load balancer running HAProxy.

-D



On Wed, Dec 28, 2016 at 12:42 PM, Pablo Anzorena 
wrote:

> Dave,
>
> there is something similar to MAX_CONNECTIONS and
> MAX_CONNECTIONS_PER_HOST which control the number of connections.
>
> Are you leaving open the connection to zookeeper after you establish it?
> Are you using the singleton pattern?
>
> 2016-12-28 14:14 GMT-03:00 Dave Seltzer :
>
> > Hi Erick,
> >
> > I'll dig in on these timeout settings and see how changes affect
> behavior.
> >
> > One interesting aspect is that we're not indexing any content at the
> > moment. The rate of ingress is something like 10 to 20 documents per day.
> >
> > So my guess is that ZK simply is deciding that these servers are dead
> based
> > on the fact that responses are so very sluggish.
> >
> > You've mentioned lots of timeouts, but are there any settings which
> control
> > the number of available threads? Or is this something which is largely
> > handled automagically?
> >
> > Many thanks!
> >
> > -Dave
> >
> > On Wed, Dec 28, 2016 at 11:56 AM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> > > Dave:
> > >
> > > There are at least 4 timeouts (not even including ZK) that can
> > > be relevant, defined in solr.xml:
> > > socketTimeout
> > > connTimeout
> > > distribUpdateConnTimeout
> > > distribUpdateSoTimeout
> > >
> > > Plus the ZK timeout
> > > zkClientTimeout
> > >
> > > Plus the ZK configurations.
> > >
> > > So it would help narrow down what's going on if we knew why the nodes
> > > dropped out. There are indeed a lot of messages dumped, but somewhere
> > > in the logs there should be a root cause.
> > >
> > > You might see Leader Initiated Recovery (LIR) which can indicate that
> > > an update operation from the leader took too long, the timeouts above
> > > can be adjusted in this case.
> > >
> > > You might see evidence that ZK couldn't get a response from Solr in
> > > "too long" and decided it was gone.
> > >
> > > You might see...
> > >
> > > One thing I'd look at very closely is GC processing. One of the
> > > culprits for this behavior I've seen is a very long GC stop-the-world
> > > pause leading to ZK thinking the node is dead and tripping this chain.
> > > Depending on the timeouts, "very long" might be a few seconds.
> > >
> > > Not entirely helpful, but until you pinpoint why the node goes into
> > > recovery it's throwing darts at the wall. GC and log messages might
> > > give some insight into the root cause.
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Dec 28, 2016 at 8:26 AM, Dave Seltzer 
> > wrote:
> > > > Hello Everyone,
> > > >
> > > > I'm working on a Solr Cloud cluster which is used in a hash matching
> > > > application.
> > > >
> > > > For performance reasons we've opted to batch-execute hash matching
> > > queries.
> > > > This means that a single query will contain many nested queries. As
> you
> > > > might expect, these queries take a while to execute. (On the order
> of 5
> > > to
> > > > 10 seconds.)
> > > >
> > > > I've noticed that Solr will act erratically when we send too many
> > > > long-running queries. Specifically, heavily-loaded servers will
> > > repeatedly
> > > > fall out of the cluster and then recover. My theory is that there's
> > some
> > > > limit on the number of concurrent connections and that client queries
> > are
> > > > preventing zookeeper related queries... but I'm not sure. I've
> > increased
> > > > ZKClientTimeout to combat this.
> > > >
> > > > My question is: What configuration settings should I be looking at in
> > > order
> > > > to make sure I'm maximizing the ability of Solr to handle concurrent
> > > > requests.
> > > >
> > > > Many thanks!
> > > >
> > > > -Dave
> > >
> >
>


Re: Configuring Solr for Maximum Concurrency

2016-12-28 Thread Erick Erickson
Threads are usually a container parameter I think. True, Solr wants
lots of threads. My return volley would be how busy is your CPU when
this happens? If it's pegged more threads probably aren't really going
to help. And if it's a GC issue then more threads would probably hurt.

Best,
Erick

On Wed, Dec 28, 2016 at 9:14 AM, Dave Seltzer  wrote:
> Hi Erick,
>
> I'll dig in on these timeout settings and see how changes affect behavior.
>
> One interesting aspect is that we're not indexing any content at the
> moment. The rate of ingress is something like 10 to 20 documents per day.
>
> So my guess is that ZK simply is deciding that these servers are dead based
> on the fact that responses are so very sluggish.
>
> You've mentioned lots of timeouts, but are there any settings which control
> the number of available threads? Or is this something which is largely
> handled automagically?
>
> Many thanks!
>
> -Dave
>
> On Wed, Dec 28, 2016 at 11:56 AM, Erick Erickson 
> wrote:
>
>> Dave:
>>
>> There are at least 4 timeouts (not even including ZK) that can
>> be relevant, defined in solr.xml:
>> socketTimeout
>> connTimeout
>> distribUpdateConnTimeout
>> distribUpdateSoTimeout
>>
>> Plus the ZK timeout
>> zkClientTimeout
>>
>> Plus the ZK configurations.
>>
>> So it would help narrow down what's going on if we knew why the nodes
>> dropped out. There are indeed a lot of messages dumped, but somewhere
>> in the logs there should be a root cause.
>>
>> You might see Leader Initiated Recovery (LIR) which can indicate that
>> an update operation from the leader took too long, the timeouts above
>> can be adjusted in this case.
>>
>> You might see evidence that ZK couldn't get a response from Solr in
>> "too long" and decided it was gone.
>>
>> You might see...
>>
>> One thing I'd look at very closely is GC processing. One of the
>> culprits for this behavior I've seen is a very long GC stop-the-world
>> pause leading to ZK thinking the node is dead and tripping this chain.
>> Depending on the timeouts, "very long" might be a few seconds.
>>
>> Not entirely helpful, but until you pinpoint why the node goes into
>> recovery it's throwing darts at the wall. GC and log messages might
>> give some insight into the root cause.
>>
>> Best,
>> Erick
>>
>> On Wed, Dec 28, 2016 at 8:26 AM, Dave Seltzer  wrote:
>> > Hello Everyone,
>> >
>> > I'm working on a Solr Cloud cluster which is used in a hash matching
>> > application.
>> >
>> > For performance reasons we've opted to batch-execute hash matching
>> queries.
>> > This means that a single query will contain many nested queries. As you
>> > might expect, these queries take a while to execute. (On the order of 5
>> to
>> > 10 seconds.)
>> >
>> > I've noticed that Solr will act erratically when we send too many
>> > long-running queries. Specifically, heavily-loaded servers will
>> repeatedly
>> > fall out of the cluster and then recover. My theory is that there's some
>> > limit on the number of concurrent connections and that client queries are
>> > preventing zookeeper related queries... but I'm not sure. I've increased
>> > ZKClientTimeout to combat this.
>> >
>> > My question is: What configuration settings should I be looking at in
>> order
>> > to make sure I'm maximizing the ability of Solr to handle concurrent
>> > requests.
>> >
>> > Many thanks!
>> >
>> > -Dave
>>


Re: Configuring Solr for Maximum Concurrency

2016-12-28 Thread Pablo Anzorena
Dave,

there is something similar to MAX_CONNECTIONS and
MAX_CONNECTIONS_PER_HOST which control the number of connections.

Are you leaving open the connection to zookeeper after you establish it?
Are you using the singleton pattern?

2016-12-28 14:14 GMT-03:00 Dave Seltzer :

> Hi Erick,
>
> I'll dig in on these timeout settings and see how changes affect behavior.
>
> One interesting aspect is that we're not indexing any content at the
> moment. The rate of ingress is something like 10 to 20 documents per day.
>
> So my guess is that ZK simply is deciding that these servers are dead based
> on the fact that responses are so very sluggish.
>
> You've mentioned lots of timeouts, but are there any settings which control
> the number of available threads? Or is this something which is largely
> handled automagically?
>
> Many thanks!
>
> -Dave
>
> On Wed, Dec 28, 2016 at 11:56 AM, Erick Erickson 
> wrote:
>
> > Dave:
> >
> > There are at least 4 timeouts (not even including ZK) that can
> > be relevant, defined in solr.xml:
> > socketTimeout
> > connTimeout
> > distribUpdateConnTimeout
> > distribUpdateSoTimeout
> >
> > Plus the ZK timeout
> > zkClientTimeout
> >
> > Plus the ZK configurations.
> >
> > So it would help narrow down what's going on if we knew why the nodes
> > dropped out. There are indeed a lot of messages dumped, but somewhere
> > in the logs there should be a root cause.
> >
> > You might see Leader Initiated Recovery (LIR) which can indicate that
> > an update operation from the leader took too long, the timeouts above
> > can be adjusted in this case.
> >
> > You might see evidence that ZK couldn't get a response from Solr in
> > "too long" and decided it was gone.
> >
> > You might see...
> >
> > One thing I'd look at very closely is GC processing. One of the
> > culprits for this behavior I've seen is a very long GC stop-the-world
> > pause leading to ZK thinking the node is dead and tripping this chain.
> > Depending on the timeouts, "very long" might be a few seconds.
> >
> > Not entirely helpful, but until you pinpoint why the node goes into
> > recovery it's throwing darts at the wall. GC and log messages might
> > give some insight into the root cause.
> >
> > Best,
> > Erick
> >
> > On Wed, Dec 28, 2016 at 8:26 AM, Dave Seltzer 
> wrote:
> > > Hello Everyone,
> > >
> > > I'm working on a Solr Cloud cluster which is used in a hash matching
> > > application.
> > >
> > > For performance reasons we've opted to batch-execute hash matching
> > queries.
> > > This means that a single query will contain many nested queries. As you
> > > might expect, these queries take a while to execute. (On the order of 5
> > to
> > > 10 seconds.)
> > >
> > > I've noticed that Solr will act erratically when we send too many
> > > long-running queries. Specifically, heavily-loaded servers will
> > repeatedly
> > > fall out of the cluster and then recover. My theory is that there's
> some
> > > limit on the number of concurrent connections and that client queries
> are
> > > preventing zookeeper related queries... but I'm not sure. I've
> increased
> > > ZKClientTimeout to combat this.
> > >
> > > My question is: What configuration settings should I be looking at in
> > order
> > > to make sure I'm maximizing the ability of Solr to handle concurrent
> > > requests.
> > >
> > > Many thanks!
> > >
> > > -Dave
> >
>


Re: Configuring Solr for Maximum Concurrency

2016-12-28 Thread Dave Seltzer
Hi Erick,

I'll dig in on these timeout settings and see how changes affect behavior.

One interesting aspect is that we're not indexing any content at the
moment. The rate of ingress is something like 10 to 20 documents per day.

So my guess is that ZK simply is deciding that these servers are dead based
on the fact that responses are so very sluggish.

You've mentioned lots of timeouts, but are there any settings which control
the number of available threads? Or is this something which is largely
handled automagically?

Many thanks!

-Dave

On Wed, Dec 28, 2016 at 11:56 AM, Erick Erickson 
wrote:

> Dave:
>
> There are at least 4 timeouts (not even including ZK) that can
> be relevant, defined in solr.xml:
> socketTimeout
> connTimeout
> distribUpdateConnTimeout
> distribUpdateSoTimeout
>
> Plus the ZK timeout
> zkClientTimeout
>
> Plus the ZK configurations.
>
> So it would help narrow down what's going on if we knew why the nodes
> dropped out. There are indeed a lot of messages dumped, but somewhere
> in the logs there should be a root cause.
>
> You might see Leader Initiated Recovery (LIR) which can indicate that
> an update operation from the leader took too long, the timeouts above
> can be adjusted in this case.
>
> You might see evidence that ZK couldn't get a response from Solr in
> "too long" and decided it was gone.
>
> You might see...
>
> One thing I'd look at very closely is GC processing. One of the
> culprits for this behavior I've seen is a very long GC stop-the-world
> pause leading to ZK thinking the node is dead and tripping this chain.
> Depending on the timeouts, "very long" might be a few seconds.
>
> Not entirely helpful, but until you pinpoint why the node goes into
> recovery it's throwing darts at the wall. GC and log messages might
> give some insight into the root cause.
>
> Best,
> Erick
>
> On Wed, Dec 28, 2016 at 8:26 AM, Dave Seltzer  wrote:
> > Hello Everyone,
> >
> > I'm working on a Solr Cloud cluster which is used in a hash matching
> > application.
> >
> > For performance reasons we've opted to batch-execute hash matching
> queries.
> > This means that a single query will contain many nested queries. As you
> > might expect, these queries take a while to execute. (On the order of 5
> to
> > 10 seconds.)
> >
> > I've noticed that Solr will act erratically when we send too many
> > long-running queries. Specifically, heavily-loaded servers will
> repeatedly
> > fall out of the cluster and then recover. My theory is that there's some
> > limit on the number of concurrent connections and that client queries are
> > preventing zookeeper related queries... but I'm not sure. I've increased
> > ZKClientTimeout to combat this.
> >
> > My question is: What configuration settings should I be looking at in
> order
> > to make sure I'm maximizing the ability of Solr to handle concurrent
> > requests.
> >
> > Many thanks!
> >
> > -Dave
>


Re: Configuring Solr for Maximum Concurrency

2016-12-28 Thread Erick Erickson
Dave:

There are at least 4 timeouts (not even including ZK) that can
be relevant, defined in solr.xml:
socketTimeout
connTimeout
distribUpdateConnTimeout
distribUpdateSoTimeout

Plus the ZK timeout
zkClientTimeout

Plus the ZK configurations.
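
For reference, a sketch of where those live in solr.xml (the values below are
just placeholders, not recommendations):

<solr>
  <solrcloud>
    <int name="zkClientTimeout">30000</int>
    <int name="distribUpdateConnTimeout">60000</int>
    <int name="distribUpdateSoTimeout">600000</int>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">600000</int>
    <int name="connTimeout">60000</int>
  </shardHandlerFactory>
</solr>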

So it would help narrow down what's going on if we knew why the nodes
dropped out. There are indeed a lot of messages dumped, but somewhere
in the logs there should be a root cause.

You might see Leader Initiated Recovery (LIR) which can indicate that
an update operation from the leader took too long, the timeouts above
can be adjusted in this case.

You might see evidence that ZK couldn't get a response from Solr in
"too long" and decided it was gone.

You might see...

One thing I'd look at very closely is GC processing. One of the
culprits for this behavior I've seen is a very long GC stop-the-world
pause leading to ZK thinking the node is dead and tripping this chain.
Depending on the timeouts, "very long" might be a few seconds.

Not entirely helpful, but until you pinpoint why the node goes into
recovery it's throwing darts at the wall. GC and log messages might
give some insight into the root cause.

Best,
Erick

On Wed, Dec 28, 2016 at 8:26 AM, Dave Seltzer  wrote:
> Hello Everyone,
>
> I'm working on a Solr Cloud cluster which is used in a hash matching
> application.
>
> For performance reasons we've opted to batch-execute hash matching queries.
> This means that a single query will contain many nested queries. As you
> might expect, these queries take a while to execute. (On the order of 5 to
> 10 seconds.)
>
> I've noticed that Solr will act erratically when we send too many
> long-running queries. Specifically, heavily-loaded servers will repeatedly
> fall out of the cluster and then recover. My theory is that there's some
> limit on the number of concurrent connections and that client queries are
> preventing zookeeper related queries... but I'm not sure. I've increased
> ZKClientTimeout to combat this.
>
> My question is: What configuration settings should I be looking at in order
> to make sure I'm maximizing the ability of Solr to handle concurrent
> requests.
>
> Many thanks!
>
> -Dave


Configuring Solr for Maximum Concurrency

2016-12-28 Thread Dave Seltzer
Hello Everyone,

I'm working on a Solr Cloud cluster which is used in a hash matching
application.

For performance reasons we've opted to batch-execute hash matching queries.
This means that a single query will contain many nested queries. As you
might expect, these queries take a while to execute. (On the order of 5 to
10 seconds.)

I've noticed that Solr will act erratically when we send too many
long-running queries. Specifically, heavily-loaded servers will repeatedly
fall out of the cluster and then recover. My theory is that there's some
limit on the number of concurrent connections and that client queries are
preventing zookeeper related queries... but I'm not sure. I've increased
ZKClientTimeout to combat this.

My question is: What configuration settings should I be looking at in order
to make sure I'm maximizing the ability of Solr to handle concurrent
requests.

Many thanks!

-Dave


Re: Solr for Multi Tenant architecture

2016-09-06 Thread Chamil Jeewantha
Dear all,

Thank you for all your advices.

This comment says:

"SolrCloud starts to have serious problems when you create a lot of
collections.
We are aware of the scalability issues, but they are not easy to fix."

http://lucene.472066.n3.nabble.com/Fwd-Solr-Cloud-6-0-0-hangs-when-creating-large-amount-of-collections-and-node-fails-to-recover-aftert-tp4276364p4276404.html

So I am in doubt whether this will affect us when our system goes beyond thousands
of tenants.

One option I see is adding a custom load-balancing mechanism which routes
tenants to different Solr clusters. Is there an easy way of dealing with this
situation?

Best Regards,
Chamil

On Wed, Aug 31, 2016 at 1:42 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Chamil,
>
> One thing to consider is relevancy, especially if the tenants' domains
> are different (e.g. one is tech and the other pharmacy). If you go with one
> collection and use the same field (e.g. desc) for all tenants, you will get one
> set of field stats and could skew result ordering if you order by score (e.g.
> the word 'cream' might be infrequent in the tech tenant but could become frequent
> overall because of a large pharmacy tenant).
>
> On the other side, having a large number of collections could also be
> problematic. You can address that issue by splitting tenants across multiple
> clusters, or by having collections for large tenants and grouping smaller
> tenants by domain.
>
> Make sure that you use routing by tenant id in the case of a multi-tenant
> collection.
>
> HTH,
> Emir
>
>
>
> On 28.08.2016 07:02, Chamil Jeewantha wrote:
>
>> Thank you everyone for your great support.
>>
>> I will update you with our final approach.
>>
>> Best regards,
>> Chamil
>>
>> On Aug 28, 2016 01:34, "John Bickerstaff" 
>> wrote:
>>
>> In my own work, the risk to the business if every single client cannot
>>> access search is so great, we would never consider putting everything in
>>> one.  You should certainly ask that question of the business stakeholders
>>> before you decide.
>>>
>>> For that reason, I might recommend that each of the multiple collections
>>> suggested above by Erick could also be on a separate SolrCloud (or single
>>> Solr instance) so that no single failure can ever take down every
>>> tenant's
>>> ability to search -- only those on that particular SolrCloud...
>>>
>>> On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson <
>>> erickerick...@gmail.com>
>>> wrote:
>>>
>>> There's no one right answer here. I've also seen a hybrid approach
 where there are multiple collections each of which has some
 number of tenants resident. Eventually, you need to think of some
 kind of partitioning, my rough number of documents for a single core
 is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).

 All that said, you may also be interested in the "transient cores"
 option, see: https://cwiki.apache.org/confluence/display/solr/
 Defining+core.properties
 and the transient and transientCacheSize (this latter in solr.xml). Note
 that this is stand-alone only so you can't move that concept to
 SolrCloud if you eventually go there.

 Best,
 Erick

 On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha 
 wrote:

> Dear Solr Members,
>
> We are using SolrCloud as the search provider of a multi-tenant cloud
>
 based

> application. We have one schema for all the tenants. The indexes will
>
 have

> large number(millions) of documents.
>
> As of our research, we have two options,
>
> - One large collection for all the tenants and use Composite-ID
>
 routing

> - Collection per tenant
>
> The below mail says,
>
>
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/
>
 201403.mbox/%3c5324cd4b.2020...@protulae.com%3E

> SolrCloud is *more scalable in terms of index size*. Plus you get
> redundancy which can't be underestimated in a hosted solution.
>
>
> AND
>
> The issue is management. 1000s of cores/collections require a level of
> automation. On the other hand, having a single core/collection means if
> you make one change to the schema or solrconfig, it affects everyone.
>
>
> Based on the above facts we think One large collection will be the way
>
 to
>>>
 go.
>
> Questions:
>
> 1. Is that the right way to go?
> 2. Will it be a hassle when we need to do reindexing?
> 3. What is the chance of entire collection crash? (in that case all
> tenants will be affected and reindexing will be painful.
>
> Thank you in advance for your kind opinion.
>
> Best Regards,
> Chamil
>
> --
> http://kavimalla.blgospot.com
> http://kdchamil.blogspot.com
>

> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/

Re: Solr for Multi Tenant architecture

2016-08-31 Thread Emir Arnautovic

Hi Chamil,

One thing to consider is relevancy, especially if the tenants' domains
are different (e.g. one is tech and the other pharmacy). If you go with one
collection and use the same field (e.g. desc) for all tenants, you will get
one set of field stats and could skew result ordering if you order by score
(e.g. the word 'cream' might be infrequent in the tech tenant but could become
frequent overall because of a large pharmacy tenant).


On the other side, having a large number of collections could also be
problematic. You can address that issue by splitting tenants across
multiple clusters, or by having collections for large tenants and grouping
smaller tenants by domain.


Make sure that you use routing by tenant id in the case of a multi-tenant
collection.
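
For example (the tenant and document ids here are made up), with the
compositeId router you would index documents with ids like

  tenantA!doc42
  tenantB!doc17

and keep a tenant's queries on its own shard(s) with the _route_ parameter,
e.g. q=...&_route_=tenantA!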


HTH,
Emir


On 28.08.2016 07:02, Chamil Jeewantha wrote:

Thank you everyone for your great support.

I will update you with our final approach.

Best regards,
Chamil

On Aug 28, 2016 01:34, "John Bickerstaff"  wrote:


In my own work, the risk to the business if every single client cannot
access search is so great, we would never consider putting everything in
one.  You should certainly ask that question of the business stakeholders
before you decide.

For that reason, I might recommend that each of the multiple collections
suggested above by Erick could also be on a separate SolrCloud (or single
Solr instance) so that no single failure can ever take down every tenant's
ability to search -- only those on that particular SolrCloud...

On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson 
wrote:


There's no one right answer here. I've also seen a hybrid approach
where there are multiple collections each of which has some
number of tenants resident. Eventually, you need to think of some
kind of partitioning, my rough number of documents for a single core
is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).

All that said, you may also be interested in the "transient cores"
option, see: https://cwiki.apache.org/confluence/display/solr/
Defining+core.properties
and the transient and transientCacheSize (this latter in solr.xml). Note
that this is stand-alone only so you can't move that concept to
SolrCloud if you eventually go there.

Best,
Erick

On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha 
wrote:

Dear Solr Members,

We are using SolrCloud as the search provider of a multi-tenant cloud based
application. We have one schema for all the tenants. The indexes will have
large number(millions) of documents.

As of our research, we have two options,

- One large collection for all the tenants and use Composite-ID routing
- Collection per tenant

The below mail says,


https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c5324cd4b.2020...@protulae.com%3E

SolrCloud is *more scalable in terms of index size*. Plus you get
redundancy which can't be underestimated in a hosted solution.


AND

The issue is management. 1000s of cores/collections require a level of
automation. On the other hand, having a single core/collection means if
you make one change to the schema or solrconfig, it affects everyone.


Based on the above facts we think One large collection will be the way to
go.

Questions:

1. Is that the right way to go?
2. Will it be a hassle when we need to do reindexing?
3. What is the chance of entire collection crash? (in that case all
tenants will be affected and reindexing will be painful.

Thank you in advance for your kind opinion.

Best Regards,
Chamil

--
http://kavimalla.blgospot.com
http://kdchamil.blogspot.com


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Solr for Multi Tenant architecture

2016-08-28 Thread Walter Underwood
Apple did a preso on massive multi-tenancy. I haven’t watched it yet, but it 
might help.

https://www.youtube.com/watch?v=_Erkln5WWLw 


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 27, 2016, at 10:02 PM, Chamil Jeewantha  wrote:
> 
> Thank you everyone for your great support.
> 
> I will update you with our final approach.
> 
> Best regards,
> Chamil
> 
> On Aug 28, 2016 01:34, "John Bickerstaff"  wrote:
> 
>> In my own work, the risk to the business if every single client cannot
>> access search is so great, we would never consider putting everything in
>> one.  You should certainly ask that question of the business stakeholders
>> before you decide.
>> 
>> For that reason, I might recommend that each of the multiple collections
>> suggested above by Erick could also be on a separate SolrCloud (or single
>> Solr instance) so that no single failure can ever take down every tenant's
>> ability to search -- only those on that particular SolrCloud...
>> 
>> On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson 
>> wrote:
>> 
>>> There's no one right answer here. I've also seen a hybrid approach
>>> where there are multiple collections each of which has some
>>> number of tenants resident. Eventually, you need to think of some
>>> kind of partitioning, my rough number of documents for a single core
>>> is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).
>>> 
>>> All that said, you may also be interested in the "transient cores"
>>> option, see: https://cwiki.apache.org/confluence/display/solr/
>>> Defining+core.properties
>>> and the transient and transientCacheSize (this latter in solr.xml). Note
>>> that this is stand-alone only so you can't move that concept to
>>> SolrCloud if you eventually go there.
>>> 
>>> Best,
>>> Erick
>>> 
>>> On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha 
>>> wrote:
 Dear Solr Members,
 
 We are using SolrCloud as the search provider of a multi-tenant cloud
>>> based
 application. We have one schema for all the tenants. The indexes will
>>> have
 large number(millions) of documents.
 
 As of our research, we have two options,
 
   - One large collection for all the tenants and use Composite-ID
>>> routing
   - Collection per tenant
 
 The below mail says,
 
 
 https://mail-archives.apache.org/mod_mbox/lucene-solr-user/
>>> 201403.mbox/%3c5324cd4b.2020...@protulae.com%3E
 
 SolrCloud is *more scalable in terms of index size*. Plus you get
 redundancy which can't be underestimated in a hosted solution.
 
 
 AND
 
 The issue is management. 1000s of cores/collections require a level of
 automation. On the other hand, having a single core/collection means if
 you make one change to the schema or solrconfig, it affects everyone.
 
 
 Based on the above facts we think One large collection will be the way
>> to
 go.
 
 Questions:
 
   1. Is that the right way to go?
   2. Will it be a hassle when we need to do reindexing?
   3. What is the chance of entire collection crash? (in that case all
   tenants will be affected and reindexing will be painful.
 
 Thank you in advance for your kind opinion.
 
 Best Regards,
 Chamil
 
 --
 http://kavimalla.blgospot.com
 http://kdchamil.blogspot.com
>>> 
>> 



Re: Solr for Multi Tenant architecture

2016-08-27 Thread Chamil Jeewantha
Thank you everyone for your great support.

I will update you with our final approach.

Best regards,
Chamil

On Aug 28, 2016 01:34, "John Bickerstaff"  wrote:

> In my own work, the risk to the business if every single client cannot
> access search is so great, we would never consider putting everything in
> one.  You should certainly ask that question of the business stakeholders
> before you decide.
>
> For that reason, I might recommend that each of the multiple collections
> suggested above by Erick could also be on a separate SolrCloud (or single
> Solr instance) so that no single failure can ever take down every tenant's
> ability to search -- only those on that particular SolrCloud...
>
> On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson 
> wrote:
>
> > There's no one right answer here. I've also seen a hybrid approach
> > where there are multiple collections each of which has some
> > number of tenants resident. Eventually, you need to think of some
> > kind of partitioning, my rough number of documents for a single core
> > is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).
> >
> > All that said, you may also be interested in the "transient cores"
> > option, see: https://cwiki.apache.org/confluence/display/solr/
> > Defining+core.properties
> > and the transient and transientCacheSize (this latter in solr.xml). Note
> > that this is stand-alone only so you can't move that concept to
> > SolrCloud if you eventually go there.
> >
> > Best,
> > Erick
> >
> > On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha 
> > wrote:
> > > Dear Solr Members,
> > >
> > > We are using SolrCloud as the search provider of a multi-tenant cloud
> > based
> > > application. We have one schema for all the tenants. The indexes will
> > have
> > > large number(millions) of documents.
> > >
> > > As of our research, we have two options,
> > >
> > >- One large collection for all the tenants and use Composite-ID
> > routing
> > >- Collection per tenant
> > >
> > > The below mail says,
> > >
> > >
> > > https://mail-archives.apache.org/mod_mbox/lucene-solr-user/
> > 201403.mbox/%3c5324cd4b.2020...@protulae.com%3E
> > >
> > > SolrCloud is *more scalable in terms of index size*. Plus you get
> > > redundancy which can't be underestimated in a hosted solution.
> > >
> > >
> > > AND
> > >
> > > The issue is management. 1000s of cores/collections require a level of
> > > automation. On the other hand, having a single core/collection means if
> > > you make one change to the schema or solrconfig, it affects everyone.
> > >
> > >
> > > Based on the above facts we think One large collection will be the way
> to
> > > go.
> > >
> > > Questions:
> > >
> > >1. Is that the right way to go?
> > >2. Will it be a hassle when we need to do reindexing?
> > >3. What is the chance of entire collection crash? (in that case all
> > >tenants will be affected and reindexing will be painful.
> > >
> > > Thank you in advance for your kind opinion.
> > >
> > > Best Regards,
> > > Chamil
> > >
> > > --
> > > http://kavimalla.blgospot.com
> > > http://kdchamil.blogspot.com
> >
>


Re: Solr for Multi Tenant architecture

2016-08-27 Thread John Bickerstaff
In my own work, the risk to the business if every single client cannot
access search is so great, we would never consider putting everything in
one.  You should certainly ask that question of the business stakeholders
before you decide.

For that reason, I might recommend that each of the multiple collections
suggested above by Erick could also be on a separate SolrCloud (or single
Solr instance) so that no single failure can ever take down every tenant's
ability to search -- only those on that particular SolrCloud...

On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson 
wrote:

> There's no one right answer here. I've also seen a hybrid approach
> where there are multiple collections each of which has some
> number of tenants resident. Eventually, you need to think of some
> kind of partitioning, my rough number of documents for a single core
> is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).
>
> All that said, you may also be interested in the "transient cores"
> option, see: https://cwiki.apache.org/confluence/display/solr/
> Defining+core.properties
> and the transient and transientCacheSize (this latter in solr.xml). Note
> that this is stand-alone only so you can't move that concept to
> SolrCloud if you eventually go there.
>
> Best,
> Erick
>
> On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha 
> wrote:
> > Dear Solr Members,
> >
> > We are using SolrCloud as the search provider of a multi-tenant cloud
> based
> > application. We have one schema for all the tenants. The indexes will
> have
> > large number(millions) of documents.
> >
> > As of our research, we have two options,
> >
> >- One large collection for all the tenants and use Composite-ID
> routing
> >- Collection per tenant
> >
> > The below mail says,
> >
> >
> > https://mail-archives.apache.org/mod_mbox/lucene-solr-user/
> 201403.mbox/%3c5324cd4b.2020...@protulae.com%3E
> >
> > SolrCloud is *more scalable in terms of index size*. Plus you get
> > redundancy which can't be underestimated in a hosted solution.
> >
> >
> > AND
> >
> > The issue is management. 1000s of cores/collections require a level of
> > automation. On the other hand, having a single core/collection means if
> > you make one change to the schema or solrconfig, it affects everyone.
> >
> >
> > Based on the above facts we think One large collection will be the way to
> > go.
> >
> > Questions:
> >
> >1. Is that the right way to go?
> >2. Will it be a hassle when we need to do reindexing?
> >3. What is the chance of entire collection crash? (in that case all
> >tenants will be affected and reindexing will be painful.
> >
> > Thank you in advance for your kind opinion.
> >
> > Best Regards,
> > Chamil
> >
> > --
> > http://kavimalla.blgospot.com
> > http://kdchamil.blogspot.com
>


Re: Solr for Multi Tenant architecture

2016-08-27 Thread Shawn Heisey
On 8/26/2016 1:13 PM, Chamil Jeewantha wrote:
> We are using SolrCloud as the search provider of a multi-tenant cloud based
> application. We have one schema for all the tenants. The indexes will have
> large number(millions) of documents.
>
> As of our research, we have two options,
>
>- One large collection for all the tenants and use Composite-ID routing
>- Collection per tenant

I would tend to agree that you should use SolrCloud.  And to avoid
potential problems, each tenant should have their own collection or
collections.

You probably also need to put a smart load balancer in front of Solr
that can restrict access to URL paths containing the collection names to
the source addresses for each tenant.  The tenants should have no access
to the admin UI, because it's not possible to keep people using the
admin UI from seeing collections that aren't theirs.  Developing that
kind of security could be possible, but won't be easy at all.

If access to the admin UI is something that your customers demand, then
I think you'll need to have an entire cloud per tenant -- which probably
means you're going to want to delve into virtualization, possibly using
one of the lightweight implementations like Docker.  Note that if you
take this path, you're going to need a LOT of RAM -- much more than you
might imagine.

Thanks,
Shawn



Re: Solr for Multi Tenant architecture

2016-08-27 Thread Erick Erickson
There's no one right answer here. I've also seen a hybrid approach
where there are multiple collections each of which has some
number of tenants resident. Eventually, you need to think of some
kind of partitioning, my rough number of documents for a single core
is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).

All that said, you may also be interested in the "transient cores"
option, see: 
https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
and the transient and transientCacheSize (this latter in solr.xml). Note
that this is stand-alone only so you can't move that concept to
SolrCloud if you eventually go there.
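
A minimal sketch of how that is wired up in stand-alone Solr (the core name
and cache size below are only examples):

  # core.properties for a tenant core that may be unloaded when idle
  name=tenant_a
  transient=true
  loadOnStartup=false

  <!-- solr.xml: cap how many transient cores stay loaded at once -->
  <solr>
    <int name="transientCacheSize">50</int>
  </solr>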

Best,
Erick

On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha  wrote:
> Dear Solr Members,
>
> We are using SolrCloud as the search provider of a multi-tenant cloud based
> application. We have one schema for all the tenants. The indexes will have
> large number(millions) of documents.
>
> As of our research, we have two options,
>
>- One large collection for all the tenants and use Composite-ID routing
>- Collection per tenant
>
> The below mail says,
>
>
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c5324cd4b.2020...@protulae.com%3E
>
> SolrCloud is *more scalable in terms of index size*. Plus you get
> redundancy which can't be underestimated in a hosted solution.
>
>
> AND
>
> The issue is management. 1000s of cores/collections require a level of
> automation. On the other hand, having a single core/collection means if
> you make one change to the schema or solrconfig, it affects everyone.
>
>
> Based on the above facts we think One large collection will be the way to
> go.
>
> Questions:
>
>1. Is that the right way to go?
>2. Will it be a hassle when we need to do reindexing?
>3. What is the chance of entire collection crash? (in that case all
>tenants will be affected and reindexing will be painful.
>
> Thank you in advance for your kind opinion.
>
> Best Regards,
> Chamil
>
> --
> http://kavimalla.blgospot.com
> http://kdchamil.blogspot.com


Solr for Multi Tenant architecture

2016-08-26 Thread Chamil Jeewantha
Dear Solr Members,

We are using SolrCloud as the search provider of a multi-tenant cloud based
application. We have one schema for all the tenants. The indexes will have
a large number (millions) of documents.

As of our research, we have two options,

   - One large collection for all the tenants and use Composite-ID routing
   - Collection per tenant

The below mail says,


https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c5324cd4b.2020...@protulae.com%3E

SolrCloud is *more scalable in terms of index size*. Plus you get
redundancy which can't be underestimated in a hosted solution.


AND

The issue is management. 1000s of cores/collections require a level of
automation. On the other hand, having a single core/collection means if
you make one change to the schema or solrconfig, it affects everyone.


Based on the above facts we think One large collection will be the way to
go.

Questions:

   1. Is that the right way to go?
   2. Will it be a hassle when we need to do reindexing?
   3. What is the chance of the entire collection crashing? (In that case all
   tenants will be affected and reindexing will be painful.)

Thank you in advance for your kind opinion.

Best Regards,
Chamil

-- 
http://kavimalla.blgospot.com
http://kdchamil.blogspot.com


Re: How to search in solr for words like %rek Dr%

2016-05-11 Thread Ahmet Arslan
Hi Thrinadh,

Why don't you use a plain wildcard search? There are two operators, star and 
question mark, for this purpose.

Ahmet


On Wednesday, May 11, 2016 4:31 AM, Thrinadh Kuppili  
wrote:
Thank you, Yes i am aware that surround with quotes will result in match for
space but i am trying to match word based on input which cant be controlled. 
I need to search solr for %rek Dr%  and return all result which has "rek Dr"
without qoutes.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4276027.html

Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Thrinadh Kuppili
Thank you. Yes, I am aware that surrounding with quotes will result in a match
for the space, but I am trying to match words based on input which can't be
controlled. I need to search Solr for %rek Dr% and return all results which
contain "rek Dr" without quotes.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4276027.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Walter Underwood
That is going to be a very slow search in Solr.

But if you want to match space separated words, that is very easy and fast in 
Solr. Surround the phrase in quotes: “N Derek”.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 10, 2016, at 3:53 PM, Thrinadh Kuppili  wrote:
> 
> Thanks Nick, will look into it.
> 
> My main moto is to able to search like %xxx xxx% similar to database search
> of contians with.
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4275970.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Thrinadh Kuppili
Thanks Nick, will look into it.

My main goal is to be able to search like %xxx xxx%, similar to a database
"contains" search.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4275970.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Nick D
I don't really get what "Q= {!dismax qf=address} "rek Dr*" - It is not
allowed since prefix in quotes is not allowed" means. Why can't you use
exact phrase matching? Do you have some limitation on quoting? Since you are
specifically looking for an exact phrase, I don't see why you wouldn't want
exact matching.


Anyways

You can look into using another type of tokenizer; my guess is you are
probably using the standard tokenizer or possibly the whitespace tokenizer.
You may want to try a different one and see what results you get. Also, you
probably won't need to use the wildcards if you set up your gram sizes the
way you want.

The shingle factory can do stuff like (now my memory is a bit fuzzy on this
but I play with it in the admin page).

This is a sentence
shingle = 4
this_is_a_sentence

Combine that with your ngram factory (say minGramSize=4 and maxGramSize=50)
and you can get something like:
this
this_i
this_is

this_is_a_sentence

his_i
his_is

his_is_a_sentence

etc.


Then apply the shingle factory at query time to turn something like
"his is" -> his_is, and you will get that phrase back.

My personal favorite is just using edgengram on something like the address
below, but the concept is the same with regular old ngram:

2001 N Drive Derek Fullerton

text       raw bytes                     start  end  posLen  type  position
2          [32]                              0    1       1  word         1
20         [32 30]                           0    2       1  word         1
200        [32 30 30]                        0    3       1  word         1
2001       [32 30 30 31]                     0    4       1  word         1
n          [6e]                              5    6       1  word         2
d          [64]                              7    8       1  word         3
dr         [64 72]                           7    9       1  word         3
dri        [64 72 69]                        7   10       1  word         3
driv       [64 72 69 76]                     7   11       1  word         3
drive      [64 72 69 76 65]                  7   12       1  word         3
d          [64]                             13   14       1  word         4
de         [64 65]                          13   15       1  word         4
der        [64 65 72]                       13   16       1  word         4
dere       [64 65 72 65]                    13   17       1  word         4
derek      [64 65 72 65 6b]                 13   18       1  word         4
f          [66]                             19   20       1  word         5
fu         [66 75]                          19   21       1  word         5
ful        [66 75 6c]                       19   22       1  word         5
full       [66 75 6c 6c]                    19   23       1  word         5
fulle      [66 75 6c 6c 65]                 19   24       1  word         5
fuller     [66 75 6c 6c 65 72]              19   25       1  word         5
fullert    [66 75 6c 6c 65 72 74]           19   26       1  word         5
fullerto   [66 75 6c 6c 65 72 74 6f]        19   27       1  word         5
fullerton  [66 75 6c 6c 65 72 74 6f 6e]     19   28       1  word         5

Works great for a quick type-ahead field type.
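
A minimal sketch of a field type along those lines (the factory class names
are real Solr classes, but the field type name and gram sizes are only
examples; tune them to your data):

  <fieldType name="text_addr_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
              outputUnigrams="true" tokenSeparator="_"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
              outputUnigrams="true" tokenSeparator="_"/>
    </analyzer>
  </fieldType>

The edge n-grams are produced only at index time, so a query like "derek dr"
lowercases and shingles to derek_dr, which matches one of the grams indexed
for the shingle derek_drive.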

Oh, and by the way, your ngram size is too small for _rek_ to be split out
from _derek_.


Setting up a few different field types and playing with the analysis screen
in the admin page can give you a good idea of what both index-time and
query-time results will be, and with your tiny data set it is the best way I
can think of to see instant results with your new field types.

Nick

On Tue, May 10, 2016 at 10:01 AM, Thrinadh Kuppili 
wrote:

> I have tried with  maxGramSize="12"/> and search using the Extended Dismax
>
> Q= {!dismax qf=address} rek Dr* - It did not work as expected since i am
> getting all the records which has rek, Dr .
>
> Q= {!dismax qf=address} "rek Dr*" - It is not allowed since perfix in
> Quotes
> is not allowed.
>
> Q= {!complexphrase inOrder=true}address:"rek dr*" - It did not work since
> it
> is searching for words starts with rek
>
> I am not aware of shingle factory as of now will try to use and findout how
> i can use.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4275859.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Thrinadh Kuppili
I have tried with  and search using the Extended Dismax 

Q= {!dismax qf=address} rek Dr* - It did not work as expected since I am
getting all the records which have rek, Dr.

Q= {!dismax qf=address} "rek Dr*" - It is not allowed since a prefix in
quotes is not allowed.

Q= {!complexphrase inOrder=true}address:"rek dr*" - It did not work since it
is searching for words starting with rek.

I am not aware of the shingle factory; as of now I will try it and find out
how I can use it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4275859.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search in solr for words like %rek Dr%

2016-05-10 Thread Nick D
You can use a combination of ngram or edgengram fields and possibly the
shingle factory if you want to combine words. Also, you might want to have it
as exact text with no query slop if the two words, even the partial text,
need to be right next to each other. Edge is great for left-to-right; ngram
is great just to split up by a size. There are a number of tokenizers you
can try out.

Nick
On May 10, 2016 9:22 AM, "Thrinadh Kuppili"  wrote:

> I am trying to search a field named Address which has a space in it.
> Example :
> Address has the below values in it.
> 1. 2000 North Derek Dr Fullerton
> 2. 2011 N Derek Drive Fullerton
> 3. 2108 N Derek Drive Fullerton
> 4. 2100 N Derek Drive Fullerton
> 5. 2001 N Drive Derek Fullerton
>
> Search Query:- Derek Drive or rek Dr
> Expectation is it should return all  2,3,4 and it should not return 1 & 5 .
>
> Finally i am trying to find a word which can search similar to database
> search of %N Derek%
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


How to search in solr for words like %rek Dr%

2016-05-10 Thread Thrinadh Kuppili
I am trying to search a field named Address which has a space in it.
Example :
Address has the below values in it.
1. 2000 North Derek Dr Fullerton
2. 2011 N Derek Drive Fullerton 
3. 2108 N Derek Drive Fullerton
4. 2100 N Derek Drive Fullerton
5. 2001 N Drive Derek Fullerton

Search Query: Derek Drive or rek Dr
The expectation is it should return 2, 3, 4 and it should not return 1 & 5.

Finally, I am trying to find a way to search similar to a database
search of %N Derek%.

 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tuning solr for large index with rapid writes

2016-05-02 Thread Stephen Lewis
Thanks for the good suggestions on read traffic. I have been simulating
reads through parsing our elb logs and replaying them from a fleet of test
servers acting as frontends using Siege.
We are hoping to tune mostly based on exact use case, and so this seems the
most effective route. I see why for the average user experience, 0-hit
queries would provide some better data. Our plan is to start with exact
user patterns and then branch and refine our metrics from there.

For writes, I am using an index rebuild which we have written. We use this
for building anew or refreshing an existing index in case of changes to our
data model, document structure, schema, etc... It was actually turning on
this rebuild to our main cluster that started edging us toward the
performance limits on writes.

Since writing last, we discovered we were garbage-collection limited in our
current cluster. We noticed that when doing writes, especially the large
volume of writes our background rebuild was using, we generally do okay,
but eventually the GC would do a deep pass and we'd see 504 gateway
timeouts. We updated with the settings from Shawn Heisey's page, and we have
only seen timeouts a couple of times since then (these don't kill the
rebuild, they simply get retried later). I see from you here and on another
thread right now that GC seems to be an area of active discussion.

Best,
Stephen

On Mon, May 2, 2016 at 9:20 AM, Erick Erickson 
wrote:

> Bram:
>
> That works. I try to monitor the number of 0-hit
> queries when I generate a test set on the theory that
> those are _usually_ groups of random terms I've
> selected that aren't a good model. So it's often
> a sequence like "generate my list, see which
> ones give 0 results and remove them". Rinse,
> repeat.
>
> Like you said, imperfect but _loads_ better than
> trying to create them without real user queries
> as guidance...
>
> Best,
> Erick
>
> On Sat, Apr 30, 2016 at 4:19 AM, Bram Van Dam 
> wrote:
> >> If I'm reading this right, you have 420M docs on a single shard?
> >> Yep, you were reading it right.
> >
> > Is Erick mentioned, it's hard to give concrete sizing advice, but we've
> > found 120M to be the magic number. When a shard contains more than 120M
> > documents, performance goes down rapidly & GC pauses grow a lot longer.
> > Up until 250M things remain acceptable. But then performance starts to
> > drop very quickly after that.
> >
> >  - Bram
> >
>



-- 
Stephen

(206)753-9320
stephen-lewis.net


Re: Tuning solr for large index with rapid writes

2016-05-02 Thread Erick Erickson
Bram:

That works. I try to monitor the number of 0-hit
queries when I generate a test set on the theory that
those are _usually_ groups of random terms I've
selected that aren't a good model. So it's often
a sequence like "generate my list, see which
ones give 0 results and remove them". Rinse,
repeat.

Like you said, imperfect but _loads_ better than
trying to create them without real user queries
as guidance...

Best,
Erick

On Sat, Apr 30, 2016 at 4:19 AM, Bram Van Dam  wrote:
>> If I'm reading this right, you have 420M docs on a single shard?
>> Yep, you were reading it right.
>
> Is Erick mentioned, it's hard to give concrete sizing advice, but we've
> found 120M to be the magic number. When a shard contains more than 120M
> documents, performance goes down rapidly & GC pauses grow a lot longer.
> Up until 250M things remain acceptable. But then performance starts to
> drop very quickly after that.
>
>  - Bram
>


Re: Tuning solr for large index with rapid writes

2016-04-30 Thread Bram Van Dam
> If I'm reading this right, you have 420M docs on a single shard?
> Yep, you were reading it right. 

As Erick mentioned, it's hard to give concrete sizing advice, but we've
found 120M to be the magic number. When a shard contains more than 120M
documents, performance goes down rapidly & GC pauses grow a lot longer.
Up until 250M things remain acceptable. But then performance starts to
drop very quickly after that.

 - Bram



Re: Tuning solr for large index with rapid writes

2016-04-30 Thread Bram Van Dam
On 29/04/16 16:33, Erick Erickson wrote:
> You have one huge advantage when doing prototyping, you can
> mine your current logs for real user queries. It's actually
> surprisingly difficult to generate, say, 10,000 "realistic" queries. And
> IMO you need something approaching that number to insure that
> you're queries don't hit the caches etc

Our approach is to log queries for a while, boil them down to their
different use cases (full text search, simple facet, complex 2D ranged
with stats, etc) and then generate realistic parameter values for each
search field used in those queries. It's not perfect, but it gives you
large amounts of reasonably realistic queries.

Also, you can bypass the query cache by adding {!cache=false} to your query.
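
For example, on a filter query (the field name below is made up):

  q=*:*&fq={!cache=false}last_modified:[NOW-1HOUR TO NOW]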

 - Bram




Re: Tuning solr for large index with rapid writes

2016-04-29 Thread Erick Erickson
Good luck!

You have one huge advantage when doing prototyping: you can
mine your current logs for real user queries. It's actually
surprisingly difficult to generate, say, 10,000 "realistic" queries. And
IMO you need something approaching that number to ensure that
your queries don't hit the caches etc.

Anyway, sounds like you're off and running.

Best,
Erick

On Wed, Apr 27, 2016 at 10:12 AM, Stephen Lewis  wrote:
>>
> If I'm reading this right, you have 420M docs on a single shard?
> Yep, you were reading it right. Thanks for your guidance. We will do
> various prototyping following "the sizing exercise".
>
> Best,
> Stephen
>
> On Tue, Apr 26, 2016 at 6:17 PM, Erick Erickson 
> wrote:
>
>>
>> If I'm reading this right, you have 420M docs on a single shard? If that's
>> true
>> you are pushing the envelope of what I've seen work and be performant. Your
>> OOM errors are the proverbial 'smoking gun' that you're putting too many
>> docs
>> on too few nodes.
>>
>> You say that the document count is "growing quite rapidly". My expectation
>> is
>> that your problems will only get worse as you cram more docs into your
>> shard.
>>
>> You're correct that adding more memory (and consequently more JVM
>> memory?) only gets you so far before you start running into GC trouble,
>> when you hit full GC pauses they'll get longer and longer which is its own
>> problem. And you don't want to have huge JVM memory at the expense
>> of op system memory due the fact that Lucene uses MMapDirectory, see
>> Uwe's excellent blog:
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>
>> I'd _strongly_ recommend you do "the sizing exercise". There are lots of
>> details here:
>>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> You've already done some of this inadvertently, unfortunately it sounds
>> like
>> it's in production. If I were going to guess, I'd say the maximum number of
>> docs on any shard should be less than half what you currently have. So you
>> need to figure out how many docs you expect to host in this collection
>> eventually
>> and have N/200M shards. At least.
>>
>> There are various strategies when the answer is "I don't know", you
>> might add new
>> collections when you max out and then use "collection aliasing" to
>> query them etc.
>>
>> Best,
>> Erick
>>
>> On Tue, Apr 26, 2016 at 3:49 PM, Stephen Lewis  wrote:
>> > Hello,
>> >
>> > I'm looking for some guidance on the best steps for tuning a solr cloud
>> > cluster which is heavy on writes. We are currently running a solr cloud
>> > fleet composed of one core, one shard, and three nodes. The cloud is
>> hosted
>> > in AWS, and each solr node is on its own linux r3.2xl instance with 8 cpu
>> > and 61 GiB mem, and a 2TB EBS volume attached. Our index is currently 550
>> > GiB over 420M documents, and growing quite rapidly. We are currently
>> doing
>> > a bit more than 1000 document writes/deletes per second.
>> >
>> > Recently, we've hit some trouble with our production cloud. We have had
>> the
>> > process on individual instances die a few times, and we see the following
>> > error messages being logged (expanded logs at the bottom of the email):
>> >
>> > ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
>> > null:org.eclipse.jetty.io.EofException
>> >
>> > WARN  - 2016-04-26 00:55:29.571;
>> org.eclipse.jetty.servlet.ServletHandler;
>> > /solr/panopto/select
>> > java.lang.IllegalStateException: Committed
>> >
>> > WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.server.Response;
>> > Committed before 500 {trace=org.eclipse.jetty.io.EofException
>> >
>> >
>> > Another time we saw this happen, we had java OOM errors (expanded logs at
>> > the bottom):
>> >
>> > WARN  - 2016-04-25 22:58:43.943;
>> org.eclipse.jetty.servlet.ServletHandler;
>> > Error for /solr/panopto/select
>> > java.lang.OutOfMemoryError: Java heap space
>> > ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
>> > null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap
>> space
>> > ...
>> > Caused by: java.lang.OutOfMemoryError: Java heap space
>> >
>> >
>> > When the cloud goes into recovery during live indexing, it takes about
>> 4-6
>> > hours for a node to recover, but when we turn off indexing, recovery only
>> > takes about 90 minutes.
>> >
>> > Moreover, we see that deletes are extremely slow. We do batch deletes of
>> > about 300 documents based on two value filters, and this takes about one
>> > minute:
>> >
>> > Research online suggests that a larger disk cache
>> >  could be helpful,
>> > but I also see from an older page
>> >  on tuning for
>> > Lucene that turning down the swappiness on our Linux instances may be
>> > preferred to simply increasing space for the disk cache.
>> >
>> > Moreover, to scale 

Re: Tuning solr for large index with rapid writes

2016-04-27 Thread Stephen Lewis
> If I'm reading this right, you have 420M docs on a single shard?
Yep, you were reading it right. Thanks for your guidance. We will do
various prototyping following "the sizing exercise".

Best,
Stephen

On Tue, Apr 26, 2016 at 6:17 PM, Erick Erickson 
wrote:

> ​​
> If I'm reading this right, you have 420M docs on a single shard? If that's
> true
> you are pushing the envelope of what I've seen work and be performant. Your
> OOM errors are the proverbial 'smoking gun' that you're putting too many
> docs
> on too few nodes.
>
> You say that the document count is "growing quite rapidly". My expectation
> is
> that your problems will only get worse as you cram more docs into your
> shard.
>
> You're correct that adding more memory (and consequently more JVM
> memory?) only gets you so far before you start running into GC trouble,
> when you hit full GC pauses they'll get longer and longer which is its own
> problem. And you don't want to have huge JVM memory at the expense
> of op system memory due the fact that Lucene uses MMapDirectory, see
> Uwe's excellent blog:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> I'd _strongly_ recommend you do "the sizing exercise". There are lots of
> details here:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> You've already done some of this inadvertently, unfortunately it sounds
> like
> it's in production. If I were going to guess, I'd say the maximum number of
> docs on any shard should be less than half what you currently have. So you
> need to figure out how many docs you expect to host in this collection
> eventually
> and have N/200M shards. At least.
>
> There are various strategies when the answer is "I don't know", you
> might add new
> collections when you max out and then use "collection aliasing" to
> query them etc.
>
> Best,
> Erick
>
> On Tue, Apr 26, 2016 at 3:49 PM, Stephen Lewis  wrote:
> > Hello,
> >
> > I'm looking for some guidance on the best steps for tuning a solr cloud
> > cluster which is heavy on writes. We are currently running a solr cloud
> > fleet composed of one core, one shard, and three nodes. The cloud is
> hosted
> > in AWS, and each solr node is on its own linux r3.2xl instance with 8 cpu
> > and 61 GiB mem, and a 2TB EBS volume attached. Our index is currently 550
> > GiB over 420M documents, and growing quite rapidly. We are currently
> doing
> > a bit more than 1000 document writes/deletes per second.
> >
> > Recently, we've hit some trouble with our production cloud. We have had
> the
> > process on individual instances die a few times, and we see the following
> > error messages being logged (expanded logs at the bottom of the email):
> >
> > ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
> > null:org.eclipse.jetty.io.EofException
> >
> > WARN  - 2016-04-26 00:55:29.571;
> org.eclipse.jetty.servlet.ServletHandler;
> > /solr/panopto/select
> > java.lang.IllegalStateException: Committed
> >
> > WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.server.Response;
> > Committed before 500 {trace=org.eclipse.jetty.io.EofException
> >
> >
> > Another time we saw this happen, we had java OOM errors (expanded logs at
> > the bottom):
> >
> > WARN  - 2016-04-25 22:58:43.943;
> org.eclipse.jetty.servlet.ServletHandler;
> > Error for /solr/panopto/select
> > java.lang.OutOfMemoryError: Java heap space
> > ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
> > null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap
> space
> > ...
> > Caused by: java.lang.OutOfMemoryError: Java heap space
> >
> >
> > When the cloud goes into recovery during live indexing, it takes about
> 4-6
> > hours for a node to recover, but when we turn off indexing, recovery only
> > takes about 90 minutes.
> >
> > Moreover, we see that deletes are extremely slow. We do batch deletes of
> > about 300 documents based on two value filters, and this takes about one
> > minute:
> >
> > Research online suggests that a larger disk cache
> >  could be helpful,
> > but I also see from an older page
> >  on tuning for
> > Lucene that turning down the swappiness on our Linux instances may be
> > preferred to simply increasing space for the disk cache.
> >
> > Moreover, to scale in the past, we've simply rolled our cluster while
> > increasing the memory on the new machines, but I wonder if we're hitting
> > the limit for how much we should scale vertically. My impression is that
> > sharding will allow us to warm searchers faster and maintain a more
> > effective cache as we scale. Will we really be helped by sharding, or is
> it
> > only a matter of total CPU/Memory in the cluster?
> >
> > Thanks!
> >
> > Stephen
> >
> > (206)753-9320
> > stephen-lewis.net
> >
> > Logs:
> >
> > ERROR - 2016-04-26 00:56:43.873;

Re: Tuning solr for large index with rapid writes

2016-04-26 Thread Erick Erickson
If I'm reading this right, you have 420M docs on a single shard? If that's true
you are pushing the envelope of what I've seen work and be performant. Your
OOM errors are the proverbial 'smoking gun' that you're putting too many docs
on too few nodes.

You say that the document count is "growing quite rapidly". My expectation is
that your problems will only get worse as you cram more docs into your shard.

You're correct that adding more memory (and consequently more JVM
memory?) only gets you so far before you start running into GC trouble;
when you hit full GC pauses they'll get longer and longer, which is its own
problem. And you don't want to have huge JVM memory at the expense
of OS memory, due to the fact that Lucene uses MMapDirectory; see
Uwe's excellent blog:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I'd _strongly_ recommend you do "the sizing exercise". There are lots of
details here:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

You've already done some of this inadvertently, unfortunately it sounds like
it's in production. If I were going to guess, I'd say the maximum number of
docs on any shard should be less than half what you currently have. So you
need to figure out how many docs you expect to host in this collection
eventually
and have N/200M shards. At least.

There are various strategies when the answer is "I don't know", you
might add new
collections when you max out and then use "collection aliasing" to
query them etc.
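
A minimal sketch of that pattern through the Collections API (the collection
and alias names here are only examples):

  # spin up a new collection when the current one fills up
  http://localhost:8983/solr/admin/collections?action=CREATE&name=docs_2017&numShards=4&replicationFactor=2

  # point one alias at both, so clients never change their URL
  http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=docs&collections=docs_2016,docs_2017

  # clients keep querying the alias
  http://localhost:8983/solr/docs/select?q=*:*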

Best,
Erick

On Tue, Apr 26, 2016 at 3:49 PM, Stephen Lewis  wrote:
> Hello,
>
> I'm looking for some guidance on the best steps for tuning a solr cloud
> cluster which is heavy on writes. We are currently running a solr cloud
> fleet composed of one core, one shard, and three nodes. The cloud is hosted
> in AWS, and each solr node is on its own linux r3.2xl instance with 8 cpu
> and 61 GiB mem, and a 2TB EBS volume attached. Our index is currently 550
> GiB over 420M documents, and growing quite rapidly. We are currently doing
> a bit more than 1000 document writes/deletes per second.
>
> Recently, we've hit some trouble with our production cloud. We have had the
> process on individual instances die a few times, and we see the following
> error messages being logged (expanded logs at the bottom of the email):
>
> ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
> null:org.eclipse.jetty.io.EofException
>
> WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.servlet.ServletHandler;
> /solr/panopto/select
> java.lang.IllegalStateException: Committed
>
> WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.server.Response;
> Committed before 500 {trace=org.eclipse.jetty.io.EofException
>
>
> Another time we saw this happen, we had java OOM errors (expanded logs at
> the bottom):
>
> WARN  - 2016-04-25 22:58:43.943; org.eclipse.jetty.servlet.ServletHandler;
> Error for /solr/panopto/select
> java.lang.OutOfMemoryError: Java heap space
> ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
> null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
> ...
> Caused by: java.lang.OutOfMemoryError: Java heap space
>
>
> When the cloud goes into recovery during live indexing, it takes about 4-6
> hours for a node to recover, but when we turn off indexing, recovery only
> takes about 90 minutes.
>
> Moreover, we see that deletes are extremely slow. We do batch deletes of
> about 300 documents based on two value filters, and this takes about one
> minute:
>
> Research online suggests that a larger disk cache
>  could be helpful,
> but I also see from an older page
>  on tuning for
> Lucene that turning down the swappiness on our Linux instances may be
> preferred to simply increasing space for the disk cache.
>
> Moreover, to scale in the past, we've simply rolled our cluster while
> increasing the memory on the new machines, but I wonder if we're hitting
> the limit for how much we should scale vertically. My impression is that
> sharding will allow us to warm searchers faster and maintain a more
> effective cache as we scale. Will we really be helped by sharding, or is it
> only a matter of total CPU/Memory in the cluster?
>
> Thanks!
>
> Stephen
>
> (206)753-9320
> stephen-lewis.net
>
> Logs:
>
> ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
> null:org.eclipse.jetty.io.EofException
> at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
> at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
> at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
> at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
> at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
> at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
> at org.apache.solr.util.F

Tuning solr for large index with rapid writes

2016-04-26 Thread Stephen Lewis
Hello,

I'm looking for some guidance on the best steps for tuning a solr cloud
cluster which is heavy on writes. We are currently running a solr cloud
fleet composed of one core, one shard, and three nodes. The cloud is hosted
in AWS, and each solr node is on its own linux r3.2xl instance with 8 cpu
and 61 GiB mem, and a 2TB EBS volume attached. Our index is currently 550
GiB over 420M documents, and growing quite rapidly. We are currently doing
a bit more than 1000 document writes/deletes per second.

Recently, we've hit some trouble with our production cloud. We have had the
process on individual instances die a few times, and we see the following
error messages being logged (expanded logs at the bottom of the email):

ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
null:org.eclipse.jetty.io.EofException

WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.servlet.ServletHandler;
/solr/panopto/select
java.lang.IllegalStateException: Committed

WARN  - 2016-04-26 00:55:29.571; org.eclipse.jetty.server.Response;
Committed before 500 {trace=org.eclipse.jetty.io.EofException


Another time we saw this happen, we had java OOM errors (expanded logs at
the bottom):

WARN  - 2016-04-25 22:58:43.943; org.eclipse.jetty.servlet.ServletHandler;
Error for /solr/panopto/select
java.lang.OutOfMemoryError: Java heap space
ERROR - 2016-04-25 22:58:43.945; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
...
Caused by: java.lang.OutOfMemoryError: Java heap space


When the cloud goes into recovery during live indexing, it takes about 4-6
hours for a node to recover, but when we turn off indexing, recovery only
takes about 90 minutes.

Moreover, we see that deletes are extremely slow. We do batch deletes of
about 300 documents based on two value filters, and this takes about one
minute:

Research online suggests that a larger disk cache could be helpful,
but I also see from an older page on tuning for
Lucene that turning down the swappiness on our Linux instances may be
preferred to simply increasing space for the disk cache.
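
For reference, the swappiness knob itself is a one-liner (the value 1 below
is only an example, not a recommendation from that page):

  sysctl -w vm.swappiness=1                      # takes effect immediately
  echo 'vm.swappiness = 1' >> /etc/sysctl.conf   # persist across reboots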

Moreover, to scale in the past, we've simply rolled our cluster while
increasing the memory on the new machines, but I wonder if we're hitting
the limit for how much we should scale vertically. My impression is that
sharding will allow us to warm searchers faster and maintain a more
effective cache as we scale. Will we really be helped by sharding, or is it
only a matter of total CPU/Memory in the cluster?

Thanks!

Stephen

(206)753-9320
stephen-lewis.net

Logs:

ERROR - 2016-04-26 00:56:43.873; org.apache.solr.common.SolrException;
null:org.eclipse.jetty.io.EofException
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155)
at
org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:83)
at
org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:42)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:765)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:426)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConn

Re: Solr for real time analytics system

2016-02-04 Thread Susheel Kumar
Hi Rohit,

Please take a look at Streaming Expressions & the Parallel SQL Interface.
That should meet many of your analytics requirements (aggregation queries
like sum/average/group-by etc.).
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface
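
As a rough illustration of the kind of aggregation both support (the
collection and field names below are made up):

  facet(metrics,
        q="*:*",
        buckets="region",
        bucketSorts="sum(sales) desc",
        bucketSizeLimit=100,
        sum(sales), avg(latency), count(*))

and roughly the same thing through the /sql handler:

  SELECT region, sum(sales), avg(latency), count(*)
  FROM metrics
  GROUP BY region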

Thanks,
Susheel

On Thu, Feb 4, 2016 at 3:17 AM, Arkadiusz Robiński <
arkadiusz.robin...@otodom.pl> wrote:

> A few people did a real time analytics system with solr and talked about it
> at conferences. Maybe you'll find their presentations useful:
>
> https://www.youtube.com/results?search_query=solr%20real%20time%20analytics&oq=&gs_l=
> (esp. the first one: https://www.youtube.com/watch?v=PkoyCxBXAiA )
>
> On Thu, Feb 4, 2016 at 8:25 AM, Rohit Kumar  >
> wrote:
>
> > Thanks Bhimavarapu for the information.
> >
> > We are creating our own dashboard, so probably wont need kibana/banana. I
> > was more curious about Solr support for fast aggregation query over very
> > large data set. As suggested, I guess elasticsearch  has this capability.
> > Is there any published metrics or data regarding elasticsearch/solr
> > performance in this area that I can refer to?
> >
> > Thanks
> > Rohit
> >
> >
> >
> > On Thu, Feb 4, 2016 at 11:48 AM, CKReddy Bhimavarapu <
> chaitu...@gmail.com>
> > wrote:
> >
> > > Hello Rohit,
> > >
> > > You can use the Banana project which was forked from Kibana
> > > , and works with all kinds of time
> > > series (and non-time series) data stored in Apache Solr
> > > . It uses Kibana's powerful dashboard
> > > configuration capabilities, ports key panels to work with Solr, and
> > > provides significant additional capabilities, including new panels that
> > > leverage D3.js 
> > >
> > >  would need mostly aggregation queries like sum/average/groupby etc,
> but
> > > > data set is quite huge. The aggregation queries should be very fast.
> > >
> > >
> > > all your requirement can be served by this banana but I'm not sure
> about
> > > how fast solr compare to ELK 
> > >
> > > On Thu, Feb 4, 2016 at 10:51 AM, Rohit Kumar <
> > > rohitkumarbhagat...@gmail.com>
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > I am quite new to Solr. I have to build a real time analytics system
> > > which
> > > > displays metrics based on multiple filters over a huge data set
> > > (~50million
> > > > documents with ~100 fileds ).  I would need mostly aggregation
> queries
> > > like
> > > > sum/average/groupby etc, but data set is quite huge. The aggregation
> > > > queries should be very fast.
> > > >
> > > > Is Solr suitable for such use cases?
> > > >
> > > > Thanks
> > > > Rohit
> > > >
> > >
> > >
> > >
> > > --
> > > ckreddybh. 
> > >
> >
>
>
>
> --
> Arkadiusz Robiński
> Software Developer
> Otodom.pl
>


Re: Solr for real time analytics system

2016-02-04 Thread Rohit Kumar
Thanks Bhimavarapu for the information.

We are creating our own dashboard, so we probably won't need Kibana/Banana. I
was more curious about Solr support for fast aggregation queries over very
large data sets. As suggested, I guess Elasticsearch has this capability.
Are there any published metrics or data regarding Elasticsearch/Solr
performance in this area that I can refer to?

Thanks
Rohit



On Thu, Feb 4, 2016 at 11:48 AM, CKReddy Bhimavarapu 
wrote:

> Hello Rohit,
>
> You can use the Banana project which was forked from Kibana
> , and works with all kinds of time
> series (and non-time series) data stored in Apache Solr
> . It uses Kibana's powerful dashboard
> configuration capabilities, ports key panels to work with Solr, and
> provides significant additional capabilities, including new panels that
> leverage D3.js 
>
>  would need mostly aggregation queries like sum/average/groupby etc, but
> > data set is quite huge. The aggregation queries should be very fast.
>
>
> all your requirement can be served by this banana but I'm not sure about
> how fast solr compare to ELK 
>
> On Thu, Feb 4, 2016 at 10:51 AM, Rohit Kumar <
> rohitkumarbhagat...@gmail.com>
> wrote:
>
> > Hi
> >
> > I am quite new to Solr. I have to build a real time analytics system
> which
> > displays metrics based on multiple filters over a huge data set
> (~50million
> > documents with ~100 fileds ).  I would need mostly aggregation queries
> like
> > sum/average/groupby etc, but data set is quite huge. The aggregation
> > queries should be very fast.
> >
> > Is Solr suitable for such use cases?
> >
> > Thanks
> > Rohit
> >
>
>
>
> --
> ckreddybh. 
>


Re: Solr for real time analytics system

2016-02-04 Thread Arkadiusz Robiński
A few people did a real time analytics system with solr and talked about it
at conferences. Maybe you'll find their presentations useful:
https://www.youtube.com/results?search_query=solr%20real%20time%20analytics&oq=&gs_l=
(esp. the first one: https://www.youtube.com/watch?v=PkoyCxBXAiA )

On Thu, Feb 4, 2016 at 8:25 AM, Rohit Kumar 
wrote:

> Thanks Bhimavarapu for the information.
>
> We are creating our own dashboard, so probably wont need kibana/banana. I
> was more curious about Solr support for fast aggregation query over very
> large data set. As suggested, I guess elasticsearch  has this capability.
> Is there any published metrics or data regarding elasticsearch/solr
> performance in this area that I can refer to?
>
> Thanks
> Rohit
>
>
>
> On Thu, Feb 4, 2016 at 11:48 AM, CKReddy Bhimavarapu 
> wrote:
>
> > Hello Rohit,
> >
> > You can use the Banana project which was forked from Kibana
> > , and works with all kinds of time
> > series (and non-time series) data stored in Apache Solr
> > . It uses Kibana's powerful dashboard
> > configuration capabilities, ports key panels to work with Solr, and
> > provides significant additional capabilities, including new panels that
> > leverage D3.js 
> >
> >  would need mostly aggregation queries like sum/average/groupby etc, but
> > > data set is quite huge. The aggregation queries should be very fast.
> >
> >
> > all your requirement can be served by this banana but I'm not sure about
> > how fast solr compare to ELK 
> >
> > On Thu, Feb 4, 2016 at 10:51 AM, Rohit Kumar <
> > rohitkumarbhagat...@gmail.com>
> > wrote:
> >
> > > Hi
> > >
> > > I am quite new to Solr. I have to build a real time analytics system
> > which
> > > displays metrics based on multiple filters over a huge data set
> > (~50million
> > > documents with ~100 fileds ).  I would need mostly aggregation queries
> > like
> > > sum/average/groupby etc, but data set is quite huge. The aggregation
> > > queries should be very fast.
> > >
> > > Is Solr suitable for such use cases?
> > >
> > > Thanks
> > > Rohit
> > >
> >
> >
> >
> > --
> > ckreddybh. 
> >
>



-- 
Arkadiusz Robiński
Software Developer
Otodom.pl


Re: Solr for real time analytics system

2016-02-03 Thread CKReddy Bhimavarapu
Hello Rohit,

You can use the Banana project, which was forked from Kibana
and works with all kinds of time
series (and non-time series) data stored in Apache Solr.
It uses Kibana's powerful dashboard
configuration capabilities, ports key panels to work with Solr, and
provides significant additional capabilities, including new panels that
leverage D3.js.

 would need mostly aggregation queries like sum/average/groupby etc, but
> data set is quite huge. The aggregation queries should be very fast.


All of your requirements can be served by Banana, but I'm not sure
how fast Solr is compared to ELK.

On Thu, Feb 4, 2016 at 10:51 AM, Rohit Kumar 
wrote:

> Hi
>
> I am quite new to Solr. I have to build a real time analytics system which
> displays metrics based on multiple filters over a huge data set (~50million
> documents with ~100 fileds ).  I would need mostly aggregation queries like
> sum/average/groupby etc, but data set is quite huge. The aggregation
> queries should be very fast.
>
> Is Solr suitable for such use cases?
>
> Thanks
> Rohit
>



-- 
ckreddybh. 


Solr for real time analytics system

2016-02-03 Thread Rohit Kumar
Hi

I am quite new to Solr. I have to build a real time analytics system which
displays metrics based on multiple filters over a huge data set (~50 million
documents with ~100 fields). I would need mostly aggregation queries like
sum/average/groupby etc, but data set is quite huge. The aggregation
queries should be very fast.
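
For reference, this kind of sum/average/group-by aggregation maps fairly
directly onto Solr's JSON Facet API (Solr 5.x and later). A minimal SolrJ
sketch follows; the collection name "metrics" and the fields "amount" and
"region" are invented for illustration, and recent SolrJ is shown (older
releases construct HttpSolrClient directly instead of via the Builder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class AggregationSketch {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr =
                    new HttpSolrClient.Builder("http://localhost:8983/solr/metrics").build()) {
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0); // only aggregations are wanted, not documents
                // sum/avg over all matching docs, plus a group-by on "region"
                q.add("json.facet",
                      "{ total_amount : 'sum(amount)',"
                    + "  avg_amount   : 'avg(amount)',"
                    + "  by_region    : { type:terms, field:region,"
                    + "                   facet:{ sum_amount:'sum(amount)' } } }");
                QueryResponse rsp = solr.query(q);
                System.out.println(rsp.getResponse().get("facets"));
            }
        }
    }

Whether this is fast enough at ~50 million documents depends heavily on
hardware, docValues on the facet/stat fields, and filter cache hit rates, so
benchmarking on a representative sample is still the only real answer.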

Is Solr suitable for such use cases?

Thanks
Rohit


Re: Solr for Pictures

2015-10-29 Thread Rallavagu
I was playing with exiftool (written in perl) and a custom java class 
built using the metadata-extractor project 
(https://github.com/drewnoakes/metadata-extractor) and wondering if 
there is anything built into Solr or are there any best practices 
(general practices) to index pictures.
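
For reference, a rough sketch of what such a custom class can look like:
pull the EXIF tags with metadata-extractor and push them into Solr with
SolrJ. The collection name and field names below are invented for
illustration (assuming a *_txt style dynamic field), recent SolrJ is shown,
and error handling is omitted:

    import java.io.File;

    import com.drew.imaging.ImageMetadataReader;
    import com.drew.metadata.Directory;
    import com.drew.metadata.Metadata;
    import com.drew.metadata.Tag;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ExifIndexer {
        public static void main(String[] args) throws Exception {
            File image = new File(args[0]);
            Metadata metadata = ImageMetadataReader.readMetadata(image);

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", image.getName());
            doc.addField("path_s", image.getAbsolutePath());
            for (Directory dir : metadata.getDirectories()) {
                for (Tag tag : dir.getTags()) {
                    // e.g. "exif_Exposure_Time_txt" -> "1/60 sec"
                    String field = "exif_" + tag.getTagName().replace(' ', '_') + "_txt";
                    doc.addField(field, tag.getDescription());
                }
            }

            try (HttpSolrClient solr =
                    new HttpSolrClient.Builder("http://localhost:8983/solr/pictures").build()) {
                solr.add(doc);
                solr.commit();
            }
        }
    }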


On 10/29/15 1:56 PM, Daniel Valdivia wrote:

Some extra googling yield this Wiki from a integration between Tika and a 
EXIFTool

https://wiki.apache.org/tika/EXIFToolParser 



On Oct 29, 2015, at 1:48 PM, Daniel Valdivia  wrote:

I think you can look into Tika for this https://tika.apache.org/ 


There’s handlers to integrate Tika and Solr, some context:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
 





On Oct 29, 2015, at 1:47 PM, Rallavagu <rallav...@gmail.com> wrote:

In general, is there a built-in data handler to index pictures (essentially, 
EXIF and other data embedded in an image)? If not, what is the best practice to 
do so? Thanks.







Re: Solr for Pictures

2015-10-29 Thread Daniel Valdivia
Some extra googling yielded this wiki page on an integration between Tika and 
ExifTool

https://wiki.apache.org/tika/EXIFToolParser 


> On Oct 29, 2015, at 1:48 PM, Daniel Valdivia  wrote:
> 
> I think you can look into Tika for this https://tika.apache.org/ 
> 
> 
> There’s handlers to integrate Tika and Solr, some context:
> 
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
>  
> 
> 
> 
> 
>> On Oct 29, 2015, at 1:47 PM, Rallavagu wrote:
>> 
>> In general, is there a built-in data handler to index pictures (essentially, 
>> EXIF and other data embedded in an image)? If not, what is the best practice 
>> to do so? Thanks.
> 



RE: Solr for Pictures

2015-10-29 Thread Markus Jelsma


Hi - Solr does integrate with Apache Tika, which happily accepts images and 
other media formats. I am not sure if EXIF is exposed though but you might want 
to try. Otherwise patch it up or use Tika in your own process that indexes data 
to Solr.

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
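
A small SolrJ sketch of pushing an image through that extract handler (the
core name, file name and literal.id here are illustrative, an attr_* dynamic
field is assumed for unmapped metadata, and whether the EXIF tags actually
come back as fields depends on the Tika parsers shipped with your Solr
version, so test it):

    import java.io.File;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class ExtractImageMetadata {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr =
                    new HttpSolrClient.Builder("http://localhost:8983/solr/pictures").build()) {
                ContentStreamUpdateRequest req =
                    new ContentStreamUpdateRequest("/update/extract");
                req.addFile(new File("photo.jpg"), "image/jpeg");
                req.setParam("literal.id", "photo.jpg"); // supply your own unique key
                req.setParam("uprefix", "attr_");        // unknown Tika fields -> attr_*
                req.setParam("commit", "true");
                solr.request(req);
            }
        }
    }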
 
-Original message-
> From:Rallavagu 
> Sent: Thursday 29th October 2015 21:47
> To: solr-user@lucene.apache.org
> Subject: Solr for Pictures
> 
> In general, is there a built-in data handler to index pictures 
> (essentially, EXIF and other data embedded in an image)? If not, what is 
> the best practice to do so? Thanks.
> 


Re: Solr for Pictures

2015-10-29 Thread Daniel Valdivia
I think you can look into Tika for this https://tika.apache.org/ 


There’s handlers to integrate Tika and Solr, some context:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
 




> On Oct 29, 2015, at 1:47 PM, Rallavagu  wrote:
> 
> In general, is there a built-in data handler to index pictures (essentially, 
> EXIF and other data embedded in an image)? If not, what is the best practice 
> to do so? Thanks.



Solr for Pictures

2015-10-29 Thread Rallavagu
In general, is there a built-in data handler to index pictures 
(essentially, EXIF and other data embedded in an image)? If not, what is 
the best practice to do so? Thanks.


How can I use solr for hbase

2015-07-29 Thread weibaohui

Hi Everyone,

Recently I have wanted to use Solr to query HBase, but I can't find an 
effective way. 
So, how can I use Solr with HBase? Does Solr support HBase?

Hope for any answer,

Thank you!




weibaohui


Re: How best to fork Solr for enhancement

2014-12-23 Thread Upayavira
I'm somewhat open to other suggestions, as I'm right at the beginning of
the project. I know Angular, and like it. I've looked at a couple of
others, but have found them to be more of a collection of disparate
components and not as integrated as Angular.

However, if folks want to have a discussion on competing frameworks, I'm
at least prepared to listen!!

Note - the design goal is to make it as easy for *Java* developers to
work with. Folks who are typically back-end developers, thus the
framework must isolate the developer from UI quirks as much as possible,
and have some form of design abstraction.

Upayavira

On Tue, Dec 23, 2014, at 10:09 AM, Alexandre Rafalovitch wrote:
> Semi Off Topic, but is AngularJS the best next choice, given the
> version 2 being so different from version 1?
> 
> Regards,
>Alex.
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
> 
> 
> On 23 December 2014 at 06:52, Upayavira  wrote:
> > Hi,
> >
> > I've (hopefully) made some time to do some work on the Solr Admin UI
> > (convert it to AngularJS). I plan to do it on a clone of the lucene-solr
> > project at GitHub.
> >
> > Before I dive too thoroughly into this, I wanted to see if there were
> > any best practices that would make it easier to back-port these changes
> > into SVN should I actually succeed at producing something useful. Is it
> > enough just to make a branch called SOLR-5507 and start committing my
> > changes there?
> >
> > Periodically, I'll zip up the relevant bits and attach them to the JIRA
> > ticket.
> >
> > TIA
> >
> > Upayavira


Re: How best to fork Solr for enhancement

2014-12-23 Thread Yago Riveiro
There are other options like Ember or Backbone; either way, AngularJS is well 
adopted.




Alexandre, your question is about the radical change between versions?





In some way this shows progress and support to the framework.




Another good reason is that AngularJS has a ton of components ready to use.






—
/Yago Riveiro




On Tuesday, Dec 23, 2014 at 3:10 pm, Alexandre Rafalovitch 
, wrote:
Semi Off Topic, but is AngularJS the best next choice, given the

version 2 being so different from version 1?


Regards,

   Alex.



Sign up for my Solr resources newsletter at http://www.solr-start.com/



On 23 December 2014 at 06:52, Upayavira  wrote:

> Hi,

>

> I've (hopefully) made some time to do some work on the Solr Admin UI

> (convert it to AngularJS). I plan to do it on a clone of the lucene-solr

> project at GitHub.

>

> Before I dive too thoroughly into this, I wanted to see if there were

> any best practices that would make it easier to back-port these changes

> into SVN should I actually succeed at producing something useful. Is it

> enough just to make a branch called SOLR-5507 and start committing my

> changes there?

>

> Periodically, I'll zip up the relevant bits and attach them to the JIRA

> ticket.

>

> TIA

>

> Upayavira

Re: How best to fork Solr for enhancement

2014-12-23 Thread Alexandre Rafalovitch
Semi Off Topic, but is AngularJS the best next choice, given the
version 2 being so different from version 1?

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 23 December 2014 at 06:52, Upayavira  wrote:
> Hi,
>
> I've (hopefully) made some time to do some work on the Solr Admin UI
> (convert it to AngularJS). I plan to do it on a clone of the lucene-solr
> project at GitHub.
>
> Before I dive too thoroughly into this, I wanted to see if there were
> any best practices that would make it easier to back-port these changes
> into SVN should I actually succeed at producing something useful. Is it
> enough just to make a branch called SOLR-5507 and start committing my
> changes there?
>
> Periodically, I'll zip up the relevant bits and attach them to the JIRA
> ticket.
>
> TIA
>
> Upayavira


Re: How best to fork Solr for enhancement

2014-12-23 Thread Upayavira
Perfect, thanks!

On Tue, Dec 23, 2014, at 07:10 AM, Shalin Shekhar Mangar wrote:
> You can make github play well with Apache Infra. See
> https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow
> 
> On Tue, Dec 23, 2014 at 11:52 AM, Upayavira  wrote:
> 
> > Hi,
> >
> > I've (hopefully) made some time to do some work on the Solr Admin UI
> > (convert it to AngularJS). I plan to do it on a clone of the lucene-solr
> > project at GitHub.
> >
> > Before I dive too thoroughly into this, I wanted to see if there were
> > any best practices that would make it easier to back-port these changes
> > into SVN should I actually succeed at producing something useful. Is it
> > enough just to make a branch called SOLR-5507 and start committing my
> > changes there?
> >
> > Periodically, I'll zip up the relevant bits and attach them to the JIRA
> > ticket.
> >
> > TIA
> >
> > Upayavira
> >
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.


Re: How best to fork Solr for enhancement

2014-12-23 Thread Shalin Shekhar Mangar
You can make github play well with Apache Infra. See
https://wiki.apache.org/lucene-java/BensonMarguliesGitWorkflow

On Tue, Dec 23, 2014 at 11:52 AM, Upayavira  wrote:

> Hi,
>
> I've (hopefully) made some time to do some work on the Solr Admin UI
> (convert it to AngularJS). I plan to do it on a clone of the lucene-solr
> project at GitHub.
>
> Before I dive too thoroughly into this, I wanted to see if there were
> any best practices that would make it easier to back-port these changes
> into SVN should I actually succeed at producing something useful. Is it
> enough just to make a branch called SOLR-5507 and start committing my
> changes there?
>
> Periodically, I'll zip up the relevant bits and attach them to the JIRA
> ticket.
>
> TIA
>
> Upayavira
>



-- 
Regards,
Shalin Shekhar Mangar.


How best to fork Solr for enhancement

2014-12-23 Thread Upayavira
Hi,

I've (hopefully) made some time to do some work on the Solr Admin UI
(convert it to AngularJS). I plan to do it on a clone of the lucene-solr
project at GitHub.

Before I dive too thoroughly into this, I wanted to see if there were
any best practices that would make it easier to back-port these changes
into SVN should I actually succeed at producing something useful. Is it
enough just to make a branch called SOLR-5507 and start committing my
changes there?

Periodically, I'll zip up the relevant bits and attach them to the JIRA
ticket.

TIA

Upayavira


Re: Using Solr for finding Flight Routes

2014-12-05 Thread Nazik Huq
Check Grant's SOLR Air reference app here 
http://www.ibm.com/developerworks/library/j-solr-lucene/index.html .

@Nazik_Huq


On Dec 5, 2014, at 1:19 PM, Robin Woods  wrote:

> Thanks Alex. I'll check the GraphDB solutions.
> 
> On Fri, Dec 5, 2014 at 6:20 AM, Alexandre Rafalovitch 
> wrote:
> 
>> Sounds like a standard graph-database problem. I think some GraphDBs
>> integrate with Solr (or at least Lucene) for search.
>> 
>> Regards,
>>   Alex.
>> 
>> 
>> Personal: http://www.outerthoughts.com/ and @arafalov
>> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>> 
>> 
>> On 5 December 2014 at 01:11, Robin Woods  wrote:
>>> Hello,
>>> 
>>> Anyone implemented Solr for searching the flights between two
>> destinations,
>>> sort by shortest trip and best price? is geo-spatial search a right
>> module
>>> to use?
>>> 
>>> Thanks!
>> 


Re: Using Solr for finding Flight Routes

2014-12-05 Thread Robin Woods
Thanks Alex. I'll check the GraphDB solutions.

On Fri, Dec 5, 2014 at 6:20 AM, Alexandre Rafalovitch 
wrote:

> Sounds like a standard graph-database problem. I think some GraphDBs
> integrate with Solr (or at least Lucene) for search.
>
> Regards,
>Alex.
>
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 5 December 2014 at 01:11, Robin Woods  wrote:
> > Hello,
> >
> > Anyone implemented Solr for searching the flights between two
> destinations,
> > sort by shortest trip and best price? is geo-spatial search a right
> module
> > to use?
> >
> > Thanks!
>


Re: Using Solr for finding Flight Routes

2014-12-05 Thread Alexandre Rafalovitch
Sounds like a standard graph-database problem. I think some GraphDBs
integrate with Solr (or at least Lucene) for search.

Regards,
   Alex.


Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 5 December 2014 at 01:11, Robin Woods  wrote:
> Hello,
>
> Anyone implemented Solr for searching the flights between two destinations,
> sort by shortest trip and best price? is geo-spatial search a right module
> to use?
>
> Thanks!


Using Solr for finding Flight Routes

2014-12-04 Thread Robin Woods
Hello,

Anyone implemented Solr for searching the flights between two destinations,
sort by shortest trip and best price? is geo-spatial search a right module
to use?

Thanks!


Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-13 Thread Umesh Prasad
Must mention here: this atomic update approach will only work if all your fields
are stored.  It eases the work on your part, but the stored fields will
bloat the index.





On 12 July 2014 22:06, Erick Erickson  wrote:

> bq: But does performance remain same in this situation
>
> No. Some documents will require two calls to be indexed. And you'll
> be sending one document at a time rather than batching them up.
> Of course it'll be slower. But will it still be "fast enough"? Only you can
> answer that.
>
> If it's _really_ a problem, you could consider using a custom update
> processor
> plugin that does all this on the server side. This would not require you to
> change Solr code, just write a relatively small bit of code and use the
> plugin infrastructure.
>
> Best,
> Erick
>
> On Thu, Jul 10, 2014 at 1:56 PM, Ali Nazemian 
> wrote:
> > Thank you very much. Now I understand what was the idea. It is better
> than
> > changing Solr. But does performance remain same in this situation?
> >
> >
> > On Tue, Jul 8, 2014 at 10:43 PM, Chris Hostetter <
> hossman_luc...@fucit.org>
> > wrote:
> >
> >>
> >> I think you are missunderstanding what Himanshu is suggesting to you.
> >>
> >> You don't need to make lots of big changes ot the internals of solr's
> code
> >> to get what you want -- instead you can leverage the Atomic Updates &
> >> Optimistic Concurrency features of Solr to get the existing internal
> Solr
> >> to reject any attempts to add a duplicate documentunless the client code
> >> sending the document specifies it should be an "update".
> >>
> >> This means your client code needs to be a bit more sophisticated, but
> the
> >> benefit is that you don't have to try to make complex changes to the
> >> internals of Solr that may be impossible and/or difficult to
> >> support/upgrade later.
> >>
> >> More details...
> >>
> >>
> >>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
> >>
> >> Simplest possible idea based on the basic info you have given so far...
> >>
> >> 1) send every doc using _version_=-1
> >> 2a) if doc update fails with error 409, that means a version of this doc
> >> already exists
> >> 2b) resend just the field changes (using "set" atomic
> >> operation) and specify _version_=1
> >>
> >>
> >>
> >> : Dear Himanshu,
> >> : Hi,
> >> : You misunderstood what I meant. I am not going to update some field.
> I am
> >> : going to change what Solr do on duplication of uniquekey field. I dont
> >> want
> >> : to solr overwrite Whole document I just want to overwrite some parts
> of
> >> : document. This situation does not come from user side this is what
> solr
> >> do
> >> : to documents with duplicated uniquekey.
> >> : Regards.
> >> :
> >> :
> >> : On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra <
> >> : himanshu.mehro...@snapdeal.com> wrote:
> >> :
> >> : > Please look at https://wiki.apache.org/solr/Atomic_Updates
> >> : >
> >> : > This does what you want just update relevant fields.
> >> : >
> >> : > Thanks,
> >> : > Himanshu
> >> : >
> >> : >
> >> : > On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian  >
> >> : > wrote:
> >> : >
> >> : > > Dears,
> >> : > > Hi,
> >> : > > According to my requirement I need to change the default behavior
> of
> >> Solr
> >> : > > for overwriting the whole document on unique-key duplication. I am
> >> going
> >> : > to
> >> : > > change that the overwrite just part of document (some fields) and
> >> other
> >> : > > parts of document (other fields) remain unchanged. First of all I
> >> need to
> >> : > > know such changing in Solr behavior is possible? Second, I really
> >> : > > appreciate if you can guide me through what class/classes should I
> >> : > consider
> >> : > > for changing that?
> >> : > > Best regards.
> >> : > >
> >> : > > --
> >> : > > A.Nazemian
> >> : > >
> >> : >
> >> :
> >> :
> >> :
> >> : --
> >> : A.Nazemian
> >> :
> >>
> >> -Hoss
> >> http://www.lucidworks.com/
> >>
> >
> >
> >
> > --
> > A.Nazemian
>



-- 
---
Thanks & Regards
Umesh Prasad


Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-12 Thread Erick Erickson
bq: But does performance remain same in this situation

No. Some documents will require two calls to be indexed. And you'll
be sending one document at a time rather than batching them up.
Of course it'll be slower. But will it still be "fast enough"? Only you can
answer that.

If it's _really_ a problem, you could consider using a custom update processor
plugin that does all this on the server side. This would not require you to
change Solr code, just write a relatively small bit of code and use the
plugin infrastructure.

Best,
Erick

On Thu, Jul 10, 2014 at 1:56 PM, Ali Nazemian  wrote:
> Thank you very much. Now I understand what was the idea. It is better than
> changing Solr. But does performance remain same in this situation?
>
>
> On Tue, Jul 8, 2014 at 10:43 PM, Chris Hostetter 
> wrote:
>
>>
>> I think you are missunderstanding what Himanshu is suggesting to you.
>>
>> You don't need to make lots of big changes ot the internals of solr's code
>> to get what you want -- instead you can leverage the Atomic Updates &
>> Optimistic Concurrency features of Solr to get the existing internal Solr
>> to reject any attempts to add a duplicate documentunless the client code
>> sending the document specifies it should be an "update".
>>
>> This means your client code needs to be a bit more sophisticated, but the
>> benefit is that you don't have to try to make complex changes to the
>> internals of Solr that may be impossible and/or difficult to
>> support/upgrade later.
>>
>> More details...
>>
>>
>> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
>>
>> Simplest possible idea based on the basic info you have given so far...
>>
>> 1) send every doc using _version_=-1
>> 2a) if doc update fails with error 409, that means a version of this doc
>> already exists
>> 2b) resend just the field changes (using "set" atomic
>> operation) and specify _version_=1
>>
>>
>>
>> : Dear Himanshu,
>> : Hi,
>> : You misunderstood what I meant. I am not going to update some field. I am
>> : going to change what Solr do on duplication of uniquekey field. I dont
>> want
>> : to solr overwrite Whole document I just want to overwrite some parts of
>> : document. This situation does not come from user side this is what solr
>> do
>> : to documents with duplicated uniquekey.
>> : Regards.
>> :
>> :
>> : On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra <
>> : himanshu.mehro...@snapdeal.com> wrote:
>> :
>> : > Please look at https://wiki.apache.org/solr/Atomic_Updates
>> : >
>> : > This does what you want just update relevant fields.
>> : >
>> : > Thanks,
>> : > Himanshu
>> : >
>> : >
>> : > On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian 
>> : > wrote:
>> : >
>> : > > Dears,
>> : > > Hi,
>> : > > According to my requirement I need to change the default behavior of
>> Solr
>> : > > for overwriting the whole document on unique-key duplication. I am
>> going
>> : > to
>> : > > change that the overwrite just part of document (some fields) and
>> other
>> : > > parts of document (other fields) remain unchanged. First of all I
>> need to
>> : > > know such changing in Solr behavior is possible? Second, I really
>> : > > appreciate if you can guide me through what class/classes should I
>> : > consider
>> : > > for changing that?
>> : > > Best regards.
>> : > >
>> : > > --
>> : > > A.Nazemian
>> : > >
>> : >
>> :
>> :
>> :
>> : --
>> : A.Nazemian
>> :
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>
>
>
> --
> A.Nazemian


Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-10 Thread Ali Nazemian
Thank you very much. Now I understand what the idea was. It is better than
changing Solr. But does performance remain the same in this situation?


On Tue, Jul 8, 2014 at 10:43 PM, Chris Hostetter 
wrote:

>
> I think you are missunderstanding what Himanshu is suggesting to you.
>
> You don't need to make lots of big changes ot the internals of solr's code
> to get what you want -- instead you can leverage the Atomic Updates &
> Optimistic Concurrency features of Solr to get the existing internal Solr
> to reject any attempts to add a duplicate documentunless the client code
> sending the document specifies it should be an "update".
>
> This means your client code needs to be a bit more sophisticated, but the
> benefit is that you don't have to try to make complex changes to the
> internals of Solr that may be impossible and/or difficult to
> support/upgrade later.
>
> More details...
>
>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
>
> Simplest possible idea based on the basic info you have given so far...
>
> 1) send every doc using _version_=-1
> 2a) if doc update fails with error 409, that means a version of this doc
> already exists
> 2b) resend just the field changes (using "set" atomic
> operation) and specify _version_=1
>
>
>
> : Dear Himanshu,
> : Hi,
> : You misunderstood what I meant. I am not going to update some field. I am
> : going to change what Solr do on duplication of uniquekey field. I dont
> want
> : to solr overwrite Whole document I just want to overwrite some parts of
> : document. This situation does not come from user side this is what solr
> do
> : to documents with duplicated uniquekey.
> : Regards.
> :
> :
> : On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra <
> : himanshu.mehro...@snapdeal.com> wrote:
> :
> : > Please look at https://wiki.apache.org/solr/Atomic_Updates
> : >
> : > This does what you want just update relevant fields.
> : >
> : > Thanks,
> : > Himanshu
> : >
> : >
> : > On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian 
> : > wrote:
> : >
> : > > Dears,
> : > > Hi,
> : > > According to my requirement I need to change the default behavior of
> Solr
> : > > for overwriting the whole document on unique-key duplication. I am
> going
> : > to
> : > > change that the overwrite just part of document (some fields) and
> other
> : > > parts of document (other fields) remain unchanged. First of all I
> need to
> : > > know such changing in Solr behavior is possible? Second, I really
> : > > appreciate if you can guide me through what class/classes should I
> : > consider
> : > > for changing that?
> : > > Best regards.
> : > >
> : > > --
> : > > A.Nazemian
> : > >
> : >
> :
> :
> :
> : --
> : A.Nazemian
> :
>
> -Hoss
> http://www.lucidworks.com/
>



-- 
A.Nazemian


Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Chris Hostetter

I think you are missunderstanding what Himanshu is suggesting to you.

You don't need to make lots of big changes to the internals of Solr's code 
to get what you want -- instead you can leverage the Atomic Updates & 
Optimistic Concurrency features of Solr to get the existing internal Solr 
to reject any attempts to add a duplicate document unless the client code 
sending the document specifies it should be an "update".

This means your client code needs to be a bit more sophisticated, but the 
benefit is that you don't have to try to make complex changes to the 
internals of Solr that may be impossible and/or difficult to 
support/upgrade later.

More details...

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency

Simplest possible idea based on the basic info you have given so far...

1) send every doc using _version_=-1
2a) if doc update fails with error 409, that means a version of this doc 
already exists
2b) resend just the field changes (using "set" atomic 
operation) and specify _version_=1
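
In SolrJ terms that flow looks roughly like the sketch below. The field and
collection setup are invented, the conflict is detected by checking for HTTP
409 on the remote exception, and a current SolrJ is assumed (older versions
name the exception HttpSolrServer.RemoteSolrException):

    import java.util.Collections;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class AddOrPartialUpdate {

        // Step 1: try to add the doc only if no doc with this id exists yet.
        // Step 2: on a 409 (version conflict), re-send only the fields that are
        //         allowed to change, as an atomic "set" against the existing doc.
        static void index(SolrClient solr, String id, String title, String body)
                throws Exception {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            doc.addField("title", title);
            doc.addField("body", body);
            doc.addField("_version_", -1L);                 // only add if absent
            try {
                solr.add(doc);
            } catch (HttpSolrClient.RemoteSolrException e) {
                if (e.code() != 409) throw e;               // 409 == doc already there
                SolrInputDocument partial = new SolrInputDocument();
                partial.addField("id", id);
                partial.addField("_version_", 1L);          // must already exist
                partial.addField("title", Collections.singletonMap("set", title));
                solr.add(partial);                          // "body" stays untouched
            }
        }
    }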



: Dear Himanshu,
: Hi,
: You misunderstood what I meant. I am not going to update some field. I am
: going to change what Solr do on duplication of uniquekey field. I dont want
: to solr overwrite Whole document I just want to overwrite some parts of
: document. This situation does not come from user side this is what solr do
: to documents with duplicated uniquekey.
: Regards.
: 
: 
: On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra <
: himanshu.mehro...@snapdeal.com> wrote:
: 
: > Please look at https://wiki.apache.org/solr/Atomic_Updates
: >
: > This does what you want just update relevant fields.
: >
: > Thanks,
: > Himanshu
: >
: >
: > On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian 
: > wrote:
: >
: > > Dears,
: > > Hi,
: > > According to my requirement I need to change the default behavior of Solr
: > > for overwriting the whole document on unique-key duplication. I am going
: > to
: > > change that the overwrite just part of document (some fields) and other
: > > parts of document (other fields) remain unchanged. First of all I need to
: > > know such changing in Solr behavior is possible? Second, I really
: > > appreciate if you can guide me through what class/classes should I
: > consider
: > > for changing that?
: > > Best regards.
: > >
: > > --
: > > A.Nazemian
: > >
: >
: 
: 
: 
: -- 
: A.Nazemian
: 

-Hoss
http://www.lucidworks.com/


Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Ali Nazemian
Dear Himanshu,
Hi,
You misunderstood what I meant. I am not going to update some field. I am
going to change what Solr does on duplication of the uniqueKey field. I don't want
Solr to overwrite the whole document; I just want to overwrite some parts of the
document. This situation does not come from the user side; this is what Solr does
to documents with a duplicated uniqueKey.
Regards.


On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra <
himanshu.mehro...@snapdeal.com> wrote:

> Please look at https://wiki.apache.org/solr/Atomic_Updates
>
> This does what you want just update relevant fields.
>
> Thanks,
> Himanshu
>
>
> On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian 
> wrote:
>
> > Dears,
> > Hi,
> > According to my requirement I need to change the default behavior of Solr
> > for overwriting the whole document on unique-key duplication. I am going
> to
> > change that the overwrite just part of document (some fields) and other
> > parts of document (other fields) remain unchanged. First of all I need to
> > know such changing in Solr behavior is possible? Second, I really
> > appreciate if you can guide me through what class/classes should I
> consider
> > for changing that?
> > Best regards.
> >
> > --
> > A.Nazemian
> >
>



-- 
A.Nazemian


Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Himanshu Mehrotra
Please look at https://wiki.apache.org/solr/Atomic_Updates

This does what you want just update relevant fields.

Thanks,
Himanshu


On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian  wrote:

> Dears,
> Hi,
> According to my requirement I need to change the default behavior of Solr
> for overwriting the whole document on unique-key duplication. I am going to
> change that the overwrite just part of document (some fields) and other
> parts of document (other fields) remain unchanged. First of all I need to
> know such changing in Solr behavior is possible? Second, I really
> appreciate if you can guide me through what class/classes should I consider
> for changing that?
> Best regards.
>
> --
> A.Nazemian
>


Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-08 Thread Ali Nazemian
Dears,
Hi,
According to my requirement I need to change the default behavior of Solr
of overwriting the whole document on unique-key duplication. I am going to
change it so that the overwrite affects just part of the document (some fields) while
other parts of the document (other fields) remain unchanged. First of all, I need to
know whether such a change in Solr behavior is possible. Second, I would really
appreciate it if you could guide me on what class/classes I should consider
changing.
Best regards.

-- 
A.Nazemian


Re: Using solr for image retrieval - very long response time

2014-07-04 Thread Yossi Biton
1. debugQuery shows almost all of the time spent in query.

2. I can't look at the heap right now, but I remember I allocated 4 GB for
the JVM and it's far from being fully used.
Regarding GC, I'm not sure how to check it (gc.log?).

3. The whole index fits in memory during the query.
 On Jul 4, 2014 3:31 PM, "Jack Krupansky"  wrote:

> I would expect an excessively long query (greater than dozens or low
> hundreds of terms) to run significantly slower, but 100 seconds does seem
> excessively slow even for a 1000-term query.
>
> Add the debugQuery=true parameter to your query request and checking the
> timing section to see if the time is spent in the query process or some
> other stage of processing.
>
> How is your JVM heap usage? Make sure you have enough heap but not too
> much. Are a lot of GCs occurring?
>
> Does your index fit entirely in OS system memory for file caching? If not,
> you could be incurring tons of IO.
>
> -- Jack Krupansky
>
> -Original Message- From: Yossi Biton
> Sent: Friday, July 4, 2014 7:25 AM
> To: solr-user@lucene.apache.org
> Subject: Using solr for image retrieval - very long response time
>
> Hello there,
>
> Recently I was trying to implement the bag-of-words model for image
> retrieval by using Solr. Shortly this model consists of extracting "visual
> words" from images and then use tf-idf schema for fast querying (usually
> include also re-ranking stage).
> I found solr as a suitable platform (hope i'm not wrong), as it provides
> tf-idf ranking.
>
> Currently i'm issuing the following problem :
> My images usually contains about 1,000 words, so it means the query
> consists of 1,000 terms.
> When using simple select query with 1,000 OR i get a very long response
> time (100s for index with 2M images).
>
> Is there an efficient way to build the query in this case ?
>


Re: Using solr for image retrieval - very long response time

2014-07-04 Thread Jack Krupansky
I would expect an excessively long query (greater than dozens or low 
hundreds of terms) to run significantly slower, but 100 seconds does seem 
excessively slow even for a 1000-term query.


Add the debugQuery=true parameter to your query request and check the 
timing section to see if the time is spent in the query process or some 
other stage of processing.


How is your JVM heap usage? Make sure you have enough heap but not too much. 
Are a lot of GCs occurring?


Does your index fit entirely in OS system memory for file caching? If not, 
you could be incurring tons of IO.
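
Roughly, in SolrJ that timing check looks like the sketch below (the core
name "images" and the visual-word terms are invented; recent SolrJ shown):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class QueryTimingCheck {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr =
                    new HttpSolrClient.Builder("http://localhost:8983/solr/images").build()) {
                SolrQuery q = new SolrQuery("vw123 OR vw456 OR vw789"); // visual-word terms
                q.set("debugQuery", true);
                QueryResponse rsp = solr.query(q);
                System.out.println("QTime: " + rsp.getQTime() + " ms");
                // "timing" breaks the time down per search component (query, facet, debug, ...)
                System.out.println(rsp.getDebugMap().get("timing"));
            }
        }
    }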


-- Jack Krupansky

-Original Message- 
From: Yossi Biton

Sent: Friday, July 4, 2014 7:25 AM
To: solr-user@lucene.apache.org
Subject: Using solr for image retrieval - very long response time

Hello there,

Recently I was trying to implement the bag-of-words model for image
retrieval by using Solr. Shortly this model consists of extracting "visual
words" from images and then use tf-idf schema for fast querying (usually
include also re-ranking stage).
I found solr as a suitable platform (hope i'm not wrong), as it provides
tf-idf ranking.

Currently i'm issuing the following problem :
My images usually contains about 1,000 words, so it means the query
consists of 1,000 terms.
When using simple select query with 1,000 OR i get a very long response
time (100s for index with 2M images).

Is there an efficient way to build the query in this case ? 



Using solr for image retrieval - very long response time

2014-07-04 Thread Yossi Biton
Hello there,

Recently I was trying to implement the bag-of-words model for image
retrieval by using Solr. Briefly, this model consists of extracting "visual
words" from images and then using a tf-idf scheme for fast querying (usually
also including a re-ranking stage).
I found Solr to be a suitable platform (hope I'm not wrong), as it provides
tf-idf ranking.

Currently I'm facing the following problem:
my images usually contain about 1,000 words, which means the query
consists of 1,000 terms.
When using a simple select query with 1,000 ORed terms I get a very long response
time (100 s for an index with 2M images).

Is there an efficient way to build the query in this case ?


Re: How to Query to Solr for comparing two dates in solr

2014-06-13 Thread Chris Hostetter

: I think you'd have to get creative with function queries. The trick is

You don't have to get *very* creative...

: > I want to retrieve all docs or records from solr where  updateDate >=
: > appliedDate OR appliedDate == null

Pretty sure all you need is...

fq={!frange l=0}ms(updateDate,appliedDate)

...the ms function will subtract the dates and return the number of 
milliseconds they differ by -- so if the number is greater than or equal to 0, 
updateDate is on or after appliedDate.  The only tricky part is the "there is 
no appliedDate" part of your question -- but I still think that will work, 
because I'm pretty sure in the context of ms() a doc w/o an appliedDate 
will get an effective appliedDate of "0" -- test to be sure.

https://cwiki.apache.org/confluence/display/solr/Function+Queries
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser
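
If that default for missing dates turns out not to hold on your version, one
way to make the "no appliedDate" case explicit is to OR in a clause that
matches docs without the field at all. A small SolrJ sketch of both variants
(nothing assumed here beyond the two field names from the question):

    import org.apache.solr.client.solrj.SolrQuery;

    public class DateCompareFilter {
        public static void main(String[] args) {
            SolrQuery q = new SolrQuery("*:*");

            // Variant 1: rely on ms() treating a missing appliedDate as 0.
            q.addFilterQuery("{!frange l=0}ms(updateDate,appliedDate)");

            // Variant 2 (alternative): spell out "appliedDate is missing" explicitly.
            // q.addFilterQuery("(*:* -appliedDate:[* TO *]) OR "
            //         + "_query_:\"{!frange l=0}ms(updateDate,appliedDate)\"");

            System.out.println(q);
        }
    }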


-Hoss
http://www.lucidworks.com/


Re: How to Query to Solr for comparing two dates in solr

2014-06-13 Thread Erick Erickson
I think you'd have to get creative with function queries. The trick is
to define a function that returns 0 in the case where your if test is
false, and 1 otherwise. Since the return is multiplied into the score,
the eventual score is 0 and the doc is not returned.

Best
Erick

On Fri, Jun 13, 2014 at 2:49 AM, Pbbhoge  wrote:
> Hi,
>
> I have two date fields in Solr Schema, I want to compare two different date
> fields in solr itself .
> how can i write the Query in Solr for comparing the two dates in solr itself
> .
>
> I want to retrieve all docs or records from solr where  updateDate >=
> appliedDate OR appliedDate == null
>
> you can assume relational Query like
> Retrieve all the records from reference data where  updateDate >=
> appliedDate OR appliedDate == null
>
>
> My schema is as follows
> 
>  required="true" multiValued="false"/>
>  required="true" multiValued="false"/>
>  required="true" multiValued="false"/>
>  required="true"  multiValued="false"/>
>  required="true"  multiValued="false"/>
>  required="true"  multiValued="false"/>
>  required="true"  multiValued="false"/>
>  required="true"  multiValued="false"/>
>  required="false" multiValued="false"/>
> 
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-Query-to-Solr-for-comparing-two-dates-in-solr-tp4141611.html
> Sent from the Solr - User mailing list archive at Nabble.com.


How to Query to Solr for comparing two dates in solr

2014-06-13 Thread Pbbhoge
Hi,

I have two date fields in Solr Schema, I want to compare two different date
fields in solr itself .
how can i write the Query in Solr for comparing the two dates in solr itself
.

I want to retrieve all docs or records from solr where  updateDate >=
appliedDate OR appliedDate == null

you can assume relational Query like 
Retrieve all the records from reference data where  updateDate >=
appliedDate OR appliedDate == null


My schema is as follows 



 
 









--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-Query-to-Solr-for-comparing-two-dates-in-solr-tp4141611.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to Configure Solr For Test Purposes?

2014-05-27 Thread Furkan KAMACI
Hi;

I've developed a proxy application that takes requests from clients and
sends them to Solr, then gets the responses and sends them back to the clients.
So I am testing my application, my proxy for Solr.

Thanks;
Furkan KAMACI


2014-05-27 14:52 GMT+03:00 Tomás Fernández Löbbe :

> > What do you suggest for my purpose? If a test case fails re-running it
> for
> > some times maybe a solution? What kind of configuration do you suggest
> for
> > my Solr configuration?
> >
> >
> From the snippet of test that you showed, it looks like it's testing only
> Solr functionality. So, first make sure this is a test that you really
> need. Solr has it's own tests, and it you feel it could use more (for some
> specific case or context), I'd open a Jira and try to get the test inside
> Solr.
> If my impression is wrong and your test is actually testing your code, then
> I'd suggest you to use a specific soft commit call with waitSearcher = true
> on your test instead of relying on the autocommit (and remove the
> autocommit completely from your solrconfig).
>
> Tomás
>
>
>
> Thanks;
> > Furkan KAMACI
> > 26 May 2014 21:03 tarihinde "Shawn Heisey"  yazdı:
> >
> > > On 5/26/2014 10:57 AM, Furkan KAMACI wrote:
> > > > Hi;
> > > >
> > > > I run Solr within my Test Suite. I delete documents or atomically
> > update
> > > > them and check whether if it works or not. I know that I have to
> setup
> > a
> > > > hard/soft commit timing for my test Solr. However even I have that
> > > settings:
> > > >
> > > >  
> > > >1
> > > >true
> > > >  
> > > >
> > > >
> > > >  1
> > > >
> > >
> > > I hope you know that this is BAD configuration.  Doing automatic
> commits
> > > on an interval of 1 millisecond is asking for a whole host of problems.
> > >  In some cases, this could do a commit after every single document that
> > > is indexed, which is NOT recommended at all.  The openSearcher setting
> > > of "true" on autoCommit makes it even worse.  There's no reason to do
> > > both autoSoftCommit and autoCommit with openSearcher=true.  I don't
> know
> > > which one "wins" between autoCommit and autoSoftCommit if they both
> have
> > > the same config, but I would guess the hard commit does.
> > >
> > > > and even I wait (Thread.sleep()) for a time to wait Solr *sometimes*
> my
> > > > tests are failed. I get fail error even I increase wait time.
>  Example
> > > of a
> > > > sometimes failed code piece:
> > > >
> > > > for (int i = 0; i < dummyDocumentSize; i++) {
> > > >  deleteById("id" + i);
> > > >  dummyDocumentSize--;
> > > >  queryResponse = query(solrParams);
> > > >  assertTrue(queryResponse.getResults().size() ==
> > > dummyDocumentSize);
> > > >   }
> > > >
> > > > at debug mode if I wait for Solr to reflect changes I see that I do
> not
> > > get
> > > > error. What do you think, what kind of configuration I should have
> for
> > > such
> > > > kind of purposes?
> > >
> > > Chances are that commits are going to take longer than 1 millisecond.
> > > If you're actively indexing, the system is going to be trying to stack
> > > up lots of commits at the same time.  The maxWarmingSearchers value
> will
> > > limit the number of new searchers that can be opened, but it will not
> > > stop the commits themselves.  When lots of commits are going on, each
> > > one will take *even longer* to complete, which probably explains the
> > > problem.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


Re: How to Configure Solr For Test Purposes?

2014-05-27 Thread Tomás Fernández Löbbe
> What do you suggest for my purpose? If a test case fails re-running it for
> some times maybe a solution? What kind of configuration do you suggest for
> my Solr configuration?
>
>
From the snippet of test that you showed, it looks like it's testing only
Solr functionality. So, first make sure this is a test that you really
need. Solr has its own tests, and if you feel it could use more (for some
specific case or context), I'd open a Jira and try to get the test inside
Solr.
If my impression is wrong and your test is actually testing your code, then
I'd suggest you use an explicit soft commit call with waitSearcher=true
in your test instead of relying on the autocommit (and remove the
autocommit completely from your solrconfig).
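
Something along these lines, for example (collection name invented; recent
SolrJ shown, but the three-argument commit exists in older clients too):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class ExplicitCommitInTest {
        public static void main(String[] args) throws Exception {
            try (SolrClient solr =
                    new HttpSolrClient.Builder("http://localhost:8983/solr/test_core").build()) {
                // ... add/delete the handful of test documents here ...

                // waitFlush=true, waitSearcher=true, softCommit=true:
                // when this returns, a searcher that sees the changes is open,
                // so the next query in the test is deterministic.
                solr.commit(true, true, true);
            }
        }
    }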

Tomás



Thanks;
> Furkan KAMACI
> 26 May 2014 21:03 tarihinde "Shawn Heisey"  yazdı:
>
> > On 5/26/2014 10:57 AM, Furkan KAMACI wrote:
> > > Hi;
> > >
> > > I run Solr within my Test Suite. I delete documents or atomically
> update
> > > them and check whether if it works or not. I know that I have to setup
> a
> > > hard/soft commit timing for my test Solr. However even I have that
> > settings:
> > >
> > >  
> > >1
> > >true
> > >  
> > >
> > >
> > >  1
> > >
> >
> > I hope you know that this is BAD configuration.  Doing automatic commits
> > on an interval of 1 millisecond is asking for a whole host of problems.
> >  In some cases, this could do a commit after every single document that
> > is indexed, which is NOT recommended at all.  The openSearcher setting
> > of "true" on autoCommit makes it even worse.  There's no reason to do
> > both autoSoftCommit and autoCommit with openSearcher=true.  I don't know
> > which one "wins" between autoCommit and autoSoftCommit if they both have
> > the same config, but I would guess the hard commit does.
> >
> > > and even I wait (Thread.sleep()) for a time to wait Solr *sometimes* my
> > > tests are failed. I get fail error even I increase wait time.  Example
> > of a
> > > sometimes failed code piece:
> > >
> > > for (int i = 0; i < dummyDocumentSize; i++) {
> > >  deleteById("id" + i);
> > >  dummyDocumentSize--;
> > >  queryResponse = query(solrParams);
> > >  assertTrue(queryResponse.getResults().size() ==
> > dummyDocumentSize);
> > >   }
> > >
> > > at debug mode if I wait for Solr to reflect changes I see that I do not
> > get
> > > error. What do you think, what kind of configuration I should have for
> > such
> > > kind of purposes?
> >
> > Chances are that commits are going to take longer than 1 millisecond.
> > If you're actively indexing, the system is going to be trying to stack
> > up lots of commits at the same time.  The maxWarmingSearchers value will
> > limit the number of new searchers that can be opened, but it will not
> > stop the commits themselves.  When lots of commits are going on, each
> > one will take *even longer* to complete, which probably explains the
> > problem.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: How to Configure Solr For Test Purposes?

2014-05-27 Thread Furkan KAMACI
Hi All;

I have defined just that:

<autoSoftCommit>
  <maxTime>1</maxTime>
</autoSoftCommit>

then turned off hard commit. I've run my tests time and time again and did
not get any error. Whoever wants to write a unit test that interacts with Solr
like in my situation can use it.

Thanks;
Furkan KAMACI


2014-05-26 23:37 GMT+03:00 Furkan KAMACI :

> Hi Shawn;
>
> I know that it is a bad practise but I just commit up to 5 documents and
> there will not be more than 5 documents at any time at any test method. It
> is just for test purpose to see that my API works. I want to have
> automatic tests.
>
> What do you suggest for my purpose? If a test case fails re-running it for
> some times maybe a solution? What kind of configuration do you suggest for
> my Solr configuration?
>
> Thanks;
> Furkan KAMACI
> 26 May 2014 21:03 tarihinde "Shawn Heisey"  yazdı:
>
> On 5/26/2014 10:57 AM, Furkan KAMACI wrote:
>> > Hi;
>> >
>> > I run Solr within my Test Suite. I delete documents or atomically update
>> > them and check whether if it works or not. I know that I have to setup a
>> > hard/soft commit timing for my test Solr. However even I have that
>> settings:
>> >
>> >  
>> >1
>> >true
>> >  
>> >
>> >
>> >  1
>> >
>>
>> I hope you know that this is BAD configuration.  Doing automatic commits
>> on an interval of 1 millisecond is asking for a whole host of problems.
>>  In some cases, this could do a commit after every single document that
>> is indexed, which is NOT recommended at all.  The openSearcher setting
>> of "true" on autoCommit makes it even worse.  There's no reason to do
>> both autoSoftCommit and autoCommit with openSearcher=true.  I don't know
>> which one "wins" between autoCommit and autoSoftCommit if they both have
>> the same config, but I would guess the hard commit does.
>>
>> > and even I wait (Thread.sleep()) for a time to wait Solr *sometimes* my
>> > tests are failed. I get fail error even I increase wait time.  Example
>> of a
>> > sometimes failed code piece:
>> >
>> > for (int i = 0; i < dummyDocumentSize; i++) {
>> >  deleteById("id" + i);
>> >  dummyDocumentSize--;
>> >  queryResponse = query(solrParams);
>> >  assertTrue(queryResponse.getResults().size() ==
>> dummyDocumentSize);
>> >   }
>> >
>> > at debug mode if I wait for Solr to reflect changes I see that I do not
>> get
>> > error. What do you think, what kind of configuration I should have for
>> such
>> > kind of purposes?
>>
>> Chances are that commits are going to take longer than 1 millisecond.
>> If you're actively indexing, the system is going to be trying to stack
>> up lots of commits at the same time.  The maxWarmingSearchers value will
>> limit the number of new searchers that can be opened, but it will not
>> stop the commits themselves.  When lots of commits are going on, each
>> one will take *even longer* to complete, which probably explains the
>> problem.
>>
>> Thanks,
>> Shawn
>>
>>


Re: How to Configure Solr For Test Purposes?

2014-05-26 Thread Furkan KAMACI
Hi Shawn;

I know that it is a bad practise but I just commit up to 5 documents and
there will not be more than 5 documents at any time at any test method. It
is just for test purpose to see that my API works. I want to have
automatic tests.

What do you suggest for my purpose? If a test case fails re-running it for
some times maybe a solution? What kind of configuration do you suggest for
my Solr configuration?

Thanks;
Furkan KAMACI
26 May 2014 21:03 tarihinde "Shawn Heisey"  yazdı:

> On 5/26/2014 10:57 AM, Furkan KAMACI wrote:
> > Hi;
> >
> > I run Solr within my Test Suite. I delete documents or atomically update
> > them and check whether if it works or not. I know that I have to setup a
> > hard/soft commit timing for my test Solr. However even I have that
> settings:
> >
> >  
> >1
> >true
> >  
> >
> >
> >  1
> >
>
> I hope you know that this is BAD configuration.  Doing automatic commits
> on an interval of 1 millisecond is asking for a whole host of problems.
>  In some cases, this could do a commit after every single document that
> is indexed, which is NOT recommended at all.  The openSearcher setting
> of "true" on autoCommit makes it even worse.  There's no reason to do
> both autoSoftCommit and autoCommit with openSearcher=true.  I don't know
> which one "wins" between autoCommit and autoSoftCommit if they both have
> the same config, but I would guess the hard commit does.
>
> > and even I wait (Thread.sleep()) for a time to wait Solr *sometimes* my
> > tests are failed. I get fail error even I increase wait time.  Example
> of a
> > sometimes failed code piece:
> >
> > for (int i = 0; i < dummyDocumentSize; i++) {
> >  deleteById("id" + i);
> >  dummyDocumentSize--;
> >  queryResponse = query(solrParams);
> >  assertTrue(queryResponse.getResults().size() ==
> dummyDocumentSize);
> >   }
> >
> > at debug mode if I wait for Solr to reflect changes I see that I do not
> get
> > error. What do you think, what kind of configuration I should have for
> such
> > kind of purposes?
>
> Chances are that commits are going to take longer than 1 millisecond.
> If you're actively indexing, the system is going to be trying to stack
> up lots of commits at the same time.  The maxWarmingSearchers value will
> limit the number of new searchers that can be opened, but it will not
> stop the commits themselves.  When lots of commits are going on, each
> one will take *even longer* to complete, which probably explains the
> problem.
>
> Thanks,
> Shawn
>
>


Re: How to Configure Solr For Test Purposes?

2014-05-26 Thread Shawn Heisey
On 5/26/2014 10:57 AM, Furkan KAMACI wrote:
> Hi;
> 
> I run Solr within my Test Suite. I delete documents or atomically update
> them and check whether if it works or not. I know that I have to setup a
> hard/soft commit timing for my test Solr. However even I have that settings:
> 
>  
>1
>true
>  
> 
>
>  1
>

I hope you know that this is BAD configuration.  Doing automatic commits
on an interval of 1 millisecond is asking for a whole host of problems.
 In some cases, this could do a commit after every single document that
is indexed, which is NOT recommended at all.  The openSearcher setting
of "true" on autoCommit makes it even worse.  There's no reason to do
both autoSoftCommit and autoCommit with openSearcher=true.  I don't know
which one "wins" between autoCommit and autoSoftCommit if they both have
the same config, but I would guess the hard commit does.

> and even I wait (Thread.sleep()) for a time to wait Solr *sometimes* my
> tests are failed. I get fail error even I increase wait time.  Example of a
> sometimes failed code piece:
> 
> for (int i = 0; i < dummyDocumentSize; i++) {
>  deleteById("id" + i);
>  dummyDocumentSize--;
>  queryResponse = query(solrParams);
>  assertTrue(queryResponse.getResults().size() == dummyDocumentSize);
>   }
> 
> at debug mode if I wait for Solr to reflect changes I see that I do not get
> error. What do you think, what kind of configuration I should have for such
> kind of purposes?

Chances are that commits are going to take longer than 1 millisecond.
If you're actively indexing, the system is going to be trying to stack
up lots of commits at the same time.  The maxWarmingSearchers value will
limit the number of new searchers that can be opened, but it will not
stop the commits themselves.  When lots of commits are going on, each
one will take *even longer* to complete, which probably explains the
problem.

Thanks,
Shawn



How to Configure Solr For Test Purposes?

2014-05-26 Thread Furkan KAMACI
Hi;

I run Solr within my Test Suite. I delete documents or atomically update
them and check whether if it works or not. I know that I have to setup a
hard/soft commit timing for my test Solr. However even I have that settings:

 
   1
   true
 

   
 1
   

and even I wait (Thread.sleep()) for a time to wait Solr *sometimes* my
tests are failed. I get fail error even I increase wait time.  Example of a
sometimes failed code piece:

for (int i = 0; i < dummyDocumentSize; i++) {
 deleteById("id" + i);
 dummyDocumentSize--;
 queryResponse = query(solrParams);
 assertTrue(queryResponse.getResults().size() == dummyDocumentSize);
  }

at debug mode if I wait for Solr to reflect changes I see that I do not get
error. What do you think, what kind of configuration I should have for such
kind of purposes?

Thanks;
Furkan KAMACI


Re: saving user actions on item in solr for later retrieval

2014-04-30 Thread Mikhail Khludnev
is there somebody from LucidWorks who can refer to Click Score Relevance
Framework in LucidWorks Search?


On Mon, Apr 28, 2014 at 10:48 PM, nolim  wrote:

> Hi,
> We are using solr in production system for around ~500 users and we have
> around ~1 queries per day.
> Our user's search topics most of the time static and repeat themselves over
> time.
>
> We have in our system an option to specify "specific search subject" (we
> also call it "specific information need") and most of our users are using
> this option.
> We keep in our system logs each query and document retrieved from each
> "information need"
> and the user can also give feedback if the document is relevant for his
> "information need".
>
> We also have special query expansion technique and diversity algorithm
> based
> on MMR.
>
> We want to use this information from logs as data set for training our
> ranking system
> and preforming "Learning To Rank" for each "information need" or cluster of
> "information needs".
> We also want to give the user the option filter by "relevant" and "read"
> based on his actions\friends actions in the same topic.
> When he runs a query again or similar one he can skip already read
> documents. That's an important requirement to our users.
>
> We think about 2 possibilities to implement it:
> 1. Updating each item in solr and creating 2 fields named: "read",
> "relevant".
> Each field is multivalue field with the corresponding label of the
> "information need".
> When the user reads a document an update is sent to solr and the field
> "read" gets a label with
> the "information need" the user is working on...
> Will cause update when each item is read by user (still nothing compare to
> new items coming in each day).
> We are saving information that "belongs" to the application in solr which
> may be wrong architecture.
>
> 2. Save the information In DB, and then preforming filtering on the
> retrieved results.
> this option is much more complicated (We now have "fields" that aren't solr
> and the user uses them for search). We won't get facets, autocomplete and
> other nice stuff that a regular field in solr can have.
> cost in preformances, we can''t retrieve easy: "give me top 10 documents
> that answer the query and unread from the information need" and more
> complicated code to hold.
>
> 3. Do you have more ideas?
>
> Which of those options is the better?
>
> Thanks in advance!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/saving-user-actions-on-item-in-solr-for-later-retrieval-tp4133558.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 


Re: saving user actions on item in solr for later retrieval

2014-04-30 Thread nolim
Thank you, we will check it out.
 On Apr 29, 2014 9:28 PM, "iorixxx [via Lucene]" <
ml-node+s472066n4133796...@n3.nabble.com> wrote:

> Hi Nolim,
>
> Actually EFF is searchable. See my comments at the end of the page
>
>
> https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes
>
> Ahmet
>
>
>
> On Tuesday, April 29, 2014 9:07 PM, nolim <[hidden 
> email]<http://user/SendEmail.jtp?type=node&node=4133796&i=0>>
> wrote:
> Thank you, it was interesting and I have learned some new things in solr
> :)
>
> But the "External File Field" isn't a good option because the field is
> unsearchable which it very important to us.
> We think about the first option (updating document in solr) but preforming
> commit only each 10 minutes - If we would like to retrieve the value
> realtime we can use RealTimeGet.
>
> Maybe you have other suggestion?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/saving-user-actions-on-item-in-solr-for-later-retrieval-tp4133558p4133793.html
>
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/saving-user-actions-on-item-in-solr-for-later-retrieval-tp4133558p4133955.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: saving user actions on item in solr for later retrieval

2014-04-29 Thread Ahmet Arslan
Hi Nolim,

Actually EFF is searchable. See my comments at the end of the page 

https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes

Ahmet
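
For illustration, an external file field plus a function-range filter along those lines could look roughly like the sketch below (the field, file and label names are made up for the example, they are not from the thread):

  schema.xml (one external file per "information need"):
    <fieldType name="readFlag" class="solr.ExternalFileField"
               keyField="id" defVal="0" stored="false" indexed="false"
               valType="float"/>
    <field name="read_need7" type="readFlag"/>

  in the data directory, a file named external_read_need7 (one line per read document):
    doc123=1

  at query time, keep only unread documents:
    fq={!frange u=0}field(read_need7)
  or only already-read ones:
    fq={!frange l=1}field(read_need7)

Note that an external file field holds a single numeric value per document, so this would need one external file per "information need".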



On Tuesday, April 29, 2014 9:07 PM, nolim  wrote:
Thank you, it was interesting and I have learned some new things in solr :)

But the "External File Field" isn't a good option because the field is
unsearchable which it very important to us.
We think about the first option (updating document in solr) but preforming
commit only each 10 minutes - If we would like to retrieve the value
realtime we can use RealTimeGet.

Maybe you have other suggestion?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/saving-user-actions-on-item-in-solr-for-later-retrieval-tp4133558p4133793.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: saving user actions on item in solr for later retrieval

2014-04-29 Thread nolim
Thank you, it was interesting and I have learned some new things in solr :)

But the "External File Field" isn't a good option because the field is
unsearchable which it very important to us.
We think about the first option (updating document in solr) but preforming
commit only each 10 minutes - If we would like to retrieve the value
realtime we can use RealTimeGet.

Maybe you have other suggestion?
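
For what it's worth, a rough sketch of that approach with atomic updates, commitWithin and real-time get (core name, document id and field names are made up; it assumes all fields are stored and the update log is enabled):

  curl 'http://localhost:8983/solr/collection1/update?commitWithin=600000' \
    -H 'Content-Type: application/json' \
    -d '[{"id": "doc123",
          "read":     {"add": "information_need_7"},
          "relevant": {"add": "information_need_7"}}]'

  # the latest value is visible before the next commit via real-time get
  curl 'http://localhost:8983/solr/collection1/get?id=doc123'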




--
View this message in context: 
http://lucene.472066.n3.nabble.com/saving-user-actions-on-item-in-solr-for-later-retrieval-tp4133558p4133793.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: saving user actions on item in solr for later retrieval

2014-04-28 Thread Alexandre Rafalovitch
1. might be too expensive in terms of commits and performance of
refreshing the index every time.

3. Have you looked at external fields, custom components, etc.? For example:
http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr
http://lucene.472066.n3.nabble.com/Combining-Solr-score-with-customized-user-ratings-for-a-document-td4040200.html
(past discussion that seems relevant)

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Apr 29, 2014 at 1:48 AM, nolim  wrote:
> Hi,
> We are using solr in production system for around ~500 users and we have
> around ~1 queries per day.
> Our user's search topics most of the time static and repeat themselves over
> time.
>
> We have in our system an option to specify "specific search subject" (we
> also call it "specific information need") and most of our users are using
> this option.
> We keep in our system logs each query and document retrieved from each
> "information need"
> and the user can also give feedback if the document is relevant for his
> "information need".
>
> We also have special query expansion technique and diversity algorithm based
> on MMR.
>
> We want to use this information from logs as data set for training our
> ranking system
> and preforming "Learning To Rank" for each "information need" or cluster of
> "information needs".
> We also want to give the user the option filter by "relevant" and "read"
> based on his actions\friends actions in the same topic.
> When he runs a query again or similar one he can skip already read
> documents. That's an important requirement to our users.
>
> We think about 2 possibilities to implement it:
> 1. Updating each item in solr and creating 2 fields named: "read",
> "relevant".
> Each field is multivalue field with the corresponding label of the
> "information need".
> When the user reads a document an update is sent to solr and the field
> "read" gets a label with
> the "information need" the user is working on...
> Will cause update when each item is read by user (still nothing compare to
> new items coming in each day).
> We are saving information that "belongs" to the application in solr which
> may be wrong architecture.
>
> 2. Save the information In DB, and then preforming filtering on the
> retrieved results.
> this option is much more complicated (We now have "fields" that aren't solr
> and the user uses them for search). We won't get facets, autocomplete and
> other nice stuff that a regular field in solr can have.
> cost in preformances, we can''t retrieve easy: "give me top 10 documents
> that answer the query and unread from the information need" and more
> complicated code to hold.
>
> 3. Do you have more ideas?
>
> Which of those options is the better?
>
> Thanks in advance!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/saving-user-actions-on-item-in-solr-for-later-retrieval-tp4133558.html
> Sent from the Solr - User mailing list archive at Nabble.com.


saving user actions on item in solr for later retrieval

2014-04-28 Thread nolim
Hi,
We are using Solr in a production system for around ~500 users and we have
around ~1 queries per day.
Our users' search topics are mostly static and repeat themselves over
time.

We have in our system an option to specify a "specific search subject" (we
also call it a "specific information need") and most of our users are using
this option.
Our system logs keep each query and each document retrieved for each
"information need",
and the user can also give feedback on whether a document is relevant for his
"information need".

We also have a special query expansion technique and a diversity algorithm based
on MMR.

We want to use this information from the logs as a data set for training our
ranking system
and performing "Learning To Rank" for each "information need" or cluster of
"information needs".
We also want to give the user the option to filter by "relevant" and "read"
based on his actions/his friends' actions on the same topic.
When he runs the same query again, or a similar one, he can skip already-read
documents. That's an important requirement for our users.

We are thinking about 2 possibilities to implement it:
1. Updating each item in Solr and creating 2 fields named "read" and
"relevant".
Each field is a multivalued field with the corresponding label of the
"information need".
When the user reads a document an update is sent to Solr and the field
"read" gets a label with
the "information need" the user is working on...
This will cause an update whenever an item is read by a user (still nothing
compared to the new items coming in each day).
We would be saving information that "belongs" to the application in Solr, which
may be the wrong architecture.

2. Saving the information in a DB, and then performing the filtering on the
retrieved results.
This option is much more complicated (we now have "fields" that aren't in Solr
but that the user uses for search). We won't get facets, autocomplete and
other nice stuff that a regular Solr field can have.
It costs in performance, we can't easily retrieve "give me the top 10 documents
that answer the query and are unread for this information need", and there is
more complicated code to maintain.

3. Do you have more ideas?

Which of those options is better?

Thanks in advance!
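
As a sketch of how option 1 would then look at query time (field and label
names here are hypothetical), skipping already-read documents for one
"information need" is a single negated filter query, and "relevant only" is
the positive form:

  .../select?q=some+user+query&fq=-read:information_need_7
  .../select?q=some+user+query&fq=relevant:information_need_7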



--
View this message in context: 
http://lucene.472066.n3.nabble.com/saving-user-actions-on-item-in-solr-for-later-retrieval-tp4133558.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Singles in solr for bigrams,trigrams in parsed_query

2014-03-24 Thread Dmitry Kan
Hi,

Query rewrite happens down the chain, after query parsing. For example, a
wildcard query triggers an index-based query rewrite where terms matching
the wildcard are added into the original query.

In your case, it looks like the query rewrite will generate the ngrams and add
them into the original query.

So just make sure that the analysis page shows what you expect on the indexing
and querying sides.

Out of curiosity: what are you trying to achieve with the query-side
shingles? Aren't index-time shingles alone enough?
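
For comparison, an index-time-only shingle setup would look roughly like the
sketch below (illustrative only - these are standard Solr classes and
parameters, not the poster's exact schema, whose XML tags were stripped by the
archive):

  <fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" minShingleSize="2"
              maxShingleSize="4" outputUnigrams="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>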


On Thu, Mar 20, 2014 at 8:06 PM, Jyotirmoy Sundi  wrote:

> Hi Folks,
>I am using singles to index bigrams/trigrams. The same is also used
> for query in the schema.xml file. But when I run the query in debug mode
> for a collections, I dont see the bigrams in the parsed_query . Any idea
> what I might be missing.
> solr/colection/select?q=best%20price&debugQuery=on
>
> text:best text:price
> I was hoping to see
> text:best text:price text:best price
>
> My schema files looks like this:
>  
>  omitNorms="true"/>
>  omitNorms="true" positionIncrementGap="0"/>
>
>  positionIncrementGap="100">
>   
> 
>  maxShingleSize="4" outputUnigrams="true" />
> 
> 
> 
>  generateWordParts="0" generateNumberParts="0" catenateWords="1"
> catenateNumbers="1" catenateAll="1" preserveOriginal="1"
> splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
> 
>  ignoreCase="true" expand="true"/>
> 
> 
>
>   
> 
> 
> 
> 
> 
>  ignoreCase="true" expand="true"/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
> splitOnNumerics="0" stemEnglishPossessive="1"/>
>  maxShingleSize="4" outputUnigrams="true" />
>  ignoreCase="true"/>
> 
>  
> 
>  
>
>
>
> --
> Best Regards,
> Jyotirmoy Sundi
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: Shingles in solr for bigrams,trigrams in parsed_query

2014-03-23 Thread Jyotirmoy Sundi
Hi Jack,
  Thanks for your response, but if I try q="best quality and best
price", the parsedquery comes out as follows, which is a lot of unwanted
combinations. I am just looking for uni-, bi- and tri-grams.

"debug":{
"rawquerystring":"\"best quality and best price\"",
"querystring":"\"best quality and best price\"",
"*parsedquery*":"MultiPhraseQuery(text:\"(best best_best quality
best quality best quality _ best quality _ best) (quality quality _
quality _ best quality _ best price) (_ best _ best price _ best
price_best) (best best_best price best price) price\")",
"*parsedquery_toString*":"text:\"(best best_best quality best
quality best quality _ best quality _ best) (quality quality _ quality
_ best quality _ best price) (_ best _ best price _ best price_best)
(best best_best price best price) price\"",
"explain":{},
"QParser":"LuceneQParser",

..




On Sun, Mar 23, 2014 at 11:31 AM, Jack Krupansky wrote:

>
> The query parser only presents the query terms one at a time to the
> analyzer, so your analyzer doesn't see both terms on one analysis call.
>
> If you enclose your query terms in quotes as a single phrase, you should
> see multiple terms being processed.
>
> q="best price"
>
> -- Jack Krupansky
>
> -Original Message- From: Jyotirmoy Sundi
> Sent: Thursday, March 20, 2014 2:06 PM
> To: solr-user@lucene.apache.org
> Subject: Singles in solr for bigrams,trigrams in parsed_query
>
> Hi Folks,
>   I am using singles to index bigrams/trigrams. The same is also used
> for query in the schema.xml file. But when I run the query in debug mode
> for a collections, I dont see the bigrams in the parsed_query . Any idea
> what I might be missing.
> solr/colection/select?q=best%20price&debugQuery=on
>
> text:best text:price
> I was hoping to see
> text:best text:price text:best
> price
>
> My schema files looks like this:
> 
> omitNorms="true"/>
> omitNorms="true" positionIncrementGap="0"/>
>
> positionIncrementGap="100">
>  
>
> maxShingleSize="4" outputUnigrams="true" />
>
>
>
> generateWordParts="0" generateNumberParts="0" catenateWords="1"
> catenateNumbers="1" catenateAll="1" preserveOriginal="1"
> splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
>
> ignoreCase="true" expand="true"/>
>
> 
>
>  
>
>
>
>
>
> ignoreCase="true" expand="true"/>
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
> splitOnNumerics="0" stemEnglishPossessive="1"/>
> maxShingleSize="4" outputUnigrams="true" />
> ignoreCase="true"/>
>
> 
>
> 
>
>
>
> --
> Best Regards,
> Jyotirmoy Sundi
>



-- 
Best Regards,
Jyotirmoy Sundi


Re: Shingles in solr for bigrams,trigrams in parsed_query

2014-03-23 Thread Jack Krupansky


The query parser only presents the query terms one at a time to the 
analyzer, so your analyzer doesn't see both terms on one analysis call.


If you enclose your query terms in quotes as a single phrase, you should see 
multiple terms being processed.


q="best price"

-- Jack Krupansky
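
In the URL form used earlier in the thread, the quoted phrase would be sent
roughly like this (quotes URL-encoded, collection name assumed):

  solr/collection/select?q=%22best%20price%22&debugQuery=on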

-Original Message- 
From: Jyotirmoy Sundi

Sent: Thursday, March 20, 2014 2:06 PM
To: solr-user@lucene.apache.org
Subject: Singles in solr for bigrams,trigrams in parsed_query

Hi Folks,
  I am using shingles to index bigrams/trigrams. The same is also used
for the query in the schema.xml file. But when I run the query in debug mode
for a collection, I don't see the bigrams in the parsed_query. Any idea
what I might be missing?
solr/collection/select?q=best%20price&debugQuery=on

text:best text:price
I was hoping to see
text:best text:price text:best price

My schema file looks like this:

   
   

   
 
   
   
   
   
   
   
   
   
   


 
   
   
   
   
   
   
   
   
   
   

   




--
Best Regards,
Jyotirmoy Sundi 



Singles in solr for bigrams,trigrams in parsed_query

2014-03-20 Thread Jyotirmoy Sundi
Hi Folks,
   I am using shingles to index bigrams/trigrams. The same is also used
for the query in the schema.xml file. But when I run the query in debug mode
for a collection, I don't see the bigrams in the parsed_query. Any idea
what I might be missing?
solr/collection/select?q=best%20price&debugQuery=on

text:best text:price
I was hoping to see
text:best text:price text:best price

My schema file looks like this:
 




  











  










 

 



-- 
Best Regards,
Jyotirmoy Sundi


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Alexandre Rafalovitch
>>>>> 2013-10-29T14:17:22Z
>>>>> 
>>>>> 
>>>>> 
>>>>> glPrototypeCore
>>>>> /etc/solr/
>>>>> /var/lib/solr/data/
>>>>> 2014-01-23T09:29:30.019Z
>>>>> 245267
>>>>> 
>>>>> 4401029
>>>>> 4401029
>>>>> 1370010628806
>>>>> 12
>>>>> true
>>>>> false
>>>>> 
>>>>> org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
>>>>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@5ad83862
>>>>> 
>>>>> 2013-10-29T14:17:22Z
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>
>>>>>
>>>>> From my understanding I now have an unnamed core and a core named
>>>>> "glPrototypeCore" which uses the same configuration.
>>>>>
>>>>> I copied the files data-config.xml, schema.xml into a new directory
>>>>> "/etc/solr/glinstance" and tried to create another core but this always
>>>>> throws me error 400. I even tried by adding the schema- and
>>>>> config-parameters with full path, but this did not lead to any
>>>>> difference. Also I don't understand what the "dataDir"-parameter is for.
>>>>> I could not find any data-directories in /etc/solr/ but the creation of
>>>>> the first core worked anyway.
>>>>>
>>>>> Can someone help? Is there any better place for my new
>>>>> instance-directory and what files do I really need?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Am 23.01.2014 10:10, schrieb Stavros Delisavas:
>>>>>> I didn't know that the "core"-term is associated with this use case. I
>>>>>> expected it to be some technical feature that allows to run more
>>>>>> solr-instances for better multithread-cpu-usage. For example to activate
>>>>>> two solr-cores when two cpu-cores are available on the server.
>>>>>>
>>>>>> So in general, I have the feeling that the term "core" is somewhat
>>>>>> confusing for solr-beginners like me.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am 23.01.2014 09:54, schrieb Alexandre Rafalovitch:
>>>>>>> Which is why it is curious that you did not find it. Looking back at
>>>>>>> it now, do you have a suggestion of what could be improved to insure
>>>>>>> people find this easier in the future?
>>>>>>>
>>>>>>> Regards,
>>>>>>>Alex.
>>>>>>> Personal website: http://www.outerthoughts.com/
>>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>>> - Time is the quality of nature that keeps events from happening all
>>>>>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>>>>>> book)
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas 
>>>>>>>  wrote:
>>>>>>>> Thanks for the fast responses. Looks like exactly what I was looking 
>>>>>>>> for!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 23.01.2014 09:46, schrieb Furkan KAMACI:
>>>>>>>>> Hi;
>>>>>>>>>
>>>>>>>>> Firstly you should read here and learn the terminology of Solr:
>>>>>>>>> http://wiki.apache.org/solr/SolrTerminology
>>>>>>>>>
>>>>>>>>> Thanks;
>>>>>>>>> Furkan KAMACI
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014/1/23 Alexandre Rafalovitch 
>>>>>>>>>
>>>>>>>>>> If you are not worried about them stepping on each other's toes
>>>>>>>>>> (performance, disk space, etc), just create multiple collections.
>>>>>>>>>> There are examples of that in standard distribution (e.g. badly named
>>>>>>>>>> example/multicore).
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>   Alex.
>>>>>>>>>> Personal website: http://www.outerthoughts.com/
>>>>>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>>>>>> - Time is the quality of nature that keeps events from happening all
>>>>>>>>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via 
>>>>>>>>>> GTD
>>>>>>>>>> book)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
>>>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>>> Dear Solr-Experts,
>>>>>>>>>>>
>>>>>>>>>>> I am using Solr for my current web-application on my server 
>>>>>>>>>>> successfully.
>>>>>>>>>>> Now I would like to use it in my second web-application that is 
>>>>>>>>>>> hosted
>>>>>>>>>>> on the same server. Is it possible in any way to create two 
>>>>>>>>>>> independent
>>>>>>>>>>> instances/databases in Solr? I know that I could create another set 
>>>>>>>>>>> of
>>>>>>>>>>> fields with alternated field names, but I would prefer to be 
>>>>>>>>>>> independent
>>>>>>>>>>> on my field naming for all my projects.
>>>>>>>>>>>
>>>>>>>>>>> Also I would like to be able to have one state of my development 
>>>>>>>>>>> version
>>>>>>>>>>> and one state of my production version on my server so that I can do
>>>>>>>>>>> tests on my development-state without interference on my
>>>>>>>>>> production-version.
>>>>>>>>>>> What is the best-practice to achieve this or how can this be done in
>>>>>>>>>>> general?
>>>>>>>>>>>
>>>>>>>>>>> I have searched google but could not get any usefull results 
>>>>>>>>>>> because I
>>>>>>>>>>> don't even know what terms to search for with solr.
>>>>>>>>>>> A minimal-example would be most helpfull.
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>>
>>>>>>>>>>> Stavros
>
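
For the core-creation question quoted above, a CoreAdmin CREATE call along
these lines is the usual route (a sketch only - host, core name and paths are
hypothetical, and the instanceDir needs a conf/ directory containing
solrconfig.xml and schema.xml):

  http://localhost:8983/solr/admin/cores?action=CREATE&name=glInstanceCore&instanceDir=/etc/solr/glinstance/&config=solrconfig.xml&schema=schema.xml&dataDir=/var/lib/solr/glinstance/data/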


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Stavros Delisavas
>>>> From my understanding I now have an unnamed core and a core named
>>>> "glPrototypeCore" which uses the same configuration.
>>>>
>>>> I copied the files data-config.xml, schema.xml into a new directory
>>>> "/etc/solr/glinstance" and tried to create another core but this always
>>>> throws me error 400. I even tried by adding the schema- and
>>>> config-parameters with full path, but this did not lead to any
>>>> difference. Also I don't understand what the "dataDir"-parameter is for.
>>>> I could not find any data-directories in /etc/solr/ but the creation of
>>>> the first core worked anyway.
>>>>
>>>> Can someone help? Is there any better place for my new
>>>> instance-directory and what files do I really need?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Am 23.01.2014 10:10, schrieb Stavros Delisavas:
>>>>> I didn't know that the "core"-term is associated with this use case. I
>>>>> expected it to be some technical feature that allows to run more
>>>>> solr-instances for better multithread-cpu-usage. For example to activate
>>>>> two solr-cores when two cpu-cores are available on the server.
>>>>>
>>>>> So in general, I have the feeling that the term "core" is somewhat
>>>>> confusing for solr-beginners like me.
>>>>>
>>>>>
>>>>>
>>>>> Am 23.01.2014 09:54, schrieb Alexandre Rafalovitch:
>>>>>> Which is why it is curious that you did not find it. Looking back at
>>>>>> it now, do you have a suggestion of what could be improved to insure
>>>>>> people find this easier in the future?
>>>>>>
>>>>>> Regards,
>>>>>>    Alex.
>>>>>> Personal website: http://www.outerthoughts.com/
>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>> - Time is the quality of nature that keeps events from happening all
>>>>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>>>>> book)
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas 
>>>>>>  wrote:
>>>>>>> Thanks for the fast responses. Looks like exactly what I was looking 
>>>>>>> for!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Am 23.01.2014 09:46, schrieb Furkan KAMACI:
>>>>>>>> Hi;
>>>>>>>>
>>>>>>>> Firstly you should read here and learn the terminology of Solr:
>>>>>>>> http://wiki.apache.org/solr/SolrTerminology
>>>>>>>>
>>>>>>>> Thanks;
>>>>>>>> Furkan KAMACI
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014/1/23 Alexandre Rafalovitch 
>>>>>>>>
>>>>>>>>> If you are not worried about them stepping on each other's toes
>>>>>>>>> (performance, disk space, etc), just create multiple collections.
>>>>>>>>> There are examples of that in standard distribution (e.g. badly named
>>>>>>>>> example/multicore).
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>   Alex.
>>>>>>>>> Personal website: http://www.outerthoughts.com/
>>>>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>>>>> - Time is the quality of nature that keeps events from happening all
>>>>>>>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>>>>>>>> book)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
>>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>> Dear Solr-Experts,
>>>>>>>>>>
>>>>>>>>>> I am using Solr for my current web-application on my server 
>>>>>>>>>> successfully.
>>>>>>>>>> Now I would like to use it in my second web-application that is 
>>>>>>>>>> hosted
>>>>>>>>>> on the same server. Is it possible in any way to create two 
>>>>>>>>>> independent
>>>>>>>>>> instances/databases in Solr? I know that I could create another set 
>>>>>>>>>> of
>>>>>>>>>> fields with alternated field names, but I would prefer to be 
>>>>>>>>>> independent
>>>>>>>>>> on my field naming for all my projects.
>>>>>>>>>>
>>>>>>>>>> Also I would like to be able to have one state of my development 
>>>>>>>>>> version
>>>>>>>>>> and one state of my production version on my server so that I can do
>>>>>>>>>> tests on my development-state without interference on my
>>>>>>>>> production-version.
>>>>>>>>>> What is the best-practice to achieve this or how can this be done in
>>>>>>>>>> general?
>>>>>>>>>>
>>>>>>>>>> I have searched google but could not get any usefull results because 
>>>>>>>>>> I
>>>>>>>>>> don't even know what terms to search for with solr.
>>>>>>>>>> A minimal-example would be most helpfull.
>>>>>>>>>>
>>>>>>>>>> Thanks a lot!
>>>>>>>>>>
>>>>>>>>>> Stavros



Re: How to use Solr for two different projects on one server

2014-01-23 Thread Alexandre Rafalovitch
>>> instance-directory and what files do I really need?
>>>
>>>
>>>
>>>
>>>
>>> Am 23.01.2014 10:10, schrieb Stavros Delisavas:
>>>> I didn't know that the "core"-term is associated with this use case. I
>>>> expected it to be some technical feature that allows to run more
>>>> solr-instances for better multithread-cpu-usage. For example to activate
>>>> two solr-cores when two cpu-cores are available on the server.
>>>>
>>>> So in general, I have the feeling that the term "core" is somewhat
>>>> confusing for solr-beginners like me.
>>>>
>>>>
>>>>
>>>> Am 23.01.2014 09:54, schrieb Alexandre Rafalovitch:
>>>>> Which is why it is curious that you did not find it. Looking back at
>>>>> it now, do you have a suggestion of what could be improved to insure
>>>>> people find this easier in the future?
>>>>>
>>>>> Regards,
>>>>>Alex.
>>>>> Personal website: http://www.outerthoughts.com/
>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>> - Time is the quality of nature that keeps events from happening all
>>>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>>>> book)
>>>>>
>>>>>
>>>>> On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas  
>>>>> wrote:
>>>>>> Thanks for the fast responses. Looks like exactly what I was looking for!
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am 23.01.2014 09:46, schrieb Furkan KAMACI:
>>>>>>> Hi;
>>>>>>>
>>>>>>> Firstly you should read here and learn the terminology of Solr:
>>>>>>> http://wiki.apache.org/solr/SolrTerminology
>>>>>>>
>>>>>>> Thanks;
>>>>>>> Furkan KAMACI
>>>>>>>
>>>>>>>
>>>>>>> 2014/1/23 Alexandre Rafalovitch 
>>>>>>>
>>>>>>>> If you are not worried about them stepping on each other's toes
>>>>>>>> (performance, disk space, etc), just create multiple collections.
>>>>>>>> There are examples of that in standard distribution (e.g. badly named
>>>>>>>> example/multicore).
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>   Alex.
>>>>>>>> Personal website: http://www.outerthoughts.com/
>>>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>>>> - Time is the quality of nature that keeps events from happening all
>>>>>>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>>>>>>> book)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
>>>>>>>> 
>>>>>>>> wrote:
>>>>>>>>> Dear Solr-Experts,
>>>>>>>>>
>>>>>>>>> I am using Solr for my current web-application on my server 
>>>>>>>>> successfully.
>>>>>>>>> Now I would like to use it in my second web-application that is hosted
>>>>>>>>> on the same server. Is it possible in any way to create two 
>>>>>>>>> independent
>>>>>>>>> instances/databases in Solr? I know that I could create another set of
>>>>>>>>> fields with alternated field names, but I would prefer to be 
>>>>>>>>> independent
>>>>>>>>> on my field naming for all my projects.
>>>>>>>>>
>>>>>>>>> Also I would like to be able to have one state of my development 
>>>>>>>>> version
>>>>>>>>> and one state of my production version on my server so that I can do
>>>>>>>>> tests on my development-state without interference on my
>>>>>>>> production-version.
>>>>>>>>> What is the best-practice to achieve this or how can this be done in
>>>>>>>>> general?
>>>>>>>>>
>>>>>>>>> I have searched google but could not get any usefull results because I
>>>>>>>>> don't even know what terms to search for with solr.
>>>>>>>>> A minimal-example would be most helpfull.
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>>
>>>>>>>>> Stavros
>


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Stavros Delisavas
>>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>>> book)
>>>>
>>>>
>>>> On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas  
>>>> wrote:
>>>>> Thanks for the fast responses. Looks like exactly what I was looking for!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Am 23.01.2014 09:46, schrieb Furkan KAMACI:
>>>>>> Hi;
>>>>>>
>>>>>> Firstly you should read here and learn the terminology of Solr:
>>>>>> http://wiki.apache.org/solr/SolrTerminology
>>>>>>
>>>>>> Thanks;
>>>>>> Furkan KAMACI
>>>>>>
>>>>>>
>>>>>> 2014/1/23 Alexandre Rafalovitch 
>>>>>>
>>>>>>> If you are not worried about them stepping on each other's toes
>>>>>>> (performance, disk space, etc), just create multiple collections.
>>>>>>> There are examples of that in standard distribution (e.g. badly named
>>>>>>> example/multicore).
>>>>>>>
>>>>>>> Regards,
>>>>>>>   Alex.
>>>>>>> Personal website: http://www.outerthoughts.com/
>>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>>> - Time is the quality of nature that keeps events from happening all
>>>>>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>>>>>> book)
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
>>>>>>> 
>>>>>>> wrote:
>>>>>>>> Dear Solr-Experts,
>>>>>>>>
>>>>>>>> I am using Solr for my current web-application on my server 
>>>>>>>> successfully.
>>>>>>>> Now I would like to use it in my second web-application that is hosted
>>>>>>>> on the same server. Is it possible in any way to create two independent
>>>>>>>> instances/databases in Solr? I know that I could create another set of
>>>>>>>> fields with alternated field names, but I would prefer to be 
>>>>>>>> independent
>>>>>>>> on my field naming for all my projects.
>>>>>>>>
>>>>>>>> Also I would like to be able to have one state of my development 
>>>>>>>> version
>>>>>>>> and one state of my production version on my server so that I can do
>>>>>>>> tests on my development-state without interference on my
>>>>>>> production-version.
>>>>>>>> What is the best-practice to achieve this or how can this be done in
>>>>>>>> general?
>>>>>>>>
>>>>>>>> I have searched google but could not get any usefull results because I
>>>>>>>> don't even know what terms to search for with solr.
>>>>>>>> A minimal-example would be most helpfull.
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>>
>>>>>>>> Stavros


