ways to check if document is in a huge search result set

2017-09-10 Thread Derek Poh

Hi

I have a collection of productdocument.
Each productdocument has supplier information in it.

I need to check whether a supplier's products are returned in a search 
result containing over 100,000 products, and on which page (assuming 
pagination is 20 products per page).
It is time-consuming and "labour-intensive" to go through each page to 
look for the supplier's products.


Would like to know if you have any better and easier ways to do this?

Derek
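One approach (a sketch, not from the thread): instead of paging through the UI, fetch only a lightweight field such as the supplier id for every hit (e.g. `fl=id,supplier_id`, paging with `cursorMark`), then compute the page numbers offline. The field name `supplier_id` and the helper below are illustrative assumptions.

```python
# Sketch: find which result pages contain a given supplier's products.
# Assumes you have already pulled the ordered list of supplier ids for
# the full result set (e.g. via fl=supplier_id and cursorMark paging).

PAGE_SIZE = 20

def pages_for_supplier(supplier_ids, target_supplier, page_size=PAGE_SIZE):
    """Given the ordered supplier ids from the full result set, return
    the sorted 1-based page numbers on which the target appears."""
    pages = set()
    for position, supplier in enumerate(supplier_ids):
        if supplier == target_supplier:
            pages.add(position // page_size + 1)
    return sorted(pages)

# Example: supplier "S2" appears at result positions 1 and 40,
# i.e. on pages 1 and 3 with 20 results per page.
result_order = ["S1", "S2", "S3"] + ["S9"] * 37 + ["S2"]
print(pages_for_supplier(result_order, "S2"))
```

Fetching only ids is far cheaper than rendering pages, so even a 100,000-hit result set is a single scripted pass.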


Re: multi language search engine in solr

2017-09-10 Thread Mugeesh Husain
Thank you rick for your response.

The documents each contain a single language rather than a mix of
Arabic, English, Bengali, Hindi, Malay.

I could not find any tokenizer for Malay; can you suggest one if you
know of any, please?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr memory leak

2017-09-10 Thread Hendrik Haddorp
I didn't mean to say that the fix is not in 7.0. I just stated that I 
do not see it listed in the release notes 
(https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12335718).


Thanks for explaining the release process.

regards,
Hendrik

On 10.09.2017 17:32, Erick Erickson wrote:

There will be no 6.7. Once the X+1 version is released, all past fixes
are applied as minor releases to the last released version of the
previous major release. So now that 7.0 has been cut, there might be a
6.6.2 (6.6.1 was just released) but no 6.7. Current unreleased JIRAs
are parked on 6.x (as opposed to branch_6_6) for convenience. If
anyone steps up to release 6.6.2, they can include this.

Why do you say this isn't in 7.0? The "Fix Versions" clearly states
so, as does CHANGES.txt for 7.0. The new file is in the 7.0 branch.


If you need it in 6.x you have a couple of options:

1> agitate for a 6.6.2 with this included
2> apply the patch yourself and compile it locally

Best,
Erick

On Sun, Sep 10, 2017 at 6:04 AM, Hendrik Haddorp
 wrote:

Hi,

looks like SOLR-10506 didn't make it into 6.6.1. However, I also do not see
it listed in the current release notes for 6.7 or 7.0:
 https://issues.apache.org/jira/projects/SOLR/versions/12340568
 https://issues.apache.org/jira/projects/SOLR/versions/12335718

Is there any rough idea yet when 6.7 or 7.0 will be released?

thanks,
Hendrik


On 28.08.2017 18:31, Erick Erickson wrote:

Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including
it.

On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood 
wrote:

That would be a really good reason for a 6.7.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Aug 28, 2017, at 8:48 AM, Markus Jelsma 
wrote:

It is, unfortunately, not committed for 6.7.





-Original message-

From:Markus Jelsma 
Sent: Monday 28th August 2017 17:46
To: solr-user@lucene.apache.org
Subject: RE: Solr memory leak

See https://issues.apache.org/jira/browse/SOLR-10506
Fixed for 7.0

Markus



-Original message-

From:Hendrik Haddorp 
Sent: Monday 28th August 2017 17:42
To: solr-user@lucene.apache.org
Subject: Solr memory leak

Hi,

we noticed that triggering collection reloads on many collections has
a
good chance to result in an OOM-Error. To investigate that further I
did
a simple test:
  - Start solr with a 2GB heap and 1GB Metaspace
  - create a trivial collection with a few documents (I used only 2
fields and 100 documents)
  - trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr
6.6
worked better but also failed after 1100 loops.

When looking at the memory usage on the Solr dashboard it looks like
the
space left after GC cycles gets less and less. Then Solr gets very
slow,
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
my last run this was actually for the Metaspace. So it looks like more
and more heap and metaspace is being used by just constantly reloading
a
trivial collection.

regards,
Hendrik





Re: multi language search engine in solr

2017-09-10 Thread Rick Leir
Mugeesh,
One important question: will the typical document have a mix of English and 
Bangla and Hindi? If so, you would probably have them all in one collection.

Another thing to think about is the tokenizer. Are all words separated by white 
space? If not, then you might need to think about which tokenizer to use. 

As for character sets, I think you should make sure all the inputs are in 
UTF-8, then there should be no problem.

There will be other things to consider but this is a start.
Cheers -- Rick
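A per-language field setup along the lines Rick describes might look like the sketch below. The field and type names are illustrative assumptions, not a recommendation from this thread; `ICUTokenizerFactory` (from the analysis-extras contrib) is one option for scripts where plain whitespace tokenization is not reliable.

```xml
<!-- One analyzed field per language; ICUTokenizer handles scripts where
     whitespace alone is not a dependable word boundary. -->
<fieldType name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_general_icu" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="title_en" type="text_en" indexed="true" stored="true"/>
<field name="title_hi" type="text_general_icu" indexed="true" stored="true"/>
```

Keeping one field per language lets each one use language-appropriate analysis while all documents stay in a single collection.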


On September 10, 2017 9:32:11 AM EDT, Mugeesh Husain  wrote:
>Hi 
>
>I am working on a multi-language search engine for English, Bangla,
>Hindi and Indonesian. Can anybody guide me on how to configure the
>Solr schema?
>
>1.) Should I configure all the languages in a single
>shard/collection?
>2.) Should I configure a separate shard/collection for each
>language?
>
>I am looking for suggestions about the architecture of this
>project. Please suggest and guide me in defining the schema and
>architecture.
>
>
>
>--
>Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr memory leak

2017-09-10 Thread Erick Erickson
There will be no 6.7. Once the X+1 version is released, all past fixes
are applied as minor releases to the last released version of the
previous major release. So now that 7.0 has been cut, there might be a
6.6.2 (6.6.1 was just released) but no 6.7. Current unreleased JIRAs
are parked on 6.x (as opposed to branch_6_6) for convenience. If
anyone steps up to release 6.6.2, they can include this.

Why do you say this isn't in 7.0? The "Fix Versions" clearly states
so, as does CHANGES.txt for 7.0. The new file is in the 7.0 branch.


If you need it in 6.x you have a couple of options:

1> agitate for a 6.6.2 with this included
2> apply the patch yourself and compile it locally

Best,
Erick

On Sun, Sep 10, 2017 at 6:04 AM, Hendrik Haddorp
 wrote:
> Hi,
>
> looks like SOLR-10506 didn't make it into 6.6.1. However, I also do not see
> it listed in the current release notes for 6.7 or 7.0:
> https://issues.apache.org/jira/projects/SOLR/versions/12340568
> https://issues.apache.org/jira/projects/SOLR/versions/12335718
>
> Is there any rough idea yet when 6.7 or 7.0 will be released?
>
> thanks,
> Hendrik
>
>
> On 28.08.2017 18:31, Erick Erickson wrote:
>>
>> Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including
>> it.
>>
>> On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood 
>> wrote:
>>>
>>> That would be a really good reason for a 6.7.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
 On Aug 28, 2017, at 8:48 AM, Markus Jelsma 
 wrote:

 It is, unfortunately, not committed for 6.7.





 -Original message-
>
> From:Markus Jelsma 
> Sent: Monday 28th August 2017 17:46
> To: solr-user@lucene.apache.org
> Subject: RE: Solr memory leak
>
> See https://issues.apache.org/jira/browse/SOLR-10506
> Fixed for 7.0
>
> Markus
>
>
>
> -Original message-
>>
>> From:Hendrik Haddorp 
>> Sent: Monday 28th August 2017 17:42
>> To: solr-user@lucene.apache.org
>> Subject: Solr memory leak
>>
>> Hi,
>>
>> we noticed that triggering collection reloads on many collections has
>> a
>> good chance to result in an OOM-Error. To investigate that further I
>> did
>> a simple test:
>>  - Start solr with a 2GB heap and 1GB Metaspace
>>  - create a trivial collection with a few documents (I used only 2
>> fields and 100 documents)
>>  - trigger a collection reload in a loop (I used SolrJ for this)
>>
>> Using Solr 6.3 the test started to fail after about 250 loops. Solr
>> 6.6
>> worked better but also failed after 1100 loops.
>>
>> When looking at the memory usage on the Solr dashboard it looks like
>> the
>> space left after GC cycles gets less and less. Then Solr gets very
>> slow,
>> as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
>> my last run this was actually for the Metaspace. So it looks like more
>> and more heap and metaspace is being used by just constantly reloading
>> a
>> trivial collection.
>>
>> regards,
>> Hendrik
>>
>


multi language search engine in solr

2017-09-10 Thread Mugeesh Husain
Hi 

I am working on a multi-language search engine for English, Bangla,
Hindi and Indonesian. Can anybody guide me on how to configure the
Solr schema?

1.) Should I configure all the languages in a single shard/collection?
2.) Should I configure a separate shard/collection for each language?

I am looking for suggestions about the architecture of this project.
Please suggest and guide me in defining the schema and architecture.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr memory leak

2017-09-10 Thread Hendrik Haddorp

Hi,

looks like SOLR-10506 didn't make it into 6.6.1. However, I also do not 
see it listed in the current release notes for 6.7 or 7.0:

https://issues.apache.org/jira/projects/SOLR/versions/12340568
https://issues.apache.org/jira/projects/SOLR/versions/12335718

Is there any rough idea yet when 6.7 or 7.0 will be released?

thanks,
Hendrik

On 28.08.2017 18:31, Erick Erickson wrote:

Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including it.

On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood  wrote:

That would be a really good reason for a 6.7.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Aug 28, 2017, at 8:48 AM, Markus Jelsma  wrote:

It is, unfortunately, not committed for 6.7.





-Original message-

From:Markus Jelsma 
Sent: Monday 28th August 2017 17:46
To: solr-user@lucene.apache.org
Subject: RE: Solr memory leak

See https://issues.apache.org/jira/browse/SOLR-10506
Fixed for 7.0

Markus



-Original message-

From:Hendrik Haddorp 
Sent: Monday 28th August 2017 17:42
To: solr-user@lucene.apache.org
Subject: Solr memory leak

Hi,

we noticed that triggering collection reloads on many collections has a
good chance to result in an OOM-Error. To investigate that further I did
a simple test:
 - Start solr with a 2GB heap and 1GB Metaspace
 - create a trivial collection with a few documents (I used only 2
fields and 100 documents)
 - trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6
worked better but also failed after 1100 loops.

When looking at the memory usage on the Solr dashboard it looks like the
space left after GC cycles gets less and less. Then Solr gets very slow,
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
my last run this was actually for the Metaspace. So it looks like more
and more heap and metaspace is being used by just constantly reloading a
trivial collection.

regards,
Hendrik
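The reload loop described above used SolrJ; an equivalent sketch over plain HTTP against the Collections API is below. Host, port, and collection name are assumptions for illustration.

```python
# Sketch of the reload loop from the message above, using Solr's
# Collections API over HTTP instead of SolrJ.
from urllib.parse import urlencode
from urllib.request import urlopen

def reload_url(base_url, collection):
    """Build the Collections API RELOAD call for one collection."""
    params = urlencode({"action": "RELOAD", "name": collection, "wt": "json"})
    return f"{base_url}/admin/collections?{params}"

def reload_loop(base_url, collection, iterations):
    # Each iteration reloads all cores of the collection. Watching the
    # heap/metaspace graphs on the admin dashboard while this runs is
    # how the growth described above shows up.
    for _ in range(iterations):
        with urlopen(reload_url(base_url, collection)) as resp:
            resp.read()

print(reload_url("http://localhost:8983/solr", "test"))
```

Running `reload_loop` against a test collection with a small fixed heap should reproduce the gradually shrinking post-GC free space described above.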





mm with sow set to false bug

2017-09-10 Thread Basel Ariqat
Hello,

We decided to upgrade Solr to use the Synonym Graph Filter, and this filter
requires sow to be set to false.
But after setting sow to false we’ve started to see some unexpected
behaviour, and after digging into the problem we’ve found that mm
behaves incorrectly when sow is false and mm.autoRelax
doesn’t work properly.

The tests were as follows:
solr setup:
Solr version:
6.6.0 and 6.6.1
Index:
id,value
doc1,“This is the first sentence”
doc2,“This is the second sentence”
doc3,“This is the third one”
doc4,“This is the last one”
Field type:






stopwords.txt file:
This
Is
The
random_word

Tests:
test 1:
Query string: This one
mm value: 2
mm.autoRelax: TRUE

Expected result: doc1
Actual result: doc1

test 2:
Query string: This one
mm value: 2
mm.autoRelax: FALSE

Expected result: nothing
Actual result: doc1

test 3:
Query string: This one
mm value: 1
mm.autoRelax: TRUE

Expected result: doc1
Actual result: doc1

test 4:
Query string: This one
mm value: 1
mm.autoRelax: FALSE

Expected result: nothing
Actual result: doc1


As we can see in the tests, mm.autoRelax isn’t working properly. It behaves
like it’s always true in the previous cases.
This behaviour was present in both solr 6.6.0 and 6.6.1
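For reference, the test cases above correspond to request parameters along these lines; the collection field name (`value`) and helper are illustrative assumptions.

```python
# Build the query parameters used in the tests above: edismax with
# sow=false (required by SynonymGraphFilter) and explicit mm /
# mm.autoRelax values.
from urllib.parse import urlencode

def mm_test_params(query, mm, auto_relax):
    return urlencode({
        "q": query,
        "defType": "edismax",
        "qf": "value",          # assumed field holding the sentences
        "sow": "false",         # required by SynonymGraphFilter
        "mm": str(mm),
        "mm.autoRelax": "true" if auto_relax else "false",
    })

# test 2 above: "This one", mm=2, mm.autoRelax=FALSE
print(mm_test_params("This one", 2, False))
```

With all query terms except "one" being stopwords, test 2 expects zero hits when mm=2 cannot be satisfied and autoRelax is off, yet doc1 is still returned.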

What can we do to keep using SynonymGraphFilter without introducing the
above issues with autoRelax and mm?