RE: How to manage diversity in search results in Solr

2016-12-22 Thread Daisy
The main issue is: our page needs to display 80 products from different 
suppliers per keyword search. Some search keywords have fewer than 80 
suppliers, so if we use groupby(supplier), 80 products per page is no longer 
possible.

Regards,
Daisy

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Thursday, December 22, 2016 6:24 PM
To: solr_user lucene_apache
Subject: Re: How to manage diversity in search results in Solr

On Thu, 2016-12-22 at 17:35 +0800, Daisy wrote:
> How to restrict the product search in a marketplace where no more than 
> 3 results per retailer are permitted in search results?
> 
> I understand the groupby/collapse could solve the issue but is there 
> any other way to do it?

Grouping is the obvious solution. Since that does not work for you, you need to 
describe what the problem is with that solution, in order for us to suggest 
alternatives.

- Toke Eskildsen, State and University Library, Denmark





RE: Customizing the search result

2016-12-22 Thread Daisy
Thanks a lot for the response.
What I would like to do is something similar to the groupby function. But 
because of a performance issue and a business-requirement limitation, I would 
rather not use groupby. 

Our business requirement is to avoid one supplier dominating the search 
results page; currently 50 out of 80 products belong to one supplier. 
I understand that this issue can be solved by the groupby function.

1. I did check the query time, and the groupby query takes a little longer 
than the query without groupby.
2. Another business limitation: our page needs to display 80 products from 
different suppliers per keyword search. Some search keywords have fewer than 
80 suppliers, so with groupby(supplier), 80 products per page is no longer 
possible.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, December 23, 2016 12:01 PM
To: solr-user
Subject: Re: Customizing the search result

My very, very, very first question is "why do you think you have to develop 
your own customized re-ranking?". How have you determined that your needs 
aren't satisfied out-of-the-box? What I'm going for here is wondering if this 
is an XY problem. You're asking how to do X because you think that will 
accomplish Y, without stating what the task (Y) is. It'll save you a LOT of 
work if you don't have to create (and
maintain) your own.

That said, maybe you _do_ have to extend BaseSimilarity. But there's a lot 
built in to Solr so before going there let's see if there's an easier solution.

For instance, there's the ReRankQParserPlugin that takes the output from the 
main clause and pushes it through a completely independent Solr query that at 
least sounds similar to what you want to do. There is boosting, altering the 
score by function queries, etc. etc, etc.

For <2> what you probably want is a search component, which is pluggable. These 
are chained together in your request handler and you can add a 
last-components entry and get the packet to be returned just before it's 
sent. It will contain all the data to be returned, the docs (rows worth), the 
facets, groups, all that stuff.

But again, why do you want to do this? There are also DocTransformers that can 
be used to munge the individual documents coming back that you can configure 
rather than code fresh. They may not actually do what you need but before 
writing your own let's see if maybe there's an easier way to do what you want 
than extending org.apache.solr.response.transform.DocTransformer and creating a 
plugin.

Best,
Erick

On Thu, Dec 22, 2016 at 6:58 PM, Daisy  wrote:
> I’m really new to SOLR and excuse me if my question is vague.
>
> I found some of the search related things in solr-core → 
> org.apache.solr.search package. I’m not sure this is the right package to 
> look into.
>
>
>
> 1.   I would like to know if we are going to develop our own customized 
> re-ranking, where and how can we add the new codes?
>
> 2.   Which class is the final step before returning the result from Solr? 
> For e.g. “<result name="response" numFound="…" start="0">”
>
> Thank you.
>
>





Re: Limit = 0? Does it still calculate facet ?

2016-12-22 Thread William Bell
Yeah we have a bunch of facet.fields that we need, but want to selectively
turn a few off based on user input.

QT - has default facet.fields.

We want to turn a couple off like this:

http://localhost:8983/solr/core/select?qt=provider&f.payor.facet.limit=0

Will this turn off the payor field that is defined in QT?

Any other way to do it?

On Thu, Dec 22, 2016 at 4:15 PM, Tomás Fernández Löbbe <
tomasflo...@gmail.com> wrote:

> Yes, facet.limit will short circuit and not calculate the facet for the
> field. I'm assuming you can't just use facet=false?
>
> Tomas
>
> On Thu, Dec 22, 2016 at 1:00 PM, William Bell  wrote:
>
> > We have a qt=provider and it sets facets.
> >
> > We want to short circuit the facet. Can we set limit=0 and will it NOT
> > calculate it?
> >
> > Or does it calculate it and not return results? Can we make it faster ?
> >
> > f.<field>.facet.limit = 0
> >
> > --
> > Bill Bell
> > billnb...@gmail.com
> > cell 720-256-8076
> >
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076
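
A minimal SolrJ sketch of the per-field override being discussed (hedged: the
/provider handler and the payor field come from this thread; the URL and core
name are placeholders, and whether limit=0 short-circuits the computation is
what Tomás confirms elsewhere in the thread, with Mikhail's SolrCloud caveat):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class FacetLimitOverride {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/core").build()) {
                SolrQuery q = new SolrQuery("*:*");
                q.setRequestHandler("/provider");    // handler with default facet.fields
                q.set("f.payor.facet.limit", 0);     // override just the payor facet
                System.out.println(client.query(q).getFacetFields());
            }
        }
    }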


Re: Limit = 0? Does it still calculate facet ?

2016-12-22 Thread Mikhail Khludnev
> facet.limit will short circuit and not calculate the facet for the field.
But not for SolrCloud, I believe.

On Fri, Dec 23, 2016 at 2:15 AM, Tomás Fernández Löbbe <
tomasflo...@gmail.com> wrote:

> Yes, facet.limit will short circuit and not calculate the facet for the
> field. I'm assuming you can't just use facet=false?
>
> Tomas
>
> On Thu, Dec 22, 2016 at 1:00 PM, William Bell  wrote:
>
> > We have a qt=provider and it sets facets.
> >
> > We want to short circuit the facet. Can we set limit=0 and will it NOT
> > calculate it?
> >
> > Or does it calculate it and not return results? Can we make it faster ?
> >
> > f.<field>.facet.limit = 0
> >
> > --
> > Bill Bell
> > billnb...@gmail.com
> > cell 720-256-8076
> >
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Customizing the search result

2016-12-22 Thread Erick Erickson
My very, very, very first question is "why do you think you have to
develop your own customized re-ranking?". How have you determined that
your needs aren't satisfied out-of-the-box? What I'm going for here is
wondering if this is an XY problem. You're asking how to do X because
you think that will accomplish Y, without stating what the task (Y)
is. It'll save you a LOT of work if you don't have to create (and
maintain) your own.

That said, maybe you _do_ have to extend BaseSimilarity. But there's a
lot built in to Solr so before going there let's see if there's an
easier solution.

For instance, there's the ReRankQParserPlugin that takes the output
from the main clause and pushes it through a completely independent
Solr query that at least sounds similar to what you want to do. There
is boosting, altering the score by function queries, etc. etc,
etc.

For <2> what you probably want is a search component, which is
pluggable. These are chained together in your request handler and you
can add a last-components entry and get the packet to be returned
just before it's sent. It will contain all the data to be returned,
the docs (rows worth), the facets, groups, all that stuff.

But again, why do you want to do this? There are also DocTransformers
that can be used to munge the individual documents coming back that
you can configure rather than code fresh. They may not actually do
what you need but before writing your own let's see if maybe there's
an easier way to do what you want than extending
org.apache.solr.response.transform.DocTransformer and creating a
plugin.

Best,
Erick

On Thu, Dec 22, 2016 at 6:58 PM, Daisy  wrote:
> I’m really new to SOLR and excuse me if my question is vague.
>
> I found some of the search related things in solr-core → 
> org.apache.solr.search package. I’m not sure this is the right package to 
> look into.
>
>
>
> 1.   I would like to know if we are going to develop our own customized 
> re-ranking, where and how can we add the new codes?
>
> 2.   Which class is the final step before returning the result from Solr? 
> For e.g. “<result name="response" numFound="…" start="0">”
>
> Thank you.
>
>
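
Since the ReRankQParserPlugin comes up here, a hedged SolrJ sketch of how it
is typically invoked; the main query, re-rank query, and numbers below are
placeholders, not anything from this thread:

    import org.apache.solr.client.solrj.SolrQuery;

    public class ReRankSketch {
        public static SolrQuery build() {
            SolrQuery q = new SolrQuery("main query terms");
            // push the top 200 hits of the main query through an independent
            // query, folding its score in with weight 2
            q.set("rq", "{!rerank reRankQuery=$rqq reRankDocs=200 reRankWeight=2}");
            q.set("rqq", "category:premium");   // placeholder re-rank query
            return q;
        }
    }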


Customizing the search result

2016-12-22 Thread Daisy
I’m really new to SOLR and excuse me if my question is vague. 

I found some of the search related things in solr-core → org.apache.solr.search 
package. I’m not sure this is the right package to look into.

 

1.   I would like to know if we are going to develop our own customized 
re-ranking, where and how can we add the new codes?

2.   Which class is the final step before returning the result from Solr? 
For e.g. “<result name="response" numFound="…" start="0">”

Thank you.





Re: DIH Commit Issue

2016-12-22 Thread Erick Erickson
I would set the times in the autoCommit to a large number (or -1, I
think). It's possible that there's a default if the autoCommit
section is present but nothing is specified; you'll have to look at the
code to be sure.

But what I would do is use aliasing (either core if you're in
stand-alone or collection if you're in SolrCloud). Index to the
offline collection, and when you're satisfied switch the alias. That
way these updates are all atomic.

Best,
Erick

On Thu, Dec 22, 2016 at 2:38 PM, AJ Lemke  wrote:
> Hi All,
>
> I have a DIH issue where the index will commit after 1 and 2 minutes then 
> will not commit again until the end.
> We would like the commit to happen at the end so the index does not lose 75% 
> or more of the records until the end of the process.
> We went from 370,000+ records to around 27,000 records then back to 370,000+ 
> when the process ended.
> I have changed the update handler to the following.
>
> <updateHandler class="solr.DirectUpdateHandler2">
>   <updateLog>
>     <str name="dir">${solr.ulog.dir:}</str>
>   </updateLog>
>   …
> </updateHandler>
>
>
> Is there something else that I should do?
>
> Thanks All!
> AJ
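
A hedged SolrJ sketch of the SolrCloud variant of Erick's aliasing suggestion,
assuming a 6.x SolrJ where CollectionAdminRequest.createAlias(...) is
available; collection and alias names are placeholders:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class SwitchAlias {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client =
                     new CloudSolrClient.Builder().withZkHost("localhost:2181").build()) {
                // after the full import into "products_new" has finished and been
                // verified, atomically repoint the alias that searchers use:
                CollectionAdminRequest.createAlias("products", "products_new")
                                      .process(client);
            }
        }
    }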


Re: Limit = 0? Does it still calculate facet ?

2016-12-22 Thread Tomás Fernández Löbbe
Yes, facet.limit will short circuit and not calculate the facet for the
field. I'm assuming you can't just use facet=false?

Tomas

On Thu, Dec 22, 2016 at 1:00 PM, William Bell  wrote:

> We have a qt=provider and it sets facets.
>
> We want to short circuit the facet. Can we set limit=0 and will it NOT
> calculate it?
>
> Or does it calculate it and not return results? Can we make it faster ?
>
> f.<field>.facet.limit = 0
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>


MLT Java example for Solr 6.3

2016-12-22 Thread Todd_Peterson
I am having trouble locating a decent example for using the MLT Java API 
in Solr 6.3. What I want is to retrieve document IDs that are similar to a 
given document ID.

Todd Peterson
Chief Embedded Systems Engineer
Management Sciences, Inc.
6022 Constitution Ave NE
Albuquerque, NM 87144
505-255-8611 (office)
505-205-7057 (cell)
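
A minimal SolrJ sketch of one way to do this, hedged: it assumes the /mlt
request handler is enabled in solrconfig.xml, and the core name, field list,
and seed ID are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class MltByIdExample {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
                SolrQuery q = new SolrQuery("id:SEED_DOC_ID"); // selects the seed document
                q.setRequestHandler("/mlt");
                q.set("mlt.fl", "title,body");                 // fields used for similarity
                q.set("fl", "id");                             // return only the IDs
                QueryResponse rsp = client.query(q);
                for (SolrDocument doc : rsp.getResults()) {
                    System.out.println(doc.getFieldValue("id"));
                }
            }
        }
    }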

DIH Commit Issue

2016-12-22 Thread AJ Lemke
Hi All,

I have a DIH issue where the index will commit after 1 and 2 minutes then will 
not commit again until the end.
We would like the commit to happen at the end so the index does not lose 75% or 
more of the records until the end of the process.
We went from 370,000+ records to around 27,000 records then back to 370,000+ 
when the process ended.
I have changed the update handler to the following.

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  …
</updateHandler>


Is there something else that I should do?

Thanks All!
AJ


Re: problem executing a query using lucene directly

2016-12-22 Thread Alan Woodward
Solr wraps its IndexReader in an UninvertingReader, which builds doc-values 
structures in memory if required.  If you include the solr jar file on your 
classpath, you should be able to use UninvertingReader.wrap() to do something 
similar.

Alan Woodward
www.flax.co.uk


> On 22 Dec 2016, at 17:58, Roxana Danger  
> wrote:
> 
> Hi Alan,
> thank you very much, but I am not sure if this is the reason.
> 
> but if I use the solrSearcher, FieldValueQuery works well, using the same
> index.
> If SolrIndexSearcher enables this feature, how does it do it?
> 
> Thank you again!
> 
> 
> 
> 
> On 22 December 2016 at 17:34, Alan Woodward  wrote:
> 
>> Hi,
>> 
>> FieldValueQuery reports matches using docvalues, and it looks like they’re
>> not enabled on that field.
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>>> On 22 Dec 2016, at 16:21, Roxana Danger 
>> wrote:
>>> 
>>> Hi all,
>>> 
>>> I have created an index using solr. I am trying to execute the following
>>> code, but I get zero results in the count.
>>> 
>>> DirectoryReader dr = DirectoryReader.open(FSDirectory.open(new
>>> File(indexDir).toPath()));
>>> IndexSearcher searcher = new IndexSearcher( dr );
>>> 
>>> System.out.println(dr.maxDoc()); // Shows 200
>>> Query query = new FieldValueQuery("table");
>>> CollectionStatistics stats = searcher.collectionStatistics("table");
>>> System.out.println(stats.docCount()); // Shows 200
>>> 
>>> System.out.println(searcher.count(query)); //Shows 0, should be 200
>>> 
>>> The definition of the table field in the schema.xml is:
>>> 
>>> <field name="table" … required="true" multiValued="false"/>
>>> 
>>> 
>>> Any idea, why this could be happening? Why the search with the
>>> FieldValueQuery is not returning the correct result?
>>> 
>>> Thank you very much in advance.
>>> 
>> 
>> 
> 
> 
> -- 
> Roxana Danger | Senior Data Scientist
> Dragon Court, 27-29 Macklin Street, London, WC2B 5LX
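
A hedged sketch of the wrapping Alan describes, continuing Roxana's snippet.
It assumes Solr 6.x, where UninvertingReader lives in solr-core
(org.apache.solr.uninverting; older releases had it in lucene-misc), and
assumes "table" is a single-valued string field:

    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.solr.uninverting.UninvertingReader;

    public class WrappedSearch {
        public static IndexSearcher open(String indexDir) throws Exception {
            DirectoryReader dr = DirectoryReader.open(FSDirectory.open(Paths.get(indexDir)));
            // uninvert "table" into SORTED doc values on the fly
            Map<String, UninvertingReader.Type> mapping = new HashMap<>();
            mapping.put("table", UninvertingReader.Type.SORTED);
            DirectoryReader wrapped = UninvertingReader.wrap(dr, mapping);
            return new IndexSearcher(wrapped);   // FieldValueQuery("table") now matches
        }
    }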



Limit = 0? Does it still calculate facet ?

2016-12-22 Thread William Bell
We have a qt=provider and it sets facets.

We want to short circuit the facet. Can we set limit=0 and will it NOT
calculate it?

Or does it calculate it and not return results? Can we make it faster ?

f.<field>.facet.limit = 0

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: update operation

2016-12-22 Thread Erick Erickson
Well, there are two options:

1> set up your autocommit interval in solrconfig.xml and wait for as
long as you set it. Say 30 seconds for softcommit. Note, you must
either use soft commit or your <autoCommit> entry must have
<openSearcher>true</openSearcher>.

2> curl (or use the browser) http://snip/solr/TEST_CORE/update?commit=true

Best,
Erick

On Thu, Dec 22, 2016 at 12:31 PM, KRIS MUSSHORN  wrote:
> How would I explicitly commit?
> Sorry for the silly questions but I'm pretty fried
>
> - Original Message -
>
> From: "Erick Erickson" 
> To: "solr-user" 
> Sent: Thursday, December 22, 2016 2:49:05 PM
> Subject: Re: update operation
>
> Kris:
>
> Maybe too simple, but did you commit afterwards?
>
> On Thu, Dec 22, 2016 at 10:45 AM, Shawn Heisey  wrote:
>> On 12/22/2016 10:18 AM, KRIS MUSSHORN wrote:
>>> UPDATE_RESULT=$( curl -s -X POST -H 'Content-Type: text/json' 
>>> "https://snip/solr/TEST_CORE/update/json/docs; --data-binary 
>>> '{"id":"*'$DOC_ID'","metatag.date.single":{"set":"$VAL"}}')
>>>
>>> was the only version that did not throw an error but did not update the 
>>> document.
>>
>> I think that will put a literal "$VAL" in the output, rather than the
>> value of the VAL variable. It will also put an asterisk before your
>> DOC_ID ... is that what you wanted it to do? If an asterisk is not part
>> of your id value, that might be why it's not working.
>>
>> Answering the earlier email: Your command choices are add, delete,
>> commit, and optimize. An update is just an add that deletes the original.
>>
>> Thanks,
>> Shawn
>>
>


Re: update operation

2016-12-22 Thread KRIS MUSSHORN
How would I explicitly commit? 
Sorry for the silly questions but I'm pretty fried 

- Original Message -

From: "Erick Erickson"  
To: "solr-user"  
Sent: Thursday, December 22, 2016 2:49:05 PM 
Subject: Re: update operation 

Kris: 

Maybe too simple, but did you commit afterwards? 

On Thu, Dec 22, 2016 at 10:45 AM, Shawn Heisey  wrote: 
> On 12/22/2016 10:18 AM, KRIS MUSSHORN wrote: 
>> UPDATE_RESULT=$( curl -s -X POST -H 'Content-Type: text/json' 
>> "https://snip/solr/TEST_CORE/update/json/docs; --data-binary 
>> '{"id":"*'$DOC_ID'","metatag.date.single":{"set":"$VAL"}}') 
>> 
>> was the only version that did not throw an error but did not update the 
>> document. 
> 
> I think that will put a literal "$VAL" in the output, rather than the 
> value of the VAL variable. It will also put an asterisk before your 
> DOC_ID ... is that what you wanted it to do? If an asterisk is not part 
> of your id value, that might be why it's not working. 
> 
> Answering the earlier email: Your command choices are add, delete, 
> commit, and optimize. An update is just an add that deletes the original. 
> 
> Thanks, 
> Shawn 
> 



Re: XFS or EXT4 on Amazon AWS AMIs

2016-12-22 Thread William Bell
http://edgystuff.tumblr.com/post/81219256714/tips-to-check-and-improve-your-storage-io

Which specifies:  SERVER-13417



You might be right on XFS... We are testing today.

On Thu, Dec 22, 2016 at 1:03 AM, Will Martin  wrote:

> I'd like to see the MongoDB report(?). ext4fs design specifications
> includes support for large files via allocation placement. MongoDB, the
> last time I checked, does pre-allocation which gives it the performance
> benefit of ext4fs multiple design factors (Block and Inode Allocation
> Policy), but the disadvantage of having to rebuild when file lengths are
> being exceeded; at which time the disk fragmentation may prevent ext4fs
> from getting the allocation pattern it was designed for.
>
> That design feature is going to be unavailable with Solr where ext4fs
> dynamic allocation features are less deterministic. Other performance
> factors on ext4fs, and mutexes (even with guard mutexes) are pretty
> standard patterns. The threaded calls sound like the advantages of the
> allocation pattern.
>
> Still those statements, *based on a dated reading of mine*, may be out of
> date with the MongoDB report factors.
>
> "ext4 recognizes (better than ext3, anyway) that data locality is
> generally a desirable quality of a filesystem"
>
> https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#
> Block_and_Inode_Allocation_Policy
>
> For AWS AMI, is there an r4 instance type? The c3 and m3 are superseded
> with *4 types that have notable improvements in IOPs and don't cost more.
>
> http://howto.unixdev.net/Test_LVM_Trim_Ext4.html   -- not an extended
> performance benchmark, but useful to validate discard/TRIM.
>
> On 12/22/2016 1:32 AM, William Bell wrote:
>
> So what are people recommending for SOLR on AWS on Amazon AMI - ext4 or
> xfs?
>
> I saw an article about MongoDB - saying performance on Amazon was better
> due to a mutex issue on ext4 files and threaded calls.
>
> I have been using ext4 for a long time, but I am moving to r3.* instances
> and TRIM / DISCARD support just appears more supported on XFS.
>
>
>
>
>
>
>


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: update operation

2016-12-22 Thread Erick Erickson
Kris:

Maybe too simple, but did you commit afterwards?

On Thu, Dec 22, 2016 at 10:45 AM, Shawn Heisey  wrote:
> On 12/22/2016 10:18 AM, KRIS MUSSHORN wrote:
>> UPDATE_RESULT=$( curl -s -X POST -H 'Content-Type: text/json' 
>> "https://snip/solr/TEST_CORE/update/json/docs; --data-binary 
>> '{"id":"*'$DOC_ID'","metatag.date.single":{"set":"$VAL"}}')
>>
>> was the only version that did not throw an error but did not update the 
>> document.
>
> I think that will put a literal "$VAL" in the output, rather than the
> value of the VAL variable.  It will also put an asterisk before your
> DOC_ID ... is that what you wanted it to do?  If an asterisk is not part
> of your id value, that might be why it's not working.
>
> Answering the earlier email:  Your command choices are add, delete,
> commit, and optimize.  An update is just an add that deletes the original.
>
> Thanks,
> Shawn
>


Re: update operation

2016-12-22 Thread Shawn Heisey
On 12/22/2016 10:18 AM, KRIS MUSSHORN wrote:
> UPDATE_RESULT=$( curl -s -X POST -H 'Content-Type: text/json' 
> "https://snip/solr/TEST_CORE/update/json/docs; --data-binary 
> '{"id":"*'$DOC_ID'","metatag.date.single":{"set":"$VAL"}}') 
>
> was the only version that did not throw an error but did not update the 
> document. 

I think that will put a literal "$VAL" in the output, rather than the
value of the VAL variable.  It will also put an asterisk before your
DOC_ID ... is that what you wanted it to do?  If an asterisk is not part
of your id value, that might be why it's not working.

Answering the earlier email:  Your command choices are add, delete,
commit, and optimize.  An update is just an add that deletes the original.

Thanks,
Shawn



Re: problem executing a query using lucene directly

2016-12-22 Thread Roxana Danger
Hi Alan,
thank you very much, but I am not sure if this is the reason.

but if I use the solrSearcher, FieldValueQuery works well, using the same
index.
If SolrIndexSearcher enables this feature, how does it do it?

Thank you again!




On 22 December 2016 at 17:34, Alan Woodward  wrote:

> Hi,
>
> FieldValueQuery reports matches using docvalues, and it looks like they’re
> not enabled on that field.
>
> Alan Woodward
> www.flax.co.uk
>
>
> > On 22 Dec 2016, at 16:21, Roxana Danger 
> wrote:
> >
> > Hi all,
> >
> > I have created an index using solr. I am trying to execute the following
> > code, but I get zero results in the count.
> >
> > DirectoryReader dr = DirectoryReader.open(FSDirectory.open(new
> > File(indexDir).toPath()));
> > IndexSearcher searcher = new IndexSearcher( dr );
> >
> > System.out.println(dr.maxDoc()); // Shows 200
> > Query query = new FieldValueQuery("table");
> > CollectionStatistics stats = searcher.collectionStatistics("table");
> > System.out.println(stats.docCount()); // Shows 200
> >
> > System.out.println(searcher.count(query)); //Shows 0, should be 200
> >
> > The definition of the table field in the schema.xml is:
> >
> > <field name="table" … required="true" multiValued="false"/>
> >
> >
> > Any idea, why this could be happening? Why the search with the
> > FieldValueQuery is not returning the correct result?
> >
> > Thank you very much in advance.
> >
>
>


-- 
Roxana Danger | Senior Data Scientist
Dragon Court, 27-29 Macklin Street, London, WC2B 5LX


solrj: get to which shard a id will be routed

2016-12-22 Thread xavier jmlucjav
Hi

Is there somewhere a sample of some solrj code that given:
- a collection
- the id (like "IBM!12345")

returns the shard to which the doc will be routed? I was hoping to get that
info from CloudSolrClient itself, but it's not exposing it as far as I can
see.

thanks
xavier
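
A hedged sketch of one way to get this from the cluster state that
CloudSolrClient already holds; the getTargetSlice signature below is the
Solr 6.x one, and the ZooKeeper address and collection name are placeholders:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.cloud.DocCollection;
    import org.apache.solr.common.cloud.DocRouter;
    import org.apache.solr.common.cloud.Slice;

    public class WhichShard {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client =
                     new CloudSolrClient.Builder().withZkHost("localhost:2181").build()) {
                client.connect();
                DocCollection coll = client.getZkStateReader()
                                           .getClusterState().getCollection("mycollection");
                DocRouter router = coll.getRouter();
                // "IBM!12345" uses the composite-id router's shard-key syntax
                Slice slice = router.getTargetSlice("IBM!12345", null, null, null, coll);
                System.out.println(slice.getName());
            }
        }
    }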


Re: problem executing a query using lucene directly

2016-12-22 Thread Alan Woodward
Hi, 

FieldValueQuery reports matches using docvalues, and it looks like they’re not 
enabled on that field.

Alan Woodward
www.flax.co.uk


> On 22 Dec 2016, at 16:21, Roxana Danger  
> wrote:
> 
> Hi all,
> 
> I have created an index using solr. I am trying to execute the following
> code, but I get zero results in the count.
> 
> DirectoryReader dr = DirectoryReader.open(FSDirectory.open(new
> File(indexDir).toPath()));
> IndexSearcher searcher = new IndexSearcher( dr );
> 
> System.out.println(dr.maxDoc()); // Shows 200
> Query query = new FieldValueQuery("table");
> CollectionStatistics stats = searcher.collectionStatistics("table");
> System.out.println(stats.docCount()); // Shows 200
> 
> System.out.println(searcher.count(query)); //Shows 0, should be 200
> 
> The definition of the table field in the schema.xml is:
> 
> <field name="table" … required="true" multiValued="false"/>
> 
> 
> Any idea, why this could be happening? Why the search with the
> FieldValueQuery is not returning the correct result?
> 
> Thank you very much in advance.
> 



Re: update operation

2016-12-22 Thread KRIS MUSSHORN
Shawn, 

Running: 


UPDATE_RESULT=$( curl -s -X POST -H 'Content-Type: text/json' 
"https://snip/solr/TEST_CORE/update/json/docs; --data-binary 
'{"id":"*'$DOC_ID'","metatag.date.single":{"set":"$VAL"}}') 

was the only version that did not throw an error but did not update the 
document. 


It returned: 

{"responseHeader":{"status":0,"QTime":1}} 

Where do I go from here? 

K 





- Original Message -

From: "Shawn Heisey"  
To: solr-user@lucene.apache.org 
Sent: Thursday, December 22, 2016 11:00:21 AM 
Subject: Re: update operation 

On 12/22/2016 8:45 AM, KRIS MUSSHORN wrote: 
> Here is the bash line: 
> 
> UPDATE_RESULT=$( curl -s "https://snip/solr/TEST_CORE/update?=true" 
> --data-binary '{"id":"$DOC_ID","metatag.date.single" :{"set":"$VAL"}}') 

One thing I know you need for sure with the "/update" handler is the 
Content-Type header. Without it, Solr will not know that you are 
sending JSON. 

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-JSONFormattedIndexUpdates
 

There are some alternate update URL paths that assume JSON. See below. 
Because your JSON does not include the "add" command, but instead has a 
bare document, you *might* need to send to /update/json/docs instead of 
just /update or even /update/json. Or you can restructure it to use the 
"add" command. 

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-JSONUpdateConveniencePaths
 

Thanks, 
Shawn 




Re: update operation

2016-12-22 Thread KRIS MUSSHORN

Shawn, 

Perhaps I misunderstood the documentation, but when you include the add clause 
does it not create an entirely new document? 

K 


- Original Message -

From: "Shawn Heisey"  
To: solr-user@lucene.apache.org 
Sent: Thursday, December 22, 2016 11:00:21 AM 
Subject: Re: update operation 

On 12/22/2016 8:45 AM, KRIS MUSSHORN wrote: 
> Here is the bash line: 
> 
> UPDATE_RESULT=$( curl -s "https://snip/solr/TEST_CORE/update?=true" 
> --data-binary '{"id":"$DOC_ID","metatag.date.single" :{"set":"$VAL"}}') 

One thing I know you need for sure with the "/update" handler is the 
Content-Type header. Without it, Solr will not know that you are 
sending JSON. 

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-JSONFormattedIndexUpdates
 

There are some alternate update URL paths that assume JSON. See below. 
Because your JSON does not include the "add" command, but instead has a 
bare document, you *might* need to send to /update/json/docs instead of 
just /update or even /update/json. Or you can restructure it to use the 
"add" command. 

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-JSONUpdateConveniencePaths
 

Thanks, 
Shawn 




Re: Solr | OOM

2016-12-22 Thread Shawn Heisey
On 12/22/2016 9:29 AM, Prateek Jain J wrote:
> We are using solr 4.8.1 and getting OOM Error in one of the test 
> environments. Given below are the details:

There are exactly two ways to deal with OOM:  1) Increase the Java
heap.  2) Make the program use less memory.  Any other action (such as
changing garbage collection parameters) will NOT fix problems with
running out of memory.

> 1.   OS - Linux, 64 bit, 32 GB RAM
>
> 2.   Solr - 4.8.1, 8 GB allocation as java heap. Installed as service. 
> Thread size (-Xss of 256K, -XX:+UseParallelOldGC).
>
> 3.   Java - 1.7 update 95, 64-bit
>
> It is happening when one of the solr instances is trying to come up. It has 
> around 100GB of data, lying on network storage like NFS. Now, the 
> interesting part is that the eclipse MAT plugin shows that FieldCache has taken 
> more than 3.5GB of the 8GB. This environment is set up for stress testing solr, 
> so even while a new solr instance is starting, there is load on it.

The size of each allocation that goes into the FieldCache will be
determined mostly by the number of documents you have in your index.  A
large amount of memory can be allocated in the FieldCache if you use
many fields for sorting and/or facets and don't have docValues enabled
on those fields.  If you have ten million documents in your index and
don't use docValues, then each field you sort on will add ten million
entries to the FieldCache.  The same goes for facet fields when using
the default facet.method setting.

This page discusses ways you may be able to reduce Solr's heap requirements:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Although the GC tuning you use will not affect OOM, I would strongly
recommend *not* using the parallel collector.  All the info below is
about GC tuning, not OOM problems:

https://wiki.apache.org/solr/ShawnHeisey

Solr 5.x and 6.x have GC tuning built in, and the settings are very
similar to the CMS settings you can find on my wiki page.

Thanks,
Shawn



Re: Use ConcurrentUpdateSolrClient some program

2016-12-22 Thread Shawn Heisey
On 12/21/2016 11:15 PM, 苗海泉 wrote:
> I use the solr is 6.0 version, the solrj is 6.0 version, using
> SolrCloud mode deployment, in my code did not make an explicit commit,
> configure the autoCommit and softAutoCommit, using the
> ConcurrentUpdateSolrClient class.
>
> When we send 100 million data, often read timeout exception occurred
> in this anomaly, the data is lost. I would like to ask a few
> questions:
> 1, ConcurrentUpdateSolrClient.add time, if not thrown on behalf of the
> data is not an exception has been successfully sent to the solr, this
> time is the synchronization of Well, that is, solr server to accept
> the data written to the log before we return?

I can't decipher what you're asking here.

> 2, if the answer to question 1 is no, then how do we determine
> ConcurrentUpdateSolrClient.add implementation failure, so that we have
> the wrong data retransmission processing.

ConcurrentUpdateSolrClient will *never* inform you about exceptions that
occur related to the "add" calls you make.  Those calls will return to
the code immediately and the actual adds are done in the background. 
Errors that occur will be logged, but no exceptions will make it back to
your code.

All of your Solr servers could be completely down, and the "add" calls
will show no errors at all.  Use HttpSolrClient or CloudSolrClient as
appropriate, like Erick mentioned.  If you want multi-threaded indexing
*and* error detection, you'll have to write the multi-threading yourself.

Thanks,
Shawn



Re: problem executing a query using lucene directly

2016-12-22 Thread Shawn Heisey
On 12/22/2016 9:21 AM, Roxana Danger wrote:
> I have created an index using solr. I am trying to execute the following
> code, but I get zero results in the count.
>
> DirectoryReader dr = DirectoryReader.open(FSDirectory.open(new
> File(indexDir).toPath()));
> IndexSearcher searcher = new IndexSearcher( dr );
>
> System.out.println(dr.maxDoc()); // Shows 200
> Query query = new FieldValueQuery("table");
> CollectionStatistics stats = searcher.collectionStatistics("table");
> System.out.println(stats.docCount()); // Shows 200
>
> System.out.println(searcher.count(query)); //Shows 0, should be 200
>
> The definition of the table field in the schema.xml is:
>
> <field name="table" … required="true" multiValued="false"/>
>
>
> Any idea, why this could be happening? Why the search with the
> FieldValueQuery is not returning the correct result?

You're writing Lucene code here.  A large part of Solr's purpose is to
avoid the need for Lucene code, so only a fraction of the people on this
list know how to write and troubleshoot Lucene code.  I am not one of them.

You may have better luck with the java-u...@lucene.apache.org mailing
list.  Your question is off-topic for this list.

http://lucene.apache.org/core/discussion.html

Thanks,
Shawn



Solr | OOM

2016-12-22 Thread Prateek Jain J

Hi,

We are using solr 4.8.1 and getting OOM Error in one of the test environments. 
Given below are the details:


1.   OS - Linux, 64 bit, 32 GB RAM

2.   Solr - 4.8.1, 8 GB allocation as java heap. Installed as service. 
Thread size (-Xss of 256K, -XX:+UseParallelOldGC).

3.   Java - 1.7 update 95, 64-bit

It is happening when one of the solr instances is trying to come up. It has 
around 100GB of data, lying on network storage like NFS. Now, the 
interesting part is that the eclipse MAT plugin shows that FieldCache has taken 
more than 3.5GB of the 8GB. This environment is set up for stress testing solr, 
so even while a new solr instance is starting, there is load on it. What I 
suspect here is that two things are happening at the same time:

a.   Solr is starting and indexing data (on the network).

b.   Solr is trying to answer client queries.

Now, this might result in a scenario where millions of new objects come into 
the cache and get de-referenced. The hprof file shows that a minor GC happens 
every minute.

Is this a reasonable scenario? Has anyone seen/encountered it? Any pointers 
are welcome. Feel free to revert in case you need more information.

Regards,
Prateek Jain



problem executing a query using lucene directly

2016-12-22 Thread Roxana Danger
Hi all,

I have created an index using solr. I am trying to execute the following
code, but I get zero results in the count.

DirectoryReader dr = DirectoryReader.open(FSDirectory.open(new
File(indexDir).toPath()));
IndexSearcher searcher = new IndexSearcher( dr );

System.out.println(dr.maxDoc()); // Shows 200
Query query = new FieldValueQuery("table");
CollectionStatistics stats = searcher.collectionStatistics("table");
System.out.println(stats.docCount()); // Shows 200

System.out.println(searcher.count(query)); //Shows 0, should be 200

The definition of the table field in the schema.xml is:

<field name="table" … required="true" multiValued="false"/>

Any idea, why this could be happening? Why the search with the
FieldValueQuery is not returning the correct result?

Thank you very much in advance.



Re: update operation

2016-12-22 Thread Shawn Heisey
On 12/22/2016 8:45 AM, KRIS MUSSHORN wrote:
> Here is the bash line: 
>
> UPDATE_RESULT=$( curl -s "https://snip/solr/TEST_CORE/update?=true" 
> --data-binary '{"id":"$DOC_ID","metatag.date.single" :{"set":"$VAL"}}') 

One thing I know you need for sure with the "/update" handler is the
Content-Type header.  Without it, Solr will not know that you are
sending JSON.

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-JSONFormattedIndexUpdates

There are some alternate update URL paths that assume JSON.  See below. 
Because your JSON does not include the "add" command, but instead has a
bare document, you *might* need to send to /update/json/docs instead of
just /update or even /update/json.  Or you can restructure it to use the
"add" command.

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-JSONUpdateConveniencePaths

Thanks,
Shawn
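
For comparison, a hedged SolrJ sketch of the same atomic update that sidesteps
the hand-built JSON entirely. The builder-style client assumes a 6.x SolrJ (on
5.x, construct with new HttpSolrClient(url)); the id and date value are
placeholders:

    import java.util.Collections;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class AtomicUpdateExample {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("https://snip/solr/TEST_CORE").build()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "THE_DOC_ID");
                // the {"set": value} map is what marks this as an atomic update
                doc.addField("metatag.date.single",
                             Collections.singletonMap("set", "2016-12-22T00:00:00Z"));
                client.add(doc);
                client.commit();   // explicit commit so the change becomes visible
            }
        }
    }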



update operation

2016-12-22 Thread KRIS MUSSHORN
Merry Christmas everyone, 

I'm using solr 5.4.1 and writing a bash script to update the value in a field 
of a single document in solr. 

Here is the bash line: 

UPDATE_RESULT=$( curl -s "https://snip/solr/TEST_CORE/update?=true" 
--data-binary '{"id":"$DOC_ID","metatag.date.single" :{"set":"$VAL"}}') 

$DOC_ID is a variable in the script that contains the document UID. I have 
confirmed that this value can be found in the documents with a query. 

$VAL is the value to set. I have validated that the field I'm trying to set 
will accept the value. 

metatag.date.single is the field I want to set; it does NOT yet exist in 
Solr but is defined in schema.xml. 

When I run the line in bash I get: 

{"responseHeader":{"status":400,"QTime":0},"error":{"msg":"Unknown command 'id' 
at [5]","code":400}} 

The UID field in Solr is named id. 

What am I doing wrong? 

Is there a better way to handle this? 

TIA, 

Kris 


Re: Use ConcurrentUpdateSolrClient some program

2016-12-22 Thread Erick Erickson
Hmmm, when you say "When we send 100 million data", _how_ are you
sending it? All at once? And is the read timeout on the client or in
the server logs?

What I suspect is happening is that Solr is too busy to promptly read
the entire packet you're sending. This could be due to several things:
- you're sending too much data at once. I generally send 1,000 docs in a packet.
- Your Solr instance may just be too busy. There are lots of ways it
could be "too busy"
-- You're telling ConcurrentUpdateSolrClient (CUSC) to use a large
number of threads.
-- Solr is spending a lot of resources doing garbage collection etc.
-- You're running other processes on the Solr box.
-- Monitor your Solr server(s) to see the CPU consumption and if you
have it CPU bound.

If you're getting a read timeout, then there's no guarantee that Solr
has even received your data, thus no way to determine what has been
indexed, written to the tlog etc.

By the way, if you're using SolrCloud, I recommend CloudSolrClient
instead. Since you're using SolrJ, you can spin up multiple threads if
you need the increased throughput.

Best,
Erick

On Wed, Dec 21, 2016 at 10:15 PM, 苗海泉  wrote:
> I use the solr is 6.0 version, the solrj is 6.0 version, using
> SolrCloud mode deployment, in my code did not make an explicit commit,
> configure the autoCommit and softAutoCommit, using the
> ConcurrentUpdateSolrClient class.
>
> When we send 100 million data, often read timeout exception occurred
> in this anomaly, the data is lost. I would like to ask a few
> questions:
> 1, ConcurrentUpdateSolrClient.add time, if not thrown on behalf of the
> data is not an exception has been successfully sent to the solr, this
> time is the synchronization of Well, that is, solr server to accept
> the data written to the log before we return?
> 2, if the answer to question 1 is no, then how do we determine
> ConcurrentUpdateSolrClient.add implementation failure, so that we have
> the wrong data retransmission processing.
> 3, there is no use ConcurrentUpdateSolrClient
> Thank you!
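
A hedged sketch of a starting point for "write the multi-threading yourself":
batching with CloudSolrClient so that add() failures actually surface as
exceptions (unlike ConcurrentUpdateSolrClient). The batch size follows Erick's
1,000-docs-per-packet suggestion, and the retry policy is a placeholder:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        static void indexAll(CloudSolrClient client, Iterable<SolrInputDocument> docs)
                throws IOException, SolrServerException {
            List<SolrInputDocument> batch = new ArrayList<>(1000);
            for (SolrInputDocument doc : docs) {
                batch.add(doc);
                if (batch.size() == 1000) {
                    send(client, batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                send(client, batch);
            }
        }

        static void send(CloudSolrClient client, List<SolrInputDocument> batch)
                throws IOException, SolrServerException {
            try {
                client.add(batch);     // throws on failure, so the bad batch is known
            } catch (SolrServerException | IOException e) {
                client.add(batch);     // naive single retry; log or queue in real code
            }
        }
    }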


Re: How to get SOLR document metadata in UIMA using SOLR6.3

2016-12-22 Thread soumitra80
I am more interested in the UIMA JCas code, as I know that from the update
processing chain I can send fields to analyze. I can mention "title" there, but
how do I access it in the UIMA JCas?





CDCR logging is Needlessly verbose, fills up the file system fast

2016-12-22 Thread Webster Homer
While testing CDCR I found that it is writing tons of log messages per
second. Example:
2016-12-21 23:24:41.652 INFO  (qtp110456297-13) [c:sial-catalog-material
s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&version=2}
status=0 QTime=0
2016-12-21 23:24:41.653 INFO  (qtp110456297-18) [c:sial-catalog-material
s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&version=2}
status=0 QTime=0
2016-12-21 23:24:41.655 INFO  (qtp110456297-14) [c:sial-catalog-material
s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&version=2}
status=0 QTime=0
2016-12-21 23:24:41.657 INFO  (qtp110456297-17) [c:sial-catalog-material
s:shard1 r:core_node1 x:sial-catalog-material_shard1_replica1]
o.a.s.c.S.Request [sial-catalog-material_shard1_replica1]  webapp=/solr
path=/cdcr params={qt=/cdcr&action=BOOTSTRAP_STATUS&wt=javabin&version=2}
status=0 QTime=0


These should be DEBUG messages and NOT INFO messages. Is there a way to
selectively turn them off?  The above is from a target collection; it is
even worse on the source side.

I'd rather not change my logging level as most INFO messages are useful.

This is a very poor default logging level for these messages.

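
For what it's worth, the abbreviated logger name in that output expands to
org.apache.solr.core.SolrCore.Request, so a line such as
log4j.logger.org.apache.solr.core.SolrCore.Request=WARN in Solr's
log4j.properties would silence it. Hedged: that assumes the stock log4j 1.2
setup of that era, and it mutes all request logging, not just the /cdcr calls.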


Re: Request to be added to Solr Wiki

2016-12-22 Thread Erick Erickson
I added you to the Contributor's group which should allow you to edit
that page. If not, let us know.

Best,
Erick

On Thu, Dec 22, 2016 at 2:54 AM, Patricia Kaufmann
 wrote:
> Hi.
>
> Could you please add me to the Solr Wiki (https://wiki.apache.org/solr). My 
> user name is KaufmannPatricia.
> I realized that our company (SHI GmbH) is listed in 
> https://wiki.apache.org/solr/Support, but the name of the company has changed 
> and I want to correct this. If it is easier for one of your admins to just make 
> the change, here is what I want corrected:
>
> Recent entry: SHI Elektronische Medien GmbH is a 
> German Search specialist and the first Training Partner of Lucid Imagination 
> in Germany, Austria, Switzerland and Russia. Consulting, training, custom 
> implementations for Lucene/Solr. Contact
>
> Corrected entry:  SHI GmbH is a German 
> Search specialist and the first Training Partner of Lucid Imagination in 
> Germany, Austria, Switzerland and Russia. Consulting, training, custom 
> implementations for Lucene/Solr. Contact
>
> Thank you very much and best regards,
>
> Patricia Kaufmann
> Consultant Search & Big Data
>
> SHI Consulting * Software * Development * Training
>
>
> SHI GmbH - Hauptsitz
> Adresse: Curt-Frenzel-Str. 12, 86167 Augsburg
> Telefon 0821-74 82 633-0 | Fax 0821-74 82 633-29
>
> SHI GmbH - Büro München
> Adresse: Landshuter Allee 8, 80637 München
> Telefon 089 - 54 55 82240 | Fax 089 - 55 7443
>
> www.shi-gmbh.com
> 
>
> SHI GmbH | Registergericht Augsburg HRB 29850 | Geschäftsführer: Peter Spiske 
> | Prokurist: Thomas Hoffmann | USt.-ID: DE 301293356
>


Request to be added to Solr Wiki

2016-12-22 Thread Patricia Kaufmann
Hi.

Could you please add me to the Solr Wiki (https://wiki.apache.org/solr). My 
user name is KaufmannPatricia.
I realized that our company (SHI GmbH) is listed in 
https://wiki.apache.org/solr/Support, but the name of the company has changed 
and I want to correct this. If it is easier for one of your admins to just make 
the change, here is what I want corrected:

Recent entry: SHI Elektronische Medien GmbH is a 
German Search specialist and the first Training Partner of Lucid Imagination in 
Germany, Austria, Switzerland and Russia. Consulting, training, custom 
implementations for Lucene/Solr. Contact

Corrected entry:  SHI GmbH is a German 
Search specialist and the first Training Partner of Lucid Imagination in 
Germany, Austria, Switzerland and Russia. Consulting, training, custom 
implementations for Lucene/Solr. Contact

Thank you very much and best regards,

Patricia Kaufmann
Consultant Search & Big Data

SHI Consulting * Software * Development * Training


SHI GmbH - Hauptsitz
Adresse: Curt-Frenzel-Str. 12, 86167 Augsburg
Telefon 0821-74 82 633-0 | Fax 0821-74 82 633-29

SHI GmbH - Büro München
Adresse: Landshuter Allee 8, 80637 München
Telefon 089 - 54 55 82240 | Fax 089 - 55 7443

www.shi-gmbh.com


SHI GmbH | Registergericht Augsburg HRB 29850 | Geschäftsführer: Peter Spiske | 
Prokurist: Thomas Hoffmann | USt.-ID: DE 301293356



FuzzyLookupFactory throws StackOverflowError

2016-12-22 Thread Furkan KAMACI
Hi,

When I try the suggester component and use FuzzyLookupFactory I get this error:

"error": {
"msg": "java.lang.StackOverflowError",
"trace": "java.lang.RuntimeException: FuzzyLookupFactory n\tat
org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:607)\n\tat

I searched on the web and there are some other people who get that error
too. Responses to such questions indicate that it may be expected if there is
a lot of data in the index. However, I indexed just 4 small PDF files and get
the error when building the suggester.

Any ideas?

Kind Regards,
Furkan KAMACI


Limit Suggested Term Counts

2016-12-22 Thread Furkan KAMACI
I have a list I make suggestions on. When I check the analyser page I
see that the field is analysed as I intended, i.e. the tokens are:

java
linux
mac

However, when I use BlendedInfixLookupFactory to run a suggestion on that
field, it returns the whole paragraph instead of a limited number of terms (I
know that such implementations return suggestions even when the desired
terms are inside the text, not at the beginning).

Is it possible to limit that suggested term count?

Kind Regards,
Furkan KAMACI


How to get SOLR document metadata in UIMA using SOLR6.3

2016-12-22 Thread soumitra80
Hi All,
  I am working on a SOLR and UIMA development assignment where I need to
pass some of the SOLR document metadata to the UIMA chain. Is there any
concrete example of how to do so? For example, if I have to pass "Title"
information to the UIMA update processor, how can I do it? I am
searching for concrete documentation. Please help.
Regards
Soumitra





Error Loading Custom Codec class with Solr Codec Factory. Class cast exception

2016-12-22 Thread Mohit Sidana
Hello,


I am trying to experiment with my solr indexes with the patch open on JIRA -
Codec for index-level encryption (LUCENE-6966):
https://issues.apache.org/jira/browse/LUCENE-6966

I am currently trying to test this Custom codec with Solr to encrypt
sensitive documents.


I have applied this patch to a clone of the Lucene-Solr trunk (branch 6.3 on my
local machine) and used “ant compile” and “ant jar” to get the jar files.

Using this custom codec, I can make use of the custom posting-format
classes by overriding the posting format in field type definitions.






But when I try to load this codec directly via the solrconfig.xml codecFactory,
as below:

<codecFactory class="com.wk.codecs.encrypted.DummyEncryptedLucene60Codec"/>


Solr is not able to load the core and initialize the codec. I am getting
the following error in my logs.

java.util.concurrent.ExecutionException:
org.apache.solr.common.SolrException: Unable to create core [test]

at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at java.util.concurrent.FutureTask.get(FutureTask.java:192)

at
org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:526)

at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.solr.common.SolrException: Unable to create core
[test]

at org.apache.solr.core.CoreContainer.create(CoreContainer.java:855)

at
org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:498)

... 5 more

Caused by: org.apache.solr.common.SolrException: class
com.wk.codecs.encrypted.DummyEncryptedLucene60Codec

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:903)

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:776)

at org.apache.solr.core.CoreContainer.create(CoreContainer.java:842)

... 6 more

Caused by: java.lang.ClassCastException: class
com.wk.codecs.encrypted.DummyEncryptedLucene60Codec

at java.lang.Class.asSubclass(Class.java:3404)

at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:540)

at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:625)

at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:590)

at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:583)

at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:1112)

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:847)

... 8 more



I am new to Solr and unable to track down what is causing this error.

Is it a classpath issue or something else?

I would appreciate any feedback on this.



Thanks,

Mohit
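
A hedged reading of that trace: the ClassCastException comes from
Class.asSubclass inside initCodec, which suggests <codecFactory> expects a
CodecFactory subclass while DummyEncryptedLucene60Codec is a Codec. A minimal
wrapper (assuming the patched codec has a no-arg constructor; the factory
class name here is made up) would look like:

    package com.wk.codecs.encrypted;

    import org.apache.lucene.codecs.Codec;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.core.CodecFactory;

    // Referenced from solrconfig.xml instead of the codec itself:
    // <codecFactory class="com.wk.codecs.encrypted.DummyEncryptedCodecFactory"/>
    public class DummyEncryptedCodecFactory extends CodecFactory {
        private final Codec codec = new DummyEncryptedLucene60Codec();

        @Override
        public void init(NamedList args) { }

        @Override
        public Codec getCodec() {
            return codec;
        }
    }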


Re: Solr Suggester

2016-12-22 Thread Michael Kuhlmann
For the suggester, the field must be indexed. It's not necessary to have
it stored.

Best,
Michael

Am 22.12.2016 um 11:24 schrieb Furkan KAMACI:
> Hi Emir,
>
> As far as I know, it should be enough to be stored=true for a suggestion
> field? Should it be both indexed and stored?
>
> Kind Regards,
> Furkan KAMACI
>
> On Thu, Dec 22, 2016 at 11:31 AM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> That is because my_field_2 is not indexed.
>>
>> Regards,
>> Emir
>>
>>
>> On 21.12.2016 18:04, Furkan KAMACI wrote:
>>
>>> Hi All,
>>>
>>> I've a field like that:
>>>
>>> <field name="my_field_1" … multiValued="false" />
>>>
>>> <field name="my_field_2" … stored="true" multiValued="false"/>
>>>
>>> When I run a suggester on my_field_1 it returns response. However
>>> my_field_2 doesn't. I've defined suggester as:
>>>
>>> <str name="name">suggester</str>
>>> <str name="lookupImpl">FuzzyLookupFactory</str>
>>> <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>>>
>>> What can be the reason?
>>>
>>> Kind Regards,
>>> Furkan KAMACI
>>>
>>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>



Re: How to manage diversity in search results in Solr

2016-12-22 Thread Toke Eskildsen
On Thu, 2016-12-22 at 17:35 +0800, Daisy wrote:
> How to restrict the product search in a marketplace where no more
> than 3 results per retailer are permitted in search results?
> 
> I understand the groupby/collapse could solve the issue but is there
> any other way to do it?

Grouping is the obvious solution. Since that does not work for you, you
need to describe what the problem is with that solution, in order for
us to suggest alternatives.

- Toke Eskildsen, State and University Library, Denmark
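
Since the question mentions collapse as an alternative, a hedged SolrJ sketch
of one common pattern with the CollapsingQParserPlugin plus the expand
component. The "retailer" field name is a placeholder, and whether this fits
the 80-products-per-page constraint is exactly what the rest of the thread
debates:

    import org.apache.solr.client.solrj.SolrQuery;

    public class CapPerRetailer {
        public static SolrQuery build(String keyword) {
            SolrQuery q = new SolrQuery(keyword);
            q.addFilterQuery("{!collapse field=retailer}"); // one top hit per retailer
            q.set("expand", "true");                        // expanded section per group
            q.set("expand.rows", 2);                        // up to 2 more per retailer = 3 total
            return q;
        }
    }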


Re: Solr Suggester

2016-12-22 Thread Furkan KAMACI
Hi Emir,

As far as I know, it should be enough to be stored=true for a suggestion
field? Should it be both indexed and stored?

Kind Regards,
Furkan KAMACI

On Thu, Dec 22, 2016 at 11:31 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> That is because my_field_2 is not indexed.
>
> Regards,
> Emir
>
>
> On 21.12.2016 18:04, Furkan KAMACI wrote:
>
>> Hi All,
>>
>> I've a field like that:
>>
>> <field name="my_field_1" … multiValued="false" />
>>
>> <field name="my_field_2" … stored="true" multiValued="false"/>
>>
>> When I run a suggester on my_field_1 it returns response. However
>> my_field_2 doesn't. I've defined suggester as:
>>
>> <str name="name">suggester</str>
>> <str name="lookupImpl">FuzzyLookupFactory</str>
>> <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>>
>> What can be the reason?
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


How to manage diversity in search results in Solr

2016-12-22 Thread Daisy
How to restrict the product search in a marketplace where no more than 3 
results per retailer are permitted in search results?

I understand the groupby/collapse could solve the issue but is there any other 
way to do it? Thank you.

 

Regards,

Daisy

 





Re: Solr Suggester

2016-12-22 Thread Emir Arnautovic

That is because my_field_2 is not indexed.

Regards,
Emir

On 21.12.2016 18:04, Furkan KAMACI wrote:

Hi All,

I've a field like that:

<field name="my_field_1" … multiValued="false" />

<field name="my_field_2" … stored="true" multiValued="false"/>

When I run a suggester on my_field_1 it returns response. However
my_field_2 doesn't. I've defined suggester as:

   <str name="name">suggester</str>
   <str name="lookupImpl">FuzzyLookupFactory</str>
   <str name="dictionaryImpl">DocumentDictionaryFactory</str>

What can be the reason?

Kind Regards,
Furkan KAMACI



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: XFS or EXT4 on Amazon AWS AMIs

2016-12-22 Thread Will Martin
I'd like to see the MongoDB report(?). ext4fs design specifications includes 
support for large files via allocation placement. MongoDB, the last time I 
checked, does pre-allocation which gives it the performance benefit of ext4fs 
multiple design factors (Block and Inode Allocation Policy), but the 
disadvantage of having to rebuild when file lengths are being exceeded; at 
which time the disk fragmentation may prevent ext4fs from getting the 
allocation pattern it was designed for.

That design feature is going to be unavailable with Solr where ext4fs dynamic 
allocation features are less deterministic. Other performance factors on 
ext4fs, and mutexes (even with guard mutexes) are pretty standard patterns. The 
threaded calls sound like the advantages of the allocation pattern.

Still those statements, *based on a dated reading of mine*, may be out of date 
with the MongoDB report factors.

"ext4 recognizes (better than ext3, anyway) that data locality is generally a 
desirable quality of a filesystem"

https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Block_and_Inode_Allocation_Policy

For AWS AMI, is there an r4 instance type? The c3 and m3 are superseded with *4 
types that have notable improvements in IOPs and don't cost more.

http://howto.unixdev.net/Test_LVM_Trim_Ext4.html   -- not an extended 
performance benchmark, but useful to validate discard/TRIM.

On 12/22/2016 1:32 AM, William Bell wrote:

So what are people recommending for SOLR on AWS on Amazon AMI - ext4 or xfs?

I saw an article about MongoDB - saying performance on Amazon was better
due to a mutex issue on ext4 files and threaded calls.

I have been using ext4 for a long time, but I am moving to r3.* instances
and TRIM / DISCARD support just appears more supported on XFS.