Re: Clarification on +, and in edismax parser

2016-03-09 Thread Dikshant Shahi
Hi,

No, + and "and" doesn't works similar. Even "and" and "AND" would have a
different behavior (is configurable) in edismax.

When you put a + before a term, you specify that it's mandatory. Hence,
"+google +india" will get you the same result as "google AND india".

Best Regards,
*Dikshant Shahi*



On Thu, Mar 10, 2016 at 12:59 PM, Anil  wrote:

>  "google"+"india" ,  "india"+"google" returning different results. Any help
> would be appreciated.
>
> Thanks,
> Anil
>
>
> On 10 March 2016 at 11:47, Anil  wrote:
>
> > Hi,
> >
> > I am using edismax query parser for my solr search.
> >
> > I believe '+' and 'and' should work similarly.
> >
> > ex: "google"+"india", "google" and "india" should return the same number
> > of results.
> >
> > Correct me if I am wrong. Thanks.
> >
> > Regards,
> > Anil
> >
> >
> >
>


Re: Clarification on +, and in edismax parser

2016-03-09 Thread Anil
 "google"+"india" ,  "india"+"google" returning different results. Any help
would be appreciated.

Thanks,
Anil


On 10 March 2016 at 11:47, Anil  wrote:

> Hi,
>
> I am using edismax query parser for my solr search.
>
> I believe '+' and 'and' should work similarly.
>
> ex: "google"+"india", "google" and "india" should return the same number of
> results.
>
> Correct me if I am wrong. Thanks.
>
> Regards,
> Anil
>
>
>


Re: How to pass facet info of the top inner nested doc to the top parent doc

2016-03-09 Thread Mikhail Khludnev
Jhon,
What if you just forget about the middle level and use q={!parent
which=type_s:parent}...&child.facet.field=SIZE_s

Binoy,
facet.query works, but there would have to be many of them: one for L, one
for M, etc.
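Spelled out, that would be one facet.query per size value (illustrative
values):

facet.query=SIZE_s:L
facet.query=SIZE_s:M
facet.query=SIZE_s:XL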

On Wed, Mar 9, 2016 at 7:09 PM, Jhon Smith  wrote:

> There are 3 levels of nested docs: parent -> middle -> child.
>
> E.g.
> <doc>
>   <field name="id">9</field>
>   <field name="type_s">parent</field>
>   <doc>
>     <field name="id">10</field>
>     <field name="type_s">middle</field>
>     <field name="BRAND_s">Nike</field>
>     <doc>
>       <field name="id">11</field>
>       <field name="COLOR_s">Red</field>
>       <field name="SIZE_s">XL</field>
>     </doc>
>     <doc>
>       <field name="id">12</field>
>       <field name="COLOR_s">Blue</field>
>       <field name="SIZE_s">XL</field>
>     </doc>
>   </doc>
> </doc>
>
> If I retrieve middle docs with q={!parent
> which=type_s:middle}...&child.facet.field=SIZE_s
> then facets work fine (in the latest solr): XL(1)
>
> But I want to retrieve the top parent documents (type_s:parent) while still
> having facet info about SIZE_s from the child documents. How to do that?
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: How to sort docs based on nested docs' fields

2016-03-09 Thread Mikhail Khludnev
Hello,

I suppose there is a clue over there:
http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
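The short version (a sketch, assuming Solr 5.3+ as in the post; type_s
follows the naming in your example, and price_i stands in for whatever the
numeric child price field is called): let the child query score by the price
itself, take the minimum per parent, and sort by score:

q={!parent which=type_s:product score=min v='+type_s:price +{!func}price_i'}
sort=score asc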

On Wed, Mar 9, 2016 at 6:51 PM, Jhon Smith  wrote:

> There are normal documents (products) and nested documents containing
> different prices.
> How do I sort product documents based on the minimum price in the nested
> documents?
>
> Example:
> <doc>
>   <field name="id">1</field>
>   <field name="type_s">product</field>
>   <doc>
>     <field name="id">2</field>
>     <field name="type_s">price</field>
>     <field name="price_i">100</field>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>     <field name="type_s">price</field>
>     <field name="price_i">200</field>
>   </doc>
> </doc>
> <doc>
>   <field name="id">4</field>
>   <field name="type_s">product</field>
>   <doc>
>     <field name="id">5</field>
>     <field name="type_s">price</field>
>     <field name="price_i">300</field>
>   </doc>
>   <doc>
>     <field name="id">6</field>
>     <field name="type_s">price</field>
>     <field name="price_i">50</field>
>   </doc>
> </doc>
>
> So
> product with id=1 has prices 100 and 200: minimum price = 100
> product with id=4 has prices 300 and 50: minimum price = 50
> Hence sorting in ascending order should return the second document (id=4)
> first and the first document (id=1) next.
>
> Denormalization, or storing the min price in the product document itself, is
> not an option since the actual structure and requirements are more complex.
>
> I guess some function query should be used somehow?
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Clarification on +, and in edismax parser

2016-03-09 Thread Anil
Hi,

I am using edismax query parser for my solr search.

I believe '+' and 'and' should work similarly.

ex: "google"+"india", "google" and "india" should return the same number of
results.

Correct me if I am wrong. Thanks.

Regards,
Anil


Re: Query behavior.

2016-03-09 Thread Shawn Heisey
On 3/9/2016 10:55 PM, Shawn Heisey wrote:
> The ~2 syntax, when not attached to a phrase query (quotes) is the way
> you express a fuzzy query. If it's attached to a query in quotes, then
> it is a proximity query. I'm not sure whether it means something
> different when it's attached to a query clause in parentheses, someone
> with more knowledge will need to comment.

> https://issues.apache.org/jira/browse/SOLR-8812

After I read SOLR-8812 more closely, it seems that the ~2 syntax with
parentheses is the way that the effective mm value is expressed for a
particular query clause in the parsed query.  I've learned something new
today.

Thanks,
Shawn



Re: Query behavior.

2016-03-09 Thread Shawn Heisey
On 3/9/2016 12:07 AM, Modassar Ather wrote:
> Kindly help me understand the parsing of the following query. I am using
> the edismax parser and Solr 5.5.0.
> q.op is set to AND and there is no explicit mm value set.
>
> fl:(java OR book) => "boost(+((fl:java fl:book)~2),int(val))"
>
> When the query has an explicit OR, why is the ~2 present in the parsed
> query?

The ~2 syntax, when not attached to a phrase query (quotes) is the way
you express a fuzzy query.  If it's attached to a query in quotes, then
it is a proximity query.  I'm not sure whether it means something
different when it's attached to a query clause in parentheses, someone
with more knowledge will need to comment.
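For illustration, the two established meanings:

solr~2           -> fuzzy term query, up to edit distance 2
"apache solr"~2  -> phrase (proximity) query with a slop of 2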

> How can I achieve the following?
> "boost(+((fl:java fl:book)),int(val))"
>
> The reason being that the ANDed and ORed queries both return the same
> number of documents, but the expectation is that the ORed query should
> return more documents.

Normally I would say that if you get the same number of documents with
both "AND" & "OR" then it means that every document that contains "java"
also contains "book" ... but since you are running version 5.5.0, there
is a bug report that describes what you are seeing:

https://issues.apache.org/jira/browse/SOLR-8812

Thanks,
Shawn



Re: Query behavior.

2016-03-09 Thread Modassar Ather
Hi,

A suggestion will be very helpful.

Thanks,
Modassar

On Wed, Mar 9, 2016 at 12:37 PM, Modassar Ather 
wrote:

> Hi,
>
> Kindly help me understand the parsing of the following query. I am using
> the edismax parser and Solr 5.5.0.
> q.op is set to AND and there is no explicit mm value set.
>
> fl:(java OR book) => "boost(+((fl:java fl:book)~2),int(val))"
>
> When the query has an explicit OR, why is the ~2 present in the parsed
> query?
>
> How can I achieve the following?
> "boost(+((fl:java fl:book)),int(val))"
>
> The reason being that the ANDed and ORed queries both return the same
> number of documents, but the expectation is that the ORed query should
> return more documents.
>
> Thanks,
> Modassar
>


Re: Multiple custom Similarity implementations

2016-03-09 Thread Parvesh Garg
Thanks Markus. We will look at other options. May I ask what the reasons
are for this never being supported?


Parvesh Garg,

http://www.zettata.com
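For reference, the multiple-fields route Markus describes below relies on
per-field similarity in schema.xml. A sketch (field type and class names are
illustrative; the global similarity must be solr.SchemaSimilarityFactory for
per-field overrides to apply):

<!-- global similarity: enables per-fieldType overrides -->
<similarity class="solr.SchemaSimilarityFactory"/>

<fieldType name="text_bm25" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <!-- this field type scores with BM25 instead of the default -->
  <similarity class="solr.BM25SimilarityFactory"/>
</fieldType>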

On Tue, Mar 8, 2016 at 8:59 PM, Markus Jelsma 
wrote:

> Hello, you can not change similarities per request, and this is likely
> never going to be supported for good reasons. You need multiple cores, or
> multiple fields with different similarity defined in the same core.
> Markus
>
> -Original message-
> > From:Parvesh Garg 
> > Sent: Tuesday 8th March 2016 5:36
> > To: solr-user@lucene.apache.org
> > Subject: Multiple custom Similarity implementations
> >
> > Hi,
> >
> > We have a requirement where we want to run an A/B test over multiple
> > Similarity implementations. Is it possible to define multiple similarity
> > tags in the schema.xml file and choose one using a URL parameter? We are
> using
> > solr 4.7
> >
> > Currently, we are planning to have different cores with different
> > similarity configured and split traffic based on core names. This is
> > leading to index duplication and unnecessary resource usage.
> >
> > Any help is highly appreciated.
> >
> > Parvesh Garg,
> >
> > http://www.zettata.com
> >
>


Re: Non-contigous terms in SuggestComponent

2016-03-09 Thread Zheng Lin Edwin Yeo
Hi Alfonso,

I think we can't totally escape from edismax. I guess we just have to try
to optimise the performance. I'm dealing with similar issues at the
moment.

Regards,
Edwin


On 3 March 2016 at 00:14, Alfonso Muñoz-Pomer Fuentes 
wrote:

> Hi Edwin.
>
> That was what I suspected, but I wanted to confirm. If we go down this
> route I’ll do some testing and post the results.
>
> We’re using 5.1 in production, but I’m testing with 5.4.1.
>
> The index has 40,891,287 documents and is 3.01 GB, so it’s not big at all.
>
> Many thanks,
> Alfonso
>
>
> On 01/03/2016 06:25, Zheng Lin Edwin Yeo wrote:
>
>>  From what I have experienced, the performance using edismax will be
>> slower.
>> It may not be that significant if your index size is small, but it will
>> get
>> more significant as your index size grows.
>>
>> By the way, which version of Solr are you using?
>>
>> Regards,
>> Edwin
>>
>>
>> On 29 February 2016 at 21:33, Alfonso Muñoz-Pomer Fuentes <
>> amu...@ebi.ac.uk>
>> wrote:
>>
>> Hi all.
>>>
>>> I’ve been reading through the Suggester component in Solr at
>>> https://cwiki.apache.org/confluence/display/solr/Suggester.
>>>
>>> I have a couple of questions regarding it which I haven’t been able to
>>> find the answer for in that page or anywhere else.
>>>
>>> Is there a way to get suggestions from non-contiguous terms using a
>>> SuggestComponent? E.g. let’s say we have a document which contains “The
>>> quick brown fox” in a text field, can it be configured so that a user can
>>> obtain that suggestion by typing “quick fox”?
>>>
>>> I know I can get this sort of results using edismax queries, so maybe I
>>> can set a request handler to do suggestions in this way instead than with
>>> SuggestComponent. What are the downsides performance-wise?
>>>
>>> Thank you in advance.
>>>
>>> --
>>> Alfonso Muñoz-Pomer Fuentes
>>> Software Engineer @ Expression Atlas Team
>>> European Bioinformatics Institute (EMBL-EBI)
>>> European Molecular Biology Laboratory
>>> Tel:+ 44 (0) 1223 49 2633
>>> Skype: amunozpomer
>>>
>>>
>>
> --
> Alfonso Muñoz-Pomer Fuentes
> Software Engineer @ Expression Atlas Team
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Tel:+ 44 (0) 1223 49 2633
> Skype: amunozpomer
>


Re: Timeout error during commit

2016-03-09 Thread Shawn Heisey
On 3/9/2016 6:10 PM, Steven White wrote:
> I'm indexing about 1 billion records (each is a small Solr doc, no more
> than 20 bytes).  The logic is basically as follows:
>
> while (data-of-1-billion) {
>     read 1000 items from the DB
>     for every 100 items, send them to Solr, i.e.:
>         solrConnection.add(docs);
> }
> solrConnection.commit();
>
> I'm seeing the following exception from SolrJ:
>
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at: http://localhost:8983/solr/test_data

> Which tells me it took Solr a bit over 5 sec. to complete the commit.
>
> Now when I created the Solr connection, I used 5 seconds like so:
>
> solrClient.setConnectionTimeout(5000);
>   solrClient.setSoTimeout(5000);
>
> Two questions:
>
> 1) Is the time out error because of my use of 5000?
> 2) Should I be calling "solrConnection.commit()" every now and then inside
> the loop?

Yes, this problem is happening because you set the SoTimeout value to 5
seconds.  This is an inactivity timeout on the TCP socket.  It's not
clear whether the problem happened on the commit operation or on the add
operation -- it could be either.

Your SoTimeout value should either remain unset, or should be set to
something *significantly* longer than you ever expect the request to
take.  I would suggest something between five and fifteen minutes.  I
use fifteen minutes.  This is long enough that it should only be reached
if there's a real problem, but short enough that my build program will
not hang indefinitely, and will have an opportunity to send me email to
tell me there's a problem.

I would suggest that you don't do *any* commits until the end of the
loop -- after all one billion docs have been indexed.  If you want to do
them in your loop, set up something that will do them far less
frequently, perhaps every 100 times through the loop.  You could include
a commitWithin parameter on the add request instead of sending actual
commits, which I would recommend you set to a fairly large value.  I
would use at least five minutes, but never less than one minute. 
Alternately, you could configure autoSoftCommit in your solrconfig.xml
file.  I would recommend a maxTime value on that config of at least five
minutes.

Also, consider increasing your batch size to something larger than 100
or 1000.  Use 10000 or more.  With 20 byte documents, you could send a
LOT of documents in each batch without worrying too much about memory.

Regardless of what else you do with commits, if you're running at least
Solr 4.0, your solrconfig.xml file should include an autoCommit section
configured with openSearcher set to false and a maxTime between one and
five minutes.
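To make that concrete, the loop from your message could look roughly like
this (a sketch against SolrJ 5.x; moreData()/nextDoc() are placeholders for
your DB reading code):

HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/test_data");
client.setConnectionTimeout(5000);        // connect timeout can stay short
client.setSoTimeout(15 * 60 * 1000);      // inactivity timeout: 15 minutes

List<SolrInputDocument> batch = new ArrayList<>();
while (moreData()) {
    batch.add(nextDoc());
    if (batch.size() >= 10000) {          // bigger batches, fewer requests
        client.add(batch, 5 * 60 * 1000); // commitWithin = 5 minutes
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    client.add(batch, 5 * 60 * 1000);
}
client.commit();                          // single hard commit at the end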

By now, I hope you've seen a recommendation to read this blog post:

http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks,
Shawn



Timeout error during commit

2016-03-09 Thread Steven White
Hi folks,

I'm indexing about 1 billion records (each is a small Solr doc, no more
than 20 bytes).  The logic is basically as follows:

while (data-of-1-billion) {
    read 1000 items from the DB
    for every 100 items, send them to Solr, i.e.:
        solrConnection.add(docs);
}
solrConnection.commit();

I'm seeing the following exception from SolrJ:

org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at: http://localhost:8983/solr/test_data

Looking at Solr's log, I see this:

INFO  - 2016-01-15 21:15:34.836; [   test_data]
org.apache.solr.update.processor.LogUpdateProcessor; [test_data]
webapp=/solr path=/update params={wt=xml&version=2.2} {add=[... (101
adds)]} 0 5172

Which tells me it took Solr a bit over 5 sec. to complete the commit.

Now when I created the Solr connection, I used 5 seconds like so:

solrClient.setConnectionTimeout(5000);
  solrClient.setSoTimeout(5000);

Two questions:

1) Is the time out error because of my use of 5000?
2) Should I be calling "solrConnection.commit()" every now and then inside
the loop?

Thanks

Steve


Re: ngrams with position

2016-03-09 Thread Alessandro Benedetti
If you store the positions for your tokens (and they are stored by default
if you don't omit them), you have the relative position in the index. [1]
I attach a blog post of mine describing the Lucene internals in a bit more
detail.

Apart from that, can you explain the problem you are trying to solve?
The high-level user experience?
What kind of search/autocompletion/relevancy tuning are you trying to
achieve?
Maybe we can help better if we start from the problem :)
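As an aside, the custom filter Emir suggested (quoted below) could be
sketched roughly like this. This is a hypothetical class, not anything
shipped with Lucene/Solr, written against the Lucene 4.x TokenStream API;
it reproduces the __a0, _am1, ams2, ... output from your first mail:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Pads each token with n-1 leading and one trailing '_' and emits every
// n-gram with its start offset appended, e.g. "amsterdam" (n=3) becomes
// __a0, _am1, ams2, mst3, ste4, ter5, erd6, rda7, dam8, am_9.
public final class PositionedNGramFilter extends TokenFilter {
  private final int n;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posAtt = addAttribute(PositionIncrementAttribute.class);
  private String padded; // padded form of the current source token, or null
  private int gram;      // start offset of the next gram to emit

  public PositionedNGramFilter(TokenStream input, int n) {
    super(input);
    this.n = n;
  }

  @Override
  public boolean incrementToken() throws IOException {
    while (true) {
      if (padded == null) {
        if (!input.incrementToken()) {
          return false;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < n; i++) sb.append('_');
        sb.append(termAtt).append('_');
        padded = sb.toString();
        gram = 0;
      }
      if (gram + n <= padded.length()) {
        termAtt.setEmpty().append(padded.substring(gram, gram + n)).append(Integer.toString(gram));
        // simplification: all grams of one source token share its position
        posAtt.setPositionIncrement(gram == 0 ? 1 : 0);
        gram++;
        return true;
      }
      padded = null; // token exhausted, pull the next one from the input
    }
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    padded = null;
  }
}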

Cheers

[1]
http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html

On 9 March 2016 at 15:02, elisabeth benoit 
wrote:

> Hello Alessandro,
>
> You may be right. What would you use to keep relative order between, for
> instance, grams
>
> __a
> _am
> ams
> mst
> ste
> ter
> erd
> rda
> dam
> am_
>
> of amsterdam? pf2 and pf3? That's all I can think about. Please let me know
> if you have more insights.
>
> Best regards,
> Elisabeth
>
> 2016-03-08 17:46 GMT+01:00 Alessandro Benedetti :
>
> > Elisabeth,
> > out of curiosity, could we know what you are trying to solve with that
> > complex way of tokenisation?
> > Solr is really good at storing positions along with tokens, so I am
> > curious to know why you are mixing things up.
> >
> > Cheers
> >
> > On 8 March 2016 at 10:08, elisabeth benoit 
> > wrote:
> >
> > > Thanks for your answer Emir,
> > >
> > > I'll check that out.
> > >
> > > Best regards,
> > > Elisabeth
> > >
> > > 2016-03-08 10:24 GMT+01:00 Emir Arnautovic <
> emir.arnauto...@sematext.com
> > >:
> > >
> > > > Hi Elisabeth,
> > > > I don't think there is such a token filter, so you would have to create
> > > > your own token filter that takes a token and emits ngram tokens of a
> > > > specific length.
> > > > It should not be too hard to create such a filter - you can take a look
> > > > at how the ngram filter is coded - yours should be simpler than that.
> > > >
> > > > Regards,
> > > > Emir
> > > >
> > > >
> > > > On 08.03.2016 08:52, elisabeth benoit wrote:
> > > >
> > > >> Hello,
> > > >>
> > > >> I'm using solr 4.10.1. I'd like to index words with ngrams of fixed
> > > >> length with a position at the end.
> > > >>
> > > >> For instance, with fix lenght 3, Amsterdam would be something like:
> > > >>
> > > >>
> > > >> __a0 (two spaces added at the beginning)
> > > >> _am1
> > > >> ams2
> > > >> mst3
> > > >> ste4
> > > >> ter5
> > > >> erd6
> > > >> rda7
> > > >> dam8
> > > >> am_9 (one more space at the end)
> > > >>
> > > >> The number at the end being the position.
> > > >>
> > > >> Does anyone have a clue how to achieve this?
> > > >>
> > > >> Best regards,
> > > >> Elisabeth
> > > >>
> > > >>
> > > > --
> > > > Monitoring * Alerting * Anomaly Detection * Centralized Log
> Management
> > > > Solr & Elasticsearch Support * http://sematext.com/
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: XJoin, a way to use external data sources with Solr

2016-03-09 Thread TomW
Hi Zisis,

I think I remember considering using a PostFilter; however, whilst that
seems to be fine for a simple filter, what we wanted to do with XJoin was
not only to filter but also to pull in fields from the external data and
incorporate them into the matching documents in the Solr result set.  Once
we'd done that, it was then fairly easy to extend this to allow boosting on
an external score (instead of a filter, or as well as one) and other neat
features.

Hopefully that answers your questions, if not, fire away with some more!

Cheers,
Tom W (Flax)





Re: JSON FACET API - multiselect

2016-03-09 Thread Jay Potharaju
Actually, there was a problem with my data - I found my error.
Thanks


On Wed, Mar 9, 2016 at 9:24 AM, Jay Potharaju  wrote:

> Hi,
> I am using solr 5.4 and testing the multi select JSON facet feature.
> When I select one value, the number of results is the same as the count for
> that facet. But when I select more than one facet value, the number of
> results returned is not correct.
>
> *Single Facet selected*
> fq: [
> "{!tag=FIRSTLETTER}facet_firstLetter_lastname:(Q)"
> ],
> json.facet.name: "{type:terms,field:facet_firstLetter_lastname,
> sort:{count:desc}}"
> response: {
> numFound: 540,
> start: 0,
> docs: [ ]
> },
> facets: {
> count: 5246,
> name: {
> buckets: [
> {
> val: "Q",
> count: 540
> },
> {
> val: "X",
> count: 1302
> },
> {
> val: "J",
> count: 4718
> },
> {
> val: "Z",
> count: 7242
> },
> {
> val: "V",
> count: 9089
> },
> {
> val: "F",
> count: 10053
> },
> {
> val: "P",
> count: 14966
> },
> {
> val: "Y",
> count: 18520
> },
> {
> val: "W",
> count: 20781
> },
> {
> val: "G",
> count: 21935
> }
> ]
> }
> }
>
> *Multi-select facet*
> fq: [
> "{!tag=FIRSTLETTER}facet_firstLetter_lastname:(Q J)"
> ],
>
> response: {
> numFound: 5246,
> start: 0,
> docs: [ ]
> }
>
> I was expecting the response count to be 540 + 4718 = 5258 but the
> response is 5246.
>
> Can someone comment on this?
>
> --
> Thanks
> Jay
>
>



-- 
Thanks
Jay Potharaju


JSON FACET API - multiselect

2016-03-09 Thread Jay Potharaju
Hi,
I am using solr 5.4 and testing the multi select JSON facet feature.
When I select one value, the number of results is the same as the count for
that facet. But when I select more than one facet value, the number of
results returned is not correct.

*Single Facet selected*
fq: [
"{!tag=FIRSTLETTER}facet_firstLetter_lastname:(Q)"
],
json.facet.name: "{type:terms,field:facet_firstLetter_lastname,
sort:{count:desc}}"
response: {
numFound: 540,
start: 0,
docs: [ ]
},
facets: {
count: 5246,
name: {
buckets: [
{
val: "Q",
count: 540
},
{
val: "X",
count: 1302
},
{
val: "J",
count: 4718
},
{
val: "Z",
count: 7242
},
{
val: "V",
count: 9089
},
{
val: "F",
count: 10053
},
{
val: "P",
count: 14966
},
{
val: "Y",
count: 18520
},
{
val: "W",
count: 20781
},
{
val: "G",
count: 21935
}
]
}
}

*Multi-select facet*
fq: [
"{!tag=FIRSTLETTER}facet_firstLetter_lastname:(Q J)"
],

response: {
numFound: 5246,
start: 0,
docs: [ ]
}

I was expecting the response count to be 540 + 4718 = 5258 but the response
is 5246.

Can someone comment on this?

-- 
Thanks
Jay


Re: Solrcloud Batch Indexing

2016-03-09 Thread Bin Wang
Hi Eric,

I have done a benchmark writing directly to SolrCloud running on my MacBook
using SolrJ.
In a nutshell, the best indexing speed is *12K* dps (documents per second)
with an optimized batch size.
You can find more detail and my source code here. It is just a laptop and
the computing power is limited.

The second step for me will be to have multiple processes writing to
SolrCloud in parallel.
Timothy Potter has done a benchmark (page 13) on AWS before, and he reached
an indexing speed of *121K* dps.

You are right - it looks like if we fine-tune the number of records per
batch, we might just be able to index all the data fast enough.
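A rough sketch of that second step (placeholder names throughout; a single
CloudSolrClient is thread-safe and can be shared by all writer threads, and
the ZK address/collection are assumptions):

CloudSolrClient client = new CloudSolrClient("localhost:9983");
client.setDefaultCollection("collection1");
ExecutorService pool = Executors.newFixedThreadPool(8);
for (int t = 0; t < 8; t++) {
    pool.submit(() -> {
        List<SolrInputDocument> batch;
        // nextBatch() stands in for whatever feeds ~1000-doc batches
        while ((batch = nextBatch()) != null) {
            client.add(batch);
        }
        return null;
    });
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.HOURS);
client.commit();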

will keep you posted.

Bin

On Mon, Mar 7, 2016 at 3:40 PM, Erick Erickson 
wrote:

> Bin:
>
> The MRIT/Morphlines only makes sense if you have lots more
> nodes devoted to the M/R jobs than you do Solr shards since the
> actual work done to index a given doc is exactly the same either
> with MRIT/Morphlines or just sending straight to Solr.
>
> A bit of background here. I mentioned that MRIT/Morphlines uses
> EmbeddedSolrServer. This is exactly Solr as far as the actual indexing
> is concerned. So using --go-live is not buying you anything and, in fact,
> is costing you quite a bit over just using <2> to index directly to Solr
> since
> the index has to be copied around. I confess I'm surprised that --go-live
> is taking that long. Basically it's just copying your index up to Solr, so
> perhaps there's an I/O problem or some such.
>
> OK, I'm lying a little bit here, _if_ you have more than one replica per
> shard, then indexing straight to Solr will cost you (anecdotally)
> 10-15% in indexing speed. But if this is a single replica/shard (i.e.
> leader-only), then it's near enough to being the exact same.
>
> Anyway, at the end of the day, the index produced is self-contained.
> You could even just copy it to your shards (with Solr down), and then
> bring up your Solr nodes on a non-HDFS-based Solr.
>
> But frankly I'd avoid that and benchmark on <2> first. My expectation
> is that you'll be fine there and see indexing roughly on par with your
> MRIT/Morphlines.
>
> Now, all that said, indexing 300M docs in 'a few minutes' is a bit
> surprising.
> I'm really wondering if you're not being fooled by something "odd". Have
> you compared the identical runs with and without --go-live?
>
> _Very_ often, the bottleneck isn't Solr at all, it's the data acquisition,
> so be
> careful when measuring that the Solr CPU's are pegged... otherwise
> you're bottlenecking upstream of Solr. A super-simple way to figure that
> out is to comment out the solrServer.add(list, 1) line in <2> or just
> run MRIT/Morphlines without the --go-live switch.
>
> BTW, with <2> you could run with as many jobs as you wanted to run
> the Solr servers flat-out.
>
> FWIW,
> Erick
>
> On Mon, Mar 7, 2016 at 1:14 PM, Bin Wang  wrote:
> > Hi Erick,
> >
> > Thanks for your quick response.
> >
> > From the data's perspective, we have 300+ million rows and, believe it or
> > not, the source data is from a relational database (Hive) and the database
> > is rebuilt every day (I am as frustrated as most of you who read this, but
> > it is what it is), and we potentially need to store all of the fields.
> > In this case, I have to figure out a solution to index 300+ million rows
> > as fast as I can.
> >
> > I am still at a stage evaluating all the different solutions, and I am
> > sorry that I haven't really benchmarked the second approach yet.
> > I will find a time to run some benchmark and share the result with the
> > community.
> >
> > Regarding the approach that I suggested - map-reducing Lucene indexes - do
> > you think it is feasible, and is it worth the effort to dive into?
> >
> > Best regards,
> >
> > Bin
> >
> >
> >
> > On Mon, Mar 7, 2016 at 1:57 PM, Erick Erickson 
> > wrote:
> >
> >> I'm wondering if you need map reduce at all ;)...
> >>
> >> The achilles heel with M/R viz: Solr is all the copying around
> >> that's done at the end of the cycle. For really large bulk indexing
> >> jobs, that's a reasonable price to pay..
> >>
> >> How many docs and how would you characterize them as far
> >> as size, fields, etc? And what are your time requirements? What
> >> kind of docs?
> >>
> >> I'm thinking this may be an "XY Problem". You're asking about
> >> a specific solution before explaining the problem.
> >>
> >> Why do you say that Solr is not really optimized for bulk loading?
> >> I took a quick look at <2> and the approach is sound. It batches
> >> up the docs in groups of 1,000 and uses CloudSolrServer as it should.
> >> Have you tried it? At the end of the day, MapReduceIndexerTool does
> >> the same work to index a doc as a regular Solr server would via
> >> EmbeddedSolrServer so if the number of 

Re: How to pass facet info of the top inner nested doc to the top parent doc

2016-03-09 Thread Binoy Dalal
I think you are looking for the facet.query method.
To get your child doc facets, append facet.query parameters to your request.

On Wed, 9 Mar 2016, 21:39 Jhon Smith,  wrote:

> There are 3 levels of nested docs: parent -> middle -> child.
>
> E.g.
> <doc>
>   <field name="id">9</field>
>   <field name="type_s">parent</field>
>   <doc>
>     <field name="id">10</field>
>     <field name="type_s">middle</field>
>     <field name="BRAND_s">Nike</field>
>     <doc>
>       <field name="id">11</field>
>       <field name="COLOR_s">Red</field>
>       <field name="SIZE_s">XL</field>
>     </doc>
>     <doc>
>       <field name="id">12</field>
>       <field name="COLOR_s">Blue</field>
>       <field name="SIZE_s">XL</field>
>     </doc>
>   </doc>
> </doc>
>
> If I retrieve middle docs with q={!parent
> which=type_s:middle}...&child.facet.field=SIZE_s
> then facets work fine (in the latest solr): XL(1)
>
> But I want to retrieve the top parent documents (type_s:parent) while still
> having facet info about SIZE_s from the child documents. How to do that?
>
>
> --
Regards,
Binoy Dalal


How to pass facet info of the top inner nested doc to the top parent doc

2016-03-09 Thread Jhon Smith
There are 3 levels of nested docs: parent -> middle -> child.

E.g.

<doc>
  <field name="id">9</field>
  <field name="type_s">parent</field>
  <doc>
    <field name="id">10</field>
    <field name="type_s">middle</field>
    <field name="BRAND_s">Nike</field>
    <doc>
      <field name="id">11</field>
      <field name="COLOR_s">Red</field>
      <field name="SIZE_s">XL</field>
    </doc>
    <doc>
      <field name="id">12</field>
      <field name="COLOR_s">Blue</field>
      <field name="SIZE_s">XL</field>
    </doc>
  </doc>
</doc>

If I retrieve middle docs with q={!parent
which=type_s:middle}...&child.facet.field=SIZE_s
then facets work fine (in the latest solr): XL(1)

But I want to retrieve the top parent documents (type_s:parent) while still
having facet info about SIZE_s from the child documents. How do I do that?




Re: Failed to set SSL solr 5.2.1 Windows OS

2016-03-09 Thread Steve Rowe
So, did you try converting the backslashes to forward slashes?
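For illustration, with the paths from your earlier mail that would be:

SOLR_SSL_KEY_STORE=C:/solr-5.2.1/server/etc/solr-ssl.keystore.jks
SOLR_SSL_TRUST_STORE=C:/solr-5.2.1/server/etc/solr-ssl.keystore.jks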

You could try to increase logging to get more information.


Can you provide a larger snippet of your log around the error?

Sounds like at a minimum Solr could do better at reporting errors 
locating/loading SSL stores.

Yes, the files in server/etc are being used in Solr 5.2.1.

--
Steve
www.lucidworks.com

> On Mar 9, 2016, at 2:14 AM, Ilan Schwarts  wrote:
> 
> How would one try to solve this issue? What would you suggest I do?
> Debug that module? I will first try just installing a clean Jetty with SSL.
> 
> Another question: are the files jetty.xml and jetty-ssl.xml and the rest of
> the files in /etc being used in Solr 5.2.1?
> On Mar 9, 2016 12:08 AM, "Steve Rowe"  wrote:
> 
>> Hmm, not sure what’s happening.  Have you tried converting the backslashes
>> in your paths to forward slashes?
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Mar 8, 2016, at 3:39 PM, Ilan Schwarts  wrote:
>>> 
>>> Hi, thanks for the reply.
>>> I am using solr.in.cmd
>>> I even put some pauses in the cmd with echo to see that the parameters are
>> OK. This is the original file as found in
>> https://www.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.zip
>>> 
>>> 
>>> 
>>> On Tue, Mar 8, 2016 at 10:25 PM, Steve Rowe  wrote:
>>> Hi Ilan,
>>> 
>>> Looks like you’re modifying solr.in.sh instead of solr.in.cmd?
>>> 
>>> FYI running under Cygwin is not supported.
>>> 
>>> --
>>> Steve
>>> www.lucidworks.com
>>> 
 On Mar 8, 2016, at 11:51 AM, Ilan Schwarts  wrote:
 
 Hi all, I am trying to integrate solr with SSL on Windows 7 OS
 I followed the enable ssl guide at
 https://cwiki.apache.org/confluence/display/solr/Enabling+SSL
 
 I created the keystore and placed it in the etc folder. I un-commented the
 lines and set:
 SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
 SOLR_SSL_KEY_STORE_PASSWORD=password
 SOLR_SSL_TRUST_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks
 SOLR_SSL_TRUST_STORE_PASSWORD=password
 SOLR_SSL_NEED_CLIENT_AUTH=false
 
 When I test the keystore using
 keytool -list -alias solr-ssl -keystore
 C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks -storepass password
 -keypass password
 it is okay, and it prints that there is 1 entry in the keystore.
 
 When I run it from Solr, it writes:
 "Keystore was tampered with, or password was incorrect"
 I get this exception after
>> JavaKeyStore.engineLoad(JavaKeyStore.java:780)
 
 
 If I replace
 SOLR_SSL_KEY_STORE=C:\solr-5.2.1\server\etc\solr-ssl.keystore.jks with
 SOLR_SSL_KEY_STORE=NOTHING_REALISTIC
 it writes the same error; I suspect I am not delivering the path as it
 should be.
 
 Any suggestions ?
 
 Thanks
 
 
 --
 
 
 -
 Ilan Schwarts
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> 
>>> -
>>> Ilan Schwarts
>> 
>> 



How to sort docs based on nested docs' fields

2016-03-09 Thread Jhon Smith
There are normal documents (products) and nested documents containing
different prices.
How do I sort product documents based on the minimum price in the nested
documents?

Example:

<doc>
  <field name="id">1</field>
  <field name="type_s">product</field>
  <doc>
    <field name="id">2</field>
    <field name="type_s">price</field>
    <field name="price_i">100</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="type_s">price</field>
    <field name="price_i">200</field>
  </doc>
</doc>
<doc>
  <field name="id">4</field>
  <field name="type_s">product</field>
  <doc>
    <field name="id">5</field>
    <field name="type_s">price</field>
    <field name="price_i">300</field>
  </doc>
  <doc>
    <field name="id">6</field>
    <field name="type_s">price</field>
    <field name="price_i">50</field>
  </doc>
</doc>

So 
product with id=1 has prices 100 and 200: minimum price = 100
product with id=4 has prices 300 and 50: minimum price = 50
Hence sorting in ascending order should return the second document (id=4)
first and the first document (id=1) next.

Denormalization, or storing the min price in the product document itself, is
not an option since the actual structure and requirements are more complex.

I guess some function query should be used somehow?


Re: ngrams with position

2016-03-09 Thread elisabeth benoit
Hello Alessandro,

You may be right. What would you use to keep relative order between, for
instance, grams

__a
_am
ams
mst
ste
ter
erd
rda
dam
am_

of amsterdam? pf2 and pf3? That's all I can think about. Please let me know
if you have more insights.

Best regards,
Elisabeth

2016-03-08 17:46 GMT+01:00 Alessandro Benedetti :

> Elisabeth,
> out of curiosity, could we know what you are trying to solve with that
> complex way of tokenisation?
> Solr is really good at storing positions along with tokens, so I am curious
> to know why you are mixing things up.
>
> Cheers
>
> On 8 March 2016 at 10:08, elisabeth benoit 
> wrote:
>
> > Thanks for your answer Emir,
> >
> > I'll check that out.
> >
> > Best regards,
> > Elisabeth
> >
> > 2016-03-08 10:24 GMT+01:00 Emir Arnautovic  >:
> >
> > > Hi Elisabeth,
> > > I don't think there is such a token filter, so you would have to create
> > > your own token filter that takes a token and emits ngram tokens of a
> > > specific length.
> > > It should not be too hard to create such a filter - you can take a look
> > > at how the ngram filter is coded - yours should be simpler than that.
> > >
> > > Regards,
> > > Emir
> > >
> > >
> > > On 08.03.2016 08:52, elisabeth benoit wrote:
> > >
> > >> Hello,
> > >>
> > >> I'm using solr 4.10.1. I'd like to index words with ngrams of fixed
> > >> length with a position at the end.
> > >>
> > >> For instance, with fix lenght 3, Amsterdam would be something like:
> > >>
> > >>
> > >> __a0 (two spaces added at the beginning)
> > >> _am1
> > >> ams2
> > >> mst3
> > >> ste4
> > >> ter5
> > >> erd6
> > >> rda7
> > >> dam8
> > >> am_9 (one more space at the end)
> > >>
> > >> The number at the end being the position.
> > >>
> > >> Does anyone have a clue how to achieve this?
> > >>
> > >> Best regards,
> > >> Elisabeth
> > >>
> > >>
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Stopping Solr JVM on OOM

2016-03-09 Thread Shawn Heisey
On 3/9/2016 6:07 AM, Binoy Dalal wrote:
> Have you had a chance to check and review the patch?

I have not.  I will look at it sometime today, probably later this
evening (UTC-7 timezone).

Thanks,
Shawn



Re: Duplicate Document IDs when updating parent document with child document

2016-03-09 Thread Mikhail Khludnev
Hello Sebastian,

Mixing standalone docs and blocks doesn't work; there are plenty of issues
open about it. To update a block you need to re-send the whole block (the
parent together with all of its children) in a single add.

On Wed, Mar 9, 2016 at 3:02 PM, Sebastian Riemer 
wrote:

> Hi,
>
> to actually describe my problem in short, instead of just linking to the
> test applicaton, using SolrJ I do the following:
>
> 1) Create a new document as a parent and commit
> SolrInputDocument parentDoc = new SolrInputDocument();
> parentDoc.addField("id", "parent_1");
> parentDoc.addField("name_s", "Sarah Connor");
> parentDoc.addField("blockJoinId", "1");
> solrClient.add(parentDoc);
> solrClient.commit();
>
> 2) Create a new document with the same unique-id as in 1) with a child
> document appended
> SolrInputDocument parentDocUpdateing = new SolrInputDocument();
> parentDocUpdateing.addField("id", "parent_1");
> parentDocUpdateing.addField("name_s", "Sarah Connor");
> parentDocUpdateing.addField("blockJoinId", "1");
>
> SolrInputDocument childDoc = new SolrInputDocument();
> childDoc.addField("id", "child_1");
> childDoc.addField("name_s", "John Connor");
> childDoc.addField("blockJoinId", "1");
>
> parentDocUpdateing.addChildDocument(childDoc);
> solrClient.add(parentDocUpdateing);
> solrClient.commit();
>
> 3) Results in 2 Documents with id="parent_1" in solr index
>
> Is this normal behaviour? I thought the existing document should be
> updated instead of generating a new document with the same id.
>
> For a full working test application please see orginal message.
>
> Best regards,
> Sebastian
>
> -Original Message-
> From: Sebastian Riemer [mailto:s.rie...@littera.eu]
> Sent: Tuesday, March 8, 2016 20:05
> To: solr-user@lucene.apache.org
> Subject: Duplicate Document IDs when updating parent document with child
> document
>
> Hi,
>
> I have created a simple Java application which illustrates this issue.
>
> I am using Solr-Version 5.5.0 and SolrJ.
>
> Here is a link to the github repository:
> https://github.com/sebastianriemer/SolrDuplicateTest
>
> The issue I am facing is also described by another person on
> stackoverflow:
> http://stackoverflow.com/questions/34253178/solr-doesnt-overwrite-duplicated-uniquekey-entries
>
> I would love if any of you could run the test at your place and give me
> feedback.
>
> If you have any questions do not hesitate to write me.
>
> Many thanks in advance and best regards,
>
> Sebastian Riemer
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Stopping Solr JVM on OOM

2016-03-09 Thread Binoy Dalal
Hi Shawn,
Have you had a chance to check and review the patch?

On Wed, 9 Mar 2016, 00:49 Binoy Dalal,  wrote:

> I've uploaded both files.
> Please review and advise.
>
> On Wed, Mar 9, 2016 at 12:46 AM Binoy Dalal 
> wrote:
>
>> Hi Shawn,
>> The JIRA issue is SOLR-8803 (
>> https://issues.apache.org/jira/browse/SOLR-8803).
>> I've used "git diff" and created a patch but it only has the changes that
>> I made to the solr.cmd file under bin to add the -XX:OnOutOfMemoryError
>> option.
>> There's also the entire file of the actual OOM kill script, which does not
>> show up in the patch.
>> Do I upload this file along with the patch, or is there something else I
>> have to do to include the new file?
>> Please advise.
>>
>> Thanks.
>>
>>
>> On Tue, Mar 8, 2016 at 7:03 PM Shawn Heisey  wrote:
>>
>>> On 3/8/2016 5:13 AM, Binoy Dalal wrote:
>>> > I've just finished writing a batch oom killer script and it seems to
>>> work
>>> > fine.
>>> >
>>> > I couldn't try it on the actual solr process since I'm a bit stumped
>>> on how
>>> > I can make solr throw an oom at will.
>>> > Although I did write another code that does throw an oom upon which
>>> this
>>> > script is called and the running solr process is killed.
>>> >
>>> > I would like to know how I should proceed from here with submitting the
>>> > code for review etc.
>>>
>>> Open an Improvement issue on the SOLR project in Apache's Jira with a
>>> title like "OOM killer for Windows" and a useful description.  Clone the
>>> source code from git, make your changes/additions.  Create a patch using
>>> "git diff" and upload it using SOLR-.patch as the filename -- the
>>> same name as the Jira issue.
>>>
>>> Making Solr OOM on purpose is possible, but it is usually better to
>>> write a small test program with an intentional memory leak.
>>>
>>> I wonder if we can write a test for OOM death.
>>>
>>> Thanks,
>>> Shawn
>>>
>>> --
>> Regards,
>> Binoy Dalal
>>
> --
> Regards,
> Binoy Dalal
>
-- 
Regards,
Binoy Dalal


Re: Disable hyper-threading for better Solr performance?

2016-03-09 Thread Ilan Schwarts
SolrCloud... faster disks... multiple cores on different physical disks
would help
On Mar 9, 2016 2:22 PM, "Vincenzo D'Amore"  wrote:

> Upgrading to Solr 5 should improve your indexing performance.
>
>
> http://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/
>
> On Wed, Mar 9, 2016 at 1:13 PM, Avner Levy  wrote:
>
> > Currently I'm using Solr 4.8.1 but I can move to another version if it
> > performs significantly faster.
> > My target is to reach the max indexing throughput possible on the
> machine.
> > Since it seems the indexing process is CPU bound, I was wondering whether
> > 32 logical cores with twice as many indexing threads will perform better.
> > Thanks,
> >  Avner
> >
> > -Original Message-
> > From: Ilan Schwarts [mailto:ila...@gmail.com]
> > Sent: Wednesday, March 09, 2016 9:09 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Disable hyper-threading for better Solr performance?
> >
> > What is the solr version and shard config? Standalone? Multiple cores?
> > Spread over RAID ?
> > On Mar 9, 2016 9:00 AM, "Avner Levy"  wrote:
> >
> > > I have a machine with 16 real cores (32 with HT enabled).
> > > I'm running on it a Solr server and trying to reach maximum
> > > performance for indexing and queries (indexing 20k documents/sec by a
> > > number of threads).
> > I've read in multiple places that in some scenarios / products
> > disabling hyper-threading may result in better performance.
> > > I'm looking for inputs / insights about HT on Solr setups.
> > > Thanks in advance,
> > >   Avner
> > >
> >
> >
> > Email secured by Check Point
> >
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


Re: Disable hyper-threading for better Solr performance?

2016-03-09 Thread Vincenzo D'Amore
Upgrading to Solr 5 should improve your indexing performance.

http://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/

On Wed, Mar 9, 2016 at 1:13 PM, Avner Levy  wrote:

> Currently I'm using Solr 4.8.1 but I can move to another version if it
> performs significantly faster.
> My target is to reach the max indexing throughput possible on the machine.
> Since it seems the indexing process is CPU bound, I was wondering whether
> 32 logical cores with twice as many indexing threads will perform better.
> Thanks,
>  Avner
>
> -Original Message-
> From: Ilan Schwarts [mailto:ila...@gmail.com]
> Sent: Wednesday, March 09, 2016 9:09 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Disable hyper-threading for better Solr performance?
>
> What is the solr version and shard config? Standalone? Multiple cores?
> Spread over RAID ?
> On Mar 9, 2016 9:00 AM, "Avner Levy"  wrote:
>
> > I have a machine with 16 real cores (32 with HT enabled).
> > I'm running on it a Solr server and trying to reach maximum
> > performance for indexing and queries (indexing 20k documents/sec by a
> > number of threads).
> > I've read in multiple places that in some scenarios / products
> > disabling hyper-threading may result in better performance.
> > I'm looking for inputs / insights about HT on Solr setups.
> > Thanks in advance,
> >   Avner
> >
>
>
> Email secured by Check Point
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


RE: Disable hyper-threading for better Solr performance?

2016-03-09 Thread Avner Levy
Currently I'm using Solr 4.8.1 but I can move to another version if it performs 
significantly faster.
My target is to reach the max indexing throughput possible on the machine.
Since it seems the indexing process is CPU bound, I was wondering whether 32
logical cores with twice as many indexing threads will perform better.
Thanks,
 Avner

-Original Message-
From: Ilan Schwarts [mailto:ila...@gmail.com] 
Sent: Wednesday, March 09, 2016 9:09 AM
To: solr-user@lucene.apache.org
Subject: Re: Disable hyper-threading for better Solr performance?

What is the solr version and shard config? Standalone? Multiple cores?
Spread over RAID ?
On Mar 9, 2016 9:00 AM, "Avner Levy"  wrote:

> I have a machine with 16 real cores (32 with HT enabled).
> I'm running on it a Solr server and trying to reach maximum 
> performance for indexing and queries (indexing 20k documents/sec by a 
> number of threads).
> I've read in multiple places that in some scenarios / products
> disabling hyper-threading may result in better performance.
> I'm looking for inputs / insights about HT on Solr setups.
> Thanks in advance,
>   Avner
>


Email secured by Check Point


AW: Duplicate Document IDs when updateing parent document with child document

2016-03-09 Thread Sebastian Riemer
Hi,

to actually describe my problem in short, instead of just linking to the test 
applicaton, using SolrJ I do the following:

1) Create a new document as a parent and commit
SolrInputDocument parentDoc = new SolrInputDocument();
parentDoc.addField("id", "parent_1");
parentDoc.addField("name_s", "Sarah Connor");
parentDoc.addField("blockJoinId", "1");
solrClient.add(parentDoc);
solrClient.commit();

2) Create a new document with the same unique-id as in 1) with a child document 
appended
SolrInputDocument parentDocUpdateing = new SolrInputDocument();
parentDocUpdateing.addField("id", "parent_1");
parentDocUpdateing.addField("name_s", "Sarah Connor");
parentDocUpdateing.addField("blockJoinId", "1");

SolrInputDocument childDoc = new SolrInputDocument();
childDoc.addField("id", "child_1");
childDoc.addField("name_s", "John Connor");
childDoc.addField("blockJoinId", "1");

parentDocUpdateing.addChildDocument(childDoc);
solrClient.add(parentDocUpdateing);
solrClient.commit();

3) Results in 2 Documents with id="parent_1" in solr index

Is this normal behaviour? I thought the existing document should be updated 
instead of generating a new document with the same id.

For a full working test application please see orginal message.

Best regards,
Sebastian

-Original Message-
From: Sebastian Riemer [mailto:s.rie...@littera.eu]
Sent: Tuesday, March 8, 2016 20:05
To: solr-user@lucene.apache.org
Subject: Duplicate Document IDs when updating parent document with child
document

Hi,

I have created a simple Java application which illustrates this issue.

I am using Solr-Version 5.5.0 and SolrJ.

Here is a link to the github repository: 
https://github.com/sebastianriemer/SolrDuplicateTest

The issue I am facing is also described by another person on stackoverflow: 
http://stackoverflow.com/questions/34253178/solr-doesnt-overwrite-duplicated-uniquekey-entries

I would love if any of you could run the test at your place and give me 
feedback.

If you have any questions do not hesitate to write me.

Many thanks in advance and best regards,

Sebastian Riemer





Re: Relevancy for "tablet"

2016-03-09 Thread Alessandro Benedetti
Hi Robert,
this is the kind of scenario I have worked on over the last couple of years
at my previous company.
Adding semantic and natural language capabilities to your indexing pipeline
could help a lot.
First of all you need a meaningful knowledge base describing your business
ontology,
i.e. the ground knowledge to let your machine understand that an
iPad is a Tablet - not a pill, but an electronic device.
By adding a Named Entity Linking layer to your indexing pipeline (with the
configured knowledge base) you can identify at indexing time
when an occurrence of an entity should be linked to a real-world object.
In your case, assuming a meaningful knowledge base, iPad occurrences will
be linked to the iPad entity, which is of type Tablet, which is of type
Electronic Device.
At this point you need to model your index with nested objects and manage
the query-time side.
Of course it is not an immediate solution, but the benefit could be great
and you can get closer to natural language search.

Take a look at my Lucene Revolution presentation and some old blog posts:

https://lucidworks.com/blog/2015/08/31/apache-solr-multi-language-content-discovery-entity-driven-search/

http://www.zaizi.com/blog/sensefy-content-discovery-through-entity-driven-search


On 9 March 2016 at 10:53, Charlie Hull  wrote:

> On 09/03/2016 10:05, Robert Brown wrote:
>
>> Hi,
>>
>> I'm looking for some advice and possible options for dealing with our
>> relevancy when searching through shopping products.
>>
>> A search for "tablet" returns pills, when the user would expect
>> electronic devices.
>>
>> Without any extra criteria (like category), how would/could you manage
>> this situation?
>>
>> Any solution would also need to scale since this is just a random example.
>>
>> Thanks,
>> Rob
>>
>> Hi Rob,
>
> Solr out of the box has no way of knowing that 'the user would expect
> electronic devices', unfortunately: since the record for 'pills' contains
> the word 'tablet', that's what you get. Note that if your users were
> expecting a medical answer everything would be rosy!
>
> Firstly, consider setting up some tests for these kinds of issues, so you
> can measure if the adjustments you're making are having the effect you want
> - we call this test-driven relevancy tuning. One tool you might consider
> (if you don't want to use a pile of spreadsheets) is Quepid (disclaimer: we
> resell this in the UK).
>
> Then, you need to work out *why* the results are wrong for your use case
> (using Solr's debugQuery helps here). If the words in the body text are
> having a disproportionate effect, consider boosting another part of the
> source data. Consider synonyms (if I search 'tablet' I should also get
> 'iPad'). To be honest this is a complex field with a lot of different knobs
> to adjust - I would recommend you take a look at Doug Turnbull and John
> Berryman's new book 'Relevant Search' (available on MEAP at Manning
> Publications) which is an excellent take on this.
>
> In short, you need a sensible methodology for tuning relevance, otherwise
> it can easily become a game of whack-a-mole!
>
> Cheers
>
> Charlie
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Relevancy for "tablet"

2016-03-09 Thread Charlie Hull

On 09/03/2016 10:05, Robert Brown wrote:

Hi,

I'm looking for some advice and possible options for dealing with our
relevancy when searching through shopping products.

A search for "tablet" returns pills, when the user would expect
electronic devices.

Without any extra criteria (like category), how would/could you manage
this situation?

Any solution would also need to scale since this is just a random example.

Thanks,
Rob


Hi Rob,

Solr out of the box has no way of knowing that 'the user would expect 
electronic devices', unfortunately: since the record for 'pills' 
contains the word 'tablet', that's what you get. Note that if your users 
were expecting a medical answer everything would be rosy!


Firstly, consider setting up some tests for these kinds of issues, so 
you can measure if the adjustments you're making are having the effect 
you want - we call this test-driven relevancy tuning. One tool you might 
consider (if you don't want to use a pile of spreadsheets) is Quepid 
(disclaimer: we resell this in the UK).


Then, you need to work out *why* the results are wrong for your use case 
(using Solr's debugQuery helps here). If the words in the body text are 
having a disproportionate effect, consider boosting another part of the 
source data. Consider synonyms (if I search 'tablet' I should also get 
'iPad'). To be honest this is a complex field with a lot of different 
knobs to adjust - I would recommend you take a look at Doug Turnbull and 
John Berryman's new book 'Relevant Search' (available on MEAP at Manning 
Publications) which is an excellent take on this.
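For instance, a purely illustrative synonyms.txt entry for a
SynonymFilterFactory:

ipad, tablet

though naive synonym lists can of course introduce relevance problems of
their own.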


In short, you need a sensible methodology for tuning relevance, 
otherwise it can easily become a game of whack-a-mole!


Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Relevancy for "tablet"

2016-03-09 Thread Robert Brown

Hi,

I'm looking for some advice and possible options for dealing with our 
relevancy when searching through shopping products.


A search for "tablet" returns pills, when the user would expect 
electronic devices.


Without any extra criteria (like category), how would/could you manage 
this situation?


Any solution would also need to scale since this is just a random example.

Thanks,
Rob



BlockJoinQuery parser question

2016-03-09 Thread Sathyakumar Seshachalam
Hi,

Can an index contain both standalone documents and nested documents? I
remember being told before that this may not work, but I am asking again for
confirmation here.

What I have is a bunch of standalone documents followed by nested documents in
the index. The BlockJoin then works fine and I am able to select/search parents
whose children meet specific criteria.
But the moment a set of new standalone documents gets added to the index, the
previously working search starts to return wrong results (or wrongly matches a
standalone parent).

I am using Solr 4.10.3. Will migrating to Solr 5 solve this?

Regards,
Sathya


RE: Disable hyper-threading for better Solr performance?

2016-03-09 Thread Markus Jelsma
Hi - I can't remember having seen any threads on this topic in the past seven
years. Can you perform a controlled test with a lot of concurrent users? I
would suspect that nowadays HT would boost highly concurrent environments such
as search engines.

Markus

 
 
-Original message-
> From:Avner Levy 
> Sent: Wednesday 9th March 2016 8:00
> To: solr-user@lucene.apache.org
> Subject: Disable hyper-threading for better Solr performance?
> 
> I have a machine with 16 real cores (32 with HT enabled).
> I'm running on it a Solr server and trying to reach maximum performance for 
> indexing and queries (indexing 20k documents/sec by a number of threads).
> I've read in multiple places that in some scenarios / products disabling
> hyper-threading may result in better performance.
> I'm looking for inputs / insights about HT on Solr setups.
> Thanks in advance,
>   Avner
>