Re: Does DocValues improve Grouping performance ?

2015-01-31 Thread Kydryavtsev Andrey


31.01.2015, 23:23, "Michael Sokolov" :
> On 1/31/2015 2:47 PM, Mikhail Khludnev wrote:
>>  Michael,
>>
>>  Please check two questions inlined below
>
> Hi Mikhail,
>>  On Sat, Jan 31, 2015 at 10:14 PM, Michael Sokolov <
>>  msoko...@safaribooksonline.com> wrote:
>>
>>  You can only handle a single relation this way since you have to
>>  restructure your index to use it; grouping is more flexible.
>>
>>  Michael,
>>  would you mind commenting on which relations in particular you need to
>>  model? BJQ is definitely much more restrictive than grouping, but it still
>>  has some flexibility to cover the most frequent demands.
>
> This was really a theoretical comment only - in our case we only had a
> single relation (book->chapter), and the parent->child join worked out
> great.
>>  Would you mind leaving your vote here:
>>  https://issues.apache.org/jira/browse/SOLR-5662
>>  It's not a big deal to implement.
>
> Sure, I just voted for the issue. In my case, I used the max score.
>
> -Mike


Re: [MASSMAIL]Re: "Contextual" sponsored results with Solr

2015-01-31 Thread Michael Sokolov

If you have a finite known set of hosts, you could do something truly awful:

create a field for each distinct host and set all of them to have 
value={id of the document} except for the host to which the document 
belongs: assign that hostname field some constant value, like "true".


Then query using group.field=host, group.limit=N, and apply a high boost 
to an optional term: host-wikipedia:true^100


Each group will contain a single entry except the top one.

But I bet you will get better performance from two queries.

-Mike

On 1/28/2015 10:51 AM, Jorge Luis Betancourt González wrote:

We are trying to avoid firing two queries per request. I've started to play with
a PostFilter to see how it goes. Perhaps something along the lines of
ReRankQParserPlugin could be used to rerank the results instead of issuing
two queries?

- Original Message -
From: "Ahmet Arslan" 
To: solr-user@lucene.apache.org
Sent: Tuesday, January 27, 2015 11:06:29 PM
Subject: [MASSMAIL]Re: "Contextual" sponsored results with Solr

Hi Jorge,

We have done a similar thing with N=3. We issue two separate queries/requests and
display the 'special N' above the regular results.
We exclude the 'special N' from the second query with a -id:(1 2 3 ... N) type
filter. It is all done on the client side.
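
To make the two-request approach concrete, here is a minimal Python sketch of how a client might build both sets of request parameters. The function name and the `host` and `id` field names are assumptions for illustration; the sketch only builds the parameters and leaves sending them to whatever HTTP client you use.

```python
def build_promoted_queries(user_query, promoted_host, n, promoted_ids=None):
    """Return (promoted_params, main_params) for the two-request approach.

    promoted_ids is the list of ids returned by the first request; pass it
    in to get the exclusion filter for the second (main) request.
    """
    # Request 1: at most n matching documents from the promoted host.
    promoted_params = {
        "q": user_query,
        "fq": "host:%s" % promoted_host,
        "rows": n,
    }
    # Request 2: the regular results, minus whatever request 1 returned.
    # Note: ids with special characters would need escaping (not done here).
    main_params = {"q": user_query}
    if promoted_ids:
        main_params["fq"] = "-id:(%s)" % " ".join(str(i) for i in promoted_ids)
    return promoted_params, main_params
```

The client would run the first request, collect the returned ids, call the function again with those ids, and then show the first result list above the second.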

Ahmet



On Tuesday, January 27, 2015 8:28 PM, Jorge Luis Betancourt González 
 wrote:
Hi all,

Recently I got an interesting use case that I'm not sure how to implement. The
client wants a fixed number of documents, let's call it N, to appear at the top
of the results. To explain a little: we're working with web documents, so the
idea is to promote the documents from a given domain (wikipedia, for example)
that match the user's query to the top of the list. If I apply a boost using
the boost parameter:

http://localhost:8983/solr/select?q=search&fl=url&boost=map(query($type1query),0,0,1,50)&type1query=host:wikipedia

I get *all* the documents from the desired host at the top, but there is no way
of limiting how many documents from that host are boosted to the top of the
result list. This could lead to several pages of content from the same host,
which is not desired; the idea is to show only N. I was thinking of something
like field collapsing/grouping, but only for the documents that match my
$type1query parameter (host:wikipedia), and I don't see any way of
grouping/collapsing only one group while leaving the other results untouched.

I also thought of using two groups with group.query=host:wikipedia and
group.query=-host:wikipedia, but in that case there is no way of controlling
how many documents each group has independently.

In this particular case QueryElevationComponent is not helping, because I
don't want to map all the possible queries. I just want to put some of the
results from a certain host at the top of the list, without boosting all
the documents from the same host.

Any thoughts or recommendations on this?

Thank you,

Regards,


---
12th Anniversary of the founding of the Universidad de las Ciencias Informáticas.
12 years of history alongside Fidel. December 12, 2014.







Re: Does DocValues improve Grouping performance ?

2015-01-31 Thread Michael Sokolov

On 1/31/2015 2:47 PM, Mikhail Khludnev wrote:

Michael,

Please check two questions inlined below

Hi Mikhail,


On Sat, Jan 31, 2015 at 10:14 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:


You can only handle a single relation this way since you have to
restructure your index to use it; grouping is more flexible.

Michael,
would you mind commenting on which relations in particular you need to
model? BJQ is definitely much more restrictive than grouping, but it still
has some flexibility to cover the most frequent demands.

This was really a theoretical comment only - in our case we only had a 
single relation (book->chapter), and the parent->child join worked out 
great.

Would you mind leaving your vote here:
https://issues.apache.org/jira/browse/SOLR-5662
It's not a big deal to implement.


Sure, I just voted for the issue. In my case, I used the max score.

-Mike


Re: Calling custom request handler with data import

2015-01-31 Thread Mikhail Khludnev
at your service!

On Sat, Jan 31, 2015 at 1:00 PM, vineet yadav 
wrote:

> Hi Mikhail,
> Thanks for the suggestion. It is helpful.
>
> Regards
> Vineet Yadav
>
>
> On Sat, Jan 31, 2015 at 2:38 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > Did you try specifying an update processor? I.e.:
> >
> > On Fri, Jan 30, 2015 at 5:07 PM, vineet yadav <
> vineet.yadav.i...@gmail.com
> > >
> > wrote:
> >
> > >  > > class="org.apache.solr.handler.dataimport.DataImportHandler">
> > > 
> > >  data-import.xml
> > >
> >
> > /ner
> >
> >  
> > > 
> > >
> >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Does DocValues improve Grouping performance ?

2015-01-31 Thread Mikhail Khludnev
Michael,

Please check two questions inlined below

On Sat, Jan 31, 2015 at 10:14 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> We were using grouping (no DocValues, though) and recently switched to
> using block-indexing and joins (see
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers).
> We got a nice speedup on average (perhaps 2x faster) and an even better
> improvement in the worst times; overall the performance is much more
> predictable and better, and I suspect (haven't checked) that we may be
> using less heap too.  The block indexing is cutting edge, a little
> complicated to get right, and I had to write some custom Java code to get
> things just the way I wanted, but for best performance it does seem to be
> the way to go.
>
> Beware some gotchas:
>
> You have to reindex all the docs that participate in the parent-child
> relation so that each parent-child block gets indexed at once.  This might
> cause difficulties, but for us and I suspect most people, it's the natural
> thing to do anyway.
>
> You can only handle a single relation this way since you have to
> restructure your index to use it; grouping is more flexible.
>
Michael,
would you mind commenting on which relations in particular you need to
model? BJQ is definitely much more restrictive than grouping, but it still
has some flexibility to cover the most frequent demands.


>
> Clients may not support the new block-indexing syntax (I think SolrJ has
> it, but the Python client we were using did not).
>
> Converting an existing index requires special care; you basically have to
> delete all documents you are re-indexing.
>
> The Solr query parsers don't support scoring the joined-from documents
> (child docs in the to-parent query, parent docs in the to-child query).
> This might not matter to you, but it was important for our use case.
>
Would you mind leaving your vote here:
https://issues.apache.org/jira/browse/SOLR-5662
It's not a big deal to implement.


> So there are some kinks still, but if you can make it work for you, it
> does seem to perform better than grouping.
>
> -Mike
>
>
> On 1/30/2015 4:10 PM, Cario, Elaine wrote:
>
>> Hi Shamik,
>>
>> We use DocValues for grouping, and although I have nothing to compare it
>> to (we started with DocValues), we are seeing similarly poor results to
>> yours: easily 60% overhead compared to non-group queries.  Looking around
>> for a solution, no quick fix is presenting itself, unfortunately.
>> CollapsingQParserPlugin is also too limited for our needs.
>>
>> -Original Message-
>> From: Shamik Bandopadhyay [mailto:sham...@gmail.com]
>> Sent: Thursday, January 15, 2015 6:02 PM
>> To: solr-user@lucene.apache.org
>> Subject: Does DocValues improve Grouping performance ?
>>
>> Hi,
>>
>> Does the use of DocValues provide any performance improvement for
>> Grouping?
>> I've looked into the blog post which mentions improving Grouping performance
>> through DocValues.
>>
>> https://lucidworks.com/blog/fun-with-docvalues-in-solr-4-2/
>>
>> Right now, group-by queries (which sadly I can't avoid) have become a huge
>> bottleneck. They have an overhead of 60-70% compared to the same query sans
>> group by. Unfortunately, I'm not able to use CollapsingQParserPlugin, as it
>> doesn't support anything similar to the "group.facet" feature.
>>
>> My understanding of DocValues is that it's intended for faceting and
>> sorting. Just wondering if anyone has tried DocValues for Grouping and seen
>> any improvements?
>>
>> -Thanks,
>> Shamik
>>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Does DocValues improve Grouping performance ?

2015-01-31 Thread Michael Sokolov
We were using grouping (no DocValues, though) and recently switched to 
using block-indexing and joins (see 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers). 
We got a nice speedup on average (perhaps 2x faster) and an even better 
improvement in the worst times; overall the performance is much more 
predictable and better, and I suspect (haven't checked) that we may be 
using less heap too.  The block indexing is cutting edge, a little
complicated to get right, and I had to write some custom Java code to get
things just the way I wanted, but for best performance it does seem to
be the way to go.


Beware some gotchas:

You have to reindex all the docs that participate in the parent-child 
relation so that each parent-child block gets indexed at once.  This 
might cause difficulties, but for us and I suspect most people, it's the 
natural thing to do anyway.


You can only handle a single relation this way since you have to 
restructure your index to use it; grouping is more flexible.


Clients may not support the new block-indexing syntax (I think SolrJ has
it, but the Python client we were using did not).


Converting an existing index requires special care; you basically have
to delete all documents you are re-indexing.


The Solr query parsers don't support scoring the joined-from documents
(child docs in the to-parent query, parent docs in the to-child query).
This might not matter to you, but it was important for our use case.


So there are some kinks still, but if you can make it work for you, it 
does seem to perform better than grouping.
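
For anyone who has not worked with block indexing, here is a rough Python sketch of what a parent-child block and a to-parent query might look like. The field names (`type_s`, `title_t`, `text_t`), the ids, and the helper names are made up for illustration. `_childDocuments_` is the key the JSON update format uses for nested documents in Solr versions that support it, and `{!parent which=...}` is the block join parent parser described on the wiki page above.

```python
import json


def make_block(book_id, title, chapters):
    """Build one parent (book) document with its chapters nested as children."""
    return {
        "id": book_id,
        "type_s": "book",  # marks the parent; used by the 'which' filter below
        "title_t": title,
        "_childDocuments_": [
            {"id": "%s-ch%d" % (book_id, i), "type_s": "chapter", "text_t": text}
            for i, text in enumerate(chapters, start=1)
        ],
    }


def to_parent_query(child_query, parent_filter="type_s:book"):
    """Return a block join query matching parents whose children match child_query."""
    return '{!parent which="%s"}%s' % (parent_filter, child_query)


block = make_block("book1", "Solr in Practice", ["intro to indexing", "block joins"])
payload = json.dumps([block])  # body for a JSON update request
query = to_parent_query("text_t:joins")
```

The whole block (parent plus children) goes to Solr in one update, which is why changing any chapter means re-sending the book and all of its chapters together.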


-Mike

On 1/30/2015 4:10 PM, Cario, Elaine wrote:

Hi Shamik,

We use DocValues for grouping, and although I have nothing to compare it to (we
started with DocValues), we are seeing similarly poor results to yours: easily
60% overhead compared to non-group queries.  Looking around for a solution,
no quick fix is presenting itself, unfortunately.  CollapsingQParserPlugin is
also too limited for our needs.

-Original Message-
From: Shamik Bandopadhyay [mailto:sham...@gmail.com]
Sent: Thursday, January 15, 2015 6:02 PM
To: solr-user@lucene.apache.org
Subject: Does DocValues improve Grouping performance ?

Hi,

Does the use of DocValues provide any performance improvement for Grouping?
I've looked into the blog post which mentions improving Grouping performance through
DocValues.

https://lucidworks.com/blog/fun-with-docvalues-in-solr-4-2/

Right now, group-by queries (which sadly I can't avoid) have become a huge bottleneck.
They have an overhead of 60-70% compared to the same query sans group by. Unfortunately,
I'm not able to use CollapsingQParserPlugin, as it doesn't support anything similar to
the "group.facet" feature.

My understanding of DocValues is that it's intended for faceting and sorting.
Just wondering if anyone has tried DocValues for Grouping and seen any
improvements?

-Thanks,
Shamik




How deletes affect on QPS

2015-01-31 Thread Dmitry Kan
Hi,

Somebody on IRC recently asked when to optimize a Solr index. I pointed out
that deletes in the index can be a good reason to optimize periodically. Here
is a nice post with QPS statistics per query type (it is on the Elasticsearch
site, but still relevant, because the post is essentially about the Lucene
level):

https://www.elasticsearch.org/blog/lucenes-handling-of-deleted-documents/
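
As a rough way to judge how many deletes an index is carrying, you can compare its `numDocs` and `maxDoc` statistics; the difference is the number of deleted documents still sitting in the segments. A minimal Python sketch, where the function names and the 20% threshold are arbitrary assumptions and not recommendations:

```python
def deleted_ratio(num_docs, max_doc):
    """Fraction of documents in the index that are marked as deleted."""
    if max_doc == 0:
        return 0.0
    return (max_doc - num_docs) / float(max_doc)


def should_consider_optimize(num_docs, max_doc, threshold=0.2):
    """True when the deleted fraction reaches the (assumed) threshold."""
    return deleted_ratio(num_docs, max_doc) >= threshold
```

Whether a given ratio actually hurts QPS depends on the query types, so it is worth measuring before and after rather than relying on a fixed cutoff.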

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: solrj returning no results but curl can get them

2015-01-31 Thread Dmitry Kan
Hi Sol,

Glad to hear it was easy to fix, i.e. not a Solr core issue.

On Fri, Jan 30, 2015 at 5:27 PM, S L  wrote:

> It was pilot error. I just reviewed my servlet and noticed a parameter in
> web.xml that was looking to find data for the new product in the production
> index which doesn't have that data yet while my curl command was running
> against the staging index. I rebuilt the servlet with the fixed parameter
> and life is now good.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solrj-returning-no-results-but-curl-can-get-them-tp4183053p4183119.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


subscription to the mailing list

2015-01-31 Thread Vijay Tiwary
Hi,

I want to subscribe to the mailing list



Regards,
Vijay


Solr Consultant for remote project - on NLP and Solr Faceted Search

2015-01-31 Thread MKGoose
We are looking for a remote / freelance consultant to work with us on a
project related to Solr faceted search and NLP. It involves data extraction
/ summarisation and custom faceted search on Solr.

Please contact me if you have expertise in this area and can work remotely
with a small team.

Thanks,
MG.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Consultant-for-remote-project-on-NLP-and-Solr-Faceted-Search-tp4183236.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Calling custom request handler with data import

2015-01-31 Thread vineet yadav
Hi Mikhail,
Thanks for the suggestion. It is helpful.

Regards
Vineet Yadav


On Sat, Jan 31, 2015 at 2:38 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Did you try specifying an update processor? I.e.:
>
> On Fri, Jan 30, 2015 at 5:07 PM, vineet yadav  >
> wrote:
>
> >  > class="org.apache.solr.handler.dataimport.DataImportHandler">
> > 
> >  data-import.xml
> >
>
> /ner
>
>  
> > 
> >
>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Calling custom request handler with data import

2015-01-31 Thread Mikhail Khludnev
Did you try specifying an update processor? I.e.:

On Fri, Jan 30, 2015 at 5:07 PM, vineet yadav 
wrote:

>  class="org.apache.solr.handler.dataimport.DataImportHandler">
> 
>  data-import.xml
>

/ner

 
> 
>




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics