Re: retrieve all the fields in join

2014-05-16 Thread Kranti Parisa
Aman,

The option you have is to write custom components (request handlers,
collectors & response writers):
- first do the join, then apply the pagination
- in the response writer you will get the docList; you would then make a
call to the second core (be smart and use FQs so that you hit the cache
and the second call is fast) and fetch the documents
- use them for building the response

Out of the box, Solr won't do this for you.
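
As a rough sketch of the second-core fetch in SolrJ (the core name, id field
and URL are illustrative assumptions; inside a real custom component you
would use the internal SolrCore APIs rather than an HTTP client):

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SecondCoreFetch {
  public static void main(String[] args) throws SolrServerException {
    // ids collected from the join result's docList for the current page
    List<String> ids = Arrays.asList("101", "205", "307");

    SolrServer second = new HttpSolrServer("http://localhost:8983/solr/secondcore");

    // build one FQ for the page of ids; an identical repeated request hits the filterCache
    StringBuilder fq = new StringBuilder("id:(");
    for (int i = 0; i < ids.size(); i++) {
      if (i > 0) fq.append(" OR ");
      fq.append(ids.get(i));
    }
    fq.append(")");

    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery(fq.toString());
    q.setRows(ids.size());

    QueryResponse rsp = second.query(q);
    for (SolrDocument doc : rsp.getResults()) {
      System.out.println(doc); // merge these fields into the final response
    }
  }
}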

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Mon, May 12, 2014 at 7:05 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> On Sun, May 11, 2014 at 12:14 PM, Aman Tandon  >wrote:
>
> > Is it possible?
>
>
> no.
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
>


Re: Join solr

2014-04-24 Thread Kranti Parisa
Can you describe your business requirement (how your data is indexed, what
the request is, and what the response should be), and give us some examples?

If you want to sort the results of the first core based on the sorting
preference of the second core that you are joining with, that doesn't work
on the fly; you will need to write custom code.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Thu, Apr 24, 2014 at 6:11 AM, hungctk33  wrote:

> Pls! Help me.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Join-solr-tp4132615p4132830.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: cache warming questions

2014-04-18 Thread Kranti Parisa
cool, thanks.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Thu, Apr 17, 2014 at 11:37 PM, Erick Erickson wrote:

> No, the 5 most recently used in a query will be used to autowarm.
>
> If you have things you _know_ are going to be popular fqs, you could
> put them in newSearcher queries.
>
> Best,
> Erick
>
> On Thu, Apr 17, 2014 at 4:51 PM, Kranti Parisa 
> wrote:
> > Erik,
> >
> > I have a followup question on this topic.
> >
> > If we have used 10 unique FQs and when we configure filterCache=100 &
> > autoWarm=5, then which 5 out of the 10 will be repopulated in the case of
> > new searcher?
> >
> > I don't think there is a way to set the preference or there is?
> >
> >
> > Thanks,
> > Kranti K. Parisa
> > http://www.linkedin.com/in/krantiparisa
> >
> >
> >
> > On Thu, Apr 17, 2014 at 5:25 PM, Matt Kuiper 
> wrote:
> >
> >> Ok,  that makes sense.
> >>
> >> Thanks again,
> >> Matt
> >>
> >> Matt Kuiper - Software Engineer
> >> Intelligent Software Solutions
> >> p. 719.452.7721 | matt.kui...@issinc.com
> >> www.issinc.com | LinkedIn: intelligent-software-solutions
> >>
> >> -Original Message-
> >> From: Erick Erickson [mailto:erickerick...@gmail.com]
> >> Sent: Thursday, April 17, 2014 9:26 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: cache warming questions
> >>
> >> Don't go overboard warming here, you often hit diminishing returns very
> >> quickly. For instance, if the size is 512 you might set your autowarm
> count
> >> to 16 and get the most bang for your buck. Beyond some (usually small)
> >> number, the additional work you put in to warming is wasted. This is
> >> especially true if your autocommit (soft, or hard with
> >> openSearcher=true) is short.
> >>
> >> So while you're correct in your sizing bit, practically it's rarely that
> >> complicated since the autowarm count is usually so much smaller than the
> >> size that there's no danger of swapping them out. YMMV of course.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Apr 16, 2014 at 10:33 AM, Matt Kuiper 
> >> wrote:
> >> > Thanks Erick, this is helpful information!
> >> >
> >> > So it sounds like, at minimum the cache size (at least for filterCache
> >> and queryResultCache) should be the sum of the autowarmCount for that
> cache
> >> and the number of queries defined for the newSearcher listener.
>  Otherwise
> >> some items in the caches will be evicted right away.
> >> >
> >> > Matt
> >> >
> >> > -Original Message-
> >> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> >> > Sent: Tuesday, April 15, 2014 5:21 PM
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Re: cache warming questions
> >> >
> >> > bq: What does it mean that items will be regenerated or prepopulated
> >> from the current searcher's cache...
> >> >
> >> > You're right, the values aren't cached. They can't be since the
> internal
> >> Lucene document id is used to identify docs, and due to merging the
> >> internal ID may bear no relation to the old internal ID for a particular
> >> document.
> >> >
> >> > I find it useful to think of Solr's caches as a  map where the key is
> >> the "query" and the value is some representation of the found documents.
> >> The details of the value don't matter, so I'll skip them.
> >> >
> >> > What matters is the key. Consider the filter cache. You put something
> >> like &fq=price:[0 TO 100] on a URL. Solr then uses the fq  clause as the
> >> key to the filterCache.
> >> >
> >> > Here's the sneaky bit. When you specify an autowarm count of N for the
> >> filterCache, when a new searcher is opened the first N keys from the map
> >> are re-executed in the new searcher's context and the results put into
> the
> >> new searcher's filterCache.
> >> >
> >> > bq:  ...how does auto warming and explicit warming work together?
> >> >
> >> > They're orthogonal. IOW, the autowarming for each cache is executed as
> >> well as the newSearcher static warming queries. Use the static queries
> to
> >> do things like fill the sort caches etc.
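
(For reference, the cache sizing discussed above lives in solrconfig.xml; a
minimal sketch using the illustrative numbers from this thread:

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="16"/>

autowarmCount is the N whose most recently used keys get re-executed against
the new searcher.)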

Re: cache warming questions

2014-04-17 Thread Kranti Parisa
Erik,

I have a followup question on this topic.

If we have used 10 unique FQs and when we configure filterCache=100 &
autoWarm=5, then which 5 out of the 10 will be repopulated in the case of
new searcher?

I don't think there is a way to set the preference, or is there?


Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Thu, Apr 17, 2014 at 5:25 PM, Matt Kuiper  wrote:

> Ok,  that makes sense.
>
> Thanks again,
> Matt
>
> Matt Kuiper - Software Engineer
> Intelligent Software Solutions
> p. 719.452.7721 | matt.kui...@issinc.com
> www.issinc.com | LinkedIn: intelligent-software-solutions
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, April 17, 2014 9:26 AM
> To: solr-user@lucene.apache.org
> Subject: Re: cache warming questions
>
> Don't go overboard warming here, you often hit diminishing returns very
> quickly. For instance, if the size is 512 you might set your autowarm count
> to 16 and get the most bang for your buck. Beyond some (usually small)
> number, the additional work you put in to warming is wasted. This is
> especially true if your autocommit (soft, or hard with
> openSearcher=true) is short.
>
> So while you're correct in your sizing bit, practically it's rarely that
> complicated since the autowarm count is usually so much smaller than the
> size that there's no danger of swapping them out. YMMV of course.
>
> Best,
> Erick
>
> On Wed, Apr 16, 2014 at 10:33 AM, Matt Kuiper 
> wrote:
> > Thanks Erick, this is helpful information!
> >
> > So it sounds like, at minimum the cache size (at least for filterCache
> and queryResultCache) should be the sum of the autowarmCount for that cache
> and the number of queries defined for the newSearcher listener.  Otherwise
> some items in the caches will be evicted right away.
> >
> > Matt
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Tuesday, April 15, 2014 5:21 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: cache warming questions
> >
> > bq: What does it mean that items will be regenerated or prepopulated
> from the current searcher's cache...
> >
> > You're right, the values aren't cached. They can't be since the internal
> Lucene document id is used to identify docs, and due to merging the
> internal ID may bear no relation to the old internal ID for a particular
> document.
> >
> > I find it useful to think of Solr's caches as a  map where the key is
> the "query" and the value is some representation of the found documents.
> The details of the value don't matter, so I'll skip them.
> >
> > What matters is the key. Consider the filter cache. You put something
> like &fq=price:[0 TO 100] on a URL. Solr then uses the fq  clause as the
> key to the filterCache.
> >
> > Here's the sneaky bit. When you specify an autowarm count of N for the
> filterCache, when a new searcher is opened the first N keys from the map
> are re-executed in the new searcher's context and the results put into the
> new searcher's filterCache.
> >
> > bq:  ...how does auto warming and explicit warming work together?
> >
> > They're orthogonal. IOW, the autowarming for each cache is executed as
> well as the newSearcher static warming queries. Use the static queries to
> do things like fill the sort caches etc.
> >
> > Incidentally, this bears on why there's a "firstSearcher" and
> "newSearcher". The newSearcher queries are run in addition to the cache
> autowarms. firstSearcher static queries are only run when a Solr server is
> started the first time, and there are no cache entries to autowarm. So the
> firstSearcher queries might be quite a bit more complex than newSearcher
> queries.
> >
> > HTH,
> > Erick
> >
> > On Tue, Apr 15, 2014 at 1:55 PM, Matt Kuiper 
> wrote:
> >> Hello,
> >>
> >> I have a few questions regarding how Solr caches are warmed.
> >>
> >> My understanding is that there are two ways to warm internal Solr
> caches (only one way for document cache and lucene FieldCache):
> >>
> >> Auto warming - occurs when there is a current searcher handling
> requests and new searcher is being prepared.  "When a new searcher is
> opened, its caches may be prepopulated or "autowarmed" with cached object
> from caches in the old searcher. autowarmCount is the number of cached
> items that will be regenerated in the new searcher."
> http://wiki.apache.org/solr/SolrCaching#autowarmCount
> >>
> >> Explicit warming - where the static warming queries specified in
> Solrconfig.xml for newSearcher and firstSearcher listeners are executed
> when a new searcher is being prepared.
> >>
> >> What does it mean that items will be regenerated or prepopulated from
> the current searcher's cache to the new searcher's cache?  I doubt it means
> copy, as the index has likely changed with a commit and possibly
> invalidated some contents of the cache.  Are the queries, or filters, that
> define the contents of the current caches re-executed for the new searcher?
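
(To make the explicit-warming side concrete, here is a minimal solrconfig.xml
sketch of the two listeners discussed above; the queries are illustrative:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">price:[0 TO 100]</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>

firstSearcher runs only at startup, when there is no old searcher to autowarm
from, so its queries can afford to be heavier than the newSearcher ones.)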

Re: join and filter query with AND

2014-03-24 Thread Kranti Parisa
glad the suggestions are working for you!

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Mon, Mar 24, 2014 at 4:10 AM, Marcin Rzewucki wrote:

> Hi,
>
> Yonik, thank you for explaining me the reason of the issue. The workarounds
> you suggested are working fine.
> Kranti, your suggestion was also good :-)
>
> Thanks a lot!
>
>
>
> On 21 March 2014 20:00, Kranti Parisa  wrote:
>
> > My example should also work, am I missing something?
> >
> > &q=({!join from=inner_id to=outer_id fromIndex=othercore
> > v=$joinQuery})&joinQuery=(city:"Stara Zagora" AND prod:214)
> >
> > Thanks,
> > Kranti K. Parisa
> > http://www.linkedin.com/in/krantiparisa
> >
> >
> >
> > On Fri, Mar 21, 2014 at 2:11 PM, Yonik Seeley 
> > wrote:
> >
> > > Correct.  This is only a limitation of embedding a local-params style
> > > subquery within lucene syntax.
> > > The parser, not knowing the syntax of the embedded query, currently
> > > assumes the query text ends at whitespace or other special punctuation
> > > such as ")".
> > >
> > > Original:
> > > (({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara
> > > Zagora")) AND (prod:214)
> > >
> > > Some possible workarounds that should work:
> > > &q={!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara
> > Zagora"
> > > &fq=prod:214
> > >
> > > &q=({!join from=inner_id to=outer_id fromIndex=othercore
> > > v='city:"Stara Zagora"'} AND prod:214)
> > >
> > > &q=({!join from=inner_id to=outer_id fromIndex=othercore v=$jq} AND
> > > prod:214)
> > > &jq=city:"Stara Zagora"
> > >
> > >
> > > -Yonik
> > > http://heliosearch.org - solve Solr GC pauses with off-heap filters
> > > and fieldcache
> > >
> > >
> > > On Fri, Mar 21, 2014 at 1:54 PM, Jack Krupansky <
> j...@basetechnology.com
> > >
> > > wrote:
> > > > I suspect that this is a bug in the implementation of the parsing of
> > > > embedded nested query parsers . That's a fairly new feature compared
> to
> > > > non-embedded nested query parsers - maybe Yonik could shed some
> light.
> > > This
> > > > may date from when he made a copy of the Lucene query parser for Solr
> > and
> > > > added the parsing of embedded nested query parsers to the grammar. It
> > > seems
> > > > like the embedded nested query parser is only being applied to a
> > single,
> > > > white space-delimited term, and not respecting the fact that the term
> > is
> > > a
> > > > quoted phrase.
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > -Original Message- From: Marcin Rzewucki
> > > > Sent: Thursday, March 20, 2014 5:19 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: join and filter query with AND
> > > >
> > > >
> > > > Nope. There is no line break in the string and it is not feed from
> > file.
> > > > What else could be the reason ?
> > > >
> > > >
> > > >
> > > > On 19 March 2014 17:57, Erick Erickson 
> > wrote:
> > > >
> > > >> It looks to me like you're feeding this from some
> > > >> kind of text file and you really _do_ have a
> > > >> line break after "Stara
> > > >>
> > > >> Or have a line break in the string you paste into the URL
> > > >> or something similar.
> > > >>
> > > >> Kind of shooting in the dark though.
> > > >>
> > > >> Erick
> > > >>
> > > >> On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki <
> mrzewu...@gmail.com
> > >
> > > >> wrote:
> > > >> > Hi,
> > > >> >
> > > >> > I have the following issue with join query parser and filter
> query.
> > > For
> > > >> > such query:
> > > >> >
> > > >> > *:*
> > > >> > 
> > > >> > (({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara
> > > >> > Zagora")) AND (prod:214)
> > > >> > 
> > > >> >
> > > >> > I got error:

Re: join and filter query with AND

2014-03-21 Thread Kranti Parisa
My example should also work, am I missing something?

&q=({!join from=inner_id to=outer_id fromIndex=othercore
v=$joinQuery})&joinQuery=(city:"Stara Zagora" AND prod:214)
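
For completeness, a hedged example of sending this with proper URL encoding,
e.g. via curl (the core name "outercore" is an assumption), so the quotes and
spaces in the nested query survive intact:

curl 'http://localhost:8983/solr/outercore/select' \
  --data-urlencode 'q=({!join from=inner_id to=outer_id fromIndex=othercore v=$joinQuery})' \
  --data-urlencode 'joinQuery=(city:"Stara Zagora" AND prod:214)'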

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Fri, Mar 21, 2014 at 2:11 PM, Yonik Seeley  wrote:

> Correct.  This is only a limitation of embedding a local-params style
> subquery within lucene syntax.
> The parser, not knowing the syntax of the embedded query, currently
> assumes the query text ends at whitespace or other special punctuation
> such as ")".
>
> Original:
> (({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara
> Zagora")) AND (prod:214)
>
> Some possible workarounds that should work:
> &q={!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara Zagora"
> &fq=prod:214
>
> &q=({!join from=inner_id to=outer_id fromIndex=othercore
> v='city:"Stara Zagora"'} AND prod:214)
>
> &q=({!join from=inner_id to=outer_id fromIndex=othercore v=$jq} AND
> prod:214)
> &jq=city:"Stara Zagora"
>
>
> -Yonik
> http://heliosearch.org - solve Solr GC pauses with off-heap filters
> and fieldcache
>
>
> On Fri, Mar 21, 2014 at 1:54 PM, Jack Krupansky 
> wrote:
> > I suspect that this is a bug in the implementation of the parsing of
> > embedded nested query parsers . That's a fairly new feature compared to
> > non-embedded nested query parsers - maybe Yonik could shed some light.
> This
> > may date from when he made a copy of the Lucene query parser for Solr and
> > added the parsing of embedded nested query parsers to the grammar. It
> seems
> > like the embedded nested query parser is only being applied to a single,
> > white space-delimited term, and not respecting the fact that the term is
> a
> > quoted phrase.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Marcin Rzewucki
> > Sent: Thursday, March 20, 2014 5:19 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: join and filter query with AND
> >
> >
> > Nope. There is no line break in the string and it is not feed from file.
> > What else could be the reason ?
> >
> >
> >
> > On 19 March 2014 17:57, Erick Erickson  wrote:
> >
> >> It looks to me like you're feeding this from some
> >> kind of text file and you really _do_ have a
> >> line break after "Stara
> >>
> >> Or have a line break in the string you paste into the URL
> >> or something similar.
> >>
> >> Kind of shooting in the dark though.
> >>
> >> Erick
> >>
> >> On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki 
> >> wrote:
> >> > Hi,
> >> >
> >> > I have the following issue with join query parser and filter query.
> For
> >> > such query:
> >> >
> >> > *:*
> >> > 
> >> > (({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara
> >> > Zagora")) AND (prod:214)
> >> > 
> >> >
> >> > I got error:
> >> > 
> >> > 
> >> > org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara':
> Lexical
> >> > error at line 1, column 12. Encountered:  after : "\"Stara"
> >> > 
> >> > 400
> >> > 
> >> >
> >> > Stack:
> >> > DEBUG - 2014-03-19 13:35:20.825;
> >> org.eclipse.jetty.servlet.ServletHandler;
> >> > chain=SolrRequestFilter->default
> >> > DEBUG - 2014-03-19 13:35:20.826;
> >> > org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
> >> > SolrRequestFilter
> >> > ERROR - 2014-03-19 13:35:20.828; org.apache.solr.common.SolrException;
> >> > org.apache.solr.common.SolrException: >
> >> > org.apache.solr.search.SyntaxError:
> >> > Cannot parse 'city:"Stara': Lexical error at line 1, column 12.  E
> >> > ncountered:  after : "\"Stara"
> >> > at
> >> >
> >>
> >>
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
> >> > at
> >> >
> >>
> >>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
> >> > at
> >> >
> >>
> >>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> >> > at
> >> >
> >>
> >>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> >> > at
> >> >
> >>
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> >> > at
> >> >
> >>
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> >> > at
> >> >
> >>
> >>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> >> > at
> >> >
> >>
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> >> > at
> >> >
> >>
> >>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >> > at
> >> >
> >>
> >>
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> >> > at
> >> >
> >>
> >>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >> > at
> >> >
> >>
> >>
> org.eclipse.jetty.server.han

Re: join and filter query with AND

2014-03-21 Thread Kranti Parisa
You may try this:

({!join from=inner_id to=outer_id fromIndex=othercore v=$joinQuery})

and pass another parameter: joinQuery=(city:"Stara Zagora" AND prod:214)

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Fri, Mar 21, 2014 at 4:47 AM, Marcin Rzewucki wrote:

> Hi,
>
> Erick, I do not get your point. What kind of servlet container settings do
> you mean and why do you think they might be related ? I'm using Jetty and
> never set any limit for packet size. My query does not work only in case of
> double quotes and space between words. Why? It works in other cases as
> described in my first mail.
>
> Cheers.
>
>
>
> On 20 March 2014 15:23, Erick Erickson  wrote:
>
> > Well, the error message really looks like your input is
> > getting chopped off.
> >
> > It's vaguely possible that you have some super-low limit
> > in your servlet container configuration that is only letting very
> > small packets through.
> >
> > What I'd do is look in the Solr log file to see exactly what
> > is coming through. Because regardless of what you _think_
> > you're sending, it _really_ looks like Solr is getting the fq
> > clause with something that breaks it up. So I'd like to
> > absolutely nail that as being wrong before speculating.
> >
> > Because I can cut/paste your fq clause just fine. Of course
> > it fails because I don't have the other core defined, but that
> > means the query has made it through query parsing while
> > yours hasn't in your setup.
> >
> > Best,
> > Erick
> >
> > On Thu, Mar 20, 2014 at 2:19 AM, Marcin Rzewucki 
> > wrote:
> > > Nope. There is no line break in the string and it is not feed from
> file.
> > > What else could be the reason ?
> > >
> > >
> > >
> > > On 19 March 2014 17:57, Erick Erickson 
> wrote:
> > >
> > >> It looks to me like you're feeding this from some
> > >> kind of text file and you really _do_ have a
> > >> line break after "Stara
> > >>
> > >> Or have a line break in the string you paste into the URL
> > >> or something similar.
> > >>
> > >> Kind of shooting in the dark though.
> > >>
> > >> Erick
> > >>
> > >> On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki  >
> > >> wrote:
> > >> > Hi,
> > >> >
> > >> > I have the following issue with join query parser and filter query.
> > For
> > >> > such query:
> > >> >
> > >> > *:*
> > >> > 
> > >> > (({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara
> > >> > Zagora")) AND (prod:214)
> > >> > 
> > >> >
> > >> > I got error:
> > >> > 
> > >> > 
> > >> > org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara':
> > Lexical
> > >> > error at line 1, column 12. Encountered:  after : "\"Stara"
> > >> > 
> > >> > 400
> > >> > 
> > >> >
> > >> > Stack:
> > >> > DEBUG - 2014-03-19 13:35:20.825;
> > >> org.eclipse.jetty.servlet.ServletHandler;
> > >> > chain=SolrRequestFilter->default
> > >> > DEBUG - 2014-03-19 13:35:20.826;
> > >> > org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter
> > >> > SolrRequestFilter
> > >> > ERROR - 2014-03-19 13:35:20.828;
> org.apache.solr.common.SolrException;
> > >> > org.apache.solr.common.SolrException:
> > org.apache.solr.search.SyntaxError:
> > >> > Cannot parse 'city:"Stara': Lexical error at line 1, column 12.  E
> > >> > ncountered:  after : "\"Stara"
> > >> > at
> > >> >
> > >>
> >
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
> > >> > at
> > >> >
> > >>
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
> > >> > at
> > >> >
> > >>
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > >> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> > >> > at
> > >> >
> > >>
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> > >> > at
> > >> >
> > >>
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> > >> > at
> > >> >
> > >>
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> > >> > at
> > >> >
> > >>
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> > >> > at
> > >> >
> > >>
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> > >> > at
> > >> >
> > >>
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> > >> > at
> > >> >
> > >>
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> > >> > at
> > >> >
> > >>
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> > >> > at
> > >> >
> > >>
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> > >> > at
> > >> >
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> > >> > at
> > >> >
> > >>
> >
> org.eclips

Re: Indexing huge data

2014-03-06 Thread Kranti Parisa
That's what I do: pre-create JSONs following the schema and save them in
MongoDB; this is part of the ETL process. After that, just dump the JSONs
into Solr using batching etc. With this you can do full and incremental
indexing as well.
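
A minimal sketch of the "dump" step against the Solr 4.x JSON update handler
(core name, fields and batch contents are illustrative):

curl 'http://localhost:8983/solr/mycore/update/json?commit=true' \
  -H 'Content-Type: application/json' \
  --data-binary '[
    {"id":"1","title":"first doc"},
    {"id":"2","title":"second doc"}
  ]'

In practice you would send batches of a few hundred to a few thousand docs
per request and commit once at the end (or rely on autoCommit).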

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Thu, Mar 6, 2014 at 9:57 AM, Rallavagu  wrote:

> Yeah. I have thought about spitting out JSON and run it against Solr using
> parallel Http threads separately. Thanks.
>
>
> On 3/5/14, 6:46 PM, Susheel Kumar wrote:
>
>> One more suggestion is to collect/prepare the data in CSV format (1-2
>> million sample depending on size) and then import data direct into Solr
>> using CSV handler & curl.  This will give you the pure indexing time & the
>> differences.
>>
>> Thanks,
>> Susheel
>>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Wednesday, March 05, 2014 8:03 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Indexing huge data
>>
>> Here's the easiest thing to try to figure out where to concentrate your
>> energies. Just comment out the server.add call in your SolrJ program.
>> Well, and any commits you're doing from SolrJ.
>>
>> My bet: Your program will run at about the same speed it does when you
>> actually index the docs, indicating that your problem is in the data
>> acquisition side. Of course the older I get, the more times I've been wrong
>> :).
>>
>> You can also monitor the CPU usage on the box running Solr. I often see
>> it idling along < 30% when indexing, or even < 10%, again indicating that
>> the bottleneck is on the acquisition side.
>>
>> Note I haven't mentioned any solutions, I'm a believer in identifying the
>> _problem_ before worrying about a solution.
>>
>> Best,
>> Erick
>>
>> On Wed, Mar 5, 2014 at 4:29 PM, Jack Krupansky 
>> wrote:
>>
>>> Make sure you're not doing a commit on each individual document add.
>>> Commit every few minutes or every few hundred or few thousand
>>> documents is sufficient. You can set up auto commit in solrconfig.xml.
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: Rallavagu
>>> Sent: Wednesday, March 5, 2014 2:37 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Indexing huge data
>>>
>>>
>>> All,
>>>
>>> Wondering about best practices/common practices to index/re-index huge
>>> amount of data in Solr. The data is about 6 million entries in the db
>>> and other source (data is not located in one resource). Trying with
>>> solrj based solution to collect data from difference resources to
>>> index into Solr. It takes hours to index Solr.
>>>
>>> Thanks in advance
>>>
>>


Re: Does SolrCloud Improve Indexing or Slow it down

2014-02-19 Thread Kranti Parisa
Why don't you do parallel indexing, then merge everything into one index and
replicate that from the master to the slaves in SolrCloud?
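
A hedged sketch of the merge step using the CoreAdmin MERGEINDEXES action
(host, core and path names are illustrative assumptions):

curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=master&srcCore=batch1&srcCore=batch2'

followed by a commit on the target core; the slaves then pick up the merged
index through standard replication.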

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Wed, Feb 19, 2014 at 3:04 PM, Susheel Kumar <
susheel.ku...@thedigitalgroup.net> wrote:

> Hi,
>
> If we set up a Solr cloud with 3 nodes and we have 100+ million documents
> to index, how should we be indexing: a) will the indexing requests be going
> to each machine, assuming we are able to divide the data based on some
> field, or b) should we be sending the requests to one endpoint, and what
> should that endpoint be?
>
> Can you please clarify? Also, reading this article, it says indexing may
> become slower:
>
>
> http://stackoverflow.com/questions/13500955/does-solrclouds-scalability-extend-to-indexing
>
>
> Please suggest & let me know if you need more info.
>
> Thnx
>


Re: Solr server requirements for 100+ million documents

2014-01-24 Thread Kranti Parisa
can you post the complete solrconfig.xml file and schema.xml files to
review all of your settings that would impact your indexing performance.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar <
susheel.ku...@thedigitalgroup.net> wrote:

> Thanks, Svante. Your indexing speed using the db seems to be really fast. Can you
> please provide some more detail on how you are indexing db records. Is it
> thru DataImportHandler? And what database? Is that local db?  We are
> indexing around 70 fields (60 multivalued) but data is not populated always
> in all fields. The average size of document is in 5-10 kbs.
>
> -Original Message-
> From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf Of
> svante karlsson
> Sent: Friday, January 24, 2014 5:05 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr server requirements for 100+ million documents
>
> I just indexed 100 million db docs (records) with 22 fields (4
> multivalued) in 9524 sec using libcurl.
> 11 million took 763 seconds so the speed drops somewhat with increasing
> dbsize.
>
> We write 1000 docs (just an arbitrary number) in each request from two
> threads. If you will be using solrcloud you will want more writer threads.
>
> The hardware is a single cheap hp DL320E GEN8 V2 1P E3-1220V3 with one SSD
> and 32GB and the solr runs on ubuntu 13.10 inside a esxi virtual machine.
>
> /svante
>
>
>
>
> 2014/1/24 Susheel Kumar 
>
> > Thanks, Erick for the info.
> >
> > For indexing, I agree that most of the time is consumed in data acquisition,
> > which in our case is from the database.  For indexing we currently use a
> > manual process, i.e. the Solr dashboard Data Import, but are now looking to
> > automate.  How do you suggest automating the indexing part? Do you
> > recommend using SolrJ, or should we try to automate using curl?
> >
> >
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Friday, January 24, 2014 2:59 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr server requirements for 100+ million documents
> >
> > Can't be done with the information you provided, and can only be
> > guessed at even with more comprehensive information.
> >
> > Here's why:
> >
> >
> > http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we
> > -dont-have-a-definitive-answer/
> >
> > Also, at a guess, your indexing speed is so slow due to data
> > acquisition; I rather doubt you're being limited by raw Solr indexing.
> > If you're using SolrJ, try commenting out the
> > server.add() bit and running again. My guess is that your indexing
> > speed will be almost unchanged, in which case it's the data
> > acquisition process is where you should concentrate efforts. As a
> > comparison, I can index 11M Wikipedia docs on my laptop in 45 minutes
> > without any attempts at parallelization.
> >
> >
> > Best,
> > Erick
> >
> > On Fri, Jan 24, 2014 at 12:10 PM, Susheel Kumar <
> > susheel.ku...@thedigitalgroup.net> wrote:
> > > Hi,
> > >
> > > Currently we are indexing 10 million document from database (10 db
> > > data
> > entities) & index size is around 8 GB on windows virtual box. Indexing
> > in one shot taking 12+ hours while indexing parallel in separate cores
> > & merging them together taking 4+ hours.
> > >
> > > We are looking to scale to 100+ million documents and looking for
> > recommendation on servers requirements on below parameters for a
> > Production environment. There can be 200+ users performing search same
> time.
> > >
> > > No of physical servers (considering solr cloud) Memory requirement
> > > Processor requirement (# cores) Linux as OS oppose to windows
> > >
> > > Thanks in advance.
> > > Susheel
> > >
> >
>
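
To put Erick's diagnostic above into code, a minimal SolrJ sketch where
commenting out the add() call isolates the data-acquisition time (the URL,
fields, and the stubbed DB read are all illustrative assumptions):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexTimingTest {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/mycore");
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    long start = System.currentTimeMillis();
    for (int i = 0; i < 100000; i++) {
      batch.add(nextDocFromDb(i)); // stands in for the real data acquisition
      if (batch.size() == 1000) {
        server.add(batch);   // comment this line out and re-run: if the timing
        batch.clear();       // barely changes, acquisition is the bottleneck
      }
    }
    if (!batch.isEmpty()) server.add(batch);
    server.commit();         // one commit at the end, not per batch
    System.out.println("took " + (System.currentTimeMillis() - start) + " ms");
  }

  // placeholder for the real database read
  private static SolrInputDocument nextDocFromDb(int i) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", Integer.toString(i));
    doc.addField("title", "doc " + i);
    return doc;
  }
}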


Re: Changing Cache Properties after Indexing

2014-01-17 Thread Kranti Parisa
Doesn't it make sense to have an Indexer / Query Engine setup?

Indexer = Solr instance with replication configured as Master
Query Engine = One or more Solr instances with replication configured as
Slave

That way you can do batch indexing on the Indexer and, if needed, perform
threshold checks with replication disabled. If the threshold checks pass,
enable the replication.

This way you can configure the caches and other settings that you need for
indexing on the Indexer, and configure something else on your Query Engine.
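
A minimal solrconfig.xml sketch of that split (host and core names are
illustrative); the threshold gate can use the ReplicationHandler's
disablereplication/enablereplication commands on the master:

On the Indexer (master):
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,solrconfig.xml</str>
  </lst>
</requestHandler>

On each Query Engine (slave):
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://indexer-host:8983/solr/mycore</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

Gating replication around the threshold checks:
curl 'http://indexer-host:8983/solr/mycore/replication?command=disablereplication'
(run the checks, then)
curl 'http://indexer-host:8983/solr/mycore/replication?command=enablereplication'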

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Fri, Jan 17, 2014 at 11:31 AM, P Williams  wrote:

> You're both completely right.  There isn't any issue with indexing with
> large cache settings.
>
> I ran the same indexing job five times, twice with large cache and twice
> with the default values. I threw out the first job because no matter if
> it's cached or uncached it runs ~2x slower. This must have been the
> observation I based my incorrect caching notion on.
>
> I unloaded with delete of the data directory and reloaded the core each
> time.  I'm using DIH with the FileEntityProcessor and
> PlainTextEnityProcessor to index ~11000 fulltext books.
>
> w/ cache
> 0:13:14.823
> 0:12:33.910
>
> w/o cache
> 0:12:13.186
> 0:15:56.566
>
> There is variation, but not anything that could be explained by the cache
> settings. Doh!
>
> Thanks,
> Tricia
>
>
> On Mon, Jan 13, 2014 at 6:08 PM, Shawn Heisey  wrote:
>
> > On 1/13/2014 4:44 PM, Erick Erickson wrote:
> >
> >> On the face of it, it's somewhat unusual to have the cache settings
> >> affect indexing performance. What are you seeing and how are you
> indexing?
> >>
> >
> > I think this is probably an indirect problem.  Cache settings don't
> > directly affect indexing speed, but when autoWarm values are high and NRT
> > indexing is happening, new searchers are requested frequently and the
> > autoWarm makes that happen slowly with a lot of resources consumed.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Query time join with conditions

2014-01-16 Thread Kranti Parisa
cool, np.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Thu, Jan 16, 2014 at 11:30 AM, heaven  wrote:

> Nvm, figured it out.
>
> To match profiles that have "test entry" in own attributes or in related
> rss
> entries it is possible to use ({!join from=profile_ids_im to=id_i
> v=$rssQuery}Test entry) OR Test entry in "q" parameter, not in "fq".
>
> Thanks again for the help,
> Alex
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Query-time-join-with-conditions-tp4108365p4111719.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Query time join with conditions

2014-01-14 Thread Kranti Parisa
You should be able to do the following:
/ProfileCore/select?q=*:*&fq={!join fromIndex=RssCore from=profile_id to=id
v=$rssQuery}&rssQuery=(type:'RssEntry')

There is also a new join impl
(https://issues.apache.org/jira/browse/SOLR-4787) which allows you to use fq
within the join; it supports nested joins and naturally hits the filterCache.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Tue, Jan 14, 2014 at 2:20 PM, heaven  wrote:

> Can someone shed some light on this?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Query-time-join-with-conditions-tp4108365p4111300.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Range queries with Grouping is slow?

2014-01-09 Thread Kranti Parisa
Thank you, will take a look at it.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Thu, Jan 9, 2014 at 10:25 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello,
>
> Here is workaround for caching separate clauses in OR filters.
> http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html
> No coding is required, just try to experiment with request parameters.
>
>
> On Wed, Jan 8, 2014 at 9:11 PM, Erick Erickson  >wrote:
>
> > Well, actually you can use fqs, it's just that re-using them becomes a
> bit
> > more tricky. Specifically,
> > fq=field1:blah OR field2:blort
> > is perfectly reasonable. However, it doesn't break things down into
> > sub-clauses, so
> > fq=field1:blah
> > will create a new entry in the filtercache. And
> > fq=field2:blort OR field1:blah
> > will not match the first one.
> >
> > It kind of depends on the query pattern whether the filtercache will be
> > re-used, you have to take care to construct the fq clauses with re-use in
> > mind if you want ORs.
> >
> > Best,
> > Erick
> >
> >
> > On Wed, Jan 8, 2014 at 11:56 AM, Kranti Parisa  > >wrote:
> >
> > > I was trying with the  [* TO *] as an example, the real use case is OR
> > > query between 2/more range queries of timestamp fields (saved in
> > > milliseconds). So I can't use FQs as they are ANDed by definition.
> > >
> > > Am I missing something here?
> > >
> > >
> > >
> > >
> > > Thanks,
> > > Kranti K. Parisa
> > > http://www.linkedin.com/in/krantiparisa
> > >
> > >
> > >
> > > On Wed, Jan 8, 2014 at 8:15 AM, Joel Bernstein 
> > wrote:
> > >
> > > > Kranti,
> > > >
> > > > The range query also looks like a good candidate to be moved to a
> > filter
> > > > query so it can be cached.
> > > >
> > > > Joel Bernstein
> > > > Search Engineer at Heliosearch
> > > >
> > > >
> > > > On Tue, Jan 7, 2014 at 11:34 PM, Smiley, David W.  >
> > > > wrote:
> > > >
> > > > > Kranti,
> > > > >
> > > > > I can't speak to the specific slow-down while grouping, but if you
> > > expect
> > > > > to run [* TO *] queries with any frequency then you should index a
> > > > boolean
> > > > > flag and query for that instead.  You might also reduce the
> > > precisionStep
> > > > > value for the field you are using to 6 or even 4.  But wow that's a
> > big
> > > > > difference you noted; it wouldn't hurt to double-check with the
> > > debugger
> > > > > that the [* TO *] is treated as a numeric range query instead of a
> > > > generic
> > > > > term range.
> > > > >
> > > > > ~ David
> > > > > 
> > > > > From: Kranti Parisa [kranti.par...@gmail.com]
> > > > > Sent: Tuesday, January 07, 2014 10:26 PM
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Range queries with Grouping is slow?
> > > > >
> > > > > Is there any known issue with Range queries + grouping?
> > > > >
> > > > > Case1:
> > > > > q=id:123&group=true&sort=price
> > > > > asc&group.field=entityId&group.limit=2&group.ngroups=true
> > > > >
> > > > > Case2:
> > > > > q=id:123 AND price:[* TO *]&group=true&sort=price
> > > > > asc&group.field=entityId&group.limit=2&group.ngroups=true
> > > > >
> > > > > Index Size:10M/~5GB
> > > > > After running both queries at least once, I was expecting to hit
> the
> > > > query
> > > > > caches and response should be quick enough, but
> > > > > Case1: 15-20ms (looks fine)
> > > > > Case2: 400+ms (this seems constantly >400ms even after the first
> > query)
> > > > >
> > > > > any thought? if it's a known issue, please point me to the jira
> link
> > > > > otherwise I can open an issue if this needs some analysis?
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Kranti K. Parisa
> > > > > http://www.linkedin.com/in/krantiparisa
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  
>


Re: Passing variables as values in Query Filter

2014-01-08 Thread Kranti Parisa
did you try this?
q={!func}customfunc($v1)&v1=somevalue&qf=fieldname

more info
http://wiki.apache.org/solr/FunctionQuery
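
If customfunc is a function you implement yourself, the usual route is a
ValueSourceParser plugin; a minimal sketch, where the package, class name and
the toy doubling logic are all illustrative:

// registered in solrconfig.xml as:
// <valueSourceParser name="customfunc" class="com.example.CustomFuncParser"/>
package com.example;

import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.SimpleFloatFunction;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

public class CustomFuncParser extends ValueSourceParser {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    final ValueSource arg = fp.parseValueSource(); // resolves $v1 references too
    return new SimpleFloatFunction(arg) {
      @Override
      protected String name() { return "customfunc"; }
      @Override
      protected float func(int doc, FunctionValues vals) {
        return vals.floatVal(doc) * 2f; // toy logic: replace with the real computation
      }
    };
  }
}

With that registered, customfunc(...) parses as a function query instead of
being treated as a literal search string.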


Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Wed, Jan 8, 2014 at 2:22 AM, Mukundaraman valakumaresan <
muk...@8kmiles.com> wrote:

> Hi Ahmet,
>
> Thanks a lot
>
> What I need is this .
> q={!lucene df=city v=$qq}&qq=customfunc(x)
>
> In this case,
> qq=custfunc(x) --> where custfunc is a custom function that has to be
> executed. Instead, how it acts now is that it takes it as a string to
> search; if qq is a number, you will get a NumberFormatException.
>
> Thanks & Regards
> Mukund
>
>
>
> On Tue, Jan 7, 2014 at 7:45 PM, Ahmet Arslan  wrote:
>
> > Hi Mukund,
> >
> > I am not sure what you are after but may be you can use this : q={!lucene
> > df=city v=$qq}&qq=Adyar
> >
> >
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries
> >
> > Ahmet
> >
> >
> >
> >
> > On Tuesday, January 7, 2014 3:39 PM, Mukundaraman valakumaresan <
> > muk...@8kmiles.com> wrote:
> > Hi
> >
> > The following query executes
> >
> >
> > http://localhost:8983/solr/collection1/select?wt=json&indent=true&q=locality:Adyar
> >
> > But I wanted something like the one below which is not working.
> >
> > http://localhost:8983/solr/collection1/select?wt=json&indent=true&q=locality:$str&str="Adyar"
> >
> >
> > http://localhost:8983/solr/collection1/select?wt=json&indent=true&q=locality:getAdjacentLocalities("Adyar")
> >
> > getAdjacentLocalities() is a custom function implemented.
> >
> > Any suggestions.
> >
> > Thanks & Regards
> > Mukund
> >
> >
>


Re: Range queries with Grouping is slow?

2014-01-08 Thread Kranti Parisa
Yes, that's the key: these time ranges change frequently, and hitting the
filterCache then is a problem. I will try a few more samples and probably
debug through it. Thanks.
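
To make Erick's point below concrete for the timestamp case, a sketch with
illustrative field names and epoch-millisecond bounds:

fq=(startTime:[1388534400000 TO 1391212800000] OR endTime:[1388534400000 TO 1391212800000])

Re-sending exactly that string reuses the filterCache entry; generating the
same clauses in a different order (endTime first) creates a separate entry,
so the query builder should emit the clauses in one canonical order.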


Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Wed, Jan 8, 2014 at 12:11 PM, Erick Erickson wrote:

> Well, actually you can use fqs, it's just that re-using them becomes a bit
> more tricky. Specifically,
> fq=field1:blah OR field2:blort
> is perfectly reasonable. However, it doesn't break things down into
> sub-clauses, so
> fq=field1:blah
> will create a new entry in the filtercache. And
> fq=field2:blort OR field1:blah
> will not match the first one.
>
> It kind of depends on the query pattern whether the filtercache will be
> re-used, you have to take care to construct the fq clauses with re-use in
> mind if you want ORs.
>
> Best,
> Erick
>
>
> On Wed, Jan 8, 2014 at 11:56 AM, Kranti Parisa  >wrote:
>
> > I was trying with the  [* TO *] as an example, the real use case is OR
> > query between 2/more range queries of timestamp fields (saved in
> > milliseconds). So I can't use FQs as they are ANDed by definition.
> >
> > Am I missing something here?
> >
> >
> >
> >
> > Thanks,
> > Kranti K. Parisa
> > http://www.linkedin.com/in/krantiparisa
> >
> >
> >
> > On Wed, Jan 8, 2014 at 8:15 AM, Joel Bernstein 
> wrote:
> >
> > > Kranti,
> > >
> > > The range query also looks like a good candidate to be moved to a
> filter
> > > query so it can be cached.
> > >
> > > Joel Bernstein
> > > Search Engineer at Heliosearch
> > >
> > >
> > > On Tue, Jan 7, 2014 at 11:34 PM, Smiley, David W. 
> > > wrote:
> > >
> > > > Kranti,
> > > >
> > > > I can't speak to the specific slow-down while grouping, but if you
> > expect
> > > > to run [* TO *] queries with any frequency then you should index a
> > > boolean
> > > > flag and query for that instead.  You might also reduce the
> > precisionStep
> > > > value for the field you are using to 6 or even 4.  But wow that's a
> big
> > > > difference you noted; it wouldn't hurt to double-check with the
> > debugger
> > > > that the [* TO *] is treated as a numeric range query instead of a
> > > generic
> > > > term range.
> > > >
> > > > ~ David
> > > > 
> > > > From: Kranti Parisa [kranti.par...@gmail.com]
> > > > Sent: Tuesday, January 07, 2014 10:26 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Range queries with Grouping is slow?
> > > >
> > > > Is there any known issue with Range queries + grouping?
> > > >
> > > > Case1:
> > > > q=id:123&group=true&sort=price
> > > > asc&group.field=entityId&group.limit=2&group.ngroups=true
> > > >
> > > > Case2:
> > > > q=id:123 AND price:[* TO *]&group=true&sort=price
> > > > asc&group.field=entityId&group.limit=2&group.ngroups=true
> > > >
> > > > Index Size:10M/~5GB
> > > > After running both queries at least once, I was expecting to hit the
> > > query
> > > > caches and response should be quick enough, but
> > > > Case1: 15-20ms (looks fine)
> > > > Case2: 400+ms (this seems constantly >400ms even after the first
> query)
> > > >
> > > > any thought? if it's a known issue, please point me to the jira link
> > > > otherwise I can open an issue if this needs some analysis?
> > > >
> > > >
> > > > Thanks,
> > > > Kranti K. Parisa
> > > > http://www.linkedin.com/in/krantiparisa
> > > >
> > >
> >
>


Re: Range queries with Grouping is slow?

2014-01-08 Thread Kranti Parisa
I was trying [* TO *] as an example; the real use case is an OR query
between two or more range queries on timestamp fields (saved in
milliseconds). So I can't use separate FQs, as they are ANDed by definition.

Am I missing something here?




Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Wed, Jan 8, 2014 at 8:15 AM, Joel Bernstein  wrote:

> Kranti,
>
> The range query also looks like a good candidate to be moved to a filter
> query so it can be cached.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Tue, Jan 7, 2014 at 11:34 PM, Smiley, David W. 
> wrote:
>
> > Kranti,
> >
> > I can't speak to the specific slow-down while grouping, but if you expect
> > to run [* TO *] queries with any frequency then you should index a
> boolean
> > flag and query for that instead.  You might also reduce the precisionStep
> > value for the field you are using to 6 or even 4.  But wow that's a big
> > difference you noted; it wouldn't hurt to double-check with the debugger
> > that the [* TO *] is treated as a numeric range query instead of a
> generic
> > term range.
> >
> > ~ David
> > 
> > From: Kranti Parisa [kranti.par...@gmail.com]
> > Sent: Tuesday, January 07, 2014 10:26 PM
> > To: solr-user@lucene.apache.org
> > Subject: Range queries with Grouping is slow?
> >
> > Is there any known issue with Range queries + grouping?
> >
> > Case1:
> > q=id:123&group=true&sort=price
> > asc&group.field=entityId&group.limit=2&group.ngroups=true
> >
> > Case2:
> > q=id:123 AND price:[* TO *]&group=true&sort=price
> > asc&group.field=entityId&group.limit=2&group.ngroups=true
> >
> > Index Size:10M/~5GB
> > After running both queries at least once, I was expecting to hit the
> query
> > caches and response should be quick enough, but
> > Case1: 15-20ms (looks fine)
> > Case2: 400+ms (this seems constantly >400ms even after the first query)
> >
> > any thought? if it's a known issue, please point me to the jira link
> > otherwise I can open an issue if this needs some analysis?
> >
> >
> > Thanks,
> > Kranti K. Parisa
> > http://www.linkedin.com/in/krantiparisa
> >
>


Range queries with Grouping is slow?

2014-01-07 Thread Kranti Parisa
Is there any known issue with Range queries + grouping?

Case1:
q=id:123&group=true&sort=price
asc&group.field=entityId&group.limit=2&group.ngroups=true

Case2:
q=id:123 AND price:[* TO *]&group=true&sort=price
asc&group.field=entityId&group.limit=2&group.ngroups=true

Index Size:10M/~5GB
After running both queries at least once, I was expecting to hit the query
caches and the responses should be quick enough, but:
Case1: 15-20ms (looks fine)
Case2: 400+ms (this seems consistently >400ms even after the first query)

Any thoughts? If it's a known issue, please point me to the JIRA link;
otherwise I can open an issue if this needs some analysis.


Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa


Re: config JoinQParserPlugin

2014-01-07 Thread Kranti Parisa
Ray,

FYI: there are more sophisticated joins available via
https://issues.apache.org/jira/browse/SOLR-4787
not on trunk yet, but worth taking a look.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Thu, Jan 2, 2014 at 8:05 PM, Ray Cheng  wrote:

> Hi Chris,
>
> > but also exactly what response you got
> I didn't get any response. Even with debug=true, there was nothing at all
> printed after the curl command. Nothing on the Solr log file either. (Are
> there higher debug levels on Solr log?) That was the reason I thought I
> needed to add JoinQParserPlugin explicitly in solrconfig.xml.
>
> Thanks for your email and the other one saying JoinQParserPlugin is in Solr already.
> After reading your email, I tried a simple example of collections "brands"
> and "products" used in this url:
>
> http://stackoverflow.com/questions/12665797/is-solr-4-0-capable-of-using-join-for-multiple-core
>
>
> I also added -v to curl with !join syntax and saw some output:
> < HTTP/1.1 400 Bad Request
> < Content-Length: 0
>
> Then, I tried join syntax from Solr admin console and browser rather than
> using curl. After a few tries, cross core join worked with the simple
> "brands" and "products" collections! :) So, as you said, both of the
> following worked:
>
> http://localhost:8983/solr/brands/select?q=*:*&fq={!join from=brand_id
> to=id fromIndex=products_shard1_replica1}name:iPad
> http://localhost:8983/solr/brands/select?q=*:*&fq={!type=join from=brand_id
> to=id fromIndex=products_shard1_replica1}name:iPad
>
> However, without _shard1_replica1 in "products",
> http://localhost:8983/solr/brands/select?q=*:*&fq={!join from=brand_id
> to=id fromIndex=products}name:iPad
>
> gave this error (I'm using SolrCloud from solr-4.6.0):
> Cross-core join: no such core products
>
>
> It is inconvenient to specify the exact shard and replica on join queries.
> But, this is a good step forward for me. I'll try my more complicated
> schemas now. Thanks so much to you and others' replies!
>
> Ray
>
>
>
>
>
>
> On Tuesday, December 31, 2013 8:47 AM, Chris Hostetter <
> hossman_luc...@fucit.org> wrote:
>
>
> >: Earlier I tried join queries using curl
> >: '
> http://myLinux:8983/solr/abc.edu_up/select?debug=true&q=*:*&fq={defType=join
> >: from=id to=id
>  fromIndex=abc.edu}subject:financial'  but didn't get any
> >: response. There was nothing on Solr log either. So, I thought I need to
> >: config join. Is there another way to at least get some response from
> >: join queries?
> >
> >When posting questions, it's important to not only show the URLs you
> >tried, but also exactly what response you got -- in this case you have
> >debuging turned on (good!) but you don't show us what the debugging
> >information returend.
> >
>From what I can tell, you are misunderstanding how to use local params
>and the use of "type" vs "defType" in local params.
> >
> >1) the syntax for local params is "{!p1=v1 p2=v2 ...}" ... note the "!",
> >it's important, otherwise the "{...}" is just treated as input to the
> >default parser.
> >
> >2) inside local params, you use the "type" param to indicate which parser
> >you want to use (or as a shorthand just specify the parser name
> >immediately after the "!"
> >
> >3) if you use "defType" as a localparam, it controls which parser is used
>for parsing the *nested* query.
> >
> >- - -
> >
> >So in your example, you should probably be using...
> >
> >/abc.edu_up/select?debug=true&q=*:*&fq={!type=join ...
> >
> >...or this syntactic sugar...
> >
> >/abc.edu_up/select?debug=true&q=*:*&fq={!join ...
> >
> >
> >If that still isn't working for you, please show us what output you do
>get, and some examples of the same query w/o the join filter (as well as
>showing us what the nested join query produces on its own so we can
> >verify you have docs matching it)
> >
> >
>


Re: dropping noise words and maintaining the relevancy

2013-10-31 Thread Kranti Parisa
One possible approach is to populate the titles into a field (say
exactMatch) and point your search query at both fields:
exactMatch:"160 Associates LP" OR text:"160 Associates LP"
assuming that you have all the text populated into the field called "text".

You can also use field-level boosting with the above query, for example:
exactMatch:"160 Associates LP"^10 OR text:"160 Associates LP"^5


Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Thu, Oct 31, 2013 at 4:00 PM, Susheel Kumar <
susheel.ku...@thedigitalgroup.net> wrote:

> Hello,
>
> We have a very particular requirement of dropping noise words (LP, LLP,
> LLC, Corp, Corporation, Inc, Incorporation, PA, Professional Association,
> Attorney at law, GP, General Partnership, etc.) at the end of the search key
> while maintaining relevancy. For example:
>
> If user search for "160 Associates LP", we want search to return in their
> below relevancy order. Basically if exact / similar match is present, it
> comes first followed by other results.
>
> 160 Associates LP
> 160 Associates
> 160 Associates LLC
> 160 Associates LLLP
> 160 Hilton Associates
>
> If I handle this through "Stop words" then LP will get dropped from search
> key and then all results will come but exact match will be shown somewhere
> lower or deep.
>
> Regards and appreciate your help.
> Susheel
>