Re: JSON Facet Syntax Sorting

2016-10-26 Thread Zheng Lin Edwin Yeo
Thanks for the update Yonik.

Regards,
Edwin

On 26 October 2016 at 20:07, Yonik Seeley  wrote:

> On Wed, Oct 26, 2016 at 3:16 AM, Zheng Lin Edwin Yeo
>  wrote:
> > Hi,
> >
> > I'm using Solr 6.2.1.
> >
> > For the JSON Facet Syntax, are we able to sort on multiple values at one
> go?
> >
> > Like for example, if I want to sort by count, followed by the average
> price.
> > Is this the correct way to do it?
>
> Sorting by multiple metrics isn't yet supported.
>
> -Yonik
>
> >  json.facet={
> >categories:{
> >  type : terms,
> >  field : cat,
> >  sort : { count : desc},
> >  sort : { x : desc},
> >  facet:{
> >x : "avg(price)",
> >y : "sum(price)"
> >  }
> >}
> >  }
> >
> >
> > Regards,
> > Edwin
>


Re: Combine Data from PDF + XML

2016-10-26 Thread Erick Erickson
In that case you'll have to write an indexing client that (probably)
uses Tika to parse the PDF file and some kind of XML parser to parse the
metadata XML, and then combines the two into Solr documents that you send
to Solr. Here's a skeletal program with some extra stuff in there for
database connectivity, but you should be able to chop that out pretty
easily.

https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick
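
A minimal sketch of such a client, just to make the idea concrete (the Solr
URL, collection, field names and XML element names below are assumptions for
illustration, not anything your setup prescribes):

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import org.w3c.dom.Document;

public class PdfXmlIndexer {
  public static void main(String[] args) throws Exception {
    try (SolrClient solr = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycollection").build()) {

      // 1) Extract the PDF body text with Tika.
      AutoDetectParser parser = new AutoDetectParser();
      BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
      Metadata pdfMeta = new Metadata();
      try (InputStream pdf = new FileInputStream(new File("doc1.pdf"))) {
        parser.parse(pdf, handler, pdfMeta);
      }

      // 2) Pull a couple of fields out of the companion metadata XML.
      //    The element names ("id", "title") are hypothetical.
      DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document xml = db.parse(new File("doc1.xml"));
      String id = xml.getElementsByTagName("id").item(0).getTextContent();
      String title = xml.getElementsByTagName("title").item(0).getTextContent();

      // 3) Combine both sources into a single Solr document and send it.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", id);
      doc.addField("title", title);
      doc.addField("content", handler.toString());
      solr.add(doc);
      solr.commit();
    }
  }
}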


On Wed, Oct 26, 2016 at 1:47 PM, tesm...@gmail.com  wrote:
> Hi Erick,
>
> Thanks for your reply.
>
> Yes, XML files contain metadata about PDF files. I need to search from both
> XML and PDF files and to show search results from both sources.
>
>
> Regards,
>
> On Wed, Oct 26, 2016 at 1:47 AM, Erick Erickson 
> wrote:
>
>> First you need to define the problem
>>
>> what do you mean by "combine"? Do the XML files
>> contain, say, metadata about an associated PDF file?
>>
>> Or are these entirely orthogonal documents that
>> you need to index into the same collection?
>>
>> Best,
>> Erick
>>
>> On Tue, Oct 25, 2016 at 4:18 PM, tesm...@gmail.com 
>> wrote:
>> > Hi,
>> >
> > > I'm new to Apache Solr. I am developing a search project. The source data is
>> > coming from two sources:
>> >
>> > 1) XML Files
>> >
>> > 2) PDF Files
>> >
>> >
> > > I need to combine these two sources for search. Couldn't find an example of
>> > combining these two sources. Any help is appreciated.
>> >
>> >
>> > Regards,
>>


Re: Combine Data from PDF + XML

2016-10-26 Thread tesm...@gmail.com
Hi Erick,

Thanks for your reply.

Yes, XML files contain metadata about PDF files. I need to search from both
XML and PDF files and to show search results from both sources.


Regards,

On Wed, Oct 26, 2016 at 1:47 AM, Erick Erickson 
wrote:

> First you need to define the problem
>
> what do you mean by "combine"? Do the XML files
> contain, say, metadata about an associated PDF file?
>
> Or are these entirely orthogonal documents that
> you need to index into the same collection?
>
> Best,
> Erick
>
> On Tue, Oct 25, 2016 at 4:18 PM, tesm...@gmail.com 
> wrote:
> > Hi,
> >
> > I'm new to Apache Solr. I am developing a search project. The source data is
> > coming from two sources:
> >
> > 1) XML Files
> >
> > 2) PDF Files
> >
> >
> > I need to combine these two sources for search. Couldn't find an example of
> > combining these two sources. Any help is appreciated.
> >
> >
> > Regards,
>


Re: Graph Traversal Question

2016-10-26 Thread Grant Ingersoll
On Wed, Oct 26, 2016 at 10:46 AM Joel Bernstein  wrote:

> Grant, can you describe your use case? Currently we can filter on the
> relationship using a filter query. So I was wondering what use case would
> involve retrieving the relationship. Are you looking to discover what
> relationships are available? One of the assumptions I made was that users
> would know what relationships they wanted to traverse.
>
>
Some of this is admittedly a thought experiment of what's possible, but I
think when dealing w/ graph operations it's pretty natural to use edge
attributes as part of your calculation.  The most obvious use case of that
is a weighted graph where the edge attribute is a numerical weight (e.g. in
Yonik's example: sort/rank by rating).  For me, I'm exploring how to use KB
data (Yago, which is basically RDF triples) as part of relevance and to
answer questions.  These are commonly handled in a triple store (RDF engine),
but w/ this graph stuff in Solr, I think it could be possible to do it there
(and quite simply at that), which would significantly simplify the overall
system.


>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Oct 26, 2016 at 9:39 AM, Grant Ingersoll 
> wrote:
>
> > The other way to think about it is: I want to put labels on the edges.  In
> my
> > case, the label is the relationship, in your case, the label is the
> rating
> > or author.
> >
> > On Wed, Oct 26, 2016 at 7:26 AM Yonik Seeley  wrote:
> >
> > > On Wed, Oct 26, 2016 at 7:13 AM, Grant Ingersoll 
> > > wrote:
> > > > On Tue, Oct 25, 2016 at 6:26 PM Yonik Seeley 
> > wrote:
> > > >
> > > > In your example below it would be akin to injecting the rating onto
> > those
> > > > responses as well, not just in the 'fq'.
> > >
> > > Gotcha... Yeah, I remember wondering how to do that myself.
> > >
> > > -Yonik
> > >
> >
>


Re: Solr Cloud A/B Deployment Issue

2016-10-26 Thread Pushkar Raste
Nodes will still go into recovery but only for a short duration.

On Oct 26, 2016 1:26 PM, "jimtronic"  wrote:

It appears this has all been resolved by the following ticket:

https://issues.apache.org/jira/browse/SOLR-9446

My scenario fails in 6.2.1, but works in 6.3 and Master where this bug has
been fixed.

In the meantime, we can use our workaround to issue a simple delete command
that deletes a non-existent document.

Jim





Re: Solr Cloud A/B Deployment Issue

2016-10-26 Thread Pushkar Raste
This is due to leader-initiated recovery. Take a look at

https://issues.apache.org/jira/browse/SOLR-9446

On Oct 24, 2016 1:23 PM, "jimtronic"  wrote:

> We are running into a timing issue when trying to do a scripted deployment
> of
> our Solr Cloud cluster.
>
> Scenario to reproduce (sometimes):
>
> 1. launch 3 clean solr nodes connected to zookeeper.
> 2. create a 1 shard collection with replicas on each node.
> 3. load data (more will make the problem worse)
> 4. launch 3 more nodes
> 5. add replicas to each new node
> 6. once entire cluster is healthy, start killing first three nodes.
>
> Depending on the timing, the second three nodes end up all in RECOVERING
> state without a leader.
>
> This appears to be happening because when the first leader dies, all the
> new
> nodes go into full replication recovery and if all the old boxes happen to
> die during that state, the boxes are stuck. The boxes cannot serve requests
> and they eventually (1-8 hours) go into RECOVERY_FAILED state.
>
> This state is easy to fix with a FORCELEADER call to the collections API,
> but that's only remediation, not prevention.
>
> My question is this: Why do the new nodes have to go into full replication
> recovery when they are already up to date? I just added the replica, so it
> shouldn't have to do a new full replication again.
>
> Jim
>
>
>
>
>
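
For reference, the FORCELEADER remediation mentioned above is a Collections
API call along these lines (host, collection and shard names are just
placeholders):

curl 'http://localhost:8983/solr/admin/collections?action=FORCELEADER&collection=mycollection&shard=shard1'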


Re: Related Search

2016-10-26 Thread Trey Grainger
Yeah, the approach listed by Grant and Markus is a common one. I've
worked on systems that mined query logs like this, and it's a good approach
if you have sufficient query logs to pull it off.

There are a lot of linguistic nuances you'll encounter along the way,
including how you disambiguate homonyms and their related terms, how you
identify synonyms/acronyms as having the same underlying meaning, how you
parse and handle unknown phrases, how you remove noise present in the query
logs, and even how you weight the strength of the relationship between
related queries. I gave
a presentation on this topic at Lucene/Solr Revolution in 2015 if you're
interested in learning more about how to build such a system (
http://www.treygrainger.com/posts/presentations/leveraging-lucene-solr-as-a-knowledge-graph-and-intent-engine/
).

Another approach (also referenced in the above presentation), for those
with more of a cold-start problem with query logs, is to mine related terms
and phrases out of the underlying content in the search engine (inverted
index) itself. The Semantic Knowledge Graph that was recently open sourced
by CareerBuilder and contributed back to Solr (disclaimer: I worked on it,
and it's available as both a Solr plugin and a patch, but it's not ready to
be committed into Solr yet) enables such a capability. See
https://issues.apache.org/jira/browse/SOLR-9480 for the most current patch.

It is a request handler that can take in any query and discover the most
related other terms to that entire query from the inverted index, sorted by
strength of relationship to that query (it can also traverse from those
terms across fields/relationships to other terms, but that's probably
overkill for the basic related searches use case). Think of it as a way to
run a query and find the most relevant other keywords, as opposed to
finding the most relevant documents.

Using this, you can then either return the related keywords as your related
searches, or you can modify your query to include them and power a
conceptual/semantic search instead of the pure text-based search you
started with. It's effectively a (better) way to implement More Like This,
where instead of taking a document and using tf-idf to extract out the
globally-interesting terms from the document (like MLT), you can instead
use a query to find contextually-relevant keywords across many documents,
score them based upon their similarity to the original query, and then turn
around and use the top most semantically-relevant terms as your related
search(es).

I don't have near-term plans to expose the semantic knowledge graph as a
search component (it's a request handler right now), but once it's finished
that could certainly be done. Just wanted to mention it as another approach
to solve this specific problem.

-Trey Grainger
SVP of Engineering @ Lucidworks
Co-author, Solr in Action



On Wed, Oct 26, 2016 at 1:59 PM, Markus Jelsma 
wrote:

> Indeed, we have similar processes running, one of which generates a
> 'related query collection' that just contains a (normalized) query and its
> related queries. I would not know how this would even be possible without
> continuously processing query and click logs.
>
> M.
>
>
> -Original message-
> > From:Grant Ingersoll 
> > Sent: Tuesday 25th October 2016 23:51
> > To: solr-user@lucene.apache.org
> > Subject: Re: Related Search
> >
> > Hi Rick,
> >
> > I typically do this stuff just by searching a different collection that I
> > create offline by analyzing query logs and then indexing them and
> searching.
> >
> > On Mon, Oct 24, 2016 at 8:32 PM Rick Leir  wrote:
> >
> > > Hi all,
> > >
> > > There is an issue 'Create a Related Search Component' which has been
> > > open for some years now.
> > >
> > > It has a priority: major.
> > >
> > > https://issues.apache.org/jira/browse/SOLR-2080
> > >
> > >
> > > I discovered it linked from Lucidwork's very useful blog on ecommerce:
> > >
> > >
> > > https://lucidworks.com/blog/2011/01/25/implementing-the-
> ecommerce-checklist-with-apache-solr-and-lucidworks/
> > >
> > >
> > > Did people find a better way to accomplish Related Search? Perhaps MLT
> > > http://wiki.apache.org/solr/MoreLikeThis ?
> > >
> > > cheers -- Rick
> > >
> > >
> > >
> >
>


CodaHale metrics for Solr 6?

2016-10-26 Thread Walter Underwood
Anybody using the CodaHale metrics.jetty9.InstrumentedHandler? It looks a lot 
like something we built for our own use with Solr 4.

http://metrics.dropwizard.io/3.1.0/manual/jetty/

http://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/jetty9/InstrumentedHandler.html

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




RE: Related Search

2016-10-26 Thread Markus Jelsma
Indeed, we have similar processes running, one of which generates a 'related
query collection' that just contains a (normalized) query and its related
queries. I would not know how this would even be possible without continuously
processing query and click logs.

M.
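
As a rough illustration only (the collection and field names below are made
up), each document in such a collection might look like this, with the
serve-time lookup being a simple query on the normalized field:

{
  "id": "q-12345",
  "query_normalized": "lactose intolerance",
  "related_queries": ["lactose free milk", "dairy allergy", "lactase"],
  "click_count": 4212
}

http://localhost:8983/solr/related_queries/select?q=query_normalized:"lactose+intolerance"&fl=related_queries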
 
 
-Original message-
> From:Grant Ingersoll 
> Sent: Tuesday 25th October 2016 23:51
> To: solr-user@lucene.apache.org
> Subject: Re: Related Search
> 
> Hi Rick,
> 
> I typically do this stuff just by searching a different collection that I
> create offline by analyzing query logs and then indexing them and searching.
> 
> On Mon, Oct 24, 2016 at 8:32 PM Rick Leir  wrote:
> 
> > Hi all,
> >
> > There is an issue 'Create a Related Search Component' which has been
> > open for some years now.
> >
> > It has a priority: major.
> >
> > https://issues.apache.org/jira/browse/SOLR-2080
> >
> >
> > I discovered it linked from Lucidwork's very useful blog on ecommerce:
> >
> >
> > https://lucidworks.com/blog/2011/01/25/implementing-the-ecommerce-checklist-with-apache-solr-and-lucidworks/
> >
> >
> > Did people find a better way to accomplish Related Search? Perhaps MLT
> > http://wiki.apache.org/solr/MoreLikeThis ?
> >
> > cheers -- Rick
> >
> >
> >
> 


Re: Solr Cloud A/B Deployment Issue

2016-10-26 Thread jimtronic
It appears this has all been resolved by the following ticket:

https://issues.apache.org/jira/browse/SOLR-9446

My scenario fails in 6.2.1, but works in 6.3 and Master where this bug has
been fixed.

In the meantime, we can use our workaround to issue a simple delete command
that deletes a non-existent document.

Jim
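
For reference, that no-op delete is just an update request along these lines
(the collection name and id are placeholders):

curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '{"delete": {"id": "does-not-exist"}}'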





RE: Query formulation help

2016-10-26 Thread Prasanna S. Dhakephalkar
John, 

You are right, I am also looking for document fields as variables.
That was going to be my next trial.

I have been using the admin panel to try out queries.

Regards,

Prasanna.

-Original Message-
From: John Bickerstaff [mailto:j...@johnbickerstaff.com] 
Sent: Wednesday, October 26, 2016 9:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Query formulation help

For what it's worth- you can do some complex stuff - including using document 
fields as "variables" -- I did it on a Solr query endpoint (like
/search) because I had stuff that was constant for every query.  The syntax is 
challenging, but it can be done.

I won't confuse the issue more unless you need something like that - let me 
know if you do.

On Wed, Oct 26, 2016 at 9:52 AM, Tom Evans  wrote:

> On Wed, Oct 26, 2016 at 4:00 PM, Prasanna S. Dhakephalkar 
>  wrote:
> > Hi,
> >
> > Thanks for reply, I did
> >
> > "q": "cost:[2 TO (2+5000)]"
> >
> > Got
> >
> >   "error": {
> > "msg": "org.apache.solr.search.SyntaxError: Cannot parse
> 'cost:[2 to (2+5000)]': Encountered \"  \"(2+5000)
> \"\" at line 1, column 18.\nWas expecting one of:\n\"]\" ...\n\"}\"
> ...\n",
> >   }
> >
> > I want solr to do the addition.
> > I tried
> > "q": "cost:[2 TO (2+5000)]"
> > "q": "cost:[2 TO sum(2,5000)]"
> >
> > It has not worked. I am missing something. I do not know what. Maybe it is
> > how to invoke functions.
> >
> > Regards,
> >
> > Prasanna.
>
> Sorry, I was unclear - do the maths before constructing the query!
>
> You might be able to do this with function queries, but why bother? If 
> the number is fixed, then fix it in the query, if it varies then there 
> must be some code executing on your client that can be used to do a 
> simple addition.
>
> Cheers
>
> Tom
>



Re: Query formulation help

2016-10-26 Thread John Bickerstaff
For what it's worth- you can do some complex stuff - including using
document fields as "variables" -- I did it on a Solr query endpoint (like
/search) because I had stuff that was constant for every query.  The syntax
is challenging, but it can be done.

I won't confuse the issue more unless you need something like that - let me
know if you do.

On Wed, Oct 26, 2016 at 9:52 AM, Tom Evans  wrote:

> On Wed, Oct 26, 2016 at 4:00 PM, Prasanna S. Dhakephalkar
>  wrote:
> > Hi,
> >
> > Thanks for reply, I did
> >
> > "q": "cost:[2 TO (2+5000)]"
> >
> > Got
> >
> >   "error": {
> > "msg": "org.apache.solr.search.SyntaxError: Cannot parse
> 'cost:[2 to (2+5000)]': Encountered \"  \"(2+5000)
> \"\" at line 1, column 18.\nWas expecting one of:\n\"]\" ...\n\"}\"
> ...\n",
> >   }
> >
> > I want solr to do the addition.
> > I tried
> > "q": "cost:[2 TO (2+5000)]"
> > "q": "cost:[2 TO sum(2,5000)]"
> >
> > It has not worked. I am missing something. I do not know what. Maybe it is
> > how to invoke functions.
> >
> > Regards,
> >
> > Prasanna.
>
> Sorry, I was unclear - do the maths before constructing the query!
>
> You might be able to do this with function queries, but why bother? If
> the number is fixed, then fix it in the query, if it varies then there
> must be some code executing on your client that can be used to do a
> simple addition.
>
> Cheers
>
> Tom
>


Re: Query formulation help

2016-10-26 Thread Tom Evans
On Wed, Oct 26, 2016 at 4:00 PM, Prasanna S. Dhakephalkar
 wrote:
> Hi,
>
> Thanks for reply, I did
>
> "q": "cost:[2 TO (2+5000)]"
>
> Got
>
>   "error": {
> "msg": "org.apache.solr.search.SyntaxError: Cannot parse 'cost:[2 to 
> (2+5000)]': Encountered \"  \"(2+5000) \"\" at line 1, 
> column 18.\nWas expecting one of:\n\"]\" ...\n\"}\" ...\n",
>   }
>
> I want solr to do the addition.
> I tried
> "q": "cost:[2 TO (2+5000)]"
> "q": "cost:[2 TO sum(2,5000)]"
>
> It has not worked. I am missing something. I do not know what. Maybe it is
> how to invoke functions.
>
> Regards,
>
> Prasanna.

Sorry, I was unclear - do the maths before constructing the query!

You might be able to do this with function queries, but why bother? If
the number is fixed, then fix it in the query, if it varies then there
must be some code executing on your client that can be used to do a
simple addition.

Cheers

Tom
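
To make that concrete, here is a hedged SolrJ sketch of doing the arithmetic
on the client before building the range query (the core URL and the numbers
are placeholders; "cost" is the field from this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CostRangeQuery {
  public static void main(String[] args) throws Exception {
    long givenNumber = 20000L;        // whatever the user supplied
    long upper = givenNumber + 5000L; // do the maths on the client side

    try (HttpSolrClient solr = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycore").build()) {
      SolrQuery q = new SolrQuery("*:*");
      // exclusive bounds, i.e. givenNumber < cost < givenNumber + 5000
      q.addFilterQuery("cost:{" + givenNumber + " TO " + upper + "}");
      QueryResponse rsp = solr.query(q);
      System.out.println("matches: " + rsp.getResults().getNumFound());
    }
  }
}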


Re: Query formulation help

2016-10-26 Thread John Bickerstaff
Ahh - I see what you're after (I think)

This page should be helpful for you:

https://cwiki.apache.org/confluence/display/solr/Function+Queries

again, I'd try using the Admin UI as a test phase to get things right (and
see the syntax in the URL that comes back on the response)

Open the edismax section of the Admin UI to find fields that you can use to
enter function queries and things like this...

In the case of X + Y, you're probably interested in the "sum" function

HTH...
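
If you do want Solr itself to do the arithmetic, one option (a hedged sketch,
untested against your schema, with 12345 standing in for the given number) is
the frange parser over a function query, which expresses 0 < cost - 12345 < 500:

fq={!frange l=0 u=500 incl=false incu=false}sub(cost,12345)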

On Wed, Oct 26, 2016 at 9:28 AM, Shawn Heisey  wrote:

> On 10/26/2016 9:00 AM, Prasanna S. Dhakephalkar wrote:
> > Hi, Thanks for reply, I did "q": "cost:[2 TO (2+5000)]"
>
> Solr doesn't support doing math in that way in a query, except with
> dates.  It's invalid syntax for a range query.  Tom's reply was correct,
> but was phrased in a way that makes a potential promise that Solr won't
> deliver.
>
> https://cwiki.apache.org/confluence/display/solr/Working+with+Dates#
> WorkingwithDates-DateMath
>
> There might be a way to somehow use function query to do it, but if it's
> possible, I do not know how to write it.  If it's even possible, the
> syntax probably would not be straightforward.
>
> The way I would do your query is to have my code do the calculation and
> use 25000 directly instead of 2+5000.
>
> Thanks,
> Shawn
>
>


Re: Query formulation help

2016-10-26 Thread Shawn Heisey
On 10/26/2016 9:00 AM, Prasanna S. Dhakephalkar wrote:
> Hi, Thanks for reply, I did "q": "cost:[2 TO (2+5000)]"

Solr doesn't support doing math in that way in a query, except with
dates.  It's invalid syntax for a range query.  Tom's reply was correct,
but was phrased in a way that makes a potential promise that Solr won't
deliver.

https://cwiki.apache.org/confluence/display/solr/Working+with+Dates#WorkingwithDates-DateMath
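
For comparison, the kind of math that does work with dates in a range query
looks like this ("last_modified" is a hypothetical date field):

last_modified:[NOW-1DAY TO NOW]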

There might be a way to somehow use function query to do it, but if it's
possible, I do not know how to write it.  If it's even possible, the
syntax probably would not be straightforward.

The way I would do your query is to have my code do the calculation and
use 25000 directly instead of 2+5000.

Thanks,
Shawn



Re: Solr Hit Highlighting

2016-10-26 Thread Bryan Bende
Hello,

I think part of the problem is the mismatch between what you are
highlighting on and what you are searching on.

Your query has no field specified, so it must be searching a default
field, which looks like it would be _text_, since the copyField was set up
to copy everything to that field.

So you are searching against _text_ and then highlighting on content. These
two fields are also different types (one is text_en_splitting and one is
text_general); I suspect that could cause a difference in finding results
vs. highlighting them.

Some things I would try...
- See what happens if your query is content:(What is lactose intolerance?)
and hl.fl=content, so that you are searching on what you are highlighting
on (a concrete sketch of this is below)
- See what happens if you make content and _text_ the same type of field
(either both text_en_splitting or both text_general)
- You could make _text_ a stored field and set hl.fl=* or hl.fl=_text_,
and that should get you highlighting results from _text_ and allow you to
still use unfielded queries... normally this adds a lot of size to your
index if you are copying lots of fields to _text_, but you said it is only
content so maybe it's fine
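
A hedged sketch of that first suggestion, expressed as plain request
parameters (the field name is the one from your earlier message):

q=content:(what is lactose intolerance)
hl=true
hl.fl=content
hl.snippets=2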

-Bryan


On Mon, Oct 24, 2016 at 11:51 PM, Al Hudson 
wrote:

> Hello All,
>
> I’m new to the world of Solr and hoping someone on this list can help me
> hit highlighting in solr.
>
> I am trying to set up a hit highlighting in Solr and have been seeing some
> strange issues.
>
> My core.xml file has a single tag   which houses all
> the text in a document.
>
> Using the Solr web interface I submit the following query : What is milk?
> – I get back many answers and in addition, just by selecting the hl box and
> entering ‘content’ in the hl.fl box I get hit highlighted portions of text.
>
> However things stop working when I change the query to : What is lactose
> intolerance? I still get valid results but the highlighting section is full
> of empty arrays.
>
> I’ve tried different combinations of commenting out the copyField, making
> content multivalued, but to be honest I’m trying things and hoping some
> configuration will work.
>
> required="false" multiValued="false" />
> 
>  docValues="false" />
>  multiValued="true"/>
>
> 
> 
>
>  multiValued="false" />
>
>  stored="true" multiValued="false" />
>
> Can someone help?
>
> Thank you,
> Al
>
>
> Sent from Mail for
> Windows 10
>
>


Re: Query formulation help

2016-10-26 Thread John Bickerstaff
It looks to me as if it's blowing up on syntax.

I don't have access to the Admin UI right now, but I would suggest
attempting to submit this query via the UI and examining the URL that comes
back.  That frequently solves my more frustrating syntax problems.

I.E. try putting the cost:[...] in the query box on the UI (no quotes
as a first try) and see what happens

Alternatively, try putting cost as the query field (qf) and just the 2
TO (2000) in the query box...

Apologies - I don't have the UI in front of me or I'd try it myself, but
this is the general idea - try to issue the query in the Admin UI and
observe the syntax in the URL that is returned at the top of the page along
with the results.

On Wed, Oct 26, 2016 at 9:00 AM, Prasanna S. Dhakephalkar <
prasann...@merajob.in> wrote:

> Hi,
>
> Thanks for reply, I did
>
> "q": "cost:[2 TO (2+5000)]"
>
> Got
>
>   "error": {
> "msg": "org.apache.solr.search.SyntaxError: Cannot parse 'cost:[2
> to (2+5000)]': Encountered \"  \"(2+5000) \"\" at line
> 1, column 18.\nWas expecting one of:\n\"]\" ...\n\"}\" ...\n",
>   }
>
> I want solr to do the addition.
> I tried
> "q": "cost:[2 TO (2+5000)]"
> "q": "cost:[2 TO sum(2,5000)]"
>
> It has not worked. I am missing something. I do not know what. Maybe it is
> how to invoke functions.
>
> Regards,
>
> Prasanna.
>
>
> -Original Message-
> From: Tom Evans [mailto:tevans...@googlemail.com]
> Sent: Wednesday, October 26, 2016 3:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query formulation help
>
> On Wed, Oct 26, 2016 at 8:03 AM, Prasanna S. Dhakephalkar <
> prasann...@merajob.in> wrote:
> > Hi,
> >
> >
> >
> > May be very rudimentary question
> >
> >
> >
> > There is an integer field in a core: "cost"
> >
> > Need to build a query that will return documents where 0  <
> > "cost"-given_number  <  500
> >
>
> cost:[given_number TO (500+given_number)]
>
>


RE: Query formulation help

2016-10-26 Thread Prasanna S. Dhakephalkar
Hi,

Thanks for reply, I did

"q": "cost:[2 TO (2+5000)]"

Got

  "error": {
"msg": "org.apache.solr.search.SyntaxError: Cannot parse 'cost:[2 to 
(2+5000)]': Encountered \"  \"(2+5000) \"\" at line 1, 
column 18.\nWas expecting one of:\n\"]\" ...\n\"}\" ...\n",
  }

I want Solr to do the addition.
I tried
"q": "cost:[2 TO (2+5000)]"
"q": "cost:[2 TO sum(2,5000)]"

It has not worked. I am missing something. I do not know what. Maybe it is how
to invoke functions.

Regards,

Prasanna.


-Original Message-
From: Tom Evans [mailto:tevans...@googlemail.com] 
Sent: Wednesday, October 26, 2016 3:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Query formulation help

On Wed, Oct 26, 2016 at 8:03 AM, Prasanna S. Dhakephalkar 
 wrote:
> Hi,
>
>
>
> May be very rudimentary question
>
>
>
> There is an integer field in a core: "cost"
>
> Need to build a query that will return documents where 0  < 
> "cost"-given_number  <  500
>

cost:[given_number TO (500+given_number)]



Re: OOM Error

2016-10-26 Thread Susheel Kumar
Hi Toke,

I think your guess is right. We have ingestion running in batches. We
have 6 shards and 6 replicas on 12 VMs, with around 40+ million docs on each
shard.

Thanks everyone for the suggestions/pointers.

Thanks,
Susheel

On Wed, Oct 26, 2016 at 1:52 AM, Toke Eskildsen 
wrote:

> On Tue, 2016-10-25 at 15:04 -0400, Susheel Kumar wrote:
> > Thanks, Toke.  Analyzing GC logs helped to determine that it was a
> > sudden
> > death.
>
> > The peaks in last 20 mins... See   http://tinypic.com/r/n2zonb/9
>
> Peaks yes, but there is a pattern of
>
> 1) Stable memory use
> 2) Temporary doubling of the memory used and a lot of GC
> 3) Increased (relative to last stable period) but stable memory use
> 4) Goto 2
>
> If I were to guess, I would say that you are running ingests in batches,
> which temporarily causes 2 searchers to be open at the same time. That
> is step 2 in the list above. After the batch ingest, the baseline moves up,
> presumably because you have added quite a lot of documents, relative to
> the overall number of documents.
>
>
> The temporary doubling of the baseline is hard to avoid, but I am
> surprised of the amount of heap that you need in the stable periods.
> Just to be clear: This is from a Solr with 8GB of heap handling only 1
> shard of 20GB and you are using DocValues? How many documents do you
> have in such a shard?
>
> - Toke Eskildsen, State and University Library, Denmark
>


Re: Graph Traversal Question

2016-10-26 Thread Joel Bernstein
Grant, can you describe your use case? Currently we can filter on the
relationship using a filter query. So I was wondering what use case would
involve retrieving the relationship. Are you looking to discover what
relationships are available? One of the assumptions I made was that users
would know what relationships they wanted to traverse.



Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Oct 26, 2016 at 9:39 AM, Grant Ingersoll 
wrote:

> The other way to think about it is: I want to put labels on the edges.  In my
> case, the label is the relationship, in your case, the label is the rating
> or author.
>
> On Wed, Oct 26, 2016 at 7:26 AM Yonik Seeley  wrote:
>
> > On Wed, Oct 26, 2016 at 7:13 AM, Grant Ingersoll 
> > wrote:
> > > On Tue, Oct 25, 2016 at 6:26 PM Yonik Seeley 
> wrote:
> > >
> > > In your example below it would be akin to injecting the rating onto
> those
> > > responses as well, not just in the 'fq'.
> >
> > Gotcha... Yeah, I remember wondering how to do that myself.
> >
> > -Yonik
> >
>


Re: Graph Traversal Question

2016-10-26 Thread Grant Ingersoll
The other way to think about it is: I want to put labels on the edges.  In my
case, the label is the relationship, in your case, the label is the rating
or author.

On Wed, Oct 26, 2016 at 7:26 AM Yonik Seeley  wrote:

> On Wed, Oct 26, 2016 at 7:13 AM, Grant Ingersoll 
> wrote:
> > On Tue, Oct 25, 2016 at 6:26 PM Yonik Seeley  wrote:
> >
> > In your example below it would be akin to injecting the rating onto those
> > responses as well, not just in the 'fq'.
>
> Gotcha... Yeah, I remember wondering how to do that myself.
>
> -Yonik
>


Re: JSON Facet Syntax Sorting

2016-10-26 Thread Yonik Seeley
On Wed, Oct 26, 2016 at 3:16 AM, Zheng Lin Edwin Yeo
 wrote:
> Hi,
>
> I'm using Solr 6.2.1.
>
> For the JSON Facet Syntax, are we able to sort on multiple values at one go?
>
> Like for example, if I want to sort by count, followed by the average price.
> Is this the correct way to do it?

Sorting by multiple metrics isn't yet supported.

-Yonik

>  json.facet={
>categories:{
>  type : terms,
>  field : cat,
>  sort : { count : desc},
>  sort : { x : desc},
>  facet:{
>x : "avg(price)",
>y : "sum(price)"
>  }
>}
>  }
>
>
> Regards,
> Edwin
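
For reference, sorting the buckets by a single metric does work; a sketch
using the same example facet, with only one sort key (written here in the
string form of the sort parameter):

 json.facet={
   categories:{
     type : terms,
     field : cat,
     sort : "x desc",
     facet:{
       x : "avg(price)"
     }
   }
 }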


Re: Graph Traversal Question

2016-10-26 Thread Yonik Seeley
On Wed, Oct 26, 2016 at 7:13 AM, Grant Ingersoll  wrote:
> On Tue, Oct 25, 2016 at 6:26 PM Yonik Seeley  wrote:
>
> In your example below it would be akin to injecting the rating onto those
> responses as well, not just in the 'fq'.

Gotcha... Yeah, I remember wondering how to do that myself.

-Yonik


Re: Graph Traversal Question

2016-10-26 Thread Grant Ingersoll
On Tue, Oct 25, 2016 at 6:46 PM Joel Bernstein  wrote:

> Because the edges are unique on the subject->object there isn't currently a
> way to capture the relationship. Aggregations can be rolled up on numeric
> fields and as Yonik mentioned you can track the ancestor.
>
> It would be fairly easy to track the relationship by adding a relationship
> array that would correspond with the ancestors array for example:
>
> {"result-set":{"docs":[
>
> {"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],
> "relationships":["author"],   "level":1},
> {"node":"Maria","collection":"reviews","field":"user_s","
> ancestors":["book2"], "relationships":["author"], "level":1},
> {"EOF":true,"RESPONSE_TIME":22}]}}
>

Right, that is what I am after!


>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Oct 25, 2016 at 6:26 PM, Yonik Seeley  wrote:
>
> > You can get the nodes that they came from by adding trackTraversal=true
> >
> > A cut'n'paste example from my Lucene/Solr Revolution slides:
> >
> > curl $URL -d 'expr=gatherNodes(reviews,
> >search(reviews, q="user_s:Yonik AND rating_i:5",
> >   fl="book_s,user_s,rating_i",sort="user_s asc"),
> >walk="book_s->book_s",
> >gather="user_s",
> >fq="rating_i:[4 TO *] -user_s:Yonik",
> >trackTraversal=true )'
> >
> > {"result-set":{"docs":[
> > {"node":"Haruka","collection":"reviews","field":"user_s","
> > ancestors":["book1"],"level":1},
> > {"node":"Maria","collection":"reviews","field":"user_s","
> > ancestors":["book2"],"level":1},
> > {"EOF":true,"RESPONSE_TIME":22}]}}
> >
> > -Yonik
> >
> >
> > On Tue, Oct 25, 2016 at 5:57 PM, Grant Ingersoll 
> > wrote:
> > > Hi,
> > >
> > > I'm playing around with the new Graph Traversal/GatherNodes
> capabilities
> > in
> > > Solr 6.  I've been indexing Yago facts (
> > > http://www.mpi-inf.mpg.de/departments/databases-and-
> > information-systems/research/yago-naga/yago/downloads/)
> > > which give me triples of something like subject-relationship-object
> > (United
> > > States -> hasCapital -> Washington DC)
> > >
> > > My documents look like:
> > > subject: string
> > > relationship: string
> > > object: string
> > >
> > > I can do a simple gatherNodes like
> > > http://localhost:8983/solr/default/graph?expr=gatherNodes(default,
> > > walk="United_States->subject", gather="object") and get back the
> objects
> > > that relate to the subject.  However, I don't see any way to capture
> what
> > > the relationship is in the response.  IOW, the request above would just
> > > return a node of "Washington DC", but it doesn't tell me the
> relationship
> > > (i.e. I'd like to get Wash DC and hasCapital back somehow).  Is there
> > > anyway to expand the "gather" or otherwise mark up the nodes returned
> > with
> > > additional field attributes or maybe get additional graph info back?
> > >
> > > Thanks,
> > > Grant
> >
>


Re: Graph Traversal Question

2016-10-26 Thread Grant Ingersoll
On Tue, Oct 25, 2016 at 6:26 PM Yonik Seeley  wrote:

> You can get the nodes that they came from by adding trackTraversal=true
>

Yeah, I've tried that.  It's not quite what I want.  That just gets me the
"subject".

What I'm trying to do is more akin to what a triple store does.

I _can_ do things like filter on the relationship, which is a good start,
but I want the relationship and the object together so that I can do
downstream work on it.

In your example below it would be akin to injecting the rating onto those
responses as well, not just in the 'fq'.
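
For completeness, filtering on the relationship (which does work today) can
be written against the triple fields described earlier in this thread,
roughly like this sketch:

gatherNodes(default,
            walk="United_States->subject",
            gather="object",
            fq="relationship:hasCapital",
            trackTraversal=true)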


>
> A cut'n'paste example from my Lucene/Solr Revolution slides:
>
> curl $URL -d 'expr=gatherNodes(reviews,
>search(reviews, q="user_s:Yonik AND rating_i:5",
>   fl="book_s,user_s,rating_i",sort="user_s asc"),
>walk="book_s->book_s",
>gather="user_s",
>fq="rating_i:[4 TO *] -user_s:Yonik",
>trackTraversal=true )'
>
> {"result-set":{"docs":[
>
> {"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],"level":1},
>
> {"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"],"level":1},
> {"EOF":true,"RESPONSE_TIME":22}]}}
>
> -Yonik
>
>
> On Tue, Oct 25, 2016 at 5:57 PM, Grant Ingersoll 
> wrote:
> > Hi,
> >
> > I'm playing around with the new Graph Traversal/GatherNodes capabilities
> in
> > Solr 6.  I've been indexing Yago facts (
> >
> http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/
> )
> > which give me triples of something like subject-relationship-object
> (United
> > States -> hasCapital -> Washington DC)
> >
> > My documents look like:
> > subject: string
> > relationship: string
> > object: string
> >
> > I can do a simple gatherNodes like
> > http://localhost:8983/solr/default/graph?expr=gatherNodes(default,
> > walk="United_States->subject", gather="object") and get back the objects
> > that relate to the subject.  However, I don't see any way to capture what
> > the relationship is in the response.  IOW, the request above would just
> > return a node of "Washington DC", but it doesn't tell me the relationship
> > (i.e. I'd like to get Wash DC and hasCapital back somehow).  Is there
> > anyway to expand the "gather" or otherwise mark up the nodes returned
> with
> > additional field attributes or maybe get additional graph info back?
> >
> > Thanks,
> > Grant
>


Re: Query formulation help

2016-10-26 Thread Tom Evans
On Wed, Oct 26, 2016 at 8:03 AM, Prasanna S. Dhakephalkar
 wrote:
> Hi,
>
>
>
> May be very rudimentary question
>
>
>
> There is an integer field in a core: "cost"
>
> Need to build a query that will return documents where 0  <
> "cost"-given_number  <  500
>

cost:[given_number TO (500+given_number)]


Re: OOM Error

2016-10-26 Thread Tom Evans
On Wed, Oct 26, 2016 at 4:53 AM, Shawn Heisey  wrote:
> On 10/25/2016 8:03 PM, Susheel Kumar wrote:
>> Agree, Pushkar.  I had docValues for sorting / faceting fields from
>> begining (since I setup Solr 6.0).  So good on that side. I am going to
>> analyze the queries to find any potential issue. Two questions which I am
>> puzzling with
>>
>> a) Should the below JVM parameter be included for Prod to get heap dump
>>
>> "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/the/dump"
>
> A heap dump can take a very long time to complete, and there may not be
> enough memory in the machine to start another instance of Solr until the
> first one has finished the heap dump.  Also, I do not know whether Java
> would release the listening port before the heap dump finishes.  If not,
> then a new instance would not be able to start immediately.
>
> If a different heap dump file is created each time, that might lead to
> problems with disk space after repeated dumps.  I don't know how the
> option works.
>
>> b) Currently OOM script just kills the Solr instance. Shouldn't it be
>> enhanced to wait and restart Solr instance
>
> As long as there is a problem causing OOMs, it seems rather pointless to
> start Solr right back up, as another OOM is likely.  The safest thing to
> do is kill Solr (since its operation would be unpredictable after OOM)
> and let the admin sort the problem out.
>

Occasionally our cloud nodes can OOM when particularly complex
faceting is performed. The current OOM management can be exceedingly
annoying; a user will make an overly complex analysis request, bringing
down one server and taking it out of the balancer. The user gets fed up
with the lack of response, so reloads the page, re-submitting the analysis
and bringing down the next server in the cluster.

Lather, rinse, repeat - and then you get to have a meeting to discuss
why we invest so much in HA infrastructure that can be made non-HA by
one user with a complex query. In those meetings it is much harder to
justify not restarting.

Cheers

Tom


JSON Facet Syntax Sorting

2016-10-26 Thread Zheng Lin Edwin Yeo
Hi,

I'm using Solr 6.2.1.

For the JSON Facet Syntax, are we able to sort on multiple values at one go?

Like for example, if I want to sort by count, followed by the average price.
Is this the correct way to do it?

 json.facet={
   categories:{
 type : terms,
 field : cat,
 sort : { count : desc},
 sort : { x : desc},
 facet:{
   x : "avg(price)",
   y : "sum(price)"
 }
   }
 }


Regards,
Edwin


Query formulation help

2016-10-26 Thread Prasanna S. Dhakephalkar
Hi,

 

May be very rudimentary question

 

There is an integer field in a core: "cost"

I need to build a query that will return documents where 0 <
"cost" - given_number < 500

 

How can this be achieved ?

 

Thanks.

 

Prasanna.