Re: How to connect Solr with Impala?

2015-07-24 Thread Upayavira


On Fri, Jul 24, 2015, at 12:53 AM, Rex X wrote:
 Given following Impala query:
 
 SELECT date, SUM(CAST(price AS DOUBLE)) AS price
 FROM table
 WHERE date='2014-01-01' AND store_id IN(1,2,3)
 GROUP BY date;
 
 To work with Solr
 
  1. Will it be more efficient to directly use an equivalent Solr query? Any
 curl command equivalent to the Impala query above? Or
  2. Will it be faster to create a new table based on the query above with
 Impala, and then connect Impala with Solr? Any such Impala-Solr
 connector?
 
 The final goal is to use Kibana to connect Solr for visualization.
 
 Any comments are greatly welcome!

I do not know Impala, so I cannot comment much on that - i.e. whether
querying Solr or Impala would be more efficient, I have no idea.

The above looks like an aggregation with filtering, so I'd suggest you
look at the new JSON Facet API in Solr, which would get you your
aggregations (and the summing).
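
For instance, something along these lines (a sketch only, untested - assuming
a collection named "sales" and that date, price and store_id exist as Solr
fields matching your SQL columns, with date indexed as a plain string):

curl http://localhost:8983/solr/sales/query -d '
{
  "query": "*:*",
  "filter": ["date:\"2014-01-01\"", "store_id:(1 2 3)"],
  "limit": 0,
  "facet": {
    "dates": {
      "type": "terms",
      "field": "date",
      "facet": { "price": "sum(price)" }
    }
  }
}'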

To query against Solr, you need to have pushed your content *to* Solr.
It won't go ask Impala for you. You will have to set up mechanisms for
your content to get into Solr for Solr to be any use.
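
For example (again a sketch, with the same invented field names), pushing
documents in is just an HTTP POST:

curl 'http://localhost:8983/solr/sales/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"1","date":"2014-01-01","store_id":2,"price":9.99}]'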

Lastly, Kibana is a tool that works on top of Elasticsearch. To use
Solr, you should look at Lucidworks Banana in its place.

Upayavira


Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-24 Thread Upayavira
Mikhail,

I've tried this out, but to be honest I can't work out what the score=
parameter is supposed to add.

I assume that if I do {!join fromIndex=other from=other_key to=key
score=max}somefield:(abc dev)

It will calculate the score for each document that has the same key
value, and include that in the score for the main document?

If this is the case, then I should be able to do:

{!join fromIndex=other from=other_key to=key score=max}{!boost
b=my_boost_value_field}*:*

In which case, it'll take the value of my_boost_value_field in the other
core, and include it in the score for my document that has the value of
key?

Upayavira

On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
 I've heard that people use
 https://issues.apache.org/jira/browse/SOLR-6234
 for such purpose - adding scores from fast moving core to the bigger slow
 moving one
 
 On Fri, Jul 10, 2015 at 4:54 PM, Upayavira u...@odoko.co.uk wrote:
 
  All,
 
  I have knocked up what I think could be a really cool function query -
  it allows you to retrieve a value from another core (much like a pseudo
  join) and use that value during scoring (much like an
  ExternalFileField).
 
  Examples:
   * Selective boosting of documents based upon a category based value
   * boost on aggregated popularity values
   * boost on fast moving data on your slow moving index
 
  It *works* but it does so very slowly (on 3m docs, milliseconds without,
  and 24s with it). There are two things that happen a lot:
 
   * locate a document with unique ID value of X
   * retrieve the value of field Y for that doc
 
   It seems to me now that I need to implement a cache that has a string
   value as the key and the (float) field value as the object, and that is
   warmed alongside the existing caches.
 
  Any pointers to examples of how I could do this, or other ways to do the
  conversion from a key value to a float value faster?
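
   One avenue I'm considering, as an untested sketch: Solr supports
   user-defined caches in solrconfig.xml, which get warmed alongside the
   built-in caches if you supply a regenerator (the regenerator here would be
   a custom CacheRegenerator implementation; all names below are invented):

   <cache name="externalValueCache"
          class="solr.LRUCache"
          size="16384"
          initialSize="4096"
          autowarmCount="4096"
          regenerator="com.example.ExternalValueRegenerator"/>

   The function query's ValueSource could then consult it through the
   searcher:

   SolrCache<String, Float> cache = searcher.getCache("externalValueCache");
   Float val = cache.get(key);
   if (val == null) {
       val = lookupValueInOtherCore(key); // the existing slow path, hypothetical helper
       cache.put(key, val);
   }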
 
  NB. I hope to contribute this if I can make it perform.
 
  Thanks!
 
  Upayavira
 
 
 
 
 -- 
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics
 
 http://www.griddynamics.com
 mkhlud...@griddynamics.com


RE: Per-document and per-query analysis

2015-07-24 Thread Markus Jelsma
Hello Alessandro, I have thought about that, but in this case we do not want 
more fields, just to perform some additional normalization filters based on some 
parameter. We need this type of index to be very low latency, and we have many 
varieties. We know from experience that hundreds of fields add considerable 
overhead, visible in the prepare section when debugging.

Markus
 
-Original message-
 From:Alessandro Benedetti benedetti.ale...@gmail.com
 Sent: Thursday 23rd July 2015 18:08
 To: solr-user@lucene.apache.org
 Subject: Re: Per-document and per-query analysis
 
 Markus,
 the first idea that comes to my mind is this:
 
 1) you configure your schema, creating your field types and, if necessary,
 the associated fields
 2) you build an UpdateRequestProcessor that does a conditional check per
 document and creates the proper fields starting from one input field.
 
 In this way you will have the possibility of analysing each field
 differently at indexing and query time.
 As a con you will have more fields, not only one; each field will
 reflect your requirements in terms of analysis.

 
 Do you think this solution could satisfy you?
 Please share feedback and we can discuss the requirements further.
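 
 A rough sketch of such a processor (untested; the "variant" control field
 and "text" input field are invented names, not from your schema):
 
 import java.io.IOException;
 import org.apache.solr.common.SolrInputDocument;
 import org.apache.solr.request.SolrQueryRequest;
 import org.apache.solr.response.SolrQueryResponse;
 import org.apache.solr.update.AddUpdateCommand;
 import org.apache.solr.update.processor.UpdateRequestProcessor;
 import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
 
 public class ConditionalFieldProcessorFactory extends UpdateRequestProcessorFactory {
   @Override
   public UpdateRequestProcessor getInstance(SolrQueryRequest req,
       SolrQueryResponse rsp, UpdateRequestProcessor next) {
     return new UpdateRequestProcessor(next) {
       @Override
       public void processAdd(AddUpdateCommand cmd) throws IOException {
         SolrInputDocument doc = cmd.getSolrInputDocument();
         Object variant = doc.getFieldValue("variant"); // the per-document condition
         Object text = doc.getFieldValue("text");       // the single input field
         if (variant != null && text != null) {
           // route content to a field whose type carries the matching analysis chain
           doc.addField("text_" + variant, text);
         }
         super.processAdd(cmd);
       }
     };
   }
 }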
 
 Cheers
 
 2015-07-23 17:03 GMT+01:00 Markus Jelsma markus.jel...@openindex.io:
 
  Hello - the title says it all. When indexing a document, we need to run
  one or more additional filters depending on the value of a specific field.
  Likewise, we need to run that same filter over the already analyzed tokens
  when querying. This is not going to work if I extend TextField, at all. And
  I am not sure about QParsers either, because it should be QParser agnostic.
 
  I am in need of some hints about which parts of the codebase I should
  extend or replace, if possible at all. For the record, in this case we do
  not want to create additional fields.
 
  Many thanks,
  Markus
 
 
 
 
 -- 
 --
 
 Benedetti Alessandro
 Visiting card - http://about.me/alessandro_benedetti
 Blog - http://alexbenedetti.blogspot.co.uk
 
 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?
 
 William Blake - Songs of Experience -1794 England
 


Re: Nested objects in Solr

2015-07-24 Thread Alexandre Rafalovitch
Actually, Solr has been supporting Nested Objects for a little while:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments

The schema represents a union of all possible fields though, so yes,
some care needs to be taken with names and mappings.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 24 July 2015 at 09:52, Bill Au bill.w...@gmail.com wrote:
 What exactly do you mean by nested objects in Solr?  It would help if you
 gave an example.  The Solr schema is flat as far as I know.

 Bill

 On Fri, Jul 24, 2015 at 9:24 AM, Rajesh rajesh.panneersel...@aspiresys.com
 wrote:

 You can use nested entities like below.

 <document>
   <entity name="OuterEntity" pk="id"
           query="SELECT * FROM User">
     <field column="id" name="id" />
     <field column="name" name="name" />

     <entity name="InnerEntity" child="true"
             query="select * from subject">
     </entity>
   </entity>
 </document>




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Running SOLR 5.2.1 on Embedded Jetty

2015-07-24 Thread Darin Amos
Thanks Shawn,

I actually figured out the issue while I was on my flight back home. It was a 
trivial problem caused by a bad assumption. I have some classpath issues now 
but those are problems I can solve.

Sorry for not including any logs; the behaviour looked like it was simply not 
detecting the war, and I was just curious whether there was something obvious I 
was missing, because documentation on this is hard to find. It started working 
once I exploded the war archive; not doing that earlier was a mistake I probably 
shouldn't have made.

To Upayavira: I think it is a fair question why I would be using Embedded 
Jetty. There is a lot of value and use cases for this, in my case I want to run 
my SOLR instance within the JVM of another java process. I also believe in a 
philosophy that developers should understand how software works, not just how 
to use it; therefore I like to experiment with unconventional approaches when I 
tinker. This doesn’t mean I would take the unconventional approach to 
production.

Thanks!

Darin


 On Jul 23, 2015, at 7:54 PM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 7/23/2015 3:14 PM, Darin Amos wrote:
 I have been trying to run the SOLR war with embedded Jetty and can’t seem to 
 get the config quite right. Is there any known documentation on this or is 
 someone else doing this? I seem to just be setting up a document server at 
 my solr.home directory. The code snippet below seems incomplete to me, but I 
 can’t seem to find what I am missing. 
 
 Thanks!
 
 Darin
 
 Server solrServer = new Server(8983);
 
 WebAppContext solrApp = new WebAppContext();
 solrApp.setContextPath("/");
 solrApp.setWar("solr.war");   // solr.war is sitting in my java.home root for now.
 solrServer.setHandler(solrApp);
 
 solrServer.start();
 solrServer.join();
 
 The only officially supported way to run Solr since 5.0 was released is
 with the scripts included in the bin directory in the download.
 
 https://wiki.apache.org/solr/WhyNoWar https://wiki.apache.org/solr/WhyNoWar
 
 That doesn't mean I won't try to help you, but without logs, there's no
 way to know what is happening.  You may need help from the Jetty
 project, at least to set up logging, and possibly with the rest of it. 
 Here's some info on logging for a standard install ... I have no idea
 how you'd go about this for the embedded version:
 
 http://www.eclipse.org/jetty/documentation/9.2.7.v20150116/configuring-logging.html
  
 http://www.eclipse.org/jetty/documentation/9.2.7.v20150116/configuring-logging.html
 
 For Solr's logging, you need the jars from the server/lib/ext directory
 in the Solr download (for the included jetty server) in a similar
 directory for your application, and the log4j.properties file needs to
 be on the classpath or explicitly described with an appropriate system
 property.
 
 https://wiki.apache.org/solr/SolrLogging 
 https://wiki.apache.org/solr/SolrLogging
 
 In the Solr download, look at the xml file in server/contexts (5.x) for
 some hints about how to properly configure jetty for the webapp.
 
 I would recommend that you use /solr for the context path.  Every
 example you'll run into uses that URL path.  If you want to be
 explicitly different from the default to make an attacker's job harder,
 pick some other string to put after the slash.  I don't have much
 experience with the root context, but I've read somewhere that there
 can be some pitfalls; I do not know what they are.
 
 Thanks,
 Shawn



Re: term frequency with stemming

2015-07-24 Thread Darin Amos
Hi Dale,

I would think the coffee shop is better, I have in-laws visiting at home.

Thanks

Darin


 On Jul 24, 2015, at 12:04 PM, Aki Balogh a...@marketmuse.com wrote:
 
 Hi All,
 
 I'm using TermVectorComponent and stemming (Porter) in order to get term
 frequencies with fuzzy matching. I'm stemming at index and query time.
 
 Is there a way to get term frequency from the index?
 * termfreq doesn't support stemming or wildcards
 * terms component doesn't allow additional filters
 * I could use a copyfield to save a non-stemmed version at indexing, and
 run termfreq on that, but then I don't get any fuzzy matching
 
 Thanks,
 Aki



Scoring, payloads and phrase queries

2015-07-24 Thread Jamie Johnson
Is there a way to consider payloads for scoring in phrase queries like
exists in PayloadTermQuery?
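
(For reference, what I'm after looks, at the Lucene level, roughly like
PayloadNearQuery - a fragment, assuming a hypothetical "text" field indexed
with payloads and a Similarity whose scorePayload() returns something useful:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

SpanQuery[] clauses = new SpanQuery[] {
    new SpanTermQuery(new Term("text", "quick")),
    new SpanTermQuery(new Term("text", "fox"))
};
// slop=0 and inOrder=true make this behave like a phrase query
PayloadNearQuery q = new PayloadNearQuery(clauses, 0, true, new AveragePayloadFunction());

Exposing something like this through Solr would presumably still need a custom
QParser.)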


Re: Natively Execute SOLR Queries within an app server.

2015-07-24 Thread Mikhail Khludnev
SolrDispatchFilter holds the CoreContainer; perhaps you can extend the
filter to publish the cores into JNDI, where the container can be found by
the other application and used to instantiate an EmbeddedSolrServer.
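
A rough, untested sketch of the consuming side (the JNDI name is invented,
this assumes your extended filter has published the CoreContainer there, and
error handling is omitted):

import javax.naming.InitialContext;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.core.CoreContainer;

// look up the container the filter published (hypothetical binding name)
CoreContainer cores = (CoreContainer) new InitialContext()
    .lookup("java:comp/env/solr/CoreContainer");
// wrap the already-loaded core rather than creating a second instance
EmbeddedSolrServer server = new EmbeddedSolrServer(cores, "mycore");
QueryResponse rsp = server.query(new SolrQuery("*:*"));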

On Fri, Jul 24, 2015 at 9:50 PM, Darin Amos dari...@gmail.com wrote:

 Hello,

 I have an application server that is running both the solr.war and a REST
 API war within the same JVM. Is it possible to query the SOLR instance
 natively (non-blocking) without connecting over HTTP? I could use
 EmbeddedSolrServer but I cannot create a second instance of my core.

 If I can get a reference to my existing core instance and wrap it with new
 EmbeddedSolrServer(SolrCore), is this reasonable? However, I cannot see how
 to get a reference to an existing core in a supported way.

 Thanks

 Darin




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Scoring, payloads and phrase queries

2015-07-24 Thread Jamie Johnson
Looks like there is nothing that exists in this regard, and there is no JIRA
ticket that I could find.  Is this something there is any other interest in?
Should a ticket be created for it?

On Fri, Jul 24, 2015 at 10:41 AM, Jamie Johnson jej2...@gmail.com wrote:

 Is there a way to consider payloads for scoring in phrase queries like
 exists in PayloadTermQuery?



Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-24 Thread Erick Erickson
bq:  This started when I turned on docvalues

You _must_ re-index completely when changing something like this, so the notion
of removing the index completely isn't really any extra work.

Here's what I'd do.

1> Just create a new collection with your current schema definition and index
to _that_. That'll guarantee you don't have anything pre-existing that
pollutes your index.
2> Verify that this does what you want. Perhaps use a smaller set of docs
than your entire corpus.
3> Delete your original collection.
4> If you require the same name, you can use collection aliasing to make this
change transparent.

Creating/deleting collections and using collection aliasing are all
through the Collections API.
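
For example (collection and config names here are placeholders):

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll_v2&numShards=2&replicationFactor=2&collection.configName=myconf'
# ... index into mycoll_v2 and verify ...
curl 'http://localhost:8983/solr/admin/collections?action=DELETE&name=mycoll'
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=mycoll&collections=mycoll_v2'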

Best,
Erick

On Fri, Jul 24, 2015 at 10:16 AM, shamik sham...@gmail.com wrote:
 I didn't use the REST API, instead updated the schema manually.

 Can you be specific on removing the data directory content ? I certainly
 don't want to wipe out the index. I've four Solr instances, 2 shards with a
 replica each. Are you suggesting clearing the index and re-indexing from
 scratch ?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219089.html
 Sent from the Solr - User mailing list archive at Nabble.com.


term frequency with stemming

2015-07-24 Thread Aki Balogh
Hi All,

I'm using TermVectorComponent and stemming (Porter) in order to get term
frequencies with fuzzy matching. I'm stemming at index and query time.

Is there a way to get term frequency from the index?
* termfreq doesn't support stemming or wildcards
* terms component doesn't allow additional filters
* I could use a copyfield to save a non-stemmed version at indexing, and
run termfreq on that, but then I don't get any fuzzy matching
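
(A minimal sketch of that copyField route, with illustrative field/type names:

<field name="text" type="text_stemmed" indexed="true" stored="false"/>
<field name="text_exact" type="text_ws" indexed="true" stored="false"/>
<copyField source="text" dest="text_exact"/>

and then termfreq(text_exact,'running') at query time - exact counts, but as
noted above, no fuzzy matching.)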

Thanks,
Aki


Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-24 Thread shamik
Thanks Erick.

Here's the part which I'm not able to understand. I have, for example, Sources A, B,
C and D in the index. Each source contains n number of documents. Now, out of
these, a bunch of documents in A and B are tagged with MediaType. I took the
following steps:

1. Delete all documents tagged with MediaType for A and B. Documents from C
and D are not touched.

2. Re-Index documents which were tagged with MediaType

3. Run Optimization

Still, I keep seeing this exception. Does this mean content from C and D
is impacted even though it is not tagged with MediaType?

I'll follow your recommendation of creating a new collection, do a full
index and delete original collection.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219127.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-24 Thread Shawn Heisey
On 7/24/2015 3:48 PM, shamik wrote:
 Here's the part which I'm not able to understand. I have, for example, Sources A, B,
 C and D in the index. Each source contains n number of documents. Now, out of
 these, a bunch of documents in A and B are tagged with MediaType. I took the
 following steps:
 
 1. Delete all documents tagged with MediaType for A and B. Documents from C
 and D are not touched.
 
 2. Re-Index documents which were tagged with MediaType
 
 3. Run Optimization
 
 Still, I keep seeing this exception. Does this mean content from C and D
 is impacted even though it is not tagged with MediaType?

Do any docs from C and D have that field?  Never mind whether you need
to run your operation on them ... do they have the field?  If so, then
when the facet code (which knows about the schema and the fact that it
has docValues) looks at those segments, they do not have *any* docValues
tagging for that field.  This likely would cause big explosions.  This
lack of docValues tagging probably survives an optimize.

Even if they don't have the field, there may be something about the
Lucene format that the docValues support just doesn't like when the
original docs were indexed without docValues on that field.

Rebuilding the *entire* index is recommended for most schema changes,
especially those like docValues that affect very low-level code
implementations.  Solr hides lots of low-level Lucene details from the
administrator, but makes use of those details to do its job.  Making
sure your config and schema match what was present when the index was
built is sometimes critical.

Thanks,
Shawn



Re: Nested objects in Solr

2015-07-24 Thread Bill Au
What exactly do you mean by nested objects in Solr?  It would help if you
gave an example.  The Solr schema is flat as far as I know.

Bill

On Fri, Jul 24, 2015 at 9:24 AM, Rajesh rajesh.panneersel...@aspiresys.com
wrote:

 You can use nested entities like below.

 <document>
   <entity name="OuterEntity" pk="id"
           query="SELECT * FROM User">
     <field column="id" name="id" />
     <field column="name" name="name" />

     <entity name="InnerEntity" child="true"
             query="select * from subject">
     </entity>
   </entity>
 </document>




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Natively Execute SOLR Queries within an app server.

2015-07-24 Thread Darin Amos
Hello,

I have an application server that is running both the solr.war and a REST API 
war within the same JVM. Is it possible to query the SOLR instance natively 
(non-blocking) without connecting over HTTP? I could use EmbeddedSolrServer but 
I cannot create a second instance of my core.

If I can get a reference to my existing core instance and wrap it with new 
EmbeddedSolrServer(SolrCore), is this reasonable? However, I cannot see how to 
get a reference to an existing core in a supported way.

Thanks

Darin



Re: Natively Execute SOLR Queries within an app server.

2015-07-24 Thread Upayavira


On Fri, Jul 24, 2015, at 07:50 PM, Darin Amos wrote:
 Hello,
 
 I have an application server that is running both the solr.war and a REST
 API war within the same JVM. Is it possible to query the SOLR instance
 natively (non-blocking) without connecting over HTTP? I could use
 EmbeddedSolrServer but I cannot create a second instance of my core.
 
 If I can get a reference to my existing core instance and wrap it with
 new EmbeddedSolrServer(SolrCore), is this reasonable? However, I cannot
 see how to get a reference to an existing core in a supported way.

This is not a supported use-case. Solr is intended to be a stand-alone
application server that happens to be written in Java.

I believe, as of 5.3, there may not be a war file included in Solr, and
gradually, creating a war will get harder, or even become impossible.

If you wanted to run something inside the same VM, write your own
request handler, and make it a part of Solr itself.

See: http://wiki.apache.org/solr/WhyNoWar

Upayavira


[ANN] New Features For Splainer

2015-07-24 Thread Doug Turnbull
First, I wanted to humbly thank the Solr community for their contributions
and feedback for our open source Solr sandbox, Splainer (http://splainer.io
and http://github.com/o19s/splainer). The reception and comments have been
generally positive and helpful, and I very much appreciate being part of
such a great open source community that wants to support each other.

What is Splainer exactly? Why should you care? Nobody likes working with
Solr in the browser's URL bar.  Splainer lets you paste in your Solr URL
and get an instant, easy to understand breakdown of why some documents are
ranked higher than others. It then gives you a friendly interface to tweak
Solr params and experiment with different ideas, rather than trying to parse
through XML and JSON. You needn't worry about security rules to let some
Splainer backend talk to your Solr. The
interaction with Solr is 100% through your browser. If your PC can see
Solr, then so can Splainer running in your browser. If you leave work or
turn off the VPN, then Splainer can't see your Solr. It's all running
locally on your machine through the browser!

I wanted to share that we've been slowly adding features to Splainer. The
two I wanted to highlight are captured in this blog article (
http://opensourceconnections.com/blog/2015/07/24/splainer-a-solr-developers-best-friend/
)

To summarize, they include

- Explain Other
You often wonder why obviously relevant search results don't come back.
Splainer now gives you the ability to compare any document to a secondary
document to see what factors caused one document to rank higher than the other.

- Share Splainerized Solr Results
Once you paste a Solr URL into Splainer, you can then copy the splainer.io
URL to share what you're seeing with a colleague. For example, here's some
information about Virginia state laws about hunting deer from a boat:

http://splainer.io/#?solr=http:%2F%2Fsolr.quepid.com%2Fsolr%2Fstatedecoded%2Fselect%3Fq%3Ddeer%20hunt%20from%20watercraft%0A%26defType%3Dedismax%0A%26qf%3Dcatch_line%20text%0A%26bq%3Dtitle:deer

There are many more smaller features and tweaks, but I wanted to let you know
this was out there. I hope you find Splainer useful. I'm very happy to
field pull requests, ideas, suggestions, or try to figure out why Splainer
isn't working for you!

Cheers!
-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
http://opensourceconnections.com, LLC | 240.476.9983
Author: Relevant Search http://manning.com/turnbull
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


Re: cache implementation?

2015-07-24 Thread Mikhail Khludnev
On Fri, Jul 24, 2015 at 1:06 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 7/23/2015 10:55 AM, cbuxbaum wrote:
  Say we have 100 party records.  Then the child SQL will be run
 100
  times (once for each party record).  Isn't there a way to just run the
 child
  SQL on all of the party records at once with a join, using a GROUP BY and
  ORDER BY on the PARTY_ID?  Then the results from that query could easily
 be
  placed in SOLR according to the primary key (party_id).  Is there some
 part
  of the Data Import Handler that operates that way?

 Using a well-crafted SQL JOIN is almost always going to be better for
 dataimport than nested entities.  The heavy lifting is done by the
 database server, using code that's extremely well-optimized for that
 kind of lifting.  Doing what you describe with a parent entity and one
 nested entity (that is not cached) will result in 101 total SQL
 queries.  A million SQL queries, no matter how fast each one is, will be
 slow.

 If you can do everything in a single SQL query with JOIN, then Solr will
 make exactly one SQL query to the server for a full-import.

 For my own dataimport, I use a view that was defined on the mysql server
 by the dbadmin.  The view does all the JOINs we require.

 Solr's dataimport handler doesn't have any intelligence to do the join
 locally.  It would be cool if it did, but somebody would have to write
 the code to teach it how.  Because the DB server itself can already do
 JOINs, and it can do them VERY well, there's really no reason to teach
 it to Solr.


FWIW, DIH now has a join="zipper" attribute
(https://issues.apache.org/jira/browse/SOLR-4799) which can be specified on a
child entity; it enables the classic ETL external merge-join algorithm.
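
As I understand it, the configuration looks roughly like this (a sketch only;
table and column names are invented, and both queries must be sorted by the
join key):

<entity name="parent" query="SELECT id, name FROM parent ORDER BY id">
  <entity name="child" join="zipper"
          query="SELECT parent_id, subject FROM child ORDER BY parent_id"
          where="parent_id=parent.id">
    <field column="subject" name="subject"/>
  </entity>
</entity>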


 Thanks,
 Shawn




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-24 Thread shamik
I didn't use the REST API, instead updated the schema manually.

Can you be specific on removing the data directory content ? I certainly
don't want to wipe out the index. I've four Solr instances, 2 shards with a
replica each. Are you suggesting clearing the index and re-indexing from
scratch ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219089.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: serious JSON Facet bug

2015-07-24 Thread Nagasharath
Is there a jira logged for this issue?

Sent from my iPhone

 On 23-Jul-2015, at 11:09 pm, Nagasharath sharathrayap...@gmail.com wrote:
 
 I don't have this issue.
 
  I have tried various json facet queries and my filter cache always comes 
  down to the 'minsize' (it never exceeds the configured size) with Solr version 
  5.2.1, and all my queries are nested json facets.
 
 On 23-Jul-2015, at 7:43 pm, Yonik Seeley ysee...@gmail.com wrote:
 
 On Thu, Jul 23, 2015 at 5:00 PM, Harry Yoo hyunat...@gmail.com wrote:
 Is there a way to patch? I am using 5.2.1 and using json facet in 
 production.
 
 First you should see if your queries tickle the bug...
 check the size of the filter cache from the admin screen (under
 plugins, filterCache)
  and see if its current size is larger than the configured maximum.
 
 -Yonik
 
 
 On Jul 16, 2015, at 1:43 PM, Yonik Seeley ysee...@gmail.com wrote:
 
 To anyone using the JSON Facet API in released Solr versions:
 I discovered a serious memory leak while doing performance benchmarks
 (see http://yonik.com/facet_performance/ for some of the early results).
 
 Assuming you're in the evaluation / development phase of your project,
 I'd recommend using a recent developer snapshot for now:
 https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
 
 The fix (and performance improvements) will also be in the next Solr
 release (5.3) of course.
 
 -Yonik
 


Re: serious JSON Facet bug

2015-07-24 Thread Yonik Seeley
On Fri, Jul 24, 2015 at 8:03 PM, Nagasharath sharathrayap...@gmail.com wrote:
 Is there a jira logged for this issue?

* SOLR-7781: JSON Facet API: Terms facet on string/text fields with
sub-facets caused
  a bug that resulted in filter cache lookup misses as well as the filter cache
  exceeding its configured size. (yonik)

https://issues.apache.org/jira/browse/SOLR-7781

-Yonik


Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-24 Thread Mikhail Khludnev
I think it's intended for

{!join fromIndex=other from=other_key to=key score=max}my_boost_value_field

thus it runs a function query, which matches all docs in the other core with
the value of field 'my_boost_value_field' as the score. Then this score is
passed through the join query for other.other_key=key. Do you see anything
with debugQuery=true?

On Fri, Jul 24, 2015 at 3:41 PM, Upayavira u...@odoko.co.uk wrote:

 Mikhail,

 I've tried this out, but to be honest I can't work out what the score=
 parameter is supposed to add.

 I assume that if I do {!join fromIndex=other from=other_key to=key
 score=max}somefield:(abc dev)

 It will calculate the score for each document that has the same key
 value, and include that in the score for the main document?

 If this is the case, then I should be able to do:

 {!join fromIndex=other from=other_key to=key score=max}{!boost
 b=my_boost_value_field}*:*

  In which case, it'll take the value of my_boost_value_field in the other
 core, and include it in the score for my document that has the value of
 key?

 Upayavira

 On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
  I've heard that people use
  https://issues.apache.org/jira/browse/SOLR-6234
  for such purpose - adding scores from fast moving core to the bigger slow
  moving one
 
  On Fri, Jul 10, 2015 at 4:54 PM, Upayavira u...@odoko.co.uk wrote:
 
   All,
  
   I have knocked up what I think could be a really cool function query -
   it allows you to retrieve a value from another core (much like a pseudo
   join) and use that value during scoring (much like an
   ExternalFileField).
  
   Examples:
* Selective boosting of documents based upon a category based value
* boost on aggregated popularity values
* boost on fast moving data on your slow moving index
  
   It *works* but it does so very slowly (on 3m docs, milliseconds
 without,
   and 24s with it). There are two things that happen a lot:
  
* locate a document with unique ID value of X
* retrieve the value of field Y for that doc
  
    It seems to me now that I need to implement a cache that has a string
    value as the key and the (float) field value as the object, and that is
    warmed alongside the existing caches.
  
   Any pointers to examples of how I could do this, or other ways to do
 the
   conversion from a key value to a float value faster?
  
   NB. I hope to contribute this if I can make it perform.
  
   Thanks!
  
   Upayavira
  
 
 
 
  --
  Sincerely yours
  Mikhail Khludnev
  Principal Engineer,
  Grid Dynamics
 
  http://www.griddynamics.com
  mkhlud...@griddynamics.com




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Solr Clustering Issue

2015-07-24 Thread Joseph Obernberger
Thank you Upayavira and Shawn.  Yes - the query works correctly using 
the standard select.  I have a workaround where I simply specify the 
fields I want to search in each part of the query and do not specify a 
df.  Just an FYI in case someone else runs into this.


-Joe

On 7/23/2015 10:51 AM, Shawn Heisey wrote:

On 7/23/2015 7:51 AM, Joseph Obernberger wrote:

Hi Upayavira - the URL was:

http://server1:9100/solr/MYCOL1/clustering?q=Collection:(COLLECT1008+OR+COLLECT2587)+AND+(amazon+AND+soap)&wt=json&indent=true&clustering=true&rows=1&df=FULL_DOCUMENT&debugQuery=true


Here is the relevant part of the response - notice that the default
field (FULL_DOCUMENT) is not in the response, and that it appears to
ignore parts of the query string.

snip


 parsedquery_toString:+(Collection:(COLLECT1008 (id:OR^10.0 |
text:or^0.5) (id:COLLECT2587)^10.0 | text:collect2587^0.5) (id:AND^10.0
| text:and^0.5) (id:(amazon^10.0 | text:amazon^0.5) (id:AND^10.0 |
text:and^0.5) (id:soap)^10.0 | text:soap^0.5)),
 QParser:ExtendedDismaxQParser,


According to the last line I quoted above, you are using the edismax
parser.  This parser does not use the df parameter, it uses qf and other
parameters to determine which fields to search.  It appears that you do
have a qf parameter, listing the id field with a boost of 10, and the
text field with a boost of 0.5.

Something else I noticed, not sure if it's relevant:  The presence of
id:OR^10.0 in that parsed query is very strange.  That is something I
would expect from the dismax parser, not edismax.

There have been some bugs with edismax and parentheses, it's conceivable
that there might be more problems:

https://issues.apache.org/jira/browse/SOLR-5435
https://issues.apache.org/jira/browse/SOLR-3377

Sometimes bugs with parentheses are fixed by adding spaces to separate
them from their contents.

Thanks,
Shawn






Re: Nested objects in Solr

2015-07-24 Thread Rajesh
You can use nested entities like below.

<document>
  <entity name="OuterEntity" pk="id"
          query="SELECT * FROM User">
    <field column="id" name="id" />
    <field column="name" name="name" />

    <entity name="InnerEntity" child="true"
            query="select * from subject">
    </entity>
  </entity>
</document>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Running SOLR 5.2.1 on Embedded Jetty

2015-07-24 Thread Upayavira


On Thu, Jul 23, 2015, at 10:14 PM, Darin Amos wrote:
 Hello,
 
 I have been trying to run the SOLR war with embedded Jetty and can’t seem
 to get the config quite right. Is there any known documentation on this
 or is someone else doing this? I seem to just be setting up a document
 server at my solr.home directory. The code snippet below seems incomplete
 to me, but I can’t seem to find what I am missing. 
 
 Thanks!
 
 Darin
 
 Server solrServer = new Server(8983);
 
 WebAppContext solrApp = new WebAppContext();
 solrApp.setContextPath("/");
 solrApp.setWar("solr.war");   // solr.war is sitting in my java.home root for now.
 solrServer.setHandler(solrApp);
 
 solrServer.start();
 solrServer.join();

I suspect the question needed here is: why do you want to use an
embedded Jetty?

If it is for the sake of running tests, I'd suggest you look at the
tests that run within Solr itself.

Upayavira


Re: XSLT with maps

2015-07-24 Thread Sreekant Sreedharan
Yes, I am fairly new to XSLT. I used the Velocity response writer for some
prototypes and found it very intuitive, but the requirement for the app
specifically rules it out and mandates the XSLT approach. I have finally got
it working, thanks to all your help. Here's the final result (with a minor
correction to your final suggestion; again, it is using the attributes here)
for anyone else trying to do something similar.

  <xsl:template match='/'>
    <IMAGES>
      <xsl:apply-templates select="response/result/doc"/>
    </IMAGES>
  </xsl:template>

  <xsl:template match="doc">
    <ID NewID="{str[@name='id']}">
      <xsl:apply-templates select="bool[@name='pr']"/>
    </ID>
  </xsl:template>

  <xsl:template match="bool[.='false']">
    <xsl:attribute name="{@name}">0</xsl:attribute>
  </xsl:template>

  <xsl:template match="bool[.='true']">
    <xsl:attribute name="{@name}">1</xsl:attribute>
  </xsl:template>

Thanks again Upayavira.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/XSLT-with-maps-tp4218518p4219015.html
Sent from the Solr - User mailing list archive at Nabble.com.