Re: How to connect Solr with Impala?

2015-07-24 Thread Upayavira


On Fri, Jul 24, 2015, at 12:53 AM, Rex X wrote:
 Given following Impala query:
 
 SELECT date, SUM(CAST(price AS DOUBLE)) AS price
 FROM table
 WHERE date='2014-01-01' AND store_id IN(1,2,3)
 GROUP BY date;
 
 To work with Solr
 
  1. Will it be more efficient to directly use an equivalent Solr query? Any
 curl command equivalent to the Impala query above? Or
  2. Will it be faster to create a new table based on the query above with
 Impala, and then connect Impala with Solr? Any such Impala-Solr
 connector?
 
 The final goal is to use Kibana to connect Solr for visualization.
 
 Any comments are greatly welcome!

I do not know Impala, so I cannot comment much on that - i.e. whether
querying Solr or Impala would be more efficient, I have no idea.

The above looks like an aggregation with filtering, so I'd suggest you
look at the new JSON Facet API in Solr, which would get you your
aggregations (and the summing).
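
For instance, something along these lines (a sketch only, untested - assuming
a collection named "sales" and that date, price and store_id exist as Solr
fields matching your SQL columns, with date indexed as a plain string):

curl http://localhost:8983/solr/sales/query -d '
{
  "query": "*:*",
  "filter": ["date:\"2014-01-01\"", "store_id:(1 2 3)"],
  "limit": 0,
  "facet": {
    "dates": {
      "type": "terms",
      "field": "date",
      "facet": { "price": "sum(price)" }
    }
  }
}'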

To query against Solr, you need to have pushed your content *to* Solr.
It won't go ask Impala for you. You will have to set up mechanisms for
your content to get into Solr for Solr to be any use.
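
For example (again a sketch, with the same invented field names), pushing
documents in is just an HTTP POST:

curl 'http://localhost:8983/solr/sales/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"1","date":"2014-01-01","store_id":2,"price":9.99}]'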

Lastly, Kibana is a tool that works on top of Elasticsearch. To use
Solr, you should look at Lucidworks Banana in its place.

Upayavira


Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-24 Thread Upayavira
Mikhail,

I've tried this out, but to be honest I can't work out what the score=
parameter is supposed to add.

I assume that if I do {!join fromIndex=other from=other_key to=key
score=max}somefield:(abc dev)

It will calculate the score for each document that has the same key
value, and include that in the score for the main document?

If this is the case, then I should be able to do:

{!join fromIndex=other from=other_key to=key score=max}{!boost
b=my_boost_value_field}*:*

In which case, it'll take the value of my_boost_value_field in the other
core, and include it in the score for my document that has the value of
key?

Upayavira

On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
 I've heard that people use
 https://issues.apache.org/jira/browse/SOLR-6234
 for such purpose - adding scores from fast moving core to the bigger slow
 moving one
 
 On Fri, Jul 10, 2015 at 4:54 PM, Upayavira u...@odoko.co.uk wrote:
 
  All,
 
  I have knocked up what I think could be a really cool function query -
  it allows you to retrieve a value from another core (much like a pseudo
  join) and use that value during scoring (much like an
  ExternalFileField).
 
  Examples:
   * Selective boosting of documents based upon a category based value
   * boost on aggregated popularity values
   * boost on fast moving data on your slow moving index
 
  It *works* but it does so very slowly (on 3m docs, milliseconds without,
  and 24s with it). There are two things that happen a lot:
 
   * locate a document with unique ID value of X
   * retrieve the value of field Y for that doc
 
   It seems to me now that I need to implement a cache that has a string
   value as the key and the (float) field value as the object, and that is
   warmed alongside the existing caches.
 
  Any pointers to examples of how I could do this, or other ways to do the
  conversion from a key value to a float value faster?
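
   One avenue I'm considering, as an untested sketch: Solr supports
   user-defined caches in solrconfig.xml, which get warmed alongside the
   built-in caches if you supply a regenerator (the regenerator here would be
   a custom CacheRegenerator implementation; all names below are invented):

   <cache name="externalValueCache"
          class="solr.LRUCache"
          size="16384"
          initialSize="4096"
          autowarmCount="4096"
          regenerator="com.example.ExternalValueRegenerator"/>

   The function query's ValueSource could then consult it through the
   searcher:

   SolrCache<String, Float> cache = searcher.getCache("externalValueCache");
   Float val = cache.get(key);
   if (val == null) {
       val = lookupValueInOtherCore(key); // the existing slow path, hypothetical helper
       cache.put(key, val);
   }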
 
  NB. I hope to contribute this if I can make it perform.
 
  Thanks!
 
  Upayavira
 
 
 
 
 -- 
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics
 
 http://www.griddynamics.com
 mkhlud...@griddynamics.com


RE: Per-document and per-query analysis

2015-07-24 Thread Markus Jelsma
Hello Alessandro, I have thought about that, but in this case we do not want 
more fields, just to perform some additional normalization filters based on some 
parameter. We need this type of index to be very low latency, and we have many 
varieties. We know from experience that hundreds of fields add considerable 
overhead, visible in the prepare section when debugging.

Markus
 
-Original message-
 From:Alessandro Benedetti benedetti.ale...@gmail.com
 Sent: Thursday 23rd July 2015 18:08
 To: solr-user@lucene.apache.org
 Subject: Re: Per-document and per-query analysis
 
 Markus,
 the first idea that comes to my mind is this:
 
 1) you configure your schema, creating your field types and, if necessary,
 the associated fields
 2) you build an UpdateRequestProcessor that does a conditional check per
 document and creates the proper fields starting from one input field.
 
 In this way you will have the possibility of analysing each field
 differently at indexing and query time.
 As a con you will have more fields, not only one; each field will
 reflect your requirements in terms of analysis.

 
 Do you think this solution could satisfy you?
 Please share feedback and we can discuss the requirements further.
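 
 A rough sketch of such a processor (untested; the "variant" control field
 and "text" input field are invented names, not from your schema):
 
 import java.io.IOException;
 import org.apache.solr.common.SolrInputDocument;
 import org.apache.solr.request.SolrQueryRequest;
 import org.apache.solr.response.SolrQueryResponse;
 import org.apache.solr.update.AddUpdateCommand;
 import org.apache.solr.update.processor.UpdateRequestProcessor;
 import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
 
 public class ConditionalFieldProcessorFactory extends UpdateRequestProcessorFactory {
   @Override
   public UpdateRequestProcessor getInstance(SolrQueryRequest req,
       SolrQueryResponse rsp, UpdateRequestProcessor next) {
     return new UpdateRequestProcessor(next) {
       @Override
       public void processAdd(AddUpdateCommand cmd) throws IOException {
         SolrInputDocument doc = cmd.getSolrInputDocument();
         Object variant = doc.getFieldValue("variant"); // the per-document condition
         Object text = doc.getFieldValue("text");       // the single input field
         if (variant != null && text != null) {
           // route content to a field whose type carries the matching analysis chain
           doc.addField("text_" + variant, text);
         }
         super.processAdd(cmd);
       }
     };
   }
 }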
 
 Cheers
 
 2015-07-23 17:03 GMT+01:00 Markus Jelsma markus.jel...@openindex.io:
 
  Hello - the title says it all. When indexing a document, we need to run
  one or more additional filters depending on the value of a specific field.
  Likewise, we need to run that same filter over the already analyzed tokens
  when querying. This is not going to work if I extend TextField, at all. And
  I am not sure about QParsers either, because it should be QParser agnostic.
 
  I am in need of some hints about which parts of the codebase I should
  extend or replace, if possible at all. For the record, in this case we do
  not want to create additional fields.
 
  Many thanks,
  Markus
 
 
 
 
 -- 
 --
 
 Benedetti Alessandro
 Visiting card - http://about.me/alessandro_benedetti
 Blog - http://alexbenedetti.blogspot.co.uk
 
 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?
 
 William Blake - Songs of Experience -1794 England
 


Re: Nested objects in Solr

2015-07-24 Thread Alexandre Rafalovitch
Actually, Solr has been supporting Nested Objects for a little while:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments

The schema represents a union of all possible fields though, so yes,
some care needs to be taken with names and mappings.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 24 July 2015 at 09:52, Bill Au bill.w...@gmail.com wrote:
 What exactly do you mean by nested objects in Solr?  It would help if you
 gave an example.  The Solr schema is flat as far as I know.

 Bill

 On Fri, Jul 24, 2015 at 9:24 AM, Rajesh rajesh.panneersel...@aspiresys.com
 wrote:

 You can use nested entities like below.

 <document>
   <entity name="OuterEntity" pk="id"
           query="SELECT * FROM User">
     <field column="id" name="id" />
     <field column="name" name="name" />

     <entity name="InnerEntity" child="true"
             query="select * from subject">
     </entity>
   </entity>
 </document>




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Running SOLR 5.2.1 on Embedded Jetty

2015-07-24 Thread Darin Amos
Thanks Shawn,

I actually figured out the issue while I was on my flight back home. It was a 
trivial problem caused by a bad assumption. I have some classpath issues now 
but those are problems I can solve.

Sorry for not including any logs; the behaviour looked like it was simply not 
detecting the war, and I was just curious whether there was something obvious I 
was missing, because documentation on this is hard to find. It started working 
once I exploded the war archive; not doing that earlier was a mistake I probably 
shouldn't have made.

To Upayavira: I think it is a fair question why I would be using Embedded 
Jetty. There is a lot of value and use cases for this, in my case I want to run 
my SOLR instance within the JVM of another java process. I also believe in a 
philosophy that developers should understand how software works, not just how 
to use it; therefore I like to experiment with unconventional approaches when I 
tinker. This doesn’t mean I would take the unconventional approach to 
production.

Thanks!

Darin


 On Jul 23, 2015, at 7:54 PM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 7/23/2015 3:14 PM, Darin Amos wrote:
 I have been trying to run the SOLR war with embedded Jetty and can’t seem to 
 get the config quite right. Is there any known documentation on this or is 
 someone else doing this? I seem to just be setting up a document server at 
 my solr.home directory. The code snippet below seems incomplete to me, but I 
 can’t seem to find what I am missing. 
 
 Thanks!
 
 Darin
 
 Server solrServer = new Server(8983);
 
 WebAppContext solrApp = new WebAppContext();
 solrApp.setContextPath("/");
 solrApp.setWar("solr.war");   // solr.war is sitting in my java.home root for now.
 solrServer.setHandler(solrApp);
 
 solrServer.start();
 solrServer.join();
 
 The only officially supported way to run Solr since 5.0 was released is
 with the scripts included in the bin directory in the download.
 
 https://wiki.apache.org/solr/WhyNoWar https://wiki.apache.org/solr/WhyNoWar
 
 That doesn't mean I won't try to help you, but without logs, there's no
 way to know what is happening.  You may need help from the Jetty
 project, at least to set up logging, and possibly with the rest of it. 
 Here's some info on logging for a standard install ... I have no idea
 how you'd go about this for the embedded version:
 
 http://www.eclipse.org/jetty/documentation/9.2.7.v20150116/configuring-logging.html
  
 http://www.eclipse.org/jetty/documentation/9.2.7.v20150116/configuring-logging.html
 
 For Solr's logging, you need the jars from the server/lib/ext directory
 in the Solr download (for the included jetty server) in a similar
 directory for your application, and the log4j.properties file needs to
 be on the classpath or explicitly described with an appropriate system
 property.
 
 https://wiki.apache.org/solr/SolrLogging 
 https://wiki.apache.org/solr/SolrLogging
 
 In the Solr download, look at the xml file in server/contexts (5.x) for
 some hints about how to properly configure jetty for the webapp.
 
 I would recommend that you use /solr for the context path.  Every
 example you'll run into uses that URL path.  If you want to be
 explicitly different from the default to make an attacker's job harder,
 pick some other string to put after the slash.  I don't have much
 experience with the root context, but I've read somewhere that there
 can be some pitfalls; I do not know what they are.
 
 Thanks,
 Shawn



Re: term frequency with stemming

2015-07-24 Thread Darin Amos
Hi Dale,

I would think the coffee shop is better, I have in-laws visiting at home.

Thanks

Darin


 On Jul 24, 2015, at 12:04 PM, Aki Balogh a...@marketmuse.com wrote:
 
 Hi All,
 
 I'm using TermVectorComponent and stemming (Porter) in order to get term
 frequencies with fuzzy matching. I'm stemming at index and query time.
 
 Is there a way to get term frequency from the index?
 * termfreq doesn't support stemming or wildcards
 * terms component doesn't allow additional filters
 * I could use a copyfield to save a non-stemmed version at indexing, and
 run termfreq on that, but then I don't get any fuzzy matching
 
 Thanks,
 Aki



Scoring, payloads and phrase queries

2015-07-24 Thread Jamie Johnson
Is there a way to consider payloads for scoring in phrase queries like
exists in PayloadTermQuery?
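
(For reference, what I'm after looks, at the Lucene level, roughly like
PayloadNearQuery - a fragment, assuming a hypothetical "text" field indexed
with payloads and a Similarity whose scorePayload() returns something useful:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

SpanQuery[] clauses = new SpanQuery[] {
    new SpanTermQuery(new Term("text", "quick")),
    new SpanTermQuery(new Term("text", "fox"))
};
// slop=0 and inOrder=true make this behave like a phrase query
PayloadNearQuery q = new PayloadNearQuery(clauses, 0, true, new AveragePayloadFunction());

Exposing something like this through Solr would presumably still need a custom
QParser.)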


Re: Natively Execute SOLR Queries within an app server.

2015-07-24 Thread Mikhail Khludnev
SolrDispatchFilter holds the CoreContainer; perhaps you can extend the
filter to publish the cores into JNDI, where the container can be found by
the other application and used to instantiate an EmbeddedSolrServer.
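
A rough, untested sketch of the consuming side (the JNDI name is invented,
this assumes your extended filter has published the CoreContainer there, and
error handling is omitted):

import javax.naming.InitialContext;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.core.CoreContainer;

// look up the container the filter published (hypothetical binding name)
CoreContainer cores = (CoreContainer) new InitialContext()
    .lookup("java:comp/env/solr/CoreContainer");
// wrap the already-loaded core rather than creating a second instance
EmbeddedSolrServer server = new EmbeddedSolrServer(cores, "mycore");
QueryResponse rsp = server.query(new SolrQuery("*:*"));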

On Fri, Jul 24, 2015 at 9:50 PM, Darin Amos dari...@gmail.com wrote:

 Hello,

 I have an application server that is running both the solr.war and a REST
 API war within the same JVM. Is it possible to query the SOLR instance
 natively (non-blocking) without connecting over HTTP? I could use
 EmbeddedSolrServer but I cannot create a second instance of my core.

 If I can get a reference to my existing core instance and wrap it with new
 EmbeddedSolrServer(SolrCore), is this reasonable? However, I cannot see how
 to get a reference to an existing core in a supported way.

 Thanks

 Darin




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Scoring, payloads and phrase queries

2015-07-24 Thread Jamie Johnson
Looks like there is nothing that exists in this regard, and there is no JIRA
ticket that I could find.  Is this something there is any other interest in?
Should a ticket be created for it?

On Fri, Jul 24, 2015 at 10:41 AM, Jamie Johnson jej2...@gmail.com wrote:

 Is there a way to consider payloads for scoring in phrase queries like
 exists in PayloadTermQuery?



Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-24 Thread Erick Erickson
bq:  This started when I turned on docvalues

You _must_ re-index completely when changing something like this, so the notion
of removing the index completely isn't really any extra work.

Here's what I'd do.

1> Just create a new collection with your current schema definition and index
to _that_. That'll guarantee you don't have anything pre-existing that
pollutes your index.
2> Verify that this does what you want. Perhaps use a smaller set of docs
than your entire corpus.
3> Delete your original collection.
4> If you require the same name, you can use collection aliasing to make this
change transparent.

Creating/deleting collections and using collection aliasing are all
through the Collections API.
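
For example (collection and config names here are placeholders):

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll_v2&numShards=2&replicationFactor=2&collection.configName=myconf'
# ... index into mycoll_v2 and verify ...
curl 'http://localhost:8983/solr/admin/collections?action=DELETE&name=mycoll'
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=mycoll&collections=mycoll_v2'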

Best,
Erick

On Fri, Jul 24, 2015 at 10:16 AM, shamik sham...@gmail.com wrote:
 I didn't use the REST API, instead updated the schema manually.

 Can you be specific on removing the data directory content ? I certainly
 don't want to wipe out the index. I've four Solr instances, 2 shards with a
 replica each. Are you suggesting clearing the index and re-indexing from
 scratch ?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219089.html
 Sent from the Solr - User mailing list archive at Nabble.com.


term frequency with stemming

2015-07-24 Thread Aki Balogh
Hi All,

I'm using TermVectorComponent and stemming (Porter) in order to get term
frequencies with fuzzy matching. I'm stemming at index and query time.

Is there a way to get term frequency from the index?
* termfreq doesn't support stemming or wildcards
* terms component doesn't allow additional filters
* I could use a copyfield to save a non-stemmed version at indexing, and
run termfreq on that, but then I don't get any fuzzy matching
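
(A minimal sketch of that copyField route, with illustrative field/type names:

<field name="text" type="text_stemmed" indexed="true" stored="false"/>
<field name="text_exact" type="text_ws" indexed="true" stored="false"/>
<copyField source="text" dest="text_exact"/>

and then termfreq(text_exact,'running') at query time - exact counts, but as
noted above, no fuzzy matching.)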

Thanks,
Aki


Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-24 Thread shamik
Thanks Erick.

Here's the part which I'm not able to understand. I have, for example, Sources A, B,
C and D in the index. Each source contains n number of documents. Now, out of
these, a bunch of documents in A and B are tagged with MediaType. I took the
following steps:

1. Delete all documents tagged with MediaType for A and B. Documents from C
and D are not touched.

2. Re-Index documents which were tagged with MediaType

3. Run Optimization

Still, I keep seeing this exception. Does this mean content from C and D
is impacted even though it is not tagged with MediaType?

I'll follow your recommendation of creating a new collection, do a full
index and delete original collection.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219127.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-24 Thread Shawn Heisey
On 7/24/2015 3:48 PM, shamik wrote:
 Here's the part which I'm not able to understand. I have, for example, Sources A, B,
 C and D in the index. Each source contains n number of documents. Now, out of
 these, a bunch of documents in A and B are tagged with MediaType. I took the
 following steps:
 
 1. Delete all documents tagged with MediaType for A and B. Documents from C
 and D are not touched.
 
 2. Re-Index documents which were tagged with MediaType
 
 3. Run Optimization
 
 Still, I keep seeing this exception. Does this mean content from C and D
 is impacted even though it is not tagged with MediaType?

Do any docs from C and D have that field?  Never mind whether you need
to run your operation on them ... do they have the field?  If so, then
when the facet code (which knows about the schema and the fact that it
has docValues) looks at those segments, they do not have *any* docValues
tagging for that field.  This likely would cause big explosions.  This
lack of docValues tagging probably survives an optimize.

Even if they don't have the field, there may be something about the
Lucene format that the docValues support just doesn't like when the
original docs were indexed without docValues on that field.

Rebuilding the *entire* index is recommended for most schema changes,
especially those like docValues that affect very low-level code
implementations.  Solr hides lots of low-level Lucene details from the
administrator, but makes use of those details to do its job.  Making
sure your config and schema match what was present when the index was
built is sometimes critical.

Thanks,
Shawn



Re: Nested objects in Solr

2015-07-24 Thread Bill Au
What exactly do you mean by nested objects in Solr?  It would help if you
gave an example.  The Solr schema is flat as far as I know.

Bill

On Fri, Jul 24, 2015 at 9:24 AM, Rajesh rajesh.panneersel...@aspiresys.com
wrote:

 You can use nested entities like below.

 <document>
   <entity name="OuterEntity" pk="id"
           query="SELECT * FROM User">
     <field column="id" name="id" />
     <field column="name" name="name" />

     <entity name="InnerEntity" child="true"
             query="select * from subject">
     </entity>
   </entity>
 </document>




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Natively Execute SOLR Queries within an app server.

2015-07-24 Thread Darin Amos
Hello,

I have an application server that is running both the solr.war and a REST API 
war within the same JVM. Is it possible to query the SOLR instance natively 
(non-blocking) without connecting over HTTP? I could use EmbeddedSolrServer but 
I cannot create a second instance of my core.

If I can get a reference to my existing core instance and wrap it with new 
EmbeddedSolrServer(SolrCore), is this reasonable? However, I cannot see how to 
get a reference to an existing core in a supported way.

Thanks

Darin



Re: Natively Execute SOLR Queries within an app server.

2015-07-24 Thread Upayavira


On Fri, Jul 24, 2015, at 07:50 PM, Darin Amos wrote:
 Hello,
 
 I have an application server that is running both the solr.war and a REST
 API war within the same JVM. Is it possible to query the SOLR instance
 natively (non-blocking) without connecting over HTTP? I could use
 EmbeddedSolrServer but I cannot create a second instance of my core.
 
 If I can get a reference to my existing core instance and wrap it with
 new EmbeddedSolrServer(SolrCore), is this reasonable? However, I cannot
 see how to get a reference to an existing core in a supported way.

This is not a supported use-case. Solr is intended to be a stand-alone
application server that happens to be written in Java.

I believe, as of 5.3, there may not be a war file included in Solr, and
gradually, creating a war will get harder, or even become impossible.

If you wanted to run something inside the same VM, write your own
request handler, and make it a part of Solr itself.

See: http://wiki.apache.org/solr/WhyNoWar

Upayavira


[ANN] New Features For Splainer

2015-07-24 Thread Doug Turnbull
First, I wanted to humbly thank the Solr community for their contributions
and feedback for our open source Solr sandbox, Splainer (http://splainer.io
and http://github.com/o19s/splainer). The reception and comments have been
generally positive and helpful, and I very much appreciate being part of
such a great open source community that wants to support each other.

What is Splainer exactly? Why should you care? Nobody likes working with
Solr in the browser's URL bar.  Splainer lets you paste in your Solr URL
and get an instant, easy to understand breakdown of why some documents are
ranked higher than others. It then gives you a friendly interface to tweak
Solr params and experiment with different ideas, rather than trying to parse
through XML and JSON. You needn't worry about security rules to let some
Splainer backend talk to your Solr. The
interaction with Solr is 100% through your browser. If your PC can see
Solr, then so can Splainer running in your browser. If you leave work or
turn off the VPN, then Splainer can't see your Solr. It's all running
locally on your machine through the browser!

I wanted to share that we've been slowly adding features to Splainer. The
two I wanted to highlight are captured in this blog article (
http://opensourceconnections.com/blog/2015/07/24/splainer-a-solr-developers-best-friend/
)

To summarize, they include

- Explain Other
You often wonder why obviously relevant search results don't come back.
Splainer now gives you the ability to compare any document to a secondary
document to see what factors caused one document to rank higher than the other.

- Share Splainerized Solr Results
Once you paste a Solr URL into Splainer, you can then copy the splainer.io
URL to share what you're seeing with a colleague. For example, here's some
information about Virginia state laws about hunting deer from a boat:

http://splainer.io/#?solr=http:%2F%2Fsolr.quepid.com%2Fsolr%2Fstatedecoded%2Fselect%3Fq%3Ddeer%20hunt%20from%20watercraft%0A%26defType%3Dedismax%0A%26qf%3Dcatch_line%20text%0A%26bq%3Dtitle:deer

There are many more smaller features and tweaks, but I wanted to let you know
this was out there. I hope you find Splainer useful. I'm very happy to
field pull requests, ideas, suggestions, or try to figure out why Splainer
isn't working for you!

Cheers!
-- 
*Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
http://opensourceconnections.com, LLC | 240.476.9983
Author: Relevant Search http://manning.com/turnbull
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


Re: cache implementation?

2015-07-24 Thread Mikhail Khludnev
On Fri, Jul 24, 2015 at 1:06 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 7/23/2015 10:55 AM, cbuxbaum wrote:
  Say we have 100 party records.  Then the child SQL will be run
 100
  times (once for each party record).  Isn't there a way to just run the
 child
  SQL on all of the party records at once with a join, using a GROUP BY and
  ORDER BY on the PARTY_ID?  Then the results from that query could easily
 be
  placed in SOLR according to the primary key (party_id).  Is there some
 part
  of the Data Import Handler that operates that way?

 Using a well-crafted SQL JOIN is almost always going to be better for
 dataimport than nested entities.  The heavy lifting is done by the
 database server, using code that's extremely well-optimized for that
 kind of lifting.  Doing what you describe with a parent entity and one
 nested entity (that is not cached) will result in 101 total SQL
 queries.  A million SQL queries, no matter how fast each one is, will be
 slow.

 If you can do everything in a single SQL query with JOIN, then Solr will
 make exactly one SQL query to the server for a full-import.

 For my own dataimport, I use a view that was defined on the mysql server
 by the dbadmin.  The view does all the JOINs we require.

 Solr's dataimport handler doesn't have any intelligence to do the join
 locally.  It would be cool if it did, but somebody would have to write
 the code to teach it how.  Because the DB server itself can already do
 JOINs, and it can do them VERY well, there's really no reason to teach
 it to Solr.


FWIW, DIH now has a join="zipper" attribute
(https://issues.apache.org/jira/browse/SOLR-4799) which can be specified on a
child entity; it enables the classic ETL external merge-join algorithm.
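
As I understand it, the configuration looks roughly like this (a sketch only;
table and column names are invented, and both queries must be sorted by the
join key):

<entity name="parent" query="SELECT id, name FROM parent ORDER BY id">
  <entity name="child" join="zipper"
          query="SELECT parent_id, subject FROM child ORDER BY parent_id"
          where="parent_id=parent.id">
    <field column="subject" name="subject"/>
  </entity>
</entity>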


 Thanks,
 Shawn




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues

2015-07-24 Thread shamik
I didn't use the REST API, instead updated the schema manually.

Can you be specific on removing the data directory content ? I certainly
don't want to wipe out the index. I've four Solr instances, 2 shards with a
replica each. Are you suggesting clearing the index and re-indexing from
scratch ?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219089.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: serious JSON Facet bug

2015-07-24 Thread Nagasharath
Is there a jira logged for this issue?

Sent from my iPhone

 On 23-Jul-2015, at 11:09 pm, Nagasharath sharathrayap...@gmail.com wrote:
 
 I don't have this issue.
 
  I have tried various json facet queries and my filter cache always comes 
  down to the 'minsize' (it never exceeds the configured size) with Solr version 
  5.2.1, and all my queries are nested json facets.
 
 On 23-Jul-2015, at 7:43 pm, Yonik Seeley ysee...@gmail.com wrote:
 
 On Thu, Jul 23, 2015 at 5:00 PM, Harry Yoo hyunat...@gmail.com wrote:
 Is there a way to patch? I am using 5.2.1 and using json facet in 
 production.
 
 First you should see if your queries tickle the bug...
 check the size of the filter cache from the admin screen (under
 plugins, filterCache)
  and see if its current size is larger than the configured maximum.
 
 -Yonik
 
 
 On Jul 16, 2015, at 1:43 PM, Yonik Seeley ysee...@gmail.com wrote:
 
 To anyone using the JSON Facet API in released Solr versions:
 I discovered a serious memory leak while doing performance benchmarks
 (see http://yonik.com/facet_performance/ for some of the early results).
 
 Assuming you're in the evaluation / development phase of your project,
 I'd recommend using a recent developer snapshot for now:
 https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/
 
 The fix (and performance improvements) will also be in the next Solr
 release (5.3) of course.
 
 -Yonik
 


Re: serious JSON Facet bug

2015-07-24 Thread Yonik Seeley
On Fri, Jul 24, 2015 at 8:03 PM, Nagasharath sharathrayap...@gmail.com wrote:
 Is there a jira logged for this issue?

* SOLR-7781: JSON Facet API: Terms facet on string/text fields with
sub-facets caused
  a bug that resulted in filter cache lookup misses as well as the filter cache
  exceeding its configured size. (yonik)

https://issues.apache.org/jira/browse/SOLR-7781

-Yonik


Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField

2015-07-24 Thread Mikhail Khludnev
I think it's intended for

{!join fromIndex=other from=other_key to=key score=max}my_boost_value_field

thus it runs a function query, which matches all docs in the other core with
the value of field 'my_boost_value_field' as the score. Then this score is
passed through the join query for other.other_key=key. Do you see anything
with debugQuery=true?

On Fri, Jul 24, 2015 at 3:41 PM, Upayavira u...@odoko.co.uk wrote:

 Mikhail,

 I've tried this out, but to be honest I can't work out what the score=
 parameter is supposed to add.

 I assume that if I do {!join fromIndex=other from=other_key to=key
 score=max}somefield:(abc dev)

 It will calculate the score for each document that has the same key
 value, and include that in the score for the main document?

 If this is the case, then I should be able to do:

 {!join fromIndex=other from=other_key to=key score=max}{!boost
 b=my_boost_value_field}*:*

  In which case, it'll take the value of my_boost_value_field in the other
 core, and include it in the score for my document that has the value of
 key?

 Upayavira

 On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:
  I've heard that people use
  https://issues.apache.org/jira/browse/SOLR-6234
  for such purpose - adding scores from fast moving core to the bigger slow
  moving one
 
  On Fri, Jul 10, 2015 at 4:54 PM, Upayavira u...@odoko.co.uk wrote:
 
   All,
  
   I have knocked up what I think could be a really cool function query -
   it allows you to retrieve a value from another core (much like a pseudo
   join) and use that value during scoring (much like an
   ExternalFileField).
  
   Examples:
* Selective boosting of documents based upon a category based value
* boost on aggregated popularity values
* boost on fast moving data on your slow moving index
  
   It *works* but it does so very slowly (on 3m docs, milliseconds
 without,
   and 24s with it). There are two things that happen a lot:
  
* locate a document with unique ID value of X
* retrieve the value of field Y for that doc
  
    It seems to me now that I need to implement a cache that has a string
    value as the key and the (float) field value as the object, and that is
    warmed alongside the existing caches.
  
   Any pointers to examples of how I could do this, or other ways to do
 the
   conversion from a key value to a float value faster?
  
   NB. I hope to contribute this if I can make it perform.
  
   Thanks!
  
   Upayavira
  
 
 
 
  --
  Sincerely yours
  Mikhail Khludnev
  Principal Engineer,
  Grid Dynamics
 
  http://www.griddynamics.com
  mkhlud...@griddynamics.com




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Solr Clustering Issue

2015-07-24 Thread Joseph Obernberger
Thank you Upayavira and Shawn.  Yes - the query works correctly using 
the standard select.  I have a workaround where I simply specify the 
fields I want to search in each part of the query and do not specify a 
df.  Just an FYI in case someone else runs into this.


-Joe

On 7/23/2015 10:51 AM, Shawn Heisey wrote:

On 7/23/2015 7:51 AM, Joseph Obernberger wrote:

Hi Upayavira - the URL was:

http://server1:9100/solr/MYCOL1/clustering?q=Collection:(COLLECT1008+OR+COLLECT2587)+AND+(amazon+AND+soap)&wt=json&indent=true&clustering=true&rows=1&df=FULL_DOCUMENT&debugQuery=true


Here is the relevant part of the response - notice that the default
field (FULL_DOCUMENT) is not in the response, and that it appears to
ignore parts of the query string.

snip


 parsedquery_toString:+(Collection:(COLLECT1008 (id:OR^10.0 |
text:or^0.5) (id:COLLECT2587)^10.0 | text:collect2587^0.5) (id:AND^10.0
| text:and^0.5) (id:(amazon^10.0 | text:amazon^0.5) (id:AND^10.0 |
text:and^0.5) (id:soap)^10.0 | text:soap^0.5)),
 QParser:ExtendedDismaxQParser,


According to the last line I quoted above, you are using the edismax
parser.  This parser does not use the df parameter, it uses qf and other
parameters to determine which fields to search.  It appears that you do
have a qf parameter, listing the id field with a boost of 10, and the
text field with a boost of 0.5.

Something else I noticed, not sure if it's relevant:  The presence of
id:OR^10.0 in that parsed query is very strange.  That is something I
would expect from the dismax parser, not edismax.

There have been some bugs with edismax and parentheses, it's conceivable
that there might be more problems:

https://issues.apache.org/jira/browse/SOLR-5435
https://issues.apache.org/jira/browse/SOLR-3377

Sometimes bugs with parentheses are fixed by adding spaces to separate
them from their contents.

Thanks,
Shawn






Re: Nested objects in Solr

2015-07-24 Thread Rajesh
You can use nested entities like below.

<document>
  <entity name="OuterEntity" pk="id"
          query="SELECT * FROM User">
    <field column="id" name="id" />
    <field column="name" name="name" />

    <entity name="InnerEntity" child="true"
            query="select * from subject">
    </entity>
  </entity>
</document>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Running SOLR 5.2.1 on Embedded Jetty

2015-07-24 Thread Upayavira


On Thu, Jul 23, 2015, at 10:14 PM, Darin Amos wrote:
 Hello,
 
 I have been trying to run the SOLR war with embedded Jetty and can’t seem
 to get the config quite right. Is there any known documentation on this
 or is someone else doing this? I seem to just be setting up a document
 server at my solr.home directory. The code snippet below seems incomplete
 to me, but I can’t seem to find what I am missing. 
 
 Thanks!
 
 Darin
 
 Server solrServer = new Server(8983);
 
 WebAppContext solrApp = new WebAppContext();
 solrApp.setContextPath("/");
 solrApp.setWar("solr.war");   // solr.war is sitting in my java.home root for now.
 solrServer.setHandler(solrApp);
 
 solrServer.start();
 solrServer.join();

I suspect the question needed here is: why do you want to use an
embedded Jetty?

If it is for the sake of running tests, I'd suggest you look at the
tests that run within Solr itself.

Upayavira


Re: XSLT with maps

2015-07-24 Thread Sreekant Sreedharan
Yes, I am fairly new to XSLT. I used the Velocity response writer for some
prototypes and found it very intuitive, but the requirement for the app
specifically rules it out and mandates the XSLT approach. I have finally got
it working, thanks to all your help. Here's the final result (with a minor
correction to your final suggestion; again, it is using the attributes here)
for anyone else trying to do something similar.

  <xsl:template match='/'>
    <IMAGES>
      <xsl:apply-templates select="response/result/doc"/>
    </IMAGES>
  </xsl:template>

  <xsl:template match="doc">
    <ID NewID="{str[@name='id']}">
      <xsl:apply-templates select="bool[@name='pr']"/>
    </ID>
  </xsl:template>

  <xsl:template match="bool[.='false']">
    <xsl:attribute name="{@name}">0</xsl:attribute>
  </xsl:template>

  <xsl:template match="bool[.='true']">
    <xsl:attribute name="{@name}">1</xsl:attribute>
  </xsl:template>

Thanks again Upayavira.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/XSLT-with-maps-tp4218518p4219015.html
Sent from the Solr - User mailing list archive at Nabble.com.