Re: How to connect Solr with Impala?
On Fri, Jul 24, 2015, at 12:53 AM, Rex X wrote:

Given the following Impala query:

SELECT date, SUM(CAST(price AS DOUBLE)) AS price FROM table WHERE date='2014-01-01' AND store_id IN (1,2,3) GROUP BY date;

To work with Solr:

1. Will it be more efficient to directly use the equivalent Solr query? Any curl command equivalent to the Impala query above?
2. Or will it be faster to create a new table based on the query above with Impala, and then connect Impala with Solr? Any such Impala-Solr connector?

The final goal is to use Kibana to connect to Solr for visualization. Any comments are greatly welcome!

I do not know Impala, so I cannot comment much on that - i.e. would querying Solr or Impala be more efficient? No idea. The above looks like an aggregation with filtering, so I'd suggest you look at the new JSON Facet API in Solr, which would get your aggregations (and summing).

To query against Solr, you need to have pushed your content *to* Solr. It won't go ask Impala for you. You will have to set up mechanisms for your content to get into Solr for Solr to be any use.

Lastly, Kibana is a tool that works on top of Elasticsearch. To use Solr, you should look at Lucidworks Banana in its place.

Upayavira
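For reference, a roughly equivalent Solr request using the JSON Facet API could be assembled like the sketch below. This is illustrative Python using only the standard library; the host, collection name ("sales"), and field names are assumptions - adjust them to your schema.

```python
from urllib.parse import urlencode

# Hypothetical collection and field names; the filters mirror the
# Impala WHERE clause, and json.facet mirrors SUM(price).
params = {
    "q": "*:*",
    "rows": 0,                                  # we only want the aggregate
    "fq": ['date:"2014-01-01"', "store_id:(1 OR 2 OR 3)"],
    "json.facet": '{"total_price": "sum(price)"}',
}
query_string = urlencode(params, doseq=True)    # doseq repeats fq per filter
url = "http://localhost:8983/solr/sales/select?" + query_string
print(url)
```

The resulting URL can be passed straight to curl; Solr returns the sum under `facets.total_price` in the response.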
Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField
Mikhail, I've tried this out, but to be honest I can't work out what the score= parameter is supposed to add. I assume that if I do

{!join fromIndex=other from=other_key to=key score=max}somefield:(abc dev)

it will calculate the score for each document that has the same key value, and include that in the score for the main document? If this is the case, then I should be able to do:

{!join fromIndex=other from=other_key to=key score=max}{!boost b=my_boost_value_field}*:*

in which case it'll take the value of my_boost_value_field in the other core, and include it in the score for my document that has the value of key?

Upayavira

On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:

I've heard that people use https://issues.apache.org/jira/browse/SOLR-6234 for such a purpose - adding scores from a fast-moving core to the bigger, slow-moving one.

On Fri, Jul 10, 2015 at 4:54 PM, Upayavira u...@odoko.co.uk wrote:

All, I have knocked up what I think could be a really cool function query - it allows you to retrieve a value from another core (much like a pseudo join) and use that value during scoring (much like an ExternalFileField). Examples:

* Selective boosting of documents based upon a category-based value
* Boost on aggregated popularity values
* Boost on fast-moving data on your slow-moving index

It *works*, but it does so very slowly (on 3m docs, milliseconds without it, and 24s with it). There are two things that happen a lot:

* Locate a document with unique ID value of X
* Retrieve the value of field Y for that doc

What it seems to me now is that I need to implement a cache that will have a string value as the key and the (float) field value as the object, that is warmed alongside existing caches. Any pointers to examples of how I could do this, or other ways to do the conversion from a key value to a float value faster?

NB. I hope to contribute this if I can make it perform. Thanks!
Upayavira

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
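The cache Upayavira describes (string key mapped to a float field value, warmed alongside Solr's existing caches) can be sketched outside Solr as a simple loading cache. This is illustrative Python, not Solr's actual SolrCache API; the loader function stands in for "locate the doc with unique ID X and read field Y":

```python
class LoadingFloatCache:
    """Minimal key -> float cache with explicit warming (a sketch)."""

    def __init__(self, loader):
        self.loader = loader   # function: key -> float (e.g. a doc lookup)
        self.store = {}

    def get(self, key):
        if key not in self.store:
            # Cache miss: do the slow per-document lookup exactly once.
            self.store[key] = self.loader(key)
        return self.store[key]

    def warm(self, keys):
        # Pre-populate before queries arrive, like Solr cache warming.
        for k in keys:
            self.get(k)

# Toy loader standing in for a lookup in the other core.
values = {"doc1": 2.5, "doc2": 0.75}
cache = LoadingFloatCache(lambda k: values.get(k, 0.0))
cache.warm(values)
```

Per-query scoring then hits `cache.get(key)` instead of walking the index for every document, which is the whole point of warming it up front.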
RE: Per-document and per-query analysis
Hello Alessandro, I have thought about that, but in this case we do not want more fields, just to perform some additional normalization filters based on some parameter. We need this type of index to be very low latency, and we have many varieties. We know from experience that hundreds of fields add considerable overhead, visible in the prepare section when debugging.

Markus

-----Original message-----
From: Alessandro Benedetti benedetti.ale...@gmail.com
Sent: Thursday 23rd July 2015 18:08
To: solr-user@lucene.apache.org
Subject: Re: Per-document and per-query analysis

Markus, the first idea that comes to mind is this:

1) You configure your schema, creating your field types and, if necessary, the associated fields.
2) You build an UpdateRequestProcessor that does a conditional check per document and creates the proper fields starting from one input field.

In this way you will have the possibility of automatically analysing each field differently at indexing/query time. As a con, you will have more fields, not only one; each field will reflect your requirements in terms of analysis.

Do you think this solution can satisfy you? Please share feedback and we can discuss the requirements further. Cheers

2015-07-23 17:03 GMT+01:00 Markus Jelsma markus.jel...@openindex.io:

Hello - the title says it all. When indexing a document, we need to run one or more additional filters depending on the value of a specific field. Likewise, we need to run that same filter over the already analyzed tokens when querying. This is not going to work if I extend TextField, at all. And I am not sure about QParsers either, because it should be QParser agnostic. I am in need of some hints about which parts of the codebase I should extend or replace, if possible at all.
For the record, in this case we do not want to create additional fields. Many thanks, Markus

--
Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
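Alessandro's UpdateRequestProcessor idea - route one input field into per-condition fields so each gets its own analysis chain - can be sketched as plain logic. This is illustrative Python, not Solr's URP API; the field names (`kind`, `body`) and suffix map are made up:

```python
def route_fields(doc, suffix_by_kind):
    """Copy the input field into a per-condition field so each copy
    can get its own index/query-time analysis chain (a sketch of the
    conditional check an UpdateRequestProcessor would do per document)."""
    out = dict(doc)
    suffix = suffix_by_kind.get(doc.get("kind"), "default")
    out["body_" + suffix] = doc["body"]   # hypothetical derived field
    return out

doc = {"id": "1", "kind": "de", "body": "some german text"}
routed = route_fields(doc, {"de": "german", "en": "english"})
```

The trade-off is exactly the one discussed in the thread: the routing is simple and analysis stays declarative in the schema, but the schema grows one field per condition.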
Re: Nested objects in Solr
Actually, Solr has been supporting nested objects for a little while: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments

The schema represents a union of all possible fields though, so yes, some care needs to be taken with names and mappings.

Regards, Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 24 July 2015 at 09:52, Bill Au bill.w...@gmail.com wrote:

What exactly do you mean by nested objects in Solr? It would help if you gave an example. The Solr schema is flat as far as I know. Bill

On Fri, Jul 24, 2015 at 9:24 AM, Rajesh rajesh.panneersel...@aspiresys.com wrote:

You can use nested entities like below.

<document>
  <entity name="OuterEntity" pk="id" query="SELECT * FROM User">
    <field column="id" name="id" />
    <field column="name" name="name" />
    <entity name="InnerEntity" child="true" query="select * from subject">
    </entity>
  </entity>
</document>

--
View this message in context: http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
Sent from the Solr - User mailing list archive at Nabble.com.
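As a sketch of the nested-document shape the cwiki page describes: a parent document carries its children under the `_childDocuments_` key in Solr's JSON index format. The field names below (`type_s`, `title_t`, `stars_i`) are made-up examples; how you post the payload to Solr's update handler is up to your client:

```python
import json

# A parent "book" with two nested "review" children. The type_s
# discriminator is a convention, not a requirement - but with a flat
# union schema you need some way to tell parents and children apart.
parent = {
    "id": "book-1",
    "type_s": "book",
    "title_t": "Some Title",
    "_childDocuments_": [
        {"id": "book-1-review-1", "type_s": "review", "stars_i": 4},
        {"id": "book-1-review-2", "type_s": "review", "stars_i": 5},
    ],
}
payload = json.dumps([parent])   # update handler accepts a list of docs
```

Queried back with the block join parsers, the parent and children remain associated even though the schema itself stays flat.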
Re: Running SOLR 5.2.1 on Embedded Jetty
Thanks Shawn, I actually figured out the issue while I was on my flight back home. It was a trivial problem caused by a bad assumption. I have some classpath issues now, but those are problems I can solve. Sorry for not including any logs; the behaviour looked like it was simply not detecting the war, and I was just curious whether there was something obvious I was missing, because it is hard to find documentation on. It started working when I exploded the war archive, a mistake I probably shouldn't have made.

To Upayavira: I think it is a fair question why I would be using embedded Jetty. There are a lot of valuable use cases for this; in my case I want to run my SOLR instance within the JVM of another Java process. I also believe in a philosophy that developers should understand how software works, not just how to use it; therefore I like to experiment with unconventional approaches when I tinker. This doesn't mean I would take the unconventional approach to production.

Thanks! Darin

On Jul 23, 2015, at 7:54 PM, Shawn Heisey apa...@elyograg.org wrote:

On 7/23/2015 3:14 PM, Darin Amos wrote:

I have been trying to run the SOLR war with embedded Jetty and can't seem to get the config quite right. Is there any known documentation on this, or is someone else doing this? I seem to just be setting up a document server at my solr.home directory. The code snippet below seems incomplete to me, but I can't seem to find what I am missing. Thanks! Darin

Server solrServer = new Server(8983);
WebAppContext solrApp = new WebAppContext();
solrApp.setContextPath("/");
solrApp.setWar("solr.war"); // solr.war is sitting in my java.home root for now.
solrServer.setHandler(solrApp);
solrServer.start();
solrServer.join();

The only officially supported way to run Solr since 5.0 was released is with the scripts included in the bin directory in the download.
https://wiki.apache.org/solr/WhyNoWar

That doesn't mean I won't try to help you, but without logs, there's no way to know what is happening. You may need help from the Jetty project, at least to set up logging, and possibly with the rest of it. Here's some info on logging for a standard install ... I have no idea how you'd go about this for the embedded version:

http://www.eclipse.org/jetty/documentation/9.2.7.v20150116/configuring-logging.html

For Solr's logging, you need the jars from the server/lib/ext directory in the Solr download (for the included Jetty server) in a similar directory for your application, and the log4j.properties file needs to be on the classpath or explicitly described with an appropriate system property.

https://wiki.apache.org/solr/SolrLogging

In the Solr download, look at the xml file in server/contexts (5.x) for some hints about how to properly configure Jetty for the webapp.

I would recommend that you use /solr for the context path. Every example you'll run into uses that URL path. If you want to be explicitly different from the default to make an attacker's job harder, then pick some other string to put after the slash. I don't have much experience with the root context, but I've read somewhere that there can be some pitfalls. I do not know what they are.

Thanks, Shawn
Re: term frequency with stemming
Hi Dale, I would think the coffee shop is better, I have in-laws visiting at home. Thanks Darin

On Jul 24, 2015, at 12:04 PM, Aki Balogh a...@marketmuse.com wrote:

Hi All, I'm using TermVectorComponent and stemming (Porter) in order to get term frequencies with fuzzy matching. I'm stemming at index and query time. Is there a way to get term frequency from the index?

* termfreq doesn't support stemming or wildcards
* terms component doesn't allow additional filters
* I could use a copyField to save a non-stemmed version at indexing, and run termfreq on that, but then I don't get any fuzzy matching

Thanks, Aki
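The copyField workaround Aki mentions amounts to keeping two parallel fields: a raw one for exact term frequency and a stemmed one for fuzzy matching. The toy sketch below illustrates that split with two counts maps standing in for the two Solr fields; the suffix-stripping "stemmer" is a crude stand-in for Porter, not a real implementation:

```python
def toy_stem(word):
    # Crude stand-in for a Porter stemmer: strip a common suffix.
    for suf in ("ing", "ed", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def index(tokens):
    """Count each token twice: raw (exact termfreq) and stemmed
    (fuzzy matching) - the role a copyField pair plays in Solr."""
    raw, stemmed = {}, {}
    for t in tokens:
        raw[t] = raw.get(t, 0) + 1
        s = toy_stem(t)
        stemmed[s] = stemmed.get(s, 0) + 1
    return raw, stemmed

raw, stemmed = index(["jump", "jumped", "jumps", "walk"])
```

Exact frequency comes from the raw map ("jumped" appears once) while the stemmed map collapses the variants ("jump" counts three) - which is why, with only one field, you get one behavior or the other but not both.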
Scoring, payloads and phrase queries
Is there a way to consider payloads for scoring in phrase queries like exists in PayloadTermQuery?
Re: Natively Execute SOLR Queries within an app server.
SolrDispatchFilter holds the CoreContainer; perhaps you can extend the filter to publish cores into JNDI, where a core can be looked up by the other application and used for instantiating EmbeddedSolrServer.

On Fri, Jul 24, 2015 at 9:50 PM, Darin Amos dari...@gmail.com wrote:

Hello, I have an application server that is running both the solr.war and a REST API war within the same JVM. Is it possible to query the SOLR instance natively (non-blocking) without connecting over HTTP? I could use EmbeddedSolrServer, but I cannot create a second instance of my core. If I can get a reference to my existing core instance and wrap it with new EmbeddedSolrServer(SolrCore), is this reasonable? However, I cannot see how to get a reference to an existing core in a supported way. Thanks Darin

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: Scoring, payloads and phrase queries
It looks like nothing exists in this regard, and there is no JIRA ticket that I could find. Is this something that there is any other interest in? Should a ticket be created for it?

On Fri, Jul 24, 2015 at 10:41 AM, Jamie Johnson jej2...@gmail.com wrote:

Is there a way to consider payloads for scoring in phrase queries, like exists in PayloadTermQuery?
Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues
bq: This started when I turned on docvalues

You _must_ re-index completely when changing something like this, so the notion of removing the index completely isn't really any extra work. Here's what I'd do:

1. Create a new collection with your current schema definition and index to _that_. That'll guarantee you don't have anything pre-existing that pollutes your index.
2. Verify that this does what you want. Perhaps use a smaller set of docs than your entire corpus.
3. Delete your original collection.
4. If you require the same name, you can use collection aliasing to make this change transparent.

Creating/deleting collections and using collection aliasing are all done through the Collections API.

Best, Erick

On Fri, Jul 24, 2015 at 10:16 AM, shamik sham...@gmail.com wrote:

I didn't use the REST API, instead updated the schema manually. Can you be specific on removing the data directory content? I certainly don't want to wipe out the index. I've four Solr instances, 2 shards with a replica each. Are you suggesting clearing the index and re-indexing from scratch?

--
View this message in context: http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219089.html
Sent from the Solr - User mailing list archive at Nabble.com.
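Erick's steps map onto three Collections API calls (CREATE, DELETE, CREATEALIAS). As a sketch, the request URLs can be assembled like this; the host, collection names, and shard/replica counts are assumptions, so substitute your own:

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/admin/collections"  # assumed host

def collections_api(action, params):
    """Build a Collections API request URL (illustrative helper)."""
    return BASE + "?" + urlencode({"action": action, **params})

# Step 1: create the replacement collection and reindex into it.
create_url = collections_api("CREATE",
    {"name": "mycoll_v2", "numShards": 2, "replicationFactor": 2})

# Step 3: drop the original once the new one checks out.
delete_url = collections_api("DELETE", {"name": "mycoll_v1"})

# Step 4: alias the old name to the new collection so clients
# keep working unchanged.
alias_url = collections_api("CREATEALIAS",
    {"name": "mycoll", "collections": "mycoll_v2"})
```

Each URL is then issued with curl or any HTTP client; CREATEALIAS can also be re-pointed later, which is what makes the swap transparent.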
term frequency with stemming
Hi All, I'm using TermVectorComponent and stemming (Porter) in order to get term frequencies with fuzzy matching. I'm stemming at index and query time. Is there a way to get term frequency from the index?

* termfreq doesn't support stemming or wildcards
* terms component doesn't allow additional filters
* I could use a copyField to save a non-stemmed version at indexing, and run termfreq on that, but then I don't get any fuzzy matching

Thanks, Aki
Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues
Thanks Erick. Here's the part which I'm not able to understand. I have, for example, sources A, B, C and D in the index. Each source contains n number of documents. Now, out of these, a bunch of documents in A and B are tagged with MediaType. I took the following steps:

1. Delete all documents tagged with MediaType for A and B. Documents from C and D are not touched.
2. Re-index documents which were tagged with MediaType.
3. Run optimization.

Still, I keep seeing this exception. Does this mean content from C and D is impacted even though they are not tagged with MediaType? I'll follow your recommendation of creating a new collection, doing a full index and deleting the original collection.

--
View this message in context: http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219127.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues
On 7/24/2015 3:48 PM, shamik wrote:

Here's the part which I'm not able to understand. I've for e.g. Source A, B, C and D in index. Each source contains n number of documents. Now, out of these, a bunch of documents in A and B are tagged with MediaType. I took the following steps: 1. Delete all documents tagged with MediaType for A and B. Documents from C and D are not touched. 2. Re-Index documents which were tagged with MediaType 3. Run Optimization. Still, I keep seeing this exception. Does this mean, content from C and D are impacted even though they are not tagged with MediaType?

Do any docs from C and D have that field? Never mind whether you need to run your operation on them ... do they have the field? If so, then when the facet code (which knows about the schema and the fact that it has docValues) looks at those segments, they do not have *any* docValues tagging for that field. This likely would cause big explosions. This lack of docValues tagging probably survives an optimize.

Even if they don't have the field, there may be something about the Lucene format that the docValues support just doesn't like when the original docs were indexed without docValues on that field.

Rebuilding the *entire* index is recommended for most schema changes, especially those like docValues that affect very low-level code implementations. Solr hides lots of low-level Lucene details from the administrator, but makes use of those details to do its job. Making sure your config and schema match what was present when the index was built is sometimes critical.

Thanks, Shawn
Re: Nested objects in Solr
What exactly do you mean by nested objects in Solr? It would help if you gave an example. The Solr schema is flat as far as I know. Bill

On Fri, Jul 24, 2015 at 9:24 AM, Rajesh rajesh.panneersel...@aspiresys.com wrote:

You can use nested entities like below.

<document>
  <entity name="OuterEntity" pk="id" query="SELECT * FROM User">
    <field column="id" name="id" />
    <field column="name" name="name" />
    <entity name="InnerEntity" child="true" query="select * from subject">
    </entity>
  </entity>
</document>

--
View this message in context: http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
Sent from the Solr - User mailing list archive at Nabble.com.
Natively Execute SOLR Queries within an app server.
Hello, I have an application server that is running both the solr.war and a REST API war within the same JVM. Is it possible to query the SOLR instance natively (non-blocking) without connecting over HTTP? I could use EmbeddedSolrServer but I cannot create a second instance of my core. If I can get a reference to my existing core instance and wrap it with new EmbeddedSolrServer(SolrCore), is this reasonable? However, I cannot see how to get a reference to an existing core in a supported way. Thanks Darin
Re: Natively Execute SOLR Queries within an app server.
On Fri, Jul 24, 2015, at 07:50 PM, Darin Amos wrote:

Hello, I have an application server that is running both the solr.war and a REST API war within the same JVM. Is it possible to query the SOLR instance natively (non-blocking) without connecting over HTTP? I could use EmbeddedSolrServer but I cannot create a second instance of my core. If I can get a reference to my existing core instance and wrap it with new EmbeddedSolrServer(SolrCore), is this reasonable? However, I cannot see how to get a reference to an existing core in a supported way.

This is not a supported use-case. Solr is intended to be a stand-alone application server that happens to be written in Java. I believe, as of 5.3, there may not be a war file included in Solr, and gradually, creating a war will get harder, or even become impossible.

If you wanted to run something inside the same VM, write your own request handler, and make it a part of Solr itself.

See: http://wiki.apache.org/solr/WhyNoWar

Upayavira
[ANN] New Features For Splainer
First, I wanted to humbly thank the Solr community for their contributions and feedback on our open source Solr sandbox, Splainer (http://splainer.io and http://github.com/o19s/splainer). The reception and comments have been generally positive and helpful, and I very much appreciate being part of such a great open source community that wants to support each other.

What is Splainer exactly? Why should you care? Nobody likes working with Solr in the browser's URL bar. Splainer lets you paste in your Solr URL and get an instant, easy to understand breakdown of why some documents are ranked higher than others. It then gives you a friendly interface to tweak Solr params and experiment with different ideas, friendlier than trying to parse through XML and JSON. You needn't worry about security rules to let some Splainer backend talk to your Solr: the interaction with Solr is 100% through your browser. If your PC can see Solr, then so can Splainer running in your browser. If you leave work or turn off the VPN, then Splainer can't see your Solr. It's all running locally on your machine through the browser!

I wanted to share that we've been slowly adding features to Splainer. The two I wanted to highlight are captured in this blog article (http://opensourceconnections.com/blog/2015/07/24/splainer-a-solr-developers-best-friend/). To summarize, they include:

- Explain Other: You often wonder why obviously relevant search results don't come back. Splainer now gives you the ability to compare any document to a secondary document to see what factors caused one document to rank higher than another.
- Share Splainerized Solr Results: Once you paste a Solr URL into Splainer, you can then copy the splainer.io URL to share what you're seeing with a colleague.
For example, here's some information about Virginia state laws about hunting deer from a boat: http://splainer.io/#?solr=http:%2F%2Fsolr.quepid.com%2Fsolr%2Fstatedecoded%2Fselect%3Fq%3Ddeer%20hunt%20from%20watercraft%0A%26defType%3Dedismax%0A%26qf%3Dcatch_line%20text%0A%26bq%3Dtitle:deer

There are many more smaller features and tweaks, but I wanted to let you know this was out there. I hope you find Splainer useful. I'm very happy to field pull requests, ideas, and suggestions, or to try to figure out why Splainer isn't working for you! Cheers!

--
*Doug Turnbull* | Search Relevance Consultant | OpenSource Connections http://opensourceconnections.com, LLC | 240.476.9983
Author: Relevant Search http://manning.com/turnbull
Re: cache implementation?
On Fri, Jul 24, 2015 at 1:06 AM, Shawn Heisey apa...@elyograg.org wrote:

On 7/23/2015 10:55 AM, cbuxbaum wrote:

Say we have 100 party records. Then the child SQL will be run 100 times (once for each party record). Isn't there a way to just run the child SQL on all of the party records at once with a join, using a GROUP BY and ORDER BY on the PARTY_ID? Then the results from that query could easily be placed in SOLR according to the primary key (party_id). Is there some part of the Data Import Handler that operates that way?

Using a well-crafted SQL JOIN is almost always going to be better for dataimport than nested entities. The heavy lifting is done by the database server, using code that's extremely well-optimized for that kind of lifting. Doing what you describe with a parent entity and one nested entity (that is not cached) will result in 101 total SQL queries. A million SQL queries, no matter how fast each one is, will be slow. If you can do everything in a single SQL query with JOIN, then Solr will make exactly one SQL query to the server for a full-import.

For my own dataimport, I use a view that was defined on the MySQL server by the dbadmin. The view does all the JOINs we require.

Solr's dataimport handler doesn't have any intelligence to do the join locally. It would be cool if it did, but somebody would have to write the code to teach it how. Because the DB server itself can already do JOINs, and it can do them VERY well, there's really no reason to teach it to Solr.

FWIW, DIH now has a join="zipper" attribute (https://issues.apache.org/jira/browse/SOLR-4799) which can be specified on a child entity; it enables the classic ETL external merge-join algorithm.

Thanks, Shawn

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
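The "zipper" join Mikhail mentions is an external merge join: both entity result sets arrive sorted by the join key and are consumed in lockstep, one pass, no per-parent query. A minimal sketch of the idea (not DIH's actual implementation; key and record names are made up):

```python
def zipper_join(parents, children, key="id", fk="parent_id"):
    """Merge-join two row lists that are pre-sorted by their join keys,
    attaching each child row to its parent in a single pass."""
    out = []
    ci = 0
    for p in parents:                    # parents sorted by key
        row = dict(p, children=[])
        while ci < len(children) and children[ci][fk] < p[key]:
            ci += 1                      # skip orphan children with smaller keys
        while ci < len(children) and children[ci][fk] == p[key]:
            row["children"].append(children[ci])
            ci += 1
        out.append(row)
    return out

parents = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
children = [{"parent_id": 1, "v": "x"},
            {"parent_id": 2, "v": "y"},
            {"parent_id": 2, "v": "z"}]
joined = zipper_join(parents, children)
```

Against 100 parent records this is still only two SQL queries (one per entity, each with ORDER BY the join key) rather than 101, which is what makes it viable when the database-side JOIN isn't an option.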
Re: Unexpected docvalues type error using result grouping - Use UninvertingReader or index with docvalues
I didn't use the REST API, instead updated the schema manually. Can you be specific on removing the data directory content? I certainly don't want to wipe out the index. I've four Solr instances, 2 shards with a replica each. Are you suggesting clearing the index and re-indexing from scratch?

--
View this message in context: http://lucene.472066.n3.nabble.com/Unexpected-docvalues-type-error-using-result-grouping-Use-UninvertingReader-or-index-with-docvalues-tp4218939p4219089.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: serious JSON Facet bug
Is there a JIRA logged for this issue?

Sent from my iPhone

On 23-Jul-2015, at 11:09 pm, Nagasharath sharathrayap...@gmail.com wrote:

I don't have this issue. I have tried various JSON facet queries and my filter cache always comes down to the 'minSize' (it never exceeds the configured size) with Solr version 5.2.1, and all my queries are nested JSON facet queries.

On 23-Jul-2015, at 7:43 pm, Yonik Seeley ysee...@gmail.com wrote:

On Thu, Jul 23, 2015 at 5:00 PM, Harry Yoo hyunat...@gmail.com wrote:

Is there a way to patch? I am using 5.2.1 and using JSON facets in production.

First you should see if your queries tickle the bug... check the size of the filter cache from the admin screen (under plugins, filterCache) and see if its current size is larger than the configured maximum.

-Yonik

On Jul 16, 2015, at 1:43 PM, Yonik Seeley ysee...@gmail.com wrote:

To anyone using the JSON Facet API in released Solr versions: I discovered a serious memory leak while doing performance benchmarks (see http://yonik.com/facet_performance/ for some of the early results). Assuming you're in the evaluation/development phase of your project, I'd recommend using a recent developer snapshot for now: https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/

The fix (and performance improvements) will also be in the next Solr release (5.3) of course.

-Yonik
Re: serious JSON Facet bug
On Fri, Jul 24, 2015 at 8:03 PM, Nagasharath sharathrayap...@gmail.com wrote:

Is there a jira logged for this issue?

* SOLR-7781: JSON Facet API: Terms facet on string/text fields with sub-facets caused a bug that resulted in filter cache lookup misses as well as the filter cache exceeding its configured size. (yonik)

https://issues.apache.org/jira/browse/SOLR-7781

-Yonik
Re: Custom Solr caches in a FunctionQuery that emulates the ExternalFileField
I think it's intended for

{!join fromIndex=other from=other_key to=key score=max}my_boost_value_field

Thus it runs a function query, which matches all docs in the other core with the value of the field 'my_boost_value_field' as the score. Then this score is passed through the join query where other.other_key=key. Do you see something on debugQuery=true?

On Fri, Jul 24, 2015 at 3:41 PM, Upayavira u...@odoko.co.uk wrote:

Mikhail, I've tried this out, but to be honest I can't work out what the score= parameter is supposed to add. I assume that if I do

{!join fromIndex=other from=other_key to=key score=max}somefield:(abc dev)

it will calculate the score for each document that has the same key value, and include that in the score for the main document? If this is the case, then I should be able to do:

{!join fromIndex=other from=other_key to=key score=max}{!boost b=my_boost_value_field}*:*

in which case it'll take the value of my_boost_value_field in the other core, and include it in the score for my document that has the value of key?

Upayavira

On Fri, Jul 10, 2015, at 04:15 PM, Mikhail Khludnev wrote:

I've heard that people use https://issues.apache.org/jira/browse/SOLR-6234 for such a purpose - adding scores from a fast-moving core to the bigger, slow-moving one.

On Fri, Jul 10, 2015 at 4:54 PM, Upayavira u...@odoko.co.uk wrote:

All, I have knocked up what I think could be a really cool function query - it allows you to retrieve a value from another core (much like a pseudo join) and use that value during scoring (much like an ExternalFileField). Examples:

* Selective boosting of documents based upon a category-based value
* Boost on aggregated popularity values
* Boost on fast-moving data on your slow-moving index

It *works*, but it does so very slowly (on 3m docs, milliseconds without it, and 24s with it). There are two things that happen a lot:

* Locate a document with unique ID value of X
* Retrieve the value of field Y for that doc

What it seems to me now is that I need to implement a cache that will have a string value as the key and the (float) field value as the object, that is warmed alongside existing caches. Any pointers to examples of how I could do this, or other ways to do the conversion from a key value to a float value faster?

NB. I hope to contribute this if I can make it perform. Thanks!

Upayavira

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: Solr Clustering Issue
Thank you Upayavira and Shawn. Yes - the query works correctly using the standard select. I have a workaround where I simply specify the fields I want to search in each part of the query and do not specify a df. Just an FYI in case someone else runs into this. -Joe

On 7/23/2015 10:51 AM, Shawn Heisey wrote:

On 7/23/2015 7:51 AM, Joseph Obernberger wrote:

Hi Upayavira - the URL was: http://server1:9100/solr/MYCOL1/clustering?q=Collection:(COLLECT1008+OR+COLLECT2587)+AND+(amazon+AND+soap)&wt=json&indent=true&clustering=true&rows=1&df=FULL_DOCUMENT&debugQuery=true

Here is the relevant part of the response - notice that the default field (FULL_DOCUMENT) is not in the response, and that it appears to ignore parts of the query string.

snip

parsedquery_toString:+(Collection:(COLLECT1008 (id:OR^10.0 | text:or^0.5) (id:COLLECT2587)^10.0 | text:collect2587^0.5) (id:AND^10.0 | text:and^0.5) (id:(amazon^10.0 | text:amazon^0.5) (id:AND^10.0 | text:and^0.5) (id:soap)^10.0 | text:soap^0.5)), QParser:ExtendedDismaxQParser,

According to the last line I quoted above, you are using the edismax parser. This parser does not use the df parameter; it uses qf and other parameters to determine which fields to search. It appears that you do have a qf parameter, listing the id field with a boost of 10, and the text field with a boost of 0.5.

Something else I noticed, not sure if it's relevant: the presence of id:OR^10.0 in that parsed query is very strange. That is something I would expect from the dismax parser, not edismax. There have been some bugs with edismax and parentheses; it's conceivable that there might be more problems:

https://issues.apache.org/jira/browse/SOLR-5435
https://issues.apache.org/jira/browse/SOLR-3377

Sometimes bugs with parentheses are fixed by adding spaces to separate them from their contents.

Thanks, Shawn
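Shawn's tip about separating parentheses from their contents can be applied mechanically before sending the query. A toy sketch (purely illustrative preprocessing, nothing Solr-specific):

```python
import re

def space_parens(q):
    """Insert spaces just inside parentheses so grouped clauses are
    clearly separated - the workaround for the old edismax
    parenthesis parsing bugs mentioned in the thread."""
    q = re.sub(r"\((?=\S)", "( ", q)       # "(foo" -> "( foo"
    return re.sub(r"(?<=\S)\)", " )", q)   # "foo)" -> "foo )"

fixed = space_parens("Collection:(COLLECT1008 OR COLLECT2587) AND (amazon AND soap)")
```

The rewritten string parses to the same grouped query, but each parenthesis now stands alone as a token in the q parameter.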
Re: Nested objects in Solr
You can use nested entities like below.

<document>
  <entity name="OuterEntity" pk="id" query="SELECT * FROM User">
    <field column="id" name="id" />
    <field column="name" name="name" />
    <entity name="InnerEntity" child="true" query="select * from subject">
    </entity>
  </entity>
</document>

--
View this message in context: http://lucene.472066.n3.nabble.com/Nested-objects-in-Solr-tp4213212p4219039.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Running SOLR 5.2.1 on Embedded Jetty
On Thu, Jul 23, 2015, at 10:14 PM, Darin Amos wrote:

Hello, I have been trying to run the SOLR war with embedded Jetty and can't seem to get the config quite right. Is there any known documentation on this or is someone else doing this? I seem to just be setting up a document server at my solr.home directory. The code snippet below seems incomplete to me, but I can't seem to find what I am missing. Thanks! Darin

Server solrServer = new Server(8983);
WebAppContext solrApp = new WebAppContext();
solrApp.setContextPath("/");
solrApp.setWar("solr.war"); // solr.war is sitting in my java.home root for now.
solrServer.setHandler(solrApp);
solrServer.start();
solrServer.join();

I suspect the question needed here is why you want to use an embedded Jetty. If it is for the sake of running tests, I'd suggest you look at the tests that run within Solr itself.

Upayavira
Re: XSLT with maps
Yes, I am fairly new to XSLT. I used the Velocity response writer for some prototypes and found it very intuitive, but the requirement for the app specifically rules it out and mandates the XSLT approach. I have finally got it working, thanks to all your help. Here's what I got (a minor correction on your final suggestion; again, it is using the attributes here). Here's the final result for anyone else trying to do something similar:

<xsl:template match="/">
  <IMAGES>
    <xsl:apply-templates select="response/result/doc"/>
  </IMAGES>
</xsl:template>

<xsl:template match="doc">
  <ID NewID="{str[@name='id']}">
    <xsl:apply-templates select="bool[@name='pr']"/>
  </ID>
</xsl:template>

<xsl:template match="bool[.='false']">
  <xsl:attribute name="{@name}">0</xsl:attribute>
</xsl:template>

<xsl:template match="bool[.='true']">
  <xsl:attribute name="{@name}">1</xsl:attribute>
</xsl:template>

Thanks again Upayavira.

--
View this message in context: http://lucene.472066.n3.nabble.com/XSLT-with-maps-tp4218518p4219015.html
Sent from the Solr - User mailing list archive at Nabble.com.