Re: SolrJ : fieldcontent from (multiple) file(s)
Thanks for all your advice and thoughts.

The clients in our case are the Tomcats, or more precisely the webapps running in them. These should serve HTTP requests. I'd also like to note that it is the batch updates, not the single-document insertions/updates, that in my opinion cause the load (CPU and memory, depending on the PDF) which I would like to take off the webapps.

But if I don't get a clean/stable Solr-way-to-do-it solution to this problem I will do the extraction in the webapps, as is

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 13 September 2014 23:22
To: solr-user@lucene.apache.org
Subject: Re: SolrJ : fieldcontent from (multiple) file(s)

Alexandre:

Hmmm, if you're correct, that pretty much shoots SolrCell in the head too. You'd probably have to do something with a custom UpdateRequestProcessor in that case...

On Sat, Sep 13, 2014 at 2:06 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
> On 13 September 2014 17:03, Erick Erickson erickerick...@gmail.com wrote:
> > Which probably just means I don't understand your problem space in sufficient depth
>
> I suspect this means the clients do not have access to the shared drive with the files, but the Solr server does. A firewall in between or some such. If I am right, that would make invoking DataImportHandler a bit complicated as well, due to the change from push to pull.
>
> Regards,
> Alex.
Re: Advice on highlighting
LUCENE-2878 (https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-2878) provides a Lucene API for what you are trying to do, but it has not been committed yet. There is a fork that includes the change at https://github.com/flaxsearch/lucene-solr-intervals

On 12 Sep 2014 21:24, Craig Longman clong...@iconect.com wrote:
> In order to take our Solr usage to the next step, we really need to improve its highlighting abilities. What I'm trying to do is write a new component that can return the fields that matched the search (including numeric fields) and the start/end positions for the alphanumeric matches.
>
> I see three different approaches to take. Each of them will require some modifications to the lucene/solr parts, as it just does not appear to be doable as a completely standalone component.
>
> 1) At initial search time. This seemed like a good approach. I can follow IndexSearcher creating the TermContext that parses through AtomicReaderContexts to see if it contains a match and then adds it to the contexts available for later. However, at this point, inside SegmentTermsEnum.seekExact(), it seems like Solr is not really looking for matching terms as such; it's just scanning what looks like the raw index. So I don't think I can easily extract term positions at this point.
>
> 2) Write a modified HighlighterComponent. We have managed to get phrases to highlight properly, but getting the full field matches looks more difficult in this module. In addition, because it does its highlighting oblivious to any other criteria, we can't use it as is. For example, this search:
>
> (body:large+AND+user_id:7)+OR+user_id:346
>
> will highlight "large" in records that have user_id = 346, when technically (for our purposes at least) it should not be considered a hit, because "large" was accompanied by the user_id = 7 criterion. It's not immediately clear to me how difficult it would be to change this.
>
> 3) Make a modified DebugComponent and enhance the existing explain() methods (at least in the query types we require) to include more information, such as the start/end positions of the term that was hit. I'm exploring this now, but I don't easily see how I can figure out what those positions might be from the explain() information.
>
> Any pointers on how, at the point that TermQuery.explain() is called, I can figure out which indexed token the actual hit was on?
>
> Craig Longman
> C++ Developer
> iCONECT Development, LLC
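To make the objection in approach 2) concrete, here is a minimal Python sketch of "which clauses actually contributed to the hit". It is not Lucene or Solr code: the `Term`, `And` and `Or` classes, the `contributing_terms` function and the sample document are all hypothetical, and matching is simplified to whitespace-token equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    field: str
    value: str


@dataclass(frozen=True)
class And:
    clauses: tuple


@dataclass(frozen=True)
class Or:
    clauses: tuple


def contributing_terms(query, doc):
    """Return (matched, terms): whether doc matches, and which terms caused it."""
    if isinstance(query, Term):
        # Simplified analysis: lowercase and split on whitespace.
        tokens = str(doc.get(query.field, "")).lower().split()
        hit = query.value.lower() in tokens
        return hit, ({query} if hit else set())
    results = [contributing_terms(clause, doc) for clause in query.clauses]
    if isinstance(query, And):
        # A failed conjunction contributes nothing, even if some clauses matched.
        if all(hit for hit, _ in results):
            return True, set().union(*(terms for _, terms in results))
        return False, set()
    # Or: only the branches that matched contribute their terms.
    matched = any(hit for hit, _ in results)
    return matched, set().union(*(terms for hit, terms in results if hit))


# (body:large AND user_id:7) OR user_id:346
query = Or((And((Term("body", "large"), Term("user_id", "7"))),
            Term("user_id", "346")))
doc = {"body": "a large file", "user_id": 346}

matched, terms = contributing_terms(query, doc)
print(matched, terms)
```

For this document only `user_id:346` is reported, so "large" would not be highlighted, which is the behaviour described as desirable above. Term positions are not modelled here at all.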
Re: New tiny high-performance HTTP/Servlet server for Solr
Thanks for sharing. Since Solr may move towards a standalone server in the future, this (Undertow) could be one option.

On Sat, Sep 13, 2014 at 9:36 PM, William Bell billnb...@gmail.com wrote:
> Can we get some stats? Do you have any numbers on performance?
>
> On Sat, Sep 13, 2014 at 3:03 PM, Jayson Minard jay...@bremeld.com.invalid wrote:
> > Instead of running within an application server such as Jetty, Tomcat or WildFly, Solr can now also be run standalone on Undertow, without the overhead or complexity of a full application server. Open-sourced at https://github.com/bremeld/solr-undertow
> >
> > solr-undertow
> >
> > Solr running in standalone server - High Performance, tiny, fast, easy, standalone deployment. Requires JDK 1.7 or newer. Less than 4MB download, faster than Jetty, Tomcat and all the others. Written in the Kotlin language (http://kotlinlang.org/) for the JVM.
> >
> > Releases are available on GitHub: https://github.com/bremeld/solr-undertow/releases
> >
> > This application launches a Solr WAR file as a standalone server running a high-performance HTTP front-end based on undertow.io (the engine behind WildFly, the new JBoss). It has no features of an application server and does nothing more than load the Solr servlets and serve the Admin UI. It is production-quality for a standalone Solr server.
>
> --
> Bill Bell
> billnb...@gmail.com
Solr Dynamic Field Performance
I have a collection with 200 fields and 300M docs running in cloud mode. Each doc has around 20 fields. I now have a use case where I need to replace these explicit fields with 6 dynamic fields. Each of these 200 fields will match one of the 6 dynamic fields.

I am evaluating the performance implications of switching to dynamicFields. I have tested with a smaller dataset (5M docs) but didn't notice any indexing or query performance degradation.

Queries on dynamic fields will be either faceting, range queries or full-text search.

Are there any known performance issues with using dynamicFields instead of explicit ones?

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Dynamic-Field-Performance-tp4158737.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Dynamic Field Performance
Dynamic fields, once they are actually _in_ a document, aren't any different than statically defined fields. Literally, there's no place in the search code that I know of that _ever_ has to check whether a field was dynamically or statically defined.

AFAIK, the only additional cost would be figuring out which pattern matched at index time, which is such a tiny portion of the cost of indexing that I doubt you could measure it.

Best,
Erick

On Sun, Sep 14, 2014 at 7:58 AM, Saumitra Srivastav saumitra.srivast...@gmail.com wrote:
> I have a collection with 200 fields and 300M docs running in cloud mode. Each doc has around 20 fields. I now have a use case where I need to replace these explicit fields with 6 dynamic fields. Each of these 200 fields will match one of the 6 dynamic fields.
>
> I am evaluating the performance implications of switching to dynamicFields. I have tested with a smaller dataset (5M docs) but didn't notice any indexing or query performance degradation.
>
> Queries on dynamic fields will be either faceting, range queries or full-text search.
>
> Are there any known performance issues with using dynamicFields instead of explicit ones?
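As a rough illustration of the "figuring out which pattern matched" step mentioned above, here is a minimal Python sketch. It is not Solr's implementation: the six patterns, the field names and both function names are hypothetical, and it assumes patterns with a single leading or trailing `*` checked in list order. How Solr orders overlapping patterns is not modelled.

```python
from typing import List, Optional


def matches(pattern: str, field_name: str) -> bool:
    """Check a field name against a pattern with one leading or trailing '*'."""
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    return pattern == field_name


def resolve_dynamic_field(field_name: str, patterns: List[str]) -> Optional[str]:
    """Return the first pattern that matches, or None (list order is an assumption)."""
    for pattern in patterns:
        if matches(pattern, field_name):
            return pattern
    return None


# Hypothetical set of six dynamic field patterns.
PATTERNS = ["*_s", "*_i", "*_l", "*_d", "*_dt", "*_txt"]

for name in ["price_d", "created_dt", "title_txt", "unknown"]:
    print(name, "->", resolve_dynamic_field(name, PATTERNS))
```

The lookup is a handful of string comparisons per field name, which fits the expectation that its cost is hard to measure next to the rest of indexing.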
Re: Solr: Tricky exact match, unwanted search results
*Erick*, thank you for your help!

For exact match I still want:
- to use stemming (e.g. for "sleep" I want the word forms "slept", "sleeping", "sleeps" also to be used in searching)
- to disregard case sensitivity
- to disregard prepositions, conjunctions and other function words
- to match only docs having all of the query words, in the given order (except function words)
- to match only docs if there are no other words in the doc field besides the words in the query
- to use synonyms (e.g. GB == gigabyte, Television == TV)

Erick Erickson wrote
> The easiest way to make your examples work would be to use a copyField to an exact match field that uses the KeywordTokenizer

The KeywordTokenizer treats the entire field as a single token, regardless of its content. So this does not fit my requirements.

Erick Erickson wrote
> You'll have to be a little careful to escape spaces for multi-term bits, like exact_field:pussy\ cat.

Hmm... I don't care about quoting right now at all. But should I?

Erick Erickson wrote
> As far as your question about "if" and "in", what you're probably getting here is stopword removal, but that's a guess.

I have the following document:

After I disabled solr.StopFilterFactory for analyzer type=query, Solr stopped returning this document for the query:
http://localhost:8983/solr/lexikos/select?q=phraseExact%3A%22on+a+case-by-case%22

Can I somehow implement the desired exact match behavior?

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Tricky-exact-match-unwanted-search-results-tp4158652p4158745.html
Sent from the Solr - User mailing list archive at Nabble.com.
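One way to read the requirements listed above is "normalize both the query and the field value, then compare the resulting token sequences for equality". The following is a minimal Python sketch of that reading only, not a Solr configuration. The stop word set, the synonym map, the lookup-table "stemmer" and the function names are all hypothetical stand-ins for whatever analysis chain would be used in practice.

```python
import re

# Hypothetical stand-ins for a real stop word list, synonym file and stemmer.
STOP_WORDS = {"a", "an", "the", "on", "in", "of", "if", "and", "or", "by"}
SYNONYMS = {"gb": "gigabyte", "tv": "television"}
STEMS = {"slept": "sleep", "sleeping": "sleep", "sleeps": "sleep"}


def normalize(text):
    """Lowercase, tokenize, drop function words, map synonyms and word forms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    result = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        token = SYNONYMS.get(token, token)
        token = STEMS.get(token, token)
        result.append(token)
    return result


def exact_match(query, field_value):
    """True only if both sides reduce to the same tokens in the same order."""
    return normalize(query) == normalize(field_value)


print(exact_match("Sleeping on the TV", "slept television"))
print(exact_match("sleep", "sleep well"))
print(exact_match("tv sleep", "sleep tv"))
```

Comparing whole sequences covers both "all query words in the given order" and "no other words in the field", since any extra or reordered token makes the lists differ.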
Re: Solr: Tricky exact match, unwanted search results
FiMka wrote
> After I disabled solr.StopFilterFactory for analyzer type=query, Solr stopped returning this document for the query:
> http://localhost:8983/solr/lexikos/select?q=phraseExact%3A%22on+a+case-by-case%22

Forgot to say, I have also disabled solr.StopFilterFactory for analyzer type=index, removed all the documents and then re-added them.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Tricky-exact-match-unwanted-search-results-tp4158652p4158748.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tricky exact match, unwanted search results
I keep asking people this eternal question: what training or doc are you reading that uses this term "exact match"? Clearly the term is being used by a lot of people in a lot of ambiguous ways, when exact should be... exact.

I think we need to start using the term "exact match" ONLY for string field queries that don't use wildcard, fuzzy, or range queries. And maybe also for keyword tokenizer text fields that don't have any filters, which might as well be string fields.

-- Jack Krupansky

-Original Message-
From: FiMka
Sent: Sunday, September 14, 2014 9:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr: Tricky exact match, unwanted search results

*Erick*, thank you for your help!

For exact match I still want:
- to use stemming (e.g. for "sleep" I want the word forms "slept", "sleeping", "sleeps" also to be used in searching)
- to disregard case sensitivity
- to disregard prepositions, conjunctions and other function words
- to match only docs having all of the query words, in the given order (except function words)
- to match only docs if there are no other words in the doc field besides the words in the query
- to use synonyms (e.g. GB == gigabyte, Television == TV)

Erick Erickson wrote
> The easiest way to make your examples work would be to use a copyField to an exact match field that uses the KeywordTokenizer

The KeywordTokenizer treats the entire field as a single token, regardless of its content. So this does not fit my requirements.

Erick Erickson wrote
> You'll have to be a little careful to escape spaces for multi-term bits, like exact_field:pussy\ cat.

Hmm... I don't care about quoting right now at all. But should I?

Erick Erickson wrote
> As far as your question about "if" and "in", what you're probably getting here is stopword removal, but that's a guess.

I have the following document:

After I disabled solr.StopFilterFactory for analyzer type=query, Solr stopped returning this document for the query:
http://localhost:8983/solr/lexikos/select?q=phraseExact%3A%22on+a+case-by-case%22

Can I somehow implement the desired exact match behavior?
Alternative preview for specific fields
Suppose I have the following fields: text, author, title.

Users perform a query on all of those fields:

...?q=(text:XX OR author:XX OR title:XX)

If this query has a match in the 'text' field, the highlighter will generate a hit preview based on this field, which is fine. But suppose the query matched the 'author' field; then the preview will not be very interesting. In this case I would like to show something else, e.g. the first 3 lines of the 'text' field.

What would be the best way to achieve this?

--
View this message in context: http://lucene.472066.n3.nabble.com/Altenative-preview-for-specific-fields-tp4158771.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Alternative preview for specific fields
Hi,

hl.alternateField and hl.maxAlternateFieldLength would be useful.

http://wiki.apache.org/solr/HighlightingParameters

Ahmet

On Sunday, September 14, 2014 9:35 PM, SolrUser1543 osta...@gmail.com wrote:
> Suppose I have the following fields: text, author, title.
>
> Users perform a query on all of those fields:
>
> ...?q=(text:XX OR author:XX OR title:XX)
>
> If this query has a match in the 'text' field, the highlighter will generate a hit preview based on this field, which is fine. But suppose the query matched the 'author' field; then the preview will not be very interesting. In this case I would like to show something else, e.g. the first 3 lines of the 'text' field.
>
> What would be the best way to achieve this?
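For illustration, a request using the two parameters Ahmet mentions could be built as in the Python sketch below. The base URL, the collection name, the helper name `build_highlight_url` and the length of 200 characters are assumptions made for the example. The field names come from the question, and the fallback only returns content if 'text' is a stored field.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, term, max_length=200):
    """Build a select URL that highlights 'text' and falls back to its beginning."""
    params = {
        "q": f"text:{term} OR author:{term} OR title:{term}",
        "hl": "true",
        # Generate snippets from the 'text' field only.
        "hl.fl": "text",
        # If no snippet can be generated for 'text' (for example the hit was
        # in 'author'), return the start of 'text' instead.
        "hl.alternateField": "text",
        "hl.maxAlternateFieldLength": str(max_length),
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and collection name.
url = build_highlight_url("http://localhost:8983/solr/collection1", "XX")
print(url)
```

hl.maxAlternateFieldLength counts characters, so "the first 3 lines" can only be approximated by choosing a suitable length.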
Re: Solr Dynamic Field Performance
How about perf if you dynamically create 5000 fields?

Bill Bell
Sent from mobile

On Sep 14, 2014, at 10:06 AM, Erick Erickson erickerick...@gmail.com wrote:
> Dynamic fields, once they are actually _in_ a document, aren't any different than statically defined fields. Literally, there's no place in the search code that I know of that _ever_ has to check whether a field was dynamically or statically defined.
>
> AFAIK, the only additional cost would be figuring out which pattern matched at index time, which is such a tiny portion of the cost of indexing that I doubt you could measure it.
>
> Best,
> Erick
>
> On Sun, Sep 14, 2014 at 7:58 AM, Saumitra Srivastav saumitra.srivast...@gmail.com wrote:
> > I have a collection with 200 fields and 300M docs running in cloud mode. Each doc has around 20 fields. I now have a use case where I need to replace these explicit fields with 6 dynamic fields. Each of these 200 fields will match one of the 6 dynamic fields.
> >
> > I am evaluating the performance implications of switching to dynamicFields. I have tested with a smaller dataset (5M docs) but didn't notice any indexing or query performance degradation.
> >
> > Queries on dynamic fields will be either faceting, range queries or full-text search.
> >
> > Are there any known performance issues with using dynamicFields instead of explicit ones?
Re: Alternative preview for specific fields
Hi, thanks for the answer.

I tried to use this technique, but the desired result was not achieved. Can you please provide an example of a document to index and a sample query?

--
View this message in context: http://lucene.472066.n3.nabble.com/Altenative-preview-for-specific-fields-tp4158771p4158807.html
Sent from the Solr - User mailing list archive at Nabble.com.