Indexing failing after a few million documents
Hi Everyone, I have installed Solr-4.6 Cloud with external Zookeeper-3.4.5 and Tomcat-7; the configuration is as mentioned below: a single-machine cluster setup with 3 shards and 2 replicas, deployed on 3 Tomcats with 3 Zookeepers. Everything installed fine and I started indexing, but once I reach some millions of documents (~1.6M) the indexing stops with "#503 Service Unavailable" and the Cloud Dashboard log says "ERROR DistributedUpdateProcessor ClusterState says we are the leader, but locally we don't think so" "ERROR SolrCore org.apache.solr.common.SolrException: ClusterState says we are the leader (http://host:port1/solr/recollection_shard1_replica1), but locally we don't think so. Request came from http://host:port2/solr/recollection_shard2_replica1/" "ERROR ZkController Error registering SolrCore:org.apache.solr.common.SolrException: Error getting leader from zk for shard shard2" Any suggestions/advice would be appreciated! Thanks! Tim
Re: Vague Behavior while setting up Solr Cloud
Thanks Shawn, I much appreciate your help. I got it fixed; actually there were some background Tomcat processes still running that hadn't been stopped by the time I hit these issues. Thanks again! Tim On Tue, May 20, 2014 at 11:33 PM, Shawn Heisey wrote: > On 5/20/2014 7:10 AM, Tim Burner wrote: > > I am trying to setup Solr Cloud referring to the blog > > http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html > > > > if I complete the set in one go, then it seems to be going fine. > > > > when the setup is complete and I am trying to restart Solr by restarted > > Tomcat instance, it does not deploy and moreover the shards and replicas > > are not up. > > You've given us nearly zero information about what the problem is. All > we know right now is that you restart tomcat and Solr doesn't deploy. > See this wiki page: > > http://wiki.apache.org/solr/UsingMailingLists > > Getting specific, we'll need tomcat logs, Solr logs, versions of > everything. We might also need your config and schema, depending on > what the other information reveals. > > Thanks, > Shawn > >
Re: Solr Cloud Shards and Replicas not reviving after restarting
Thanks Erick, I much appreciate your help. I got it fixed; actually there were some background Tomcat processes still running that hadn't been stopped by the time I hit these issues. Thanks again! On Wed, May 21, 2014 at 8:25 AM, Erick Erickson wrote: > First thing I'd look at is the log on the server. It's possible that > you've changed the configuration such that Solr can't start. Shot in > the dark, but that's where I'd start looking. > > Best, > Erick > > On Tue, May 20, 2014 at 4:45 AM, Tim Burner wrote: > > Hi Everyone, > > > > I have installed Solr Cloud 4.6.2 with external Zookeeper and Tomcat, > > having 3 shards with 2 replica each. I tried indexing some documents > which > > went easy. > > > > After which I restarted my Tomcat, and now the Shards are not getting up, > > its coming up with bunch of Exceptions. First exception was "*no servers > > hosting shard:"* > > > > All the replica and leader are down and not responding, its even giving > > > > RecoveryStrategy Error while trying to recover. > > > core=recollection_shard1_replica1:org.apache.solr.client.solrj.SolrServerException: > > Server refused connection at: http://192.168.2.183:9090/solr > > > > It would be great if you can help me out solving this issue. Expert > advice > > needed. > > > > Thanks in Advance! >
Re: Solr performance: multiValued field vs separate fields
I think a multiValued field copies multiple values, so the index is bigger but querying is easier; performance may be worse, but it depends on how it is used.
Re: solr-server refresh index
Well, you can always make documents in Solr visible by issuing a hard commit or waiting for your hard commit (openSearcher=true) or soft commit interval to expire. But as far as the Cloudera product, you'd get much better answers by asking in Cloudera-specific forums. Here's a place to start... https://groups.google.com/a/cloudera.org/forum/#!forum/scm-users Problem is that Cloudera Manager (CDH) uses Solr, but Solr hasn't done anything special to accommodate Cloudera's usage so this forum is relatively ignorant of CDH, particularly things like hbase integration... Best, Erick On Tue, May 20, 2014 at 8:50 PM, zzz wrote: > Hi > > I am using Solr on a 4 node CDH5 cluster (1 namenode, 3 datanodes). > > I am running the solr-server on the namenode, and the solr-indexer on each > of the datanodes, alongside the hbase regionservers, for NRT indexing of a > hbase table. > > The basics of the indexing seem to work - when I add records via > hbase-shell, I can view the records, however *only* after I either restart > solr-server, or click "optimize" through the Solr Web UI. > > Interesting, after I add some records to hbase, the Solr Web UI displays > the "current" status as a red stop icon. After I restart/optimize, it turns > into a green tick, and I can search and get back the new documents. > > Is there a way to get solr-server to refresh its view of the index > automatically? Or would that even be a good idea? Why doesn't the Web UI > have a clear "refresh index" button available...the "optimize" button is > usually not available. > > TIA
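As a concrete illustration of Erick's point about commits: in stock Solr, visibility is controlled by the commit settings in solrconfig.xml. A minimal sketch (interval values are illustrative only, and note that CDH may manage this file for you):

<!-- solrconfig.xml: a hard commit persists segments to disk;
     with openSearcher=true it would also make them visible -->
<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit at most every 60 seconds -->
  <openSearcher>false</openSearcher>  <!-- persist only; leave visibility to soft commits -->
</autoCommit>
<!-- a soft commit makes new documents searchable without the I/O cost of a hard commit -->
<autoSoftCommit>
  <maxTime>5000</maxTime>             <!-- new documents become searchable within ~5 seconds -->
</autoSoftCommit>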
solr-server refresh index
Hi I am using Solr on a 4 node CDH5 cluster (1 namenode, 3 datanodes). I am running the solr-server on the namenode, and the solr-indexer on each of the datanodes, alongside the hbase regionservers, for NRT indexing of a hbase table. The basics of the indexing seem to work - when I add records via hbase-shell, I can view the records, however *only* after I either restart solr-server, or click "optimize" through the Solr Web UI. Interestingly, after I add some records to hbase, the Solr Web UI displays the "current" status as a red stop icon. After I restart/optimize, it turns into a green tick, and I can search and get back the new documents. Is there a way to get solr-server to refresh its view of the index automatically? Or would that even be a good idea? Why doesn't the Web UI have a clear "refresh index" button available...the "optimize" button is usually not available. TIA
Re: Odd interaction between {!tag..} and {!field}
Thanks Chris! The query parsing stuff is something I keep stumbling over, but you may have noticed that! Erick On Tue, May 20, 2014 at 10:06 AM, Chris Hostetter wrote: > > : when local params are "embedded" in a query being parsed by the > : LuceneQParser, it applies them using the same scoping as other query > : operators > : > : : fq: "{!tag=name_name}{!field f=name}United States" > > > Think of that example in the context of this one -- the basics of > when/what/why the variuos pices are parsed are the same... > >fq: "{!tag=name_name}(+{!field f=name}United text:(States))" > > > -Hoss > http://www.lucidworks.com/
Re: Solr Cloud Shards and Replicas not reviving after restarting
First thing I'd look at is the log on the server. It's possible that you've changed the configuration such that Solr can't start. Shot in the dark, but that's where I'd start looking. Best, Erick On Tue, May 20, 2014 at 4:45 AM, Tim Burner wrote: > Hi Everyone, > > I have installed Solr Cloud 4.6.2 with external Zookeeper and Tomcat, > having 3 shards with 2 replica each. I tried indexing some documents which > went easy. > > After which I restarted my Tomcat, and now the Shards are not getting up, > its coming up with bunch of Exceptions. First exception was "*no servers > hosting shard:"* > > All the replica and leader are down and not responding, its even giving > > RecoveryStrategy Error while trying to recover. > core=recollection_shard1_replica1:org.apache.solr.client.solrj.SolrServerException: > Server refused connection at: http://192.168.2.183:9090/solr > > It would be great if you can help me out solving this issue. Expert advice > needed. > > Thanks in Advance!
Re: How to optimize single shard only?
Marcin is correct. The index size on disk will perhaps double. (triple in compound case). The reason is so you don't lose your index if the process is interrupted. Consider the case where you're optimizing to one segment. 1> All the current segments are copied into the new segment 2> The new segment is flushed 3> "control files" that tell Lucene what files constitute the valid segment(s) are written. 4> the old segments are removed. So at any point up to <3> if the system is killed, crashes, whatever, then the old version of the index is intact and you can keep on working, even optimizing again. If, on the other hand, after each segment was written to the new segment the old segment was deleted, interrupting the process (which may be very long) would leave your index in an inconsistent state. FWIW, Erick On Tue, May 20, 2014 at 4:14 AM, Marcin Rzewucki wrote: > As I wrote before index is being rewritten so it grows during optimization > and later is reduced. I guess there was OOM in your case. > > > > On 20 May 2014 12:11, YouPeng Yang wrote: > >> Hi >> My DIH work indeed hangs, I have only four shards,each has a master and a >> replica.Maybe jvm memory size is very low.it was 3G while the size of >> every >> my core is almost 16GB. >> >> I also have found that the size of the master increased during the >> optimization(you can check on the overview page of the core.).the >> phenomenon is very werid. Is it because that the collection overall >> optimization will comput and copy all the docs of the whole collection. >> >> >> Version Gen Size Master (Searching) >> 1400501330248 >> 98396 >>29.83 GB >> Master (Replicable) >> 1400501330888 >> 98397 >> - >> >> >> After I have check source code,unfortunatly,it seems the optimize action >> distrib overall the collection.you can reference the >> SolrCmdDistributor.distribCommit. >> >> >> 2014-05-20 17:27 GMT+08:00 Marcin Rzewucki : >> >> > Well, it should not hang if all is configured fine :) How many shards and >> > memory you have ? Note that optimize rewrites index so you might need >> > additional disk space for this process. Optimizing works fine however I'd >> > like to be able to do it on a single shard as well. >> > >> > >> > On 20 May 2014 11:19, YouPeng Yang wrote: >> > >> > > Hi Marcin >> > > >> > > Thanks to your mail,now I know why my cloud hangs when I just click >> the >> > > optimize button on the overview page of the shard. >> > > >> > > >> > > 2014-05-20 15:25 GMT+08:00 Ahmet Arslan : >> > > >> > > > Hi Marcin, >> > > > >> > > > just a guess, pass distrib=false ? >> > > > >> > > > >> > > > >> > > > Ahmet >> > > > >> > > > >> > > > On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki < >> > mrzewu...@gmail.com> >> > > > wrote: >> > > > Hi, >> > > > >> > > > Do you know how to optimize index on a single shard only ? I was >> trying >> > > to >> > > > use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not >> > > work >> > > > - it optimizes all shards instead of just one. >> > > > >> > > > Kind regards. >> > > > >> > > > >> > > >> > >>
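For what it's worth, a concrete form of Ahmet's distrib=false guess from earlier in the thread, addressed to one core's update handler directly instead of the distributed update path. Untested, and the host and core names are illustrative:

curl "http://localhost:8983/solr/collection1_shard1_replica1/update?optimize=true&distrib=false"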
Re: Extensibility and code reuse: SOLR vs Lucene
On Tue, May 20, 2014 at 6:01 PM, Achim Domma wrote: > - I found several times code snippets like " if (collector instanceof > DelegatingCollector) { ((DelegatingCollector)collector).finish() } ". Such > code is considered bad practice in every OO language I know. Do I miss > something here? Is there a reason why it's solved like this? In a single code base you would be correct (we would just add a finish method to the base Collector class). When you are adding additional functionality to an existing API/code base however, this is often the only way to do it. What type of aggregation are you looking for? The Heliosearch project (a Solr fork), also has this: http://heliosearch.org/solr-facet-functions/ -Yonik http://heliosearch.org - facet functions, subfacets, off-heap filters&fieldcache
Re: Extensibility and code reuse: SOLR vs Lucene
Achim, Solr can be extended to plugin custom analytics. The code snippet you mention is part of the framework which enables this. Here is how you do it: 1) Create a QParserPlugin that returns a Query that extends PostFilter. 2) Then implement the PostFilter api and return a DelegatingCollector that collects whatever you like. 3) DelegatingCollector.finish() signals your collector that the search has completed. 4) You can output your analytics directly to the ResponseBuilder. You can get a reference to the ResponseBuilder through a static call in the SolrRequestInfo class. In Solr 4.9 you'll be able to implement your own MergeStrategy, to merge the results generated by DelegatingCollectors on the shards (SOLR-5973). The pluggable collectors in that ticket are for ranking. The PostFilter delegating collectors are a better place for doing custom analytics. Joel Bernstein Search Engineer at Heliosearch On Tue, May 20, 2014 at 6:01 PM, Achim Domma wrote: > Hi, > > I have a project, where we need to do aggregations over facetted values. > The stats component is not powerful enough anymore and the new statistic > component seems not to be ready yet. I understand that it's not easy to > create a general purpose component for this task. I decided to check > whether I can solve my use case by myself, but I'm struggling. Any > clarification regarding the following points would be very appreciated: > > - I assume that some of my use cases could be solved by using a custom > collector. Lucene seems to be build to be extensible by deriving classes > and overriding methods. That's how I would expect SOLID code to be. But > looking at the SOLR code, I see a lot of hard coded types and no way to > just exchange the collector. This is the case for most of the code parts I > have read, so I wonder: Is there another way to customize / extend SOLR? > How is the SOLR code supposed to be reused? > > - I found several times code snippets like " if (collector instanceof > DelegatingCollector) { ((DelegatingCollector)collector).finish() } ". Such > code is considered bad practice in every OO language I know. Do I miss > something here? Is there a reason why it's solved like this? > > cheers, > Achim
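To make Joel's steps concrete, here is a minimal sketch against the Solr 4.x APIs. The class names and the trivial "docCount" metric are hypothetical; a real implementation would compute something more interesting in collect():

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrRequestInfo;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

// 1) QParserPlugin whose parser returns a Query that extends PostFilter
public class DocCountQParserPlugin extends QParserPlugin {
  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() {
        return new DocCountPostFilter();
      }
    };
  }
}

// 2) PostFilter that hands back a DelegatingCollector
class DocCountPostFilter extends ExtendedQueryBase implements PostFilter {
  @Override
  public boolean getCache() { return false; }  // post filters must not be cached

  @Override
  public int getCost() { return Math.max(super.getCost(), 100); }  // cost >= 100 runs it as a post filter

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      private long count;

      @Override
      public void collect(int doc) throws IOException {
        count++;             // collect whatever analytics you like here
        super.collect(doc);  // pass the doc down the collector chain
      }

      @Override
      public void finish() throws IOException {
        // 3) the search is complete; 4) write the result onto the response
        ResponseBuilder rb = SolrRequestInfo.getRequestInfo().getResponseBuilder();
        rb.rsp.add("docCount", count);
        super.finish();      // let any delegating collectors below us finish too
      }
    };
  }
}

It would then be registered in solrconfig.xml with something like <queryParser name="doccount" class="com.example.DocCountQParserPlugin"/> and invoked as fq={!doccount}.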
Extensibility and code reuse: SOLR vs Lucene
Hi, I have a project where we need to do aggregations over faceted values. The stats component is not powerful enough anymore, and the new statistics component does not seem to be ready yet. I understand that it's not easy to create a general purpose component for this task. I decided to check whether I can solve my use case by myself, but I'm struggling. Any clarification regarding the following points would be much appreciated: - I assume that some of my use cases could be solved by using a custom collector. Lucene seems to be built to be extensible by deriving classes and overriding methods. That's how I would expect SOLID code to be. But looking at the SOLR code, I see a lot of hard coded types and no way to just exchange the collector. This is the case for most of the code parts I have read, so I wonder: Is there another way to customize / extend SOLR? How is the SOLR code supposed to be reused? - I found several times code snippets like " if (collector instanceof DelegatingCollector) { ((DelegatingCollector)collector).finish() } ". Such code is considered bad practice in every OO language I know. Do I miss something here? Is there a reason why it's solved like this? cheers, Achim
Stemming for Chinese and Japanese
Hi, What is the filter to be used to implement stemming for Chinese and Japanese language field types? For English, I have used and it's working fine. Appreciate your help! Thanks, G. Naresh Kumar
Re: Issue paging when sorting on a Date field
On 5/19/2014 2:05 PM, Bryan Bende wrote: > Using Solr 4.6.1 and in my schema I have a date field storing the time a > document was added to Solr. > > I have a utility program which: > - queries for all of the documents in the previous day sorted by create date > - pages through the results keeping track of the unique document ids > - compare the total number of unique doc ids to the numFound to see if it > they match > > I've noticed that if I use a page size larger than the number of documents > for the given day (aka get everything in one query), then everything works > as expected (results sorted correctly, unique doc ids size == numFound). > > However, when I use a smaller page say, say 10 rows per page, I randomly > see cases where the last document of a page will be duplicated as the first > document of the next page, even though the "start" and "rows" parameters > increased correctly. So I might see something like numFound=100 but unique > doc ids is 97, and then I see three occurrences where the last doc id on a > page was also the first on the next page. This *sounds* like a situation where you have a sharded index that has the same uniqueKey value in more than one shard. This situation will cause Solr to behave in a way that looks completely unpredictable. There is no way for Solr to deal with this problem in a way that would not consume large amounts of real time, CPU time, and RAM ... so Solr does not do anything for dealing with this problem other than removing duplicates from the actual results returned -- which is actually how the discrepancies occur. If you are absolutely sure that you are not running into the duplicate document problem I described, then I am not sure what's going on. It might be related to the sort, and if that's true, adding a second sort parameter using your uniqueKey field might be a solution. Thanks, Shawn
Re: Issue paging when sorting on a Date field
: So I think when I was paging through the results, if the query for page N : was handled by replica1 and page N+1 handled by replica2, and the page : boundary happened to be where the reversed rows were, this would produce : the behavior I was seeing where the last row from the previous page was : also the first row from the next page. Right, this can actually happen even in a single Solr node. When 2 docs have identical sort values, the final ordering is non-deterministic -- they usually come back in "index" order (the order that they appear in the segments on disk) but that's not guaranteed. In particular, if you have concurrent index updates that cause segment merges, the order of documents can change (even if those updates don't directly affect the docs being returned). If you want to ensure that docs with equal sort values are returned in a consistent order across pagination (in either single or multi-node setups) you have to have a "tie breaker" sort of some kind -- the uniqueKey can be useful here. -Hoss http://www.lucidworks.com/
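For example, using the field names from Bryan's earlier mail and assuming "id" is the uniqueKey, such a tie-breaker would look like: sort=create_date desc,id asc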
Re: Vague Behavior while setting up Solr Cloud
On 5/20/2014 7:10 AM, Tim Burner wrote: > I am trying to setup Solr Cloud referring to the blog > http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html > > if I complete the set in one go, then it seems to be going fine. > > when the setup is complete and I am trying to restart Solr by restarted > Tomcat instance, it does not deploy and moreover the shards and replicas > are not up. You've given us nearly zero information about what the problem is. All we know right now is that you restart tomcat and Solr doesn't deploy. See this wiki page: http://wiki.apache.org/solr/UsingMailingLists Getting specific, we'll need tomcat logs, Solr logs, versions of everything. We might also need your config and schema, depending on what the other information reveals. Thanks, Shawn
Re: solr-user Digest of: get.100322
On 5/20/2014 2:01 AM, Jeongseok Son wrote: > Though it uses only small amount of memory I'm worried about memory > usage because I have to store so many documents. (32GB RAM / total 5B > docs, sum of docs. of all cores) If you've only got 32GB of RAM and there are five billion docs on the system, Solr performance will be dismal no matter what you do with docValues. Your index will be FAR larger than the amount of available RAM for caching. http://wiki.apache.org/solr/SolrPerformanceProblems#RAM With that many documents, even if you don't use RAM-hungry features like sorting and facets, you'll need a significant heap size, which will further reduce the amount of RAM on the system that the OS can use to cache the index. For good performance, Solr *relies* on the operating system caching a significant portion of the index. Thanks, Shawn
Re: Odd interaction between {!tag..} and {!field}
: when local params are "embedded" in a query being parsed by the : LuceneQParser, it applies them using the same scoping as other query : operators : : : fq: "{!tag=name_name}{!field f=name}United States" Think of that example in the context of this one -- the basics of when/what/why the various pieces are parsed are the same... fq: "{!tag=name_name}(+{!field f=name}United text:(States))" -Hoss http://www.lucidworks.com/
Re: Odd interaction between {!tag..} and {!field}
: The presence of the {!tag} entry changes the filter query generated by : the {!field...} tag. Note below that in one case the filter query is a : phrase query, and in the other it's parsed with one term against the : specified field and the other against the default field. I think you are misunderstanding the way the localparams logic works. When localparams are at the beginning of the param, they apply to the entire string value. When local params are "embedded" in a query being parsed by the LuceneQParser, it applies them using the same scoping as other query operators : fq: "{!tag=name_name}{!field f=name}United States" that says "parse this entire query string using the default parser", applying "tag=name_name" to the result. Then the LuceneQParser gets the string "{!field f=name}United States" and it parses "United" using the "field" QParser, and "States" using itself. : fq: "{!field f=name}United States" that says "parse this entire query string using the "field" parser". I think what you want is... fq: "{!field f=name tag=name_name}United States" or more explicitly w/o the shortcut... fq: "{!tag=name_name type=field f=name}United States" -Hoss http://www.lucidworks.com/
Odd interaction between {!tag..} and {!field}
not sure what to make of this... The presence of the {!tag} entry changes the filter query generated by the {!field...} tag. Note below that in one case the filter query is a phrase query, and in the other it's parsed with one term against the specified field and the other against the default field. Using the example data, submitting this: http://localhost:8983/solr/collection1/select?q=*:*&fq={!tag=name_name}{!field f=name}United States&wt=json&indent=true&debug=query generates this response: { responseHeader: { status: 0, QTime: 10, params: { indent: "true", q: "*:*", debug: "query", wt: "json", fq: "{!tag=name_name}{!field f=name}United States" } }, response: { numFound: 0, start: 0, docs: [ ] }, debug: { rawquerystring: "*:*", querystring: "*:*", parsedquery: "MatchAllDocsQuery(*:*)", parsedquery_toString: "*:*", QParser: "LuceneQParser", filter_queries: [ "{!tag=name_name}{!field f=name}United States" ], parsed_filter_queries: [ "name:united text:states" ] } } while this one: http://localhost:8983/solr/collection1/select?q=*:*&fq={!field f=name}United States&wt=json&indent=true&debug=query gives: { responseHeader: { status: 0, QTime: 3, params: { indent: "true", q: "*:*", debug: "query", wt: "json", fq: "{!field f=name}United States" } }, response: { numFound: 0, start: 0, docs: [ ] }, debug: { rawquerystring: "*:*", querystring: "*:*", parsedquery: "MatchAllDocsQuery(*:*)", parsedquery_toString: "*:*", QParser: "LuceneQParser", filter_queries: [ "{!field f=name}United States" ], parsed_filter_queries: [ "PhraseQuery(name:"united states")" ] } } Of course quoting "United States" works. Escaping the space does NOT change the behavior when {!tag...} is present. Is this worth a JIRA or am I just missing the obvious? Erick
Re: WordDelimiterFilterFactory and StandardTokenizer
Hey Ahmet, Yeah I had missed Shawn's response, I'll have to give that a try as well. As for the version, we're using 4.4. StandardTokenizer sets type for HANGUL, HIRAGANA, IDEOGRAPHIC, KATAKANA, and SOUTHEAST_ASIAN and you're right, we're using TypeTokenFilter to remove those. Diego Fernandez - 爱国 Software Engineer US GSS Supportability - Diagnostics - Original Message - > Hi Diego, > > Did you miss Shawn's response? His ICUTokenizerFactory solution is better > than mine. > > By the way, what solr version are you using? Does StandardTokenizer set type > attribute for CJK words? > > To filter out given types, you not need a custom filter. Type Token filter > serves exactly that purpose. > https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-TypeTokenFilter > > > > On Tuesday, May 20, 2014 5:50 PM, Diego Fernandez > wrote: > Great, thanks for the information! Right now we're using the > StandardTokenizer types to filter out CJK characters with a custom filter. > I'll test using MappingCharFilters, although I'm a little concerned with > possible adverse scenarios. > > Diego Fernandez - 爱国 > Software Engineer > US GSS Supportability - Diagnostics > > > > - Original Message - > > Hi Aiguofer, > > > > You mean ClassicTokenizer? Because StandardTokenizer does not set token > > types > > (e-mail, url, etc). > > > > > > I wouldn't go with the JFlex edit, mainly because maintenance costs. It > > will > > be a burden to maintain a custom tokenizer. > > > > MappingCharFilters could be used to manipulate tokenizer behavior. > > > > Just an example, if you don't want your tokenizer to break on hyphens, > > replace it with something that your tokenizer does not break. For example > > under score. > > > > "-" => "_" > > > > > > > > Plus WDF can be customized too. Please see types attribute : > > > > http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/wdftypes.txt > > > > > > Ahmet > > > > > > On Friday, May 16, 2014 6:24 PM, aiguofer wrote: > > Jack Krupansky-2 wrote > > > > > Typically the white space tokenizer is the best choice when the word > > > delimiter filter will be used. > > > > > > -- Jack Krupansky > > > > If we wanted to keep the StandardTokenizer (because we make use of the > > token > > types) but wanted to use the WDFF to get combinations of words that are > > split with certain characters (mainly - and /, but possibly others as > > well), > > what is the suggested way of accomplishing this? Would we just have to > > extend the JFlex file for the tokenizer and re-compile it? > > > > > > > > -- > > View this message in context: > > http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > >
Re: WordDelimiterFilterFactory and StandardTokenizer
Hi Diego, Did you miss Shawn's response? His ICUTokenizerFactory solution is better than mine. By the way, what Solr version are you using? Does StandardTokenizer set the type attribute for CJK words? To filter out given types, you do not need a custom filter; the Type Token filter serves exactly that purpose. https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-TypeTokenFilter On Tuesday, May 20, 2014 5:50 PM, Diego Fernandez wrote: Great, thanks for the information! Right now we're using the StandardTokenizer types to filter out CJK characters with a custom filter. I'll test using MappingCharFilters, although I'm a little concerned with possible adverse scenarios. Diego Fernandez - 爱国 Software Engineer US GSS Supportability - Diagnostics - Original Message - > Hi Aiguofer, > > You mean ClassicTokenizer? Because StandardTokenizer does not set token types > (e-mail, url, etc). > > > I wouldn't go with the JFlex edit, mainly because maintenance costs. It will > be a burden to maintain a custom tokenizer. > > MappingCharFilters could be used to manipulate tokenizer behavior. > > Just an example, if you don't want your tokenizer to break on hyphens, > replace it with something that your tokenizer does not break. For example > under score. > > "-" => "_" > > > > Plus WDF can be customized too. Please see types attribute : > > http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/wdftypes.txt > > > Ahmet > > > On Friday, May 16, 2014 6:24 PM, aiguofer wrote: > Jack Krupansky-2 wrote > > > Typically the white space tokenizer is the best choice when the word > > delimiter filter will be used. > > > > -- Jack Krupansky > > If we wanted to keep the StandardTokenizer (because we make use of the token > types) but wanted to use the WDFF to get combinations of words that are > split with certain characters (mainly - and /, but possibly others as well), > what is the suggested way of accomplishing this? Would we just have to > extend the JFlex file for the tokenizer and re-compile it? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
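A sketch of how Ahmet's MappingCharFilter suggestion might look in schema.xml (the field type and mapping file names are illustrative):

<fieldType name="text_hyphen" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- runs before the tokenizer: turn "-" into "_" so StandardTokenizer keeps the token whole -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chars.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- WDF then splits on and catenates across the underscore as desired -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1"/>
  </analyzer>
</fieldType>

where mapping-chars.txt contains the single rule: "-" => "_"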
Re: Issue paging when sorting on a Date field
This is using solr.TrieDateField; it is the field type "date" from the example schema in Solr 4.6.1: <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/> After further testing I was only able to reproduce this in a sharded & replicated environment (numShards=3, replicationFactor=2) and I think I have narrowed down the issue, and at this point it may be expected behavior... I took a query like q=create_date:[2014-05-19T00:00:00Z TO 2014-05-19T23:59:59Z]&sort=create_date DESC&start=0&rows=1 which should get all the documents for yesterday sorted by create date, and then added distrib=false and ran it against shard1_replica1 and shard1_replica2. Then I diff'd the files and it showed 5 occurrences where two consecutive rows in one replica were reversed in the other replica, and in all 5 cases the flip-flopped rows had the exact same create_date value, which happened to only go down to the minute. As an example: shard1_replica1: ... docX, 2014-05-19T20:15:00Z docY, 2014-05-19T20:15:00Z ... shard1_replica2: ... docY, 2014-05-19T20:15:00Z docX, 2014-05-19T20:15:00Z ... So I think when I was paging through the results, if the query for page N was handled by replica1 and page N+1 handled by replica2, and the page boundary happened to be where the reversed rows were, this would produce the behavior I was seeing where the last row from the previous page was also the first row from the next page. I guess the obvious solution is to ensure the date field is always more granular than minutes, or add another field to the sort order to consistently break ties. On Mon, May 19, 2014 at 4:19 PM, Chris Hostetter wrote: > > : Using Solr 4.6.1 and in my schema I have a date field storing the time a > : document was added to Solr. > > what *exactly* does your schema look like? are you using "solr.DateField" > or "solr.TrieDateField" ? what field options do you have specified? > > : I have a utility program which: > : - queries for all of the documents in the previous day sorted by create > date > : - pages through the results keeping track of the unique document ids > : - compare the total number of unique doc ids to the numFound to see if it > : they match > > what *exactly* do your queries look like? show us some examples please > (URL & results). Are you using distributed searching across multiple > nodes, or a single node? do you have concurrent updates going on during > your test? > > : It is not consistent between tests, the number of occurrences changes and > : the locations of the occurrences can change as well. The larger the > result > : set, and smaller the page size, the more frequent the occurrences are. > > if you bring up a test instance of Solr using your current configs, can > you reproduce (even occasionally) with some synthetic data you can share > with us? If so please provide your full configs & sample data (ie: create > a Jira & attach all the neccessary files i na ZIP) > > > -Hoss > http://www.lucidworks.com/ >
Re: Error initializing QueryElevationComponent
Hi, I have changed "&" to "&amp;" and now the core is getting initialized. But the document added in elevate.xml is not coming as the top result. Also, why does the below query not return any results even though the document is available in the index? http://localhost:8080/solr/master/select?q=_uniqueid:"sitecore://master/{450555a7-2cf7-40ec-a4ad-a67926d23c4a}?lang=en&ver=1"; Please suggest, as I am stuck with this. Thanks, G. Naresh Kumar
Re: WordDelimiterFilterFactory and StandardTokenizer
Great, thanks for the information! Right now we're using the StandardTokenizer types to filter out CJK characters with a custom filter. I'll test using MappingCharFilters, although I'm a little concerned with possible adverse scenarios. Diego Fernandez - 爱国 Software Engineer US GSS Supportability - Diagnostics - Original Message - > Hi Aiguofer, > > You mean ClassicTokenizer? Because StandardTokenizer does not set token types > (e-mail, url, etc). > > > I wouldn't go with the JFlex edit, mainly because maintenance costs. It will > be a burden to maintain a custom tokenizer. > > MappingCharFilters could be used to manipulate tokenizer behavior. > > Just an example, if you don't want your tokenizer to break on hyphens, > replace it with something that your tokenizer does not break. For example > under score. > > "-" => "_" > > > > Plus WDF can be customized too. Please see types attribute : > > http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/wdftypes.txt > > > Ahmet > > > On Friday, May 16, 2014 6:24 PM, aiguofer wrote: > Jack Krupansky-2 wrote > > > Typically the white space tokenizer is the best choice when the word > > delimiter filter will be used. > > > > -- Jack Krupansky > > If we wanted to keep the StandardTokenizer (because we make use of the token > types) but wanted to use the WDFF to get combinations of words that are > split with certain characters (mainly - and /, but possibly others as well), > what is the suggested way of accomplishing this? Would we just have to > extend the JFlex file for the tokenizer and re-compile it? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/WordDelimiterFilterFactory-and-StandardTokenizer-tp4131628p4136146.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Autoscaling Solr instances in AWS
We are running Solr 4.6.1 in AWS: - 2 Solr instances (1 shard, 1 leader, 1 replica) - 1 CloudSolrServer SolrJ client updating the index. - 3 Zookeepers The Solr instances are behind a load balancer and also in an auto scaling group. The ScaleUpPolicy will add up to 9 additional instances (replicas), 1 per minute. Later, the 9 replicas are terminated with the ScaleDownPolicy. Problem: during the ScaleUpPolicy, when the Solr Leader is under heavy query load, the SolrJ indexing client issues a commit which hangs and never returns. Note that the index schema contains 3 ExternalFileFields which slow down the commit process. Here's the stack trace: Thread 1959: (state = IN_NATIVE) - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise) - java.net.SocketInputStream.read(byte[], int, int, int) @bci=79, line=150 (Compiled frame) - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=121 (Compiled frame) - org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer() @bci=71, line=166 (Compiled frame) - org.apache.http.impl.io.SocketInputBuffer.fillBuffer() @bci=1, line=90 (Compiled frame) - org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(org.apache.http.util.CharArrayBuffer) @bci=137, line=281 (Compiled frame) - org.apache.http.impl.conn.LoggingSessionInputBuffer.readLine(org.apache.http.util.CharArrayBuffer) @bci=5, line=115 (Compiled frame) - org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(org.apache.http.io.SessionInputBuffer) @bci=16, line=92 (Compiled frame) - org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(org.apache.http.io.SessionInputBuffer) @bci=2, line=62 (Compiled frame) - org.apache.http.impl.io.AbstractMessageParser.parse() @bci=38, line=254 (Compiled frame) - org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader() @bci=8, line=289 (Compiled frame) - org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader() @bci=1, line=252 (Compiled frame) - org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader() @bci=6, line=191 (Compiled frame) - org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(org.apache.http.HttpRequest, org.apache.http.HttpClientConnection, org.apache.http.protocol.HttpContext) @bci=62, line=300 (Compiled frame) - org.apache.http.protocol.HttpRequestExecutor.execute(org.apache.http.HttpRequest, org.apache.http.HttpClientConnection, org.apache.http.protocol.HttpContext) @bci=60, line=127 (Compiled frame) - org.apache.http.impl.client.DefaultRequestDirector.tryExecute(org.apache.http.impl.client.RoutedRequest, org.apache.http.protocol.HttpContext) @bci=198, line=717 (Compiled frame) - org.apache.http.impl.client.DefaultRequestDirector.execute(org.apache.http.HttpHost, org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext) @bci=597, line=522 (Compiled frame) - org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.HttpHost, org.apache.http.HttpRequest, org.apache.http.protocol.HttpContext) @bci=344, line=906 (Compiled frame) - org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest, org.apache.http.protocol.HttpContext) @bci=21, line=805 (Compiled frame) - org.apache.http.impl.client.AbstractHttpClient.execute(org.apache.http.client.methods.HttpUriRequest) @bci=6, line=784 (Compiled frame) - org.apache.solr.client.solrj.impl.HttpSolrServer.request(org.apache.solr.client.solrj.SolrRequest,
org.apache.solr.client.solrj.ResponseParser) @bci=1175, line=395 (Compiled frame) - org.apache.solr.client.solrj.impl.HttpSolrServer.request(org.apache.solr.client.solrj.SolrRequest) @bci=17, line=199 (Compiled frame) - org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(org.apache.solr.client.solrj.impl.LBHttpSolrServer$Req) @bci=132, line=285 (Compiled frame) - org.apache.solr.client.solrj.impl.CloudSolrServer.request(org.apache.solr.client.solrj.SolrRequest) @bci=838, line=640 (Compiled frame) - org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(org.apache.solr.client.solrj.SolrServer) @bci=17, line=117 (Compiled frame) - org.apache.solr.client.solrj.SolrServer.commit(boolean, boolean) @bci=16, line=168 (Interpreted frame) - org.apache.solr.client.solrj.SolrServer.commit() @bci=3, line=146 (Interpreted frame) The Solr leader log shows many connection timeout exceptions from the other Solr replicas during this period. Some of these timeouts may have been caused by replicas disappearing from the ScaleDownPolicy. From the search client application's point of view, everything looked fine, but indexing stopped until I restarted the SolrJ client. Does this look like a case where a timeout value needs to be increased somewhere? If so, which one? Thanks, Peter
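Whether a timeout is actually the fix here is an open question, but for reference, one place SolrJ 4.x lets you cap a blocking socket read so a commit can't hang forever. A sketch, with illustrative timeout values and ZK hosts:

import java.net.MalformedURLException;
import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class TimeoutClientFactory {
  public static CloudSolrServer create() throws MalformedURLException {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_SO_TIMEOUT, 120000);        // socket read timeout in ms
    params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 15000); // connect timeout in ms
    HttpClient client = HttpClientUtil.createClient(params);
    // every request CloudSolrServer issues goes through the LB server's configured client
    return new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181", new LBHttpSolrServer(client));
  }
}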
Vague Behavior while setting up Solr Cloud
Hi Everyone, I am trying to set up Solr Cloud referring to the blog http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html If I complete the setup in one go, then it seems to be going fine. When the setup is complete and I try to restart Solr by restarting the Tomcat instance, it does not deploy, and moreover the shards and replicas are not up. Urgent call, let me know if you know anything! Thanks in Advance!
Re: trigger delete on nested documents
On 20.05.2014 14:11, Jack Krupansky wrote: To be clear, you cannot update a single document of a nested document in place - you must reindex the whole block, parent and all children. This is because this feature relies on the underlying Lucene block join feature that requires that the documents be contiguous, and updating a single child document would make it discontiguous with the rest of the block of documents. Just update the block by resending the entire block of documents. For a previous discussion of this limitation: http://lucene.472066.n3.nabble.com/block-join-and-atomic-updates-td4117178.html This is totally clear to me, and I want a nested document not to be accessible without its root context. There seems to be no way to delete the whole block by the id of the root document, and no way to update the root document that removes the stale data from the index. Normal SOLR behavior is to automatically delete old documents with the same ID. I expect this behavior for the other documents in this block too. Anyway, to make things clear I filed a JIRA issue and tried to explain it more carefully there: https://issues.apache.org/jira/browse/SOLR-6096 regards Thomas
Re: trigger delete on nested documents
To be clear, you cannot update a single document of a nested document in place - you must reindex the whole block, parent and all children. This is because this feature relies on the underlying Lucene block join feature that requires that the documents be contiguous, and updating a single child document would make it discontiguous with the rest of the block of documents. Just update the block by resending the entire block of documents. For a previous discussion of this limitation: http://lucene.472066.n3.nabble.com/block-join-and-atomic-updates-td4117178.html -- Jack Krupansky -Original Message- From: Thomas Scheffler Sent: Tuesday, May 20, 2014 4:27 AM To: solr-user@lucene.apache.org Subject: Re: trigger delete on nested documents On 19.05.2014 19:25, Mikhail Khludnev wrote: Thomas, The vanilla way to override a block is to send it with the same unique-key (I guess it's "id" for your case, btw don't you have a unique-key defined in the schema?), but it must have at least one child. It seems like an analysis issue to me https://issues.apache.org/jira/browse/SOLR-5211 While a block is indexed, the special field _root_, equal to the unique-key, is added across the whole block (caveat: it's not stored by default). At least you can issue _root_:PK_VAL to wipe the whole block. Thank you for your insight. It sure helps a lot in understanding. The '_root_' field was new to me on this rather poorly documented feature of SOLR. It already helps if I perform single updates and deletes on the index. BUT: if I delete by a query, this results in a mess: 1.) request all IDs returned by that query 2.) fire a giant delete query with "id:(id1 OR .. OR idn) _root_:(id1 OR .. OR idn)" Before every update of single documents I have to fire a delete request. This turns into a mess when updating in batch mode: 1.) remove a chunk of 100 documents and their nested documents (see above) 2.) index a chunk of 100 documents All the information for that is available on the SOLR side. Can I configure some hook that is executed on the SOLR server so that I do not have to change all applications? This would at least save these extra network transfers. After the big work of migrating from plain Lucene to SOLR, I really require proper nested document support. Elastic Search seems to support it (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html) but I am afraid of another migration. Elastic Search even hides the nested documents in queries, which seems nice, too. Does anyone have information on how nested document support will evolve in future releases of SOLR? kind regards, Thomas
[ANNOUNCE] Apache Solr 4.8.1 released
May 2014, Apache Solr™ 4.8.1 available The Lucene PMC is pleased to announce the release of Apache Solr 4.8.1 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.8.1 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html Solr 4.8.1 includes 10 bug fixes, as well as Lucene 4.8.1 and its bug fixes. See the CHANGES.txt file included with the release for a full list of changes and further details. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.
Re: Howto Search word which contains the character "
It looks like it was escaped in the query, but the word delimiter filter will remove it and treat it as if it were white space. The "types" attribute for WDF can point to a file containing the types for various characters, so you could map a quote to ALPHA. The doc is sketchy, but there are some examples in my e-book that shows how to map @ and _ to ALPHA. -- Jack Krupansky -Original Message- From: Ahmet Arslan Sent: Tuesday, May 20, 2014 4:55 AM To: solr-user@lucene.apache.org Subject: Re: Howto Search word which contains the character " Hi, It is special query parser character, so it needs to be escaped. http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters Ahmet On Tuesday, May 20, 2014 10:57 AM, heyyo wrote: In hebrew words could contain the character *"* ex: דו"ח I would like to know how to configure my schema.xml to be able to index and search correctly those types of words. If I search this character *"* inside solr query tool I got this debug: /"debug": { "rawquerystring": "\"", "querystring": "\"", "parsedquery": "(+())/no_coord", "parsedquery_toString": "+()", / So if I understand correctly solr remove the " when the query is parsed. I'm using this schema: -- View this message in context: http://lucene.472066.n3.nabble.com/Howto-Search-word-which-contains-the-character-tp4137083.html Sent from the Solr - User mailing list archive at Nabble.com.
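A sketch of that types mapping, using the unicode-escape form from the wdftypes.txt example Ahmet linked earlier in the WordDelimiterFilterFactory thread (the file name is illustrative):

# wdftypes.txt: treat the double quote (U+0022) as a letter so WDF keeps it
\u0022 => ALPHA

wired in with: <filter class="solr.WordDelimiterFilterFactory" types="wdftypes.txt" .../>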
Solr Cloud Shards and Replicas not reviving after restarting
Hi Everyone, I have installed Solr Cloud 4.6.2 with external Zookeeper and Tomcat, having 3 shards with 2 replicas each. I tried indexing some documents, which went easily. After that I restarted my Tomcat, and now the shards are not coming up; it's throwing a bunch of exceptions. The first exception was "no servers hosting shard:" All the replicas and the leader are down and not responding; it's even giving RecoveryStrategy Error while trying to recover. core=recollection_shard1_replica1:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.2.183:9090/solr It would be great if you can help me out solving this issue. Expert advice needed. Thanks in Advance!
Re: How to optimize single shard only?
As I wrote before, the index is rewritten, so it grows during optimization and is reduced later. I guess there was an OOM in your case. On 20 May 2014 12:11, YouPeng Yang wrote: > Hi > My DIH work indeed hangs, I have only four shards,each has a master and a > replica.Maybe jvm memory size is very low.it was 3G while the size of > every > my core is almost 16GB. > > I also have found that the size of the master increased during the > optimization(you can check on the overview page of the core.).the > phenomenon is very werid. Is it because that the collection overall > optimization will comput and copy all the docs of the whole collection. > > > Version Gen Size Master (Searching) > 1400501330248 > 98396 >29.83 GB > Master (Replicable) > 1400501330888 > 98397 > - > > > After I have check source code,unfortunatly,it seems the optimize action > distrib overall the collection.you can reference the > SolrCmdDistributor.distribCommit. > > > 2014-05-20 17:27 GMT+08:00 Marcin Rzewucki : > > > Well, it should not hang if all is configured fine :) How many shards and > > memory you have ? Note that optimize rewrites index so you might need > > additional disk space for this process. Optimizing works fine however I'd > > like to be able to do it on a single shard as well. > > > > > > On 20 May 2014 11:19, YouPeng Yang wrote: > > > > > Hi Marcin > > > > > > Thanks to your mail,now I know why my cloud hangs when I just click > the > > > optimize button on the overview page of the shard. > > > > > > > > > 2014-05-20 15:25 GMT+08:00 Ahmet Arslan : > > > > > > > Hi Marcin, > > > > > > > > just a guess, pass distrib=false ? > > > > > > > > > > > > > > > > Ahmet > > > > > > > > > > > > On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki < > mrzewu...@gmail.com> > > > > wrote: > > > > Hi, > > > > > > > > Do you know how to optimize index on a single shard only ? I was trying > > to > > > > use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not > > work > > > > - it optimizes all shards instead of just one. > > > > > > > > Kind regards. > > > > > > > > > > > >
Re: How to optimize single shard only?
Hi My DIH work indeed hangs. I have only four shards, each has a master and a replica. Maybe the JVM memory size is very low: it was 3G, while the size of each of my cores is almost 16GB. I have also found that the size of the master increased during the optimization (you can check on the overview page of the core); the phenomenon is very weird. Is it because the collection-wide optimization will compute and copy all the docs of the whole collection? Version 1400501330248, Gen 98396, Size 29.83 GB for Master (Searching); Version 1400501330888, Gen 98397 for Master (Replicable). After I checked the source code, unfortunately, it seems the optimize action is distributed over the whole collection; you can reference SolrCmdDistributor.distribCommit. 2014-05-20 17:27 GMT+08:00 Marcin Rzewucki : > Well, it should not hang if all is configured fine :) How many shards and > memory you have ? Note that optimize rewrites index so you might need > additional disk space for this process. Optimizing works fine however I'd > like to be able to do it on a single shard as well. > > > On 20 May 2014 11:19, YouPeng Yang wrote: > > > Hi Marcin > > > > Thanks to your mail,now I know why my cloud hangs when I just click the > > optimize button on the overview page of the shard. > > > > > > 2014-05-20 15:25 GMT+08:00 Ahmet Arslan : > > > > > Hi Marcin, > > > > > > just a guess, pass distrib=false ? > > > > > > > > > > > > Ahmet > > > > > > > > > On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki < > mrzewu...@gmail.com> > > > wrote: > > > Hi, > > > > > > Do you know how to optimize index on a single shard only ? I was trying > > to > > > use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not > > work > > > - it optimizes all shards instead of just one. > > > > > > Kind regards. > > > > > >
Re: How to optimize single shard only?
Well, it should not hang if all is configured fine :) How many shards and how much memory do you have? Note that optimize rewrites the index, so you might need additional disk space for this process. Optimizing works fine, however I'd like to be able to do it on a single shard as well. On 20 May 2014 11:19, YouPeng Yang wrote: > Hi Marcin > > Thanks to your mail,now I know why my cloud hangs when I just click the > optimize button on the overview page of the shard. > > > 2014-05-20 15:25 GMT+08:00 Ahmet Arslan : > > > Hi Marcin, > > > > just a guess, pass distrib=false ? > > > > > > > > Ahmet > > > > > > On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki > > wrote: > > Hi, > > > > Do you know how to optimize index on a single shard only ? I was trying > to > > use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not > work > > - it optimizes all shards instead of just one. > > > > Kind regards. > > >
Re: How to optimize single shard only?
Hi Maybe you can try _router_=myshard? I will check the source code and let you know later. 2014-05-20 17:19 GMT+08:00 YouPeng Yang : > Hi Marcin > > Thanks to your mail,now I know why my cloud hangs when I just click the > optimize button on the overview page of the shard. > > > 2014-05-20 15:25 GMT+08:00 Ahmet Arslan : > > Hi Marcin, >> >> just a guess, pass distrib=false ? >> >> >> >> Ahmet >> >> >> On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki >> wrote: >> Hi, >> >> Do you know how to optimize index on a single shard only ? I was trying to >> use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not work >> - it optimizes all shards instead of just one. >> >> Kind regards. >> >> >
Re: How to optimize single shard only?
Hi Marcin Thanks to your mail, now I know why my cloud hangs when I just click the optimize button on the overview page of the shard. 2014-05-20 15:25 GMT+08:00 Ahmet Arslan : > Hi Marcin, > > just a guess, pass distrib=false ? > > > > Ahmet > > > On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki > wrote: > Hi, > > Do you know how to optimize index on a single shard only ? I was trying to > use "optimize=true&waitFlush=true&shard.keys=myshard" but it does not work > - it optimizes all shards instead of just one. > > Kind regards. > >
Re: Howto Search word which contains the character "
Hi, It is a special query parser character, so it needs to be escaped. http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters Ahmet On Tuesday, May 20, 2014 10:57 AM, heyyo wrote: In Hebrew, words can contain the character ", e.g. דו"ח. I would like to know how to configure my schema.xml to be able to index and search correctly those types of words. If I search this character " in the Solr query tool I get this debug: "debug": { "rawquerystring": "\"", "querystring": "\"", "parsedquery": "(+())/no_coord", "parsedquery_toString": "+()" } So if I understand correctly, Solr removes the " when the query is parsed. I'm using this schema:
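For example, with a hypothetical field name, the escaped form would look like: q=content:דו\"ח -- though per Jack's reply above, the analysis chain must also be configured to keep the character rather than strip it.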
the whole web instance hangs when optimizing one core
Hi. I am using Solr 4.6. One of my cores contains 50 million docs, and when I just click the optimize button on the overview page of the core, the whole web instance hangs; one symptom is that the DIH on another core hangs too. Is this a known problem, or is something wrong with my environment? Regards
Re: trigger delete on nested documents
On 19.05.2014 19:25, Mikhail Khludnev wrote: Thomas, The vanilla way to override a block is to send it with the same unique-key (I guess it's "id" for your case, btw don't you have a unique-key defined in the schema?), but it must have at least one child. It seems like an analysis issue to me https://issues.apache.org/jira/browse/SOLR-5211 While a block is indexed, the special field _root_, equal to the unique-key, is added across the whole block (caveat: it's not stored by default). At least you can issue _root_:PK_VAL to wipe the whole block. Thank you for your insight. It sure helps a lot in understanding. The '_root_' field was new to me on this rather poorly documented feature of SOLR. It already helps if I perform single updates and deletes on the index. BUT: if I delete by a query, this results in a mess: 1.) request all IDs returned by that query 2.) fire a giant delete query with "id:(id1 OR .. OR idn) _root_:(id1 OR .. OR idn)" Before every update of single documents I have to fire a delete request. This turns into a mess when updating in batch mode: 1.) remove a chunk of 100 documents and their nested documents (see above) 2.) index a chunk of 100 documents All the information for that is available on the SOLR side. Can I configure some hook that is executed on the SOLR server so that I do not have to change all applications? This would at least save these extra network transfers. After the big work of migrating from plain Lucene to SOLR, I really require proper nested document support. Elastic Search seems to support it (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html) but I am afraid of another migration. Elastic Search even hides the nested documents in queries, which seems nice, too. Does anyone have information on how nested document support will evolve in future releases of SOLR? kind regards, Thomas On 19.05.2014 10:37, "Thomas Scheffler" < thomas.scheff...@uni-jena.de> wrote: Hi, I plan to use nested documents to group some of my fields art0001 My first article art0001-foo Smith, John author art0001-bar Power, Max reviewer This way I can ask for any documents that are reviewed by Max Power. However, to simplify updates and deletes, I want to ensure that nested documents are deleted automatically on update and delete of the parent document. Has anyone had to deal with this problem and found a solution?
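To make that concrete, a sketch in XML update syntax; the field names ("title", "name", "role") are guesses, as the markup of the original example did not survive the archive:

<!-- wipe the whole block: the parent by its id, the children via _root_ -->
<delete><query>id:art0001 OR _root_:art0001</query></delete>

<!-- re-send the entire block: parent plus all children, children as nested <doc> elements -->
<add>
  <doc>
    <field name="id">art0001</field>
    <field name="title">My first article</field>
    <doc>
      <field name="id">art0001-foo</field>
      <field name="name">Smith, John</field>
      <field name="role">author</field>
    </doc>
    <doc>
      <field name="id">art0001-bar</field>
      <field name="name">Power, Max</field>
      <field name="role">reviewer</field>
    </doc>
  </doc>
</add>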
Re: solr-user Digest of: get.100322
Thank you for your reply! I also found docValues after sending my email, and your suggestion seems like the best solution for me. Now I'm configuring schema.xml to use docValues and have a question about docValuesFormat.

According to this thread (http://lucene.472066.n3.nabble.com/Trade-offs-in-choosing-DocValuesFormat-td4114758.html), Solr 4.6 only holds some hash structures in memory with the default docValuesFormat configuration. Though that uses only a small amount of memory, I'm worried about memory usage because I have to store so many documents (32GB RAM; 5B docs in total, summed across all cores).

Which docValuesFormat is more appropriate in my case (Default or Disk)? Can I change it later without re-indexing?

On Sat, May 17, 2014 at 9:45 PM, wrote:

> solr-user Digest of: get.100322
>
> Topics (messages 100322 through 100322):
>
> Re: Sorting problem in Solr due to Lucene Field Cache
>   100322 by: Joel Bernstein
> -- Forwarded message --
> From: Joel Bernstein
> To: solr-user@lucene.apache.org
> Date: Fri, 16 May 2014 17:49:51 -0400
> Subject: Re: Sorting problem in Solr due to Lucene Field Cache
>
> Take a look at Solr's use of DocValues:
> https://cwiki.apache.org/confluence/display/solr/DocValues
>
> There are docValues options that use less memory than the FieldCache.
>
> Joel Bernstein
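For the docValuesFormat question, a sketch of the Solr 4.x per-field-type configuration, with a hypothetical string type and field; selecting a non-default format requires solr.SchemaCodecFactory in solrconfig.xml. Note that a codec change only applies to newly written segments, so existing segments keep their old format until they are merged away or the data is re-indexed:

  <!-- solrconfig.xml: allow per-fieldType codec settings -->
  <codecFactory class="solr.SchemaCodecFactory"/>

  <!-- schema.xml: hypothetical type/field keeping docValues data on disk -->
  <fieldType name="string_dv_disk" class="solr.StrField"
             sortMissingLast="true" docValuesFormat="Disk"/>
  <field name="user_id" type="string_dv_disk"
         indexed="true" stored="false" docValues="true"/>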
How to search a word which contains the character "
In Hebrew, words can contain the character " (ex: דו"ח). I would like to know how to configure my schema.xml to be able to index and search those kinds of words correctly. If I search this character " in the Solr query tool I get this debug output:

  "debug": {
    "rawquerystring": "\"",
    "querystring": "\"",
    "parsedquery": "(+())/no_coord",
    "parsedquery_toString": "+()",

So if I understand correctly, Solr removes the " when the query is parsed. I'm using this schema:

--
View this message in context: http://lucene.472066.n3.nabble.com/Howto-Search-word-which-contains-the-character-tp4137083.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to optimize single shard only?
Hi Marcin,

Just a guess: pass distrib=false?

Ahmet

On Tuesday, May 20, 2014 10:23 AM, Marcin Rzewucki wrote:

Hi,

Do you know how to optimize the index on a single shard only? I was trying to use "optimize=true&waitFlush=true&shard.keys=myshard", but it does not work: it optimizes all shards instead of just one.

Kind regards.
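A SolrJ sketch of Ahmet's suggestion: send the optimize directly to one replica's core URL and set distrib=false so it is not forwarded to the rest of the collection (host and core name are hypothetical):

  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
  import org.apache.solr.client.solrj.request.UpdateRequest;

  public class OptimizeSingleShard {
      public static void main(String[] args) throws Exception {
          // Point at one core of the target shard, not the collection.
          HttpSolrServer server =
              new HttpSolrServer("http://localhost:8983/solr/mycoll_shard1_replica1");
          UpdateRequest req = new UpdateRequest();
          req.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, true, true);
          // Without this, the update handler distributes the optimize
          // to every shard of the collection.
          req.setParam("distrib", "false");
          req.process(server);
          server.shutdown();
      }
  }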
How to optimize single shard only?
Hi,

Do you know how to optimize the index on a single shard only? I was trying to use "optimize=true&waitFlush=true&shard.keys=myshard", but it does not work: it optimizes all shards instead of just one.

Kind regards.