Re: How to return more fields on Solr 4.5.1 Suggester?
Hi Omer,

That's not how it's meant to work; the suggester gives you potentially matching terms by looking at the set of terms for the given field across the index. Possibly you want to look at the MoreLikeThis component or handler? It will return matching documents, from which you have access to the fields you want.

Regards, Lajos

On 17/03/2014 14:05, omer sonmez wrote:

I am using Solr 4.5.1 to suggest movies for my system. What I need Solr to return is not only the movie_title but also the movie_id that belongs to the movie. As an example, this is the kind of response I need:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
    </lst>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="har">
          <int name="numFound">6</int>
          <int name="startOffset">0</int>
          <int name="endOffset">3</int>
          <arr name="suggestion">
            <doc>
              <str name="name_autocomplete">hard eight (1996)</str>
              <str name="movie_id">144</str>
            </doc>
            <doc>
              <str name="name_autocomplete">hard rain (1998)</str>
              <str name="movie_id">14</str>
            </doc>
            <doc>
              <str name="name_autocomplete">harlem (1993)</str>
              <str name="movie_id">1044</str>
            </doc>
          </arr>
        </lst>
      </lst>
    </lst>
  </response>

My search component config is like:

  <searchComponent name="suggest" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <str name="field">name_autocomplete</str>
      <str name="spellcheck.onlyMorePopular">true</str>
    </lst>
  </searchComponent>

My request handler config is like:

  <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.count">10</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

And my schema config is like below:

  <field name="movie_id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
  <field name="movie_title" type="text" indexed="true" stored="true" multiValued="false"/>
  <!-- <field name="name_auto" type="text_auto" indexed="true" stored="true" multiValued="false"/> -->
  <field name="name_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
  <copyField source="movie_title" dest="name_autocomplete"/>

How can I manage to get other fields using the suggester in Solr 4.5.1? Thanks,
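[Since the spellcheck-based suggester only returns terms from the field, a common workaround is to skip the suggest component entirely and run a normal search against the autocomplete field, which returns whole documents and therefore any stored field such as movie_id. A minimal sketch; the handler name and defaults are illustrative, the field names are taken from the post, and it assumes text_auto is edge-ngram analyzed so prefixes match:

  <requestHandler name="/acsearch" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">name_autocomplete</str>
      <str name="fl">movie_id,movie_title</str>
      <str name="rows">10</str>
    </lst>
  </requestHandler>

A request like /acsearch?q=har would then return full documents whose titles start with "har".]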
Re: /suggest
Hi Steve,

I've posted previously about a nice StackOverflowError I got when using this component ... can you post what you see? I've used it successfully with a custom dictionary like this:

  <searchComponent name="newsuggester" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">newsuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="sourceLocation">suggestions.dict</str>
      <str name="storeDir">newsuggester</str>
      <str name="suggestAnalyzerFieldType">text</str>
      <str name="buildOnCommit">true</str>
      <float name="threshold">0.0</float>
    </lst>
  </searchComponent>

That works fine, and is a nice improvement over the SpellCheckComponent because it supports fuzzy searching. But this way, I get the overflow when using a text field:

  <searchComponent name="suggest2" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">default</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="weightField">price</str>
      <str name="suggestAnalyzerFieldType">string</str>
    </lst>
  </searchComponent>

Sure would like to see it work! Regards, Lajos Moczar

On 17/03/2014 22:11, Steve Huckle wrote:

Hi, the Suggest search component that comes preconfigured in the Solr 4.7.0 solrconfig.xml seems to dump a stack trace when I call it:

  http://localhost:8983/solr/suggest?spellcheck=on&q=ac&wt=json&indent=true
  msg: No suggester named default was configured

Can someone tell me what's going on there? I can stop that happening if I replace the preconfigured Suggest search component and request handler with the search component and request handler configuration detailed here: https://cwiki.apache.org/confluence/display/solr/Suggester ... but after indexing the data in exampledocs, it doesn't seem to return any suggestions either. Can anyone help suggest how I might get suggest suggesting suggestions? Thanks,
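[For reference, the SuggestComponent only answers requests whose suggest.dictionary matches a configured suggester name, and the dictionary has to be built (suggest.build=true, or buildOnCommit) before it returns anything; both are common causes of the "No suggester named default" and empty-result symptoms above. A sketch of a handler wired to the newsuggester component from this thread; the handler name and counts are illustrative:

  <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">newsuggester</str>
    </lst>
    <arr name="components">
      <str>newsuggester</str>
    </arr>
  </requestHandler>

Then build once with /suggest?suggest.build=true before querying.]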
Re: Problems using solr.SpatialRecursivePrefixTreeFieldType
Hi Hamish,

Are you running Jetty? In Tomcat, I've put jts-1.13.jar in the WEB-INF/lib directory of the unpacked distribution and restarted. It worked fine. Maybe check file permissions as well ...

Regards, Lajos

On 16/03/2014 10:18, Hamish Campbell wrote:

Hey all, I'm trying to use SpatialRecursivePrefixTreeFieldType to store extent polygons, but I can't seem to get it configured correctly. I'm hitting this on start up:

  4670 [main] ERROR org.apache.solr.core.SolrCore - Error loading core: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: com/vividsolutions/jts/geom/CoordinateSequenceFactory

Per the manual, and David Smiley's previous responses, I've added the JTS .jar files to WEB-INF/lib in the solr .war file. I'm still getting the error above; any other clues?
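[JTS is only loaded when the field type asks for it, so it is also worth checking that the jar ends up in WEB-INF/lib of the *deployed*, unpacked webapp rather than only inside the .war archive. For context, a sketch of a Solr 4.x field type that triggers the JTS dependency; the name and tuning values are illustrative, and the spatialContextFactory class is the one documented for Solr 4.x:

  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
             distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>]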
Re: Best practice to support multi-tenant with Solr
Hi Shushuai,

Just a few thoughts. I would guess that most people would argue for implementing multi-tenancy within your core (via some unique filter ID) or collection (via document routing), because of the headache of managing individual cores at the scale you are talking about. There are disadvantages the other way too: having a core/collection support multiple tenants does affect scoring, since TF-IDF is calculated across the index, and can open up security implications that you have to address (i.e. making sure a malicious query cannot get another tenant's documents).

The most important thing you have to lock down is whether there is a need to customize the schema/solrconfig for each tenant. If there is, then having individual cores per tenant is going to be a stronger argument. If I were to guess, and based on my own multi-tenant experience, you'll have some high-end tenants who need their own cores/collections, and a larger number that can all share a configuration. It's like any kind of hosted solution: the cheapest version is one-size-fits-all and involves the minimum of management overhead, while the higher end is more expensive and requires more management.

My own preference is for a blended environment. While the management of individual cores/collections is not to be taken lightly, I've done it in a variety of hosting situations and it all comes down to smart management and the intelligent use of administrative scripts. I've developed my own set of tools over the years and they work quite well. Finally, I would (in general) argue for cloud-based implementations to give you data redundancy, but that decision would require more information.

HTH, Lajos Moczar
theconsultantcto.com
Enterprise Lucene/Solr

On 14/03/2014 23:10, shushuai zhu wrote:

Hi, I am looking into Solr 4.7 for best practices in multi-tenancy support. Our use cases require support for thousands of tenants (say 10,000), and the incoming data rate could be more than 10k documents per second. I did some research and found people talking about scaling tenants at all four levels: Solr Cloud, Collection, Shard, Core. I am listing them plus some quoted comments from the links.

1) Solr Cloud and Collection
http://find.searchhub.org/document/c7caa34d807a8a1b#c7caa34d807a8a1b
--- Are you trying to do multi-tenant? If so, you should be talking multi-cluster, where you externally manage your tenants, assigning them to clusters, but keeping tenants per cluster down in the dozens/hundreds, and archiving inactive tenants and spinning up (and down) clusters as inactive tenants become active or fall into inactivity. But keeping 1,000 or more tenants active in a single cluster as separate collections is... a no-go. ---

2) Shard
http://searchhub.org/2013/06/13/solr-cloud-document-routing/
--- Document routing can be used to achieve a more efficient multi-tenant environment. This can be done by making the tenant id the shard key, which would group all documents from the same tenant on the same shard. ---

3) Core
http://find.searchhub.org/document/4312991db2dd90e9#4312991db2dd90e9
--- Every multitenant situation is going to be different, but at the extreme a single core per tenant is the cleanest and provides the best separation, optimal performance, and supports full tf-idf relevancy of document fields for each tenant. ---

http://find.searchhub.org/document/fc5b734fba135e83#fc5b734fba135e83
--- Well, we try to use Solr to run a multi-tenant index/search service. We assign each client a different core with their own config and schema. It would be good for us if we could just let the customer create cores with their own schema and config. ---

I also saw slides talking about scaling time along Collection: timed collections (slides 50-58): http://www.slideshare.net/sematext/solr-for-indexing-and-searching-logs

According to these, I am thinking about the following approach: in a single Solr Cloud, multi-tenant support is at the Core level (one or more cores per tenant), and for better performance, we will create a collection every day. When a tenant grows too big, we will migrate it from this Solr Cloud to a new Solr Cloud. Any potential issue with this approach? Is there a better approach based on your experience?

A few questions related to the proposed approach:
1) When a core is replicated to multiple nodes via multiple shards, the query submitted against a particular core (tenant) should be executed distributed, right?
2) What is the best way to move a core from one Solr Cloud to another?
3) If we create one collection per day and want to keep data for three years, for example, is it OK to have so many collections? If yes, is it cheap to maintain the collection alias for easy querying?

Thanks. Shushuai
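[As an illustration of the "unique filter ID" approach mentioned above (the field name, tenant key, and handler name are hypothetical), a shared collection can be locked down per tenant with an invariant filter query; invariants cannot be overridden by request parameters, so a malicious query cannot drop the filter:

  <requestHandler name="/search-tenant42" class="solr.SearchHandler">
    <lst name="invariants">
      <str name="fq">tenant_id:tenant42</str>
    </lst>
  </requestHandler>

In practice the middle tier would pick the handler, or inject the fq itself, based on the authenticated tenant.]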
Re: Best practice to support multi-tenant with Solr
Hi Shushuai,

--- Finally, I would (in general) argue for cloud-based implementations to give you data redundancy ... ---
Do you mean using multi-sharding to have multiple replicas of cores (corresponding to tenants) across nodes? Shushuai

What I mean, first and foremost, is that using SolrCloud with replication ensures that your data isn't lost if you lose a node. So in a hosted solution, that's a good thing. If you are using SolrCloud, then it's up to you to choose whether to have one collection per tenant, or one collection that supports multiple tenants via document routing. Obviously the former has implications for the number of shards you'll have. For example, if you have a 3-node cluster with a replication factor of 2 (one shard per node), that's 6 shard replicas per collection. If you have 1,000 tenant collections, that's 6,000 replicas. Hence my argument for multiple low-end tenants per collection, and then only give your higher-end tenants their own collections. Just to make things simpler for you ;)

Regards, Lajos

From: Lajos la...@protulae.com
To: solr-user@lucene.apache.org
Sent: Saturday, March 15, 2014 5:37 AM
Subject: Re: Best practice to support multi-tenant with Solr
Re: Best practice to support multi-tenant with Solr
Hi Shushuai,

Yes, as Robi noted, you have to be careful with terminology: "core" generally refers to the traditional Solr configuration of a single index + configuration on a single node (optionally replicated to others). A collection is a distributed index that is associated with a configuration (but multiple collections can be associated with the same configuration). A collection is still a single index, however, just like a core; it's just spread out across however many nodes you have and replicated according to your chosen replication factor. You can do multi-tenancy with cores and collections, but via different strategies. More inline ...

On 15/03/2014 19:17, shushuai zhu wrote:

Hi Lajos, thanks again. Your suggestion is to support multi-tenancy via collections in a Solr Cloud: putting small tenants in one collection and big tenants in their own collections. My original question was to find out which approach is better: supporting multi-tenancy at the collection level or the core level. Based on the links below and a few comments there, it seems people prefer the core level. A collection is logical and a core is physical. I am trying to figure out the trade-offs between the approaches with regard to scalability, security, performance, and flexibility. My understanding might be wrong; below is a rough comparison:

1) Scalability
Is a core more scalable than a collection by number (we can have many more cores than collections in one Solr Cloud)? Or is a collection more scalable than a core by size (a collection could be much bigger than a core)? Not sure which one is better: having ~1000 cores or ~1000 collections in a Solr Cloud.

SolrCloud is more scalable in terms of index size. Plus you get redundancy, which can't be underestimated in a hosted solution.

2) Security
Is a core more isolated than a collection? A core is physical and has its own index, but a collection is logical, so multiple collections may contain the same cores?

No: cores are not more or less isolated than collections. Both support multi-tenancy, albeit in different ways. If you do it in a core with some prefix or special field, you just have to be aware of the security implications. As Robi said, this is easily enforced by the middle tier; I use Spring for this, in my case.

3) Performance
Does a core give better performance control since it has its own index? A collection index is bigger, so is performance not as good as with a smaller core index?

Not really. You might want to test this, however, to verify with your specific hardware configuration.

4) Flexibility
Is a core more flexible since it has its own schema/config, while one collection may have multiple cores and hence multiple schemas/configs? Or does it not matter, since we can set the same schema/config for the whole collection?

One could argue that the easiest configuration is one big collection (or maybe divided up intelligently amongst several big collections). More complex is 1000s of cores or collections. The issue is management: 1000s of cores/collections require a level of automation. On the other hand, having a single core/collection means that if you make one change to the schema or solrconfig, it affects everyone. That might not work if you have frequent changes or differing tenant needs. This is a decision you'll have to make yourself, based on your client needs, change management, index sizes, management system, etc.

Regards, Lajos

Basically, I just want to get opinions about which approach might be better for the given use case. Regards. Shushuai
Re: SOLR cloud disaster recovery
Hi Jan,

There are a few ways to do that, but no, nothing is automatic.

1) If your node is alive, you can create new replicas on the new node, let them replicate, verify they are OK, then delete the replicas on the old node and shut it down.

2) If your node is dead, create new replicas on the new node and let them replicate. You'll have to hand-edit clusterstate.json, however, to fix the entries for the shards.

3) If you have a fully up-to-date backup of your dead node, just use the same hostname for your new node and restore the backups there. It should be fine. Just verify that the replicas for that node, as listed in clusterstate.json, are present and accounted for.

HTH, Lajos

On 28/02/2014 16:17, Jan Van Besien wrote:

Hi, I am a bit confused about how SolrCloud disaster recovery is supposed to work in the case of losing a single node completely. Say I have a SolrCloud cluster with 3 nodes. My collection is created with numShards=3&replicationFactor=3&maxShardsPerNode=3, so there is no data loss when I lose a node. However, how do I configure a new node to take the place of the dead node? I bring up a new node (same hostname and IP as the dead node) which is completely empty (empty data dir, empty solr.xml), install Solr, and connect it to ZooKeeper. Is it supposed to work automatically from there? In my tests, the server has no cores, and the SolrCloud graph overview simply shows all the shards/replicas on this node as down. Do I need to recreate the cores first? Note that these cores were initially created indirectly by creating the collection.

Thanks, Jan
StackOverflow ... the errors, not the site
All,

Just playing around with the SuggestComponent, trying to compare results with the old-style spellcheck-based suggester. I tried this config against a string field:

  <requestHandler name="/suggest2" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">default</str>
    </lst>
    <arr name="components">
      <str>suggest2</str>
    </arr>
  </requestHandler>

  <searchComponent name="suggest2" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">default</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="weightField">price</str>
      <str name="suggestAnalyzerFieldType">string</str>
    </lst>
  </searchComponent>

I hit this URL: /suggest2?q=ab&suggest.build=true and that works, but because title is a StrField, it wasn't quite what I wanted. So I tried a TextField, description, and I get this with the same URL:

  ERROR - 2014-02-28 17:29:49.618; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.StackOverflowError
      at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
      at ...
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      at java.lang.Thread.run(Thread.java:662)
  Caused by: java.lang.StackOverflowError
      at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:244)
      at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
      at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
      etc etc

Any ideas?? Thanks, Lajos
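[The recursion in SpecialOperations.getFiniteStrings blows the stack when the FST-based lookup tries to exhaustively enumerate the token automaton of an analyzed text field. One workaround to try (a sketch, not a confirmed fix for this exact trace; the field and type names are taken from the example config above) is AnalyzingInfixLookupFactory, which indexes the suggestions instead of expanding them:

  <searchComponent name="suggestInfix" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">default</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">description</str>
      <str name="weightField">price</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>]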
excludeIds in QueryElevationComponent (4.7)
Guys,

I've been testing out https://issues.apache.org/jira/browse/SOLR-5541 on 4.7 RC4. I previously had an elevate.xml that elevated 3 documents for a specific query. My understanding is that I could, at runtime, exclude one of those. So I tried that like this:

  http://localhost:8080/solr/ecommerce/search?q=canon&excludeIds=208464207

and now NONE of my documents are elevated. What I would have expected is that I'd have 2 elevated documents, but 208464207 would not be amongst them. Sadly, what happens is that now nothing is elevated. Am I misunderstanding something, or should I open a JIRA? Looking at the source code I can't immediately see what would be wrong.

Thanks, Lajos
Re: excludeIds in QueryElevationComponent (4.7)
Hit the send button too fast ...

What seems to be happening is that excludeIds or elevateIds ignores what's in elevate.xml. I would have expected (hoped) that it would layer on top of that, which makes a bit more sense, I think.

Thanks, Lajos
Re: excludeIds in QueryElevationComponent (4.7)
Thanks Hoss, that makes sense. Anyway, I like the new paradigm better ... it allows for more intelligent elevation control.

Cheers, L

On 25/02/2014 23:26, Chris Hostetter wrote:

: What seems to be happening is that excludeIds or elevateIds ignores
: what's in elevate.xml. I would have expected (hoped) that it would layer
: on top of that, which makes a bit more sense I think.

That's not how it's implemented -- I believe Joel implemented it this way intentionally, because otherwise, if the elevate.xml said "elevate A,B and exclude X,Y", there would be no simple way to say "instead of what's in elevate.xml, I want to elevate X,Y and I don't want to exclude *anything*".

I made sure this was explicitly documented in the ref guide:
https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component#TheQueryElevationComponent-TheelevateIdsandexcludeIdsParameters

"If either one of these parameters is specified at request time, the entire elevation configuration for the query is ignored."

-Hoss
http://www.lucidworks.com/
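[To make the semantics concrete, suppose elevate.xml contains an entry like the following; only the first ID appears in this thread, the other two are placeholders:

  <elevate>
    <query text="canon">
      <doc id="208464207"/>
      <doc id="208464208"/>
      <doc id="208464209"/>
    </query>
  </elevate>

A request with q=canon&excludeIds=208464207 then discards this whole entry, so nothing is elevated. To keep the other two documents elevated, you would pass them explicitly, e.g. q=canon&elevateIds=208464208,208464209.]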
Unloading a SolrCloud core in 4.6.0
Hi all,

I just want to verify that it is no longer possible to unload a Cloud core via the Core API UNLOAD command, correct? I had two situations: one where I wanted to remove old replicas in a node that I was deactivating (and I had already created new replicas) and one where I needed to remove a shard I split. In both cases I got this nice stack trace:

  <response>
    <lst name="responseHeader"><int name="status">500</int><int name="QTime">1</int></lst>
    <lst name="error"><str name="trace">java.lang.NullPointerException
      at org.apache.solr.core.CorePropertiesLocator.delete(CorePropertiesLocator.java:95)
      at org.apache.solr.core.CoreContainer.remove(CoreContainer.java:754)
      at org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:589)
      at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:162)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
      at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
      at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
      at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:603)
      at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      at java.lang.Thread.run(Thread.java:662)
    </str><int name="code">500</int></lst>
  </response>

I had to resort to DELETEREPLICA, which worked fine, but I just wanted to verify whether this is a bug or intended behavior. Lots of older docs say to use UNLOAD for these situations.

Thanks, Lajos
Re: Announce list
There's always http://projects.apache.org/feeds/rss.xml. L On 03/02/2014 14:59, Arie Zilberstein wrote: Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie
Re: Solr middle-ware?
I always go for SolrJ as the intermediate layer, usually in a Spring app. I have sometimes proxied directly to Solr itself, but since we use a lot of Ajax, I'm not comfortable with exposing the Solr URIs directly, even if controlled via a proxy. Having it go through a webapp gives me a layer I can use to validate input; if ever the situation warranted, I could use a filter to check for anything malicious. I can also layer security on top as well.

Cheers, Lajos

On 22/01/2014 06:45, Alexandre Rafalovitch wrote:

So, everybody so far is exposing Solr directly to the web, but with proxying/rewriting. Which means the HTML/JS libraries are Solr query-format aware as well? Is anybody using Solr clients (SolrNet, SolrJ) as a base?

Regards, Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Jan 21, 2014 at 9:05 PM, Artem Karpenko gooy...@gmail.com wrote:

Hello. Not really middleware, but it might be of interest concerning possible ways of implementing security. We use a custom-built Solr with a web.xml that includes the Spring Security filter, and the appropriate infrastructure classes for authentication added as a dependency to the project. We pass a token from the frontend in each request. If it's accepted in the security filter, then later the user role (identified from the token) is used in a custom request handler that modifies the query according to role permissions.

Regards, Artem.

On 21.01.2014 15:08, Markus Jelsma wrote:

Hi - We use Nginx to expose the index to the internet. It comes down to putting some limitations on input parameters and on-the-fly rewriting of queries using embedded Perl scripting. Limitations and rewrites are usually just a bunch of regular expressions, so it is not that hard.

Cheers, Markus

-----Original message-----
From: Alexandre Rafalovitch arafa...@gmail.com
Sent: Tuesday 21st January 2014 14:01
To: solr-user@lucene.apache.org
Subject: Solr middle-ware?

Hello,

All the Solr documents talk about not exposing Solr directly to the web. But I see people keep asking for a thin, secure layer in front of Solr that they can talk to from JavaScript, perhaps with some basic extension options. Has anybody actually written one? Open source, or in a community part of a larger project? I would love to be able to point people at something. Is there something particularly difficult about writing one? Does anybody have a story of an aborted attempt or mid-point reversal? I would like to know.

Regards, Alex.

P.s. Personal context: I am thinking of doing a series of lightweight examples of how to use Solr. Like I did for a book, but with a bit more depth and something that can actually be exposed to the live web with live data. I don't want to reinvent the wheel of the thin Solr middleware.
P.p.s. Though I keep thinking that Dart could make an interesting option for the middleware, as it could have the same codebase on the server and in the client. Like NodeJS, but with saner syntax.
Solr Cloud on HDFS
Hi all,

I've been running Solr on HDFS, and that's fine. But I have a Cloud installation I thought I'd try on HDFS. I uploaded the configs for the core that runs in standalone mode already on HDFS (on another cluster). I specify the HdfsDirectoryFactory, HDFS data dir, solr.hdfs.home, and HDFS update log path:

  <dataDir>hdfs://master:9000/solr/test/data</dataDir>

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://master:9000/solr</str>
  </directoryFactory>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">hdfs://master:9000/solr/test/ulog</str>
    </updateLog>
  </updateHandler>

Question is: should I create my collection differently than I would a normal collection? If I just try that, Solr will initialise the directory in HDFS as if it were a single core. It will create shard directories on my nodes, but not actually put anything in there. And then it will complain mightily about not being able to forward updates to other nodes. (This same cluster hosts regular collections, and everything is working fine.) Am I missing a step? Do I have to manually create HDFS directories for each replica?

Thanks, L
Re: Solr Cloud on HDFS
Uugh. I just realised I should have taken out the dataDir and update log definitions! Now it works fine.

Cheers, L
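[In other words (a sketch based on the paths in this thread), the working per-collection configuration keeps only solr.hdfs.home and leaves dataDir and the update log directory at their relative defaults, so each collection gets its own directories under the HDFS home:

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://master:9000/solr</str>
  </directoryFactory>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog/>
  </updateHandler>

This matches the advice Mark gives later in the thread: hard-coding dataDir or the ulog dir in solrconfig.xml would point every core of every collection at the same HDFS paths.]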
Re: Solr Cloud on HDFS
Thanks Mark ... indeed, some doc updates would help.

Regarding what seems to be a popular question on sharding: it seems it would be a Good Thing for the shards of a collection running on HDFS to essentially be pointers to the HDFS-replicated index. Is that your thinking? I've been following your work recently and would be interested in helping out on this if there's the chance. Is there a JIRA yet on this issue?

Thanks, lajos

On 22/01/2014 16:57, Mark Miller wrote:

Right - solr.hdfs.home is the only setting you should use with SolrCloud. The documentation should probably be improved. If you set the data dir or ulog location in solrconfig.xml explicitly, it will be the same for every collection. SolrCloud shares the solrconfig.xml across SolrCores, and this will not work out. By setting solr.hdfs.home and leaving the relative defaults, all of the locations are correctly set for each different collection under solr.hdfs.home without any effort on your part.

- Mark
Re: Solr Cloud on HDFS
Cool Mark, I'll keep an eye on this one.

L

On 22/01/2014 22:36, Mark Miller wrote:

Whoops, hit the send keyboard shortcut. I just created a JIRA issue for the first bit I'll be working on: SOLR-5656: When using HDFS, the Overseer should have the ability to reassign the cores from failed nodes to running nodes.

- Mark
Re: Advantages of different Servlet Containers
Just go for Tomcat. For all its problems, and I should know, having used it since it was originally JavaWebServer, it is perfectly capable of handling high-end production environments provided you tune it correctly. We use it with our customized Solr 1.3 version without any problems.

Lajos

Simon Wistow wrote:

I know that the Solr FAQ says "Users should decide for themselves which Servlet Container they consider the easiest/best for their use cases based on their needs/experience. For high traffic scenarios, investing time for tuning the servlet container can often make a big difference." But is there anywhere that lists some of the various advantages and disadvantages of, say, Tomcat over Jetty, for someone who isn't current with the Java ecosystem?

Also, I'm currently using Jetty, but I've had to do a horrific hack to make it work under init.d, in that I start it up in the background and then tail the output waiting for the line that says the SocketConnector has been started:

  while [ "" = "$(tail -1 $LOG | grep 'Started SocketConnector')" ]; do
      sleep 1
  done

There's *got* to be a better way of doing this, right?

Thanks, Simon
Help! Issue with tokens in custom synonym filter
Hi all,

I've been writing some custom synonym filters and have run into an issue with returning a list of tokens. I have a synonym filter that uses the WordNet database to extract synonyms. My problem is how to define the offsets and position increments in the new Tokens I'm returning. For an input token, I get a list of synonyms from the WordNet database. I then create a List<Token> of those results. Each Token is created with the same startOffset, endOffset and positionIncrement as the input Token. Is this correct? My understanding from looking at the Lucene codebase is that the startOffset/endOffset should be the same, as we are referring to the same term in the original text. However, I don't quite get the positionIncrement. I understand that it is relative to the previous term ... does this mean all my synonyms should have a positionIncrement of 0? But whether I use 0 or the positionIncrement of the original input Token, Solr seems to ignore the returned tokens ...

This is a summary of what is in my filter:

  private Iterator<Token> output;
  private ArrayList<Token> synonyms = null;

  public Token next(Token in) throws IOException {
      if (output != null) {
          // We are in the middle of emitting the synonyms created from
          // the previous input token (which has already been returned).
          if (output.hasNext()) {
              return output.next();
          }
          // All synonyms emitted; clear the iterator and fall through to
          // read the next input token. (Returning null here instead would
          // end the whole stream after the first token's synonyms.)
          output = null;
      }

      synonyms = new ArrayList<Token>();
      Token t = input.next(in);
      if (t == null) return null;

      String value = new String(t.termBuffer(), 0, t.termLength()).toLowerCase();

      // Get the list of WordNet synonyms for "value" (code removed)

      // Iterate through the WordNet synonyms
      for (String wordNetSyn : wordNetSyns) {
          // Same offsets as the original token: the synonym refers to the
          // same span of the original text.
          Token synonym = new Token(t.startOffset(), t.endOffset(), t.type());
          // A synonym stacks on the same position as the original term,
          // so its increment relative to the previous token is 0.
          synonym.setPositionIncrement(0);
          synonym.setTermBuffer(wordNetSyn.toCharArray(), 0, wordNetSyn.length());
          synonyms.add(synonym);
      }
      output = synonyms.iterator();

      // Return the original word first; we want it indexed too.
      return t;
  }
Re: Help! Issue with tokens in custom synonym filter
Hi David & Ahmet,

I hadn't seen the SynonymTokenFilter from Lucene, so that helped. Ultimately, however, it seems I was pretty much doing the right thing, although my token type might have been wrong. Unfortunately, while the tokens are being returned properly (AFAIK), when I do a query using one of the synonyms, I can't get any results. This is not the case if I just code the synonym directly into the synonyms file with the standard Solr synonym filter. So I'll have to keep on hacking away ;)

Regarding generating the file from WordNet, we'd considered that, but our requirements essentially mean we have to do the heavy lifting within the filter itself. Not that I'm opposed; it is just that I'm apparently still missing something simple.

Thanks for the replies.

Lajos

Smiley, David W. wrote:

Although this is not a direct answer to your question, you may want to consider generating a synonyms file from WordNet. Then, you can use the standard synonym filter in Solr. The only downside to this is that the synonym file might be pretty large ... but you've probably got some large file for the WordNet data anyway.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server