Re: find all two word phrases that appear in more than one document
Thanks Alexandre. I looked at the wiki page for the TermsComponent, but I am not sure I follow. Do you have an example or some better documentation? Thanks! :)

On Mon, Sep 9, 2013 at 8:17 PM, Alexandre Rafalovitch wrote:
> The "phrases" are usually called n-grams or shingles.
>
> You can probably use ShingleFilterFactory to create your shingles (possibly
> with outputUnigrams=false) and then use TermsComponent
> (http://wiki.apache.org/solr/TermsComponent) to list the results.
>
> Regards,
> Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>
> On Tue, Sep 10, 2013 at 8:22 AM, Ali, Saqib wrote:
> > Dear Solr Ninjas,
> >
> > We would like to run a query that returns two word phrases that appear in
> > more than one document. So for e.g. take the string "Solr Ninja". Since it
> > appears in more than one document in our Solr instance, the query should
> > return that. The query should find all such phrases from all the documents
> > in our Solr instance, by querying for two adjacent word combination
> > (forming a phrase) in the documents that are in the Solr. These two
> > adjacent word combinations should come from the documents in the Solr
> > index.
> >
> > Any ideas on how to write this query?
> >
> > Thanks.
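A rough sketch of what the shingle + TermsComponent suggestion could look like, offered as an illustration rather than a tested recipe. It assumes a hypothetical field named `phrases` whose analyzer ends with ShingleFilterFactory (two-word shingles, outputUnigrams=false) and assumes the `/terms` request handler is enabled as in the stock example solrconfig.xml; the host, port, and collection name are placeholders. The local helper only mimics what such a field would hold, it is not how Solr computes it.

```python
from collections import Counter
from urllib.parse import urlencode


def two_word_shingles(text):
    """Return the set of adjacent two-word phrases in text (lowercased)."""
    words = text.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}


def shared_two_word_phrases(docs, min_docs=2):
    """Phrases that occur in at least min_docs documents (local illustration)."""
    doc_freq = Counter()
    for text in docs:
        # Each phrase is counted once per document, i.e. document frequency.
        doc_freq.update(two_word_shingles(text))
    return sorted(p for p, n in doc_freq.items() if n >= min_docs)


def terms_url(base_url, field, min_docs=2):
    """Build a TermsComponent request listing terms with docFreq >= min_docs."""
    params = {
        "terms": "true",
        "terms.fl": field,           # hypothetical shingled field
        "terms.mincount": min_docs,  # minimum document frequency
        "terms.limit": -1,           # negative value: return all terms
        "wt": "json",
    }
    return base_url + "/terms?" + urlencode(params)


docs = ["Solr Ninja here", "ask a Solr Ninja", "nothing shared"]
print(shared_two_word_phrases(docs))
print(terms_url("http://localhost:8983/solr/collection1", "phrases"))
```

Because TermsComponent reports document frequency, terms.mincount=2 is what expresses "appears in more than one document" here.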
find all two word phrases that appear in more than one document
Dear Solr Ninjas, We would like to run a query that returns two-word phrases that appear in more than one document. For example, take the string "Solr Ninja". Since it appears in more than one document in our Solr instance, the query should return it. The query should find all such phrases across all the documents in our Solr instance, where a phrase is any combination of two adjacent words in a document in the Solr index. Any ideas on how to write this query? Thanks.
Re: removing duplicates
Thanks Aloke and Robert. Can you please give me code/query snippets? (newbie here)

On Wed, Aug 21, 2013 at 2:31 PM, Aloke Ghoshal wrote:
> Hi,
>
> Facet by one of the duplicate fields (probably by the numeric field that
> you mentioned) and set facet.mincount=2.
>
> Regards,
> Aloke
>
> On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib wrote:
> > hello,
> >
> > We have documents that are duplicates i.e. the ID is different, but rest of
> > the fields are same. Is there a query that can remove duplicate, and just
> > leave one copy of the document on solr? There is one numeric field that we
> > can key off for find duplicates.
> >
> > Please advise.
> >
> > Thanks
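A minimal sketch of Aloke's facet suggestion, assuming a hypothetical numeric field called `dedupe_key` and a placeholder host and collection. It only finds the values shared by two or more documents; deciding which copy to keep and deleting the others would be a separate step that is not shown.

```python
from urllib.parse import urlencode


def duplicate_facet_url(base_url, field):
    """Facet on `field` and keep only values carried by 2+ documents."""
    params = {
        "q": "*:*",
        "rows": 0,            # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 2,  # values that occur in at least two documents
        "facet.limit": -1,    # do not cut the list off at the default limit
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


def pair_facet_counts(flat_counts):
    """Turn Solr's default flat [value, count, value, count, ...] list into pairs."""
    return [(flat_counts[i], flat_counts[i + 1])
            for i in range(0, len(flat_counts), 2)]


print(duplicate_facet_url("http://localhost:8983/solr/collection1", "dedupe_key"))
print(pair_facet_counts(["1001", 3, "1002", 2]))
```

The second helper assumes the default JSON layout of facet_fields (alternating value and count); a different json.nl setting would change that shape.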
removing duplicates
Hello, We have documents that are duplicates, i.e. the ID is different but the rest of the fields are the same. Is there a query that can remove the duplicates and leave just one copy of each document in Solr? There is one numeric field that we can key off to find duplicates. Please advise. Thanks
Re: [solr 4.4.0] SPLITSHARD and core autodiscovery
Dmitry,

That is expected behaviour. You need to manually remove the original core.

Thanks.

On Fri, Aug 2, 2013 at 6:03 AM, Dmitry Kan wrote:
> Hello list,
>
> I was wondering, if what I see with the split shard a correct behaviour or
> is something wrong.
>
> Following this article:
>
> http://searchhub.org/2013/06/19/shard-splitting-in-solrcloud/
>
> I have issued a low-level core split query:
>
> http://localhost:8982/solr/admin/cores?core=core1&action=SPLIT&path=multicore/core11&path=multicore/core12
>
> which has completed successfully. Two new index directories got created
> under example/multicore directory.
>
> What didn't happen is core autodiscovery. That is, the dashboard page
> still shows the original core core1.
>
> Is this expected or a bug?
>
> On a separate note: after splitting a core to two new cores, how does the
> search routing work in the non SolrCloud mode environment? Is this taken
> care of by Solr (via the original core) or is client side task?
>
> Thanks,
>
> Dmitry
Re: uniqueKey: string vs. long integer
I think I have found an issue with using a long integer for the uniqueKey: document routing using the ! notation will not work with a long integer uniqueKey :(

Thanks Jack and Robi

On Thu, Aug 1, 2013 at 10:05 AM, Petersen, Robert <robert.peter...@mail.rakuten.com> wrote:
> Hi guys,
>
> We have used an integer as our unique key since solr 1.3 with no problems
> at all. We never thought of using anything else because our solr unique
> key is based upon our product sku data base field which is defined as an
> integer also. We're on solr 3.6.1 currently.
>
> Thanks
> Robi
>
> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Thursday, August 01, 2013 9:27 AM
> To: solr-user@lucene.apache.org
> Subject: Re: uniqueKey: string vs. long integer
>
> Although I cringe at the thought of anybody using anything other than a
> string for the unique key for a document, I can't point to any part of Solr
> that will absolutely fail. I wouldn't be surprised if there weren't a few
> nooks and crannies in Solr that might depend on the type of the ID, or at
> least depend on it being able to converted to and from string. I'm not sure
> if SolrCloud has any dependence on the document ID field type.
>
> Could you inquire as to why this third party chose to go with a non-string
> document key? Just curious if they perceived some advantage. I mean, is the
> key used in numeric calculations? Can it be negative? Is it ever sorted?
>
> But as a Solr best practice, I'd advise against it.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Ali, Saqib
> Sent: Thursday, August 01, 2013 12:02 PM
> To: solr-user@lucene.apache.org
> Subject: uniqueKey: string vs. long integer
>
> We have an application that was developed by a third party. It uses
> uniqueKey that is a long integer instead of a string. Will there be any
> repercussions of using a long integer instead of string for the uniqueKey?
>
> Thanks! :)
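To make the routing problem concrete, here is a small sketch of why the ! notation and a long integer key do not mix. The shard key `tenantA` and the id 12345 are made-up values; the only point is that a composite id of the form `shardkey!docid` is a string and cannot be parsed as a 64-bit integer.

```python
def routed_id(shard_key, doc_id):
    """Compose a composite id of the form 'shard_key!doc_id'."""
    return f"{shard_key}!{doc_id}"


def fits_long_unique_key(value):
    """True if value could be stored in a signed 64-bit integer field."""
    try:
        number = int(value)
    except ValueError:
        return False
    return -2**63 <= number < 2**63


composite = routed_id("tenantA", 12345)  # hypothetical shard key and id
print(composite)
print(fits_long_unique_key(composite))   # the '!' prefix makes it non-numeric
print(fits_long_unique_key("12345"))     # a plain numeric id still fits
```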
uniqueKey: string vs. long integer
We have an application that was developed by a third party. It uses uniqueKey that is a long integer instead of a string. Will there be any repercussions of using a long integer instead of string for the uniqueKey? Thanks! :)
Re: FieldCollapsing issues in SolrCloud 4.4
Hello Paul,

Can you please explain what you mean by: "To get the exact number of groups, you need to shard along your grouping field"

Thanks! :)

On Wed, Jul 31, 2013 at 3:08 AM, Paul Masurel wrote:
> Do you mean you get different results with group=true?
> numFound is supposed to return the number of ungrouped hits.
>
> To get the number of groups, you are expected to
> set group.ngroups=true.
> Even then, the result will only give you an upper bound
> in a distributed environment.
> To get the exact number of groups, you need to shard along
> your grouping field.
>
> If you have many groups, you may also experience a huge performance
> hit, as the current implementation has been heavily optimized for a low
> number of groups (e.g. e-commerce categories).
>
> Paul
>
> On Wed, Jul 31, 2013 at 1:59 AM, Ali, Saqib wrote:
> > Hello all,
> >
> > Is anyone experiencing issues with the numFound when using group=true in
> > SolrCloud 4.4?
> >
> > Sometimes the results are off for us.
> >
> > I will post more details shortly.
> >
> > Thanks.
>
> --
> Masurel Paul
> e-mail: paul.masu...@gmail.com
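One way to read Paul's remark, sketched under the assumption that the distributed group count is obtained by adding up per-shard counts: if documents with the same group value can land on different shards, the same group is counted more than once, so the sum is only an upper bound. The field name `thread_id` and the tiny shard contents below are made up for illustration.

```python
from urllib.parse import urlencode


def grouped_query_url(base_url, query, group_field):
    """Build a grouped query that also asks for the number of groups."""
    params = {
        "q": query,
        "group": "true",
        "group.field": group_field,
        "group.ngroups": "true",
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params, safe=":*")


def summed_ngroups(shards):
    """Add up the distinct group values seen on each shard separately."""
    return sum(len(set(values)) for values in shards)


def exact_ngroups(shards):
    """Distinct group values across all shards taken together."""
    return len(set().union(*shards))


# Group value "b" is spread over two shards: the sum over-counts it.
scattered = [["a", "b"], ["b", "c"]]
# Sharded along the grouping field: every value lives on exactly one shard.
co_located = [["a", "b", "b"], ["c"]]

print(summed_ngroups(scattered), exact_ngroups(scattered))
print(summed_ngroups(co_located), exact_ngroups(co_located))
print(grouped_query_url("http://localhost:8983/solr/collection1", "*:*", "thread_id"))
```

"Sharding along the grouping field" then means routing documents so that the second layout holds, at which point the sum and the exact count agree.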
FieldCollapsing issues in SolrCloud 4.4
Hello all, Is anyone experiencing issues with the numFound when using group=true in SolrCloud 4.4? Sometimes the results are off for us. I will post more details shortly. Thanks.
Using HP SiteScope to monitor individual Solr shards
We would like to use HP SiteScope to monitor the availability of the individual Solr shards. Any ideas on how we can do that? Is there a shard based URL that is a sure shot of knowing that the shard is feeling healthy? Thanks! :)
Re: monitor jvm heap size for solrcloud
You can use SPM (I think): http://sematext.com/spm/solr-performance-monitoring/

On Fri, Jul 26, 2013 at 1:36 PM, Joshi, Shital wrote:
> We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. While
> running stress tests, we want to monitor JVM heap size across 10 nodes. Is
> there a utility which would connect to all nodes' jmx port and display all
> bean details for the cloud?
>
> Thanks!
maximum number of documents per shard?
Is it still 2.1 billion documents?
Re: zkHost in solr.xml goes missing after SPLITSHARD using Collections API
Thanks Alan and Shawn. Just installed Solr 4.4, and I am no longer experiencing the issue. Thanks! :)

On Tue, Jul 23, 2013 at 7:21 AM, Shawn Heisey wrote:
> On 7/23/2013 7:50 AM, Alan Woodward wrote:
> > Can you try upgrading to the just-released 4.4? Solr.xml persistence
> > had all kinds of bugs in 4.3, which should have been fixed now.
>
> The 4.4.0 release has been finalized and uploaded, but the download link
> hasn't been changed yet because the mirror network isn't fully
> synchronized yet. It is available from many mirrors, but until the
> website download links get changed, there's not yet a direct way to
> access it.
>
> Here's some generic instructions for situations where the new version is
> done, but the official announcement isn't out yet:
>
> http://lucene.apache.org/solr/
>
> 1) Go to the Solr website (URL above) and click on the latest version
> download button, which at this moment is 4.3.1. Wait for the redirect
> to take you to a mirror list.
>
> 2) Click on one of the mirrors, the best option is usually the one right
> on top that the website chose for you.
>
> 3) When the file list comes up, click the "Parent Directory" link. If
> this isn't showing, it will most likely be labelled with ".." instead.
>
> 4) If a directory for the new version (in this case 4.4.0) is listed,
> click on it and then click the file that you want to download.
>
> If the new version is not listed, click the Back button on your browser
> twice, then go back to step 2, but this time choose a different mirror.
>
> One last reminder: This only works right before a release is officially
> announced. These instructions cannot be used while a release is still
> in development.
>
> Thanks,
> Shawn
zkHost in solr.xml goes missing after SPLITSHARD using Collections API
Hello all, Every time I issue a SPLITSHARD using Collections API, the zkHost attribute in the solr.xml goes missing. I have to manually edit the solr.xml to add zkHost after every SPLITSHARD. Any thoughts on what could be causing this? Thanks.
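For readers who have not used it, a SPLITSHARD call through the Collections API looks roughly like the request built below. This is only a sketch: the host, port, collection name, and shard name are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def splitshard_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD request for one shard."""
    params = {
        "action": "SPLITSHARD",
        "collection": collection,  # placeholder collection name
        "shard": shard,            # placeholder shard name, e.g. shard1
    }
    return base_url + "/admin/collections?" + urlencode(params)


print(splitshard_url("http://localhost:8983/solr", "collection1", "shard1"))
```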
Re: add to ContributorsGroup - Instructions for setting up SolrCloud on jboss
Thanks Erick! I have added the instructions for running SolrCloud on Jboss: http://wiki.apache.org/solr/SolrCloud%20using%20Jboss

I will refine the instructions further, and also post some screenshots.

Thanks.

On Sun, Jul 14, 2013 at 5:05 AM, Erick Erickson wrote:
> Done, sorry it took so long, hadn't looked at the list in a couple of days.
>
> Erick
>
> On Fri, Jul 12, 2013 at 5:46 PM, Ali, Saqib wrote:
> > username: saqib
> >
> > On Fri, Jul 12, 2013 at 2:35 PM, Ali, Saqib wrote:
> >> Hello,
> >>
> >> Can you please add me to the ContributorsGroup? I would like to add
> >> instructions for setting up SolrCloud using Jboss.
> >>
> >> thanks.
Re: Where to specify numShards when startup up a cloud setup
What does the solr.xml look like on the nodes?

On Tue, Jul 16, 2013 at 2:36 PM, Robert Stewart wrote:
> I want to script the creation of N solr cloud instances (on ec2).
>
> But it's not clear to me where I would specify numShards setting.
> From documentation, I see you can specify on the "first node" you start
> up, OR alternatively, use the "collections" API to create a new collection
> - but in that case you need first at least one running SOLR instance. I
> want to push all solr instances with similar configuration onto N instances
> and just run them with some number of shards pre-set somehow. Where can I
> put numShards configuration setting?
>
> What I want to do:
>
> 1) push solr configuration to zookeeper ensemble using zkCli command-line
> tool.
> 2) create N instances of SOLR running on Ec2, pointing to the same
> zookeeper
> 3) start all SOLR instances which will become a cloud setup with M shards
> (where M
>
> Currently everything starts up with 1 shards, and N replicas.
>
> I already have one single collection pre-configured.
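For the "collections API" route Robert mentions, a sketch of a CREATE request that carries numShards is below. It assumes at least one Solr node is already running and that a config set has been uploaded to ZooKeeper; the collection name, config name, counts, and host are placeholders chosen for illustration.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    """Build a Collections API CREATE request with an explicit shard count."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                  # M shards
        "replicationFactor": replication_factor,  # copies of each shard
        "collection.configName": config_name,     # config set already in ZooKeeper
    }
    return base_url + "/admin/collections?" + urlencode(params)


print(create_collection_url("http://localhost:8983/solr", "mycollection", 4, 2, "myconf"))
```

With this approach the shard count is fixed by the CREATE call rather than by whichever node happens to start first.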
Re: Clearing old nodes from zookeper without restarting solrcloud cluster
Hello Luis,

I don't think that is possible. If you delete clusterstate.json from zookeeper, you will need to restart the nodes. I could be very wrong about this.

Saqib

On Mon, Jul 15, 2013 at 8:50 PM, Luis Carlos Guerrero Covo <lcguerreroc...@gmail.com> wrote:
> I know that you can clear zookeeper's data directory using the CLI with the
> clear command, I just want to know if its possible to update the cluster's
> state without wiping everything out. Anyone have any ideas/suggestions?
>
> On Mon, Jul 15, 2013 at 11:21 AM, Luis Carlos Guerrero Covo <lcguerreroc...@gmail.com> wrote:
> > Hi,
> >
> > Is there an easy way to clear zookeeper of all offline solr nodes without
> > restarting the cluster? We are having some stability issues and we think it
> > maybe due to the leader querying old offline nodes.
> >
> > thank you,
> >
> > Luis Guerrero
>
> --
> Luis Carlos Guerrero Covo
> M.S. Computer Engineering
> (57) 3183542047
Re: Book contest idea - feedback requested
Hello Alex,

This sounds like an excellent idea! :)

Saqib

On Mon, Jul 15, 2013 at 8:11 PM, Alexandre Rafalovitch wrote:
> Hello,
>
> Packt Publishing has kindly agreed to let me run a contest with e-copies of
> my book as prizes:
> http://www.packtpub.com/apache-solr-for-indexing-data/book
>
> Since my book is about learning Solr and targeted at beginners and early
> intermediates, here is what I would like to do. I am asking for feedback on
> whether people on the mailing list like the idea or have specific
> objections to it.
>
> 1) The basic idea is to get Solr users and write and vote on what they find
> hard with Solr, especially in understanding the features (as contrasted
> with just missing ones).
> 2) I'll probably set it up as a User Voice forum, which has all the
> mechanisms for suggesting and voting on ideas. With an easier interface
> than Jira
> 3) The top N voted ideas will get the books as prizes and I will try to
> fix/document/create JIRAs for those issues.
> 4) I am hoping to specifically reach out to the communities where Solr is a
> component and where they don't necessarily hang out on our mailing list. I
> am thinking SolrNet, Drupal, project Blacklight, Cloudera, CrafterCMS,
> SiteCore, Typo3, SunSpot, Nutch. Obviously, anybody and everybody from this
> list would be absolutely welcome to participate as well.
>
> Yes? No? Suggestions?
>
> Also, if you are maintainer of one of the products/services/libraries that
> has Solr in it and want to reach out to your community yourself, I think it
> would be a lot better than If I did it. Contact me directly and I will let
> you know what template/FAQ I want you to include in the announcement
> message when it is ready.
>
> Thank you all in advance for the comments and suggestions.
>
> Regards,
> Alex.
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
I am getting a java.lang.OutOfMemoryError: Requested array size exceeds VM limit on certain queries. Please advise:

19:25:02,632 INFO [org.apache.solr.core.SolrCore] (http-oktst1509.company.tld/12.5.105.96:8180-9) [collection1] webapp=/solr path=/select params={sort=sent_date+asc&distrib=false&wt=javabin&version=2&rows=2147483647&df=text&fl=id&shard.url=12.5.105.96:8180/solr/collection1/&NOW=1373675102627&start=0&q=thread_id:1439513570014188310&isShard=true&fq=domain:company.tld+AND+owner:11782344&fsv=true} hits=1 status=0 QTime=1
19:25:02,637 ERROR [org.apache.solr.servlet.SolrDispatchFilter] (http-oktst1509.company.tld/12.5.105.96:8180-2) null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:280)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:248)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
    at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:372)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:679)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:931)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
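The logged request asks for rows=2147483647, which is the largest 32-bit signed integer. A plausible explanation, which nobody in this thread has confirmed, is that Solr sizes an internal structure from start + rows and the JVM refuses an array that large. If that is the cause, asking for results in bounded pages would sidestep it. The sketch below reuses the query and sort from the log; the page size of 1000 and the host are arbitrary placeholders.

```python
from urllib.parse import urlencode


def page_windows(total_hits, page_size):
    """Yield (start, rows) pairs that together cover total_hits results."""
    for start in range(0, total_hits, page_size):
        yield start, min(page_size, total_hits - start)


def paged_select_url(base_url, query, start, rows):
    """Build one /select request for a single bounded page of ids."""
    params = {
        "q": query,
        "fl": "id",
        "sort": "sent_date asc",  # same sort as the logged request
        "start": start,
        "rows": rows,             # bounded, instead of 2147483647
    }
    return base_url + "/select?" + urlencode(params, safe=":")


for start, rows in page_windows(2500, 1000):
    print(paged_select_url("http://localhost:8180/solr/collection1",
                           "thread_id:1439513570014188310", start, rows))
```

The total hit count would come from numFound in the first response; it is hard-coded to 2500 here only to show the windows.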
Re: add to ContributorsGroup - Instructions for setting up SolrCloud on jboss
username: saqib

On Fri, Jul 12, 2013 at 2:35 PM, Ali, Saqib wrote:
> Hello,
>
> Can you please add me to the ContributorsGroup? I would like to add
> instructions for setting up SolrCloud using Jboss.
>
> thanks.
add to ContributorsGroup - Instructions for setting up SolrCloud on jboss
Hello, Can you please add me to the ContributorsGroup? I would like to add instructions for setting up SolrCloud using Jboss. thanks.
Re: preferred container for running SolrCloud
Thanks Walter. And the container..

On Thu, Jul 11, 2013 at 7:55 PM, Walter Underwood wrote:
> Embedded Zookeeper is only for dev. Production needs to run a ZK cluster.
> --wunder
>
> On Jul 11, 2013, at 7:27 PM, Ali, Saqib wrote:
> > With the embedded Zookeeper or separate Zookeeper? Also have run into any
> > issues with running SolrCloud on jetty?
> >
> > On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal wrote:
> >> We're running under jetty.
> >>
> >> Sent from my iPhone
> >>
> >> On Jul 11, 2013, at 6:06 PM, "Ali, Saqib" wrote:
> >>> 1) Jboss
> >>> 2) Jetty
> >>> 3) Tomcat
> >>> 4) Other..
> >>>
> >>> ?
Re: preferred container for running SolrCloud
With the embedded Zookeeper or a separate Zookeeper? Also, have you run into any issues with running SolrCloud on jetty?

On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal wrote:
> We're running under jetty.
>
> Sent from my iPhone
>
> On Jul 11, 2013, at 6:06 PM, "Ali, Saqib" wrote:
> > 1) Jboss
> > 2) Jetty
> > 3) Tomcat
> > 4) Other..
> >
> > ?
preferred container for running SolrCloud
1) Jboss 2) Jetty 3) Tomcat 4) Other.. ?
SolrCloud on Jboss
Hello, Does anyone have step-by-step instructions for running SolrCloud on Jboss? Thanks
Re: SolrJ and SolrCloud
Thanks Mark!

On Mon, Jul 8, 2013 at 10:46 AM, Mark Miller wrote:
> On Jul 8, 2013, at 1:40 PM, "Ali, Saqib" wrote:
> > Hello all,
> >
> > We have an app that uses the SolrJ and instantiates using HttpSolrServer.
> >
> > Now that we would like to move to SolrCloud, can we still use the same app,
> > or do we HAVE to switch to
> >
> > CloudSolrServer server = new CloudSolrServer("?");
> >
> > right away?
> >
> > Or will point to one instance using HttpSolrServer suffice for now?
>
> Yes, it will.
>
> - Mark
>
> > Thanks.
SolrJ and SolrCloud
Hello all,

We have an app that uses SolrJ and instantiates its client using HttpSolrServer. Now that we would like to move to SolrCloud, can we still use the same app, or do we HAVE to switch to

    CloudSolrServer server = new CloudSolrServer("?");

right away? Or will pointing HttpSolrServer at one instance suffice for now?

Thanks.
Re: 2.1billion+ document
Thanks Jason! That was very helpful.

I read on the solr wiki that: "Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml)"

What is this unique key? Is this just an id that we define in the schema.xml that is unique across all documents? We have something as follows:

Will this suffice?

Thanks.

On Fri, Jul 5, 2013 at 7:45 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:
> Saqib:
>
> At the simplest level:
>
> 1) Source the machine
> 2) Install Java
> 3) Install a servlet container of your choice
> 4) Copy your Solr WAR and conf directories as desired (probably a rough
> mirror of your current single server)
> 5) Start it up and start sending data there
> 6) Query both by simply adding:
> shards=host1/solr/collection,host2/solr/collection
> 7) Profit
>
> Or, in shorthand:
>
> 1) Install new Solr instance and start indexing data there
> 2) Add the shards parameter to your queries with both (or more) servers
> 3) …
> 4) Profit
>
> Now…we usually want to be concerned about how to manage the data so that
> we don't send duplicates. Without SolrCloud it is our responsibility to
> delegate traffic for updates and deletes. We also like to think a bit more
> about how to take advantage of our lovely parallelism to increase index or
> query time. We should also consider strategies to isolate domain data to
> single shards so as to allow isolated queries against dedicated data models
> in single shards.
>
> But if you just want to basics, it really is as easy as describe above.
>
> Jason
>
> On Jul 5, 2013, at 7:36 PM, "Ali, Saqib" wrote:
> > Hello Otis,
> >
> > I was thinking more in terms of Solr DistributedSearch rather than
> > SolrCloud. I was hoping to add another Solr instance, when the time comes.
> > This is a low use application, but with lot of data. Uptime and query speed
> > are not of importance. However we would like to be able to index more then
> > 2.1 b document when the time comes..
> >
> > Any advise will be highly appreciated.
> >
> > Thanks!!! :)
> > Saqib
> >
> > On Fri, Jul 5, 2013 at 6:23 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> >> Hi,
> >>
> >> It's a broad question, but it starts with getting a few servers,
> >> putting Solr 4.3.1 on it (soon 4.4), setting up Zookeeper, creating a
> >> Solr Collection (index) with N shards and M replicas, and reindexing
> >> your old data to this new cluster, which you can expand with new nodes
> >> over time. If you have specific questions...
> >>
> >> Otis
> >> --
> >> Solr & ElasticSearch Support -- http://sematext.com/
> >> Performance Monitoring -- http://sematext.com/spm
> >>
> >> On Fri, Jul 5, 2013 at 8:42 PM, Ali, Saqib wrote:
> >>> Question regarding the 2.1 billion+ document.
> >>>
> >>> I understand that a single instance of solr has a limit of 2.1 billion
> >>> documents.
> >>>
> >>> We currently have a single solr server. If we reach 2.1billion documents
> >>> limit, what is involved in moving to the Solr DistributedSearch?
> >>>
> >>> Thanks! :)
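Jason's two key points (add a shards parameter to queries, and take responsibility for where each update goes) can be sketched as follows. The host names and collection path are placeholders in the style of his example, and the CRC32-based routing is just one simple way to keep a given id on a single shard; it is an assumption for illustration, not something prescribed in the thread.

```python
import zlib
from urllib.parse import urlencode


def distributed_query_url(coordinator_url, query, shard_addresses):
    """Query one instance and have it fan out to every listed shard."""
    params = {
        "q": query,
        "shards": ",".join(shard_addresses),
        "wt": "json",
    }
    return coordinator_url + "/select?" + urlencode(params, safe=":/,*")


def shard_for(doc_id, shard_addresses):
    """Pick the same shard every time for a given id, so no duplicates are sent."""
    index = zlib.crc32(str(doc_id).encode("utf-8")) % len(shard_addresses)
    return shard_addresses[index]


shards = ["host1:8983/solr/collection", "host2:8983/solr/collection"]
print(distributed_query_url("http://host1:8983/solr/collection", "*:*", shards))
print(shard_for("doc-42", shards))
```

Updates and deletes for an id would then always be sent to the address returned by `shard_for`, while queries go through any one instance with the shards list attached.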
Re: 2.1billion+ document
Hello Otis,

I was thinking more in terms of Solr DistributedSearch rather than SolrCloud. I was hoping to add another Solr instance when the time comes. This is a low-use application, but with a lot of data. Uptime and query speed are not important. However, we would like to be able to index more than 2.1 billion documents when the time comes.

Any advice will be highly appreciated.

Thanks!!! :)
Saqib

On Fri, Jul 5, 2013 at 6:23 PM, Otis Gospodnetic wrote:
> Hi,
>
> It's a broad question, but it starts with getting a few servers,
> putting Solr 4.3.1 on it (soon 4.4), setting up Zookeeper, creating a
> Solr Collection (index) with N shards and M replicas, and reindexing
> your old data to this new cluster, which you can expand with new nodes
> over time. If you have specific questions...
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
> On Fri, Jul 5, 2013 at 8:42 PM, Ali, Saqib wrote:
> > Question regarding the 2.1 billion+ document.
> >
> > I understand that a single instance of solr has a limit of 2.1 billion
> > documents.
> >
> > We currently have a single solr server. If we reach 2.1billion documents
> > limit, what is involved in moving to the Solr DistributedSearch?
> >
> > Thanks! :)
2.1billion+ document
A question regarding the 2.1 billion+ document limit. I understand that a single instance of Solr has a limit of 2.1 billion documents. We currently have a single Solr server. If we reach the 2.1 billion document limit, what is involved in moving to Solr DistributedSearch? Thanks! :)
Re: [Announcement] Norch- a search engine for node.js
Very interesting. What is the upper limit on the number of documents? Thanks! :)

On Fri, Jul 5, 2013 at 11:53 AM, Fergus McDowall wrote:
> Here is some news that might be of interest to users and implementers of
> Solr
>
> http://blog.comperiosearch.com/blog/2013/07/05/norch-a-search-engine-for-node-js/
>
> Norch (http://fergiemcdowall.github.io/norch/) is a search engine written
> for Node.js. Norch uses the Node search-index module which is in turn
> written using the super fast levelDB library that Google open-sourced in
> 2011.
>
> The aim of Norch is to make a simple, fast search server, that requires
> minimal configuration to set up. Norch sacrifices complex functionality for
> a limited robust feature set, that can be used to set up a free test search
> engine for most enterprise scenarios.
>
> Currently Norch features
>
> Full text search
> Stopword removal
> Faceting
> Filtering
> Relevance weighting (tf-idf)
> Field weighting
> Paging (offset and resultset length)
>
> Norch can index any data that is marked up in the appropriate JSON format
>
> Download the first release of Norch (0.2.1) here
> (https://github.com/fergiemcdowall/norch/releases)
solrj distributed solr example
Hello all, Can anyone please share a solrj example for distributed solr? Thanks! :)
Re: Moving from single Solr instance to Solr Cloud
Hello Furkan,

We are using Solr 4.3

Thanks

On Thu, Jul 4, 2013 at 1:43 AM, Furkan KAMACI wrote:
> Which version of Solr you are using?
>
> 2013/7/4 Ali, Saqib
> > We have single Solr instance with lot of indexed document. Now we would
> > like to move to SolrCloud implementation.
> >
> > Can we move the existing index to SolrCloud? If so, how? Or do we need to
> > reindex our data in SolrCloud?
> >
> > Thanks,
> > Saqib
Re: omitTermFreqAndPositions="true" in easy English, please?
So in this case, since the field type is String, does adding omitTermFreqAndPositions="true" really help in reducing the index size?

On Wed, Jul 3, 2013 at 10:00 PM, Jack Krupansky wrote:
> Oops... I wasn't reading carefully enough - frequencies and positions only
> relate to tokenized fields (text) - not string fields.
>
> That doesn't impact your ability to do AND and OR of discrete string terms
> of a multivalued string field.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Jack Krupansky
> Sent: Thursday, July 04, 2013 12:54 AM
> To: solr-user@lucene.apache.org
> Subject: Re: omitTermFreqAndPositions="true" in easy English, please?
>
> Yes, but it is simply doing an AND or OR of the individual terms - no
> phrases or implied ordering of the terms.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Ali, Saqib
> Sent: Thursday, July 04, 2013 12:52 AM
> To: solr-user@lucene.apache.org
> Subject: Re: omitTermFreqAndPositions="true" in easy English, please?
>
> Jack,
>
> Thanks for the explanation!
>
> We have a multi-value field as following:
> multiValued="true"/>
>
> Most of these labels are two or more letter phrase e.g.
> 1) Google Reader
> 2) Google Mail
> 3) Google Cloud Storage
>
> etc. etc.
>
> if we add omitTermFreqAndPositions="true" to this field:
> multiValued="true" omitTermFreqAndPositions="true"/>
>
> Will we be able to execute queries like:
> label: (Google Cloud Storage) ?
>
> Thanks.
>
> On Wed, Jul 3, 2013 at 8:23 PM, Jack Krupansky wrote:
>> If you have a text field and simply want to be able to query whether
>> individual terms are present in the text without needing to know either how
>> frequently the terms occur or that some terms may be in present in phrases.
>> So, you can do AND and OR for individual terms in that field, but not
>> phrases, and there is no scoring difference whether a term occurs once or a
>> thousand times in that field for each document. A lot less information
>> needs to be stored in the index.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Ali, Saqib
>> Sent: Wednesday, July 03, 2013 10:31 PM
>> To: solr-user@lucene.apache.org
>> Subject: omitTermFreqAndPositions="true" in easy English, please?
>>
>> Hello,
>>
>> Can anyone please explain omitTermFreqAndPositions="true" to me in easy
>> English, please?
>>
>> Thanks.
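A toy illustration of Jack's explanation for a tokenized text field, written as a conceptual sketch and not as a picture of Lucene's actual index format. One index keeps positions for every term occurrence, the other keeps only which documents contain each term. Both can answer AND queries, but only the first has enough information for a phrase query. The two sample documents are made up.

```python
from collections import defaultdict


def index_with_positions(docs):
    """term -> {doc_id: [positions]}; supports phrases and term frequency."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            postings[term].setdefault(doc_id, []).append(pos)
    return postings


def index_without_freqs_and_positions(docs):
    """term -> {doc_ids}; only records which documents contain a term."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return postings


def match_all_terms(postings, terms):
    """AND query: documents containing every term, in any order."""
    doc_sets = [set(postings.get(term, ())) for term in terms]
    return set.intersection(*doc_sets) if doc_sets else set()


def match_phrase(postings, terms):
    """Phrase query: requires the positional index built above."""
    hits = set()
    for doc_id in match_all_terms(postings, terms):
        starts = postings[terms[0]][doc_id]
        # The phrase matches if the terms sit at consecutive positions.
        if any(all(start + offset in postings[term][doc_id]
                   for offset, term in enumerate(terms))
               for start in starts):
            hits.add(doc_id)
    return hits


docs = {1: "Google Cloud Storage", 2: "Storage Cloud Google"}
query = ["google", "cloud", "storage"]
print(match_all_terms(index_without_freqs_and_positions(docs), query))
print(match_phrase(index_with_positions(docs), query))
```

As Jack points out in his follow-up, none of this applies to a string field, where each value is indexed as a single untokenized term.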
Re: omitTermFreqAndPositions="true" in easy English, please?
Sorry, I meant: do I have to change the query to:

label: (Google AND Cloud AND Storage)

or will Solr add AND / OR behind the scenes?

On Wed, Jul 3, 2013 at 9:59 PM, Ali, Saqib wrote:
> So do I have to change my query to
> label: (Google Cloud Storage) ?
>
> or will Solr add AND / OR behind the scenes?
>
> On Wed, Jul 3, 2013 at 9:54 PM, Jack Krupansky wrote:
>> Yes, but it is simply doing an AND or OR of the individual terms - no
>> phrases or implied ordering of the terms.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Ali, Saqib
>> Sent: Thursday, July 04, 2013 12:52 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: omitTermFreqAndPositions="true" in easy English, please?
>>
>> Jack,
>>
>> Thanks for the explanation!
>>
>> We have a multi-value field as following:
>> multiValued="true"/>
>>
>> Most of these labels are two or more letter phrase e.g.
>> 1) Google Reader
>> 2) Google Mail
>> 3) Google Cloud Storage
>>
>> etc. etc.
>>
>> if we add omitTermFreqAndPositions="true" to this field:
>> multiValued="true" omitTermFreqAndPositions="true"/>
>>
>> Will we be able to execute queries like:
>> label: (Google Cloud Storage) ?
>>
>> Thanks.
>>
>> On Wed, Jul 3, 2013 at 8:23 PM, Jack Krupansky wrote:
>>> If you have a text field and simply want to be able to query whether
>>> individual terms are present in the text without needing to know either how
>>> frequently the terms occur or that some terms may be in present in phrases.
>>> So, you can do AND and OR for individual terms in that field, but not
>>> phrases, and there is no scoring difference whether a term occurs once or a
>>> thousand times in that field for each document. A lot less information
>>> needs to be stored in the index.
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Ali, Saqib
>>> Sent: Wednesday, July 03, 2013 10:31 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: omitTermFreqAndPositions="true" in easy English, please?
>>>
>>> Hello,
>>>
>>> Can anyone please explain omitTermFreqAndPositions="true" to me in easy
>>> English, please?
>>>
>>> Thanks.
Re: omitTermFreqAndPositions="true" in easy English, please?
So do I have to change my query to

label: (Google Cloud Storage) ?

or will Solr add AND / OR behind the scenes?

On Wed, Jul 3, 2013 at 9:54 PM, Jack Krupansky wrote:
> Yes, but it is simply doing an AND or OR of the individual terms - no
> phrases or implied ordering of the terms.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Ali, Saqib
> Sent: Thursday, July 04, 2013 12:52 AM
> To: solr-user@lucene.apache.org
> Subject: Re: omitTermFreqAndPositions="true" in easy English, please?
>
> Jack,
>
> Thanks for the explanation!
>
> We have a multi-value field as following:
> multiValued="true"/>
>
> Most of these labels are two or more letter phrase e.g.
> 1) Google Reader
> 2) Google Mail
> 3) Google Cloud Storage
>
> etc. etc.
>
> if we add omitTermFreqAndPositions="true" to this field:
> multiValued="true" omitTermFreqAndPositions="true"/>
>
> Will we be able to execute queries like:
> label: (Google Cloud Storage) ?
>
> Thanks.
>
> On Wed, Jul 3, 2013 at 8:23 PM, Jack Krupansky wrote:
>> If you have a text field and simply want to be able to query whether
>> individual terms are present in the text without needing to know either how
>> frequently the terms occur or that some terms may be in present in phrases.
>> So, you can do AND and OR for individual terms in that field, but not
>> phrases, and there is no scoring difference whether a term occurs once or a
>> thousand times in that field for each document. A lot less information
>> needs to be stored in the index.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Ali, Saqib
>> Sent: Wednesday, July 03, 2013 10:31 PM
>> To: solr-user@lucene.apache.org
>> Subject: omitTermFreqAndPositions="true" in easy English, please?
>>
>> Hello,
>>
>> Can anyone please explain omitTermFreqAndPositions="true" to me in easy
>> English, please?
>>
>> Thanks.
Re: omitTermFreqAndPositions="true" in easy English, please?
Jack, Thanks for the explanation! We have a multi-value field as following: Most of these labels are phrases of two or more words, e.g. 1) Google Reader 2) Google Mail 3) Google Cloud Storage etc. If we add omitTermFreqAndPositions="true" to this field: will we be able to execute queries like: label: (Google Cloud Storage) ? Thanks. On Wed, Jul 3, 2013 at 8:23 PM, Jack Krupansky wrote: > Use it if you have a text field and simply want to query whether individual terms are present in the text, without needing to know either how frequently the terms occur or whether terms appear in phrases. So, you can do AND and OR for individual terms in that field, but not phrases, and there is no scoring difference whether a term occurs once or a thousand times in that field for each document. A lot less information needs to be stored in the index. > > -- Jack Krupansky > > -----Original Message----- From: Ali, Saqib > Sent: Wednesday, July 03, 2013 10:31 PM > To: solr-user@lucene.apache.org > Subject: omitTermFreqAndPositions="true" in easy English, please? > > Hello, > > Can anyone please explain omitTermFreqAndPositions="true" to me in easy English, please? > > Thanks.
Re: Use case indexed="false" stored="false" field
Thank you Shawn for the excellent use case. :) On Wed, Jul 3, 2013 at 9:34 AM, Shawn Heisey wrote: > On 7/3/2013 9:22 AM, Ali, Saqib wrote: > >> What would be the use case for such a field: >> >> > stored="false"/> >> >> >> and >> >> > stored="false"/> >> > > I have a field like this in my schema. That field is used as one of the > source fields that get copied to my "catchall" field. I don't need the > field by itself, but I use it in conjunction with other fields. > > If I can get the app developers to switch over to using edismax more, I > will get rid of the catchall field and then set that field to indexed and > not stored. > > Thanks, > Shawn > >
Re: unused fields in Solr schema.xml increase the index size
Thanks Jack! That was very helpful. On Wed, Jul 3, 2013 at 9:54 AM, Jack Krupansky wrote: > If never used, they take up zero space in the index. > > If they were used but are no longer used, they're still there, but any new or replaced documents will not take up any space for the unused fields (subject to the fact that deleted fields still exist until a merge/optimize compresses them away.) > > But, yes, you should try to keep your schema clean - but if the fields are still populated in some of the documents, you might eventually find some need to reference them. > > You should keep your schema and config files in a version control system so that you can always go back or view differences. > > -- Jack Krupansky > > -----Original Message----- From: Ali, Saqib > Sent: Wednesday, July 03, 2013 11:55 AM > To: solr-user@lucene.apache.org > Subject: unused fields in Solr schema.xml increase the index size > > Hello all, > > Do unused fields in Solr schema.xml increase the size of the index files? > > Should we be cleaning up those fields? > > Thanks. > > Saqib
omitTermFreqAndPositions="true" in easy English, please?
Hello, Can anyone please explain omitTermFreqAndPositions="true" to me in easy English, please? Thanks.
Moving from single Solr instance to Solr Cloud
We have a single Solr instance with a lot of indexed documents. Now we would like to move to a SolrCloud implementation. Can we move the existing index to SolrCloud? If so, how? Or do we need to reindex our data in SolrCloud? Thanks, Saqib
unused fields in Solr schema.xml increase the index size
Hello all, Do unused fields in Solr schema.xml increase the size of the index files? Should we be cleaning up those fields? Thanks. Saqib
Re: Use case indexed="false" stored="false" field
Very interesting. Thank you all for the explanation! :) On Wed, Jul 3, 2013 at 8:32 AM, Jack Krupansky wrote: > Setting both indexed and stored to false means to ignore input values for that field. > > The effective use case is that these fields may have values in the update input stream and they will be ignored. Without these field definitions, those same field values would cause exceptions - references to undefined fields. In other words, you are telling Solr that it is okay to have inputs for these fields - simply ignore them. > > But... you could still have update processors that look at the values of "ignored" fields and maybe assign them to other, non-ignored fields. > > -- Jack Krupansky > > -----Original Message----- From: Ali, Saqib > Sent: Wednesday, July 03, 2013 11:22 AM > To: solr-user@lucene.apache.org > Subject: Use case indexed="false" stored="false" field > > Hello all, > > What would be the use case for such a field: > > stored="false"/> > > and > > ? > > Thanks.
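A hedged schema.xml sketch of the "ignore this input" pattern described above. The names ignored and legacy_notes are illustrative, not taken from the thread; the stock Solr example schema uses a similar ignored type, but treat the details here as assumptions.

```xml
<!-- A type that is neither indexed nor stored: values sent for such fields are accepted and dropped. -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>

<!-- Hypothetical field that may still arrive in update requests but should not be kept. -->
<field name="legacy_notes" type="ignored"/>

<!-- Optional catch-all: accept and drop any field name not declared elsewhere. -->
<dynamicField name="*" type="ignored" multiValued="true"/>
```

Without a matching field or dynamicField, a document carrying legacy_notes would be rejected as referencing an undefined field; with it, the value is silently discarded unless an update processor or copyField makes use of it.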
Use case indexed="false" stored="false" field
Hello all, What would be the use case for such a field: and ? Thanks.
Re: copyField and storage requirements
Thanks Shawn. Here is the text_general type definition. We would like to bring the storage requirement down to a minimum for those 500KB content documents. We just need basic full-text search. Thanks!!! :) On Tue, Jul 2, 2013 at 11:35 AM, Shawn Heisey wrote: > On 7/2/2013 12:22 PM, Ali, Saqib wrote: > > Newbie question: > > > > We have the following fields defined in the schema: > > > > The content field is about 500KB of data. > > > > My question is whether Solr stores the entire contents of that 500KB content field? > > > > We want to minimize the stored data in the Solr index, that is why we added the copyField teaser. > > With that config, the entire 500KB will not be _stored_ .. but it will affect the index size because you are indexing it. Exactly what degree that will be depends on the definition of the text_general type. > > Thanks, > Shawn
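The field definitions in this thread did not come through, so the following is only a sketch of one common arrangement that matches Shawn's description: the large field is indexed but not stored, and a short stored copy is kept for display. The field names content and teaser and the type text_general appear in the thread; the string type for teaser, the indexed/stored values and the maxChars limit of 300 are assumptions.

```xml
<!-- Searchable, but the 500KB original is not kept as a stored value. -->
<field name="content" type="text_general" indexed="true" stored="false"/>

<!-- Short stored copy for display only; not searchable in this sketch. -->
<field name="teaser" type="string" indexed="false" stored="true"/>

<!-- Copy at most the first 300 characters of the incoming content value into teaser. -->
<copyField source="content" dest="teaser" maxChars="300"/>
```

In this layout the stored data per document is bounded by maxChars, while the index size still grows with the full content because every term in it is indexed.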
copyField and storage requirements
Newbie question: We have the following fields defined in the schema: The content field is about 500KB of data. My question is whether Solr stores the entire contents of that 500KB content field? We want to minimize the stored data in the Solr index, that is why we added the copyField teaser. Thanks Saqib
Re: Storing Solr Index on NFS
Hello Walter, Thanks for the response. That has been my experience in the past as well. But I was wondering if there are new things in Solr 4 and NFS 4.1 that make storing indexes on an NFS mount feasible. Thanks, Saqib On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wrote: > On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote: > > > Greetings, > > > > Are there any issues with storing Solr indexes on an NFS share? Also any recommendations for using NFS for Solr indexes? > > I recommend that you do not put Solr indexes on NFS. > > It can be very slow; I measured indexing as 100X slower on NFS a few years ago. > > It is not safe to share Solr index files between two Solr servers, so there is no benefit to NFS. > > wunder > -- > Walter Underwood > wun...@wunderwood.org
Storing Solr Index on NFS
Greetings, Are there any issues with storing Solr indexes on an NFS share? Also any recommendations for using NFS for Solr indexes? Thanks, Saqib
Re: secure deployment of solr.war on jboss
Thanks. Are you using an iptables firewall on the JBoss host to prevent access from other systems? Or are you using some JBoss configuration for that? Thanks, Saqib On Mon, Apr 1, 2013 at 6:25 AM, adityab wrote: > Hi Ali, > > We have Solr 4.2 on JBoss running on a separate VM behind a firewall. Only IT Administration and our front-end application server are able to access the Solr servers in production.
secure deployment of solr.war on jboss
Hello all, We are using Apache Solr 4.2 in our application to provide search capabilities. We are deploying the solr.war file to JBoss along with our application. Any suggestions on proper security controls for this type of Solr setup? Also, Solr is now accessible to everyone at the http://jboss_host/solr URL. How can we prevent /solr/ from being accessible by all IP addresses? We would like to restrict access to certain IP addresses, namely the jboss_host and a couple of other management API hosts. Any help will be much appreciated. Thanks, Saqib
Re: What is the graceful shutdown API for Solrj embedded?
Hello Alex, I asked a similar question on Server Fault: http://serverfault.com/a/474442/156440 On Wed, Feb 6, 2013 at 7:05 PM, Alexandre Rafalovitch wrote: > Hello, > > When I CTRL-C the example Solr, it prints a bunch of graceful shutdown messages. I assume it shuts down safely and without corruption issues. > > When I do that to Solrj (embedded, not remote), it just drops dead. > > I found CoreContainer.shutdown(), which looks about right and does terminate Solrj, but it prints out a completely different set of messages. > > Is CoreContainer.shutdown() the right method for Solrj (4.1)? Is there more than just one call? > > And what happens if you just Ctrl-C a Solrj instance? The wiki says nothing about shutdown, so I can imagine a lot of people probably think it is ok to just kill it. Is there a danger of corruption? > > Regards, > Alex.
Re: Configuring the jetty shipped with Solr
Thanks Alex. I was able to bind jetty to 127.0.0.1 so that it only accepts connections from localhost using the following: But how do I set it so that it can accept connections from certain non-localhost IP addresses as well? Thanks. On Mon, Feb 4, 2013 at 5:06 PM, Alexandre Rafalovitch wrote: > I believe, for the example directory (as in relative to start.jar), the contexts directory has the url mapping to solr (/solr), etc has some global jetty properties, and solr-webapp/webapp/WEB-INF contains some of Solr's specific jetty configuration. > > Beware that the last one, however, is a decompressed version of webapps/solr.war. I don't know if it ever gets overridden after the first time it is decompressed or not. > > No idea where the actual IP address directive is, though. > > Regards, > Alex. > > On Mon, Feb 4, 2013 at 6:41 PM, Ali, Saqib wrote: > > > Hello all, > > > > How do I change the configuration for the Jetty that is shipped with Apache Solr? Where are the configuration files located? I want to restrict the IP addresses that can connect to that instance of Solr. > > > > Thanks, > > Saqib
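A hedged sketch of how a Jetty connector is typically bound to a single interface in etc/jetty.xml. This is not necessarily the snippet used in the message above; the surrounding connector element and its class depend on the Jetty version shipped with Solr, and the 127.0.0.1 default shown here is an assumption for illustration.

```xml
<!-- Sketch only: goes inside the existing connector definition in etc/jetty.xml. -->
<!-- Binds to loopback unless -Djetty.host=... is passed on the command line. -->
<Set name="host"><SystemProperty name="jetty.host" default="127.0.0.1"/></Set>
```

Note that the host setting chooses which local interface Jetty listens on, not which clients may connect. Allowing only a few remote addresses therefore generally means listening on a reachable interface and filtering clients elsewhere, for example with firewall rules on the host or with a Jetty handler such as IPAccessHandler; both are pointers to verify against the installed versions rather than tested configurations.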
Configuring the jetty shipped with Solr
Hello all, How do I change the configuration for the Jetty that is shipped with Apache Solr? Where are the configuration files located? I want to restrict the IP addresses that can connect to that instance of Solr. Thanks, Saqib