Re: MongoDB and Solr
On 30 May 2012 03:51, rjain15 wrote: > Hi Gora, > > I am working on a Mobile App, which is updating/accessing/searching data and > I have created a simple prototype using Solr and the Update JSON / Get JSON > functions of Solr. > > I came across some discussion on MongoDB and how it natively stores JSON > data, and hence as I was looking at scalability of data storage/indexing, I > was pausing to understand if I am on the right track of just using Solr or > should I combine Solr with MongoDB as I am reading this blog post... [...] A discussion on web architecture is off-topic for this list, and will also probably draw in people with strong opinions. Here is a brief personal opinion, but you are probably better off trying out a couple of different architectural prototypes, and/or talking to someone with experience in scalable sites. First of all, you should consider whether you really need a NoSQL store. This would depend on the scale, and requirements of your app. IMHO, RDBMSes now are proven systems with many years of learning behind them. Thus, your question should be why NoSQL, rather than the other way around. Solr for search should do fine, and you already know how to get JSON in and out of it. Incidentally, we also tested out Solr as a NoSQL store (raw data, and not JSON, though), and were quite happy with the performance. Regards, Gora
RE: useFastVectorHighlighter doesn't work
Thanks a lot, it's quite clear now. -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: 29 May 2012 16:37 To: solr-user@lucene.apache.org Subject: RE: useFastVectorHighlighter doesn't work > So for highlight, stored="true" is > required in any circumstance, right? Exactly. http://wiki.apache.org/solr/FieldOptionsByUseCase
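Beyond stored="true", the FastVectorHighlighter also needs term vectors with positions and offsets on the highlighted field. A minimal schema.xml sketch — the field and type names here are placeholders, not from the thread:

```xml
<!-- schema.xml: a field usable with useFastVectorHighlighter must be
     stored and carry term vectors with positions and offsets -->
<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Without the three termVector attributes, Solr silently falls back to the standard highlighter for that field.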
Re: MongoDB and Solr
Solr does not natively store/index/search arbitrary JSON documents. It accepts JSON in a specific format for document input. wunder On May 29, 2012, at 3:21 PM, rjain15 wrote: > Hi Gora, > > I am working on a Mobile App, which is updating/accessing/searching data and > I have created a simple prototype using Solr and the Update JSON / Get JSON > functions of Solr. > > I came across some discussion on MongoDB and how it natively stores JSON > data, and hence as I was looking at scalability of data storage/indexing, I > was pausing to understand if I am on the right track of just using Solr or > should I combine Solr with MongoDB as I am reading this blog post... > > http://blog.knuthaugen.no/2010/04/cooking-with-mongodb-and-solr.html > http://blog.knuthaugen.no/2010/04/cooking-with-mongodb-and-solr.html > > Maybe this is an incorrect question, as you say -- MongoDB might be an > entirely different beast. > > Apologies for a novice question. My point was, for Mobile / Consumer Web > Apps -- what are the architectural considerations. I don't want it to be a > overkill, hence if solr can natively store/index/search json documents, then > that is the solution I can build on top of. > > > Thanks > Rajesh >
Re: MongoDB and Solr
Hi Gora, I am working on a mobile app which is updating/accessing/searching data, and I have created a simple prototype using Solr and the Update JSON / Get JSON functions of Solr. I came across some discussion of MongoDB and how it natively stores JSON data, and so, as I was looking at the scalability of data storage/indexing, I paused to consider whether I am on the right track just using Solr, or whether I should combine Solr with MongoDB, as I am reading this blog post... http://blog.knuthaugen.no/2010/04/cooking-with-mongodb-and-solr.html Maybe this is an incorrect question, as you say -- MongoDB might be an entirely different beast. Apologies for a novice question. My point was: for mobile / consumer web apps, what are the architectural considerations? I don't want it to be overkill, so if Solr can natively store/index/search JSON documents, then that is the solution I can build on top of. Thanks Rajesh -- View this message in context: http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986636p3986729.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to reduce the result size to 2-3 lines and expand based on user interest
hi iorixxx, Sorry I missed your reply. Let me put my requirement another way. I have a description field which holds longer text (2-3 paragraphs), and it is indexed. When a user searches for any word and Solr finds that word in the description, I want to show roughly 2-3 lines of the content that match the search word. Any ideas how to do this? Thanks in advance!!! Srini -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-reduce-the-result-size-to-2-3-lines-and-expand-based-on-user-interest-tp3985692p3986727.html Sent from the Solr - User mailing list archive at Nabble.com.
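This is the kind of thing Solr's highlighting component is usually used for. A hedged example request — the description field name comes from the question, while the snippet count and size are illustrative:

```
http://localhost:8983/solr/select?q=description:word
    &hl=true&hl.fl=description&hl.snippets=3&hl.fragsize=120
```

hl.fragsize controls the approximate snippet length in characters, and hl.snippets the number of fragments returned per field, which together approximate the "2-3 matching lines" requirement.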
Re: Multi-words synonyms matching
I recently have had the same use case. I wound up doing this: in both index and query time, the synonyms file is 'expand=false'. All multi-word synonyms map to one single-word synonym (per group). This way, only the main word is indexed or queried. If the synonym file changes, you have to re-index the matching content. On Tue, May 29, 2012 at 1:27 PM, elisabeth benoit wrote: > Hello Bernd, > > Thanks a lot for your answer. I'll work on this. > > Best regards, > Elisabeth > > 2012/5/29 Bernd Fehling > >> Hello Elisabeth, >> >> my synonyms.txt is like your 2nd example: >> >> naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd, >> foresta\ naturale, natuurbos, natural\ forest, bosque\ natural, >> természetes\ erdő, >> natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs, floresta\ natural, >> naturskov, >> forêt\ naturelle, naturskog, přírodní\ les, luonnonmetsä, pădure\ naturală, >> las\ naturalny, natürlicher\ wald >> >> >> An example from my system with debugging turned on and searching for >> "naturwald": >> >> >> naturwald >> naturwald >> textth:naturwald textth:"φυσικό δάσος" >> textth:"естествена гора" >> textth:"prírodný les" textth:"naravni gozd" textth:"foresta naturale" >> textth:natuurbos >> textth:"natural forest" textth:"bosque natural" textth:"természetes erdő" >> textth:"natūralus miškas" textth:"prirodna šuma" textth:"dabiskais mežs" >> textth:"floresta natural" textth:naturskov textth:"forêt naturelle" >> textth:naturskog >> textth:"přírodní les" textth:luonnonmetsä textth:"pădure naturală" >> textth:"las naturalny" >> textth:"natürlicher wald" >> ... >> >> As you can see my search for "naturwald" extends to single and multiword >> synonyms e.g. 
"forêt naturelle" >> >> >> My SynonymFilterFactory has the following settings: >> >> org.apache.solr.analysis.SynonymFilterFactory >> {tokenizerFactory=solr.KeywordTokenizerFactory, >> synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr, >> ignoreCase=true, >> luceneMatchVersion=LUCENE_36} >> >> But as I already mentioned, there is much more work to be done to get it >> running than >> just using SynonymFilterFactory. >> >> Regards >> Bernd >> >> >> >> Am 23.05.2012 08:49, schrieb elisabeth benoit: >> > Hello Bernd, >> > >> > Thanks for your advice. >> > >> > I have one question: how did you manage to map one word to a multiwords >> > synonym??? >> > >> > I've tried (in synonyms.txt) >> > >> > mairie, hotel de ville >> > >> > mairie, hotel\ de\ ville >> > >> > mairie => mairie, hotel de ville >> > >> > mairie => mairie, hotel\ de\ ville >> > >> > but nothing prevents mairie from matching with "hotel"... >> > >> > The only way I found is to use >> > tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms >> declaration >> > in schema.xml, but then since "mairie" is not alone in my index field, it >> > doesn't match. >> > >> > >> > best regards, >> > Elisabeth >> > >> > >> > >> > >> > the only way I found, I schema.xml, is to use >> > >> > >> > >> > 2012/5/15 Bernd Fehling >> > >> >> Without reading the whole thread let me say that you should not trust >> >> the solr admin analysis. It takes the whole multiword search and runs >> >> it all together at once through each analyzer step (factory). >> >> But this is not how the real system works. First pitfall, the query >> parser >> >> is also splitting at white space (if not a phrase query). Due to this, >> >> a multiword query is send chunk after chunk through the analyzer and, >> >> second pitfall, each chunk runs through the whole analyzer by its own. >> >> >> >> So if you are dealing with multiword synonyms you have the following >> >> problems. 
Either you turn your query into a phrase so that the whole >> >> phrase is analyzed at once and therefore looked up as multiword synonym >> >> but phrase queries are not analyzed !!! OR you send your query chunk >> >> by chunk through the analyzer but then they are not multiwords anymore >> >> and are not found in your synonyms.txt. >> >> >> >> From my experience I can say that it requires some deep work to get it >> done >> >> but it is possible. I have connected a thesaurus to solr which is doing >> >> query time expansion (no need to reindex if the thesaurus changes). >> >> The thesaurus holds synonyms and "used for terms" in 24 languages. So >> >> it is also some kind of language translation. And naturally the >> thesaurus >> >> translates from single term to multi term synonyms and vice versa. >> >> >> >> Regards, >> >> Bernd >> >> >> >> >> >> Am 14.05.2012 13:54, schrieb elisabeth benoit: >> >>> Just for the record, I'd like to conclude this thread >> >>> >> >>> First, you were right, there was no behaviour difference between fq >> and q >> >>> parameters. >> >>> >> >>> I realized that: >> >>> >> >>> 1) my synonym (hotel de ville) has a stopword in it (de) and since I >> used >> >>> tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms >> >> declaration, >> >>> there was no stopword removal
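The approach described at the top of this message — expand=false, with every multi-word variant collapsing onto one single main word — would look roughly like this in synonyms.txt (terms borrowed from the thread; whether the whitespace needs backslash escaping depends on the tokenizerFactory in use, as discussed here):

```
# expand=false: every term in a group is replaced by the first term,
# so the multi-word variants collapse onto the single main word
mairie, hotel\ de\ ville
```

With this file applied at both index and query time, only "mairie" ever reaches the index or the query, so multi-word matching never has to happen at search time.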
Re: Multi-words synonyms matching
Hello Bernd, Thanks a lot for your answer. I'll work on this. Best regards, Elisabeth 2012/5/29 Bernd Fehling > Hello Elisabeth, > > my synonyms.txt is like your 2nd example: > > naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd, > foresta\ naturale, natuurbos, natural\ forest, bosque\ natural, > természetes\ erdő, > natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs, floresta\ natural, > naturskov, > forêt\ naturelle, naturskog, přírodní\ les, luonnonmetsä, pădure\ naturală, > las\ naturalny, natürlicher\ wald > > > An example from my system with debugging turned on and searching for > "naturwald": > > > naturwald > naturwald > textth:naturwald textth:"φυσικό δάσος" > textth:"естествена гора" > textth:"prírodný les" textth:"naravni gozd" textth:"foresta naturale" > textth:natuurbos > textth:"natural forest" textth:"bosque natural" textth:"természetes erdő" > textth:"natūralus miškas" textth:"prirodna šuma" textth:"dabiskais mežs" > textth:"floresta natural" textth:naturskov textth:"forêt naturelle" > textth:naturskog > textth:"přírodní les" textth:luonnonmetsä textth:"pădure naturală" > textth:"las naturalny" > textth:"natürlicher wald" > ... > > As you can see my search for "naturwald" extends to single and multiword > synonyms e.g. "forêt naturelle" > > > My SynonymFilterFactory has the following settings: > > org.apache.solr.analysis.SynonymFilterFactory > {tokenizerFactory=solr.KeywordTokenizerFactory, > synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr, > ignoreCase=true, > luceneMatchVersion=LUCENE_36} > > But as I already mentioned, there is much more work to be done to get it > running than > just using SynonymFilterFactory. > > Regards > Bernd > > > > Am 23.05.2012 08:49, schrieb elisabeth benoit: > > Hello Bernd, > > > > Thanks for your advice. > > > > I have one question: how did you manage to map one word to a multiwords > > synonym??? 
> > > > I've tried (in synonyms.txt) > > > > mairie, hotel de ville > > > > mairie, hotel\ de\ ville > > > > mairie => mairie, hotel de ville > > > > mairie => mairie, hotel\ de\ ville > > > > but nothing prevents mairie from matching with "hotel"... > > > > The only way I found is to use > > tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms > declaration > > in schema.xml, but then since "mairie" is not alone in my index field, it > > doesn't match. > > > > > > best regards, > > Elisabeth > > > > > > > > > > the only way I found, I schema.xml, is to use > > > > > > > > 2012/5/15 Bernd Fehling > > > >> Without reading the whole thread let me say that you should not trust > >> the solr admin analysis. It takes the whole multiword search and runs > >> it all together at once through each analyzer step (factory). > >> But this is not how the real system works. First pitfall, the query > parser > >> is also splitting at white space (if not a phrase query). Due to this, > >> a multiword query is send chunk after chunk through the analyzer and, > >> second pitfall, each chunk runs through the whole analyzer by its own. > >> > >> So if you are dealing with multiword synonyms you have the following > >> problems. Either you turn your query into a phrase so that the whole > >> phrase is analyzed at once and therefore looked up as multiword synonym > >> but phrase queries are not analyzed !!! OR you send your query chunk > >> by chunk through the analyzer but then they are not multiwords anymore > >> and are not found in your synonyms.txt. > >> > >> From my experience I can say that it requires some deep work to get it > done > >> but it is possible. I have connected a thesaurus to solr which is doing > >> query time expansion (no need to reindex if the thesaurus changes). > >> The thesaurus holds synonyms and "used for terms" in 24 languages. So > >> it is also some kind of language translation. 
And naturally the > thesaurus > >> translates from single term to multi term synonyms and vice versa. > >> > >> Regards, > >> Bernd > >> > >> > >> Am 14.05.2012 13:54, schrieb elisabeth benoit: > >>> Just for the record, I'd like to conclude this thread > >>> > >>> First, you were right, there was no behaviour difference between fq > and q > >>> parameters. > >>> > >>> I realized that: > >>> > >>> 1) my synonym (hotel de ville) has a stopword in it (de) and since I > used > >>> tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms > >> declaration, > >>> there was no stopword removal in the indewed expression, so when > >> requesting > >>> "hotel de ville", after stopwords removal in query, Solr was comparing > >>> "hotel de ville" > >>> with "hotel ville" > >>> > >>> but my queries never even got to that point since > >>> > >>> 2) I made a mistake using "mairie" alone in the admin interface when > >>> testing my schema. The real field was something like "collectivités > >>> territoriales mairie", > >>> so the synonym "hotel de ville" was not even applied, because of the > >>> tokenizerFactory="solr.Keyw
Re: Many Cores with Solr
That was one of my concerns. To date I've been using Lucene directly and pointing it at an index for the current authenticated user; Solr cores seemed to come close to that. Is the issue with a lot of cores just in creating them, or in using many cores concurrently? Erik Hatcher-4 wrote > > You do get relevancy related "leakage" though. With users content all in > the same index and using the same field names, term and document > frequencies across the index will be used for scoring. This may be (and > has been) a good reason to keep separately searchable content in different > indexes/cores. > -- View this message in context: http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986710.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: suggestions developing a multi-version concurrency control (MVCC) mechanism
Solr uses a flat schema. You can store old versions, but you have to encode them somehow and save them as data. On Tue, May 29, 2012 at 7:20 AM, Nicholas Ball wrote: > > Hmmm interesting, that will definitely work and may be the way to go. > Ideally, I'd rather store the older versions within a field of the newest > if possible. > Can one create a custom field that holds other objects? > > Nick > > On Mon, 28 May 2012 17:07:06 -0700, Lance Norskog > wrote: >> You can use the document id and timestamp as a compound unique id. >> Then the search would also sort by id, then by timestamp. Result >> grouping might let you pick the most recent document from each of the >> sorted docs. >> >> On Mon, May 28, 2012 at 3:15 PM, Nicholas Ball >> wrote: >>> >>> Hello all, >>> >>> For the first step of the distributed snapshot isolation system I'm >>> developing for Solr, I'm going to need to have a MVCC mechanism as >>> opposed >>> to the single-version concurrency control mechanism already developed >>> (DistributedUpdateProcessor class). I'm trying to find the very best > way >>> to >>> develop this into Solr 4.x (trunk) and so any help would be greatly >>> appreciated! >>> >>> Essentially I need to be able to store multiple version of a document > so >>> that when you look up a document with a given timestamp, you're given > the >>> correct version (anything the same or older, not fresher). The older >>> versioned documents need to be stored in the index itself to ensure > they >>> are durable and can be manipulated as other Solr data can be. >>> >>> One way to do this is to store the old versioned Solr documents within >>> the >>> latest Solr Document, but I'm not sure this is even possible? >>> Alternatively, I could have the latest versioned Document store the >>> unique >>> keys which point to other older documents. The problem with this is > that >>> it >>> complicates things having various partial objects which all combine as >>> one >>> logically document. 
>>> >>> Are there any suggestions as to the best way to develop this feature? >>> >>> Thank you in advance for any help you can spare! >>> >>> Nicholas -- Lance Norskog goks...@gmail.com
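Lance's compound-key idea — querying at or below a snapshot timestamp and letting result grouping pick the newest version per logical document — could be sketched as a query like this (the field names base_id and timestamp are hypothetical, and result grouping requires a Solr version that supports it):

```
q=*:*&fq=timestamp:[* TO 2012-05-28T00:00:00Z]
     &group=true&group.field=base_id
     &group.sort=timestamp desc&group.limit=1
```

group.sort orders documents within each group, so group.limit=1 keeps only the most recent version at or before the requested snapshot time.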
Re: Many Cores with Solr
In our particular case, we're using this index to do prefix searches for autocomplete of sparse keyword data, so we don't have much to worry about on this front, but I do agree that it's a consideration for those use cases that do reveal information via ranking. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Tue, May 29, 2012 at 4:00 PM, Erik Hatcher wrote: > You do get relevancy related "leakage" though. With users content all in the > same index and using the same field names, term and document frequencies > across the index will be used for scoring. This may be (and has been) a good > reason to keep separately searchable content in different indexes/cores. > > Erik > > > On May 29, 2012, at 15:07 , Mike Douglass wrote: > >> Thank you. >> >> That sounds good - are we sure to get no leakage with this approach? >> >> I'd be indexing personal information which must not be delivered without >> authentication. >> >> The solr instance is front-ended by bedework which can handle the auth and >> adding a query term. >> >>> IMO it would be a better (from Solr's perspective) to handle the security >>> w/ the application code. Each query could include a "?fq=userID:12345..." >>> which would limit results to only what that user is allowed to see. >> >> >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986675.html >> Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Many Cores with Solr
You do get relevancy related "leakage" though. With users content all in the same index and using the same field names, term and document frequencies across the index will be used for scoring. This may be (and has been) a good reason to keep separately searchable content in different indexes/cores. Erik On May 29, 2012, at 15:07 , Mike Douglass wrote: > Thank you. > > That sounds good - are we sure to get no leakage with this approach? > > I'd be indexing personal information which must not be delivered without > authentication. > > The solr instance is front-ended by bedework which can handle the auth and > adding a query term. > >> IMO it would be a better (from Solr's perspective) to handle the security >> w/ the application code. Each query could include a "?fq=userID:12345..." >> which would limit results to only what that user is allowed to see. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986675.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Many Cores with Solr
It's a similar approach as using SQL to filter the rows brought back for a particular user from a table. It's strong as long as you write your queries correctly, you store your data properly, and you guard against injection and privilege escalation. There's an added bonus in this case in that the user's submitted text isn't in the same query as the part that limits the rows they have access to, but if you're doing proper escaping of the query text, that shouldn't be relied on anyway. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Tue, May 29, 2012 at 3:07 PM, Mike Douglass wrote: > Thank you. > > That sounds good - are we sure to get no leakage with this approach? > > I'd be indexing personal information which must not be delivered without > authentication. > > The solr instance is front-ended by bedework which can handle the auth and > adding a query term.
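In request form, the separation described above keeps the user's text in q and the access restriction in a separate fq (field name and id are illustrative):

```
/solr/select?q=quarterly+report&fq=userID:12345
```

Since fq is a distinct parameter, the user-supplied query string never has to carry the security clause, and the filter is cached independently of the main query.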
Re: MongoDB and Solr
On 29 May 2012 22:27, rjain15 wrote: > Hi > > I am building web app/mobile app, where users can update information > frequently and there is a search function to quick search the information > using different types of searches. > > Most of the data is going to be posted in JSON Format and stored in JSON > format > > I have a few questions on the architecture choices, I am relatively new to > Solr and MongoDB. > > 1. Should I use MongoDB to store the JSON documents, or does Solr natively > store the documents in the data directory Sorry, but you do not provide nearly enough information for people to be able to make sensible suggestions. What is your use case? MongoDB is largely a different beast from Solr. What do you think merits its use, and where does it fit in your scheme of things? In many cases, one could have both MongoDB, and Solr. In other cases, one or the other might better fit the bill. > 2. Does Solr require a specific schema for the JSON document. You can POST a JSON document to Solr, and get JSON output back. Not sure if this meets your needs, but please take a look at: http://wiki.apache.org/solr/UpdateJSON http://wiki.apache.org/solr/SolJSON Regards, Gora
Re: MongoDB and Solr
Hi This is a sample schema, but it can be more nested as I build the app. As more students enroll, or more classes are added, it will grow.

colleges: [
  "college": {
    "id": "college Id",
    "classes": [
      {
        "id": "0001",
        "type": "speech",
        "name": "Speech Class",
        "credits": 3,
        "students": [
          { "id": "1001", "name": "ABC" },
          { "id": "1002", "name": "PQQ", ... },
          { "id": "1003", "name": "AAA", ... },
          { "id": "1004", "name": "ASA", ... }
        ],
        "instructors": [
          { "id": "5001", "name": "ASAS" },
          { "id": "5002", "name": "ASAA" }
        ]
      }
    ],
    "locations": [
      { "id": "6001", "address": "Address-1" },
      { "id": "6001", "address": "Address-2" }
    ]
  }
]

-- View this message in context: http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986676.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Many Cores with Solr
Thank you. That sounds good - are we sure to get no leakage with this approach? I'd be indexing personal information which must not be delivered without authentication. The solr instance is front-ended by bedework which can handle the auth and adding a query term. > IMO it would be a better (from Solr's perspective) to handle the security > w/ the application code. Each query could include a "?fq=userID:12345..." > which would limit results to only what that user is allowed to see. -- View this message in context: http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986675.html Sent from the Solr - User mailing list archive at Nabble.com.
Example setup of using Solr 3.6.0 with Jetty 7 (7.6.3)?
Greetings, Has anybody gotten Solr 3.6.0 to work well with Jetty 7.6.3, and if so, would you mind sharing your config files / directory structure / other useful details? Thanks, Aaron
Re: MongoDB and Solr
Could you give us an example of one of your documents. Then we can give you better feedback on what makes sense within Solr. -- Jack Krupansky -Original Message- From: rjain15 Sent: Tuesday, May 29, 2012 2:20 PM To: solr-user@lucene.apache.org Subject: Re: MongoDB and Solr Hi Jack Thanks for the information. I do have multi-level nesting of JSON data. So back to my questions, apologize for repeating... 1. Should I use MongoDB to store the JSON documents, or does Solr natively store the documents in the data directory 2. Does Solr require a specific schema for the JSON document. Thanks Rajesh -- View this message in context: http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986662.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: MongoDB and Solr
1. Yes, and 2. Yes. :) Solr's adding more NoSQL-like features for 4.0, but in the meantime, you're better off storing documents with a complex schema in a document store and using Solr for findability. Basically the schema for a document in Solr/Lucene is flat (although it can contain arbitrarily-named fields), so your document will require some sort of transformation for indexing. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Tue, May 29, 2012 at 2:20 PM, rjain15 wrote: > Hi Jack > > Thanks for the information. I do have multi-level nesting of JSON data. > > So back to my questions, apologize for repeating... > > 1. Should I use MongoDB to store the JSON documents, or does Solr natively > store the documents in the data directory > > 2. Does Solr require a specific schema for the JSON document. > > Thanks > Rajesh > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986662.html > Sent from the Solr - User mailing list archive at Nabble.com.
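A rough illustration of the kind of flattening transformation Michael mentions — this is not a Solr API, just a sketch assuming dotted field names for nested objects and multi-valued fields for repeated children:

```python
def flatten(doc, prefix=""):
    """Flatten a nested JSON-style dict into Solr's flat field model.

    Nested dicts become dotted field names; lists of dicts are flattened
    per element into multi-valued fields; scalars pass through unchanged.
    """
    flat = {}
    for key, value in doc.items():
        name = prefix + key
        if isinstance(value, dict):
            # recurse into nested objects with a dotted prefix
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list) and value and isinstance(value[0], dict):
            # repeated nested objects -> multi-valued flat fields
            for item in value:
                for k, v in flatten(item, name + ".").items():
                    flat.setdefault(k, []).append(v)
        else:
            flat[name] = value
    return flat
```

For example, a class document with a nested students list would flatten to fields like students.id with multiple values; note that the pairing between sibling fields of each nested object is lost, which is exactly the limitation being discussed.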
Re: MongoDB and Solr
Hi Jack Thanks for the information. I do have multi-level nesting of JSON data. So, back to my questions (apologies for repeating): 1. Should I use MongoDB to store the JSON documents, or does Solr natively store the documents in the data directory? 2. Does Solr require a specific schema for the JSON document? Thanks Rajesh -- View this message in context: http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986662.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: MongoDB and Solr
Although Solr uses XML format for document update and query, JSON is a supported option. To post documents in JSON, see: http://wiki.apache.org/solr/UpdateJSON To retrieve query results in JSON, see: http://wiki.apache.org/solr/SolJSON That works well for relatively flat data (each field has a simple value or list of values), but less well if you have complex structure within an individual field value (e.g., multi-level nesting of JSON for a single field value.) For the latter, you would have to store the JSON as a string for such a field. -- Jack Krupansky -Original Message- From: rjain15 Sent: Tuesday, May 29, 2012 12:57 PM To: solr-user@lucene.apache.org Subject: MongoDB and Solr Hi I am building web app/mobile app, where users can update information frequently and there is a search function to quick search the information using different types of searches. Most of the data is going to be posted in JSON Format and stored in JSON format I have a few questions on the architecture choices, I am relatively new to Solr and MongoDB. 1. Should I use MongoDB to store the JSON documents, or does Solr natively store the documents in the data directory 2. Does Solr require a specific schema for the JSON document. Thanks Rajesh -- View this message in context: http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637.html Sent from the Solr - User mailing list archive at Nabble.com.
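The UpdateJSON format referred to above expects a flat array of documents, where each field holds a simple value or a list of values — a sketch of such a payload (field names are examples only):

```json
[
  {"id": "doc1", "title": "hello", "tags": ["a", "b"]},
  {"id": "doc2", "title": "world"}
]
```

This is what gets POSTed to Solr's JSON update handler; anything more deeply nested than this has to be flattened, or stored whole as a string in a single field, as described above.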
Re: UpdateRequestProcessor : flattened values
Sounds good. Then all that will be needed is a way to disable the SolrCell flattening so that other update processors can see the unflattened field values before they are handed off to a ConcatFieldUpdateProcessor. -- Jack Krupansky -Original Message- From: Chris Hostetter Sent: Tuesday, May 29, 2012 12:43 PM To: solr-user@lucene.apache.org Subject: Re: UpdateRequestProcessor : flattened values : And it might make sense to have a "multi-value flattening" attribute for Solr : itself rather than in SolrCell. Coming in 4.0... https://builds.apache.org/view/G-L/view/Lucene/job/Solr-trunk/javadoc/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html DOC Concatenates multiple values for fields matching the specified conditions using a configurable delimiter which defaults to ", ". By default, this processor concatenates the values for any field name which according to the schema is multiValued="false" and uses TextField or StrField DOC -Hoss
MongoDB and Solr
Hi I am building a web app / mobile app where users can update information frequently, and there is a search function to quickly search the information using different types of searches. Most of the data is going to be posted in JSON format and stored in JSON format. I have a few questions on the architecture choices; I am relatively new to Solr and MongoDB. 1. Should I use MongoDB to store the JSON documents, or does Solr natively store the documents in the data directory? 2. Does Solr require a specific schema for the JSON document? Thanks Rajesh -- View this message in context: http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637.html Sent from the Solr - User mailing list archive at Nabble.com.
Relevancy ranking for synonym matches
I was wondering if there is any solution for this. Currently I expand my results to match the synonyms at query time. So if I entered James, I would get results for Jim, Gomes, Game etc. as they would be expanded by matching the synonyms for James. But then, since this is just a one-word match, tf, idf and other parameters don't make sense; I have reset those factors to 1. Hence the results I get all have an equal score. What I really want to do is sort these results by Levenshtein distance without using the ~ sign. The issue in using the ~ sign is that if I have a synonym which is radically different (say Greg for James), and I use James~0, Greg would not even match closely with James, and the number of results returned would be less than the actual number of synonym matches. So my use case is: without reducing the number of results, I want to sort them by Levenshtein distance, i.e. closest string match to the original query. -- View this message in context: http://lucene.472066.n3.nabble.com/Relevancy-ranking-for-synonym-matches-tp3986634.html Sent from the Solr - User mailing list archive at Nabble.com.
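One hedged option, if your Solr version supports sorting by function queries: the strdist() function computes a string distance (with edit, jw, and ngram measures), so something like the following would order the synonym-expanded matches by closeness to the typed query without filtering any of them out (the name field is hypothetical):

```
q=name:(james OR jim OR gomes OR greg)
    &sort=strdist("james",name,edit) desc
```

Since all expanded terms still match, the result count stays the same as the plain synonym expansion; only the ordering changes.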
Solr backup / replication internals
Hi, Could anyone explain the internals of backup / replication? Please give me more information, like the do's and don'ts of backup / replication. 1. Is the backup / replication incremental? 2. While a backup / replication is being taken, can Solr still add / update the index? 3. The backup command and the backup script both do a file copy. Is there any difference between them? Regards Ganesh
Re: UpdateRequestProcessor : flattened values
: And it might make sense to have a "multi-value flattening" attribute for Solr : itself rather than in SolrCell. Coming in 4.0... https://builds.apache.org/view/G-L/view/Lucene/job/Solr-trunk/javadoc/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html DOC Concatenates multiple values for fields matching the specified conditions using a configurable delimiter which defaults to ", ". By default, this processor concatenates the values for any field name which according to the schema is multiValued="false" and uses TextField or StrField DOC -Hoss
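A sketch of how the factory Hoss points to might be wired into an update chain in solrconfig.xml (the chain name and delimiter are illustrative; per the javadoc quoted above, it defaults to ", " and to fields that are multiValued="false" TextField/StrField):

```xml
<updateRequestProcessorChain name="concat-values">
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <!-- overrides the default ", " delimiter -->
    <str name="delimiter">; </str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```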
Re: Many Cores with Solr
That's what we do. It has the advantage of letting the general queries be cached once across all users. Michael On Tue, May 29, 2012 at 12:39 PM, Klostermeyer, Michael wrote: > IMO it would be a better (from Solr's perspective) to handle the security w/ > the application code. Each query could include a "?fq=userID:12345..." which > would limit results to only what that user is allowed to see. > > Mike > -- Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com
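A sketch of what this looks like from the application side, in plain Python (the field name, endpoint, and helper are hypothetical):

```python
from urllib.parse import urlencode

def user_query(base_query: str, user_id: str) -> str:
    # The main q stays identical across users, so Solr can cache its
    # result set once; the per-user restriction rides in fq, which is
    # cached separately in the filter cache.
    params = {"q": base_query, "fq": "userID:" + user_id}
    return "/solr/select?" + urlencode(params)

print(user_query("title:report", "12345"))
# /solr/select?q=title%3Areport&fq=userID%3A12345
```

The key point is that the application, not Solr, is the trust boundary: the fq must be appended server-side, never taken from user input.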
RE: Many Cores with Solr
IMO it would be a better (from Solr's perspective) to handle the security w/ the application code. Each query could include a "?fq=userID:12345..." which would limit results to only what that user is allowed to see. Mike -Original Message- From: Mike Douglass [mailto:mikeadougl...@gmail.com] Sent: Wednesday, May 23, 2012 4:02 PM To: solr-user@lucene.apache.org Subject: Re: Many Cores with Solr My interest in this is the desire to create one index per user of a system - the issue here is privacy - data indexed for one user should not be visible to other users. For this purpose solr will be hidden behind a proxy which steers authenticated sessions to the appropriat ecore. Does this seem like a valid/feasible approach? -- View this message in context: http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3985789.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: TF-IDF vector
"Does the tf-idf vector represents one doc or set of docs?" IDF is calculated across all docs that contain the term. TF is calculated for a single document containing the term. Each term of each doc will have its own tf-idf. -- Jack Krupansky -Original Message- From: Allen Sent: Tuesday, May 29, 2012 12:11 PM To: solr-user@lucene.apache.org Subject: TF-IDF vector Hi List, I am curious about the meaning of tf-idf vector after reading this http://wiki.apache.org/solr/TermVectorComponent. The tf flag returns me the tf vector for just one doc. The df flag returns me the df vector of all the docs in the index. Does the tf-idf vector represents one doc or set of docs? Too, can I specify a subset of docs which the df vector is calculated on rather than the entire set of docs?
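To make the distinction concrete, here is a toy sketch in plain Python (not Lucene's exact scoring formula): tf is counted inside one document, df/idf across every document in the index, so a tf-idf value always describes one term in one document:

```python
import math

docs = {
    "d1": "solr search engine search",
    "d2": "mongodb stores json",
    "d3": "solr accepts json input",
}

def tf(term, doc_id):
    # term frequency: counted within a single document only
    return docs[doc_id].split().count(term)

def idf(term):
    # document frequency: counted across the whole index
    df = sum(1 for text in docs.values() if term in text.split())
    return math.log(len(docs) / df)

# tf-idf of "search" in d1: per-document tf times index-wide idf
print(tf("search", "d1") * idf("search"))
```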
TF-IDF vector
Hi List, I am curious about the meaning of the tf-idf vector after reading this: http://wiki.apache.org/solr/TermVectorComponent. The tf flag returns me the tf vector for just one doc. The df flag returns me the df vector across all the docs in the index. Does the tf-idf vector represent one doc or a set of docs? Also, can I specify a subset of docs on which the df vector is calculated, rather than the entire set of docs?
sort in local params and rows parameter
Hello, we're having some issues with a Solr query and are unsure whether we've encountered a bug or just don't understand the expected behaviour. Any help would be appreciated. The problem is this: we're running a query from the browser that, for debugging purposes, looks like this: q={!sort%3D"eventId%20asc"}a&rows=2 Here eventId is a long field in our schema. The sort works fine, but the query returns 10 results (out of 35), clearly ignoring the rows parameter. For reference, q=a&rows=2 only returns 2 results (again out of 35). We can work around this by passing rows as a local parameter instead: q={!sort%3D"eventId%20asc"+rows%3D2}a This returns only 2 results, as expected. So it seems that using sort as a local parameter causes Solr to ignore the external rows parameter. This does not seem to happen with other local parameters, only "sort" (as far as we can tell). Why is this happening? -- View this message in context: http://lucene.472066.n3.nabble.com/sort-in-local-params-and-rows-parameter-tp3986615.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is optimize needed on slaves if it replicates from optimized master?
You do not need to use optimize at all. Solr continually merges segments ("optimizes") as needed. wunder On May 29, 2012, at 6:08 AM, sudarshan wrote: > Hi Walter, > Thank you. Do you mean that optimize need not be used at all? > If Solr merges segments (when needed as you said), is there a criteria > during which Solr does this automatically. If I want the search to be faster > and Solr does not optimize for quite a long time, would it not compromise my > query processing rate? > > To All, > I have another doubt. If I optimize and replicate, for the > first time it would transfer all the segments from the master to slave > irrespective of the modified segment(s). After first replication, how the > transfer would be made - again all segments are replicated or only the > modified segments are replicated? I believe after the first replication > (master and slave in sync), only the modified segments would be transferred > just like the non-optimized index transfer. Am I right? > > Regards, > Sudarshan > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3986597.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: suggestions developing a multi-version concurrency control (MVCC) mechanism
Hmm, interesting; that will definitely work and may be the way to go. Ideally, I'd rather store the older versions within a field of the newest if possible. Can one create a custom field that holds other objects? Nick On Mon, 28 May 2012 17:07:06 -0700, Lance Norskog wrote: > You can use the document id and timestamp as a compound unique id. > Then the search would also sort by id, then by timestamp. Result > grouping might let you pick the most recent document from each of the > sorted docs. > > On Mon, May 28, 2012 at 3:15 PM, Nicholas Ball > wrote: >> >> Hello all, >> >> For the first step of the distributed snapshot isolation system I'm >> developing for Solr, I'm going to need to have an MVCC mechanism as >> opposed >> to the single-version concurrency control mechanism already developed >> (DistributedUpdateProcessor class). I'm trying to find the very best way >> to >> develop this into Solr 4.x (trunk) and so any help would be greatly >> appreciated! >> >> Essentially I need to be able to store multiple versions of a document so >> that when you look up a document with a given timestamp, you're given the >> correct version (anything the same or older, not fresher). The older >> versioned documents need to be stored in the index itself to ensure they >> are durable and can be manipulated as other Solr data can be. >> >> One way to do this is to store the old versioned Solr documents within >> the >> latest Solr Document, but I'm not sure this is even possible? >> Alternatively, I could have the latest versioned Document store the >> unique >> keys which point to other older documents. The problem with this is that >> it >> complicates things, having various partial objects which all combine as >> one >> logical document. >> >> Are there any suggestions as to the best way to develop this feature? >> >> Thank you in advance for any help you can spare! >> >> Nicholas
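Lance's compound-key idea can be sketched in plain Python (field names hypothetical): every version is stored as its own document keyed by (id, timestamp), and a read at time t returns the newest version at or before t:

```python
# A toy stand-in for the index: one document per version of "doc1".
index = [
    {"id": "doc1", "ts": 100, "body": "v1"},
    {"id": "doc1", "ts": 200, "body": "v2"},
    {"id": "doc1", "ts": 300, "body": "v3"},
]

def read_at(entity_id, t):
    # Snapshot-isolation read: same or older than t, never fresher.
    versions = [d for d in index if d["id"] == entity_id and d["ts"] <= t]
    return max(versions, key=lambda d: d["ts"], default=None)

print(read_at("doc1", 250)["body"])  # v2: newest version at or before t=250
```

In Solr this would correspond to a filter on ts:[* TO t] plus grouping or sorting by timestamp descending per id, as Lance suggests.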
Re: Solr - 1143
That issue is marked as a duplicate of SOLR-3134, which has a patch for Solr 3.5. https://issues.apache.org/jira/browse/SOLR-3134 -- Jack Krupansky -Original Message- From: Ramprakash Ramamoorthy Sent: Tuesday, May 29, 2012 3:03 AM To: solr-user@lucene.apache.org Subject: Solr - 1143 Dear all, A small doubt. I realised I will have to apply the patch mentioned in Solr JIRA issue 1143 to return partial results when one of my shards is dead/slow. But the patch has no version explicitly specified. I am using Solr 3.5.0; can I apply the patch to my installation as-is? -- With Thanks and Regards, Ramprakash Ramamoorthy, Engineer Trainee, Zoho Corporation. +91 9626975420
Re: Multicore Issue - Server Restart
Hi Suajtha, does each webapp have its own Solr home? Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Multicore-Issue-Server-Restart-tp3986516p3986602.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is optimize needed on slaves if it replicates from optimized master?
Hi Walter, Thank you. Do you mean that optimize need not be used at all? If Solr merges segments (when needed, as you said), is there a criterion by which Solr decides to do this automatically? If I want search to be faster and Solr does not optimize for quite a long time, would it not compromise my query processing rate? To All, I have another doubt. If I optimize and replicate, the first time it would transfer all the segments from the master to the slave irrespective of the modified segment(s). After the first replication, how will the transfer be made: are all segments replicated again, or only the modified segments? I believe after the first replication (master and slave in sync), only the modified segments would be transferred, just like the non-optimized index transfer. Am I right? Regards, Sudarshan -- View this message in context: http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3986597.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multi-words synonyms matching
Hello Elisabeth, my synonyms.txt is like your 2nd example: naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd, foresta\ naturale, natuurbos, natural\ forest, bosque\ natural, természetes\ erdő, natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs, floresta\ natural, naturskov, forêt\ naturelle, naturskog, přírodní\ les, luonnonmetsä, pădure\ naturală, las\ naturalny, natürlicher\ wald An example from my system with debugging turned on and searching for "naturwald": naturwald naturwald textth:naturwald textth:"φυσικό δάσος" textth:"естествена гора" textth:"prírodný les" textth:"naravni gozd" textth:"foresta naturale" textth:natuurbos textth:"natural forest" textth:"bosque natural" textth:"természetes erdő" textth:"natūralus miškas" textth:"prirodna šuma" textth:"dabiskais mežs" textth:"floresta natural" textth:naturskov textth:"forêt naturelle" textth:naturskog textth:"přírodní les" textth:luonnonmetsä textth:"pădure naturală" textth:"las naturalny" textth:"natürlicher wald" ... As you can see my search for "naturwald" extends to single and multiword synonyms e.g. "forêt naturelle" My SynonymFilterFactory has the following settings: org.apache.solr.analysis.SynonymFilterFactory {tokenizerFactory=solr.KeywordTokenizerFactory, synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr, ignoreCase=true, luceneMatchVersion=LUCENE_36} But as I already mentioned, there is much more work to be done to get it running than just using SynonymFilterFactory. Regards Bernd Am 23.05.2012 08:49, schrieb elisabeth benoit: > Hello Bernd, > > Thanks for your advice. > > I have one question: how did you manage to map one word to a multiwords > synonym??? > > I've tried (in synonyms.txt) > > mairie, hotel de ville > > mairie, hotel\ de\ ville > > mairie => mairie, hotel de ville > > mairie => mairie, hotel\ de\ ville > > but nothing prevents mairie from matching with "hotel"... 
> > The only way I found is to use > tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms declaration > in schema.xml, but then since "mairie" is not alone in my index field, it > doesn't match. > > > best regards, > Elisabeth > > > > > the only way I found, I schema.xml, is to use > > > > 2012/5/15 Bernd Fehling > >> Without reading the whole thread let me say that you should not trust >> the solr admin analysis. It takes the whole multiword search and runs >> it all together at once through each analyzer step (factory). >> But this is not how the real system works. First pitfall, the query parser >> is also splitting at white space (if not a phrase query). Due to this, >> a multiword query is send chunk after chunk through the analyzer and, >> second pitfall, each chunk runs through the whole analyzer by its own. >> >> So if you are dealing with multiword synonyms you have the following >> problems. Either you turn your query into a phrase so that the whole >> phrase is analyzed at once and therefore looked up as multiword synonym >> but phrase queries are not analyzed !!! OR you send your query chunk >> by chunk through the analyzer but then they are not multiwords anymore >> and are not found in your synonyms.txt. >> >> From my experience I can say that it requires some deep work to get it done >> but it is possible. I have connected a thesaurus to solr which is doing >> query time expansion (no need to reindex if the thesaurus changes). >> The thesaurus holds synonyms and "used for terms" in 24 languages. So >> it is also some kind of language translation. And naturally the thesaurus >> translates from single term to multi term synonyms and vice versa. >> >> Regards, >> Bernd >> >> >> Am 14.05.2012 13:54, schrieb elisabeth benoit: >>> Just for the record, I'd like to conclude this thread >>> >>> First, you were right, there was no behaviour difference between fq and q >>> parameters. 
>>> >>> I realized that: >>> >>> 1) my synonym (hotel de ville) has a stopword in it (de) and since I used >>> tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms >> declaration, >>> there was no stopword removal in the indewed expression, so when >> requesting >>> "hotel de ville", after stopwords removal in query, Solr was comparing >>> "hotel de ville" >>> with "hotel ville" >>> >>> but my queries never even got to that point since >>> >>> 2) I made a mistake using "mairie" alone in the admin interface when >>> testing my schema. The real field was something like "collectivités >>> territoriales mairie", >>> so the synonym "hotel de ville" was not even applied, because of the >>> tokenizerFactory="solr.KeywordTokenizerFactory" in my synonym definition >>> not splitting field into words when parsing >>> >>> So my problem is not solved, and I'm considering solving it outside of >> Solr >>> scope, unless someone else has a clue >>> >>> Thanks again, >>> Elisabeth >>> >>> >>> >>> 2012/4/25 Erick Erickson >>> A little farther down the debug info output you'll f
Query elevation / boosting or something else to guarantee document position
Hi all, I have an index with thousands of products with various fields (manufacturer, price, popularity, type, color, ...) and I want to guarantee that at least one product by a particular manufacturer is within the first 5 results. The search is done mainly by using filter params, and results are ordered by a function, e.g. "product(price, popularity) asc" or "discount desc". And I need to guarantee that if there is any product matching the given filters made by a concrete manufacturer, then it will be in the 5th position at worst, even if its position by the order function is worse. It seems to me that the Query elevation component is not the right thing for me. I don't know the query in advance (or the set of filter criteria) and I don't know the concrete product that will be best for the criteria within the order. And I also don't think that I can construct a function with such requirements to use it directly for ordering the results. Of course I can make a second query in case there is no desired product on the first page of results and put it there, but that requires an additional request to Solr and complicates results processing and further pagination. Can anybody suggest any solution? Thanks Wenca
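The "second query" fallback described above can be sketched in plain Python (all names hypothetical); note it only costs the extra request when the preferred manufacturer is actually missing from the top 5:

```python
def ensure_manufacturer(results, preferred, fetch_best):
    # Already satisfied: a preferred-manufacturer product is in the top 5.
    if any(p["manufacturer"] == preferred for p in results[:5]):
        return results
    best = fetch_best(preferred)  # second, manufacturer-filtered Solr query
    if best is None:              # no matching product exists at all
        return results
    # Splice it in as the 5th result, pushing the rest down.
    return results[:4] + [best] + results[4:]

products = [{"id": i, "manufacturer": "other"} for i in range(10)]
fixed = ensure_manufacturer(products, "acme",
                            lambda m: {"id": 99, "manufacturer": m})
print(fixed[4])  # {'id': 99, 'manufacturer': 'acme'}
```

Pagination then has to treat the spliced-in document specially (e.g. exclude its id from later pages), which is the complication the poster anticipates.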
A few random questions about solr queries.
*1)* With faceting, how does facet.query perform in comparison to facet.field? I'm just wondering this as, in my use case, I need to facet over a field -- which would get me the top n facets for that field -- but I also need to show the count for a "selected filter" which might have a relatively low count, so it doesn't appear in the top n returned facets. So the solution would be to 'ensure' its presence by adding a 'facet.query=cat:val' in addition to my facet.field=cat. I want to do this for quite a few fields. Related/example-based question: When I facet over a field and something gets returned, e.g. John Smith (83), and I also 'ensure' this facet's presence by having it in facet.query=author:"John Smith", are two different calculations performed? Or is the facet returned by facet.field also used by facet.query to obtain the count? *2)* Is there a performance issue if I have around, say, 20 facet.query conditions along with 10 facet.fields? 3 of those 10 fields have around 100,000 possible values. The rest have a few hundred each. *3)* I've rummaged around a bit, looking for info on when to use q vs fq. I want to clear my doubts for a certain use case. Where should my date range queries go? In q or fq? The default settings on my site show results from the past 90 days, with buttons to show stuff from the last month and week as well. The user is also allowed to use a slider to apply any date range... this is allowed, but it's not /that/ common. I definitely use fq for filtering various tags. Choosing a tag is a common activity. Should the date range query go in fq? As I mentioned, the default view shows stuff from the past 90 days. So on each new day, does this invalidate entries in the cache? Or is stuff stored in the filter cache in some way that makes it easy to fetch stuff from the past 89 days when a query is performed the next day?
-- View this message in context: http://lucene.472066.n3.nabble.com/A-few-random-questions-about-solr-queries-tp3986562.html Sent from the Solr - User mailing list archive at Nabble.com.
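On the last question: as I understand the filter cache, a raw NOW-based range produces a different filter string on every request, so it can never be reused from the cache; rounding with Solr's date math keeps the string (and thus the cache entry) stable for a whole day. The field name here is illustrative:

```
fq=pubdate:[NOW-90DAYS TO NOW]          <- NOW changes every millisecond: never a cache hit
fq=pubdate:[NOW/DAY-90DAYS TO NOW/DAY]  <- rounded to the day: one cache entry per day
```

With the rounded form, the 90-day filter is computed once per day and then served from the filter cache, and it naturally becomes a fresh entry when the day rolls over.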
[SolrCloud] Replication Factor
Hello all, The page http://wiki.apache.org/solr/NewSolrCloudDesign mentions "Replication Factor". It is a feature supported by Katta. Is it actually supported by SolrCloud? A more general question: Katta had some pretty good features like this one. Why is Katta not active anymore? Is there a way to run equivalent functionality with another Solr-based framework today, if it doesn't exist in SolrCloud yet? Thank you.
RE: useFastVectorHighlighter doesn't work
> So for highlight, stored="true" is > required in any circumstance, right? Exactly. http://wiki.apache.org/solr/FieldOptionsByUseCase
RE: useFastVectorHighlighter doesn't work
So for highlight, stored="true" is required in any circumstance, right? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: 2012年5月29日 16:04 To: solr-user@lucene.apache.org Subject: RE: useFastVectorHighlighter doesn't work > The reason why I use useFastVectorHighlighter is because I want to set > stored="false", and with more settings like termVectors="true" > termPositions="true" > termOffsets="true". If stored="true", what is the difference between > normal highlight and useFastVectorHighlighter? What is the right > situation for using useFastVectorHighlighter? term*="true" makes sense only for stored="true". FastVectorHighlighter requires and makes use of term*="true" for speedup.
RE: useFastVectorHighlighter doesn't work
> The reason why I use useFastVectorHighlighter is because I > want to set stored="false", and with more settings > like termVectors="true" termPositions="true" > termOffsets="true". If stored="true", what is the difference > between normal highlight and useFastVectorHighlighter? What > is the right situation for using useFastVectorHighlighter? term*="true" makes sense only for stored="true". FastVectorHighlighter requires and makes use of term*="true" for speedup.
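Pulling the thread's conclusion together, a sketch of a schema.xml field set up for FastVectorHighlighter (the field and type names are illustrative): stored="true" is required for highlighting in any case, and the term* options are what the fast highlighter uses for its speedup:

```xml
<field name="content" type="text_general"
       indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Such a field would then be highlighted with request parameters along the lines of hl=true&hl.fl=content&hl.useFastVectorHighlighter=true.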