RE: [E] Re: Questions about Disk space Usage
Thank you all for your comments and help - I kept the last day's worth of files from the /tmp folder and removed the rest - without any problems or difficulties.
Sas
-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Saturday, October 29, 2016 1:10 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Questions about Disk space Usage
If it works the way I think it does, an empty segment should take the same amount of time to read in as a full segment, but zero time to write out.
wunder
> On Oct 29, 2016, at 9:21 AM, Erick Erickson wrote:
>
> I would also expect a totally empty segment to be merged very quickly,
> as the percentage of deleted documents weighs heavily when determining
> whether to merge a segment, but that's based on principle, not deep
> code knowledge.
>
> Best,
> Erick
>
>> On Fri, Oct 28, 2016 at 6:02 PM, Walter Underwood wrote:
>> After the merge. That is what merges do, clean up segments.
>>
>> I expect it is very rare for a segment to be 100% deleted docs, so it
>> isn’t worth handling that case.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On Oct 28, 2016, at 5:54 PM, Alexandre Rafalovitch wrote:
>>>
>>> Doesn't a segment that only has deleted documents just get dropped?
>>> Or does it get dropped _after_ the merge and therefore still sit
>>> around?
>>>
>>> Regards,
>>> Alex.
>>>
>>> Solr Example reading group is starting November 2016, join us at
>>> http://j.mp/SolrERG
>>> Newsletter and resources for Solr beginners and intermediates:
>>> http://www.solr-start.com/
>>>
>>>> On 29 October 2016 at 08:53, Walter Underwood wrote:
>>>> It is normal for disk usage to double. Under controlled
>>>> circumstances, it can triple, but that probably won’t happen.
>>>>
>>>> This is the second time today that I’ve sent this information to the list.
>>>>
>>>> It can use nearly 2X the space whenever the largest segment(s) are
>>>> merged, especially if there are only a few smaller segments.
>>>>
>>>> In order to use 3X the space, you need to:
>>>>
>>>> 1. Disable merging.
>>>> 2. Delete all the documents.
>>>> 3. Add all the documents.
>>>> 4. Enable merging.
>>>>
>>>> This causes one complete set of segments that are 100% deletes, one
>>>> set that is 0% deletes, then the merge creates another set that is
>>>> 0% deletes. During the merge, the old files remain while the new
>>>> one is created.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/ (my blog)
>>>>
>>>>> On Oct 28, 2016, at 2:41 PM, Alexandre Rafalovitch wrote:
>>>>>
>>>>> 2) Is probably a merge operation. Lucene index segments are not
>>>>> rewritable in place, so the merge creates a new file, does
>>>>> everything to it, then switches to it.
>>>>>
>>>>> I remember the number was that the space could temporarily triple
>>>>> (?!?) though that may have been before the tiered merge policy.
>>>>>
>>>>> 3) It should be safe to delete old log files. It is standard log4j stuff.
>>>>>
>>>>> Solr Example reading group is starting November 2016, join us at
>>>>> http://j.mp/SolrERG
>>>>> Newsletter and resources for Solr beginners and intermediates:
>>>>> http://www.solr-start.com/
>>>>>
>>>>> On 29 October 2016 at 06:55, Jamal, Sarfaraz wrote:
>>>>>> Hi Guys,
>>>>>>
>>>>>> I am currently investigating an instance of Solr's Disk space usage and
>>>>>> I had a few questions I thought you guys might be able to help answer.
>>>>>>
>>>>>> First Question
>>>>>> * There is 30 gb's worth of autosuggest data in the /tmp folder.
>>>>>> Each file is half of a gigabyte Is it safe to delete those files?
>>>>>>
>>>>>> Second Question
>>>>>> Also, we notice that at times the disk runs down to only having a few
>>>>>> gigabytes available, and then goes back to having more space. (the index
>>>>>> file literally grows and then shrinks).
>>>>>>
>>>>>> Third Question
>>>>>> Is it also safe to delete the log files?
>>>>>>
>>>>>> We run a database indexer on a set interval, perhaps that is relevant to
>>>>>> this discussion.
>>>>>>
>>>>>> Sas
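The cleanup described at the top of this thread (keep the last day's worth of files in /tmp, remove the rest) could be scripted roughly as below. This is only a sketch: the function names, the one-day cutoff and the dry-run default are assumptions made for illustration, and the thread does not establish which files Solr may still need, so listing before deleting seems prudent.

```python
import os
import time


def find_stale_files(directory, max_age_days=1.0, now=None):
    """Return sorted paths of regular files in `directory` whose last
    modification is older than `max_age_days`. Nothing is deleted here."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for entry in os.scandir(directory):
        # Skip directories and symlinks; only plain files are considered.
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)


def remove_files(paths, dry_run=True):
    """Delete the given files. With dry_run=True, only print what would go."""
    for path in paths:
        if dry_run:
            print("would remove", path)
        else:
            os.remove(path)
```

A cautious first run would call `remove_files(find_stale_files("/tmp"))` and review the printed list before passing `dry_run=False`.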
Questions about Disk space Usage
Hi Guys,
I am currently investigating an instance of Solr's disk space usage, and I had a few questions I thought you guys might be able to help answer.
First Question
* There is 30 GB worth of autosuggest data in the /tmp folder. Each file is half a gigabyte. Is it safe to delete those files?
Second Question
We also notice that at times the disk runs down to only a few gigabytes available, and then goes back to having more space (the index file literally grows and then shrinks).
Third Question
Is it also safe to delete the log files?
We run a database indexer on a set interval; perhaps that is relevant to this discussion.
Sas
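The replies to this question put peak disk usage during a merge at up to roughly 2X the index size (3X only in a contrived case). A rough headroom check based on those figures might look like the sketch below; the function names and the idea of passing in the index size yourself are assumptions for illustration, not anything Solr provides.

```python
import shutil


def extra_space_needed(index_bytes, peak_factor=2.0):
    """Bytes needed on top of the current index if usage peaks at
    peak_factor times the index size (2.0 = doubling, 3.0 = tripling)."""
    return int(index_bytes * (peak_factor - 1.0))


def has_merge_headroom(index_bytes, free_bytes, peak_factor=2.0):
    """True if the free space covers the estimated temporary growth."""
    return free_bytes >= extra_space_needed(index_bytes, peak_factor)


def check_volume(path, index_bytes, peak_factor=2.0):
    """Same check, reading the free space of the volume holding `path`."""
    free_bytes = shutil.disk_usage(path).free
    return has_merge_headroom(index_bytes, free_bytes, peak_factor)
```

With a 50 GB index, for example, the 2X figure means about another 50 GB should be free before a large merge, which would explain a disk that runs down to a few gigabytes and then recovers.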
RE: Question about Simple Post tool
Thank you. That is a great suggestion - Sas
-----Original Message-----
From: Scott Chu [mailto:scott@udngroup.com]
Sent: Monday, August 1, 2016 10:21 AM
To: solr-user
Subject: Re: Question about Simple Post tool
I don't think it's possible using only the out-of-the-box post.jar. But why not decompile post.jar (or get the source from the internet) and modify it yourself? It doesn't seem that hard.
Scott Chu, scott@udngroup.com
2016/8/1 (Mon)
----- Original Message -----
From: Jamal, Sarfaraz
To: solr-user
CC:
Date: 2016/8/1 (Mon) 22:05
Subject: Question about Simple Post tool
Hi Guys,
I have a quick question. I read the appropriate documentation and it seems that it is possible, but I might be getting the syntax wrong. I wish to use the simple Post Tool to pass in a URL that brings back a word document, and I Want to index the return of that url using TIka - Is that possible? Or do I have to get the file onto my file system first?
Thanks, Sas
Question about Simple Post tool
Hi Guys,
I have a quick question. I read the appropriate documentation and it seems that it is possible, but I might be getting the syntax wrong. I wish to use the Simple Post Tool to pass in a URL that brings back a Word document, and I want to index the response from that URL using Tika - is that possible? Or do I have to get the file onto my file system first?
Thanks, Sas
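One way to sidestep the question of what the post tool supports is to fetch the document yourself and hand the bytes to Solr's extracting (Tika) handler. The sketch below assumes that handler is registered at /update/extract, and the host, core name and helper names are hypothetical; it has not been checked against the setup in this thread.

```python
import urllib.request
from urllib.parse import urlencode


def build_extract_url(solr_base, core, doc_id, commit=True):
    """URL for posting one raw document to the extracting handler.
    Assumes the handler lives at /update/extract on this core."""
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return f"{solr_base}/{core}/update/extract?{urlencode(params)}"


def fetch_and_index(document_url, solr_base, core, timeout=30):
    """Download `document_url` and send its bytes to Solr for extraction.
    The source URL doubles as the document id (an arbitrary choice)."""
    with urllib.request.urlopen(document_url, timeout=timeout) as response:
        payload = response.read()
    request = urllib.request.Request(
        build_extract_url(solr_base, core, doc_id=document_url),
        data=payload,  # supplying data makes this a POST
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as solr_response:
        return solr_response.status
```

This keeps the file off the local file system, since the downloaded bytes only ever live in memory.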
RE: search documents that have a specific field populated
If I understand you properly, I do it using a filter query: fq=NOT(field:EMPTY)
Hope that helps - Sas
-----Original Message-----
From: Valentina Cavazza [mailto:valent...@step-net.it]
Sent: Friday, July 15, 2016 10:17 AM
To: solr-user@lucene.apache.org
Subject: search documents that have a specific field populated
Hi,
I need to search for documents that have a specific field populated, so I want to display all the documents in which the field is not empty. In the schema this field is set to multivalued=true, indexed=true, stored=true, default=EMPTY. The field type is of class solr.TextField and uses the StandardTokenizerFactory tokenizer with the ICUFoldingFilterFactory, LowerCaseFilterFactory and GreekStemFilterFactory filters in the index and query analyzers.
I already tried queries like these:
q=field:*
q=+field:*
q=+field:[* TO *]
q=+field:['' TO *]
q=+field:["" TO *]
q=+field:[' ' TO *]
q=+field:' '
q=-field:EMPTY
but nothing was found. Does anyone know how to do that?
Thanks
Valentina
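The filter-query idea can be written out as request parameters, as in the sketch below. It assumes a match-all main query plus a negative filter on the placeholder value; the helper name is made up, and whether the filter matches depends on how the field's analyzer treats the default value EMPTY, which this thread does not settle.

```python
from urllib.parse import urlencode


def populated_field_params(field, placeholder=None):
    """Build Solr query parameters for 'documents where `field` is populated'.

    With a placeholder (e.g. the schema default 'EMPTY'), exclude documents
    carrying that value; otherwise require that the field has any value."""
    if placeholder is None:
        fq = f"{field}:[* TO *]"
    else:
        fq = f"-{field}:{placeholder}"
    return {"q": "*:*", "fq": fq}


if __name__ == "__main__":
    # Print the encoded query string for a quick copy and paste test.
    print(urlencode(populated_field_params("field", "EMPTY")))
```

Because every document in this schema gets the default value, the range form `field:[* TO *]` would presumably match everything, which is why the placeholder variant is the one of interest here.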
RE: Simple Post Tool result question (UNCLASSIFIED)
I am not entirely sure what you mean, But extra slashes in the middle of a url produce the same result as a single slash (right?). So for example: https://www.visualstudio.com/downloads///download-visual-studio-vs is the same as: https://www.visualstudio.com/downloads/download-visual-studio-vs -Original Message- From: Musshorn, Kris T CTR USARMY RDECOM ARL (US) [mailto:kris.t.musshorn@mail.mil] Sent: Thursday, July 14, 2016 2:09 PM To: solr-user@lucene.apache.org Subject: Simple Post Tool result question (UNCLASSIFIED) CLASSIFICATION: UNCLASSIFIED POSTed web resource https://xx/inside/news/dispatches///view.cfm?id=9128 (depth: 4) What is the significance of the /// ? Thanks, Kris ~~ Kris T. Musshorn FileMaker Developer - Contractor - Catapult Technology Inc. US Army Research Lab Aberdeen Proving Ground Application Management & Development Branch 410-278-7251 kris.t.musshorn@mail.mil ~~ CLASSIFICATION: UNCLASSIFIED
RE: SimplePost tool (UNCLASSIFIED)
In my experience - and if I recall correctly - If the ids are different but the file Is the same, you will have two separate documents that are indexed - Sas Sarfaraz Jamal (Sas) Revenue Assurance Tech Ops 614-560-8556 sarfaraz.ja...@verizonwireless.com -Original Message- From: Musshorn, Kris T CTR USARMY RDECOM ARL (US) [mailto:kris.t.musshorn@mail.mil] Sent: Thursday, July 14, 2016 12:37 PM To: solr-user@lucene.apache.org Subject: SimplePost tool (UNCLASSIFIED) CLASSIFICATION: UNCLASSIFIED Does the simple post tool accomplish deduplication? Thanks, Kris ~~ Kris T. Musshorn FileMaker Developer - Contractor - Catapult Technology Inc. US Army Research Lab Aberdeen Proving Ground Application Management & Development Branch 410-278-7251 kris.t.musshorn@mail.mil ~~ CLASSIFICATION: UNCLASSIFIED
RE: Update index
Hi Kostali, I would look at the Delta Queries - Sas
-----Original Message-----
From: kostali hassan [mailto:med.has.kost...@gmail.com]
Sent: Wednesday, July 13, 2016 5:17 AM
To: solr-user@lucene.apache.org
Subject: Update index
I am using Solr 5.4.1 to index a SQL database with the Data Import Handler. I am looking for a way to update the index automatically when the database is modified or a new value is inserted into it.
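The "Delta Queries" suggestion refers to the Data Import Handler's delta-import command, which can be triggered over HTTP on a schedule. A minimal sketch, assuming the handler is registered at /dataimport and the data-config already defines its delta queries; the host, core name and function name are hypothetical.

```python
from urllib.parse import urlencode


def delta_import_url(solr_base, core, handler="dataimport"):
    """URL that asks the Data Import Handler to run a delta import.
    clean=false keeps existing documents; commit=true makes changes visible."""
    params = {"command": "delta-import", "clean": "false", "commit": "true"}
    return f"{solr_base}/{core}/{handler}?{urlencode(params)}"


if __name__ == "__main__":
    # A scheduler (cron, for instance) could request this URL periodically.
    print(delta_import_url("http://localhost:8983/solr", "mycore"))
```

This polls on a timer rather than reacting to each database change, so the index lags the database by at most one interval.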
RE: Searching Home's, Homes and Home
I would start by looking at the stemming documentation - it might be of help. Sas
-----Original Message-----
From: Surender [mailto:surender.si...@rsystems.com]
Sent: Friday, July 8, 2016 8:30 AM
To: solr-user@lucene.apache.org
Subject: Searching Home's, Homes and Home
A user can type a search keyword in many ways; the following are a few examples. If the user types any of the keywords homes, home, home's, then the search should match all of the following:
1. Home
2. Home's
3. Homes
If the user types Americas, the results should include:
1. Americas
2. America's
3. America
Please suggest how to send the search query to Solr so that it includes all of these results.
--
View this message in context: http://lucene.472066.n3.nabble.com/Searching-Home-s-Homes-and-Home-tp4286341.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: SOLR 6: edismax search query with OR operator does not work as expected
This sounds like it might be of help - <solrQueryParser defaultOperator="AND"/>
You can change it from AND to OR. (If I understood you) - Sas
-----Original Message-----
From: Aleš Gregor [mailto:alg...@gmail.com]
Sent: Friday, July 8, 2016 9:37 AM
To: solr-user@lucene.apache.org
Subject: SOLR 6: edismax search query with OR operator does not work as expected
Hello,
after migrating my index from Solr 4.3 to Solr 6 I noticed that the OR logical operator in search queries no longer works as expected. On Solr 4.3 the query - Blue OR Red - brings back all documents in which Blue or Red or both tokens are found. On Solr 6 the same query only brings back documents with both tokens, Blue and Red. I see some difference in the debug output of the query but I cannot make much sense of it. Was there any change between Solr 4 and 6 that would cause this?
Thanks
Ales Gregor
RE: Some questions
Of course, yes -=) Sas Sarfaraz Jamal (Sas) Revenue Assurance Tech Ops 614-560-8556 sarfaraz.ja...@verizonwireless.com -Original Message- From: Siwei Lv [mailto:si...@microsoft.com] Sent: Thursday, July 7, 2016 4:40 AM To: solr-user@lucene.apache.org Subject: Some questions Hi all, I have some questions about solr, Can I send them to this mail box? Thanks, Siwei
RE: Solr more like this
Could you index it, do the 'like this' and then delete it from the index? All in one smooth user experience, obviously. (Just throwing it out there). Sas
-----Original Message-----
From: Charlie Hull [mailto:char...@flax.co.uk]
Sent: Wednesday, July 6, 2016 11:02 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr more like this
On 05/07/2016 19:42, sara hajili wrote:
> Hi
> I indexed PDF files to Solr, and now I want to know whether there is any way
> to upload a PDF file and have Solr return related PDFs in the result.
> I mean I don't want to index the PDF file (the file for which I want to get
> more-like-this results); I just want to upload it and get the MLT
> result. Can I do this?
>
If Solr hasn't indexed a PDF file, it can't work out its 'like this'. So I'd say, no, you can't.
Cheers
Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk
Question about Indexing Updated Documents
Hi Guys,
I have a data-import handler set up that indexes all of the documents from a few small tables. What is the best way to update the index when a single one of those documents changes? Is it possible to use SQL, or must I post JSON or XML to Solr?
Thank you, Sas
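For the "post JSON" option raised in this question, re-sending a single changed document to the update handler could look like the sketch below. The host, core name, field names and function name are illustrative assumptions; a document posted with an existing unique key replaces the old version.

```python
import json
import urllib.request


def build_update_request(solr_base, core, doc, commit=True):
    """Prepare a POST of one document (as a one-element JSON list) to
    the core's /update handler. The request is built but not sent."""
    url = f"{solr_base}/{core}/update?commit={'true' if commit else 'false'}"
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )


def send(request, timeout=30):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage would be along the lines of `send(build_update_request("http://localhost:8983/solr", "mycore", {"id": "42", "name": "new name"}))`, run whenever the corresponding database row changes.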
Thank You Guys
Hi Guys, Thank you all - I got synonyms, highlighting, stemming all working the way I wanted to. I am sure I will have more questions later on =) Thanks! Sas
RE: [E] Re: Stemming
Oh, is this what you meant? content_stemming I changed it to content_stemming and now it seems to work :) - It was _text_ before - Thanks! I will update if I discover anything amiss Thanks again so much =) Sas -Original Message- From: Aurélien MAZOYER [mailto:aurelien.mazo...@francelabs.com] Sent: Thursday, June 16, 2016 4:36 PM To: solr-user@lucene.apache.org Subject: Re: [E] Re: Stemming Hi, I was just wondering if you are sure that you query only that field (or fields that use your text_stem analyzer) and not other fields (in your qf for example is you use edismax) that can give you uncorrect results. Regards, Aurélien Le 16/06/2016 22:29, Jamal, Sarfaraz a écrit : > Hello =) > > Just to be safe and make sure it's happening at indexing time AS WELL > as QUERYING time - > > I modified it to be like so: > > > > > words="lang/stopwords_en.txt" ignoreCase="true"/> > > > protected="protwords.txt"/> > > > > > words="lang/stopwords_en.txt" ignoreCase="true"/> > > > protected="protwords.txt"/> > > > > > I am re-indexing the files > And what do you mean about only querying one field? I am not entirely sure I > understand.. > > Sas > > -Original Message- > From: Aurélien MAZOYER [mailto:aurelien.mazo...@francelabs.com] > Sent: Thursday, June 16, 2016 4:20 PM > To: solr-user@lucene.apache.org > Subject: [E] Re: Stemming > > Hi, > > Yes you should have the same resultset. > > Are you sure that you reindex all the data after changing your schema? > Are you sure that you put your analyzer both at indexing and querying? > Are you sure you query only one field? > > Regards, > > Aurélien > > Le 16/06/2016 21:13, Jamal, Sarfaraz a écrit : >> Hi Guys, >> >> I have enabled stemming: >> >> >> >> > language="English"/> >> >> >> >> In the Admin Analysis, I type in running or runs and they both break down to >> run. >> However when I search for run, runs, or running with an actual query >> - >> >> It brings back three different sets of results. >> >> Is that correct? 
>> >> I would imagine that all three would bring back the exact same resultset? >> >> Sas >>
RE: [E] Re: Stemming
Hello =) Just to be safe and make sure it's happening at indexing time AS WELL as QUERYING time - I modified it to be like so: I am re-indexing the files And what do you mean about only querying one field? I am not entirely sure I understand.. Sas -Original Message- From: Aurélien MAZOYER [mailto:aurelien.mazo...@francelabs.com] Sent: Thursday, June 16, 2016 4:20 PM To: solr-user@lucene.apache.org Subject: [E] Re: Stemming Hi, Yes you should have the same resultset. Are you sure that you reindex all the data after changing your schema? Are you sure that you put your analyzer both at indexing and querying? Are you sure you query only one field? Regards, Aurélien Le 16/06/2016 21:13, Jamal, Sarfaraz a écrit : > Hi Guys, > > I have enabled stemming: > > > >language="English"/> > > > > In the Admin Analysis, I type in running or runs and they both break down to > run. > However when I search for run, runs, or running with an actual query - > > It brings back three different sets of results. > > Is that correct? > > I would imagine that all three would bring back the exact same resultset? > > Sas >
RE: [E] Re: Stemming
Hi Ahmet,
Thanks for your guidance. I just tried the following two configurations:
And they both produced three different sets of results
-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: Thursday, June 16, 2016 3:37 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Stemming
Hi Jamal,
Snowball requires a lowercase filter above it. This is documented in the javadocs; it is a small but important detail. Please use a lowercase filter after the whitespace tokenizer.
Ahmet
On Thursday, June 16, 2016 10:13 PM, "Jamal, Sarfaraz" wrote:
Hi Guys,
I have enabled stemming:
In the Admin Analysis, I type in running or runs and they both break down to run. However when I search for run, runs, or running with an actual query - It brings back three different sets of results.
Is that correct? I would imagine that all three would bring back the exact same resultset?
Sas
Stemming
Hi Guys,
I have enabled stemming:
In the Admin Analysis, I type in running or runs and they both break down to run. However, when I search for run, runs, or running with an actual query, it brings back three different sets of results.
Is that correct? I would imagine that all three would bring back the exact same result set?
Sas
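The fix reported at the top of this thread was to query the stemmed field (content_stemming) instead of _text_. One way to check that run, runs and running now hit the same field is to name the default field explicitly in each request, as in the sketch below; the host, core name and function name are hypothetical.

```python
from urllib.parse import urlencode


def select_url(solr_base, core, term, default_field):
    """Build a /select URL that searches `term` in `default_field` only.
    rows=0 skips the documents, since only the hit count is of interest."""
    params = {"q": term, "df": default_field, "rows": 0, "wt": "json"}
    return f"{solr_base}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    for term in ("run", "runs", "running"):
        print(select_url("http://localhost:8983/solr", "mycore", term, "content_stemming"))
```

If stemming is applied at both index and query time, the three responses should report the same numFound.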
RE: [E] Re: Question(s) about Highlighting
Update on this: I feel I have a good grasp of synonyms: In that I am doing it only at query time and not at indexing time It looks like this in Synonyms.txt sarfaraz jamal,sasjamal, sas,sarfaraz,wiggidy Each one of those bring back the exact same records. However it only highlights Jamal (with a space in front of it) Is there a way I can get the highlight snippets for each of the 4 synonyms of each other? Thank you ! Sas -Original Message- From: Jamal, Sarfaraz [mailto:sarfaraz.ja...@verizonwireless.com.INVALID] Sent: Friday, June 3, 2016 9:52 AM To: solr-user@lucene.apache.org Subject: RE: [E] Re: Question(s) about Highlighting Good Morning Alessandro, I verified it through the analysis tool (thanks for pointing it out), and it appears to be working correctly - As I see all of them as being synonyms of each other for this entry: sasjamal, sarfaraz, sas - When I do it only at indexing time, and disable it during query time (editing the synonyms.txt file SOLR6) - It does not treat them equally When I do it at indexing and query time, it seems to work - but the highlight snippets stop working. I believe it is working, MINUS the highlighting/snippets if that makes sense? Thanks Sarfaraz Jamal (Sas) Revenue Assurance Tech Ops 614-560-8556 sarfaraz.ja...@verizonwireless.com -Original Message- From: Alessandro Benedetti [mailto:abenede...@apache.org] Sent: Thursday, June 2, 2016 5:41 PM To: solr-user@lucene.apache.org Subject: [E] Re: Question(s) about Highlighting Hi Jamal, I assume you are using the Synonym token filter. From the observation I can assume you are using it only at indexing time. This means that when you index you are : 1) given a row in the synonym.txt you index all the terms per row in place of any of the term in the row . 2) given any of the term in the left side of the expression, you index the term in the right side of the expression You can verify this easily with the analysis tool in the Solr UI . 
On Thu, Jun 2, 2016 at 7:50 PM, Jamal, Sarfaraz < sarfaraz.ja...@verizonwireless.com.invalid> wrote: > I am having some difficulty understanding how to do something and if > it is even possible > > I have tried the following sets of Synonyms: > > 1. sarfaraz, sas, sasjamal > 2. sasjamal,sas => Sarfaraz > > In the second instance, any searches with the world 'sasjamal' do not > appear in the results, as it has been converted to Sarfaraz (I > believe) - > This means you don't use the same synonym.txt at query time. indeed sasjamal is not in the index at all. > In the first instance it works better - I believe all instances of any > of those words appear in the results. However the highlighted > snippets also stop working when any of those words are Matched. Is > there any documentation, insights or help about this issue? > I should verify that, it could be related the term offset. Please take a look to the analysis tool as well to understand better how the offsets are assigned. I remember long time ago there was a discussion about it and a bug or similar raised. Cheers > > Thanks in advance, > > Sas > > > -Original Message- > From: Shawn Heisey [mailto:apa...@elyograg.org] > Sent: Thursday, June 2, 2016 2:43 PM > To: solr-user@lucene.apache.org > Subject: [E] Re: MongoDB and Solr - Massive re-indexing > > On 6/2/2016 11:56 AM, Robert Brown wrote: > > My question is whether sending batches of 1,000 documents to Solr is > > still beneficial (thinking about docs that may not change), or if I > > should look at the MongoDB connector for Solr, based on the volume > > of incoming data we see. > > > > Would the connector still see all docs updating if I re-insert them > > blindly, and thus still send all 50m documents back to Solr everyday > > anyway? > > > > Is my setup quite typical for the MongoDB connector? > > Sending update requests to Solr containing batches of 1000 docs is a > good idea. Depending on how large they are, you may be able to send > even more than 1000. 
If you can avoid sending documents that haven't > changed, Solr will likely perform better and relevance scoring will be > better, because you won't have as many deleted docs. > > The mongo connector is not software from the Solr project, or even > from Apache. We don't know anything about it. If you have questions > about that software, please contact the people who maintain it. If > their answers lead to questions about Solr itself, then you can bring those > back here. > > Thanks, > Shawn > > -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti "Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry?" William Blake - Songs of Experience -1794 England
RE: [E] Re: Question about Data Import Handler
I am sorry I might have missed any replies on this. (I was looking out for them) - Is what I am trying to do even possible? Thanks, Sas -Original Message- From: Jamal, Sarfaraz [mailto:sarfaraz.ja...@verizonwireless.com.INVALID] Sent: Thursday, June 9, 2016 12:43 PM To: solr-user Subject: RE: [E] Re: Question about Data Import Handler I am on SOLR6 =) Thanks, Sas -Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Thursday, June 9, 2016 12:42 PM To: solr-user Subject: [E] Re: Question about Data Import Handler which version of Solr do you run? On Thu, Jun 9, 2016 at 6:23 PM, Jamal, Sarfaraz < sarfaraz.ja...@verizonwireless.com.invalid> wrote: > Hi Guys, > > I have a question about the data import handler and its configuration > file > > This is what a part of my data-config looks like: > > > > > > > > > === > > I would like it so that when its indexed, it returns in xml the > following when on that doc. > > - > This Is my name > This is my description > > The best I have gotten it to do so far is to add to the values in name > and description, which are fields on the doc. > > Thanks for any help - > > P.S. I shall be replying to the other threads as well, I Just took a > break from it to come work on another part of SOLR. > > Sas > -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics <http://www.griddynamics.com>
RE: [E] Re: Question about Data Import Handler
I am on SOLR6 =) Thanks, Sas -Original Message- From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] Sent: Thursday, June 9, 2016 12:42 PM To: solr-user Subject: [E] Re: Question about Data Import Handler which version of Solr do you run? On Thu, Jun 9, 2016 at 6:23 PM, Jamal, Sarfaraz < sarfaraz.ja...@verizonwireless.com.invalid> wrote: > Hi Guys, > > I have a question about the data import handler and its configuration > file > > This is what a part of my data-config looks like: > > > > > > > > > === > > I would like it so that when its indexed, it returns in xml the > following when on that doc. > > - > This Is my name > This is my description > > The best I have gotten it to do so far is to add to the values in name > and description, which are fields on the doc. > > Thanks for any help - > > P.S. I shall be replying to the other threads as well, I Just took a > break from it to come work on another part of SOLR. > > Sas > -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics <http://www.griddynamics.com>
Question about Data Import Handler
Hi Guys,
I have a question about the data import handler and its configuration file.
This is what a part of my data-config looks like:
===
I would like it so that when it's indexed, it returns the following in XML for that doc:
-
This Is my name
This is my description
The best I have gotten it to do so far is to add to the values in name and description, which are fields on the doc.
Thanks for any help -
P.S. I shall be replying to the other threads as well, I just took a break from it to come work on another part of SOLR.
Sas
Stemming Help
Hi Guys,
I am following this tutorial: http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/
My (Managed) Schema file looks like this: (in the appropriate places)
- - - -
I have re-indexed everything - It is not affecting my search at all - from what I can tell from the analysis tool, nothing is happening.
Is there something else I am missing or should take a look at, or is it possible to debug this? Or some other documentation I can search through?
Thanks! Sas
-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, June 3, 2016 2:02 PM
To: solr-user@lucene.apache.org
Subject: Re: [E] Re: Stemming and Managed Schema
On 6/3/2016 9:22 AM, Jamal, Sarfaraz wrote:
> I would edit the managed-schema, make my changes, shutdown solr? And
> start it back up and verify it is still there?
That's the sledgehammer approach. Simple and effective, but Solr does go offline for a short time.
> Or is there another way to reload the core/collection?
For SolrCloud: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2
For non-cloud mode: https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD
Thanks, Shawn
RE: [E] Re: Stemming and Managed Schema
Awesome,
So just to make sure I got it right: I would edit the managed-schema, make my changes, shutdown solr? And start it back up and verify it is still there? Or is there another way to reload the core/collection?
Thanks! Sas
-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, June 3, 2016 11:17 AM
To: solr-user@lucene.apache.org
Subject: [E] Re: Stemming and Managed Schema
On 6/3/2016 9:07 AM, Jamal, Sarfaraz wrote:
> I found the following article:
> http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/
>
> And I want to do stemming on one of our fields.
>
> However, I am using a Managed Schema and I am unsure how to add these
> two blocks to it -
>
> I know there is an API for managed schemas, would that support these
> additions?
You can't edit an existing fieldType with the Schema API. You can entirely replace it, but you have to include the whole definition.
https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-ReplaceaFieldType
I'm aware that the managed-schema file says to not make manual edits -- but you *can* edit it manually, as long as you are absolutely sure that nobody is using the Schema API until after you complete your edits and reload the core/collection.
Thanks, Shawn
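The reload step mentioned here, as an alternative to restarting Solr after a manual managed-schema edit, can be issued over HTTP. The sketch below only builds the URLs for the CoreAdmin and Collections RELOAD actions; the host, core and collection names and the function names are hypothetical.

```python
from urllib.parse import urlencode


def core_reload_url(solr_base, core):
    """CoreAdmin RELOAD URL for a core in non-cloud mode."""
    return f"{solr_base}/admin/cores?{urlencode({'action': 'RELOAD', 'core': core})}"


def collection_reload_url(solr_base, collection):
    """Collections API RELOAD URL for a collection in SolrCloud mode."""
    return f"{solr_base}/admin/collections?{urlencode({'action': 'RELOAD', 'name': collection})}"


if __name__ == "__main__":
    print(core_reload_url("http://localhost:8983/solr", "mycore"))
    print(collection_reload_url("http://localhost:8983/solr", "mycollection"))
```

Requesting the appropriate URL (with a browser, curl or urllib) after saving the schema avoids taking Solr offline; documents indexed before the change still need re-indexing to pick up new analysis.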
Stemming and Managed Schema
Hi Guys, I found the following article: http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/ And I want to do stemming on one of our fields. However, I am using a Managed Schema and I am unsure how to add these two blocks to it - I know there is an API for managed schemas, would that support these additions? Thanks! Sas
RE: [E] Re: Question(s) about Highlighting
Good Morning Alessandro, I verified it through the analysis tool (thanks for pointing it out), and it appears to be working correctly - As I see all of them as being synonyms of each other for this entry: sasjamal, sarfaraz, sas - When I do it only at indexing time, and disable it during query time (editing the synonyms.txt file SOLR6) - It does not treat them equally When I do it at indexing and query time, it seems to work - but the highlight snippets stop working. I believe it is working, MINUS the highlighting/snippets if that makes sense? Thanks Sarfaraz Jamal (Sas) Revenue Assurance Tech Ops 614-560-8556 sarfaraz.ja...@verizonwireless.com -Original Message- From: Alessandro Benedetti [mailto:abenede...@apache.org] Sent: Thursday, June 2, 2016 5:41 PM To: solr-user@lucene.apache.org Subject: [E] Re: Question(s) about Highlighting Hi Jamal, I assume you are using the Synonym token filter. From the observation I can assume you are using it only at indexing time. This means that when you index you are : 1) given a row in the synonym.txt you index all the terms per row in place of any of the term in the row . 2) given any of the term in the left side of the expression, you index the term in the right side of the expression You can verify this easily with the analysis tool in the Solr UI . On Thu, Jun 2, 2016 at 7:50 PM, Jamal, Sarfaraz < sarfaraz.ja...@verizonwireless.com.invalid> wrote: > I am having some difficulty understanding how to do something and if > it is even possible > > I have tried the following sets of Synonyms: > > 1. sarfaraz, sas, sasjamal > 2. sasjamal,sas => Sarfaraz > > In the second instance, any searches with the world 'sasjamal' do not > appear in the results, as it has been converted to Sarfaraz (I > believe) - > This means you don't use the same synonym.txt at query time. indeed sasjamal is not in the index at all. > In the first instance it works better - I believe all instances of any > of those words appear in the results. 
However the highlighted > snippets also stop working when any of those words are Matched. Is > there any documentation, insights or help about this issue? > I should verify that, it could be related the term offset. Please take a look to the analysis tool as well to understand better how the offsets are assigned. I remember long time ago there was a discussion about it and a bug or similar raised. Cheers > > Thanks in advance, > > Sas > > > -Original Message- > From: Shawn Heisey [mailto:apa...@elyograg.org] > Sent: Thursday, June 2, 2016 2:43 PM > To: solr-user@lucene.apache.org > Subject: [E] Re: MongoDB and Solr - Massive re-indexing > > On 6/2/2016 11:56 AM, Robert Brown wrote: > > My question is whether sending batches of 1,000 documents to Solr is > > still beneficial (thinking about docs that may not change), or if I > > should look at the MongoDB connector for Solr, based on the volume > > of incoming data we see. > > > > Would the connector still see all docs updating if I re-insert them > > blindly, and thus still send all 50m documents back to Solr everyday > > anyway? > > > > Is my setup quite typical for the MongoDB connector? > > Sending update requests to Solr containing batches of 1000 docs is a > good idea. Depending on how large they are, you may be able to send > even more than 1000. If you can avoid sending documents that haven't > changed, Solr will likely perform better and relevance scoring will be > better, because you won't have as many deleted docs. > > The mongo connector is not software from the Solr project, or even > from Apache. We don't know anything about it. If you have questions > about that software, please contact the people who maintain it. If > their answers lead to questions about Solr itself, then you can bring those > back here. 
> > Thanks, > Shawn > > -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti "Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry?" William Blake - Songs of Experience -1794 England
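For anyone following the synonym discussion above, here is a rough Python sketch of the difference between the two rule styles Sas tried. It is only a toy model of what Alessandro describes, not how Solr's synonym filter is actually implemented: the function names are made up for illustration, and it ignores case handling, multi-word synonyms, and token positions/offsets (which is where the highlighting question really lives).

```python
def parse_synonym_rule(line):
    """Parse one synonyms.txt-style rule into {input_term: [output_terms]}.

    Toy model only (hypothetical helper, not a Solr API):
    - "a, b, c"     -> every term expands to all terms in the row
    - "a, b => c"   -> each left-hand term is replaced by the right-hand terms
    """
    if "=>" in line:
        left, right = line.split("=>", 1)
        inputs = [t.strip() for t in left.split(",") if t.strip()]
        outputs = [t.strip() for t in right.split(",") if t.strip()]
    else:
        inputs = [t.strip() for t in line.split(",") if t.strip()]
        outputs = inputs
    return {term: list(outputs) for term in inputs}


def analyze(tokens, rules):
    """Apply the rules to a token list; unknown tokens pass through unchanged."""
    out = []
    for tok in tokens:
        out.extend(rules.get(tok, [tok]))
    return out


equivalent = parse_synonym_rule("sarfaraz, sas, sasjamal")
explicit = parse_synonym_rule("sasjamal,sas => Sarfaraz")

print(analyze(["sasjamal"], equivalent))  # ['sarfaraz', 'sas', 'sasjamal']
print(analyze(["sasjamal"], explicit))    # ['Sarfaraz']
```

With the second rule applied only at index time, the token sasjamal never reaches the index, so a query for sasjamal that is not run through the same rule has nothing to match. That is consistent with what Sas observed.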
Question(s) about Highlighting
I am having some difficulty understanding how to do something, and whether it is even possible.

I have tried the following sets of synonyms:

1. sarfaraz, sas, sasjamal
2. sasjamal,sas => Sarfaraz

In the second instance, any searches with the word 'sasjamal' do not appear in the results, as it has been converted to Sarfaraz (I believe).

In the first instance it works better - I believe all instances of any of those words appear in the results. However, the highlighted snippets also stop working when any of those words are matched.

Is there any documentation, insights or help about this issue?

Thanks in advance,

Sas

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Thursday, June 2, 2016 2:43 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: MongoDB and Solr - Massive re-indexing

On 6/2/2016 11:56 AM, Robert Brown wrote:
> My question is whether sending batches of 1,000 documents to Solr is
> still beneficial (thinking about docs that may not change), or if I
> should look at the MongoDB connector for Solr, based on the volume of
> incoming data we see.
>
> Would the connector still see all docs updating if I re-insert them
> blindly, and thus still send all 50m documents back to Solr everyday
> anyway?
>
> Is my setup quite typical for the MongoDB connector?

Sending update requests to Solr containing batches of 1000 docs is a good idea. Depending on how large they are, you may be able to send even more than 1000. If you can avoid sending documents that haven't changed, Solr will likely perform better and relevance scoring will be better, because you won't have as many deleted docs.

The mongo connector is not software from the Solr project, or even from Apache. We don't know anything about it. If you have questions about that software, please contact the people who maintain it. If their answers lead to questions about Solr itself, then you can bring those back here.

Thanks,
Shawn
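Shawn's two suggestions (send batches of about 1000, and skip documents that have not changed) could be sketched roughly like this in Python. Nothing here talks to Solr. The `id` field, the fingerprint store (`seen`), and the helper names are all assumptions for illustration; a real setup would persist the fingerprints somewhere and hand each batch to whatever client is in use.

```python
import hashlib
import json
from itertools import islice


def fingerprint(doc):
    """Stable hash of a document's content (assumes the doc is JSON-serializable)."""
    payload = json.dumps(doc, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def changed_docs(docs, seen):
    """Yield only docs whose fingerprint differs from the one stored in `seen`.

    `seen` maps doc["id"] -> last fingerprint and is updated in place.
    """
    for doc in docs:
        fp = fingerprint(doc)
        if seen.get(doc["id"]) != fp:
            seen[doc["id"]] = fp
            yield doc


def batches(docs, size=1000):
    """Group an iterable of docs into lists of at most `size` items."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Usage sketch: the first pass sends everything, a later pass only what changed.
docs = [{"id": str(i), "title": "t"} for i in range(2500)]
seen = {}
first_pass = [len(b) for b in batches(changed_docs(docs, seen))]
docs[0]["title"] = "new"
second_pass = list(batches(changed_docs(docs, seen)))
```

In the usage sketch, the first pass produces batches of 1000, 1000 and 500, and the second pass produces a single batch containing only the one modified document.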
RE: [E] Re: Faceting Question(s)
Thank you Andrew, that looks like exactly what I am looking for =)
Thank you Robert, it looks like we are both doing it in similar fashion =)
Thank you MaryJo for jumping right in!

Sas

-Original Message-
From: Andrew Chillrud [mailto:achill...@opentext.com]
Sent: Thursday, June 2, 2016 2:17 PM
To: solr-user@lucene.apache.org
Subject: RE: [E] Re: Faceting Question(s)

It is possible to get the original facet counts for the field you are filtering on (we have been using this since Solr 3.6). Don't know if this can be extended to get the original counts for all fields however.

This syntax is described here: https://cwiki.apache.org/confluence/display/solr/Faceting

Tagging and Excluding Filters

You can tag specific filters and exclude those filters when faceting. This is useful when doing multi-select faceting.

Consider the following example query with faceting:

q=mainquery&fq=status:public&fq=doctype:pdf&facet=true&facet.field=doctype

Because everything is already constrained by the filter doctype:pdf, the facet.field=doctype facet command is currently redundant and will return 0 counts for everything except doctype:pdf.

To implement a multi-select facet for doctype, a GUI may want to still display the other doctype values and their associated counts, as if the doctype:pdf constraint had not yet been applied. For example:

=== Document Type ===
  [ ] Word (42)
  [x] PDF (96)
  [ ] Excel(11)
  [ ] HTML (63)

To return counts for doctype values that are currently not selected, tag filters that directly constrain doctype, and exclude those filters when faceting on doctype.

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype

Filter exclusion is supported for all types of facets. Both the tag and ex local parameters may specify multiple values by separating them with commas.
- Andy - -Original Message- From: Robert Brown [mailto:r...@intelcompute.com] Sent: Thursday, June 02, 2016 2:12 PM To: solr-user@lucene.apache.org Subject: Re: [E] Re: Faceting Question(s) MaryJo, I think you've misunderstood. The counts are different simply because the 2nd query contains a filter on a facet value from the 1st query - that's completely expected. The issue is how to get the original facet counts (with no filters but the same q) in the same call as also filtering by one of those facet values. Personally I don't think it's possible, but I will be interested to hear others' input, since it's a very common situation for me - I cache the first result in memcached and tag future queries as related to the first. Or you could always make 2 calls back to Solr (one original (again), and one with the filters); the caches should help massively. On 02/06/16 19:07, MaryJo Sminkey wrote: > And you're saying the count for the second query is different than > what was returned in the facet? You may need to check for any defaults > you have set up in the solrconfig for the select parser, if for > instance you have any grouping going on, but aren't doing grouping in > your facet, that could result in the counts being off. > > MJ > > > > > On Thu, Jun 2, 2016 at 2:01 PM, Jamal, Sarfaraz < > sarfaraz.ja...@verizonwireless.com.invalid> wrote: > >> Absolutely, >> >> Here is what it looks like: >> >> This brings the right counts as it should >> http://**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team >> >> Then when I specify which team >> http://**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team&fq=team:rollback >> >> The counts are obviously different now, as the result set is limited >> to one team.
>> >> Sas >> >> -Original Message- >> From: MaryJo Sminkey [mailto:mjsmin...@gmail.com] >> Sent: Thursday, June 2, 2016 1:56 PM >> To: solr-user@lucene.apache.org >> Subject: [E] Re: Faceting Question(s) >> >> Jamal - what is your q= set to? And do you have a fq for the original >> query? I have found that if you do a wildcard search (*:*) you have >> to be careful about other parameters you set as that can often result >> in the numbers returned being off. In my case, my defaults had things >> like edismax settings for phrase boosting, etc. that don't apply if >> there isn't a search term, and once I removed those for a wildcard >> search I got the correct numbers. So possibly your facet query itself >> may be set up correctly but something else in the parameters and/or >> filters with the two queries may be the cause of the difference. >> >> Mary Jo >> >> >>
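The tag/exclude syntax Andy quotes from the reference guide can be assembled programmatically. Below is a rough Python sketch for Sas's team facet. The `{!tag=...}` and `{!ex=...}` local parameters are the ones from the quoted documentation; the function name and the tag label `sel` are made up for illustration, and the sketch assumes the selected value needs no query escaping.

```python
from urllib.parse import urlencode, parse_qsl


def multiselect_facet_params(q, field, selected=None, tag="sel"):
    """Build request parameters for a multi-select facet on `field`.

    With no selection, this is a plain facet request. With a selection, the
    filter is tagged and the facet excludes that tag, so the facet counts are
    computed as if the filter on `field` had not been applied.
    """
    params = [("q", q), ("facet", "true")]
    if selected is None:
        params.append(("facet.field", field))
    else:
        params.append(("fq", "{!tag=%s}%s:%s" % (tag, field, selected)))
        params.append(("facet.field", "{!ex=%s}%s" % (tag, field)))
    return params


# URL-encoded query string to append to the select handler URL.
query_string = urlencode(multiselect_facet_params("video", "team", "rollback"))
print(query_string)
```

This would replace the two-call approach from the original question with a single request: the results are filtered to team:rollback while the team facet still reports counts for every team.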
RE: [E] Re: Faceting Question(s)
Absolutely,

Here is what it looks like:

This brings the right counts, as it should:
http://**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team

Then when I specify which team:
http://**select?q=video&hl=true&hl.fl=*&hl.snippets=20&facet=true&facet.field=team&fq=team:rollback

The counts are obviously different now, as the result set is limited to one team.

Sas

-Original Message-
From: MaryJo Sminkey [mailto:mjsmin...@gmail.com]
Sent: Thursday, June 2, 2016 1:56 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Faceting Question(s)

Jamal - what is your q= set to? And do you have a fq for the original query? I have found that if you do a wildcard search (*:*) you have to be careful about other parameters you set, as that can often result in the numbers returned being off. In my case, my defaults had things like edismax settings for phrase boosting, etc. that don't apply if there isn't a search term, and once I removed those for a wildcard search I got the correct numbers. So possibly your facet query itself may be set up correctly but something else in the parameters and/or filters with the two queries may be the cause of the difference.

Mary Jo

On Thu, Jun 2, 2016 at 1:47 PM, Jamal, Sarfaraz <sarfaraz.ja...@verizonwireless.com.invalid> wrote:
> Hello Everyone,
>
> I am working on implementing some basic faceting into my project.
>
> I have it working the way I want to, but I feel like there is probably
> a better way than the way I went about it.
>
> * I want to show a category and its count.
> * When someone clicks a category, it sets a FQ= to that category.
>
> But now that the results are being filtered, the category counts from
> the original query without the filters are off.
>
> So, I have a single api call that I make with rows set to 0 and the
> base query without any filters, and use that to display my categories.
>
> And then I call the api again, this time to get the results. And the
> category count stays the same.
>
> I hope that makes sense.
>
> I was hoping facet.query would be of help, but I am not sure I
> understood it properly.
>
> Thanks in advance =)
>
> Sas
Faceting Question(s)
Hello Everyone,

I am working on implementing some basic faceting into my project.

I have it working the way I want to, but I feel like there is probably a better way than the way I went about it.

* I want to show a category and its count.
* When someone clicks a category, it sets a FQ= to that category.

But now that the results are being filtered, the category counts from the original query without the filters are off.

So, I have a single api call that I make with rows set to 0 and the base query without any filters, and use that to display my categories.

And then I call the api again, this time to get the results. And the category count stays the same.

I hope that makes sense.

I was hoping facet.query would be of help, but I am not sure I understood it properly.

Thanks in advance =)

Sas
RE: [E] Re: Simple Question about SimplePostTool
Thank you.

Sas

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Wednesday, June 1, 2016 4:34 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Simple Question about SimplePostTool

Yes, you can add "literal" field values with bin/post:

    bin/post -c test ~/Documents/Test.pdf -params "literal.foo=bar"

See https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika#UploadingDatawithSolrCellusingApacheTika-InputParameters for details on what parameters you can use with "rich document" indexing.

--
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com

> On Jun 1, 2016, at 3:28 PM, Jamal, Sarfaraz wrote:
>
> Hi Guys,
>
> I am a newbie at Solr, so I may have some very simple questions.
> I am also waiting for my book to arrive.
>
> Can the SimplePostTool be used to add additional fields when indexing a
> Word/Excel/text file?
>
> So, for example, as I index a word document, I pass in a parameter
> saying team=avengers
>
> Or something along the lines of that -
>
> Thank you,
>
> Sas
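To connect Erik's answer to the team=avengers example in the question: the same literal.* parameters can be passed when calling the extract handler directly. The Python sketch below only builds the URL. The host and port, the /update/extract handler path, and the helper name are assumptions (check the handler configuration in your own solrconfig.xml), the core name test is borrowed from Erik's command, and the team field would need to exist in the schema.

```python
from urllib.parse import urlencode


def extract_url(base, core, literals, commit=True):
    """Build a Solr Cell extract URL with literal.<field>=<value> parameters.

    Hypothetical helper for illustration; it only assembles the URL. The
    document itself would still be sent as the body of a POST request.
    """
    params = [("literal." + name, value) for name, value in literals.items()]
    if commit:
        params.append(("commit", "true"))
    return "%s/%s/update/extract?%s" % (base.rstrip("/"), core, urlencode(params))


url = extract_url("http://localhost:8983/solr", "test", {"team": "avengers"})
print(url)
# http://localhost:8983/solr/test/update/extract?literal.team=avengers&commit=true
```

With bin/post, the equivalent would be passing "literal.team=avengers" in the -params string, following the pattern Erik shows.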
Simple Question about SimplePostTool
Hi Guys,

I am a newbie at Solr, so I may have some very simple questions. I am also waiting for my book to arrive.

Can the SimplePostTool be used to add additional fields when indexing a Word/Excel/text file?

So, for example, as I index a word document, I pass in a parameter saying team=avengers

Or something along the lines of that -

Thank you,

Sas