Re: Stemming and other tokenizers
I can't create one field per language; that is the problem. But I'll dig into it following your suggestions and let you know what I come up with.

Patrick.

2011/9/11 Jan Høydahl:
> Hi,
>
> You'll not be able to detect the language and change the stemmer on the same field in one go. You need to create one fieldType in your schema per language you want to use, and then use LanguageIdentification (SOLR-1979) to do the magic of detecting the language and renaming the field. If you set langid.override=false, langid.map=true and populate your "language" field with the known language, you will probably get the desired effect.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
Re: Adding Query Filter custom implementation to Solr's pipeline
: When I was using Lucene directly I used a custom implementation of query
: filter to enforce entitlements of search results. Now that I'm
: switching my infrastructure from a custom host to Solr, what is the best
: way to configure Solr to use my custom query filter for every request?

It depends on how complex your custom Filter was. Many people find that, when using Solr, they can reimplement basic Filter logic using "fq" params and the built-in QParsers provided by Solr.

If you do need to implement something truly custom, writing it as your own QParser to trigger via an "fq" can be advantageous, so it can be cached and re-used by many queries.

If that doesn't cut it for you, some people implement their own SearchComponents to manipulate the Queries.

And as a last resort, you can always implement your own RequestHandler and directly use the SolrIndexSearcher to execute the query any way you want -- but if you don't use the DocList/DocSet methods, other built-in features like faceting won't be very easy to use.

If you provide some more details on how your existing Filter works, people can provide more advice on what would make the most sense.

-Hoss
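For the common case Hoss describes, the entitlement check becomes an `fq` parameter that the client adds to every request. Below is a minimal Python sketch, assuming a hypothetical multi-valued `acl` field holding the group names allowed to see each document; the field name and the group values are illustrative, not part of any real schema.

```python
from urllib.parse import urlencode, parse_qs


def build_entitled_query(user_query, user_groups, acl_field="acl"):
    """Return a Solr select query string restricted to the user's groups.

    The restriction is sent as a separate fq parameter, so Solr can cache
    that filter independently of the main query and reuse it across requests.
    """
    if not user_groups:
        raise ValueError("user must belong to at least one group")
    # Quote each group name so spaces and query syntax inside it stay literal.
    quoted = ['"%s"' % g.replace("\\", "\\\\").replace('"', '\\"') for g in user_groups]
    params = [
        ("q", user_query),
        ("fq", "%s:(%s)" % (acl_field, " OR ".join(quoted))),
    ]
    return urlencode(params)


if __name__ == "__main__":
    # Append the result to e.g. http://localhost:8983/solr/select?
    qs = build_entitled_query("annual report", ["finance", "board members"])
    print(qs)
    print(parse_qs(qs)["fq"])
```

Keeping the entitlement clause out of `q` also means it does not affect relevancy scoring; it only narrows the result set.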
Re: Using multivalued field in map function
: Hmmm, would it be simpler to do something like append
: a clause like this?
: BloggerId:12304^10 OR CoBloggerId:123404^5?

Definitely, but that won't guarantee you a strict ordering if there is a particularly good relevancy match.

There's a bunch of ways to go about something like this, but trying to use the map function is definitely overkill (even if it could work on multivalued fields). This kind of thing is particularly easy with the sort by function feature added in 3.2, because any query can be used as a function...

q=your_query&sort=query(BloggerId:12304)+desc,+query(CoBloggerId:123404)+desc,+score+desc

-Hoss
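Hoss's one-liner can also be generated from client code. This Python sketch writes the nested queries as `$param` references, which is one of the forms the FunctionQuery documentation shows for `query()`; depending on the Solr version, the function parser may reject the bare `query(BloggerId:12304)` form, so verify the exact syntax against your release. The parameter names `byBlogger` and `byCoBlogger` are made up for the example.

```python
from urllib.parse import urlencode, parse_qs


def build_priority_sort(user_query, blogger_id, co_blogger_id):
    """Sort by BloggerId match, then CoBloggerId match, then relevancy score.

    query($param) returns the referenced query's score for matching documents
    and 0 (its default) otherwise, so sorting it descending puts matches first
    no matter how strong other relevancy matches are.
    """
    params = [
        ("q", user_query),
        ("sort", "query($byBlogger) desc,query($byCoBlogger) desc,score desc"),
        # The nested queries live in their own (hypothetically named) params.
        ("byBlogger", "BloggerId:%d" % blogger_id),
        ("byCoBlogger", "CoBloggerId:%d" % co_blogger_id),
    ]
    return urlencode(params)


if __name__ == "__main__":
    qs = build_priority_sort("your_query", 12304, 123404)
    print(qs)
    print(parse_qs(qs)["sort"])
```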
Re: Solr and DateTimes - bug?
: The XML output when performing a query via the solr interface is like this:
: 1-01-01T00:00:00Z

i think you mean: 1-01-01T00:00:00Z

: > > So my question is: Is this a bug in the solr output engine, or should mono
: > > be able to parse the date as given from solr? I have not yet tried it out
: > > on .net as I do not have access to a windows machine at the moment.

it is in fact a bug in Solr that not a lot of people have been overly concerned with, since most people don't deal with dates that far back

https://issues.apache.org/jira/browse/SOLR-1899

...I spent a little time working on it at one point but got sidetracked by other things, since there are a couple of related issues with the canonical ISO 8601 date format around year "0" that made it non-obvious what the "ideal" solution was.

-Hoss
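Until SOLR-1899 is fixed, one client-side workaround is to left-pad the year before handing the string to a strict ISO 8601 parser. The sketch below is in Python for illustration; the same regular-expression idea carries over to a Mono/SolrNet client. It assumes UTC timestamps without fractional seconds, and the helper names are hypothetical.

```python
import re
from datetime import datetime

# A UTC timestamp whose year lost its leading zeros, e.g. "1-01-01T00:00:00Z"
# (the SOLR-1899 symptom). Four-digit years never match this pattern.
_SHORT_YEAR = re.compile(r"^(\d{1,3})(-\d{2}-\d{2}T)")


def pad_solr_year(value):
    """Left-pad a 1-3 digit year to four digits; leave other values unchanged."""
    return _SHORT_YEAR.sub(lambda m: m.group(1).zfill(4) + m.group(2), value)


def parse_solr_date(value):
    """Parse a Solr UTC date string (no fractional seconds) after padding."""
    return datetime.strptime(pad_solr_year(value), "%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(pad_solr_year("1-01-01T00:00:00Z"))  # 0001-01-01T00:00:00Z
    print(parse_solr_date("1-01-01T00:00:00Z"))
```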
Parameter not working for master/slave
I am using Solr 3.3. I tried passing in -Denable.master=true and -Denable.slave=true on the slave machine. Then I changed solrconfig.xml to reference each, as per:

http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

But this is not working. The enable parameter does not appear to work in 3.3. Is this supposed to be working? What else can I do to debug it? How can I check whether other parameters in solrconfig.xml are working?

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
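When debugging by hand, it can help to work out what each `enable` element resolves to. The wiki's configuration relies on placeholders of the form `${enable.master:false}`, which take the JVM system property if it is set and the text after the colon otherwise. The Python sketch below only approximates that rule so you can reason about a given set of `-D` flags; it is not Solr's implementation. Note that with both flags set to `true`, as described above, both the master and the slave sections would resolve to enabled on that node.

```python
import re

# Placeholders of the form ${name} or ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(text, system_props):
    """Substitute ${name:default} using system_props, falling back to default.

    This sketch raises KeyError when a property is missing and has no default.
    """
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in system_props:
            return system_props[name]
        if default is None:
            raise KeyError("no value or default for " + name)
        return default

    return _PLACEHOLDER.sub(repl, text)


if __name__ == "__main__":
    # As if the JVM had been started with only -Denable.slave=true.
    props = {"enable.slave": "true"}
    print(resolve('<str name="enable">${enable.master:false}</str>', props))
    print(resolve('<str name="enable">${enable.slave:false}</str>', props))
```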
Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?
On Sep 11, 2011, at 7:04pm, dpt9876 wrote:

> Hi, thanks for the reply.
>
> How does Nutch/Solr handle the scenario where one website calls the price "price" and another website calls it "cost"? Same thing, different name, yet I would want the facet to handle that and not create a different facet.
>
> Is this combo of Nutch and Solr that intelligent and/or intuitive?

What you're describing here is web mining, not web crawling. You want to extract price data from web pages and put that into a specific field in Solr.

To do that using Nutch, you'd need to write custom plug-ins that know how to extract the price from a page, and add that as a custom field to the crawl results.

The above is a topic for the Nutch mailing list, since Solr is just a downstream consumer of whatever Nutch provides.

-- Ken

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr
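As Ken says, mapping site-specific labels onto one field happens before Solr sees the data, whether in a Nutch plug-in or in your own scraper. Wherever it runs, the mapping step itself can be as simple as the Python sketch below. The alias table and the `price`/`resolution_mp` field names are hypothetical, and this is not the Nutch plug-in API.

```python
# Hypothetical alias table: scraped label (lowercased) -> canonical Solr field.
FIELD_ALIASES = {
    "price": "price",
    "cost": "price",
    "our price": "price",
    "resolution": "resolution_mp",
    "megapixels": "resolution_mp",
}


def parse_price(text):
    """Turn a display string such as "$1,299.00" into a float."""
    return float(text.replace("$", "").replace(",", "").strip())


def normalize_record(raw):
    """Map scraped attribute names onto canonical fields; drop unknown ones."""
    doc = {}
    for label, value in raw.items():
        field = FIELD_ALIASES.get(label.strip().lower())
        if field is None:
            continue  # unknown label: skip rather than create a new facet
        doc[field] = parse_price(value) if field == "price" else value
    return doc


if __name__ == "__main__":
    print(normalize_record({"Cost": "$149.99", "Color": "black"}))
    print(normalize_record({"Price": "$1,299.00", "Megapixels": "12"}))
```

Because every site's "cost" or "price" lands in the same numeric field, a single facet on that field covers all merchants.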
Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?
Hi, thanks for the reply.

How does Nutch/Solr handle the scenario where one website calls the price "price" and another website calls it "cost"? Same thing, different name, yet I would want the facet to handle that and not create a different facet.

Is this combo of Nutch and Solr that intelligent and/or intuitive?

Thanks for the fast response.

On Sep 12, 2011 9:06 AM, "Erick Erickson [via Lucene]" <ml-node+s472066n3328340...@n3.nabble.com> wrote:
> Nope, there's nothing in Solr that crawls anything; you have to feed documents in yourself from the websites.
>
> Or, look at the Nutch project, see: http://nutch.apache.org/about.html
>
> which is designed for this kind of problem.
>
> Best
> Erick
Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?
Nope, there's nothing in Solr that crawls anything; you have to feed documents in yourself from the websites.

Or, look at the Nutch project, see: http://nutch.apache.org/about.html

which is designed for this kind of problem.

Best
Erick

On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 wrote:
> Hi all,
> I am wondering if Solr will do the following for a project I am working on.
> I want to create a search engine with facets for potentially hundreds of websites.
> Similar to, say, crawling amazon + buy.com + ebay, so that someone can search these 3 sites from my 1 website.
> (I realise there are better ways of doing the above example; it's for illustrative purposes.)
> Eventually I would build that search crawl to index, say, 200 or 1000 merchants.
> Someone would come to my site and search for "digital camera".
>
> They would get results from all 3 indexes and, hopefully, dynamic facets, e.g.
> Price $100-200
> Price 200-300
> Resolution 1mp-2mp
>
> etc.
>
> Can this be done on the fly?
>
> I ask this because I am currently developing web scrapers to crawl these websites and dump that data into a db, and then was thinking of tacking on a Solr server to crawl my db.
>
> Problem with that approach is that crawling the world's ecommerce sites will take forever, when it seems Solr might do that for me? (I have read about multiple indexes etc.)
>
> Many thanks
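Feeding documents yourself means turning each scraped record into an update message and POSTing it to Solr's update handler (for a default install, something like `http://localhost:8983/solr/update` with `Content-Type: text/xml`, followed by a commit). Here is a minimal Python sketch of building the XML `<add>` message, assuming illustrative field names `id`, `name`, `price` and `cat` that would have to exist in your schema:

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of dicts into a Solr XML <add> update message.

    List values become repeated <field> elements (for multiValued fields).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_xml([
        {"id": "shop1-123", "name": "Digital camera", "price": 149.99,
         "cat": ["camera", "electronics"]},
    ]))
```

ElementTree escapes characters such as `&` and `<` in scraped text, which is one reason to build the message with an XML library rather than by string concatenation.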
Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?
Hi all,

I am wondering if Solr will do the following for a project I am working on.

I want to create a search engine with facets for potentially hundreds of websites. Similar to, say, crawling amazon + buy.com + ebay, so that someone can search these 3 sites from my 1 website. (I realise there are better ways of doing the above example; it's for illustrative purposes.) Eventually I would build that search crawl to index, say, 200 or 1000 merchants.

Someone would come to my site and search for "digital camera". They would get results from all 3 indexes and, hopefully, dynamic facets, e.g.:

Price $100-200
Price 200-300
Resolution 1mp-2mp
etc.

Can this be done on the fly?

I ask this because I am currently developing web scrapers to crawl these websites and dump that data into a db, and then was thinking of tacking on a Solr server to crawl my db.

Problem with that approach is that crawling the world's ecommerce sites will take forever, when it seems Solr might do that for me? (I have read about multiple indexes etc.)

Many thanks
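If the scraped prices end up in a numeric field, buckets like "Price $100-200" don't need to be precomputed: Solr's range faceting (`facet.range` with `facet.range.start`, `facet.range.end` and `facet.range.gap`, available since Solr 3.1) computes the counts per query. A small Python sketch of the request parameters, assuming a hypothetical numeric `price` field:

```python
from urllib.parse import urlencode, parse_qs


def price_range_facet_params(query, start, end, gap, field="price"):
    """Build a select query string that buckets a numeric field into ranges.

    With start=0, end=500, gap=100, Solr returns a count for each bucket
    starting at 0, 100, 200, 300 and 400.
    """
    if gap <= 0 or end <= start:
        raise ValueError("need start < end and a positive gap")
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", str(start)),
        ("facet.range.end", str(end)),
        ("facet.range.gap", str(gap)),
    ]
    return urlencode(params)


if __name__ == "__main__":
    qs = price_range_facet_params("digital camera", 0, 500, 100)
    print(qs)
    print(parse_qs(qs))
```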
select query does not find indexed pdf document
I am new to Solr. I tried to upload a PDF file via curl to my Solr webapp (on Tomcat):

curl "http://www/SearchApp/update/extract?stream.file=c:\dmvpn.pdf&stream.contentType=application/pdf&literal.id=pdf&commit=true"

The response came back with status 0 and QTime 860, but

http://www/SearchApp/select/?q=vpn

does not find the document (the response only shows status 0, QTime 0, q=vpn).

Help is appreciated.

=

FYI: I point my test webapp to the index/Solr home by modifying meta-data/context.xml, and I had to copy all the jars in Solr_download\contrib\extraction\lib to my webapp lib dir (to avoid the ClassNotFoundException). In the future I plan to put them in the tomcat/lib dir. Also, I have not modified conf\solrconfig.xml or schema.xml.
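One thing worth checking (a guess, not a confirmed diagnosis): `q=vpn` searches only the schema's default search field, so the text Tika extracts has to end up in that field. ExtractingRequestHandler's `fmap.content` parameter maps the extracted body to a field of your choosing; the Python sketch below assumes that field is called `text`, which you should compare against `defaultSearchField` in your schema.xml. It also URL-encodes the Windows path, since `c:\dmvpn.pdf` contains characters that are safer escaped in a query string.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def extract_url(base, path, doc_id, content_field="text"):
    """URL for /update/extract that indexes a local PDF and commits."""
    params = [
        ("stream.file", path),
        ("stream.contentType", "application/pdf"),
        ("literal.id", doc_id),
        # Send the extracted body text to a field the query will search.
        ("fmap.content", content_field),
        ("commit", "true"),
    ]
    return base.rstrip("/") + "/update/extract?" + urlencode(params)


def select_url(base, term, field="text"):
    """URL that searches one explicit field instead of relying on the default."""
    return base.rstrip("/") + "/select?" + urlencode([("q", "%s:%s" % (field, term))])


if __name__ == "__main__":
    base = "http://www/SearchApp"  # host and webapp name as in the question
    url = extract_url(base, r"c:\dmvpn.pdf", "pdf")
    print(url)
    print(parse_qs(urlsplit(url).query))
    print(select_url(base, "vpn"))
```

Querying an explicit field such as `text:vpn` separates the two possible problems: if it finds the document, the extraction worked and only the default field was wrong.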
Re: Full-search index for the database
How much search-specific stuff are we talking here? Do you want to do stemming? Plurals? Or are you talking exact match? Phrases? Multi-word queries?

If exact match on individual terms is all you want, you could hack something together like this: index each term into a catch-all field with the field name appended, something like

val1|field1 val2|field2

Be sure you don't use an analysis chain that splits on non-letters. Then, for each query term, append |* to the term, and your returned terms will have the field they came from. Of course you'll have to "do the right thing" with the results to show them correctly, but that'd work.

But this is really abusing Solr. I wonder if this is an "XY problem", so can you explain what it is you're trying to do at a higher level, and maybe we can suggest some other approach?

You could also have some kind of hybrid solution that searched with Solr (not using the trick above) and just returned the PK from Solr, then go to the DB to fill things out.

Best
Erick

On Sun, Sep 11, 2011 at 7:06 PM, Eugeny Balakhonov wrote:
> My task is very simple:
>
> I have a big database with a lot of tables and fields. This database has a dynamic structure and can be extended or changed at any time. I need a tool for full-text search across all fields in all tables of my database. The input of this tool is some text to search for; the output is some unique key and the name of the field which contains this text.
>
> Solr is a very good choice, but I have a serious problem with it: all Solr query parsers (standard, dismax, edismax) require explicit declaration of the fields to search. But the list of these fields in my case is very, very big! And at search time I don't know all the field names in the database.
>
> I think that my task is not unique. According to Google, a lot of people try to solve the same problem with Solr.
>
> Maybe it would be a good idea to add more flexible possibilities for searching in all indexed fields?
>
> I see the following variants:
>
> 1. Add wildcards in the qf parameter for the dismax/edismax query parsers.
>
> 2. Add the possibility to store the source field name in operator in schema.xml. In this case the user can do the following:
>
> a) create a field for default search:
> multiValued="true"/>
> ...
> TEXT
>
> b) copy all fields to the default search field:
>
> c) in the query response, the user can receive the needed source field name:
>
> foo foo foo test foo foo
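Erick's catch-all trick can be sketched in Python to show the encoding on the way in and the decoding on the way out. This assumes the catch-all field's analyzer keeps `term|field` tokens intact (for example whitespace tokenization plus lowercasing); the helper names are hypothetical, and the query clause escapes the `|` with a backslash as a precaution against the query parser treating it as syntax.

```python
import re


def tagged_tokens(row):
    """Turn {field: text} into 'term|field' tokens for one catch-all field."""
    tokens = []
    for field, text in row.items():
        for term in re.findall(r"\w+", str(text).lower()):
            tokens.append("%s|%s" % (term, field))
    return tokens


def wildcard_clause(term):
    """Query clause matching the term in any source field, e.g. foo\\|*."""
    return term.lower() + "\\|*"


def source_fields(matched_tokens, term):
    """Recover the source field names from tokens that matched the term."""
    prefix = term.lower() + "|"
    return sorted({t[len(prefix):] for t in matched_tokens if t.startswith(prefix)})


if __name__ == "__main__":
    row = {"title": "Test foo", "body": "foo bar"}
    tokens = tagged_tokens(row)
    print(" ".join(tokens))          # the value to index in the catch-all field
    print(wildcard_clause("foo"))    # foo\|*
    print(source_fields(tokens, "foo"))
```

As Erick notes, this only covers exact single-term matching; stemming, phrases and multi-word queries would not survive the encoding.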
Re: Full-search index for the database
My task is very simple:

I have a big database with a lot of tables and fields. This database has a dynamic structure and can be extended or changed at any time. I need a tool for full-text search across all fields in all tables of my database. The input of this tool is some text to search for; the output is some unique key and the name of the field which contains this text.

Solr is a very good choice, but I have a serious problem with it: all Solr query parsers (standard, dismax, edismax) require explicit declaration of the fields to search. But the list of these fields in my case is very, very big! And at search time I don't know all the field names in the database.

I think that my task is not unique. According to Google, a lot of people try to solve the same problem with Solr.

Maybe it would be a good idea to add more flexible possibilities for searching in all indexed fields?

I see the following variants:

1. Add wildcards in the qf parameter for the dismax/edismax query parsers.

2. Add the possibility to store the source field name in operator in schema.xml. In this case the user can do the following:

a) create a field for default search: ... TEXT

b) copy all fields to the default search field:

c) in the query response, the user can receive the needed source field name: foo foo foo test foo foo

2011/9/12 Eugeny Balakhonov:
> Hello,
>
> Thanks for the answer!
>
> I have created separate fields in my Solr schema for each field in the database (more than 500!). How do I ask the parser to search via all these fields? By default, the Solr schema should contain an explicit declaration of the default search field, like the following:
>
> TEXT
>
> I tried to use the following search query:
>
> .?q=*:search text&hl=on&defType=edismax
>
> In this case the search goes across the default search field.
>
> I can't concatenate all 500 database field names in a big search expression.

--
Best regards,
Eugeny Balakhonov
Re: Solr and DateTimes - bug?
Hi,

The XML output when performing a query via the Solr interface is like this:

1-01-01T00:00:00Z

It's Solr 3.3.0 on an Arch Linux desktop machine with "OpenJDK 6.b22_1.10.3-1" as my Java runtime environment.

/Nicklas

On 2011-09-12 00:26, Jan Høydahl wrote:
> Hi,
>
> Can you try to make a plain HTTP query from the admin GUI on your index and tell us what the XML response is for that date field?
>
> http://localhost:8983/solr/select?q=*:*
>
> If that date output is wrong as well, there may be a bug in Solr. If it is correct, you have a problem in SolrNet.
>
> Btw, which version of Solr do you use?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
Re: Nested documents
Even if it applies, this is for Lucene. I don't think we've added Solr support for this yet... we should!

Mike McCandless

http://blog.mikemccandless.com

On Sun, Sep 11, 2011 at 12:16 PM, Erick Erickson wrote:
> Does this JIRA apply?
>
> https://issues.apache.org/jira/browse/LUCENE-3171
>
> Best
> Erick
>
> On Sat, Sep 10, 2011 at 8:32 PM, Andy wrote:
>> Hi,
>>
>> Does Solr support nested documents? If not, is there any plan to add such a feature?
>>
>> Thanks.
Re: Solr and DateTimes - bug?
Hi,

Can you try to make a plain HTTP query from the admin GUI on your index and tell us what the XML response is for that date field?

http://localhost:8983/solr/select?q=*:*

If that date output is wrong as well, there may be a bug in Solr. If it is correct, you have a problem in SolrNet.

Btw, which version of Solr do you use?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. sep. 2011, at 00:28, Nicklas Overgaard wrote:
> Hi everybody,
>
> I just started playing around with Solr, however I'm facing some trouble. The test data I'm indexing with Solr contains, amongst other things, dates and times.
>
> By the way, I'm using Mono and I'm talking to Solr through the SolrNet library.
>
> The issue I'm facing:
>
> Some of the dates correspond to the DateTime.MinValue of .NET, which is "0001-01-01 00:00:00". When this date is returned from Solr, it's returned as "1-01-01T00:00:00Z". Now, I figured out that Solr supposedly should return dates according to the ISO 8601 standard, but the above output is not in that format.
>
> This basically leads to Mono breaking down because it's not able to parse the above date. If I add three leading zeroes, it parses just fine (so it becomes "0001-01-01T00:00:00Z", the correct ISO 8601 format).
>
> So my question is: Is this a bug in the Solr output engine, or should Mono be able to parse the date as given from Solr? I have not yet tried it out on .NET as I do not have access to a Windows machine at the moment.
>
> Best regards,
>
> Nicklas
Re: Full-search index for the database
Hello,

Thanks for the answer!

I have created separate fields in my Solr schema for each field in the database (more than 500!). How do I ask the parser to search via all these fields? By default, the Solr schema should contain an explicit declaration of the default search field, like the following:

TEXT

I tried to use the following search query:

.?q=*:search text&hl=on&defType=edismax

In this case the search goes across the default search field.

I can't concatenate all 500 database field names in a big search expression.

2011/9/11 Jamie Johnson:
> You should create separate fields in your Solr schema for each field in your database that you want recognized separately. You can use a query parser like edismax to do a weighted query across all of your fields and then provide highlighting on the specific field which matched.
>
> 2011/9/10 Eugeny Balakhonov:
>> I want to create full-text search for my database.
>>
>> It means that the search engine should look up some string in all fields of my database.
>>
>> I have created a Solr configuration for extracting and indexing data from a database.
>>
>> According to the documentation, in the file schema.xml I have created a field for the full-text search index:
>>
>> multiValued="true"/>
>>
>> Also I have added strings for copying all values of all fields into this full-search field:
>>
>> ...
>>
>> ...
>>
>> As a result, I can search across all fields in my database. But I can't recognize which field in the found record contains the requested string.
>>
>> Highlighting functionality just marks the string in the "TEXT" field like the following:
>>
>> Any text any text Test"
>>
>> Any text any text Test"
>>
>> How do I create a full-search index with the possibility to recognize the source database field?
>>
>> Thx a lot.
>>
>> Eugeny

--
Best regards,
Eugeny Balakhonov
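The 500 field names don't have to be typed by hand: a client, or a script that writes the request handler's defaults in solrconfig.xml, can generate `qf` and `hl.fl` from the field list, following Jamie's edismax suggestion. How you obtain that list is up to you (for example from the database catalog or from Solr's Luke handler at `/admin/luke`); the Python sketch below assumes you already have it, and the field names and boosts are illustrative. Per-field highlighting only works for fields that are stored.

```python
from urllib.parse import urlencode, parse_qs


def edismax_params(user_text, fields, boosts=None):
    """Search all listed fields with edismax and highlight each of them.

    Per-field highlighting shows which source field matched, which a single
    copyField catch-all cannot tell you.
    """
    boosts = boosts or {}
    qf = " ".join("%s^%s" % (f, boosts[f]) if f in boosts else f for f in fields)
    return urlencode([
        ("q", user_text),
        ("defType", "edismax"),
        ("qf", qf),
        ("hl", "true"),
        ("hl.fl", ",".join(fields)),
    ])


if __name__ == "__main__":
    fields = ["customer_name", "customer_email", "order_comment"]  # hypothetical
    qs = edismax_params("search text", fields, {"customer_name": 2})
    print(parse_qs(qs)["qf"])
```

With 500 fields the URL gets long, so putting the generated `qf` into the handler's defaults (and regenerating it when the database schema changes) may be more practical than sending it on every request.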
Re: Running solr on small amounts of RAM
Hi,

Be aware that the Solr 4.0 branch has multiple RAM-conserving optimizations which may cause your index to take considerably less space, so try it out. Also, of course, prune your schema to turn off everything you don't need, and also your OS to stop services you don't use. Consider disallowing certain types of queries from the clients (such as wildcard, sorting, fuzzy etc.) to avoid getting into high-memory situations.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 11. sep. 2011, at 17:59, Erick Erickson wrote:
> Well, this answer isn't much more satisfactory than "get more memory", but about all I can say is "try it and see".
>
> Sure, make your caches very small, monitor memory and test it out.
>
> You'll get a sense of how fast (or slow) the queries are pretty quickly. Or you can get a ballpark estimate of what running without caches would do performance-wise by simply measuring the first query after a restart.
>
> Because, unfortunately, "it depends" is the only accurate answer. It depends on how much sorting, faceting etc. you do, as well as on the queries themselves.
>
> Best
> Erick
>
> On Fri, Sep 9, 2011 at 12:48 PM, Mike Austin wrote:
>> I'm trying to push to get Solr used in our environment. I know I could get responses saying WHY can't you get more RAM etc., but let's just skip those and work with this situation.
>>
>> Our index is very small, with 100k documents and a light load at the moment. If I wanted to use the smallest possible RAM on the server, how would I do this and what are the issues?
>>
>> I know that caching would be the biggest loss, but if Solr ran with little to no caching, would the performance still be OK? I know this is a relative question.
>> This is the only application using Java on this machine; would tuning Java to use less cache help anything?
>> Should I set the cache settings low in the config?
>> Basically, what will a very low cache hit rate do to search speed and server performance? I know more is better and it depends on what I'm comparing it to, but could you just answer in some way whether it's going to cripple the machine or cause 5-second searches?
>>
>> It's on a Windows server.
>>
>> Thanks,
>> Mike
Re: Stemming and other tokenizers
Hi,

You'll not be able to detect the language and change the stemmer on the same field in one go. You need to create one fieldType in your schema per language you want to use, and then use LanguageIdentification (SOLR-1979) to do the magic of detecting the language and renaming the field. If you set langid.override=false, langid.map=true and populate your "language" field with the known language, you will probably get the desired effect.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 10. sep. 2011, at 03:24, Patrick Sauts wrote:
> Hello,
>
> I want to implement some kind of AutoStemming that will detect the language of a field based on a tag at the start of this field, like #en#. My field is stored on disk, but I don't want this tag to be stored. Is there a way to avoid this tag being stored?
>
> To me, all the filters and the tokenizers interact only with the indexed field and not the stored one.
>
> Am I wrong?
>
> Is it possible for you to do such a filter?
>
> Patrick.
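Patrick's reading is right that tokenizers and filters only change the indexed tokens, never the stored value, so a tag like `#en#` has to be removed before analysis if it must not be stored: either in the client, or (not shown here) in a custom UpdateRequestProcessor, which is the same hook the langid component plugs into. A minimal client-side Python sketch, assuming two-letter lowercase tags and a per-language field naming scheme `<field>_<lang>`; that naming scheme is an assumption in the spirit of the field renaming Jan describes, not necessarily what langid produces.

```python
import re

# Leading language tag such as "#en# " at the start of the raw value.
_LANG_TAG = re.compile(r"^#([a-z]{2})#\s*")


def split_language_tag(value, default_lang=None):
    """Return (language, text) with the tag removed from the text."""
    m = _LANG_TAG.match(value)
    if not m:
        return default_lang, value
    return m.group(1), value[m.end():]


def to_solr_doc(doc_id, field, value):
    """Build a document whose text goes into a per-language field (e.g. body_en)
    and whose language is recorded in a separate "language" field."""
    lang, text = split_language_tag(value)
    doc = {"id": doc_id}
    if lang:
        doc["language"] = lang
        doc["%s_%s" % (field, lang)] = text
    else:
        doc[field] = text  # untagged values stay in the generic field
    return doc


if __name__ == "__main__":
    print(split_language_tag("#en# my field"))
    print(to_solr_doc("1", "body", "#fr# bonjour"))
```

If separate per-language fields are not an option, the same split still gives you the clean text to store plus a `language` value for the langid setup Jan mentions.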
Re: Example Solr Config on EC2
Just to clarify, that link doesn't do anything to promote an already running slave into a master. One would have to bounce the Solr node which has that slave and then make the shift. It is not something that happens live at runtime. On Wed, Aug 10, 2011 at 4:04 PM, Akshay wrote: > Yes, you can promote a slave to be master; refer to > > http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node > > In AWS one can use an elastic IP (http://aws.amazon.com/articles/1346) to > refer to the master and this can be assigned to slaves as they assume the > role of master (in case of failure). All slaves will then refer to this new > master and there will be no need to regenerate data. > > Automation of this may be possible through CloudWatch alarm-actions. I don't > know of any available example automation scripts. > > Cheers > Akshay. > > On Wed, Aug 10, 2011 at 9:08 PM, Matt Shields > wrote: > > > If I were to build a master with multiple slaves, is it possible to > promote > > a slave to be the new master if the original master fails? Will all the > > slaves pick up right where they left off, or any time the master fails > will > > we need to completely regenerate all the data? > > > > If this is possible, are there any examples of this being automated? > > Especially on Win2k3.
> > > > Matthew Shields > > Owner > > BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation, > > Managed Services > > www.beantownhost.com > > www.sysadminvalley.com > > www.jeeprally.com > > > > > > > > On Mon, Aug 8, 2011 at 5:34 PM, wrote: > > > > > Matthew, > > > > > > Here's another resource: > > > > > > > > > http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/ > > > > > > > > > Michael Bohlig > > > Lucid Imagination > > > > > > > > > > > > - Original Message > > > From: Matt Shields > > > To: solr-user@lucene.apache.org > > > Sent: Mon, August 8, 2011 2:03:20 PM > > > Subject: Example Solr Config on EC2 > > > > > > I'm looking for some examples of how to set up Solr on EC2. The > > > configuration I'm looking for would have multiple nodes for redundancy. > > > I've tested in-house with a single master and slave with replication > > > running in Tomcat on Windows Server 2003, but even if I have multiple > > > slaves > > > the single master is a single point of failure. Any suggestions or > > example > > > configurations? The project I'm working on is a .NET setup, so ideally > > I'd > > > like to keep this search cluster on Windows Server, even though I > prefer > > > Linux. > > > > > > Matthew Shields > > > Owner > > > BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, > Colocation, > > > Managed Services > > > www.beantownhost.com > > > www.sysadminvalley.com > > > www.jeeprally.com > > > > > > > > >
Solr and DateTimes - bug?
Hi everybody, I just started playing around with Solr, however I'm facing some trouble. The test data I'm indexing with Solr contains, amongst other things, dates and times. By the way, I'm using Mono and I'm talking to Solr through the SolrNet library. The issue I'm facing: some of the dates correspond to the DateTime.MinValue of .NET, which is "0001-01-01 00:00:00". When this date is returned from Solr, it's returned as "1-01-01T00:00:00Z". Now, I figured out that Solr supposedly should return dates according to the ISO 8601 standard - but the above output is not in that format. This basically leads to Mono breaking down because it's not able to parse the above date. If I add three leading zeroes, it parses just fine (so it becomes "0001-01-01T00:00:00Z", the correct ISO 8601 format). So my question is: is this a bug in the Solr output engine, or should Mono be able to parse the date as given by Solr? I have not yet tried it out on .NET as I do not have access to a Windows machine at the moment. Best regards, Nicklas
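Until the output is zero-padded, a client can pad the year itself before parsing. The sketch below does that in Python; `parse_solr_date` is a hypothetical helper, it assumes the `yyyy-MM-ddTHH:mm:ssZ` shape shown above without fractional seconds, and the same idea would carry over to a Mono/SolrNet-side parser.

```python
import re
from datetime import datetime, timezone

# Matches a year written with only one to three digits, e.g. "1-01-01...".
_SHORT_YEAR = re.compile(r"^(\d{1,3})-")

def parse_solr_date(value):
    """Parse a Solr date string, zero-padding the year to four digits first."""
    padded = _SHORT_YEAR.sub(lambda m: m.group(1).zfill(4) + "-", value, count=1)
    return datetime.strptime(padded, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
```

Four-digit years are left untouched, so ordinary dates parse exactly as before.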
Re: SolrCloud Feedback
On Sep 9, 2011, at 1:09 PM, Pulkit Singhal wrote: > I think I understand it a bit better now but wouldn't mind some validation. > > 1) solr.xml does not become part of ZooKeeper Right - currently it does not. Info is put there to tell Solr how to connect to zookeeper and register the cores. > 2) The default looks like this out-of-box: > > > > so that may leave one wondering where the core's association to a > collection name is made? > > It can be made like so: > a) statically in a file: > > b) at start time via java: > java ... -Dcollection.configName=myconf ... -jar start.jar These are two different things. First, just to make the bootstrap case simple, if you don't specify a collection name, it defaults to the SolrCore name. That is why we make a default SolrCore name of collection1. In the simple wiki SolrCloud example, you can avoid naming the collection on each shard and simply have things come up under collection1 by default. a) shows how to override using the SolrCore name for the collection name. b) shows how to set the configuration set name for the config files that you upload with -Dbootstrap_confdir=. If you specify nothing for collection.configName, it defaults to configuration1. > > And I'm guessing that since the core's name ("collection1") for shard1 > has already been associated with -Dcollection.configName=myconf in > http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster > once already, adding an additional shard2 with the same core name > ("collection1"), automatically throws it in with the collection name > ("myconf") without any need to specify anything at startup via -D or > statically in solr.xml file. "myconf" is not the collection name - it's the name of a collection of configuration files. If only one such set exists, you don't have to specify which to use (which you would do by changing the value at a given node in the zookeeper layout).
If you wanted multiple named configuration file sets, you would have to explicitly map each collection to the configuration file set it should use. > > Validate away; otherwise I'll just accept any hate mail after making > edits to the Solr wiki directly. > > - Pulkit > > On Fri, Sep 9, 2011 at 11:38 AM, Pulkit Singhal > wrote: >> Hello Jan, >> >> You've made a very good point in (b). I would be happy to make the >> edit to the wiki if I understood your explanation completely. >> >> When you say that it is "looking up what collection that core is part >> of" ... I'm curious how a core is being put under a particular >> collection in the first place? And what that collection is named? >> Obviously you've made it clear that collection1 is really the name of >> the core itself. And where this association is being stored for the >> code to look it up? >> >> If not Jan, then perhaps the gurus who wrote Solr Cloud could answer :) >> >> Thanks! >> - Pulkit >> >> On Thu, Feb 10, 2011 at 9:10 AM, Jan Høydahl wrote: >>> Hi, >>> >>> I have so far just tested the examples and got an N by M cluster running. My >>> feedback: >>> >>> a) First of all, a major update of the SolrCloud Wiki is needed, to clearly >>> state what is in which version, what are current improvement plans and get >>> rid of outdated stuff. That said I think there are many good ideas there. >>> >>> b) The "collection" terminology is too much confused with "core", and >>> should probably be made more distinct. I just tried to configure two cores >>> on the same Solr instance into the same collection, and that worked fine, >>> both as distinct shards and as same shard (replica). The wiki examples give >>> the impression that "collection1" in >>> localhost:8983/solr/collection1/select?distrib=true is some magic >>> collection identifier, but what it really does is doing the query on the >>> *core* named "collection1", looking up what collection that core is part of >>> and distributing the query to all shards in that collection.
>>> >>> c) ZK is not designed to store large files. While the files in conf are >>> normally well below the 1M limit ZK imposes, we should perhaps consider >>> using a lightweight distributed object or k/v store for holding the >>> /CONFIGS and let ZK store a reference only >>> >>> d) How are admins supposed to update configs in ZK? Install their favourite >>> ZK editor? >>> >>> e) We should perhaps not be so afraid to make ZK a requirement for Solr in >>> v4. Ideally you should interact with a 1-node Solr in the same manner as >>> you do with a 100-node Solr. An example is the Admin GUI where the "schema" >>> and "solrconfig" links assume local file. This requires decent tool support >>> to make ZK interaction intuitive, such as "import" and "export" commands. >>> >>> -- >>> Jan Høydahl, search solution architect >>> Cominvent AS - www.cominvent.com >>> >>> On 19. jan. 2011, at 21.07, Mark Miller wrote: >>> Hello Users, About a little over a year ago, a few of us started workin
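Mark's defaulting rules can be summarized in a toy model. This is not Solr code: `resolve_names` and `bootstrap_command` are hypothetical helpers, and the launch line is modeled on the wiki's Example A with its `-Dbootstrap_confdir`, `-Dcollection.configName` and `-DzkRun` properties.

```python
def resolve_names(core_name, collection=None, config_name=None):
    """Toy model: the collection defaults to the SolrCore name and the
    config set name defaults to "configuration1"."""
    return (collection or core_name, config_name or "configuration1")

def bootstrap_command(conf_dir, config_name):
    """Launch arguments that upload conf_dir as the config set config_name."""
    return [
        "java",
        f"-Dbootstrap_confdir={conf_dir}",
        f"-Dcollection.configName={config_name}",
        "-DzkRun",
        "-jar", "start.jar",
    ]
```

In this model a second shard started with the same core name, and no overrides, lands in collection "collection1"; "myconf" only ever names the config set, never the collection.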
Re: solr equivalent of "select distinct"
You can get what you want - unique lists of values from docs matching your query - for a single field (using facets), but not for the co-occurrence of two field values. So you could combine the two fields together, if you know what they are going to be "in advance." Facets also give you counts, so in some special cases, you could get what you want - eg you can tell when there is only a single pair of values since their counts will be the same and the same as the total. But that's all I can think of. -Mike On 9/11/2011 12:39 PM, Mark juszczec wrote: Here's an example:
PK   FLD1 FLD2 FLD3 FLD4 FLD5
AB0  A    B    0    x    y
AB1  A    B    1    x    y
CD0  C    D    0    a    b
CD1  C    D    1    e    f
I want to write a query using only the terms FLD1 and FLD2 and ONLY get back:
A B x y
C D a b
C D e f
Since FLD4 and FLD5 are the same for PK=AB0 and AB1, I only want one occurrence of those records. Since FLD4 and FLD5 are different for PK=CD0 and CD1, I want BOTH occurrences of those records.
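Mike's "combine the fields in advance" idea can be sketched as follows: at index time, write the fields whose distinct combinations matter into one extra field, then facet on that field with `rows=0`, so each facet value is one distinct combination and its count says how many documents share it. `FLD_COMBO`, the `|` separator and the helper names are hypothetical; note also that the standard faceting parameter is `facet.field` (with a dot), not the `facet_field` used in the quoted requests. The `Counter` only simulates locally what the facet counts would contain for Mark's four example rows.

```python
from collections import Counter

SEP = "|"  # assumed never to occur inside the field values

def combo_value(doc, fields=("FLD1", "FLD2", "FLD4", "FLD5")):
    """Value for a hypothetical extra field FLD_COMBO, set at index time."""
    return SEP.join(str(doc[f]) for f in fields)

def facet_params(query, combo_field="FLD_COMBO"):
    # Ask only for facet counts on the combined field, not for documents.
    return {"q": query, "rows": 0, "facet": "true",
            "facet.field": combo_field, "facet.limit": -1, "facet.mincount": 1}

docs = [
    {"PK": "AB0", "FLD1": "A", "FLD2": "B", "FLD3": 0, "FLD4": "x", "FLD5": "y"},
    {"PK": "AB1", "FLD1": "A", "FLD2": "B", "FLD3": 1, "FLD4": "x", "FLD5": "y"},
    {"PK": "CD0", "FLD1": "C", "FLD2": "D", "FLD3": 0, "FLD4": "a", "FLD5": "b"},
    {"PK": "CD1", "FLD1": "C", "FLD2": "D", "FLD3": 1, "FLD4": "e", "FLD5": "f"},
]

# Local simulation of the facet counts Solr would return for FLD_COMBO.
counts = Counter(combo_value(d) for d in docs)
```

The three distinct keys correspond to the three rows Mark wants back; splitting each key on the separator recovers the individual values.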
Re: searching for terms containing embedded spaces
OK, there are several issues here: q= *:* AND CUSTOMER_TYPE_NM:Network Advertiser AND ACTIVE_IND:1&defType=edismax&rows=500&sort=ACCOUNT_CUSTOMER_ID asc&start=0 The *:* is doing you no good; I'd just remove it. defType=edismax probably isn't doing what you expect: you're not specifying any fields (no qf parameter). This is going to your request handler that has ' default="true" ' defined. If you're using a stock example, you're probably searching against the default search field defined in schema.xml, probably a field named "text". If you have a request handler named "edismax", you can use the qt=edismax parameter. If your request handler is named "/edismax", then use either qt=/edismax or solr/edismax?q= Attach &debugQuery=on and look at the parsed form of the query. But edismax plays nicer than dismax used to; it's probably searching against your default search field, which is probably NOT CUSTOMER_TYPE_NM. String types are completely unanalyzed, so they're case sensitive. If you want a case-insensitive version, use something like KeywordTokenizer followed by LowerCaseFilter. The admin/analysis page will help you a lot here. I think you'll get a lot of insight into this if you attach &debugQuery=on and look at the and sections (after the results list). Best Erick On Sun, Sep 11, 2011 at 2:25 PM, Mark juszczec wrote: > The field's properties are: > > field name="CUSTOMER_TYPE_NM" type="string" indexed="true" stored="true" > required="true" default="CUSTOMER_TYPE_NM_MISSING" > > There have been no changes since I last completely rebuilt the index. > > Is re-indexing done when an index is completely rebuilt with a > dataimport=full? How about if we've done dataimport=delta?
> > If it helps, this is what I get when I print out the ModifiableSolrParams > object I'm sending to the query method: > > q=+*%3A*++AND+CUSTOMER_TYPE_NM%3ANetwork+Advertiser+AND+ACTIVE_IND%3A1&defType=edismax&rows=500&sort=ACCOUNT_CUSTOMER_ID+asc&start=0 > > Mark > > On Sun, Sep 11, 2011 at 2:05 PM, Yonik Seeley > wrote: > >> On Sun, Sep 11, 2011 at 1:39 PM, Mark juszczec >> wrote: >> > That's what I thought. The problem is, it's not and I am unsure what is >> > wrong. >> >> What is the fieldType definition for that field? Did you change it >> without re-indexing? >> >> -Yonik >> http://www.lucene-eurocon.com - The Lucene/Solr User Conference >> >
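Putting Erick's points together, a request for the exact value "Network Advertiser" in the unanalyzed `string` field needs the value quoted as a phrase (otherwise `Advertiser` is searched in the default field), no `*:*` clause, and no `defType=edismax` without a `qf`. The sketch below builds such a query string for the standard Lucene query parser, with `debugQuery=on`; `build_query_string` is a hypothetical helper, and since string fields are case sensitive the value must match the indexed case exactly.

```python
from urllib.parse import urlencode

def build_query_string(customer_type, active):
    """Query string for an exact match on a string field plus a flag field."""
    # Escape backslashes and double quotes, then quote the whole value so
    # "Network Advertiser" stays together on CUSTOMER_TYPE_NM.
    phrase = customer_type.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", f'CUSTOMER_TYPE_NM:"{phrase}" AND ACTIVE_IND:{active}'),
        ("rows", 500),
        ("start", 0),
        ("sort", "ACCOUNT_CUSTOMER_ID asc"),
        ("debugQuery", "on"),  # show how Solr parsed the query
    ]
    return urlencode(params)
```

Comparing the parsed query in the debug section with what was intended is usually the fastest way to spot a clause that went to the wrong field.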
Re: solr equivalent of "select distinct"
Hmmm, there's no good way I can think of off the top of my head to do this. Whenever people find themselves thinking in terms of RDBMSs, I have to ask whether the problem is really appropriate for a search engine, and/or what the problem you're trying to solve with this approach is from a higher level. Perhaps there's another approach completely that would serve you better. Best Erick On Sun, Sep 11, 2011 at 12:39 PM, Mark juszczec wrote: > Erick > > Thanks very much for the reply. > > I typed this late Friday after work and tried to simplify the problem > description. I got something wrong. Hopefully this restatement is better: > > My PK is FLD1, FLD2 and FLD3 concatenated together. > > In some cases FLD1 and FLD2 can be the same. The ONLY differing field being > FLD3. > > Here's an example: > > PK FLD1 FLD2 FLD3 FLD4 FLD5 > AB0 A B 0 x y > AB1 A B 1 x y > CD0 C D 0 a b > CD1 C D 1 e f > > I want to write a query using only the terms FLD1 and FLD2 and ONLY get > back: > > A B x y > C D a b > C D e f > > Since FLD4 and FLD5 are the same for PK=AB0 and AB1, I only want one > occurrence of those records. > > Since FLD4 and FLD5 are different for PK=CD0 and CD1, I want BOTH > occurrences of those records. > > I'm hoping I can use wildcards to get FLD4 and FLD5. If not, I can use fl= > > I'm using edismax. > > We are also creating the query string on the fly. I suspect using SolrJ and > plugging the values into a bean would be easier - or do I have that wrong? > > I hope the tables of example data display properly. > > Mark > > On Sun, Sep 11, 2011 at 12:06 PM, Erick Erickson > wrote: > >> This smells like an XY problem, can you back up and give a higher-level >> reason *why* you want this behavior? >> >> Because given your problem description, this seems like you are getting >> correct behavior no matter how you define the problem. You're essentially >> saying that you have two records with identical beginnings of your PK, >> why is it incorrect to give you both records?
>> >> But, anyway, if you're searching on FLD1 and FLD2, then by definition >> you're going to get both records back or the search would be failing! >> >> Best >> Erick >> >> On Fri, Sep 9, 2011 at 8:08 PM, Mark juszczec >> wrote: >> > Hello everyone >> > >> > Let's say each record in my index contains fields named PK, FLD1, FLD2, >> FLD3 >> > FLD100 >> > >> > PK is my solr primary key and I'm creating it by concatenating >> > FLD1+FLD2+FLD3 and I'm guaranteed that combination will be unique >> > >> > Let's say 2 of these records have FLD1 = A and FLD2 = B. I am unsure >> about >> > the remaining fields >> > >> > Right now, if I do a query specifying FLD1 = A and FLD2 = B then I get >> both >> > records. I only want 1. >> > >> > Research says I should use faceting. But this: >> > >> > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2 & >> > facet=true & facet_field=FLD1 & facet_field=FLD2 >> > >> > gives me 2 records. >> > >> > In fact, it gives me the same results as: >> > >> > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2 >> > >> > I'm wrong somewhere, but I'm unsure where. >> > >> > Is faceting the right way to go or should I be using grouping? >> > >> > Curiously, when I use grouping like this: >> > >> > q=FLD1:A and FLD2:B &rows=500 &defType=edismax &indent=true &fl=FLD1, >> FLD2 >> > &group=true &group.field=FLD1 &group.field=FLD2 >> > >> > I get 2 records as well. >> > >> > Has anyone dealt with mimicking "select distinct" in Solr? >> > >> > Any advice would be very appreciated. >> > >> > Mark >> > >> >
Re: searching for terms containing embedded spaces
The field's properties are: field name="CUSTOMER_TYPE_NM" type="string" indexed="true" stored="true" required="true" default="CUSTOMER_TYPE_NM_MISSING" There have been no changes since I last completely rebuilt the index. Is re-indexing done when an index is completely rebuilt with a dataimport=full? How about if we've done dataimport=delta? If it helps, this is what I get when I print out the ModifiableSolrParams object I'm sending to the query method: q=+*%3A*++AND+CUSTOMER_TYPE_NM%3ANetwork+Advertiser+AND+ACTIVE_IND%3A1&defType=edismax&rows=500&sort=ACCOUNT_CUSTOMER_ID+asc&start=0 Mark On Sun, Sep 11, 2011 at 2:05 PM, Yonik Seeley wrote: > On Sun, Sep 11, 2011 at 1:39 PM, Mark juszczec > wrote: > > That's what I thought. The problem is, it's not and I am unsure what is > > wrong. > > What is the fieldType definition for that field? Did you change it > without re-indexing? > > -Yonik > http://www.lucene-eurocon.com - The Lucene/Solr User Conference >
Re: searching for terms containing embedded spaces
On Sun, Sep 11, 2011 at 1:39 PM, Mark juszczec wrote: > That's what I thought. The problem is, it's not and I am unsure what is > wrong. What is the fieldType definition for that field? Did you change it without re-indexing? -Yonik http://www.lucene-eurocon.com - The Lucene/Solr User Conference
Re: searching for terms containing embedded spaces
That's what I thought. The problem is, it's not, and I am unsure what is wrong. On Sun, Sep 11, 2011 at 1:35 PM, Yonik Seeley wrote: > On Sun, Sep 11, 2011 at 1:15 PM, Mark juszczec > wrote: > > I am looking for a text string with a single, embedded space. For the > > purposes of this example, it is "a b" and it's stored in the index in a > field > > called field. > > > > Am I incorrect in assuming the query field:"a b" will match the string a > > followed by a single embedded space followed by a b? > > Yes, that should work regardless of how the field is indexed (as a big > single token, or as a normal text field that doesn't preserve spaces). > > -Yonik > http://www.lucene-eurocon.com - The Lucene/Solr User Conference >
Re: searching for terms containing embedded spaces
On Sun, Sep 11, 2011 at 1:15 PM, Mark juszczec wrote: > I am looking for a text string with a single, embedded space. For the > purposes of this example, it is "a b" and it's stored in the index in a field > called field. > > Am I incorrect in assuming the query field:"a b" will match the string a > followed by a single embedded space followed by a b? Yes, that should work regardless of how the field is indexed (as a big single token, or as a normal text field that doesn't preserve spaces). -Yonik http://www.lucene-eurocon.com - The Lucene/Solr User Conference
Re: searching for terms containing embedded spaces
> > But as Erick says, it's not clear that's really what you want (to > search on a single term with a space in it). If it's a normal text > field, each word will be indexed separately, so you really want a > phrase query or a boolean query: > > field:"a b" > or > field:(a b) > > I am looking for a text string with a single, embedded space. For the purposes of this example, it is "a b" and it's stored in the index in a field called field. Am I incorrect in assuming the query field:"a b" will match the string a followed by a single embedded space followed by a b? I'm also wondering if this is already handled by the Solr/SolrJ API and if we are making our lives more difficult by assembling the query strings ourselves. Mark > -Yonik > http://www.lucene-eurocon.com - The Lucene/Solr User Conference >
Re: searching for terms containing embedded spaces
On Sun, Sep 11, 2011 at 12:56 PM, Mark juszczec wrote: > We've also tried making it create > > field:a\ b > > The first case just does not work and I'm unsure why. > > The second case ends up url encoding the \ and I'm unsure if that will cause > it to be used in the query or not. URL encoding is just part of the transfer syntax for an HTTP GET/POST - by the time the query makes it to the lucene/solr query parser, that escaping will have been removed. You can also use http://lucene.apache.org/solr/api/org/apache/solr/search/TermQParserPlugin.html and not worry about any escaping. But as Erick says, it's not clear that's really what you want (to search on a single term with a space in it). If it's a normal text field, each word will be indexed separately, so you really want a phrase query or a boolean query: field:"a b" or field:(a b) -Yonik http://www.lucene-eurocon.com - The Lucene/Solr User Conference
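Yonik's point that URL encoding is only transfer syntax can be checked directly: encoding a query containing `\` and a space, then decoding it the way a server does, gives back the identical string. The same holds for the `{!term f=field}` form of the TermQParserPlugin, which takes the rest of the value as a single term with no escaping. A minimal Python check:

```python
from urllib.parse import urlencode, parse_qs

escaped = r"field:a\ b"           # backslash keeps the space inside the term
term_form = "{!term f=field}a b"  # TermQParserPlugin: the rest is one term

for query in (escaped, term_form):
    wire = urlencode({"q": query})    # what travels in the HTTP request
    decoded = parse_qs(wire)["q"][0]  # what the server hands to the parser
    print(wire, "->", decoded)
    assert decoded == query
```

The `%5C` seen on the wire for the backslash is therefore harmless; the query parser still receives `field:a\ b`.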
Re: searching for terms containing embedded spaces
Erick My field contains "a b" (without the quotes). We are trying to assemble the query as a String by appending the various values. I think that is a large part of the problem and our lives would be easier if we let the Solr API do this work. We've experimented with our "query assembler" producing field:a+b We've also tried making it create field:a\ b The first case just does not work and I'm unsure why. The second case ends up URL-encoding the \ and I'm unsure if that will cause it to be used in the query or not. Mark On Sun, Sep 11, 2011 at 12:10 PM, Erick Erickson wrote: > Try escaping it for a start. > > But why do you want to? If it's a phrase query, enclose it in double > quotes. > You really have to provide more details, because there are too many > possibilities > to answer. For instance: > > If you're entering field:a b then 'b' will be searched against your > default text field > and you should enter field:(a b) or field:a field:b > > If you've tokenized the field, you shouldn't care. > > If you're using KeywordAnalyzer, escaping should work. > > Etc. > > > Best > Erick > > On Fri, Sep 9, 2011 at 8:11 PM, Mark juszczec > wrote: > > Hi folks > > > > I've got a field that contains 2 words separated by a single blank. > > > > What's the trick to creating a search string that contains the single > blank? > > > > Mark > > >
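One plausible reason the `field:a+b` attempt fails: if that text is pasted into a URL unencoded, the server decodes `+` as a space, so the parser sees `field:a b` and searches `b` against the default field; if a library encodes it instead, `+` reaches the query parser as a literal character, so the query no longer describes `a b` either. A small Python illustration (the parameter strings are examples only):

```python
from urllib.parse import parse_qs, urlencode

# Hand-assembled and pasted into a URL without encoding: "+" means space.
pasted = parse_qs("q=field:a+b")["q"][0]

# Encoded by a library: "+" becomes %2B and arrives as a literal "+".
encoded = urlencode({"q": "field:a+b"})
literal = parse_qs(encoded)["q"][0]

print(repr(pasted))   # 'field:a b'
print(encoded)        # q=field%3Aa%2Bb
print(repr(literal))  # 'field:a+b'
```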
Re: solr equivalent of "select distinct"
Erick Thanks very much for the reply. I typed this late Friday after work and tried to simplify the problem description. I got something wrong. Hopefully this restatement is better: My PK is FLD1, FLD2 and FLD3 concatenated together. In some cases FLD1 and FLD2 can be the same. The ONLY differing field being FLD3. Here's an example:
PK   FLD1 FLD2 FLD3 FLD4 FLD5
AB0  A    B    0    x    y
AB1  A    B    1    x    y
CD0  C    D    0    a    b
CD1  C    D    1    e    f
I want to write a query using only the terms FLD1 and FLD2 and ONLY get back:
A B x y
C D a b
C D e f
Since FLD4 and FLD5 are the same for PK=AB0 and AB1, I only want one occurrence of those records. Since FLD4 and FLD5 are different for PK=CD0 and CD1, I want BOTH occurrences of those records. I'm hoping I can use wildcards to get FLD4 and FLD5. If not, I can use fl= I'm using edismax. We are also creating the query string on the fly. I suspect using SolrJ and plugging the values into a bean would be easier - or do I have that wrong? I hope the tables of example data display properly. Mark On Sun, Sep 11, 2011 at 12:06 PM, Erick Erickson wrote: > This smells like an XY problem, can you back up and give a higher-level > reason *why* you want this behavior? > > Because given your problem description, this seems like you are getting > correct behavior no matter how you define the problem. You're essentially > saying that you have two records with identical beginnings of your PK, > why is it incorrect to give you both records? > > But, anyway, if you're searching on FLD1 and FLD2, then by definition > you're going to get both records back or the search would be failing! > > Best > Erick > > On Fri, Sep 9, 2011 at 8:08 PM, Mark juszczec > wrote: > > Hello everyone > > > > Let's say each record in my index contains fields named PK, FLD1, FLD2, > FLD3 > > FLD100 > > > > PK is my solr primary key and I'm creating it by concatenating > > FLD1+FLD2+FLD3 and I'm guaranteed that combination will be unique > > > > Let's say 2 of these records have FLD1 = A and FLD2 = B.
I am unsure > about > > the remaining fields > > > > Right now, if I do a query specifying FLD1 = A and FLD2 = B then I get > both > > records. I only want 1. > > > > Research says I should use faceting. But this: > > > > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2 & > > facet=true & facet_field=FLD1 & facet_field=FLD2 > > > > gives me 2 records. > > > > In fact, it gives me the same results as: > > > > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2 > > > > I'm wrong somewhere, but I'm unsure where. > > > > Is faceting the right way to go or should I be using grouping? > > > > Curiously, when I use grouping like this: > > > > q=FLD1:A and FLD2:B &rows=500 &defType=edismax &indent=true &fl=FLD1, > FLD2 > > &group=true &group.field=FLD1 &group.field=FLD2 > > > > I get 2 records as well. > > > > Has anyone dealt with mimicking "select distinct" in Solr? > > > > Any advice would be very appreciated. > > > > Mark > > >
Re: Nested documents
Does this JIRA apply? https://issues.apache.org/jira/browse/LUCENE-3171 Best Erick On Sat, Sep 10, 2011 at 8:32 PM, Andy wrote: > Hi, > > Does Solr support nested documents? If not is there any plan to add such a > feature? > > Thanks.
Re: How to write this query?
So are you still having a problem, and if so, what is it? Best Erick On Sat, Sep 10, 2011 at 5:48 AM, crisfromnova wrote: > Hi, > > key:value1^8 key:value2^4 key:value3^2 is correct. > > Sorry for the badly written query. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-write-this-query-tp3318577p3325033.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: searching for terms containing embedded spaces
Try escaping it for a start. But why do you want to? If it's a phrase query, enclose it in double quotes. You really have to provide more details, because there are too many possibilities to answer. For instance: If you're entering field:a b then 'b' will be searched against your default text field and you should enter field:(a b) or field:a field:b. If you've tokenized the field, you shouldn't care. If you're using KeywordAnalyzer, escaping should work. Etc. Best Erick On Fri, Sep 9, 2011 at 8:11 PM, Mark juszczec wrote: > Hi folks > > I've got a field that contains 2 words separated by a single blank. > > What's the trick to creating a search string that contains the single blank? > > Mark >
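If the query string is assembled by hand, a helper that backslash-escapes the query parser's special characters and whitespace keeps a value like `a b` together as one term, which is what "try escaping it" amounts to for an unanalyzed (string or KeywordAnalyzer) field. SolrJ ships `ClientUtils.escapeQueryChars` for this; the Python version below is a hypothetical port whose character set is our assumption, not a copy of SolrJ's.

```python
# Hypothetical port; the character set is an assumption modeled loosely on
# SolrJ's ClientUtils.escapeQueryChars, not a copy of it.
SPECIAL = set('\\+-!():^[]"{}~*?|&;/')

def escape_query_chars(value):
    """Backslash-escape query-parser special characters and whitespace."""
    return "".join(("\\" + c) if (c in SPECIAL or c.isspace()) else c for c in value)

def term_query(field, value):
    """Single-term query, suited to a string or KeywordAnalyzer field."""
    return f"{field}:{escape_query_chars(value)}"
```

For a tokenized text field, the phrase form `field:"a b"` or the boolean form `field:(a b)` is the better fit, since each word is indexed separately there.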
Re: solr equivalent of "select distinct"
This smells like an XY problem; can you back up and give a higher-level reason *why* you want this behavior? Because given your problem description, this seems like you are getting correct behavior no matter how you define the problem. You're essentially saying that you have two records with identical beginnings of your PK, so why is it incorrect to give you both records? But, anyway, if you're searching on FLD1 and FLD2, then by definition you're going to get both records back or the search would be failing! Best Erick On Fri, Sep 9, 2011 at 8:08 PM, Mark juszczec wrote: > Hello everyone > > Let's say each record in my index contains fields named PK, FLD1, FLD2, FLD3 > FLD100 > > PK is my solr primary key and I'm creating it by concatenating > FLD1+FLD2+FLD3 and I'm guaranteed that combination will be unique > > Let's say 2 of these records have FLD1 = A and FLD2 = B. I am unsure about > the remaining fields > > Right now, if I do a query specifying FLD1 = A and FLD2 = B then I get both > records. I only want 1. > > Research says I should use faceting. But this: > > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2 & > facet=true & facet_field=FLD1 & facet_field=FLD2 > > gives me 2 records. > > In fact, it gives me the same results as: > > q=FLD1:A and FLD2:B & rows=500 & defType=edismax & fl=FLD1, FLD2 > > I'm wrong somewhere, but I'm unsure where. > > Is faceting the right way to go or should I be using grouping? > > Curiously, when I use grouping like this: > > q=FLD1:A and FLD2:B &rows=500 &defType=edismax &indent=true &fl=FLD1, FLD2 > &group=true &group.field=FLD1 &group.field=FLD2 > > I get 2 records as well. > > Has anyone dealt with mimicking "select distinct" in Solr? > > Any advice would be very appreciated. > > Mark >
Re: Running solr on small amounts of RAM
Well, this answer isn't much more satisfactory than "get more memory", but about all I can say is "try it and see". Sure, make your caches very small and monitor memory and test it out. You'll get a sense of how fast (or slow) the queries are pretty quickly. Or you can get a ballpark estimate of what running without caches would do performance-wise by simply measuring the first query after a restart. Because, unfortunately, "it depends" is the only accurate answer. It depends on how much sorting, faceting etc. you do as well as the queries themselves. Best Erick On Fri, Sep 9, 2011 at 12:48 PM, Mike Austin wrote: > I'm trying to push to get solr used in our environment. I know I could have > responses saying WHY can't you get more RAM etc.., but let's just skip those > and work with this situation. > > Our index is very small with 100k documents and a light load at the moment. > If I wanted to use the smallest possible RAM on the server, how would I do > this and what are the issues? > > I know that caching would be the biggest loss but if solr ran with no or > little caching, would the performance still be ok? I know this is a relative > question.. > This is the only application using java on this machine, would tuning java > to use less cache help anything? > I should set the cache settings low in the config? > Basically, what will having a very low cache hit rate do to search speed and > server performance? I know more is better and it depends on what I'm > comparing it to but if you could just answer in some way saying that it's > not going to cripple the machine or cause 5 second searches? > > It's on a windows server. > > > Thanks, > Mike >
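Erick's suggestion to compare the first query after a restart with later ones can be scripted. The sketch below times an arbitrary callable; `cold_vs_warm` is a hypothetical helper, and in practice `run_query` would issue an HTTP request to your Solr instance (for example with `urllib.request.urlopen`) against a URL of your choosing.

```python
import time

def cold_vs_warm(run_query, repeats=5):
    """Return (first_call_seconds, mean_of_later_calls_seconds).

    Call this right after a Solr restart so the first call runs with
    empty caches; later calls show the effect of warmed caches.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats + 1):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings[0], sum(timings[1:]) / repeats
```

Running it once with the normal cache settings and once with very small caches gives a rough, workload-specific answer to the "will it cripple the machine" question.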
Solr messing up the UK GBP (pound) symbol in response, even though Java environment variable file.encoding is set to UTF-8...
Any idea why Solr is unable to return the pound sign as-is? I tried typing in £ 1 million in the Solr admin GUI and got the following response. 0 5 on 0 £ 1 million 10 2.2 Here are my Java properties, also taken from the admin interface: java.runtime.name = Java(TM) SE Runtime Environment sun.boot.library.path = /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64 java.vm.version = 20.1-b02 solr.data.dir = target/solr_data shared.loader = java.vm.vendor = Sun Microsystems Inc. java.vendor.url = http://java.sun.com/ path.separator = : java.vm.name = Java HotSpot(TM) 64-Bit Server VM tomcat.util.buf.StringCache.byte.enabled = true file.encoding.pkg = sun.io user.country = GB sun.java.launcher = SUN_STANDARD sun.os.patch.level = unknown java.vm.specification.name = Java Virtual Machine Specification user.dir = /home/rbhagdev/SCCRepos/SCC_Platform/search/solr java.runtime.version = 1.6.0_26-b03 java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment java.endorsed.dirs = /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/endorsed os.arch = amd64 java.io.tmpdir = /tmp line.separator = java.vm.specification.vendor = Sun Microsystems Inc.
java.naming.factory.url.pkgs = org.apache.naming os.name = Linux classworlds.conf = /usr/share/maven2/bin/m2.conf sun.jnu.encoding = UTF-8 java.library.path = /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64/server:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib java.specification.name = Java Platform API Specification java.class.version = 50.0 sun.management.compiler = HotSpot 64-Bit Tiered Compilers os.version = 2.6.38-11-generic user.home = /home/rbhagdev user.timezone = Europe/London catalina.useNaming = true java.awt.printerjob = sun.print.PSPrinterJob java.specification.version = 1.6 file.encoding = UTF-8 solr.solr.home = src/test/resources/solr_home catalina.home = /home/rbhagdev/SCCRepos/SCC_Platform/search/solr/target/tomcat user.name = rbhagdev java.class.path = /usr/share/maven2/boot/classworlds.jar java.naming.factory.initial = org.apache.naming.java.javaURLContextFactory package.definition = sun.,java.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper. java.vm.specification.version = 1.0 sun.arch.data.model = 64 java.home = /usr/lib/jvm/java-6-sun-1.6.0.26/jre sun.java.command = org.codehaus.classworlds.Launcher "tomcat:run-war" java.specification.vendor = Sun Microsystems Inc.
user.language = enjava.vm.info = mixed mode java.version = 1.6.0_26 java.ext.dirs = /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/ext:/usr/java/packages/lib/ext securerandom.source = file:/dev/./urandom sun.boot.class.path = /usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/resources.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/rt.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/jsse.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/jce.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/charsets.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/modules/jdk.boot.jar:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/classes java.vendor = Sun Microsystems Inc. server.loader = maven.home = /usr/share/maven2 catalina.base = /home/rbhagdev/SCCRepos/SCC_Platform/search/solr/target/tomcat file.separator = / java.vendor.url.bug = http://java.sun.com/cgi-bin/bugreport.cgi common.loader = ${catalina.home}/lib,${catalina.home}/lib/*.jar sun.cpu.endian = little sun.io.unicode.encoding = UnicodeLittle package.access = sun.,org.apache.catalina.,org.apache.coyote.,org.apache.tomcat.,org.apache.jasper.,sun.beans. sun.desktop = gnome sun.cpu.isalist = Thanks, Ravish
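The thread leaves the cause open, but one common source of mangled non-ASCII query characters is a charset mismatch between the client that percent-encodes the query and the servlet container that decodes it; older Tomcat releases, for example, decode request URIs as ISO-8859-1 unless the Connector's URIEncoding attribute is set. The plain-Java sketch below illustrates that mechanism for the pound sign; it is not a diagnosis of this particular setup. It uses the Charset overloads of URLEncoder and URLDecoder added in Java 10, so it will not compile on the Java 6 runtime shown above.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PoundSignEncoding {

    // Percent-encode a query as UTF-8, as a browser on a UTF-8 page would.
    static String encodeUtf8(String query) {
        return URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Decode percent-encoded bytes with a given charset, simulating a servlet
    // container whose URI decoding uses that charset.
    static String decodeAs(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String query = "\u00A3 1 million"; // pound sign followed by " 1 million"
        String encoded = encodeUtf8(query);
        System.out.println("Encoded: " + encoded); // Encoded: %C2%A3+1+million

        // Matching charsets: the query round-trips unchanged.
        System.out.println(decodeAs(encoded, StandardCharsets.UTF_8).equals(query)); // true

        // Mismatched charsets: the two UTF-8 bytes 0xC2 0xA3 become two
        // characters, U+00C2 (A with circumflex) and U+00A3 (the pound sign).
        String garbled = decodeAs(encoded, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("\u00C2\u00A3 1 million")); // true
    }
}
```

If the query string Solr echoes back shows an extra character in front of the pound sign, that pattern points toward this kind of decode mismatch rather than toward Solr's analysis chain.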
Re: NRT and commit behavior
Hmm, OK. You might want to look at the non-cached filter query stuff; it's quite recent. The point here is that it is a filter that is applied only after all of the less expensive filter queries are run. One of its uses is exactly ACL calculations. Rather than calculate the ACL for the entire doc set, it only calculates access for docs that have made it past all the other elements of the query. See SOLR-2429, and note that it is 3.4-only (3.4 is currently being released). As to why your commits are taking so long, I have no idea, given that you really haven't given us much to work with. How big is your index? Are you optimizing? Have you profiled the application to see what the bottleneck is (I/O, CPU, etc.)? What else is running on your machine? It's quite surprising that it takes that long. How much memory are you giving the JVM? etc... You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee wrote: > Erick, > What you said is correct; for us the searches are based on some Active > Directory permissions which are populated in the filter query parameter. So we > don't have any warming query concept, as we cannot fire one for every user ahead > of time. > > What we do here is that when a user logs in we do an invalid query (which returns > no results, instead of '*') with the correct filter query (which is his > permissions based on the login). This way the cache gets warmed up with valid > docs. > > It works then. > > > Also, can you please let me know why a commit is taking 45 mins to 1 hour on > well-resourced hardware with multiple processors, 16 GB RAM, a 64-bit VM, etc.? > We tried passing waitSearcher as false and found that inside the code it is hard-coded > to be true. Is there any specific reason? Can we change that value to > honor what is being passed?
> > Thanks, > Tirthankar > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Thursday, September 01, 2011 8:38 AM > To: solr-user@lucene.apache.org > Subject: Re: NRT and commit behavior > > Hmm, I'm guessing a bit here, but using an invalid query doesn't sound very > safe, though I suppose it *might* be OK. > > What does "invalid" mean? Syntax error? Not safe. > > A search that returns 0 results? I don't know, but I'd guess that filling your > caches, which is the point of warming queries, might be short-circuited if > the query returns 0 results, but I don't know for sure. > > But the fact that "invalid queries return quicker" does not inspire > confidence, since the *point* of warming queries is to spend the time up front > so your users don't have to wait. > > So here's a test. Comment out your warming queries. > Restart your server and fire the warming query from the browser > with &debugQuery=on and look at the QTime parameter. > > Now fire the same form of the query (as in the same sort, facet, grouping, > etc., but presumably a valid term). See the QTime. > > Now fire the same form of the query with a *different* value in the query. > That is, it should search on different terms but with the same sort, facet, > etc., to avoid getting your data straight from the queryResultCache. > > My guess is that the last query will return much more quickly than the second > query, which would indicate that the first form isn't doing you any good. > > But a test is worth a thousand opinions. > > Best > Erick > > On Wed, Aug 31, 2011 at 11:04 AM, Tirthankar Chatterjee > wrote: >> We also noticed that the "waitSearcher" parameter value is not honored inside >> commit. It always defaults to true, which makes it slow during indexing. >> >> What we are trying to do is use an invalid query (which won't return any >> results) as a warming query. This way the commit returns faster. Are we >> doing something wrong here?
>> >> Thanks, >> Tirthankar >> >> -Original Message- >> From: Jonathan Rochkind [mailto:rochk...@jhu.edu] >> Sent: Monday, July 18, 2011 11:38 AM >> To: solr-user@lucene.apache.org; yo...@lucidimagination.com >> Subject: Re: NRT and commit behavior >> >> In practice, in my experience at least, a very 'expensive' commit can >> still slow down searches significantly, I think just due to CPU (or >> i/o?) starvation. Not sure anything can be done about that. That's my >> experience in Solr 1.4.1, but since searches have always been async with >> commits, it probably is the same situation even in more recent versions, I'd >> guess. >> >> On 7/18/2011 11:07 AM, Yonik Seeley wrote: >>> On Mon, Jul 18, 2011 at 10:53 AM, Nicholas Chase >>> wrote: Very glad to hear that NRT is finally here! But my question is this: will things still come to a standstill during a commit? >>> New updates can now proceed in parallel with a commit, and searches >>> have always been completely asynchronous w.r.t. commits. >>> >>> -Yonik >>> http://www.lucidimagination.com
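Erick's description of SOLR-2429 (an expensive, non-cached filter that runs only after the cheaper filter queries) can be illustrated without Solr at all. The following plain-Java sketch is not Solr's post-filter API. It assumes an in-memory list of document ids, one cheap filter, and a hypothetical `userMayRead` ACL check, and it orders filters by cost so the ACL check only runs on documents the cheap filter lets through.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

public class CostOrderedFilters {

    // A filter paired with a relative cost; cheaper filters are applied first.
    static final class CostedFilter {
        final int cost;
        final IntPredicate accepts;

        CostedFilter(int cost, IntPredicate accepts) {
            this.cost = cost;
            this.accepts = accepts;
        }
    }

    // Applies filters in ascending cost order, so each filter is evaluated
    // only on documents that every cheaper filter has already accepted.
    static List<Integer> apply(List<Integer> docs, List<CostedFilter> filters) {
        List<CostedFilter> ordered = new ArrayList<>(filters);
        ordered.sort(Comparator.comparingInt(f -> f.cost));
        List<Integer> survivors = new ArrayList<>(docs);
        for (CostedFilter f : ordered) {
            List<Integer> next = new ArrayList<>();
            for (int doc : survivors) {
                if (f.accepts.test(doc)) {
                    next.add(doc);
                }
            }
            survivors = next;
        }
        return survivors;
    }

    // Number of times the hypothetical expensive ACL check has run.
    static int aclChecks = 0;

    // Hypothetical stand-in for a costly per-document permission lookup.
    static boolean userMayRead(int docId) {
        aclChecks++;
        return docId % 3 != 0;
    }

    // 100 documents; a cheap filter keeps ids 0..9, the ACL check runs last.
    static List<Integer> runDemo() {
        aclChecks = 0;
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add(i);
        }
        List<CostedFilter> filters = new ArrayList<>();
        filters.add(new CostedFilter(100, CostOrderedFilters::userMayRead)); // expensive
        filters.add(new CostedFilter(1, doc -> doc < 10));                   // cheap
        return apply(docs, filters);
    }

    public static void main(String[] args) {
        List<Integer> result = runDemo();
        // Prints: [1, 2, 4, 5, 7, 8] after 10 ACL checks
        System.out.println(result + " after " + aclChecks + " ACL checks");
    }
}
```

Even though the ACL filter is listed first, only the 10 documents that pass the cheap filter reach it; applied in the listed order, it would have run 100 times.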
Re: Using multivalued field in map function
Hmmm, would it be simpler to append a clause like this: BloggerId:12304^10 OR CoBloggerId:12304^5? Best Erick On Fri, Sep 9, 2011 at 2:14 AM, tkamphuis wrote: > Well, I'd like to do the following: > > I've got a website full of blog posts, and every blog post has an owner; this > owner is referred to through his/her id. For example: BloggerId = 123. > It's also possible that the blog has multiple co-writers, which are also > referred to by their BloggerId, but these ids are stored in the multivalued > field, in my previous example SubIds. > > When searching for a specific blogger, one searches the BloggerId. > Search results are influenced by a number of variables: the > country/state/more specific geographical data, the blog category, etc. For this > I use a faceted query. Next I want to make some results more important, > depending on the BloggerId. I tried to do this with the following query: > > ?q={!func}map(sum(map(BloggerId,12304,12304,2,0),map(BloggerId,12304,12304,1,0)),3,3,2)&fl=*,score&facet.field=Country&f.Country.facet.limit=6&facet.field=State&fq=(BlogCategory:internet%20OR%20BlogCategory:sports&sort=score%20desc,Top%20desc,%20SortPriority%20asc&start=0&omitHeader=true > > In the resulting list, blogs written by BloggerId 12304 should be at the top of > the list, followed by the blogs where BloggerId 12304 was a co-writer. After > that come all other blogs that match the criteria but aren't written (or > co-written) by BloggerId 12304. > > Any ideas? Thanks! > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Using-multivalued-field-in-map-function-tp3318843p3322023.html > Sent from the Solr - User mailing list archive at Nabble.com.
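Erick's clause gives a larger boost when the blogger owns a post than when they only co-wrote it. Lucene scores are not a simple sum of boosts, so the sketch below models only the intended ranking, not Solr's scoring. It is plain Java: the `Blog` class is hypothetical, `coBloggerIds` follows Erick's `CoBloggerId` naming (the original post calls the multivalued field `SubIds`), and the weights 10 and 5 come from his example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BloggerBoost {

    // Hypothetical, minimal stand-in for an indexed blog post.
    static final class Blog {
        final String title;
        final int bloggerId;              // single-valued owner field
        final List<Integer> coBloggerIds; // multivalued co-writer field

        Blog(String title, int bloggerId, Integer... coBloggerIds) {
            this.title = title;
            this.bloggerId = bloggerId;
            this.coBloggerIds = Arrays.asList(coBloggerIds);
        }
    }

    // Mirrors the intent of "BloggerId:X^10 OR CoBloggerId:X^5":
    // owners get the larger boost, co-writers the smaller one, others none.
    static int boost(Blog b, int id) {
        if (b.bloggerId == id) return 10;
        if (b.coBloggerIds.contains(id)) return 5;
        return 0;
    }

    // Highest boost first; List.sort is stable, so ties keep their input order.
    static List<String> rank(List<Blog> blogs, int id) {
        List<Blog> sorted = new ArrayList<>(blogs);
        sorted.sort(Comparator.comparingInt((Blog b) -> boost(b, id)).reversed());
        List<String> titles = new ArrayList<>();
        for (Blog b : sorted) {
            titles.add(b.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Blog> blogs = Arrays.asList(
                new Blog("other", 555),
                new Blog("cowritten", 777, 12304),
                new Blog("owned", 12304, 888));
        System.out.println(rank(blogs, 12304)); // [owned, cowritten, other]
    }
}
```

In this model, posts with equal boosts keep their incoming order; in Solr that tiebreak would come from the secondary sort fields in the original query (Top and SortPriority).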
Re: indexing data from rich documents - Tika with solr3.1
Oh, that works for me. Thanks very much, Erik Hatcher-4. I have managed to index from https. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3326971.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Full-search index for the database
You should create separate fields in your Solr schema for each field in your database that you want recognized separately. You can use a query parser like edismax to do a weighted query across all of your fields and then provide highlighting on the specific field which matched. 2011/9/10 Eugeny Balakhonov : > I want to create a full-text search for my database. > > It means that the search engine should look up a string in all fields of my > database. > > I have created a Solr configuration for extracting and indexing data from a > database. > > According to the documentation, in the file schema.xml I have created a field for the > full-text search index: > > multiValued="true"/> > > I have also added lines for copying all values of all fields into this > full-text search field: > > ... > > ... > > As a result, I can search all fields in my database. But I > can't tell which field in the found record contains the requested string. > > The highlighting functionality just marks the string in the "TEXT" field, like the > following: > > Any text any text Test" > > Any text any text Test" > > How can I create a full-text search index that lets me recognize the source > database field? > > Thx a lot. > > Eugeny
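The reply above suggests one Solr field per database column, an edismax query weighted across those fields, and highlighting on each of them. The sketch below only builds such a request URL in plain Java; it sends nothing. The `defType`, `qf`, `hl`, and `hl.fl` parameters are standard Solr request parameters, while the base URL and the field names `title` and `description` (with their boosts) are hypothetical stand-ins for Eugeny's columns. It uses the Java 10+ `URLEncoder.encode(String, Charset)` overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PerFieldSearchUrl {

    // Builds a select URL that searches each separately indexed field with
    // edismax (qf lists the fields and their boosts) and requests highlighting
    // on the same fields (hl.fl), so highlighted snippets are keyed by field.
    static String buildUrl(String baseUrl, String userQuery, Map<String, Integer> fieldBoosts) {
        StringBuilder qf = new StringBuilder();
        StringBuilder hlFl = new StringBuilder();
        for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
            if (qf.length() > 0) {
                qf.append(' ');
                hlFl.append(',');
            }
            qf.append(e.getKey()).append('^').append(e.getValue());
            hlFl.append(e.getKey());
        }
        return baseUrl
                + "?q=" + encode(userQuery)
                + "&defType=edismax"
                + "&qf=" + encode(qf.toString())
                + "&hl=true"
                + "&hl.fl=" + encode(hlFl.toString());
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Example with hypothetical field names standing in for database columns.
    static String demoUrl() {
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("title", 2);
        boosts.put("description", 1);
        return buildUrl("http://localhost:8983/solr/select", "Test", boosts);
    }

    public static void main(String[] args) {
        // http://localhost:8983/solr/select?q=Test&defType=edismax&qf=title%5E2+description%5E1&hl=true&hl.fl=title%2Cdescription
        System.out.println(demoUrl());
    }
}
```

Because each column is its own field, the highlighting section of the response names the field that contained the match, instead of everything collapsing into a single catch-all "TEXT" field.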