Using Solr to build a product matcher, with learning to rank
Hello, I'm considering using Solr with learning to rank to build a product matcher. For example, it should match the titles: - Apple iPhone 6 16 Gb, - iPhone 6 16 Gb, - Smartphone IPhone 6 16 Gb, - iPhone 6 black 16 Gb, to the same internal reference, a unique identifier. With Solr, each document would then have a field for the product title and one for its class, which is the unique identifier of the product. Solr would then be used to perform matching as follows. 1. A search is performed with a given product title. 2. The first three results are considered (this requires an initial product title database). 3. The most frequent identifier is returned. This method corresponds roughly to a k-Nearest Neighbor approach with the cosine metric, k = 3, and a TF-IDF model. I've done some preliminary tests with scikit-learn and the results are good, but not as good as those of more sophisticated learning algorithms. Then I noticed that learning to rank exists for Solr. First, do you think that such a use of Solr makes sense? Second, is there a relatively simple way to build a learning model using a sparse representation of the query TF-IDF vector? Kind regards, Xavier Schepler
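The 3-NN / TF-IDF / cosine baseline described above can be sketched outside Solr in a few lines of plain Python (a simplified sketch: whitespace tokenization and a naive IDF, not Solr's analysis chain or exact TF-IDF formula; the titles and labels are made-up examples):

```python
import math
from collections import Counter

def tokenize(title):
    # Lowercase whitespace tokenization; Solr's analysis chain is richer.
    return title.lower().split()

def tfidf_vectors(titles):
    # Document frequency per token across all titles.
    df = Counter()
    for t in titles:
        df.update(set(tokenize(t)))
    n = len(titles)
    vecs = []
    for t in titles:
        tf = Counter(tokenize(t))
        # Naive TF * IDF weighting.
        vecs.append({tok: c * math.log(1 + n / df[tok]) for tok, c in tf.items()})
    return vecs

def cosine(a, b):
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match(query, titles, labels, k=3):
    # Rank labeled titles by cosine similarity to the query title,
    # then return the most frequent label among the top k -- step 3 above.
    vecs = tfidf_vectors(titles + [query])
    qvec = vecs[-1]
    ranked = sorted(range(len(titles)), key=lambda i: cosine(qvec, vecs[i]), reverse=True)
    top = [labels[i] for i in ranked[:k]]
    return Counter(top).most_common(1)[0][0]

titles = ["Apple iPhone 6 16 Gb", "iPhone 6 16 Gb black", "Galaxy S5 16 Gb", "Samsung Galaxy S5"]
labels = ["P1", "P1", "P2", "P2"]
print(match("Smartphone iPhone 6 16 Gb", titles, labels))
```

In the Solr variant, `match` is replaced by a search on the title field, with the top 3 results voting on the identifier field.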
Re: choosing placement upon RESTORE
thanks Mikhail, that sounds like it would help me as it allows you to set createNodeSet on RESTORE calls On Tue, May 2, 2017 at 2:50 PM, Mikhail Khludnev <m...@apache.org> wrote: > This sounds relevant, but different to https://issues.apache.org/ > jira/browse/SOLR-9527 > You may want to follow this ticket. > > On Mon, May 1, 2017 at 9:15 PM, xavier jmlucjav <jmluc...@gmail.com> > wrote: > >> hi, >> >> I am facing this situation: >> - I have a 3 node Solr 6.1 with some 1 shard, 1 node collections (it's >> just >> for dev work) >> - the collections where created with: >>action=CREATE&...=EMPTY" >> then >> action=ADDREPLICA&...=$NODEA=$DATADIR" >> - I have taken a BACKUP of the collections >> - Solr is upgraded to 6.5.1 >> >> Now, I started using RESTORE to restore the collections on the node A >> (where they lived before), but, instead of all being created in node A, >> collections have been created in A, then B, then C nodes. Well, Solrcloud >> tried to, as 2nd and 3rd RESTOREs failed, as the backup was in node A's >> disk, not reachable from nodes B and C. >> >> How is this supposed to work? I am looking at Rule Based Placement but it >> seems it is only available for CREATESHARD, so I can use it in RESTORE? >> Isn't there a way to force Solrcloud to create the collection in a given >> node? >> >> thanks! >> > > > > -- > Sincerely yours > Mikhail Khludnev >
choosing placement upon RESTORE
hi, I am facing this situation: - I have a 3 node Solr 6.1 with some 1 shard, 1 node collections (it's just for dev work) - the collections were created with: action=CREATE&...=EMPTY" then action=ADDREPLICA&...=$NODEA=$DATADIR" - I have taken a BACKUP of the collections - Solr is upgraded to 6.5.1 Now, I started using RESTORE to restore the collections on node A (where they lived before), but, instead of all being created in node A, collections have been created in the A, B, and C nodes. Well, SolrCloud tried to, as the 2nd and 3rd RESTOREs failed, since the backup was on node A's disk, not reachable from nodes B and C. How is this supposed to work? I am looking at Rule Based Placement but it seems it is only available for CREATESHARD, so can I use it in RESTORE? Isn't there a way to force SolrCloud to create the collection in a given node? thanks!
DIH: last_index_time not updated on if 0 docs updated
Hi, After making our interval for calling delta indexing shorter and shorter, I have found that last_index_time in dataimport.properties is not updated every time the indexing runs; it is skipped if no docs were added. This happens at least in the following scenario: - running delta as full index ( /dataimport?command=full-import=false=true ) - SolrCloud setup, so dataimport.properties is in zookeeper - Solr 5.5.0 I understand skipping the commit on the index if no docs were updated is a nice optimization, but I believe the last_index_time info should be updated in all cases, so it reflects reality. We, for instance, look at this piece of information in order to do other things. I could not find any mention of this in Jira, so I wonder whether this is intended or just nobody has had an issue with it? xavier
Re: procedure to restart solrcloud, and config/collection consistency
hi Shawn, as I replied to Markus, of course I know (and use) the collections API to reload the config. I am asking what would happen in that scenario: - config updated (but collection not reloaded) - I restart one node now one node has the new config and the rest the old one?? To which he already replied: >The restarted/reloaded node has the new config, the others have the old config until reloaded/restarted. I was not asking about making solr restart itself, my English must be worse than I thought. By the way, stuff like that can be achieved with http://yajsw.sourceforge.net/ a very powerful Java wrapper; I used to use it when Solr did not have a built-in daemon setup. It was built by someone who was using JSW, and got pissed when that one went commercial. It is very configurable, but of course more complex. I wrote something about it some time ago https://medium.com/@jmlucjav/how-to-install-solr-as-a-service-in-any-platform-including-solr-5-8e4a93cc3909 thanks On Thu, Feb 9, 2017 at 4:53 PM, Shawn Heisey <apa...@elyograg.org> wrote: > On 2/9/2017 5:24 AM, xavier jmlucjav wrote: > > I always wondered, if this was not really needed, and I could just call > > 'restart' in every node, in a quick loop, and forget about it. Does > anyone > > know if this is the case? > > > > My doubt is in regards to changing some config, and then doing the above > > (just restart nodes in a loop). For example, what if I change a config G > > used in collection C, and I restart just one of the nodes (N1), leaving > the rest alone. If all the nodes contain a shard for C, what happens, N1 is > using the new config and the rest are not? how is this handled? > > If you want to change the config or schema for a collection and make it > active across all nodes, just use the collections API to RELOAD the > collection. The change will be picked up everywhere. > > https://cwiki.apache.org/confluence/display/solr/Collections+API > > To answer your question: No. 
Solr does not have the ability to restart > itself. It would require significant development effort and a > fundamental change in how Solr is started to make it possible. It is > something that has been discussed, but at this time it is not possible. > > One idea that would make this possible is mentioned on the following > wiki page. It talks about turning Solr into two applications instead of > one: > > https://wiki.apache.org/solr/WhyNoWar#Information_that.27s_ > not_version_specific > > Again -- it would not be easy, which is why it hasn't been done yet. > > Thanks, > Shawn > >
Re: procedure to restart solrcloud, and config/collection consistency
Hi Markus, yes, of course I know (and use) the collections API to reload the config. I am asking what would happen in that scenario: - config updated (but collection not reloaded) - I restart one node now one node has the new config and the rest the old one?? Regarding restarting many hosts, my question is whether we can just 'restart' each Solr and that is enough, or whether it is better to first stop all, and then start all. thanks On Thu, Feb 9, 2017 at 1:28 PM, Markus Jelsma <markus.jel...@openindex.io> wrote: > Hello - if you just want to use updated configuration, you can use Solr's > collection reload API call. For restarting we rely on remote provisioning > tools such as Salt, other managing tools can probably execute commands > remotely as well. > > If you operate more than just a very few machines, i'd really recommend > using these tools. > > Markus > > > > -Original message- > > From:xavier jmlucjav <jmluc...@gmail.com> > > Sent: Thursday 9th February 2017 13:24 > > To: solr-user <solr-user@lucene.apache.org> > > Subject: procedure to restart solrcloud, and config/collection > consistency > > > > Hi, > > > > When I need to restart a Solrcloud cluster, I always do this: > > - log in into host nb1, stop solr > > - log in into host nb2, stop solr > > -... > > - log in into host nbX, stop solr > > - verify all hosts did stop > > - in host nb1, start solr > > - in host nb12, start solr > > -... > > > > I always wondered, if this was not really needed, and I could just call > > 'restart' in every node, in a quick loop, and forget about it. Does > anyone > > know if this is the case? > > > > My doubt is in regards to changing some config, and then doing the above > > (just restart nodes in a loop). For example, what if I change a config G > > used in collection C, and I restart just one of the nodes (N1), leaving > the > > rest alone. If all the nodes contain a shard for C, what happens, N1 is > > using the new config and the rest are not? how is this handled? 
> > > > thanks > > xavier > > >
procedure to restart solrcloud, and config/collection consistency
Hi, When I need to restart a SolrCloud cluster, I always do this: - log in to host nb1, stop solr - log in to host nb2, stop solr -... - log in to host nbX, stop solr - verify all hosts did stop - in host nb1, start solr - in host nb2, start solr -... I always wondered whether this is really needed, or whether I could just call 'restart' on every node, in a quick loop, and forget about it. Does anyone know if this is the case? My doubt is in regards to changing some config and then doing the above (just restarting nodes in a loop). For example, what if I change a config G used in collection C, and I restart just one of the nodes (N1), leaving the rest alone. If all the nodes contain a shard for C, what happens, N1 is using the new config and the rest are not? how is this handled? thanks xavier
reuse a org.apache.lucene.search.Query in Solrj?
Hi, I have a Lucene Query (a BooleanQuery with a bunch of possibly complex spatial queries, even polygons etc.) that I am building for some MemoryIndex work. Now I need to add that same query to a Solr query (adding it to a bunch of other fq parameters I am using). Is there some way to piggyback the Lucene query this way? It would be extremely handy in my situation. thanks xavier
solrj: get to which shard a id will be routed
Hi Is there somewhere a sample of some SolrJ code that, given: - a collection - an id (like "IBM!12345") returns the shard to which the doc will be routed? I was hoping to get that info from CloudSolrClient itself but it is not exposing it as far as I can see. thanks xavier
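On the SolrJ side, I believe `DocRouter` (e.g. `DocRouter#getTargetSlice` with the `DocCollection` from the cluster state) can answer this directly; check the javadoc of your SolrJ version. For reference, the default compositeId routing itself is just MurmurHash3: for "IBM!12345" the top 16 bits come from hashing the route key "IBM" and the bottom 16 from "12345", and the result is matched against each shard's hash range. A self-contained sketch (assumptions from memory: seed 0, hashing over UTF-8 bytes, 16/16 bit split for a single "!"; verify against org.apache.solr.common.util.Hash before relying on it):

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    # Standard MurmurHash3 x86 32-bit over a byte string.
    c1, c2 = 0xcc9e2d51, 0x1b873593
    h = seed & 0xffffffff
    n = len(data)
    for i in range(0, n - n % 4, 4):               # 4-byte body blocks
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xffffffff
        k = ((k << 15) | (k >> 17)) & 0xffffffff   # rotl 15
        k = (k * c2) & 0xffffffff
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xffffffff   # rotl 13
        h = (h * 5 + 0xe6546b64) & 0xffffffff
    tail = data[n - n % 4:]                        # 1-3 trailing bytes
    k = 0
    if len(tail) >= 3: k ^= tail[2] << 16
    if len(tail) >= 2: k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xffffffff
        k = ((k << 15) | (k >> 17)) & 0xffffffff
        k = (k * c2) & 0xffffffff
        h ^= k
    h ^= n                                         # finalization / avalanche
    h ^= h >> 16
    h = (h * 0x85ebca6b) & 0xffffffff
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & 0xffffffff
    h ^= h >> 16
    return h

def composite_id_hash(doc_id: str) -> int:
    # compositeId: top 16 bits from the route key, bottom 16 from the rest.
    if "!" in doc_id:
        route_key, local = doc_id.split("!", 1)
        return ((murmur3_x86_32(route_key.encode()) & 0xffff0000)
                | (murmur3_x86_32(local.encode()) & 0x0000ffff))
    return murmur3_x86_32(doc_id.encode())

def shard_for(doc_id, ranges):
    # ranges: {"shard1": (lo, hi), ...} as signed 32-bit hash ranges,
    # as reported in the collection's cluster state.
    h = composite_id_hash(doc_id)
    h = h - (1 << 32) if h >= (1 << 31) else h     # reinterpret as signed
    for name, (lo, hi) in ranges.items():
        if lo <= h <= hi:
            return name
    return None
```

So all docs sharing the "IBM!" prefix land in the same shard, because they share the top 16 bits of the hash.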
Re: 'solr zk upconfig' etc not working on windows since 6.1 at least
done, with simple patch https://issues.apache.org/jira/browse/SOLR-9697 On Thu, Oct 27, 2016 at 4:21 PM, xavier jmlucjav <jmluc...@gmail.com> wrote: > sure, will do, I tried before but I could not create a Jira, now I can, > not sure what was happening. > > On Thu, Oct 27, 2016 at 3:14 PM, Shalin Shekhar Mangar < > shalinman...@gmail.com> wrote: > >> Would you mind opening a jira issue and give a patch (diff)? 6.3 is coming >> out soon and we'd have to hurry if this fix has to go in. >> >> On Thu, Oct 27, 2016 at 6:32 PM, xavier jmlucjav <jmluc...@gmail.com> >> wrote: >> >> > Correcting myself here, I was wrong about the cause (I had already >> messed >> > with the script). >> > >> > I made it work by commenting out line 1261 (the number might be a bit >> off >> > as I have modified the script, but hopefully its easy to see where): >> > >> > ) ELSE IF "%1"=="/?" ( >> > goto zk_usage >> > ) ELSE IF "%1"=="-h" ( >> > goto zk_usage >> > ) ELSE IF "%1"=="-help" ( >> > goto zk_usage >> > ) ELSE IF "!ZK_SRC!"=="" ( >> > if not "%~1"=="" ( >> > goto set_zk_src >> > ) >> > * rem goto zk_usage* >> > ) ELSE IF "!ZK_DST!"=="" ( >> > IF "%ZK_OP%"=="cp" ( >> > goto set_zk_dst >> > ) >> > IF "%ZK_OP%"=="mv" ( >> > goto set_zk_dst >> > ) >> > set ZK_DST="_" >> > ) ELSE IF NOT "%1"=="" ( >> > set ERROR_MSG="Unrecognized or misplaced zk argument %1%" >> > >> > Now upconfig works! >> > >> > thanks >> > xavier >> > >> > >> > On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav <jmluc...@gmail.com> >> > wrote: >> > >> > > hi, >> > > >> > > Am I missing something or this is broken in windows? I cannot >> upconfig, >> > > the scripts keeps exiting immediately and showing usage, as if I use >> some >> > > wrong parameters. This is on win10, jdk8. 
But I am pretty sure I saw >> it >> > > also on win7 (don't have that around anymore to try) >> > > >> > > I think the issue is: there is a SHIFT too much in line 1276 of >> solr.cmd: >> > > >> > > :set_zk_op >> > > set ZK_OP=%~1 >> > > SHIFT >> > > goto parse_zk_args >> > > >> > > if this SHIFT is removed, then parse_zk_args works (and it does the >> shift >> > > itself). But the upconfig hangs, so still it does not work. >> > > >> > > this probably was introduced in a851d5f557aefd76c01ac23da076a1 >> 4dc7576d8e >> > > by Erick (not sure which one :) ) on July 2nd. Master still has this >> > issue. >> > > Would be great if this was fixed in the incoming 6.3... >> > > >> > > My cmd scripting is not too strong and I did not go further. I >> searched >> > > Jira but found nothing. By the way is it not possible to open tickets >> in >> > > Jira anymore? >> > > >> > > xavier >> > > >> > >> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> > >
Re: 'solr zk upconfig' etc not working on windows since 6.1 at least
sure, will do, I tried before but I could not create a Jira, now I can, not sure what was happening. On Thu, Oct 27, 2016 at 3:14 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Would you mind opening a jira issue and give a patch (diff)? 6.3 is coming > out soon and we'd have to hurry if this fix has to go in. > > On Thu, Oct 27, 2016 at 6:32 PM, xavier jmlucjav <jmluc...@gmail.com> > wrote: > > > Correcting myself here, I was wrong about the cause (I had already messed > > with the script). > > > > I made it work by commenting out line 1261 (the number might be a bit off > > as I have modified the script, but hopefully its easy to see where): > > > > ) ELSE IF "%1"=="/?" ( > > goto zk_usage > > ) ELSE IF "%1"=="-h" ( > > goto zk_usage > > ) ELSE IF "%1"=="-help" ( > > goto zk_usage > > ) ELSE IF "!ZK_SRC!"=="" ( > > if not "%~1"=="" ( > > goto set_zk_src > > ) > > * rem goto zk_usage* > > ) ELSE IF "!ZK_DST!"=="" ( > > IF "%ZK_OP%"=="cp" ( > > goto set_zk_dst > > ) > > IF "%ZK_OP%"=="mv" ( > > goto set_zk_dst > > ) > > set ZK_DST="_" > > ) ELSE IF NOT "%1"=="" ( > > set ERROR_MSG="Unrecognized or misplaced zk argument %1%" > > > > Now upconfig works! > > > > thanks > > xavier > > > > > > On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav <jmluc...@gmail.com> > > wrote: > > > > > hi, > > > > > > Am I missing something or this is broken in windows? I cannot upconfig, > > > the scripts keeps exiting immediately and showing usage, as if I use > some > > > wrong parameters. This is on win10, jdk8. But I am pretty sure I saw > it > > > also on win7 (don't have that around anymore to try) > > > > > > I think the issue is: there is a SHIFT too much in line 1276 of > solr.cmd: > > > > > > :set_zk_op > > > set ZK_OP=%~1 > > > SHIFT > > > goto parse_zk_args > > > > > > if this SHIFT is removed, then parse_zk_args works (and it does the > shift > > > itself). But the upconfig hangs, so still it does not work. 
> > > > > > this probably was introduced in a851d5f557aefd76c01ac23da076a1 > 4dc7576d8e > > > by Erick (not sure which one :) ) on July 2nd. Master still has this > > issue. > > > Would be great if this was fixed in the incoming 6.3... > > > > > > My cmd scripting is not too strong and I did not go further. I searched > > > Jira but found nothing. By the way is it not possible to open tickets > in > > > Jira anymore? > > > > > > xavier > > > > > > > > > -- > Regards, > Shalin Shekhar Mangar. >
Re: 'solr zk upconfig' etc not working on windows since 6.1 at least
Correcting myself here, I was wrong about the cause (I had already messed with the script). I made it work by commenting out line 1261 (the number might be a bit off as I have modified the script, but hopefully its easy to see where): ) ELSE IF "%1"=="/?" ( goto zk_usage ) ELSE IF "%1"=="-h" ( goto zk_usage ) ELSE IF "%1"=="-help" ( goto zk_usage ) ELSE IF "!ZK_SRC!"=="" ( if not "%~1"=="" ( goto set_zk_src ) * rem goto zk_usage* ) ELSE IF "!ZK_DST!"=="" ( IF "%ZK_OP%"=="cp" ( goto set_zk_dst ) IF "%ZK_OP%"=="mv" ( goto set_zk_dst ) set ZK_DST="_" ) ELSE IF NOT "%1"=="" ( set ERROR_MSG="Unrecognized or misplaced zk argument %1%" Now upconfig works! thanks xavier On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav <jmluc...@gmail.com> wrote: > hi, > > Am I missing something or this is broken in windows? I cannot upconfig, > the scripts keeps exiting immediately and showing usage, as if I use some > wrong parameters. This is on win10, jdk8. But I am pretty sure I saw it > also on win7 (don't have that around anymore to try) > > I think the issue is: there is a SHIFT too much in line 1276 of solr.cmd: > > :set_zk_op > set ZK_OP=%~1 > SHIFT > goto parse_zk_args > > if this SHIFT is removed, then parse_zk_args works (and it does the shift > itself). But the upconfig hangs, so still it does not work. > > this probably was introduced in a851d5f557aefd76c01ac23da076a14dc7576d8e > by Erick (not sure which one :) ) on July 2nd. Master still has this issue. > Would be great if this was fixed in the incoming 6.3... > > My cmd scripting is not too strong and I did not go further. I searched > Jira but found nothing. By the way is it not possible to open tickets in > Jira anymore? > > xavier >
'solr zk upconfig' etc not working on windows since 6.1 at least
hi, Am I missing something or is this broken on Windows? I cannot upconfig; the script keeps exiting immediately and showing usage, as if I had used some wrong parameters. This is on win10, jdk8. But I am pretty sure I saw it also on win7 (don't have that around anymore to try) I think the issue is: there is one SHIFT too many at line 1276 of solr.cmd: :set_zk_op set ZK_OP=%~1 SHIFT goto parse_zk_args if this SHIFT is removed, then parse_zk_args works (and it does the shift itself). But the upconfig hangs, so it still does not work. this probably was introduced in a851d5f557aefd76c01ac23da076a14dc7576d8e by Erick (not sure which one :) ) on July 2nd. Master still has this issue. Would be great if this was fixed in the upcoming 6.3... My cmd scripting is not too strong and I did not go further. I searched Jira but found nothing. By the way, is it not possible to open tickets in Jira anymore? xavier
Re: JNDI settings
I did set up JNDI for DIH once, and you have to tweak the Jetty setup. Of course, Solr should have its own Jetty instance; the old model where Solr was just a war you deployed is not true anymore. I don't remember where, but there should be some instructions somewhere; it took me an afternoon to get it set up properly. xavier On Wed, Sep 21, 2016 at 1:15 PM, Aristedes Maniatis <amania...@apache.org> wrote: > On 13/09/2016 1:29am, Aristedes Maniatis wrote: > > I am using Solr 5.5 and wanting to add JNDI settings to Solr (for data > import). I'm new to Solr Cloud setup (previously I was running Solr running > as a custom bundled war) so I can't figure where to put the JNDI settings > with user/pass themselves. > > > > I don't want to add it to jetty.xml because that's part of the packaged > application which will be upgraded from time to time. > > > > Should it go into solr.xml inside the solr.home directory? If so, what's > the right syntax there? > > > Just a follow up on this question. Does anyone know of how I can add JNDI > settings to Solr without overwriting parts of the application itself? > > Cheers > Ari > > > > -- > --> > Aristedes Maniatis > GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A >
Re: SOLR 5.4.0?
On 31/12/15 at 8:07, Ere Maijala wrote: Well, for us SOLR-8418 is a major issue. I haven't encountered other issues, but that one was sort of a show-stopper. --Ere On 31.12.2015 at 7.27, William Bell wrote: How is SOLR 5.4.0? I heard there was a quick 5.4.1 coming out? Any major issues? For us, SOLR-7864 (where timeAllowed is broken) is a major bug, which prevents us from finishing migration to Solr 5 (we are currently using 4.3). For our use case, correct operation of timeAllowed is critical. Best regards, Xavier -- Trovit *Xavier Sánchez Loro* Pipeline +34 93 209 2556
Re: How to use DocumentAnalysisRequestHandler in java
Hi, Faceting is indeed the best way to do it. Here is how it looks in Java: SolrQuery query = new SolrQuery(); query.setQuery("id:" + docId); query.setFacet(true); query.addFacetField("text"); // You can add all fields you want to inspect query.setFacetMinCount(1); // Otherwise you'll get even tokens that are not in your document QueryResponse rsp = this.index.query(query); // Now look at the results (for field "text") FacetField facetField = rsp.getFacetField("text"); for (Count field : facetField.getValues()) { System.out.println(field.getName()); } Xavier. On 20/08/2015 22:20, Upayavira wrote: On Thu, Aug 20, 2015, at 04:34 PM, Jean-Pierre Lauris wrote: Hi, I'm trying to obtain indexed tokens from a document id, in order to see what has been indexed exactly. It seems that DocumentAnalysisRequestHandler does that, but I couldn't figure out how to use it in java. The doc says I must provide a contentstream but the available init() method only takes a NamedList as a parameter. https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/handler/DocumentAnalysisRequestHandler.html Could somebody provide me with a short example of how to get index information from a document id? If you are talking about what I think you are, then that is used by the Admin UI to implement the analysis tab. You pass in a document, and it returns it analysed. As Alexandre says, faceting may well get you there if you want to query a document already in your index. Upayavira -- Xavier Tannier Associate Professor / Maître de conférence (HDR) Univ. Paris-Sud LIMSI-CNRS (bât. 508, bureau 12, RdC) B.P. 133 91403 ORSAY CEDEX FRANCE http://www.limsi.fr/~xtannier/ tel: 0033 (0)1 69 85 80 12 fax: 0033 (0)1 69 85 80 88 ---
Can I be added to the Wiki contributors group?
I mean for: https://wiki.apache.org/solr/FrontPage My username is XavierMorera Regards, Xavier -- *Xavier Morera* Entrepreneur | Author Trainer | Consultant | Developer Scrum Master *www.xaviermorera.com http://www.xaviermorera.com/* office: (305) 600-4919 cel: +506 8849-8866 skype: xmorera Twitter https://twitter.com/xmorera | LinkedIn https://www.linkedin.com/in/xmorera | Pluralsight Author http://www.pluralsight.com/author/xavier-morera
Re: Mongo DB Users
I think what some people are actually saying is burn in hell Aaron Susan for using a solr apache dl for marketing purposes? On Tue, Sep 16, 2014 at 8:31 AM, Suman Ghosh suman.ghos...@gmail.com wrote: Remove On Mon, Sep 15, 2014 at 11:35 AM, Aaron Susan aaronsus...@gmail.com wrote: Hi, I am here to inform you that we are having a contact list of *Mongo DB Users *would you be interested in it? Data Field’s Consist Of: Name, Job Title, Verified Phone Number, Verified Email Address, Company Name Address Employee Size, Revenue size, SIC Code, Industry Type etc., We also provide other technology users as well depends on your requirement. For Example: *Red Hat * *Terra data * *Net-app * *NuoDB* *MongoHQ ** and many more* We also provide IT Decision Makers, Sales and Marketing Decision Makers, C-level Titles and other titles as per your requirement. Please review and let me know your interest if you are looking for above mentioned users list or other contacts list for your campaigns. Waiting for a positive response! Thanks *Aaron Susan* Data Specialist If you are not the right person, feel free to forward this email to the right person in your organization. To opt out response Remove -- *Xavier Morera* email: xav...@familiamorera.com CR: +(506) 8849 8866 US: +1 (305) 600 4919 skype: xmorera
Getting Started with Enterprise Search using Apache Solr
Hi. Most of the members here are already seasoned search professionals. However, I believe there may also be a few who joined because they want to get started with search and, IMHO, Solr is the best way to start. Therefore I wanted to post a link to a course that I created on Getting Started with Enterprise Search using Apache Solr. For some it might be a good way to start learning. If you are already a search professional maybe you will not benefit greatly, but if you can provide feedback that would be great, as I want to create more training to help people get started with search. It is a Pluralsight training, so if you are not a subscriber, just create a trial account and you have 10 days to watch. If you have questions, let me know. You can reach me here or @xmorera on Twitter. Here is the course http://pluralsight.com/training/Courses/TableOfContents/enterprise-search-using-apache-solr PS: Pluralsight is also a great way to learn so I really recommend it. -- *Xavier Morera* email: xav...@familiamorera.com CR: +(506) 8849 8866 US: +1 (305) 600 4919 skype: xmorera
Re: Raw query parameters
You saved my life Shawn! Thanks! On Mon, Apr 28, 2014 at 11:54 PM, Shawn Heisey s...@elyograg.org wrote: On 4/28/2014 7:54 PM, Xavier Morera wrote: Would anyone be so kind to explain what are the Raw query parameters in Solr's admin UI. I can't find an explanation in either the reference guide nor wiki nor web search. The query API supports a lot more parameters than are shown on the admin UI. For instance, If you are doing a faceted search, there are only boxes for facet.query, facet.field, and facet.prefix ... but faceted search supports a lot more parameters (like facet.method, facet.limit, facet.mincount, facet.sort, etc). Raw Query Parameters gives you a way to use the entire query API, not just the few things that have UI input boxes. Thanks, Shawn -- *Xavier Morera* email: xav...@familiamorera.com CR: +(506) 8849 8866 US: +1 (305) 600 4919 skype: xmorera
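As a concrete illustration of Shawn's point above, parameters that have no dedicated input box in the admin UI can go into the Raw Query Parameters box as ordinary key=value pairs joined with & (the values below are hypothetical examples):

```
facet.method=enum&facet.limit=20&facet.mincount=2&facet.sort=count
```

Anything you could append to a /select URL by hand can be supplied this way.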
Raw query parameters
Hi, Would anyone be so kind as to explain what the Raw Query Parameters box in Solr's admin UI is? I can't find an explanation in the reference guide, the wiki, or a web search. [image: Inline image 1] A bit confused on what it actually is for [image: Inline image 3] Thanks in advance, Xavier -- *Xavier Morera* email: xav...@familiamorera.com CR: +(506) 8849 8866 US: +1 (305) 600 4919 skype: xmorera
[ANN] sadat: generate fake docs for your Solr index
Hi, A couple of times I found myself in the following situation: I had to work on a Solr schema, but had no docs to index yet (the db was not ready etc). In order to start learning js, I needed some small project to practice, so I thought of this small utility. It allows you to generate fake docs to index, so you can at least advance with the schema/solrconfig design. Currently it allows (based on your current schema) to generate the most basic field types (int, float, boolean, text, date), and user defined functions can be plugged in for customized generation. Have a look at https://github.com/jmlucjav/sadat
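sadat itself is JavaScript; the idea it implements — walk the schema's field types and emit random values of the right shape — can be sketched like this (Python used here for brevity; the field names and types are made up, and real Solr schemas carry more type detail):

```python
import random
import string
from datetime import datetime, timedelta

# Hypothetical schema: field name -> Solr-style field type.
SCHEMA = {"id": "int", "price": "float", "in_stock": "boolean",
          "title": "text", "added": "date"}

def fake_value(ftype, rng):
    # Generate one random value of the given basic type.
    if ftype == "int":
        return rng.randint(0, 10000)
    if ftype == "float":
        return round(rng.uniform(0.0, 1000.0), 2)
    if ftype == "boolean":
        return rng.choice([True, False])
    if ftype == "date":
        dt = datetime(2013, 1, 1) + timedelta(days=rng.randint(0, 365))
        return dt.strftime("%Y-%m-%dT%H:%M:%SZ")  # Solr's date format
    # "text": a few random lowercase "words"
    return " ".join("".join(rng.choices(string.ascii_lowercase, k=5))
                    for _ in range(4))

def fake_docs(schema, count, seed=0):
    # Seeded RNG so runs are reproducible.
    rng = random.Random(seed)
    return [{f: fake_value(t, rng) for f, t in schema.items()}
            for _ in range(count)]

docs = fake_docs(SCHEMA, 3)
```

The resulting dicts can be posted to Solr as JSON to exercise the schema before real data exists.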
Re: When is/should qf different from pf?
I am confused; wouldn't a doc that matches both the phrase and the term queries have a better score than a doc matching only the term query, even if qf and pf are the same?? On Mon, Oct 28, 2013 at 7:54 PM, Upayavira u...@odoko.co.uk wrote: There'd be no point having them the same. You're likely to include boosts in your pf, so that docs that match the phrase query as well as the term query score higher than those that just match the term query. Such as: qf=text description, pf=text^2 description^4 Upayavira On Mon, Oct 28, 2013, at 05:44 PM, Amit Nithian wrote: Thanks Erick. Numeric fields make sense, as I guess would strictly string fields too, since it's one term? In the normal text searching case though, does it make sense to have qf and pf differ? Thanks Amit On Oct 28, 2013 3:36 AM, Erick Erickson erickerick...@gmail.com wrote: The facetious answer is when phrases aren't important in the fields. If you're doing a simple boolean match, adding phrase fields will add expense to no good purpose. Phrases on numeric fields seem wrong. FWIW, Erick On Mon, Oct 28, 2013 at 1:03 AM, Amit Nithian anith...@gmail.com wrote: Hi all, I have been using Solr for years but never really stopped to wonder: when using the dismax/edismax handler, when do you have the qf different from the pf? I have always set them to be the same (maybe different weights) but I was wondering if there is a situation where you would have a field in the qf not in the pf or vice versa. My understanding from the docs is that qf is a term-wise hard filter while pf is a phrase-wise boost of documents that made it past the qf filter. Thanks! Amit
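Xavier's intuition in this thread is right as far as it goes: dismax adds the pf (phrase) contribution on top of the qf (term) contribution, so even with identical qf and pf a phrase-matching doc outranks a term-only match; the pf boosts just control the size of the gap. A toy sketch of that additive shape (not Lucene's actual similarity, just the combination):

```python
def toy_dismax_score(doc, terms, phrase, qf_boost=1.0, pf_boost=1.0):
    # Per-term contribution from the qf side, plus a phrase bonus
    # from the pf side when the whole phrase occurs in the doc.
    term_score = sum(qf_boost for t in terms if t in doc)
    phrase_score = pf_boost if phrase in doc else 0.0
    return term_score + phrase_score

doc_phrase = "red wool sweater on sale"       # matches terms AND the phrase
doc_terms = "sweater made of red wool"        # matches the terms only
terms = ["red", "wool", "sweater"]
phrase = "red wool sweater"
# Both docs match all three terms, but only the first matches the phrase,
# so it scores higher even with equal qf/pf boosts.
```

Raising `pf_boost` (e.g. pf=text^2) only widens the margin between the two docs.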
Re: do SearchComponents have access to response contents
I knew I could do that at the jetty level with a servlet for instance, but the user wants to do this stuff inside the Solr code itself. Now that you mention the logs...that could be a solution without modifying the webapp... thanks for the input!

xavier

On Fri, Apr 5, 2013 at 7:55 AM, Amit Nithian <anith...@gmail.com> wrote:

> > We need to also track the size of the response (as the size in bytes of
> > the whole xml response that is streamed, with stored fields and all).
> > I was a bit worried cause I am wondering if a searchcomponent will
> > actually have access to the response bytes...
>
> Can't you get this from your container access logs after the fact? I may
> be misunderstanding something, but why wouldn't mining the Jetty/Tomcat
> logs for the response size here suffice?
>
> Thanks!
> Amit
>
> On Thu, Apr 4, 2013 at 1:34 AM, xavier jmlucjav <jmluc...@gmail.com> wrote:
>
> > A custom QueryResponseWriter...this makes sense, thanks Jack
> >
> > On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> >
> > > The search components can see the response as a named list, but it is
> > > only when SolrDispatchFilter calls the QueryResponseWriter that XML or
> > > JSON or whatever other format (Javabin as well) is generated from the
> > > named list for final output in an HTTP response.
> > >
> > > You probably want a custom query response writer that wraps the XML
> > > response writer. Then you can generate the XML and then do whatever
> > > you want with it. See the QueryResponseWriter class and
> > > <queryResponseWriter> in solrconfig.xml.
> > >
> > > -- Jack Krupansky
> > >
> > > -----Original Message-----
> > > From: xavier jmlucjav
> > > Sent: Wednesday, April 03, 2013 4:22 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: do SearchComponents have access to response contents
> > >
> > > I need to implement some SearchComponent that will deal with metrics
> > > on the response. Some things I see will be easy to get, like number of
> > > hits for instance, but I am more worried with this: we need to also
> > > track the size of the response (as the size in bytes of the whole xml
> > > response that is streamed, with stored fields and all).
> > >
> > > I was a bit worried cause I am wondering if a searchcomponent will
> > > actually have access to the response bytes... Can someone confirm one
> > > way or the other? We are targeting Solr 4.0
> > >
> > > thanks
> > > xavier
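The wrapping idea Jack describes can be illustrated outside Solr. Below is a hypothetical, self-contained sketch (plain `java.io`, not the Solr API): a `FilterWriter` that counts characters as they are streamed through it, which is the same pattern a custom response writer delegating to the XML writer would use to measure response size. The class name and demo string are illustrative.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch: a Writer wrapper that counts every character
// streamed through it while delegating the actual writing.
public class CountingWriter extends FilterWriter {
    private long count = 0;

    public CountingWriter(Writer out) { super(out); }

    @Override
    public void write(int c) throws IOException {
        super.write(c);
        count++;
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        super.write(cbuf, off, len);
        count += len;
    }

    @Override
    public void write(String s, int off, int len) throws IOException {
        super.write(s, off, len);
        count += len;
    }

    public long getCount() { return count; }

    public static void main(String[] args) throws IOException {
        // Pretend this is the serialized response being streamed out.
        CountingWriter w = new CountingWriter(new StringWriter());
        w.write("<response><result numFound=\"2\"/></response>");
        System.out.println("chars streamed: " + w.getCount());
    }
}
```

Inside Solr the equivalent wrapper would sit around the `Writer` passed to the real `QueryResponseWriter`, so the count reflects exactly what was sent, stored fields and all.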
Re: do SearchComponents have access to response contents
A custom QueryResponseWriter...this makes sense, thanks Jack

On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> The search components can see the response as a named list, but it is only
> when SolrDispatchFilter calls the QueryResponseWriter that XML or JSON or
> whatever other format (Javabin as well) is generated from the named list
> for final output in an HTTP response.
>
> You probably want a custom query response writer that wraps the XML
> response writer. Then you can generate the XML and then do whatever you
> want with it. See the QueryResponseWriter class and <queryResponseWriter>
> in solrconfig.xml.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: xavier jmlucjav
> Sent: Wednesday, April 03, 2013 4:22 PM
> To: solr-user@lucene.apache.org
> Subject: do SearchComponents have access to response contents
>
> I need to implement some SearchComponent that will deal with metrics on
> the response. Some things I see will be easy to get, like number of hits
> for instance, but I am more worried with this: we need to also track the
> size of the response (as the size in bytes of the whole xml response that
> is streamed, with stored fields and all).
>
> I was a bit worried cause I am wondering if a searchcomponent will
> actually have access to the response bytes... Can someone confirm one way
> or the other? We are targeting Solr 4.0
>
> thanks
> xavier
do SearchComponents have access to response contents
I need to implement some SearchComponent that will deal with metrics on the response. Some things I see will be easy to get, like number of hits for instance, but I am more worried with this: we need to also track the size of the response (as the size in bytes of the whole xml response that is streamed, with stored fields and all).

I was a bit worried cause I am wondering if a searchcomponent will actually have access to the response bytes... Can someone confirm one way or the other? We are targeting Solr 4.0

thanks
xavier
custom similarity on a field not working
I have the following setup:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="description" type="text" indexed="true" stored="true" multiValued="false" omitNorms="true"/>

I index my corpus, and I can see tf is as usual; in this doc the term is 14 times in this field:

4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440) [DefaultSimilarity], result of:
  4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
    0.14165252 = queryWeight, product of:
      10.0 = boost
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      0.0016648936 = queryNorm
    31.834784 = fieldWeight in 440, product of:
      3.7416575 = tf(freq=14.0), with freq of:
        14.0 = termFreq=14.0
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      1.0 = fieldNorm(doc=440)

Then I modify my schema:

<similarity class="solr.SchemaSimilarityFactory"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="com.customsolr.NoTfSimilarityFactory"/>
</fieldType>

I just want to disable term freq, so a term is either present or not:

public class NoTfSimilarity extends DefaultSimilarity {
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

But I still see tf=14 in my query??

723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
  723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
    85.08203 = queryWeight, product of:
      10.0 = boost
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      1.0 = queryNorm
    8.5082035 = fieldWeight in 440, product of:
      1.0 = tf(freq=14.0), with freq of:
        14.0 = termFreq=14.0
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      1.0 = fieldNorm(doc=440)

anyone see what I am missing? I am on Solr 4.0

thanks
xavier
Re: custom similarity on a field not working
Hi Felipe, I need to keep positions, that is why I cannot just use omitTermFreqAndPositions.

On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti <fla...@thoughtworks.com> wrote:

Do you really need a custom similarity? Did you try to put the attribute omitTermFreqAndPositions in your field? It could be:

<field name="description" omitTermFreqAndPositions="true" type="text" indexed="true" stored="true" multiValued="false" omitNorms="true"/>

http://wiki.apache.org/solr/SchemaXml

On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav <jmluc...@gmail.com> wrote:

I have the following setup:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="description" type="text" indexed="true" stored="true" multiValued="false" omitNorms="true"/>

I index my corpus, and I can see tf is as usual; in this doc the term is 14 times in this field:

4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440) [DefaultSimilarity], result of: 4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of: 0.14165252 = queryWeight, product of: 10.0 = boost 8.5082035 = idf(docFreq=30, maxDocs=56511) 0.0016648936 = queryNorm 31.834784 = fieldWeight in 440, product of: 3.7416575 = tf(freq=14.0), with freq of: 14.0 = termFreq=14.0 8.5082035 = idf(docFreq=30, maxDocs=56511) 1.0 = fieldNorm(doc=440)

Then I modify my schema:

<similarity class="solr.SchemaSimilarityFactory"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="com.customsolr.NoTfSimilarityFactory"/>
</fieldType>

I just want to disable term freq, so a term is either present or not:

public class NoTfSimilarity extends DefaultSimilarity {
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

But I still see tf=14 in my query??

723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of: 723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of: 85.08203 = queryWeight, product of: 10.0 = boost 8.5082035 = idf(docFreq=30, maxDocs=56511) 1.0 = queryNorm 8.5082035 = fieldWeight in 440, product of: 1.0 = tf(freq=14.0), with freq of: 14.0 = termFreq=14.0 8.5082035 = idf(docFreq=30, maxDocs=56511) 1.0 = fieldNorm(doc=440)

anyone see what I am missing? I am on Solr 4.0

thanks
xavier

--
Felipe Lahti
Consultant Developer - ThoughtWorks Porto Alegre
Re: custom similarity on a field not working
Steve, yes, as I already included (though maybe it's not very visible), I have this before the types element:

<similarity class="solr.SchemaSimilarityFactory"/>

I can see the explain info is indeed different, for example I have [] instead of [DefaultSimilarity].

thanks

On Thu, Mar 21, 2013 at 3:08 PM, Steve Rowe <sar...@gmail.com> wrote:

Hi xavier,

Have you set the global similarity to solr.SchemaSimilarityFactory? See http://wiki.apache.org/solr/SchemaXml#Similarity.

Steve

On Mar 21, 2013, at 9:44 AM, xavier jmlucjav <jmluc...@gmail.com> wrote:

Hi Felipe, I need to keep positions, that is why I cannot just use omitTermFreqAndPositions.

On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti <fla...@thoughtworks.com> wrote:

Do you really need a custom similarity? Did you try to put the attribute omitTermFreqAndPositions in your field? It could be:

<field name="description" omitTermFreqAndPositions="true" type="text" indexed="true" stored="true" multiValued="false" omitNorms="true"/>

http://wiki.apache.org/solr/SchemaXml

On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav <jmluc...@gmail.com> wrote:

I have the following setup:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="description" type="text" indexed="true" stored="true" multiValued="false" omitNorms="true"/>

I index my corpus, and I can see tf is as usual; in this doc the term is 14 times in this field:

4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440) [DefaultSimilarity], result of: 4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of: 0.14165252 = queryWeight, product of: 10.0 = boost 8.5082035 = idf(docFreq=30, maxDocs=56511) 0.0016648936 = queryNorm 31.834784 = fieldWeight in 440, product of: 3.7416575 = tf(freq=14.0), with freq of: 14.0 = termFreq=14.0 8.5082035 = idf(docFreq=30, maxDocs=56511) 1.0 = fieldNorm(doc=440)

Then I modify my schema:

<similarity class="solr.SchemaSimilarityFactory"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="com.customsolr.NoTfSimilarityFactory"/>
</fieldType>

I just want to disable term freq, so a term is either present or not:

public class NoTfSimilarity extends DefaultSimilarity {
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

But I still see tf=14 in my query??

723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of: 723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of: 85.08203 = queryWeight, product of: 10.0 = boost 8.5082035 = idf(docFreq=30, maxDocs=56511) 1.0 = queryNorm 8.5082035 = fieldWeight in 440, product of: 1.0 = tf(freq=14.0), with freq of: 14.0 = termFreq=14.0 8.5082035 = idf(docFreq=30, maxDocs=56511) 1.0 = fieldNorm(doc=440)

anyone see what I am missing? I am on Solr 4.0

thanks
xavier

--
Felipe Lahti
Consultant Developer - ThoughtWorks Porto Alegre
Re: custom similarity on a field not working
Damn...I was so fixated on seeing the 14 there... I had naively thought that the term freq would not be stored in the doc (that 1 would be stored instead), but I guess it still stores the real value and then applies the custom similarity at query time. That means changing to a custom similarity does not need reindexing, right?

thanks for the help!
xavier

On Thu, Mar 21, 2013 at 5:26 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

: public class NoTfSimilarity extends DefaultSimilarity {
:     public float tf(float freq) {
:         return freq > 0 ? 1.0f : 0.0f;
:     }
: }
...
: But I still see tf=14 in my query??
...
: 1.0 = tf(freq=14.0), with freq of:
:     14.0 = termFreq=14.0

Pretty sure you are looking at the explanation of the *input* to your tf() function — note that the *output* is 1.0, just like in your function. Did you compare this to what you see using the DefaultSimilarity?

-Hoss
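Hoss's point can be checked with a standalone sketch (plain Java, outside Solr): the raw freq (14) is what the explain output echoes as the *input*, while the custom tf() output of 1.0 is what actually enters the score. The idf and fieldNorm values are the ones from the explain output in the thread; `defaultTf` mirrors DefaultSimilarity's sqrt(freq).

```java
// Hypothetical sketch comparing DefaultSimilarity-style tf with the
// binary tf from NoTfSimilarity, using the numbers from the thread.
public class NoTfDemo {
    // NoTfSimilarity.tf: term is either present (1.0) or not (0.0)
    static float binaryTf(float freq) { return freq > 0 ? 1.0f : 0.0f; }

    // DefaultSimilarity.tf is sqrt(freq)
    static float defaultTf(float freq) { return (float) Math.sqrt(freq); }

    public static void main(String[] args) {
        float freq = 14.0f, idf = 8.5082035f, fieldNorm = 1.0f;
        // fieldWeight = tf * idf * fieldNorm, per the explain output
        System.out.println("default fieldWeight: " + defaultTf(freq) * idf * fieldNorm);
        System.out.println("binary  fieldWeight: " + binaryTf(freq) * idf * fieldNorm);
    }
}
```

With the binary tf the fieldWeight collapses to idf * fieldNorm, matching the 8.5082035 fieldWeight in the second explain output, even though freq=14 is still reported.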
Re: 4.0 hanging on startup on Windows after Control-C
Hi Shawn,

I am using DIH with a commit at the end... I'll investigate further to see if this is what is happening and will report back; I'll also check 4.2 (which I had to do anyway...). thanks for your input

xavier

On Mon, Mar 18, 2013 at 6:12 PM, Shawn Heisey <s...@elyograg.org> wrote:

On 3/17/2013 11:51 AM, xavier jmlucjav wrote:
> Hi, I have an index where, if I kill solr via Control-C, it consistently
> hangs next time I start it. Admin does not show cores, and searches never
> return. If I delete the index contents and I restart again all is ok. I am
> on windows 7, jdk1.7 and Solr4.0. Is this a known issue? I looked in jira
> but found nothing.

I scanned your thread dump. Nothing jumped out at me, but given my inexperience with such things, I'm not surprised by that. Have you tried 4.1 or 4.2 yet to see if the problem persists? 4.0 is no longer the new hotness. Below I will discuss the culprit that springs to mind, though I don't know whether it's what you are actually hitting.

One thing that can make Solr take a really long time to start up is huge transaction logs. Transaction logs must be replayed when Solr starts, and if they are huge, it can take a really long time. Do you have tlog directories in your cores (in the data dir, next to the index directory), and if you do, how much disk space do they use? The example config in 4.x has updateLog turned on.

There are two common situations that can lead to huge transaction logs. One is exclusively using soft commits when indexing; the other is running a very large import with the dataimport handler and not committing until the very end. AutoCommit with openSearcher=false is a good solution to both of these situations. As long as you use openSearcher=false, it will not change what documents are visible. AutoCommit does a regular hard commit every X new documents or every Y milliseconds. A hard commit flushes index data to disk and starts a new transaction log. Solr will only keep a few transaction logs around, so frequently building new ones keeps their size down. When you restart Solr, you don't need to wait for a long time while it replays them.

Thanks,
Shawn
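For reference, the autoCommit setup Shawn describes would look roughly like this in solrconfig.xml. This is a sketch, not a tuned recommendation — the maxDocs/maxTime numbers are illustrative and should be sized to the indexing load:

```xml
<!-- Sketch: hard-commit periodically without opening a new searcher,
     so tlogs are rolled over and stay small, while document visibility
     is unchanged (visibility still governed by explicit/soft commits). -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog/>
  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```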
Re: Is there an EdgeShingleFilter already?
Steve, worked like a charm. thanks!

On Sun, Mar 17, 2013 at 7:37 AM, Steve Rowe <sar...@gmail.com> wrote:

See https://issues.apache.org/jira/browse/LUCENE-4843

Let me know if it works for you.

Steve

On Mar 16, 2013, at 5:35 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

I read your reply too fast, so I thought you meant configuring the LimitTokenPositionFilter. I see you mean I have to write one, ok...

On Sat, Mar 16, 2013 at 10:33 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

Steve,

Yes, I want only "one", "one two", and "one two three", but nothing else. Cool, and if this can be achieved without java code even better, I'll check that filter. I need this for building a field used for suggestions; the user specifically wants matches only from the edge.

thanks!

On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe <sar...@gmail.com> wrote:

Hi xavier,

It's not clear to me what you want. Is the edge you're referring to the beginning of a field? E.g. raw text "one two three four" with an EdgeShingleFilter configured to produce unigrams, bigrams and trigrams would produce "one", "one two", and "one two three", but nothing else?

If so, I suspect writing a LimitTokenPositionFilter (which would stop emitting tokens after the token position exceeds a specified limit) would be better, rather than subclassing ShingleFilter. You could use LimitTokenCountFilter as a model, especially its consumeAllTokens option. I think this would make a nice addition to Lucene.

Also, what do you plan to use this for?

Steve

On Mar 16, 2013, at 5:02 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

Hi, I need to use shingles but only keep the ones that start from the edge. I want to confirm there is no way to get this feature without subclassing ShingleFilter, cause I thought someone would have already encountered this use case.

thanks
xavier
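The desired output — shingles anchored at the beginning of the field only — can be sketched in a few lines of plain Java (this is an illustration of the behavior, not the Lucene TokenFilter from LUCENE-4843):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emit only the prefix shingles of a token stream,
// e.g. "one", "one two", "one two three" for maxShingleSize=3.
public class EdgeShingles {
    static List<String> edgeShingles(String text, int maxShingleSize) {
        String[] tokens = text.split("\\s+");   // stand-in for a tokenizer
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(maxShingleSize, tokens.length); i++) {
            if (i > 0) sb.append(' ');
            sb.append(tokens[i]);
            out.add(sb.toString());            // each prefix is one shingle
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [one, one two, one two three]
        System.out.println(edgeShingles("one two three four", 3));
    }
}
```

In the actual analysis chain this is exactly what ShingleFilter followed by a position-limiting filter achieves: shingles are still generated everywhere, but only those starting at position 1 survive.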
4.0 hanging on startup on Windows after Control-C
Hi, I have an index where, if I kill Solr via Control-C, it consistently hangs the next time I start it. Admin does not show cores, and searches never return. If I delete the index contents and restart again, all is ok. I am on Windows 7, jdk1.7 and Solr 4.0. Is this a known issue? I looked in jira but found nothing.

xavier

Here is a thread dump:

2013-03-17 17:58:33
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.7-b01 mixed mode):

"JMX server connection timeout 30" daemon prio=6 tid=0x0bbf9000 nid=0x3b4c in Object.wait() [0x1df3e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xe7054338> (a [I)
        at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:168)
        - locked <0xe7054338> (a [I)
        at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
        - None

"RMI Scheduler(0)" daemon prio=6 tid=0x0bbf8000 nid=0x39d8 waiting on condition [0x1db9f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xb9e1e6d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
        - None

"RMI TCP Connection(1)-192.168.1.128" daemon prio=6 tid=0x0bbf7800 nid=0x111c runnable [0x1dd3e000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        - locked <0xe70003c8> (a java.io.BufferedInputStream)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
        - <0xb959bc68> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"RMI TCP Accept-0" daemon prio=6 tid=0x0bbf5000 nid=0x1fe0 runnable [0x1da4e000]
   java.lang.Thread.State: RUNNABLE
        at java.net.DualStackPlainSocketImpl.accept0(Native Method)
        at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:121)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:183)
        - locked <0xb9531a78> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(ServerSocket.java:522)
        at java.net.ServerSocket.accept(ServerSocket.java:490)
        at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
        at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
        - None

"DestroyJavaVM" prio=6 tid=0x0bbf6800 nid=0x60c waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"searcherExecutor-6-thread-1" prio=6 tid=0x0bbf6000 nid=0x3480 in Object.wait() [0x1441e000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb9e6a4a0> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1379)
        - locked <0xb9e6a4a0> (a java.lang.Object)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1200)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1135
Re: Is there an EdgeShingleFilter already?
Steve,

Yes, I want only "one", "one two", and "one two three", but nothing else. Cool, and if this can be achieved without java code even better, I'll check that filter. I need this for building a field used for suggestions; the user specifically wants matches only from the edge.

thanks!

On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe <sar...@gmail.com> wrote:

Hi xavier,

It's not clear to me what you want. Is the edge you're referring to the beginning of a field? E.g. raw text "one two three four" with an EdgeShingleFilter configured to produce unigrams, bigrams and trigrams would produce "one", "one two", and "one two three", but nothing else?

If so, I suspect writing a LimitTokenPositionFilter (which would stop emitting tokens after the token position exceeds a specified limit) would be better, rather than subclassing ShingleFilter. You could use LimitTokenCountFilter as a model, especially its consumeAllTokens option. I think this would make a nice addition to Lucene.

Also, what do you plan to use this for?

Steve

On Mar 16, 2013, at 5:02 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

Hi, I need to use shingles but only keep the ones that start from the edge. I want to confirm there is no way to get this feature without subclassing ShingleFilter, cause I thought someone would have already encountered this use case.

thanks
xavier
Re: Is there an EdgeShingleFilter already?
I read your reply too fast, so I thought you meant configuring the LimitTokenPositionFilter. I see you mean I have to write one, ok...

On Sat, Mar 16, 2013 at 10:33 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

Steve,

Yes, I want only "one", "one two", and "one two three", but nothing else. Cool, and if this can be achieved without java code even better, I'll check that filter. I need this for building a field used for suggestions; the user specifically wants matches only from the edge.

thanks!

On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe <sar...@gmail.com> wrote:

Hi xavier,

It's not clear to me what you want. Is the edge you're referring to the beginning of a field? E.g. raw text "one two three four" with an EdgeShingleFilter configured to produce unigrams, bigrams and trigrams would produce "one", "one two", and "one two three", but nothing else?

If so, I suspect writing a LimitTokenPositionFilter (which would stop emitting tokens after the token position exceeds a specified limit) would be better, rather than subclassing ShingleFilter. You could use LimitTokenCountFilter as a model, especially its consumeAllTokens option. I think this would make a nice addition to Lucene.

Also, what do you plan to use this for?

Steve

On Mar 16, 2013, at 5:02 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

Hi, I need to use shingles but only keep the ones that start from the edge. I want to confirm there is no way to get this feature without subclassing ShingleFilter, cause I thought someone would have already encountered this use case.

thanks
xavier
RE: Need help with delta import
This is absolutely a syntax error; I had the same problem, and with dih.delta.id it solved all my problems. Thanks to god and the special person who posted the answer on this page. You have to revise the syntax in your delta-import queries and watch the catalina log file (I use tomcat) for any errors. Regards,
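For context, the `dih.delta.id` variable the reply refers to belongs in the DataImportHandler entity definition. A sketch of the usual wiring, with illustrative table and column names:

```xml
<!-- Sketch: deltaQuery collects the ids changed since the last run;
     deltaImportQuery then fetches each changed row via ${dih.delta.id}.
     Table/column names here are placeholders. -->
<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name FROM item
                          WHERE id = '${dih.delta.id}'"/>
```

A mismatch between the column name returned by deltaQuery and the variable referenced in deltaImportQuery is a common cause of the silent delta-import failures described in this thread.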
[ANN] vifun: a GUI to help visually tweak Solr scoring, release 0.6
Hi,

I am releasing a new version (0.6) of vifun, a GUI to help visually tweak Solr scoring. Most relevant changes are:

- support float values
- add support for tie
- synch both Current/Baseline scrollbars (if some checkbox is selected)
- doubleclick on a doc: show side-by-side comparison of debug score info
- upgrade to griffon 1.2.0
- allow using another handler (besides /select)

You can check it out here: https://github.com/jmlucjav/vifun
Binary distribution: http://code.google.com/p/vifun/downloads/detail?name=vifun-0.6.zip

xavier
Re: [ANN] vifun: tool to help visually tweak Solr boosting
Hi Mark,

Thanks for trying it out. Let me see if I can explain it better: the number you have to select (in order to later be able to tweak it with the slider) is any number in one of the parameters in the Scoring section. The issue is that you are using the /select handler from the example distribution, and that handler does not have any of these parameters (qf, pf, pf2, pf3, ps, ps2, ps3, bf, bq, boost, mm, tie), so it's normal they don't show up; there is nothing to tweak...

In the example configuration from 4.1, you can select the /browse handler, as it uses qf and mm, and you should be able to tweak them. Of course, if you were using a real Solr installation with a sizable number of documents and some complex usage of edismax, you would be able to see much better what the tool can do.

xavier

On Mon, Mar 4, 2013 at 10:52 PM, Mark Bennett <mark.benn...@lucidworks.com> wrote:

Hello Xavier,

Thanks for uploading this and sharing. I also read the other messages in the thread. I'm able to get part way through your Getting Started section: I get results, but I get stuck on editing values. I've tried with Java 6 and 7, with both the 0.5 binary and from the source distribution.

What's working:
* Default Solr 4.1 install (plus a couple extra fields in schema)
* Able to connect to Solr (/collection1)
* Able to select handler (/select)
* Able to run a search: q=bandwidth, rows=10, fl=title, rest: pt=45.15,-93.85 (per your example)
* Get 2 search results with titles
* Able to select a result, mouse over, highlight score, etc.

However, what I'm stuck on:
* Below the Run Query button, I only see the grayed out Scoring slider.
* The instructions say to highlight some numbers:
  - I tried highlighting the 10 in the rows param
  - I also tried the 45.15 in rest, and some of the scores in the results list

I never see the extra parameters you show in this screenshot:
https://raw.github.com/jmlucjav/vifun/master/img/screenshot-selecttarget.jpg
I see the word "Scoring:"; I don't see the blue text "Select a number as a target to tweak"; I don't see the parameters qf, bf_0, 1, 2, bq_0, etc. I'm not sure how to get those extra fields to appear in the UI. I also tried adding defType=edismax, no luck.

The handlers it sees: /select, /query, /browse, /spell, /tvrh, /clustering, /terms, /elevate (from the default Solr 4.1 solrconfig.xml). I'm using /select.

--
Mark Bennett / LucidWorks: Search Big Data / mark.benn...@lucidworks.com
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Feb 23, 2013, at 6:12 AM, jmlucjav <jmluc...@gmail.com> wrote:

Hi,

I have built a small tool to help me tweak some params in Solr (typically qf, bf in edismax). As maybe others find it useful, I am open sourcing it on github: https://github.com/jmlucjav/vifun

Check github for some more info and screenshots. I include part of the github page below.

regards

Description

Did you ever spend lots of time trying to tweak all the numbers in an *edismax* handler's *qf*, *bf*, etc. params so docs get scored to your liking? Imagine you have the params below: is 20 the right boosting for *name* or is it too much? Is *population* being boosted too much versus distance? What about new documents?

<!-- fields, boost some -->
<str name="qf">name^20 textsuggest^10 edge^5 ngram^2 phonetic^1</str>
<str name="mm">33%</str>
<!-- boost closest hits -->
<str name="bf">recip(geodist(),1,500,0)</str>
<!-- boost by population -->
<str name="bf">product(log(sum(population,1)),100)</str>
<!-- boost newest docs -->
<str name="bf">recip(rord(moddate),1,1000,1000)</str>

This tool was developed in order to help me tweak the values of boosting functions etc. in Solr, typically when using the edismax handler. If you are fed up with: change a number a bit, restart Solr, run the same query to see how documents are scored now... then this tool is for you.

Features (https://github.com/jmlucjav/vifun#features)
- Can tweak numeric values in the following params: *qf, pf, bf, bq, boost, mm* (others can be easily added), even in *appends or invariants*
- View side by side a Baseline query result and how it changes when you gradually change each value in the params
- Colorized values; color depends on how the document does relative to the baseline query
- Tooltips give you Explain info
- Works on remote Solr installations
- Tested with Solr 3.6, 4.0 and 4.1 (other versions should work too, as long as the wt=javabin format is compatible)
- Developed using Groovy/Griffon

Requirements (https://github.com/jmlucjav/vifun#requirements)
- The */select* handler should be available, and not have any *appends or invariants*, as they could interfere with how vifun works.
- Java 6 is needed (maybe it runs on Java 5 too). A JRE should be enough.

https://github.com
Index all possible facets values even if there is no document in relation
Hi everyone,

My question is a little weird, but I need to have all my facet values in the solr index: I have a database with all possible values of the facets for my solr documents. Not all of these facet values are used by my documents, but I would like to index them even if they return 0 documents. I need this for SEO management, and because I want to list these facet values (with 0 documents) without querying my database.

Best Regards,
Xavier

--
View this message in context: http://lucene.472066.n3.nabble.com/Index-all-possible-facets-values-even-if-there-is-no-document-in-relation-tp3806461p3806461.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: 'location' fieldType indexation impossible
You totally got it :) I've deleted those dynamicFields (though it was just an example). Why didn't I read the comment above the line! Thanks a lot ;)

Best regards,
Xavier.

--
View this message in context: http://lucene.472066.n3.nabble.com/location-fieldType-indexation-impossible-tp3766136p3769065.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to merge an autofacet with a predefined facet
Thank you for this information, I'll keep it in mind. But I'm sorry, I don't get the process to do it.

Em wrote:
> Well, you could create a keyword-file out of your database and join it
> with your self-maintained keywords list.

By that do you mean:
- the 'self-maintained keywords list' is my 'predefined_facet', already filled in the database, which I'll still import with DIH?
- the keyword-file isn't the same thing I've created with the synonyms/keepwords combination?

And I still don't get how to 'merge' both ways of getting facet values into a single facet!

Thanks in advance,
Xavier

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3769121.html
Sent from the Solr - User mailing list archive at Nabble.com.
'location' fieldType indexation impossible
Hi,

When I try to index my location field I get this error for each document:

*ATTENTION: Error creating document. Error adding field 'emploi_city_geoloc'='48.85,2.5525'*

(so I have 0 files indexed). Here is my schema.xml:

*<field name="emploi_city_geoloc" type="location" indexed="true" stored="false"/>*

I really don't understand why it isn't working, because it was working on my local server with the same configuration (Solr 3.5.0) and the same database!!! If I use geohash instead of location, indexing works, but then my geodist query in the front end isn't working anymore...

Any ideas?

Best regards,
Xavier

--
View this message in context: http://lucene.472066.n3.nabble.com/location-fieldType-indexation-impossible-tp3766136p3766136.html
Sent from the Solr - User mailing list archive at Nabble.com.
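For reference, a `solr.LatLonType` location field stores its latitude and longitude in hidden subfields, so the schema needs a matching dynamicField alongside the field itself. A sketch based on the Solr 3.x example schema (the tdouble type and suffix are the conventional ones; adjust to your schema):

```xml
<!-- Sketch: LatLonType splits "48.85,2.5525" into two *_coordinate
     subfields, which must be covered by a dynamicField of a numeric type. -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
<field name="emploi_city_geoloc" type="location" indexed="true" stored="false"/>
```

A missing or conflicting `*_coordinate` dynamicField (as the follow-up in this thread suggests) is a typical cause of the "Error adding field" failure at index time.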
Re: How to merge an autofacet with a predefined facet
I'm not sure I understand your solution? When (and how) will the 'word' detection in the fulltext happen: before (on my own) or during (with) solr indexation?

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3767059.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to index a facetfield by searching words matching from another Textfield
That's it! Thanks :) First time I've seen that documentation page (which is really helpful): http://lucidworks.lucidimagination.com/display/solr/Filter+Descriptions#FilterDescriptions-KeepWordsFilter

So, now I want to associate a words list with a value of an existing facet. I tried to combine synonyms and keepwords like that:

<fieldType name="text_tag" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonymswords.txt"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

It works very well, but my problem now is that I want whitespace returned in a synonym and to match it against my keepwords (because I have whitespace in the values of my facet). For example, if I see the term 'php', my synonyms file maps it to 'web langage', and I want to keep the whole phrase 'web langage'. So my files are:

synonymswords.txt: php=>web langage
keepwords.txt: web langage

The problem is that each word is analyzed separately and I don't know how to handle the whitespace... (the synonym returns 'web' and 'langage', so it doesn't match 'web langage'). I tried to use 'solr.PatternReplaceFilter' (as you can see in my configuration above) with a chosen character '_' as a space character, but I got an error, so if you have another tip for me it would be great :p

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763247.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to index a facetfield by searching words matching from another Textfield
It seems there's an error in the documentation, with 'Factory' missing from the class name!? I found:

<filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" "/>

and that works fine! In conclusion, I have these files:

synonymswords.txt:
php,mysql,html,css=>web_langage

keepwords.txt:
web langage

with this fieldType:

<fieldType name="text_tag" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonymswords.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" "/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

And it's working fine ;) But I have another question. My fields are configured like this:

<copyField source="mytext" dest="text_tag_facet"/>
<field name="text_tag_facet" type="text_tag" indexed="true" stored="false" multiValued="true"/>

But if I turn stored to true, the document field value for text_tag_facet always contains the full original text, not the facet values produced by analysis (like 'web langage'). How can I get the result of the facet analysis into the stored field of the document?
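The analyzer chain above can be sketched as a plain string pipeline. This is only a toy model to illustrate the token flow (Solr's real filters operate on token streams with positions; the mappings are the ones from the synonymswords.txt / keepwords.txt examples in this thread):

```python
# Toy model of the index-time chain: tokenize -> lowercase -> synonyms
# -> pattern replace ('_' -> ' ') -> keep-words.
SYNONYMS = {"php": "web_langage", "mysql": "web_langage",
            "html": "web_langage", "css": "web_langage"}
KEEP_WORDS = {"web langage"}

def analyze(text):
    tokens = [t.lower() for t in text.split()]      # tokenizer + LowerCaseFilter
    tokens = [SYNONYMS.get(t, t) for t in tokens]   # SynonymFilter
    tokens = [t.replace("_", " ") for t in tokens]  # PatternReplaceFilter
    return [t for t in tokens if t in KEEP_WORDS]   # KeepWordFilter

print(analyze("I love PHP and CSS"))  # -> ['web langage', 'web langage']
```

Only the mapped-and-kept values survive the chain, which is exactly what the facet field needs.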
How to merge an autofacet with a predefined facet
Hi everyone, As explained in this post: http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-td3761201.html I have created a dynamic facet at indexing time by searching for terms in a full-text field. But I don't know whether it's possible to merge this auto-created facet with an already predefined facet. I tried to use copyField (adding this to the configuration from my previous post):

<copyField source="text_tag_facet" dest="predefined_facet"/>

but it doesn't seem to work (my text_tag_facet still works, but it isn't merged into my predefined_facet). Maybe that's because (as I understand it) the real (stored) value of this dynamic facet is still the initial full text? (Or maybe I'm wrong...) I'm a little confused about this and I'm certainly doing something wrong, but I'm beginning to feel that this kind of manipulation isn't feasible in schema.xml. Best regards.
Re: How to index a facetfield by searching words matching from another Textfield
Thanks for this answer. I have posted my new question (related to this post) in a new topic ;) ( http://lucene.472066.n3.nabble.com/How-to-merge-an-quot-autofacet-quot-with-a-predefined-facet-td3763988.html ) Best regards
Re: How to merge an autofacet with a predefined facet
Sure. The difference between my two facets is: - 'predefined_facets' contains values already filled in my database, like 'web langage', 'cooking', 'fishing' - 'text_tag_facets' will contain the same possible values, but determined automatically from a given word list by searching in the document text, as shown in my previous post. Why do I want to do that? Because sometimes my 'predefined_facets' is not defined, and even when it is, I want to populate it as fully as possible. Best regards, Xavier
Re: How to merge an autofacet with a predefined facet
In a way I agree that that would be easier, but I really want to avoid that solution, because I prefer to work harder on preparing my index rather than adding field requests to my front-end query :) So the only solution I see right now is to do it on my own, so that my database is fully prepared before being indexed... but I had hoped that Solr could handle it. So if anyone sees a way to handle this directly with Solr, you are welcome :p Anyway, thanks for your help Em ;) Best regards, Xavier
How to index a facetfield by searching words matching from another Textfield
Hi everyone, I'm a new Solr user, but I used to work with Endeca. Endeca has a module called TextTagger that automatically indexes values into a (multivalued) facet field when it finds words (from a given word list) in another text field of the same document. I haven't found any discussion of, or way to do, this with Solr. Thanks in advance ;)
Tomcat6 and Log4j
Hi, I added "slf4j-log4j12-1.5.5.jar" and "log4j-1.2.15.jar" to $CATALINA_HOME/webapps/solr/WEB-INF/lib, deleted the library "slf4j-jdk14-1.5.5.jar" from $CATALINA_HOME/webapps/solr/WEB-INF/lib, created a directory $CATALINA_HOME/webapps/solr/WEB-INF/classes, and created $CATALINA_HOME/webapps/solr/WEB-INF/classes/log4j.properties with the following contents:

log4j.rootLogger=INFO
log4j.appender.SOLR.logfile=org.apache.log4j.DailyRollingFileAppender
log4j.appender.SOLR.logfile.file=/home/quetelet_bdq/logs/bdq.log
log4j.appender.SOLR.logfile.DatePattern='.'yyyy-MM-dd
log4j.appender.SOLR.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.SOLR.logfile.layout.conversionPattern=%d %p [%c{3}] - [%t] - %X{ip}: %m%n
log4j.appender.SOLR.logfile = true

I restarted Solr and got the following message in the catalina.out log:

log4j:WARN No appenders could be found for logger (org.apache.solr.core.SolrResourceLoader).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

According to that page, this error occurs when log4j.properties isn't found. Could someone help me get it working? Thanks in advance, Xavier
Re: Tomcat6 and Log4j
Thanks for your response. How could I do that? From: Jan Høydahl jan@cominvent.com Sent: Thu Feb 10 11:01:15 CET 2011 To: solr-user@lucene.apache.org Subject: Re: Tomcat6 and Log4j Have you tried to start Tomcat with -Dlog4j.configuration=$CATALINA_HOME/webapps/solr/WEB-INF/classes/log4j.properties -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 10. feb. 2011, at 09.41, Xavier Schepler wrote: [...] -- All emails sent from the Sciences Po mail system must comply with its conditions of use. To consult them, visit http://www.ressources-numeriques.sciences-po.fr/confidentialite_courriel.htm
Re: Tomcat6 and Log4j
I added it to /etc/default/tomcat6. What happened is that the same error message appeared twice in /var/log/tomcat6/catalina.out, as if the same file was loaded twice.
Re: Tomcat6 and Log4j
Yes, thanks. This works fine:

log4j.rootLogger=INFO, SOLR
log4j.appender.SOLR=org.apache.log4j.DailyRollingFileAppender
log4j.appender.SOLR.file=/home/quetelet_bdq/logs/bdq.log
log4j.appender.SOLR.datePattern='.'yyyy-MM-dd
log4j.appender.SOLR.layout=org.apache.log4j.PatternLayout
log4j.appender.SOLR.layout.conversionPattern=%d %p [%c{3}] - [%t] - %X{ip}: %m%n
Re: Local param tag voodoo ?
Ok, I tried to use nested queries this way:

wt=json&indent=true&fl=qFR&q=sarkozy _query_:{!tag=test}chirac&facet=true&facet.field={!ex=test}studyDescriptionId

It resulted in this error:

facet_counts:{
  facet_queries:{},
  exception:"java.lang.NullPointerException
    at org.apache.solr.request.SimpleFacets.parseParams(SimpleFacets.java:132)
    at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:278)
    at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
    at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:636)"}

Then I tried a simpler version:

q={!tag=test}chirac&facet=true&facet.field={!ex=test}studyDescriptionId

It resulted in the same error.

From: Jonathan Rochkind rochk...@jhu.edu Sent: Wed Jan 19 17:38:53 CET 2011 To: solr-user@lucene.apache.org Subject: Re: Local param tag voodoo ? What query are you actually trying to do? There's probably a way to do it, possibly using nested queries -- but not using illegal syntax like some of your examples! If you explain what you want to do, someone may be able to tell you how. From the hints in your last message, I suspect nested queries _might_ be helpful to you. On 1/19/2011 3:46 AM, Xavier SCHEPLER wrote: [...]
Re: Local param tag voodoo ?
Since it seems there is no voodoo available, I did it on the client side: I send a first request to get the facets and a second to get the documents and their highlighting. It works well but requires more processing. From: Xavier SCHEPLER xavier.schep...@sciences-po.fr Sent: Thu Jan 20 10:59:40 CET 2011 To: solr-user@lucene.apache.org Subject: Re: Local param tag voodoo ? [...]
Re: Local param tag voodoo ?
You're right, the second query didn't result in an error, but it didn't give the expected result either. I'm going to have a look at the link you gave me. Thanks! From: Markus Jelsma markus.jel...@openindex.io Sent: Tue Jan 18 21:31:52 CET 2011 To: solr-user@lucene.apache.org Subject: Re: Local param tag voodoo ? Hi, You get an error because LocalParams need to be at the beginning of a parameter's value, so no parenthesis first. The second query should not give an error because it's a valid query. Anyway, I assume you're looking for: http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams Cheers, [...]
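For reference, the multi-select pattern from that wiki page puts the tag on a filter query rather than on q, and excludes it in facet.field. A sketch of building such a request (the field name and value here are hypothetical, not from an actual working setup):

```python
from urllib.parse import urlencode

# Sketch of the documented tag/ex pattern: tag the fq, then exclude that
# tag when faceting on the same field. Field name and value are made up.
params = [
    ("q", "chirac"),
    ("fq", "{!tag=test}studyDescriptionId:42"),
    ("facet", "true"),
    ("facet.field", "{!ex=test}studyDescriptionId"),
]
print(urlencode(params))
# -> q=chirac&fq=%7B%21tag%3Dtest%7DstudyDescriptionId%3A42&facet=true&facet.field=%7B%21ex%3Dtest%7DstudyDescriptionId
```

The trade-off discussed in this thread remains: terms moved into an fq no longer contribute to the relevance score.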
Re: Local param tag voodoo ?
Ok, I was already at this point. My faceting system uses exactly what is described on that page; I read it in the Solr 1.4 book, otherwise I wouldn't ask. The problem is that filter queries don't affect the relevance score of the results, so I want the terms in the main query. From: Markus Jelsma markus.jel...@openindex.io Sent: Tue Jan 18 21:31:52 CET 2011 To: solr-user@lucene.apache.org Subject: Re: Local param tag voodoo ? [...]
Local param tag voodoo ?
Hey, here are my needs: - a query that has tagged and untagged contents - facets that ignore the tagged contents. I tried:

q=({!tag=toExclude} ignored) taken into account
q={tag=toExclude v='ignored'} take into account

Both resulted in an error. Is this possible, or do I have to try another way?
Solr boolean operators
Hi, with the Lucene query syntax, is a AND (a OR b) equivalent to a (absorption)?
Re: Solr boolean operators
Ok, thanks. That's what I expected :D From: dante stroe dante.st...@gmail.com Sent: Thu Jan 13 15:56:33 CET 2011 To: solr-user@lucene.apache.org Subject: Re: Solr boolean operators To my understanding, in terms of the results that will be matched by your query, it's the same. In terms of the scores of the results, no: with the first query, documents that match both the a and b terms will score higher than those matching just the a term. On Thu, Jan 13, 2011 at 3:29 PM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Hi, with the Lucene query syntax, is a AND (a OR b) equivalent to a (absorption)?
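The match-set half of that answer can be checked with a tiny model: over any collection, a AND (a OR b) selects exactly the documents that a alone selects (the absorption law), even though scoring may differ. A toy sketch, with documents modelled as sets of terms:

```python
# Toy check of the absorption law at the matching level.
# Scoring is deliberately ignored; only match sets are compared.
docs = {1: {"a"}, 2: {"a", "b"}, 3: {"b"}, 4: set()}

match_a = {d for d, terms in docs.items() if "a" in terms}
match_absorbed = {d for d, terms in docs.items()
                  if "a" in terms and ("a" in terms or "b" in terms)}

print(match_a == match_absorbed)  # -> True
print(sorted(match_a))            # -> [1, 2]
```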
Re: No response from Solr on complex request after several days
On 29/10/2010 12:08, Lance Norskog wrote: There are a few problems that can happen. This is usually a sign of garbage collection problems. You can monitor the Tomcat instance with JConsole or one of the other Java monitoring tools and see if there is a memory leak. Also, most people don't need to do it, but you can automatically restart it once a day. On Thu, Oct 28, 2010 at 2:20 AM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: [...] Thanks for your response. Today, I've increased the Tomcat JVM heap size from 128-256 to 1024-2048. I will see if it helps.
No response from Solr on complex request after several days
Hi, We are in a beta-testing phase, with several users a day. After several days, the Solr server stopped responding to requests that require a lot of processing time. I'm running Solr inside Tomcat. This is the request that got no response from the server:

wt=json&omitHeader=true&q=qiAndMSwFR%3A%28transport%29&q.op=AND&start=0&rows=5&fl=id,domainId,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN&sort=score%20desc&fq=solrLangCode%3AFR&facet=true&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DdomainId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyDecade&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudySerieId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyYearAndDescriptionId&facet.sort=count&f.studyDecade.facet.sort=lex&spellcheck=true&spellcheck.count=10&spellcheck.dictionary=qiAndMFR&spellcheck.q=transport&hl=on&hl.fl=qSwFR,iHLSwFR,mHLSwFR&hl.fragsize=0&hl.snippets=1&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&hl.mergeContiguous=false

It involves highlighting on a multivalued field with more than 600 short values inside; it takes 200 or 300 ms because of the highlighting. After restarting Tomcat, all went fine again. I'm trying to understand why I had to restart Tomcat and Solr, and what I should do to keep it running 24/7. Xavier
No response from Solr on complex request (real issue explained)
Hi, We are in a beta-testing phase, with several users a day. After several days of running well, the Solr server stopped responding to requests that require a lot of processing time, like this one:

wt=json&omitHeader=true&q=qiAndMSwFR%3A%28transport%29&q.op=AND&start=0&rows=5&fl=id,domainId,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN&sort=score%20desc&fq=solrLangCode%3AFR&facet=true&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DdomainId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyDecade&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudySerieId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyYearAndDescriptionId&facet.sort=count&f.studyDecade.facet.sort=lex&spellcheck=true&spellcheck.count=10&spellcheck.dictionary=qiAndMFR&spellcheck.q=transport&hl=on&hl.fl=qSwFR,iHLSwFR,mHLSwFR&hl.fragsize=0&hl.snippets=1&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&hl.mergeContiguous=false

It involves highlighting on a multivalued field with more than 600 short values inside; it usually takes 200 or 300 ms. I'm running Solr within Tomcat. After restarting Tomcat, all went fine again. I'm trying to understand why I had to restart Tomcat, and what I should do to keep it running 24/7. Xavier
More like this and terms positions
Hi, does the more-like-this search use term position information in its scoring formula?
Re: More like this and terms positions
On 04/10/2010 16:40, Robert Muir wrote: On Mon, Oct 4, 2010 at 10:16 AM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Hi, does the more-like-this search use term position information in its scoring formula? No, but it would be nice if it did (based upon query terms); it seems like it would yield improvements: http://sifaka.cs.uiuc.edu/~ylv2/pub/sigir10-prm.pdf Maybe in a next Solr version?
stopwords in AND clauses
Let's suppose we have a regular search field body_t, and an internal boolean flag flag_t not exposed to the user. I'd like body_t:foo AND flag_t:true to be an intersection, but if foo is a stopword I get all documents for which flag_t is true, as if the first clause were dropped, or as if, technically, all documents match an empty string. Is there a way to get 0 results instead?
Re: stopwords in AND clauses
On Mon, Sep 13, 2010 at 4:29 PM, Simon Willnauer simon.willna...@googlemail.com wrote: On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria f...@hashref.com wrote: Let's suppose we have a regular search field body_t, and an internal boolean flag flag_t not exposed to the user. I'd like body_t:foo AND flag_t:true ... This is Solr, right? Why don't you use a filter query for your unexposed flag_t field: q=body_t:foo&fq=flag_t:true This might help too: http://wiki.apache.org/solr/CommonQueryParameters#fq Sounds good.
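Why the filter-query form behaves differently can be sketched with a toy model (a simplification of the real analysis and parsing pipeline, with made-up documents): when the stopword is the only clause left in q, the main query matches nothing, and intersecting that empty set with the flag filter yields zero results instead of "all flagged documents".

```python
# Toy model: a stopword dropped at analysis time leaves an empty main
# query. With the flag as a filter query, the final result is the
# intersection of the two sets, so an empty main query gives zero hits.
STOPWORDS = {"the", "of", "foo"}  # pretend "foo" is a stopword
docs = {
    1: {"body": {"bar"}, "flag": True},
    2: {"body": {"foo", "bar"}, "flag": True},
    3: {"body": {"bar"}, "flag": False},
}

def main_query(term):
    if term in STOPWORDS:  # analysis drops the term entirely
        return set()       # nothing left to match
    return {d for d, f in docs.items() if term in f["body"]}

flagged = {d for d, f in docs.items() if f["flag"]}  # fq=flag_t:true

print(main_query("foo") & flagged)  # -> set()
print(main_query("bar") & flagged)  # -> {1, 2}
```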
Phrase search + multi-word index time expanded synonym
Hello, well, first, here's the field type that is searched:

<fieldtype name="SyFR" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <!-- Synonyms -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
</fieldtype>

Here's the synonym entry from the synonyms-fr.txt file:

...
PS,Parti socialiste
...

And here's the query: "PS et". It returns no results, whereas "Parti socialiste et" returns results. How can I have both queries working? I'm thinking about different configurations, but I haven't found any solution so far. Thanks for reading, Xavier Schepler
Re: Phrase search + multi-word index time expanded synonym
On 08/09/2010 12:21, Grijesh.singh wrote: See analysis.jsp with verbose debug to see what happens to your data at index time and search time during analysis. You can also use debugQuery=on to see what the parsed query actually is. - Grijesh I've found a first solution by myself, using the query analyzer; it works for pairs of synonyms. I still have to test it with rows of 3 or 4 equivalent synonyms. I used analysis.jsp. The query-time analyzer became:

<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms2-fr.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
</analyzer>

And synonyms2-fr.txt contains:

PS => Parti socialiste

Thanks for your reply.
spellcheck distance measure algorithms error ?
Hi, when I take two letters from the middle of a word and swap them, putting the first in place of the second and vice versa, e.g. jospin -> jopsin, I don't get any suggestion from the spellcheck component. I tried the default algorithm and the Jaro-Winkler distance with a coefficient of 0.5. Errors like: jospni instead of jospin, josipn instead of jospin, ojspin instead of jospin, ... are successfully corrected. But jopsin instead of jospin returns no suggestion, and I wonder why. Has anyone else encountered this error?
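The special status of adjacent swaps can be illustrated with string distances: plain Levenshtein counts a transposition as two edits (two substitutions), while the Damerau variant (optimal string alignment) counts it as one. A small self-contained sketch (my own implementations, not Lucene's):

```python
def levenshtein(a, b):
    """Classic edit distance: insert, delete, substitute."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(a)][len(b)]

def osa(a, b):
    """Optimal string alignment: Levenshtein plus adjacent transposition."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # an adjacent swap (e.g. "sp" -> "ps") counts as a single edit
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

print(levenshtein("jospin", "jopsin"))  # 2
print(osa("jospin", "jopsin"))          # 1
```

Note this is only the scoring side: the Lucene spellchecker first generates candidates from n-gram overlap before ranking them with the configured distance measure, so a candidate can also be pruned before the distance is ever applied; that is worth checking independently of the distance algorithm.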
Re: spellcheck distance measure algorithms error ?
On 03/09/2010 15:31, Grant Ingersoll wrote: On Sep 3, 2010, at 9:14 AM, Xavier Schepler wrote: On 03/09/2010 14:47, Grant Ingersoll wrote: On Sep 3, 2010, at 6:02 AM, Xavier Schepler wrote: no, jopsin isn't in the index. I tried this with other words and I had the same error. Thanks for your reply. And what happens if you drop the accuracy to 0? Also, please share your relevant configuration (spell checker config) and the URL command you are using. I lowered the accuracy to 0 and restarted the server, but I got no extra suggestions. Here are extracts from my configuration:

- solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">SC</str>
  <!-- French spellcheckers : Start -->
  <lst name="spellchecker">
    <str name="name">qiAndMAndVlFR</str>
    <str name="field">qiAndMAndVlSCFR</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker_qiAndMAndVlFR</str>
    <str name="buildOnOptimize">true</str>
    <str name="accuracy">0.6</str>
  </lst>

- schema.xml, SC type:

<fieldtype name="SC" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
</fieldtype>

- schema.xml, qiAndMAndVlFR field:

<field name="qiAndMAndVlSCFR" type="SC"/>

- URL command:
wt=json&omitHeader=true&q=qiAndMAndVlSyFR%3A%28jopsin%29&q.op=AND&start=0&rows=5&fl=id,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN&sort=score%20desc&fq=solrLangCode%3AFR&fq=solrLangCode%3AFR&fq=solrLangCode%3AFR&fq=solrLangCode%3AFR&facet=true&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%7DstudyDecade&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%7DstudySerieId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%7DstudyYearAndDescriptionId&facet.sort=lex&spellcheck=true&spellcheck.count=10&spellcheck.dictionary=qiAndMAndVlFR&spellcheck.q=jopsin&hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.fragsize=1&hl.snippets=100&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&hl.mergeContiguous=false Regards, Xavier
Proximity search + Highlighting
Hi, can the highlighting component highlight terms only if the distance between them matches the query? I use these parameters: hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=b&hl.simple.post=%2Fb&hl.mergeContiguous=false
Re: Proximity search + Highlighting
On 01/09/2010 12:38, Markus Jelsma wrote: I think you need to enable usePhraseHighlighter in order to use the highlightMultiTerm parameter. On Wednesday 01 September 2010 12:12:11 Xavier Schepler wrote: Hi, can the highlighting component highlight terms only if the distance between them matches the query? I use these parameters: hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=b&hl.simple.post=%2Fb&hl.mergeContiguous=false Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350 Yes, you're right.
Re: Proximity search + Highlighting
On 01/09/2010 13:54, Xavier Schepler wrote: On 01/09/2010 12:38, Markus Jelsma wrote: I think you need to enable usePhraseHighlighter in order to use the highlightMultiTerm parameter. On Wednesday 01 September 2010 12:12:11 Xavier Schepler wrote: Hi, can the highlighting component highlight terms only if the distance between them matches the query? I use these parameters: hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=b&hl.simple.post=%2Fb&hl.mergeContiguous=false Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350 Yes, you're right, but it doesn't help with the other problem.
Re: Highlighting, return the matched terms only
Chris Hostetter wrote: : how could I have the highlighting component return only the terms that were : matched, without any surrounding text ? I'm not a Highlighter expert, but this is something that certainly *sounds* like it should be easy. I took a shot at it and this is the best I could come up with... http://localhost:8983/solr/select/?q=solr&hl.simple.pre=%20&hl.simple.post=%20&fl=id&hl=true&hl.snippets=1000&hl.fragmenter=regex&hl.regex.pattern=^\S%2B%24&hl.fragsize=1&hl.regex.slop=1000.0 ...however the fragments still wind up wider than it seems like they should based on the regex slop. I have no idea why. I've seen enough people with this request that it seems like there should be a built-in fragmenter/formatter option for it in Solr, so I opened a feature request... https://issues.apache.org/jira/browse/SOLR-2095 -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump! Hi, your solution is a little better than the one I'm using at the moment. Thanks.
Expanded Synonyms + phrase search
Hi, several documents in my index contain the phrase "PS et". However, PS is expanded to "parti socialiste", and a phrase search for "PS et" fails, while a phrase search for "parti socialiste et" succeeds. Can I have both queries working? Here's the field type:

<fieldtype name="SyFR" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <!-- Synonyms -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
</fieldtype>
Highlighting, return the matched terms only
Hi, how could I have the highlighting component return only the terms that were matched, without any surrounding text?
Re: Adding new elements to index
Thanks for the quick reply! In fact it was a typo; the 200 rows I got were from Postgres. I meant to say that the full-import was omitting the 100 Oracle rows. When I run the full import, I run it as a single job, using the url command=full-import. I've tried to clear the index both using the clean command and by manually deleting it, but when I run the full-import, the only indexed documents are the ones coming from Postgres. To make sure the id field is unique, I build the id by prepending a letter to the id value. When indexed, the id looks like s_123, which is id 123 for an entity identified as s. Other entities use different prefixes, but never s. I used DIH to index the data. My configuration is the following:

File db-data-config.xml:

<dataSource type="JdbcDataSource" name="ds_ora" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521:SID" user="user" password="password"/>
<dataSource type="JdbcDataSource" name="ds_pg" driver="org.postgresql.Driver" url="jdbc:postgresql://xxx.xxx.xxx.yyy:5432/sid" user="user" password="password"/>

<entity name="carrers" dataSource="ds_ora" query="select 's_'||id as id_carrer,'a' as tooltip from imi_carrers">
  <field column="id_carrer" name="identificador"/>
  <field column="tooltip" name="Nom"/>
</entity>
<entity name="hidrants" dataSource="ds_pg" query="select 'h_'||id as id_hidrant, parc as tooltip from hidrants">
  <field column="id_hidrant" name="identificador"/>
  <field column="tooltip" name="Nom"/>
</entity>

With that configuration, all the fields coming from ds_pg are indexed, and the fields coming from ds_ora are not. As I've said, what's strange to me is that no error is logged in Tomcat: the number of documents created is the number of rows returned by hidrants, while the number of rows reported is the sum of the rows from hidrants and carrers. Thanks in advance. Xavi. On 7 July 2010 02:46, Erick Erickson erickerick...@gmail.com wrote: first do you have a unique key defined in your schema.xml?
If you do, some of those 300 rows could be replacing earlier rows. You say: if I have 200 rows indexed from postgres and 100 rows from Oracle, the full-import process only indexes 200 documents from oracle, although it shows clearly that the query returned 300 rows. Which really looks like a typo; if you have 100 rows from Oracle, how did you get 200 rows from Oracle? Are you perhaps doing this in two different jobs and deleting the first import before running the second? And if this is irrelevant, could you provide more details, like how you're indexing things (I'm assuming DIH, but you don't state that anywhere)? If it *is* DIH, providing that configuration would help. Best Erick On Tue, Jul 6, 2010 at 11:19 AM, Xavier Rodriguez xee...@gmail.com wrote: Hi, I have a SOLR instance installed on a Tomcat application server. This solr instance has some data indexed from a postgres database. Now I need to add some entities from an Oracle database. When I run the full-import command, the documents indexed are only documents from postgres. In fact, if I have 200 rows indexed from postgres and 100 rows from Oracle, the full-import process only indexes 200 documents from oracle, although it shows clearly that the query returned 300 rows. I'm not doing a delta-import, simply a full import. I've tried to clean the index, reload the configuration, and manually remove dataimport.properties because it's the only metadata I found. Is there any other file to check or modify just to get all 300 rows indexed? Of course, I tried to search for one of the oracle fields, with no results. Thanks a lot, Xavier Rodriguez.
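The prefixing scheme described in this thread ('s_'||id for one entity, 'h_'||id for the other) guarantees that rows from the two databases can never collide on the uniqueKey, even when their raw numeric ids overlap. A quick sketch of that invariant (plain Python, illustrative data, not DIH code):

```python
# Sketch: prefixed ids keep documents from two datasources distinct,
# mirroring the "select 's_'||id ..." / "select 'h_'||id ..." queries above.

def make_id(prefix, row_id):
    """Build a uniqueKey value like 's_123' from a source prefix and a raw id."""
    return f"{prefix}_{row_id}"

oracle_rows = [1, 2, 3]   # e.g. imi_carrers raw ids
postgres_rows = [1, 2]    # e.g. hidrants raw ids -- same raw values!

docs = {make_id("s", r) for r in oracle_rows} | {make_id("h", r) for r in postgres_rows}

# Even though the raw ids overlap, the index would hold 5 distinct documents:
print(sorted(docs))  # ['h_1', 'h_2', 's_1', 's_2', 's_3']
```

Since collisions are ruled out by construction, missing documents from one source point at the import of that entity never running or silently failing (for instance, a JDBC driver jar missing from the classpath is a classic silent culprit, though that is only a guess here), rather than at uniqueKey overwrites.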
Adding new elements to index
Hi, I have a SOLR instance installed on a Tomcat application server. This solr instance has some data indexed from a postgres database. Now I need to add some entities from an Oracle database. When I run the full-import command, the documents indexed are only documents from postgres. In fact, if I have 200 rows indexed from postgres and 100 rows from Oracle, the full-import process only indexes 200 documents from oracle, although it shows clearly that the query returned 300 rows. I'm not doing a delta-import, simply a full import. I've tried to clean the index, reload the configuration, and manually remove dataimport.properties because it's the only metadata I found. Is there any other file to check or modify just to get all 300 rows indexed? Of course, I tried to search for one of the oracle fields, with no results. Thanks a lot, Xavier Rodriguez.
Multi word synonyms + highlighting
Hi, here's a field type using synonyms:

<fieldtype name="SFR" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="french-synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
</fieldtype>

Here are the contents of french-synonyms.txt that I used for testing:

PC,parti communiste
PS,parti socialiste

When I query a field for the words "parti communiste", these things are highlighted: "parti communiste", "parti socialiste", "parti", "PC", "PS", "communiste". Having "parti socialiste" highlighted is a problem. I expected only "parti communiste", "parti", "communiste" and "PC" to be highlighted. Is there a way to have things work as I expected? Here is the query I use:

wt=json&q=qAndMSFR%3A%28parti%20communiste%29&q.op=AND&start=0&rows=5&fl=id,studyId,questionFR,modalitiesFR,variableLabelFR,variableName,nesstarVariableId,lang,studyTitle,nesstarStudyId,CevipofConcept,studyQuestionCount,questionPosition,preQuestionText&sort=score%20desc&facet=true&facet.field=CevipofConceptCode&facet.field=studyDateAndId&facet.sort=lex&spellcheck=true&spellcheck.collate=on&spellcheck.count=10&hl=on&hl.fl=questionSMFR,modalitiesSMFR,variableLabelSMFR&hl.fragsize=1&hl.snippets=100&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E
Targeting two fields with the same query or one field gathering contents from both ?
Hey, let's say I have:
- a field named A with specific contents
- a field named B with specific contents
- a field named C which gathers contents only from A and B via copyField.
Are these queries equivalent in terms of performance:
- A: (the lazy fox) AND B: (the lazy fox)
- C: (the lazy fox) ?
Thanks, Xavier
Re: Targeting two fields with the same query or one field gathering contents from both ?
On 17/05/2010 16:57, Xavier Schepler wrote: Hey, let's say I have: - a field named A with specific contents - a field named B with specific contents - a field named C which gathers contents only from A and B via copyField. Are these queries equivalent in terms of performance: - A: (the lazy fox) AND B: (the lazy fox) - C: (the lazy fox) ? Thanks, Xavier I made some tests and it appears that the second query is much faster than the first ...
Re: Targeting two fields with the same query or one field gathering contents from both ?
On 17/05/2010 17:49, Marco Martinez wrote: No, the equivalent for this will be: - A: (the lazy fox) *OR* B: (the lazy fox) - C: (the lazy fox) Imagine the situation where you don't have 'the lazy fox' in B: with the AND you get 0 results, although you have 'the lazy fox' in A and C. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/5/17 Xavier Schepler xavier.schep...@sciences-po.fr Hey, let's say I have: - a field named A with specific contents - a field named B with specific contents - a field named C which gathers contents only from A and B via copyField. Are these queries equivalent in terms of performance: - A: (the lazy fox) AND B: (the lazy fox) - C: (the lazy fox) ? Thanks, Xavier Yes, you're right; I figured it out after posting.
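Marco's point can be made concrete with a toy model: a term that appears in A but not in B matches C:(…) and A:(…) OR B:(…), but not A:(…) AND B:(…). A small set-based sketch (illustrative boolean matching only, ignoring Solr's scoring and norms):

```python
# Sketch: why a copyField target C behaves like OR across A and B, not AND.
doc = {
    "A": {"the", "lazy", "fox"},
    "B": {"unrelated", "words"},
}
doc["C"] = doc["A"] | doc["B"]  # copyField: C gathers the contents of A and B

def matches(field, terms):
    """True if every query term occurs in the given field of the doc."""
    return set(terms) <= doc[field]

q = ["lazy", "fox"]
print(matches("A", q) and matches("B", q))  # False: AND across A and B
print(matches("A", q) or matches("B", q))   # True:  OR across A and B
print(matches("C", q))                      # True:  single combined field
```

On the performance side, querying the single combined field also means one posting-list lookup per term instead of two, which is consistent with the speed-up reported above; note though that relevance scoring differs between the two forms (field length norms are computed on C, not on A and B separately).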
What hardware do I need ?
Hi, I'm working with Solr 1.4. My schema has about 50 fields. I'm using full-text search on short strings (~30-100 terms) and faceted search. My index will have 100,000 documents. The number of requests per second will be low, let's say between 0 and 1000 because of auto-complete. Is a standard server (3 GHz processor, 4 GB RAM) running both the client application (Apache + PHP 5 + ZF + APC) and Tomcat + Solr enough? Do I need more hardware? Thanks in advance, Xavier S.
Re: What hardware do I need ?
Le 23/04/2010 17:08, Otis Gospodnetic a écrit : Xavier, 0-1000 QPS is a pretty wide range. Plus, it depends on how good your auto-complete is, which depends on types of queries it issues, among other things. 100K short docs is small, so that will all fit in RAM nicely, assuming those other processes leave enough RAM for the OS to cache the index. That said, you do need more than 1 box if you want your auto-complete more fault tolerant. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Xavier Scheplerxavier.schep...@sciences-po.fr To: solr-user@lucene.apache.org Sent: Fri, April 23, 2010 11:01:24 AM Subject: What hardware do I need ? Hi, I'm working with Solr 1.4. My schema has about 50 fields. I'm using full text search in short strings (~ 30-100 terms) and facetted search. My index will have 100 000 documents. The number of requests per second will be low. Let's say between 0 and 1000 because of auto-complete. Is a standard server (3ghz proc, 4gb ram) with the client application (apache + php5 + ZF + apc) and Tomcat + Solr enough ??? Do I need more hardware ? Thanks in advance, Xavier S. Well my auto-complete is built on the facet prefix search component. I think that 100-700 requests per seconds is maybe a better approximation.
More like this - setting a minimum number of terms used to build queries
Hey, is there a way to make the more-like-this feature build its queries from a minimum number of interesting terms? It looks like this component fires queries with only one term in them, and I get a lot of results that aren't similar at all to the parsed document fields. My parameters: mlt.fl=question&mlt.mintf=1&mlt.mindf=&mlt.minwl=4 The question field contains between 15 and 50 terms. Xavier S.
Highlighting inside a field with HTML contents
Hello, this field would not be searched, but it would be used to display results. A query could be: q=table&hl=true&hl.fl=htmlfield&hl.fragsize=0 It would be tokenized with the HTMLStripStandardTokenizerFactory, then analyzed the same way as the searchable fields. Could this result in highlighting inside HTML tags (I mean things like <em>table</em>...<em>table</em>)?
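The idea behind the HTML-stripping tokenizer can be sketched with Python's stdlib html.parser (this only mimics conceptually what HTMLStripStandardTokenizerFactory does before tokenizing; it is not Solr's implementation):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only text nodes, dropping markup -- conceptually what an
    HTML-stripping analyzer does before tokenization."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # called only for text between tags, never for the tags themselves
        self.parts.append(data)

    def text(self):
        return "".join(self.parts)

p = TextExtractor()
p.feed("<table><tr><td>kitchen table</td></tr></table>")
print(p.text().split())  # ['kitchen', 'table'] -- the <table> tag is gone
```

Because tags are removed before analysis, a query for "table" matches only the word in the text, not the tag name; the remaining risk, which is exactly the question above, is whether the highlighter's pre/post markers end up injected inside the stored raw HTML in a way that breaks the markup when displayed.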
Re: Need feedback on solr security
Vijayant Kumar wrote: Hi Group, I need some feedback on solr security. To make my solr admin password protected, I used the Path Based Authentication from http://wiki.apache.org/solr/SolrSecurity. In this way my admin area, search, delete, and add to index are protected. But now that I've made solr authenticated, every update/delete from the front end is blocked without authentication. I do not need this authentication from the front end, so I simply pass the username and password to solr in my front-end scripts and it works fine. I did it in the following way: http://username:passw...@localhost:8983/solr/admin/update I need your suggestions and feedback on the above method. Is it a feasible and secure method? Is there an alternate method to overcome this issue? Hey, there is at least one other solution. You can set a firewall rule that allows connections to Solr's port only from trusted IPs.
Re: Need feedback on solr security
Vijayant Kumar wrote: Hi Xavier, Thanks for your feedback. The firewall rule for trusted IPs is not feasible for us because the application is open to the public, so we cannot work through IP banning. Vijayant Kumar wrote: Hi Group, I need some feedback on solr security. To make my solr admin password protected, I used the Path Based Authentication from http://wiki.apache.org/solr/SolrSecurity. In this way my admin area, search, delete, and add to index are protected. But now that I've made solr authenticated, every update/delete from the front end is blocked without authentication. I do not need this authentication from the front end, so I simply pass the username and password to solr in my front-end scripts and it works fine. I did it in the following way: http://username:passw...@localhost:8983/solr/admin/update I need your suggestions and feedback on the above method. Is it a feasible and secure method? Is there an alternate method to overcome this issue? Hey, there is at least one other solution. You can set a firewall rule that allows connections to Solr's port only from trusted IPs. Do your users connect directly to Solr? I mean, the firewall rule is for the solr client, i.e. the computer that hosts the application that connects to Solr.
Re: Need feedback on solr security
Xavier Schepler wrote: Vijayant Kumar wrote: Hi Xavier, Thanks for your feedback. The firewall rule for trusted IPs is not feasible for us because the application is open to the public, so we cannot work through IP banning. Vijayant Kumar wrote: Hi Group, I need some feedback on solr security. To make my solr admin password protected, I used the Path Based Authentication from http://wiki.apache.org/solr/SolrSecurity. In this way my admin area, search, delete, and add to index are protected. But now that I've made solr authenticated, every update/delete from the front end is blocked without authentication. I do not need this authentication from the front end, so I simply pass the username and password to solr in my front-end scripts and it works fine. I did it in the following way: http://username:passw...@localhost:8983/solr/admin/update I need your suggestions and feedback on the above method. Is it a feasible and secure method? Is there an alternate method to overcome this issue? Hey, there is at least one other solution. You can set a firewall rule that allows connections to Solr's port only from trusted IPs. Do your users connect directly to Solr? I mean, the firewall rule is for the solr client, i.e. the computer that hosts the application that connects to Solr. You could set a firewall that forbids any connection to your Solr server's port from anyone except the computer that hosts the application connecting to Solr. That way, only your application will be able to connect to Solr. This idea comes from the book Solr 1.4 Enterprise Search Server.
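The firewall approach suggested above can be sketched as iptables rules. This is a config fragment under stated assumptions: the IP address 192.0.2.10 stands in for the app server, 8983 is the default Solr/Jetty port, and your distribution may prefer a front-end tool (ufw, firewalld) over raw iptables:

```shell
# Allow only the app server (placeholder IP 192.0.2.10) to reach Solr
# on port 8983; drop everyone else. Order matters: ACCEPT before DROP.
iptables -A INPUT -p tcp --dport 8983 -s 192.0.2.10 -j ACCEPT
iptables -A INPUT -p tcp --dport 8983 -j DROP
```

The public never talks to Solr directly in this setup; end users hit the PHP/web application, and only that machine is allowed through to Solr, which is why the "application is open to the public" objection does not apply to this rule.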
Re: Dynamic fields with more than 100 fields inside
Shalin Shekhar Mangar wrote: On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Hey, I'm thinking about using dynamic fields. I need one or more user-specific fields in my schema, for example concept_user_*, and I will maybe have more than 200 users using this feature. One user will send and retrieve values from its field. It will then be used to filter results. How would it impact query performance? Can you give an example of such a query? Hi, it could be queries such as: allFr: état-unis AND concept_researcher_99 = 303 modalitiesFr: exactement AND questionFr: correspond AND concept_researcher_2 = 101 and faceting like this: q=%2A%3A%2A&fl=variableXMLFr,lang&start=0&rows=10&facet=true&facet.field=concept_researcher_2&facet.field=studyDateAndStudyTitle&facet.sort=lex Thanks in advance, Xavier S.
Re: Dynamic fields with more than 100 fields inside
Shalin Shekhar Mangar wrote: On Tue, Feb 9, 2010 at 2:43 PM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Shalin Shekhar Mangar wrote: On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Hey, I'm thinking about using dynamic fields. I need one or more user-specific fields in my schema, for example concept_user_*, and I will maybe have more than 200 users using this feature. One user will send and retrieve values from its field. It will then be used to filter results. How would it impact query performance? Can you give an example of such a query? Hi, it could be queries such as: allFr: état-unis AND concept_researcher_99 = 303 modalitiesFr: exactement AND questionFr: correspond AND concept_researcher_2 = 101 and faceting like this: q=%2A%3A%2A&fl=variableXMLFr,lang&start=0&rows=10&facet=true&facet.field=concept_researcher_2&facet.field=studyDateAndStudyTitle&facet.sort=lex It doesn't impact query performance any more than filtering on other fields does. Is there a performance problem, or were you just asking generally? I was asking generally; thanks for your response.
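The per-user dynamic field pattern discussed in this thread boils down to deriving a field name from the user id and filtering on it. A minimal sketch of the request construction (plain Python; the field naming follows the concept_researcher_* examples above, and the helper names are hypothetical):

```python
# Sketch: building a per-user filter against a dynamic field such as
# concept_researcher_*, as in the thread above. Naming is illustrative.

def user_field(user_id):
    """Resolve the dynamic field name for one user."""
    return f"concept_researcher_{user_id}"

def build_params(q, user_id, concept_id):
    """Main query plus a user-specific filter query. Using fq (rather than
    ANDing into q) lets Solr cache the filter in its filterCache."""
    return {
        "q": q,
        "fq": f"{user_field(user_id)}:{concept_id}",
    }

print(build_params("modalitiesFr:exactement", 2, 101))
# {'q': 'modalitiesFr:exactement', 'fq': 'concept_researcher_2:101'}
```

This matches Shalin's answer: each user's filter behaves like any other field filter; having 200+ dynamic fields mostly costs index metadata, not per-query time.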
Dynamic fields with more than 100 fields inside
Hey, I'm thinking about using dynamic fields. I need one or more user-specific fields in my schema, for example concept_user_*, and I will maybe have more than 200 users using this feature. One user will send and retrieve values from its field. It will then be used to filter results. How would it impact query performance? Thanks, Xavier S.
Field highlighting
Hi, I'm trying to highlight short text values. The field they come from has a type shared with other fields. I have highlighting working on other fields but not on this one. Why?
Re: Field highlighting
Erick Erickson wrote: It's really hard to provide any response with so little information. Could you show us the difference between a field that works and one that doesn't? Especially the relevant schema.xml entries and the query that fails to highlight. Erick On Thu, Jan 7, 2010 at 7:47 AM, Xavier Schepler xavier.schep...@sciences-po.fr wrote: Hi, I'm trying to highlight short text values. The field they come from has a type shared with other fields. I have highlighting working on other fields but not on this one. Why? Thanks for your response. Here are some extracts from my schema.xml (comments translated from French):

<fieldtype name="textFr" class="solr.TextField">
  <analyzer>
    <!-- remove meaningless stop words -->
    <filter class="solr.StopFilterFactory" words="french-stopwords.txt" ignoreCase="true"/>
    <!-- split into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- remove accents -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- remove dots at the end of acronyms -->
    <filter class="solr.StandardFilterFactory"/>
    <!-- lowercase -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stem with the Porter filter -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <!-- synonyms -->
    <filter class="solr.SynonymFilterFactory" synonyms="test-synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldtype>

Here's a field on which highlighting works:

<field name="questionsLabelsFr" required="false" type="textFr" multiValued="true" indexed="true" stored="true" compressed="false" omitNorms="false" termVectors="true" termPositions="true" termOffsets="true"/>

Here's the field on which it doesn't:

<field name="modalitiesLabelsFr" required="false" type="textFr" multiValued="true" indexed="true" stored="true" compressed="false" omitNorms="false" termVectors="true" termPositions="true" termOffsets="true"/>

They are pretty much the same.
But modalitiesLabelsFr contains mostly short strings like: Côtes-d'Armor, Creuse, Dordogne, Doubs, Drôme, Eure, Eure-et-Loir, Finistère. When matches are found in them, I get a list like this, with no text:

<lst name="highlighting">
  <lst name="dbbd3642-db1d-4b35-9280-11582523903d"/>
  <lst name="f1d8be2d-1070-4111-b16e-94d16c8c0bc6"/>
</lst>

The name attribute is the uid of the document. I tried several values for hl.fragsize (0, 1, 2, ...) with no success at all.