RE: Regarding Copyfield
What is your text_general type definition in schema.xml?

-----Original Message-----
From: anurag.jain [mailto:anurag.k...@gmail.com]
Sent: Tuesday, January 15, 2013 12:16 PM
To: solr-user@lucene.apache.org
Subject: Regarding Copyfield

Hi, in my copyFields I am not storing first_name, last_name, etc., but the dest field "text" is still showing first_name etc. in autosuggestion mode.

My copyFields are:

<copyField source="percentage" dest="text"/>
<copyField source="university_name" dest="text"/>
<copyField source="course_name" dest="text"/>
...

and my fields are:

<field name="id" type="text_general" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="first_name" type="text_general" indexed="false" stored="true"/>
<field name="last_name" type="text_general" indexed="false" stored="true"/>
<field name="date_of_birth" type="text_general" indexed="false" stored="true"/>
<field name="state_name" type="text_general" indexed="false" stored="true"/>
<field name="mobile_no" type="text_general" indexed="false" stored="true"/>
...

I also want to make my own field like "text", named autosuggest, but that is not working for autosuggestion either. Please reply, urgent.

--
View this message in context: http://lucene.472066.n3.nabble.com/Regarding-Copyfield-tp4033385.html
Sent from the Solr - User mailing list archive at Nabble.com.
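For readers hitting the same surprise: this is expected behaviour. copyField copies the source value into dest at index time, so even though first_name is stored but not indexed on its own, its tokens still get indexed in the "text" field and therefore show up in suggestions built on it. A minimal sketch of the interaction (field names taken from the message; the "text" field definition is assumed):

```xml
<!-- first_name itself is not indexed, only stored -->
<field name="first_name" type="text_general" indexed="false" stored="true"/>

<!-- the catch-all field IS indexed -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- this line copies first_name's content into the indexed "text" field -->
<copyField source="first_name" dest="text"/>
```

To keep first_name out of suggestions, remove its copyField line, or point the suggester at a dedicated field that only the wanted sources copy into.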
Multicore configuration
Hi, I'd like to use two separate indexes (Solr 3.6.1). I've read several wiki pages and looked at the multicore example bundled with the distribution, but it seems I am missing something. I have this hierarchy:

solr-home/
|-- conf
|   |-- solr.xml
|   |-- solrconfig.xml (if I don't put it, Solr complains)
|   |-- schema.xml (idem)
|   |-- ...
|-- cores
    |-- dossier
    |   |-- conf
    |   |   |-- dataconfig.xml
    |   |   |-- schema.xml
    |   |   |-- solrconfig.xml
    |   |-- data
    |-- procedure
        |-- conf
        |   |-- dataconfig.xml
        |   |-- schema.xml
        |   |-- solrconfig.xml
        |-- data

Here's the content of my solr.xml file: http://paste.debian.net/224818/

And I launch my servlet container with -Dsolr.solr.home=my-directory/solr-home. I've put nearly nothing in my solr-home/conf/schema.xml so Solr complains, but that's not the point. When I go to the admin of core dossier, http://localhost:8080/solr/dossier/admin, the container says it doesn't exist. But when I go to http://localhost:8080/solr/admin it finds it, which makes me guess that Solr is still in single-core mode. What am I missing?

Regards.
--
Bruno Dusausoy
Software Engineer
YP5 Software
--
Pensez environnement : limitez l'impression de ce mail.
Please don't print this e-mail unless you really need to.
Re: Multicore configuration
Hi Bruno, Maybe this helps. I wrote something about it: http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

Dariusz

On Tue, Jan 15, 2013 at 9:52 AM, Bruno Dusausoy bdusau...@yp5.be wrote:
[quoted message snipped]
Solr Query | Loading documents with large content (Performance)
Hi there, sometimes we have to load very big documents; one or two multi-valued fields of such a document can contain 10,000 items, and unfortunately we need this information. We have to load 50 documents in order to show the result table in the UI. The query takes around 50 seconds, and I guess 48 seconds of that is just transferring the content of the documents over the net. What can I do here?

- I know I can take this long information out of the document, but that is not really a solution either.
- Then I was thinking about compressed fields. They come back with Solr 4.1, right? How do compressed fields work? As I understand it, the stored field will be stored in a compressed way. OK, but when will it be uncompressed: on the server side before sending back to the client, or on the client side? I am using SolrJ.

Any other ideas? Can compressed fields help to increase the query performance?

Thanks a lot for your ideas and answers!

Regards
Uwe
--
Uwe Clement
Software Architect / Project Manager
eXXcellent solutions gmbh, Beim Alten Fritz 2, D-89075 Ulm
e | uwe.clem...@exxcellent.de
m | +49 [0]151-275 692 27
i | http://www.exxcellent.de
Geschäftsführer: Dr. Martina Burgetsmeier, Wilhelm Zorn, Gerhard Gruber
Sitz der Gesellschaft: Ulm, Registergericht: Ulm HRB 4309
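One thing that usually helps in this situation, independent of compression: ask Solr for only the fields the result table actually needs, via the fl parameter, so the two huge multi-valued fields are never shipped over the wire at all. A sketch (field names are hypothetical):

```text
# hypothetical field names: only id, title and score are returned;
# the large multi-valued fields stay on the server
http://localhost:8983/solr/select?q=*:*&rows=50&fl=id,title,score
```

As for compressed stored fields: the compression applies to how the fields sit on disk, and they are decompressed on the server when read, so the gain is mainly in disk space and I/O rather than in bytes transferred to the SolrJ client.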
Re: Multicore configuration
You should put your solr.xml into your 'cores' directory and set -Dsolr.solr.home=cores. That should get you going: 'cores' *is* your Solr home. Otherwise, the instanceDir entries in your current solr.xml will need correct paths to ../cores/procedure/ etc.

Upayavira

On Tue, Jan 15, 2013, at 08:52 AM, Bruno Dusausoy wrote:
[quoted message snipped]
Re: Multicore configuration
Dariusz Borowski wrote: Hi Bruno, Maybe this helps. I wrote something about it: http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

Hi Dariusz,

Thanks for the link. I've found my - terrible - mistake: solr.xml was not in the solr.home dir but in the solr.home/conf dir, so Solr didn't pick it up :-/ It works perfectly now. Sorry for the noise.

Regards.
--
Bruno Dusausoy
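To spell out the fix for anyone finding this thread: solr.xml must sit directly in the directory pointed to by -Dsolr.solr.home, not in its conf/ subdirectory. A minimal Solr 3.x solr.xml matching the layout described earlier in the thread (core names from the message, paths assumed):

```xml
<!-- solr-home/solr.xml, directly under -Dsolr.solr.home -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="dossier" instanceDir="cores/dossier"/>
    <core name="procedure" instanceDir="cores/procedure"/>
  </cores>
</solr>
```

With this in place, http://localhost:8080/solr/dossier/admin resolves to the dossier core instead of falling back to single-core mode.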
Re: Performance issue with group.ngroups=true
Hi, I retried on a better machine (2 CPUs, 8 GB RAM, 1.5 GB for Java, half used according to the admin interface) and still have the same issue. It seems to grow with the match count: with a search matching 100k documents it takes 700 ms, vs 70 ms without ngroups (CPU is at 100% during the request). For information, my index has 1M documents, for 700 MB of data.
RE: Results in same or different fields
Hi Gastone, I am not very sure, but I think a phrase query will solve this problem: q=title:"white house" will always have higher relevance than the terms white and house matching separately.

Regards
Harshvardhan Ojha

-----Original Message-----
From: Gastone Penzo [mailto:gastone.pe...@gmail.com]
Sent: Tuesday, January 15, 2013 2:46 PM
To: solr-user@lucene.apache.org
Subject: Results in same or different fields

Hi, I'm using Solr 4.0 with the edismax search handler. I'm searching inside 3 fields with the same boost. I'd like to have a higher score for results in the same field than for results spread across different fields. E.g. with qf=title,description: if "white house" is found in title, it must score higher than white in the title field and house in the description field. How is it possible? P.S. I set omitTermFreqAndPositions=true for all fields. Thanks

*Gastone Penzo*
DataImportHandlerException: Unable to execute query with OPTIM
I have tried to search for my specific problem but have not found a solution. I have also read the wiki on the DIH and seem to have everything set up right, but my query still fails. Thank you for your help.

I am running Solr 3.6.1 with Tomcat 6.0 on Windows 7 64-bit, with an IBM Optim archive file. I have all jar files sitting in C:\Program Files\Apache Software Foundation\Tomcat 6.0\lib.

My solrconfig.xml has:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

My db-data-config.xml:

<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource" name="SAMPLE_OPTIM_DB"
              driver="com.ibm.optim.connect.jdbc.NvDriver"
              url="jdbc:attconnect://198.168.2.89:2551/NAVIGATOR;DefTdpName=SAMPLE_OPTIM_DB"
              batchSize="-1" user="" password="" readOnly="True"/>
  <document name="headwords">
    <entity name="CUSTOMERS" dataSource="SAMPLE_OPTIM_DB"
            query="SELECT * FROM SAMPLE_OPTIM_DB:CUSTOMERS"
            transformer="RegexTransformer">
      <field column="CUSTNAME" name="CUSTNAME"/>
    </entity>
  </document>
</dataConfig>

I am getting the error below:

WARNING: no uniqueKey specified in schema.
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore init
INFO: [core0] Opening new SolrCore at solr\core0\, dataDir=solr/core0\data\
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore init
INFO: JMX monitoring not detected for core: core0
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore initListeners
INFO: [core0] Added SolrEventListener for newSearcher: org.apache.solr.core.QuerySenderListener{queries=[{q=solr,start=0,rows=10}, {q=rocks,start=0,rows=10}, {q=static newSearcher warming query from solrconfig.xml}]}
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore initListeners
INFO: [core0] Added SolrEventListener for firstSearcher: org.apache.solr.core.QuerySenderListener{queries=[]}
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created standard: solr.StandardRequestHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /dataimport: org.apache.solr.handler.dataimport.DataImportHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /search: org.apache.solr.handler.component.SearchHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /update: solr.XmlUpdateRequestHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@53bc93fe main
Jan 15, 2013 4:05:44 PM org.apache.solr.update.CommitTracker init
INFO: commitTracker AutoCommit: disabled
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.QueryComponent@781fb1fb
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.FacetComponent@68de1359
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.MoreLikeThisComponent@4bc86dd8
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.HighlightComponent@53a3a6c6
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.StatsComponent@1d1a3c10
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding debug component:org.apache.solr.handler.component.DebugComponent@255d4d5d
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting socketTimeout to: 0
... (similar getParameter lines for urlScheme, connTimeout, maxConnectionsPerHost, corePoolSize, maximumPoolSize, maxThreadIdleTime, sizeOfQueue, fairnessPolicy trimmed) ...
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.dataimport.DataImportHandler processConfiguration
INFO: Processing
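One detail worth noting in the log above, separate from the query failure itself: the very first line warns that no uniqueKey is defined. For DIH delta imports and for updating existing documents you normally want one declared in schema.xml, along these lines (the field name "id" here is an assumption; use whichever field uniquely identifies your rows):

```xml
<!-- assumes a field named "id" is defined in the schema -->
<uniqueKey>id</uniqueKey>
```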
Re: Index data from multiple tables into Solr
Get the user's input, form the Solr query, and send a request to the server (you can also pass a parameter called wt (xml, json, etc.) to direct Solr to return output in that format). Parse the results from Solr and display them to the user on your website. Depending on what kind of server-side programming language you are using, there might be libraries available that will allow you to integrate your web application with Solr (for example, sunspot_solr in Ruby).

On Tue, Jan 15, 2013 at 5:24 AM, hassancrowdc hassancrowdc...@gmail.com wrote:
Thanks, I got it. How can I integrate Solr with my website, so that I can use it for search?

On Mon, Jan 14, 2013 at 4:04 PM, Lance Norskog-2 [via Lucene] wrote:
Try all of the links under the collection name in the lower left-hand column. There are several administration and monitoring tools you may find useful.

On 01/14/2013 11:45 AM, hassancrowdc wrote:
OK, the stats are changing, so the data is indexed. But how can I query this data, or how can I search it? Will the command be http://localhost:8983/solr/select?q=(any of my field columns from the table)? Because whatever I put in my URL, it shows me an XML file but numFound is always 0.

On Sat, Jan 12, 2013 at 1:24 PM, Alexandre Rafalovitch [via Lucene] wrote:
Have you tried the Admin interface yet? The one on the :8983 port if you are running the default setup. That has a bunch of different stats you can look at, apart from a nice way of doing a query. I am assuming you are on Solr 4, of course. Regards, Alex.

On Fri, Jan 11, 2013 at 5:13 PM, hassancrowdc wrote:
So, I followed all the steps and Solr is working successfully. Can you please tell me how I can see if my data is indexed or not? Do I have to enter a specific URL into my browser or anything? I want to make sure that the data is indexed.
--
Regards
Naresh
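To make the wt suggestion above concrete, a typical search request from a web application looks like this (host, port, and field name are placeholders for your own setup):

```text
# returns results as JSON instead of the default XML;
# q searches the chosen field, rows limits the number of hits returned
http://localhost:8983/solr/select?q=name:phone&wt=json&rows=10
```

Your server-side code then parses the JSON response and renders the documents in the result page.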
Re: Performance issue with group.ngroups=true
Mickael, I just wonder whether you have considered BlockJoin? It performs much better than query-time approaches (http://blog.griddynamics.com/2012/08/block-join-query-performs.html), but faceting hasn't been implemented for it yet.

On Tue, Jan 15, 2013 at 2:01 PM, Mickael Magniez mickaelmagn...@gmail.com wrote:
[quoted message snipped]

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: access matched token ids in the FacetComponent?
Dmitry, I have some relevant experience and am ready to help, but I cannot quite grasp the core problem. Could you please expand the description and/or provide a sample?

On Tue, Jan 15, 2013 at 11:01 AM, Dmitry Kan solrexp...@gmail.com wrote:
Hello! Is there a simple way of accessing the matched token ids in the FacetComponent? The use case is to text-search on one field and facet on another, and in the facet counts we want to see the text hit counts. Can it be done via some other component / approach? Any input is greatly appreciated. Dmitry

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
Re: SOlr 3.5 and sharding
You're confusing shards and slaves here. Sharding means splitting a logical index amongst N machines, where each machine contains a portion of the index. In that setup, you have to configure the slaves to know about the other shards, and the incoming query has to be distributed amongst all the shards to find all the docs.

In your case, since you're really replicating (rather than sharding), you only have to query _one_ slave; the query doesn't need to be distributed. So pulling all the sharding stuff out of your config files, putting a load balancer in front of your slaves, and sending each request to only one of them would be the place I'd start.

Also, don't be at all surprised if the number of hits from the _master_ (which you shouldn't be searching, BTW) is different from the slaves; there's the polling interval to consider.

Best
Erick

On Mon, Jan 14, 2013 at 9:58 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote:
Hi, I'm setting up a small Solr setup consisting of 1 master node and 4 shards. For now, all four shards contain the exact same data. When I perform a query on each individual shard for the word 'java' I receive the same number of docs (as expected). However, when I go through the master node using the shards parameter, the number of results is slightly off by a few documents. There is nothing special in my setup, so I'm looking for hints on why I am getting this problem. Thanks
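For reference, the distributed query Erick describes is what the shards parameter triggers in Solr 3.x; it is only appropriate when each machine holds a different slice of the index (host names below are placeholders):

```text
# fans the query out to both shards and merges the results;
# each shard is expected to hold a distinct portion of the documents
http://localhost:8983/solr/select?q=java&shards=shard1:8983/solr,shard2:8983/solr
```

When every box holds a full copy of the index, this parameter should be dropped and each query sent to a single replica instead.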
Re: SolrCloud :: Adding replica :: Sync-up issue
Trying again; the original reply was rejected as spam.

This won't be all that helpful, but 4.1 has a lot of improvements as far as SolrCloud is concerned, and it's in the process of being put together now. So I suspect the best use of time would be to work with 4.1 (or a nightly build between now and then, or a build off the 4.1 branch) and report if the issue is still there.

As I said, not much help, but...

Best,
Erick
Error loading plugin
Hi, I'm trying to write my own search handler, but I have a problem loading it into Solr. The error message is:

Caused by: org.apache.solr.common.SolrException: Error loading class 'com.company.solr.GroupRequestHandler'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:438)
        ... 14 more
Caused by: java.lang.ClassNotFoundException: com.company.solr.GroupRequestHandler

The .jar file is loaded at startup:

INFO: Adding 'file:/home/solr/solr/apache-solr-4.1-2013-01-10_05-50-28/company/solr/lib/GroupRequestHandler.jar' to classloader

My jar seems correct: it contains one file, com/company/solr/GroupRequestHandler.class.

Any idea?

Mickael
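One thing worth checking in a case like this: make sure the jar's directory is the one the core's solrconfig.xml actually declares, since the classloader line in the log only shows one possible source. A lib directive in solrconfig.xml can point at the plugin jar explicitly (the dir path below is an assumption based on the path in the log; adjust it relative to your core's instanceDir):

```xml
<!-- in solrconfig.xml; dir is resolved relative to the core's instanceDir -->
<lib dir="../../company/solr/lib" regex=".*\.jar"/>
```

If the class still cannot be found even though the jar is listed, a missing transitive dependency of GroupRequestHandler inside the same directory is another common culprit.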
Re: retrieving latest document **only**
The sum of all the counts in the groups… does not match the total number of docs found.

./zahoor

On 12-Jan-2013, at 1:27 PM, Upayavira u...@odoko.co.uk wrote:
Not sure exactly what you mean, can you give an example? Upayavira

On Sat, Jan 12, 2013, at 06:32 AM, J Mohamed Zahoor wrote:
Cool… it worked… But the count of all the groups and the count inside the stats component do not match… Is that a bug? ./zahoor

On 11-Jan-2013, at 6:48 PM, Upayavira u...@odoko.co.uk wrote:
Could you use field collapsing? Boost by date and only show one value per group, and you'll have the most recent document only. Upayavira

On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote:
One crude way is to first query and pick the latest date from the result, then issue a query with q=timestamp:[latestDate TO latestDate]. But I don't want to execute two queries... ./zahoor

On 11-Jan-2013, at 6:37 PM, jmozah jmo...@gmail.com wrote:
What do you want? 'The most recent ones' or '**only** the latest'? Perhaps a range query q=timestamp:[refdate TO NOW] will match your needs. Uwe

I need **only** the latest documents... In the above query, refdate can vary based on the query. ./zahoor
Re: retrieving latest document **only**
Is your group field multivalued? Could docs appear in more than one group?

Upayavira

On Tue, Jan 15, 2013, at 01:22 PM, J Mohamed Zahoor wrote:
The sum of all the count in the groups… does not match the total no of docs found. ./zahoor
[earlier quoted messages snipped]
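The field-collapsing approach suggested earlier in the thread boils down to parameters along these lines (the group and timestamp field names are placeholders):

```text
# one newest document per group; "item_id" and "timestamp" are assumed field names
q=*:*&group=true&group.field=item_id&group.limit=1&group.sort=timestamp desc&group.ngroups=true
```

Note that result grouping expects a single-valued group field; if documents can fall into more than one group, the group counts and the overall numFound are not expected to line up.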
RE: SOlr 3.5 and sharding
Hi Erick,

Thanks for your comments, but I am migrating an existing index (single instance) to a sharded setup, and currently I have no access to the code involved in the indexing process. That's why I made a simple copy of the index on each shard. In the end, the data will be distributed among all shards. I was just curious to know why I did not have the expected number of documents with my four shards.

Can you elaborate on this polling interval thing? I am pretty sure I have never heard of this...

Regards

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: January-15-13 8:00 AM
To: solr-user@lucene.apache.org
Subject: Re: SOlr 3.5 and sharding
[quoted message snipped]
Tutorial for Solr query language, dismax and edismax?
Does anyone have a great tutorial for learning the Solr query language, dismax, and edismax? I've searched endlessly for one, but I haven't been able to locate one that is comprehensive enough and has a lot of examples (that actually work!). I also tried to use wildcards, logical operators, and a phrase search, and it either didn't work or didn't behave the way I thought it would.

For example, I tried to search a multivalued field solr.title and a content field that contains a phone number (and a lot of other data). So, from the Solr admin query page, in the q field I tried lots of variations of this:

solr.title:*Costa, Julie* AND content:tel=

And I either got 0 results or ALL the results. solr.title would only work if I put in solr.title:*Costa*, but not anything longer than that, even though there are plenty of Costa, J's (John, Julie, Julia, Jerry etc). I should be able to do a phrase search out of the box, shouldn't I? I also read on one site that only edismax can use logical operators, but I couldn't get that to work either.

Can anyone point me in the right direction? I'm currently using Solr 4.0 Final with ManifoldCF v1.2-dev.

Thank you,
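For the concrete searches in the question: a phrase search uses double quotes, not asterisks, and a wildcard cannot be embedded inside a phrase. Assuming the field names from the message, Lucene-syntax versions would look like this (whether the phrase matches also depends on how the fields are analyzed):

```text
# phrase query on the title field, combined with a trailing-wildcard term
solr.title:"Costa, Julie" AND content:tel*
```

Leading-wildcard queries like *Costa* force a scan over the term dictionary and can be very slow, which is another reason to prefer quoted phrases or trailing wildcards where possible.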
Re: SOlr 3.5 and sharding
He was referring to a master/slave setup, where a slave will poll the master periodically asking for index updates. That frequency is configured in solrconfig.xml on the slave.

So, you are saying that you have, say, 1m documents in your master index. You then copy your index to four other boxes. At that point you have 1m documents on each of those four. Eventually, you'll delete some docs, so you'd have 250k on each. Are you wondering why, before the deletes, you're not seeing 1m docs on each of your instances? Or are you wondering why you're not seeing 1m docs when you do a distributed query across all four of these boxes? Is that correct?

Upayavira

On Tue, Jan 15, 2013, at 02:11 PM, Jean-Sebastien Vachon wrote:
[quoted messages snipped]
Re: how to optimize same query with different start values
You are setting yourself up for disaster. If you ask Solr for documents 1000 to 1010, it needs to sort documents 1 to 1010 and discard the first 1000, which causes horrible performance. I'm curious to hear if others have strategies to extract content sequentially from an index; I suspect a new SearchComponent could really help here.

I suspect it would work better if you don't sort at all, in which case you'll return the documents in index order. The issue is that a commit or a background merge could change the index order, which would mess up your export. Sorry, no clearer answers.

Upayavira

On Tue, Jan 15, 2013, at 02:07 PM, elisabeth benoit wrote:
Hello, I have a Solr instance (Solr 3.6.1) with around 3,000,000 documents. I want to read (in a Java test application) all my documents, but not in one shot (because it takes too much memory). So I send the same request over and over, with q=*:*, rows=1000, and sort=id desc (to be sure I always get the same ordering), with the start parameter increased by 1000 at each iteration. Checking the Solr logs, I realized that the query response time increases as the start parameter gets bigger. For instance: with start=500 000 it takes about 500 ms; with start=1 100 000 and 1 200 000 it takes between 5000 and 5200 ms; with start=1 250 000 and 1 320 000 it takes between 6100 and 6400 ms. Does someone have an idea how to optimize this query? Thanks, Elisabeth
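One workaround for the deep-paging cost, since the export already sorts on id: instead of increasing start, keep start=0 and filter on the last id seen in the previous batch, so Solr never has to collect and discard the earlier rows. A sketch (parameter values are illustrative, and it assumes id comparisons are consistent with the sort order):

```text
# first batch
q=*:*&rows=1000&sort=id desc&start=0

# subsequent batches: restrict to ids strictly below the last one returned,
# always with start=0
q=*:*&rows=1000&sort=id desc&start=0&fq=id:{* TO LAST_ID_SEEN}
```

Each request then only has to find the top 1000 matching documents, so response time stays flat across the whole export.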
Re: Results in same or different fields
Hi, maybe it helps to have a closer look at the other params of edismax: http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29 'mm=2' would be too strong, but the usage of pf, pf2, and pf3 is likely your solution. uwe On 15.01.2013 10:15, Gastone Penzo wrote: Hi, I'm using Solr 4.0 with the edismax search handler. I'm searching inside 3 fields with the same boost. I'd like to have a higher score for results in the same field than for results spread across different fields. E.g. with qf=title,description: if "white house" is found in title, it must have a higher score than "white" in the title field and "house" in the description field. How is it possible? P.S. I set omitTermFreqAndPositions=true for all fields. Thanx *Gastone Penzo*
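A sketch of the kind of request Uwe is pointing at, using the field names from Gastone's example. The boost weights are illustrative guesses, and note that the phrase fields rely on term positions, so omitTermFreqAndPositions=true on those fields would defeat them:

```
q=white house
defType=edismax
qf=title description
pf=title^5 description^2      # boost docs where the whole phrase sits in one field
pf2=title^3 description^1.5   # boost adjacent word pairs found in one field
```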
RE: SOlr 3.5 and sharding
Ok, I see what Erick meant now. Thanks. The original index I'm working on contains about 120k documents. Since I have no access to the code that pushes documents into the index, I made four copies of the same index. The master node contains no data at all; it simply uses the data available in its four shards. Knowing that I have 1000 documents matching the keyword java on each shard, I was expecting to receive 4000 documents out of my sharded setup. There are only a few documents that are not accounted for (the result count is about 3996, which is pretty close but not accurate). Right now the index is static, so there is no need for any replication and the polling interval has no effect. Later this week I will configure the replication and have the indexation modified to distribute the documents to each shard using a simple ID modulo 4 rule. Were my expectations wrong about the number of documents? -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: January-15-13 9:21 AM To: solr-user@lucene.apache.org Subject: Re: SOlr 3.5 and sharding He was referring to a master/slave setup, where a slave will poll the master periodically asking for index updates. That frequency is configured in solrconfig.xml on the slave. So, you are saying that you have, say, 1m documents in your master index. You then copy your index to four other boxes. At that point you have 1m documents on each of those four. Eventually you'll delete some docs, so you'd have 250k on each. You're wondering why, before the deletes, you're not seeing 1m docs on each of your instances. Or are you wondering why you're not seeing 1m docs when you do a distributed query across all four of these boxes? Is that correct? Upayavira On Tue, Jan 15, 2013, at 02:11 PM, Jean-Sebastien Vachon wrote: Hi Erick, Thanks for your comments, but I am migrating an existing index (single instance) to a sharded setup and currently I have no access to the code involved in the indexation process.
That's why I made a simple copy of the index on each shard. In the end, the data will be distributed among all shards. I was just curious to know why I did not get the expected number of documents with my four shards. Can you elaborate on this polling interval thing? I am pretty sure I have never heard about this... Regards -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: January-15-13 8:00 AM To: solr-user@lucene.apache.org Subject: Re: SOlr 3.5 and sharding You're confusing shards and slaves here. Sharding is splitting a logical index amongst N machines, where each machine contains a portion of the index. In that setup, you have to configure the slaves to know about the other shards, and the incoming query has to be distributed amongst all the shards to find all the docs. In your case, since you're really replicating (rather than sharding), you only have to query _one_ slave; the query doesn't need to be distributed. So pulling all the sharding stuff out of your config files, putting a load balancer in front of your slaves and only sending the request to one of them would be the place I'd start. Also, don't be at all surprised if the number of hits from the _master_ (which you shouldn't be searching, BTW) is different from the slaves'; there's the polling interval to consider. Best Erick On Mon, Jan 14, 2013 at 9:58 AM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: Hi, I'm setting up a small Solr setup consisting of 1 master node and 4 shards. For now, all four shards contain the exact same data. When I perform a query on each individual shard for the word `java` I receive the same number of docs (as expected). However, when I go through the master node using the shards parameter, the number of results is off by a few documents. There is nothing special in my setup, so I'm looking for hints on why I am getting this problem. Thanks
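The "simple ID modulo 4 rule" mentioned above can be sketched as follows. Once each document lives on exactly one shard, a distributed count adds up exactly; with identical copies on every shard, the distributed merge collapses duplicate unique keys, which is a plausible source of the 3996-vs-4000 gap (an inference, not something confirmed in the thread):

```python
# Route each document id to exactly one of four shards.
NUM_SHARDS = 4

def shard_for(doc_id):
    return doc_id % NUM_SHARDS

# Simulate distributing the ~120k-document index.
shards = {s: [] for s in range(NUM_SHARDS)}
for doc_id in range(120_000):
    shards[shard_for(doc_id)].append(doc_id)

# Every doc lands on exactly one shard, so a distributed query
# counts each match exactly once.
total = sum(len(docs) for docs in shards.values())
```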
Re: how to optimize same query with different start values
It's a well-known search engine limitation. This post will help you get into the core problem: http://www.searchworkings.org/blog/-/blogs/lucene-solr-and-deep-paging . It seems that a solution has been contributed to Lucene, but not yet to Solr. On Tue, Jan 15, 2013 at 6:36 PM, Upayavira u...@odoko.co.uk wrote: You are setting yourself up for disaster. If you ask Solr for documents 1000 to 1010, it needs to sort documents 1 to 1010 and discard the first 1000, which causes horrible performance. I'm curious to hear if others have strategies to extract content sequentially from an index. I suspect a new SearchComponent could really help here. I suspect it would work better if you don't sort at all, in which case you'll return the documents in index order. The issue is that a commit or a background merge could change index order, which would mess up your export. Sorry, no clearer answers. Upayavira On Tue, Jan 15, 2013, at 02:07 PM, elisabeth benoit wrote: Hello, I have a Solr instance (Solr 3.6.1) with around 3,000,000 documents. I want to read (in a Java test application) all my documents, but not in one shot (because it takes too much memory). So I send the same request over and over, with q=*:* rows=1000 sort=id desc (to be sure I always get the same ordering) and the start parameter increased by 1000 at each iteration. Checking the Solr logs, I realized that the query response time increases as the start parameter gets bigger: for instance, with start=500,000 it takes about 500 ms; with start=1,100,000 and 1,200,000 it takes between 5000 and 5200 ms; with start=1,250,000 and 1,320,000 it takes between 6100 and 6400 ms. Does someone have an idea how to optimize this query? Thanks, Elisabeth -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
RE: DataImportHandlerException: Unable to execute query with OPTIM
I think your JDBC driver is complaining because it doesn't like what is being set for the fetch size on the Statement. Fetch size is controlled by the batchSize parameter on <dataSource/>. Using batchSize=-1 is, I believe, a workaround for MySQL, but I suspect your driver requires it to be 0 (or at least not -1). If you omit batchSize entirely, DIH sets it to 500 as a default. Also, setting it to -1 causes DIH to change this to Integer.MIN_VALUE. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: ashimbose [mailto:ashimb...@gmail.com] Sent: Tuesday, January 15, 2013 4:48 AM To: solr-user@lucene.apache.org Subject: DataImportHandlerException: Unable to execute query with OPTIM I have tried to search for my specific problem but have not found a solution. I have also read the wiki on the DIH and seem to have everything set up right, but my query still fails. Thank you for your help. I am running Solr 3.6.1 with Tomcat 6.0, Windows 7 64-bit and an IBM Optim archive file. I have all jar files sitting in C:\Program Files\Apache Software Foundation\Tomcat 6.0\lib. My solrconfig.xml is:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

My db-data-config.xml:

<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource" name="SAMPLE_OPTIM_DB"
              driver="com.ibm.optim.connect.jdbc.NvDriver"
              url="jdbc:attconnect://198.168.2.89:2551/NAVIGATOR;DefTdpName=SAMPLE_OPTIM_DB"
              batchSize="-1" user="" password="" readOnly="True"/>
  <document name="headwords">
    <entity name="CUSTOMERS" dataSource="SAMPLE_OPTIM_DB"
            query="SELECT * FROM SAMPLE_OPTIM_DB:CUSTOMERS"
            transformer="RegexTransformer">
      <field column="CUSTNAME" name="CUSTNAME"/>
    </entity>
  </document>
</dataConfig>

I am having the below error: WARNING: no uniqueKey specified in schema.
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore init
INFO: [core0] Opening new SolrCore at solr\core0\, dataDir=solr/core0\data\
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore init
INFO: JMX monitoring not detected for core: core0
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore initListeners
INFO: [core0] Added SolrEventListener for newSearcher: org.apache.solr.core.QuerySenderListener{queries=[{q=solr,start=0,rows=10}, {q=rocks,start=0,rows=10}, {q=static newSearcher warming query from solrconfig.xml}]}
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore initListeners
INFO: [core0] Added SolrEventListener for firstSearcher: org.apache.solr.core.QuerySenderListener{queries=[]}
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created standard: solr.StandardRequestHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /dataimport: org.apache.solr.handler.dataimport.DataImportHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /search: org.apache.solr.handler.component.SearchHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /update: solr.XmlUpdateRequestHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@53bc93fe main
Jan 15, 2013 4:05:44 PM org.apache.solr.update.CommitTracker init
INFO: commitTracker AutoCommit: disabled
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.QueryComponent@781fb1fb
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.FacetComponent@68de1359
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.MoreLikeThisComponent@4bc86dd8
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.HighlightComponent@53a3a6c6
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.StatsComponent@1d1a3c10
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding debug component:org.apache.solr.handler.component.DebugComponent@255d4d5d
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting socketTimeout to: 0
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting urlScheme to: http://
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting connTimeout to: 0
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting maxConnectionsPerHost to: 20
Jan 15, 2013 4:05:44 PM
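Following James's diagnosis, a hedged variant of the dataSource element with batchSize set to 0, so that DIH does not turn it into Integer.MIN_VALUE on the Statement. Whether the Optim JDBC driver actually accepts a fetch size of 0 is an assumption to verify:

```xml
<!-- batchSize="0" instead of "-1": avoids the Integer.MIN_VALUE fetch size -->
<dataSource type="JdbcDataSource"
            name="SAMPLE_OPTIM_DB"
            driver="com.ibm.optim.connect.jdbc.NvDriver"
            url="jdbc:attconnect://198.168.2.89:2551/NAVIGATOR;DefTdpName=SAMPLE_OPTIM_DB"
            batchSize="0"
            readOnly="True"/>
```

Omitting batchSize entirely (default 500) is the other variant worth trying.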
RE: Disabling document cache usage
No, SolrIndexSearcher has no mechanism to do that. The only way is to disable the cache altogether or patch it up :) -Original message- From:Otis Gospodnetic otis.gospodne...@gmail.com Sent: Tue 15-Jan-2013 16:57 To: solr-user@lucene.apache.org Subject: Disabling document cache usage Hi, https://issues.apache.org/jira/browse/SOLR-2429 added the ability to disable filter and query caches on a request by request basis. Is there anything one can use to disable usage of (lookups and insertion into) document cache? Thanks, Otis -- Solr ElasticSearch Support http://sematext.com/
Re: Disabling document cache usage
Hi, Thanks Markus. How are caches disabled these days... in Solr 4.0 that is? I remember trying to comment them out in the past, but seeing them still enabled and used with some custom size and other settings. Thanks, Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Jan 15, 2013 at 11:00 AM, Markus Jelsma markus.jel...@openindex.iowrote: No, SolrIndexSearcher has no mechanism to do that. The only way is to disable the cache altogether or patch it up :) -Original message- From:Otis Gospodnetic otis.gospodne...@gmail.com Sent: Tue 15-Jan-2013 16:57 To: solr-user@lucene.apache.org Subject: Disabling document cache usage Hi, https://issues.apache.org/jira/browse/SOLR-2429 added the ability to disable filter and query caches on a request by request basis. Is there anything one can use to disable usage of (lookups and insertion into) document cache? Thanks, Otis -- Solr ElasticSearch Support http://sematext.com/
V 4.0.0.0 insert
I don't understand how to add data to a document. I created a core named test_core in version 4.0.0. I can read data via solr/test_core/select, but insert does not work. How do I add data?
Re: Search across a specified number of boundaries
Mikhail, Yeah, I considered that originally, but after analyzing the data noticed that it was not possible. Some of the content we analyze contains large tables that, after OCR, get turned into long-running sentences which contain 500k+ words per sentence. Overall there are probably around 10k of those anomalies that stop the ranges from working, as we run out of positions given the max value an integer can contain, and we run the risk of a future document breaking it. I found a JIRA issue on what I'm looking for. Going to look into it and see if I can get it to work for my situation: https://issues.apache.org/jira/browse/LUCENE-777 Thanks for the help. Mike On Mon, Jan 14, 2013 at 11:48 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Mike, When Lucene's Analyzer indexes the text, it adds positions into the index which are later used by SpanQueries. Have you considered the idea of a position increment gap? E.g. the first sentence is indexed with word positions 0,1,2,3,..., the second sentence with 100,101,102,103,..., the third 200,201,202,... Then applying a span constraint allows you to search across/inside of the sentences. WDYT? On Sun, Jan 6, 2013 at 6:50 PM, Erick Erickson erickerick...@gmail.com wrote: Mike: I'm _really_ stretching here, but you might be able to do something interesting with payloads. Say each word had a payload with the sentence number and you _somehow_ made use of that information in a custom scorer. But like I said, I really have no good idea how to accomplish that... BTW, in future this kind of question is better asked on the user's list (either Lucene or Solr); this list is intended for discussing development work. Best Erick On Fri, Jan 4, 2013 at 1:02 PM, Mike Ree mike.ad...@olytech.net wrote: d terms that are in nearby sentences. IE: TermA NEAR3 TermB would find all TermA's that are within 3 sentences of TermB.
Have found ways to find TermA within same sentence -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
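Mikhail's position-increment-gap idea can be sketched outside Lucene. Each sentence starts at sentence_index * GAP, so "within N sentences" becomes integer division on positions. This is also exactly where Mike's 500k-word OCR sentences break the scheme: a sentence longer than GAP spills into the next slot, and gaps large enough to hold them exhaust the 32-bit position space.

```python
# Simulate position-increment-gap indexing and a sentence-distance check.
GAP = 100  # must exceed the longest sentence for the scheme to be sound

def index_positions(sentences):
    positions = {}  # term -> list of positions
    for s_idx, sentence in enumerate(sentences):
        for w_idx, term in enumerate(sentence.split()):
            positions.setdefault(term.lower(), []).append(s_idx * GAP + w_idx)
    return positions

def within_sentences(positions, term_a, term_b, n):
    # True if some occurrence of term_a is within n sentences of term_b
    # (the "TermA NEAR3 TermB" semantics from the original question).
    return any(abs(pa // GAP - pb // GAP) <= n
               for pa in positions.get(term_a, [])
               for pb in positions.get(term_b, []))

pos = index_positions(["java is fun", "python too", "java again", "scala here"])
```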
Re: how to optimize same query with different start values
It looks like a use case for SolrJ with queryAndStreamResponse? http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/client/solrj/SolrServer.html#queryAndStreamResponse%28org.apache.solr.common.params.SolrParams,%20org.apache.solr.client.solrj.StreamingResponseCallback%29 André On 01/15/2013 04:49 PM, Mikhail Khludnev wrote: It's a well-known search engine limitation. This post will help you get into the core problem: http://www.searchworkings.org/blog/-/blogs/lucene-solr-and-deep-paging . It seems that a solution has been contributed to Lucene, but not yet to Solr. On Tue, Jan 15, 2013 at 6:36 PM, Upayavira u...@odoko.co.uk wrote: You are setting yourself up for disaster. If you ask Solr for documents 1000 to 1010, it needs to sort documents 1 to 1010 and discard the first 1000, which causes horrible performance. I'm curious to hear if others have strategies to extract content sequentially from an index. I suspect a new SearchComponent could really help here. I suspect it would work better if you don't sort at all, in which case you'll return the documents in index order. The issue is that a commit or a background merge could change index order, which would mess up your export. Sorry, no clearer answers. Upayavira On Tue, Jan 15, 2013, at 02:07 PM, elisabeth benoit wrote: Hello, I have a Solr instance (Solr 3.6.1) with around 3,000,000 documents. I want to read (in a Java test application) all my documents, but not in one shot (because it takes too much memory).
So I send the same request over and over, with q=*:* rows=1000 sort=id desc (to be sure I always get the same ordering) and the start parameter increased by 1000 at each iteration. Checking the Solr logs, I realized that the query response time increases as the start parameter gets bigger: for instance, with start=500,000 it takes about 500 ms; with start=1,100,000 and 1,200,000 it takes between 5000 and 5200 ms; with start=1,250,000 and 1,320,000 it takes between 6100 and 6400 ms. Does someone have an idea how to optimize this query? Thanks, Elisabeth -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/ Kelkoo SAS Société par Actions Simplifiée Au capital de € 4.168.964,30 Siège social : 8, rue du Sentier 75002 Paris 425 093 069 RCS Paris This message and its attachments are confidential and intended exclusively for their addressees. If you are not the intended recipient, please delete it and notify the sender.
Re: Index data from multiple tables into Solr
Hi, once I have indexed data from multiple tables from a MySQL database into Solr, is there any way that it updates the data (automatically) if any change is made to the data in MySQL? On Tue, Jan 15, 2013 at 6:13 AM, Naresh [via Lucene] ml-node+s472066n403343...@n3.nabble.com wrote: Get the user's input, form the Solr query and send a request to the server (you can also pass a parameter called wt (xml, json etc.) to direct Solr to return output in that format). Parse the results from Solr and display them to the user on your website. Depending on what kind of server-side programming language you are using, there might be some libraries available that will allow you to integrate your web application with Solr (for example: sunspot_solr in Ruby). On Tue, Jan 15, 2013 at 5:24 AM, hassancrowdc [hidden email] wrote: Thanks, I got it. How can I integrate Solr with my website, so that I can use it for search? On Mon, Jan 14, 2013 at 4:04 PM, Lance Norskog-2 [via Lucene] [hidden email] wrote: Try all of the links under the collection name in the lower left-hand column. There are several administration/monitoring tools you may find useful. On 01/14/2013 11:45 AM, hassancrowdc wrote: OK, stats are changing, so the data is indexed. But how can I query this data, or how can I search it? Like, will the command be http://localhost:8983/solr/select?q=(any of my field columns from a table)? Because whatever I put in my URL, it shows me an XML file but numFound is always 0. On Sat, Jan 12, 2013 at 1:24 PM, Alexandre Rafalovitch [via Lucene] [hidden email] wrote: Have you tried the Admin interface yet? The one on the :8983 port if you are running the default setup. That has a bunch of different stats you can look at, apart from a nice way of doing a query. I am assuming you are on Solr 4, of course. Regards, Alex.
On Fri, Jan 11, 2013 at 5:13 PM, hassancrowdc [hidden email] wrote: So, I followed all the steps and Solr is working successfully. Can you please tell me how I can see if my data is indexed or not? Do I have to enter a specific URL into my browser or anything? I want to make sure that the data is indexed. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) -- View this message in context: http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033268.html Sent from the Solr - User mailing list archive at Nabble.com.
-- View this message in context: http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033296.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards Naresh
Re: Tutorial for Solr query language, dismax and edismax?
You should not need to use wildcards. Most configurations of Solr will index space-separated words as separate tokens. They can be matched separately. Did you use a string field type (probably the wrong choice)? How are your fields tokenized? Solr/Lucene query syntax: http://wiki.apache.org/solr/SolrQuerySyntax http://lucene.apache.org/core/3_6_0/queryparsersyntax.html The analysis page in the admin UI is your friend here. You can put in text for the index and the query, choose a field type, and see how it is tokenized and matched. wunder On Jan 15, 2013, at 6:14 AM, eShard wrote: Does anyone have a great tutorial for learning the Solr query language, dismax and edismax? I've searched endlessly for one but I haven't been able to locate one that is comprehensive enough and has a lot of examples (that actually work!). I also tried to use wildcards, logical operators, and a phrase search, and it either didn't work or didn't behave the way I thought it would. For example, I tried to search a multivalued field solr.title and a content field that contains their phone number (and a lot of other data). So, from the Solr admin query page, in the q field I tried lots of variations of this: solr.title:*Costa, Julie* AND content:tel= And I either got 0 results or ALL the results. solr.title would only work if I put in solr.title:*Costa* but not anything longer than that, even though there are plenty of Costa, J's (John, Julie, Julia, Jerry etc.). I should be able to do a phrase search out of the box, shouldn't I? I also read on one site that only edismax can use logical operators, but I couldn't get that to work either. Can anyone point me in the right direction? I'm currently using Solr 4.0 Final with ManifoldCF v1.2 dev. Thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/Tutorial-for-Solr-query-language-dismax-and-edismax-tp4033465.html Sent from the Solr - User mailing list archive at Nabble.com.
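For the concrete query in eShard's message, the phrase syntax wunder is pointing at would look roughly like the following. Whether content:"tel=" matches anything depends entirely on how the analyzer tokenizes "tel=" and the phone number, so check each clause on the analysis page first:

```
solr.title:"Costa, Julie"                      # phrase query: quotes, not wildcards
solr.title:"Costa, Julie" AND content:"tel="   # boolean combination of two clauses
solr.title:Costa*                              # trailing wildcard (leading wildcards are costly)
```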
Re: Suggestion that preserve original phrase case
Thanks Erick, can you tell me how to do the appending (lowercaseversion:LowerCaseVersion) before indexing? I tried pattern factory filters, but I could not get it right. On Sun, Jan 13, 2013 at 8:49 PM, Erick Erickson erickerick...@gmail.com wrote: One way I've seen this done is to index pairs like lowercaseversion:LowerCaseVersion. You can't push this whole thing through your field as defined, since it'll all be lowercased; you have to produce the left-hand side of the above yourself and just use KeywordTokenizer without LowercaseFilter. Then, your application displays the right-hand side of the returned token. Simple solution, not very elegant, but sometimes the easiest... Best Erick On Fri, Jan 11, 2013 at 1:30 AM, Selvam s.selvams...@gmail.com wrote: Hi, I have been trying to figure out a way for case-insensitive suggestion which should return the original phrase as the result. I am using Solr 3.5. For example: if I index 'Hello world' and search for 'hello', it needs to return 'Hello world', not 'hello world'. My configurations are as follows.

New field type:

<fieldType class="solr.TextField" name="text_auto">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Field values:

<field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
<field name="label_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<copyField source="label" dest="label_autocomplete"/>

Spellcheck component:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_auto</str>
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="buildOnOptimize">true</str>
    <str name="buildOnCommit">true</str>
    <str name="field">label_autocomplete</str>
  </lst>
</searchComponent>

Kindly share your suggestions to implement this behavior.
-- Regards, Selvam KnackForge http://knackforge.com Acquia Service Partner No. 1, 12th Line, K.K. Road, Venkatapuram, Ambattur, Chennai, Tamil Nadu, India. PIN - 600 053.
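Erick's pairing can be produced client-side before indexing, which may be simpler than fighting pattern filter factories. A minimal sketch (field handling, and escaping of ':' inside real phrases, are left out):

```python
# Build "lowercaseversion:OriginalVersion" tokens before indexing.
# The left side is what KeywordTokenizer matches case-insensitively;
# the application splits off and displays the right side.

def to_pair(phrase):
    return f"{phrase.lower()}:{phrase}"

def display_form(token):
    # What the application shows from a returned suggestion token.
    return token.split(":", 1)[1]

pair = to_pair("Hello world")
```

A search for "hello" then matches the lowercase prefix of the token, while the user sees "Hello world" with its original casing.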
RE: Disabling document cache usage
Hi, Commenting them out works fine. We don't use documentCaches either as they eat too much and return only so little. Cheers -Original message- From:Otis Gospodnetic otis.gospodne...@gmail.com Sent: Tue 15-Jan-2013 17:29 To: solr-user@lucene.apache.org Subject: Re: Disabling document cache usage Hi, Thanks Markus. How are caches disabled these days... in Solr 4.0 that is? I remember trying to comment them out in the past, but seeing them still enabled and used with some custom size and other settings. Thanks, Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Jan 15, 2013 at 11:00 AM, Markus Jelsma markus.jel...@openindex.iowrote: No, SolrIndexSearcher has no mechanism to do that. The only way is to disable the cache altogether or patch it up :) -Original message- From:Otis Gospodnetic otis.gospodne...@gmail.com Sent: Tue 15-Jan-2013 16:57 To: solr-user@lucene.apache.org Subject: Disabling document cache usage Hi, https://issues.apache.org/jira/browse/SOLR-2429 added the ability to disable filter and query caches on a request by request basis. Is there anything one can use to disable usage of (lookups and insertion into) document cache? Thanks, Otis -- Solr ElasticSearch Support http://sematext.com/
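For reference, the "commenting out" Markus describes is just removing the cache elements from the <query> section of solrconfig.xml; a sketch, with the sizes shown being typical example values rather than recommendations:

```xml
<query>
  <!-- documentCache disabled by commenting it out:
  <documentCache class="solr.LRUCache" size="512"
                 initialSize="512" autowarmCount="0"/>
  -->
</query>
```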
Re: V 4.0.0.0 insert
Have you gone through the tutorial on the wiki first? It should cover basic use cases. If you have, how do you send the data in? Regards, Alex On 15 Jan 2013 11:22, Николай Измаилов bob...@mail.ru wrote: I don't understand how to add data into the document. I created a core in version 4.0.0 test_core I can read the data on solr/test_core/select and insert does not work. How to add data?
Re: Solr Query | Loading documents with large content (Performance)
Hi, Have a look at http://wiki.apache.org/solr/UpdateCSV#Methods_of_uploading_CSV_records about uploading a *local* file. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Jan 15, 2013 at 3:59 AM, Uwe Clement uwe.clem...@exxcellent.de wrote: Hi there, sometimes we have to load very big documents; 1-2 multi-value fields of them can contain 10,000 items, and unfortunately we need this information. We have to load 50 documents in order to populate the result table in the UI. The query takes around 50 seconds. I guess 48 seconds of it is just to transfer the content of the documents over the net. What can I do here? - I know I can move this long information out of the document, but this is also not really a solution. - Then I was thinking about compressed fields. They come back with Solr 4.1, right? How is it with compressed fields? As I understood it, the stored field will be stored in a compressed way. OK, but when will they be uncompressed? - Before sending back to the client, on the server side? - Or on the client side? I am using SolrJ. Any other ideas? Can compressed fields work to increase the query performance? Thanks a lot for your ideas and answers! Regards Uwe -- Uwe Clement Software Architect Project Manager ___ |X__ X| eXXcellent solutions gmbh Beim Alten Fritz 2 D-89075 Ulm e | mailto:uwe.clem...@exxcellent.de m | +49 [0]151-275 692 27 i | http://www.exxcellent.de Managing Directors: Dr. Martina Burgetsmeier, Wilhelm Zorn, Gerhard Gruber Registered office: Ulm, Commercial register: Ulm HRB 4309
SolrCloud Performance for High Query Volume
Hi all, I'm currently in the process of doing some performance testing in preparation for upgrading from Solr 3.6.1 to Solr 4.0. (We're badly in need of NRT functionality.) Our existing deployment is not a typical deployment for Solr, as we use it to search and facet on financial data such as accounts, positions and transaction records. To make matters worse, each request could potentially return upwards of 50,000 or more records from the index. As I said, it's not an ideal use case for Solr, but this is the system that is in place and it really can't be changed at this point. With this defined use case, our current 3.6.1 deployment is able to scale to about 1500 queries per minute, with an average response time in the low 100-200 ms. Note that this time includes the query time and the transport time (time to stream all the documents to the calling services). At the 50,000 document mark, we're getting about 1.6-2 sec. response time. The client is willing to live with this as these types of requests are not very frequent. Our hardware configuration on the 3.6.1 environment is as follows: * 1 master server for indexing with 2 CPUs (each 6 cores, 2.67GHz), 4GB of RAM and 150GB HDD * 2 slave servers for query only, each with 2 CPUs (each 6 cores, 2.67GHz), 12GB of RAM each and the same HDD space (mechanical drive). Each of the servers is a virtual server in a VMware environment. Now with roughly the same schema and solrconfig configuration, the performance on Solr 4.0 is quite bad. Running just 500 queries per minute, our query performance degrades to almost 2-minute response times in some cases. The average is about 40-50 sec. response time. Note that the index at the moment is only a fraction of the size of the existing environment (about 1/8th the size). The hardware setup for the SolrCloud deployment is as follows: * 4 Solr server instances, each with 4 CPUs (each 6 cores, 2.67GHz), 8GB of RAM and 150GB HDD * 3 ZooKeeper server instances.
Three of the Solr server instances each also run one ZK instance; the fourth server does not run ZK. We haven't observed any issues with memory utilization. Additionally, the virtual servers are co-located. We're wondering if upgrading to solid state drives would improve performance significantly. Are there any other pointers or configuration changes that we can make to help bring down our query times? Any tips will be greatly appreciated. Thanks all!
Re: Index data from multiple tables into Solr
On 1/15/2013 9:20 AM, hassancrowdc wrote: Hi, once i have indexed data from multiple tables from mysql database into solr, is there any way that it update data(automatically) if any change is made to the data in mysql? You need to write a program to do this. Although this list can provide guidance, such programs are highly customized to the particulars of your setup. There is not really any general purpose solution here. There are two typical approaches - have a program that initiates delta-imports with the dataimporter, or write a program that both talks to your database and uses a Solr client API to send updates to Solr. I used to use the former approach; now I use the latter. I still use the dataimporter for full reindexes, though. Thanks, Shawn
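For the delta-import approach, the scheduler does not need to be anything fancier than cron hitting the dataimport handler over HTTP. A hypothetical crontab entry is sketched below; the handler path /dataimport and the parameters depend on how DIH is registered in your solrconfig.xml:

```
# Hypothetical crontab entry: trigger a DIH delta-import every 15 minutes
*/15 * * * * curl -s "http://localhost:8983/solr/dataimport?command=delta-import&commit=true"
```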
Re: Stored hierarchical data in Solr
You can store structured data in Solr. You can't *query* it in a way that respects its structure. E.g. if I had <xml>this<b>and</b>that</xml>, I could parse that into terms: [this] [and] [that], and do searches upon them. But you couldn't search for documents that match an XPath such as /xml/b='and'. Upayavira On Tue, Jan 15, 2013, at 05:02 PM, Nicholas Ding wrote: Hello, I'm thinking of storing a hierarchical data structure in Solr. I know I have to flatten the structure into a form like A_B_C, but is it possible to extend Solr to support hierarchical data? What about if I store JSON text in a field, then load it and process it while Solr outputs the response? Is that doable by extending Solr? Thanks Nicholas
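As a sketch of the flattening Nicholas mentions: names like A_B_C are just a client-side naming convention, not a Solr feature, so the nesting can be collapsed before the document is sent for indexing. The function below is illustrative only and not part of any Solr API:

```python
def flatten(doc, sep="_", prefix=""):
    """Flatten a nested dict into Solr-style field names like A_B_C."""
    flat = {}
    for key, value in doc.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat

print(flatten({"A": {"B": {"C": 1}}, "title": "x"}))
# → {'A_B_C': 1, 'title': 'x'}
```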
RE: Index data from multiple tables into Solr
He is talking about this list, the list we are using to communicate. You are sending your messages to a mailing list; thousands are on it. Example of a program that will run the delta-import/full-import commands: cron. You are basically calling a URL with specific parameters to pull data from your DB. Example of a program that will use the Solr API: these are all application specific (based on what fields are in your schema, etc.). Swati -Original Message- From: hassancrowdc [mailto:hassancrowdc...@gmail.com] Sent: Tuesday, January 15, 2013 2:00 PM To: solr-user@lucene.apache.org Subject: Re: Index data from multiple tables into Solr Which list are you referring to? And can you please give an example of such a program (doesn't matter if it is for your setup)? On Tue, Jan 15, 2013 at 12:06 PM, Shawn Heisey-4 [via Lucene] ml-node+s472066n4033518...@n3.nabble.com wrote: You need to write a program to do this. There are two typical approaches - have a program that initiates delta-imports with the dataimporter, or write a program that both talks to your database and uses a Solr client API to send updates to Solr. Thanks, Shawn -- View this message in context: http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033545.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Index data from multiple tables into Solr
https://wiki.apache.org/solr/Solrj client. You'd have to configure it / use it based on your application needs. -Original Message- From: hassancrowdc [mailto:hassancrowdc...@gmail.com] Sent: Tuesday, January 15, 2013 2:38 PM To: solr-user@lucene.apache.org Subject: Re: Index data from multiple tables into Solr OK. So if I have manufacturer and id fields in the schema file, what will be the program that will use the Solr API? -- View this message in context: http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033556.html
Re: Top Terms Using Luke
On 1/15/2013 11:54 AM, Lighton Phiri wrote: I would like to get a sense of the top terms for fields in my index and have just enabled the LukeRequestHandler [1] in my solrconfig.xml file. However, Luke seems to include stopwords as well. I've tried searching previous threads, but nothing I've come across [2, 3, 4] has helped. How can I tell Luke not to include stopwords? Alternatively, what's the easiest way of getting top terms without stopwords? If you don't want stopwords in the top terms report, you have to remove them from your index. IMHO, this is not a good idea because you will lose search precision, but using StopFilterFactory in a fieldType analysis chain is very common. If you were to leave stopwords in your index but tell the tools not to display them, then the top terms list would be lying to you, and it would not be very useful as a troubleshooting tool. Troubleshooting is one of Luke's primary purposes. To get an idea of which non-stopwords are dominant in your index, just ask for more top terms, instead of just the top ten or top twenty. If you are using a program to parse the information, have your program remove the terms that you don't want to include, then trim the list to the proper size. Thanks, Shawn
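Shawn's suggestion (over-fetch top terms, then filter client-side) is easy to script. A minimal sketch, assuming you have already parsed the Luke response into (term, count) pairs sorted by count descending; the parsing itself is not shown:

```python
def top_terms_without_stopwords(terms, stopwords, n=10):
    """Drop stopwords from a (term, count) list and return the top n.

    `terms` is assumed sorted by count descending, as Luke reports it.
    """
    return [(term, count) for term, count in terms if term not in stopwords][:n]

terms = [("the", 9000), ("of", 8000), ("solr", 120), ("index", 75)]
print(top_terms_without_stopwords(terms, {"the", "of"}, n=2))
# → [('solr', 120), ('index', 75)]
```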
Re: Index data from multiple tables into Solr
On 1/15/2013 12:00 PM, hassancrowdc wrote: Which list are you referring to? The solr-user mailing list that we are both using here. And can you please give an example of such a program (doesn't matter if it is for your setup)? I can't do that. It is confidential and proprietary code. Although I wrote it, I do not have any rights to share it because it was written on the job. Thanks, Shawn
Solr exception when parsing XML
Hi, I got a SolrException when submitting XML for indexing (using Solr 3.6.1): Jan 15, 2013 10:22:42 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, cod e 31)) at [row,col {unknown-source}]: [2,1169] at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81) Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 31)) ... at [row,col {unknown-source}]: [2,1169] at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675) at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:660) at com.ctc.wstx.sr.BasicStreamReader.readCDataPrimary(BasicStreamReader.java:4240) at com.ctc.wstx.sr.BasicStreamReader.nextFromTreeCommentOrCData(BasicStreamReader.java:3280) at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2824) at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019) at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79) I checked the details: the data causing trouble is word1chr(31)word2, where both word1 and word2 are normal English words and chr(31) is the character returned by the PHP function chr(31). Our XML is well constructed and the encoding/charset is well defined. The problem is due to chr(31); if I replace it with another UTF-8 character, indexing is OK. I checked the source code of com.ctc.wstx.sr.BasicStreamReader.java, and it seems that by design no CTRL character is allowed inside CDATA text, but I am puzzled: how could we avoid CTRL characters in text in general (sure, it is not a common occurrence, but it can still happen)? Thanks very much for your help, Lisheng
Re: Index data from multiple tables into Solr
Okay, thank you. After indexing data from the database into Solr, I want to search such that if I write any word (that is included in the documents being indexed), it should return all the documents that include that word. But it does not. When I write http://localhost:8983/solr/select?q=anyword it gives me an error. Is there anything wrong with my HTTP request, or is this the wrong place to search? On Tue, Jan 15, 2013 at 2:48 PM, sswoboda [via Lucene] ml-node+s472066n4033563...@n3.nabble.com wrote: https://wiki.apache.org/solr/Solrj client. You'd have to configure it / use it based on your application needs. -- View this message in context: http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033614.html
RE: Index data from multiple tables into Solr
What error are you getting? Which field are you searching (default field)? Did you try specifying a default field? What is your schema like? Which analyzers did you use? Which version of Solr are you using? I highly recommend going through the tutorial to get a basic understanding of inserting, updating, and searching: http://lucene.apache.org/solr/tutorial.html Hours have been spent in setting up these tutorials and they are very informative. -Original Message- From: hassancrowdc [mailto:hassancrowdc...@gmail.com] Sent: Tuesday, January 15, 2013 3:38 PM To: solr-user@lucene.apache.org Subject: Re: Index data from multiple tables into Solr Okay, thank you. After indexing data from the database into Solr, I want to search such that if I write any word (that is included in the documents being indexed), it should return all the documents that include that word. But it does not. When I write http://localhost:8983/solr/select?q=anyword it gives me an error. Is there anything wrong with my HTTP request, or is this the wrong place to search? -- View this message in context: http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033614.html
Re: Index data from multiple tables into Solr
I don't want to search by one field; I want to search as a whole. I am following that tutorial and got indexing and updating working, but for search I would like to search through everything I have indexed, not a specific field. I can do it by using the default field, but I would like to search across everything I have indexed. Any hint how I can do that? On Tue, Jan 15, 2013 at 3:49 PM, sswoboda [via Lucene] ml-node+s472066n4033617...@n3.nabble.com wrote: What error are you getting? Which field are you searching (default field)? Did you try specifying a default field? What is your schema like? Which analyzers did you use? Which version of Solr are you using? I highly recommend going through the tutorial to get a basic understanding of inserting, updating, and searching: http://lucene.apache.org/solr/tutorial.html -- View this message in context: http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033622.html
Re: Solr exception when parsing XML
Interesting point. Looks like CDATA is more limiting than I thought: http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the recommendation is to avoid CDATA and automatically encode characters such as yours, as well as less-than/greater-than and ampersand. Regards, Alex.
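Since code point 31 is illegal in XML 1.0 no matter how it is escaped, one practical option is stripping such characters on the client before building the update XML. A minimal sketch (tab, LF and CR are the only control characters XML 1.0 allows, so they are kept):

```python
import re

# XML 1.0 forbids code points below 0x20 except tab (0x09), LF (0x0a)
# and CR (0x0d), even inside CDATA sections.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def strip_illegal_xml_chars(text):
    """Remove control characters that a conforming XML parser must reject."""
    return _ILLEGAL_XML_CHARS.sub("", text)

print(strip_illegal_xml_chars("word1\x1fword2"))  # → word1word2
```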
RE: Index data from multiple tables into Solr
http://wiki.apache.org/solr/ExtendedDisMax Specify your query fields in the qf parameter. Take a look at the example at the bottom of the page. -Original Message- From: hassancrowdc [mailto:hassancrowdc...@gmail.com] Sent: Tuesday, January 15, 2013 3:56 PM To: solr-user@lucene.apache.org Subject: Re: Index data from multiple tables into Solr I don't want to search by one field; I want to search as a whole. I am following that tutorial and got indexing and updating working, but for search I would like to search through everything I have indexed, not a specific field. I can do it by using the default field, but I would like to search across everything I have indexed. Any hint how I can do that? -- View this message in context: http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033622.html
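Concretely, an edismax request can list several fields in qf; the field names below are just the ones mentioned earlier in the thread, used for illustration:

```
http://localhost:8983/solr/select?defType=edismax&q=anyword&qf=manufacturer+first_name+last_name
```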
Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?
First off, just reporting this: I wound up with approx 58% fewer documents after submitting via ConcurrentUpdateSolrServer. I went back and changed the code to use HttpSolrServer and had 100%. This was a long running test, approx 12 hours, with gigabytes of data, so it's not conveniently shared / reproducible, but I at least wanted to email around, in part to get it on the record, and second to see if anybody else has seen this? I didn't see anything in JIRA. I realize that concurrent update is asynchronous and I'm giving up the ability to monitor things, but since it works using the old server, there's nothing glaringly wrong at least. Here's a few more details: * Approx 2 M docs, submitted 1,000 at a time. * Solr 4.0.0 on Windows Server 2008 * Solr server JVM configured with 4 gigs of RAM * Submitting client JVM (SolrJ) configured with 10 gigs of RAM * I didn't see any OOM (Out Of Memory) errors on the asynchronous / ConcurrentUpdateSolrServer run. However, I didn't capture the entire log. Usually with OOM it's just before the run crashes, and the end of the log on the screen looked fine. * I also didn't think there were OOM issues on the Solr server side, for the same reason * When submitting the same data synchronously (via HttpSolrServer) it didn't have any problems Questions: The async client certainly finished faster, and since the underlying Solr server presumably didn't do the real work any faster, presumably a backlog built up somewhere. Agreed? I'm guessing this backlog had something to do with the failure. Or are there other areas to think about? Which process would get backlogged, the SolrJ client or the Solr server? I'd guess the server? And if async submits are accumulated in the Solr server, is there some mechanism to queue them onto disk, or does it try to hold them all in RAM? And *if* the backlog caused an OOM condition, wouldn't that JVM have mostly crashed (if not completely)? Any guesses on the most likely failure point, and where to look? 
Thanks, Mark -- Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
Re: Index data from multiple tables into Solr
On 1/15/2013 1:37 PM, hassancrowdc wrote: After indexing data from database to solr. I want to search such that if i write any word (that is included in the documents been indexed) it should return all the documents that include that word. But it does not. When i write http://localhost:8983/solr/select?q=anyword i gives me error. You haven't told it which core (or collection if using SolrCloud) you want to search. http://localhost:8983/solr/corename/select?q=anyword
Re: Synonyms and trailing wildcard
It's certainly true that a wildcard suppresses the synonym filter, since it is not multi-term aware. Other than implementing your own version of the synonym filter that is multi-term aware and interprets wildcards, you may have to do your own preprocessor. Or, you could do index-time synonyms, so that bill, billy, will, willy, and william were all indexed at the same position. Then the bil* wildcard would match william, since bill is also indexed at the same position. -- Jack Krupansky -Original Message- From: Roberto Isaac Gonzalez Sent: Tuesday, January 15, 2013 3:10 PM To: solr-user@lucene.apache.org Subject: Synonyms and trailing wildcard Hi, I'm working on adding a nicknames capability to our system. It's basically a synonym mapping stored in a nicknames.txt file that uses the SynonymFilter framework. In one of our search boxes (used for lookups), we automatically append a trailing wildcard. There's one use case we're dealing with, which is expanding synonyms even if there's a trailing wildcard. I.e. Q: Bill* Expected results: Bill, Billie, William. Q: Bil* Expected results: Bill, so no synonym expansion. Basically, for synonym expansion, we want to treat the token as if it didn't contain the trailing wildcard, and we also *don't* want to expand the wildcard before doing the synonym matches. We tried using the multiterm analysis chain, but by definition that expects one token *in* and one token *out* (org.apache.solr.schema.TextField.analyzeMultiTerm()), so it throws an exception. I'm looking for options for implementing this scenario, and some of the options I've explored are: 1. Use the multiterm analysis chain and allow synonym expansion, so one token in and multiple tokens out. 2. Iterate ourselves and see if the multiterm analysis chain returns more than one token; if it does, then remove the SynonymFilter from the analysis chain, something similar to ExtendedDismaxQParser.shouldRemoveStopFilter(). 3. 
ExtendedDismaxQParser.preProcessUserQuery() to OR the non-wildcarded term. What do you guys think? Best Regards, Roberto Gonzalez
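Jack's index-time option would look something like this in schema.xml. The fieldType name and tokenizer below are placeholders; the key points are expand="true" on the index-side SynonymFilterFactory and no synonym filter in the query-time chain:

```xml
<!-- Sketch only: index-time nickname expansion so bil* matches william -->
<fieldType name="text_nicknames" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="nicknames.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off is index size and the need to reindex whenever nicknames.txt changes, but query-time wildcards then work without any synonym-aware preprocessing.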
Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?
On 1/15/2013 2:10 PM, Mark Bennett wrote: First off, just reporting this: I wound up with approx 58% few documents having submitted via ConcurrentUpdateSolrServer. I went back and changed the code to use HttpSolrServer and had 100% This was a long running test, approx 12 hours, with gigabytes of data, so conveniently shared / reproducible, but I at least wanted to email around, in part to get it on the record, and second to see if anybody else has seen this? I didn't see anything in JIRA. I realize that Concurrent update is asynchronous and I'm giving up the ability to monitor things, but since it works using the old server, there's nothing glaringly wrong at least. You're not only giving up the ability to monitor things, you're also giving up the ability to detect errors. All exceptions that get thrown by the internals of ConcurrentUpdateSolrServer are swallowed, your code will never know they happened. The client log (slf4j with whatever binding config you chose) may have such errors logged, but they are completely undetectable by the code. Make sure you're actually logging someplace with your solrj app at a minimum level of INFO, then check that log. It might be a case of errors being silently swallowed, or it might be a bug. Thanks, Shawn
from 1.4 to 3.6
Hi, I hope this doesn't turn out to be a very stupid question. I have upgraded from Solr 1.4 to 3.6, and now the maxScore field is missing from the [response] in the responses I am getting from Solr. Am I doing something wrong? How can I get it back? thanks, -- Kaveh Minooie www.plutoz.com
Re: from 1.4 to 3.6
On 1/15/2013 4:14 PM, kaveh minooie wrote: HI I hope this doesn't turn out to be a very stupid question. I have upgraded from solr 1.4 to 3.6 and now in the response that I am getting from solr maxScore field in the [response] is missing. I am doing something wrong? how can I get it back? Just add the special score field to the fl parameter (field list). If you don't have the fl parameter at all, use fl=score,* to get it. If you aren't displaying the score, then it won't give you maxScore. Thanks, Shawn
Is *:* the only possible search with * on the left-hand-side?
Hello, Is *:* hardcoded somewhere as a unique special pattern, or is there actually a class of queries with *:'something'? I tried searching for it, but I suspect these are not the patterns most tokenizers will actually index as searchable. :-) Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Is *:* the only possible search with * on the left-hand-side?
Semi-hard-coded. In QueryParserBase.java:

protected Query getWildcardQuery(String field, String termStr) throws ParseException {
  if ("*".equals(field)) {
    if ("*".equals(termStr)) return newMatchAllDocsQuery();

Otherwise, if you try *:x, * is an undefined field. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Tuesday, January 15, 2013 7:06 PM To: solr-user@lucene.apache.org Subject: Is *:* the only possible search with * on the left-hand-side? Hello, Is *:* hardcoded somewhere as a unique special pattern, or is there actually a class of queries with *:'something'? Regards, Alex.
Re: Top Terms Using Luke
I suppose this will do; I just figured there'd be a built-in way of excluding stopwords. Thank you. On 15 January 2013 22:08, Shawn Heisey s...@elyograg.org wrote: To get an idea of which non-stopwords are dominant in your index, just ask for more top terms, instead of just the top ten or top twenty. If you are using a program to parse the information, have your program remove the terms that you don't want to include, then trim the list to the proper size. Lighton Phiri http://lightonphiri.org
RE: DataImportHandlerException: Unable to execute query with OPTIM
Dear James Dyer, Thank you very much. It's really working now. I was struggling for the past 3 weeks to solve it. You are really awesome. I am really happy now. Thank you for making me happy. Regards, Ashim -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandlerException-Unable-to-execute-query-with-OPTIM-tp4033436p4033755.html