SOLR-769 clustering
Hello: I have got the clustering working, i.e. SOLR-769. I am wondering: why is there a field called body, and does it have a special purpose? <field name="body" type="text" indexed="true" stored="true" multiValued="true"/> Can my clustering field be a copyField? Basically I would like to remove the URLs and HTML. Also, is there any way to have a minimum number of labels per cluster? Thanks. Antonio
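One hedged way to strip HTML before clustering is to copy the source field into a text type whose analyzer removes markup, e.g. with HTMLStripWhitespaceTokenizerFactory. This is only a sketch: the type and field names below are assumptions, not from the original schema, and stripping URLs as well would likely need an additional pattern-based filter.

```xml
<!-- Sketch: type/field names are hypothetical -->
<fieldType name="text_nohtml" class="solr.TextField">
  <analyzer>
    <!-- strips HTML tags while tokenizing on whitespace -->
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body_clean" type="text_nohtml" indexed="true" stored="true"/>
<copyField source="body" dest="body_clean"/>
```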
Re: Big Problem with special characters
Otis Gospodnetic schrieb: Try debugQuery=true and see if the resulting query string makes sense. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Thanks for the hint... My problem was the WhitespaceTokenizer :-( After I changed back to the StandardTokenizer, everything was fine! Greets -Ralf-
autowarmcount how to check if cache has been warmed up
Hi, Is it possible to have autowarmCount=500 with warmupTime=2751 and size=5? Where can I check whether the cache is full or not, because it still looks empty even though a commit has been done. Solr 1.4. Thanks for your help, sunny

name: queryResultCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=14774644, initialSize=14774644, minSize=13297179, acceptableSize=14035911, cleanupThread=false, autowarmCount=500, regenerator=org.apache.solr.search.solrindexsearche...@6e4eeaaf)
stats:
lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 5
warmupTime : 2751
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0

-- View this message in context: http://www.nabble.com/autowarmcount-how-to-check-if-cache-has-been-warmed-up-tp23156612p23156612.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using Solr to index a database
Thanks for the link... I'm still a bit unclear as to how it goes. For example, let's say I have a table called PRODUCTS, and within that table I have the following columns: NUMBER (product number), NAME (product name), PRICE. How would I index all this information? Here is an example (from the links you provided) of XML that confuses me:

<entity name="item" pk="ID" query="select * from item"
    deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
  <field column="NAME" name="name" />
  <field column="NAME" name="nameSort" />
  <field column="NAME" name="alphaNameSort" />
</entity>

What is that deltaQuery (or even if it was a regular query expression) line for? It seems to me like a sort of filter. What if I don't want to filter anything and just want to index all the rows? Cheers

Noble Paul നോബിള് नोब्ळ् wrote: On Mon, Apr 20, 2009 at 7:15 PM, ahammad ahmed.ham...@gmail.com wrote: Hello, I've never used Solr before, but I believe that it will suit my current needs with indexing information from a database. I downloaded and extracted Solr 1.3 to play around with it. I've been looking at the following tutorials:

http://www.ibm.com/developerworks/java/library/j-solr-update/index.html
http://wiki.apache.org/solr/DataImportHandler

There are a few things I don't understand. For example, the IBM article sometimes refers to directories that aren't there, or are a little different from what I have in my extracted copy of Solr (i.e. solr-dw/rss/conf/solrconfig.xml). I tried to follow the steps as best I can, but as soon as I put the following in solrconfig.xml, the whole thing breaks:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">rss-data-config.xml</str>
  </lst>
</requestHandler>

Obviously I replace with my own info... One thing I don't quite get is the data-config.xml file. What exactly is it?
I've seen examples of what it contains, but since I don't know enough, I couldn't really adjust it. In any case, this is the error I get, which may be because of a misconfigured data-config.xml...

the data-config.xml describes how to fetch data from various data sources and index them into Solr. The stacktrace says that your xml is invalid. The best bet is to take one of the sample dataconfig xml files and make changes.

http://svn.apache.org/viewvc/lucene/solr/trunk/example/example-DIH/solr/db/conf/db-data-config.xml?revision=691151&view=markup
http://svn.apache.org/viewvc/lucene/solr/trunk/example/example-DIH/solr/rss/conf/rss-data-config.xml?revision=691151&view=markup

org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:165)
at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:99)
at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:96)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:388)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:571)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:831)
at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:720)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:490)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1150)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at
RE: Sort by distance from location?
I've never used them personally, but I think a function query would suit you here. Function queries allow you to define a custom function as a component of the score of a result document. Define a distance function based on the user's current location and that of the search result, such that the shorter the distance, the higher the function output. This will boost results inversely proportionally to their distance from the user. -Ken

-----Original Message----- From: Development Team [mailto:dev.and...@gmail.com] Sent: Tuesday, April 14, 2009 5:32 PM To: solr-user@lucene.apache.org Subject: Sort by distance from location?

Hi everybody, My index has latitude/longitude values for locations. I am required to do a search based on a set of criteria, and order the results based on how far the lat/long location is from the current user's location. Currently we are emulating such a search by adding criteria of ever-widening bounding boxes; the more of those boxes match the document, the higher the score, and thus the closer ones appear at the start of the results. The query looks something like this (newlines between each search term):

+criteraOne:1
+criteriaTwo:true
+latitude:[-90.0 TO 90.0]
+longitude:[-180.0 TO 180.0]
(latitude:[40.52 TO 40.81] longitude:[-74.17 TO -73.79])
(latitude:[40.30 TO 41.02] longitude:[-74.45 TO -73.51])
(latitude:[39.94 TO 41.38] longitude:[-74.93 TO -73.03])
[[...etc...about 10 times...]]

Naturally this is quite slow (the query is approximately 6x slower than normal), and I can't help but feel that there's a more elegant way of sorting by distance. Does anybody know how to do this or have any suggestions? Sincerely, Daryl.
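To make the suggestion concrete, the distance function at the heart of such an approach is ordinary great-circle (haversine) distance. The sketch below only illustrates that math in Python; wiring it into a Solr function query is not shown, and nothing here comes from the original posts.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def proximity_boost(dist_km):
    """A score component that grows as distance shrinks, analogous to
    what a function query would compute per document."""
    return 1.0 / (1.0 + dist_km)
```

The key property for ranking is only that the function decreases monotonically with distance; the exact shape of the boost is a tuning choice.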
Master Slave Solr Replication Automation
We have a requirement of replicating data from one Solr instance on a Linux box to a second Solr instance on another Linux box. In order to achieve this we will use the Solr collection distribution scripts (snapshooter, snappuller, etc.) and the rsync utility.

Configurations:
1. Apache Solr 1.3.0
2. Machines: Linux
3. Master/Slave: 1 master and 1 slave

Settings done at our end: Solr on both Linux boxes contains multiple cores. We have split the data to be indexed among the cores; a sample Solr path of the data folder looks like this:

Path :: {SOLR_HOME}/solr/multicore/multi_corename/data
Sample : machine_path/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/data
* SOLR_HOME :: machine_path/apache-solr-1.3.0/example/
** multi_corename :: CORE_WWW.ABCD.COM

Thus we are going to have multiple cores on the master as well as the slave servers. As mentioned on http://wiki.apache.org/solr/CollectionDistribution: "For the Solr distribution scripts, the name of the index directory can be defined by the environment variable data_dir in the configuration file conf/scripts.conf". Example conf/scripts.conf file on the slave Solr server:

user=
solr_hostname=localhost
solr_port=8080
rsyncd_port=18983
data_dir=${SOLR_HOME}/solr/muticore/CORE_WWW.ABCD.COM/data
webapp_name=solr
master_host=10.x.xx.xxx
master_data_dir=${SOLR_HOME}/solr/muticore/CORE_WWW.ABCD.COM/data
master_status_dir=${SOLR_HOME}/solr/muticore/CORE_WWW.ABCD.COM/status

The index directory name mentioned above should match the value used by the Solr server, which is defined in solr/conf/solrconfig.xml. Following are a few queries:
1. Please confirm whether the <dataDir> entry in solrconfig.xml should match on the slave and master Solr servers in accordance with the scripts.conf configuration settings.
2. Also let us know whether some specific handling has to be done when using multiple cores during replication.
3. Are there any pitfalls in using the Solr distribution scripts and the rsync utility?
Please throw some light on the queries.
Re: Best way to index without diacritics
Hello! I also have the same problem. I already have indexes of my documents; I changed the charset to ISO-8859-1 and I was then able to see the ñ and the accents. Now I have linked the search to my application, and searches are made from JSP pages. The problem is that when the user types the query parameter into the text box, they have to type the accents, and that is exactly what I don't want. I read a bit and I believe I need to include the ISOLatin class or something like that, but I don't know exactly how. First, I know I must include it in the config.xml or the schema.xml, but do I also have to download a jar or something for the ISOLatin class? Or what should I do? I don't know... please, some help!? I'm going crazy! :(
Re: Delete from Solr index...
Hello! How are you? I have a similar problem: I need to delete some entries from my Solr index, since I added them while doing tests, and now that I am going to deliver the project I need them to no longer appear. This is complicated for me because all the Solr information is in English and I don't understand it well. Anyway, I hope you can help me, since I have only a few days left to deliver the project. A thousand thanks in advance! :)
Re: Best way to index without diacritics
Does this help: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=(isolatin)#head-4ebf7aea23b3d6d34a1f8314f9de17334a3e2fac Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ---- From: lupiss lupitaga...@hotmail.com To: solr-user@lucene.apache.org Sent: Tuesday, April 21, 2009 12:22:08 PM Subject: Re: Best way to index without diacritics

Hello! I also have the same problem. I already have indexes of my documents; I changed the charset to ISO-8859-1 and I was then able to see the ñ and the accents. Now I have linked the search to my application, and searches are made from JSP pages. The problem is that when the user types the query parameter into the text box, they have to type the accents, and that is exactly what I don't want. I read a bit and I believe I need to include the ISOLatin class or something like that, but I don't know exactly how. First, I know I must include it in the config.xml or the schema.xml, but do I also have to download a jar or something for the ISOLatin class? Or what should I do? I don't know... please, some help!? I'm going crazy! :(
Solr - clarification on date sortable fields
I am sending this question out on behalf of a colleague, who needs a clarification on Solr indexing of date and sortable fields. We have declared a date field in schema.xml like below:

<field name="premierDate_dt" type="date" indexed="true" stored="true" multiValued="false" default="NOW"/>

While indexing, if I don't pass any value to this field, i.e. <premierDate_dt/> or <premierDate_dt></premierDate_dt>, I get the below error:

SEVERE: org.apache.solr.common.SolrException: Invalid Date String:''
at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
at org.apache.solr.schema.FieldType.createField(FieldType.java:179)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)

If instead I remove the tag from the request entirely, it does not give any issues. The same behaviour exists for sortable fields as well, such as sint and slong. Is there any workaround we can make in the schema file? Or does the request need to be changed accordingly? A quick workaround is declaring the fields as string, but the limitation would be that we could not perform any range search queries on those fields. Interestingly, if we use all zeros in the date (i.e. <premierDate_dt>0000-00-00T00:00:00Z</premierDate_dt>), it gets indexed and the value in the index is created as 0002-11-30T00:00:00. Thanks.
Re: query on part number not matching
Or in this case, I was using DisMax. My ps was 5, but I didn't have a qs parameter. Setting qs to a small value did the trick.

From: Yonik Seeley yo...@lucidimagination.com To: solr-user@lucene.apache.org Sent: Monday, April 20, 2009 6:09:51 PM Subject: Re: query on part number not matching

On Mon, Apr 20, 2009 at 8:50 PM, Kevin Osborn osbo...@yahoo.com wrote: Looks like the format didn't come through in the email. ch, vxrch, and cisco7204xvrch are all in position 4. Ah... the traditional way to handle that case is to use a little slop with the phrase query. -Yonik
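For reference, a dismax handler carrying both slop parameters might look like the sketch below. The handler name, field list, and values are illustrative assumptions, not taken from the thread; qs applies slop to phrase queries the user types explicitly, while ps applies to the phrase boost dismax builds itself.

```xml
<!-- Sketch: names and values are illustrative, not from the original thread -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">name description</str>
    <str name="qs">2</str>  <!-- slop for user-entered phrase queries -->
    <str name="ps">5</str>  <!-- slop for the automatic phrase boost -->
  </lst>
</requestHandler>
```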
Re: Solr - clarification on date sortable fields
This all makes sense. You are sending a blank string for a field that expects a date (or null - no element at all - if you want it to default to NOW). So, yes, you need to either pass a valid date or not pass that element in at all. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ---- From: Wesley Small wesley.sm...@mtvstaff.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, April 21, 2009 12:57:22 PM Subject: Solr - clarification on date sortable fields

I am sending this question out on behalf of a colleague, who needs a clarification on Solr indexing of date and sortable fields. We have declared a date field in schema.xml like below:

<field name="premierDate_dt" type="date" indexed="true" stored="true" multiValued="false" default="NOW"/>

While indexing, if I don't pass any value to this field, i.e. <premierDate_dt/> or <premierDate_dt></premierDate_dt>, I get the below error:

SEVERE: org.apache.solr.common.SolrException: Invalid Date String:''
at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
at org.apache.solr.schema.FieldType.createField(FieldType.java:179)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)

If instead I remove the tag from the request entirely, it does not give any issues. The same behaviour exists for sortable fields as well, such as sint and slong. Is there any workaround we can make in the schema file? Or does the request need to be changed accordingly? A quick workaround is declaring the fields as string, but the limitation would be that we could not perform any range search queries on those fields. Interestingly, if we use all zeros in the date (i.e. <premierDate_dt>0000-00-00T00:00:00Z</premierDate_dt>), it gets indexed and the value in the index is created as 0002-11-30T00:00:00. Thanks.
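A minimal sketch of the two valid update documents this answer describes (the id field and its values are hypothetical, not from the thread):

```xml
<!-- Either supply a real date for the field... -->
<add><doc>
  <field name="id">doc-1</field>
  <field name="premierDate_dt">2009-04-21T00:00:00Z</field>
</doc></add>

<!-- ...or omit the element entirely, so the schema default of NOW applies -->
<add><doc>
  <field name="id">doc-2</field>
</doc></add>
```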
Re: SOLR-769 clustering
Hi Antonio,

- is there any way to have a minimum number of labels per cluster?

The current search results clustering algorithms (from Carrot2) by design generate one label per cluster, so there is no way to force them to create more. What is the reason you'd like to have more labels per cluster? I'll leave the other two Solr-related questions for a more competent person to answer (Grant?). Cheers, Staszek
Re: Hierarchical Faceting Field Type
Thank you. We tried your suggestion but we are still getting the following problem:

<fieldType name="category" class="solr.TextField">
  <analyzer type="store">
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";"/>
  </analyzer>
</fieldType>
<field name="my_facet" type="category" indexed="true" stored="false" multiValued="true"/>

Sample data:
level one;level two;level three;
level one;level two;level threeB;

When we query for: level one;level two;level three;* we are getting back: level one;level two;level threeB; even though the B is before the semicolon. Any idea why? Thank you, Nasseam

Check out our solr-powered Ajax search+nav solution: http://factbook.bodukai.com/ Powered by Boutique: http://bodukai.com/boutique/

On Apr 17, 2009, at 3:10 PM, Chris Hostetter wrote:
: level one#
: level one#level two#
: level one#level two#level three#
:
: Trying to find the right combination of field type and query to get the
: desired results. Saw some previous posts about hierarchical facets which helped
: in generating the right query, but having an issue using the built-in text
: field, which ignores our delimiter, and the string field, which prevents us from
: doing a starts-with search. Does anyone have any insight into the field
: declaration?

Use TextField, with a PatternTokenizer. BTW: if this isn't a thread you've already seen, it's handy to know about... http://www.nabble.com/Hierarchical-Faceting-to20090898.html#a20176326 -Hoss
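As a side note on what a semicolon pattern tokenizer produces, splitting on ";" yields one token per path segment. A quick regex sketch (outside Solr, purely illustrative) reproduces that tokenization:

```python
import re

def tokenize(path):
    """Mimic a pattern tokenizer that splits on ';', dropping empty tokens."""
    return [t for t in re.split(r";", path) if t]

tokens = tokenize("level one;level two;level three;")
# tokens == ['level one', 'level two', 'level three']
```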
Re: Solr Getting values for an id
The id and testScore fields are name/value pairs; for each id I have a testScore. When I search based on id, how do I know the position? Is there any method or API in Solr which gives me the position? I have not understood the second part of your reply. Can you please tell me how I can do this by making those fields non-multivalued? If possible, an example or some code would be helpful. Thanks, Raju

Otis Gospodnetic wrote: You'll have to manually pull/parse those out and match them based on their positions, I think. Or make those fields non-multivalued and add additional fields instead, if their number is fixed. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ---- From: Raju444us ngudipa...@cormineid.com To: solr-user@lucene.apache.org Sent: Tuesday, April 21, 2009 5:40:47 PM Subject: Solr Getting values for an id

I have a problem. I have a requirement: I indexed a document something like this, where the id and testScore fields are multivalued. My problem is: if I search for id=1, this should return the search results with id = 1 and testScore = 90. Is there any way I can do this?

id  testScore
1   90
2   92
3   97

Thanks, Naveen
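Client-side, the position matching Otis describes amounts to zipping the two multivalued lists returned for a document. The values below mirror the example in the thread, but parsing them out of the Solr response is assumed, not shown:

```python
# Values as they might come back from a multivalued Solr document;
# parallel multivalued fields align by position.
ids = ["1", "2", "3"]
scores = ["90", "92", "97"]

score_by_id = dict(zip(ids, scores))
print(score_by_id["1"])  # prints 90
```

This only works if the two fields always carry the same number of values in the same order, which is exactly why Otis suggests non-multivalued fields when the count is fixed.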
filtering a query by a set of pk values
Hi, We have a need to filter and rank a result set outside of Solr (it's a specialised spatial search) and then restrict the Solr search based on that filter. Previously we were doing our filter, then passing a set of primary keys to Solr like so:

q = '(aerial photos) AND (pk:123^1.8 OR pk:163^1.2 OR pk:920^0.73)'

I know it's quite ugly, but we are lacking a better alternative. Anyway, that worked when we were using the standard query handler, but we've now switched to the dismax handler (so we can boost individual fields) and it no longer works. The fq parameter looked promising for doing something similar, until I realised that it doesn't seem to do OR queries and it doesn't allow you to influence the ranking of results. Can anyone suggest a cleaner way of doing this, or a just-as-ugly way that works with the dismax handler? Thanks, Craig de Stigter
Re: Best way to index without diacritics
Hi, thanks for replying. Yes, I believe that is the class that will work for me, but I don't know how to implement it. Could you tell me whether you have already used it, and if so, tell me what lines you included in the schema.xml and in the config.xml, what .jar you attached, etc., all the details, or even post an example in the forum, please? It's difficult for me because, besides being new to Solr, all the information is in English and I don't understand it very well, let's say :( Thanks again.
Re: Best way to index without diacritics
My friend! Long live Solr :) Sent from my BlackBerry device on the Rogers Wireless Network

-----Original Message----- From: lupiss lupitaga...@hotmail.com Date: Tue, 21 Apr 2009 16:17:04 To: solr-user@lucene.apache.org Subject: Re: Best way to index without diacritics

Hi, thanks for replying. Yes, I believe that is the class that will work for me, but I don't know how to implement it. Could you tell me whether you have already used it, and if so, tell me what lines you included in the schema.xml and in the config.xml, what .jar you attached, etc., all the details, or even post an example in the forum, please? It's difficult for me because, besides being new to Solr, all the information is in English and I don't understand it very well, let's say :( Thanks again.
Issue with Solr Snapshots (missing .nrm file)
We had an issue with our Solr production deployment a couple of weeks back. Following is more info about it.

Server Setup
===
Platform: Sun Solaris UltraSPARC
JDK: 1.5
Solr: 1.2
Index Size: ~15GB
Topology: One master and two slaves

Problem Statement
===
Every day we index different content into the Solr master and run an optimize at the end of it. Snapshooter is triggered at the end of the optimize and creates a snapshot of the index. A couple of hours later, the slaves pull the latest snapshot and install it to serve searches. A couple of weeks back, the slaves didn't pull the snapshot, and when we investigated we found that one file (.nrm) was missing in the snapshot created by the master, as shown below.

# ls -lrth
total 29877000
-rw-r--r-- 1 jboss staff 1.9K Apr 14 11:00 _429o.fnm
-rw-r--r-- 1 jboss staff  28M Apr 14 12:18 _429o.fdx
-rw-r--r-- 1 jboss staff 9.8G Apr 14 12:18 _429o.fdt
-rw-r--r-- 1 jboss staff 501M Apr 14 12:43 _429o.tis
-rw-r--r-- 1 jboss staff 6.5M Apr 14 12:43 _429o.tii
-rw-r--r-- 1 jboss staff 2.4G Apr 14 12:43 _429o.prx
-rw-r--r-- 1 jboss staff 1.2G Apr 14 12:43 _429o.frq
-rw-r--r-- 1 jboss staff   44 Apr 14 12:44 segments_53gt
-rw-r--r-- 1 jboss staff   20 Apr 14 12:44 segments.gen
-rw-r--r-- 1 jboss staff 351M Apr 14 12:44 _429o.nrm   <--- This is the file missing in the snapshot!

# cd snapshot.20090414124449/
# ls -lrth
total 29157784
-rw-r--r-- 1 jboss staff   20 Apr 14 12:44 segments.gen
-rw-r--r-- 1 jboss staff 1.9K Apr 14 12:44 _429o.fnm
-rw-r--r-- 1 jboss staff 9.8G Apr 14 13:03 _429o.fdt
-rw-r--r-- 1 jboss staff  28M Apr 14 13:03 _429o.fdx
-rw-r--r-- 1 jboss staff 1.2G Apr 14 13:06 _429o.frq
-rw-r--r-- 1 jboss staff 2.4G Apr 14 13:12 _429o.prx
-rw-r--r-- 1 jboss staff 501M Apr 14 13:13 _429o.tis
-rw-r--r-- 1 jboss staff 6.5M Apr 14 13:13 _429o.tii
-rw-r--r-- 1 jboss staff   44 Apr 14 13:13 segments_53gt

***This snapshot is missing the _429o.nrm file!***

Has anybody faced this issue (a missing file, or specifically a missing .nrm file)? Any insight is greatly appreciated.
A couple of other questions:
1. For an index of size 15GB, how much breathing space is required (both memory and disk space) on the master and slave?
2. Would slaves pull snapshots if any of the files are missing from the index? (I guess that snappuller will pull, but may not be able to install it?)
3. We are using Solr 1.2 and contemplating an upgrade to 1.3. What is your experience with this upgrade path? Is it strongly recommended (e.g. based on any critical bugs which were fixed)? Thanks, Santhosh.
RE: OutOfMemory on Highlighting
I tried disabling the documentCache, but still hit the same issue.

<documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>

-----Original Message----- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Monday, April 20, 2009 4:38 PM To: solr-user@lucene.apache.org Subject: Re: OutOfMemory on Highlighting

Gargate, Siddharth wrote: Anybody facing the same issue? Following is my configuration:

<field name="content" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="teaser" type="text" indexed="false" stored="true"/>
<copyField source="content" dest="teaser" maxChars="100" />
...
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">500</int>
    <str name="hl">true</str>
    <str name="fl">id,score</str>
    <str name="hl.fl">teaser</str>
    <str name="hl.alternateField">teaser</str>
    <int name="hl.fragsize">200</int>
    <int name="hl.maxAlternateFieldLength">200</int>
    <int name="hl.maxAnalyzedChars">500</int>
  </lst>
</requestHandler>

Search works fine if I disable highlighting, and it brings 500 results. But if I enable highlighting and set the number of rows to just 20, I get an OOME. How about switching the documentCache off? Koji
Re: Using Solr to index a database
deltaQuery is for incremental imports. Use the 'query' attribute to import all the data.

On Tue, Apr 21, 2009 at 7:35 PM, ahammad ahmed.ham...@gmail.com wrote: Thanks for the link... I'm still a bit unclear as to how it goes. For example, let's say I have a table called PRODUCTS, and within that table I have the following columns: NUMBER (product number), NAME (product name), PRICE. How would I index all this information? Here is an example (from the links you provided) of XML that confuses me:

<entity name="item" pk="ID" query="select * from item"
    deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
  <field column="NAME" name="name" />
  <field column="NAME" name="nameSort" />
  <field column="NAME" name="alphaNameSort" />
</entity>

What is that deltaQuery (or even if it was a regular query expression) line for? It seems to me like a sort of filter. What if I don't want to filter anything and just want to index all the rows? Cheers

Noble Paul നോബിള് नोब्ळ् wrote: On Mon, Apr 20, 2009 at 7:15 PM, ahammad ahmed.ham...@gmail.com wrote: Hello, I've never used Solr before, but I believe that it will suit my current needs with indexing information from a database. I downloaded and extracted Solr 1.3 to play around with it. I've been looking at the following tutorials:

http://www.ibm.com/developerworks/java/library/j-solr-update/index.html
http://wiki.apache.org/solr/DataImportHandler

There are a few things I don't understand. For example, the IBM article sometimes refers to directories that aren't there, or are a little different from what I have in my extracted copy of Solr (i.e. solr-dw/rss/conf/solrconfig.xml).
I tried to follow the steps as best I can, but as soon as I put the following in solrconfig.xml, the whole thing breaks:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">rss-data-config.xml</str>
  </lst>
</requestHandler>

Obviously I replace with my own info... One thing I don't quite get is the data-config.xml file. What exactly is it? I've seen examples of what it contains, but since I don't know enough, I couldn't really adjust it. In any case, this is the error I get, which may be because of a misconfigured data-config.xml...

The data-config.xml describes how to fetch data from various data sources and index it into Solr. The stack trace says that your XML is invalid. The best bet is to take one of the sample data-config XML files and make changes:

http://svn.apache.org/viewvc/lucene/solr/trunk/example/example-DIH/solr/db/conf/db-data-config.xml?revision=691151&view=markup
http://svn.apache.org/viewvc/lucene/solr/trunk/example/example-DIH/solr/rss/conf/rss-data-config.xml?revision=691151&view=markup

org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
  at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:165)
  at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:99)
  at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:96)
  at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:388)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:571)
  at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122)
  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
  at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
  at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
  at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
  at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
  at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
  at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
  at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
  at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
  at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:831)
  at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:720)
  at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:490)
  at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1150)
  at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
  at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
  at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
  at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
  at
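To make the query/deltaQuery distinction in the thread above concrete, here is a sketch of a data-config.xml for the hypothetical PRODUCTS table the poster described. Everything here is illustrative: the JDBC driver, URL, and credentials are placeholders, and the Solr field names (id, name, price) assume matching fields exist in schema.xml. 'query' alone is enough for a full import of every row; 'deltaQuery' only lists the primary keys of rows changed since the last import, and 'deltaImportQuery' (which may require a newer DIH than the Solr 1.3 build discussed here) fetches each of those rows:

```xml
<dataConfig>
  <!-- Placeholder JDBC settings: adjust driver/url/user/password for your database -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="user" password="password"/>
  <document>
    <entity name="product" pk="NUMBER"
            query="select NUMBER, NAME, PRICE from PRODUCTS"
            deltaQuery="select NUMBER from PRODUCTS
                        where last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="select NUMBER, NAME, PRICE from PRODUCTS
                              where NUMBER = '${dataimporter.delta.NUMBER}'">
      <!-- Map database columns to Solr schema fields (assumed to exist in schema.xml) -->
      <field column="NUMBER" name="id"/>
      <field column="NAME" name="name"/>
      <field column="PRICE" name="price"/>
    </entity>
  </document>
</dataConfig>
```

If you only ever want to re-index all rows, you can drop the delta attributes entirely and just run a full-import; deltaQuery is not a filter on the full import.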
RE: OutOfMemory on Highlighting
Here is the stack trace:

SEVERE: java.lang.OutOfMemoryError: Java heap space
  at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
  at java.lang.StringCoding.decode(StringCoding.java:173)
  at java.lang.String.<init>(String.java:444)
  at org.apache.lucene.store.IndexInput.readString(IndexInput.java:125)
  at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:390)
  at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:230)
  at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:892)
  at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:277)
  at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:176)
  at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:457)
  at org.apache.solr.search.SolrIndexSearcher.readDocs(SolrIndexSearcher.java:482)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:253)
  at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
  at java.lang.Thread.run(Thread.java:619)

-----Original Message-----
From: Gargate, Siddharth [mailto:sgarg...@ptc.com]
Sent: Wednesday, April 22, 2009 9:29 AM
To: solr-user@lucene.apache.org
Subject: RE: OutOfMemory on Highlighting

I tried disabling the documentCache but still the same issue.

<documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>

-----Original Message-----
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Monday, April 20, 2009 4:38 PM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory on Highlighting

Gargate, Siddharth wrote:

Anybody facing the same issue? Following is my configuration:

...
<field name="content" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="teaser" type="text" indexed="false" stored="true"/>
<copyField source="content" dest="teaser" maxChars="100" />
...

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">500</int>
    <str name="hl">true</str>
    <str name="fl">id,score</str>
    <str name="hl.fl">teaser</str>
    <str name="hl.alternateField">teaser</str>
    <int name="hl.fragsize">200</int>
    <int name="hl.maxAlternateFieldLength">200</int>
    <int name="hl.maxAnalyzedChars">500</int>
  </lst>
</requestHandler>
...

Search works fine if I disable highlighting, and it returns 500 results. But if I enable highlighting and set the number of rows to just 20, I get an OOME.

How about switching the documentCache off?

Koji
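The stack trace above shows the OOME happening while stored fields are being read for highlighting, not in the cache itself, which fits the observation that disabling the documentCache did not help. One commonly suggested mitigation (a sketch, not a confirmed fix for this thread) is to enable lazy field loading in the <query> section of solrconfig.xml, so only the stored fields actually requested via fl/hl.fl are decoded for each hit:

```xml
<query>
  <!-- Load stored fields lazily: fields not named in fl/hl.fl are
       read from disk only if some component later asks for them -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
</query>
```

Beyond that, the usual levers are returning fewer rows per request (500 highlighted rows means 500 stored-field reads per query) and keeping hl.maxAnalyzedChars small, as it already is in the configuration above.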