Memory usage
Hi all, I am testing indexing with 2000 text documents of size 2 MB each. These documents contain words created with random characters. I observed that the Tomcat memory usage goes on increasing slowly. I tried removing all the cache configuration, but memory usage still increases. Once the memory reaches the max heap specified, commit appears to be blocked until the memory is freed. With larger documents, I see some OOMEs. Below are a few properties set in solrconfig.xml:

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>25</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>2147483647</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>single</lockType>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>

<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>7000</maxTime>
</autoCommit>

<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>10</maxWarmingSearchers>

Where does the memory get used? And how can I avoid it? Thanks, Siddharth
Re: indexing txt file
what is the content of your text file? Solr does not directly index files. --Noble On Tue, Apr 14, 2009 at 3:54 AM, Alex Vu alex.v...@gmail.com wrote: Hi all, Currently I wrote an xml file and schema.xml file. What is the next step to index a txt file? Where should I put my txt file I want to index? thank you, Alex V. -- --Noble Paul
Re: DataImporter : Java heap space
Hi Shalin: yes, I tried with the batchSize=-1 parameter as well. Here is the config I tried:

<dataConfig>
  <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb_development"
              user="root" password="**"/>
  <document name="items">
    <entity name="item" dataSource="sp" query="select * from items">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>

I hope I have used the batchSize parameter in the right place. Thanks! Mani Kumar On Tue, Apr 14, 2009 at 11:24 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Apr 14, 2009 at 11:18 AM, Mani Kumar manikumarchau...@gmail.com wrote: Here is the stack trace: notice in stack trace *at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1749)* It looks like it is trying to read the whole table into memory at a time, and that's why it's getting OOM. Mani, the data-config.xml you posted does not have the batchSize="-1" attribute on your data source. Did you try that? This is a known bug in the MySQL JDBC driver. -- Regards, Shalin Shekhar Mangar.
Re: Memory usage
On Tue, Apr 14, 2009 at 11:30 AM, Gargate, Siddharth sgarg...@ptc.com wrote: Hi all, I am testing indexing with 2000 text documents of size 2 MB each. These documents contain words created with random characters. I observed that the Tomcat memory usage goes on increasing slowly. I tried removing all the cache configuration, but memory usage still increases. Once the memory reaches the max heap specified, commit appears to be blocked until the memory is freed. With larger documents, I see some OOMEs. Below are a few properties set in solrconfig.xml:

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>25</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>2147483647</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>single</lockType>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>

<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>7000</maxTime>
</autoCommit>

<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>10</maxWarmingSearchers>

Where does the memory get used? And how can I avoid it? What JVM parameters are you using? Also see the following: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr#d0e105 http://www.lucidimagination.com/blog/2009/02/09/investigating-oom-and-other-jvm-issues/ -- Regards, Shalin Shekhar Mangar.
Re: DataImporter : Java heap space
On Tue, Apr 14, 2009 at 11:36 AM, Mani Kumar manikumarchau...@gmail.com wrote: Hi Shalin: yes, I tried with the batchSize=-1 parameter as well. Here is the config I tried:

<dataConfig>
  <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb_development"
              user="root" password="**"/>

I hope I have used the batchSize parameter in the right place. Yes, that is correct. Did it still throw OOM from the same place? I'd suggest you increase the heap and see what works for you. Also try -server on the JVM. -- Regards, Shalin Shekhar Mangar.
Re: DataImporter : Java heap space
Yes, it's throwing the same OOM error and from the same place... yes, I will try increasing the size... Just curious: how does this dataimport work? Does it load the whole table into memory? Is there any estimate of how much memory it needs to create an index for 1 GB of data? thx mani On Tue, Apr 14, 2009 at 11:48 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Apr 14, 2009 at 11:36 AM, Mani Kumar manikumarchau...@gmail.com wrote: Hi Shalin: yes, I tried with the batchSize=-1 parameter as well. Here is the config I tried:

<dataConfig>
  <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb_development"
              user="root" password="**"/>

I hope I have used the batchSize parameter in the right place. Yes, that is correct. Did it still throw OOM from the same place? I'd suggest you increase the heap and see what works for you. Also try -server on the JVM. -- Regards, Shalin Shekhar Mangar.
Re: Can Solr have Multiple Separate Indexes?
Wow, that was pretty straightforward. Sorry I didn't catch that on the wiki on my first few go-rounds, I'll navigate harder next time. Thanks. Isaac On Sun, Apr 12, 2009 at 11:40 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 13, 2009 at 5:35 AM, Isaac Foster isaac.z.fos...@gmail.com wrote: Hi, I'm new to using Solr but have used the Zend Framework implementation of Lucene before. One thing it supports is the ability to have separate indexes, so that you could keep your index of (for example) forum posts and your index of user profiles separate, and query them separately. Can this be done with Solr? I've looked through the docs a good bit and will continue to, but if anyone can point me in the right direction I'd greatly appreciate it. Sure. There are a couple of ways. Take a look at http://wiki.apache.org/solr/MultipleIndexes -- Regards, Shalin Shekhar Mangar.
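With the multi-core approach from that wiki page, each index gets its own URL and can be queried independently. A minimal SolrJ sketch, with hypothetical core names and field names standing in for the forum-posts/user-profiles split described above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MultiCoreQuery {
    public static void main(String[] args) throws Exception {
        // Each core behaves as an independent index with its own schema;
        // the core name is simply part of the URL.
        SolrServer posts = new CommonsHttpSolrServer("http://localhost:8983/solr/forum_posts");
        SolrServer profiles = new CommonsHttpSolrServer("http://localhost:8983/solr/user_profiles");

        QueryResponse r1 = posts.query(new SolrQuery("body:lucene"));
        QueryResponse r2 = profiles.query(new SolrQuery("username:isaac"));
        System.out.println(r1.getResults().getNumFound() + " posts, "
                + r2.getResults().getNumFound() + " profiles");
    }
}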
Re: DataImporter : Java heap space
DIH streams one row at a time. DIH is just a component in Solr; Solr indexing also takes a lot of memory. On Tue, Apr 14, 2009 at 12:02 PM, Mani Kumar manikumarchau...@gmail.com wrote: Yes, it's throwing the same OOM error and from the same place... yes, I will try increasing the size... Just curious: how does this dataimport work? Does it load the whole table into memory? Is there any estimate of how much memory it needs to create an index for 1 GB of data? thx mani On Tue, Apr 14, 2009 at 11:48 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Apr 14, 2009 at 11:36 AM, Mani Kumar manikumarchau...@gmail.com wrote: Hi Shalin: yes, I tried with the batchSize=-1 parameter as well. Here is the config I tried:

<dataConfig>
  <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb_development"
              user="root" password="**"/>

I hope I have used the batchSize parameter in the right place. Yes, that is correct. Did it still throw OOM from the same place? I'd suggest you increase the heap and see what works for you. Also try -server on the JVM. -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
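The batchSize="-1" setting exists because of how MySQL Connector/J buffers results: by default it reads the entire result set into memory (the readAllResults frame in the stack trace above), and DIH maps batchSize="-1" to the driver's streaming mode, i.e. a fetch size of Integer.MIN_VALUE. A minimal raw-JDBC sketch of that mode, reusing the connection settings from the config above (the password is a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingSelect {
    public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb_development", "root", "secret");
        // TYPE_FORWARD_ONLY + CONCUR_READ_ONLY + fetchSize=Integer.MIN_VALUE
        // is the documented trigger for row-by-row streaming in Connector/J;
        // without it the driver buffers the whole result set in memory.
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        ResultSet rs = stmt.executeQuery("select * from items");
        while (rs.next()) {
            System.out.println(rs.getInt("id") + " " + rs.getString("title"));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}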
Re: Question on StreamingUpdateSolrServer
The machine's ulimit is set to 9000 and the OS has an upper limit of 12000 on files. What would explain this? Has anyone tried Solr with 25 cores on the same Solr instance? Thanks, -vivek 2009/4/13 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: On Tue, Apr 14, 2009 at 7:14 AM, vivek sar vivex...@gmail.com wrote: Some more updates. As I mentioned earlier, we are using multi-core Solr (up to 65 cores in one Solr instance, with each core 10G). This was opening around 3000 file descriptors (lsof). I removed some cores, and after some trial and error I found that at 25 cores the system seems to work fine (around 1400 file descriptors). Tomcat is responsive even when indexing is happening in Solr (for 25 cores). But as soon as it goes to 26 cores, Tomcat becomes unresponsive again. The puzzling thing is that if I stop indexing I can search on even 65 cores, but while indexing is happening it seems to support only up to 25 cores. 1) Is there a limit on the number of cores a Solr instance can handle? 2) Does Solr do anything to the existing cores while indexing? I'm writing to only one core at a time. There is no hard limit (it is Integer.MAX_VALUE). But in reality your mileage depends on your hardware and the number of file handles the OS can open. We are struggling to find why Tomcat stops responding at a high number of cores while indexing is in progress. Any help is very much appreciated. Thanks, -vivek On Mon, Apr 13, 2009 at 10:52 AM, vivek sar vivex...@gmail.com wrote: Here is some more information about my setup:

Solr - v1.4 (nightly build 03/29/09)
Servlet Container - Tomcat 6.0.18
JVM - 1.6.0 (64 bit)
OS - Mac OS X Server 10.5.6

Hardware Overview:
Processor Name: Quad-Core Intel Xeon
Processor Speed: 3 GHz
Number Of Processors: 2
Total Number Of Cores: 8
L2 Cache (per processor): 12 MB
Memory: 20 GB
Bus Speed: 1.6 GHz

JVM Parameters (for Solr):
export CATALINA_OPTS="-server -Xms6044m -Xmx6044m -DSOLR_APP -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360"

Other:
lsof | grep solr | wc -l: 2493
ulimit -a: open files (-n) 9000

Tomcat:
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="2" maxThreads="100" />

Total Solr cores on the same instance - 65
useCompoundFile - true

The tests I ran, while the Indexer is running:
1) Go to http://juum19.co.com:8080/solr - returns a blank page (no error in catalina.out)
2) Try telnet juum19.co.com 8080 - returns with "Connection closed by foreign host"
Stop the Indexer program (Tomcat is still running with Solr):
3) Go to http://juum19.co.com:8080/solr - works ok, shows the list of all the Solr cores
4) Try telnet - able to telnet fine
5) Now comment out all the caches in solrconfig.xml. Try the same tests, but Tomcat still doesn't respond. Is there a way to stop the auto-warmer?
I commented out the caches in the solrconfig.xml but still see the following log:

INFO: autowarming result for searc...@3aba3830 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
INFO: Closing searc...@175dc1e2 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

6) Change the Indexer frequency so it runs every 2 min (instead of all the time). I noticed that once the commit is done, I'm able to run my searches. During the commit and auto-warming period I just get a blank page.
7) Changed from Solrj to XML update - I still get the blank page whenever an update/commit is happening.

Apr 13, 2009 6:46:18 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005, 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948
Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute
INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948

So, looks like it's not just StreamingUpdateSolrServer, but whenever the update/commit is happening I'm not able to search. I don't know if it's related to using
Re: indexing txt file
you should construct the xml containing the fields defined in your schema.xml and give them the values from the text files. For example, if you have a schema defining two fields, title and text, you should construct an xml with a field title and its value and another called text containing the body of your doc. Then you can post it to the Solr you have deployed and make a commit, and it's done. It's possible to construct an xml defining more than just a doc:

<add>
  <doc>
    <field name="title">doc1 title</field>
    <field name="text">doc1 text</field>
  </doc>
  ...
  <doc>
    <field name="title">docn title</field>
    <field name="text">docn text</field>
  </doc>
</add>

2009/4/14 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com what is the content of your text file? Solr does not directly index files --Noble On Tue, Apr 14, 2009 at 3:54 AM, Alex Vu alex.v...@gmail.com wrote: Hi all, Currently I wrote an xml file and schema.xml file. What is the next step to index a txt file? Where should I put my txt file I want to index? thank you, Alex V. -- --Noble Paul
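The same add-plus-commit can be done from Java with SolrJ instead of hand-built XML. A minimal sketch, assuming a SolrJ client of the same vintage as this thread and the two-field schema from the example above (URL and field values are placeholders):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexTextFile {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("title", "doc1 title");
        // read the body of the text file into a String and store it here
        doc.addField("text", "doc1 text");

        server.add(doc);
        server.commit();   // equivalent to posting <commit/>
    }
}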
Re: solr 1.4 memory jvm
do you have an idea? sunnyfr wrote: Hi Noble, Yes, exactly that. I would like to know how people do this during a replication. Do they turn off servers and put a high autowarmCount, which turns off the slave for a while? In my case, that would be 10 min to bring back the new index and then maybe 10 minutes more for autowarming. Otherwise I tried to put a large mergeFactor, but I guess I have too many updates: every 30 min something like 2000 docs, and almost all segments are modified. What would you reckon? :( :) Thanks a lot, Noble Noble Paul നോബിള് नोब्ळ् wrote: So what I decipher from the numbers is that w/o queries Solr replication is not performing too badly. The queries are inherently slow and you wish to optimize the query performance itself. Am I correct? On Tue, Apr 7, 2009 at 7:50 PM, sunnyfr johanna...@gmail.com wrote: Hi, So I did two tests on two servers. First server: with just replication every 20 min, as you can notice: http://www.nabble.com/file/p22930179/cpu_without_request.png cpu_without_request.png http://www.nabble.com/file/p22930179/cpu2_without_request.jpg cpu2_without_request.jpg Second server: with one first replication and a second one during a query test between 15:32 and 15:41. During replication (checked on .../admin/replication/index.jsp) my query response time at the end was around 5000 msec. After the replication, I guess during the commit, I couldn't get an answer to my query for a long time; I refreshed my page a few minutes later. http://www.nabble.com/file/p22930179/cpu_with_request.png cpu_with_request.png http://www.nabble.com/file/p22930179/cpu2_with_request.jpg cpu2_with_request.jpg Now without replication I kept running queries on the second server, and I can't get better than 1000 msec response time and 11 requests/second. http://www.nabble.com/file/p22930179/cpu_.jpg cpu_.jpg This is my request: select?fl=id&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_ready_web:1&json.nl=map&wt=json&start=0&version=1.2&bq=status_official:1^1.5+OR+status_creative:1^1+OR+language:en^0.5&bf=recip(rord(created),1,10,10)^3+pow(stat_views,0.1)^15+pow(stat_comments,0.1)^15&rows=100&qt=dismax&qf=title_en^0.8+title^0.2+description_en^0.3+description^0.2+tags^1+owner_login^0.5 Do you have advice? Thanks Noble -- View this message in context: http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22930179.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- View this message in context: http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p23035520.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: commit / new searcher delay?
Hi Hossman, I would love to know how you manage this as well. Thanks, Shalin Shekhar Mangar wrote: On Fri, Mar 6, 2009 at 8:47 AM, Steve Conover scono...@gmail.com wrote: That's exactly what I'm doing, but I'm explicitly replicating, and committing. Even under these circumstances, what could explain the delay after commit before the new index becomes available? How are you explicitly replicating? I mean, how do you make sure that the slave has actually finished replication and the new index is available now? Are you using the script-based replication or the new Java-based one? -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/commit---new-searcher-delay--tp22342916p23036207.html Sent from the Solr - User mailing list archive at Nabble.com.
Boolean query in Solr
Hi, I am using SolrJ and firing queries on Solr indexes. The index contains three fields, viz. 1. Document_id (type=integer, required=true) 2. Ticket Id (type=integer) 3. Content (type=text). The query formulation is such that I am having a query with an "AND" clause. So the query that I am firing on the index files looks like "Content: search query AND Ticket_id:123 Ticket_Id:789)". Here I am using the AND clause, which makes my job easy: I retrieve the documents having the query words in the "Content" field where the document also has the Ticket_id field (123). I know this type of query is easily fired on Lucene indexes. But when I fire the above query I am not getting the required result; the result contains documents which do not belong to the ticket id mentioned in the query. Please can anyone help me out with this issue? Thanks in advance. Regards, Sagar Khetkade
Re: solr 1.4 memory jvm
We do not have such a high update frequency, so we never encountered this problem. If it is possible to take the slave offline during autowarming, that is a good solution. --Noble On Thu, Apr 9, 2009 at 2:02 PM, sunnyfr johanna...@gmail.com wrote: Hi Noble, Yes, exactly that. I would like to know how people do this during a replication. Do they turn off servers and put a high autowarmCount, which turns off the slave for a while? In my case, that would be 10 min to bring back the new index and then maybe 10 minutes more for autowarming. Otherwise I tried to put a large mergeFactor, but I guess I have too many updates: every 30 min something like 2000 docs, and almost all segments are modified. What would you reckon? :( :) Thanks a lot, Noble Noble Paul നോബിള് नोब्ळ् wrote: So what I decipher from the numbers is that w/o queries Solr replication is not performing too badly. The queries are inherently slow and you wish to optimize the query performance itself. Am I correct? On Tue, Apr 7, 2009 at 7:50 PM, sunnyfr johanna...@gmail.com wrote: Hi, So I did two tests on two servers. First server: with just replication every 20 min, as you can notice: http://www.nabble.com/file/p22930179/cpu_without_request.png cpu_without_request.png http://www.nabble.com/file/p22930179/cpu2_without_request.jpg cpu2_without_request.jpg Second server: with one first replication and a second one during a query test between 15:32 and 15:41. During replication (checked on .../admin/replication/index.jsp) my query response time at the end was around 5000 msec. After the replication, I guess during the commit, I couldn't get an answer to my query for a long time; I refreshed my page a few minutes later. http://www.nabble.com/file/p22930179/cpu_with_request.png cpu_with_request.png http://www.nabble.com/file/p22930179/cpu2_with_request.jpg cpu2_with_request.jpg Now without replication I kept running queries on the second server, and I can't get better than 1000 msec response time and 11 requests/second. http://www.nabble.com/file/p22930179/cpu_.jpg cpu_.jpg This is my request: select?fl=id&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_ready_web:1&json.nl=map&wt=json&start=0&version=1.2&bq=status_official:1^1.5+OR+status_creative:1^1+OR+language:en^0.5&bf=recip(rord(created),1,10,10)^3+pow(stat_views,0.1)^15+pow(stat_comments,0.1)^15&rows=100&qt=dismax&qf=title_en^0.8+title^0.2+description_en^0.3+description^0.2+tags^1+owner_login^0.5 Do you have advice? Thanks Noble -- View this message in context: http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22930179.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- View this message in context: http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22966630.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: Search included in *all* fields
Or in schema.xml you can set the defaultOperator to AND:

<solrQueryParser defaultOperator="AND"/>

which applies only to the Lucene/SolrQueryParser, not dismax. Erik On Apr 13, 2009, at 10:49 PM, Ryan McKinley wrote: what about: fieldA:value1 AND fieldB:value2 this can also be written as: +fieldA:value1 +fieldB:value2 On Apr 13, 2009, at 9:53 PM, Johnny X wrote: I'll start a new thread to make things easier, because I've only really got one problem now. I've configured my Solr to search on all fields, so it will only search for a specific query in a specific field (e.g. q=Date:October will only search the 'Date' field, rather than all the others). The issue is when you build up multiple fields to search on. Only one of those has to match for a result to be returned, rather than all of them. Is there a way to change this? Cheers! -- View this message in context: http://www.nabble.com/Search-included-in-*all*-fields-tp23031829p23031829.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Use more then one document tag with Dataimporthandler ?
nope, but it is possible to have multiple root entities within a document and you can execute one at a time. --Noble On Tue, Apr 14, 2009 at 4:15 PM, gateway0 reiterwo...@yahoo.de wrote: Hi, is it possible to use more than one document tag within my data-config.xml file? Like:

<dataConfig>
  <dataSource type="JdbcDataSource" name="abc"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/my_zend_appz"
              user="root" password=""/>
  <document name="first">
    ...entities
  </document>
  <document name="second">
    ...entities
  </document>
</dataConfig>

??? kind regards, Sebastian -- View this message in context: http://www.nabble.com/Use-more-then-one-%3Cdocument%3E-tag-with-Dataimporthandler---tp23037189p23037189.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
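A hedged sketch of executing one root entity at a time, assuming the DataImportHandler is registered at /dataimport in solrconfig.xml (the entity parameter names a root entity from data-config.xml; handler path and entity name are placeholders):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class RunOneEntity {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("qt", "/dataimport");        // handler path from solrconfig.xml
        params.set("command", "full-import");
        params.set("entity", "first");          // run only this root entity
        // full-import runs asynchronously; poll /dataimport for status afterwards
        server.query(params);
    }
}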
Re: Search included in *all* fields
Cheers guys, got it working! Erik Hatcher wrote: Or in schema.xml you can set the defaultOperator to AND:

<solrQueryParser defaultOperator="AND"/>

which applies only to the Lucene/SolrQueryParser, not dismax. Erik On Apr 13, 2009, at 10:49 PM, Ryan McKinley wrote: what about: fieldA:value1 AND fieldB:value2 this can also be written as: +fieldA:value1 +fieldB:value2 On Apr 13, 2009, at 9:53 PM, Johnny X wrote: I'll start a new thread to make things easier, because I've only really got one problem now. I've configured my Solr to search on all fields, so it will only search for a specific query in a specific field (e.g. q=Date:October will only search the 'Date' field, rather than all the others). The issue is when you build up multiple fields to search on. Only one of those has to match for a result to be returned, rather than all of them. Is there a way to change this? Cheers! -- View this message in context: http://www.nabble.com/Search-included-in-*all*-fields-tp23031829p23031829.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Search-included-in-*all*-fields-tp23031829p23037645.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing txt file
On Apr 14, 2009, at 2:01 AM, Noble Paul നോബിള് नोब्ळ् wrote: what is the content of your text file? Solr does not directly index files Solr's ExtractingRequestHandler (aka Solr Cell) does index text (and Word, PDF, etc.) files directly. This is a Solr 1.4/trunk feature. Erik
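A hedged sketch of sending a file to the ExtractingRequestHandler from SolrJ, assuming a Solr 1.4/trunk-era client and that the handler is registered at /update/extract; the file name and literal.id value are placeholders:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractUpload {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ContentStreamUpdateRequest req =
                new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("notes.txt"));        // Tika detects the content type
        req.setParam("literal.id", "notes.txt");   // supply the uniqueKey explicitly
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
    }
}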
Re: Using ExtractingRequestHandler to index a large PDF ~solved
On Apr 6, 2009, at 10:16 AM, Fergus McMenemie wrote: Hmmm, not sure how this all hangs together, but editing my solrconfig.xml as follows sorted the problem:

<requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />

to

<requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="20048" />

We should document this on the wiki or in the config, if it isn't already. As best I could tell it is not documented. I stumbled across the idea of changing multipartUploadLimitInKB after reviewing http://wiki.apache.org/solr/UpdateRichDocuments. But this leads on to wondering whether streaming files from a local disk is in some way also available via enableRemoteStreaming for the solr-cell feature? With 20:20 hindsight I see that http://wiki.apache.org/solr/SolrConfigXml does briefly refer to file upload size. I feel that the requestDispatcher section of solrconfig.xml needs a more complete description. I get the impression it acts as a filter on *any* URL sent to Solr? What does it do? I will mark up the wiki when this is clarified. Also, my initial report of the issue was misled by the log messages. The mention of oceania.pdf refers to a previous successful Tika extract. There is no mention in the logs of the filename that was rejected, or any information that would help me identify it! We should fix this so it at least spits out a meaningful message. Can you open a JIRA? OK, SOLR-1113 raised. Regards Fergus. Sorry if this is a FAQ; I suspect it could be. But how do I work around the following:

INFO: [] webapp=/apache-solr-1.4-dev path=/update/extract params={ext.def.fl=text&ext.literal.id=factbook/reference_maps/pdf/oceania.pdf} status=0 QTime=318
Apr 2, 2009 11:17:46 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException: the request was rejected because its size (4585774) exceeds the configured maximum (2097152)
        at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:914)
        at org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
        at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
        at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
        at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343)
        at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396)
        at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)

Although the PDF is big, it contains very little text; it is a map. java -jar solr/lib/tika-0.3.jar -g appears to have no bother with it. Fergus...
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
Customizing solr with my lucene
hey, I am trying to modify the Lucene code by adding payload functionality to it. Now, if I want to use this Lucene with Solr, what should I do? I have added it to the lib folder of solr.war, replacing the old Lucene. Is this enough? Also, I am using a different schema from the default schema.xml used by Solr (added some fields and removed some of the previous ones). The problem I am facing is that Solr is not returning results, but Lucene on its own is, for the same query. Could you help me with this? Any ideas and suggestions? -- View this message in context: http://www.nabble.com/Customizing-solr-with-my-lucene-tp23038007p23038007.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Maintaining XML Layout
Pre tag fixed it instantly! Thanks! Shalin Shekhar Mangar wrote: On Tue, Apr 14, 2009 at 4:56 PM, Johnny X jonathanwel...@gmail.com wrote: Hey, One of the fields returned from my queries (Content) is essentially the body of an e-mail. However, it's returned as one long stream of text (or at least, that's how it appears on the web page). Viewing the source of the page, it appears with the right layout characteristics (paragraphs, name at end of message separate from main message, etc.). Is there any way of making it appear this way on the web page, or is this just a browser-specific thing? I think you'd need to convert line break characters in the returned string into equivalent html tags yourself before displaying. You could also try displaying them in a 'pre' tag and see if it looks ok. -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/Maintaining-XML-Layout-tp23037698p23038026.html Sent from the Solr - User mailing list archive at Nabble.com.
Maintaining XML Layout
Hey, One of the fields returned from my queries (Content) is essentially the body of an e-mail. However, it's returned as one long stream of text (or at least, that's how it appears on the web page). Viewing the source of the page, it appears with the right layout characteristics (paragraphs, name at end of message separate from main message, etc.). Is there any way of making it appear this way on the web page, or is this just a browser-specific thing? Cheers! -- View this message in context: http://www.nabble.com/Maintaining-XML-Layout-tp23037698p23037698.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Maintaining XML Layout
On Tue, Apr 14, 2009 at 4:56 PM, Johnny X jonathanwel...@gmail.com wrote: Hey, One of the fields returned from my queries (Content) is essentially the body of an e-mail. However, it's returned as one long stream of text (or at least, that's how it appears on the web page). Viewing the source of the page, it appears with the right layout characteristics (paragraphs, name at end of message separate from main message, etc.). Is there any way of making it appear this way on the web page, or is this just a browser-specific thing? I think you'd need to convert line break characters in the returned string into equivalent html tags yourself before displaying. You could also try displaying them in a 'pre' tag and see if it looks ok. -- Regards, Shalin Shekhar Mangar.
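A minimal sketch of the line-break conversion Shalin describes; the HTML-escaping step is an extra precaution added here, not something from the thread:

public class EmailHtml {
    // Escape markup-sensitive characters first, then turn newlines into <br/>
    // so the e-mail body keeps its layout when rendered as HTML.
    public static String toHtml(String content) {
        String escaped = content.replace("&", "&amp;")
                                .replace("<", "&lt;")
                                .replace(">", "&gt;");
        return escaped.replace("\r\n", "<br/>").replace("\n", "<br/>");
    }

    public static void main(String[] args) {
        System.out.println(toHtml("Hi team,\n\nSee attached.\n\nBob"));
    }
}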
Use more then one document tag with Dataimporthandler ?
Hi, is it possible to use more than one document tag within my data-config.xml file? Like:

<dataConfig>
  <dataSource type="JdbcDataSource" name="abc"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/my_zend_appz"
              user="root" password=""/>
  <document name="first">
    ...entities
  </document>
  <document name="second">
    ...entities
  </document>
</dataConfig>

??? kind regards, Sebastian -- View this message in context: http://www.nabble.com/Use-more-then-one-%3Cdocument%3E-tag-with-Dataimporthandler---tp23037189p23037189.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Random queries extremely slow
Hi Oleg, did you find a way to get past this issue? Thanks a lot. oleg_gnatovskiy wrote: Can you expand on this? Mirroring delay on what? zayhen wrote: Use multiple boxes, with a mirroring delay from one to another, like a pipeline. 2009/1/22 oleg_gnatovskiy oleg_gnatovs...@citysearch.com Well, this probably isn't the cause of our random slow queries, but it might be the cause of the slow queries after pulling a new index. Is there anything we could do to reduce the performance hit we take from this happening? Otis Gospodnetic wrote: Here is one example: pushing a large newly optimized index onto the server. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: oleg_gnatovskiy oleg_gnatovs...@citysearch.com To: solr-user@lucene.apache.org Sent: Thursday, January 22, 2009 2:22:51 PM Subject: Re: Random queries extremely slow What are some things that could happen to force files out of the cache on a Linux machine? I don't know what kinds of events to look for... yonik wrote: On Thu, Jan 22, 2009 at 1:46 PM, oleg_gnatovskiy wrote: Hello. Our production servers are operating relatively smoothly most of the time, running Solr with 19 million listings. However, every once in a while the same query that used to take 100 milliseconds takes 6000. Anything else happening on the system that may have forced some of the index files out of the operating system disk cache at these times? -Yonik -- View this message in context: http://www.nabble.com/Random-queries-extremely-slow-tp21610568p21611240.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Random-queries-extremely-slow-tp21610568p21611454.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alexander Ramos Jardim - RPG da Ilha -- View this message in context: http://www.nabble.com/Random-queries-extremely-slow-tp21610568p23039151.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boolean query in Solr
On Apr 14, 2009, at 5:38 AM, Sagar Khetkade wrote: Hi, I am using SolrJ and firing queries on Solr indexes. The index contains three fields, viz. 1. Document_id (type=integer, required=true) 2. Ticket Id (type=integer) 3. Content (type=text). The query formulation is such that I am having a query with an "AND" clause. So the query that I am firing on the index files looks like "Content: search query AND Ticket_id:123 Ticket_Id:789)". That query is invalid query parser syntax, with an unopened paren first of all. I assume that's a typo though. Be careful how you construct queries with field selectors. Saying: Content:search query does NOT necessarily mean that the term "query" is being searched in the Content field, as that depends on your default field setting for the query parser. This, however, does use the Content field for both terms: Content:(search query) I know this type of query is easily fired on Lucene indexes. But when I fire the above query I am not getting the required result; the result contains documents which do not belong to the ticket id mentioned in the query. Please can anyone help me out with this issue? What does the query parse to with debugQuery output? That's mighty informative info. Erik
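A hedged SolrJ sketch of the grouped form Erik describes, using the field names from this thread (the OR between ticket ids is one guess at the intent; debugQuery echoes the parsed query so the grouping can be verified):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class TicketQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Parentheses keep every term bound to the intended field.
        SolrQuery q = new SolrQuery(
                "Content:(search query) AND Ticket_Id:(123 OR 789)");
        q.setParam("debugQuery", true);  // echoes the parsed query in the response
        System.out.println(server.query(q).getDebugMap());
    }
}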
Re: Customizing solr with my lucene
What is the query parsed to? Add debugQuery=true to your Solr request and let us know what the query parses to. As for whether upgrading a Lucene library is sufficient... it depends on what Solr version you're starting with (payload support is already in all recent versions of Solr's Lucene JARs), what has changed in Lucene since, and whether you're expecting an existing index to work or are rebuilding it from scratch. Erik On Apr 14, 2009, at 7:51 AM, mirage1987 wrote: hey, I am trying to modify the Lucene code by adding payload functionality to it. Now, if I want to use this Lucene with Solr, what should I do? I have added it to the lib folder of solr.war, replacing the old Lucene. Is this enough? Also, I am using a different schema from the default schema.xml used by Solr (added some fields and removed some of the previous ones). The problem I am facing is that Solr is not returning results, but Lucene on its own is, for the same query. Could you help me with this? Any ideas and suggestions? -- View this message in context: http://www.nabble.com/Customizing-solr-with-my-lucene-tp23038007p23038007.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: synchronizing slave indexes in distributing collections
Hi, I would like to know where you are with your script that takes the slave out of the load balancer. I have no choice but to do that during updates on the slave server. Thanks, Yu-Hui Jin wrote: Thanks, guys. Glad to know the scripts work very well in your experience. (Well, indeed they are quite simple.) So that's how I imagine we should do it, except that you guys added a very good point: that the monitoring system can invoke a script to take the slave out of the load balancer. I'd like to implement this idea. Cheers, -Hui On 8/17/07, Bill Au bill.w...@gmail.com wrote: If snapinstaller fails to install the latest snapshot, then chances are that it would not be able to install any earlier snapshots either. All it does is some very simple filesystem operations and then invoke the Solr server to do a commit. I agree with Chris that the best thing to do is to take it out of rotation and fix the underlying problem. Bill On 8/17/07, Chris Hostetter hossman_luc...@fucit.org wrote: : So looks like all we can do is it monitoring the logs and alarm people to : fix the issue and rerun the scripts, etc. whenever failures occur. Is that : the correct understanding? I have *never* seen snappuller or snapinstaller fail (except during an initial rollout of Solr when i forgot to setup the neccessary ssh keys). I suppose we could add an option to snapinstaller to support explicitly installing a snapshot by name ... then if you detect that slave Z didn't load the latest snapshot, you could always tell the other slaves to snapinstall whatever older version slave Z is still using -- but frankly that seems a little silly -- not to mention that if you couldn't load the snapshot into Z, odds are Z isn't responding to queries either. a better course of action might just be to have an automated system which monitors the distribution status info on the master, and takes any slaves that don't update it properly out of your load balancer's rotation (and notifies people to look into it) -Hoss -- Regards, -Hui -- View this message in context: http://www.nabble.com/synchronizing-slave-indexes-in-distributing-collections-tp12194297p23039732.html Sent from the Solr - User mailing list archive at Nabble.com.
Disable logging in SOLR
Hi, is there a way to disable all logging output in Solr? I mean output text like:

INFO: [core_de] webapp=/solr path=/update params={wt=json} status=0 QTime=3736

greets -Ralf-
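A hedged sketch under the assumption of a Solr 1.3-style deployment, which logs through JDK logging (java.util.logging): raising the log level, either in a logging.properties file or programmatically, silences the output. Solr 1.4 routes through SLF4J instead, where the equivalent is done in the bound logging framework's configuration.

import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class SilenceSolr {
    public static void silence() {
        // Turn off everything under the Solr package...
        Logger.getLogger("org.apache.solr").setLevel(Level.OFF);
        // ...or silence the root logger entirely.
        LogManager.getLogManager().getLogger("").setLevel(Level.OFF);
    }
}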
RE: Term Counts/Term Frequency Vector Info
Grant, This works:

String url = "http://localhost:8983/solr";
SolrServer server = new CommonsHttpSolrServer(url);
SolrQuery query = new SolrQuery();
query.setQueryType("/autoSuggest");
query.setParam("terms", true);
query.setParam("terms.fl", "CONTENTS");
query.setParam("terms.lower", "london");
query.setParam("terms.upper", "london");
query.setParam("terms.upper.incl", true);

For the query: http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=london&terms.upper=london&terms.upper.incl=true It turned out that I was missing the leading / in /autoSuggest. This needs to be explicit in the documentation. Thanks! Clay -----Original Message----- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Monday, April 13, 2009 3:15 PM To: solr-user@lucene.apache.org Subject: Re: Term Counts/Term Frequency Vector Info Sorry, should have added that you should set the qt param: http://wiki.apache.org/solr/CoreQueryParameters#head-2c940d42ec4f2a74c5d251f12f4077e53f2f00f4 -Grant On Apr 13, 2009, at 1:35 PM, Fink, Clayton R. wrote: The query method seems to only support solr/select requests. I subclassed SolrRequest and created a request class that supports solr/autoSuggest, following the pattern in LukeRequest. It seems to work fine for me. Clay -----Original Message----- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, April 07, 2009 10:41 PM To: solr-user@lucene.apache.org Subject: Re: Term Counts/Term Frequency Vector Info You can send arbitrary requests via SolrJ, just use the parameter map via the query method: http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/SolrServer.html -Grant On Apr 7, 2009, at 1:52 PM, Fink, Clayton R. wrote: These URLs give me what I want - word completion and term counts. What I don't see is a way to call these via SolrJ. I could call the server directly using java.net classes and process the XML myself, I guess. There needs to be an auto suggest request class.

http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=Lond&terms.prefix=Lon&indent=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms">
    <lst name="CONTENTS">
      <int name="London">11</int>
      <int name="Londoners">2</int>
    </lst>
  </lst>
</response>

http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=London&terms.upper=London&terms.upper.incl=true&indent=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms">
    <lst name="CONTENTS">
      <int name="London">11</int>
    </lst>
  </lst>
</response>

-----Original Message----- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Monday, April 06, 2009 5:43 PM To: solr-user@lucene.apache.org Subject: Re: Term Counts/Term Frequency Vector Info See also http://wiki.apache.org/solr/TermsComponent You might be able to apply these patches to 1.3 and have them work, but there is no guarantee. You also can get some termDocs-like capabilities through Solr's faceting capabilities, but I am not aware of any way to get at the term vector capabilities. HTH, Grant On Apr 6, 2009, at 1:49 PM, Fink, Clayton R. wrote: I want the functionality that Lucene IndexReader.termDocs gives me. That, or access at the document level to the term vector. This (http://wiki.apache.org/solr/TermVectorComponent?highlight=(term)|(vector)) seems to suggest that this will be available in 1.4. Is there any way to do this in 1.3?
Thanks, Clay -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
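Once the request works, the terms section of the response can be walked as a generic NamedList tree. A hedged sketch reusing the /autoSuggest handler and CONTENTS field from Clay's example (whether a typed helper for terms output exists in this Solr version is left aside, so this sticks to the raw NamedList API):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class ReadTermCounts {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery();
        query.setQueryType("/autoSuggest");
        query.setParam("terms", true);
        query.setParam("terms.fl", "CONTENTS");
        query.setParam("terms.prefix", "Lon");

        QueryResponse rsp = server.query(query);
        // The terms section mirrors the XML shown above: terms -> field -> counts.
        NamedList<?> terms = (NamedList<?>) rsp.getResponse().get("terms");
        NamedList<?> counts = (NamedList<?>) terms.get("CONTENTS");
        for (int i = 0; i < counts.size(); i++) {
            System.out.println(counts.getName(i) + " -> " + counts.getVal(i));
        }
    }
}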
Embedded Solr weird behaviour
Hello, I am using both the Solr server and Solr embedded versions in the same context. I am using the Solr server for indexing data which can be accessed at the enterprise level, and the embedded version in a desktop application. The idea is that both index the same data and have the same schema.xml and config. My problem: when querying both versions I get different results for this case: query=adventure AND category:Publishing Industry Please note that 'Publishing Industry' is actually composed of two words. For the server version it works very well; for the embedded version, I get no result. In this case: query=adventure AND category:Book I get correct results with both versions. category is a field type in my schema. I noticed that when I have something like AND category:'composed words', the embedded version fails. In the schema I tried making the category fieldType text, string, etc., but no results. Any suggestion would be much appreciated. Thanks, Adrian
Re: Help with relevance failure in Solr 1.3
Dang, had another server do this. Syncing and committing a new index does not fix it. The two servers show the same bad results. wunder On 4/11/09 9:12 AM, Walter Underwood wunderw...@netflix.com wrote: Restarting Solr fixes it. If I remember correctly, a sync and commit does not fix it. I have disabled snappuller this time, so I can study the broken instance. wunder On 4/11/09 5:03 AM, Grant Ingersoll gsing...@apache.org wrote: On Apr 10, 2009, at 5:50 PM, Walter Underwood wrote: Normally, both changeling and the changeling work fine. This one server is misbehaving like this for all multi-term queries. Yes, it is VERY weird that the term changeling does not show up in the explain. A server will occasionally go bad and stay in that state. In one case, two servers went bad and both gave the same wrong results. What's the solution for when they go bad? Do you have to restart Solr or reboot or what? Here is the dismax config. groups means movies. The title* fields are stemmed and stopped, the exact* fields are not.

<!-- groups and people -->
<requestHandler name="groups_people" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">none</str>
    <float name="tie">0.01</float>
    <str name="qf">exact^6.0 exact_alt^6.0 exact_base~jw_0.7_1^8.0 exact_alias^8.0 title^3.0 title_alt^3.0 title_base^4.0</str>
    <str name="pf">exact^9.0 exact_alt^9.0 exact_base^12.0 exact_alias^12.0 title^3.0 title_alt^4.0 title_base^6.0</str>
    <str name="bf">search_popularity^100.0</str>
    <str name="mm">1</str>
    <int name="ps">100</int>
    <str name="fl">id,type,movieid,personid,genreid</str>
  </lst>
  <lst name="appends">
    <str name="fq">type:group OR type:person</str>
  </lst>
</requestHandler>

wunder On 4/10/09 12:51 PM, Grant Ingersoll gsing...@apache.org wrote: On Apr 10, 2009, at 1:56 PM, Walter Underwood wrote: We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and I would appreciate any ideas. Occasionally, a server will start returning results with really poor relevance. Single-term queries work fine, but multi-term queries are scored based on the most common term (lowest IDF). I don't see anything in the logs when this happens. We have a monitor doing a search for the 100 most popular movies once per minute to catch this, so we know when it was first detected. I'm attaching two explain outputs, one for the query changeling and one for the changeling. I'm not sure what exactly you are asking, so bear with me... Are you saying that the changeling normally returns results just fine and then periodically it will go bad, or are you saying you don't understand why the changeling scores differently from changeling? In looking at the explains, it is weird that in the 'the changeling' case, the term changeling doesn't even show up as a term. Can you share your dismax configuration? That will be easier to parse than trying to make sense of the debug query parsing. -Grant -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Memory usage
Could you give us a dump of http://localhost:port/solr/admin/luke ? A huge max field length and random terms in 2000 2 MB files is going to be a bit of a resource hog :) Can you explain why you are doing that? You will have *so* many unique terms... I can't remember if you can set it in Solr, but there is a way to lessen how much RAM terms take in Lucene (the term index interval, I believe?). - Mark Gargate, Siddharth wrote: Hi all, I am testing indexing with 2000 text documents of size 2 MB each. These documents contain words created with random characters. I observed that the Tomcat memory usage goes on increasing slowly. I tried removing all the cache configuration, but memory usage still increases. Once the memory reaches the max heap specified, commit appears to be blocked until the memory is freed. With larger documents, I see some OOMEs. Below are a few properties set in solrconfig.xml:

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>25</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>2147483647</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>single</lockType>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>

<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>7000</maxTime>
</autoCommit>

<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>10</maxWarmingSearchers>

Where does the memory get used? And how can I avoid it? Thanks, Siddharth -- - Mark http://www.lucidimagination.com
Re: Help with relevance failure in Solr 1.3
It just occurred to me that a query cache issue could potentially cause this... if it's caching it would most likely be a query.equals() implementation incorrectly returning true. Perhaps check the JaroWinkler.equals() first? Also, when one server starts to return bad results, have you tried using explainOther=id:id_of_other_doc_that_should_score_higher? -Yonik http://www.lucidimagination.com On Tue, Apr 14, 2009 at 11:43 AM, Walter Underwood wunderw...@netflix.com wrote: Dang, had another server do this. Syncing and committing a new index does not fix it. The two servers show the same bad results. wunder On 4/11/09 9:12 AM, Walter Underwood wunderw...@netflix.com wrote: Restarting Solr fixes it. If I remember correctly, a sync and commit does not fix it. I have disabled snappuller this time, so I can study the broken instance. wunder
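A sketch of the equals()/hashCode() contract Yonik is pointing at, for a hypothetical custom Query (the actual JaroWinkler query is Walter's own code, so the class and fields here are invented for illustration): every parameter that affects matching or scoring, plus the boost, has to participate, or the query result cache can return hits cached for a different query.

import org.apache.lucene.search.Query;

public class JaroWinklerQuery extends Query {
    private final String field;
    private final String term;
    private final float threshold;

    public JaroWinklerQuery(String field, String term, float threshold) {
        this.field = field;
        this.term = term;
        this.threshold = threshold;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof JaroWinklerQuery)) return false;
        JaroWinklerQuery q = (JaroWinklerQuery) o;
        // Compare every scoring-relevant field; a too-loose equals() here
        // is exactly the cache-poisoning failure mode described above.
        return field.equals(q.field) && term.equals(q.term)
                && threshold == q.threshold
                && getBoost() == q.getBoost();
    }

    @Override
    public int hashCode() {
        return field.hashCode() ^ term.hashCode()
                ^ Float.floatToIntBits(threshold)
                ^ Float.floatToIntBits(getBoost());
    }

    @Override
    public String toString(String defaultField) {
        return field + ":" + term + "~jw_" + threshold;
    }
}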
Re: indexing txt file
Hi all, I'm trying to use Solr 1.3 to index a text file. I wrote a schema.xsd and an xml file. *The content of my text file is*

#src            dst             proto  ok  sport  dport  pkts  bytes  flows  first                 latest
192.168.220.135 26.147.238.146  6      1   32839  80     6     463    1      1237333861.465764000  1237333861.664701000

*schema file is*

<?xml version="1.0" encoding="UTF-8"?>
<!--W3C Schema generated by XMLSpy v2009 sp1 (http://www.altova.com)-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="networkTraffic">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="packet" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="terminationTimestamp" type="xs:string" use="required"/>
            <xs:attribute name="sourcePort" type="xs:string" use="required"/>
            <xs:attribute name="sourceIp" type="xs:string" use="required"/>
            <xs:attribute name="protocolPortNumber" type="xs:string" use="required"/>
            <xs:attribute name="packets" type="xs:string" use="required"/>
            <xs:attribute name="ok" type="xs:string" use="required"/>
            <xs:attribute name="initialTimestamp" type="xs:string" use="required"/>
            <xs:attribute name="flows" type="xs:string" use="required"/>
            <xs:attribute name="destinatoinIp" type="xs:string" use="required"/>
            <xs:attribute name="destinationPort" type="xs:string" use="required"/>
            <xs:attribute name="bytes" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

*and my xml file is*

<?xml version="1.0" encoding="UTF-8"?>
<networkTraffic xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="C:\DOCUME~1\tpham\Desktop\networkTraffic.xsd">
  <packet sourceIp="192.168.54.23" destinatoinIp="192.168.0.1" protocolPortNumber="6" ok="1" sourcePort="32439" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.56.23" destinatoinIp="192.168.0.1" protocolPortNumber="17" ok="1" sourcePort="32439" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.74.23" destinatoinIp="192.168.0.1" protocolPortNumber="6" ok="1" sourcePort="32139" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.54.123" destinatoinIp="192.168.0.1" protocolPortNumber="6" ok="1" sourcePort="32839" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.14.23" destinatoinIp="192.168.0.1" protocolPortNumber="17" ok="1" sourcePort="32839" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.5.23" destinatoinIp="192.168.0.1" protocolPortNumber="17" ok="1" sourcePort="32439" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.15.23" destinatoinIp="192.168.0.1" protocolPortNumber="6" ok="1" sourcePort="36839" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.24.23" destinatoinIp="192.168.0.1" protocolPortNumber="6" ok="1" sourcePort="32839" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
</networkTraffic>

Can someone please show me where do I put these files? I'm aware that the schema.xsd file goes into the conf directory. What about my xml file and txt file? Thank you, Alex On Tue, Apr 14, 2009 at 12:37 AM, Alejandro Gonzalez alejandrogonzalezd...@gmail.com wrote: you should construct the xml containing the fields defined in your schema.xml and give them the values from the text files. For example, if you have a schema defining two fields, title and text, you should construct an xml with a field title and its value and another called text containing the body of your doc. Then you can post it to the Solr you have deployed and make a commit, and it's done. It's possible to construct an xml defining more than just a doc:

<add>
  <doc>
    <field name="title">doc1 title</field>
    <field name="text">doc1 text</field>
  </doc>
  ...
  <doc>
    <field name="title">docn title</field>
    <field name="text">docn text</field>
  </doc>
</add>

2009/4/14 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com what is the content of your text file? Solr does not directly index files --Noble On Tue, Apr 14, 2009 at 3:54 AM,
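One way to get the raw flow records into Solr is to parse each whitespace-delimited line in client code and post the fields with SolrJ. A hedged sketch, assuming fields named after the attributes in Alex's xml file (keeping the destinatoinIp spelling) have been defined in schema.xml; the sample line is the reconstructed record from the text file above:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FlowIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        String line = "192.168.220.135 26.147.238.146 6 1 32839 80 6 463 1 "
                + "1237333861.465764000 1237333861.664701000";
        String[] c = line.trim().split("\\s+");  // columns in header order
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("sourceIp", c[0]);
        doc.addField("destinatoinIp", c[1]);     // spelling matches the schema
        doc.addField("protocolPortNumber", c[2]);
        doc.addField("ok", c[3]);
        doc.addField("sourcePort", c[4]);
        doc.addField("destinationPort", c[5]);
        doc.addField("packets", c[6]);
        doc.addField("bytes", c[7]);
        doc.addField("flows", c[8]);
        doc.addField("initialTimestamp", c[9]);
        doc.addField("terminationTimestamp", c[10]);
        server.add(doc);
        server.commit();
    }
}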
DIH uniqueKey
Hi, I have separate JDBC datasources (DS1, DS2) that I want to index with DIH in a single Solr instance. The unique keys for the two sources are different. Do I have to synthesize a uniqueKey that spans both datasources? Something like this? That is, the uniqueKey values will be like (+ indicating concatenation): DS1 + primary key for DS1, DS2 + primary key for DS2. Thanks - ashok -- View this message in context: http://www.nabble.com/DIH---uniqueKey-tp23042732p23042732.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Help with relevance failure in Solr 1.3
The JaroWinkler equals was broken, but I fixed that a month ago. Query cache sounds possible, but those are cleared on a commit, right? I could run with a cache size of 0, since our middle tier HTTP cache is leaving almost nothing for the caches to do. I'll try that explain. The stored fields for the correct doc are fine, because I can see them when I use a single-term query. The indexed fields seem OK, because that query works. wunder On 4/14/09 9:11 AM, Yonik Seeley yo...@lucidimagination.com wrote: It just occurred to me that a query cache issue could potentially cause this... if it's caching it would most likely be a query.equals() implementation incorrectly returning true. Perhaps check the JaroWinkler.equals() first? Also, when one server starts to return bad results, have you tried using explainOther=id:id_of_other_doc_that_should_score_higher? -Yonik http://www.lucidimagination.com On Tue, Apr 14, 2009 at 11:43 AM, Walter Underwood wunderw...@netflix.com wrote: Dang, had another server do this. Syncing and committing a new index does not fix it. The two servers show the same bad results. wunder On 4/11/09 9:12 AM, Walter Underwood wunderw...@netflix.com wrote: Restarting Solr fixes it. If I remember correctly, a sync and commit does not fix it. I have disabled snappuller this time, so I can study the broken instance. wunder
Re: indexing txt file
now you should post (HTTP POST) your xml file to the URL where you have deployed Solr (the schema must be in the conf folder). Don't forget to post a commit command after that or you won't see the results. The commit command is just a small xml message like this: <commit/>
On Tue, Apr 14, 2009 at 6:14 PM, Alex Vu alex.v...@gmail.com wrote: Hi all, I'm trying to use solr1.3 and trying to index a text file. I wrote a schema.xsd and an xml file.
*The content of my text file is*
#src dst proto ok sport dport pkts bytes flows first latest
192.168.220.135 26.147.238.146 6 1 32839 80 6 463 1 1237333861.465764000 1237333861.664701000
*schema file is*
<?xml version="1.0" encoding="UTF-8"?>
<!--W3C Schema generated by XMLSpy v2009 sp1 (http://www.altova.com)-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="networkTraffic">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="packet" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="terminationTimestamp" type="xs:string" use="required"/>
            <xs:attribute name="sourcePort" type="xs:string" use="required"/>
            <xs:attribute name="sourceIp" type="xs:string" use="required"/>
            <xs:attribute name="protocolPortNumber" type="xs:string" use="required"/>
            <xs:attribute name="packets" type="xs:string" use="required"/>
            <xs:attribute name="ok" type="xs:string" use="required"/>
            <xs:attribute name="initialTimestamp" type="xs:string" use="required"/>
            <xs:attribute name="flows" type="xs:string" use="required"/>
            <xs:attribute name="destinatoinIp" type="xs:string" use="required"/>
            <xs:attribute name="destinationPort" type="xs:string" use="required"/>
            <xs:attribute name="bytes" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
*and my xml file is*
<?xml version="1.0" encoding="UTF-8"?>
<networkTraffic xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="C:\DOCUME~1\tpham\Desktop\networkTraffic.xsd">
  <packet sourceIp="192.168.54.23" destinatoinIp="192.168.0.1" protocolPortNumber="6" ok="1" sourcePort="32439" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.56.23" destinatoinIp="192.168.0.1" protocolPortNumber="17" ok="1" sourcePort="32439" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.74.23" destinatoinIp="192.168.0.1" protocolPortNumber="6" ok="1" sourcePort="32139" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.54.123" destinatoinIp="192.168.0.1" protocolPortNumber="6" ok="1" sourcePort="32839" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.14.23" destinatoinIp="192.168.0.1" protocolPortNumber="17" ok="1" sourcePort="32839" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.5.23" destinatoinIp="192.168.0.1" protocolPortNumber="17" ok="1" sourcePort="32439" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.15.23" destinatoinIp="192.168.0.1" protocolPortNumber="6" ok="1" sourcePort="36839" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
  <packet sourceIp="192.168.24.23" destinatoinIp="192.168.0.1" protocolPortNumber="6" ok="1" sourcePort="32839" destinationPort="80" packets="6" bytes="463" flows="1" initialTimestamp="1237963861.465764000" terminationTimestamp="1237963861.664701000"/>
</networkTraffic>
Can someone please show me where do I put these files? I'm aware that the schema.xsd file goes into the directory conf. What about my xml file, and txt file? Thank you, Alex
On Tue, Apr 14, 2009 at 12:37 AM, Alejandro Gonzalez alejandrogonzalezd...@gmail.com wrote: you should construct the xml containing the fields defined in your schema.xml and give them the values from the text files. for example if you have a schema defining two fields title and text you should construct an xml with a field title and its value and another called text containing the body of your doc. then you can post it to the Solr you have deployed and make a commit and it's done. it's possible
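For reference, the message you post uses Solr's XML update format (not the XSD above); the field names below are hypothetical and would have to match fields defined in your schema.xml:
<add>
  <doc>
    <field name="id">packet-1</field>
    <field name="sourceIp">192.168.54.23</field>
    <field name="destinationPort">80</field>
  </doc>
</add>
followed by a second POST of <commit/> to the same update URL.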
Re: indexing txt file
what about the text file? On Tue, Apr 14, 2009 at 9:23 AM, Alejandro Gonzalez alejandrogonzalezd...@gmail.com wrote: now you should post (HTTP POST) your xml file to the URL where you have deployed Solr (the schema must be in the conf folder). Don't forget to post a commit command after that or you won't see the results. The commit command is just a small xml message like this: <commit/> [rest of quoted thread, including the schema and sample xml above, snipped]
Re: indexing txt file
On Tue, Apr 14, 2009 at 9:44 PM, Alex Vu alex.v...@gmail.com wrote: *schema file is* [quoted xsd snipped] Can someone please show me where do I put these files? I'm aware that the schema.xsd file goes into the directory conf. What about my xml file, and txt file? Alex, the Solr schema is not the usual XML Schema (xsd). It is an xml file which describes the fields, their analyzers, tokenizers, copyFields, default search field etc. Look into the example schema supplied by Solr (inside the example/solr/conf directory) and modify it according to your needs. -- Regards, Shalin Shekhar Mangar.
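As a rough sketch (not a drop-in file), field declarations in schema.xml for the packet data might look like the following, assuming the stock string and slong types from the example schema:
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="sourceIp" type="string" indexed="true" stored="true"/>
  <field name="destinationPort" type="string" indexed="true" stored="true"/>
  <field name="bytes" type="slong" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>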
Re: indexing txt file
and I'm not sure I understand what you are trying to do, but maybe you should define a text field and fill it with the text in each file to index the text in them, or maybe a path to that file if that's what you want. On Tue, Apr 14, 2009 at 6:28 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Apr 14, 2009 at 9:44 PM, Alex Vu alex.v...@gmail.com wrote: *schema file is* [quoted xsd snipped] Can someone please show me where do I put these files? I'm aware that the schema.xsd file goes into the directory conf. What about my xml file, and txt file? Alex, the Solr schema is not the usual XML Schema (xsd). It is an xml file which describes the fields, their analyzers, tokenizers, copyFields, default search field etc. Look into the example schema supplied by Solr (inside the example/solr/conf directory) and modify it according to your needs. -- Regards, Shalin Shekhar Mangar.
Re: indexing txt file
I also wrote another schema file based on the one supplied by Solr, and I do have some questions.
*The content of my text file is*
#src dst proto ok sport dport pkts bytes flows first latest
192.168.220.135 26.147.238.146 6 1 32839 80 6 463 1 1237333861.465764000 1237333861.664701000
I chose my:
*1. fieldType to be*: tint, tfloat, tlong, tdouble
*2. tokenizer class*: solr.WhitespaceTokenizerFactory, solr.StandardTokenizerFactory, solr.HTMLStripWhitespaceTokenizerFactory
*3. filter class*: solr.LengthFilterFactory, solr.TrimFilterFactory
*4. field name*: src, dst, proto, ok, sport, dport, pkts, bytes, flows, first, and latest
*5. uniqueKey*: src, dst
Are these modifications appropriate for my text file? Also, if I put this schema.xml file in conf, what do I do with my text file? Thank you, Nga P.
On Tue, Apr 14, 2009 at 9:28 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: [reply quoted in full, snipped]
Re: Random queries extremely slow
It was actually our use of the field collapse patch. Once we disabled this, the random slow queries went away. We also added *:* as a warmup query in order to speed up performance after indexing.
sunnyfr wrote: Hi Oleg, did you find a way to get past this issue? thanks a lot,
oleg_gnatovskiy wrote: Can you expand on this? Mirroring delay on what?
zayhen wrote: Use multiple boxes, with a mirroring delay from one to another, like a pipeline.
2009/1/22 oleg_gnatovskiy oleg_gnatovs...@citysearch.com: Well this probably isn't the cause of our random slow queries, but might be the cause of the slow queries after pulling a new index. Is there anything we could do to reduce the performance hit we take from this happening?
Otis Gospodnetic wrote: Here is one example: pushing a large newly optimized index onto the server. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----- From: oleg_gnatovskiy oleg_gnatovs...@citysearch.com To: solr-user@lucene.apache.org Sent: Thursday, January 22, 2009 2:22:51 PM Subject: Re: Random queries extremely slow
What are some things that could happen to force files out of the cache on a Linux machine? I don't know what kinds of events to look for...
yonik wrote: On Thu, Jan 22, 2009 at 1:46 PM, oleg_gnatovskiy wrote: Hello. Our production servers are operating relatively smoothly most of the time running Solr with 19 million listings. However every once in a while the same query that used to take 100 milliseconds takes 6000.
Anything else happening on the system that may have forced some of the index files out of operating system disk cache at these times? -Yonik
-- Alexander Ramos Jardim - RPG da Ilha
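For anyone looking for the mechanics: warmup queries like the *:* mentioned above are registered as listeners in solrconfig.xml, along these lines (the rows value here is illustrative):
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>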
Re: DIH uniqueKey
use TemplateTransformer to create a key
On Tue, Apr 14, 2009 at 9:49 PM, ashokc ash...@qualcomm.com wrote: Hi, I have separate JDBC datasources (DS1 and DS2) that I want to index with DIH in a single Solr instance. The unique record keys for the two sources are different. Do I have to synthesize a uniqueKey that spans both datasources? Something like this, that is, uniqueKey values like (+ indicating concatenation):
DS1 + primary key for DS1
DS2 + primary key for DS2
Thanks - ashok
-- --Noble Paul
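A sketch of what this looks like in data-config.xml; the entity, column, and template values here are illustrative and untested:
<entity name="item" dataSource="DS1" transformer="TemplateTransformer" query="select id, title from items">
  <field column="uid" template="DS1-${item.id}"/>
  <field column="title" name="title"/>
</entity>
with uniqueKey in schema.xml pointing at the uid field, and a matching DS2 entity using a DS2- prefix.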
Re: indexing txt file
I just want to be able to index my text file, and other files that carry the same format but with different IP addresses, ports, etc. I will have the traffic flow running in real-time. Do you think Solr will be able to index a bunch of my text files in real time? On Tue, Apr 14, 2009 at 9:35 AM, Alejandro Gonzalez alejandrogonzalezd...@gmail.com wrote: and I'm not sure I understand what you are trying to do, but maybe you should define a text field and fill it with the text in each file to index the text in them, or maybe a path to that file if that's what you want. [rest of quoted thread snipped]
Using Solr from AppEngine application via SolrJ: any problematic issues?
I was wondering if those more up on SolrJ internals could take a look if there were any serious gotchas with the AppEngine's Java urlfetch with respect to SolrJ. http://code.google.com/appengine/docs/java/urlfetch/overview.html The URL must use the standard ports for HTTP (80) and HTTPS (443). The port is implied by the scheme, but may also be mentioned in the URL as long as the port is standard for the scheme (https://...:443/). An app cannot connect to an arbitrary port of a remote host, nor can it use a non-standard port for a scheme. This is an annoyance for those running Solr on non-80/443. To some, this may be a fatal limitation. There is a 1M upload/download limit, which would impact large adds to the index and large results sets back from the index. There are also other quotas: http://code.google.com/appengine/docs/java/urlfetch/overview.html#Quotas_and_Limits Otherwise, my eyes see no other major issues. Others? thanks, Glen -- -
Re: Help with relevance failure in Solr 1.3
On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood wunderw...@netflix.com wrote: The JaroWinkler equals was broken, but I fixed that a month ago. Query cache sounds possible, but those are cleared on a commit, right? Yes, but if you use autowarming, those items are regenerated and if there is a problem with equals() then it could re-appear (the cache items are correct, it's just the lookup that returns the wrong one). -Yonik http://www.lucidimagination.com
Re: Help with relevance failure in Solr 1.3
But why would it work for a few days, then go bad and stay bad? It fails for every multi-term query, even those not in cache. I ran a test with more queries than the cache size. We do use autowarming. wunder On 4/14/09 10:55 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood wunderw...@netflix.com wrote: The JaroWinkler equals was broken, but I fixed that a month ago. Query cache sounds possible, but those are cleared on a commit, right? Yes, but if you use autowarming, those items are regenerated and if there is a problem with equals() then it could re-appear (the cache items are correct, it's just the lookup that returns the wrong one). -Yonik http://www.lucidimagination.com
solr 1.3 + tomcat 5.5
Hi, I've got a problem setting up Solr + Tomcat: Tomcat 5.5 + Apache Solr 1.3.0 + CentOS 5.3. I'm not familiar with Java at all, so sorry if it's a dumb question. Here is what I did:
- placed solr.war in the webapps folder
- changed solr home to /etc/solr
- copied the contents of the solr distribution example folder to /etc/solr
Tomcat starts successfully and I can even access the admin interface, but the following error appears in catalina.out every 10 seconds:
Apr 14, 2009 1:30:14 PM org.apache.catalina.startup.HostConfig deployDescriptor
SEVERE: Error deploying configuration descriptor var#lib#tomcat5#webapps#solr.xml
Apr 14, 2009 1:30:14 PM org.apache.catalina.startup.HostConfig deployDescriptor
SEVERE: Error deploying configuration descriptor etc#solr#.xml
[the same two errors repeat every 10 seconds]
Googled about 3 hours. Tried setting write permissions for all on /etc, /etc/solr and /var/lib/tomcat5/webapps; tried creating an empty file named solr.xml in /etc and /etc/solr; tried copying solrconfig.xml to /etc/ and /etc/solr
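For reference, the context descriptor the Solr wiki (SolrTomcat page) suggests for Tomcat, e.g. in conf/Catalina/localhost/solr.xml, looks roughly like this; the paths here are illustrative:
<Context docBase="/var/lib/tomcat5/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/etc/solr" override="true"/>
</Context>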
Re: Analyzers and stemmer
I would say a language is supported if there is a Tokenizer available for it. Everything else after that is generally seen as an improvement. On Apr 9, 2009, at 5:26 AM, revas wrote: Hi, With respect to language support in Solr, we have analyzers for some languages and stemmers for certain languages. Do we say that Solr supports a particular language only if we have both an analyzer and a stemmer for the language, or also for those languages for which we have an analyzer but no stemmer? Regards Sujatha -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Multi-language support
On Apr 9, 2009, at 7:09 AM, revas wrote: Hi, To reframe my earlier question: some languages have just analyzers but no stemmer from Snowball/Porter; does the analyzer take care of stemming as well? Some languages only have the stemmer from Snowball but no analyzer. Some have both. Can we say then that Solr supports all the above languages? Will search be the same across all the above cases? I just responded to the earlier question, but it didn't contain this question. No, I wouldn't say that search would be the same. Stemmed vs. non-stemmed may result in different results, just as one stemmer implementation's results will differ from a different stemming approach. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Help with relevance failure in Solr 1.3
Are there changes occurring when it goes bad that maybe aren't committed?
On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote: [earlier messages quoted in full, snipped]
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Help with relevance failure in Solr 1.3
Nope. This is a slave, so no indexing happens, just a sync. The sync happens once per day. It went bad at a different time. wunder
On 4/14/09 11:42 AM, Grant Ingersoll gsing...@apache.org wrote: [earlier messages quoted in full, snipped]
Re: Using Solr from AppEngine application via SolrJ: any problematic issues?
SolrJ would require some modification. SolrJ internally uses Jakarta HTTP Client via Solr's CommonsHttpSolrServer class. It would need to be ported to a different implementation of SolrServer (the base class), one that uses java.net.URL. I suggest JavaNetUrlHttpSolrServer. ~ David Smiley
On 4/14/09 1:13 PM, Glen Newton glen.new...@gmail.com wrote: [original message quoted in full, snipped]
How to manage real-time (presence) data in a large index?
Hi everybody, I have a relatively large index (it will eventually contain ~4M documents and be about 3G in size, I think) that indexes user data, settings, and the like. The documents represent a community of users, a subset of whom may be online at any time. Also, we want to score our search results across searches that span the whole index by online (i.e. presence) status. Right now the list of online members is kept in a database table; however, we very often need to search on these users. The problem is, we're using Solr for our searches and we don't know how to approach setting up a search system for a large amount of highly volatile data. How do people typically go about this? Do they do one of the following:
1) Set up a second core and index only the online members in there? (Then we could not score normal search results by online status.)
2) Index the online status in our regular Solr index and not worry about it? (If it's fast to update docs in a large index, then why not maintain real-time data in the main index?)
3) Just use a database for the presence data and forget about using Solr for the presence-related searches?
Is there anything in Solr that I should be looking into to help with this problem? I'd appreciate any help. Sincerely, Daryl.
Re: Using Solr from AppEngine application via SolrJ: any problematic issues?
I see. So this is a show stopper for those wanting to use SolrJ with AppEngine. Any chance this could be added as a Solr issue? -glen
2009/4/14 Smiley, David W. dsmi...@mitre.org: [reply quoted in full, snipped]
-- -
Re: Using Solr from AppEngine application via SolrJ: any problematic issues?
On Wed, Apr 15, 2009 at 12:47 AM, Glen Newton glen.new...@gmail.com wrote: I see. So this is a show stopper for those wanting to use SolrJ with AppEngine. Any chance this could be added as a Solr issue? Yes, commons-httpclient tries to use Socket directly. So it may not work. It was mentioned here - http://briccetti.blogspot.com/2009/04/my-first-scala-web-app-on-google-app.html There is an issue I opened some time back which we could use - https://issues.apache.org/jira/browse/SOLR-599 -- Regards, Shalin Shekhar Mangar.
Distinct terms in facet field
How could I get a count of distinct terms for a given query? For example: The Wiki page http://wiki.apache.org/solr/SimpleFacetParameters has a section Facet Fields with No Zeros which shows the query: http://localhost:8983/solr/select?q=ipod&rows=0&facet=true&facet.limit=-1&facet.field=cat&facet.mincount=1&facet.field=inStock and returns results where the inStock field has two facet counts (false is 3, and true is 1). But what I would want to know is how many distinct values were found (in this case it would be 2: true and false). I realize I could count the number of terms returned, but if the set were large that would perform poorly. Is there a better way? Thanks, Tim
Hierarchical Faceting Field Type
Background: we set up a system for hierarchical categories using the following scheme:
level one#
level one#level two#
level one#level two#level three#
We're trying to find the right combination of field type and query to get the desired results. Some previous posts about hierarchical facets helped in generating the right query, but we're having an issue: the built-in text field ignores our delimiter, and the string field prevents us from doing a starts-with search. Does anyone have any insight into the field declaration? Any help is appreciated. Thank you.
Re: Search on all fields and know in which field was the match
: With this structure i think (correct me if i am wrong) i cant search for all
: attachBody_* and know where the match was (attachBody_1, _2, _3, etc).
correct
: I really don't know if this is the best approach so any help would be
: appreciated.
one option is to index each attachment as its own document *in addition* to indexing each email with all of the attachment text in a single attachments field. that way you can search for all emails where Bob is mentioned in an attachment -- but if you want to know which specific attachments mention Bob you can do that search as well. -Hoss
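A sketch of the two kinds of documents this implies, with hypothetical field names:
<add>
  <!-- the email document, with all attachment text flattened into one field -->
  <doc>
    <field name="id">email-42</field>
    <field name="type">email</field>
    <field name="attachments">text of attachment 1 text of attachment 2</field>
  </doc>
  <!-- each attachment also indexed as its own document -->
  <doc>
    <field name="id">email-42-attachment-1</field>
    <field name="type">attachment</field>
    <field name="attachBody">text of attachment 1</field>
  </doc>
</add>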
Re: How to send a parsed Query to shards?
: reference some large in-memory lookup tables. After the search components
: get done processing the original query, the query may contain SpanNearQueries
: and DisjunctionMaxQueries. I'd like to send that query to the shards, not
: the original query.
:
: I've come up with the following idea for doing this. Would people please
: comment on this idea or suggest a better alternative?
:
: * Subclass QueryComponent to base64 encode the serialized form of the query
: and send that in place of the original query.
:
: * set the queryParser on the shard servers to a custom class that unencodes
: and deserializes the encoded query and returns it.
those are essentially the same idea: a query string is just a simple form of Query serialization. a Component on your master could modify the query string to be anything you want (base64 encoded native serialization, xml based serialization, json, etc...) as long as the QParser on the slave machines knows how to make sense of it. -Hoss
Re: Custom sort based on arbitrary order
: custom order that is fairly simple: there is a list of venues and some of
: them are more relevant than others (there is no logic, it's arbitrary, it's
: not an alphabetic order), it'd be something like this:
:
: Orange venue = 1
: Red venue = 2
: Blue venue = 3
:
: So results where venue is orange should go first, then red and finally
: blue.
: Could you advise on the easiest way to have this example working?
use your rules to add values to all the docs at index time ... then sort on that value (ie: for each doc you actually index the value of 1, 2, or 3 in a field no one ever looks at, but you sort on it.) -Hoss
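Concretely, that would mean declaring a field like the hypothetical venue_rank below (sint is the sortable int type from the example schema), filling it at index time, and sorting on it:
<field name="venue_rank" type="sint" indexed="true" stored="false"/>
then querying with something like http://localhost:8983/solr/select?q=...&sort=venue_rank+asc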
Re: Help with relevance failure in Solr 1.3
Is bad memory a possibility? i.e. is it the same machine all the time? Is there any recognizable pattern for when it happens? -Grant (grasping at straws)
On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote: [earlier messages quoted in full, snipped]
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: using NGramTokenizerFactory for partial matching
: I want it to match lor lorem and lorem i. However I am finding it
: matches the first two but not the third - the white space is causing
: problems. Here are the relevant parts of my config:
:
: <fieldType name="text_substring" class="solr.TextField" positionIncrementGap="100">
: <analyzer type="index">
: <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
NGramTokenizer doesn't do anything special with whitespace -- but the QueryParser does ... what does your query for lorem i look like? if you're using the example query parser and request handler configs then this won't work like you want... http://localhost:8963/select?q=lorem+i ...because the query parser will split on the whitespace. try quoting your string, or using the FieldQParserPlugin. -Hoss
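That is, a phrase-quoted request along these lines (URL-encoding of the quotes aside) should keep the whitespace from splitting the term:
http://localhost:8983/solr/select?q=text_substring:"lorem i"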
Sort by distance from location?
Hi everybody, My index has latitude/longitude values for locations. I am required to do a search based on a set of criteria, and order the results based on how far the lat/long location is from the current user's location. Currently we are emulating such a search by adding criteria of ever-widening bounding boxes; the more of those boxes match the document, the higher the score, and thus the closer ones appear at the start of the results. The query looks something like this (one search term per line):
+criteraOne:1
+criteriaTwo:true
+latitude:[-90.0 TO 90.0]
+longitude:[-180.0 TO 180.0]
(latitude:[40.52 TO 40.81] longitude:[-74.17 TO -73.79])
(latitude:[40.30 TO 41.02] longitude:[-74.45 TO -73.51])
(latitude:[39.94 TO 41.38] longitude:[-74.93 TO -73.03])
[[...etc...about 10 times...]]
Naturally this is quite slow (the query is approximately 6x slower than normal), and... I can't help but feel that there's a more elegant way of sorting by distance. Does anybody know how to do this or have any suggestions? Sincerely, Daryl.
Re: Help with relevance failure in Solr 1.3
I already ruled out cosmic rays. It has happened on different hardware and at different times of day, including low load. The only thing associated with it is load from a new faceted browse thing we turned on. wunder
On 4/14/09 2:23 PM, Grant Ingersoll gsing...@apache.org wrote: [earlier messages quoted in full, snipped]
Re: Sort by distance from location?
Have you tried LocalSolr? http://www.gissearch.com/localsolr (I haven't but looks cool)
On 4/14/09 5:31 PM, Development Team dev.and...@gmail.com wrote: [original question quoted in full, snipped]
Re: Sort by distance from location?
Ah, good question: Yes, we've tried it... and it was slower. To give some average times:
Regular non-distance searches: 100ms
Our expanding-criteria solution: 600ms
LocalSolr: 800ms
(We also had problems with LocalSolr in that the results didn't seem to be cached in Solr upon doing a search, so each page of results meant another 800ms.) - Daryl. On Tue, Apr 14, 2009 at 5:34 PM, Smiley, David W. dsmi...@mitre.org wrote: Have you tried LocalSolr? http://www.gissearch.com/localsolr (I haven't but looks cool)
Re: Disable logging in SOLR
Have you tried setting logging level to OFF from Solr's admin GUI: http://wiki.apache.org/solr/SolrAdminGUI Bill On Tue, Apr 14, 2009 at 9:56 AM, Kraus, Ralf | pixelhouse GmbH r...@pixelhouse.de wrote: Hi, is there a way to disable all logging output in SOLR ? I mean the output text like : INFO: [core_de] webapp=/solr path=/update params={wt=json} status=0 QTime=3736 greets -Ralf-
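If you'd rather silence it at JVM startup than through the admin GUI: Solr 1.3 logs through JDK logging by default, so pointing the JVM at a logging.properties file should work (a sketch, untested):
-Djava.util.logging.config.file=/path/to/logging.properties
with the file containing just:
.level = OFF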
Re: _val:ord(field) (from wiki LargeIndexes)
: I see this interesting line in the wiki page LargeIndexes
: http://wiki.apache.org/solr/LargeIndexes (sorting section towards the
: bottom)
:
: Using _val:ord(field) as a search term will sort the results without
: incurring the memory cost.
:
: I'd like to know what this means, but I'm having a bit of trouble
: parsing it. What is _val:ord(field) exactly? Does this just mean
that's referring to using function queries with the _val_ hack that is supported by the LuceneQParserPlugin... http://wiki.apache.org/solr/SolrQuerySyntax ...it *seems* to be suggesting that if you use a function query based on the ordinal value of a field, you won't need the same amount of memory as if you just sorted on that field ... but that is incorrect, so i removed that line from the page. (for string fields, the same FieldCache is initialized either way; for non string fields following that advice could result in 2 or 3 times as much memory being needed for both the numeric FieldCache and the String FieldCache entries) -Hoss
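For reference, the _val_ hack embeds a function query in a normal query string, along these lines (popularity is a hypothetical field):
http://localhost:8983/solr/select?q=solr _val_:"ord(popularity)"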
Re: More than one language in the same document
: A related question. What does 'copyField' actually do? Does it 'append' : content from the source field to the 'target' field? Or does it : replace/overwrite it? Thank you. : : : It appends the content of the source field to the target. strictly speaking, it adds the content to the target field as if it were another multi-valued field value. -Hoss
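For reference, copyField is declared in schema.xml like this; with multiple sources feeding one dest, the dest field needs to be multiValued (field names here are illustrative):
<copyField source="title" dest="text"/>
<copyField source="body" dest="text"/>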
Re: Hierarchical Faceting Field Type
Nasseam Elkarra wrote: [original message quoted in full, snipped] Out of need in my project, I'll get started working on SOLR-64 any day now. I'm thinking of introducing a field type for hierarchical facets. Koji
Re: using multisearcher
: As for the second part, I was thinking of trying to replace the standard
: SolrIndexSearcher with one that employs a MultiSearcher. But I'm not very
: familiar with the workings of Solr, especially with respect to the caching
: that goes on. I thought that maybe people who are more familiar with it might
: have some tips on how to go about it. Or perhaps there are reasons that make
: this a bad idea.
If your indexes are all local, then using a MultiReader would be simpler than trying to shoehorn MultiSearcher type logic into SolrIndexSearcher. https://issues.apache.org/jira/browse/SOLR-243 -Hoss
Re: Access HTTP headers from custom request handler
: Solr cannot assume that the request would always come from http (think
: of EmbeddedSolrServer). So it assumes that there are only parameters
exactly.
: Your best bet is to modify SolrDispatchFilter and read the params and
: set them in the SolrRequest Object
SolrDispatchFilter is designed to be subclassed to make this easy by overriding the execute method...
protected void execute( HttpServletRequest req, SolrRequestHandler handler,
                        SolrQueryRequest sreq, SolrQueryResponse rsp) {
  // expose the raw servlet request to components via the request context
  sreq.getContext().put( "HttpServletRequest", req );
  super.execute( req, handler, sreq, rsp );
}
-Hoss
Index Replication or Distributed Search ?
Hi, Can someone provide practical advice on how large a Solr search index can be while still performing well for a consumer-facing media website? Is it good or bad to think about Distributed Search and dividing the index at an early stage of development? Thanks Ram
Re: Help with relevance failure in Solr 1.3
OK, I guess details on the new faceting stuff would be in order. Which faceting are you using? Are you sure that it never occurred before (i.e. it slipped under the radar)? Obviously, the key is reproducibility here, but this has all the earmarks of some weird threading issue, it seems, at least IMO.
On Apr 14, 2009, at 5:32 PM, Walter Underwood wrote: [earlier messages quoted in full, snipped]
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: How to manage real-time (presence) data in a large index?
On Wed, Apr 15, 2009 at 12:39 AM, Development Team dev.and...@gmail.com wrote: Hi everybody, I have a relatively large index (it will eventually contain ~4M documents and be about 3G in size, I think) that indexes user data, settings, and the like. The documents represent a community of users, a subset of whom may be online at any time. Also, we want to score our search results across searches that span the whole index by online (i.e. presence) status. Right now the list of online members is kept in a database table; however, we very often need to search on these users. The problem is, we're using Solr for our searches and we don't know how to approach setting up a search system for a large amount of highly volatile data. How do people typically go about this? Do they do one of the following:
1) Set up a second core and index only the online members in there? (Then we could not score normal search results by online status.)
This will not work because creating an index is quite expensive.
2) Index the online status in our regular Solr index and not worry about it? (If it's fast to update docs in a large index, then why not maintain real-time data in the main index?)
Do you wish to have the data almost realtime? That means you will have to commit very often, which may result in very poor performance.
3) Just use a database for the presence data and forget about using Solr for the presence-related searches?
If the number of users is low enough to be held in a HashSet in memory, you can think of implementing a special Field akin to org.apache.solr.schema.ExternalFileField. Do not hope to make it realtime, but try to make it close to realtime (say, update the HashSet once a minute, i.e. fetch the data from the DB once a minute).
Is there anything in Solr that I should be looking into to help with this problem? I'd appreciate any help. Sincerely, Daryl.
-- --Noble Paul
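For reference, the ExternalFileField Noble mentions is declared in schema.xml roughly like this (a sketch, untested; attribute values may differ by Solr version):
<fieldType name="presenceFile" class="solr.ExternalFileField" keyField="id" defVal="0" stored="false" indexed="false" valType="pfloat"/>
<field name="online" type="presenceFile"/>
The values come from a file named external_online in Solr's data directory, with lines like uniqueKeyValue=1, which could be rewritten and reloaded periodically.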
Re: Using Solr from AppEngine application via SolrJ: any problematic issues?
I guess SOLR-599 can be easily fixed if we do not implement multipart support (which is non-essential) --Noble
On Wed, Apr 15, 2009 at 1:12 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: [earlier messages quoted in full, snipped]
-- --Noble Paul