Re: why is Solr so much slower than Lucene?
thanks a lot. I got it. On 2010-10-21 22:36, Yonik Seeley wrote: 2010/10/21 kafka0102 kafka0...@163.com: I found the problem's cause: it's the DocSetCollector. My filter query result's size is about 300, so DocSetCollector.getDocSet() returns an OpenBitSet, and 300 OpenBitSet.fastSet(doc) operations are too slow. As I said in my other response to you, that's a perfect reason why you want Solr to cache that for you (unless the filter will be different each time). -Yonik http://www.lucidimagination.com
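A minimal SolrJ sketch of the cached-filter approach Yonik describes (the field name and URL are made up for illustration): a filter sent as fq is computed once, cached as a DocSet in the filterCache, and reused on later requests.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CachedFilterExample {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery query = new SolrQuery("body:lucene");
            // Send the filter as fq rather than folding it into q: Solr builds
            // the DocSet once, caches it, and reuses it across requests.
            query.addFilterQuery("category:article");
            QueryResponse rsp = server.query(query);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }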
Re: Import From MYSQL database
I have been trying to index tables with English keywords in a MySQL database, but it fails. I was also able to import data from this database through Java successfully. I don't know how to use the dataimport folder in the contrib folder; maybe this is the problem. What I did was build the configuration files (schema.xml, solrconfig.xml, db-data-config.xml) and put the MySQL lib in the lib folder.
Solr Javascript+JSON not optimized for SEO
Hi, When I retrieve data via the javascript+JSON method (instead of REST via URL), the link I click does not reflect what the user will end up seeing. Example for showing the features belonging to a LED TV product: JSON: <a href="javascript:getFeatureFacets('LEDTV')">Get features for LED TV</a> REST: <a href="www.domain.com/TVs/LEDTV">Get features for LED TV</a> As you see, the href in the second example clearly displays what the user may expect when clicking this link. That is VERY important for search engines. So how can I still use javascript+JSON, but not lose the SEO value? Regards, Peter
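One common workaround, sketched below under the assumption that getFeatureFacets() is the poster's existing JavaScript function: keep a crawlable href for search engines and attach the JSON call as a click handler, returning false so JavaScript-enabled browsers take the JSON path while crawlers follow the link.

    <a href="/TVs/LEDTV" onclick="getFeatureFacets('LEDTV'); return false;">
      Get features for LED TV
    </a>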
Re: MoreLikeThis explanation?
Hi Koji, I tried to apply your patch to the 1.4.0 tagged branch, but it didn't take completely. What branch does it work for? Darren On Thu, 2010-10-21 at 23:03 +0900, Koji Sekiguchi wrote: (10/10/21 20:33), dar...@ontrenet.com wrote: Hi, Does the latest Solr provide an explanation for results returned by MLT? No, but there is an open issue: https://issues.apache.org/jira/browse/SOLR-860 Koji
Re: different results depending on result format
strange.. are you absolutely sure the two queries are directed to the same Solr instance? I'm running the same query from the admin page (which specifies the xml format) and I get the exact same results as solrj. On 21 October 2010 22:25, Mike Sokolov soko...@ifactory.com wrote: quick follow-up: I also notice that the query from solrj gets version=1, whereas the admin webapp puts version=2.2 on the query string, although this param doesn't seem to change the xml results at all. Does this indicate an older version of solrj perhaps? -Mike On 10/21/2010 04:47 PM, Mike Sokolov wrote: I'm experiencing something really weird: I get different results depending on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml. I spent quite a while staring at query params to make sure everything else is the same, and they do seem to be. At first I thought the problem related to the javabin format change that has been talked about recently, but I am using solr 1.4.0 and solrj 1.4.0. Notice in the two entries that the wt param is different and the hits result count is different. Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select/ params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=261 status=0 QTime=1 Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=57 status=0 QTime=0 The xml format results seem to be the correct ones. So one thought I had is that I could somehow fall back to using xml format in solrj, but I tried SolrQuery.set('wt','xml') and that didn't have the desired effect (I get 'wt=javabin&wt=javabin' in the log - ie the param is repeated, but still javabin). Am I crazy? Is this a known issue? Thanks for any suggestions
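Regarding the workaround mentioned at the end: instead of setting wt as a plain parameter, SolrJ can be switched to the XML wire format by swapping its response parser. A minimal sketch (the URL is illustrative):

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.impl.XMLResponseParser;

    public class XmlWireFormat {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            // Requests wt=xml on the wire and parses the XML response,
            // replacing the default javabin format.
            server.setParser(new XMLResponseParser());
            System.out.println("ping status: " + server.ping().getStatus());
        }
    }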
Re: mincount doesn't work with FacetQuery
This is a response to a thread from several months ago ( http://lucene.472066.n3.nabble.com/mincount-doesn-t-work-with-FacetQuery-tp473162p473162.html ) Sorry, I don't know where to get the thread number to request that specific thread from listserv and reply properly via email. Anyway, I've recently come across the same problem while working with branch_3x of Solr and I'm wondering if anyone ever opened a JIRA for this feature request? I can't find one but that doesn't mean it's not there, and I don't want to create a duplicate. Cheers Mark
facet Prefix (or term prefix)
I am aware of the facet.prefix facility. I am using Solr to return a faceted field's contents - I use facet.prefix to restrict what returns from Solr - this is very useful for predictive search functionality (autocomplete). My only issue is that the field I facet on is a string and could have 2 or 3 words in it, thus this process will only return strings that begin with what the user is typing into my UI search box. It would be useful if I could get facets back where I could match somewhere in the faceted field (not just at the beginning), i.e. is there a facet.contains method? If not, I'll just have to code this in my service layer having received all facets from Solr (without the prefix). Thanks for any help.
Re: facet Prefix (or term prefix)
Hi, There is no facet.contains facility, but there are alternatives. Instead of using the faceting engine, you will need to create a field that has an NGramTokenizer. Properly configured, you can use this field to query upon and it will return what you would expect from a facet.contains feature. Here's a post on the subject which you may find useful: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ Cheers, On Friday 22 October 2010 13:20:56 Jason Brown wrote: I am aware of the facet.prefix facility. I am using Solr to return a faceted field's contents - I use facet.prefix to restrict what returns from Solr - this is very useful for predictive search functionality (autocomplete). My only issue is that the field I facet on is a string and could have 2 or 3 words in it, thus this process will only return strings that begin with what the user is typing into my UI search box. It would be useful if I could get facets back where I could match somewhere in the faceted field (not just at the beginning), i.e. is there a facet.contains method? If not, I'll just have to code this in my service layer having received all facets from Solr (without the prefix). Thanks for any help. -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
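A minimal schema.xml sketch of such a field (the field and type names, and the gram sizes, are illustrative): the index-time n-grams let a query token match anywhere inside the value, which is what a facet.contains would do.

    <fieldType name="text_contains" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <!-- Emit all 2- to 15-character substrings so "contains" matching works. -->
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="15"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <!-- Keep the user's input as a single lowercased token. -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="suggest_text" type="text_contains" indexed="true" stored="true"/>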
Re: how well does multicore scale?
On Fri, Oct 22, 2010 at 11:18 AM, Lance Norskog goks...@gmail.com wrote: There is an API now for dynamically loading, unloading, creating and deleting cores. Restarting a Solr with thousands of cores will take, I don't know, hours. Is this in the trunk? Any docs available? On Thu, Oct 21, 2010 at 10:44 PM, Tharindu Mathew mcclou...@gmail.com wrote: Hi Mike, I've also considered using separate cores in a multi tenant application, ie a separate core for each tenant/domain. But the cores do not suit that purpose. If you check out the documentation, no real API support exists for doing this dynamically through SolrJ. And all the use cases I found only had users configuring it statically and then using it. That was maybe 2 or 3 cores. Please correct me if I'm wrong, Solr folks. So you're better off using a single index with a user id and using a query filter with the user id when fetching data. On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote: No, it does not seem reasonable. Why do you think you need a separate core for every user? mike anderson wrote: I'm exploring the possibility of using cores as a solution to bookmark folders in my solr application. This would mean I'll need tens of thousands of cores... does this seem reasonable? I have plenty of CPUs available for scaling, but I wonder about the memory overhead of adding cores (aside from needing to fit the new index in memory). Thoughts? -mike -- Regards, Tharindu -- Lance Norskog goks...@gmail.com -- Regards, Tharindu
Re: how well does multicore scale?
On 10/22/10 1:44 AM, Tharindu Mathew wrote: Hi Mike, I've also considered using separate cores in a multi tenant application, ie a separate core for each tenant/domain. But the cores do not suit that purpose. If you check out the documentation, no real API support exists for doing this dynamically through SolrJ. And all the use cases I found only had users configuring it statically and then using it. That was maybe 2 or 3 cores. Please correct me if I'm wrong, Solr folks. You can dynamically manage cores with solrj. See org.apache.solr.client.solrj.request.CoreAdminRequest's static methods for a place to start. You probably want to turn solr.xml's persist option on so that your cores survive restarts. So you're better off using a single index with a user id and using a query filter with the user id when fetching data. Many times this is probably the case - pros and cons to each depending on what you are up to. - Mark lucidimagination.com On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote: No, it does not seem reasonable. Why do you think you need a separate core for every user? mike anderson wrote: I'm exploring the possibility of using cores as a solution to bookmark folders in my solr application. This would mean I'll need tens of thousands of cores... does this seem reasonable? I have plenty of CPUs available for scaling, but I wonder about the memory overhead of adding cores (aside from needing to fit the new index in memory). Thoughts? -mike
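A sketch of the CoreAdminRequest route Mark mentions (the core name and instance dir are made up; the request goes to the container-level /solr URL, not to a core, and the instance dir must already contain conf/ with solrconfig.xml and schema.xml):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class CoreAdminExample {
        public static void main(String[] args) throws Exception {
            // Base /solr URL, not /solr/corename.
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            // Create a core for a new tenant...
            CoreAdminRequest.createCore("tenant42", "tenant42", server);
            // ...and unload it again when the tenant goes away.
            CoreAdminRequest.unloadCore("tenant42", server);
        }
    }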
solr performance
Last week we put our Solr in production. It was a very smooth start. Solr really works great and without any problems so far; it's a huge improvement over our old intranet search. I wonder however whether we can increase the search performance of our Solr installation, just to make the search experience even better. I know that performance depends on many different things and parameters, so a general answer is hard to make. Here are some figures: - at the moment we have about 20.000 search queries a day. - median query time is about 400ms - ca. 80% are running under 500ms - ca. 90% are running under 1s, - ca. 10% over 1s, 3% over 2s, - there are even some queries which last way too long, over 6s and up to 18s; there are even simple queries for one word which last that long. Maybe there is one special thing to mention: we do have a kind of user-filter with each query. These parameters differ for each usergroup, so I think at least one of the caches won't work very well, because even if the query (foobar) is the same, fq and bq can (and will) differ from user to user: fq=__intern:0+OR+__intern:344 together with a boost query bq=__lokal:0^6+OR+__lokal:344^2. Our query looks like: INFO: [core] webapp=/solr path=/select params={spellcheck=true&facet=on&facet.limit=500&initSearch=1&hl=on&version=1.2&bq=__lokal:0^6+OR+__lokal:344^2&fl=score,+id,+title,+visiblePath,+__doctype,+_erstelldatum,+_dienststelle,+_dokumententyp,+__source,+__intern,+objClass,+jurislinkUrl,+destinationUrl,+_aktenzeichen,+_stelle,+_zielgruppen,+_stichwort,+_kurzbeschreibung,+_autor,+_hauptthema,+_unterthema&facet.field=__source&facet.field=__dst&facet.field=__cyear&facet.field=_dokumententyp&facet.field=__mikronav&facet.field=_zielgruppen&facet.field=__doctype&spellcheck.count=2&qt=dismax&fq=__intern:0+OR+__intern:344&hl.fragsize=640&facet.mincount=1&spellcheck.extendedResults=true&json.nl=map&hl.fl=body,+_kurzbeschreibung,+_stichwort&wt=json&spellcheck.collate=true&hl.maxAnalyzedChars=9&rows=20&spellcheck.onlyMorePopular=false&start=0&facet.sort=index&q=foobar} hits=93 status=0 QTime=113 - we have indexed 115.000 documents, our index size is about 720 MB. Any hints where to look? What will the ramBufferSizeMB in mainIndex in solrconfig.xml do? Does it make sense to increase this value? Should we increase one of our caches? - we're using jetty, java jdk 1.6.0_21, java settings are -D64 -server -Xms892m -Xmx2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:-HeapDumpOnOutOfMemoryError -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled - our machine has 4GB of mem and 4 CPUs, load is about 0.6%, the java process seems to use only one CPU, no other services are running on this machine. - from the beginning we have a master/slave setup, but at the moment we are only working with the master. Yesterday I included the slave in our search application, so that half the queries were handled by the master and the other half by the slave. The query times didn't change, so it is not a bottleneck with our machine or I/O or memory.
- cache stats from admin panel:

queryResultCache, LRU Cache(maxSize=65536, initialSize=65536): lookups: 1159, hits: 498, hitratio: 0.42 (<=== seems a bit low compared to the others), inserts: 697, evictions: 0, size: 661, warmupTime: 0, cumulative_lookups: 91470, cumulative_hits: 41370, cumulative_hitratio: 0.45, cumulative_inserts: 52835, cumulative_evictions: 0

documentCache, LRU Cache(maxSize=32768, initialSize=32768): lookups: 53099, hits: 45429, hitratio: 0.85, inserts: 7670, evictions: 0, size: 7670, warmupTime: 0, cumulative_lookups: 4254335, cumulative_hits: 3760521, cumulative_hitratio: 0.88, cumulative_inserts: 493814, cumulative_evictions: 0

fieldValueCache, Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false): lookups: 3312, hits: 3306, hitratio: 0.99, inserts: 3, evictions: 0, size: 3, warmupTime: 0, cumulative_lookups: 261969, cumulative_hits: 261351, cumulative_hitratio: 0.99, cumulative_inserts: 306, cumulative_evictions: 0 item__zielgruppen: {field=_zielgruppen,memSize=491861,tindexSize=46,time=10,phase1=10,nTerms=46,bigTerms=13,termInstances=53913,uses=1187} item___mikronav: {field=__mikronav,memSize=464524,tindexSize=82,time=5,phase1=5,nTerms=39,bigTerms=4,termInstances=18817,uses=1187} item___dst: {field=__dst,memSize=464640,tindexSize=66,time=10,phase1=9,nTerms=160,bigTerms=5,termInstances=86516,uses=1187} (these are a few of our facet fields)

filterCache, Concurrent LRU Cache(maxSize=16384, initialSize=16384, minSize=14745, acceptableSize=15564, cleanupThread=false): lookups: 26851, hits: 26434, hitratio: 0.98, inserts: 417, evictions: 0, size: 417, warmupTime: 0, cumulative_lookups: 1985851, cumulative_hits: 1959304, cumulative_hitratio: 0.98, cumulative_inserts: 26547, cumulative_evictions: 0

Markus Rietzler rietzler_software/ Rechenzentrum der
Re: different results depending on result format
Yes - I really only have the one solr instance. And I have plenty of other cases where I am getting good results back via solrj. It's really a mystery. Unfortunately I have to catch up on other stuff I have been neglecting, but I'll follow up when I'm able to get a solution... -Mike On 10/22/2010 06:58 AM, Savvas-Andreas Moysidis wrote: strange.. are you absolutely sure the two queries are directed to the same Solr instance? I'm running the same query from the admin page (which specifies the xml format) and I get the exact same results as solrj. On 21 October 2010 22:25, Mike Sokolov soko...@ifactory.com wrote: quick follow-up: I also notice that the query from solrj gets version=1, whereas the admin webapp puts version=2.2 on the query string, although this param doesn't seem to change the xml results at all. Does this indicate an older version of solrj perhaps? -Mike On 10/21/2010 04:47 PM, Mike Sokolov wrote: I'm experiencing something really weird: I get different results depending on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml. I spent quite a while staring at query params to make sure everything else is the same, and they do seem to be. At first I thought the problem related to the javabin format change that has been talked about recently, but I am using solr 1.4.0 and solrj 1.4.0. Notice in the two entries that the wt param is different and the hits result count is different. Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select/ params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=261 status=0 QTime=1 Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=57 status=0 QTime=0 The xml format results seem to be the correct ones. So one thought I had is that I could somehow fall back to using xml format in solrj, but I tried SolrQuery.set('wt','xml') and that didn't have the desired effect (I get 'wt=javabin&wt=javabin' in the log - ie the param is repeated, but still javabin). Am I crazy? Is this a known issue? Thanks for any suggestions
Re: Solr sorting problem
The field type of the first name and last name is text. Could that be why it's not sorting properly? I just changed it to string and started a full-import. Hopefully that will work. Thanks, Moazzam On Thu, Oct 21, 2010 at 7:42 PM, Jayendra Patil jayendra.patil@gmail.com wrote: Need additional information. Sorting is easy in Solr, just by passing the sort parameter. However, when it comes to text sorting it depends on how you analyse and tokenize your fields; sorting does not work on fields with multiple tokens. http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F On Thu, Oct 21, 2010 at 7:24 PM, Moazzam Khan moazz...@gmail.com wrote: Hey guys, I have a list of people indexed in Solr. I am trying to sort by their first names but I keep getting results that are not alphabetically sorted (I see the names starting with W before the names starting with A). I have a feeling that the results are first being sorted by relevancy then sorted by first name. Is there a way I can get the results to be sorted alphabetically? Thanks, Moazzam
Re: Solr sorting problem
For anyone who faced the same problem, changing the field to string from text worked! -Moazzam On Fri, Oct 22, 2010 at 8:50 AM, Moazzam Khan moazz...@gmail.com wrote: The field type of the first name and last name is text. Could that be why it's not sorting properly? I just changed it to string and started a full-import. Hopefully that will work. Thanks, Moazzam On Thu, Oct 21, 2010 at 7:42 PM, Jayendra Patil jayendra.patil@gmail.com wrote: Need additional information. Sorting is easy in Solr, just by passing the sort parameter. However, when it comes to text sorting it depends on how you analyse and tokenize your fields; sorting does not work on fields with multiple tokens. http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F On Thu, Oct 21, 2010 at 7:24 PM, Moazzam Khan moazz...@gmail.com wrote: Hey guys, I have a list of people indexed in Solr. I am trying to sort by their first names but I keep getting results that are not alphabetically sorted (I see the names starting with W before the names starting with A). I have a feeling that the results are first being sorted by relevancy then sorted by first name. Is there a way I can get the results to be sorted alphabetically? Thanks, Moazzam
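For reference, the usual schema.xml pattern behind this fix (field names are illustrative): keep the tokenized text field for searching and copy it into a single-token string field used only for sorting, then sort on the copy with sort=first_name_sort asc.

    <field name="first_name" type="text" indexed="true" stored="true"/>
    <!-- Single-token copy used only for sorting. -->
    <field name="first_name_sort" type="string" indexed="true" stored="false"/>
    <copyField source="first_name" dest="first_name_sort"/>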
Re: Strange file name after installing solr
On Oct 21, 2010, at 11:52 PM, Bac Hoang wrote: Hello folks, I'm very new to Solr. Please help. What I have in hand: 1) apache-solr-1.4.1; 2) Geronimo. After installing solr.war using the Geronimo administration GUI, I got a strange file, under opt/dev/ofwi-geronimo2.1.6/repository/default/solr/1287558884961/solr-1287558884961.war. Is this alright or anything abnormal? My Geronimo says that solr is in running status, but when I start it, I get an error: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/opt/dev... Looks like you don't have your Solr Home set. I would try starting with the Solr tutorial or one of the Solr books and get a basic understanding of how it works and then go towards deploying in Geronimo. Thanks indeed for your time With regards, Bac Hoang -- Grant Ingersoll http://www.lucidimagination.com
Re: different results depending on result format
OK I solved the problem. It turns out that I was connecting to the server using its FQDN (rosen.ifactory.com). When, instead, I connect to it using the name rosen (which maps to the same IP using the default domain name configured in my resolver, ifactory.com), I get results back. I am looking into the virtual hosts config in tomcat; it seems as if there must indeed be another solr instance running; in fact I'm now concerned there might be two solr instances running against the same data folder. yargh. -Mike On 10/22/2010 09:05 AM, Mike Sokolov wrote: Yes - I really only have the one solr instance. And I have plenty of other cases where I am getting good results back via solrj. It's really a mystery. Unfortunately I have to catch up on other stuff I have been neglecting, but I'll follow up when I'm able to get a solution... -Mike On 10/22/2010 06:58 AM, Savvas-Andreas Moysidis wrote: strange.. are you absolutely sure the two queries are directed to the same Solr instance? I'm running the same query from the admin page (which specifies the xml format) and I get the exact same results as solrj. On 21 October 2010 22:25, Mike Sokolov soko...@ifactory.com wrote: quick follow-up: I also notice that the query from solrj gets version=1, whereas the admin webapp puts version=2.2 on the query string, although this param doesn't seem to change the xml results at all. Does this indicate an older version of solrj perhaps? -Mike On 10/21/2010 04:47 PM, Mike Sokolov wrote: I'm experiencing something really weird: I get different results depending on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml. I spent quite a while staring at query params to make sure everything else is the same, and they do seem to be. At first I thought the problem related to the javabin format change that has been talked about recently, but I am using solr 1.4.0 and solrj 1.4.0. Notice in the two entries that the wt param is different and the hits result count is different. Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select/ params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=261 status=0 QTime=1 Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=57 status=0 QTime=0 The xml format results seem to be the correct ones. So one thought I had is that I could somehow fall back to using xml format in solrj, but I tried SolrQuery.set('wt','xml') and that didn't have the desired effect (I get 'wt=javabin&wt=javabin' in the log - ie the param is repeated, but still javabin). Am I crazy? Is this a known issue? Thanks for any suggestions
Re: Import From MYSQL database
In the main directory of Jetty there should be a directory called 'logs'. Log names are usually coded like this: 2010_07_31.request.log. Change the date and try searching your system.
Failing to successfully import international characters via DIH
Hi, I wanted to share a problem I have with importing text in different languages. All international text looks wrong in Luke and in AJAX Solr. What I see for Chinese and Japanese characters is this: æ˜ ç”»ã‚„éŸ³æ¥½ãŒæ¥½ã—ã„ï¼AIã®ã‚µã‚¤ãƒ¢ãƒ³ã®ãƒ•ã‚¡ãƒ³ã§ã™ã€‚アダムやマットãŒå¥½ãã§ã™ã€‚LeeDeWyze優å‹ï¼I Although it should be: 映画や音楽が楽しい!AIのサイモンのファンです。アダムやマットが好きです。 My setup is Ubuntu server 10.04, Tomcat6, Solr 1.4 and MySQL. Things I have configured, but with no luck: 1. /etc/tomcat6/server.xml contains this: <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="2" URIEncoding="UTF-8" redirectPort="8443"/> 2. /etc/mysql/my.cnf contains: [mysqld] default-character-set = utf8 character-set-server = utf8 3. /etc/solr/conf/data-config.xml: <dataConfig> <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/spuvocom_spuvo?characterEncoding=UTF-8" encoding="UTF-8"/> <document> 4. my MySQL table collation is utf8_bin. What would you recommend changing or checking? Thanks in advance
Re: Failing to successfully import international characters via DIH
What would you recommend changing or checking? Tomcat *Connector* URIEncoding. I have done this several times on tomcat, might be at a loss on other servers though. - Pradeep
Re: Failing to successfully import international characters via DIH
Holy cow, you already have this in place. I apologize. This looked exactly the kind of problem I have solved this way. On Fri, Oct 22, 2010 at 8:38 AM, Pradeep Singh pksing...@gmail.com wrote: What would you recommend changing or checking? Tomcat *Connector* URIEncoding. I have done this several times on tomcat, might be at a loss on other servers though. - Pradeep
Using different schemas when syncing with PostgreSQL and DIH
Hello everyone! I am using Solr synced with a PostgreSQL database using DIH and I am facing an issue. The thing is that I use one Solr server and different Postgres schemas in the same database, with the same tables inside each one, so the following queries: SELECT * FROM schema1.Objects; and SELECT * FROM schema2.Objects; are both valid. The schemas are completely dynamic, so I can't do anything manually each time I add a new schema. In the DIH id field I am using a combination of the schema name and the PK of the Objects table, to avoid duplicates. My question is: every time I do an import operation (delta or full) with DIH, I only need to sync the index with one schema, so... is there a way to pass a custom parameter with the schema name to DIH so I can build the query with the corresponding schema name? Thank you very much! Juan M.
How to index long words with StandardTokenizerFactory?
I'm trying to force Solr to index words whose length is more than 255 symbols (this constant is DEFAULT_MAX_TOKEN_LENGTH in Lucene's StandardAnalyzer.java) using StandardTokenizerFactory in the schema configuration XML. Specifying the maxTokenLength attribute doesn't work. I tried a dirty hack: I downloaded the lucene-core-2.9.3 source, changed DEFAULT_MAX_TOKEN_LENGTH to 100, built it into a jar and replaced the original lucene-core jar in Solr's /lib. But it seems to have had no effect.
Re: Solr Javascript+JSON not optimized for SEO
How can we see what each will do? Dennis Gearon --- On Fri, 10/22/10, PeterKerk vettepa...@hotmail.com wrote: From: PeterKerk vettepa...@hotmail.com Subject: Solr Javascript+JSON not optimized for SEO To: solr-user@lucene.apache.org Date: Friday, October 22, 2010, 2:59 AM Hi, When I retrieve data via the javascript+JSON method (instead of REST via URL), the link I click does not reflect what the user will end up seeing. Example for showing the features belonging to a LED TV product: JSON: <a href="javascript:getFeatureFacets('LEDTV')">Get features for LED TV</a> REST: <a href="www.domain.com/TVs/LEDTV">Get features for LED TV</a> As you see, the href in the second example clearly displays what the user may expect when clicking this link. That is VERY important for search engines. So how can I still use javascript+JSON, but not lose the SEO value? Regards, Peter
Re: Failing to successfully import international characters via DIH
Sounds like one of three things: 1/ Everything is set to UTF-*, but the content has another encoding. 2/ Something 'microsoftish' is adding a BOM (byte order mark) that is being incorrectly interpreted. 3/ The byte order is wrong somewhere along the way and not being translated correctly across machine/media boundaries. You need to look at what your source is providing, directly first, before it gets into the database. Then do the following. I would open up an editor that you KNOW outputs utf-8: 1/ Compose a web page, view it with fonts set to UTF-8; that will tell you that it is really creating UTF-8 files. (Obviously use some character over 0xFF) 2/ Build an SQL query with it that inserts one record, or many, using those characters. Try commandline, server side language, and any database management program also. Make the records distinct relative to where they are being inserted from. 3/ Select these records and view on a web page set to UTF-8 and see if they come out of the database OK. 4/ Import into Solr, and view again in a browser set to UTF-8. Dennis Gearon --- On Fri, 10/22/10, virtas pkaro...@gmail.com wrote: From: virtas pkaro...@gmail.com Subject: Failing to successfully import international characters via DIH To: solr-user@lucene.apache.org Date: Friday, October 22, 2010, 8:20 AM Hi, I wanted to share a problem I have with importing text in different languages. All international text looks wrong in Luke and in AJAX Solr. What I see for Chinese and Japanese characters is this: æ˜ ç”»ã‚„éŸ³æ¥½ãŒæ¥½ã—ã„ï¼AIã®ã‚µã‚¤ãƒ¢ãƒ³ã®ãƒ•ã‚¡ãƒ³ã§ã™ã€‚アダムやマットãŒå¥½ãã§ã™ã€‚LeeDeWyze優å‹ï¼I Although it should be: 映画や音楽が楽しい!AIのサイモンのファンです。アダムやマットが好きです。 My setup is Ubuntu server 10.04, Tomcat6, Solr 1.4 and MySQL. Things I have configured, but with no luck: 1. /etc/tomcat6/server.xml contains this: <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="2" URIEncoding="UTF-8" redirectPort="8443"/> 2. /etc/mysql/my.cnf contains: [mysqld] default-character-set = utf8 character-set-server = utf8 3. /etc/solr/conf/data-config.xml: <dataConfig> <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/spuvocom_spuvo?characterEncoding=UTF-8" encoding="UTF-8"/> <document> 4. my MySQL table collation is utf8_bin. What would you recommend changing or checking? Thanks in advance
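A small sketch of steps 2 and 3 as a single JDBC round trip (the table name and credentials are hypothetical; useUnicode/characterEncoding are standard MySQL Connector/J URL parameters). If this prints false or mojibake, the corruption happens before Solr ever sees the data.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class Utf8RoundTrip {
        public static void main(String[] args) throws Exception {
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8",
                    "user", "password");
            String sample = "映画や音楽が楽しい"; // characters well above 0xFF
            PreparedStatement ins = conn.prepareStatement("INSERT INTO t (txt) VALUES (?)");
            ins.setString(1, sample);
            ins.executeUpdate();
            ResultSet rs = conn.createStatement().executeQuery("SELECT txt FROM t");
            while (rs.next()) {
                // Compare what comes back with what went in.
                System.out.println(rs.getString(1).equals(sample) + " " + rs.getString(1));
            }
            conn.close();
        }
    }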
RE: How to index long words with StandardTokenizerFactory?
Hi Sergey, I've opened an issue to add a maxTokenLength param to the StandardTokenizerFactory configuration: https://issues.apache.org/jira/browse/SOLR-2188 I'll work on it this weekend. Are you using Solr 1.4.1? I ask because of your mention of Lucene 2.9.3. I'm not sure there will ever be a Solr 1.4.2 release. I plan on targeting Solr 3.1 and 4.0 for the SOLR-2188 fix. I'm not sure why you didn't get the results you wanted with your Lucene hack - is it possible you have other Lucene jars in your Solr classpath? Steve -Original Message- From: Sergey Bartunov [mailto:sbos@gmail.com] Sent: Friday, October 22, 2010 12:08 PM To: solr-user@lucene.apache.org Subject: How to index long words with StandardTokenizerFactory? I'm trying to force Solr to index words whose length is more than 255 symbols (this constant is DEFAULT_MAX_TOKEN_LENGTH in Lucene's StandardAnalyzer.java) using StandardTokenizerFactory in the schema configuration XML. Specifying the maxTokenLength attribute doesn't work. I tried a dirty hack: I downloaded the lucene-core-2.9.3 source, changed DEFAULT_MAX_TOKEN_LENGTH to 100, built it into a jar and replaced the original lucene-core jar in Solr's /lib. But it seems to have had no effect.
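Assuming SOLR-2188 lands as described, the schema configuration would presumably look something like the sketch below; the attribute name and placement are speculative until the patch is committed.

    <fieldType name="text_long_tokens" class="solr.TextField">
      <analyzer>
        <!-- maxTokenLength is the parameter proposed in SOLR-2188. -->
        <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="1000000"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>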
Confusion about entities and documents
Hi all- I've been checking the online docs about this, but I haven't found a suitable explanation about how entities and sub-entities work within a document. I am loading records from a SQL database and everything seems to be getting flattened in a way I was not expecting. For example, I have a document that defines, say, engine. The engine is made up of parts, which are manufactured by various companies. A hypothetical, abbreviated config would be:

<document name="engines">
  <entity name="engine" query="select id, name, desc from engines">
    <entity name="parts" query="select part_id, part_name from parts where engine_id = ${engine.id}">
      <entity name="parts_manu" query="select manu_name from manufacturer where id = ${parts.part_id}">
      ...
      </entity>
    </entity>
  </entity>
</document>

What I get when I search for, say, XYZ, is a document that has XYZ Corp as a manufacturer name, but the array of parts_manu appears to be a child of the document, not the parts array. Is this the correct behavior, insofar as a document has a single level of elements, and that's it? If so, what might be a better strategy for being able to maintain the hierarchy of information within a document? Thanks for any info, Ron
Re: A bug in ComplexPhraseQuery ?
In my opinion, ordering terms in a proximity search does not make sense! So the workaround for us is to generate the opposite search every time a proximity operator is used; not very elegant! If you want, I can make it configurable. You can define your choice in solrconfig.xml like this:

<queryParser name="complexphrase" class="org.apache.solr.search.ComplexPhraseQParserPlugin">
  <bool name="inOrder">false</bool>
</queryParser>
Re: SolrJ addField with Reader
Is there an example of how to use ContentStreamBase.FileStream from SolrJ during indexing to reduce the memory footprint? Using addField requires a String. The only example I could find in the JUnit tests is below, and it does not show indexing... thx!

public void testFileStream() throws IOException {
    File file = new File("README");
    assertTrue(file.exists()); // make sure you are running from: solr\src\test\test-files
    ContentStreamBase stream = new ContentStreamBase.FileStream(file);
    assertEquals(file.length(), stream.getSize().intValue());
    assertTrue(IOUtils.contentEquals(new FileInputStream(file), stream.getStream()));
    assertTrue(IOUtils.contentEquals(new FileReader(file), stream.getReader()));
}
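One way to stream a file from SolrJ without building the content as a String is a content-stream request; a sketch, assuming the extracting handler is mapped at /update/extract (in SolrJ 1.4, ContentStreamUpdateRequest.addFile takes just a File):

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class StreamIndexExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            ContentStreamUpdateRequest up =
                    new ContentStreamUpdateRequest("/update/extract");
            up.addFile(new File("/path/to/big-document.pdf")); // streamed, not slurped
            up.setParam("literal.id", "doc1");
            up.setParam("fmap.content", "body_texts");
            up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            server.request(up);
        }
    }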
Re: How to index long words with StandardTokenizerFactory?
I'm using Solr 1.4.1. I've now succeeded in replacing the lucene-core jar, but maxTokenLength seems to be used in a very strange way: currently for me it's set to 1024*1024, but I couldn't index a field of just ~34kb. I understand that it's a little weird to index such big data, but I just want to understand why it doesn't work. On 22 October 2010 20:36, Steven A Rowe sar...@syr.edu wrote: Hi Sergey, I've opened an issue to add a maxTokenLength param to the StandardTokenizerFactory configuration: https://issues.apache.org/jira/browse/SOLR-2188 I'll work on it this weekend. Are you using Solr 1.4.1? I ask because of your mention of Lucene 2.9.3. I'm not sure there will ever be a Solr 1.4.2 release. I plan on targeting Solr 3.1 and 4.0 for the SOLR-2188 fix. I'm not sure why you didn't get the results you wanted with your Lucene hack - is it possible you have other Lucene jars in your Solr classpath? Steve -Original Message- From: Sergey Bartunov [mailto:sbos@gmail.com] Sent: Friday, October 22, 2010 12:08 PM To: solr-user@lucene.apache.org Subject: How to index long words with StandardTokenizerFactory? I'm trying to force Solr to index words whose length is more than 255 symbols (this constant is DEFAULT_MAX_TOKEN_LENGTH in Lucene's StandardAnalyzer.java) using StandardTokenizerFactory in the schema configuration XML. Specifying the maxTokenLength attribute doesn't work. I tried a dirty hack: I downloaded the lucene-core-2.9.3 source, changed DEFAULT_MAX_TOKEN_LENGTH to 100, built it into a jar and replaced the original lucene-core jar in Solr's /lib. But it seems to have had no effect.
Re: A bug in ComplexPhraseQuery ?
<queryParser name="complexphrase" class="org.apache.solr.search.ComplexPhraseQParserPlugin">
  <bool name="inOrder">false</bool>
</queryParser>

I added this change to SOLR-1604, can you test it and give us feedback?
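Once the parser is registered in solrconfig.xml under the name complexphrase, selecting it per request should just be a matter of defType; a small SolrJ sketch (the query and URL are illustrative):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class ComplexPhraseExample {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            // Proximity phrase handled by the parser registered above.
            SolrQuery q = new SolrQuery("\"apache lucene\"~5");
            q.set("defType", "complexphrase");
            System.out.println(server.query(q).getResults().getNumFound());
        }
    }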
Re: Using different schemas when syncing with PostgreSQL and DIH
On 10/22/2010 10:06 AM, Juan Manuel Alvarez wrote: My question is: every time I do an import operation (delta or full) with DIH, I only need to sync the index with one schema, so... is there a way to pass a custom parameter with the schema name to DIH so I can build the query with the corresponding schema name? Yes, there is. Below is the latest version of my dih config used with a MySQL database. I've got almost everything in the SELECT statement specified by the input URL, which gets built using the following template: http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dbHost=DBHOST&dbSchema=DBSCHEMA&dataTable=DATATABLE&sgTable=SGTABLE&numShards=NUMSHARDS&modVal=MODVAL&minDid=MINDID&maxDid=MAXDID

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" encoding="UTF-8"
    url="jdbc:mysql://${dataimporter.request.dbServer}:3306/${dataimporter.request.dbSchema}?zeroDateTimeBehavior=convertToNull"
    batchSize="-1" user="removed" password="removed"/>
  <document>
    <entity name="dataTable" pk="did"
      query="SELECT d.*,FROM_UNIXTIME(d.post_date) AS pd, s.search_group_map AS sg
        FROM ${dataimporter.request.dataTable} d
        LEFT JOIN ${dataimporter.request.sgTable} s ON d.feature=s.featurecode
        WHERE did &gt; ${dataimporter.request.minDid}
        AND did &lt;= ${dataimporter.request.maxDid}
        AND (did % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})
        GROUP BY d.did"
      deltaImportQuery="SELECT d.*,FROM_UNIXTIME(d.post_date) AS pd, s.search_group_map AS sg
        FROM ${dataimporter.request.dataTable} d
        LEFT JOIN ${dataimporter.request.sgTable} s ON d.feature=s.featurecode
        WHERE did &gt; ${dataimporter.request.minDid}
        AND did &lt;= ${dataimporter.request.maxDid}
        AND (did % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})
        GROUP BY d.did"
      deltaQuery="SELECT MAX(d.did) FROM ${dataimporter.request.dataTable} d"
      >
      <!-- That lone angle bracket looks wrong, but it's not. -->
      <field column="search_group" splitBy=";" />
    </entity>
  </document>
</dataConfig>
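Adapted to the Postgres question that started the thread, a minimal sketch; schemaName is a made-up request parameter, and any name works as long as the invocation URL and the config agree:

    <!-- invoked as:
      http://localhost:8983/solr/core0/dataimport?command=full-import&schemaName=schema1
    -->
    <entity name="objects" pk="id"
            query="SELECT * FROM ${dataimporter.request.schemaName}.Objects"/>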
Re: Using different schemas when syncing with PostgreSQL and DIH
Thank you Shawn! That was exactly what I was looking for! =o) On Fri, Oct 22, 2010 at 4:29 PM, Shawn Heisey s...@elyograg.org wrote: On 10/22/2010 10:06 AM, Juan Manuel Alvarez wrote: My question is: every time I do an import operation (delta or full) with DIH, I only need to sync the index with one schema, so... is there a way to pass a custom parameter with the schema name to DIH so I can build the query with the corresponding schema name? Yes, there is. Below is the latest version of my dih config used with a MySQL database. I've got almost everything in the SELECT statement specified by the input URL, which gets built using the following template: http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dbHost=DBHOST&dbSchema=DBSCHEMA&dataTable=DATATABLE&sgTable=SGTABLE&numShards=NUMSHARDS&modVal=MODVAL&minDid=MINDID&maxDid=MAXDID [the dih config quoted here is reproduced in full in Shawn's message above]
Re: how well does multicore scale?
Thanks for the advice, everyone. I'll take a look at the API mentioned and do some benchmarking over the weekend. -Mike On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller markrmil...@gmail.com wrote: On 10/22/10 1:44 AM, Tharindu Mathew wrote: Hi Mike, I've also considered using separate cores in a multi tenant application, ie a separate core for each tenant/domain. But the cores do not suit that purpose. If you check out the documentation, no real API support exists for doing this dynamically through SolrJ. And all the use cases I found only had users configuring it statically and then using it. That was maybe 2 or 3 cores. Please correct me if I'm wrong, Solr folks. You can dynamically manage cores with solrj. See org.apache.solr.client.solrj.request.CoreAdminRequest's static methods for a place to start. You probably want to turn solr.xml's persist option on so that your cores survive restarts. So you're better off using a single index with a user id and using a query filter with the user id when fetching data. Many times this is probably the case - pros and cons to each depending on what you are up to. - Mark lucidimagination.com On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote: No, it does not seem reasonable. Why do you think you need a separate core for every user? mike anderson wrote: I'm exploring the possibility of using cores as a solution to bookmark folders in my solr application. This would mean I'll need tens of thousands of cores... does this seem reasonable? I have plenty of CPUs available for scaling, but I wonder about the memory overhead of adding cores (aside from needing to fit the new index in memory). Thoughts? -mike
Re: Date faceting +1MONTH problem
On Fri, Sep 17, 2010 at 9:51 PM, Chris Hostetter hossman_luc...@fucit.org wrote: the default query parser doesn't support range queries with mixed upper/lower bound inclusion. This has just been added to trunk. Things like [0 TO 100} now work. -Yonik http://www.lucidimagination.com
Re: Confusion about entities and documents
What I get when I search for, say, XYZ, is a document that has XYZ Corp as a manufacturer name, but the array of parts_manu appears to be a child of the document, not the parts array. Is this the correct behavior, insofar as a document has a single level of elements, and that's it? If so, what might be a better strategy for being able to maintain the hierarchy of information within a document? Yes, this is the correct behavior. I still struggle with the same issue, and there are no 'best practices' (that I have found, at least) for maintaining relationships within a Solr doc. The argument is that Solr is not the correct place for these representations and should only represent a flat version of your document. For a similar question see: http://lucene.472066.n3.nabble.com/Schema-Definition-Question-td1049966.html#a1105593 A few possible solutions are posted there, and I'm interested in how others have tackled this issue.
RE: Confusion about entities and documents
Hmm, okay, I guess I wasn't taking the hierarchy-flattening aspect of Solr seriously enough. :) Based on your reply from the other thread, I guess the best solution, as far as I can tell, is to maintain the multiple value lists and take advantage of the fact that the arrays will always be in the right order:

<arr name="manu_id">
  <int>1</int>
  <int>2</int>
</arr>
<arr name="manu_name">
  <str>ABC Corp</str> <!-- ID should be 1, right? -->
  <str>XYZ Inc</str> <!-- Should be 2 -->
</arr>

So I guess the problem isn't really *sooo* bad... I just need to make sure that I have the appropriate names defined so I can link between two arrays in my client code. I suppose I could keep things straight by preserving the hierarchy within the name attribute. -Original Message- From: harrysmith [mailto:harrysmith...@gmail.com] Sent: Friday, October 22, 2010 4:10 PM To: solr-user@lucene.apache.org Subject: Re: Confusion about entities and documents Yes, this is the correct behavior. I still struggle with the same issue, and there are no 'best practices' (that I have found, at least) for maintaining relationships within a Solr doc. The argument is that Solr is not the correct place for these representations and should only represent a flat version of your document. For a similar question see: http://lucene.472066.n3.nabble.com/Schema-Definition-Question-td1049966.html#a1105593 A few possible solutions are posted there, and I'm interested in how others have tackled this issue.
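On the client side, reading the parallel arrays back in step with each other is straightforward; a SolrJ sketch using the field names from the example above:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.common.SolrDocument;

    public class ParallelArrays {
        // Pair up manu_id[i] with manu_name[i] from a multivalued document.
        public static void printManufacturers(SolrDocument doc) {
            List<Object> ids = new ArrayList<Object>(doc.getFieldValues("manu_id"));
            List<Object> names = new ArrayList<Object>(doc.getFieldValues("manu_name"));
            for (int i = 0; i < ids.size(); i++) {
                System.out.println(ids.get(i) + " -> " + names.get(i));
            }
        }
    }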
Question about gaze
Anyone have an idea about the error below? I installed Gaze and it works OK; then, when trying to import after installing, I received the following error. I commented out the line in solrconfig.xml and it imports fine again, but then Gaze is disabled. Any ideas? Thanks, Dean ./import-marc.sh /home/filemove/vufindextract.mrc /var/solr/solr /var/solr Now Importing /home/filemove/vufindextract.mrc ... /usr/java/latest/bin/java -Xms2048m -Xmx4096m -XX:+UseParallelGC -XX:NewRatio=5 -XX:-PrintGC -XX:-PrintGCDetails -XX:-PrintGCTimeStamps -Dsolrmarc.solr.war.path=/var/solr/solr/jetty/webapps/solr.war -Dsolr.core.name=biblio -Dsolrmarc.path=/var/solr/import -Dsolr.path=/var/solr/solr -Dsolr.solr.home=/var/solr/solr -jar /var/solr/import/SolrMarc.jar /var/solr/import/import.properties /home/filemove/vufindextract.mrc INFO [main] (MarcImporter.java:769) - Starting SolrMarc indexing. INFO [main] (Utils.java:189) - Opening file: /var/solr/import/import.properties INFO [main] (MarcHandler.java:325) - Attempting to open data file: /home/filemove/vufindextract.mrc INFO [main] (MarcImporter.java:618) - Updating to Solr index at /var/solr/solr INFO [main] (MarcImporter.java:634) - Using Solr core biblio INFO [main] (SolrCoreLoader.java:102) - Using the data directory of: /var/solr/solr/biblio INFO [main] (SolrCoreLoader.java:104) - Using the multicore schema file at : /var/solr/solr/solr.xml INFO [main] (SolrCoreLoader.java:105) - Using the biblio core Oct 22, 2010 5:34:39 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NoClassDefFoundError: com/lucidimagination/gaze/shared/GazeStorage at com.lucidimagination.gaze.plugin.StatMonitor.init(StatMonitor.java:120) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at java.lang.Class.newInstance0(Class.java:355) at java.lang.Class.newInstance(Class.java:308) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:398) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425) at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449) at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:152) at org.apache.solr.core.SolrCore.init(SolrCore.java:556) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278) at org.apache.solr.core.CoreContainer.init(CoreContainer.java:181) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.solrmarc.solr.SolrCoreLoader.loadCore(SolrCoreLoader.java:110) at org.solrmarc.marc.MarcImporter.getSolrProxy(MarcImporter.java:635) at org.solrmarc.marc.MarcImporter.loadLocalProperties(MarcImporter.java:173) at org.solrmarc.marc.MarcHandler.init(MarcHandler.java:112) at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:775) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.simontuffs.onejar.Boot.run(Boot.java:334) at com.simontuffs.onejar.Boot.main(Boot.java:170) Caused by: java.lang.ClassNotFoundException: com.lucidimagination.gaze.shared.GazeStorage at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) ... 30 more Oct 22, 2010 5:34:39 PM org.apache.solr.core.SolrCore finalize SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.core.solrc...@4a5f2db0 (biblio) has a reference count of 1 Error: Problem creating updateHandler in SolrCoreProxy ERROR [main] (MarcImporter.java:310) - Error indexing record: 8188 -- Error: Problem creating updateHandler in SolrCoreProxy org.solrmarc.solr.SolrRuntimeException: Error: Problem creating updateHandler in SolrCoreProxy
Re: Date faceting +1MONTH problem
On 10/22/2010 3:01 PM, Yonik Seeley wrote: On Fri, Sep 17, 2010 at 9:51 PM, Chris Hostetter hossman_luc...@fucit.org wrote: the default query parser doesn't support range queries with mixed upper/lower bound inclusion. This has just been added to trunk. Things like [0 TO 100} now work. Awesome! Is it easily ported back to branch_3x? Shawn
Solr ExtractingRequestHandler with Compressed files
Hi, Has anyone had success using ExtractingRequestHandler and Tika with any of the compressed file formats (zip, tar, gz, etc)? I am sending Solr the archived.tar file using curl: curl "http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true" -H 'Content-type:application/octet-stream' --data-binary @/home/archived.tar The result I get when I query the document is that the filenames inside the archive are indexed as the body_texts, but the content of those files is not extracted or included. This is not the behavior I expected. Ref: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#article.tika.example. When I send one of the actual documents inside the archive using the same curl command, the extracted content is then stored in the body_texts field. Am I missing a step for the compressed files? I have added all the extraction dependencies as indicated by mat in http://outoftime.lighthouseapp.com/projects/20339/tickets/98-solr-cell and am able to successfully extract data from MS Word, PDF, and HTML documents. I'm using the following library versions: Solr 1.4.0, Solr Cell 1.4.1, with Tika Core 0.4. Given everything I have read, this version of Tika should support extracting data from all files within a compressed file. Any help or suggestions would be appreciated.
Re: Date faceting +1MONTH problem
On Fri, Oct 22, 2010 at 6:02 PM, Shawn Heisey s...@elyograg.org wrote: On 10/22/2010 3:01 PM, Yonik Seeley wrote: On Fri, Sep 17, 2010 at 9:51 PM, Chris Hostetter hossman_luc...@fucit.org wrote: the default query parser doesn't support range queries with mixed upper/lower bound inclusion. This has just been added to trunk. Things like [0 TO 100} now work. Awesome! Is it easily ported back to branch_3x? Between the refactoring work on the QP, and the back compat concerns, it's not trivial. -Yonik http://www.lucidimagination.com
RE: How to index long words with StandardTokenizerFactory?
Hi Sergey, What does your ~34kb field value look like? Does StandardTokenizer think it's just one token? What doesn't work? What happens? Steve -Original Message- From: Sergey Bartunov [mailto:sbos@gmail.com] Sent: Friday, October 22, 2010 3:18 PM To: solr-user@lucene.apache.org Subject: Re: How to index long words with StandardTokenizerFactory? I'm using Solr 1.4.1. I've now succeeded in replacing the lucene-core jar, but maxTokenLength seems to be used in a very strange way: currently for me it's set to 1024*1024, but I couldn't index a field of just ~34kb. I understand that it's a little weird to index such big data, but I just want to understand why it doesn't work. On 22 October 2010 20:36, Steven A Rowe sar...@syr.edu wrote: Hi Sergey, I've opened an issue to add a maxTokenLength param to the StandardTokenizerFactory configuration: https://issues.apache.org/jira/browse/SOLR-2188 I'll work on it this weekend. Are you using Solr 1.4.1? I ask because of your mention of Lucene 2.9.3. I'm not sure there will ever be a Solr 1.4.2 release. I plan on targeting Solr 3.1 and 4.0 for the SOLR-2188 fix. I'm not sure why you didn't get the results you wanted with your Lucene hack - is it possible you have other Lucene jars in your Solr classpath? Steve -Original Message- From: Sergey Bartunov [mailto:sbos@gmail.com] Sent: Friday, October 22, 2010 12:08 PM To: solr-user@lucene.apache.org Subject: How to index long words with StandardTokenizerFactory? I'm trying to force Solr to index words whose length is more than 255 symbols (this constant is DEFAULT_MAX_TOKEN_LENGTH in Lucene's StandardAnalyzer.java) using StandardTokenizerFactory in the schema configuration XML. Specifying the maxTokenLength attribute doesn't work. I tried a dirty hack: I downloaded the lucene-core-2.9.3 source, changed DEFAULT_MAX_TOKEN_LENGTH to 100, built it into a jar and replaced the original lucene-core jar in Solr's /lib. But it seems to have had no effect.
Re: xpath processing
Quoting pghorp...@ucla.edu: Can someone help me please? I am trying to import MODS xml data in Solr using the xml/http datasource. This does not work with XPathEntityProcessor of the data import handler: xpath="/mods/name/namePart[@type = 'date']" I actually have 143 records with the type attribute as 'date' for the element namePart. Thank you Parinita
Re: how well does multicore scale?
http://wiki.apache.org/solr/CoreAdmin Since Solr 1.3 On Fri, Oct 22, 2010 at 1:40 PM, mike anderson saidthero...@gmail.com wrote: Thanks for the advice, everyone. I'll take a look at the API mentioned and do some benchmarking over the weekend. -Mike On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller markrmil...@gmail.com wrote: On 10/22/10 1:44 AM, Tharindu Mathew wrote: Hi Mike, I've also considered using separate cores in a multi tenant application, ie a separate core for each tenant/domain. But the cores do not suit that purpose. If you check out the documentation, no real API support exists for doing this dynamically through SolrJ. And all the use cases I found only had users configuring it statically and then using it. That was maybe 2 or 3 cores. Please correct me if I'm wrong, Solr folks. You can dynamically manage cores with solrj. See org.apache.solr.client.solrj.request.CoreAdminRequest's static methods for a place to start. You probably want to turn solr.xml's persist option on so that your cores survive restarts. So you're better off using a single index with a user id and using a query filter with the user id when fetching data. Many times this is probably the case - pros and cons to each depending on what you are up to. - Mark lucidimagination.com On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote: No, it does not seem reasonable. Why do you think you need a separate core for every user? mike anderson wrote: I'm exploring the possibility of using cores as a solution to bookmark folders in my solr application. This would mean I'll need tens of thousands of cores... does this seem reasonable? I have plenty of CPUs available for scaling, but I wonder about the memory overhead of adding cores (aside from needing to fit the new index in memory). Thoughts? -mike -- Lance Norskog goks...@gmail.com
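For reference, the HTTP form of the same operations (host, port, and core names are examples; the instanceDir must already contain a conf/ directory):

    http://localhost:8983/solr/admin/cores?action=CREATE&name=core1&instanceDir=core1
    http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core1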
Re: xpath processing
Parinita, In its simplest form, what does your entity definition for DIH look like; also, what does one record from your xml look like? We need more information before we can really be of any help. :) - Ken It looked like something resembling white marble, which was probably what it was: something resembling white marble. -- Douglas Adams, The Hitchhikers Guide to the Galaxy On Fri, Oct 22, 2010 at 8:00 PM, pghorp...@ucla.edu wrote: Quoting pghorp...@ucla.edu: Can someone help me please? I am trying to import MODS xml data in Solr using the xml/http datasource. This does not work with XPathEntityProcessor of the data import handler: xpath="/mods/name/namePart[@type = 'date']" I actually have 143 records with the type attribute as 'date' for the element namePart. Thank you Parinita
Re: xpath processing
<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="f" rootEntity="false" dataSource="null" processor="FileListEntityProcessor"
            fileName=".*xml" recursive="true" baseDir="C:\data\sample_records\mods\starr">
      <entity name="x" dataSource="myfilereader" processor="XPathEntityProcessor"
              url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
              transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
        <field column="id" template="${f.file}"/>
        <field column="collectionKey" template="starr"/>
        <field column="collectionName" template="starr"/>
        <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
        <field column="fileName" template="${f.file}"/>
        <field column="fileSize" template="${f.fileSize}"/>
        <field column="fileLastModified" template="${f.fileLastModified}"/>
        <field column="classification_keyword" xpath="/mods/classification"/>
        <field column="accessCondition_keyword" xpath="/mods/accessCondition"/>
        <field column="nameNamePart_s" xpath="/mods/name/namePart[@type = 'date']"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Quoting Ken Stanley doh...@gmail.com: Parinita, In its simplest form, what does your entity definition for DIH look like; also, what does one record from your xml look like? We need more information before we can really be of any help. :) - Ken It looked like something resembling white marble, which was probably what it was: something resembling white marble. -- Douglas Adams, The Hitchhikers Guide to the Galaxy On Fri, Oct 22, 2010 at 8:00 PM, pghorp...@ucla.edu wrote: Quoting pghorp...@ucla.edu: Can someone help me please? I am trying to import MODS xml data in Solr using the xml/http datasource. This does not work with XPathEntityProcessor of the data import handler: xpath="/mods/name/namePart[@type = 'date']" I actually have 143 records with the type attribute as 'date' for the element namePart. Thank you Parinita