Re: Hi
(start-off-topic): Alexandre, nice ideas. The last one in the *) list is a bit far-fetched, but still good. I would add one more: how to have exact matches and inexact matches in the same analyzed field. (end-off-topic)

On Wed, Jan 23, 2013 at 2:40 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

We need a Make your own adventure (TM) Solr troubleshooting guide. :-)

*) You are staring at the Solr installation full of twisty little passages and nuances. Would you like to:
*) Build your first index?
*) Make your first query?
*) Spread your documents in the cloud?
*) Build your own UpdateProcessor to integrate a reverse-geocoding web service into your NLP disambiguation UIMA module to drive your More Like This suggestions?

Well, maybe somebody with more imagination can figure out a better way to phrase it. Then we make a mobile app for doing this and retire millionaires. :-) Though that last one could make for an awesome Solr demo. :-)

Seriously though, Thendral: you do need to say at least how far you got before you emailed us. Have you gone through the tutorial and understood that, but your own custom schema is giving you trouble? Have you tried indexing a Solr Update XML document containing the data you believe you have? You need to be able to take a large problem, split it in half, and see which half works and which one does not. It is a bit hard to tell from your description.

Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Jan 23, 2013 at 7:00 AM, Upayavira u...@odoko.co.uk wrote: You are going to have to give more information than this. If you get a Bad Request, look in the logs for the Solr server and you will probably find an exception there that tells you what was wrong with your document. Upayavira

On Wed, Jan 23, 2013, at 08:58 AM, Thendral Thiruvengadam wrote: Hi, We are trying to use Solr for indexing our application data. When we try to add a new object into Solr, we are getting a Bad Request. Please help us with this. Thanks, Thendral http://www.mindtree.com/email/disclaimer.html
RE: Issues with docFreq/docCount on SolrCloud
Alright, so my suggestion of overriding HttpShardHandler to route users to the same replica instead of shuffling the replica URLs is doable? What about the comment in HttpShardHandler then?

// Shuffle the list instead of use round-robin by default.
// This prevents accidental synchronization where multiple shards could get in sync
// and query the same replica at the same time.
if (urls.size() > 1)
  Collections.shuffle(urls, httpShardHandlerFactory.r);
shardToURLs.put(shard, urls);

Instead of shuffling I would then hash the user to the correct replica if possible. Thanks, Markus

-----Original message----- From: Mark Miller markrmil...@gmail.com Sent: Thu 24-Jan-2013 00:33 To: solr-user@lucene.apache.org Subject: Re: Issues with docFreq/docCount on SolrCloud

On Jan 23, 2013, at 6:21 PM, Yonik Seeley yo...@lucidworks.com wrote: A solr request could request a token that when resubmitted with a follow-up request would result in hitting the same replicas if possible. Yeah, this would be good. It's also useful for not catching eventual consistency effects between queries. - Mark
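A minimal sketch of the hashing idea, outside of HttpShardHandler: map a stable user key onto one replica deterministically instead of shuffling, so follow-up requests from the same user hit the same replica. The user-key parameter and the modulo scheme are illustrative assumptions, not existing Solr API:

import java.util.Arrays;
import java.util.List;

public class ReplicaRouter {
    // Deterministically map a stable user key to one replica URL, assuming
    // the replica list order is stable between requests.
    public static String pickReplica(List<String> urls, String userKey) {
        int idx = (userKey.hashCode() & 0x7fffffff) % urls.size(); // non-negative index
        return urls.get(idx);
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://host1:8983/solr", "http://host2:8983/solr");
        System.out.println(pickReplica(urls, "session-42")); // same URL on every call
    }
}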
RE: problem in qf parameter - no results
Hi, I think it's your mm parameter and that the terms are not matched in the 'setctor' field. Cheers,

-----Original message----- From: Gastone Penzo gastone.pe...@gmail.com Sent: Thu 24-Jan-2013 10:11 To: solr-user@lucene.apache.org Subject: problem in qf parameter - no results

Hi, I have a problem with the qf parameter:

38 results: localhost:8983/solr/select/?defType=edismax&qf=title^1 author^0.75 publisher^0.25&q=bibbia di gerusalemme

0 results: localhost:8983/solr/select/?defType=edismax&qf=title^1 author^0.75 publisher^0.25 setctor^0.25&q=bibbia di gerusalemme

The difference is only the field sector, which is:

<field name="sector" type="string" indexed="true" stored="true" required="false" multiValued="true"/>

Why does adding the sector field to the qf parameter make Solr return 0 products?? Thank you -- *Gastone Penzo*
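One way to investigate a case like this is to ask Solr for the parsed query via debugQuery. A hedged SolrJ sketch of that check (the core URL is an assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DebugQf {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("bibbia di gerusalemme");
        q.set("defType", "edismax");
        q.set("qf", "title^1 author^0.75 publisher^0.25 setctor^0.25");
        q.set("debugQuery", "true"); // returns the parsed query and per-doc explanations
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getDebugMap().get("parsedquery"));
    }
}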
Re: Confused by queries
Hello. That is indeed an excellent article, thanks for pointing me at it. With a title like that, it is no wonder that I was unable to google it on my own. It is probably the exception in this rule that has been confusing me: If a BooleanQuery contains no MUST BooleanClauses, then a document is only considered a match against the BooleanQuery if one or more of the SHOULD BooleanClauses is a match. So +group:id +keyword:text and (+group:id) +keyword:text mean completely different things. I have mostly been using the reference at http://lucene.apache.org/core/3_6_0/queryparsersyntax.html and it does not mention this distinction. Quite the contrary, actually, as it says that grouping can be used to eliminate confusion, thereby suggesting that the usual rules of Boolean algebra apply. Thanks again, Anders.

On 23.01.2013 02:20, Erick Erickson wrote: Solr/Lucene does not implement strict boolean logic. Here's an excellent blog discussing this: http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/ Best Erick

On Tue, Jan 22, 2013 at 7:25 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Well, depends on what you indexed. Otis Solr & ElasticSearch Support http://sematext.com/

On Jan 22, 2013 5:48 PM, Anders Melchiorsen m...@spoon.kalibalik.dk wrote: Thanks, though I am still confused. How about this one: manu:apple = 1 hit, +name:video = 2 hits, manu:apple +name:video = 2 hits. Solr ignores the manu:apple part completely? Cheers, Anders.

On 22/01/13 23.16, Jack Krupansky wrote: The first query: name:ipod OR -name:ipod = 0 hits. The OR and - are actually at the same level of the BooleanQuery, so the - overrides the OR, so it's equivalent to: name:ipod -name:ipod = 0 hits. For the second query: (name:ipod) OR (-name:ipod) = 3 hits. Pure negative queries are supported only at the top level, so the (-name:ipod) matches nothing, and the query is equivalent to: (name:ipod) = 3 hits. You can simply insert a *:* to assure that it is not a pure negative query inside the parentheses: (name:ipod) OR (*:* -name:ipod) -- Jack Krupansky

-----Original Message----- From: Anders Melchiorsen Sent: Tuesday, January 22, 2013 4:59 PM To: solr-user@lucene.apache.org Subject: Confused by queries

Hello! With the example server of Solr 4.0.0 (with *.xml indexed), I get these results: *:* = 32 hits, name:ipod = 3 hits, -name:ipod = 29 hits. That is all fine, but for these next queries, I would expect to get 32 hits (i.e. everything), or at least the same number of hits for both queries: name:ipod OR -name:ipod = 0 hits, (name:ipod) OR (-name:ipod) = 3 hits. As my expectations are not met, I must be missing something? Thanks, Anders.
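Jack's workaround, expressed as a quick SolrJ check against the Solr 4.0 example index (a sketch; the URL and the expected counts follow the thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PureNegativeCheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        // A pure negative clause inside parentheses matches nothing:
        long broken = server.query(new SolrQuery("(name:ipod) OR (-name:ipod)"))
                            .getResults().getNumFound();  // 3 on the example data
        // Prepending *:* turns it into "everything minus ipod":
        long fixed = server.query(new SolrQuery("(name:ipod) OR (*:* -name:ipod)"))
                           .getResults().getNumFound();   // 32 on the example data
        System.out.println(broken + " vs " + fixed);
    }
}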
Re: setting up master and slave in same machine with diff ip's and same port
You could configure your servlet container (jetty/tomcat) to have specific webapps/contexts listen on specific IP/port combinations; that would get you some of the way. But what you are asking is more about networking and servlet container configuration than about Solr. Upayavira

On Wed, Jan 23, 2013, at 10:48 PM, epnRui wrote: Hi everyone, it's my first post here so I hope I'm doing it in the right place. I'm a software developer and I'm setting up a DEV environment in Ubuntu with the same configuration as in PROD. (Apparently this IT department doesn't know the difference between a developer and a sys admin.) In PROD we have Solr master and Solr slave, on two different IPs. Let's say: Master 192.10.1.1, Slave 192.10.1.2. In DEV I have only one server: 10.1.1.1. All of them are Ubuntu servers. Can I put master and slave, without touching any configuration in Solr - no IP change, no port change - in 10.1.1.1 (DEV), and still make it work? Basically what I'm looking for is what Ubuntu server configuration I'd have to do to make this work. Thanks a lot -- View this message in context: http://lucene.472066.n3.nabble.com/setting-up-master-and-slave-in-same-machine-with-diff-ip-s-and-same-port-tp4035795.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: zookeeper config
Cool. Thanks. On 24-Jan-2013, at 1:28 PM, Per Steffensen st...@designware.dk wrote: This is supported. You just need to adjust your ZK connection string: host1:port1/solr,host2:port2/solr,...,hostN:portN/solr Regards, Per Steffensen On 1/24/13 7:57 AM, J Mohamed Zahoor wrote: Hi, I am using Solr 4.0. I see that the Solr data in ZooKeeper is placed on the root znode itself. This becomes a pain if the ZooKeeper instance is used for multiple projects like HBase and the like. I am thinking of raising a Jira for putting them under a znode /solr or something like that? ./Zahoor
solr running with multi cores
Hi, Our company wants to use Solr to index our report data, so we are getting to know Solr. Solr supports multiple cores; in our system the number of cores will increase dynamically, and I am afraid that with more cores the performance will decrease dramatically. Our system will have over one hundred cores. What I want to know is: How many cores does Solr support, and up to what level does Solr run well? How does Solr allocate system resources (memory, disk space, CPU...) across multiple cores? Is there a performance experiment on Solr running with many cores? Thanks, junlin.
Re: solr running with multi cores
Hi, Please search the mailing list archives - this has been discussed a few times in the last few months. Otis Solr & ElasticSearch Support http://sematext.com/ On Jan 24, 2013 6:33 AM, real_junlin real_jun...@163.com wrote: Hi, Our company wants to use Solr to index our report data, so we are getting to know Solr. Solr supports multiple cores; in our system the number of cores will increase dynamically, and I am afraid that with more cores the performance will decrease dramatically. Our system will have over one hundred cores. What I want to know is: How many cores does Solr support, and up to what level does Solr run well? How does Solr allocate system resources (memory, disk space, CPU...) across multiple cores? Is there a performance experiment on Solr running with many cores? Thanks, junlin.
Re: zookeeper config
On 1/24/2013 12:58 AM, Per Steffensen wrote: This is supported. You just need to ajust your ZK connection-string: host1:port1/solr,host2:port2/solr,...,hostN:portN/solr My experience has been that you put the chroot at the very end, not on every host entry. For a standalone zookeeper ensemble with three nodes: server1:2181,server2:2181,server3:2181/mysolr1 This is used for the zkHost parameter both on Solr startup and with the CloudSolrServer object from SolrJ. The string is used without modification in constructing the actual ZooKeeper object down in the SolrCloud internals. Here's the documentation for that object: http://zookeeper.apache.org/doc/r3.4.5/api/org/apache/zookeeper/ZooKeeper.html#ZooKeeper%28java.lang.String,%20int,%20org.apache.zookeeper.Watcher%29 Thanks, Shawn
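SolrJ passes that string through unchanged, so the chroot goes at the end there too. A minimal sketch (the collection name is an assumption):

import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class ZkChrootExample {
    public static void main(String[] args) throws Exception {
        // Chroot suffix appears once, at the end of the ensemble list:
        CloudSolrServer server = new CloudSolrServer("server1:2181,server2:2181,server3:2181/mysolr1");
        server.setDefaultCollection("collection1"); // assumed collection name
        server.connect();
        System.out.println("Connected via chrooted zkHost");
        server.shutdown();
    }
}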
Solr autocomplete feature
Hi, I want to change the autocomplete implementation for our search. Currently I have a suggest field whose definition in schema.xml is as below:

<field name="suggest" type="edgytext" indexed="true" stored="true" required="true" omitNorms="false"/>

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" splitOnCaseChange="0" splitOnNumerics="0" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

It works as follows: “shoes” will match “casual shoes”, “sports shoes”, “shoes”, etc. Whereas I want it to match only the values that start with the user query. I.e. if the user types “shoes”, I want to suggest terms that start with “shoes” (or) have the query string as a prefix in the “suggest” field in the index. Please let me know how to do this. Regards, Ilay -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-autocomplete-feature-tp4035927.html Sent from the Solr - User mailing list archive at Nabble.com.
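One alternative, not discussed above, that gives strict starts-with suggestions is the TermsComponent with terms.prefix against a keyword-tokenized, lowercased copy of the field. A hedged SolrJ sketch (the field name, handler wiring, and URL are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class PrefixSuggest {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery();
        q.setRequestHandler("/terms");       // assumes TermsComponent is wired to this handler
        q.set("terms", "true");
        q.set("terms.fl", "suggest_exact");  // assumed: keyword-tokenized, lowercased copyField
        q.set("terms.prefix", "shoes");      // only indexed terms starting with the typed text
        QueryResponse rsp = server.query(q);
        TermsResponse terms = rsp.getTermsResponse();
        for (TermsResponse.Term t : terms.getTerms("suggest_exact")) {
            System.out.println(t.getTerm() + " (" + t.getFrequency() + ")");
        }
    }
}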
Re: AW: AW: auto completion search with solr using NGrams in SOLR
Thanks for your solution, it works for me too. I'm new to Solr, but how can I additionally fetch other fields, not only the field that was used for searching? For example, I have product title and image fields, and I want to get the title but also the image related to that title. How can I do this? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4035931.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.1.0 shardHandlerFactory Null Pointer Exception when setting up embedded solrj solr server for unit testing
We recently updated from Solr 4.0.0 to Solr 4.1.0. Because of the change we were forced to upgrade a custom query parser. While the code change itself was minimal, we found that our unit tests stopped working because of a NullPointerException on line 181 of handler.component.SearchHandler: ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler(); We determined that the cause of this exception was that shardHandlerFactory was never initialized in the Solr container. The reason for this seems to be that the shard handler is set up in core.CoreContainer::initShardHandler, which is called from core.CoreContainer::load. When setting up the core container we were using the public CoreContainer(SolrResourceLoader loader) constructor. This constructor never calls the load method, so initShardHandler is never called and the shardHandler is never initialized. In Solr 4.0.0 the shardHandler was initialized on the calling of getShardHandlerFactory. This code was modified and moved by revision 1422728: SOLR-4204: Make SolrCloud tests more friendly to FreeBSD blackhole 2 environments. We fixed our issue by using the public CoreContainer(String dir, File configFile) constructor, which calls the load method. I just wanted to make sure that people were aware of this issue and to determine if it really is an issue, or if having the shardHandler be null was expected behavior unless someone called the load(String dir, File configFile) method. Thank you, Ted

Stack trace of error:

org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException
  at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
  at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
  at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
  at com.cision.search.solr.ProximityQParserTest.testInit(ProximityQParserTest.java:72)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
  at java.lang.reflect.Method.invoke(Unknown Source)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
  at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
  at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
  at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
  at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
  at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
  at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException
  at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
  ... 27 more
Caused by: java.lang.NullPointerException
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
  at
Re: Problem with migration from solr 3.5 with SOLR-2155 usage to solr 4.0
Hi David, thank you for your answer. After updating to this field type and changing the SOLR query I get the required behavior. Also, could you update the wiki page, after the words "it needs to be in WEB-INF/lib in Solr's war file", to also add the Maven artifact snippet like this?

<dependency>
  <groupId>com.vividsolutions</groupId>
  <artifactId>jts</artifactId>
  <version>1.13</version>
</dependency>

I think this may help users who use Maven. WBR Viacheslav.

On 23.01.2013, at 19:24, Smiley, David W. wrote: Viacheslav, SOLR-2155 is only compatible with Solr 3. However, the technology it is based on lives on in Lucene/Solr 4 in the SpatialRecursivePrefixTreeFieldType field type. In the example schema it's registered under the name location_rpt. For more information on how to use this field type, see: SpatialRecursivePrefixTreeFieldType ~ David Smiley

On 1/23/13 11:11 AM, Viacheslav Davidovich viacheslav.davidov...@objectstyle.com wrote: Hi, With Solr 3.5 I use the SOLR-2155 plugin to filter documents by distance as described in http://wiki.apache.org/solr/SpatialSearch#Advanced_Spatial_Search and this solution perfectly filters the multiValued data defined in schema.xml like:

<fieldType name="geohash" class="solr2155.solr.schema.GeoHashField" length="12"/>
<field name="location_data" type="geohash" indexed="true" stored="true" multiValued="true"/>

The query looks like this with Solr 3.5: q=*:*&fq={!geofilt}&sfield=location_data&pt=45.15,-93.85&d=50&sort=geodist() asc

As the SOLR-2155 plugin is not compatible with Solr 4.0, I tried to change the field definition to:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="location_data" type="location" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

But in this case, after the geofilt on location_data executes, the correct values are returned only if the field has one value; if more than one value is stored in the index, the required documents are returned only when all the location points are matched. Does anybody have experience or any ideas how to get the same behavior in Solr 4.0 as in Solr 3.5 with the SOLR-2155 plugin? Is this possible at all, or do I need to refactor the document structure and field definition to store only one location value per document? WBR Viacheslav.
Submit schema definition using curl via SOLR
Hi, We would like to use Solr to index statistics from any Java module in our production environment. Applications have to be able to create collections and index data on demand, so my initial thought is to use different HTTP methods to create a collection in the cluster and then right away start HTTP POSTing documents, but the issue here is the schema.xml. Is it possible to HTTP POST the schema via Solr to ZooKeeper? Or do I have to know about another service host/IP than Solr, such as ZooKeeper? (I wanted to understand whether there is a way to avoid knowing about ZooKeeper in production.) This must be a duplicate of another question, excuse me in advance. Regards Fadi
Re: AW: AW: auto completion search with solr using NGrams in SOLR
Hi, You can fetch all the stored fields by passing them as part of the *fl* parameter. Go through http://wiki.apache.org/solr/CommonQueryParameters#fl On Thu, Jan 24, 2013 at 8:56 PM, AnnaVak anna.vakulc...@gmail.com wrote: Thanks for your solution, it works for me too. I'm new to Solr, but how can I additionally fetch other fields, not only the field that was used for searching? For example, I have product title and image fields, and I want to get the title but also the image related to that title. How can I do this? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4035931.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards Naresh
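For example, the SolrJ equivalent of fl=title,image - a sketch assuming both fields are stored and a default core URL:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class FetchStoredFields {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("shoes");
        q.setFields("title", "image"); // same as fl=title,image; both fields must be stored
        for (SolrDocument doc : server.query(q).getResults()) {
            System.out.println(doc.getFieldValue("title") + " -> " + doc.getFieldValue("image"));
        }
    }
}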
Does solr 4.1 support field compression?
Hi everyone, I didn't see any mention of field compression in the release notes for Solr 4.1. Did the ability to automatically compress fields end up getting added to this release? Thanks!, Ken
Re: Does solr 4.1 support field compression?
Hello! It should be turned on by default, because the stored fields compression is the behavior of the default Lucene 4.1 codec. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Hi everyone, I didn't see any mention of field compression in the release notes for Solr 4.1. Did the ability to automatically compress fields end up getting added to this release? Thanks!, Ken
AW: Does solr 4.1 support field compression?
This is what is listed under the Highlights on the Apache page announcing the Solr 4.1 release: The default codec incorporates an efficient compressed stored fields implementation that compresses chunks of documents together with LZ4. (see http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene) André From: Rafał Kuć [r@solr.pl] Sent: Thursday, 24 January 2013 16:45 To: solr-user@lucene.apache.org Subject: Re: Does solr 4.1 support field compression? Hello! It should be turned on by default, because the stored fields compression is the behavior of the default Lucene 4.1 codec. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Hi everyone, I didn't see any mention of field compression in the release notes for Solr 4.1. Did the ability to automatically compress fields end up getting added to this release? Thanks!, Ken
Re: Submit schema definition using curl via SOLR
Basically uploading a Solr config (including schema.xml, solrconfig.xml etc.) is an operation different from creating collections. When creating a collection (e.g. using the Collections API) you reference the (already existing) Solr config it needs to use. Collections can share Solr configs. I know of at least two ways to load a Solr config into ZK using Solr tools.

1) You can use the ZkCLI tool (of course ZK needs to be started) - something like this:

mkdir -p ${SOLR_INSTALL}/example/webapps/temp
cp ${SOLR_INSTALL}/example/webapps/solr.war ${SOLR_INSTALL}/example/webapps/temp
cd ${SOLR_INSTALL}/example/webapps/temp
jar -xf solr.war
java -classpath ${SOLR_INSTALL}/example/webapps/temp/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -confdir <path_to_solr_config_dir> -confname <logical_solr_config_name> --zkhost <zk_connection_str>
rm -rf ${SOLR_INSTALL}/example/webapps/temp

I believe there is also a zkcli.sh tool.

2) Or you can have a Solr node (server) load a Solr config into ZK during startup by adding collection.configName and bootstrap_confdir VM params - something like this:

java -DzkHost=<zk_connection_str> -Dcollection.configName=edr_sms_conf -Dbootstrap_confdir=<path_to_solr_config_dir> -jar start.jar

I prefer 1) for several reasons. Regards, Per Steffensen

On 1/24/13 4:02 PM, Fadi Mohsen wrote: Hi, We would like to use Solr to index statistics from any Java module in our production environment. Applications have to be able to create collections and index data on demand, so my initial thought is to use different HTTP methods to create a collection in the cluster and then right away start HTTP POSTing documents, but the issue here is the schema.xml. Is it possible to HTTP POST the schema via Solr to ZooKeeper? Or do I have to know about another service host/IP than Solr, such as ZooKeeper? (I wanted to understand whether there is a way to avoid knowing about ZooKeeper in production.) This must be a duplicate of another question, excuse me in advance. Regards Fadi
Re: Does solr 4.1 support field compression?
Doh! I went straight for the release notes. Thanks, this is the feature I was waiting for :) Ken On Thu, Jan 24, 2013 at 10:49 AM, André Widhani andre.widh...@digicol.de wrote: This is what is listed under the Highlights on the Apache page announcing the Solr 4.1 release: The default codec incorporates an efficient compressed stored fields implementation that compresses chunks of documents together with LZ4. (see http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene) André From: Rafał Kuć [r@solr.pl] Sent: Thursday, 24 January 2013 16:45 To: solr-user@lucene.apache.org Subject: Re: Does solr 4.1 support field compression? Hello! It should be turned on by default, because the stored fields compression is the behavior of the default Lucene 4.1 codec. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Hi everyone, I didn't see any mention of field compression in the release notes for Solr 4.1. Did the ability to automatically compress fields end up getting added to this release? Thanks!, Ken
Re: Does solr 4.1 support field compression?
On 1/24/2013 8:42 AM, Ken Prows wrote: I didn't see any mention of field compression in the release notes for Solr 4.1. Did the ability to automatically compress fields end up getting added to this release? The concept of compressed fields (an option in schema.xml) that existed in the 1.x versions of Solr (based on Lucene 2.9) was removed in Lucene 3.0. Because Lucene and Solr development were combined, the Solr version after 1.4.1 is 3.1.0; there is no 1.5 or 2.x version of Solr. Solr/Lucene 4.1 compresses all stored field data by default. I don't think there's a way to turn it off at the moment, which is causing performance problems for a small subset of Solr users. When it comes out, Solr 4.2 will also have compressed term vectors. The release note contains this text: Stored fields are compressed. (See http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene) It looks like the solr CHANGES.txt file fails to specifically mention LUCENE-4226 https://issues.apache.org/jira/browse/LUCENE-4226 which implemented compressed stored fields. Thanks, Shawn
Re: Submit schema definition using curl via SOLR
On 1/24/13 4:51 PM, Per Steffensen wrote: 2) or You can have an Solr node (server) load a Solr config into ZK during startup by adding collection.configName and bootstrap_confdir VM params - something like this java -DzkHost=zk_connection_str -Dcollection.configName=logical_solr_config_name -Dbootstrap_confdir=path_to_solr_config_dir -jar start.jar Well logical_solr_config_name instead of edr_sms_conf, of course
Re: Starting instances with multiple collections
Each node needs a -Dsolr.solr.home pointing to a solr.xml, but the configuration-subfolder does not need to be there. It only needs to be there for the node you start with -Dbootstrap_confdir (to have it load the config into ZK). The next time you start this Solr you do not need to provide -Dbootstrap_confdir, since config is already loaded into ZK (well unless you run your ZK embedded in the Solr - in this case I believe all ZK state is removed when you close the Solr, but that is also just for playing) In general, IMHO, using a Solr node to load a configuration during startup is only for playing. You ought to load configs into ZK as a separate operation from starting Solrs (and creating collections for that matter). Also see recent mail-list dialog Submit schema definition using curl via SOLR Regards, Per Steffensen On 1/23/13 11:12 PM, Walter Underwood wrote: I can get one Solr 4.1 instance up with the config bootstrapped into Zookeeper. In zk I see two configs, two collections, and I can run the DIH on the first node. I can get the other two nodes to start and sync if I give them a -Dsolr.solr.home pointing to a directory with a solr.xml and subdirectories with configuration for each collection. If I don't do that, they look for solr/solr.xml, then fail. But what is the point of putting configs in Zookeeper if each host needs a copy anyway? The wiki does not have an example of how to start a cluster with multiple collections. Am I missing something here? wunder -- Walter Underwood wun...@wunderwood.org
Re: Solr SQL Express Integrated Security - Unable to execute query
Shawn Heisey-4 wrote: There will be a lot more detail to this error. This detail may have a clue about what happened. Can you include the entire stacktrace? Thanks, Shawn

Thank you Shawn. The following is the entire stacktrace. I hope this helps:

INFO: Creating a connection for entity Product with URL: jdbc:sqlserver://localhost;instanceName=SQLEXPRESS;databaseName=Amazon;integratedSecurity=true;
Jan 23, 2013 3:26:05 PM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={command=status} status=0 QTime=1
Jan 23, 2013 3:26:31 PM org.apache.solr.common.SolrException log
SEVERE: Exception while processing: Product document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT [ProdID],[Descr] FROM [Amazon].[dbo].[Table_Temp] Processing Document # 1
  at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:252)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
  at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
  at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
  at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
  at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The server SQLEXPRESS is not configured to listen with TCP/IP.
  at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:171)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.getInstancePort(SQLServerConnection.java:3188)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.primaryPermissionCheck(SQLServerConnection.java:937)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:800)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:700)
  at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:842)
  at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:160)
  at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:362)
  at org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:239)
  ... 12 more
Jan 23, 2013 3:26:31 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [db] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=13 {deleteByQuery=*:*} 0 13
Jan 23, 2013 3:26:31 PM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT [ProdID],[Descr] FROM [Amazon].[dbo].[Table_Temp] Processing Document # 1
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT [ProdID],[Descr] FROM [Amazon].[dbo].[Table_Temp] Processing Document # 1
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
  at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
  ... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT [ProdID],[Descr] FROM [Amazon].[dbo].[Table_Temp]
Mahout - Solr vs Mahout Lucene Question
Hi, I hate to double post, but I'm not sure in which domain the answer to my question lies, so here's the link to my question on the Mahout groups. Basically, I'm getting different clustering results depending on whether I index data with Solr or Lucene. Please post any responses against the original question. Thanks http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-td4036013.html -- View this message in context: http://lucene.472066.n3.nabble.com/Mahout-Solr-vs-Mahout-Lucene-Question-tp4036014.html Sent from the Solr - User mailing list archive at Nabble.com.
Deletion from database
Hi, I am trying to figure out a way so that if I delete anything from my database, that item will also be deleted from my indexed data. Is there any way I can make a new core with the same config as the existing core, do a full index, swap the data with the existing core, and delete the new core? So every time I delete anything from the database, it creates a new core, indexes data, swaps it, and then deletes the new core? -- View this message in context: http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Deletion from database
The general solution is to add a deleted column to your database, or even a deleted date column. When you update Solr from the DB, issue a delete for each item deleted since the last successful update. You can delete those rows after the Solr update or to be extra safe, delete them a few days later. For this to work, you must not re-use IDs. wunder On Jan 24, 2013, at 10:05 AM, hassancrowdc wrote: Hi, I am trying to figure out a way so that if i delete anything from my database how will that item be deleted from my indexed data? is there anyway i can make new core with same config as the existing core, do full index, swap the data with the existing core and delete the new core. So every time i delete anything from database, it creates a new core, index data, swap it and then delete the new core(that was made)? -- View this message in context: http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018.html Sent from the Solr - User mailing list archive at Nabble.com.
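A hedged sketch of the delete step described above, assuming an items table with a deleted_date column and a Solr uniqueKey equal to the DB primary key (all names and the JDBC URL are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PropagateDeletes {
    public static void main(String[] args) throws Exception {
        // Timestamp of the last successful Solr update, read from wherever you track it.
        Timestamp lastRun = Timestamp.valueOf("2013-01-24 00:00:00");
        Connection db = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
        PreparedStatement ps = db.prepareStatement("SELECT id FROM items WHERE deleted_date > ?");
        ps.setTimestamp(1, lastRun);
        List<String> ids = new ArrayList<String>();
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            ids.add(rs.getString("id")); // rows deleted since the last run
        }
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        if (!ids.isEmpty()) {
            solr.deleteById(ids); // remove the corresponding Solr documents
            solr.commit();
        }
    }
}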
Re: zookeeper config
On Jan 24, 2013, at 7:05 AM, Shawn Heisey s...@elyograg.org wrote: My experience has been that you put the chroot at the very end, not on every host entry Yup - this came up on the mailing list not too long ago and it's currently correctly documented on the SolrCloud wiki. - Mark
Re: Solr 4.1.0 shardHandlerFactory Null Pointer Exception when setting up embedded solrj solr server for unit testing
This is my fault - I discovered this myself a few days ago. I've been meaning to file a jira ticket and have not gotten around to it yet. You can also work around it like this:

CoreContainer container = new CoreContainer(loader) {
  // workaround since we don't call container#load
  {
    initShardHandler(null);
  }
};

- Mark

On Jan 24, 2013, at 9:22 AM, Ted Merchant ted.merch...@cision.com wrote: We recently updated from Solr 4.0.0 to Solr 4.1.0. Because of the change we were forced to upgrade a custom query parser. While the code change itself was minimal, we found that our unit tests stopped working because of a NullPointerException on line 181 of handler.component.SearchHandler: ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler(); We determined that the cause of this exception was that shardHandlerFactory was never initialized in the Solr container. The reason for this seems to be that the shard handler is set up in core.CoreContainer::initShardHandler, which is called from core.CoreContainer::load. When setting up the core container we were using the public CoreContainer(SolrResourceLoader loader) constructor. This constructor never calls the load method, so initShardHandler is never called and the shardHandler is never initialized. In Solr 4.0.0 the shardHandler was initialized on the calling of getShardHandlerFactory. This code was modified and moved by revision 1422728: SOLR-4204: Make SolrCloud tests more friendly to FreeBSD blackhole 2 environments. We fixed our issue by using the public CoreContainer(String dir, File configFile) constructor, which calls the load method. I just wanted to make sure that people were aware of this issue and to determine if it really is an issue, or if having the shardHandler be null was expected behavior unless someone called the load(String dir, File configFile) method.
Thank you, Ted Stack trace of error: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) at com.cision.search.solr.ProximityQParserTest.testInit(ProximityQParserTest.java:72) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException at
Re: Solr SQL Express Integrated Security - Unable to execute query
On Thu, Jan 24, 2013 at 11:34 AM, O. Olson olson_...@yahoo.it wrote: Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The server SQLEXPRESS is not configured to listen with TCP/IP. That's probably your problem... Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game
Re: Submit schema definition using curl via SOLR
On Jan 24, 2013, at 10:02 AM, Fadi Mohsen fadi.moh...@gmail.com wrote: Hi, We would like to use Solr to index statistics from any Java module in our production environment. Applications have to be able to create collections and index data on demand, so my initial thought is to use different HTTP methods to create a collection in the cluster and then right away start HTTP POSTing documents, but the issue here is the schema.xml. Is it possible to HTTP POST the schema via Solr to ZooKeeper? I've done some work towards this at https://issues.apache.org/jira/browse/SOLR-4193 Or do I have to know about another service host/IP than Solr, such as ZooKeeper? (I wanted to understand whether there is a way to avoid knowing about ZooKeeper in production.) I wouldn't try to avoid it - it's probably simpler to deal with than you think. It's also pretty easy to use http://wiki.apache.org/solr/SolrCloud#Command_Line_Util to upload a new schema.xml - then just the Collections API reload command. Two lines in a script. - Mark
Re: Deletion from database
OK, how can I issue a delete for each item deleted since the last successful update? Do I write something like a delete query with the delta import query in the data config? If so, what would I add in the data config for deletion? -- View this message in context: http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018p4036026.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Deletion from database
This post on stackoverflow has a good run-down on your options: http://stackoverflow.com/questions/1555610/solr-dih-how-to-handle-deleted-documents/1557604#1557604 If you're using DIH, you can get more information from: http://wiki.apache.org/solr/DataImportHandler The easiest thing, if using a delta import, is to add deletedPkQuery on your entity like this:

<entity name="..." query="..." deltaQuery="..." deltaImportQuery="..." deletedPkQuery="SELECT ID FROM MY_TABLE WHERE DELETED='Y'" />

Another approach is to have a second top-level entity that uses the special command:

<entity name="Deletes" query="SELECT ID AS '$deleteDocById' FROM MY_TABLE WHERE DELETED='Y'" />

This second approach works if you use DIH but do delta updates using the approach described here: http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-----Original Message----- From: hassancrowdc [mailto:hassancrowdc...@gmail.com] Sent: Thursday, January 24, 2013 12:19 PM To: solr-user@lucene.apache.org Subject: Re: Deletion from database

OK, how can I issue a delete for each item deleted since the last successful update? Do I write something like a delete query with the delta import query in the data config? If so, what would I add in the data config for deletion? -- View this message in context: http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018p4036026.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr SQL Express Integrated Security - Unable to execute query
Michael Della Bitta-2 wrote: On Thu, Jan 24, 2013 at 11:34 AM, O. Olson olson_...@yahoo.it wrote: Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The server SQLEXPRESS is not configured to listen with TCP/IP. That's probably your problem... Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game

Good call Michael. I did have to enable TCP (http://msdn.microsoft.com/en-us/library/hh231672.aspx for others who have the same problem), but I still did not get this to work. I then tested my driver, JDBC URL, and SQL query in a plain old Java class. This showed me that it was almost impossible to get integrated authentication to work in Java. I finally went with specifying the username and password literally. (I hope this is useful to others):

public static void main(String[] args) throws Exception {
    String url = "jdbc:sqlserver://localhost\\SQLEXPRESS;database=Amazon;user=solrusr;password=solrusr;";
    String driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
    Connection connection = null;
    try {
        System.out.println("Loading driver...");
        Class.forName(driver);
        System.out.println("Driver loaded! Attempting Connection ...");
        connection = DriverManager.getConnection(url);
        System.out.println("Connection succeeded!");
        ResultSet RS = connection.createStatement().executeQuery("SELECT ProdID, Descr FROM Table_Temp");
        try {
            while (RS.next() != false) {
                System.out.println(RS.getString(1) + " " + RS.getString(2));
            }
        } finally {
            RS.close();
        }
        // Success.
    } catch (SQLException e) {
    } finally {
        if (connection != null) try { connection.close(); } catch (SQLException ignore) {}
    }
}

Hence, I modified my db-data-config.xml to:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost\SQLEXPRESS;databaseName=Amazon;user=solrusr;password=solrusr;"/>
  <document>
    <entity name="Product" query="SELECT ProdID,Descr FROM Table_Temp">
      <field column="ProdID" name="ProdID"/>
      <field column="Descr" name="Descr"/>
    </entity>
  </document>
</dataConfig>

This worked for me. Thanks again Michael and Shawn. O. O. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-SQL-Express-Integrated-Security-Unable-to-execute-query-tp4035758p4036056.html Sent from the Solr - User mailing list archive at Nabble.com.
PK uniqueness aware Solr index merging?
We have a Hadoop process that produces a set of Solr indexes from a cluster of HBase documents. After the job runs, we pull the indexes from HDFS and merge them together locally. The issue we're running into is that sometimes we'll have duplicate occurrences of a primary key across indexes that we'll want merged out. For example, a set of directories with: ./dir00/ doc_id=0 PK=1 ./dir01/ doc_id=0 PK=1 should merge into a Solr index containing a single document rather than one with two Lucene documents each containing PK=1. The Lucene-level merge code -- i.e. oal.index.SegmentMerger.merge() -- doesn't know about the Solr schema, so it will merge these two directories into two duplicate documents. It doesn't appear that Solr's oas.handler.admin.CoreAdminHandler.handleMergeAction(SolrQueryRequest, SolrQueryResponse) handles this either, as it ends up passing the list of merge directories to oal.index.IndexWriter.addIndexes(IndexReader...) via oas.update.DirectUpdateHandler2.mergeIndexes(MergeIndexesCommand). So, if I want to merge multiple Solr directories in a way that respects primary key uniqueness, is there any more efficient manner than re-adding all of the documents in each directory to a new Solr index to avoid PK duplicates? Thanks. --Gregg Gregg Donovan Senior Software Engineer, Etsy.com gr...@etsy.com
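Short of a schema-aware Lucene merge, one workable (slower) pattern is the re-add approach mentioned above: push documents from each shard output through a Solr core and let uniqueKey overwrite semantics do the dedupe. A sketch, assuming every field is stored so documents can be reconstructed and that last-added-wins is acceptable:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class ReAddMerge {
    public static void main(String[] args) throws Exception {
        // Assumed core URLs; "merged" shares the schema, with PK as uniqueKey.
        HttpSolrServer target = new HttpSolrServer("http://localhost:8983/solr/merged");
        String[] sources = { "http://localhost:8983/solr/dir00",
                             "http://localhost:8983/solr/dir01" };
        int rows = 500;
        for (String url : sources) {
            HttpSolrServer source = new HttpSolrServer(url);
            for (int start = 0; ; start += rows) {
                SolrDocumentList page =
                    source.query(new SolrQuery("*:*").setStart(start).setRows(rows)).getResults();
                if (page.isEmpty()) break;
                for (SolrDocument doc : page) {
                    SolrInputDocument in = new SolrInputDocument();
                    for (String f : doc.getFieldNames()) {
                        in.addField(f, doc.getFieldValue(f));
                    }
                    // Re-adding through Solr overwrites any earlier doc with the same PK.
                    target.add(in);
                }
            }
        }
        target.commit();
    }
}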
indexVersion returns multiple results when called
Hi, We have 5 core masters and 5 core slaves. The main core houses about 85,000 documents, so small, although the content of each document is quite large. The second core holds the same number of docs but far less - and different - data. We reindex all cores every morning and the replication poll is 5 minutes. The main core takes 15 minutes to reindex (optimize). At some point, an incomplete index is picked up by the slave and our web site disappears until the optimize takes place. I know we could increase the poll to 30 minutes, but that would be no guarantee. We thought we'd solve it by writing a script to get the indexversion, kick off reindexing, and periodically check the current indexversion against the first - if the same, sleep for 2 minutes and then check again. Once they're different, do a fetchIndex from the slave. This works on all the cores except the main one. We get a different indexversion after two minutes, the slave gets populated with an almost empty index, and the site is out! All the other cores exhibit the same indexversion. What have we misunderstood or got wrong? Regards, David Q -- View this message in context: http://lucene.472066.n3.nabble.com/indexVersion-returns-multiple-results-when-called-tp4036046.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Submit schema definition using curl via SOLR
Thanks Per, would the first approach involve restarting Solr? Thanks Mark, that's great, I'll try to check out and apply the patches from the ticket to understand further. The reasons we would like to avoid ZooKeeper are:

* due to lack of knowledge.
* the amount of work/scripting for developers per module and release documentation.
* the extra steps of patching ZK nodes for QA and operations.

ZkCLI is a nice tool, but then instead of interacting with one service over HTTP, the application needs:

* extra jar files
* to know the ZK hostname/IP and port (different in each dev/qa/systest/accept/production environment), which per module is one configuration step too many.

On Thu, Jan 24, 2013 at 7:18 PM, Mark Miller markrmil...@gmail.com wrote: On Jan 24, 2013, at 10:02 AM, Fadi Mohsen fadi.moh...@gmail.com wrote: Hi, We would like to use Solr to index statistics from any Java module in our production environment. Applications have to be able to create collections and index data on demand, so my initial thought is to use different HTTP methods to create a collection in the cluster and then right away start HTTP POSTing documents, but the issue here is the schema.xml. Is it possible to HTTP POST the schema via Solr to ZooKeeper? I've done some work towards this at https://issues.apache.org/jira/browse/SOLR-4193 Or do I have to know about another service host/IP than Solr, such as ZooKeeper? (I wanted to understand whether there is a way to avoid knowing about ZooKeeper in production.) I wouldn't try to avoid it - it's probably simpler to deal with than you think. It's also pretty easy to use http://wiki.apache.org/solr/SolrCloud#Command_Line_Util to upload a new schema.xml - then just the Collections API reload command. Two lines in a script. - Mark
AW: Does solr 4.1 support field compression?
These are the figures I got after indexing 4 and a half million documents with both Solr 3.6.1 and 4.1.0 (and optimizing the index at the end).

$ du -h --max-depth=1
67G ./solr410
80G ./solr361

The main contributor to the reduced space consumption is (as expected, I guess) the .fdt file:

$ ls -lh solr361/*/*/*.fdt
29G solr361/core-tex68bohyrh23qs192adaq-index361/index/_bab.fdt
$ ls -lh solr410/*/*/*.fdt
18G solr410/core-tex68bohyz1teef3xsjdaw-index410/index/_23uy.fdt

It depends of course on your individual ratio of stored versus indexed-only fields. André

From: Shawn Heisey [s...@elyograg.org] Sent: Thursday, 24 January 2013 16:58 To: solr-user@lucene.apache.org Subject: Re: Does solr 4.1 support field compression? On 1/24/2013 8:42 AM, Ken Prows wrote: I didn't see any mention of field compression in the release notes for Solr 4.1. Did the ability to automatically compress fields end up getting added to this release? The concept of compressed fields (an option in schema.xml) that existed in the 1.x versions of Solr (based on Lucene 2.9) was removed in Lucene 3.0. Because Lucene and Solr development were combined, the Solr version after 1.4.1 is 3.1.0; there is no 1.5 or 2.x version of Solr. Solr/Lucene 4.1 compresses all stored field data by default. I don't think there's a way to turn it off at the moment, which is causing performance problems for a small subset of Solr users. When it comes out, Solr 4.2 will also have compressed term vectors. The release note contains this text: Stored fields are compressed. (See http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene) It looks like the solr CHANGES.txt file fails to specifically mention LUCENE-4226 https://issues.apache.org/jira/browse/LUCENE-4226 which implemented compressed stored fields. Thanks, Shawn
RE: Sorting on Score Problem
Hi Hoss, Thanks for the reply. Unfortunately we have other customized similarity classes that I don't know how to disable while still making the query work. I will try to attach more information once I work out how to simplify the issue. Thanks Ben From: Chris Hostetter [hossman_luc...@fucit.org] Sent: Thursday, January 24, 2013 12:34 PM To: solr-user@lucene.apache.org Subject: Re: Sorting on Score Problem : We met a weird problem in our project when sorting by score in Solr 4.0, : the biggest score document is not at the top; the debug explanations from : Solr are like this, that's weird ... can you post the full debugQuery output of an example query showing the problem, using echoParams=all&fl=id,score (or whatever unique key field you have) also: can you elaborate whether you are using a single node setup or a distributed (ie: SolrCloud) query? : Then we thought it could be a float rounding problem, so we implemented : our own similarity class to increase queryNorm by 10,000, and it changes : the score scale but the rank is still wrong. when you post the detailed request above, please don't use your custom similarity (just the out of the box solr code) so there's one less variable in the equation. -Hoss
Re: Submit schema definition using curl via SOLR
On Jan 24, 2013, at 5:22 PM, Fadi Mohsen fadi.moh...@gmail.com wrote: The reasons we would like to avoid ZooKeeper are * due to lack of knowledge. * the amount of work/scripting for developers per module and release documentation. * the extra steps of patching ZK nodes for QA and operations. ZkCLI is a nice tool, but then instead of interacting with one service over HTTP, the application needs: * extra jar files We should address this I think - it really shouldn't require any more than the SolrJ jars. Currently it also requires the core jars. Still not as minimal as just curl posting, I know. Testing and reporting on the issue I posted, as well as discussion around expanding it, will likely help push those features forward. - Mark
Re: solr parsed query dropping special chars
: When I search for these characters in the admin query, I can only find the Greeks. : debug shows the parsed query only has greek chars like omega, delta, sigma : but does not contain others like degree, angle, cent, bullet, less_equal… this is most likely because of the analyzer you are using for your text field, an assumption which can be verified using the Analysis tool in the admin UI to see how the various pieces of your query analyzer deal with the input. My guess is you are using a tokenizer which ignores punctuation. Don't forget to check your index analyzer as well -- you may not even be indexing these punctuation symbols either... : the response dumps the document and shows me the chars exist in the document.. : <str>angle (∠)</str> ...that's the stored value, the *indexed* text may not contain those terms. -Hoss
Re: Solr load balancer
: For example perhaps a load balancer that sends multiple queries : concurrently to all/some replicas and only keeps the first response : might be effective. Or maybe a load balancer which takes account of the I know of other distributed query systems that use this approach when query speed is more important to people than load, and people who use them seem to think it works well. given that it synthetically multiplies the load of each end user request, it's probably not something we'd want to turn on by default, but a configurable option certainly seems like it might be handy. -Hoss
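A minimal sketch of the keep-the-first-response idea in SolrJ terms, assuming Solr 4.x's HttpSolrServer and a caller-supplied replica list - invokeAny returns the first attempt that succeeds and cancels the rest:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FirstResponseBalancer {
    // Fires the same query at every replica concurrently and keeps the winner.
    public static QueryResponse query(List<String> replicaUrls, final SolrQuery q)
            throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<Callable<QueryResponse>> attempts = new ArrayList<Callable<QueryResponse>>();
            for (final String url : replicaUrls) {
                attempts.add(new Callable<QueryResponse>() {
                    public QueryResponse call() throws Exception {
                        return new HttpSolrServer(url).query(q);
                    }
                });
            }
            return pool.invokeAny(attempts); // first successful response wins
        } finally {
            pool.shutdownNow();
        }
    }
}

As noted above, this trades extra cluster load for lower tail latency, so it would want to be opt-in rather than the default.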
Re: Search strategy - improving search quality for short search terms such as doll
: My next target is searches on simple terms such as doll which, in google, : would return documents about, well, toy dolls, because that's the most : common usage of the simple term doll. But in my index it predominantly : returns documents about CDs with the song Doll Face, and My baby doll in : them. if you have good metadata about your documents, then you might get satisfying results using something like the edismax parser with appropriate weights on various fields -- you could for example say that matching on the product_title field is important, but matching on a category_name is much more important, and thus use something like... q=doll&qf=product_title^5+category_name^50 ..but that only helps you if you have category_name values that match the words people are searching for, like Doll. This type of approach doesn't help you in the case where you might have the inverse problem: document (category_name=doll, product_name=My baby) showing up first when a user searches for my baby doll but the user is really trying to find the document (category_name=cd, product_name=my baby doll) it really all depends on your user base and the type of queries you expect. An interesting solution to this problem that i've seen is to pre-process the query using a Bayesian classifier to suggest which categories to boost on. Here's a blog on this where the classifier was trained based on the keywords and categories of the documents... http://engineering.wayfair.com/better-lucenesolr-searches-with-a-boost-from-an-external-naive-bayes-classifier/ ...but you could also train the classifier using query logs and data about what documents users ultimately clicked on (to help you learn that for your userbase, people who search for baby are typically looking for CDs not dolls -- or vice versa) : I'm not directly asking how to solve this as much as I'm asking what : direction I should be looking in to learn what I need to know to tackle the : general issue myself. : Left on my own I would start looking at categorizing the CDs into a facet : called music, reasonably doable in my dataset. Then I need to reduce the : boost-value of the entire facet/category of music unless certain pre-defined : query terms exist, such as [music, cd, song, listen, dvd, analyze actual : user queries to come up with a more exhaustive list, etc.]. : I don't yet know how to do all of this, but after a couple more good books I : should be dangerous. : So the question to this list: : - Am I on the right track here? If not, can you point me in a : direction to go? -Hoss
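To make the classifier-boost idea concrete, one hypothetical way to feed its prediction back into edismax is a boost query on the predicted category (the field and category names here are placeholders):

q=my+baby+doll&defType=edismax&qf=product_title^5&bq=category_name:cd^50

The classifier runs before Solr does, and its best-guess category only nudges the ranking via bq rather than filtering anything out.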
RE: solr parsed query dropping special chars
Thanks for the education Chris, I pasted the chars into the Index and Query fields on the analyzer panel. The Index/Query analyzers are almost the same; on both, the non-Greeks drop out after WordDelimiterFilter. The Index analyzer shows a grey background on words that seem to make it through all the filters.

WhitespaceTokenizerFactory - ∠ ψ Σ • ≤ ≠ • ≥ μ ω φ θ ¢ β √ Ω ° ± Δ #
SynonymFilterFactory (query only) - ditto
StopFilterFactory - ditto
WordDelimiterFilterFactory - ψ Σ μ ω φ θ β Ω Δ (now only Greeks)
LowerCaseFilterFactory - ψ σ μ ω φ θ β ω δ (lower-case Greeks only)
SnowballPorterFilterFactory - ψ σ μ ω φ θ β ω δ

So I'm thinking I need to change the WordDelimiterFilter properties {catenateWords=0, catenateNumbers=0, splitOnCaseChange=1, catenateAll=0, generateNumberParts=1, generateWordParts=1, splitOnNumerics=0}, or copy these strings into a different field name/type without the word delimiter; that way I wouldn't affect any of the ways existing text is being searched. Sound right? Allan Tegelberg

-Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, January 24, 2013 3:46 PM To: solr-user@lucene.apache.org Subject: Re: solr parsed query dropping special chars : When I search for these characters in the admin query, I can only find the Greeks. : debug shows the parsed query only has greek chars like omega, delta, sigma : but does not contain others like degree, angle, cent, bullet, less_equal… this is most likely because of the analyzer you are using for your text field, an assumption which can be verified using the Analysis tool in the admin UI to see how the various pieces of your query analyzer deal with the input. My guess is you are using a tokenizer which ignores punctuation. Don't forget to check your index analyzer as well -- you may not even be indexing these punctuation symbols either... : the response dumps the document and shows me the chars exist in the document.. : <str>angle (∠)</str> ...that's the stored value, the *indexed* text may not contain those terms. -Hoss
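The copy-field route could look something like this hypothetical schema.xml fragment (the field and type names are made up; the point is an analyzer chain with no WordDelimiterFilter, so the symbols survive):

<fieldType name="text_symbols" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="body_symbols" type="text_symbols" indexed="true" stored="false" multiValued="true"/>
<copyField source="body" dest="body_symbols"/>

Searches that need the special characters then target body_symbols, leaving the behavior of the existing field untouched.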
JSON query syntax
Although lucene syntax tends to be quite concise, nice looking, and easy to build by hand (the web browser is a major debugging tool for me), some people prefer to use a more structured query language that's easier to build up programmatically. XML fits the bill, but people tend to prefer JSON these days. Hence my first quick prototype: https://issues.apache.org/jira/browse/SOLR-4351 I'm pretty happy so far with how easily it's fit in with our QParser framework, which should generally allow parsers to not care about the underlying syntax of queries they need to deal with. For example: the join qparser uses the query specified by v, but doesn't care if it's in lucene syntax, or if it was part of the JSON.

{'join':{'from':'qqq_s', 'to':'www_s', 'v':'id:10'}}
{'join':{'from':'qqq_s', 'to':'www_s', 'v':{'term':{'id':'10'}}}}

Note: replace the single quotes with double quotes before trying it out - these are just test strings that have the replacement done in the test code so that they are easier to read. There's a fair bit left to do of course... like how to deal with boost, cache, cost, parameter dereferencing, etc. Feedback welcome... and hopefully this will be good to go for 4.2 -Yonik http://lucidworks.com
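For anyone trying the examples directly, here they are with the quote replacement already applied (assuming the prototype syntax in SOLR-4351 is as posted):

{"join": {"from": "qqq_s", "to": "www_s", "v": "id:10"}}
{"join": {"from": "qqq_s", "to": "www_s", "v": {"term": {"id": "10"}}}}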
Re: JSON query syntax
Nice, Yonik! Here is one suggestion. OK, I'm begging you - please don't make it be as hard on the eyes as Local Params. :) I thought it was just me who could never get along with Local Params, but I've learned that a number of people find Local Params very hard to grok. Yes, this is JSON, so right there it may be better, but for instance I see v here which to a regular human may not be as nice as value, if that is what v stands for. Looking at examples from the JIRA issue {'frange':{'v':'mul(foo_i,2)', 'l':20, 'u':24}} v is value? mul is multiply? what's l? left? No, low(er)? what's u? Aha, upper? I'd rather use a few extra characters and be clear, easily memorable, and user friendly. People love ES's JSON API and I have never ever heard anyone say it's too verbose. Thanks, Otis On Thu, Jan 24, 2013 at 8:44 PM, Yonik Seeley yo...@lucidworks.com wrote: Although lucene syntax tends to be quite concise, nice looking, and easy to build by hand (the web browser is a major debugging tool for me), some people prefer to use a more structured query language that's easier to build up programmatically. XML fits the bill, but people tend to prefer JSON these days. Hence my first quick prototype: https://issues.apache.org/jira/browse/SOLR-4351 I'm pretty happy so far with how easily it's fit in with our QParser framework, which should generally allow parsers to not care about the underlying syntax of queries they need to deal with. For example: the join qparser uses the query specified by v, but doesn't care if it's in lucene syntax, or if it was part of the JSON. {'join':{'from':'qqq_s', 'to':'www_s', 'v':'id:10'}} {'join':{'from':'qqq_s', 'to':'www_s', 'v':{'term':{'id':'10'}}}} Note: replace the single quotes with double quotes before trying it out - these are just test strings that have the replacement done in the test code so that they are easier to read. There's a fair bit left to do of course... like how to deal with boost, cache, cost, parameter dereferencing, etc. Feedback welcome... and hopefully this will be good to go for 4.2 -Yonik http://lucidworks.com
Re: JSON query syntax
On Thu, Jan 24, 2013 at 8:55 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Yes, this is JSON, so right there it may be better, but for instance I see v here which to a regular human may not be as nice as value, if that is what v stands for. One goal was to reuse the parsers/parameter names. A completely disjoint set would certainly lead to confusion. Concise *common* abbreviations are fine I think - for example, we quickly get used to (and prefer) f(x) over function(variable1). We could add some aliases though. -Yonik http://lucidworks.com
Re: Get tokenized words in Solr Response
Hi Mikhail, Thanks for your guidance. I found the required information in debugQuery=on. Thanks and regards, Romita From: Mikhail Khludnev mkhlud...@griddynamics.com To: solr-user solr-user@lucene.apache.org, Date: 01/24/2013 03:19 PM Subject: Re: Get tokenized words in Solr Response Romita, IIRC you've already asked this, and I replied that everything you need is in the debugQuery=on output. That format is a little bit verbose, and I suppose you may have some difficulty finding the necessary info there. Please provide the debugQuery=on output, and I can try to highlight the necessary info for you. On Thu, Jan 24, 2013 at 6:11 AM, Romita Saha romita.s...@sg.panasonic.comwrote: Hi, I want the tokenized keywords to be displayed in the solr response. For example, my solr search could be Search this document named XYZ-123, and the tokenizer in schema.xml tokenizes the query as follows: search document xyz 123. I want to get these tokenized words in the Solr response. Is it possible? Thanks and regards, Romita -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
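For reference, the relevant piece of the debugQuery=on output is the parsedquery entry, which for the example query above would look roughly like this (the field name is an assumption; the exact terms depend on the analyzer chain):

<str name="parsedquery">text:search text:document text:xyz text:123</str>

Each term there is a post-analysis token, which is exactly the information being asked for.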
RE: SOLR 4 getting stuck during restart
Thanks James for the heads up, and apologies for a delayed response. Here are the full details about this issue. Mine is an e-commerce app, so the index contains the product catalog comprising roughly 13 million products. At this point I thought of the index-based dictionary as the best option for the Did You Mean functionality. I am not sure if everyone is facing this issue, but here is what I am observing as far as the dictionary is concerned.

Index-based dictionary
- I was building the dictionary using the following URL once I completed the full indexing. For the time being I have intentionally kept the buildOnCommit and buildOnOptimize options set to false, as I didn't want them to slow down the full indexing. http://localhost:8090/solr/select?rows=0&spellcheck=true&spellcheck.build=true&spellcheck.dictionary=jarowinkler
- Once I had created the dictionary, I hit the issue I described before when I tried to restart my Tomcat (I waited around 20 minutes; the restart didn't happen).
- When I removed the dictionary from the data folder, the server restart started working.
- I tried spellcheck.collation=false as you suggested, but it didn't help.

Direct spell checker
I have experimented with the new DirectSolrSpellChecker, which does not create a separate dictionary folder but rather builds the spellchecker into the main index itself. The results were exactly the same as before: I was getting stuck during the restarts. I think the traditional spellchecker would be better in this case, as you can remove the dictionary, restart, and move the dictionary back as and when required. In the case of DirectSolrSpellChecker there is no separate dictionary folder, so it is not clear what to remove from the index to let the server restart.

James, I will request you to validate this, and it will be a really great help if you can point out whether I am making any mistakes here. If you think what I am doing makes sense, I will go ahead and log this bug in JIRA. Thanks Vijesh K Nair -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-4-getting-stuck-during-restart-tp4034734p4036163.html Sent from the Solr - User mailing list archive at Nabble.com.
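For anyone reproducing this, a sketch of the index-based spellchecker config implied by that build URL - the parameter names are the standard SpellCheckComponent ones, but the source field and index directory here are assumptions:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">false</str>
    <str name="buildOnOptimize">false</str>
  </lst>
</searchComponent>

With buildOnCommit/buildOnOptimize off, the dictionary only exists after an explicit spellcheck.build=true request, which is what makes the remove-and-restart workaround possible.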
Re: Solr HTTP Replication Question
Okay, so after some debugging I found the problem. The replication piece will download the index from the master server and move the files to the index directory, but during the commit phase these older-generation files are deleted and the index is essentially left intact. I noticed that a full copy is triggered if the index is stale (meaning that files in common between the master and slave have different sizes), but I think a full copy should also be triggered if the slave's generation is higher than the master's. In short, to me it's not sufficient to simply say a full copy is needed if the slave's index version is >= the master's index version. I'll create a patch and file a bug along with a more thorough writeup of how I got into this state. Thanks! Amit On Thu, Jan 24, 2013 at 2:33 PM, Amit Nithian anith...@gmail.com wrote: Does Solr's replication look at the generation difference between master and slave when determining whether or not to replicate? To be more clear: What happens if a slave's generation is higher than the master's, yet the slave's index version is less than the master's index version? I looked at the source and didn't seem to see any reason why the generation matters other than fetching the file list from the master for a given generation. It's too wordy to explain how this happened so I'll go into details on that if anyone cares. Thanks! Amit
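In pseudocode terms, the check Amit is proposing would look something like this hypothetical sketch (not the actual SnapPuller code):

// Treat the slave as needing a fresh copy whenever it is "ahead" of the
// master in version OR generation, since its files can no longer be
// trusted to line up with the master's segment files.
boolean isFullCopyNeeded(long slaveVersion, long masterVersion,
                         long slaveGeneration, long masterGeneration) {
    return slaveVersion >= masterVersion || slaveGeneration > masterGeneration;
}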