Cache fails to warm after Replication Recovery in solr cloud
Hi! I have a custom cache set up in solrconfig.xml for a SolrCloud cluster in Kubernetes. Each node has Kubernetes persistence set up. After I execute a “delete pod” command to restart a node, it goes into replication recovery successfully, but my custom cache’s warm() method never gets called. Is this expected behavior? The events I observed are: 1. cache init() method called, 2. searcher created and registered, 3. replication recovery. Thanks! Li
Re: [EXTERNAL] Autoscaling simulation error
Thank you for creating the JIRA! Will follow On 12/19/19, 11:09 AM, "Andrzej Białecki" wrote: Hi, Thanks for the data. I see the problem now - it’s a bug in the simulator. I filed a Jira issue to track and fix it: SOLR-14122. > On 16 Dec 2019, at 19:13, Cao, Li wrote: > >> I am using solr 8.3.0 in cloud mode. I have collection level autoscaling policy and the collection name is “entity”. But when I run autoscaling simulation all the steps failed with this message: >> >> "error":{ >> "exception":"java.io.IOException: java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: Could not find collection : entity/shards", >> "suggestion":{ >> "type":"repair", >> "operation":{ >> "method":"POST", >> "path":"/c/entity/shards", >> "command":{"add-replica":{ >> "shard":"shard2", >> "node":"my_node:8983_solr", >> "type":"TLOG", >> "replicaInfo":null}}},
Re: [EXTERNAL] Re: "No value present" when set cluster policy for autoscaling in solr cloud mode
Thank you, Andrzej! I am going to try IN operand as a work around. On 12/19/19, 10:17 AM, "Andrzej Białecki" wrote: Hi, For some strange reason global tags (such as “cores”) don’t support the “nodeset” syntax. For “cores” the only supported attribute is “node”, and then you’re only allowed to use #ANY or a single specific node name (with optional “!" NOT operand), or a JSON array containing node names to indicate the IN operand. The Ref Guide indeed is not very clear on that… > On 17 Dec 2019, at 21:20, Cao, Li wrote: > > Hi! > > I am trying to add a cluster policy to a freshly built 8.3.0 cluster (no collection added). I got this error when adding such a cluster policy > > { "set-cluster-policy":[{"cores":"<3","nodeset":{"sysprop.rex.node.type":"tlog"}}]} > > Basically I want to limit the number of cores for certain machines with a special environmental variable value. > > But I got this error response: > > { > "responseHeader":{ >"status":400, >"QTime":144}, > "result":"failure", > "WARNING":"This response format is experimental. It is likely to change in the future.", > "error":{ >"metadata":[ > "error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject", > "root-error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject"], >"details":[{ >"set-cluster-policy":[{ >"cores":"<3", >"nodeset":{"sysprop.rex.node.type":"tlog"}}], > "errorMessages":["No value present"]}], >"msg":"Error in command payload", >"code":400}} > > However, this works: > > { "set-cluster-policy":[{"cores":"<3","node":"#ANY"}]} > > I read the autoscaling policy documentations and cannot figure out why. Could someone help me on this? > > Thanks! > > Li
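For reference, a minimal sketch of the workaround Andrzej describes: pass a JSON array of node names in "node" to get IN semantics for the "cores" rule. The node names below are hypothetical placeholders, not details from the thread.

  curl -X POST http://localhost:8983/solr/admin/autoscaling \
    -H 'Content-Type: application/json' \
    -d '{"set-cluster-policy": [{"cores": "<3", "node": ["tlog-node-1:8983_solr", "tlog-node-2:8983_solr"]}]}'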
"No value present" when set cluster policy for autoscaling in solr cloud mode
Hi! I am trying to add a cluster policy to a freshly built 8.3.0 cluster (no collection added). I got this error when adding such a cluster policy { "set-cluster-policy":[{"cores":"<3","nodeset":{"sysprop.rex.node.type":"tlog"}}]} Basically I want to limit the number of cores for certain machines with a special environmental variable value. But I got this error response: { "responseHeader":{ "status":400, "QTime":144}, "result":"failure", "WARNING":"This response format is experimental. It is likely to change in the future.", "error":{ "metadata":[ "error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject", "root-error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject"], "details":[{ "set-cluster-policy":[{ "cores":"<3", "nodeset":{"sysprop.rex.node.type":"tlog"}}], "errorMessages":["No value present"]}], "msg":"Error in command payload", "code":400}} However, this works: { "set-cluster-policy":[{"cores":"<3","node":"#ANY"}]} I read the autoscaling policy documentations and cannot figure out why. Could someone help me on this? Thanks! Li
Re: [EXTERNAL] Re: Autoscaling simulation error
Hi Andrzej , I have put the JSONs produced by "save" commands below: autoscalingState.json - https://pastebin.com/CrR0TdLf clusterState.json - https://pastebin.com/zxuYAMux nodeState.json https://pastebin.com/hxqjVUfV statistics.json https://pastebin.com/Jkaw8Y3j The simulate command is: /opt/solr-8.3.0/bin/solr autoscaling -a policy2.json -simulate -zkHost rexcloud-swoods-zookeeper-headless:2181 Policy2 can be found here: https://pastebin.com/VriJ27DE Setup: 12 nodes on Kubernetes. 6 for TLOG and 6 for Pull. The simulation is run on one of nodes inside Kubernetes because it needs the zookeeper inside the Kubernetes. Thanks! Li On 12/15/19, 5:13 PM, "Andrzej Białecki" wrote: Could you please provide the exact command-line? It would also help if you could provide an autoscaling snapshot of the cluster (bin/solr autoscaling -save ) or at least the autoscaling diagnostic info. (Please note that the mailing list removes all attachments, so just provide a link to the snapshot). > On 15 Dec 2019, at 18:42, Cao, Li wrote: > > Hi! > > I am using solr 8.3.0 in cloud mode. I have collection level autoscaling policy and the collection name is “entity”. But when I run autoscaling simulation all the steps failed with this message: > >"error":{ > "exception":"java.io.IOException: java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: Could not find collection : entity/shards", > "suggestion":{ >"type":"repair", >"operation":{ > "method":"POST", > "path":"/c/entity/shards", > "command":{"add-replica":{ > "shard":"shard2", > "node":"my_node:8983_solr", > "type":"TLOG", > "replicaInfo":null}}}, > > Does anyone know how to fix this? Is this a bug? > > Thanks! > > Li
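For reference, a hedged sketch of producing the snapshot requested above with the -save option; the output directory is a hypothetical placeholder, and the exact option syntax should be checked against the tool's usage output.

  /opt/solr-8.3.0/bin/solr autoscaling -save /tmp/autoscaling-snapshot \
    -zkHost rexcloud-swoods-zookeeper-headless:2181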
Autoscaling simulation error
Hi! I am using solr 8.3.0 in cloud mode. I have collection level autoscaling policy and the collection name is “entity”. But when I run autoscaling simulation all the steps failed with this message: "error":{ "exception":"java.io.IOException: java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: Could not find collection : entity/shards", "suggestion":{ "type":"repair", "operation":{ "method":"POST", "path":"/c/entity/shards", "command":{"add-replica":{ "shard":"shard2", "node":"my_node:8983_solr", "type":"TLOG", "replicaInfo":null}}}, Does anyone know how to fix this? Is this a bug? Thanks! Li
Re: what's in cursorMark
Hi, Did you just do base64 decoding? Thanks, Yi On 10/1/18, 9:41 AM, "Vincenzo D'Amore" wrote: Hi Yi, have you tried to decode the string? AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3 seems to be only: ? favoritePlace/f85333c1-c444-4cfb-afd7-37281a07b0f7 On Mon, Oct 1, 2018 at 3:37 PM Li, Yi wrote: > Hi, > > cursorMark appears as something like > AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3 > > and the documentation says it is a “Base64 encoded serialized representation of > the sort values encapsulated by this object” > > I'd like to know if I can decode it and what content I will see in there. > > For example, if there is an object as JSON: > { > “id”:”123”, > “name”:”objectname”, > “secret”:”my secret” > } > if I search id:123, and only that object is returned with a cursorMark, will > I be able to decode the cursorMark and get that secret? > > Thanks, > Yi > -- Vincenzo D'Amore
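For reference, decoding the token is a one-liner; a small sketch (the decode flag is -d on GNU base64 and -D on older macOS versions):

  echo 'AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3' | base64 -d | xxd

The decoded bytes are the serialized sort values plus a few non-printable framing bytes, which matches what Vincenzo shows above. Because only the sort values are encoded, a stored field such as “secret” would not appear unless it were part of the sort.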
what's in cursorMark
Hi, cursorMark appears as something like AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3 and the documentation says it is a “Base64 encoded serialized representation of the sort values encapsulated by this object”. I'd like to know if I can decode it and what content I will see in there. For example, if there is an object as JSON: { “id”:”123”, “name”:”objectname”, “secret”:”my secret” } and I search id:123, and only that object is returned with a cursorMark, will I be able to decode the cursorMark and get that secret? Thanks, Yi
Running Solr 5.3.1 with JDK10
Hi, Currently we are running Solr 5.3.1 with JDK8 and we are trying to run Solr 5.3.1 with JDK10. Initially we got a few errors complaining some JVM options are removed since JDK9. We removed those options in solr.in.sh: UseConcMarkSweepGC UseParNewGC PrintHeapAtGC PrintGCDateStamps PrintGCTimeStamps PrintTenuringDistribution PrintGCApplicationStoppedTime And the options left in solr.in.sh: 1. Enable verbose GC logging GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails" 1. These GC settings have shown to work well for a number of common Solr workloads GC_TUNE="-XX:NewRatio=3 \ -XX:SurvivorRatio=4 \ -XX:TargetSurvivorRatio=90 \ -XX:MaxTenuringThreshold=8 \ -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \ -XX:+CMSScavengeBeforeRemark \ -XX:PretenureSizeThreshold=64m \ -XX:+UseCMSInitiatingOccupancyOnly \ -XX:CMSInitiatingOccupancyFraction=50 \ -XX:CMSMaxAbortablePrecleanTime=6000 \ -XX:+CMSParallelRemarkEnabled \ -XX:+ParallelRefProcEnabled" After that SOLR runs but it got an error on SystemInfoHandler Error getting JMX properties. [root@centos6 logs]# service solr status Found 1 Solr nodes: Solr process 4630 running on port 8983 ERROR: Failed to get system information from http://localhost:8983/solr due to: java.lang.NullPointerException Can someone share experience using Solr 5.3.x with JDK9 or above? Thanks, Yi P.S. Console output: [0.001s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:/solr/logs/solr_gc.log instead. [0.001s][warning][gc] -XX:+PrintGCDetails is deprecated. Will use -Xlog:gc* instead. [0.003s][info ][gc] Using Serial WARNING: System properties and/or JVM args set. Consider using --dry-run or --exec 0 INFO (main) [ ] o.e.j.u.log Logging initialized @532ms 205 INFO (main) [ ] o.e.j.s.Server jetty-9.2.11.v20150529 218 WARN (main) [ ] o.e.j.s.h.RequestLogHandler !RequestLog 220 INFO (main) [ ] o.e.j.d.p.ScanningAppProvider Deployment monitor file:/home/solr/solr-5.3.1/server/contexts/ at interval 0 559 INFO (main) [ ] o.e.j.w.StandardDescriptorProcessor NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet 569 WARN (main) [ ] o.e.j.s.SecurityHandler ServletContext@o.e.j.w.WebAppContext@1a75e76a {/solr,file:/home/solr/solr-5.3.1/server/solr-webapp/webapp/,STARTING} {/home/solr/solr-5.3.1/server/solr-webapp/webapp} has uncovered http methods for path: / 577 INFO (main) [ ] o.a.s.s.SolrDispatchFilter SolrDispatchFilter.init(): WebAppClassLoader=1904783235@7188af83 625 INFO (main) [ ] o.a.s.c.SolrResourceLoader JNDI not configured for solr (NoInitialContextEx) 626 INFO (main) [ ] o.a.s.c.SolrResourceLoader using system property solr.solr.home: /solr/data 627 INFO (main) [ ] o.a.s.c.SolrResourceLoader new SolrResourceLoader for directory: '/solr/data/' 750 INFO (main) [ ] o.a.s.c.SolrXmlConfig Loading container configuration from /solr/data/solr.xml 817 INFO (main) [ ] o.a.s.c.CoresLocator Config-defined core root directory: /solr/data [1.402s][info ][gc] GC(0) Pause Full (Metadata GC Threshold) 85M->7M(490M) 37.281ms 875 INFO (main) [ ] o.a.s.c.CoreContainer New CoreContainer 1193398802 875 INFO (main) [ ] o.a.s.c.CoreContainer Loading cores into CoreContainer [instanceDir=/solr/data/] 875 INFO (main) [ ] o.a.s.c.CoreContainer loading shared library: /solr/data/lib 875 WARN (main) [ ] o.a.s.c.SolrResourceLoader Can't find (or read) directory to add to classloader: lib (resolved as: /solr/data/lib). 
889 INFO (main) [ ] o.a.s.h.c.HttpShardHandlerFactory created with socketTimeout : 60,connTimeout : 6,maxConnectionsPerHost : 20,maxConnections : 1,corePoolSize : 0,maximumPoolSize : 2147483647,maxThreadIdleTime : 5,sizeOfQueue : -1,fairnessPolicy : false,useRetries : false, 1036 INFO (main) [ ] o.a.s.u.UpdateShardHandler Creating UpdateShardHandler HTTP client with params: socketTimeout=60=6=true 1038 INFO (main) [ ] o.a.s.l.LogWatcher SLF4J impl is org.slf4j.impl.Log4jLoggerFactory 1039 INFO (main) [ ] o.a.s.l.LogWatcher Registering Log Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)] 1040 INFO (main) [ ] o.a.s.c.CoreContainer Security conf doesn't exist. Skipping setup for authorization module. 1041 INFO (main) [ ] o.a.s.c.CoreContainer No authentication plugin used. 1179 INFO (main) [ ] o.a.s.c.CoresLocator Looking for core definitions underneath /solr/data 1180 INFO (main) [ ] o.a.s.c.CoresLocator Found 0 core definitions 1185 INFO (main) [ ] o.a.s.s.SolrDispatchFilter user.dir=/home/solr/solr-5.3.1/server 1186 INFO (main) [ ] o.a.s.s.SolrDispatchFilter SolrDispatchFilter.init() done 1216 INFO (main) [ ] o.e.j.s.h.ContextHandler Started o.e.j.w.WebAppContext@1a75e76a{/solr,file:/home/solr/solr-5.3.1/server/solr-webapp/webapp/,AVAILABLE}{/home/solr/solr-5.3.1/server/solr-webapp/webapp} 1224 INFO (main) [ ] o.e.j.s.ServerConnector Started ServerConnector@2102a4d5 {HTTP/1.1} {0.0.0.0:8983} 1228 INFO (main) [ ] o.e.j.s.Server Started @1762ms 14426 WARN (qtp1045997582-15) [ ] o.a.s.h.a.SystemInfoHandler Error
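For anyone hitting the same JDK 9+ flag removals, a hedged sketch of solr.in.sh settings that avoid the removed -XX:+PrintGC* and CMS-era options. The exact flags are assumptions and have not been tested with the 5.3.1 start script, which, as the console warning shows, still injects -Xloggc itself.

  # Unified GC logging (JDK 9+) instead of -Xloggc / -XX:+PrintGCDetails
  GC_LOG_OPTS="-Xlog:gc*:file=/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M"
  # The CMS tuning above has no direct JDK 10 equivalent; G1 is the usual replacement
  GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250 -XX:+ParallelRefProcEnabled"

Note that Solr 5.3.1 predates official JDK 9+ support, so the SystemInfoHandler NullPointerException above is likely a separate compatibility issue that GC flags alone will not fix.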
Problem encountered upon starting Solr after improper exit
To whom it may concern, I am running Solr 7.1.0 and encountered a problem starting Solr after I killed the Java process running Solr without proper cleanup. The error message that I received is as following: solr-7.1.0 liyifan$ bin/solr run dyld: Library not loaded: /usr/local/opt/mpfr/lib/libmpfr.4.dylib Referenced from: /usr/local/bin/awk Reason: image not found Your current version of Java is too old to run this version of Solr We found version , using command 'java -version', with response: java version "1.8.0_45" Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) Please install latest version of Java 1.8 or set JAVA_HOME properly. Debug information: JAVA_HOME: N/A Active Path: /Users/liyifan/anaconda3/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/opt/local/bin:/opt/local/sbin:/usr/Documents/2016\ Spring/appcivist-mobilization/activator-dist-1.3.9/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/usr/local/git/bin After I reset the JAVA_HOME variable, it still gives me the error: bin/solr start dyld: Library not loaded: /usr/local/opt/mpfr/lib/libmpfr.4.dylib Referenced from: /usr/local/bin/awk Reason: image not found Your current version of Java is too old to run this version of Solr We found version , using command '/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/bin/java -version', with response: java version "1.8.0_45" Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) Please install latest version of Java 1.8 or set JAVA_HOME properly. Debug information: JAVA_HOME: /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home Active Path: /Users/liyifan/anaconda3/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/opt/local/bin:/opt/local/sbin:/usr/Documents/2016\ Spring/appcivist-mobilization/activator-dist-1.3.9/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/usr/local/git/bin:/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/bin and the director /usr/local/opt/mpfr/lib/ only contains the following files: ls /usr/local/opt/mpfr/lib/ libmpfr.6.dylib libmpfr.a libmpfr.dylib pkgconfig Do you think this problem is caused by killing the Java process without proper cleanup? Could you suggest some solution to this problem? Thank you very much! Best, Yifan
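The dyld error in the output above comes from /usr/local/bin/awk rather than from Solr: bin/solr appears to run the `java -version` output through awk, and when awk aborts, the detected version string comes back empty, hence “We found version ,”. That suggests the problem is a broken awk install (libmpfr was upgraded from .4 to .6 underneath it) rather than anything left behind by killing the Java process. A hedged sketch of checking and fixing it, assuming awk came from Homebrew:

  # See which mpfr library the awk binary expects (macOS)
  otool -L /usr/local/bin/awk
  # If it is Homebrew's gawk, reinstalling relinks it against the current libmpfr.6
  brew reinstall gawk
  # Or simply let bin/solr find the system awk first
  PATH=/usr/bin:$PATH bin/solr start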
Re: Disable leaders in SolrCloud mode
This happened the second time I performed a restart. But after that, every time, this collection is stuck here. If I restart the leader node as well, the core can get out of the recovering state. On Mon, May 16, 2016 at 5:00 PM, Li Ding <li.d...@bloomreach.com> wrote: > Hi Anshum, > > This is for restart solr with 1000 collections. I created an environment > with 1023 collections today All collections are empty. During repeated > restart test, one of the cores are marked as "recovering" and stuck there > for ever. The solr is 4.6.1 and we have 3 zk hosts and 8 solr hosts, here > is the relevant logs: > > ---This is the logs for the core stuck at "recovering" > > INFO - 2016-05-16 22:47:04.984; org.apache.solr.cloud.ZkController; > publishing core=test_collection_112_shard1_replica2 state=down > > INFO - 2016-05-16 22:47:05.999; org.apache.solr.core.SolrCore; > [test_collection_112_shard1_replica2] CLOSING SolrCore > org.apache.solr.core.SolrCore@1e48619 > > INFO - 2016-05-16 22:47:06.001; org.apache.solr.core.SolrCore; > [test_collection_112_shard1_replica2] Closing main searcher on request. > > INFO - 2016-05-16 22:47:06.001; > org.apache.solr.core.CachingDirectoryFactory; looking to close /mnt > /solrcloud_latest/solr/test_collection_112_shard1_replica2/data/index > [CachedDir<
Re: Disable leaders in SolrCloud mode
Hi Anshum, This is for restart solr with 1000 collections. I created an environment with 1023 collections today All collections are empty. During repeated restart test, one of the cores are marked as "recovering" and stuck there for ever. The solr is 4.6.1 and we have 3 zk hosts and 8 solr hosts, here is the relevant logs: ---This is the logs for the core stuck at "recovering" INFO - 2016-05-16 22:47:04.984; org.apache.solr.cloud.ZkController; publishing core=test_collection_112_shard1_replica2 state=down INFO - 2016-05-16 22:47:05.999; org.apache.solr.core.SolrCore; [test_collection_112_shard1_replica2] CLOSING SolrCore org.apache.solr.core.SolrCore@1e48619 INFO - 2016-05-16 22:47:06.001; org.apache.solr.core.SolrCore; [test_collection_112_shard1_replica2] Closing main searcher on request. INFO - 2016-05-16 22:47:06.001; org.apache.solr.core.CachingDirectoryFactory; looking to close /mnt /solrcloud_latest/solr/test_collection_112_shard1_replica2/data/index
Disable leaders in SolrCloud mode
Hi all, We have a unique scenario where we don't need leaders in every collection to recover from failures. The index never changes. But we have faced problems where either ZK marked a core as down while the core was fine for non-distributed queries, or, during restart, the core never comes up. My question is: is there any simple way to disable those leaders and leader election in SolrCloud? We do use multi-shard and distributed queries, but with our unique situation we don't need leaders to maintain the correct status of the index. So if we can get rid of that part, our Solr restart will be more robust. Any suggestions will be appreciated. Thanks, Li
Re: Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.
Hi Erick, I don't have the GC log. But after the GC finished. Isn't zk ping succeeds and the core should be back to normal state? From the log I posted. The sequence is: 1) Solr Detects itself can't connect to ZK and reconnect to ZK 2) Solr marked all cores are down 3) Solr recovery each cores, some succeeds, some failed. 4) After 30 minutes, the cores that are failed still marked as down. So my questions is, during the 30 minutes interval, if GC takes too long, all cores should failed. And GC doesn't take longer than a minute since all serving requests to other calls succeeds and the next zk ping should bring the core back to normal? right? We have an active monitor running at the same time querying every core in distrib=false mode and every query succeeds. Thanks, Li On Tue, Apr 26, 2016 at 6:20 PM, Erick Erickson <erickerick...@gmail.com> wrote: > One of the reasons this happens is if you have very > long GC cycles, longer than the Zookeeper "keep alive" > timeout. During a full GC pause, Solr is unresponsive and > if the ZK ping times out, ZK assumes the machine is > gone and you get into this recovery state. > > So I'd collect GC logs and see if you have any > stop-the-world GC pauses that take longer than the ZK > timeout. > > see Mark Millers primer on GC here: > https://lucidworks.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/ > > Best, > Erick > > On Tue, Apr 26, 2016 at 2:13 PM, Li Ding <li.d...@bloomreach.com> wrote: > > Thank you all for your help! > > > > The zookeeper log rolled over, thisis from Solr.log: > > > > Looks like the solr and zk connection is gone for some reason > > > > INFO - 2016-04-21 12:37:57.536; > > org.apache.solr.common.cloud.ConnectionManager; Watcher > > org.apache.solr.common.cloud.ConnectionManager@19789a96 > > name:ZooKeeperConnection Watcher:{ZK HOSTS here} got event WatchedEvent > > state:Disconnected type:None path:null path:null type:None > > > > INFO - 2016-04-21 12:37:57.536; > > org.apache.solr.common.cloud.ConnectionManager; zkClient has disconnected > > > > INFO - 2016-04-21 12:38:24.248; > > org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection > expired > > - starting a new one... > > > > INFO - 2016-04-21 12:38:24.262; > > org.apache.solr.common.cloud.ConnectionManager; Waiting for client to > > connect to ZooKeeper > > > > INFO - 2016-04-21 12:38:24.269; > > org.apache.solr.common.cloud.ConnectionManager; Connected:true > > > > > > Then it publishes all cores on the hosts are down. 
I just list three > cores > > here: > > > > INFO - 2016-04-21 12:38:24.269; org.apache.solr.cloud.ZkController; > > publishing core=product1_shard1_replica1 state=down > > > > INFO - 2016-04-21 12:38:24.271; org.apache.solr.cloud.ZkController; > > publishing core=collection1 state=down > > > > INFO - 2016-04-21 12:38:24.272; org.apache.solr.cloud.ZkController; > > numShards not found on descriptor - reading it from system property > > > > INFO - 2016-04-21 12:38:24.289; org.apache.solr.cloud.ZkController; > > publishing core=product2_shard5_replica1 state=down > > > > INFO - 2016-04-21 12:38:24.292; org.apache.solr.cloud.ZkController; > > publishing core=product2_shard13_replica1 state=down > > > > > > product1 has only one shard one replica and it's able to be active > > successfully: > > > > INFO - 2016-04-21 12:38:26.383; org.apache.solr.cloud.ZkController; > > Register replica - core:product1_shard1_replica1 address:http:// > > {internalIp}:8983/solr collection:product1 shard:shard1 > > > > WARN - 2016-04-21 12:38:26.385; org.apache.solr.cloud.ElectionContext; > > cancelElection did not find election node to remove > > > > INFO - 2016-04-21 12:38:26.393; > > org.apache.solr.cloud.ShardLeaderElectionContext; Running the leader > > process for shard shard1 > > > > INFO - 2016-04-21 12:38:26.399; > > org.apache.solr.cloud.ShardLeaderElectionContext; Enough replicas found > to > > continue. > > > > INFO - 2016-04-21 12:38:26.399; > > org.apache.solr.cloud.ShardLeaderElectionContext; I may be the new > leader - > > try and sync > > > > INFO - 2016-04-21 12:38:26.399; org.apache.solr.cloud.SyncStrategy; Sync > > replicas to http://{internalIp}:8983/solr/product1_shard1_replica1/ > > > > INFO - 2016-04-21 12:38:26.399; org.apache.solr.cloud.SyncStrategy; Sync > > Success - now sync replicas to me > > > > INFO - 2016-04-21 12:38:26.399; org.apache.solr.cloud.SyncStrategy; > > http://{internalIp}:898
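A hedged sketch of the two knobs Erick's explanation points at: capturing GC logs to measure pause lengths, and giving the ZooKeeper session more headroom. The paths, ZK hosts, and timeout value are placeholders, and in Solr 4.x the default zkClientTimeout ultimately comes from solr.xml.

  java -verbose:gc -Xloggc:/var/log/solr_gc.log -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime \
       -DzkClientTimeout=30000 -DzkHost=zk1:2181,zk2:2181,zk3:2181 \
       -jar start.jar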
Re: Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.
; org.apache.solr.cloud.ShardLeaderElectionContext; I am the new leader: http://{internalIp}:8983/solr/product2_shard5_replica1_shard5_replica1/ shard5 INFO - 2016-04-21 12:38:26.632; org.apache.solr.common.cloud.SolrZkClient; makePath: /collections/product2_shard5_replica1/leaders/shard5 INFO - 2016-04-21 12:38:26.645; org.apache.solr.cloud.ZkController; We are http://{internalIp}:8983/solr/product2_shard5_replica1_shard5_replica1/ and leader is http://{internalIp}:8983/solr product2_shard5_replica1_shard5_replica1/ INFO - 2016-04-21 12:38:26.646; org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from ZooKeeper... Before I restarted this server, a bunch of queries failed for this collection product2. But I don't think it will affect the core status. Do you guys have any idea about why this particular core is not published as active since from the log, most steps are done except the very last one to publish info to ZK. Thanks, Li On Thu, Apr 21, 2016 at 7:08 AM, Rajesh Hazari <rajeshhaz...@gmail.com> wrote: > Hi Li, > > Do you see timeouts liek "CLUSTERSTATUS the collection time out:180s" > if its the case, this may be related to > https://issues.apache.org/jira/browse/SOLR-7940, > and i would say either use the patch file or upgrade. > > > *Thanks,* > *Rajesh,* > *8328789519,* > *If I don't answer your call please leave a voicemail with your contact > info, * > *will return your call ASAP.* > > On Thu, Apr 21, 2016 at 6:02 AM, YouPeng Yang <yypvsxf19870...@gmail.com> > wrote: > > > Hi > >We have used Solr4.6 for 2 years,If you post more logs ,maybe we can > > fixed it. > > > > 2016-04-21 6:50 GMT+08:00 Li Ding <li.d...@bloomreach.com>: > > > > > Hi All, > > > > > > We are using SolrCloud 4.6.1. We have observed following behaviors > > > recently. A Solr node in a Solrcloud cluster is up but some of the > cores > > > on the nodes are marked as down in Zookeeper. If the cores are parts > of > > a > > > multi-sharded collection with one replica, the queries to that > > collection > > > will fail. However, when this happened, if we issue queries to the > core > > > directly, it returns 200 and correct info. But once Solr got into the > > > state, the core will be marked down forever unless we do a restart on > > Solr. > > > > > > Has anyone seen this behavior before? Is there any to get out of the > > state > > > on its own? > > > > > > Thanks, > > > > > > Li > > > > > >
Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.
Hi All, We are using SolrCloud 4.6.1 and have observed the following behavior recently. A Solr node in a SolrCloud cluster is up, but some of the cores on the node are marked as down in ZooKeeper. If the cores are part of a multi-sharded collection with one replica, queries to that collection will fail. However, when this happens, if we issue queries to the core directly, it returns 200 and correct info. But once Solr gets into this state, the core stays marked down forever unless we restart Solr. Has anyone seen this behavior before? Is there any way to get out of the state on its own? Thanks, Li
Re: Why are these two queries different?
Thanks for your help. I figured it out. Just as you said. Appreciate your help. Somehow I forgot to reply to your post. On Wed, Apr 29, 2015 at 9:24 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : We did two SOLR queries and they were supposed to return the same results but : did not: the short answer is: if you want those queries to return the same results, then you need to adjust your query-time analyzer for the all_text field to not split intra-numeric tokens on commas. i don't know *why* exactly it's doing that, because you didn't give us the full details of your field/fieldtypes (or other really important info: the full request params -- echoParams=all -- and the documents matched by your second query, etc... https://wiki.apache.org/solr/UsingMailingLists ) ... but that's the reason the queries are different as evident from the parsedquery output. : Query 1: all_text:(US 4,568,649 A) : : parsedquery: (+((all_text:us ((all_text:4 all_text:568 all_text:649 : all_text:4568649)~4))~2))/no_coord, : : Result: numFound: 0, : : Query 2: all_text:(US 4568649) : : parsedquery: (+((all_text:us all_text:4568649)~2))/no_coord, : : : Result: numFound: 2, : : : We assumed the two return the same result. Our default operator is AND. -Hoss http://www.lucidworks.com/
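For reference, a sketch of how to gather the details Hoss asks for, echoParams=all plus debug=query against the problem query; the host and core name are placeholders:

  curl "http://localhost:8983/solr/collection1/select?q=all_text:(US+4,568,649+A)&debug=query&echoParams=all&wt=json"

The Analysis screen in the Admin UI, run against the all_text field type, also shows at which step “4,568,649” is split into 4 / 568 / 649 / 4568649 at index versus query time.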
Re: JSON Facet Analytics API in Solr 5.1
Thank you, Yonik! Looks cool to me. Only problem is it is not working for me. I see you have cats and cat in your URL. cat must be a field name. What is cats? We are doing a POC with facet count ascending. You help is really important to us. On Sat, May 9, 2015 at 8:05 AM, Yonik Seeley ysee...@gmail.com wrote: curl -g http://localhost:8983/solr/techproducts/query?q=*:*json.facet={cats:{terms:{field:cat,sort:'count+asc'}}} Using curl with everything in the URL is definitely trickier. Everything needs to be URL escaped. If it's not, curl will often silently do nothing. For example, when I had sort:'count asc' , the command above would do nothing. When I remembered to URL encode the space as a +, it started working. It's definitely easier to use -d with curl... curl http://localhost:8983/solr/techproducts/query; -d 'q=*:*json.facet={cats:{terms:{field:cat,sort:count asc}}}' That also allows you to format it nicer for reading as well: curl http://localhost:8983/solr/techproducts/query; -d 'q=*:*json.facet= {cats:{terms:{ field:cat, sort:count asc }}}' -Yonik On Thu, May 7, 2015 at 5:32 PM, Frank li fudon...@gmail.com wrote: This one does not have problem, but how do I include sort in this facet query. Basically, I want to write a solr query which can sort the facet count ascending. Something like http://localhost:8983/solr /demo/query?q=applejson.facet={field=price sort='count asc'} http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D I really appreciate your help. Frank http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote: Hi Yonik, I am reading your blog. It is helpful. One question for you, for following example, curl http://localhost:8983/solr/query -d 'q=*:*rows=0 json.facet={ categories:{ type : terms, field : cat, sort : { x : desc}, facet:{ x : avg(price), y : sum(price) } } } ' If I want to write it in the format of this: http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is) '} , how do I do? What problems do you encounter when you try that? If you try that URL with curl, be aware that curly braces {} are special globbing characters in curl. Turn them off with the -g option: curl -g http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'} -Yonik
Re: JSON Facet Analytics API in Solr 5.1
Here is our SOLR query: http://qa-solr:8080/solr/select?q=type:PortalCasejson.facet={categories:{terms:{field:campaign_id_ls,sort:%27count+asc%27}}}rows=0 I replaced cats with categories. It is still not working. On Sun, May 10, 2015 at 12:10 AM, Frank li fudon...@gmail.com wrote: Thank you, Yonik! Looks cool to me. Only problem is it is not working for me. I see you have cats and cat in your URL. cat must be a field name. What is cats? We are doing a POC with facet count ascending. You help is really important to us. On Sat, May 9, 2015 at 8:05 AM, Yonik Seeley ysee...@gmail.com wrote: curl -g http://localhost:8983/solr/techproducts/query?q=*:*json.facet={cats:{terms:{field:cat,sort:'count+asc'}}} http://localhost:8983/solr/techproducts/query?q=*:*json.facet=%7Bcats:%7Bterms:%7Bfield:cat,sort:'count+asc'%7D%7D%7D Using curl with everything in the URL is definitely trickier. Everything needs to be URL escaped. If it's not, curl will often silently do nothing. For example, when I had sort:'count asc' , the command above would do nothing. When I remembered to URL encode the space as a +, it started working. It's definitely easier to use -d with curl... curl http://localhost:8983/solr/techproducts/query; -d 'q=*:*json.facet={cats:{terms:{field:cat,sort:count asc}}}' That also allows you to format it nicer for reading as well: curl http://localhost:8983/solr/techproducts/query; -d 'q=*:*json.facet= {cats:{terms:{ field:cat, sort:count asc }}}' -Yonik On Thu, May 7, 2015 at 5:32 PM, Frank li fudon...@gmail.com wrote: This one does not have problem, but how do I include sort in this facet query. Basically, I want to write a solr query which can sort the facet count ascending. Something like http://localhost:8983/solr /demo/query?q=applejson.facet={field=price sort='count asc'} http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D I really appreciate your help. Frank http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote: Hi Yonik, I am reading your blog. It is helpful. One question for you, for following example, curl http://localhost:8983/solr/query -d 'q=*:*rows=0 json.facet={ categories:{ type : terms, field : cat, sort : { x : desc}, facet:{ x : avg(price), y : sum(price) } } } ' If I want to write it in the format of this: http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is) '} , how do I do? What problems do you encounter when you try that? If you try that URL with curl, be aware that curly braces {} are special globbing characters in curl. Turn them off with the -g option: curl -g http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'} -Yonik
Re: JSON Facet Analytics API in Solr 5.1
I figured it out now. It works. cats just a name, right? It does not matter what is used. Really appreciate your help. This is going to be really useful. I meant json.facet. On Sun, May 10, 2015 at 12:13 AM, Frank li fudon...@gmail.com wrote: Here is our SOLR query: http://qa-solr:8080/solr/select?q=type:PortalCasejson.facet={categories:{terms:{field:campaign_id_ls,sort:%27count+asc%27}}}rows=0 http://qa-solr:8080/solr/select?q=type:PortalCasejson.facet=%7Bcategories:%7Bterms:%7Bfield:campaign_id_ls,sort:%27count+asc%27%7D%7D%7Drows=0 I replaced cats with categories. It is still not working. On Sun, May 10, 2015 at 12:10 AM, Frank li fudon...@gmail.com wrote: Thank you, Yonik! Looks cool to me. Only problem is it is not working for me. I see you have cats and cat in your URL. cat must be a field name. What is cats? We are doing a POC with facet count ascending. You help is really important to us. On Sat, May 9, 2015 at 8:05 AM, Yonik Seeley ysee...@gmail.com wrote: curl -g http://localhost:8983/solr/techproducts/query?q=*:*json.facet={cats:{terms:{field:cat,sort:'count+asc'}}} http://localhost:8983/solr/techproducts/query?q=*:*json.facet=%7Bcats:%7Bterms:%7Bfield:cat,sort:'count+asc'%7D%7D%7D Using curl with everything in the URL is definitely trickier. Everything needs to be URL escaped. If it's not, curl will often silently do nothing. For example, when I had sort:'count asc' , the command above would do nothing. When I remembered to URL encode the space as a +, it started working. It's definitely easier to use -d with curl... curl http://localhost:8983/solr/techproducts/query; -d 'q=*:*json.facet={cats:{terms:{field:cat,sort:count asc}}}' That also allows you to format it nicer for reading as well: curl http://localhost:8983/solr/techproducts/query; -d 'q=*:*json.facet= {cats:{terms:{ field:cat, sort:count asc }}}' -Yonik On Thu, May 7, 2015 at 5:32 PM, Frank li fudon...@gmail.com wrote: This one does not have problem, but how do I include sort in this facet query. Basically, I want to write a solr query which can sort the facet count ascending. Something like http://localhost:8983/solr /demo/query?q=applejson.facet={field=price sort='count asc'} http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D I really appreciate your help. Frank http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote: Hi Yonik, I am reading your blog. It is helpful. One question for you, for following example, curl http://localhost:8983/solr/query -d 'q=*:*rows=0 json.facet={ categories:{ type : terms, field : cat, sort : { x : desc}, facet:{ x : avg(price), y : sum(price) } } } ' If I want to write it in the format of this: http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is) '} , how do I do? What problems do you encounter when you try that? If you try that URL with curl, be aware that curly braces {} are special globbing characters in curl. Turn them off with the -g option: curl -g http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'} -Yonik
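For reference, a hedged guess at the working shape of the query from this thread, applying Yonik's notes about curl's -g flag and URL-encoding the space in the sort; the host, core, and field are the ones used above:

  curl -g "http://qa-solr:8080/solr/select?q=type:PortalCase&rows=0&json.facet={categories:{terms:{field:campaign_id_ls,sort:'count+asc'}}}"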
Re: JSON Facet Analytics API in Solr 5.1
Hi Yonik, Any update for the question? Thanks in advance, Frank On Thu, May 7, 2015 at 2:49 PM, Frank li fudon...@gmail.com wrote: Is there any book to read so I won't ask such dummy questions? Thanks. On Thu, May 7, 2015 at 2:32 PM, Frank li fudon...@gmail.com wrote: This one does not have problem, but how do I include sort in this facet query. Basically, I want to write a solr query which can sort the facet count ascending. Something like http://localhost:8983/solr /demo/query?q=applejson.facet={field=price sort='count asc'} http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D I really appreciate your help. Frank http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote: Hi Yonik, I am reading your blog. It is helpful. One question for you, for following example, curl http://localhost:8983/solr/query -d 'q=*:*rows=0 json.facet={ categories:{ type : terms, field : cat, sort : { x : desc}, facet:{ x : avg(price), y : sum(price) } } } ' If I want to write it in the format of this: http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is)'} , how do I do? What problems do you encounter when you try that? If you try that URL with curl, be aware that curly braces {} are special globbing characters in curl. Turn them off with the -g option: curl -g http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'} -Yonik
Re: JSON Facet Analytics API in Solr 5.1
Hi Yonik, I am reading your blog. It is helpful. One question for you, for following example, curl http://localhost:8983/solr/query -d 'q=*:*rows=0 json.facet={ categories:{ type : terms, field : cat, sort : { x : desc}, facet:{ x : avg(price), y : sum(price) } } } ' If I want to write it in the format of this: http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is)'}, how do I do? Thanks, Frank On Mon, Apr 20, 2015 at 7:35 AM, Davis, Daniel (NIH/NLM) [C] daniel.da...@nih.gov wrote: Indeed - XML is not human readable if it contains colons, JSON is not human readable if it is too deep, and the objects/keys are not semantic. I also vote for flatter. -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Friday, April 17, 2015 11:16 PM To: solr-user@lucene.apache.org Subject: Re: JSON Facet Analytics API in Solr 5.1 Flatter please. The other nested stuff makes my head hurt. Until recently I thought I was the only person on the planet who had a hard time mentally parsing anything but the simplest JSON, but then I learned that I'm not alone at all it's just that nobody is saying it. :) Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Apr 17, 2015 at 7:26 PM, Trey Grainger solrt...@gmail.com wrote: Agreed, I also prefer the second way. I find it more readible, less verbose while communicating the same information, less confusing to mentally parse (is 'terms' the name of my facet, or the type of my facet?...), and less prone to syntactlcally valid, but logically invalid inputs. Let's break those topics down. *1) Less verbose while communicating the same information:* The flatter structure is particularly useful when you have nested facets to reduce unnecessary verbosity / extra levels. Let's contrast the two approaches with just 2 levels of subfacets: ** Current Format ** top_genres:{ terms:{ field: genre, limit: 5, facet:{ top_authors:{ terms:{ field: author, limit: 4, facet: { top_books:{ terms:{ field: title, limit: 5 } } } } } } } } ** Flat Format ** top_genres:{ type: terms, field: genre, limit: 5, facet:{ top_authors:{ type: terms field: author, limit: 4, facet: { top_books:{ type: terms field: title, limit: 5 } } } } } The flat format is clearly shorter and more succinct, while communicating the same information. What value do the extra levels add? *2) Less confusing to mentally parse* I also find the flatter structure less confusing, as I'm consistently having to take a mental pause with the current format to verify whether terms is the name of my facet or the type of my facet and have to count the curly braces to figure this out. Not that I would name my facets like this, but to give an extreme example of why that extra mental calculation is necessary due to the name of an attribute in the structure being able to represent both a facet name and facet type: terms: { terms: { field: genre, limit: 5, facet: { terms: { terms:{ field: author limit: 4 } } } } } In this example, the first terms is a facet name, the second terms is a facet type, the third is a facet name, etc. Even if you don't name your facets like this, it still requires parsing someone else's query mentally to ensure that's not what was done. 
3) *Less prone to syntactically valid, but logically invalid inputs* Also, given this first format (where the type is indicated by one of several possible attributes: terms, range, etc.), what happens if I pass in multiple of the valid JSON attributes... the flatter structure prevents this from being possible (which is a good thing!): top_authors : { terms : { field : author, limit : 5 }, range : { field : price, start : 0, end : 100, gap : 20 } } I don't think the response format can currently handle this without adding in extra levels to make it look like the input side, so this is an exception case even thought it seems
Re: JSON Facet Analytics API in Solr 5.1
This one does not have problem, but how do I include sort in this facet query. Basically, I want to write a solr query which can sort the facet count ascending. Something like http://localhost:8983/solr /demo/query?q=applejson.facet={field=price sort='count asc'} http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D I really appreciate your help. Frank http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote: Hi Yonik, I am reading your blog. It is helpful. One question for you, for following example, curl http://localhost:8983/solr/query -d 'q=*:*rows=0 json.facet={ categories:{ type : terms, field : cat, sort : { x : desc}, facet:{ x : avg(price), y : sum(price) } } } ' If I want to write it in the format of this: http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is)'} , how do I do? What problems do you encounter when you try that? If you try that URL with curl, be aware that curly braces {} are special globbing characters in curl. Turn them off with the -g option: curl -g http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'} -Yonik
Re: JSON Facet Analytics API in Solr 5.1
Is there any book to read so I won't ask such dummy questions? Thanks. On Thu, May 7, 2015 at 2:32 PM, Frank li fudon...@gmail.com wrote: This one does not have problem, but how do I include sort in this facet query. Basically, I want to write a solr query which can sort the facet count ascending. Something like http://localhost:8983/solr /demo/query?q=applejson.facet={field=price sort='count asc'} http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D I really appreciate your help. Frank http://localhost:8983/solr/demo/query?q=applejson.facet=%7Bx:%27avg%28price%29%27%7D On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote: Hi Yonik, I am reading your blog. It is helpful. One question for you, for following example, curl http://localhost:8983/solr/query -d 'q=*:*rows=0 json.facet={ categories:{ type : terms, field : cat, sort : { x : desc}, facet:{ x : avg(price), y : sum(price) } } } ' If I want to write it in the format of this: http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is)'} , how do I do? What problems do you encounter when you try that? If you try that URL with curl, be aware that curly braces {} are special globbing characters in curl. Turn them off with the -g option: curl -g http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'} -Yonik
Why are these two queries different?
We did two SOLR queries and they were supposed to return the same results but did not: Query 1: all_text:(US 4,568,649 A) parsedquery: (+((all_text:us ((all_text:4 all_text:568 all_text:649 all_text:4568649)~4))~2))/no_coord, Result: numFound: 0, Query 2: all_text:(US 4568649) parsedquery: (+((all_text:us all_text:4568649)~2))/no_coord, Result: numFound: 2, We assumed the two would return the same result. Our default operator is AND.
Re: Config join parse in solrconfig.xml
Cool. It actually works after I removed those extra columns. Thanks for your help. On Mon, Apr 6, 2015 at 8:19 PM, Erick Erickson erickerick...@gmail.com wrote: df does not allow multiple fields, it stands for default field, not default fields. To get what you're looking for, you need to use edismax or explicitly create the multiple clauses. I'm not quite sure what the join parser is doing with the df parameter. So my first question is what happens if you just use a single field for df?. Best, Erick On Mon, Apr 6, 2015 at 11:51 AM, Frank li fudon...@gmail.com wrote: The error message was from the query with debug=query. On Mon, Apr 6, 2015 at 11:49 AM, Frank li fudon...@gmail.com wrote: Hi Erick, Thanks for your response. Here is the query I am sending: http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:applefq=type:PartyLawyerLawfirmfacet=truefacet.field=lawyer_id_lmsfacet.mincount=1rows=0 http://dev-solr:8080/solr/collection1/select?q=%7B!join+from=litigation_id_ls+to=lit_id_lms%7Dall_text:applefq=type:PartyLawyerLawfirmfacet=truefacet.field=lawyer_id_lmsfacet.mincount=1rows=0 You can see it has all_text:apple. I added field name all_text, because it gives error without it. Errors: lst name=errorstr name=msgundefined field all_text number party name all_code ent_name/strint name=code400/int/lst These fields are defined as the default search fields in our solr_config.xml file: str name=dfall_text number party name all_code ent_name/str Thanks, Fudong On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson erickerick...@gmail.com wrote: You have to show us several more things: 1 what exactly does the query look like? 2 what do you expect? 3 output when you specify debug=query 4 anything else that would help. You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Fri, Apr 3, 2015 at 10:58 AM, Frank li fudon...@gmail.com wrote: Hi, I am starting using join parser with our solr. We have some default fields. They are defined in solrconfig.xml: lst name=defaults str name=defTypeedismax/str str name=echoParamsexplicit/str int name=rows10/int str name=dfall_text number party name all_code ent_name/str str name=qfall_text number^3 name^5 party^3 all_code^2 ent_name^7/str str name=flid description market_sector_type parent ult_parent ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s *_ss *_ds *_sms *_ss *_bs/str str name=q.opAND/str /lst I found out once I use join parser, it does not recognize the default fields any more. How do I modify the configuration for this? Thanks, Fred
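For reference, a sketch of the request with the join parser once df is reduced to a single field, plus an untested alternative that keeps multi-field matching via edismax and parameter dereferencing; the host and field names are the ones from the thread:

  curl -g 'http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:apple&fq=type:PartyLawyerLawfirm&facet=true&facet.field=lawyer_id_lms&facet.mincount=1&rows=0&df=all_text'
  # Untested sketch: keep multi-field matching inside the join with parameter dereferencing
  #   q={!join from=litigation_id_ls to=lit_id_lms v=$qq}&qq={!edismax qf='all_text number party name all_code ent_name'}apple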
Re: Config join parse in solrconfig.xml
Hi Erick, Thanks for your response. Here is the query I am sending: http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:applefq=type:PartyLawyerLawfirmfacet=truefacet.field=lawyer_id_lmsfacet.mincount=1rows=0 You can see it has all_text:apple. I added field name all_text, because it gives error without it. Errors: lst name=errorstr name=msgundefined field all_text number party name all_code ent_name/strint name=code400/int/lst These fields are defined as the default search fields in our solr_config.xml file: str name=dfall_text number party name all_code ent_name/str Thanks, Fudong On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson erickerick...@gmail.com wrote: You have to show us several more things: 1 what exactly does the query look like? 2 what do you expect? 3 output when you specify debug=query 4 anything else that would help. You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Fri, Apr 3, 2015 at 10:58 AM, Frank li fudon...@gmail.com wrote: Hi, I am starting using join parser with our solr. We have some default fields. They are defined in solrconfig.xml: lst name=defaults str name=defTypeedismax/str str name=echoParamsexplicit/str int name=rows10/int str name=dfall_text number party name all_code ent_name/str str name=qfall_text number^3 name^5 party^3 all_code^2 ent_name^7/str str name=flid description market_sector_type parent ult_parent ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s *_ss *_ds *_sms *_ss *_bs/str str name=q.opAND/str /lst I found out once I use join parser, it does not recognize the default fields any more. How do I modify the configuration for this? Thanks, Fred
Re: Config join parse in solrconfig.xml
The error message was from the query with debug=query. On Mon, Apr 6, 2015 at 11:49 AM, Frank li fudon...@gmail.com wrote: Hi Erick, Thanks for your response. Here is the query I am sending: http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:applefq=type:PartyLawyerLawfirmfacet=truefacet.field=lawyer_id_lmsfacet.mincount=1rows=0 http://dev-solr:8080/solr/collection1/select?q=%7B!join+from=litigation_id_ls+to=lit_id_lms%7Dall_text:applefq=type:PartyLawyerLawfirmfacet=truefacet.field=lawyer_id_lmsfacet.mincount=1rows=0 You can see it has all_text:apple. I added field name all_text, because it gives error without it. Errors: lst name=errorstr name=msgundefined field all_text number party name all_code ent_name/strint name=code400/int/lst These fields are defined as the default search fields in our solr_config.xml file: str name=dfall_text number party name all_code ent_name/str Thanks, Fudong On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson erickerick...@gmail.com wrote: You have to show us several more things: 1 what exactly does the query look like? 2 what do you expect? 3 output when you specify debug=query 4 anything else that would help. You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Fri, Apr 3, 2015 at 10:58 AM, Frank li fudon...@gmail.com wrote: Hi, I am starting using join parser with our solr. We have some default fields. They are defined in solrconfig.xml: lst name=defaults str name=defTypeedismax/str str name=echoParamsexplicit/str int name=rows10/int str name=dfall_text number party name all_code ent_name/str str name=qfall_text number^3 name^5 party^3 all_code^2 ent_name^7/str str name=flid description market_sector_type parent ult_parent ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s *_ss *_ds *_sms *_ss *_bs/str str name=q.opAND/str /lst I found out once I use join parser, it does not recognize the default fields any more. How do I modify the configuration for this? Thanks, Fred
Config join parse in solrconfig.xml
Hi, I am starting to use the join parser with our Solr. We have some default fields. They are defined in solrconfig.xml:

<lst name="defaults">
  <str name="defType">edismax</str>
  <str name="echoParams">explicit</str>
  <int name="rows">10</int>
  <str name="df">all_text number party name all_code ent_name</str>
  <str name="qf">all_text number^3 name^5 party^3 all_code^2 ent_name^7</str>
  <str name="fl">id description market_sector_type parent ult_parent ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s *_ss *_ds *_sms *_ss *_bs</str>
  <str name="q.op">AND</str>
</lst>

I found out that once I use the join parser, it does not recognize the default fields any more. How do I modify the configuration for this? Thanks, Fred
sort and group.sort
We have a query which has both sort and group.sort. What we are expecting is that we can use sort to order the groups but use a different sort inside each group. However, it looks like sort is overwriting the sort order inside the groups. Can any one of you help us with this? Basically we want to sort the groups one way but sort inside each group another way. Thanks, Fudong
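For reference, in Solr result grouping the two parameters are intended to split exactly this way: sort orders the groups relative to each other (by each group's top document under that sort), while group.sort orders the documents inside each group. A sketch with hypothetical field names:

  curl "http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=campaign_id_ls&group.limit=5&sort=score+desc&group.sort=date_ds+asc"

If sort still appears to win inside the groups, echoParams=all will confirm whether group.sort is actually reaching Solr.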
Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
HI Shawn, Thanks for your reply. The memory setting of my Solr box is 12G physically memory. 4G for java (-Xmx4096m) The index size is around 4G in Solr 4.9, I think it was over 6G in Solr 4.0. I do think the RAM size of java is one of the reasons for this slowness. I'm doing one big commit and when the ingestion process finished 50%, I can see the solr server already used over 90% of full memory. I'll try to assign more RAM to Solr Java. But from your experience, does 4G sounds like a good number for Java heap size for my scenario? Is there any way to reduce memory usage during index time? (One thing I know is do a few commits instead of one commit. ) My concern is providing I have 12 G in total, If I assign too much to Solr server, I may not have enough for the OS to cache Solr index file. I had a look to solr config file, but couldn't find anything that obviously wrong, Just wondering which part of that config file would impact the index time? Thanks, Ryan One possible source of problems with that particular upgrade is the fact that stored field compression was added in 4.1, and termvector compression was added in 4.2. They are on by default and cannot be turned off. The compression is typically fast, but with very large documents like yours, it might result in pretty major computational overhead. It can also require additional java heap, which ties into what follows: Another problem might be RAM-related. If your java heap is very large, or just a little bit too small, there can be major performance issues from garbage collection. Based on the fact that the earlier version performed well, a too-small heap is more likely than a very large heap. If your index size is such that it can't be effectively cached by the amount of total RAM on the machine (minus the java heap assigned to Solr), that can cause performance problems. Your index size is likely to be several gigabytes, and might even reach double-digit gigabytes. Can you relate those numbers -- index size, java heap size, and total system RAM? If you can, it would also be a good idea to share your solrconfig.xml. Here's a wiki page that goes into more detail about possible performance issues. It doesn't mention the possible compression problem: http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
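For what it's worth, a hedged illustration of the split being discussed for a 12 GB box with a roughly 4 GB index; the numbers are illustrative rather than a recommendation, and assume Solr is started with explicit heap flags as in the original -Xmx4096m setting:

  # Fixed 6 GB heap, leaving roughly 6 GB for the OS page cache over the ~4 GB index
  JAVA_OPTS="$JAVA_OPTS -Xms6g -Xmx6g"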
RE: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
Hi Erick, As Ryan Ernst noticed, those big fields (eg majorTextSignalStem) is not stored. There are a few stored fields in my schema, but they are very small fields basically name or id for that document. I tried turn them off(only store id filed) and that didn't make any difference. Thanks, Ryan Ryan: As it happens, there's a discssion on the dev list about this. If at all possible, could you try a brief experiment? Turn off all the storage, i.e. set stored=false on all fields. It's a lot to ask, but it'd help the discussion. Or join the discussion at https://issues.apache.org/jira/browse/LUCENE-5914. Best, Erick From: Li, Ryan Sent: Friday, September 05, 2014 3:28 PM To: solr-user@lucene.apache.org Subject: Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9 HI Shawn, Thanks for your reply. The memory setting of my Solr box is 12G physically memory. 4G for java (-Xmx4096m) The index size is around 4G in Solr 4.9, I think it was over 6G in Solr 4.0. I do think the RAM size of java is one of the reasons for this slowness. I'm doing one big commit and when the ingestion process finished 50%, I can see the solr server already used over 90% of full memory. I'll try to assign more RAM to Solr Java. But from your experience, does 4G sounds like a good number for Java heap size for my scenario? Is there any way to reduce memory usage during index time? (One thing I know is do a few commits instead of one commit. ) My concern is providing I have 12 G in total, If I assign too much to Solr server, I may not have enough for the OS to cache Solr index file. I had a look to solr config file, but couldn't find anything that obviously wrong, Just wondering which part of that config file would impact the index time? Thanks, Ryan One possible source of problems with that particular upgrade is the fact that stored field compression was added in 4.1, and termvector compression was added in 4.2. They are on by default and cannot be turned off. The compression is typically fast, but with very large documents like yours, it might result in pretty major computational overhead. It can also require additional java heap, which ties into what follows: Another problem might be RAM-related. If your java heap is very large, or just a little bit too small, there can be major performance issues from garbage collection. Based on the fact that the earlier version performed well, a too-small heap is more likely than a very large heap. If your index size is such that it can't be effectively cached by the amount of total RAM on the machine (minus the java heap assigned to Solr), that can cause performance problems. Your index size is likely to be several gigabytes, and might even reach double-digit gigabytes. Can you relate those numbers -- index size, java heap size, and total system RAM? If you can, it would also be a good idea to share your solrconfig.xml. Here's a wiki page that goes into more detail about possible performance issues. It doesn't mention the possible compression problem: http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9
Hi guys, Just an update: I've tried Solr 4.10 (with the same code as for Solr 4.9), and it has the same indexing speed as 4.0. The only problem left now is that Solr 4.10 takes more memory than 4.0, so I'm trying to figure out the best number for the Java heap size. I think that proves there is some performance issue in Solr 4.9 when indexing big documents (even ones just over 1MB). Thanks, Ryan
Solr add document over 20 times slower after upgrade from 4.0 to 4.9
I have a Solr server that indexes 2500 documents (up to 50MB each, average 3MB). When running on Solr 4.0 I managed to finish indexing in 3 hours. However, after we upgraded to Solr 4.9, indexing needs 3 days to finish. I've done some profiling; the numbers I get are:

document size (MB), time to add to Solr 4.0, time to add to Solr 4.9
1.18, 6 sec, 123 sec
2.26, 12 sec, 444 sec
3.35, 18 sec, over 600 sec
9.65, 46 sec, timeout

From what I can see, indexing time seems to be roughly O(n) in document size for Solr 4.0 but grows much faster than linearly for Solr 4.9. I also tried commenting out some copied fields to narrow down the problem; the size of the document after indexing (we copy fields, and the more fields we copy, the bigger the index gets) seems to be the dominating factor for indexing time. Just wondering, has anyone experienced a similar problem? Does that sound like a Solr bug, or are we just using Solr 4.9 wrong? Here is one example of a field definition in my schema file:

<fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/> <!-- strip off all apostrophe (') characters -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-index-synonyms.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/> <!-- Used to have language=English - seems this param is gone in 4.9 -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/> <!-- strip off all apostrophe (') characters -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-query-colloq-synonyms.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/> <!-- Used to have language=English - seems this param is gone in 4.9 -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Field:
<field name="majorTextSignalStem" type="text_stem" indexed="true" stored="false" multiValued="true" omitNorms="false"/>

Copy:
<copyField dest="majorTextSignalStem" source="majorTextSignalRaw"/>

Thanks, Ryan
What is the difference between attorney:(Roger Miller) and attorney:Roger Miller
We got different results for these two queries: the first one returned 115 records and the second returned 179 records. Thanks, Fudong
Re: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?
Hi Jack, Do you have a date for the new version of your book: solr_4x_deep_dive_early_access? Thanks, Fudong On Mon, Oct 21, 2013 at 10:39 AM, Jack Krupansky j...@basetechnology.comwrote: Take a look at the unit tests for various value sources, and find a Jira that added some value source and look at the patch for what changes had to be made. -- Jack Krupansky -Original Message- From: JT Sent: Monday, October 21, 2013 1:17 PM To: solr-user@lucene.apache.org Subject: Custom FunctionQuery Guide/Tutorial (4.3.0+) ? Does anyone have a good link to a guide / tutorial /etc. for writing a custom function query in Solr 4? The tutorials I've seen vary from showing half the code to being written for older versions of Solr. Any type of pointers would be appreciated, thanks.
stats on dynamic fields?
Hi, I don't seem to be able to find any info on the possibility of getting stats on dynamic fields. stats=true&stats.field=xyz_* appears to literally treat xyz_* as a field name with a star in it. Is there a way to get stats on dynamic fields without explicitly listing them in the query? Thanks! Li
RE: How to share config files in SolrCloud between multiple cores(collections)
I just want to share the solrconfig.xml and schema.xml, as there should be differences between collections for the other files, such as the DIH configurations. -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, March 19, 2013 11:19 AM To: solr-user@lucene.apache.org Subject: Re: How to share config files in SolrCloud between multiple cores(collections) To share configs in SolrCloud you just upload a single config set and then link it to multiple collections. You don't actually use solr.xml to do it. - Mark On Mar 19, 2013, at 10:43 AM, Li, Qiang qiang...@msci.com wrote: We have multiple cores with the same configurations; before using SolrCloud we could use relative paths in solr.xml. But with Solr4, it seems relative paths for the schema and config are no longer allowed in solr.xml. Regards, Ivan
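To make Mark's suggestion concrete, a rough sketch of uploading one config set and linking it to two collections with the zkcli script shipped in Solr's cloud-scripts directory (the paths, the ZooKeeper address and the names myconf/collection1/collection2 are placeholders, not from this thread):

  # upload a single config set to ZooKeeper
  cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir /path/to/conf -confname myconf
  # link the same config set to several collections
  cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd linkconfig -collection collection1 -confname myconf
  cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd linkconfig -collection collection2 -confname myconf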
How to share config files in SolrCloud between multiple cores(collections)
We have multiple cores with the same configurations; before using SolrCloud we could use relative paths in solr.xml. But with Solr4, it seems relative paths for the schema and config are no longer allowed in solr.xml. Regards, Ivan
Re: build CMIS compatible Solr
I think this might be the one you are talking about: https://github.com/sourcesense/solr-cmis But I think Alfresco has already had search functionality, similar to Solr. Then why did you want to use it to index docs out of Alfresco? On Fri, Jan 18, 2013 at 8:00 PM, Upayavira u...@odoko.co.uk wrote: A colleague of mine when I was working for Sourcesense made a CMIS plugin for Solr. It was one way, and we used it to index stuff out of Alfresco into Solr. I can't search for it now, let me know if you can't find it. Upayavira On Fri, Jan 18, 2013, at 05:35 AM, Nicholas Li wrote: I want to make something like Alfresco, but not having that many features. And I'd like to utilise the searching ability of Solr. On Fri, Jan 18, 2013 at 4:11 PM, Gora Mohanty g...@mimirtech.com wrote: On 18 January 2013 10:36, Nicholas Li nicholas...@yarris.com wrote: hi I am new to solr and I would like to use Solr as my document server, plus search engine. But solr is not CMIS compatible( While it shoud not be, as it is not build as a pure document management server). In that sense, I would build another layer beyond Solr so that the exposed interface would be CMIS compatible. [...] May I ask why? Solr is designed to be a search engine, which is a very different beast from a document repository. In the open-source world, Alfresco ( http://www.alfresco.com/ ) already exists, can index into Solr, and supports CMIS-based access. Regards, Gora
build CMIS compatible Solr
Hi, I am new to Solr and I would like to use Solr as my document server plus search engine. But Solr is not CMIS compatible (and it shouldn't be, as it is not built as a pure document management server). So I would build another layer on top of Solr so that the exposed interface is CMIS compatible. I did some investigation and it looks like OpenCMIS is one of the choices. My next step would be to build this CMIS bridge layer, which accepts a CMIS request, translates it within the CMIS implementation into a Solr-compatible request and sends it to Solr, and finally translates the Solr response back into a CMIS-compatible response. Is my logic right? And is there any library other than OpenCMIS to do this job? Cheers, Nick
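Purely as an illustration of the bridge idea described above (not of the OpenCMIS API itself), the Solr-facing half might look something like this SolrJ sketch; translateCmisQueryToSolr, the class name and the URL are hypothetical placeholders that the CMIS layer would have to provide:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocumentList;

  public class CmisSolrBridge {
      private final HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

      // Takes an already-parsed CMIS query, runs it against Solr, and returns the raw
      // documents for the CMIS layer to wrap into a CMIS-compatible response.
      public SolrDocumentList search(String cmisQuery) throws SolrServerException {
          String solrQ = translateCmisQueryToSolr(cmisQuery); // hypothetical translation step
          QueryResponse rsp = solr.query(new SolrQuery(solrQ));
          return rsp.getResults();
      }

      private String translateCmisQueryToSolr(String cmisQuery) {
          // placeholder: a real bridge would map CMIS QL (SQL-like) onto Solr field queries
          return cmisQuery;
      }
  }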
Re: build CMIS compatible Solr
I want to make something like Alfresco, but not having that many features. And I'd like to utilise the searching ability of Solr. On Fri, Jan 18, 2013 at 4:11 PM, Gora Mohanty g...@mimirtech.com wrote: On 18 January 2013 10:36, Nicholas Li nicholas...@yarris.com wrote: hi I am new to solr and I would like to use Solr as my document server, plus search engine. But solr is not CMIS compatible( While it shoud not be, as it is not build as a pure document management server). In that sense, I would build another layer beyond Solr so that the exposed interface would be CMIS compatible. [...] May I ask why? Solr is designed to be a search engine, which is a very different beast from a document repository. In the open-source world, Alfresco ( http://www.alfresco.com/ ) already exists, can index into Solr, and supports CMIS-based access. Regards, Gora
Store document while using Solr
Hi there, I am quite new to Solr and have a very basic question about storing and indexing documents. I am trying the Solr example, and when I run a command like 'java -jar post.jar foo/test.xml', I get the feeling that Solr will index the given file no matter where it is stored, and that Solr won't re-store this file in some other location in the file system. Am I correct? If I want to use the file system to manage the documents, it seems like it is better to define some location which will be used to store all the potential files (this may need some processing to move/copy/upload the files to this location), and then use Solr to index them under this location. Am I correct? Cheers, Nick
Index version generation for Solr 3.5
Hi, I ran into an issue lately with index version generation in Solr 3.5. In Solr 1.4, the index version of the slave increments on each replication. However, I noticed this is not the case for Solr 3.5; the index version can increase by 20 or 30 after a replication. Does anyone know why, or of any reference on the web for this? The index generation does still increment after replication though. Thanks, Xin
Re: Atomic Multicore Operations - E.G. Move Docs
On 2012-7-2 at 6:37 PM, Nicholas Ball nicholas.b...@nodelay.com wrote: That could work, but then how do you ensure commit is called on the two cores at the exact same time? That would need something like two-phase commit in a relational DB. Lucene has prepareCommit, but to implement 2PC many other things are needed. Also, any way to commit a specific update rather than all the back-logged ones? Cheers, Nicholas On Sat, 30 Jun 2012 16:19:31 -0700, Lance Norskog goks...@gmail.com wrote: Index all documents to both cores, but do not call commit until both report that indexing worked. If one of the cores throws an exception, call roll back on both cores. On Sat, Jun 30, 2012 at 6:50 AM, Nicholas Ball nicholas.b...@nodelay.com wrote: Hey all, Trying to figure out the best way to perform an atomic operation across multiple cores on the same Solr instance, i.e. a multi-core environment. An example would be to move a set of docs from one core onto another core and ensure that a soft commit is done at the exact same time. If one were to fail, so would the other. Obviously this would probably require some customization, but I wanted to know what the best way to tackle this would be and where I should be looking in the source. Many thanks for the help in advance, Nicholas a.k.a. incunix
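A minimal SolrJ sketch of the "commit both or roll back both" idea Lance describes (the core URLs are placeholders; this is best-effort only, not a real two-phase commit, since a crash between the two commit calls can still leave the cores out of sync):

  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import java.util.Collection;

  public class TwoCoreWriter {
      private final HttpSolrServer coreA = new HttpSolrServer("http://localhost:8983/solr/coreA");
      private final HttpSolrServer coreB = new HttpSolrServer("http://localhost:8983/solr/coreB");

      public void addToBoth(Collection<SolrInputDocument> docs) throws Exception {
          try {
              coreA.add(docs);   // buffer the docs on both cores first...
              coreB.add(docs);
              coreA.commit();    // ...and only commit once both adds succeeded
              coreB.commit();
          } catch (Exception e) {
              coreA.rollback();  // undo uncommitted changes on both cores
              coreB.rollback();
              throw e;
          }
      }
  }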
Re: Atomic Multicore Operations - E.G. Move Docs
Do you really need this? Distributed transactions are a difficult problem. In 2PC every node can fail, including the coordinator, so something like leader election is needed to make it work; you could maybe try ZooKeeper. But if the transaction is not critically important (like transferring money in a bank), you can do it like this. coordinator: On 2012-8-16 at 7:42 AM, Nicholas Ball nicholas.b...@nodelay.com wrote: Haven't managed to find a good way to do this yet. Does anyone have any ideas on how I could implement this feature? Really need to move docs across from one core to another atomically. Many thanks, Nicholas On Mon, 02 Jul 2012 04:37:12 -0600, Nicholas Ball nicholas.b...@nodelay.com wrote: That could work, but then how do you ensure commit is called on the two cores at the exact same time? Cheers, Nicholas On Sat, 30 Jun 2012 16:19:31 -0700, Lance Norskog goks...@gmail.com wrote: Index all documents to both cores, but do not call commit until both report that indexing worked. If one of the cores throws an exception, call roll back on both cores. On Sat, Jun 30, 2012 at 6:50 AM, Nicholas Ball nicholas.b...@nodelay.com wrote: Hey all, Trying to figure out the best way to perform an atomic operation across multiple cores on the same Solr instance, i.e. a multi-core environment. An example would be to move a set of docs from one core onto another core and ensure that a soft commit is done at the exact same time. If one were to fail, so would the other. Obviously this would probably require some customization, but I wanted to know what the best way to tackle this would be and where I should be looking in the source. Many thanks for the help in advance, Nicholas a.k.a. incunix
Re: Atomic Multicore Operations - E.G. Move Docs
http://zookeeper.apache.org/doc/r3.3.6/recipes.html#sc_recipes_twoPhasedCommit On Thu, Aug 16, 2012 at 7:41 AM, Nicholas Ball nicholas.b...@nodelay.com wrote: Haven't managed to find a good way to do this yet. Does anyone have any ideas on how I could implement this feature? Really need to move docs across from one core to another atomically. Many thanks, Nicholas On Mon, 02 Jul 2012 04:37:12 -0600, Nicholas Ball nicholas.b...@nodelay.com wrote: That could work, but then how do you ensure commit is called on the two cores at the exact same time? Cheers, Nicholas On Sat, 30 Jun 2012 16:19:31 -0700, Lance Norskog goks...@gmail.com wrote: Index all documents to both cores, but do not call commit until both report that indexing worked. If one of the cores throws an exception, call roll back on both cores. On Sat, Jun 30, 2012 at 6:50 AM, Nicholas Ball nicholas.b...@nodelay.com wrote: Hey all, Trying to figure out the best way to perform atomic operation across multiple cores on the same solr instance i.e. a multi-core environment. An example would be to move a set of docs from one core onto another core and ensure that a softcommit is done as the exact same time. If one were to fail so would the other. Obviously this would probably require some customization but wanted to know what the best way to tackle this would be and where should I be looking in the source. Many thanks for the help in advance, Nicholas a.k.a. incunix
Re: how to boost exact match
Create a separate field for exact matches and add it as an optional (boosting) boolean clause; a sketch is below. On 2012-8-11 at 1:42 PM, abhayd ajdabhol...@hotmail.com wrote: hi I have documents like iphone 4 - white, iphone 4s - black, ipone4 - black. When a user searches for iphone 4 I would like to show iphone 4 docs first and iphone 4s docs after that. Similarly, when a user searches for iphone 4s I would like to show iphone 4s docs first and then iphone 4 docs. At present I use the whitespace tokenizer. Any idea how to achieve this? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-boost-exact-match-tp4000576.html Sent from the Solr - User mailing list archive at Nabble.com.
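For illustration, one way the extra "exact" field and optional clause could look (the field and type names are made up and the boost is arbitrary):

  <!-- schema.xml sketch: a lightly analyzed copy of the title, no stemming or word-splitting -->
  <fieldType name="text_exactish" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="title_exact" type="text_exactish" indexed="true" stored="false"/>
  <copyField source="title" dest="title_exact"/>

  <!-- query side: the exact-phrase clause is optional but strongly boosted, so
       "iphone 4" documents rank above "iphone 4s" documents -->
  q=title:(iphone 4) OR title_exact:"iphone 4"^10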
filed type for text search
I have used Solr 3.4 for a long time. Recently, when I upgraded to Solr 4.0 and reindexed all the data, I found that fields specified with the string type cannot be searched via the q parameter. If I just change the type to text_general, it works. So my question is: for Solr 4.0, must I set the field type to text_general for text search?
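By way of illustration (field names here are made up): a string field is indexed as one exact, un-analyzed token, while a text_general-style field is tokenized and lowercased, which is why word-level queries only hit the latter.

  <!-- schema.xml sketch -->
  <field name="title_s"   type="string"       indexed="true" stored="true"/>
  <field name="title_txt" type="text_general" indexed="true" stored="true"/>

  <!-- For the value "Quick Brown Fox":
       q=title_s:fox                 -> no match (the whole string is a single term)
       q=title_s:"Quick Brown Fox"   -> matches, but only the exact, case-sensitive value
       q=title_txt:fox               -> matches (tokenized and lowercased at index and query time) -->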
Search special chars
Hi All, I want to search for keywords like Non-taxable, which have a - in the word. Can I make this work in Solr with some configuration, or in some other way? Thanks Regards, Ivan
Re: Solr seems to hang
could you please use jstack to dump the call stacks? On Thu, Jun 28, 2012 at 2:53 PM, Arkadi Colson ark...@smartbit.be wrote: It now hanging for 15 hour and nothing changes in the index directory. Tips for further debugging? On 06/27/2012 03:50 PM, Arkadi Colson wrote: I'm sending files to solr with the php Solr library. I'm doing a commit every 1000 documents: autoCommit maxDocs1000/maxDocs !-- maxTime1000/maxTime -- /autoCommit Hard to say how long it's hanging. At least for 1 hour. After that I restarted Tomcat to continue... I will have a look at the indexes next time it's hanging. Thanks for the tip! SOLR: 3.6 TOMCAT: 7.0.28 JAVA: 1.7.0_05-b05 On 06/27/2012 03:13 PM, Erick Erickson wrote: How long is it hanging? And how are you sending files to Tika, and especially how often do you commit? One problem that people run into is that they commit too often, causing segments to be merged and occasionally that just takes a while and people think that Solr is hung. 18G isn't very large as indexes go, so it's unlikely that's your problem, except if merging is going on in which case you might be copying a bunch of data. So try seeing if you're getting a bunch of disk activity, you can get a crude idea of what's going on if you just look at the index directory on your Solr server while it's hung. What version of Solr are you using? Details matter Best Erick On Wed, Jun 27, 2012 at 7:51 AM, Arkadi Colson ark...@smartbit.be wrote: Anybody an idea? The thread Dump looks like this: Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode): http-8983-6 daemon prio=10 tid=0x41126000 nid=0x5c1 in Object.wait() [0x7fa0ad197000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x00070abf4ad0 (a org.apache.tomcat.util.net.JIoEndpoint$Worker) at java.lang.Object.wait(Object.java:485) at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458) - locked 0x00070abf4ad0 (a org.apache.tomcat.util.net.JIoEndpoint$Worker) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484) at java.lang.Thread.run(Thread.java:662) pool-4-thread-1 prio=10 tid=0x7fa0a054d800 nid=0x5be waiting on condition [0x7f9f962f4000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000702598b30 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.DelayQueue.take(DelayQueue.java:160) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:662) http-8983-5 daemon prio=10 tid=0x412d2800 nid=0x5bd runnable [0x7f9f94171000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:735) at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:366) at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:814) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) http-8983-4 daemon prio=10 tid=0x41036000 nid=0x5b1 in Object.wait() [0x7f9f966c9000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter) at java.lang.Object.wait(Object.java:485) at org.apache.lucene.index.DocumentsWriter.waitIdle(DocumentsWriter.java:986) - locked 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:524) - locked 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3580) - locked 0x00070b6e4858 (a
Re: what is precisionStep and positionIncrementGap
read How it works of http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html if you can read Chinese, I have a blog explaining the details of the implementation. http://blog.csdn.net/fancyerii/article/details/7256379 On Thu, Jun 28, 2012 at 3:51 PM, ZHANG Liang F liang.f.zh...@alcatel-sbell.com.cn wrote: Thanks a lot, but the precisionStep is still very vague to me! Could you give me a example? -Original Message- From: Li Li [mailto:fancye...@gmail.com] Sent: 2012年6月28日 11:25 To: solr-user@lucene.apache.org Subject: Re: what is precisionStep and positionIncrementGap 1. precisionStep is used for ranging query of Numeric Fields. see http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html 2. positionIncrementGap is used for phrase query of multi-value fields e.g. doc1 has two titles. title1: ab cd title2: xy zz if your positionIncrementGap is 0, then the position of the 4 terms are 0,1,2,3. if you search phrase cd xy, it will hit. But you may think it should not match so you can adjust positionIncrementGap to a larger one. e.g. 100. Then the positions now are 0,1,100,101. the phrase query will not match it. On Thu, Jun 28, 2012 at 10:00 AM, ZHANG Liang F liang.f.zh...@alcatel-sbell.com.cn wrote: Hi, in the schema.xml, usually there will be fieldType definition like this: fieldType name=int class=solr.TrieIntField precisionStep=0 omitNorms=true positionIncrementGap=0/ the precisionStep and positionIncrementGap is not very clear to me. Could you please elaborate more on these 2? Thanks! Liang
Re: Solr seems to hang
It seems that the IndexWriter wants to flush but needs to wait for the other threads to become idle. I also see that the n-gram filter is working; is your field's value too long? You should also tell us the average load on the system, the free memory, and the memory used by the JVM. On 2012-6-27 at 7:51 PM, Arkadi Colson ark...@smartbit.be wrote: Anybody an idea? The thread Dump looks like this: Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode): http-8983-6 daemon prio=10 tid=0x41126000 nid=0x5c1 in Object.wait() [0x7fa0ad197000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x00070abf4ad0 (a org.apache.tomcat.util.net.JIoEndpoint$Worker) at java.lang.Object.wait(Object.java:485) at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458) - locked 0x00070abf4ad0 (a org.apache.tomcat.util.net.JIoEndpoint$Worker) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484) at java.lang.Thread.run(Thread.java:662) pool-4-thread-1 prio=10 tid=0x7fa0a054d800 nid=0x5be waiting on condition [0x7f9f962f4000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000702598b30 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.DelayQueue.take(DelayQueue.java:160) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:662) http-8983-5 daemon prio=10 tid=0x412d2800 nid=0x5bd runnable [0x7f9f94171000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:735) at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:366) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:814) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) http-8983-4 daemon prio=10 tid=0x41036000 nid=0x5b1 in Object.wait() [0x7f9f966c9000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter) at java.lang.Object.wait(Object.java:485) at org.apache.lucene.index.DocumentsWriter.waitIdle(DocumentsWriter.java:986) - locked 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:524) - locked 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3580) - locked 0x00070b6e4858 (a org.apache.solr.update.SolrIndexWriter) at
org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3545) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2328) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2293) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:141) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:146) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:236) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
Re: what is precisionStep and positionIncrementGap
1. precisionStep is used for range queries on numeric fields; see http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html 2. positionIncrementGap is used for phrase queries on multi-valued fields. E.g. doc1 has two titles, title1: ab cd and title2: xy zz. If your positionIncrementGap is 0, then the positions of the 4 terms are 0,1,2,3, and if you search for the phrase "cd xy" it will hit. But you probably think it should not match, so you can set positionIncrementGap to a larger value, e.g. 100. Then the positions are 0,1,100,101 and the phrase query will not match. On Thu, Jun 28, 2012 at 10:00 AM, ZHANG Liang F liang.f.zh...@alcatel-sbell.com.cn wrote: Hi, in the schema.xml there will usually be a fieldType definition like this: <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> precisionStep and positionIncrementGap are not very clear to me. Could you please elaborate more on these 2? Thanks! Liang
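A small sketch of the positionIncrementGap behaviour described above (the field name is made up and its type is assumed to declare positionIncrementGap="100"):

  <!-- schema.xml: a multi-valued text field whose type uses positionIncrementGap="100" -->
  <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>

  <!-- a document with two values for "title" -->
  <doc>
    <field name="title">ab cd</field>
    <field name="title">xy zz</field>
  </doc>

  <!-- q=title:"cd xy" -> no match with the gap (cd is at position 1, xy at position 101)
       q=title:"ab cd" -> still matches, since both terms sit inside one value -->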
Re: Query Logic Question
I think they are logically the same. but 1 may be a little bit faster than 2 On Thu, Jun 28, 2012 at 5:59 AM, Rublex ruble...@hotmail.com wrote: Hi, Can someone explain to me please why these two queries return different results: 1. -PaymentType:Finance AND -PaymentType:Lease AND -PaymentType:Cash *(700 results)* 2. (-PaymentType:Finance AND -PaymentType:Lease) AND -PaymentType:Cash *(0 results)* Logically the two above queries should be return the same results no? Thank you -- View this message in context: http://lucene.472066.n3.nabble.com/Query-Logic-Question-tp3991689.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: what's better for in memory searching?
I have roughly read the code of RAMDirectory. It uses a list of 1024-byte arrays and has a lot of overhead. But as far as I know, using MMapDirectory I can't prevent page faults; the OS will swap less frequently used pages out. Even if I allocate enough memory for the JVM, I can't guarantee that all the files in the directory stay in memory. Is my understanding right? If it is, then some less frequent queries will be slow. How can I keep them always in memory? On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog goks...@gmail.com wrote: Yes, use MMapDirectory. It is faster and uses memory more efficiently than RAMDirectory. This sounds wrong, but it is true. With RAMDirectory, Java has to work harder doing garbage collection. On Fri, Jun 8, 2012 at 1:30 AM, Li Li fancye...@gmail.com wrote: hi all I want to use lucene 3.6 to provide a searching service. My data is not very large, the raw data is less than 1GB, and I want to load all indexes into memory; I also need to save all indexes to disk persistently. I originally wanted to use RAMDirectory, but then I read its javadoc: Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte [1024] arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments. It is recommended to materialize large indexes on disk and use MMapDirectory, which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to Java heap space is not useful. Should I use MMapDirectory? It seems to be another contrib. Has anyone tested it against RAMDirectory?
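For reference, a minimal Lucene 3.6-style sketch of the two options being discussed (the index path is a placeholder): opening the on-disk index through MMapDirectory, or copying it into a RAMDirectory at startup.

  import java.io.File;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.store.MMapDirectory;
  import org.apache.lucene.store.RAMDirectory;

  public class DirectoryChoice {
      public static IndexSearcher openMmap(File indexDir) throws Exception {
          Directory dir = new MMapDirectory(indexDir);   // relies on the OS page cache
          return new IndexSearcher(IndexReader.open(dir));
      }

      public static IndexSearcher openInHeap(File indexDir) throws Exception {
          // copies the whole on-disk index into the Java heap at startup
          Directory dir = new RAMDirectory(FSDirectory.open(indexDir));
          return new IndexSearcher(IndexReader.open(dir));
      }
  }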
Re: what's better for in memory searching?
do you mean software RAM disk? using RAM to simulate disk? How to deal with Persistence? maybe I can hack by increase RAMOutputStream.BUFFER_SIZE from 1024 to 1024*1024. it may have a waste. but I can adjust my merge policy to avoid to much segments. I will have a big segment and a small segment. Every night I will merge them. new added documents will flush into a new segment and I will merge the new generated segment and the small one. Our update operations are not very frequent. On Mon, Jun 11, 2012 at 4:59 PM, Paul Libbrecht p...@hoplahup.net wrote: Li Li, have you considered allocating a RAM-Disk? It's not the most flexible thing... but it's certainly close, in performance to a RAMDirectory. MMapping on that is likely to be useless but I doubt you can set it to zero. That'd need experiment. Also, doesn't caching and auto-warming provide the lowest latency for all expected queries ? Paul Le 11 juin 2012 à 10:50, Li Li a écrit : I want to use lucene 3.6 providing searching service. my data is not very large, raw data is less that 1GB and I want to use load all indexes into memory. also I need save all indexes into disk persistently. I originally want to use RAMDirectory. But when I read its javadoc.
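If the RAM-disk route mentioned here were tried, on Linux a tmpfs mount is one way to do it (the size and paths are placeholders); the index would still need to be copied back to a persistent disk for durability:

  # create a 2GB RAM-backed filesystem and point the index directory at it
  mkdir -p /mnt/ramdisk
  mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk
  # e.g. copy the existing index in, then periodically sync it back out for persistence
  cp -r /var/solr/data/index /mnt/ramdisk/index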
Re: what's better for in memory searching?
I am sorry. I make a mistake. even use RAMDirectory, I can not guarantee they are not swapped out. On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann k...@solarier.de wrote: Set the swapiness to 0 to avoid memory pages being swapped to disk too early. http://en.wikipedia.org/wiki/Swappiness -Kuli Am 11.06.2012 10:38, schrieb Li Li: I have roughly read the codes of RAMDirectory. it use a list of 1024 byte arrays and many overheads. But as far as I know, using MMapDirectory, I can't prevent the page faults. OS will swap less frequent pages out. Even if I allocate enough memory for JVM, I can guarantee all the files in the directory are in memory. am I understanding right? if it is, then some less frequent queries will be slow. How can I let them always in memory? On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskoggoks...@gmail.com wrote: Yes, use MMapDirectory. It is faster and uses memory more efficiently than RAMDirectory. This sounds wrong, but it is true. With RAMDirectory, Java has to work harder doing garbage collection. On Fri, Jun 8, 2012 at 1:30 AM, Li Lifancye...@gmail.com wrote: hi all I want to use lucene 3.6 providing searching service. my data is not very large, raw data is less that 1GB and I want to use load all indexes into memory. also I need save all indexes into disk persistently. I originally want to use RAMDirectory. But when I read its javadoc. Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte [1024] arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments. It is recommended to materialize large indexes on disk and use MMapDirectory, which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to Java heap space is not useful. should I use MMapDirectory? it seems another contrib instantiated. anyone test it with RAMDirectory? -- Lance Norskog goks...@gmail.com
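For completeness, the swappiness tweak suggested in this thread is a system-wide Linux setting (not per-process) and looks like this; the value 0 simply mirrors the suggestion above:

  # check the current value
  cat /proc/sys/vm/swappiness
  # set it for the running system
  sysctl vm.swappiness=0
  # make it persistent across reboots
  echo "vm.swappiness=0" >> /etc/sysctl.conf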
Re: what's better for in memory searching?
yes, I need average query time less than 10 ms. The faster the better. I have enough memory for lucene because I know there are not too much data. there are not many modifications. every day there are about hundreds of document update. if indexes are not in physical memory, then IO operations will cost a few ms. btw, the full gc may also add uncertainty, So I need optimize it as much as possible. On Mon, Jun 11, 2012 at 5:27 PM, Michael Kuhlmann k...@solarier.de wrote: You cannot guarantee this when you're running out of RAM. You'd have a problem then anyway. Why are you caring that much? Did you yet have performance issues? 1GB should load really fast, and both auto warming and OS cache should help a lot as well. With such an index, you usually don't need to fine tune performance that much. Did you think about using a SSD? Since you want to persist your index, you'll need to live with disk IO anyway. Greetings, Kuli Am 11.06.2012 11:20, schrieb Li Li: I am sorry. I make a mistake. even use RAMDirectory, I can not guarantee they are not swapped out. On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmannk...@solarier.de wrote: Set the swapiness to 0 to avoid memory pages being swapped to disk too early. http://en.wikipedia.org/wiki/Swappiness -Kuli Am 11.06.2012 10:38, schrieb Li Li: I have roughly read the codes of RAMDirectory. it use a list of 1024 byte arrays and many overheads. But as far as I know, using MMapDirectory, I can't prevent the page faults. OS will swap less frequent pages out. Even if I allocate enough memory for JVM, I can guarantee all the files in the directory are in memory. am I understanding right? if it is, then some less frequent queries will be slow. How can I let them always in memory? On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskoggoks...@gmail.com wrote: Yes, use MMapDirectory. It is faster and uses memory more efficiently than RAMDirectory. This sounds wrong, but it is true. With RAMDirectory, Java has to work harder doing garbage collection. On Fri, Jun 8, 2012 at 1:30 AM, Li Lifancye...@gmail.com wrote: hi all I want to use lucene 3.6 providing searching service. my data is not very large, raw data is less that 1GB and I want to use load all indexes into memory. also I need save all indexes into disk persistently. I originally want to use RAMDirectory. But when I read its javadoc. Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte [1024] arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments. It is recommended to materialize large indexes on disk and use MMapDirectory, which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to Java heap space is not useful. should I use MMapDirectory? it seems another contrib instantiated. anyone test it with RAMDirectory? -- Lance Norskog goks...@gmail.com
Re: what's better for in memory searching?
I found this. http://unix.stackexchange.com/questions/10214/per-process-swapiness-for-linux it can provide fine grained control of swapping On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann k...@solarier.de wrote: Set the swapiness to 0 to avoid memory pages being swapped to disk too early. http://en.wikipedia.org/wiki/Swappiness -Kuli Am 11.06.2012 10:38, schrieb Li Li: I have roughly read the codes of RAMDirectory. it use a list of 1024 byte arrays and many overheads. But as far as I know, using MMapDirectory, I can't prevent the page faults. OS will swap less frequent pages out. Even if I allocate enough memory for JVM, I can guarantee all the files in the directory are in memory. am I understanding right? if it is, then some less frequent queries will be slow. How can I let them always in memory? On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskoggoks...@gmail.com wrote: Yes, use MMapDirectory. It is faster and uses memory more efficiently than RAMDirectory. This sounds wrong, but it is true. With RAMDirectory, Java has to work harder doing garbage collection. On Fri, Jun 8, 2012 at 1:30 AM, Li Lifancye...@gmail.com wrote: hi all I want to use lucene 3.6 providing searching service. my data is not very large, raw data is less that 1GB and I want to use load all indexes into memory. also I need save all indexes into disk persistently. I originally want to use RAMDirectory. But when I read its javadoc. Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte [1024] arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments. It is recommended to materialize large indexes on disk and use MMapDirectory, which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to Java heap space is not useful. should I use MMapDirectory? it seems another contrib instantiated. anyone test it with RAMDirectory? -- Lance Norskog goks...@gmail.com
Re: what's better for in memory searching?
is this method equivalent to set vm.swappiness which is global? or it can set the swappiness for jvm process? On Tue, Jun 12, 2012 at 5:11 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Point about premature optimization makes sense for me. However some time ago I've bookmarked potentially useful approach http://lucene.472066.n3.nabble.com/High-response-time-after-being-idle-tp3616599p3617604.html. On Mon, Jun 11, 2012 at 3:02 PM, Toke Eskildsen t...@statsbiblioteket.dkwrote: On Mon, 2012-06-11 at 11:38 +0200, Li Li wrote: yes, I need average query time less than 10 ms. The faster the better. I have enough memory for lucene because I know there are not too much data. there are not many modifications. every day there are about hundreds of document update. if indexes are not in physical memory, then IO operations will cost a few ms. I'm with Michael on this one: It seems that you're doing a premature optimization. Guessing that your final index will be 5GB in size with 1 million documents (give or take 900.000:-), relatively simple queries and so on, an average response time of 10 ms should be attainable even on spinning drives. One hundred document updates per day are not many, so again I would not expect problems. As is often the case on this mailing list, the advice is try it. Using a normal on-disk index and doing some warm up is the easy solution to implement and nearly all of your work on this will be usable for a RAM-based solution, if you are not satisfied with the speed. Or you could buy a small cheap SSD and have no more worries... Regards, Toke Eskildsen -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: [Announce] Solr 3.6 with RankingAlgorithm 1.4.2 - NRT support
yes, I am also interested in good performance with 2 billion docs. how many search nodes do you use? what's the average response time and qps ? another question: where can I find related paper or resources of your algorithm which explains the algorithm in detail? why it's better than google site(better than lucene is not very interested because lucene is not originally designed to provide search function like google)? On Mon, May 28, 2012 at 1:06 AM, Darren Govoni dar...@ontrenet.com wrote: I think people on this list would be more interested in your approach to scaling 2 billion documents than modifying solr/lucene scoring (which is already top notch). So given that, can you share any references or otherwise substantiate good performance with 2 billion documents? Thanks. On Sun, 2012-05-27 at 08:29 -0700, Nagendra Nagarajayya wrote: Actually, RankingAlgorithm 1.4.2 has been scaled to more than 2 billion docs. With RankingAlgorithm 1.4.3, using the parameters age=latestdocs=number feature, you can retrieve the NRT inserted documents in milliseconds from such a huge index improving query and faceting performance and using very little resources ... Currently, RankingAlgorithm 1.4.3 is only available with Solr 4.0, and the NRT insert performance with Solr 4.0 is about 70,000 docs / sec. RankingAlgorithm 1.4.3 should become available with Solr 3.6 soon. Regards, Nagendra Nagarajayya http://solr-ra.tgels.org http://rankingalgorithm.tgels.org On 5/27/2012 7:32 AM, Darren Govoni wrote: Hi, Have you tested this with a billion documents? Darren On Sun, 2012-05-27 at 07:24 -0700, Nagendra Nagarajayya wrote: Hi! I am very excited to announce the availability of Solr 3.6 with RankingAlgorithm 1.4.2. This NRT supports now works with both RankingAlgorithm and Lucene. The insert/update performance should be about 5000 docs in about 490 ms with the MbArtists Index. RankingAlgorithm 1.4.2 has multiple algorithms, improved performance over the earlier releases, supports the entire Lucene Query Syntax, ± and/or boolean queries and can scale to more than a billion documents. You can get more information about NRT performance from here: http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x You can download Solr 3.6 with RankingAlgorithm 1.4.2 from here: http://solr-ra.tgels.org Please download and give the new version a try. Regards, Nagendra Nagarajayya http://solr-ra.tgels.org http://rankingalgorithm.tgels.org ps. MbArtists index is the example index used in the Solr 1.4 Enterprise Book
Re: How can i search site name
You should define your search first. If the site is www.google.com, how do you want to match it: full string matching or partial matching? E.g., should the query google match it? If it should, you need to write your own analyzer for this field; a sketch is below. On Tue, May 22, 2012 at 2:03 PM, Shameema Umer shem...@gmail.com wrote: Sorry, please let me know how I can search the site name using the Solr query syntax. My results should show title, url and content. Title and content are being searched even though <defaultSearchField>content</defaultSearchField>. I need the url or site name too. Please help. Thanks in advance. On Tue, May 22, 2012 at 11:05 AM, ketan kore ketankore...@gmail.com wrote: you can go on www.google.com and just type the site which you want to search and google will show you the results as simple as that ...
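One possible shape for such a field type, assuming partial matching on host-name parts is what's wanted (the field and type names are made up):

  <!-- schema.xml sketch: split URLs/host names on dots, slashes, colons etc. so that
       "google" matches "http://www.google.com/search" -->
  <fieldType name="url_parts" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.PatternTokenizerFactory" pattern="[./:?=&amp;]+"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="site" type="url_parts" indexed="true" stored="true"/>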
Re: Installing Solr on Tomcat using Shell - Code wrong?
You should find some clues in the Tomcat log. On 2012-5-22 at 7:49 PM, Spadez james_will...@hotmail.com wrote: Hi, This is the install process I used in my shell script to try and get Tomcat running with Solr (Debian server): I swear this used to work, but currently only Tomcat works. The Solr page just comes up with The requested resource (/solr/admin) is not available. Can anyone give me some insight into why this isn't working? It's driving me nuts. James -- View this message in context: http://lucene.472066.n3.nabble.com/Installing-Solr-on-Tomcat-using-Shell-Code-wrong-tp3985393.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr query with mandatory values
Putting + before the term is correct; in the Lucene grammar a clause includes both the field and the value: Query ::= ( Clause )* Clause ::= [+, -] [TERM :] ( TERM | ( Query ) ) #_TERM_CHAR: ( _TERM_START_CHAR | _ESCAPED_CHAR | - | + ) #_ESCAPED_CHAR: \\ ~[] In Lucene query syntax you can't express a term value that includes a space. You can use quotation marks, but Lucene will treat that as a phrase query, so you need to escape the space, like title:hello\\ world, which takes hello world as the field value; the analyzer will then tokenize it. So you should use an analyzer that can deal with spaces, e.g. the keyword analyzer, as far as I know. On Thu, May 10, 2012 at 3:35 AM, Matt Kuiper matt.kui...@issinc.com wrote: Yes. See http://wiki.apache.org/solr/SolrQuerySyntax - The standard Solr Query Parser syntax is a superset of the Lucene Query Parser syntax. Which links to http://lucene.apache.org/core/3_6_0/queryparsersyntax.html Note - Based on the info on these pages I believe the + symbol is to be placed just before the mandatory value, not before the field name in the query. Matt Kuiper Intelligent Software Solutions -Original Message- From: G.Long [mailto:jde...@gmail.com] Sent: Wednesday, May 09, 2012 10:45 AM To: solr-user@lucene.apache.org Subject: Solr query with mandatory values Hi :) I remember that in a Lucene query, there is something like mandatory values. I just have to add a + symbol in front of the mandatory parameter, like: +myField:my value I was wondering if there was something similar in Solr queries? Or is this behaviour activated by default? Gary
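A couple of concrete query examples of the points above (the field names are made up):

  # '+' marks the whole clause (field:value) as mandatory
  q=+title:solr +category:search

  # a space inside a value must be escaped, otherwise "world" is searched in the default field
  q=+title:hello\ world

  # quoting instead turns it into a phrase query on the analyzed field
  q=+title:"hello world"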
Re: SOLRJ: Is there a way to obtain a quick count of total results for a query
Not scoring by relevance and sorting by document id instead may speed it up a little; I haven't tested this, but maybe you can give it a try, because scoring consumes some CPU time and you just want to match and get the total count. On Wed, May 2, 2012 at 11:58 PM, vybe3142 vybe3...@gmail.com wrote: I can achieve this by building a query with start and rows = 0, and using queryResponse.getResults().getNumFound(). Are there any more efficient approaches to this? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-Is-there-a-way-to-obtain-a-quick-count-of-total-results-for-a-query-tp3955322.html Sent from the Solr - User mailing list archive at Nabble.com.
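For reference, the rows=0 approach mentioned in the quoted message looks roughly like this in SolrJ (the server URL and query are placeholders):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;

  public class CountOnly {
      public static void main(String[] args) throws Exception {
          HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
          SolrQuery q = new SolrQuery("category:books");
          q.setRows(0);  // return no documents, just the match count
          long numFound = server.query(q).getResults().getNumFound();
          System.out.println("matches: " + numFound);
      }
  }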
Re: Sorting result first which come first in sentance
For versions below 4.0 it's not possible, because of Lucene's scoring model. Position information is stored, but it is only used to support phrase queries: it just tells us whether a document is matched, but we can't use it to boost a document. A similar problem is how to implement proximity boosting: for 2 search terms we need to return all docs that contain the 2 terms, but if they occur as a phrase we give the doc the largest boost, if there is one word between them a smaller one, and if there are 2 words between them a smaller score still. All such ranking algorithms need a more flexible scoring model; I don't know whether the latest trunk takes this into consideration. On Fri, May 4, 2012 at 3:43 AM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi all, I need a suggestion: I have many titles like: 1 bomb blast in kabul 2 kabul bomb blast 3 3 people killed in serial bomb blast in kabul I want the 2nd result to come first when a user searches for kabul, because kabul is in the 1st position in that sentence. Similarly the 1st result should come 2nd and the 3rd should come last. Please suggest how to implement this. Regards, Jonty
Re: Sorting result first which come first in sentance
for this version, you may consider using payload for position boost. you can save boost values in payload. I have used it in lucene api where anchor text should weigh more than normal text. but I haven't used it in solr. some searched urls: http://wiki.apache.org/solr/Payloads http://digitalpebble.blogspot.com/2010/08/using-payloads-with-dismaxqparser-in.html On Fri, May 4, 2012 at 9:51 AM, Jonty Rhods jonty.rh...@gmail.com wrote: I am using solr version 3.4
Re: get latest 50 documents the fastest way
you should reverse your sort algorithm. maybe you can override the tf method of Similarity and return -1.0f * tf(). (I don't know whether default collector allow score smaller than zero) Or you can hack this by add a large number or write your own collector, in its collect(int doc) method, you can do like this: collect(int doc){ float score=scorer.score(); score*=-1.0f; } if you don't sort by relevant score, just set Sort On Tue, May 1, 2012 at 10:38 PM, Yuval Dotan yuvaldo...@gmail.com wrote: Hi Guys We have a use case where we need to get the 50 *latest *documents that match my query - without additional ranking,sorting,etc on the results. My index contains 1,000,000,000 documents and i noticed that if the number of found documents is very big (larger than 50% of the index size - 500,000,000 docs) than it takes more than 5 seconds to get the results even with rows=50 parameter. Is there a way to get the results faster? Thanks Yuval
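If the index has an indexed time field, the simplest reading of the "just set Sort" suggestion is an ordinary sorted query with no relevance ranking involved; the field name timestamp below is an assumption about the schema, not something from the thread:

  # newest 50 matching docs, ordered purely by the time field
  q=your_query&sort=timestamp desc&rows=50&fl=id,timestamp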
question about NRT(soft commit) and Transaction Log in trunk
hi I checked out the trunk and played with its new soft commit feature. It's cool, but I've got a few questions about it. From reading some introductory articles and the wiki, plus a hasty reading of the code, my understanding of the implementation is: for a normal (hard) commit, we flush everything to disk and commit it. The flush itself is not very time consuming because of OS-level caching; the most time-consuming part is the sync in the commit process. A soft commit just flushes postings and pending deletions to disk and generates new segments; then Solr can use a new searcher to read the latest indexes, warm up, and register itself. If there is no hard commit and the JVM crashes, the new data may be lost. If my understanding is correct, then why do we need the transaction log? I found that in DirectUpdateHandler2, every time a command is executed, TransactionLog records a line in the log. But the default sync level in RunUpdateProcessorFactory is flush, which means it will not sync the log file. Does this make sense? In database implementations we usually write the log and modify data in memory, because the log is smaller than the real data; if the process crashes, we can redo the unfinished log and make the data correct. Will Solr leverage this log like that? If it does, why is the log not synced?
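For context, the soft-commit feature being discussed is typically driven from solrconfig.xml on trunk/4.x with something like the following (the interval values are arbitrary examples, not recommendations from this thread):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- durable hard commit: flush + fsync, data survives a JVM crash -->
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- cheap soft commit: makes new documents visible to searchers quickly -->
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
    <!-- the transaction log asked about above -->
    <updateLog>
      <str name="dir">${solr.data.dir:}</str>
    </updateLog>
  </updateHandler>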
Re: Solr Scoring
another way is to use payload http://wiki.apache.org/solr/Payloads the advantage of payload is that you only need one field and can make frq file smaller than use two fields. but the disadvantage is payload is stored in prx file, so I am not sure which one is fast. maybe you can try them both. On Fri, Apr 13, 2012 at 8:04 AM, Erick Erickson erickerick...@gmail.comwrote: GAH! I had my head in make this happen in one field when I wrote my response, without being explicit. Of course Walter's solution is pretty much the standard way to deal with this. Best Erick On Thu, Apr 12, 2012 at 5:38 PM, Walter Underwood wun...@wunderwood.org wrote: It is easy. Create two fields, text_exact and text_stem. Don't use the stemmer in the first chain, do use the stemmer in the second. Give the text_exact a bigger weight than text_stem. wunder On Apr 12, 2012, at 4:34 PM, Erick Erickson wrote: No, I don't think there's an OOB way to make this happen. It's a recurring theme, make exact matches score higher than stemmed matches. Best Erick On Thu, Apr 12, 2012 at 5:18 AM, Kissue Kissue kissue...@gmail.com wrote: Hi, I have a field in my index called itemDesc which i am applying EnglishMinimalStemFilterFactory to. So if i index a value to this field containing Edges, the EnglishMinimalStemFilterFactory applies stemming and Edges becomes Edge. Now when i search for Edges, documents with Edge score better than documents with the actual search word - Edges. Is there a way i can make documents with the actual search word in this case Edges score better than document with Edge? I am using Solr 3.5. My field definition is shown below: fieldType name=text_en class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords_en.txt enablePositionIncrements=true filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPossessiveFilterFactory/ filter class=solr.EnglishMinimalStemFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords_en.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPossessiveFilterFactory/ filter class=solr.KeywordMarkerFilterFactory protected=protwords.txt/ filter class=solr.EnglishMinimalStemFilterFactory/ /analyzer /fieldType Thanks.
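To make the two-field suggestion quoted above concrete, a typical (e)dismax setup weights the unstemmed field higher; the field names echo the thread's itemDesc field, but the _exact/_stem names and boosts are illustrative only:

  <!-- schema.xml: copy the same source text into an exact (unstemmed) and a stemmed field -->
  <copyField source="itemDesc" dest="itemDesc_exact"/>
  <copyField source="itemDesc" dest="itemDesc_stem"/>

  <!-- query side: prefer exact-form matches over stemmed ones -->
  q=Edges&defType=edismax&qf=itemDesc_exact^4 itemDesc_stem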
Re: How to read SOLR cache statistics?
http://wiki.apache.org/solr/SolrCaching On Fri, Apr 13, 2012 at 2:30 PM, Kashif Khan uplink2...@gmail.com wrote: Does anyone explain what does the following parameters mean in SOLR cache statistics? *name*: queryResultCache *class*: org.apache.solr.search.LRUCache *version*: 1.0 *description*: LRU Cache(maxSize=512, initialSize=512) *stats*: lookups : 98 *hits *: 59 *hitratio *: 0.60 *inserts *: 41 *evictions *: 0 *size *: 41 *warmupTime *: 0 *cumulative_lookups *: 98 *cumulative_hits *: 59 *cumulative_hitratio *: 0.60 *cumulative_inserts *: 39 *cumulative_evictions *: 0 AND also this *name*: fieldValueCache *class*: org.apache.solr.search.FastLRUCache *version*: 1.0 *description*: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false) *stats*: *lookups *: 8 *hits *: 4 *hitratio *: 0.50 *inserts *: 2 *evictions *: 0 *size *: 2 *warmupTime *: 0 *cumulative_lookups *: 8 *cumulative_hits *: 4 *cumulative_hitratio *: 0.50 *cumulative_inserts *: 2 *cumulative_evictions *: 0 *item_ABC *: {field=ABC,memSize=340592,tindexSize=1192,time=1360,phase1=1344,nTerms=7373,bigTerms=1,termInstances=11513,uses=4} *item_BCD *: {field=BCD,memSize=341248,tindexSize=1952,time=1688,phase1=1688,nTerms=8075,bigTerms=0,termInstances=13510,uses=2} Without understanding these terms i cannot configure server for better cache usage. The point is searches are very slow. These stats were taken when server was down and restarted. I just want to understand what these terms mean actually -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-read-SOLR-cache-statistics-tp3907294p3907294.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: using solr to do a 'match'
It's not possible now because Lucene doesn't support this. When executing a disjunction query, it only records how many terms matched each document. I think this is a common requirement for many users. I suggest Lucene should split the scorer into a matcher and a scorer: the matcher just returns which docs matched and why/how each doc matched. Especially for a disjunction query it should tell which terms matched, and possibly other information such as tf/idf and the distance between terms (to support proximity search); that's the matcher's job. Then the scorer (a ranking algorithm) uses a flexible algorithm to score the document, and the collector can collect it. On Wed, Apr 11, 2012 at 10:28 AM, Chris Book chrisb...@gmail.com wrote: Hello, I have a solr index running that is working very well as a search. But I want to add the ability (if possible) to use it to do matching. The problem is that by default it is only looking for all the input terms to be present, and it doesn't give me any indication as to how many terms in the target field were not specified by the input. For example, if I'm trying to match to the song title dust in the wind, I'm correctly getting a match if the input query is dust in wind. But I don't want to get a match if the input is just dust. Although as a search dust should return this result, I'm looking for some way to filter this out based on some indication that the input isn't close enough to the output. Perhaps if I could get information that the number of input terms is much less than the number of terms in the field. Or something else along those lines? I realize that this isn't the typical use case for a search, but I'm just looking for some suggestions as to how I could improve the above example a bit. Thanks, Chris
Re: using solr to do a 'match'
I searched my mail but nothing found. the thread searched by key words boolean expression is Indexing Boolean Expressions from joaquin.delgado to tell which terms are matched, for BooleanScorer2, a simple method is to modify DisjunctionSumScorer and add a BitSet to record matched scorers. When collector collect this document, it can get the scorer and recursively find the matched terms. But I think maybe it's better to add a component maybe named matcher that do the matching job, and scorer use the information from the matcher and do ranking things. On Wed, Apr 11, 2012 at 4:32 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hi, This use case is similar to matching boolean expression problem. You can find recent thread about it. I have an idea that we can introduce disjunction query with dynamic mm (minShouldMatch parameter http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/BooleanQuery.html#setMinimumNumberShouldMatch(int) ) i.e. 'match these clauses disjunctively but for every document use value from field cache of field xxxCount as a minShouldMatch parameter'. Also norms can be used as a source for dynamics mm values. Wdyt? On Wed, Apr 11, 2012 at 10:08 AM, Li Li fancye...@gmail.com wrote: it's not possible now because lucene don't support this. when doing disjunction query, it only record how many terms match this document. I think this is a common requirement for many users. I suggest lucene should divide scorer to a matcher and a scorer. the matcher just return which doc is matched and why/how the doc is matched. especially for disjuction query, it should tell which term matches and possible other information such as tf/idf and the distance of terms(to support proximity search). That's the matcher's job. and then the scorer(a ranking algorithm) use flexible algorithm to score this document and the collector can collect it. On Wed, Apr 11, 2012 at 10:28 AM, Chris Book chrisb...@gmail.com wrote: Hello, I have a solr index running that is working very well as a search. But I want to add the ability (if possible) to use it to do matching. The problem is that by default it is only looking for all the input terms to be present, and it doesn't give me any indication as to how many terms in the target field were not specified by the input. For example, if I'm trying to match to the song title dust in the wind, I'm correctly getting a match if the input query is dust in wind. But I don't want to get a match if the input is just dust. Although as a search dust should return this result, I'm looking for some way to filter this out based on some indication that the input isn't close enough to the output. Perhaps if I could get information that that the number of input terms is much less than the number of terms in the field. Or something else along those line? I realize that this isn't the typical use case for a search, but I'm just looking for some suggestions as to how I could improve the above example a bit. Thanks, Chris -- Sincerely yours Mikhail Khludnev ge...@yandex.ru http://www.griddynamics.com mkhlud...@griddynamics.com
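A hedged sketch of the static variant Mikhail mentions, using BooleanQuery.setMinimumNumberShouldMatch in Lucene 3.x; the field name and the 75% threshold are assumptions for illustration. It does not give per-document dynamic mm, but it already filters out matches like "dust" alone:

// imports: org.apache.lucene.index.Term, org.apache.lucene.search.{BooleanClause, BooleanQuery, TermQuery}
BooleanQuery bq = new BooleanQuery();
String[] terms = {"dust", "in", "the", "wind"};
for (String t : terms) {
  bq.add(new TermQuery(new Term("title", t)), BooleanClause.Occur.SHOULD);
}
// require roughly 75% of the query terms to be present (3 of 4 here)
bq.setMinimumNumberShouldMatch((int) Math.ceil(terms.length * 0.75));

In Solr the equivalent request-time knob is the dismax/edismax mm parameter, e.g. mm=75%.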
Re: pagerank??
To my knowledge, Solr cannot support this directly. In my case, I retrieve data from Solr by keyword matching and then rank that data by PageRank afterwards. Thanks, Bing On Wed, Apr 4, 2012 at 6:37 AM, Manuel Antonio Novoa Proenza mano...@estudiantes.uci.cu wrote: Hello, I have many indexed documents in my Solr index. Is there any way, or an efficient function, to calculate the PageRank of the indexed websites?
Re: Trouble Setting Up Development Environment
. Runtime ClassNotFoundExceptions may result. solr3_5P/solr3_5Classpath Dependency Validator Message Classpath entry /solr3_5/ssrc/solr/lib/guava-r05.jar will not be exported or published. Runtime ClassNotFoundExceptions may result. solr3_5 P/solr3_5Classpath Dependency Validator Message Classpath entry /solr3_5/ssrc/solr/lib/jcl-over-slf4j-1.6.1.jar will not be exported or published. Runtime ClassNotFoundExceptions may result. solr3_5P/solr3_5Classpath Dependency Validator Message Classpath entry /solr3_5/ssrc/solr/lib/junit-4.7.jar will not be exported or published. Runtime ClassNotFoundExceptions may result. solr3_5 P/solr3_5Classpath Dependency Validator Message Classpath entry /solr3_5/ssrc/solr/lib/servlet-api-2.4.jar will not be exported or published. Runtime ClassNotFoundExceptions may result. solr3_5P/solr3_5Classpath Dependency Validator Message Classpath entry /solr3_5/ssrc/solr/lib/slf4j-api-1.6.1.jar will not be exported or published. Runtime ClassNotFoundExceptions may result. solr3_5P/solr3_5Classpath Dependency Validator Message Classpath entry /solr3_5/ssrc/solr/lib/slf4j-jdk14-1.6.1.jar will not be exported or published. Runtime ClassNotFoundExceptions may result. solr3_5P/solr3_5Classpath Dependency Validator Message Classpath entry /solr3_5/ssrc/solr/lib/wstx-asl-3.2.7.jar will not be exported or published. Runtime ClassNotFoundExceptions may result. solr3_5P/solr3_5Classpath Dependency Validator Message On Fri, Mar 23, 2012 at 3:25 AM, Li Li fancye...@gmail.com wrote: here is my method. 1. check out latest source codes from trunk or download tar ball svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunklucene_trunk 2. create a dynamic web project in eclipse and close it. for example, I create a project name lucene-solr-trunk in my workspace. 3. copy/mv the source code to this project(it's not necessary) here is my directory structure lili@lili-desktop:~/workspace/lucene-solr-trunk$ ls bin.tests-framework build lucene_trunk src testindex WebContent lucene_trunk is the top directory checked out from svn in step 1. 4. remove WebContent generated by eclipse and modify it to a soft link to lili@lili-desktop:~/workspace/lucene-solr-trunk$ ll WebContent lrwxrwxrwx 1 lili lili 28 2011-08-18 18:50 WebContent - lucene_trunk/solr/webapp/web/ 5. open lucene_trunk/dev-tools/eclipse/dot.classpath. copy all lines like kind=src to a temp file classpathentry kind=src path=lucene/core/src/java/ classpathentry kind=src path=lucene/core/src/resources/ 6. replace all string like path=xxx to path=lucene_trunk/xxx and copy them into .classpath file 7. mkdir WebContent/WEB-INF/lib 8. extract all jar file in dot.classpath to WebContent/WEB-INF/lib I use this command: lili@lili-desktop:~/workspace/lucene-solr-trunk/lucene_trunk$ cat dev-tools/eclipse/dot.classpath |grep kind=\lib|awk -F path=\ '{print $2}' |awk -F \/ '{print $1}' |xargs cp ../WebContent/WEB-INF/lib/ 9. open this project and refresh it. if everything is ok, it will compile all java files successfully. if there is something wrong, Probably we don't use the correct jar. because there are many versions of the same library. 10. right click the project - debug As - debug on Server it will fail because no solr home is specified. 11. right click the project - debug As - debug Configuration - Arguments Tab - VM arguments add -Dsolr.solr.home=/home/lili/workspace/lucene-solr-trunk/lucene_trunk/solr/example/solr you can also add other vm arguments like -Xmx1g here. 12. all fine, add a break point at SolrDispatchFilter.doFilter(). 
all solr request comes here 13. have fun~ On Fri, Mar 23, 2012 at 11:49 AM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: Hi Solr Ppl, I have been trying to set up solr dev env. I downloaded the tar ball of eclipse and the solr 3.5 source. Here are the exact sequence of steps I followed I extracted the solr 3.5 source and eclipse. I installed run-jetty-run plugin for eclipse. I ran ant eclipse in the solr 3.5 source directory I used eclipse's Open existing project option to open up the files in solr 3.5 directory. I got a huge tree in the name of lucene_solr. I run it and there is a SEVERE error: System property not set excetption. * solr*.test.sys.*prop1* not set and then the jetty loads solr. I then try localhost:8080/solr/select/ I get null pointer execpiton. I am only able to access admin page. Is there anything else I need to do? I tried to follow http://www.lucidimagination.com/devzone/technical-articles/setting-apache-solr-eclipse . But I dont find the solr-3.5.war file. I tried ant dist to generate the dist folder but that has many jars and wars.. I am able to compile the source
Re: Trouble Setting Up Development Environment
here is my method. 1. check out latest source codes from trunk or download tar ball svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunklucene_trunk 2. create a dynamic web project in eclipse and close it. for example, I create a project name lucene-solr-trunk in my workspace. 3. copy/mv the source code to this project(it's not necessary) here is my directory structure lili@lili-desktop:~/workspace/lucene-solr-trunk$ ls bin.tests-framework build lucene_trunk src testindex WebContent lucene_trunk is the top directory checked out from svn in step 1. 4. remove WebContent generated by eclipse and modify it to a soft link to lili@lili-desktop:~/workspace/lucene-solr-trunk$ ll WebContent lrwxrwxrwx 1 lili lili 28 2011-08-18 18:50 WebContent - lucene_trunk/solr/webapp/web/ 5. open lucene_trunk/dev-tools/eclipse/dot.classpath. copy all lines like kind=src to a temp file classpathentry kind=src path=lucene/core/src/java/ classpathentry kind=src path=lucene/core/src/resources/ 6. replace all string like path=xxx to path=lucene_trunk/xxx and copy them into .classpath file 7. mkdir WebContent/WEB-INF/lib 8. extract all jar file in dot.classpath to WebContent/WEB-INF/lib I use this command: lili@lili-desktop:~/workspace/lucene-solr-trunk/lucene_trunk$ cat dev-tools/eclipse/dot.classpath |grep kind=\lib|awk -F path=\ '{print $2}' |awk -F \/ '{print $1}' |xargs cp ../WebContent/WEB-INF/lib/ 9. open this project and refresh it. if everything is ok, it will compile all java files successfully. if there is something wrong, Probably we don't use the correct jar. because there are many versions of the same library. 10. right click the project - debug As - debug on Server it will fail because no solr home is specified. 11. right click the project - debug As - debug Configuration - Arguments Tab - VM arguments add -Dsolr.solr.home=/home/lili/workspace/lucene-solr-trunk/lucene_trunk/solr/example/solr you can also add other vm arguments like -Xmx1g here. 12. all fine, add a break point at SolrDispatchFilter.doFilter(). all solr request comes here 13. have fun~ On Fri, Mar 23, 2012 at 11:49 AM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: Hi Solr Ppl, I have been trying to set up solr dev env. I downloaded the tar ball of eclipse and the solr 3.5 source. Here are the exact sequence of steps I followed I extracted the solr 3.5 source and eclipse. I installed run-jetty-run plugin for eclipse. I ran ant eclipse in the solr 3.5 source directory I used eclipse's Open existing project option to open up the files in solr 3.5 directory. I got a huge tree in the name of lucene_solr. I run it and there is a SEVERE error: System property not set excetption. * solr*.test.sys.*prop1* not set and then the jetty loads solr. I then try localhost:8080/solr/select/ I get null pointer execpiton. I am only able to access admin page. Is there anything else I need to do? I tried to follow http://www.lucidimagination.com/devzone/technical-articles/setting-apache-solr-eclipse . But I dont find the solr-3.5.war file. I tried ant dist to generate the dist folder but that has many jars and wars.. I am able to compile the source with ant compile, get the solr in example directory up and running. Will be great if someone can help me with this. Thanks, Karthick
Re: How to avoid the unexpected character error?
It's not the right place. When you use java -Durl=http://... -jar post.jar data.xml, the data.xml file must be a valid XML file, so you should escape special characters in this file. I don't know how you generate the file. If you use a Java program (or another script) to generate it, you should use XML tools to write it. But if you generate it like this: StringBuilder buf = new StringBuilder(); buf.append("<add>"); buf.append("<doc>"); buf.append("<field name=\"fname\">text content</field>"); then you must escape special characters yourself. If you use Java, you can make use of the org.apache.solr.common.util.XML class. On Fri, Mar 16, 2012 at 2:03 PM, neosky neosk...@yahoo.com wrote: I am sorry, but I can't get what you mean. I tried the HTMLStripCharFilter and PatternReplaceCharFilter. It doesn't work. Could you give me an example? Thanks! <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100"> <analyzer> <charFilter class="solr.HTMLStripCharFilterFactory"/> <tokenizer class="solr.WhitespaceTokenizerFactory"/> </analyzer> </fieldType> I also tried: <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([^a-z])" replacement="" maxBlockChars="1" blockDelimiters="|"/> -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3831064.html Sent from the Solr - User mailing list archive at Nabble.com.
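A small sketch of the escaping Li Li describes, using Solr's own org.apache.solr.common.util.XML helper when building the posted XML by hand; the field name and value are placeholders:

import java.io.IOException;
import java.io.StringWriter;
import org.apache.solr.common.util.XML;

static String buildAddXml(String rawValue) throws IOException {
  StringWriter out = new StringWriter();
  out.write("<add><doc><field name=\"fname\">");
  XML.escapeCharData(rawValue, out);   // writes the value with characters such as & and < escaped
  out.write("</field></doc></add>");
  return out.toString();
}

The returned string is then safe to save as data.xml and post with post.jar.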
Re: Solr out of memory exception
it seems you are using 64bit jvm(32bit jvm can only allocate about 1.5GB). you should enable pointer compression by -XX:+UseCompressedOops On Thu, Mar 15, 2012 at 1:58 PM, Husain, Yavar yhus...@firstam.com wrote: Thanks for helping me out. I have allocated Xms-2.0GB Xmx-2.0GB However i see Tomcat is still using pretty less memory and not 2.0G Total Memory on my Windows Machine = 4GB. With smaller index size it is working perfectly fine. I was thinking of increasing the system RAM tomcat heap space allocated but then how come on a different server with exactly same system and solr configuration memory it is working fine? -Original Message- From: Li Li [mailto:fancye...@gmail.com] Sent: Thursday, March 15, 2012 11:11 AM To: solr-user@lucene.apache.org Subject: Re: Solr out of memory exception how many memory are allocated to JVM? On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar yhus...@firstam.com wrote: Solr is giving out of memory exception. Full Indexing was completed fine. Later while searching maybe when it tries to load the results in memory it starts giving this exception. Though with the same memory allocated to Tomcat and exactly same solr replica on another server it is working perfectly fine. I am working on 64 bit software's including Java Tomcat on Windows. Any help would be appreciated. Here are the logs: The server encountered an internal error (Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: abortOnConfigurationErrorfalse/abortOnConfigurationError in null - java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.init(SolrCore.java:579) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601) at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943) at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065) at org.apache.catalina.core.StandardHost.start(StandardHost.java:840) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) at org.apache.catalina.core.StandardService.start(StandardService.java:525) at 
org.apache.catalina.core.StandardServer.start(StandardServer.java:754) at org.apache.catalina.startup.Catalina.start(Catalina.java:595) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:180) at org.apache.lucene.index.TermInfosReader.init(TermInfosReader.java:91) at org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:122) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:613) at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:104) at org.apache.lucene.index.ReadOnlyDirectoryReader.init(ReadOnlyDirectoryReader.java:27) at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java
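For Tomcat on Windows, the heap size and the pointer-compression flag from this thread would typically be set through CATALINA_OPTS (or the Tomcat service's Java options); the values below just reflect the 2 GB heap mentioned above, not a recommendation:

set CATALINA_OPTS=-Xms2048m -Xmx2048m -XX:+UseCompressedOops

Note that on 64-bit HotSpot JVMs from roughly 6u23 onward compressed oops is already enabled by default for heaps of this size, so the flag mainly matters on older JVMs.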
Re: Solr out of memory exception
it can reduce memory usage. for small heap application less than 4GB, it may speed up. but be careful, for large heap application, it depends. you should do some test for yourself. our application's test result is: it reduce memory usage but enlarge response time. we use 25GB memory. http://lists.apple.com/archives/java-dev/2010/Apr/msg00157.html Dyer, James james.d...@ingrambook.com viahttp://support.google.com/mail/bin/answer.py?hl=enctx=mailanswer=1311182 lucene.apache.org 3/18/11 to solr-user Our tests showed, in our situation, the compressed oops flag caused our minor (ParNew) generation time to decrease significantly. We're using a larger heap (22gb) and our index size is somewhere in the 40's gb total. I guess with any of these jvm parameters, it all depends on your situation and you need to test. In our case, this flag solved a real problem we were having. Whoever wrote the JRocket book you refer to no doubt had other scenarios in mind... On Thu, Mar 15, 2012 at 3:02 PM, C.Yunqin 345804...@qq.com wrote: why should enable pointer compression? -- Original -- From: Li Lifancye...@gmail.com; Date: Thu, Mar 15, 2012 02:41 PM To: Husain, Yavaryhus...@firstam.com; Cc: solr-user@lucene.apache.orgsolr-user@lucene.apache.org; Subject: Re: Solr out of memory exception it seems you are using 64bit jvm(32bit jvm can only allocate about 1.5GB). you should enable pointer compression by -XX:+UseCompressedOops On Thu, Mar 15, 2012 at 1:58 PM, Husain, Yavar yhus...@firstam.com wrote: Thanks for helping me out. I have allocated Xms-2.0GB Xmx-2.0GB However i see Tomcat is still using pretty less memory and not 2.0G Total Memory on my Windows Machine = 4GB. With smaller index size it is working perfectly fine. I was thinking of increasing the system RAM tomcat heap space allocated but then how come on a different server with exactly same system and solr configuration memory it is working fine? -Original Message- From: Li Li [mailto:fancye...@gmail.com] Sent: Thursday, March 15, 2012 11:11 AM To: solr-user@lucene.apache.org Subject: Re: Solr out of memory exception how many memory are allocated to JVM? On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar yhus...@firstam.com wrote: Solr is giving out of memory exception. Full Indexing was completed fine. Later while searching maybe when it tries to load the results in memory it starts giving this exception. Though with the same memory allocated to Tomcat and exactly same solr replica on another server it is working perfectly fine. I am working on 64 bit software's including Java Tomcat on Windows. Any help would be appreciated. Here are the logs: The server encountered an internal error (Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. 
If you want solr to continue after configuration errors, change: abortOnConfigurationErrorfalse/abortOnConfigurationError in null - java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.init(SolrCore.java:579) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601) at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943) at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065) at org.apache.catalina.core.StandardHost.start(StandardHost.java:840) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057
Re: Sorting on non-stored field
The field should be indexed but not analyzed; it doesn't need to be stored. Reading field values from stored fields is extremely slow, so Lucene uses the FieldCache StringIndex to read the values for sorting. So if you want to sort by some field, index it and don't analyze it. On Wed, Mar 14, 2012 at 6:43 PM, Finotti Simone tech...@yoox.com wrote: I was wondering: is it possible to sort a Solr result-set on a non-stored value? Thank you
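A sketch of what that looks like in practice: keep the analyzed field for searching and add an unanalyzed, indexed, non-stored twin purely for sorting. Field names here are illustrative:

<!-- schema.xml -->
<field name="title"      type="text_en" indexed="true" stored="true"/>
<field name="title_sort" type="string"  indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>

Then sort on the twin at query time, e.g. &sort=title_sort asc. Sorting loads the values through the FieldCache, which only needs the field to be indexed and to produce at most one token per document; the string type satisfies both.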
Re: How to avoid the unexpected character error?
There is a class org.apache.solr.common.util.XML in Solr; you can use this wrapper: public static String escapeXml(String s) throws IOException { StringWriter sw = new StringWriter(); XML.escapeCharData(s, sw); return sw.getBuffer().toString(); } On Wed, Mar 14, 2012 at 4:34 PM, neosky neosk...@yahoo.com wrote: I use XML to index the data. One field might contain some characters like '' = and it seems that produces the error. I changed that field to not be indexed, but it doesn't work. I need to store the field; it does not have to be indexed. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3824726.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to avoid the unexpected character error?
No, it has nothing to do with schema.xml. post.jar just posts a file; it doesn't parse it. Solr will use an XML parser to parse the file, and if you don't escape special characters it is not valid XML and Solr will throw exceptions. On Thu, Mar 15, 2012 at 12:33 AM, neosky neosk...@yahoo.com wrote: Thanks! Does schema.xml support this parameter? I am using the example post.jar to index my file. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3825959.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr out of memory exception
how many memory are allocated to JVM? On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar yhus...@firstam.com wrote: Solr is giving out of memory exception. Full Indexing was completed fine. Later while searching maybe when it tries to load the results in memory it starts giving this exception. Though with the same memory allocated to Tomcat and exactly same solr replica on another server it is working perfectly fine. I am working on 64 bit software's including Java Tomcat on Windows. Any help would be appreciated. Here are the logs: The server encountered an internal error (Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: abortOnConfigurationErrorfalse/abortOnConfigurationError in null - java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.core.SolrCore.init(SolrCore.java:579) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601) at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943) at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778) at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504) at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324) at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065) at org.apache.catalina.core.StandardHost.start(StandardHost.java:840) at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057) at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) at org.apache.catalina.core.StandardService.start(StandardService.java:525) at org.apache.catalina.core.StandardServer.start(StandardServer.java:754) at org.apache.catalina.startup.Catalina.start(Catalina.java:595) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:180) at org.apache.lucene.index.TermInfosReader.init(TermInfosReader.java:91) at org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:122) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652) 
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:613) at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:104) at org.apache.lucene.index.ReadOnlyDirectoryReader.init(ReadOnlyDirectoryReader.java:27) at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) at org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at org.apache.lucene.index.IndexReader.open(IndexReader.java:403) at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057) at org.apache.solr.core.SolrCore.init(SolrCore.java:579) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115) at
Re: index size with replication
Optimize will generate new segments and delete the old ones. If your master also serves searches during indexing, the old files may still be held open by an old SolrIndexSearcher; they will be deleted later. So while indexing the index size may double, but a moment later the old segments are deleted. On Wed, Mar 14, 2012 at 7:06 AM, Mike Austin mike.aus...@juggle.com wrote: I have a master with two slaves. For some reason, if I do an optimize on the master after indexing, it doubles in size from 42 MB to 90 MB; however, when the slaves replicate they get the 42 MB index. Should the master and slaves always be the same size? Thanks, Mike
Re: How to limit the number of open searchers?
What do you mean by programmatically? Modifying Solr's code? Solr is not like Lucene: it only provides HTTP interfaces to its users rather than a Java API. If you want to modify Solr, the relevant code is in SolrCore: private final LinkedList<RefCounted<SolrIndexSearcher>> _searchers = new LinkedList<RefCounted<SolrIndexSearcher>>(); and _searcher is the current searcher. Be careful to use searcherLock to synchronize your code, for example: synchronized (searcherLock) { if (_searchers.size() == 1) { ... } } On Tue, Mar 6, 2012 at 3:18 AM, Michael Ryan mr...@moreover.com wrote: Is there a way to limit the number of searchers that can be open at a given time? I know there is a maxWarmingSearchers configuration that limits the number of warming searchers, but that's not quite what I'm looking for... Ideally, when I commit, I want there to only be one searcher open before the commit, so that during the commit and warming, there is a max of two searchers open. I'd be okay with delaying the commit until there is only one searcher open. Is there a way to programmatically determine how many searchers are currently open? -Michael
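If patching SolrCore is too invasive, the stock solrconfig.xml knobs at least bound the warming side of the problem; these are the standard elements of the <query> section, values illustrative:

<!-- solrconfig.xml, inside <query> -->
<maxWarmingSearchers>1</maxWarmingSearchers> <!-- a commit that would exceed this fails instead of piling up searchers -->
<useColdSearcher>false</useColdSearcher>     <!-- requests wait for a warmed searcher rather than using an unwarmed one -->

That caps things at the live searcher plus one warming searcher, but it rejects the extra commit rather than delaying it, so it only partly covers what Michael is asking for.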
Re: Fw:how to make fdx file
Lucene never modifies old segment files; it just flushes into a new segment or merges old segments into a new one, and after merging the old segments are deleted. Once a file (such as fdt and fdx) is generated, it is never re-generated. The only possibilities are that something went wrong in the generation stage, or that it was deleted by another program or removed by mistake. On Sat, Mar 3, 2012 at 2:33 PM, C.Yunqin 345804...@qq.com wrote: Yes, the fdt file is still there. Can I make a new fdx file from the fdt file? Is there a possibility that during the process of updating and optimizing, the index will be deleted and then re-generated? -- Original -- From: Erick Erickson erickerick...@gmail.com; Date: Sat, Mar 3, 2012 08:28 AM To: solr-user solr-user@lucene.apache.org; Subject: Re: Fw:how to make fdx file As far as I know, fdx files don't just disappear, so I can only assume that something external removed it. That said, if you somehow re-indexed and had no fields where stored=true, then the fdx file may not be there. Are you seeing problems as a result? This file is used to store index information for stored fields. Do you have an fdt file? Best Erick On Fri, Mar 2, 2012 at 2:48 AM, C.Yunqin 345804...@qq.com wrote: Hi, my fdx file unexpectedly disappeared and then the Solr server stopped running; what can I do to recover Solr? The other files still exist. Thanks very much
Re: Solr HBase - Re: How is Data Indexed in HBase?
Dear Mr Gupta, Your understanding about my solution is correct. Now both HBase and Solr are used in my system. I hope it could work. Thanks so much for your reply! Best regards, Bing On Fri, Feb 24, 2012 at 3:30 AM, T Vinod Gupta tvi...@readypulse.comwrote: regarding your question on hbase support for high performance and consistency - i would say hbase is highly scalable and performant. how it does what it does can be understood by reading relevant chapters around architecture and design in the hbase book. with regards to ranking, i see your problem. but if you split the problem into hbase specific solution and solr based solution, you can achieve the results probably. may be you do the ranking and store the rank in hbase and then use solr to get the results and then use hbase as a lookup to get the rank. or you can put the rank as part of the document schema and index the rank too for range queries and such. is my understanding of your scenario wrong? thanks On Wed, Feb 22, 2012 at 9:51 AM, Bing Li lbl...@gmail.com wrote: Mr Gupta, Thanks so much for your reply! In my use cases, retrieving data by keyword is one of them. I think Solr is a proper choice. However, Solr does not provide a complex enough support to rank. And, frequent updating is also not suitable in Solr. So it is difficult to retrieve data randomly based on the values other than keyword frequency in text. In this case, I attempt to use HBase. But I don't know how HBase support high performance when it needs to keep consistency in a large scale distributed system. Now both of them are used in my system. I will check out ElasticSearch. Best regards, Bing On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta tvi...@readypulse.comwrote: Bing, Its a classic battle on whether to use solr or hbase or a combination of both. both systems are very different but there is some overlap in the utility. they also differ vastly when it compares to computation power, storage needs, etc. so in the end, it all boils down to your use case. you need to pick the technology that it best suited to your needs. im still not clear on your use case though. btw, if you haven't started using solr yet - then you might want to checkout ElasticSearch. I spent over a week researching between solr and ES and eventually chose ES due to its cool merits. thanks On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu yuzhih...@gmail.com wrote: There is no secondary index support in HBase at the moment. It's on our road map. FYI On Wed, Feb 22, 2012 at 9:28 AM, Bing Li lbl...@gmail.com wrote: Jacques, Yes. But I still have questions about that. In my system, when users search with a keyword arbitrarily, the query is forwarded to Solr. No any updating operations but appending new indexes exist in Solr managed data. When I need to retrieve data based on ranking values, HBase is used. And, the ranking values need to be updated all the time. Is that correct? My question is that the performance must be low if keeping consistency in a large scale distributed environment. How does HBase handle this issue? Thanks so much! Bing On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote: It is highly unlikely that you could replace Solr with HBase. They're really apples and oranges. On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote: Dear all, I wonder how data in HBase is indexed? Now Solr is used in my system because data is managed in inverted index. Such an index is suitable to retrieve unstructured and huge amount of data. How does HBase deal with the issue? 
May I replaced Solr with HBase? Thanks so much! Best regards, Bing
How is Data Indexed in HBase?
Dear all, I wonder how data in HBase is indexed? Now Solr is used in my system because data is managed in inverted index. Such an index is suitable to retrieve unstructured and huge amount of data. How does HBase deal with the issue? May I replaced Solr with HBase? Thanks so much! Best regards, Bing
Re: Solr HBase - Re: How is Data Indexed in HBase?
Mr Gupta, Thanks so much for your reply! In my use cases, retrieving data by keyword is one of them. I think Solr is a proper choice. However, Solr does not provide a complex enough support to rank. And, frequent updating is also not suitable in Solr. So it is difficult to retrieve data randomly based on the values other than keyword frequency in text. In this case, I attempt to use HBase. But I don't know how HBase support high performance when it needs to keep consistency in a large scale distributed system. Now both of them are used in my system. I will check out ElasticSearch. Best regards, Bing On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta tvi...@readypulse.comwrote: Bing, Its a classic battle on whether to use solr or hbase or a combination of both. both systems are very different but there is some overlap in the utility. they also differ vastly when it compares to computation power, storage needs, etc. so in the end, it all boils down to your use case. you need to pick the technology that it best suited to your needs. im still not clear on your use case though. btw, if you haven't started using solr yet - then you might want to checkout ElasticSearch. I spent over a week researching between solr and ES and eventually chose ES due to its cool merits. thanks On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu yuzhih...@gmail.com wrote: There is no secondary index support in HBase at the moment. It's on our road map. FYI On Wed, Feb 22, 2012 at 9:28 AM, Bing Li lbl...@gmail.com wrote: Jacques, Yes. But I still have questions about that. In my system, when users search with a keyword arbitrarily, the query is forwarded to Solr. No any updating operations but appending new indexes exist in Solr managed data. When I need to retrieve data based on ranking values, HBase is used. And, the ranking values need to be updated all the time. Is that correct? My question is that the performance must be low if keeping consistency in a large scale distributed environment. How does HBase handle this issue? Thanks so much! Bing On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote: It is highly unlikely that you could replace Solr with HBase. They're really apples and oranges. On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote: Dear all, I wonder how data in HBase is indexed? Now Solr is used in my system because data is managed in inverted index. Such an index is suitable to retrieve unstructured and huge amount of data. How does HBase deal with the issue? May I replaced Solr with HBase? Thanks so much! Best regards, Bing
Re: Sort by the number of matching terms (coord value)
You can fool the Lucene scoring function: override each factor such as idf, queryNorm and lengthNorm and let them simply return 1.0f. I don't know whether Lucene 4 will expose more details, but for 2.x/3.x Lucene can only score with the vector space model and the formula can't be replaced by users. On Fri, Feb 17, 2012 at 10:47 AM, Nicholas Clark clark...@gmail.com wrote: Hi, I'm looking for a way to sort results by the number of matching terms. Being able to sort by the coord() value or by the overlap value that gets passed into the coord() function would do the trick. Is there a way I can expose those values to the sort function? I'd appreciate any help that points me in the right direction. I'm OK with making basic code modifications. Thanks! -Nick
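A hedged sketch of that idea for Lucene/Solr 3.x: a Similarity whose per-term factors are all 1.0, so each matching clause contributes 1 and the score reduces to coord() times the number of matching terms, which makes sort-by-score behave like sort-by-overlap. The class name is made up; it would be registered globally in schema.xml with <similarity class="my.pkg.MatchCountSimilarity"/>, and because norms are written at index time the norm change only takes effect after reindexing (or with omitNorms on the field):

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.DefaultSimilarity;

public class MatchCountSimilarity extends DefaultSimilarity {
  @Override public float tf(float freq) { return freq > 0 ? 1.0f : 0.0f; }       // ignore term frequency
  @Override public float idf(int docFreq, int numDocs) { return 1.0f; }          // ignore term rarity
  @Override public float queryNorm(float sumOfSquaredWeights) { return 1.0f; }   // keep raw sums
  @Override public float computeNorm(String field, FieldInvertState state) { return 1.0f; }  // ignore field length
  // coord() is left as the default overlap/maxOverlap, which is monotonic in the number of
  // matching terms for a fixed query; override it to return 1.0f to get the raw clause count.
}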
Re: Can I rebuild an index and remove some fields?
great. I think you could make it a public tool. maybe others also need such functionality. On Thu, Feb 16, 2012 at 5:31 AM, Robert Stewart bstewart...@gmail.comwrote: I implemented an index shrinker and it works. I reduced my test index from 6.6 GB to 3.6 GB by removing a single shingled field I did not need anymore. I'm actually using Lucene.Net for this project so code is C# using Lucene.Net 2.9.2 API. But basic idea is: Create an IndexReader wrapper that only enumerates the terms you want to keep, and that removes terms from documents when returning documents. Use the SegmentMerger to re-write each segment (where each segment is wrapped by the wrapper class), writing new segment to a new directory. Collect the SegmentInfos and do a commit in order to create a new segments file in new index directory Done - you now have a shrunk index with specified terms removed. Implementation uses separate thread for each segment, so it re-writes them in parallel. Took about 15 minutes to do 770,000 doc index on my macbook. On Tue, Feb 14, 2012 at 10:12 PM, Li Li fancye...@gmail.com wrote: I have roughly read the codes of 4.0 trunk. maybe it's feasible. SegmentMerger.add(IndexReader) will add to be merged Readers merge() will call mergeTerms(segmentWriteState); mergePerDoc(segmentWriteState); mergeTerms() will construct fields from IndexReaders for(int readerIndex=0;readerIndexmergeState.readers.size();readerIndex++) { final MergeState.IndexReaderAndLiveDocs r = mergeState.readers.get(readerIndex); final Fields f = r.reader.fields(); final int maxDoc = r.reader.maxDoc(); if (f != null) { slices.add(new ReaderUtil.Slice(docBase, maxDoc, readerIndex)); fields.add(f); } docBase += maxDoc; } So If you wrapper your IndexReader and override its fields() method, maybe it will work for merge terms. for DocValues, it can also override AtomicReader.docValues(). just return null for fields you want to remove. maybe it should traverse CompositeReader's getSequentialSubReaders() and wrapper each AtomicReader other things like term vectors norms are similar. On Wed, Feb 15, 2012 at 6:30 AM, Robert Stewart bstewart...@gmail.com wrote: I was thinking if I make a wrapper class that aggregates another IndexReader and filter out terms I don't want anymore it might work. And then pass that wrapper into SegmentMerger. I think if I filter out terms on GetFieldNames(...) and Terms(...) it might work. Something like: HashSetstring ignoredTerms=...; FilteringIndexReader wrapper=new FilterIndexReader(reader); SegmentMerger merger=new SegmentMerger(writer); merger.add(wrapper); merger.Merge(); On Feb 14, 2012, at 1:49 AM, Li Li wrote: for method 2, delete is wrong. we can't delete terms. you also should hack with the tii and tis file. On Tue, Feb 14, 2012 at 2:46 PM, Li Li fancye...@gmail.com wrote: method1, dumping data for stored fields, you can traverse the whole index and save it to somewhere else. for indexed but not stored fields, it may be more difficult. if the indexed and not stored field is not analyzed(fields such as id), it's easy to get from FieldCache.StringIndex. But for analyzed fields, though theoretically it can be restored from term vector and term position, it's hard to recover from index. method 2, hack with metadata 1. indexed fields delete by query, e.g. field:* 2. stored fields because all fields are stored sequentially. it's not easy to delete some fields. this will not affect search speed. but if you want to get stored fields, and the useless fields are very long, then it will slow down. 
also it's possible to hack with it. but need more effort to understand the index file format and traverse the fdt/fdx file. http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html this will give you some insight. On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.com wrote: Lets say I have a large index (100M docs, 1TB, split up between 10 indexes). And a bunch of the stored and indexed fields are not used in search at all. In order to save memory and disk, I'd like to rebuild that index *without* those fields, but I don't have original documents to rebuild entire index with (don't have the full-text anymore, etc.). Is there some way to rebuild or optimize an existing index with only a sub-set of the existing indexed fields? Or alternatively is there a way to avoid loading some indexed fields at all ( to avoid loading term infos and terms index ) ? Thanks Bob
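For the Lucene 3.x Java equivalent of the wrapper Robert describes, a heavily hedged sketch: a FilterIndexReader that hides one field's terms, handed to IndexWriter.addIndexes() so the postings are rewritten without that field. It does not touch stored fields, norms or term vectors, and the class and field names are made up:

import java.io.IOException;
import java.util.Collection;
import java.util.HashSet;
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;

public class FieldDroppingReader extends FilterIndexReader {
  private final String dropField;

  public FieldDroppingReader(IndexReader in, String dropField) {
    super(in);
    this.dropField = dropField;
  }

  @Override
  public TermEnum terms() throws IOException {
    // skip every term that belongs to the unwanted field while the index is rewritten
    return new FilterTermEnum(in.terms()) {
      @Override
      public boolean next() throws IOException {
        while (super.next()) {
          if (!dropField.equals(term().field())) return true;
        }
        return false;
      }
    };
  }

  @Override
  public Collection<String> getFieldNames(FieldOption option) {
    Collection<String> names = new HashSet<String>(in.getFieldNames(option));
    names.remove(dropField);
    return names;
  }
}

// usage sketch: writer.addIndexes(new FieldDroppingReader(IndexReader.open(dir), "shingled_text"));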
Re: Can I rebuild an index and remove some fields?
I have roughly read the codes of 4.0 trunk. maybe it's feasible. SegmentMerger.add(IndexReader) will add to be merged Readers merge() will call mergeTerms(segmentWriteState); mergePerDoc(segmentWriteState); mergeTerms() will construct fields from IndexReaders for(int readerIndex=0;readerIndexmergeState.readers.size();readerIndex++) { final MergeState.IndexReaderAndLiveDocs r = mergeState.readers.get(readerIndex); final Fields f = r.reader.fields(); final int maxDoc = r.reader.maxDoc(); if (f != null) { slices.add(new ReaderUtil.Slice(docBase, maxDoc, readerIndex)); fields.add(f); } docBase += maxDoc; } So If you wrapper your IndexReader and override its fields() method, maybe it will work for merge terms. for DocValues, it can also override AtomicReader.docValues(). just return null for fields you want to remove. maybe it should traverse CompositeReader's getSequentialSubReaders() and wrapper each AtomicReader other things like term vectors norms are similar. On Wed, Feb 15, 2012 at 6:30 AM, Robert Stewart bstewart...@gmail.comwrote: I was thinking if I make a wrapper class that aggregates another IndexReader and filter out terms I don't want anymore it might work. And then pass that wrapper into SegmentMerger. I think if I filter out terms on GetFieldNames(...) and Terms(...) it might work. Something like: HashSetstring ignoredTerms=...; FilteringIndexReader wrapper=new FilterIndexReader(reader); SegmentMerger merger=new SegmentMerger(writer); merger.add(wrapper); merger.Merge(); On Feb 14, 2012, at 1:49 AM, Li Li wrote: for method 2, delete is wrong. we can't delete terms. you also should hack with the tii and tis file. On Tue, Feb 14, 2012 at 2:46 PM, Li Li fancye...@gmail.com wrote: method1, dumping data for stored fields, you can traverse the whole index and save it to somewhere else. for indexed but not stored fields, it may be more difficult. if the indexed and not stored field is not analyzed(fields such as id), it's easy to get from FieldCache.StringIndex. But for analyzed fields, though theoretically it can be restored from term vector and term position, it's hard to recover from index. method 2, hack with metadata 1. indexed fields delete by query, e.g. field:* 2. stored fields because all fields are stored sequentially. it's not easy to delete some fields. this will not affect search speed. but if you want to get stored fields, and the useless fields are very long, then it will slow down. also it's possible to hack with it. but need more effort to understand the index file format and traverse the fdt/fdx file. http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html this will give you some insight. On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.com wrote: Lets say I have a large index (100M docs, 1TB, split up between 10 indexes). And a bunch of the stored and indexed fields are not used in search at all. In order to save memory and disk, I'd like to rebuild that index *without* those fields, but I don't have original documents to rebuild entire index with (don't have the full-text anymore, etc.). Is there some way to rebuild or optimize an existing index with only a sub-set of the existing indexed fields? Or alternatively is there a way to avoid loading some indexed fields at all ( to avoid loading term infos and terms index ) ? Thanks Bob
Re: New segment file created too often
Commit is called after adding each document; you should instead add enough documents and then call a single commit, because commit is a costly operation. If you want the latest fed documents to be visible immediately, you could use NRT. On Tue, Feb 14, 2012 at 12:47 AM, Huy Le hu...@springpartners.com wrote: Hi, I am using Solr 3.5. I see that Solr keeps creating new segment files (1 MB files) so often that it triggers a segment merge about every minute. I searched the news archive but could not find any info on this issue. I am indexing about 10 docs of less than 2 KB each every second, and commit is called after adding each document. Relevant config params are: <mergeFactor>10</mergeFactor> <ramBufferSizeMB>1024</ramBufferSizeMB> <maxMergeDocs>2147483647</maxMergeDocs> What might be triggering this frequent new segment file creation? Thanks! Huy -- Huy Le Spring Partners, Inc. http://springpadit.com
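A sketch of the batching Li Li suggests, using SolrJ 3.x; the URL and the batch size of 1000 are placeholders:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

static void indexInBatches(Iterable<SolrInputDocument> docs) throws Exception {
  SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
  List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
  for (SolrInputDocument doc : docs) {
    batch.add(doc);
    if (batch.size() == 1000) {   // flush in chunks rather than per document
      server.add(batch);
      batch.clear();
    }
  }
  if (!batch.isEmpty()) server.add(batch);
  server.commit();                // one commit at the end instead of one per add
}

If documents must become searchable soon after being added on 3.5, a commitWithin on the update request or an autoCommit block in solrconfig.xml is usually a better fit than an explicit commit per document.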
Re: New segment file created too often
as far as I know, there are three situation it will be flushed to a new segment: RAM buffer for posting data structure is used up; added doc numbers are exceeding threshold and there are many deletions in a segment but your configuration seems it is not likely to flush many small segments. ramBufferSizeMB1024/ramBufferSizeMB maxMergeDocs2147483647/maxMergeDocs On Tue, Feb 14, 2012 at 1:10 AM, Huy Le hu...@springpartners.com wrote: Hi, I am using solr 3.5. As I understood it, NRT is a solr 4 feature, but solr 4 is not released yet. I understand commit after adding each document is expensive, but the application requires that documents be available after adding to the index. What I don't understand is why new segment files are created so often. Are the commit calls triggering new segment files being created? I don't see this behavior in another environment of the same version of solr. Huy On Mon, Feb 13, 2012 at 11:55 AM, Li Li fancye...@gmail.com wrote: Commit is called after adding each document you should add enough documents and then calling a commit. commit is a cost operation. if you want to get latest feeded documents, you could use NRT On Tue, Feb 14, 2012 at 12:47 AM, Huy Le hu...@springpartners.com wrote: Hi, I am using solr 3.5. I seeing solr keeps creating new segment files (1MB files) so often that it triggers segment merge about every one minute. I search the news archive, but could not find any info on this issue. I am indexing about 10 docs of less 2KB each every second. Commit is called after adding each document. Relevant config params are: mergeFactor10/mergeFactor ramBufferSizeMB1024/ramBufferSizeMB maxMergeDocs2147483647/maxMergeDocs What might be triggering this frequent new segment files creation? Thanks! Huy -- Huy Le Spring Partners, Inc. http://springpadit.com -- Huy Le Spring Partners, Inc. http://springpadit.com
Re: New segment file created too often
can you post your config file? I found there are 2 places to config ramBufferSizeMB in latest svn of 3.6's example solrconfig.xml. trying to modify them both? indexDefaults useCompoundFilefalse/useCompoundFile mergeFactor10/mergeFactor !-- Sets the amount of RAM that may be used by Lucene indexing for buffering added documents and deletions before they are flushed to the Directory. -- ramBufferSizeMB32/ramBufferSizeMB !-- If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush based on whichever limit is hit first. -- !-- maxBufferedDocs1000/maxBufferedDocs -- maxFieldLength1/maxFieldLength writeLockTimeout1000/writeLockTimeout . !-- termIndexInterval256/termIndexInterval -- /indexDefaults !-- Main Index Values here override the values in the indexDefaults section for the main on disk index. -- mainIndex useCompoundFilefalse/useCompoundFile ramBufferSizeMB32/ramBufferSizeMB mergeFactor10/mergeFactor /mainIndex On Tue, Feb 14, 2012 at 1:10 AM, Huy Le hu...@springpartners.com wrote: Hi, I am using solr 3.5. As I understood it, NRT is a solr 4 feature, but solr 4 is not released yet. I understand commit after adding each document is expensive, but the application requires that documents be available after adding to the index. What I don't understand is why new segment files are created so often. Are the commit calls triggering new segment files being created? I don't see this behavior in another environment of the same version of solr. Huy On Mon, Feb 13, 2012 at 11:55 AM, Li Li fancye...@gmail.com wrote: Commit is called after adding each document you should add enough documents and then calling a commit. commit is a cost operation. if you want to get latest feeded documents, you could use NRT On Tue, Feb 14, 2012 at 12:47 AM, Huy Le hu...@springpartners.com wrote: Hi, I am using solr 3.5. I seeing solr keeps creating new segment files (1MB files) so often that it triggers segment merge about every one minute. I search the news archive, but could not find any info on this issue. I am indexing about 10 docs of less 2KB each every second. Commit is called after adding each document. Relevant config params are: mergeFactor10/mergeFactor ramBufferSizeMB1024/ramBufferSizeMB maxMergeDocs2147483647/maxMergeDocs What might be triggering this frequent new segment files creation? Thanks! Huy -- Huy Le Spring Partners, Inc. http://springpadit.com -- Huy Le Spring Partners, Inc. http://springpadit.com
Re: Can I rebuild an index and remove some fields?
method1, dumping data for stored fields, you can traverse the whole index and save it to somewhere else. for indexed but not stored fields, it may be more difficult. if the indexed and not stored field is not analyzed(fields such as id), it's easy to get from FieldCache.StringIndex. But for analyzed fields, though theoretically it can be restored from term vector and term position, it's hard to recover from index. method 2, hack with metadata 1. indexed fields delete by query, e.g. field:* 2. stored fields because all fields are stored sequentially. it's not easy to delete some fields. this will not affect search speed. but if you want to get stored fields, and the useless fields are very long, then it will slow down. also it's possible to hack with it. but need more effort to understand the index file format and traverse the fdt/fdx file. http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html this will give you some insight. On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.comwrote: Lets say I have a large index (100M docs, 1TB, split up between 10 indexes). And a bunch of the stored and indexed fields are not used in search at all. In order to save memory and disk, I'd like to rebuild that index *without* those fields, but I don't have original documents to rebuild entire index with (don't have the full-text anymore, etc.). Is there some way to rebuild or optimize an existing index with only a sub-set of the existing indexed fields? Or alternatively is there a way to avoid loading some indexed fields at all ( to avoid loading term infos and terms index ) ? Thanks Bob
Re: Can I rebuild an index and remove some fields?
for method 2, delete is wrong. we can't delete terms. you also should hack with the tii and tis file. On Tue, Feb 14, 2012 at 2:46 PM, Li Li fancye...@gmail.com wrote: method1, dumping data for stored fields, you can traverse the whole index and save it to somewhere else. for indexed but not stored fields, it may be more difficult. if the indexed and not stored field is not analyzed(fields such as id), it's easy to get from FieldCache.StringIndex. But for analyzed fields, though theoretically it can be restored from term vector and term position, it's hard to recover from index. method 2, hack with metadata 1. indexed fields delete by query, e.g. field:* 2. stored fields because all fields are stored sequentially. it's not easy to delete some fields. this will not affect search speed. but if you want to get stored fields, and the useless fields are very long, then it will slow down. also it's possible to hack with it. but need more effort to understand the index file format and traverse the fdt/fdx file. http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html this will give you some insight. On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.comwrote: Lets say I have a large index (100M docs, 1TB, split up between 10 indexes). And a bunch of the stored and indexed fields are not used in search at all. In order to save memory and disk, I'd like to rebuild that index *without* those fields, but I don't have original documents to rebuild entire index with (don't have the full-text anymore, etc.). Is there some way to rebuild or optimize an existing index with only a sub-set of the existing indexed fields? Or alternatively is there a way to avoid loading some indexed fields at all ( to avoid loading term infos and terms index ) ? Thanks Bob
more sql-like commands for solr
hi all, we have used Solr to provide a search service in many products. I found that for each product we have to write some configuration and query expressions, and our users are not used to this. They are familiar with SQL and may describe their needs like this: I want a query that searches for books whose title contains java, grouped by publishing year and ordered by matching score and freshness, where the weight of score is 2 and the weight of freshness is 1. Maybe they would be happy if they could use SQL-like statements to convey their needs: select * from books where title contains java group by pub_year order by score^2, freshness^1. They might also like to insert or delete documents with something like: delete from books where title contains java and pub_year between 2011 and 2012. We could define a language similar to SQL and translate it to a Solr query string such as .../select/?q=+title:java^2 +pub_year:2011. This would be roughly equivalent to what Apache Hive is for Hadoop.
Re: Chinese Phonetic search
You can convert Chinese words to pinyin and use n-grams to search for phonetically similar words. On Wed, Feb 8, 2012 at 11:10 AM, Floyd Wu floyd...@gmail.com wrote: Hi there, has anyone here ever implemented phonetic search, especially for Chinese (traditional/simplified), using Solr or Lucene? Please share some thoughts or point me to a possible solution (or hint at search keywords). I've searched and read a lot of related articles but had no luck. Many thanks. Floyd
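A hedged sketch of the index side: keep a pinyin rendering of the text in its own field and n-gram it so near matches still share tokens. Converting hanzi to pinyin is not built into Solr 3.x, so that step is assumed to happen in the indexing client (for example with a library such as pinyin4j) or in a custom TokenFilter; the field and type names are made up:

<!-- schema.xml -->
<fieldType name="pinyin_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="3"/>
  </analyzer>
</fieldType>
<field name="title_pinyin" type="pinyin_ngram" indexed="true" stored="false"/>

At query time the user's input is converted to pinyin the same way and searched against title_pinyin; the overlap of shared n-grams then gives a rough phonetic-similarity ranking.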