Can we manipulate termfreq to count as 1 for multiple matches?
Hi all,

I am wondering whether there is a way to make the term frequency of a particular field count as 1, even when the term matches multiple times in that document.

Use case: say I have documents with two fields, Name and Description, and one document looks like this:

Document_1
Name = Blue Jeans
Description = This jeans is very soft. Jeans is pretty nice.

If I search for "Jeans", the term is found in two places in the Description field, so the term frequency for Description is 2.

I want Solr to count the term frequency for Description as 1 even if "Jeans" occurs multiple times in that field. For all other fields, I want the term frequency as it is.

Is this doable in Solr with any of the functions? Any inputs are welcome.

Thanks
Saroj
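To make the request concrete, here is a minimal standalone sketch of the difference between a classic square-root term-frequency factor and one capped at 1. It is not Lucene or Solr API: the class and method names are hypothetical, and the tokenization is a crude stand-in for a real analyzer. Within Solr itself, options sometimes suggested for this are a custom Similarity for the field or `omitTermFreqAndPositions="true"` in the schema (which also drops positions, so phrase queries on that field stop working); neither is shown here.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of Lucene or Solr.
final class TermFrequencySketch {

    // Counts how many tokens in the field text equal the term (case-insensitive).
    static int rawFrequency(String fieldText, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : fieldText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(needle)) {
                count++;
            }
        }
        return count;
    }

    // Classic tf factor: grows with the square root of the raw frequency.
    static float classicTf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Capped tf factor: any number of matches counts as exactly one.
    static float cappedTf(int freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        String description = "This jeans is very soft. Jeans is pretty nice.";
        int freq = rawFrequency(description, "Jeans");
        System.out.println("raw=" + freq
                + " classic=" + classicTf(freq)
                + " capped=" + cappedTf(freq));
    }
}
```

With the capped variant applied only to Description, a document that mentions "Jeans" twice in that field would score the same on that field as one that mentions it once, while other fields keep their normal frequency.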
Re: can we configure spellcheck to be invoked after request processing?
James,

You are right. I was setting up the spell checker incorrectly. It works as you described: the spell checker is invoked after the query component, and it does not stop Solr from executing the query.

Thanks for correcting me.

Saroj

On Fri, Mar 1, 2013 at 7:30 AM, Dyer, James wrote:

> I'm a little confused here because if you are searching q=jeap OR denim,
> then you should be getting both documents back. Having spellcheck
> configured does not affect your search results at all. Having it in your
> request will sometimes result in spelling suggestions, usually if one or
> more terms you queried is not in the index. But if all of your query terms
> are optional then you need only have 1 term match anything to get results.
> You should get the same results regardless of whether or not you have
> spellcheck in the request.
>
> While spellcheck does not affect your query results, the results do affect
> spellcheck. This is why you should put spellcheck in the "last-components"
> section of your request handler configuration. This ensures that the query
> is run before spellcheck.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: roz dev [mailto:rozde...@gmail.com]
> Sent: Thursday, February 28, 2013 6:33 PM
> To: solr-user@lucene.apache.org
> Subject: can we configure spellcheck to be invoked after request
> processing?
>
> Hi All,
> I may be asking a stupid question but please bear with me.
>
> Is it possible to configure Spell check to be invoked after Solr has
> processed the original query?
>
> My use case is:
>
> I am using DirectSpellChecker and have a document which has "Denim" as a
> term and there is another document which has "Jeap".
>
> I am issuing a Search as "Jean" or "Denim"
>
> I am finding that this Solr query is giving me ZERO results and suggesting
> "Jeap" as an alternative.
>
> I want Solr to try to run the query for "Jean" or "Denim" and if there are
> no results found then only suggest "Jeap" as an alternative
>
> Is this doable in Solr?
>
> Any suggestions.
>
> -Saroj
Re: How to re-read the config files in Solr, on a commit
Thanks Otis for pointing this out.

We may end up using search-time synonyms for single-word synonyms and index-time synonyms for multi-word synonyms.

-Saroj

On Tue, Nov 6, 2012 at 8:09 PM, Otis Gospodnetic wrote:

> Hi,
>
> Note about modifying synonyms - you need to reindex, really, if using
> index-time synonyms. And if you're using search-time synonyms you have
> multi-word synonym issue described on the Wiki.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
>
> On Nov 6, 2012 11:02 PM, "roz dev" wrote:
>
> > Erick
> >
> > We have a requirement where search admin can add or remove some synonyms
> > and would want these changes to be reflected in search thereafter.
> >
> > yes, we looked at reload command and it seems to be suitable for that
> > purpose. We have a master and slave setup so it should be OK to issue
> > reload command on master. I expect that slaves will pull the latest
> > config files.
> >
> > Is reload operation very costly, in terms of time and cpu? We have a
> > multicore setup and would need to issue reload on multiple cores.
> >
> > Thanks
> > Saroj
> >
> > On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson wrote:
> >
> > > Not that I know of. This would be extremely expensive in the usual
> > > case. Loading up configs, reconfiguring all the handlers etc. would
> > > add a huge amount of overhead to the commit operation, which is heavy
> > > enough as it is.
> > >
> > > What's the use-case here? Changing your configs really often and
> > > reading them on commit sounds like a way to make for a very confusing
> > > application!
> > >
> > > But if you really need to re-read all this info on a running system,
> > > consider the core admin RELOAD command.
> > >
> > > Best
> > > Erick
> > >
> > > On Mon, Nov 5, 2012 at 8:43 PM, roz dev wrote:
> > >
> > > > Hi All
> > > >
> > > > I am keen to find out if Solr exposes any event listener or other
> > > > hooks which can be used to re-read configuration files.
> > > >
> > > > I know that we have firstSearcher event but I am not sure if it
> > > > causes request handlers to reload themselves and read the conf files
> > > > again.
> > > >
> > > > For example, if I change the synonym file and solr gets a commit,
> > > > will it re-initialize request handlers and re-read the conf files.
> > > >
> > > > Or, are there some events which can be listened to?
> > > >
> > > > Any inputs are welcome.
> > > >
> > > > Thanks
> > > > Saroj
Re: How to re-read the config files in Solr, on a commit
Erick,

We have a requirement where a search admin can add or remove synonyms and wants these changes to be reflected in search from then on.

Yes, we looked at the reload command and it seems suitable for that purpose. We have a master and slave setup, so it should be OK to issue the reload command on the master. I expect that the slaves will pull the latest config files.

Is the reload operation very costly in terms of time and CPU? We have a multicore setup and would need to issue a reload on multiple cores.

Thanks
Saroj

On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson wrote:

> Not that I know of. This would be extremely expensive in the usual case.
> Loading up configs, reconfiguring all the handlers etc. would add a huge
> amount of overhead to the commit operation, which is heavy enough as it is.
>
> What's the use-case here? Changing your configs really often and reading
> them on commit sounds like a way to make for a very confusing application!
>
> But if you really need to re-read all this info on a running system,
> consider the core admin RELOAD command.
>
> Best
> Erick
>
> On Mon, Nov 5, 2012 at 8:43 PM, roz dev wrote:
>
> > Hi All
> >
> > I am keen to find out if Solr exposes any event listener or other hooks
> > which can be used to re-read configuration files.
> >
> > I know that we have firstSearcher event but I am not sure if it causes
> > request handlers to reload themselves and read the conf files again.
> >
> > For example, if I change the synonym file and solr gets a commit, will it
> > re-initialize request handlers and re-read the conf files.
> >
> > Or, are there some events which can be listened to?
> >
> > Any inputs are welcome.
> >
> > Thanks
> > Saroj
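Since the thread settles on the core admin RELOAD command for a multicore setup, here is a small sketch of building the reload request for each core. The `/admin/cores?action=RELOAD&core=...` form is the CoreAdmin request; the base URL, port, and core names below are assumptions for illustration, and the sketch only builds the URLs rather than sending them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; host, port and core names are placeholders.
final class CoreReloadUrls {

    // Builds the CoreAdmin RELOAD URL for a single core.
    static String reloadUrl(String solrBaseUrl, String coreName) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        try {
            return base + "/admin/cores?action=RELOAD&core="
                    + URLEncoder.encode(coreName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Builds one RELOAD URL per core, in the order given.
    static List<String> reloadUrls(String solrBaseUrl, List<String> coreNames) {
        List<String> urls = new ArrayList<String>();
        for (String core : coreNames) {
            urls.add(reloadUrl(solrBaseUrl, core));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : reloadUrls("http://localhost:8983/solr",
                Arrays.asList("products_en", "products_fr"))) {
            System.out.println(url);
        }
    }
}
```

Issuing these one core at a time, rather than all at once, keeps the reload cost Erick describes from hitting every core simultaneously. Whether slaves pick up a changed synonym file depends on how replication is configured (for example, which files the replication handler's `confFiles` setting lists), so that part is worth verifying separately.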
Re: How to change the boost of fields in edismx at runtime
Thanks Hoss.

Yes, that approach would work, as I can change the query.

Is there a way to extend the edismax handler to read a config file at startup, and then use an event such as a commit to instruct the handler to re-read that file?

That way, I can ensure that my boost params live only in the Solr servers' config files. If I need to change them, I would just change the file and wait for a commit to re-read it.

Any inputs?

-Saroj

On Thu, Nov 1, 2012 at 2:50 PM, Chris Hostetter wrote:

> : Then, If I find that results are not of my liking then I would like to
> : change the boost as following
> :
> : - Title - boosted to 2
> : - Keyword - boosted to 10
> :
> : Is there any way to change this boost, at run-time, without having to
> : restart solr with new boosts in edismax?
>
> edismax field boosts (specified in the qf and pf params) can always be
> specified at runtime -- first and foremost they are query params.
>
> when you put them in your solrconfig.xml file those are just "defaults"
> (or invariants, or appends) of those query params.
>
> -Hoss
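Hoss's point that `qf` boosts are ordinary query parameters can be sketched as below: the client builds the `qf` value per request, so changing a boost needs no restart. The `defType` and `qf` parameters are standard edismax request parameters; the lowercase field names `title` and `keyword`, the class name, and the sample query are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that turns a field->boost map into edismax request params.
final class EdismaxBoostParams {

    // Renders boosts in qf syntax, e.g. "title^2 keyword^10".
    static String qf(Map<String, Integer> boosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : boosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append('^').append(e.getValue());
        }
        return sb.toString();
    }

    // Builds the URL-encoded query string for one request.
    static String queryString(String userQuery, Map<String, Integer> boosts) {
        return "q=" + encode(userQuery)
                + "&defType=edismax"
                + "&qf=" + encode(qf(boosts));
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Integer> boosts = new LinkedHashMap<String, Integer>();
        boosts.put("title", 2);
        boosts.put("keyword", 10);
        System.out.println(queryString("blue jeans", boosts));
    }
}
```

If the boosts are kept in a file or database on the client side, the application can re-read them whenever it likes and pass the new values on the next request, which avoids needing any hook inside Solr.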
Re: SolrJ - IOException
I have seen this happen. We retry, and that works.

Is your Solr server stalled?

On Mon, Sep 24, 2012 at 4:50 PM, balaji.gandhi wrote:

> Hi,
>
> I am encountering this error randomly (under load) when posting to Solr
> using SolrJ.
>
> Has anyone encountered a similar error?
>
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: http://localhost:8080/solr/profile
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107)
>     at
>
> Thanks,
> Balaji
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrJ-IOException-tp4010026.html
> Sent from the Solr - User mailing list archive at Nabble.com.
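The "we retry and that works" approach can be sketched as a small generic helper with exponential backoff. This is a standalone illustration, not SolrJ code: the interface and class names are hypothetical, and it retries on a plain `IOException`. In a real SolrJ client the failure arrives wrapped in `SolrServerException`, as the stack trace above shows, so the catch clause would need adapting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical retry helper; names are illustrative only.
final class RetrySketch {

    // An action that may fail with an I/O error, e.g. posting a document.
    interface IoAction<T> {
        T run() throws IOException;
    }

    // Runs the action up to maxAttempts times, doubling the pause after each failure.
    static <T> T withRetries(IoAction<T> action, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoff = initialBackoffMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoff);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while retrying", ie);
                    }
                    backoff *= 2;
                }
            }
        }
        // All attempts failed; surface the last error.
        throw new UncheckedIOException(last);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated transient failure");
            }
            return "indexed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying hides the symptom rather than the cause, so it is still worth checking whether the server is stalling under load, as asked above.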
IndexDocValues in Solr
Changing the subject line to make the topic of this message easier to understand.

Is there any plan to expose IndexDocValues as part of Solr 4?

Any thoughts?

-Saroj

On Thu, Aug 2, 2012 at 5:10 PM, roz dev wrote:

> As we all know, FieldCache can be costly if we have lots of documents and
> lots of fields to sort on.
> I see that IndexDocValues are better at sorting and faceting, w.r.t. memory
> usage.
>
> Is there any plan to use IndexDocValues in SOLR for doing sorting and
> faceting?
>
> Will SOLR 4 or 5 have IndexDocValues? Is there an easy way to use
> IndexDocValues in Solr even though it is not implemented yet?
>
> -Saroj
Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
Thanks Robert for these inputs.

Since we do not really need the Snowball analyzer for this field, we will not use it for now. If this still does not address our issue, we will tweak the thread pool as per eks dev's suggestion. I am a bit hesitant to make this change yet, as we would be reducing the thread pool, which can adversely impact our throughput.

If the Snowball filter is being optimized for Solr 4 beta, that would be great for us. If you have already filed a JIRA for this, please let me know and I would like to follow it.

Thanks again
Saroj

On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir wrote:

> On Tue, Jul 31, 2012 at 2:34 PM, roz dev wrote:
> > Hi All
> >
> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
> > that when we are indexing lots of data with 16 concurrent threads, Heap
> > grows continuously. It remains high and ultimately most of the stuff ends
> > up being moved to Old Gen. Eventually, Old Gen also fills up and we start
> > getting into excessive GC problem.
>
> Hi: I don't claim to know anything about how tomcat manages threads,
> but really you shouldn't have all these objects.
>
> In general snowball stemmers should be reused per-thread-per-field.
> But if you have a lot of fields*threads, especially if there really is
> high thread churn on tomcat, then this could be bad with snowball:
> see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841
>
> I think it would be useful to see if you can tune tomcat's threadpool
> as he describes.
>
> separately: Snowball stemmers are currently really ram-expensive for
> stupid reasons.
> each one creates a ton of Among objects, e.g. an EnglishStemmer today
> is about 8KB.
>
> I'll regenerate these and open a JIRA issue: as the snowball code
> generator in their svn was improved
> recently and each one now takes about 64 bytes instead (the Among's
> are static and reused).
>
> Still this won't really "solve your problem", because the analysis
> chain could have other heavy parts
> in initialization, but it seems good to fix.
>
> As a workaround until then you can also just use the "good old
> PorterStemmer" (PorterStemFilterFactory in solr).
> It's not exactly the same as using Snowball(English) but it's pretty
> close and also much faster.
>
> --
> lucidimagination.com
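Robert's "per-thread-per-field" remark explains why the number of cached analysis chains can grow as threads times fields. The following is a simplified standalone model of that reuse pattern, not Lucene's actual code: a thread-local map keyed by field name, with a counter for how many components get created. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of per-thread, per-field component caching (not Lucene code).
final class PerThreadPerFieldSketch {

    // Runs threadCount workers that each analyze fieldCount distinct fields
    // several times, and returns how many cached components were created.
    static int simulate(final int threadCount, final int fieldCount) {
        final AtomicInteger created = new AtomicInteger();
        // Each thread lazily gets its own field -> components map.
        final ThreadLocal<Map<String, Object>> cache =
                ThreadLocal.withInitial(HashMap::new);

        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                // Repeated passes reuse the cached entry, so only the first
                // pass over a field creates anything on this thread.
                for (int pass = 0; pass < 3; pass++) {
                    for (int f = 0; f < fieldCount; f++) {
                        cache.get().computeIfAbsent("field_" + f, name -> {
                            created.incrementAndGet();
                            return new Object(); // stands in for a tokenizer chain
                        });
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("components created: " + simulate(16, 1000));
    }
}
```

In this model, reuse works as intended within a thread, yet the total still scales with the product of thread count and distinct field names. That is why a large number of dynamic field names combined with a large or churning thread pool can retain a lot of heap even when each component is small.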
Re: solr/tomcat stops responding
You are referring to a very old thread.

Did you take any heap dump and thread dump? They can help you get more insight.

-Saroj

On Tue, Jul 31, 2012 at 9:04 AM, Suneel wrote:

> Hello Kevin,
>
> I am also facing the same problem. After a few hours or a few days my solr
> server crashes.
> I tried to download the following patch but it's not accessible now. I am
> using version 3.1 of solr.
>
> http://people.apache.org/~yonik/solr/current/solr.war
>
> -
> Regards,
>
> Suneel Pandey
> Sr. Software Developer
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-tomcat-stops-responding-tp474577p3998435.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Memory leak?? with CloseableThreadLocal with use of Snowball Filter
Hi All

I am using Solr 4 from trunk with Tomcat 6. I am noticing that when we index lots of data with 16 concurrent threads, the heap grows continuously. It remains high, and ultimately most of the objects end up being moved to Old Gen. Eventually Old Gen also fills up and we run into an excessive GC problem.

I took a heap dump and found that most of the memory is consumed by CloseableThreadLocal, which is holding a WeakHashMap of threads and their state.

Most of the old gen is full, with ThreadLocal eating up 3GB of heap, and the heap dump shows that all such entries are using the Snowball filter. I looked into LUCENE-3841 and verified that my version of SOLR 4 has that code.

So I am wondering about the reason for this memory leak. Is it due to some other bug in Solr/Lucene?

Here is a brief snapshot of the heap dump showing the problem:

Class Name | Shallow Heap | Retained Heap
-
*org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x300c3eb28 | 24 | 3,885,213,072*
|- class org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x2f9753340 | 0 | 0
|- this$0 org.apache.solr.schema.IndexSchema @ 0x300bf4048 | 96 | 276,704
*|- reuseStrategy org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x300c3eb40 | 16 | 3,885,208,728*
| |- class org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x2f98368c0 | 0 | 0
| |- storedValue org.apache.lucene.util.CloseableThreadLocal @ 0x300c3eb50 | 24 | 3,885,208,712
| | |- class org.apache.lucene.util.CloseableThreadLocal @ 0x2f9788918 | 8 | 8
| | |- t java.lang.ThreadLocal @ 0x300c3eb68 | 16 | 16
| | | '- class java.lang.ThreadLocal @ 0x2f80f0868 System Class | 8 | 24
*| | |- hardRefs java.util.WeakHashMap @ 0x300c3eb78 | 48 | 3,885,208,656*
| | | |- class java.util.WeakHashMap @ 0x2f8476c00 System Class | 16 | 16
| | | |- table java.util.WeakHashMap$Entry[16] @ 0x300c3eba8 | 80 | 2,200,016,960
| | | | |- class java.util.WeakHashMap$Entry[] @ 0x2f84789e8 | 0 | 0
| | | | |-* [7] java.util.WeakHashMap$Entry @ 0x306a24950 | 40 | 318,502,920*
| | | | | |- class java.util.WeakHashMap$Entry @ 0x2f84786f8 System Class | 0 | 0
| | | | | |- queue java.lang.ref.ReferenceQueue @ 0x300c3ebf8 | 32 | 48
| | | | | |- referent java.lang.Thread @ 0x30678c2c0 web-23 | 112 | 160
| | | | | |- value java.util.HashMap @ 0x30678cbb0 | 48 | 318,502,880
| | | | | | |- class java.util.HashMap @ 0x2f80b9428 System Class | 24 | 24
*| | | | | | |- table java.util.HashMap$Entry[32768] @ 0x3c07c6f58 | 131,088 | 318,502,832*
| | | | | | | |- class java.util.HashMap$Entry[] @ 0x2f80bd9c8 | 0 | 0
| | | | | | | |- [10457] java.util.HashMap$Entry @ 0x30678cbe0 | 32 | 40,864
| | | | | | | | |- class java.util.HashMap$Entry @ 0x2f80bd400 System Class | 0 | 0
| | | | | | | | |- key java.lang.String @ 0x30678cc00 prod_desc_keywd_en_CA | 32 | 96
| | | | | | | | |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x30678cc60 | 24 | 20,344
| | | | | | | | |- next java.util.HashMap$Entry @ 0x39a2c9100 | 32 | 20,392
| | | | | | | | | |- class java.util.HashMap$Entry @ 0x2f80bd400 System Class | 0 | 0
| | | | | | | | | |- key java.lang.String @ 0x39a2c9120 3637994_fr_CA_cat_name_keywd | 32 | 104
| | | | | | | | | |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x39a2c9188 | 24 | 20,256
| | | | | | | | | | |- class org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x2f97a69a0 | 0 | 0
| | | | | | | | | | |- this$0 org.apache.solr.analysis.TokenizerChain @ 0x300bf615
Re: too many instances of "org.tartarus.snowball.Among" in the heap
is it some kind of memory leak with Lucene's use of Snowball Stemmer? I tried to google for Snowball Stemmer but could not find any recent info about memory leak this old link does indicate some memory leak but it is from 2004 http://snowball.tartarus.org/archives/snowball-discuss/0631.html Any inputs are welcome -Saroj On Mon, Jul 30, 2012 at 4:39 PM, roz dev wrote: > I did take couple of thread dumps and they seem to be fine > > Heap dump is huge - close to 15GB > > I am having hard time to analyze that heap dump > > 2012-07-30 16:07:32 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.0-b09 mixed mode): > > "RMI TCP Connection(33)-10.8.21.124" - Thread t@190 >java.lang.Thread.State: RUNNABLE > at sun.management.ThreadImpl.dumpThreads0(Native Method) > at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:167) > at > com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:96) > at > com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:33) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) > at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) > at javax.management.StandardMBean.invoke(StandardMBean.java:391) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) > at > javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) > at > 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) > at > javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) > at > javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) > at > javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) > at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) > at sun.rmi.transport.Transport$1.run(Transport.java:159) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:155) > at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > >Locked ownable synchronizers: > - locked <49cbecf2> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > > "JMX server connection timeout 189" - Thread t@189 >java.lang.Thread.State: TIMED_WAITING > at java.lang.Object.wait(Native Method) > - waiting on (a [I) > at > com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150) > at java.lang.Thread.run(Thread.java:662) > >Locked ownable synchronizers: > - None > > "web-77" - Thread t@186 >java.lang.Thread.State: WAITING > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <5ab03cb6> (a > 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) >
Re: too many instances of "org.tartarus.snowball.Among" in the heap
nnection.ConnectionThread.connect(ConnectionThread.java:260) at com.wily.introscope.agent.connection.ConnectionThread.run(ConnectionThread.java:64) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None "Agent Execution" - Thread t@10 java.lang.Thread.State: WAITING at java.lang.Object.wait(Native Method) - waiting on <2b54befa> (a com.wily.util.adt.BlockingQueue) at java.lang.Object.wait(Object.java:485) at com.wily.util.adt.BlockingQueue.interruptableDequeue(BlockingQueue.java:123) at com.wily.util.task.AsynchExecutionQueue.doTask(AsynchExecutionQueue.java:200) at com.wily.util.task.ATask$CoreTask.run(ATask.java:132) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None "Agent Heartbeat" - Thread t@5 java.lang.Thread.State: TIMED_WAITING at java.lang.Thread.sleep(Native Method) at com.wily.util.heartbeat.IntervalHeartbeat$HeartbeatRunnable.run(IntervalHeartbeat.java:670) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None "Remove Metric Data Watch Heartbeat Heartbeat" - Thread t@7 java.lang.Thread.State: TIMED_WAITING at java.lang.Thread.sleep(Native Method) at com.wily.util.heartbeat.IntervalHeartbeat$HeartbeatRunnable.run(IntervalHeartbeat.java:670) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None "Configuration Watch Heartbeat Heartbeat" - Thread t@6 java.lang.Thread.State: TIMED_WAITING at java.lang.Thread.sleep(Native Method) at com.wily.util.heartbeat.IntervalHeartbeat$HeartbeatRunnable.run(IntervalHeartbeat.java:670) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None "Signal Dispatcher" - Thread t@4 java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Finalizer" - Thread t@3 java.lang.Thread.State: WAITING at java.lang.Object.wait(Native Method) - waiting on <48c6254f> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) at 
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) Locked ownable synchronizers: - None "Reference Handler" - Thread t@2 java.lang.Thread.State: WAITING at java.lang.Object.wait(Native Method) - waiting on <48bb8adc> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) Locked ownable synchronizers: - None "main" - Thread t@1 java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390) - locked <11dacd96> (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at com.wily.introscope.agent.probe.net.ManagedServerSocket.com_wily_accept14(ManagedServerSocket.java:362) at com.wily.introscope.agent.probe.net.ManagedServerSocket.accept(ManagedServerSocket.java:267) at org.apache.catalina.core.StandardServer.await(StandardServer.java:431) at org.apache.catalina.startup.Catalina.await(Catalina.java:676) at org.apache.catalina.startup.Catalina.start(Catalina.java:628) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Locked ownable synchronizers: - None On Fri, Jul 27, 2012 at 5:19 AM, Alexandre Rafalovitch wrote: > Try taking a couple of thread dumps and see where in the stack the > snowball classes show up. That might give you a clue. > > Did you customize the parameters to the stemmer? If so, maybe it has > problems with the file you gave it. > > Just some generic thoughts that might help. > > Regards, >Alex. 
> Personal blog: http://blog.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all > at once. Lately, it doesn't seem to be working. (Anonymous - via GTD > book) > > > On Fri, Jul 27, 2012 at 3:53 AM, roz dev wrote: > > Hi All > > > > I am trying to find out the reason for very high memory use and ran JMAP > > -hist > > > > It is showing that i have too many instances of > org.tartarus.snowball.Among > > > > Any ideas what is this for and why am I getting so many of them > > > > num #instances#bytes Class description > > > -- > > *1: 467281101869124400 > org.tartarus.snowball.Among > > * > > 2: 5244210 1840458960 byte[] >
too many instances of "org.tartarus.snowball.Among" in the heap
Hi All I am trying to find out the reason for very high memory use and ran JMAP -hist It is showing that i have too many instances of org.tartarus.snowball.Among Any ideas what is this for and why am I getting so many of them num #instances#bytes Class description -- *1: 467281101869124400 org.tartarus.snowball.Among * 2: 5244210 1840458960 byte[] 3: 526519495969839368 char[] 4: 10008928864769280 int[] 5: 10250527410021080 java.util.LinkedHashMap$Entry 6: 4672811 268474232 org.tartarus.snowball.Among[] *7: 8072312 258313984 java.util.HashMap$Entry* 8: 466514 246319392 org.apache.lucene.util.fst.FST$Arc[] 9: 1828542 237600432 java.util.HashMap$Entry[] 10: 3834312 153372480 java.util.TreeMap$Entry 11: 2684700 128865600 org.apache.lucene.util.fst.Builder$UnCompiledNode 12: 4712425 113098200 org.apache.lucene.util.BytesRef 13: 3484836 111514752 java.lang.String 14: 2636045 105441800 org.apache.lucene.index.FieldInfo 15: 1813561 101559416 java.util.LinkedHashMap 16: 6291619 100665904 java.lang.Integer 17: 2684700 85910400 org.apache.lucene.util.fst.Builder$Arc 18: 956998 84215824 org.apache.lucene.index.TermsHashPerField 19: 2892957 69430968 org.apache.lucene.util.AttributeSource$State 20: 2684700 64432800 org.apache.lucene.util.fst.Builder$Arc[] 21: 685595 60332360org.apache.lucene.util.fst.FST 22: 933451 59210944java.lang.Object[] 23: 957043 53594408org.apache.lucene.util.BytesRefHash 24: 591463 42585336 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader 25: 424801 40780896 org.tartarus.snowball.ext.EnglishStemmer 26: 424801 40780896 org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter 27: 1549670 37192080org.apache.lucene.index.Term 28: 849602 33984080 org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter$WordDelimiterConcatenation 29: 424801 27187264 org.apache.lucene.analysis.core.WhitespaceTokenizer 30: 478499 26795944 org.apache.lucene.index.FreqProxTermsWriterPerField 31: 535521 25705008 
org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray 32: 219081 24537072 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter 33: 478499 22967952 org.apache.lucene.index.FieldInvertState 34: 956998 22967952 org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray 35: 478499 22967952 org.apache.lucene.index.TermVectorsConsumerPerField 36: 478499 22967952 org.apache.lucene.index.NormsConsumerPerField 37: 316582 22793904 org.apache.lucene.store.MMapDirectory$MMapIndexInput 38: 906708 21760992 org.apache.lucene.util.AttributeSource$State[] 39: 906708 21760992 org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl 40: 883588 21206112java.util.ArrayList 41: 438192 21033216 org.apache.lucene.store.RAMOutputStream 42: 860601 20654424java.lang.StringBuilder 43: 424801 20390448 org.apache.lucene.analysis.miscellaneous.WordDelimiterIterator 44: 424801 20390448 org.apache.lucene.analysis.core.StopFilter 45: 424801 20390448 org.apache.lucene.analysis.miscellaneous.KeywordMarkerFilter 46: 424801 20390448 org.apache.lucene.analysis.snowball.SnowballFilter 47: 839390 20145360 org.apache.lucene.index.DocumentsWriterDeleteQueue$TermNode -Saroj
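A rough way to read the histogram above is to relate the `Among` counts to the number of `EnglishStemmer` instances. The sketch below does that arithmetic. It assumes the fused first row means 46,728,110 `Among` instances occupying 1,869,124,400 bytes (a reading that gives exactly 40 bytes per instance), and it counts shallow sizes only, so it understates what each stemmer really retains. The class name is hypothetical.

```java
// Arithmetic on figures taken from the jmap histogram in this message.
final class SnowballHeapEstimate {

    // Row 1: org.tartarus.snowball.Among (split of the fused columns is an assumption).
    static final long AMONG_INSTANCES = 46_728_110L;
    static final long AMONG_BYTES = 1_869_124_400L;
    // Row 6: org.tartarus.snowball.Among[]
    static final long AMONG_ARRAY_INSTANCES = 4_672_811L;
    static final long AMONG_ARRAY_BYTES = 268_474_232L;
    // Row 25: org.tartarus.snowball.ext.EnglishStemmer
    static final long STEMMER_INSTANCES = 424_801L;
    static final long STEMMER_BYTES = 40_780_896L;

    // Among objects held per stemmer instance.
    static long amongPerStemmer() {
        return AMONG_INSTANCES / STEMMER_INSTANCES;
    }

    // Among[] arrays held per stemmer instance.
    static long amongArraysPerStemmer() {
        return AMONG_ARRAY_INSTANCES / STEMMER_INSTANCES;
    }

    // Shallow bytes attributable to stemmers: Among + Among[] + the stemmer itself.
    static long totalStemmerBytes() {
        return AMONG_BYTES + AMONG_ARRAY_BYTES + STEMMER_BYTES;
    }

    // Shallow bytes per stemmer instance.
    static long bytesPerStemmer() {
        return totalStemmerBytes() / STEMMER_INSTANCES;
    }

    public static void main(String[] args) {
        System.out.println("Among per stemmer:   " + amongPerStemmer());
        System.out.println("Among[] per stemmer: " + amongArraysPerStemmer());
        System.out.println("bytes per stemmer:   " + bytesPerStemmer());
        System.out.println("total stemmer bytes: " + totalStemmerBytes());
    }
}
```

Under that reading the `Among` objects are not leaking independently. Their count is an exact multiple of the stemmer count, and the same 424,801 figure appears for `WordDelimiterFilter`, `WhitespaceTokenizer`, `StopFilter` and `SnowballFilter`, which suggests the real question is why so many analysis chains exist at once.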
Re: leaks in solr
In my case, I see only 1 searcher and no field cache, yet Old Gen is still almost full at 22 GB.

Does it have to do with the index or some other configuration?

-Saroj

On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog wrote:

> What does the "Statistics" page in the Solr admin say? There might be
> several "searchers" open: org.apache.solr.search.SolrIndexSearcher
>
> Each searcher holds open different generations of the index. If
> obsolete index files are held open, it may be old searchers. How big
> are the caches? How long does it take to autowarm them?
>
> On Thu, Jul 26, 2012 at 6:15 PM, Karthick Duraisamy Soundararaj wrote:
> > Mark,
> > We use solr 3.6.0 on freebsd 9. Over a period of time, it
> > accumulates lots of space!
> >
> > On Thu, Jul 26, 2012 at 8:47 PM, roz dev wrote:
> >
> > > Thanks Mark.
> > >
> > > We are never calling commit or optimize with openSearcher=false.
> > >
> > > As per logs, this is what is happening
> > >
> > > openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
> > >
> > > --
> > > But, We are going to use 4.0 Alpha and see if that helps.
> > >
> > > -Saroj
> > >
> > > On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller wrote:
> > >
> > > > I'd take a look at this issue:
> > > > https://issues.apache.org/jira/browse/SOLR-3392
> > > >
> > > > Fixed late April.
> > > >
> > > > On Jul 26, 2012, at 7:41 PM, roz dev wrote:
> > > >
> > > > > it was from 4/11/12
> > > > >
> > > > > -Saroj
> > > > >
> > > > > On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller wrote:
> > > > >
> > > > > > On Jul 26, 2012, at 3:18 PM, roz dev wrote:
> > > > > >
> > > > > > > Hi Guys
> > > > > > >
> > > > > > > I am also seeing this problem.
> > > > > > >
> > > > > > > I am using SOLR 4 from Trunk and seeing this issue repeat
> > > > > > > every day.
> > > > > > >
> > > > > > > Any inputs about how to resolve this would be great
> > > > > > >
> > > > > > > -Saroj
> > > > > >
> > > > > > Trunk from what date?
> > > > > >
> > > > > > - Mark
> > > >
> > > > - Mark Miller
> > > > lucidimagination.com
>
> --
> Lance Norskog
> goks...@gmail.com
Re: leaks in solr
Thanks Mark.

We are never calling commit or optimize with openSearcher=false.

As per the logs, this is what is happening:

openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}

--
But we are going to use 4.0 Alpha and see if that helps.

-Saroj

On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller wrote:

> I'd take a look at this issue:
> https://issues.apache.org/jira/browse/SOLR-3392
>
> Fixed late April.
>
> On Jul 26, 2012, at 7:41 PM, roz dev wrote:
>
> > it was from 4/11/12
> >
> > -Saroj
> >
> > On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller wrote:
> >
> > > On Jul 26, 2012, at 3:18 PM, roz dev wrote:
> > >
> > > > Hi Guys
> > > >
> > > > I am also seeing this problem.
> > > >
> > > > I am using SOLR 4 from Trunk and seeing this issue repeat every day.
> > > >
> > > > Any inputs about how to resolve this would be great
> > > >
> > > > -Saroj
> > >
> > > Trunk from what date?
> > >
> > > - Mark
>
> - Mark Miller
> lucidimagination.com
Re: leaks in solr
It was from 4/11/12.

-Saroj

On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller wrote:
>
> On Jul 26, 2012, at 3:18 PM, roz dev wrote:
>
> > Hi Guys
> >
> > I am also seeing this problem.
> >
> > I am using SOLR 4 from Trunk and seeing this issue repeat every day.
> >
> > Any inputs about how to resolve this would be great
> >
> > -Saroj
>
> Trunk from what date?
>
> - Mark
Re: leaks in solr
Hi Guys I am also seeing this problem. I am using SOLR 4 from Trunk and seeing this issue repeat every day. Any inputs about how to resolve this would be great -Saroj On Thu, Jul 26, 2012 at 8:33 AM, Karthick Duraisamy Soundararaj < karthick.soundara...@gmail.com> wrote: > Did you find any more clues? I have this problem in my machines as well.. > > On Fri, Jun 29, 2012 at 6:04 AM, Bernd Fehling < > bernd.fehl...@uni-bielefeld.de> wrote: > > > Hi list, > > > > while monitoring my solr 3.6.1 installation I recognized an increase of > > memory usage > > in OldGen JVM heap on my slave. I decided to force Full GC from jvisualvm > > and > > send optimize to the already optimized slave index. Normally this helps > > because > > I have monitored this issue over the past. But not this time. The Full GC > > didn't free any memory. So I decided to take a heap dump and see what > > MemoryAnalyzer > > is showing. The heap dump is about 23 GB in size. > > > > 1.) > > Report Top consumers - Biggest Objects: > > Total: 12.3 GB > > org.apache.lucene.search.FieldCacheImpl : 8.1 GB > > class java.lang.ref.Finalizer : 2.1 GB > > org.apache.solr.util.ConcurrentLRUCache : 1.5 GB > > org.apache.lucene.index.ReadOnlySegmentReader : 622.5 MB > > ... > > > > As you can see, Finalizer has already reached 2.1 GB!!! > > > > * java.util.concurrent.ConcurrentHashMap$Segment[16] @ 0x37b056fd0 > > * segments java.util.concurrent.ConcurrentHashMap @ 0x39b02d268 > > * map org.apache.solr.util.ConcurrentLRUCache @ 0x398f33c30 > > * referent java.lang.ref.Finalizer @ 0x37affa810 > > * next java.lang.ref.Finalizer @ 0x37affa838 > > ... 
> >
> > Seems to be org.apache.solr.util.ConcurrentLRUCache
> > The attributes are:
> >
> > Type    | Name                | Value
> > --------|---------------------|------------------------------------------------------------
> > boolean | isDestroyed         | true
> > ref     | cleanupThread       | null
> > ref     | evictionListener    | null
> > long    | oldestEntry         | 0
> > int     | acceptableWaterMark | 9500
> > ref     | stats               | org.apache.solr.util.ConcurrentLRUCache$Stats @ 0x37b074dc8
> > boolean | islive              | true
> > boolean | newThreadForCleanup | false
> > boolean | isCleaning          | false
> > ref     | markAndSweepLock    | java.util.concurrent.locks.ReentrantLock @ 0x39bf63978
> > int     | lowerWaterMark      | 9000
> > int     | upperWaterMark      | 1
> > ref     | map                 | java.util.concurrent.ConcurrentHashMap @ 0x39b02d268
> >
> > 2.)
> > While searching for open files and their references I noticed that there
> > are references to index files which are already deleted from disk.
> > E.g. recent index files are "data/index/_2iqw.frq" and
> > "data/index/_2iqx.frq".
> > But I also see references to "data/index/_2hid.frq", which is quite old
> > and was deleted way back by earlier replications.
> > I have to analyze this a bit deeper.
> >
> > So much for my report so far; I will go on analyzing this huge heap dump.
> > If you need any other info or even the heap dump, let me know.
> >
> > Regards
> > Bernd
Re: Issue with field collapsing in solr 4 while performing distributed search
I think that there is no way around doing custom logic in this case. If indexing process knows that documents have to be grouped then they better be together. -Saroj On Mon, Jun 11, 2012 at 6:37 AM, Nitesh Nandy wrote: > Martijn, > > How do we add a custom algorithm for distributing documents in Solr Cloud? > According to this discussion > > http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-td3985262.html > , Mark discourages users from using custom distribution mechanism in Solr > Cloud. > > Load balancing is not an issue for us at the moment. In that case, how > should we implement a custom partitioning algorithm. > > > On Mon, Jun 11, 2012 at 6:23 PM, Martijn v Groningen < > martijn.v.gronin...@gmail.com> wrote: > > > The ngroups returns the number of groups that have matched with the > > query. However if you want ngroups to be correct in a distributed > > environment you need > > to put document belonging to the same group into the same shard. > > Groups can't cross shard boundaries. I guess you need to do > > some manual document partitioning. > > > > Martijn > > > > On 11 June 2012 14:29, Nitesh Nandy wrote: > > > Version: Solr 4.0 (svn build 30th may, 2012) with Solr Cloud (2 slices > > and > > > 2 shards) > > > > > > The setup was done as per the wiki: > > http://wiki.apache.org/solr/SolrCloud > > > > > > We are doing distributed search. While querying, we use field > collapsing > > > with "ngroups" set as true as we need the number of search results. > > > > > > However, there is a difference in the number of "result list" returned > > and > > > the "ngroups" value returned. 
> > > > > > Ex: > > > > > > http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&&group=true&group.field=id&group.ngroups=true > > > > > > > > > The response XMl looks like > > > > > > > > > > > > > > > 0 > > > 46 > > > > > > id > > > true > > > true > > > messagebody:monit AND usergroupid:3 > > > > > > > > > > > > > > > 10 > > > 9 > > > > > > > > > 320043 > > > > > > ... > > > > > > > > > > > > 398807 > > > ... > > > > > > > > > > > > 346878 > > > ... > > > > > > > > > 346880 > > > ... > > > > > > > > > > > > > > > > > > > > > So you can see that the ngroups value returned is 9 and the actual > number > > > of groups returned is 4 > > > > > > Why do we have this discrepancy in the ngroups, matches and actual > number > > > of groups. Is this an open issue ? > > > > > > Any kind of help is appreciated. > > > > > > -- > > > Regards, > > > > > > Nitesh Nandy > > > > > > > > -- > > Met vriendelijke groet, > > > > Martijn van Groningen > > > > > > -- > Regards, > > Nitesh Nandy >
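Martijn's point that groups cannot cross shard boundaries means the indexing side has to choose the shard from the group key. The following is only a sketch of that idea, assuming the indexing client decides which shard receives each document; `shard_for_group`, the group keys and the shard count are hypothetical and not part of Solr:

```python
import hashlib


def shard_for_group(group_value: str, num_shards: int) -> int:
    """Map a group key to a shard index so equal keys always share a shard."""
    # md5 is used only as a stable hash; Python's built-in hash() is salted per process.
    digest = hashlib.md5(group_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical documents: "group" stands for the value of the group.field.
docs = [
    {"id": "320043", "group": "thread-a"},
    {"id": "398807", "group": "thread-b"},
    {"id": "346878", "group": "thread-a"},
]

# Documents with the same group value are always placed on the same shard.
placement = {d["id"]: shard_for_group(d["group"], 2) for d in docs}
```

With every member of a group on one shard, each shard can count its own groups without any group being counted twice, which is the precondition Martijn describes for a correct ngroups.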
Re: How to do custom sorting in Solr?
Yes, these documents have lots of unique values as the same product could be assigned to lots of other categories and that too, in a different sort order. We did some evaluation of heap usage and found that with kind of queries we generate, heap usage was going up to 24-26 GB. I could trace it to the fact that fieldCache is creating an array of 2M size for each of the sort fields. Since same products are mapped to multiple categories, we incur significant memory overhead. Therefore, any solve where memory consumption can be reduced is a good one for me. In fact, we have situations where same product is mapped to more than 1 sub-category in the same category like Books -- Programming - Java in a nutshell -- Sale (40% off) - Java in a nutshell So,another thought in my mind is to somehow use second pass collector to group books appropriately in Programming and Sale categories, with right sort order. But, i have no clue about that piece :( -Saroj On Sun, Jun 10, 2012 at 4:30 PM, Erick Erickson wrote: > 2M docs is actually pretty small. Sorting is sensitive to the number > of _unique_ values in the sort fields, not necessarily the number of > documents. > > And sorting only works on fields with a single value (i.e. it can't have > more than one token after analysis). So for each field you're only talking > 2M values at the vary maximum, assuming that the field in question has > a unique value per document, which I doubt very much given your > problem description. > > So with a corpus that size, I'd "just try it'. > > Best > Erick > > On Sun, Jun 10, 2012 at 7:12 PM, roz dev wrote: > > Thanks Erik for your quick feedback > > > > When Products are assigned to a category or Sub-Category then they can be > > in any order and price type can be regular or markdown. 
> > So, reg and markdown products are intermingled as per their assignment > but > > I want to sort them in such a way that we > > ensure that all the products which are on markdown are at the bottom of > the > > list. > > > > I can use these multiple sorts but I realize that they are costly in > terms > > of heap used, as they are using FieldCache. > > > > I have an index with 2M docs and docs are pretty big. So, I don't want to > > use them unless there is no other option. > > > > I am wondering if I can define a custom function query which can be like > > this: > > > > > > - check if product is on the markdown > > - if yes then change its sort order field to be the max value in the > > given sub-category, say 99 > > - else, use the sort order of the product in the sub-category > > > > I have been looking at existing function queries but do not have a good > > handle on how to make one of my own. > > > > - Another option could be use a custom sort comparator but I am not sure > > about the way it works > > > > Any thoughts? > > > > > > -Saroj > > > > > > > > > > On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson >wrote: > > > >> Skimming this, I two options come to mind: > >> > >> 1> Simply apply primary, secondary, etc sorts. Something like > >> &sort=subcategory asc,markdown_or_regular desc,sort_order asc > >> > >> 2> You could also use grouping to arrange things in groups and sort > within > >> those groups. This has the advantage of returning some members > >> of each of the top N groups in the result set, which makes it > easier > >> to > >> get some of each group rather than having to analyze the whole > >> list > >> > >> But your example is somewhat contradictory. 
You say > >> "products which are on markdown, are at > >> the bottom of the documents list" > >> > >> But in your examples, products on "markdown" are intermingled > >> > >> Best > >> Erick > >> > >> On Sun, Jun 10, 2012 at 3:36 AM, roz dev wrote: > >> > Hi All > >> > > >> >> > >> >> I have an index which contains a Catalog of Products and Categories, > >> with > >> >> Solr 4.0 from trunk > >> >> > >> >> Data is organized like this: > >> >> > >> >> Category: Books > >> >> > >> >> Sub Category: Programming > >> >> > >> >> Products: > >> >> > >> >> Product # 1, Price: Regular Sort Order:1 > >> >> Product # 2, Price: Markdown, So
Re: How to do custom sorting in Solr?
Thanks Erik for your quick feedback When Products are assigned to a category or Sub-Category then they can be in any order and price type can be regular or markdown. So, reg and markdown products are intermingled as per their assignment but I want to sort them in such a way that we ensure that all the products which are on markdown are at the bottom of the list. I can use these multiple sorts but I realize that they are costly in terms of heap used, as they are using FieldCache. I have an index with 2M docs and docs are pretty big. So, I don't want to use them unless there is no other option. I am wondering if I can define a custom function query which can be like this: - check if product is on the markdown - if yes then change its sort order field to be the max value in the given sub-category, say 99 - else, use the sort order of the product in the sub-category I have been looking at existing function queries but do not have a good handle on how to make one of my own. - Another option could be use a custom sort comparator but I am not sure about the way it works Any thoughts? -Saroj On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson wrote: > Skimming this, I two options come to mind: > > 1> Simply apply primary, secondary, etc sorts. Something like > &sort=subcategory asc,markdown_or_regular desc,sort_order asc > > 2> You could also use grouping to arrange things in groups and sort within > those groups. This has the advantage of returning some members > of each of the top N groups in the result set, which makes it easier > to > get some of each group rather than having to analyze the whole > list > > But your example is somewhat contradictory. 
You say > "products which are on markdown, are at > the bottom of the documents list" > > But in your examples, products on "markdown" are intermingled > > Best > Erick > > On Sun, Jun 10, 2012 at 3:36 AM, roz dev wrote: > > Hi All > > > >> > >> I have an index which contains a Catalog of Products and Categories, > with > >> Solr 4.0 from trunk > >> > >> Data is organized like this: > >> > >> Category: Books > >> > >> Sub Category: Programming > >> > >> Products: > >> > >> Product # 1, Price: Regular Sort Order:1 > >> Product # 2, Price: Markdown, Sort Order:2 > >> Product # 3 Price: Regular, Sort Order:3 > >> Product # 4 Price: Regular, Sort Order:4 > >> > >> . > >> ... > >> Product # 100 Price: Regular, Sort Order:100 > >> > >> Sub Category: Fiction > >> > >> Products: > >> > >> Product # 1, Price: Markdown, Sort Order:1 > >> Product # 2, Price: Regular, Sort Order:2 > >> Product # 3 Price: Regular, Sort Order:3 > >> Product # 4 Price: Markdown, Sort Order:4 > >> > >> . > >> ... > >> Product # 70 Price: Regular, Sort Order:70 > >> > >> > >> I want to query Solr and sort these products within each of the > >> sub-category in a such a way that products which are on markdown, are at > >> the bottom of the documents list and other products > >> which are on regular price, are sorted as per their sort order in their > >> sub-category. > >> > >> Expected Results are > >> > >> Category: Books > >> > >> Sub Category: Programming > >> > >> Products: > >> > >> Product # 1, Price: Regular Sort Order:1 > >> Product # 2, Price: Markdown, Sort Order:101 > >> Product # 3 Price: Regular, Sort Order:3 > >> Product # 4 Price: Regular, Sort Order:4 > >> > >> . > >> ... > >> Product # 100 Price: Regular, Sort Order:100 > >> > >> Sub Category: Fiction > >> > >> Products: > >> > >> Product # 1, Price: Markdown, Sort Order:71 > >> Product # 2, Price: Regular, Sort Order:2 > >> Product # 3 Price: Regular, Sort Order:3 > >> Product # 4 Price: Markdown, Sort Order:71 > >> > >> . > >> ... 
> >> Product # 70 Price: Regular, Sort Order:70 > >> > >> > >> My query is like this: > >> > >> q=*:*&fq=category:Books > >> > >> What are the options to implement custom sorting and how do I do it? > >> > >> > >>- Define a Custom Function query? > >>- Define a Custom Comparator? Or, > >>- Define a Custom Collector? > >> > >> > >> Please let me know the best way to go about it and any pointers to > >> customize Solr 4. > >> > > > > Thanks > > Saroj >
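The ordering asked for here (regular-price products first in their own sort order, markdown products at the bottom of each sub-category) is the same ordering that Erick's multi-field sort would produce. A small sketch of that logic outside Solr, assuming hypothetical field names `subcategory`, `price_type` and `sort_order`:

```python
def sort_key(product: dict) -> tuple:
    """Order by sub-category, then regular before markdown, then sort order."""
    # False sorts before True, so regular-price products come first.
    on_markdown = product["price_type"] == "markdown"
    return (product["subcategory"], on_markdown, product["sort_order"])


products = [
    {"name": "P1", "subcategory": "Programming", "price_type": "regular", "sort_order": 1},
    {"name": "P2", "subcategory": "Programming", "price_type": "markdown", "sort_order": 2},
    {"name": "P3", "subcategory": "Programming", "price_type": "regular", "sort_order": 3},
    {"name": "F1", "subcategory": "Fiction", "price_type": "markdown", "sort_order": 1},
    {"name": "F2", "subcategory": "Fiction", "price_type": "regular", "sort_order": 2},
]

ordered = sorted(products, key=sort_key)
names = [p["name"] for p in ordered]
```

This only illustrates the comparison; it says nothing about the heap cost of sorting on several fields, which is the concern raised in the thread.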
Re: How to do custom sorting in Solr?
Hi All > > I have an index which contains a Catalog of Products and Categories, with > Solr 4.0 from trunk > > Data is organized like this: > > Category: Books > > Sub Category: Programming > > Products: > > Product # 1, Price: Regular Sort Order:1 > Product # 2, Price: Markdown, Sort Order:2 > Product # 3 Price: Regular, Sort Order:3 > Product # 4 Price: Regular, Sort Order:4 > > . > ... > Product # 100 Price: Regular, Sort Order:100 > > Sub Category: Fiction > > Products: > > Product # 1, Price: Markdown, Sort Order:1 > Product # 2, Price: Regular, Sort Order:2 > Product # 3 Price: Regular, Sort Order:3 > Product # 4 Price: Markdown, Sort Order:4 > > . > ... > Product # 70 Price: Regular, Sort Order:70 > > > I want to query Solr and sort these products within each of the > sub-category in a such a way that products which are on markdown, are at > the bottom of the documents list and other products > which are on regular price, are sorted as per their sort order in their > sub-category. > > Expected Results are > > Category: Books > > Sub Category: Programming > > Products: > > Product # 1, Price: Regular Sort Order:1 > Product # 2, Price: Markdown, Sort Order:101 > Product # 3 Price: Regular, Sort Order:3 > Product # 4 Price: Regular, Sort Order:4 > > . > ... > Product # 100 Price: Regular, Sort Order:100 > > Sub Category: Fiction > > Products: > > Product # 1, Price: Markdown, Sort Order:71 > Product # 2, Price: Regular, Sort Order:2 > Product # 3 Price: Regular, Sort Order:3 > Product # 4 Price: Markdown, Sort Order:71 > > . > ... > Product # 70 Price: Regular, Sort Order:70 > > > My query is like this: > > q=*:*&fq=category:Books > > What are the options to implement custom sorting and how do I do it? > > >- Define a Custom Function query? >- Define a Custom Comparator? Or, >- Define a Custom Collector? > > > Please let me know the best way to go about it and any pointers to > customize Solr 4. > Thanks Saroj
Is there any performance cost of using lots of OR in the solr query
Hi All,

I am working on an application which makes a few Solr calls to get its data. At a high level, the requirement is:

- Make a first call to Solr to get the list of products which are children of a given category.
- Make a second Solr call to get product documents based on a list of product ids.

The second query will look like

q=document_type:SKU&fq=product_id:(34 OR 45 OR 56 OR 77)

We can have close to 100 product ids in fq. Is there a performance cost to Solr calls which have lots of ORs?

As per slide 41 of the presentation "The Seven Deadly Sins of Solr", it is a bad idea to have these kinds of queries.

http://www.slideshare.net/lucenerevolution/hill-jay-7-sins-of-solrpdf

But the presentation does not make clear why it is bad.

Any inputs will be welcome.

Thanks
Saroj
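For reference, the second call's filter can be assembled as below. This is a sketch only: `build_product_filter` is a hypothetical helper, and sorting and de-duplicating the ids is an assumption made so that the same set of products always yields the same filter string. It does not answer the performance question.

```python
def build_product_filter(product_ids, field="product_id"):
    """Build an fq value such as 'product_id:(34 OR 45 OR 56 OR 77)'."""
    unique_ids = sorted(set(product_ids))
    if not unique_ids:
        raise ValueError("at least one product id is required")
    # Each id becomes one optional clause of a single boolean query.
    return f"{field}:({' OR '.join(str(i) for i in unique_ids)})"


fq = build_product_filter([56, 34, 34, 45, 77])
```

One thing worth checking with queries of this shape is the `maxBooleanClauses` setting in solrconfig.xml (1024 in the stock configuration), which caps the number of clauses; about 100 ids stays well under it.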
Solr Cloud, Commits and Master/Slave configuration
Hi All,

I am trying to understand the features of Solr Cloud regarding commits and scaling.

- If I am using Solr Cloud, do I need to explicitly call commit (hard commit)? Or is a soft commit enough, with Solr Cloud taking care of writing to disk?
- Do we still need a Master/Slave setup to scale searching? If we do, do I need to issue a hard commit to make my changes visible to the slaves?
- If I were to use NRT with a Master/Slave setup and soft commits, would the slaves be able to see changes made on the master with a soft commit?

Any inputs are welcome.

Thanks
-Saroj
Re: hot deploy of newer version of solr schema in production
Thanks Jan for your inputs. I am keen to know about the way people keep running live sites while there is a breaking change which calls for complete re-indexing. we want to build a new index , with new schema (it may take couple of hours) without impacting live e-commerce site. any thoughts are welcome Thanks Saroj On Tue, Jan 24, 2012 at 12:21 AM, Jan Høydahl wrote: > Hi, > > To be able to do a true hot deploy of newer schema without reindexing, you > must carefully see to that none of your changes are breaking changes. So > you should test the process on your development machine and make sure it > works. Adding and deleting fields would work, but not changing the > field-type or analysis of an existing field. Depending on from/to version, > you may want to keep the old schema-version number. > > The process is: > 1. Deploy the new schema, including all dependencies such as dictionaries > 2. Do a RELOAD CORE http://wiki.apache.org/solr/CoreAdmin#RELOAD > > My preference is to do a more thorough upgrade of schema including new > functionality and breaking changes, and then do a full reindex. The > exception is if my index is huge and the reason for Solr upgrade or schema > change is to fix a bug, not to use new functionality. > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > Solr Training - www.solrtraining.com > > On 24. jan. 2012, at 01:51, roz dev wrote: > > > Hi All, > > > > I need community's feedback about deploying newer versions of solr schema > > into production while existing (older) schema is in use by applications. > > > > How do people perform these things? What has been the learning of people > > about this. > > > > Any thoughts are welcome. > > > > Thanks > > Saroj > >
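Jan's two steps end with a CoreAdmin RELOAD, and for a breaking change one common pattern is to build the new index in a second core and then exchange the two cores with CoreAdmin SWAP. The sketch below only builds the request URLs; the base URL and the core names `core0`, `live` and `rebuild` are assumptions, and whether SWAP suits a given deployment has to be checked against the Solr version in use.

```python
from urllib.parse import urlencode


def core_reload_url(base_url: str, core: str) -> str:
    """URL that asks CoreAdmin to reload one core after a schema change."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def core_swap_url(base_url: str, core: str, other: str) -> str:
    """URL that asks CoreAdmin to swap a live core with a freshly built one."""
    query = urlencode({"action": "SWAP", "core": core, "other": other})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


reload_url = core_reload_url("http://localhost:8983/solr/", "core0")
swap_url = core_swap_url("http://localhost:8983/solr", "live", "rebuild")
```

The URLs would be requested with any HTTP client once the new schema (RELOAD) or the fully rebuilt core (SWAP) is in place.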
hot deploy of newer version of solr schema in production
Hi All,

I need the community's feedback about deploying newer versions of a Solr schema into production while the existing (older) schema is in use by applications.

How do people handle this? What have people learned from doing it?

Any thoughts are welcome.

Thanks
Saroj
Index format difference between 4.0 and 3.4
Hi All,

We are using Solr 1.4.1 in production and are considering an upgrade to a newer version.

It seems that Solr 3.x requires a complete rebuild of the index, as the format seems to have changed.

Is the Solr 4.0 index file format compatible with the Solr 3.x format?

Please advise.

Thanks
Saroj
Re: Production Issue: SolrJ client throwing this error even though field type is not defined in schema
This issue disappeared when we reduced the number of documents which were being returned from Solr. Looks to be some issue with Tomcat or Solr, returning truncated responses. -Saroj On Sun, Sep 25, 2011 at 9:21 AM, wrote: > If I had to give a gentle nudge, I would ask you to validate your schema > XML file. You can do so by looking for any w3c XML validator website and > just copy pasting the text there to find out where its malformed. > > Sent from my iPhone > > On Sep 24, 2011, at 2:01 PM, Erick Erickson > wrote: > > > You might want to review: > > > > http://wiki.apache.org/solr/UsingMailingLists > > > > There's really not much to go on here. > > > > Best > > Erick > > > > On Wed, Sep 21, 2011 at 12:13 PM, roz dev wrote: > >> Hi All > >> > >> We are getting this error in our Production Solr Setup. > >> > >> Message: Element type "t_sort" must be followed by either attribute > >> specifications, ">" or "/>". > >> Solr version is 1.4.1 > >> > >> Stack trace indicates that solr is returning malformed document. > >> > >> > >> Caused by: org.apache.solr.client.solrj.SolrServerException: Error > >> executing query > >>at > org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) > >>at > org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) > >>at > com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232) > >>... 15 more > >> Caused by: org.apache.solr.common.SolrException: parsing error > >>at > org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140) > >>at > org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101) > >>at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481) > >>at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) > >>at > org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) > >>... 
17 more > >> Caused by: javax.xml.stream.XMLStreamException: ParseError at > >> [row,col]:[3,136974] > >> Message: Element type "t_sort" must be followed by either attribute > >> specifications, ">" or "/>". > >>at > com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594) > >>at > org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282) > >>at > org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410) > >>at > org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360) > >>at > org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241) > >>at > org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125) > >>... 21 more > >> >
Re: Production Issue: SolrJ client throwing - Element type must be followed by either attribute specifications, ">" or "/>".
Wanted to update the list with our finding. We reduced the number of documents which are being retrieved from Solr and this error did not appear again. Might be the case that due to high number of documents, solr is returning incomplete documents. -Saroj On Wed, Sep 21, 2011 at 12:13 PM, roz dev wrote: > Hi All > > We are getting this error in our Production Solr Setup. > > Message: Element type "t_sort" must be followed by either attribute > specifications, ">" or "/>". > Solr version is 1.4.1 > > Stack trace indicates that solr is returning malformed document. > > > Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing > query > at > org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) > at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) > at > com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232) > ... 15 more > Caused by: org.apache.solr.common.SolrException: parsing error > at > org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140) > at > org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) > at > org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) > ... 17 more > Caused by: javax.xml.stream.XMLStreamException: ParseError at > [row,col]:[3,136974] > Message: Element type "t_sort" must be followed by either attribute > specifications, ">" or "/>". 
> at > com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594) > at > org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282) > at > org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410) > at > org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360) > at > org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241) > at > org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125) > ... 21 more > >
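The workaround reported here was to retrieve fewer documents per request. One way to keep the total the same while shrinking each response is to page with the standard `start` and `rows` parameters. A minimal sketch, assuming the caller knows the total number of hits; `page_params` is a hypothetical helper and this does not explain why the large response was malformed:

```python
def page_params(total: int, rows: int):
    """Yield start/rows pairs that cover `total` results in pages of `rows`."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    for start in range(0, total, rows):
        # The last page may be shorter than the others.
        yield {"start": start, "rows": min(rows, total - start)}


pages = list(page_params(250, 100))
```

Each dictionary would be merged into the query parameters of one request.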
Production Issue: SolrJ client throwing this error even though field type is not defined in schema
Hi All We are getting this error in our Production Solr Setup. Message: Element type "t_sort" must be followed by either attribute specifications, ">" or "/>". Solr version is 1.4.1 Stack trace indicates that solr is returning malformed document. Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) at com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232) ... 15 more Caused by: org.apache.solr.common.SolrException: parsing error at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140) at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) ... 17 more Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3,136974] Message: Element type "t_sort" must be followed by either attribute specifications, ">" or "/>". at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594) at org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282) at org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410) at org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360) at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241) at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125) ... 21 more
q and fq in solr 1.4.1
Hi All

I am sure that the q vs. fq question has been answered several times, but I still have a question I would like answered.

If we have a Solr query like this

q=*&fq=field_1:XYZ&fq=field_2:ABC&sortBy=field_3+asc

how does SolrIndexSearcher execute it in 1.4.1? Will it run the query against the whole index first (because q=*) and then filter the results against field_1 and field_2, or does it do both in parallel?

And if we ask for only 20 rows at a time, will Solr do the following?

1) get all the docs (because q is set to *) and sort them by field_3
2) then filter the results by field_1 and field_2

Or will it apply sorting after doing the filter?

Please let me know how Solr 1.4.1 works.

Thanks
Saroj
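As a way to frame the question, here is a conceptual model in which a document is only considered for sorting once it has passed both the main query and every filter, and only the top rows are kept. This is an illustration under that assumption, not a description of the actual SolrIndexSearcher code in 1.4.1; the function and the sample documents are hypothetical.

```python
import heapq


def search(all_docs, main_query, filters, sort_field, rows):
    """Return the top `rows` docs, ascending by sort_field, that pass q and all fq."""
    # Lazily keep only documents accepted by the main query and by every filter.
    matching = (d for d in all_docs if main_query(d) and all(f(d) for f in filters))
    # Keep just the best `rows` candidates instead of sorting every document.
    return heapq.nsmallest(rows, matching, key=lambda d: d[sort_field])


docs = [
    {"id": 1, "field_1": "XYZ", "field_2": "ABC", "field_3": 5},
    {"id": 2, "field_1": "XYZ", "field_2": "ABC", "field_3": 2},
    {"id": 3, "field_1": "XYZ", "field_2": "DEF", "field_3": 1},
    {"id": 4, "field_1": "QQQ", "field_2": "ABC", "field_3": 0},
]

top = search(
    docs,
    main_query=lambda d: True,  # stands in for a match-all q
    filters=[lambda d: d["field_1"] == "XYZ", lambda d: d["field_2"] == "ABC"],
    sort_field="field_3",
    rows=20,
)
top_ids = [d["id"] for d in top]
```

In this model documents 3 and 4 never reach the sort step, even though they have the smallest field_3 values.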
cache invalidation in slaves
Hi All

Solr has different types of caches, such as filterCache, queryResultCache and documentCache. I know that when a commit is done, a new searcher is opened and new caches are built. This makes sense.

What happens when commits are happening on the master and the slaves are pulling the delta updates?

Do slaves discard their caches and rebuild them every time new delta index updates are downloaded to the slave?

Thanks
Saroj
what is the default value of omitNorms and termVectors in solr schema
Hi

As per this document, http://wiki.apache.org/solr/FieldOptionsByUseCase, omitNorms and termVectors have to be "explicitly" specified in some cases.

I am wondering what the default values of these settings are if the Solr schema definition does not state them.

*Example:*

In the above case, will Solr create norms for this field, and term vectors as well?

Any ideas?

Thanks
Saroj
Re: Does Solr flush to disk even before ramBufferSizeMB is hit?
Thanks Shawn.

If Solr writes this info to disk as soon as possible (which is what I am seeing), then the ramBufferSizeMB setting seems to be misleading.

Does anyone else have any thoughts on this?

-Saroj

On Mon, Aug 29, 2011 at 6:14 AM, Shawn Heisey wrote:
> On 8/28/2011 11:18 PM, roz dev wrote:
>
>> I notice that even though InfoStream does not mention that data is being
>> flushed to disk, new segment files were created on the server.
>> Size of these files kept growing even though there was enough Heap
>> available and 856MB Ram was not even used.
>
> With the caveat that I am not an expert and someone may correct me, I'll
> offer this: It's been my experience that Solr will write the files that
> constitute stored fields as soon as they are available, because that
> information is always the same and nothing will change in those files based
> on the next chunk of data.
>
> Thanks,
> Shawn
Does Solr flush to disk even before ramBufferSizeMB is hit?
Hi All,

I am trying to tune ramBufferSizeMB and the merge factor for my setup.

So I enabled the Lucene IndexWriter's infoStream logging and started monitoring the data folder where the index files are created.

I started my test with the following:

- Heap: 3GB
- Solr 1.4.1
- Index Size = 20 GB
- ramBufferSizeMB=856
- Merge Factor=25

I ran my test with 30 concurrent threads writing to Solr. My jobs delete 6 (approx) records by issuing a deleteByQuery command and then proceed to write data. A commit is done at the end of the writing process.

The results are a bit surprising to me and I need some help understanding them.

I notice that even though InfoStream does not mention that data is being flushed to disk, new segment files were created on the server. The size of these files kept growing even though there was enough heap available and the 856MB of RAM was not even used.

Is it the case that Lucene is flushing to disk even before ramBufferSizeMB is hit? If so, why is InfoStream not logging this?

As per InfoStream, it is flushing at the end, but the files are created much before that.

Here is what InfoStream is saying. Please note that it indicates that a new segment is being flushed at 12:58 AM, but the files were created at 12:53 AM and they kept growing.
Aug 29, 2011 12:46:00 AM IW 0 [main]: setInfoStream: dir=org.apache.lucene.store.NIOFSDirectory@/opt/gid/solr/ecom/data/index autoCommit=false mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@4552a64dmergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@35242cc9ramBufferSizeMB=856.0 maxBufferedDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=1 index=_3l:C2151995 Aug 29, 2011 12:57:35 AM IW 0 [web-1]: now flush at close Aug 29, 2011 12:57:35 AM IW 0 [web-1]: flush: now pause all indexing threads Aug 29, 2011 12:57:35 AM IW 0 [web-1]: flush: segment=_3m docStoreSegment=_3m docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=true numDocs=60788 numBufDelTerms=60788 Aug 29, 2011 12:57:35 AM IW 0 [web-1]: index before flush _3l:C2151995 Aug 29, 2011 12:57:35 AM IW 0 [web-1]: DW: flush postings as segment _3m numDocs=60788 Aug 29, 2011 12:57:35 AM IW 0 [web-1]: DW: closeDocStore: 2 files to flush to segment _3m numDocs=60788 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=9 total now 9 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks blockSize=32768 count=182 total now 182 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=49 total now 49 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=7 total now 16 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks blockSize=32768 count=145 total now 327 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=37 total now 86 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=9 total now 25 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks blockSize=32768 count=208 total now 535 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=52 total now 138 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=7 total now 32 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks blockSize=32768 count=136 total now 671 Aug 29, 2011 12:57:40 AM 
IW 0 [web-1]: DW: DW.recycleCharBlocks count=39 total now 177 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=3 total now 35 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks blockSize=32768 count=58 total now 729 Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=16 total now 193 Aug 29, 2011 12:57:41 AM IW 0 [web-1]: DW: oldRAMSize=50469888 newFlushedSize=161169038 docs/MB=395.491 new/old=319.337% Aug 29, 2011 12:57:41 AM IFD [web-1]: now checkpoint "segments_1x" [2 segments ; isCommit = false] Aug 29, 2011 12:57:41 AM IW 0 [web-1]: DW: apply 60788 buffered deleted terms and 0 deleted docIDs and 1 deleted queries on 2 segments. Aug 29, 2011 12:57:42 AM IFD [web-1]: now checkpoint "segments_1x" [2 segments ; isCommit = false] Aug 29, 2011 12:57:42 AM IFD [web-1]: now checkpoint "segments_1x" [2 segments ; isCommit = false] Aug 29, 2011 12:57:42 AM IW 0 [web-1]: LMP: findMerges: 2 segments Aug 29, 2011 12:57:42 AM IW 0 [web-1]: LMP: level 6.6799455 to 7.4299455: 1 segments Aug 29, 2011 12:57:42 AM IW 0 [web-1]: LMP: level 5.1209826 to 5.8709826: 1 segments Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: now merge Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: index: _3l:C2151995 _3m:C60788 Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: no more merges pending; now return Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: now merge Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: index: _3l:C2151995 _3m:C60788 Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: no more merges pending; now return Aug 29, 2011 12:57:42 AM IW 0 [web-1]: now call final commit() Aug 29, 2011 12:57:42 AM IW 0 [web-1]: startCommit(): start sizeInBytes=0 Aug 29, 2011 12:57:42
SolrJ Question about Bad Request Root cause error
Hi All, We are using the SolrJ client (v 1.4.1) to integrate with our Solr search server. We notice that whenever a SolrJ request does not match the Solr schema, we get a Bad Request exception, which makes sense: org.apache.solr.common.SolrException: Bad Request. But the SolrJ client does not provide any clue about why the request is bad. Is there any way to get the root cause on the client side? Of course, the Solr server logs have enough info to show that the data is bad, but it would be great to have the same info in the exception generated by SolrJ. Any thoughts? Is there any plan to add this in future releases? Thanks, Saroj
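For anyone wanting to see exactly what the client-side exception carries, here is a minimal sketch that walks an exception's cause chain and prints each level. It uses only the JDK; the class name ExceptionChain and the exceptions in main are stand-ins invented for illustration, not SolrJ classes, and the "detail" message is hypothetical. It can only surface what the client library chose to put into the exception, so if the server's reason was never copied in, it will not appear here.

```java
import java.util.ArrayList;
import java.util.List;

public class ExceptionChain {

    // Upper bound on how far we follow getCause(), as a guard against odd chains.
    private static final int MAX_DEPTH = 32;

    /** Returns "ClassName: message" for the exception and each of its causes, outermost first. */
    public static List<String> describe(Throwable t) {
        List<String> lines = new ArrayList<>();
        int depth = 0;
        while (t != null && depth < MAX_DEPTH) {
            lines.add(t.getClass().getName() + ": " + t.getMessage());
            t = t.getCause();
            depth++;
        }
        return lines;
    }

    /** Returns the innermost cause, or the exception itself if it has no cause. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        int depth = 0;
        while (current.getCause() != null && depth < MAX_DEPTH) {
            current = current.getCause();
            depth++;
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for an exception caught around a client call (hypothetical messages).
        Throwable caught = new RuntimeException(
                "Bad Request",
                new IllegalArgumentException("hypothetical detail from a lower layer"));

        for (String line : describe(caught)) {
            System.out.println(line);
        }
    }
}
```

In real code, describe(e) would be called from the catch block around the SolrJ request. If it prints only the single "Bad Request" line, the detail is available only in the server logs.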