Can we manipulate termfreq to count as 1 for multiple matches?

2013-03-13 Thread roz dev
Hi All

I am wondering if there is a way to have Solr treat the term frequency of a certain
field as 1, even if there are multiple matches in that document.

Use Case is:

Let's say that I have a document with 2 fields

- Name and
- Description

And, there is a document with data like this

Document_1
Name = Blue Jeans
Description = This jeans is very soft.  Jeans is pretty nice.

Now, if I search for "Jeans", then "Jeans" is found in 2 places in the
Description field.

Term Frequency for Description is 2

I want Solr to count term frequency for Description as 1 even if "Jeans" is
found multiple times in this field.

For all other fields, I do want to get the term frequency as it is.

Is this doable in Solr with any of the functions?
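
For reference (not from this thread), one way this is often handled: if the goal
is that repeated occurrences in Description should count as a single occurrence,
term frequencies can be dropped for just that field in schema.xml, at the cost of
phrase queries on it. The field definition below is only a sketch, not the actual
schema:

  <field name="description" type="text_general" indexed="true" stored="true"
         omitTermFreqAndPositions="true"/>

Alternatively, if the capped value is only needed inside a function query (for a
boost or a sort), something like map(termfreq(description,'jeans'),1,10000,1)
should work on Solr 4.x, where the termfreq() function is available.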

Any inputs are welcome.

Thanks
Saroj


Re: can we configure spellcheck to be invoked after request processing?

2013-03-04 Thread roz dev
James,

You are right. I was setting up spell checker incorrectly.

It works correctly as you described.

The spell checker is invoked after the query component, and it does not stop
Solr from executing the query.

Thanks for correcting me.
Saroj





On Fri, Mar 1, 2013 at 7:30 AM, Dyer, James wrote:

> I'm a little confused here because if you are searching q=jeap OR denim ,
> then you should be getting both documents back.  Having spellcheck
> configured does not affect your search results at all.  Having it in your
> request will sometimes result in spelling suggestions, usually if one or
> more of the terms you queried is not in the index.  But if all of your query terms
> are optional then you need only have 1 term match anything to get results.
>  You should get the same results regardless of whether or not you have
> spellcheck in the request.
>
> While spellcheck does not affect your query results, the results do affect
> spellcheck.  This is why you should put spellcheck in the "last-components"
> section of your request handler configuration.  This ensures that the query
> is run before spellcheck.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: roz dev [mailto:rozde...@gmail.com]
> Sent: Thursday, February 28, 2013 6:33 PM
> To: solr-user@lucene.apache.org
> Subject: can we configure spellcheck to be invoked after request
> processing?
>
> Hi All,
> I may be asking a stupid question but please bear with me.
>
> Is it possible to configure Spell check to be invoked after Solr has
> processed the original query?
>
> My use case is :
>
> I am using DirectSpellChecker and have a document which has "Denim" as a
> term and there is another document which has "Jeap".
>
> I am issuing a Search as "Jean" or "Denim"
>
> I am finding that this Solr query is giving me ZERO results and suggesting
> "Jeap" as an alternative.
>
> I want Solr to try to run the query for "Jean" or "Denim" and, only if there
> are no results found, suggest "Jeap" as an alternative
>
> Is this doable in Solr?
>
> Any suggestions.
>
> -Saroj
>
>
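
For reference, the "last-components" wiring James describes above usually looks
roughly like this in solrconfig.xml (a sketch with the common default names, not
necessarily this exact configuration):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">default</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

With this, the query component runs first and the spellcheck component only adds
suggestions to the response afterwards.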


Re: How to re-read the config files in Solr, on a commit

2012-11-06 Thread roz dev
Thanks Otis for pointing this out.

We may end up using search-time synonyms for single-word synonyms and
index-time synonyms for multi-word synonyms.

-Saroj


On Tue, Nov 6, 2012 at 8:09 PM, Otis Gospodnetic  wrote:

> Hi,
>
> Note about modifying synonyms - you need to reindex, really, if using
> index-time synonyms. And if you're using search-time synonyms you have
> multi-word synonym issue described on the Wiki.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
> On Nov 6, 2012 11:02 PM, "roz dev"  wrote:
>
> > Erick
> >
> > We have a requirement where seach admin can add or remove some synonyms
> and
> > would want these changes to be reflected in search thereafter.
> >
> > yes, we looked at reload command and it seems to be suitable for that
> > purpose. We have a master and slave setup so it should be OK to issue
> > reload command on master. I expect that slaves will pull the latest
> config
> > files.
> >
> > Is reload operation very costly, in terms of time and cpu? We have a
> > multicore setup and would need to issue reload on multiple cores.
> >
> > Thanks
> > Saroj
> >
> >
> > On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson  > >wrote:
> >
> > > Not that I know of. This would be extremely expensive in the usual
> case.
> > > Loading up configs, reconfiguring all the handlers etc. would add a
> huge
> > > amount of overhead to the commit operation, which is heavy enough as it
> > is.
> > >
> > > What's the use-case here? Changing your configs really often and
> reading
> > > them on commit sounds like a way to make for a very confusing
> > application!
> > >
> > > But if you really need to re-read all this info on a running system,
> > > consider the core admin RELOAD command.
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Mon, Nov 5, 2012 at 8:43 PM, roz dev  wrote:
> > >
> > > > Hi All
> > > >
> > > > I am keen to find out if Solr exposes any event listener or other
> hooks
> > > > which can be used to re-read configuration files.
> > > >
> > > >
> > > > I know that we have firstSearcher event but I am not sure if it
> causes
> > > > request handlers to reload themselves and read the conf files again.
> > > >
> > > > For example, if I change the synonym file and solr gets a commit,
> will
> > it
> > > > re-initialize request handlers and re-read the conf files.
> > > >
> > > > Or, are there some events which can be listened to?
> > > >
> > > > Any inputs are welcome.
> > > >
> > > > Thanks
> > > > Saroj
> > > >
> > >
> >
>
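
For reference, the index-time vs. search-time split discussed above is typically
expressed per field type in schema.xml, roughly like this (a sketch with
illustrative type and file names, not the actual schema):

  <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- multi-word synonyms are safer expanded at index time -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms-multiword.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- single-word synonyms can be applied at query time and picked up with a core RELOAD -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms-singleword.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>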


Re: How to re-read the config files in Solr, on a commit

2012-11-06 Thread roz dev
Erick

We have a requirement where a search admin can add or remove some synonyms and
would want these changes to be reflected in search thereafter.

Yes, we looked at the reload command and it seems to be suitable for that
purpose. We have a master and slave setup, so it should be OK to issue the
reload command on the master. I expect that the slaves will pull the latest config
files.

Is the reload operation very costly in terms of time and CPU? We have a
multicore setup and would need to issue a reload on multiple cores.

Thanks
Saroj


On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson wrote:

> Not that I know of. This would be extremely expensive in the usual case.
> Loading up configs, reconfiguring all the handlers etc. would add a huge
> amount of overhead to the commit operation, which is heavy enough as it is.
>
> What's the use-case here? Changing your configs really often and reading
> them on commit sounds like a way to make for a very confusing application!
>
> But if you really need to re-read all this info on a running system,
> consider the core admin RELOAD command.
>
> Best
> Erick
>
>
> On Mon, Nov 5, 2012 at 8:43 PM, roz dev  wrote:
>
> > Hi All
> >
> > I am keen to find out if Solr exposes any event listener or other hooks
> > which can be used to re-read configuration files.
> >
> >
> > I know that we have firstSearcher event but I am not sure if it causes
> > request handlers to reload themselves and read the conf files again.
> >
> > For example, if I change the synonym file and solr gets a commit, will it
> > re-initialize request handlers and re-read the conf files.
> >
> > Or, are there some events which can be listened to?
> >
> > Any inputs are welcome.
> >
> > Thanks
> > Saroj
> >
>
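
For reference, the core admin RELOAD that Erick refers to is issued per core over
HTTP, along these lines (host, port and core name are illustrative):

  http://master-host:8983/solr/admin/cores?action=RELOAD&core=core0

It reopens the core with the current config and synonym files. On a multicore
setup it has to be repeated for each core, and slaves pick the files up on their
next replication poll, provided the replication handler's confFiles list includes
them.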


Re: How to change the boost of fields in edismx at runtime

2012-11-05 Thread roz dev
Thanks Hoss.

Yes, that approach would work as I can change the query.

Is there a way to extend the edismax handler to read a config file at
startup and then use some event, like a commit, to instruct the edismax handler
to re-read the config file?

That way, I can ensure that my boost params live only in the Solr servers'
config files, and if I need to change them, I would just change the file and wait
for a commit to re-read the file.

Any inputs?

-Saroj


On Thu, Nov 1, 2012 at 2:50 PM, Chris Hostetter wrote:

>
> : Then, If I find that results are not of my liking then I would like to
> : change the boost as following
> :
> : - Title - boosted to 2
> : -Keyword - boosted to 10
> :
> : Is there any way to change this boost, at run-time, without having to
> : restart solr with new boosts in edismax?
>
> edismax field boosts (specified in the qf and pf params) can always be
> specified at runtime -- first and foremost they are query params.
>
> when you put them in your solrconfig.xml file, those are just the "defaults"
> (or invariants, or appends) of those query params.
>
>
>
> -Hoss
>
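
For reference, putting Hoss's point in concrete terms, the boosts from the example
above can simply be sent with each request:

  q=jeans&defType=edismax&qf=title^2+keyword^10&pf=title^2+keyword^10

Anything set under "defaults" in solrconfig.xml is only used when the request does
not supply the parameter itself, so no restart is needed to try different boosts.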


Re: SolrJ - IOException

2012-09-24 Thread roz dev
I have seen this happening.

We retry, and that works. Is your Solr server stalled?
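
For what it's worth, a minimal sketch of that kind of retry around a SolrJ add
(assuming the SolrJ 3.x/4.x HttpSolrServer API; names and values are illustrative):

  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class RetryingAdd {
      private static final HttpSolrServer SERVER =
          new HttpSolrServer("http://localhost:8080/solr/profile");

      // Retry a single add() a few times with a simple backoff before giving up.
      public static void addWithRetry(SolrInputDocument doc, int maxAttempts) throws Exception {
          for (int attempt = 1; ; attempt++) {
              try {
                  SERVER.add(doc);   // may throw SolrServerException wrapping an IOException
                  return;
              } catch (Exception e) {
                  if (attempt >= maxAttempts) {
                      throw e;       // give up after maxAttempts failures
                  }
                  Thread.sleep(1000L * attempt);   // back off a little before retrying
              }
          }
      }
  }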

On Mon, Sep 24, 2012 at 4:50 PM, balaji.gandhi
wrote:

> Hi,
>
> I am encountering this error randomly (under load) when posting to Solr
> using SolrJ.
>
> Has anyone encountered a similar error?
>
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: http://localhost:8080/solr/profile at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
> at
>
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122) at
> org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107) at
>
> Thanks,
> Balaji
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrJ-IOException-tp4010026.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


IndexDocValues in Solr

2012-08-03 Thread roz dev
Changing the Subject Line to make it easier to understand the topic of the
message

Is there any plan to expose IndexDocValues as part of Solr 4?

Any thoughts?

-Saroj

On Thu, Aug 2, 2012 at 5:10 PM, roz dev  wrote:

> As we all know, FieldCache can be costly if we have lots of documents and
> lots of fields to sort on.
> I see that IndexDocValues are better at sorting and faceting, w.r.t Memory
> usage
>
> Is there any plan to use IndexDocValues in SOLR for doing sorting and
> faceting?
>
> Will SOLR 4 or 5 have indexDocValues? Is there an easy way to use
> IndexDocValues in Solr even though it is not implemented yet?
>
> -Saroj
>
>


Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread roz dev
Thanks Robert for these inputs.

Since we do not really need the Snowball analyzer for this field, we will not use
it for now. If this still does not address our issue, we will tweak the thread
pool as per eks dev's suggestion - I am a bit hesitant to make this change yet, as
we would be reducing the thread pool, which can adversely impact our throughput.

If the Snowball filter is being optimized for Solr 4 beta, then that would be
great for us. If you have already filed a JIRA for this, please let me know, and
I would like to follow it.

Thanks again
Saroj





On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir  wrote:

> On Tue, Jul 31, 2012 at 2:34 PM, roz dev  wrote:
> > Hi All
> >
> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
> that
> > when we are indexing lots of data with 16 concurrent threads, Heap grows
> > continuously. It remains high and ultimately most of the stuff ends up
> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
> > getting into excessive GC problem.
>
> Hi: I don't claim to know anything about how tomcat manages threads,
> but really you shouldnt have all these objects.
>
> In general snowball stemmers should be reused per-thread-per-field.
> But if you have a lot of fields*threads, especially if there really is
> high thread churn on tomcat, then this could be bad with snowball:
> see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841
>
> I think it would be useful to see if you can tune tomcat's threadpool
> as he describes.
>
> separately: Snowball stemmers are currently really ram-expensive for
> stupid reasons.
> each one creates a ton of Among objects, e.g. an EnglishStemmer today
> is about 8KB.
>
> I'll regenerate these and open a JIRA issue: as the snowball code
> generator in their svn was improved
> recently and each one now takes about 64 bytes instead (the Among's
> are static and reused).
>
> Still this wont really "solve your problem", because the analysis
> chain could have other heavy parts
> in initialization, but it seems good to fix.
>
> As a workaround until then you can also just use the "good old
> PorterStemmer" (PorterStemFilterFactory in solr).
> Its not exactly the same as using Snowball(English) but its pretty
> close and also much faster.
>
> --
> lucidimagination.com
>
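
For reference, the workaround Robert mentions at the end maps to a one-line change
in the field type's analyzer chain in schema.xml (the field type and surrounding
filters here are illustrative, not the actual schema):

  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <!-- was: <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>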


Re: solr/tomcat stops responding

2012-07-31 Thread roz dev
You are referring to a very old thread.

Did you take a heap dump and a thread dump? They can help you get more
insight.

-Saroj


On Tue, Jul 31, 2012 at 9:04 AM, Suneel  wrote:

> Hello Kevin,
>
> I am also facing the same problem. After a few hours or a few days, my solr server
> crashes.
> I tried to download the following patch but it's not accessible now. I am using
> version 3.1 of solr.
>
> http://people.apache.org/~yonik/solr/current/solr.war
>
>
>
> -
> Regards,
>
> Suneel Pandey
> Sr. Software Developer
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-tomcat-stops-responding-tp474577p3998435.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-07-31 Thread roz dev
Hi All

I am using Solr 4 from trunk with Tomcat 6. I am noticing that
when we are indexing lots of data with 16 concurrent threads, the heap grows
continuously. It remains high, and ultimately most of the stuff ends up
being moved to Old Gen. Eventually, Old Gen also fills up and we start
running into excessive GC problems.

I took a heap dump and found that most of the memory is consumed by
CloseableThreadLocal, which is holding a WeakHashMap of Threads and their
state.

Most of the old gen is filled with ThreadLocal entries eating up 3GB of heap, and
the heap dump shows that all such entries are using the Snowball filter. I looked
into LUCENE-3841 and verified that my version of Solr 4 has that code.

So, I am wondering about the reason for this memory leak - is it due to some
other bug in Solr/Lucene?

Here is a brief snapshot of the heap dump showing the problem:

Class Name                                                                                        | Shallow Heap | Retained Heap
---------------------------------------------------------------------------------------------------------------------------------
*org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x300c3eb28                               |           24 | 3,885,213,072*
|- class org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x2f9753340                       |            0 |             0
|- this$0 org.apache.solr.schema.IndexSchema @ 0x300bf4048                                        |           96 |       276,704
*|- reuseStrategy org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x300c3eb40         |           16 | 3,885,208,728*
|  |- class org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x2f98368c0               |            0 |             0
|  |- storedValue org.apache.lucene.util.CloseableThreadLocal @ 0x300c3eb50                       |           24 | 3,885,208,712
|  |  |- class org.apache.lucene.util.CloseableThreadLocal @ 0x2f9788918                          |            8 |             8
|  |  |- t java.lang.ThreadLocal @ 0x300c3eb68                                                    |           16 |            16
|  |  |  '- class java.lang.ThreadLocal @ 0x2f80f0868 System Class                                |            8 |            24
*|  |  |- hardRefs java.util.WeakHashMap @ 0x300c3eb78                                            |           48 | 3,885,208,656*
|  |  |  |- class java.util.WeakHashMap @ 0x2f8476c00 System Class                                |           16 |            16
|  |  |  |- table java.util.WeakHashMap$Entry[16] @ 0x300c3eba8                                   |           80 | 2,200,016,960
|  |  |  |  |- class java.util.WeakHashMap$Entry[] @ 0x2f84789e8                                  |            0 |             0
*|  |  |  |  |- [7] java.util.WeakHashMap$Entry @ 0x306a24950                                     |           40 |   318,502,920*
|  |  |  |  |  |- class java.util.WeakHashMap$Entry @ 0x2f84786f8 System Class                    |            0 |             0
|  |  |  |  |  |- queue java.lang.ref.ReferenceQueue @ 0x300c3ebf8                                |           32 |            48
|  |  |  |  |  |- referent java.lang.Thread @ 0x30678c2c0 web-23                                  |          112 |           160
|  |  |  |  |  |- value java.util.HashMap @ 0x30678cbb0                                           |           48 |   318,502,880
|  |  |  |  |  |  |- class java.util.HashMap @ 0x2f80b9428 System Class                           |           24 |            24
*|  |  |  |  |  |  |- table java.util.HashMap$Entry[32768] @ 0x3c07c6f58                          |      131,088 |   318,502,832*
|  |  |  |  |  |  |  |- class java.util.HashMap$Entry[] @ 0x2f80bd9c8                             |            0 |             0
|  |  |  |  |  |  |  |- [10457] java.util.HashMap$Entry @ 0x30678cbe0                             |           32 |        40,864
|  |  |  |  |  |  |  |  |- class java.util.HashMap$Entry @ 0x2f80bd400 System Class               |            0 |             0
|  |  |  |  |  |  |  |  |- key java.lang.String @ 0x30678cc00  prod_desc_keywd_en_CA              |           32 |            96
|  |  |  |  |  |  |  |  |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x30678cc60 | 24 | 20,344
|  |  |  |  |  |  |  |  |- next java.util.HashMap$Entry @ 0x39a2c9100                             |           32 |        20,392
|  |  |  |  |  |  |  |  |  |- class java.util.HashMap$Entry @ 0x2f80bd400 System Class            |            0 |             0
|  |  |  |  |  |  |  |  |  |- key java.lang.String @ 0x39a2c9120  3637994_fr_CA_cat_name_keywd    |           32 |           104
|  |  |  |  |  |  |  |  |  |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x39a2c9188 | 24 | 20,256
|  |  |  |  |  |  |  |  |  |  |- class org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x2f97a69a0 | 0 | 0
|  |  |  |  |  |  |  |  |  |  |- this$0 org.apache.solr.analysis.TokenizerChain @ 0x300bf615

Re: too many instances of "org.tartarus.snowball.Among" in the heap

2012-07-30 Thread roz dev
Is it some kind of memory leak with Lucene's use of the Snowball stemmer?

I tried to google for the Snowball stemmer but could not find any recent info
about a memory leak.

This old link does indicate some memory leak, but it is from 2004:

http://snowball.tartarus.org/archives/snowball-discuss/0631.html

Any inputs are welcome

-Saroj




On Mon, Jul 30, 2012 at 4:39 PM, roz dev  wrote:

> I did take a couple of thread dumps and they seem to be fine.
>
> The heap dump is huge - close to 15GB.
>
> I am having a hard time analyzing that heap dump.
>
> 2012-07-30 16:07:32
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.0-b09 mixed mode):
>
> "RMI TCP Connection(33)-10.8.21.124" - Thread t@190
>java.lang.Thread.State: RUNNABLE
> at sun.management.ThreadImpl.dumpThreads0(Native Method)
> at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:167)
> at
> com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:96)
> at
> com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:33)
> at
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
> at javax.management.StandardMBean.invoke(StandardMBean.java:391)
> at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
> at
> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
> at
> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
> at
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
> at
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
> at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
> at sun.rmi.transport.Transport$1.run(Transport.java:159)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
> at
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
>
>Locked ownable synchronizers:
> - locked <49cbecf2> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>
> "JMX server connection timeout 189" - Thread t@189
>java.lang.Thread.State: TIMED_WAITING
> at java.lang.Object.wait(Native Method)
> - waiting on  (a [I)
> at
> com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
> at java.lang.Thread.run(Thread.java:662)
>
>Locked ownable synchronizers:
> - None
>
> "web-77" - Thread t@186
>java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <5ab03cb6> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>

Re: too many instances of "org.tartarus.snowball.Among" in the heap

2012-07-30 Thread roz dev
nnection.ConnectionThread.connect(ConnectionThread.java:260)
at
com.wily.introscope.agent.connection.ConnectionThread.run(ConnectionThread.java:64)
at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
- None

"Agent Execution" - Thread t@10
   java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on <2b54befa> (a com.wily.util.adt.BlockingQueue)
at java.lang.Object.wait(Object.java:485)
at
com.wily.util.adt.BlockingQueue.interruptableDequeue(BlockingQueue.java:123)
at
com.wily.util.task.AsynchExecutionQueue.doTask(AsynchExecutionQueue.java:200)
at com.wily.util.task.ATask$CoreTask.run(ATask.java:132)
at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
- None

"Agent Heartbeat" - Thread t@5
   java.lang.Thread.State: TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at
com.wily.util.heartbeat.IntervalHeartbeat$HeartbeatRunnable.run(IntervalHeartbeat.java:670)
at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
- None

"Remove Metric Data Watch Heartbeat Heartbeat" - Thread t@7
   java.lang.Thread.State: TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at
com.wily.util.heartbeat.IntervalHeartbeat$HeartbeatRunnable.run(IntervalHeartbeat.java:670)
at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
- None

"Configuration Watch Heartbeat Heartbeat" - Thread t@6
   java.lang.Thread.State: TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at
com.wily.util.heartbeat.IntervalHeartbeat$HeartbeatRunnable.run(IntervalHeartbeat.java:670)
at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
- None

"Signal Dispatcher" - Thread t@4
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

"Finalizer" - Thread t@3
   java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on <48c6254f> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

   Locked ownable synchronizers:
- None

"Reference Handler" - Thread t@2
   java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
- waiting on <48bb8adc> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)

   Locked ownable synchronizers:
- None

"main" - Thread t@1
   java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
- locked <11dacd96> (a java.net.SocksSocketImpl)
at java.net.ServerSocket.implAccept(ServerSocket.java:462)
at
com.wily.introscope.agent.probe.net.ManagedServerSocket.com_wily_accept14(ManagedServerSocket.java:362)
at
com.wily.introscope.agent.probe.net.ManagedServerSocket.accept(ManagedServerSocket.java:267)
at
org.apache.catalina.core.StandardServer.await(StandardServer.java:431)
at org.apache.catalina.startup.Catalina.await(Catalina.java:676)
at org.apache.catalina.startup.Catalina.start(Catalina.java:628)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)

   Locked ownable synchronizers:
- None



On Fri, Jul 27, 2012 at 5:19 AM, Alexandre Rafalovitch
wrote:

> Try taking a couple of thread dumps and see where in the stack the
> snowball classes show up. That might give you a clue.
>
> Did you customize the parameters to the stemmer? If so, maybe it has
> problems with the file you gave it.
>
> Just some generic thoughts that might help.
>
> Regards,
>Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Fri, Jul 27, 2012 at 3:53 AM, roz dev  wrote:
> > Hi All
> >
> > I am trying to find out the reason for very high memory use and ran JMAP
> > -hist
> >
> > It is showing that i have too many instances of
> org.tartarus.snowball.Among
> >
> > Any ideas what is this for and why am I getting so many of them
> >
> > num     #instances      #bytes         Class description
> > ---------------------------------------------------------------
> > *1:      46728110       1869124400     org.tartarus.snowball.Among*
> > 2:        5244210       1840458960     byte[]
>


too many instances of "org.tartarus.snowball.Among" in the heap

2012-07-27 Thread roz dev
Hi All

I am trying to find out the reason for very high memory use and ran jmap
-histo.

It is showing that I have too many instances of org.tartarus.snowball.Among.

Any ideas what this is for and why I am getting so many of them?

num     #instances      #bytes         Class description
-------------------------------------------------------------------------------------------------
*1:      46728110       1869124400     org.tartarus.snowball.Among*
2:        5244210       1840458960     byte[]
3:       52651949       5969839368     char[]
4:       10008928        864769280     int[]
5:       10250527        410021080     java.util.LinkedHashMap$Entry
6:        4672811        268474232     org.tartarus.snowball.Among[]
*7:       8072312        258313984     java.util.HashMap$Entry*
8:         466514        246319392     org.apache.lucene.util.fst.FST$Arc[]
9:        1828542        237600432     java.util.HashMap$Entry[]
10:       3834312        153372480     java.util.TreeMap$Entry
11:       2684700        128865600     org.apache.lucene.util.fst.Builder$UnCompiledNode
12:       4712425        113098200     org.apache.lucene.util.BytesRef
13:       3484836        111514752     java.lang.String
14:       2636045        105441800     org.apache.lucene.index.FieldInfo
15:       1813561        101559416     java.util.LinkedHashMap
16:       6291619        100665904     java.lang.Integer
17:       2684700         85910400     org.apache.lucene.util.fst.Builder$Arc
18:        956998         84215824     org.apache.lucene.index.TermsHashPerField
19:       2892957         69430968     org.apache.lucene.util.AttributeSource$State
20:       2684700         64432800     org.apache.lucene.util.fst.Builder$Arc[]
21:        685595         60332360     org.apache.lucene.util.fst.FST
22:        933451         59210944     java.lang.Object[]
23:        957043         53594408     org.apache.lucene.util.BytesRefHash
24:        591463         42585336     org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader
25:        424801         40780896     org.tartarus.snowball.ext.EnglishStemmer
26:        424801         40780896     org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter
27:       1549670         37192080     org.apache.lucene.index.Term
28:        849602         33984080     org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter$WordDelimiterConcatenation
29:        424801         27187264     org.apache.lucene.analysis.core.WhitespaceTokenizer
30:        478499         26795944     org.apache.lucene.index.FreqProxTermsWriterPerField
31:        535521         25705008     org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray
32:        219081         24537072     org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter
33:        478499         22967952     org.apache.lucene.index.FieldInvertState
34:        956998         22967952     org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray
35:        478499         22967952     org.apache.lucene.index.TermVectorsConsumerPerField
36:        478499         22967952     org.apache.lucene.index.NormsConsumerPerField
37:        316582         22793904     org.apache.lucene.store.MMapDirectory$MMapIndexInput
38:        906708         21760992     org.apache.lucene.util.AttributeSource$State[]
39:        906708         21760992     org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl
40:        883588         21206112     java.util.ArrayList
41:        438192         21033216     org.apache.lucene.store.RAMOutputStream
42:        860601         20654424     java.lang.StringBuilder
43:        424801         20390448     org.apache.lucene.analysis.miscellaneous.WordDelimiterIterator
44:        424801         20390448     org.apache.lucene.analysis.core.StopFilter
45:        424801         20390448     org.apache.lucene.analysis.miscellaneous.KeywordMarkerFilter
46:        424801         20390448     org.apache.lucene.analysis.snowball.SnowballFilter
47:        839390         20145360     org.apache.lucene.index.DocumentsWriterDeleteQueue$TermNode


-Saroj


Re: leaks in solr

2012-07-27 Thread roz dev
In my case, I see only 1 searcher and no field cache - still, Old Gen is almost
full at 22 GB.

Does it have to do with the index or some other configuration?

-Saroj

On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog  wrote:

> What does the "Statistics" page in the Solr admin say? There might be
> several "searchers" open: org.apache.solr.search.SolrIndexSearcher
>
> Each searcher holds open different generations of the index. If
> obsolete index files are held open, it may be old searchers. How big
> are the caches? How long does it take to autowarm them?
>
> On Thu, Jul 26, 2012 at 6:15 PM, Karthick Duraisamy Soundararaj
>  wrote:
> > Mark,
> > We use solr 3.6.0 on freebsd 9. Over a period of time, it
> > accumulates lots of space!
> >
> > On Thu, Jul 26, 2012 at 8:47 PM, roz dev  wrote:
> >
> >> Thanks Mark.
> >>
> >> We are never calling commit or optimize with openSearcher=false.
> >>
> >> As per logs, this is what is happening
> >>
> >>
> openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
> >>
> >> --
> >> But, We are going to use 4.0 Alpha and see if that helps.
> >>
> >> -Saroj
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller 
> >> wrote:
> >>
> >> > I'd take a look at this issue:
> >> > https://issues.apache.org/jira/browse/SOLR-3392
> >> >
> >> > Fixed late April.
> >> >
> >> > On Jul 26, 2012, at 7:41 PM, roz dev  wrote:
> >> >
> >> > > it was from 4/11/12
> >> > >
> >> > > -Saroj
> >> > >
> >> > > On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller  >
> >> > wrote:
> >> > >
> >> > >>
> >> > >> On Jul 26, 2012, at 3:18 PM, roz dev  wrote:
> >> > >>
> >> > >>> Hi Guys
> >> > >>>
> >> > >>> I am also seeing this problem.
> >> > >>>
> >> > >>> I am using SOLR 4 from Trunk and seeing this issue repeat every
> day.
> >> > >>>
> >> > >>> Any inputs about how to resolve this would be great
> >> > >>>
> >> > >>> -Saroj
> >> > >>
> >> > >>
> >> > >> Trunk from what date?
> >> > >>
> >> > >> - Mark
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> >
> >> > - Mark Miller
> >> > lucidimagination.com
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: leaks in solr

2012-07-26 Thread roz dev
Thanks Mark.

We are never calling commit or optimize with openSearcher=false.

As per logs, this is what is happening

openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}

--
But, We are going to use 4.0 Alpha and see if that helps.

-Saroj










On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller  wrote:

> I'd take a look at this issue:
> https://issues.apache.org/jira/browse/SOLR-3392
>
> Fixed late April.
>
> On Jul 26, 2012, at 7:41 PM, roz dev  wrote:
>
> > it was from 4/11/12
> >
> > -Saroj
> >
> > On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller 
> wrote:
> >
> >>
> >> On Jul 26, 2012, at 3:18 PM, roz dev  wrote:
> >>
> >>> Hi Guys
> >>>
> >>> I am also seeing this problem.
> >>>
> >>> I am using SOLR 4 from Trunk and seeing this issue repeat every day.
> >>>
> >>> Any inputs about how to resolve this would be great
> >>>
> >>> -Saroj
> >>
> >>
> >> Trunk from what date?
> >>
> >> - Mark
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>
>


Re: leaks in solr

2012-07-26 Thread roz dev
it was from 4/11/12

-Saroj

On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller  wrote:

>
> On Jul 26, 2012, at 3:18 PM, roz dev  wrote:
>
> > Hi Guys
> >
> > I am also seeing this problem.
> >
> > I am using SOLR 4 from Trunk and seeing this issue repeat every day.
> >
> > Any inputs about how to resolve this would be great
> >
> > -Saroj
>
>
> Trunk from what date?
>
> - Mark
>
>
>
>
>
>
>
>
>
>


Re: leaks in solr

2012-07-26 Thread roz dev
Hi Guys

I am also seeing this problem.

I am using SOLR 4 from Trunk and seeing this issue repeat every day.

Any inputs about how to resolve this would be great

-Saroj


On Thu, Jul 26, 2012 at 8:33 AM, Karthick Duraisamy Soundararaj <
karthick.soundara...@gmail.com> wrote:

> Did you find any more clues? I have this problem in my machines as well..
>
> On Fri, Jun 29, 2012 at 6:04 AM, Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de> wrote:
>
> > Hi list,
> >
> > while monitoring my solr 3.6.1 installation I recognized an increase of
> > memory usage
> > in OldGen JVM heap on my slave. I decided to force Full GC from jvisualvm
> > and
> > send optimize to the already optimized slave index. Normally this helps
> > because
> > I have monitored this issue over the past. But not this time. The Full GC
> > didn't free any memory. So I decided to take a heap dump and see what
> > MemoryAnalyzer
> > is showing. The heap dump is about 23 GB in size.
> >
> > 1.)
> > Report Top consumers - Biggest Objects:
> > Total: 12.3 GB
> > org.apache.lucene.search.FieldCacheImpl : 8.1 GB
> > class java.lang.ref.Finalizer   : 2.1 GB
> > org.apache.solr.util.ConcurrentLRUCache : 1.5 GB
> > org.apache.lucene.index.ReadOnlySegmentReader : 622.5 MB
> > ...
> >
> > As you can see, Finalizer has already reached 2.1 GB!!!
> >
> > * java.util.concurrent.ConcurrentHashMap$Segment[16] @ 0x37b056fd0
> >   * segments java.util.concurrent.ConcurrentHashMap @ 0x39b02d268
> > * map org.apache.solr.util.ConcurrentLRUCache @ 0x398f33c30
> >   * referent java.lang.ref.Finalizer @ 0x37affa810
> > * next java.lang.ref.Finalizer @ 0x37affa838
> > ...
> >
> > Seams to be org.apache.solr.util.ConcurrentLRUCache
> > The attributes are:
> >
> > Type   |Name  | Value
> > -
> > boolean| isDestroyed  |  true
> > -
> > ref| cleanupThread|  null
> > 
> > ref| evictionListener |  null
> > ---
> > long   | oldestEntry  | 0
> > --
> > int| acceptableWaterMark |  9500
> >
> --
> > ref| stats| org.apache.solr.util.ConcurrentLRUCache$Stats
> > @ 0x37b074dc8
> > 
> > boolean| islive   |  true
> > -
> > boolean| newThreadForCleanup | false
> > 
> > boolean| isCleaning   | false
> >
> >
> 
> > ref| markAndSweepLock | java.util.concurrent.locks.ReentrantLock @
> > 0x39bf63978
> > -
> > int| lowerWaterMark   |  9000
> > -
> > int| upperWaterMark   | 1
> > -
> > ref|  map | java.util.concurrent.ConcurrentHashMap @
> > 0x39b02d268
> > --
> >
> >
> >
> >
> > 2.)
> > While searching for open files and their references I noticed that there
> > are references to
> > index files which are already deleted from disk.
> > E.g. recent index files are "data/index/_2iqw.frq" and
> > "data/index/_2iqx.frq".
> > But I also see references to "data/index/_2hid.frq" which are quite old
> > and are deleted way back
> > from earlier replications.
> > I have to analyze this a bit deeper.
> >
> >
> > So far my report, I go on analyzing this huge heap dump.
> > If you need any other info or even the heap dump, let me know.
> >
> >
> > Regards
> > Bernd
> >
> >
>


Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread roz dev
I think that there is no way around doing custom logic in this case.

If the indexing process knows that documents have to be grouped, then they had
better be together.

-Saroj


On Mon, Jun 11, 2012 at 6:37 AM, Nitesh Nandy  wrote:

> Martijn,
>
> How do we add a custom algorithm for distributing documents in Solr Cloud?
> According to this discussion
>
> http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-td3985262.html
>  , Mark discourages users from using custom distribution mechanism in Solr
> Cloud.
>
> Load balancing is not an issue for us at the moment. In that case, how
> should we implement a custom partitioning algorithm.
>
>
> On Mon, Jun 11, 2012 at 6:23 PM, Martijn v Groningen <
> martijn.v.gronin...@gmail.com> wrote:
>
> > The ngroups returns the number of groups that have matched with the
> > query. However if you want ngroups to be correct in a distributed
> > environment you need
> > to put document belonging to the same group into the same shard.
> > Groups can't cross shard boundaries. I guess you need to do
> > some manual document partitioning.
> >
> > Martijn
> >
> > On 11 June 2012 14:29, Nitesh Nandy  wrote:
> > > Version: Solr 4.0 (svn build 30th may, 2012) with Solr Cloud  (2 slices
> > and
> > > 2 shards)
> > >
> > > The setup was done as per the wiki:
> > http://wiki.apache.org/solr/SolrCloud
> > >
> > > We are doing distributed search. While querying, we use field
> collapsing
> > > with "ngroups" set as true as we need the number of search results.
> > >
> > > However, there is a difference in the number of "result list" returned
> > and
> > > the "ngroups" value returned.
> > >
> > > Ex:
> > >
> >
> http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&&group=true&group.field=id&group.ngroups=true
> > >
> > >
> > > The response XMl looks like
> > >
> > > 
> > > 
> > > 
> > > 0
> > > 46
> > > 
> > > id
> > > true
> > > true
> > > messagebody:monit AND usergroupid:3
> > > 
> > > 
> > > 
> > > 
> > > 10
> > > 9
> > > 
> > > 
> > > 320043
> > > 
> > > ...
> > > 
> > > 
> > > 
> > > 398807
> > > ...
> > > 
> > > 
> > > 
> > > 346878
> > > ...
> > > 
> > > 
> > > 346880
> > > ...
> > > 
> > > 
> > > 
> > > 
> > > 
> > >
> > > So you can see that the ngroups value returned is 9 and the actual
> number
> > > of groups returned is 4
> > >
> > > Why do we have this discrepancy in the ngroups, matches and actual
> number
> > > of groups. Is this an open issue ?
> > >
> > >  Any kind of help is appreciated.
> > >
> > > --
> > > Regards,
> > >
> > > Nitesh Nandy
> >
> >
> >
> > --
> > Met vriendelijke groet,
> >
> > Martijn van Groningen
> >
>
>
>
> --
> Regards,
>
> Nitesh Nandy
>
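
For reference, a sketch of the manual, group-aware partitioning being discussed:
route each document to a shard derived from its grouping key, so that a group
never crosses shard boundaries. This is only an illustration (a hypothetical list
of per-shard SolrJ servers; not an official SolrCloud mechanism):

  // Pick a shard from the grouping key; "servers" holds one SolrServer per shard.
  int chooseShard(SolrInputDocument doc, List<SolrServer> servers) {
      String groupKey = String.valueOf(doc.getFieldValue("id"));    // the group.field value
      return (groupKey.hashCode() & 0x7fffffff) % servers.size();   // stable, non-negative
  }

  // usage: servers.get(chooseShard(doc, servers)).add(doc);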


Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Yes, these documents have lots of unique values, as the same product could
be assigned to lots of other categories, each with a different sort
order.

We did some evaluation of heap usage and found that, with the kind of queries we
generate, heap usage was going up to 24-26 GB. I could trace it to the fact that
the fieldCache is creating an array of 2M size for each of the sort fields.

Since the same products are mapped to multiple categories, we incur significant
memory overhead. Therefore, any solution where memory consumption can be
reduced is a good one for me.

In fact, we have situations where the same product is mapped to more than 1
sub-category in the same category, like


Books
 -- Programming
  - Java in a nutshell
 -- Sale (40% off)
  - Java in a nutshell


So, another thought in my mind is to somehow use a second-pass collector to
group books appropriately in the Programming and Sale categories, with the right
sort order.

But I have no clue about that piece :(

-Saroj


On Sun, Jun 10, 2012 at 4:30 PM, Erick Erickson wrote:

> 2M docs is actually pretty small. Sorting is sensitive to the number
> of _unique_ values in the sort fields, not necessarily the number of
> documents.
>
> And sorting only works on fields with a single value (i.e. it can't have
> more than one token after analysis). So for each field you're only talking
> 2M values at the vary maximum, assuming that the field in question has
> a unique value per document, which I doubt very much given your
> problem description.
>
> So with a corpus that size, I'd "just try it'.
>
> Best
> Erick
>
> On Sun, Jun 10, 2012 at 7:12 PM, roz dev  wrote:
> > Thanks Erik for your quick feedback
> >
> > When Products are assigned to a category or Sub-Category then they can be
> > in any order and price type can be regular or markdown.
> > So, reg and markdown products are intermingled  as per their assignment
> but
> > I want to sort them in such a way that we
> > ensure that all the products which are on markdown are at the bottom of
> the
> > list.
> >
> > I can use these multiple sorts but I realize that they are costly in
> terms
> > of heap used, as they are using FieldCache.
> >
> > I have an index with 2M docs and docs are pretty big. So, I don't want to
> > use them unless there is no other option.
> >
> > I am wondering if I can define a custom function query which can be like
> > this:
> >
> >
> >   - check if product is on the markdown
> >   - if yes then change its sort order field to be the max value in the
> >   given sub-category, say 99
> >   - else, use the sort order of the product in the sub-category
> >
> > I have been looking at existing function queries but do not have a good
> > handle on how to make one of my own.
> >
> > - Another option could be use a custom sort comparator but I am not sure
> > about the way it works
> >
> > Any thoughts?
> >
> >
> > -Saroj
> >
> >
> >
> >
> > On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson  >wrote:
> >
> >> Skimming this, I two options come to mind:
> >>
> >> 1> Simply apply primary, secondary, etc sorts. Something like
> >>   &sort=subcategory asc,markdown_or_regular desc,sort_order asc
> >>
> >> 2> You could also use grouping to arrange things in groups and sort
> within
> >>  those groups. This has the advantage of returning some members
> >>  of each of the top N groups in the result set, which makes it
> easier
> >> to
> >>  get some of each group rather than having to analyze the whole
> >> list
> >>
> >> But your example is somewhat contradictory. You say
> >> "products which are on markdown, are at
> >> the bottom of the documents list"
> >>
> >> But in your examples, products on "markdown" are intermingled
> >>
> >> Best
> >> Erick
> >>
> >> On Sun, Jun 10, 2012 at 3:36 AM, roz dev  wrote:
> >> > Hi All
> >> >
> >> >>
> >> >> I have an index which contains a Catalog of Products and Categories,
> >> with
> >> >> Solr 4.0 from trunk
> >> >>
> >> >> Data is organized like this:
> >> >>
> >> >> Category: Books
> >> >>
> >> >> Sub Category: Programming
> >> >>
> >> >> Products:
> >> >>
> >> >> Product # 1,  Price: Regular Sort Order:1
> >> >> Product # 2,  Price: Markdown, So

Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Thanks Erik for your quick feedback

When products are assigned to a category or sub-category, they can be
in any order, and the price type can be regular or markdown.
So, regular and markdown products are intermingled as per their assignment, but
I want to sort them in such a way that we
ensure that all the products which are on markdown are at the bottom of the
list.

I can use these multiple sorts, but I realize that they are costly in terms
of heap used, as they are using the FieldCache.

I have an index with 2M docs and the docs are pretty big. So, I don't want to
use them unless there is no other option.

I am wondering if I can define a custom function query which can be like
this:


   - check if the product is on markdown
   - if yes, then change its sort order field to be the max value in the
   given sub-category, say 99
   - else, use the sort order of the product in the sub-category

I have been looking at existing function queries but do not have a good
handle on how to make one of my own.

- Another option could be to use a custom sort comparator, but I am not sure
how it works.

Any thoughts?
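
For reference, a sketch of the function-query idea above, assuming an illustrative
numeric is_markdown field (1 for markdown, 0 for regular) and the sort-by-function
support in Solr 4.x:

  q=*:*&fq=category:Books&sort=sub_category asc, if(is_markdown,999999,sort_order) asc

This pushes markdown products after the regular ones inside each sub-category
without a custom comparator, though the function sort still goes through the
FieldCache like a normal field sort.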


-Saroj




On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson wrote:

> Skimming this, I two options come to mind:
>
> 1> Simply apply primary, secondary, etc sorts. Something like
>   &sort=subcategory asc,markdown_or_regular desc,sort_order asc
>
> 2> You could also use grouping to arrange things in groups and sort within
>  those groups. This has the advantage of returning some members
>  of each of the top N groups in the result set, which makes it easier
> to
>  get some of each group rather than having to analyze the whole
> list
>
> But your example is somewhat contradictory. You say
> "products which are on markdown, are at
> the bottom of the documents list"
>
> But in your examples, products on "markdown" are intermingled
>
> Best
> Erick
>
> On Sun, Jun 10, 2012 at 3:36 AM, roz dev  wrote:
> > Hi All
> >
> >>
> >> I have an index which contains a Catalog of Products and Categories,
> with
> >> Solr 4.0 from trunk
> >>
> >> Data is organized like this:
> >>
> >> Category: Books
> >>
> >> Sub Category: Programming
> >>
> >> Products:
> >>
> >> Product # 1,  Price: Regular Sort Order:1
> >> Product # 2,  Price: Markdown, Sort Order:2
> >> Product # 3   Price: Regular, Sort Order:3
> >> Product # 4   Price: Regular, Sort Order:4
> >> 
> >> .
> >> ...
> >> Product # 100   Price: Regular, Sort Order:100
> >>
> >> Sub Category: Fiction
> >>
> >> Products:
> >>
> >> Product # 1,  Price: Markdown, Sort Order:1
> >> Product # 2,  Price: Regular, Sort Order:2
> >> Product # 3   Price: Regular, Sort Order:3
> >> Product # 4   Price: Markdown, Sort Order:4
> >> 
> >> .
> >> ...
> >> Product # 70   Price: Regular, Sort Order:70
> >>
> >>
> >> I want to query Solr and sort these products within each of the
> >> sub-category in a such a way that products which are on markdown, are at
> >> the bottom of the documents list and other products
> >> which are on regular price, are sorted as per their sort order in their
> >> sub-category.
> >>
> >> Expected Results are
> >>
> >> Category: Books
> >>
> >> Sub Category: Programming
> >>
> >> Products:
> >>
> >> Product # 1,  Price: Regular Sort Order:1
> >> Product # 2,  Price: Markdown, Sort Order:101
> >> Product # 3   Price: Regular, Sort Order:3
> >> Product # 4   Price: Regular, Sort Order:4
> >> 
> >> .
> >> ...
> >> Product # 100   Price: Regular, Sort Order:100
> >>
> >> Sub Category: Fiction
> >>
> >> Products:
> >>
> >> Product # 1,  Price: Markdown, Sort Order:71
> >> Product # 2,  Price: Regular, Sort Order:2
> >> Product # 3   Price: Regular, Sort Order:3
> >> Product # 4   Price: Markdown, Sort Order:71
> >> 
> >> .
> >> ...
> >> Product # 70   Price: Regular, Sort Order:70
> >>
> >>
> >> My query is like this:
> >>
> >> q=*:*&fq=category:Books
> >>
> >> What are the options to implement custom sorting and how do I do it?
> >>
> >>
> >>- Define a Custom Function query?
> >>- Define a Custom Comparator? Or,
> >>- Define a Custom Collector?
> >>
> >>
> >> Please let me know the best way to go about it and any pointers to
> >> customize Solr 4.
> >>
> >
> > Thanks
> > Saroj
>


Re: How to do custom sorting in Solr?

2012-06-10 Thread roz dev
Hi All

>
> I have an index which contains a Catalog of Products and Categories, with
> Solr 4.0 from trunk
>
> Data is organized like this:
>
> Category: Books
>
> Sub Category: Programming
>
> Products:
>
> Product # 1,  Price: Regular Sort Order:1
> Product # 2,  Price: Markdown, Sort Order:2
> Product # 3   Price: Regular, Sort Order:3
> Product # 4   Price: Regular, Sort Order:4
> 
> .
> ...
> Product # 100   Price: Regular, Sort Order:100
>
> Sub Category: Fiction
>
> Products:
>
> Product # 1,  Price: Markdown, Sort Order:1
> Product # 2,  Price: Regular, Sort Order:2
> Product # 3   Price: Regular, Sort Order:3
> Product # 4   Price: Markdown, Sort Order:4
> 
> .
> ...
> Product # 70   Price: Regular, Sort Order:70
>
>
> I want to query Solr and sort these products within each of the
> sub-category in a such a way that products which are on markdown, are at
> the bottom of the documents list and other products
> which are on regular price, are sorted as per their sort order in their
> sub-category.
>
> Expected Results are
>
> Category: Books
>
> Sub Category: Programming
>
> Products:
>
> Product # 1,  Price: Regular Sort Order:1
> Product # 2,  Price: Markdown, Sort Order:101
> Product # 3   Price: Regular, Sort Order:3
> Product # 4   Price: Regular, Sort Order:4
> 
> .
> ...
> Product # 100   Price: Regular, Sort Order:100
>
> Sub Category: Fiction
>
> Products:
>
> Product # 1,  Price: Markdown, Sort Order:71
> Product # 2,  Price: Regular, Sort Order:2
> Product # 3   Price: Regular, Sort Order:3
> Product # 4   Price: Markdown, Sort Order:71
> 
> .
> ...
> Product # 70   Price: Regular, Sort Order:70
>
>
> My query is like this:
>
> q=*:*&fq=category:Books
>
> What are the options to implement custom sorting and how do I do it?
>
>
>- Define a Custom Function query?
>- Define a Custom Comparator? Or,
>- Define a Custom Collector?
>
>
> Please let me know the best way to go about it and any pointers to
> customize Solr 4.
>

Thanks
Saroj


Is there any performance cost of using lots of OR in the solr query

2012-04-04 Thread roz dev
Hi All,

I am working on an application which makes a few Solr calls to get the data.

On the high level, We have a requirement like this


   - Make first call to Solr, to get the list of products which are
   children of a given category
   - Make 2nd solr call to get product documents based on a list of product
   ids

2nd query will look like

q=document_type:SKU&fq=product_id:(34 OR 45 OR 56 OR 77)

We can have close to 100 product ids in fq.

Is there a performance cost to doing these Solr calls, which have lots of ORs?

As per slide #41 of the presentation "The Seven Deadly Sins of Solr", it is a
bad idea to have these kinds of queries.

http://www.slideshare.net/lucenerevolution/hill-jay-7-sins-of-solrpdf

But it does not make clear why this is bad.
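
For reference, one knob that is sometimes relevant for this pattern: a one-off
filter with ~100 ids occupies a slot in the filter cache for every distinct query,
so it can be marked as non-cached (assuming Solr 3.4 or later):

  q=document_type:SKU&fq={!cache=false}product_id:(34 OR 45 OR 56 OR 77)

The OR clauses themselves mostly add per-query CPU and response time that grows
with the number of terms; they are not a correctness problem.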

Any inputs will be welcome.

Thanks

Saroj


Solr Cloud, Commits and Master/Slave configuration

2012-02-27 Thread roz dev
Hi All,

I am trying to understand features of Solr Cloud, regarding commits and
scaling.


   - If I am using Solr Cloud, then do I need to explicitly call commit
   (hard commit)? Or is a soft commit okay, with Solr Cloud doing the job of
   writing to disk?


   - Do we still need to use a Master/Slave setup to scale searching? If we
   have to use a Master/Slave setup, then do I need to issue a hard commit to make
   my changes visible to slaves?
   - If I were to use NRT with a Master/Slave setup and soft commits, then
   will the slave be able to see changes made on the master with a soft commit?

Any inputs are welcome.
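
For reference, the hard-commit vs. soft-commit split in the first question is
usually configured in solrconfig.xml along these lines in Solr 4.x (values are
illustrative):

  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit: flush to disk every 60s -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>             <!-- soft commit: visibility every 1s -->
  </autoSoftCommit>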

Thanks

-Saroj


Re: hot deploy of newer version of solr schema in production

2012-01-31 Thread roz dev
Thanks Jan for your inputs.

I am keen to know how people keep live sites running while there
is a breaking change which calls for complete re-indexing.
We want to build a new index with the new schema (it may take a couple of
hours) without impacting the live e-commerce site.

any thoughts are welcome

Thanks
Saroj


On Tue, Jan 24, 2012 at 12:21 AM, Jan Høydahl  wrote:

> Hi,
>
> To be able to do a true hot deploy of newer schema without reindexing, you
> must carefully see to that none of your changes are breaking changes. So
> you should test the process on your development machine and make sure it
> works. Adding and deleting fields would work, but not changing the
> field-type or analysis of an existing field. Depending on from/to version,
> you may want to keep the old schema-version number.
>
> The process is:
> 1. Deploy the new schema, including all dependencies such as dictionaries
> 2. Do a RELOAD CORE http://wiki.apache.org/solr/CoreAdmin#RELOAD
>
> My preference is to do a more thorough upgrade of schema including new
> functionality and breaking changes, and then do a full reindex. The
> exception is if my index is huge and the reason for Solr upgrade or schema
> change is to fix a bug, not to use new functionality.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 24. jan. 2012, at 01:51, roz dev wrote:
>
> > Hi All,
> >
> > I need community's feedback about deploying newer versions of solr schema
> > into production while existing (older) schema is in use by applications.
> >
> > How do people perform these things? What has been the learning of people
> > about this.
> >
> > Any thoughts are welcome.
> >
> > Thanks
> > Saroj
>
>


hot deploy of newer version of solr schema in production

2012-01-23 Thread roz dev
Hi All,

I need the community's feedback about deploying newer versions of a Solr schema
into production while the existing (older) schema is in use by applications.

How do people perform these things? What have people learned
about this?

Any thoughts are welcome.

Thanks
Saroj


Index format difference between 4.0 and 3.4

2011-11-14 Thread roz dev
Hi All,

We are using Solr 1.4.1 in production and are considering an upgrade to a
newer version.

It seems that Solr 3.x requires a complete rebuild of the index, as the format
seems to have changed.

Is Solr 4.0 index file format compatible with Solr 3.x format?

Please advise.

Thanks
Saroj


Re: Production Issue: SolrJ client throwing this error even though field type is not defined in schema

2011-09-30 Thread roz dev
This issue disappeared when we reduced the number of documents which were
being returned from Solr.

Looks to be some issue with Tomcat or Solr, returning truncated responses.
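
For reference, SolrJ can also be pointed at the javabin response format instead of
XML, which avoids parsing large XML responses altogether (a sketch, assuming the
SolrJ 1.4.x CommonsHttpSolrServer seen in the stack trace):

  // inside a method that declares "throws MalformedURLException"
  CommonsHttpSolrServer server =
      new CommonsHttpSolrServer("http://localhost:8080/solr/core");
  server.setParser(new BinaryResponseParser());   // request wt=javabin instead of XML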

-Saroj


On Sun, Sep 25, 2011 at 9:21 AM,  wrote:

> If I had to give a gentle nudge, I would ask you to validate your schema
> XML file. You can do so by looking for any w3c XML validator website and
> just copy pasting the text there to find out where its malformed.
>
> Sent from my iPhone
>
> On Sep 24, 2011, at 2:01 PM, Erick Erickson 
> wrote:
>
> > You might want to review:
> >
> > http://wiki.apache.org/solr/UsingMailingLists
> >
> > There's really not much to go on here.
> >
> > Best
> > Erick
> >
> > On Wed, Sep 21, 2011 at 12:13 PM, roz dev  wrote:
> >> Hi All
> >>
> >> We are getting this error in our Production Solr Setup.
> >>
> >> Message: Element type "t_sort" must be followed by either attribute
> >> specifications, ">" or "/>".
> >> Solr version is 1.4.1
> >>
> >> Stack trace indicates that solr is returning malformed document.
> >>
> >>
> >> Caused by: org.apache.solr.client.solrj.SolrServerException: Error
> >> executing query
> >>at
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
> >>at
> org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
> >>at
> com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
> >>... 15 more
> >> Caused by: org.apache.solr.common.SolrException: parsing error
> >>at
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
> >>at
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
> >>at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
> >>at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >>at
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
> >>... 17 more
> >> Caused by: javax.xml.stream.XMLStreamException: ParseError at
> >> [row,col]:[3,136974]
> >> Message: Element type "t_sort" must be followed by either attribute
> >> specifications, ">" or "/>".
> >>at
> com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
> >>at
> org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
> >>at
> org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
> >>at
> org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
> >>at
> org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
> >>at
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
> >>... 21 more
> >>
>


Re: Production Issue: SolrJ client throwing - Element type must be followed by either attribute specifications, ">" or "/>".

2011-09-22 Thread roz dev
Wanted to update the list with our finding.

We reduced the number of documents being retrieved from Solr, and this error did not
appear again.
It might be that, due to the high number of documents, Solr is returning incomplete
documents.

-Saroj


On Wed, Sep 21, 2011 at 12:13 PM, roz dev  wrote:

> Hi All
>
> We are getting this error in our Production Solr Setup.
>
> Message: Element type "t_sort" must be followed by either attribute 
> specifications, ">" or "/>".
> Solr version is 1.4.1
>
> Stack trace indicates that solr is returning malformed document.
>
>
> Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing 
> query
>   at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>   at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
>   at 
> com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
>   ... 15 more
> Caused by: org.apache.solr.common.SolrException: parsing error
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
>   at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
>   at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>   at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
>   ... 17 more
> Caused by: javax.xml.stream.XMLStreamException: ParseError at 
> [row,col]:[3,136974]
> Message: Element type "t_sort" must be followed by either attribute 
> specifications, ">" or "/>".
>   at 
> com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
>   at 
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
>   ... 21 more
>
>


Production Issue: SolrJ client throwing this error even though field type is not defined in schema

2011-09-21 Thread roz dev
Hi All

We are getting this error in our Production Solr Setup.

Message: Element type "t_sort" must be followed by either attribute
specifications, ">" or "/>".
Solr version is 1.4.1

The stack trace indicates that Solr is returning a malformed document.


Caused by: org.apache.solr.client.solrj.SolrServerException: Error
executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at 
com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
... 15 more
Caused by: org.apache.solr.common.SolrException: parsing error
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 17 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at
[row,col]:[3,136974]
Message: Element type "t_sort" must be followed by either attribute
specifications, ">" or "/>".
at 
com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
at 
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
... 21 more


q and fq in solr 1.4.1

2011-09-20 Thread roz dev
Hi All

I am sure the q vs fq question has been answered several times.

But I still have a question I would like answered:

If we have a Solr query like this:

q=*&fq=field_1:XYZ&fq=field_2:ABC&sort=field_3+asc
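
(For concreteness, the same request expressed through SolrJ might look like the sketch
below; *:* is the match-all form of q, and the field names are just the placeholders
from the example.)

import org.apache.solr.client.solrj.SolrQuery;

public class QueryFilterSortExample {
    public static SolrQuery buildQuery() {
        SolrQuery query = new SolrQuery("*:*");              // q: match all documents
        query.addFilterQuery("field_1:XYZ");                 // fq on field_1
        query.addFilterQuery("field_2:ABC");                 // fq on field_2
        query.addSortField("field_3", SolrQuery.ORDER.asc);  // sort by field_3 ascending
        query.setRows(20);                                    // fetch only 20 rows
        return query;
    }
}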

How does SolrIndexSearcher execute this query in 1.4.1?

Will it run the query against the whole index first (because q=*) and then filter the
results against field_1 and field_2, or are the filters applied in parallel?

And if we ask for only 20 rows at a time, will Solr do the following:
1) get all the docs (because q is set to *) and sort them by field_3
2) then filter the results by field_1 and field_2

Or will it apply the sorting after doing the filtering?

Please let me know how Solr 1.4.1 works.

Thanks
Saroj


cache invalidation in slaves

2011-09-20 Thread roz dev
Hi All

Solr has different types of caches, such as filterCache, queryResultCache and
documentCache.
I know that if a commit is done then a new searcher is opened and new caches are built,
and this makes sense.

What happens when commits are happening on the master and slaves are pulling all the
delta updates?

Do slaves throw away their caches and rebuild them every time a new set of delta index
updates is downloaded to the slave?


Thanks
Saroj


what is the default value of omitNorms and termVectors in solr schema

2011-09-18 Thread roz dev
Hi

As per this document, http://wiki.apache.org/solr/FieldOptionsByUseCase,
omitNorms and termVectors have to be "explicitly" specified in some cases.

I am wondering what the default values of these settings are if the Solr schema
definition does not state them.

*Example:*



In the above case, will Solr create norms for this field, and a term vector as well?

Any ideas?

Thanks
Saroj


Re: Does Solr flush to disk even before ramBufferSizeMB is hit?

2011-08-30 Thread roz dev
Thanks Shawn.

If Solr writes this info to disk as soon as possible (which is what I am seeing), then
the ramBuffer setting seems misleading.

Does anyone else have any thoughts on this?

-Saroj


On Mon, Aug 29, 2011 at 6:14 AM, Shawn Heisey  wrote:

> On 8/28/2011 11:18 PM, roz dev wrote:
>
>> I notice that even though InfoStream does not mention that data is being
>> flushed to disk, new segment files were created on the server.
>> Size of these files kept growing even though there was enough Heap
>> available
>> and 856MB Ram was not even used.
>>
>
> With the caveat that I am not an expert and someone may correct me, I'll
> offer this:  It's been my experience that Solr will write the files that
> constitute stored fields as soon as they are available, because that
> information is always the same and nothing will change in those files based
> on the next chunk of data.
>
> Thanks,
> Shawn
>
>
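
One way to poke at this behavior outside Solr is to drive Lucene 2.9.x (the version
bundled with Solr 1.4.1) directly with an infoStream attached, similar to what Solr's
infoStream config option produces. A minimal sketch, with the index path and document
content made up:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class RamBufferProbe {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/tmp/rambuffer-probe")),  // hypothetical path
                new StandardAnalyzer(Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setRAMBufferSizeMB(856.0);   // same setting as in the thread
        writer.setInfoStream(System.out);   // print flush/merge decisions as they happen

        for (int i = 0; i < 100000; i++) {
            Document doc = new Document();
            // Stored field data goes to the .fdt/.fdx doc-store files as documents are
            // added, so those files grow on disk before the RAM buffer triggers a
            // postings flush, which matches the observation in this thread.
            doc.add(new Field("body", "some stored and indexed text " + i,
                    Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();  // final flush and commit, which is when infoStream reports the segment
    }
}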


Does Solr flush to disk even before ramBufferSizeMB is hit?

2011-08-28 Thread roz dev
Hi All,
I am trying to tune ramBufferSizeMB and merge factor for my setup.

So, I enabled the Lucene IndexWriter's infoStream log and started monitoring the data
folder where index files are created.
I started my test with the following:

Heap: 3GB
Solr 1.4.1,
Index Size = 20 GB,
ramBufferSizeMB=856
Merge Factor=25


I ran my test with 30 concurrent threads writing to Solr.
My jobs delete 6 (approx) records by issuing a deleteByQuery command and then proceed
to write data.

Commit is done at the end of the writing process.
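
(As a sketch only, roughly the shape of such a delete-then-write job in SolrJ; the URL,
delete query, field names and document count are placeholders, not the actual job:)

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ReindexJobSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Clear out the slice of the index being rebuilt (placeholder query).
        server.deleteByQuery("doc_type:test");

        // Write the replacement documents.
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("doc_type", "test");
            docs.add(doc);
        }
        server.add(docs);

        // A single commit at the end of the writing process, as described above.
        server.commit();
    }
}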

The results are a bit surprising to me, and I need some help understanding them.

I notice that even though the InfoStream does not mention that data is being flushed
to disk, new segment files were created on the server.
The size of these files kept growing even though there was enough heap available and
the 856 MB RAM buffer was not even used.

Is it the case that Lucene is flushing to disk even before ramBufferSizeMB is hit? If
so, why is the InfoStream not logging this?

As per the InfoStream, it is flushing at the end, but files are created much earlier
than that.

Here is what the InfoStream is saying. Please note that it indicates a new segment is
being flushed at 12:58 AM, but files were created at 12:53 AM and kept growing.

Aug 29, 2011 12:46:00 AM IW 0 [main]: setInfoStream:
dir=org.apache.lucene.store.NIOFSDirectory@/opt/gid/solr/ecom/data/index
autoCommit=false
mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@4552a64d mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@35242cc9 ramBufferSizeMB=856.0
maxBufferedDocs=-1 maxBuffereDeleteTerms=-1
maxFieldLength=1 index=_3l:C2151995

Aug 29, 2011 12:57:35 AM IW 0 [web-1]: now flush at close
Aug 29, 2011 12:57:35 AM IW 0 [web-1]: flush: now pause all indexing threads
Aug 29, 2011 12:57:35 AM IW 0 [web-1]:   flush: segment=_3m
docStoreSegment=_3m docStoreOffset=0 flushDocs=true flushDeletes=true
flushDocStores=true numDocs=60788 numBufDelTerms=60788
Aug 29, 2011 12:57:35 AM IW 0 [web-1]:   index before flush _3l:C2151995
Aug 29, 2011 12:57:35 AM IW 0 [web-1]: DW: flush postings as segment _3m
numDocs=60788
Aug 29, 2011 12:57:35 AM IW 0 [web-1]: DW: closeDocStore: 2 files to flush
to segment _3m numDocs=60788
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=9 total
now 9
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks
blockSize=32768 count=182 total now 182
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=49
total now 49
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=7 total
now 16
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks
blockSize=32768 count=145 total now 327
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=37
total now 86
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=9 total
now 25
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks
blockSize=32768 count=208 total now 535
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=52
total now 138
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=7 total
now 32
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks
blockSize=32768 count=136 total now 671
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=39
total now 177
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=3 total
now 35
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks
blockSize=32768 count=58 total now 729
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=16
total now 193
Aug 29, 2011 12:57:41 AM IW 0 [web-1]: DW:   oldRAMSize=50469888
newFlushedSize=161169038 docs/MB=395.491 new/old=319.337%
Aug 29, 2011 12:57:41 AM IFD [web-1]: now checkpoint "segments_1x" [2
segments ; isCommit = false]
Aug 29, 2011 12:57:41 AM IW 0 [web-1]: DW: apply 60788 buffered deleted
terms and 0 deleted docIDs and 1 deleted queries on 2 segments.
Aug 29, 2011 12:57:42 AM IFD [web-1]: now checkpoint "segments_1x" [2
segments ; isCommit = false]
Aug 29, 2011 12:57:42 AM IFD [web-1]: now checkpoint "segments_1x" [2
segments ; isCommit = false]
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: LMP: findMerges: 2 segments
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: LMP:   level 6.6799455 to 7.4299455:
1 segments
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: LMP:   level 5.1209826 to 5.8709826:
1 segments
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: now merge
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS:   index: _3l:C2151995 _3m:C60788
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS:   no more merges pending; now
return
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: now merge
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS:   index: _3l:C2151995 _3m:C60788
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS:   no more merges pending; now
return
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: now call final commit()
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: startCommit(): start sizeInBytes=0
Aug 29, 2011 12:57:42

SolrJ Question about Bad Request Root cause error

2011-01-11 Thread roz dev
Hi All

We are using the SolrJ client (v 1.4.1) to integrate with our Solr search server.
We notice that whenever a SolrJ request does not match the Solr schema, we get a Bad
Request exception, which makes sense.

org.apache.solr.common.SolrException: Bad Request

But the SolrJ client does not provide any clue about the reason the request is bad.

Is there any way to get the root cause on the client side?

Of course, the Solr server logs have enough info to know why the data is bad, but it
would be great to have the same info in the exception generated by SolrJ.

Any thoughts? Is there any plan to add this in future releases?

Thanks,
Saroj
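
A small sketch of what is catchable on the client side with SolrJ 1.4.x: the exception
carries the HTTP status code and reason phrase, but the detailed cause (for example an
unknown field) still only shows up in the Solr server log, which is exactly the gap
described above. The URL and field names below are made up.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

public class BadRequestHandlingSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("no_such_field", "value that does not match the schema"); // hypothetical bad field

        try {
            server.add(doc);
            server.commit();
        } catch (SolrException e) {
            // The client only sees the HTTP status and reason phrase ("Bad Request");
            // the real cause (e.g. unknown field) is in the Solr server log.
            System.err.println("HTTP status: " + e.code());
            System.err.println("Message: " + e.getMessage());
        } catch (SolrServerException e) {
            // Communication-level failures (connection problems, response parse errors, etc.)
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}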