ICUCollation throws exception

2012-07-16 Thread Oliver Schihin

Hello

According to the release notes for 4.0.0-ALPHA (SOLR-2396), I replaced
ICUCollationKeyFilterFactory with ICUCollationField in our schema. But this throws an
exception; see the following excerpt from the log:


Jul 16, 2012 5:27:48 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for [schema.xml]
fieldType "alphaOnlySort": Plugin init failure for [schema.xml] analyzer/filter:
class org.apache.solr.schema.ICUCollationField

	at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
	at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:359)

The deprecated ICUCollationKeyFilterFactory works without any problem. This
is how I defined the field type in the schema (with the deprecated filter):


   
[the field type definition was stripped of its XML tags by the mail archive; it declared
the "alphaOnlySort" type with omitNorms="true" and used the deprecated
ICUCollationKeyFilterFactory in its analyzer]
Do I have to replace jars in /contrib/analysis-extras/, or are there any other hints as to
what might be wrong in my install and configuration?
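(For reference, SOLR-2396 turns the old filter into a dedicated field type, so
ICUCollationField is declared on its own rather than inside an <analyzer>/<filter> chain.
A rough sketch of what that, plus the required libs, might look like - the locale,
strength and paths below are illustrative assumptions, not taken from this install:

<!-- sketch only: the locale/strength values are assumptions -->
<fieldType name="collatedSort" class="solr.ICUCollationField"
           locale="de" strength="primary"/>

<!-- in solrconfig.xml; adjust the relative dirs to your own install -->
<lib dir="../../contrib/analysis-extras/lib" />
<lib dir="../../contrib/analysis-extras/lucene-libs" />
)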


Thanks a lot
Oliver




RE: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Bryan Loofbourrow
Another thing you may wish to ponder is this blog entry from Mike
McCandless:
http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html

In it, he discusses the poor interaction between OS swapping and long-neglected
allocations in a JVM. You're on Linux, which has decent
control over swapping decisions, so you may find that a tweak is in order,
especially if you can discover evidence that the hard drive is being
worked hard during GC. If the problem exists, it might be especially
pronounced in your large JVM.

I have no direct evidence of thrashing during GC (I am not sure how to go
about gathering such evidence), but I have seen, on a Windows machine, a
Tomcat running Solr refuse to shut down for many minutes, while a Resource
Monitor session reports that that same Tomcat process is frantically
reading from the page file the whole time. So there is something besides
plausibility to the idea.

-- Bryan

> -Original Message-
> From: Mou [mailto:mouna...@gmail.com]
> Sent: Monday, July 16, 2012 9:09 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Using Solr 3.4 running on tomcat7 - very slow search
>
> Thanks Bryan. Excellent suggestion.
>
> I haven't used VisualVM before but I am going to use it to see where CPU
> is going. I saw that CPU is overly used. I haven't seen so much CPU use
> in testing.
> Although I think GC is not a problem, splitting the JVM per shard would
> be a good idea.
>
>
> On Mon, Jul 16, 2012 at 9:44 PM, Bryan Loofbourrow [via Lucene] <
> ml-node+s472066n3995446...@n3.nabble.com> wrote:
>
> > 5 min is ridiculously long for a query that used to take 65ms. That
> > ought to be a great clue. The only two things I've seen that could
> > cause that are thrashing, or GC. Hard to see how it could be
> > thrashing, given your hardware, so I'd initially suspect GC.
> >
> > Aim VisualVM at the JVM. It shows how much CPU goes to GC over time,
> > in a nice blue line. And if it's not GC, try out its Sampler tab, and
> > see where the CPU is spending its time.
> >
> > FWIW, when asked at what point one would want to split JVMs and shard,
> > on the same machine, Grant Ingersoll mentioned 16GB, and precisely for
> > GC cost reasons. You're way above that. Maybe multiple JVMs and
> > sharding, even on the same machine, would serve you better than a
> > monster 70GB JVM.
> >
> > -- Bryan
> >
> > > -Original Message-
> > > From: Mou [mailto:[hidden email]]
> > > Sent: Monday, July 16, 2012 7:43 PM
> > > To: [hidden email]
> > > Subject: Using Solr 3.4 running on tomcat7 - very slow search
> > >
> > > Hi,
> > >
> > > Our index is divided into two shards and each of them has 120M docs,
> > > total size 75G in each core.
> > > The server is a pretty good one; the JVM is given 70G of memory and
> > > about the same is left for the OS (SLES 11).
> > >
> > > We use all dynamic fields except the unique id and are using long
> > > queries, but almost all of them are filter queries; each query may
> > > have 10-30 fq parameters.
> > >
> > > When I tested the index (same size) but with max heap size 40G,
> > > queries were blazing fast. I used solrmeter to load test and it was
> > > happily serving 12000 queries or more per min with avg 65 ms qtime.
> > > We had an excellent filterCache hit ratio.
> > >
> > > This index is only used for searching and is replicated every 7 sec
> > > from the master.
> > >
> > > But now on the production server it is horribly slow, taking 5 mins
> > > (qtime) to return a query (same query).
> > > What could go wrong?
> > >
> > > Really appreciate your suggestions on debugging this thing..
> > >
> > >
> > >
> > > --
> > > View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.

Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Mou
Thanks Bryan. Excellent suggestion.

I haven't used VisualVM before, but I am going to use it to see where the CPU time is
going. I saw that CPU is heavily used; I hadn't seen that much CPU use in testing.
Although I think GC is not the problem, splitting the JVM per shard would be
a good idea.


On Mon, Jul 16, 2012 at 9:44 PM, Bryan Loofbourrow [via Lucene] <
ml-node+s472066n3995446...@n3.nabble.com> wrote:

> 5 min is ridiculously long for a query that used to take 65ms. That ought
> to be a great clue. The only two things I've seen that could cause that
> are thrashing, or GC. Hard to see how it could be thrashing, given your
> hardware, so I'd initially suspect GC.
>
> Aim VisualVM at the JVM. It shows how much CPU goes to GC over time, in a
> nice blue line. And if it's not GC, try out its Sampler tab, and see where
> the CPU is spending its time.
>
> FWIW, when asked at what point one would want to split JVMs and shard, on
> the same machine, Grant Ingersoll mentioned 16GB, and precisely for GC
> cost reasons. You're way above that. Maybe multiple JVMs and sharding,
> even on the same machine, would serve you better than a monster 70GB JVM.
>
> -- Bryan
>
> > -Original Message-
> > From: Mou [mailto:[hidden email]]
> > Sent: Monday, July 16, 2012 7:43 PM
> > To: [hidden email]
> > Subject: Using Solr 3.4 running on tomcat7 - very slow search
> >
> > Hi,
> >
> > Our index is divided into two shards and each of them has 120M docs,
> > total size 75G in each core.
> > The server is a pretty good one; the JVM is given 70G of memory and
> > about the same is left for the OS (SLES 11).
> >
> > We use all dynamic fields except the unique id and are using long
> > queries, but almost all of them are filter queries; each query may
> > have 10-30 fq parameters.
> >
> > When I tested the index (same size) but with max heap size 40G,
> > queries were blazing fast. I used solrmeter to load test and it was
> > happily serving 12000 queries or more per min with avg 65 ms qtime.
> > We had an excellent filterCache hit ratio.
> >
> > This index is only used for searching and is replicated every 7 sec
> > from the master.
> >
> > But now on the production server it is horribly slow, taking 5 mins
> > (qtime) to return a query (same query).
> > What could go wrong?
> >
> > Really appreciate your suggestions on debugging this thing..
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436p3995449.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Mmap

2012-07-16 Thread William Bell
Yep.

-Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory

or

-Dsolr.directoryFactory=solr.MMapDirectoryFactory

works great.
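The -D flag works because the stock solrconfig.xml declares the factory through a
system-property substitution, roughly as shown below (the property falls back to
StandardDirectoryFactory when no -Dsolr.directoryFactory=... is passed):

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>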


On Mon, Jul 16, 2012 at 7:55 PM, Michael Della Bitta
 wrote:
> Hi Bill,
>
> Standard picks one for you. Otherwise, you can hardcode the
> DirectoryFactory in your config file, or I believe if you specify
>
> -Dsolr.solr.directoryFactory=solr.MMapDirectoryFactory
>
> That will get you what you want.
>
> Michael Della Bitta
>
> 
> Appinions, Inc. -- Where Influence Isn’t a Game.
> http://www.appinions.com
>
>
> On Mon, Jul 16, 2012 at 9:32 PM, Bill Bell  wrote:
>> Any thought on this? Is the default Mmap?
>>
>>
>>
>> Sent from my mobile device
>> 720-256-8076
>>
>> On Feb 14, 2012, at 7:16 AM, Bill Bell  wrote:
>>
>>> Does someone have an example of using unmap in 3.5 and chunksize?
>>>
>>> I am using Solr 3.5.
>>>
>>> I noticed in solrconfig.xml:
>>> <directoryFactory name="DirectoryFactory"
>>> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>>>
>>> I don't see this parameter taking effect. When I set
>>> -Dsolr.directoryFactory=solr.MMapDirectoryFactory
>>>
>>> How do I see the setting in the log or in stats.jsp ? I cannot find a place 
>>> that indicates it is set or not.
>>>
>>> I would assume StandardDirectoryFactory is being used but I do see (when I 
>>> set it or NOT set it)
>>>
>>> Bill Bell
>>> Sent from mobile
>>>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


How to setup SimpleFSDirectoryFactory

2012-07-16 Thread William Bell
We all know that MMapDirectory is the fastest. However, we cannot always
use it, since you might run out of memory on large indexes, right?

Here is how I got SimpleFSDirectoryFactory to work. Just set
-Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.

Your solrconfig.xml:

<!-- the element's tags were stripped by the mail archive; restored here assuming
     the stock directoryFactory declaration -->
<directoryFactory name="DirectoryFactory"
    class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
You can check it with http://localhost:8983/solr/admin/stats.jsp

Notice that the default on 64-bit Windows is MMapDirectory; otherwise it is
NIOFSDirectory, except on (32-bit) Windows, where it is SimpleFSDirectory. It
would be nicer if we just set it all up with a helper in solrconfig.xml...

if (Constants.WINDOWS) {
  if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
    return new MMapDirectory(path, lockFactory);
  else
    return new SimpleFSDirectory(path, lockFactory);
} else {
  return new NIOFSDirectory(path, lockFactory);
}



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


RE: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Bryan Loofbourrow
5 min is ridiculously long for a query that used to take 65ms. That ought
to be a great clue. The only two things I've seen that could cause that
are thrashing, or GC. Hard to see how it could be thrashing, given your
hardware, so I'd initially suspect GC.

Aim VisualVM at the JVM. It shows how much CPU goes to GC over time, in a
nice blue line. And if it's not GC, try out its Sampler tab, and see where
the CPU is spending its time.

FWIW, when asked at what point one would want to split JVMs and shard, on
the same machine, Grant Ingersoll mentioned 16GB, and precisely for GC
cost reasons. You're way above that. Maybe multiple JVMs and sharding,
even on the same machine, would serve you better than a monster 70GB JVM.
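(Fwiw, if you do split into several JVMs on one box, the usual way to query them
together is the shards parameter, along the lines of
...&shards=localhost:8983/solr,localhost:8984/solr - the hosts and ports here are only
illustrative, not values from this setup.)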

-- Bryan

> -Original Message-
> From: Mou [mailto:mouna...@gmail.com]
> Sent: Monday, July 16, 2012 7:43 PM
> To: solr-user@lucene.apache.org
> Subject: Using Solr 3.4 running on tomcat7 - very slow search
>
> Hi,
>
> Our index is divided into two shards and each of them has 120M docs,
> total size 75G in each core.
> The server is a pretty good one; the JVM is given 70G of memory and
> about the same is left for the OS (SLES 11).
>
> We use all dynamic fields except the unique id and are using long
> queries, but almost all of them are filter queries; each query may
> have 10-30 fq parameters.
>
> When I tested the index (same size) but with max heap size 40G,
> queries were blazing fast. I used solrmeter to load test and it was
> happily serving 12000 queries or more per min with avg 65 ms qtime.
> We had an excellent filterCache hit ratio.
>
> This index is only used for searching and is replicated every 7 sec
> from the master.
>
> But now on the production server it is horribly slow, taking 5 mins
> (qtime) to return a query (same query).
> What could go wrong?
>
> Really appreciate your suggestions on debugging this thing..
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Mou
Hi,

Our index is divided into two shards and each of them has 120M docs, total
size 75G in each core.
The server is a pretty good one; the JVM is given 70G of memory and about
the same is left for the OS (SLES 11).

We use all dynamic fields except the unique id and are using long queries,
but almost all of them are filter queries; each query may have 10-30 fq
parameters.

When I tested the index (same size) but with max heap size 40G, queries
were blazing fast. I used solrmeter to load test and it was happily serving
12000 queries or more per min with avg 65 ms qtime. We had an excellent
filterCache hit ratio.

This index is only used for searching and is replicated every 7 sec from
the master.

But now on the production server it is horribly slow, taking 5 mins (qtime)
to return a query (same query).
What could go wrong?

Really appreciate your suggestions on debugging this thing..



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Mmap

2012-07-16 Thread Michael Della Bitta
Hi Bill,

Standard picks one for you. Otherwise, you can hardcode the
DirectoryFactory in your config file, or I believe if you specify

-Dsolr.solr.directoryFactory=solr.MMapDirectoryFactory

That will get you what you want.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Mon, Jul 16, 2012 at 9:32 PM, Bill Bell  wrote:
> Any thought on this? Is the default Mmap?
>
>
>
> Sent from my mobile device
> 720-256-8076
>
> On Feb 14, 2012, at 7:16 AM, Bill Bell  wrote:
>
>> Does someone have an example of using unmap in 3.5 and chunksize?
>>
>> I am using Solr 3.5.
>>
>> I noticed in solrconfig.xml:
>> <directoryFactory name="DirectoryFactory"
>> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>>
>> I don't see this parameter taking effect. When I set
>> -Dsolr.directoryFactory=solr.MMapDirectoryFactory
>>
>> How do I see the setting in the log or in stats.jsp ? I cannot find a place 
>> that indicates it is set or not.
>>
>> I would assume StandardDirectoryFactory is being used but I do see (when I 
>> set it or NOT set it)
>>
>> Bill Bell
>> Sent from mobile
>>


Re: Mmap

2012-07-16 Thread Bill Bell
Any thought on this? Is the default Mmap?



Sent from my mobile device
720-256-8076

On Feb 14, 2012, at 7:16 AM, Bill Bell  wrote:

> Does someone have an example of using unmap in 3.5 and chunksize?
> 
> I am using Solr 3.5.
> 
> I noticed in solrconfig.xml:
> <directoryFactory name="DirectoryFactory"
>  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
> 
> I don't see this parameter taking effect. When I set
> -Dsolr.directoryFactory=solr.MMapDirectoryFactory
> 
> How do I see the setting in the log or in stats.jsp ? I cannot find a place 
> that indicates it is set or not.
> 
> I would assume StandardDirectoryFactory is being used but I do see (when I 
> set it or NOT set it)
> 
> Bill Bell
> Sent from mobile
> 


RE: SOLR 4 Alpha Out Of Mem Err

2012-07-16 Thread Nick Koton
> That suggests you're running out of threads
Michael,
Thanks for this useful observation.  What I found just prior to the "problem
situation" was literally thousands of threads in the server JVM.  I have
pasted a few samples below obtained from the admin GUI.  I spent some time
today using this barometer, but I don't have enough to share right now.  I'm
looking at the difference between ConcurrentUpdateSolrServer and
HttpSolrServer and how my client may be misusing them.  I'll assume my
client is misbehaving and driving the server crazy for now.  If I figure out
how, I will share it so perhaps a safe guard can be put in place.
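For anyone comparing the two client classes mentioned above, here is a minimal sketch
(the URL, queue size and thread count are illustrative assumptions, not the values used
in this setup):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ClientSketch {
    public static void main(String[] args) throws Exception {
        // ConcurrentUpdateSolrServer buffers documents and streams them from its own
        // bounded pool (here: a queue of 100 docs and 4 background threads).
        SolrServer streaming =
                new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 100, 4);

        // HttpSolrServer sends each request synchronously on the caller's thread.
        SolrServer plain = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        streaming.add(doc);   // queued, sent in the background
        plain.add(doc);       // sent immediately, blocks until the response

        streaming.commit();
        plain.commit();
    }
}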

Nick


Server threads - very roughly 0.1 %:
cmdDistribExecutor-9-thread-7161 (10096)
java.util.concurrent.SynchronousQueue$TransferStack@17b90c55
.   sun.misc.Unsafe.park(Native Method)
.   java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
.   java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
.   java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
.   java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:874)
.   java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:945)
.   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
.   java.lang.Thread.run(Thread.java:662)
-0.ms

-0.ms cmdDistribExecutor-9-thread-7160 (10086)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5509b56
.   sun.misc.Unsafe.park(Native Method)
.   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
.   java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
.   org.apache.http.impl.conn.tsccm.WaitingThread.await(WaitingThread.java:158)
.   org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:403)
.   org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
.   org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
.   org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:401)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
.   org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:351)
.   org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
.   org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:325)
.   org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
.   java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
.   java.util.concurrent.FutureTask.run(FutureTask.java:138)
.   java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
.   java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
.   java.util.concurrent.FutureTask.run(FutureTask.java:138)
.   java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
.   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
.   java.lang.Thread.run(Thread.java:662)
20.ms

20.ms cmdDistribExecutor-9-thread-7159 (10085)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6f062dd3
.   sun.misc.Unsafe.park(Native Method)
.   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
.   java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
.   org.apache.http.impl.conn.tsccm.WaitingThread.await(WaitingThread.java:158)
.   org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:403)
.   org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
.   org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
.   org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:401)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
.   org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:351)
.   org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
.   org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:325)
.   org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
.   java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
.   java.util.concurrent.FutureTask.run(FutureTask.java:138)
.   java.util.concurrent.Executor

Re: Query results vs. facets results

2012-07-16 Thread tudor

Erick Erickson wrote
> 
> Ahhh, you need to look down another few lines. When you specify fq, there
> should be a section of the debug output like
>
>   [the debug-output excerpt was stripped of its XML tags by the mail archive]
>
> where the array is the parsed form of the filter queries. I was thinking
> about
> comparing that with the parsed form of the "q" parameter in the non-filter
> case to see what insight one could gain from that.
> 
> 

There is no "filter_queries" section because I do not use an fq in the first
two queries. I use one in the combined query, for which you can see the
output further below.


Erick Erickson wrote
> 
> 
> But there's already one difference, when you use *, you get
>  ID:*
> 
> Is it possible that you have some documents that do NOT have an ID field?
> try *:* rather than just *. I'm guessing that your default search field is
> ID
> and you have some documents without an ID field. Not a good guess if ID
> is your <uniqueKey> though..
> 
> Try q=*:* -ID:* and see if you get 31 docs.
> 
> 

All the entries have an ID, so q=*:* -ID:* yielded 0 results.
The ID could appear multiple times, that is the reason behind grouping of
results. Indeed, ID is the default search field.


Erick Erickson wrote
> 
> 
> Also note that if you _have_ specified ID as your <uniqueKey> _but_ you
> didn't re-index afterwards (actually, I'd blow away the entire
> /data directory and restart) you may have stale data in there that
> allowed documents to exist that do not have uniqueKey fields.
> 
> 

For Solr's unique id I use a  field (which, of course, has a different name than the
default search ID), so it should not be a problem.

I have re-indexed the data, and I get somewhat a different result. This is
the query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*:*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=STR_ENTERPRISE_ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

And the results as well as the debug information:


  
  [most of the response XML was stripped of its tags by the mail archive; the
   surviving values were 284, 134 and 89 (apparently a facet count), followed
   by this debug section:]

  rawquerystring: *:*
  querystring: *:*
  parsedquery: MatchAllDocsQuery(*:*)
  parsedquery_toString: *:*
  QParser: LuceneQParser
  filter_queries: {!tag=dt}CITY:MILTON
  parsed_filter_queries: CITY:MILTON


So now fq says: 134 groups with CITY:MILTON and faceted search says: 83
groups with CITY:MILTON. 

How can I see some information about the grouping in Solr?

Thanks Erick!

Regards,
Tudor


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079p3995388.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Lost answers?

2012-07-16 Thread Michael Della Bitta
Hello Bruno,

Jetty is a legitimate choice. I do, however, worry that you might be
masking an underlying problem by making that choice, without a
guarantee that it won't someday hurt you even if you use Jetty.

A question: are you using a client to connect to Solr and issue your
queries? Something like SolrJ, solr-php-client, rsolr, etc.? If not,
you might find that someone has already done the work for you of
making a durable client-side API for Solr, and achieve better results.
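As a rough illustration (SolrJ class names as of 3.6; the URL and field name below are
placeholders, not taken from this setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FetchByKey {
    public static void main(String[] args) throws Exception {
        // Point this at your Tomcat- or Jetty-hosted Solr instance.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr");

        // One query per unique key, as described in the thread.
        SolrQuery query = new SolrQuery("uniqueKeyField:SOME_ID");
        QueryResponse response = solr.query(query);
        System.out.println("found: " + response.getResults().getNumFound());
    }
}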


Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Mon, Jul 16, 2012 at 3:16 PM, Bruno Mannina  wrote:
> Hello Michael,
>
> I will check the log, but today I think to another thing may be it's my
> program that it losts some requests.
> It's the first time where the download is so fast.
>
> With Jetty, it's a little bit slower so may be for this reason my program
> works fine.
>
> Do you think I can use Jetty for my prod' environment?
> I will have around 500 users / year with 10 000 requests by day max
>
> On 16/07/2012 at 16:40, Michael Della Bitta wrote:
>
>> Hello, Bruno,
>>
>> No, 4 simultaneous requests should not be a problem.
>>
>> Have you checked the Tomcat logs or logged the data in the query
>> response object to see if there are any clues to what the problem
>> might be?
>>
>> Michael Della Bitta
>>
>> 
>> Appinions, Inc. -- Where Influence Isn’t a Game.
>> http://www.appinions.com
>>
>>
>> On Sun, Jul 15, 2012 at 2:10 PM, Bruno Mannina  wrote:
>>>
>>> I forgot:
>>>
>>> I do the request on the uniqueKey field, so each request gets one
>>> document
>>>
>>> On 15/07/2012 at 14:11, Bruno Mannina wrote:
>>>
 Dear Solr Users,

 I have a solr3.6 + Tomcat and I have a program that connect 4 http
 requests at the same time.
 I must do 1902 requests.

 I do several tests but each time it losts some requests:
 - sometimes I get 1856 docs, 1895 docs, 1900 docs but never 1902 docs.

 With Jetty, I get always 1902 docs.

 As it's a dev' environment, I'm alone to test it.

 Is it a problem to do 4 requests at the same time for tomcat6?

 thanks for your info,

 Bruno


>>>
>>
>
>


Re: Lost answers?

2012-07-16 Thread Bruno Mannina

Hello Michael,

I will check the log, but today I am thinking of another thing: maybe it's my
program that is losing some requests.

It's the first time the download has been so fast.

With Jetty it's a little bit slower, so maybe that is why my program works fine.


Do you think I can use Jetty for my prod environment?
I will have around 500 users / year with 10,000 requests per day max.

On 16/07/2012 at 16:40, Michael Della Bitta wrote:

Hello, Bruno,

No, 4 simultaneous requests should not be a problem.

Have you checked the Tomcat logs or logged the data in the query
response object to see if there are any clues to what the problem
might be?

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Sun, Jul 15, 2012 at 2:10 PM, Bruno Mannina  wrote:

I forgot:

I do the request on the uniqueKey field, so each request gets one document

On 15/07/2012 at 14:11, Bruno Mannina wrote:


Dear Solr Users,

I have Solr 3.6 + Tomcat and a program that opens 4 HTTP
requests at the same time.
I must do 1902 requests.

I ran several tests, but each time it loses some requests:
- sometimes I get 1856 docs, 1895 docs, or 1900 docs, but never 1902 docs.

With Jetty, I always get 1902 docs.

As it's a dev environment, I'm the only one testing it.

Is it a problem for Tomcat 6 to handle 4 requests at the same time?

thanks for your info,

Bruno











Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-16 Thread solrman
The terms component will be faster,
like below:
http://host:port/solr/terms?terms.fl=content&terms.prefix=sol
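If the /terms handler is not already defined, the stock solrconfig.xml enables the
TermsComponent roughly like this (shown as a sketch; adapt the names to your config):

<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>terms</str>
  </arr>
</requestHandler>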

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199p3995378.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 3.5 DIH delta-import replicating full index or Admin UI problem?

2012-07-16 Thread Arcadius Ahouansou
Hello.

We are running Solr 3.5 multicore in master-slave mode.


-Our delta-import looks like:
/solr/core01/dataimport?command=delta-import&optimize=false

The size of the index is 1.18GB.

When delta-import is going on, on the slave admin UI
 8983/solr/core01/admin/replication/index.jsp
I can see the following output:

Master http://solrmaster01.somedomain.com:8983/solr/core01/replication
Latest Index Version:null, Generation: null
Replicatable Index Version:1342183977587, Generation: 33
Poll Interval 00:00:60
Local Index Index Version: 1342183977585, Generation: 32
Location: /var/somedomain/solr/solrhome/core01/data/index
Size: 1.18 GB
Times Replicated Since Startup: 32
Previous Replication Done At: Mon Jul 16 17:08:58 GMT 2012
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Mon Jul 16 17:09:58 GMT 2012
Current Replication Status Start Time: Mon Jul 16 17:08:58 GMT 2012
Files Downloaded: 12 / 95
Downloaded: 4.33 KB / 1.18 GB [0.0%]
Downloading File: _1o.fdt, Downloaded: 510 bytes / 510 bytes [100.0%]
Time Elapsed: 22s, Estimated Time Remaining: 6266208s, Speed: 201 bytes/s
-


- Does "Downloaded: 4.33 KB / *1.18 GB [0.0%]" *means that the solr slave
is going to download the whole 1.18GB?

-I have been monitoring this and the replication takes less than a minute.
And checking the files in the index directory on the slave, the timestamps
are quite different, so apparently the slave is not downloading the full
index every time.

-Please, has anyone else seen the whole index size being shown as
denominator of the "Downloaded" fraction?

-Anything I may be doing wrong?

-Also notice the "Files Downloaded: 12 / 95".  That bit never increases
to 95 / 95.


Our solrconfig looks like this:

--


<!-- the element tags were stripped by the mail archive; restored here assuming the
     standard solr.ReplicationHandler configuration -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">solrconfig.xml,synonyms.txt,schema.xml,stopwords.txt,data-config.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">some-master-full-url</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>


--


Thanks.

Arcadius.



Re: Metadata and FullText, indexed at different times - looking for best approach

2012-07-16 Thread Alexandre Rafalovitch
Thank you,

I am already on 4alpha. The patch feels a little too unstable for my
needs/familiarity with the code.

What about something around multiple cores? Could I have the full-text
fields stored in separate cores and somehow (again, with minimum
hand-coding) search against all those cores and get back a combined
list of document IDs? Or would it make comparative ranking/sorting
impossible?

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Sun, Jul 15, 2012 at 12:08 PM, Erick Erickson
 wrote:
> You've got a couple of choices. There's a new patch in town
> https://issues.apache.org/jira/browse/SOLR-139
> that allows you to update individual fields in a doc if (and only if)
> all the fields in the original document were stored (actually, all the
> non-copy fields).
>
> So if you're storing (stored="true") all your metadata information, you can
> just update the document when the  text becomes available assuming you
> know the uniqueKey when you update.
>
> Under the covers, this will find the old document, get all the fields, add the
> new fields to it, and re-index the whole thing.
>
> Otherwise, your fallback idea is a good one.
>
> Best
> Erick
>
> On Sat, Jul 14, 2012 at 11:05 PM, Alexandre Rafalovitch
>  wrote:
>> Hello,
>>
>> I have a database of metadata and I can inject it into SOLR with DIH
>> just fine. But then, I also have the documents to extract full text
>> from that I want to add to the same records as additional fields. I
>> think DIH allows to run Tika at the ingestion time, but I may not have
>> the full-text files at that point (they could arrive days later). I
>> can match the file to the metadata by a file name matching a field
>> name.
>>
>> What is the best approach to do that staggered indexing with minimum
>> custom code? I guess my fallback position is a custom full-text
>> indexer agent that re-adds the metadata fields when the file is being
>> indexed. Is there anything better?
>>
>> I am a newbie using v4.0alpha of SOLR (and loving it).
>>
>> Thank you,
>> Alex.
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)


Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-16 Thread Pawel Rog
Maybe try EdgeNgramFilterFactory
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/#solr.EdgeNGramFilterFactory
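For reference, a field type along these lines might look like the sketch below (the type
name and gram sizes are arbitrary choices; the grams are applied only at index time so
the query side matches whole prefixes):

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- indexes "shadows" as s, sh, sha, shad, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>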


On Mon, Jul 16, 2012 at 6:57 AM, santamaria2 wrote:

> I'm about to implement an autocomplete mechanism for my search box. I've
> read
> about some of the common approaches, but I have a question about wildcard
> query vs facet.prefix.
>
> Say I want autocomplete for a title: 'Shadows of the Damned'. I want this
> to
> appear as a suggestion if I type 'sha' or 'dam' or 'the'. I don't care that
> it won't appear if I type 'hadows'.
>
> While indexing, I'd use a whitespace tokenizer and a lowercase filter to
> store that title in the index.
> Now I'm thinking two approaches for 'dam' typed in the search box:
>
> 1) q=title:dam*
>
> 2) q=*:*&facet=on&facet.field=title&facet.prefix=dam
>
>
> So any reason that I should favour one over the other? Speed a factor? The
> index has around 200,000 items.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Grouping performance problem

2012-07-16 Thread alxsss
This is strange. We have a data folder size of 24GB and 2GB of RAM for Java. We
query with grouping, ngroups and highlighting, do not request all fields, and query
time is mostly less than 1 sec; it rarely goes up to 2 sec. We use Solr 3.6 and
turned off all kinds of caching.
Maybe your problem is with caching and returning all fields?

Hope this may help.

Alex.



-Original Message-
From: Agnieszka Kukałowicz 
To: solr-user 
Sent: Mon, Jul 16, 2012 10:04 am
Subject: Re: Grouping performance problem


I have server with 24GB RAM. I have 4 shards on it, each of them with 4GB
RAM for java:
JAVA_OPTIONS="-server -Xms4096M -Xmx4096M"
The size is about 15GB for one shard (i use ssd disk for index data).

Agnieszka


2012/7/16 

> What are the RAM of your server and size of the data folder?
>
>
>
> -Original Message-
> From: Agnieszka Kukałowicz 
> To: solr-user 
> Sent: Mon, Jul 16, 2012 6:16 am
> Subject: Re: Grouping performance problem
>
>
> Hi Pavel,
>
> I tried with group.ngroups=false but didn't notice a big improvement.
> The times were still about 4000 ms. It doesn't solve my problem.
> Maybe this is because of my index type. I have millions of documents but
> only about 20 000 groups.
>
>  Cheers
>  Agnieszka
>
> 2012/7/16 Pavel Goncharik 
>
> > Hi Agnieszka ,
> >
> > if you don't need number of groups, you can try leaving out
> > group.ngroups=true param.
> > In this case Solr apparently skips calculating all groups and delivers
> > results much faster.
> > At least for our application the difference in performance
> > with/without group.ngroups=true is significant (have to say, we use
> > Solr 3.6).
> >
> > WBR,
> > Pavel
> >
> > On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
> >  wrote:
> > > Hi,
> > >
> > > Is the any way to make grouping searches more efficient?
> > >
> > > My queries look like:
> > >
> >
> /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> > >
> > > For index with 3 mln documents query for all docs with group=true takes
> > > almost 4000ms. Because queryResultCache is not used next queries take a
> > > long time also.
> > >
> > > When I remove group=true and leave only faceting the query for all docs
> > > takes much more less time: for first time ~ 700ms and next runs only
> > 200ms
> > > because of queryResultCache being used.
> > >
> > > So with group=true the query is about 20 time slower than without it.
> > > Is it possible or is there any way to improve performance with
> grouping?
> > >
> > > My application needs grouping feature and all of the queries use it but
> > the
> > > performance of them is to low for production use.
> > >
> > > I use Solr 4.x from trunk
> > >
> > > Agnieszka Kukalowicz
> >
>
>
>

 


Re: Index version on slave incrementing to higher than master

2012-07-16 Thread Andrew Davidoff
Thanks Erick,

I will look harder at our current configuration and how we're handling
config replication, but I just realized that a backup script was doing a
commit and an optimize on the slave prior to taking the backup. This
happens daily, after updates and replication from the master. This is
something I put in place many ages ago and didn't think to look at until
now :/

Based on the times in the logs and the conditions under which my problem
was occurring (when I wasn't optimizing on the master before initiating
replication) it seems clear that this backup script is my problem. Sorry
for taking your time with something that was clearly my own dang fault. I
appreciate your suggestions and responses regardless!

Andy

On Mon, Jul 16, 2012 at 7:35 AM, Erick Erickson wrote:

> Andrew:
>
> I'm not entirely sure that's your problem, but it's the first thing I'd
> try.
>
> As for your config files, see the section "Replicating solrconfig.xml"
> here: http://wiki.apache.org/solr/SolrReplication. That at least
> allows you to centralize separate solrconfigs for master and
> slave, making promoting a slave to a master a bit easier
>
> Best
> Erick
>
> On Sun, Jul 15, 2012 at 2:00 PM, Andrew Davidoff 
> wrote:
> > Erick,
> >
> > Thank you. I think originally my thought was that if I had my slave
> > configuration really close to my master config, it would be very easy to
> > promote a slave to a master (and vice versa) if necessary. But I think
> you
> > are correct that ripping out from the slave config anything that would
> > modify an index in any way makes sense. I will give this a try very soon.
> >
> > Thanks again.
> > Andy
> >
> >
> > On Sat, Jul 14, 2012 at 5:22 PM, Erick Erickson  >wrote:
> >
> >> Gotta admit it's a bit puzzling, and surely you want to move to the 3x
> >> versions ..
> >>
> >> But at a guess, things might be getting confused on the slaves given
> >> you have a merge policy on them. There's no reason to have any
> >> policies on the slaves; slaves should just be about copying the files
> >> from the master, all the policies,commits,optimizes should be done on
> >> the master. About all the slave does is copy the current state of the
> index
> >> from the master.
> >>
> >> So I'd try removing everything but the replication from the slaves,
> >> including
> >> any autocommit stuff and just let replication do it's thing.
> >>
> >> And I'd replicate after the optimize if you keep the optimize going. You
> >> should
> >> end up with one segment in the index after that, on both the master and
> >> slave.
> >> You can't get any more merged than that.
> >>
> >> Of course you'll also copy the _entire_ index every time after you've
> >> optimized...
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Jul 13, 2012 at 12:31 AM, Andrew Davidoff 
> >> wrote:
> >> > Hi,
> >> >
> >> > I am running solr 1.4.0+ds1-1ubuntu1. I have a master server that has
> a
> >> > number of solr instances running on it (150 or so), and nightly most
> of
> >> > them have documents written to them. The script that does these writes
> >> > (adds) does a commit and an optimize on the indexes when it's entirely
> >> > finished updating them, then initiates replication on the slave per
> >> > instance. In this configuration, the index versions between master and
> >> > slave remain in synch.
> >> >
> >> > The optimize portion, which, again, happens nightly, is taking a lot
> of
> >> > time and I think it's unnecessary. I was hoping to stop doing this
> >> explicit
> >> > optimize, and to let my merge policy handle that. However, if I don't
> do
> >> an
> >> > optimize, and only do a commit before initiating slave replication,
> some
> >> > hours later the slave is, for reasons that are unclear to me,
> >> incrementing
> >> > its index version to 1 higher than the master.
> >> >
> >> > I am not really sure I understand the logs, but it looks like the
> >> > incremented index version is the result of an optimize on the slave,
> but
> >> I
> >> > am never issuing any commands against the slave aside from initiating
> >> > replication, and I don't think there's anything in my solr
> configuration
> >> > that would be initiating this. I do have autoCommit on with maxDocs of
> >> > 1000, but since I am initiating slave replication after doing a
> commit on
> >> > the master, I don't think there would ever be any uncommitted
> documents
> >> on
> >> > the slave. I do have a merge policy configured, but it's not clear to
> me
> >> > that it has anything to do with this. And if it did, I'd expect to see
> >> > similar behavior on the master (right?).
> >> >
> >> > I have included a snipped from my slave logs that shows this issue. In
> >> this
> >> > snipped index version 1286065171264 is what the master has,
> >> > and 1286065171265 is what the slave increments itself to, which is
> then
> >> out
> >> > of synch with the master in terms of version numbers. Nothing that I
> know
> >> > of is issuing any commands to the slave at this

Re: Grouping performance problem

2012-07-16 Thread Agnieszka Kukałowicz
I have a server with 24GB RAM. I have 4 shards on it, each of them with 4GB
of RAM for Java:
JAVA_OPTIONS="-server -Xms4096M -Xmx4096M"
The size is about 15GB for one shard (I use an SSD disk for the index data).

Agnieszka


2012/7/16 

> What are the RAM of your server and size of the data folder?
>
>
>
> -Original Message-
> From: Agnieszka Kukałowicz 
> To: solr-user 
> Sent: Mon, Jul 16, 2012 6:16 am
> Subject: Re: Grouping performance problem
>
>
> Hi Pavel,
>
> I tried with group.ngroups=false but didn't notice a big improvement.
> The times were still about 4000 ms. It doesn't solve my problem.
> Maybe this is because of my index type. I have millions of documents but
> only about 20 000 groups.
>
>  Cheers
>  Agnieszka
>
> 2012/7/16 Pavel Goncharik 
>
> > Hi Agnieszka ,
> >
> > if you don't need number of groups, you can try leaving out
> > group.ngroups=true param.
> > In this case Solr apparently skips calculating all groups and delivers
> > results much faster.
> > At least for our application the difference in performance
> > with/without group.ngroups=true is significant (have to say, we use
> > Solr 3.6).
> >
> > WBR,
> > Pavel
> >
> > On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
> >  wrote:
> > > Hi,
> > >
> > > Is the any way to make grouping searches more efficient?
> > >
> > > My queries look like:
> > >
> >
> /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> > >
> > > For index with 3 mln documents query for all docs with group=true takes
> > > almost 4000ms. Because queryResultCache is not used next queries take a
> > > long time also.
> > >
> > > When I remove group=true and leave only faceting the query for all docs
> > > takes much more less time: for first time ~ 700ms and next runs only
> > 200ms
> > > because of queryResultCache being used.
> > >
> > > So with group=true the query is about 20 time slower than without it.
> > > Is it possible or is there any way to improve performance with
> grouping?
> > >
> > > My application needs grouping feature and all of the queries use it but
> > the
> > > performance of them is to low for production use.
> > >
> > > I use Solr 4.x from trunk
> > >
> > > Agnieszka Kukalowicz
> >
>
>
>


Re: Solr - Spatial Search for Specif Areas on Map

2012-07-16 Thread David Smiley (@MITRE.org)
Thinking more about this, the way to get a Lucene based system to scale to
the maximum extent possible for geospatial queries would be to get a
geospatial query to be satisfied by just one (usually) Lucene index segment. 
It would take quite a bit of customization and work to make this happen.  I
suppose you could always optimize a Solr index and thus get one Lucene
segment, but deploy 10-20x the number of Solr shards (aka "Solr cores") that
one would normally do, and that wouldn't be that hard.  There would be some
work in determining which Solr core (== Lucene segment) a given document
should belong to and which ones to query.

~ David

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995357.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Grouping performance problem

2012-07-16 Thread alxsss
What is the RAM of your server and the size of the data folder?



-Original Message-
From: Agnieszka Kukałowicz 
To: solr-user 
Sent: Mon, Jul 16, 2012 6:16 am
Subject: Re: Grouping performance problem


Hi Pavel,

I tried with group.ngroups=false but didn't notice a big improvement.
The times were still about 4000 ms. It doesn't solve my problem.
Maybe this is because of my index type. I have millions of documents but
only about 20 000 groups.

 Cheers
 Agnieszka

2012/7/16 Pavel Goncharik 

> Hi Agnieszka ,
>
> if you don't need number of groups, you can try leaving out
> group.ngroups=true param.
> In this case Solr apparently skips calculating all groups and delivers
> results much faster.
> At least for our application the difference in performance
> with/without group.ngroups=true is significant (have to say, we use
> Solr 3.6).
>
> WBR,
> Pavel
>
> On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
>  wrote:
> > Hi,
> >
> > Is there any way to make grouping searches more efficient?
> >
> > My queries look like:
> >
> /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > For index with 3 mln documents query for all docs with group=true takes
> > almost 4000ms. Because queryResultCache is not used next queries take a
> > long time also.
> >
> > When I remove group=true and leave only faceting, the query for all docs
> > takes much less time: ~700ms the first time and only 200ms on subsequent
> > runs
> > because of queryResultCache being used.
> >
> > So with group=true the query is about 20 time slower than without it.
> > Is it possible or is there any way to improve performance with grouping?
> >
> > My application needs the grouping feature and all of the queries use it, but
> > their performance is too low for production use.
> >
> > I use Solr 4.x from trunk
> >
> > Agnieszka Kukalowicz
>

 


Re: Grouping performance problem

2012-07-16 Thread alxsss



Re: SOLR 4 Alpha Out Of Mem Err

2012-07-16 Thread Mark Miller

On Jul 15, 2012, at 2:45 PM, Nick Koton wrote:

> I converted my program to use
> the SolrServer::add(Collection docs) method with 100
> documents in each add batch.  Unfortunately, the out of memory errors still
> occur without client side commits.

This won't change much, unfortunately - currently, each host has 10 adds and 10
deletes buffered for it before it will flush. There are some recovery 
implications that have kept that buffer size low so far - but what it ends up 
meaning is that when you stream docs, every 10 docs is sent off on a thread. 
Generally, you might be able to keep up with this - but the commit cost appears 
to perhaps cause a small resource drop that backs things up a bit - and some of 
those threads take a little longer to finish while new threads fire off to keep 
servicing the constantly coming new documents. What appears will happen is 
large momentary spikes in the number of threads. Each thread needs a bit of 
space on the heap, and it would seem with a high enough spike you could get an 
OOM. In my testing, I have not triggered that yet, but I have seen large thread 
count spikes.

Raising the add doc buffer to 100 docs makes those thread bursts much, much 
less severe. I can't remember all of the implications of that buffer size 
though - need to talk to Yonik about it.

We could limit the number of threads for that executor, but I think that comes 
with some negatives as well.

You could try lowering -Xss so that each thread uses less RAM (if possible) as 
a shorter term (possible) workaround.

You could also use multiple threads with the std HttpSolrServer - it won't be 
quite as fast probably, but it can get close(ish).
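A minimal sketch of that multi-threaded HttpSolrServer approach (the URL, document
count and pool size below are illustrative assumptions, not values from this thread):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        // HttpSolrServer is thread-safe, so a small fixed pool on the client side
        // gives concurrency while keeping the server-side thread count bounded.
        final HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        ExecutorService pool = Executors.newFixedThreadPool(4);

        for (int i = 0; i < 10000; i++) {
            final int id = i;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", "doc-" + id);
                        solr.add(doc);   // sent synchronously on this worker thread
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}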

My guess is that your client commits help because a commit will cause a wait on 
all outstanding requests - so that the commit is in logical order - this 
probably is like releasing a pressure valve - the system has a chance to catch 
up and reclaim lots of threads.

We will keep looking into what the best improvement is.

- Mark Miller
lucidimagination.com













Re: Solr - Spatial Search for Specif Areas on Map

2012-07-16 Thread David Smiley (@MITRE.org)

samabhiK wrote
> 
> David,
> 
> Thanks for such a detailed response. The data volume I mentioned is the
> total set of records we have - but we would never ever need to search the
> entire base in one query; we would divide the data by region or zip code.
> So, in that case I assume that for a single region, we would not have more
> than 200M records (this is real , we have a region with that many
> records).
> 
> So, I can assume that I can create shards based on regions and the
> requests would get distributed among these region servers, right?
> 

The fact that your searches are always per region (or almost always) helps
things a lot.  Instead of doing a distributed search to all shards, you
would search the specific shard, or worst case 2 shards, and not burden the
other shards with queries you know won't be satisfied there.  This new information
suggests that the total 10k queries per second volume would be divided
amongst your shards, so 10k / 40 shards = 250 queries per second.  Now we
are approaching something reasonable.  If any of your regions need to scale
up (more query volume) or out (big region) then you can do that on a case by
case basis.  I can think of ways to optimize that for spatial.

Thinking in terms of pure queries per second on a machine, say a 16 CPU
core/machine one, then 250/16 = ~ 16 queries per second per CPU core of a
shard.  I think that's plausible but you would really need to determine how
many exactly you could do.  I assume the spatial index is going to fit in
RAM.  If successful, this means ~40 machines (one per region). 



>  You also mentioned about ~20 concurrent queries per shard - do you have
> links to some benchmarks? I am very interested to know about the hardware
> sizing details for such a setup.
> 

The best I can offer is on the geospatial side: 
https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=12988316&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12988316

But this was an index of "only" 2M distinct points.  It may be that these
figures still hold if the overhead of the spatial query with data is so low
that other constant elements comprise the times, but I really don't know. 
To be clear, this is older code that is not the same as the latest, but they
are algorithmically the same.  The current code has an error epsilon to the
query shape which helps scale further.  There is plenty more optimization
that could be done, like a more efficient binary grid scheme, using Hilbert
Curves, and using an optimizer to find the hotspots to try and optimize
them.



> About setting up Solr for a single shard, I think I will go by your
> advice.  Will see how much a single shard can handle in a decent machine
> :)
> 
> The reason why I came up with that figure was, I have a user base of 500k
> and there's a lot of activity which would happen on the map - every time
> someone moves the tiles, zooms in/out, scrolls, we are going to send a
> server side request to fetch some data ( I agree we can benefit much using
> caching but I believe Solr itself has its own local cache). I might be a
> bit unrealistic with my 10K rps projections but I have read about 9K rps
> to map servers from some sources on the internet. 
> 
> And, NO, I don't work for Google :) But who knows we might be building
> something that can get so much traffic to us in a while. :D
> 
> BTW, my question still remains - can we do search on polygonal areas on
> the map? If so, do you have any link where i can get more details?
> The Bounding Box thing won't work for me, I guess :(
> 
> Sam
> 

Polygons are supported; I've been doing them for years now.  But it requires
some extensions.  Today, you need the latest Solr trunk, you need to apply
the Solr adapters to Lucene 4 spatial (SOLR-3304), and you need to have the
JTS jar on your classpath, something you download separately.  BTW here are
some basic docs: http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
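For illustration, a polygon filter on such a field looks roughly like this via SolrJ (the
field name and coordinates are made up; the Intersects/WKT form is the one described on
that wiki page):

import org.apache.solr.client.solrj.SolrQuery;

public class PolygonFilterExample {
    public static void main(String[] args) {
        SolrQuery query = new SolrQuery("*:*");
        // WKT polygon passed to the spatial field's Intersects operation
        query.addFilterQuery("geo:\"Intersects(POLYGON((-93.2 45.1, -93.2 44.8, "
                + "-92.9 44.8, -92.9 45.1, -93.2 45.1)))\"");
        System.out.println(query.toString());
    }
}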



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995333.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Lost answers?

2012-07-16 Thread Michael Della Bitta
Hello, Bruno,

No, 4 simultaneous requests should not be a problem.

Have you checked the Tomcat logs or logged the data in the query
response object to see if there are any clues to what the problem
might be?

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Sun, Jul 15, 2012 at 2:10 PM, Bruno Mannina  wrote:
> I forgot:
>
> I do the request on the uniqueKey field, so each request gets one document
>
> Le 15/07/2012 14:11, Bruno Mannina a écrit :
>
>> Dear Solr Users,
>>
>> I have a solr3.6 + Tomcat and I have a program that connect 4 http
>> requests at the same time.
>> I must do 1902 requests.
>>
>> I do several tests but each time it losts some requests:
>> - sometimes I get 1856 docs, 1895 docs, 1900 docs but never 1902 docs.
>>
>> With Jetty, I get always 1902 docs.
>>
>> As it's a dev' environment, I'm alone to test it.
>>
>> Is it a problem to do 4 requests at the same time for tomcat6?
>>
>> thanks for your info,
>>
>> Bruno
>>
>>
>
>


Re: JRockit with SOLR3.4/3.5

2012-07-16 Thread Salman Akram
Michael,

Thanks for the response. Below is the stack trace.

Note: Our environment is 64 bit, the initial pool size is set to 4GB, and the
max pool size is 12GB, so it doesn't make sense why it tries to allocate
24GB (even though that much is available, as the total RAM is 64GB).

This issue doesn't come with SOLR 1.4

-

SEVERE: Error waiting for multi-thread deployment of directories to
completehostConfig.deployWar=Deploying web application archive {0}
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError:
classblock allocation, 1535880 loaded, 1536K footprint, in check_alloc
(src/jvm/model/classload/classalloc.c:215).

Attempting to allocate 24000M bytes

There is insufficient native memory for the Java
Runtime Environment to continue.

Possible reasons:
  The system is out of physical RAM or swap space
  In 32 bit mode, the process size limit was hit

Possible solutions:
  Reduce memory load on the system
  Increase physical memory or swap space
  Check if swap backing store is full
  Use 64 bit Java on a 64 bit OS
  Decrease Java heap size (-Xmx/-Xms)
  Decrease number of Java threads
  Decrease Java thread stack sizes (-Xss)
  Disable compressed references (-XXcompressedRefs=false)

at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.catalina.startup.HostConfig.deployDirectories(
HostConfig.java:1018)
at org.apache.catalina.startup.HostConfig.deployApps(
HostConfig.java:475)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1412)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(
HostConfig.java:312)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(
LifecycleSupport.java:119)
at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(
LifecycleBase.java:91)
at org.apache.catalina.util.LifecycleBase.setStateInternal(
LifecycleBase.java:401)
at org.apache.catalina.util.LifecycleBase.setState(
LifecycleBase.java:346)
at org.apache.catalina.core.ContainerBase.startInternal(
ContainerBase.java:1117)
at org.apache.catalina.core.StandardHost.startInternal(
StandardHost.java:782)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(
ContainerBase.java:1526)
at org.apache.catalina.core.ContainerBase$StartChild.call(
ContainerBase.java:1515)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:139)
at java.util.concurrent.ThreadPoolExecutor$Worker.
runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:909)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.OutOfMemoryError: classblock allocation, 1535880
loaded, 1536K footprint, in check_alloc (src/jvm/model/classload/
classalloc.c:215).

Attempting to allocate 24000M bytes

There is insufficient native memory for the Java
Runtime Environment to continue.

Possible reasons:
  The system is out of physical RAM or swap space
  In 32 bit mode, the process size limit was hit

Possible solutions:
  Reduce memory load on the system
  Increase physical memory or swap space
  Check if swap backing store is full
  Use 64 bit Java on a 64 bit OS
  Decrease Java heap size (-Xmx/-Xms)
  Decrease number of Java threads
  Decrease Java thread stack sizes (-Xss)
  Disable compressed references (-XXcompressedRefs=false)

at sun.misc.Unsafe.defineClass(Native Method)
at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:45)
at sun.reflect.MethodAccessorGenerator$1.run(
MethodAccessorGenerator.java:381)
at sun.reflect.MethodAccessorGenerator.generate(
MethodAccessorGenerator.java:377)
at sun.reflect.MethodAccessorGenerator.generateConstructor(
MethodAccessorGenerator.java:76)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(
NativeConstructorAccessorImpl.java:30)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at javax.xml.parsers.FactoryFinder.newInstance(FactoryFinder.java:147)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:233)
at javax.xml.parsers.SAXParserFactory.newInstance(
SAXParserFactory.java:128)
at org.apache.tomcat.util.digester.Digester.getFactory(
Digester.java:470)
at org.apache.tomcat.util.digester.Digester.getParser(Digester.java:677)
at org.apache.catalina.startup.ContextConfig.init(
ContextConfig.java:780)
at org.apache.catalina.startup.ContextConfig.lifecycleEvent(
ContextConfig.java:320)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(
LifecycleSupport.java:119)
at 

Re: Grouping performance problem

2012-07-16 Thread Agnieszka Kukałowicz
Hi Pavel,

I tried with group.ngroups=false but didn't notice a big improvement.
The times were still about 4000 ms. It doesn't solve my problem.
Maybe this is because of my index type. I have millions of documents but
only about 20 000 groups.

 Cheers
 Agnieszka

2012/7/16 Pavel Goncharik 

> Hi Agnieszka ,
>
> if you don't need number of groups, you can try leaving out
> group.ngroups=true param.
> In this case Solr apparently skips calculating all groups and delivers
> results much faster.
> At least for our application the difference in performance
> with/without group.ngroups=true is significant (have to say, we use
> Solr 3.6).
>
> WBR,
> Pavel
>
> On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
>  wrote:
> > Hi,
> >
> > Is the any way to make grouping searches more efficient?
> >
> > My queries look like:
> >
> /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > For index with 3 mln documents query for all docs with group=true takes
> > almost 4000ms. Because queryResultCache is not used next queries take a
> > long time also.
> >
> > When I remove group=true and leave only faceting the query for all docs
> > takes much more less time: for first time ~ 700ms and next runs only
> 200ms
> > because of queryResultCache being used.
> >
> > So with group=true the query is about 20 time slower than without it.
> > Is it possible or is there any way to improve performance with grouping?
> >
> > My application needs grouping feature and all of the queries use it but
> the
> > performance of them is to low for production use.
> >
> > I use Solr 4.x from trunk
> >
> > Agnieszka Kukalowicz
>


Re: Facet on all the dynamic fields with *_s feature

2012-07-16 Thread Rajani Maski
In this URL  -  https://issues.apache.org/jira/browse/SOLR-247

there are *patches*, and one patch is named "*SOLR-247-FacetAllFields*".

Will that help me fix this problem?

If yes, how do I add this to Solr as a plugin?


Thanks & Regards
Rajani




On Mon, Jul 16, 2012 at 5:04 PM, Darren Govoni  wrote:

> You'll have to query the index for the fields and sift out the _s ones
> and cache them or something.
>
> On Mon, 2012-07-16 at 16:52 +0530, Rajani Maski wrote:
>
> > Yes, This feature will solve the below problem very neatly.
> >
> > All,
> >
> >  Is there any approach to achieve this for now?
> >
> >
> > --Rajani
> >
> > On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky  >wrote:
> >
> > > The answer appears to be "No", but it's good to hear people express an
> > > interest in proposed features.
> > >
> > > -- Jack Krupansky
> > >
> > > -Original Message- From: Rajani Maski
> > > Sent: Sunday, July 15, 2012 12:02 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Facet on all the dynamic fields with *_s feature
> > >
> > >
> > > Hi All,
> > >
> > >   Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic
> field
> > > with facet.field=*_s
> > >
> > >   Link  :  https://issues.apache.org/**jira/browse/SOLR-247<
> https://issues.apache.org/jira/browse/SOLR-247>
> > >
> > >
> > >
> > >  If it is not fixed, any suggestion on how do I achieve this?
> > >
> > >
> > > My requirement is just same as this one :
> > > http://lucene.472066.n3.**nabble.com/Dynamic-facet-**
> > > field-tc2979407.html#none<
> http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none
> >
> > >
> > >
> > > Regards
> > > Rajani
> > >
>
>
>


Re: Grouping performance problem

2012-07-16 Thread Pavel Goncharik
Hi Agnieszka ,

if you don't need number of groups, you can try leaving out
group.ngroups=true param.
In this case Solr apparently skips calculating all groups and delivers
results much faster.
At least for our application the difference in performance
with/without group.ngroups=true is significant (have to say, we use
Solr 3.6).
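
For example (this is just the query from your original mail further below,
with the ngroups param dropped):

/select?q=query&group=true&group.field=id&group.facet=true&facet.field=category1&facet.missing=false&facet.mincount=1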

WBR,
Pavel

On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
 wrote:
> Hi,
>
> Is the any way to make grouping searches more efficient?
>
> My queries look like:
> /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
>
> For index with 3 mln documents query for all docs with group=true takes
> almost 4000ms. Because queryResultCache is not used next queries take a
> long time also.
>
> When I remove group=true and leave only faceting the query for all docs
> takes much more less time: for first time ~ 700ms and next runs only 200ms
> because of queryResultCache being used.
>
> So with group=true the query is about 20 time slower than without it.
> Is it possible or is there any way to improve performance with grouping?
>
> My application needs grouping feature and all of the queries use it but the
> performance of them is to low for production use.
>
> I use Solr 4.x from trunk
>
> Agnieszka Kukalowicz


Re: Query results vs. facets results

2012-07-16 Thread Erick Erickson
Ahhh, you need to look down another few lines. When you specify fq, there
should be a section of the debug output like

  .
  .
  .


where the array is the parsed form of the filter queries. I was thinking about
comparing that with the parsed form of the "q" parameter in the non-filter
case to see what insight one could gain from that.
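
For example, with the tagged fq used further down in this thread, that section
looks roughly like this (sketched from memory, so element names may differ slightly):

  <arr name="filter_queries">
    <str>{!tag=dt}CITY:MILTON</str>
  </arr>
  <arr name="parsed_filter_queries">
    <str>CITY:MILTON</str>
  </arr>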

But there's already one difference, when you use *, you get
 ID:*

Is it possible that you have some documents that do NOT have an ID field?
Try *:* rather than just *. I'm guessing that your default search field is ID
and you have some documents without an ID field. Not a good guess if ID
is your uniqueKey though..

Try q=*:* -ID:* and see if you get 31 docs.

Also note that if you _have_ specified ID as your uniqueKey _but_ you didn't
re-index afterwards (actually, I'd blow away the entire
/data directory
and restart), you may have stale data in there that allowed documents to exist
that do not have uniqueKey fields.

Best
Erick

On Sun, Jul 15, 2012 at 4:49 PM, tudor  wrote:
> Hi Erick,
>
> Thanks for the reply.
>
> The query:
>
> http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true&debugQuery=on
>
> yields this in the debug section:
>
> CITY:MILTON
>   CITY:MILTON
>   CITY:MILTON
>   CITY:MILTON
>   LuceneQParser
>
> There is no information about grouping.
>
> Second query:
>
> http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&facet.missing=true&group.ngroups=true&debugQuery=on
>
> yields this in the debug section:
>
> 
>   *
>   *
>   ID:*
>   ID:*
>   LuceneQParser
>
> To be honest, these do not tell me too much. I would like to see some
> information about the grouping, since I believe this is where I am missing
> something.
>
> In the mean time, I have combined the two queries above, hoping to make some
> sense out of the results. The following query filters all the entries with
> the city name "MILTON" and groups together the ones with the same ID. Also,
> the query facets the entries on city, grouping the ones with the same ID. So
> the results numbers refer to the number of groups.
>
> http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on
>
> yields the same (for me perplexing) results:
>
> 
>   
>   284
>   134
>
> (i.e.: fq says: 134 groups with CITY:MILTON)
> ...
>
> 
>   
>   
>...
>   103
>
> (i.e.: faceted search says: 103 groups with CITY:MILTON)
>
> I really believe that these different results have something to do with the
> grouping that Solr makes, but I do not know how to dig into this.
>
> Thank you and best regards,
> Tudor
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079p3995156.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Index version on slave incrementing to higher than master

2012-07-16 Thread Erick Erickson
Andrew:

I'm not entirely sure that's your problem, but it's the first thing I'd try.

As for your config files, see the section "Replicating solrconfig.xml"
here: http://wiki.apache.org/solr/SolrReplication. That at least
allows you to centralize separate solrconfigs for master and
slave, making promoting a slave to a master a bit easier
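
Roughly, the master-side config from that wiki section looks like this (a
sketch; the file names are whatever you use for your slave-specific config):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml</str>
    </lst>
  </requestHandler>

The slave then pulls solrconfig_slave.xml and installs it as its own
solrconfig.xml, so both roles are maintained in one place.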

Best
Erick

On Sun, Jul 15, 2012 at 2:00 PM, Andrew Davidoff  wrote:
> Erick,
>
> Thank you. I think originally my thought was that if I had my slave
> configuration really close to my master config, it would be very easy to
> promote a slave to a master (and vice versa) if necessary. But I think you
> are correct that ripping out from the slave config anything that would
> modify an index in any way makes sense. I will give this a try very soon.
>
> Thanks again.
> Andy
>
>
> On Sat, Jul 14, 2012 at 5:22 PM, Erick Erickson 
> wrote:
>
>> Gotta admit it's a bit puzzling, and surely you want to move to the 3x
>> versions ..
>>
>> But at a guess, things might be getting confused on the slaves given
>> you have a merge policy on them. There's no reason to have any
>> policies on the slaves; slaves should just be about copying the files
>> from the master, all the policies,commits,optimizes should be done on
>> the master. About all the slave does is copy the current state of the index
>> from the master.
>>
>> So I'd try removing everything but the replication from the slaves,
>> including
>> any autocommit stuff and just let replication do it's thing.
>>
>> And I'd replicate after the optimize if you keep the optimize going. You
>> should
>> end up with one segment in the index after that, on both the master and
>> slave.
>> You can't get any more merged than that.
>>
>> Of course you'll also copy the _entire_ index every time after you've
>> optimized...
>>
>> Best
>> Erick
>>
>> On Fri, Jul 13, 2012 at 12:31 AM, Andrew Davidoff 
>> wrote:
>> > Hi,
>> >
>> > I am running solr 1.4.0+ds1-1ubuntu1. I have a master server that has a
>> > number of solr instances running on it (150 or so), and nightly most of
>> > them have documents written to them. The script that does these writes
>> > (adds) does a commit and an optimize on the indexes when it's entirely
>> > finished updating them, then initiates replication on the slave per
>> > instance. In this configuration, the index versions between master and
>> > slave remain in synch.
>> >
>> > The optimize portion, which, again, happens nightly, is taking a lot of
>> > time and I think it's unnecessary. I was hoping to stop doing this
>> explicit
>> > optimize, and to let my merge policy handle that. However, if I don't do
>> an
>> > optimize, and only do a commit before initiating slave replication, some
>> > hours later the slave is, for reasons that are unclear to me,
>> incrementing
>> > its index version to 1 higher than the master.
>> >
>> > I am not really sure I understand the logs, but it looks like the
>> > incremented index version is the result of an optimize on the slave, but
>> I
>> > am never issuing any commands against the slave aside from initiating
>> > replication, and I don't think there's anything in my solr configuration
>> > that would be initiating this. I do have autoCommit on with maxDocs of
>> > 1000, but since I am initiating slave replication after doing a commit on
>> > the master, I don't think there would ever be any uncommitted documents
>> on
>> > the slave. I do have a merge policy configured, but it's not clear to me
>> > that it has anything to do with this. And if it did, I'd expect to see
>> > similar behavior on the master (right?).
>> >
>> > I have included a snipped from my slave logs that shows this issue. In
>> this
>> > snipped index version 1286065171264 is what the master has,
>> > and 1286065171265 is what the slave increments itself to, which is then
>> out
>> > of synch with the master in terms of version numbers. Nothing that I know
>> > of is issuing any commands to the slave at this time. If I understand
>> these
>> > logs (I might not), it looks like something issued an optimize that took
>> > 1023720ms? Any ideas?
>> >
>> > Thanks in advance.
>> >
>> > Andy
>> >
>> >
>> >
>> > Jul 12, 2012 12:21:14 PM org.apache.solr.update.SolrIndexWriter close
>> > FINE: Closing Writer DirectUpdateHandler2
>> > Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy onCommit
>> > INFO: SolrDeletionPolicy.onCommit: commits:num=2
>> >
>> >
>> commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h8,version=1286065171264,generation=620,filenames=[_h6.fnm,
>> > _h5.nrm, segments_h8, _h4.nrm, _h5.tii, _h4
>> > .tii, _h5.tis, _h4.tis, _h4.fdx, _h5.fnm, _h6.tii, _h4.fdt, _h5.fdt,
>> > _h5.fdx, _h5.frq, _h4.fnm, _h6.frq, _h6.tis, _h4.prx, _h4.frq, _h6.nrm,
>> > _h5.prx, _h6.prx, _h6.fdt, _h6
>> > .fdx]
>> >
>> >
>> commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h9,version=1286065171265,generation=621,filenames=[_h7.tis,
>> > _h7.fdx, _h7.fnm, _h7.fdt, _h7.prx, 

Re: Facet on all the dynamic fields with *_s feature

2012-07-16 Thread Darren Govoni
You'll have to query the index for the fields and sift out the _s ones
and cache them or something.
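
For example (a sketch; the *_s field names below are made up), the Luke
request handler will list the fields:

  http://localhost:8983/solr/admin/luke?numTerms=0&wt=json

and whatever *_s names come back can then be appended to the facet request:

  &facet=true&facet.field=color_s&facet.field=brand_s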

On Mon, 2012-07-16 at 16:52 +0530, Rajani Maski wrote:

> Yes, This feature will solve the below problem very neatly.
> 
> All,
> 
>  Is there any approach to achieve this for now?
> 
> 
> --Rajani
> 
> On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky 
> wrote:
> 
> > The answer appears to be "No", but it's good to hear people express an
> > interest in proposed features.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Rajani Maski
> > Sent: Sunday, July 15, 2012 12:02 AM
> > To: solr-user@lucene.apache.org
> > Subject: Facet on all the dynamic fields with *_s feature
> >
> >
> > Hi All,
> >
> >   Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic field
> > with facet.field=*_s
> >
> >   Link  :  
> > https://issues.apache.org/**jira/browse/SOLR-247
> >
> >
> >
> >  If it is not fixed, any suggestion on how do I achieve this?
> >
> >
> > My requirement is just same as this one :
> > http://lucene.472066.n3.**nabble.com/Dynamic-facet-**
> > field-tc2979407.html#none
> >
> >
> > Regards
> > Rajani
> >




Re: Facet on all the dynamic fields with *_s feature

2012-07-16 Thread Rajani Maski
Yes, This feature will solve the below problem very neatly.

All,

 Is there any approach to achieve this for now?


--Rajani

On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky wrote:

> The answer appears to be "No", but it's good to hear people express an
> interest in proposed features.
>
> -- Jack Krupansky
>
> -Original Message- From: Rajani Maski
> Sent: Sunday, July 15, 2012 12:02 AM
> To: solr-user@lucene.apache.org
> Subject: Facet on all the dynamic fields with *_s feature
>
>
> Hi All,
>
>   Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic field
> with facet.field=*_s
>
>   Link  :  
> https://issues.apache.org/**jira/browse/SOLR-247
>
>
>
>  If it is not fixed, any suggestion on how do I achieve this?
>
>
> My requirement is just same as this one :
> http://lucene.472066.n3.**nabble.com/Dynamic-facet-**
> field-tc2979407.html#none
>
>
> Regards
> Rajani
>


Re: DIH - incorrect datasource being picked up by XPathEntityProcessor

2012-07-16 Thread girishyes

Okay... found the problem after some more debugging. I was using a wrong
datasource tag in the data-config.xml; maybe Solr should validate the XML
against a schema so these kinds of issues are caught upfront.

wrong: <data*s*ource name="fieldSource" type="FieldReaderDataSource" />
correct: <data*S*ource name="fieldSource" type="FieldReaderDataSource" />

this resolved the issue.
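
For anyone hitting the same thing, a minimal sketch of the working layout
(table, column and xpath names below are made up):

  <dataConfig>
    <dataSource name="db" type="JdbcDataSource" driver="..." url="..." />
    <dataSource name="fieldSource" type="FieldReaderDataSource" />
    <document>
      <entity name="parent" dataSource="db" query="select id, xml_col from some_table">
        <entity name="detail" dataSource="fieldSource" processor="XPathEntityProcessor"
                dataField="parent.xml_col" forEach="/record">
          <field column="title" xpath="/record/title" />
        </entity>
      </entity>
    </document>
  </dataConfig>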

Thanks.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-incorrect-datasource-being-picked-up-by-XPathEntityProcessor-tp3994802p3995246.html
Sent from the Solr - User mailing list archive at Nabble.com.


Grouping performance problem

2012-07-16 Thread Agnieszka Kukałowicz
Hi,

Is there any way to make grouping searches more efficient?

My queries look like:
/select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

For index with 3 mln documents query for all docs with group=true takes
almost 4000ms. Because queryResultCache is not used next queries take a
long time also.

When I remove group=true and leave only faceting, the query for all docs
takes much less time: ~700ms the first time and only 200ms on subsequent runs
because of queryResultCache being used.

So with group=true the query is about 20 times slower than without it.
Is there any way to improve performance with grouping?

My application needs the grouping feature and all of the queries use it, but
their performance is too low for production use.

I use Solr 4.x from trunk

Agnieszka Kukalowicz


Re: are stopwords indexed?

2012-07-16 Thread Giovanni Gherdovich
Hi all, thank you for your replies.

Lance:
> Look at the index with the Schema Browser in the Solr UI. This pulls
> the terms for each field.

I did it, and it was the first alarm I got.
After the indexing, I went to the schema browser hoping
not to see any stopwords in the top terms, but...
they were all there.

Michael:
> Hi Giovanni,
>
> you have entered the stopwords into stopword.txt file, right? But in the
> definition of the field type you are referencing stopwords_FR.txt..

Good catch Michael, but that's not the problem.

In my message I referred to "stopwords.txt", but actually my
stopwords file is named stopwords_FR.txt, consistent with
what I put in my schema.xml.

By the way, your answers make me think that yes,
I have a problem: stopwords should not appear in the index.

What a weird situation:

* querying SOLR for a stopword (say "and") gives me zero results
  (so, somewhere in the indexing / searching pipeline my stopwords
  file *is* taken into account; see the quick check below)
* checking the index files with LuCLI for the same stopword gives me
  tons of hits.
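
(The quick check: adding debugQuery shows the query-side stop filter at work,
the stopword simply disappears from the parsed query. Field name as in my
earlier mail; the URL is otherwise a generic sketch:

  http://localhost:8983/solr/select?q=myFieldName:and&debugQuery=on&rows=0 )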

cheers,
GGhh


Re: are stopwords indexed?

2012-07-16 Thread Michael Belenki
Hi Giovanni,

you have entered the stopwords into stopword.txt file, right? But in the
definition of the field type you are referencing stopwords_FR.txt..

best regards,

Michael
On Mon, 16 Jul 2012 05:38:04 +0200, Giovanni Gherdovich
 wrote:
> Hi all,
> 
> are stopwords from the stopwords.txt config file
> supposed to be indexed?
> 
> I would say no, but this is the situation I am
> observing on my Solr instance:
> 
> * I have a bunch of stopwords in stopwords.txt
> * my fields are of fieldType "text" from the example schema.xml,
>   i.e. I have
> 
> -- -- >8 -- -- >8 -- -- >8 -- -- >8
>   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="index">
>       [...]
>       <filter class="solr.StopFilterFactory"
>               ignoreCase="true"
>               words="stopwords_FR.txt"
>               enablePositionIncrements="true"
>               />
>       [...]
>     </analyzer>
>     <analyzer type="query">
>       [...]
>       <filter class="solr.StopFilterFactory"
>               ignoreCase="true"
>               words="stopwords_FR.txt"
>               enablePositionIncrements="true"
>               />
>     </analyzer>
>   </fieldType>
>
> -- -- >8 -- -- >8 -- -- >8 -- -- >8
> 
> * searching for a stopwords thru solr gives always zero results
> * inspecting the index with LuCLI
> http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html
>   show that all stopwords are in my index. Note that I query
>   LuCLI specifying the field, i.e. with "myFieldName:and"
>   and not just with the stopword "and".
> 
> Is this normal?
> 
> Are stopwords indexed?
> 
> Cheers,
> Giovanni


Re: DIH include Fieldset in query

2012-07-16 Thread stockii
"So you want to re-use same SQL sentence in many entities? "
Yes

is it necessary to deploy complete solr and lucene for this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-include-Fieldset-in-query-tp3994798p3995228.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Computed fields - can I put a function in fl?

2012-07-16 Thread Yonik Seeley
On Mon, Jul 16, 2012 at 4:43 AM, maurizio1976
 wrote:
> Yes,
> sorry Just a typo.
> I meant  
> q=*:*&fq=&start=0&rows=10&qt=&wt=&explainOther=&fl=product:(if(show_product:true,
> product, "")
> thanks

Functions normally derive their values from the fieldCache... there
isn't currently a function to load stored fields (e.g. your "product"
field), but it's not a bad idea (given this usecase).

Here's an example with the exampledocs that shows IN_STOCK_PRICE only
if the item is in stock, and otherwise shows 0.
This works because price is a single-valued indexed field that the
fieldCache works on.

http://localhost:8983/solr/query?
  q=*:*
&fl=id, inStock, IN_STOCK_PRICE:if(inStock,price,0)

-Yonik
http://lucidimagination.com


Re: Computed fields - can I put a function in fl?

2012-07-16 Thread maurizio1976
Yes,
sorry, just a typo.
I meant  
q=*:*&fq=&start=0&rows=10&qt=&wt=&explainOther=&fl=product:(if(show_product:true,
product, "")
thanks


On Sat, Jul 14, 2012 at 11:27 PM, Erick Erickson [via Lucene]
 wrote:
> I think in 4.0 you can, but not 3.x as I remember. Your example has
> the fl as part
> of the highlight though, is that a typo?
>
> Best
> Erick
>
> On Fri, Jul 13, 2012 at 5:21 AM, maurizio1976
> <[hidden email]> wrote:
>
>> Hi,
>> I have 2 fields, one containing a string (product) and another containing
>> a
>> boolean (show_product).
>>
>> Is there a way of returning the product field with a value of null when
>> the
>> show_product field is false?
>>
>> I can make another field (product_computed) and index that with null where
>> I
>> need but I would like to understand if there is a better approach like
>> putting a function query in the fl and make a computed field.
>>
>> something like:
>>
>> q=*:*&fq=&start=0&rows=10&fl=&qt=&wt=&explainOther=&hl.fl=/*product:(if(show_product:true,
>> product, "")*/
>>
>> that obviously doesn't work.
>>
>> thanks for any help
>>
>> Maurizio
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Computed-fields-can-I-put-a-function-in-fl-tp3994799.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Computed-fields-can-I-put-a-function-in-fl-tp3994799p3995218.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: 4.0-ALPHA for general development use?

2012-07-16 Thread John Field
OK: that is helpful, thanks!

On 13 July 2012 15:44, Mark Miller  wrote:

> It really comes down to you.
>
> Many people run a trunk version of Solr in production. Some never would.
> Generally, bugs are fixed quickly, and trunk is pretty stable. The main
> issue is index format changes and upgrades. If you use trunk you generally
> have to be willing to reindex to upgrade. That's one nice thing about this
> Alpha - we are saying that unless there is a really bad bug, you will be
> able to upgrade to future versions without reindexing.
>
> Most of the code itself has been in dev and use for years - so it's not so
> risky in my opinion. It's almost more about Java APIs and what not than
> code stability when we say Alpha.
>
> In fact, just read this
> http://www.lucidimagination.com/blog/2012/07/03/4-0-alpha-whats-in-a-name/
>
> That should help clarify what this release is.
>
> On Fri, Jul 13, 2012 at 6:51 AM, John Field 
> wrote:
>
> > Hi, we are considering a long-term project (likely lifecycle of
> > several years) with an initial production release in approximately
> > three months.
> >
> > We're intending to use Solr 3.6.0, with a view for upgrading to 4.0
> > upon stable release.
> >
> > However, http://lucene.apache.org/solr/ now has 4.0-ALPHA as the main
> > download, implying this version is for general use.
> >
> > But on the other hand, the release notes state "This is an alpha
> > release for early adopters." and http://wiki.apache.org/solr/Solr4.0
> > gives a timescale of 60 days minimum before final release.
> >
> > We'd like to use 4.0 features such as near real-time updates, but
> > haven't identified these as must-haves for the initial release.
> >
> > Given that our first production release is likely to occur a month
> > after that 60 days, is 4.0-ALPHA suitable for general product
> > development, or is it recommended to stick with 3.6.0 and accept an
> > upgrade cost when 4.0 is
> > stable?
> >
> > (Perhaps this hinges on understanding why 4.0-ALPHA is now the main
> > download option).
> >
> > Thanks.
> >
>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>



-- 

John Field, Software Architect
http://www.alexanderstreet.com - Alexander Street Press, world-leading
digital humanities publisher.


Re: Solr - Spatial Search for Specif Areas on Map

2012-07-16 Thread samabhiK
David,

Thanks for such a detailed response. The data volume I mentioned is the
total set of records we have - but we would never ever need to search the
entire base in one query; we would divide the data by region or zip code.
So, in that case I assume that for a single region, we would not have more
than 200M records (this is real, we have a region with that many records).

So, I can assume that I can create shards based on regions and the requests
would get distributed among these region servers, right? You also mentioned
about ~20 concurrent queries per shard - do you have links to some
benchmarks? I am very interested to know about the hardware sizing details
for such a setup.

About setting up Solr for a single shard, I think I will go by your advice.
I'll see how much a single shard can handle on a decent machine :)

The reason why I came up with that figure is that I have a user base of 500k
and there's a lot of activity that would happen on the map - every time
someone moves the tiles, zooms in/out, or scrolls, we are going to send a
server-side request to fetch some data (I agree we can benefit a lot from
caching, but I believe Solr itself has its own local cache). I might be a bit
unrealistic with my 10K rps projections, but I have read about 9K rps to map
servers from some sources on the internet.

And, NO, I don't work for Google :) But who knows, we might be building
something that can get us that much traffic in a while. :D

BTW, my question still remains - can we search on polygonal areas on the
map? If so, do you have any link where I can get more details? The bounding
box approach won't work for me, I guess :(

Sam


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995209.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr facet multiple constraint

2012-07-16 Thread davidbougearel
OK, I've added the debug; here is the query from the response after executing
the query:

facet=true,sort=publishingdate
desc,debugQuery=true,facet.mincount=1,q=service:1 AND
publicationstatus:LIVE,facet.field=pillar,wt=javabin,fq=(((pillar:10))),version=2}},response={numFound=2,start=0,docs=[SolrDocument[{uniquenumber=UniqueNumber1,
name=Doc 1, publicationstatus=LIVE, service=1, servicename=service_1,
pillar=[10], region=EU, regionname=Europe, documenttype=TRACKER,
publishingdate=Sun Jul 15 09:03:32 CEST 2012, publishingyear=2012,
teasersummary=Seo_Description, content=answer, creator=chandan, version=1,
documentinstanceid=1}], SolrDocument[{uniquenumber=UniqueNumber2, name=Doc
2, publicationstatus=LIVE, service=1, servicename=service_1, pillar=[10],
region=EU, regionname=Europe, documenttype=TRACKER, publishingdate=Sat Jul
14 09:03:32 CEST 2012, publishingyear=2012, teasersummary=Seo_Description,
content=answer, creator=chandan, version=1,
documentinstanceid=1}]]},facet_counts={facet_queries={},facet_fields={pillar={10=2}},facet_dates={},facet_ranges={}},debug={rawquerystring=service:1
AND publicationstatus:LIVE,querystring=service:1 AND
publicationstatus:LIVE,parsedquery=+service:1
+publicationstatus:LIVE,parsedquery_toString=+service:1
+publicationstatus:LIVE,explain={UniqueNumber1=
1.2917422 = (MATCH) sum of:
  0.7741482 = (MATCH) weight(service:1 in 0), product of:
0.7741482 = queryWeight(service:1), product of:
  1.0 = idf(docFreq=4, maxDocs=5)
  0.7741482 = queryNorm
1.0 = (MATCH) fieldWeight(service:1 in 0), product of:
  1.0 = tf(termFreq(service:1)=1)
  1.0 = idf(docFreq=4, maxDocs=5)
  1.0 = fieldNorm(field=service, doc=0)
  0.517594 = (MATCH) weight(publicationstatus:LIVE in 0), product of:
0.6330043 = queryWeight(publicationstatus:LIVE), product of:
  0.81767845 = idf(docFreq=5, maxDocs=5)
  0.7741482 = queryNorm
0.81767845 = (MATCH) fieldWeight(publicationstatus:LIVE in 0), product
of:
  1.0 = tf(termFreq(publicationstatus:LIVE)=1)
  0.81767845 = idf(docFreq=5, maxDocs=5)
  1.0 = fieldNorm(field=publicationstatus, doc=0)
,UniqueNumber2=
1.2917422 = (MATCH) sum of:
  0.7741482 = (MATCH) weight(service:1 in 0), product of:
0.7741482 = queryWeight(service:1), product of:
  1.0 = idf(docFreq=4, maxDocs=5)
  0.7741482 = queryNorm
1.0 = (MATCH) fieldWeight(service:1 in 0), product of:
  1.0 = tf(termFreq(service:1)=1)
  1.0 = idf(docFreq=4, maxDocs=5)
  1.0 = fieldNorm(field=service, doc=0)
  0.517594 = (MATCH) weight(publicationstatus:LIVE in 0), product of:
0.6330043 = queryWeight(publicationstatus:LIVE), product of:
  0.81767845 = idf(docFreq=5, maxDocs=5)
  0.7741482 = queryNorm
0.81767845 = (MATCH) fieldWeight(publicationstatus:LIVE in 0), product
of:
  1.0 = tf(termFreq(publicationstatus:LIVE)=1)
  0.81767845 = idf(docFreq=5, maxDocs=5)
  1.0 = fieldNorm(field=publicationstatus, doc=0)
},QParser=LuceneQParser,filter_queries=[(((pillar:10)))

As you can see, in this request I'm talking about pillar, not about user.

Thanks for all, David.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-facet-multiple-constraint-tp3992974p3995215.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: are stopwords indexed?

2012-07-16 Thread Lance Norskog
Look at the index with the Schema Browser in the Solr UI. This pulls
the terms for each field.

On Sun, Jul 15, 2012 at 8:38 PM, Giovanni Gherdovich
 wrote:
> Hi all,
>
> are stopwords from the stopwords.txt config file
> supposed to be indexed?
>
> I would say no, but this is the situation I am
> observing on my Solr instance:
>
> * I have a bunch of stopwords in stopwords.txt
> * my fields are of fieldType "text" from the example schema.xml,
>   i.e. I have
>
> -- -- >8 -- -- >8 -- -- >8 -- -- >8
>   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="index">
>       [...]
>       <filter class="solr.StopFilterFactory"
>               ignoreCase="true"
>               words="stopwords_FR.txt"
>               enablePositionIncrements="true"
>               />
>       [...]
>     </analyzer>
>     <analyzer type="query">
>       [...]
>       <filter class="solr.StopFilterFactory"
>               ignoreCase="true"
>               words="stopwords_FR.txt"
>               enablePositionIncrements="true"
>               />
>     </analyzer>
>   </fieldType>
>
> -- -- >8 -- -- >8 -- -- >8 -- -- >8
>
> * searching for a stopwords thru solr gives always zero results
> * inspecting the index with LuCLI
> http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html
>   show that all stopwords are in my index. Note that I query
>   LuCLI specifying the field, i.e. with "myFieldName:and"
>   and not just with the stopword "and".
>
> Is this normal?
>
> Are stopwords indexed?
>
> Cheers,
> Giovanni



-- 
Lance Norskog
goks...@gmail.com