Re: 3.5 QueryResponseWriter

2011-12-29 Thread Chris Hostetter

: Looks like you've experienced the issue described with fixes here: 
: 

but specifically, since you've already copied the jar file in question, 
and are now getting a class-not-found error for the *baseclass*, it suggests 
you have a different problem

: > What I have done this far is basicly just to copy the /example/solr 
: folder, install the webapp .war file in a tomcat instance and start up. 
: > > At first I complained about the VelocityResponseWriter, so i created 
: a /lib folder in /$SOLR_HOME and added the velocity jar from dist. That 
: seemed to take care of the VRW error. > > But now I get an 
: "NoClassDefFoundError" wich sais something about QueryResponseWriter. So 

...that suggests that it is loading VRW at a higher (or lower, depending on 
how you look at it) classloader than where it loads the rest of the Solr 
jars.

if you are using the example Solr setup, then it sounds like you copied 
the jar to "example/lib" (which is where the jetty jars live) instead of 
"example/solr/lib" (which would be a new lib folder in the $SOLR_HOME dir).

unfortunately, people frequently get these confused, which is one of the 
reasons I have started encouraging people to just use the <lib/> 
declarations in their solrconfig.xml file instead of making a single "lib" 
dir in $SOLR_HOME.  (but either way, you'll need to remove the copy of the 
VRW jar you've got loading in the system classpath before either approach 
will work)
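
e.g. something like this in solrconfig.xml (the dir/regex values are just 
examples, adjust them to wherever your jars actually live):

    <!-- load the velocity contrib jar(s) relative to the instance dir -->
    <lib dir="../../contrib/velocity/lib" regex=".*\.jar" />
    <lib dir="../../dist/" regex="apache-solr-velocity-\d.*\.jar" />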



-Hoss


Re: strange performance issue with many shards on one server

2011-12-29 Thread Ken Krugler
Hi Frederik,

Did you figure out a solution to this problem?

I'm asking because I recently ran into a similar problem, with a similar setup 
(8 shards on one server).

Occasionally a query will take a very long time, and sometimes I see timeout 
exceptions with the HTTP requests. E.g.

> 348914 [pool-19-thread-14] INFO 
> org.apache.commons.httpclient.HttpMethodDirector - I/O exception 
> (org.apache.commons.httpclient.NoHttpResponseException) caught when 
> processing request: The
>  server localhost failed to respond
> 348915 [pool-19-thread-14] INFO 
> org.apache.commons.httpclient.HttpMethodDirector - Retrying request


Restarting Jetty seems to clear up the problem temporarily.

I've been looking at the code in Solr that handles distributed requests - and 
it's got some interesting smells, so I wouldn't be surprised if there's an 
issue related to how it's using HttpClient.
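
If it's contention on HttpClient's connection pool (which is what the 
BLOCKED threads below suggest), one thing I'd experiment with is widening 
the pool limits. A rough, untested sketch of the commons-httpclient knobs 
(the limits are example values; Solr's internal distributed-search pool is 
created inside Solr, so this applies directly only where you construct the 
client yourself, e.g. in SolrJ code):

    import java.net.MalformedURLException;
    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    // widen the pool so fan-out requests don't block waiting
    // for a free connection (32/128 are example values, not advice)
    static CommonsHttpSolrServer makeServer(String url) throws MalformedURLException {
        MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
        mgr.getParams().setDefaultMaxConnectionsPerHost(32);
        mgr.getParams().setMaxTotalConnections(128);
        return new CommonsHttpSolrServer(url, new HttpClient(mgr));
    }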

Regards,

-- Ken


On Sep 28, 2011, at 5:21am, Frederik Kraus wrote:

> I just had a look at the thread-dump, pasting 3 examples here:
> 
> 
> 'pool-31-thread-8233' Id=11626, BLOCKED on 
> lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9,
>  total cpu time=20.ms user time=20.ms
> at 
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
>  
> at 
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
>  
> at 
> org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
>  
> at 
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
>  
> at 
> org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
>  
> at 
> org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
>  
> at 
> org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
>  
> at 
> org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
>  
> at 
> org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
>  
> at 
> org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
>  
> at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
>  
> at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>  
> at 
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
>  
> at 
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
>  
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>  
> at java.lang.Thread.run(Thread.java:662) 
> 
> 'pool-31-thread-8232' Id=11625, BLOCKED on 
> lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9,
>  total cpu time=20.ms user time=20.ms
> at 
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
>  
> at 
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
>  
> at 
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
>  
> at 
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) 
> at 
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) 
> at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
>  
> at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>  
> at 
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
>  
> at 
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
>  
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>  
> a

RE: Solr, SQL Server's LIKE

2011-12-29 Thread Chris Hostetter

: Thanks. I know I'll be able to utilize some of Solr's free text 
: searching capabilities in other search types in this project. The 
: product manager wants this particular search to exactly mimic LIKE%.
...
: Ex: If I search "Albatross" I want "Albert" to be excluded completely, 
: rather than having a low score.

please be specific about the types of queries you want. ie: we need more 
than one example of the type of input you want to provide, the type of 
matches you want to see for that input, and the type of matches you do 
*not* want to get back.

in your first message you said you need to match company titles "pretty 
exactly", but then seem to contradict yourself by saying SQL's LIKE 
command fits the bill -- even though the SQL LIKE command exists 
specifically for inexact matches on field values.

Based on your one example above of Albatross, you don't need anything 
special: don't use ngrams, don't use stemming, don't use fuzzy anything -- 
just search for "Albatross" and it will match "Albatross" but not 
"Albert".  If you want "Albatross" to match "Albatross Road", use some 
basic tokenization.

If all you really care about is prefix searching (which seems suggested by 
your "LIKE%" comment above, which I'm guessing is shorthand for something 
similar to "LIKE 'ABC%'"), so that queries like "abc" and "abcd" both 
match "abcdef" and "abcd" but neither of them matches "xabcd", 
then just use prefix queries (ie: "abcd*") -- they should be plenty 
efficient for your purposes.  you only need to worry about ngrams when you 
want to efficiently match in the middle of a string. (ie: "TITLE LIKE 
'%ABC%'")


-Hoss


Re: Solr, SQL Server's LIKE

2011-12-29 Thread Sujit Pal
Hi Devon,

Have you considered using a permuterm index? It's workable, but depending
on your requirements (the size of the fields you want to build the index
on), it may bloat your index. I've written about it here:
http://sujitpal.blogspot.com/2011/10/lucene-wildcard-query-and-permuterm.html 
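
The gist of the rotation trick, as a quick sketch ('$' is the usual
end-of-word marker; the analyzer plumbing is omitted):

    import java.util.ArrayList;
    import java.util.List;

    // Index every rotation of term + "$". A wildcard query like *bar can
    // then be rewritten as the prefix query "bar$*" over the rotated terms.
    static List<String> permuterms(String term) {
        String t = term + "$";
        List<String> rotations = new ArrayList<String>();
        for (int i = 0; i < t.length(); i++) {
            rotations.add(t.substring(i) + t.substring(0, i));
        }
        return rotations;
    }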

Another alternative, which I've implemented, is a custom mechanism that
retrieves a list of matching unique ids from a database table using a
SQL LIKE, then passes this list as a filter to the main query. It's
hacky, but I was building a custom handler anyway, so it was quite
simple to add in.

-sujit

On Thu, 2011-12-29 at 11:38 -0600, Devon Baumgarten wrote:
> I have been tinkering with Solr for a few weeks, and I am convinced that it 
> could be very helpful in many of my upcoming projects. I am trying to decide 
> whether Solr is appropriate for this one, and I haven't had luck looking for 
> answers on Google.
> 
> I need to search a list of names of companies and individuals pretty exactly. 
> T-SQL's LIKE operator does this with decent performance, but I have a feeling 
> there is a way to configure Solr to do this better. I've tried using an edge 
> N-gram tokenizer, but it feels like it might be more complicated than 
> necessary. What would you suggest?
> 
> I know this sounds kind of 'Golden Hammer,' but there has been talk of other, 
> more complicated (magic) searches that I don't think SQL Server can handle, 
> since its tokens (as far as I know) can't be smaller than one word.
> 
> Thanks,
> 
> Devon Baumgarten
> 



Re: Enabling realtime search in Solr 4.0

2011-12-29 Thread Mark Miller
On Thu, Dec 29, 2011 at 2:35 PM, Avner Levy  wrote:
>
> I've read in the Solr-RA documentation that if you add
> <realtime>true</realtime> you can add documents and search for them without
> any commit at all (and I assumed it is functionality of Solr).
>

In Solr 4 (trunk) you can either set up the soft auto commit for every
second (or you can try less) or, as Yonik mentioned, you can also make a
commit command soft. In both cases, a new IndexReader is obtained from the
IndexWriter.
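
i.e. something like this in solrconfig.xml (1000 ms is just an example value):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoSoftCommit>
        <maxTime>1000</maxTime> <!-- soft commit (reopen the searcher) every second -->
      </autoSoftCommit>
    </updateHandler>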


> So I guess there isn't a way to get such functionality in Solr 4.0, right?
> I think this relates to the ability to open readers from the writer if I
> understood it correctly?
>

No, it's there.


-Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Thursday, December 29, 2011 5:16 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Enabling realtime search in Solr 4.0
>
>
> On Dec 29, 2011, at 3:39 AM, Avner Levy wrote:
>
> > Hi,
> > I'm trying to enable realtime search in Solr 4.0 (So I can see new
> documents without committing).
> > I've added:
> > <realtime>true</realtime>
> > <updateLog class="solr.FSUpdateLog">
> >   <str name="dir">${solr.data.dir:}</str>
> > </updateLog>
> >
> > But documents aren't seen before commit (or softCommit).
> > Any help will be appreciated.
> > Thanks,
> > Avner
>
>
> This is how you enable soft auto commit in trunk:
> http://wiki.apache.org/solr/SolrConfigXml?#Update_Handler_Section
>
> You do not need the update log for it - that is for realtime GET (where
> you would also need to set that up in a Request Handler).
>
> Sounds like you are conflating the two.
>
> - Mark Miller
> lucidimagination.com



-- 
- Mark

http://www.lucidimagination.com


RE: Solr, SQL Server's LIKE

2011-12-29 Thread Devon Baumgarten
Erick,

Thanks. I know I'll be able to utilize some of Solr's free text searching 
capabilities in other search types in this project. The product manager wants 
this particular search to exactly mimic LIKE%.

N-Grams get me pretty great results in general, but I don't want the results 
for this particular search to be fuzzy. How can I prevent the fuzzy matches 
from appearing?

Ex: If I search "Albatross" I want "Albert" to be excluded completely, rather 
than having a low score.

Devon Baumgarten


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, December 29, 2011 3:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr, SQL Server's LIKE

SQLs "like" is usually handled with ngrams if you want
*stuff* kinds of searches. Wildcards are "interesting"
in Solr.

Things Solr handles that aren't easy in SQL:
phrases, phrases with slop, stemming,
synonyms. And, especially, some kind
of relevance ranking.

But Solr does NOT do the things SQL is best at,
things like joins etc. Each has its sweet spot,
and trying to make one do all the functions of the
other is fraught with places to go wrong.

Not a lot of help, but free text searching is what Solr is
all about, so if your problem maps into that space,
it's a great tool!

Best
Erick

On Thu, Dec 29, 2011 at 1:06 PM, Shashi Kant  wrote:
> for a simple, hackish (albeit inefficient) approach, look up wildcard searches,
>
> e.g. foo*, *bar
>
>
>
> On Thu, Dec 29, 2011 at 12:38 PM, Devon Baumgarten
>  wrote:
>> I have been tinkering with Solr for a few weeks, and I am convinced that it 
>> could be very helpful in many of my upcoming projects. I am trying to decide 
>> whether Solr is appropriate for this one, and I haven't had luck looking for 
>> answers on Google.
>>
>> I need to search a list of names of companies and individuals pretty 
>> exactly. T-SQL's LIKE operator does this with decent performance, but I have 
>> a feeling there is a way to configure Solr to do this better. I've tried 
>> using an edge N-gram tokenizer, but it feels like it might be more 
>> complicated than necessary. What would you suggest?
>>
>> I know this sounds kind of 'Golden Hammer,' but there has been talk of 
>> other, more complicated (magic) searches that I don't think SQL Server can 
>> handle, since its tokens (as far as I know) can't be smaller than one word.
>>
>> Thanks,
>>
>> Devon Baumgarten
>>


RE: NoClassDefFoundError: org/apache/solr/common/params/SolrParams

2011-12-29 Thread Dyer, James
The SolrParams class is in the solrj jar file, so you should verify that this is 
in the classpath.  Also see if it is listed in the manifest.mf file in the 
war's META-INF dir.  If you're running this on a server within Eclipse and 
letting Eclipse do the deploy, my experience is that it can be frustrating at 
times to get Eclipse to handle the dependencies right.  In this case look at the 
"Java EE Module Dependencies" screen in Eclipse.  I often resort to hand-editing 
the "org.eclipse.wst.common.component" file in the project's ".settings" directory.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Bruno Adam Osiek [mailto:baos...@gmail.com] 
Sent: Thursday, December 29, 2011 4:17 PM
To: solr-user@lucene.apache.org
Subject: NoClassDefFoundError: org/apache/solr/common/params/SolrParams

Hi,

I'm trying to deploy a Solrj based application into JBoss AS 7 using 
Eclipse Indigo. When deploying it I get the following error message:



ERROR [org.jboss.msc.service.fail] (MSC service thread 1-4) MSC1: 
Failed to start service 
jboss.deployment.unit."SolrIntegration.war".POST_MODULE: 
org.jboss.msc.service.StartException in service 
jboss.deployment.unit."SolrIntegration.war".POST_MODULE: Failed to 
process phase POST_MODULE of deployment "SolrIntegration.war"
 at 
org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:121)
 at 
org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1824)
 at 
org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1759)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) 
[:1.7.0_02]
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) 
[:1.7.0_02]
 at java.lang.Thread.run(Thread.java:722) [:1.7.0_02]
*Caused by: java.lang.NoClassDefFoundError: 
org/apache/solr/common/params/SolrParams*
 at java.lang.Class.getDeclaredConstructors0(Native Method) [:1.7.0_02]
 at java.lang.Class.privateGetDeclaredConstructors(Class.java:2404) 
[:1.7.0_02]
 at java.lang.Class.getConstructor0(Class.java:2714) [:1.7.0_02]
 at java.lang.Class.getConstructor(Class.java:1674) [:1.7.0_02]
 at 
org.jboss.as.web.deployment.jsf.JsfManagedBeanProcessor.deploy(JsfManagedBeanProcessor.java:105)
 at 
org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:115)
 ... 5 more
*Caused by: java.lang.ClassNotFoundException: 
org.apache.solr.common.params.SolrParams* from [Module 
"deployment.SolrIntegration.war:main" from Service Module Loader]
 at 
org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:191)
 at 
org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:361)
 at 
org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:333)
 at 
org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:310)
 at 
org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:103)
 ... 11 more
===

I have searched with no success for a solution.

I've managed to deploy successfully *solr.war* into JBoss.

Any help will be welcomed.

Regards.


NoClassDefFoundError: org/apache/solr/common/params/SolrParams

2011-12-29 Thread Bruno Adam Osiek

Hi,

I'm trying to deploy a Solrj based application into JBoss AS 7 using 
Eclipse Indigo. When deploying it I get the following error message:




ERROR [org.jboss.msc.service.fail] (MSC service thread 1-4) MSC1: 
Failed to start service 
jboss.deployment.unit."SolrIntegration.war".POST_MODULE: 
org.jboss.msc.service.StartException in service 
jboss.deployment.unit."SolrIntegration.war".POST_MODULE: Failed to 
process phase POST_MODULE of deployment "SolrIntegration.war"
at 
org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:121)
at 
org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1824)
at 
org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1759)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) 
[:1.7.0_02]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) 
[:1.7.0_02]

at java.lang.Thread.run(Thread.java:722) [:1.7.0_02]
*Caused by: java.lang.NoClassDefFoundError: 
org/apache/solr/common/params/SolrParams*

at java.lang.Class.getDeclaredConstructors0(Native Method) [:1.7.0_02]
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2404) 
[:1.7.0_02]

at java.lang.Class.getConstructor0(Class.java:2714) [:1.7.0_02]
at java.lang.Class.getConstructor(Class.java:1674) [:1.7.0_02]
at 
org.jboss.as.web.deployment.jsf.JsfManagedBeanProcessor.deploy(JsfManagedBeanProcessor.java:105)
at 
org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:115)

... 5 more
*Caused by: java.lang.ClassNotFoundException: 
org.apache.solr.common.params.SolrParams* from [Module 
"deployment.SolrIntegration.war:main" from Service Module Loader]
at 
org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:191)
at 
org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:361)
at 
org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:333)
at 
org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:310)
at 
org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:103)

... 11 more
===

I have searched with no success for a solution.

I've managed to deploy successfully *solr.war* into JBoss.

Any help will be welcomed.

Regards.


Re: Frequent Indexing of same Documents

2011-12-29 Thread Erick Erickson
See below

1) Is there a significant difference in performance between a freshly
created core ("the first time to index") and an "old" core (every
document already exists in the core)?

Not really. Documents are indexed in "segments", and a fresh one is
usually opened after every commit (you only commit after some time,
it's not once per document).

2) when updating a document, is it updated in-place, or is the old
copy (according to primary key) marked as deleted and a new document
is inserted?

The old copy is marked as deleted and a complete new document is added
to the index. The deleted copy of the document is gradually purged
over time as segments are merged. All of them can be purged by an
optimize, but this step is often unnecessary.

3) will indexing the same documents over and over again increase
the size of the index? (assuming the documents did not change much)

Yes. Since it's a delete (mark-as-deleted) followed by an add, the index will
get bigger. As above, though, the space is reclaimed as time passes.


4) sorry for the dumb questions. My boss is making me ask them :-D .

A decent question is "why does your boss want to know?" That is, what
is the higher-level question that's causing the worry? Re-indexing the
same document causes the index to grow. You don't care if you're
indexing a measly 100,000 documents. You might care if you're indexing
100,000,000 docs.

Best
Erick
On Thu, Dec 29, 2011 at 2:01 PM, Gora Mohanty  wrote:
> On Fri, Dec 30, 2011 at 12:25 AM, Avni, Itamar  wrote:
> [...]
>> This electronic message may contain proprietary and confidential information 
>> of Verint Systems Inc., its affiliates and/or subsidiaries.
>> The information is intended to be for the use of the individual(s) or
>> entity(ies) named above.  If you are not the intended recipient (or 
>> authorized to receive this e-mail for the intended recipient), you may not 
>> use, copy, disclose or distribute to anyone this message or any information 
>> contained in this message.  If you have received this electronic message in 
>> error, please notify us by replying to this e-mail.
>>
>
> I would reply, but am afraid to, as I am not in the list of
> individual(s) nor entity(ies) [sic!] named above.
>
> Regards,
> Gora


Re: Solr, SQL Server's LIKE

2011-12-29 Thread Erick Erickson
SQLs "like" is usually handled with ngrams if you want
*stuff* kinds of searches. Wildcards are "interesting"
in Solr.

Things Solr handles that aren't easy in SQL:
phrases, phrases with slop, stemming,
synonyms. And, especially, some kind
of relevance ranking.

But Solr does NOT do the things SQL is best at,
things like joins etc. Each has its sweet spot,
and trying to make one do all the functions of the
other is fraught with places to go wrong.

Not a lot of help, but free text searching is what Solr is
all about, so if your problem maps into that space,
it's a great tool!

Best
Erick

On Thu, Dec 29, 2011 at 1:06 PM, Shashi Kant  wrote:
> for a simple, hackish (albeit inefficient) approach, look up wildcard searches,
>
> e.g. foo*, *bar
>
>
>
> On Thu, Dec 29, 2011 at 12:38 PM, Devon Baumgarten
>  wrote:
>> I have been tinkering with Solr for a few weeks, and I am convinced that it 
>> could be very helpful in many of my upcoming projects. I am trying to decide 
>> whether Solr is appropriate for this one, and I haven't had luck looking for 
>> answers on Google.
>>
>> I need to search a list of names of companies and individuals pretty 
>> exactly. T-SQL's LIKE operator does this with decent performance, but I have 
>> a feeling there is a way to configure Solr to do this better. I've tried 
>> using an edge N-gram tokenizer, but it feels like it might be more 
>> complicated than necessary. What would you suggest?
>>
>> I know this sounds kind of 'Golden Hammer,' but there has been talk of 
>> other, more complicated (magic) searches that I don't think SQL Server can 
>> handle, since its tokens (as far as I know) can't be smaller than one word.
>>
>> Thanks,
>>
>> Devon Baumgarten
>>


Re: How can I check if a more complex query condition matched?

2011-12-29 Thread Chris Hostetter

: I have a more complex query condition like this:
: 
: (city:15 AND country:60)^4 OR city:15^2 OR country:60^2
: 
: What I want to achive with this query is basically if a document has
: city = 15 AND country = 60 it is more important then another document
: which only has city = 15 OR country = 60

what you've got there will do that, but to a lesser degree so will a 
simple query for both clauses...

  q=(city:15 country:60)

will score documents higher if they match both clauses than if they only 
match one, because of the coord factor -- you can check the debugQuery 
score explanations to see the details.  If you want the discrepancies in 
scores to be more significant, you can go the route you have, or you can 
customize the similarity to be stricter about the coord factor, but that will 
apply to all boolean queries.

: Furthermore I want to show in my results view why a certain document
: matched, something like "matched city and country" or "matched city
: only" or "matched country only".

this is a bit trickier, but thanks to the new pseudo-fields feature it 
will work in 4.0 (and already works on the trunk)...

any function can be specified in the "fl" param and each document returned 
will include the value that document has for that function -- so 
regardless of how complex or simple your main query is, you could use the 
"query(...)" function in the fl to see what score it gets against some 
other arbitrary function...

Here's an example URL using the example data/configs...

http://localhost:8983/solr/select?q=hard+drive&fl=score,id,name,is_canon:query%28{!v=%27manu_id_s:canon%27}%29
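
decoded, the fl param in that URL reads:

    fl=score,id,name,is_canon:query({!v='manu_id_s:canon'})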


-Hoss


Re: Enabling realtime search in Solr 4.0

2011-12-29 Thread Yonik Seeley
On Thu, Dec 29, 2011 at 2:35 PM, Avner Levy  wrote:
> Thanks Mark, I appreciate your help.
> I need the Solr index to be in sync with my database.
> This means that even if one record was added I need it to appear in the next 
> search (including faceting).

You could just add softCommit=true on every update command, I think (or
add it as a default parameter on the update handler).

It's probably worth exploring this requirement of having the index "in
sync" with the database, though.
Even if you guarantee that no updates are made without being
searchable, that still doesn't imply completely "in sync", since the
updates to the database and to Solr are two separate requests.
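
With SolrJ, the per-request variant would look roughly like this (an
untested sketch against the trunk-era API; URL and doc are examples):

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    static void addWithSoftCommit() throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");             // example doc
        UpdateRequest req = new UpdateRequest();
        req.add(doc);
        req.setParam("softCommit", "true");  // same as softCommit=true on /update
        req.process(server);
    }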

> Does anyone know how different Solr-RA is from the regular Solr?

No idea.  I assume they just use Lucene's NRT support (as Solr 4.0 now does).

-Yonik
http://www.lucidimagination.com


Re: solr keep old docs

2011-12-29 Thread Alexander Aristov
well. The first results are ready. I have implemented a custom update
processor following your suggestion, using a low-level index reader and
TermDocs.

Launched scripts which add about 10,000 docs. Indexing took about 1 minute
including the commit, which is quite good for me. I don't have larger
datasets, so I won't be able to check under heavier conditions.

If someone is interested, I can send over my jar file with my update
processor.

As I said, I am ready to contribute it to Solr, but will get back to it in
the New Year, after 10 Jan.

thanks everybody.

Best Regards
Alexander Aristov


On 29 December 2011 18:12, Erick Erickson  wrote:

> I'd guess it would be much faster, assuming that
> the search savings wouldn't be swamped by the
> additional transmission time over the wire and
> parsing the request (although SolrJ uses a binary
> format, so parsing request probably isn't all
> that expensive).
>
> You could even do a hybrid approach. Pack up all
> of the IDs you are about to update, send them to
> your special *request* handler and have your
> request handler respond with the documents that
> were already in the index...
>
> Hmmm, scratch all that. Start with just stringing
> together a long set of IDs and just
> search for them. Something like
> q=id:(1 2 47 09873)&fl=id
> The response should be a minimal set of data
> returned (just the ID). Then you can remove
> each document ID returned from your
> next update. No custom Solr components
> required.
>
> Solr defaults to a maxBooleanClause count
> of 1024, so your packets should have fewer IDs
> than this, or you should bump that config setting.
>
> This should pretty much do what I was thinking
> with custom code without having to write
> anything..
>
> Best
> Erick
>
> On Thu, Dec 29, 2011 at 8:15 AM, Alexander Aristov
>  wrote:
> > I have never developed for Solr and don't know much about its internals,
> > but today I tried one approach with the searcher.
> >
> > In my update processor I get searcher and search for ID. It works but I
> > need to load test it. Will index traversal be faster (less resource
> > consuming) than search?
> >
> > Best Regards
> > Alexander Aristov
> >
> >
> > On 29 December 2011 17:03, Erick Erickson 
> wrote:
> >
> >> Hmmm, we're not communicating ...
> >>
> >> The update processor wouldn't search in the
> >> classic sense. It would just use lower-level
> >> index traversal to determine if the doc (identified
> >> by your unique key) was already in the index
> >> and skip indexing that document if it was. No real
> >> *searching* involved (see TermDocs.seek for one
> >> approach).
> >>
> >> The price would be that you are transmitting the
> >> document over to the Solr instance and then
> >> throwing it away.
> >>
> >> Best
> >> Erick
> >>
> >> On Thu, Dec 29, 2011 at 12:52 AM, Mikhail Khludnev
> >>  wrote:
> >> > Alexander,
> >> >
> >> > I have two ideas how to implement fast dedupe externally, assuming
> your
> >> PKs
> >> > don't fit to java.util.*Map:
> >> >
> >> >   - your crawler can use inprocess RDBMS (Derby, H2) to track dupes;
> >> >   - if your crawler is stateless - it doesn't track PKs which have
> >> >   already been crawled - you can retrieve them from Solr via
> >> >   http://wiki.apache.org/solr/TermsComponent . That's blazingly fast,
> >> but
> >> >   it might be a problem with removed documents (I'm not sure). And
> it's
> >> also
> >> >   can lead to OOMException (if you have too much PKs). Let me know if
> you
> >> >   need a workaround for one of these problems.
> >> >
> >> > If you choose internal dedupe (UpdateProcessor), pls let me know if
> >> > querying one-by-one will be too slow for you and you'll need to do it
> >> > page-by-page. I did some of such paging, and will do something similar
> >> > soon, so I'm interested in it.
> >> >
> >> > Regards
> >> >
> >> > On Thu, Dec 29, 2011 at 9:34 AM, Alexander Aristov <
> >> > alexander.aris...@gmail.com> wrote:
> >> >
> >> >> Unfortunately I have a lot of duplicates, and given that searching
> >> >> might suffer, I will try implementing an update processor.
> >> >>
> >> >> But your idea is interesting and I will consider it, thanks.
> >> >>
> >> >> Best Regards
> >> >> Alexander Aristov
> >> >>
> >> >>
> >> >> On 28 December 2011 19:12, Tanguy Moal 
> wrote:
> >> >>
> >> >> > Hello Alexander,
> >> >> >
> >> >> > I don't know much about your requirements in terms of size and
> >> >> > performances, but I've had a similar use case and found a pretty
> >> simple
> >> >> > workaround.
> >> >> > If your duplicate rate is not too high, you can have the
> >> >> > SignatureProcessor to generate fingerprint of documents (you
> already
> >> did
> >> >> > that).
> >> >> >
> >> >> > Simply turn off overwriting of duplicates; you can then rely on
> >> >> > solr's
> >> >> > grouping / field collapsing to group your search results by
> >> fingerprints.
> >> >> > You'll then have one document group per "real" document. You can
> use
> >> >> > group.sort to sort your groups by i

RE: Enabling realtime search in Solr 4.0

2011-12-29 Thread Avner Levy
Thanks Mark, I appreciate your help.
I need the Solr index to be in sync with my database.
This means that even if one record was added, I need it to appear in the next 
search (including faceting).
I've read in the Solr-RA documentation that if you add <realtime>true</realtime> 
you can add documents and search for them without any commit at all (and I 
assumed it was functionality of Solr).
So I guess there isn't a way to get such functionality in Solr 4.0, right? I 
think this relates to the ability to open readers from the writer, if I 
understood it correctly?
Does anyone know how different Solr-RA is from the regular Solr?
Thanks in advance,
  Avner


-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Thursday, December 29, 2011 5:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Enabling realtime search in Solr 4.0


On Dec 29, 2011, at 3:39 AM, Avner Levy wrote:

> Hi,
> I'm trying to enable realtime search in Solr 4.0 (So I can see new documents 
> without committing).
> I've added:
> <realtime>true</realtime>
> <updateLog class="solr.FSUpdateLog">
>   <str name="dir">${solr.data.dir:}</str>
> </updateLog>
> 
> But documents aren't seen before commit (or softCommit).
> Any help will be appreciated.
> Thanks,
> Avner


This is how you enable soft auto commit in trunk: 
http://wiki.apache.org/solr/SolrConfigXml?#Update_Handler_Section

You do not need the update log for it - that is for realtime GET (where you 
would also need to set that up in a Request Handler).

Sounds like you are conflating the two.

- Mark Miller
lucidimagination.com


Frequent Indexing of same Documents

2011-12-29 Thread Avni, Itamar
Hi community,

Say I have lots of documents to index, each with a primary key in the index, and 
I index them frequently.
They are not indexed all together (like in bulk), but each at a different time.

1) Is there a significant difference in performance between a freshly created 
core ("the first time to index") and an "old" core (every document already 
exists in the core)?
2) when updating a document, is it updated in-place, or is the old copy 
(according to primary key) marked as deleted and a new document inserted?
3) will indexing the same documents over and over again increase the size 
of the index? (assuming the documents did not change much)
4) sorry for the dumb questions. My boss is making me ask them :-D .

Iavni


This electronic message may contain proprietary and confidential information of 
Verint Systems Inc., its affiliates and/or subsidiaries.
The information is intended to be for the use of the individual(s) or
entity(ies) named above.  If you are not the intended recipient (or authorized 
to receive this e-mail for the intended recipient), you may not use, copy, 
disclose or distribute to anyone this message or any information contained in 
this message.  If you have received this electronic message in error, please 
notify us by replying to this e-mail.



Re: Frequent Indexing of same Documents

2011-12-29 Thread Gora Mohanty
On Fri, Dec 30, 2011 at 12:25 AM, Avni, Itamar  wrote:
[...]
> This electronic message may contain proprietary and confidential information 
> of Verint Systems Inc., its affiliates and/or subsidiaries.
> The information is intended to be for the use of the individual(s) or
> entity(ies) named above.  If you are not the intended recipient (or 
> authorized to receive this e-mail for the intended recipient), you may not 
> use, copy, disclose or distribute to anyone this message or any information 
> contained in this message.  If you have received this electronic message in 
> error, please notify us by replying to this e-mail.
>

I would reply, but am afraid to, as I am not in the list of
individual(s) nor entity(ies) [sic!] named above.

Regards,
Gora


Re: Solr, SQL Server's LIKE

2011-12-29 Thread Shashi Kant
for a simple, hackish (albeit inefficient) approach, look up wildcard searches,

e.g. foo*, *bar



On Thu, Dec 29, 2011 at 12:38 PM, Devon Baumgarten
 wrote:
> I have been tinkering with Solr for a few weeks, and I am convinced that it 
> could be very helpful in many of my upcoming projects. I am trying to decide 
> whether Solr is appropriate for this one, and I haven't had luck looking for 
> answers on Google.
>
> I need to search a list of names of companies and individuals pretty exactly. 
> T-SQL's LIKE operator does this with decent performance, but I have a feeling 
> there is a way to configure Solr to do this better. I've tried using an edge 
> N-gram tokenizer, but it feels like it might be more complicated than 
> necessary. What would you suggest?
>
> I know this sounds kind of 'Golden Hammer,' but there has been talk of other, 
> more complicated (magic) searches that I don't think SQL Server can handle, 
> since its tokens (as far as I know) can't be smaller than one word.
>
> Thanks,
>
> Devon Baumgarten
>


Solr, SQL Server's LIKE

2011-12-29 Thread Devon Baumgarten
I have been tinkering with Solr for a few weeks, and I am convinced that it 
could be very helpful in many of my upcoming projects. I am trying to decide 
whether Solr is appropriate for this one, and I haven't had luck looking for 
answers on Google.

I need to search a list of names of companies and individuals pretty exactly. 
T-SQL's LIKE operator does this with decent performance, but I have a feeling 
there is a way to configure Solr to do this better. I've tried using an edge 
N-gram tokenizer, but it feels like it might be more complicated than 
necessary. What would you suggest?

I know this sounds kind of 'Golden Hammer,' but there has been talk of other, 
more complicated (magic) searches that I don't think SQL Server can handle, 
since its tokens (as far as I know) can't be smaller than one word.

Thanks,

Devon Baumgarten



how to configure saxon xslt processor for solr

2011-12-29 Thread vrpar...@gmail.com
Hello all,

 
I want to configure the Saxon XSLT processor for Solr; how do I do that?
It's currently using Xalan as the default processor.

Also, if a classpath needs to be set, please tell me where I can set it.

And how can we check which XSLT processor is used by default?

I am using Solr 1.4 and JBoss 6.
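
From what I understand, JAXP selects the XSLT processor via a system
property, so would setting this at JVM startup (with the Saxon jar on the
classpath) be the right approach?

    -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl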
 

Thanks & Regards,

Vishal Parekh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-configure-saxon-xslt-processor-for-solr-tp3619117p3619117.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Enabling realtime search in Solr 4.0

2011-12-29 Thread Mark Miller

On Dec 29, 2011, at 3:39 AM, Avner Levy wrote:

> Hi,
> I'm trying to enable realtime search in Solr 4.0 (So I can see new documents 
> without committing).
> I've added:
> <realtime>true</realtime>
> <updateLog class="solr.FSUpdateLog">
>   <str name="dir">${solr.data.dir:}</str>
> </updateLog>
> 
> But documents aren't seen before commit (or softCommit).
> Any help will be appreciated.
> Thanks,
> Avner


This is how you enable soft auto commit in trunk: 
http://wiki.apache.org/solr/SolrConfigXml?#Update_Handler_Section

You do not need the update log for it - that is for realtime GET (where you 
would also need to set that up in a Request Handler).

Sounds like you are conflating the two.

- Mark Miller
lucidimagination.com
[Solr Event Listener plug-in] Execute query search from SolrCore - Java Code

2011-12-29 Thread Alessandro Benedetti
Hi guys,
I'm developing a custom SolrEventListener, and inside the postCommit()
method I need to execute some queries and collect results.
In my SolrEventListener class, I have a SolrCore
object (org.apache.solr.core.SolrCore) and a list of queries (Strings).

How can I use the SolrCore to properly parse the queries (I have to parse
them the way the Solr query parser does) and run them?

I'm fighting with searchers and execute methods in the SolrCore object, but
I don't know which is the best way to do this ...
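
What I've pieced together so far looks roughly like this (a sketch against
the 3.x-era APIs; not sure it's the right or optimal way):

    import java.util.List;
    import org.apache.lucene.search.Query;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.request.LocalSolrQueryRequest;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.DocList;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.util.RefCounted;

    void runQueries(SolrCore core, List<String> queries) throws Exception {
        // borrow the registered searcher; must decref() when done
        RefCounted<SolrIndexSearcher> ref = core.getSearcher();
        try {
            SolrIndexSearcher searcher = ref.get();
            // a request object so QParser can see the core's schema/config
            SolrQueryRequest req = new LocalSolrQueryRequest(core, new ModifiableSolrParams());
            try {
                for (String qstr : queries) {
                    Query q = QParser.getParser(qstr, null, req).getQuery();
                    DocList hits = searcher.getDocList(q, (Query) null, null, 0, 10);
                    // ... collect ids/scores from hits.iterator() here ...
                }
            } finally {
                req.close();
            }
        } finally {
            ref.decref();
        }
    }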

Cheers


-- 
--

Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


distributed faceting: refineFacets()

2011-12-29 Thread Dmitry Kan
Hello list,

In a distributed faceting search scenario, does the SOLR frontend (the merger)
expect shard facets to be pre-sorted (by count or by index)? If so, when
merging the results, is there some smart strategy for combining the shard
results into a final sorted list? Can someone explain what the refineFacets()
method does? Removing the call to it didn't change the SOLR merger's
behaviour, but what I'm not yet sure about is the performance impact.

Our use case is 16 shards with sizes varying from 30G to 130G. When
searching across the entire shard farm, we have noticed that the merger
SOLR takes 10 seconds on average per request. It could be that the merger is
simply waiting for *fat* shards to answer. What we have found out is that
each shard answers within 1-4 seconds. That on average leaves about 6
seconds for the merging part.

We also know in advance, that shards respond with non-intersecting hits.
That practically means, that the merger should simply "concatenate" the
shard results into one list (automatically pre-sorted by design).

Can something be improved in the SOLR merger facet logic here? Should we
look at something else as well?

-- 
Thanks,

Dmitry Kan


Re: 3.5 QueryResponseWriter

2011-12-29 Thread Erik Hatcher
Aleksander -

Looks like you've experienced the issue described with fixes here: 


Erik

On Dec 29, 2011, at 08:40 , Aleksander Akerø wrote:

> Hi!
> 
> So I've decided try out Solr 3.5.0.
> 
> What I have done so far is basically just to copy the /example/solr folder, 
> install the webapp .war file in a tomcat instance and start it up.
> 
> At first it complained about the VelocityResponseWriter, so I created a /lib 
> folder in $SOLR_HOME and added the velocity jar from dist. That seemed to 
> take care of the VRW error.
> 
> But now I get a "NoClassDefFoundError" which says something about 
> QueryResponseWriter. So I guess I'm missing this one too then? But I have a 
> feeling that this should be a part of the solr core jar?
> 
> Maybe someone could explain this for me?
> 
> -- 
> Aleksander Akerø
> Systemutvikler
> Mobil: 944 89 054
> 
> Gurusoft AS
> Telefon: 92 44 09 99
> Østre Kullerød 5, 3241 Sandefjord
> www.gurusoft.no
> 



Re: solr keep old docs

2011-12-29 Thread Erick Erickson
I'd guess it would be much faster, assuming that
the search savings wouldn't be swamped by the
additional transmission time over the wire and
parsing the request (although SolrJ uses a binary
format, so parsing request probably isn't all
that expensive).

You could even do a hybrid approach. Pack up all
of the IDs you are about to update, send them to
your special *request* handler and have your
request handler respond with the documents that
were already in the index...

Hmmm, scratch all that. Start with just stringing
together a long set of IDs and just
search for them. Something like
q=id:(1 2 47 09873)&fl=id
The response should be a minimal set of data
returned (just the ID). Then you can remove
each document ID returned from your
next update. No custom Solr components
required.

Solr defaults to a maxBooleanClause count
of 1024, so your packets should have fewer IDs
than this, or you should bump that config setting.

This should pretty much do what I was thinking
with custom code without having to write
anything.
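
In SolrJ terms, roughly (an untested sketch; assumes simple IDs that
don't need query-syntax escaping):

    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    // ask Solr which of these IDs already exist in the index
    static Set<String> existingIds(SolrServer server, Collection<String> ids)
            throws Exception {
        StringBuilder q = new StringBuilder("id:(");
        for (String id : ids) q.append(id).append(' ');
        q.append(')');
        SolrQuery query = new SolrQuery(q.toString());
        query.addField("id");        // fl=id : minimal response
        query.setRows(ids.size());
        QueryResponse rsp = server.query(query);
        Set<String> found = new HashSet<String>();
        for (SolrDocument doc : rsp.getResults()) {
            found.add((String) doc.getFieldValue("id"));
        }
        return found;                // drop these from your next update
    }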

Best
Erick

On Thu, Dec 29, 2011 at 8:15 AM, Alexander Aristov
 wrote:
> I have never developed for Solr and don't know much about its internals, but
> today I tried one approach with the searcher.
>
> In my update processor I get searcher and search for ID. It works but I
> need to load test it. Will index traversal be faster (less resource
> consuming) than search?
>
> Best Regards
> Alexander Aristov
>
>
> On 29 December 2011 17:03, Erick Erickson  wrote:
>
>> Hmmm, we're not communicating ...
>>
>> The update processor wouldn't search in the
>> classic sense. It would just use lower-level
>> index traversal to determine if the doc (identified
>> by your unique key) was already in the index
>> and skip indexing that document if it was. No real
>> *searching* involved (see TermDocs.seek for one
>> approach).
>>
>> The price would be that you are transmitting the
>> document over to the Solr instance and then
>> throwing it away.
>>
>> Best
>> Erick
>>
>> On Thu, Dec 29, 2011 at 12:52 AM, Mikhail Khludnev
>>  wrote:
>> > Alexander,
>> >
>> > I have two ideas how to implement fast dedupe externally, assuming your
>> PKs
>> > don't fit to java.util.*Map:
>> >
>> >   - your crawler can use inprocess RDBMS (Derby, H2) to track dupes;
>> >   - if your crawler is stateless - it doesn't track PKs which have
>> >   already been crawled - you can retrieve them from Solr via
>> >   http://wiki.apache.org/solr/TermsComponent . That's blazingly fast,
>> but
>> >   it might be a problem with removed documents (I'm not sure). And it's
>> also
>> >   can lead to OOMException (if you have too much PKs). Let me know if you
>> >   need a workaround for one of these problems.
>> >
>> > If you choose internal dedupe (UpdateProcessor), pls let me know if
>> > querying one-by-one will be too slow for you and you'll need to do it
>> > page-by-page. I did some of such paging, and will do something similar
>> > soon, so I'm interested in it.
>> >
>> > Regards
>> >
>> > On Thu, Dec 29, 2011 at 9:34 AM, Alexander Aristov <
>> > alexander.aris...@gmail.com> wrote:
>> >
>> >> Unfortunately I have a lot of duplicates, and given that searching
>> >> might suffer, I will try implementing an update processor.
>> >>
>> >> But your idea is interesting and I will consider it, thanks.
>> >>
>> >> Best Regards
>> >> Alexander Aristov
>> >>
>> >>
>> >> On 28 December 2011 19:12, Tanguy Moal  wrote:
>> >>
>> >> > Hello Alexander,
>> >> >
>> >> > I don't know much about your requirements in terms of size and
>> >> > performances, but I've had a similar use case and found a pretty
>> simple
>> >> > workaround.
>> >> > If your duplicate rate is not too high, you can have the
>> >> > SignatureProcessor to generate fingerprint of documents (you already
>> did
>> >> > that).
>> >> >
>> >> > Simply turn off overwriting of duplicates; you can then rely on
>> >> > solr's
>> >> > grouping / field collapsing to group your search results by
>> fingerprints.
>> >> > You'll then have one document group per "real" document. You can use
>> >> > group.sort to sort your groups by indexing date ascending, and
>> >> > group.limit=1 to keep only the oldest one.
>> >> > You can even use group.format = simple to serve results as if no
>> >> > collapsing occurred, and use group.ngroups (/!\ could be expensive /!\) to
>> /!\) to
>> >> > get the real number of deduplicated documents.
>> >> >
>> >> > Of course the index will be larger, as I said, I made no assumption
>> >> > regarding your operating requirements. And search can be a bit slower,
>> >> > depending on the average rate of duplicated documents.
>> >> > But you've got your issue addressed by configuration tuning only...
>> >> > Depending on your project's sizing, it could be time saving.
>> >> >
>> >> > The advantage is that you have the precious information of what
>> content
>> >> is
>> >> > duplicated from where :-)
>> >> >
>> >> > Hope this helps,
>> >> >
>> >> > --
>> >> > Tanguy
>> >> >
>> >> > Le 28/12

Re: Facet Ordering

2011-12-29 Thread Jamie Johnson
Thanks Hoss, I'll take a look at this and see if I can understand it.

On Wed, Dec 28, 2011 at 9:44 PM, Chris Hostetter
 wrote:
>
> : I've seen in the solr faceting overview that it is possible to sort
> : either by count or lexicographically, but is there a way to sort so
> : the lowest counts come back first?
>
> Peter Sturge looked into this a while back and provided a patch, but there
> were some issues with it that never got resolved (in particular, it didn't
> work for several of the faceting code paths).  If you are interested in
> helping to add this functionality, resurrecting that patch might be a good
> place to start...
>
> https://issues.apache.org/jira/browse/SOLR-1672
>
> -Hoss


3.5 QueryResponseWriter

2011-12-29 Thread Aleksander Akerø

Hi!

So I've decided to try out Solr 3.5.0.

What I have done so far is basically just to copy the /example/solr 
folder, install the webapp .war file in a tomcat instance and start it up.


At first it complained about the VelocityResponseWriter, so I created a 
/lib folder in $SOLR_HOME and added the velocity jar from dist. That 
seemed to take care of the VRW error.


But now I get a "NoClassDefFoundError" which says something about 
QueryResponseWriter. So I guess I'm missing this one too then? But I 
have a feeling that this should be a part of the solr core jar?


Maybe someone could explain this for me?

--
Aleksander Akerø
Systemutvikler
Mobil: 944 89 054

Gurusoft AS
Telefon: 92 44 09 99
Østre Kullerød 5, 3241 Sandefjord
www.gurusoft.no



Re: Large RDBMS dataset

2011-12-29 Thread Alexey Serba
> The problem is that for each record in "fd", Solr makes three distinct SELECTs 
> on the other three tables. Of course, this is absolutely inefficient.

You can also try to use GROUP_CONCAT (it's a MySQL function, but maybe
there's something similar in MS SQL) to select all the nested 1-N
entities in a single result set, as strings joined using some separator,
and then split them into multivalued fields in a post-processing phase
(using a RegexTransformer or similar).
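
For example, something like this (the table/column names are made up for
illustration):

    SELECT fd.id,
           fd.title,
           GROUP_CONCAT(t.tag SEPARATOR '|') AS tags
    FROM fd
    LEFT JOIN tag t ON t.fd_id = fd.id
    GROUP BY fd.id, fd.title;

and then split the "tags" column into a multivalued field with
RegexTransformer's splitBy="\|".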


Re: Best practices for installing and maintaining Solr configuration

2011-12-29 Thread Erick Erickson
This should help: http://wiki.apache.org/solr/SolrTomcat

The difference here is that you're not copying the example directory, you're
copying the example/solr directory. And this is just basically to get the
configuration files and directory structure right. You're not copying
executables, jars, wars, or any of that stuff from example. You get
the war file from the dist directory and that should contain all the
executables & etc.


As to your other questions:
1> If at all possible, upping the match version and reindexing
 are good things to do.
2> It's also a good idea to update the config files. Alternatively,
 you can diff the config files between releases to see what the
 changes are and selectively add them to your config file.
 But you should test, test, test before rolling out into prod.

My rule of thumb for upgrading is to just not upgrade minor
releases unless there are compelling reasons. The CHANGES.txt
file will identify major additions.

There are good reasons not to get too far behind on major
(i.e. 3.x -> 4.x) releases, the primary one being that Solr
only makes an effort to be backwards-compatible through
one major release. i.e. 1.4 can be read by 3.x (there was
no 2.x Solr release). But no attempt will be made by
4.x code to read 1.x indexes.

Hope this helps
Erick

On Wed, Dec 28, 2011 at 8:49 AM, Brandon Ramirez
 wrote:
> Hi List,
> I've seen several Solr developers mention the fact that people often copy 
> example/ to become their solr installation and that that is not recommended.  
> We are rebuilding our search functionality to use Solr and will be deploying 
> it in a few weeks.
>
> I have read the README, several wiki articles, mailing list and browsed the 
> Solr distribution.  The example/ directory seems to be the only configuration 
> I can find.  So, I have to ask: what is the recommended way to install Solr?
>
> What about maintaining it?  For example, Is it wise to up the 
> luceneMatchVersion and re-index with every upgrade?  When new configuration 
> options are added in new versions of Solr, should we worry about updating our 
> configuration to include them?  I realize these may be vague questions and 
> the answers could be case-by-case, but some general or high-level 
> documentation may help.
>
> Thanks!
>
>
> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
> Software Engineer II | Element K | www.elementk.com
>


Re: solr keep old docs

2011-12-29 Thread Alexander Aristov
I have never developed for Solr and don't know much about its internals, but
today I tried one approach with the searcher.

In my update processor I get a searcher and search for the ID. It works, but I
need to load-test it. Will index traversal be faster (less resource-consuming)
than search?

Best Regards
Alexander Aristov


On 29 December 2011 17:03, Erick Erickson  wrote:

> Hmmm, we're not communicating ...
>
> The update processor wouldn't search in the
> classic sense. It would just use lower-level
> index traversal to determine if the doc (identified
> by your unique key) was already in the index
> and skip indexing that document if it was. No real
> *searching* involved (see TermDocs.seek for one
> approach).
>
> The price would be that you are transmitting the
> document over to the Solr instance and then
> throwing it away.
>
> Best
> Erick
>
> On Thu, Dec 29, 2011 at 12:52 AM, Mikhail Khludnev
>  wrote:
> > Alexander,
> >
> > I have two ideas how to implement fast dedupe externally, assuming your
> PKs
> > don't fit to java.util.*Map:
> >
> >   - your crawler can use inprocess RDBMS (Derby, H2) to track dupes;
> >   - if your crawler is stateless - it doesn't track PKs which have
> >   already been crawled - you can retrieve them from Solr via
> >   http://wiki.apache.org/solr/TermsComponent . That's blazingly fast,
> but
> >   it might be a problem with removed documents (I'm not sure). And it's
> also
> >   can lead to OOMException (if you have too much PKs). Let me know if you
> >   need a workaround for one of these problems.
> >
> > If you choose internal dedupe (UpdateProcessor), pls let me know if
> > querying one-by-one will be too slow for you and you'll need to do it
> > page-by-page. I did some of such paging, and will do something similar
> > soon, so I'm interested in it.
> >
> > Regards
> >
> > On Thu, Dec 29, 2011 at 9:34 AM, Alexander Aristov <
> > alexander.aris...@gmail.com> wrote:
> >
> >> Unfortunately I have a lot of duplicates, and given that searching
> >> might suffer, I will try implementing an update processor.
> >>
> >> But your idea is interesting and I will consider it, thanks.
> >>
> >> Best Regards
> >> Alexander Aristov
> >>
> >>
> >> On 28 December 2011 19:12, Tanguy Moal  wrote:
> >>
> >> > Hello Alexander,
> >> >
> >> > I don't know much about your requirements in terms of size and
> >> > performances, but I've had a similar use case and found a pretty
> simple
> >> > workaround.
> >> > If your duplicate rate is not too high, you can have the
> >> > SignatureProcessor to generate fingerprint of documents (you already
> did
> >> > that).
> >> >
> >> > Simply turn off overwriting of duplicates; you can then rely on
> >> > solr's
> >> > grouping / field collapsing to group your search results by
> fingerprints.
> >> > You'll then have one document group per "real" document. You can use
> >> > group.sort to sort your groups by indexing date ascending, and
> >> > group.limit=1 to keep only the oldest one.
> >> > You can even use group.format = simple to serve results as if no
> >> > collapsing occurred, and use group.ngroups (/!\ could be expensive /!\) to
> /!\) to
> >> > get the real number of deduplicated documents.
> >> >
> >> > Of course the index will be larger, as I said, I made no assumption
> >> > regarding your operating requirements. And search can be a bit slower,
> >> > depending on the average rate of duplicated documents.
> >> > But you've got your issue addressed by configuration tuning only...
> >> > Depending on your project's sizing, it could be time saving.
> >> >
> >> > The advantage is that you have the precious information of what
> content
> >> is
> >> > duplicated from where :-)
> >> >
> >> > Hope this helps,
> >> >
> >> > --
> >> > Tanguy
> >> >
> >> > On 28/12/2011 15:45, Alexander Aristov wrote:
> >> >
> >> >  Thanks Eric,
> >> >>
> >> >> it sets me direction. I will be writing new plugin and will get back
> to
> >> >> the
> >> >> dev forum with results and then we will decide next steps.
> >> >>
> >> >> Best Regards
> >> >> Alexander Aristov
> >> >>
> >> >>
> >> >> On 28 December 2011 18:08, Erick Erickson >> erickerick...@gmail.com>>
> >> >>  wrote:
> >> >>
> >> >>  Well, the short answer is that nobody else has
> >> >>> 1>  had a similar requirement
> >> >>> AND
> >> >>> 2>  not found a suitable work around
> >> >>> AND
> >> >>> 3>  implemented the change and contributed it back.
> >> >>>
> >> >>> So, if you'd like to volunteer.
> >> >>>
> >> >>> Seriously. If you think this would be valuable and are
> >> >>> willing to work on it, hop on over to the dev list and
> >> >>> discuss it, open a JIRA and make it work. I'd start
> >> >>> by opening a discussion on the dev list before
> >> >>> opening a JIRA, just to get a sense of where the
> >> >>> snags would be to changing the Solr code, but that's
> >> >>> optional.
> >> >>>
> >> >>> That said, writing your own update request handler
> >> >>> that detected this case isn't very difficult,
> >> >>> extend Up

Re: a question on jmx solr exposure

2011-12-29 Thread Dmitry Kan
That's absolutely right. Thanks for the suggestion.

On Thu, Dec 29, 2011 at 2:47 PM, Gora Mohanty  wrote:

> On Thu, Dec 29, 2011 at 6:15 PM, Dmitry Kan  wrote:
> > Well, we don't use the multicore feature of SOLR, so in our case SOLR
> > instances are just separate web-apps. The web-app loading order probably
> > then affects which app gets hold of a jmx 'pipe'.
> > We should probably start using the feature to collect stats from
> > different cores at the same time. Thanks.
>
> But, as I pointed out earlier, you should still have a different
> solrconfig.xml file for each web app, and thus should be able
> to set up JMX differently for each.
>
> Regards,
> Gora
>



-- 
Regards,

Dmitry Kan


Re: solr keep old docs

2011-12-29 Thread Erick Erickson
Hmmm, we're not communicating ...

The update processor wouldn't search in the
classic sense. It would just use lower-level
index traversal to determine if the doc (identified
by your unique key) was already in the index
and skip indexing that document if it was. No real
*searching* involved (see TermDocs.seek for one
approach).

The price would be that you are transmitting the
document over to the Solr instance and then
throwing it away.
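
For illustration, a minimal sketch of the approach Erick describes, against
the Solr/Lucene 3.x APIs as best I recall them. The class name is invented,
and it assumes the unique key is a plain string field whose indexed term
equals the raw value; treat it as a starting point, not a finished plugin:

import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Hypothetical "keep the old doc" processor: silently drops an add whose
// unique key is already present, so the first-indexed version wins.
public class SkipExistingUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(final SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SchemaField keyField = req.getSchema().getUniqueKeyField();
        String key = cmd.solrDoc.getFieldValue(keyField.getName()).toString();
        // Low-level term lookup instead of a search; note it only sees
        // documents visible to the currently open searcher, not ones
        // added (and uncommitted) earlier in the same batch.
        TermDocs termDocs = req.getSearcher().getIndexReader()
            .termDocs(new Term(keyField.getName(), key));
        try {
          if (termDocs.next()) {
            return; // already indexed: skip this document
          }
        } finally {
          termDocs.close();
        }
        super.processAdd(cmd);
      }
    };
  }
}

Such a factory would then be registered in an updateRequestProcessorChain in
solrconfig.xml so that processAdd runs before the document reaches the index.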

Best
Erick

On Thu, Dec 29, 2011 at 12:52 AM, Mikhail Khludnev
 wrote:
> Alexander,
>
> I have two ideas for implementing fast dedupe externally, assuming your PKs
> don't fit into a java.util.*Map:
>
>   - your crawler can use an in-process RDBMS (Derby, H2) to track dupes;
>   - if your crawler is stateless - i.e. it doesn't track which PKs have
>   already been crawled - you can retrieve them from Solr via
>   http://wiki.apache.org/solr/TermsComponent (see the example request
>   below). That's blazingly fast, but there might be a problem with removed
>   documents (I'm not sure). It can also lead to an OOMException (if you
>   have too many PKs). Let me know if you need a workaround for one of
>   these problems.
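
A hedged example of such a TermsComponent request; the uniqueKey field name
"id" and the stock /terms handler from the example solrconfig.xml are
assumptions:

http://localhost:8983/solr/terms?terms.fl=id&terms.limit=-1&wt=json

terms.limit=-1 returns every term of the id field in one response, which is
where the OOM risk mentioned above comes from; paging with terms.lower (and
terms.lower.incl=false) keeps each response bounded.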
>
> If you choose internal dedupe (UpdateProcessor), please let me know if
> querying one-by-one is too slow for you and you need to do it
> page-by-page. I've done some paging like this, and will do something
> similar soon, so I'm interested in it.
>
> Regards
>
> On Thu, Dec 29, 2011 at 9:34 AM, Alexander Aristov <
> alexander.aris...@gmail.com> wrote:
>
>> Unfortunately I have a lot of duplicates, and given that searching might
>> suffer, I will try implementing an update processor.
>>
>> But your idea is interesting and I will consider it, thanks.
>>
>> Best Regards
>> Alexander Aristov
>>
>>
>> On 28 December 2011 19:12, Tanguy Moal  wrote:
>>
>> > Hello Alexander,
>> >
>> > I don't know much about your requirements in terms of size and
>> > performance, but I've had a similar use case and found a pretty simple
>> > workaround.
>> > If your duplicate rate is not too high, you can have the
>> > SignatureProcessor generate fingerprints of documents (you already did
>> > that).
>> >
>> > Simply turn off overwriting of duplicates; you can then rely on solr's
>> > grouping / field collapsing to group your search results by fingerprints.
>> > You'll then have one document group per "real" document. You can use
>> > group.sort to sort your groups by indexing date ascending, and
>> > group.limit=1 to keep only the oldest one.
>> > You can even use group.format=simple to serve results as if no
>> > collapsing occurred, and use group.ngroups (/!\ could be expensive /!\) to
>> > get the real number of deduplicated documents.
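
As an illustration, such a request might look like this (host, handler, and
field names are assumptions; grouping ships out of the box from Solr 3.3 on):

http://localhost:8983/solr/select?q=*:*&group=true&group.field=signatureField&group.sort=indexDate+asc&group.limit=1&group.format=simple&group.ngroups=true

With group.sort on the indexing date ascending and group.limit=1, each group
contributes only its oldest document, which is exactly the keep-the-old-doc
behaviour discussed in this thread.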
>> >
>> > Of course the index will be larger; as I said, I made no assumptions
>> > regarding your operating requirements. And search can be a bit slower,
>> > depending on the average rate of duplicated documents.
>> > But you've got your issue addressed by configuration tuning only...
>> > Depending on your project's sizing, it could save time.
>> >
>> > The advantage is that you have the precious information of what content
>> > is duplicated from where :-)
>> >
>> > Hope this helps,
>> >
>> > --
>> > Tanguy
>> >
>> > On 28/12/2011 15:45, Alexander Aristov wrote:
>> >
>> >  Thanks Erick,
>> >>
>> >> it gives me direction. I will write a new plugin and get back to the
>> >> dev forum with results, and then we will decide next steps.
>> >>
>> >> Best Regards
>> >> Alexander Aristov
>> >>
>> >>
>> >> On 28 December 2011 18:08, Erick Erickson <erickerick...@gmail.com> wrote:
>> >>
>> >>  Well, the short answer is that nobody else has
>> >>> 1>  had a similar requirement
>> >>> AND
>> >>> 2>  not found a suitable workaround
>> >>> AND
>> >>> 3>  implemented the change and contributed it back.
>> >>>
>> >>> So, if you'd like to volunteer.
>> >>>
>> >>> Seriously. If you think this would be valuable and are
>> >>> willing to work on it, hop on over to the dev list and
>> >>> discuss it, open a JIRA and make it work. I'd start
>> >>> by opening a discussion on the dev list before
>> >>> opening a JIRA, just to get a sense of where the
>> >>> snags would be to changing the Solr code, but that's
>> >>> optional.
>> >>>
>> >>> That said, writing your own update request handler
>> >>> that detected this case isn't very difficult,
>> >>> extend UpdateRequestProcessorFactory/UpdateRequestProcessor
>> >>> and use it as a plugin.
>> >>>
>> >>> Best
>> >>> Erick
>> >>>
>> >>> On Wed, Dec 28, 2011 at 6:46 AM, Alexander Aristov
>> >>>   wrote:
>> >>>
>>  the problem with dedupe (SignatureUpdateProcessor) is that it REPLACES
>>  old docs. I have tried it already.
>> 
>>  Best Regards
>>  Alexander Aristov
>> 
>> 
>>  On 28 December 2011 13:04, Lance Norskog  wrote:
>> 
>>   The SignatureUpdateProcessor is for exactly this problem:
>> >
>> >
>> >
>> >  http://www.lucidimagination.com/sea

Re: a question on jmx solr exposure

2011-12-29 Thread Gora Mohanty
On Thu, Dec 29, 2011 at 6:15 PM, Dmitry Kan  wrote:
> Well, we don't use the multicore feature of SOLR, so in our case SOLR instances
> are just separate web-apps. The web-app loading order probably then affects
> which app gets hold of a jmx 'pipe'.
> We should probably start using the feature to collect stats from different
> cores at the same time. Thanks.

But, as I pointed out earlier, you should still have a different
solrconfig.xml file for each web app, and thus should be able
to set up JMX differently for each.
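
A hedged sketch of what that per-webapp configuration could look like (the
ports and connector names below are made up; the serviceUrl form is documented
at http://wiki.apache.org/solr/SolrJmx). In webapp A's solrconfig.xml:

  <jmx serviceUrl="service:jmx:rmi:///jndi/rmi://localhost:9999/solrA"/>

and in webapp B's:

  <jmx serviceUrl="service:jmx:rmi:///jndi/rmi://localhost:9998/solrB"/>

Each instance then publishes its MBeans through its own JMX connector instead
of both registering in the first existing MBean server.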

Regards,
Gora


Re: a question on jmx solr exposure

2011-12-29 Thread Dmitry Kan
Well, we don't use the multicore feature of SOLR, so in our case SOLR instances
are just separate web-apps. The web-app loading order probably then affects
which app gets hold of a jmx 'pipe'.
We should probably start using the feature to collect stats from different
cores at the same time. Thanks.

On Thu, Dec 29, 2011 at 2:21 PM, Alexey Serba  wrote:

> Which Solr version do you use? Maybe it has something to do with
> the default collection?
>
> I do see a separate jmx domain for every collection, i.e.
>
> solr/collection1
> solr/collection2
> solr/collection3
> ...
>
> On Wed, Dec 21, 2011 at 1:56 PM, Dmitry Kan  wrote:
> > Hello list,
> >
> > This might not be the right place to ask jmx-specific questions, but I
> > decided to try, as we are polling SOLR statistics through jmx.
> >
> > We currently have two solr cores with different schemas A and B being run
> > under the same tomcat instance. The question is: which stats is jconsole
> > going to see under solr/ ?
> >
> > From the numbers (e.g. numDocs of the searcher), jconsole sees the stats
> > of A. Where do the stats of B go? Or will the first activated core capture
> > the jmx "pipe" and not let B's stats go through?
> >
> > --
> > Regards,
> >
> > Dmitry Kan
>



-- 
Regards,

Dmitry Kan


Re: Decimal Mapping problem

2011-12-29 Thread Alexey Serba
Try casting the MySQL DECIMAL data type to a string, i.e.

CAST( IF(drt.discount IS NULL,'0',(drt.discount/100)) AS CHAR) as discount

(Note: CHAR is the valid string target for MySQL's CAST; TEXT is not accepted
there, even though columns themselves can be TEXT.)
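
For context, a sketch of how that cast could sit in a DIH data-config.xml
entity. The entity and table names here are invented; only the drt alias and
discount column come from the original query:

<entity name="discounts"
        query="SELECT CAST(IF(drt.discount IS NULL, '0', drt.discount/100) AS CHAR) AS discount
               FROM discount_rate drt"/>

The value then arrives at Solr as a plain string instead of a raw byte array.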

On Mon, Dec 19, 2011 at 1:24 PM, Niels Stevens  wrote:
> Hey everybody,
>
> I'm having an issue importing DECIMAL numbers from my MySQL DB to Solr.
> Is there anybody with some advice? I will start by trying to explain my
> problem.
>
> According to my findings, I think the lack of an explicit mapping for a
> DECIMAL value in the schema.xml
> is causing the issues I'm experiencing.
>
> The decimal numbers I'm trying to import look like this:
>
> 0.075000
> 7.50
> 2.25
>
>
> but after the import statement the results for the equivalent Solr field
> are returned like this (Java byte arrays printed via their default toString):
>
> [B@1413d20
> [B@11c86ff
> [B@1e2fd0d
>
>
> The import statement for this particular field looks like:
>
>  IF(drt.discount IS NULL,'0',(drt.discount/100)) ...
>
>
> Now I thought that using MySQL's ROUND function to keep 3 digits after
> the dot, in conjunction with an explicit field mapping in the schema.xml,
> could solve this issue.
> Has anyone had similar problems with decimal fields, or does anybody
> have an expert view on this?
>
> Thanks a lot in advance.
>
> Regards,
>
> Niels Stevens


Re: Grouping results after Sorting or vice-versa

2011-12-29 Thread Tomás Fernández Löbbe
Hi Vijayaragavan, did you apply a patch for grouping in Solr 3.1? It is
available out of the box since 3.3.
Also, the result from grouping will not look exactly like you are
expecting, as results with the same value in the grouping field (in this
case, thread_id) will be collapsed into one group. You can get many
documents per group though (with group.limit) but the output will not be
like regular results.
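
A hedged illustration of those parameters, reusing the host and core from the
query quoted below (other parameters trimmed):

http://localhost:8080/solr/core1/select?q=*:*&group=true&group.field=threadid&group.sort=date+desc&group.limit=10&group.format=simple

group.format=simple flattens the groups back into a single result list, but
the documents are still emitted group by group rather than in plain date
order.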

Tomás

On Thu, Dec 29, 2011 at 3:38 AM, Vijayaragavan wrote:

> Hi Juan,
>
> I'm using Solr 3.1
> The type of the date field is long.
> Let's say the documents indexed in the Solr server are...
>
> doc:
>   id: 1326c5cc09bbc99a_1
>   threadid: 1326c5cc09bbc99a
>   date: 1316078009000
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> doc:
>   id: 1321dff33cecd5f4_1
>   threadid: 1321dff33cecd5f4
>   date: 1314956314000
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> doc:
>   id: 1321dff33cecd5f4_2
>   threadid: 1321dff33cecd5f4
>   date: 131771922
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> doc:
>   id: 133b70d0d0e32f12_1
>   threadid: 133b70d0d0e32f12
>   date: 1321626044000
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> The results I'm getting for
>
> http://localhost:8080/solr/core1/select/?qt=nutch&q=*:*&fq=userid:333&group=true&group.field=threadid&group.sort=date%20desc&sort=date%20desc
>
> doc:
>   id: 133b70d0d0e32f12_1
>   threadid: 133b70d0d0e32f12
>   date: 1321626044000
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> doc:
>   id: 1321dff33cecd5f4_2
>   threadid: 1321dff33cecd5f4
>   date: 131771922
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> doc:
>   id: 1326c5cc09bbc99a_1
>   threadid: 1326c5cc09bbc99a
>   date: 1316078009000
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> doc:
>   id: 1321dff33cecd5f4_1
>   threadid: 1321dff33cecd5f4
>   date: 1314956314000
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> But the results I should get are...
>
> doc:
>   id: 133b70d0d0e32f12_1
>   threadid: 133b70d0d0e32f12
>   date: 1321626044000
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> doc:
>   id: 1321dff33cecd5f4_2
>   threadid: 1321dff33cecd5f4
>   date: 131771922
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> doc:
>   id: 1321dff33cecd5f4_1
>   threadid: 1321dff33cecd5f4
>   date: 1314956314000
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
> doc:
>   id: 1326c5cc09bbc99a_1
>   threadid: 1326c5cc09bbc99a
>   date: 1316078009000
>   <.. Some Other fields here ..>
>   subject: Some subject here...
>   message: Some message here...
>
>
> Is it possible to get such results? If yes, how?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Grouping-results-after-Sorting-or-vice-versa-tp3615957p3618172.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: a question on jmx solr exposure

2011-12-29 Thread Alexey Serba
Which Solr version do you use? Maybe it has something to do with
the default collection?

I do see a separate jmx domain for every collection, i.e.

solr/collection1
solr/collection2
solr/collection3
...

On Wed, Dec 21, 2011 at 1:56 PM, Dmitry Kan  wrote:
> Hello list,
>
> This might not be the right place to ask jmx-specific questions, but I
> decided to try, as we are polling SOLR statistics through jmx.
>
> We currently have two solr cores with different schemas A and B being run
> under the same tomcat instance. The question is: which stats is jconsole
> going to see under solr/ ?
>
> From the numbers (e.g. numDocs of the searcher), jconsole sees the stats of
> A. Where do the stats of B go? Or will the first activated core capture the
> jmx "pipe" and not let B's stats go through?
>
> --
> Regards,
>
> Dmitry Kan


[SOLR 3.5] QueryResponseWriter

2011-12-29 Thread Aleksander Akerø

Hi!

So I've decided to try out Solr 3.5.0.

What I have done thus far is basically just to copy the /example/solr
folder, install the webapp .war file in a Tomcat instance and start up.


At first it complained about the VelocityResponseWriter, so I created a
/lib folder in $SOLR_HOME and added the velocity jar from dist. That
seemed to take care of the VRW error.


But now I get a "NoClassDefFoundError" which says something about
QueryResponseWriter. So I guess I'm missing this one too, then? But I
have a feeling that this should be part of the solr core jar?


Maybe someone could explain this for me?
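
One way to load the contrib jars without touching the container classpath is
a <lib> directive in solrconfig.xml. A hedged sketch, assuming the stock 3.5
example layout (solrconfig.xml sitting two directories below the unpacked
distribution; the relative paths would need adjusting for a Tomcat deployment):

<lib dir="../../contrib/velocity/lib" regex=".*\.jar" />
<lib dir="../../dist/" regex="apache-solr-velocity-\d.*\.jar" />

Jars declared this way are loaded by Solr's own plugin classloader, so the
VelocityResponseWriter class and its QueryResponseWriter base class resolve
in the same place.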


Enabling realtime search in Solr 4.0

2011-12-29 Thread Avner Levy
Hi,
I'm trying to enable realtime search in Solr 4.0 (so I can see new documents
without committing).
I've added:

true
<updateLog>
  <str name="dir">${solr.data.dir:}</str>
</updateLog>

But documents aren't seen before commit (or softCommit).
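
If the goal is search visibility rather than real-time get, a minimal sketch,
assuming the 4.0 trunk solrconfig.xml conventions, is a soft autocommit inside
the update handler (the 1000 ms interval is an arbitrary example):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <!-- reopen the searcher at most once a second; cheaper than a hard commit -->
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>

The updateLog above enables the /get (real-time get) handler, but on its own
it does not make uncommitted documents searchable.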
Any help will be appreciated.
Thanks,
Avner