Re: Guide to using SolrQuery object

2009-08-10 Thread Aleksander M. Stensby
You'll find the available parameters in various interfaces in the package  
org.apache.solr.common.params.*


For instance:
import org.apache.solr.common.params.FacetParams;
import org.apache.solr.common.params.ShardParams;
import org.apache.solr.common.params.TermVectorParams;

As a side note to what Shalin said, SolrQuery extends ModifiableSolrParams  
(just so that you are aware of that).
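For illustration, the constants in those interfaces resolve to plain request-parameter names (e.g. FacetParams.FACET_FIELD is just the string "facet.field"). A minimal sketch of what those parameters look like on the wire — the helper class below is hypothetical, not part of SolrJ:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamDemo {
    // Builds the query string a Solr request ends up as; the keys are the
    // same strings the org.apache.solr.common.params.* constants hold.
    static String buildQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(e.getKey()).append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException impossible) {
            throw new RuntimeException(impossible); // UTF-8 is always available
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("q", "solr");
        p.put("facet", "true");          // FacetParams.FACET
        p.put("facet.field", "category"); // FacetParams.FACET_FIELD
        System.out.println(buildQueryString(p));
        // q=solr&facet=true&facet.field=category
    }
}
```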

Hope that helps a bit.

Cheers,
 Aleks

On Tue, 14 Jul 2009 16:27:50 +0200, Reuben Firmin reub...@benetech.org  
wrote:


Also, are there enums or constants around the various param names that  
can

be passed in, or do people tend to define those themselves?
Thanks!
Reuben




--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.com
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Configure Collection Distribution in Solr 1.3

2009-06-12 Thread Aleksander M. Stensby
As some people have mentioned on this mailing list, the Solr 1.3
distribution scripts (snappuller / snapshooter etc.) do not work on Windows.
Some have indicated that it might be possible to use Cygwin, but I have my
doubts. So unfortunately, Windows users suffer with regard to replication
(although I would recommend everyone to use Unix for running servers ;) )


That being said, you can use Solr 1.4 (one of the nightly builds), where
you get built-in replication that is easily configured through the Solr
server configuration, and this works on Windows as well!


So, if you don't have any real reason not to upgrade, I suggest that you
try out Solr 1.4 (which also brings lots of new features and major
improvements!)


Cheers,
 Aleksander


On Tue, 09 Jun 2009 21:00:27 +0200, MaheshR mahesh.ray...@gmail.com  
wrote:




Hi Aleksander ,


I went through the links below and successfully configured rsync using
Cygwin on Windows XP. The Solr documentation mentions many script files
like rsyncd-enable, snapshooter, etc. These are all Unix-based script files.
Where do I get these script files for Windows?

Any help on this would be great helpful.

Thanks
MaheshR.



Aleksander M. Stensby wrote:


You'll find everything you need in the Wiki.
http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline

http://wiki.apache.org/solr/SolrCollectionDistributionScripts

If things are still uncertain, I've written a guide from when we used the
Solr distribution scripts on our Lucene index earlier. You can read that
guide here:
http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53

Cheers,
  Aleksander


On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR mahesh.ray...@gmail.com
wrote:



Hi,

we configured multi-core solr 1.3 server in Tomcat 6.0.18 servlet
container.
Its working great. Now I need to configure collection Distribution to
replicate indexing data between master and 2 slaves. Please provide me
step
by step instructions to configure collection distribution between  
master

and
slaves would be helpful.

Thanks in advance.

Thanks
Mahesh.




--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this  
e-mail









--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Query on date fields

2009-06-12 Thread Aleksander M. Stensby

Hello,
for this you can simply use the nifty date functions supplied by Solr
(given that you have indexed your fields with the Solr date field type).


If I understand you correctly, you can achieve what you want with the
following query:


displayStartDate:[* TO NOW] AND displayEndDate:[NOW TO *]
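A minimal plain-Java sketch of building that filter string with the dynamic-field names from the question (the helper itself is illustrative, not a SolrJ API; NOW is Solr's date-math keyword):

```java
public class DateRangeQuery {
    // Builds a query matching documents whose display window contains the
    // current time, using Solr range-query syntax and the NOW keyword.
    static String activeNow(String startField, String endField) {
        return startField + ":[* TO NOW] AND " + endField + ":[NOW TO *]";
    }

    public static void main(String[] args) {
        System.out.println(activeNow("DisplayStartDate_dt", "DisplayEndDate_dt"));
        // DisplayStartDate_dt:[* TO NOW] AND DisplayEndDate_dt:[NOW TO *]
    }
}
```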

Cheers,
 Aleksander



On Mon, 08 Jun 2009 09:17:26 +0200, prerna07 pkhandelw...@sapient.com  
wrote:





Hi,

I have two date attributes in my Indexes:

DisplayStartDate_dt
DisplayEndDate_dt

I need to fetch results where today's date lies between DisplayStartDate and
DisplayEndDate.

However, I cannot send a hardcoded DisplayStartDate and DisplayEndDate in the
query, as there are 1000 different dates in the indexes.

Please suggest the query.

Thanks,
Prerna








--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Search Phrase Wildcard?

2009-06-11 Thread Aleksander M. Stensby

Solr does not support wildcards in phrase queries, yet.
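One common workaround is to AND one prefix query per term instead of using a phrase; a sketch (the field name is from the example, the helper is hypothetical). Note this matches each prefix anywhere in the field, so it does not reproduce the ordered "We*: not found" behaviour from the question:

```java
public class PrefixSearch {
    // "So* We*" as a phrase is unsupported, but ANDing per-term prefix
    // queries approximates a starts-with match on each word.
    static String prefixQuery(String field, String... prefixes) {
        StringBuilder sb = new StringBuilder();
        for (String p : prefixes) {
            if (sb.length() > 0) sb.append(" AND ");
            sb.append(field).append(':').append(p).append('*');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(prefixQuery("name", "So", "We"));
        // name:So* AND name:We*
    }
}
```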

Cheers,
 Aleks

On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun  
samnang.ch...@gmail.com wrote:



Hi all,
I have my document like this:

<doc>
  <name>Solr web service</name>
</doc>

Is there any ways that I can search like startswith:

So* We* : found
Sol*: found
We*: not found

Cheers,
Samnang




--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Search Phrase Wildcard?

2009-06-11 Thread Aleksander M. Stensby
Well yes :) Since Solr does in fact support the entire Lucene query parser
syntax :)


- Aleks

On Thu, 11 Jun 2009 13:57:23 +0200, Avlesh Singh avl...@gmail.com wrote:


In fact, Lucene does not support that.

Lucene supports single and multiple character wildcard searches within

single terms (*not within phrase queries*).



Taken from
http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Wildcard%20Searches

Cheers
Avlesh

On Thu, Jun 11, 2009 at 4:32 PM, Aleksander M. Stensby 
aleksander.sten...@integrasco.no wrote:


Solr does not support wildcards in phrase queries, yet.

Cheers,
 Aleks


On Thu, 11 Jun 2009 11:48:13 +0200, Samnang Chhun  
samnang.ch...@gmail.com

wrote:

 Hi all,

I have my document like this:

<doc>
  <name>Solr web service</name>
</doc>

Is there any ways that I can search like startswith:

So* We* : found
Sol*: found
We*: not found

Cheers,
Samnang





--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this  
e-mail






--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Sharding strategy

2009-06-10 Thread Aleksander M. Stensby

Hi Otis,
thanks for your reply!
You could say I'm lucky (and I totally agree since I've made the choice of  
ordering the data that way:p).
What you describe is what I've thought about doing and I'm happy to read  
that you approve. It is always nice to know that you are not doing things  
completely off - that's what I love about this mailing list!


I've implemented a sharded yellow pages that builds up the shard
parameter, and it will obviously be easy to search in two shards to
overcome the beginning-of-the-year situation; I just thought it might be a
bit stupid to search for 1% of the data in the latest shard and the rest
in shard n-1. How much of a performance decrease do you reckon I will get
from searching two shards instead of one?


Anyways, thanks for confirming things, Otis!

Cheers,
 Aleksander




On Wed, 10 Jun 2009 07:51:16 +0200, Otis Gospodnetic  
otis_gospodne...@yahoo.com wrote:




Aleksander,

In a sense you are lucky you have time-ordered data.  That makes it very  
easy to shard and cheaper to search - you know exactly which shards you  
need to query.  The beginning of the year situation should also be  
easy.  Do start with the latest shard for the current year, and go to  
next shard only if you have to (e.g. if you don't get enough results  
from the first shard).
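That fall-back (start at the newest shard, widen only while results run short) could be sketched roughly like this; the shard names and the per-shard result estimate are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class ShardFallback {
    // Yearly shards named by year, newest first. Start with the current
    // year's shard and append older ones only while more results are needed.
    static List<String> shardsToQuery(int currentYear, int oldestYear,
                                      int resultsNeeded, int resultsPerShardEstimate) {
        List<String> shards = new ArrayList<String>();
        int remaining = resultsNeeded;
        for (int year = currentYear; year >= oldestYear && remaining > 0; year--) {
            shards.add("shard-" + year);
            remaining -= resultsPerShardEstimate;
        }
        return shards;
    }

    public static void main(String[] args) {
        // Early January: the new year's shard is expected to hold few
        // results, so the previous year's shard is pulled in as well.
        System.out.println(shardsToQuery(2009, 2005, 100, 60));
        // [shard-2009, shard-2008]
    }
}
```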


 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Aleksander M. Stensby aleksander.sten...@integrasco.no
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Tuesday, June 9, 2009 7:07:47 AM
Subject: Sharding strategy

Hi all,
I'm trying to figure out how to shard our index as it is growing  
rapidly and we

want to make our solution scalable.
So, we have documents that are most commonly sorted by their date. My  
initial
thought is to shard the index by date, but I wonder if you have any  
input on

this and how to best solve this...

I know that the most frequent queries will be executed against the  
latest
shard, but then let's say we shard by year, how do we best solve the  
situation
that will occur in the beginning of a new year? (Some of the data will  
be in the

last shard, but most of it will be on the second last shard.)

Would it be stupid to have a latest shard with duplicate data (always
consisting of the last 6 months or something like that) and maintain  
that index

in addition to the regular yearly shards? Any one else facing a similar
situation with a good solution?

Any input would be greatly appreciated :)

Cheers,
Aleksander



--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this  
e-mail







--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Solr Multiple Queries?

2009-06-09 Thread Aleksander M. Stensby

Hi there Samnang!
Please see inline for comments:

On Tue, 09 Jun 2009 08:40:02 +0200, Samnang Chhun  
samnang.ch...@gmail.com wrote:



Hi all,
I just got started looking at using Solr as my search web service. But I
don't know whether Solr has some features for multiple queries:

- Startswith
This is what we call prefix queries and wildcard queries. For instance, if
you want something that starts with man, you can search for man*.



- Exact Match

Exact matching is done with quotation marks: "Solr rocks"


- Contain
Hmm, what do you mean by contain? Inside a given word? That might be a bit
more tricky. We have an issue open at the moment for supporting leading
wildcards, which would allow you to search for *cogn* and match
recognition etc. If that was what you meant, you can look at the ongoing
issue http://issues.apache.org/jira/browse/SOLR-218



- Doesn't Contain
NOT or - are keywords to exclude something (Solr supports all the boolean
operators that Lucene supports).



- In the range

Range queries in Solr are done by using brackets. For instance,
price:[500 TO 1000]
will return all results with prices ranging from 500 to 1000.

There is a lot of information on the Wiki that you should check out:
http://wiki.apache.org/solr/




Could anyone guide me how to implement those features in Solr?

Cheers,
Samnang



Cheers,
 Aleks


--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Multiple queries in one, something similar to a SQL union

2009-06-09 Thread Aleksander M. Stensby
I don't know if I follow you correctly, but you are saying that you want X
results per type?
So you do something like rows=X and query = type:Y etc. and merge the
results?


- Aleks


On Tue, 09 Jun 2009 12:33:21 +0200, Avlesh Singh avl...@gmail.com wrote:

I have an index with two fields - name and type. I need to perform a  
search
on the name field so that *equal number of results are fetched for each  
type

*.
Currently, I am achieving this by firing multiple queries with a  
different

type and then merging the results.
In my database driven version, I used to do a union of multiple queries
(and not separate SQL queries) to achieve this.

Can Solr do something similar? If not, can this be a possible  
enhancement?


Cheers
Avlesh




--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Sharding strategy

2009-06-09 Thread Aleksander M. Stensby

Hi all,
I'm trying to figure out how to shard our index as it is growing rapidly  
and we want to make our solution scalable.
So, we have documents that are most commonly sorted by their date. My  
initial thought is to shard the index by date, but I wonder if you have  
any input on this and how to best solve this...


I know that the most frequent queries will be executed against the  
latest shard, but then let's say we shard by year, how do we best solve  
the situation that will occur in the beginning of a new year? (Some of the  
data will be in the last shard, but most of it will be on the second last  
shard.)


Would it be stupid to have a latest shard with duplicate data (always
consisting of the last 6 months or something like that) and maintain that
index in addition to the regular yearly shards? Anyone else facing a
similar situation with a good solution?


Any input would be greatly appreciated :)

Cheers,
 Aleksander



--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


StreamingUpdateSolrServer recommendations?

2009-06-08 Thread Aleksander M. Stensby

Hi all,
I guess this question is mainly aimed at you, Ryan.
I've been trying out your StreamingUpdateSolrServer implementation for
indexing, and clearly see the improvements in indexing times compared to
the CommonsHttpSolrServer :)
Great work!

My question is: do you have any recommendations as to what values I should
use / have you found a sweet spot? What are the trade-offs? Thread count
is obviously tied to the number of CPUs available, but what about the
queue size? Any thoughts? I tried 20 / 3 as you posted in the issue
thread, and get averages of about 80 documents / sec (and I have not
optimized the document processing etc., which takes the larger part of the
time).
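For intuition, the queue size / thread count trade-off comes from the bounded producer-consumer pattern that StreamingUpdateSolrServer is built on. A plain-Java sketch of that pattern (not the actual SolrJ code — the HTTP add is replaced by a counter): a larger queue smooths bursts from the producer at the cost of memory, while extra threads only help as long as the server side isn't the bottleneck.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamingSketch {
    // queueSize ~ the "20" and threads ~ the "3" discussed above.
    public static int run(int queueSize, int threads, int docs) {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(queueSize);
        final AtomicInteger sent = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        try {
            for (int i = 0; i < threads; i++) {
                workers[i] = new Thread(new Runnable() {
                    public void run() {
                        try {
                            String doc;
                            while (!(doc = queue.take()).equals("EOF")) {
                                sent.incrementAndGet(); // stand-in for an HTTP add
                            }
                            queue.put("EOF"); // pass the poison pill on
                        } catch (InterruptedException ignored) { }
                    }
                });
                workers[i].start();
            }
            for (int d = 0; d < docs; d++) {
                queue.put("doc-" + d); // blocks when the bounded queue is full
            }
            queue.put("EOF");
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        System.out.println(run(20, 3, 500)); // 500
    }
}
```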

Anyway, I was just curious what others are using (and what times you
are getting at).

Keep up the good work!

   Aleks


--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Terms Component

2009-06-08 Thread Aleksander M. Stensby
You can try out the nightly build of Solr (which is the Solr 1.4 dev
version), containing all the nice and shiny new features of Solr 1.4 :)
To use the Terms Component you simply need to configure the handler as
explained in the documentation / wiki.


Cheers,
 Aleksander


On Mon, 08 Jun 2009 14:22:15 +0200, Anshuman Manur  
anshuman_ma...@stragure.com wrote:



while on the subject, can anybody tell me when Solr 1.4 might come out?

Thanks
Anshuman Manur

On Mon, Jun 8, 2009 at 5:37 PM, Anshuman Manur
anshuman_ma...@stragure.comwrote:


I'm using Solr 1.3 apparently, and Solr 1.4 is not out yet.
Sorry, my mistake!


On Mon, Jun 8, 2009 at 5:18 PM, Anshuman Manur 
anshuman_ma...@stragure.com wrote:


Hello,

I want to use the terms component in Solr 1.4:

http://localhost:8983/solr/terms?terms.fl=name


But, I get the following error with the above query:

java.lang.NullPointerException
	at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37)
	at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
	at org.apache.solr.search.QParser.getQuery(QParser.java:88)
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:148)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:84)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:295)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
	at org.ofbiz.catalina.container.CrossSubdomainSessionValve.invoke(CrossSubdomainSessionValve.java:44)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
	at java.lang.Thread.run(Thread.java:619)


Any help would be great.

Thanks
Anshuman Manur








--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Configure Collection Distribution in Solr 1.3

2009-06-08 Thread Aleksander M. Stensby

You'll find everything you need in the Wiki.
http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline

http://wiki.apache.org/solr/SolrCollectionDistributionScripts

If things are still uncertain, I've written a guide from when we used the
Solr distribution scripts on our Lucene index earlier. You can read that
guide here:

http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53

Cheers,
 Aleksander


On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR mahesh.ray...@gmail.com  
wrote:




Hi,

we configured multi-core solr 1.3 server in Tomcat 6.0.18 servlet  
container.

Its working great. Now I need to configure collection Distribution to
replicate indexing data between master and 2 slaves. Please provide me  
step
by step instructions to configure collection distribution between master  
and

slaves would be helpful.

Thanks in advance.

Thanks
Mahesh.




--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: Initialising of CommonsHttpSolrServer in Spring framwork

2009-05-15 Thread Aleksander M. Stensby
Out of the box, the simplest way to configure CommonsHttpSolrServer
through a Spring application context is to simply define the bean for the
server and inject it into whatever class you have that will use it, like
Avlesh shared below:

<bean id="httpSolrServer" class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
    <constructor-arg>
        <value>http://localhost:8080/solr/core0</value>
    </constructor-arg>
</bean>

You can also set the connection parameters like Avlesh did with the  
HttpClient in the context, or directly in the init method of your  
implementation.

Inject it with a property:

<property name="solrServer">
    <ref bean="httpSolrServer" />
</property>

It is a bit more tricky with the embedded Solr server, since you need to also
register cores etc. We solved that by creating a core-configuration loader
class.


- Aleks


On Sat, 09 May 2009 03:08:25 +0200, Avlesh Singh avl...@gmail.com wrote:


I am giving you a detailed sample of my spring usage.

<bean id="solrHttpClient" class="org.apache.commons.httpclient.HttpClient">
    <property name="httpConnectionManager">
        <bean class="org.apache.commons.httpclient.MultiThreadedHttpConnectionManager">
            <property name="maxConnectionsPerHost" value="10"/>
            <property name="maxTotalConnections" value="10"/>
        </bean>
    </property>
</bean>

<bean id="mySearchImpl" class="com.me.search.MySearchSolrImpl">
    <property name="core1">
        <bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
            <constructor-arg value="http://localhost/solr/core1"/>
            <constructor-arg ref="solrHttpClient"/>
        </bean>
    </property>
    <property name="core2">
        <bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
            <constructor-arg value="http://localhost/solr/core2"/>
            <constructor-arg ref="solrHttpClient"/>
        </bean>
    </property>
</bean>

Hope this helps.

Cheers
Avlesh

On Sat, May 9, 2009 at 12:39 AM, sachin78  
tendulkarsachi...@gmail.comwrote:




Ranjeeth,

Did you figure out how to do this? If yes, can you share how you did it?
An example bean definition in XML would be helpful.

--Sachin


Funtick wrote:

 Use constructor and pass URL parameter. Nothing SPRING related...

 Create a Spring bean with attributes 'MySolr', 'MySolrUrl', and 'init'
 method... 'init' will create instance of CommonsHttpSolrServer.  
Configure

 Spring...



I am using Solr 1.3 and SolrJ as a Java client. I am
integrating SolrJ in the Spring framework and am facing a problem:
the Spring framework is not initializing the CommonsHttpSolrServer
class. How can I define this class to get an instance of
SolrServer to invoke further methods on?





--
View this message in context:
http://www.nabble.com/Initialising-of-CommonsHttpSolrServer-in-Spring-framwork-tp18808743p23451795.html
Sent from the Solr - User mailing list archive at Nabble.com.






--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco

Please consider the environment before printing all or any of this e-mail


Re: How do I accomplish this (semi-)complicated setup?

2009-03-26 Thread Aleksander M. Stensby
… and other data. All this needs to be indexed.

The complication comes in when we have private repositories. Only select
users have access to these, but we still need to index them.

How would I go about accomplishing this? I can't think of a clean way to
do it.

Any pointers much appreciated.


Jesper

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal
 











--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no

Please consider the environment before printing all or any of this e-mail


Re: Solrj: Getting response attributes from QueryResponse

2008-12-24 Thread Aleksander M. Stensby

Hello there Mark!
With SolrJ, you can simply do the following:
server.query(q) returns QueryResponse

the queryResponse has the method getResults() which returns  
SolrDocumentList. This is an extended list containing SolrDocuments, but  
it also exposes methods such as getNumFound(), which is exactly what you  
are looking for!


so, you could do something like this:
long hits = solrServer.query(q).getResults().getNumFound();

and you have similar methods for the other attributes, like:
results.getMaxScore();
and
results.getStart();

Hope that helps.

Cheers, and merry Christmas!
 Aleks

On Fri, 19 Dec 2008 21:22:48 +0100, Mark Ferguson  
mark.a.fergu...@gmail.com wrote:



Hello,

I am trying to get the numFound attribute from a returned QueryResponse
object, but for the life of me I can't find where it is stored. When I  
view
a response in XML format, it is stored as an attribute on the response  
node,

e.g.:

result name=response numFound=207 start=5 maxScore=4.1191907

However, I can't find a way to retrieve these attributes (numFound, start
and maxScore). When I look at the QueryResponse itself, I can see that  
the
attributes are being stored somewhere, because the toString method  
returns

them. For example, queryResponse.toString() returns:

{responseHeader={status=0,QTime=139,params={wt=javabin,hl=true,rows=15,version=2.2,fl=urlmd5,start=0,q=java}},response={
*numFound=1228*,start=0,maxScore=3.633028,docs=[SolrDocument[{urlmd5=...

The problem is that when I call queryResponse.get('response'), all I get  
is
the list of SolrDocuments, I don't have any other attributes. Am I  
missing
something or are these attributes just not publically available? If  
they're

not, shouldn't they be? Thanks a lot,

Mark Ferguson




--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no

Please consider the environment before printing all or any of this e-mail


Re: What are the scenarios when a new Searcher is created ?

2008-11-30 Thread Aleksander M. Stensby
When adding documents to Solr, the searcher will not be replaced. But once
you do a commit, (depending on settings) a new searcher will be opened
and warmed up while the old searcher is still open and used for
searching. Once the new searcher has finished its warm-up procedure, the
old searcher is replaced with the new warmed searcher, which will then
let you search the newest documents added to the index.


- Aleks

On Mon, 01 Dec 2008 01:32:05 +0100, souravm [EMAIL PROTECTED] wrote:


Hi All,

Say I have started a new Solr server instance using the start.jar Java
command. Now, for this Solr server instance, when will a new Searcher be
created?


I am aware of following scenarios -

1. When the instance is started for autowarming a new Searcher is  
created. But not sure whether this searcher will continue to be alive or  
will die after the autowarming is over.
2. When I do the first search in this server instance through select, a  
new searcher would be created and then onwards the same searcher would  
be used for all select to this instance. Even if I run multiple search  
request concurrently I see that the same Searcher is used to service   
those requests.
3. When I try to add an index to this instance through update statement  
a new searcher is created.


Please let me know if there are any other situation when a new Searcher  
is created.


Regards,
Sourav








--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Keyword extraction

2008-11-27 Thread Aleksander M. Stensby

Hi again Patrick.
Glad to hear that we can contribute to help you guys. That's what this
mailing list is for :)


First of all, I think you use the wrong parameter to get your terms.
Take a look at  
http://lucene.apache.org/solr/api/org/apache/solr/common/params/MoreLikeThisParams.html  
to see the supported params.
In your string you use mlt.displayTerms=list, which I believe should be
mlt.interestingTerms=list.


If that doesn't work:
One thing you should know is that, from what I can tell, you are using the
StandardRequestHandler in your querying. The StandardRequestHandler
supports a simplified handling of "more like this" queries, namely: this
method returns similar documents for each document in the response set.
It supports the common mlt parameters, needs mlt=true (as you have done),
and supports an mlt.count parameter to specify the number of similar
documents returned for each matching doc from your query.


If you want to get the top keywords etc. (and, in essence, for your
mlt.interestingTerms=list parameter to have any effect at all, if I'm not
completely wrong), you will need to configure a MoreLikeThisHandler in
your solrconfig.xml and then map your query to it.


From the sample configuration file: incoming queries will be dispatched to
the correct handler based on the path or the qt (query type) param. Names
starting with a '/' are accessed with a path equal to the registered name.
Names without a leading '/' are accessed with: http://host/app/select?qt=name
If no qt is defined, the requestHandler that declares default="true" will be
used.


You can read about the MoreLikeThisHandler here:  
http://wiki.apache.org/solr/MoreLikeThisHandler


Once you have it configured properly, your query would be something like:
http://localhost:8983/solr/mlt?q=amsterdam&mlt.fl=text&mlt.interestingTerms=list&mlt=true
(don't think you need the mlt=true here, though...)

or
http://localhost:8983/solr/select?qt=mlt&q=amsterdam&mlt.fl=text&mlt.interestingTerms=list&mlt=true
(in the last example I use qt=mlt)

Hope this helps.
Regards,
 Aleksander


On Thu, 27 Nov 2008 11:49:30 +0100, Plaatje, Patrick  
[EMAIL PROTECTED] wrote:



Hi Aleksander,

With all the help of you and the other comments, we're now at a point  
where a MoreLikeThis list is returned, and shows 10 related records.  
However on the query executed there are no keywords whatsoever being  
returned. Is the querystring still wrong or is something else required?


The querystring we're currently executing is:

http://suempnr3:8080/solr/select/?q=amsterdam&mlt.fl=text&mlt.displayTerms=list&mlt=true


Best,

Patrick

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: woensdag 26 november 2008 15:07
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

Ah, yes, that is important. In Lucene, the MLT will check whether the term
vector is stored; if it is not, it will still be able to perform the
querying, but in a much, much less efficient way. Lucene will
analyze the document (and the variable DEFAULT_MAX_NUM_TOKENS_PARSED
will be used to limit the number of tokens that will be parsed). (Don't
want to go into details on this since I haven't really dug through the
code :p) But when the field isn't stored either, it is rather difficult
to re-analyze the document ;)

On a general note, if you want to really understand how the MLT works,  
take a look at the wiki or read this thorough blog post:

http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/

Regards,
  Aleksander

On Wed, 26 Nov 2008 14:41:52 +0100, Plaatje, Patrick  
[EMAIL PROTECTED] wrote:



Hi Aleksander,

This was a typo on my end; the original query included a semicolon
instead of an equals sign. But I think it has to do with my field not
being stored and not being marked termVectors="true". I'm
recreating the index now to see if this fixes the problem.

Best,

patrick

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: woensdag 26 november 2008 14:37
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

Hi there!
Well, first of all i think you have an error in your query, if I'm not
mistaken.
You say http://localhost:8080/solr/select/?q=id=18477975...
but since you are referring to the field called id, you must say:
http://localhost:8080/solr/select/?q=id:18477975...
(use colon instead of the equals sign).
I think that will do the trick.
If not, try adding the debugQuery=on at the end of your request url,
to see debug output on how the query is parsed and if/how any
documents are matched against your query.
Hope this helps.

Cheers,
  Aleksander



On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick
[EMAIL PROTECTED] wrote:


Hi Aleksander,

Thanx for clearing this up. I am confident that this is a way to
explore for me as I'm just starting to grasp the matter. Do you know
why I'm not getting any results with the query posted earlier

Re: Keyword extraction

2008-11-26 Thread Aleksander M. Stensby

Hi there!
Well, first of all i think you have an error in your query, if I'm not  
mistaken.

You say http://localhost:8080/solr/select/?q=id=18477975...
but since you are referring to the field called id, you must say:
http://localhost:8080/solr/select/?q=id:18477975...
(use colon instead of the equals sign).
I think that will do the trick.
If not, try adding the debugQuery=on at the end of your request url, to  
see debug output on how the query is parsed and if/how any documents are  
matched against your query.

Hope this helps.

Cheers,
 Aleksander



On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick  
[EMAIL PROTECTED] wrote:



Hi Aleksander,

Thanx for clearing this up. I am confident that this is a way to explore  
for me as I'm just starting to grasp the matter. Do you know why I'm not  
getting any results with the query posted earlier then? It gives me the  
folowing only:


<lst name="moreLikeThis">
  <result name="18477975" numFound="0" start="0"/>
</lst>

Instead of delivering details of the interestingTerms.

Thanks in advance

Patrick


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: woensdag 26 november 2008 13:03
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

I do not agree with you at all. The concept of MoreLikeThis is based on  
the fundamental idea of TF-IDF weighting, and not term frequency alone.

Please take a look at:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
As you can see, it is possible to use cut-off thresholds to  
significantly reduce the number of unimportant terms, and generate  
highly suitable queries based on the tf-idf frequency of the term, since  
as you point out, high frequency terms alone tend to be useless for  
querying, but taking the document frequency into account drastically  
increases the importance of the term!


In solr, use parameters to manipulate your desired results:
http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c
For instance:
mlt.mintf - Minimum Term Frequency - the frequency below which terms  
will be ignored in the source doc.
mlt.mindf - Minimum Document Frequency - the frequency at which words  
will be ignored which do not occur in at least this many docs.

You can also set thresholds for term length etc.

Hope this gives you a better idea of things.
- Aleks

On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie [EMAIL PROTECTED]
wrote:


Dear Patrick, I had the same problem with the MoreLikeThis function.

After briefly reading and analyzing the source code of the moreLikeThis
function in solr, I concluded:

MoreLikeThis uses term vectors to rank all the terms from a document
by their frequency. According to that ranking, it will start to generate
queries, artificially, and search for documents.

So, moreLikeThis will retrieve related documents by artificially
generating queries based on the most frequent terms.

There's a big problem with the most frequent terms from documents. The most
frequent words are usually meaningless, so-called function words,
or, as people from Information Retrieval like to call them, stopwords.
However, ignoring technical problems of the implementation of the
moreLikeThis function, this approach is very dangerous, since queries
are generated artificially based on a given document.
Writing queries for retrieving a document is a human task, and it
assumes some knowledge (the user knows what document he wants).

I advise using other approaches, depending on your expectations. For
example, you can extract similar documents just by searching for
documents with a similar title (more like this doesn't work in this case).

I hope it helps,
Best Regards,
Vitalie Scurtu
--- On Wed, 11/26/08, Plaatje, Patrick [EMAIL PROTECTED]
wrote:
From: Plaatje, Patrick [EMAIL PROTECTED]
Subject: RE:  Keyword extraction
To: solr-user@lucene.apache.org
Date: Wednesday, November 26, 2008, 10:52 AM

Hi All,
as an addition to my previous post, no interestingTerms are returned
when i execute the folowing url:
http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true
I get a moreLikeThis list though, any thoughts?
Best,
Patrick








--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no





--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Keyword extraction

2008-11-26 Thread Aleksander M. Stensby
I'm sure that for certain problems and cases you will need to do quite a  
bit of tweaking to make it work (to suit your needs), but I responded to  
your statement because you made it sound like the MoreLikeThis component  
does not work at all for its purpose, while it actually does work as  
intended and can be of great aid in constructing queries to retrieve  
same-topic documents etc.


- Aleksander

On Wed, 26 Nov 2008 14:10:57 +0100, Scurtu Vitalie [EMAIL PROTECTED]  
wrote:



Yes, I totally understand, and agree. 

MoreLikeThis uses TF-IDF to rank terms, then it generates queries based  
on the top-ranked terms. In any case, I wasn't able to make it work after  
many attempts.

Finally, I've used a different method for query generation, and it  
works better, or at least gives some results, while with moreLikeThis  
results were poor or there were no results at all.

I should mention that my index was composed of short documents;  
therefore the intersection between the top-ranked terms by TF-IDF was  
the empty set. MoreLikeThis works better when you have long documents.

Yes, I've changed the thresholds for min TF-IDF and max TF-IDF, and  
other parameters.

I've also used the mlt.maxqt parameter to increase the number of terms  
used in query generation, but it still didn't work well, since queries  
generated from the terms with the highest TF-IDF score were not  
representative of the document. I wasn't able to tune it. For a low  
value such as mlt.maxqt=3,4, results were poor, while for mlt.maxqt=5,6  
it gave too many and irrelevant results.




Thank you,
Best Wishes,
Vitalie Scurtu



--- On Wed, 11/26/08, Aleksander M. Stensby  
[EMAIL PROTECTED] wrote:

From: Aleksander M. Stensby aleksander.
[EMAIL PROTECTED]
Subject: Re:  Keyword extraction
To: solr-user@lucene.apache.org
Date: Wednesday, November 26, 2008, 1:03 PM

I do not agree with you at all. The concept of MoreLikeThis is based on  
the

fundamental idea of TF-IDF weighting, and not term frequency alone.
Please take a look at:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
As you can see, it is possible to use cut-off thresholds to significantly
reduce the number of unimportant terms, and generate highly suitable  
queries

based on the tf-idf frequency of the term, since as you point out, high
frequency terms alone tend to be useless for querying, but taking the  
document

frequency into account drastically increases the importance of the term!

In solr, use parameters to manipulate your desired results:
http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c
For instance:
mlt.mintf - Minimum Term Frequency - the frequency below which terms  
will be

ignored in the source doc.
mlt.mindf - Minimum Document Frequency - the frequency at which words  
will be

ignored which do not occur in at least this many docs.
You can also set thresholds for term length etc.

Hope this gives you a better idea of things.
- Aleks

On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie [EMAIL PROTECTED]
wrote:


Dear Patrick, I had the same problem with the MoreLikeThis function.

After briefly reading and analyzing the source code of the moreLikeThis
function in solr, I concluded:

MoreLikeThis uses term vectors to rank all the terms from a document
by their frequency. According to that ranking, it will start to generate
queries, artificially, and search for documents.

So, moreLikeThis will retrieve related documents by artificially
generating queries based on the most frequent terms.

There's a big problem with the most frequent terms from documents. The
most frequent words are usually meaningless, so-called function words,
or, as people from Information Retrieval like to call them, stopwords.
However, ignoring technical problems of the implementation of the
moreLikeThis function, this approach is very dangerous, since queries
are generated artificially based on a given document.
Writing queries for retrieving a document is a human task, and it
assumes some knowledge (the user knows what document he wants).

I advise using other approaches, depending on your expectations. For
example, you can extract similar documents just by searching for
documents with a similar title (more like this doesn't work in this case).

I hope it helps,
Best Regards,
Vitalie Scurtu
--- On Wed, 11/26/08, Plaatje, Patrick

[EMAIL PROTECTED] wrote:

From: Plaatje, Patrick [EMAIL PROTECTED]
Subject: RE:  Keyword extraction
To: solr-user@lucene.apache.org
Date: Wednesday, November 26, 2008, 10:52 AM

Hi All,
as an addition to my previous post, no interestingTerms are returned
when i execute the folowing url:


http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true
I get a moreLikeThis list though, any thoughts?
Best,
Patrick








--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no



Re: Keyword extraction

2008-11-26 Thread Aleksander M. Stensby
Ah, yes, That is important. In lucene, the MLT will see if the term vector  
is stored, and if it is not it will still be able to perform the querying,  
but in a much much much less efficient way.. Lucene will analyze the  
document (and the variable DEFAULT_MAX_NUM_TOKENS_PARSED will be used to  
limit the number of tokens that will be parsed). (don't want to go into  
details on this since I haven't really dug through the code:p) But when  
the field isn't stored either, it is rather difficult to re-analyze the  
document;)
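
For reference, storing term vectors is controlled in schema.xml. A hypothetical field definition (the field name and type here are assumptions, not taken from the thread) might look like:

```xml
<!-- Hypothetical schema.xml field: termVectors="true" lets MLT read the
     term vectors directly instead of re-analyzing the stored text -->
<field name="text" type="text" indexed="true" stored="true" termVectors="true"/>
```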


On a general note, if you want to really understand how the MLT works,  
take a look at the wiki or read this thorough blog post:  
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/


Regards,
 Aleksander

On Wed, 26 Nov 2008 14:41:52 +0100, Plaatje, Patrick  
[EMAIL PROTECTED] wrote:



Hi Aleksander,

This was a typo on my end, the original query included a semicolon  
instead of an equal sign. But I think it has to do with my field not  
being stored and not being identified as termVectors=true. I'm  
recreating the index now, and see if this fixes the problem.


Best,

patrick

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: woensdag 26 november 2008 14:37
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

Hi there!
Well, first of all i think you have an error in your query, if I'm not  
mistaken.

You say http://localhost:8080/solr/select/?q=id=18477975...
but since you are referring to the field called id, you must say:
http://localhost:8080/solr/select/?q=id:18477975...
(use colon instead of the equals sign).
I think that will do the trick.
If not, try adding the debugQuery=on at the end of your request url, to  
see debug output on how the query is parsed and if/how any documents are  
matched against your query.

Hope this helps.

Cheers,
  Aleksander



On Wed, 26 Nov 2008 13:08:30 +0100, Plaatje, Patrick  
[EMAIL PROTECTED] wrote:



Hi Aleksander,

Thanx for clearing this up. I am confident that this is a way to
explore for me as I'm just starting to grasp the matter. Do you know
why I'm not getting any results with the query posted earlier then? It
gives me the folowing only:

<lst name="moreLikeThis">
  <result name="18477975" numFound="0" start="0"/>
</lst>

Instead of delivering details of the interestingTerms.

Thanks in advance

Patrick


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: woensdag 26 november 2008 13:03
To: solr-user@lucene.apache.org
Subject: Re: Keyword extraction

I do not agree with you at all. The concept of MoreLikeThis is based
on the fundamental idea of TF-IDF weighting, and not term frequency  
alone.

Please take a look at:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
As you can see, it is possible to use cut-off
thresholds to significantly reduce the number of unimportant terms,
and generate highly suitable queries based on the tf-idf frequency of
the term, since as you point out, high frequency terms alone tend to
be useless for querying, but taking the document frequency into
account drastically increases the importance of the term!

In solr, use parameters to manipulate your desired results:
http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c
For instance:
mlt.mintf - Minimum Term Frequency - the frequency below which terms
will be ignored in the source doc.
mlt.mindf - Minimum Document Frequency - the frequency at which words
will be ignored which do not occur in at least this many docs.
You can also set thresholds for term length etc.

Hope this gives you a better idea of things.
- Aleks

On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie [EMAIL PROTECTED]
wrote:


Dear Patrick, I had the same problem with the MoreLikeThis function.

After briefly reading and analyzing the source code of the moreLikeThis
function in solr, I concluded:

MoreLikeThis uses term vectors to rank all the terms from a document
by their frequency. According to that ranking, it will start to generate
queries, artificially, and search for documents.

So, moreLikeThis will retrieve related documents by artificially
generating queries based on the most frequent terms.

There's a big problem with the most frequent terms from documents.
The most frequent words are usually meaningless, so-called function
words, or, as people from Information Retrieval like to call them,  
stopwords.
However, ignoring technical problems of the implementation of the
moreLikeThis function, this approach is very dangerous, since queries
are generated artificially based on a given document.
Writing queries for retrieving a document is a human task, and it
assumes some knowledge (the user knows what document he wants).

I advise using other approaches, depending on your expectations. For
example, you can extract similar documents just by searching for
documents with a similar title (more like this doesn't work in this  
case).


I hope

Re: Keyword extraction

2008-11-26 Thread Aleksander M. Stensby
I do not agree with you at all. The concept of MoreLikeThis is based on  
the fundamental idea of TF-IDF weighting, and not term frequency alone.
Please take a look at:  
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/similar/MoreLikeThis.html
As you can see, it is possible to use cut-off thresholds to significantly  
reduce the number of unimportant terms, and generate highly suitable  
queries based on the tf-idf frequency of the term, since as you point out,  
high frequency terms alone tend to be useless for querying, but taking  
the document frequency into account drastically increases the importance  
of the term!


In solr, use parameters to manipulate your desired results:  
http://wiki.apache.org/solr/MoreLikeThis#head-6460069f297626f2a982f1e22ec5d1519c456b2c

For instance:
mlt.mintf - Minimum Term Frequency - the frequency below which terms will  
be ignored in the source doc.
mlt.mindf - Minimum Document Frequency - the frequency at which words will  
be ignored which do not occur in at least this many docs.

You can also set thresholds for term length etc.
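
As a sketch of how these parameters combine in a plain MoreLikeThis request URL (the host, port, field name, and threshold values below are illustrative assumptions; only the parameter names come from the wiki page above):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MltRequestExample {
    /** Build a MoreLikeThis request URL. The base URL and the chosen
     *  threshold values are illustrative, not recommendations. */
    static String buildMltUrl(String base, String uniqueKeyQuery) {
        String q = URLEncoder.encode(uniqueKeyQuery, StandardCharsets.UTF_8);
        return base + "/select/?q=" + q
                + "&mlt=true"                  // enable the MoreLikeThis component
                + "&mlt.fl=text"               // field(s) to mine for interesting terms
                + "&mlt.mintf=2"               // ignore terms occurring < 2 times in the source doc
                + "&mlt.mindf=5"               // ignore terms occurring in < 5 documents
                + "&mlt.interestingTerms=list";
    }

    public static void main(String[] args) {
        System.out.println(buildMltUrl("http://localhost:8080/solr", "id:18477975"));
    }
}
```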

Hope this gives you a better idea of things.
- Aleks

On Wed, 26 Nov 2008 12:38:38 +0100, Scurtu Vitalie [EMAIL PROTECTED]  
wrote:



Dear Patrick, I had the same problem with the MoreLikeThis function.

After briefly reading and analyzing the source code of the moreLikeThis  
function in solr, I concluded:

MoreLikeThis uses term vectors to rank all the terms from a document
by their frequency. According to that ranking, it will start to generate
queries, artificially, and search for documents.

So, moreLikeThis will retrieve related documents by artificially  
generating queries based on the most frequent terms.

There's a big problem with the most frequent terms from documents. The most  
frequent words are usually meaningless, so-called function words, or,  
as people from Information Retrieval like to call them, stopwords. However,  
ignoring technical problems of the implementation of the moreLikeThis function,  
this approach is very dangerous, since queries are generated  
artificially based on a given document.
Writing queries for retrieving a document is a human task, and it  
assumes some knowledge (the user knows what document he wants).

I advise using other approaches, depending on your expectations. For  
example, you can extract similar documents just by searching for  
documents with a similar title (more like this doesn't work in this case).

I hope it helps,
Best Regards,
Vitalie Scurtu
--- On Wed, 11/26/08, Plaatje, Patrick [EMAIL PROTECTED]  
wrote:

From: Plaatje, Patrick [EMAIL PROTECTED]
Subject: RE:  Keyword extraction
To: solr-user@lucene.apache.org
Date: Wednesday, November 26, 2008, 10:52 AM

Hi All,
as an addition to my previous post, no interestingTerms are returned
when i execute the folowing url:
http://localhost:8080/solr/select/?q=id=18477975&mlt.fl=text&mlt.interestingTerms=list&mlt=true&mlt.match.include=true
I get a moreLikeThis list though, any thoughts?
Best,
Patrick








--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Can a lucene document be used in solr?

2008-11-26 Thread Aleksander M. Stensby

Hello there,
 do you mean a lucene Document, or whether it is possible to use an  
existing lucene index with solr?
In the latter case, the answer is yes, since solr is built on top of  
lucene. But it requires you to configure your schema.xml to correlate to  
the index-structure of your existing lucene index. On the question of  
document, Solr will take what is called a SolrInputDocument as input if  
you are using solrj, or xml if you are using http. Don't know if that  
answered your question or not..
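
As an illustration of that correlation (the field names and types below are purely hypothetical), if an existing Lucene index was built with two fields, schema.xml would need matching declarations:

```xml
<!-- Hypothetical mapping: one <field> per field in the existing Lucene
     index, with indexed/stored flags matching how it was originally built -->
<field name="id" type="string" indexed="true" stored="true"/>
<field name="content" type="text" indexed="true" stored="true"/>
```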

 Regards,
 Aleksander


On Thu, 27 Nov 2008 05:55:06 +0100, Sajith Vimukthi [EMAIL PROTECTED]  
wrote:



Hi all,

Can someone of you all tell me whether  I can use a lucene document in  
solr?



Regards,


Sajith Vimukthi Weerakoon

Associate Software Engineer | ZONE24X7

| Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 |

http://www.zone24x7.com






--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: facet.sort and distributed search

2008-11-26 Thread Aleksander M. Stensby
This is a known issue but take a look at the following jira issue and the  
patch supplied there:

https://issues.apache.org/jira/browse/SOLR-764

Haven't tried it myself, but i believe it should do the trick for you.
Hope that helps.

Cheers,
 Aleksander

On Wed, 26 Nov 2008 22:53:21 +0100, Grégoire Neuville  
[EMAIL PROTECTED] wrote:



Hi,

I'm working on a web application, one functionality of which consists
of presenting to the user a list of terms to enter in a form field,
sorted alphabetically. As long as a single index was concerned, I
used solr facets to produce the list and it worked fine. But I must
now deal with several indices, and thus use the distributed search
capability of solr, which forbids the use of facet.sort=false.

I would like to know if someone plans to, or is even working on, the
implementation of natural facet sorting in the case of a distributed
search.

Thanks a lot,




--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Unique id

2008-11-21 Thread Aleksander M. Stensby
Hello again. I'm getting a bit confused by your questions, and I believe  
it would be easier for us to help you if you could post the field  
definitions from your schema.xml and the structure of your two database  
views.

ie.
table 1: (id (int), subject (string) -.--)
table 2: (category (string), other fields ..)


So please post this and we can try to help you.

- Aleks


On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao  
[EMAIL PROTECTED] wrote:



Thanks Erik.
If I convert that to a string then id field defined in schema.xml would
fail as I have that as integer. If I change that to string then first
view would fail as it is Integer there. What to do in such scenarios? Do
I need to define multiple schema.xml or multiple unique key definitions
in same schema. How does this work? Pls explain.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 20, 2008 6:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

I'd suggest aggregating those three columns into a string that can
serve as the Solr uniqueKey field value.
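
A minimal sketch of that aggregation (the column values and the "|" separator are assumptions): concatenate the columns into one string before sending the document to Solr:

```java
public class CompositeKeyExample {
    /** Join several column values into a single uniqueKey string.
     *  The "|" separator is an arbitrary choice; pick one that cannot
     *  occur inside the column values themselves. */
    static String compositeKey(String... columns) {
        return String.join("|", columns);
    }

    public static void main(String[] args) {
        // e.g. person_id, deptId, isManager from the Team view
        System.out.println(compositeKey("58", "7", "1")); // prints 58|7|1
    }
}
```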

Erik


On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote:


Basically, I am working on two views. First one has an ID column. The
second view has no unique ID column. What to do in such situations?
There are 3 other columns where I can make a composite key out of
those.
I have to index these two views now.


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 5:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Technically, no, a uniqueKey field is NOT required.  I've yet to run
into a situation where it made sense not to use one though.

As for indexing database tables - if one of your tables doesn't have a
primary key, does it have an aggregate unique key of some sort?  Do
you plan on updating the rows in that table and reindexing them?
Seems like some kind of unique key would make sense for updating
documents.

But yeah, a more detailed description of your table structure and
searching needs would be helpful.

Erik


On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote:


Yes it is. You need a unique id because the add method works as an
add-or-update method. When adding a document whose ID is already
found in the index, the old document will be deleted and the new
will be added. Are you indexing two tables into the same index? Or
does one entry in the index consist of data from both tables? How
are these linked together without an ID?

- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao

[EMAIL PROTECTED]

wrote:



Hi,

Is the uniqueKey in schema.xml really required?


Reason is, I am indexing two tables and I have id as unique key in
schema.xml but id field is not there in one of the tables and
indexing
fails. Do I really require this unique field for Solr to index it
better
or can I do away with this?


Thanks,

Rahgu





--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no




Re: solrQueryParser does not take effect - nightly build

2008-11-21 Thread Aleksander M. Stensby
That sounds a bit strange. Did you do the changes in the schema.xml before  
starting the server? Because if you change it while it is running, it will  
by default delete and replace the file (discarding any changes you make).  
In other words, make sure the server is not running, make your changes and  
then start up the server. Apart from that, I can't really see any reason  
for this to not work...


- Aleks


On Thu, 20 Nov 2008 22:03:30 +0100, ashokc [EMAIL PROTECTED] wrote:



Hi,

I have set

<solrQueryParser defaultOperator="AND"/>

but it is not taking effect. It continues to take it as OR. I am working
with the latest nightly build 11/20/2008

For a querry like

term1 term2

Debug shows

<str name="parsedquery">content:term1 content:term2</str>

Bug?

Thanks

- ashok





Re: Unique id

2008-11-21 Thread Aleksander M. Stensby
Ok, this brings me to the question: how are the two views connected to  
each other (since you are indexing partly view 1 and partly view 2 into a  
single index structure)?


If they are not at all connected I believe you have made a fundamental  
mistake / misunderstand the use of your index...
I assume that a Task can be assigned to a person, and your Team view  
displays that person, right?


Maybe you are doing something like this:
View 1
1, somename, sometimestamp, someothertimestamp
2, someothername, somethirdtimestamp, timetamp4
...

View 2
1, 58, 0
2, 58, 1
3, 52, 0
...

I'm really confused about your database structure...
To me, It would be logical to add a team_id field to the team table, and  
add a third table to link tasks to a team (or to individual persons).
Once you have that information (because I do assume there MUST be some  
link there) you would do:

insert into your index:
 (id from the task), (name of the task), (id of the person assigned to  
this task), (id of the department that this person works in).


I guess that you _might_ be thinking a bit wrong and trying to do  
something like this:
Treat each view as independent views, and inserting values from each table  
as separate documents in the index

so you would do:
insert into your index:
 (id from the task), (name of the task), (no value), (no value)  which  
will be ok to do
 (no value), (no value), (id of the person), (id of the department) ---  
which makes no sense to me...


So, can you clarify the relationship between the two views, and how you  
are thinking of inserting entries into your index?


- Aleks



On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao  
[EMAIL PROTECTED] wrote:



View structure is:

1.
Task(id* (int), name (string), start (timestamp), end (timestamp))

2.
Team(person_id (int), deptId (int), isManager (int))

* is primary key

In schema.xml I have

<field name="id" type="integer" indexed="true" stored="true"
required="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="personId" type="integer" indexed="true" stored="true"/>
<field name="deptId" type="integer" indexed="true" stored="true"/>

<uniqueKey>id</uniqueKey>


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 2:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Hello again. I'm getting a bit confused by your questions, and I believe

it would be easier for us to help you if you could post the field
definitions from your schema.xml and the structure of your two database

views.
ie.
table 1: (id (int), subject (string) -.--)
table 2: (category (string), other fields ..)


So please post this and we can try to help you.

- Aleks


On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao
[EMAIL PROTECTED] wrote:


Thanks Erik.
If I convert that to a string then id field defined in schema.xml

would

fail as I have that as integer. If I change that to string then first
view would fail as it is Integer there. What to do in such scenarios?

Do

I need to define multiple schema.xml or multiple unique key

definitions

in same schema. How does this work? Pls explain.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 20, 2008 6:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

I'd suggest aggregating those three columns into a string that can
serve as the Solr uniqueKey field value.

Erik


On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote:


Basically, I am working on two views. First one has an ID column. The
second view has no unique ID column. What to do in such situations?
There are 3 other columns where I can make a composite key out of
those.
I have to index these two views now.


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 5:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Technically, no, a uniqueKey field is NOT required.  I've yet to run
into a situation where it made sense not to use one though.

As for indexing database tables - if one of your tables doesn't have

a

primary key, does it have an aggregate unique key of some sort?  Do
you plan on updating the rows in that table and reindexing them?
Seems like some kind of unique key would make sense for updating
documents.

But yeah, a more detailed description of your table structure and
searching needs would be helpful.

Erik


On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote:


Yes it is. You need a unique id because the add method works as an
add-or-update method. When adding a document whose ID is already
found in the index, the old document will be deleted and the new
will be added. Are you indexing two tables into the same index? Or
does one entry in the index consist of data from both tables? How
are these linked together without an ID?

- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao

[EMAIL PROTECTED]

wrote:



Hi,

Is the uniqueKey

Re: Unique id

2008-11-21 Thread Aleksander M. Stensby
And in case that wasn't clear, the reason for it failing then would  
obviously be because you define the id field with required=true, and you  
try inserting a document where this field is missing...


- Aleks

On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M. Stensby  
[EMAIL PROTECTED] wrote:


Ok, this brings me to the question: how are the two views connected to  
each other (since you are indexing partly view 1 and partly view 2 into  
a single index structure)?


If they are not at all connected I believe you have made a fundamental  
mistake / misunderstand the use of your index...
I assume that a Task can be assigned to a person, and your Team view  
displays that person, right?


Maybe you are doing something like this:
View 1
1, somename, sometimestamp, someothertimestamp
2, someothername, somethirdtimestamp, timetamp4
...

View 2
1, 58, 0
2, 58, 1
3, 52, 0
...

I'm really confused about your database structure...
To me, It would be logical to add a team_id field to the team table, and  
add a third table to link tasks to a team (or to individual persons).
Once you have that information (because I do assume there MUST be some  
link there) you would do:

insert into your index:
  (id from the task), (name of the task), (id of the person assigned to  
this task), (id of the department that this person works in).


I guess that you _might_ be thinking a bit wrong and trying to do  
something like this:
Treat each view as independent views, and inserting values from each  
table as separate documents in the index

so you would do:
insert into your index:
  (id from the task), (name of the task), (no value), (no value)   
which will be ok to do
  (no value), (no value), (id of the person), (id of the department)  
--- which makes no sense to me...


So, can you clarify the relationship between the two views, and how you  
are thinking of inserting entries into your index?


- Aleks



On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao  
[EMAIL PROTECTED] wrote:



View structure is:

1.
Task(id* (int), name (string), start (timestamp), end (timestamp))

2.
Team(person_id (int), deptId (int), isManager (int))

* is primary key

In schema.xml I have

<field name="id" type="integer" indexed="true" stored="true"
required="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="personId" type="integer" indexed="true" stored="true"/>
<field name="deptId" type="integer" indexed="true" stored="true"/>

<uniqueKey>id</uniqueKey>


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 2:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Hello again. I'm getting a bit confused by your questions, and I believe

it would be easier for us to help you if you could post the field
definitions from your schema.xml and the structure of your two database

views.
ie.
table 1: (id (int), subject (string), ...)
table 2: (category (string), other fields ...)


So please post this and we can try to help you.

- Aleks


On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao
[EMAIL PROTECTED] wrote:


Thanks Erik.
If I convert that to a string then the id field defined in schema.xml
would fail, as I have that as integer. If I change that to string then
the first view would fail, as it is Integer there. What to do in such
scenarios? Do I need to define multiple schema.xml files or multiple
unique key definitions in the same schema? How does this work? Please
explain.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 20, 2008 6:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

I'd suggest aggregating those three columns into a string that can
serve as the Solr uniqueKey field value.

Erik


On Nov 20, 2008, at 1:10 AM, Raghunandan Rao wrote:


Basically, I am working on two views. First one has an ID column. The
second view has no unique ID column. What to do in such situations?
There are 3 other columns where I can make a composite key out of
those.
I have to index these two views now.


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 5:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Technically, no, a uniqueKey field is NOT required.  I've yet to run
into a situation where it made sense not to use one though.

As for indexing database tables - if one of your tables doesn't have

a

primary key, does it have an aggregate unique key of some sort?  Do
you plan on updating the rows in that table and reindexing them?
Seems like some kind of unique key would make sense for updating
documents.

But yeah, a more detailed description of your table structure and
searching needs would be helpful.

Erik


On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote:


Yes it is. You need a unique id because the add method works as an
add-or-update method. When adding a document whose ID is already
found in the index, the old document will be deleted and the new one
will be added.

Re: Unique id

2008-11-21 Thread Aleksander M. Stensby
Well, in that case, what do you want to search for? If I were you, I would
make my index consist of tasks (and I assume that is what you are trying
to do).


So why don't you just use your schema.xml as you have right now, and do  
the following:


Pick a person (let's say he has person_id=42 and deptId=3), get his queue
of tasks, then for each task in the queue do:

insert into index:
(id from the task), (name of the task), (id of the person), (id of the
department)

an example:
3, this is a very important task, 42, 3
4, this one is also important, 42, 3
5, this one is low priority, 42, 3

And then for the next person you do the same, (person_id=58 and deptId=5)
insert:
6, this is about solr, 58, 5
7, this is about lucene, 58, 5

etc.

Now you can search for all tasks in department 5 by doing deptId:5.
If you want to search for all the tasks assigned to a specific person you
just enter the query personId:42.
And you could also search for all tasks containing certain keywords by
doing the query name:solr OR name:lucene.
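
To make those queries concrete, here is a minimal sketch in plain Java (the field names personId, deptId and name come from the schema posted in this thread; the helper class itself is made up for illustration). The resulting strings are what you would pass to a SolrQuery:

```java
// Sketch: helpers that build the query strings discussed above.
// Field names (personId, deptId, name) follow the schema posted in
// this thread; pass the resulting string to new SolrQuery(...).
public class TaskQueries {

    // All tasks in one department, e.g. "deptId:5"
    public static String byDepartment(int deptId) {
        return "deptId:" + deptId;
    }

    // All tasks assigned to one person, e.g. "personId:42"
    public static String byPerson(int personId) {
        return "personId:" + personId;
    }

    // Keyword search on the task name, e.g. "name:solr OR name:lucene"
    public static String byKeywords(String... keywords) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < keywords.length; i++) {
            if (i > 0) sb.append(" OR ");
            sb.append("name:").append(keywords[i]);
        }
        return sb.toString();
    }
}
```

Combining them (e.g. with " AND ") gives queries like personId:42 AND deptId:3.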


Do you understand now, or is it still unclear?

- Aleks



On Fri, 21 Nov 2008 10:56:38 +0100, Raghunandan Rao  
[EMAIL PROTECTED] wrote:



Ok. There is a common column in the two views called queueId. I query the
second view first and get all the queueIds for a person. And having the
queueIds, I get all the ids from the first view.

Sorry for missing that column earlier. I think it should make sense now.


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 3:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

And in case that wasn't clear, the reason for it failing would
obviously be that you define the id field with required="true", and you
try inserting a document where this field is missing...

- Aleks

On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M. Stensby
[EMAIL PROTECTED] wrote:


Ok, this brings me to the question: how are the two views connected to
each other (since you are indexing partly view 1 and partly view 2 into
a single index structure)?

If they are not connected at all, I believe you have made a fundamental
mistake / misunderstood the use of your index...
I assume that a Task can be assigned to a person, and your Team view
displays that person, right?

Maybe you are doing something like this:
View 1
1, somename, sometimestamp, someothertimestamp
2, someothername, somethirdtimestamp, timetamp4
...

View 2
1, 58, 0
2, 58, 1
3, 52, 0
...

I'm really confused about your database structure...
To me, it would be logical to add a team_id field to the team table, and
add a third table to link tasks to a team (or to individual persons).
Once you have that information (because I do assume there MUST be some
link there) you would do:
insert into your index:
  (id from the task), (name of the task), (id of the person assigned to
this task), (id of the department that this person works in).

I guess that you _might_ be thinking a bit wrong and trying to do
something like this:
Treat each view as an independent view, and insert values from each
table as separate documents in the index
so you would do:
insert into your index:
  (id from the task), (name of the task), (no value), (no value)
--- which will be ok to do
  (no value), (no value), (id of the person), (id of the department)
--- which makes no sense to me...

So, can you clarify the relationship between the two views, and how you
are thinking of inserting entries into your index?

- Aleks



On Fri, 21 Nov 2008 10:33:28 +0100, Raghunandan Rao
[EMAIL PROTECTED] wrote:


View structure is:

1.
Task(id* (int), name (string), start (timestamp), end (timestamp))

2.
Team(person_id (int), deptId (int), isManager (int))

* is primary key

In schema.xml I have

<field name="id" type="integer" indexed="true" stored="true" required="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="personId" type="integer" indexed="true" stored="true"/>
<field name="deptId" type="integer" indexed="true" stored="true"/>

<uniqueKey>id</uniqueKey>


-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 2:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Hello again. I'm getting a bit confused by your questions, and I believe
it would be easier for us to help you if you could post the field
definitions from your schema.xml and the structure of your two database
views.
i.e.
table 1: (id (int), subject (string), ...)
table 2: (category (string), other fields ...)


So please post this and we can try to help you.

- Aleks


On Fri, 21 Nov 2008 07:49:31 +0100, Raghunandan Rao
[EMAIL PROTECTED] wrote:


Thanks Erik.
If I convert that to a string then the id field defined in schema.xml
would fail, as I have that as integer. If I change that to string then
the first view would fail, as it is Integer there. What to do in such
scenarios? Do I need to define multiple schema.xml files or multiple
unique key

Re: Unique id

2008-11-21 Thread Aleksander M. Stensby
I still don't understand why you want two different indexes if you want to  
return the linked information each time anyways...
I would say the easiest way is just to index all data (all columns from  
your views) into the index like this:


taskid - taskname - start - end - personid - deptid - ismanager

then you can just search like I already explained earlier. This way, you
have already joined by queueId when you insert it into the index, and thus
you get both results from one single search. (If you also want to have the
ability to search on the queueId, just add a column for that.)
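
As a rough sketch of that join-at-index-time idea (plain Java, not SolrJ itself; the field names follow this thread's schema, and queueId/isManager are the extra columns discussed above), each task becomes one flat document keyed by the task id:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: denormalize one Task row plus its joined Team columns into
// a single flat document, keyed by the task id (the uniqueKey).
// With SolrJ you would copy these entries into a SolrInputDocument
// via addField(...) and send it with server.add(...).
public class DenormalizedTask {

    public static Map<String, Object> toDocument(int taskId, String name,
            int personId, int deptId, int queueId, boolean isManager) {
        Map<String, Object> doc = new LinkedHashMap<String, Object>();
        doc.put("id", taskId);           // from the Task view (uniqueKey)
        doc.put("name", name);           // from the Task view
        doc.put("personId", personId);   // joined in from the Team view
        doc.put("deptId", deptId);       // joined in from the Team view
        doc.put("queueId", queueId);     // the extra column suggested above
        doc.put("isManager", isManager); // from the Team view
        return doc;
    }
}
```

Because the join is done once at indexing time, a single query can filter on any mix of these fields.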


In general, your questions don't really have anything to do with Solr,
but with architecture, db design and what you want to search on.


 - A.



1.
Task(id* (int), name (string), start (timestamp), end (timestamp))

2.
Team(person_id (int), deptId (int), isManager (int))

* is primary key

In schema.xml I have

<field name="id" type="integer" indexed="true" stored="true" required="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="personId" type="integer" indexed="true" stored="true"/>
<field name="deptId" type="integer" indexed="true" stored="true"/>



On Fri, 21 Nov 2008 11:59:56 +0100, Raghunandan Rao  
[EMAIL PROTECTED] wrote:



Can you also let me know how I join two search indices in one query?

That means, in this case I have two different search indices and I need to
join by queueId and get all the tasks in one SolrQuery. I am creating
queries in Solrj.


-Original Message-
From: Raghunandan Rao [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 3:45 PM
To: solr-user@lucene.apache.org
Subject: RE: Unique id

Ok. I got your point. So I don't need the ID field in the second view.
I will hence remove required="true" in schema.xml. I thought a unique ID
made indexing easier, or was used to maintain the document.

Thanks a lot.

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 3:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Well, in that case, what do you want to search for? If I were you, I would
make my index consist of tasks (and I assume that is what you are trying
to do).

So why don't you just use your schema.xml as you have right now, and do
the following:

Pick a person (let's say he has person_id=42 and deptId=3), get his queue
of tasks, then for each task in the queue do:
insert into index:
(id from the task), (name of the task), (id of the person), (id of the
department)
an example:
3, this is a very important task, 42, 3
4, this one is also important, 42, 3
5, this one is low priority, 42, 3

And then for the next person you do the same, (person_id=58 and
deptId=5)
insert:
6, this is about solr, 58, 5
7, this is about lucene, 58, 5

etc.

Now you can search for all tasks in department 5 by doing deptId:5.
If you want to search for all the tasks assigned to a specific person you
just enter the query personId:42.
And you could also search for all tasks containing certain keywords by
doing the query name:solr OR name:lucene.

Do you understand now, or is it still unclear?

- Aleks



On Fri, 21 Nov 2008 10:56:38 +0100, Raghunandan Rao
[EMAIL PROTECTED] wrote:


Ok. There is a common column in the two views called queueId. I query the
second view first and get all the queueIds for a person. And having the
queueIds, I get all the ids from the first view.

Sorry for missing that column earlier. I think it should make sense now.



-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Friday, November 21, 2008 3:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

And in case that wasn't clear, the reason for it failing would
obviously be that you define the id field with required="true", and you
try inserting a document where this field is missing...

- Aleks

On Fri, 21 Nov 2008 10:46:10 +0100, Aleksander M. Stensby
[EMAIL PROTECTED] wrote:


Ok, this brings me to the question: how are the two views connected to
each other (since you are indexing partly view 1 and partly view 2 into
a single index structure)?

If they are not connected at all, I believe you have made a fundamental
mistake / misunderstood the use of your index...
I assume that a Task can be assigned to a person, and your Team view
displays that person, right?

Maybe you are doing something like this:
View 1
1, somename, sometimestamp, someothertimestamp
2, someothername, somethirdtimestamp, timetamp4
...

View 2
1, 58, 0
2, 58, 1
3, 52, 0
...

I'm really confused about your database structure...
To me, it would be logical to add a team_id field to the team table, and
add a third table to link tasks to a team (or to individual persons).
Once you have that information (because I do assume there MUST be some
link there) you would do:
insert into your index:
  (id from the task), (name of the task), (id of the person assigned to
this task), (id of the department that this person works in).

I guess that you _might_

Re: Unique id

2008-11-19 Thread Aleksander M. Stensby
Ok, but how do you map your table structure to the index? As far as I can
understand, the two tables have different structures, so why/how do you map
two different data structures onto a single index? Are the two tables
connected in some way? If so, you could make your index structure reflect
the union of both tables and just make one insertion into the index per
entry of the two tables.


Maybe you could post the table structure so that I can get a better  
understanding of your use-case...


- Aleks

On Wed, 19 Nov 2008 11:25:56 +0100, Raghunandan Rao  
[EMAIL PROTECTED] wrote:



Ok, got it.
I am indexing two tables differently. I am using Solrj to index with the
@Field annotation. I make two queries initially and fetch the data from the
two tables and index them separately. But what if the ids in the two tables
are the same? That means documents with the same id will be deleted when
doing an update.

How does this work? Please explain.

Thanks.

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 3:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Yes it is. You need a unique id because the add method works as an
add-or-update method. When adding a document whose ID is already found in
the index, the old document will be deleted and the new one will be added.
Are you indexing two tables into the same index? Or does one entry in the
index consist of data from both tables? How are these linked together
without an ID?

- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao
[EMAIL PROTECTED] wrote:


Hi,

Is the uniqueKey in schema.xml really required?


The reason is, I am indexing two tables and I have id as the unique key in
schema.xml, but the id field is not there in one of the tables, so indexing
fails. Do I really require this unique field for Solr to index it better,
or can I do away with this?


Thanks,

Rahgu






Re: Unique id

2008-11-19 Thread Aleksander M. Stensby
Yes it is. You need a unique id because the add method works as an
add-or-update method. When adding a document whose ID is already found in the
index, the old document will be deleted and the new one will be added. Are you
indexing two tables into the same index? Or does one entry in the index
consist of data from both tables? How are these linked together without an
ID?
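
The add-or-update behavior can be illustrated with a toy model in plain Java (this is not SolrJ; the real replacement happens inside Solr, keyed on the uniqueKey field):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of Solr's add semantics on the uniqueKey:
// adding a document whose id already exists replaces the old one,
// so the index never holds two documents with the same id.
public class ToyIndex {
    private final Map<String, String> docs = new HashMap<String, String>();

    public void add(String id, String body) {
        docs.put(id, body); // same id -> old document is replaced
    }

    public String get(String id) {
        return docs.get(id);
    }

    public int size() {
        return docs.size();
    }
}
```

This is why two tables whose ids collide must not share one index field unchanged; prefixing or concatenating a table name into the key avoids the collision.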


- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao  
[EMAIL PROTECTED] wrote:



Hi,

Is the uniqueKey in schema.xml really required?


The reason is, I am indexing two tables and I have id as the unique key in
schema.xml, but the id field is not there in one of the tables, so indexing
fails. Do I really require this unique field for Solr to index it better,
or can I do away with this?


Thanks,

Rahgu





--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Use SOLR like the MySQL LIKE

2008-11-18 Thread Aleksander M. Stensby

Hi there,

You should use LowerCaseTokenizerFactory as you point out yourself. As far  
as I know, the StandardTokenizer recognizes email addresses and internet  
hostnames as one token. In your case, I guess you want an email, say
[EMAIL PROTECTED], to be split into four tokens: "average", "joe", "apache",
"org", or something like that, which would indeed allow you to search for
"joe" or "average j*" and match. To do so, you could use the
WordDelimiterFilterFactory and split on intra-word delimiters (I think the
defaults here are non-alphanumeric chars).


Take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters  
for more info on tokenizers and filters.
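
As a hedged sketch of such an analyzer chain in schema.xml (the field-type name and the exact filter attributes here are illustrative choices, not from the original mail):

```xml
<!-- Illustrative field type: split emails/names on delimiters, then lowercase -->
<fieldType name="email_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, an address like average.joe@apache.org would index roughly as the tokens average / joe / apache / org.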


cheers,
 Aleks

On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L [EMAIL PROTECTED] wrote:



Hello.

The data:
I have a dataset containing ~500.000 documents.
In each document there is an email, a name and an user ID.

The problem:
I would like to be able to search in it, but it should be like the MySQL
LIKE.

So when a user enters the search term: carsten, then the query looks  
like:

name:(carsten) OR name:(carsten*) OR email:(carsten) OR
email:(carsten*) OR userid:(carsten) OR userid:(carsten*)

Then it should match:
carsten l
carsten larsen
Carsten Larsen
Carsten
CARSTEN
etc.

And when the user enters the term: carsten l the query looks like:
name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR
email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)

Then it should match:
carsten l
carsten larsen
Carsten Larsen

Or written to the MySQL syntax: ... WHERE `name` LIKE 'carsten%'  OR
`email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'...

I know that I need to use the solr.LowerCaseTokenizerFactory on my name
and email fields, to ensure case-insensitive behavior.
The problem seems to be the wildcards and the whitespaces.




--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Use SOLR like the MySQL LIKE

2008-11-18 Thread Aleksander M. Stensby

Ah, okay!
Well, then I suggest you index the field in two different ways if you want
both possible ways of searching. One field where you treat the entire name as
one token (in lowercase), so that you can search for avera* and match on, for
instance, "average joe" etc. And then another field where you tokenize on
whitespace, for instance, if you want/need that possibility as well. Look at
the Solr copy fields and try it out; it works like a charm :)
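
A sketch of what that could look like in schema.xml (the field and type names here are made up for illustration; "name_exact" keeps the whole lowercased name as a single token for prefix matching, while "name_tokens" is tokenized on whitespace):

```xml
<!-- Illustrative: index the same source value two ways via copyField.
     "lowercase_single_token" would be a type built from
     solr.KeywordTokenizerFactory plus solr.LowerCaseFilterFactory. -->
<field name="name" type="string" indexed="false" stored="true"/>
<field name="name_exact" type="lowercase_single_token" indexed="true" stored="false"/>
<field name="name_tokens" type="text" indexed="true" stored="false"/>

<copyField source="name" dest="name_exact"/>
<copyField source="name" dest="name_tokens"/>
```

A query like name_exact:carsten\ l* then mimics the MySQL LIKE 'carsten l%' behavior, whitespace included.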


Cheers,
 Aleksander

On Tue, 18 Nov 2008 10:40:24 +0100, Carsten L [EMAIL PROTECTED] wrote:



Thanks for the quick reply!

It is supposed to work a little like the Google Suggest or field
autocompletion.

I know I mentioned email and userid, but the problem lies with the name
field, because of the whitespaces in combination with the wildcard.

I looked at the solr.WordDelimiterFilterFactory, but it does not mention
anything about whitespaces - or wildcards.

A quick brushup:
I would like to mimic the LIKE functionality from MySQL using the  
wildcards

in the end of the searchquery.
In MySQL whitespaces are treated as characters, not splitters.


Aleksander M. Stensby wrote:


Hi there,

You should use LowerCaseTokenizerFactory as you point out yourself. As  
far
as I know, the StandardTokenizer recognizes email addresses and  
internet

hostnames as one token. In your case, I guess you want an email, say
[EMAIL PROTECTED] to be split into four tokens: average joe  
apache

org, or something like that, which would indeed allow you to search for
joe or average j* and match. To do so, you could use the
WordDelimiterFilterFactory and split on intra-word delimiters (I think  
the

defaults here are non-alphanumeric chars).

Take a look at  
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

for more info on tokenizers and filters.

cheers,
  Aleks

On Tue, 18 Nov 2008 08:35:31 +0100, Carsten L [EMAIL PROTECTED]  
wrote:




Hello.

The data:
I have a dataset containing ~500.000 documents.
In each document there is an email, a name and an user ID.

The problem:
I would like to be able to search in it, but it should be like the  
MySQL

LIKE.

So when a user enters the search term: carsten, then the query looks
like:
name:(carsten) OR name:(carsten*) OR email:(carsten) OR
email:(carsten*) OR userid:(carsten) OR userid:(carsten*)

Then it should match:
carsten l
carsten larsen
Carsten Larsen
Carsten
CARSTEN
etc.

And when the user enters the term: carsten l the query looks like:
name:(carsten l) OR name:(carsten l*) OR email:(carsten l) OR
email:(carsten l*) OR userid:(carsten l) OR userid:(carsten l*)

Then it should match:
carsten l
carsten larsen
Carsten Larsen

Or written to the MySQL syntax: ... WHERE `name` LIKE 'carsten%'  OR
`email` LIKE 'carsten%' OR `userid` LIKE 'carsten%'...

I know that I need to use the solr.LowerCaseTokenizerFactory on my name
and email fields, to ensure case-insensitive behavior.
The problem seems to be the wildcards and the whitespaces.




--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no








--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Calculating peaks - solrj support for facet.date?

2008-11-13 Thread Aleksander M. Stensby

As Erik said, you can just set the parameters yourself:
 SolrQuery query = new SolrQuery(...);
 query.set(FacetParams.FACET_DATE, ...);
 etc.

You'll find all facet-related parameters in the FacetParams interface,  
located in the org.apache.solr.common.params package.
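
For example, a date-facet query could be sketched like this (the faceted field "timestamp" and the range values are made-up examples; the FacetParams constants are the ones from that package):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.params.FacetParams;

// Sketch: date faceting through the generic param setters, since
// SolrQuery has no dedicated facet.date setters. The faceted field
// ("timestamp") and the date ranges are illustrative values.
public class DateFacetQuery {

    public static SolrQuery build() {
        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        query.set(FacetParams.FACET_DATE, "timestamp");
        query.set(FacetParams.FACET_DATE_START, "NOW/DAY-30DAYS");
        query.set(FacetParams.FACET_DATE_END, "NOW/DAY");
        query.set(FacetParams.FACET_DATE_GAP, "+1DAY");
        return query;
    }
}
```

The constants resolve to the same keys you would use over HTTP (facet.date, facet.date.start, and so on).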


- Aleks

On Fri, 07 Nov 2008 14:26:56 +0100, Erik Hatcher  
[EMAIL PROTECTED] wrote:




On Nov 7, 2008, at 7:23 AM, [EMAIL PROTECTED] wrote:
Sorry, but I have one more question. Does the java client solrj support  
facet.date?


Yeah, but it doesn't have explicit setters for it.  A SolrQuery is also  
a ModifiableSolrParams - so you can call the add/set methods on it using  
the same keys used with HTTP requests.


Erik






--
Aleksander M. Stensby
Senior software developer
Integrasco A/S


Re: EmbeddedSolrServer and the MultiCore functionality

2008-09-24 Thread Aleksander M. Stensby

Okay, sounds fair.
Well, why I would have multiple shards was based on the presumption that
it would be more effective to be able to search in single shards when
needed (if each shard contains, let's say, 30 million entries) and then, when
the time comes, migrate one of the shards to a different node. But I guess the
gain in performance is not significant and that I should rather have just
one shard per node. Or?


Best regards and thanks for your answer,
 Aleksander

On Tue, 23 Sep 2008 16:57:08 +0200, Ryan McKinley [EMAIL PROTECTED]  
wrote:




If I have Solr up and running and do something like this:
   query.set("shards", "localhost:8080/solr/core0,localhost:8080/solr/core1");

I will get the results from both cores, obviously...

But is there a way to do this without using shards and accessing the  
cores through http?
I presume it would/should be possible to do the same thing directly  
against the cores, but my question is really if this has been  
implemented already / is it possible?




not implemented...

Check line 384 of SearchHandler.java
   SolrServer server = new CommonsHttpSolrServer(url, client);

it defaults to CommonsHttpSolrServer.

This could easily change to EmbeddedSolrServer, but I'm not sure it is a
very common use case...


why would you have multiple shards on the same machine?

ryan







--
Aleksander M. Stensby
Senior Software Developer
Integrasco A/S
+47 41 22 82 72
[EMAIL PROTECTED]


EmbeddedSolrServer and the MultiCore functionality

2008-09-23 Thread Aleksander M. Stensby
Hello everyone, I'm new to Solr (have been using Lucene for a few years  
now). We are looking into Solr and have heard many good things about the  
project:)


I have a few questions regarding the EmbeddedSolrServer in Solrj and the  
MultiCore features... I've tried to find answers to this in the archives  
but have not succeeded.
The thing is, I want to be able to use the Embedded server to access  
multiple cores on one machine, and I would like to at least have the  
possibility to access the lucene indexes without http. In particular I'm  
wondering if it is possible to do the shards (distributed search)  
approach using the embedded server, without using http requests.


lets say I register 2 cores to a container and init my embedded server  
like this:

CoreContainer container = new CoreContainer();
container.register("core1", core1, false);
container.register("core2", core2, false);
server = new EmbeddedSolrServer(container, "core1");

then queries performed on my server will return results from core1... and
if I do server = new EmbeddedSolrServer(container, "core2") the results will
come from core2.


If I have Solr up and running and do something like this:
query.set("shards", "localhost:8080/solr/core0,localhost:8080/solr/core1");

I will get the results from both cores, obviously...

But is there a way to do this without using shards and accessing the cores  
through http?
I presume it would/should be possible to do the same thing directly  
against the cores, but my question is really if this has been implemented  
already / is it possible?



Thanks in advance for any replies!

Best regards,
 Aleksander


--
Aleksander M. Stensby
Senior Software Developer
Integrasco A/S
+47 41 22 82 72
[EMAIL PROTECTED]