Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Ophir Adiv
[I posted this yesterday on the lucene-user mailing list and was advised
to post it here instead. Apologies for the cross-post.]

Hi,

I'm currently involved in a project of migrating from Lucene 2.9.1 to Solr
1.4.0.
During stress testing, I encountered this performance problem:
While actual search times in our shards (which are now running Solr) have
not changed, the total time it takes for a query has increased dramatically.
During this performance test, we of course do not modify the indexes.
Our application is sending Solr select queries concurrently to the 8 shards,
using CommonsHttpSolrServer.
I added some timing debug messages, and found that
CommonsHttpSolrServer.java, line 416 takes about 95% of the application's
total search time:
int statusCode = _httpClient.executeMethod(method);

Just to clarify: looking at the access logs of the Solr shards, the TTLB
(time to last byte) for a query might be around 5 ms on all shards, but
httpClient.executeMethod() for the same query can be much higher - say, 50 ms.
If queries take 12 ms on average under light load, they take around 22 ms
on average under heavy load.

Another route we tried was adding the "shards=shard1,shard2,…" parameter
to the query and letting Solr distribute the request itself, but this
fails with an NPE in QueryComponent.returnFields(), line 553:
if (returnScores && sdoc.score != null) {

where sdoc is null. I saw there is a null check on trunk, but since we're
currently using Solr 1.4.0's ready-made WAR file, I didn't see an easy way
around this.
Note: we're using a custom query component which extends QueryComponent, but
debugging this, I saw nothing wrong with the results at this point in the
code.

Our previous code used HTTP in a different manner:
For each request, we created a new
sun.net.www.protocol.http.HttpURLConnection, and called its getInputStream()
method.
Under the same load as the new application, the old application does not
encounter the delays mentioned above.

Our current code is initializing CommonsHttpSolrServer for each shard this
way:
MultiThreadedHttpConnectionManager httpConnectionManager = new
MultiThreadedHttpConnectionManager();
httpConnectionManager.getParams().setTcpNoDelay(true);
httpConnectionManager.getParams().setMaxTotalConnections(1024);
httpConnectionManager.getParams().setStaleCheckingEnabled(false);
HttpClient httpClient = new HttpClient();
HttpClientParams params = new HttpClientParams();
params.setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
params.setAuthenticationPreemptive(false);
params.setContentCharset(StringConstants.UTF8);
httpClient.setParams(params);
httpClient.setHttpConnectionManager(httpConnectionManager);

and passing the new HttpClient to the Solr Server:
solrServer = new CommonsHttpSolrServer(coreUrl, httpClient);

We tried two variants - one with a single
MultiThreadedHttpConnectionManager and HttpClient shared by all the
SolrServers, and one with a separate MultiThreadedHttpConnectionManager
and HttpClient per SolrServer.
Both yielded similar performance results.
We also tried giving setMaxTotalConnections() a much higher value
(1,000,000) - it had no effect.
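One knob worth checking, because it is separate from the total limit tried above: in Commons HttpClient 3.x, MultiThreadedHttpConnectionManager also caps connections *per host* (the default is 2), and setMaxTotalConnections() does not raise that cap. A sketch of the extra call, alongside the setup shown above (the value 1024 is illustrative):

```
MultiThreadedHttpConnectionManager httpConnectionManager =
    new MultiThreadedHttpConnectionManager();
httpConnectionManager.getParams().setMaxTotalConnections(1024);
// The per-host limit is separate from the total; without this line,
// at most 2 concurrent connections are opened to each shard host.
httpConnectionManager.getParams().setDefaultMaxConnectionsPerHost(1024);
```

Under heavy concurrent load, a per-host cap of 2 would queue requests inside HttpClient even though the shards themselves respond quickly, which matches the symptom described.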

One last thing - to answer Lance's question about this being an "apples to
apples" comparison (in lucene-user thread) - yes, our main goal in this
project is to do things as close to the previous version as possible.
This way we can monitor that behavior (both quality and performance) remains
similar, release this version, and then move forward to improve things.
Of course, there are some changes, but I believe we are indeed measuring the
complete flow on both apps, and that both apps are returning the same fields
via HTTP.

Would love to hear what you think about this. TIA,
Ophir


Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420

I would like to set up Apache Solr in Eclipse using Tomcat. It is easy to
set up with Jetty, but with Tomcat it doesn't run Solr at runtime. Has anyone
done this before?

Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1021673.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Peter Karich
Ophir,

this sounds a bit strange:

> CommonsHttpSolrServer.java, line 416 takes about 95% of the application's 
> total search time

Is this only for heavy load?

Some other things:

 * with lucene you accessed the indices with MultiSearcher in a LAN, right?
 * did you look into the logs of the servers, is there something
wrong/delayed?
 * did you enable gzip compression for your servers or even the binary
writer/parser for your solr clients?

CommonsHttpSolrServer server = ...
server.setRequestWriter(new BinaryRequestWriter());
server.setParser(new BinaryResponseParser());
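On the gzip side, assuming the shards run under Tomcat (as in the original post), compression can be enabled on the HTTP connector. A sketch of the connector attributes; the port and MIME list are only examples and should match your actual setup:

```
<Connector port="8080" protocol="HTTP/1.1"
           compression="on"
           compressionMinSize="2048"
           compressableMimeType="text/xml,application/xml,text/plain"/>
```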

Regards,
Peter.



-- 
http://karussell.wordpress.com/



RE: wildcard and proximity searches

2010-08-04 Thread Frederico Azeiteiro
Thanks for your idea.

At this point I'm logging each query's time. My idea is to divide my
queries into "normal queries" and "heavy queries". I have some heavy
queries that take one or two minutes to return results, but they contain,
for instance, (*word1* AND *word2* AND word3*). I guess these will
always be slower (perhaps a little faster with
"ReversedWildcardFilterFactory"), but they will never be ready in a few
seconds. For now, I just increased the timeout for those :) (using
SolrNet).

My priority at the moment is the phrase queries like "word1* word2*
word3". Once those are working, I'll try to optimize the "heavy queries".

Frederico


-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: quarta-feira, 4 de Agosto de 2010 01:41
To: solr-user@lucene.apache.org
Subject: Re: wildcard and proximity searches

Frederico Azeiteiro wrote:
>>> But it is unusual to use both leading and trailing * operator. Why are
>>> you doing this?
>
> Yes I know, but I have a few queries that need this. I'll try the
> "ReversedWildcardFilterFactory".

ReversedWildcardFilterFactory will help a leading wildcard, but will not
help a query with BOTH leading and trailing wildcards; it'll still be
slow. Solr/Lucene isn't good at that; I didn't even know Solr would do it
at all, in fact.

If you really needed to do that, the way to play to solr/lucene's way of
doing things would be to have a field where you actually index each
_character_ as a separate token. Then leading-and-trailing wildcard
search is basically reduced to a "phrase search", but where the words
are actually characters. But then you're going to get an index where
pretty much every token belongs to every document, which Solr isn't that
great at either - though you can apply "commongram" stuff on top to help
that out a lot. Not quite sure what the end result will be, I've never
tried it. I'd only use that weird special "char as token" field for
queries that actually required leading and trailing wildcards.

Figuring out how to set up your analyzers, and what (if anything) you're
going to have to do client-app-side to transform the user's query into
something that'll end up searching like a "phrase search where each
'word' is a character", is left as an exercise for the reader. :)
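The char-as-token idea can be illustrated outside Solr. This is only a toy sketch of why a *needle* double-wildcard query collapses to a phrase search over character tokens - it is not a Solr analyzer, and the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each character of the field value becomes its own token,
// so a *needle* wildcard query becomes a "phrase" of character tokens,
// i.e. a contiguous run of tokens in the indexed sequence.
public class CharTokens {
    static List<String> charTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (char c : text.toLowerCase().toCharArray()) {
            tokens.add(String.valueOf(c));
        }
        return tokens;
    }

    // A "phrase match" over character tokens is a contiguous-subsequence check.
    static boolean phraseMatch(List<String> docTokens, List<String> queryTokens) {
        if (queryTokens.isEmpty()) return true;
        for (int i = 0; i + queryTokens.size() <= docTokens.size(); i++) {
            if (docTokens.subList(i, i + queryTokens.size()).equals(queryTokens)) {
                return true;
            }
        }
        return false;
    }
}
```

In a real index every character token would appear in nearly every document, which is where the commongrams suggestion above comes in.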

Jonathan


AW: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Bastian Spitzer
I'm not sure I understand your problem, but basically it isn't Solr vs. Lucene
but HttpURLConnection vs. SolrJ's CommonsHttpSolrServer, since the server-side
query times haven't changed at all, from what you say?

Why aren't you querying the server the same way you did before, if you want to
compare Solr to Lucene only?

-Original Message-
From: Ophir Adiv [mailto:firt...@gmail.com] 
Sent: Wednesday, 4 August 2010 09:11
To: solr-user@lucene.apache.org
Subject: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under 
heavy load



Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Ophir Adiv
On Wed, Aug 4, 2010 at 10:50 AM, Peter Karich  wrote:

> Ophir,
>
> this sounds a bit strange:
>
> > CommonsHttpSolrServer.java, line 416 takes about 95% of the application's
> total search time
>
> Is this only for heavy load?
>
>
I think this makes sense, since the hard work is done by Solr - once the
application gets the search results from the shards, it does a bit of
manipulation on them (combining, filtering, ...), but these are easy tasks.

> Some other things:


>  * with lucene you accessed the indices with MultiSearcher in a LAN, right?
>

No, each shard ran under a different Tomcat instance, and each shard was
accessed via HTTP calls (the same way we're trying to work now with Solr).


>  * did you look into the logs of the servers, is there something
> wrong/delayed?
>

Everything seems peachy... the logs are clean of errors/warnings and the like.


>  * did you enable gzip compression for your servers or even the binary
> writer/parser for your solr clients?
>
>
We're running our application (and Solr) under Tomcat. We do not enable
compression (the configuration remained similar to our old application's).
We tried using XMLResponseParser instead of BinaryResponseParser - it hardly
affected run times.

Thanks for the ideas,
Ophir

> CommonsHttpSolrServer server = ...
> server.setRequestWriter(new BinaryRequestWriter());
> server.setParser(new BinaryResponseParser());
>
> Regards,
> Peter.
>

Is there a better solution for Solr server-side load balancing?

2010-08-04 Thread Chengyang
The default Solr solution is client-side load balancing.
Is there a solution that provides server-side load balancing?



Support loading queries from external files in QuerySenderListener

2010-08-04 Thread Stanislaw
Hi all!
I can't load my custom queries from an external file, as described here:
https://issues.apache.org/jira/browse/SOLR-784

This option seems not to be implemented in the current version, 1.4.1, of Solr.
Was it removed, or does it only come with a newer version?

regards,
Stanislaw


Date faceting

2010-08-04 Thread Eric Grobler
Hi Solr community,

How do I facet on timestamp for example?

I tried something like this - but I get no result.

facet=true
facet.date=timestamp
f.facet.timestamp.date.start=2010-01-01T00:00:00Z
f.facet.timestamp.date.end=2010-12-31T00:00:00Z
f.facet.timestamp.date.gap=+1HOUR
f.facet.timestamp.date.hardend=true

Thanks
ericz


Re: Date faceting

2010-08-04 Thread Koji Sekiguchi

(10/08/04 19:42), Eric Grobler wrote:

Hi Solr community,

How do I facet on timestamp for example?

I tried something like this - but I get no result.

facet=true
facet.date=timestamp
f.facet.timestamp.date.start=2010-01-01T00:00:00Z
f.facet.timestamp.date.end=2010-12-31T00:00:00Z
f.facet.timestamp.date.gap=+1HOUR
f.facet.timestamp.date.hardend=true

Thanks
ericz

   

Your parameters are not correct. Try:

facet=true
facet.date=timestamp
facet.date.start=2010-01-01T00:00:00Z
facet.date.end=2010-12-31T00:00:00Z
facet.date.gap=+1HOUR
facet.date.hardend=true

If you want to use per-field override feature, you can set them:

f.timestamp.facet.date.start=2010-01-01T00:00:00Z
f.timestamp.facet.date.end=2010-12-31T00:00:00Z
f.timestamp.facet.date.gap=+1HOUR
f.timestamp.facet.date.hardend=true
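Putting the per-field parameters above onto a full request might look like the following sketch; the host, port, and core path assume the stock example setup, and note that the "+" in the gap must be URL-encoded as %2B in a raw GET:

```
http://localhost:8983/solr/select?q=*:*&facet=true&facet.date=timestamp&f.timestamp.facet.date.start=2010-01-01T00:00:00Z&f.timestamp.facet.date.end=2010-12-31T00:00:00Z&f.timestamp.facet.date.gap=%2B1HOUR&f.timestamp.facet.date.hardend=true
```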

Koji

--
http://www.rondhuit.com/en/



Re: Best solution to avoiding multiple query requests

2010-08-04 Thread kenf_nc

Not sure the processing would be any faster than just querying again, but: in
your original result set, the first doc whose field value matches a top-10
facet value will be the number 1 item if you fq on that facet value. So you
don't need to query it again; you only need to query the facet values that
aren't represented in your result set.
ie:
   q=dog&facet=on&facet.field=foo
results 10 docs
   id=1, foo=A
   id=2, foo=A
   id=3, foo=B
   id=4, foo=C
   id=5, foo=B
   id=6, foo=A
   id=7, foo=Z
   id=8, foo=T
   id=9, foo=B
   id=10, foo=J

If your facet results top 10 were (A, B, T, J, D, X, Q, O, P, I)
you already have the number 1 for A (id 1), B (id 3), T (id 8) and J (id 10)
in your very first query. You only need to query D, X, Q, O, P, I. 

If your first query returned 100 instead of 10 you may even have more of the
top 10 represented. Again, the processing steps you would need to do may not
be any faster than re-querying, it depends on the speed of your index and
network etc.

I would think that if your second query was
q=dog&fq=foo:(A OR B OR T ...) then you would have an even greater chance
of having the number 1 result for each of the top 10 in just your second
query.
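The bookkeeping described above can be sketched as a small routine: walk the initial results in rank order, record the first (top-ranked) document carrying each facet value, and only the top facet values with no recorded hit need a follow-up query. The class and method names are invented, and the strings stand in for the values of field "foo":

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetTopHits {
    /** docsInRankOrder: the "foo" value of each doc in the initial results, best rank first. */
    static Map<String, Integer> firstHitPerValue(List<String> docsInRankOrder) {
        Map<String, Integer> first = new LinkedHashMap<>();
        for (int rank = 0; rank < docsInRankOrder.size(); rank++) {
            // Only the first (top-ranked) doc per value is recorded.
            first.putIfAbsent(docsInRankOrder.get(rank), rank);
        }
        return first;
    }

    /** Top facet values with no hit in the initial results still need their own fq query. */
    static List<String> valuesStillNeedingAQuery(List<String> topFacetValues,
                                                 List<String> docsInRankOrder) {
        Map<String, Integer> first = firstHitPerValue(docsInRankOrder);
        List<String> missing = new ArrayList<>();
        for (String v : topFacetValues) {
            if (!first.containsKey(v)) missing.add(v);
        }
        return missing;
    }
}
```

With the example data in the post (docs A,A,B,C,B,A,Z,T,B,J and top facets A,B,T,J,D,X,Q,O,P,I), only D, X, Q, O, P, I come back as needing a query.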

  


Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Geert-Jan Brits
Field Collapsing (currently as patch) is exactly what you're looking for
imo.

http://wiki.apache.org/solr/FieldCollapsing

Geert-Jan


2010/8/4 Ken Krugler 

> Hi all,
>
> I've got a situation where the key result from an initial search request
> (let's say for "dog") is the list of values from a faceted field, sorted by
> hit count.
>
> For the top 10 of these faceted field values, I need to get the top hit for
> the target request ("dog") restricted to that value for the faceted field.
>
> Currently this is 11 total requests, of which the 10 requests following the
> initial query can be made in parallel. But that's still a lot of requests.
>
> So my questions are:
>
> 1. Is there any magic query to handle this with Solr as-is?
>
> 2. if not, is the best solution to create my own request handler?
>
> 3. And in that case, any input/tips on developing this type of custom
> request handler?
>
> Thanks,
>
> -- Ken
>
>
> 
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
>


Re: Date faceting

2010-08-04 Thread Eric Grobler
Thanks Koji,

It works :-)

Have a nice day.

regards
ericz



Re: Multi word synomyms

2010-08-04 Thread Qwerky

It would be nice if you could configure some kind of filter to be processed
before the query string is passed to the parser. The QueryComponent class
seems a nice place for this: a filter could be run against the raw query, and
ResponseBuilder's queryString value could be modified before the QParser is
created.
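A sketch of what such a pre-parse rewrite could look like as plain string manipulation, before the query ever reaches the QParser. This is an illustration of the idea only - the class name, the synonym map, and the grouping/quoting strategy are all invented, and it is not wired into QueryComponent:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Rewrites multi-word synonyms in the raw query string before parsing.
public class RawQuerySynonymFilter {
    private final Map<String, String> synonyms = new LinkedHashMap<>();

    public RawQuerySynonymFilter() {
        // Example mapping (an assumption, not from Solr's synonyms.txt format):
        // the phrase "new york" expands to either the quoted phrase or "nyc".
        synonyms.put("new york", "\"new york\" OR nyc");
    }

    public String rewrite(String rawQuery) {
        String q = rawQuery;
        for (Map.Entry<String, String> e : synonyms.entrySet()) {
            if (q.toLowerCase().contains(e.getKey())) {
                // Case-insensitive literal replacement, grouped so the OR
                // does not leak into the rest of the query.
                q = q.replaceAll("(?i)" + Pattern.quote(e.getKey()),
                                 "(" + e.getValue() + ")");
            }
        }
        return q;
    }
}
```

In a real component the rewritten string would be assigned back to ResponseBuilder's queryString in prepare(), before the QParser is created.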


Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil
Have got solr working in the Eclipse and deployed on Tomcat through eclipse
plugin.
The crude approach was to:

   1. Import the Solr WAR into Eclipse, where it will be imported as a web
   project and can be deployed on Tomcat.
   2. Add multiple source folders to the project, linked to the checked-out
   Solr source code, e.g. this entry in the .project file:

   <link>
     <name>common</name>
     <type>2</type>
     <location>D:/Solr/solr/src/common</location>
   </link>

   3. Remove the Solr jars from WEB-INF/lib, so that changes to the
   project sources can be deployed and debugged.

Let me know if you get a better approach.



analysis tool vs. reality

2010-08-04 Thread Justin Lolofie
Erik: Yes, I did re-index if that means adding the document again.
Here are the exact steps I took:

1. analysis.jsp "ABC12" does NOT match title "ABC12" (however, ABC or 12 does)
2. changed schema.xml WordDelimiterFilterFactory catenate-all
3. restarted tomcat
4. deleted the document with title "ABC12"
5. added the document with title "ABC12"
6. query "ABC12" does NOT result in the document with title "ABC12"
7. analysis.jsp "ABC12" DOES match that document now

Is there any way to see, given an ID, how something is indexed internally?

Lance: I understand the index/query sections of analysis.jsp. However,
it operates on text that you enter into the form, not on actual index
data. Since all my documents have a unique ID, I'd like to supply an
ID and a query, and get back the same index/query sections- using
whats actually in the index.


-- Forwarded message --
From: Erik Hatcher 
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 22:43:17 -0400
Subject: Re: analysis tool vs. reality
Did you reindex after changing the schema?


On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows
information about how the query is processed- is there a way to see
how things are stored internally in the index?

My query is "ABC12". There is a document whose "title" field is
"ABC12". However, I can only get it to match if I search for "ABC" or
"12". This was also true in the analysis tool until recently.
However, I changed schema.xml and turned on catenate-all in
WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis
tool, "ABC12" matches "ABC12". However, when doing an actual query, it
still does not match.

Thank you for any help,
Justin


-- Forwarded message --
From: Erik Hatcher 
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality
The analysis tool is merely that, but during querying there is also a
query parser involved.  Adding debugQuery=true to your request will
give you the parsed query in the response, offering insight into what
might be going on.  It could be lots of things, from not querying the
fields you think you are, to a misunderstanding about some text not
being analyzed (like wildcard clauses).

 Erik
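A concrete request of the kind suggested above might look like this sketch; the host, port, and core path assume the stock example setup, and title:ABC12 is the field and term from this thread:

```
http://localhost:8983/solr/select?q=title:ABC12&debugQuery=true
```

The parsedquery entry in the debug section of the response shows what the query parser actually built from the raw query string.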

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

  Hello,

  I have found the analysis tool in the admin page to be very useful in
  understanding my schema. I've made changes to my schema so that a
  particular case I'm looking at matches properly. I restarted solr,
  deleted the document from the index, and added it again. But still,
  when I do a query, the document does not get returned in the results.

  Does anyone have any tips for debugging this sort of issue? What is
  different between what I see in analysis tool and new documents added
  to the index?

  Thanks,
  Justin


No "group by"? looking for an alternative.

2010-08-04 Thread Mickael Magniez

Hello,

I've been dealing with a problem for a few days: I want to index and search
shoes; each shoe can come in several sizes and colors, at different prices.

So what I want is: when I search for "Converse", I want to retrieve one
shoe per model, i.e. one color and one size, but with the colors and sizes
in facets.

My first idea was to mimic the SQL behaviour of "SELECT * FROM solr WHERE
text CONTAINS 'converse' GROUP BY model".
But there is no GROUP BY in Solr :(. I tried FieldCollapsing, but ran into
many bugs (NullPointerException).

Then I try with multivalued facets  : 



It's nearly working, but I have a problem: when I filter on red shoes, the
size facet also shows sizes that are not available in red. I can't find a
way to filter a multivalued facet by the value of another multivalued facet.

So if anyone has an idea for solving this problem...



Mickael.



Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
I think I agree with Justin here; I think the way the analysis tool highlights
'matches' is extremely misleading, especially considering it completely
ignores queryparsing.

It would be better if it put your text in a MemoryIndex and actually parsed
the query w/ the queryparser, ran it, and used the highlighter to try to show
any matches.

On Wed, Aug 4, 2010 at 10:14 AM, Justin Lolofie  wrote:

> Erik: Yes, I did re-index if that means adding the document again.
> Here are the exact steps I took:
>
> 1. analysis.jsp "ABC12" does NOT match title "ABC12" (however, ABC or 12
> does)
> 2. changed schema.xml WordDelimeterFilterFactory catenate-all
> 3. restarted tomcat
> 4. deleted the document with title "ABC12"
> 5. added the document with title "ABC12"
> 6. query "ABC12" does NOT result in the document with title "ABC12"
> 7. analysis.jsp "ABC12" DOES match that document now
>
> Is there any way to see, given an ID, how something is indexed internally?
>
> Lance: I understand the index/query sections of analysis.jsp. However,
> it operates on text that you enter into the form, not on actual index
> data. Since all my documents have a unique ID, I'd like to supply an
> ID and a query, and get back the same index/query sections- using
> whats actually in the index.
>
>
> -- Forwarded message --
> From: Erik Hatcher 
> To: solr-user@lucene.apache.org
> Date: Tue, 3 Aug 2010 22:43:17 -0400
> Subject: Re: analysis tool vs. reality
> Did you reindex after changing the schema?
>
>
> On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:
>
>Hi Erik, thank you for replying. So, turning on debugQuery shows
>information about how the query is processed- is there a way to see
>how things are stored internally in the index?
>
>My query is "ABC12". There is a document who's "title" field is
>"ABC12". However, I can only get it to match if I search for "ABC" or
>"12". This was also true in the analysis tool up until recently.
>However, I changed schema.xml and turned on catenate-all in
>WordDelimterFilterFactory for title fieldtype. Now, in the analysis
>tool "ABC12" matches "ABC12". However, when doing an actual query, it
>does not match.
>
>Thank you for any help,
>Justin
>
>
>-- Forwarded message --
>From: Erik Hatcher 
>To: solr-user@lucene.apache.org
>Date: Tue, 3 Aug 2010 16:50:06 -0400
>Subject: Re: analysis tool vs. reality
>The analysis tool is merely that, but during querying there is also a
>query parser involved.  Adding debugQuery=true to your request will
>give you the parsed query in the response offering insight into what
>might be going on.   Could be lots of things, like not querying the
>fields you think you are to a misunderstanding about some text not
>being analyzed (like wildcard clauses).
>
> Erik
>
>On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:
>
>  Hello,
>
>  I have found the analysis tool in the admin page to be very useful in
>  understanding my schema. I've made changes to my schema so that a
>  particular case I'm looking at matches properly. I restarted solr,
>  deleted the document from the index, and added it again. But still,
>  when I do a query, the document does not get returned in the results.
>
>  Does anyone have any tips for debugging this sort of issue? What is
>  different between what I see in analysis tool and new documents added
>  to the index?
>
>  Thanks,
>   Justin
>



-- 
Robert Muir
rcm...@gmail.com


analysis tool vs. reality

2010-08-04 Thread Justin Lolofie
Wow, I got to work this morning and my query results now include the
'ABC12' document. I'm not sure what that means. Either I made a
mistake in the process I described in the last email (I don't think
this is the case) or there is some kind of caching of query results
going on that doesn't get flushed on a restart of tomcat.




Erik: Yes, I did re-index if that means adding the document again.
Here are the exact steps I took:

1. analysis.jsp "ABC12" does NOT match title "ABC12" (however, ABC or 12 does)
2. changed schema.xml WordDelimiterFilterFactory catenate-all
3. restarted tomcat
4. deleted the document with title "ABC12"
5. added the document with title "ABC12"
6. query "ABC12" does NOT result in the document with title "ABC12"
7. analysis.jsp "ABC12" DOES match that document now
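For reference, catenate-all corresponds to the catenateAll attribute on the filter. A sketch of the relevant schema.xml line (the other attribute values follow the stock example schema and may differ from the setup described above):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="1"/>
```

Note that the index-time and query-time analyzer chains for the field type are configured separately, so the attribute may need to appear in both.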

Is there any way to see, given an ID, how something is indexed internally?

Lance: I understand the index/query sections of analysis.jsp. However,
it operates on text that you enter into the form, not on actual index
data. Since all my documents have a unique ID, I'd like to supply an
ID and a query, and get back the same index/query sections - using
what's actually in the index.


-- Forwarded message --
From: Erik Hatcher 
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 22:43:17 -0400
Subject: Re: analysis tool vs. reality
Did you reindex after changing the schema?


On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows
information about how the query is processed- is there a way to see
how things are stored internally in the index?

My query is "ABC12". There is a document whose "title" field is
"ABC12". However, I can only get it to match if I search for "ABC" or
"12". This was also true in the analysis tool up until recently.
However, I changed schema.xml and turned on catenate-all in
WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis
tool "ABC12" matches "ABC12". However, when doing an actual query, it
does not match.

Thank you for any help,
Justin


-- Forwarded message --
From: Erik Hatcher 
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality
The analysis tool is merely that, but during querying there is also a
query parser involved.  Adding debugQuery=true to your request will
give you the parsed query in the response offering insight into what
might be going on.   Could be lots of things, like not querying the
fields you think you are to a misunderstanding about some text not
being analyzed (like wildcard clauses).

 Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

  Hello,

  I have found the analysis tool in the admin page to be very useful in
  understanding my schema. I've made changes to my schema so that a
  particular case I'm looking at matches properly. I restarted solr,
  deleted the document from the index, and added it again. But still,
  when I do a query, the document does not get returned in the results.

  Does anyone have any tips for debugging this sort of issue? What is
  different between what I see in analysis tool and new documents added
  to the index?

  Thanks,
  Justin


Re: Support loading queries from external files in QuerySenderListener

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 3:27 PM, Stanislaw wrote:

> Hi all!
> I can't load my custom queries from an external file, as written here:
> https://issues.apache.org/jira/browse/SOLR-784
>
> This option seems not to be implemented in the current version 1.4.1 of
> Solr.
> Was it removed, or does it only come with a newer version?
>
>
That patch was never committed so it is not available in any release.

-- 
Regards,
Shalin Shekhar Mangar.


Re: analysis tool vs. reality

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 7:52 PM, Robert Muir  wrote:

> I think I agree with Justin here, I think the way analysis tool highlights
> 'matches' is extremely misleading, especially considering it completely
> ignores queryparsing.
>
> it would be better if it put your text in a memoryindex and actually parsed
> the query w/ queryparser, ran it, and used the highlighter to try to show
> any matches.
>
>
+1

-- 
Regards,
Shalin Shekhar Mangar.


Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Shalin Shekhar Mangar
2010/8/4 Chengyang 

> The default Solr solution is client-side load balancing.
> Is there a solution that provides server-side load balancing?
>
>
No. Most of us stick an HTTP load balancer in front of multiple Solr servers.

-- 
Regards,
Shalin Shekhar Mangar.


DIH and Cassandra

2010-08-04 Thread Mark
Is it possible to use DIH with Cassandra either out of the box or with 
something more custom? Thanks


Re: enhancing auto complete

2010-08-04 Thread Avlesh Singh
I preferred to answer this question privately earlier. But I have received
innumerable requests to unveil the architecture. For the benefit of all, I
am posting it here (after hiding as much info as I should, in my company's
interest).

The context: Auto-suggest feature on http://askme.in

*Solr setup*: Underneath are some of the salient features -

   1. TermsComponent is NOT used.
   2. The index is made up of 4 fields of the following types -
   "autocomplete_full", "autocomplete_token", "string" and "text".
   3. "autocomplete_full" uses KeywordTokenizerFactory and
   EdgeNGramFilterFactory. "autocomplete_token" uses WhitespaceTokenizerFactory
   and EdgeNGramFilterFactory. Both of these are Solr text fields with standard
   filters like LowerCaseFilterFactory etc applied during querying and
   indexing.
   4. Standard DataImportHandler and a bunch of sql procedures are used to
   "derive" all suggestable phrases from the system and index them in the above
   mentioned fields.
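A rough simulation of what the two EdgeNGram analysis chains described above produce at index time (this is an illustration, not Solr's actual filter code; the exact tokens depend on the configured minGramSize/maxGramSize):

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Simulate EdgeNGramFilterFactory: emit the leading prefixes of a token."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def analyze_full(phrase):
    """'autocomplete_full' style: KeywordTokenizer keeps the whole phrase,
    lowercased, then edge n-grams over the entire string."""
    return edge_ngrams(phrase.lower())

def analyze_token(phrase):
    """'autocomplete_token' style: whitespace-tokenize, lowercase, then
    edge n-grams per token -- lets a prefix match anywhere in the phrase."""
    grams = []
    for tok in phrase.lower().split():
        grams.extend(edge_ngrams(tok))
    return grams

# A user's prefix matches if it equals one of the indexed grams.
print("lorem ip" in analyze_full("Lorem ipsum dolor"))   # whole-phrase prefix
print("ip" in analyze_token("Lorem ipsum dolor"))        # any-word prefix
```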

*Controller setup*: The controller (to handle suggest queries) is a typical
Java servlet using Solr as its backend (connecting via solrj). Based on the
incoming query string, a Lucene query is created. It is a BooleanQuery
comprising TermQuerys across all the above-mentioned fields. The boost
factor on each of these term queries determines (to an extent) what kind
of matches you prefer to show up first. JSON is used as the data
exchange format.
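The ranking effect of those per-field boosts can be sketched with a toy scorer (an assumption about the general shape, not the actual askme.in code; the field names come from the setup above, the boost values are invented):

```python
# Hypothetical boosts: a whole-phrase prefix match (autocomplete_full)
# should outrank a per-word prefix match (autocomplete_token), and so on.
BOOSTS = {"autocomplete_full": 4.0, "autocomplete_token": 2.0,
          "string": 1.5, "text": 1.0}

def score(doc_fields, query):
    """Sum the boosts of every field whose indexed terms contain the query,
    mimicking a disjunctive BooleanQuery of boosted TermQuerys."""
    return sum(b for f, b in BOOSTS.items() if query in doc_fields.get(f, ()))

doc = {"autocomplete_full": {"lorem ip"}, "autocomplete_token": {"lorem", "ip"}}
print(score(doc, "lorem ip"))  # full-phrase match fires the highest boost
```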

*Frontend setup*: It is a home grown JS to address some specific use cases
of the project in question. One simple exercise with Firebug will spill all
the beans. However, I strongly recommend using jQuery to build (and extend)
the UI component.

Any help beyond this is available, but off the list.

Cheers
Avlesh
@avlesh  | http://webklipper.com

On Tue, Aug 3, 2010 at 10:04 AM, Bhavnik Gajjar <
bhavnik.gaj...@gatewaynintec.com> wrote:

>  Whoops!
>
> table still not looks ok :(
>
> trying to send once again
>
>
> lorem       Lorem ipsum dolor sit amet
>             Hieyed ddi lorem ipsum dolor
>             test lorem ipsume
>             test xyz lorem ipslili
>
> lorem ip    Lorem ipsum dolor sit amet
>             Hieyed ddi lorem ipsum dolor
>             test lorem ipsume
>             test xyz lorem ipslili
>
> lorem ipsl  test xyz lorem ipslili
>
> On 8/3/2010 10:00 AM, Bhavnik Gajjar wrote:
>
> Avlesh,
>
> Thanks for responding
>
> The table mentioned below looks like,
>
> lorem       Lorem ipsum dolor sit amet
>             Hieyed ddi lorem ipsum dolor
>             test lorem ipsume
>             test xyz lorem ipslili
>
> lorem ip    Lorem ipsum dolor sit amet
>             Hieyed ddi lorem ipsum dolor
>             test lorem ipsume
>             test xyz lorem ipslili
>
> lorem ipsl  test xyz lorem ipslili
>
>
> Yes, [http://askme.in] looks good!
>
> I would like to know its designs/solr configurations etc.. Can you
> please provide me detailed views of it?
>
> In [http://askme.in], there is one thing to note. A search text like
> [business c] populates [Business Centre], which looks OK, but [Consultant
> Business] looks a bit odd. In general, though, the pointer you suggested is
> a great place to start.
>
> On 8/2/2010 8:39 PM, Avlesh Singh wrote:
>
>
> From whatever I could read in your broken table of sample use cases, I think
> you are looking for something similar to what has been done here -
> http://askme.in; if this is what you are looking for, do let me know.
>
> Cheers
> Avlesh
> @avlesh   | 
> http://webklipper.com
>
> On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik 
> Gajjar  wrote:
>
>
>
>
>  Hi,
>
> I'm looking for a solution related to auto complete feature for one
> application.
>
> Below is a list of texts from which auto complete results would be
> populated.
>
> Lorem ipsum dolor sit amet
> tincidunt ut laoreet
> dolore eu feugiat nulla facilisis at vero eros et
> te feugait nulla facilisi
> Claritas est etiam processus
> anteposuerit litterarum formas humanitatis
> fiant sollemnes in futurum
> Hieyed ddi lorem ipsum dolor
> test lorem ipsume
> test xyz lorem ipslili
>
> Consider the table below. The first column shows the user-entered value and
> the second column the expected result (the list of auto complete terms
> that should be populated from Solr):
>
> lorem
>   *Lorem* ipsum dolor sit amet
>   Hieyed ddi *lorem* ipsum dolor
>   test *lorem* ipsume
>   test xyz *lorem* ipslili
>
> lorem ip
>   *Lorem ip*sum dolor sit amet
>   Hieyed ddi *lorem ip*sum dolor
>   test *lorem ip*sume
>   test xyz *lorem ip*slili
> lorem

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420

Thanks, man. I haven't tried this yet, but where do I put that XML
configuration? Does it go into Solr's web.xml?

Cheers,
Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023188.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil
The Solr home is configured in the web.xml of the application, which points
to the folder containing the conf files and the data directory:

   <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-value>D:/multicore</env-entry-value>
      <env-entry-type>java.lang.String</env-entry-type>
   </env-entry>


Regards,
Jayendra

On Wed, Aug 4, 2010 at 12:21 PM, Hando420  wrote:

>
> Thanks man i haven't tried this but where do put that xml configuration. Is
> it to the web.xml in solr?
>
> Cheers,
> Hando
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023188.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


can't use strdist as functionquery?

2010-08-04 Thread solr-user

I want to sort my results by how closely a given resultset field matches a
given string.

For example, say I am searching for a given product, and the product can be
found in many cities, including "seattle".  I want to sort the results so
that results from the city of "seattle" are at the top, and all other results
below that.

I thought that I could do so by using strdist as a functionquery (I am using
Solr 1.4, so I can't directly sort on strdist), but am having problems with
the syntax of the query, because functionqueries require double quotes and so
does strdist.

My current query, which fails with an NPE, looks something like this:

http://localhost:8080/solr/select?q=(product:"foo") _val_:"strdist("seattle",city,edit)"&sort=score%20asc&fl=product,city,score

I have tried various types of URL encoding (i.e. using %22 instead of double
quotes in the strdist function), but had no success.

Any ideas??  Is there a better way to accomplish this sorting??
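One workaround worth trying (an assumption to verify against your Solr version: the function-query parser accepts single-quoted string literals, which avoids nesting double quotes inside the `_val_` string). A sketch of building the request URL, with the quoting handled by the standard library:

```python
from urllib.parse import urlencode

# Hypothetical query: single quotes inside the _val_ function avoid the
# clash with the double quotes that delimit the function-query string.
# Note: strdist returns 1.0 for identical strings, so sort score DESCENDING
# to put "seattle" results first (the original query used asc).
params = {
    "q": '(product:"foo") _val_:"strdist(\'seattle\',city,edit)"',
    "sort": "score desc",
    "fl": "product,city,score",
}
url = "http://localhost:8080/solr/select?" + urlencode(params)
print(url)
```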

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1023390.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420

Thanks, now it's clear and works fine.

Regards,
Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023404.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sharing index files between multiple JVMs and replication

2010-08-04 Thread Kelly Taylor
Is anybody else encountering these same issues with a similar setup? And
is there a way to configure certain Solr web-apps as read-only (basically
dummy instances) so that index changes are not allowed?



- Original Message 
From: Kelly Taylor 
To: solr-user@lucene.apache.org
Sent: Tue, August 3, 2010 5:48:11 PM
Subject: Re: Sharing index files between multiple JVMs and replication

Yes, they are on a common file server, and I've been sharing the same index
directory between the Solr JVMs. But I seem to be hitting a wall when
attempting to use just one instance for changing the index.

With Solr replication disabled, I stream updates to the one instance, and this
process hangs whenever there are additional Solr JVMs started up with the same
configuration in solrconfig.xml  -  So I then tried, to no avail, using a
different configuration, solrconfig-readonly.xml, where the updateHandler was
commented out, all /update* requestHandlers removed, a mainIndex lockType of
none, etc.

And with Solr replication enabled, the slave seems to hang, or at least
reports unusually long time estimates for the currently running replication
process to complete.


-Kelly



- Original Message 
From: Lance Norskog 
To: solr-user@lucene.apache.org
Sent: Tue, August 3, 2010 4:56:58 PM
Subject: Re: Sharing index files between multiple JVMs and replication

Are these files on a common file server? If you want to share them
that way, it actually does work just to give them all the same index
directory, as long as only one of them changes it.

On Tue, Aug 3, 2010 at 4:38 PM, Kelly Taylor  wrote:
> Is there a way to share index files amongst my multiple Solr web-apps, by
> configuring only one of the JVMs as an indexer, and the remaining, as 
read-only
> searchers?
>
> I'd like to configure in such a way that on startup of the read-only 
searchers,
> missing cores/indexes are not created, and updates are not handled.
>
> If I can get around the files being locked by the read-only instances, I 
should
> be able to scale wider in a given environment, as well as have less replicated
> copies of my master index (Solr 1.4 Java Replication).
>
> Then once the commit is issued to the slave, I can fire off a RELOAD script 
for
> each of my read-only cores.
>
> -Kelly
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com






Re: analysis tool vs. reality

2010-08-04 Thread Chris Hostetter

: I think I agree with Justin here, I think the way analysis tool highlights
: 'matches' is extremely misleading, especially considering it completely
: ignores queryparsing.

it really only attempts to identify when there is overlap between
analysis at query time and at indexing time, so you can easily spot when
one analyzer or the other "breaks" things so that they no longer line up
(or when it "fixes" things so they start to line up).

Even if we eliminated that highlighting as misleading, people would still
do it in their minds, it would just be harder -- it doesn't change the
underlying fact that analysis is only part of the picture.

: it would be better if it put your text in a memoryindex and actually parsed
: the query w/ queryparser, ran it, and used the highlighter to try to show
: any matches.

That level of "query explanation" really only works if the user gives us a
full document (all fields, not just one) and a full query string, and all
of the possible query params -- because the query parser (either implicit
because of config, or explicitly specified by the user) might change its
behavior based on those other params.

I agree with you: debugging functionality along the lines of what you are
describing would be *VASTLY* more useful than what we've got right now,
and is something I briefly looked into doing before as an extension of the
existing DebugComponent...

   https://issues.apache.org/jira/browse/SOLR-1749

...the problems I encountered trying to do it as a debug component on
a "real" Solr request seem like they would also be problems for a
MemoryIndex based "admin tool" approach like what you suggest -- but if
you've got ideas on working around them I am 100% interested.

Independent of how we might create a better "QueryParser + Analysis
Explanation" tool / debug component is the question of what we can do to
make it more clear what exactly the analysis.jsp page is doing and what
people can infer from that page.  As I said, I don't think removing the
"match" highlighting will actually reduce confusion, but perhaps there is
verbiage/disclaimers that could be added to make it more clear?



-Hoss



Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
On Wed, Aug 4, 2010 at 1:45 PM, Chris Hostetter wrote:

>
> it really only attempts to identify when there is overlap between
> analaysis at query time and at indexing time so you can easily spot when
> one analyzer or the other "breaks" things so that they no longer line up
> (or when it "fiexes" things so they start to line up)
>

It attempts this badly, because it only "works" in the most trivial of cases
(e.g. it doesn't reflect the interaction of the queryparser with multiword
synonyms or worddelimiterfilter).

Since Solr includes these non-trivial analysis components *in the example*,
it means that this 'highlight matches' doesn't actually even really work at
all.

Someone is gonna use this thing when they don't understand why analysis isn't
doing what they want, i.e. the cases like I outlined above.

For the trivial cases where it does "work", the 'highlight matches' isn't
useful anyway, so in its current state it's completely unnecessary.


> Even if we eliminated that highlighting as missleading, people would still
> do it in thier minds, it would just be harder -- it doesn't change the
> underlying fact that analysis is only part of the picture.
>

I'm not suggesting that. I'm suggesting fixing the highlighting so it's not
misleading. There are really only two choices:
1. remove the current highlighting
2. fix it.

In its current state it's completely useless and misleading, except for very
trivial cases, in which you don't need it anyway.


>
> : it would be better if it put your text in a memoryindex and actually
> parsed
> : the query w/ queryparser, ran it, and used the highlighter to try to show
> : any matches.
>
> Thta level of "query explanation" really only works if the user gives us a
> full document (all fields, not just one) and a full query string, and all
> of the possible query params -- because the query parser (either implicit
> because of config, or explicitly specified by the user) might change it's
> behavior based on those other params.
>

That's true, but I don't see why the user couldn't be allowed to provide just
that.
I'd bet money a lot of people are using this thing with a specific
query/document in mind anyway!


> people can infer from that page.  As i said, i don't think removing the
> "match" highlighting will actaully reduce confusion, but perhaps there is
> verbage/disclaimers that could be added to make it more clear?
>

As I said before, I think I disagree with you. I think for stuff like this
the technicals are less important; what's important is that this is a
misleading checkbox that really confuses users.

I suggest disabling it entirely; you are only going to remove confusion.


-- 
Robert Muir
rcm...@gmail.com


Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
Furthermore, I would like to add that it's not just the highlight-matches
functionality that is horribly broken here; the output of the analysis
itself is misleading.

Let's say I take 'textTight' from the example, and add the following synonym:

this is broken => broke

The query-time analysis is wrong, as it clearly shows synonymfilter
collapsing "this is broken" to broke, but in reality, with the qp for that
field, you are gonna get 3 separate tokenstreams and this will never
actually happen (because the qp will divide it up on whitespace first).

So really the output from 'Query Analyzer' is completely bogus.

On Wed, Aug 4, 2010 at 1:57 PM, Robert Muir  wrote:

>
>
> On Wed, Aug 4, 2010 at 1:45 PM, Chris Hostetter 
> wrote:
>
>>
>> it really only attempts to identify when there is overlap between
>> analaysis at query time and at indexing time so you can easily spot when
>> one analyzer or the other "breaks" things so that they no longer line up
>> (or when it "fiexes" things so they start to line up)
>>
>
> It attempts badly, because it only "works" in the most trivial of cases
> (e.g. doesnt reflect the interaction of queryparser with multiword synonyms
> or worddelimiterfilter).
>
> Since Solr includes these non-trivial analysis components *in the example*
> it means that this 'highlight matches' doesnt actually even really work at
> all.
>
> Someone is gonna use this thing when they dont understand why analysis isnt
> doing what they want, i.e. the cases like I outlined above.
>
> For the trivial cases where it does "work" the 'highlight matches' isnt
> useful anyway, so in its current state its completely unnecessary.
>
>
>> Even if we eliminated that highlighting as missleading, people would still
>> do it in thier minds, it would just be harder -- it doesn't change the
>> underlying fact that analysis is only part of the picture.
>>
>
> I'm not suggesting that. I'm suggesting fixing the highlighting so its not
> misleading. There are really only two choices:
> 1. remove the current highlighting
> 2. fix it.
>
> in its current state its completely useless and misleading, except for very
> trivial cases, in which you dont need it anyway.
>
>
>>
>> : it would be better if it put your text in a memoryindex and actually
>> parsed
>> : the query w/ queryparser, ran it, and used the highlighter to try to
>> show
>> : any matches.
>>
>> Thta level of "query explanation" really only works if the user gives us a
>> full document (all fields, not just one) and a full query string, and all
>> of the possible query params -- because the query parser (either implicit
>> because of config, or explicitly specified by the user) might change it's
>> behavior based on those other params.
>>
>
> thats true, but I dont see why the user couldnt be allowed to provide just
> this.
> I'd bet money a lot of people are using this thing with a specific
> query/document in mind anyway!
>
>
>> people can infer from that page.  As i said, i don't think removing the
>> "match" highlighting will actaully reduce confusion, but perhaps there is
>> verbage/disclaimers that could be added to make it more clear?
>>
>
>  As i said before, I think i disagree with you. I think for stuff like this
> the technicals are less important, whats important is this is a misleading
> checkbox that really confuses users.
>
> I suggest disabling it entirely, you are only going to remove confusion.
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Robert Muir
rcm...@gmail.com


Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Ken Krugler

Hi Geert-Jan,

On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:

Field Collapsing (currently as a patch) is exactly what you're looking for,
imo.

http://wiki.apache.org/solr/FieldCollapsing


Thanks for the ref, good stuff.

I think it's close, but if I understand this correctly, then I could get
(using just top two, versus top 10, for simplicity) results that looked like


"dog training" (faceted field value A)
"super dog" (faceted field value B)

but if the actual faceted field value/hit counts were:

C (10)
D (8)
A (2)
B (1)

Then what I'd want is the top hit for "dog AND facet field:C",
followed by "dog AND facet field:D".

Using field collapsing would improve the probability that if I asked
for the top 100 hits, I'd find entries for each of my top N faceted
field values.


Thanks again,

-- Ken

I've got a situation where the key result from an initial search request
(let's say for "dog") is the list of values from a faceted field, sorted by
hit count.

For the top 10 of these faceted field values, I need to get the top hit for
the target request ("dog") restricted to that value for the faceted field.

Currently this is 11 total requests, of which the 10 requests following the
initial query can be made in parallel. But that's still a lot of requests.


So my questions are:

1. Is there any magic query to handle this with Solr as-is?

2. if not, is the best solution to create my own request handler?

3. And in that case, any input/tips on developing this type of custom
request handler?
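The 1 + 10 fan-out described above can at least be bounded in wall-clock time by issuing the restricted queries concurrently. A sketch with a stubbed search function (the stub, its field names, and the facet counts are invented stand-ins for a real Solr round trip):

```python
from concurrent.futures import ThreadPoolExecutor

def search(q, rows=0, facet_field=None):
    """Stub for a Solr round trip; returns (top_hits, facet_counts)."""
    corpus = {"C": 10, "D": 8, "A": 2, "B": 1}          # fake facet counts
    if facet_field:
        return [], sorted(corpus.items(), key=lambda kv: -kv[1])
    return [f"top hit for {q}"], []

# 1. Initial query: just the facet counts for "dog".
_, facets = search("dog", facet_field="category")
top_values = [v for v, _ in facets[:10]]

# 2. Fan out the restricted queries in parallel, one per facet value.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(
        lambda v: search(f'dog AND category:"{v}"', rows=1)[0][0],
        top_values))

print(results)
```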

Thanks,

-- Ken



Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: DIH and Cassandra

2010-08-04 Thread Andrei Savu
DIH only works with relational databases and XML files [1]; you need
to write custom code in order to index data from Cassandra.

It should be pretty easy to map documents from Cassandra to Solr.
There are a lot of client libraries available [2] for Cassandra.

[1] http://wiki.apache.org/solr/DataImportHandler
[2] http://wiki.apache.org/cassandra/ClientOptions
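A minimal sketch of that custom glue. Assumptions are flagged in the code: `fetch_rows()` is a hypothetical stand-in for a real Cassandra client call, and the output targets Solr's XML update format, which would be POSTed to the update handler.

```python
import xml.etree.ElementTree as ET

def fetch_rows():
    """Hypothetical stand-in for a Cassandra client call
    (e.g. a range scan over a column family via one of the clients in [2])."""
    return [{"id": "1", "title": "hello"}, {"id": "2", "title": "world"}]

def rows_to_solr_xml(rows):
    """Map rows (key -> column values) onto a Solr <add> update message."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

payload = rows_to_solr_xml(fetch_rows())
print(payload)  # POST this to the /solr/update handler, then commit
```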

On Wed, Aug 4, 2010 at 6:41 PM, Mark  wrote:
> Is it possible to use DIH with Cassandra either out of the box or with
> something more custom? Thanks
>



-- 
Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr



Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Andrei Savu
Check this article [1] that explains how to set up haproxy to do load
balancing. The steps are the same even if you are not using Drupal. By
using this approach you can easily add more replicas without changing
the application configuration files.

You should also check SolrCloud [2] which does automatic load
balancing and fail-over for queries. This branch is still under
development.

[1] 
http://davehall.com.au/blog/dave/2010/03/13/solr-replication-load-balancing-haproxy-and-drupal
[2] http://wiki.apache.org/solr/SolrCloud
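For reference, a minimal haproxy front-end along the lines of [1] might look like the following sketch (the addresses, port, and health-check URL are placeholders to adapt to your deployment):

```
listen solr
    bind *:8983
    mode http
    balance roundrobin
    option httpchk GET /solr/admin/ping
    server solr1 192.168.0.11:8983 check
    server solr2 192.168.0.12:8983 check
```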

2010/8/4 Chengyang :
> The default Solr solution is client-side load balancing.
> Is there a solution that provides server-side load balancing?
>
>

-- 
Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr


Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread Tod
I'm running a slight variation of the example code referenced below, and
it takes a really long time to execute.  In fact it hangs for a
long time at solr.request(up) before finally completing.  Is there
anything I can look at or tweak to improve performance?


I am indexing a local pdf file, there are no firewall issues, solr
is running on the same machine, and I tried the actual host name in
addition to localhost, but nothing helps.



Thanks - Tod

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample


Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Geert-Jan Brits
If I understand correctly: you want to sort your collapsed results by 'nr of
collapsed results'/hits.

It seems this can't be done out of the box using this patch (I'm not
entirely sure; at least it doesn't follow from the wiki page. Perhaps it is
best to check the jira issues to make sure this isn't already available now,
but just not updated on the wiki).

I also found a blog post (from the patch creator, afaik) with, in the
comments, someone with the same issue plus some pointers:
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

hope that helps,
Geert-jan

2010/8/4 Ken Krugler 

> Hi Geert-Jan,
>
>
> On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:
>
>  Field Collapsing (currently as patch) is exactly what you're looking for
>> imo.
>>
>> http://wiki.apache.org/solr/FieldCollapsing
>>
>
> Thanks for the ref, good stuff.
>
> I think it's close, but if I understand this correctly, then I could get
> (using just top two, versus top 10 for simplicity) results that looked like
>
> "dog training" (faceted field value A)
> "super dog" (faceted field value B)
>
> but if the actual faceted field value/hit counts were:
>
> C (10)
> D (8)
> A (2)
> B (1)
>
> Then what I'd want is the top hit for "dog AND facet field:C", followed by
> "dog AND facet field:D".
>
> Using field collapsing would improve the probability that if I asked for the
> top 100 hits, I'd find entries for each of my top N faceted field values.
>
> Thanks again,
>
> -- Ken
>
>
>  I've got a situation where the key result from an initial search request
>>> (let's say for "dog") is the list of values from a faceted field, sorted
>>> by
>>> hit count.
>>>
>>> For the top 10 of these faceted field values, I need to get the top hit
>>> for
>>> the target request ("dog") restricted to that value for the faceted
>>> field.
>>>
>>> Currently this is 11 total requests, of which the 10 requests following
>>> the
>>> initial query can be made in parallel. But that's still a lot of
>>> requests.
>>>
>>> So my questions are:
>>>
>>> 1. Is there any magic query to handle this with Solr as-is?
>>>
>>> 2. if not, is the best solution to create my own request handler?
>>>
>>> 3. And in that case, any input/tips on developing this type of custom
>>> request handler?
>>>
>>> Thanks,
>>>
>>> -- Ken
>>>
>>
> 
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
>
>
>
>
>


Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Ken Krugler

Hi Geert-jan,

On Aug 4, 2010, at 12:04pm, Geert-Jan Brits wrote:

If I understand correctly: you want to sort your collapsed results by 'nr of
collapsed results'/ hits.

It seems this can't be done out-of-the-box using this patch (I'm not
entirely sure, at least it doesn't follow from the wiki-page. Perhaps best
is to check the jira-issues to make sure this isn't already available now,
but just not updated on the wiki)

Also I found a blogpost (from the patch creator afaik) with in the comments
someone with the same issue + some pointers.
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/


Yup, that's the one - 
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/comment-page-1/#comment-1249

So with some modifications to that patch, it could work...thanks for  
the info!


-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Indexing boolean value

2010-08-04 Thread PeterKerk

I'm trying to index a boolean value, but for some reason it does not show
up in my indexed data.

data-config.xml







OFFICIALLOCATION is a MSSQL database field of type 'bit'


schema.xml




(I'm not sure why I would use copyField though)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023708.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Michael Griffiths
Your schema.xml setting for the field is probably tokenizing the punctuation. 
Change the field type to one that doesn't tokenize on punctuation; e.g. use 
"text_ws" and not "text"
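For reference, the whitespace-only analysis chain looks roughly like this (this matches the stock text_ws type shipped with the Solr 1.4 example schema; the province field name is taken from this thread, the rest is illustrative):

```xml
<!-- schema.xml: tokenize on whitespace only, so "Zuid-Holland" stays
     one token. Note "Strand & Zee" will still split on spaces. -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="province" type="text_ws" indexed="true" stored="true"/>
```

For facet display, an untokenized string-typed field avoids even the whitespace splits.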

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 3:36 PM
To: solr-user@lucene.apache.org
Subject: Indexing fieldvalues with dashes and spaces


I'm having issues with indexing field values containing spaces and dashes.
For example: I'm trying to index province names of the Netherlands. Some 
province names contain a "-":
Zuid-Holland
Noord-Holland

my data-config has this:







When I check what has been indexed, I have this:

(the response XML was mangled by the archive; the surviving text shows three
documents with themes such as "Gemeentehuis" and "Strand & Zee", features
such as "Tuin Cafe" and "Strand Cafe Danszaal", services such as
"Fotoreportage", "Exclusieve huur" and "Live muziek", and provinces
"Gelderland", "Utrecht" and "Zuid-Holland")

So we see that the full field has been indexed:
Zuid-Holland


BUT, when I check the facets via
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=id,title,city,score,features,official,services&facet=true&facet.field=theme&facet.field=features&facet.field=province&facet.field=services

I get this (snippet):
"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
"theme":[
 "Gemeentehuis",2,
 "&",1,   <=== a
 "Strand",1,
 "Zee",1],
"features":[
 "cafe",3,
 "danszaal",2,
 "tuin",2,
 "strand",1],
"province":[
 "gelderland",1,
 "holland",1,
 "utrecht",1,
 "zuid",1, <=== b
 "zuidholland",1],
"services":[
 "exclusiev",2,
 "fotoreportag",2, <=== c
 "huur",2,
 "live",1,  <=== d
 "muziek",1]},


There are several weird things happening, which I have indicated with <===

a. the full field value is "Strand & Zee", but now one facet is "&"
b. the full field value is "Zuid-Holland", but now "zuid" is a separate facet
c. the full field value is "fotoreportage", but somehow the last character has been truncated
d. the full field value is "live muziek", but now "live" and "muziek" have become separate facets

What can I do about this?





RE: Indexing boolean value

2010-08-04 Thread Michael Griffiths
I could be wrong, but I thought bit was an integer. Try changing fieldtype to 
integer.
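For what it's worth, a sketch of the two mappings (untested; the field and column names are guesses, since the original XML was stripped by the archive). JDBC drivers typically surface a bit column as 0/1 or true/false, and Solr's boolean field type accepts either form:

```xml
<!-- data-config.xml: map the bit column onto a Solr field -->
<entity name="location" query="SELECT ID, OFFICIALLOCATION FROM locations">
  <field column="ID" name="id"/>
  <field column="OFFICIALLOCATION" name="official"/>
</entity>

<!-- schema.xml: either a (sortable) integer field... -->
<field name="official" type="sint" indexed="true" stored="true"/>
<!-- ...or the boolean type from the example schema -->
<field name="official" type="boolean" indexed="true" stored="true"/>
```

Only one of the two field declarations would actually be used.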

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 3:42 PM
To: solr-user@lucene.apache.org
Subject: Indexing boolean value





RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk

I changed the field types to text_ws.

Now I only seem to have problems with field values that hold spaces... see
below:

   
   
   
   
   

It has now become:

 "facet_counts":{
  "facet_queries":{},
  "facet_fields":{
"theme":[
 "Gemeentehuis",2,
 "&",1,   <=== still "&" is created as a separate facet
 "Strand",1,
 "Zee",1],
"features":[
 "Cafe",3,
 "Danszaal",2,
 "Tuin",2,
 "Strand",1],
"province":[
 "Gelderland",1,
 "Utrecht",1,
 "Zuid-Holland",1], <=== this is now correct
"services":[
 "Exclusieve",2,
 "Fotoreportage",2,
 "huur",2,
 "Live",1, <=== "Live muziek" is split and separate facets are created
 "muziek",1]},
  "facet_dates":{}}}


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma
You shouldn't fetch faceting results from analyzed fields, it will mess with 
your results. Search on analyzed fields but don't retrieve values from them. 
 
-Original message-
From: PeterKerk 
Sent: Wed 04-08-2010 22:15
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces




RE: Indexing boolean value

2010-08-04 Thread PeterKerk

Hi,

I tried that already, so that would make this:




(still not sure what copyField does though)

But even that won't work. I also don't see the OFFICIALLOCATION column indexed
in the documents:
http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on




RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk

Sorry, but I'm a newbie to Solr... how would I change my schema.xml to do
what you suggest?

And what do you mean by "it will mess with your results"? What will happen
then?


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Michael Griffiths
Echoing Markus - use the tokenized field to return results, but have a 
duplicate field of fieldtype="string" to show the untokenized results. E.g. 
facet on that field.

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@buyways.nl] 
Sent: Wednesday, August 04, 2010 4:18 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing fieldvalues with dashes and spaces



RE: Indexing boolean value

2010-08-04 Thread Michael Griffiths
Copyfield copies the field so you can have multiple versions. Useful to dump 
all fields into one "super" field you can search on, for perf reasons.

If the column isn't being indexed, I'd suggest the problem is in DIH. No 
suggestions as to why, I'm afraid.

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 4:22 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing boolean value


Hi,

I tried that already, so that would make this:

 

(still not sure what copyField does though)

But even that wont work. I also dont see the officallocation columns indexed in 
the documents:
http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023811.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma
Hmm, you should first read a bit more on schema design on the wiki and learn 
about indexing and querying Solr.

 

The copyField directive is what is commonly used in a faceted navigation 
system: search on analyzed fields, show faceting results using the primitive 
string field type. With copyField you can, well, copy a field's raw input 
from one field to another; the copy is made before analysis, so no chaining 
is possible, which is good. 

 

Let's say you have a city field you want to navigate with, but also search in, 
then you would have an analyzed field for search and a string field for 
displaying the navigation.
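The city example might look roughly like this in schema.xml (field and type names are assumed for illustration, not taken from Peter's actual schema):

```xml
<!-- analyzed field: users type "New Yo" and can match "New York" -->
<field name="city"       type="text"   indexed="true" stored="true"/>
<!-- untouched copy for faceting/display: "New York" stays one value -->
<field name="city_exact" type="string" indexed="true" stored="false"/>

<!-- copyField duplicates the raw input before any analysis happens -->
<copyField source="city" dest="city_exact"/>
```

Queries would then search on city but facet with facet.field=city_exact.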

 

But, check the wiki on this subject.
 
-Original message-
From: PeterKerk 
Sent: Wed 04-08-2010 22:23
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces




Re: DIH and Cassandra

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 9:11 PM, Mark  wrote:

> Is it possible to use DIH with Cassandra either out of the box or with
> something more custom? Thanks
>

It will take some modifications but DIH is built to create denormalized
documents so it is possible.

Also see https://issues.apache.org/jira/browse/SOLR-853

-- 
Regards,
Shalin Shekhar Mangar.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk

Well the example you provided is 100% relevant to me :)

I've read the wiki now (SchemaXml,SolrFacetingOverview,Query Syntax,
SimpleFacetParameters), but still do not have an exact idea of what you
mean.

My situation:
a city field is something that I want users to search on via text input, so
lets say "New Yo" would give the results for "New York".
But also a facet "Cities" is available in which "New York" is just one of
the cities that is clickable.

The other facet is "theme", which in my example holds values like
"Gemeentehuis" and "Strand & Zee"; that is not something users would search
on via manual input, but it IS clickable.

If you look at my schema.xml, do you see stuff im doing that is absolutely
wrong for the purpose described above? Because as far as I can see the
documents are indexed correctly (BESIDES the spaces in the fieldvalues).

Any help is greatly appreciated! :)


Re: DIH and Cassandra

2010-08-04 Thread Dennis Gearon
If data is stored in the index, isn't the index of Solr pretty much already a 
'Big/Cassandra Table', except with tokenized columns to make searching easier?

How are Cassandra/Big/Couch DBs doing text/weighted searching? 

Seems a real duplication to use Cassandra AND Solr. OTOH, I don't know how many 
'Tables'/indexes one can make using Solr, I'm still a newbie.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 8/4/10, Andrei Savu  wrote:

> From: Andrei Savu 
> Subject: Re: DIH and Cassandra
> To: solr-user@lucene.apache.org
> Date: Wednesday, August 4, 2010, 12:00 PM
> DIH only works with relational
> databases and XML files [1], you need
> to write custom code in order to index data from
> Cassandra.
> 
> It should be pretty easy to map documents from Cassandra to
> Solr.
> There are a lot of client libraries available [2] for
> Cassandra.
> 
> [1] http://wiki.apache.org/solr/DataImportHandler
> [2] http://wiki.apache.org/cassandra/ClientOptions
> 
> On Wed, Aug 4, 2010 at 6:41 PM, Mark 
> wrote:
> > Is it possible to use DIH with Cassandra either out of
> the box or with
> > something more custom? Thanks
> >
> 
> 
> 
> -- 
> Indekspot -- http://www.indekspot.com -- Managed
> Hosting for Apache Solr
> 


Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Peter Karich

>> The default solr solution is client side loadbalance.
>> Is there a solution provide the server side loadbalance?
>>
>>
>> 
> No. Most of us stick a HTTP load balancer in front of multiple Solr servers.
>   

E.g. mod_jk is a very easy solution (maybe too simple/stupid?) for a load
balancer, but it also offers failover functionality:

It is as simple as:

worker.loadbalancer.balance_workers=worker1,worker2,worker3,...

and the failover:

worker.worker1.redirect=worker2



Re: Some basic DataImportHandler questions

2010-08-04 Thread harrysmith

Thanks. I think part of my issue may be that I am misunderstanding how to use
the entity and field tags to import data in a particular format, and am
looking for a few more examples.

Lets say I have a database table with 2 columns that contain metadata fields
and values, and would like to import this into Solr and keep the pairs
together, an example database table follows consisting of two columns
(String), one containing metadata names and the other metadata values (col
names: metadata_name, metadata_value in this example). There may be multiple
records for a name. The set of potential metadata_names is unknown, it could
be anything.

metadata_name    metadata_value
=============    ==============
title            blah blah
subject          some subject
subject          another subject
name             some name


What is the proper way to import these and keep the name/value pairs intact?
I am seeing the following after import:


title
subject
name


blah blah
some subject
another subject
some name


Ideally, the end goal would be something like below:


some subject



some name


etc

It feels like I am missing something obvious and this would be a common
structure for imports.
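One possible approach (an untested sketch; the items table, the item_id join column, and the *_s dynamic-field convention are all invented for illustration) is DIH's ScriptTransformer, which can move each row's value into a field named after its metadata_name:

```xml
<!-- data-config.xml sketch -->
<dataConfig>
  <script><![CDATA[
    // rename each value into a field named after metadata_name,
    // e.g. subject -> subject_s, caught by a *_s dynamic field
    function mapRow(row) {
      row.put(row.get('metadata_name') + '_s', row.get('metadata_value'));
      return row;
    }
  ]]></script>
  <document>
    <entity name="item" query="SELECT id FROM items">
      <entity name="meta" transformer="script:mapRow"
              query="SELECT metadata_name, metadata_value
                     FROM metadata WHERE item_id = '${item.id}'"/>
    </entity>
  </document>
</dataConfig>
```

Paired with a multivalued string dynamic field in schema.xml, such as
<dynamicField name="*_s" type="string" indexed="true" stored="true" multiValued="true"/>,
repeated names like subject would accumulate their values on one document per item.
(ScriptTransformer requires running Solr on Java 6.)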





>> Just starting with DataImportHandler and had a few simple questions.
>>
>> Is there a location for more in depth documentation other than
>> http://wiki.apache.org/solr/DataImportHandler?
>>
>>

>Umm, no, but let us know what is not covered well and it can be added. 


Re: No "group by"? looking for an alternative.

2010-08-04 Thread Lance Norskog
Hello-

A way to do this is to create one faceting field that includes both the
size and the color. I assume you have a different shoe product
document for each model. Each model would include the color & size
'red' and '14a' fields, but you would add a field with 'red-14a'.
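Concretely, each indexed document would carry the combined token alongside the separate ones, something like the following (all field names here are hypothetical):

```xml
<add>
  <doc>
    <field name="model">converse-all-star</field>
    <field name="color">red</field>
    <field name="size">14a</field>
    <!-- combined value: faceting on this field only ever yields
         color/size pairs that actually exist together -->
    <field name="color_size">red-14a</field>
  </doc>
</add>
```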

On Wed, Aug 4, 2010 at 7:17 AM, Mickael Magniez
 wrote:
>
> Hello,
>
> I'm dealing with a problem since few days  : I want to index and search
> shoes, each shoe can have several size and colors, at different prices.
>
> So, what i want is : when I search for "Converse", i want to retrieve one
> "shoe per model", i-e one color and one size, but having colors and sizes in
> facets.
>
> My first idea was to copy SQL behaviour with a "SELECT * FROM solr WHERE
> text CONTAINS 'converse' GROUP BY model".
> But no group by in Solr :(. I try with FieldCollapsing, but have many bugs
> (NullPointerException).
>
> Then I try with multivalued facets  :
>  multiValued="true"/>
>  multiValued="true"/>
>
> It's nearly working, but i have a problem : when i filtered on red shoes, in
> the size facet, I also have sizes which are not available in red. I don't
> find any solutions to filter multivalued facet with value of another
> multivalued facet.
>
> So if anyone have an idea for solving this problem...
>
>
>
> Mickael.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/No-group-by-looking-for-an-alternative-tp1022738p1022738.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com


Re: analysis tool vs. reality

2010-08-04 Thread Lance Norskog
"there is some kind of caching of query results
going on that doesnt get flushed on a restart of tomcat."

Yes. Solr by default has http caching on if there is no configuration,
and the example solrconfig.xml has it configured on. You should edit
solrconfig.xml to use the alternative described in the comments.
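If I remember the stock solrconfig.xml correctly, the alternative described in its comments is roughly:

```xml
<!-- solrconfig.xml: never send 304/cache-validation headers, so
     clients and proxies always fetch fresh results -->
<requestDispatcher handleSelect="true">
  <httpCaching never304="true"/>
</requestDispatcher>
```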

On Wed, Aug 4, 2010 at 7:55 AM, Justin Lolofie  wrote:
> Wow, I got to work this morning and my query results now include the
> 'ABC12' document. I'm not sure what that means. Either I made a
> mistake in the process I described in the last email (I dont think
> this is the case) or there is some kind of caching of query results
> going on that doesnt get flushed on a restart of tomcat.
>
>
>
>
> Erik: Yes, I did re-index if that means adding the document again.
> Here are the exact steps I took:
>
> 1. analysis.jsp "ABC12" does NOT match title "ABC12" (however, ABC or 12 does)
> 2. changed schema.xml WordDelimeterFilterFactory catenate-all
> 3. restarted tomcat
> 4. deleted the document with title "ABC12"
> 5. added the document with title "ABC12"
> 6. query "ABC12" does NOT result in the document with title "ABC12"
> 7. analysis.jsp "ABC12" DOES match that document now
>
> Is there any way to see, given an ID, how something is indexed internally?
>
> Lance: I understand the index/query sections of analysis.jsp. However,
> it operates on text that you enter into the form, not on actual index
> data. Since all my documents have a unique ID, I'd like to supply an
> ID and a query, and get back the same index/query sections- using
> whats actually in the index.
>
>
> -- Forwarded message --
> From: Erik Hatcher 
> To: solr-user@lucene.apache.org
> Date: Tue, 3 Aug 2010 22:43:17 -0400
> Subject: Re: analysis tool vs. reality
> Did you reindex after changing the schema?
>
>
> On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:
>
>    Hi Erik, thank you for replying. So, turning on debugQuery shows
>    information about how the query is processed- is there a way to see
>    how things are stored internally in the index?
>
>    My query is "ABC12". There is a document who's "title" field is
>    "ABC12". However, I can only get it to match if I search for "ABC" or
>    "12". This was also true in the analysis tool up until recently.
>    However, I changed schema.xml and turned on catenate-all in
>    WordDelimterFilterFactory for title fieldtype. Now, in the analysis
>    tool "ABC12" matches "ABC12". However, when doing an actual query, it
>    does not match.
>
>    Thank you for any help,
>    Justin
>
>
>    -- Forwarded message --
>    From: Erik Hatcher 
>    To: solr-user@lucene.apache.org
>    Date: Tue, 3 Aug 2010 16:50:06 -0400
>    Subject: Re: analysis tool vs. reality
>    The analysis tool is merely that, but during querying there is also a
>    query parser involved.  Adding debugQuery=true to your request will
>    give you the parsed query in the response offering insight into what
>    might be going on.   Could be lots of things, like not querying the
>    fields you think you are to a misunderstanding about some text not
>    being analyzed (like wildcard clauses).
>
>         Erik
>
>    On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:
>
>      Hello,
>
>      I have found the analysis tool in the admin page to be very useful in
>      understanding my schema. I've made changes to my schema so that a
>      particular case I'm looking at matches properly. I restarted solr,
>      deleted the document from the index, and added it again. But still,
>      when I do a query, the document does not get returned in the results.
>
>      Does anyone have any tips for debugging this sort of issue? What is
>      different between what I see in analysis tool and new documents added
>      to the index?
>
>      Thanks,
>      Justin
>



-- 
Lance Norskog
goks...@gmail.com


Re: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Erick Erickson
I suspect you're running afoul of tokenizers and filters. The parts of your
schema
that you published aren't the ones that really count.

What you probably need to look at is the FieldType definitions, i.e. what
analysis is done for, say, text_ws (see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters).

The general idea here is that Tokenizers break up the incoming stream
according to various rules. The Filters then (potentially) modify each
token in various ways.

Until you have a firm handle on this process, facets are probably a
distraction. You're
better off looking at your index with the admin pages and/or Luke and/or
LukeRequestHandler.

And do be aware that fields you get back from a request (i.e. a search) are
the stored fields,
NOT what's indexed. This may trip you up too...

HTH
Erick

On Wed, Aug 4, 2010 at 5:22 PM, PeterKerk  wrote:

>


XML Format

2010-08-04 Thread twojah


1
1.0
27017
Bracket Ceiling untuk semua merk projector,
panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta
/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html
607136
Nego
7
270/27017/bracket_lcd_plasma_3a-1274291780.JPG
2010-05-19 17:56:45
[UPDATE] BRACKET Projector dan LCD/PLASMA TV
1
0
0
0
0
0
0
0
28


Above is my recent XML result list. I can't search it; for example, searching
for the word "bracket" returns an empty list. After searching on the
internet, I found out that there is a mistake in my XML schema. I should
change the schema so it will return the list below (see the bold part;
the bold formatting was lost in this archive):


1
1.0
27017
Bracket Ceiling untuk semua merk projector,
panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta
/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html
607136
Nego
7
270/27017/bracket_lcd_plasma_3a-1274291780.JPG
2010-05-19 17:56:45
[UPDATE] BRACKET Projector dan LCD/PLASMA
TV
1
0
0
0
0
0
0
0
28


My question is: how do I change my schema so it will return a list like the
bolded one above?
Thanks in advance.
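Hard to say without seeing the schema (the XML was stripped by the archive), but the usual cause of this symptom is declaring the field with the exact-match string type instead of an analyzed text type. A sketch, with the field name invented for illustration:

```xml
<!-- a string field only matches the whole value verbatim, so
     q=bracket finds nothing... -->
<field name="AUC_TITLE" type="string" indexed="true" stored="true"/>

<!-- ...whereas a tokenized text field matches individual words -->
<field name="AUC_TITLE" type="text" indexed="true" stored="true"/>
```

Only one declaration would be present; after changing it, the documents must be re-indexed.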


Re: Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread jayendra patil
ContentStreamUpdateRequest seems to read the file contents and transfer it
over http, which slows down the indexing.

Try Using StreamingUpdateSolrServer with stream.file param @
http://wiki.apache.org/solr/SolrPerformanceFactors#Embedded_vs_HTTP_Post

e.g.

SolrServer server = new StreamingUpdateSolrServer("Solr Server URL", 20, 8);
UpdateRequest req = new UpdateRequest("/update/extract");
ModifiableSolrParams params = new ModifiableSolrParams();
// stream.file makes Solr read the file from a local path itself,
// instead of the client pushing the bytes over HTTP
params.add("stream.file", "local file path");
params.set("literal.id", value);
req.setParams(params);
server.request(req);
server.commit();

Regards,
Jayendra

On Wed, Aug 4, 2010 at 3:01 PM, Tod  wrote:



how to take a value from the query result

2010-08-04 Thread twojah

This is my query in the browser's address bar:
http://172.16.17.126:8983/search/select/?q=AUC_ID:607136

and this is the result in the browser page:
...

1
1.0
576
27017
Bracket Ceiling untuk semua merk projector,
panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta
/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html
607136
Nego
7
270/27017/bracket_lcd_plasma_3a-1274291780.JPG
2010-05-19 17:56:45
[UPDATE] BRACKET Projector dan LCD/PLASMA TV
1
0
0
0
0
0
0
0
28


I want to get the AUC_CAT value (576) and use it in my PHP code. How can I
get that value?
Please help.
Thanks in advance.


Re: No "group by"? looking for an alternative.

2010-08-04 Thread Mickael Magniez

Thanks for your response.

Unfortunately, I don't think it will be enough. In fact, I have many
products other than shoes in my index, with many other facet fields.

I simplified my schema: in reality the facets are dynamic fields.