Re: Question on query syntax

2007-07-12 Thread Chris Hostetter

: Solr can process a query that has the NOT operator ("-") at the head.
: If Solr finds it, Solr adds MatchAllDocsQuery automatically
: at the front of that query, as follows:

that's not strictly true ... Solr doesn't *add* a MatchAllDocsQuery if the
query is entirely prohibitive; instead, Solr executes a MatchAllDocsQuery
and then filters that by the DocSet returned by the "absolute value" of
the original query.

the end result should be functionally equivalent, but this approach caches
better (both "-text:foo" and "text:foo" are cached the same) ... the
downside is that the debugging info for purely prohibitive queries is
currently incorrect (see SOLR-119)
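Hoss's description of how a purely prohibitive query is handled can be sketched with a toy set model (illustrative Python only; Solr's real implementation is Java operating on Lucene DocSets):

```python
# Illustrative model (not Solr's actual code): for a purely prohibitive query
# like "-text:foo", run MatchAllDocsQuery and then filter by the DocSet of the
# query's "absolute value" (the same clauses with the prohibition removed).

def matching_docs(index, term):
    """DocSet of documents whose text contains the term."""
    return {doc_id for doc_id, text in index.items() if term in text}

def pure_negative_query(index, term):
    all_docs = set(index)                  # MatchAllDocsQuery
    positive = matching_docs(index, term)  # abs(-text:term) == text:term
    return all_docs - positive             # filter out the prohibited docs

index = {1: "foo bar", 2: "bar baz", 3: "foo baz"}
# "-text:foo" should return every doc that does NOT contain "foo"
print(sorted(pure_negative_query(index, "foo")))  # → [2]
```

Because the cache key is the positive DocSet, both "-text:foo" and "text:foo" reuse the same cached entry, as Hoss notes.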



-Hoss



Re: Need question to configure Log4j for solr

2007-07-12 Thread Chris Hostetter

: The one issue I ran into was with daily rolling log files - maybe I
: missed it, but I didn't find that functionality in the JDK logging
: package, however it is in log4j.
:
: I'm not advocating a change, just noting this. We worked around it by
: leveraging Resin's support for wrapping a logger (set up for daily
: rolling log files) around a webapp.

as i recall Resin doesn't wrap JDK logging -- they provide subclasses of
java.util.logging.Handler that do what they want (log rotation, writing
to syslog, etc...) and provide their own LogManager subclass so that
logging can be configured in their resin.conf.

(basically they do all the things the java logging spec was designed to
let people do to have control over logging without needing any third-party
logging frameworks ... it's just too bad JDK logging doesn't include all
the nice Handlers and Formatters and configuration helper utilities that
would make the API seem more useful to people comparing it with log4j and
the other abstraction frameworks)



-Hoss



Re: Question on query syntax

2007-07-12 Thread Mike Klaas

On 12-Jul-07, at 6:25 PM, Lance Lance wrote:


A simplified version of the problem:

text -(collection:pile1)

works, while

text (-collection:pile1)

finds zero records.


see my other message.  You cannot create a (sub)query with only  
prohibited clauses.  The second query asks:


Q = find docs containing 'text' or matching X;
X = find docs that don't match 'collection:pile1' # invalid query

note that if "-collection:pile1" is the main query, Solr detects this  
case and handles it.


-Mike


RE: Question on query syntax

2007-07-12 Thread Lance Lance
A simplified version of the problem:
 
text -(collection:pile1)
 
works, while
 
text (-collection:pile1)
 
finds zero records.
 
lance
 

  _  

From: Lance Lance [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 12, 2007 5:58 PM
To: 'solr-user@lucene.apache.org'
Subject: Question on query syntax


Are there any known bugs in the syntax parser? We're using lucene-2.2.0 and
Solr 1.2.
 
We have documents with searchable text and a field 'collection'.
 
This query works as expected, finding everything except for collections
'pile1' and 'pile2'.
 
text -(collection:pile1 OR collection:pile2)
 
When we apply De Morgan's Law, we get 0 records:
 
text (-collection:pile1 AND -collection:pile2)
 
This should return all records, but it returns nothing:
 
text (-collection:pile1 OR -collection:pile2)
 
Thanks,
 
Lance
 


Re: Question on query syntax

2007-07-12 Thread Koji Sekiguchi

Lance,

I think you are right. I ran into the same problem before.

> -(collection:pile1 OR collection:pile2)

Solr can process a query that has the NOT operator ("-") at the head.
If Solr finds it, Solr adds MatchAllDocsQuery automatically
at the front of that query, as follows:

MatchAllDocsQuery -(collection:pile1 OR collection:pile2)

Then Lucene can process this query properly.

However, Solr doesn't add MatchAllDocsQuery if the query doesn't
have NOT operator in the head.

To avoid this problem, you can add "*:*" at the front of your query:

(*:* -collection:pile1 AND -collection:pile2)
(*:* -collection:pile1 OR -collection:pile2)
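Koji's workaround can be automated with a naive string-level helper (hypothetical code, not part of Solr or solr.py; a real fix would need proper query parsing, since this split ignores nested parentheses and phrases):

```python
# Naive sketch: if every top-level clause of a subquery is prohibited
# ("-..."), prepend "*:*" so Lucene has at least one positive clause.

def fix_pure_negative(subquery):
    clauses = subquery.split()
    # treat boolean keywords as connectors, not clauses
    real = [c for c in clauses if c not in ("AND", "OR", "NOT")]
    if real and all(c.startswith("-") for c in real):
        return "*:* " + subquery
    return subquery

print(fix_pure_negative("-collection:pile1 AND -collection:pile2"))
# → *:* -collection:pile1 AND -collection:pile2
print(fix_pure_negative("text -collection:pile1"))
# → text -collection:pile1
```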

Thank you,

Koji


Lance Lance wrote:

Are there any known bugs in the syntax parser? We're using lucene-2.2.0 and
Solr 1.2.
 
We have documents with searchable text and a field 'collection'.
 
This query works as expected, finding everything except for collections

'pile1' and 'pile2'.
 
text -(collection:pile1 OR collection:pile2)
 
When we apply De Morgan's Law, we get 0 records:
 
text (-collection:pile1 AND -collection:pile2)
 
This should return all records, but it returns nothing:
 
text (-collection:pile1 OR -collection:pile2)
 
Thanks,
 
Lance
 

  




RE: Question on query syntax

2007-07-12 Thread Lance Lance
Ok, here's a simpler version:
 

  _  

From: Lance Lance [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 12, 2007 5:58 PM
To: 'solr-user@lucene.apache.org'
Subject: Question on query syntax


Are there any known bugs in the syntax parser? We're using lucene-2.2.0 and
Solr 1.2.
 
We have documents with searchable text and a field 'collection'.
 
This query works as expected, finding everything except for collections
'pile1' and 'pile2'.
 
text -(collection:pile1 OR collection:pile2)
 
When we apply De Morgan's Law, we get 0 records:
 
text (-collection:pile1 AND -collection:pile2)
 
This should return all records, but it returns nothing:
 
text (-collection:pile1 OR -collection:pile2)
 
Thanks,
 
Lance
 


Re: Question on query syntax

2007-07-12 Thread Mike Klaas

On 12-Jul-07, at 5:58 PM, Lance Lance wrote:

Are there any known bugs in the syntax parser? We're using  
lucene-2.2.0 and

Solr 1.2.

We have documents with searchable text and a field 'collection'.

This query works as expected, finding everything except for  
collections

'pile1' and 'pile2'.

text -(collection:pile1 OR collection:pile2)

When we apply De Morgan's Law, we get 0 records:

text (-collection:pile1 AND -collection:pile2)

This should return all records, but it returns nothing:

text (-collection:pile1 OR -collection:pile2)


Lucene's "boolean" operators are not true boolean operators.   
Instead, every clause is one of:


OPTIONAL
REQUIRED
PROHIBITED

for a query (or parenthesized subquery) to match, all REQUIRED
clauses must match, zero PROHIBITED clauses may match, and if there
are no REQUIRED clauses, at least one OPTIONAL clause must match.  You
cannot have only PROHIBITED clauses.


Now, the syntax for each is (nothing), +, -, and they can be applied  
to entire subqueries using brackets:


+hello -(goodbye -night)

returns docs that have hello, and do not have (goodbye without night)

In lucene, AND/OR/NOT are syntactic sugar that translates clauses to  
the above form.  However, it imperfectly matches people's (rational)  
expectations of how boolean operators work.  Also, brackets _create  
subqueries_, not just group operators.  I suggest that AND and OR  
never be used programmatically, if possible.


Try these alternatives:

docs (must) containing 'text' that do not match (col=pile1 or col=pile2)

text -(collection:pile1 collection:pile2)


same as above

text -collection:pile1 -collection:pile2


docs (must) contain 'text' that (must) match (col=pile1 or col=pile2)

+text +(collection:pile1 collection:pile2)


Note in the last example, the + is necessary before the text because  
otherwise it would be optional and not required (as there are other  
required clauses).
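Mike's matching rules can be modeled for a flat clause list (a toy Python sketch for intuition only; real Lucene builds nested BooleanQuery objects, and this ignores scoring and subqueries):

```python
# Toy model of Lucene's clause semantics: each clause is OPTIONAL (SHOULD),
# REQUIRED (MUST), or PROHIBITED (MUST_NOT).

OPTIONAL, REQUIRED, PROHIBITED = "SHOULD", "MUST", "MUST_NOT"

def matches(doc_terms, clauses):
    """clauses: list of (occur, term); doc_terms: set of terms in the doc."""
    required   = [t for occ, t in clauses if occ == REQUIRED]
    prohibited = [t for occ, t in clauses if occ == PROHIBITED]
    optional   = [t for occ, t in clauses if occ == OPTIONAL]
    if any(t in doc_terms for t in prohibited):
        return False
    if required:
        return all(t in doc_terms for t in required)
    # no REQUIRED clauses: at least one OPTIONAL clause must match
    return any(t in doc_terms for t in optional)

# "text -collection:pile1" as a flat clause list:
q = [(OPTIONAL, "text"), (PROHIBITED, "pile1")]
print(matches({"text"}, q))            # → True
print(matches({"text", "pile1"}, q))   # → False
# A clause list with only PROHIBITED clauses can never match anything:
print(matches({"text"}, [(PROHIBITED, "pile1")]))  # → False
```

The last line shows why a parenthesized subquery containing only "-..." clauses finds zero records.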


-Mike






Re: Need question to configure Log4j for solr

2007-07-12 Thread Ken Krugler

: the trouble comes when you integrate third-party stuff depending on
: log4j (as I currently do). Having said this you have a strong point when
: looking at http://www.qos.ch/logging/classloader.jsp

there have been several discussions about changing the logger used by Solr
... the best summation i can give to these discussions is:

  * JDK logging is universal
  * using any other logging framework would add a dependency without
adding functionality


The one issue I ran into was with daily rolling log files - maybe I 
missed it, but I didn't find that functionality in the JDK logging 
package, however it is in log4j.


I'm not advocating a change, just noting this. We worked around it by 
leveraging Resin's support for wrapping a logger (set up for daily 
rolling log files) around a webapp.


-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"


Question on query syntax

2007-07-12 Thread Lance Lance
Are there any known bugs in the syntax parser? We're using lucene-2.2.0 and
Solr 1.2.
 
We have documents with searchable text and a field 'collection'.
 
This query works as expected, finding everything except for collections
'pile1' and 'pile2'.
 
text -(collection:pile1 OR collection:pile2)
 
When we apply De Morgan's Law, we get 0 records:
 
text (-collection:pile1 AND -collection:pile2)
 
This should return all records, but it returns nothing:
 
text (-collection:pile1 OR -collection:pile2)
 
Thanks,
 
Lance
 


Re: Deleting from a very active index

2007-07-12 Thread Yonik Seeley

I was going to say... that exception should never happen since solr
controls and synchronizes adds/deletes at a higher layer (with only
one solr instance accessing an index, we don't really need lucene
level locking at all).

One major cause of this is a crash/restart of the JVM leaving a stale
lock file behind.  Those can be removed automatically at startup with
a tweak in solrconfig.xml
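The solrconfig.xml tweak Yonik mentions is presumably the unlockOnStartup setting in the mainIndex section (sketch below; only safe when a single Solr instance writes to the index):

```xml
<mainIndex>
  <!-- remove a stale write.lock left behind by a crashed JVM at startup -->
  <unlockOnStartup>true</unlockOnStartup>
</mainIndex>
```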

-Yonik


On 7/12/07, Matthew Runo <[EMAIL PROTECTED]> wrote:

It looks like somehow the write.lock got hung. I manually removed the
lock, and now things are good.

Very strange.


Re: Deleting from a very active index

2007-07-12 Thread Matthew Runo
It looks like somehow the write.lock got hung. I manually removed the  
lock, and now things are good.


Very strange.

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Jul 12, 2007, at 1:32 PM, Matthew Runo wrote:


Hello!

I'm trying to remove a whole brand from our search index, but at
the same time we're also running an import for others. This means
the index is extremely active at this time.


I am getting a lock timeout error, but not sure what to do about  
it... should I just keep trying till it can get the lock to do the  
delete?


[EMAIL PROTECTED]:/home/mruno]$ curl http://search1.zappos.com:8080/solr/update --silent --data-binary "<delete><query>brand:Harley-Davidson</query></delete>" -H 'Content-type:text/xml; charset=utf-8'
org.apache.solr.core.SolrException: Error  
deleting doc# 966
at org.apache.solr.update.UpdateHandler 
$DeleteHitCollector.collect(UpdateHandler.java:175)

at org.apache.lucene.search.Scorer.score(Scorer.java:49)
at org.apache.lucene.search.IndexSearcher.search 
(IndexSearcher.java:146)
at org.apache.solr.search.SolrIndexSearcher.search 
(SolrIndexSearcher.java:407)

at org.apache.lucene.search.Searcher.search(Searcher.java:118)
at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery 
(DirectUpdateHandler2.java:343)
at org.apache.solr.handler.XmlUpdateRequestHandler.update 
(XmlUpdateRequestHandler.java:260)
at  
org.apache.solr.handler.XmlUpdateRequestHandler.doLegacyUpdate 
(XmlUpdateRequestHandler.java:355)
at org.apache.solr.servlet.SolrUpdateServlet.doPost 
(SolrUpdateServlet.java:58)
at javax.servlet.http.HttpServlet.service(HttpServlet.java: 
710)
at javax.servlet.http.HttpServlet.service(HttpServlet.java: 
803)
at  
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter 
(ApplicationFilterChain.java:269)
at org.apache.catalina.core.ApplicationFilterChain.doFilter 
(ApplicationFilterChain.java:188)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter 
(SolrDispatchFilter.java:185)
at  
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter 
(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter 
(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke 
(StandardWrapperValve.java:210)
at org.apache.catalina.core.StandardContextValve.invoke 
(StandardContextValve.java:174)
at org.apache.catalina.core.StandardHostValve.invoke 
(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke 
(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke 
(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service 
(CoyoteAdapter.java:151)
at org.apache.coyote.http11.Http11Processor.process 
(Http11Processor.java:870)
at org.apache.coyote.http11.Http11BaseProtocol 
$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java: 
665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket 
(PoolTcpEndpoint.java:528)
at  
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt 
(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool 
$ControlRunnable.run(ThreadPool.java:685)

at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock  
obtain timed out: SimpleFSLock@/opt/solr/data/index/write.lock

at org.apache.lucene.store.Lock.obtain(Lock.java:70)
at org.apache.lucene.index.IndexReader.acquireWriteLock 
(IndexReader.java:626)
at org.apache.lucene.index.IndexReader.deleteDocument 
(IndexReader.java:660)
at org.apache.solr.update.UpdateHandler 
$DeleteHitCollector.collect(UpdateHandler.java:170)

... 27 more

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++






Deleting from a very active index

2007-07-12 Thread Matthew Runo

Hello!

I'm trying to remove a whole brand from our search index, but at the
same time we're also running an import for others. This means the
index is extremely active at this time.


I am getting a lock timeout error, but not sure what to do about  
it... should I just keep trying till it can get the lock to do the  
delete?


[EMAIL PROTECTED]:/home/mruno]$ curl http://search1.zappos.com:8080/solr/update --silent --data-binary "<delete><query>brand:Harley-Davidson</query></delete>" -H 'Content-type:text/xml; charset=utf-8'
org.apache.solr.core.SolrException: Error deleting  
doc# 966
at org.apache.solr.update.UpdateHandler 
$DeleteHitCollector.collect(UpdateHandler.java:175)

at org.apache.lucene.search.Scorer.score(Scorer.java:49)
at org.apache.lucene.search.IndexSearcher.search 
(IndexSearcher.java:146)
at org.apache.solr.search.SolrIndexSearcher.search 
(SolrIndexSearcher.java:407)

at org.apache.lucene.search.Searcher.search(Searcher.java:118)
at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery 
(DirectUpdateHandler2.java:343)
at org.apache.solr.handler.XmlUpdateRequestHandler.update 
(XmlUpdateRequestHandler.java:260)
at  
org.apache.solr.handler.XmlUpdateRequestHandler.doLegacyUpdate 
(XmlUpdateRequestHandler.java:355)
at org.apache.solr.servlet.SolrUpdateServlet.doPost 
(SolrUpdateServlet.java:58)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
at  
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter 
(ApplicationFilterChain.java:269)
at org.apache.catalina.core.ApplicationFilterChain.doFilter 
(ApplicationFilterChain.java:188)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter 
(SolrDispatchFilter.java:185)
at  
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter 
(ApplicationFilterChain.java:215)
at org.apache.catalina.core.ApplicationFilterChain.doFilter 
(ApplicationFilterChain.java:188)
at org.apache.catalina.core.StandardWrapperValve.invoke 
(StandardWrapperValve.java:210)
at org.apache.catalina.core.StandardContextValve.invoke 
(StandardContextValve.java:174)
at org.apache.catalina.core.StandardHostValve.invoke 
(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke 
(ErrorReportValve.java:117)
at org.apache.catalina.core.StandardEngineValve.invoke 
(StandardEngineValve.java:108)
at org.apache.catalina.connector.CoyoteAdapter.service 
(CoyoteAdapter.java:151)
at org.apache.coyote.http11.Http11Processor.process 
(Http11Processor.java:870)
at org.apache.coyote.http11.Http11BaseProtocol 
$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket 
(PoolTcpEndpoint.java:528)
at  
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt 
(LeaderFollowerWorkerThread.java:81)
at org.apache.tomcat.util.threads.ThreadPool 
$ControlRunnable.run(ThreadPool.java:685)

at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock  
obtain timed out: SimpleFSLock@/opt/solr/data/index/write.lock

at org.apache.lucene.store.Lock.obtain(Lock.java:70)
at org.apache.lucene.index.IndexReader.acquireWriteLock 
(IndexReader.java:626)
at org.apache.lucene.index.IndexReader.deleteDocument 
(IndexReader.java:660)
at org.apache.solr.update.UpdateHandler 
$DeleteHitCollector.collect(UpdateHandler.java:170)

... 27 more

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++




Re: snappuller copying to wrong directory?

2007-07-12 Thread Bill Au

That change doesn't have anything to do with where snappuller places the
snapshots.  Is the environment variable data_dir set up correctly in
conf/scripts.conf?  That's where snappuller puts the snapshots.

Bill

On 7/12/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote:


I've been running solr replication for several months with no issues
but recently had an instance where snappuller was running for about
1.5 hours. rsync was still active, so it was still copying data. I
also noticed that there was a snapshot.200707 directory inside of
the main index directory.

I'm running an early version of snappuller. Could there have been any
changes to fix a problem like this?

I noticed this one in svn:
revision 529471
"avoid recursive find, test for maxdepth support, filter snapshot
names on master: SOLR-207"

thanks,
Kevin



Re: custom sorting for multivalued field

2007-07-12 Thread Chris Hostetter

: Is it possible to assign a custom sorting value for
: each of the values in the multivalued field?  So that
: the document gets sorted differently, depending on the
: matched value in the multivalued field.

Sorting happens completely independently of matching ... there is no
mechanism available in the underlying Lucene code to allow the Sorting
logic to know why a particular document is a match.

: The other approach would be to store each
: document/keyword pair as a separate document with the
: sorting value as an explicit field.  Is it possible to
: filter the results on the Solr end (based on the
: relevancy of the matched keyword), so that the same
: original document doesn't appear in the result set
: twice?

can you elaborate a bit more on what exactly it is you are trying to
achieve? ... i'm having a hard time understanding the motivation for
sorting on a keyword field where the sort order is on the keyword that
matches .. for simple single-word queries all documents will sort
identically; for multi-word queries you might as well just search on each
word separately and concatenate the result sets in order -- except in the
case where a single document matches on more than one of your query terms,
but you've already said you just want it to appear once ... but why would
you want documents in this order in the first place?

my first assumption would be that you just want docs which match on a very
rare keyword to come first.  if you are only doing searches on this keyword
field, then regular sort by score should do what you want ... but you
might want to omitNorms and maybe change the coordFactor in your
similarity.




-Hoss



Re: Deleting from index via web

2007-07-12 Thread Mike Klaas

On 12-Jul-07, at 6:33 AM, vanderkerkoff wrote:



I/my boss and me worked it out.

The delete function in solr.py looks like this

    def delete(self, id):
        xstr = '<delete><id>'+self.escapeVal(`id`)+'</id></delete>'
        return self.doUpdateXML(xstr)

As we're not passing an integer it gets all c*nty booby, technical term.


So if I rewrite the delete to be like this

    def delete(self, id):
        xstr = '<delete><id>'+ id + '</id></delete>'
        print xstr
        return self.doUpdateXML(xstr)

It works fine.

There's no need for escapeVal, as I know the words I'll be sending
prior to the ID; in fact, I'm not sure why escapeVal is in there at all
if you can't send it non-integer values.

Maybe someone can enlighten us.


I would suggest replacing it with

self.escapeVal(unicode(id))

backticks are equivalent to repr(), which does the wrong thing for  
strings.
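The difference is easy to see interactively: repr() keeps the Python quote characters inside the value, so they leak into the delete XML (sketch below; in modern Python 3 use str(), since both backticks and unicode() are gone):

```python
# Why backticks (equivalent to repr()) break the delete XML: repr() of a
# string includes the quote characters, which then end up inside <id>.

doc_id = "news:39"
print(repr(doc_id))   # → 'news:39'   (quotes are part of the value!)
print(str(doc_id))    # → news:39

bad  = "<delete><id>" + repr(doc_id) + "</id></delete>"
good = "<delete><id>" + str(doc_id)  + "</id></delete>"
print(bad)   # → <delete><id>'news:39'</id></delete>
print(good)  # → <delete><id>news:39</id></delete>
```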


-Mike


Re: Need question to configure Log4j for solr

2007-07-12 Thread Ryan McKinley

Check two related discussions:
http://www.nabble.com/logging---slf4j--tf3366438.html#a9366144
where I suggested using slf4j

and:
http://www.nabble.com/Changing-Logging-in-Solr-to-Apache-Commons-Logging-tf3484843.html#a9744439

I'm still for switching to slf4j, but am not pushing it, as JDK logging is fine.


Siegfried Goeschl wrote:

Hi folks,

would using commons-logging be an improvement? It is a common
requirement to hook up a different logging infrastructure ..


Cheers,

Siegfried Goeschl

Erik Hatcher wrote:


On Jul 11, 2007, at 9:07 PM, solruser wrote:
How do I configure solr to use log4j logging? I am able to configure
tomcat 5.5.23 to use log4j, but I could not get solr to use log4j. I
have 3 contexts of solr running in tomcat which refer to the war file
in commons.


Solr uses standard JDK logging.  I'm sure it could be bridged to log4j 
somehow, but rather I'd recommend you just configure JDK logging how 
you'd like.


Erik









Re: Need question to configure Log4j for solr

2007-07-12 Thread Chris Hostetter

: the trouble comes when you integrate third-party stuff depending on
: log4j (as I currently do). Having said this you have a strong point when
: looking at http://www.qos.ch/logging/classloader.jsp

there have been several discussions about changing the logger used by Solr
... the best summation i can give to these discussions is:

  * JDK logging is universal
  * using any other logging framework would add a dependency without
adding functionality
  * there are too many different frameworks, each with their own pros/cons and
supporters/objectors, that switching to any of them would be an uphill
social battle as well as a code effort expenditure.
  * as a webapp, Solr runs in a Servlet Container - any third party
logging framework we may pick to use could have bad interactions with
some Servlet Containers (ie: classloader issues, etc...) but all
servlet containers must be able to handle JDK logging.

Some reading that should be considered mandatory before any further
discussion...

http://www.nabble.com/logging---slf4j--tf3366438.html#a9366144
http://www.nabble.com/Changing-Logging-in-Solr-to-Apache-Commons-Logging-tf3484843.html#a9782039

Specifically with regard to commons-logging, note the last paragraph of this
URL...

http://wiki.apache.org/jakarta-commons/Commons_Logging_FUD

"...In fact, there are very limited circumstances in which Commons Logging
is useful. If you're building a stand-alone application, don't use
commons-logging. ..."


-Hoss



snappuller copying to wrong directory?

2007-07-12 Thread Kevin Lewandowski

I've been running solr replication for several months with no issues
but recently had an instance where snappuller was running for about
1.5 hours. rsync was still active, so it was still copying data. I
also noticed that there was a snapshot.200707 directory inside of
the main index directory.

I'm running an early version of snappuller. Could there have been any
changes to fix a problem like this?

I noticed this one in svn:
revision 529471
"avoid recursive find, test for maxdepth support, filter snapshot
names on master: SOLR-207"

thanks,
Kevin


Re: Embedded Solr with Java 1.4.x

2007-07-12 Thread Ryan McKinley


solr requires 1.5.  It uses generics and a bunch of other 1.5 code.


Jery Cook wrote:

QUESTION:


Jeryl Cook
 
^ Pharaoh ^

http://pharaohofkush.blogspot.com/
I need to make solr work with java 1.4; the organization I work for has not
approved java 1.5 for the network... Before I download the source code and
see if this is possible, what do you guys think the level of effort will be?





[Jery Cook] 








Re: Facet Field Limits

2007-07-12 Thread Yonik Seeley

On 7/12/07, Andrew Nagy <[EMAIL PROTECTED]> wrote:

My question is: Is there a way to change the limit per field?  Let's say on 
facet 2 I would like to display 10 values instead of 5 like the other facets.



From the wiki: http://wiki.apache.org/solr/SimpleFacetParameters


Parameters
These are the parameters used to drive the Simple Faceting behavior,
note that some parameters may be overridden on a per-field basis with
the following syntax:
   * f.<fieldName>.<facetParam>=<value>
     e.g. f.category.facet.limit=5
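Andrew's case from this thread might look like the following when building the request query string (illustrative Python; the field names are just the thread's example, not a fixed schema):

```python
# Sketch of the per-field override: the generic facet.limit applies to every
# facet field, while f.<field>.facet.limit overrides it for one field.

from urllib.parse import urlencode

params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": ["author", "category"],  # repeated parameter
    "facet.limit": 5,                       # default for all facet fields
    "f.category.facet.limit": 10,           # override just for 'category'
}
query_string = urlencode(params, doseq=True)
print(query_string)
```

With these parameters, 'author' returns at most 5 facet values while 'category' returns up to 10.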

-Yonik


Facet Field Limits

2007-07-12 Thread Andrew Nagy
Hello, I would like to generate a list of facets, let's say on 5 fields.  I 
have the facet limit set to 5 so that for each of the 5 fields there will only 
by up to 5 values.

My question is: Is there a way to change the limit per field?  Let's say on 
facet 2 I would like to display 10 values instead of 5 like the other facets.

Thanks!
Andrew


Re: Embedded Solr with Java 1.4.x

2007-07-12 Thread Yonik Seeley

Oh, and please don't cross-post :-)

On 7/12/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 7/12/07, Jery Cook <[EMAIL PROTECTED]> wrote:
> http://pharaohofkush.blogspot.com/
> I need to make solr work with java 1.4; the organization I work for has not
> approved java 1.5 for the network... Before I download the source code and
> see if this is possible, what do you guys think the level of effort will be?

1) push your organization to get into the 21st century ;-)
2) start with some of the tools available that can convert 1.5 classes to 1.4

If neither (1) or (2) works, the effort level would probably be substantial.

-Yonik



Re: Embedded Solr with Java 1.4.x

2007-07-12 Thread Yonik Seeley

On 7/12/07, Jery Cook <[EMAIL PROTECTED]> wrote:

http://pharaohofkush.blogspot.com/
I need to make solr work with java 1.4; the organization I work for has not
approved java 1.5 for the network... Before I download the source code and
see if this is possible, what do you guys think the level of effort will be?


1) push your organization to get into the 21st century ;-)
2) start with some of the tools available that can convert 1.5 classes to 1.4

If neither (1) or (2) works, the effort level would probably be substantial.

-Yonik


Embedded Solr with Java 1.4.x

2007-07-12 Thread Jery Cook
QUESTION:


Jeryl Cook
 
^ Pharaoh ^
http://pharaohofkush.blogspot.com/
I need to make solr work with java 1.4; the organization I work for has not
approved java 1.5 for the network... Before I download the source code and
see if this is possible, what do you guys think the level of effort will be?





[Jery Cook] 




Re: Need question to configure Log4j for solr

2007-07-12 Thread Siegfried Goeschl

Hi Erik,

the trouble comes when you integrate third-party stuff depending on
log4j (as I currently do). Having said this you have a strong point when
looking at http://www.qos.ch/logging/classloader.jsp


Cheers,

Siegfried Goeschl

Erik Hatcher wrote:


On Jul 12, 2007, at 9:03 AM, Siegfried Goeschl wrote:
would using commons-logging be an improvement? It is a common
requirement to hook up a different logging infrastructure ..


My personal take on it is *adding* a dependency to keep functionality
the same isn't an improvement.  JDK logging, while not with as many
bells and whistles as Commons Logging, log4j, etc, is plenty good enough
and keeps us away from many of the logging JARmageddon headaches.


I'm not against a logging change should others have different opinions
with a strong case for improvement.


Erik





Re: Need question to configure Log4j for solr

2007-07-12 Thread Erik Hatcher


On Jul 12, 2007, at 9:03 AM, Siegfried Goeschl wrote:
would be using commons-logging an improvement? It is a common  
requirement to hook up different logging infrastructure ..


My personal take on it is *adding* a dependency to keep functionality
the same isn't an improvement.  JDK logging, while not with as many
bells and whistles as Commons Logging, log4j, etc, is plenty good
enough and keeps us away from many of the logging JARmageddon headaches.


I'm not against a logging change should others have different
opinions with a strong case for improvement.


Erik



RE: How to run the Embedded Solr Sample

2007-07-12 Thread Kijiji Xu, Ping

1 we have to set the solr home in the main function manually, because there is
some problem setting -Dsolr.solr.home=... in the java command parameters; it
looks like SolrCore didn't read the parameter. I'm not sure about this problem,
so we write in the main function:
Config.setInstanceDir("E:/apache-solr-1.2.0/example/solr");

2 there are some libs needed to run the EmbeddedSolr application, too, so we
copied these libs to our lib folder and added them to the java build
path.

apache-solr-1.2.0/dist/apache-solr-1.2.0.jar

apache-solr-1.2.0/lib/lucene-core-2007-05-20_00-04-53.jar

apache-solr-1.2.0/lib/lucene-analyzers-2007-05-20_00-04-53.jar

apache-solr-1.2.0/lib/lucene-snowball-2007-05-20_00-04-53.jar

apache-solr-1.2.0/lib/lucene-highlighter-2007-05-20_00-04-53.jar

apache-solr-1.2.0/lib/xpp3-1.1.3.4.O.jar

use lucene 2.2's libs, not 2.1's; lucene 2.1 is not supported by apache-solr-1.2.0
-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 11, 2007 21:00
To: solr-user@lucene.apache.org
Subject: Re: How to run the Embedded Solr Sample

> 
>  How can i run this program? 
>  In apache site they said its like sample "example" program. If so where i
> have to place this file in tomcat?
> 

If you are running tomcat, this is *not* the way to use solr.

Using tomcat, check:
http://wiki.apache.org/solr/SolrTomcat


Re: Deleting from index via web

2007-07-12 Thread vanderkerkoff

I/my boss and me worked it out.

The delete function in solr.py looks like this

    def delete(self, id):
        xstr = '<delete><id>'+self.escapeVal(`id`)+'</id></delete>'
        return self.doUpdateXML(xstr)

As we're not passing an integer it gets all c*nty booby, technical term.

So if I rewrite the delete to be like this

    def delete(self, id):
        xstr = '<delete><id>'+ id + '</id></delete>'
        print xstr
        return self.doUpdateXML(xstr)

It works fine.

There's no need for escapeVal, as I know the words I'll be sending prior to
the ID; in fact, I'm not sure why escapeVal is in there at all if you can't
send it non-integer values.

Maybe someone can enlighten us.
-- 
View this message in context: 
http://www.nabble.com/Deleting-from-index-via-web-tf4066903.html#a11560068
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Need question to configure Log4j for solr

2007-07-12 Thread Siegfried Goeschl

Hi folks,

would using commons-logging be an improvement? It is a common
requirement to hook up a different logging infrastructure ..


Cheers,

Siegfried Goeschl

Erik Hatcher wrote:


On Jul 11, 2007, at 9:07 PM, solruser wrote:
How do I configure solr to use log4j logging? I am able to configure
tomcat 5.5.23 to use log4j, but I could not get solr to use log4j. I
have 3 contexts of solr running in tomcat which refer to the war file
in commons.


Solr uses standard JDK logging.  I'm sure it could be bridged to log4j 
somehow, but rather I'd recommend you just configure JDK logging how 
you'd like.


Erik





Re: Deleting from index via web

2007-07-12 Thread vanderkerkoff

ok, I'm now printing out the xstr variable that the delete in solr.py uses
when it's trying to delete.

it's coming out like this
<delete><id>'news:39'</id></delete>

Those quotes look suspicious

Going to work out how to switch more debugging on in solr now so I can see
what's going on exactly
-- 
View this message in context: 
http://www.nabble.com/Deleting-from-index-via-web-tf4066903.html#a11559119
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Deleting from index via web

2007-07-12 Thread vanderkerkoff

Different tactic now

adding like this
idstring = "news:%s" % self.id
c.add(id=idstring,url_t=e_url,body_t=body4solr,title_t=title4solr,summary_t=summary4solr,contact_name_t=contactname4solr)
c.commit(optimize=True)

Goes in fine, search results show an ID of news:36

Delete like this
delidstring = "news:%s" % self.id
c.delete(id=delidstring)
c.commit(optimize=True)

still no joy
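One way to see exactly what reaches Solr is to build the delete message by hand instead of going through the client. A sketch (the helper name is made up; note that `escape` only touches XML-special characters and, unlike repr(), leaves the id text intact):

```python
from xml.sax.saxutils import escape

def delete_xml(doc_id):
    # XML-escape the id (&, <, > would break the update message), but do
    # NOT repr() it -- that is what introduces the stray quotes.
    return '<delete><id>%s</id></delete>' % escape(doc_id)

print(delete_xml('news:36'))
# -> <delete><id>news:36</id></delete>
```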
-- 
View this message in context: 
http://www.nabble.com/Deleting-from-index-via-web-tf4066903.html#a11559113
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Deleting from index via web

2007-07-12 Thread vanderkerkoff

Done some more digging about this

here's my delete code
def delete(self):
    from solr import SolrConnection
    c = SolrConnection(host='localhost:8983', persistent=False)
    e_url = '/news/' + self.created_at.strftime("%Y/%m/%d") + '/' + self.slug
    e_url = e_url.encode('ascii', 'ignore')
    c.delete(id=e_url)
    c.commit(optimize=True)

I get this back from jetty

INFO: delete(id '/news/2007/07/12/pilly') 0 1

It's not deleting the record from the index though, even if I restart jetty.

I'm wondering if I can use URL's as ID's now.

-- 
View this message in context: 
http://www.nabble.com/Deleting-from-index-via-web-tf4066903.html#a11558048
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Need question to configure Log4j for solr

2007-07-12 Thread Erik Hatcher


On Jul 11, 2007, at 9:07 PM, solruser wrote:
How do I configure solr to use log4j logging? I am able to configure tomcat
5.5.23 to use log4j, but I could not get solr to use log4j. I have 3 contexts
of solr running in tomcat which refer to a war file in commons.


Solr uses standard JDK logging.  I'm sure it could be bridged to  
log4j somehow, but instead I'd recommend you just configure JDK  
logging how you'd like.


Erik
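For anyone taking this advice, JDK logging is driven by a `logging.properties` file passed via `-Djava.util.logging.config.file=...`. A sketch (paths and limits below are illustrative); note that `FileHandler` rotates by size and count, not by date, which is exactly the gap discussed in the other thread:

```properties
# java.util.logging configuration; pass with
# -Djava.util.logging.config.file=/path/to/logging.properties
handlers = java.util.logging.FileHandler, java.util.logging.ConsoleHandler
.level = INFO

# FileHandler rotates by size/count only -- daily rolling needs log4j
# or a container-provided Handler (as with Resin above).
java.util.logging.FileHandler.pattern = /var/log/solr/solr_%g.log
java.util.logging.FileHandler.limit = 10000000
java.util.logging.FileHandler.count = 5
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
```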



RE: A few questions regarding multi-word synonyms and parameters encoding

2007-07-12 Thread Ard Schrijvers
Hello,

> 
> but honestly i haven't really tried anything like this ... the code for
> parsing the synonyms.txt file probably splits the individual synonyms on
> whitespace to produce multiple tokens which might screw you up ... you may
> need to get creative (perhaps use a PatternReplaceFilter to encode your
> spaces as "_" before the SynonymFilter and then another one to convert the
> "_" back to " " after the Synonym filter ... kludgy but it might work)
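Sketched as a Solr schema fragment, the kludge quoted above might look like this (assuming your Solr version ships PatternReplaceFilterFactory, and using a KeywordTokenizer so a facet value stays one token; all names below are illustrative and untested):

```xml
<!-- protect multi-word synonyms: turn spaces into "_" before the
     SynonymFilter, then map them back afterwards (sketch, untested;
     entries in synonyms.txt would also use "_" instead of spaces) -->
<fieldtype name="text_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern=" " replacement="_" replace="all"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="_" replacement=" " replace="all"/>
  </analyzer>
</fieldtype>
```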

I had to build exactly this recently, but without Solr, using only Lucene. I 
chose to create a CompressFilter as the last filter, to reduce all tokens into 
one single token (since these were facet fields I knew there were only a couple 
of tokens, not thousands; compressing thousands of tokens into a single one 
might be a problem (not sure))

So for building synonyms on facet fields which can contain multiple tokens, I 
would add your own SynonymAnalyzer that compresses tokens and, when a 
compressed token is found in a synonym map, replaces the token with the synonym. 

So, in your SynonymAnalyzer something like

private Map synonyms; // initialize it

public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = super.tokenStream(fieldName, reader);
    if (fieldName.equals("synonym_field")) {
        result = new CompressFilter(result, synonyms);
    } else if (fieldName.equals("compressed_field")) {
        result = new CompressFilter(result);
    }
    return result;
}

and your CompressFilter:

public class CompressFilter extends TokenFilter {

    private Map synonyms;

    public CompressFilter(TokenStream in, Map synonyms) {
        super(in);
        this.synonyms = synonyms;
    }

    public CompressFilter(TokenStream in) {
        super(in);
    }

    // Concatenate every token in the stream into one token; if a synonym
    // map is present, emit the mapped value (or nothing on a miss).
    public Token next() throws IOException {
        Token t = input.next();
        if (t == null) {
            return null;
        }
        StringBuffer sb = new StringBuffer();
        while (t != null) {
            sb.append(t.termText());
            t = input.next();
        }
        if (synonyms != null) {
            if (synonyms.containsKey(sb.toString())) {
                sb = new StringBuffer((String) synonyms.get(sb.toString()));
            } else {
                return null; // synonym not found
            }
        }
        return new Token(sb.toString(), 0, sb.toString().length());
    }
}

I am not sure though how easy it is to put this into Solr, but I suppose it 
isn't hard. Obviously, I am not sure what happens with the CompressFilter when 
there are *many* tokens in the "synonym_field" field.


Regards Ard


> 
> : Now I want to create a link for each of these values so that the user
> : can filter the results by that title by clicking on the link. For
> : example, if I click on "Software Engineer", the results are now
> : narrowed down to just include records with "Software Engineer" in
> : their title. Since the "title" field can contain special chars like
> : '+', '&' ..., I really can't find a clean way to do this. At the
> : moment, I replace all the spaces by '+' and it seems to work for
> : words like "Software Engineer" (converted to "Software+Engineer").
> : However, "C++ Programmer" is converted to "C+++Programmer", and it
> : doesn't seem to work (returns no results). Any ideas?
> 
> for starters you need to URL encode *all* of the characters, not just the
> spaces ... space escapes to "+" but only because "+" escapes to %2B.
> 
> second, if you are dealing with multi-word values like this in your
> facets, you need to make sure to quote them when doing fq queries too
> (before url encoding) ... so if you have a facet.field "skills" that
> lists "C++ Programmer" as the value, the fq query you want to use
> would be...
>  skills:"C++ Programmer"
> 
> when you URL encode that it should become...
> 
>  fq=skills%3A%22C%2B%2B+Programmer%22
> 
> ...use the echoParams=explicit&debugQuery=true params to see exactly
> what your params look like when they've been URL decoded and what your
> query objects look like once they've been parsed.
> 
> 
> 
> -Hoss
> 
> 
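The encoding rules quoted above are easy to verify with Python's standard library: `quote_plus` turns spaces into `+` precisely because a literal `+` is escaped as `%2B` (the function lives in `urllib.parse` on Python 3; on Python 2 the same helper is in `urllib`):

```python
from urllib.parse import quote_plus

# The raw fq query, quoted because the facet value contains spaces:
fq = 'skills:"C++ Programmer"'

print("fq=" + quote_plus(fq))
# -> fq=skills%3A%22C%2B%2B+Programmer%22
```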


RE: Question about synonyms

2007-07-12 Thread Ard Schrijvers
Hello,

> 
> Briefly, what I'm looking for is a query that launches 
> something like this:
> 
> Giving the user search expression
> "A B C D"
> 
> Generated Lucene query :
> (myfield:I OR myfield:J OR myfield:O OR myfield:K)
> 
> if someone knows a way to reach this goal, please tell me how; I'm
> actually tearing my hair out over this issue and I'd really appreciate
> some help!

IMHO, it does not make much sense to me to rewrite a PhraseQuery like "A B 
C D" into a boolean OR query! 

But, if you really insist that it should work like this, I think it won't be 
too hard: 

You said myfield:("A B C D") is translated into PhraseQuery(myfield:"I J O 
K"). So I think you should start from there: get the term(s) out of the 
translated PhraseQuery and rewrite it into a boolean OR query.

I am, by the way, curious how the PhraseQuery works in combination with 
synonyms, because if my phrase is:

"A B C D",

will it first look for a synonym for "A B C D", then, if none is found, for 
"A B C", then for "B C D", then for "A B" and "C D", and then for the 
individual terms? I think as the phrase grows the number of combinations 
grows pretty fast, doesn't it?

Regards Ard

> 
> Thank you, and thanks to the solr team for this amazing product that
> really improved the performance of our search engine by 100x!
> 
> Laurent
> 


Deleting from index via web

2007-07-12 Thread vanderkerkoff

Hello everyone

We're adding records to our 1.1 index through Django and Python like so,
using the Jetty app.
This is in the save method.

from solr import SolrConnection
c = SolrConnection(host='localhost:8983', persistent=False)
c.add(id=e_url,url_t=e_url,body_t=body4solr,title_t=title4solr,summary_t=summary4solr,contact_name_t=contactname4solr)
c.commit(optimize=True)

I need to write a script to remove the item from the index in the delete
function.  Do I need to supply all the fields like I do on the add, or can I
just somehow say delete all the records where id=e_url?

Something like 

from solr import SolrConnection
c = SolrConnection(host='localhost:8983', persistent=False)
c.delete(* where id=e_url)
c.commit(optimize=True)
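Solr's update handler does accept delete-by-query (`<delete><query>...</query></delete>`), which removes every matching document; whether the bundled solr.py exposes it depends on the client version. A sketch that just builds the raw update message (the helper name is made up):

```python
from xml.sax.saxutils import escape

def delete_by_query_xml(field, value):
    # Quote the value so multi-token ids (URLs, "news:36", ...) match as
    # a phrase, then XML-escape the whole query string for the message.
    query = '%s:"%s"' % (field, value)
    return '<delete><query>%s</query></delete>' % escape(query)

print(delete_by_query_xml('id', '/news/2007/07/12/pilly'))
# -> <delete><query>id:"/news/2007/07/12/pilly"</query></delete>
```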

Any help as always is greatly appreciated.


-- 
View this message in context: 
http://www.nabble.com/Deleting-from-index-via-web-tf4066903.html#a11556220
Sent from the Solr - User mailing list archive at Nabble.com.