TZ parameter

2013-07-07 Thread Matt Mitchell
Hi,

I'm a little stumped on the TZ param for running a date range query. I've
indexed a single doc with a dateTime field value as 2013-07-08T00:00:00Z.
My query is basically this:

?q=date_dt:[* TO 2013-07-07T23:00:00Z]&TZ=America/New_York

From what I'm seeing here:

  http://wiki.apache.org/solr/CoreQueryParameters#TZ

... the date literal I'm passing in should (I think?) be converted to UTC (w/ DST
applied), and end up looking something like 2013-07-08T03:00:00Z -- am I
on the right track here?

When I run that query, the doc-set is empty. I would expect to see my
document in the result, because the date range (UTC via TZ) includes the
document's date_dt time value (UTC). What am I doing wrong here?

Thanks,
Matt
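
(For reference, a minimal SolrJ sketch of issuing this query with the TZ
param -- the field name, date literal, and zone are the ones from the message
above; the server URL is an assumption:)

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class TzQueryExample {
    public static void main(String[] args) throws Exception {
      HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
      SolrQuery q = new SolrQuery("date_dt:[* TO 2013-07-07T23:00:00Z]");
      q.set("TZ", "America/New_York"); // TZ is an ordinary request parameter
      QueryResponse rsp = server.query(q);
      System.out.println(rsp.getResults().getNumFound());
    }
  }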


solr home in jar?

2012-07-17 Thread Matt Mitchell
Hi,

I'd like to bundle up a jar file with a complete solr home and index.
This jar file is a dependency for another application, which uses an
instance of embedded solr, multi-core. Is there any way to have the
application's embedded solr read the configs/index data from the jar
dependency?

I attempted using CoreContainer with a resource loader (and many other
ways), but no luck! Any ideas?

- Matt
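
(One possible approach, sketched under the assumption that the solr home is
extracted from the jar onto disk at startup -- the resource path "solr-home/"
and core name "core0" are hypothetical, and the CoreContainer calls are the
Solr 3.x-era embedded API:)

  import java.io.File;
  import java.io.FileOutputStream;
  import java.io.InputStream;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;

  public class JarSolrHome {
    public static void main(String[] args) throws Exception {
      // copy the solr home out of the jar's classpath into a scratch directory
      File home = new File(System.getProperty("java.io.tmpdir"), "solr-home");
      home.mkdirs();
      copyResource("solr-home/solr.xml", new File(home, "solr.xml"));
      // ... repeat for solrconfig.xml, schema.xml, index files, etc.

      CoreContainer container = new CoreContainer();
      container.load(home.getAbsolutePath(), new File(home, "solr.xml"));
      EmbeddedSolrServer server = new EmbeddedSolrServer(container, "core0");
    }

    static void copyResource(String name, File dest) throws Exception {
      InputStream in = JarSolrHome.class.getClassLoader().getResourceAsStream(name);
      FileOutputStream out = new FileOutputStream(dest);
      byte[] buf = new byte[8192];
      for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
      in.close();
      out.close();
    }
  }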


solr geospatial / spatial4j

2012-03-07 Thread Matt Mitchell
Hi,

I'm researching options for handling a better geospatial solution. I'm
currently using Solr 3.5 for a read-only database, and the
point/radius searches work great. But I'd like to start doing point in
polygon searches as well. I've skimmed through some of the geospatial
jira issues, and read about spatial4j, which is very interesting. I
see on the github page that this will soon be part of lucene, can
anyone confirm this?

I attempted to build the spatial4j demo but no luck. It had problems
finding lucene 4.0-SNAPSHOT, which I guess is because there are no
4.0-SNAPSHOT nightly builds? If anyone knows how I can get around
this, please let me know!

Other than spatial4j, is there a way to do point in polygon searches
with solr 3.5.0 right now? Is there some tricky indexing/querying
strategy that would allow this?

Thanks!

- Matt


KeywordTokenizerFactory and stopwords

2011-06-08 Thread Matt Mitchell
Hi,

I have an autocomplete fieldType that works really well, but because
the KeywordTokenizerFactory (if I understand correctly) is emitting a
single token, the stopword filter will not detect any stopwords.
Anyone know of a way to strip out stopwords when using
KeywordTokenizerFactory? I did try the reg-exp replace filter, but I'm
not sure I want to add a bunch of reg-exps for replacing every
stopword.

Thanks,
Matt

Here's the fieldType definition:

<fieldType name="autocomplete" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
        maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>


Re: KeywordTokenizerFactory and stopwords

2011-06-08 Thread Matt Mitchell
Hi Erik. Yes something like what you describe would do the trick. I
did find this:

http://lucene.472066.n3.nabble.com/Concatenate-multiple-tokens-into-one-td1879611.html

I might try the pattern replace filter with stopwords, even though
that feels kinda clunky.

Matt

On Wed, Jun 8, 2011 at 11:04 AM, Erik Hatcher erik.hatc...@gmail.com wrote:
 This seems like it deserves some kind of collecting TokenFilter(Factory) 
 that will slurp up all incoming tokens and glue them together with a space 
 (and allow separator to be configurable).   Hmmm surprised one of those 
 doesn't already exist.  With something like that you could have a standard 
 tokenization chain, and put it all back together at the end.

        Erik

 On Jun 8, 2011, at 10:59 , Matt Mitchell wrote:

 Hi,

 I have an autocomplete fieldType that works really well, but because
 the KeywordTokenizerFactory (if I understand correctly) is emitting a
 single token, the stopword filter will not detect any stopwords.
 Anyone know of a way to strip out stopwords when using
 KeywordTokenizerFactory? I did try the reg-exp replace filter, but I'm
 not sure I want to add a bunch of reg-exps for replacing every
 stopword.

 Thanks,
 Matt

 Here's the fieldType definition:

 <fieldType name="autocomplete" class="solr.TextField"
     positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
         maxGramSize="50"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
   </analyzer>
 </fieldType>




Solr throwing exception when evicting from filterCache

2011-03-24 Thread Matt Mitchell
I have a recent build of solr (4.0.0.2011.02.25.13.06.24). I am seeing this
error when making a request (with fq's), right at the point where the
eviction count goes from 0 up:

SEVERE: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to
[Lorg.apache.solr.common.util.ConcurrentLRUCache$CacheEntry;

If you then make another request, Solr responds with the expected result.

Is this a bug? Has anyone seen this before? Any tips/help/feedback/questions
would be much appreciated!

Thanks,
Matt


Re: Solr throwing exception when evicting from filterCache

2011-03-24 Thread Matt Mitchell
Here's the full stack trace:

[Ljava.lang.Object; cannot be cast to
[Lorg.apache.solr.common.util.ConcurrentLRUCache$CacheEntry;
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to
[Lorg.apache.solr.common.util.ConcurrentLRUCache$CacheEntry; at
org.apache.solr.common.util.ConcurrentLRUCache$PQueue.myInsertWithOverflow(ConcurrentLRUCache.java:377)
at
org.apache.solr.common.util.ConcurrentLRUCache.markAndSweep(ConcurrentLRUCache.java:329)
at
org.apache.solr.common.util.ConcurrentLRUCache.put(ConcurrentLRUCache.java:144)
at org.apache.solr.search.FastLRUCache.put(FastLRUCache.java:131) at
org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:613)
at
org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:652)
at
org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1233)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1086)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:337)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:431)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:231)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1298) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:340)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at
org.mortbay.jetty.handler.ContextHandlerCollection.h

On Thu, Mar 24, 2011 at 1:54 PM, Matt Mitchell goodie...@gmail.com wrote:

 I have a recent build of solr (4.0.0.2011.02.25.13.06.24). I am seeing this
 error when making a request (with fq's), right at the point where the
 eviction count goes from 0 up:

 SEVERE: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to
 [Lorg.apache.solr.common.util.ConcurrentLRUCache$CacheEntry;

 If you then make another request, Solr responds with the expected result.

 Is this a bug? Has anyone seen this before? Any
 tips/help/feedback/questions would be much appreciated!

 Thanks,
 Matt



embedded solr and tomcat

2011-02-19 Thread Matt Mitchell
I'm considering running an embedded instance of Solr in Tomcat (Amazon's
beanstalk). Has anyone done this before? I'd be very interested in how I can
instantiate Embedded solr in Tomcat. Do I need a resource loader to
instantiate? If so, how?

Thanks,
Matt


api key filtering

2011-01-22 Thread Matt Mitchell
Just wanted to see if others are handling this in some special way, but I
think this is pretty simple.

We have a database of api keys that map to allowed db records. I'm
planning on indexing the db records into solr, along with their api keys in
an indexed, non-stored, multi-valued field. Then, docs that belong to a
particular api key will be fetched using a filter query on api_key.

The only concern of mine is that, what if we end up with 100k api_keys?
Would it be a problem to have 100k non-stored keys in each document? We have
about 500k documents total.

Matt
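
(For illustration, the query side of this with SolrJ -- the api_key field is
the one described above, and the key value would be supplied per request:)

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class ApiKeyFilter {
    // returns only docs indexed with the given key; the fq is cached in
    // Solr's filterCache, so repeated keys are cheap
    static QueryResponse searchForKey(SolrServer server, String apiKey) throws Exception {
      SolrQuery query = new SolrQuery("*:*");
      query.addFilterQuery("api_key:" + apiKey);
      return server.query(query);
    }
  }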


Re: api key filtering

2011-01-22 Thread Matt Mitchell
Hey thanks, I'll definitely have a read. The only problem with this, though,
is that our api is a thin layer of app-code with solr only (no db): we
index data from our sql db into solr and push the index off for
consumption.

The only other idea I had was to send a list of the allowed document ids
along with every solr query, but then I'm sure I'd run into a filter query
limit. Each key could be associated with up to 2k documents, so that's 2k
values in an fq, which would probably be too many for lucene (I think its
limit is 1024).

Matt

On Sat, Jan 22, 2011 at 3:40 PM, Dennis Gearon gear...@sbcglobal.net wrote:

 The only way that you would have that many api keys per record, is if one
 of
 them represented 'public', right? 'public' is a ROLE. Your answer is to use
 RBAC
 style techniques.


 Here are some links that I have on the subject. What I'm thinking of doing
 is:
 Sorry for formatting, Firefox is freaking out. I cut and pasted these from
 an
 email from my sent box. I hope the links came out.


 Part 1


 http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/


 Part2
 Role-based access control in SQL, part 2 at Xaprb





 ACL/RBAC Bookmarks ALL

 UserRbac - symfony - Trac
 A Role-Based Access Control (RBAC) system for PHP
 Appendix C: Task-Field Access
 Role-based access control in SQL, part 2 at Xaprb
 PHP Access Control - PHP5 CMS Framework Development | PHP Zone
 Linux file and directory permissions
 MySQL :: MySQL 5.0 Reference Manual :: C.5.4.1 How to Reset the Root
 Password
 per RECORD/Entity permissions? - symfony users | Google Groups
 Special Topics: Authentication and Authorization | The Definitive Guide to
 Yii |
 Yii Framework

 att.net Mail (gear...@sbcglobal.net)
 Solr - User - Modelling Access Control
 PHP Generic Access Control Lists
 Row-level Model Access Control for CakePHP « some flot, some jet
 Row-level Model Access Control for CakePHP « some flot, some jet
 Yahoo! GeoCities: Get a web site with easy-to-use site building tools.
 Class that acts as a client to a JSON service : JSON « GWT « Java
 Juozas Kaziukėnas devBlog
 Re: [symfony-users] Implementing an existing ACL API in symfony
 php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow
 W3C ACL System
 makeAclTables.sql
 SchemaWeb - Classes And Properties - ACL Schema
 Reardon's Ruminations: Spring Security ACL Schema for Oracle
 trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla
 Acl.php - kohana-mptt - Project Hosting on Google Code
 Asynchronous JavaScript Technology and XML (Ajax) With the Java Platform
 The page cannot be found


  Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better
 idea to learn from others’ mistakes, so you do not have to make them
 yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.



 - Original Message 
 From: Matt Mitchell goodie...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Sat, January 22, 2011 11:48:22 AM
 Subject: api key filtering

 Just wanted to see if others are handling this in some special way, but I
 think this is pretty simple.

 We have a database of api keys that map to allowed db records. I'm
 planning on indexing the db records into solr, along with their api keys in
  an indexed, non-stored, multi-valued field. Then, docs that belong to a
  particular api key will be fetched using a filter query on api_key.

 The only concern of mine is that, what if we end up with 100k api_keys?
 Would it be a problem to have 100k non-stored keys in each document? We
 have
 about 500k documents total.

 Matt




Re: api key filtering

2011-01-22 Thread Matt Mitchell
I think that indexing the access information is going to work nicely, and I
agree that sticking with the simplest/solr way is best. The constraint is
super simple... you can view this set of documents or you can't... based on
an api key: fq=api_key:xxx

Thanks for the feedback on this guys!
Matt

2011/1/22 Jonathan Rochkind rochk...@jhu.edu

 If you COULD solve your problem by indexing 'public', or other tokens from
 a limited vocabulary of document roles, in a field -- then I'd definitely
 suggest you look into doing that, rather than doing odd things with Solr
 instead. If the only barrier is not currently having sufficient logic at the
 indexing stage to do that, then it is going to end up being a lot less of a
 headache in the long term to simply add a layer at the indexing stage to add
 that in, than trying to get Solr to do things outside of its, well,
 'comfort zone'.

 Of course, depending on your requirements, it might not be possible to do
 that, maybe you can't express the semantics in terms of a limited set of
 roles applied to documents. And then maybe your best option really is
 sending an up to 2k element list (not exactly the same list every time,
 presumably) of acceptable documents to Solr with every query, and maybe you
 can get that to work reasonably.  Depending on how many different complete
 lists of documents you have, maybe there's a way to use Solr caches
 effectively in that situation, or maybe that's not even necessary since
 lookup by unique id should be pretty quick anyway, not really sure.

 But if the semantics are possible, much better to work with Solr rather
 than against it, it's going to take a lot less tinkering to get Solr to
 perform well if you can just send an fq=role:public or something, instead of
 a list of document IDs.  You won't need to worry about it, it'll just work,
 because you know you're having Solr do what it's built to do. Totally worth
 a bit of work to add a logic layer at the indexing stage. IMO.
 
 From: Erick Erickson [erickerick...@gmail.com]
 Sent: Saturday, January 22, 2011 4:50 PM
 To: solr-user@lucene.apache.org
 Subject: Re: api key filtering

 1024 is the default number, it can be increased. See MaxBooleanClauses
 in solrconfig.xml

 This shouldn't be a problem with 2K clauses, but expanding it to tens of
 thousands is probably a mistake (but test to be sure).

 Best
 Erick

 On Sat, Jan 22, 2011 at 3:50 PM, Matt Mitchell goodie...@gmail.com
 wrote:

  Hey thanks I'll definitely have a read. The only problem with this
 though,
  is that our api is a thin layer of app-code, with solr only (no db), we
  index data from our sql db into solr, and push the index off for
  consumption.
 
  The only other idea I had was to send a list of the allowed document ids
  along with every solr query, but then I'm sure I'd run into a filter
 query
  limit. Each key could be associated with up to 2k documents, so that's 2k
   values in an fq, which would probably be too many for lucene (I think its
  limit is 1024).
 
  Matt
 
  On Sat, Jan 22, 2011 at 3:40 PM, Dennis Gearon gear...@sbcglobal.net
  wrote:
 
   The only way that you would have that many api keys per record, is if
 one
   of
   them represented 'public', right? 'public' is a ROLE. Your answer is to
  use
   RBAC
   style techniques.
  
  
   Here are some links that I have on the subject. What I'm thinking of
  doing
   is:
   Sorry for formatting, Firefox is freaking out. I cut and pasted these
  from
   an
   email from my sent box. I hope the links came out.
  
  
   Part 1
  
  
  
 
 http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/
  
  
   Part2
   Role-based access control in SQL, part 2 at Xaprb
  
  
  
  
  
   ACL/RBAC Bookmarks ALL
  
   UserRbac - symfony - Trac
   A Role-Based Access Control (RBAC) system for PHP
   Appendix C: Task-Field Access
   Role-based access control in SQL, part 2 at Xaprb
   PHP Access Control - PHP5 CMS Framework Development | PHP Zone
   Linux file and directory permissions
   MySQL :: MySQL 5.0 Reference Manual :: C.5.4.1 How to Reset the Root
   Password
   per RECORD/Entity permissions? - symfony users | Google Groups
   Special Topics: Authentication and Authorization | The Definitive Guide
  to
   Yii |
   Yii Framework
  
   att.net Mail (gear...@sbcglobal.net)
   Solr - User - Modelling Access Control
   PHP Generic Access Control Lists
   Row-level Model Access Control for CakePHP « some flot, some jet
   Row-level Model Access Control for CakePHP « some flot, some jet
   Yahoo! GeoCities: Get a web site with easy-to-use site building tools.
   Class that acts as a client to a JSON service : JSON « GWT « Java
   Juozas Kaziukėnas devBlog
   Re: [symfony-users] Implementing an existing ACL API in symfony
   php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow
   W3C ACL System
   makeAclTables.sql
   SchemaWeb - Classes And Properties - ACL Schema

Re: snapshot-4.0 and maven

2010-10-19 Thread Matt Mitchell
Hey thanks Tommy. To be more specific, I'm trying to use SolrJ in a
clojure project. When I try to use SolrJ using what you showed me, I
get errors saying lucene classes can't be found, etc. Is there a way
to build everything SolrJ (snapshot-4.0) needs into one jar?

Matt

On Mon, Oct 18, 2010 at 11:01 PM, Tommy Chheng tommy.chh...@gmail.com wrote:
 Once you built the solr 4.0 jar, you can use mvn's install command like
 this:

 mvn install:install-file -DgroupId=org.apache -DartifactId=solr
 -Dpackaging=jar -Dversion=4.0-SNAPSHOT -Dfile=solr-4.0-SNAPSHOT.jar
 -DgeneratePom=true

 @tommychheng

 On 10/18/10 7:28 PM, Matt Mitchell wrote:

 I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is
 this possible to do? If so, could someone give me a tip or two on
 getting started?

 Thanks,
 Matt



snapshot-4.0 and maven

2010-10-18 Thread Matt Mitchell
I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is
this possible to do? If so, could someone give me a tip or two on
getting started?

Thanks,
Matt


using score to find high confidence duplicates

2010-10-13 Thread Matt Mitchell
I have a solr index full of documents that contain lots of duplicates.
The duplicates are not exact duplicates though. Each may vary slightly
in content.

After indexing, I have a bit of code that loops through the entire
index just to get what I'm calling target documents. For each target
document, I then send another query to find similar documents to the
target. This similarity query includes a clause to match the target
to itself, so I can have a normalized max score. This was the only way
I could figure out how to reasonably fix the scoring range. The
response always includes the target at the top, and similar documents
afterward. So I take the scores and scale to 0-100, where 100 is
always the target matching itself. So far so good...

What I want to do is create a confidence score threshold, so I can
automatically accept similar documents that have a score above the
threshold. If my query *structure* never changes, but only the values
in the query change... is it possible to produce a reliable
threshold score that I could use?

Hope this makes sense :)

Matt
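
(A sketch of the normalization step described above, assuming the request asks
for fl=*,score and that the top hit is the target matching itself; the 85
cutoff is a made-up value:)

  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;

  public class DuplicateThreshold {
    static final float CUTOFF = 85f; // hypothetical threshold on the 0-100 scale

    static void acceptDuplicates(SolrDocumentList docs) {
      float max = docs.getMaxScore(); // the target matching itself
      for (SolrDocument d : docs) {
        float scaled = 100f * (Float) d.getFieldValue("score") / max;
        if (scaled >= CUTOFF) {
          // automatically accept d as a duplicate of the target
        }
      }
    }
  }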


Re: dynamic stop words?

2010-10-13 Thread Matt Mitchell
Great, thanks Hoss. I'll try dismax out today and see what happens with this.

Matt

On Tue, Oct 12, 2010 at 7:35 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : Is it possible to have certain query terms not affect score, if that
 : same query term is present in a field? For example, I have an index of

 that use case is precisely what the DisjunctionMaxQuery (generated by
 the dismax parser) does for you if you set the tie param to 0

 when one of the words in the query results in a high score in fieldA, the
 contribution to the score from that word in all of the other fields is
 ignored (the tie attribute is multiplied by the score of all the fields
 that are not the max score contribution)


 -Hoss
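
(Spelling out the tie math Hoss describes, with made-up per-field scores; the
DisjunctionMaxQuery score for a word is the max of the field scores plus tie
times the sum of the remaining field scores:)

  double fieldA = 2.0, fieldB = 0.5; // hypothetical scores for one query word
  double tie = 0.0;
  double score = Math.max(fieldA, fieldB) + tie * Math.min(fieldA, fieldB);
  // tie=0.0 -> 2.0 : fieldB's contribution is ignored, as described above
  // tie=1.0 -> 2.5 : a plain sum over all matching fields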



Re: using score to find high confidence duplicates

2010-10-13 Thread Matt Mitchell
No this isn't the MLT, just the standard query parser for now. I did
try the heuristic approach and I might stick with that actually. I ran
the process on known duplicates and created a collection of all
scores. I was then able to see how well the query worked. The scores
seemed focused to one range, which is promising.

I totally forgot about the de-duper, I'll have a look at that and see
if I can get it to work.

Thanks for your help,
Matt

On Wed, Oct 13, 2010 at 3:00 PM, Peter Karich peat...@yahoo.de wrote:
 Hi,

 are you using moreLikeThis for that feature?
 I have no suggestion for a reliable threshold, I think this depends
 on the domain you are operating and is IMO only solvable with a heuristic.
 It also depends on fields, boosts, ...
 It could be that there is a 'score gap' between duplicates and
 non-duplicates
 which you can try to find, but I don't know

 BTW: did you check: http://wiki.apache.org/solr/Deduplication

 If you need deduplication while querying you could determine
 a hashvalue from the procedure above and index that into a different field.
 Then you can use collapse feature on that field to remove duplicates.

 Regards,
 Peter.

 I have a solr index full of documents that contain lots of duplicates.
 The duplicates are not exact duplicates though. Each may vary slightly
 in content.

 After indexing, I have a bit of code that loops through the entire
 index just to get what I'm calling target documents. For each target
 document, I then send another query to find similar documents to the
 target. This similarity query includes a clause to match the target
 to itself, so I can have a normalized max score. This was the only way
 I could figure out how to reasonably fix the scoring range. The
 response always includes the target at the top, and similar documents
 afterward. So I take the scores and scale to 0-100, where 100 is
 always the target matching itself. So far so good...

 What I want to do is create a confidence score threshold, so I can
 automatically accept similar documents that have a score above the
 threshold. If my query *structure* never changes, but only the values
 in the query change... is it possible to produce a reliable
 threshold score that I could use?

 Hope this makes sense :)

 Matt



 --
 http://jetwick.com twitter search prototype




Re: dynamic stop words?

2010-10-12 Thread Matt Mitchell
Thanks for the feedback. I thought about stop words but since I have a
lot of documents spanning lots of different countries, I won't know
all of the possible cities so stop-words could get hard to manage.
Also, the city name is in the same field. I think I might try creating
a new field called name_no_city, and at index time just strip the city
name out?

Matt

On Sat, Oct 9, 2010 at 11:17 AM, Geert-Jan Brits gbr...@gmail.com wrote:
 That might work, although depending on your use-case it might be hard to
 have a good controlled vocab on citynames (hotel metropole bruxelles, hotel
 metropole brussels, hotel metropole brussel, etc.)  Also 'hotel paris
 bruxelles' stinks...

 given your example:

 Doc 1
 name = Holiday  Inn
 city = Denver

 Doc 2
 name = Holiday Inn,  Denver
 city = Denver

 q=name:(Holiday Inn, Denver)

 turning it upside down, perhaps an alternative would be to query on:
 q=name:"Holiday Inn"+city:Denver

 and configure field 'name' in such a way that doc1 and doc2 score the same.
 I believe that must be possible, just not sure how to config it exactly at
 the moment.

 Of course, it depends on your scenario if you have enough knowledge on the
 client side to transform:
 q=name:(Holiday Inn, Denver)  to   q=name:"Holiday Inn"+city:Denver

 Hth,
 Geert-Jan

 2010/10/9 Otis Gospodnetic otis_gospodne...@yahoo.com

 Matt,

 The first thing that came to my mind is that this might be interesting to
 try
 with a dictionary (of city names) if this example is not a made-up one.


 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
  From: Matt Mitchell goodie...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Fri, October 8, 2010 11:22:36 AM
  Subject: dynamic stop words?
 
  Is it possible to have certain query terms not affect score, if that
  same query term is present in a field? For example, I have an index of
  hotels. Each hotel has a name and city. If the name of a hotel has the
  name of the city in its name field, I want to completely ignore
  that and not have it influence score.
 
  Example:
 
  Doc 1
  name = Holiday  Inn
  city = Denver
 
  Doc 2
  name = Holiday Inn,  Denver
  city = Denver
 
  q=name:(Holiday Inn, Denver)
 
  I'd  like those docs to have the same score in the response. I don't
  want Doc2 to  have a higher score, just because it has all of the query
  terms.
 
  Is  this possible without using stop words? I hope this makes  sense!
 
  Thanks,
  Matt
 




Re: dynamic stop words?

2010-10-12 Thread Matt Mitchell
Exactly yep. I think that'll work nicely. Thanks Jonathan,

Matt

On Tue, Oct 12, 2010 at 9:47 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
 You can identify what words are the city name at index time, because they're 
 the ones in the city field, right? So why not just strip those words out at 
 index time?  Create a new field, name_search, and search on that, not name.

 Doc 1
 name = Holiday  Inn
 name_search = Holiday Inn   [analyzed, perhaps lowercase normalized etc]
 city = Denver

 Doc 2
 name = Holiday Inn,  Denver
 name_search = Holiday Inn
 city = Denver

 Jonathan
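
(A sketch of that index-time step -- stripCity is a hypothetical helper, not
something from the thread:)

  import java.util.regex.Pattern;
  import org.apache.solr.common.SolrInputDocument;

  public class NameSearchBuilder {
    // remove the city (and any separating comma/whitespace) from the name,
    // case-insensitively
    static String stripCity(String name, String city) {
      return name.replaceAll("(?i),?\\s*" + Pattern.quote(city), "").trim();
    }

    static SolrInputDocument buildDoc(String name, String city) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("name", name);
      doc.addField("name_search", stripCity(name, city)); // "Holiday Inn, Denver" -> "Holiday Inn"
      doc.addField("city", city);
      return doc;
    }
  }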

 
 From: Matt Mitchell [goodie...@gmail.com]
 Sent: Tuesday, October 12, 2010 9:24 AM
 To: solr-user@lucene.apache.org
 Subject: Re: dynamic stop words?

 Thanks for the feedback. I thought about stop words but since I have a
 lot of documents spanning lots of different countries, I won't know
 all of the possible cities so stop-words could get hard to manage.
 Also, the city name is in the same field. I think I might try creating
 a new field called name_no_city, and at index time just strip the city
 name out?

 Matt

 On Sat, Oct 9, 2010 at 11:17 AM, Geert-Jan Brits gbr...@gmail.com wrote:
 That might work, although depending on your use-case it might be hard to
 have a good controlled vocab on citynames (hotel metropole bruxelles, hotel
 metropole brussels, hotel metropole brussel, etc.)  Also 'hotel paris
 bruxelles' stinks...

 given your example:

 Doc 1
 name = Holiday  Inn
 city = Denver

 Doc 2
 name = Holiday Inn,  Denver
 city = Denver

 q=name:(Holiday Inn, Denver)

 turning it upside down, perhaps an alternative would be to query on:
  q=name:"Holiday Inn"+city:Denver

 and configure field 'name' in such a way that doc1 and doc2 score the same.
 I believe that must be possible, just not sure how to config it exactly at
 the moment.

  Of course, it depends on your scenario if you have enough knowledge on the
  client side to transform:
  q=name:(Holiday Inn, Denver)  to   q=name:"Holiday Inn"+city:Denver

 Hth,
 Geert-Jan

 2010/10/9 Otis Gospodnetic otis_gospodne...@yahoo.com

 Matt,

 The first thing that came to my mind is that this might be interesting to
 try
 with a dictionary (of city names) if this example is not a made-up one.


 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
  From: Matt Mitchell goodie...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Fri, October 8, 2010 11:22:36 AM
  Subject: dynamic stop words?
 
  Is it possible to have certain query terms not affect score, if that
  same query term is present in a field? For example, I have an index of
  hotels. Each hotel has a name and city. If the name of a hotel has the
  name of the city in its name field, I want to completely ignore
  that and not have it influence score.
 
  Example:
 
  Doc 1
  name = Holiday  Inn
  city = Denver
 
  Doc 2
  name = Holiday Inn,  Denver
  city = Denver
 
  q=name:(Holiday Inn, Denver)
 
  I'd  like those docs to have the same score in the response. I don't
  want Doc2 to  have a higher score, just because it has all of the query
  terms.
 
  Is  this possible without using stop words? I hope this makes  sense!
 
  Thanks,
  Matt
 





Re: case-insensitive phrase query for string fields

2010-10-08 Thread Matt Mitchell
Hey thanks guys! This all makes sense now. I'm using a text field and
it's giving good results of course.

Matt

On Fri, Oct 8, 2010 at 6:08 AM, Erik Hatcher erik.hatc...@gmail.com wrote:
 Matt - https://issues.apache.org/jira/browse/SOLR-2145

        Erik


 On Oct 7, 2010, at 23:38 , Jonathan Rochkind wrote:

 If you are going to put explicit phrase quotes in the query string like that,
 an ordinary text field will match fine, on phrase searches or other 
 searches. That is a solr.TextField, not a solr.StrField as you're using. And 
 then you can put a LowerCaseFilter on it of course. And use an ordinary 
 tokenizer, whitespace or worddelimiter or what have you, not the 
 non-tokenizing keywordtokenizer. Just an ordinary solr.TextField.

 I've never been entirely sure what an indexed solr.StrField is good for 
 exactly. Oh, facets, right. But it's not generally good for matching in an 
 actual 'q', because it's not a tokenized field. Not sure what happens 
 telling a StrField that isn't ever tokenized to use a 
 KeywordTokenizerFactory, maybe it just ignores it, or maybe that's part of 
 the problem.

 If you mean you only want it to match on _exact_ matches (rather than phrase 
 matches), I haven't quite figured out how to do that, in a dismax query 
 where you only want one field of many to behave that way.  But for a single 
 field query (in an fq, or as the only field in a standard query parser q), 
 the field defType will do it. Although now I'm wondering if there is a way 
 to trick a StrField into doing that.
 
 From: Matt Mitchell [goodie...@gmail.com]
 Sent: Thursday, October 07, 2010 10:53 PM
 To: solr-user@lucene.apache.org
 Subject: case-insensitive phrase query for string fields

 What's the recommended approach for handling case-insensitive phrase
 queries? I've got this setup, but no luck:

 <fieldType name="ci_string" class="solr.StrField">
      <analyzer>
         <filter class="solr.LowerCaseFilterFactory"/>
         <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
 </fieldType>

 So if I index a doc with a title of "Golden Master", then I'd expect a
 query of q=title:"golden master" to work, but no go...

 I know I must be missing something super obvious!

 Matt




dynamic stop words?

2010-10-08 Thread Matt Mitchell
Is it possible to have certain query terms not affect score, if that
same query term is present in a field? For example, I have an index of
hotels. Each hotel has a name and city. If the name of a hotel has the
name of the city in its name field, I want to completely ignore
that and not have it influence score.

Example:

Doc 1
name = Holiday Inn
city = Denver

Doc 2
name = Holiday Inn, Denver
city = Denver

q=name:(Holiday Inn, Denver)

I'd like those docs to have the same score in the response. I don't
want Doc2 to have a higher score, just because it has all of the query
terms.

Is this possible without using stop words? I hope this makes sense!

Thanks,
Matt


case-insensitive phrase query for string fields

2010-10-07 Thread Matt Mitchell
What's the recommended approach for handling case-insensitive phrase
queries? I've got this setup, but no luck:

<fieldType name="ci_string" class="solr.StrField">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

So if I index a doc with a title of "Golden Master", then I'd expect a
query of q=title:"golden master" to work, but no go...

I know I must be missing something super obvious!

Matt


Re: clustering component

2010-08-12 Thread Matt Mitchell
Hey thanks Stanislaw! I'm going to try this against the current trunk
tonight and see what happens.

Matt

On Wed, Jul 28, 2010 at 8:41 AM, Stanislaw Osinski 
stanislaw.osin...@carrotsearch.com wrote:

  The patch should also work with trunk, but I haven't verified it yet.
 

 I've just added a patch against solr trunk to
 https://issues.apache.org/jira/browse/SOLR-1804.

 S.



clustering component

2010-07-27 Thread Matt Mitchell
Hi,

I'm attempting to get the carrot-based clustering component (in trunk) to
work. I see that the clustering contrib has been disabled for the time
being. Does anyone know if this will be re-enabled soon, or even better,
know how I could get it working as it is?

Thanks,
Matt


Re: solr with tomcat in cluster mode

2010-01-22 Thread Matt Mitchell
We have a similar setup and I'd be curious to see how folks are doing this
as well.

Our setup: A few servers and an F5 load balancer. Each Solr instance points
to a shared index. We use a separate server for indexing. When the index is
complete, we do some juggling using the Core Admin SWAP function and update
the shared index. I've wondered about having a shared index across multiple
instances of (read-only) Solr -- any problems there?

Matt
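
(The juggling refers to the CoreAdmin SWAP action, along the lines of

  http://localhost:8983/solr/admin/cores?action=SWAP&core=core0&other=core1

which exchanges the names of two running cores; the host and core names here
are placeholders.)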

On Fri, Jan 22, 2010 at 9:35 AM, ZAROGKIKAS,GIORGOS 
g.zarogki...@multirama.gr wrote:

  Hi
 I'm using solr 1.4 with tomcat in a single pc
 and I want to turn it into cluster mode with 2 nodes and load balancing
 But I can't find info on how to do it
 Is there any manual or a recorded procedure on the internet to do that
 Or is there anyone to help me?

 Thanks in advance


 Ps : I use windows server 2008 for OS







Re: solr with tomcat in cluster mode

2010-01-22 Thread Matt Mitchell
Hey Otis,

We're indexing on a separate machine because we want to keep our production
nodes away from processes like indexing. The indexing server also has a ton
of resources available, more so than the production nodes. We set it up as
an indexing server at one point and have decided to stick with it.

We're not indexing the same index as the search indexes because we want to
be able to step back a day or two if needed. So we do the SWAP when things
are done and OK.

So that last part you mentioned about the searchers needing to re-open will
happen with a SWAP right? Is your concern that there will be a lag time,
making it so the slaves will be out of sync for some small period of time?

Would it be simpler/better to move to using Solrs native slave/master
feature?

I'd love to hear any suggestions you might have.

Thanks,

Matt

On Fri, Jan 22, 2010 at 1:58 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 This should work fine.
 But why are you indexing to a separate index/core?  Why not index in the
 very same index you are searching?
 Slaves won't see changes until their searchers re-open.

 Otis
 --
 Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



 - Original Message 
  From: Matt Mitchell goodie...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Fri, January 22, 2010 9:44:03 AM
  Subject: Re: solr with tomcat in cluster mode
 
  We have a similar setup and I'd be curious to see how folks are doing
 this
  as well.
 
  Our setup: A few servers and an F5 load balancer. Each Solr instance
 points
  to a shared index. We use a separate server for indexing. When the index
 is
  complete, we do some juggling using the Core Admin SWAP function and
 update
  the shared index. I've wondered about having a shared index across
 multiple
  instances of (read-only) Solr -- any problems there?
 
  Matt
 
  On Fri, Jan 22, 2010 at 9:35 AM, ZAROGKIKAS,GIORGOS 
  g.zarogki...@multirama.gr wrote:
 
   Hi
   I'm using solr 1.4 with tomcat in a single pc
   and I want to turn it into cluster mode with 2 nodes and load balancing
   But I can't find info on how to do it
   Is there any manual or a recorded procedure on the internet to do that
   Or is there anyone to help me?
  
   Thanks in advance
  
  
   Ps : I use windows server 2008 for OS
  
  
  
  
  




Re: Solr - data flattening

2010-01-17 Thread Matt Mitchell
Can you post a few examples of your source data? What kinds of relationships
are you having to deal with?

If you want to retain a link to the source then that's pretty simple
(field for the file, url etc.). If your relationships will be between the
Solr documents themselves, then I think you'd really need to show source
examples, and then describe what it is you want in the output/Solr
application.

Matt

On Sun, Jan 17, 2010 at 8:05 PM, Ankit Bhatnagar abhatna...@vantage.com wrote:

 Hi guys,

 I have a Question regarding flattening of data for indexing.

 Scenario -
 We have tons of records; however, they come from disparate data sources.
 How do we flatten the data so as to retain the relationships?

 Thanks
 Ankit


java heap space error when faceting

2010-01-16 Thread Matt Mitchell
I have an index with more than 6 million docs. All is well, until I turn on
faceting and specify a facet.field. There are only about 20 unique values for
this particular facet throughout the entire index. I was able to make things
a little better by using facet.method=enum. That seems to work, until I add
another facet.field to the request, which is another facet that doesn't have
that many unique values. I ultimately end up running out of heap space
memory. I should also mention that in every case, the rows param is set to
0.

I've thrown as much memory as I can at the JVM (+3G for start-up and max),
tweaked filter cache settings etc.. I can't seem to get this error to go
away. Anyone have any tips to throw my way?

-- using a recent nightly build of solr 1.5 dev and Jetty as my servlet
container.

Thanks!
Matt


Re: java heap space error when faceting

2010-01-16 Thread Matt Mitchell
These are single valued fields. Strings and integers. Is there more specific
info I could post to help diagnose what might be happening?
Thanks!
Matt

On Sat, Jan 16, 2010 at 10:42 AM, Yonik Seeley
yo...@lucidimagination.com wrote:

 On Sat, Jan 16, 2010 at 10:01 AM, Matt Mitchell goodie...@gmail.com
 wrote:
  I have an index with more than 6 million docs. All is well, until I turn
 on
   faceting and specify a facet.field. There are only about 20 unique values
  for
   this particular facet throughout the entire index.

 Hmmm, that doesn't sound right... unless you're already near max
 memory usage due to other things.
 Is this a single-valued or multi-valued field?  If multi-valued, how
 many values does each document have on average?

 -Yonik
 http://www.lucidimagination.com



Re: java heap space error when faceting

2010-01-16 Thread Matt Mitchell
I'm embarrassed (but hugely relieved) to say that, the script I had for
starting Jetty had a bug in the way it set java options! So, my heap
start/max was always set at the default. I did end up using jconsole and
learned quite a bit from that too.

Thanks for your help Yonik :)

Matt

On Sat, Jan 16, 2010 at 11:13 AM, Yonik Seeley
yo...@lucidimagination.com wrote:

 On Sat, Jan 16, 2010 at 11:04 AM, Matt Mitchell goodie...@gmail.com
 wrote:
  These are single valued fields. Strings and integers. Is there more
 specific
  info I could post to help diagnose what might be happening?

 Faceting on either should currently take ~24MB (6M docs @ 4 bytes per
 doc + size_of_unique_values)
 With that small number of values, facet.enum may be faster in general
 (and take up less room: 6M/8*20 or 15MB).
 But you certainly shouldn't be running out of space with the heap
 sizes you mentioned.

 Perhaps look at the stats.jsp page in the admin and see what's listed
 in the fieldCache?
 And verify that your heap is really as big as you think it is.
 You can also use something like jconsole that ships with the JDK to
 manually do a GC and check out how much of the heap is in use before
 you try to facet.

 -Yonik
 http://www.lucidimagination.com
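
(Yonik's arithmetic above, spelled out with the numbers from this thread:)

  long docs = 6000000L;
  long fieldCacheBytes = docs * 4;            // ~24MB: one int ord per doc
  long perValueBitSet  = docs / 8;            // ~750KB: one bit per doc, per unique value
  long enumBytes       = perValueBitSet * 20; // ~15MB for 20 values via the enum method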



Re: Getting solr response data in a JS query

2010-01-11 Thread Matt Mitchell
I remember having a difficult time getting jquery to work as I thought it
would. Something to do with the wt. I ended up creating a little client lib.
Maybe this will be useful in finding your problem?

example:
  http://github.com/mwmitchell/get_rest/blob/master/solr_example.html
lib:
  http://github.com/mwmitchell/get_rest/blob/master/solr_client.jquery.js

Matt

On Mon, Jan 11, 2010 at 11:22 AM, Gregg Hoshovsky hosho...@ohsu.edu wrote:

 You might be running into  an Ajax restriction.

 See if an article like this helps.



 http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/


 On 1/9/10 11:37 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

 Dan,

 You didn't mention whether you tried wt=json .  Does it work if you use
 that to tell Solr to return its response in JSON format?

  Otis
 --
 Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



 - Original Message 
  From: Dan Yamins dyam...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Sat, January 9, 2010 10:05:54 PM
  Subject: Getting solr response data in a JS query
 
  Hi:
 
  I'm trying to figure out how to get solr responses and use them in my
  website. I'm having some problems figuring out how to
 
  1) My initial thought is to use ajax, and insert a line like this in
 my
  script:
 
   data = eval($.get("http://localhost:8983/solr/select/?q=*:*").responseText)
 
  ... and then do what I want with the data, with logic being done in
  Javascript on the front page.
 
  However, this is just not working technically:  no matter what
 alternative I
  use, I always seem to get no response to this query.  I think I'm having
  exactly the same problem as described here:
 
  http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html
 
  and here:
 
 
 http://stackoverflow.com/questions/1906498/solr-responses-to-webbrowser-url-but-not-from-javascript-code
 
  Just like those two OPs, I can definitely access my solr responses
 through a
  web browser, but my jquery is getting nothing. Unfortunately, in
 neither
  thread did the answer seem to have been figured out satisfactorily.
 Does
  anybody know what the problem is?
 
 
  2)  As an alternative, I _can_ use  the ajax-solr library.   Code like
 this:
 
  var Manager;
  (function ($) {
$(function () {
  Manager = new AjaxSolr.Manager({
solrUrl: 'http://localhost:8983/solr/'
 });
 
Manager.init();
Manager.store.addByValue('q', '*:*');
Manager.store.addByValue('rows', '1000');
Manager.doRequest();
});
  })(jQuery);
 
  does indeed load solr data into my DOM. Somehow, ajax-solr's doRequest
  method is doing something that makes it possible to receive the proper
  response from the solr servlet, but I don't know what it is so I can't
  replicate it with my own ajax.   Does anyone know what is happening?
 
  (Of course, I _could_ just use ajax-solr, but doing so would mean
 figuring
  out how to re-write my existing application for how to display search
  results in a form that works with the ajax-solr api, and I' d rather
 avoid
  this if possible since it looks somewhat nontrivial.)
 
 
  Thanks!
  Dan





Re: why is XMLWriter declared as final?

2009-11-25 Thread Matt Mitchell
OK thanks Shalin.

Matt

On Wed, Nov 25, 2009 at 8:48 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Wed, Nov 25, 2009 at 3:33 AM, Matt Mitchell goodie...@gmail.com
 wrote:

  Is there any reason the XMLWriter is declared as final? I'd like to
 extend
  it for a special case but can't. The other writers (ruby, php, json) are
  not
  final.
 
 
 I don't think it needs to be final. Maybe it is final because it wasn't
 designed to be extensible. Please open a jira issue.

 --
 Regards,
 Shalin Shekhar Mangar.



Re: why is XMLWriter declared as final?

2009-11-25 Thread Matt Mitchell
Interesting. Well just to clarify my intentions a bit, I'll quickly explain
what I was trying to do.

I'm using the MLT component but because some of my stored fields are really
big, I don't need (or want) all of the fields for my MLT docs in the
response. I want my MLT docs to have only 2 fields, but I need my main docs
fl to have all fields.

So a simple override of the XMLWriter writeNamedList method would do the
trick. All you have to do is check if the name == "moreLikeThis". If so,
process the docs and specify a different field list. If not, just call
super(). Worked like a charm, but oh well. I really only need the Ruby
response anyway, so I'll move on to that. I'm glad this spurred some
interest though.

-- It'd be great to let components have control over their fl value instead
of having a global fl value for all doc lists within a writer?

Matt

On Wed, Nov 25, 2009 at 2:33 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : I don't think it needs to be final. Maybe it is final because it wasn't
 : designed to be extensible. Please open a jira issue.

 it really wasn't, and it probably shouldn't be ... there is another thread
 currently in progress (in response to SOLR-1592) about this.

 Given how kludgy the entire API is, i'd really prefer it not be made
 un-final .. it would need some serious overhaul/review to make it possible
 to subclass in a sensical way, and coming up with a new API is likely to
 make a lot more sense then trying to retrofit that one.

 -Hoss




why is XMLWriter declared as final?

2009-11-24 Thread Matt Mitchell
Is there any reason the XMLWriter is declared as final? I'd like to extend
it for a special case but can't. The other writers (ruby, php, json) are not
final.

Thanks,
Matt


Re: multicore and ruby

2009-09-09 Thread Matt Mitchell
Hey Paul,

In rsolr, you could use the #request method to set a request handler path:
solr.request('/core1/select', :q=>'*:*')

Alternatively, (rsolr and solr-ruby) you could probably handle this by
creating a new instance of a connection object per-core, and then have some
kind of factory to return connection objects by a core-name?

What kinds of things were you hoping to find when looking for multicore
support in either solr-ruby or rsolr?

Matt

On Wed, Sep 9, 2009 at 12:38 PM, Paul Rosen p...@performantsoftware.com wrote:

 Hi all,

 I'd like to start experimenting with multicore in a ruby on rails app.

 Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with
 solr and it doesn't appear to have direct support for multicore and I didn't
 have any luck googling around for it.

 We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked
 at rsolr very briefly and didn't see any reference to multicore there,
 either.

 I can certainly hack something together, but it seems like this is a common
 problem.

 How are others doing multicore from ruby?

 Thanks,
 Paul



Re: multicore and ruby

2009-09-09 Thread Matt Mitchell
Yep same thing in rsolr and just use the :shards param. It'll return
whatever solr returns.

Matt

On Wed, Sep 9, 2009 at 4:17 PM, Paul Rosen p...@performantsoftware.com wrote:

 Hi Erik,

 Yes, I've been doing that in my tests, but I also have the case of wanting
 to do a search over all the cores using the shards syntax. I was thinking
 that the following wouldn't work:


 solr = Solr::Connection.new('
 http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1
 ')

 because it has a ? in it.


 Erik Hatcher wrote:

 With solr-ruby, simply put the core name in the URL of the
 Solr::Connection...

   solr = Solr::Connection.new('http://localhost:8983/solr/core_name')

Erik


 On Sep 9, 2009, at 6:38 PM, Paul Rosen wrote:

  Hi all,

 I'd like to start experimenting with multicore in a ruby on rails app.

 Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with
 solr and it doesn't appear to have direct support for multicore and I didn't
 have any luck googling around for it.

 We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked
 at rsolr very briefly and didn't see any reference to multicore there,
 either.

 I can certainly hack something together, but it seems like this is a
 common problem.

 How are others doing multicore from ruby?

 Thanks,
 Paul






Re: response issues with ruby and json

2009-08-23 Thread Matt Mitchell
Thanks Yonik. Yeah the additional facet fields being added by the client do
make it a little more complicated.

Matt

On Sun, Aug 23, 2009 at 12:47 PM, Yonik Seeley
yo...@lucidimagination.com wrote:

 The spellcheck issue needs to be resolved.

 It doesn't seem like a good idea to access facet.fields by position
 though - there has never been any guarantee about the order that these
 come back in, and additional ones could be added as default parameters
 for example.

 -Yonik
 http://www.lucidimagination.com



 On Thu, Aug 20, 2009 at 10:54 PM, Matt Mitchell goodie...@gmail.com
 wrote:
  Hi,
 
  I was using the spellcheck component a while ago and noticed that parts
 of
  the response are hashes that use duplicate keys. This is the issue here:
  http://issues.apache.org/jira/browse/SOLR-1071
 
  Also, the facet/facet_fields response is a hash, where the keys are field
  names. This is mostly fine BUT, when eval'd in Ruby, the resulting key
 order
  is not consistent; I think this is pretty normal for most languages. It
  seems to me that an array of hashes would be more useful to preserve the
  ordering? For example, we have an application that uses a custom handler
  that specifies the facet fields. It'd be nice if the response ordering
 could
  also be controlled in the solrconfig.xml
 
  I guess I have 2 questions:
 
  1. Does anyone know if the spellcheck component is going to get updated so
 there are no duplicate keys
 
  2. How could we get the facet fields into arrays instead of hashes for
 the
  ruby response writer? Should I submit a patch? Is this important to
 anyone
  else? I guess the alternative is to use the xml response.
 
  Thanks,
  Matt
 



Using DirectConnection or EmbeddedSolrServer, within a component

2009-07-10 Thread Matt Mitchell
Hi,

I'm experimenting with Solr components. I'd like to be able to use a
nice high-level querying interface like the one DirectSolrConnection or
EmbeddedSolrServer provides. Would it be considered absolutely insane to use
one of those *within a component* (using the same core instance)?

Matt


Re: Indexing XML

2009-07-07 Thread Matt Mitchell
Saeli,

Solr expects a certain XML structure when adding documents. You'll need to
come up with a mapping that translates the original structure to one that
solr understands. You can then search solr and get those solr documents
back. If you want to keep the original XML, you can store it in a field
within the solr document.

original data -> mapping -> solr XML document (with a field for the original
data)

Does that make sense? Can you describe what it is you want to do with
results of a search?

Matt
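
(A sketch of that mapping with SolrJ -- the Solr field names are hypothetical,
and the values are assumed to have been pulled out of the LOM record with any
XML parser:)

  import org.apache.solr.common.SolrInputDocument;

  public class LomMapper {
    static SolrInputDocument map(String id, String title, String language,
                                 String rawLomXml) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", id);             // e.g. general/identifier/entry
      doc.addField("title", title);       // e.g. general/title/string
      doc.addField("language", language);
      doc.addField("original_xml", rawLomXml); // stored copy of the source record
      return doc;
    }
  }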

On Tue, Jul 7, 2009 at 10:25 AM, Saeli Mathieu saeli.math...@gmail.com wrote:

 Hello.

 I'm a new user of Solr; I already used Lucene to index files and search.
 But my programme was too slow, which is why I was looking for another solution,
 and I thought I found it.

 I said I thought because I don't know if it's possible to use Solr with
 this kind of XML file.

 <lom xsi:schemaLocation="http://ltsc.ieee.org/xsd/lomv1.0
 http://ltsc.ieee.org/xsd/lomv1.0/lom.xsd">
 <general>
 <identifier>
 <catalog>STRING HERE</catalog>
 <entry>
 STRING HERE
 </entry>
 </identifier>
 <title>
 <string language="fr">
 STRING HERE
 </string>
 </title>
 <language>fr</language>
 <description>
 <string language="fr">
 STRING HERE
 </string>
 </description>
 </general>
 <lifeCycle>
 <status>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </status>
 <contribute>
 <role>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </role>
 <entity>STRING HERE
 </entity>
 </contribute>
 </lifeCycle>
 <metaMetadata>
 <identifier>
 <catalog>STRING HERE</catalog>
 <entry>STRING HERE</entry>
 </identifier>
 <contribute>
 <role>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </role>
 <entity>STRING HERE
 </entity>
 <date>
 <dateTime>STRING HERE</dateTime>
 </date>
 </contribute>
 <contribute>
 <role>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </role>
 <entity>STRING HERE
 </entity>
 <entity>STRING HERE</entity>
 <entity>STRING HERE
 </entity>
 <date>
 <dateTime>STRING HERE</dateTime>
 </date>
 </contribute>
 <metadataSchema>STRING HERE</metadataSchema>
 <language>STRING HERE</language>
 </metaMetadata>
 <technical>
 <location>STRING HERE
 </location>
 </technical>
 <educational>
 <intendedEndUserRole>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </intendedEndUserRole>
 <context>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </context>
 <typicalAgeRange>
 <string language="fr">STRING HERE</string>
 </typicalAgeRange>
 <description>
 <string language="fr">STRING HERE</string>
 </description>
 <description>
 <string language="fr">STRING HERE</string>
 </description>
 <language>STRING HERE</language>
 </educational>
 <annotation>
 <entity>STRING HERE
 </entity>
 <date>
 <dateTime>STRING HERE</dateTime>
 </date>
 </annotation>
 <classification>
 <purpose>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </purpose>
 </classification>
 <classification>
 <purpose>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </purpose>
 <taxonPath>
 <source>
 <string language="fr">STRING HERE</string>
 </source>
 <taxon>
 <id>STRING HERE</id>
 <entry>
 <string language="fr">STRING HERE</string>
 </entry>
 </taxon>
 </taxonPath>
 </classification>
 <classification>
 <purpose>
 <source>STRING HERE</source>
 <value>STRING HERE</value>
 </purpose>
 <taxonPath>
 <source>
 <string language="fr">STRING HERE</string>
 </source>
 <taxon>
 <id>STRING HERE</id>
 <entry>
 <string language="fr">STRING HERE</string>
 </entry>
 </taxon>
 </taxonPath>
 <taxonPath>
 <source>
 <string language="fr">STRING HERE</string>
 </source>
 <taxon>
 <id>STRING HERE</id>
 <entry>
 <string language="fr">STRING HERE</string>
 </entry>
 </taxon>
 </taxonPath>
 </classification>
 </lom>

 I don't know how I can use this kind of file with Solr, because the XML
 examples look like this one:

 <add>
 <doc>
 <field name="id">SOLR1000</field>
 <field name="name">Solr, the Enterprise Search Server</field>
 <field name="manu">Apache Software Foundation</field>
 <field name="cat">software</field>
 <field name="cat">search</field>
 <field name="features">Advanced Full-Text Search Capabilities using
 Lucene</field>
 <field name="features">Optimized for High Volume Web Traffic</field>
 <field name="features">Standards Based Open Interfaces - XML and
 HTTP</field>
 <field name="features">Comprehensive HTML Administration
 Interfaces</field>
 <field name="features">Scalability - Efficient Replication to other Solr
 Search Servers</field>
 <field name="features">Flexible and Adaptable with XML configuration and
 Schema</field>
 <field name="features">Good unicode support: h&#xE9;llo (hello with an
 accent over the e)</field>
 <field name="price">0</field>
 <field name="popularity">10</field>
 <field name="inStock">true</field>
 <field name="incubationdate_dt">2006-01-17T00:00:00.000Z</field>
 </doc>
 </add>

 I understood Solr needs this kind of structure, by structure I mean
 <field name="keyword">Value</field>
 but as you can see I can't use this kind of structure, because I'm not
 allowed to change my XML files.

 I'm looking forward to hearing from you.

 Mathieu Saeli
 --
 Saeli Mathieu.



transforming an XML field using the XSL tr param

2009-07-01 Thread Matt Mitchell
I know you can transform Solr document fields, but is it possible to have
Solr transform XML that might be embedded (as a string) in a field?

Matt


Re: [OT] New Book: Search User Interfaces

2009-06-28 Thread Matt Mitchell
This is great! Thanks for this.

Matt

On Mon, Jun 29, 2009 at 12:30 AM, Ian Holsman li...@holsman.net wrote:

 not directly related to SOLR I know.. but I think most people would find it
 interesting.


 http://searchuserinterfaces.com/book/


 from the preface:

 Search is an integral part of peoples' online lives; people turn to search
 engines for help with a wide range of needs and desires, from satisfying
 idle curiousity to finding life-saving health remedies, from learning about
 medieval art history to finding video game solutions and pop music lyrics.
 Web search engines are now the second most frequently used online computer
 application, after email. Not long ago, most software applications did not
 contain a search module. Today, search is fully integrated into operating
 systems and is viewed as an essential part of most information systems.

 Many books on information retrieval describe the algorithms behind search
 engines and information retrieval systems. By contrast, this book focuses on
 the human users of search systems and the tool they use to interact with
 them: the search user interface. Because of their global reach, search user
 interfaces must be understandable by and appealing to a wide variety of
 people of all ages, cultures and backgrounds, and for an enormous variety of
 information needs.





Re: are there any good samples / tutorials on making queries facets ?

2009-06-20 Thread Matt Mitchell
Yeah the lucid imagination articles are great!

Jonathan, you can also use the dismax query parser and apply boosts using
the qf (query fields) param:

q=my query here&qf=title^0.5 author^0.1

http://wiki.apache.org/solr/DisMaxRequestHandler#head-af452050ee272a1c88e2ff89dc0012049e69e180

Matt

On Sat, Jun 20, 2009 at 10:11 PM, Michel Bottan freakco...@gmail.comwrote:

 Hi Jonathan,

 I think this is the best article related to faceted search.


 http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr

 On Sat, Jun 20, 2009 at 9:56 PM, Jonathan Vanasco jvana...@2xlp.com
 wrote:

  i've gone through the official docs a few times, and then found some
  offsite stuff of varying quality regarding how-tos.
 
  can anyone here recommend either howtos/tutorials or sample applications
  that they have found worthwhile ?
 
  specifically i'm looking to do the following:
 
 - with regular searching, query the system with a single term, and
  have solr search multiple fields - each one having a different weight


 In order to search into multiple fields and have a different weight for
 each
 of them, you could use the Dismax requesthandler and boost each field.

 - use dismax
 - boost weights of each field using bq parameter bq=foofield:term^0.5

 http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3



 
 - implement faceted browsing
 
  i know this is quite easy to do with solr, i'm just not seeing docs that
  resonate with me yet.
 
  thanks!


 Cheers,
 Michel



moreLikeThis fl

2009-06-16 Thread Matt Mitchell
I'd like to have a MLT query return similar docs, but the fl for those mlt
docs should be different from the main fl. For example, the main fl is *,
score -- but I only want the title and id in my MLT results. Is this
possible?

Matt
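
For what it's worth, the dedicated MoreLikeThisHandler (as opposed to the MLT
search component) applies fl to the similar documents it returns, so something
along these lines should come close; the /mlt path assumes the handler is
registered in solrconfig.xml, and the field names are made up:

  http://localhost:8983/solr/mlt?q=id:SOLR1000&mlt.fl=title,features&fl=id,title,score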


Re: grouping response docs together

2009-05-26 Thread Matt Mitchell
Thomas,

Here's what I get after patching nightly build (yesterday) and running ant
test

compileTests:
[javac] Compiling 1 source file to
/Users/mwm4n/Downloads/apache-solr-nightly/build/tests
[javac]
/Users/mwm4n/Downloads/apache-solr-nightly/src/test/org/apache/solr/search/TestDocSet.java:138:
checkEqual(org.apache.lucene.util.OpenBitSet,org.apache.solr.search.DocSet)
in org.apache.solr.search.TestDocSet cannot be applied to
(org.apache.solr.search.DocSet,org.apache.solr.search.DocSet)
[javac] checkEqual(a1,
NegatedDocSet.negation(NegatedDocSet.negation(b1)));
[javac] ^
[javac] 1 error

Matt

On Mon, May 25, 2009 at 7:59 PM, Matt Mitchell goodie...@gmail.com wrote:

 Hi Thomas,

 In a 5-24-09 nightly build, I applied the patch:

 cd apache-solr-nightly

 patch -p0  ~/Projects/apache-solr-patches/SOLR-236_collapsing.patch
 patching file src/common/org/apache/solr/common/params/CollapseParams.java
 patching file
 src/java/org/apache/solr/handler/component/CollapseComponent.java
 patching file src/java/org/apache/solr/search/CollapseFilter.java
 patching file src/java/org/apache/solr/search/NegatedDocSet.java
 patching file src/java/org/apache/solr/search/SolrIndexSearcher.java
 Hunk #1 succeeded at 1444 (offset -39 lines).
 patching file src/test/org/apache/solr/search/TestDocSet.java
 Hunk #1 succeeded at 134 (offset 42 lines).

 ... and got this when running ant dist

 docs:
 [mkdir] Created dir:
 /Users/mwm4n/Downloads/apache-solr-nightly/contrib/javascript/dist/doc
  [java] Exception in thread main java.lang.NoClassDefFoundError:
 org/mozilla/javascript/tools/shell/Main
  [java] at JsRun.main(Unknown Source)

 BUILD FAILED
 /Users/mwm4n/Downloads/apache-solr-nightly/common-build.xml:338: The
 following error occurred while executing this line:
 /Users/mwm4n/Downloads/apache-solr-nightly/common-build.xml:215: The
 following error occurred while executing this line:
 /Users/mwm4n/Downloads/apache-solr-nightly/contrib/javascript/build.xml:74:
 Java returned: 1

 Not sure what any of that means, but the ant dist task worked fine before
 the patch. Any ideas?

 Thanks,

 Matt


 On Mon, May 25, 2009 at 3:59 PM, Thomas Traeger t.trae...@kabuco.dewrote:

 Hello Matt,

 the patch should work with trunk and after a small fix with 1.3 too (see
 my comment in SOLR-236). I just made a successful build to be sure.

 Do you see any error messages?

 Thomas

 Matt Mitchell schrieb:

  Thanks guys. I looked at the dedup stuff, but the documents I'm adding
 aren't really duplicates. They're very similar, but different.

 I checked out the field collapsing feature patch, applied the patch but
 can't get it to build successfully. Will this patch work with a nightly
 build?

 Thanks!

 On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:

  Matt - you may also want to detect near duplicates at index time:

 http://wiki.apache.org/solr/Deduplication

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 

 From: Matt Mitchell goodie...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Friday, May 15, 2009 6:52:48 PM
 Subject: grouping response docs together

 Is there a built-in mechanism for grouping similar documents together
 in

 the

 response? I'd like to make it look like there is only one document with
 multiple hits.

 Matt








Re: grouping response docs together

2009-05-26 Thread Matt Mitchell
Thanks Otis. I'll give the dedup a test drive today.

I'll explain what I'm trying to do a little better though because I don't
think I have yet!

So, I'm indexing an XML file. There are different sections in the XML
file. Each of those sections gets a solr doc (the xml text-only is indexed).
Each solr doc also has a field to specify the source filename. What I'd like
to have happen is, when I do a search, I want my search results to combine
all documents that have the same filename... I want to group by filename
if that makes sense. Or at the very least, show only one and indicate that
there are more.

Matt

On Tue, May 26, 2009 at 12:58 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:


 Matt,

 The Deduplication feature in Solr does support near-duplicate scenario.  It
 comes with a few components to help you detect near-duplicates, and you
 should be able to write a custom near-dupe detection component and plug it
 in.


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Matt Mitchell goodie...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Monday, May 25, 2009 3:30:42 PM
  Subject: Re: grouping response docs together
 
  Thanks guys. I looked at the dedup stuff, but the documents I'm adding
  aren't really duplicates. They're very similar, but different.
 
  I checked out the field collapsing feature patch, applied the patch but
  can't get it to build successfully. Will this patch work with a nightly
  build?
 
  Thanks!
 
  On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic 
  otis_gospodne...@yahoo.com wrote:
 
  
   Matt - you may also want to detect near duplicates at index time:
  
   http://wiki.apache.org/solr/Deduplication
  
Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
   - Original Message 
From: Matt Mitchell
To: solr-user@lucene.apache.org
Sent: Friday, May 15, 2009 6:52:48 PM
Subject: grouping response docs together
   
Is there a built-in mechanism for grouping similar documents together
 in
   the
response? I'd like to make it look like there is only one document
 with
multiple hits.
   
Matt
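
The field collapsing work discussed in this thread later landed as result
grouping (Solr 3.3+), which covers the group-by-filename case directly; a
sketch, field name assumed:

  ?q=some+query&group=true&group.field=filename_s&group.limit=1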
  
  




Re: grouping response docs together

2009-05-25 Thread Matt Mitchell
Thanks guys. I looked at the dedup stuff, but the documents I'm adding
aren't really duplicates. They're very similar, but different.

I checked out the field collapsing feature patch, applied the patch but
can't get it to build successfully. Will this patch work with a nightly
build?

Thanks!

On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:


 Matt - you may also want to detect near duplicates at index time:

 http://wiki.apache.org/solr/Deduplication

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Matt Mitchell goodie...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Friday, May 15, 2009 6:52:48 PM
  Subject: grouping response docs together
 
  Is there a built-in mechanism for grouping similar documents together in
 the
  response? I'd like to make it look like there is only one document with
  multiple hits.
 
  Matt




Re: highlighting performance

2009-05-25 Thread Matt Mitchell
Thanks Otis. I added termVector=true for those fields, but there isn't a
noticeable difference. So, just to be a little more clear, the dynamic
fields I'm adding... there might be hundreds. Do you see this as a problem?

Thanks,
Matt

On Fri, May 15, 2009 at 7:48 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:


 Matt,

 I believe indexing those fields that you will use for highlighting with
 term vectors enabled will make things faster (and your index a bit bigger).


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Matt Mitchell goodie...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Friday, May 15, 2009 5:08:23 PM
  Subject: highlighting performance
 
  Hi,
 
  I'm experimenting with highlighting and am noticing a big drop in
  performance with my setup. I have documents that use quite a few dynamic
  fields (20-30). The fields are multiValued stored/indexed text fields,
 each
  with a few paragraphs worth of text. My hl.fl param is set to *_t
 
  What kinds of things can I tweak to make this faster? Is it because I'm
  highlighting so many different fields?
 
  Thanks,
  Matt
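
For reference, the term-vector route needs positions and offsets as well
before the highlighter can skip re-analyzing the stored text; a schema sketch
for the dynamic fields above (type name assumed):

  <dynamicField name="*_t" type="text" indexed="true" stored="true"
                multiValued="true" termVectors="true"
                termPositions="true" termOffsets="true"/>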




Re: grouping response docs together

2009-05-25 Thread Matt Mitchell
Hi Thomas,

In a 5-24-09 nightly build, I applied the patch:

cd apache-solr-nightly

patch -p0  ~/Projects/apache-solr-patches/SOLR-236_collapsing.patch
patching file src/common/org/apache/solr/common/params/CollapseParams.java
patching file
src/java/org/apache/solr/handler/component/CollapseComponent.java
patching file src/java/org/apache/solr/search/CollapseFilter.java
patching file src/java/org/apache/solr/search/NegatedDocSet.java
patching file src/java/org/apache/solr/search/SolrIndexSearcher.java
Hunk #1 succeeded at 1444 (offset -39 lines).
patching file src/test/org/apache/solr/search/TestDocSet.java
Hunk #1 succeeded at 134 (offset 42 lines).

... and got this when running ant dist

docs:
[mkdir] Created dir:
/Users/mwm4n/Downloads/apache-solr-nightly/contrib/javascript/dist/doc
 [java] Exception in thread main java.lang.NoClassDefFoundError:
org/mozilla/javascript/tools/shell/Main
 [java] at JsRun.main(Unknown Source)

BUILD FAILED
/Users/mwm4n/Downloads/apache-solr-nightly/common-build.xml:338: The
following error occurred while executing this line:
/Users/mwm4n/Downloads/apache-solr-nightly/common-build.xml:215: The
following error occurred while executing this line:
/Users/mwm4n/Downloads/apache-solr-nightly/contrib/javascript/build.xml:74:
Java returned: 1

Not sure what any of that means, but the ant dist task worked fine before
the patch. Any ideas?

Thanks,

Matt

On Mon, May 25, 2009 at 3:59 PM, Thomas Traeger t.trae...@kabuco.de wrote:

 Hello Matt,

 the patch should work with trunk and after a small fix with 1.3 too (see
 my comment in SOLR-236). I just made a successful build to be sure.

 Do you see any error messages?

 Thomas

 Matt Mitchell schrieb:

  Thanks guys. I looked at the dedup stuff, but the documents I'm adding
 aren't really duplicates. They're very similar, but different.

 I checked out the field collapsing feature patch, applied the patch but
 can't get it to build successfully. Will this patch work with a nightly
 build?

 Thanks!

 On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:

  Matt - you may also want to detect near duplicates at index time:

 http://wiki.apache.org/solr/Deduplication

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 

 From: Matt Mitchell goodie...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Friday, May 15, 2009 6:52:48 PM
 Subject: grouping response docs together

 Is there a built-in mechanism for grouping similar documents together in

 the

 response? I'd like to make it look like there is only one document with
 multiple hits.

 Matt







highlighting performance

2009-05-15 Thread Matt Mitchell
Hi,

I'm experimenting with highlighting and am noticing a big drop in
performance with my setup. I have documents that use quite a few dynamic
fields (20-30). The fields are multiValued stored/indexed text fields, each
with a few paragraphs worth of text. My hl.fl param is set to *_t

What kinds of things can I tweak to make this faster? Is it because I'm
highlighting so many different fields?

Thanks,
Matt


grouping response docs together

2009-05-15 Thread Matt Mitchell
Is there a built-in mechanism for grouping similar documents together in the
response? I'd like to make it look like there is only one document with
multiple hits.

Matt


Re: highlighting html content

2009-04-28 Thread Matt Mitchell
Hi Christian,

I decided to do something very similar. How do you handle cases where the
highlighting is inside of html/xml tags though? I'm getting stuff like this:

?q=jackson

entry type=song author=Michael emJackson/emBad by Michael
emJackson/em/entry

I wrote a regular expression to take care of the html/xml problem
(highlighting inside of the tag), I'd be interested in seeing your and
others approach to this, even if it's a regular expression.

Matt

On Tue, Apr 28, 2009 at 3:21 AM, Christian Vogler 
christian.vog...@gmail.com wrote:

 Hi Matt,

 On Tue, Apr 28, 2009 at 4:24 AM, Matt Mitchell goodie...@gmail.com
 wrote:
  I've been toying with setting custom pre/post delimiters and then removing
  them in the client, but I thought I'd ask the list before I go too far with
  that idea :)

  this is what I do. I define the custom highlight delimiters as
  [solr:hl] and [/solr:hl], and then do a string replace with
  <em class="highlight"> ... </em> on the search results.

 It is simple to implement, and effective.

 Best regards
 - Christian
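
A Ruby sketch of that replace step which also throws away markers that land
inside tags or attribute values; the delimiters follow Christian's convention,
and the regex approach is an assumption, not from the thread:

  def htmlize_highlights(text)
    # strip markers that fall inside a tag (between < and >), so attributes
    # like author="Michael [solr:hl]Jackson[/solr:hl]" come out valid
    cleaned = text.gsub(/<[^>]*>/) do |tag|
      tag.gsub('[solr:hl]', '').gsub('[/solr:hl]', '')
    end
    # convert the surviving markers into real highlight tags
    cleaned.gsub('[solr:hl]', '<em class="highlight">').gsub('[/solr:hl]', '</em>')
  end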



Re: Can we provide context dependent faceted navigation from SOLR search results

2009-04-28 Thread Matt Mitchell
Wow, this looks great. Thanks for this Koji!

Matt

On Tue, Apr 28, 2009 at 12:13 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 Thanh Doan wrote:

 Assuming a solr search returns 10 listing items as below

 1) 4 digital cameras
 2) 4 LCD televisions
 3) 2 clothing items

 If we navigate to /electronics  we want solr  to show
 us facets specific to 8 electronics items (e.g brand, price).
 If we navigate to /electronics/cameraswe want solr  to show us
 facets specific to 4 camera items (e.g mega-pixels, screens-size,
 brand, price).
 If we navigate to /electronics/televisions  we want to see different
 facets and their counts specific to TV  items.
 If we navigate to /clothing   we want to obtain
 totally different facets and their counts.

 I am not sure if we can think of this as Hierarchical Facet Navigation
 system or not.
 From the UI perspective , we can think of /electronics/cameras as
 Hierarchical classification.



 There is a patch for Hierarchical Facet Navigation:

 https://issues.apache.org/jira/browse/SOLR-64

  But how about electronics/cameras/canon vs electronics/canon/camera.
 In this case both navigation should show the same result set no matter
 which facet is selected first.



 The patch supports a document to have multiple hierarchical facet fields.
 for example:

 <add>
  <doc>
   <field name="name">Canon Brand-new Digital Camera</field>
   <field name="cat">electronics/cameras/canon</field>
   <field name="cat">electronics/canon/cameras</field>
  </doc>
 </add>


 Koji

  My question is with the current solr implementation can we  provide
 context dependent faceted navigation from SOLR search results?

 Thank you.
 Thanh Doan







field type for serialized code?

2009-04-28 Thread Matt Mitchell
Hi,

I'm attempting to serialize a simple ruby object into a solr.StrField - but
it seems that what I'm getting back is munged up a bit, in that I can't
de-serialize it. Is there a field type for doing this type of thing?

Thanks,
Matt
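
For what it's worth, the usual workaround is to armor the serialized bytes so
they survive XML escaping and transport; a sketch using Marshal plus Base64:

  require 'base64'

  # store this string in the solr.StrField
  encoded = Base64.encode64(Marshal.dump(my_object))

  # ... and after fetching the stored value back from Solr:
  restored = Marshal.load(Base64.decode64(encoded))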


highlighting html content

2009-04-27 Thread Matt Mitchell
Hi,

I've been looking around but can't seem to find any clear instruction on how
to do this... I'm storing html content and would like to enable highlighting
on the html content. The problem is that the search can sometimes match html
element names or attributes, and when the highlighter adds the highlight
tags, the html is bad.

I've been toying with setting custom pre/post delimiters and then removing
them in the client, but I thought I'd ask the list before I go too far with
that idea :)

Thanks,
Matt


storing xml - how to highlight hits in response?

2009-04-23 Thread Matt Mitchell
Hi,

I'm storing some raw xml in solr (stored and non-tokenized). I'd like to
highlight hits in the response, obviously this is problematic as the
highlighting elements are also xml. So if I match an attribute value or tag
name, the xml response is messed up. Is there a way to highlight only text,
that is not part of an xml element? As in, only the text content?

Matt


Re: storing xml - how to highlight hits in response?

2009-04-23 Thread Matt Mitchell
Yeah great idea, thanks. Does anyone know if there is code out there that
will do this sort of thing?

Matt


On Thu, Apr 23, 2009 at 3:23 PM, Ensdorf Ken ensd...@zoominfo.com wrote:

  Hi,
 
  I'm storing some raw xml in solr (stored and non-tokenized). I'd like
  to
  highlight hits in the response, obviously this is problematic as the
  highlighting elements are also xml. So if I match an attribute value or
  tag
  name, the xml response is messed up. Is there a way to highlight only
  text,
  that is not part of an xml element? As in, only the text content?

 You could create a custom Analyzer or Tokenizer that strips everything but
 the text content.

 -Ken
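
Solr 1.3 does ship tokenizers for this; a fieldType sketch using one of them
(the rest of the analysis chain is an assumption):

  <fieldType name="text_html" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Note this only affects matching; highlighting still operates on the stored
markup, so the tag-mangling problem above remains.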




Re: Solr webinar

2009-04-20 Thread Matt Mitchell
Thanks Erik! Looking forward to it.

Matt

On Mon, Apr 20, 2009 at 11:00 AM, ahammad ahmed.ham...@gmail.com wrote:


 Hello Erik,

 I'm interested in attending the Webinar. I just have some questions to
 verify whether or not I am fit to attend...

 1) How will it be carried out? What software or application would I need?
 2) Do I have to have any experience or can I attend for the purpose of
 learning about Solr?

 Thanks for taking time to do this.

 Regards


 Erik Hatcher wrote:
 
  (excuse the cross-post)
 
  I'm presenting a webinar on Solr.  Registration is limited, so sign up
  soon.  Looking forward to seeing some of you there!
 
  Thanks,
Erik
 
 
  Got data? You can build your own Solr-powered Search Engine!
 
  Erik Hatcher, Lucene/Solr Committer and author, will show you how you
  how to use Solr to build an Enterprise Search engine that indexes a
  variety data sources all in a matter of minutes!
 
  Thursday, April 30, 2009
  11:00AM - 12:00PM PDT / 2:00PM - 3:00PM EDT
 
  Sign up for this free webinar today at
  http://www2.eventsvc.com/lucidimagination/?trk=E1
 
 





Re: dismax query not working with 1.4

2009-03-26 Thread Matt Mitchell
Do you have qf set? Just last week I had a problem where no results were
coming back, and it turned out that my qf param was empty.

Matt

On Thu, Mar 26, 2009 at 2:30 PM, Ben Lavender blaven...@gmail.com wrote:

 Hello,

 I'm using the March 18th 1.4 nightly, and I can't get a dismax query
 to return results.  The standard and partitioned query types return
 data fine.  I'm using jetty, and the problem occurs with the default
 solrconfig.xml as well as the one I am using, which is the Drupal
 module, beta 6.  The problem occurs in the admin interface for solr,
 though, not just in the end application.

 And...that's it?  I don't know what else to say or offer other than
 dismax doesn't work, and I'm not sure where else to go to
 troubleshoot.  Any ideas?

 Ben
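
On a 1.4 nightly the dismax defaults live in a SearchHandler, and qf must name
at least one field; a minimal config sketch, field names assumed:

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">title^2.0 body</str>
    </lst>
  </requestHandler>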



Re: Tomcat5 + Solr. Problems in deploying the Webapp

2009-03-04 Thread Matt Mitchell
Hi,

Have you looked at this page: http://wiki.apache.org/solr/SolrTomcat

It almost sounds like you're deploying twice? Putting the solr.war in
webapps would be one way, and the other would be a context config file +
using the web manager. If you're using the config/context, then don't put
the solr.war in webapps, tomcat should do that for you after deploying with
the manager.

Matt

On Wed, Mar 4, 2009 at 8:55 AM, Sudharshan S sudha...@gmail.com wrote:

 Hi all,
 I am trying to setup a solr instance with Tomcat5 on a Fedora10
 machine. Here is what I did,

 1.) Copy the apache-solr-nightly.war to webapps/solr.war
 2.) Set solr.solr.home in tomcat.conf
 3.) Use the Manager interface of tomcat to deploy the webapp

 But, while doing so, I get the following exceptions.
 
 Mar 4, 2009 6:55:09 PM org.apache.catalina.core.StandardContext filterStart
 SEVERE: Exception starting filter SolrRequestFilter
 java.lang.NoClassDefFoundError: Could not initialize class
 org.apache.solr.core.SolrConfig
at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:76)
at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
at
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78)
at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
at
 org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
at
 org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1173)
at
 org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:549)
at
 org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:105)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at
 org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at
 org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at
 org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at
 org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at
 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Thread.java:636)
 

 What am I missing? If it matters I am running the nightly build from
 March 3 2009.

 Thanks and Regards
 Sudharshan S
 Blog : http://www.sudharsh.wordpress.com
 IRC   : Sup3rkiddo @ Freenode, Gimpnet



Re: Column Specific Query with q parameter

2009-03-04 Thread Matt Mitchell
The syntax for the q param when using dismax is different from standard.
Check this out:


http://wiki.apache.org/solr/DisMaxRequestHandler#head-df8184dddf870336839490ba276ea6ac566d0bdf

q.alt under dismax is parsed using the standard query parser though:


http://wiki.apache.org/solr/DisMaxRequestHandler#head-9d23a23915b7932490069d3262ef7f3625e398ff

Using dismax with that query... you could do it using the fq param:

  ?fq=prdMainTitle_product_s:math&qt=dismaxrequest&q.alt=*:*

But make sure you understand how the fq param works; how solr uses its
caching...


http://wiki.apache.org/solr/CommonQueryParameters#head-6522ef80f22d0e50d2f12ec487758577506d6002

Hope this helps,

Matt

On Thu, Mar 5, 2009 at 1:30 AM, dabboo ag...@sapient.com wrote:


 Hi,

 I am implementing a column-specific query with the q parameter, e.g.:

 ?q=prdMainTitle_product_s:math&qt=dismaxrequest

 The above query doesn't work, while the same query via the q.alt
 parameter works:

 ?q=&q.alt=prdMainTitle_product_s:math&qt=dismaxrequest

 Please suggest how to achieve this with the q parameter.

 Thanks,
 Amit Garg




Re: solr and tomcat

2009-03-03 Thread Matt Mitchell
Hi Matthew,

The problem is that we have multiple instances of solr running under one
tomcat. So setting -Dsolr.data.dir=foo would set the home for every solr. I
guess multi-core might solve my problem, but that'd change our app
architecture too much, maybe some other day.

I *kind* of have a solution for the permissions thing though:

- The project user is part of the tomcat group.
- The tomcat user is part of the project user group.
- We're making a call to umask 002 in the tomcat catalina.sh file (means
all files created will have group write)

So when solr (tomcat) creates the index, the files are group writable now and I
can remove them, etc.!

So, I still need to figure out the data.dir problem. Hmm.

Thanks for your help,
Matt

On Tue, Mar 3, 2009 at 11:31 AM, Matthew Runo mr...@zappos.com wrote:

 It looks like if you set a -Dsolr.data.dir=foo then you could specify where
 the index would be stored, yes?  Are you properly setting your solr.home?
 I've never had to set the data directory specifically, Solr has always put
 it under my home.

 From solrconfig.xml:
  <dataDir>${solr.data.dir:./solr/data}</dataDir>

 Since Solr is running under tomcat, I'd assume that the index will always
 appear to be owned by tomcat as well. I don't think there is any way to have
 a different user for the written files - but someone else might want to
 chime in before you believe me 100% on this one.

 Thanks for your time!

 Matthew Runo
 Software Engineer, Zappos.com
 mr...@zappos.com - 702-943-7833


 On Mar 2, 2009, at 5:46 PM, Matt Mitchell wrote:

  Hi. I'm sorry if this is the second time this message comes through!

 A few questions here...

 #1
 Does anyone know how to set the user/group and/or permissions on the index
 that solr creates? It's always the tomcat user. Is it possible to change
 this in my context file? Help!

 #2
 I'm deploying Solr via Tomcat and really thought I had this stuff down.
 But
 it seems that with some recent system upgrades, my scheme is failing to
 set
 the data dir correctly.

 I'm deploying solr to tomcat, using a context file as described here:

 http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac

 But when I deploy, Tomcat says that it can't find a ./data/index directory
 -- relative to the tomcat home directory. How can I set the data dir
 relative to the solr home value I'm specifying in the tomcat context file?
 Note: a hard-coded absolute path works, but I want to configure at
 deployment time.

 In the past, I tried setting the data dir in the same way the solr home is
 set in the context file without luck. Does this now work in the latest
 solr
 nightly?

 Thanks,





Re: solr and tomcat

2009-03-03 Thread Matt Mitchell
That's exactly what we're doing (setting the value in each config). The main
problem with that is we have multiple people working on each of these solr
projects, in different environments. Their data.dir path is always the same
(relative) value which works fine under Jetty. But running under tomcat, the
data dir is relative to tomcat's home. So an absolute hard-coded path is the
only solution. My hope was that we'd be able to override it using the same
method as setting the solr/home value in the tomcat context file.

The thought of running multiple tomcats is interesting. Do you have any
issues with memory or cpu performance?

Thanks,
Matt

On Tue, Mar 3, 2009 at 11:45 AM, Matthew Runo mr...@zappos.com wrote:

 Perhaps you could hard code it in the solrconfig.xml file for each solr
 instance? Other than that, what we did was run multiple instances of Tomcat.
 That way if something goes bad in one, it doesn't affect the others.

 Thanks for your time!

 Matthew Runo
 Software Engineer, Zappos.com
 mr...@zappos.com - 702-943-7833

 On Mar 3, 2009, at 8:39 AM, Matt Mitchell wrote:

  Hi Matthew,

 The problem is that we have multiple instances of solr running under one
 tomcat. So setting -Dsolr.data.dir=foo would set the home for every solr.
 I
 guess multi-core might solve my problem, but that'd change our app
 architecture too much, maybe some other day.

 I *kind* of have a solution for the permissions thing though:

 - The project user is part of the tomcat group.
 - The tomcat user is part of the project user group.
 - We're making a call to umask 002 in the tomcat catalina.sh file (means
 all files created will have group write)

 So when solr (tomcat) creates the index, they're group writable now and I
 can remove etc.!

 So, I still need to figure out the data.dir problem. Hmm.

 Thanks for your help,
 Matt

 On Tue, Mar 3, 2009 at 11:31 AM, Matthew Runo mr...@zappos.com wrote:

  It looks like if you set a -Dsolr.data.dir=foo then you could specify
 where
 the index would be stored, yes?  Are you properly setting your solr.home?
 I've never had to set the data directory specifically, Solr has always
 put
 it under my home.

 From solrconfig.xml:
  <dataDir>${solr.data.dir:./solr/data}</dataDir>

 Since Solr is running under tomcat, I'd assume that the index will always
 appear to be owned by tomcat as well. I don't think there is any way to
 have
 a different user for the written files - but someone else might want to
 chime in before you believe me 100% on this one.

 Thanks for your time!

 Matthew Runo
 Software Engineer, Zappos.com
 mr...@zappos.com - 702-943-7833


 On Mar 2, 2009, at 5:46 PM, Matt Mitchell wrote:

 Hi. I'm sorry if this is the second time this message comes through!


 A few questions here...

 #1
 Does anyone know how to set the user/group and/or permissions on the
 index
 that solr creates? It's always the tomcat user. Is it possible to change
 this in my context file? Help!

 #2
 I'm deploying Solr via Tomcat and really thought I had this stuff down.
 But
 it seems that with some recent system upgrades, my scheme is failing to
 set
 the data dir correctly.

 I'm deploying solr to tomcat, using a context file as described here:


 http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac

 But when I deploy, Tomcat says that it can't find a ./data/index
 directory
 -- relative to the tomcat home directory. How can I set the data dir
 relative to the solr home value I'm specifying in the tomcat context
 file?
 Note: a hard-coded absolute path works, but I want to configure at
 deployment time.

 In the past, I tried setting the data dir in the same way the solr home
 is
 set in the context file without luck. Does this now work in the latest
 solr
 nightly?

 Thanks,







solr and tomcat

2009-03-02 Thread Matt Mitchell
Hi. I'm sorry if this is the second time this message comes through!

A few questions here...

#1
Does anyone know how to set the user/group and/or permissions on the index
that solr creates? It's always the tomcat user. Is it possible to change
this in my context file? Help!

#2
I'm deploying Solr via Tomcat and really thought I had this stuff down. But
it seems that with some recent system upgrades, my scheme is failing to set
the data dir correctly.

I'm deploying solr to tomcat, using a context file as described here:
http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac

But when I deploy, Tomcat says that it can't find a ./data/index directory
-- relative to the tomcat home directory. How can I set the data dir
relative to the solr home value I'm specifying in the tomcat context file?
Note: a hard-coded absolute path works, but I want to configure at
deployment time.

In the past, I tried setting the data dir in the same way the solr home is
set in the context file without luck. Does this now work in the latest solr
nightly?

Thanks,


Re: [ANNOUNCE] Solr Logo Contest Results

2008-12-17 Thread Matt Mitchell
Love it! Congratulations Michiel.

Matt

On Wed, Dec 17, 2008 at 9:15 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:

 (replies to solr-user please)

 On behalf of the Solr Committers, I'm happy to announce that the Solr
 Logo Contest is officially concluded. (Woot!)

 And the Winner Is...

 https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
 ...by Michiel

 We ran into a few hiccups during the contest, making it take longer than
 intended, but the result was a thorough process in which everyone went above
 and beyond to ensure that the final choice best reflected the wishes of the
 community.

 You can expect to see the new logo appear on the site (and in the Solr app)
 in the next few weeks.

 Congrats Michiel!


 -Hoss




strange difference between json and xml responses

2008-12-09 Thread Matt Mitchell
Hi,

A while ago, we had a field called "word" which was used as a spelling
field. We switched this to "spell". When querying our solr instance with
just q=*:*, we get back the expected results. When querying our solr
instance with q=*:*&wt=json, we get this (below). When setting the qt to
dismax, the error goes away but no results come back.

Is this a bug in the json response writer? Or more than likely, something
I'm completely glossing over?

Matt
HTTP Status 400 - undefined field word

type: Status report
message: undefined field word
description: The request sent by the client was syntactically incorrect
(undefined field word).

Apache Tomcat/6.0.18


Re: strange difference between json and xml responses

2008-12-09 Thread Matt Mitchell
Actually, the dismax thing was a bad example. So, forget about the qt param
for now. I did however, search the schema and didn't find a reference to
word. The problem comes in when I switch the wt param from xml to json (or
ruby).

q=*:*&wt=xml == success
q=*:*&wt=json == error
q=*:*&wt=ruby == error

Matt

On Tue, Dec 9, 2008 at 5:10 PM, Otis Gospodnetic [EMAIL PROTECTED]
 wrote:

 Hi Matt,

 You need to edit your solrconfig.xml and look for the word "word" in the
 dismax section of the config and change it to "spell".

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Matt Mitchell [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Tuesday, December 9, 2008 2:08:43 PM
  Subject: strange difference between json and xml responses
 
  Hi,
 
  A while ago, we had a field called word which was used as a spelling
  field. We switched this to spell. When querying our solr instance with
  just q=*:*, we get back the expected results. When querying our solr
  instance with q=*:*&wt=json, we get this (below). When setting the qt to
  dismax, the error goes away but no results come back.
 
  Is this a bug in the json response writer? Or more than likely, something
  I'm completely glossing over?
 
  Matt
  HTTP Status 400 - undefined field word
  --
 
  *type* Status report
 
  *message* *undefined field word*
 
  *description* *The request sent by the client was syntactically incorrect
  (undefined field word).*
  --
  Apache Tomcat/6.0.18




Re: strange difference between json and xml responses

2008-12-09 Thread Matt Mitchell
Thanks Yonik. Should I submit this as a bug ticket? Currently it's not a deal
breaker, as we're setting fl manually anyway.

Matt

On Tue, Dec 9, 2008 at 5:38 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

 There is probably a document in your index with the field word.
 The json writers may be less tolerant when encountering a field that
 is not known.

 We should perhaps change the json/text based writers to handle this
 case gracefully also.

 -Yonik


 On Tue, Dec 9, 2008 at 5:18 PM, Matt Mitchell [EMAIL PROTECTED] wrote:
  Actually, the dismax thing was a bad example. So, forget about the qt
 param
  for now. I did however, search the schema and didn't find a reference to
  word. The problem comes in when I switch the wt param from xml to json
 (or
  ruby).
 
   q=*:*&wt=xml == success
   q=*:*&wt=json == error
   q=*:*&wt=ruby == error
 
  Matt
 
  On Tue, Dec 9, 2008 at 5:10 PM, Otis Gospodnetic 
 [EMAIL PROTECTED]
  wrote:
 
  Hi Matt,
 
  You need to edit your solrconfig.xml and look for the word word in the
  dismax section of the config and change it to spell.
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
   From: Matt Mitchell [EMAIL PROTECTED]
   To: solr-user@lucene.apache.org
   Sent: Tuesday, December 9, 2008 2:08:43 PM
   Subject: strange difference between json and xml responses
  
   Hi,
  
   A while ago, we had a field called word which was used as a spelling
   field. We switched this to spell. When querying our solr instance
 with
   just q=*:*, we get back the expected results. When querying our solr
    instance with q=*:*&wt=json, we get this (below). When setting the qt
 to
   dismax, the error goes away but no results come back.
  
   Is this a bug in the json response writer? Or more than likely,
 something
   I'm completely glossing over?
  
   Matt
   HTTP Status 400 - undefined field word
   --
  
   *type* Status report
  
   *message* *undefined field word*
  
   *description* *The request sent by the client was syntactically
 incorrect
   (undefined field word).*
   --
   Apache Tomcat/6.0.18
 
 
 



admin/luke and EmbeddedSolrServer

2008-12-01 Thread Matt Mitchell
Is it possible to send a request to admin/luke using the EmbeddedSolrServer?


Re: solr-ruby gem

2008-11-18 Thread Matt Mitchell
I've been using solr-ruby with 1.3 for quite a while now. It's powering our
experimental, open-source OPAC, Blacklight:

blacklight.rubyforge.org

I've got a custom query builder and response wrapper, but it's using
solr-ruby underneath.

Matt

On Tue, Nov 18, 2008 at 2:57 PM, Erik Hatcher [EMAIL PROTECTED]wrote:


 On Nov 18, 2008, at 2:41 PM, Kashyap, Raghu wrote:

 Anyone knows if the solr-ruby gem is compatible with solr 1.3??


 Yes, the gem at rubyforge is compatible with 1.3.  Also, the library itself
 is distributed with the binary release of Solr, in client/ruby/solr-ruby/lib

  Also anyone using acts_as_solr plugin? Off late the website is down and
 can't find any recent activities on that


 From my perspective, acts_as_solr is a mess.  [My apologies for creating
 the initial hack that then morphed out of control]

 There are a lot of users of various versions of acts_as_solr, and
 discussion of that continues here: 
 http://groups.google.com/group/acts_as_solr.  There are numerous github
 branches each with various patches applied - take your pick and run with one
 of them :)

 Or go lighter weight and roll-your-own acts_as_solr by simply putting in
 after_save/after_destroy hooks.  See slide 13 of 
 http://code4lib.org/files/solr-ruby.pdf

Erik
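
A minimal sketch of that roll-your-own approach with solr-ruby and
ActiveRecord callbacks; the model and field names are assumptions:

  require 'solr'

  class Book < ActiveRecord::Base
    SOLR = Solr::Connection.new('http://localhost:8983/solr', :autocommit => :on)

    after_save    :solr_index
    after_destroy :solr_remove

    private

    def solr_index
      SOLR.add(:id => id, :title_t => title)  # map your columns to Solr fields
    end

    def solr_remove
      SOLR.delete(id)
    end
  end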





questions about Solr connection methods

2008-11-14 Thread Matt Mitchell
I'm implementing connection adapters in ruby/jruby and wondering how all of
the different solr connection classes relate.

Is the only difference between EmbeddedSolrServer and DirectSolrConnection,
that EmbeddedSolrServer provides some higher level methods for adding,
deleting etc.? Or is there something else happening underneath the covers?
If the higher level methods in EmbeddedSolrServer aren't really of use to
me, would it be better to use the simpler DirectSolrConnection?

Does DirectSolrConnection support multicore?

Thanks,
Matt


Question about CoreContainer

2008-11-03 Thread Matt Mitchell
Hi,

I'm using CoreContainer in JRuby. I'd like my data directory to be the
standard solr-home/data. But since CoreContainer == multi-core, I need to
supply a core name. Is it possible to use CoreContainer without a core? Is
it possible to set the dataDir? Also, it seems that no matter what I set as
the solr home, a "solr" directory always gets created in the same directory
where I'm executing my script.

Thanks,
Matt
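
For the single-core case, the usual SolrJ pattern is CoreContainer.Initializer
plus the empty-string core name; a JRuby sketch, with the jars assumed to be on
the classpath (java_send, in recent JRuby, works around Ruby's private
#initialize):

  require 'java'

  java.lang.System.set_property('solr.solr.home', '/path/to/solr-home')

  initializer = org.apache.solr.core.CoreContainer::Initializer.new
  container   = initializer.java_send(:initialize)
  server      = org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.new(container, '')

As for the stray solr directory: the example solrconfig.xml defaults dataDir to
${solr.data.dir:./solr/data}, which resolves relative to the working directory,
so setting an absolute <dataDir> in solrconfig.xml is the reliable fix.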


Re: solr 1.3 - spellcheck doesn't seems to get any data?

2008-10-17 Thread Matt Mitchell
Did you send in a spellcheck.build=true ?

Matt

On Fri, Oct 17, 2008 at 7:31 AM, sunnyfr [EMAIL PROTECTED] wrote:


 Hi,

 How come I've nothing in my spellchercker directories :
 I've updated it but I'm started from an empty data directory and :

 [EMAIL PROTECTED]:/data/solr/video/data# ls spellchecker1/
 segments.gen  segments_1
 [EMAIL PROTECTED]:/data/solr/video/data# ls spellchecker2/
 segments.gen  segments_1
 [EMAIL PROTECTED]:/data/solr/video/data# ls spellcheckerFile/
 segments.gen  segments_1
 [EMAIL PROTECTED]:/data/solr/video/data# ls index/
 _a.fdt  _a.fnm  _a.nrm  _a.tii  _a.tvd  _a.tvx  _b.fdx  _b.frq  _b.prx
 _b.tis  _b.tvf  _c.fdt  _c.fnm  _c.nrm  _c.tii  _c.tvd  _c.tvx
 segments_e
 _a.fdx  _a.frq  _a.prx  _a.tis  _a.tvf  _b.fdt  _b.fnm  _b.nrm  _b.tii
 _b.tvd  _b.tvx  _c.fdx  _c.frq  _c.prx  _c.tis  _c.tvf  segments.gen

 My Files :
 http://www.nabble.com/file/p20031572/solrconfig.xml solrconfig.xml
 http://www.nabble.com/file/p20031572/schema.xml schema.xml

 Thanks,




delete field from index

2008-10-17 Thread Matt Mitchell
Hi,

I was using a field called "word" but have changed it to "spell". Do I need
to delete this field from the index and if so, how? I'm concerned because
when I do a query like:

?q.alt=*:*&qt=dismax

I get an error saying the "word" field was not found.

Matt


Re: delete field from index

2008-10-17 Thread Matt Mitchell
OK I figured it out. It's because my fl had * in it. So, I'm guessing a
re-index will remove the word field for good?

+ Erik for the tip :)

Matt

On Fri, Oct 17, 2008 at 2:57 PM, Matt Mitchell [EMAIL PROTECTED] wrote:

 Hi,

 I was using a field called word but have changed it to spell. Do I need
 to delete this field from the index and if so, how? I'm concerned because
 when I do a query like:

  ?q.alt=*:*&qt=dismax

 I get an error saying the word field was not found.

 Matt



populating a spellcheck dictionary

2008-10-09 Thread Matt Mitchell
I'm starting to implement the new SpellCheckComponent. The solr 1.3 dist
example is using a file based dictionary, but I'd like to figure out the
best way to populate the dictionary from our index. Should the spellcheck
field be multivalued?

Thanks,
Matt


Re: populating a spellcheck dictionary

2008-10-09 Thread Matt Mitchell
Woops, I was looking at the wrong example solrconfig.xml

Thanks Grant!

Matt

On Thu, Oct 9, 2008 at 10:01 AM, Grant Ingersoll [EMAIL PROTECTED]wrote:

 The example in example/solr/conf/solrconfig.xml should show a couple of
 different options:

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">textSpell</str>

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">spell</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker2</str>
    </lst>

    <lst name="spellchecker">
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="name">file</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellcheckerFile</str>
    </lst>
  </searchComponent>

 The first two are index based.

 The spell field for the example is:
   <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>

 HTH,
 Grant


 On Oct 9, 2008, at 9:38 AM, Matt Mitchell wrote:

  I'm starting to implement the new SpellCheckComponent. The solr 1.3 dist
 example is using a file based dictionary, but I'd like to figure out the
 best way to populate the dictionary from our index. Should the spellcheck
 field be multivalued?

 Thanks,
 Matt


 --
 Grant Ingersoll

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ
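
To populate an index-based dictionary from existing content, the usual route
is a copyField into the spell field, then a rebuild; source field names are
assumptions:

  <copyField source="title" dest="spell"/>
  <copyField source="text"  dest="spell"/>

followed by a request with spellcheck.build=true to (re)build the
spellcheckIndexDir.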











Spellchecker Question

2008-04-22 Thread Matt Mitchell
I'm using the Spellchecker handler but am a little confused. The docs say to
run the cmd=rebuild when building the first time. Do I need to supply a q
param with that cmd=rebuild? The examples show a url with the q param set
while rebuilding, but the main section on the cmd param doesn't say much
about it. My hunch is that I need to supply a q?

Thanks,
Matt
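
For the record, the old handler accepts both parameters on one request, so the
safe reading of the docs is to pass a throwaway q alongside cmd=rebuild; a
sketch:

  http://localhost:8983/solr/select?qt=spellchecker&cmd=rebuild&q=anyword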


Re: Installation help

2008-04-16 Thread Matt Mitchell
What does the Jetty log output say in the console after you start it? It
should mention the port # on one of the last lines. If it does, try using
curl or wget to do a local request:

curl http://localhost:8983/solr/
wget http://localhost:8983/solr/

Matt

On Wed, Apr 16, 2008 at 5:08 PM, Shawn Carraway [EMAIL PROTECTED]
wrote:

 Hi all,
 I am trying to install Solr with Jetty (as part of another application)
 on a Linux server running Gentoo linux and JDK 1.6.0_05.

 When I try to start Jetty (and Solr), it doesn't open a port.

 I know you will need more info, but I'm not sure what you would need as
 I'm not clear on how this part works.

 Thanks,
 Shawn




custom request handler; standard vs dismax

2008-04-01 Thread Matt Mitchell
Hi,

I recently started playing with the dismax handler and custom request
handlers. When using the solr.StandardRequestHandler class, I get the
response that I want; lots of facet values. When I switch to the dismax
class, I get none. I've posted my request handler definitions here. Am I
missing something totally obvious?

Thanks,
Matt

p.s. using the latest/nightly build of solr

* an example url:

http://localhost:8983/solr/select/?facet.limit=6&wt=ruby&rows=0&facet=true&facet.mincount=1&facet.offset=0&q=*:*&fl=*,score&qt=catalog&facet.missing=true&facet.field=source_facet&facet.sort=true


* no facet values with this:

<requestHandler name="catalog" class="solr.DisMaxRequestHandler">
  <str name="q.alt">*:*</str>
  <str name="hl">on</str>
</requestHandler>


* lots of facet values with this:

<requestHandler name="catalog" class="solr.StandardRequestHandler">
  <str name="q.alt">*:*</str>
  <str name="hl">on</str>
</requestHandler>
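
One likely explanation, for what it's worth: dismax does not treat *:* in q as
match-all, it escapes the special syntax and searches for the literal text, so
q=*:* matches nothing and the mincount=1 facets come back empty. Dropping q
(letting q.alt=*:* kick in) and giving the handler a qf should restore the
counts; a sketch with assumed field names:

  http://localhost:8983/solr/select/?qt=catalog&rows=0&facet=true&facet.field=source_facet

  <requestHandler name="catalog" class="solr.DisMaxRequestHandler">
    <str name="q.alt">*:*</str>
    <str name="qf">title^2 text</str>
    <str name="hl">on</str>
  </requestHandler>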


Re: search for non empty field

2008-03-31 Thread Matt Mitchell
Thanks Erik. I think this is the thread here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200709.mbox/[EMAIL 
PROTECTED]

Matt

On Sun, Mar 30, 2008 at 9:50 PM, Erik Hatcher [EMAIL PROTECTED]
wrote:

 Documents with a particular field can be matched using:

  field:[* TO *]

 Or documents without a particular field with:

  -field:[* TO *]

 An empty field?  Meaning one that was indexed but with no terms?  I'm
 not sure about that one.  Seems like Hoss replied to something
 similar on this last week or so though - check the archives.

Erik


 On Mar 30, 2008, at 9:43 PM, Matt Mitchell wrote:
  I'm looking for the exact same thing.
 
  On Sun, Mar 30, 2008 at 8:45 PM, Ismail Siddiqui [EMAIL PROTECTED]
  wrote:
 
  Hi all,
 
 
  I have a situation where i have to filter result on a non empty
  field .
  wild card wont work as it will have to match with a letter.
  How can I form query to return result where a particular field is
  non-empty
  .
 
 
 
  Ismail
 




Re: search for non empty field

2008-03-30 Thread Matt Mitchell
I'm looking for the exact same thing.

On Sun, Mar 30, 2008 at 8:45 PM, Ismail Siddiqui [EMAIL PROTECTED] wrote:

 Hi all,


 I have a situation where i have to filter result on a non empty field .
 wild card wont work as it will have to match with a letter.
 How can I form query to return result where a particular field is
 non-empty
 .



 Ismail



Using Ruby to POST to Solr

2007-09-11 Thread Matt Mitchell
Hi, I just posted this to the ruby/google group. It probably belongs  
here! Also, anyone know exactly what the @ symbol in the curl command  
is doing?

Thanks,
Matt


I've got a script that uses curl, and would like (for educational
purposes mind you) to use ruby instead. This is the curl command that
works:

F=./my_data.xml
curl 'http://localhost:8080/update' --data-binary @$F -H 'Content-type:text/xml; charset=utf-8'

I've been messing with Net::Http using something like below, with
variations (Base64.encode64) but nothing works yet. Anyone know the
ruby equivlent to the curl version above?

Thanks!

# NOT WORKING:
my_url = 'http://localhost:8080/update'
data = File.read('my_data.xml')
url = URI.parse(my_url)
post = Net::HTTP::Post.new(url.path)
post.body = data
post.content_type = 'application/x-www-form-urlencoded; charset=utf-8'
  response = Net::HTTP.start(url.host, url.port) do |http|
http.request(post)
  end
puts response.body

Re: Using Ruby to POST to Solr

2007-09-11 Thread Matt Mitchell

Hi Michael,

Thanks for that. I've got something that's working now:

data = File.read('my_solr_docs.xml')
url = URI.parse('http://localhost:8080/my_solr/update')
http = Net::HTTP.new(url.host, url.port)
response, body = http.post(url.path, data, {'Content-type' => 'text/xml; charset=utf-8'})


Matt

On Sep 11, 2007, at 9:42 AM, Michael Kimsal wrote:


The curl man page states:

   If you start the data with the letter @, the rest should be a
   file name to read the data from, or - if you want curl to read the data
   from stdin. The contents of the file must already be url-encoded.
   Multiple files can also be specified. Posting data from a file
   named 'foobar' would thus be done with --data @foobar.




On 9/11/07, Matt Mitchell [EMAIL PROTECTED] wrote:


Hi, I just posted this to the ruby/google group. It probably belongs
here! Also, anyone know exactly what the @ symbol in the curl command
is doing?
Thanks,
Matt


I've got a script that uses curl, and would like (for educational
purposes mind you) to use ruby instead. This is the curl command that
works:

F=./my_data.xml
curl 'http://localhost:8080/update' --data-binary @$F -H 'Content-
type:text/xml; charset=utf-8'

I've been messing with Net::Http using something like below, with
variations (Base64.encode64) but nothing works yet. Anyone know the
ruby equivlent to the curl version above?

Thanks!

# NOT WORKING:
my_url = 'http://localhost:8080/update'
data = File.read('my_data.xml')
url = URI.parse(my_url)
post = Net::HTTP::Post.new(url.path)
post.body = data
post.content_type = 'application/x-www-form-urlencoded;  
charset=utf-8'

   response = Net::HTTP.start(url.host, url.port) do |http|
 http.request(post)
   end
puts response.body





--
Michael Kimsal
http://webdevradio.com


Matt Mitchell
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

[EMAIL PROTECTED]




Re: Using Ruby to POST to Solr

2007-09-11 Thread Matt Mitchell

Yes! Beautiful. I'll be checking that out.

matt

On Sep 11, 2007, at 12:18 PM, Erik Hatcher wrote:


Matt,

Try this instead:

  gem install solr-ruby # ;)

Then in irb or wherever:

  solr = Solr::Connection.new("http://localhost:8983/solr")
  solr.add(:id => 123, :title => "insert title here")
  solr.commit
  solr.query("title")

Visit us over on the [EMAIL PROTECTED] e-mail list for  
more on working with Solr from Ruby.


Erik


On Sep 11, 2007, at 10:55 AM, Matt Mitchell wrote:


Hi Michael,

Thanks for that. I've got something that's working now:

data = File.read('my_solr_docs.xml')
url = URI.parse('http://localhost:8080/my_solr/update')
http = Net::HTTP.new(url.host, url.port)
response, body = http.post(url.path, data, {'Content-type' => 'text/xml; charset=utf-8'})


Matt

On Sep 11, 2007, at 9:42 AM, Michael Kimsal wrote:


The curl man page states:

  If you start the data with the letter @, the rest  
should be a
file name to read the data from, or - if you want curl to read  
the data
  from  stdin.   The  contents  of the file must  
already be
url-encoded. Multiple files can also be specified. Posting data  
from a file
  named 'foobar' would thus be done with --data  
@foobar.





On 9/11/07, Matt Mitchell [EMAIL PROTECTED] wrote:


Hi, I just posted this to the ruby/google group. It probably  
belongs
here! Also, anyone know exactly what the @ symbol in the curl  
command

is doing?
Thanks,
Matt


I've got a script that uses curl, and would like (for educational
purposes mind you) to use ruby instead. This is the curl command  
that

works:

F=./my_data.xml
curl 'http://localhost:8080/update' --data-binary @$F -H 'Content-
type:text/xml; charset=utf-8'

I've been messing with Net::Http using something like below, with
variations (Base64.encode64) but nothing works yet. Anyone know the
ruby equivlent to the curl version above?

Thanks!

# NOT WORKING:
my_url = 'http://localhost:8080/update'
data = File.read('my_data.xml')
url = URI.parse(my_url)
post = Net::HTTP::Post.new(url.path)
post.body = data
post.content_type = 'application/x-www-form-urlencoded;  
charset=utf-8'

   response = Net::HTTP.start(url.host, url.port) do |http|
 http.request(post)
   end
puts response.body





--
Michael Kimsal
http://webdevradio.com


Matt Mitchell
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

[EMAIL PROTECTED]




Matt Mitchell
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

[EMAIL PROTECTED]




solr/home

2007-09-06 Thread Matt Mitchell

Hi,

I recently upgraded to Solr 1.2. I've set it up through Tomcat using
context fragment files. I deploy using the tomcat web manager. In the
context fragment I set the environment variable solr/home. This used
to work as expected. The solr/home value pointed to the directory
where data, conf etc. live. Now, this value doesn't get used and
instead, tomcat creates a new directory called "solr" and "solr/data"
in the same directory where the context fragment file is located.
It's not really a problem in this particular instance. I like the
idea of it defaulting to "solr" in the same location as the context
fragment file, as long as I can depend on it always working like
that. It is a little puzzling why the value in my environment
setting doesn't work, though.


Has anyone else experienced this behavior?

Matt


Re: solr/home

2007-09-06 Thread Matt Mitchell

Here you go:

<Context docBase="/usr/local/lib/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="/usr/local/projects/my_app/current/solr-home" />
</Context>

This is the same file I'm putting into the Tomcat manager XML  
Configuration file URL form input.


Matt

On Sep 6, 2007, at 3:25 PM, Tom Hill wrote:


It works for me. (fragments with solr 1.2 on tomcat 5.5.20)

Could you post your fragment file?

Tom







Updating index on cluster

2007-07-18 Thread Matt Mitchell

Hi,

I'm currently working on an application which lives in a clustered
server environment. There is a hardware-based load balancer, and each
node in the cluster has a separate install of Solr. The application
code and files are on an NFS mount, along with the solr/home. The
first node has been acting as the master.


My question is about reindexing, and even schema updates in some  
circumstances.


For a reindex, I post to Solr on the master node and then restart the  
remaining nodes.

Is there a better way to do this?

For a schema update, I stop the master, delete the data/index dir,  
start solr and then post to Solr on the master node. Then I restart  
the remaining nodes.

Is there a better way to do this?

Any tips, feedback or what have you are much appreciated!

Matt
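For what it's worth, a sketch of how this could be scripted, assuming the nodes share the index over NFS and that posting a <commit/> to a node makes it open a new searcher (that is how Solr's distribution scripts refresh slaves, but it's worth verifying in this setup; hostnames and file names below are made up):

require 'net/http'
require 'uri'

# Hypothetical hosts -- replace with the real nodes
MASTER   = 'http://node1:8080/solr/update'
REPLICAS = ['http://node2:8080/solr/update', 'http://node3:8080/solr/update']

def post_xml(endpoint, xml)
  url = URI.parse(endpoint)
  Net::HTTP.start(url.host, url.port) do |http|
    http.post(url.path, xml, 'Content-type' => 'text/xml; charset=utf-8')
  end
end

# Index and commit on the master only
post_xml(MASTER, File.read('my_docs.xml'))
post_xml(MASTER, '<commit/>')

# Ask each remaining node to commit, so it reopens the shared
# index without a full Tomcat restart
REPLICAS.each { |node| post_xml(node, '<commit/>') }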


Delete entire index

2007-06-13 Thread Matt Mitchell

Hi,
Is there a way to have Solr completely remove the current index?
<deleteAll/> ?


We're still in development and so our schema is wavering. Anytime we  
make a change and want to re-index we first have to:


stop tomcat (or the solr webapp)
manually remove the data/index
restart tomcat (or the solr webapp)

Removing the data/index directory is where we have the most trouble,
because of file permissions. The data/index directory is owned by
tomcat:tomcat, so in order to remove it we have to issue sudo rm,
which we'd like to avoid.


Ideally, if we could just tell Solr to delete all data without having
to do any more manual work, it'd be great! : )


Something else that would help is if we could tell Tomcat/Solr which
user/group and/or permissions to use on the data/index directory when
it's created.


Any thoughts on this?

Matt
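There's no <deleteAll/> command, but delete-by-query gets the same effect without touching the filesystem, assuming your version's query parser accepts the match-all query *:* (a minimal sketch; the URL is made up):

require 'net/http'
require 'uri'

url = URI.parse('http://localhost:8080/solr/update')

# Delete every document, then commit so the change takes effect
['<delete><query>*:*</query></delete>', '<commit/>'].each do |xml|
  Net::HTTP.start(url.host, url.port) do |http|
    http.post(url.path, xml, 'Content-type' => 'text/xml; charset=utf-8')
  end
end

Note this empties the index but won't rescue a schema change that alters existing field types; for that, wiping the data/index directory with Solr stopped is still the safe route.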


Tomcat: The requested resource (/solr/update) is not available.

2007-06-12 Thread Matt Mitchell

Hi,

I've got an app using Cocoon and Solr, both running through Tomcat.
The post.sh file has been modified to grab local files and send them
to Cocoon (via HTTP); the Solr-fied XML from Cocoon is then sent to
the update URL in Tomcat/Solr. Not sure any of that is relevant
though!


I'm running the post.sh file like:

post.sh ../xml/*.xml

This passes all of the files in ../xml to the post.sh script.

Most of the POSTs work fine, but every once in a while I'll get:

The requested resource (/solr/update) is not available.

So my question is this: is there a problem with sending all of those
POST requests to Solr at once? Should I wait for an ok response before
posting the next one? Or is it OK to just blast Solr like that? I'm
wondering if it's a Tomcat issue.


Matt
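In the meantime, a defensive version of the posting loop: send the files one at a time, check each response, and retry after a pause rather than blasting ahead (a sketch with assumed paths, not the actual post.sh):

require 'net/http'
require 'uri'

url = URI.parse('http://localhost:8080/solr/update')

Dir.glob('../xml/*.xml').sort.each do |file|
  data = File.read(file)
  attempts = 0
  begin
    response = Net::HTTP.start(url.host, url.port) do |http|
      http.post(url.path, data, 'Content-type' => 'text/xml; charset=utf-8')
    end
    # A 404 here is the "requested resource is not available" case
    raise "#{response.code} for #{file}" unless response.is_a?(Net::HTTPSuccess)
  rescue => e
    attempts += 1
    if attempts < 3
      sleep 1
      retry
    end
    raise e
  end
end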


Commit failing with EOFException

2007-05-31 Thread Matt Mitchell

Hi,

I've had this application running before and I'm not sure what has
changed to cause this error. When trying to do a clean update (removed
the index dir and restarted Solr) with just a <commit/>, Solr returns
status 1 with this error at the top:


java.io.EOFException: input contained no data

Does anyone have any idea as to why that's happening? The same thing  
occurs when I try to use the post.sh script with a valid xml file.


Thank you!

Matt


Re: Commit failing with EOFException

2007-05-31 Thread Matt Mitchell
OK, figured this out. The short of it is: make sure your schema is
always up to date! : )

The schema did not match the XML docs being posted. And because a
previous update had already posted those docs, even a plain <commit/>
failed, since there was already bad data waiting to be committed.


Matt
