Re: Disable (or prohibit) per-field overrides

2010-10-17 Thread Markus Jelsma
Hi,

Thanks for the suggestion and pointer. We've implemented it using a single 
regex in Nginx for now. 
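Something along these lines (a rough sketch with an illustrative pattern, not 
the exact rule we deployed):

# Reject any request whose query string carries a per-field override,
# i.e. a parameter whose name starts with "f.".
location /solr/ {
    if ($args ~* "(^|&)f\.[^=&]*=") {
        return 403;
    }
    proxy_pass http://127.0.0.1:8983;
}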

Cheers,

> : Anyone knows useful method to disable or prohibit the per-field override
> : features for the search components? If not, where to start to make it
> : configurable via solrconfig and attempt to come up with a working patch?
> 
> If your goal is to prevent *clients* from specifying these (while you're
> still allowed to use them in your defaults) then the simplest solution is
> probably something external to Solr -- along the lines of mod_rewrite.
> 
> Internally...
> 
> that would be tough.
> 
> You could probably write a SearchComponent (configured to run "first")
> that does it fairly easily -- just wrap the SolrParams in an impl that
> returns null anytime a component asks for a param name that starts with
> "f." (and excludes those param names when asked for a list of the param
> names)
> 
> 
> It could probably be generalized to support arbitrary rules in a way
> that might be handy for other folks, but it would still just be
> wrapping all of the params, so it would prevent you from using them
> in your config as well.
> 
> Ultimately I think a general solution would need to be in
> RequestHandlerBase ... where it wraps the request params using the
> defaults and invariants ... you'd want the custom exclusion rules to apply
> only to the request params from the client.
> 
> 
> 
> 
> -Hoss
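
For anyone finding this in the archives: a sketch of the component Hoss 
describes. The class name and wiring are hypothetical; SolrParams, 
SearchComponent and ResponseBuilder are the real Solr APIs. Note that, as 
Hoss says, this wraps all params, so it also hides f.* params set in your 
defaults and invariants.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Register in solrconfig.xml and list it under the handler's
// "first-components" so it runs before the other components.
public class StripFieldOverridesComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    final SolrParams original = rb.req.getParams();
    // Replace the request params with a view that hides every "f.*" param.
    rb.req.setParams(new SolrParams() {
      @Override
      public String get(String param) {
        return param.startsWith("f.") ? null : original.get(param);
      }

      @Override
      public String[] getParams(String param) {
        return param.startsWith("f.") ? null : original.getParams(param);
      }

      @Override
      public Iterator<String> getParameterNamesIterator() {
        List<String> names = new ArrayList<String>();
        Iterator<String> it = original.getParameterNamesIterator();
        while (it.hasNext()) {
          String name = it.next();
          if (!name.startsWith("f.")) names.add(name);
        }
        return names.iterator();
      }
    });
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // Nothing to do here; the wrapping happened in prepare().
  }

  @Override
  public String getDescription() {
    return "Hides per-field override (f.*) request parameters";
  }

  @Override
  public String getSource() { return ""; }

  @Override
  public String getSourceId() { return ""; }

  @Override
  public String getVersion() { return "1.0"; }
}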


Re: SolrJ new javabin format

2010-10-17 Thread Markus Jelsma
Well, in Nutch we simply replace the two jars and it all still works.

>   The CHANGES.txt file in branch_3x says that the javabin format has
> changed in Solr 3.1, so you need to update SolrJ as well as Solr.  Is
> the SolrJ included in 3.1 compatible with both 3.1 and 1.4.1?  If not,
> that's going to make a graceful upgrade of my replicated distributed
> installation a little harder.
> 
> Thanks,
> Shawn


Re: Disable (or prohibit) per-field overrides

2010-10-18 Thread Markus Jelsma
Thanks for your reply, but it is quite the same as Erick's suggestion, to 
which I replied the following:

>  Yes, we're using it but the problem is that there can be many fields
>  and that means quite a large list of parameters to set for each request
>  handler, and there can be many request handlers.
>  
>  It's not very practical for us to maintain such a big set of invariants.

It's much easier for us to maintain a very short white list than a huge black 
list.
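
To illustrate the black list (handler and field names made up):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="f.title.facet.limit">10</str>
    <str name="f.content.facet.limit">10</str>
    <!-- ...and so on, for every field, every per-field parameter
         and every handler -->
  </lst>
</requestHandler>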

Cheers

On Monday, October 18, 2010 04:59:09 pm Jonathan Rochkind wrote:
> You know about the 'invariant' that can be set in the request handler,
> right?  Not sure if that will do for you or not, but sounds related.
> 
> Added recently to some wiki page somewhere although the feature has been
> there for a long time.  Let's see if I can find the wiki page...Ah yes:
> 
> http://wiki.apache.org/solr/SearchHandler#Configuration
> 
> Markus Jelsma wrote:
> > Hi,
> > 
> > Thanks for the suggestion and pointer. We've implemented it using a
> > single regex in Nginx for now.
> > 
> > Cheers,
> > 
> >> : Anyone knows useful method to disable or prohibit the per-field
> >> : override features for the search components? If not, where to start
> >> : to make it configurable via solrconfig and attempt to come up with a
> >> : working patch?
> >> 
> >> If your goal is to prevent *clients* from specifying these (while you're
> >> still allowed to use them in your defaults) then the simplest solution
> >> is probably something external to Solr -- along the lines of
> >> mod_rewrite.
> >> 
> >> Internally...
> >> 
> >> that would be tough.
> >> 
> >> You could probably write a SearchComponent (configured to run "first")
> >> that does it fairly easily -- just wrap the SolrParams in an impl that
> >> returns null anytime a component asks for a param name that starts with
> >> "f." (and excludes those param names when asked for a list of the param
> >> names)
> >> 
> >> 
> >> It could probably be generalized to support arbitrary rules in a way
> >> that might be handy for other folks, but it would still just be
> >> wrapping all of the params, so it would prevent you from using them
> >> in your config as well.
> >> 
> >> Ultimately I think a general solution would need to be in
> >> RequestHandlerBase ... where it wraps the request params using the
> >> defaults and invariants ... you'd want the custom exclusion rules to
> >> apply only to the request params from the client.
> >> 
> >> 
> >> 
> >> 
> >> -Hoss

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Removing Common Web Page Header and Footer from All Content Fetched by Nutch

2010-10-19 Thread Markus Jelsma
Unfortunately, Nutch still uses Tika 0.7 in 1.2 and trunk. Nutch needs to be 
upgraded to Tika 0.8 (when it's released, or to the current trunk for now). 
Also, the Boilerpipe API needs to be exposed through Nutch configuration: 
which extractor to use, which parameters to set, etc.

Upgrading to Tika's trunk might be relatively easy, but exposing Boilerpipe 
surely isn't.

On Tuesday, October 19, 2010 06:47:43 am Otis Gospodnetic wrote:
> Hi Israel,
> 
> You can use this: http://search-lucene.com/?q=boilerpipe&fc_project=Tika
> Not sure if it's built into Nutch, though...
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
> 
> > From: Israel Ekpo 
> > To: solr-user@lucene.apache.org; u...@nutch.apache.org
> > Sent: Mon, October 18, 2010 9:01:50 PM
> > Subject: Removing Common Web Page Header and Footer from All Content
> > Fetched by Nutch
> >
> > Hi All,
> > 
> > I am indexing a web application with approximately 9500 distinct URLs and
> > their contents using Nutch and Solr.
> > 
> > I use Nutch to fetch the URLs and links and to crawl the entire web
> > application to extract the content of all pages.
> > 
> > Then I run the solrindex command to send the content to Solr.
> > 
> > The problem that I have now is that the first 1000 or so characters of
> > some pages and the last 400 characters of the pages are showing up in
> > the search results.
> > 
> > These are the contents of the common header and footer used in the site
> > respectively.
> > 
> > The only workaround that I have now is to index everything and then go
> > through each document one at a time to remove the first 1000 characters
> > if the Levenshtein distance between the first 1000 characters of the
> > page and the common header is less than a certain value. The same applies
> > to the footer content common to all pages.
> > 
> > Is there a way to ignore certain "stop phrases", so to speak, in the
> > Nutch configuration based on Levenshtein distance or Jaro-Winkler
> > distance, so that the parts of the fetched data that match these stop
> > phrases will not be parsed?
> > 
> > Any useful pointers would be highly appreciated.
> > 
> > Thanks in advance.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Uppercase and lowercase queries

2010-10-19 Thread Markus Jelsma
Because you need to reindex after changing the field's analysis.

On Tuesday, October 19, 2010 12:19:53 pm PeterKerk wrote:
> I want to query on cityname. This works when I query for example:
> "Boston"
> 
> But when I query "boston" it didn't show any results. In the database is
> stored: "Boston".
> 
> So I thought: I should change the filter on this field to make everything
> lowercase.
> 
> 
> The field definition for city is: <field name="city" type="string"
> indexed="true" stored="true"/>
> 
> So I changed its fieldtype "string" from: <fieldType name="string"
> class="solr.StrField" sortMissingLast="true" omitNorms="true">
> 
> TO:
> 
> <fieldType name="string" class="solr.StrField" sortMissingLast="true"
> omitNorms="true">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> But it still doesn't show any results when I query "boston"...why?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Uppercase and lowercase queries

2010-10-19 Thread Markus Jelsma
Yes, and reindex. And I suggest not using `string` as the name of the 
fieldType, as it will cause confusion later. Something like:

<fieldType name="text_lowercase" class="solr.TextField"
  sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

On Tuesday, October 19, 2010 12:25:53 pm Pradeep Singh wrote:
> Use text field.
> 
> On Tue, Oct 19, 2010 at 3:19 AM, PeterKerk  wrote:
> > I want to query on cityname. This works when I query for example:
> > "Boston"
> > 
> > But when I query "boston" it didn't show any results. In the database is
> > stored: "Boston".
> > 
> > So I thought: I should change the filter on this field to make everything
> > lowercase.
> > 
> > 
> > The field definition for city is: <field name="city" type="string"
> > indexed="true" stored="true"/>
> > 
> > So I changed its fieldtype "string" from: <fieldType name="string"
> > class="solr.StrField" sortMissingLast="true" omitNorms="true">
> > 
> > TO:
> > 
> > <fieldType name="string" class="solr.StrField" sortMissingLast="true"
> > omitNorms="true">
> >   <analyzer type="index">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> > 
> > But it still doesn't show any results when I query "boston"...why?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: **SPAM** Re: boosting injection

2010-10-19 Thread Markus Jelsma
Index-time boosting maybe?
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
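
That is, in the XML update message (a sketch using the boosts from your 
query):

<add>
  <doc>
    <field name="title" boost="10.0">history</field>
    <field name="author" boost="5.0">joyce</field>
  </doc>
</add>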

On Tuesday, October 19, 2010 04:23:46 pm Andrea Gazzarini wrote:
> Hi Ken,
> thanks for your response...unfortunately it doesn't solve my problem.
> 
> I cannot change the client behaviour, so the query must be a query and not
> only the query terms. In this scenario it would be great, for example, if
> I could declare the boost in the schema field definition... but I think
> it's not possible, is it?
> 
> Regards
> Andrea
>   _
> 
> From: Ken Stanley [mailto:doh...@gmail.com]
> To: solr-user@lucene.apache.org
> Sent: Tue, 19 Oct 2010 15:05:31 +0200
> Subject: **SPAM**  Re: boosting injection
> 
> Andrea,
> 
>   Using the SOLR dismax query handler, you could set up queries like this
> to boost on fields of your choice. Basically, the q parameter would be the
> query terms (without the field definitions), and a qf (Query Fields)
> parameter that you use to define your boost(s):
>   http://wiki.apache.org/solr/DisMaxQParserPlugin. A non-SOLR alternative
>   would be to parse the query in whatever application is sending the
> queries to the SOLR instance to make the necessary transformations.
> 
>   Regards,
> 
>   Ken
> 
>   It looked like something resembling white marble, which was
>   probably what it was: something resembling white marble.
>   -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
> 
> 
>   On Tue, Oct 19, 2010 at 8:48 AM, Andrea Gazzarini <
> 
>   andrea.gazzar...@atcult.it> wrote:
>   >  Hi all,
>   > 
>   > I have a client that is sending this query
>   > 
>   > q=title:history AND author:joyce
>   > 
>   > is it possible to "transform" at runtime this query in this way:
>   > 
>   > q=title:history^10 AND author:joyce^5
>   > 
>   > ?
>   > 
>   > Best regards,
>   > Andrea

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Mulitple facet - fq

2010-10-20 Thread Markus Jelsma
It should work fine. Make sure the field is indexed and check your index.
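
Note that two separate fq parameters intersect, so a document must be in both 
categories to match. To get documents in either category, a single 
disjunctive filter query does it (field and values taken from the question):

fq=category:(corporate OR personal)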

On Wednesday 20 October 2010 16:39:03 Yavuz Selim YILMAZ wrote:
> Under the category facet there are multiple selections, which can be
> personal, corporate or other.
> 
> How can I get both "personal" and "corporate" ones? I tried
> fq=category:corporate&fq=category:personal
> 
> It looks easy, but I can't find the solution.
> 
> 
> --
> 
> Yavuz Selim YILMAZ

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: facet Prefix (or term prefix)

2010-10-22 Thread Markus Jelsma
Hi,

There is no facet.contains facility, but there are alternatives. Instead of 
using the faceting engine, you can create a field that uses an 
NGramTokenizer. Properly configured, you can query this field and it will 
return what you would expect from a facet.contains feature.

Here's a post on the subject which you may find useful:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
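
A rough sketch of such a field type (the name and gram sizes are 
illustrative):

<fieldType name="text_contains" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2"
      maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>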

Cheers,

On Friday 22 October 2010 13:20:56 Jason Brown wrote:
> I am aware of the facet.prefix facility. I am using SOLR to return a
> facetted fields contents - I use the facet.prefix to restrict what returns
> from SOLR - this is very useful for predictive search functionality
> (autocomplete).
> 
> My only issue is that the field I facet on is a string and could have 2 or
> 3 words in it, thus this process will only return strings that begin with
> what the user is typing into my UI search box. It would be useful if I
> could get facets back where I could match somewhere in the facetted field
> (not just at the beginning), i.e. is there a facet.contains method?
> 
> If not I'll just have to code this in my service layer having received all
> facets from SOLR (without the prefix)
> 
> Thanks for any help.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: How to use AND as opposed to OR as the default query operator.

2010-10-25 Thread Markus Jelsma
http://wiki.apache.org/solr/SchemaXml#Default_query_parser_operator
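
In short, in schema.xml:

<solrQueryParser defaultOperator="AND"/>

For dismax handlers the equivalent knob is the mm (minimum should match) 
parameter.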

On Monday 25 October 2010 15:41:50 Swapnonil Mukherjee wrote:
> Hi Everybody,
> 
> I simply want to use AND as the default operator in queries. When a user
> searches for Jennifer Lopez solr converts this to a Jennifer OR Lopez
> query. On the other hand I want solr to treat this query as Jennifer AND
> Lopez and not as Jennifer OR Lopez.
> 
> In other words I want a default AND behavior in phrase queries instead of
> OR.
> 
> I have seen in this presentation
> http://www.slideshare.net/pittaya/using-apache-solr on Slide number 52
> that this OR behavior is configurable.
> 
> Could you please tell me where this configuration is located? I could not
> locate it in schema.xml.
> 
> Swapnonil Mukherjee
> +91-40092712
> +91-9007131999

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: documentCache clarification

2010-10-27 Thread Markus Jelsma
I wondered about this some time ago too. I found more information in SOLR-52 
[1] and some correspondence [2] on the issue, but it didn't give me a 
definitive answer.

[1]: https://issues.apache.org/jira/browse/SOLR-52
[2]: http://www.mail-archive.com/solr-...@lucene.apache.org/msg01185.html

On Wednesday 27 October 2010 16:39:44 Jay Luker wrote:
> Hi all,
> 
> The solr wiki says this about the documentCache: "The more fields you
> store in your documents, the higher the memory usage of this cache
> will be."
> 
> OK, but if i have enableLazyFieldLoading set to true and in my request
> parameters specify "fl=id", then the number of fields per document
> shouldn't affect the memory usage of the document cache, right?
> 
> Thanks,
> --jay

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Stored or indexed?

2010-10-27 Thread Markus Jelsma
http://wiki.apache.org/solr/FieldOptionsByUseCase
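
In short: indexed controls whether a field can be searched (or sorted and 
faceted on), stored controls whether Solr can return its raw value in 
results. A common pattern (field names made up):

<!-- searchable, but never returned: saves stored-field disk space -->
<field name="body" type="text" indexed="true" stored="false"/>
<!-- returned for display, but never searched directly -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>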

> Hi all-
> 
> I've read through the documentation, but I'm still a little confused about
> the  tag, in terms of the indexed and stored attributes. If I have
> something marked as indexed="true", why would I ever want stored="false"?
> Are there any good tips-n-tricks anywhere about how to properly set the
> field tag? I've been finding bits and pieces both on the wiki and a couple
> of other websites, but there doesn't seem to be a good definitive how-to
> on this.
> 
> Thanks for any info,
> 
> Ron
> 


Re: Start parameter and result grouping

2010-10-31 Thread Markus Jelsma
Ah, seems you're just one day behind. SOLR-2207, paging with field collapsing, 
has just been resolved:
https://issues.apache.org/jira/browse/SOLR-2207


> Hi,
> 
> I'm trying to implement paging when grouping is on.
> 
> Start parameter works, but the result contains all the documents that were
> before him.
> 
> http://localhost:8983/solr/select?q=test&group=true&group.field=marketplaceId&group.limit=1&rows=1&start=0 (I get 1 document).
> http://localhost:8983/solr/select?q=test&group=true&group.field=marketplaceId&group.limit=1&rows=1&start=1 (I get 2 documents).
> ...
> http://localhost:8983/solr/select?q=test&group=true&group.field=marketplaceId&group.limit=1&rows=1&start=N (I get N documents).
> 
> But in all these queries, I should get only one document in results. Am I
> right?
> 
> I'm using this build: https://hudson.apache.org/hudson/job/Solr-trunk/1297/
> 
> Thanks.


Re: Start parameter and result grouping

2010-10-31 Thread Markus Jelsma
Oh, and see the just updated wiki page as well:
http://wiki.apache.org/solr/FieldCollapsing

> Ah, seems you're just one day behind. SOLR-2207, paging with field
> collapsing, has just been resolved:
> https://issues.apache.org/jira/browse/SOLR-2207
> 
> > Hi,
> > 
> > I'm trying to implement paging when grouping is on.
> > 
> > Start parameter works, but the result contains all the documents that
> > were before him.
> > 
> > http://localhost:8983/solr/select?q=test&group=true&group.field=marketplaceId&group.limit=1&rows=1&start=0 (I get 1 document).
> > http://localhost:8983/solr/select?q=test&group=true&group.field=marketplaceId&group.limit=1&rows=1&start=1 (I get 2 documents).
> > ...
> > http://localhost:8983/solr/select?q=test&group=true&group.field=marketplaceId&group.limit=1&rows=1&start=N (I get N documents).
> > 
> > But in all these queries, I should get only one document in results. Am I
> > right?
> > 
> > I'm using this build:
> > https://hudson.apache.org/hudson/job/Solr-trunk/1297/
> > 
> > Thanks.


Re: Solr in virtual host as opposed to /lib

2010-11-01 Thread Markus Jelsma
No, he didn't make a mistake, but you did. Next time, please start a new 
thread instead of conveniently replying to an existing thread and just 
changing the subject. Now we have two threads in one. :)

> I don't think you read the entire thread. I'm assuming you made a mistake.
> 
> -Original Message-
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> Sent: Monday, November 01, 2010 11:49 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr in virtual host as opposed to /lib
> 
> : References:
> : In-Reply-To:
> :
> : Subject: Solr in virtual host as opposed to /lib
> 
> http://people.apache.org/~hossman/#threadhijack
> Thread Hijacking on Mailing Lists
> 
> When starting a new discussion on a mailing list, please do not reply to
> an existing message, instead start a fresh email.  Even if you change the
> subject line of your email, other mail headers still track which thread
> you replied to and your question is "hidden" in that thread and gets less
> attention.   It makes following discussions in the mailing list archives
> particularly difficult.
> See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
> 
> 
> 
> -Hoss


Re: Highlighting and maxBooleanClauses limit

2010-11-02 Thread Markus Jelsma
Hmm, I'm not sure it's the highlighter alone. Depending on the query it can 
also be triggered by the spellcheck component. See below what happens with 
maxBooleanClauses = 16.

HTTP ERROR: 500

maxClauseCount is set to 16

org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 
16
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:153)
at 
org.apache.lucene.search.spell.SpellChecker.add(SpellChecker.java:329)
at 
org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:260)
at 
org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:140)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:140)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
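
For reference, the limit is the maxBooleanClauses setting in solrconfig.xml; 
it was lowered to 16 for the test above, the shipped default being 1024:

<maxBooleanClauses>16</maxBooleanClauses>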



On Tuesday 02 November 2010 16:26:00 Koji Sekiguchi wrote:
> (10/11/02 23:14), Ken Stanley wrote:
> > I've noticed in the stack trace that this exception occurs when trying to
> > build the query for the highlighting; I've confirmed this by copying the
> > params and changing hl=true to hl=false. Unfortunately, when using
> > debugQuery=on, I do not see any details on what is going on with the
> > highlighting portion of the query (after artificially increasing the
> > maxBooleanClauses so the query will run).
> > 
> > With all of that said, my question(s) to the list are: Is there a way to
> > determine how exactly the highlighter is building its query (i.e., some
> > sort of highlighting debug setting)?
> 
> Basically I think the highlighter uses the main query, but tries to
> rewrite it before highlighting.
> 
> > Is the behavior of highlighting in SOLR
> > intended to be held to the same restrictions (maxBooleanClauses) as the
> > query parser (even though the highlighting query is built internally)?
> 
> I think so because maxBooleanClauses is a static variable.
> 
> I saw your stack trace and glanced at the highlighter source;
> my assumption is that the highlighter tried to rewrite (expand) your
> range queries to a boolean query, even if you set requireFieldMatch to true.
> 
> Can you try to query without the range query? If the problem goes away,
> I think it is highlighter bug. Highlighter should skip the range query
> when user set requireFieldMatch to true, because your range query is for
> another field. If so, please open a jira issue.
> 
> Koji

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Negative or zero value for fieldNorm

2010-11-03 Thread Markus Jelsma
Hi all,

I've got a puzzling issue here. During tests I noticed a document at the 
bottom of the results where it should not be. I query using DisMax on the 
title and content fields and have a boost on title using qf. Out of 30 
results, only two documents also have the term in the title.

Using debugQuery and fl=*,score I quickly noticed a large negative maxScore 
for the complete result set and a portion of the result set where scores sum 
to zero because of a product with 0 (fieldNorm).

See below for debug output for a result with score = 0:

0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
0.0 = (MATCH) weight(content:kunstgrasveld in 7), product of:
  0.75658196 = queryWeight(content:kunstgrasveld), product of:
6.6516633 = idf(docFreq=33, maxDocs=9682)
0.113743275 = queryNorm
  0.0 = (MATCH) fieldWeight(content:kunstgrasveld in 7), product of:
2.236068 = tf(termFreq(content:kunstgrasveld)=5)
6.6516633 = idf(docFreq=33, maxDocs=9682)
0.0 = fieldNorm(field=content, doc=7)
0.0 = (MATCH) fieldWeight(title:kunstgrasveld in 7), product of:
  1.0 = tf(termFreq(title:kunstgrasveld)=1)
  8.791729 = idf(docFreq=3, maxDocs=9682)
  0.0 = fieldNorm(field=title, doc=7)

And one with a negative score:

3.0716116E-4 = (MATCH) sum of:
  3.0716116E-4 = (MATCH) max of:
3.0716116E-4 = (MATCH) weight(content:kunstgrasveld in 1462), product of:
  0.75658196 = queryWeight(content:kunstgrasveld), product of:
6.6516633 = idf(docFreq=33, maxDocs=9682)
0.113743275 = queryNorm
  4.059853E-4 = (MATCH) fieldWeight(content:kunstgrasveld in 1462), product 
of:
1.0 = tf(termFreq(content:kunstgrasveld)=1)
6.6516633 = idf(docFreq=33, maxDocs=9682)
6.1035156E-5 = fieldNorm(field=content, doc=1462)

There are no funky issues with term analysis for the text fieldType; in 
fact, the term passes through unchanged. I don't set omitNorms, I store 
termVectors, etc.

Because fieldNorm = fieldBoost / sqrt(numTermsForField), I suspect my input 
from Nutch is messed up. A fieldNorm can never be <= 0 for a normal positive 
boost, and field boosts should not be zero or negative (correct me if I'm 
wrong). But since I can't yet figure out what field boosts Nutch sends me, I 
thought I'd drop by on this mailing list first.
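
To put numbers on it: with the default boost of 1.0, even a 10,000-term 
field only gets down to fieldNorm = 1 / sqrt(10000) = 0.01. The 6.1035156E-5 
above is exactly 1/16384; with a boost of 1.0 that would require a field of 
roughly 16384^2 ≈ 2.7E8 terms, which this content field certainly doesn't 
have. So a (near-)zero index-time boost is the more plausible cause.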

There are quite a few query terms that return zero or negative scores and 
many that behave as I expect. I also find it a bit hard to comprehend why 
the docs with negative scores rank higher in the result set than documents 
with zero score. Sorting defaults to score DESC, but this is perhaps another 
issue.

Anyway, the test runs on a Solr 1.4.1 instance with Java 6 under the hood. 
Help or directions are appreciated =)

Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Negative or zero value for fieldNorm

2010-11-03 Thread Markus Jelsma

> Regarding "Negative or zero value for fieldNorm", I don't see any
> negative fieldNorms here... just very small positive ones?

Of course, you're right. The E-# got twisted in my mind and became negative. 
Silly me.

> Anyway the fieldNorm is the product of the lengthNorm and the
> index-time boost of the field (which is itself the product of the
> index time boost on the document and the index time boost of all
> instances of that field).  Index time boosts default to "1" though, so
> they have no effect unless something has explicitly set a boost.

I've just checked docs 7 and 1462 (resp. first and second in the debug 
output below) with Luke. The title and content fields have no index-time 
boosts, thus defaulting to 1.0f, which is fine.

Then why does doc 7 have a fieldNorm of 0.0 on title (giving the doc a 0.0 
score in the result set) while doc 1462 has a very, very small fieldNorm?

debugOutput for doc 7:
0.0 = fieldNorm(field=title, doc=7)

Luke on the title field of doc 7:
1.0

Thanks for your reply!


> -Yonik
> http://www.lucidimagination.com
> 
> 
> 
> On Wed, Nov 3, 2010 at 2:30 PM, Markus Jelsma
> 
>  wrote:
> > Hi all,
> > 
> > I've got some puzzling issue here. During tests i noticed a document at
> > the bottom of the results where it should not be. I query using DisMax
> > on title and content field and have a boost on title using qf. Out of 30
> > results, only two documents also have the term in the title.
> > 
> > Using debugQuery and fl=*,score i quickly noticed large negative maxScore
> > of the complete resultset and a portion of the resultset where scores
> > sum up to zero because of a product with 0 (fieldNorm).
> > 
> > See below for debug output for a result with score = 0:
> > 
> > 0.0 = (MATCH) sum of:
> >  0.0 = (MATCH) max of:
> >0.0 = (MATCH) weight(content:kunstgrasveld in 7), product of:
> >  0.75658196 = queryWeight(content:kunstgrasveld), product of:
> >6.6516633 = idf(docFreq=33, maxDocs=9682)
> >0.113743275 = queryNorm
> >  0.0 = (MATCH) fieldWeight(content:kunstgrasveld in 7), product of:
> >2.236068 = tf(termFreq(content:kunstgrasveld)=5)
> >6.6516633 = idf(docFreq=33, maxDocs=9682)
> >0.0 = fieldNorm(field=content, doc=7)
> >0.0 = (MATCH) fieldWeight(title:kunstgrasveld in 7), product of:
> >  1.0 = tf(termFreq(title:kunstgrasveld)=1)
> >  8.791729 = idf(docFreq=3, maxDocs=9682)
> >  0.0 = fieldNorm(field=title, doc=7)
> > 
> > And one with a negative score:
> > 
> > 3.0716116E-4 = (MATCH) sum of:
> >  3.0716116E-4 = (MATCH) max of:
> >3.0716116E-4 = (MATCH) weight(content:kunstgrasveld in 1462), product
> > of: 0.75658196 = queryWeight(content:kunstgrasveld), product of:
> > 6.6516633 = idf(docFreq=33, maxDocs=9682)
> >0.113743275 = queryNorm
> >  4.059853E-4 = (MATCH) fieldWeight(content:kunstgrasveld in 1462),
> > product of:
> >1.0 = tf(termFreq(content:kunstgrasveld)=1)
> >6.6516633 = idf(docFreq=33, maxDocs=9682)
> >6.1035156E-5 = fieldNorm(field=content, doc=1462)
> > 
> > There are no funky issues with term analysis for the text fieldType, in
> > fact, the term passes through unchanged. I don't do omitNorms, i store
> > termVectors etc.
> > 
> > Because fieldNorm = fieldBoost / sqrt(numTermsForField) I suspect my
> > input from Nutch is messed up. A fieldNorm can never be <= 0 for a
> > normal positive boost and field boosts should not be zero or negative
> > (correct me if I'm wrong). But, since I can't yet figure out what field
> > boosts Nutch sends to me I thought I'd drop by on this mailing list
> > first.
> > 
> > There are quite a few query terms that return with zero or negative
> > scores and many that behave as i expect. I find it also a bit hard to
> > comprehend why the docs with negative score rank higher in the result
> > set than documents with zero score. Sorting defaults to score DESC,  but
> > this is perhaps another issue.
> > 
> > Anyway, the test runs on a Solr 1.4.1 instance with Java 6 under the
> > hood. Help or directions are appreciated =)
> > 
> > Cheers,
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536600 / 06-50258350


Re: Negative or zero value for fieldNorm

2010-11-04 Thread Markus Jelsma
Hi,

I've worked around the issue by setting omitNorms=true on the title field. 
Now all fieldNorm values are 1.0f and therefore do not mess up my scores 
anymore. This, of course, is hardly a solution, even though I currently do 
not use index-time boosts on any field.

The question remains: why does the title field return fieldNorm=0 for many 
queries? And a subquestion: does the Luke request handler return boost 
values for documents? I get boost values for fields, but I haven't seen 
boost values for documents.

Cheers,

On Wednesday 03 November 2010 20:44:48 Markus Jelsma wrote:
> > Regarding "Negative or zero value for fieldNorm", I don't see any
> > negative fieldNorms here... just very small positive ones?
> 
> Of course, you're right. The E-# got twisted in my mind and became
> negative. Silly me.
> 
> > Anyway the fieldNorm is the product of the lengthNorm and the
> > index-time boost of the field (which is itself the product of the
> > index time boost on the document and the index time boost of all
> > instances of that field).  Index time boosts default to "1" though, so
> > they have no effect unless something has explicitly set a boost.
> 
> I've just checked docs 7 and 1462 (resp. first and second in debug output
> below) with Luke. The title and content fields have no index time boosts,
> thus defaulting to 1.0f which is fine.
> 
> Then, why does doc 7 have a fieldNorm of 0.0 on title (and so setting a 0.0
> score on the doc in the result set) and does doc 1462 have a very very
> small fieldNorm?
> 
> debugOutput for doc 7:
> 0.0 = fieldNorm(field=title, doc=7)
> 
> Luke on the title field of doc 7.
> 1.0
> 
> Thanks for your reply!
> 
> > -Yonik
> > http://www.lucidimagination.com
> > 
> > 
> > 
> > On Wed, Nov 3, 2010 at 2:30 PM, Markus Jelsma
> > 
> >  wrote:
> > > Hi all,
> > > 
> > > I've got some puzzling issue here. During tests i noticed a document at
> > > the bottom of the results where it should not be. I query using DisMax
> > > on title and content field and have a boost on title using qf. Out of
> > > 30 results, only two documents also have the term in the title.
> > > 
> > > Using debugQuery and fl=*,score i quickly noticed large negative
> > > maxScore of the complete resultset and a portion of the resultset
> > > where scores sum up to zero because of a product with 0 (fieldNorm).
> > > 
> > > See below for debug output for a result with score = 0:
> > > 
> > > 0.0 = (MATCH) sum of:
> > >  0.0 = (MATCH) max of:
> > >0.0 = (MATCH) weight(content:kunstgrasveld in 7), product of:
> > >  0.75658196 = queryWeight(content:kunstgrasveld), product of:
> > >6.6516633 = idf(docFreq=33, maxDocs=9682)
> > >0.113743275 = queryNorm
> > >  
> > >  0.0 = (MATCH) fieldWeight(content:kunstgrasveld in 7), product of:
> > >2.236068 = tf(termFreq(content:kunstgrasveld)=5)
> > >6.6516633 = idf(docFreq=33, maxDocs=9682)
> > >0.0 = fieldNorm(field=content, doc=7)
> > >
> > >0.0 = (MATCH) fieldWeight(title:kunstgrasveld in 7), product of:
> > >  1.0 = tf(termFreq(title:kunstgrasveld)=1)
> > >  8.791729 = idf(docFreq=3, maxDocs=9682)
> > >  0.0 = fieldNorm(field=title, doc=7)
> > > 
> > > And one with a negative score:
> > > 
> > > 3.0716116E-4 = (MATCH) sum of:
> > >  3.0716116E-4 = (MATCH) max of:
> > >3.0716116E-4 = (MATCH) weight(content:kunstgrasveld in 1462),
> > >product
> > > 
> > > of: 0.75658196 = queryWeight(content:kunstgrasveld), product of:
> > > 6.6516633 = idf(docFreq=33, maxDocs=9682)
> > > 
> > >0.113743275 = queryNorm
> > >  
> > >  4.059853E-4 = (MATCH) fieldWeight(content:kunstgrasveld in 1462),
> > > 
> > > product of:
> > >1.0 = tf(termFreq(content:kunstgrasveld)=1)
> > >6.6516633 = idf(docFreq=33, maxDocs=9682)
> > >6.1035156E-5 = fieldNorm(field=content, doc=1462)
> > > 
> > > There are no funky issues with term analysis for the text fieldType, in
> > > fact, the term passes through unchanged. I don't do omitNorms, i store
> > > termVectors etc.
> > > 
> > > Because fieldNorm = fieldBoost / sqrt(numTermsForField) i suspect my
> > > input from Nutch is messed up. A fieldNorm can never be <= 0 for a
> > > normal positive boos

Re: Negative or zero value for fieldNorm

2010-11-04 Thread Markus Jelsma
I've done some testing with the example docs and it behaves similarly when 
there is a zero doc boost. Luke, however, does not show me the index-time 
boosts. Both document and field boosts are invisible in Luke's output. I've 
changed the doc boost and field boosts for the mp500.xml document, but all I 
ever see returned is boost=1.0. Is this correct?

Anyway, I'm looking at Nutch now for reasons why it sends a zero boost on a 
document.

On Thursday 04 November 2010 14:16:22 Yonik Seeley wrote:
> On Thu, Nov 4, 2010 at 8:04 AM, Markus Jelsma
> 
>  wrote:
> > The question remains, why does the title field return a fieldNorm=0 for
> > many queries?
> 
> Because the index-time boost was set to 0 when the doc was indexed.  I
> can't say how that happened... look to your indexing code.
> 
> > And a subquestion, does the luke request handler return boost values
> > for documents? I know i get boost values for fields but i haven't seen
> > boost values for documents.
> 
> The doc boost is just multiplied into each field boost and doesn't
> have a separate representation in the index.
> 
> -Yonik
> http://www.lucidimagination.com

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Negative or zero value for fieldNorm

2010-11-04 Thread Markus Jelsma
On Thursday 04 November 2010 15:12:23 Yonik Seeley wrote:
> On Thu, Nov 4, 2010 at 9:51 AM, Markus Jelsma
> 
>  wrote:
> > I've done some testing with the example docs and it behaves similar when
> > there is a zero doc boost. Luke, however, does not show me the
> > index-time boosts.
> 
> Remember that the norm is a product of the length norm and the index
> time boost... it's recorded as a single number in the index.

Yes.

> > Bost document and field boosts are not visible in Luke's output. I've
> > changed doc boost and field boosts for the mp500.xml document but all i
> > ever see returned is boost=1.0. Is this correct?
> 
> Perhaps you still have omitNorms=true for the field you are querying?

The example schema does not have omitNorms=true on the name, cat or features 
fields.

> 
> -Yonik
> http://www.lucidimagination.com

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Optimize Index

2010-11-04 Thread Markus Jelsma
Huh? That's something new for me. Optimize removes documents that have been 
flagged for deletion. For relevancy it's important that those are removed, 
because document frequencies are not updated for deletes.

Did I miss something?

> For what it's worth, the Solr class instructor at the Lucene Revolution
> conference recommended *against* optimizing, and instead suggested to just
> let the merge factor do it's job.
> 
> On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey  wrote:
> > On 11/4/2010 7:22 AM, stockiii wrote:
> >> how can i start an optimize by using DIH, but NOT after an delta- or
> >> full-import ?
> > 
> > I'm not aware of a way to do this with DIH, though there might be
> > something I'm not aware of.  You can do it with an HTTP POST.  Here's
> > how to do it with curl:
> > 
> > /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
> > -H "Content-Type: text/xml" \
> > --data-binary '<optimize/>'
> > 
> > Shawn


Replication and ignored fields

2010-11-05 Thread Markus Jelsma
Hi,

I've got an ordinary master/slave replication set up. The master contains 
several fields that are not used by the slaves but are used by processes that 
interact with the master. Removing the fields from the master is not an option.

Well, to save disk space, I figured I'd create an `ignored` fieldType and 
set the fields that are unused on the slaves to use it.

..it doesn't work, which makes perfect sense because it's just the index 
files that get copied over.

The question: how to ignore fields with replication?

Cheers,
-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Replication and ignored fields

2010-11-05 Thread Markus Jelsma
Thanks for the pointer! 

> How about hooking in Andrzej's pruning tool at the postCommit event,
> literally removing unused fields. I believe a "commit" is fired on the
> slave by itself after every successful replication, to put the index live.
> You could execute a script which prunes away the dead meat and then call a
> new commit?
> 
> http://wiki.apache.org/solr/SolrConfigXml#A.22Update.22_Related_Event_Listeners
> http://www.lucidimagination.com/solutions/webinars/mastering-the-lucene-index
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> On 5. nov. 2010, at 16.11, Markus Jelsma wrote:
> > Hi,
> > 
> > I've got an ordinary master/slave replication set up. The master contains
> > several fields that are not used by the slaves but are used by processes
> > that interact with the master. Removing the fields from the master is
> > not an option.
> > 
> > Well, to save disk space i'd figure i create an `ignored` fieldType and
> > set the fields that are unused on the slaves to use the ignored
> > fieldType.
> > 
> > ..it doesn't work and makes perfectly sense because it's just the index
> > files that get copied over.
> > 
> > The question, how to ignore fields with replication?
> > 
> > Cheers,


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Markus Jelsma
Hi,

> All,
> 
> I have a web application that requires the user to register and then login
> to gain access to the site. Pretty standard stuff...Now I would like to
> know what the best approach would be to implement a "customized" search
> experience for each user. Would this mean creating a separate core per
> user? I think that this is not possible without restarting Solr after each
> core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration. 
Sometimes you must reindex after a change; the same is true for reloading 
cores. Check the wiki on this one [1].

> 
> My use case is this...User A would like to index 5 RSS feeds and User B
> would like to index 5 completely different RSS feeds and he is not
> interested at all in what User A is interested in. This means that they
> would have to be separate index cores, right?

If you view the documents within an RSS feed as separate documents, you can 
assign a user ID to them, creating a multi-user index with RSS documents per 
user, or group, or whatever.
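
For example (field name made up), give each feed document a user_id field 
and filter per user at query time:

http://localhost:8983/solr/select?q=*:*&fq=user_id:42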

Having a core per user isn't a good idea if you have many users. It takes up 
additional memory and disk space, doesn't share caches, etc. There is also 
more maintenance, and you need some support scripts to dynamically create 
new cores - Solr currently doesn't create a new core directory structure.

But reindexing a very large index takes a lot more time and resources, and 
relevancy might be an issue depending on the RSS feeds' contents.

> 
> What is the best approach for this kind of thing?

I'd usually store the feeds in a single index, and shard if there are too 
many documents for a single server with your specifications. Unless the 
demands are too specific.

> 
> Thanks in advance,
> Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers


Re: Core Swapping

2010-11-16 Thread Markus Jelsma
> org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilt
> er.java:417) at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicati
> onFilterChain.java:235) at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilter
> Chain.java:206) at
> uk.co.sjp.intranet.utils.JsonpCallbackFilter.doFilter(JsonpCallbackFilter.
> java:108) at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicati
> onFilterChain.java:235) at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilter
> Chain.java:206) at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.
> java:233) at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.
> java:191) at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:1
> 28) at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:1
> 02) at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.ja
> va:109) at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286
> ) at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
> at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Ht
> tp11Protocol.java:583) at
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at
> java.lang.Thread.run(Thread.java:619)
> 16-Nov-2010 14:54:03 org.apache.catalina.core.StandardWrapperValve invoke
> SEVERE: Servlet.service() for servlet default threw exception
> org.apache.solr.common.SolrException: Not Found

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Spell Checker

2010-11-16 Thread Markus Jelsma

> Hi (again)
> 
> 
> 
> I am looking at the spell checker options:
> 
> 
> 
> http://wiki.apache.org/solr/SpellCheckerRequestHandler#Term_Source_Configuration
> 
> 
> 
> http://wiki.apache.org/solr/SpellCheckComponent#Use_in_the_Solr_Example
> 
> 
> 
> I am looking in my solrconfig.xml and I see one is already in use. I am
> kind of confused by this because the recommended spell checker is not
> default in my Solr 1.4.1. I have read the documentation but am still fuzzy
> on what I should do.
> 

Yes, the wiki on the request handler can indeed be confusing, as it 
discusses the spellchecker as a request handler instead of a component. 
Usually people need the spellchecker as a component in some request handler, 
instead of a request handler specifically designed for spellchecking only. 
I'd forget about that wiki page and just follow the spellcheck component 
wiki: it describes not only the request handler but also the component, and 
it is maintained up to the most recent developments in trunk and branch 3.1.

> 
> 
> My site uses legal terms and as you can see, some terms don't jive with the
> default spell checker so I was hoping to map the spell checker to the body
> for referencing dictionary words. I am unclear what approach I should take
> and how to start the quest.

Map the spellchecker to the body of what? I assume the body of your 
document, where the `main content` is stored. In that case you'd just follow 
the wiki on the component and create a spellchecking fieldType with proper 
analyzers (the example will do) and define a spellchecking field that has the 
spellcheck fieldType as its type (again, like in the example).

Then you'll need to configure the spellchecking component in your 
solrconfig. The example is, again, what you're looking for. All you need to 
map your document's main body to the spellchecker is a copyField directive 
in your schema, which will copy your body field to the spellcheck field 
(which has the spellcheck fieldType).
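
In schema.xml that boils down to something like this (field names assumed; 
textSpell is the fieldType from the example schema):

<field name="spell" type="textSpell" indexed="true" stored="false"/>
<copyField source="body" dest="spell"/>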

The example on the component wiki page should work. Many features have been 
added since 1.4.x but the examples should work as expected.


> 
> 
> 
> Can someone clarify what I should be doing here? Am I on the right track?
> 
> 
> 
> Eric


Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Markus Jelsma
Hi,

Please add preserveOriginal="1" to your WDF [1] definition and reindex (or 
first try it with the analysis page).

This will make sure the original input token is preserved alongside the 
newly generated tokens. If you then pass it all through a lowercase filter, 
it should match both documents.

[1]: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

Cheers,


>   Hi,
> 
> I am going crazy but which config is necessary to include the missing doc
> 2? I have:
> doc1 tw:aBc
> doc2 tw:abc
> 
> Now a query "aBc" returns only doc 1 although when I try doc2 from
> admin/analysis.jsp
> then the term text 'abc' of the index gets highlighted as intended.
> I even indexed a simple example (no stopwords, no protwords, no
> synonyms) via* and tried this with the normal and dismax handler, but I
> cannot make it work :-/
> 
> What have I misunderstood?
> 
> Regards,
> Peter.
> 
> 
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>       generateNumberParts="1" catenateAll="0" preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>       protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>       ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>       generateNumberParts="1" catenateAll="0" preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>       protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
> 
> --
> 
> 
> *
> books.csv:
> 
> id,tw
> 1,aBc
> 2,abc
> 
> curl http://localhost:8983/solr/update/csv?commit=true --data-binary
> @books.csv -H 'Content-type:text/plain; charset=utf-8'


Re: Respect token order in matches

2010-11-18 Thread Markus Jelsma
Hi,

I'm not sure which QParser you're using, but with the DismaxQParser you can 
specify slop on explicit phrase queries. Did you set it? It can make a 
difference. Check it out:

http://wiki.apache.org/solr/DisMaxQParserPlugin#qs_.28Query_Phrase_Slop.29

Cheers,

> Hi,
> 
> is there a way to make solr respect the order of token matches when the
> query is a multi-term string?
> 
> Here's an example:
> 
> Query String: "John C"
> 
> Indexed Strings:
> 
> - "John Cage"
> - "Cargill John"
> 
> This will return both indexed strings as a result. However, "Cargill John"
> should not match in that case, because the order of the tokens is not the
> same as in the query.
> 
> Here's the fieldtype:
> 
> <fieldType name="text_name" class="solr.TextField"
>   positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
>       replacement="" replace="all"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>       maxGramSize="25"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>       words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])"
>       replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
> 
> Is there a way to achieve this using this fieldtype?
> 
> 
> thanks!


Re: simple production set up

2010-11-18 Thread Markus Jelsma
Hi,

It's common practice not to use Solr as a frontend. Almost all deployed 
instances live in the backend, near the database servers. And if Solr is put 
up front, it is still secured by a proxy.

Setting up staging and production instances depends on your needs. If the 
load is small, you can run two Solr cores [1] on the same instance; if the 
load is high, you'd just separate them. The same goes for development and 
test instances.

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers,

> Hi I'm pretty new to SOLR and interested in getting an idea about a simple
> standard way of setting up a production SOLR service. I have read the FAQs
> and the wiki around SOLR security and performance but have not found much
> on a best practice architecture. I'm particularly interested in best
> practices around DOS prevention, securing the SOLR web app and setting up
> dev, test, production indexes.
> 
> Any pointers, links to resources would be great. Thanks in advance
> 
> Lee C


Re: simple production set up

2010-11-19 Thread Markus Jelsma
Please stay on the list.

Anyway, it's a matter of not exposing certain request handlers to the 
public. If you have a master/slave set-up, you can remove the update 
handlers from your public-facing slave (or hide them behind HTTP auth in 
your proxy). The same goes for other defined request handlers.

Essentially, you must know all about your defined request handlers in order to 
know whether they are secure or not.
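
For instance (a sketch; paths depend on your deployment):

location /solr/update {
    deny all;
}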

Cheers,

On Friday 19 November 2010 09:15:42 lee carroll wrote:
> Hi thanks for the response
> So if I follow what you are saying for a public facing index the standard
> pattern is to run behind a reverse proxy providing security (and caching?)
> Are their any docs on this? Or example deployment diagrams / config. Thanks
> lee c
> 
> On 18 Nov 2010 23:14, "Markus Jelsma"  wrote:
> > Hi,
> > 
> > It's a common practice not to use Solr as a frontend. Almost all deployed
> > instances live in the backend near the database servers. And if Solr is
> 
> being
> 
> > put to the front, it's still being secured by a proxy.
> > 
> > Setting up staging and production instances depend on your need. If the
> 
> load
> 
> > is small, you can run two Solr cores [1] on the same instance and if the
> 
> load
> 
> > is high you'd just separate them, the same goes for development and test
> > instances.
> > 
> > [1]: http://wiki.apache.org/solr/CoreAdmin
> > 
> > Cheers,
> > 
> >> Hi I'm pretty new to SOLR and interested in getting an idea about a
> 
> simple
> 
> >> standard way of setting up a production SOLR service. I have read the
> 
> FAQs
> 
> >> and the wiki around SOLR security and performance but have not found
> >> much on a best practice architecture. I'm particularly interested in
> >> best practices around DOS prevention, securing the SOLR web app and
> >> setting up dev, test, production indexes.
> >> 
> >> Any pointers, links to resources would be great. Thanks in advance
> >> 
> >> Lee C

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: passing arguments to analyzer/filter at runtime

2010-11-22 Thread Markus Jelsma
Hi,

I wouldn't use a multiValued field for this, because then you would have the 
same analyzers (and possibly stemmers) for different languages.

The usual method is to have fieldTypes for each language (en_text, de_text etc) 
and then create specific fields that map to them (en_content, de_content etc).

Since you know the language at index time, you can simply add the content to 
the proper LANG_content field.
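
A minimal schema sketch of that layout (type and field names are 
illustrative):

<field name="en_content" type="text_en" indexed="true" stored="true"/>
<field name="de_content" type="text_de" indexed="true" stored="true"/>

where text_en and text_de are TextField types with English and German 
analyzers respectively.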

Cheers,

On Monday 22 November 2010 15:58:41 jan.kure...@nokia.com wrote:
> Hi,
> 
> I’m trying to find a solution to search only in a given language.
> 
> On index time the language is known per string to be tokenized so I would
> like to write a filter that prefixes each token according to its language.
> First question: how to pass the language argument to the filter best?
> 
> I’m going to use multivalued fields, and each value I put in that field has
> another language. How do I pass several languages on to the filter best?
> 
> on search side it gets a bit trickier, here I do not know exactly the
> language of the input query but several possible. So instead of prefixing
> each token with one language code I need to prefix each token with every
> possible language code. How do I pass parameters to the filter at query
> time?
> 
> I’m not using the URL variant I am using the SolrServer.query(SolrQuery)
> interface.
> 
> Jan

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: unknown field 'name'

2010-11-23 Thread Markus Jelsma
Hi,

Strange, the example schema should work with the example documents. Anyway, 
see your Solr output as it will show you which field it complains about.

Cheers,

> Good Evening List,
> 
> I have been working with Nutch and due to numerous integration advantages I
> decided to get to grips with the Solr code base.
> 
> Solr dist - 1.4.1
> java version 1.6.0_22
> Windows Vista Home Premium
> Command Prompt to execute commands
> 
> I encountered the following problem very early on during indexing stage,
> and even though I asked this question (through the wrong list :0|) I have
> been unable to resolve what it is thats going wrong. My searches to date
> pick up hits relating to Db problems and are of no use. I have a new dist
> of Solr and have made no configuration to date.
> 
> C:\Users\Mcgibbney\Documents\LEWIS\apache-solr-1.4.1\apache-solr-1.4.1\exam
> ple\exampledocs>java -jar post.jar *.xml
> SimplePostTool: version 1.2
> SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
> other encodings are not currently supported
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> SimplePostTool: POSTing file hd.xml
> SimplePostTool: FATAL: Solr returned an error: ERROR: unknown field 'name'
> 
> Help would be great.
> 
> Lewis Mc
> 


Re: unknown field 'name'

2010-11-23 Thread Markus Jelsma
I see I missed the `name` part. Did you really start the example with 
java -jar start.jar in the example directory? Name is a defined field in the 
shipped schema.

> Hi,
> 
> Strange, the example schema should work with the example documents. Anyway,
> see your Solr output as it will show you which field it complains about.
> 
> Cheers,
> 
> > Good Evening List,
> > 
> > I have been working with Nutch and due to numerous integration advantages
> > I decided to get to grips with the Solr code base.
> > 
> > Solr dist - 1.4.1
> > java version 1.6.0_22
> > Windows Vista Home Premium
> > Command Prompt to execute commands
> > 
> > I encountered the following problem very early on during indexing stage,
> > and even though I asked this question (through the wrong list :0|) I have
> > been unable to resolve what it is thats going wrong. My searches to date
> > pick up hits relating to Db problems and are of no use. I have a new dist
> > of Solr and have made no configuration to date.
> > 
> > C:\Users\Mcgibbney\Documents\LEWIS\apache-solr-1.4.1\apache-solr-1.4.1\ex
> > am ple\e xampledocs>java -jar post.jar *.xml
> > SimplePostTool: version 1.2
> > SimplePostTool: WARNING: Make sure your XML documents are encoded in
> > UTF-8, othe r encodings are not currently supported
> > SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> > SimplePostTool: POSTing file hd.xml
> > SimplePostTool: FATAL: Solr returned an error: ERRORunknown_field_name
> > 
> > Help would be great.
> > 
> > Lewis Mc
> > 
> > Glasgow Caledonian University is a registered Scottish charity, number
> > SC021474
> > 
> > Winner: Times Higher Education's Widening Participation Initiative of the
> > Year 2009 and Herald Society's Education Initiative of the Year 2009
> > http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219
> > , en.html


Re: own way synonyms

2010-11-23 Thread Markus Jelsma
Even without expanding the synonym definitions, word1 and word2 will match 
THISKEYWORD. Although word1 != word2, both will still match THISKEYWORD and in 
a sense be... well, synonyms, different word/tone but same meaning.

> What are you trying to achieve here? Using synonyms for what they are not 
doesn't make sense. If you explain why you need this you might get a better 
answer.

> I think you can do this by THISKEYWORD => word1,word2,word3,word4
> you can try it and then see if it work by analysing it on the analyzer tab
> on the admin page.
> 
> 
> 
> 
> 
> From: solruser2010 
> To: solr-user@lucene.apache.org
> Sent: Tue, November 23, 2010 1:44:20 PM
> Subject: own way synonyms
> 
> 
> Hi,
> 
> Is it possible to set up synonyms to work like this
> 
> THISKEYWORD = word1,word2,word3,word4
> 
> but have it so word1 != word2 != word3
> 
> 
> in this theoretical example a search for fishing would be set up like this.
> 
> fishing,sport,water,boat,bait
> 
> 
> Thanks


Re: own way synonyms

2010-11-23 Thread Markus Jelsma
Yes, you can do that. Please see the wiki for specifics and good examples:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
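
For the example below, a one-way mapping in synonyms.txt does exactly that
(apply the SynonymFilterFactory on the query side):

programming => programming, java, php, python

A query for programming then also matches java, php and python, while a query
for java stays just java.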


> a better example might be this:
> 
> When someone searches on "programming" i want it to return results with
> java OR python OR php but I don't want a search for "java" to return
> documents with php
> 
> 
> programming,java,php,python


Re: Facet.query and collapsing

2010-11-28 Thread Markus Jelsma
http://wiki.apache.org/solr/FieldCollapsing#Known_Limitations

> Hi All,
> 
> I'm in a situation where I need to perform a facet on a query with field
> collapsing.
> 
> Let's say the main query is something like this
> 
> title:apple&fq={!tag=sources}source_id:(33 OR
> 44)&facet=on&facet.field={!ex=sources}source_id&facet.query=source_id:(33
> OR 44)&collapse=on&collapse.field=hash_id
> 
> I'd like my facet query to return the number of unique documents (based on
> the hash_id field) that are associated to either source 33 or 44
> 
> Right now, the query works but the count returned is larger than expected
> since there is no collapsing performed on the facet query's result set.
> 
> Is there any way of doing this? I'd like to be able to do this without
> performing a second request.
> 
> Thanks
> 
> NOTE: I'm using Solr 1.4.1 with patch 236
> (https://issues.apache.org/jira/browse/SOLR-236)


Re: question about Solr SignatureUpdateProcessorFactory

2010-11-29 Thread Markus Jelsma


On Monday 29 November 2010 14:51:33 Bernd Fehling wrote:
> Dear list,
> another suggestion about SignatureUpdateProcessorFactory.
> 
> Why can I make signatures of several fields and place the
> result in one field but _not_ make a signature of one field
> and place the result in several fields.

Use copyField
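
For example, a sketch with made-up destination fields (and assuming the
signature processor runs before RunUpdateProcessorFactory in the chain, so the
computed signature is what gets copied):

<copyField source="signature" dest="signature_s"/>
<copyField source="signature" dest="signature_facet"/>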

> 
> Could be realized without huge programming?
> 
> Best regards,
> Bernd
> 
> Am 29.11.2010 14:30, schrieb Bernd Fehling:
> > Dear list,
> > 
> > a question about Solr SignatureUpdateProcessorFactory:
> > 
> > for (String field : sigFields) {
> > 
> >   SolrInputField f = doc.getField(field);
> >   if (f != null) {
> > 
> > *sig.add(field);
> > 
> > Object o = f.getValue();
> > if (o instanceof String) {
> > 
> >   sig.add((String)o);
> > 
> > } else if (o instanceof Collection) {
> > 
> >   for (Object oo : (Collection)o) {
> >   
> > if (oo instanceof String) {
> > 
> >   sig.add((String)oo);
> > 
> > }
> >   
> >   }
> > 
> > }
> >   
> >   }
> > 
> > }
> > 
> > Why is also the field name (* above) added to the signature
> > and not only the content of the field?
> > 
> > By purpose or by accident?
> > 
> > I would like to suggest removing the field name from the signature and
> > not mixing it up.
> > 
> > Best regards,
> > Bernd

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Good example of multiple tokenizers for a single field

2010-11-29 Thread Markus Jelsma
You can use only one tokenizer per analyzer. You'd better use separate fields + 
fieldTypes for different languages.
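
A common sketch (field and type names are made up): keep the source text in one
field, copy it to a CJK-analyzed field and search both:

<field name="body" type="text_ws" indexed="true" stored="true"/>
<field name="body_cjk" type="text_cjk" indexed="true" stored="false"/>
<copyField source="body" dest="body_cjk"/>

Here text_cjk would be a fieldType whose analyzer uses
solr.CJKTokenizerFactory; at query time you search both fields, e.g. with
dismax and qf=body body_cjk.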

> I am looking for a clear example of using more than one tokenizer for a
> source single field. My application has a single "body" field which until
> recently was all latin characters, but we're now encountering both English
> and Japanese words in a single message. Obviously, we need to be using CJK
> in addition to WhitespaceTokenizerFactory.
> 
> I've found some references to using copyFields or NGrams but I can't quite
> grasp what the whole solution would look like.


Re: Batch Update Fields

2010-12-03 Thread Markus Jelsma
You must reindex the complete document, even if you just want to update a 
single field.

On Friday 03 December 2010 04:52:04 Adam Estrada wrote:
> OK part 2 of my previous question...
> 
> Is there a way to batch update field values based on a certain criteria?
> For example, if thousands of documents have a field value of 'US' can I
> update all of them to 'United States' programmatically?
> 
> Adam

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Batch Update Fields

2010-12-03 Thread Markus Jelsma


On Friday 03 December 2010 18:20:44 Adam Estrada wrote:
> I wonder...I know that sed would work to find and replace the terms in all
> of the csv files that I am indexing but would it work to find and replace
> key terms in the index?

It'll most likely corrupt your index. Offsets, positions etc won't have the 
proper meaning anymore.

> find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;
> 
> That command would iterate through all the files in the data directory and
> replace the country code with the full country name. I many just back up
> the directory and try it. I have it running on csv files right now and
> it's working wonderfully. For those of you interested, I am indexing the
> entire Geonames dataset http://download.geonames.org/export/dump/
> (allCountries.zip) which gives me a pretty comprehensive world gazetteer.
> My next step is gonna be to display the results as KML to view over a
> google globe.
> 
> Thoughts?
> 
> Adam
> 
> On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson 
wrote:
> > No, there's no equivalent to SQL update for all values in a column.
> > You'll have to reindex all the documents.
> > 
> > On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada <
> > estrada.adam.gro...@gmail.com
> > 
> > > wrote:
> > > 
> > > OK part 2 of my previous question...
> > > 
> > > Is there a way to batch update field values based on a certain
> > > criteria? For example, if thousands of documents have a field value of
> > > 'US' can I update all of them to 'United States' programmatically?
> > > 
> > > Adam

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Highlighting parameters

2010-12-03 Thread Markus Jelsma
Yes


Some parameters may be overridden on a per-field basis with the following 
syntax:

  f.<fieldName>.<originalParam>=<value>

http://wiki.apache.org/solr/HighlightingParameters
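
For the question below that would be something like (assuming the fields are
called field1 and field2 and that the character counts mean fragment size):

  q=foo&hl=true&hl.fl=field1,field2&f.field1.hl.fragsize=100&f.field2.hl.fragsize=200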


> Is there a way I can specify separate configuration for 2 different fields.
> 
> For field 1 I wan to display only 100 chars, Field 2 200 chars


Index version on slave nodes

2010-12-06 Thread Markus Jelsma
Hi,

The indexversion command in the replicationHandler on slave nodes returns 0 
for indexversion and generation while the details command does return the 
correct information. I haven't found an existing ticket on this one although 
https://issues.apache.org/jira/browse/SOLR-1573 has similarities.

Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Stored field value modification

2010-12-06 Thread Markus Jelsma
Hi,

You can create a custom update request processor [1] to strip unwanted input 
as it is about to enter the index.

[1]: http://wiki.apache.org/solr/UpdateRequestProcessor
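
A minimal sketch of such a processor (class name, field name and the naive
regex are made up; packages as in Solr 1.4):

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class StripMarkupProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object v = doc.getFieldValue("content");
        if (v instanceof String) {
          // naive tag stripping, just to show where the value is manipulated
          doc.setField("content", ((String) v).replaceAll("<[^>]+>", ""));
        }
        super.processAdd(cmd);
      }
    };
  }
}

Wire it into a chain in solrconfig.xml and reference the chain from your update
handler (in 1.4 via the update.processor parameter):

<updateRequestProcessorChain name="stripmarkup">
  <processor class="com.example.StripMarkupProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>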

Cheers,

On Monday 06 December 2010 17:36:09 Emmanuel Bégué wrote:
> Hello,
> 
> Is it possible to manipulate the value of a field before it is stored?
> 
> I'm indexing a database where some field contain raw HTML, including
> named character entities.
> 
> Using solr.HTMLStripCharFilterFactory on the index analyzer, results
> in this HTML being correctly stripped, and named character entities
> replaced by the corresponding characters, in the index (as verified
> when searching, and with Luke).
> 
> But, the stored values of the documents are stored unmodified, so the
> result sets, including highlights, contain HTML tags (that are
> escaped) and "entities" (where the leading '&' is also escaped) which
> make handling the results quite difficult.
> 
> So, is it possible to apply some filters to the data before it is
> stored in the non-indexed fields?
> 
> I couldn't find a part of the documentation that said whether it was
> 
> possible or not; I did find this message in the archives of this list:
> > From: Noble Paul
> > Sent: Tuesday, March 31, 2009 5:41 PM
> > Subject: Re: indexed fields vs stored fields
> > 
> > indexed = can be searched (mean you can use this to query). This
> 
> undergoes tokenization filter etc
> 
> > stored = can be retrieved. No modification to the data. This is
> 
> stored verbatim
> 
> which seems to say that it is not possible; but maybe things have
> changed since then?
> 
> Any other idea? given that:
> - I have zero control over what is stored in the database
> - using the Solr XML update protocol i could probably transform the
> data before sending it
> - ... but I'd much rather continue using DataImportHandler to access
> the database
> 
> Thanks,
> Regards,
> EB

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Index version on slave nodes

2010-12-07 Thread Markus Jelsma
But why? I'd expect valid version numbers although the replication handler's 
source code seems to agree with you judging from the comments.

On Monday 06 December 2010 17:49:16 Xin Li wrote:
> I think this is expected behavior. You have to issue the "details"
> command to get the real indexversion for slave machines.
> 
> Thanks,
> Xin
> 
> On Mon, Dec 6, 2010 at 11:26 AM, Markus Jelsma
> 
>  wrote:
> > Hi,
> > 
> > The indexversion command in the replicationHandler on slave nodes returns
> > 0 for indexversion and generation while the details command does return
> > the correct information. I haven't found an existing ticket on this one
> > although https://issues.apache.org/jira/browse/SOLR-1573 has
> > similarities.
> > 
> > Cheers,
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Index version on slave nodes

2010-12-07 Thread Markus Jelsma
Yes, I read that too in the replication request handler's source comments. But 
I would find it convenient if it just used the same values as we see with the 
details command.

Any devs agree? Then I'd open a ticket for this one.

On Tuesday 07 December 2010 17:14:09 Xin Li wrote:
> I read it somewhere (sorry for not remembering the source).. the
> indexversion command gets the "replicable" index version #. Since it
> is a slave machine, so the result is 0.
> 
> Thanks,
> 
> On Tue, Dec 7, 2010 at 11:06 AM, Markus Jelsma
> 
>  wrote:
> > But why? I'd expect valid version numbers although the replication
> > handler's source code seems to agree with you judging from the comments.
> > 
> > On Monday 06 December 2010 17:49:16 Xin Li wrote:
> >> I think this is expected behavior. You have to issue the "details"
> >> command to get the real indexversion for slave machines.
> >> 
> >> Thanks,
> >> Xin
> >> 
> >> On Mon, Dec 6, 2010 at 11:26 AM, Markus Jelsma
> >> 
> >>  wrote:
> >> > Hi,
> >> > 
> >> > The indexversion command in the replicationHandler on slave nodes
> >> > returns 0 for indexversion and generation while the details command
> >> > does return the correct information. I haven't found an existing
> >> > ticket on this one although
> >> > https://issues.apache.org/jira/browse/SOLR-1573 has
> >> > similarities.
> >> > 
> >> > Cheers,
> >> > 
> >> > --
> >> > Markus Jelsma - CTO - Openindex
> >> > http://www.linkedin.com/in/markus17
> >> > 050-8536620 / 06-50258350
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Warming searchers/Caching

2010-12-07 Thread Markus Jelsma
XInclude works fine but that's not what you're looking for, I guess. Having the 
top 100 queries is overkill anyway and it can take too long for a new searcher 
to warm up.

Depending on the type of requests, I usually limit warming to popular filter 
queries only, as they generate a very high hit ratio and make caching 
useful [1].

If there are very popular user-entered queries with a high initial latency, 
I'd have them warmed up as well.

[1]: http://wiki.apache.org/solr/SolrCaching#Tradeoffs
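
For example, a newSearcher listener with a couple of popular filter queries
(the fq values are obviously just examples):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="fq">category:books</str></lst>
    <lst><str name="q">*:*</str><str name="fq">in_stock:true</str></lst>
  </arr>
</listener>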

> Warning: I haven't used this personally, but Xinclude looks like what
> you're after, see: http://wiki.apache.org/solr/SolrConfigXml#XInclude
> 
> 
> 
> Best
> Erick
> 
> On Tue, Dec 7, 2010 at 6:33 PM, Mark  wrote:
> > Is there any plugin or easy way to auto-warm/cache a new searcher with a
> > bunch of searches read from a file? I know this can be accomplished using
> > the EventListeners (newSearcher, firstSearcher) but I rather not add 100+
> > queries to my solrconfig.xml.
> > 
> > If there is no hook/listener available, is there some sort of Handler
> > that performs this sort of function? Thanks!


Re: customer ping response

2010-12-07 Thread Markus Jelsma
Of course! The ping request handler behaves like any other request handler and 
accepts at least the wt parameter [1]. Use xslt [2] to transform the output to 
any desirable form, or use one of the other response writers [3].

Why anyway, is it a load balancer that only wants an OK output or something?

[1]: http://wiki.apache.org/solr/CoreQueryParameters
[2]: http://wiki.apache.org/solr/XsltResponseWriter
[3]: http://wiki.apache.org/solr/QueryResponseWriter
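For example (the stylesheet name is made up; it lives in conf/xslt/):

  /solr/admin/ping?wt=xslt&tr=ping.xsl
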
> Can I have a custom xml response for the ping request?
> 
> thanks,
> 
> Tri


Re: customer ping response

2010-12-07 Thread Markus Jelsma
Well, you can go a long way with xslt, but I wouldn't know how to embed the 
server name in the response as Solr simply doesn't return that information.

You'd have to patch the response Solr's giving or put a small script in front 
that can embed the server name.

> I need to return this:
> 
> 
> 
> 
> Server
> ok
> 
> 
> 
> 
> 
> 
> 
> From: Markus Jelsma 
> To: solr-user@lucene.apache.org
> Cc: Tri Nguyen 
> Sent: Tue, December 7, 2010 4:27:32 PM
> Subject: Re: customer ping response
> 
> Of course! The ping request handler behaves like any other request handler
> and accepts at last the wt parameter [1]. Use xslt [2] to transform the
> output to any desirable form or use other response writers [1].
> 
> Why anyway, is it a load balancer that only wants an OK output or
> something?
> 
> [1]: http://wiki.apache.org/solr/CoreQueryParameters
> [2]: http://wiki.apache.org/solr/XsltResponseWriter
> [3]: http://wiki.apache.org/solr/QueryResponseWriter
> 
> > Can I have a custom xml response for the ping request?
> > 
> > thanks,
> > 
> > Tri


Map size must not be negative with spatial results + php serialized

2010-12-08 Thread Markus Jelsma
Hi,

Got another issue here. This time it's the PHP serialized response writer 
throwing the following exception only when spatial parameters are set using 
LocalParams in Solr 1.4.1 using JTeam's plugin:

Map size must not be negative

java.lang.IllegalArgumentException: Map size must not be negative
at 
org.apache.solr.request.PHPSerializedWriter.writeMapOpener(PHPSerializedResponseWriter.java:224)
at 
org.apache.solr.request.JSONWriter.writeSolrDocument(JSONResponseWriter.java:398)
at 
org.apache.solr.request.JSONWriter.writeSolrDocumentList(JSONResponseWriter.java:553)
at 
org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:148)
at 
org.apache.solr.request.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:154)
at 
org.apache.solr.request.PHPSerializedWriter.writeNamedList(PHPSerializedResponseWriter.java:100)
at 
org.apache.solr.request.PHPSerializedWriter.writeResponse(PHPSerializedResponseWriter.java:95)
at 
org.apache.solr.request.PHPSerializedResponseWriter.write(PHPSerializedResponseWriter.java:69)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

This is only triggered when the searchComponent is added to the request 
handler.

Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Map size must not be negative with spatial results + php serialized

2010-12-08 Thread Markus Jelsma
I know, but since it's an Apache component throwing the exception, I figured 
someone here might know more about this. And the guys do visit the list afaik.

Anyway, I'll ask there too. Thanks 

On Wednesday 08 December 2010 15:41:21 Grant Ingersoll wrote:
> That sounds like a JTeam plugin problem, which is not supported here.
> 
> On Dec 8, 2010, at 5:38 AM, Markus Jelsma wrote:
> > Hi,
> > 
> > Got another issue here. This time it's the PHP serialized response writer
> > throwing the following exception only when spatial parameters are set
> > using LocalParams in Solr 1.4.1 using JTeam's plugin:
> > 
> > Map size must not be negative
> > 
> > java.lang.IllegalArgumentException: Map size must not be negative
> > 
> > at
> > 
> > org.apache.solr.request.PHPSerializedWriter.writeMapOpener(PHPSerializedR
> > esponseWriter.java:224)
> > 
> > at
> > 
> > org.apache.solr.request.JSONWriter.writeSolrDocument(JSONResponseWriter.j
> > ava:398)
> > 
> > at
> > 
> > org.apache.solr.request.JSONWriter.writeSolrDocumentList(JSONResponseWrit
> > er.java:553)
> > 
> > at
> > 
> > org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.ja
> > va:148)
> > 
> > at
> > 
> > org.apache.solr.request.JSONWriter.writeNamedListAsMapMangled(JSONRespons
> > eWriter.java:154)
> > 
> > at
> > 
> > org.apache.solr.request.PHPSerializedWriter.writeNamedList(PHPSerializedR
> > esponseWriter.java:100)
> > 
> > at
> > 
> > org.apache.solr.request.PHPSerializedWriter.writeResponse(PHPSerializedRe
> > sponseWriter.java:95)
> > 
> > at
> > 
> > org.apache.solr.request.PHPSerializedResponseWriter.write(PHPSerializedRe
> > sponseWriter.java:69)
> > 
> > at
> > 
> > org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilt
> > er.java:325)
> > 
> > at
> > 
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.ja
> > va:254)
> > 
> > at
> > 
> > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHand
> > ler.java:1089)
> > 
> > at
> > 
> > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
> > 
> > at
> > 
> > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:21
> > 6)
> > 
> > at
> > 
> > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> > 
> > at
> > 
> > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> > 
> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> > at
> > 
> > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerC
> > ollection.java:211)
> > 
> > at
> > 
> > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java
> > :114)
> > 
> > at
> > 
> > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
> > 
> > at org.mortbay.jetty.Server.handle(Server.java:285)
> > at
> > 
> > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
> > 
> > at
> > 
> > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnec
> > tion.java:821)
> > 
> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
> > at
> > 
> > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java
> > :226)
> > 
> > at
> > 
> > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.jav
> > a:442)
> > 
> > This is only triggered when the searchComponent is added to the request
> > handler.
> > 
> > Cheers,
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr Image Result

2010-12-08 Thread Markus Jelsma
Well, you might abuse HTML's ability to embed binary data in <img> tags using 
base64 encoding (a data URI). If you manage to extract the binary data, put it 
in the <img> tags' src attribute and store it in Solr, you could output valid 
HTML including images.
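
Something like this, as a sketch (base64 payload elided):

  <img src="data:image/png;base64,..." />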

But, i might be totally misunderstanding the question here ;)

On Wednesday 08 December 2010 16:12:51 Stefan Matheis wrote:
> I'm pretty sure ali is talking about a result like that:
> http://books.google.de/books?id=xxBZZ5YS06kC&pg=PA165&dq=apache+solr&hl=de&;
> ei=HKD_TI-vN4yWswbmtvDyDg&sa=X&oi=book_result&ct=result&resnum=1&ved=0CDMQ6
> AEwAA#v=onepage&q=apache%20solr&f=false

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Taxonomy and Faceting

2010-12-08 Thread Markus Jelsma
Don't know if it's useful, but from the old thread:
http://code.google.com/p/solr-uima/wiki/5MinutesTutorial


On Wednesday 08 December 2010 16:18:06 webdev1977 wrote:
> Any luck with a tutorial?  :-)

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: [Casting] values on update/csv

2010-12-08 Thread Markus Jelsma
Should be no problem but please paste the log output etc.

> All,
> 
> I have a csv file and I want to store one of the fields as a tdouble type.
> It does not like that at all...Is there a way to cast the string value to a
> tdouble?
> 
> Thanks,
> Adam


Re: Map size must not be negative with spatial results + php serialized

2010-12-09 Thread Markus Jelsma
Well, in that case I'd open a ticket for this one. The problem is, for now, 
that I can only replicate the behaviour using the spatial plugin. 

On Wednesday 08 December 2010 21:58:06 Chris Hostetter wrote:
> : That's fine - it could be a Solr bug too.
> 
> it definitely looks like a generic solr bug.
> 
> JSONResponseWriter.java:398
> (in the writeSolrDocument method that supports pseudo-fields)
> 
> writeMapOpener(-1); // no trivial way to determine map size
> 
> PHPSerializedResponseWriter.java:221
> (in which PHPSerializedWriter extends JSONWriter)...
> 
>   public void writeMapOpener(int size) throws IOException,
> IllegalArgumentException { // negative size value indicates that something
> has gone wrong
>   if (size < 0) {
>   throw new IllegalArgumentException("Map size must not be 
> negative");
>   }
> 
> 
> ...it looks like PHPSerializedResponseWriter is fundamentally broken.
> 
> I suspect the origin of hte problem is that PHPSerializedWriter overrides
> "writeDoc" and that prevented the writeMapOpener(-1) from ever happening,
> but then "writeSolrDocument" was added which PHPSerializedWriter doesn't
> override that.
> 
> 
> 
> -Hoss

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: SolrHome and Solr Data Dir in solrconfig.xml

2010-12-09 Thread Markus Jelsma
What's the context file for Solr under Catalina? It should read something like 
this:

<Context docBase="/path/to/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/usr/share/solr" override="true"/>
</Context>

Or you can set the solr.solr.home system property prior to starting Tomcat:

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/usr/share/solr"


http://wiki.apache.org/solr/SolrTomcat


On Thursday 09 December 2010 17:27:27 Bing Li wrote:
> Dear all,
> 
> I am a new user of Solr.
> 
> When using Solr, SolrHome is set to /home/libing/Solr. When Tomcat is
> started, it must read solrconfig.xml to get Solr data dir, which is used to
> contain indexes. However, I have no idea how to associate SolrHome with
> Solr data dir. So a mistake occurs. All the indexes are put under
> $TOMCAT_HOME/bin. This is NOT what I expect. I hope indexes are under
> SolrHome.
> 
> Could you please give me a hand?
> 
> Best,
> Bing Li

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: SOLR geospatial

2010-12-11 Thread Markus Jelsma
That smells like: http://www.jteam.nl/news/spatialsolr.html

> My partner is using a publicly available plugin for GeoSpatial. It is used
> both during indexing and during search. It forms some kind of gridding
> system and puts 10 fields per row related to that. Doing a Radius search
> (vs a bounding box search which is faster in almost all cases in all
> GeoSpatial query systems) seems pretty fast. GeoSpatial was our project's
> constraint. We've moved past that now.
> 
> Did I mention that it returns distance from the center of the radius based
> on units supplied in the query?
> 
> I would tell you what the plugin is, but in our division of labor, I have
> kept that out of my short term memory. You can contact him at:
> Danilo Unite ;
> 
> Dennis Gearon
> 
> 
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make
> them yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> 
> EARTH has a Right To Life,
> otherwise we all die.
> 
> 
> 
> - Original Message 
> From: George Anthony 
> To: solr-user@lucene.apache.org
> Sent: Fri, December 10, 2010 9:23:18 AM
> Subject: SOLR geospatial
> 
> In looking at some of the docs support for geospatial search.
> 
> I see this functionality is mostly scheduled for upcoming release 4.0 (with
> some
> 
> playing around with backported code).
> 
> 
> I note the support for the bounding box filter, but will "bounding box" be
> one of the supported *data* types for use with this filter?  For example,
> if my lat/long data describes the "footprint" of a map, I'm curious if
> that type of coordinate data can be used by the bounding box filter (or in
> any other way for similar limiting/filtering capability). I see it can
> work with point type data but curious about functionality with bounding
> box type data (in contrast to simple point lat/long data).
> 
> Thanks,
> George


Re: Very high load after replicating

2010-12-12 Thread Markus Jelsma
There can be numerous explanations such as your configuration (cache warming 
queries, merge factor, replication events etc.) but also I/O having trouble 
flushing everything to disk. It could also be a memory problem: the OS might 
start swapping if you allocate too much RAM to the JVM, leaving little for the 
OS to work with.

You need to provide more details.

> After replicating an index of around 20g my slaves experience very high
> load (50+!!)
> 
> Is there anything I can do to alleviate this problem?  Would solr cloud
> be of any help?
> 
> thanks


Re: Which query parser and how to do full text on mulitple fields

2010-12-12 Thread Markus Jelsma
Pradeep is right, but check the solrconfig; the query parser is defined there. 
Look for the basedOn attribute in the queryParser element.



> You said you were using a third party plugin. What do you expect people
> here to know? Solr plugins don't have parameters lat, long, radius and
> threadCount (they have pt and dist).
> 
> On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon wrote:
> > Which query parser did my partner set up below, and how to I parse three
> > fields
> > in the index for scoring and returning results?
> > 
> > 
> > 
> > 
> > /solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.326
> > 375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%20L
> > oft
> > 
> >  Dennis Gearon
> > 
> > Signature Warning
> > 
> > It is always a good idea to learn from your own mistakes. It is usually a
> > better
> > idea to learn from others’ mistakes, so you do not have to make them
> > yourself.
> > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> > 
> > 
> > EARTH has a Right To Life,
> > otherwise we all die.


Re: Rebuild Spellchecker based on cron expression

2010-12-12 Thread Markus Jelsma
Maybe you've overlooked the build parameter?
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.build
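
For example, a cron entry hitting the spellcheck handler (handler name and URL
are just examples):

0 1 * * * curl -s 'http://localhost:8983/solr/spell?q=solr&spellcheck=true&spellcheck.build=true' > /dev/null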

> Hi,
> 
> the spellchecker component already provides a buildOnCommit and
> buildOnOptimize option.
> 
> Since we have several spellchecker indices building on each commit is
> not really what we want to do.
> Building on optimize is not possible as index optimization is done on
> the master and the slaves don't even run an optimize but only fetch
> the optimized index.
> 
> Therefore I'm thinking about an extension of the spellchecker that
> allows you to rebuild the spellchecker based on a cron-expression
> (e.g. rebuild each night at 1 am).
> 
> What do you think about this, is there anybody else interested in this?
> 
> Regarding the lifecycle, is there already some executor "framework" or
> any regularly running process in place, or would I have to pull up my
> own thread? If so, how can I stop my thread when solr/tomcat is
> shutdown (I couldn't see any shutdown or destroy method in
> SearchComponent)?
> 
> Thanx for your feedback,
> cheers,
> Martin


Re: Which query parser and how to do full text on mulitple fields

2010-12-12 Thread Markus Jelsma
The manual answers most questions.

> Oh, I didn't know that the syntax didn't show the parser used, that it was
> set in the config file.
> 
> I'll talk to my partner, thanks.
> 
>  Dennis Gearon
> 
> 
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make
> them yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> 
> EARTH has a Right To Life,
> otherwise we all die.
> 
> 
> 
> - Original Message 
> From: Markus Jelsma 
> To: solr-user@lucene.apache.org
> Cc: Pradeep Singh 
> Sent: Sun, December 12, 2010 5:08:11 PM
> Subject: Re: Which query parser and how to do full text on mulitple fields
> 
> Pradeep is right, but, check the solrconfig, the query parser is defined
> there. Look for the basedOn attribute in the queryParser element.
> 
> > You said you were using a third party plugin. What do you expect people
> > here to know? Solr plugins don't have parameters lat, long, radius and
> > threadCount (they have pt and dist).
> > 
> > On Sun, Dec 12, 2010 at 4:47 PM, Dennis Gearon 
wrote:
> > > Which query parser did my partner set up below, and how to I parse
> > > three fields
> > > in the index for scoring and returning results?
> > > 
> > > 
> > > 
> > > 
> > > /solr/select?wt=json&indent=true&start=0&rows=20&q={!spatial%20lat=37.3
> > > 26
> > > 375%20long=-121.892639%20radius=3%20unit=km%20threadCount=3}title:Art%
> > > 20L oft
> > > 
> > >  Dennis Gearon
> > > 
> > > Signature Warning
> > > 
> > > It is always a good idea to learn from your own mistakes. It is usually
> > > a better
> > > idea to learn from others’ mistakes, so you do not have to make them
> > > yourself.
> > > from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> > > 
> > > 
> > > EARTH has a Right To Life,
> > > otherwise we all die.


Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Markus Jelsma
Where did you put the jar?

> All,
> 
> Can anyone shed some light on this error. I can't seem to get this
> class to load. I am using the distribution of Solr from Lucid
> Imagination and the Spatial Plugin from here
> https://issues.apache.org/jira/browse/SOLR-773. I don't know how to
> apply a patch but the jar file is in there. What else can I do?
> 
> org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin'
>   at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:
> 373) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413) at
> org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435) at
> org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
>   at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
>   at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
>   at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1442)
>   at org.apache.solr.core.SolrCore.(SolrCore.java:548)
>   at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.ja
> va:137) at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83
> ) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
> at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at
> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:59
> 4) at org.mortbay.jetty.servlet.Context.startContext(Context.java:139) at
> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:121
> 8) at
> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
> at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
> at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java
> :147) at
> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerC
> ollection.java:161) at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java
> :147) at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at
> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
> at org.mortbay.jetty.Server.doStart(Server.java:210)
>   at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929) at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at org.mortbay.start.Main.invokeMain(Main.java:183)
>   at org.mortbay.start.Main.start(Main.java:497)
>   at org.mortbay.start.Main.main(Main.java:115)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin
>   at java.net.URLClassLoader$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:
> 357) ... 33 more


Re: De-duplication not working as I expected - duplicates still getting into the index

2010-12-14 Thread Markus Jelsma
Check this setting:

  <bool name="overwriteDupes">false</bool>

With overwriteDupes set to false, Solr only adds the signature as a field and 
does not overwrite duplicates; set it to true to have a new document replace 
older documents with the same signature.


On Tuesday 14 December 2010 14:26:21 Jason Brown wrote:
> I have configured de-duplication according to the Wiki..
> 
> My signature field is defined thus...
> 
> <field name="signature" type="string" stored="true" indexed="true"
> multiValued="false" />
> 
> and my updateRequestProcessor as follows
> 
> <updateRequestProcessorChain name="dedupe">
>   <processor
> class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <bool name="overwriteDupes">false</bool>
>     <str name="signatureField">signature</str>
>     <str name="fields">content</str>
>     <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory"/>
> </updateRequestProcessorChain>
> 
> I am using SOLRJ to write to the index with the binary (as opposed to XML)
> format, so my update handler is defined as below.
> 
> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.processor">dedupe</str>
>   </lst>
> </requestHandler>
> 
> However I was expecting SOLR to only allow 1 instance of a duplicate
> document into the index, but I get the following results when I query my
> index...
> 
> I have deliberately added my ISA Letter file 4 times and can see it has
> correctly generated an identical signature for the first 4 entries
> (d91a5ce933457fd5). The fifth entry is a different document and correctly
> has a different signature.
> 
> I was expecting to only see 1 instance of the duplicate. Am I
> misinterpreting the way it works? Many Thanks.
> 
> 
> ISA Letter
> d91a5ce933457fd5
> 
> ISA Letter
> d91a5ce933457fd5
> 
> ISA Letter
> d91a5ce933457fd5
> 
> ISA Letter
> d91a5ce933457fd5
> 
> ISA Mailing pack letter
> fd9d9e1c0de32fb5
> 
> 
> If you wish to view the St. James's Place email disclaimer, please use the
> link below
> 
> http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: SpatialTierQueryParserPlugin Loading Error

2010-12-14 Thread Markus Jelsma
Anyway, try putting the jar in 
work/Jetty_0_0_0_0_8983_solr.war__solr__k1kf17/webapp/WEB-INF/lib/ 


On Tuesday 14 December 2010 11:10:47 Markus Jelsma wrote:
> Where did you put the jar?
> 
> > All,
> > 
> > Can anyone shed some light on this error. I can't seem to get this
> > class to load. I am using the distribution of Solr from Lucid
> > Imagination and the Spatial Plugin from here
> > https://issues.apache.org/jira/browse/SOLR-773. I don't know how to
> > apply a patch but the jar file is in there. What else can I do?
> > 
> > org.apache.solr.common.SolrException: Error loading class
> > 'org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin'
> > 
> > at
> > 
> > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java
> > : 373) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
> > at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
> > at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
> > 
> > at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
> > at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
> > at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1442)
> > at org.apache.solr.core.SolrCore.(SolrCore.java:548)
> > at
> > 
> > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.j
> > a va:137) at
> > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:8
> > 3 ) at
> > org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) at
> > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> > at
> > org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:5
> > 9 4) at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
> > at
> > org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1
> > 21 8) at
> > org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
> > at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
> > at
> > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> > at
> > org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.jav
> > a
> > 
> > :147) at
> > 
> > org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandler
> > C ollection.java:161) at
> > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> > at
> > org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.jav
> > a
> > 
> > :147) at
> > 
> > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> > at
> > org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
> > at org.mortbay.jetty.Server.doStart(Server.java:210)
> > 
> > at
> > 
> > org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
> > at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929) at
> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > 
> > at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> > at java.lang.reflect.Method.invoke(Unknown Source)
> > at org.mortbay.start.Main.invokeMain(Main.java:183)
> > at org.mortbay.start.Main.start(Main.java:497)
> > at org.mortbay.start.Main.main(Main.java:115)
> > 
> > Caused by: java.lang.ClassNotFoundException:
> > org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin
> > 
> > at java.net.URLClassLoader$1.run(Unknown Source)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(Unknown Source)
> > at java.lang.ClassLoader.loadClass(Unknown Source)
> > at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
> > at java.lang.ClassLoader.loadClass(Unknown Source)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Unknown Source)
> > at
> > 
> > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java
> > : 357) ... 33 more

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Problem using curl in PHP to get Solr results

2010-12-15 Thread Markus Jelsma
The GeoDistanceComponent triggers the problem. It may be an issue in the 
component but it could very well be a Solr issue. It seems you missed a very 
recent thread on this one.

https://issues.apache.org/jira/browse/SOLR-2278

> I finally figured out how to use curl to GET results, i.e. just turn all
> spaces into '%20' in my type of queries. I'm using solr spatial, and then
> searching in both the default text field and a couple of columns. Works
> fine on in the browser.
> 
> But if I query for it using curl in PHP, there's an error somewhere in the
> JSON. I don't know if it's in the PHP food chain or something else.
> 
> 
> Just putting my solution to GETing from curl in PHP and my problem up here,
> for others to find.
> 
>  Of course, if anyone knows the answer, all the better.
> 
>  Dennis Gearon
> 
> 
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make
> them yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> 
> EARTH has a Right To Life,
> otherwise we all die.


Re: Reg blank values ( ) tags in SOLR XML

2010-12-20 Thread Markus Jelsma
No. But why is it a problem? A standard XML parser won't feel the difference.

> Hi,
> 
> In SOLR XML the blank values are displayed with just <str name="fieldname"/> tags
> 
> Is there a way I can make SOLR XML display the blank values as
> 
> <str name="fieldname"></str>
> 
> instead of just
> 
> <str name="fieldname"/>
> 
> Also has anyone parsed the blank value tags using SOLRNET before?
> 
> If anyone can help me with my question or provide pointers it would be of
> great help!!!
> 
> Thanks,
> Barani


XInclude in multi core

2010-12-22 Thread Markus Jelsma
Hi,

In a test setup I have a master and slave in the same JVM but in different 
cores. Of course I'd like to replicate configuration files and include some via 
XInclude.

The problem is the href path; it can't use properties and is relative to the 
servlet container.

Here's the problem: I also replicate the solrconfig.xml, so an include of 
solr/corename/conf/file.xml will not work in the cores I replicate it to, and I 
can't embed some corename property in the href to make it generic.

Anyone knows a trick here? Thanks!

Cheers,
-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr Spellcheker automatically tokenizes on period marks

2010-12-22 Thread Markus Jelsma
Check the analyzer of the fieldType you defined as queryAnalyzerFieldType, 
which is configured in the spellcheck search component.

On Wednesday 22 December 2010 16:32:18 Sebastian M wrote:
> Hello,
> 
> 
> My main (full text) index contains the terms "www", "sometest", "com",
> which is intended and correct.
> 
> My spellcheck index contains the term "www.sometest.com". which is also
> intended and correct.
> 
> However, when querying the spellchecker using the query "www.sometest.com",
> I get the suggestion "www.www.sometest.com.com", despite the fact that I'm
> not using a tokenizer that splits on "." (period marks) as part of my
> spellcheck query analyzer.
> 
> When running the Field Analyzer (in the Solr admin page), I can see that
> even after the last filter (see below), my term text remains
> "www.sometest.com", which is untokenized, as expected.
> 
> Any thoughts as to what may be causing this undesired tokenization?
> 
> To summarize:
> 
> Main index contains: "www", "sometest", "com"
> Spellcheck index contains: "www.sometest.com"
> Spellcheck query: "www.sometest.com"
> Expected result: (no suggestion)
> Actual result: "www.www.sometest.com.com"
> 
> 
> Here is my spellcheck query analyzer:
> 
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>     <filter class="solr.StandardFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> 
> 
> 
> 
> Thank you in advance; any suggestions are welcome!
> Sebastian

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: error in html???

2010-12-23 Thread Markus Jelsma
These errors, like

HTTP Status 500 - null java.lang.NullPointerException at 
java.io.StringReader.<init>(StringReader.java:50) at 

are returned in HTML. I use Nginx to detect the HTTP error code and return a 
JSON encoded body with the appropriate content type. Maybe it could be done in 
the servlet container but I never tried.
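
A sketch of the Nginx side (locations and the error document are made up):

location /solr/ {
  proxy_pass http://127.0.0.1:8983;
  proxy_intercept_errors on;
  error_page 500 502 503 /error.json;
}

location = /error.json {
  default_type application/json;
}

Here /error.json is a static file in the document root containing the JSON body
you want to hand out instead of the HTML error page.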

> What html format? Solr responds in XML, not HTML. Any HTML
> has to be created somewhere in the chain. Your browser
> may not be set up to render XML, so you could be seeing problems
> because of that.
> 
> If hit is off-base, could you explain your issue in a bit more detail?
> 
> Best
> Erick
> 
> On Thu, Dec 23, 2010 at 6:30 AM, satya swaroop 
wrote:
> > Hi All,
> > 
> > I am able to get the response in the success case in json format
> > by
> > 
> > stating wt=json in the query. But as in case if any errors i am geting in
> > html format.
> > 
> >  1) Is there any specified reason to get in html format??
> >  2)cant we get the error result in json format??
> > 
> > Regards,
> > satya


Re: Solr confFiles replication

2010-12-28 Thread Markus Jelsma
Check your configuration and log file. And remember, configuration files will 
only get replicated if their hashes differ. Also, new configuration files will 
not be replicated automatically; you'll need to upload them to the slaves 
manually for the first time. Slaves will not replicate what they don't have.

> Hello Apache Solr users,
> 
> I have master-slave replication setup, and slave is getting index data
> replicated but not configured confFiles. What could be the problem?
> Solr 1.4.1 is used.
> 
> Regards,
> Stevo.


Re: Solr confFiles replication

2010-12-28 Thread Markus Jelsma


On Tuesday 28 December 2010 15:02:24 Stevo Slavić wrote:
> Thanks Markus for the insight!
> 
> I've figured out that initially conf files need to be put manually on
> slaves so slaves know how to connect to master to start polling. I've
> attempted several times to send this question of mine to solr-user
> mailing list, got refused with spam qualifications, found it was
> because email was in html format. After switching to plain text, email
> reached mailing list but I've stripped off information during attempts
> and didn't mention that replication of index data works - only conf
> file replication doesn't work. Maybe hashes of conf files are the
> issue here. Are they calculated automatically by master and slave? I
> assume protocol is same as for index data, where slave issues
> replicaiton request, gets in response list of conf files with metadata
> including hashes that master calculated for its conf files configured
> for replication, slave then calculates hashes of its local conf files
> and does comparison with metadata received from master, and decides
> whether to download or not conf files.

Well, that's about how it works in a nutshell.

> 
> SolrReplication wiki page mentions "Only files in the 'conf' dir of
> the solr instance are replicated."  (wish I could underline that "solr
> instance" fragment) - in my case there are two cores/indexes on single
> solr instance, where each core has its own /conf (and /data) dir -
> since index data replication works well (appropriate core index data
> is replicated) I assume that it's only wrong/incomplete sentence that
> instance conf dir is mentioned and not core conf dir.

Replication in multi core works as expected. In this case the instance dir is 
solr/corename/ and its conf dir is solr/corename/conf/.

> 
> Same wiki page also mentiones "The files are replicated only along
> with a fresh index. That means even if a file is changed in the master
> the file is replicated only after there is a new commit/optimize on
> the master. ". This sentence doesn't mention after startup conf files
> replication. Does this mean that schema.xml replication will not occur
> after master startup until commit/optimize is issued in case when all
> of the following is done:
> - schema.xml is listed in confFiles
> - master is configured to replicateAfter startup, or commit or optimize
> - master gets brought down
> - master index data is deleted
> - master schema.xml is changed
> - and master is started up again?

Configuration files will only be sent over when index files are to be 
replicated. 
So if the master is reindexed, it will generate a new indexVersion, triggering 
the replication events on the slaves. Then the configuration files are 
replicated as well. Forcing replication won't replicate configuration files, 
iirc.

> 
> Regards,
> Stevo.
> 
> On Tue, Dec 28, 2010 at 1:06 PM, Markus Jelsma
> 
>  wrote:
> > Check your configuration and log file. And, remember, log files will only
> > get replicated if their hashes are different. And, new configuration
> > files will not be replicated, you'll need to upload them to the slaves
> > manually for the first time. Slaves will not replicate what they don't
> > have.
> > 
> >> Hello Apache Solr users,
> >> 
> >> I have master-slave replication setup, and slave is getting index data
> >> replicated but not configured confFiles. What could be the problem?
> >> Solr 1.4.1 is used.
> >> 
> >> Regards,
> >> Stevo.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: geospatial search support for SOLR 1.3 and 1.4?

2010-12-28 Thread Markus Jelsma
You should indeed upgrade and use a 3rd-party plugin, or wait for the current 
trunk (Solr 4) or branch 3.x to be released, but that might take a while.

But if your data set is small enough and you don't have many updates, you could 
compute the distance sets outside Solr once a day, as we did in 1.3. But we 
only used it to show nearby items for a given document, no distance sorting.
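
For reference, the distance such an offline job computes is just the haversine
great-circle formula; a minimal Java sketch:

// Haversine great-circle distance in kilometers between two points
// given as latitude/longitude in degrees.
static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
  final double R = 6371.0; // mean earth radius in km
  double dLat = Math.toRadians(lat2 - lat1);
  double dLon = Math.toRadians(lon2 - lon1);
  double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
           + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
           * Math.sin(dLon / 2) * Math.sin(dLon / 2);
  return 2 * R * Math.asin(Math.sqrt(a));
}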

On Tuesday 28 December 2010 15:55:43 Bharat Jain wrote:
> hi,
>  we are currently using SOLR 1.3 and planning to use location based search
> for some of functionality. Is there any support for such a thing in 1.3? Do
> we need to upgrade to 1.4+ version.
> 
> Thanks
> Bharat Jain

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Carrot2 clustering tool Beginner

2010-12-30 Thread Markus Jelsma
How about reading the wiki:
http://wiki.apache.org/solr/ClusteringComponent

On Thursday 30 December 2010 13:21:19 Isha Garg wrote:
> Hi,
> I am new to carrot2 clustering tool. Can anyone Guide me related to
> this tool and how it  can integrate  with solr or lucene.
> 
> Thanks!
> Seeking for your guidance.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Sort Facet Query

2010-12-30 Thread Markus Jelsma
No; use facet.sort:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort


On Thursday 30 December 2010 15:42:14 Stephen Duncan Jr wrote:
> Set facet.limit to -1 (globally or for that field).  That will return all
> the facets, in lexicographical order.
> 
> Stephen Duncan Jr
> www.stephenduncanjr.com
> 
> On Thu, Dec 30, 2010 at 9:04 AM, Em  wrote:
> > Hi List,
> > 
> > I got a little issue with sorting a FacetQuery.
> > 
> > Currently I am doing something like that in SolrJ:
> > 
> > SolrQuery q = new SolrQuery("myQuery");
> > q.setFacetQuery("names:thomas");//want to see the count of thomas's
> > documents.
> > q.setFacetPrefix("short", "th");
> > 
> > I don't know any better example, but the result from all those facets
> > should
> > be returned in lexicographic order, not by count - so i can ensure that
> > every constraint is returned at the same place.
> > 
> > Any ideas?
> > 
> > Thank you!
> > 
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Sort-Facet-Query-tp2167635p2167635.htm
> > l Sent from the Solr - User mailing list archive at Nabble.com.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: old index files not deleted on slave

2011-01-03 Thread Markus Jelsma
> > Otherwise, the setup
> > is pretty vanilla. The OS is linux, the indexes are on local
> > directories, write permissions look ok, nothing unusual in the config
> > (default deletion policy, etc.). Contents of the index data dir:
> > 
> > master:
> > -rw-rw-r-- 1 feeddo feeddo  191 Dec 14 01:06 _1lg.fnm
> > -rw-rw-r-- 1 feeddo feeddo  26M Dec 14 01:07 _1lg.fdx
> > -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 14 01:07 _1lg.fdt
> > -rw-rw-r-- 1 feeddo feeddo 474M Dec 14 01:12 _1lg.tis
> > -rw-rw-r-- 1 feeddo feeddo  15M Dec 14 01:12 _1lg.tii
> > -rw-rw-r-- 1 feeddo feeddo 144M Dec 14 01:12 _1lg.prx
> > -rw-rw-r-- 1 feeddo feeddo 277M Dec 14 01:12 _1lg.frq
> > -rw-rw-r-- 1 feeddo feeddo  311 Dec 14 01:12 segments_1ji
> > -rw-rw-r-- 1 feeddo feeddo  23M Dec 14 01:12 _1lg.nrm
> > -rw-rw-r-- 1 feeddo feeddo  191 Dec 18 01:11 _24e.fnm
> > -rw-rw-r-- 1 feeddo feeddo  26M Dec 18 01:12 _24e.fdx
> > -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 18 01:12 _24e.fdt
> > -rw-rw-r-- 1 feeddo feeddo 483M Dec 18 01:23 _24e.tis
> > -rw-rw-r-- 1 feeddo feeddo  15M Dec 18 01:23 _24e.tii
> > -rw-rw-r-- 1 feeddo feeddo 146M Dec 18 01:23 _24e.prx
> > -rw-rw-r-- 1 feeddo feeddo 283M Dec 18 01:23 _24e.frq
> > -rw-rw-r-- 1 feeddo feeddo  311 Dec 18 01:24 segments_1xz
> > -rw-rw-r-- 1 feeddo feeddo  23M Dec 18 01:24 _24e.nrm
> > -rw-rw-r-- 1 feeddo feeddo  191 Dec 18 13:15 _25z.fnm
> > -rw-rw-r-- 1 feeddo feeddo  26M Dec 18 13:16 _25z.fdx
> > -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 18 13:16 _25z.fdt
> > -rw-rw-r-- 1 feeddo feeddo 484M Dec 18 13:35 _25z.tis
> > -rw-rw-r-- 1 feeddo feeddo  15M Dec 18 13:35 _25z.tii
> > -rw-rw-r-- 1 feeddo feeddo 146M Dec 18 13:35 _25z.prx
> > -rw-rw-r-- 1 feeddo feeddo 284M Dec 18 13:35 _25z.frq
> > -rw-rw-r-- 1 feeddo feeddo   20 Dec 18 13:35 segments.gen
> > -rw-rw-r-- 1 feeddo feeddo  311 Dec 18 13:35 segments_1y1
> > -rw-rw-r-- 1 feeddo feeddo  23M Dec 18 13:35 _25z.nrm
> > 
> > slave:
> > -rw-rw-r-- 1 feeddo feeddo   20 Dec 13 17:54 segments.gen
> > -rw-rw-r-- 1 feeddo feeddo  191 Dec 15 01:07 _1mk.fnm
> > -rw-rw-r-- 1 feeddo feeddo  26M Dec 15 01:08 _1mk.fdx
> > -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 15 01:08 _1mk.fdt
> > -rw-rw-r-- 1 feeddo feeddo 476M Dec 15 01:18 _1mk.tis
> > -rw-rw-r-- 1 feeddo feeddo  15M Dec 15 01:18 _1mk.tii
> > -rw-rw-r-- 1 feeddo feeddo 144M Dec 15 01:18 _1mk.prx
> > -rw-rw-r-- 1 feeddo feeddo 278M Dec 15 01:18 _1mk.frq
> > -rw-rw-r-- 1 feeddo feeddo  312 Dec 15 01:18 segments_1kj
> > -rw-rw-r-- 1 feeddo feeddo  23M Dec 15 01:18 _1mk.nrm
> > -rw-rw-r-- 1 feeddo feeddo0 Dec 15 01:19
> > lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock
> > -rw-rw-r-- 1 feeddo feeddo  191 Dec 15 13:14 _1qu.fnm
> > -rw-rw-r-- 1 feeddo feeddo  26M Dec 15 13:16 _1qu.fdx
> > -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 15 13:16 _1qu.fdt
> > -rw-rw-r-- 1 feeddo feeddo 477M Dec 15 13:28 _1qu.tis
> > -rw-rw-r-- 1 feeddo feeddo  15M Dec 15 13:28 _1qu.tii
> > -rw-rw-r-- 1 feeddo feeddo 144M Dec 15 13:28 _1qu.prx
> > -rw-rw-r-- 1 feeddo feeddo 278M Dec 15 13:28 _1qu.frq
> > -rw-rw-r-- 1 feeddo feeddo  311 Dec 15 13:28 segments_1oe
> > -rw-rw-r-- 1 feeddo feeddo  23M Dec 15 13:28 _1qu.nrm
> > -rw-rw-r-- 1 feeddo feeddo  191 Dec 17 01:12 _222.fnm
> > -rw-rw-r-- 1 feeddo feeddo  26M Dec 17 01:15 _222.fdx
> > -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 17 01:15 _222.fdt
> > -rw-rw-r-- 1 feeddo feeddo 481M Dec 17 01:36 _222.tis
> > -rw-rw-r-- 1 feeddo feeddo  15M Dec 17 01:36 _222.tii
> > -rw-rw-r-- 1 feeddo feeddo 145M Dec 17 01:36 _222.prx
> > -rw-rw-r-- 1 feeddo feeddo 281M Dec 17 01:36 _222.frq
> > -rw-rw-r-- 1 feeddo feeddo  23M Dec 17 01:36 _222.nrm
> > -rw-rw-r-- 1 feeddo feeddo  311 Dec 17 01:36 segments_1xv
> > -rw-rw-r-- 1 feeddo feeddo  191 Dec 17 13:10 _233.fnm
> > -rw-rw-r-- 1 feeddo feeddo  26M Dec 17 13:13 _233.fdx
> > -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 17 13:13 _233.fdt
> > -rw-rw-r-- 1 feeddo feeddo 482M Dec 17 13:31 _233.tis
> > -rw-rw-r-- 1 feeddo feeddo  15M Dec 17 13:31 _233.tii
> > -rw-rw-r-- 1 feeddo feeddo 146M Dec 17 13:31 _233.prx
> > -rw-rw-r-- 1 feeddo feeddo 282M Dec 17 13:31 _233.frq
> > -rw-rw-r-- 1 feeddo feeddo  311 Dec 17 13:31 segments_1xx
> > -rw-rw-r-- 1 feeddo feeddo  23M Dec 17 13:31 _233.nrm
> > -rw-rw-r-- 1 feeddo feeddo  191 Dec 18 01:11 _24e.fnm
> > -rw-rw-r-- 1 feeddo feeddo  26M Dec 18 01:12 _24e.fdx
> > -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 18 01:12 _24e.fdt
> > -rw-rw-r-- 1 feeddo feeddo 483M Dec 18 01:23 _24e.tis
> > -rw-rw-r-- 1 feeddo feeddo  15M Dec 18 01:23 _24e.tii
> > -rw-rw-r-- 1 feeddo feeddo 146M Dec 18 01:23 _24e.prx

Replication: the web application [/solr] .. likely to create a memory leak

2011-01-04 Thread Markus Jelsma
Hi,

Anyone seen this before when stopping or restarting Solr 1.4.1 running as a 
slave under Tomcat 6?

SEVERE: The web application [/solr] appears to have started a thread named 
[MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This 
is very likely to create a memory leak.

It does _not_ happen when i set enable="false" in the slave part of the 
replication RH. I haven't tested it under Jetty because it can be reproduced 
by toggling replication only.

I think it somehow relates to my other issue where old index files are not 
deleted. There is a connection between regularly restarting Tomcat and the 
problem showing up: 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg45067.html


It can also trigger multiple related errors:


Jan 4, 2011 3:18:13 PM org.apache.catalina.loader.WebappClassLoader 
clearReferencesThreads
SEVERE: The web application [/solr] appears to have started a thread named 
[pool-1-thread-1] but has failed to stop it. This is very likely to create a 
memory leak.
Jan 4, 2011 3:18:13 PM org.apache.catalina.loader.WebappClassLoader 
clearReferencesThreads
SEVERE: The web application [/solr] appears to have started a thread named 
[pool-3-thread-1] but has failed to stop it. This is very likely to create a 
memory leak.
Jan 4, 2011 3:18:13 PM org.apache.catalina.loader.WebappClassLoader 
clearReferencesThreads
SEVERE: The web application [/solr] appears to have started a thread named 
[MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This 
is very likely to create a memory leak.


Below is a relevant part of the log with only one error:


Jan 4, 2011 3:09:47 PM org.apache.catalina.core.StandardService stop
INFO: Stopping service Catalina
Jan 4, 2011 3:09:47 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null 
params={sort=sort_tijd+desc&start=0&event=firstSearcher&q=*:*&fq=catlevel1:"Boeken"&rows=30}
 
hits=325104 status=0 QTime=31 
Jan 4, 2011 3:09:47 PM org.apache.solr.core.SolrCore close
INFO: []  CLOSING SolrCore org.apache.solr.core.solrc...@5c5ddd3
Jan 4, 2011 3:09:47 PM org.apache.solr.core.SolrCore closeSearcher
INFO: [] Closing main searcher on request.
Jan 4, 2011 3:09:47 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Jan 4, 2011 3:09:47 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing searc...@35a8d460 main

fieldValueCache{lookups=6,hits=0,hitratio=0.00,inserts=3,evictions=0,size=3,warmupTime=0,cumulative_lookups=6,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=3,cumulative_evictions=0,item_f_bijzonderheden={field=f_bijzonderheden,memSize=13062096,tindexSize=52,time=21,phase1=10,nTerms=4,bigTerms=0,termInstances=665,uses=1},item_f_autoaccessoires={field=f_autoaccessoires,memSize=14156450,tindexSize=46,time=562,phase1=541,nTerms=19,bigTerms=0,termInstances=1077665,uses=1},item_f_sp_eigenschappen={field=f_sp_eigenschappen,memSize=13062188,tindexSize=46,time=186,phase1=125,nTerms=3,bigTerms=1,termInstances=436,uses=1}}

filterCache{lookups=130,hits=125,hitratio=0.96,inserts=183,evictions=0,size=157,warmupTime=0,cumulative_lookups=130,cumulative_hits=125,cumulative_hitratio=0.96,cumulative_inserts=183,cumulative_evictions=0}

queryResultCache{lookups=308,hits=124,hitratio=0.40,inserts=184,evictions=0,size=157,warmupTime=0,cumulative_lookups=308,cumulative_hits=124,cumulative_hitratio=0.40,cumulative_inserts=184,cumulative_evictions=0}

documentCache{lookups=4046,hits=2096,hitratio=0.51,inserts=1950,evictions=0,size=1933,warmupTime=0,cumulative_lookups=4046,cumulative_hits=2096,cumulative_hitratio=0.51,cumulative_inserts=1950,cumulative_evictions=0}
Jan 4, 2011 3:09:47 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing 
DirectUpdateHandler2{commits=0,autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0}
Jan 4, 2011 3:09:47 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closed 
DirectUpdateHandler2{commits=0,autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0}
Jan 4, 2011 3:09:47 PM org.apache.catalina.loader.WebappClassLoader 
clearReferencesThreads
SEVERE: The web application [/solr] appears to have started a thread named 
[MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This 
is very likely to create a memory leak.
Jan 4, 2011 3:09:48 PM org.apache.coyote.http11.Http11Protocol destroy
INFO: Stopping Coyote HTTP/1.1 on http-8080


-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Replication: the web application [/solr] .. likely to create a memory leak

2011-01-04 Thread Markus Jelsma
Is it possible this problem has something to do with my old index files not 
being removed? This problem only surfaces in my setup when i restart a slave 
with replication enabled. I can confirm that, for some reason, my replicated 
indexes only get messed up once i start restarting Tomcat several times.

On Tuesday 04 January 2011 15:48:31 Yonik Seeley wrote:
> On Tue, Jan 4, 2011 at 9:34 AM, Robert Muir  wrote:
> >[junit] WARNING: test class left thread running:
> > Thread[MultiThreadedHttpConnectionManager cleanup,5,main]
> 
> I suppose we should move MultiThreadedHttpConnectionManager to
> CoreContainer.
> 
> -Yonik
> http://www.lucidimagination.com

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Replication: abort-fetch and restarting

2011-01-04 Thread Markus Jelsma
Hi,

It seems abort-fetch nicely removes the index directory i'm replicating to, 
which is fine. Restarting, however, does not trigger the same cleanup the 
abort-fetch command does. At least, that's what my tests seem to tell me.

Shouldn't a restart of Solr nicely clean up the mess before exiting? And 
shouldn't starting Solr also look for a mess left behind by a sudden shutdown 
of the server, in which case the mess obviously cannot have been cleaned up?

If i now stop, clean and start my slave, it will attempt to download an 
existing index. If i abort-fetch, it will clean up the mess and (due to 
low-interval polling) make another attempt. If i, however, restart (instead of 
abort-fetch), the old temporary directory will stay and needs to be deleted 
manually.
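
For reference, the abort-fetch i mention is just an HTTP command against the 
slave's ReplicationHandler, along these lines (host and port are made up):

  http://slave:8983/solr/replication?command=abortfetch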

Cheers,
-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Replication: the web application [/solr] .. likely to create a memory leak

2011-01-04 Thread Markus Jelsma
I don't have Windows :)

> Is this on Windows or Unix? Windows will not delete a file that is still
> open.
> 
> On Tue, Jan 4, 2011 at 10:07 AM, Markus Jelsma
> 
>  wrote:
> > Is it possible this problem has something to do with my old index files
> > not being removed? This problem only surfaces in my setup when i restart
> > with replication on the slave. I can confirm that for some reason my
> > replicated indexes get messed up only when i start restarting Tomcat
> > several times.
> > 
> > On Tuesday 04 January 2011 15:48:31 Yonik Seeley wrote:
> >> On Tue, Jan 4, 2011 at 9:34 AM, Robert Muir  wrote:
> >> >[junit] WARNING: test class left thread running:
> >> > Thread[MultiThreadedHttpConnectionManager cleanup,5,main]
> >> 
> >> I suppose we should move MultiThreadedHttpConnectionManager to
> >> CoreContainer.
> >> 
> >> -Yonik
> >> http://www.lucidimagination.com
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350


Re: abort data import on errors

2011-01-04 Thread Markus Jelsma
http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22
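
That is, on an error you can post a rollback message to the update handler, 
and a commit when everything went fine; with curl that would be something like:

  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<rollback/>'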

> Hi,
>  
> Is there a way to specify to abort (rollback) the data import should there
> be an error/exception? 
> If everything runs smoothly, commit the data import.
>  
> Thanks,
>  
> Tri


Re: Replication: the web application [/solr] .. likely to create a memory leak

2011-01-05 Thread Markus Jelsma
I have no Windows.

On Tuesday 04 January 2011 23:20:00 Lance Norskog wrote:
> Is this on Windows or Unix? Windows will not delete a file that is still
> open.
> 
> On Tue, Jan 4, 2011 at 10:07 AM, Markus Jelsma
> 
>  wrote:
> > Is it possible this problem has something to do with my old index files
> > not being removed? This problem only surfaces in my setup when i restart
> > with replication on the slave. I can confirm that for some reason my
> > replicated indexes get messed up only when i start restarting Tomcat
> > several times.
> > 
> > On Tuesday 04 January 2011 15:48:31 Yonik Seeley wrote:
> >> On Tue, Jan 4, 2011 at 9:34 AM, Robert Muir  wrote:
> >> >[junit] WARNING: test class left thread running:
> >> > Thread[MultiThreadedHttpConnectionManager cleanup,5,main]
> >> 
> >> I suppose we should move MultiThreadedHttpConnectionManager to
> >> CoreContainer.
> >> 
> >> -Yonik
> >> http://www.lucidimagination.com
> > 
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr and Greek Chars

2009-12-28 Thread Markus Jelsma
Hi,


Did you post your documents in UTF-8? Also, for querying through GET using
non-ASCII characters you must reconfigure Tomcat 6 as per the manual [1].


Cheers,

[1] http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
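
Concretely, the wiki comes down to adding URIEncoding="UTF-8" to the HTTP 
connector in Tomcat's server.xml; the other attributes below are just the 
defaults:

  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="20000"
             redirectPort="8443"
             URIEncoding="UTF-8"/>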

ZAROGKIKAS,GIORGOS zei:
> Hi there
>
> I’m using solr 1.4 under tomcat server in windows
> server 2008
>
>  and I want to index some data that contain Greek chars
>
> When I try to index my data and query all of them with
> *:* all the Greek chars
>
>  are returned like hieroglyphics
>
> can anybody help ???
>
> thanks in advance
>
> -
>
> Γεώργιος Ζαρογκίκας
>
> Τμήμα Μηχανογράφησης
>
>  6936801497
>
>   g.zarogki...@multirama.gr
>
>  23o Xλμ Εθ. Οδού Αθήνων Λαμίας
>
> ΤΚ. 14564Driveme
> 
>
>
>
> P  Please consider the environment before printing this e-mail
>
>
>





Re: Multi language support

2010-01-11 Thread Markus Jelsma
Hello,


We have implemented language-specific search in Solr using language-specific
fields and field types. For instance, an en_text field type can use an
English stemmer and a list of stopwords and synonyms. We, however, did not
use language-specific stopwords; instead we used one list shared by both
languages.

So you would have a field type along the lines of this sketch (the filter
classes are real Solr factories, but the exact combination and file names
here are illustrative, not our production config):

  <fieldType name="en_text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>

etc. etc. for nl_text, de_text and so on, each with its own stemmer.



Cheers,

-  
Markus Jelsma  Buyways B.V.
Technisch ArchitectFriesestraatweg 215c
http://www.buyways.nl  9743 AD Groningen   


Alg. 050-853 6600  KvK  01074105
Tel. 050-853 6620  Fax. 050-3118124
Mob. 06-5025 8350  In: http://www.linkedin.com/in/markus17


On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:

> Hi Solr users.
> 
> I'm trying to set up a site with Solr search integrated. And I use the
> SolJava API to feed the index with search documents. At the moment I
> have only activated search on the English portion of the site. I'm
> interested in using as many features of solr as possible. Synonyms,
> Stopwords and stems all sounds quite interesting and useful but how do
> I set up this in a good way for a multilingual site?
> 
> The site don't have a huge text mass so performance issues don't
> really bother me but still I'd like to hear your suggestions before I
> try to implement an solution.
> 
> Best regards
> 
> Daniel


Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Markus Jelsma
Hello Kelly,


I am not entirely sure if i understand your problem correctly. But i
believe your first approach is the right one.

Your question: "Which products are available that contain skus with color
Green, size M, and a price of $9.99 or less?" can be easily answered using
a schema like yours.

id = 1
color = [green, blue]
size = [M, S]
price = 6

id = 2
color = [red, blue]
size = [L, S]
price = 12

id = 3
color = [green, red, blue]
size = [L, S, M]
price = 5

Using the data above you can answer your question using a basic Solr query
[1] like the following: q=color:green AND price:[0 TO 9.99] AND size:M

Of course, you could make this a function query [2], but this, if i
understood your question well enough, answers it.

[1] http://wiki.apache.org/solr/SolrQuerySyntax
[2] http://wiki.apache.org/solr/FunctionQuery
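
As a raw request that would look roughly like this (the brackets and spaces
still need URL encoding in practice):

  http://localhost:8983/solr/select?q=color:green+AND+price:[0+TO+9.99]+AND+size:M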


Cheers,


Kelly Taylor said:
>
> I am in the process of building a Solr search solution for my
> application and have run into a roadblock with the schema design.
> Trying to match criteria in one multi-valued field with corresponding
> criteria in another
> multi-valued field.  Any advice would be greatly appreciated.
>
> BACKGROUND:
> My RDBMS data model is such that for every one of my "Product" entities,
> there are one-to-many "SKU" entities available for purchase. Each SKU
> entity can have its own price, as well as one-to-many options, etc.  The
> web frontend displays available "Product" entities on both directory and
> detail pages.
>
> In order to take advantage of Solr's facet count, paging, and sorting
> functionality, I decided to base the Solr schema on "Product" documents;
> so none of my documents currently contain duplicate "Product" data, and
> all "SKU" related data is denormalized as necessary, but into
> multi-valued fields.  For example, I have a document with an "id" field
> set to
> "Product:7," a "docType" field is set to "Product" as well as
> multi-valued "SKU" related fields and data like, "sku_color" {Red |
> Green | Blue}, "sku_size" {Small | Medium | Large}, "sku_price" {10.00 |
> 10.00 | 7.99}
>
> I hit the roadblock when I tried to answer the question, "Which products
> are available that contain skus with color Green, size M, and a price of
> $9.99 or less?"...and have now begun the switch to "SKU" level indexing.
>  This also gives me what I need for faceted browsing/navigation, and
> search refinement...leading the user to "Product" entities having
> purchasable "SKU" entities.  But this also means I now have documents
> which are mostly duplicates for each "Product," and all, facet counts,
> paging and sorting is then inaccurate;  so it appears I need do this
> myself, with multiple Solr requests.
>
> Is this really the best approach; and if so, should I use the Solr
> Deduplication update processor when indexing and querying?
>
> Thanks in advance,
> Kelly
> --
> View this message in context:
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27118977.html
> Sent from the Solr - User mailing list archive at Nabble.com.





Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-11 Thread Markus Jelsma
Hello Kelly,


Simple boolean algebra: you tell Solr you want color = green AND size = M,
so it will only return green t-shirts in size M. If you, however, turn the
AND into an OR, it will return all t-shirts that are green OR in size M, so
you can then get M-sized shirts in the blue color or green shirts in size
XXL.

I suggest you just give it a try and perhaps come back later to find
some improvements for your query. It would also be a good idea - if i may
say so - to read the links provided in the earlier message.

Hope you will find what you're looking for :)


Cheers,

Kelly Taylor said:
>
> Hi Markus,
>
> Thanks for your reply.
>
> Using the current schema and query like you suggest, how can I identify
> the unique combination of options and price for a given SKU?   I don't
> want the user to arrive at a product which doesn't completely satisfy
> their search request.  For example, with the "color:Green", "size:M",
> and "price:[0 to 9.99]" search refinements applied,  no products should
> be displayed which only have "size:M" in "color:Blue"
>
> The actual data in the database for a product to display on the frontend
> could be as follows:
>
> product id = 1
> product name = T-shirt
>
> related skus...
> -- sku id = 7 [color=green, size=S, price=10.99]
> -- sku id = 9 [color=green, size=L, price=10.99]
> -- sku id = 10 [color=blue, size=S, price=9.99]
> -- sku id = 11 [color=blue, size=M, price=10.99]
> -- sku id = 12 [color=blue, size=L, price=10.99]
>
> Regards,
> Kelly
>
>
> Markus Jelsma - Buyways B.V. wrote:
>>
>> Hello Kelly,
>>
>>
>> I am not entirely sure if i understand your problem correctly. But i
>> believe your first approach is the right one.
>>
>> Your question: "Which products are available that contain skus with
>> color Green, size M, and a price of $9.99 or less?" can be easily
>> answered using a schema like yours.
>>
>> id = 1
>> color = [green, blue]
>> size = [M, S]
>> price = 6
>>
>> id = 2
>> color = [red, blue]
>> size = [L, S]
>> price = 12
>>
>> id = 3
>> color = [green, red, blue]
>> size = [L, S, M]
>> price = 5
>>
>> Using the data above you can answer your question using a basic Solr
>> query [1] like the following: q=color:green AND price:[0 TO 9,99] AND
>> size:M
>>
>> Of course, you would make this a function query [2] but this, if i
>> understood your question well enough, answers it.
>>
>> [1] http://wiki.apache.org/solr/SolrQuerySyntax
>> [2] http://wiki.apache.org/solr/FunctionQuery
>>
>>
>> Cheers,
>>
>>
>
> --
> View this message in context:
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27120031.html
> Sent from the Solr - User mailing list archive at Nabble.com.





Re: Encountering a roadblock with my Solr schema design...use dedupe?

2010-01-12 Thread Markus Jelsma
Hello,


I now believe that i really did misunderstand the problem and, unfortunately,
i don't believe i can be of much assistance, as i have never had to implement
a solution to a similar problem.


Cheers,

-  
Markus Jelsma  Buyways B.V.
Technisch ArchitectFriesestraatweg 215c
http://www.buyways.nl  9743 AD Groningen   


Alg. 050-853 6600  KvK  01074105
Tel. 050-853 6620  Fax. 050-3118124
Mob. 06-5025 8350  In: http://www.linkedin.com/in/markus17


On Mon, 2010-01-11 at 16:56 -0800, Kelly Taylor wrote:

> Hi Markus,
> 
> Thanks again. I wish this were simple boolean algebra. This is something I
> have already tried. So either I am missing the boat completely, or have
> failed to communicate it clearly. I didn't want to confuse the issue further
> but maybe the following excerpts will help...
> 
> Excerpt from  "Solr 1.4 Enterprise Search Server" by David Smiley & Eric
> Pugh...
> 
> "...the criteria for this hypothetical search involves multi-valued fields,
> where the index of one matching criteria needs to correspond to the same
> value in another multi-valued field in the same index. You can't do that..."
> 
> And this excerpt is from "Solr and RDBMS: The basics of designing your
> application for the best of both" by by Amit Nithianandan...
> 
> "...If I wanted to allow my users to search for wiper blades available in a
> store nearby, I might create an index with multiple documents or records for
> the same exact wiper blade, each document having different location data
> (lat/long, address, etc.) to represent an individual store. Solr has a
> de-duplication component to help show unique documents in case that
> particular wiper blade is available in multiple stores near me..."
> 
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Solr-and-RDBMS-design-basics
> 
> Remember, with my original schema definition I have multi-valued fields, and
> when the "product" document is built, these fields do contain an array of
> values retrieved from each of the related skus. Skus are children of my
> products.
> 
> Using your example data, which t-shirt sku is available for purchase as a
> child of t-shirt product with id 3? Is it really the green, M, or have we
> found a product document related to both a green t-shirt and a Medium
> t-shirt of some other color, which will thereby leave the user with nothing
> to purchase?
> 
> sku = 9 [color=green, size=L, price=10.99], product id = 3
> sku = 10 [color=blue, size=S, price=9.99], product id = 3
> sku = 11 [color=blue, size=M, price=10.99], product id = 3
> 
> >> id = 1
> >> color = [green, blue]
> >> size = [M, S]
> >> price = 6
> >>
> >> id = 2
> >> color = [red, blue]
> >> size = [L, S]
> >> price = 12
> >>
> >> id = 3
> >> color = [green, red, blue]
> >> size = [L, S, M]
> >> price = 5
> 
> If this is still unclear, I'll post a new question based on findings from
> this conversation. Thanks for all of your help.
> 
> -Kelly
> 
> 
> Markus Jelsma - Buyways B.V. wrote:
> > 
> > Hello Kelly,
> > 
> > 
> > Simple boolean algebra, you tell Solr you want color = green AND size = M
> > so it will only return green t-shirts in size M. If you, however, turn the
> > AND in a OR it will return all t-shirts that are green OR in size M, thus
> > you can then get M sized shirts in the blue color or green shirts in size
> > XXL.
> > 
> > I suggest you'd just give it a try and perhaps come back later to find
> > some improvements for your query. It would also be a good idea - if i may
> > say so - to read the links provided in the earlier message.
> > 
> > Hope you will find what you're looking for :)
> > 
> > 
> > Cheers,
> > 
> > Kelly Taylor zei:
> >>
> >> Hi Markus,
> >>
> >> Thanks for your reply.
> >>
> >> Using the current schema and query like you suggest, how can I identify
> >> the unique combination of options and price for a given SKU?   I don't
> >> want the user to arrive at a product which doesn't completely satisfy
> >> their search request.  For example, with the "color:Green", "size:M",
> >> and "price:[0 to 9.99]" search refinements applied,  no products should
> >> be displayed which only have "size:M" in "color:Blue"
> >>
> >> The actual data in the database for a product to display on the frontend
> >> could be as follows:
> >>

LucidGaze, No Data

2010-01-20 Thread Markus Jelsma
Hello all,


I have installed and reconfigured everything according to the readme supplied 
with the recent LucidGaze release. Files have been written in the gaze 
directory in SOLR_HOME but the *.log.x.y files are all empty! The rrd directory 
does contain something that is about 24MiB.

In the end, i see no errors in Tomcat's logs but also no results in the web 
application; all the handlers' charts tell me "No Data".

Anyone with a clue on this?


Cheers,


Re: LucidGaze, No Data

2010-01-25 Thread Markus Jelsma
Hi,


Is the list without a clue, or should i mail Lucid directly?


Cheers,


>I have installed and reconfigured everything according to the readme
> supplied with the recent LucidGaze release. Files have been written in the
> gaze directory in SOLR_HOME but the *.log.x.y files are all empty! The rrd
> directory does contain something that is about 24MiB.
>
>In the end, i see no errors in Tomcat's logs but also no results in the web
>application, all the handler's charts tell me "No Data".
>
>Anyone with a clue on this?



Re: solr application for website crawling and indexing html, pdf, word, ... files

2010-01-25 Thread Markus Jelsma
Hello Frank,

Answers are inline:

Frank van Lingen said:
> I recently started working with solr and find it easy to set up and
> tinker with.
>
> I now want to scale up my setup and was wondering if there is an
> application/component that can do the following (I was not able to find
> documentation on this on the solr site):
>
> -Can I send solr an xml document with a url (html, pdf, word, ppt,
> etc..) and solr indexes it after analyzing (can it analyze pdf and other
> documents?). Solr would use some generic basic fields like
> header and content when analyzing the files.

Yes you can! Solr has an integration with Tika [1], yet another Apache
Lucene project. It can index many different formats. Please see the Solr
Cell wiki for more information [2].
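
For example, a single PDF can be posted straight to the extracting handler,
assuming the default /update/extract mapping from the example solrconfig:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@document.pdf"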
>
> -Can I send solr a site url and it indexes the whole site?

No you can't. But there is yet another fine Apache Lucene project called
Nutch [3]. It offers a very convenient API and is very flexible. Since
version 1.0 Nutch can integrate more easily with a standby Solr index, and
together with Tika you can index almost anything you want with the
greatest ease.

You can find more information on running Nutch with Solr on the wiki [4];
also, our friends at Lucid Imagination have written a very decent article on
this subject [5]. You will find what you're looking for.
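
With Nutch 1.0, pushing a finished crawl into Solr is then roughly a single
command (the paths depend on your crawl directory layout):

  bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*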

Cheers


>
> If the answer to the above is yes; are there some examples? If the
> answer is no; Is there a simple (basic) extractor for html, pdf, word,
> etc.. files that would translates this in a basic xml document (e.g.
> with field names, url, header and content) that solr can ingest, or
> preferably an application that does this for a whole site?
>
> The idea is to configure solr for generic indexing and search of a
> website.
>
> Frank.

[1]: http://lucene.apache.org/tika/index.html
[2]: http://wiki.apache.org/solr/ExtractingRequestHandler
[3]: http://lucene.apache.org/nutch/
[4]: http://wiki.apache.org/nutch/RunningNutchAndSolr
[5]: http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/




Re: To store or not to store serialized objects in solr

2010-01-26 Thread Markus Jelsma
Hello Andre,


We have used this approach before. We kept all our data in an RDBMS but
added serialized objects to the index so we could simply query for a record
and display it as is, without any hassle or SQL connections.

Although storing this data sounds a bit strange, it actually works well
and keeps things a bit simpler.

The performance of querying the index is the same (or differs by an extremely
tiny amount). However, it does take some additional disk space and transfer
time for the data to reach your application. On the other hand, performance
would surely be worse if you had to transfer the same data (although in a
less verbose format than XML) and connect to and query a SQL server.
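
In schema terms the serialized object just becomes a stored-only field; a
sketch (the field name is made up, and a binary serialization would have to
be base64-encoded to fit in a string field):

  <field name="payload" type="string" indexed="false" stored="true"/>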


Cheers,

Andre Parodi said:
> Hi,
>
> We currently are storing all of our data in sql database and use solr
> for indexing. We get a list of id's from solr and retrieve the data from
>  the db.
>
> We are considering storing all the data in solr to simplify
> administration and remove any synchronisation and are considering the
> following:
>
> 1. storing the data in individual fields in solr (indexed=true,
> store=true) 2. storing the data in a serialized form in a binary field
> in solr  (using google proto buffers or similar) and keep the rest of
> the solr  fields as indexed=true, stored=*false*.
> 3. keep as is. data stored in db and just keep solr fields as
> indexed=true, stored=false
>
> Can anyone provide some advice in terms of performance of the different
> approaches. Are there any obvious pitfalls to option 1 and 2 that i need
>  to be mindful of?
>
> I am thinking option 2 would be the fastest as it would be reading the
> data in one contiguous block. Will be doing some preformance test to
> verify this soon.
>
> FYI we are looking at 5-10M records, a serialised object is 500 to 1000
> bytes and we index approx 20 fields.
>
> Thanks for any advice.
> andre





RE: update doc success, but could not find the new value

2010-01-27 Thread Markus Jelsma
Check out Jetty's output or Tomcat's logs. The logging is very verbose and
you can get a clearer picture.


Jennifer Luo said:
> I am using example, only with two fields, id and body. Id is string
> field, body is text field.
>
> I use another program to do an http post to update the document, url is
> http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10
> and the data is
> <add>
>   <doc>
>     <field name="id">id1</field>
>     <field name="body">test body</field>
>   </doc>
> </add>
>
> I get the responseHeader back, the status is 0.
>
> Then I go to admin page, do search, query is body:test.  The result
> numFound = 0.
>
> I think the reason should be the index is not updated with the updated
> document.
>
> What should I do? What is missing?
> Jennifer Luo
>
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Wednesday, January 27, 2010 1:39 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: update doc success, but could not find the new value
>>
>> Ummm, you have to provide a *lot* more detail before anyone can help.
>>
>> Have you used Luke or the admin page to examine your index and
> determine
>> that the update did, indeed, work?
>>
>> Have you tried firing your query with debugQuery=on to see if the
> fields
>> searched are the ones you expect?
>>
>> etc.
>>
>> Erick
>>
>> On Wed, Jan 27, 2010 at 11:54 AM, Jennifer Luo
>> wrote:
>>
>> > I am using
>> > http://localhost:8983/solr/update?commit=true&overwrite=true&commitWithin=10
>> > to update a document. The responseHeader's status is 0.
>> >
>> > But when I search the new value, it couldn't be found.
>> >





Re: Solr and location based searches

2010-02-02 Thread Markus Jelsma
Hi,


You can use three different approaches:
- Solr Spatial [1];
- Local Solr [2];
- Implement it yourself [3].

The first is promising, the last is fun but far less useful and powerful!


[1]: http://wiki.apache.org/solr/SpatialSearch
[2]: http://wiki.apache.org/solr/LocalSolr
[3]: http://williams.best.vwh.net/avform.htm#Dist
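
If you take the do-it-yourself route, the great-circle distance from [3]
comes down to the haversine formula; a minimal sketch in Java:

  // Great-circle distance between two lat/lon points in kilometers (haversine).
  static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
      double dLat = Math.toRadians(lat2 - lat1);
      double dLon = Math.toRadians(lon2 - lon1);
      double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
               + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
               * Math.sin(dLon / 2) * Math.sin(dLon / 2);
      // 6371 km = mean earth radius
      return 6371.0 * 2.0 * Math.atan2(Math.sqrt(a), Math.sqrt(1.0 - a));
  }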

Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: spellcheck

2010-02-11 Thread Markus Jelsma
Hi,


Did you add spellcheck.extendedResults=true to your query? This will, among 
other things, tell you whether Solr thinks the term has been spelled correctly 
or not. However, if you have specified spellcheck.onlyMorePopular=true, you 
may get suggestions even if it has been spelled correctly.

Don't let the onlyMorePopular directive fool you, it has caught many users off 
guard before :)
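
For example, a query like this (host and handler made up):

  http://localhost:8983/solr/select?q=popular&spellcheck=true&spellcheck.extendedResults=true

carries the flag to check in the spellcheck section of the response:

  <bool name="correctlySpelled">true</bool>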


Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: spellcheck

2010-02-11 Thread Markus Jelsma
Hi,


Check my earlier reply. You have explicitly set onlyMorePopular to true, thus 
you will most likely always get suggestions even if the term was spelled 
correctly. You'll only get no suggestions if the term is spelled correctly and 
it is the most `popular` term.

You can opt for keeping onlyMorePopular set to true but it is then wise to 
enable extendedResults so you can check the correctlySpelled boolean.


cheers,


>here simple query:
>http://estyledesign:8983/request/select?q=popular&spellcheck=true&qt=keyrequest&spellcheck.extendedResults=true
>result:
>populars! but popular is correct word! Maybe i must change some properties
>in solrconfig! Here my configs for keyrequest:
>
>
>dismax
>  true
>  false
>  true
>  external
>
>
>query
>spellcheck
>mlt
>
>
>and search component:
>
>textSpell
>
>  name="classname">org.apache.solr.spelling.FileBasedSpellChecker
>  external
>  spellings.txt
>  UTF-8
>  spellcheckerfile
>
>   
>--
>View this message in context:
> http://old.nabble.com/spellcheck-tp27527425p27548078.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: spellcheck

2010-02-11 Thread Markus Jelsma
Hi,


I see you use an `external` dictionary. I've no idea what that is or how it 
works, but it looks like the dictionary believes `populars!` is a term, which 
obviously is not equal to `popular`. If this is an external index under your 
manual control, how about adding `popular` to the dictionary? And why is 
`populars!` in a spellcheck dictionary at all; it sounds like a weird term for 
a dictionary ;)


I know this is not a schema change but perhaps the following might help:
a) remove the old index
b) restart your application server
c) reindex your data
d) rebuild your spellcheck index






>I change config, but i get the same result!
>
>
>
>dismax
>  false
>  false
>  true
>  external
>
>
>query
>    spellcheck
>mlt
>
>

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Spell check returns strange suggestion

2010-02-22 Thread Markus Jelsma
darniz said:
>
> Hello All
> Please reply to this ASAP
> I am using the IndexBasedSpellChecker; right now i copy only model and make
> names and some other fields to my spellcheck field.
> Hence my spell check field consists of only 120 words.
>
> The issue is if i type hond i get back honda which is fine. But when i
> type term like true i get back suggestion like ram.

I'm not quite sure what you're telling us now, but you are using
onlyMorePopular=true, which will almost always return suggestions except when
the specified term is actually the most popular.

Another good practice: only show the user suggestions if the
correctlySpelled flag is really false. Many users seem to rely merely on
whether a collation is available.

Try turning onlyMorePopular off or rely on the correctlySpelled flag you
have.


>
> I read there are some configuration to make for distance measure.
> Right now This is my spell check configuration
> 
> default
> searchSpellText
> true
> true
>  name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance
> ./edmundsSpellcheckerDataIndex
> 
>
> and here is my query
> q=true&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.field=edmundsSearchSpellText&spellcheck.collate=true&spellcheck.extendedResults=true&spellcheck.onlyMorePopular=true
>
> thanks
> darniz
> --
> View this message in context:
> http://old.nabble.com/Spell-check-returns-strange-suggestion-tp27693520p27693520.html
> Sent from the Solr - User mailing list archive at Nabble.com.





Re: spellcheck all time

2010-02-23 Thread Markus Jelsma
Although the wiki states it correctly (it will also return suggestions even
if the term is properly spelled), perhaps we should add that it's better
practice to only present end users with suggestions if the correctlySpelled
flag is false.

This issue keeps coming back.

Chris Hostetter said:
> : I have a little problem with spellcheck! I get suggestions all time
> even the : word is correct! I use dictionary from file! Here my
> configuration:
>
> :false
>
> http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular
>
> -Hoss





Re: logging

2010-02-23 Thread Markus Jelsma
Hi Peter,


It depends on what you call a debug log and how you interface with Solr.
Anyway, if you use Solr over HTTP you can check the logs of your
servlet container and configure the logging behaviour on the Solr web
admin page. Usually, the default logging is quite useful. Either way, see
the Tomcat [1] or Jetty [2] wiki pages for some information on the
commonly used servlet containers. Also see the specific page on logging [3].

[1]: http://wiki.apache.org/solr/SolrTomcat
[2]: http://wiki.apache.org/solr/SolrJetty
[3]: http://wiki.apache.org/solr/SolrLogging
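
If you want the debug output in a file specifically: the stock Solr war logs
through SLF4J to JDK logging, so you can hand the JVM a logging.properties
(a sketch; the log path is made up) via -Djava.util.logging.config.file:

  handlers = java.util.logging.FileHandler
  .level = FINE
  java.util.logging.FileHandler.pattern = /var/log/solr/solr.%u.log
  java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter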


Cheers,


Peter A. Kirk said:
> Hi
>
> in the Solr example, how do I configure debug logging to a file?
>
> Thanks,
> Peter





Re: If you could have one feature in Solr...

2010-02-24 Thread Markus Jelsma
- performing multiple queries at once, perhaps abusing HTTP POST. In one of our 
applications there is a page that executes five different queries. The HTTP 
overhead is not that much of a problem but it would be a nice-to-have.

- retrieving documents per facet, not unlike the results from the MoreLikeThis 
component; see http://mail-archives.apache.org/mod_mbox//lucene-solr-user/200905.mbox/%3c200905151342.56927.jeffrey.gel...@buyways.nl%3e 
for a use case we once had, suggested by a colleague.

- stemmers for many more different languages

And of course (hopefully to be included in Solr 1.5):
- field collapsing
- Solr Spatial



On Wednesday 24 February 2010 14:42:18 Grant Ingersoll wrote:
> What would it be?
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: If you could have one feature in Solr...

2010-02-24 Thread Markus Jelsma
Well, i don't have a specific request in mind. However, i can imagine a growing 
internet market for Thai, Chinese and Arabic speaking people and for the native 
languages of the African continent. Providing them with stemmers to handle 
plurals etc. will allow for a better search experience.

Also, other components might need an overhaul; see SOLR-1078 for an example of 
a language-specific issue with a filter.

Perhaps i should rephrase

"stemmers for many more different languages"

to

"support (i.e. stemmers, tokenizers etc.) for many more different languages".



On Wednesday 24 February 2010 15:25:46 Robert Muir wrote:
> On Wed, Feb 24, 2010 at 9:22 AM, Markus Jelsma  wrote:
> 
> - stemmers for many more different languages
> 
> 
> 
> I don't want to hijack this thread, but i would like to know which
>  languages you are interested in!
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


