Re: Extract info from parent node during data import

2009-09-11 Thread Fergus McMenemie
>On Fri, Sep 11, 2009 at 6:48 AM, venn hardy  wrote:
>>
>> Hi Fergus,
>>
>> When I debugged in the development console 
>> http://localhost:9080/solr/admin/dataimport.jsp?handler=/dataimport
>>
>> I had no problems. Each category/item seems to be only indexed once, and no 
>> parent fields are available (except the category name).
>>
>> I am not entirely sure how the forEach statement works, but my 
>> interpretation of forEach="/document/category/item | /document/category" is 
>> something like this:
>>
>> 1. Whenever DIH encounters a document/category it will extract the 
>> /document/category/
>>
>> name field as a common field
>> 2. Whenever DIH encounters a document/category/item it will extract all of 
>> the item fields.
>> 3. When all fields have been encountered, save the document in solr and go 
>> to the next category/item
>
>/document/category/item | /document/category
>
>means there are two paths which trigger a new doc (it is possible to
>have more). Whenever it encounters the closing tag of that xpath, it
>emits all the fields it has collected since the opening of the same tag.
>After that it clears all the fields it collected since the opening of
>the tag.

>If there are fields it collected before the opening of that tag, it retains them.


Nice and clear, but that is not what I see.

With my test case with forEach="/record | /record/mediaBlock"
I see that each /record/mediaBlock "document" indexed contains all fields
from the parent "/record" document as well. A search over mediaBlocks returns
lots of extra fields from the parent which did not have the commonField
attribute. I will try and produce a test case.
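
(For reference, a minimal data-config.xml sketch of the two-xpath forEach
pattern under discussion; the entity name, URL, and field columns are
illustrative, not taken from the thread:)

<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <entity name="item" processor="XPathEntityProcessor"
            url="http://example.com/feed.xml"
            forEach="/document/category/item | /document/category">
      <!-- collected once per category; commonField carries it to following items -->
      <field column="categoryname" xpath="/document/category/name" commonField="true"/>
      <!-- collected per item -->
      <field column="id" xpath="/document/category/item/id"/>
      <field column="author" xpath="/document/category/item/author"/>
    </entity>
  </document>
</dataConfig>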


>>
>>
>>> Date: Thu, 10 Sep 2009 14:19:31 +0100
>>> To: solr-user@lucene.apache.org
>>> From: fer...@twig.me.uk
>>> Subject: RE: Extract info from parent node during data import
>>>
>>> >Hi Paul,
>>> >The forEach="/document/category/item | /document/category/name" didn't 
>>> >work (no categoryname was stored or indexed).
>>> >However forEach="/document/category/item | /document/category" seems to 
>>> >work well. I am not sure why category on its own works, but not 
>>> >category/name...
>>> >But thanks for tip. It wasn't as painful as I thought it would be.
>>> >Venn
>>>
>>> Hmmm, I had bother with this. Although each occurrence of
>>> /document/category/item
>>> causes a new solr document to be indexed, that document contained all the
>>> fields from
>>> the parent element as well.
>>>
>>> Did you see this?
>>>
>>> >
>>> >> From: noble.p...@corp.aol.com
>>> >> Date: Thu, 10 Sep 2009 09:58:21 +0530
>>> >> Subject: Re: Extract info from parent node during data import
>>> >> To: solr-user@lucene.apache.org
>>> >>
>>> >> try this
>>> >>
>>> >> add two xpaths in your forEach
>>> >>
>>> >> forEach="/document/category/item | /document/category/name"
>>> >>
>>> >> and add a field as follows
>>> >>
>>> >> <field column="categoryname" xpath="/document/category/name"
>>> >> commonField="true"/>
>>> >>
>>> >> Please try it out and let me know.
>>> >>
>>> >> On Thu, Sep 10, 2009 at 7:30 AM, venn hardy  
>>> >> wrote:
>>> >> >
>>> >> > Hello,
>>> >> >
>>> >> >
>>> >> >
>>> >> > I am using SOLR 1.4 (from nightly build) and its URLDataSource in 
>>> >> > conjunction with the XPathEntityProcessor. I have successfully 
>>> >> > imported XML content, but I think I may have found a limitation when 
>>> >> > it comes to the commonField attribute in the DataImportHandler.
>>> >> >
>>> >> >
>>> >> >
>>> >> > Before writing my own parser to read in a whole XML document, I 
>>> >> > thought I'd post the question here (since I got some great advice last 
>>> >> > time).
>>> >> >
>>> >> >
>>> >> >
>>> >> > The bulk of my content is contained within each <item> tag. However,
>>> >> > each item has a parent called <category> and each category has a name
>>> >> > which I would like to import. In my forEach loop I specify
>>> >> > /document/category/item as the collection of items I am interested in.
>>> >> > Is there any way to extract an element from underneath a parent node?
>>> >> > To be a bit more specific (see example XML below), I would like to index
>>> >> > the following:
>>> >> >
>>> >> > - category: Category 1; id: 1; author: Author 1
>>> >> >
>>> >> > - category: Category 1; id: 2; author: Author 2
>>> >> >
>>> >> > - category: Category 2; id: 3; author: Author 3
>>> >> >
>>> >> > - category: Category 2; id: 4; author: Author 4
>>> >> >
>>> >> >
>>> >> >
>>> >> > Any ideas on how I can get to a parent node from within a child during
>>> >> > data import? If it can't be done, what do you suggest would be the best
>>> >> > way so I can keep using the DataImportHandler... would XSLT be a good
>>> >> > idea to 'flatten out' the structure a bit?
>>> >> >
>>> >> >
>>> >> >
>>> >> > Thanks
>>> >> >
>>> >> >
>>> >> >
>>> >> > This is what my XML document looks like:
>>> >> >
>>> >> > <document>
>>> >> >   <category>
>>> >> >     <name>Category 1</name>
>>> >> >     <item>
>>> >> >       <id>1</id>
>>> >> >       <author>Author 1</author>
>>> >> >     </item>
>>> >> >     <item>
>>> >> >       <id>2</id>
>>> >> >       <author>Author 2</author>
>>> >> >     </item>
>>> >> >   </category>
>>> >> >   <category>
>>> >> >     <name>Category 2</name>
>>> >> >     <item>
>>> >> >       <id>3</id>
>

Re: Re[2]: Does MoreLikeThis support sharding?

2009-09-11 Thread Shalin Shekhar Mangar
On Sat, Sep 12, 2009 at 11:03 AM,  wrote:

> >> So my question is, Is MoreLikeThis with StandardRequestHandler
> >> supported on shards? If not, is MoreLikeThisHandler supported?
>
> > No, MoreLikeThis does not work with distributed search currently.
> > There is an issue open with a couple of patches though.
> > See https://issues.apache.org/jira/browse/SOLR-788
>
> Ah, I see. Thanks. Is it a part of the 1.4 plan to add this support?
>
>
Unfortunately, no. I guess we managed to miss this issue because the fix
version was not set for 1.4. Now the window of opportunity has gone and
we're finishing up pending issues for the 1.4 release. But if you do try
the patches, please report your experience. This will be released with 1.5.

-- 
Regards,
Shalin Shekhar Mangar.


Re[2]: Does MoreLikeThis support sharding?

2009-09-11 Thread jlist9
>> So my question is, Is MoreLikeThis with StandardRequestHandler
>> supported on shards? If not, is MoreLikeThisHandler supported?

> No, MoreLikeThis does not work with distributed search currently.
> There is an issue open with a couple of patches though.
> See https://issues.apache.org/jira/browse/SOLR-788

Ah, I see. Thanks. Is it a part of the 1.4 plan to add this support?



Re: Does MoreLikeThis support sharding?

2009-09-11 Thread Shalin Shekhar Mangar
On Thu, Sep 10, 2009 at 1:02 PM,  wrote:

>
> I tried MoreLikeThis (StandardRequestHandler with mlt arguments)
> with a single solr server and it works fine. However, when I tried
> the same query with sharded servers, I don't get the moreLikeThis
> key in the results.
>
> So my question is, Is MoreLikeThis with StandardRequestHandler
> supported on shards? If not, is MoreLikeThisHandler supported?
>

No, MoreLikeThis does not work with distributed search currently. There is
an issue open with a couple of patches though.

See https://issues.apache.org/jira/browse/SOLR-788

-- 
Regards,
Shalin Shekhar Mangar.


Re: Concatenate in Copy Field

2009-09-11 Thread Shalin Shekhar Mangar
On Sat, Sep 12, 2009 at 3:25 AM, Mohamed Parvez  wrote:

> Is it possible to concatenate two fields and copy them to a new field, in
> the
> schema.xml file?
>
> I am importing from two tables and both have a numeric value as primary key.
>
> If I copy just the primary key, which is a number, from both the tables, to
> one field and make it the primary key, records may get overwritten.
>
> So I want to create a composite primary key for the Solr schema by
> concatenating two fields.
>

Are you using DataImportHandler? If yes, then you can use
TemplateTransformer to achieve this. If not, then you can either concatenate
the values before sending them to Solr or you can write a custom
UpdateRequestProcessor which can do this.
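
A sketch of the TemplateTransformer approach in data-config.xml (the entity
and column names are assumptions for illustration): each entity prefixes its
numeric key with a table tag so the ids cannot collide:

<entity name="table1" query="select pk from table1"
        transformer="TemplateTransformer">
  <field column="id" template="t1-${table1.pk}"/>
</entity>
<entity name="table2" query="select pk from table2"
        transformer="TemplateTransformer">
  <field column="id" template="t2-${table2.pk}"/>
</entity>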

-- 
Regards,
Shalin Shekhar Mangar.


Re: Facet Response Structure

2009-09-11 Thread Shalin Shekhar Mangar
On Sat, Sep 12, 2009 at 1:20 AM, smock  wrote:

>
> I'd like to propose a change to the facet response structure.  Currently,
> it
> looks like:
>
> {'facet_fields':{'field1':[('value1',count1),('value2',count2),(null,missingCount)]}}
>
> My immediate problem with this structure is that null is not of the same
> type as the 'value's.  Also, the meaning of the (null,missingCount) tuple
> is
> not the same as the meaning of the ('value',count) tuples, it is a special
> case to represent the documents for which the field has no value.  I'd like
> to propose changing the response to:
>
> {'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount}}}
>
>
Well, there are two problems:
1. 'missing' can be a value in the field
2. Facet support has been there for a long time. This would break
compatibility with existing clients.


>
> In addition to cleaning up the 'null' issue mentioned above, I think this
> will allow for greater flexibility moving forward with the facet component.
> For instance, it would be great if the FacetComponent could add an optional
> count of the 'hits', or number of distinct facet values contained in the
> query result.  If the facet request has a limit on it, this number is not
> available via a count of the returned facet values.  The response structure
> I've outlined above could accommodate this piece of metadata very easily:
>
> {'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount,'hits':hitsCount}}}
>
>
Have you looked at StatsComponent? It gives counts for total distinct values
and the count of documents missing a value, among other things:

http://wiki.apache.org/solr/StatsComponent
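
For example, a request along these lines (the field name is assumed, and
stats generally wants a numeric or date field):

http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=field1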

-- 
Regards,
Shalin Shekhar Mangar.


Re: Default Query Type For Facet Queries

2009-09-11 Thread Shalin Shekhar Mangar
On Sat, Sep 12, 2009 at 12:18 AM, Stephen Duncan Jr <
stephen.dun...@gmail.com> wrote:

> >
> My experience (which is on a trunk build from a few weeks back of Solr
> 2.4),
> is that changing the default parser for the handler does NOT change it for
> facet.query.  I had expected it would, but was disappointed.
>
>
You are right, SimpleFacets#getFacetQueryCounts has the following comment:

/* Ignore SolrParams.DF - could have init param facet.query assuming
 * the schema default with query param DF intended to only affect Q.
 * If user doesn't want schema default for facet.query, they should be
 * explicit.
 */

I'm not sure if this should be changed. Hoss, what do you think?
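
(For what it's worth, being explicit per facet.query is possible with local
params, assuming a build where facet.query strings go through QParser; the
qf value and query text below are illustrative:

facet.query={!dismax qf=title}ipod
)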

-- 
Regards,
Shalin Shekhar Mangar.


Re: Highlighting in SolrJ?

2009-09-11 Thread Shalin Shekhar Mangar
Jay, it would be great if you can add this example to the Solrj wiki:

http://wiki.apache.org/solr/Solrj

On Fri, Sep 11, 2009 at 5:15 AM, Jay Hill  wrote:

> Set up the query like this to highlight a field named "content":
>
>SolrQuery query = new SolrQuery();
>query.setQuery("foo");
>
>query.setHighlight(true).setHighlightSnippets(1); //set other params as
> needed
>query.setParam("hl.fl", "content");
>
>    QueryResponse queryResponse = getSolrServer().query(query);
>
> Then to get back the highlight results you need something like this:
>
>    Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
>
>    while (iter.hasNext()) {
>      SolrDocument resultDoc = iter.next();
>
>      String content = (String) resultDoc.getFieldValue("content");
>      String id = (String) resultDoc.getFieldValue("id"); //id is the
> uniqueKey field
>
>      if (queryResponse.getHighlighting().get(id) != null) {
>        List<String> highlightSnippets =
> queryResponse.getHighlighting().get(id).get("content");
>      }
>    }
>
> Hope that gets you what you need.
>
> -Jay
> http://www.lucidimagination.com
>
> On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin  wrote:
>
> > Can somebody point me to some sample code for using highlighting in
> > SolrJ?  I understand the highlighted versions of the field comes in a
> > separate NamedList?  How does that work?
> >
> > --
> > http://www.linkedin.com/in/paultomblin
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Single Core or Multiple Core?

2009-09-11 Thread Shalin Shekhar Mangar
On Sat, Sep 12, 2009 at 9:43 AM, Chris Hostetter
wrote:

>
> : > For the record: even if you're only going to have one SolrCore, using
> the
> : > multicore support (ie: having a solr.xml file) might prove handy from a
> : > maintenance standpoint ... the ability to configure new "on deck cores"
> with
> ...
> : Yeah, it is a shame that single-core deployments (no solr.xml) do not
> have
> : a way to enable CoreAdminHandler. This is something we should definitely
> : look at in Solr 1.5.
>
> I think the most straightforward starting point is to switch how we
> structure the examples so that all of the examples use a solr.xml with
> multicore support.
>
> Then we can move forward on deprecating the specification of "Solr Home"
> using JNDI/systemvars and switch to having the location of the solr.xml be
> the one master config option with everything else coming after that.
>
>
+1

-- 
Regards,
Shalin Shekhar Mangar.


Re: Single Core or Multiple Core?

2009-09-11 Thread Chris Hostetter

: > For the record: even if you're only going to have one SolrCore, using the
: > multicore support (ie: having a solr.xml file) might prove handy from a
: > maintenance standpoint ... the ability to configure new "on deck cores" with
...
: Yeah, it is a shame that single-core deployments (no solr.xml) do not have
: a way to enable CoreAdminHandler. This is something we should definitely
: look at in Solr 1.5.

I think the most straightforward starting point is to switch how we
structure the examples so that all of the examples use a solr.xml with
multicore support.

Then we can move forward on deprecating the specification of "Solr Home" 
using JNDI/systemvars and switch to having the location of the solr.xml be 
the one master config option with everything else coming after that.
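
A minimal solr.xml of the kind being discussed might look like this (a
sketch; the core name and instanceDir are placeholders):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
  </cores>
</solr>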



-Hoss



Re: What Tokenizerfactory/TokenFilterFactory can/should I use so a search for "wal mart" matches "walmart"(quotes not included in search or index)?

2009-09-11 Thread Shalin Shekhar Mangar
On Fri, Sep 11, 2009 at 11:23 AM, Christian Zambrano wrote:

> There are a lot of company names that people are uncertain as to the
> correct spelling. A few of examples are:
> 1. best buy, bestbuy
> 2. walmart, wal mart, wal-mart
> 3. Holiday Inn, HolidayInn
>
> What Tokenizer Factory and/or TokenFilterFactory should I use so that
> somebody typing "wal mart"(quotes not included) will find "wal mart" and
> "walmart"(again, quotes not included)
>
>
Look at the intra-word delimiter section in the SolrRelevancyCookbook.
WordDelimiterFilterFactory can help here.

http://wiki.apache.org/solr/SolrRelevancyCookbook#head-353fcfa33e5c4a0a5959aa3d8d33c5a3a61f2683
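
A sketch of a field type using it (the attribute values are illustrative;
catenateWords="1" is what glues the parts of a case- or hyphen-delimited
token like "Wal-Mart" back together into "walmart"):

<fieldType name="text_wd" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>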

If you need to provide spelling suggestions, see the SpellCheckComponent:

http://wiki.apache.org/solr/SpellCheckComponent

-- 
Regards,
Shalin Shekhar Mangar.


Re: Random Display of result in solr

2009-09-11 Thread Shalin Shekhar Mangar
On Fri, Sep 11, 2009 at 12:02 PM, dharhsana wrote:

>
> I am working on blog module,here the user will be creating more blogs,and
> he
> can post on it and have several comments for post.For implementing this
> module i am using solr 1.4.
>
> When i get blog details of particular user, it brings the result in random
> manner for
> (ex:) If i am passing blogid in the query to get my details,the result i
> got
> as ,if i have 2 result from it
>
>
To filter on a certain value for a field, you should pass
field-name:field-value. An example in your case would be userId:1


> This is the first result
>
> SolrDocument1{blogTitle=New Blog, blogId=New Blog, userId=1}]
> SolrDocument2{blogId=New Blog, postId=New Post, postTitle=New Post,
> postMessage=New Post Message, timestamp_post=Fri Sep 11 09:48:24 IST 2009}]
> SolrDocument3{blogTitle=ammu blog, blogId=ammu blog, userId=1}]
>
> The Second result
> SolrDocument1{blogTitle=New Blog, blogId=New Blog, userId=1}]
> SolrDocument2{blogTitle=ammu blog, blogId=ammu blog, userId=1}]
> SolrDocument3{blogId=New Blog, postId=New Post, postTitle=New Post,
> postMessage=New Post Message, timestamp_post=Fri Sep 11 09:48:24 IST 2009}]
>
> I am using solrj, when i am iterating the list i some times get
> ArrayIndexOutOfBoundException,because of my difference in the result.
>
> When i run again my code some other time ,it produces the proper result.so
> the list was changing all time.
>

Both results contain the same number of documents (three), don't they? The
list of documents returned by Solrj is never modified by Solrj, so I don't
see why you would get that exception.


> If anybody faced this type of problem ,please share with me..
>
> And  iam not able to get the specific thing ie if i am going to get blog
> details of particular user, so i will be passing blogtitle for ex: rekha
> blog , it is not giving only the rekha blog it also gives other blog which
> ends with blog (i..e sandhya blog,it brings even that and shows..).
>
>
You need to define what you want to retrieve. Do you want only exact matches
- case sensitive or case insensitive? Do you want full text matches with
exact match on top? Do you want to remove a particular word "blog" from your
index?

Also see http://wiki.apache.org/solr/SolrRelevancyCookbook

-- 
Regards,
Shalin Shekhar Mangar.


Re: shards and facet_count

2009-09-11 Thread Shalin Shekhar Mangar
On Fri, Sep 11, 2009 at 2:35 AM, Paul Rosen wrote:

> Hi again,
>
> I've mostly gotten the multicore working except for one detail.
>
> (I'm using solr 1.3 and solr-ruby 0.0.6 in a rails project.)
>
> I've done a few queries and I appear to be able to get hits from either
> core. (yeah!)
>
> I'm forming my request like this:
>
> req = Solr::Request::Standard.new(
>  :start => start,
>  :rows => max,
>  :sort => sort_param,
>  :query => query,
>  :filter_queries => filter_queries,
>  :field_list => @field_list,
>  :facets => {:fields => @facet_fields, :mincount => 1, :missing => true,
> :limit => -1},
>  :highlighting => {:field_list => ['text'], :fragment_size => 600},
>  :shards => @cores)
>
> If I leave ":shards => @cores" out, then the response includes:
>
> 'facet_counts' => {
>  'facet_dates' => {},
>  'facet_queries' => {},
>  'facet_fields' => { 'myfacet' => [ etc...], etc... }
>
> which is what I expect.
>
> If I add the ":shards => @cores" back in (so that I'm doing the exact
> request above), I get:
>
> 'facet_counts' => {
>  'facet_dates' => {},
>  'facet_queries' => {},
>  'facet_fields' => {}
>
> so I've lost my facet information.
>
> Why would it correctly find my documents, but not report the facet info?
>

I'm not a ruby guy, but the response format in both cases is exactly the
same, so I don't think the problem is in the ruby client's parsing. Can
you check the Solr logs to see if there were any exceptions when you sent
the shards parameter?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Query regarding incremental index replication

2009-09-11 Thread Shalin Shekhar Mangar
On Thu, Sep 10, 2009 at 7:08 AM, Silent Surfer wrote:

> Hi ,
>
> Currently we are using Solr 1.3 and we have the following requirement.
>
> As we need to process very high volumes of documents (of the order of 400
> GB per day), we are planning to separate indexer(s) and searcher(s), so that
> there won't be a performance hit.
>
> Our idea is to have have a set of servers which is used only for indexers
> for index creation and then every 5 mins or so, the index will be copied to
> the searchers(set of solr servers only for querying). For this we tried to
> use the snapshooter,rsysnc etc.
>
> But the problem with this approach is, the same index is present on both
> the indexer and searcher, and hence occupying large FS.
>
>
Set of servers used only for indexers? Solr replication currently supports
only a single master.

If you have a dedicated master then why do you care about index occupying
too much disk space?


> What we need is a mechanism, where in the indexer contains only the index
> for the past 5 mins(last indexing cycle before the snap shooter is run) and
> the searcher should have the accumulated(total) index i.e every 5 mins, we
> should be able to move the entire index from indexer to searcher and so on.
>
> The above scenario is slightly different from master/slave implementation,
> as on master we want only the latest(WIP) index and the slave should contain
> the entire index.
>

If you commit but do not optimize, then rsync will transfer only the new
segment files, which should be possible within 5 minutes. So I'd suggest
optimizing less frequently (once or twice a day).

However, if for some reason you still want to go with your design, there is
a new MergeIndexes feature in Solr 1.4 which can help (assuming that you
have only additions or replacements and no deletes). However, that is not
used by the Solr 1.4 Java replication. You may be able to modify the
snappuller and snapinstaller scripts to use the merge indexes command though.
Something like that can also work with multiple servers creating indexes
(again assuming no deletes are needed).

http://wiki.apache.org/solr/MergingSolrIndexes
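
The merge itself is invoked through the CoreAdmin handler, along these lines
(the core name and index path are placeholders):

http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/path/to/indexer/data/index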

-- 
Regards,
Shalin Shekhar Mangar.


Re: Using EnglishPorterFilterFactory in code

2009-09-11 Thread Shalin Shekhar Mangar
On Fri, Sep 11, 2009 at 6:21 AM, darniz  wrote:

>
> hello
> i have a task where my user is giving me 20 words of english dictionary and
> i have to run a program and generate a report with all stemmed words.
>
> I have to use EnglishPorterFilterFactory and SnowballPorterFilterFactory to
> check which one is faster and gets the best results
>
>
The EnglishPorterFilter is deprecated. It just creates the same stemmer as
SnowballPorterFilter with the language set to English. Since you will end
up creating the same class, there won't be a difference in performance.
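
In other words, these two declarations should behave identically (a sketch):

<filter class="solr.EnglishPorterFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English"/>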

-- 
Regards,
Shalin Shekhar Mangar.


Re: Single Core or Multiple Core?

2009-09-11 Thread Shalin Shekhar Mangar
On Sat, Sep 12, 2009 at 12:12 AM, Chris Hostetter
wrote:

>
> For the record: even if you're only going to have one SolrCore, using the
> multicore support (ie: having a solr.xml file) might prove handy from a
> maintenance standpoint ... the ability to configure new "on deck cores" with
> new configs, populate them with data, and then swap them in place for your
> previous core without any downtime is a really nice feature to take
> advantage of.
>
>
Yeah, it is a shame that single-core deployments (no solr.xml) do not have
a way to enable CoreAdminHandler. This is something we should definitely
look at in Solr 1.5.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Spec Version vs Implementation Version

2009-09-11 Thread Chris Hostetter

: What are the differences between specification version and implementation
: version

those are concepts from the Java specification for jars and wars (more 
info than you could ever possibly want in the URLs below)

: I downloaded the nightly build for September 05 2009 and it has a spec
: version of 1.3 and the implementation version states 1.4-dev

I think you are abbreviating.  the spec version should be something like 
"1.3.0.2009.09.05.12.07.49" and the implementation version should be 
something like "1.4-dev 812759 - hudson - 2009-09-05 12:07:49"

In a nutshell: the spec version identifies the specification, in our case 
the Java APIs.  the Implementation version identifies the implementation 
(in our case: the internals of those methods).  

a spec version must be purely numeric, and has rules about how to 
interpret when one version is newer than another.  For a release, the spec 
version looks like "1.3.0" or "1.3.1" or "1.4.0" but for the nightlys we 
include the date to denote that the API is "newer" than it was in 1.3.0. 

An implementation version can be any string.  for releases it starts out 
the same as the spec version, but then includes other details about that 
particular build (the svn revision, who built it, and when it was built) 
... for dev versions it says all the same things, but the initial version 
info tells you what development branch you're looking at - so 1.4-dev 
means it's working towards 1.4 (as opposed to a hypothetical 1.3.1-dev 
that might exist if someone created a maintenance branch on 1.3 in 
anticipation of a 1.3.1 release)
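
Both strings can also be read at runtime through java.lang.Package (the
class used to locate the jar here is an arbitrary choice):

  Package p = org.apache.solr.core.SolrCore.class.getPackage();
  System.out.println("spec: " + p.getSpecificationVersion());
  System.out.println("impl: " + p.getImplementationVersion());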


http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html#JAR%20Manifest
http://java.sun.com/j2se/1.5.0/docs/guide/versioning/spec/versioning2.html
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Package.html
http://java.sun.com/j2se/1.5.0/docs/api/java/util/jar/package-summary.html
http://java.sun.com/developer/Books/javaprogramming/JAR/basics/manifest.html

: 
: 
: -- 
: "Good Enough" is not good enough.
: To give anything less than your best is to sacrifice the gift.
: Quality First. Measure Twice. Cut Once.
: 



-Hoss



Re: "standard" requestHandler components

2009-09-11 Thread Jay Hill
RequestHandlers are configured in solrconfig.xml. If no components are
explicitly declared in the request handler config then the defaults are used.
They are:
- QueryComponent
- FacetComponent
- MoreLikeThisComponent
- HighlightComponent
- StatsComponent
- DebugComponent

If you wanted to have a custom list of components (either omitting defaults
or adding custom) you can specify the components for a handler directly:

  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>debug</str>
    <str>someothercomponent</str>
  </arr>


You can add components before or after the main ones like this:

  <arr name="first-components">
    <str>mycomponent</str>
  </arr>

  <arr name="last-components">
    <str>myothercomponent</str>
  </arr>


and that's how the spell check component can be added:

  <arr name="last-components">
    <str>spellcheck</str>
  </arr>


Note that a component (except the defaults) must be configured in
solrconfig.xml with the name used in the str element as well.

Have a look at the solrconfig.xml in the example directory
(".../example/solr/conf/") for examples on how to set up the spellcheck
component, and on how the request handlers are configured.

-Jay
http://www.lucidimagination.com


On Fri, Sep 11, 2009 at 3:04 PM, michael8  wrote:

>
> Hi,
>
> I have a newbie question about the 'standard' requestHandler in
> solrconfig.xml.  What I'd like to know is where the config information for
> this requestHandler is kept?  When I go to http://localhost:8983/solr/admin,
> I
> see the following info, but am curious where are the supposedly 'chained'
> components (e.g. QueryComponent, FacetComponent, MoreLikeThisComponent)
> configured for this requestHandler.  I see timing and process debug output
> from these components with "debugQuery=true", so somewhere these components
> must have been configured for this 'standard' requestHandler.
>
> name:standard
> class:  org.apache.solr.handler.component.SearchHandler
> version:$Revision: 686274 $
> description:Search using components:
>
> org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.DebugComponent,
> stats:  handlerStart : 1252703405335
> requests : 3
> errors : 0
> timeouts : 0
> totalTime : 201
> avgTimePerRequest : 67.0
> avgRequestsPerSecond : 0.015179728
>
>
> What I'd like to do with this understanding is to properly integrate the
> spellcheck component into the standard requestHandler as suggested in a
> solr
> spellcheck example.
>
> Thanks for any info in advance.
> Michael
> --
> View this message in context:
> http://www.nabble.com/%22standard%22-requestHandler-components-tp25409075p25409075.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Question regarding Stemmer

2009-09-11 Thread darniz

Hi

I want to get answers to some of my questions.
Going by the Solr wiki there are three approaches to stemming.

Porter or Reduction Algorithm
As far as I know there is "solr.EnglishPorterFilterFactory" and there is
"solr.SnowballPorterFilterFactory". Both use the same stemming algorithm.

Hence I assume the only difference is that Snowball allows you to specify a
language.
Is my assumption correct?

The other thing is that the wiki talks about "solr.PorterStemFilterFactory",
which uses the Porter stemming algorithm, but doesn't have any example
specifying how to declare a field type with that kind of stemmer.
Could anybody give a snippet?


Expansion stemming by using SynonymFilterFactory
No comments.

KStem, which is a less aggressive stemmer
The download link for the jar hasn't worked for the past two days.
Don't know exactly when it will be working.

Thanks
darniz
-- 
View this message in context: 
http://www.nabble.com/Question-regarding-Stemmer-tp25409688p25409688.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Automatically calculate boost factor

2009-09-11 Thread Grant Ingersoll
Sounds like you want to use a FunctionQuery: see
http://wiki.apache.org/solr/FunctionQuery. Either that, or roll it up into
the document boost, but that loses some precision.
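
For example, with the dismax parser you can feed a function of stored field
values into the score (the field names below are placeholders for whatever
fields hold your per-document factors):

  q=foo&defType=dismax&qf=text&bf=product(field_a,field_b,field_c)

Note that bf adds the function's output to the score rather than multiplying
it in, so it only approximates a true document boost.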


On Sep 11, 2009, at 10:05 AM, Villemos, Gert wrote:


I would like to automatically calculate the boost factor of a document
based on the values of other fields. For example:

1.2
1.5
0.8

Document boost = 1.2*1.5*0.8

Is it possible to get SOLr to calculate the boost automatically upon
submission based on field values?

Cheers,
Gert.










--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



"standard" requestHandler components

2009-09-11 Thread michael8

Hi, 

I have a newbie question about the 'standard' requestHandler in
solrconfig.xml.  What I'd like to know is where the config information for
this requestHandler is kept?  When I go to http://localhost:8983/solr/admin, I
see the following info, but am curious where are the supposedly 'chained'
components (e.g. QueryComponent, FacetComponent, MoreLikeThisComponent)
configured for this requestHandler.  I see timing and process debug output
from these components with "debugQuery=true", so somewhere these components
must have been configured for this 'standard' requestHandler.  

name:standard  
class:  org.apache.solr.handler.component.SearchHandler  
version:$Revision: 686274 $  
description:Search using components:
org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.DebugComponent,
  
stats:  handlerStart : 1252703405335
requests : 3
errors : 0
timeouts : 0
totalTime : 201
avgTimePerRequest : 67.0
avgRequestsPerSecond : 0.015179728 


What I'd like to do with this understanding is to properly integrate the
spellcheck component into the standard requestHandler as suggested in a solr
spellcheck example.  

Thanks for any info in advance.
Michael
-- 
View this message in context: 
http://www.nabble.com/%22standard%22-requestHandler-components-tp25409075p25409075.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Why dismax isn't the default with 1.4 and why it doesn't support fuzzy search ?

2009-09-11 Thread Chris Hostetter

: What I dont understand is whether a requesthandler and a queryparser is 
: the same thing, i.e. The configuration contains a REQUESTHANDLER with 
: the name 'dismax', but does not contain a QUERYPARSER with the name 
: 'dismax'. Where does the 'dismax' queryparser come from? Do I have to 
: configure this extra? Or is it there per default? Or does it come from 
: the 'dismax' requesthandler?

Request handler *instances* only exist if they are declared in your 
solrconfig.xml; two different instances might be configured to use the 
same *class* but with different configurations.  the details on 
which request handler *instance* is picked to process any request can be 
found here...
http://wiki.apache.org/solr/SolrRequestHandler  (look for "Handler Resolution")

If the *class* of a particular instance is "solr.SearchHandler" 
then that instance will use the "defType" param to decide which 
QParserPlugin to use to parse its query string...

http://wiki.apache.org/solr/SolrPlugins (look for QParserPlugin)

...in addition to being able to register your own QParserPlugins, there 
are some defaults provided -- just like there are default request writers 
provided.  (hmm, but there doesn't seem to be a good list of the 
QParsers provided by default ... that's annoying)

Now to get to the meat of your question: there is the 
solr.DisMaxRequestHandler -- which is a requestHandler class, and in 
your solrconfig.xml you have registered an instance of this class with the 
name "dismax" *AND* there is the solr.DisMaxQParserPlugin -- which is a 
queryParser class, and by default there is an instance of that with the 
name "dismax"

The solr.DisMaxRequestHandler is just a trivial subclass of 
solr.SearchHandler that does nothing but set "defType=dismax" (ie: referring 
to whichever QParserPlugin instance exists with the name "dismax")

If you consider these examples...


http://localhost:8983/select?qt=dismax&...
http://localhost:8983/select?qt=standard&defType=dismax&...

the first uses the requestHandler named dismax, the second uses the 
request handler named standard, and tells it to use the defType named 
dismax.  Those could be *very* different, but with your solrconfig.xml
(and for 99% of the Solr configurations out there) these two examples 
should function exactly the same.
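
Concretely, the registrations being described look something like this in
solrconfig.xml (a sketch, with the qf value trimmed down):

<requestHandler name="standard" class="solr.SearchHandler" default="true"/>

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">text</str>
  </lst>
</requestHandler>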



  You could change your configs to make them mean 




: 
: Gert.
: 
: 
: 
: 
:  
: 
: -Original Message-
: From: kaoul@gmail.com [mailto:kaoul@gmail.com] On Behalf Of Erwin
: Sent: Wednesday, September 09, 2009 10:55 AM
: To: solr-user@lucene.apache.org
: Subject: Re: Why dismax isn't the default with 1.4 and why it doesn't support 
fuzzy search ?
: 
: Hi Gert,
: 
: &qt=dismax in URL works with Solr 1.3 and 1.4 without further configuration. 
You are right, you should find a "dismax" query parser in solrconfig.xml by 
default.
: 
: Erwin
: 
: On Wed, Sep 9, 2009 at 7:49 AM, Villemos, Gert 
wrote:
: > On question to this;
: >
: > Do you need to explicitly configure a 'dismax' queryparser in the 
: > solrconfig.xml to enable this, or is a queryparser named 'dismax'
: > available per default?
: >
: > Cheers,
: > Gert.
: >
: >
: >
: >
: > -Original Message-
: > From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
: > Sent: Wednesday, September 02, 2009 2:44 AM
: > To: solr-user@lucene.apache.org
: > Subject: Re: Why dismax isn't the default with 1.4 and why it doesn't 
: > support fuzzy search ?
: >
: > : The wiki says "As of Solr 1.3, the DisMaxRequestHandler is simply 
: > the
: > : standard request handler with the default query parser set to the
: > : DisMax Query Parser (defType=dismax).". I just made a checkout of 
: > svn
: > : and dismax doesn't seems to be the default as :
: >
: > that paragraph doesn't say that dismax is the "default handler" ... it 
: > says that using qt=dismax is the same as using qt=standard with the
: > "query parser" set to be the DisMaxQueryParser (using defType=dismax)
: >
: >
: > so doing this replacement on any URL...
: >
: >    qt=dismax  =>  qt=standard&defType=dismax
: >
: > ...should produce identical results.
: >
: > : Secondly, I've patched solr with
: > : http://issues.apache.org/jira/browse/SOLR-629 as I would like to 
: > have
: > : fuzzy with dismax. I built it with "ant example". Now, behavior is
: > : still the same, no fuzzy search with dismax (using the qt=dismax
: > : parameter in GET URL).
: >
: > questions/discussion of uncommitted patches is best done in the Jira 
: > issue wherey ou found the patch ... that way it helps other people 
: > evaluate the patch, and the author of the patch is more likelye to see 
: > your feedback.
: >
: >
: > -Hoss
: >
: >
: >

Concatenate in Copy Field

2009-09-11 Thread Mohamed Parvez
Is it possible to concatenate two fields and copy them to a new field, in the
schema.xml file?

I am importing from two tables and both have a numeric value as primary key.

If I copy just the primary key, which is a number, from both the tables, to
one field and make it the primary key, records may get overwritten.

So I want to create a composite primary key for the Solr schema by
concatenating two fields.

---
Thanks/Regards,
Parvez


Re: Nonsensical Solr Relevancy Score

2009-09-11 Thread Jeff Newburn
Ah, that makes more sense.  It does seem that the coord would be a good
option, especially in cases like this.
-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


> From: Yonik Seeley 
> Reply-To: 
> Date: Fri, 11 Sep 2009 14:44:50 -0400
> To: 
> Subject: Re: Nonsensical Solr Relevancy Score
> 
> At a high level, there's this:
> http://wiki.apache.org/solr/SolrRelevancyFAQ#head-343e33b6472ca53afb94e1544ae3fcf7d474e5fc
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
> 
> On Fri, Sep 11, 2009 at 1:05 PM, Matthew Runo  wrote:
>> I'd actually like to see a detailed wiki page on how all the parts of a
>> score are actually calculated and inter-related, but I'm not knowledgeable
>> enough to write it =\
>> 
>> Thanks for your time!
>> 
>> Matthew Runo
>> Software Engineer, Zappos.com
>> mr...@zappos.com - 702-943-7833
>> 
>> On Sep 9, 2009, at 3:00 PM, Jeff Newburn wrote:
>> 
>>> I have done a search on the word "blue" in our index.  The debugQuery
>>> shows
>>> some extremely strange methods of scoring.  Somehow product 1 gets a
>>> higher
>>> score with only 1 match on the word blue when product 2 gets a lower score
>>> with the same field match AND an additional field match.  Can someone
>>> please
>>> help me understand why such an obviously more relevant product is given a
>>> lower score.
>>> 
>>>  
>>> 2.3623571 = (MATCH) sum of:
>>>  0.26248413 = (MATCH) max plus 0.5 times others of:
>>>   0.26248413 = (MATCH) weight(productNameSearch:blue in 112779), product
>>> of:
>>>     0.032673787 = queryWeight(productNameSearch:blue), product of:
>>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>>       0.0040672035 = queryNorm
>>>     8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779),
>>> product of:
>>>       1.0 = tf(termFreq(productNameSearch:blue)=1)
>>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>>       1.0 = fieldNorm(field=productNameSearch, doc=112779)
>>>  2.099873 = (MATCH) max plus 0.5 times others of:
>>>   2.099873 = (MATCH) weight(productNameSearch:blue^8.0 in 112779), product
>>> of:
>>>     0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
>>>       8.0 = boost
>>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>>       0.0040672035 = queryNorm
>>>     8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779),
>>> product of:
>>>       1.0 = tf(termFreq(productNameSearch:blue)=1)
>>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>>       1.0 = fieldNorm(field=productNameSearch, doc=112779)
>>> 
>>>  
>>> 1.9483687 = (MATCH) sum of:
>>>  0.63594794 = (MATCH) max plus 0.5 times others of:
>>>   0.16405259 = (MATCH) weight(productNameSearch:blue in 8142), product of:
>>>     0.032673787 = queryWeight(productNameSearch:blue), product of:
>>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>>       0.0040672035 = queryNorm
>>>     5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142),
>>> product of:
>>>       1.0 = tf(termFreq(productNameSearch:blue)=1)
>>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>>       0.625 = fieldNorm(field=productNameSearch, doc=8142)
>>>   0.55392164 = (MATCH) weight(color:blue^10.0 in 8142), product of:
>>>     0.15009704 = queryWeight(color:blue^10.0), product of:
>>>       10.0 = boost
>>>       3.6904235 = idf(docFreq=9309, numDocs=136731)
>>>       0.0040672035 = queryNorm
>>>     3.6904235 = (MATCH) fieldWeight(color:blue in 8142), product of:
>>>       1.0 = tf(termFreq(color:blue)=1)
>>>       3.6904235 = idf(docFreq=9309, numDocs=136731)
>>>       1.0 = fieldNorm(field=color, doc=8142)
>>>  1.3124207 = (MATCH) max plus 0.5 times others of:
>>>   1.3124207 = (MATCH) weight(productNameSearch:blue^8.0 in 8142), product
>>> of:
>>>     0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
>>>       8.0 = boost
>>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>>       0.0040672035 = queryNorm
>>>     5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142),
>>> product of:
>>>       1.0 = tf(termFreq(productNameSearch:blue)=1)
>>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>>       0.625 = fieldNorm(field=productNameSearch, doc=8142)
>>> 
>>> 
>>> --
>>> Jeff Newburn
>>> Software Engineer, Zappos.com
>>> jnewb...@zappos.com - 702-943-7562
>>> 
>> 
>> 



Re: where can i find solr1.4

2009-09-11 Thread Chris Hostetter

: Subject: where can i find solr1.4
: In-Reply-To: <13bed5c20909090154v4507e091k4fefeb073ff69...@mail.gmail.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking


-Hoss



Re: Date Faceting and Double Counting

2009-09-11 Thread Chris Hostetter

: What are wild-card range searches?

i'm pretty sure we was just refering to open ended range searchers, like 
the example he asked about...

: > What does this mean?
: >
: >      {* TO *}
: 
: Same thing as [* TO *] - not worth trying to make it different IMO.

...right, that's something the SolrQueryParser already supports.  the 
inclusive/exclusive syntax is ignored if it's open ended.




-Hoss


Re: Dynamically building the value of a field upon indexing

2009-09-11 Thread Chris Hostetter

: This has to be done by an UpdateRequestProcessor

I think the SignatureUpdateProcessor does exactly what you want ... you 
just need a Signature implementation that does a simple concat (instead of 
an MD5)

so we have a simple identity signature? .. it seems like it would be 
trivial.



-Hoss



Re: Very slow first query

2009-09-11 Thread Jonathan Ariel
The index is 8GB and I'm giving it 1.5 GB of RAM

On Fri, Sep 11, 2009 at 5:09 PM, Lance Norskog  wrote:

> This sounds like a memory-handling problem. The JVM could be too
> small, forcing a lot of garbage collections during the first search.
> It could be too big and choke off the OS disk cache. It could be too
> big and cause paging.
>
> Does this search query include a sort command? Sorting creates a large
> data structure the first time, then caches it. For 12 million
> documents this should not take 50 seconds.
>
> How big are the index files? Not the number of documents, but the
> total size in gigabytes in solr/data/index.
>
> On Fri, Sep 11, 2009 at 10:21 AM, Jonathan Ariel 
> wrote:
> > Ok thanks. If it's the IO OS disk cache, what would be my options?
> changing
> > the disk to a faster one?
> >
> > On Fri, Sep 11, 2009 at 1:32 PM, Yonik Seeley <
> > yonik.see...@lucidimagination.com> wrote:
> >
> >> At the Lucene level there is the term index and the norms too:
> >>
> >>
> http://search.lucidimagination.com/search/document/b5eee1fc75cc454c/caching_in_lucene
> >>
> >> But 50s? That would seem to indicate it's the OS disk cache and you're
> >> waiting for IO.  You should be able to confirm if you're IO bound by
> >> simply looking at the CPU utilization during this 50s query.
> >>
> >> -Yonik
> >> http://www.lucidimagination.com
> >>
> >>
> >>
> >> On Fri, Sep 11, 2009 at 8:59 AM, Jonathan Ariel 
> >> wrote:
> >> > yes of course. but in my case I'm not using filter queries nor facets.
> >> > it is a really simple query. actually the query params are like this:
> >> > ?q=location_country:1 AND category:377 AND location_state:"CA" and
> >> > location_city:"Sacramento"
> >> >
> >> > location_country is an integer
> >> > category is an integer
> >> > location_state is a string
> >> > and location_city is a string
> >> >
> >> > as you can see no filter query and no facets. and for this query the
> >> first
> >> > time that I execute it it takes almost 50s to run, while for the
> >> following
> >> > query:
> >> >
> >> > ?q=title_search:test
> >> >
> >> > title_search is a tokenized text field with a bunch of filters
> >> >
> >> > it takes a couple of ms
> >> >
> >> > I'm always talking about executing these queries the first time after
> >> > restarting solr.
> >> >
> >> > I just want to understand the cause and be sure I won't have this
> >> behaviour
> >> > every time I commit or optimize.
> >> >
> >> > Jonathan
> >> >
> >> > On Fri, Sep 11, 2009 at 7:28 AM, Uri Boness 
> wrote:
> >> >
> >> >> "Not having any facet" and "Not using a filter cache" are two
> different
> >> >> things. If you're not using query filters, you can still have facet
> >> >> calculated and returned as part of the search result. The facet
> >> component
> >> >> uses lucene's field cache to retrieve values for the facet field.
> >> >>
> >> >>
> >> >> Jonathan Ariel wrote:
> >> >>
> >> >>> Yes, but in this case the query that I'm executing doesn't have any
> >> facet.
> >> >>> I
> >> >>> mean for this query I'm not using any filter cache.What does it
> means
> >> >>> "operating system cache can be significant"? That my first query
> >> uploads a
> >> >>> big chunk on the index into memory (maybe even the entire index)?
> >> >>>
> >> >>> On Thu, Sep 10, 2009 at 10:07 PM, Yonik Seeley
> >> >>> wrote:
> >> >>>
> >> >>>
> >> >>>
> >>  At 12M documents, operating system cache can be significant.
> >>  Also, the first time you sort or facet on a field, a field cache
> >>  instance is populated which can take a lot of time.  You can
> prevent
> >>  slow first queries by configuring a static warming query in
> >>  solrconfig.xml that includes the common sorts and facets.
> >> 
> >>  -Yonik
> >>  http://www.lucidimagination.com
> >> 
> >>  On Thu, Sep 10, 2009 at 8:55 PM, Jonathan Ariel <
> ionat...@gmail.com>
> >>  wrote:
> >> 
> >> 
> >> > Hi!Why would it take for the first query that I execute almost 60
> >> > seconds
> >> >
> >> >
> >>  to
> >> 
> >> 
> >> > run and after that no more than 50ms? I disabled all my caching to
> >> check
> >> >
> >> >
> >>  if
> >> 
> >> 
> >> > it is the reason for the subsequent fast responses, but the same
> >> > happens.
> >> > I'm using solr 1.3.
> >> > Something really strange is that it doesn't happen with all the
> >> queries.
> >> >
> >> >
> >>  It
> >> 
> >> 
> >> > is happening with a query that filters some integer and string
> fields
> >> >
> >> >
> >>  joined
> >> 
> >> 
> >> > by an AND operator. Something like A:1 AND B:2 AND (C:3 AND
> D:"CA")
> >> >
> >> >
> >>  (exact
> >> 
> >> 
> >> > match).
> >> > My index is around 1200M documents.
> >> >
> >> > Thanks,
> >> >
> >> > Jonathan
> >> >
> >> >
> >> >
> >> 
> >> >>>
> >> >>>
> >> >>
> >> >
> >>
> >
>
>
>
> 
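
(For the static warming query Yonik suggests above, a solrconfig.xml sketch
using fields from this thread; the event choice and query values are
illustrative:)

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">location_country:1 AND category:377</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>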

Re: dismax matches & ranking

2009-09-11 Thread Chris Hostetter
: 1. dismax query handler and filter query (fq)
: 
: if query= coffee , fq= yiw_bus_city: san jose, 
: 
: I get 0 results for this query again, but this one works fine if I mention
: the qt=standard query handler

with qt=standard this is matching whatever your defaultSearchField is 
configured to be ... if that field isn't in the "qf" for your dismax 
handler, that would explain why dismax doesn't find it.

: 2. dismax and ranking
: 
: q=san jose 
: 
: but my collection have more document for San Francisco, less for San Jose,
: 
: a. i get san francisco listed or listed before san jose some time, i guess
: this is because of the term frequency of "san francisco",

it's hard to say without looking at the score explanations for documents 
that score high (even though you don't want them to) and the explanations 
for documents that score low (even though you want them high) ... more 
than likely you should be using more fields with bigger boosts in your 
"pf" so that documents are heavily rewarded when they match the full query 
string exactly.
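
e.g. something along these lines (the field names and boost values are
purely illustrative):

  qf=name^4 city^2 text
  pf=name^20 city^10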


-Hoss



Re: Date Faceting and Double Counting

2009-09-11 Thread Yonik Seeley
On Fri, Sep 11, 2009 at 3:59 PM, Lance Norskog  wrote:
> Thanks! I had to find this in the Lucene query parser syntax- it is
> not mentioned anywhere in the Solr wiki. You are right [a TO z} and {a
> TO z] are obvious improvements and solve the bucket-search problem the
> right way. But this collides with wild-card range searches.

What are wild-card range searches?

> What does this mean?
>
>      {* TO *}

Same thing as [* TO *] - not worth trying to make it different IMO.

-Yonik
http://www.lucidimagination.com


Re: Backups using Replication

2009-09-11 Thread wojtekpia

I've verified that renaming backupAfter to snapshot works (I should've checked
before asking). Thanks Noble!


wojtekpia wrote:
> 
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     ...
>     <str name="snapshot">optimize</str>
>     ...
>   </lst>
> </requestHandler>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Backups-using-Replication-tp25350083p25407846.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Backups using Replication

2009-09-11 Thread wojtekpia

Do you mean that it's been renamed, so this should work?

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    ...
    <str name="snapshot">optimize</str>
    ...
  </lst>
</requestHandler>


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> before that backupAfter was called "snapshot"
> 

-- 
View this message in context: 
http://www.nabble.com/Backups-using-Replication-tp25350083p25407695.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Very slow first query

2009-09-11 Thread Lance Norskog
This sounds like a memory-handling problem. The JVM could be too
small, forcing a lot of garbage collections during the first search.
It could be too big and choke off the OS disk cache. It could be too
big and cause paging.

Does this search query include a sort command? Sorting creates a large
data structure the first time, then caches it. For 12 million
documents this should not take 50 seconds.

How big are the index files? Not the number of documents, but the
total size in gigabytes in solr/data/index.

On Fri, Sep 11, 2009 at 10:21 AM, Jonathan Ariel  wrote:
> Ok thanks. If it's the IO OS disk cache, what would be my options? Changing
> the disk to a faster one?
>
> On Fri, Sep 11, 2009 at 1:32 PM, Yonik Seeley <
> yonik.see...@lucidimagination.com> wrote:
>
>> At the Lucene level there is the term index and the norms too:
>>
>> http://search.lucidimagination.com/search/document/b5eee1fc75cc454c/caching_in_lucene
>>
>> But 50s? That would seem to indicate it's the OS disk cache and you're
>> waiting for IO.  You should be able to confirm if you're IO bound by
>> simply looking at the CPU utilization during this 50s query.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>> On Fri, Sep 11, 2009 at 8:59 AM, Jonathan Ariel 
>> wrote:
>> > yes of course. but in my case I'm not using filter queries nor facets.
>> > it is a really simple query. actually the query params are like this:
>> > ?q=location_country:1 AND category:377 AND location_state:"CA" and
>> > location_city:"Sacramento"
>> >
>> > location_country is an integer
>> > category is an integer
>> > location_state is a string
>> > and location_city is a string
>> >
>> > as you can see no filter query and no facets. and for this query the
>> first
>> > time that I execute it it takes almost 50s to run, while for the
>> following
>> > query:
>> >
>> > ?q=title_search:test
>> >
>> > title_search is a tokenized text field with a bunch of filters
>> >
>> > it takes a couple of ms
>> >
>> > I'm always talking about executing these queries the first time after
>> > restarting solr.
>> >
>> > I just want to understand the cause and be sure I won't have this
>> behaviour
>> > every time I commit or optimize.
>> >
>> > Jonathan
>> >
>> > On Fri, Sep 11, 2009 at 7:28 AM, Uri Boness  wrote:
>> >
>> >> "Not having any facet" and "Not using a filter cache" are two different
>> >> things. If you're not using query filters, you can still have facet
>> >> calculated and returned as part of the search result. The facet
>> component
>> >> uses lucene's field cache to retrieve values for the facet field.
>> >>
>> >>
>> >> Jonathan Ariel wrote:
>> >>
>> >>> Yes, but in this case the query that I'm executing doesn't have any
>> facet.
>> >>> I
>> >>> mean for this query I'm not using any filter cache.What does it means
>> >>> "operating system cache can be significant"? That my first query
>> uploads a
>> >>> big chunk on the index into memory (maybe even the entire index)?
>> >>>
>> >>> On Thu, Sep 10, 2009 at 10:07 PM, Yonik Seeley
>> >>> wrote:
>> >>>
>> >>>
>> >>>
>>  At 12M documents, operating system cache can be significant.
>>  Also, the first time you sort or facet on a field, a field cache
>>  instance is populated which can take a lot of time.  You can prevent
>>  slow first queries by configuring a static warming query in
>>  solrconfig.xml that includes the common sorts and facets.
>> 
>>  -Yonik
>>  http://www.lucidimagination.com
>> 
>>  On Thu, Sep 10, 2009 at 8:55 PM, Jonathan Ariel 
>>  wrote:
>> 
>> 
>> > Hi!Why would it take for the first query that I execute almost 60
>> > seconds
>> >
>> >
>>  to
>> 
>> 
>> > run and after that no more than 50ms? I disabled all my caching to
>> check
>> >
>> >
>>  if
>> 
>> 
>> > it is the reason for the subsequent fast responses, but the same
>> > happens.
>> > I'm using solr 1.3.
>> > Something really strange is that it doesn't happen with all the
>> queries.
>> >
>> >
>>  It
>> 
>> 
>> > is happening with a query that filters some integer and string fields
>> >
>> >
>>  joined
>> 
>> 
>> > by an AND operator. Something like A:1 AND B:2 AND (C:3 AND D:"CA")
>> >
>> >
>>  (exact
>> 
>> 
>> > match).
>> > My index is around 1200M documents.
>> >
>> > Thanks,
>> >
>> > Jonathan
>> >
>> >
>> >
>> 
>> >>>
>> >>>
>> >>
>> >
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Date Faceting and Double Counting

2009-09-11 Thread Lance Norskog
Thanks! I had to find this in the Lucene query parser syntax - it is
not mentioned anywhere in the Solr wiki. You are right, [a TO z} and {a
TO z] are obvious improvements and solve the bucket-search problem the
right way. But this collides with wild-card range searches.

What does this mean?

  {* TO *}


On Fri, Sep 11, 2009 at 10:02 AM, Chris Hostetter
 wrote:
>
> : datefield:[X TO* Y] for X to Y-0....1
> :
> : This would be backwards-compatible. {} are used for other things and lexing
>
> You lost me there ... {} aren't used for "other things" in the query
> parser -- they're used for range queries that are exclusive of their end
> points.  datefield:{X TO Y} is already legal syntax, i'm just saying it
> would be nice if datefield:[X TO Y} and datefield:{X TO Y] were legal as
> well.
>
>
> -Hoss
>
>



-- 
Lance Norskog
goks...@gmail.com


Facet Response Structure

2009-09-11 Thread smock

I'd like to propose a change to the facet response structure.  Currently, it
looks like:
{'facet_fields':{'field1':[('value1',count1),('value2',count2),(null,missingCount)]}}

My immediate problem with this structure is that null is not of the same
type as the 'value's.  Also, the meaning of the (null,missingCount) tuple is
not the same as the meaning of the ('value',count) tuples, it is a special
case to represent the documents for which the field has no value.  I'd like
to propose changing the response to:
{'facet_fields':{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount}}}


In addition to cleaning up the 'null' issue mentioned above, I think this
will allow for greater flexibility moving forward with the facet component. 
For instance, it would be great if the FacetComponent could add an optional
count of the 'hits', or number of distinct facet values contained in the
query result.  If the facet request has a limit on it, this number is not
available via a count of the returned facet values.  The response structure
I've outlined above could accommodate this piece of metadata very easily:
{'facet_fields',:{'field1':{'facets':[('value1',count1),('value2',count2)],'missing':missingCount,'hits':hitsCount}}}


What does everyone think?  I'd be happy to submit a patch to solr (for 1.5,
of course), if the solr community is in favor of it.


-- 
View this message in context: 
http://www.nabble.com/Facet-Response-Structure-tp25407363p25407363.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr SVN build problem

2009-09-11 Thread Chris Hostetter

: I am building Solr from source. During building it from source I am getting
: following error.

1) what ant targets are you running? ... there's no reason (i can think 
of) for someone building from SVN to need to generate the maven artifacts 
... try "ant dist"

2) I opened a bug for this: SOLR-1424 ... if you want to try making the 
change i suggest there you might see some improvements (but i can't test 
it)


: 
: generate-maven-artifacts:
: [mkdir] Created dir: c:\Downloads\solr_trunk\build\maven
: [mkdir] Created dir: c:\Downloads\solr_trunk\dist\maven
:  [copy] Copying 1 file to
: c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_trunk\src\maven
: 
: BUILD FAILED
: c:\Downloads\solr_trunk\build.xml:741: The following error occurred while
: executing this line:
: c:\Downloads\solr_trunk\common-build.xml:261: Failed to copy
: c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template to
: c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template
: due to java.io.FileNotFoundException:
: c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template
: (The filename, directory name, or volume label syntax is incorrect)
: 
: Regards,
: Allahbaksh
: 



-Hoss



Re: Can't delete with a fq?

2009-09-11 Thread Chris Hostetter

: I'm trying to delete using SolJ's "deleteByQuery", but it doesn't like
: it that I've added an "fq" parameter.  Here's what I see in the logs:

the error you are getting is because deleteByQuery takes in a solr query 
string ... if you include "&fq=" in that string, then you aren't passing a 
query string, you're passing a URL fragment ... you'd get the same error 
if you put a URL encoded "&fq=..." in the "q" param when doing a search.

in general, the delete features of solr don't support filter queries (fq) 
... you are only allowed to pass a single query string.
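
For illustration, a minimal SolrJ sketch of the same point (the URL and
field names here are assumptions, not from this thread); fold any filtering
into the single query string instead of tacking on fq parameters:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class DeleteByQueryExample {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // one plain query string; no "&fq=" URL fragments
        server.deleteByQuery("category:377 AND location_state:CA");
        server.commit();
      }
    }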




-Hoss



Re: Default Query Type For Facet Queries

2009-09-11 Thread Stephen Duncan Jr
On Fri, Sep 11, 2009 at 2:36 PM, Chris Hostetter
wrote:

>
> : I haven't experienced any such problems; it's just a query-parser plugin
> : that adds some behavior on top of the normal query parsing.  In any case,
> : even if I use a custom request handler with my custom parser, can I get
> : facet-queries to use this custom parser by default as well?
>
> if you change the default parser for the entire handler, it should be used
> for all query parsing that doesn't use the {!foo} syntax ... but to answer
> your original question there is no way to set the default for facet.query
> independently of the main default -- that would require a patch to the
> FacetComponent to look at init params (where it could find some
> default/invariant params that would override the main ones)
>
>
> : > > > Is there a way to register it as the default for FacetComponent in
> : > > > solrconfig.xml?
>
>
>
> -Hoss
>
>
My experience (which is on a trunk build from a few weeks back of Solr 1.4),
is that changing the default parser for the handler does NOT change it for
facet.query.  I had expected it would, but was disappointed.

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Re: how to create a custom queryparse to handle new functions

2009-09-11 Thread Chris Hostetter

I think you and Shalin are having a vocabulary problem.

you used the term "function" which has a specific meaning in Solr, if you 
want to write a new function, which works with the existing Function 
syntax solr provides, and can be nested inside of other functions, then 
what you want to write is a ValueSourceParser.  If you want to write your 
own Query parser with an entirely new syntax (in which you want things to 
look like function calls and produce query objects) then you need to 
implement a QParserPlugin.

Some pointers about both can be found here...

http://wiki.apache.org/solr/SolrPlugins
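
For the QParserPlugin route, a bare-bones skeleton looks roughly like this
(a sketch against the 1.4-era API; the actual parsing of the "myfunction"
syntax is left as a stub, and the field name is made up):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.QParserPlugin;

    public class MyFunctionQParserPlugin extends QParserPlugin {
      public void init(NamedList args) {}

      public QParser createParser(String qstr, SolrParams localParams,
                                  SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
          public Query parse() {
            // a real implementation would tokenize qstr, e.g. myfunction("Foo", 3),
            // and build whatever Query object that syntax calls for
            return new TermQuery(new Term("someField", qstr));
          }
        };
      }
    }

It would then be registered in solrconfig.xml with something like
<queryParser name="myfunction" class="MyFunctionQParserPlugin"/> and
invoked as q={!myfunction}... .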

: > You do not need to create a custom query parser for this. You just need to
: > create a custom function query. Look at one of the existing function queries
: > in Solr as an example.
: >
: This is where the need originates from -
: 
http://www.lucidimagination.com/search/document/a4bb0dfee53f7493/how_to_scan_dynamic_field_without_specifying_each_field_in_query
: 
: Within the function, the intent is to rewrite incoming parameter into a
: different query. Can this be done? AFAIK, not.
: 
: Cheers
: Avlesh
: 
: On Sat, Sep 5, 2009 at 3:21 AM, Shalin Shekhar Mangar <
: shalinman...@gmail.com> wrote:
: 
: > On Sat, Sep 5, 2009 at 2:15 AM, gdeconto  >wrote:
: >
: > >
: > > Can someone point me in the general direction of how to create a custom
: > > queryparser that would allow me to create custom query commands like
: > this:
: > >
: > > 
: > > http://localhost:8994/solr/select?q=myfunction("Foo", 3)
: > >
: > > or point me towards an example?
: > >
: > > note that the actual functionality of myfunction is not defined.  I am
: > just
: > > wondering if this sort of extensibility is possible.
: > >
: >
: > You do not need to create a custom query parser for this. You just need to
: > create a custom function query. Look at one of the existing function
: > queries
: > in Solr as an example.
: >
: >
: > --
: > Regards,
: > Shalin Shekhar Mangar.
: >
: 



-Hoss

Re: Nonsensical Solr Relevancy Score

2009-09-11 Thread Yonik Seeley
At a high level, there's this:
http://wiki.apache.org/solr/SolrRelevancyFAQ#head-343e33b6472ca53afb94e1544ae3fcf7d474e5fc

-Yonik
http://www.lucidimagination.com



On Fri, Sep 11, 2009 at 1:05 PM, Matthew Runo  wrote:
> I'd actually like to see a detailed wiki page on how all the parts of a
> score are actually calculated and inter-related, but I'm not knowledgeable
> enough to write it =\
>
> Thanks for your time!
>
> Matthew Runo
> Software Engineer, Zappos.com
> mr...@zappos.com - 702-943-7833
>
> On Sep 9, 2009, at 3:00 PM, Jeff Newburn wrote:
>
>> I have done a search on the word “blue” in our index.  The debugQuery
>> shows
>> some extremely strange methods of scoring.  Somehow product 1 gets a
>> higher
>> score with only 1 match on the word blue when product 2 gets a lower score
>> with the same field match AND an additional field match.  Can someone
>> please
>> help me understand why such an obviously more relevant product is given a
>> lower score.
>>
>>  
>> 2.3623571 = (MATCH) sum of:
>>  0.26248413 = (MATCH) max plus 0.5 times others of:
>>   0.26248413 = (MATCH) weight(productNameSearch:blue in 112779), product
>> of:
>>     0.032673787 = queryWeight(productNameSearch:blue), product of:
>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>       0.0040672035 = queryNorm
>>     8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779),
>> product of:
>>       1.0 = tf(termFreq(productNameSearch:blue)=1)
>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>       1.0 = fieldNorm(field=productNameSearch, doc=112779)
>>  2.099873 = (MATCH) max plus 0.5 times others of:
>>   2.099873 = (MATCH) weight(productNameSearch:blue^8.0 in 112779), product
>> of:
>>     0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
>>       8.0 = boost
>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>       0.0040672035 = queryNorm
>>     8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779),
>> product of:
>>       1.0 = tf(termFreq(productNameSearch:blue)=1)
>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>       1.0 = fieldNorm(field=productNameSearch, doc=112779)
>> 
>>  
>> 1.9483687 = (MATCH) sum of:
>>  0.63594794 = (MATCH) max plus 0.5 times others of:
>>   0.16405259 = (MATCH) weight(productNameSearch:blue in 8142), product of:
>>     0.032673787 = queryWeight(productNameSearch:blue), product of:
>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>       0.0040672035 = queryNorm
>>     5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142),
>> product of:
>>       1.0 = tf(termFreq(productNameSearch:blue)=1)
>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>       0.625 = fieldNorm(field=productNameSearch, doc=8142)
>>   0.55392164 = (MATCH) weight(color:blue^10.0 in 8142), product of:
>>     0.15009704 = queryWeight(color:blue^10.0), product of:
>>       10.0 = boost
>>       3.6904235 = idf(docFreq=9309, numDocs=136731)
>>       0.0040672035 = queryNorm
>>     3.6904235 = (MATCH) fieldWeight(color:blue in 8142), product of:
>>       1.0 = tf(termFreq(color:blue)=1)
>>       3.6904235 = idf(docFreq=9309, numDocs=136731)
>>       1.0 = fieldNorm(field=color, doc=8142)
>>  1.3124207 = (MATCH) max plus 0.5 times others of:
>>   1.3124207 = (MATCH) weight(productNameSearch:blue^8.0 in 8142), product
>> of:
>>     0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
>>       8.0 = boost
>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>       0.0040672035 = queryNorm
>>     5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142),
>> product of:
>>       1.0 = tf(termFreq(productNameSearch:blue)=1)
>>       8.033478 = idf(docFreq=120, numDocs=136731)
>>       0.625 = fieldNorm(field=productNameSearch, doc=8142)
>> 
>>
>> --
>> Jeff Newburn
>> Software Engineer, Zappos.com
>> jnewb...@zappos.com - 702-943-7562
>>
>
>


Re: Nonsensical Solr Relevancy Score

2009-09-11 Thread Yonik Seeley
Factor 1: idf
  If you do a search on "blue whales" you are probably much more
interested in whales than you are in things that are blue.  The idf
factor takes this term rarity into account.  In your case, color:blue
appears in over 9000 documents, but productNameSearch:blue only
appears in 120 documents (and thus its idf factor is much higher).
One option is to simply boost searches on your color field higher.
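
As a sanity check, assuming Lucene's DefaultSimilarity formula
idf = 1 + ln(numDocs / (docFreq + 1)), the numbers in the explain output
below work out: 1 + ln(136731/121) ≈ 8.03 for productNameSearch:blue
versus 1 + ln(136731/9310) ≈ 3.69 for color:blue, in line with the
8.033478 and 3.6904235 factors shown.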

Factor 2: length normalization
  0.625 = fieldNorm(field=productNameSearch, doc=8142)
The second document probably has a match in a longer field, which is a
less specific match and thus gets penalized. Because this is in the
very important field (as measured by idf) this causes the second doc
to lose.

Factor 3: No coord factor in the top level boolean query in generated
dismax queries.  This would generally cause matches in more fields to
be boosted beyond just adding their scores together.   Maybe we should
have an option for this.

-Yonik
http://www.lucidimagination.com



On Wed, Sep 9, 2009 at 6:00 PM, Jeff Newburn  wrote:
> I have done a search on the word "blue" in our index.  The debugQuery shows
> some extremely strange methods of scoring.  Somehow product 1 gets a higher
> score with only 1 match on the word blue when product 2 gets a lower score
> with the same field match AND an additional field match.  Can someone please
> help me understand why such an obviously more relevant product is given a
> lower score.
>
>  
> 2.3623571 = (MATCH) sum of:
>  0.26248413 = (MATCH) max plus 0.5 times others of:
>    0.26248413 = (MATCH) weight(productNameSearch:blue in 112779), product
> of:
>      0.032673787 = queryWeight(productNameSearch:blue), product of:
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.0040672035 = queryNorm
>      8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779),
> product of:
>        1.0 = tf(termFreq(productNameSearch:blue)=1)
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        1.0 = fieldNorm(field=productNameSearch, doc=112779)
>  2.099873 = (MATCH) max plus 0.5 times others of:
>    2.099873 = (MATCH) weight(productNameSearch:blue^8.0 in 112779), product
> of:
>      0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
>        8.0 = boost
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.0040672035 = queryNorm
>      8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779),
> product of:
>        1.0 = tf(termFreq(productNameSearch:blue)=1)
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        1.0 = fieldNorm(field=productNameSearch, doc=112779)
> 
>  
> 1.9483687 = (MATCH) sum of:
>  0.63594794 = (MATCH) max plus 0.5 times others of:
>    0.16405259 = (MATCH) weight(productNameSearch:blue in 8142), product of:
>      0.032673787 = queryWeight(productNameSearch:blue), product of:
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.0040672035 = queryNorm
>      5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142),
> product of:
>        1.0 = tf(termFreq(productNameSearch:blue)=1)
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.625 = fieldNorm(field=productNameSearch, doc=8142)
>    0.55392164 = (MATCH) weight(color:blue^10.0 in 8142), product of:
>      0.15009704 = queryWeight(color:blue^10.0), product of:
>        10.0 = boost
>        3.6904235 = idf(docFreq=9309, numDocs=136731)
>        0.0040672035 = queryNorm
>      3.6904235 = (MATCH) fieldWeight(color:blue in 8142), product of:
>        1.0 = tf(termFreq(color:blue)=1)
>        3.6904235 = idf(docFreq=9309, numDocs=136731)
>        1.0 = fieldNorm(field=color, doc=8142)
>  1.3124207 = (MATCH) max plus 0.5 times others of:
>    1.3124207 = (MATCH) weight(productNameSearch:blue^8.0 in 8142), product
> of:
>      0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
>        8.0 = boost
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.0040672035 = queryNorm
>      5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142),
> product of:
>        1.0 = tf(termFreq(productNameSearch:blue)=1)
>        8.033478 = idf(docFreq=120, numDocs=136731)
>        0.625 = fieldNorm(field=productNameSearch, doc=8142)
> 
>
> --
> Jeff Newburn
> Software Engineer, Zappos.com
> jnewb...@zappos.com - 702-943-7562
>
>


Re: Single Core or Multiple Core?

2009-09-11 Thread Chris Hostetter

For the record: even if you're only going to have one SolrCore, using the 
multicore support (ie: having a solr.xml file) might prove handy from a 
maintenance standpoint ... the ability to configure new "on deck cores" with 
new configs, populate them with data, and then swap them in place for your 
previous core without any downtime is a really nice feature to take 
advantage of.
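
For illustration, a minimal two-core solr.xml along those lines (the core
names here are made up):

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="live" instanceDir="live"/>
        <core name="ondeck" instanceDir="ondeck"/>
      </cores>
    </solr>

after which a freshly built "ondeck" core can be swapped into place with the
CoreAdmin handler:

    http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=ondeck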


: Date: Thu, 3 Sep 2009 20:05:15 -0300
: From: Jonathan Ariel 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Single Core or Multiple Core?
: 
: It seems like it is really hard to decide when the Multiple Core solution is
: more appropriate.As I could understand from this list and wiki the Multiple
: Core feature was designed to address the need of handling different sets of
: data within the same solr instance, where the sets of data don't need to be
: joined.
: In my case the documents are of a specific site and country. So document A
: can be of Site 1 / Country 1, B of Site 2 / Country 1, C of Site 1 / Country
: 2, and so on.
: For the use cases of my application I will never query across countries or
: sites. I will always have to provide to the query the country id and the
: site id.
: Would you suggest to split my data into cores? I have few sites (around 20)
: and more countries (around 90).
: Should I split my data into sites (around 20 cores) and within a core filter
: by site? Should I split by Site and Country (around 1800 cores)?
: What should I consider when splitting my data into multiple cores?
: 
: Thanks
: 
: Jonathan
: 



-Hoss



Re: Default Query Type For Facet Queries

2009-09-11 Thread Chris Hostetter

: I haven't experienced any such problems; it's just a query-parser plugin
: that adds some behavior on top of the normal query parsing.  In any case,
: even if I use a custom request handler with my custom parser, can I get
: facet-queries to use this custom parser by default as well?

if you change the default parser for the entire handler, it should be used 
for all query parsing that doesn't use the {!foo} syntax ... but to answer 
your original question there is no way to set the default for facet.query 
independently of the main default -- that would require a patch to the 
FacetComponent to look at init params (where it could find some 
default/invariant params that would override the main ones)
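
In the meantime the per-query workaround is local params on each facet
query, e.g. (the parser name is whatever you registered):

    facet.query={!type=customparser}field:value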


: > > > Is there a way to register it as the default for FacetComponent in
: > > > solrconfig.xml?



-Hoss



Spec Version vs Implementation Version

2009-09-11 Thread Israel Ekpo
What are the differences between specification version and implementation
version?

I downloaded the nightly build for September 05 2009 and it has a spec
version of 1.3 and the implementation version states 1.4-dev

What does that mean?


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Very slow first query

2009-09-11 Thread Jonathan Ariel
Ok thanks. If it's the OS disk cache, what would be my options? Changing
the disk to a faster one?

On Fri, Sep 11, 2009 at 1:32 PM, Yonik Seeley <
yonik.see...@lucidimagination.com> wrote:

> At the Lucene level there is the term index and the norms too:
>
> http://search.lucidimagination.com/search/document/b5eee1fc75cc454c/caching_in_lucene
>
> But 50s? That would seem to indicate it's the OS disk cache and you're
> waiting for IO.  You should be able to confirm if you're IO bound by
> simply looking at the CPU utilization during this 50s query.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> On Fri, Sep 11, 2009 at 8:59 AM, Jonathan Ariel 
> wrote:
> > yes of course. but in my case I'm not using filter queries nor facets.
> > it is a really simple query. actually the query params are like this:
> > ?q=location_country:1 AND category:377 AND location_state:"CA" and
> > location_city:"Sacramento"
> >
> > location_country is an integer
> > category is an integer
> > location_state is a string
> > and location_city is a string
> >
> > as you can see no filter query and no facets. and for this query, the
> > first time that I execute it, it takes almost 50s to run, while for
> > the following query:
> >
> > ?q=title_search:test
> >
> > title_search is a tokenized text field with a bunch of filters
> >
> > it takes a couple of ms
> >
> > I'm always talking about executing these queries the first time after
> > restarting solr.
> >
> > I just want to understand the cause and be sure I won't have this
> behaviour
> > every time I commit or optimize.
> >
> > Jonathan
> >
> > On Fri, Sep 11, 2009 at 7:28 AM, Uri Boness  wrote:
> >
> >> "Not having any facet" and "Not using a filter cache" are two different
> >> things. If you're not using query filters, you can still have facet
> >> calculated and returned as part of the search result. The facet
> component
> >> uses lucene's field cache to retrieve values for the facet field.
> >>
> >>
> >> Jonathan Ariel wrote:
> >>
> >>> Yes, but in this case the query that I'm executing doesn't have any
> >>> facet. I mean for this query I'm not using any filter cache. What
> >>> does it mean, "operating system cache can be significant"? That my
> >>> first query uploads a big chunk of the index into memory (maybe even
> >>> the entire index)?
> >>>
> >>> On Thu, Sep 10, 2009 at 10:07 PM, Yonik Seeley
> >>> wrote:
> >>>
> >>>
> >>>
>  At 12M documents, operating system cache can be significant.
>  Also, the first time you sort or facet on a field, a field cache
>  instance is populated which can take a lot of time.  You can prevent
>  slow first queries by configuring a static warming query in
>  solrconfig.xml that includes the common sorts and facets.
> 
>  -Yonik
>  http://www.lucidimagination.com
> 
>  On Thu, Sep 10, 2009 at 8:55 PM, Jonathan Ariel 
>  wrote:
> 
> 
> > Hi! Why would it take for the first query that I execute almost 60
> > seconds to run and after that no more than 50ms? I disabled all my
> > caching to check if it is the reason for the subsequent fast
> > responses, but the same happens.
> > I'm using solr 1.3.
> > Something really strange is that it doesn't happen with all the
> > queries. It is happening with a query that filters some integer and
> > string fields joined by an AND operator. Something like A:1 AND B:2
> > AND (C:3 AND D:"CA") (exact match).
> > My index is around 1200M documents.
> >
> > Thanks,
> >
> > Jonathan
> >
> >
> >
> 
> >>>
> >>>
> >>
> >
>


Re: Nonsensical Solr Relevancy Score

2009-09-11 Thread Matthew Runo
I'd actually like to see a detailed wiki page on how all the parts of  
a score are actually calculated and inter-related, but I'm not  
knowledgeable enough to write it =\


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Sep 9, 2009, at 3:00 PM, Jeff Newburn wrote:

I have done a search on the word “blue” in our index.  The debugQuery shows
some extremely strange methods of scoring.  Somehow product 1 gets a higher
score with only 1 match on the word blue when product 2 gets a lower score
with the same field match AND an additional field match.  Can someone please
help me understand why such an obviously more relevant product is given a
lower score.

2.3623571 = (MATCH) sum of:
 0.26248413 = (MATCH) max plus 0.5 times others of:
   0.26248413 = (MATCH) weight(productNameSearch:blue in 112779), product of:
     0.032673787 = queryWeight(productNameSearch:blue), product of:
       8.033478 = idf(docFreq=120, numDocs=136731)
       0.0040672035 = queryNorm
     8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779), product of:
       1.0 = tf(termFreq(productNameSearch:blue)=1)
       8.033478 = idf(docFreq=120, numDocs=136731)
       1.0 = fieldNorm(field=productNameSearch, doc=112779)
 2.099873 = (MATCH) max plus 0.5 times others of:
   2.099873 = (MATCH) weight(productNameSearch:blue^8.0 in 112779), product of:
     0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
       8.0 = boost
       8.033478 = idf(docFreq=120, numDocs=136731)
       0.0040672035 = queryNorm
     8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779), product of:
       1.0 = tf(termFreq(productNameSearch:blue)=1)
       8.033478 = idf(docFreq=120, numDocs=136731)
       1.0 = fieldNorm(field=productNameSearch, doc=112779)

1.9483687 = (MATCH) sum of:
 0.63594794 = (MATCH) max plus 0.5 times others of:
   0.16405259 = (MATCH) weight(productNameSearch:blue in 8142), product of:
     0.032673787 = queryWeight(productNameSearch:blue), product of:
       8.033478 = idf(docFreq=120, numDocs=136731)
       0.0040672035 = queryNorm
     5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142), product of:
       1.0 = tf(termFreq(productNameSearch:blue)=1)
       8.033478 = idf(docFreq=120, numDocs=136731)
       0.625 = fieldNorm(field=productNameSearch, doc=8142)
   0.55392164 = (MATCH) weight(color:blue^10.0 in 8142), product of:
     0.15009704 = queryWeight(color:blue^10.0), product of:
       10.0 = boost
       3.6904235 = idf(docFreq=9309, numDocs=136731)
       0.0040672035 = queryNorm
     3.6904235 = (MATCH) fieldWeight(color:blue in 8142), product of:
       1.0 = tf(termFreq(color:blue)=1)
       3.6904235 = idf(docFreq=9309, numDocs=136731)
       1.0 = fieldNorm(field=color, doc=8142)
 1.3124207 = (MATCH) max plus 0.5 times others of:
   1.3124207 = (MATCH) weight(productNameSearch:blue^8.0 in 8142), product of:
     0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
       8.0 = boost
       8.033478 = idf(docFreq=120, numDocs=136731)
       0.0040672035 = queryNorm
     5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142), product of:
       1.0 = tf(termFreq(productNameSearch:blue)=1)
       8.033478 = idf(docFreq=120, numDocs=136731)
       0.625 = fieldNorm(field=productNameSearch, doc=8142)


--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562





Re: Date Faceting and Double Counting

2009-09-11 Thread Chris Hostetter

: datefield:[X TO* Y] for X to Y-0....1
: 
: This would be backwards-compatible. {} are used for other things and lexing

You lost me there ... {} aren't used for "other things" in the query 
parser -- they're used for range queries that are exclusive of their end 
points.  datefield:{X TO Y} is already legal syntax, i'm just saying it 
would be nice if datefield:[X TO Y} and datefield:{X TO Y] were legal as 
well.


-Hoss



Re: Highlighting in SolrJ?

2009-09-11 Thread Jay Hill
Try adding this param: &hl.fragsize=3
(obviously set the fragsize to whatever very high number you need for your
largest doc.)

-Jay
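
In SolrJ terms, a sketch of the same idea (the exact fragsize value is just
whatever exceeds your largest field; the param Solr exposes for merging
snippets is hl.mergeContiguous):

    SolrQuery query = new SolrQuery("foo");
    query.setHighlight(true);
    query.setParam("hl.fl", "content");
    query.setParam("hl.fragsize", "100000"); // larger than the biggest field
    query.setParam("hl.mergeContiguous", "true");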


On Fri, Sep 11, 2009 at 7:54 AM, Paul Tomblin  wrote:

> What I want is the whole text of that field with every instance of the
> search term highlighted, even if the search term only occurs in the
> first line of a 300 page field.  I'm not sure if mergeContinuous will
> do that, or if it will miss everything after the last line that
> contains the search term.
>
> On Fri, Sep 11, 2009 at 10:42 AM, Jay Hill  wrote:
> > It's really just a matter of what you're intentions are. There are an
> awful
> > lot of highlighting params and so highlighting is very flexible and
> > customizable. Regarding snippets, as an example Google presents two
> snippets
> > in results, which is fairly common. I'd recommend doing a lot of
> > experimenting by changing the params on the query string to get what you
> > want, and then setting them up in SolrJ. The example I sent was intended
> to
> > be a generic starting point and mostly just to show how to set
> highlighting
> > params and how to get back a List of highlighting results.
> >
> > -Jay
> > http://www.lucidimagination.com
> >
> >
> > On Thu, Sep 10, 2009 at 5:40 PM, Paul Tomblin 
> wrote:
> >
> >> If I set snippets to 9 and "mergeContinuous" to true, will I get
> >> the entire contents of the field with all the search terms replaced?
> >> I don't see what good it would be just getting one line out of the
> >> whole field as a snippet.
> >>
> >> On Thu, Sep 10, 2009 at 7:45 PM, Jay Hill 
> wrote:
> >> > Set up the query like this to highlight a field named "content":
> >> >
> >> >SolrQuery query = new SolrQuery();
> >> >query.setQuery("foo");
> >> >
> >> >query.setHighlight(true).setHighlightSnippets(1); //set other
> params
> >> as
> >> > needed
> >> >query.setParam("hl.fl", "content");
> >> >
> >> >QueryResponse queryResponse =getSolrServer().query(query);
> >> >
> >> > Then to get back the highlight results you need something like this:
> >> >
> >> >    Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
> >> >
> >> >while (iter.hasNext()) {
> >> >  SolrDocument resultDoc = iter.next();
> >> >
> >> >      String content = (String) resultDoc.getFieldValue("content");
> >> >  String id = (String) resultDoc.getFieldValue("id"); //id is the
> >> > uniqueKey field
> >> >
> >> >  if (queryResponse.getHighlighting().get(id) != null) {
> >> >        List<String> highlightSnippets =
> >> > queryResponse.getHighlighting().get(id).get("content");
> >> >  }
> >> >}
> >> >
> >> > Hope that gets you what you need.
> >> >
> >> > -Jay
> >> > http://www.lucidimagination.com
> >> >
> >> > On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin 
> >> wrote:
> >> >
> >> >> Can somebody point me to some sample code for using highlighting in
> >> >> SolrJ?  I understand the highlighted versions of the field comes in a
> >> >> separate NamedList?  How does that work?
> >> >>
> >> >> --
> >> >> http://www.linkedin.com/in/paultomblin
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> http://www.linkedin.com/in/paultomblin
> >>
> >
>
>
>
> --
> http://www.linkedin.com/in/paultomblin
>


Re: SnowballPorterFilterFactory stemming word question

2009-09-11 Thread darniz

The link to download kstem is not working.

Any other link, please?



Yonik Seeley-2 wrote:
> 
> On Mon, Sep 7, 2009 at 2:49 AM, darniz wrote:
>> Does solr provide any implementation for dictionary stemmer, please let
>> me
>> know
> 
> The Krovetz stemmer is dictionary based (english only):
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
> 
> But from your original question, maybe you are concerned when the
> stemmer doesn't return real words? For normal search, don't be.
> During index time, words are stemmed, and then later the query is
> stemmed.  If the results match up, you're good.  For example, a
> document containing the word "machines" may stem to "machin" and then
> a query of "machined" will stem to "machin" and thus match the
> document.
> 
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 

-- 
View this message in context: 
http://www.nabble.com/SnowballPorterFilterFactory-stemming-word-question-tp25180310p25404615.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Query runs faster without filter queries?

2009-09-11 Thread Yonik Seeley
On Fri, Sep 11, 2009 at 8:50 AM, Jonathan Ariel  wrote:
> Oh... another question regarding this. If I disable the query cache and
> document cache and execute a query that uses no filter cache (no facets, no
> filter query, etc.), why does the first execution take around 400ms and the
> second one 10ms? It seems like it should always take the same amount of
> time, right?

Same pointer I just gave in another thread:
http://search.lucidimagination.com/search/document/b5eee1fc75cc454c/caching_in_lucene

-Yonik
http://www.lucidimagination.com


Re: Very slow first query

2009-09-11 Thread Yonik Seeley
At the Lucene level there is the term index and the norms too:
http://search.lucidimagination.com/search/document/b5eee1fc75cc454c/caching_in_lucene

But 50s? That would seem to indicate it's the OS disk cache and you're
waiting for IO.  You should be able to confirm if you're IO bound by
simply looking at the CPU utilization during this 50s query.

-Yonik
http://www.lucidimagination.com
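
For the static warming approach mentioned further down in the quoted thread,
a minimal firstSearcher entry in solrconfig.xml might look like this (the
query value is illustrative, borrowed from Jonathan's example):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">location_country:1 AND category:377</str></lst>
      </arr>
    </listener>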



On Fri, Sep 11, 2009 at 8:59 AM, Jonathan Ariel  wrote:
> yes of course. but in my case I'm not using filter queries nor facets.
> it is a really simple query. actually the query params are like this:
> ?q=location_country:1 AND category:377 AND location_state:"CA" and
> location_city:"Sacramento"
>
> location_country is an integer
> category is an integer
> location_state is a string
> and location_city is a string
>
> > as you can see no filter query and no facets. and for this query, the first
> > time that I execute it, it takes almost 50s to run, while for the following
> query:
>
> ?q=title_search:test
>
> title_search is a tokenized text field with a bunch of filters
>
> it takes a couple of ms
>
> I'm always talking about executing these queries the first time after
> restarting solr.
>
> I just want to understand the cause and be sure I won't have this behaviour
> every time I commit or optimize.
>
> Jonathan
>
> On Fri, Sep 11, 2009 at 7:28 AM, Uri Boness  wrote:
>
>> "Not having any facet" and "Not using a filter cache" are two different
>> things. If you're not using query filters, you can still have facet
>> calculated and returned as part of the search result. The facet component
>> uses lucene's field cache to retrieve values for the facet field.
>>
>>
>> Jonathan Ariel wrote:
>>
>>> Yes, but in this case the query that I'm executing doesn't have any
>>> facet. I mean for this query I'm not using any filter cache. What
>>> does it mean, "operating system cache can be significant"? That my
>>> first query uploads a big chunk of the index into memory (maybe even
>>> the entire index)?
>>>
>>> On Thu, Sep 10, 2009 at 10:07 PM, Yonik Seeley
>>> wrote:
>>>
>>>
>>>
 At 12M documents, operating system cache can be significant.
 Also, the first time you sort or facet on a field, a field cache
 instance is populated which can take a lot of time.  You can prevent
 slow first queries by configuring a static warming query in
 solrconfig.xml that includes the common sorts and facets.

 -Yonik
 http://www.lucidimagination.com

 On Thu, Sep 10, 2009 at 8:55 PM, Jonathan Ariel 
 wrote:


> Hi! Why would it take for the first query that I execute almost 60
> seconds to run and after that no more than 50ms? I disabled all my
> caching to check if it is the reason for the subsequent fast
> responses, but the same happens.
> I'm using solr 1.3.
> Something really strange is that it doesn't happen with all the
> queries. It is happening with a query that filters some integer and
> string fields joined by an AND operator. Something like A:1 AND B:2
> AND (C:3 AND D:"CA") (exact match).
> My index is around 1200M documents.
>
> Thanks,
>
> Jonathan
>
>
>

>>>
>>>
>>
>


Re: Solr http post performance seems slow - help?

2009-09-11 Thread Dan A. Dickey
On Thursday 10 September 2009 08:13:33 am Dan A. Dickey wrote:
> I'm posting documents to Solr using http (curl) from
> C++/C code and am seeing approximately 3.3 - 3.4
> documents per second being posted.  Is this to be expected?
> Granted - I understand that this depends somewhat on the
> machine running Solr.  By the way - I'm running Solr inside JBoss.
> 
> I was hoping for maybe 20 or more docs/sec, and 3 or so
> is quite a way from that.
> 
> Also, I'm posting just a single document at a time.  I once tried
> 5 processes each posting documents, and that slowed things
> down considerably.  Down into the multiple (5-10) seconds per document.
> 
> Does anyone have suggestions on what I can try?  I'll soon
> have better servers installed and will be splitting the indexing
> work from the searching - but at this point in time, I wasn't doing
> indexing while searching anyway.  Thanks for any and all help!

Ok, I spent some time on this problem this morning, and have some
interesting results to share.  I started off by making sure both boxes
were attached to the same switch - they weren't, but now are.
It didn't help.

I added some timing code... and found indeed that I was getting about
3.3 - 3.4 documents per second to index.  Not so good.

I stopped JBoss (and Solr) and built up a version of the example
stuff that would run my current configuration instead of the example.
Reading the documentation - this runs Solr in a Jetty container.

And this resulted in indexing speeds ranging between 20 - 30 documents
per second.  Much more acceptable.  And also, with a quick test of using
two processes to index - I hit a rate of about 37 dps.  Much nicer.
I don't know yet how this actually scales - but I intend to find out.
We've almost got some nice quad core xeon's ready...
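
One general lever worth trying here (standard Solr practice, not something
measured in this thread) is batching several documents into each POST
instead of one per request, e.g.:

    curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
      --data-binary '<add>
        <doc><field name="id">1</field><field name="text">first doc</field></doc>
        <doc><field name="id">2</field><field name="text">second doc</field></doc>
      </add>'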

Our JBoss expert and I will be looking into why this might be occurring.
Does anyone know of any JBoss related slowness with Solr?
And does anyone have any other sort of suggestions to speed indexing
performance?   Thanks for your help all!  I'll keep you up to date with
further progress.
-Dan

-- 
Dan A. Dickey | Senior Software Engineer

Savvis
10900 Hampshire Ave. S., Bloomington, MN  55438
Office: 952.852.4803 | Fax: 952.852.4951
E-mail: dan.dic...@savvis.net


Re: What Tokenizerfactory/TokenFilterFactory can/should I use so a search for "wal mart" matches "walmart"(quotes not included in search or index)?

2009-09-11 Thread Christian Zambrano

Ahmet,

Thanks a lot. Your suggestion was really helpful. I tried using synonyms 
before but for some reason it didn't work but this time around it worked.


On 09/11/2009 02:55 AM, AHMET ARSLAN wrote:

There are a lot of company names that
people are uncertain as to the correct spelling. A few of
examples are:
1. best buy, bestbuy
2. walmart, wal mart, wal-mart
3. Holiday Inn, HolidayInn

What Tokenizer Factory and/or TokenFilterFactory should I
use so that somebody typing "wal mart"(quotes not included)
will find "wal mart" and "walmart"(again, quotes not
included)
 

I faced a similar requirement before. I solved it by hardcoding those names to 
synonyms_index.txt and using SynonymFilterFactory at index time.

synonyms_index.txt will contain:

best buy, bestbuy
walmart, wal mart
Holiday Inn, HolidayInn


Since the solr wiki[1] advises using index-time synonyms when dealing with 
multi-word synonyms, I am using index time synonym expansion only.

[1] 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
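
A typical index-time synonym analyzer for this setup (the classes and
attribute values here are an illustration, not necessarily Ahmet's exact
config):

    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>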

When working with StandardAnalyzer, wal-mart is broken into two tokens: wal and 
mart. So you don't need to write the hyphenated forms of the words in synonyms_index.txt


If all of your examples were similar to HolidayInn, you could use solr.WordDelimiterFilterFactory 
(without writing all these company names to a file) but you can't handle "wal mart" and 
"walmart" with it.

Hope this helps.



   


RE: An issue with <commit/> using Solr Cell and multiple files

2009-09-11 Thread Kevin Miller
Thank you, this worked perfectly. 


Kevin Miller
Web Services

-Original Message-
From: caman [mailto:aboxfortheotherst...@gmail.com] 
Sent: Thursday, September 10, 2009 9:48 PM
To: solr-user@lucene.apache.org
Subject: Re: An issue with <commit/> using Solr Cell and multiple files


You are right. 
I ran into the same thing. Windows curl gave me an error but cygwin ran
without any issues.

thanks


Lance Norskog-2 wrote:
> 
> It is a windows problem (or curl, whatever).  This works with 
> double-quotes.
> 
> C:\Users\work\Downloads>\cygwin\home\work\curl-7.19.4\curl.exe
> http://localhost:8983/solr/update --data-binary "<commit/>" -H 
> "Content-type:text/xml; charset=utf-8"
> Single-quotes inside double-quotes should work: "<commit waitFlush='false'/>"
> 
> 
> On Tue, Sep 8, 2009 at 11:59 AM, caman
> wrote:
> 
>>
>> seems to be an error with curl
>>
>>
>>
>>
>> Kevin Miller-17 wrote:
>> >
>> > I am getting the same error message.  I am running Solr on a 
>> > Windows machine.  Is the commit command a curl command or is it a
Solr command?
>> >
>> >
>> > Kevin Miller
>> > Web Services
>> >
>> > -Original Message-
>> > From: Grant Ingersoll [mailto:gsing...@apache.org]
>> > Sent: Tuesday, September 08, 2009 12:52 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: An issue with <commit/> using Solr Cell and multiple 
>> > files
>> >
>> > solr/examples/exampledocs/post.sh does:
>> > curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml; 
>> > charset=utf-8'
>> >
>> > Not sure if that helps or how it compares to the book.
>> >
>> > On Sep 8, 2009, at 1:48 PM, Kevin Miller wrote:
>> >
>> >> I am using the Solr nightly build from 8/11/2009.  I am able to 
>> >> index my documents using the Solr Cell but when I attempt to send 
>> >> the commit
>> >
>> >> command I get an error.  I am using the example found in the Solr 
>> >> 1.4 Enterprise Search Server book (recently released) found on
page 84.
>> >> It
>> >> shows to commit the changes as follows (I am showing where my 
>> >> files are located not the example in the book):
>> >>
>> >> c:\curl\bin\curl http://echo12:8983/solr/update/ -H "Content-Type:
>> >> text/xml" --data-binary '<commit/>'
>> >>
>> >> this gives me this error: The system cannot find the file specified.
>> >>
>> >> I get the same error when I modify it to look like the following:
>> >>
>> >> c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit waitFlush="false"/>'
>> >> c:\curl\bin\curl "http://echo12:8983/solr/update/" -H "Content-Type:
>> >> text/xml" --data-binary '<commit/>'
>> >> c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit/>'
>> >> c:\curl\bin\curl "http://echo12:8983/solr/update/" '<commit/>'
>> >>
>> >> I am using the example configuration in Solr so my documents are 
>> >> found
>> >
>> >> in the exampledocs folder; also my curl program is located in the 
>> >> root directory, which is the reason for the way the curl command is

>> >> being executed.
>> >>
>> >> I would appreciate any information on where to look or how to get 
>> >> the commit command to execute after indexing multiple files.
>> >>
>> >> Kevin Miller
>> >> Oklahoma Tax Commission
>> >> Web Services
>> >
>> > --
>> > Grant Ingersoll
>> > http://www.lucidimagination.com/
>> >
>> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> > using Solr/Lucene:
>> > http://www.lucidimagination.com/search
>> >
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25352122.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> --
> Lance Norskog
> goks...@gmail.com
> 
> 

--
View this message in context:
http://www.nabble.com/An-issue-with-%3Ccommit-%3E-using-Solr-Cell-and-multiple-files-tp25350995p25394203.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Highlighting in SolrJ?

2009-09-11 Thread Paul Tomblin
What I want is the whole text of that field with every instance of the
search term highlighted, even if the search term only occurs in the
first line of a 300 page field.  I'm not sure if mergeContinuous will
do that, or if it will miss everything after the last line that
contains the search term.

On Fri, Sep 11, 2009 at 10:42 AM, Jay Hill  wrote:
> It's really just a matter of what your intentions are. There are an awful
> lot of highlighting params and so highlighting is very flexible and
> customizable. Regarding snippets, as an example Google presents two snippets
> in results, which is fairly common. I'd recommend doing a lot of
> experimenting by changing the params on the query string to get what you
> want, and then setting them up in SolrJ. The example I sent was intended to
> be a generic starting point and mostly just to show how to set highlighting
> params and how to get back a List of highlighting results.
>
> -Jay
> http://www.lucidimagination.com
>
>
> On Thu, Sep 10, 2009 at 5:40 PM, Paul Tomblin  wrote:
>
>> If I set snippets to 9 and "mergeContinuous" to true, will I get
>> the entire contents of the field with all the search terms replaced?
>> I don't see what good it would be just getting one line out of the
>> whole field as a snippet.
>>
>> On Thu, Sep 10, 2009 at 7:45 PM, Jay Hill  wrote:
>> > Set up the query like this to highlight a field named "content":
>> >
>> >    SolrQuery query = new SolrQuery();
>> >    query.setQuery("foo");
>> >
>> >    query.setHighlight(true).setHighlightSnippets(1); //set other params
>> as
>> > needed
>> >    query.setParam("hl.fl", "content");
>> >
>> >    QueryResponse queryResponse =getSolrServer().query(query);
>> >
>> > Then to get back the highlight results you need something like this:
>> >
>> >    Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
>> >
>> >    while (iter.hasNext()) {
>> >      SolrDocument resultDoc = iter.next();
>> >
>> >      String content = (String) resultDoc.getFieldValue("content");
>> >      String id = (String) resultDoc.getFieldValue("id"); //id is the
>> > uniqueKey field
>> >
>> >      if (queryResponse.getHighlighting().get(id) != null) {
>> >        List<String> highlightSnippets =
>> > queryResponse.getHighlighting().get(id).get("content");
>> >      }
>> >    }
>> >
>> > Hope that gets you what you need.
>> >
>> > -Jay
>> > http://www.lucidimagination.com
>> >
>> > On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin 
>> wrote:
>> >
>> >> Can somebody point me to some sample code for using highlighting in
>> >> SolrJ?  I understand the highlighted versions of the field comes in a
>> >> separate NamedList?  How does that work?
>> >>
>> >> --
>> >> http://www.linkedin.com/in/paultomblin
>> >>
>> >
>>
>>
>>
>> --
>> http://www.linkedin.com/in/paultomblin
>>
>



-- 
http://www.linkedin.com/in/paultomblin


Re: Highlighting in SolrJ?

2009-09-11 Thread Jay Hill
It's really just a matter of what your intentions are. There are an awful
lot of highlighting params and so highlighting is very flexible and
customizable. Regarding snippets, as an example Google presents two snippets
in results, which is fairly common. I'd recommend doing a lot of
experimenting by changing the params on the query string to get what you
want, and then setting them up in SolrJ. The example I sent was intended to
be a generic starting point and mostly just to show how to set highlighting
params and how to get back a List of highlighting results.

-Jay
http://www.lucidimagination.com


On Thu, Sep 10, 2009 at 5:40 PM, Paul Tomblin  wrote:

> If I set snippets to 9 and "mergeContinuous" to true, will I get
> the entire contents of the field with all the search terms replaced?
> I don't see what good it would be just getting one line out of the
> whole field as a snippet.
>
> On Thu, Sep 10, 2009 at 7:45 PM, Jay Hill  wrote:
> > Set up the query like this to highlight a field named "content":
> >
> >SolrQuery query = new SolrQuery();
> >query.setQuery("foo");
> >
> >query.setHighlight(true).setHighlightSnippets(1); //set other params
> as
> > needed
> >query.setParam("hl.fl", "content");
> >
> >QueryResponse queryResponse =getSolrServer().query(query);
> >
> > Then to get back the highlight results you need something like this:
> >
> >    Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
> >
> >while (iter.hasNext()) {
> >  SolrDocument resultDoc = iter.next();
> >
> >      String content = (String) resultDoc.getFieldValue("content");
> >  String id = (String) resultDoc.getFieldValue("id"); //id is the
> > uniqueKey field
> >
> >  if (queryResponse.getHighlighting().get(id) != null) {
> >        List<String> highlightSnippets =
> > queryResponse.getHighlighting().get(id).get("content");
> >  }
> >}
> >
> > Hope that gets you what you need.
> >
> > -Jay
> > http://www.lucidimagination.com
> >
> > On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin 
> wrote:
> >
> >> Can somebody point me to some sample code for using highlighting in
> >> SolrJ?  I understand the highlighted versions of the field comes in a
> >> separate NamedList?  How does that work?
> >>
> >> --
> >> http://www.linkedin.com/in/paultomblin
> >>
> >
>
>
>
> --
> http://www.linkedin.com/in/paultomblin
>


Re: Geographic clustering

2009-09-11 Thread gwk

Hi all,

I've just got my geographic clustering component working (somewhat). 
I've attached a sample resultset to this mail. It seems to work pretty 
well and it's pretty fast. I have one issue I need help with concerning 
the API though. At the moment my Hilbert field is a Sortable Integer, 
and I do the following call to get the count for a specific cluster:


Query rangeQ = new TermRangeQuery("geo_hilbert", lowI, highI, true, true); // inclusive bounds over the Hilbert field
searcher.numDocs(rangeQ, docs); // count of range matches restricted to the DocSet

But I'd like to further reduce the DocSet by the longitude and 
latitude bounds given in the geocluster arguments (swlat, swlng, nelat 
and nelng), but only for the purposes of clustering. I don't want to 
have to add fq arguments to the query, as I want my non-geocluster 
results (like facet counts and numFound) to be unaffected by the 
selected range. So how would I achieve the effect of filter queries 
(including the awesome caching) by manipulating either rangeQ or 
docs? And since the snippet above is called multiple times with 
different rangeQ but the same (filtered) DocSet, I guess manipulating 
docs would be faster (I think).


Regards,

gwk

gwk wrote:

Hi Joe,

Thanks for the link, I'll check it out. I'm not sure it'll help in my 
situation though, since the clustering should happen at runtime due to 
faceted browsing (unless I'm mistaken about what the preprocessing does).


More on my progress though, I thought some more about using Hilbert 
curve mapping and it seems really suited for what I want. I've just 
added a Hilbert field to my schema (Trie Integer field) with latitude 
and longitude at 15bits precision (didn't use 16 bits to avoid the 
sign bit) so I have a 30 bit number in said field. Getting facet 
counts for 0 to (2^30 - 1) should get me the entire map while getting 
counts for 0 to (2^28 - 1), 2^28 to (2^29 - 1), 2^29 to (2^29 + 2^28 - 
1) and (2^29 + 2^28) to (2^30 - 1) should give me counts for four 
equal quadrants, all the way down to 0 to 3, 4 to 7, 8 to 11 ... 
(2^30 - 4 to 2^30 - 1) and of course faceting on every separate term. 
Of course, if you're zoomed in far enough to need such fine 
grained clustering you'll be looking at a small portion of the map and 
only a part of the whole range should be counted, but that should be 
doable by calculating the Hilbert number for the lower and upper bounds.
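
As a quick illustration of the range arithmetic (a sketch, not gwk's actual
code): at quadtree depth d, each cell covers a contiguous block of
2^(30 - 2d) consecutive Hilbert values, so the bounds for cell i are:

    // Inclusive range of 30-bit Hilbert values covered by cell i at depth d
    // (15 bits per axis; depth 0 = whole map, depth 15 = a single term).
    static long[] hilbertCellRange(int i, int d) {
        long width = 1L << (30 - 2 * d); // values per cell at this depth
        long low = (long) i * width;
        return new long[] { low, low + width - 1 };
    }

For d = 0 this gives [0, 2^30 - 1] (the whole map); for d = 1 it gives
exactly the four quadrant ranges listed above.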


The only problem is the location of the clusters, if I use this method 
I'll only have the Hilbert number and the number of items in that part 
of the, what is essentially a quadtree. But I suppose I can calculate 
the facet counts for one precision finer than the requested precision 
and use a weighted average of the four parts of the cluster, I'll have 
to see if that is accurate enough.


Hopefully I'll have the time to complete this today or tomorrow. I'll 
report back if it has worked.


Regards,

gwk

Joe Calderon wrote:

there are clustering libraries like
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/, that have
bindings to perl/python, you can preprocess your results and create
clusters for each zoom level

On Tue, Sep 8, 2009 at 8:08 AM, gwk wrote:
 

Hi,

I just completed a simple proof-of-concept clusterer component which
naively clusters with a specified bounding box around each position,
similar to what the javascript MarkerClusterer does. It's currently very
slow as I loop over the entire docset and request the longitude and
latitude of each document (not to mention that my unfamiliarity with
Lucene/Solr isn't helping the implementation's performance any; most code
is copied from grep-ing the solr source). Clustering a set of about
80.000 documents takes about 5-6 seconds. I'm currently looking into
storing the hilbert curve mapping in Solr and clustering using facet
counts on numerical ranges of that mapping but I'm not sure it will 
pan out.


Regards,

gwk

Grant Ingersoll wrote:
   

Not directly related to geo clustering, but
http://issues.apache.org/jira/browse/SOLR-769 is all about a pluggable
interface to clustering implementations.  It currently has Carrot2
implemented, but the APIs are marked as experimental.  I would definitely be
interested in hearing your experience with implementing your clustering
algorithm in it.

-Grant

On Sep 8, 2009, at 4:00 AM, gwk wrote:

 

Hi,

I'm working on a search-on-map interface for our website. I've 
created a

little proof of concept which uses the MarkerClusterer
(http://code.google.com/p/gmaps-utility-library-dev/) which 
clusters the
markers nicely. But because sending tens of thousands of markers 
over Ajax

is not quite as fast as I would like it to be, I'd prefer to do the
clustering on the server side. I've considered a few options like 
storing
the morton-order and throwing away precision to cluster, assigning 
all
locations to a grid position. Or simply cluster based on 
country/region/city
depending on zoom level by adding latitude on longitude fields for 
each zoom
level (so that for smaller cou

Automatically calculate boost factor

2009-09-11 Thread Villemos, Gert
I would like to automatically calculate the boost factor of a document
based on the values of other fields. For example, given fields with the
values:

1.2
1.5
0.8

Document boost = 1.2*1.5*0.8

Is it possible to get Solr to calculate the boost automatically upon
submission based on field values?
 
Cheers,
Gert.





Re: Default Query Type For Facet Queries

2009-09-11 Thread Stephen Duncan Jr
I haven't experienced any such problems; it's just a query-parser plugin
that adds some behavior on top of the normal query parsing.  In any case,
even if I use a custom request handler with my custom parser, can I get
facet-queries to use this custom parser by default as well?

-Stephen

On Thu, Sep 10, 2009 at 11:30 PM, Lance Norskog  wrote:

> Changing basic defaults like this makes it very confusing to work with
> successive solr releases, to read the wiki, etc.
>
> You can make custom search requesthandlers - an example:
>
> (requestHandler XML stripped in the archive; it registered "customparser" as the default parser)
>
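
A sketch of what that handler entry plausibly looked like in solrconfig.xml
(the names are illustrative; only the "customparser" value is from Lance's
mail):

    <requestHandler name="/custom" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">customparser</str>
      </lst>
    </requestHandler>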
> http://localhost:8983/solr/custom?q=string_in_my_custom_language
>
> On 9/10/09, Stephen Duncan Jr  wrote:
> > If using {!type=customparser} is the only way now, should I file an issue
> to
> > make the default configurable?
> >
> > --
> > Stephen Duncan Jr
> > www.stephenduncanjr.com
> >
> > On Thu, Sep 3, 2009 at 11:23 AM, Stephen Duncan Jr <
> stephen.dun...@gmail.com
> > > wrote:
> >
> > > We have a custom query parser plugin registered as the default for
> > > searches, and we'd like to have the same parser used for facet.query.
> > >
> > > Is there a way to register it as the default for FacetComponent in
> > > solrconfig.xml?
> > >
> > > I know I can add {!type=customparser} to each query as a workaround,
> but
> > > I'd rather register it in the config that make my code send that and
> strip
> > > it off on every facet query.
> > >
> > > --
> > > Stephen Duncan Jr
> > > www.stephenduncanjr.com
> > >
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Very slow first query

2009-09-11 Thread Jonathan Ariel
yes of course. but in my case I'm not using filter queries nor facets.
it is a really simple query. actually the query params are like this:
?q=location_country:1 AND category:377 AND location_state:"CA" and
location_city:"Sacramento"

location_country is an integer
category is an integer
location_state is a string
and location_city is a string

as you can see no filter query and no facets. and for this query, the first
time that I execute it, it takes almost 50s to run, while for the following
query:

?q=title_search:test

title_search is a tokenized text field with a bunch of filters

it takes a couple of ms

I'm always talking about executing these queries the first time after
restarting solr.

I just want to understand the cause and be sure I won't have this behaviour
every time I commit or optimize.

Jonathan

On Fri, Sep 11, 2009 at 7:28 AM, Uri Boness  wrote:

> "Not having any facet" and "Not using a filter cache" are two different
> things. If you're not using query filters, you can still have facet
> calculated and returned as part of the search result. The facet component
> uses lucene's field cache to retrieve values for the facet field.
>
>
> Jonathan Ariel wrote:
>
>> Yes, but in this case the query that I'm executing doesn't have any
>> facet. I mean for this query I'm not using any filter cache. What
>> does it mean, "operating system cache can be significant"? That my
>> first query uploads a big chunk of the index into memory (maybe even
>> the entire index)?
>>
>> On Thu, Sep 10, 2009 at 10:07 PM, Yonik Seeley
>> wrote:
>>
>>
>>
>>> At 12M documents, operating system cache can be significant.
>>> Also, the first time you sort or facet on a field, a field cache
>>> instance is populated which can take a lot of time.  You can prevent
>>> slow first queries by configuring a static warming query in
>>> solrconfig.xml that includes the common sorts and facets.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>> On Thu, Sep 10, 2009 at 8:55 PM, Jonathan Ariel 
>>> wrote:
>>>
>>>
 Hi! Why would it take for the first query that I execute almost 60
 seconds to run and after that no more than 50ms? I disabled all my
 caching to check if it is the reason for the subsequent fast
 responses, but the same happens.
 I'm using solr 1.3.
 Something really strange is that it doesn't happen with all the
 queries. It is happening with a query that filters some integer and
 string fields joined by an AND operator. Something like A:1 AND B:2
 AND (C:3 AND D:"CA") (exact match).
 My index is around 1200M documents.

 Thanks,

 Jonathan



>>>
>>
>>
>


Re: Query runs faster without filter queries?

2009-09-11 Thread Jonathan Ariel
Oh... another question regarding this. If I disable the query cache and
document cache and execute a query that doesn't touch the filter cache (no
facets, no filter queries, etc.), why does the first execution take around
400ms and the second around 10ms? It seems like it should always take the
same amount of time, right?

On Thu, Sep 10, 2009 at 9:44 PM, Jonathan Ariel  wrote:

> Thanks! I don't think I can use an unreleased version of solr even if it's
> stable enough (crazy infrastructure guys), but I might be able to apply the
> 2 patches mentioned in the link you sent. I will try it in my local copy of
> solr, see if it improves things, and let you know.
> Thanks!
>
>
> On Thu, Sep 10, 2009 at 5:43 PM, Uri Boness  wrote:
>
>> If I recall correctly, in solr 1.3 there was an issue where filters didn't
>> really behave as they should have. Basically, if you had a query and
>> filters defined, the query would execute normally and only after that
>> would the filters be applied. AFAIK this is fixed in 1.4, where documents
>> ruled out by the filters are now skipped during the query execution.
>>
>> Uri
>>
>>
>> Jonathan Ariel wrote:
>>
>>> Hi all!
>>> I'm trying to measure the query response time when using just a query
>>> and when using some filter queries. From what I read and understand,
>>> adding filter queries should improve the query response time. I used
>>> Luke to work out which fields I should use as filter queries (those
>>> that have few unique terms, in my case 2 fields with 30 and 400 unique
>>> terms). I'm using solr 1.3.
>>> In order to test the query performance I disabled queryCache and
>>> documentCache, so I just have filterCache enabled. I did that because I
>>> wanted to be sure that there is no caching when I measure my queries. I
>>> left filterCache enabled because filter queries use it.
>>>
>>> When I first execute my query without filter queries it runs in 400ms;
>>> the next execution of the same query takes around 20ms.
>>> When I first execute my query with filter queries it runs in 500ms; the
>>> next execution of the same query takes around 50ms.
>>>
>>> Why does the query with filter queries run slower than the query
>>> without them? Shouldn't it be the other way around?
>>>
>>> My index is around 12M documents. My filterCache max size is set to 4
>>> (I think more than enough). The fields that I use as filter queries are
>>> integers, and in my query I search over a tokenized text field.
>>>
>>> What do you think?
>>>
>>> Thanks a lot,
>>>
>>> Jonathan
>>>
>>>
>>>
>>
>
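
To make the comparison concrete, the two variants being measured differ
only in where the restricting clauses live; a sketch with hypothetical
field values:

  Everything in the main query:
    q=title_search:test AND location_country:1 AND category:377

  Restricting clauses moved to filter queries (each fq is cached
  independently in the filterCache):
    q=title_search:test&fq=location_country:1&fq=category:377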


Re: Solr fitting in travel site context?

2009-09-11 Thread Carsten Kraus
Cool, thanks a lot for sharing your experience and thoughts!
I will run a test like you suggested.

However, I've got some questions. The facet list I would retrieve for step 1
would 'only' contain the field values for what I faceted on, such as the
hotel ID, right?
How do I then retrieve the hotel name, description, price, etc.? By issuing
further Solr requests for each facet-list item?

Would I be able to sort and paginate over this list?

You see, to end users I would need to show a hotel list sorted by e.g.
price (with only one search hit per hotel), showing for each hotel the
cheapest price fitting e.g. the user-specified date range.

Would I still go the facet way with these requirements?

Thanks again!
Carsten


On Thu, Sep 10, 2009 at 9:07 AM, Constantijn Visinescu
wrote:

> I'd look into faceting and run a test.
>
> Create a schema, index the data, and then run a query for *:* faceted by
> hotel to get a list of all the hotels you want, followed by a query that
> returns all documents matching that hotel for your 2nd use case.
>
> You're probably still going to want a SQL database to catch the
> reservations made, though.
>
> In my experience implementing Solr is more work than implementing a normal
> SQL database, and losing the relational part of a relational database is
> something you have to wrap your head around to see how it affects your
> application.
>
> That said, Solr on my 4 year old single core laptop outperforms our new
> dual Xeon database server running IBM DB2 when it comes to running a query
> on a 10 million record dataset and returning the total number of documents
> that match.
>
> Once you get it up and running properly and you need queries like "give me
> the total number of documents that match these criteria, optionally faceted
> by this and that", it's amazingly fast.
>
> Note that this advantage only becomes apparent when dealing with large
> data sets. Anything under a couple hundred thousand records (a guideline;
> it depends heavily on the type of record) and a normal SQL server should
> also be able to give you the results you need near instantly.
>
> Hope this helps ;)
>
>
> On Wed, Sep 9, 2009 at 5:33 PM, Carsten Kraus  >wrote:
>
> > Hi all,
> >
> > I'm about to develop a travel website and am wondering if Solr might fit
> > as the search solution.
> > Being quite the opposite of a db guru and new to Solr, it's hard for me
> > to judge if for my use-case a relational db should be used in favor of
> > Solr (or a similar indexing server). Maybe some of you guys would share
> > your opinion on this?
> >
> > The products being searched for would be travel packages. That is: hotel
> > room + flight combined into one product.
> > I receive the products via a csv file, where each line defines a travel
> > package with concrete departure/return, accommodation and price data.
> >
> > For example one csv row might represent:
> > Hotel Foo in Paris, flight departing 10/10/09 from London, ending
> > 10/20/09, mealplan Bar, pricing $300
> > ..while another one might look like:
> > Hotel Foo in Paris, flight departing 10/10/09 from Amsterdam, ending
> > 10/30/09, mealplan Eggs :), pricing $400
> >
> > Now searches should show results in 2 steps: the first step showing
> > results grouped by hotel (so no hotel appears twice) and the second all
> > date-airport-mealplan combinations for the hotel selected by the user in
> > step 1.
> >
> > From some first little tests, it seems to me as if I would at least need
> > the collapse patch (SOLR-236) for step 1 above?!
> >
> > What do you think? Does Solr fit into this scenario? Thoughts?
> >
> > Sorry for the lengthy post & thanks a lot for any pointer!
> > Carsten
> >
>
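
A sketch of the two-step request flow Constantijn suggests (the field names
hotel_id and price are hypothetical):

  Step 1 - one row per hotel, with pagination over the facet values:
    q=*:*&rows=0&facet=true&facet.field=hotel_id&facet.limit=20&facet.offset=0

  Step 2 - every package for the hotel the user picked, cheapest first:
    q=*:*&fq=hotel_id:hotel_foo_paris&sort=price asc

Note that step 1 returns only the values of the faceted field (plus
counts), so hotel names, descriptions, and prices for the list view do
indeed require follow-up requests like step 2, one per displayed hotel.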


Re : Re : Re : Indexing fields dynamically

2009-09-11 Thread nourredine khadri
OK, I'll try the transformer (JavaScript needs JDK 1.6, I think).

Thanks again.




Noble Paul wrote : 
>
>If you use DIH for indexing writing a transformer is the simplest
>thing. You can even write it in javascript
>


  

Re: Re : Re : Indexing fields dynamically

2009-09-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
If you use DIH for indexing, writing a transformer is the simplest
thing. You can even write it in JavaScript.

On Fri, Sep 11, 2009 at 1:13 PM, nourredine khadri
 wrote:
>
> The problem is that I don't control the field names. They can be anything
> (I want to leave the developers free on this).
> Where and how can I change the field names on the fly (to add "_i", for
> example) before indexing?
>
> Do I have to use a transformer? An UpdateRequestProcessor? ...
>
> Which API suits this? SolrInputDocument? The setField() method?
> removeField() + addField()? ...
>
> Thanks
>
>
>
>
> Lance Norskog wrote :
>>
>>In the schema.xml file, "*_i" is defined as a wildcard type for integer.
>>If a name-value pair is an integer, use: name_i as the field name.
>
>
>
> On 9/10/09, nourredine khadri  wrote:
>>
>> Thanks for the quick reply.
>>
>> OK for dynamicFields, but how can I rename fields during indexing/search
>> to add a suffix corresponding to the type?
>>
>> What is the best way to do this?
>>
>> Nourredine.
>>
>>
>>
>>
>> 
>> De : Yonik Seeley 
>> À : solr-user@lucene.apache.org
>> Envoyé le : Jeudi, 10 Septembre 2009, 14h24mn 26s
>> Objet : Re: Indexing fields dynamically
>>
>> On Thu, Sep 10, 2009 at 5:58 AM, nourredine khadri
>>  wrote:
>> > I want to index my fields dynamically.
>> >
>> > DynamicFields don't suit my need because I don't know field names in
>> > advance, and field types must be set dynamically too (I need strong
>> > typing).
>>
>> This is what dynamic fields are meant for - you pick both the name and
>> type (from a pre-defined set of types of course) at runtime.  The
>> suffix of the field name matches one of the dynamic fields and
>> essentially picks the type.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>>
>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com
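
For illustration, a DIH script transformer along the lines Noble suggests
could look like the sketch below in data-config.xml (the entity query and
the _i/_s dynamic-field suffixes are assumptions; DIH's script support
needs Java 6):

  <dataConfig>
    <script><![CDATA[
      function addTypeSuffix(row) {
        // snapshot the keys first, since we add to the row while looping
        var names = row.keySet().toArray();
        for (var i = 0; i < names.length; i++) {
          var name  = names[i];
          var value = row.get(name);
          // copy each arbitrary column into a typed dynamic field
          if (value instanceof java.lang.Integer) {
            row.put(name + '_i', value);
          } else {
            row.put(name + '_s', value);
          }
        }
        return row;
      }
    ]]></script>
    <document>
      <entity name="item" transformer="script:addTypeSuffix"
              query="select * from item">
      </entity>
    </document>
  </dataConfig>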


Re: OutOfMemory error on solr 1.3

2009-09-11 Thread Constantijn Visinescu
1.5 GB already seems like quite a bit, but adding more just might solve it.
Try something like 3 GB (if your machine supports it) and see if that
helps.

If 3 GB still doesn't cut it, then the problem is most likely somewhere else
and I'd suggest looking at the application with a memory profiler to see if
something is wrong.

I see you're running JRockit... does that mean you also have Mission
Control? I'm making assumptions here, but I believe one of Mission
Control's features is that it lets you monitor the application; maybe that
can tell you something useful. Otherwise grab a real memory profiler to
find out where all the memory is going.

On Thu, Sep 10, 2009 at 7:44 PM, Francis Yakin  wrote:

> So, do you think increasing the JVM heap will help? We also have a value
> of 500 in solrconfig.xml; originally it was set to 200.
>
> Currently we give solr 1.5GB for Xms and Xmx, we use jrockit version
> 1.5.0_15
>
> 4 S root 12543 12495 16  76   0 - 848974 184466 Jul20 ?
> 8-11:12:03 /opt/bea/jrmc-3.0.3-1.5.0/bin/java -Xms1536m -Xmx1536m -Xns:128m
> -Xgc:gencon -Djavelin.jsp.el.elcache=4096
> -Dsolr.solr.home=/opt/apache-solr-1.3.0/example/solr
>
> Francis
>
> -Original Message-
> From: Constantijn Visinescu [mailto:baeli...@gmail.com]
> Sent: Wednesday, September 09, 2009 11:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: OutOfMemory error on solr 1.3
>
> Just wondering, how much memory are you giving your JVM ?
>
> On Thu, Sep 10, 2009 at 7:46 AM, Francis Yakin  wrote:
>
> >
> > I am having an OutOfMemory error on our slave servers. I would like to
> > know if someone has had the same issue and has a solution for it.
> >
> > SEVERE: Error during auto-warming of
> > key:org.apache.solr.search.queryresult...@96cd2ffc
> :java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> > SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
> size:
> > 441216, Num elements: 55150
> > SEVERE: Error during auto-warming of
> > key:org.apache.solr.search.queryresult...@519116e0
> :java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> > SEVERE: Error during auto-warming of
> > key:org.apache.solr.search.queryresult...@74dc52fa
> :java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> > SEVERE: Error during auto-warming of
> > key:org.apache.solr.search.queryresult...@d0dd3e28
> :java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> > SEVERE: Error during auto-warming of
> > key:org.apache.solr.search.queryresult...@b6dfa5bc
> :java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> > SEVERE: Error during auto-warming of
> > key:org.apache.solr.search.queryresult...@482b13ef
> :java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> > SEVERE: Error during auto-warming of
> > key:org.apache.solr.search.queryresult...@2309438c
> :java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> > SEVERE: Error during auto-warming of
> > key:org.apache.solr.search.queryresult...@277bd48c
> :java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> > Exception in thread "[ACTIVE] ExecuteThread: '7' for queue:
> > 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> > Exception in thread "[ACTIVE] ExecuteThread: '8' for queue:
> > 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> > Exception in thread "[ACTIVE] ExecuteThread: '10' for queue:
> > 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> > Exception in thread "[ACTIVE] ExecuteThread: '11' for queue:
> > 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> > SEVERE: Error during auto-warming of
> > key:org.apache.solr.search.queryresult...@41405463
> :java.lang.OutOfMemoryError:
> > allocLargeObjectOrArray - Object size: 751552, Num elements: 187884
> >  java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
> > Num elements: 8192
> > java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208,
> > Num elements: 8192
> > java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5096,
> > Num elements: 2539
> > java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5400,
> > Num elements: 2690
> >
> >  > deployment service message for request id "-1" from server "AdminServer".
> > Exception is: "java.lang.OutOfMemoryError: allocLargeObjectOrArray -
> Object
> > size: 4368, Num elemen
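
The errors above all occur while the queryResultCache is auto-warming. One
mitigation worth trying alongside a bigger heap (a sketch, not from this
thread; tune the numbers to your setup) is to shrink that cache's size and
autowarmCount in solrconfig.xml, so less is copied into each new searcher:

  <queryResultCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>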

Re: Very slow first query

2009-09-11 Thread Uri Boness
"Not having any facet" and "Not using a filter cache" are two different 
things. If you're not using query filters, you can still have facet 
calculated and returned as part of the search result. The facet 
component uses lucene's field cache to retrieve values for the facet field.


Jonathan Ariel wrote:

Yes, but in this case the query that I'm executing doesn't have any facets.
I mean, for this query I'm not using any filter cache. What does "operating
system cache can be significant" mean? That my first query loads a big
chunk of the index into memory (maybe even the entire index)?

On Thu, Sep 10, 2009 at 10:07 PM, Yonik Seeley
wrote:

  

At 12M documents, operating system cache can be significant.
Also, the first time you sort or facet on a field, a field cache
instance is populated which can take a lot of time.  You can prevent
slow first queries by configuring a static warming query in
solrconfig.xml that includes the common sorts and facets.

-Yonik
http://www.lucidimagination.com

On Thu, Sep 10, 2009 at 8:55 PM, Jonathan Ariel 
wrote:


Hi! Why would it take for the first query that I execute almost 60 seconds
to run and after that no more than 50ms? I disabled all my caching to check
if it is the reason for the subsequent fast responses, but the same
happens. I'm using solr 1.3.
Something really strange is that it doesn't happen with all the queries. It
is happening with a query that filters some integer and string fields
joined by an AND operator. Something like A:1 AND B:2 AND (C:3 AND D:"CA")
(exact match).
My index is around 12M documents.

Thanks,

Jonathan


Re: Trouble Indexing HTML Files

2009-09-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hey, XPathEntityProcessor does not work with a wildcard xpath like
'//a...@class'.

If you just wish to index HTML, use a PlainTextEntityProcessor with an
HTMLStripTransformer.
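
A minimal data-config.xml sketch of that combination, reusing the paths
from the message below (the dataSource name and the target schema field
"content" are assumptions):

  <dataConfig>
    <dataSource type="FileDataSource" name="fds"/>
    <document>
      <entity name="file" processor="FileListEntityProcessor"
              baseDir="exampledocs/dylan" fileName=".*htm"
              recursive="true" rootEntity="false" dataSource="null">
        <entity name="page" processor="PlainTextEntityProcessor"
                url="${file.fileAbsolutePath}" dataSource="fds"
                transformer="HTMLStripTransformer">
          <!-- PlainTextEntityProcessor puts the whole file into the
               implicit "plainText" column -->
          <field column="plainText" name="content" stripHTML="true"/>
        </entity>
      </entity>
    </document>
  </dataConfig>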

On Fri, Sep 11, 2009 at 1:22 AM, Daniel Cohen
 wrote:
> Hi there,
>
> I'm trying to get the DataImportHandler working to recursively parse the
> content of a root directory, which contains several other directories
> beneath it... The indexing seems to encounter errors with the doctype tag
> in my source files.
>
> I've provided my schema.xml with the appropriate fields, and I've added the
> dataimport requestHandler to the solrconfig.xml. Does anyone know what I am
> doing wrong, or perhaps a better way to attempt this?
>
> dataconfig.xml:
> <dataConfig>
>   <document>
>     <entity name="file" processor="FileListEntityProcessor"
>             baseDir="exampledocs/dylan"
>             fileName=".*htm"
>             recursive="true"
>             rootEntity="false"
>             dataSource="null">
>       <entity processor="XPathEntityProcessor"
>               forEach="/html"
>               transformer="HTMLStripTransformer"
>               url="${file.fileAbsolutePath}">
>         <!-- field definitions did not survive the archive -->
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
>
> Stack trace:
>
> java.lang.RuntimeException: com.ctc.wstx.exc.WstxIOException: Server returned
> HTTP response code: 503 for URL:
> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
>  at
> org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
> at
> org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:226)
>  at
> org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:180)
> at
> org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:163)
>  at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:311)
>  at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>  at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>  at
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
> Caused by: com.ctc.wstx.exc.WstxIOException: Server returned HTTP response
> code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
>  at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
> at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
>  at
> org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:141)
> at
> org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$000(XPathRecordReader.java:89)
>  at
> org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:82)
> ... 10 more
> Caused by: java.io.IOException: Server returned HTTP response code: 503 for
> URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
>  at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1170)
> at java.net.URL.openStream(URL.java:1007)
>  at com.ctc.wstx.util.URLUtil.optimizedStreamFromURL(URLUtil.java:113)
> at
> com.ctc.wstx.io.DefaultInputResolver.sourceFromURL(DefaultInputResolver.java:256)
>  at
> com.ctc.wstx.io.DefaultInputResolver.resolveEntity(DefaultInputResolver.java:96)
> at
> com.ctc.wstx.sr.ValidatingStreamReader.findDtdExtSubset(ValidatingStreamReader.java:468)
>  at
> com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:358)
> at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3351)
>  at
> com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:1988)
> at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
>  ... 13 more
>
>
> Sample .htm file:
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
>
> <head>
> <title>Hazel</title>
> </head>
>
> <body>
>
> <h1>Hazel</h1>
>
>
> Words and music Bob Dylan
> Released on Planet Waves
> (1974)
> Tabbed by Eyolf Østrem
>
> The song could equally well be played with C chords and a capo on the
> 4th fret. Such a version is appended at the end.
>
> The intro is played rather freely (which is a nice way of saying that
> they aren't exactly tight...) – and with both a bass and a guitar. The
> tab below is just a suggestion of an approximation.
>
> 
>
> 
>  E                B                A  E/g# F#m E
> |7---|2---||
> |9-9---9-|4-4---4-|2---0---|
> |9---9---9---|4---4---4---|2---1---2---1---|
> |9---|4---|2---2---4---2---|
> |7---|2---|0---4---2---|
> |||4---2---0---|
> 
>
> 
> E      G#
> Hazel, dirty-blonde hair
> A                           F#7
> I wouldn't be ashamed to be seen with you anywhere.
> E          

Re: Re : Re : Re : Re : Pb using delta import with XPathEntityProcessor

2009-09-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
Thanks for reporting the issue.

On Fri, Sep 11, 2009 at 2:54 PM, nourredine khadri
 wrote:
> Great! It works!
>
> Thanks Paul. I appreciate your responsiveness.
>
>
>
> nourredine khadri wrote :
>>
>
>>Thanks! I'll test it ASAP!
>>
>
>
>
>>Noble Paul wrote :
>>>
>>>https://issues.apache.org/jira/browse/SOLR-1421
>>>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re : Re : Re : Re : Pb using delta import with XPathEntityProcessor

2009-09-11 Thread nourredine khadri
Great! It works!

Thanks Paul. I appreciate your responsiveness.



nourredine khadri wrote : 
>

>Thanks! I'll test it ASAP!
>



>Noble Paul wrote : 
>>
>>https://issues.apache.org/jira/browse/SOLR-1421
>>


  

Re: Random Display of result in solr(Can anyone please answer to this post )

2009-09-11 Thread dharhsana

Hi Team,

Can anyone please answer this post?

I have an issue while I am working with Solr.

I am working on a blog module, where a user creates blogs, posts to them,
and can have several comments per post. For implementing this module I am
using Solr 1.4.

When I get the blog details of a particular user, the results come back in
a random order. For example, passing the same blogId in the query on two
different runs can return the documents in different orders:

This is the first result

SolrDocument1{blogTitle=New Blog, blogId=New Blog, userId=1}] 
SolrDocument2{blogId=New Blog, postId=New Post, postTitle=New Post,
postMessage=New Post Message, timestamp_post=Fri Sep 11 09:48:24 IST 2009}] 
SolrDocument3{blogTitle=ammu blog, blogId=ammu blog, userId=1}] 

The Second result 
SolrDocument1{blogTitle=New Blog, blogId=New Blog, userId=1}] 
SolrDocument2{blogTitle=ammu blog, blogId=ammu blog, userId=1}] 
SolrDocument3{blogId=New Blog, postId=New Post, postTitle=New Post,
postMessage=New Post Message, timestamp_post=Fri Sep 11 09:48:24 IST 2009}] 

I am using SolrJ; when I iterate the list I sometimes get an
ArrayIndexOutOfBoundsException because of this difference in the results.

When I run my code again at another time it produces the proper result, so
the list keeps changing.

If anybody has faced this type of problem, please share it with me.

Also, I am not able to get an exact match: when I fetch the blog details of
a particular user I pass the blog title, e.g. "rekha blog", but it does not
return only "rekha blog"; it also returns other blogs that end with "blog"
(e.g. "sandhya blog").

What should I do about this? Is there a specific query I should use? I am
using SolrJ; how should I build my query to get exactly the result I want?

Waiting for your reply 

Regards, 

Rekha. 




-- 
View this message in context: 
http://www.nabble.com/Random-Display-of-result-in-solr-tp25395746p25397538.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Facet fields and the DisMax query handler

2009-09-11 Thread Villemos, Gert
Thanks. Maybe I'm misusing the dismax request handler, but the ability to 
search all fields is just too good a feature.

I found the following description of how to do faceted queries with 
dismax. I have not tried it yet but will.

http://fisk.stjernesludd.net/archives/16-Solr-Using-the-dismax-Query-Handler-and-Still-Limit-a-Specific-Field.html
 

Cheers,
Gert.





-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Freitag, 11. September 2009 04:56
To: solr-user@lucene.apache.org
Subject: Re: Facet fields and the DisMax query handler

Facets are not involved here. These are only simple searches.

The DisMax parser does not use field names in the query. DisMax creates a
nice simple syntax for people to type into a web browser search field. The
various parameters let you sculpt the relevance in order to tune the user
experience.

There are ways to intermix dismax parsing in the standard query parser
syntax, but I am no expert. You can also use these field queries as filter
queries; this is a hack but does work. Also, using wildcards interferes with
upper/lower case handling.
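
A sketch of the filter-query workaround Lance mentions, keeping dismax for
the user's text while the fq is parsed by the standard parser (handler
setup and field values follow Gert's example, otherwise hypothetical):

  /select?defType=dismax&q=john+doe&fq=Staff:"John Doe"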

On 9/10/09, Villemos, Gert  wrote:
>
> I'm trying to understand the DisMax query handler. I originally
> configured it to ensure that the query was mapped onto different fields
> in the documents and a boost assigned if the fields match. And that
> works pretty smoothly.
>
> However, when it comes to facetted searches the results perplex me.
> Consider the following example:
>
> Document A:
>    <Staff>John Doe</Staff>
>
> Document B:
>    <ProjectManager>John Doe</ProjectManager>
>
> The following queries do not return anything:
>    Staff:Doe
>    Staff:Doe*
>    Staff:John
>    Staff:John*
>
> The query:
>    Staff:"John"
>
> returns Documents A and B, even though document B doesn't even contain the
> field 'Staff' (which is optional)! Through the "qf" field dismax has
> been configured to search over the field 'ProjectManager', but I expected
> the usage of a facet value to exclude that field... Looking at the
> scores of the documents, document A does score much higher than Document
> B (a factor of 20), but I would expect not to see B at all. I have changed
> the dismax minimum-match configuration to 1, to ensure that all hits with
> a single match are returned, without effect. I have changed the tie
> to 0 with no effect.
>
> What am I missing here? I would like queries such as 'Staff:Doe' to
> return document A, and only A.
>
> Cheers,
> Gert.
>
>
>
>
>


-- 
Lance Norskog
goks...@gmail.com





Re: Trouble Indexing HTML Files

2009-09-11 Thread Shalin Shekhar Mangar
On Fri, Sep 11, 2009 at 1:22 AM, Daniel Cohen <
daniel.michael.co...@gmail.com> wrote:

> Hi there,
>
> I'm trying to get the DataImportHandler working to recursively parse the
> content of a root directory, which contains several other directories
> beneath it. The indexing seems to encounter errors with the doctype tag
> in my source files.
>
> Stack trace:
>
> java.lang.RuntimeException: com.ctc.wstx.exc.WstxIOException: Server
> returned HTTP response code: 503 for URL:
> http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
>  at
>
> org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
> at
>
> org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:226)
>  at
>

In trunk, DataImportHandler ignores DTDs [1]. If you are using Solr 1.3, then
unfortunately there is no workaround except removing the DTD declarations
from the files before indexing through DIH.

[1] - See https://issues.apache.org/jira/browse/SOLR-964

-- 
Regards,
Shalin Shekhar Mangar.


Re : Re : Re : Re : Pb using delta import with XPathEntityProcessor

2009-09-11 Thread nourredine khadri
Thanks! I'll test it ASAP!




Noble Paul wrote : 
>
>https://issues.apache.org/jira/browse/SOLR-1421
>



  

Re: What Tokenizerfactory/TokenFilterFactory can/should I use so a search for "wal mart" matches "walmart"(quotes not included in search or index)?

2009-09-11 Thread AHMET ARSLAN
> There are a lot of company names that
> people are uncertain as to the correct spelling. A few of
> examples are:
> 1. best buy, bestbuy
> 2. walmart, wal mart, wal-mart
> 3. Holiday Inn, HolidayInn
> 
> What Tokenizer Factory and/or TokenFilterFactory should I
> use so that somebody typing "wal mart"(quotes not included)
> will find "wal mart" and "walmart"(again, quotes not
> included)

I faced a similar requirement before. I solved it by hardcoding those names 
into synonyms_index.txt and using SynonymFilterFactory at index time.

synonyms_index.txt will contain:

best buy, bestbuy
walmart, wal mart
Holiday Inn, HolidayInn

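The analyzer configuration from the original message did not survive the
archive; a sketch of an index-time-only synonym setup like the one
described (the field type name and tokenizer choice are assumptions):

  <fieldType name="text_syn" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>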

Since the solr wiki [1] advises using index-time synonyms when dealing with 
multi-word synonyms, I am using index-time synonym expansion only.

[1] 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46

When working with StandardAnalyzer, wal-mart is broken into two tokens, wal 
and mart, so you don't need to write the hyphenated forms of the words in 
synonyms_index.txt.


If all of your examples were similar to HolidayInn, you could use 
solr.WordDelimiterFilterFactory (without writing all these company names to 
a file), but you can't handle "wal mart" vs "walmart" with it.
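
For the HolidayInn-style cases, the relevant WordDelimiterFilterFactory
options would be something like this (a sketch; combine with your other
filters):

  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" catenateWords="1" splitOnCaseChange="1"/>

splitOnCaseChange="1" splits HolidayInn into Holiday and Inn, and
catenateWords="1" also indexes wal-mart as walmart.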

Hope this helps.


  


Re : Re : Indexing fields dynamically

2009-09-11 Thread nourredine khadri

The problem is that I don't control the field names. They can be anything (I 
want to leave the developers free on this).
Where and how can I change the field names on the fly (to add "_i", for 
example) before indexing?

Do I have to use a transformer? An UpdateRequestProcessor? ...

Which API suits this? SolrInputDocument? The setField() method? 
removeField() + addField()? ...

Thanks




Lance Norskog wrote : 
>
>In the schema.xml file, "*_i" is defined as a wildcard type for integer.
>If a name-value pair is an integer, use: name_i as the field name.



On 9/10/09, nourredine khadri  wrote:
>
> Thanks for the quick reply.
>
> OK for dynamicFields, but how can I rename fields during indexing/search
> to add a suffix corresponding to the type?
>
> What is the best way to do this?
>
> Nourredine.
>
>
>
>
> 
> De : Yonik Seeley 
> À : solr-user@lucene.apache.org
> Envoyé le : Jeudi, 10 Septembre 2009, 14h24mn 26s
> Objet : Re: Indexing fields dynamically
>
> On Thu, Sep 10, 2009 at 5:58 AM, nourredine khadri
>  wrote:
> > I want to index my fields dynamically.
> >
> > DynamicFields don't suit my need because I don't know field names in
> > advance, and field types must be set dynamically too (I need strong
> > typing).
>
> This is what dynamic fields are meant for - you pick both the name and
> type (from a pre-defined set of types of course) at runtime.  The
> suffix of the field name matches one of the dynamic fields and
> essentially picks the type.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
>




-- 
Lance Norskog
goks...@gmail.com