Creating document schema at runtime

2007-12-11 Thread Shalin Shekhar Mangar
Hi,

I'm looking for some tips on how to create a new document schema and
add it to a Solr core at runtime. The use case I'm trying to solve
is:

1. Using a custom configuration tool, user creates a solr schema
2. The schema is added (uploaded) to a solr instance (on a remote machine).
3. Documents corresponding to the newly added schema are added to solr.

I understand that with SOLR-215 I can create a new core by specifying
the config and schema, but there is still no way for me to do this
from a remote machine using HTTP calls. If this capability does not
exist, I would be happy to open an issue in JIRA and contribute
patches.

Looking for your inputs.

-- 
Regards,
Shalin Shekhar Mangar.


Creating user-defined field types

2007-12-11 Thread Rishabh Joshi
Hi,

Can anyone guide me as to how to implement a user-defined field type
in Solr? I could not find anything on the Solr wiki. Help of any
kind would be appreciated.

Regards,
Rishabh


Facets - What's a better term for non technical people?

2007-12-11 Thread Benjamin O'Steen
 
Whilst many of the people on this list (myself included) have a pretty
good grasp of what is meant by the term "facet", it is not clear to
people who approach the system from a fresher point of view.

So, has anyone got a good example of the language they might use over,
say, a set of radio buttons and fields on a web form, to indicate that
selecting one or more of these would return facets? 'Show grouping by'
or 'List the sets that the results fall into', or something similar.

Ben


Re: Facets - What's a better term for non technical people?

2007-12-11 Thread Adrian Sutton

On 11/12/2007, at 8:32 PM, Benjamin O'Steen wrote:

So, has anyone got a good example of the language they might use over,
say, a set of radio buttons and fields on a web form, to indicate that
selecting one or more of these would return facets. 'Show grouping by'
or 'List the sets that the results fall into' or something similar.


"Filter by" is what I'd use, which is unfortunately already used in
Solr, though it is very much related, since the facet is generally added as a
filter query. Still, not close enough to reuse the same term.


Other things that are close but not quite right would be "groups" or
"categories". Maybe "Limit to", so facets would be "limiters". I think
"facet" is the right term, and what you need is to add "see also" type
entries under a bunch of these other terms.


Regards,

Adrian Sutton
http://www.symphonious.net



RE: Facets - What's a better term for non technical people?

2007-12-11 Thread DAVIGNON Andre - CETE NP/DIODé/PANDOC
Hi,

> So, has anyone got a good example of the language they might use over,
> say, a set of radio buttons and fields on a web form, to indicate that
> selecting one or more of these would return facets. 'Show grouping by'
> or 'List the sets that the results fall into' or something similar.

Here's what I found some time ago: 
http://www.searchtools.com/info/faceted-metadata.html

It has been quite useful to me.

André Davignon



Re: Replication hooks

2007-12-11 Thread Tracy Flynn

That's what I was after.

As always, thanks for the quick response.

Tracy

On Dec 11, 2007, at 12:18 AM, Yonik Seeley wrote:


On Dec 10, 2007 11:22 PM, climbingrose <[EMAIL PROTECTED]> wrote:
> I think there is an event listener interface for hooking into Solr events
> such as post commit, post optimise and open new searcher. I can't remember
> off the top of my head, but if you do a search for *EventListener in Eclipse,
> you'll find it.
> The Wiki shows how to trigger snapshooter after each commit and optimise.
> You should be able to follow this example to create your own listener.


Right... you shouldn't need to implement your own listeners, though.
Search for postCommit in the example solrconfig.xml.
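For reference, the relevant snippet in the example solrconfig.xml looks roughly like this (the exact paths and arguments vary by setup):

```xml
<!-- run the snapshooter script after every commit -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>
```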

-Yonik




How to effectively search inside fields that should be indexed without changing them.

2007-12-11 Thread Brian Carmalt

Hello all,

The titles of our docs have the form "ABC0001231-This is an important 
doc.pdf". I would like to be able to search for 'important', or '1231', 
or 'ABC000*', or 'This is an important doc' in the title field. I looked 
at the NGramTokenizer and tried to use it, but against the index it 
doesn't seem to work: I cannot get any hits. The analysis tool on the 
admin pages shows me that the ngram tokenizing works, by highlighting 
the matches between the indexed value and a query. I have set the min 
and max ngram size to 2 and 6, with side equal to left.

Can anyone recommend a procedure that will allow me to search as stated 
above?


I would also like to find out more about how to use the NGramTokenizer, 
but have found little in the way of documentation. Does anyone know of 
any good sources?

Thanks,

Brian


Two Solr Webapps, one folder for the index data?

2007-12-11 Thread Jörg Kiegeland
I have successfully configured two parallel Solr webapps; however, I 
see that all data gets stored in one folder of my Tomcat installation, 
namely C:\Tomcat\solr\data\index.

How can I configure each Solr webapp to store its data in the folder I 
assigned at <env-entry value="individualSolrFolder">, where the Solr 
schema etc. already resides (so that it gets stored at 
individualSolrFolder/data/index)?


Thanks


Re: Two Solr Webapps, one folder for the index data?

2007-12-11 Thread patrick o'leary




I actually have a patch for solr config parser which allows you to use
context environment variables in the solrconfig.xml
I generally use it for development when I'm working with multiple
instances and different data dirs.  I'll add it to jira today if you
want it.

P

Jörg Kiegeland wrote:
> I have successfully configured two parallel Solr webapps, however I see that
> all data gets stored in one folder of my Tomcat installation, namely
> C:\Tomcat\solr\data\index.
>
> How can I configure that each Solr webapp shall store the data in the
> folders I assigned at <env-entry value="individualSolrFolder">, where already
> the Solr schema etc. resides (so that it gets stored at
> individualSolrFolder/data/index)?
>
> Thanks


-- 
Patrick O'Leary


You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
  - Albert Einstein






Re: Two Solr Webapps, one folder for the index data?

2007-12-11 Thread Jörg Kiegeland
patrick o'leary wrote:
> I actually have a patch for solr config parser which allows you to
> use context environment variables in the solrconfig.xml
> I generally use it for development when I'm working with multiple
> instances and different data dirs.  I'll add it to jira today if you
> want it.

That would be nice.

However, I cannot believe that one cannot configure this via some 
configuration file by now. What if only one index needs to be backed up, 
and the other does not, because it carries only redundant information 
from some other data source, as in my case? If all data is put in one 
folder, you can only back up both indexes together.




Re: Facets - What's a better term for non technical people?

2007-12-11 Thread Charles Hornberger
FAST calls them "navigators" (which I think is a terrible term - YMMV
of course :-))

I tend to think that "filters" -- or perhaps "dynamic filters" --
captures the essential function.

On Dec 11, 2007 2:38 AM, "DAVIGNON Andre - CETE NP/DIODé/PANDOC"
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> > So, has anyone got a good example of the language they might use over,
> > say, a set of radio buttons and fields on a web form, to indicate that
> > selecting one or more of these would return facets. 'Show grouping by'
> > or 'List the sets that the results fall into' or something similar.
>
> Here's what i found some time : 
> http://www.searchtools.com/info/faceted-metadata.html
>
> It has been quite useful to me.
>
> André Davignon
>
>


Re: Creating user-defined field types

2007-12-11 Thread Yonik Seeley
On Dec 11, 2007 5:17 AM, Rishabh Joshi <[EMAIL PROTECTED]> wrote:
> Can anyone guide me as to how one can go on to implement a user defined
> field types in solr?

At a higher level, what are you trying to accomplish?

If you just want to customize analysis, just copy and modify an
existing fieldType definition in the schema.xml file.
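As a sketch of that approach, a copied-and-tweaked fieldType definition might look like this (the type name and analyzer chain here are only an illustration; the factory classes are standard Solr ones):

```xml
<!-- schema.xml: a custom text type derived from an existing definition -->
<fieldType name="text_custom" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```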

-Yonik


Re: Two Solr Webapps, one folder for the index data?

2007-12-11 Thread Mike Klaas

I use jvm system properties for this; they seem to work well.

-Mike

On 11-Dec-07, at 7:39 AM, patrick o'leary wrote:

I actually have a patch for solr config parser which allows you to  
use context environment variables in the solrconfig.xml
I generally use it for development when I'm working with multiple  
instances and different data dirs.  I'll add it to jira today if  
you want it.


P




Re: Two Solr Webapps, one folder for the index data?

2007-12-11 Thread patrick o'leary




JVM properties restrict you to a single implementation within a JVM.

For instance, if you want multiple instances of Solr running with the
same schema but different data dirs in the one app server, you'll have
to have several copies of solrconfig.xml and schema.xml.

By using context environment variables, I can have multiple contexts like:

pjaol:~/tmp/locallucene/solr/tomcat-conf pjaol$ more solr.xml
<Context ...>
   <Environment name="solr/data/dir" type="java.lang.String" value="..." override="true"/>
</Context>

pjaol:~/tmp/locallucene/solr/tomcat-conf pjaol$ more solr1.xml
<Context ...>
   <Environment name="solr/data/dir" type="java.lang.String" value="..." override="true"/>
</Context>

changing just the solr/data/dir for each instance.

And in my solrconfig.xml:
<dataDir>${env/solr/data/dir:./solr/data}</dataDir>

It certainly makes development & operations easier.

P

Mike Klaas wrote:
> I use jvm system properties for this; they seem to work well.





Re: Two Solr Webapps, one folder for the index data?

2007-12-11 Thread Chris Hostetter

: However I cannot believe that one cannot configure this by someconfiguration
: file by now - what if only one index needs to be backuped and the other index

Is <dataDir> in solrconfig.xml the option you are looking for?
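For reference, a minimal sketch of that setting (the path below is only an illustration):

```xml
<!-- solrconfig.xml: give this webapp its own index directory -->
<dataDir>/var/solr/webapp1/data</dataDir>
```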



-Hoss



Pattern that generates two tokens per match

2007-12-11 Thread Ken Krugler

Hi all,

I've got a pattern in a document (call it "xy") that I want to turn 
into two tokens - "xy" and "y".


One approach I could use is PatternTokenizer to extract "xy", and 
then a custom filter that returns "xy" and then "y" on the next call 
(caches the next result).


Or I could extend PatternTokenizer to return multiple tokens per 
match, though figuring out how to specify that in the schema seems 
harder.


Is there another approach that wouldn't require any custom code?

Thanks,

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"


Re: Two Solr Webapps, one folder for the index data?

2007-12-11 Thread Chris Hostetter

: I actually have a patch for solr config parser which allows you to use
: context environment variables in the solrconfig.xml
: I generally use it for development when I'm working with multiple
: instances and different data dirs.  I'll add it to jira today if you
: want it.

yes please! ... Solr already has system property variable replacement in 
solrconfig.xml, and we discussed a while back (on solr-dev, I think) adding 
code to automatically create system properties on startup for any solr/* JNDI 
variables set, so the same variable substitution code could be reused ... but I 
don't think anyone ever opened an issue or created a patch for it.



-Hoss



Re: Pattern that generates two tokens per match

2007-12-11 Thread Mike Klaas

On 11-Dec-07, at 11:51 AM, Ken Krugler wrote:


Hi all,

I've got a pattern in a document (call it "xy") that I want to turn  
into two tokens - "xy" and "y".


One approach I could use is PatternTokenizer to extract "xy", and  
then a custom filter that returns "xy" and then "y" on the next  
call (caches the next result).


Or I could extend PatternTokenizer to return multiple tokens per  
match, though figuring out how to specify that in the schema seems  
harder.


Is there another approach that wouldn't require any custom code?


Not that I can think of.  Perhaps the natural way of extending 
PatternTokenizer to return subtokens is to use the grouping of the 
regular expression: that is, specify "x(y)" to return both.  I 
assume that Java has a non-capturing regex group operator (it's (?:) 
in Python), so the basic grouping functionality would not be lost.


Python does this for re.split, which I find nice:

>>> re.split('a(b)c', 'oneabctwoabcthree')
 ['one', 'b', 'two', 'b', 'three']
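Java's java.util.regex does support the same (?:...) non-capturing syntax, and capturing groups can be read back individually. A small sketch of the "x(y)" idea (the class name is just for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    public static void main(String[] args) {
        // Match "xy" and emit both the whole match and the "y" subgroup.
        Matcher m = Pattern.compile("x(y)").matcher("xy");
        if (m.matches()) {
            System.out.println(m.group(0)); // whole match: "xy"
            System.out.println(m.group(1)); // captured subgroup: "y"
        }
        // (?:...) is a non-capturing group in Java too, same as in Python.
    }
}
```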





Re: Solr, Multiple processes running

2007-12-11 Thread Otis Gospodnetic
Martin,

Look into MultiCore (new stuff, some info on the Wiki) or into running multiple 
Solrs inside a single JVM.  We just did this with Jetty 6.1.6 for a client and 
it works beautifully.  This is also documented on the Wiki.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: "Owens, Martin" <[EMAIL PROTECTED]>
To: "Owens, Martin" <[EMAIL PROTECTED]>; solr-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 4:01:53 PM
Subject: Solr, Multiple processes running

Hello everyone,

The system we're moving from (dtSearch) allows each of our clients to
have a search index. So far I have yet to find the options required to
set this up; it seems I can only set the directory path before run time.

Each of the indexes uses the same schema and the same configuration, just
different data in each. What kind of performance penalty would I have from
running a new Solr instance per required database? What is the best way to
track which port or which index is being used? Would I be able to run
1,000 or more Solr instances without performance degradation?

Thanks for your help.

Best regards, Martin Owens





Re: How to effectively search inside fields that should be indexed without changing them.

2007-12-11 Thread Otis Gospodnetic
Brian,

This is not really a job for n-grams.  It sounds like you'll want to write a 
custom Tokenizer that has knowledge of this particular pattern, knows how to 
split input like the one in your example, and produces multiple tokens out of 
it.  For the natural language part you can probably get away with one of the 
existing tokenizers/analyzers/factories.  For the first part you'll likely want 
to extract (W+)0+ -- 1 or more letters followed by 1 or more zeros -- as one 
token, and then 0+(D+) -- 1 or more zeros followed by 1 or more digits.
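A rough sketch of that splitting logic in plain Java (the class and method names are hypothetical; a real Solr analyzer would implement this inside a Lucene Tokenizer instead):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleTokens {
    // Split "ABC0001231-This is an important doc.pdf" into searchable tokens:
    // the letter prefix, the number with leading zeros stripped, and the words.
    static List<String> tokens(String title) {
        List<String> out = new ArrayList<>();
        String rest = title;
        // Leading code like "ABC0001231": letters, zero padding, digits.
        Matcher code = Pattern.compile("^([A-Za-z]+)0*([1-9][0-9]*)").matcher(title);
        if (code.lookingAt()) {
            out.add(code.group(1).toLowerCase()); // "abc"
            out.add(code.group(2));               // "1231"
            rest = title.substring(code.end());
        }
        // Natural-language part: simple lowercase word tokens.
        Matcher words = Pattern.compile("[A-Za-z]+").matcher(rest);
        while (words.find()) {
            out.add(words.group().toLowerCase());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("ABC0001231-This is an important doc.pdf"));
        // [abc, 1231, this, is, an, important, doc, pdf]
    }
}
```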

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Brian Carmalt <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 9:17:32 AM
Subject: How to effectively search inside fields that should be indexed without 
changing them.

Hello all,

 The titles of our docs have the form "ABC0001231-This is an important 
doc.pdf". I would like to be able to
search for 'important', or '1231',  or 'ABC000*', or 'This is an 
important doc'  in the title field. I looked a the NGramTokenizer and 
tried to use it.
In the index it doesn't seem to work, I cannot get any hits. The 
analysis tool on the admin pages shows me that the
ngram tokenizing works by highlighting the matches between the indexed 
value and a query. I have set the
min and max ngram size to 2 and 6, with side equal to left.

Can anyone recommend a procedure that will allow me to search as stated
above?

I would also like to find out more about how to use the NgramTokenizer,
but have found little in the form of documentation. Anyone know about
any good sources?

Thanks,

Brian





Re: Solr, Multiple processes running

2007-12-11 Thread Otis Gospodnetic
Keeping track of 1000+ indices is actually not that hard.  I've implemented 
Simpy - http://simpy.com - in a way that keeps each member's index (or indices 
- some users have multiple indices) separate.  I can't give out the total 
number of Simpy users, but I can tell you it is weeell beyond 1000 :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Erick Erickson <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 4:33:45 PM
Subject: Re: Solr, Multiple processes running

How much data are we talking about here? Because it seems *much*
 simpler
to just index a field with each document indicating the user and then
 just
AND that user's ID in with your query.

Or think about facets (although I admit I don't know enough about
 facets
to weigh in on its merits, it's just been mentioned a lot).

Keeping track of 1,000+ indexes seems like a maintenance headache, but
much depends upon how much data you're talking about.

When replying, the number of documents is almost, but not quite
totally, useless unless combined with the number of fields you're
storing per doc, the average length of each field, etc .

Erick

On Dec 11, 2007 4:01 PM, Owens, Martin <[EMAIL PROTECTED]>
 wrote:

> Hello everyone,
>
> The system we're moving from (dtSearch) allows each of our clients to
 have
> a search index. So far I have yet to find the options required to set
 this,
> it seems I can only set the directory path before run time.
>
> Each of the indexes uses the same schema, same configuration just
> different data in each; what kind of performance penalty would I have
 from
> running a new solr instance per required database? what is the best
 way to
> track what port or what index is being used? would I be able to run
 1,000 or
> more solr instances without performance degradation?
>
> Thanks for your help.
>
> Best regards, Martin Owens
>





RE: Two Solr Webapps, one folder for the index data?

2007-12-11 Thread Arnone, Anthony
I asked a question similar to this back in 
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200709.mbox/[EMAIL 
PROTECTED] and didn't really find anyone who was doing this. What I wound up 
doing was adding a variable to the context.xml file called contextRelativeHome:


<env-entry>
  <env-entry-name>solr/contextRelativeHome</env-entry-name>
  <env-entry-type>java.lang.Boolean</env-entry-type>
  <env-entry-value>true</env-entry-value>
</env-entry>

Which causes the SolrResourceLoader to prepend the context directory to the 
solr/home variable (the context directory is identified in the 
SolrDispatchFilter and stored in the global Config). This way, I can have 
multiple instances of Solr up and running with the exact same configuration, 
and their indices contained wholly within their deployment directories.

So since this is a fresh thread, does this seem like a bad way to do it? It 
would be much easier if I could put context variables directly into the 
existing solr/home variable, for sure.

Anthony


-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 11, 2007 1:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Two Solr Webapps, one folder for the index data?


: I actually have a patch for solr config parser which allows you to use
: context environment variables in the solrconfig.xml
: I generally use it for development when I'm working with multiple
: instances and different data dirs.  I'll add it to jira today if you
: want it.

yes please! ... Solr already has system property variable replacement in 
solrconfig.xml, and we discussed a while back (on solr-dev, I think) adding 
code to automatically create system properties on startup for any solr/* JNDI 
variables set, so the same variable substitution code could be reused ... but I 
don't think anyone ever opened an issue or created a patch for it.



-Hoss



Re: Solr, Multiple processes running

2007-12-11 Thread Erick Erickson
You're right, I'm wrong. I certainly am willing to defer to someone
who's been there before.

On Dec 11, 2007 4:44 PM, Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:

> Keeping track of 1000+ indices is actually not that hard.  I've
> implemented Simpy - http://simpy.com - in a way that keeps each member's
> index (or indices - some users have multiple indices) separate.  I can't
> give out the total number of Simpy users, but I can tell you it is
> weeell beyond 1000 :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message 
> From: Erick Erickson <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, December 11, 2007 4:33:45 PM
> Subject: Re: Solr, Multiple processes running
>
> How much data are we talking about here? Because it seems *much*
>  simpler
> to just index a field with each document indicating the user and then
>  just
> AND that user's ID in with your query.
>
> Or think about facets (although I admit I don't know enough about
>  facets
> to weigh in on its merits, it's just been mentioned a lot).
>
> Keeping track of 1,000+ indexes seems like a maintenance headache, but
> much depends upon how much data you're talking about.
>
> When replying, the number of documents is almost, but not quite
> totally, useless unless combined with the number of fields you're
> storing per doc, the average length of each field, etc .
>
> Erick
>
> On Dec 11, 2007 4:01 PM, Owens, Martin <[EMAIL PROTECTED]>
>  wrote:
>
> > Hello everyone,
> >
> > The system we're moving from (dtSearch) allows each of our clients to
>  have
> > a search index. So far I have yet to find the options required to set
>  this,
> > it seems I can only set the directory path before run time.
> >
> > Each of the indexes uses the same schema, same configuration just
> > different data in each; what kind of performance penalty would I have
>  from
> > running a new solr instance per required database? what is the best
>  way to
> > track what port or what index is being used? would I be able to run
>  1,000 or
> > more solr instances without performance degradation?
> >
> > Thanks for your help.
> >
> > Best regards, Martin Owens
> >
>
>
>
>


Re: Two Solr Webapps, one folder for the index data?

2007-12-11 Thread Otis Gospodnetic
Maybe I'm confused.  Can't you use the brand-spanking new MultiCore stuff for 
this, or JNDI, as I just mentioned in the "Re: Solr, Multiple processes 
running" thread?

Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: patrick o'leary <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 2:18:58 PM
Subject: Re: Two Solr Webapps, one folder for the index data?




  
  

JVM properties restrict you to a single implementation within a jvm.

For instance if you want multiple instances of solr running with the
same schema, with different data dirs in the one app server, you'll
have to have several copies of solrconfig and schema.xml.



Re: Solr, Multiple processes running

2007-12-11 Thread Erick Erickson
How much data are we talking about here? Because it seems *much* simpler
to just index a field with each document indicating the user and then just
AND that user's ID in with your query.

Or think about facets (although I admit I don't know enough about facets
to weigh in on its merits, it's just been mentioned a lot).

Keeping track of 1,000+ indexes seems like a maintenance headache, but
much depends upon how much data you're talking about.

When replying, note that the number of documents is almost, but not quite
totally, useless unless combined with the number of fields you're
storing per doc, the average length of each field, etc.

Erick

On Dec 11, 2007 4:01 PM, Owens, Martin <[EMAIL PROTECTED]> wrote:

> Hello everyone,
>
> The system we're moving from (dtSearch) allows each of our clients to have
> a search index. So far I have yet to find the options required to set this,
> it seems I can only set the directory path before run time.
>
> Each of the indexes uses the same schema, same configuration just
> different data in each; what kind of performance penalty would I have from
> running a new solr instance per required database? what is the best way to
> track what port or what index is being used? would I be able to run 1,000 or
> more solr instances without performance degradation?
>
> Thanks for your help.
>
> Best regards, Martin Owens
>


Re: SOLR X FAST

2007-12-11 Thread Matthew Runo

I think it all depends, what do you want out of Solr or FAST?

Thanks!

Matthew Runo
Software Developer
702.943.7833

On Dec 11, 2007, at 2:09 PM, William Silva wrote:


Hi,
How is the best way to compare SOLR and FAST Search ?
Thanks,
William.




Solr and Flex

2007-12-11 Thread jenix

Has anyone used Solr in a Flex application? 
Any code snippets to share?

Thank you.
Jennifer
-- 
View this message in context: 
http://www.nabble.com/Solr-and-Flex-tp14284703p14284703.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR X FAST

2007-12-11 Thread William Silva
Hi,
Why use FAST and not SOLR, for example?
What will FAST offer that justifies the investment?
I would like a matrix comparing the two.
Thanks,
William.

On Dec 11, 2007 8:15 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:

> I think it all depends, what do you want out of Solr or FAST?
>
>Thanks!
>
> Matthew Runo
> Software Developer
> 702.943.7833
>
> On Dec 11, 2007, at 2:09 PM, William Silva wrote:
>
> > Hi,
> > How is the best way to compare SOLR and FAST Search ?
> > Thanks,
> > William.
>
>


Re: SOLR X FAST

2007-12-11 Thread Ravish Bhagdev
Stability and better support (at great cost, obviously).

On Dec 11, 2007 10:20 PM, William Silva <[EMAIL PROTECTED]> wrote:
> Hi,
> Why use FAST and not use SOLR ? For example.
> What will FAST offer that will justify the investment ?
> I would like a matrix comparing both.
> Thanks,
> William.
>
>
> On Dec 11, 2007 8:15 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
>
> > I think it all depends, what do you want out of Solr or FAST?
> >
> >Thanks!
> >
> > Matthew Runo
> > Software Developer
> > 702.943.7833
> >
> > On Dec 11, 2007, at 2:09 PM, William Silva wrote:
> >
> > > Hi,
> > > How is the best way to compare SOLR and FAST Search ?
> > > Thanks,
> > > William.
> >
> >
>


Re: Solr, Multiple processes running

2007-12-11 Thread Walter Underwood
Since they all use the same schema, can you add a client ID to each document
when it is indexed? Filter by "clientid:4" and you get a subset of the
index.
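In Solr terms that is just a filter query appended to each search. A hypothetical request (the host, query, field name, and ID are all illustrative):

```
http://localhost:8983/solr/select?q=title:report&fq=clientid:4
```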

wunder

On 12/11/07 1:01 PM, "Owens, Martin" <[EMAIL PROTECTED]> wrote:

> Hello everyone,
> 
> The system we're moving from (dtSearch) allows each of our clients to have a
> search index. So far I have yet to find the options required to set this, it
> seems I can only set the directory path before run time.
> 
> Each of the indexes uses the same schema, same configuration just different
> data in each; what kind of performance penalty would I have from running a new
> solr instance per required database? what is the best way to track what port
> or what index is being used? would I be able to run 1,000 or more solr
> instances without performance degradation?
> 
> Thanks for your help.
> 
> Best regards, Martin Owens



Re: distributing indexes via solr

2007-12-11 Thread Mike Klaas

On 10-Dec-07, at 12:50 PM, Doug T wrote:

> I have been using parallel multisearchers on multi-CPU machines, and seen
> sizable benefit over a single large index (even if all of the fragments are
> on 1 disk). Is there a way to quickly enable this on a solr server? Or do
> I need to go into the source to make the change?


Unfortunately, there is no easy way to enable this.  Patches welcome!

-Mike


Re: SOLR X FAST

2007-12-11 Thread Nuno Leitao
It depends. If you are looking at a small index (gigabytes rather 
than dozens or hundreds of gigabytes, or terabytes) with relatively 
simple requirements (a few facets, simple tokenization, English-only 
linguistics, etc.), Solr is likely to be appropriate for most cases.

FAST, however, gives you great horizontal scalability, out-of-the-box 
linguistics for many languages (including CJK), contextual and scope 
searching, a web, file and database crawler, a programmable ingestion 
pipeline, etc.


Regards.

--Nuno

On 11 Dec 2007, at 22:09, William Silva wrote:


Hi,
How is the best way to compare SOLR and FAST Search ?
Thanks,
William.




Re: SOLR X FAST

2007-12-11 Thread Ravish Bhagdev
Could you please elaborate on what you mean by ingestion pipeline and
horizontal scalability?  I apologize if this is a stupid question
everyone else on the forum is familiar with.

Thanks,
Ravi

On Dec 12, 2007 1:09 AM, Nuno Leitao <[EMAIL PROTECTED]> wrote:
> Depends, if you are looking for a small sized index (gigabytes rather
> than dozens or hundreds of gigabytes or terabytes) with relatively
> simple requirements (a few facets, simple tokenization, English only
> linguistics, etc.) Solr is likely to be appropriate for most cases.
>
> FAST however gives you great horizontal scalability, out of the box
> linguistics for many languages (including CJK), contextual and scope
> searching, a web, file and database crawler, programmable ingestion
> pipeline, etc.
>
> Regards.
>
> --Nuno
>
>
> On 11 Dec 2007, at 22:09, William Silva wrote:
>
> > Hi,
> > How is the best way to compare SOLR and FAST Search ?
> > Thanks,
> > William.
>
>


Re: SOLR X FAST

2007-12-11 Thread Nuno Leitao


FAST uses two pipelines - an ingestion pipeline (for document feeding)  
and a query pipeline which are fully programmable (i.e., you can  
customize it fully). At ingestion time you typically prepare documents  
for indexing (tokenize, character normalize, lemmatize, clean up text,  
perform entity extraction for facets, perform static boosting for  
certain documents, etc.), while at query time you can expand synonyms,  
and do other general query side tasks (not unlike Solr).


Horizontal scalability means the ability to cluster your search engine  
across a large number of servers, so you can scale up on the number of  
documents, queries, crawls, etc.


There are FAST deployments out there which run on dozens, in some  
cases hundreds of nodes serving multiple terabyte size indexes and  
achieving hundreds of queries per second.


Yet again, if your requirements are relatively simple then Lucene  
might do the job just fine.


Hope this helps.

--Nuno.

On 12 Dec 2007, at 01:33, Ravish Bhagdev wrote:


Could you please elaborate on what you mean by ingestion pipeline and
horizontal scalability?  I apologize if this is a stupid question
everyone else on the forum is familiar with.

Thanks,
Ravi

On Dec 12, 2007 1:09 AM, Nuno Leitao <[EMAIL PROTECTED]> wrote:

Depends, if you are looking for a small sized index (gigabytes rather
than dozens or hundreds of gigabytes or terabytes) with relatively
simple requirements (a few facets, simple tokenization, English only
linguistics, etc.) Solr is likely to be appropriate for most cases.

FAST however gives you great horizontal scalability, out of the box
linguistics for many languages (including CJK), contextual and scope
searching, a web, file and database crawler, programmable ingestion
pipeline, etc.

Regards.

--Nuno


On 11 Dec 2007, at 22:09, William Silva wrote:


Hi,
How is the best way to compare SOLR and FAST Search ?
Thanks,
William.







RE: SOLR X FAST

2007-12-11 Thread Norskog, Lance
FAST is a little less flexible (no dynamic fields) and not programmable
at the Lucene level.

We recently switched from FAST to Solr because of cost reasons.  They
did not know how to license us; they are used to, say, IBM running FAST
on hundreds of servers.  We are a startup with very specific needs. It's
turned out to be worthwhile because we only want to do one thing really
well and we can customize Solr for it. 

Lance

-Original Message-
From: Nuno Leitao [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 11, 2007 5:51 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR X FAST


FAST uses two pipelines - an ingestion pipeline (for document feeding)
and a query pipeline which are fully programmable (i.e., you can
customize it fully). At ingestion time you typically prepare documents
for indexing (tokenize, character normalize, lemmatize, clean up text,
perform entity extraction for facets, perform static boosting for
certain documents, etc.), while at query time you can expand synonyms,
and do other general query side tasks (not unlike Solr).

Horizontal scalability means the ability to cluster your search engine
across a large number of servers, so you can scale up on the number of
documents, queries, crawls, etc.

There are FAST deployments out there which run on dozens, in some cases
hundreds of nodes serving multiple terabyte size indexes and achieving
hundreds of queries per second.

Yet again, if your requirements are relatively simple then Lucene might
do the job just fine.

Hope this helps.

--Nuno.

On 12 Dec 2007, at 01:33, Ravish Bhagdev wrote:

> Could you please elaborate on what you mean by ingestion pipeline and 
> horizontal scalability?  I apologize if this is a stupid question 
> everyone else on the forum is familiar with.
>
> Thanks,
> Ravi
>
> On Dec 12, 2007 1:09 AM, Nuno Leitao <[EMAIL PROTECTED]> wrote:
>> Depends, if you are looking for a small sized index (gigabytes rather
>> than dozens or hundreds of gigabytes or terabytes) with relatively 
>> simple requirements (a few facets, simple tokenization, English only 
>> linguistics, etc.) Solr is likely to be appropriate for most cases.
>>
>> FAST however gives you great horizontal scalability, out of the box 
>> linguistics for many languages (including CJK), contextual and scope 
>> searching, a web, file and database crawler, programmable ingestion 
>> pipeline, etc.
>>
>> Regards.
>>
>> --Nuno
>>
>>
>> On 11 Dec 2007, at 22:09, William Silva wrote:
>>
>>> Hi,
>>> How is the best way to compare SOLR and FAST Search ?
>>> Thanks,
>>> William.
>>
>>



Re: Facets - What's a better term for non technical people?

2007-12-11 Thread Mike Klaas

"category counts"
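
Whatever label the UI uses, the underlying data is simple: Solr's JSON
response lists each facet field as a flat [value, count, value, count, ...]
array. A small sketch of turning that into "category (count)" labels — the
response body below is a made-up example, not real output:

```python
import json

# A trimmed, made-up example of the shape of Solr's JSON facet response.
raw = json.dumps({
    "facet_counts": {
        "facet_fields": {
            "category": ["books", 120, "music", 45, "video", 12]
        }
    }
})

def facet_counts(response_text, field):
    flat = json.loads(response_text)["facet_counts"]["facet_fields"][field]
    # Pair every facet value with the count that follows it in the flat list.
    return list(zip(flat[::2], flat[1::2]))

for value, count in facet_counts(raw, "category"):
    print("%s (%d)" % (value, count))
# books (120)
# music (45)
# video (12)
```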

On 11-Dec-07, at 6:38 PM, Norskog, Lance wrote:


In SQL terms they are: 'select unique'. Except on only one field.

-Original Message-
From: Charles Hornberger [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 11, 2007 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Facets - What's a better term for non technical people?

FAST calls them "navigators" (which I think is a terrible term -  
YMMV of course :-))


I tend to think that "filters" -- or perhaps "dynamic filters" --  
captures the essential function.


On Dec 11, 2007 2:38 AM, "DAVIGNON Andre - CETE NP/DIODé/PANDOC"
<[EMAIL PROTECTED]> wrote:

Hi,


So, has anyone got a good example of the language they might use
over, say, a set of radio buttons and fields on a web form, to
indicate that selecting one or more of these would return facets. 'Show
grouping by' or 'List the sets that the results fall into' or something
similar.


Here's what I found some time ago:
http://www.searchtools.com/info/faceted-metadata.html

It has been quite useful to me.

André Davignon






RE: Facets - What's a better term for non technical people?

2007-12-11 Thread Norskog, Lance
In SQL terms they are: 'select unique'. Except on only one field. 

-Original Message-
From: Charles Hornberger [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 11, 2007 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Facets - What's a better term for non technical people?

FAST calls them "navigators" (which I think is a terrible term - YMMV of course 
:-))

I tend to think that "filters" -- or perhaps "dynamic filters" -- captures the 
essential function.

On Dec 11, 2007 2:38 AM, "DAVIGNON Andre - CETE NP/DIODé/PANDOC"
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> > So, has anyone got a good example of the language they might use 
> > over, say, a set of radio buttons and fields on a web form, to 
> > indicate that selecting one or more of these would return facets. 'Show 
> > grouping by'
> > or 'List the sets that the results fall into' or something similar.
>
> Here's what I found some time ago:
> http://www.searchtools.com/info/faceted-metadata.html
>
> It has been quite useful to me.
>
> André Davignon
>
>


Re: SOLR X FAST

2007-12-11 Thread Otis Gospodnetic
Just to comment on that last part:
"There are FAST deployments out there which run on dozens, in some  
cases hundreds of nodes serving multiple terabyte size indexes and  
achieving hundreds of queries per second."

There are also a lot of Lucene or Solr deployments with similar setups - I've 
worked on and with some decent-sized search clusters with 50-100 search 
servers, fault tolerance, high input rates, and high query rates.

It's all doable with Lucene and Solr, it's just that not everything comes out 
of the box, so you have to either find somebody to help out or do it in house.  
You pay developers to build exactly what you need as opposed to paying FAST a 
pile of $$$ based on, say, the query rate.  Or you don't pay $250,000 for a 
Google Appliance that can index only 10MM docs (this is a real number).

Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Nuno Leitao <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 8:50:37 PM
Subject: Re: SOLR X FAST


FAST uses two pipelines - an ingestion pipeline (for document feeding)
and a query pipeline which are fully programmable (i.e., you can
customize it fully). At ingestion time you typically prepare documents
for indexing (tokenize, character normalize, lemmatize, clean up text,
perform entity extraction for facets, perform static boosting for
certain documents, etc.), while at query time you can expand synonyms,
and do other general query side tasks (not unlike Solr).

Horizontal scalability means the ability to cluster your search engine
across a large number of servers, so you can scale up on the number of
documents, queries, crawls, etc.

There are FAST deployments out there which run on dozens, in some
cases hundreds of nodes serving multiple terabyte size indexes and
achieving hundreds of queries per second.

Yet again, if your requirements are relatively simple then Lucene  
might do the job just fine.

Hope this helps.

--Nuno.

On 12 Dec 2007, at 01:33, Ravish Bhagdev wrote:

> Could you please elaborate on what you mean by ingestion pipeline and
> horizontal scalability?  I apologize if this is a stupid question
> everyone else on the forum is familiar with.
>
> Thanks,
> Ravi
>
> On Dec 12, 2007 1:09 AM, Nuno Leitao <[EMAIL PROTECTED]> wrote:
>> Depends, if you are looking for a small sized index (gigabytes rather
>> than dozens or hundreds of gigabytes or terabytes) with relatively
>> simple requirements (a few facets, simple tokenization, English only
>> linguistics, etc.) Solr is likely to be appropriate for most cases.
>>
>> FAST however gives you great horizontal scalability, out of the box
>> linguistics for many languages (including CJK), contextual and scope
>> searching, a web, file and database crawler, programmable ingestion
>> pipeline, etc.
>>
>> Regards.
>>
>> --Nuno
>>
>>
>> On 11 Dec 2007, at 22:09, William Silva wrote:
>>
>>> Hi,
>>> How is the best way to compare SOLR and FAST Search ?
>>> Thanks,
>>> William.
>>
>>






Re: Facets - What's a better term for non technical people?

2007-12-11 Thread Otis Gospodnetic
Isn't that GROUP BY ColumnX, count(1) type of thing?

I'd think "group by" would be a good label.
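
That analogy can be made concrete. A facet on one field is roughly a GROUP BY
with a count per distinct value — a self-contained sketch using an in-memory
SQLite table with made-up rows:

```python
import sqlite3

# Facet counts on "category" correspond to GROUP BY plus COUNT(*).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, category TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [(1, "books"), (2, "books"), (3, "music")])
rows = conn.execute(
    "SELECT category, COUNT(*) AS n FROM docs "
    "GROUP BY category ORDER BY n DESC"
).fetchall()
print(rows)  # [('books', 2), ('music', 1)]
```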

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: "Norskog, Lance" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, December 11, 2007 9:38:37 PM
Subject: RE: Facets - What's a better term for non technical people?

In SQL terms they are: 'select unique'. Except on only one field. 

-Original Message-
From: Charles Hornberger [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 11, 2007 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Facets - What's a better term for non technical people?

FAST calls them "navigators" (which I think is a terrible term - YMMV
of course :-))

I tend to think that "filters" -- or perhaps "dynamic filters" --
captures the essential function.

On Dec 11, 2007 2:38 AM, "DAVIGNON Andre - CETE NP/DIODé/PANDOC"
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> > So, has anyone got a good example of the language they might use 
> > over, say, a set of radio buttons and fields on a web form, to 
> > indicate that selecting one or more of these would return facets. 'Show grouping by'
> > or 'List the sets that the results fall into' or something similar.
>
> Here's what I found some time ago:
> http://www.searchtools.com/info/faceted-metadata.html
>
> It has been quite useful to me.
>
> André Davignon
>
>