Solr on JBOSS 4.0.3

2007-05-31 Thread Thierry Collogne

Hello,

We are trying to run Solr on JBOSS 4.0.3, and are having an issue.
When we deploy the war and start our server, we get an
ExceptionInInitializerError.

This is part of the stacktrace:

Caused by: java.lang.RuntimeException: XPathFactory#newInstance() failed to
create an XPathFactory for the default object model:
http://java.sun.com/jaxp/xpath/dom with the
XPathFactoryConfigurationException:
javax.xml.xpath.XPathFactoryConfigurationException: No XPathFactory
implementation found for the object model:
http://java.sun.com/jaxp/xpath/dom
   at javax.xml.xpath.XPathFactory.newInstance(Unknown Source)
   at org.apache.solr.core.Config.<clinit>(Config.java:49)


After some searching on Google, I found a suggestion to remove all libs from
JBOSS_HOME\lib\endorsed.

I have tried this on my local machine and it works.
Now the problem is that, due to our infrastructure, it is not really
possible to remove these libraries from our production servers, so we need
to come up with another solution.

The libraries in JBOSS_HOME\lib\endorsed are:

resolver.jar
xalan.jar
xercesImpl.jar
xml-apis.jar


Is there someone who can explain to me what the dependencies are with the
above jar files? Or perhaps offer another solution?

Thank you.


Re: SOLR Indexing/Querying

2007-05-31 Thread Frans Flippo

I think if you add a field that has an analyzer that creates tokens on
alpha/digit/punctuation boundaries, that should go a long way. Use that both
at index and search time.

For example:
* "3555LHP" becomes "3555 LHP"
 Searching for "D3555" becomes "D OR 3555", so it matches on token "3555"
from "3555LHP".

* "t14240" becomes "t 14240"
 Searching for "t14240-ss" becomes "t OR 14240 OR ss", matching "14240"
from "t14240".

Similarly for your other examples.

If this proves to be too broad, you may need to define some stricter rules,
but you could use this for starters.

I think you will have to write your own analyzer, as it doesn't look like
any of the analyzers available in Solr/Lucene do exactly what you need. But
that's relatively straightforward. Just start with the code from one of the
existing Analyzers (e.g. KeywordAnalyzer).

Good luck,
Frans

On 5/31/07, realw5 [EMAIL PROTECTED] wrote:



Hey Guys,
I need some guidance in regards to a problem we are having with our solr
index. Below is a list of terms our customers search for, which are failing
or not returning the complete set. The second side of the list is the
product id/keyword we want it to match.

Can you give me some direction on how this can be done (or let me know if
it can't) with index/query analyzers. Any help is much appreciated!

Dan

---

Keyword Typed In / We want it to find

D3555 / 3555LHP
D460160-BN / D460160
D460160BN / D460160
Dd454557 / D454557
84200ORB / 84200
84200-ORB / 84200
T13420-SCH / T13420
t14240-ss / t14240
--
View this message in context:
http://www.nabble.com/SOLR-Indexing-Querying-tf3843221.html#a10883456
Sent from the Solr - User mailing list archive at Nabble.com.




AW: SOLR Indexing/Querying

2007-05-31 Thread Burkamp, Christian
Hi there,

It looks a lot like using Solr's standard WordDelimiterFilter (see the sample
schema.xml) does what you need.
It splits on alphabetical-to-numeric boundaries and on the various kinds of
intra-word delimiters like "-", "_" or ".". You can decide whether the parts
are put together again in addition to the split-up tokens. Control this via
the parameters catenateWords, catenateNumbers and catenateAll.
Good documentation on this topic can be found on the wiki:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089
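
For instance, a sketch of such a field type (adapted from the sample
schema's "text" field; the name "text_parts" and the exact flags are just
an illustration):

<fieldType name="text_parts" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With catenateAll="1", "D460160-BN" would be indexed as the parts "d",
"460160" and "bn" plus the catenated "d460160bn", so a query for "D460160"
or "D460160BN" at least has tokens in the index to line up with (though see
Hoss's caveat later in this thread about how the query parser combines them).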

-- Christian


-----Original Message-----
From: Frans Flippo [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 31, 2007 11:27 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR Indexing/Querying


I think if you add a field that has an analyzer that creates tokens on 
alpha/digit/punctuation boundaries, that should go a long way. Use that both at 
index and search time.

For example:
* "3555LHP" becomes "3555 LHP"
  Searching for "D3555" becomes "D OR 3555", so it matches on token "3555"
from "3555LHP".

* "t14240" becomes "t 14240"
  Searching for "t14240-ss" becomes "t OR 14240 OR ss", matching "14240"
from "t14240".

Similarly for your other examples.

If this proves to be too broad, you may need to define some stricter rules, but 
you could use this for starters.

I think you will have to write your own analyzer, as it doesn't look like any 
of the analyzers available in Solr/Lucene do exactly what you need. But that's 
relatively straightforward. Just start with the code from one of the existing 
Analyzers (e.g. KeywordAnalyzer).

Good luck,
Frans

On 5/31/07, realw5 [EMAIL PROTECTED] wrote:


 Hey Guys,
 I need some guidance in regards to a problem we are having with our
 solr index. Below is a list of terms our customers search for, which
 are failing or not returning the complete set. The second side of the
 list is the product id/keyword we want it to match.

 Can you give me some direction on how this can be done (or let me know
 if it can't) with index/query analyzers. Any help is much appreciated!

 Dan

 ---

 Keyword Typed In / We want it to find

 D3555 / 3555LHP
 D460160-BN / D460160
 D460160BN / D460160
 Dd454557 / D454557
 84200ORB / 84200
 84200-ORB / 84200
 T13420-SCH / T13420
 t14240-ss / t14240
 --
 View this message in context: 
 http://www.nabble.com/SOLR-Indexing-Querying-tf3843221.html#a10883456
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: facet question

2007-05-31 Thread Yonik Seeley

On 5/31/07, Gal Nitzan [EMAIL PROTECTED] wrote:

We have a small index with about 4 million docs.

On this index we have a field "tags", which is a multi-valued field.

Running a facet query on the index with something like:
facet=true&facetField=tags&q=type:video takes about 1 minute.

We have defined a large cache which enables the query to run much faster
(about 1 sec)

<filterCache
  class="solr.LRUCache"
  size="1500000"
  initialSize="600000"
  autowarmCount="300000"/>


However, the cache size brings us to the 2GB limit.


To reduce memory usage, you could try setting the facet.enum.cache.minDf
parameter to a low value (on a recent nightly build, soon 1.2).  If that
slows things down too much and your index is not optimized, then you
could try optimizing it.
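
For example, a hypothetical request (the threshold value here is arbitrary,
and facet.field is the standard spelling of the parameter):

http://localhost:8983/solr/select?q=type:video&facet=true&facet.field=tags&facet.enum.cache.minDf=25

Terms whose document frequency falls below that threshold are counted
without going through the filterCache, trading a bit of speed for a much
smaller cache.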

-Yonik


Re: Schema question: overriding fieldType attributes in field element

2007-05-31 Thread Yonik Seeley

On 5/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

I am trying to override the "tokenized" attribute of a single FieldType
from the <field> element in schema.xml, but it doesn't seem to work.


The "tokenized" attribute is not settable from the schema, and there
is no reason I can think of why this would be useful rather than
confusing.

"untokenized" means "don't use the analyzer".  If you don't want an
analyzer, then use the "string" type.
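
That is, something like the following field declaration (the name termExact
is borrowed from later in this thread, just for illustration):

<field name="termExact" type="string" indexed="true" stored="true"/>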


PS: Yes, I know I could use solr.StrField for those fields


Could you provide a use-case why you don't want to use StrField
(normally type "string" in the schema)?  What is the external behavior
you are looking for?

-Yonik


Re: Schema question: overriding fieldType attributes in field element

2007-05-31 Thread RWatkins
Thanks for the prompt response. Comments below ...

[EMAIL PROTECTED] wrote on 05/31/2007 10:55:57 AM:

 On 5/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
  I am trying to override the "tokenized" attribute of a single
  FieldType from the <field> element in schema.xml, but it doesn't
  seem to work.

 The "tokenized" attribute is not settable from the schema, and there
 is no reason I can think of why this would be useful rather than
 confusing.

You say the "tokenized" attribute is not settable from the schema, but the
output from IndexSchema.readConfig shows that the properties are indeed 
read, and the resulting SchemaField object retains these properties: are 
they then ignored?

 "untokenized" means "don't use the analyzer".  If you don't want an
 analyzer, then use the "string" type.

This is true only in the simplest of cases. An analyzer can do far more 
than tokenize: it can stem, change to lower case, etc. What if you want 
one or more of these things to happen, but you don't want tokenization? In 
this particular case I want to be able to make exact matches on the entire 
field, so that a search for +termExact:pain (remember that my searches
are case-insensitive, thanks to my analyzer, and regardless of
tokenization) will return _only_ the document in which the termExact
field contains the single word "Pain" or "pain", and not "Back Pain", etc.

  PS: Yes, I know I could use solr.StrField for those fields
 
 Could you provide a use-case why you don't want to use StrField
 (normally type "string" in the schema)?  What is the external behavior
 you are looking for?

Part of the answer to this question is in the last paragraph, but perhaps
you want to know why I would like to consolidate all field properties in
the <field> element. The reason for this is that the schema is read by
another class to give access to field properties (and more) outside the
Solr context.

-- Robert




Re: Schema question: overriding fieldType attributes in field element

2007-05-31 Thread Yonik Seeley

On 5/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

You say the "tokenized" attribute is not settable from the schema, but the
output from IndexSchema.readConfig shows that the properties are indeed
read, and the resulting SchemaField object retains these properties: are
they then ignored?


Not sure off the top of my head, but don't use it... it shouldn't be
documented anywhere.
It probably slipped through as part of generic options parsing.


 "untokenized" means "don't use the analyzer".  If you don't want an
 analyzer, then use the "string" type.

This is true only in the simplest of cases. An analyzer can do far more
than tokenize: it can stem, change to lower case, etc. What if you want
one or more of these things to happen, but you don't want tokenization?



From a Lucene perspective, if you create an untokenized field, the
analyzer will not be used at all.  It should probably have been named
"unanalyzed", as that's more accurate.

KeywordTokenizer (via KeywordTokenizerFactory) is probably what you
are looking for.
Create a new text field type with that as the tokenizer, followed by
whatever filters you want (like lowercasing).
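
For instance, a sketch of such a type (the name "textExact" is just an
illustration):

<fieldType name="textExact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The whole field value comes through as a single lowercased token, so
+termExact:pain matches "Pain" or "pain" but not "Back Pain".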

-Yonik


Commit failing with EOFException

2007-05-31 Thread Matt Mitchell

Hi,

I've had this application running before and am not sure what has
changed to cause this error. When trying to do a clean update
(removed index dir and restarted Solr) with just a <commit/>, Solr is
returning a status of 1 with this error at the top:


java.io.EOFException: input contained no data

Does anyone have any idea as to why that's happening? The same thing  
occurs when I try to use the post.sh script with a valid xml file.


Thank you!

Matt


Re: Schema question: overriding fieldType attributes in field element

2007-05-31 Thread RWatkins
Thanks, but I think I'm going to have to work out a different solution. I 
have written my own analyzer that does everything I need: it's not a 
different analyzer I need but a way to specify that certain fields should 
be tokenized and others not -- while still leaving all other options open.

As for the generic options parsing resulting in unused properties in a
SchemaField object: no, it is not specifically documented anywhere, but
the Solr Wiki says, for both fields and field types, "Common options that
fields can have are ...". I could not find a definitive list anywhere of
what is allowed/used or excluded, so I went to the code and found that
"tokenized" would indeed be respected in SchemaField.

-- Robert

[EMAIL PROTECTED] wrote on 05/31/2007 11:30:04 AM:

 On 5/31/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
  You say the "tokenized" attribute is not settable from the schema,
  but the output from IndexSchema.readConfig shows that the properties
  are indeed read, and the resulting SchemaField object retains these
  properties: are they then ignored?

 Not sure off the top of my head, but don't use it... it shouldn't be
 documented anywhere.
 It probably slipped through as part of generic options parsing.
 
   "untokenized" means "don't use the analyzer".  If you don't want an
   analyzer, then use the "string" type.

  This is true only in the simplest of cases. An analyzer can do far
  more than tokenize: it can stem, change to lower case, etc. What if
  you want one or more of these things to happen, but you don't want
  tokenization?

 From a Lucene perspective, if you create an untokenized field, the
 analyzer will not be used at all.  It should probably have been named
 "unanalyzed", as that's more accurate.
 
 KeywordTokenizer (via KeywordTokenizerFactory) is probably what you
 are looking for.
 Create a new text field type with that as the tokenizer, followed by
 whatever filters you want (like lowercasing).
 
 -Yonik



Re: Commit failing with EOFException

2007-05-31 Thread Matt Mitchell
OK, figured this out. The short of it is: make sure your schema is
always up to date! : )


The schema did not match the XML docs being posted. And because we
had a previous Solr update with those docs, even trying to post/update
a <commit/> was failing because there was already bad data waiting to
be committed.


Matt

On May 31, 2007, at 11:42 AM, Matt Mitchell wrote:


Hi,

I've had this application running before and am not sure what has
changed to cause this error. When trying to do a clean update
(removed index dir and restarted Solr) with just a <commit/>, Solr
is returning a status of 1 with this error at the top:


java.io.EOFException: input contained no data

Does anyone have any idea as to why that's happening? The same  
thing occurs when I try to use the post.sh script with a valid xml  
file.


Thank you!

Matt




Re: AW: SOLR Indexing/Querying

2007-05-31 Thread Chris Hostetter

: It looks a lot like using Solr's standard WordDelimiterFilter (see the
: sample schema.xml) does what you need.

WordDelimiterFilter will only get you so far.  it can split the indexed
text of "3555LHP" into the tokens "3555" and "LHP", and the user-entered
"D3555" into the tokens "D" and "3555" -- but because those tokens
originated as part of a single chunk of input text, the QueryParser will
turn them into a phrase query, which will not match on the single token
"3555" ... the "D" just isn't there.

I can't think of any way to achieve what you want out of the box. i think
you'd need a custom RequestHandler that uses your own query parser which
builds boolean queries instead of PhraseQueries.


:  Keyword Typed In / We want it to find
: 
:  D3555 / 3555LHP
:  D460160-BN / D460160
:  D460160BN / D460160
:  Dd454557 / D454557
:  84200ORB / 84200
:  84200-ORB / 84200
:  T13420-SCH / T13420
:  t14240-ss / t14240




-Hoss



Re: Schema question: overriding fieldType attributes in field element

2007-05-31 Thread RWatkins
Chris Hostetter [EMAIL PROTECTED] wrote on 05/31/2007 02:28:58 PM:

 I'm having a little trouble following this discussion, first off as to
 your immediate issue...

 : Thanks, but I think I'm going to have to work out a different solution.
 : I have written my own analyzer that does everything I need: it's not a
 : different analyzer I need but a way to specify that certain fields
 : should be tokenized and others not -- while still leaving all other
 : options open.

 ...maybe there is some terminology confusion here ... if you've already
 got an Analyzer (capital A Lucene classname) then you can specify it for
 one fieldType, and use that field type for the fields you want analysis
 done.  if you have other fields where you don't want tokenizing/analysis
 done, use a different fieldType (with a StrField).

This is precisely what I've done (but see below for more).

 As for your followup question...

 : As far as the generic options parsing resulting in unused properties
 : in a SchemaField object: no, it is not specifically documented
 : anywhere, but the Solr Wiki says, for both fields and field types,
 : "Common options that fields can have are ...". I could not find a
 : definitive list anywhere of what is allowed/used or excluded, so I
 : went to the code and found that ...

 That's because there is no definitive list.  Every FieldType can define
 its own list of attributes that can be declared and handled by its own
 init method.
 
Unfortunately, unless I've missed something obvious, the "tokenized"
property is not available to classes that extend FieldType: the setArgs()
method of FieldType strips "tokenized" and other standard properties away
before calling the init() method. Yes, of course one could override 
setArgs(), but that's not a robust solution.

The terminology confusion stems (sorry, pun sort of not intended) from the
frequent overlap of the terms "tokenize" and "analyze". As I mentioned in
an earlier message on this thread, it is quite possible to create an
Analyzer that does all sorts of things without tokenizing or, more
precisely, that creates a single Token from the field value. I would posit
that tokenization and analysis are two separate things, albeit most
frequently done together.

-- Robert



Re: AW: SOLR Indexing/Querying

2007-05-31 Thread Walter Underwood
I solved something similar to this by creating a stemmer for part
numbers. Variations like -BN on the end can be treated as inflections
in the part number language, similar to plurals in English.

I used a set of regexes to match and transform, in some cases generating
multiple root part numbers. With the per-field analyzers in Solr, this
would work much better.
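
As a sketch of that idea with Solr's per-field analyzers (assuming a
pattern-replace token filter like solr.PatternReplaceFilterFactory is
available; the suffix list is just the finish codes from Dan's examples):

<fieldType name="partno" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="-?(bn|orb|sch|ss)$" replacement="" replace="first"/>
  </analyzer>
</fieldType>

Applied at both index and query time, "D460160-BN", "D460160BN" and
"D460160" would all reduce to the same root token "d460160".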

I'll make another search for the presentation that covers this. It was
at our Ultraseek Users Group Meeting in 1999.

wunder

On 5/31/07 11:46 AM, Chris Hostetter [EMAIL PROTECTED] wrote:

 
 : It looks a lot like using Solr's standard WordDelimiterFilter (see the
 : sample schema.xml) does what you need.

 WordDelimiterFilter will only get you so far.  it can split the indexed
 text of "3555LHP" into the tokens "3555" and "LHP", and the user-entered
 "D3555" into the tokens "D" and "3555" -- but because those tokens
 originated as part of a single chunk of input text, the QueryParser will
 turn them into a phrase query, which will not match on the single token
 "3555" ... the "D" just isn't there.

 I can't think of any way to achieve what you want out of the box. i think
 you'd need a custom RequestHandler that uses your own query parser which
 builds boolean queries instead of PhraseQueries.
 
 
 :  Keyword Typed In / We want it to find
 : 
 :  D3555 / 3555LHP
 :  D460160-BN / D460160
 :  D460160BN / D460160
 :  Dd454557 / D454557
 :  84200ORB / 84200
 :  84200-ORB / 84200
 :  T13420-SCH / T13420
 :  t14240-ss / t14240



RE: facet question

2007-05-31 Thread Gal Nitzan


 -----Original Message-----
 From: Mike Klaas [mailto:[EMAIL PROTECTED]
 Sent: Thursday, May 31, 2007 9:07 PM
 To: solr-user@lucene.apache.org
 Subject: Re: facet question

 On 31-May-07, at 1:33 AM, Gal Nitzan wrote:

  Hi,
 
  We have a small index with about 4 million docs.
 
  On this index we have a field "tags", which is a multi-valued field.
 
  Running a facet query on the index with something like:
  facet=true&facetField=tags&q=type:video takes about 1 minute.
 
  We have defined a large cache which enables the query to run much
  faster
  (about 1 sec)
 
  <filterCache
    class="solr.LRUCache"
    size="1500000"
    initialSize="600000"
    autowarmCount="300000"/>
 
 
  However, the cache size brings us to the 2GB limit.

 If the cardinality of many of the tags is low, you can use HashSet-
 based filters (the default size at which a HashSet is used is 3000).
[Gal Nitzan]

I would appreciate a pointer to documentation on HashSet-based filters,
thanks...



 Do you really have 1.5M unique values in that field?  Are you
 analyzing the field (you probably shouldn't be)?

[Gal Nitzan]
No, it is not analyzed. Just indexed and stored.




 -Mike
[Gal Nitzan]





Re: Schema question: overriding fieldType attributes in field element

2007-05-31 Thread Chris Hostetter

: Unfortunately, unless I've missed something obvious, the "tokenized"
: property is not available to classes that extend FieldType: the setArgs()
: method of FieldType strips "tokenized" and other standard properties away
: before calling the init() method. Yes, of course one could override
: setArgs(), but that's not a robust solution.

in an ideal world Solr would not strip that property from the Map, since
it doesn't care about it, but since it does, can't your init method just
call isTokenized() to determine its value (like any of the other
properties handled automatically) ... the built-in field types ignore it,
but you could write a custom FieldType that inspects it.

: The terminology confusion stems (sorry, pun sort of not intended) from the
: frequent overlap of the terms "tokenize" and "analyze". As I mentioned in
: an earlier message on this thread, it is quite possible to create an
: Analyzer that does all sorts of things without tokenizing or, more
: precisely, that creates a single Token from the field value. I would posit
: that tokenization and analysis are two separate things, albeit most
: frequently done together.

The semi-equivalence of the word "tokenize" when referring to fields and the
broader concept of Analysis originates with Lucene: in lucene you
declare a field TOKENIZED if you want the Analyzer used at all --
regardless of what the Analyzer does.  While i agree "ANALYZED" would
have been a better name for that constant, in practice the distinction is
so subtle it almost doesn't matter: what you describe as an Analyzer that
does all sorts of things without tokenizing i would call an Analyzer
that tokenizes its input into a single token, and then does all
sorts of things ... KeywordTokenizer works exactly like this.



-Hoss



Re: facet question

2007-05-31 Thread Mike Klaas

On 31-May-07, at 1:35 PM, Gal Nitzan wrote:



However, the cache size brings us to the 2GB limit.


If the cardinality of many of the tags is low, you can use HashSet-
based filters (the default size at which a HashSet is used is 3000).

[Gal Nitzan]

I would appreciate a pointer to documentation on HashSet-based filters,
thanks...


http://wiki.apache.org/solr/SolrConfigXml#head-ffe19c34abf267ca2d49d9e7102feab8c79b5fb5


Scroll down to the HashDocSet comment.
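
For reference, the relevant solrconfig.xml entry looks something like this
(the values shown are the defaults from the sample config):

<HashDocSet maxSize="3000" loadFactor="0.75"/>

Filter sets smaller than maxSize are stored as hash sets rather than
bitsets, which is much cheaper for sparse filters over a large index.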

I'm not sure how much this will help--it depends greatly on the  
distribution of your tag values.


Also, I'm still suspicious about your application.  You have 1.5M  
distinct tags for 4M documents?  That seems quite dense.


-Mike


RE: facet question

2007-05-31 Thread Gal Nitzan


 -----Original Message-----
 From: Mike Klaas [mailto:[EMAIL PROTECTED]
 Sent: Friday, June 01, 2007 12:36 AM
 To: solr-user@lucene.apache.org
 Subject: Re: facet question

 On 31-May-07, at 1:35 PM, Gal Nitzan wrote:

 
  However, the cache size brings us to the 2GB limit.
 
  If the cardinality of many of the tags is low, you can use HashSet-
  based filters (the default size at which a HashSet is used is 3000).
  [Gal Nitzan]
 
  I would appreciate a pointer to documentation on HashSet-based filters,
  thanks...

 http://wiki.apache.org/solr/SolrConfigXml#head-ffe19c34abf267ca2d49d9e7102feab8c79b5fb5
[Gal Nitzan]
Thanks...

 Scroll down to the HashDocSet comment.

 I'm not sure how much this will help--it depends greatly on the
 distribution of your tag values.

 Also, I'm still suspicious about your application.  You have 1.5M
 distinct tags for 4M documents?  That seems quite dense.
[Gal Nitzan]
Basically the facet query runs on each access to the home page (viewed in a
tag cloud), but it almost never changes...
Here are my cache stats:
description: LRU Cache(maxSize=1000000, initialSize=600000,
autowarmCount=300000,
[EMAIL PROTECTED])
stats:  lookups : 230538968
hits : 228863426
hitratio : 0.99
inserts : 1675543
evictions : 0
size : 944831
cumulative_lookups : 230538968
cumulative_hits : 228863426
cumulative_hitratio : 0.99
cumulative_inserts : 1675543
cumulative_evictions : 0


 -Mike




RE: RAMDirecotory instead of FSDirectory for SOLR

2007-05-31 Thread Jeryl Cook
Thats the thing,Terracotta persists everything it has in memory to the disk 
when it overflows(u can set how much u want to use in memory), or when the 
server goes offline.  When the server comes back the master terracotta simply 
loads it back into the memory of the once offline worker..identical to the 
approach SOLR already does to handle scalability, this allows unlimited 
storage of the items in memory, ... you just need to cluster the RAMDirectory 
according to the sample giving by TerracottaHowever i read some of the post 
here...I read some say:  i wonder how performance will be.,etci was 
trying to get it working..andload test the hell out it, and see how it acts 
with large amounts of data, and how it ompares with SOLR using typical 
FSDirectory approach.i plan to post findings..Jeryl Cook 



/^\ Pharaoh /^\ 

http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

Date: Thu, 31 May 2007 13:51:53 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: RE: RAMDirecotory instead of FSDirectory for SOLR

: board, looks like i can achieve this with the embedded version of SOLR
: uses the lucene RAMDirectory to store the index..Jeryl Cook

yeah ... adding a solrconfig.xml option for using a RAMDirectory would be
possible ... but almost meaningless for most people (the directory would
go away when the server shuts down) ... even for use cases like what you
describe (hooking in terracotta) it wouldn't be enough in itself, because
there would be no hook to give terracotta access to it.

-Hoss

RE: facet question

2007-05-31 Thread Chris Hostetter

:  Also, I'm still suspicious about your application.  You have 1.5M
:  distinct tags for 4M documents?  That seems quite dense.

it's possible the app is using the filterCache for other things (on other
fields) besides just the tag field ... but that still doesn't explain one
thing...

: description:   LRU Cache(maxSize=1000000, initialSize=600000,

...that doesn't look like it matches the config you posted earlier...

: <filterCache
:   class="solr.LRUCache"
:   size="1500000"  --- not 1000000

...either way if you have that many unique tags i think the HashDocSet
suggestion may be the best way to go, since each tag probably has a very
low cardinality (i can't imagine they'd be very high with that kind of
ratio)

I would also give serious thought to caching the solr results externally
(using squid or memcached or something like that) ... Solr will cache the
individual computations for you very well ... but for something like a tag
cloud you probably don't care about the exact numeric values that much,
and minor fluctuations as tags are added/removed (or new items come in)
aren't going to be a big issue.


-Hoss



RE: RAMDirecotory instead of FSDirectory for SOLR

2007-05-31 Thread Chris Hostetter

: That's the thing: Terracotta persists everything it has in memory to the
: disk when it overflows (you can set how much you want to use in memory),
: or when the server goes offline. When the server comes back, the master
: Terracotta simply loads it back into the memory of the once-offline

Sure ... but Terracotta needs some control over the RAMDirectory to do all
this ... i'm also assuming that for the "loading back into memory"
part to work when a server starts up, terracotta actually needs to *own*
the RAMDirectory, so just making solr create a RAMDirectory on startup
based on a config option wouldn't be enough ... there would need to be
some (probably complex) hooks to determine how that RAMDirectory gets
created/managed to make what you are describing work.


-Hoss



RE: RAMDirecotory instead of FSDirectory for SOLR

2007-05-31 Thread Jeryl Cook
i have Terracotta to work with Lucene , and it works find with the 
RAMDirectory...i am trying to get it to work with SOLR(Hook the 
RAMDirectory..)..., when i do, ill post the findings,problems,etc..Thanks for 
feedback from everyone.Jeryl Cook 



/^\ Pharaoh /^\ 

http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

Date: Thu, 31 May 2007 18:24:26 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: RE: RAMDirecotory instead of FSDirectory for SOLR

Jeryl,

If you need any help getting Terracotta to work under Lucene, or if you
have any questions about performance tuning and/or load testing, you can
also use the Terracotta community resources (mailing lists, forums, IRC,
whatnot): http://www.terracotta.org/confluence/display/orgsite/Community.
We'd be more than happy to help you get this stuff working.

Cheers,
Orion

Jeryl Cook wrote:

 That's the thing: Terracotta persists everything it has in memory to the
 disk when it overflows (you can set how much you want to use in memory),
 or when the server goes offline. When the server comes back, the master
 Terracotta simply loads it back into the memory of the once-offline
 worker. Identical to the approach SOLR already uses to handle
 scalability, this allows unlimited storage of the items in memory; you
 just need to cluster the RAMDirectory according to the sample given by
 Terracotta. However, I read some of the posts here... some say "I wonder
 how performance will be", etc. I was trying to get it working and load
 test the hell out of it, to see how it acts with large amounts of data
 and how it compares with SOLR using the typical FSDirectory approach. I
 plan to post findings.

 Jeryl Cook
 /^\ Pharaoh /^\
 http://pharaohofkush.blogspot.com/
 ..Act your age, and not your shoe size..
 -Prince(1986)

 Date: Thu, 31 May 2007 13:51:53 -0700
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: RE: RAMDirecotory instead of FSDirectory for SOLR

 : board, looks like i can achieve this with the embedded version of SOLR
 : uses the lucene RAMDirectory to store the index..Jeryl Cook

 yeah ... adding a solrconfig.xml option for using a RAMDirectory would
 be possible ... but almost meaningless for most people (the directory
 would go away when the server shuts down) ... even for use cases like
 what you describe (hooking in terracotta) it wouldn't be enough in
 itself, because there would be no hook to give terracotta access to it.

 -Hoss

--
View this message in context:
http://www.nabble.com/RAMDirecotory-instead-of-FSDirectory-for-SOLR-tf3843377.html#a10905062
Sent from the Solr - User mailing list archive at Nabble.com.