RE: Solr and terracotta

2007-08-22 Thread Jonathan Woods
Note that Hoss was earlier calling for someone to submit an implementation
of SolrDirectoryFactory...
http://www.nabble.com/forum/ViewPost.jtp?post=12260989&framed=y

Jon 

> -Original Message-
> From: Jonathan Ariel [mailto:[EMAIL PROTECTED] 
> Sent: 23 August 2007 03:23
> To: solr-user@lucene.apache.org
> Subject: Re: Solr and terracotta
> 
> If I am not wrong once you have the RAMDir feature mounting 
> Terracotta should be transparent and fast, right?
> 
> On 8/22/07, Orion Letizi <[EMAIL PROTECTED]> wrote:
> >
> >
> > Jeryl,
> >
> > I remember you asking about how to hook in the RAMDirectory 
> a while back.
> > It seemed like there was maybe some support within Solr that you 
> > needed.  I assume you're suggesting adding an issue in the 
> Solr  JIRA, 
> > right?
> >
> > Is there something that the Terracotta team can do to help?
> >
> > Cheers,
> > Orion
> >
> >
> > Jeryl Cook wrote:
> > >
> > > tried it, didn't work that well...so I ended up making my 
> own little 
> > > faceted Search engine directly using RAMDirectory and 
> clustering it 
> > > via Terracotta...not as good as SOLR(smile), but it worked.
> > > i actually posted some questions awhile back in trying to 
> get it to
> > work.
> > > so terracotta can "hook" the RAMDirectory, maybe be good 
> to submit 
> > > this
> > in
> > > JIRA for terrocotta support!
> > >
> > > Jeryl Cook
> > >  /^\ Pharaoh /^\
> > >
> > >
> > > http://pharaohofkush.blogspot.com/
> > >
> > >
> > >
> > > "..Act your age, and not your shoe size.."
> > >
> > > -Prince(1986)
> > >
> > >> Date: Wed, 22 Aug 2007 16:18:24 -0300
> > >> From: [EMAIL PROTECTED]
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Solr and terracotta
> > >>
> > >> Recently I ran into this topic. I googled it a little and didn't 
> > >> find much information.
> > >> It would be great to have solr working with RAMDirectory and
> > Terracotta.
> > >> We
> > >> could stop using crons for rsync, right?
> > >> Has anyone tried that out?
> > >
> > >
> >
> > --
> > View this message in context:
> > http://www.nabble.com/Solr-and-terracotta-tf4313531.html#a12283537
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> 



Constraining date facets

2007-08-22 Thread raikoe

Hello,

I am using faceting in a project and would like to do date faceting with
facet.date. That works fine, but it also returns dates that have no
matching pages underneath, i.e. the facet count equals 0. Is it possible to
constrain this to dates for which results exist, similar to
facet.mincount for regular facets? I tried the latter but did not succeed.
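For reference, a date-facet request of the kind described might be assembled like this (a sketch only; the field name "timestamp" and the date range values are hypothetical, not from the original message):

```python
from urllib.parse import urlencode

# Build a hypothetical date-facet request against a local Solr instance.
# Field name "timestamp" and the start/end/gap values are illustrative only.
params = urlencode({
    "q": "*:*",
    "facet": "true",
    "facet.date": "timestamp",
    "facet.date.start": "2007-01-01T00:00:00Z",
    "facet.date.end": "2007-09-01T00:00:00Z",
    "facet.date.gap": "+1MONTH",
})
url = "http://localhost:8983/solr/select?" + params
print(url)
```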

Thanks in advance
Raiko
-- 
View this message in context: 
http://www.nabble.com/Constraining-date-facets-tf4315743.html#a12288337
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Structured Lucene documents

2007-08-22 Thread Chris Hostetter

: aren't expandable at query time.  It would be quite cool if Solr could do
: query-time expansions of dynamic fields (e.g. hl.fl=page_*) however that
: would require some knowledge of the dynamic fields already stored in the
: index, which I don't think is currently available in either Solr or Lucene.

It is possible to get a list of all indexed fields from the underlying
Lucene IndexReader, so this is certainly doable ... the notion of
supporting "glob" syntax in all the situations where a list of field names
is used has been talked about before, but no one has attempted a
comprehensive patch yet.
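The "glob" expansion being discussed could look something like this sketch, assuming the list of indexed field names has already been pulled from the IndexReader (the field names below are invented):

```python
from fnmatch import fnmatch

def expand_field_globs(patterns, indexed_fields):
    """Expand glob patterns like 'page_*' against the field names
    actually present in the index (order-preserving, no duplicates)."""
    out = []
    for pat in patterns:
        for f in indexed_fields:
            if fnmatch(f, pat) and f not in out:
                out.append(f)
    return out

# Hypothetical field list, as might be reported by the IndexReader:
fields = ["id", "title", "page_1", "page_2", "page_10"]
print(expand_field_globs(["page_*", "title"], fields))
# -> ['page_1', 'page_2', 'page_10', 'title']
```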

Note the comments in this issue, and the two threads it links to...

http://issues.apache.org/jira/browse/SOLR-247



-Hoss



Re: How to extract constrained fields from query

2007-08-22 Thread Chris Hostetter

: in my custom request handler, I want to determine which fields are
: constrained by the user.
:
: E.g. the query (q) might be "ipod AND brand:apple" and there might
: be a filter query (fq) like "color:white" (or more).
:
: What I want to know is that "brand" and "color" are constrained.

Technically, the "ipod" keyword is field-constrained as well, via the
defaultSearchField.

: AFAICS I could use SolrPluginUtils.parseFilterQueries and test
: if the queries are TermQueries and read its Field.
: Then should I also test which kind of queries I get when parsing
: the query (q) and look for all TermQueries from the parsed query?

Are you specifically interested only in TermQueries? Wouldn't a range query
also be a user constraint?

: Or is there a more elegant way of doing this?

It's hard to be sure without a better understanding of exactly what your
custom handler needs to do, but my best guess is a custom QueryParser that
records all the field names it sees while parsing.
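As a toy illustration of that idea (a real implementation would subclass Lucene's QueryParser; the regex below only handles simple field:value clauses and is not a query parser):

```python
import re

def constrained_fields(*queries):
    """Collect field names that appear as 'field:value' clauses in the
    given query strings. Purely illustrative; a real solution would hook
    into the query parser rather than use a regex."""
    seen = set()
    for q in queries:
        seen.update(re.findall(r"\b(\w+):", q))
    return seen

print(sorted(constrained_fields("ipod AND brand:apple", "color:white")))
# -> ['brand', 'color']
```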



-Hoss



Re: Running into problems with distributed index and search

2007-08-22 Thread Chris Hostetter

: 3)  I had to bounce the tomcat search SOLR Webapp instance for it to
: read the index files, is it mandatory? In a distributed environment, do
: we always have to
:
: Bounce the SOLR Webapp instances to reflect the changes in the index
: files?

It sounds like you essentially have a master/slave setup, except that
instead of using the distribution scripts to copy the index from one to
the other, both use the same physical files via an NFS mount.

If you send a commit command to your "slave" search server, it will reopen
the index (without needing to bounce the webapp).


-Hoss



Re: almost realtime updates with replication

2007-08-22 Thread Chris Hostetter
:
: There are a couple queries that we would like to run almost realtime so
: I would like to have it so our client sends an update on every new
: document and then have solr configured to do an autocommit every 5-10
: seconds.
:
: reading the Wiki, it seems like this isn't possible because of the
: strain of snapshotting and pulling to the slaves at such a high rate.
: What I was thinking was for these few queries to just query the master
: and the rest can query the slave with the not realtime data, although
: I'm assuming this wouldn't work either because since a snapshot is
: created on every commit, we would still impact the performance too much?

There is no reason why a commit has to trigger a snapshot; that happens
only if you configure a postCommit hook to do so in your solrconfig.xml.

You can absolutely commit every 5 seconds but have a separate cron task
that runs snapshooter every 5 minutes -- you could even continue to run
snapshooter on every commit, and get a new snapshot every 5 seconds, but
only run snappuller on your slave machines every 5 minutes (the
snapshots are hardlinks and don't take up a lot of space, and snappuller
only needs to fetch the most recent snapshot).
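Picking the most recent snapshot is cheap because of the naming scheme: assuming snapshooter's snapshot.yyyymmddHHMMSS directory names, timestamps in that format sort lexicographically. A sketch:

```python
def latest_snapshot(names):
    """Return the most recent snapshot directory name, relying on the
    snapshot.yyyymmddHHMMSS naming used by snapshooter (assumed here):
    timestamps in that format sort lexicographically."""
    snaps = [n for n in names if n.startswith("snapshot.")]
    return max(snaps) if snaps else None

# Hypothetical directory listing of the data dir:
dirs = ["snapshot.20070822150500", "snapshot.20070822151000", "index"]
print(latest_snapshot(dirs))  # -> snapshot.20070822151000
```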

Your idea of querying the master directly for these queries seems
perfectly fine to me ... just make sure the autowarm count on the caches
on your master is very small so that new searchers are ready quickly after
each commit.




-Hoss



RE: SolJava --- which attachments are valid?

2007-08-22 Thread Chris Hostetter

: I noticed that some classes have API docs (.html) but no source code
: (.java).
: For example, there is a javadoc for
: org.apache.solr.client.solrj.util.ClientUtils
: but no ClientUtils.java:

I believe the issue is that none of the source from the client
directory is included in the builds at the moment ... I don't think we've
ever really figured out a general strategy for releasing any of the client
APIs.


-Hoss


Re: Solr and terracotta

2007-08-22 Thread Jonathan Ariel
If I am not wrong, once you have the RAMDir feature, mounting Terracotta
should be transparent and fast, right?

On 8/22/07, Orion Letizi <[EMAIL PROTECTED]> wrote:
>
>
> Jeryl,
>
> I remember you asking about how to hook in the RAMDirectory a while back.
> It seemed like there was maybe some support within Solr that you
> needed.  I
> assume you're suggesting adding an issue in the Solr  JIRA, right?
>
> Is there something that the Terracotta team can do to help?
>
> Cheers,
> Orion
>
>
> Jeryl Cook wrote:
> >
> > tried it, didn't work that well...so I ended up making my own little
> > faceted Search engine directly using RAMDirectory and clustering it via
> > Terracotta...not as good as SOLR(smile), but it worked.
> > i actually posted some questions awhile back in trying to get it to
> work.
> > so terracotta can "hook" the RAMDirectory, maybe be good to submit this
> in
> > JIRA for terrocotta support!
> >
> > Jeryl Cook
> >  /^\ Pharaoh /^\
> >
> >
> > http://pharaohofkush.blogspot.com/
> >
> >
> >
> > "..Act your age, and not your shoe size.."
> >
> > -Prince(1986)
> >
> >> Date: Wed, 22 Aug 2007 16:18:24 -0300
> >> From: [EMAIL PROTECTED]
> >> To: solr-user@lucene.apache.org
> >> Subject: Solr and terracotta
> >>
> >> Recently I ran into this topic. I googled it a little and didn't find
> >> much
> >> information.
> >> It would be great to have solr working with RAMDirectory and
> Terracotta.
> >> We
> >> could stop using crons for rsync, right?
> >> Has anyone tried that out?
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Solr-and-terracotta-tf4313531.html#a12283537
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: almost realtime updates with replication

2007-08-22 Thread Walter Underwood
At Infoseek, we ran a separate search index with today's updates
and merged that in once each day. It requires a little bit of
federated search to prefer the new content over the big index,
but the daily index can be very nimble for update.

wunder

On 8/22/07 7:58 AM, "mike topper" <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> Currently in our application we are using the master/slave setup and
> have a batch update/commit about every 5 minutes.
> 
> There are a couple queries that we would like to run almost realtime so
> I would like to have it so our client sends an update on every new
> document and then have solr configured to do an autocommit every 5-10
> seconds.
> 
> reading the Wiki, it seems like this isn't possible because of the
> strain of snapshotting and pulling to the slaves at such a high rate.
> What I was thinking was for these few queries to just query the master
> and the rest can query the slave with the not realtime data, although
> I'm assuming this wouldn't work either because since a snapshot is
> created on every commit, we would still impact the performance too much?
> 
> anyone have any suggestions?  If I set autowarmingCount=0 would I be
> able to to pull to the slave faster than every couple of minutes (say,
> every 10 seconds)?
> 
> what if I take out the postcommit hook on the master and just have the
> snapshooter run on a cron every 5 minutes?
> 
> -Mike
> 
> 



Re: Web statistics for solr?

2007-08-22 Thread Pieter Berkel
Matthew,

Maybe the SOLR Statistics page would suit your purpose?
(click on "statistics" from the main solr page or use the following url)
http://localhost:8983/solr/admin/stats.jsp

cheers,
Piete



On 23/08/07, Matthew Runo <[EMAIL PROTECTED]> wrote:
>
> Hello!
>
> I was wondering if anyone has written a script that displays any
> stats from SOLR.. queries per second, number of docs added.. this
> sort of thing.
>
> Sort of a general dashboard for SOLR.
>
> I'd rather not write it myself if I don't need to, and I didn't see
> anything conclusive in the archives for the email list.
>
> ++
>   | Matthew Runo
>   | Zappos Development
>   | [EMAIL PROTECTED]
>   | 702-943-7833
> ++
>
>
>


Re: defining fields to be returned when using mlt

2007-08-22 Thread Pieter Berkel
Hi Stefan,

Currently there is no way to specify the list of fields to be returned by
the MoreLikeThis handler.  I've been looking to address this issue in
https://issues.apache.org/jira/browse/SOLR-295 (point 3) however in the
broader scheme of things, it seems logical to wait until
https://issues.apache.org/jira/browse/SOLR-281 is resolved before making
changes to MLT.

cheers,
Piete



On 22/08/07, Stefan Rinner <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> Is there any way to define the numer/type of fields of the documents
> returned in the "moreLikeThis" part of the response, when "mlt" is
> set to true?
>
> Currently I'm using morelikethis to show the number and sources of
> similar documents - therefore I'd need only the "source" field of
> these similar documents and not everything.
>
> - stefan
>


Web statistics for solr?

2007-08-22 Thread Matthew Runo

Hello!

I was wondering if anyone has written a script that displays any  
stats from SOLR.. queries per second, number of docs added.. this  
sort of thing.


Sort of a general dashboard for SOLR.

I'd rather not write it myself if I don't need to, and I didn't see  
anything conclusive in the archives for the email list.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++




RE: Solr and terracotta

2007-08-22 Thread Orion Letizi

Jeryl,

I remember you asking about how to hook in the RAMDirectory a while back.
It seemed like there was maybe some support within Solr that you needed.  I
assume you're suggesting adding an issue in the Solr JIRA, right?

Is there something that the Terracotta team can do to help?

Cheers,
Orion


Jeryl Cook wrote:
> 
> tried it, didn't work that well...so I ended up making my own little
> faceted Search engine directly using RAMDirectory and clustering it via
> Terracotta...not as good as SOLR(smile), but it worked.
> i actually posted some questions awhile back in trying to get it to work.
> so terracotta can "hook" the RAMDirectory, maybe be good to submit this in
> JIRA for terrocotta support!
> 
> Jeryl Cook 
>  /^\ Pharaoh /^\ 
> 
> 
> http://pharaohofkush.blogspot.com/ 
> 
> 
> 
> "..Act your age, and not your shoe size.."
> 
> -Prince(1986)
> 
>> Date: Wed, 22 Aug 2007 16:18:24 -0300
>> From: [EMAIL PROTECTED]
>> To: solr-user@lucene.apache.org
>> Subject: Solr and terracotta
>> 
>> Recently I ran into this topic. I googled it a little and didn't find
>> much
>> information.
>> It would be great to have solr working with RAMDirectory and Terracotta.
>> We
>> could stop using crons for rsync, right?
>> Has anyone tried that out?
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-and-terracotta-tf4313531.html#a12283537
Sent from the Solr - User mailing list archive at Nabble.com.



How to extract constrained fields from query

2007-08-22 Thread Martin Grotzke
Hello,

in my custom request handler, I want to determine which fields are
constrained by the user.

E.g. the query (q) might be "ipod AND brand:apple" and there might
be a filter query (fq) like "color:white" (or more).

What I want to know is that "brand" and "color" are constrained.

AFAICS I could use SolrPluginUtils.parseFilterQueries and test
if the queries are TermQueries and read its Field.
Then should I also test which kind of queries I get when parsing
the query (q) and look for all TermQueries from the parsed query?

Or is there a more elegant way of doing this?

Thanx a lot,
cheers,
Martin






Running into problems with distributed index and search

2007-08-22 Thread Kasi Sankaralingam
Hi All,

 

This is the scenario: I have two SOLR search instances running on two
different partitions. I am treating one of the servers as strictly
read-only (the search server) and the other instance (the index server)
for indexing. The index data directory resides on an NFS partition. I am
running into the following problems:

1)  The index dir is /indexdata/data. When I index using the index server,
it honors the data dir set in solrconfig.xml, writes the index files to
that location, and is able to read them (I can run queries via SOLR
Admin).

2)  The search server respects the NFS directory but does not read the
index files; SOLR Admin returns no search results. I had to create a
symlink under $SOLRHOME pointing to the NFS partition to make it work.

3)  I had to bounce the Tomcat SOLR webapp instance for it to read the
index files. Is that mandatory? In a distributed environment, do we always
have to bounce the SOLR webapp instances to reflect changes in the index
files?

 

Any help/suggestions would be greatly appreciated.

 

Thanks,

 

kasi



Re: Solr and terracotta

2007-08-22 Thread Jonathan Ariel
How come it didn't work? How did you add RAMDir support to Solr?

On 8/22/07, Jeryl Cook <[EMAIL PROTECTED]> wrote:
>
> tried it, didn't work that well...so I ended up making my own little
> faceted Search engine directly using RAMDirectory and clustering it via
> Terracotta...not as good as SOLR(smile), but it worked.
> i actually posted some questions awhile back in trying to get it to work.
> so terracotta can "hook" the RAMDirectory, maybe be good to submit this in
> JIRA for terrocotta support!
>
> Jeryl Cook
> /^\ Pharaoh /^\
>
>
> http://pharaohofkush.blogspot.com/
>
>
>
> "..Act your age, and not your shoe size.."
>
> -Prince(1986)
>
> > Date: Wed, 22 Aug 2007 16:18:24 -0300
> > From: [EMAIL PROTECTED]
> > To: solr-user@lucene.apache.org
> > Subject: Solr and terracotta
> >
> > Recently I ran into this topic. I googled it a little and didn't find
> much
> > information.
> > It would be great to have solr working with RAMDirectory and Terracotta.
> We
> > could stop using crons for rsync, right?
> > Has anyone tried that out?
>


RE: Solr and terracotta

2007-08-22 Thread Jeryl Cook
Tried it; it didn't work that well, so I ended up building my own little
faceted search engine directly on RAMDirectory and clustering it via
Terracotta. Not as good as Solr (smile), but it worked.
I actually posted some questions a while back trying to get it to work.
Since Terracotta can "hook" the RAMDirectory, it might be good to submit
this in JIRA for Terracotta support!

Jeryl Cook 
 /^\ Pharaoh /^\ 


http://pharaohofkush.blogspot.com/ 



"..Act your age, and not your shoe size.."

-Prince(1986)

> Date: Wed, 22 Aug 2007 16:18:24 -0300
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Solr and terracotta
> 
> Recently I ran into this topic. I googled it a little and didn't find much
> information.
> It would be great to have solr working with RAMDirectory and Terracotta. We
> could stop using crons for rsync, right?
> Has anyone tried that out?


Re: Solr scoring: relative or absolute?

2007-08-22 Thread Sean Timm




Indexes cannot be directly compared unless they have similar collection
statistics -- that is, the same terms occur with the same frequency
across all indexes, and the average document lengths are about the same
(though the default similarity in Lucene may not care about average
document length -- I'm not sure).
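A quick illustration of why the statistics matter: with Lucene's classic idf formula, 1 + ln(N / (df + 1)), the same term gets a very different weight in two partitions with different document counts and term frequencies (the numbers below are made up):

```python
import math

def idf(num_docs, doc_freq):
    # Lucene's classic DefaultSimilarity idf formula.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical per-partition statistics for the same term:
print(round(idf(1_000_000, 999), 3))  # big index, term is rare    -> 7.908
print(round(idf(10_000, 999), 3))     # small index, term is common -> 3.303
```

So a score of, say, 4.0 from the small partition and 4.0 from the big one do not mean the same thing, which is the partitioning problem SOLR-303 addresses from the search side.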

SOLR-303 is an attempt to solve the
partitioning issue from the search side of things.

-Sean

Lance Norskog wrote:

  Are the score values generated in Solr relative to the index or are they
against an absolute standard?
Is it possible to create a scoring algorithm with this property? Are there
parts of the score inputs that are absolute?
 
My use case is this: I would like to do a parallel search against two Solr
indexes, and combine the results. The two indexes are built with the same
data sources, we just can't handle one giant index. If the score values are
against a common 'scale', then scores from the two search indexes can be
compared. I could combine the result sets with a simple merge by score.
 
This is a difficult concept to explain. I hope I have succeeded.
 
Thanks,
 
Lance

  





Solr scoring: relative or absolute?

2007-08-22 Thread Lance Norskog
Are the score values generated in Solr relative to the index or are they
against an absolute standard?
Is it possible to create a scoring algorithm with this property? Are there
parts of the score inputs that are absolute?
 
My use case is this: I would like to do a parallel search against two Solr
indexes, and combine the results. The two indexes are built with the same
data sources, we just can't handle one giant index. If the score values are
against a common 'scale', then scores from the two search indexes can be
compared. I could combine the result sets with a simple merge by score.
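Assuming the scores really were on a common scale, the merge itself is a single pass over the two sorted result lists; a sketch (doc ids and scores invented):

```python
from heapq import merge

# Each partition returns (score, doc_id) pairs sorted by descending score.
part_a = [(9.1, "a1"), (4.2, "a2"), (1.0, "a3")]
part_b = [(7.7, "b1"), (3.9, "b2")]

# heapq.merge performs a lazy single pass over both sorted inputs.
combined = list(merge(part_a, part_b, key=lambda p: p[0], reverse=True))
print([doc for _, doc in combined])
# -> ['a1', 'b1', 'a2', 'b2', 'a3']
```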
 
This is a difficult concept to explain. I hope I have succeeded.
 
Thanks,
 
Lance


Solr and terracotta

2007-08-22 Thread Jonathan Ariel
Recently I ran into this topic. I googled it a little and didn't find much
information.
It would be great to have solr working with RAMDirectory and Terracotta. We
could stop using crons for rsync, right?
Has anyone tried that out?


RE: SolJava --- which attachments are valid?

2007-08-22 Thread Teruhiko Kurosaka
Sorry for revisiting this 3-week-old thread.
I downloaded the nightly yesterday.
I noticed that some classes have API docs (.html) but no source code
(.java).
For example, there is a javadoc for
org.apache.solr.client.solrj.util.ClientUtils
but no ClientUtils.java:

bash-3.00$ find . -type f | grep Client
./docs/api-solrj/org/apache/solr/client/solrj/util/class-use/ClientUtils.html
./docs/api-solrj/org/apache/solr/client/solrj/util/ClientUtils.html

Is this a packaging problem, or is it intentional?

-kuro

> -Original Message-
> From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
> Sent: Friday, August 03, 2007 12:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SolJava --- which attachments are valid?
> 
> Teruhiko Kurosaka wrote:
> >> or you can get it from the nightly builds in:
> >> http://people.apache.org/builds/lucene/solr/nightly/
> > 
> > For those of you who are interested...
> > 
> > As far as I can tell by inspecting the source code in Trunk,
> > solrj.jar from the nightly doesn't seem to work with Solr 1.2.
> > For one thing, there is a new layer org.apache.solr.common
> > and org.apache.util has become a sub component under
> > the common. Things like SolrInputDocument do not exist
> > in Solr 1.2 at all. 
> > 
> 
> To run solrj, you need:
>   apache-solr-1.3-dev-common.jar
>   apache-solr-1.3-dev-solrj.jar
>   and all the files in: solrj-lib
> 
> You *should* be able to use the client against a server that 
> is running 
> 1.2, but I don't make any promises there.
> 
> ryan
> 


Apache web server logs in solr

2007-08-22 Thread Andrew Nagy
Hello, I was thinking that Solr -- with its built-in faceting -- would make for a
great Apache log file storage system.  I was wondering if anyone knows of any
module or library for Apache that writes log files directly to Solr or to a
Lucene index?
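No such module comes to mind, but turning an access log line into a Solr-style document is mostly a parsing exercise; a sketch (the regex covers only the Common Log Format, and the field names are invented):

```python
import re

LINE = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

# Minimal Common Log Format parser; real logs need a more careful regex.
CLF = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<bytes>\d+)'
)

def to_doc(line):
    """Return a dict of field -> value, ready to post as a Solr doc."""
    m = CLF.match(line)
    return m.groupdict() if m else None

doc = to_doc(LINE)
print(doc["path"], doc["status"])  # -> /apache_pb.gif 200
```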

Thanks
Andrew


RE: Query optimisation - multiple filter caches?

2007-08-22 Thread Jonathan Woods
Not high priority, but a few thoughts occur, then:

- perhaps it would be better to use org.apache.lucene.search.Searcher by
composition and have SolrIndexSearcher merely implement Searchable.

- or... perhaps search(...) should perform optimally cache-aware searches -
else integrators might wrongly think they're getting the full power of Solr.

Jon

> -Original Message-
> From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
> Sent: 22 August 2007 17:36
> 
> On 8/22/07, Jonathan Woods <[EMAIL PROTECTED]> wrote:
> > I notice that LuceneQueryOptimizer is still used in 
> > SolrIndexSearcher.search(Query, Filter, Sort) - is the idea 
> then that 
> > this method is deprecated,
> 
> Hmmm, so it is.  I hadn't noticed because that method is not 
> called from any query handlers AFAIK (not since the first 
> versions of solr before it went open source).
> The method itself shouldn't be deprecated because it's part 
> of the Lucene IndexSearcher interface.



Re: Query optimisation - multiple filter caches?

2007-08-22 Thread Yonik Seeley
On 8/22/07, Jonathan Woods <[EMAIL PROTECTED]> wrote:
> I notice that LuceneQueryOptimizer is still used in
> SolrIndexSearcher.search(Query, Filter, Sort) - is the idea then that this
> method is deprecated,

Hmmm, so it is.  I hadn't noticed because that method is not called
from any query handlers AFAIK (not since the first versions of solr
before it went open source).
The method itself shouldn't be deprecated because it's part of the
Lucene IndexSearcher interface.

> or that the config parameter
> query/boolTofilterOptimizer is no longer to be used?

That should probably be removed from the example schema... thanks for
pointing that out.

-Yonik


RE: Query optimisation - multiple filter caches?

2007-08-22 Thread Jonathan Woods
I understand - thanks, Yonik.

I notice that LuceneQueryOptimizer is still used in
SolrIndexSearcher.search(Query, Filter, Sort) - is the idea then that this
method is deprecated, or that the config parameter
query/boolTofilterOptimizer is no longer to be used?  As for the other
search() methods, they just delegate directly to
org.apache.lucene.search.IndexSearcher, so no use of caches there.

Jon

> -Original Message-
> From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
> Sent: 16 August 2007 01:40
> To: solr-user@lucene.apache.org
> Subject: Re: Query optimisation - multiple filter caches?
> 
> On 8/15/07, Jonathan Woods <[EMAIL PROTECTED]> wrote:
> > I'm trying to understand how best to integrate directly with Solr 
> > (Java-to-Java in the same JVM) to make the most of its query 
> > optimisation - chiefly, its caching of queries which merely filter 
> > rather than rank results.
> >
> > I notice that SolrIndexSearcher maintains a filter cache 
> and so does 
> > LuceneQueryOptimiser.  Shouldn't they be contributing to/using the 
> > same cache, or are they used for different things?
> 
> LuceneQueryOptimiser is no longer used since one can directly 
> specify filters via fq parameters.
> 
> -Yonik
> 
> 
> 



almost realtime updates with replication

2007-08-22 Thread mike topper

Hello,

Currently in our application we are using the master/slave setup and 
have a batch update/commit about every 5 minutes.


There are a couple queries that we would like to run almost realtime so 
I would like to have it so our client sends an update on every new 
document and then have solr configured to do an autocommit every 5-10 
seconds.


reading the Wiki, it seems like this isn't possible because of the 
strain of snapshotting and pulling to the slaves at such a high rate.  
What I was thinking was for these few queries to just query the master 
and the rest can query the slave with the not realtime data, although 
I'm assuming this wouldn't work either because since a snapshot is 
created on every commit, we would still impact the performance too much?


Anyone have any suggestions?  If I set autowarmingCount=0, would I be
able to pull to the slave faster than every couple of minutes (say,
every 10 seconds)?


what if I take out the postcommit hook on the master and just have the 
snapshooter run on a cron every 5 minutes?


-Mike




Re: Indexing HTML content... (Embed HTML into XML?)

2007-08-22 Thread Ravish Bhagdev
Thanks Jérôme!

It seems to work now.  I just hope the provided
HTMLStripWhitespaceTokenizerFactory will strip the right tags now.

I use Java and used HtmlEncoder provided in
http://itext.ugent.be/library/api/  for encoding with success. (just
in case someone happens to search this thread)

Ravi

On 8/22/07, Jérôme Etévé <[EMAIL PROTECTED]> wrote:
> You need to encode your html content so it can be include as a normal
> 'string' value in your xml element.
>
> As far as remember, the only unsafe characters you have to encode as
> entities are:
> <  -> &lt;
> > -> &gt;
> " -> &quot;
> & -> &amp;
>
> (google xml entities to be sure).
>
> I dont know what language you use , but for perl for instance, you can
> use something like:
> use HTML::Entities ;
> my $xmlString = encode_entities($rawHTML  , '<>&"' );
>
> Also you need to make sure your Html is encoded in UTF-8 . To comply
> with solr need for UTF-8 encoded xml.
>
> I hope it helps.
>
> J.
>
> On 8/22/07, Ravish Bhagdev <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > Sorry for stupid question.  I'm trying to index html file as one of
> > the fields in Solr, I've setup appropriate analyzer in schema but I'm
> > not sure how to add html content to Solr.  Encapsulating HTML content
> > within field tag is obviously not valid.  How do I add html content?
> > Hope the query is clear
> >
> > Thanks,
> > Ravi
> >
>
>
> --
> Jerome Eteve.
> [EMAIL PROTECTED]
> http://jerome.eteve.free.fr/
>


Re: Indexing HTML content... (Embed HTML into XML?)

2007-08-22 Thread Jérôme Etévé
You need to encode your HTML content so it can be included as a normal
'string' value in your XML element.

As far as I remember, the only unsafe characters you have to encode as
entities are:
<  -> &lt;
> -> &gt;
" -> &quot;
& -> &amp;

(google xml entities to be sure).

I don't know what language you use, but in Perl, for instance, you can
use something like:
use HTML::Entities ;
my $xmlString = encode_entities($rawHTML  , '<>&"' );

Also, you need to make sure your HTML is encoded in UTF-8, to comply
with Solr's need for UTF-8 encoded XML.
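For comparison, the same escaping is available in Python's standard library (a sketch; the sample HTML string is invented):

```python
from xml.sax.saxutils import escape

raw_html = '<p class="intro">Fish & chips</p>'
# escape() handles &, <, and >; the entities map adds the quote character
# so the value is also safe inside XML attribute values.
safe = escape(raw_html, {'"': "&quot;"})
print(safe)
# -> &lt;p class=&quot;intro&quot;&gt;Fish &amp; chips&lt;/p&gt;
```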

I hope it helps.

J.

On 8/22/07, Ravish Bhagdev <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Sorry for stupid question.  I'm trying to index html file as one of
> the fields in Solr, I've setup appropriate analyzer in schema but I'm
> not sure how to add html content to Solr.  Encapsulating HTML content
> within field tag is obviously not valid.  How do I add html content?
> Hope the query is clear
>
> Thanks,
> Ravi
>


-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


Indexing HTML content... (Embed HTML into XML?)

2007-08-22 Thread Ravish Bhagdev
Hello,

Sorry for the stupid question.  I'm trying to index an HTML file as one of
the fields in Solr. I've set up the appropriate analyzer in the schema, but
I'm not sure how to add the HTML content to Solr: encapsulating raw HTML
within the field tag is obviously not valid XML.  How do I add HTML content?
I hope the query is clear.

Thanks,
Ravi


Re: Replacing existing documents

2007-08-22 Thread Erik Hatcher


On Aug 21, 2007, at 9:25 PM, Lance Norskog wrote:

Recently someone mentioned that it would be possible to have a  
'replace
existing document' feature rather than just dropping and adding  
documents

with the same unique id.


There is such a patch: https://issues.apache.org/jira/browse/SOLR-139

I'm experimenting with it right now and it works well for my cases.

However, it is still under the covers a delete/add and

One use case is that we would like to use the index as our one  
database for
documents, and if we delete a document we want it to stay deleted.  
Thus we
would mark it deleted and check for its existence. Another use case  
is that
we are re-adding the same document a few times a day, and the  
commit times

are ballooning.


...you still have to commit for changes to be visible.

Erik



defining fields to be returned when using mlt

2007-08-22 Thread Stefan Rinner

Hi

Is there any way to define the number/type of fields of the documents
returned in the "moreLikeThis" part of the response, when "mlt" is
set to true?


Currently I'm using morelikethis to show the number and sources of  
similar documents - therefore I'd need only the "source" field of  
these similar documents and not everything.


- stefan


Major update to Solrsharp

2007-08-22 Thread Jeff Rodenburg
A big update was just posted to the Solrsharp project.  This update now
provides for first-class support for highlighting in the library.

The implementation is really robust and provides the following features:

   - Structured highlight parameter assignment based on the SolrField
   object
   - Full access for all highlight parameters, on both an aggregate and
   per-field basis
   - Incorporation of highlighted values into the base search result
   records

All of the supplied documentation has been updated, as has the example
application, to cover use of the highlighting classes.

Please report any issues through JIRA.  Be sure to associate any issues with
the "C# client" component.

cheers,
jeff r.


RE: Replacing existing documents

2007-08-22 Thread Ard Schrijvers
Hello,

"Recently someone mentioned that it would be possible to have a 'replace
existing document' feature rather than just dropping and adding documents
with the same unique id."

AFAIK, this is not possible. Lucene has an update operation, but internally it
just does a delete/add.

"We have a few use cases in this area and I'm
researching whether it is effective to check for a document via Solr
queries, or whether it is worthwhile to add this to the Solr implementation."

What are the use cases? I do not see what you mean.

"Does anyone have an estimate for the difference between querying, say, 100
documents by unique ID over the network v.s. fetching them directly from the
index?"

It depends on the network, of course; fetching them from the index is normally
fast.
 
"One use case is that we would like to use the index as our one database for
documents, and if we delete a document we want it to stay deleted. Thus we
would mark it deleted and check for its existence."

I suppose you mark it deleted by setting some flag (like a Lucene Field
isDeleted set to true). I am not sure whether using the Lucene index as your
database is really smart... it might get corrupt. I would at least suggest
backing it up frequently.
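The "stay deleted" bookkeeping can also live outside the index; a toy sketch of a tombstone check before re-adding (all names hypothetical):

```python
tombstones = set()

def delete(doc_id):
    """Record the id so later re-adds of the same document are refused."""
    tombstones.add(doc_id)

def should_index(doc_id):
    """Only index documents that were never explicitly deleted."""
    return doc_id not in tombstones

delete("doc-42")
print(should_index("doc-42"), should_index("doc-7"))
# -> False True
```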

Regards Ard

PS: sorry for my annoying "..", I am using a web mail client.

"Another use case is that we are re-adding the same document a few times a day, 
and the commit times
are ballooning.

 
Where would I implement this?
 
Thanks,
 
Lance"