RE: Solr and terracotta
Note that Hoss was earlier calling for someone to submit an implementation of SolrDirectoryFactory... http://www.nabble.com/forum/ViewPost.jtp?post=12260989&framed=y

Jon

> -----Original Message-----
> From: Jonathan Ariel [mailto:[EMAIL PROTECTED]]
> Sent: 23 August 2007 03:23
> To: solr-user@lucene.apache.org
> Subject: Re: Solr and terracotta
>
> If I am not wrong, once you have the RAMDir feature, mounting
> Terracotta should be transparent and fast, right?
Constraining date facets
Hello, I am using faceting in a project and would like to do date faceting with facet.date. That works fine, but it also returns dates which have no resulting pages underneath, i.e. the facet count equals 0. Is it possible to constrain this to just the dates for which results exist, similar to facet.mincount for usual facets? I tried the latter but did not succeed. Thanks in advance, Raiko -- View this message in context: http://www.nabble.com/Constraining-date-facets-tf4315743.html#a12288337 Sent from the Solr - User mailing list archive at Nabble.com.
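For reference, the kind of request being described might be assembled as in the sketch below; the field name "pubdate" and the date range are illustrative, and note that the "+" in gap values like +1MONTH must be URL-escaped as %2B:

```java
// Sketch of a date-facet request, assuming a date field named "pubdate"
// (illustrative) and the standard select handler.
public class DateFacetParams {
    static String buildQuery(String field, String start, String end, String gap) {
        // As of Solr 1.2, facet.mincount appears to apply only to regular
        // field facets, not to facet.date buckets, which would explain why
        // the zero-count entries still come back.
        return "select?q=*:*&facet=true"
             + "&facet.date=" + field
             + "&facet.date.start=" + start
             + "&facet.date.end=" + end
             + "&facet.date.gap=" + gap;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("pubdate",
            "2007-01-01T00:00:00Z", "2007-09-01T00:00:00Z", "%2B1MONTH"));
    }
}
```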
Re: Structured Lucene documents
: aren't expandable at query time. It would be quite cool if Solr could do
: query-time expansions of dynamic fields (e.g. hl.fl=page_*) however that
: would require some knowledge of the dynamic fields already stored in the
: index, which I don't think is currently available in either Solr or Lucene.

it is possible to get a list of all indexed fields from the underlying Lucene IndexReader, so it's certainly possible .. the notion of supporting "glob" syntax in all the situations where a list of field names is used has been talked about before, but no one has attempted a comprehensive patch yet.

note the comments in this issue, and the two threads it links to...

http://issues.apache.org/jira/browse/SOLR-247

-Hoss
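The glob expansion being discussed could be sketched roughly as follows; in a real patch the candidate names would come from IndexReader.getFieldNames(FieldOption.INDEXED), while this standalone sketch substitutes a fixed list (and naively assumes field names contain no regex metacharacters other than the glob's "*"):

```java
import java.util.*;

public class GlobFields {
    // Expand a simple "*" glob (e.g. "page_*") against a set of field names.
    static List<String> expand(String glob, Collection<String> indexedFields) {
        // Translate the glob into a regex; "_" and letters pass through as-is.
        String regex = glob.replace("*", ".*");
        List<String> out = new ArrayList<String>();
        for (String f : indexedFields) {
            if (f.matches(regex)) out.add(f);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for IndexReader.getFieldNames(FieldOption.INDEXED).
        List<String> fields = Arrays.asList("id", "title", "page_1", "page_2");
        System.out.println(expand("page_*", fields)); // [page_1, page_2]
    }
}
```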
Re: How to extract constrained fields from query
: in my custom request handler, I want to determine which fields are
: constrained by the user.
:
: E.g. the query (q) might be "ipod AND brand:apple" and there might
: be a filter query (fq) like "color:white" (or more).
:
: What I want to know is that "brand" and "color" are constrained.

technically the "ipod" keyword is field constrained as well, using the defaultSearchField.

: AFAICS I could use SolrPluginUtils.parseFilterQueries and test
: if the queries are TermQueries and read its Field.
: Then should I also test which kind of queries I get when parsing
: the query (q) and look for all TermQueries from the parsed query?

are you specifically only interested in TermQueries? wouldn't a range query also be a user constraint?

: Or is there a more elegant way of doing this?

it's hard to be sure without a better understanding of exactly what your custom handler needs to do, but my best guess is a custom QueryParser that records all the FieldNames it sees when parsing.

-Hoss
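The "record the field names seen while parsing" idea might be illustrated with a toy scanner; a real implementation would subclass Lucene's QueryParser and hook its getFieldQuery/getRangeQuery methods, which this deliberately simplified sketch (no parentheses, phrases, or escapes) does not do:

```java
import java.util.*;

public class ConstrainedFields {
    // Collect the field names constrained by a simple "field:value" query
    // string; bare terms fall back to the default search field.
    static Set<String> fields(String q, String defaultField) {
        Set<String> seen = new LinkedHashSet<String>();
        for (String tok : q.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR") || tok.equals("NOT")) continue;
            int i = tok.indexOf(':');
            seen.add(i > 0 ? tok.substring(0, i) : defaultField);
        }
        return seen;
    }

    public static void main(String[] args) {
        // "ipod AND brand:apple" constrains the default field and "brand".
        System.out.println(fields("ipod AND brand:apple", "text")); // [text, brand]
    }
}
```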
Re: Running into problems with distributed index and search
: 3) I had to bounce the tomcat search SOLR Webapp instance for it to
: read the index files, is it mandatory? In a distributed environment, do
: we always have to bounce the SOLR Webapp instances to reflect the
: changes in the index files?

it sounds like you essentially have a master/slave setup except that instead of using the distribution scripts to copy the index from one to the other, they both use the same physical files via an NFS mount.

if you send a commit command to your "slave" search server, it will reopen the index (without needing to bounce the port)

-Hoss
Re: almost realtime updates with replication
: There are a couple queries that we would like to run almost realtime so
: I would like to have it so our client sends an update on every new
: document and then have solr configured to do an autocommit every 5-10
: seconds.
:
: reading the Wiki, it seems like this isn't possible because of the
: strain of snapshotting and pulling to the slaves at such a high rate.
: What I was thinking was for these few queries to just query the master
: and the rest can query the slave with the not realtime data, although
: I'm assuming this wouldn't work either because since a snapshot is
: created on every commit, we would still impact the performance too much?

there is no reason why a commit has to trigger a snapshot, that happens only if you configure a postCommit hook to do so in your solrconfig.xml

you can absolutely commit every 5 seconds, but have a separate cron task that runs snapshooter every 5 minutes -- you could even continue to run snapshooter on every commit, and get a new snapshot every 5 seconds, but only run snappuller on your slave machines every 5 minutes (the snapshots are hardlinks and don't take up a lot of space, and snappuller only needs to fetch the most recent snapshot)

your idea of querying the master directly for these queries seems perfectly fine to me ... just make sure the autowarm count on the caches on your master is very tiny so the new searchers are ready quickly after each commit.

-Hoss
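The cron arrangement described might look like the following fragment; the paths and schedule are illustrative, and snapshooter/snappuller/snapinstaller are the stock distribution scripts shipped with Solr:

```
# master crontab: take a snapshot every 5 minutes, with the postCommit
# snapshooter hook removed from solrconfig.xml
*/5 * * * * /var/solr/bin/snapshooter

# slave crontab: pull and install the newest snapshot every 5 minutes
*/5 * * * * /var/solr/bin/snappuller && /var/solr/bin/snapinstaller
```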
RE: SolJava --- which attachments are valid?
: I noticed that some classes have API docs (.html) but no source code
: (.java).
: For example, there is a javadoc for
: org.apache.solr.client.solrj.util.ClientUtils
: but no ClientUtils.java:

i believe the issue is that none of the source from the client directory is included in the builds at the moment ... i don't think we've ever really figured out a general strategy for releasing any of the client APIs

-Hoss
Re: Solr and terracotta
If I am not wrong once you have the RAMDir feature mounting Terracotta should be transparent and fast, right? On 8/22/07, Orion Letizi <[EMAIL PROTECTED]> wrote: > > > Jeryl, > > I remember you asking about how to hook in the RAMDirectory a while back. > It seemed like there was maybe some support within Solr that you > needed. I > assume you're suggesting adding an issue in the Solr JIRA, right? > > Is there something that the Terracotta team can do to help? > > Cheers, > Orion > > > Jeryl Cook wrote: > > > > tried it, didn't work that well...so I ended up making my own little > > faceted Search engine directly using RAMDirectory and clustering it via > > Terracotta...not as good as SOLR(smile), but it worked. > > i actually posted some questions awhile back in trying to get it to > work. > > so terracotta can "hook" the RAMDirectory, maybe be good to submit this > in > > JIRA for terrocotta support! > > > > Jeryl Cook > > /^\ Pharaoh /^\ > > > > > > http://pharaohofkush.blogspot.com/ > > > > > > > > "..Act your age, and not your shoe size.." > > > > -Prince(1986) > > > >> Date: Wed, 22 Aug 2007 16:18:24 -0300 > >> From: [EMAIL PROTECTED] > >> To: solr-user@lucene.apache.org > >> Subject: Solr and terracotta > >> > >> Recently I ran into this topic. I googled it a little and didn't find > >> much > >> information. > >> It would be great to have solr working with RAMDirectory and > Terracotta. > >> We > >> could stop using crons for rsync, right? > >> Has anyone tried that out? > > > > > > -- > View this message in context: > http://www.nabble.com/Solr-and-terracotta-tf4313531.html#a12283537 > Sent from the Solr - User mailing list archive at Nabble.com. > >
Re: almost realtime updates with replication
At Infoseek, we ran a separate search index with today's updates and merged that in once each day. It requires a little bit of federated search to prefer the new content over the big index, but the daily index can be very nimble for update. wunder On 8/22/07 7:58 AM, "mike topper" <[EMAIL PROTECTED]> wrote: > Hello, > > Currently in our application we are using the master/slave setup and > have a batch update/commit about every 5 minutes. > > There are a couple queries that we would like to run almost realtime so > I would like to have it so our client sends an update on every new > document and then have solr configured to do an autocommit every 5-10 > seconds. > > reading the Wiki, it seems like this isn't possible because of the > strain of snapshotting and pulling to the slaves at such a high rate. > What I was thinking was for these few queries to just query the master > and the rest can query the slave with the not realtime data, although > I'm assuming this wouldn't work either because since a snapshot is > created on every commit, we would still impact the performance too much? > > anyone have any suggestions? If I set autowarmingCount=0 would I be > able to to pull to the slave faster than every couple of minutes (say, > every 10 seconds)? > > what if I take out the postcommit hook on the master and just have the > snapshooter run on a cron every 5 minutes? > > -Mike > >
Re: Web statistics for solr?
Matthew, Maybe the SOLR Statistics page would suit your purpose? (click on "statistics" from the main solr page or use the following url) http://localhost:8983/solr/admin/stats.jsp cheers, Piete On 23/08/07, Matthew Runo <[EMAIL PROTECTED]> wrote: > > Hello! > > I was wondering if anyone has written a script that displays any > stats from SOLR.. queries per second, number of docs added.. this > sort of thing. > > Sort of a general dashboard for SOLR. > > I'd rather not write it myself if I don't need to, and I didn't see > anything conclusive in the archives for the email list. > > ++ > | Matthew Runo > | Zappos Development > | [EMAIL PROTECTED] > | 702-943-7833 > ++ > > >
Re: defining fields to be returned when using mlt
Hi Stefan, Currently there is no way to specify the list of fields to be returned by the MoreLikeThis handler. I've been looking to address this issue in https://issues.apache.org/jira/browse/SOLR-295 (point 3) however in the broader scheme of things, it seems logical to wait until https://issues.apache.org/jira/browse/SOLR-281 is resolved before making changes to MLT. cheers, Piete On 22/08/07, Stefan Rinner <[EMAIL PROTECTED]> wrote: > > Hi > > Is there any way to define the numer/type of fields of the documents > returned in the "moreLikeThis" part of the response, when "mlt" is > set to true? > > Currently I'm using morelikethis to show the number and sources of > similar documents - therefore I'd need only the "source" field of > these similar documents and not everything. > > - stefan >
Web statistics for solr?
Hello! I was wondering if anyone has written a script that displays any stats from SOLR.. queries per second, number of docs added.. this sort of thing. Sort of a general dashboard for SOLR. I'd rather not write it myself if I don't need to, and I didn't see anything conclusive in the archives for the email list. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
RE: Solr and terracotta
Jeryl, I remember you asking about how to hook in the RAMDirectory a while back. It seemed like there was maybe some support within Solr that you needed. I assume you're suggesting adding an issue in the Solr JIRA, right? Is there something that the Terracotta team can do to help? Cheers, Orion Jeryl Cook wrote: > > tried it, didn't work that well...so I ended up making my own little > faceted Search engine directly using RAMDirectory and clustering it via > Terracotta...not as good as SOLR(smile), but it worked. > i actually posted some questions awhile back in trying to get it to work. > so terracotta can "hook" the RAMDirectory, maybe be good to submit this in > JIRA for terrocotta support! > > Jeryl Cook > /^\ Pharaoh /^\ > > > http://pharaohofkush.blogspot.com/ > > > > "..Act your age, and not your shoe size.." > > -Prince(1986) > >> Date: Wed, 22 Aug 2007 16:18:24 -0300 >> From: [EMAIL PROTECTED] >> To: solr-user@lucene.apache.org >> Subject: Solr and terracotta >> >> Recently I ran into this topic. I googled it a little and didn't find >> much >> information. >> It would be great to have solr working with RAMDirectory and Terracotta. >> We >> could stop using crons for rsync, right? >> Has anyone tried that out? > > -- View this message in context: http://www.nabble.com/Solr-and-terracotta-tf4313531.html#a12283537 Sent from the Solr - User mailing list archive at Nabble.com.
How to extract constrained fields from query
Hello, in my custom request handler, I want to determine which fields are constrained by the user. E.g. the query (q) might be "ipod AND brand:apple" and there might be a filter query (fq) like "color:white" (or more). What I want to know is that "brand" and "color" are constrained. AFAICS I could use SolrPluginUtils.parseFilterQueries and test if the queries are TermQueries and read its Field. Then should I also test which kind of queries I get when parsing the query (q) and look for all TermQueries from the parsed query? Or is there a more elegant way of doing this? Thanx a lot, cheers, Martin
Running into problems with distributed index and search
Hi All, This is the scenario: I have two search SOLR instances running on two different partitions. I am treating one of the servers strictly as read-only (search server) and the other instance (index server) for indexing. The index data directory resides on an NFS partition. I am running into the following problems:

1) Index dir is /indexdata/data. When I index using the index server, it respects the data dir mentioned in solrconfig.xml, writes the index files to that location, and is able to read the files (I am able to do queries using SOLR Admin).

2) The search server respects the NFS directory but does not read the index files; SOLR Admin returns no search results. I had to create a symlink under $SOLRHOME pointing to the NFS partition to make it work.

3) I had to bounce the tomcat search SOLR Webapp instance for it to read the index files; is that mandatory? In a distributed environment, do we always have to bounce the SOLR Webapp instances to reflect the changes in the index files?

Any help/suggestions would be greatly appreciated. Thanks, kasi
Re: Solr and terracotta
How come it didn't work? How did you add RAMDir support to solr? On 8/22/07, Jeryl Cook <[EMAIL PROTECTED]> wrote: > > tried it, didn't work that well...so I ended up making my own little > faceted Search engine directly using RAMDirectory and clustering it via > Terracotta...not as good as SOLR(smile), but it worked. > i actually posted some questions awhile back in trying to get it to work. > so terracotta can "hook" the RAMDirectory, maybe be good to submit this in > JIRA for terrocotta support! > > Jeryl Cook > /^\ Pharaoh /^\ > > > http://pharaohofkush.blogspot.com/ > > > > "..Act your age, and not your shoe size.." > > -Prince(1986) > > > Date: Wed, 22 Aug 2007 16:18:24 -0300 > > From: [EMAIL PROTECTED] > > To: solr-user@lucene.apache.org > > Subject: Solr and terracotta > > > > Recently I ran into this topic. I googled it a little and didn't find > much > > information. > > It would be great to have solr working with RAMDirectory and Terracotta. > We > > could stop using crons for rsync, right? > > Has anyone tried that out? >
RE: Solr and terracotta
tried it, didn't work that well...so I ended up making my own little faceted Search engine directly using RAMDirectory and clustering it via Terracotta...not as good as SOLR(smile), but it worked. i actually posted some questions awhile back in trying to get it to work. so terracotta can "hook" the RAMDirectory, maybe it'd be good to submit this in JIRA for Terracotta support! Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ "..Act your age, and not your shoe size.." -Prince(1986) > Date: Wed, 22 Aug 2007 16:18:24 -0300 > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > Subject: Solr and terracotta > > Recently I ran into this topic. I googled it a little and didn't find much > information. > It would be great to have solr working with RAMDirectory and Terracotta. We > could stop using crons for rsync, right? > Has anyone tried that out?
Re: Solr scoring: relative or absolute?
Indexes cannot be directly compared unless they have similar collection statistics. That is, the same terms occur with the same frequency across all indexes and the average document lengths are about the same (though the default similarity in Lucene may not care about average document length--I'm not sure).

SOLR-303 is an attempt to solve the partitioning issue from the search side of things.

-Sean

Lance Norskog wrote:
> Are the score values generated in Solr relative to the index or are they
> against an absolute standard? Is it possible to create a scoring
> algorithm with this property? Are there parts of the score inputs that
> are absolute?
>
> My use case is this: I would like to do a parallel search against two
> Solr indexes, and combine the results. The two indexes are built with
> the same data sources, we just can't handle one giant index. If the
> score values are against a common 'scale', then scores from the two
> search indexes can be compared. I could combine the result sets with a
> simple merge by score.
>
> This is a difficult concept to explain. I hope I have succeeded.
>
> Thanks, Lance
Solr scoring: relative or absolute?
Are the score values generated in Solr relative to the index or are they against an absolute standard? Is it possible to create a scoring algorithm with this property? Are there parts of the score inputs that are absolute? My use case is this: I would like to do a parallel search against two Solr indexes, and combine the results. The two indexes are built with the same data sources, we just can't handle one giant index. If the score values are against a common 'scale', then scores from the two search indexes can be compared. I could combine the result sets with a simple merge by score. This is a difficult concept to explain. I hope I have succeeded. Thanks, Lance
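The "simple merge by score" idea could be sketched as below; it assumes each index returns hits already sorted by descending score, and, per the caveat in the reply, that the two indexes have similar enough collection statistics for the scores to be comparable at all:

```java
import java.util.*;

public class ScoreMerge {
    static class Hit {
        final String id; final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    // Standard two-pointer merge of two descending-score result lists.
    static List<Hit> merge(List<Hit> a, List<Hit> b) {
        List<Hit> out = new ArrayList<Hit>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            boolean takeA = j >= b.size()
                || (i < a.size() && a.get(i).score >= b.get(j).score);
            out.add(takeA ? a.get(i++) : b.get(j++));
        }
        return out;
    }

    // Join hit ids with spaces, purely for display.
    static String ids(List<Hit> hits) {
        StringBuilder sb = new StringBuilder();
        for (Hit h : hits) sb.append(sb.length() == 0 ? "" : " ").append(h.id);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Hit> a = Arrays.asList(new Hit("a1", 0.9f), new Hit("a2", 0.4f));
        List<Hit> b = Arrays.asList(new Hit("b1", 0.7f), new Hit("b2", 0.3f));
        System.out.println(ids(merge(a, b))); // a1 b1 a2 b2
    }
}
```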
Solr and terracotta
Recently I ran into this topic. I googled it a little and didn't find much information. It would be great to have solr working with RAMDirectory and Terracotta. We could stop using crons for rsync, right? Has anyone tried that out?
RE: SolJava --- which attachments are valid?
Sorry for revisiting this 3-week-old thread. I downloaded the nightly yesterday. I noticed that some classes have API docs (.html) but no source code (.java). For example, there is a javadoc for org.apache.solr.client.solrj.util.ClientUtils but no ClientUtils.java: bash-3.00$ find . -type f | grep Client ./docs/api-solrj/org/apache/solr/client/solrj/util/class-use/ClientUtils.html ./docs/api-solrj/org/apache/solr/client/solrj/util/ClientUtils.html Is this a packaging problem, or is it intentional? -kuro > -Original Message- > From: Ryan McKinley [mailto:[EMAIL PROTECTED] > Sent: Friday, August 03, 2007 12:50 PM > To: solr-user@lucene.apache.org > Subject: Re: SolJava --- which attachments are valid? > > Teruhiko Kurosaka wrote: > >> or you can get it from the nightly builds in: > >> http://people.apache.org/builds/lucene/solr/nightly/ > > > > For those of you who are interested... > > > > As far as I can tell by inspecting the source code in Trunk, > > solrj.jar from the nightly doesn't seem to work with Solr 1.2. > > For one thing, there is a new layer org.apache.solr.common > > and org.apache.util has become a sub component under > > the common. Things like SolrInputDocument do not exist > > in Solr 1.2 at all. > > > > To run solrj, you need: > apache-solr-1.3-dev-common.jar > apache-solr-1.3-dev-solrj.jar > and all the files in: solrj-lib > > You *should* be able to use the client against a server that > is running > 1.2, but I don't make any promises there. > > ryan >
Apache web server logs in solr
Hello, I was thinking that solr - with its built in faceting - would make for a great apache log file storage system. I was wondering if anyone knows of any module or library for apache to write log files directly to solr or to a lucene index? Thanks Andrew
RE: Query optimisation - multiple filter caches?
Not high priority, but a few thoughts occur, then: - perhaps it would be better to use org.apache.lucene.search.Searcher by composition and have SolrIndexSearcher merely implement Searchable. - or... perhaps search(...) should perform optimally cache-aware searches - else integrators might wrongly think they're getting the full power of Solr. Jon > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: 22 August 2007 17:36 > > On 8/22/07, Jonathan Woods <[EMAIL PROTECTED]> wrote: > > I notice that LuceneQueryOptimizer is still used in > > SolrIndexSearcher.search(Query, Filter, Sort) - is the idea > then that > > this method is deprecated, > > Hmmm, so it is. I hadn't noticed because that method is not > called from any query handlers AFAIK (not since the first > versions of solr before it went open source). > The method itself shouldn't be deprecated because it's part > of the Lucene IndexSearcher interface.
Re: Query optimisation - multiple filter caches?
On 8/22/07, Jonathan Woods <[EMAIL PROTECTED]> wrote: > I notice that LuceneQueryOptimizer is still used in > SolrIndexSearcher.search(Query, Filter, Sort) - is the idea then that this > method is deprecated, Hmmm, so it is. I hadn't noticed because that method is not called from any query handlers AFAIK (not since the first versions of solr before it went open source). The method itself shouldn't be deprecated because it's part of the Lucene IndexSearcher interface. > or that the config parameter > query/boolTofilterOptimizer is no longer to be used? That should probably be removed from the example schema... thanks for pointing that out. -Yonik
RE: Query optimisation - multiple filter caches?
I understand - thanks, Yonik. I notice that LuceneQueryOptimizer is still used in SolrIndexSearcher.search(Query, Filter, Sort) - is the idea then that this method is deprecated, or that the config parameter query/boolTofilterOptimizer is no longer to be used? As for the other search() methods, they just delegate directly to org.apache.lucene.search.IndexSearcher, so no use of caches there. Jon > -Original Message- > From: Yonik Seeley [mailto:[EMAIL PROTECTED] > Sent: 16 August 2007 01:40 > To: solr-user@lucene.apache.org > Subject: Re: Query optimisation - multiple filter caches? > > On 8/15/07, Jonathan Woods <[EMAIL PROTECTED]> wrote: > > I'm trying to understand how best to integrate directly with Solr > > (Java-to-Java in the same JVM) to make the most of its query > > optimisation - chiefly, its caching of queries which merely filter > > rather than rank results. > > > > I notice that SolrIndexSearcher maintains a filter cache > and so does > > LuceneQueryOptimiser. Shouldn't they be contributing to/using the > > same cache, or are they used for different things? > > LuceneQueryOptimiser is no longer used since one can directly > specify filters via fq parameters. > > -Yonik > > >
almost realtime updates with replication
Hello, Currently in our application we are using the master/slave setup and have a batch update/commit about every 5 minutes. There are a couple queries that we would like to run almost realtime so I would like to have it so our client sends an update on every new document and then have solr configured to do an autocommit every 5-10 seconds. reading the Wiki, it seems like this isn't possible because of the strain of snapshotting and pulling to the slaves at such a high rate. What I was thinking was for these few queries to just query the master and the rest can query the slave with the not realtime data, although I'm assuming this wouldn't work either because since a snapshot is created on every commit, we would still impact the performance too much? anyone have any suggestions? If I set autowarmingCount=0 would I be able to to pull to the slave faster than every couple of minutes (say, every 10 seconds)? what if I take out the postcommit hook on the master and just have the snapshooter run on a cron every 5 minutes? -Mike
Re: Indexing HTML content... (Embed HTML into XML?)
Thanks Jérôme! It seems to work now. I just hope the provided HTMLStripWhitespaceTokenizerFactory will strip the right tags now. I use Java and used HtmlEncoder provided in http://itext.ugent.be/library/api/ for encoding with success. (just in case someone happens to search this thread) Ravi On 8/22/07, Jérôme Etévé <[EMAIL PROTECTED]> wrote: > You need to encode your html content so it can be included as a normal > 'string' value in your xml element. > > As far as I remember, the only unsafe characters you have to encode as > entities are: > < -> &lt; > > -> &gt; > " -> &quot; > & -> &amp; > > (google xml entities to be sure). > > I don't know what language you use, but for perl for instance, you can > use something like: > use HTML::Entities ; > my $xmlString = encode_entities($rawHTML , '<>&"' ); > > Also you need to make sure your Html is encoded in UTF-8 . To comply > with solr need for UTF-8 encoded xml. > > I hope it helps. > > J. > > On 8/22/07, Ravish Bhagdev <[EMAIL PROTECTED]> wrote: > > Hello, > > > > Sorry for stupid question. I'm trying to index html file as one of > > the fields in Solr, I've setup appropriate analyzer in schema but I'm > > not sure how to add html content to Solr. Encapsulating HTML content > > within field tag is obviously not valid. How do I add html content? > > Hope the query is clear > > > > Thanks, > > Ravi > > > > > -- > Jerome Eteve. > [EMAIL PROTECTED] > http://jerome.eteve.free.fr/ >
Re: Indexing HTML content... (Embed HTML into XML?)
You need to encode your html content so it can be included as a normal 'string' value in your xml element. As far as I remember, the only unsafe characters you have to encode as entities are:

< -> &lt;
> -> &gt;
" -> &quot;
& -> &amp;

(google xml entities to be sure). I don't know what language you use, but for perl for instance, you can use something like:

use HTML::Entities ;
my $xmlString = encode_entities($rawHTML , '<>&"' );

Also you need to make sure your Html is encoded in UTF-8, to comply with solr's need for UTF-8 encoded xml. I hope it helps. J. On 8/22/07, Ravish Bhagdev <[EMAIL PROTECTED]> wrote: > Hello, > > Sorry for stupid question. I'm trying to index html file as one of > the fields in Solr, I've setup appropriate analyzer in schema but I'm > not sure how to add html content to Solr. Encapsulating HTML content > within field tag is obviously not valid. How do I add html content? > Hope the query is clear > > Thanks, > Ravi > -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
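The same escaping can be sketched in Java, mirroring the Perl snippet above; note that a literal ' would also need escaping (as &amp;apos;) if the value were placed inside a single-quoted attribute:

```java
public class XmlEscape {
    // Escape the four characters that are unsafe inside an XML text node.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<b>R&D \"news\"</b>"));
        // &lt;b&gt;R&amp;D &quot;news&quot;&lt;/b&gt;
    }
}
```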
Indexing HTML content... (Embed HTML into XML?)
Hello, Sorry for stupid question. I'm trying to index html file as one of the fields in Solr, I've setup appropriate analyzer in schema but I'm not sure how to add html content to Solr. Encapsulating HTML content within field tag is obviously not valid. How do I add html content? Hope the query is clear Thanks, Ravi
Re: Replacing existing documents
On Aug 21, 2007, at 9:25 PM, Lance Norskog wrote:
> Recently someone mentioned that it would be possible to have a 'replace
> existing document' feature rather than just dropping and adding
> documents with the same unique id.

There is such a patch: https://issues.apache.org/jira/browse/SOLR-139

I'm experimenting with it right now and it works well for my cases. However, it is still under the covers a delete/add and

> One use case is that we would like to use the index as our one database
> for documents, and if we delete a document we want it to stay deleted.
> Thus we would mark it deleted and check for its existence. Another use
> case is that we are re-adding the same document a few times a day, and
> the commit times are ballooning.

...you still have to commit for changes to be visible.

Erik
defining fields to be returned when using mlt
Hi Is there any way to define the number/type of fields of the documents returned in the "moreLikeThis" part of the response, when "mlt" is set to true? Currently I'm using morelikethis to show the number and sources of similar documents - therefore I'd need only the "source" field of these similar documents and not everything. - stefan
Major update to Solrsharp
A big update was just posted to the Solrsharp project. This update now provides for first-class support for highlighting in the library. The implementation is really robust and provides the following features: - Structured highlight parameter assignment based on the SolrField object - Full access for all highlight parameters, on both an aggregate and per-field basis - Incorporation of highlighted values into the base search result records All of the supplied documentation has been updated as well as the example application in using the highlighting classes. Please report any issues through JIRA. Be sure to associate any issues with the "C# client" component. cheers, jeff r.
RE: Replacing existing documents
Hello,

"Recently someone mentioned that it would be possible to have a 'replace existing document' feature rather than just dropping and adding documents with the same unique id."

AFAIK, this is not possible. You have the update in lucene, but internally it just does a delete/add operation.

"We have a few use cases in this area and I'm researching whether it is effective to check for a document via Solr queries, or whether it is worthwhile to add this to the Solr implementation."

What are the use cases? I do not see what you mean.

"Does anyone have an estimate for the difference between querying, say, 100 documents by unique ID from the network v.s. fetching them directly from the index?"

Depends of course on the network; fetching them from the index is normally fast.

"One use case is that we would like to use the index as our one database for documents, and if we delete a document we want it to stay deleted. Thus we would mark it deleted and check for its existence."

I suppose you mark it deleted by setting some flag (like lucene Field: isDeleted set to true). I am not sure whether using the lucene index as your database is really smart... it might get corrupt. I would at least suggest backing it up frequently.

Regards, Ard

ps sry for my annoying ".." because i am using a web mail client

"Another use case is that we are re-adding the same document a few times a day, and the commit times are ballooning. Where would I implement this? Thanks, Lance"