Re: nutch and solr
Try this command:

  bin/nutch crawl urls//.txt -dir crawl/ -threads 10 -depth 2 -topN 1000

Your folder structure will look like this:

  urls/
    (domain folder)/
      .txt
  crawl/
    (domain folder)/

The folder name will be different for different domains. So for each domain folder in the urls folder there has to be a corresponding folder (with the same name) in the crawl folder.

--
View this message in context: http://lucene.472066.n3.nabble.com/nutch-and-solr-tp3765166p3765607.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to merge an "autofacet" with a predefined facet
: But i don't know if it's possible to merge this "autocreated" facet with a
: facet already predefined ? i tried to use (adding this to my
: code in my previous post) :

copyField applies to the raw input of those fields -- so the special logic
you have in the analyzer for your text_tag_facet won't be applied yet when
it's copied to your predefined_facet field (copyField happens first).

: It's maybe because (As I understood) the real (stored) value of this dynamic
: facet is still the initial fulltext ?? (or maybe i'm wrong ...)

Stored values are different from indexed values -- but stored values are
also never a factor in faceting; the stored value is just what is returned
when you get results back (ie: the "doc list") ... your problem has nothing
to do with stored values.

-Hoss
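A minimal schema sketch of the situation Hoss describes. The field names come from the thread, but the types and attributes here are assumptions for illustration: because copyField copies the raw input, predefined_facet receives the original unanalyzed text, never the output of text_tag_facet's analyzer.

```xml
<!-- schema.xml sketch (attribute values are assumptions, not from the thread).
     copyField happens BEFORE analysis: predefined_facet gets the raw input
     of text_tag_facet, so any special analyzer logic on text_tag_facet
     has not run yet at copy time. -->
<field name="text_tag_facet" type="text_tag_facet" indexed="true" stored="false"/>
<field name="predefined_facet" type="string" indexed="true" stored="false"/>
<copyField source="text_tag_facet" dest="predefined_facet"/>
```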
Re: Fast Vector Highlighter Working for some records only
(12/02/22 11:58), dhaivat wrote:
> Thanks for the reply. But can you please tell me why it's working for some
> documents and not for others?

As Solr 1.4.1 cannot recognize the hl.useFastVectorHighlighter flag, Solr just ignores it; but because hl=true is present, Solr tries to create highlight snippets using the existing (traditional, i.e. non-FVH) Highlighter. The Highlighter (including FVH) sometimes cannot produce snippets for various reasons; for those cases you can use the hl.alternateField parameter.

http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField

koji
--
Query Log Visualizer for Apache Solr http://soleami.com/
Re: Fast Vector Highlighter Working for some records only
Koji Sekiguchi wrote
>
> (12/02/21 21:22), dhaivat wrote:
>> Hi Koji,
>>
>> Thanks for quick reply, i am using solr 1.4.1
>>
>
> Uh, you cannot use FVH on Solr 1.4.1. FVH is available Solr 3.1 or later.
> So your hl.useFastVectorHighlighter=true flag is ignored.
>
> koji
> --
> Query Log Visualizer for Apache Solr
> http://soleami.com/

Thanks for the reply. But can you please tell me why it's working for some documents and not for others?

--
View this message in context: http://lucene.472066.n3.nabble.com/Fast-Vector-Highlighter-Working-for-some-records-only-tp3763286p3765458.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Help with MMapDirectoryFactory in 3.5
: How do I see the setting in the log or in stats.jsp ? I cannot find a place
: that indicates it is set or not.

I don't think the DirectoryFactory plugin hook was ever set up so that it can report its info/stats ... it doesn't look like it implements SolrInfoMBean, so it can't really report anything about itself.

: I would assume StandardDirectoryFactory is being used but I do see (when I
: set it or NOT set it) ...
: readerDir :
: org.apache.lucene.store.MMapDirectory@C:\solr\jetty\example\solr\providersearch\data\index

This is because StandardDirectoryFactory uses FSDirectory ... if you check out those docs you'll see ...

http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/store/FSDirectory.html#open%28java.io.File%29

>> Currently this returns MMapDirectory for most Solaris and Windows
>> 64-bit JREs, NIOFSDirectory for other non-Windows JREs, and
>> SimpleFSDirectory for other JREs on Windows.

-Hoss
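If you want a predictable Directory implementation rather than the platform-dependent FSDirectory.open() heuristic quoted above, solrconfig.xml lets you pin one explicitly. A sketch (this is the factory the thread is about; treat the exact placement in your config as an assumption to verify against your Solr version):

```xml
<!-- In solrconfig.xml: force memory-mapped I/O instead of letting
     FSDirectory.open() choose per platform/JRE. -->
<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>
```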
Re: Solrj Stream Server memory leak
: I am using the SolrJ client's StreamingUpdateSolrServer and when ever i
: stop tomcat, it throws a memory leak warning. sample error message:
:
: SEVERE: The web application [/MyApplication] appears to have started a
: thread named [pool-1004-thread-1] but has failed to stop it. This is very
: likely to create a memory leak.

As part of the SolrCloud work (SOLR-2358) a "shutdown()" method was added to CommonsHttpSolrServer (and StreamingUpdateSolrServer) to instruct it to shut down the HttpClient it wraps (if it created it). So if you are using trunk, you should call that when you are done with the StreamingUpdateSolrServer object.

As a workaround in 3x, you can instantiate the HttpClient yourself using the MultiThreadedHttpConnectionManager, and pass it to the StreamingUpdateSolrServer constructor. Then when your app shuts down, you can call shutdown on the HttpClient.

Alternately: the minimal amount of change you can make to work around this would be to add a call to the static method ...

MultiThreadedHttpConnectionManager.shutdownAll();

... somewhere in your app's shutdown code (assuming it doesn't cause problems with any subsequent shutdown code)

-Hoss
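Hoss's advice boils down to "release the thread pool in your shutdown code". A stdlib-only analogy of why Tomcat complains (a plain java.util.concurrent pool stands in for the HttpClient connection manager here; no Solr or HttpClient classes are involved):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the connection-manager pool that the Solr client
        // keeps alive internally (threads named pool-N-thread-M).
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(() -> { /* pretend this sends an update */ });

        // Without an explicit shutdown the non-daemon worker thread stays
        // alive after the webapp stops -- the situation Tomcat warns about.
        System.out.println(pool.isShutdown());

        // The fix mirrors shutdown()/shutdownAll(): release the pool from
        // your application's own shutdown code.
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(pool.isShutdown());
    }
}
```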
Re: Date filter query
bq: How could I overlook it? Easy, the same way I did for a year and more Best Erick On Tue, Feb 21, 2012 at 6:50 PM, Em wrote: > Erick, > > damn! > > The NOW of now isn't the same NOW a second later. So obvisiously. How > could I overlook it? > > Kind regards, > Em > > Am 22.02.2012 00:17, schrieb Erick Erickson: >> Be a little careful here. Any "fq" that references NOW will probably >> NOT be effectively cached. Think of the fq cache as a map, with >> the key being the fq clause and the value being the set of >> documents that match that value. >> >> So something like NOW gives >> 2012-01-23T00:00:00Z >> but issuing that a second later gives: >> 2012-01-23T00:00:01Z >> >> so the keys don't match, they're considered >> different fq clauses and the calculations are all >> done all over again. >> >> Using the rounding for date math will help here, >> something like NOW/DAY+1DAY to get midnight tonight >> will give you something that's re-used, similarly for >> NOW/DAY-30DAY etc. >> >> All that said, your query times are pretty long. I doubt >> that your fq clause is really the culprit here. You need >> to find out what the bottleneck is here, consider using >> jconsole to see what your machine is occupying its >> time with. Examine your cache statistics to see >> if your getting good usage from your cache. You >> haven't detailed what you're measuring. If this is just >> a half-dozen queries after starting Solr, you may get >> much better performance if you autowarm. >> >> You may have too little memory allocated. You may be >> swapping to disk a lot. You may. >> >> What have you tried and what have the results been? >> >> In short, these times are very suspect and you haven't >> really provided much info to go on. >> >> Best >> Erick >> >> On Tue, Feb 21, 2012 at 5:25 PM, Em wrote: >>> Hi, >>> But they [the cache configurations] are default for both tests, can it >>> affect on results? >>> Yes, they affect both results. 
Try to increase the values for >>> queryResultCache and documentCache from 512 to 1024 (provided that you >>> got two distinct queries "bay" and "girl"). In general they should fit >>> the amount of documents and results you are expecting to have in a way >>> that chances are good to have a cache-hit. >>> Maybe it is that I use shards. I have 11 shards, summary ~310M docs. >>> 11 shards on the same machine? Could lead to decreased performance due >>> to disk-io. >>> >>> Did you tried my advice of adjusting the precisionSteps of your >>> TrieDateFields and reindexed your documents afterwards? >>> >>> Kind regards, >>> Em >>> >>> >>> Am 21.02.2012 22:52, schrieb ku3ia: Hi, >> First: I am really surprised that the difference between explicit >> Date-Values and the more friendly date-keywords is that large. Maybe it is that I use shards. I have 11 shards, summary ~310M docs. >> Did you made a server restart between both tests? I tried to run these test one after another, I'd rebooted my tomcats, I'd run second test first and vice versa. >> Second: Could you show us your solrconfig to make sure that your caches >> are configured well? I'm using solrconfig from solr/example directory. The difference is that I only commented out unused components. Filter, document and query result cache is default. But they are default for both tests, can it affect on results? >> Furthermore: Take into consideration, whether you really need 500 rows >> per request. Yes, I need 500 rows. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764941.html Sent from the Solr - User mailing list archive at Nabble.com. >>
Re: Fast Vector Highlighter Working for some records only
(12/02/21 21:22), dhaivat wrote: Hi Koji, Thanks for quick reply, i am using solr 1.4.1 Uh, you cannot use FVH on Solr 1.4.1. FVH is available Solr 3.1 or later. So your hl.useFastVectorHighlighter=true flag is ignored. koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: Date filter query
Erick, damn! The NOW of now isn't the same NOW a second later. So obvisiously. How could I overlook it? Kind regards, Em Am 22.02.2012 00:17, schrieb Erick Erickson: > Be a little careful here. Any "fq" that references NOW will probably > NOT be effectively cached. Think of the fq cache as a map, with > the key being the fq clause and the value being the set of > documents that match that value. > > So something like NOW gives > 2012-01-23T00:00:00Z > but issuing that a second later gives: > 2012-01-23T00:00:01Z > > so the keys don't match, they're considered > different fq clauses and the calculations are all > done all over again. > > Using the rounding for date math will help here, > something like NOW/DAY+1DAY to get midnight tonight > will give you something that's re-used, similarly for > NOW/DAY-30DAY etc. > > All that said, your query times are pretty long. I doubt > that your fq clause is really the culprit here. You need > to find out what the bottleneck is here, consider using > jconsole to see what your machine is occupying its > time with. Examine your cache statistics to see > if your getting good usage from your cache. You > haven't detailed what you're measuring. If this is just > a half-dozen queries after starting Solr, you may get > much better performance if you autowarm. > > You may have too little memory allocated. You may be > swapping to disk a lot. You may. > > What have you tried and what have the results been? > > In short, these times are very suspect and you haven't > really provided much info to go on. > > Best > Erick > > On Tue, Feb 21, 2012 at 5:25 PM, Em wrote: >> Hi, >> >>> But they [the cache configurations] are default for both tests, can it >> affect on >>> results? >> Yes, they affect both results. Try to increase the values for >> queryResultCache and documentCache from 512 to 1024 (provided that you >> got two distinct queries "bay" and "girl"). 
In general they should fit >> the amount of documents and results you are expecting to have in a way >> that chances are good to have a cache-hit. >> >>> Maybe it is that I use shards. I have 11 shards, summary ~310M docs. >> 11 shards on the same machine? Could lead to decreased performance due >> to disk-io. >> >> Did you tried my advice of adjusting the precisionSteps of your >> TrieDateFields and reindexed your documents afterwards? >> >> Kind regards, >> Em >> >> >> Am 21.02.2012 22:52, schrieb ku3ia: >>> Hi, >>> > First: I am really surprised that the difference between explicit > Date-Values and the more friendly date-keywords is that large. >>> Maybe it is that I use shards. I have 11 shards, summary ~310M docs. >>> > Did you made a server restart between both tests? >>> I tried to run these test one after another, I'd rebooted my tomcats, I'd >>> run second test first and vice versa. >>> > Second: Could you show us your solrconfig to make sure that your caches > are configured well? >>> I'm using solrconfig from solr/example directory. The difference is that I >>> only commented out unused components. Filter, document and query result >>> cache is default. But they are default for both tests, can it affect on >>> results? >>> > Furthermore: Take into consideration, whether you really need 500 rows > per request. >>> Yes, I need 500 rows. >>> >>> Thanks >>> >>> -- >>> View this message in context: >>> http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764941.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >
nutch and solr
I tried to configure Nutch (1.4) with my Solr 3.2. But when I try the crawl command "bin/nutch inject crawl/crawldb urls" it doesn't work, and it replies with "can't convert a empty path". Why, in your opinion?

tx
a.
Re: filter query or boolean?
Apples and oranges here. Filter queries do NOT contribute to score. But they are cached so if you have a frequent use-case for filtering, you'll get much faster performance. OTOH, if your filter queries are never repeated, filter queries aren't helpful. So if correctness isn't defined by the fq clause being included in the relevance score, you're *usually* better off using filter queries... Best Erick On Tue, Feb 21, 2012 at 1:25 PM, wrote: > > Hi, > Which is faster for boolean compound expressions. filter queries or a > single query with boolean expressions? > For that matter, is there any difference other than maybe speed? > > thanks
Re: Date filter query
Be a little careful here. Any "fq" that references NOW will probably NOT be effectively cached. Think of the fq cache as a map, with the key being the fq clause and the value being the set of documents that match that value. So something like NOW gives 2012-01-23T00:00:00Z but issuing that a second later gives: 2012-01-23T00:00:01Z so the keys don't match, they're considered different fq clauses and the calculations are all done all over again. Using the rounding for date math will help here, something like NOW/DAY+1DAY to get midnight tonight will give you something that's re-used, similarly for NOW/DAY-30DAY etc. All that said, your query times are pretty long. I doubt that your fq clause is really the culprit here. You need to find out what the bottleneck is here, consider using jconsole to see what your machine is occupying its time with. Examine your cache statistics to see if your getting good usage from your cache. You haven't detailed what you're measuring. If this is just a half-dozen queries after starting Solr, you may get much better performance if you autowarm. You may have too little memory allocated. You may be swapping to disk a lot. You may. What have you tried and what have the results been? In short, these times are very suspect and you haven't really provided much info to go on. Best Erick On Tue, Feb 21, 2012 at 5:25 PM, Em wrote: > Hi, > >> But they [the cache configurations] are default for both tests, can it > affect on >> results? > Yes, they affect both results. Try to increase the values for > queryResultCache and documentCache from 512 to 1024 (provided that you > got two distinct queries "bay" and "girl"). In general they should fit > the amount of documents and results you are expecting to have in a way > that chances are good to have a cache-hit. > >> Maybe it is that I use shards. I have 11 shards, summary ~310M docs. > 11 shards on the same machine? Could lead to decreased performance due > to disk-io. 
> > Did you tried my advice of adjusting the precisionSteps of your > TrieDateFields and reindexed your documents afterwards? > > Kind regards, > Em > > > Am 21.02.2012 22:52, schrieb ku3ia: >> Hi, >> First: I am really surprised that the difference between explicit Date-Values and the more friendly date-keywords is that large. >> Maybe it is that I use shards. I have 11 shards, summary ~310M docs. >> Did you made a server restart between both tests? >> I tried to run these test one after another, I'd rebooted my tomcats, I'd >> run second test first and vice versa. >> Second: Could you show us your solrconfig to make sure that your caches are configured well? >> I'm using solrconfig from solr/example directory. The difference is that I >> only commented out unused components. Filter, document and query result >> cache is default. But they are default for both tests, can it affect on >> results? >> Furthermore: Take into consideration, whether you really need 500 rows per request. >> Yes, I need 500 rows. >> >> Thanks >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764941.html >> Sent from the Solr - User mailing list archive at Nabble.com. >>
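Erick's point above, that the filterCache is keyed on the literal fq clause, can be sketched with a few lines of stdlib Java. The key format below only imitates what Solr would render after expanding the date math; it is not Solr's actual cache-key implementation:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class FqCacheKeyDemo {
    // Renders a 30-day date-range fq the way Solr would expand NOW-based
    // date math (illustration only, not Solr's real key format).
    static String fqKey(Instant now, boolean roundToDay) {
        Instant end = roundToDay ? now.truncatedTo(ChronoUnit.DAYS) : now;
        Instant start = end.minus(30, ChronoUnit.DAYS);
        return "Date:[" + start + " TO " + end + "]";
    }

    public static void main(String[] args) {
        Instant t1 = Instant.parse("2012-01-23T12:34:56Z");
        Instant t2 = t1.plusSeconds(1); // the "same" query, one second later

        // Raw NOW: the keys differ, so the second request misses the cache
        // and the filter is computed all over again.
        System.out.println(fqKey(t1, false).equals(fqKey(t2, false)));

        // NOW/DAY rounding: identical key all day long, so the cached
        // document set is reused.
        System.out.println(fqKey(t1, true).equals(fqKey(t2, true)));
    }
}
```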
Re: mixed indexing through dhi and other ways
Hi Ramo,

sorry for confusing you. Forget everything that I said after "However" - it was wrong (I mixed something up here).
Yes, you can index documents via any UpdateRequestHandler you like while using the DIH.

Kind regards,
Em

On 21.02.2012 23:41, Ramo Karahasan wrote:
> Hi,
>
> what do you mean? Are you referring to the time i add a new document? But that
> should be okay, all documents will be added with delta import that are older
> than the last time I've indexed, right?
>
> Thanks,
> Ramo
>
> -----Original Message-----
> From: Em [mailto:mailformailingli...@yahoo.de]
> Sent: Tuesday, 21 February 2012 23:27
> To: solr-user@lucene.apache.org
> Subject: Re: mixed indexing through dhi and other ways
>
> Hi Ramo,
>
> yes, it's possible.
> However keep in mind that your cURL, CSV, XML, JSON etc. update-requests
> store the information that is needed to do delta-updates with your DIH (if
> needed!).
>
> Kind regards,
> Em
>
> On 21.02.2012 23:18, Ramo Karahasan wrote:
>> Hi,
>>
>> currently i'm indexing via DIH and delta import.
>>
>> Is it possible to additionaly index data via cURL as XML or JSON into
>> the index which was created via DIH, for example for
>> "real-time" indexing data, like comments on a question?
>>
>> Thank you,
>>
>> Ramo
Solr Highlighting not working with PayloadTermQueries
Hi,

I'm using Solr and Lucene in my application for search. I'm facing an issue where highlighting using FastVectorHighlighter does not work when I use PayloadTermQueries as clauses of a BooleanQuery. After debugging I found that in DefaultSolrHighlighter.java, fvh.getFieldQuery does not return any term in the termMap:

  FastVectorHighlighter fvh = new FastVectorHighlighter(
      // FVH cannot process hl.usePhraseHighlighter parameter per-field basis
      params.getBool( HighlightParams.USE_PHRASE_HIGHLIGHTER, true ),
      // FVH cannot process hl.requireFieldMatch parameter per-field basis
      params.getBool( HighlightParams.FIELD_MATCH, false ) );
  FieldQuery fieldQuery = fvh.getFieldQuery( query );

The reason for the empty termMap is that PayloadTermQuery is discarded while constructing the FieldQuery:

  void flatten( Query sourceQuery, Collection<Query> flatQueries ){
    if( sourceQuery instanceof BooleanQuery ){
      BooleanQuery bq = (BooleanQuery)sourceQuery;
      for( BooleanClause clause : bq.getClauses() ){
        if( !clause.isProhibited() )
          flatten( clause.getQuery(), flatQueries );
      }
    }
    else if( sourceQuery instanceof DisjunctionMaxQuery ){
      DisjunctionMaxQuery dmq = (DisjunctionMaxQuery)sourceQuery;
      for( Query query : dmq ){
        flatten( query, flatQueries );
      }
    }
    else if( sourceQuery instanceof TermQuery ){
      if( !flatQueries.contains( sourceQuery ) )
        flatQueries.add( sourceQuery );
    }
    else if( sourceQuery instanceof PhraseQuery ){
      if( !flatQueries.contains( sourceQuery ) ){
        PhraseQuery pq = (PhraseQuery)sourceQuery;
        if( pq.getTerms().length > 1 )
          flatQueries.add( pq );
        else if( pq.getTerms().length == 1 ){
          flatQueries.add( new TermQuery( pq.getTerms()[0] ) );
        }
      }
    }
    // else discard queries
  }

What is the best way to get highlighting working with Payload Term Queries?

Thanks
Nitin

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Highlighting-not-working-with-PayloadTermQueries-tp3765093p3765093.html
Sent from the Solr - User mailing list archive at Nabble.com.
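One way to see the failure mode, and one possible direction for a workaround, using toy stand-in classes. These are NOT the real Lucene types (only the non-TermQuery relationship is mirrored); a real fix would rewrite PayloadTermQuery clauses into plain TermQuerys before calling getFieldQuery, or extend flatten() itself:

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for the Lucene query classes (names only, not the real API).
// The essential property mirrored here: PayloadTermQuery is NOT a TermQuery,
// so flatten()'s instanceof checks never match it.
class Query {}
class TermQuery extends Query { final String term; TermQuery(String t) { term = t; } }
class PayloadTermQuery extends Query { final String term; PayloadTermQuery(String t) { term = t; } }

public class FlattenDemo {
    // Mirrors the flatten() logic quoted above: unknown types are discarded.
    static List<Query> flattenOriginal(Query q, List<Query> out) {
        if (q instanceof TermQuery) out.add(q);
        // PayloadTermQuery falls through to "else discard queries"
        return out;
    }

    // Possible workaround: rewrite payload queries to plain TermQuerys
    // before handing the query tree to the highlighter.
    static List<Query> flattenPatched(Query q, List<Query> out) {
        if (q instanceof PayloadTermQuery) q = new TermQuery(((PayloadTermQuery) q).term);
        if (q instanceof TermQuery) out.add(q);
        return out;
    }

    public static void main(String[] args) {
        Query payload = new PayloadTermQuery("solr");
        // Original behavior: nothing flattened -> empty termMap -> no snippets.
        System.out.println(flattenOriginal(payload, new ArrayList<>()).size());
        // Patched behavior: the term survives and can be highlighted.
        System.out.println(flattenPatched(payload, new ArrayList<>()).size());
    }
}
```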
Re: reader/searcher refresh after replication (commit)
Eks, that sounds strange! Am I getting you right?

You have a master which indexes batch-updates from time to time. Furthermore you got some slaves, pulling data from that master to keep them up-to-date with the newest batch-updates. Additionally your slaves index their own content in soft-commit mode that needs to be available as soon as possible. In consequence the slaves are not in sync with the master.

I am not 100% certain, but chances are good that Solr's replication-mechanism only changes those segments that are not in sync with the master.

What are you expecting a BeforeCommitListener could do for you, if one would exist?

Kind regards,
Em

On 21.02.2012 21:10, eks dev wrote:
> Thanks Mark,
> Hmm, I would like to have this information asap, not to wait until the
> first search gets executed (depends on user). Is solr going to create
> new searcher as a part of "replication transaction"...
>
> Just to make it clear why I need it...
> I have simple master, many slaves config where master does "batch"
> updates in big chunks (things user can wait longer to see on search
> side) but slaves work in soft commit mode internally where I permit
> them to run away slightly from master in order to know where
> "incremental update" should start, I read it from UserData
>
> Basically, ideally, before commit (after successful replication is
> finished) ends, I would like to read in these counters to let
> "incremental update" run from the right point...
>
> I need to prevent updating "replicated index" before I read this
> information (duplicates can appear) are there any "IndexWriter"
> listeners around?
>
> Thanks again,
> eks.
>
> On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller wrote:
>> Post commit calls are made before a new searcher is opened.
>>
>> Might be easier to try to hook in with a new searcher listener?
>>
>> On Feb 21, 2012, at 8:23 AM, eks dev wrote:
>>
>>> Hi all,
>>> I am a bit confused with IndexSearcher refresh lifecycles...
>>> In a master slave setup, I override postCommit listener on slave >>> (solr trunk version) to read some user information stored in >>> userCommitData on master >>> >>> -- >>> @Override >>> public final void postCommit() { >>> // This returnes "stale" information that was present before >>> replication finished >>> RefCounted refC = core.getNewestSearcher(true); >>> Map userData = >>> refC.get().getIndexReader().getIndexCommit().getUserData(); >>> } >>> >>> I expected core.getNewestSearcher(true); to return refreshed >>> SolrIndexSearcher, but it didn't >>> >>> When is this information going to be refreshed to the status from the >>> replicated index, I repeat this is postCommit listener? >>> >>> What is the way to get the information from the last commit point? >>> >>> Maybe like this? >>> core.getDeletionPolicy().getLatestCommit().getUserData(); >>> >>> Or I need to explicitly open new searcher (isn't solr does this behind >>> the scenes?) >>> core.openNewSearcher(false, false) >>> >>> Not critical, reopening new searcher works, but I would like to >>> understand these lifecycles, when solr loads latest commit point... >>> >>> Thanks, eks >> >> - Mark Miller >> lucidimagination.com >> >> >> >> >> >> >> >> >> >> >> >
AW: mixed indexing through dhi and other ways
Hi,

what do you mean? Are you referring to the time i add a new document? But that should be okay, all documents will be added with delta import that are older than the last time I've indexed, right?

Thanks,
Ramo

-----Original Message-----
From: Em [mailto:mailformailingli...@yahoo.de]
Sent: Tuesday, 21 February 2012 23:27
To: solr-user@lucene.apache.org
Subject: Re: mixed indexing through dhi and other ways

Hi Ramo,

yes, it's possible.
However keep in mind that your cURL, CSV, XML, JSON etc. update-requests store the information that is needed to do delta-updates with your DIH (if needed!).

Kind regards,
Em

On 21.02.2012 23:18, Ramo Karahasan wrote:
> Hi,
>
> currently i'm indexing via DIH and delta import.
>
> Is it possible to additionaly index data via cURL as XML or JSON into the
> index which was created via DIH, for example for "real-time" indexing data,
> like comments on a question?
>
> Thank you,
>
> Ramo
Re: SOLR - Just for search or whole site DB?
Hi Spadez,

MySQL, as well as any other SQL database, needs the same amount of work to integrate its data into Solr. Choose your favorite database and get started!

Best,
Em

On 21.02.2012 18:32, Spadez wrote:
> Thank you for the information Damien.
>
> Is there a better database to use at the core of the site which is more
> compatible with SOLR than MYSQL, or is hooking MYSQL up with SOLR simple
> enough?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-Just-for-search-or-whole-site-DB-tp3763439p3764254.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: mixed indexing through dhi and other ways
Hi Ramo, yes, it's possible. However keep in mind that your cURL, CSV, XML, JSON etc. update-requests store the information that is needed to do delta-updates with your DIH (if needed!). Kind regards, Em Am 21.02.2012 23:18, schrieb Ramo Karahasan: > Hi, > > > > currently i'm indexing via DHI and delta import. > > Is it possible to additionaly index data via cURL as XML or JSON into the > index which was created via DHI, for example for "real-time"indexing data, > like comments on a question? > > > > Thank you, > > Ramo > >
Re: Date filter query
Hi,

> But they [the cache configurations] are default for both tests, can it
> affect on results?
Yes, they affect both results. Try to increase the values for queryResultCache and documentCache from 512 to 1024 (provided that you got two distinct queries "bay" and "girl"). In general they should fit the amount of documents and results you are expecting to have, in a way that chances are good to have a cache hit.

> Maybe it is that I use shards. I have 11 shards, summary ~310M docs.
11 shards on the same machine? That could lead to decreased performance due to disk I/O.

Did you try my advice of adjusting the precisionStep of your TrieDateFields and reindexing your documents afterwards?

Kind regards,
Em

On 21.02.2012 22:52, ku3ia wrote:
> Hi,
>
>>> First: I am really surprised that the difference between explicit
>>> Date-Values and the more friendly date-keywords is that large.
> Maybe it is that I use shards. I have 11 shards, summary ~310M docs.
>
>>> Did you made a server restart between both tests?
> I tried to run these test one after another, I'd rebooted my tomcats, I'd
> run second test first and vice versa.
>
>>> Second: Could you show us your solrconfig to make sure that your caches
>>> are configured well?
> I'm using solrconfig from solr/example directory. The difference is that I
> only commented out unused components. Filter, document and query result
> cache is default. But they are default for both tests, can it affect on
> results?
>
>>> Furthermore: Take into consideration, whether you really need 500 rows
>>> per request.
> Yes, I need 500 rows.
>
> Thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764941.html
> Sent from the Solr - User mailing list archive at Nabble.com.
mixed indexing through dhi and other ways
Hi,

currently I'm indexing via DIH and delta import.

Is it possible to additionally index data via cURL as XML or JSON into the index which was created via DIH, for example for "real-time" indexing data, like comments on a question?

Thank you,

Ramo
Re: Date filter query
Hi,

>> First: I am really surprised that the difference between explicit
>> Date-Values and the more friendly date-keywords is that large.
Maybe it is that I use shards. I have 11 shards, in summary ~310M docs.

>> Did you made a server restart between both tests?
I tried to run these tests one after another, I rebooted my Tomcats, and I ran the second test first and vice versa.

>> Second: Could you show us your solrconfig to make sure that your caches
>> are configured well?
I'm using the solrconfig from the solr/example directory. The difference is that I only commented out unused components. The filter, document and query result caches are default. But they are default for both tests, can it affect the results?

>> Furthermore: Take into consideration, whether you really need 500 rows
>> per request.
Yes, I need 500 rows.

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764941.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date filter query
Hi,

your QTimes are somewhat slow!

First: I am really surprised that the difference between explicit Date-Values and the more friendly date-keywords is that large. Did you do a server restart between both tests?

Second: Could you show us your solrconfig to make sure that your caches are configured well? How many documents are part of that test-index?

I suggest adjusting the precisionStep-definition of your TrieDateField.

Furthermore: Take into consideration whether you really need 500 rows per request.

Kind regards,
Em

On 21.02.2012 21:49, ku3ia wrote:
> Hi, Em, thanks for your response. But seems a have a problem.
> I wrote a script, which sends a queries (curl based), with a certain delay.
> I had made a dictionary of matched words. I run my script with 500ms delay
> during 60 seconds. Take look at catalina logs:
>
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="baby"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500}
> status=0 QTime=1735
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="girl"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500}
> status=0 QTime=9794
>
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="baby"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500}
> status=0 QTime=13885
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="girl"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500}
> status=0 QTime=33995
>
> Note, that not all queries from the second test are slower, for example:
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="with"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500}
> status=0 QTime=18645
>
> INFO: [] webapp=/solr path=/select
> params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="with"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500}
> status=0 QTime=7877
>
> but in average I have:
> *** Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] ***
> Queries processed: 110
> Queries cancelled: 4
> Max QTime is: 22728 ms
> Avg QTime is: 6681.31 ms
> Min QTime is: ms
>
> *** Date:[NOW-30DAY+TO+NOW] ***
> Queries processed: 20
> Queries cancelled: 94
> Max QTime is: 45203 ms
> Avg QTime is: 39195.2 ms
> Min QTime is: ms
>
> I repeated this test more times - results seems equal. Is it true, that
> [2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] is faster than
> [NOW-30DAY+TO+NOW]?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764781.html
> Sent from the Solr - User mailing list archive at Nabble.com.
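Em's precisionStep suggestion refers to the TrieDateField definition in schema.xml. A sketch of where that knob lives, with the Date field name taken from the thread's fq clauses; the concrete attribute values are assumptions, not from the thread (precisionStep="6" is the stock example default for dates):

```xml
<!-- schema.xml sketch: a smaller precisionStep indexes more terms per
     date value, which speeds up range filters such as
     fq=Date:[NOW/DAY-30DAY TO NOW/DAY] at the cost of a larger index.
     Reindex after changing it. -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"
           omitNorms="true" positionIncrementGap="0"/>
<field name="Date" type="tdate" indexed="true" stored="true"/>
```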
Re: reader/searcher refresh after replication (commit)
And drinks are on me for those who decoupled implicit commit from close... that was a tricky trap.

On Tue, Feb 21, 2012 at 9:10 PM, eks dev wrote:
> Thanks Mark,
> Hmm, I would like to have this information asap, not to wait until the
> first search gets executed (depends on user). Is solr going to create
> new searcher as a part of "replication transaction"...
>
> Just to make it clear why I need it...
> I have simple master, many slaves config where master does "batch"
> updates in big chunks (things user can wait longer to see on search
> side) but slaves work in soft commit mode internally where I permit
> them to run away slightly from master in order to know where
> "incremental update" should start, I read it from UserData
>
> Basically, ideally, before commit (after successful replication is
> finished) ends, I would like to read in these counters to let
> "incremental update" run from the right point...
>
> I need to prevent updating "replicated index" before I read this
> information (duplicates can appear) are there any "IndexWriter"
> listeners around?
>
> Thanks again,
> eks.
>
> On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller wrote:
>> Post commit calls are made before a new searcher is opened.
>>
>> Might be easier to try to hook in with a new searcher listener?
>>
>> On Feb 21, 2012, at 8:23 AM, eks dev wrote:
>>
>>> Hi all,
>>> I am a bit confused with IndexSearcher refresh lifecycles...
>>> In a master slave setup, I override postCommit listener on slave >>> (solr trunk version) to read some user information stored in >>> userCommitData on master >>> >>> -- >>> @Override >>> public final void postCommit() { >>> // This returnes "stale" information that was present before >>> replication finished >>> RefCounted refC = core.getNewestSearcher(true); >>> Map userData = >>> refC.get().getIndexReader().getIndexCommit().getUserData(); >>> } >>> >>> I expected core.getNewestSearcher(true); to return refreshed >>> SolrIndexSearcher, but it didn't >>> >>> When is this information going to be refreshed to the status from the >>> replicated index, I repeat this is postCommit listener? >>> >>> What is the way to get the information from the last commit point? >>> >>> Maybe like this? >>> core.getDeletionPolicy().getLatestCommit().getUserData(); >>> >>> Or I need to explicitly open new searcher (isn't solr does this behind >>> the scenes?) >>> core.openNewSearcher(false, false) >>> >>> Not critical, reopening new searcher works, but I would like to >>> understand these lifecycles, when solr loads latest commit point... >>> >>> Thanks, eks >> >> - Mark Miller >> lucidimagination.com >> >> >> >> >> >> >> >> >> >> >>
Re: Date filter query
Hi, Em, thanks for your response. But seems a have a problem. I wrote a script, which sends a queries (curl based), with a certain delay. I had made a dictionary of matched words. I run my script with 500ms delay during 60 seconds. Take look at catalina logs: INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="baby"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500} status=0 QTime=1735 INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="girl"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500} status=0 QTime=9794 INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="baby"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500} status=0 QTime=13885 INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="girl"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500} status=0 QTime=33995 Note, that not all queries from the second test are slower, for example: INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="with"&wt=json&fq=Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]&rows=500} status=0 QTime=18645 INFO: [] webapp=/solr path=/select params={fl=fileds_from_schema_here,score&sort=score+desc&start=0&q="with"&wt=json&fq=Date:[NOW-30DAY+TO+NOW]&rows=500} status=0 QTime=7877 but in average I have: *** Date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] *** Queries processed: 110 Queries cancelled: 4 Max QTime is: 22728 ms Avg QTime is: 6681.31 ms Min QTime is: ms *** Date:[NOW-30DAY+TO+NOW] *** Queries processed: 20 Queries cancelled: 94 Max QTime is: 45203 ms Avg QTime is: 39195.2 ms Min QTime is: ms I repeated this test more times - results seems equal. Is it true, that [2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] is faster than [NOW-30DAY+TO+NOW] ? 
-- View this message in context: http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764781.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unique key constraint and optimistic locking (versioning)
Hi Per, Solr provides the so called "UniqueKey"-field. Refer to the Wiki to learn more: http://wiki.apache.org/solr/UniqueKey > Optimistic locking (versioning) ... is not provided by Solr out of the box. If you add a new document with the same UniqueKey it replaces the old one. You have to do the versioning on your own (and keep in mind concurrent updates). Kind regards, Em Am 21.02.2012 13:50, schrieb Per Steffensen: > Hi > > Does solr/lucene provide any mechanism for "unique key constraint" and > "optimistic locking (versioning)"? > Unique key constraint: That a client will not succeed creating a new > document in solr/lucene if a document already exists having the same > value in some field (e.g. an id field). Of course implemented right, so > that even though two or more threads are concurrently trying to create a > new document with the same value in this field, only one of them will > succeed. > Optimistic locking (versioning): That a client will only succeed > updating a document if this updated document is based on the version of > the document currently stored in solr/lucene. Implemented in the > optimistic way that clients during an update have to tell which version > of the document they fetched from Solr and that they therefore have used > as a starting-point for their updated document. So basically having a > version field on the document that clients increase by one before > sending to solr for update, and some code in Solr that only makes the > update succeed if the version number of the updated document is exactly > one higher than the version number of the document already stored. Of > course again implemented right, so that even though two or more thrads > are concurrently trying to update a document, and they all have their > updated document based on the current version in solr/lucene, only one > of them will succeed. > > Or do I have to do stuff like this myself outside solr/lucene - e.g. in > the client using solr. > > Regards, Per Steffensen >
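For reference, a minimal schema.xml sketch of the "replace on same key" behavior Em describes (field name and type here are illustrative, not from the original schema):

```xml
<!-- schema.xml: the field named in <uniqueKey> acts as the unique key.
     Adding a document whose key already exists silently replaces the old
     document ("last write wins") -- it does not fail, so this is not a
     constraint check, and version checks must be done client-side. -->
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
</fields>
<uniqueKey>id</uniqueKey>
```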
Re: reader/searcher refresh after replication (commit)
Thanks Mark, Hmm, I would like to have this information asap, not to wait until the first search gets executed (depends on user) . Is solr going to create new searcher as a part of "replication transaction"... Just to make it clear why I need it... I have simple master, many slaves config where master does "batch" updates in big chunks (things user can wait longer to see on search side) but slaves work in soft commit mode internally where I permit them to run away slightly from master in order to know where "incremental update" should start, I read it from UserData Basically, ideally, before commit (after successful replication is finished) ends, I would like to read in these counters to let "incremental update" run from the right point... I need to prevent updating "replicated index" before I read this information (duplicates can appear) are there any "IndexWriter" listeners around? Thanks again, eks. On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller wrote: > Post commit calls are made before a new searcher is opened. > > Might be easier to try to hook in with a new searcher listener? > > On Feb 21, 2012, at 8:23 AM, eks dev wrote: > >> Hi all, >> I am a bit confused with IndexSearcher refresh lifecycles... >> In a master slave setup, I override postCommit listener on slave >> (solr trunk version) to read some user information stored in >> userCommitData on master >> >> -- >> @Override >> public final void postCommit() { >> // This returnes "stale" information that was present before >> replication finished >> RefCounted refC = core.getNewestSearcher(true); >> Map userData = >> refC.get().getIndexReader().getIndexCommit().getUserData(); >> } >> >> I expected core.getNewestSearcher(true); to return refreshed >> SolrIndexSearcher, but it didn't >> >> When is this information going to be refreshed to the status from the >> replicated index, I repeat this is postCommit listener? >> >> What is the way to get the information from the last commit point? >> >> Maybe like this? 
>> core.getDeletionPolicy().getLatestCommit().getUserData(); >> >> Or I need to explicitly open new searcher (isn't solr does this behind >> the scenes?) >> core.openNewSearcher(false, false) >> >> Not critical, reopening new searcher works, but I would like to >> understand these lifecycles, when solr loads latest commit point... >> >> Thanks, eks > > - Mark Miller > lucidimagination.com > > > > > > > > > > >
Re: How to merge an "autofacet" with a predefined facet
Well, you could create a keyword-file out of your database and join it with your self-maintained keywordslist. Doing so, keep in mind that you have to reload your SolrCore in order to make the changes visible to the indexing-process (and keep in mind that you have to reindex those documents that match your new keywordslist but currently do not have those keywords assigned). Kind regards, Em Am 21.02.2012 19:53, schrieb Xavier: > In a way I agree that it would be easier to do that but i really wants to > avoid this solution because it prefer to work "harder" on preparing my index > than adding field requests on my front query :) > > So the only solution i see right now is to do that on my own in order to > have my database fully prepared to be indexed ... but i had hope that solr > could handle it ... so if anyone see any solution to handle it directly with > solr you are welcome :p > > Anyways thanks for your help Em ;) > > Best regards, > Xavier > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3764506.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: reader/searcher refresh after replication (commit)
Post commit calls are made before a new searcher is opened. Might be easier to try to hook in with a new searcher listener? On Feb 21, 2012, at 8:23 AM, eks dev wrote: > Hi all, > I am a bit confused with IndexSearcher refresh lifecycles... > In a master slave setup, I override postCommit listener on slave > (solr trunk version) to read some user information stored in > userCommitData on master > > -- > @Override > public final void postCommit() { > // This returnes "stale" information that was present before > replication finished > RefCounted refC = core.getNewestSearcher(true); > Map userData = > refC.get().getIndexReader().getIndexCommit().getUserData(); > } > > I expected core.getNewestSearcher(true); to return refreshed > SolrIndexSearcher, but it didn't > > When is this information going to be refreshed to the status from the > replicated index, I repeat this is postCommit listener? > > What is the way to get the information from the last commit point? > > Maybe like this? > core.getDeletionPolicy().getLatestCommit().getUserData(); > > Or I need to explicitly open new searcher (isn't solr does this behind > the scenes?) > core.openNewSearcher(false, false) > > Not critical, reopening new searcher works, but I would like to > understand these lifecycles, when solr loads latest commit point... > > Thanks, eks - Mark Miller lucidimagination.com
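A sketch of how such a listener hook might be registered in solrconfig.xml (the class name is a hypothetical custom SolrEventListener implementation, not something shipped with Solr):

```xml
<!-- solrconfig.xml, inside the <query> section: this listener fires when a
     new searcher is opened, i.e. after the replicated commit point is the
     one a searcher actually sees -- unlike postCommit, which fires earlier. -->
<listener event="newSearcher" class="com.example.UserDataListener"/>
```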
Re: How to merge an "autofacet" with a predefined facet
In a way I agree that it would be easier to do that, but I really want to avoid this solution because I prefer to work "harder" on preparing my index than adding field requests to my front-end query :) So the only solution I see right now is to do it on my own, in order to have my database fully prepared to be indexed ... but I had hoped that Solr could handle it ... so if anyone sees any solution to handle it directly with Solr, you are welcome :p Anyway, thanks for your help Em ;) Best regards, Xavier -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3764506.html Sent from the Solr - User mailing list archive at Nabble.com.
filter query or boolean?
Hi, Which is faster for compound boolean expressions: filter queries, or a single query with boolean operators? For that matter, is there any difference other than maybe speed? thanks
Re: Date filter query
Hi, 1) and 2) should have equal performance. Given that several searches are performed with the same fq param, the filters are cached, so 1) and 2) perform better than 3) and 4). Kind regards, Em On 21.02.2012 19:06, ku3ia wrote: > Hi all! > > Please advice me: > 1) q=test&fq=date:[NOW-30DAY+TO+NOW] > 2) q=test&fq=date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] > 3) q=test+AND+date:[NOW-30DAY+TO+NOW] > 4) q=test+AND+date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] > > where date: > precisionStep="6" positionIncrementGap="0"/> > > > Which of these queries will be faster by QTime at Solr 3.5? Thanks! > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764349.html > Sent from the Solr - User mailing list archive at Nabble.com. >
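One caveat worth noting, which likely explains the QTime differences reported elsewhere in this thread: NOW is re-evaluated on every request, so an fq containing a bare NOW produces a slightly different filter string each time and rarely gets a filter-cache hit. Rounding with Solr date math keeps the string stable (example values below are illustrative):

```
# Re-evaluated every request: a different range each time, poor cache reuse
fq=Date:[NOW-30DAY TO NOW]

# Rounded to day boundaries with date math: identical string all day long,
# so the cached filter can be reused across requests
fq=Date:[NOW/DAY-30DAYS TO NOW/DAY+1DAY]
```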
Date filter query
Hi all! Please advise me: 1) q=test&fq=date:[NOW-30DAY+TO+NOW] 2) q=test&fq=date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] 3) q=test+AND+date:[NOW-30DAY+TO+NOW] 4) q=test+AND+date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] where date: Which of these queries will be faster by QTime on Solr 3.5? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Date-filter-query-tp3764349p3764349.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR - Just for search or whole site DB?
Thank you for the information Damien. Is there a better database to use at the core of the site that is more compatible with Solr than MySQL, or is hooking MySQL up with Solr simple enough? -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Just-for-search-or-whole-site-DB-tp3763439p3764254.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to merge an "autofacet" with a predefined facet
Wouldn't it be easier to store both types in different fields? At query-time you are able to do a facet on both and can combine the results client-side to present them within the GUI. Kind regards, Em Am 21.02.2012 17:52, schrieb Xavier: > Sure, the difference between my 2 facets are : > > - 'predefined_facets' contains values already filled in my database like : > 'web langage', 'cooking', 'fishing' > > - 'text_tag_facets' will contain the same possible value but determined > automatically from a given wordslist by searching in the document text as > shown in my previous post > > > Why i want to do that ? because sometimes my 'predefined_facets' is not > defined, and even if it is, i want to defined it the more as possible. > > Best regards, > Xavier > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3764116.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: How to merge an "autofacet" with a predefined facet
Sure, the difference between my 2 facets is: - 'predefined_facets' contains values already filled in my database, like: 'web langage', 'cooking', 'fishing' - 'text_tag_facets' will contain the same possible values, but determined automatically from a given wordlist by searching in the document text, as shown in my previous post Why do I want to do that? Because sometimes my 'predefined_facets' is not defined, and even when it is, I want to fill it in as much as possible. Best regards, Xavier -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3764116.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to merge an "autofacet" with a predefined facet
Hi Xavier, > It's maybe because (As I understood) the real (stored) value of this dynamic > facet is still the initial fulltext ?? (or maybe i'm wrong ...) Exactly. CopyField does not copy the analyzed result of a field into another one. Instead, the original content given to that field (the unanalyzed raw input) is getting copied. Could you explain what is the difference between your text_tag_facets and your predefined facets? Kind regards, Em Am 21.02.2012 17:11, schrieb Xavier: > Hi everyone, > > Like explained in this post : > http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-td3761201.html > > I have created a dynamic facet at indexation by searching terms in a > fulltext field. > > But i don't know if it's possible to merge this "autocreated" facet with a > facet already predefined ? i tried to used (adding this to my > code in my previous post) : > ** > > but it's not seems to work ... (my text_tag_facet is always working, but > didnt merged with my predefined_facet) > > It's maybe because (As I understood) the real (stored) value of this dynamic > facet is still the initial fulltext ?? (or maybe i'm wrong ...) > > I'm a little confused about this and i'm certainly doing it wrong but i > begin to feel that those kinds of manipulation arent feasible into > schema.xml > > Best regards. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3763988.html > Sent from the Solr - User mailing list archive at Nabble.com. >
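To illustrate Em's point, the copyField declaration involved would look roughly like this (field names follow this thread; the exact schema was not posted):

```xml
<!-- schema.xml sketch: copyField copies the *raw* input of the source
     field, before any analysis runs. The destination field then applies
     its own analyzer to the original full text -- NOT to the tag terms
     produced by text_tag_facet's analysis chain, which is why the
     "autocreated" values never show up merged into predefined_facet. -->
<copyField source="text_tag_facet" dest="predefined_facet"/>
```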
Re: How to index a facetfield by searching words matching from another Textfield
Thanks for this answer. I have posted my new question (related to this post) into a new topic ;) ( http://lucene.472066.n3.nabble.com/How-to-merge-an-quot-autofacet-quot-with-a-predefined-facet-td3763988.html ) Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763993.html Sent from the Solr - User mailing list archive at Nabble.com.
How to merge an "autofacet" with a predefined facet
Hi everyone, As explained in this post: http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-td3761201.html I have created a dynamic facet at indexing time by searching for terms in a fulltext field. But I don't know if it's possible to merge this "autocreated" facet with an already predefined facet? I tried to use (adding this to my code in my previous post): ** but it doesn't seem to work ... (my text_tag_facet is always working, but didn't get merged with my predefined_facet). Maybe it's because (as I understood it) the real (stored) value of this dynamic facet is still the initial fulltext?? (or maybe I'm wrong ...) I'm a little confused about this and I'm certainly doing it wrong, but I'm beginning to feel that these kinds of manipulations aren't feasible in schema.xml. Best regards. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3763988.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SOLR - Just for search or whole site DB?
I would strongly recommend using Solr just for search. Solr is designed for doing fast search lookups. It is really not designed for performing all the functions of a relational database system. You certainly COULD use Solr for everything, and the software is constantly being enhanced to make it more flexible, but you'll still probably find it awkward and inconvenient for certain tasks that are simple with MySQL. It's also useful to be able to throw away and rebuild your Solr index at will, so you can upgrade to a new version or tweak your indexing rules. If you store mission-critical data in Solr itself, this becomes more difficult. The way I like to look at it is, as the name says, as an index. You use one system for actually managing your data, and then you use Solr to create an index of that data for fast look-up. - Demian > -Original Message- > From: Spadez [mailto:james_will...@hotmail.com] > Sent: Tuesday, February 21, 2012 7:45 AM > To: solr-user@lucene.apache.org > Subject: SOLR - Just for search or whole site DB? > > > I am new to this but I wanted to pitch a setup to you. I have a website > being coded at the moment, in the very early stages, but is effectively a > full text scrapper and search engine. We have decided on SOLR for the search > system. > > We basically have two sets of data: > > One is the content for the search engine, which is around 100K records at > any one time. The entire system is built on PHP and currently put into a > MSQL database. We want very quick relevant searches, this is critical. Our > plan is to import our records into SOLR each night from the MYSQL database. > > The second set of data is other parts of the site, such as our ticket > system, stats about the number of clicks etc. The performance on this is not > performance critical at all. > > So, I have two questions: > > Firstly, should everything be run through the SOLR search system, including > tickets and site stats? 
Alterntively, is it better to keep only the main > full text searches on SOLR and do the ticketing etc through normal MYSQL > queries? > > Secondly, which is probably dependant on the first question. If everything > should go through SOLR, should we even use a MYSQL database at all? If not, > what is the alternative? We use an XML file as a .SQL replacement for > content including tickets, stats, users, passwords etc. > > Sorry if these questions are basic, but I’m out of my depth here (but > learning!) > > James > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Just- > for-search-or-whole-site-DB-tp3763439p3763439.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to index a facetfield by searching words matching from another Textfield
setting stored="true" simply places a verbatim copy of the input in the index. Returning that field in a document will simply return that verbatim copy, there's no way to do anything else. The facet *values* you get back in your response should be what you put in your index though, why doesn't that suffice? BTW, it's best to start a new thread rather than switch topics mid-stream, see: http://people.apache.org/~hossman/#threadhijack Best Erick On Tue, Feb 21, 2012 at 8:35 AM, Xavier wrote: > Seems that's an error from the documentation with the 'Factory' missing in > the classname !!? > > I found > > > > That is working fine !!! > > Conclusion i have this files : > *synonymswords.txt :* > php,mysql,html,css=>web_langage > > And > > *keepwords.txt :* > web langage > > With this fieldType : > > omitNorms="true"> > > > > synonyms="synonymswords.txt"/> > replacement=" "/> > words="keepwords.txt" ignoreCase="true"/> > > > > > And it's working fine ;) > > > But I have another question, my fields are configured like that : > > > multiValued="true"/> > > But if I turn "stored" to "true", it always return the full original text in > my documents field value for "text_tag_facet" and not the facets created > (like 'web langage') > > How can i get the result of the facet in the stored field of the document ? > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763551.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to index a facetfield by searching words matching from another Textfield
Seems that's an error in the documentation, with the 'Factory' missing from the class name!? I found That is working fine !!! In conclusion I have these files: *synonymswords.txt :* php,mysql,html,css=>web_langage And *keepwords.txt :* web langage With this fieldType : And it's working fine ;) But I have another question, my fields are configured like that : But if I turn "stored" to "true", it always returns the full original text in my document's field value for "text_tag_facet", and not the facets created (like 'web langage') How can I get the result of the facet in the stored field of the document? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763551.html Sent from the Solr - User mailing list archive at Nabble.com.
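The archive stripped the XML tags out of this message; going only by the attributes that survive in the quoted text (synonyms="synonymswords.txt", replacement=" ", words="keepwords.txt" ignoreCase="true"), the fieldType was probably along these lines. The class names and the pattern attribute are my assumptions, not from the original post:

```xml
<!-- Hypothetical reconstruction of the stripped fieldType: map raw terms to
     a synonym token (web_langage), turn the underscore back into a space,
     then keep only tokens listed in keepwords.txt. -->
<fieldType name="text_tag" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonymswords.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" "/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```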
reader/searcher refresh after replication (commit)
Hi all, I am a bit confused with IndexSearcher refresh lifecycles... In a master slave setup, I override postCommit listener on slave (solr trunk version) to read some user information stored in userCommitData on master -- @Override public final void postCommit() { // This returnes "stale" information that was present before replication finished RefCounted refC = core.getNewestSearcher(true); Map userData = refC.get().getIndexReader().getIndexCommit().getUserData(); } I expected core.getNewestSearcher(true); to return refreshed SolrIndexSearcher, but it didn't When is this information going to be refreshed to the status from the replicated index, I repeat this is postCommit listener? What is the way to get the information from the last commit point? Maybe like this? core.getDeletionPolicy().getLatestCommit().getUserData(); Or I need to explicitly open new searcher (isn't solr does this behind the scenes?) core.openNewSearcher(false, false) Not critical, reopening new searcher works, but I would like to understand these lifecycles, when solr loads latest commit point... Thanks, eks
Unique key constraint and optimistic locking (versioning)
Hi Does solr/lucene provide any mechanism for "unique key constraint" and "optimistic locking (versioning)"? Unique key constraint: That a client will not succeed creating a new document in solr/lucene if a document already exists having the same value in some field (e.g. an id field). Of course implemented right, so that even though two or more threads are concurrently trying to create a new document with the same value in this field, only one of them will succeed. Optimistic locking (versioning): That a client will only succeed updating a document if this updated document is based on the version of the document currently stored in solr/lucene. Implemented in the optimistic way that clients during an update have to tell which version of the document they fetched from Solr and that they therefore have used as a starting-point for their updated document. So basically having a version field on the document that clients increase by one before sending to solr for update, and some code in Solr that only makes the update succeed if the version number of the updated document is exactly one higher than the version number of the document already stored. Of course again implemented right, so that even though two or more threads are concurrently trying to update a document, and they all have their updated document based on the current version in solr/lucene, only one of them will succeed. Or do I have to do stuff like this myself outside solr/lucene - e.g. in the client using solr? Regards, Per Steffensen
SOLR - Just for search or whole site DB?
I am new to this but I wanted to pitch a setup to you. I have a website being coded at the moment, in the very early stages, but it is effectively a full-text scraper and search engine. We have decided on SOLR for the search system. We basically have two sets of data: One is the content for the search engine, which is around 100K records at any one time. The entire system is built on PHP and currently put into a MySQL database. We want very quick relevant searches; this is critical. Our plan is to import our records into SOLR each night from the MySQL database. The second set of data is other parts of the site, such as our ticket system, stats about the number of clicks etc. This is not performance-critical at all. So, I have two questions: Firstly, should everything be run through the SOLR search system, including tickets and site stats? Alternatively, is it better to keep only the main full-text searches on SOLR and do the ticketing etc. through normal MySQL queries? Secondly, which is probably dependent on the first question: if everything should go through SOLR, should we even use a MySQL database at all? If not, what is the alternative? We use an XML file as a SQL replacement for content including tickets, stats, users, passwords etc. Sorry if these questions are basic, but I'm out of my depth here (but learning!) James -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Just-for-search-or-whole-site-DB-tp3763439p3763439.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Fast Vector Highlighter Working for some records only
Hi Koji, Thanks for quick reply, i am using solr 1.4.1 i am querying *"camera"* here is the example of documents : which matches the 70 Electronics/Cell Phones /b/l/blackberry-8100-pearl-2.jpg 349.99 BlackBerry 8100 Pearl sports a large 240 x 260 screen that supports over 65,000 colors-- plenty of real estate to view your e-mails, Web browser content, messaging sessions, and attachments. Silver blackberry-8100-pearl.html Like the BlackBerry 7105t, the BlackBerry 8100 Pearl is The BlackBerry 8100 Pearl sports a large 240 x 260 screen that supports over 65,000 colors-- plenty of real estate to view your e-mails, Web browser content, messaging sessions, and attachments. The venerable BlackBerry trackwheel has been replaced on this model with an innovative four-way trackball placed below the screen. On the rear of the handheld, you'll find a 1.3-megapixel camera and a self portrait mirror. The handheld's microSD memory card slot is located inside the device, behind the battery. There's also a standard 2.5mm headset jack that can be used with the included headset, as well as a mini-USB port for data connectivity. BlackBerry 8100 Pearl
89 Electronics/Cameras/Accessories /u/n/universal-camera-case-2.jpg 34.0 Universal Camera Case Green universal-camera-case.html A stylish digital camera demands stylish protection. This leather carrying case will defend your camera from the dings and scratches of travel and everyday use while looking smart all the time. Universal Camera Case on above documents i am getting highlighting response on documentid = 89 and not for documentId = 70 even though there is word called "camera" in document(id=70).. I have field called for your information i am using custom analyser for indexing and querying. Thanks Dhaivat Koji Sekiguchi wrote > > Dhaivat, > > Can you give us the concrete document that you are trying to search and > make > a highlight snippet? And what is your Solr version? > > koji > -- > Query Log Visualizer for Apache Solr > http://soleami.com/ > > (12/02/21 20:29), dhaivat wrote: >> >> Hi >> >> I am newbie to Solr and i am using Sorj Client to create index and query >> the >> solr data.. When i am querying the data i want to use Highlight feature >> of >> solr so i am using Fast Vector Highlighter to enable highlight on words.. >> I >> found that it's working fine for some documents and for some document >> it's >> returning any highlighted words even though the field of document >> contents >> that word.. i am using the following parameters using solrj client : >> >> query.add("hl","true"); >> query.add("hl.q",term); >> query.add("hl.fl","contents"); >> query.add("hl.snippets","100"); >> query.add("hl.fragsize","10"); >> query.add("hl.maxAnalyzedChars","10"); >> query.add("hl.useFastVectorHighlighter","true"); >> query.add("hl.highlightMultiTerm","true"); >> query.add("hl.regex.slop","0.5"); >> query.add("hl.regex.pattern","[-\\w ,/\n\\\"']*"); >> >> query.setHighlightSimplePre("*"); >> query.setHighlightSimplePost("*"); >> >> My solrconfig is pretty strait forward haven't specified anything related >> to >> highlighter there. 
>> >> this is how my solrConfig looks like : >> >> >> >> >> >> > multipartUploadLimitInKB="2048" /> >> >> >>> default="true" /> >> >>> class="org.apache.solr.handler.admin.AdminHandlers" /> >> >> >> >> >> >> >> solr >> >> >> >> >> i have also enabled TermVectors, TermOffsets, TermPositions on the field on >> which i am indexing >> >> >> can anyone tell me where i am going wrong ? >> >> thanks in advance >> >> Dhaivat >> >> >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Fast-Vector-Highlighter-Working-for-some-records-only-tp3763286p3763286.html >> Sent from the Solr - User mailing list archive at Nabble.com.
- 1.3 mega pixel camera to capture those special moments
- MP3 player lets you listen to your favorite music on the go
- Menu and escape keys on the front of the device for easier access
- Bluetooth technology lets you experience hands free and wire free features
- Package Contents: phone,AC adapter,software CD,headset,USB cable,sim- card,get started poster,reference guide
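For what it's worth, the Fast Vector Highlighter can only produce snippets for fields indexed with term vectors, positions, and offsets. A minimal schema sketch — "contents" is the field name from the post, but the type name is a placeholder, not the poster's actual schema:

```xml
<!-- Sketch: FVH requires term vectors with positions and offsets.
     The type name "text" is a placeholder for the poster's custom type. -->
<field name="contents" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

If any of the three attributes is missing, documents indexed against that schema cannot be highlighted by FVH, which would match the "works for some documents only" symptom if the schema changed between indexing runs.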
Re: Fast Vector Highlighter Working for some records only
Dhaivat,

Can you give us the concrete document that you are trying to search and make a highlight snippet? And what is your Solr version?

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/

(12/02/21 20:29), dhaivat wrote:
> Hi
>
> I am a newbie to Solr and I am using the SolrJ client to create the index
> and query the Solr data. When querying I want to use the highlight feature
> of Solr, so I am using the Fast Vector Highlighter to highlight words. I
> found that it works fine for some documents, but for others it returns no
> highlighted words even though the document's field contains the word. I am
> using the following parameters via the SolrJ client:
>
> query.add("hl","true");
> query.add("hl.q",term);
> query.add("hl.fl","contents");
> query.add("hl.snippets","100");
> query.add("hl.fragsize","10");
> query.add("hl.maxAnalyzedChars","10");
> query.add("hl.useFastVectorHighlighter","true");
> query.add("hl.highlightMultiTerm","true");
> query.add("hl.regex.slop","0.5");
> query.add("hl.regex.pattern","[-\\w ,/\n\\\"']*");
>
> query.setHighlightSimplePre("*");
> query.setHighlightSimplePost("*");
>
> My solrconfig is pretty straightforward; I haven't specified anything
> related to the highlighter there. I have also enabled termVectors,
> termOffsets, termPositions on the field I am indexing.
>
> Can anyone tell me where I am going wrong?
>
> thanks in advance
>
> Dhaivat
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Fast-Vector-Highlighter-Working-for-some-records-only-tp3763286p3763286.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: lucene operators interfearing in edismax
Ok, thanks. But I reviewed some of my searches and the - was not surrounded by whitespace in all cases, so I'll have to remove the Lucene operators from the user input myself. I understand there is no predefined way to do so.

--
View this message in context: http://lucene.472066.n3.nabble.com/lucene-operators-interfearing-in-edismax-tp3761577p3763324.html
Sent from the Solr - User mailing list archive at Nabble.com.
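There is in fact a helper for this in SolrJ, ClientUtils.escapeQueryChars, which backslash-escapes (rather than removes) the query syntax characters before they reach edismax. A self-contained sketch modeled on it — the class and method names here are our own, not part of any Solr API:

```java
// Minimal sketch of escaping Lucene/edismax query operators in raw user
// input, modeled on SolrJ's ClientUtils.escapeQueryChars. Escaping is
// usually safer than deleting, since it preserves the user's text.
public class QueryEscaper {
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Characters that Lucene's query parser treats as syntax.
            if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '('
                    || c == ')' || c == ':' || c == '^' || c == '[' || c == ']'
                    || c == '\"' || c == '{' || c == '}' || c == '~'
                    || c == '*' || c == '?' || c == '|' || c == '&'
                    || c == ';' || c == '/') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("foo-bar")); // foo\-bar
        System.out.println(escape("a:b"));     // a\:b
    }
}
```

Note this deliberately does not touch whitespace or the bare words AND/OR/NOT; whether those should also be neutralized depends on how much query syntax you want to allow through.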
Fast Vector Highlighter Working for some records only
Hi

I am a newbie to Solr and I am using the SolrJ client to create the index and query the Solr data. When querying I want to use the highlight feature of Solr, so I am using the Fast Vector Highlighter to highlight words. I found that it works fine for some documents, but for others it returns no highlighted words even though the document's field contains the word. I am using the following parameters via the SolrJ client:

query.add("hl","true");
query.add("hl.q",term);
query.add("hl.fl","contents");
query.add("hl.snippets","100");
query.add("hl.fragsize","10");
query.add("hl.maxAnalyzedChars","10");
query.add("hl.useFastVectorHighlighter","true");
query.add("hl.highlightMultiTerm","true");
query.add("hl.regex.slop","0.5");
query.add("hl.regex.pattern","[-\\w ,/\n\\\"']*");

query.setHighlightSimplePre("*");
query.setHighlightSimplePost("*");

My solrconfig is pretty straightforward; I haven't specified anything related to the highlighter there. I have also enabled termVectors, termOffsets, termPositions on the field I am indexing.

Can anyone tell me where I am going wrong?

thanks in advance

Dhaivat

--
View this message in context: http://lucene.472066.n3.nabble.com/Fast-Vector-Highlighter-Working-for-some-records-only-tp3763286p3763286.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to index a facetfield by searching words matching from another Textfield
That's it! Thanks :) First time I've seen that documentation page (which is really helpful): http://lucidworks.lucidimagination.com/display/solr/Filter+Descriptions#FilterDescriptions-KeepWordsFilter

So now I want to "associate" a word list with a value of an existing facet, so I tried to combine synonyms and keepwords. It works very well, but my problem now is that I want whitespace returned in the synonym and matched against my keepwords (because I have whitespace in the values of my facet).

Example: if the term 'php' is seen, my synonym file gives 'web langage' and I want to keep the whole phrase 'web langage'. So my files are:

synonymswords.txt:
php=>web langage

keepwords.txt:
web langage

The problem is that each word is analyzed separately and I don't know how to handle the whitespace (the synonym filter returns 'web' and 'langage', so it doesn't match 'web langage'). I tried to use solr.PatternReplaceFilter (as you can see in my configuration above) with a chosen character '_' as a space character, but I get an error. So if you have another tip for me it would be great :p

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763247.html
Sent from the Solr - User mailing list archive at Nabble.com.
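The underscore workaround described above can be made to work if the multi-word synonym target is written as a single underscore-joined token, so it survives KeepWordFilter intact, and the underscore is only turned back into a space at the very end of the chain. A sketch under those assumptions — the fieldType name and exact filter order are illustrative, not the poster's actual configuration:

```xml
<!-- Sketch: keep multi-word synonym targets as one token with "_",
     then restore the space after KeepWordFilter has run.
     synonymswords.txt would contain:  php=>web_langage
     keepwords.txt would contain:     web_langage -->
<fieldType name="text_tag_facet" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonymswords.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
            ignoreCase="true"/>
    <!-- turn the underscore back into a space for the facet value -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="_"
            replacement=" " replace="all"/>
  </analyzer>
</fieldType>
```

The key point is that PatternReplaceFilter must come after KeepWordFilter; if it runs earlier, the token is split back into 'web' and 'langage' before the keep list is consulted.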
Re: Do SOLR supports Lemmatization
Hi, Have a look at the following link: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28Lemmatization%29#Stemming Regards, Dirceu On Tue, Feb 21, 2012 at 11:18 AM, dsy99 wrote: > Dear all, > I want to know, do SOLR support Lemmatization? If yes, which in-built > Lemmatizer class should be included in SOLR schema file to analyze the > tokens using lemmatization rather than stemming. > > Thanks in advance. > > With Thanks & Regds: > Divakar Yadav > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Do-SOLR-supports-Lemmatization-tp3763139p3763139.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Dirceu Vieira Júnior --- +47 9753 2473 dirceuvjr.blogspot.com twitter.com/dirceuvjr
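As the linked wiki section suggests, Solr's built-in filters are stemmers rather than true lemmatizers; the dictionary-based HunspellStemFilterFactory (available in newer Solr releases) is the closest built-in option to lemmatization. A minimal sketch — the fieldType name and the dictionary file paths are placeholders:

```xml
<!-- Sketch: dictionary-based stemming, the nearest built-in
     alternative to lemmatization. The .dic/.aff paths are
     placeholders for a real Hunspell dictionary. -->
<fieldType name="text_lemma" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

For true lemmatization (part-of-speech-aware dictionary lookup), an external analysis library would have to be plugged in as a custom filter; nothing ships with Solr for that out of the box.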
Do SOLR supports Lemmatization
Dear all,

I want to know: does Solr support lemmatization? If yes, which built-in lemmatizer class should be included in the Solr schema file to analyze the tokens using lemmatization rather than stemming?

Thanks in advance.

With Thanks & Regds:
Divakar Yadav

--
View this message in context: http://lucene.472066.n3.nabble.com/Do-SOLR-supports-Lemmatization-tp3763139p3763139.html
Sent from the Solr - User mailing list archive at Nabble.com.
Query regarding Lucene Indexing Method
Hi Team,

Is there any article or site where I can learn about Lucene's indexing method: how the index is written and maintained? And one quick question: is the standard method that Lucene uses to handle indexes an Apache package, or does Lucene have its own index-writing method? Does Lucene use memory-mapped files?

Thanks and Regards,
S SYED ABDUL KATHER