What does this mean?
I have FieldCache stats. Can anyone tell me whether this shows any problem with respect to performance?

name: fieldCache
class: org.apache.solr.search.SolrFieldCacheMBean
version: 1.0
description: Provides introspection of the Lucene FieldCache; this is **NOT** a cache that is managed by Solr.
stats:
entries_count : 4
entry#0 : 'ReadOnlyDirectoryReader(segments_pz4 _28c(2.x):C127800/127800 _5rb(2.x):C130681/130681 _7pw(2.x):C64679/64679 _b9z(2.x):C112359/112359 _ese(2.x):C127711/14 _gwj(2.x):C131443 _lc0(2.x):C129588 _orx(2.x):C116270 _s20(2.x):C113641 _sos(2.x):C12805 _ss5(2.x):C11009 _ssf(2.x):C1010 _ssq(2.x):C1010 _st1(2.x):C1010 _stc(2.x):C1010 _stn(2.x):C1010 _sto(2.x):C101 _stp(2.x):C101 _stq(2.x):C101 _str(2.x):C101 _sts(2.x):C101 _stt(2.x):C101 _stu(2.x):C101 _stv(2.x):C101 _stw(2.x):C101)'='2', class org.apache.lucene.search.FieldCache$StringIndex, null=org.apache.lucene.search.FieldCache$StringIndex#26094814
entry#1 : 'ReadOnlyDirectoryReader(segments_pz4 _28c(2.x):C127800/127800 _5rb(2.x):C130681/130681 _7pw(2.x):C64679/64679 _b9z(2.x):C112359/112359 _ese(2.x):C127711/14 _gwj(2.x):C131443 _lc0(2.x):C129588 _orx(2.x):C116270 _s20(2.x):C113641 _sos(2.x):C12805 _ss5(2.x):C11009 _ssf(2.x):C1010 _ssq(2.x):C1010 _st1(2.x):C1010 _stc(2.x):C1010 _stn(2.x):C1010 _sto(2.x):C101 _stp(2.x):C101 _stq(2.x):C101 _str(2.x):C101 _sts(2.x):C101 _stt(2.x):C101 _stu(2.x):C101 _stv(2.x):C101 _stw(2.x):C101)'='S677', class org.apache.lucene.search.FieldCache$StringIndex, null=org.apache.lucene.search.FieldCache$StringIndex#19129406
entry#2 : 'ReadOnlyDirectoryReader(segments_pz4 _28c(2.x):C127800/127800 _5rb(2.x):C130681/130681 _7pw(2.x):C64679/64679 _b9z(2.x):C112359/112359 _ese(2.x):C127711/14 _gwj(2.x):C131443 _lc0(2.x):C129588 _orx(2.x):C116270 _s20(2.x):C113641 _sos(2.x):C12805 _ss5(2.x):C11009 _ssf(2.x):C1010 _ssq(2.x):C1010 _st1(2.x):C1010 _stc(2.x):C1010 _stn(2.x):C1010 _sto(2.x):C101 _stp(2.x):C101 _stq(2.x):C101 _str(2.x):C101 _sts(2.x):C101 _stt(2.x):C101 _stu(2.x):C101 _stv(2.x):C101 _stw(2.x):C101)'='648', class org.apache.lucene.search.FieldCache$StringIndex, null=org.apache.lucene.search.FieldCache$StringIndex#22924041
entry#3 : 'ReadOnlyDirectoryReader(segments_pz4 _28c(2.x):C127800/127800 _5rb(2.x):C130681/130681 _7pw(2.x):C64679/64679 _b9z(2.x):C112359/112359 _ese(2.x):C127711/14 _gwj(2.x):C131443 _lc0(2.x):C129588 _orx(2.x):C116270 _s20(2.x):C113641 _sos(2.x):C12805 _ss5(2.x):C11009 _ssf(2.x):C1010 _ssq(2.x):C1010 _st1(2.x):C1010 _stc(2.x):C1010 _stn(2.x):C1010 _sto(2.x):C101 _stp(2.x):C101 _stq(2.x):C101 _str(2.x):C101 _sts(2.x):C101 _stt(2.x):C101 _stu(2.x):C101 _stv(2.x):C101 _stw(2.x):C101)'='359', class org.apache.lucene.search.FieldCache$StringIndex, null=org.apache.lucene.search.FieldCache$StringIndex#12376920
insanity_count : 0

-- View this message in context: http://lucene.472066.n3.nabble.com/What-does-this-mean-tp3581884p3581884.html Sent from the Solr - User mailing list archive at Nabble.com.
Too many connections in CLOSE_WAIT state on master solr server
Hi, I am using Solr replication and am seeing a lot of connections in the CLOSE_WAIT state on the master Solr server. These disappear after a while, but until then the master Solr stops responding. There are about 130 open connections on the master server with the slave m/c as the client, and all are in the CLOSE_WAIT state. Also, the client port shown in the netstat results on the master Solr server is not visible in the netstat results on the client (slave Solr) m/c.

Following is my environment:
- 40 cores in the master Solr on m/c 1
- 40 cores in the slave Solr on m/c 2
- The replication poll interval is 20 seconds.
- Replication section of solrconfig.xml in the slave Solr:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- fully qualified url for the replication handler of master -->
    <str name="masterUrl">$mastercorename/replication</str>
    <!-- Interval at which the slave should poll the master. Format is HH:mm:ss.
         If this is absent the slave does not poll automatically, but a fetchindex
         can be triggered from the admin or the http API -->
    <str name="pollInterval">00:00:20</str>
    <!-- The following values are used when the slave connects to the master to
         download the index files. Default values implicitly set as 5000ms and 1ms
         respectively. The user DOES NOT need to specify these unless the bandwidth
         is extremely low or if there is an extremely high latency -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>

Thanks for any pointers. -- Regards, Samarth
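For what it's worth, when chasing a problem like this it helps to tally connection states from netstat output rather than counting by hand. A rough sketch in Python (the parsing assumes `netstat -tan`-style lines; the sample addresses below are made up):

```python
from collections import Counter

def connection_states(netstat_lines):
    """Tally TCP connection states from `netstat -tan`-style output lines."""
    counts = Counter()
    for line in netstat_lines:
        parts = line.split()
        # tcp lines end with the state column, e.g. ESTABLISHED or CLOSE_WAIT
        if parts and parts[0].startswith("tcp"):
            counts[parts[-1]] += 1
    return counts

sample = [
    "tcp 0 0 10.0.0.1:8983 10.0.0.2:52341 CLOSE_WAIT",
    "tcp 0 0 10.0.0.1:8983 10.0.0.2:52342 CLOSE_WAIT",
    "tcp 0 0 10.0.0.1:8983 10.0.0.3:40000 ESTABLISHED",
]
print(connection_states(sample)["CLOSE_WAIT"])  # prints 2
```

In practice you would feed it the real output, e.g. `connection_states(subprocess.check_output(["netstat", "-tan"]).decode().splitlines())`, and watch whether the CLOSE_WAIT count grows around each replication poll.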
Re: Virtual Memory very high
If you allow me to chime in, is there a way to check for which DirectoryFactory is in use, if ${solr.directoryFactory:solr.StandardDirectoryFactory} has been configured? Dmitry 2011/12/12 Yury Kats yuryk...@yahoo.com On 12/11/2011 4:57 AM, Rohit wrote: What are the difference in the different DirectoryFactory? http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html
Looking for a good commit/merge strategy
Hi all, we are indexing real-time documents from various sources. Since we have multiple sources, we encounter quite a number of duplicates, which we delete from the index. This mostly occurs within a short timeframe; deletes of older documents may happen, but they do not have a high priority. Search results do not need to be exactly realtime (they can be 1 minute or so behind), but facet counts should be correct as we use them to visualize frequencies in the data. We are now looking for a good commit/merge strategy. Any advice? Thanks and best, Peter -- View this message in context: http://lucene.472066.n3.nabble.com/Looking-for-a-good-commit-merge-strategy-tp3582294p3582294.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Looking for a good commit/merge strategy
How do you determine a duplicate? Solr has de-duplication built in, and you may also consider hashing documents on some fields to create a consistent doc id that would be the same for identical documents, letting Solr overwrite them. Either approach would reduce or eliminate the possibility of duplicates and save time. Hi all, we are indexing real-time documents from various sources. Since we have multiple sources, we encounter quite a number of duplicates, which we delete from the index. This mostly occurs within a short timeframe; deletes of older documents may happen, but they do not have a high priority. Search results do not need to be exactly realtime (they can be 1 minute or so behind), but facet counts should be correct as we use them to visualize frequencies in the data. We are now looking for a good commit/merge strategy. Any advice? Thanks and best, Peter -- View this message in context: http://lucene.472066.n3.nabble.com/Looking-for-a-good-commit-merge-strategy-tp3582294p3582294.html Sent from the Solr - User mailing list archive at Nabble.com.
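The consistent-id idea can be sketched like this (a toy illustration in Python; the field names and separator are arbitrary, and Solr's own SignatureUpdateProcessorFactory implements this kind of thing server-side):

```python
import hashlib

def signature_id(doc, fields=("title", "url")):
    """Build a deterministic doc id from selected fields, so re-indexing
    the same logical document overwrites the earlier copy instead of
    creating a duplicate. Field names here are illustrative."""
    key = "\x1f".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(key.encode("utf-8")).hexdigest()

# Two copies of the same article from different feeds get the same id,
# so the second add replaces the first in the index.
a = {"title": "Solr NRT", "url": "http://example.com/1", "source": "feedA"}
b = {"title": "Solr NRT", "url": "http://example.com/1", "source": "feedB"}
print(signature_id(a) == signature_id(b))  # prints True
```

The win over delete-after-the-fact is that the overwrite happens inside the normal add, so facet counts stay correct without an extra delete pass.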
Problem with result grouping
Hi, Maybe there is something I am missing here, but I have a field in my Solr index called categoryId. The field definition is as follows:

<field name="categoryId" type="string" indexed="true" stored="true" required="true" />

I am trying to group on this field and I get a result as follows:

<str name="groupValue">43201810</str>
<result name="doclist" numFound="72" start="0">

This is the query I am sending to Solr:

http://localhost:8080/solr/catalogue/select/?q=*.*%0D%0A&version=2.2&start=0&rows=1000&indent=on&group=true&group.field=categoryId

My understanding is that this means there are 72 documents in my index that have the value 43201810 for categoryId. Now, surprisingly, when I search my index specifically for categoryId:43201810, expecting to get 72 results, I instead get 124 results. This is the query sent:

http://localhost:8080/solr/catalogue/select/?q=categoryId%3A43201810&version=2.2&start=0&rows=10&indent=on

Is my understanding of result grouping correct? Is there something I am doing wrong? Any help will be much appreciated. I am using Solr 3.5. Thanks.
Re: Looking for a good commit/merge strategy
Peter: You may want to take a look at Solr 3.4 with RankingAlgorithm 1.3. It has NRT support that allows you to search in real time with updates. The performance is about 1 docs/sec with the MBArtists index (approx. 43 fields). The MBArtists index is the index of artists from musicbrainz.org used in the Solr 1.4 Enterprise Search Server book. Regarding data visibility, you can configure this as a parameter in solrconfig.xml as below:

<realtime visible="200" facet="false">true</realtime>

The visible attribute (200 here) is in ms and controls the max duration for which updated docs may not be visible in a search. The facet attribute can be true or false, depending on whether you need real-time faceting. Real-time faceting, depending on update load (for high update rates), can see performance problems as the field cache is invalidated, so turn it on only as needed. You can get more information about NRT with Solr 3.x and RankingAlgorithm 1.3 from here: http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x You can download Solr 3.4 with RankingAlgorithm 1.3 from here: http://solr-ra.tgels.org (there is also an early access Solr 3.5 with RankingAlgorithm 1.3 release available for download) Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org http://rankingalgorithm.tgels.org -- View this message in context: http://lucene.472066.n3.nabble.com/Looking-for-a-good-commit-merge-strategy-tp3582294p3582380.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Quick relevance question
There's actually a Solr JIRA about this: https://issues.apache.org/jira/browse/SOLR-2953 But it begs the question of why you want to do this. Are you sure this would actually provide a better experience for your users? The reason I ask is that you could put a lot of effort into making this happen and discover that it was a waste. IOW, what's the use case? Best, Erick On Sun, Dec 11, 2011 at 5:57 PM, Ryan Gehring rgehr...@linkedin.com wrote: Hello! SOLR newbie here. I managed to pick a use case that is a little goofy for my first SOLR voyage and would love help. I'd like to rank documents by the total matching term count rather than the usual cosine similarity stuff. I have a field named all which has all my searchable fields copyfielded to it. It seems like I need to use SOLR 4.0 and _val_:sum(termfreq(all,'term1'), termfreq(all,'term2'), …) for every query term. Is there a better way to do this? Thanks! Ryan Gehring
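If you do go the function-query route, the query string can at least be generated mechanically rather than hand-written per term. A small sketch (this assumes the Solr 4.x termfreq() function; the field name "all" is just the example from the question):

```python
def termcount_query(field, terms):
    """Build a Solr function query that ranks purely by the total
    matching term frequency, summing termfreq() over the query terms."""
    parts = ",".join("termfreq(%s,'%s')" % (field, t) for t in terms)
    return '_val_:"sum(%s)"' % parts

q = termcount_query("all", ["term1", "term2"])
print(q)  # prints _val_:"sum(termfreq(all,'term1'),termfreq(all,'term2'))"
```

Your client would tokenize the user's input and pass the tokens in as `terms`; the generated string goes into the q parameter.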
Re: Highlighter highlighting terms which are not part of the search
Well, we need some more details to even guess. Please review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Mon, Dec 12, 2011 at 12:04 AM, Shyam Bhaskaran shyam.bhaska...@synopsys.com wrote: Hi, We recently upgraded our Solr to the latest 4.0 trunk and we are seeing weird behavior with highlighting which was not seen earlier. When a search query, for example generate test pattern, is passed, the first few results in the result set obtained show the highlighting properly, but in the later results we see terms which were not part of the search, like Question, Answer, used, etc., being highlighted. We use the regular and term-vector highlighters and never faced this kind of scenario; edismax is used in our configuration. Can someone point to what is causing this problem and where I need to look to fix it? -Shyam
Re: Ask about the question of solr cache
Are you sure you commit after you're done? If you change the index, this should all be automatic. Although that doesn't make a lot of sense if you restart Solr, because the changes would probably be lost then. But I'm a bit confused about what you mean by caches not being updated. Do you mean search results? Some details about how you're verifying that the results aren't available would be helpful. This should all be automatic, so my first guess would be that you're not doing what you think you are <g>. Best, Erick On Mon, Dec 12, 2011 at 6:21 AM, JiaoyanChen chen00...@gmail.com wrote: When I delete or add data through SolrJ, or import an index through the command nutch solrindex, the Solr caches are not changed unless I restart Solr. Could anyone tell me how I can update the Solr cache, without restarting, using a shell command? When I recreate the index with Nutch, I need to update the data in Solr. I use java -jar start.jar to run Solr. Thanks!
Re: manipulate the results coming back from SOLR? (was: possible to do arithmetic on returned values?)
Erik Hatcher wrote you a comment assuming you were using Velocity. The more generic form of that comment is that this is an app-level issue by and large. Solr is in charge of searching and returning data; the app is a better place to change that into something pretty. Best, Erick On Mon, Dec 12, 2011 at 9:37 AM, Gabriel Cooper gabriel.coo...@jtv.com wrote: I'm hoping I just got lost in the shuffle due to posting on a Friday night. Is there a way to change a field's data via some function, e.g. add, subtract, product, etc.? On 12/9/11 4:17 PM, Gabriel Cooper wrote: Is there a way to manipulate the results coming back from SOLR? I have a SOLR 3.5 index that contains values in cents (e.g. 100 in the index represents $1.00), and in certain contexts (e.g. CSV export) I'd like to divide by 100 for that field to provide a user-friendly in-dollars number. To do this I played around with Function Queries for a while without realizing they're limited to relevancy scores, and later found DocTransformers in 4.0, whose description sounded right but which don't exist in 3.5. Is there anything else I haven't considered? Thanks for any help, Gabriel Cooper.
MLT as a nested query
Hi, is it possible to use MLT as a nested query? I tried the following:

select?q=field1:foo field2:bar AND _query_:"{!mlt fl=mltField mindf=1 mintf=1 mlt.match.include=false}selectField:baz"

but it fails with an error: Unknown query type 'mlt'. I guess I should have an MLT parser enabled in solrconfig.xml, but I was not able to find an implementation. Does anybody have any suggestions? Vyacheslav
social/collaboration features on top of solr
Has anyone implemented some social/collaboration features on top of SOLR? What I am thinking of is the ability to add ratings and comments to documents in SOLR and then be able to fetch the comments and ratings for each document in the results (and have them as part of the response from SOLR), similar in fashion to MLT results. I think a separate index or a separate core to store collaboration info would be needed, as well as a search component for fetching collaboration info for results. I would think this would be a great feature and am wondering if anyone has done something similar. Bob
RE: social/collaboration features on top of solr
VuFind (http://vufind.org) uses Solr for library catalog (or similar) applications and features a MySQL database which it uses for storing user tags and comments outside of Solr itself. If there were a mechanism more closely tied to Solr for achieving this sort of effect, that would allow VuFind to do things with considerably more elegance! - Demian -Original Message- From: Robert Stewart [mailto:bstewart...@gmail.com] Sent: Tuesday, December 13, 2011 10:28 AM To: solr-user@lucene.apache.org Subject: social/collaboration features on top of solr Has anyone implemented some social/collaboration features on top of SOLR? What I am thinking is ability to add ratings and comments to documents in SOLR and then be able to fetch comments and ratings for each document in results (and have as part of response from SOLR), similar in fashion to MLT results. I think a separate index or separate core to store collaboration info would be needed, as well as a search component for fetching collaboration info for results. I would think this would be a great feature and wondering if anyone has done something similar. Bob
Re: Looking for a good commit/merge strategy
@ project: Thanks for the hints, I will take a look! @ Nagendra: Solr-RA seems very interesting! I take it that you can use it with an existing index? -- View this message in context: http://lucene.472066.n3.nabble.com/Looking-for-a-good-commit-merge-strategy-tp3582294p3582626.html Sent from the Solr - User mailing list archive at Nabble.com.
edismax phrase matching with a non-word char inbetween
I have a field which is indexed and queried as follows:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="text-synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

When searching for "street work" (with quotes), I'm getting matches and highlighting on things like:

...Oxford <em>Street</em> (<em>Work</em> Experience)...

Why is this happening, and what can I do to stop it? I've set <int name="qs">0</int> in my config to try to avert this sort of behaviour. Am I correct in thinking that this is used to ensure there are no words in between the phrase words?
Re: Looking for a good commit/merge strategy
Yes, no changes to your existing index. No commit needed. You may want to change your autocommit interval to about 15 mins ... Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org http://rankingalgorithm.tgels.org On 12/13/2011 7:32 AM, peter_solr wrote: @ project: Thanks for the hints, I will take a look! @ Nagendra: Solr-RA seems very interesting! I take it that you can use it with an existing index? -- View this message in context: http://lucene.472066.n3.nabble.com/Looking-for-a-good-commit-merge-strategy-tp3582294p3582626.html Sent from the Solr - User mailing list archive at Nabble.com.
Matching all documents in the index
Hi, I have come across this query in the admin interface: *.* Is this meant to match all documents in my index? Currently when I run a query with q=*.*, numFound is 130310, but the actual number of documents in my index is 603308. When I then run the query with q=*, numFound is 603308, which is the total number of documents in my index. So what is the difference between a query with q=*.* and q=*? I ran into this problem because I have a particular scenario where in my index I have a field called categoryId which I am grouping on, and another field called orgId which I then filter on. So I do grouping on categoryId but over all documents in the index matching the filter query field. I use q=*.* but this doesn't give me the true picture, as highlighted above. So I use q=* and this works fine, but it takes about 2900ms to execute. Is this efficient? Is there a better way to do something like this? Solr version = 3.5. Thanks.
CRUD on solr Index while replicating between master/slave
Hi, When replication is happening from master to slave, what operations can we do on the master, and what operations are possible on the slave? I know it is not advisable to do DML on the slave index, but I wanted to know this anyway. Also, I understand that doing DML on a slave will make the slave index incompatible with the master.

Master:
  Search -- Yes/No
  Update/insert/delete docs -- Yes/No
Slave:
  Search -- Yes/No
  Update/insert/delete docs -- Yes/No

Please share any other caveats that you have discovered regarding the above scenario that might be helpful. Thanks -=-
Re: Matching all documents in the index
try *:* instead of *.* simon On Tue, Dec 13, 2011 at 5:03 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I have come across this query in the admin interface: *.* Is this meant to match all documents in my index? Currently when I run a query with q=*.*, numFound is 130310, but the actual number of documents in my index is 603308. When I then run the query with q=*, numFound is 603308, which is the total number of documents in my index. So what is the difference between a query with q=*.* and q=*? I ran into this problem because I have a particular scenario where in my index I have a field called categoryId which I am grouping on, and another field called orgId which I then filter on. So I do grouping on categoryId but over all documents in the index matching the filter query field. I use q=*.* but this doesn't give me the true picture, as highlighted above. So I use q=* and this works fine, but it takes about 2900ms to execute. Is this efficient? Is there a better way to do something like this? Solr version = 3.5. Thanks.
Best way to convert a field in a query to a fq?
I want to modify incoming queries such that a field is always transformed into a filter query. For example, I want to convert a query field like q= ... part_page=3 ... to a filter query like q= ... fq=partpage(3). Is the right way to do this in a custom component, or is there somewhere else this should be handled? We have several clients and would like to protect the server from this field being queried on even if they make a mistake. Thank you. -- Andrew Lundgren lundg...@familysearch.org
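To make the rewrite concrete, here is a toy sketch of what a custom component's prepare() step might do, written in pure Python (the field name and field:value syntax are illustrative only; a real Solr SearchComponent would manipulate SolrParams in Java):

```python
def to_filter_query(params, field="part_page"):
    """Move any `field:value` clause out of q and into fq, so the
    restriction is applied as a cached filter query regardless of
    what the client sent. `params` is a plain dict standing in for
    the request parameters."""
    fqs = list(params.get("fq", []))
    kept = []
    for clause in params.get("q", "").split():
        if clause.startswith(field + ":"):
            fqs.append(clause)  # promote to a filter query
        else:
            kept.append(clause)
    params["q"] = " ".join(kept) or "*:*"
    params["fq"] = fqs
    return params

p = to_filter_query({"q": "wood stove part_page:3"})
print(p["q"], p["fq"])  # prints: wood stove ['part_page:3']
```

Doing this server-side (in a component that runs before the query parser) is indeed the usual way to enforce it even when a client misbehaves, since clients can't bypass it.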
Re: Matching all documents in the index
Hi Simon, Thanks for this. Query time dramatically reduced to 27ms with this. Many thanks. On Tue, Dec 13, 2011 at 4:20 PM, Simon Willnauer simon.willna...@googlemail.com wrote: try *:* instead of *.* simon On Tue, Dec 13, 2011 at 5:03 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I have come across this query in the admin interface: *.* Is this meant to match all documents in my index? Currently when I run a query with q=*.*, numFound is 130310, but the actual number of documents in my index is 603308. When I then run the query with q=*, numFound is 603308, which is the total number of documents in my index. So what is the difference between a query with q=*.* and q=*? I ran into this problem because I have a particular scenario where in my index I have a field called categoryId which I am grouping on, and another field called orgId which I then filter on. So I do grouping on categoryId but over all documents in the index matching the filter query field. I use q=*.* but this doesn't give me the true picture, as highlighted above. So I use q=* and this works fine, but it takes about 2900ms to execute. Is this efficient? Is there a better way to do something like this? Solr version = 3.5. Thanks.
Re: Create 2 index with solr
Thank you Dimitry Kan -- View this message in context: http://lucene.472066.n3.nabble.com/Create-2-index-with-solr-tp2730485p3581568.html Sent from the Solr - User mailing list archive at Nabble.com.
reposting highlighting questions
I am new to Solr/XML/XSLT, and am trying to figure out how to display search query fields highlighted in HTML. I can enable highlighting in the query, and I think I get the correct XML response back (see below: I search using 'Contents' and the highlighting is shown with <strong> and </strong>). However, I cannot figure out what to add to the XSLT file to transform it into HTML. I think it is a question of defining the appropriate XPath(?), but I am stuck. Can someone point me in the right direction? Thanks in advance! Here is the result I get back:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">20</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.simple.pre">'&lt;strong&gt;'</str>
      <str name="hl.fl">*</str>
      <str name="wt"/>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="fl"/>
      <str name="start">0</str>
      <str name="q">contents</str>
      <str name="hl.simple.post">'&lt;/strong&gt;'</str>
      <str name="qt"/>
      <str name="fq"/>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="content">
        <str>Start with the Table of Contents. See if you can find the topic that you are interested in. Look through the section to see if there is a resource that can help you. If you find one, you may want to attach a Post-it tab so you can find the page later. Write down all of the information that you need to find out more information about the resource: agency name, name of contact person, telephone number, email and website addresses. If you were unable to find a resource that will help you in this resource guide, a good first step would be to call your local Independent Living Center. They will have a good idea of what is available in your area. A second step would be to call or email us at the Rehabilitation Research Center. We have a ROBOT resource specialist who may be able to assist.</str>
      </arr>
      <arr name="doclink">
        <str>robot.pdf#page=11</str>
      </arr>
      <str name="heading1">CHAPTER 1: How to Use This Resource Guide</str>
      <str name="id">1-1</str>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="1-1">
      <arr name="content">
        <str>Start with the Table of '&lt;strong&gt;'Contents'&lt;/strong&gt;'. See if you can find the topic that you are interested in. Look</str>
      </arr>
    </lst>
  </lst>
</response>
Combination of edgengram and ngram
I am interested in a new filter type, one that would combine edgengram and ngram. The idea is that it would create all ngrams specified by the min/max size, but the ngrams that happen to be edgengrams (specifically the left side) would get an index-time boost. Optionally the boost would be higher if it came from the first token. The use case: An automatic autosuggest dropdown that populates as a user types into a search box. The index would have one field and it would be built from a manually produced list of suggested search phrases. The boosts mentioned would make it so that matches from the beginning of a word, and especially from the beginning of the entire suggested phrase, would be returned first. I could get a similar effect by using a copyfield, analyzing one field with ngrams and the other with edgengrams, then using edismax to put a boost on the edge version. I will start with this method, but using copyfield makes the index bigger, and using dismax makes the ultimate parsed queries more complicated. If I can avoid the copyfield, the index will be smaller and the queries very simple, which should make for very high speed. I will take a look at the source code, but I'm a bit of a Java novice. Does anyone have the knowledge, desire, and time to crank this one out quickly? Is it possible someone has already written such a filter? Thanks, Shawn
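To make the proposal concrete, here is a rough Python sketch of the token emission such a combined filter might perform. The parameters, boost values, and the (ngram, boost) output shape are all illustrative, since no such Lucene filter exists yet:

```python
def ngrams_with_boost(token, min_n=2, max_n=4, edge_boost=2.0,
                      first_token=False, first_boost=3.0):
    """Emit (ngram, boost) pairs for one token: every ngram in the size
    range gets boost 1.0, except ngrams anchored at the left edge, which
    get edge_boost (or first_boost when the token is the first in the
    suggested phrase)."""
    out = []
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            boost = 1.0
            if i == 0:  # this ngram is also an edge-ngram
                boost = first_boost if first_token else edge_boost
            out.append((token[i:i + n], boost))
    return out

print(ngrams_with_boost("shed", 2, 3, first_token=True))
# prints [('sh', 3.0), ('he', 1.0), ('ed', 1.0), ('she', 3.0), ('hed', 1.0)]
```

In a real TokenFilter the boost would be applied as a payload or via token positions rather than returned pairs, but the enumeration logic would be the same.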
Re: MySQL data import
On 12/11/2011 1:54 PM, Brian Lamb wrote: By nature of my schema, I have several multivalued fields. Each one I populate with a separate entity. Is there a better way to do it? For example, could I pull in all the singular data in one sitting and then come back in later and populate with the multivalued items. An alternate approach in some cases would be to do a GROUP_CONCAT and then populate the multivalued column with some transformation. Is that possible? Lastly, is it possible to use copyField to copy three regular fields into one multiValued field and have all the data show up? The best way to proceed may depend on whether you actually need the field to be multivalued (returning an array in search results), or if you simply need to be able to search on all the values. For me, it's the latter - the field isn't stored. I use the GROUP_CONCAT method (hidden in a database view, so Solr doesn't need to know about it) to put multiple values into a field, separated by semicolons. I then use the following single-valued fieldType to split those up and make all the values searchable. The tokenizer splits by semicolons followed by zero or more spaces, the pattern filter strips leading and trailing punctuation from each token. The ICU filter is basically a better implementation of the ascii folding filter and the lowercase filter, in a single pass. 
The others are fairly self-explanatory:

<!-- lowercases, tokenizes by semicolons -->
<fieldType name="lcsemi" class="solr.TextField" sortMissingLast="true" positionIncrementGap="0" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="; *"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2" allowempty="false"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

If you actually do need the field to be multivalued, then you'll need to do dataimport transformation as mentioned by Gora, who also replied. Thanks, Shawn
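For anyone who wants to sanity-check what a chain like that does to a GROUP_CONCAT'ed value, here is a rough Python approximation (not byte-for-byte identical to the Lucene filters; ICU folding is reduced to a plain lowercase, and \p{Punct} is approximated with non-word characters):

```python
import re

def analyze(value):
    """Roughly mimic the lcsemi chain: split on semicolons plus optional
    spaces, strip leading/trailing punctuation from each token, lowercase,
    and drop duplicate tokens."""
    out, seen = [], set()
    for token in re.split(r"; *", value):
        token = re.sub(r"^[^\w]*(.*?)[^\w]*$", r"\1", token).lower().strip()
        if token and token not in seen:
            seen.add(token)
            out.append(token)
    return out

print(analyze("Smith, J.; (Jones, A.); Smith, J."))
# prints ['smith, j', 'jones, a']
```

Running a few real database values through something like this is a cheap way to confirm the tokenizer pattern and punctuation-stripping regex behave as intended before reloading the core.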
RE: Virtual Memory very high
Thanks Yurykats. Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -Original Message- From: Dmitry Kan [mailto:dmitry@gmail.com] Sent: 13 December 2011 11:17 To: solr-user@lucene.apache.org Subject: Re: Virtual Memory very high If you allow me to chime in, is there a way to check for which DirectoryFactory is in use, if ${solr.directoryFactory:solr.StandardDirectoryFactory} has been configured? Dmitry 2011/12/12 Yury Kats yuryk...@yahoo.com On 12/11/2011 4:57 AM, Rohit wrote: What are the difference in the different DirectoryFactory? http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html
How to get SolrServer
Hi, I am new to Solr and want to do some custom development. I have wrapped Solr into my own web application and want to write a servlet to index a file system. The question is: how can I get a SolrServer instance inside my servlet? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-tp3583304p3583304.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Matching all documents in the index
: Thanks for this. Query time dramatically reduced to 27ms with this. to understand what is going on, use debugQuery=true with each of those examples and look at the query toString info. *:* is the one and only true syntax (in any solr QParser that i know of) for finding all docs efficiently. q=* and q=*.* are (if i'm not mistaken) a prefix query and a wildcard query (respectively) against the default search field (* is a prefix query matching all terms that start with the empty string -- so all terms -- which means it should match any doc that has any term in the default search field; in many indexes that will be all docs, but it is a very inefficient and easily fallible way to try to match all docs). -Hoss
RE: Problem with result grouping
There's another discussion going on about this on the solr-user mailing list, but I think you're using *.* instead of *:* to match all documents. *.* ends up doing a search against the default field, whereas *:* means match all documents. Cody -Original Message- From: Kissue Kissue [mailto:kissue...@gmail.com] Sent: Tuesday, December 13, 2011 6:06 AM To: solr-user@lucene.apache.org Subject: Problem with result grouping Hi, Maybe there is something I am missing here, but I have a field in my Solr index called categoryId. The field definition is as follows:

<field name="categoryId" type="string" indexed="true" stored="true" required="true" />

I am trying to group on this field and I get a result as follows:

<str name="groupValue">43201810</str>
<result name="doclist" numFound="72" start="0">

This is the query I am sending to Solr:

http://localhost:8080/solr/catalogue/select/?q=*.*%0D%0A&version=2.2&start=0&rows=1000&indent=on&group=true&group.field=categoryId

My understanding is that this means there are 72 documents in my index that have the value 43201810 for categoryId. Now, surprisingly, when I search my index specifically for categoryId:43201810, expecting to get 72 results, I instead get 124 results. This is the query sent:

http://localhost:8080/solr/catalogue/select/?q=categoryId%3A43201810&version=2.2&start=0&rows=10&indent=on

Is my understanding of result grouping correct? Is there something I am doing wrong? Any help will be much appreciated. I am using Solr 3.5. Thanks.
Re: highlighting questions
How to get *what* to display in HTML? The VelocityResponseWriter? Extracting this content to show in your webapp? How are you displaying any page at all? You can look at the examples in the VelocityResponseWriter to get an idea of how to do this with that templating engine. But the general idea here is that whatever parses your response has to match the id field in the doc tag with the proper element from the <lst name="highlighting"> element and mix-n-match them. Hope that helps Erick On Mon, Dec 12, 2011 at 8:03 PM, Bent Jensen bentjen...@yahoo.com wrote: I am trying to figure out how to display search query fields highlighted in HTML. I can enable the highlighting in the query, and I think I get the correct response back (see below: I search using 'Contents' and the highlighting is wrapped in <strong> and </strong>). However, I can't figure out what to add to the xslt file to display it in HTML. I think it is a question of defining the appropriate xpath(?), but I am stuck. Can someone point me in the right direction? Thanks in advance! Here is the result I get back:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">20</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.simple.pre">&lt;strong&gt;</str>
      <str name="hl.fl">*</str>
      <str name="wt"/>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="fl"/>
      <str name="start">0</str>
      <str name="q">contents</str>
      <str name="hl.simple.post">&lt;/strong&gt;</str>
      <str name="qt"/>
      <str name="fq"/>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="content">
        <str>Start with the Table of Contents. See if you can find the topic that you are interested in. Look through the section to see if there is a resource that can help you. If you find one, you may want to attach a Post-it tab so you can find the page later. Write down all of the information that you need to find out more information about the resource: agency name, name of contact person, telephone number, email and website addresses. If you were unable to find a resource that will help you in this resource guide, a good first step would be to call your local Independent Living Center. They will have a good idea of what is available in your area. A second step would be to call or email us at the Rehabilitation Research Center. We have a ROBOT resource specialist who may be able to assist. You can reach Lois Roberts, the “Back On Track …To Success” Mentoring Program Assistant, at 408-793-6426 or email her at lois.robe...@hhs.sccgov.org</str>
      </arr>
      <arr name="doclink">
        <str>robot.pdf#page=11</str>
      </arr>
      <str name="heading1">CHAPTER 1: How to Use This Resource Guide</str>
      <str name="id">1-1</str>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="1-1">
      <arr name="content">
        <str>Start with the Table of &lt;strong&gt;Contents&lt;/strong&gt;. See if you can find the topic that you are interested in. Look</str>
      </arr>
    </lst>
  </lst>
</response>
solr ignore duplicate documents
People, I am asking for your help with solr. When a document is sent to solr and such a document already exists in its index (by its ID), the new doc replaces the old one. But I don't want to automatically replace documents; just ignore them and proceed to the next. How can I configure solr to do so? Of course I can query solr to check if it has the document already, but that's bad for me since I do bulk updates and this would complicate the process and increase the number of requests. So is there any way to configure solr to ignore duplicates? Just ignore. I don't need any specific responses or actions. Best Regards Alexander Aristov
Re: How to get SolrServer within my own servlet
Anybody could help? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583368.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to get SolrServer within my own servlet
Have a look here first; you will probably end up using EmbeddedSolrServer. http://wiki.apache.org/solr/Solrj Patrick On 13 Dec 2011, at 20:38, Joey vanjo...@gmail.com wrote: Anybody could help? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583368.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to get SolrServer within my own servlet
Thanks Patrick for the reply. What I did was un-jar solr.war and created my own web application. Now I want to write my own servlet to index all files inside a folder. I suppose there is already solrserver instance initialized when my web app started. How can I access that solr server instance in my servlet? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583416.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best way to convert a field in a query to a fq?
Hi, We've done similar query rewriting in a custom SearchComponent that runs before QueryComponent. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Andrew Lundgren lundg...@familysearch.org To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, December 13, 2011 11:58 AM Subject: Best way to convert a field in a query to a fq? I want to modify incoming queries such that a field is always transformed to a filter query. For example, I want to convert a query field like q= ... part_page=3 ... to a filter query like q= ... fq=partpage(3) . Is the right way to do this in a custom component, or is there someplace else where this should be handled? We have several clients and would like to protect the server from this field being queried on even if they make a mistake. Thank you. -- Andrew Lundgren lundg...@familysearch.org NOTICE: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
Re: CRUD on solr Index while replicating between master/slave
Hi, Master: Update/insert/delete docs -- Yes Slaves: Search -- Yes Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Tarun Jain tjai...@yahoo.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, December 13, 2011 11:15 AM Subject: CRUD on solr Index while replicating between master/slave Hi, When replication is happening from master to slave, what operations can we do on the master, and what operations are possible on the slave? I know it is not advisable to do DML on the slave index, but I wanted to know this anyway. Also, I understand that doing DML on a slave will make the slave index incompatible with the master. Master: Search -- Yes/No? Update/insert/delete docs -- Yes/No? Slave: Search -- Yes/No? Update/insert/delete docs -- Yes/No? Please share any other caveats that you have discovered regarding the above scenario that might be helpful. Thanks -=-
Re: How to get SolrServer within my own servlet
Hey Joey, You should first configure your deployed Solr instance by adding/changing the schema.xml and solrconfig.xml. After that you can use SolrJ to connect to that Solr instance and add documents to it. On the link I posted earlier, you'll find a couple of examples on how to do that. - Patrick Sent from my iPhone On 13 Dec 2011, at 20:53, Joey vanjo...@gmail.com wrote: Thanks Patrick for the reply. What I did was un-jar solr.war and created my own web application. Now I want to write my own servlet to index all files inside a folder. I suppose there is already a solrserver instance initialized when my web app started. How can I access that solr server instance in my servlet? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583416.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Best way to convert a field in a query to a fq?
Thanks for the confirmation! -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, December 13, 2011 1:02 PM To: solr-user@lucene.apache.org Subject: Re: Best way to convert a field in a query to a fq? Hi, We've done similar query rewriting in a custom SearchComponent that runs before QueryComponent. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Andrew Lundgren lundg...@familysearch.org To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, December 13, 2011 11:58 AM Subject: Best way to convert a field in a query to a fq? I want to modify incoming queries such that a field is always transformed to a filter query. For example, I want to convert a query field like q= ... part_page=3 ... to a filter query like q= ... fq=partpage(3) . Is the right way to do this in a custom component, or is there someplace else where this should be handled? We have several clients and would like to protect the server from this field being queried on even if they make a mistake. Thank you. -- Andrew Lundgren lundg...@familysearch.org
Re: How to get SolrServer within my own servlet
The first drawback of SolrJ is using XML serialization for in-process communication. I guess you can start from the SolrDispatchFilter source, and get something for your servlet from there. Regards On Tue, Dec 13, 2011 at 11:45 PM, Patrick Plaatje pplaa...@gmail.com wrote: Have a look here first; you will probably end up using EmbeddedSolrServer. http://wiki.apache.org/solr/Solrj Patrick On 13 Dec 2011, at 20:38, Joey vanjo...@gmail.com wrote: Anybody could help? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583368.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Developer Grid Dynamics tel. 1-415-738-8644 Skype: mkhludnev http://www.griddynamics.com mkhlud...@griddynamics.com
Re: solr ignore duplicate documents
Man, Does overwrite=false work for you? http://wiki.apache.org/solr/UpdateXmlMessages#add.2BAC8-replace_documents Regards On Tue, Dec 13, 2011 at 11:34 PM, Alexander Aristov alexander.aris...@gmail.com wrote: People, I am asking for your help with solr. When a document is sent to solr and such document already exists in its index (by its ID) then the new doc replaces the old one. But I don't want to automatically replace documents. Just ignore and proceed to the next. How can I configure solr to do so? Of course I can query solr to check if it has the document already but it's bad for me since I do bulk updates and this will complicate the process and increase amount of request. So are there any ways to configure solr to ignore duplicates? Just ignore. I don't need any specific responses or actions. Best Regards Alexander Aristov -- Sincerely yours Mikhail Khludnev Developer Grid Dynamics tel. 1-415-738-8644 Skype: mkhludnev http://www.griddynamics.com mkhlud...@griddynamics.com
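For reference, the overwrite flag from the wiki page above goes on the <add> element of the raw XML update message. A sketch that builds such a message (a caveat worth hedging: with overwrite="false" Solr skips the uniqueKey replacement check, so depending on version duplicates may simply be indexed twice rather than silently ignored; field values here are assumed to need no XML escaping):

```java
public class NoOverwriteUpdate {
    // Build an <add> message with overwrite="false", per the UpdateXmlMessages
    // wiki page cited above, so existing documents with the same uniqueKey
    // are not replaced. Values are assumed to be already XML-safe.
    static String addXml(String id, String name) {
        return "<add overwrite=\"false\">"
             + "<doc>"
             + "<field name=\"id\">" + id + "</field>"
             + "<field name=\"name\">" + name + "</field>"
             + "</doc>"
             + "</add>";
    }

    public static void main(String[] args) {
        System.out.println(addXml("id1", "doc1"));
    }
}
```

The resulting string would be POSTed to the /update handler like any other add message.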
Re: Too many connections in CLOSE_WAIT state on master solr server
You can try to reuse your connections (prevent them from closing) by specifying -Dhttp.maxConnections=N (see http://download.oracle.com/javase/1.4.2/docs/guide/net/properties.html) in the JVM startup params -- at the client JVM! The number should be chosen considering the number of connections you'd like to keep alive. Let me know if it works for you. On Tue, Dec 13, 2011 at 2:57 PM, samarth s samarth.s.seksa...@gmail.com wrote: Hi, I am using solr replication and am experiencing a lot of connections in the state CLOSE_WAIT at the master solr server. These disappear after a while, but till then the master solr stops responding. There are about 130 open connections on the master server with the client as the slave m/c, and all are in the state CLOSE_WAIT. Also, the client port shown in the master solr server netstat results is not visible in the netstat results on the client (slave solr) m/c. Following is my environment: - 40 cores in the master solr on m/c 1 - 40 cores in the slave solr on m/c 2 - The replication poll interval is 20 seconds. - Replication part of solrconfig.xml in the slave solr:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- fully qualified url for the replication handler of master -->
    <str name="masterUrl">$mastercorename/replication</str>
    <!-- Interval in which the slave should poll master. Format is HH:mm:ss.
         If this is absent the slave does not poll automatically, but a
         fetchindex can be triggered from the admin or the http API -->
    <str name="pollInterval">00:00:20</str>
    <!-- The following values are used when the slave connects to the master
         to download the index files. Default values are implicitly set as
         5000ms and 1ms respectively. The user DOES NOT need to specify these
         unless the bandwidth is extremely low or there is extremely high
         latency -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
  </lst>
</requestHandler>

Thanks for any pointers. -- Regards, Samarth -- Sincerely yours Mikhail Khludnev Developer Grid Dynamics tel. 1-415-738-8644 Skype: mkhludnev http://www.griddynamics.com mkhlud...@griddynamics.com
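Mikhail's suggestion amounts to raising the JDK's per-destination keep-alive pool on the slave (client) JVM. A minimal sketch; setting the property programmatically only works before the first HTTP connection is made, and the value 40 (one connection per replicating core) is an illustrative assumption, not a recommendation:

```java
public class KeepAliveConfig {
    public static void main(String[] args) {
        // http.maxConnections caps the JDK's per-destination keep-alive pool.
        // Equivalent to -Dhttp.maxConnections=40 on the client JVM command
        // line; it must take effect before any HTTP connection is opened.
        System.setProperty("http.maxConnections", "40");
        // Keep-alive is on by default; shown here only for clarity.
        System.setProperty("http.keepAlive", "true");
        System.out.println(System.getProperty("http.maxConnections"));
    }
}
```

Passing the flag on the command line is the safer route, since it is guaranteed to be set before any networking code runs.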
Re: How to get SolrServer
Joey: I'm not sure what you mean by wrapping solr into your own web application. There is a way to embed Solr into your application (same JVM), but I've never used that. If you're talking about your servlet running in one JVM and Solr in another, then use the SolrJ client library to interact with Solr. I use CommonsHttpSolrServer (http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.html) and specify the URL that locates the Solr server/core name. I use Spring to instantiate the server instance, and then I inject it where I need it.

<bean id="solrServerIngContent" class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
  <constructor-arg value="http://localhost:8091/solr/mycorename"/>
</bean>

This is equivalent to new CommonsHttpSolrServer("http://localhost:8091/solr/mycorename"); Check out the API link above and http://wiki.apache.org/solr/Solrj for examples on using the SolrJ API. Cheers, Jeff On Dec 13, 2011, at 12:12 PM, Joey wrote: Hi I am new to Solr and want to do some custom development. I have wrapped solr into my own web application, and want to write a servlet to index a file system. The question is how can I get a SolrServer inside my Servlet? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-tp3583304p3583304.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CRUD on solr Index while replicating between master/slave
Hi, Thanks. So just to clarify here again while replicating we cannot search on master index ? Tarun Jain -=- - Original Message - From: Otis Gospodnetic otis_gospodne...@yahoo.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Cc: Sent: Tuesday, December 13, 2011 3:03 PM Subject: Re: CRUD on solr Index while replicating between master/slave Hi, Master: Update/insert/delete docs -- Yes Slaves: Search -- Yes Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Tarun Jain tjai...@yahoo.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, December 13, 2011 11:15 AM Subject: CRUD on solr Index while replicating between master/slave Hi, When replication is happening between master to slave what operations can we do on the master what operations are possible on the slave? I know it is not adivisable to do DML on the slave index but I wanted to know this anyway. Also I understand that doing DML on a slave will make the slave index incompatible with the master. Master Search -- Yes/No Update/insert/delete docs -- Yes/No Slave = Search -- Yes/No Update/insert/delete docs -- Yes/No Please share any other caveats that you have discovered regarding the above scenario that might be helpful. Thanks -=-
Re: solr ignore duplicate documents
You're probably talking about a custom update handler here. That way you can do a document ID lookup: just see if the incoming document ID is in the index already and throw the document away if you find one. This should be very efficient, much more efficient than making a separate query for each one. There's no way that I know of to do this out of the box in Solr though. Best Erick On Tue, Dec 13, 2011 at 3:44 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Man, Does overwrite=false work for you? http://wiki.apache.org/solr/UpdateXmlMessages#add.2BAC8-replace_documents Regards On Tue, Dec 13, 2011 at 11:34 PM, Alexander Aristov alexander.aris...@gmail.com wrote: People, I am asking for your help with solr. When a document is sent to solr and such document already exists in its index (by its ID) then the new doc replaces the old one. But I don't want to automatically replace documents. Just ignore and proceed to the next. How can I configure solr to do so? Of course I can query solr to check if it has the document already but it's bad for me since I do bulk updates and this will complicate the process and increase amount of request. So are there any ways to configure solr to ignore duplicates? Just ignore. I don't need any specific responses or actions. Best Regards Alexander Aristov -- Sincerely yours Mikhail Khludnev Developer Grid Dynamics tel. 1-415-738-8644 Skype: mkhludnev http://www.griddynamics.com mkhlud...@griddynamics.com
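The lookup-and-drop approach Erick describes can be sketched in plain Java. Here an in-memory set stands in for the real index lookup a custom update handler would perform, and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SkipExistingIds {
    private final Set<String> existingIds;

    SkipExistingIds(Set<String> existingIds) {
        this.existingIds = existingIds;
    }

    // Return only the documents whose id is not already present; in a real
    // custom update handler the membership test would be a lookup against
    // the index's uniqueKey field rather than an in-memory set.
    List<Map<String, String>> filterNew(List<Map<String, String>> incoming) {
        List<Map<String, String>> fresh = new ArrayList<Map<String, String>>();
        for (Map<String, String> doc : incoming) {
            if (!existingIds.contains(doc.get("id"))) {
                fresh.add(doc);
            }
        }
        return fresh;
    }
}
```

Because the check happens inside the update path, a bulk add needs no extra round trips from the client.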
Re: Too many connections in CLOSE_WAIT state on master solr server
Replicating 40 cores every 20 seconds is just *asking* for trouble. How often do your cores change on the master? How big are they? Is there any chance you just have too many cores replicating at once? Best Erick On Tue, Dec 13, 2011 at 3:52 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: You can try to reuse your connections (prevent them from closing) by specifying -Dhttp.maxConnections=http://download.oracle.com/javase/1.4.2/docs/guide/net/properties.htmlN in jvm startup params. At client JVM!. Number should be chosen considering the number of connection you'd like to keep alive. Let me know if it works for you. On Tue, Dec 13, 2011 at 2:57 PM, samarth s samarth.s.seksa...@gmail.comwrote: Hi, I am using solr replication and am experiencing a lot of connections in the state CLOSE_WAIT at the master solr server. These disappear after a while, but till then the master solr stops responding. There are about 130 open connections on the master server with the client as the slave m/c and all are in the state CLOSE_WAIT. Also, the client port specified on the master solr server netstat results is not visible in the netstat results on the client (slave solr) m/c. Following is my environment: - 40 cores in the master solr on m/c 1 - 40 cores in the slave solr on m/c 2 - The replication poll interval is 20 seconds. - Replication part in solrconfig.xml in the slave solr: requestHandler name=/replication class=solr.ReplicationHandler lst name=slave !--fully qualified url for the replication handler of master-- str name=masterUrl$mastercorename/replication/str !--Interval in which the slave should poll master .Format is HH:mm:ss . If this is absent slave does not poll automatically. But a fetchindex can be triggered from the admin or the http API-- str name=pollInterval00:00:20/str !-- The following values are used when the slave connects to the master to download the index files. Default values implicitly set as 5000ms and 1ms respectively. 
The user DOES NOT need to specify these unless the bandwidth is extremely low or if there is an extremely high latency-- str name=httpConnTimeout5000/str str name=httpReadTimeout1/str /lst /requestHandler Thanks for any pointers. -- Regards, Samarth -- Sincerely yours Mikhail Khludnev Developer Grid Dynamics tel. 1-415-738-8644 Skype: mkhludnev http://www.griddynamics.com mkhlud...@griddynamics.com
Re: CRUD on solr Index while replicating between master/slave
No, you can search on the master when replicating, no problem. But why do you want to? The whole point of master/slave setups is to separate indexing from searching machines. Best Erick On Tue, Dec 13, 2011 at 4:10 PM, Tarun Jain tjai...@yahoo.com wrote: Hi, Thanks. So just to clarify here again while replicating we cannot search on master index ? Tarun Jain -=- - Original Message - From: Otis Gospodnetic otis_gospodne...@yahoo.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Cc: Sent: Tuesday, December 13, 2011 3:03 PM Subject: Re: CRUD on solr Index while replicating between master/slave Hi, Master: Update/insert/delete docs -- Yes Slaves: Search -- Yes Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Tarun Jain tjai...@yahoo.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, December 13, 2011 11:15 AM Subject: CRUD on solr Index while replicating between master/slave Hi, When replication is happening between master to slave what operations can we do on the master what operations are possible on the slave? I know it is not adivisable to do DML on the slave index but I wanted to know this anyway. Also I understand that doing DML on a slave will make the slave index incompatible with the master. Master Search -- Yes/No Update/insert/delete docs -- Yes/No Slave = Search -- Yes/No Update/insert/delete docs -- Yes/No Please share any other caveats that you have discovered regarding the above scenario that might be helpful. Thanks -=-
Migrate Lucene 2.9 To SOLR
Hi, I have an old project that uses Lucene 2.9. Is it possible to use the index created by Lucene in SOLR? May I just copy the index into the data directory of SOLR, or is there some mechanism to import a Lucene index? Thanks
Re: How to get SolrServer within my own servlet
Thank you guys for the reply. So what I want to do is to modify Solr a bit - add one servlet so I can trigger a full index of a folder in the file system. What I did: un-jar solr.war; create a web app and copy the un-jarred solr files to this app; create my servlet; repackage the web app to a war and deploy. Following the suggestions from you guys, I create an EmbeddedSolrServer in my servlet:

public void init() throws ServletException {
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = null;
    try {
        coreContainer = initializer.initialize();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    }
    _solrServer = new EmbeddedSolrServer(coreContainer, "");
}

And I can now trigger the index by calling http://localhost:8080/testservlet01. The servlet does this:

SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "id1", 1.0f);
doc1.addField("name", "doc1", 1.0f);
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);
try {
    _solrServer.add(docs);
    _solrServer.commit();
} catch (SolrServerException e) {
    e.printStackTrace();
}

However, it seems the search doesn't return the document unless I restart my application: localhost:8080/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on I guess there are two SolrServer instances (one is the EmbeddedSolrServer created by myself, and the other comes with Solr itself) and they are holding different indexes? How can I make them synchronized? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583741.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Ask about the question of solr cache
Is it possible that your Solr client (or the way you communicate with it) is aware of HTTP caching? If you are using a browser to confirm these updates and commits, try disabling HTTP caching. On Tue, Dec 13, 2011 at 3:24 PM, Erick Erickson erickerick...@gmail.com wrote: Are you sure you commit after you're done? If you change the index, this should all be automatic. Although that doesn't make a lot of sense if you restart Solr, because the changes would probably be lost then. But I'm a bit confused about what you mean by caches not being updated. Do you mean search results? Some details about how you're verifying that the results aren't available would be helpful. This should be all automatic, so my first guess would be that you're not doing what you think you are <G>. Best Erick On Mon, Dec 12, 2011 at 6:21 AM, JiaoyanChen chen00...@gmail.com wrote: When I have deleted or added data by application through solrj, or have imported an index through the command 'nutch solrindex', the caches of solr are not changed if I do not restart solr. Could anyone tell me how I can update the solr caches without restarting, using a shell command? When I recreate the index with nutch, I should update the data in solr. I use 'java -jar start.jar' to run solr. Thanks! -- Regards, Samuel García.
Re: Migrate Lucene 2.9 To SOLR
I am about to try exact same thing, running SOLR on top of Lucene indexes created by Lucene.Net 2.9.2. AFAIK, it should work. Not sure if indexes become non-backwards compatible once any new documents are written to them by SOLR though. Probably good to make a backup first. On Dec 13, 2011, at 4:34 PM, Anderson vasconcelos wrote: Hi I have a old project that use Lucene 2.9. Its possible to use the index created by lucene in SOLR? May i just copy de index to data directory of SOLR, or exists some mechanism to import Lucene index? Thanks
Re: Suggest component
I think I may have solved my problem. Not 100% certain what the solution was because I've been trying so many things, but in the end what I did was revisit this article and re-step my configuration. http://www.lucidimagination.com/blog/2011/04/08/solr-powered-isfdb-part-9/ I believe what the problem was, was the fact that I didn't create a firstSearcher to ensure that the dictionary was built on start/restart. Hopefully that helps anyone else who may run across this issue too. -- View this message in context: http://lucene.472066.n3.nabble.com/Suggest-component-tp2725438p3583831.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Virtual Memory very high
On 12/13/2011 6:16 AM, Dmitry Kan wrote: If you allow me to chime in, is there a way to check for which DirectoryFactory is in use, if ${solr.directoryFactory:solr.StandardDirectoryFactory} has been configured? I think you can get the currently used factory in a Luke response, if you hit your Solr server with a Luke request, eg http://localhost:8983/solr/admin/luke Dmitry 2011/12/12 Yury Kats yuryk...@yahoo.com On 12/11/2011 4:57 AM, Rohit wrote: What are the difference in the different DirectoryFactory? http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/MMapDirectory.html http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html
Re: Difference between field collapsing and result grouping
: Nope, they're the same. The original name was Field Collapsing, : but it was changed to Grouping later. Specifically: Field Collapsing is one type of use case for the more general concept of Result Grouping (other types of result grouping are group by query, group by function results, etc...) : But note that the functionality has changed over time, so you might : be seeing documents from different incarnations of the code. Right ... there was a widely used patch that had param names very specific to the idea of field collapsing, and the feature evolved heavily, hence the different param names. -Hoss
Re: Maximum File Size Handled by post.jar / Speed of Deletes?
: We would like to know is there a maximum size of an xml file that can be : posted to Solr using the post.jar, maximum number of docs, etc. at one time : as well as how fast deletes can be achieved. post.jar is provided purely as an extremely trivial tool for beginners to use to manually post arbitrary files to Solr while reading the tutorial and first getting started with Solr. It is not intended to be a production tool for use in production systems for doing automated and/or programmatic loading of data. (If curl existed ubiquitously on every computer in the world capable of running java, post.jar would never have been created.) If you want to have custom tools that talk to solr for loading/deleting docs, use whatever Solr client or HTTP library you are comfortable with in whatever programming language you are already using. If you are using Java, please look at the SolrJ client library. -Hoss
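For programmatic deletes via a client library, a common pattern is to batch IDs rather than send one request per document. A sketch of just the batching step (the batch size is an illustrative choice, not a Solr limit; with SolrJ each batch could then be passed to the server's deleteById call, with a single commit at the end):

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteBatcher {
    // Split ids into fixed-size batches so each delete request stays small.
    // Each resulting batch would be sent as one delete request.
    static List<List<String>> batches(List<String> ids, int size) {
        List<List<String>> out = new ArrayList<List<String>>();
        for (int i = 0; i < ids.size(); i += size) {
            out.add(new ArrayList<String>(ids.subList(i, Math.min(i + size, ids.size()))));
        }
        return out;
    }
}
```

Committing once after all batches, rather than per batch, keeps delete throughput reasonable.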
RE: MoreLikeThis questions
: I'm implementing a MoreLikeThis search. I have a couple of questions. : I'm implementing this with solrj so I would appreciate it if any code : snippets reflect that. : : First, I want to provide the text that Solr should check for : interesting words and do the search on. This means I don't want to : specify a document in the collection. I think the documentation implies You'll want to make sure you're using the MLT *Handler* (not the MLT Component) to specify text in that way. The key phrase you'll want to pay attention to is Content Stream. ContentStream is the general abstraction for streaming data to a Solr RequestHandler, either via raw HTTP POST, or HTTP multi-part mime, or by asking Solr to pull from a remote URL or local file... https://wiki.apache.org/solr/MoreLikeThisHandler#Using_ContentStreams https://wiki.apache.org/solr/ContentStream ...the ContentStreamUpdateRequest class is somewhat poorly named because there is nothing about it that requires you to use it for update requests (that's just the primary use of Content Streams); you should certainly be able to use it to stream content from your SolrJ client to the MLT Handler... https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/ContentStreamUpdateRequest.html -Hoss
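A hedged sketch of the simplest route for small texts: passing the free text through the stream.body parameter described on the ContentStream wiki page. The /mlt handler path and the field names are assumptions that depend on your solrconfig.xml:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class MltRequest {
    // Build a MoreLikeThisHandler request that passes free text as a content
    // stream via stream.body. The handler path (/mlt) and mlt.fl field are
    // illustrative; they must match your solrconfig.xml.
    static String mltUrl(String base, String text) throws UnsupportedEncodingException {
        return base + "/mlt?mlt.fl=content&mlt.interestingTerms=list"
             + "&stream.body=" + URLEncoder.encode(text, "UTF-8");
    }
}
```

For large texts, POSTing the body (e.g. via ContentStreamUpdateRequest pointed at the MLT handler, as Hoss suggests) avoids URL length limits.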
Re: Reducing heap space consumption for large dictionaries?
W dniu 2011-12-13 05:48, Chris Male pisze: Hi, Its good to hear some feedback on using the Hunspell dictionaries. Lucene's support is pretty new so we're obviously looking to improve it. Could you open a JIRA issue so we can explore whether there is some ways to reduce memory consumption? Done: https://issues.apache.org/jira/browse/SOLR-2968 -- Maciej Lisiewski
Re: Too many connections in CLOSE_WAIT state on master solr server
The updates to the master are user driven and need to be visible quickly; hence the high frequency of replication. It may be that too many replication requests are being handled at a time, but why should that result in half-closed connections?

On Wed, Dec 14, 2011 at 2:47 AM, Erick Erickson erickerick...@gmail.com wrote:

Replicating 40 cores every 20 seconds is just *asking* for trouble. How often do your cores change on the master? How big are they? Is there any chance you just have too many cores replicating at once?

Best
Erick

On Tue, Dec 13, 2011 at 3:52 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

You can try to reuse your connections (prevent them from closing) by specifying -Dhttp.maxConnections=N in the JVM startup params (see http://download.oracle.com/javase/1.4.2/docs/guide/net/properties.html). At the client JVM! The number should be chosen considering the number of connections you'd like to keep alive. Let me know if it works for you.

On Tue, Dec 13, 2011 at 2:57 PM, samarth s samarth.s.seksa...@gmail.com wrote:

Hi,

I am using solr replication and am experiencing a lot of connections in the state CLOSE_WAIT at the master solr server. These disappear after a while, but till then the master solr stops responding. There are about 130 open connections on the master server with the client as the slave m/c, and all are in the state CLOSE_WAIT. Also, the client port shown in the netstat results on the master solr server is not visible in the netstat results on the client (slave solr) m/c.

Following is my environment:
- 40 cores in the master solr on m/c 1
- 40 cores in the slave solr on m/c 2
- The replication poll interval is 20 seconds.
- Replication part in solrconfig.xml in the slave solr:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <!-- fully qualified url for the replication handler of master -->
        <str name="masterUrl">$mastercorename/replication</str>
        <!-- Interval in which the slave should poll master. Format is HH:mm:ss.
             If this is absent, the slave does not poll automatically, but a
             fetchindex can be triggered from the admin or the http API -->
        <str name="pollInterval">00:00:20</str>
        <!-- The following values are used when the slave connects to the master
             to download the index files. Default values implicitly set as 5000ms
             and 10000ms respectively. The user DOES NOT need to specify these
             unless the bandwidth is extremely low or if there is an extremely
             high latency -->
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">10000</str>
      </lst>
    </requestHandler>

Thanks for any pointers.

--
Regards,
Samarth

--
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
http://www.griddynamics.com
mkhlud...@griddynamics.com

--
Regards,
Samarth
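Mikhail's suggestion can also be applied programmatically on the slave (client) JVM. A minimal sketch, assuming Java's built-in HttpURLConnection keep-alive pooling; the pool size of 50 is an arbitrary example value, not a recommendation:

```java
// Sketch: raise the HttpURLConnection keep-alive pool size on the client
// (slave) JVM so frequent replication requests reuse sockets instead of
// repeatedly opening and half-closing new ones. This is equivalent to
// passing -Dhttp.maxConnections=50 on the JVM command line.
public class KeepAliveConfig {
    public static void main(String[] args) {
        // Must be set before the first HTTP connection is made.
        System.setProperty("http.maxConnections", "50");
        // Keep-alive is on by default; set explicitly for clarity.
        System.setProperty("http.keepAlive", "true");
        System.out.println(System.getProperty("http.maxConnections"));
    }
}
```

Note that these properties only affect connections opened through the JDK's own HTTP client; a slave using a separate HTTP library would need that library's own pooling configuration.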
Re: How to get SolrServer within my own servlet
Just curious: it sounds like you are trying to deploy your servlet with Solr support. Why don't you deploy your app as a separate servlet alongside the Solr war in the server, and then let your servlet send requests to Solr? This will make it much easier to maintain your app with Solr support.

Yunfei

On Tuesday, December 13, 2011, Joey vanjo...@gmail.com wrote:

Thank you guys for the reply. What I want to do is to modify Solr a bit: add one servlet so I can trigger a full index of a folder in the file system. What I did:
- un-jar solr.war;
- create a web app and copy the un-jarred Solr files into this app;
- create my servlet;
- repackage the web app as a war and deploy.

Following your suggestions, I create an EmbeddedSolrServer in my servlet:

    public void init() throws ServletException {
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer coreContainer = null;
        try {
            coreContainer = initializer.initialize();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        }
        _solrServer = new EmbeddedSolrServer(coreContainer, "");
    }

And I can now trigger the indexing by calling: http://localhost:8080/testservlet01.
The servlet does this:

    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("id", "id1", 1.0f);
    doc1.addField("name", "doc1", 1.0f);
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    docs.add(doc1);
    try {
        _solrServer.add(docs);
        _solrServer.commit();
    } catch (SolrServerException e) {
        e.printStackTrace();
    }

However, it seems the search doesn't return the new document unless I restart my application:

    http://localhost:8080/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

I guess there are two SolrServer instances (one is the EmbeddedSolrServer created by myself, and the other comes with Solr itself), and they are holding different indexes? How can I make them synchronized?

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3583741.html
Sent from the Solr - User mailing list archive at Nabble.com.
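A likely cause of the stale results is that the embedded CoreContainer and the deployed Solr webapp each open their own searcher over the index, and the webapp's searcher never sees commits made through the embedded server. This is why Yunfei's suggestion of going through HTTP helps: both the servlet and /select then share one core. A minimal sketch, where "http://localhost:8080/solr" is an assumed base URL, not taken from the thread:

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch (not the poster's code): send updates to the one Solr instance
// deployed in the container over HTTP, so the custom servlet and the
// /select handler operate on the same core and the same searcher.
public class TriggerCommit {
    static String commitUrl(String solrBase) {
        // commit=true makes newly added documents visible to searchers
        return solrBase + "/update?commit=true";
    }

    public static void main(String[] args) {
        String url = commitUrl("http://localhost:8080/solr");
        System.out.println(url);
        // With Solr running, a GET on this URL triggers the commit:
        // HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
        // System.out.println(c.getResponseCode()); // 200 on success
    }
}
```

In SolrJ of that era, the same thing is done with CommonsHttpSolrServer pointed at the webapp's URL instead of an EmbeddedSolrServer; either way, only one process should own the index directory.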