Re: Basic auth on SolrCloud /admin/* calls
Hi Tim,

Are you running Solr 4.2? (In 4.0 and 4.1, the Collections API didn't return any failure message; see the SOLR-4043 issue.) As far as I know, you can't tell Solr to use authentication credentials when communicating with other nodes. It's a bigger issue: for example, if you want to protect the /update requestHandler so unauthorized users can't delete your whole collection, it can interfere with the replication process. I think it's a necessary mechanism in a production environment... I'm curious how people use SolrCloud in production without it.

On Fri, Mar 29, 2013 at 3:42 AM, Vaillancourt, Tim tvaillanco...@ea.com wrote:

Hey guys,

I've recently set up basic auth under Jetty 8 for all my Solr 4.x '/admin/*' calls, in order to protect my Collections and Cores API. Although the security constraint is working as expected ('/admin/*' calls require Basic Auth or return 401), when I use the Collections API to create a collection, I receive a 200 OK for the Collections API CREATE call, but the background Cores API calls that are run on the Collections API's behalf fail Basic Auth on the other nodes with a 401 code, as I should have foreseen, but didn't.

Is there a way to tell SolrCloud to use authentication on internal Cores API calls that are spawned on the Collections API's behalf, or is this a new feature request?

To reproduce:
1. Implement basic auth on '/admin/*' URIs.
2. Perform a CREATE Collections API call to a node (which will return 200 OK).
3. Notice all Cores API calls fail (the collection isn't created). See the stack trace below from the node that was issued the CREATE call.
The stack trace I get is:

org.apache.solr.common.SolrException: Server at http://<HOST HERE>:8983/solr returned non ok status:401, message:Unauthorized
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:169)
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)

Cheers!

Tim
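The kind of Jetty security constraint Tim describes would look roughly like this in web.xml (a sketch; the realm and role names are placeholders, not taken from the thread):

```xml
<!-- Sketch of a Jetty BASIC-auth constraint on /admin/*;
     realm and role names are placeholders. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr admin</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin-role</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>
```

As the thread notes, this protects external callers but also blocks the node-to-node Cores API calls, which carry no credentials.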
Combining Solr Indexes at SolrCloud
Let's assume that I have two machines in a SolrCloud that work as part of the cloud. If I want to shut down one of them and combine its indexes into the other, how can I do that?
SOAP for Solr indexing mechanism
Is there any support for communication over SOAP for Solr indexing mechanism?
Parallel Indexing With Solr?
Does Solr allow parallelism (parallel computing) for indexing?
Re: Parallel Indexing With Solr?
On 29 March 2013 14:56, Furkan KAMACI furkankam...@gmail.com wrote: Does Solr allows parallelism (parallel computing) for indexing? What do you mean by parallel computing in this context? Solr can use multiple threads for indexing if that is what you are asking. Regards, Gora
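To make Gora's point concrete, here is a minimal, Solr-free sketch of multi-threaded batch indexing. `indexBatch` is a hypothetical stand-in for a SolrJ call such as `server.add(batch)`; the thread-pool pattern is the part that matters, not the Solr API itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {
    // Counts documents "indexed"; in real code this would be a SolrJ server.add() call.
    private final AtomicInteger indexed = new AtomicInteger();

    void indexBatch(List<String> batch) {
        indexed.addAndGet(batch.size());
    }

    // Splits docs into batches and submits each batch to a fixed thread pool.
    int indexAll(List<String> docs, int threads, int batchSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < docs.size(); i += batchSize) {
            final List<String> batch =
                new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size())));
            pool.submit(() -> indexBatch(batch));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return indexed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docs.add("doc-" + i);
        int n = new ParallelIndexer().indexAll(docs, 4, 100);
        System.out.println("Indexed " + n + " docs");
    }
}
```

With StreamingUpdateSolrServer, mentioned elsewhere in this digest, the client manages a similar pool of threads for you.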
Re: solrj sample code for solrcloud
Here's some indexing code, should get you started... http://searchhub.org/dev/2012/02/14/indexing-with-solrj/ It's against 3.x as I remember, so there might be a bit of updating to do. Best Erick On Thu, Mar 28, 2013 at 2:49 AM, Jeong-dae Ha sa2ntjul...@gmail.com wrote: Does anyone have solrj indexing and searching sample code? I could not find it on the internet. Thanks.
Need Help in Patching OPENNLP
Hi All, I am very new to Solr and Java technology. I wonder if someone can give me a way to patch OpenNLP into Solr. I am simply blocked at the initial step, applying the patch to Solr 4.2. Any pointer would be highly appreciated. Thanks, Karthic -- View this message in context: http://lucene.472066.n3.nabble.com/Need-Help-in-Patching-OPENNLP-tp4052362.html Sent from the Solr - User mailing list archive at Nabble.com.
Realtime updates solrcloud
Hello Guys,

I want to use the realtime updates mechanism of SolrCloud. My setup is as follows: 3 Solr engines, 3 ZooKeeper instances (ensemble). The setup works great: recovery, leader election, etc. The problem is the realtime updates: they become slow after the servers get some traffic. Let me explain.

I test the realtime update with the following command:

curl http://SOLRURL:SOLRPORT/solr/update -H 'Content-Type: text/xml' --data-binary '<add><doc><field name="id">3504811</field><field name="website">http://www.google.nl</field></doc></add>'

I see this in the logs of the Solr server:

Mar 29, 2013 12:38:51 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={} {add=[3504811 (1430841858290876416)]} 0 35

The other Solr servers get the following lines in the log:

INFO: [collection1] webapp=/solr path=/update params={distrib.from=http://SOLRIP:SOLRPORT/solr/collection1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[3504811 (1430844456234385408)]} 0 14

This looks good: the doc is added and the leader sends it to the other Solr servers. The first few times it takes 1 sec to make the update visible :) When I send some traffic to the server (200 q/s), the update takes about 30 sec to become visible. After I stopped the traffic, it still takes 30 secs to make the update visible. How is that possible?

The solrconfig parts:

<autoCommit>
  <maxTime>60</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>2000</maxTime>
</autoSoftCommit>

Did I miss something?

Best Regards,
Roy

-- View this message in context: http://lucene.472066.n3.nabble.com/Realtime-updates-solrcloud-tp4052370.html Sent from the Solr - User mailing list archive at Nabble.com.
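One thing worth noting about the settings above: maxTime in autoCommit/autoSoftCommit is given in milliseconds. A more typical spacing would look like the following sketch (not Roy's exact config; the 60 ms hard-commit interval in the post would be unusually aggressive if taken literally):

```xml
<!-- maxTime values are in milliseconds -->
<autoCommit>
  <maxTime>60000</maxTime>          <!-- hard commit (durability) every 60 s -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>2000</maxTime>           <!-- soft commit (visibility) every 2 s -->
</autoSoftCommit>
```

The soft commit is what makes updates visible; the hard commit with openSearcher=false only flushes to disk.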
Re: Too many fields to Sort in Solr
Hi Joel, I might have an answer for this. Initially my servers were on 3.5 and then I moved to Solr 4.0. At that time I used the solrconfig.xml that was in the example and updated it with the parameters I had changed in 3.5 for the environment. There was no <codecFactory class="solr.SchemaCodecFactory"/> in the 4.0 example solrconfig.xml file. We continued to use the same file and updated the war to 4.1 and then 4.2, just by changing the luceneMatchVersion in the existing solrconfig.xml file. I was looking at the 4.2 example and comparing it with the one we have, and I see that the *codecFactory* is in the example solrconfig.xml file: <codecFactory class="solr.SchemaCodecFactory"/> -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052374.html Sent from the Solr - User mailing list archive at Nabble.com.
Suggestions for Customizing Solr Admin Page
I want to customize the Solr Admin Page. I think that I will need more complicated things to manage my cloud. I will separate my Solr cluster into indexing-only nodes and response-only nodes. I will index my documents by category, into different collections. In my admin page I will combine those collections, and split a collection into new ones. I will add, remove, and query documents, etc. Here is an old topic about the Solr admin page: http://lucene.472066.n3.nabble.com/Extending-Solr-s-Admin-functionality-td473974.html My needs may change, and some of them should be doable via the existing Solr admin page. What do you suggest: extending the existing admin page, or wrapping up a new one over SolrJ? Which considerations matter, and how can I decide between them?
Re: Combining Solr Indexes at SolrCloud
Let's say you have machine A and machine B, and you want to shut down B. If all the shards on B have replicas (on A), you can shut down B instantly. If there is a shard on B that has no replica, you should create one on machine A (using the Core API), let it replicate the whole shard's contents, and then you are safe to shut down B. [Changing the shard count of an existing collection is not possible for now, so MERGEing cores is not relevant.] On Fri, Mar 29, 2013 at 11:23 AM, Furkan KAMACI furkankam...@gmail.com wrote: Let's assume that I have two machine in a SolrCloud that works as a part of cloud. If I want to shutdown one of them an combine its indexes into other how can I do that?
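The "create a replica on machine A" step uses the CoreAdmin CREATE action; on Solr 4.x it looks roughly like this (host, core, collection, and shard names are placeholders; check the exact parameters against the CoreAdmin documentation for your version):

```
curl 'http://machineA:8983/solr/admin/cores?action=CREATE&name=mycollection_shard1_replica2&collection=mycollection&shard=shard1'
```

Once the new core finishes recovery (watch the Cloud page in the admin UI), the shard has a second replica and machine B can be stopped.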
Solr fuzzy search with WordDemiliterFilter
Hi,

I need to apply fuzzy search in my production setup. It improves the search results for spelling issues. However, it does not apply the analyzer filters configured in schema.xml. I know fuzzy and wildcard search won't apply the filters, but is there a way to plug in the filters, or to write this logic at the client? Because I am not getting any results for queries with numbers and special symbols (-).

The configuration in schema.xml:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <filter class="solr.EnglishMinimalStemFilterFactory"/>
</analyzer>
</fieldType>

How do I make sure that the filters applied at indexing time are also applied on fuzzy search at query time, when the configured filters are not working? Please help.
Re: SOAP for Solr indexing mechanism
Nope. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 29, 2013 at 4:54 AM, Furkan KAMACI furkankam...@gmail.com wrote: Is there any support for communication over SOAP for Solr indexing mechanism?
Re: Solr fuzzy search with WordDemiliterFilter
The use of the fuzzy query operator will suppress the Word Delimiter Filter at query time. That's just the way it works. You can't use both fuzzy query and WDF when WDF is splitting apart words, numbers, and case changes, and throwing away special characters as well. To put it simply, at query time the user needs to close their eyes and imagine what transformations WDF is doing and then query based on that. One workaround: copy to a separate field that does not use WDF. Then the user can use fuzzy query fine (other than that it is limited to an editing distance of 2) for that other field. -- Jack Krupansky -Original Message- From: ilay raja Sent: Friday, March 29, 2013 10:28 AM To: solr-user@lucene.apache.org ; solr-...@lucene.apache.org Subject: Solr fuzzy search with WordDemiliterFilter Hi I need to apply fuzzy search for my production. It better the search results for spelling issue. However, it is not applying the analyzer filters configured in schema.xml I know fuzzy and wildcard search wont apply the filters. But is there a way to plugin the filters or write this logic at the client. Because am not getting any results for queries with numbers and special symbols(-). 
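Jack's copyField workaround could look something like this in schema.xml (a sketch; the field and type names here are invented, not from the thread):

```xml
<!-- Hypothetical parallel field type with no WordDelimiterFilter -->
<fieldType name="text_nowdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body" type="text" indexed="true" stored="true"/>
<field name="body_nowdf" type="text_nowdf" indexed="true" stored="false"/>
<copyField source="body" dest="body_nowdf"/>
```

Fuzzy queries would then target the parallel field, e.g. body_nowdf:word~2, while normal queries keep using the WDF-analyzed field.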
Re: Too many fields to Sort in Solr
OK, that makes sense. How are DocValues working for you?

On Fri, Mar 29, 2013 at 9:02 AM, adityab aditya_ba...@yahoo.com wrote:
[...]

--
Joel Bernstein
Professional Services
LucidWorks
trying to index postgresql database using solrj
I'm new to Solr and my question may be easy, but I can't understand why this happens. I have a table which I have already indexed in Solr (so I already have the fields of this table in schema.xml). I added 2 new rows to my database and now I try to index this table again, but this time from my Java application using SolrJ. But it gives me this exception every time:

1030 [pool-1-thread-1] ERROR org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer - error
java.lang.Exception: Bad Request
Bad Request
request: http://localhost:8983/solr/db/update
	at org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:161)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
1030 [pool-1-thread-1] INFO org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer - finished: org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner@3f5b4f9c
Total Time Taken: 1296 milliseconds to index 100 SQL rows

You see that at the end it shows the right number of rows, which means that it does read from my database. Could the problem be that this table is already indexed? I don't know.
public class ReadFromSolr {
  private Connection conn = null;
  private static StreamingUpdateSolrServer server;
  private Collection docs = new ArrayList();
  private int _totalSql = 0;
  private long _start = System.currentTimeMillis();

  public static void main(String[] args) throws SolrServerException, SQLException, IOException {
    String url = "http://localhost:8983/solr/db/";
    ReadFromSolr idxer = new ReadFromSolr(url);
    try {
      idxer.doSqlDocuments();
      idxer.endIndexing();
    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }

  private void doSqlDocuments() throws SQLException {
    try {
      Class.forName("org.postgresql.Driver");
      conn = DriverManager.getConnection(
          "jdbc:postgresql://localhost:5432/plovdivbizloca", "postgres", "tan");
      java.sql.Statement st = null;
      st = conn.createStatement();
      ResultSet rs = st.executeQuery("select * from pl_biz");
      while (rs.next()) {
        SolrInputDocument doc = new SolrInputDocument();
        Integer id = rs.getInt("id");
        String name = rs.getString("name");
        String midname = rs.getString("midname");
        String lastname = rs.getString("lastname");
        String frlsname = rs.getString("frlsname");
        String biz_subject = rs.getString("biz_subject");
        String company_type = rs.getString("company_type");
        String obshtina = rs.getString("obshtina");
        String main_office_town = rs.getString("main_office_town");
        String address = rs.getString("address");
        String role = rs.getString("role");
        String country = rs.getString("country");
        String nace_code = rs.getString("nace_code");
        String nace_text = rs.getString("nace_text");
        String zip_code = rs.getString("zip_code");
        String phone = rs.getString("phone");
        String fax = rs.getString("fax");
        String email = rs.getString("email");
        String web = rs.getString("web");
        String location = rs.getString("location");
        String geohash = rs.getString("geohash");
        Integer popularity = rs.getInt("popularity");
        doc.addField("id", id);
        doc.addField("name", name);
        doc.addField("midname", midname);
        doc.addField("lastnme", lastname);
        doc.addField("frlsname", frlsname);
        doc.addField("biz_subject", biz_subject);
        doc.addField("company_type", company_type);
        doc.addField("obshtina", obshtina);
        doc.addField("main_office_town", main_office_town);
        doc.addField("address", address);
        doc.addField("role", role);
        doc.addField("country", country);
        doc.addField("nace_code", nace_code);
        doc.addField("nace_text", nace_text);
        doc.addField("zip_code", zip_code);
        doc.addField("phone", phone);
        doc.addField("fax", fax);
        doc.addField("email", email);
        doc.addField("web", web);
        doc.addField("location", location);
        doc.addField("geohash", geohash);
        doc.addField("popularity", popularity);
        docs.add(doc);
        ++_totalSql;
        if (docs.size() > 100) {
          // Commit within 5 minutes.
          UpdateResponse resp = server.add(docs, 30);
          docs.clear();
        }
      }
    } catch (Exception ex) {
      ex.printStackTrace();
    } finally {
      if (conn != null) {
        conn.close();
      }
    }
  }

  private void endIndexing() throws IOException, SolrServerException {
    if (docs.size() > 0) { // Are there any documents left over?
      server.add(docs, 30);
    }
    server.commit();
    long endTime =
Re: Parallel Indexing With Solr?
Yes. You can index from any app that can hit Solr with multiple threads. You can use StreamingUpdateSolrServer, at least in older Solrs, to handle multi-threading for you. You can index from a MapReduce job. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Fri, Mar 29, 2013 at 5:26 AM, Furkan KAMACI furkankam...@gmail.com wrote: Does Solr allows parallelism (parallel computing) for indexing?
DocValues vs stored fields?
I’m still a little fuzzy on DocValues (maybe because I’m still grappling with how it does or doesn’t still relate to “Column Stride Fields”), so can anybody clue me in as to how useful DocValues is/are? Are DocValues simply an alternative to “stored fields”? If so, and if DocValues are so great, why aren’t we just switching Solr over to DocValues under the hood for all fields? And if there are “issues” with DocValues that would make such a complete switchover less than absolutely desired, what are those issues? In short, when should a user use DocValues over stored fields, and vice versa? As things stand, all we’ve done is make Solr more confusing than it was before, without improving its OOBE. OOBE should be job one in Solr. Thanks. P.S., And if I actually want to do Column Stride Fields, is there a way to do that? -- Jack Krupansky
Re: Parallel Indexing With Solr?
Can you tell me more about "You can index from a MapReduce job"? I use Nutch, and it tells Solr to index and reindex. I know that I can use MapReduce jobs on the Nutch side, but can I use MapReduce jobs on the Solr side (i.e. for indexing etc.)? 2013/3/29 Otis Gospodnetic otis.gospodne...@gmail.com Yes. You can index from any app that can hit Solr with multiple threads. You can use StreamingUpdateSolrServer, at least in older Solrs, to handle multi-threading for you. You can index from a MapReduce job Otis -- Solr & ElasticSearch Support http://sematext.com/ On Fri, Mar 29, 2013 at 5:26 AM, Furkan KAMACI furkankam...@gmail.com wrote: Does Solr allows parallelism (parallel computing) for indexing?
Cannot find word with accent
I'm trying to find documents with this word: général It returns one hit for a document containing General. If I search for g*ral I get 230 hits, of which some contain the word général. I'm not sure where to begin looking, I believe everything is encoded correctly. The text_fr (French) fieldType configuration is essentially a boilerplate one from the Solr distribution. Thanks in advance for any insight! -Kristian
Re: Cannot find word with accent
The French Light Stemmer Filter is folding the accents:

<filter class="solr.FrenchLightStemFilterFactory"/>

Try the Solr Admin UI Analysis page and you can see that the accents go away at the last step in analysis. This behavior is hardwired into the Lucene FrenchLightStemmer norm method. It would be nice if somebody added an attribute to disable accent folding.

Try the French Minimal Stemmer Filter:

<filter class="solr.FrenchMinimalStemFilterFactory"/>

It doesn't do the accent folding, but does less stemming as well.

-- Jack Krupansky

-----Original Message----- From: Van Tassell, Kristian Sent: Friday, March 29, 2013 11:50 AM To: solr-user@lucene.apache.org Subject: Cannot find word with accent

I'm trying to find documents with this word: général It returns one hit for a document containing General. If I search for g*ral I get 230 hits, of which some contain the word général. I'm not sure where to begin looking, I believe everything is encoded correctly. The text_fr (French) fieldType configuration is essentially a boilerplate one from the Solr distribution. Thanks in advance for any insight! -Kristian
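In a stock Solr 4.x text_fr field type, the swap Jack suggests would look roughly like this (a sketch based on the shipped example schema; surrounding filters abbreviated):

```xml
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- was: <filter class="solr.FrenchLightStemFilterFactory"/> -->
  <filter class="solr.FrenchMinimalStemFilterFactory"/>
</analyzer>
```

Re-indexing is required after the change, and the Analysis page can confirm that général now survives with its accent intact.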
Synonyms problem
Hey guys,

I have the following problem: I have a website with sport players, and I am using Solr to index their data. I have defined synonyms like: NY, New York. When I search for New York there are 145 results found, but when I search for NY there are 142 results found. Why is there a diff, and how can I fix this?

Configuration snippets:

synonyms.txt
...
NY, New York
...

schema.xml
...
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <!-- <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
  </analyzer>
  <analyzer type="query">
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="letterstops.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

Thanks in advance.
Plamen
Re: DocValues vs stored fields?
Hi Jack, I've just started to dig into this as well, so sharing what I know but still some holes in my knowledge too. DocValues == Column Stride Fields (best resource I know of so far is Simon's preso from Lucene Rev 2011 - http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues). It's pretty dense but some nuggets I've gleaned from this are: 1) DocValues are more efficient in terms of memory usage and I/O performance for building an alternative to FieldCache (slide 27 is very impressive) 2) DocValues has a more efficient way to store primitive types, such as packed ints 3) Faster random access to stored values In terms of switch-over, you have to re-index to change your fields to use DocValues on disk, which is why they are not enabled by default. Lastly, another goal of DocValues is to allow updates to a single field w/o re-indexing the entire doc. That's not implemented yet but I think still planned. Cheers, Tim On Fri, Mar 29, 2013 at 9:31 AM, Jack Krupansky j...@basetechnology.comwrote: I’m still a little fuzzy on DocValues (maybe because I’m still grappling with how it does or doesn’t still relate to “Column Stride Fields”), so can anybody clue me in as to how useful DocValues is/are? Are DocValues simply an alternative to “stored fields”? If so, and if DocValues are so great, why aren’t we just switching Solr over to DocValues under the hood for all fields? And if there are “issues” with DocValues that would make such a complete switchover less than absolutely desired, what are those issues? In short, when should a user use DocValues over stored fields, and vice versa? As things stand, all we’ve done is make Solr more confusing than it was before, without improving its OOBE. OOBE should be job one in Solr. Thanks. P.S., And if I actually want to do Column Stride Fields, is there a way to do that? -- Jack Krupansky
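For what it's worth, enabling DocValues in Solr 4.2 is a per-field schema.xml attribute plus a full re-index; a minimal sketch (the field name is just an example):

```xml
<field name="manu_exact" type="string" indexed="true" stored="false" docValues="true"/>
```

Sorting, faceting, and function queries on such a field can then be served from the DocValues structure rather than the FieldCache.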
Re: Synonyms problem
Hi Plamen,

You should set expand to true in the index analyzer:

<analyzer type="index">
  <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
  ...

Greetings,
Thomas

Am 29.03.2013 17:16, schrieb Plamen Mihaylov:
[...]

--
ontopica GmbH
Prinz-Albert-Str. 2b
53113 Bonn
Germany
fon: +49-228-227229-22
fax: +49-228-227229-77
web: http://www.ontopica.de

ontopica GmbH
Sitz der Gesellschaft: Bonn
Geschäftsführung: Thomas Krämer, Christoph Okpue
Handelsregister: Amtsgericht Bonn, HRB 17852
Re: Synonyms problem
Also, all the filters need to be after the tokenizer. There are two synonym filters specified, one before the tokenizer and one after. I'm surprised that works at all. Shouldn't that be a fatal error when loading the config?

wunder

On Mar 29, 2013, at 9:33 AM, Thomas Krämer | ontopica wrote:
[...]

--
Walter Underwood
wun...@wunderwood.org
dataimport
Hi,

When I hit the Execute button in the Query tab, I only see:

Last Update: 12:34:58
Indexing since 01s
Requests: 1 (1/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0 (0/s)
Started: about an hour ago

I did not see any green entry saying Indexing Completed.

Thanks
Re: Synonyms problem
The XPath expressions used to collect the charFilter sequence, the tokenizer, and the token filter sequence are evaluated independently of each other; see lines #244 through #251: http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_2_0/solr/core/src/java/org/apache/solr/schema/FieldTypePluginLoader.java?view=markup#l232

Steve

On Mar 29, 2013, at 12:37 PM, Walter Underwood wun...@wunderwood.org wrote:
[...]
Re: Synonyms problem
Guys, that is a commented-out line where expand is false. I moved the synonym filter after the tokenizer, but the result is the same. Actual configuration:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <!-- <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="letterstops.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

2013/3/29 Walter Underwood wun...@wunderwood.org: Also, all the filters need to be after the tokenizer. There are two synonym filters specified, one before the tokenizer and one after. I'm surprised that works at all. Shouldn't that be a fatal error when loading the config?
wunder On Mar 29, 2013, at 9:33 AM, Thomas Krämer | ontopica wrote: Hi Plamen You should set expand to true during analyzer type=index filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=true/ ... Greetings, Thomas Am 29.03.2013 17:16, schrieb Plamen Mihaylov: Hey guys, I have the following problem - I have a website with sport players, where using Solr indexing their data. I have defined synonyms like: NY, New York. When I search for New York - there are 145 results found, but when I search for NY - there are 142 results found. Why there is a diff and how can I fix this? Configuration snippets: synonyms.txt ... NY, New York ... -- schema.xml ... fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index filter class=solr. SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ tokenizer class=solr.WhitespaceTokenizerFactory / !-- we will only use synonyms at query time filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.PhoneticFilterFactory encoder=DoubleMetaphone inject=true / filter class=solr.RemoveDuplicatesTokenFilterFactory / filter class=solr.LengthFilterFactory min=2 max=100 / !-- filter class=solr.SnowballPorterFilterFactory language=English / -- /analyzer analyzer type=query filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true / tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 
catenateAll=0 / filter class=solr.LowerCaseFilterFactory / !-- filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ -- filter
Re: Synonyms problem
There are several problems with this config. Indexing uses the phonetic filter, but query does not. This almost guarantees that nothing will match. Numbers could match, if the filter passes them. Query time has two stopword filters with different lists; indexing only has one. This isn't fatal, but it is pretty weird. Is letterstops.txt trying to do the same thing as the length filter? If so, use the length filter in both places, or not at all. Deleting all single characters is a bad idea: you'll never find Vitamin C. The same synonyms are used at index and query time, which is unnecessary. Only use synonyms at index time unless you really know what you are doing and have a special need. wunder On Mar 29, 2013, at 9:53 AM, Plamen Mihaylov wrote: Guys, This is a commented line where expand is false. I moved the synonym filter after tokenizer, but the result is the same. Actual configuration: fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.PhoneticFilterFactory encoder=DoubleMetaphone inject=true / filter class=solr.RemoveDuplicatesTokenFilterFactory / filter class=solr.LengthFilterFactory min=2 max=100 / !-- filter class=solr.SnowballPorterFilterFactory language=English / -- /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt / filter class=solr.WordDelimiterFilterFactory 
generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 / filter class=solr.LowerCaseFilterFactory / !-- filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ -- filter class=solr.RemoveDuplicatesTokenFilterFactory / filter class=solr.StopFilterFactory ignoreCase=true words=letterstops.txt enablePositionIncrements=true / /analyzer /fieldType 2013/3/29 Walter Underwood wun...@wunderwood.org Also, all the filters need to be after the tokenizer. There are two synonym filters specified, one before the tokenizer and one after. I'm surprised that works at all. Shouldn't that be fatal error when loading the config? wunder On Mar 29, 2013, at 9:33 AM, Thomas Krämer | ontopica wrote: Hi Plamen You should set expand to true during analyzer type=index filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=true/ ... Greetings, Thomas Am 29.03.2013 17:16, schrieb Plamen Mihaylov: Hey guys, I have the following problem - I have a website with sport players, where using Solr indexing their data. I have defined synonyms like: NY, New York. When I search for New York - there are 145 results found, but when I search for NY - there are 142 results found. Why there is a diff and how can I fix this? Configuration snippets: synonyms.txt ... NY, New York ... -- schema.xml ... fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index filter class=solr. 
SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ tokenizer class=solr.WhitespaceTokenizerFactory / !-- we will only use synonyms at query time filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.PhoneticFilterFactory encoder=DoubleMetaphone inject=true / filter class=solr.RemoveDuplicatesTokenFilterFactory / filter class=solr.LengthFilterFactory min=2 max=100 / !-- filter
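Walter's first point — an index analyzer that applies a phonetic filter the query analyzer lacks — can be illustrated with a toy sketch. This is plain Python, not Solr code, and `toy_phonetic` is a hypothetical stand-in for an encoder like DoubleMetaphone, just to show why asymmetric analysis chains stop terms from matching:

```python
# Toy illustration (not Solr code, not real DoubleMetaphone): if a
# phonetic filter runs only in the index analyzer, indexed tokens no
# longer equal raw query tokens, so exact term lookups miss.

def toy_phonetic(token):
    # Hypothetical phonetic stand-in: keep the first character, drop
    # later vowels, uppercase the result.
    head, tail = token[0], token[1:]
    return (head + "".join(c for c in tail if c not in "aeiou")).upper()

def analyze(text, phonetic=False):
    tokens = [t.lower() for t in text.split()]
    return [toy_phonetic(t) for t in tokens] if phonetic else tokens

indexed = set(analyze("civil war", phonetic=True))   # index-side analyzer
queried = analyze("civil war", phonetic=False)       # query-side analyzer

# No query token matches an indexed token, even for identical input text.
matches = [t for t in queried if t in indexed]
```

With `inject="true"` Solr actually keeps the original token alongside the phonetic one, which softens this in practice, but the asymmetry is still the thing to look for first.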
Add fuzzy to edismax specs?
I've implemented this for the second time, so it is probably time to contribute it. I find it really useful. I've extended the query spec parser for edismax to also accept a tilde and to generate a FuzzyQuery. I used this at Netflix (on 1.3 with dismax), and re-implemented it for 3.3 here at Chegg. We've had it in production for nearly a year. I'll need to re-port this as part of our move to 4.x. Here is what the spec looks like; this expands to a fuzzy search on title with a similarity of 0.75, and so on. <str name="qf">title~0.75^4 long_title^4 title_stem^2 author~0.75</str> I'm not 100% sure I understand the spec parser in edismax, so I'd like some review when this is ready. I'd probably only do it for edismax. See: https://issues.apache.org/jira/browse/SOLR-629 wunder -- Walter Underwood wun...@wunderwood.org Search Guy, Chegg.com
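The grammar Walter describes — field name, optional `~similarity`, optional `^boost` — can be sketched as a small parser. This is a hypothetical Python illustration of the spec's shape, not the actual ExtendedDismaxQParser code:

```python
import re

# Hypothetical sketch of the extended qf term grammar described above:
#   field[~similarity][^boost]   e.g. "title~0.75^4"
QF_TERM = re.compile(r"^(?P<field>\w+)(~(?P<sim>[0-9.]+))?(\^(?P<boost>[0-9.]+))?$")

def parse_qf(spec):
    fields = []
    for term in spec.split():
        m = QF_TERM.match(term)
        if not m:
            raise ValueError("bad qf term: " + term)
        fields.append((
            m.group("field"),
            float(m.group("sim")) if m.group("sim") else None,    # fuzzy similarity, if any
            float(m.group("boost")) if m.group("boost") else 1.0, # query boost, default 1.0
        ))
    return fields

parsed = parse_qf("title~0.75^4 long_title^4 title_stem^2 author~0.75")
```

So `title~0.75^4` parses to a fuzzy match on `title` with similarity 0.75 and boost 4, matching the prose above.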
Re: Solr 4.2 - Slave Index version is higher than Master
+1. I have observed this same issue: no change on the master, yet the slave is bumped up to a higher index version. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-Slave-Index-version-is-higher-than-Master-tp4049827p4052445.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr metrics in Codahale metrics and Graphite?
What are folks using for this? wunder -- Walter Underwood wun...@wunderwood.org
RE: Basic auth on SolrCloud /admin/* calls
Yes, I should have mentioned this is under Solr 4.2. I sort of expected what I'm doing might be unsupported, but basically my concern is that under the current Solr design, any client with connectivity to Solr's port can perform admin-level API calls like create/drop Cores or Collections. I'm only aiming for '/solr/admin/*' calls, to separate Application access from Administrative access logically, and not the non-admin calls like '/update', although you can cause damage with '/update', too. I may try to patch the code to send Basic auth credentials on internal calls just for fun, but I'm thinking longer-term authentication should be implemented in the Solr codebase (for at least admin calls) rather than playing with security at the container level and having the app inside the container aware of it. On the upside, in short testing I was able to get a Collection online using only Cores API curl calls with basic auth. Only the Collections API is affected, because the internal calls it issues to itself do not carry auth. Cheers, Tim -Original Message- From: Isaac Hebsh [mailto:isaac.he...@gmail.com] Sent: Friday, March 29, 2013 12:37 AM To: solr-user@lucene.apache.org Subject: Re: Basic auth on SolrCloud /admin/* calls Hi Tim, Are you running Solr 4.2? (In 4.0 and 4.1, the Collections API didn't return any failure message. see SOLR-4043 issue). As far as I know, you can't tell Solr to use authentication credentials when communicating other nodes. It's a bigger issue.. for example, if you want to protect the /update requestHandler, so unauthorized users won't delete your whole collection, it can interfere the replication process. I think it's a necessary mechanism in production environment... I'm curious how do people use SolrCloud in production w/o it. On Fri, Mar 29, 2013 at 3:42 AM, Vaillancourt, Tim tvaillanco...@ea.comwrote: Hey guys, I've recently setup basic auth under Jetty 8 for all my Solr 4.x '/admin/*' calls, in order to protect my Collections and Cores API. 
Although the security constraint is working as expected ('/admin/*' calls require Basic Auth or return 401), when I use the Collections API to create a collection, I receive a 200 OK to the Collections API CREATE call, but the background Cores API calls that are ran on the Collection API's behalf fail on the Basic Auth on other nodes with a 401 code, as I should have foreseen, but didn't. Is there a way to tell SolrCloud to use authentication on internal Cores API calls that are spawned on Collections API's behalf, or is this a new feature request? To reproduce: 1. Implement basic auth on '/admin/*' URIs. 2. Perform a CREATE Collections API call to a node (which will return 200 OK). 3. Notice all Cores API calls fail (Collection isn't created). See stack trace below from the node that was issued the CREATE call. The stack trace I get is: org.apache.solr.common.SolrException: Server at http://HOST HERE:8983/solrhttp://%3cHOST%20HERE%3e:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServe r.java:373) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServe r.java:181) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHan dler.java:169) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHan dler.java:135) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439 ) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecu tor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor. java:918) at java.lang.Thread.run(Thread.java:662) Cheers! Tim
Re: Synonyms problem
Thank you a lot, Walter. I removed most of the filters and now it returns the same number of results. It now looks simply like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Can I ask you another question: I have Magento + Solr and a requirement to create a Magento admin module where I can add/remove synonyms dynamically. Is this possible? I searched Google but it seems it is not. Regards, Plamen 2013/3/29 Walter Underwood wun...@wunderwood.org There are several problems with this config. Indexing uses the phonetic filter, but query does not. This almost guarantees that nothing will match. Numbers could match, if the filter passes them. Query time has two stopword filters with different lists; indexing only has one. This isn't fatal, but it is pretty weird. Is letterstops.txt trying to do the same thing as the length filter? If so, use the length filter in both places, or not at all. Deleting all single characters is a bad idea: you'll never find Vitamin C. The same synonyms are used at index and query time, which is unnecessary. Only use synonyms at index time unless you really know what you are doing and have a special need. wunder On Mar 29, 2013, at 9:53 AM, Plamen Mihaylov wrote: Guys, This is a commented line where expand is false. I moved the synonym filter after tokenizer, but the result is the same. 
Actual configuration: fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.PhoneticFilterFactory encoder=DoubleMetaphone inject=true / filter class=solr.RemoveDuplicatesTokenFilterFactory / filter class=solr.LengthFilterFactory min=2 max=100 / !-- filter class=solr.SnowballPorterFilterFactory language=English / -- /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 / filter class=solr.LowerCaseFilterFactory / !-- filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ -- filter class=solr.RemoveDuplicatesTokenFilterFactory / filter class=solr.StopFilterFactory ignoreCase=true words=letterstops.txt enablePositionIncrements=true / /analyzer /fieldType 2013/3/29 Walter Underwood wun...@wunderwood.org Also, all the filters need to be after the tokenizer. There are two synonym filters specified, one before the tokenizer and one after. I'm surprised that works at all. Shouldn't that be fatal error when loading the config? 
wunder On Mar 29, 2013, at 9:33 AM, Thomas Krämer | ontopica wrote: Hi Plamen You should set expand to true during analyzer type=index filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=true/ ... Greetings, Thomas Am 29.03.2013 17:16, schrieb Plamen Mihaylov: Hey guys, I have the following problem - I have a website with sport players, where using Solr indexing their data. I have defined synonyms like: NY, New York. When I search for New York - there are 145 results found, but when I search for NY - there are 142 results found. Why there is a diff and how can I fix this? Configuration snippets: synonyms.txt ...
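The reason `expand="true"` at index time fixes the NY / New York gap can be sketched in a few lines. This is a toy model, not the real SynonymFilterFactory (it treats "new york" as a single token and ignores positions), just to show why expanding every member of a synonym group into the index makes either spelling retrieve the same documents:

```python
# Toy sketch of index-time synonym expansion (expand="true"), not the
# real SynonymFilterFactory: every member of a matching synonym group
# is indexed, so a query for any one of them hits the same documents.

SYNONYMS = [{"ny", "new york"}]  # one group, as in synonyms.txt: "NY, New York"

def expand(text):
    tokens = {text.lower()}
    for group in SYNONYMS:
        if tokens & group:
            tokens |= group
    return tokens

index = {}  # term -> set of doc ids

def add_doc(doc_id, city):
    for term in expand(city):
        index.setdefault(term, set()).add(doc_id)

add_doc(1, "NY")
add_doc(2, "New York")
# Both spellings now retrieve both documents.
```

With only index-time expansion, the query analyzer can stay synonym-free, which is what Walter recommends above.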
Re: Solrcloud 4.1 Collection with multiple slices only use
So, I upgraded to 4.2 this morning. I had gotten to the point where I was okay with the collection creation process in 4.1 using the API versus the solr.xml file in 4.0, but now 4.2 doesn't seem to want to create the instanceDir. E.g., the Dashboard reports the following when my solr.data.dir is set to /data/solr in solrconfig.xml; the instance dirs aren't created, yet the index and tlog dirs are: Instance /data/solr/collection1_shard1_replica1 Data /data/solr Index /data/solr/index - Chris On Thu, Mar 28, 2013 at 7:48 PM, Mark Miller markrmil...@gmail.com wrote: On Mar 28, 2013, at 7:30 PM, Shawn Heisey s...@elyograg.org wrote: Can't you leave numShards out completely, then include a numShards parameter on a collection api CREATE url, possibly giving a different numShards to each collection? Thanks, Shawn Yes - that's why I say the collections API is the way forward - it has none of these limitations. The limitations are all around pre-configuring everything in solr.xml and not using the collections API. - Mark
Query Elevation exception on shard queries
Hello, we have a Solr 3.6.2 multicore setup, where each core is a complete index for one application. In our site search we use a sharded query to query two cores at a time. The issue is: if one core has docs for an elevated query but the other core doesn't, Solr throws a 500 error. I would really appreciate it if somebody could point me in the right direction on how to avoid this error. The following is my query: [#|2013-03-29T13:44:55.609-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-0;|[core1] webapp=/solr path=/select/ params={q=civil+warstart=0rows=10shards=localhost:/solr/core1,localhost:/solr/core2hl=truehl.fragsize=0hl.snippets=5hl.simple.pre=stronghl.simple.post=/stronghl.fl=bodyfl=*facet=truefacet.field=typefacet.mincount=1facet.method=enumfq=pubdate:[2005-01-01T00:00:00Z+TO+NOW/DAY%2B1DAY]facet.query={!ex%3Ddt+key%3DPast+24+Hours}pubdate:[NOW/DAY-1DAY+TO+NOW/DAY%2B1DAY]facet.query={!ex%3Ddt+key%3DPast+7+Days}pubdate:[NOW/DAY-7DAYS+TO+NOW/DAY%2B1DAY]facet.query={!ex%3Ddt+key%3DPast+60+Days}pubdate:[NOW/DAY-60DAYS+TO+NOW/DAY%2B1DAY]facet.query={!ex%3Ddt+key%3DPast+12+Months}pubdate:[NOW/DAY-1YEAR+TO+NOW/DAY%2B1DAY]facet.query={!ex%3Ddt+key%3DAll+Since+2005}pubdate:[*+TO+NOW/DAY%2B1DAY]} status=500 QTime=15 |#] As you can see, the two cores are core1 and core2. core1 has data for the query 'civil war', but core2 doesn't have any data. We have 'civil war' in elevate.xml, which causes Solr to throw a SolrException as follows. However, if I remove the elevate entry for this query, everything works well. 
*type* Status report *message*Index: 1, Size: 0 java.lang.IndexOutOfBoundsException: Index: 1, Size: 0 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.solr.common.util.NamedList.getVal(NamedList.java:137) at org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardDoc.java:221) at org.apache.solr.handler.component.ShardFieldSortedHitQueue$2.compare(ShardDoc.java:260) at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:160) at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101) at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:223) at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:132) at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:148) at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:786) at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:587) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:566) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:283) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:313) at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287) at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593) at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94) at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648) at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593) at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587) at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093) at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:291) at com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:670) at
Re: DocValues vs stored fields?
Hi, atomic updates (single-field updates) do not depend on DocValues. They were implemented in Solr 4.0 and work fine (but all fields have to be retrievable). DocValues are supposed to be more efficient than FieldCache. Why not enabled by default? Maybe because they are not for all fields, and because of their limitations (a field has to be single-valued, and required or have a default value). Regards. On 29 March 2013 17:20, Timothy Potter thelabd...@gmail.com wrote: Hi Jack, I've just started to dig into this as well, so sharing what I know but still some holes in my knowledge too. DocValues == Column Stride Fields (best resource I know of so far is Simon's preso from Lucene Rev 2011 - http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues ). It's pretty dense but some nuggets I've gleaned from this are: 1) DocValues are more efficient in terms of memory usage and I/O performance for building an alternative to FieldCache (slide 27 is very impressive) 2) DocValues has a more efficient way to store primitive types, such as packed ints 3) Faster random access to stored values In terms of switch-over, you have to re-index to change your fields to use DocValues on disk, which is why they are not enabled by default. Lastly, another goal of DocValues is to allow updates to a single field w/o re-indexing the entire doc. That's not implemented yet but I think still planned. Cheers, Tim On Fri, Mar 29, 2013 at 9:31 AM, Jack Krupansky j...@basetechnology.com wrote: I’m still a little fuzzy on DocValues (maybe because I’m still grappling with how it does or doesn’t still relate to “Column Stride Fields”), so can anybody clue me in as to how useful DocValues is/are? Are DocValues simply an alternative to “stored fields”? If so, and if DocValues are so great, why aren’t we just switching Solr over to DocValues under the hood for all fields? 
And if there are “issues” with DocValues that would make such a complete switchover less than absolutely desired, what are those issues? In short, when should a user use DocValues over stored fields, and vice versa? As things stand, all we’ve done is make Solr more confusing than it was before, without improving its OOBE. OOBE should be job one in Solr. Thanks. P.S., And if I actually want to do Column Stride Fields, is there a way to do that? -- Jack Krupansky
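The row-vs-column distinction behind this thread can be sketched in plain Python. This is a rough illustration of the layout difference (not Lucene code): stored fields keep one record per document, while a DocValues-style column-stride layout keeps one array per field, so sorting or faceting on one field touches only that field's array:

```python
# Rough sketch (plain Python, not Lucene) of why a column-stride layout
# helps sorting/faceting: a per-field array gives direct access to one
# field across all docs, instead of loading each doc's full record.

# "Stored fields" view: one record per document (row-oriented).
stored = [
    {"id": "a", "title": "civil war", "year": 1990},
    {"id": "b", "title": "vitamin c", "year": 1985},
    {"id": "c", "title": "new york",  "year": 2001},
]

# "Column stride" view: one array per field, indexed by internal doc id.
year_column = [d["year"] for d in stored]

# Sorting by year touches only the year column, never titles or ids.
sorted_doc_ids = sorted(range(len(year_column)), key=year_column.__getitem__)
```

The real win Lucene gets on top of this layout is compact on-disk encodings (e.g. packed ints) and avoiding the FieldCache un-inversion step, per Tim's points above.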
Re: DocValues vs stored fields?
Hi, The current field update mechanism is not really a field update mechanism. It just looks like that from the outside. DocValues should make true field updates implementable. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 29, 2013 at 3:30 PM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, Atomic updates (single field updates) do not depend on DocValues. They were implemented in Solr4.0 and works fine (but all fields have to be retrievable). DocValues are supposed to be more efficient than FieldCache. Why not enabled by default ? Maybe because they are not for all fields and because of their limitations (a field has to be single-valued, required or to have default value). Regards. On 29 March 2013 17:20, Timothy Potter thelabd...@gmail.com wrote: Hi Jack, I've just started to dig into this as well, so sharing what I know but still some holes in my knowledge too. DocValues == Column Stride Fields (best resource I know of so far is Simon's preso from Lucene Rev 2011 - http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues ). It's pretty dense but some nuggets I've gleaned from this are: 1) DocValues are more efficient in terms of memory usage and I/O performance for building an alternative to FieldCache (slide 27 is very impressive) 2) DocValues has a more efficient way to store primitive types, such as packed ints 3) Faster random access to stored values In terms of switch-over, you have to re-index to change your fields to use DocValues on disk, which is why they are not enabled by default. Lastly, another goal of DocValues is to allow updates to a single field w/o re-indexing the entire doc. That's not implemented yet but I think still planned. 
Cheers, Tim On Fri, Mar 29, 2013 at 9:31 AM, Jack Krupansky j...@basetechnology.com wrote: I’m still a little fuzzy on DocValues (maybe because I’m still grappling with how it does or doesn’t still relate to “Column Stride Fields”), so can anybody clue me in as to how useful DocValues is/are? Are DocValues simply an alternative to “stored fields”? If so, and if DocValues are so great, why aren’t we just switching Solr over to DocValues under the hood for all fields? And if there are “issues” with DocValues that would make such a complete switchover less than absolutely desired, what are those issues? In short, when should a user use DocValues over stored fields, and vice versa? As things stand, all we’ve done is make Solr more confusing than it was before, without improving its OOBE. OOBE should be job one in Solr. Thanks. P.S., And if I actually want to do Column Stride Fields, is there a way to do that? -- Jack Krupansky
Re: Solrcloud 4.1 Collection with multiple slices only use
Those are paths? /data/solr off the root? When using the Collections API, you really don't want to set an absolute data dir - it should be relative; I'd just take the default. Then, even though many shards share that solrconfig and data dir, they will all find a nice home relative to the instance dir. If you don't do this, you won't be able to over-shard, and things get tricky fast. - Mark On Mar 29, 2013, at 2:45 PM, Chris R corg...@gmail.com wrote: So, upgraded to 4.2 this morning. I had gotten to the point where I okay with the collection creation process in 4.1 using the API vice the solr.xml file in 4.0, but now 4.2 doesn't seem to want to create the instanceDir? e.g. the Dashboard reports the following when my solr.data.dir is set to /data/solr in the solrconfig.xml. However, the instance dirs aren't created, yet the index and tlog dirs are Instance /data/solr/collection1_shard1_replica1 Data /data/solr Index /data/solr/index - Chris On Thu, Mar 28, 2013 at 7:48 PM, Mark Miller markrmil...@gmail.com wrote: On Mar 28, 2013, at 7:30 PM, Shawn Heisey s...@elyograg.org wrote: Can't you leave numShards out completely, then include a numShards parameter on a collection api CREATE url, possibly giving a different numShards to each collection? Thanks, Shawn Yes - that's why I say the collections API is the way forward - it has none of these limitations. The limitations are all around pre-configuring everything in solr.xml and not using the collections API. - Mark
Re: Basic auth on SolrCloud /admin/* calls
This has always been the case with Solr. Solr's security model is that clients should not have access to it - only trusted intermediaries should have access to it. Otherwise, it should be locked down at a higher level. That's been the case from day one and still is. That said, someone did do some work on internode basic auth a while back, but it didn't raise a ton of interest yet. - Mark On Mar 29, 2013, at 2:09 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote: Yes, I should have mentioned this is under 4.2 Solr. I sort of expected what I'm doing might be unsupported, but basically my concern is that under the current SOLR design, any client with connectivity to SOLR's port can perform Admin-level API calls like create/drop Cores or Collections. I'm only aiming for '/solr/admin/*' calls to separate Application access from the Administrative access logically, and not the non-admin calls like '/update', although you can cause damage with '/update', too. I may try to patch the code to send Basic auth credentials on internal calls just for fun, but I'm thinking longer-term authentication should be implemented/added to the SOLR codebase (for at least admin calls) vs playing with security at the container level, and having the app inside the container aware of it. On the upside, in short testing I was able to get a Collection online using only Cores API curl calls w/ basic auth. Only the Collections API is affected, because the calls it issues to itself do not have auth. Cheers, Tim -Original Message- From: Isaac Hebsh [mailto:isaac.he...@gmail.com] Sent: Friday, March 29, 2013 12:37 AM To: solr-user@lucene.apache.org Subject: Re: Basic auth on SolrCloud /admin/* calls Hi Tim, Are you running Solr 4.2? (In 4.0 and 4.1, the Collections API didn't return any failure message; see the SOLR-4043 issue). As far as I know, you can't tell Solr to use authentication credentials when communicating with other nodes. It's a bigger issue..
for example, if you want to protect the /update requestHandler so unauthorized users won't delete your whole collection, it can interfere with the replication process. I think it's a necessary mechanism in a production environment... I'm curious how people use SolrCloud in production w/o it. On Fri, Mar 29, 2013 at 3:42 AM, Vaillancourt, Tim tvaillanco...@ea.com wrote: Hey guys, I've recently set up basic auth under Jetty 8 for all my Solr 4.x '/admin/*' calls, in order to protect my Collections and Cores API. Although the security constraint is working as expected ('/admin/*' calls require Basic Auth or return 401), when I use the Collections API to create a collection, I receive a 200 OK to the Collections API CREATE call, but the background Cores API calls that are run on the Collections API's behalf fail on the Basic Auth on other nodes with a 401 code, as I should have foreseen, but didn't. Is there a way to tell SolrCloud to use authentication on internal Cores API calls that are spawned on the Collections API's behalf, or is this a new feature request? To reproduce: 1. Implement basic auth on '/admin/*' URIs. 2. Perform a CREATE Collections API call to a node (which will return 200 OK). 3. Notice all Cores API calls fail (Collection isn't created). See stack trace below from the node that was issued the CREATE call.
The stack trace I get is:

org.apache.solr.common.SolrException: Server at http://<HOST HERE>:8983/solr returned non ok status:401, message:Unauthorized
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:169)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)

Cheers! Tim
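[Editor's note] For readers trying to reproduce step 1, basic auth on '/admin/*' under Jetty is typically wired up with a servlet security constraint along these lines. This is only a sketch; the realm name, role name, and credentials store are assumptions, not details from Tim's actual setup:

```xml
<!-- web.xml fragment in the solr webapp (sketch; role/realm names are hypothetical) -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr admin</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Admin Realm</realm-name>
</login-config>
```

Note that internode Cores API requests hit the same '/admin/cores' pattern without credentials, which is exactly what produces the 401 in the trace above.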
Re: DocValues vs stored fields?
Hi Otis, Currently, the whole record has to be stored on disk in order to update a single field. Are you trying to say that it won't be necessary with the use of DocValues? Sounds great! Regards. On 29 March 2013 20:51, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, The current field update mechanism is not really a field update mechanism. It just looks like that from the outside. DocValues should make true field updates implementable. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 29, 2013 at 3:30 PM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, Atomic updates (single field updates) do not depend on DocValues. They were implemented in Solr 4.0 and work fine (but all fields have to be retrievable). DocValues are supposed to be more efficient than FieldCache. Why not enabled by default? Maybe because they are not for all fields and because of their limitations (a field has to be single-valued, and either required or have a default value). Regards. On 29 March 2013 17:20, Timothy Potter thelabd...@gmail.com wrote: Hi Jack, I've just started to dig into this as well, so I'm sharing what I know, but there are still some holes in my knowledge too. DocValues == Column Stride Fields (best resource I know of so far is Simon's preso from Lucene Rev 2011 - http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues ). It's pretty dense, but some nuggets I've gleaned from it are: 1) DocValues are more efficient in terms of memory usage and I/O performance for building an alternative to FieldCache (slide 27 is very impressive); 2) DocValues have a more efficient way to store primitive types, such as packed ints; 3) faster random access to stored values. In terms of switch-over, you have to re-index to change your fields to use DocValues on disk, which is why they are not enabled by default. Lastly, another goal of DocValues is to allow updates to a single field w/o re-indexing the entire doc. That's not implemented yet but I think still planned.
Cheers, Tim On Fri, Mar 29, 2013 at 9:31 AM, Jack Krupansky j...@basetechnology.com wrote: I’m still a little fuzzy on DocValues (maybe because I’m still grappling with how it does or doesn’t still relate to “Column Stride Fields”), so can anybody clue me in as to how useful DocValues is/are? Are DocValues simply an alternative to “stored fields”? If so, and if DocValues are so great, why aren’t we just switching Solr over to DocValues under the hood for all fields? And if there are “issues” with DocValues that would make such a complete switchover less than absolutely desired, what are those issues? In short, when should a user use DocValues over stored fields, and vice versa? As things stand, all we’ve done is make Solr more confusing than it was before, without improving its OOBE. OOBE should be job one in Solr. Thanks. P.S., And if I actually want to do Column Stride Fields, is there a way to do that? -- Jack Krupansky
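[Editor's note] To make Tim's re-indexing point concrete: enabling DocValues in this era of Solr is a per-field schema change plus a full re-index. A minimal sketch follows; the field and type names are hypothetical, not taken from the thread:

```xml
<!-- schema.xml fragment (sketch) -->
<field name="price" type="float" indexed="true" stored="false"
       docValues="true" default="0.0"/>
```

Note the constraints Marcin mentions for this release line: the field must be single-valued, and either required or given a default value.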
Re: per-fieldtype similarity not working
Any example or suggestion for how to patch the wrapper so that the coord method is called for the field type with the custom similarity? -- View this message in context: http://lucene.472066.n3.nabble.com/per-fieldtype-similarity-not-working-tp3987050p4052470.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solrcloud 4.1 Collection with multiple slices only use
Yes, removing the absolute value cured the problem, but I feel like there should be a better option than the default. Given multiple collections, there should be some ability within the API to lay down the directory structure in a different way, e.g. ./collection/shard as opposed to the current auto naming scheme. If you wanted to do that now, you would have to create all the collections, stop everything, modify solr.xmls, move files, and restart. Painful at best. Some might say it's not necessary. Thanks, Chris On Fri, Mar 29, 2013 at 4:01 PM, Mark Miller markrmil...@gmail.com wrote: Those are paths? /data/solr off the root? When using the collections api, you really don't want to set an absolute data dir - it should be relative, I'd just take the default. Then, even though many shards share that solrconfig and data dir, they will all find a nice home relative to the instance dir. If you don't do this, you won't be able to over shard, and things get tricky fast. - Mark On Mar 29, 2013, at 2:45 PM, Chris R corg...@gmail.com wrote: So, upgraded to 4.2 this morning. I had gotten to the point where I was okay with the collection creation process in 4.1 using the API vice the solr.xml file in 4.0, but now 4.2 doesn't seem to want to create the instanceDir? e.g. the Dashboard reports the following when my solr.data.dir is set to /data/solr in the solrconfig.xml. However, the instance dirs aren't created, yet the index and tlog dirs are: Instance /data/solr/collection1_shard1_replica1 Data /data/solr Index /data/solr/index - Chris On Thu, Mar 28, 2013 at 7:48 PM, Mark Miller markrmil...@gmail.com wrote: On Mar 28, 2013, at 7:30 PM, Shawn Heisey s...@elyograg.org wrote: Can't you leave numShards out completely, then include a numShards parameter on a collection api CREATE url, possibly giving a different numShards to each collection? Thanks, Shawn Yes - that's why I say the collections API is the way forward - it has none of these limitations.
The limitations are all around pre-configuring everything in solr.xml and not using the collections API. - Mark
4.2 Admin UI
I've noticed in the Admin UI that on some of my nodes the Core Selector combo box doesn't populate. Known issue? Chris
Re: DocValues vs stored fields?
By the way: even if a field has DocValues with the on-disk option enabled, it has to have stored=true to be retrievable. Why? On 29 March 2013 20:51, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, The current field update mechanism is not really a field update mechanism. It just looks like that from the outside. DocValues should make true field updates implementable. Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Mar 29, 2013 at 3:30 PM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, Atomic updates (single field updates) do not depend on DocValues. They were implemented in Solr 4.0 and work fine (but all fields have to be retrievable). DocValues are supposed to be more efficient than FieldCache. Why not enabled by default? Maybe because they are not for all fields and because of their limitations (a field has to be single-valued, and either required or have a default value). Regards. On 29 March 2013 17:20, Timothy Potter thelabd...@gmail.com wrote: Hi Jack, I've just started to dig into this as well, so I'm sharing what I know, but there are still some holes in my knowledge too. DocValues == Column Stride Fields (best resource I know of so far is Simon's preso from Lucene Rev 2011 - http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues ). It's pretty dense, but some nuggets I've gleaned from it are: 1) DocValues are more efficient in terms of memory usage and I/O performance for building an alternative to FieldCache (slide 27 is very impressive); 2) DocValues have a more efficient way to store primitive types, such as packed ints; 3) faster random access to stored values. In terms of switch-over, you have to re-index to change your fields to use DocValues on disk, which is why they are not enabled by default. Lastly, another goal of DocValues is to allow updates to a single field w/o re-indexing the entire doc. That's not implemented yet but I think still planned.
Cheers, Tim On Fri, Mar 29, 2013 at 9:31 AM, Jack Krupansky j...@basetechnology.com wrote: I’m still a little fuzzy on DocValues (maybe because I’m still grappling with how it does or doesn’t still relate to “Column Stride Fields”), so can anybody clue me in as to how useful DocValues is/are? Are DocValues simply an alternative to “stored fields”? If so, and if DocValues are so great, why aren’t we just switching Solr over to DocValues under the hood for all fields? And if there are “issues” with DocValues that would make such a complete switchover less than absolutely desired, what are those issues? In short, when should a user use DocValues over stored fields, and vice versa? As things stand, all we’ve done is make Solr more confusing than it was before, without improving its OOBE. OOBE should be job one in Solr. Thanks. P.S., And if I actually want to do Column Stride Fields, is there a way to do that? -- Jack Krupansky
Re: Solr 4.2 - Slave Index version is higher than Master
Something is really wrong with replication. Check the attached document, which has the screen shot. I re-indexed the master after adding new fields to the schema file (it's part of config file replication). The UI shows the master as gen '6', whereas in the slave's log the master gen is '7'. The attached document has the screenshot captured. Replication_Issue_4.2.docx http://lucene.472066.n3.nabble.com/file/n4052485/Replication_Issue_4.2.docx -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-Slave-Index-version-is-higher-than-Master-tp4049827p4052485.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too many fields to Sort in Solr
Joel, thanks for your excellent idea of using docValues. It's working exactly as you described. So far my unit test case has no issues and I see a low memory footprint. Will be sending the build for performance testing; that should give comparable numbers. Now I see another replication issue in 4.2; there is a thread on that. Thanks, Aditya -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4052486.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Basic auth on SolrCloud /admin/* calls
Agreed, we don't have clients hitting Solr directly; it is used like a backend database by intermediaries, similar to, say, MySQL. Although restricting access to Solr to fewer hosts is something, I still feel an application has no business being able to perform admin-level calls, at least in my use case. This is being very nitpicky, though. We also open Solr's port to monitoring servers that shouldn't have access to admin calls, and, thinking paranoid, a compromised app using a single collection could affect the entire cloud with admin call access. Seeing the long-term plan is to leave this feature at the container level (which is totally valid), I think I'll continue with the basic auth approach I attempted and see what I can dig up on past efforts. I'll be sure to share what I've done. Thanks Mark! Tim -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, March 29, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Re: Basic auth on SolrCloud /admin/* calls This has always been the case with Solr. Solr's security model is that clients should not have access to it - only trusted intermediaries should have access to it. Otherwise, it should be locked down at a higher level. That's been the case from day one and still is. That said, someone did do some work on internode basic auth a while back, but it didn't raise a ton of interest yet. - Mark On Mar 29, 2013, at 2:09 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote: Yes, I should have mentioned this is under 4.2 Solr. I sort of expected what I'm doing might be unsupported, but basically my concern is that under the current SOLR design, any client with connectivity to SOLR's port can perform Admin-level API calls like create/drop Cores or Collections. I'm only aiming for '/solr/admin/*' calls to separate Application access from the Administrative access logically, and not the non-admin calls like '/update', although you can cause damage with '/update', too.
I may try to patch the code to send Basic auth credentials on internal calls just for fun, but I'm thinking longer-term authentication should be implemented/added to the SOLR codebase (for at least admin calls) vs playing with security at the container level, and having the app inside the container aware of it. On the upside, in short testing I was able to get a Collection online using only Cores API curl calls w/ basic auth. Only the Collections API is affected, because the calls it issues to itself do not have auth. Cheers, Tim -Original Message- From: Isaac Hebsh [mailto:isaac.he...@gmail.com] Sent: Friday, March 29, 2013 12:37 AM To: solr-user@lucene.apache.org Subject: Re: Basic auth on SolrCloud /admin/* calls Hi Tim, Are you running Solr 4.2? (In 4.0 and 4.1, the Collections API didn't return any failure message; see the SOLR-4043 issue). As far as I know, you can't tell Solr to use authentication credentials when communicating with other nodes. It's a bigger issue.. for example, if you want to protect the /update requestHandler so unauthorized users won't delete your whole collection, it can interfere with the replication process. I think it's a necessary mechanism in a production environment... I'm curious how people use SolrCloud in production w/o it. On Fri, Mar 29, 2013 at 3:42 AM, Vaillancourt, Tim tvaillanco...@ea.com wrote: Hey guys, I've recently set up basic auth under Jetty 8 for all my Solr 4.x '/admin/*' calls, in order to protect my Collections and Cores API. Although the security constraint is working as expected ('/admin/*' calls require Basic Auth or return 401), when I use the Collections API to create a collection, I receive a 200 OK to the Collections API CREATE call, but the background Cores API calls that are run on the Collections API's behalf fail on the Basic Auth on other nodes with a 401 code, as I should have foreseen, but didn't.
Is there a way to tell SolrCloud to use authentication on internal Cores API calls that are spawned on Collections API's behalf, or is this a new feature request? To reproduce: 1. Implement basic auth on '/admin/*' URIs. 2. Perform a CREATE Collections API call to a node (which will return 200 OK). 3. Notice all Cores API calls fail (Collection isn't created). See stack trace below from the node that was issued the CREATE call. The stack trace I get is: org.apache.solr.common.SolrException: Server at http://<HOST HERE>:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:169) at
RE: Basic auth on SolrCloud /admin/* calls
Here we go: https://issues.apache.org/jira/browse/SOLR-4470 Tim -Original Message- From: Vaillancourt, Tim [mailto:tvaillanco...@ea.com] Sent: Friday, March 29, 2013 3:25 PM To: solr-user@lucene.apache.org Subject: RE: Basic auth on SolrCloud /admin/* calls Agreed, we don't have clients hitting Solr directly; it is used like a backend database by intermediaries, similar to, say, MySQL. Although restricting access to Solr to fewer hosts is something, I still feel an application has no business being able to perform admin-level calls, at least in my use case. This is being very nitpicky, though. We also open Solr's port to monitoring servers that shouldn't have access to admin calls, and, thinking paranoid, a compromised app using a single collection could affect the entire cloud with admin call access. Seeing the long-term plan is to leave this feature at the container level (which is totally valid), I think I'll continue with the basic auth approach I attempted and see what I can dig up on past efforts. I'll be sure to share what I've done. Thanks Mark! Tim -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, March 29, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Re: Basic auth on SolrCloud /admin/* calls This has always been the case with Solr. Solr's security model is that clients should not have access to it - only trusted intermediaries should have access to it. Otherwise, it should be locked down at a higher level. That's been the case from day one and still is. That said, someone did do some work on internode basic auth a while back, but it didn't raise a ton of interest yet. - Mark On Mar 29, 2013, at 2:09 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote: Yes, I should have mentioned this is under 4.2 Solr.
I sort of expected what I'm doing might be unsupported, but basically my concern is that under the current SOLR design, any client with connectivity to SOLR's port can perform Admin-level API calls like create/drop Cores or Collections. I'm only aiming for '/solr/admin/*' calls to separate Application access from the Administrative access logically, and not the non-admin calls like '/update', although you can cause damage with '/update', too. I may try to patch the code to send Basic auth credentials on internal calls just for fun, but I'm thinking longer-term authentication should be implemented/added to the SOLR codebase (for at least admin calls) vs playing with security at the container level, and having the app inside the container aware of it. On the upside, in short testing I was able to get a Collection online using only Cores API curl calls w/ basic auth. Only the Collections API is affected, because the calls it issues to itself do not have auth. Cheers, Tim -Original Message- From: Isaac Hebsh [mailto:isaac.he...@gmail.com] Sent: Friday, March 29, 2013 12:37 AM To: solr-user@lucene.apache.org Subject: Re: Basic auth on SolrCloud /admin/* calls Hi Tim, Are you running Solr 4.2? (In 4.0 and 4.1, the Collections API didn't return any failure message; see the SOLR-4043 issue). As far as I know, you can't tell Solr to use authentication credentials when communicating with other nodes. It's a bigger issue.. for example, if you want to protect the /update requestHandler so unauthorized users won't delete your whole collection, it can interfere with the replication process. I think it's a necessary mechanism in a production environment... I'm curious how people use SolrCloud in production w/o it. On Fri, Mar 29, 2013 at 3:42 AM, Vaillancourt, Tim tvaillanco...@ea.com wrote: Hey guys, I've recently set up basic auth under Jetty 8 for all my Solr 4.x '/admin/*' calls, in order to protect my Collections and Cores API.
Although the security constraint is working as expected ('/admin/*' calls require Basic Auth or return 401), when I use the Collections API to create a collection, I receive a 200 OK to the Collections API CREATE call, but the background Cores API calls that are run on the Collections API's behalf fail on the Basic Auth on other nodes with a 401 code, as I should have foreseen, but didn't. Is there a way to tell SolrCloud to use authentication on internal Cores API calls that are spawned on Collections API's behalf, or is this a new feature request? To reproduce: 1. Implement basic auth on '/admin/*' URIs. 2. Perform a CREATE Collections API call to a node (which will return 200 OK). 3. Notice all Cores API calls fail (Collection isn't created). See stack trace below from the node that was issued the CREATE call. The stack trace I get is: org.apache.solr.common.SolrException: Server at http://<HOST HERE>:8983/solr returned non ok status:401, message:Unauthorized at
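[Editor's note] For anyone experimenting with Tim's workaround of driving the Cores API directly via curl, the Basic auth header is just base64 of "user:password". A sketch follows; the solr/s3cret credentials and the CREATE URL parameters are made up for illustration:

```shell
# Build the header value that curl -u would send (hypothetical credentials)
AUTH=$(printf 'solr:s3cret' | base64)
echo "Authorization: Basic $AUTH"

# Against a live node you would then drive the Cores API directly, e.g.:
# curl -u solr:s3cret \
#   "http://localhost:8983/solr/admin/cores?action=CREATE&name=test_shard1_replica1&collection=test&shard=shard1"
```

This is exactly the header the internode Cores API calls are missing, which is why they come back 401.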
Re: Solr 4.2 - Slave Index version is higher than Master
That's pretty weird stuff. As a workaround, you might stop replicating your conf files - that takes a sketchier path at the moment. The key to solving this is to figure out how the heck the slave is increasing its gen… that should require a commit. In this case, *lots* of them. Commits that don't happen on the master. There should not be another way you can increase the gen… Can you share your full logs? - Mark On Mar 29, 2013, at 5:03 PM, adityab aditya_ba...@yahoo.com wrote: Something is really wrong with replication. Check the attached document, which has the screen shot. I re-indexed the master after adding new fields to the schema file (it's part of config file replication). The UI shows the master as gen '6', whereas in the slave's log the master gen is '7'. The attached document has the screenshot captured. Replication_Issue_4.2.docx http://lucene.472066.n3.nabble.com/file/n4052485/Replication_Issue_4.2.docx -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-Slave-Index-version-is-higher-than-Master-tp4049827p4052485.html Sent from the Solr - User mailing list archive at Nabble.com.
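[Editor's note] Mark's workaround (stop replicating conf files) amounts to removing or commenting out the confFiles entry from the master's replication handler. A sketch of the relevant solrconfig.xml section; the specific file list here is illustrative, not from adityab's config:

```xml
<!-- solrconfig.xml on the master (sketch) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <!-- remove this line to stop shipping config files to slaves -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```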
Re: Solr 4.2 - Slave Index version is higher than Master
@Mark attached are the full logs from both master and slave. Hope this might be some help. console_master.log http://lucene.472066.n3.nabble.com/file/n4052516/console_master.log console_slave.log http://lucene.472066.n3.nabble.com/file/n4052516/console_slave.log Ignore the mbeans call in master log. I have a program that pings the master every minute. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-Slave-Index-version-is-higher-than-Master-tp4049827p4052516.html Sent from the Solr - User mailing list archive at Nabble.com.
Getting better snippets in highlighting component
Hi all: I'm building a document search platform, basically indexing a lot of PDF files. Some of these files have an index, which means that when I query for normativos in my application (built using Symfony2+PHP+Solarium) I get a few results like this: 10 6.2 Elementos normativos generales 12 6.3 Elementos normativos técnicos ..32 ANEXOS A Formas verbales (normativo Which is a bit of a problem. Is there any way I can get rid of these dots? Is there any sort of relevance in the snippets that the highlighting component returns? I mean, in this particular case, the snippet came from the index page of the PDF, which I hardly think is the best snippet in the document for this particular query. Any thoughts on this? Is there any golden rule to treat cases like this? Thanks a lot! http://www.uci.cu
Re: Getting better snippets in highlighting component
It looks like a table of contents. The dots are followed by the page number, followed by the text from the next table of contents entry, and repeat. Even Google doesn't do anything special for this. For example, search for chapter 1 chapter 2 pdf: [PDF] 2013 Publication 505 - Internal Revenue Service www.irs.gov/pub/irs-pdf/p505.pdf File Format: PDF/Adobe Acrobat Mar 21, 2013 – Introduction . . . . . . . . . . . . . . . . . . 1. What's New for 2013 . . . . . . . . . . . . . 2. Reminders . . . . . . . . . . . . . . . . . . . 2. Chapter 1. Tax Withholding for ... I'm sure somebody can come up with a clever heuristic to avoid this kind of thing. Maybe simply truncate any sequence of whitespace and punctuation down to two or three characters or so. -- Jack Krupansky -Original Message- From: Jorge Luis Betancourt Gonzalez Sent: Friday, March 29, 2013 10:34 PM To: solr-user@lucene.apache.org Subject: Getting better snippets in highlighting component Hi all: I'm building a document search platform, basically indexing a lot of PDF files. Some of these files have an index, which means that when I query for normativos in my application (built using Symfony2+PHP+Solarium) I get a few results like this: 10 6.2 Elementos normativos generales 12 6.3 Elementos normativos técnicos ..32 ANEXOS A Formas verbales (normativo Which is a bit of a problem. Is there any way I can get rid of these dots? Is there any sort of relevance in the snippets that the highlighting component returns? I mean, in this particular case, the snippet came from the index page of the PDF, which I hardly think is the best snippet in the document for this particular query. Any thoughts on this? Is there any golden rule to treat cases like this? Thanks a lot! http://www.uci.cu
Re: Getting better snippets in highlighting component
Hi Jack: Thanks for the reply. Exactly, I know it's common to encounter this kind of TOC in a lot of files. I'm playing with the regex fragmenter to be a little more selective about the generated snippets, but no luck so far. -Original Message- From: Jack Krupansky j...@basetechnology.com To: solr-user@lucene.apache.org Sent: Saturday, March 30, 2013 0:40:03 Subject: Re: Getting better snippets in highlighting component It looks like a table of contents. The dots are followed by the page number, followed by the text from the next table of contents entry, and repeat. Even Google doesn't do anything special for this. For example, search for chapter 1 chapter 2 pdf: [PDF] 2013 Publication 505 - Internal Revenue Service www.irs.gov/pub/irs-pdf/p505.pdf File Format: PDF/Adobe Acrobat Mar 21, 2013 – Introduction . . . . . . . . . . . . . . . . . . 1. What's New for 2013 . . . . . . . . . . . . . 2. Reminders . . . . . . . . . . . . . . . . . . . 2. Chapter 1. Tax Withholding for ... I'm sure somebody can come up with a clever heuristic to avoid this kind of thing. Maybe simply truncate any sequence of whitespace and punctuation down to two or three characters or so. -- Jack Krupansky -Original Message- From: Jorge Luis Betancourt Gonzalez Sent: Friday, March 29, 2013 10:34 PM To: solr-user@lucene.apache.org Subject: Getting better snippets in highlighting component Hi all: I'm building a document search platform, basically indexing a lot of PDF files. Some of these files have an index, which means that when I query for normativos in my application (built using Symfony2+PHP+Solarium) I get a few results like this: 10 6.2 Elementos normativos generales 12 6.3 Elementos normativos técnicos ..32 ANEXOS A Formas verbales (normativo Which is a bit of a problem. Is there any way I can get rid of these dots? Is there any sort of relevance in the snippets that the highlighting component returns?
I mean, in this particular case, the snippet came from the index page of the PDF, which I hardly think is the best snippet in the document for this particular query. Any thoughts on this? Is there any golden rule to treat cases like this? Thanks a lot! http://www.uci.cu
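[Editor's note] The regex fragmenter Jorge mentions is selected per request via hl.fragmenter=regex, or via defaults on the request handler. A sketch of handler defaults that bias fragments toward sentence-like text; the pattern shown is the commonly documented default for hl.regex.pattern and is a starting point to tune, not a known fix for TOC noise:

```xml
<!-- solrconfig.xml request handler defaults (sketch) -->
<lst name="defaults">
  <str name="hl">true</str>
  <str name="hl.fragmenter">regex</str>
  <!-- prefer fragments that look like sentences rather than dot leaders -->
  <str name="hl.regex.pattern">[-\w ,/\n&quot;']{20,200}</str>
  <str name="hl.regex.slop">0.5</str>
</lst>
```

Tightening the character class (e.g. disallowing runs of '.') is one way to steer the fragmenter away from dot-leader TOC lines.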