Re: Queries on SynonymFilterFactory
I've managed to run the synonyms with 10 different synonym files. Each synonym file is 1MB in size and consists of about 1,000 tokens, and each token has about 40-50 words. These files are more extreme than what I'll probably use in the real environment; they're just for testing purposes. The QTime is about 100-200, compared to about 50 for a collection without synonyms configured. Is this timing considered fast or slow? Although the synonym files are big, there are not that many documents in my collection yet. I'm just afraid performance will be affected as more documents are indexed. Regards, Edwin

On 9 May 2015 00:14, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Thank you for your suggestions. I can't do proper testing on that yet as I'm currently using a normal PC with 4GB of RAM, and all of this probably requires more RAM than what I have. I've tried running the setup with 20 synonym files, and the system went Out of Memory before I could test anything. For your option 2), do you mean that I'll need to download a synonym database (like the one over 20MB in size which I have) and index it into an ad hoc Solr core to manage it? I can probably only try this out properly when I get a server machine with more RAM. Regards, Edwin

On 8 May 2015 at 22:16, Alessandro Benedetti benedetti.ale...@gmail.com wrote: This is quite a big synonym corpus! If it's not feasible to have only 1 big synonym file (I haven't checked, so I assume the 1MB limit is true, even if strange) I would do an experiment: 1) test query time with a classic Solr config 2) use an ad hoc Solr core to manage synonyms (in this way we can keep it updated and use it with a custom version of the synonym filter that gets the synonyms directly from another Solr instance) 2b) develop a Solr plugin to provide this approach. If the synonym thesaurus is really big, I guess managing it locally through another Solr core (or something similar) will be better than managing it with an external web service. Cheers

2015-05-08 12:16 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: So it means having more than 10 or 20 synonym files locally will still be faster than accessing an external service? As I found out that ZooKeeper only allows the synonyms.txt file to be a maximum of 1MB, and as my potential synonym file is more than 20MB, I'll need to split the file into more than 20 of them. Regards, Edwin -- -- Benedetti Alessandro Visiting card: http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
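Since the plan is to split one large synonyms file into pieces under ZooKeeper's 1MB znode limit, a small helper can do the split on line boundaries so no synonym rule is cut in half. This is a sketch, not anything from the thread; the 1MB constant and the idea of listing the resulting files comma-separated in the filter's `synonyms` attribute are assumptions to verify against your Solr version.

```python
# Hypothetical helper: split a large synonyms.txt into chunks that each
# stay under ZooKeeper's default 1MB znode limit, breaking only on line
# boundaries so no synonym rule is split across files.
MAX_BYTES = 1024 * 1024  # assumed ZooKeeper znode size limit from the thread

def split_synonyms(lines, max_bytes=MAX_BYTES):
    """Group synonym lines into chunks whose UTF-8 size stays under max_bytes."""
    chunks, current, size = [], [], 0
    for line in lines:
        encoded = len(line.encode("utf-8")) + 1  # +1 for the newline
        if current and size + encoded > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += encoded
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be written out as e.g. `synonyms_00.txt`, `synonyms_01.txt`, and so on, and the file names listed in the SynonymFilterFactory configuration.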
Re: How to get the docs id after commit
Sorry. "The newest" means all the docs in the last commit; I need to get the ids of these docs to trigger another server to do something. -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 10 May 2015 23:22 To: solr-user@lucene.apache.org Subject: Re: How to get the docs id after commit Not really. It's an ambiguous thing though: what's a "newest" document when a whole batch is committed at once? And in distributed mode, you can fire docs to any node in the cloud and they'll get to the right shard, but order is not guaranteed, so "newest" is a fuzzy concept. I'd put a counter in my docs that I guaranteed was increasing and just use q=*:*&rows=1&sort=timestamp desc. That should give you the most recent doc. Beware using a timestamp though if you're not absolutely sure that the clock times you use are comparable! Best, Erick On Sun, May 10, 2015 at 12:57 AM, liwen(李文).apabi l@founder.com.cn wrote: Hi, Solr Developers, I want to get the newest committed docs in the postCommit event, then notify the other server which data can be used, but I cannot find any way to get the newest docs after commit, so is there any way to do this? Thank you. Wen Li
Unable to identify why faceting is taking so much time
I am trying to facet over some data. My query is: http://localhost:9020/search/p1-umShard-1/select?q=*:*&fq=(msgType:38+AND+snCreatedTime:[2015-04-15T00:00:00Z%20TO%20*])&debug=timing&wt=json&rows=0

{
  "responseHeader": { "status": 0, "QTime": 45 },
  "response": { "numFound": 137, "start": 0, "docs": [] },
  "debug": {
    "timing": {
      "time": 45,
      "prepare": {
        "time": 0,
        "query": { "time": 0 }, "facet": { "time": 0 }, "mlt": { "time": 0 },
        "highlight": { "time": 0 }, "stats": { "time": 0 }, "debug": { "time": 0 }
      },
      "process": {
        "time": 45,
        "query": { "time": 45 }, "facet": { "time": 0 }, "mlt": { "time": 0 },
        "highlight": { "time": 0 }, "stats": { "time": 0 }, "debug": { "time": 0 }
      }
    }
  }
}

According to this there are 137 records. Now I am faceting over these 137 records with facet.method=fc. Ideally it should just iterate over these 137 records and sum up the facets. The facet query is: http://localhost:9020/search/p1-umShard-1/select?q=*:*&fq=(msgType:38+AND+snCreatedTime:[2015-04-15T00:00:00Z%20TO%20*])&facet.field=conversationId&facet=true&indent=on&wt=json&rows=0&facet.method=fc&debug=timing

{
  "responseHeader": { "status": 0, "QTime": 395103 },
  "response": { "numFound": 137, "start": 0, "docs": [] },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "conversationId": [
        "t_mid.1429800181915:43409a654f429a7279", 14,
        "t_mid.1430066755916:3f1df73a90f3f56b24", 12,
        "t_mid.1424867675391:7a0ce173662f6b3230", 10,
        "t_mid.1429264970537:d53579af6852fdd409", 8,
        "t_mid.1429968009539:ad97aa3fcfc933ac32", 6,
        "t_mid.1429076620603:cf8c8da6cc7c0f7a40", 5,
        "t_mid.1429967431080:6f1037c42bc6d10921", 4,
        "t_mid.1430335716379:e8d2d7390c6d999689", 4,
        "t_mid.1430591984365:9c66f4b3f67a973193", 4,
        "t_mid.1431105168474:f5d294b79df5e97a26", 4,
        "t_id.539747739369904", 3,
        "t_mid.1423253619046:ef3da504f704e12448", 3,
        "t_mid.1424454328414:91f82976dc8196e034", 3,
        "t_mid.1429967443439:dacb57b0f96b00cb63", 3,
        "t_mid.1430734315969:e5002ecd489b51cc19", 3,
        "t_mid.1423229143533:71f3dd0f3714f44232", 2,
        "t_mid.1429076490131:87feb49fa82041dd77", 2,
        "t_mid.1429080523489:00a85a2b07980c9a19", 2,
        "t_mid.1429913551113:5870b4366960dc5c10", 2,
        "t_mid.1429917749072:7cbdaf3d8c2d15ef78", 2,
        "t_mid.1429966041997:616561349e22cb7001", 2,
        "t_mid.1429968203236:bcd0c539ae66947618", 2,
        "t_mid.1429982604402:6e509023526a0f5b09", 2,
        "t_mid.1430475210140:8a963390e62e26f497", 2,
        "t_mid.1430746574833:59b08895c5287a2998", 2,
        "t_mid.1423229237215:d03fb607be18b2d089", 1,
        "t_mid.1423256045556:63089c5cc77c800113", 1,
        "t_mid.1426870505993:a5b69b271bea481730", 1,
        "t_mid.1428776595760:d5ebc1f3b922952e41", 1,
        "t_mid.1429079296566:f9f0e4c24071e55444", 1,
        "t_mid.1429315090481:9b7d59d6d483999d57", 1,
        "t_mid.1429498786426:04f58597d3f5461330", 1,
        "t_mid.1429878261810:4bdc3e6442db876c21", 1,
        "t_mid.1429906605359:0f89faf08295015957", 1,
        "t_mid.1429915168615:365578d261795d6140", 1,
        "t_mid.1429968022645:2a362d85be63c2ab95", 1,
        "t_mid.1429968121564:2effeb664562bd9b26", 1,
        "t_mid.1429969582192:5aca482f37dca9d843", 1,
        "t_mid.1429977290539:4e77d3d821bf0f4776", 1,
Re: Solr in different locations
SolrCloud is not for multi-region distribution; the latency will kill you. You may find it useful to review Apple's presentation from the last Solr Revolution, where they discussed a bi-directional message queue for a similar setup: https://youtu.be/_Erkln5WWLw?list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 10 May 2015 at 22:32, Moshe Recanati mos...@kmslh.com wrote: Hi, We would like to have a system that will run in different regions with the same Solr index. These are the regions: ... I would like to know what is the best practice to implement this. If it is by implementing SolrCloud, please share some basic guidelines on how to enable and configure it. Thank you, Regards, Moshe Recanati
Re: Solr Multilingual Indexing with one field- Guidance
On 8 May 2015 at 04:23, Kuntal Ganguly gangulykuntal1...@gmail.com wrote: Please provide some guidance. This question comes up a lot on the list and has been discussed multiple times. Did you try searching the mailing list for past discussions? E.g. something like: http://search-lucene.com/?q=multilingual+indexing+single+field&fc_project=Solr&fc_type=mail+_hash_+user Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/
Re: Re: How to get the docs id after commit
Not something really built into Solr. It's easy enough, at least conceptually, to build in a batch_id. The idea here would be that every doc in each batch would have a unique id (really, something you changed after each commit). That pretty much requires, though, that you control the indexing carefully (we're probably talking SolrJ here). There's no good way that I know of to get this info after an autocommit, for instance. I suppose you could use a TimestampUpdateProcessorFactory and keep high water marks so a query like q=timestamp:[last_timestamp_I_checked TO most_recent_timestamp] would do it. Even that, though, has some issues in SolrCloud because each server's time may be slightly off. You can get around this by placing the TimestampUpdateProcessorFactory in _front_ of the distributed update processor in your update chain, but then you'd really require that all updates be sent to the _same_ machine, or that the commit intervals were guaranteed to be outside the clock skew on your machines. Bottom line is that you'd have to build it yourself; there's no OOB functionality here. Even "all the docs that last committed" is ambiguous. What about autocommits? Does "last committed" mean _just_ the ones between the last two autocommits? It seems like you really want "all the docs committed since the last time I asked". And for that, you really need to control the mechanism yourself. Not only does Solr not provide this OOB, I'm not even sure how it could be implemented in the general case unless Solr became transactional. Best, Erick

On Sun, May 10, 2015 at 5:38 PM, liwen(李文).apabi l@founder.com.cn wrote: Sorry. "The newest" means all the docs in the last commit; I need to get the ids of these docs to trigger another server to do something. -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 10 May 2015 23:22 To: solr-user@lucene.apache.org Subject: Re: How to get the docs id after commit Not really. It's an ambiguous thing though: what's a "newest" document when a whole batch is committed at once? And in distributed mode, you can fire docs to any node in the cloud and they'll get to the right shard, but order is not guaranteed, so "newest" is a fuzzy concept. I'd put a counter in my docs that I guaranteed was increasing and just use q=*:*&rows=1&sort=timestamp desc. That should give you the most recent doc. Beware using a timestamp though if you're not absolutely sure that the clock times you use are comparable! Best, Erick On Sun, May 10, 2015 at 12:57 AM, liwen(李文).apabi l@founder.com.cn wrote: Hi, Solr Developers, I want to get the newest committed docs in the postCommit event, then notify the other server which data can be used, but I cannot find any way to get the newest docs after commit, so is there any way to do this? Thank you. Wen Li
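As a sketch of the update chain Erick describes, the timestamp processor can be placed ahead of the distributed processor so the value is assigned once on the receiving node and replicated as-is. The chain name and the field name `timestamp` are assumptions for illustration; the field must exist in the schema as a date field, and the caveats above about sending all updates to the same machine still apply.

```xml
<!-- Hedged sketch: assign the timestamp BEFORE DistributedUpdateProcessorFactory
     so every replica stores the same value. "add-timestamp" and the field
     name "timestamp" are illustrative, not from the thread. -->
<updateRequestProcessorChain name="add-timestamp">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">timestamp</str>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The high-water-mark query would then be q=timestamp:[last_checked TO most_recent] against this field, with the clock-skew caveats Erick notes.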
Solr in different locations
Hi, We would like to have a system that will run in different regions with the same Solr index. These are the regions: 1. Europe 2. Singapore 3. US I would like to know what is the best practice to implement this. If it is by implementing SolrCloud, please share some basic guidelines on how to enable and configure it. Thank you, Regards, Moshe Recanati SVP Engineering Office +972-73-2617564 Mobile +972-52-6194481 Skype: recanati http://finance.yahoo.com/news/kms-lighthouse-named-gartner-cool-121000184.html More at: www.kmslh.com | LinkedIn: http://www.linkedin.com/company/kms-lighthouse | FB: https://www.facebook.com/pages/KMS-lighthouse/123774257810917
Re: JSON Facet Analytics API in Solr 5.1
Thank you, Yonik! Looks cool to me. The only problem is it is not working for me. I see you have "cats" and "cat" in your URL. "cat" must be a field name. What is "cats"? We are doing a POC with facet count ascending. Your help is really important to us.

On Sat, May 9, 2015 at 8:05 AM, Yonik Seeley ysee...@gmail.com wrote: curl -g "http://localhost:8983/solr/techproducts/query?q=*:*&json.facet={cats:{terms:{field:cat,sort:'count+asc'}}}" Using curl with everything in the URL is definitely trickier. Everything needs to be URL escaped. If it's not, curl will often silently do nothing. For example, when I had sort:'count asc', the command above would do nothing. When I remembered to URL encode the space as a +, it started working. It's definitely easier to use -d with curl... curl "http://localhost:8983/solr/techproducts/query" -d 'q=*:*&json.facet={cats:{terms:{field:cat,sort:"count asc"}}}' That also allows you to format it nicer for reading as well: curl "http://localhost:8983/solr/techproducts/query" -d 'q=*:*&json.facet=
{cats:{terms:{
  field:cat,
  sort:"count asc"
}}}' -Yonik

On Thu, May 7, 2015 at 5:32 PM, Frank li fudon...@gmail.com wrote: This one does not have a problem, but how do I include sort in this facet query? Basically, I want to write a Solr query which can sort the facet count ascending. Something like http://localhost:8983/solr/demo/query?q=apple&json.facet={field=price sort='count asc'} I really appreciate your help. Frank

On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote: Hi Yonik, I am reading your blog. It is helpful. One question for you, for the following example: curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&json.facet={ categories:{ type : terms, field : cat, sort : { x : desc }, facet:{ x : avg(price), y : sum(price) } } }' If I want to write it in the format of this: http://localhost:8983/solr/query?q=apple&json.facet={x:'avg(campaign_ult_defendant_cnt_is)'}, how do I do it? What problems do you encounter when you try that? If you try that URL with curl, be aware that curly braces {} are special globbing characters in curl. Turn them off with the -g option: curl -g "http://localhost:8983/solr/demo/query?q=apple&json.facet={x:'avg(price)'}" -Yonik
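Yonik's encoding point can also be sidestepped by building the URL programmatically instead of hand-encoding it. A minimal sketch, reusing the host, core, and json.facet body from his example (adapt these to your own setup):

```python
# Sketch: build the JSON Facet request URL with the json.facet parameter
# properly URL-encoded, avoiding the curl globbing/escaping pitfalls
# described in the thread. Host and core are the thread's examples.
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "rows": 0,
    "json.facet": "{cats:{terms:{field:cat,sort:'count asc'}}}",
}
url = "http://localhost:8983/solr/techproducts/query?" + urlencode(params)
# Braces become %7B/%7D and the space in 'count asc' becomes +,
# so the resulting URL is safe to pass to curl without -g.
```

The encoded URL can then be used directly; the space-to-`+` conversion is exactly the fix Yonik describes doing by hand.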
Re: JSON Facet Analytics API in Solr 5.1
Here is our Solr query: http://qa-solr:8080/solr/select?q=type:PortalCase&json.facet={categories:{terms:{field:campaign_id_ls,sort:%27count+asc%27}}}&rows=0 I replaced cats with categories. It is still not working.

On Sun, May 10, 2015 at 12:10 AM, Frank li fudon...@gmail.com wrote: Thank you, Yonik! Looks cool to me. The only problem is it is not working for me. I see you have "cats" and "cat" in your URL. "cat" must be a field name. What is "cats"? We are doing a POC with facet count ascending. Your help is really important to us. On Sat, May 9, 2015 at 8:05 AM, Yonik Seeley ysee...@gmail.com wrote: curl -g "http://localhost:8983/solr/techproducts/query?q=*:*&json.facet={cats:{terms:{field:cat,sort:'count+asc'}}}" Using curl with everything in the URL is definitely trickier. Everything needs to be URL escaped. If it's not, curl will often silently do nothing. For example, when I had sort:'count asc', the command above would do nothing. When I remembered to URL encode the space as a +, it started working. It's definitely easier to use -d with curl... curl "http://localhost:8983/solr/techproducts/query" -d 'q=*:*&json.facet={cats:{terms:{field:cat,sort:"count asc"}}}' That also allows you to format it nicer for reading as well: curl "http://localhost:8983/solr/techproducts/query" -d 'q=*:*&json.facet=
{cats:{terms:{
  field:cat,
  sort:"count asc"
}}}' -Yonik On Thu, May 7, 2015 at 5:32 PM, Frank li fudon...@gmail.com wrote: This one does not have a problem, but how do I include sort in this facet query? Basically, I want to write a Solr query which can sort the facet count ascending. Something like http://localhost:8983/solr/demo/query?q=apple&json.facet={field=price sort='count asc'} I really appreciate your help. Frank On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote: Hi Yonik, I am reading your blog. It is helpful. One question for you, for the following example: curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&json.facet={ categories:{ type : terms, field : cat, sort : { x : desc }, facet:{ x : avg(price), y : sum(price) } } }' If I want to write it in the format of this: http://localhost:8983/solr/query?q=apple&json.facet={x:'avg(campaign_ult_defendant_cnt_is)'}, how do I do it? What problems do you encounter when you try that? If you try that URL with curl, be aware that curly braces {} are special globbing characters in curl. Turn them off with the -g option: curl -g "http://localhost:8983/solr/demo/query?q=apple&json.facet={x:'avg(price)'}" -Yonik
How to get the docs id after commit
Hi, Solr Developers, I want to get the newest committed docs in the postCommit event, then notify the other server which data can be used, but I cannot find any way to get the newest docs after commit, so is there any way to do this? Thank you. Wen Li
Re: JSON Facet Analytics API in Solr 5.1
I figured it out now. It works. "cats" is just a name, right? It does not matter what is used. Really appreciate your help. This is going to be really useful. I meant json.facet.

On Sun, May 10, 2015 at 12:13 AM, Frank li fudon...@gmail.com wrote: Here is our Solr query: http://qa-solr:8080/solr/select?q=type:PortalCase&json.facet={categories:{terms:{field:campaign_id_ls,sort:%27count+asc%27}}}&rows=0 I replaced cats with categories. It is still not working. On Sun, May 10, 2015 at 12:10 AM, Frank li fudon...@gmail.com wrote: Thank you, Yonik! Looks cool to me. The only problem is it is not working for me. I see you have "cats" and "cat" in your URL. "cat" must be a field name. What is "cats"? We are doing a POC with facet count ascending. Your help is really important to us. On Sat, May 9, 2015 at 8:05 AM, Yonik Seeley ysee...@gmail.com wrote: curl -g "http://localhost:8983/solr/techproducts/query?q=*:*&json.facet={cats:{terms:{field:cat,sort:'count+asc'}}}" Using curl with everything in the URL is definitely trickier. Everything needs to be URL escaped. If it's not, curl will often silently do nothing. For example, when I had sort:'count asc', the command above would do nothing. When I remembered to URL encode the space as a +, it started working. It's definitely easier to use -d with curl... curl "http://localhost:8983/solr/techproducts/query" -d 'q=*:*&json.facet={cats:{terms:{field:cat,sort:"count asc"}}}' That also allows you to format it nicer for reading as well: curl "http://localhost:8983/solr/techproducts/query" -d 'q=*:*&json.facet=
{cats:{terms:{
  field:cat,
  sort:"count asc"
}}}' -Yonik On Thu, May 7, 2015 at 5:32 PM, Frank li fudon...@gmail.com wrote: This one does not have a problem, but how do I include sort in this facet query? Basically, I want to write a Solr query which can sort the facet count ascending. Something like http://localhost:8983/solr/demo/query?q=apple&json.facet={field=price sort='count asc'} I really appreciate your help. Frank On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote: Hi Yonik, I am reading your blog. It is helpful. One question for you, for the following example: curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&json.facet={ categories:{ type : terms, field : cat, sort : { x : desc }, facet:{ x : avg(price), y : sum(price) } } }' If I want to write it in the format of this: http://localhost:8983/solr/query?q=apple&json.facet={x:'avg(campaign_ult_defendant_cnt_is)'}, how do I do it? What problems do you encounter when you try that? If you try that URL with curl, be aware that curly braces {} are special globbing characters in curl. Turn them off with the -g option: curl -g "http://localhost:8983/solr/demo/query?q=apple&json.facet={x:'avg(price)'}" -Yonik
Re: How to get the docs id after commit
Not really. It's an ambiguous thing though: what's a "newest" document when a whole batch is committed at once? And in distributed mode, you can fire docs to any node in the cloud and they'll get to the right shard, but order is not guaranteed, so "newest" is a fuzzy concept. I'd put a counter in my docs that I guaranteed was increasing and just use q=*:*&rows=1&sort=timestamp desc. That should give you the most recent doc. Beware using a timestamp though if you're not absolutely sure that the clock times you use are comparable! Best, Erick On Sun, May 10, 2015 at 12:57 AM, liwen(李文).apabi l@founder.com.cn wrote: Hi, Solr Developers, I want to get the newest committed docs in the postCommit event, then notify the other server which data can be used, but I cannot find any way to get the newest docs after commit, so is there any way to do this? Thank you. Wen Li
Re: Solr in different locations
This question is much too broad to answer. What progress have you made so far? What have you tried? What problems have you encountered? You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Sun, May 10, 2015 at 5:32 AM, Moshe Recanati mos...@kmslh.com wrote: Hi, We would like to have a system that will run in different regions with the same Solr index. These are the regions: 1. Europe 2. Singapore 3. US I would like to know what is the best practice to implement this. If it is by implementing SolrCloud, please share some basic guidelines on how to enable and configure it. Thank you, Regards, Moshe Recanati SVP Engineering Office +972-73-2617564 Mobile +972-52-6194481 Skype: recanati More at: www.kmslh.com | LinkedIn: http://www.linkedin.com/company/kms-lighthouse | FB: https://www.facebook.com/pages/KMS-lighthouse/123774257810917
Re: New article on ZK Poison Packet
Cool stuff - thanks for sharing! Siegfried Goeschl On 09 May 2015, at 08:43, steve sc_shep...@hotmail.com wrote: While very technical and unusual, a very interesting view of the world of Linux and ZooKeeper clusters... http://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/