Re: How can i search site name
Sorry. Please let me know how I can search the site name using the Solr query syntax. My results should show title, url and content. Title and content are being searched even though the schema has <defaultSearchField>content</defaultSearchField>. I need url or site name too. Please help. Thanks in advance. On Tue, May 22, 2012 at 11:05 AM, ketan kore ketankore...@gmail.com wrote: you can go on www.google.com and just type the site which you want to search, and google will show you the results, as simple as that ...
Re: How can i search site name
You should define your search first. If the site is www.google.com, how do you match it: full string matching or partial matching? E.g., should "google" match? If it does, you should write your own analyzer for this field. On Tue, May 22, 2012 at 2:03 PM, Shameema Umer shem...@gmail.com wrote: Sorry. Please let me know how I can search the site name using the Solr query syntax. ...
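For partial matching, a minimal sketch of such an analyzer in schema.xml could look like the following (the field and type names here are illustrative, not from this thread); the LetterTokenizer splits www.google.com into www / google / com so each part becomes searchable:

<fieldType name="url_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split the URL on non-letter characters: www.google.com -> www, google, com -->
    <tokenizer class="solr.LetterTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="site" type="url_text" indexed="true" stored="true"/>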
Re: Indexing Searching MySQL table with Hindi and English data
Hi, Thank you so much for replying. The MySQL database server is running on a Fedora Core 12 machine with Hindi language support enabled. Details of the database are: ENGINE=MyISAM and DEFAULT CHARSET=utf8. Data is imported using the Solr DataImportHandler (mysql jdbc driver). In the schema.xml file the title field is defined as: <field name="title" type="text_general" indexed="true" stored="true"/> I tried saving the query results directly to a text file from the MySQL command prompt but it is not storing the results correctly. The file contains the following characters: à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥8dà ¤Åà ¤¾ Saur oorja The first line of the data-config.xml is <?xml version="1.0" encoding="UTF-8"?> Please suggest what I have to do to solve this issue. Regards, Sanjailal KP On 5/21/12, Jack Krupansky j...@basetechnology.com wrote: Is it possible that your text editor/display does not support UTF-8 encoding? Assuming the data is properly encoded, do you have the encoding="UTF-8" attribute in your DIH dataSource tag? -- Jack Krupansky On Sun, May 20, 2012 at 6:59 AM, Lance Norskog goks...@gmail.com wrote: Also, try saving data from a query into a file and verify that it is UTF-8 and the characters are correct. On Fri, May 18, 2012 at 7:54 AM, Jack Krupansky j...@basetechnology.com wrote: Check the analyzers for the field types containing Hindi text to be sure that they are not using a character mapping or folding filter that might mangle the Hindi characters. Post the field type, say for the title field. Also, try manually (using curl or the post jar) adding a single document that has Hindi data and see if that works. -- Jack Krupansky -Original Message- From: KP Sanjailal Sent: Thursday, May 17, 2012 5:55 AM To: solr-user@lucene.apache.org Subject: Indexing Searching MySQL table with Hindi and English data Hi, I tried to set up indexing of MySQL tables in Apache Solr 3.6. Everything works fine, but text in Hindi script (only some 10% of total records) is not getting indexed properly. A search with a keyword in Hindi retrieves an empty result set. Also, a retrieved Hindi record displays junk characters. The database tables contain bibliographical details of books such as title, author, publisher, isbn, publishing place, series etc., and about 10% of the records contain text in Hindi in the title, author and publisher fields. Example: *Search Results from MySQL using PHP* 1. http://192.168.0.132/shared/biblio_view.php?bibid=26913&tab=opac *Title:* सौर ऊर्जा Saur oorja *Author(s):* विनोद कुमार मिश्र MISHRA (VK) *Material:* Books *Search Results from Apache Solr (searched using keyword in English)* 1. http://192.168.0.132/test/biblio_view.php?bibid=26913&tab=opac *Title:* सौर ऊरॠजा Saur oorja *Author(s):* विनोद कॠमार मिशॠर MISHRA (VK) *Material:* Books How do I go about solving this language problem? Thanks in advance. K. P. Sanjailal -- -- Lance Norskog goks...@gmail.com
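Regarding the dataSource tag Jack mentions: for a JDBC source the encoding is usually forced through the MySQL connection URL rather than an encoding attribute (which applies to file-based sources). A sketch of the relevant data-config.xml line, with placeholder connection details:

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/library?useUnicode=true&amp;characterEncoding=UTF-8"
            user="user" password="pass"/>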
Re: Indexing Searching MySQL table with Hindi and English data
There are many steps that can go wrong. Your platform should have UTF-8 as its default encoding; Windows and MacOS don't do this. I had to configure Chrome to use UTF-8 as its default display encoding. Also, if you use Tomcat, it has to be configured for UTF-8: http://wiki.apache.org/solr/SolrTomcat The characters you posted are not transferring correctly. I think you need to decode them using one of the online unicode utility pages. On Mon, May 21, 2012 at 4:57 AM, Jack Krupansky j...@basetechnology.com wrote: Is it possible that your text editor/display does not support UTF-8 encoding? ... -- Lance Norskog goks...@gmail.com
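The Tomcat setting Lance refers to is the URIEncoding attribute on the HTTP connector in conf/server.xml; a sketch following the SolrTomcat wiki page (ports shown are the stock defaults):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>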
Re: Indexing Searching MySQL table with Hindi and English data
Hi, I have already configured the Tomcat instance as per the link http://wiki.apache.org/solr/SolrTomcat for the URI charset config. The necessary updates have been made in Tomcat's conf/server.xml with URIEncoding="UTF-8". Thank you for your reply. Sanjailal KP -- On 5/22/12, Lance Norskog goks...@gmail.com wrote: There are many steps that can go wrong. ... -- Lance Norskog goks...@gmail.com
System requirements in my case?
Dear Solr users, My company would like to use Solr to index around 80,000,000 documents (XML files of around 5-10 KB each). My program (robot) will connect to this Solr with boolean requests. Number of users: around 1000. Number of requests by user and by day: 300. Number of users by day: 30. I would like to subscribe to a host provider with this configuration: - Dedicated Server - Ubuntu - Intel Xeon i7, 2 x 2.66+ GHz, 12 GB RAM, 2 x 1500 GB disks - Unlimited bandwidth - Fixed IP Do you think this configuration is enough? Thanks for your info, Sincerely Bruno
Re: Question about sampling
My mistake - I did not check whether the data above is stored as strings. The hashcode has to be stored as a string for this trick to work. On Sun, May 20, 2012 at 8:25 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I'd be curious about this, too! I suspect the answer is: not doable, patches welcome. :) But I'd love to be wrong! Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: Yuval Dotan yuvaldo...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Wednesday, May 16, 2012 9:43 AM Subject: Question about sampling Hi Guys, We have an environment containing billions of documents. Faceting over this large result set can take many seconds, so we thought we might be able to use statistical sampling of a smaller result set from the facet, and give an approximate result much quicker. Is there any way to facet only a random sample of the results? Thanks Yuval -- Lance Norskog goks...@gmail.com
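One way to approximate this today (an assumption on my part, not necessarily the trick Lance has in mind) is to index a random integer bucket per document - say a hypothetical sample_bucket field in 0-99 - then filter on one bucket and scale the facet counts:

# facet over roughly 1% of the corpus, then multiply the counts by ~100
http://localhost:8983/solr/select?q=*:*&fq=sample_bucket:7&facet=true&facet.field=category&rows=0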
Re: Indexing Searching MySQL table with Hindi and English data
On 22 May 2012 12:07, KP Sanjailal kpsanjai...@gmail.com wrote: Hi, Thank you so much for replying. The MySQL database server is running on a Fedora Core 12 machine with Hindi language support enabled. Details of the database are: ENGINE=MyISAM and DEFAULT CHARSET=utf8. Data is imported using the Solr DataImportHandler (mysql jdbc driver). In the schema.xml file the title field is defined as: <field name="title" type="text_general" indexed="true" stored="true"/> Please show us your schema.xml and the configuration file for the DataImportHandler (you might wish to obscure sensitive details like username/password). Have you tried the SELECT in the DIH configuration outside of Solr? Is it producing proper UTF-8? Regards, Gora
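A quick sketch of testing the SELECT outside Solr from a shell (the table and column names are placeholders); forcing the client charset keeps the terminal from mangling the UTF-8 on the way out:

mysql --default-character-set=utf8 -u user -p library \
      -e "SELECT title FROM biblio WHERE bibid = 26913" > /tmp/title.txt
file /tmp/title.txt    # should report: UTF-8 Unicode text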
fsv=true not returning sort_values for distributed searches
We use fsv=true to help debug sorting, which works great for non-distributed searches. However, it's not working (no sort_values in the response) for multi-shard queries. Any idea how to get this fixed? thanks, XJ
Strategy for maintaining De-normalized indexes
Hi, I have a very basic question and hopefully there is a simple answer to it. We are trying to index a simple product catalog which has master products and child products. Each master product can have multiple child products. A master product can be assigned one or more product categories. Now, we need to be able to show counts of categories based on the number of child products in each category. We have indexed data using a join, selecting appropriate values for the index from each table. This is basically a de-normalized result set. It works perfectly for our search purposes. However, maintaining the index and keeping it up to date is an issue. Whenever a product master is updated with a new category, we will need to delete all the index entries for its child products and insert them again. This seems like a lot of activity for a regular ongoing operation, i.e. product category updates. Since join between schemas is only available in 4.0, what are other strategies to maintain or to create such queries? Thanks for your help. Regards, Sohail
RE: trunk cloud ui not working
Hi, I was using Windows 7, but it is fine with Chrome on Windows Web Server 2008 R2. Also, I asked a colleague with Windows 7 and it is fine for him too, so really sorry, but I think it was a 'works on my machine' thing. Of course, if I track down the cause I will reply to this email again. Thanks, Phil -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: 21 May 2012 18:22 To: solr-user@lucene.apache.org Subject: Re: trunk cloud ui not working What OS? I was just trying trunk and looking at that view in Chrome on OSX and Linux and did not see an issue. On May 21, 2012, at 1:15 PM, Phil Hoy wrote: After further investigation I have found that it is not a problem on Firefox, only Chrome and IE. Phil -Original Message- Sent: 21 May 2012 18:05 To: solr-user@lucene.apache.org Subject: trunk cloud ui not working Hi, I am running from the trunk and the localhost:8983/solr/#/~cloud page shows nothing but Fetch Zookeeper Data. If I run Fiddler I see that http://localhost:8983/solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json and http://localhost:8983/solr/zookeeper?wt=json&path=%2Flive_nodes are called and return data, but there is no update to the UI. Cheers, Phil - Mark Miller lucidimagination.com
Re: How can i search site name
You need to explain your case in much more detail to get precise help. Please read http://wiki.apache.org/solr/UsingMailingLists If your problem is that you have a URL and want to know the domain for it, e.g. www.company.com/foo/bar/index.html and you want only www.company.com, you can use the UrlClassifyProcessor, see SOLR-2826. -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 22. mai 2012, at 08:03, Shameema Umer wrote: Sorry. Please let me know how I can search the site name using the Solr query syntax. ...
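A sketch of wiring that processor into solrconfig.xml (the parameter names below are from memory of the SOLR-2826 patch - treat them as assumptions and check the JIRA issue before relying on them):

<updateRequestProcessorChain name="urlclassify">
  <processor class="solr.URLClassifyProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="inputField">url</str>
    <str name="domainOutputField">url_domain</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>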
Re: System requirements in my case?
A dedicated server may not be required. If you want to cut down cost, then prefer a shared server. How much RAM? Regards Aditya www.findbestopensource.com On Tue, May 22, 2012 at 12:36 PM, Bruno Mannina bmann...@free.fr wrote: Dear Solr users, My company would like to use Solr to index around 80,000,000 documents ...
Re: Strategy for maintaining De-normalized indexes
Hello, Can't the ID (uniqueKey) of the indexed documents (i.e. the denormalized data) be a combination of the master product id and the child product id? Therefore, whenever you update your master product DB entry, you simply need to reindex the documents depending on that master product entry. You can even simply reindex your whole DB; updates are made in place (i.e. old documents are *completely* overwritten by their respective updates). There's nothing to delete if you build your unique key in a maintainable way. You can re-index documents whenever you need to do so. -- Tanguy 2012/5/22 Sohail Aboobaker sabooba...@gmail.com Hi, I have a very basic question and hopefully there is a simple answer to it. ...
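A sketch of building such a composite key in a DIH entity (the table and column names are hypothetical):

<entity name="child"
        query="SELECT CONCAT(m.id, '_', c.id) AS id,
                      c.name, m.category
               FROM   master m
               JOIN   child c ON c.master_id = m.id"/>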
Re: System requirements in my case?
Hi Bruno, will you use facets and result sorting? What is the update frequency/volume? This could impact the amount of memory / the server count. Ludovic. - Jouve, France.
Re: Strategy for maintaining De-normalized indexes
That's how de-normalization works: you need to update all child products. If you just need the count, and you are using facets, then maintain a mapping between category and main product, and between main product and child product. The Lucene DB has no schema; you can retrieve the data based on its type. A category record will have category name, product name and a type (CATEGORY_TYPE). A child product record will have product name, main product name, product details, and a type (PRODUCT_TYPE). With this you may need to use two queries: given the category name, fetch the main product name, then query using it to fetch the child products. Hope it helps. Regards Aditya www.findbestopensource.com On Tue, May 22, 2012 at 1:37 PM, Sohail Aboobaker sabooba...@gmail.com wrote: Hi, I have a very basic question and hopefully there is a simple answer to it. ...
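A sketch of the two-step lookup Aditya describes (the field names are hypothetical):

# 1) find the main products for a category
http://localhost:8983/solr/select?q=type:CATEGORY_TYPE+AND+category:Electronics&fl=mainProduct&rows=100
# 2) fetch the child products of those main products
http://localhost:8983/solr/select?q=type:PRODUCT_TYPE+AND+mainProduct:(Phone1+OR+Phone2)&rows=100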
Multicore Solr
Hi all, greetings from my end. This is my first post on this mailing list. I have a few questions on multicore Solr. For background: we want to create a core for each user logged in to our application. In that case it may be 50, 100, 1000, N cores. Each core will be used to write and search an index in real time. 1. Is this a good idea to go with? 2. What are the pros and cons of this approach? Awaiting your response. Regards AJ
Re: System requirements in my case?
My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml 24 GB DDR3 On 22/05/2012 10:26, findbestopensource wrote: A dedicated server may not be required. If you want to cut down cost, then prefer a shared server. How much RAM? ...
Re: System requirements in my case?
Hi, Facets I don't know yet, because I don't know exactly what facets are (sorry). Sorting: yes. Scoring: yes. Concerning updates - frequency: every week; volume: around 1 GB of data per year. Thank you very much :) Aix-en-Provence, France On 22/05/2012 10:35, lboutros wrote: Hi Bruno, will you use facets and result sorting? What is the update frequency/volume? This could impact the amount of memory / the server count. Ludovic.
Re: Multicore Solr
Having a core per user is not a good idea; the count is too high. Keep everything in a single core. You can filter the data based on user name or user id. Regards Aditya www.findbestopensource.com On Tue, May 22, 2012 at 2:29 PM, Shanu Jha shanuu@gmail.com wrote: Hi all, greetings from my end. This is my first post on this mailing list. I have a few questions on multicore Solr. ...
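A sketch of what that per-user filtering looks like at query time, assuming a hypothetical user_id field in the schema; fq filters are cached, so repeated per-user queries stay cheap:

http://localhost:8983/solr/select?q=laptop&fq=user_id:42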
Re: System requirements in my case?
Seems to be fine. Go ahead. Before hosting, have you tried/tested your application in a local setup? RAM usage is what matters in terms of Solr. Just benchmark your app for 100,000 documents, log the memory used, and calculate the RAM required for 80,000,000 documents. Regards Aditya www.findbestopensource.com On Tue, May 22, 2012 at 2:36 PM, Bruno Mannina bmann...@free.fr wrote: My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml 24 GB DDR3 ...
Re: is commit a sequential process in solr indexing
Yes. Lucene / Solr supports a multi-threaded environment. You could commit from two different threads to the same core or to different cores. Regards Aditya www.findbestopensource.com On Tue, May 22, 2012 at 12:35 AM, jame vaalet jamevaa...@gmail.com wrote: hi, my use case here is to search all the incoming documents for certain combinations of words which are pre-determined. So what I am doing here is: create a batch of x docs according to their creation date, index them, commit them and search them for the (pre-determined) query. My question is, if I have to make the entire process multi-threaded and two threads are trying to commit two different batches, will the commits happen in parallel? What if I am trying to commit to different solr-cores? -- -JAME
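A minimal SolrJ 3.6 sketch of the two-thread case (the core URLs are placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ParallelCommit {
    public static void main(String[] args) throws Exception {
        // one client per core; commits on different cores are independent
        final HttpSolrServer coreA = new HttpSolrServer("http://localhost:8983/solr/coreA");
        final HttpSolrServer coreB = new HttpSolrServer("http://localhost:8983/solr/coreB");
        Thread t1 = new Thread(new Runnable() {
            public void run() {
                try { coreA.commit(); } catch (Exception e) { e.printStackTrace(); }
            }
        });
        Thread t2 = new Thread(new Runnable() {
            public void run() {
                try { coreB.commit(); } catch (Exception e) { e.printStackTrace(); }
            }
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}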
Re: Strategy for maintaining De-normalized indexes
Thank you for the quick replies. "Can't the ID (uniqueKey) of the indexed documents (i.e. denormalized data) be a combination of the master product id and the child product id?" -- We do not need it, as each child already has a unique key. "Therefore whenever you update your master product db entry, you simply need to reindex documents depending on the master product entry." -- This is where the confusion might be. I may have misread it, but Apache Solr 3 Enterprise Search Server mentions that if any part of a document needs to be updated, the entire document must be replaced; internally this is a deletion and an addition. Is re-indexing all detail records a huge performance hit, assuming that a master can have up to 10-20k child records? Thanks again. Sohail
Re: How can i search site name
Hi, I would probably use (e)DisMax. Index your url and metadata fields as text without stemming, e.g. text_general. Then query as q=mycompany&defType=edismax&qf=title^10 content^1 url^5 If you would like to give higher weight to the domain/site part of the URL, apply the UrlClassifyProcessor and search the domain field separately with higher weight. -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 22. mai 2012, at 12:23, Shameema Umer wrote: Thanks Li Li and Jan. Yes, if the url is www.company.com/foo/bar/index.html, I should be able to search the sub-strings like company, foo or bar etc. When I changed the part of my schema file from <defaultSearchField>content</defaultSearchField> to <defaultSearchField>stext</defaultSearchField> <copyField source="title" dest="stext"/> <copyField source="content" dest="stext"/> <copyField source="site" dest="stext"/> a server error occurred after restarting Solr. Do I need to re-index Solr? Please help me, as I need to search title, url and content with priority to title. If the DisMaxRequestHandler helps me solve my problems, let me know the best tutorial page to study it: http://wiki.apache.org/solr/DisMaxRequestHandler Thanks Shameema
Re: Strategy for maintaining De-normalized indexes
It all depends on the frequency at which you refresh your data, on your deployment (master/slave setup), ... Many things need to be taken into account! Did you face any performance issues while building your index? If you didn't, rebuilding it shouldn't be more problematic. -- Tanguy 2012/5/22 Sohail Aboobaker sabooba...@gmail.com Thank you for the quick replies. ...
Re: Strategy for maintaining De-normalized indexes
We are still in the design phase, so we haven't hit any performance issues. We do not want to discover performance issues too late, during QA :) We would rather account for any issues during the design phase. Refreshes of the fields that we are using from the master table will be rare - maybe three or four times a year. Regards, Sohail
Re: How can i search site name
Thanks Jan. *It worked perfectly.* That's all I needed. May God bless you. Regards Shameema On Tue, May 22, 2012 at 4:57 PM, Jan Høydahl jan@cominvent.com wrote: Hi, I would probably use (e)DisMax. Index your url and metadata fields as text without stemming, e.g. text_general. ...
Re: System requirements in my case?
Hi, It is impossible to guess the required HW size without more knowledge about the data and usage; 80 million docs is a fair amount. Here's how I would approach sizing the setup: 1) Get your schema in shape, removing unnecessary stored/indexed fields 2) Do a test index locally of a part of the dataset, e.g. 10 million docs, and perform an Optimize 3) Measure the size of the index folder and multiply by 8 to get a clue of the total index size 4) Do some benchmarking with realistic types of queries to identify performance bottlenecks on the query side Depending on your requirements for search performance, you can beef up your RAM to hold the whole index or accept slow disks as a bottleneck. If you find that the total size of the index is 16Gb, you should leave 16Gb free for OS disk caching, e.g. allocate 8Gb to Tomcat/Solr and leave the rest for the OS. If I should guess, you will probably find that one server gets overloaded or too slow with your amount of docs, and that you end up with sharding across 2-4 servers. PS: Do you always need to search all data? A trick may be to partition your data such that, say, 80% of searches go to a fresh index with 10% of the content, while the remaining searches include everything. -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 22. mai 2012, at 11:06, Bruno Mannina wrote: My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml 24 GB DDR3 ...
Re: Multicore Solr
Hi, Could you please tell me what you mean by filtering data by users? I would like to know whether there is a real problem with creating a core per user, i.e. resource utilization, CPU usage, etc. AJ On Tue, May 22, 2012 at 4:39 PM, findbestopensource findbestopensou...@gmail.com wrote: Having a core per user is not a good idea; the count is too high. Keep everything in a single core. You can filter the data based on user name or user id. ...
Re: SolrCloud: how to index documents into a specific core and how to search against that core?
Hi Darren, Thanks very much for your reply. The reason I want to control core indexing/searching is that I want to use one core to store one customer's data (all customers share the same config): such as customer 1 uses coreForCustomer1 and customer 2 uses coreForCustomer2. Is there any better way than using different cores for different customers? Another way may be to use a different collection for each customer, though I am not sure how many collections Solr Cloud can support. Which way is better in terms of flexibility/scalability (suppose there are tens of thousands of customers)? Regards, Yandong 2012/5/22 Darren Govoni dar...@ontrenet.com Why do you want to control what gets indexed into a core and then know what core to search? That's the kind of knowing that SolrCloud solves. In SolrCloud, it handles the distribution of documents across shards and retrieves them regardless of which node is searched from. That is the point of cloud: you don't know the details of where exactly documents are being managed (i.e. they are cloudy). It can change and re-balance from time to time. SolrCloud performs the distributed search for you, therefore when you try to search a node/core with no documents, all the results from the cloud are retrieved regardless. This is considered A Good Thing. It requires a change in thinking about indexing and searching. On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote: Hi Guys, I use the following commands to start solr cloud according to the solr cloud wiki. yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar Then I have created several cores using the CoreAdmin API (http://localhost:8983/solr/admin/cores?action=CREATE&name=coreName&collection=collection1), and clusterstate.json shows the following topology: collection1: -- shard1: -- collection1 -- CoreForCustomer1 -- CoreForCustomer3 -- CoreForCustomer5 -- shard2: -- collection1 -- CoreForCustomer2 -- CoreForCustomer4 1) Index: Using the following command to index the mem.xml file in the exampledocs directory: yydzero:exampledocs bjcoe$ java -Durl=http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml SimplePostTool: version 1.4 SimplePostTool: POSTing files to http://localhost:8983/solr/coreForCustomer3/update.. SimplePostTool: POSTing file mem.xml SimplePostTool: COMMITting Solr index changes. And now the SolrAdmin UI shows that 'coreForCustomer1', 'coreForCustomer3' and 'coreForCustomer5' have 3 documents each (mem.xml has 3 documents) and the other 2 cores have 0 documents. *Question 1:* Is this expected behavior? How do I index documents into a specific core? *Question 2:* If SolrCloud doesn't support this yet, how could I extend it to support this feature (indexing documents to a particular core)? Where should I start, the hashing algorithm? *Question 3:* Why are the documents also indexed into 'coreForCustomer1' and 'coreForCustomer5'? The default replica count for documents is 1, right? Then I try to index some documents to 'coreForCustomer2': $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar post.jar ipod_video.xml While 'coreForCustomer2' still has 0 documents, the documents in ipod_video.xml are indexed to the cores for customers 1/3/5. *Question 4:* Why does this happen?
2) Search: I use http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml to search against 'CoreForCustomer2', while it will return all documents in the whole collection even though this core has no documents at all. Then I use http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2 and it will return 0 documents. *Question 5:* So if I want to search against a particular core, I need to use the 'shards' parameter with the Solr core name as the parameter value, right? Thanks very much in advance! Regards, Yandong
Re: System requirements in my case?
I installed a temp server at my university with 12,000 docs (Ubuntu + Solr 3.6.0). Maybe I can preview the amount of memory I need? Q: How can I check the memory used? On 22/05/2012 13:14, findbestopensource wrote: Seems to be fine. Go ahead. Before hosting, have you tried/tested your application in a local setup? RAM usage is what matters in terms of Solr. Just benchmark your app for 100,000 documents, log the memory used, and calculate the RAM required for 80,000,000 documents. ...
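Two stock JDK ways to watch the Solr JVM's heap from a shell (the PID below is a placeholder):

jps -l             # find the Jetty/Solr process id
jstat -gc 12345 5s # heap/GC stats every 5 seconds
jconsole 12345     # or attach JConsole for a live memory graph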
RE: Wildcard-Search Solr 3.5.0
The text may contain FooBar. When I do a wildcard search like this: Foo* - no hits. When I do a wildcard search like this: foo* - the doc is found. Please see http://wiki.apache.org/solr/MultitermQueryAnalysis Well, it works in 3.6, with one exception: if I use German umlauts it does not work anymore. Text: Bär Bä* - no hits Bär - hits What can I do in this case? Thank you
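A sketch (Solr 3.6 schema.xml) of an explicit multiterm analyzer, so wildcard terms go through the same lowercasing as indexed terms; if your index-time chain also folds umlauts (e.g. a German normalization or ASCII-folding filter), the same filter has to appear in the multiterm section too. The type name is illustrative:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="multiterm">
    <!-- applied to wildcard/prefix terms like Bä* before they hit the index -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>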
Re: System requirements in my case?
Hi Jan, Thanks for all these details! Answers are below. Sincerely, Bruno On 22/05/2012 13:58, Jan Høydahl wrote: Hi, It is impossible to guess the required HW size without more knowledge about the data and usage; 80 million docs is a fair amount. Here's how I would approach sizing the setup: 1) Get your schema in shape, removing unnecessary stored/indexed fields Ok, good idea! 2) Do a test index locally of a part of the dataset, e.g. 10 million docs, and perform an Optimize Concerning the test, I currently have only a sample with 12,000 docs, no more :'( 3) Measure the size of the index folder and multiply by 8 to get a clue of the total index size With 12,000 docs my index folder size is 33 MB. PS: I use solr.clustering.enabled=true 4) Do some benchmarking with realistic types of queries to identify performance bottlenecks on the query side Yep, this point is for later. Depending on your requirements for search performance, you can beef up your RAM to hold the whole index or accept slow disks as a bottleneck. If you find that the total size of the index is 16Gb, you should leave 16Gb free for OS disk caching, e.g. allocate 8Gb to Tomcat/Solr and leave the rest for the OS. If I should guess, you will probably find that one server gets overloaded or too slow with your amount of docs, and that you end up with sharding across 2-4 servers. I will take a look to see whether I can easily increase the RAM on the server (currently 24 GB). Another question, concerning the execution of Solr: do I just have to run java -jar start.jar, or do you think I must run it in another way? PS: Do you always need to search all data? A trick may be to partition your data such that, say, 80% of searches go to a fresh index with 10% of the content, while the remaining searches include everything. Yes, I need to search the whole index; even old documents must be searchable. -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 22. mai 2012, at 11:06, Bruno Mannina wrote: My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml 24 GB DDR3 ...
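Scaling Bruno's sample linearly gives a rough first estimate (a crude assumption: index size rarely scales perfectly linearly, and Jan's measure-then-multiply approach on a 10M-doc sample is the safer route):

33 MB / 12,000 docs   ~ 2.75 KB per doc
2.75 KB x 80,000,000  ~ 220 GB total index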
Re: Newbie with Carrot2?
Hi Bruno, Just to confirm -- are you seeing the clusters array in the result at all (<arr name="clusters">)? To get reasonable clusters, you should request at least 30-50 documents (rows), but even with smaller values, you should see an empty clusters array. Staszek On Sun, May 20, 2012 at 9:20 PM, Bruno Mannina bmann...@free.fr wrote: On 20/05/2012 11:43, Stanislaw Osinski wrote: Hi Bruno, Here's the wiki documentation for Solr's clustering component: http://wiki.apache.org/solr/ClusteringComponent For configuration examples, take a look at the Configuration section: http://wiki.apache.org/solr/ClusteringComponent#Configuration If you hit any problems, let me know. Staszek On Sun, May 20, 2012 at 11:38 AM, Bruno Mannina bmann...@free.fr wrote: Dear all, I use Solr 3.6.0 and I have indexed some documents (around 12,000). Each document contains an Abstract-en field (and some other fields). Is it possible to use Carrot2 to create clusters (classes) with the Abstract-en field? What must I configure in the schema.xml? Or in other files? Sorry for my newbie question, but I found only documentation for the Workbench tool. Bruno Thanks for this link, but I have a problem configuring my solrconfig.xml in this section (note I run java -Dsolr.clustering.enabled=true). I have a field named abstract-en, and I would like to use only this field. I would like to know if my requestHandler is good? I have a doubt about the content of carrot.title and carrot.url, and also the last fields: <str name="df">abstract-en</str> <str name="defType">edismax</str> <str name="qf">abstract-en^1.0</str> <str name="q.alt">*:*</str> <str name="rows">10</str> <str name="fl">*,score</str> because the result when I do a request is exactly like a search request (without more information). My entire requestHandler is: <requestHandler name="/clustering" startup="lazy" enable="${solr.clustering.enabled:false}" class="solr.SearchHandler"> <lst name="defaults"> <bool name="clustering">true</bool> <str name="clustering.engine">default</str> <bool name="clustering.results">true</bool> <!-- The title field --> <str name="carrot.title">name</str> <str name="carrot.url">id</str> <!-- The field to cluster on --> <str name="carrot.snippet">abstract-en</str> <!-- produce summaries --> <bool name="carrot.produceSummary">true</bool> <!-- the maximum number of labels per cluster --> <!-- <int name="carrot.numDescriptions">5</int> --> <!-- produce sub clusters --> <bool name="carrot.outputSubClusters">false</bool> <str name="df">abstract-en</str> <str name="defType">edismax</str> <str name="qf">abstract-en^1.0</str> <str name="q.alt">*:*</str> <str name="rows">10</str> <str name="fl">*,score</str> </lst> <arr name="last-components"> <str>clustering</str> </arr> </requestHandler>
Re: Question about sampling
Hi Lance, Could you provide more details about implementing this using the SignatureUpdateProcessor? An example would be helpful. - Rita
Re: Newbie with Carrot2?
Arfff, the clusters are at the end of my XML answer: <doc> ... </doc> <doc> ... </doc> ... <arr name="clusters"> ... </arr> OK, all works fine now! On 22/05/2012 15:33, Stanislaw Osinski wrote: Hi Bruno, Just to confirm -- are you seeing the clusters array in the result at all (<arr name="clusters">)? To get reasonable clusters, you should request at least 30-50 documents (rows), but even with smaller values, you should see an empty clusters array. Staszek ...
Uncatchable Exception on solrj3.6.0
Hi, I use solr-solrj 3.6.0 and solr-core 3.6.0. I have overridden handleError of the ConcurrentUpdateSolrServer class:

final ConcurrentUpdateSolrServer newSolrServer = new ConcurrentUpdateSolrServer(url, client, 100, 10) {
    @Override
    public void handleError(Throwable ex) {
        super.handleError(ex);
    }
};

My problem is that when an exception is thrown on the Solr server side, I cannot catch it on the client side. Thanks -- Jamel E
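Worth noting: this client sends updates asynchronously from a background queue, so failures surface only through handleError, never as exceptions from add(). A minimal workaround sketch, assuming the same url and client variables as above and a SolrInputDocument named doc (java.util.concurrent.atomic.AtomicReference import assumed; this is not an official API, just one way to surface the error):

final AtomicReference<Throwable> lastError = new AtomicReference<Throwable>();
ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(url, client, 100, 10) {
    @Override
    public void handleError(Throwable ex) {
        lastError.set(ex); // record the failure instead of only logging it
    }
};
server.add(doc);
server.blockUntilFinished(); // wait until the queued requests have been sent
if (lastError.get() != null) {
    throw new RuntimeException("update failed", lastError.get());
}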
RE: Re: SolrCloud: how to index documents into a specific core and how to search against that core?
I'm curious what the solrcloud experts say, but my suggestion is to try not to over-engineer the search architecture on solrcloud. For example, what is the benefit of managing which cores are indexed and searched? Having to know those details, in my mind, works against the automation in SolrCloud, but maybe there's a good reason you want to do it this way. --- Original Message --- On 5/22/2012 07:35 AM Yandong Yao wrote: Hi Darren, Thanks very much for your reply. The reason I want to control core indexing/searching is that I want to use one core to store one customer's data (all customers share the same config): such as customer 1 uses coreForCustomer1 and customer 2 uses coreForCustomer2. Is there any better way than using a different core for each customer? Another way may be to use a different collection for each customer, though I am not sure how many collections solr cloud could support. Which way is better in terms of flexibility/scalability? (Suppose there are tens of thousands of customers.) Regards, Yandong 2012/5/22 Darren Govoni dar...@ontrenet.com: Why do you want to control what gets indexed into a core and then know which core to search? That's the kind of knowing that SolrCloud solves. In SolrCloud, it handles the distribution of documents across shards and retrieves them regardless of which node is searched from. That is the point of cloud, you don't know the details of where exactly documents are being managed (i.e. they are cloudy). It can change and re-balance from time to time. SolrCloud performs the distributed search for you, therefore when you try to search a node/core with no documents, all the results from the cloud are retrieved regardless. This is considered A Good Thing. It requires a change in thinking about indexing and searching. On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote: Hi Guys, I use the following commands to start solr cloud according to the solr cloud wiki:

yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

Then I have created several cores using the CoreAdmin API (http://localhost:8983/solr/admin/cores?action=CREATE&name=coreName&collection=collection1), and clusterstate.json shows the following topology:

collection1:
-- shard1:
   -- collection1
   -- CoreForCustomer1
   -- CoreForCustomer3
   -- CoreForCustomer5
-- shard2:
   -- collection1
   -- CoreForCustomer2
   -- CoreForCustomer4

1) Index: Using the following command to index the mem.xml file in the exampledocs directory:

yydzero:exampledocs bjcoe$ java -Durl=http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml
SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8983/solr/coreForCustomer3/update..
SimplePostTool: POSTing file mem.xml
SimplePostTool: COMMITting Solr index changes.

And now the Solr Admin UI shows that 'coreForCustomer1', 'coreForCustomer3' and 'coreForCustomer5' have 3 documents (mem.xml has 3 documents) and the other 2 cores have 0 documents.

*Question 1:* Is this expected behavior? How do I index documents into a specific core?

*Question 2:* If SolrCloud doesn't support this yet, how could I extend it to support this feature (indexing a document to a particular core)? Where should I start, the hashing algorithm?

*Question 3:* Why are the documents also indexed into 'coreForCustomer1' and 'coreForCustomer5'? The default replica for documents is 1, right?

Then I try to index some documents to 'coreForCustomer2':

$ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar post.jar ipod_video.xml

Yet 'coreForCustomer2' still has 0 documents and the documents in ipod_video are indexed to the cores for customers 1/3/5.

*Question 4:* Why does this happen?

2) Search: I use http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml to search against 'CoreForCustomer2', and it returns all documents in the whole collection even though this core has no documents at all. Then I use http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2 and it returns 0 documents.

*Question 5:* So if I want to search against a particular core, I need to use the 'shards' parameter with the SolrCore name as the parameter value, right?

Thanks very much in advance! Regards, Yandong
Re: Installing Solr on Tomcat using Shell - Code wrong?
You should find some clues in the tomcat log. On 2012-5-22 at 7:49 PM, Spadez james_will...@hotmail.com wrote: Hi, This is the install process I used in my shell script to try and get Tomcat running with Solr (debian server): I swear this used to work, but currently only Tomcat works. The Solr page just comes up with The requested resource (/solr/admin) is not available. Can anyone give me some insight into why this isn't working? It's driving me nuts. James
Re: SolrCloud: how to index documents into a specific core and how to search against that core?
I think the key is this: you want to think of a SolrCore on a single node Solr installation as a collection on a multi node SolrCloud installation. So if you would use multiple SolrCores with a standard Solr setup, you should be using multiple collections in SolrCloud. If you were going to try to do everything in one SolrCore, that would be like putting everything in one collection in SolrCloud. I don't think it generally makes sense to try and work at the SolrCore level when working with SolrCloud. This will be made more clear once we add a simple collections API. So I think your choice should be similar to using a single node - do you want to put everything in one 'collection' and use a filter to separate customers (with all its caveats and limitations), or do you want to use a collection per customer? You can always start up more clusters if you reach any limits.
Re: Multicore solr
It would help if you provide your use case. What are you indexing for each user and why would you need a separate core for each user? How do you decide the schema for each user? It might be better to describe your use case and desired results. People on the list will be able to advise on the best approach. Sohail
Re: solr tokenizer not splitting unbreakable expressions
Hello Elisabeth, Wouldn't it be simpler to have a custom component inside of the front-end to your search server that would transform a query like hotel de ville paris into "hotel de ville" paris (i.e. turning each occurrence of the sequence hotel de ville into a phrase query)? Concerning protections inside of the tokenizer, I think that is not possible actually. The main reason for this could be that the QueryParser will break the query on each space before passing each query part through the analysis of every searched field. Hence all the smart things you would put at indexing time to wrap a sequence of tokens into a single one are not reproducible at query time. Please someone correct me if I'm wrong! Alternatively, I think you might do so with a custom query parser (in order to have phrases sent to the analyzers instead of words). But since tokenizers don't have support for protected word lists, you would need an additional custom token filter that would consume the token stream and annotate those matching an entry in the protection list. Unfortunately, if your protected list is long, you will have performance issues, unless you rely on a dedicated data structure, like trie-based structures (PATRICIA trie, ...). You can find solid implementations on the Internet (see https://github.com/rkapsi/patricia-trie). Then you could make your filter consume a sliding window of tokens while the window matches in your trie. Once you have a complete match in your trie, the filter can set an attribute of the type of your choice (e.g. MyCustomKeywordAttribute) on the first matching token, and make the attribute be the complete match (e.g. hotel de ville). If you don't have a complete match, drop the unmatched tokens, leaving them unmodified. I hope this helps... -- Tanguy 2012/5/22 elisabeth benoit elisaelisael...@gmail.com: Hello, Does someone know if there is a way to configure a tokenizer to split on white spaces all words, excluding a bunch of expressions listed in a file? For instance, if I want hotel de ville not to be split into words, a request like hotel de ville paris would be split into two tokens, "hotel de ville" and "paris", instead of 4 tokens: hotel, de, ville, paris. I imagine something like <tokenizer class="solr.StandardTokenizerFactory" protected="protoexpressions.txt"/>. Thanks a lot, Elisabeth
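For what it's worth, a minimal sketch of the front-end rewriting idea suggested above (the class name and list contents are hypothetical; in practice the list would come from protoexpressions.txt):

import java.util.Arrays;
import java.util.List;

public class ProtectedPhrases {
    // hypothetical protected-expression list
    static final List<String> PROTECTED = Arrays.asList("hotel de ville", "gare du nord");

    // wrap each protected expression in double quotes so the query parser
    // keeps it together as a phrase query instead of splitting on spaces
    static String quoteProtected(String query) {
        for (String expr : PROTECTED) {
            query = query.replace(expr, "\"" + expr + "\"");
        }
        return query;
    }
}

quoteProtected("hotel de ville paris") then yields "hotel de ville" paris. A linear scan like this is fine for a short list; as noted above, a trie pays off once the list grows large.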
WFST with autosuggest/geo
Does anyone have the slides or sample code from: Building Query Auto-Completion Systems with Lucene 4.0, presented by Sudarshan Gaikaiwari, Software Engineer, Yelp? We want to implement WFST with geo boosting. -- Bill Bell billnb...@gmail.com cell 720-256-8076
Binary updates handler does not propagate failures?
Hi all, I am facing the following issue: I have an application which is feeding a Solr 3.6 index with document updates via Solrj 3.6. I use a binary request writer, because of the issue with XML when sending inserts and deletes at once (https://issues.apache.org/jira/browse/SOLR-1752). Now, I have noticed that if I send a malformed document to the index, I see in the logs that it got refused by the index, but on the Solrj side the returned UpdateResponse does not indicate any kind of failure (no exception thrown, response status code == 0). When I switch to XML requests, I receive an exception when sending a malformed document. Looking at Solr's BinaryUpdateRequestHandler.java http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_6_0/solr/core/src/java/org/apache/solr/handler/BinaryUpdateRequestHandler.java?view=markup at lines 98-105, exceptions are not propagated, therefore RequestHandlerBase cannot set them into the response ... Is this intended behavior? What am I doing wrong? Any suggestions? Many thanks in advance. Best, Jozef
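A possible workaround sketch while this is open, for requests where error propagation matters more than the javabin format (the server URL is illustrative): SolrJ's default RequestWriter sends XML, for which errors do propagate; BinaryRequestWriter is what triggers the behavior described above.

CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
server.setRequestWriter(new RequestWriter());        // XML, the default
// server.setRequestWriter(new BinaryRequestWriter()); // javabin

The trade-off, of course, is that the XML path has the SOLR-1752 mixed add/delete ordering issue, so this only helps where that limitation does not apply.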
How to handle filter query against empty fields
Hi all, I have a field(s) in a schema which I need to be able to specify in a filter query. The field is not mandatory, therefore it can be empty. I need to be able to run a query with a filter: return only docs which do not have a value for the field ... What would be the optimal/recommended way of doing this with Solr? Thanks! Best, Jozef
Re: How to handle filter query against empty fields
I have a field(s) in a schema which I need to be able to specify in a filter query. The field is not mandatory, therefore it can be empty. I need to be able to run a query with a filter: return only docs which do not have a value for the field ... What would be the optimal/recommended way of doing this with Solr? There are two approaches for this. Please read my earlier post: http://search-lucene.com/m/72Q4YURpgY
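For reference, the usual pure-negative filter for this (the field name here is illustrative):

fq=-myfield:[* TO *]    returns only documents with no value in myfield
fq=myfield:[* TO *]     returns only documents that do have a value

Solr accepts a purely negative filter query like the first one; the open-ended range [* TO *] simply matches any document with some value in the field.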
Faceted on Similarity?
Hi All, I'm quite a new user of both Lucene and Solr. I want to ask if faceted search can be used to group rows on multiple fields' values based on similarity? I have looked at faceted indexing so far, but from my understanding it only works on exact single values and definite range values. For example, if I have name, address, id number and nationality, rows with a small similarity distance between these fields would be grouped together. Sample results would be like this:
Group1
Name: Angel, Address: Jakarta, ID: 123, Nationality: Indonesian
Name: Angeline, Address: Jayakarta, ID: 123, Nationality: Indonesian
Group2
Name: Frank, Address: Jl. Tubagus Angke Jakarta, ID: 333, Nationality: Indonesian
Name: Frans, Address: Jl. T. Angke Jakarta, ID: 332, Nationality: Indonesian
Hope I make myself clear and am asking in a proper way. Very sorry if my English is not good enough... Thanks, Robby
Re: System requirements in my case?
3) Measure the size of the index folder, multiply by 8 to get a clue of the total index size. With 12,000 docs my index folder size is 33 MB. PS: I use solr.clustering.enabled=true. Clustering is performed at search time, it doesn't affect the size of the index (but obviously it does affect the search response times). Staszek
Re: Solr mail dataimporter cannot be found
Hey Emma, thanks for reporting this, I opened SOLR-3478 and will commit this soon. Stefan On Monday, May 21, 2012 at 10:47 PM, Emma Bo Liu wrote: Hi, I want to index emails using Solr. I put the user name, password and hostname in data-config.xml under the mail folder. This is a valid email account, but when I run the URL http://localhost:8983/solr/mail/dataimport?command=full-import it says it cannot access mail/dataimporter, reason: not found. But when I run http://localhost:8983/solr/rss/dataimport?command=full-import or http://localhost:8983/solr/db/dataimport?command=full-import they can be found. In addition, when I run the command java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar, on the left side of the Solr UI there are db, rss, tika and solr but no mail. Is it a bug in the mail indexing? Thank you so much! Best, Emma
clickable links as results?
Hi, I want to display a clickable link to the document if a search matches, along with the number of times the search query matched. What should I be looking at? I am fairly new to Solr and don't know how I can achieve this. Thanks for the help!
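A sketch of the usual starting point, assuming a stored field holding each document's URL (field names here are illustrative): store the link target as a stored field and ask for it in the field list, then turn on highlighting, e.g.

http://localhost:8983/solr/select?q=ipod&fl=title,url,score&hl=true&hl.fl=content

Your application then renders the returned url value as the clickable link, and the highlighting section of the response gives you the matching snippets; counting the highlighted fragments per document approximates the number of times the query matched.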
Re: Not able to use the highlighting feature! Want to return snippets of text
That worked! Thanks! I did str name=hl.simple.pre /str str name=hl.simple.post /str
index-time boosting using DIH
hello all, can I use the technique described on the wiki at http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts if I am populating my core using a DIH? Looking at the posts on this subject and the wiki docs leads me to believe that you can only use this when you are using the XML interface for importing data? thank you
RE: index-time boosting using DIH
See http://wiki.apache.org/solr/DataImportHandler#Special_Commands and the $docBoost pseudo-field name. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com] Sent: Tuesday, May 22, 2012 2:12 PM To: solr-user@lucene.apache.org Subject: index-time boosting using DIH hello all, can I use the technique described on the wiki at http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts if I am populating my core using a DIH? Looking at the posts on this subject and the wiki docs leads me to believe that you can only use this when you are using the XML interface for importing data? thank you
Re: Highlight feature
: That is the default response format. If you would like to change that, : you could extend the search handler or post process the XML data. : Another option would be to use the javabin (if your app is java based) : and build xml the way your app would need. There is actually a more straightforward way to do stuff like this in trunk now, such that it can work with any response writer, using the DocTransformer API. There is already an ExplainAugmenter that can inline the explain info for a document, we just need someone to help write a corresponding HighlightAugmenter. I've opened a Jira if anyone wants to take a crack at a patch... https://issues.apache.org/jira/browse/SOLR-3479 -Hoss
Re: Solr 3.6 fails when using XSLT
what does your results.xsl look like? Or more specifically: can you post a very small example XSL that has this problem? You mentioned you are using xsl:include and that doesn't seem to work ... is that a separate problem, or does removing/adding the xsl:include fix/cause this problem? What does your xsl:include look like? Where do the various xsl templates live in the filesystem relative to each other? : Date: Fri, 11 May 2012 08:24:45 -0700 (PDT) : From: pramila_tha...@ontla.ola.org : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Solr 3.6 fails when using XSLT : : Hi Everyone, : : I have recently upgraded to solr 3.6 from solr 1.4. : My XSLs were working fine in solr 1.4, : but now with Solr 3.6 I keep getting the following error: : : getTransformer fails in getContentType java.lang.RuntimeException: getTransformer fails in getContentType : : But if I use example.xsl instead of results.xsl, it is fine. : I find my xsl:include does not seem to work with Solr 3.6. : : Can someone please let me know what am I doing wrong? : : Thanks, -Hoss
RE: Solr 3.6 fails when using XSLT
Hi Everyone, This is what worked in solr 1.4 and did not work in solr 3.6. Actually, solr 3.6 requires all the XSLs to be present in the conf/xslt directory, and all paths leading to an xsl should be relative to the conf directory. Before, this was not the case.

<!-- NOTE: does NOT work with Solr 3.6, worked in Solr 1.4 -->
<xsl:include href="../webapps/pdfexample/xsl_scripts/highlight.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/pagination.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/simpleSearch.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/bills/facets.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/bills/advanceSearch.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/bills/search_common_fields.xsl"/>

<!-- this is relative to the conf directory, which is working for Solr 3.6 -->
<xsl:include href="./xsl_scripts/highlight.xsl"/>
<xsl:include href="./xsl_scripts/pagination.xsl"/>
<xsl:include href="./xsl_scripts/simpleSearch.xsl"/>
<xsl:include href="./xsl_scripts/bills/facets.xsl"/>
<xsl:include href="./xsl_scripts/bills/advanceSearch.xsl"/>
<xsl:include href="./xsl_scripts/bills/search_common_fields.xsl"/>

Thanks, --Pramila Thakur
Re: Jetty returning HTTP error code 413
Hi Alexandre, Can you please let me know how you fixed this issue. I am also getting this error when I pass a very large query to Solr. A reply would be highly appreciated. Thanks, Sai
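In case it helps: HTTP 413 (Request Entity Too Large) from Jetty usually means the query string no longer fits in the configured header buffer. Two common fixes, sketched here against the jetty.xml that ships in Solr's example directory (the size value is illustrative): send the query as a POST instead of a GET, or raise the buffer, e.g.

<!-- inside the connector definition in example/etc/jetty.xml -->
<Set name="headerBufferSize">65536</Set>

After changing jetty.xml, restart Jetty for the new limit to take effect.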
RE: index-time boosting using DIH
thanks for the reply. So to use the $docBoost pseudo-field name, would you do something like below, and would this technique likely increase my total index time?

<dataConfig>
  <dataSource ... />
  <document name="mydoc">
    <entity name="myentity" transformer="script:BoostDoc" query="select ...">
      <field column="SOME_COLUMN" name="someField" />
      ...
RE: index-time boosting using DIH
You need to add the $docBoost pseudo-field to the document somehow. A transformer is one way to do it. You could just add it to a SELECT statement, which is especially convenient if the boost value is somehow derived from the data:

SELECT case when SELL_MORE_FLAG='Y' then 999 else null end as '$docBoost', ...other fields... from some_table, etc.

Either way I wouldn't expect it to make the indexing noticeably slower. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com] Sent: Tuesday, May 22, 2012 3:06 PM To: solr-user@lucene.apache.org Subject: RE: index-time boosting using DIH thanks for the reply. So to use the $docBoost pseudo-field name, would you do something like below, and would this technique likely increase my total index time?
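And for completeness, a sketch of the transformer route, wired to the transformer="script:BoostDoc" attribute shown earlier in the thread (the flag name is borrowed from the SELECT example above and the boost value is illustrative). Inside data-config.xml:

<script><![CDATA[
  function BoostDoc(row) {
    // give flagged rows an index-time boost; all others keep the default of 1.0
    if (row.get('SELL_MORE_FLAG') == 'Y') {
      row.put('$docBoost', 10.0);
    }
    return row;
  }
]]></script>

DIH runs the function once per row; whatever ends up in the $docBoost pseudo-field becomes that document's index-time boost.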
RE: index-time boosting using DIH
thank you, James, for the feedback - I appreciate it. Ultimately I was trying to decide if I was missing the boat by ONLY using query-time boosting, and whether I should really be using index-time boosting. But after your reply, reading the Solr book, and looking at the Lucene docs, it looks like index-time boosting is not what I need. I can probably do better by using query-time boosting and the proper sort params. thanks again
always getting distinct count of -1 in luke response (solr4 snapshot)
We're testing a snapshot of Solr4 and I'm looking at some of the responses from the Luke request handler. Everything looks good so far, with the exception of the distinct attribute, which (in Solr3) shows me the distinct number of terms for a given field. Given the request below, I'm consistently getting a response back with a value in the distinct field of -1. Is there something different I need to do to get back the actual distinct count? Thanks! Mike

http://localhost:8080/solr/core1/admin/luke?wt=json&fl=label&numTerms=1

fields: {
  label: {
    type: text_general,
    schema: IT-M--,
    index: (unstored field),
    docs: 63887,
    distinct: -1,
    topTerms: [
Indexing Polygons
Hi All, I'm trying to figure out how to index polygons in Solr (trunk). I'm using LSP right now, as the Solr integration of the new spatial module hasn't been completed. I have searching for a point using a polygon working, but I'm also looking to search for a polygon using a point. I've seen some indication that LSP supports this, but I haven't been able to find an example. What field type would I need to use? Would it be multivalued? Please and thank you! Cody
Re: Faceted on Similarity ?
Take a look at the clustering component: http://wiki.apache.org/solr/ClusteringComponent Consider clustering offline and indexing the pre-calculated group memberships. I might be wrong, but I don't think there is any faceting mileage here. Depending upon the use case you might get some use out of the MLT handler: http://wiki.apache.org/solr/MoreLikeThis On 22 May 2012 18:00, Robby java@phi-integration.com wrote: Hi All, I'm quite a new user of both Lucene and Solr. I want to ask if faceted search can be used to group rows on multiple fields' values based on similarity? I have looked at faceted indexing so far, but from my understanding it only works on exact single values and definite range values. For example, if I have name, address, id number and nationality, rows with a small similarity distance between these fields would be grouped together. Sample results would be like this:
Group1
Name: Angel, Address: Jakarta, ID: 123, Nationality: Indonesian
Name: Angeline, Address: Jayakarta, ID: 123, Nationality: Indonesian
Group2
Name: Frank, Address: Jl. Tubagus Angke Jakarta, ID: 333, Nationality: Indonesian
Name: Frans, Address: Jl. T. Angke Jakarta, ID: 332, Nationality: Indonesian
Hope I make myself clear and am asking in a proper way. Very sorry if my English is not good enough... Thanks, Robby
RE: Solr 3.6 fails when using XSLT
: This is what worked in solr 1.4 and did not work in solr 3.6. : : Actually solr 3.6 requires all the xsl to be present in conf/xslt directory : All paths leading to xsl should be relative to conf directory. : : But before this was not the case. Right ... this was actually a bug (in how all relative paths in XML includes, or XSL includes, were resolved) that was fixed in Solr 3.1, as noted in CHANGES.txt... * SOLR-1656: XIncludes and other HREFs in XML files loaded by ResourceLoader are fixed to be resolved using the URI standard (RFC 2396). The system identifier is no longer a plain filename with path, it gets initialized using a custom URI scheme solrres:. This scheme is resolved using an EntityResolver that utilizes ResourceLoader (org.apache.solr.common.util.SystemIdResolver). This makes all relative paths in Solr's config files behave as expected. This change introduces some backwards breaks in the API: some config classes (Config, SolrConfig, IndexSchema) were changed to take org.xml.sax.InputSource instead of InputStream. There may also be some backwards breaks in existing config files; it is recommended to check your config files / XSLTs and replace all XIncludes/HREFs that were hacked to use absolute paths to use relative ones. (uschindler) -Hoss
Re: Multicore solr
Hi, Thanks for your advice. It is basically a meta-search application. Users can perform a search on N data sources at a time. We broadcast a parallel search to each selected data source and write the data to Solr using a custom-built API (the API and Solr are deployed on separate machines; the API's job is to perform the parallel search and write data to Solr). The API notifies the application that some results are available, and the application then fires a search query to display them (the query would be q=unique_search_id). Meanwhile, the API keeps writing data to Solr and the user can fire a search to view all results. In the current scenario we are using a single Solr server, performing real-time indexing and search. Performing these operations on a single Solr instance makes the process slow as the index size increases. So we are planning to use multicore Solr, where each user will have their own core. All cores will have the same schema. Please suggest if this approach has any issues. Rgds AJ On 22-May-2012, at 20:14, Sohail Aboobaker sabooba...@gmail.com wrote: It would help if you provide your use case. What are you indexing for each user and why would you need a separate core for each user? How do you decide the schema for each user? It might be better to describe your use case and desired results. People on the list will be able to advise on the best approach. Sohail
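If per-user cores are the route taken, a sketch of creating one on demand with the CoreAdmin API already used elsewhere in this digest (the core name, paths and parameter values here are illustrative; since all cores share one schema, they can point at the same config):

http://localhost:8983/solr/admin/cores?action=CREATE&name=user_42&instanceDir=cores/shared&dataDir=/var/solr/data/user_42&config=solrconfig.xml&schema=schema.xml

With thousands of cores, keep in mind that every open core costs file handles, caches and memory, so lazy loading/unloading of idle cores (the "LotsOfCores" work discussed on the wiki) is worth investigating before committing to one core per user.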