Re: How can i search site name

2012-05-22 Thread Shameema Umer
Sorry,
Please let me know how I can search the site name using the Solr query syntax.
My results should show title, url and content.
Title and content are being searched even though the default search field is
set to <defaultSearchField>content</defaultSearchField>.

I need the url or site name too. Please help.

Thanks in advance.
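One way to search several fields at once without changing defaultSearchField is the dismax/edismax query parser. The sketch below only builds the query string; the handler path and the field names title, content and url are assumptions based on the description above:

```python
from urllib.parse import urlencode

# Hypothetical field names (title, content, url) -- adjust to your schema.
params = {
    "q": "google",
    "defType": "edismax",       # edismax can query several fields at once
    "qf": "title content url",  # fields to search, instead of defaultSearchField
    "fl": "title,url,content",  # fields to return in the results
}
query_string = urlencode(params)
print("/solr/select?" + query_string)
```

Sending that query string to the select handler would search all three fields with one request.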

On Tue, May 22, 2012 at 11:05 AM, ketan kore ketankore...@gmail.com wrote:

 You can go to www.google.com and just type the site which you want to
 search, and Google will show you the results, as simple as that...



Re: How can i search site name

2012-05-22 Thread Li Li
You should define your search first.
If the site is www.google.com, how do you want to match it: full-string
matching or partial matching? E.g., should "google" match? If it should,
you will need to write your own analyzer for this field.
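If partial matching is wanted, such a field type might look roughly like this. This is a sketch only: the type name is made up, and the pattern simply splits URLs on dots, slashes and colons so that "google" becomes its own token:

```xml
<!-- Illustrative only: tokenize URLs/hostnames into their parts -->
<fieldType name="url_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- splits www.google.com into "www", "google", "com" -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[./:]+"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a full-string (exact) match, a plain string field would be enough instead.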




Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-22 Thread KP Sanjailal
Hi,

Thank you so much for replying.

The MySQL database server is running on a Fedora Core 12 Machine with Hindi
Language Support enabled.  Details of the database are - ENGINE=MyISAM and
DEFAULT CHARSET=utf8

Data is imported using the Solr DataImportHandler (mysql jdbc driver).
In the schema.xml file the title field is defined as:
<field name="title" type="text_general" indexed="true" stored="true"/>

I tried saving the query results directly to a text file from the MySQL
command prompt but it is not storing the results correctly.  The file
contains the following characters.
à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥8dà ¤Åà ¤¾ Saur oorja

First line of the data-config.xml is
<?xml version="1.0" encoding="UTF-8"?>
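Besides the XML declaration, the dataSource definition itself matters. A sketch of a MySQL JdbcDataSource with UTF-8 forced on the JDBC connection; the database name and credentials are placeholders:

```xml
<!-- Connection details below are placeholders, not the poster's actual values -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/library?useUnicode=true&amp;characterEncoding=UTF-8"
            user="dbuser" password="dbpass"/>
```

The useUnicode/characterEncoding parameters tell Connector/J to hand UTF-8 text to Solr instead of the platform default encoding.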

Please suggest what I have to do to solve this issue.

Regards,

Sanjailal KP

On 5/21/12, Jack Krupansky j...@basetechnology.com wrote:
 Is it possible that your text editor/display does not support UTF-8
 encoding?

 Assuming the data is properly encoded, do you have the encoding="UTF-8"
 attribute in your DIH <dataSource> tag?

 -- Jack Krupansky

 -Original Message-
 From: KP Sanjailal
 Sent: Monday, May 21, 2012 7:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Indexing & Searching MySQL table with Hindi and English data

 Hi,

 Thank you so much for replying.

 The MySQL database server is running on a Fedora Core 12 Machine with Hindi
 Language Support enabled.  Details of the database are - ENGINE=MyISAM and
 DEFAULT CHARSET=utf8

 Data is imported using the Solr DataImportHandler (mysql jdbc driver).
 In the schema.xml file the title field is defined as:
 <field name="title" type="text_general" indexed="true" stored="true"/>

 I tried saving the query results directly to a text file from the MySQL
 command prompt but it is not storing the results correctly.  The file
 contains the following characters.


 à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥8dà ¤Åà ¤¾ Saur oorja

 Please suggest what I have to do to solve this issue.

 Regards,

 Sanjailal KP
 --



 On Sun, May 20, 2012 at 6:59 AM, Lance Norskog goks...@gmail.com wrote:

 Also, try saving data from a query into a file and verify that it is
 UTF-8 and the characters are correct.

 On Fri, May 18, 2012 at 7:54 AM, Jack Krupansky j...@basetechnology.com
 wrote:
  Check the analyzers for the field types containing Hindi text to be
  sure
  that they are not using a character mapping or folding filter that
 might
  mangle the Hindi characters. Post the field type, say for the title
 field.
 
  Also, try manually (using curl or the post jar) adding a single
  document
  that has Hindi data and see if that works.
 
  -- Jack Krupansky
 
  -Original Message- From: KP Sanjailal
  Sent: Thursday, May 17, 2012 5:55 AM
  To: solr-user@lucene.apache.org
  Subject: Indexing & Searching MySQL table with Hindi and English data
 
 
  Hi,
 
  I tried to setup indexing of MySQL tables in Apache Solr 3.6.
 
  Everything works fine but text in Hindi script (only some 10% of total
  records) not getting indexed properly.
 
  A search with a keyword in Hindi retrieves an empty result set. Also, a
  retrieved Hindi record displays junk characters.
 
  The database tables contains bibliographical details of books such as
  title, author, publisher, isbn, publishing place, series etc. and out
  of
  the total records about 10% of records contains text in Hindi in title,
  author, publisher fields.
 
  Example:
 
  *Search Results from MySQL using PHP*

   1. http://192.168.0.132/shared/biblio_view.php?bibid=26913&tab=opac
      *Title:* सौर ऊर्जा Saur oorja
      *Author(s):* विनोद कुमार मिश्र MISHRA (VK)
      *Material:* Books

  *Search Results from Apache Solr (searched using keyword in English)*

   1. http://192.168.0.132/test/biblio_view.php?bibid=26913&tab=opac
      *Title:* सौर ऊरॠजा Saur oorja
      *Author(s):* विनोद कॠमार मिशॠर MISHRA (VK)
      *Material:* Books
 
 
  How do I go about solving this language problem?
 
  Thanks in advance.
 
  K. P. Sanjailal
  --
 



 --
 Lance Norskog
 goks...@gmail.com






Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-22 Thread Lance Norskog
There are many steps that can go wrong. Your platform should have
UTF-8 as its default encoding; Windows and macOS don't do this. I had
to configure Chrome to use UTF-8 as its default display encoding.
Also, if you use Tomcat, it has to be configured for UTF-8:

http://wiki.apache.org/solr/SolrTomcat

The characters you posted are not transferring correctly. I think you
need to decode them using one of the online unicode utility pages.
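The pattern in the garbled line quoted earlier ("à ¤¸ ...") is characteristic of UTF-8 bytes decoded as a single-byte encoding such as Latin-1. A minimal Python sketch of how that mojibake arises, and how it can be reversed as long as no bytes were lost:

```python
# UTF-8 bytes decoded with Latin-1 produce the "à¤..." garbage pattern.
original = "सौर ऊर्जा"  # "Saur oorja" in Devanagari
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)  # starts with "à¤¸" -- close to what was posted

# The damage is reversible while every byte is still intact:
recovered = mojibake.encode("latin-1").decode("utf-8")
assert recovered == original
```

The posted text has extra spaces mixed in, so it will not round-trip exactly, but the shape of the corruption matches this scenario.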


-- 
Lance Norskog
goks...@gmail.com


Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-22 Thread KP Sanjailal
Hi,

I have already configured the Tomcat instance as per
http://wiki.apache.org/solr/SolrTomcat for the URI Charset Config.

The necessary updates have been made in Tomcat's conf/server.xml with
URIEncoding="UTF-8".
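For readers finding this thread later, the relevant server.xml fragment looks roughly like this; port and protocol are whatever the installation already uses:

```xml
<!-- URIEncoding makes Tomcat decode query-string parameters as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           URIEncoding="UTF-8"/>
```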

Thank you for your reply.

Sanjailal KP
--




System requirements in my case?

2012-05-22 Thread Bruno Mannina

Dear Solr users,

My company would like to use Solr to index around 80,000,000 documents
(XML files of around 5-10 KB each).

My program (a robot) will connect to this Solr instance with boolean queries.

Number of users: around 1000
Number of requests per user per day: 300
Number of users per day: 30
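As a rough sanity check on query load, taking the figures above at face value (both the 30 users/day figure and the 1000-user total):

```python
# Back-of-the-envelope load estimate from the stated figures.
req_per_user_per_day = 300
seconds_per_day = 86_400

typical_users_per_day = 30
total_users = 1000

avg_qps = typical_users_per_day * req_per_user_per_day / seconds_per_day
worst_qps = total_users * req_per_user_per_day / seconds_per_day

print(round(avg_qps, 2))    # average queries/second with 30 users/day
print(round(worst_qps, 1))  # if all 1000 users were active every day
```

Either way the query rate is modest; index size and RAM, not QPS, are likely the limiting factors here.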

I would like to subscribe to a host provider with this configuration:
- Dedicated server
- Ubuntu
- Intel Xeon/i7, 2 x 2.66+ GHz, 12 GB RAM, 2 x 1500 GB disks
- Unlimited bandwidth
- Fixed IP

Do you think this configuration is enough?

Thanks for your info,
Sincerely
Bruno


Re: Question about sampling

2012-05-22 Thread Lance Norskog
My mistake - I did not check whether the data above is stored as
strings. The hashcode has to be stored as a string for this trick to
work.
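The sampling idea itself can be sketched outside Solr: hash each document id, keep one bucket out of N, facet over the sample, and scale the counts back up. This is plain Python for illustration, not a Solr feature:

```python
import hashlib
from collections import Counter

# Fake corpus: 100,000 docs split evenly between two facet values.
docs = [{"id": str(i), "category": "even" if i % 2 == 0 else "odd"}
        for i in range(100_000)]

RATE = 10  # keep ids whose hash falls into 1 bucket out of 10 (~10% sample)
sample = [d for d in docs
          if int(hashlib.md5(d["id"].encode()).hexdigest(), 16) % RATE == 0]

# Facet over the sample, then scale each count by the sampling rate.
estimated = Counter(d["category"] for d in sample)
for cat, n in sorted(estimated.items()):
    print(cat, n * RATE)  # approximate full-set count per category
```

The scaled counts land close to the true 50,000 per category; the error shrinks as the sample grows.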

On Sun, May 20, 2012 at 8:25 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 I'd be curious about this, too!
 I suspect the answer is: not doable, patches welcome. :)
 But I'd love to be wrong!

 Otis
 
 Performance Monitoring for Solr / ElasticSearch / HBase - 
 http://sematext.com/spm




 From: Yuval Dotan yuvaldo...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Wednesday, May 16, 2012 9:43 AM
Subject: Question about sampling

Hi Guys
We have an environment containing billions of documents.
Faceting over this large result set could take many seconds, and so we
thought we might be able to use statistical sampling of a smaller result
set from the facet, and give an approximate result much quicker.
Is there any way to facet only a random sample of the results?
Thanks
Yuval






-- 
Lance Norskog
goks...@gmail.com


Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-22 Thread Gora Mohanty
On 22 May 2012 12:07, KP Sanjailal kpsanjai...@gmail.com wrote:
 Hi,

 Thank you so much for replying.

 The MySQL database server is running on a Fedora Core 12 Machine with Hindi
 Language Support enabled.  Details of the database are - ENGINE=MyISAM and
 DEFAULT CHARSET=utf8

 Data is imported using the Solr DataImportHandler (mysql jdbc driver).
 In the schema.xml file the title field is defined as:
 <field name="title" type="text_general" indexed="true" stored="true"/>

Please show us your schema.xml and the configuration
file for the DataImportHandler (you might wish to obscure
sensitive details like username/password). Have you tried
the SELECT from the DIH configuration outside of Solr? Is
it producing proper UTF-8?

Regards,
Gora


fsv=true not returning sort_values for distributed searches

2012-05-22 Thread XJ
We use fsv=true to help debug sorting, which works great for
non-distributed searches. However, it's not working (no sort_values in
the response) for multi-shard queries. Any idea how to get this fixed?

thanks,
XJ


Strategy for maintaining De-normalized indexes

2012-05-22 Thread Sohail Aboobaker
Hi,

I have a very basic question and hopefully there is a simple answer to
it. We are trying to index a simple product catalog which has master
products and child products. Each master product can have multiple child
products, and a master product can be assigned one or more product
categories. We need to be able to show counts of categories based on the
number of child products in each category. We have indexed the data using
a join, selecting appropriate values for the index from each table. This
is basically a de-normalized result set, and it works perfectly for our
search purposes. However, maintaining the index and keeping it up to date
is an issue: whenever a product master is updated with a new category, we
will need to delete all the index entries for its child products and
insert them again. This seems like a lot of activity for a regular,
on-going operation, i.e. product category updates.

Since join between schemas is only available in 4.0, what are other
strategies to maintain such an index or to create such queries?

Thanks for your help.

Regards,
Sohail


RE: trunk cloud ui not working

2012-05-22 Thread Phil Hoy
Hi,

I was using Windows 7, but it is fine with Chrome on Windows Web Server 2008 R2,
and I asked a colleague with Windows 7 and it is fine for him too, so I am really
sorry, but I think it was a 'works on my machine' thing.

Of course, if I track down the cause I will reply to this email again.

Thanks,
Phil

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 21 May 2012 18:22
To: solr-user@lucene.apache.org
Subject: Re: trunk cloud ui not working

What OS? I was just trying trunk and looking at that view with Chrome on OS X and
Linux and did not see an issue.

On May 21, 2012, at 1:15 PM, Phil Hoy wrote:

 After further investigation I have found that it is not a problem on Firefox,
 only Chrome and IE.
 
 Phil
 
 -Original Message-
 Sent: 21 May 2012 18:05
 To: solr-user@lucene.apache.org
 Subject: trunk cloud ui not working
 
 Hi,
 
 I am running from the trunk and the localhost:8983/solr/#/~cloud page shows 
 nothing but Fetch Zookeeper Data.
 
 If I run fiddler I see that:
 http://localhost:8983/solr/zookeeper?wt=jsondetail=truepath=%2Fclust
 erstate.json
 and
 http://localhost:8983/solr/zookeeper?wt=jsonpath=%2Flive_nodes
 are called and return data but no update to the ui.
 
 Cheers,
 Phil
 
 

- Mark Miller
lucidimagination.com



Re: How can i search site name

2012-05-22 Thread Jan Høydahl
You need to explain your case in much more detail to get precise help. Please
read http://wiki.apache.org/solr/UsingMailingLists

If your problem is that you have a URL and want to know the domain for it, e.g.
www.company.com/foo/bar/index.html, and you want only www.company.com, you can
use the UrlClassifyProcessor; see SOLR-2826.
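If the goal is just the host part, the extraction itself is simple. A sketch using Python's standard library; the URL is the example from above, with a scheme added so parsing works:

```python
from urllib.parse import urlparse

# Pull the host out of a stored URL so it can be indexed in its own
# field and queried directly.
url = "http://www.company.com/foo/bar/index.html"
host = urlparse(url).netloc
print(host)  # www.company.com
```

An update processor like UrlClassifyProcessor does this kind of extraction at index time so the domain lands in a separate, searchable field.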

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com




Re: System requirements in my case?

2012-05-22 Thread findbestopensource
A dedicated server may not be required. If you want to cut down cost, then
prefer a shared server.

How much RAM?

Regards
Aditya
www.findbestopensource.com





Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Tanguy Moal
Hello,

Can't the ID (uniqueKey) of the indexed documents (i.e. the denormalized data)
be a combination of the master product id and the child product id?

Therefore, whenever you update your master product DB entry, you simply need
to reindex the documents that depend on that master product entry.

You can even simply reindex your whole DB; updates are made in place (i.e.
old documents are *completely* overwritten by their respective updates).

There's nothing to delete if you build your unique key in a maintainable
way.

You can re-index documents whenever you need to do so.
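The composite-key idea can be illustrated with a dictionary standing in for the index, since Solr overwrites a document whose uniqueKey already exists much as a dict overwrites a key. All names here are made up for illustration:

```python
# The index behaves like a map from uniqueKey to document:
# re-adding a document with the same key replaces the old one.
index = {}

def index_child(master_id, child_id, category):
    doc_id = f"{master_id}_{child_id}"  # composite uniqueKey
    index[doc_id] = {"id": doc_id, "master": master_id,
                     "child": child_id, "category": category}

index_child("M1", "C1", "tools")
index_child("M1", "C2", "tools")

# Master M1 moves to a new category: re-index its children in place.
# No explicit deletes are needed.
index_child("M1", "C1", "hardware")
index_child("M1", "C2", "hardware")

print(len(index))  # still 2 documents, both now in "hardware"
```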

--
Tanguy




Re: System requirements in my case?

2012-05-22 Thread lboutros
Hi Bruno,

will you use facets and result sorting?
What is the update frequency/volume?

This could impact the amount of memory/server count.

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/System-requirements-in-my-case-tp3985309p3985327.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread findbestopensource
That's how de-normalization works: you need to update all child products.

If you just need the count and you are using facets, then maintain a mapping
between category and main product, and between main product and child
product. A Lucene index has no fixed schema, so you can retrieve the data
based on its type.

A category record will have the category name, product name and a type
(CATEGORY_TYPE).
A child product record will have the product name, main product name,
product details, and a type (PRODUCT_TYPE).

With this you may need to use two queries: given the category name, fetch
the main product names, then query with those to fetch the child products.
Hope it helps.
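The two-query approach can be sketched with in-memory stand-ins for the two record types. The record shapes mirror the description above; this is illustration, not Solr API:

```python
# Two record types in one schemaless index, distinguished by "type".
category_docs = [
    {"type": "CATEGORY", "category": "tools", "product": "M1"},
    {"type": "CATEGORY", "category": "tools", "product": "M2"},
]
child_docs = [
    {"type": "PRODUCT", "master": "M1", "name": "C1"},
    {"type": "PRODUCT", "master": "M1", "name": "C2"},
    {"type": "PRODUCT", "master": "M2", "name": "C3"},
]

# Query 1: master products assigned to the category.
masters = {d["product"] for d in category_docs if d["category"] == "tools"}
# Query 2: count the children of those masters.
count = sum(1 for d in child_docs if d["master"] in masters)
print(count)  # 3
```

A category update then only touches the small CATEGORY records, not every child document.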

Regards
Aditya
www.findbestopensource.com





Multicore Solr

2012-05-22 Thread Shanu Jha
Hi all,

greetings from my end. This is my first post on this mailing list. I have
a few questions on multicore Solr. For background, we want to create a core
for each user logged in to our application. In that case it may be 50, 100,
1000, N cores. Each core will be used to write and search an index in real
time.

1. Is this a good idea to go with?
2. What are the pros and cons of this approach?

Awaiting your response.

Regards
AJ


Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina

My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml

24 GB DDR3



Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina

Hi,

Facets I don't know yet, because I don't know exactly what facets are (sorry).

Sorting: yes
Scoring: yes

Concerning update frequency: every week
Volume: around 1 GB of data per year


Thank you very much :)

Aix En Provence
France

Le 22/05/2012 10:35, lboutros a écrit :

Hi Bruno,

will you use facets and result sorting ?
What is the update frequency/volume ?

This could impact the amount of memory/server count.

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/System-requirements-in-my-case-tp3985309p3985327.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: Multicore Solr

2012-05-22 Thread findbestopensource
Having a core per user is not a good idea; the count is too high. Keep
everything in a single core. You can filter the data based on user name or
user id.

Regards
Aditya
www.findbestopensource.com



On Tue, May 22, 2012 at 2:29 PM, Shanu Jha shanuu@gmail.com wrote:

 Hi all,

 greetings from my end. This is my first post on this mailing list. I have
 few questions on multicore solr. For background we want to create a core
 for each user logged in to our application. In that case it may be 50, 100,
 1000, N-numbers. Each core will be used to write and search index in real
 time.

 1. Is this a good idea to go with?
 2. What are the pros and cons of this approach?

 Awaiting your response.

 Regards
 AJ
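
The "filter the data based on user name or user id" suggestion above usually means a single shared core plus a per-request fq (filter query), instead of per-user cores; fq clauses are also cached independently of the main query. A minimal sketch, assuming every document was indexed with a user_id field (the field name and URL are assumptions):

```python
from urllib.parse import urlencode

def user_query(base_url: str, user_id: str, q: str) -> str:
    """Build a single-core Solr select URL restricted to one user's documents.
    Assumes a user_id field exists on every indexed document."""
    params = {"q": q, "fq": f"user_id:{user_id}", "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"

print(user_query("http://localhost:8983/solr", "42", "title:report"))
# http://localhost:8983/solr/select?q=title%3Areport&fq=user_id%3A42&wt=json
```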



Re: System requirements in my case?

2012-05-22 Thread findbestopensource
Seems to be fine. Go ahead.

Before hosting, have you tried/tested your application in a local setup?
RAM usage is what matters most for Solr. Benchmark your app with 100
000 documents, log the memory used, and extrapolate the RAM required for
80 000 000 documents.

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 2:36 PM, Bruno Mannina bmann...@free.fr wrote:

 My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml

 24 Go DDR3

 Le 22/05/2012 10:26, findbestopensource a écrit :

  Dedicated Server may not be required. If you want to cut down cost, then
 prefer shared server.

 How much the RAM?

 Regards
 Aditya
 www.findbestopensource.com


 On Tue, May 22, 2012 at 12:36 PM, Bruno Manninabmann...@free.fr  wrote:

  Dear Solr users,

 My company would like to use solr to index around 80 000 000 documents
 (xml files with around 5~10ko size each).
 My program (robot) will connect to this solr with boolean requests.

 Number of users: around 1000
 Number of requests by user and by day: 300
 Number of users by day: 30

 I would like to subscribe to a host provider with this configuration:
 - Dedicated Server
 - Ubuntu
 - Intel Xeon i7 2x 266+ GHz 12 Go 2 * 1500Go
 - Unlimited bandwidth
 - IP fixe

 Do you think this configuration is enough?

 Thanks for your info,
 Sincerely
 Bruno





Re: is commit a sequential process in solr indexing

2012-05-22 Thread findbestopensource
Yes. Lucene/Solr supports multi-threaded use. You can commit
from two different threads to the same core or to different cores.

Regards
Aditya
www.findbestopensource.com

On Tue, May 22, 2012 at 12:35 AM, jame vaalet jamevaa...@gmail.com wrote:

 hi,
 my use case here is to search all the incoming documents for certain
 combinations of words which are pre-determined. So what I am doing is:
 create a batch of x docs according to their creation date, index them,
 commit them, and search them for the (pre-determined) query.
 My question is: if I make the entire process multi-threaded and two
 threads are trying to commit two different sets of batches, will the commits
 happen in parallel? What if I am trying to commit to different solr-cores?

 --

 -JAME
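
A toy model (plain Python, not Solr code) of the answer above: commits to one core serialize on that core's single writer lock, while commits to different cores can run in parallel. All names here are illustrative:

```python
import threading
import time

# One write lock per core, mirroring Lucene's one-IndexWriter-per-core rule.
core_locks = {"core0": threading.Lock(), "core1": threading.Lock()}
committed = []

def commit(core: str) -> None:
    with core_locks[core]:  # same core => serialized; different cores => parallel
        time.sleep(0.1)     # stand-in for the actual commit work
        committed.append(core)

start = time.monotonic()
threads = [threading.Thread(target=commit, args=(c,)) for c in ("core0", "core1")]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start  # ~0.1 s, because the two cores overlap
print(sorted(committed), elapsed < 0.19)
```

Had both threads targeted the same core, the two sleeps would have serialized and `elapsed` would be closer to 0.2 s.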



Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Sohail Aboobaker
Thank you for quick replies.

Can't the ID (uniqueKey) of the indexed documents (i.e. denormalized data)
be a combination of the master product id and the child product id ?
  -- We do not need it as each child is already a unique key.

Therefore whenever you update your master product db entry, you simply
need  to reindex documents depending on the master product entry.
  -- This is where the confusion might be. I may have misread it, but Apache
Solr 3 Enterprise Search mentions that if any part of a document
needs to be updated, the entire document must be replaced. Internally this
is a deletion and an addition. Is re-indexing all detail records a huge
performance hit, assuming that a master can have up to 10-20k child
records?

Thanks again.

Sohail


Re: How can i search site name

2012-05-22 Thread Jan Høydahl
Hi,

I would probably use (e)DisMax.
Index your url and metadata fields as text without stemming, e.g. text_general
Then query as q=mycompany&defType=edismax&qf=title^10 content^1 url^5
If you like to give higher weight to the domain/site part of the URL, apply 
UrlClassifyProcessor and search the domain field separately with higher 
weight.

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 22. mai 2012, at 12:23, Shameema Umer wrote:

 Thanks Li Li and Jan.
 
 Yes, if url is www.company.com/foo/bar/index.html, I should be able to
 search the sub-strings like company, foo or bar etc.
 
 when I changed the part of my schema file from
 
 <defaultSearchField>content</defaultSearchField>
 
 to
 
   <defaultSearchField>stext</defaultSearchField>
   <copyField source="title" dest="stext"/>
   <copyField source="content" dest="stext"/>
   <copyField source="site" dest="stext"/>
 
 A server error occurred after restarting Solr. Do I need to re-index Solr?
 Please help me, as I need to search title, url and content with priority
 given to title. If DisMaxRequestHandler helps me solve my problems, let me
 know the best tutorial page to study it:
 http://wiki.apache.org/solr/DisMaxRequestHandler?action=fullsearch&context=180&value=linkto%3A%22DisMaxRequestHandler%22
 
 Thanks
 Shameema
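
The server error in the quoted message is most likely because stext is referenced before it is defined: both defaultSearchField and a copyField dest must name an existing field, and copied-to fields do require a full re-index. A hedged sketch of the likely missing schema.xml pieces (field and type names are assumptions based on the mail):

```xml
<!-- catch-all field the copyFields write into; define it before referencing it -->
<field name="stext" type="text_general" indexed="true" stored="false"
       multiValued="true"/>

<copyField source="title"   dest="stext"/>
<copyField source="content" dest="stext"/>
<copyField source="site"    dest="stext"/>

<defaultSearchField>stext</defaultSearchField>
```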



Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Tanguy Moal
It all depends on the frequency at which you refresh your data, on your
deployment (master/slave setup), ...
Many things need to be taken into account!

Did you face any performance issue while building your index?
If you didn't, rebuilding it shouldn't be more problematic.

--
Tanguy

2012/5/22 Sohail Aboobaker sabooba...@gmail.com

 Thank you for quick replies.

 Can't the ID (uniqueKey) of the indexed documents (i.e. denormalized data)
 be a combination of the master product id and the child product id ?
   -- We do not need it as each child is already a unique key.

 Therefore whenever you update your master product db entry, you simply
 need  to reindex documents depending on the master product entry.
   -- This is where the confusion might be. I may have misread it but Apache
 Solr3 Enterprise Search, it mentions that if any part of the document
 needs to be updated, the entire document must be replaced. Internally this
 is a deletion and an addition. Is re-indexing all detail records a huge
 performance hit? Assuming that a master can have upto 10 to 20k of child
 records?

 Thanks again.

 Sohail



Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Sohail Aboobaker
We are still in design phase, so we haven't hit any performance issues. We
do not want to discover performance issues too late during QA :) We would
rather account for any issues during the design phase.

The refresh rate on fields that we are using from master table will be
rare. May be three or four times in a year.

Regards,
Sohail


Re: How can i search site name

2012-05-22 Thread Shameema Umer
Thanks Jan. *It worked perfectly.* That's all I needed.
May God bless you.

Regards
Shameema

On Tue, May 22, 2012 at 4:57 PM, Jan Høydahl jan@cominvent.com wrote:

 Hi,

 I would probably use (e)DisMax.
 Index your url and metadata fields as text without stemming, e.g.
 text_general
  Then query as q=mycompany&defType=edismax&qf=title^10 content^1 url^5
 If you like to give higher weight to the domain/site part of the URL,
 apply UrlClassifyProcessor and search the domain field separately with
 higher weight.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.facebook.com/Cominvent
 Solr Training - www.solrtraining.com

 On 22. mai 2012, at 12:23, Shameema Umer wrote:

  Thanks Li Li and Jan.
 
  Yes, if url is www.company.com/foo/bar/index.html, I should be able to
  search the sub-strings like company, foo or bar etc.
 
  when I changed the part of my schema file from
 
  <defaultSearchField>content</defaultSearchField>

   to

    <defaultSearchField>stext</defaultSearchField>
    <copyField source="title" dest="stext"/>
    <copyField source="content" dest="stext"/>
    <copyField source="site" dest="stext"/>
 
  server error occurred after restarting solr. Do I need to re-index solr.
  Please help me as i need to search title url and content with privilege
 to
  title. If DisMaxRequestHandler helps me solve my problems, let me know
 the
  best tutorial page to study
  it.
  http://wiki.apache.org/solr/DisMaxRequestHandler?action=fullsearch&context=180&value=linkto%3A%22DisMaxRequestHandler%22
 
 
  Thanks
  Shameema
  
 




Re: System requirements in my case?

2012-05-22 Thread Jan Høydahl
Hi,

It is impossible to guess the required HW size without more knowledge about 
data and usage. 80 mill docs is a fair amount.

Here's how I would approach sizing the setup:
1) Get your schema in shape, removing unnecessary stored/indexed fields
2) Do a test index locally of a part of the dataset, e.g. 10 mill docs, and
perform an Optimize
3) Measure the size of the index folder, multiply with 8 to get a clue of total 
index size
4) Do some benchmarking with realistic types of queries to identify performance 
bottlenecks on query side
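
Step 3's multiplication can be sketched as a naive linear extrapolation (the 2 GB / 10-million-doc sample figures below are made up; real index growth is often slightly sub-linear thanks to term-dictionary sharing):

```python
def extrapolate_index_size(sample_gb: float, sample_docs: int, total_docs: int) -> float:
    """Scale an optimized sample index linearly to the full document count."""
    return sample_gb * total_docs / sample_docs

# Hypothetical: a 10M-doc sample index measuring 2 GB, scaled to 80M docs.
print(extrapolate_index_size(2.0, 10_000_000, 80_000_000))  # 16.0
```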

Depending on your requirements for search performance, you can beef up your RAM 
to hold the whole index or depend on slow disks as a bottleneck. If you find 
that total size of index is 16Gb, you should leave 16Gb free for OS disk 
caching, e.g. allocate 8Gb to Tomcat/Solr and leave the rest for the OS. If I 
should guess, you probably find that one server gets overloaded or too slow 
with your amount of docs, and that you end up with sharding across 2-4 servers.

PS: Do you always need to search all data? A trick may be to partition your 
data such that say 80% of searches go to a fresh index with 10% of the 
content, while the remaining searches include everything.

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 22. mai 2012, at 11:06, Bruno Mannina wrote:

 My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml
 
 24 Go DDR3
 
 Le 22/05/2012 10:26, findbestopensource a écrit :
 Dedicated Server may not be required. If you want to cut down cost, then
 prefer shared server.
 
 How much the RAM?
 
 Regards
 Aditya
 www.findbestopensource.com
 
 
 On Tue, May 22, 2012 at 12:36 PM, Bruno Manninabmann...@free.fr  wrote:
 
 Dear Solr users,
 
 My company would like to use solr to index around 80 000 000 documents
 (xml files with around 5~10ko size each).
 My program (robot) will connect to this solr with boolean requests.
 
 Number of users: around 1000
 Number of requests by user and by day: 300
 Number of users by day: 30
 
 I would like to subscribe to a host provider with this configuration:
 - Dedicated Server
 - Ubuntu
 - Intel Xeon i7 2x 266+ GHz 12 Go 2 * 1500Go
 - Unlimited bandwidth
 - IP fixe
 
 Do you think this configuration is enough?
 
 Thanks for your info,
 Sincerely
 Bruno
 
 



Re: Multicore Solr

2012-05-22 Thread Shanu Jha
Hi,

Could you please tell me what you mean by filtering data by users? I would like
to know whether there is a real problem with creating a core per user, i.e.
resource utilization, CPU usage, etc.

AJ

On Tue, May 22, 2012 at 4:39 PM, findbestopensource 
findbestopensou...@gmail.com wrote:

 Having cores per user is not good idea. The count is too high. Keep
 everything in single core. You could filter the data based on user name or
 user id.

 Regards
 Aditya
 www.findbestopensource.com



 On Tue, May 22, 2012 at 2:29 PM, Shanu Jha shanuu@gmail.com wrote:

  Hi all,
 
  greetings from my end. This is my first post on this mailing list. I have
  few questions on multicore solr. For background we want to create a core
  for each user logged in to our application. In that case it may be 50,
 100,
  1000, N-numbers. Each core will be used to write and search index in real
  time.
 
  1. Is this a good idea to go with?
  2. What are the pros and cons of this approach?

  Awaiting your response.
 
  Regards
  AJ
 



Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-22 Thread Yandong Yao
Hi Darren,

Thanks very much for your reply.

The reason I want to control core indexing/searching is that I want to
use one core to store each customer's data (all customers share the same
config): e.g. customer 1 uses coreForCustomer1 and customer 2
uses coreForCustomer2.

Is there any better way than using a different core for each customer?

Another way may be to use a different collection per customer, though I am not
sure how many collections Solr Cloud can support. Which way is better
in terms of flexibility/scalability? (Suppose there are tens of thousands of
customers.)

Regards,
Yandong

2012/5/22 Darren Govoni dar...@ontrenet.com

 Why do you want to control what gets indexed into a core and then
 knowing what core to search? That's the kind of knowing that SolrCloud
 solves. In SolrCloud, it handles the distribution of documents across
 shards and retrieves them regardless of which node is searched from.
 That is the point of cloud, you don't know the details of where
 exactly documents are being managed (i.e. they are cloudy). It can
 change and re-balance from time to time. SolrCloud performs the
 distributed search for you, therefore when you try to search a node/core
 with no documents, all the results from the cloud are retrieved
 regardless. This is considered A Good Thing.

 It requires a change in thinking about indexing and searching

 On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote:
  Hi Guys,
 
  I use following command to start solr cloud according to solr cloud wiki.
 
  yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf
  -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
  yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983
 -jar
  start.jar
 
  Then I have created several cores using CoreAdmin API (
   http://localhost:8983/solr/admin/cores?action=CREATE&name=
   coreName&collection=collection1), and clusterstate.json show following
  topology:
 
 
  collection1:
  -- shard1:
-- collection1
-- CoreForCustomer1
-- CoreForCustomer3
-- CoreForCustomer5
  -- shard2:
-- collection1
-- CoreForCustomer2
-- CoreForCustomer4
 
 
  1) Index:
 
  Using following command to index mem.xml file in exampledocs directory.
 
  yydzero:exampledocs bjcoe$ java -Durl=
  http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml
  SimplePostTool: version 1.4
  SimplePostTool: POSTing files to
  http://localhost:8983/solr/coreForCustomer3/update..
  SimplePostTool: POSTing file mem.xml
  SimplePostTool: COMMITting Solr index changes.
 
  And now SolrAdmin UI shows that 'coreForCustomer1', 'coreForCustomer3',
  'coreForCustomer5' has 3 documents (mem.xml has 3 documents) and other 2
  core has 0 documents.
 
   *Question 1:* Is this expected behavior? How do I index documents into
   a specific core?
 
  *Question 2*:  If SolrCloud don't support this yet, how could I extend it
  to support this feature (index document to particular core), where
 should i
  start, the hashing algorithm?
 
  *Question 3*:  Why the documents are also indexed into 'coreForCustomer1'
  and 'coreForCustomer5'?  The default replica for documents are 1, right?
 
  Then I try to index some document to 'coreForCustomer2':
 
  $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar
  post.jar ipod_video.xml
 
  While 'coreForCustomer2' still have 0 documents and documents in
 ipod_video
  are indexed to core for customer 1/3/5.
 
  *Question 4*:  Why this happens?
 
  2) Search: I use
  http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml to
  search against 'CoreForCustomer2', while it will return all documents in
  the whole collection even though this core has no documents at all.

  Then I use
  http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2,
  and it will return 0 documents.
 
  *Question 5*: So If want to search against a particular core, we need to
  use 'shards' parameter and use solrCore name as parameter value, right?
 
 
  Thanks very much in advance!
 
  Regards,
  Yandong
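
Regarding Question 2 above: distribution hashes the document's uniqueKey to pick a shard, which is why hand-picking a core does not stick. A toy sketch of that idea next to the explicit customer-to-core mapping the thread discusses (this is not SolrCloud's actual hash function; all names are illustrative):

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Toy stand-in for hash-based routing: a stable hash of the uniqueKey
    decides the shard, so the client never picks a core by hand."""
    h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
    return h % num_shards

def core_for_customer(customer_id: int) -> str:
    """The per-tenant alternative: route every request for one customer
    to a dedicated core."""
    return f"coreForCustomer{customer_id}"

# Routing is deterministic: the same id always lands on the same shard.
print(shard_for("SOLR1000", 2), core_for_customer(3))
```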





Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina
I installed a temporary server at my university with 12 000 docs (Ubuntu + Solr
3.6.0).

Maybe I can estimate the amount of memory I need?

Q: How can I check the memory used?
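
One Linux-side answer to the "how can I check the memory used" question is to read the Solr JVM's resident set size; a sketch assuming Solr was started with java -jar start.jar (the pgrep pattern is an assumption about your command line):

```shell
# Print the resident memory (RSS) of the Solr JVM in MB, if it is running.
pid=$(pgrep -f 'start\.jar' | head -n1)
if [ -n "$pid" ]; then
  ps -o rss= -p "$pid" | awk '{printf "Solr RSS: %.1f MB\n", $1/1024}'
else
  echo "no start.jar process found"
fi
```

For a finer breakdown of heap versus non-heap usage, `jstat -gc <pid>` or the JConsole GUI can be pointed at the same process.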


Le 22/05/2012 13:14, findbestopensource a écrit :

Seems to be fine. Go head.

Before hosting, Have you tried / tested your application in local setup.
RAM usage is what matters in terms of Solr. Just benchmark your app for 100
000 documents, Log the memory used. Calculate the RAM reqd for 80 000 000
documents.

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 2:36 PM, Bruno Manninabmann...@free.fr  wrote:


My choice:
http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml

24 Go DDR3

Le 22/05/2012 10:26, findbestopensource a écrit :

  Dedicated Server may not be required. If you want to cut down cost, then

prefer shared server.

How much the RAM?

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 12:36 PM, Bruno Manninabmann...@free.fr   wrote:

  Dear Solr users,

My company would like to use solr to index around 80 000 000 documents
(xml files with around 5~10ko size each).
My program (robot) will connect to this solr with boolean requests.

Number of users: around 1000
Number of requests by user and by day: 300
Number of users by day: 30

I would like to subscribe to a host provider with this configuration:
- Dedicated Server
- Ubuntu
- Intel Xeon i7 2x 266+ GHz 12 Go 2 * 1500Go
- Unlimited bandwidth
- IP fixe

Do you think this configuration is enough?

Thanks for your info,
Sincerely
Bruno






RE: Wildcard-Search Solr 3.5.0

2012-05-22 Thread spring
  The text may contain FooBar.
  
  When I do a wildcard search like this: Foo* - no hits.
  When I do a wildcard search like this: foo* - doc is
  found.
 
 Please see http://wiki.apache.org/solr/MultitermQueryAnalysis


Well, it works in 3.6, with one exception: if I use German umlauts it does
not work anymore.

Text: Bär

Bä* - no hits
Bär - hits

What can I do in this case?

Thank you
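
One hedged possibility for the umlaut case above: make sure the filters that normalized the indexed text (lowercasing, plus accent folding if any) also run at multiterm query time. In Solr 3.6 that can be spelled out explicitly with a multiterm analyzer on the field type; this is a sketch, not the poster's actual schema:

```xml
<fieldType name="text_multiterm" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- applied to wildcard/prefix terms such as Bä* before matching -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If the index-time chain folds ä to a (e.g. with an ASCIIFoldingFilter), the same filter has to appear in the multiterm analyzer too, or Bä* can never match.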



Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina

Hi Jan,

Thanks for all these details !

Answers are below.

Sincerely,
Bruno


Le 22/05/2012 13:58, Jan Høydahl a écrit :

Hi,

It is impossible to guess the required HW size without more knowledge about 
data and usage. 80 mill docs is a fair amount.

Here's how I would approach sizing the setup:
1) Get your schema in shape, removing unnecessary stored/indexed fields

Ok good idea !

2) Do a test index locally of a part of the dataset, e.g. 10 mill docs, and
perform an Optimize

Concerning tests, I currently have only a sample of 12 000 docs, no more :'(

3) Measure the size of the index folder, multiply with 8 to get a clue of total 
index size

With 12 000 docs my index folder size is 33 MB.
ps: I use solr.clustering.enabled=true


4) Do some benchmarking with realistic types of queries to identify performance 
bottlenecks on query side

yep, this point is for later.


Depending on your requirements for search performance, you can beef up your RAM to
hold the whole index or depend on slow disks as a bottleneck. If you find that
total size of index is 16Gb, you should leave 16Gb free for OS disk caching,
e.g. allocate 8Gb to Tomcat/Solr and leave the rest for the OS. If I should guess,
you probably find that one server gets overloaded or too slow with your amount of
docs, and that you end up with sharding across 2-4 servers.

I will take a look to see if I can easily increase the RAM on the server
(currently 24 GB).


Another question concerning running Solr: do I just have to run java
-jar start.jar?

Or do you think I should run it another way?



PS: Do you always need to search all data? A trick may be to partition your data such 
that say 80% of searches go to a fresh index with 10% of the content, while 
the remaining searches include everything.
Yes, I need to search the whole index; even old documents must be
searchable.




--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 22. mai 2012, at 11:06, Bruno Mannina wrote:


My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml

24 Go DDR3

Le 22/05/2012 10:26, findbestopensource a écrit :

Dedicated Server may not be required. If you want to cut down cost, then
prefer shared server.

How much the RAM?

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 12:36 PM, Bruno Manninabmann...@free.fr   wrote:


Dear Solr users,

My company would like to use solr to index around 80 000 000 documents
(xml files with around 5~10ko size each).
My program (robot) will connect to this solr with boolean requests.

Number of users: around 1000
Number of requests by user and by day: 300
Number of users by day: 30

I would like to subscribe to a host provider with this configuration:
- Dedicated Server
- Ubuntu
- Intel Xeon i7 2x 266+ GHz 12 Go 2 * 1500Go
- Unlimited bandwidth
- IP fixe

Do you think this configuration is enough?

Thanks for your info,
Sincerely
Bruno








Re: Newbie with Carrot2?

2012-05-22 Thread Stanislaw Osinski
Hi Bruno,

Just to confirm -- are you seeing the clusters array in the result at all
(<arr name="clusters">)? To get reasonable clusters, you should request at
least 30-50 documents (rows), but even with smaller values, you should see
an empty clusters array.

Staszek

On Sun, May 20, 2012 at 9:20 PM, Bruno Mannina bmann...@free.fr wrote:

 Le 20/05/2012 11:43, Stanislaw Osinski a écrit :

  Hi Bruno,

 Here's the wiki documentation for Solr's clustering component:

  http://wiki.apache.org/solr/ClusteringComponent

  For configuration examples, take a look at the Configuration section:
  http://wiki.apache.org/solr/ClusteringComponent#Configuration

 If you hit any problems, let me know.

 Staszek

 On Sun, May 20, 2012 at 11:38 AM, Bruno Manninabmann...@free.fr  wrote:

  Dear all,

 I use Solr 3.6.0 and I indexed some documents (around 12000).
 Each documents contains a Abstract-en field (and some other fields).

 Is it possible to use Carrot2 to create cluster (classes) with the
 Abstract-en field?

 What must I configure in the schema.xml ? or in other files?

 Sorry for my newbie question, but I found only documentation for
 Workbench
 tool.

 Bruno

  Thanks for this link, but I have a problem configuring my solrconfig.xml
  in that section
  (note: I run java -Dsolr.clustering.enabled=true).

  I have a field named abstract-en, and I would like to use only this field.

  I would like to know if my requestHandler is good.
  I have a doubt about the content of carrot.title and carrot.url,

  and also the last fields:
  <str name="df">abstract-en</str>
  <str name="defType">edismax</str>
  <str name="qf">
    abstract-en^1.0
  </str>
  <str name="q.alt">*:*</str>
  <str name="rows">10</str>
  <str name="fl">*,score</str>

 because the result when I do a request is exactly like a search request
 (without more information)


 My entire requestHandler is:

  <requestHandler name="/clustering" startup="lazy"
      enable="${solr.clustering.enabled:false}" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">default</str>
      <bool name="clustering.results">true</bool>
      <!-- The title field -->
      <str name="carrot.title">name</str>
      <str name="carrot.url">id</str>
      <!-- The field to cluster on -->
      <str name="carrot.snippet">abstract-en</str>
      <!-- produce summaries -->
      <bool name="carrot.produceSummary">true</bool>
      <!-- the maximum number of labels per cluster -->
      <!--<int name="carrot.numDescriptions">5</int>-->
      <!-- produce sub clusters -->
      <bool name="carrot.outputSubClusters">false</bool>
      <str name="df">abstract-en</str>
      <str name="defType">edismax</str>
      <str name="qf">
        abstract-en^1.0
      </str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>




Re: Question about sampling

2012-05-22 Thread rita
Hi Lance, 
Could you provide more details about implementing this using
SignatureUpdateProcessor? 
Example can be helpful. 

-
Rita
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-sampling-tp3984103p3985379.html
Sent from the Solr - User mailing list archive at Nabble.com.
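
A hedged sketch of what the SignatureUpdateProcessor suggestion usually looks like in solrconfig.xml, following the standard deduplication setup (the signature field name sig and the source fields title,content are assumptions; the "sampled" subset would then be the documents that survive overwriteDupes):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- field that receives the computed hash; must be defined in schema.xml -->
    <str name="signatureField">sig</str>
    <!-- collapse later documents that produce an identical signature -->
    <bool name="overwriteDupes">true</bool>
    <!-- source fields the signature is computed from (assumed names) -->
    <str name="fields">title,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain is then typically referenced from the update handler (e.g. an update.chain default) so it runs on every add.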


Multicore solr

2012-05-22 Thread Shanu Jha
Hi all,

greetings from my end. This is my first post on this mailing list. I have
few questions on multicore solr. For background we want to create a core
for each user logged in to our application. In that case it may be 50, 100,
1000, N-numbers. Each core will be used to write and search index in real
time.

1. Is this a good idea to go with?
2. What are the pros and cons of this approach?

Awaiting your response.

Regards
AJ


Re: Newbie with Carrot2?

2012-05-22 Thread Bruno Mannina

Arfff

Clusters are at the end of my XML answer 

<doc>
</doc>
<doc>
</doc>
<doc>
</doc>
<doc>
</doc>
..
..
<cluster>
</cluster>

ok all work fine now !


Le 22/05/2012 15:33, Stanislaw Osinski a écrit :

Hi Bruno,

Just to confirm -- are you seeing the clusters array in the result at all
(<arr name="clusters">)? To get reasonable clusters, you should request at
least 30-50 documents (rows), but even with smaller values, you should see
an empty clusters array.

Staszek

On Sun, May 20, 2012 at 9:20 PM, Bruno Manninabmann...@free.fr  wrote:


Le 20/05/2012 11:43, Stanislaw Osinski a écrit :

  Hi Bruno,

Here's the wiki documentation for Solr's clustering component:

http://wiki.apache.org/solr/ClusteringComponent

For configuration examples, take a look at the Configuration section:
http://wiki.apache.org/solr/ClusteringComponent#Configuration

If you hit any problems, let me know.

Staszek

On Sun, May 20, 2012 at 11:38 AM, Bruno Manninabmann...@free.fr   wrote:

  Dear all,

I use Solr 3.6.0 and I indexed some documents (around 12000).
Each documents contains a Abstract-en field (and some other fields).

Is it possible to use Carrot2 to create cluster (classes) with the
Abstract-en field?

What must I configure in the schema.xml ? or in other files?

Sorry for my newbie question, but I found only documentation for
Workbench
tool.

Bruno

  Thx for this link but I have a problem to configure my solrconfig.xml

in the section:
(note I run java -Dsolr.clustering.enabled=true)

I have a field named abstract-en, and I would like to use only this field.

I would like to know if my requestHandler is good?
I have a doubt with the content of  : carrot.title, carrot.url

and also the last fields:
<str name="df">abstract-en</str>
<str name="defType">edismax</str>
<str name="qf">
  abstract-en^1.0
</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>

because the result when I do a request is exactly like a search request
(without more information)


My entire requestHandler is:

<requestHandler name="/clustering" startup="lazy"
    enable="${solr.clustering.enabled:false}" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <!-- The title field -->
    <str name="carrot.title">name</str>
    <str name="carrot.url">id</str>
    <!-- The field to cluster on -->
    <str name="carrot.snippet">abstract-en</str>
    <!-- produce summaries -->
    <bool name="carrot.produceSummary">true</bool>
    <!-- the maximum number of labels per cluster -->
    <!--<int name="carrot.numDescriptions">5</int>-->
    <!-- produce sub clusters -->
    <bool name="carrot.outputSubClusters">false</bool>
    <str name="df">abstract-en</str>
    <str name="defType">edismax</str>
    <str name="qf">
      abstract-en^1.0
    </str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>






Uncatchable Exception on solrj3.6.0

2012-05-22 Thread Jamel ESSOUSSI
Hi,

I use solr-solrj 3.6.0 and solr-core 3.6.0:

I have overridden the handleError method of the ConcurrentUpdateSolrServer
class:


final ConcurrentUpdateSolrServer newSolrServer =
        new ConcurrentUpdateSolrServer(url, client, 100, 10) {
    @Override
    public void handleError(Throwable ex) {
        super.handleError(ex);
    }
};

My problem is that when an exception is thrown on the Solr server side, I
cannot catch it on the client side.

Thanks

-- Jamel E

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Uncatchable-Exception-on-solrj3-6-0-tp3985437.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-22 Thread Darren Govoni

I'm curious what the solrcloud experts say, but my suggestion is to try not to
over-engineer the search architecture on solrcloud. For example, what is
the benefit of managing which cores are indexed and searched? Having to know
those details, in my mind, works against the automation in solrcloud, but maybe
there's a good reason you want to do it this way.


--- Original Message ---
On 5/22/2012 07:35 AM Yandong Yao wrote:

Hi Darren,

Thanks very much for your reply.

The reason I want to control core indexing/searching is that I want to
use one core to store one customer's data (all customers share the same
config): such as customer 1 uses coreForCustomer1 and customer 2
uses coreForCustomer2.

Is there any better way than using different cores for different customers?

Another way may be to use a different collection for each customer, while I am
not sure how many collections SolrCloud could support. Which way is better
in terms of flexibility/scalability? (suppose there are tens of thousands of
customers).

Regards,
Yandong

2012/5/22 Darren Govoni dar...@ontrenet.com

 Why do you want to control what gets indexed into a core and then
 knowing what core to search? That's the kind of knowing that SolrCloud
 solves. In SolrCloud, it handles the distribution of documents across
 shards and retrieves them regardless of which node is searched from.
 That is the point of cloud, you don't know the details of where
 exactly documents are being managed (i.e. they are cloudy). It can
 change and re-balance from time to time. SolrCloud performs the
 distributed search for you, therefore when you try to search a node/core
 with no documents, all the results from the cloud are retrieved
 regardless. This is considered A Good Thing.

 It requires a change in thinking about indexing and searching.

 On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote:
  Hi Guys,
 
  I use the following commands to start Solr Cloud according to the
  SolrCloud wiki:
 
  yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf
  -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
  yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983
  -jar start.jar
 
  Then I have created several cores using the CoreAdmin API (
  http://localhost:8983/solr/admin/cores?action=CREATE&name=coreName&collection=collection1
  ), and clusterstate.json shows the following topology:
 
  collection1:
  -- shard1:
    -- collection1
    -- CoreForCustomer1
    -- CoreForCustomer3
    -- CoreForCustomer5
  -- shard2:
    -- collection1
    -- CoreForCustomer2
    -- CoreForCustomer4
 
  1) Index:
 
  Using the following command to index the mem.xml file in the exampledocs
  directory:
 
  yydzero:exampledocs bjcoe$ java -Durl=
  http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml
  SimplePostTool: version 1.4
  SimplePostTool: POSTing files to
  http://localhost:8983/solr/coreForCustomer3/update..
  SimplePostTool: POSTing file mem.xml
  SimplePostTool: COMMITting Solr index changes.
 
  And now the Solr Admin UI shows that 'coreForCustomer1', 'coreForCustomer3'
  and 'coreForCustomer5' have 3 documents (mem.xml has 3 documents) and the
  other 2 cores have 0 documents.
 
  *Question 1:* Is this expected behavior? How do I index documents into
  a specific core?
 
  *Question 2:* If SolrCloud doesn't support this yet, how could I extend it
  to support this feature (indexing a document to a particular core)? Where
  should I start, the hashing algorithm?
 
  *Question 3:* Why are the documents also indexed into 'coreForCustomer1'
  and 'coreForCustomer5'? The default replica for documents is 1, right?
 
  Then I try to index some documents to 'coreForCustomer2':
 
  $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar
  post.jar ipod_video.xml
 
  While 'coreForCustomer2' still has 0 documents and the documents in
  ipod_video are indexed to the cores for customers 1/3/5.
 
  *Question 4:* Why does this happen?
 
  2) Search: I use
  http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml to
  search against 'CoreForCustomer2', while it will return all documents in
  the whole collection even though this core has no documents at all.
 
  Then I use
  http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2
  and it will return 0 documents.
 
  *Question 5:* So if I want to search against a particular core, I need to
  use the 'shards' parameter and the SolrCore name as the parameter value,
  right?
 
  Thanks very much in advance!
 
  Regards,
  Yandong


Re: Installing Solr on Tomcat using Shell - Code wrong?

2012-05-22 Thread Li Li
You should find some clues in the Tomcat log.
On 2012-5-22 at 7:49 PM, Spadez james_will...@hotmail.com wrote:

 Hi,

 This is the install process I used in my shell script to try and get Tomcat
 running with Solr (debian server):



 I swear this used to work, but currently only Tomcat works. The Solr page
 just comes up with "The requested resource (/solr/admin) is not available".

 Can anyone give me some insight into why this isn't working? It's driving me
 nuts.

 James

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Installing-Solr-on-Tomcat-using-Shell-Code-wrong-tp3985393.html
 Sent from the Solr - User mailing list archive at Nabble.com.
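For reference, Li Li's suggestion to check the Tomcat log, plus a typical minimal Solr-on-Tomcat deployment, can be sketched as follows. These are illustrative commands assuming the Debian tomcat6 package layout and Solr 3.6 defaults, not the poster's actual script:

```
# Copy the Solr webapp and home directory into place, point Tomcat at the home dir,
# then restart and inspect the log for the reason /solr/admin returns 404
cp apache-solr-3.6.0/dist/apache-solr-3.6.0.war /var/lib/tomcat6/webapps/solr.war
cp -r apache-solr-3.6.0/example/solr /var/lib/tomcat6/solr
echo 'JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/var/lib/tomcat6/solr"' >> /etc/default/tomcat6
service tomcat6 restart
tail -n 50 /var/log/tomcat6/catalina.out
```

A missing or wrong solr.solr.home is the most common cause of the webapp deploying but /solr/admin being unavailable.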



Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-22 Thread Mark Miller
I think the key is this: you want to think of a SolrCore on a single node Solr 
installation as a collection on a multi node SolrCloud installation.

So if you would use multiple SolrCore's with a std Solr setup, you should be 
using multiple collections in SolrCloud. If you were going to try to do 
everything in one SolrCore, that would be like putting everything in one 
collection in SolrCloud. I don't think it generally makes sense to try and work 
at the SolrCore level when working with SolrCloud. This will be made more clear 
once we add a simple collections api.

So I think your choice should be similar to using a single node - do you want 
to put everything in one 'collection' and use a filter to separate customers 
(with all its caveats and limitations) or do you want to use a collection per 
customer. You can always start up more clusters if you reach any limits.
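To make the two options concrete, the difference at query time looks roughly like this. The customer_id field and the per-customer collection name are hypothetical placeholders, not from the thread:

```
# Option 1: one shared collection, customers separated by a filter query
http://localhost:8983/solr/collection1/select?q=*:*&fq=customer_id:42

# Option 2: one collection per customer
http://localhost:8983/solr/customer42/select?q=*:*
```

Option 1 keeps cluster management simple but relies on the filter for isolation; option 2 gives hard isolation at the cost of managing many collections.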



On May 22, 2012, at 10:08 AM, Darren Govoni wrote:

 I'm curious what the solrcloud experts say, but my suggestion is to try not 
 to over-engineer the search architecture on SolrCloud. For example, what 
 is the benefit of managing which cores are indexed and searched? Having to 
 know those details, in my mind, works against the automation in SolrCloud, 
 but maybe there's a good reason you want to do it this way.
 

Re: Multicore solr

2012-05-22 Thread Sohail Aboobaker
It would help if you provide your use case. What are you indexing for each
user and why would you need a separate core for indexing each user? How do
you decide schema for each user? It might be better to describe your use
case and desired results. People on the list will be able to advice on the
best approach.

Sohail


Re: solr tokenizer not splitting unbreakable expressions

2012-05-22 Thread Tanguy Moal
Hello Elisabeth,

Wouldn't it be simpler to have a custom component inside of the
front-end to your search server that would transform a query like hotel
de ville paris into "hotel de ville" paris (i.e. turning each
occurrence of the sequence hotel de ville into a phrase query)?

Concerning protections inside of the tokenizer, I think that is not
possible actually.
The main reason for this could be that the QueryParser will break the query
on each space before passing each query-part through the analysis of every
searched field. Hence all the smart things you would put at indexing time
to wrap a sequence of tokens into a single one are not reproducible at query
time.

Please someone correct me if I'm wrong!

Alternatively, I think you might do so with a custom query parser (in order
to have phrases sent to the analyzers instead of words). But since
tokenizers don't have support for protected words list, you would need an
additional custom token filter that would consume the tokens stream and
annotate those matching an entry in the protection list.
Unfortunately, if your protected list is long, you will have performance
issues, unless you rely on a dedicated data structure such as a trie
(e.g. a Patricia trie). You can find solid implementations on the
Internet (see https://github.com/rkapsi/patricia-trie).

Then you could make your filter consume a sliding window of tokens while
the window matches in your trie.
Once you have a complete match in your trie, the filter can set an
attribute of the type your choice (e.g. MyCustomKeywordAttribute) on the
first matching token, and make the attribute be the complete match (e.g.
Hotel de ville).
If you don't have a complete match, let the unmatched tokens pass through
unmodified.
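The sliding-window matching described above can be sketched outside of Lucene like this. It is plain Python for brevity; the phrase set (a stand-in for a real trie) and the token list are illustrative, not Solr APIs:

```python
def group_phrases(tokens, phrases):
    """Merge any run of tokens matching a protected phrase into one token."""
    out, i = [], 0
    while i < len(tokens):
        end = None
        # Slide the window from the longest candidate down to a single token.
        for j in range(len(tokens), i, -1):
            if tuple(tokens[i:j]) in phrases:
                end = j
                break
        if end is not None:
            out.append(" ".join(tokens[i:end]))  # complete match: emit as one token
            i = end
        else:
            out.append(tokens[i])  # no match: pass the token through unmodified
            i += 1
    return out

phrases = {("hotel", "de", "ville")}
print(group_phrases("hotel de ville paris".split(), phrases))
# ['hotel de ville', 'paris']
```

A real TokenFilter would do the same walk over the token stream, using a trie so each window is not re-tested from scratch, and set a keyword-style attribute on the first token of each match as described above.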

I Hope this helps...

--
Tanguy


2012/5/22 elisabeth benoit elisaelisael...@gmail.com

 Hello,

 Does someone know if there is a way to configure a tokenizer to split on
 white spaces all words except a bunch of expressions listed in a file?

 For instance, if I want hotel de ville not to be split into words, a
 request like hotel de ville paris would be split into two tokens:

 hotel de ville and paris instead of 4 tokens

 hotel
 de
 ville
 paris

 I imagine something like

 <tokenizer class="solr.StandardTokenizerFactory"
 protected="protoexpressions.txt"/>

 Thanks a lot,
 Elisabeth



WFST with autosuggest/geo

2012-05-22 Thread William Bell
Does anyone have the slides or sample code from:

Building Query Auto-Completion Systems with Lucene 4.0
Presented by Sudarshan Gaikaiwari, Software Engineer,Yelp

We want to implement WFST with GEO boosting.


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Binary updates handler does not propagate failures?

2012-05-22 Thread Jozef Vilcek
Hi all,

I am facing the following issue ...
I have an application which is feeding Solr 3.6 index with document
updates via Solrj 3.6. I use a binary request writer, because of the
issue with XML when sending insert and deletes at once (
https://issues.apache.org/jira/browse/SOLR-1752 )

Now, I have noticed that if I send a malformed document to the index,
I see in the logs that it got refused by the index, but on the Solrj side
the returned UpdateResponse does not indicate any kind of failure (no
exception thrown, response status code == 0). When I switch to XML
requests, I receive an exception when sending a malformed document.

By looking at Solr's  BinaryUpdateRequestHandler.java
http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_6_0/solr/core/src/java/org/apache/solr/handler/BinaryUpdateRequestHandler.java?view=markup
at lines 98 - 105, exceptions are not propagated, therefore
RequestHandlerBase cannot set them into the response ...

Is this intended behavior?
What am I doing wrong?
Any suggestions?

Many thanks in advance.

Best,
Jozef


How to handle filter query against empty fields

2012-05-22 Thread Jozef Vilcek
Hi all,

I have a field(s) in a schema which I need to be able to specify in a
filter query. The field is not mandatory, therefore it can be empty. I
need to be able to run a query with a filer :  return only docs which
does not have value for the field  ...

What would be the optimal recommended way of doing this with Solr?

Thanks!

Best,
Jozef


Re: How to handle filter query against empty fields

2012-05-22 Thread Ahmet Arslan
There are two approaches for this. Please read my earlier post :

http://search-lucene.com/m/72Q4YURpgY
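For quick reference, the usual way to express "has no value" in a filter query is a negated open-ended range (myfield is a placeholder name):

```
# documents where myfield is empty/missing:
fq=-myfield:[* TO *]

# documents where myfield has some value:
fq=myfield:[* TO *]
```

The post linked above weighs this query-time approach against alternatives such as indexing an explicit default marker value.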





Faceted on Similarity ?

2012-05-22 Thread Robby
Hi All,

I'm quite a new user of both Lucene and Solr. I want to ask if faceted search
can be used to group multiple fields' values based on similarity.
I have looked at faceting so far, but from my understanding it
only works on exact single values and definite ranges.

For example, if I have name, address, id number and nationality. And with
rows that had a degree of similarity distance between these fields we will
group them together.

Sample results will be like this :

Group1
Name : Angel, Address: Jakarta, ID : 123, Nationality: Indonesian
Name : Angeline, Address: Jayakarta, ID : 123, Nationality: Indonesian

Group2
Name : Frank, Address: Jl. Tubagus Angke Jakarta, ID : 333,
Nationality: Indonesian
Name : Frans, Address: Jl. T. Angke Jakarta, ID : 332, Nationality:
Indonesian


Hope I make myself clear and asking in proper way. Very sorry if my English
is not good enough...

Thanks,

Robby


Re: System requirements in my case?

2012-05-22 Thread Stanislaw Osinski

 3) Measure the size of the index folder, multiply with 8 to get a clue of
 total index size

 With 12 000 docs my index folder size is: 33 MB
 ps: I use solr.clustering.enabled=true


Clustering is performed at search time, it doesn't affect the size of the
index (but obviously it does affect the search response times).

Staszek


Re: Solr mail dataimporter cannot be found

2012-05-22 Thread Stefan Matheis
Hey Emma,

thanks for reporting this, i opened SOLR-3478 and will commit this soon

Stefan 


On Monday, May 21, 2012 at 10:47 PM, Emma Bo Liu wrote:

 Hi,
 
 I want to index emails using solr. I put the user name, password, hostname
 in data-config.xml under the mail folder. This is a valid email, but when I run
 the url http://localhost:8983/solr/mail/dataimport?command=full-import it
 said cannot access mail/dataimporter, reason: not found. But when I run
 http://localhost:8983/solr/rss/dataimport?command=full-import
 or
 http://localhost:8983/solr/db/dataimport?command=full-import
 they can be found.
 
 In addition, when I run the command java
 -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar , on the left side of
 solr UI, there are db, rss, tika and solr but no mail. Is it a bug in
 mail indexing? Thank you so much!
 
 Best,
 
 Emma 




clickable links as results?

2012-05-22 Thread 12rad
Hi, 

I want to display a clickable link to the document if a search
matches, along with the number of times the search query matched.
What should I be looking at?
I am fairly new to Solr and don't know how I can achieve this.

Thanks for the help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/clickable-links-as-results-tp3985505.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-22 Thread 12rad
That worked! 
Thanks!
I did str name=hl.simple.pre /str
str name=hl.simple.post /str

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985507.html
Sent from the Solr - User mailing list archive at Nabble.com.


index-time boosting using DIH

2012-05-22 Thread geeky2
hello all,

can i use the technique described on the wiki at:

http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts

if i am populating my core using a DIH?

looking at the posts on this subject and the wiki docs - leads me to believe
that you can only use this when you are using the xml interface for
importing data?

thank you

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-time-boosting-using-DIH-tp3985508.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: index-time boosting using DIH

2012-05-22 Thread Dyer, James
See http://wiki.apache.org/solr/DataImportHandler#Special_Commands and the 
$docBoost pseudo-field name.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: Highlight feature

2012-05-22 Thread Chris Hostetter

: That is the default response format. If you would like to change that, 
: you could extend the search handler or post process the XML data. 
: Another option would be to use the javabin (if your app is java based) 
: and build xml the way your app would need.

there is actually a more straightforward way to do stuff like this in 
trunk now, such that it can work with any response writer, using the 
DocTransformer API.

there is already an ExplainAugmenter that can inline the explain info 
for a document, we just need someone to help write a corresponding 
HighlightAugmenter; i've opened a Jira if anyone wants to take a crack at 
a patch...

https://issues.apache.org/jira/browse/SOLR-3479

-Hoss


Re: Solr 3.6 fails when using XSLT

2012-05-22 Thread Chris Hostetter

what does your results.xsl look like? or more specifically: can you post a 
very small example XSL that has this problem?

you mentioned you are using xsl:include and that doesn't seem to work ... 
is that a separate problem, or does removing/adding the xsl:include 
fix/cause this problem?

what does your xsl:include look like? where do the various xsl templates 
live in the filesystem relative to each other?


: Date: Fri, 11 May 2012 08:24:45 -0700 (PDT)
: From: pramila_tha...@ontla.ola.org pramila_tha...@ontla.ola.org
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Solr 3.6 fails when using XSLT
: 
: Hi Everyone,
: 
: I have recently upgraded to *solr 3.6 from solr 1.4.*
: My XSL where working fine in solr 1.4.
: 
: but now with Solr 3.6 I keep getting the following Error 
: 
: /getTransformer fails in getContentType java.lang.RuntimeException:
: getTransformer fails in getContentType /
: 
: But instead of results.xsl If I use example.xsl, it is fine.
: 
: I fine my xsl:include does not seem to work with Solr 3.6
: 
: Can someone please let me know what am I doing wrong?
: 
: Thanks,
: 
: --
: View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-6-fails-when-using-XSLT-tp3980240.html
: Sent from the Solr - User mailing list archive at Nabble.com.

-Hoss


RE: Solr 3.6 fails when using XSLT

2012-05-22 Thread pramila_tha...@ontla.ola.org
Hi Everyone,

This is what worked in solr 1.4 and did not work in solr 3.6.

Actually solr 3.6 requires all the xsl to be present in conf/xslt directory
All paths leading to xsl should be relative to conf directory.

But before this was not the case.

<!--
NOTE: Does NOT work with solr 3.6, worked in Solr 1.4
<xsl:include href="../webapps/pdfexample/xsl_scripts/highlight.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/pagination.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/simpleSearch.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/bills/facets.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/bills/advanceSearch.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/bills/search_common_fields.xsl"/>
-->
<!-- this is relative to conf directory, which is working for solr 3.6 -->

<xsl:include href="./xsl_scripts/highlight.xsl"/>
<xsl:include href="./xsl_scripts/pagination.xsl"/>
<xsl:include href="./xsl_scripts/simpleSearch.xsl"/>
<xsl:include href="./xsl_scripts/bills/facets.xsl"/>
<xsl:include href="./xsl_scripts/bills/advanceSearch.xsl"/>
<xsl:include href="./xsl_scripts/bills/search_common_fields.xsl"/>

Thanks,

--Pramila Thakur



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-6-fails-when-using-XSLT-tp3980240p3985524.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Jetty rerturning HTTP error code 413

2012-05-22 Thread Sai
Hi Alexandre,

Can you please let me know how you fixed this issue. I am also getting this
error when I pass a very large query to Solr.

A reply is highly appreciated.

Thanks,
Sai
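For anyone hitting this later: a 413 from Jetty on long GET requests usually means the query string overflowed the request header buffer. Two common workarounds are to send the query as a POST instead, or to enlarge the buffer in the connector config. The snippet below is a sketch assuming the Jetty 6 that ships with Solr's example (check your own jetty.xml for the exact connector class and file layout):

```
<!-- etc/jetty.xml: raise the header buffer so very long q/fq strings fit -->
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.nio.SelectChannelConnector">
      <Set name="port">8983</Set>
      <Set name="headerBufferSize">65536</Set>
    </New>
  </Arg>
</Call>
```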



RE: index-time boosting using DIH

2012-05-22 Thread geeky2
thanks for the reply,

so to use the $docBoost pseudo-field name, would you do something like below
- and would this technique likely increase my total index time?



<dataConfig>
  <dataSource ... />

  <document name="mydoc">
    <entity name="myentity"
            transformer="script:BoostDoc"
            query="select ...">

      <field column="SOME_COLUMN" name="someField" />
      ...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-time-boosting-using-DIH-tp3985508p3985527.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: index-time boosting using DIH

2012-05-22 Thread Dyer, James
You need to add the $docBoost pseudo-field to the document somehow.  A 
transformer is one way to do it.  You could just add it to a SELECT statement, 
which is especially convenient if the boost value somehow is derived from the 
data:

SELECT CASE WHEN SELL_MORE_FLAG='Y' THEN 999 ELSE NULL END AS '$docBoost', 
...other fields... FROM some_table, etc

Either way I wouldn't expect it to make the indexing noticeably slower.
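Put together, a DIH config carrying that SELECT might look like the following sketch (driver, table and field names are illustrative, not from the thread):

```
<dataConfig>
  <dataSource driver="..." url="..." />
  <document>
    <entity name="item"
            query="SELECT ID, NAME,
                          CASE WHEN SELL_MORE_FLAG='Y' THEN 999 ELSE NULL END AS '$docBoost'
                   FROM SOME_TABLE">
      <field column="ID" name="id" />
      <field column="NAME" name="name" />
    </entity>
  </document>
</dataConfig>
```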

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311




RE: index-time boosting using DIH

2012-05-22 Thread geeky2
thank you james for the feedback - i appreciate it.

ultimately - i was trying to decide if i was missing the boat by ONLY using
query time boosting, and i should really be using index time boosting.

but after your reply, reading the solr book, and looking at the lucene dox -
it looks like index-time boosting is not what i need.  i can probably do
better by using query-time boosting and the proper sort params.

thanks again

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-time-boosting-using-DIH-tp3985508p3985539.html
Sent from the Solr - User mailing list archive at Nabble.com.


always getting distinct count of -1 in luke response (solr4 snapshot)

2012-05-22 Thread Mike Hugo
We're testing a snapshot of Solr4 and I'm looking at some of the responses
from the Luke request handler.  Everything looks good so far, with the
exception of the distinct attribute which (in Solr3) shows me the
distinct number of terms for a given field.

Given the request below, I'm consistently getting a response back with a
value in the distinct field of -1.  Is there something different I need
to do to get back the actual distinct count?

Thanks!

Mike

http://localhost:8080/solr/core1/admin/luke?wt=json&fl=label&numTerms=1

"fields": {
  "label": {
    "type": "text_general",
    "schema": "IT-M--",
    "index": "(unstored field)",
    "docs": 63887,
    "distinct": -1,
    "topTerms": [


Indexing Polygons

2012-05-22 Thread Young, Cody
Hi All,

I'm trying to figure out how to index polygons in Solr (trunk). I'm using LSP 
right now, as the Solr integration of the new spatial module hasn't been 
completed. I have searching for a point using a polygon working, but I'm also 
looking for searching for a polygon using a point.

I've seen some indication that LSP supports this but I haven't been able to 
find an example.

What field type would I need to use? Would it be multivalued?

Please and thank you!
Cody




Re: Faceted on Similarity ?

2012-05-22 Thread Lee Carroll
Take a look at the clustering component

http://wiki.apache.org/solr/ClusteringComponent

Consider clustering offline and indexing the pre-calculated group memberships.

I might be wrong but I don't think there is any faceting mileage here.
Depending upon the use case,
you might get some use out of the MLT handler:

http://wiki.apache.org/solr/MoreLikeThis
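As a rough illustration of the MLT route, with field names borrowed from the example above (the handler must be enabled in solrconfig.xml):

```
# Find documents similar to a given one, comparing the name and address fields
http://localhost:8983/solr/mlt?q=id:123&mlt.fl=name,address&mlt.mintf=1&mlt.mindf=1
```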





RE: Solr 3.6 fails when using XSLT

2012-05-22 Thread Chris Hostetter

: This is what worked in solr 1.4 and did not work in solr 3.6.
: 
: Actually solr 3.6 requires all the xsl to be present in conf/xslt directory
: All paths leading to xsl should be relative to conf directory.
: 
: But before this was not the case.

Right ... this was actually a bug (in how all relative paths in xml 
includes, or xsl includes, were resolved) that was fixed in Solr 3.1, as 
noted in CHANGES.txt...

* SOLR-1656: XIncludes and other HREFs in XML files loaded by ResourceLoader
  are fixed to be resolved using the URI standard (RFC 2396). The system
  identifier is no longer a plain filename with path, it gets initialized
  using a custom URI scheme solrres:. This scheme is resolved using a
  EntityResolver that utilizes ResourceLoader
  (org.apache.solr.common.util.SystemIdResolver). This makes all relative
  pathes in Solr's config files behave like expected. This change
  introduces some backwards breaks in the API: Some config classes
  (Config, SolrConfig, IndexSchema) were changed to take
  org.xml.sax.InputSource instead of InputStream. There may also be some
  backwards breaks in existing config files, it is recommended to check
  your config files / XSLTs and replace all XIncludes/HREFs that were
  hacked to use absolute paths to use relative ones. (uschindler)




-Hoss


Re: Multicore solr

2012-05-22 Thread Amit Jha
Hi,

Thanks for your advice.
It is basically a meta search application. Users can perform a search on N 
data sources at a time. We broadcast a parallel search to each selected data 
source and write the data to Solr using a custom-built API (the API and Solr 
are deployed on separate machines; the API's job is to perform the parallel 
search and write the data to Solr). The API responds to the application that 
some results are available, then the application fires a search query to 
display the results (the query would be q=unique_search_id). Meanwhile the API 
keeps writing data to Solr and the user can fire a search to Solr to view all 
results. 

In the current scenario we are using a single Solr server and performing 
real-time indexing and search. Performing these operations on a single Solr 
instance makes the process slow as the index size increases. 

So we are planning to use multi-core Solr where each user will have their own 
core. All cores will have the same schema.
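Under that plan, each per-user core would be created through the CoreAdmin API, roughly like this (core and directory names are made-up placeholders; all cores can point at the same shared config and schema):

```
http://localhost:8983/solr/admin/cores?action=CREATE&name=core_user123&instanceDir=cores/user123&config=solrconfig.xml&schema=schema.xml
```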

Please suggest if this approach has any issues.

Rgds
AJ

 
 Sohail