Re: How can i search site name

2012-05-22 Thread Shameema Umer
Sorry,
Please let me know how I can search the site name using the Solr query syntax.
My results should show title, url and content.
Title and content are being searched even though the default search field is
set to <defaultSearchField>content</defaultSearchField>.

I need the url or site name too. Please help.

Thanks in advance.
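One way to search several fields at once without changing defaultSearchField is the dismax/edismax query parser. The sketch below only builds the query string; the handler path and the field names title, content and url are assumptions based on the description above:

```python
from urllib.parse import urlencode

# Hypothetical field names (title, content, url) -- adjust to your schema.
params = {
    "q": "google",
    "defType": "edismax",       # edismax can query several fields at once
    "qf": "title content url",  # fields to search, instead of defaultSearchField
    "fl": "title,url,content",  # fields to return in the results
}
query_string = urlencode(params)
print("/solr/select?" + query_string)
```

Sending that query string to the select handler would search all three fields with one request.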

On Tue, May 22, 2012 at 11:05 AM, ketan kore ketankore...@gmail.com wrote:

 You can go to www.google.com and just type the site which you want to
 search, and Google will show you the results, as simple as that...



Re: How can i search site name

2012-05-22 Thread Li Li
You should define your search first.
If the site is www.google.com, how do you want to match it: full-string
matching or partial matching? E.g., should "google" match? If it should,
you will need to write your own analyzer for this field.
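If partial matching is wanted, such a field type might look roughly like this. This is a sketch only: the type name is made up, and the pattern simply splits URLs on dots, slashes and colons so that "google" becomes its own token:

```xml
<!-- Illustrative only: tokenize URLs/hostnames into their parts -->
<fieldType name="url_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- splits www.google.com into "www", "google", "com" -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[./:]+"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a full-string (exact) match, a plain string field would be enough instead.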




Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-22 Thread KP Sanjailal
Hi,

Thank you so much for replying.

The MySQL database server is running on a Fedora Core 12 Machine with Hindi
Language Support enabled.  Details of the database are - ENGINE=MyISAM and
DEFAULT CHARSET=utf8

Data is imported using the Solr DataImportHandler (mysql jdbc driver).
In the schema.xml file the title field is defined as:
<field name="title" type="text_general" indexed="true" stored="true"/>

I tried saving the query results directly to a text file from the MySQL
command prompt but it is not storing the results correctly.  The file
contains the following characters.
à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥8dà ¤Åà ¤¾ Saur oorja

First line of the data-config.xml is
<?xml version="1.0" encoding="UTF-8"?>
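Besides the XML declaration, the dataSource definition itself matters. A sketch of a MySQL JdbcDataSource with UTF-8 forced on the JDBC connection; the database name and credentials are placeholders:

```xml
<!-- Connection details below are placeholders, not the poster's actual values -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/library?useUnicode=true&amp;characterEncoding=UTF-8"
            user="dbuser" password="dbpass"/>
```

The useUnicode/characterEncoding parameters tell Connector/J to hand UTF-8 text to Solr instead of the platform default encoding.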

Please suggest what I have to do to solve this issue.

Regards,

Sanjailal KP

On 5/21/12, Jack Krupansky j...@basetechnology.com wrote:
 Is it possible that your text editor/display does not support UTF-8
 encoding?

 Assuming the data is properly encoded, do you have the encoding="UTF-8"
 attribute in your DIH <dataSource> tag?

 -- Jack Krupansky

 -Original Message-
 From: KP Sanjailal
 Sent: Monday, May 21, 2012 7:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Indexing & Searching MySQL table with Hindi and English data

 Hi,

 Thank you so much for replying.

 The MySQL database server is running on a Fedora Core 12 Machine with Hindi
 Language Support enabled.  Details of the database are - ENGINE=MyISAM and
 DEFAULT CHARSET=utf8

 Data is imported using the Solr DataImportHandler (mysql jdbc driver).
 In the schema.xml file the title field is defined as:
 <field name="title" type="text_general" indexed="true" stored="true"/>

 I tried saving the query results directly to a text file from the MySQL
 command prompt but it is not storing the results correctly.  The file
 contains the following characters.


 à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥8dà ¤Åà ¤¾ Saur oorja

 Please suggest what I have to do to solve this issue.

 Regards,

 Sanjailal KP
 --



 On Sun, May 20, 2012 at 6:59 AM, Lance Norskog goks...@gmail.com wrote:

 Also, try saving data from a query into a file and verify that it is
 UTF-8 and the characters are correct.

 On Fri, May 18, 2012 at 7:54 AM, Jack Krupansky j...@basetechnology.com
 wrote:
  Check the analyzers for the field types containing Hindi text to be
  sure
  that they are not using a character mapping or folding filter that
 might
  mangle the Hindi characters. Post the field type, say for the title
 field.
 
  Also, try manually (using curl or the post jar) adding a single
  document
  that has Hindi data and see if that works.
 
  -- Jack Krupansky
 
  -Original Message- From: KP Sanjailal
  Sent: Thursday, May 17, 2012 5:55 AM
  To: solr-user@lucene.apache.org
  Subject: Indexing & Searching MySQL table with Hindi and English data
 
 
  Hi,
 
  I tried to setup indexing of MySQL tables in Apache Solr 3.6.
 
  Everything works fine but text in Hindi script (only some 10% of total
  records) not getting indexed properly.
 
  A search with a keyword in Hindi retrieves an empty result set. Also, a
  retrieved Hindi record displays junk characters.
 
  The database tables contains bibliographical details of books such as
  title, author, publisher, isbn, publishing place, series etc. and out
  of
  the total records about 10% of records contains text in Hindi in title,
  author, publisher fields.
 
  Example:
 
  *Search Results from MySQL using PHP*

   1. http://192.168.0.132/shared/biblio_view.php?bibid=26913&tab=opac
      *Title:* सौर ऊर्जा Saur oorja
      *Author(s):* विनोद कुमार मिश्र MISHRA (VK)
      *Material:* Books

  *Search Results from Apache Solr (searched using keyword in English)*

   1. http://192.168.0.132/test/biblio_view.php?bibid=26913&tab=opac
      *Title:* सौर ऊरॠजा Saur oorja
      *Author(s):* विनोद कॠमार मिशॠर MISHRA (VK)
      *Material:* Books
 
 
  How do I go about solving this language problem?
 
  Thanks in advance.
 
  K. P. Sanjailal
  --
 



 --
 Lance Norskog
 goks...@gmail.com






Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-22 Thread Lance Norskog
There are many steps that can go wrong. Your platform should have
UTF-8 as its default encoding; Windows and macOS don't do this. I had
to configure Chrome to use UTF-8 as its default display encoding.
Also, if you use Tomcat, it has to be configured for UTF-8:

http://wiki.apache.org/solr/SolrTomcat

The characters you posted are not transferring correctly. I think you
need to decode them using one of the online unicode utility pages.
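The pattern in the garbled line quoted earlier ("à ¤¸ ...") is characteristic of UTF-8 bytes decoded as a single-byte encoding such as Latin-1. A minimal Python sketch of how that mojibake arises, and how it can be reversed as long as no bytes were lost:

```python
# UTF-8 bytes decoded with Latin-1 produce the "à¤..." garbage pattern.
original = "सौर ऊर्जा"  # "Saur oorja" in Devanagari
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)  # starts with "à¤¸" -- close to what was posted

# The damage is reversible while every byte is still intact:
recovered = mojibake.encode("latin-1").decode("utf-8")
assert recovered == original
```

The posted text has extra spaces mixed in, so it will not round-trip exactly, but the shape of the corruption matches this scenario.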


-- 
Lance Norskog
goks...@gmail.com


Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-22 Thread KP Sanjailal
Hi,

I have already configured the Tomcat instance as per
http://wiki.apache.org/solr/SolrTomcat for the URI Charset Config.

The necessary updates have been made in Tomcat's conf/server.xml with
URIEncoding="UTF-8".
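For readers finding this thread later, the relevant server.xml fragment looks roughly like this; port and protocol are whatever the installation already uses:

```xml
<!-- URIEncoding makes Tomcat decode query-string parameters as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           URIEncoding="UTF-8"/>
```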

Thank you for your reply.

Sanjailal KP
--




System requirements in my case?

2012-05-22 Thread Bruno Mannina

Dear Solr users,

My company would like to use Solr to index around 80,000,000 documents
(XML files of around 5-10 KB each).

My program (a robot) will connect to this Solr instance with boolean queries.

Number of users: around 1000
Number of requests per user per day: 300
Number of users per day: 30
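As a rough sanity check on query load, taking the figures above at face value (both the 30 users/day figure and the 1000-user total):

```python
# Back-of-the-envelope load estimate from the stated figures.
req_per_user_per_day = 300
seconds_per_day = 86_400

typical_users_per_day = 30
total_users = 1000

avg_qps = typical_users_per_day * req_per_user_per_day / seconds_per_day
worst_qps = total_users * req_per_user_per_day / seconds_per_day

print(round(avg_qps, 2))    # average queries/second with 30 users/day
print(round(worst_qps, 1))  # if all 1000 users were active every day
```

Either way the query rate is modest; index size and RAM, not QPS, are likely the limiting factors here.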

I would like to subscribe to a host provider with this configuration:
- Dedicated server
- Ubuntu
- Intel Xeon/i7, 2 x 2.66+ GHz, 12 GB RAM, 2 x 1500 GB disks
- Unlimited bandwidth
- Fixed IP

Do you think this configuration is enough?

Thanks for your info,
Sincerely
Bruno


Re: Question about sampling

2012-05-22 Thread Lance Norskog
My mistake - I did not check whether the data above is stored as
strings. The hashcode has to be stored as a string for this trick to
work.
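The sampling idea itself can be sketched outside Solr: hash each document id, keep one bucket out of N, facet over the sample, and scale the counts back up. This is plain Python for illustration, not a Solr feature:

```python
import hashlib
from collections import Counter

# Fake corpus: 100,000 docs split evenly between two facet values.
docs = [{"id": str(i), "category": "even" if i % 2 == 0 else "odd"}
        for i in range(100_000)]

RATE = 10  # keep ids whose hash falls into 1 bucket out of 10 (~10% sample)
sample = [d for d in docs
          if int(hashlib.md5(d["id"].encode()).hexdigest(), 16) % RATE == 0]

# Facet over the sample, then scale each count by the sampling rate.
estimated = Counter(d["category"] for d in sample)
for cat, n in sorted(estimated.items()):
    print(cat, n * RATE)  # approximate full-set count per category
```

The scaled counts land close to the true 50,000 per category; the error shrinks as the sample grows.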

On Sun, May 20, 2012 at 8:25 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 I'd be curious about this, too!
 I suspect the answer is: not doable, patches welcome. :)
 But I'd love to be wrong!

 Otis
 
 Performance Monitoring for Solr / ElasticSearch / HBase - 
 http://sematext.com/spm




 From: Yuval Dotan yuvaldo...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Wednesday, May 16, 2012 9:43 AM
Subject: Question about sampling

Hi Guys
We have an environment containing billions of documents.
Faceting over this large result set could take many seconds, and so we
thought we might be able to use statistical sampling of a smaller result
set from the facet, and give an approximate result much quicker.
Is there any way to facet only a random sample of the results?
Thanks
Yuval






-- 
Lance Norskog
goks...@gmail.com


Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-22 Thread Gora Mohanty
On 22 May 2012 12:07, KP Sanjailal kpsanjai...@gmail.com wrote:
 Hi,

 Thank you so much for replying.

 The MySQL database server is running on a Fedora Core 12 Machine with Hindi
 Language Support enabled.  Details of the database are - ENGINE=MyISAM and
 DEFAULT CHARSET=utf8

 Data is imported using the Solr DataImportHandler (mysql jdbc driver).
 In the schema.xml file the title field is defined as:
 <field name="title" type="text_general" indexed="true" stored="true"/>

Please show us your schema.xml and the configuration
file for the DataImportHandler (you might wish to obscure
sensitive details like username/password). Have you tried
the SELECT from the DIH configuration outside of Solr? Is
it producing proper UTF-8?

Regards,
Gora


fsv=true not returning sort_values for distributed searches

2012-05-22 Thread XJ
We use fsv=true to help debug sorting, which works great for
non-distributed searches. However, it's not working (no sort_values in
the response) for multi-shard queries. Any idea how to get this fixed?

thanks,
XJ


Strategy for maintaining De-normalized indexes

2012-05-22 Thread Sohail Aboobaker
Hi,

I have a very basic question and hopefully there is a simple answer to
it. We are trying to index a simple product catalog which has master
products and child products. Each master product can have multiple child
products, and a master product can be assigned one or more product
categories. We need to be able to show counts of categories based on the
number of child products in each category. We have indexed the data using
a join, selecting appropriate values for the index from each table. This
is basically a de-normalized result set, and it works perfectly for our
search purposes. However, maintaining the index and keeping it up to date
is an issue: whenever a product master is updated with a new category, we
will need to delete all the index entries for its child products and
insert them again. This seems like a lot of activity for a regular,
on-going operation, i.e. product category updates.

Since join between schemas is only available in 4.0, what are other
strategies to maintain such an index or to create such queries?

Thanks for your help.

Regards,
Sohail


RE: trunk cloud ui not working

2012-05-22 Thread Phil Hoy
Hi,

I was using Windows 7, but it is fine with Chrome on Windows Web Server 2008 R2,
and I asked a colleague with Windows 7 and it is fine for him too, so I am really
sorry, but I think it was a 'works on my machine' thing.

Of course, if I track down the cause I will reply to this email again.

Thanks,
Phil

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 21 May 2012 18:22
To: solr-user@lucene.apache.org
Subject: Re: trunk cloud ui not working

What OS? I was just trying trunk and looking at that view with Chrome on OS X and
Linux and did not see an issue.

On May 21, 2012, at 1:15 PM, Phil Hoy wrote:

 After further investigation I have found that it is not a problem on Firefox,
 only Chrome and IE.
 
 Phil
 
 -Original Message-
 Sent: 21 May 2012 18:05
 To: solr-user@lucene.apache.org
 Subject: trunk cloud ui not working
 
 Hi,
 
 I am running from the trunk and the localhost:8983/solr/#/~cloud page shows 
 nothing but Fetch Zookeeper Data.
 
 If I run fiddler I see that:
 http://localhost:8983/solr/zookeeper?wt=jsondetail=truepath=%2Fclust
 erstate.json
 and
 http://localhost:8983/solr/zookeeper?wt=jsonpath=%2Flive_nodes
 are called and return data but no update to the ui.
 
 Cheers,
 Phil
 
 

- Mark Miller
lucidimagination.com



Re: How can i search site name

2012-05-22 Thread Jan Høydahl
You need to explain your case in much more detail to get precise help. Please
read http://wiki.apache.org/solr/UsingMailingLists

If your problem is that you have a URL and want to know the domain for it, e.g.
www.company.com/foo/bar/index.html, and you want only www.company.com, you can
use the UrlClassifyProcessor; see SOLR-2826.
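If the goal is just the host part, the extraction itself is simple. A sketch using Python's standard library; the URL is the example from above, with a scheme added so parsing works:

```python
from urllib.parse import urlparse

# Pull the host out of a stored URL so it can be indexed in its own
# field and queried directly.
url = "http://www.company.com/foo/bar/index.html"
host = urlparse(url).netloc
print(host)  # www.company.com
```

An update processor like UrlClassifyProcessor does this kind of extraction at index time so the domain lands in a separate, searchable field.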

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com




Re: System requirements in my case?

2012-05-22 Thread findbestopensource
A dedicated server may not be required. If you want to cut down cost, then
prefer a shared server.

How much RAM?

Regards
Aditya
www.findbestopensource.com





Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Tanguy Moal
Hello,

Can't the ID (uniqueKey) of the indexed documents (i.e. the denormalized data)
be a combination of the master product id and the child product id?

Therefore, whenever you update your master product DB entry, you simply need
to reindex the documents that depend on that master product entry.

You can even simply reindex your whole DB; updates are made in place (i.e.
old documents are *completely* overwritten by their respective updates).

There's nothing to delete if you build your unique key in a maintainable
way.

You can re-index documents whenever you need to do so.
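The composite-key idea can be illustrated with a dictionary standing in for the index, since Solr overwrites a document whose uniqueKey already exists much as a dict overwrites a key. All names here are made up for illustration:

```python
# The index behaves like a map from uniqueKey to document:
# re-adding a document with the same key replaces the old one.
index = {}

def index_child(master_id, child_id, category):
    doc_id = f"{master_id}_{child_id}"  # composite uniqueKey
    index[doc_id] = {"id": doc_id, "master": master_id,
                     "child": child_id, "category": category}

index_child("M1", "C1", "tools")
index_child("M1", "C2", "tools")

# Master M1 moves to a new category: re-index its children in place.
# No explicit deletes are needed.
index_child("M1", "C1", "hardware")
index_child("M1", "C2", "hardware")

print(len(index))  # still 2 documents, both now in "hardware"
```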

--
Tanguy




Re: System requirements in my case?

2012-05-22 Thread lboutros
Hi Bruno,

will you use facets and result sorting?
What is the update frequency/volume?

This could impact the amount of memory/server count.

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/System-requirements-in-my-case-tp3985309p3985327.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread findbestopensource
That's how de-normalization works: you need to update all child products.

If you just need the count and you are using facets, then maintain a mapping
between category and main product, and between main product and child
product. A Lucene index has no fixed schema, so you can retrieve the data
based on its type.

A category record will have the category name, product name and a type
(CATEGORY_TYPE).
A child product record will have the product name, main product name,
product details, and a type (PRODUCT_TYPE).

With this you may need to use two queries: given the category name, fetch
the main product names, then query with those to fetch the child products.
Hope it helps.
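The two-query approach can be sketched with in-memory stand-ins for the two record types. The record shapes mirror the description above; this is illustration, not Solr API:

```python
# Two record types in one schemaless index, distinguished by "type".
category_docs = [
    {"type": "CATEGORY", "category": "tools", "product": "M1"},
    {"type": "CATEGORY", "category": "tools", "product": "M2"},
]
child_docs = [
    {"type": "PRODUCT", "master": "M1", "name": "C1"},
    {"type": "PRODUCT", "master": "M1", "name": "C2"},
    {"type": "PRODUCT", "master": "M2", "name": "C3"},
]

# Query 1: master products assigned to the category.
masters = {d["product"] for d in category_docs if d["category"] == "tools"}
# Query 2: count the children of those masters.
count = sum(1 for d in child_docs if d["master"] in masters)
print(count)  # 3
```

A category update then only touches the small CATEGORY records, not every child document.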

Regards
Aditya
www.findbestopensource.com





Multicore Solr

2012-05-22 Thread Shanu Jha
Hi all,

greetings from my end. This is my first post on this mailing list. I have
a few questions on multicore Solr. For background, we want to create a core
for each user logged in to our application. In that case it may be 50, 100,
1000, N cores. Each core will be used to write and search an index in real
time.

1. Is this a good idea to go with?
2. What are the pros and cons of this approach?

Awaiting your response.

Regards
AJ


Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina

My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml

24 GB DDR3



Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina

Hi,

Facets I don't know yet, because I don't know exactly what facets are (sorry).

Sorting: yes
Scoring: yes

Concerning update frequency: every week
Volume: around 1 GB of data per year


Thank you very much :)

Aix En Provence
France

Le 22/05/2012 10:35, lboutros a écrit :

Hi Bruno,

will you use facets and result sorting ?
What is the update frequency/volume ?

This could impact the amount of memory/server count.

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/System-requirements-in-my-case-tp3985309p3985327.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: Multicore Solr

2012-05-22 Thread findbestopensource
Having a core per user is not a good idea; the count is too high. Keep
everything in a single core. You can filter the data based on user name or
user id.

Regards
Aditya
www.findbestopensource.com



On Tue, May 22, 2012 at 2:29 PM, Shanu Jha shanuu@gmail.com wrote:

 Hi all,

 greetings from my end. This is my first post on this mailing list. I have
 few questions on multicore solr. For background we want to create a core
 for each user logged in to our application. In that case it may be 50, 100,
 1000, N-numbers. Each core will be used to write and search index in real
 time.

 1. Is this a good idea to go with?
 2. What are the pros and cons of this approach?

 Awaiting your response.

 Regards
 AJ
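
The "filter the data based on user name or user id" suggestion above usually means a single shared core plus a per-request fq (filter query), instead of per-user cores; fq clauses are also cached independently of the main query. A minimal sketch, assuming every document was indexed with a user_id field (the field name and URL are assumptions):

```python
from urllib.parse import urlencode

def user_query(base_url: str, user_id: str, q: str) -> str:
    """Build a single-core Solr select URL restricted to one user's documents.
    Assumes a user_id field exists on every indexed document."""
    params = {"q": q, "fq": f"user_id:{user_id}", "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"

print(user_query("http://localhost:8983/solr", "42", "title:report"))
# http://localhost:8983/solr/select?q=title%3Areport&fq=user_id%3A42&wt=json
```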



Re: System requirements in my case?

2012-05-22 Thread findbestopensource
Seems to be fine. Go ahead.

Before hosting, have you tried/tested your application in a local setup?
RAM usage is what matters most for Solr. Benchmark your app with 100
000 documents, log the memory used, and extrapolate the RAM required for
80 000 000 documents.

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 2:36 PM, Bruno Mannina bmann...@free.fr wrote:

 My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml

 24 Go DDR3

 Le 22/05/2012 10:26, findbestopensource a écrit :

  Dedicated Server may not be required. If you want to cut down cost, then
 prefer shared server.

 How much the RAM?

 Regards
 Aditya
 www.findbestopensource.com


 On Tue, May 22, 2012 at 12:36 PM, Bruno Manninabmann...@free.fr  wrote:

  Dear Solr users,

 My company would like to use solr to index around 80 000 000 documents
 (xml files with around 5~10ko size each).
 My program (robot) will connect to this solr with boolean requests.

 Number of users: around 1000
 Number of requests by user and by day: 300
 Number of users by day: 30

 I would like to subscribe to a host provider with this configuration:
 - Dedicated Server
 - Ubuntu
 - Intel Xeon i7 2x 266+ GHz 12 Go 2 * 1500Go
 - Unlimited bandwidth
 - IP fixe

 Do you think this configuration is enough?

 Thanks for your info,
 Sincerely
 Bruno





Re: is commit a sequential process in solr indexing

2012-05-22 Thread findbestopensource
Yes. Lucene/Solr supports multi-threaded use. You can commit
from two different threads to the same core or to different cores.

Regards
Aditya
www.findbestopensource.com

On Tue, May 22, 2012 at 12:35 AM, jame vaalet jamevaa...@gmail.com wrote:

 hi,
 my use case here is to search all the incoming documents for certain
 combinations of words which are pre-determined. So what I am doing is:
 create a batch of x docs according to their creation date, index them,
 commit them, and search them for the (pre-determined) query.
 My question is: if I make the entire process multi-threaded and two
 threads are trying to commit two different sets of batches, will the commits
 happen in parallel? What if I am trying to commit to different solr-cores?

 --

 -JAME
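
A toy model (plain Python, not Solr code) of the answer above: commits to one core serialize on that core's single writer lock, while commits to different cores can run in parallel. All names here are illustrative:

```python
import threading
import time

# One write lock per core, mirroring Lucene's one-IndexWriter-per-core rule.
core_locks = {"core0": threading.Lock(), "core1": threading.Lock()}
committed = []

def commit(core: str) -> None:
    with core_locks[core]:  # same core => serialized; different cores => parallel
        time.sleep(0.1)     # stand-in for the actual commit work
        committed.append(core)

start = time.monotonic()
threads = [threading.Thread(target=commit, args=(c,)) for c in ("core0", "core1")]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start  # ~0.1 s, because the two cores overlap
print(sorted(committed), elapsed < 0.19)
```

Had both threads targeted the same core, the two sleeps would have serialized and `elapsed` would be closer to 0.2 s.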



Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Sohail Aboobaker
Thank you for quick replies.

Can't the ID (uniqueKey) of the indexed documents (i.e. denormalized data)
be a combination of the master product id and the child product id ?
  -- We do not need it as each child is already a unique key.

Therefore whenever you update your master product db entry, you simply
need  to reindex documents depending on the master product entry.
  -- This is where the confusion might be. I may have misread it, but Apache
Solr 3 Enterprise Search mentions that if any part of a document
needs to be updated, the entire document must be replaced. Internally this
is a deletion and an addition. Is re-indexing all detail records a huge
performance hit, assuming that a master can have up to 10-20k child
records?

Thanks again.

Sohail


Re: How can i search site name

2012-05-22 Thread Jan Høydahl
Hi,

I would probably use (e)DisMax.
Index your url and metadata fields as text without stemming, e.g. text_general
Then query as q=mycompany&defType=edismax&qf=title^10 content^1 url^5
If you like to give higher weight to the domain/site part of the URL, apply 
UrlClassifyProcessor and search the domain field separately with higher 
weight.

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 22. mai 2012, at 12:23, Shameema Umer wrote:

 Thanks Li Li and Jan.
 
 Yes, if url is www.company.com/foo/bar/index.html, I should be able to
 search the sub-strings like company, foo or bar etc.
 
 when I changed the part of my schema file from
 
 <defaultSearchField>content</defaultSearchField>
 
 to
 
   <defaultSearchField>stext</defaultSearchField>
   <copyField source="title" dest="stext"/>
   <copyField source="content" dest="stext"/>
   <copyField source="site" dest="stext"/>
 
 A server error occurred after restarting Solr. Do I need to re-index Solr?
 Please help me, as I need to search title, url and content with priority
 given to title. If DisMaxRequestHandler helps me solve my problems, let me
 know the best tutorial page to study it:
 http://wiki.apache.org/solr/DisMaxRequestHandler?action=fullsearch&context=180&value=linkto%3A%22DisMaxRequestHandler%22
 
 Thanks
 Shameema
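
The server error in the quoted message is most likely because stext is referenced before it is defined: both defaultSearchField and a copyField dest must name an existing field, and copied-to fields do require a full re-index. A hedged sketch of the likely missing schema.xml pieces (field and type names are assumptions based on the mail):

```xml
<!-- catch-all field the copyFields write into; define it before referencing it -->
<field name="stext" type="text_general" indexed="true" stored="false"
       multiValued="true"/>

<copyField source="title"   dest="stext"/>
<copyField source="content" dest="stext"/>
<copyField source="site"    dest="stext"/>

<defaultSearchField>stext</defaultSearchField>
```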



Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Tanguy Moal
It all depends on the frequency at which you refresh your data, on your
deployment (master/slave setup), ...
Many things need to be taken into account!

Did you face any performance issue while building your index?
If you didn't, rebuilding it shouldn't be more problematic.

--
Tanguy

2012/5/22 Sohail Aboobaker sabooba...@gmail.com

 Thank you for quick replies.

 Can't the ID (uniqueKey) of the indexed documents (i.e. denormalized data)
 be a combination of the master product id and the child product id ?
   -- We do not need it as each child is already a unique key.

 Therefore whenever you update your master product db entry, you simply
 need  to reindex documents depending on the master product entry.
   -- This is where the confusion might be. I may have misread it but Apache
 Solr3 Enterprise Search, it mentions that if any part of the document
 needs to be updated, the entire document must be replaced. Internally this
 is a deletion and an addition. Is re-indexing all detail records a huge
 performance hit? Assuming that a master can have upto 10 to 20k of child
 records?

 Thanks again.

 Sohail



Re: Strategy for maintaining De-normalized indexes

2012-05-22 Thread Sohail Aboobaker
We are still in design phase, so we haven't hit any performance issues. We
do not want to discover performance issues too late during QA :) We would
rather account for any issues during the design phase.

The refresh rate on fields that we are using from master table will be
rare. May be three or four times in a year.

Regards,
Sohail


Re: How can i search site name

2012-05-22 Thread Shameema Umer
Thanks Jan. *It worked perfectly.* That's all I needed.
May God bless you.

Regards
Shameema

On Tue, May 22, 2012 at 4:57 PM, Jan Høydahl jan@cominvent.com wrote:

 Hi,

 I would probably use (e)DisMax.
 Index your url and metadata fields as text without stemming, e.g.
 text_general
  Then query as q=mycompany&defType=edismax&qf=title^10 content^1 url^5
 If you like to give higher weight to the domain/site part of the URL,
 apply UrlClassifyProcessor and search the domain field separately with
 higher weight.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.facebook.com/Cominvent
 Solr Training - www.solrtraining.com

 On 22. mai 2012, at 12:23, Shameema Umer wrote:

  Thanks Li Li and Jan.
 
  Yes, if url is www.company.com/foo/bar/index.html, I should be able to
  search the sub-strings like company, foo or bar etc.
 
  when I changed the part of my schema file from
 
  <defaultSearchField>content</defaultSearchField>

   to

    <defaultSearchField>stext</defaultSearchField>
    <copyField source="title" dest="stext"/>
    <copyField source="content" dest="stext"/>
    <copyField source="site" dest="stext"/>
 
  server error occurred after restarting solr. Do I need to re-index solr.
  Please help me as i need to search title url and content with privilege
 to
  title. If DisMaxRequestHandler helps me solve my problems, let me know
 the
  best tutorial page to study
  it.
  http://wiki.apache.org/solr/DisMaxRequestHandler?action=fullsearch&context=180&value=linkto%3A%22DisMaxRequestHandler%22
 
 
  Thanks
  Shameema
  
 




Re: System requirements in my case?

2012-05-22 Thread Jan Høydahl
Hi,

It is impossible to guess the required HW size without more knowledge about 
data and usage. 80 mill docs is a fair amount.

Here's how I would approach sizing the setup:
1) Get your schema in shape, removing unnecessary stored/indexed fields
2) Do a test index locally of a part of the dataset, e.g. 10 mill docs, and
perform an Optimize
3) Measure the size of the index folder, multiply with 8 to get a clue of total 
index size
4) Do some benchmarking with realistic types of queries to identify performance 
bottlenecks on query side
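
Step 3's multiplication can be sketched as a naive linear extrapolation (the 2 GB / 10-million-doc sample figures below are made up; real index growth is often slightly sub-linear thanks to term-dictionary sharing):

```python
def extrapolate_index_size(sample_gb: float, sample_docs: int, total_docs: int) -> float:
    """Scale an optimized sample index linearly to the full document count."""
    return sample_gb * total_docs / sample_docs

# Hypothetical: a 10M-doc sample index measuring 2 GB, scaled to 80M docs.
print(extrapolate_index_size(2.0, 10_000_000, 80_000_000))  # 16.0
```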

Depending on your requirements for search performance, you can beef up your RAM 
to hold the whole index or depend on slow disks as a bottleneck. If you find 
that total size of index is 16Gb, you should leave 16Gb free for OS disk 
caching, e.g. allocate 8Gb to Tomcat/Solr and leave the rest for the OS. If I 
should guess, you probably find that one server gets overloaded or too slow 
with your amount of docs, and that you end up with sharding across 2-4 servers.

PS: Do you always need to search all data? A trick may be to partition your 
data such that say 80% of searches go to a fresh index with 10% of the 
content, while the remaining searches include everything.

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 22. mai 2012, at 11:06, Bruno Mannina wrote:

 My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml
 
 24 Go DDR3
 
 Le 22/05/2012 10:26, findbestopensource a écrit :
 Dedicated Server may not be required. If you want to cut down cost, then
 prefer shared server.
 
 How much the RAM?
 
 Regards
 Aditya
 www.findbestopensource.com
 
 
 On Tue, May 22, 2012 at 12:36 PM, Bruno Manninabmann...@free.fr  wrote:
 
 Dear Solr users,
 
 My company would like to use solr to index around 80 000 000 documents
 (xml files with around 5~10ko size each).
 My program (robot) will connect to this solr with boolean requests.
 
 Number of users: around 1000
 Number of requests by user and by day: 300
 Number of users by day: 30
 
 I would like to subscribe to a host provider with this configuration:
 - Dedicated Server
 - Ubuntu
 - Intel Xeon i7 2x 266+ GHz 12 Go 2 * 1500Go
 - Unlimited bandwidth
 - IP fixe
 
 Do you think this configuration is enough?
 
 Thanks for your info,
 Sincerely
 Bruno
 
 



Re: Multicore Solr

2012-05-22 Thread Shanu Jha
Hi,

Could you please tell me what you mean by filtering data by users? I would like
to know whether there is a real problem with creating a core per user, i.e.
resource utilization, CPU usage, etc.

AJ

On Tue, May 22, 2012 at 4:39 PM, findbestopensource 
findbestopensou...@gmail.com wrote:

 Having cores per user is not good idea. The count is too high. Keep
 everything in single core. You could filter the data based on user name or
 user id.

 Regards
 Aditya
 www.findbestopensource.com



 On Tue, May 22, 2012 at 2:29 PM, Shanu Jha shanuu@gmail.com wrote:

  Hi all,
 
  greetings from my end. This is my first post on this mailing list. I have
  few questions on multicore solr. For background we want to create a core
  for each user logged in to our application. In that case it may be 50,
 100,
  1000, N-numbers. Each core will be used to write and search index in real
  time.
 
  1. Is this a good idea to go with?
  2. What are the pros and cons of this approach?

  Awaiting your response.
 
  Regards
  AJ
 



Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-22 Thread Yandong Yao
Hi Darren,

Thanks very much for your reply.

The reason I want to control core indexing/searching is that I want to
use one core to store each customer's data (all customers share the same
config): e.g. customer 1 uses coreForCustomer1 and customer 2
uses coreForCustomer2.

Is there any better way than using a different core for each customer?

Another way may be to use a different collection per customer, though I am not
sure how many collections Solr Cloud can support. Which way is better
in terms of flexibility/scalability? (Suppose there are tens of thousands of
customers.)

Regards,
Yandong

2012/5/22 Darren Govoni dar...@ontrenet.com

 Why do you want to control what gets indexed into a core and then
 knowing what core to search? That's the kind of knowing that SolrCloud
 solves. In SolrCloud, it handles the distribution of documents across
 shards and retrieves them regardless of which node is searched from.
 That is the point of cloud, you don't know the details of where
 exactly documents are being managed (i.e. they are cloudy). It can
 change and re-balance from time to time. SolrCloud performs the
 distributed search for you, therefore when you try to search a node/core
 with no documents, all the results from the cloud are retrieved
 regardless. This is considered A Good Thing.

 It requires a change in thinking about indexing and searching

 On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote:
  Hi Guys,
 
  I use following command to start solr cloud according to solr cloud wiki.
 
  yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf
  -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
  yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983
 -jar
  start.jar
 
  Then I have created several cores using CoreAdmin API (
   http://localhost:8983/solr/admin/cores?action=CREATE&name=
   coreName&collection=collection1), and clusterstate.json show following
  topology:
 
 
  collection1:
  -- shard1:
-- collection1
-- CoreForCustomer1
-- CoreForCustomer3
-- CoreForCustomer5
  -- shard2:
-- collection1
-- CoreForCustomer2
-- CoreForCustomer4
 
 
  1) Index:
 
  Using following command to index mem.xml file in exampledocs directory.
 
  yydzero:exampledocs bjcoe$ java -Durl=
  http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml
  SimplePostTool: version 1.4
  SimplePostTool: POSTing files to
  http://localhost:8983/solr/coreForCustomer3/update..
  SimplePostTool: POSTing file mem.xml
  SimplePostTool: COMMITting Solr index changes.
 
  And now SolrAdmin UI shows that 'coreForCustomer1', 'coreForCustomer3',
  'coreForCustomer5' has 3 documents (mem.xml has 3 documents) and other 2
  core has 0 documents.
 
   *Question 1:* Is this expected behavior? How do I index documents into
   a specific core?
 
  *Question 2*:  If SolrCloud don't support this yet, how could I extend it
  to support this feature (index document to particular core), where
 should i
  start, the hashing algorithm?
 
  *Question 3*:  Why the documents are also indexed into 'coreForCustomer1'
  and 'coreForCustomer5'?  The default replica for documents are 1, right?
 
  Then I try to index some document to 'coreForCustomer2':
 
  $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar
  post.jar ipod_video.xml
 
  While 'coreForCustomer2' still have 0 documents and documents in
 ipod_video
  are indexed to core for customer 1/3/5.
 
  *Question 4*:  Why this happens?
 
  2) Search: I use
  http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml to
  search against 'CoreForCustomer2', while it will return all documents in
  the whole collection even though this core has no documents at all.

  Then I use
  http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2,
  and it will return 0 documents.
 
  *Question 5*: So If want to search against a particular core, we need to
  use 'shards' parameter and use solrCore name as parameter value, right?
 
 
  Thanks very much in advance!
 
  Regards,
  Yandong
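
Regarding Question 2 above: distribution hashes the document's uniqueKey to pick a shard, which is why hand-picking a core does not stick. A toy sketch of that idea next to the explicit customer-to-core mapping the thread discusses (this is not SolrCloud's actual hash function; all names are illustrative):

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Toy stand-in for hash-based routing: a stable hash of the uniqueKey
    decides the shard, so the client never picks a core by hand."""
    h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
    return h % num_shards

def core_for_customer(customer_id: int) -> str:
    """The per-tenant alternative: route every request for one customer
    to a dedicated core."""
    return f"coreForCustomer{customer_id}"

# Routing is deterministic: the same id always lands on the same shard.
print(shard_for("SOLR1000", 2), core_for_customer(3))
```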





Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina
I installed a temporary server at my university with 12 000 docs (Ubuntu + Solr
3.6.0).

Maybe I can estimate the amount of memory I need?

Q: How can I check the memory used?
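
One Linux-side answer to the "how can I check the memory used" question is to read the Solr JVM's resident set size; a sketch assuming Solr was started with java -jar start.jar (the pgrep pattern is an assumption about your command line):

```shell
# Print the resident memory (RSS) of the Solr JVM in MB, if it is running.
pid=$(pgrep -f 'start\.jar' | head -n1)
if [ -n "$pid" ]; then
  ps -o rss= -p "$pid" | awk '{printf "Solr RSS: %.1f MB\n", $1/1024}'
else
  echo "no start.jar process found"
fi
```

For a finer breakdown of heap versus non-heap usage, `jstat -gc <pid>` or the JConsole GUI can be pointed at the same process.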


Le 22/05/2012 13:14, findbestopensource a écrit :

Seems to be fine. Go head.

Before hosting, Have you tried / tested your application in local setup.
RAM usage is what matters in terms of Solr. Just benchmark your app for 100
000 documents, Log the memory used. Calculate the RAM reqd for 80 000 000
documents.

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 2:36 PM, Bruno Manninabmann...@free.fr  wrote:


My choice:
http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml

24 Go DDR3

Le 22/05/2012 10:26, findbestopensource a écrit :

  Dedicated Server may not be required. If you want to cut down cost, then

prefer shared server.

How much the RAM?

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 12:36 PM, Bruno Manninabmann...@free.fr   wrote:

  Dear Solr users,

My company would like to use solr to index around 80 000 000 documents
(xml files with around 5~10ko size each).
My program (robot) will connect to this solr with boolean requests.

Number of users: around 1000
Number of requests by user and by day: 300
Number of users by day: 30

I would like to subscribe to a host provider with this configuration:
- Dedicated Server
- Ubuntu
- Intel Xeon i7 2x 266+ GHz 12 Go 2 * 1500Go
- Unlimited bandwidth
- IP fixe

Do you think this configuration is enough?

Thanks for your info,
Sincerely
Bruno






RE: Wildcard-Search Solr 3.5.0

2012-05-22 Thread spring
  The text may contain FooBar.
  
  When I do a wildcard search like this: Foo* - no hits.
  When I do a wildcard search like this: foo* - doc is
  found.
 
 Please see http://wiki.apache.org/solr/MultitermQueryAnalysis


Well, it works in 3.6, with one exception: if I use German umlauts it does
not work anymore.

Text: Bär

Bä* - no hits
Bär - hits

What can I do in this case?

Thank you
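
One hedged possibility for the umlaut case above: make sure the filters that normalized the indexed text (lowercasing, plus accent folding if any) also run at multiterm query time. In Solr 3.6 that can be spelled out explicitly with a multiterm analyzer on the field type; this is a sketch, not the poster's actual schema:

```xml
<fieldType name="text_multiterm" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- applied to wildcard/prefix terms such as Bä* before matching -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If the index-time chain folds ä to a (e.g. with an ASCIIFoldingFilter), the same filter has to appear in the multiterm analyzer too, or Bä* can never match.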



Re: System requirements in my case?

2012-05-22 Thread Bruno Mannina

Hi Jan,

Thanks for all these details !

Answers are below.

Sincerely,
Bruno


Le 22/05/2012 13:58, Jan Høydahl a écrit :

Hi,

It is impossible to guess the required HW size without more knowledge about 
data and usage. 80 mill docs is a fair amount.

Here's how I would approach sizing the setup:
1) Get your schema in shape, removing unnecessary stored/indexed fields

Ok good idea !

2) Do a test index locally of a part of the dataset, e.g. 10 mill docs, and
perform an Optimize

Concerning tests, I currently have only a sample of 12 000 docs, no more :'(

3) Measure the size of the index folder, multiply with 8 to get a clue of total 
index size

With 12 000 docs my index folder size is 33 MB.
ps: I use solr.clustering.enabled=true


4) Do some benchmarking with realistic types of queries to identify performance 
bottlenecks on query side

yep, this point is for later.


Depending on your requirements for search performance, you can beef up your RAM to
hold the whole index or depend on slow disks as a bottleneck. If you find that
total size of index is 16Gb, you should leave 16Gb free for OS disk caching,
e.g. allocate 8Gb to Tomcat/Solr and leave the rest for the OS. If I should guess,
you probably find that one server gets overloaded or too slow with your amount of
docs, and that you end up with sharding across 2-4 servers.

I will take a look to see if I can easily increase the RAM on the server
(currently 24 GB).


Another question concerning running Solr: do I just have to run java
-jar start.jar?

Or do you think I should run it another way?



PS: Do you always need to search all data? A trick may be to partition your data such 
that say 80% of searches go to a fresh index with 10% of the content, while 
the remaining searches include everything.
Yes, I need to search the whole index; even old documents must be
searchable.




--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 22. mai 2012, at 11:06, Bruno Mannina wrote:


My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml

24 Go DDR3

Le 22/05/2012 10:26, findbestopensource a écrit :

Dedicated Server may not be required. If you want to cut down cost, then
prefer shared server.

How much the RAM?

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 12:36 PM, Bruno Manninabmann...@free.fr   wrote:


Dear Solr users,

My company would like to use solr to index around 80 000 000 documents
(xml files with around 5~10ko size each).
My program (robot) will connect to this solr with boolean requests.

Number of users: around 1000
Number of requests by user and by day: 300
Number of users by day: 30

I would like to subscribe to a host provider with this configuration:
- Dedicated Server
- Ubuntu
- Intel Xeon i7 2x 266+ GHz 12 Go 2 * 1500Go
- Unlimited bandwidth
- IP fixe

Do you think this configuration is enough?

Thanks for your info,
Sincerely
Bruno








Re: Newbie with Carrot2?

2012-05-22 Thread Stanislaw Osinski
Hi Bruno,

Just to confirm -- are you seeing the clusters array in the result at all
(<arr name="clusters">)? To get reasonable clusters, you should request at
least 30-50 documents (rows), but even with smaller values, you should see
an empty clusters array.

Staszek

On Sun, May 20, 2012 at 9:20 PM, Bruno Mannina bmann...@free.fr wrote:

 Le 20/05/2012 11:43, Stanislaw Osinski a écrit :

  Hi Bruno,

 Here's the wiki documentation for Solr's clustering component:

  http://wiki.apache.org/solr/ClusteringComponent

  For configuration examples, take a look at the Configuration section:
  http://wiki.apache.org/solr/ClusteringComponent#Configuration

 If you hit any problems, let me know.

 Staszek

 On Sun, May 20, 2012 at 11:38 AM, Bruno Manninabmann...@free.fr  wrote:

  Dear all,

 I use Solr 3.6.0 and I indexed some documents (around 12000).
 Each documents contains a Abstract-en field (and some other fields).

 Is it possible to use Carrot2 to create cluster (classes) with the
 Abstract-en field?

 What must I configure in the schema.xml ? or in other files?

 Sorry for my newbie question, but I found only documentation for
 Workbench
 tool.

 Bruno

  Thanks for this link, but I have a problem configuring my solrconfig.xml
  in that section
  (note: I run java -Dsolr.clustering.enabled=true).

  I have a field named abstract-en, and I would like to use only this field.

  I would like to know if my requestHandler is good.
  I have a doubt about the content of carrot.title and carrot.url,

  and also the last fields:
  <str name="df">abstract-en</str>
  <str name="defType">edismax</str>
  <str name="qf">
    abstract-en^1.0
  </str>
  <str name="q.alt">*:*</str>
  <str name="rows">10</str>
  <str name="fl">*,score</str>

 because the result when I do a request is exactly like a search request
 (without more information)


 My entire requestHandler is:

  <requestHandler name="/clustering" startup="lazy"
      enable="${solr.clustering.enabled:false}" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">default</str>
      <bool name="clustering.results">true</bool>
      <!-- The title field -->
      <str name="carrot.title">name</str>
      <str name="carrot.url">id</str>
      <!-- The field to cluster on -->
      <str name="carrot.snippet">abstract-en</str>
      <!-- produce summaries -->
      <bool name="carrot.produceSummary">true</bool>
      <!-- the maximum number of labels per cluster -->
      <!--<int name="carrot.numDescriptions">5</int>-->
      <!-- produce sub clusters -->
      <bool name="carrot.outputSubClusters">false</bool>
      <str name="df">abstract-en</str>
      <str name="defType">edismax</str>
      <str name="qf">
        abstract-en^1.0
      </str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>




Re: Question about sampling

2012-05-22 Thread rita
Hi Lance, 
Could you provide more details about implementing this using
SignatureUpdateProcessor? 
Example can be helpful. 

-
Rita
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-sampling-tp3984103p3985379.html
Sent from the Solr - User mailing list archive at Nabble.com.
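
A hedged sketch of what the SignatureUpdateProcessor suggestion usually looks like in solrconfig.xml, following the standard deduplication setup (the signature field name sig and the source fields title,content are assumptions; the "sampled" subset would then be the documents that survive overwriteDupes):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- field that receives the computed hash; must be defined in schema.xml -->
    <str name="signatureField">sig</str>
    <!-- collapse later documents that produce an identical signature -->
    <bool name="overwriteDupes">true</bool>
    <!-- source fields the signature is computed from (assumed names) -->
    <str name="fields">title,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain is then typically referenced from the update handler (e.g. an update.chain default) so it runs on every add.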


Multicore solr

2012-05-22 Thread Shanu Jha
Hi all,

greetings from my end. This is my first post on this mailing list. I have
few questions on multicore solr. For background we want to create a core
for each user logged in to our application. In that case it may be 50, 100,
1000, N-numbers. Each core will be used to write and search index in real
time.

1. Is this a good idea to go with?
2. What are the pros and cons of this approach?

Awaiting your response.

Regards
AJ


Re: Newbie with Carrot2?

2012-05-22 Thread Bruno Mannina

Arfff

Clusters are at the end of my XML answer 

<doc>
</doc>
<doc>
</doc>
<doc>
</doc>
<doc>
</doc>
..
..
<cluster>
</cluster>

ok all work fine now !


Le 22/05/2012 15:33, Stanislaw Osinski a écrit :

Hi Bruno,

Just to confirm -- are you seeing the clusters array in the result at all
(<arr name="clusters">)? To get reasonable clusters, you should request at
least 30-50 documents (rows), but even with smaller values, you should see
an empty clusters array.

Staszek

On Sun, May 20, 2012 at 9:20 PM, Bruno Manninabmann...@free.fr  wrote:


Le 20/05/2012 11:43, Stanislaw Osinski a écrit :

  Hi Bruno,

Here's the wiki documentation for Solr's clustering component:

http://wiki.apache.org/solr/ClusteringComponent

For configuration examples, take a look at the Configuration section:
http://wiki.apache.org/solr/ClusteringComponent#Configuration

If you hit any problems, let me know.

Staszek

On Sun, May 20, 2012 at 11:38 AM, Bruno Manninabmann...@free.fr   wrote:

  Dear all,

I use Solr 3.6.0 and I indexed some documents (around 12000).
Each documents contains a Abstract-en field (and some other fields).

Is it possible to use Carrot2 to create cluster (classes) with the
Abstract-en field?

What must I configure in the schema.xml ? or in other files?

Sorry for my newbie question, but I found only documentation for
Workbench
tool.

Bruno

  Thx for this link but I have a problem to configure my solrconfig.xml

in the section:
(note I run java -Dsolr.clustering.enabled=true)

I have a field named abstract-en, and I would like to use only this field.

I would like to know if my requestHandler is good?
I have a doubt with the content of  : carrot.title, carrot.url

and also the last fields:
<str name="df">abstract-en</str>
<str name="defType">edismax</str>
<str name="qf">
  abstract-en^1.0
</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>

because the result when I do a request is exactly like a search request
(without more information)


My entire requestHandler is:

<requestHandler name="/clustering" startup="lazy"
    enable="${solr.clustering.enabled:false}" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <!-- The title field -->
    <str name="carrot.title">name</str>
    <str name="carrot.url">id</str>
    <!-- The field to cluster on -->
    <str name="carrot.snippet">abstract-en</str>
    <!-- produce summaries -->
    <bool name="carrot.produceSummary">true</bool>
    <!-- the maximum number of labels per cluster -->
    <!--<int name="carrot.numDescriptions">5</int>-->
    <!-- produce sub clusters -->
    <bool name="carrot.outputSubClusters">false</bool>
    <str name="df">abstract-en</str>
    <str name="defType">edismax</str>
    <str name="qf">
      abstract-en^1.0
    </str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>






Uncatchable Exception on solrj3.6.0

2012-05-22 Thread Jamel ESSOUSSI
Hi,

I use solr-solrj 3.6.0 and solr-core 3.6.0:

I have overridden the handleError method of the ConcurrentUpdateSolrServer
class:


final ConcurrentUpdateSolrServer newSolrServer =
        new ConcurrentUpdateSolrServer(url, client, 100, 10) {
    @Override
    public void handleError(Throwable ex) {
        super.handleError(ex);
    }
};

My problem is that when an exception is thrown on the Solr server side, I
cannot catch it on the client side.

Thanks

-- Jamel E

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Uncatchable-Exception-on-solrj3-6-0-tp3985437.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-22 Thread Darren Govoni

I'm curious what the solrcloud experts say, but my suggestion is to try not to
over-engineer the search architecture on solrcloud. For example, what is
the benefit of managing which cores are indexed and searched? Having to know
those details, in my mind, works against the automation in solrcloud, but maybe
there's a good reason you want to do it this way.


--- Original Message ---
On 5/22/2012 07:35 AM Yandong Yao wrote:

Hi Darren,

Thanks very much for your reply.

The reason I want to control core indexing/searching is that I want to
use one core to store one customer's data (all customers share the same
config): such as customer 1 uses coreForCustomer1 and customer 2
uses coreForCustomer2.

Is there any better way than using different cores for different customers?

Another way may be to use a different collection for each customer, while I am
not sure how many collections SolrCloud could support. Which way is better
in terms of flexibility/scalability? (suppose there are tens of thousands of
customers).

Regards,
Yandong

2012/5/22 Darren Govoni dar...@ontrenet.com

 Why do you want to control what gets indexed into a core and then
 knowing what core to search? That's the kind of knowing that SolrCloud
 solves. In SolrCloud, it handles the distribution of documents across
 shards and retrieves them regardless of which node is searched from.
 That is the point of cloud, you don't know the details of where
 exactly documents are being managed (i.e. they are cloudy). It can
 change and re-balance from time to time. SolrCloud performs the
 distributed search for you, therefore when you try to search a node/core
 with no documents, all the results from the cloud are retrieved
 regardless. This is considered A Good Thing.

 It requires a change in thinking about indexing and searching.

 On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote:
  Hi Guys,
 
  I use the following commands to start Solr Cloud according to the
  SolrCloud wiki:
 
  yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf
  -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
  yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983
  -jar start.jar
 
  Then I have created several cores using the CoreAdmin API (
  http://localhost:8983/solr/admin/cores?action=CREATE&name=coreName&collection=collection1
  ), and clusterstate.json shows the following topology:
 
  collection1:
  -- shard1:
    -- collection1
    -- CoreForCustomer1
    -- CoreForCustomer3
    -- CoreForCustomer5
  -- shard2:
    -- collection1
    -- CoreForCustomer2
    -- CoreForCustomer4
 
  1) Index:
 
  Using the following command to index the mem.xml file in the exampledocs
  directory:
 
  yydzero:exampledocs bjcoe$ java -Durl=
  http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml
  SimplePostTool: version 1.4
  SimplePostTool: POSTing files to
  http://localhost:8983/solr/coreForCustomer3/update..
  SimplePostTool: POSTing file mem.xml
  SimplePostTool: COMMITting Solr index changes.
 
  And now the Solr Admin UI shows that 'coreForCustomer1', 'coreForCustomer3'
  and 'coreForCustomer5' have 3 documents (mem.xml has 3 documents) and the
  other 2 cores have 0 documents.
 
  *Question 1:* Is this expected behavior? How do I index documents into
  a specific core?
 
  *Question 2:* If SolrCloud doesn't support this yet, how could I extend it
  to support this feature (indexing a document to a particular core)? Where
  should I start, the hashing algorithm?
 
  *Question 3:* Why are the documents also indexed into 'coreForCustomer1'
  and 'coreForCustomer5'? The default replica for documents is 1, right?
 
  Then I try to index some documents to 'coreForCustomer2':
 
  $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar
  post.jar ipod_video.xml
 
  While 'coreForCustomer2' still has 0 documents and the documents in
  ipod_video are indexed to the cores for customers 1/3/5.
 
  *Question 4:* Why does this happen?
 
  2) Search: I use
  http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml to
  search against 'CoreForCustomer2', while it will return all documents in
  the whole collection even though this core has no documents at all.
 
  Then I use
  http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2
  and it will return 0 documents.
 
  *Question 5:* So if I want to search against a particular core, I need to
  use the 'shards' parameter and the SolrCore name as the parameter value,
  right?
 
  Thanks very much in advance!
 
  Regards,
  Yandong


Re: Installing Solr on Tomcat using Shell - Code wrong?

2012-05-22 Thread Li Li
You should find some clues in the Tomcat log.
On 2012-5-22 at 7:49 PM, Spadez james_will...@hotmail.com wrote:

 Hi,

 This is the install process I used in my shell script to try and get Tomcat
 running with Solr (debian server):



 I swear this used to work, but currently only Tomcat works. The Solr page
 just comes up with "The requested resource (/solr/admin) is not available".

 Can anyone give me some insight into why this isn't working? It's driving me
 nuts.

 James

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Installing-Solr-on-Tomcat-using-Shell-Code-wrong-tp3985393.html
 Sent from the Solr - User mailing list archive at Nabble.com.
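For reference, Li Li's suggestion to check the Tomcat log, plus a typical minimal Solr-on-Tomcat deployment, can be sketched as follows. These are illustrative commands assuming the Debian tomcat6 package layout and Solr 3.6 defaults, not the poster's actual script:

```
# Copy the Solr webapp and home directory into place, point Tomcat at the home dir,
# then restart and inspect the log for the reason /solr/admin returns 404
cp apache-solr-3.6.0/dist/apache-solr-3.6.0.war /var/lib/tomcat6/webapps/solr.war
cp -r apache-solr-3.6.0/example/solr /var/lib/tomcat6/solr
echo 'JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/var/lib/tomcat6/solr"' >> /etc/default/tomcat6
service tomcat6 restart
tail -n 50 /var/log/tomcat6/catalina.out
```

A missing or wrong solr.solr.home is the most common cause of the webapp deploying but /solr/admin being unavailable.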



Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-22 Thread Mark Miller
I think the key is this: you want to think of a SolrCore on a single node Solr 
installation as a collection on a multi node SolrCloud installation.

So if you would use multiple SolrCore's with a std Solr setup, you should be 
using multiple collections in SolrCloud. If you were going to try to do 
everything in one SolrCore, that would be like putting everything in one 
collection in SolrCloud. I don't think it generally makes sense to try and work 
at the SolrCore level when working with SolrCloud. This will be made more clear 
once we add a simple collections api.

So I think your choice should be similar to using a single node - do you want 
to put everything in one 'collection' and use a filter to separate customers 
(with all its caveats and limitations) or do you want to use a collection per 
customer. You can always start up more clusters if you reach any limits.
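To make the two options concrete, the difference at query time looks roughly like this. The customer_id field and the per-customer collection name are hypothetical placeholders, not from the thread:

```
# Option 1: one shared collection, customers separated by a filter query
http://localhost:8983/solr/collection1/select?q=*:*&fq=customer_id:42

# Option 2: one collection per customer
http://localhost:8983/solr/customer42/select?q=*:*
```

Option 1 keeps cluster management simple but relies on the filter for isolation; option 2 gives hard isolation at the cost of managing many collections.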



On May 22, 2012, at 10:08 AM, Darren Govoni wrote:

 I'm curious what the solrcloud experts say, but my suggestion is to try not 
 to over-engineer the search architecture on SolrCloud. For example, what 
 is the benefit of managing which cores are indexed and searched? Having to 
 know those details, in my mind, works against the automation in SolrCloud, 
 but maybe there's a good reason you want to do it this way.
 

Re: Multicore solr

2012-05-22 Thread Sohail Aboobaker
It would help if you provide your use case. What are you indexing for each
user and why would you need a separate core for indexing each user? How do
you decide schema for each user? It might be better to describe your use
case and desired results. People on the list will be able to advice on the
best approach.

Sohail


Re: solr tokenizer not splitting unbreakable expressions

2012-05-22 Thread Tanguy Moal
Hello Elisabeth,

Wouldn't it be simpler to have a custom component inside of the
front-end to your search server that would transform a query like hotel
de ville paris into "hotel de ville" paris (i.e. turning each
occurrence of the sequence hotel de ville into a phrase query)?

Concerning protections inside of the tokenizer, I think that is not
possible actually.
The main reason for this could be that the QueryParser will break the query
on each space before passing each query-part through the analysis of every
searched field. Hence all the smart things you would put at indexing time
to wrap a sequence of tokens into a single one are not reproducible at query
time.

Please someone correct me if I'm wrong!

Alternatively, I think you might do so with a custom query parser (in order
to have phrases sent to the analyzers instead of words). But since
tokenizers don't have support for protected words list, you would need an
additional custom token filter that would consume the tokens stream and
annotate those matching an entry in the protection list.
Unfortunately, if your protected list is long, you will have performance
issues, unless you rely on a dedicated data structure such as a trie
(e.g. a Patricia trie). You can find solid implementations on the
Internet (see https://github.com/rkapsi/patricia-trie).

Then you could make your filter consume a sliding window of tokens while
the window matches in your trie.
Once you have a complete match in your trie, the filter can set an
attribute of the type your choice (e.g. MyCustomKeywordAttribute) on the
first matching token, and make the attribute be the complete match (e.g.
Hotel de ville).
If you don't have a complete match, let the unmatched tokens pass through
unmodified.
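The sliding-window matching described above can be sketched outside of Lucene like this. It is plain Python for brevity; the phrase set (a stand-in for a real trie) and the token list are illustrative, not Solr APIs:

```python
def group_phrases(tokens, phrases):
    """Merge any run of tokens matching a protected phrase into one token."""
    out, i = [], 0
    while i < len(tokens):
        end = None
        # Slide the window from the longest candidate down to a single token.
        for j in range(len(tokens), i, -1):
            if tuple(tokens[i:j]) in phrases:
                end = j
                break
        if end is not None:
            out.append(" ".join(tokens[i:end]))  # complete match: emit as one token
            i = end
        else:
            out.append(tokens[i])  # no match: pass the token through unmodified
            i += 1
    return out

phrases = {("hotel", "de", "ville")}
print(group_phrases("hotel de ville paris".split(), phrases))
# ['hotel de ville', 'paris']
```

A real TokenFilter would do the same walk over the token stream, using a trie so each window is not re-tested from scratch, and set a keyword-style attribute on the first token of each match as described above.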

I Hope this helps...

--
Tanguy


2012/5/22 elisabeth benoit elisaelisael...@gmail.com

 Hello,

 Does someone know if there is a way to configure a tokenizer to split on
 white spaces all words except a bunch of expressions listed in a file?

 For instance, if I want hotel de ville not to be split into words, a
 request like hotel de ville paris would be split into two tokens:

 hotel de ville and paris instead of 4 tokens

 hotel
 de
 ville
 paris

 I imagine something like

 <tokenizer class="solr.StandardTokenizerFactory"
 protected="protoexpressions.txt"/>

 Thanks a lot,
 Elisabeth



WFST with autosuggest/geo

2012-05-22 Thread William Bell
Does anyone have the slides or sample code from:

Building Query Auto-Completion Systems with Lucene 4.0
Presented by Sudarshan Gaikaiwari, Software Engineer,Yelp

We want to implement WFST with GEO boosting.


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Binary updates handler does not propagate failures?

2012-05-22 Thread Jozef Vilcek
Hi all,

I am facing the following issue ...
I have an application which is feeding Solr 3.6 index with document
updates via Solrj 3.6. I use a binary request writer, because of the
issue with XML when sending insert and deletes at once (
https://issues.apache.org/jira/browse/SOLR-1752 )

Now, I have noticed that if I send a malformed document to the index,
I see in the logs that it got refused by the index, but on the Solrj side
the returned UpdateResponse does not indicate any kind of failure (no
exception thrown, response status code == 0). When I switch to XML
requests, I receive an exception when sending a malformed document.

By looking at Solr's  BinaryUpdateRequestHandler.java
http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_6_0/solr/core/src/java/org/apache/solr/handler/BinaryUpdateRequestHandler.java?view=markup
at lines 98 - 105, exceptions are not propagated, therefore
RequestHandlerBase cannot set them into the response ...

Is this intended behavior?
What am I doing wrong?
Any suggestions?

Many thanks in advance.

Best,
Jozef


How to handle filter query against empty fields

2012-05-22 Thread Jozef Vilcek
Hi all,

I have a field(s) in a schema which I need to be able to specify in a
filter query. The field is not mandatory, therefore it can be empty. I
need to be able to run a query with a filer :  return only docs which
does not have value for the field  ...

What would be the optimal recommended way of doing this with Solr?

Thanks!

Best,
Jozef


Re: How to handle filter query against empty fields

2012-05-22 Thread Ahmet Arslan
There are two approaches for this. Please read my earlier post :

http://search-lucene.com/m/72Q4YURpgY
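For quick reference, the usual way to express "has no value" in a filter query is a negated open-ended range (myfield is a placeholder name):

```
# documents where myfield is empty/missing:
fq=-myfield:[* TO *]

# documents where myfield has some value:
fq=myfield:[* TO *]
```

The post linked above weighs this query-time approach against alternatives such as indexing an explicit default marker value.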





Faceted on Similarity ?

2012-05-22 Thread Robby
Hi All,

I'm quite a new user of both Lucene and Solr. I want to ask if faceted search
can be used to group multiple fields' values based on similarity.
I have looked at faceting so far, but from my understanding it
only works on exact single values and definite ranges.

For example, if I have name, address, id number and nationality. And with
rows that had a degree of similarity distance between these fields we will
group them together.

Sample results will be like this :

Group1
Name : Angel, Address: Jakarta, ID : 123, Nationality: Indonesian
Name : Angeline, Address: Jayakarta, ID : 123, Nationality: Indonesian

Group2
Name : Frank, Address: Jl. Tubagus Angke Jakarta, ID : 333,
Nationality: Indonesian
Name : Frans, Address: Jl. T. Angke Jakarta, ID : 332, Nationality:
Indonesian


Hope I make myself clear and asking in proper way. Very sorry if my English
is not good enough...

Thanks,

Robby


Re: System requirements in my case?

2012-05-22 Thread Stanislaw Osinski

 3) Measure the size of the index folder, multiply with 8 to get a clue of
 total index size

 With 12 000 docs my index folder size is: 33 MB
 ps: I use solr.clustering.enabled=true


Clustering is performed at search time, it doesn't affect the size of the
index (but obviously it does affect the search response times).

Staszek


Re: Solr mail dataimporter cannot be found

2012-05-22 Thread Stefan Matheis
Hey Emma,

thanks for reporting this, i opened SOLR-3478 and will commit this soon

Stefan 


On Monday, May 21, 2012 at 10:47 PM, Emma Bo Liu wrote:

 Hi,
 
 I want to index emails using solr. I put the user name, password, hostname
 in data-config.xml under the mail folder. This is a valid email, but when I run
 the url http://localhost:8983/solr/mail/dataimport?command=full-import it
 said cannot access mail/dataimporter, reason: not found. But when I run
 http://localhost:8983/solr/rss/dataimport?command=full-import
 or
 http://localhost:8983/solr/db/dataimport?command=full-import
 they can be found.
 
 In addition, when I run the command java
 -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar , on the left side of
 solr UI, there are db, rss, tika and solr but no mail. Is it a bug in
 mail indexing? Thank you so much!
 
 Best,
 
 Emma 




clickable links as results?

2012-05-22 Thread 12rad
Hi, 

I want to display a clickable link to the document if a search
matches, along with the number of times the search query matched.
What should I be looking at?
I am fairly new to Solr and don't know how I can achieve this.

Thanks for the help!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/clickable-links-as-results-tp3985505.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-22 Thread 12rad
That worked! 
Thanks!
I did str name=hl.simple.pre /str
str name=hl.simple.post /str

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985507.html
Sent from the Solr - User mailing list archive at Nabble.com.


index-time boosting using DIH

2012-05-22 Thread geeky2
hello all,

can i use the technique described on the wiki at:

http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts

if i am populating my core using a DIH?

looking at the posts on this subject and the wiki docs - leads me to believe
that you can only use this when you are using the xml interface for
importing data?

thank you

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-time-boosting-using-DIH-tp3985508.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: index-time boosting using DIH

2012-05-22 Thread Dyer, James
See http://wiki.apache.org/solr/DataImportHandler#Special_Commands and the 
$docBoost pseudo-field name.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: Highlight feature

2012-05-22 Thread Chris Hostetter

: That is the default response format. If you would like to change that, 
: you could extend the search handler or post process the XML data. 
: Another option would be to use the javabin (if your app is java based) 
: and build xml the way your app would need.

there is actually a more straightforward way to do stuff like this in 
trunk now, such that it can work with any response writer, using the 
DocTransformer API.

there is already an ExplainAugmenter that can inline the explain info 
for a document, we just need someone to help write a corresponding 
HighlightAugmenter; i've opened a Jira if anyone wants to take a crack at 
a patch...

https://issues.apache.org/jira/browse/SOLR-3479

-Hoss


Re: Solr 3.6 fails when using XSLT

2012-05-22 Thread Chris Hostetter

what does your results.xsl look like? or more specifically: can you post a 
very small example XSL that has this problem?

you mentioned you are using xsl:include and that doesn't seem to work ... 
is that a separate problem, or does removing/adding the xsl:include 
fix/cause this problem?

what does your xsl:include look like? where do the various xsl templates 
live in the filesystem relative to each other?


: Date: Fri, 11 May 2012 08:24:45 -0700 (PDT)
: From: pramila_tha...@ontla.ola.org pramila_tha...@ontla.ola.org
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Solr 3.6 fails when using XSLT
: 
: Hi Everyone,
: 
: I have recently upgraded to *solr 3.6 from solr 1.4.*
: My XSL where working fine in solr 1.4.
: 
: but now with Solr 3.6 I keep getting the following Error 
: 
: /getTransformer fails in getContentType java.lang.RuntimeException:
: getTransformer fails in getContentType /
: 
: But instead of results.xsl If I use example.xsl, it is fine.
: 
: I fine my xsl:include does not seem to work with Solr 3.6
: 
: Can someone please let me know what am I doing wrong?
: 
: Thanks,
: 
: --
: View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-6-fails-when-using-XSLT-tp3980240.html
: Sent from the Solr - User mailing list archive at Nabble.com.

-Hoss


RE: Solr 3.6 fails when using XSLT

2012-05-22 Thread pramila_tha...@ontla.ola.org
Hi Everyone,

This is what worked in solr 1.4 and did not work in solr 3.6.

Actually solr 3.6 requires all the xsl to be present in conf/xslt directory
All paths leading to xsl should be relative to conf directory.

But before this was not the case.

<!--
NOTE: Does NOT work with solr 3.6, worked in Solr 1.4
<xsl:include href="../webapps/pdfexample/xsl_scripts/highlight.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/pagination.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/simpleSearch.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/bills/facets.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/bills/advanceSearch.xsl"/>
<xsl:include href="../webapps/pdfexample/xsl_scripts/bills/search_common_fields.xsl"/>
-->
<!-- this is relative to conf directory, which is working for solr 3.6 -->

<xsl:include href="./xsl_scripts/highlight.xsl"/>
<xsl:include href="./xsl_scripts/pagination.xsl"/>
<xsl:include href="./xsl_scripts/simpleSearch.xsl"/>
<xsl:include href="./xsl_scripts/bills/facets.xsl"/>
<xsl:include href="./xsl_scripts/bills/advanceSearch.xsl"/>
<xsl:include href="./xsl_scripts/bills/search_common_fields.xsl"/>

Thanks,

--Pramila Thakur



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-6-fails-when-using-XSLT-tp3980240p3985524.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Jetty rerturning HTTP error code 413

2012-05-22 Thread Sai
Hi Alexandre,

Can you please let me know how you fixed this issue. I am also getting this
error when I pass a very large query to Solr.

A reply is highly appreciated.

Thanks,
Sai
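For anyone hitting this later: a 413 from Jetty on long GET requests usually means the query string overflowed the request header buffer. Two common workarounds are to send the query as a POST instead, or to enlarge the buffer in the connector config. The snippet below is a sketch assuming the Jetty 6 that ships with Solr's example (check your own jetty.xml for the exact connector class and file layout):

```
<!-- etc/jetty.xml: raise the header buffer so very long q/fq strings fit -->
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.nio.SelectChannelConnector">
      <Set name="port">8983</Set>
      <Set name="headerBufferSize">65536</Set>
    </New>
  </Arg>
</Call>
```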



RE: index-time boosting using DIH

2012-05-22 Thread geeky2
thanks for the reply,

so to use the $docBoost pseudo-field name, would you do something like below
- and would this technique likely increase my total index time?



<dataConfig>
  <dataSource ... />

  <document name="mydoc">
    <entity name="myentity"
            transformer="script:BoostDoc"
            query="select ...">

      <field column="SOME_COLUMN" name="someField" />
      ...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-time-boosting-using-DIH-tp3985508p3985527.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: index-time boosting using DIH

2012-05-22 Thread Dyer, James
You need to add the $docBoost pseudo-field to the document somehow.  A 
transformer is one way to do it.  You could just add it to a SELECT statement, 
which is especially convenient if the boost value somehow is derived from the 
data:

SELECT CASE WHEN SELL_MORE_FLAG='Y' THEN 999 ELSE NULL END AS '$docBoost', 
...other fields... FROM some_table, etc

Either way I wouldn't expect it to make the indexing noticeably slower.
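Put together, a DIH config carrying that SELECT might look like the following sketch (driver, table and field names are illustrative, not from the thread):

```
<dataConfig>
  <dataSource driver="..." url="..." />
  <document>
    <entity name="item"
            query="SELECT ID, NAME,
                          CASE WHEN SELL_MORE_FLAG='Y' THEN 999 ELSE NULL END AS '$docBoost'
                   FROM SOME_TABLE">
      <field column="ID" name="id" />
      <field column="NAME" name="name" />
    </entity>
  </document>
</dataConfig>
```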

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311




RE: index-time boosting using DIH

2012-05-22 Thread geeky2
thank you james for the feedback - i appreciate it.

ultimately - i was trying to decide if i was missing the boat by ONLY using
query time boosting, and i should really be using index time boosting.

but after your reply, reading the solr book, and looking at the lucene dox -
it looks like index-time boosting is not what i need.  i can probably do
better by using query-time boosting and the proper sort params.

thanks again

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-time-boosting-using-DIH-tp3985508p3985539.html
Sent from the Solr - User mailing list archive at Nabble.com.


always getting distinct count of -1 in luke response (solr4 snapshot)

2012-05-22 Thread Mike Hugo
We're testing a snapshot of Solr4 and I'm looking at some of the responses
from the Luke request handler.  Everything looks good so far, with the
exception of the distinct attribute which (in Solr3) shows me the
distinct number of terms for a given field.

Given the request below, I'm consistently getting a response back with a
value in the distinct field of -1.  Is there something different I need
to do to get back the actual distinct count?

Thanks!

Mike

http://localhost:8080/solr/core1/admin/luke?wt=json&fl=label&numTerms=1

"fields": {
  "label": {
    "type": "text_general",
    "schema": "IT-M--",
    "index": "(unstored field)",
    "docs": 63887,
    "distinct": -1,
    "topTerms": [


Indexing Polygons

2012-05-22 Thread Young, Cody
Hi All,

I'm trying to figure out how to index polygons in Solr (trunk). I'm using LSP 
right now, as the Solr integration of the new spatial module hasn't been 
completed. I have searching for a point using a polygon working, but I'm also 
looking for searching for a polygon using a point.

I've seen some indication that LSP supports this but I haven't been able to 
find an example.

What field type would I need to use? Would it be multivalued?

Please and thank you!
Cody




Re: Faceted on Similarity ?

2012-05-22 Thread Lee Carroll
Take a look at the clustering component

http://wiki.apache.org/solr/ClusteringComponent

Consider clustering offline and indexing the pre-calculated group memberships.

I might be wrong but I don't think there is any faceting mileage here.
Depending upon the use case,
you might get some use out of the MLT handler:

http://wiki.apache.org/solr/MoreLikeThis
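As a rough illustration of the MLT route, with field names borrowed from the example above (the handler must be enabled in solrconfig.xml):

```
# Find documents similar to a given one, comparing the name and address fields
http://localhost:8983/solr/mlt?q=id:123&mlt.fl=name,address&mlt.mintf=1&mlt.mindf=1
```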





RE: Solr 3.6 fails when using XSLT

2012-05-22 Thread Chris Hostetter

: This is what worked in solr 1.4 and did not work in solr 3.6.
: 
: Actually solr 3.6 requires all the xsl to be present in conf/xslt directory
: All paths leading to xsl should be relative to conf directory.
: 
: But before this was not the case.

Right ... this was actually a bug (in how all relative paths in xml 
includes, or xsl includes, were resolved) that was fixed in Solr 3.1, as 
noted in CHANGES.txt...

* SOLR-1656: XIncludes and other HREFs in XML files loaded by ResourceLoader
  are fixed to be resolved using the URI standard (RFC 2396). The system
  identifier is no longer a plain filename with path, it gets initialized
  using a custom URI scheme solrres:. This scheme is resolved using a
  EntityResolver that utilizes ResourceLoader
  (org.apache.solr.common.util.SystemIdResolver). This makes all relative
  pathes in Solr's config files behave like expected. This change
  introduces some backwards breaks in the API: Some config classes
  (Config, SolrConfig, IndexSchema) were changed to take
  org.xml.sax.InputSource instead of InputStream. There may also be some
  backwards breaks in existing config files, it is recommended to check
  your config files / XSLTs and replace all XIncludes/HREFs that were
  hacked to use absolute paths to use relative ones. (uschindler)




-Hoss


Re: Multicore solr

2012-05-22 Thread Amit Jha
Hi,

Thanks for your advice.
It is basically a meta search application. Users can perform a search on N 
data sources at a time. We broadcast a parallel search to each selected data 
source and write the data to Solr using a custom-built API (the API and Solr 
are deployed on separate machines; the API's job is to perform the parallel 
search and write the data to Solr). The API responds to the application that 
some results are available, then the application fires a search query to 
display the results (the query would be q=unique_search_id). Meanwhile the API 
keeps writing data to Solr and the user can fire a search to Solr to view all 
results. 

In the current scenario we are using a single Solr server and performing 
real-time indexing and search. Performing these operations on a single Solr 
instance makes the process slow as the index size increases. 

So we are planning to use multi-core Solr where each user will have their own 
core. All cores will have the same schema.
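Under that plan, each per-user core would be created through the CoreAdmin API, roughly like this (core and directory names are made-up placeholders; all cores can point at the same shared config and schema):

```
http://localhost:8983/solr/admin/cores?action=CREATE&name=core_user123&instanceDir=cores/user123&config=solrconfig.xml&schema=schema.xml
```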

Please suggest if this approach has any issues.

Rgds
AJ

 
 Sohail