Re: Indexing and searching of sharded/ partitioned databases and tables

2009-10-07 Thread Jayant Kumar Gandhi
Thanks guys. Now I can easily search thru 10TB of my personal photos,
videos, music and other stuff :)

At some point I had split them into multiple DBs and tables, since inserts
into a single DB/table were taking too long once the index grew beyond
1 GB. I was storing all the possible metadata about the media.
I used two hex characters to name the tables/DBs and ended up with 256
DBs, each with 256 tables :D. Don't ask me why I did it this way.
Let's just say I was exploring sharding some years ago and got too
excited :D. Alas, I never went back to finish the search portion until
now, when I really wanted to find a particular photo :)
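Two hex characters give 16 x 16 = 256 names, so 256 databases of 256 tables each comes to 65,536 tables. Here is a minimal Python sketch of enumerating such a layout. It assumes lowercase hex names. The `db_`/`tbl_` prefixes and the helper names are hypothetical, since the exact naming isn't given:

```python
from itertools import product


def hex_names(prefix: str = "") -> list[str]:
    """Return the 256 two-hex-character names 00..ff, each with an optional prefix."""
    return [f"{prefix}{i:02x}" for i in range(256)]


def all_shards(db_prefix: str = "db_", table_prefix: str = "tbl_") -> list[tuple[str, str]]:
    """Every (database, table) pair: 256 hypothetical databases with 256 tables each."""
    return list(product(hex_names(db_prefix), hex_names(table_prefix)))


shards = all_shards()
print(len(shards))            # 65536
print(shards[0], shards[-1])  # ('db_00', 'tbl_00') ('db_ff', 'tbl_ff')
```

A list like this can drive any script that has to visit every shard, so nobody has to list 65,536 tables by hand.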

The PK is unique across all the tables, so there are no issues there. I
think I should be able to run it off a single server at home.

Thanks and Best Regards,
Jayant

On Wed, Oct 7, 2009 at 4:52 AM, Shalin Shekhar Mangar wrote:
> On Wed, Oct 7, 2009 at 5:09 PM, Sandeep Tagore wrote:
>
>>
>> You can write an automated program that changes the DB conf details in
>> that XML and fires the full-import command. You can use the
>> http://localhost:8983/solr/dataimport URL to check the status of the data
>> import.
>>
>>
> Also note that full-import deletes all existing documents by default. So if
> you write such a program that changes the DB conf details, make sure you
> invoke the "import" command (new in Solr 1.4) to avoid deleting the other
> documents.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
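Here is a minimal Python sketch of the automated program described above. It rests on a few assumptions:

- the default DIH URL from the thread
- a data-config.xml whose `<dataSource>` gets repointed for each database
- a handler that answers `wt=json` with a top-level `status` field that reads `busy` while an import runs

It passes DIH's `clean=false` parameter to `full-import` to keep the documents already indexed from other databases, rather than using the `import` command mentioned above. `import_one_db` and the config path you give it are hypothetical.

```python
import json
import time
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# DIH URL mentioned in the thread; adjust for your Solr install.
DIH_BASE = "http://localhost:8983/solr/dataimport"


def dih_url(command: str, base: str = DIH_BASE, **params: str) -> str:
    """Build a DataImportHandler request URL, e.g. ...?command=full-import&clean=false."""
    query = urlencode({"command": command, **params})
    return f"{base}?{query}"


def set_jdbc_url(config_xml: str, jdbc_url: str) -> str:
    """Return the data-config.xml text with every <dataSource> pointed at jdbc_url."""
    root = ET.fromstring(config_xml)
    for data_source in root.iter("dataSource"):
        data_source.set("url", jdbc_url)
    return ET.tostring(root, encoding="unicode")


def call(url: str) -> dict:
    """GET a DIH URL and decode the JSON body (assumes the handler honours wt=json)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def import_one_db(config_path: str, jdbc_url: str, poll_seconds: float = 5.0) -> None:
    """Hypothetical driver: repoint the config at one database, reload it,
    import without wiping the index, then wait until DIH is no longer busy."""
    with open(config_path, encoding="utf-8") as f:
        xml_text = f.read()
    with open(config_path, "w", encoding="utf-8") as f:
        f.write(set_jdbc_url(xml_text, jdbc_url))
    call(dih_url("reload-config", wt="json"))
    # clean=false stops full-import from deleting documents from other databases.
    call(dih_url("full-import", clean="false", wt="json"))
    while True:
        time.sleep(poll_seconds)
        # Assumes the status response carries "status": "busy" while importing.
        if call(dih_url("status", wt="json")).get("status") != "busy":
            break
```

You would call it once per database, e.g. `import_one_db("conf/data-config.xml", "jdbc:mysql://localhost/Db2")` (hypothetical path). ElementTree drops XML comments when it rewrites the file, so keep a pristine copy of the config if it has any.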



-- 
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi


Indexing and searching of sharded/ partitioned databases and tables

2009-10-07 Thread Jayant Kumar Gandhi
Hi All,

I am new to Solr. I am looking to use Solr to index data that is
partitioned across multiple databases and tables, and I have some
questions regarding data-config.xml. I have listed them at the end.

Let's say I have 3 MySQL databases, each with 3 tables.

Db1 : Tbl1, Tbl2, Tbl3
Db2 : Tbl1, Tbl2, Tbl3
Db3 : Tbl1, Tbl2, Tbl3

All databases have the same number of tables, with the same table names,
as shown above. All tables also have exactly the same structure. Each
table has three fields:
id, name, category

Since the data is distributed this way, I have no single place to search
for a particular record by 'name'; I must look for it in all 9 tables.
This does not scale: with, say, 20 databases of 20 tables each, I would
need 400 queries to find a single record.

Solr seemed like the right tool to help.

I followed the wiki tutorials:
http://wiki.apache.org/solr/DataImportHandler
http://wiki.apache.org/solr/DIHQuickStart
http://wiki.apache.org/solr/DataImportHandlerFaq

The following are my config files so far:

solrconfig.xml

  data-config.xml

data-config.xml (so far)

Doubts/Questions:

- Is this the right way to index this data?
- Is there a better way to achieve this? 20 databases with 20 tables
each translates to 400 entries in the XML, and this doesn't scale to
something like 200 databases with 200 tables each. Will Solr continue
to work/index properly, without running out of memory, with that many
(200 x 200 = 40,000) entity rows?
- I really want to be able to search the complete data set for a
'name' and easily do things like 'category' filtering, independent of
the entity name/data source. For me, they are all records of the same
type.
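For the "better way" and "same type" questions, here is a minimal Python sketch. It generates the entity list instead of typing it by hand. Because every entity maps onto the same `id`, `name` and `category` fields, a single query can then search across all of them. It rests on several assumptions:

- MySQL on localhost, reached through the `com.mysql.jdbc.Driver` class
- placeholder credentials
- `id` being unique across all tables, as the follow-up at the top of this thread says it is
- Solr's standard `/solr/select` handler

`build_data_config` and `search_url` are hypothetical helpers. `batchSize="-1"` follows the DIH FAQ's advice for streaming large MySQL result sets.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

FIELDS = ("id", "name", "category")  # identical in every table


def build_data_config(databases, tables, user="solr", password="secret"):
    """Generate DIH data-config.xml text: one dataSource per database,
    one entity per (database, table), all mapped onto the same fields."""
    root = ET.Element("dataConfig")
    for db in databases:
        # batchSize="-1" asks the MySQL driver to stream rows (per the DIH FAQ).
        ET.SubElement(root, "dataSource", name=db, type="JdbcDataSource",
                      driver="com.mysql.jdbc.Driver",
                      url=f"jdbc:mysql://localhost/{db}",
                      user=user, password=password, batchSize="-1")
    document = ET.SubElement(root, "document")
    for db in databases:
        for table in tables:
            entity = ET.SubElement(document, "entity", name=f"{db}_{table}",
                                   dataSource=db,
                                   query=f"SELECT {', '.join(FIELDS)} FROM {table}")
            for column in FIELDS:
                ET.SubElement(entity, "field", column=column, name=column)
    return ET.tostring(root, encoding="unicode")


def search_url(name, category=None, base="http://localhost:8983/solr/select"):
    """Search all indexed records by name, optionally filtered by category."""
    params = {"q": f"name:{name}"}
    if category is not None:
        params["fq"] = f"category:{category}"
    query = urlencode(params)
    return f"{base}?{query}"


print(build_data_config(["Db1", "Db2", "Db3"], ["Tbl1", "Tbl2", "Tbl3"]))
print(search_url("sunset", category="beach"))
```

Passing 200 database names and 200 table names emits all 40,000 entities without any hand-editing. This sketch does not answer whether DIH copes comfortably with that many entities in one config. Names containing spaces would also need quoting in the Solr query syntax.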

Thanks and Best Regards,
Jayant
