Re: Indexing and searching of sharded/ partitioned databases and tables
Thanks guys. Now I can easily search through 10 TB of my personal photos, videos, music and other stuff :)

At some point I had split them into multiple databases and tables, as inserts to a single db/table were taking too much time once the index grew beyond 1 GB. I was storing all the possible metadata about the media. I used two hex characters for naming the tables/dbs and ended up with 256 dbs, each with 256 tables :D. Don't ask me why I did it this way. Let's just say I was exploring sharding some years ago, got too excited and did that :D. Alas, I never touched it again to finish the search portion until now, when I really wanted to find a particular photo :)

The pk is unique across all the tables, so no issues there. I think I should be able to run it off a single server at my home.

Thanks and Best Regards,
Jayant

On Wed, Oct 7, 2009 at 4:52 AM, Shalin Shekhar Mangar wrote:
> On Wed, Oct 7, 2009 at 5:09 PM, Sandeep Tagore wrote:
>
>> You can write an automated program which will change the DB conf details in
>> that xml and fire the full import command. You can use
>> http://localhost:8983/solr/dataimport url to check the status of the data
>> import.
>
> Also note that full-import deletes all existing documents. So if you write
> such a program which changes DB conf details, make sure you invoke the
> "import" command (new in Solr 1.4) to avoid deleting the other documents.
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi
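
For anyone following the thread: two hex characters give 16 x 16 = 256 names per level, so 256 databases with 256 tables each come to 65,536 tables. A minimal Python sketch of enumerating them, which an automated import program of the kind Sandeep suggests could loop over. The `db_` and `tbl_` prefixes are hypothetical, since the thread does not give the actual naming scheme.

```python
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def hex_suffixes():
    """Return all 256 two-character hex suffixes: '00', '01', ..., 'ff'."""
    return [a + b for a, b in product(HEX_DIGITS, repeat=2)]


def all_shards(db_prefix="db_", table_prefix="tbl_"):
    """Yield a (database, table) name pair for every shard.

    The prefixes are assumptions made for illustration only.
    """
    for db in hex_suffixes():
        for tbl in hex_suffixes():
            yield db_prefix + db, table_prefix + tbl


if __name__ == "__main__":
    shards = list(all_shards())
    print(len(shards))  # 65536
    print(shards[0])    # ('db_00', 'tbl_00')
    print(shards[-1])   # ('db_ff', 'tbl_ff')
```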
Indexing and searching of sharded/ partitioned databases and tables
Hi All,

I am new to Solr. I am looking to use Solr to index data that is partitioned into multiple databases and tables, and I have questions regarding dataconfig.xml. I have listed my doubts at the end.

Let's say I have 3 MySQL databases, each with 3 tables:

Db1 : Tbl1, Tbl2, Tbl3
Db2 : Tbl1, Tbl2, Tbl3
Db3 : Tbl1, Tbl2, Tbl3

All databases have the same number of tables, with the same table names as shown above. All tables have exactly the same structure as well. Each table has three fields: id, name, category

Since the data is distributed this way, I don't have a way to search for a particular record using 'name'; I must look for it in all 9 tables. This is not scalable when, let's say, I have 20 databases each with 20 tables, meaning 400 queries are needed to find a single record.

Solr seemed like the solution to help. I followed the wiki tutorials:

http://wiki.apache.org/solr/DataImportHandler
http://wiki.apache.org/solr/DIHQuickStart
http://wiki.apache.org/solr/DataImportHandlerFaq

The following are my config files so far:

solrconfig.xml

data-config.xml

dataconfig.xml (so far)

Doubts/Questions:
- Is this the right way to achieve indexing this data?
- Is there a better way to achieve this? Imagine 20 databases with 20 tables each translates to 400 lines in the XML. This doesn't scale for something like 200 databases with 200 tables each. Will Solr continue to work/index properly if I had that many (200 x 200 = 40,000) entity rows, without running out of memory?
- I really want to be able to search through the complete database for a 'name' and do things like 'category' filtering easily, independent of the entity name/datasource. For me they are all records of the same type.

Thanks and Best Regards,
Jayant

--
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi
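
Rather than writing one entity line per table by hand, the file could be generated. Below is a rough Python sketch that emits a DataImportHandler config with one `dataSource` per database and one `entity` per table for the 3 x 3 layout described above. The host, user and password values are placeholders, the `build_data_config` helper is hypothetical, and the sketch assumes `id` is unique across all tables (otherwise documents sharing an id would likely overwrite each other if `id` is the schema's uniqueKey).

```python
import xml.etree.ElementTree as ET


def build_data_config(databases, tables, host="localhost",
                      user="solr", password="secret"):
    """Build a DIH data-config document as an XML string.

    host, user and password are placeholder assumptions; replace them
    with real connection details.
    """
    root = ET.Element("dataConfig")

    # One named JDBC data source per database.
    for db in databases:
        ET.SubElement(root, "dataSource", {
            "name": db,
            "type": "JdbcDataSource",
            "driver": "com.mysql.jdbc.Driver",
            "url": f"jdbc:mysql://{host}/{db}",
            "user": user,
            "password": password,
        })

    # One root entity per (database, table) pair, all selecting the
    # same three columns so every row becomes the same kind of document.
    document = ET.SubElement(root, "document")
    for db in databases:
        for table in tables:
            ET.SubElement(document, "entity", {
                "name": f"{db}_{table}",
                "dataSource": db,
                "query": f"SELECT id, name, category FROM {table}",
            })

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    dbs = ["Db1", "Db2", "Db3"]
    tbls = ["Tbl1", "Tbl2", "Tbl3"]
    print(build_data_config(dbs, tbls))
```

Because every entity maps the same columns to the same fields, a search on 'name' or a filter on 'category' does not need to know which database or table a record came from. Whether a config with tens of thousands of entities is practical is a separate question that this sketch does not answer.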