Grouping some smaller customers together may be workable. Could we hack around ZooKeeper's limitations the same way file systems do, with a tree of /p/r/e/f/prefixes so that no znode has too many children?
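
To make the prefix-tree idea concrete, here is a minimal sketch that only computes znode paths. It does not talk to ZooKeeper and says nothing about how Blur actually lays out its table znodes. The root `/tables`, the helper names `prefix_znode_path` and `hashed_znode_path`, and the restriction of table names to letters, digits, `_` and `-` are all hypothetical assumptions for illustration. The first helper is the literal /p/r/e/f/ scheme. The second hashes the name first, the way some file systems and object stores bucket files, so names that share a prefix still spread out.

```python
import hashlib
import re

# Hypothetical assumption: table names use only letters, digits, '_' and '-',
# so each single character is a legal znode name (no '/', '.' or '..').
_VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def prefix_znode_path(name: str, root: str = "/tables", depth: int = 4) -> str:
    """Literal prefix tree, e.g. users-123 -> /tables/u/s/e/r/users-123."""
    if not _VALID_NAME.fullmatch(name):
        raise ValueError(f"unsupported table name: {name!r}")
    parts = list(name[:depth])  # names shorter than depth get fewer levels
    return "/".join([root.rstrip("/"), *parts, name])


def hashed_znode_path(name: str, root: str = "/tables",
                      depth: int = 2, width: int = 2) -> str:
    """Hash-based prefix tree, e.g. users-123 -> /tables/<hex>/<hex>/users-123.

    Each intermediate level has at most 16**width children (256 for width=2),
    so depth=2 gives 256 * 256 = 65,536 leaf buckets.
    """
    if not _VALID_NAME.fullmatch(name):
        raise ValueError(f"unsupported table name: {name!r}")
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()  # 64 hex chars
    if depth * width > len(digest):
        raise ValueError("depth * width exceeds the digest length")
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return "/".join([root.rstrip("/"), *parts, name])
```

Note that `prefix_znode_path("users-1")` and `prefix_znode_path("users-123")` both land under `/tables/u/s/e/r`, so a literal prefix tree does little for names like users-N; the hashed variant spreads tens of thousands of names across 65,536 buckets, roughly one or two per bucket.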

On Fri, Dec 21, 2012 at 11:12 AM, Garrett Barton <[email protected]> wrote:
> Tens of thousands, eh? I've had ~100-150 running and that worked fine. I
> could see issues with Blur's table tracking since it's ZooKeeper-backed, and
> ZK doesn't like massive directories like that. Then again, Blur has a
> caching system built into it for its metadata, so maybe it would be OK?
>
> Are the table structures going to be different? Is there any reasonable
> grouping you could do of the customers? Perhaps the small ones could live
> together in a larger index?
>
> On Fri, Dec 21, 2012 at 11:08 AM, Aaron McCurry <[email protected]> wrote:
>
> > I agree with Garrett. We run ~100 tables, with the shard count varying from
> > 1 shard to over 1000 in a single table. How many tables will you have?
> >
> > Yes, Blur works on CDH3U2. It should work on any 0.20.x (1.0.x) version of
> > Hadoop. However, if HDFS doesn't support appends, then the write-ahead log
> > won't function correctly, meaning it won't actually preserve the data.
> >
> > Aaron
> >
> > On Fri, Dec 21, 2012 at 10:59 AM, Garrett Barton <[email protected]> wrote:
> >
> > > If I understand you correctly, you have data from multiple customers
> > > (denoted by a customer_id) and you only perform a search against a single
> > > customer at a time? If that's the case, the separate-index route might be
> > > a good idea, as you can rebuild them separately, and you can potentially
> > > model them differently if you have a need. Having said that, if you also
> > > occasionally want to search across customers, then you would want them
> > > all in a single index.
> > >
> > > I have Blur 1.x running on CDH3U5; I think it will work back down to
> > > CDH3U2 at least, and that's Hadoop 0.20 in both cases. I haven't tried
> > > 0.23 yet, though I will need to soon.
> > >
> > > On Fri, Dec 21, 2012 at 10:51 AM, James Kebinger <[email protected]> wrote:
> > >
> > > > Hello, I'm hoping to kick the tires on Apache Blur in the near future. I
> > > > have a couple of quick questions before I set out.
> > > >
> > > > What version(s) of Hadoop are required/supported at present?
> > > >
> > > > We have lots of data to index, but we always search within a particular
> > > > customer's data set. Would the best practice be to put all of the data
> > > > in one table and have the customer id in all of the queries, or build
> > > > separate tables for each customer_id (like users-1, users-123, etc.)?
> > > >
> > > > Thanks, and happy holidays!
> > > >
> > > > -James Kebinger
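
The hybrid layout floated in this thread (dedicated tables for large customers, small customers grouped into shared tables that always need a customer_id filter) can be sketched as a routing function. This is a hypothetical illustration, not a Blur API: the `Route` type, the `route_customer` helper, the `users-shared-N` naming, the default of 16 shared tables, and how you decide which customers are "dedicated" are all assumptions.

```python
from typing import AbstractSet, NamedTuple


class Route(NamedTuple):
    """Where one customer's data lives (hypothetical naming scheme)."""
    table: str
    needs_customer_filter: bool  # True when other customers share the table


def route_customer(customer_id: int,
                   dedicated: AbstractSet[int],
                   shared_tables: int = 16) -> Route:
    """Pick a dedicated table for large customers, a shared one otherwise.

    Assumes `dedicated` lists the customers big enough to get their own table
    and that shared tables are named users-shared-0 .. users-shared-(N-1).
    """
    if shared_tables < 1:
        raise ValueError("shared_tables must be at least 1")
    if customer_id in dedicated:
        return Route(f"users-{customer_id}", needs_customer_filter=False)
    # Plain modulo is deterministic across processes (unlike hash() on str),
    # so every writer and reader maps a customer to the same shared table.
    bucket = customer_id % shared_tables
    return Route(f"users-shared-{bucket}", needs_customer_filter=True)
```

For example, `route_customer(123, dedicated={123})` returns `Route(table='users-123', needs_customer_filter=False)`, while `route_customer(5, dedicated={123}, shared_tables=4)` returns `Route(table='users-shared-1', needs_customer_filter=True)`. Keeping the number of shared tables small bounds the table count ZooKeeper has to track, at the cost of the per-customer rebuild and modeling flexibility that separate indexes give.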
