Like you said, the sync might come into play only during short-circuit-reads.
Nonetheless, many thanks for the detailed explanation.

On Fri, Sep 20, 2013 at 5:17 PM, Aaron McCurry <[email protected]> wrote:

> On Fri, Sep 20, 2013 at 7:05 AM, Ravikumar Govindarajan <[email protected]> wrote:
>
> > Thanks for all clarifications
> >
> > I went through SOLR's implementation of HDFSDirectory, where there seem
> > to be no sync blocks in readInternal() methods.
> >
> > https://issues.apache.org/jira/browse/SOLR-5150
> >
> > Blur still has sync blocks in HDFSIndexInput class. Are there any reasons
> > for not taking SOLR's path?
>
> Technically SOLR took Blur's path. That's where their original code came
> from, I'm sure that they have made changes to it since it was lifted. Also
> there are 4 different implementations in Blur's HDFSIndexInput, 2 with syncs
> and 2 without. I have tested all four on a medium sized cluster (large
> cluster soon) and three of the four seem to give very similar results.
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-store/src/main/java/org/apache/blur/store/hdfs/HdfsDirectory.java;h=94686e2675c26f140896028367e43f7116941ca5;hb=apache-blur-0.2#l179
>
> Basically every implementation works well except for case 0. The default
> we use is case 1, the biggest reason for this is because there wasn't a
> large enough performance difference (basically nothing) between case 1 and
> 3 (which is the version I would likely move to). And there was an issue
> with Hadoop when you would use a local filesystem instead of an HDFS file
> and use the read with position, it would cause terrible performance. This
> needs to be revisited to see if it's still the case. Also the terrible
> local performance (when using that method) might come into play if we use
> short circuit reads. Again that needs to be tested.
>
> > Another point is, we have a custom codec on the lines of
> >
> > http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/
> >
> > We are using TokyoCabinet and support individual field update operations
> >
> > These update operations also need to go through transaction-log etc...
> >
> > Is there a way, I can utilize this without changing Blur's code?
>
> At this point no. Although we are open to building extension points into
> Blur for such a thing.
>
> > We also use grouping queries. Is this supported by Blur?
>
> Also again no. Or not yet. :-)
>
> Aaron
>
> > --
> > Ravi
> >
> > On Fri, Sep 20, 2013 at 8:06 AM, Aaron McCurry <[email protected]> wrote:
> >
> > > Rahul,
> > >
> > > Yeah you are right I would not recommend a MapReduce+Blur system. We have
> > > run systems that share MapReduce/Blur and HDFS, but I would only recommend
> > > that setup for a test environment. Or one where high latency (a couple of
> > > seconds) on data retrievals is acceptable.
> > >
> > > Thanks,
> > >
> > > Aaron
> > >
> > > On Thu, Sep 19, 2013 at 9:40 PM, rahul challapalli <[email protected]> wrote:
> > >
> > > > My assumption regarding the last question was that HDFS was not
> > > > dedicated for just storing indexes and is running regular MapReduce
> > > > jobs, in which case performance would be affected.
> > > >
> > > > On Thu, Sep 19, 2013 at 4:38 PM, Aaron McCurry <[email protected]> wrote:
> > > >
> > > > > Let me try this again.
> > > > >
> > > > > On Thu, Sep 19, 2013 at 1:46 PM, rahul challapalli <[email protected]> wrote:
> > > > >
> > > > > > I will attempt to answer some of your questions below. Aaron or
> > > > > > someone else can correct me if I am wrong
> > > > > >
> > > > > > On Thu, Sep 19, 2013 at 6:15 AM, Ravikumar Govindarajan <[email protected]> wrote:
> > > > > >
> > > > > > > Thanks Aaron. I think, it has answered my question. I have a few
> > > > > > > more and would be great if you can clarify them
> > > > > > >
> > > > > > > 1. Is the number of shards per-table fixed during table-creation
> > > > > > > or we can dynamically allocate shards?
> > > > > >
> > > > > > I believe we cannot dynamically allocate shards. The only thing we
> > > > > > can dynamically add to an existing table is new columns
> > > > >
> > > > > Currently this is true. However we are planning on making this a
> > > > > feature.
> > > > >
> > > > > https://issues.apache.org/jira/browse/BLUR-136
> > > > >
> > > > > > > 2. Assuming I have 10k shards with each shard-size=2GB, for a
> > > > > > > total of 20 TB table size. I typically use RowId = UserId and
> > > > > > > there are approx 3 million users, in our system
> > > > > > >
> > > > > > > How do I ensure that when a user issues a query, I should not
> > > > > > > end-up searching all these 10k shards, but rather search only a
> > > > > > > very small set of shards?
> > > > > >
> > > > > > When a search request is issued to the Blur Controller it searches
> > > > > > through all the shard servers in parallel and each shard server
> > > > > > searches through all of its shards.
> > > > > > Unlike database partitioning, I believe we cannot direct a search
> > > > > > to a particular shard.
> > > > > > However
> > > > > >   1. Upon shard server start, all the shards are warmed up, i.e.
> > > > > >      the Index Readers for each shard are loaded into memory
> > > > > >   2. Blur uses a block level cache. With sufficient memory
> > > > > >      allocated to the cache, performance will be greatly enhanced
> > > > >
> > > > > All this is true, however if you are fetching a single RowId the
> > > > > controller only talks to the shard that is needed. And there is an
> > > > > issue that would optimize the controller to only search the single
> > > > > shard where the row id would reside.
> > > > >
> > > > > https://issues.apache.org/jira/browse/BLUR-139
> > > > >
> > > > > > > 3. Are there any advantages of running shard-server and
> > > > > > > data-nodes {HDFS} in the same machine?
> > > > > >
> > > > > > Someone else can provide a better answer here.
> > > > > > In a typical Hadoop installation Task Trackers and Data Nodes run
> > > > > > alongside each other on the same machine. Since data nodes store
> > > > > > the first block replica on the same machine, shard servers might
> > > > > > see an advantage in terms of network latency. However I think it
> > > > > > is not a good idea to run Blur alongside Task Trackers for
> > > > > > performance reasons
> > > > >
> > > > > I typically run HDFS + Blur Shard on the same node, it is not
> > > > > required but performance seems best when HDFS is local to the
> > > > > cluster and it's dedicated.
> > > > >
> > > > > > > --
> > > > > > > Ravi
> > > > > > >
> > > > > > > On Thu, Sep 19, 2013 at 2:36 AM, Aaron McCurry <[email protected]> wrote:
> > > > > > >
> > > > > > > > I will attempt to answer below:
> > > > > > > >
> > > > > > > > On Wed, Sep 18, 2013 at 1:54 AM, Ravikumar Govindarajan <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Thanks a bunch for a concise and quick reply. Few more questions
> > > > > > > > >
> > > > > > > > > 1. Any pointers/links on how you plan to tackle the
> > > > > > > > > availability problem?
> > > > > > > > >
> > > > > > > > > Let's say we store-forward hints to the failed shard-server.
> > > > > > > > > Won't the HDFS index-files differ in shard replicas?
> > > > > > > >
> > > > > > > > I am in the process of documenting the strategy and will be
> > > > > > > > adding it to JIRA soon. The way I am planning on solving this
> > > > > > > > problem doesn't involve storing the indexes more than once in
> > > > > > > > HDFS (which of course is replicated).
> > > > > > > >
> > > > > > > > > 2. I did not phrase my question on cross-join correctly. Let
> > > > > > > > > me clarify
> > > > > > > > >
> > > > > > > > > RowKey = 123
> > > > > > > > >
> > > > > > > > > RecId = 1000
> > > > > > > > > Family = "ACCOUNTS"
> > > > > > > > > Col-Name = "NAME"
> > > > > > > > > Col-Value = "ABC"
> > > > > > > > > ......
> > > > > > > > >
> > > > > > > > > RecId = 1001
> > > > > > > > > Family = "CONTACTS"
> > > > > > > > > Col-Name = "NAME"
> > > > > > > > > Col-Value = "XYZ"
> > > > > > > > > Col-Name = "ACCOUNTS-NAME" [FK to RecId=1000]
> > > > > > > > > Col-Value = "1000"
> > > > > > > > > .......
> > > > > > > > >
> > > > > > > > > Let's say the user specifies the search query as
> > > > > > > > > key=123 AND name:(ABC OR XYZ)
> > > > > > > > >
> > > > > > > > > Initially I will apply this query to each of the Family
> > > > > > > > > types, namely "ACCOUNTS", "CONTACTS" etc. and get their
> > > > > > > > > RecIds.
> > > > > > > > >
> > > > > > > > > After this, I will have to filter "CONTACTS" family results,
> > > > > > > > > based on RecIds received from "ACCOUNTS" [Join within records
> > > > > > > > > of different family, based on FK]
> > > > > > > > >
> > > > > > > > > Is something like this achievable? Can I design it
> > > > > > > > > differently to satisfy my requirements?
> > > > > > > >
> > > > > > > > I may not fully understand your scenario.
> > > > > > > >
> > > > > > > > As I understand your example above:
> > > > > > > >
> > > > > > > > Row {
> > > > > > > >   "id" => "123",
> > > > > > > >   "records" => [
> > > > > > > >     Record {
> > > > > > > >       "recordId" => "1000", "family" => "accounts",
> > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > >     },
> > > > > > > >     Record {
> > > > > > > >       "recordId" => "1001", "family" => "contacts",
> > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > >     }
> > > > > > > >   ]
> > > > > > > > }
> > > > > > > >
> > > > > > > > Let me go through some example queries that we support:
> > > > > > > >
> > > > > > > > +<accounts.name:abc> +<contacts.name:abc>
> > > > > > > >
> > > > > > > > Another way of writing it would be:
> > > > > > > >
> > > > > > > > <accounts.name:abc> AND <contacts.name:abc>
> > > > > > > >
> > > > > > > > Would yield a hit on the Row, there aren't any FKs in Blur.
> > > > > > > >
> > > > > > > > However there are some interesting queries that can be done
> > > > > > > > with more examples:
> > > > > > > >
> > > > > > > > Row {
> > > > > > > >   "id" => "123",
> > > > > > > >   "records" => [
> > > > > > > >     Record {
> > > > > > > >       "recordId" => "1000", "family" => "accounts",
> > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > >     },
> > > > > > > >     Record {
> > > > > > > >       "recordId" => "1001", "family" => "contacts",
> > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > >     }
> > > > > > > >   ]
> > > > > > > > }
> > > > > > > >
> > > > > > > > Row {
> > > > > > > >   "id" => "456",
> > > > > > > >   "records" => [
> > > > > > > >     Record {
> > > > > > > >       "recordId" => "1000", "family" => "accounts",
> > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > >     },
> > > > > > > >     Record {
> > > > > > > >       "recordId" => "1001", "family" => "contacts",
> > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > >     },
> > > > > > > >     Record {
> > > > > > > >       "recordId" => "1002", "family" => "contacts",
> > > > > > > >       "columns" => [Column {"name" => "def"}]
> > > > > > > >     }
> > > > > > > >   ]
> > > > > > > > }
> > > > > > > >
> > > > > > > > Row {
> > > > > > > >   "id" => "789",
> > > > > > > >   "records" => [
> > > > > > > >     Record {
> > > > > > > >       "recordId" => "1000", "family" => "accounts",
> > > > > > > >       "columns" => [Column {"name" => "abc"}]
> > > > > > > >     },
> > > > > > > >     Record {
> > > > > > > >       "recordId" => "1002", "family" => "contacts",
> > > > > > > >       "columns" => [Column {"name" => "def"}]
> > > > > > > >     }
> > > > > > > >   ]
> > > > > > > > }
> > > > > > > >
> > > > > > > > For the given query: "<accounts.name:abc> AND <contacts.name:abc>"
> > > > > > > > would yield 2 Row hits. 123 and 456
> > > > > > > > For the given query: "<accounts.name:abc> AND <contacts.name:def>"
> > > > > > > > would yield 2 Row hits. 456 and 789
> > > > > > > > For the given query: "<contacts.name:abc> AND <contacts.name:def>"
> > > > > > > > would yield 1 Row hit of 456. NOTICE that the family is the
> > > > > > > > same here, "contacts".
> > > > > > > >
> > > > > > > > Also in Blur you can turn off the Row query and just query the
> > > > > > > > records. This would be your typical Document like access.
> > > > > > > >
> > > > > > > > I fear that this has not answered your question, so if it
> > > > > > > > hasn't please let me know.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > Aaron
> > > > > > > >
> > > > > > > > > --
> > > > > > > > > Ravi
> > > > > > > > >
> > > > > > > > > On Tue, Sep 17, 2013 at 7:01 PM, Aaron McCurry <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > First off let me say welcome! Hopefully I can answer your
> > > > > > > > > > questions inline below.
> > > > > > > > > >
> > > > > > > > > > On Tue, Sep 17, 2013 at 6:52 AM, Ravikumar Govindarajan <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > I am quite new to Blur and need some help with the
> > > > > > > > > > > following questions
> > > > > > > > > > >
> > > > > > > > > > > 1. Let's say I have a replication_factor=3 for all HDFS
> > > > > > > > > > > indexes. In case one of the servers hosting HDFS indexes
> > > > > > > > > > > goes down [temporary or take-down], what will happen to
> > > > > > > > > > > writes? Some kind-of HintedHandoff [as in Cassandra] is
> > > > > > > > > > > supported?
> > > > > > > > > >
> > > > > > > > > > When there is a Blur Shard Server failure, state in
> > > > > > > > > > ZooKeeper will change and the other shard servers will take
> > > > > > > > > > action to bring the down shard(s) online. This is similar
> > > > > > > > > > to the HBase region model. While the shard(s) are being
> > > > > > > > > > relocated (which really means being reopened from HDFS)
> > > > > > > > > > writes to the shard(s) being moved are not available.
> > > > > > > > > > However the bulk load capability is always available as
> > > > > > > > > > long as HDFS is available, this can be used through Hadoop
> > > > > > > > > > MapReduce.
> > > > > > > > > >
> > > > > > > > > > > To re-phrase, what is the Consistency Vs Availability
> > > > > > > > > > > trade-off in Blur, with replication_factor>1 for HDFS
> > > > > > > > > > > indexes?
> > > > > > > > > >
> > > > > > > > > > Of the two, Consistency is favored over Availability,
> > > > > > > > > > however we are starting development (in 0.3.0) to increase
> > > > > > > > > > availability during failures.
> > > > > > > > > >
> > > > > > > > > > > 2. Since HDFSInputStream is used underneath, will this
> > > > > > > > > > > result in too much of data-transfer back-and-forth? A
> > > > > > > > > > > case of multi-segment-merge or even wild-card search
> > > > > > > > > > > could trigger it.
> > > > > > > > > >
> > > > > > > > > > Blur uses an in process file system cache (Block Cache is
> > > > > > > > > > the term used in the code) to reduce the IO from HDFS.
> > > > > > > > > > During index merges data that is not in the Block Cache is
> > > > > > > > > > read from HDFS and the output is written back to HDFS.
> > > > > > > > > > Overall once an index is hot (been online for some time)
> > > > > > > > > > the IO for any given search is fairly small assuming that
> > > > > > > > > > the cluster has enough memory configured in the Block
> > > > > > > > > > Cache.
> > > > > > > > > >
> > > > > > > > > > > 3. Does Blur also support foreign-key like semantics to
> > > > > > > > > > > search across column-families as well as delete using
> > > > > > > > > > > row_id?
> > > > > > > > > >
> > > > > > > > > > Blur supports something called Row Queries that allow for
> > > > > > > > > > searches across column families within single Rows. Take a
> > > > > > > > > > look at this page for a better explanation:
> > > > > > > > > >
> > > > > > > > > > http://incubator.apache.org/blur/docs/0.2.0/data-model.html#querying
> > > > > > > > > >
> > > > > > > > > > And yes Blur supports deletes by Row, check out:
> > > > > > > > > >
> > > > > > > > > > http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Fn_Blur_mutate
> > > > > > > > > > and
> > > > > > > > > > http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_RowMutation
> > > > > > > > > >
> > > > > > > > > > Hopefully this can answer some of your questions. Let us
> > > > > > > > > > know if you have any more.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Aaron
> > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Ravi
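
For readers following the Row query examples quoted above, here is a minimal sketch of the hit semantics Aaron describes. It is not Blur's API: the `ROWS` structure and the `clause_matches` and `row_query` helpers are hypothetical names invented for illustration, and the model assumes that each `<family.column:value>` clause is satisfied by any one record in the Row, with `AND` requiring every clause to be satisfied.

```python
# Hypothetical in-memory model of the three example Rows from the thread.
# Each record is (recordId, family, columns). This only mimics the hit
# semantics described in the thread; it does not use or represent Blur itself.
ROWS = {
    "123": [
        ("1000", "accounts", {"name": "abc"}),
        ("1001", "contacts", {"name": "abc"}),
    ],
    "456": [
        ("1000", "accounts", {"name": "abc"}),
        ("1001", "contacts", {"name": "abc"}),
        ("1002", "contacts", {"name": "def"}),
    ],
    "789": [
        ("1000", "accounts", {"name": "abc"}),
        ("1002", "contacts", {"name": "def"}),
    ],
}


def clause_matches(records, family, column, value):
    """A clause like <family.column:value> is satisfied by any single record."""
    return any(
        fam == family and cols.get(column) == value
        for _record_id, fam, cols in records
    )


def row_query(rows, clauses):
    """Return the sorted ids of Rows in which every clause is satisfied."""
    return sorted(
        row_id
        for row_id, records in rows.items()
        if all(clause_matches(records, *clause) for clause in clauses)
    )


# <accounts.name:abc> AND <contacts.name:abc>
print(row_query(ROWS, [("accounts", "name", "abc"), ("contacts", "name", "abc")]))
# <accounts.name:abc> AND <contacts.name:def>
print(row_query(ROWS, [("accounts", "name", "abc"), ("contacts", "name", "def")]))
# <contacts.name:abc> AND <contacts.name:def>
print(row_query(ROWS, [("contacts", "name", "abc"), ("contacts", "name", "def")]))
```

Under these assumptions the three calls return Rows 123 and 456, Rows 456 and 789, and Row 456 respectively, matching the hits listed in the thread. The last query hits Row 456 only because two different "contacts" records satisfy the two clauses.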
