Does that methodology work in this specific case? I thought the ordering had to be a subset of the clustering to guarantee that rows in the same cluster end up in the same partition when doing a global sort. Though I get the gist: if the global sort does satisfy the clustering, then there is no reason not to choose it.
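Concretely, the rule I have in mind (my reading of when a RangePartitioning can satisfy a ClusteredDistribution) is roughly the check below - a toy model only, with plain column names standing in for Catalyst expressions:

    // Sketch: when can RangePartitioning(ordering) guarantee
    // ClusteredDistribution(clustering)? Only when every range-partitioning
    // key is one of the clustering columns; otherwise a range boundary can
    // fall between two rows that belong to the same cluster.
    def rangePartitioningSatisfiesClustering(
        ordering: Seq[String],   // global sort keys, e.g. Seq("a", "b", "c")
        clustering: Set[String]  // required clustering, e.g. Set("b", "a")
    ): Boolean =
      ordering.forall(clustering.contains)

    // rangePartitioningSatisfiesClustering(Seq("a", "b"), Set("b", "a"))      == true
    // rangePartitioningSatisfiesClustering(Seq("a", "b", "c"), Set("b", "a")) == false
    //   -> sorting globally by (a, b, c) may split rows with equal (a, b)
    //      across a partition boundary

If that check passes, the range partitioning cannot split a cluster, so there is no reason not to take the global sort.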
On Fri, Mar 30, 2018 at 1:31 PM, Ryan Blue <rb...@netflix.com> wrote:

>> Can you expand on how the ordering containing the clustering expressions would ensure the global sort?
>
> The idea was to basically assume that if the clustering can be satisfied by a global sort, then do the global sort. For example, if the clustering is Set("b", "a") and the sort is Seq("a", "b", "c"), then do a global sort by columns a, b, and c.
>
> Technically, you could do this with a hash partitioner instead of a range partitioner and sort within each partition, but that doesn't make much sense because the partitioning would ensure that each partition has just one combination of the required clustering columns. Using a hash partitioner would make the in-partition sort basically ignore the first few values, so the intent must have been a global sort.
>
> On Fri, Mar 30, 2018 at 6:51 AM, Patrick Woody <patrick.woo...@gmail.com> wrote:
>
>>> Right, you could use this to store a global ordering if there is only one write (e.g., CTAS). I don't think anything needs to change in that case, you would still have a clustering and an ordering, but the ordering would need to include all fields of the clustering. A way to pass in the partition ordinal for the source to store would be required.
>>
>> Can you expand on how the ordering containing the clustering expressions would ensure the global sort? Having a RangePartitioning would certainly satisfy it, but it isn't required - is the suggestion that if Spark sees this overlap, then it plans a global sort?
>>
>> On Thu, Mar 29, 2018 at 12:16 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:
>>
>>> @RyanBlue I'm hoping that through the CBO effort we will continue to get more detailed statistics. For example, on read we could use sketch data structures to get estimates of unique values and density for each column. You may be right that the real way to handle this would be giving a "cost" back to a higher-order optimizer that can decide which method to use, rather than having the data source itself do it. This is probably in a far-future version of the API.
>>>
>>> On Thu, Mar 29, 2018 at 9:10 AM Ryan Blue <rb...@netflix.com> wrote:
>>>
>>>>> Cassandra can insert records with the same partition key faster if they arrive in the same payload. But this is only beneficial if the incoming dataset has multiple entries for the same partition key.
>>>>
>>>> Thanks for the example, the recommended partitioning use case makes more sense now. I think we could have two interfaces, a RequiresClustering and a RecommendsClustering, if we want to support this. But I'm skeptical it will be useful, for two reasons:
>>>>
>>>> - Do we want to optimize the low-cardinality case? Shuffles are usually much cheaper at smaller sizes, so I'm not sure it is necessary to optimize this away.
>>>> - How do we know there aren't just a few partition keys for all the records? It may look like a shuffle wouldn't help, but we don't know the partition keys until it is too late.
>>>>
>>>> Then there's also the logic for avoiding the shuffle and how to calculate the cost, which sounds like something that needs some details from CBO.
>>>>
>>>>> I would assume that given the estimated data size from Spark and options passed in from the user, the data source could make a more intelligent requirement on the write format than Spark independently.
>>>> This is a good point.
>>>>
>>>> What would an implementation actually do here, and how would information be passed? For my use cases, the store would produce the number of tasks based on the estimated incoming rows, because the source has the best idea of how the rows will compress. But that's just applying a multiplier most of the time. To be very useful, this would have to handle skew in the rows (think rows with a type field where total size depends on the type), and that's a bit harder. I think maybe an interface that can provide relative cost estimates based on partition keys would be helpful, but then keep the planning logic in Spark.
>>>>
>>>> This is probably something that we could add later as we find use cases that require it?
>>>>
>>>>> I wouldn't assume that a data source requiring a certain write format would give any guarantees around reading the same data. In the cases where it is a complete overwrite it would, but for independent writes it could still be useful for statistics or compression.
>>>>
>>>> Right, you could use this to store a global ordering if there is only one write (e.g., CTAS). I don't think anything needs to change in that case; you would still have a clustering and an ordering, but the ordering would need to include all fields of the clustering. A way to pass in the partition ordinal for the source to store would be required.
>>>>
>>>> For the second point, that ordering is useful for statistics and compression, I completely agree. Our best-practices doc tells users to always add a global sort when writing, because you get the benefit of a range partitioner to handle skew, plus the stats and compression you're talking about to optimize for reads. I think the proposed API can request a global ordering from Spark already. My only point is that there isn't much the source can do to guarantee ordering for reads when there is more than one write.
>>>>
>>>> On Wed, Mar 28, 2018 at 7:14 PM, Patrick Woody <patrick.woo...@gmail.com> wrote:
>>>>
>>>>>> Spark would always apply the required clustering and sort order because they are required by the data source. It is reasonable for a source to reject data that isn't properly prepared. For example, data must be written to HTable files with keys in order or else the files are invalid. Sorting should not be implemented in the sources themselves because Spark handles concerns like spilling to disk. Spark must prepare data correctly, which is why the interfaces start with "Requires".
>>>>>
>>>>> This was in reference to Russell's suggestion that the data source could have a required sort but only a recommended partitioning. I don't have an immediate use case for a recommendation that comes to mind, though. I'm definitely in sync that the data source itself shouldn't do work outside of the writes themselves.
>>>>>
>>>>>> Considering the second use case you mentioned first, I don't think it is a good idea for a table to put requirements on the number of tasks used for a write. The parallelism should be set appropriately for the data volume, which is for Spark or the user to determine. A minimum or maximum number of tasks could cause bad behavior.
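>>>>>
>>>>> To make the "more intelligent requirement" idea I get into below concrete, a hypothetical shape for it could be something like this - every name here is made up, not a proposed API:
>>>>>
>>>>>     // Hypothetical sketch only. The source turns Spark's row-count
>>>>>     // estimate into a suggested write parallelism, since it knows its
>>>>>     // own encoding and compression; Spark keeps the final decision.
>>>>>     trait RecommendsWriteParallelism {
>>>>>       // estimated on-disk bytes per incoming row, after compression
>>>>>       def estimatedBytesPerRow: Double
>>>>>       // target output size for a single write task
>>>>>       def targetBytesPerTask: Long
>>>>>       // Spark supplies its size estimate; the source suggests a count
>>>>>       def suggestedNumTasks(estimatedRowCount: Long): Int =
>>>>>         math.max(1, math.ceil(
>>>>>           estimatedRowCount * estimatedBytesPerRow / targetBytesPerTask).toInt)
>>>>>     }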
>>>>>> For your first use case, an explicit global ordering, the problem is that there can't be an explicit global ordering for a table when it is populated by a series of independent writes. Each write could have a global order, but once those files are written, you have to deal with multiple sorted data sets. I think it makes sense to focus on order within data files, not order between data files.
>>>>>
>>>>> This is where I'm interested in learning about the separation of responsibilities for the data source and how "smart" it is supposed to be.
>>>>>
>>>>> For the first part, I would assume that given the estimated data size from Spark and options passed in from the user, the data source could make a more intelligent requirement on the write format than Spark independently. Somewhat analogous to how the current FileSource does bin packing of small files on the read side, restricting parallelism for the sake of overhead.
>>>>>
>>>>> For the second, I wouldn't assume that a data source requiring a certain write format would give any guarantees around reading the same data. In the cases where it is a complete overwrite it would, but for independent writes it could still be useful for statistics or compression.
>>>>>
>>>>> Thanks,
>>>>> Pat
>>>>>
>>>>> On Wed, Mar 28, 2018 at 8:28 PM, Ryan Blue <rb...@netflix.com> wrote:
>>>>>
>>>>>>> How would Spark determine whether or not to apply a recommendation - a cost threshold?
>>>>>>
>>>>>> Spark would always apply the required clustering and sort order because they are required by the data source. It is reasonable for a source to reject data that isn't properly prepared. For example, data must be written to HTable files with keys in order or else the files are invalid. Sorting should not be implemented in the sources themselves because Spark handles concerns like spilling to disk. Spark must prepare data correctly, which is why the interfaces start with "Requires".
>>>>>>
>>>>>> I'm not sure what the second half of your question means. What does Spark need to pass into the data source?
>>>>>>
>>>>>>> Should a data source be able to provide a Distribution proper, rather than just the clustering expressions? Two use cases would be explicit global sorting of the dataset and attempting to ensure a minimum write task size/number of write tasks.
>>>>>>
>>>>>> Considering the second use case you mentioned first, I don't think it is a good idea for a table to put requirements on the number of tasks used for a write. The parallelism should be set appropriately for the data volume, which is for Spark or the user to determine. A minimum or maximum number of tasks could cause bad behavior.
>>>>>>
>>>>>> That said, I think there is a related use case for sharding. But that's really just a clustering by an expression with the shard calculation, e.g., hash(id_col, 64). The shards should be handled as a cluster, but it doesn't matter how many tasks are used for it.
>>>>>>
>>>>>> For your first use case, an explicit global ordering, the problem is that there can't be an explicit global ordering for a table when it is populated by a series of independent writes. Each write could have a global order, but once those files are written, you have to deal with multiple sorted data sets. I think it makes sense to focus on order within data files, not order between data files.
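>>>>>>
>>>>>> In DataFrame terms, that sharding clustering could look roughly like this - a sketch of the semantics, assuming Spark's built-in hash and pmod functions:
>>>>>>
>>>>>>     import org.apache.spark.sql.functions.{col, hash, lit, pmod}
>>>>>>
>>>>>>     // the shard id as a clustering expression: Spark clusters rows by
>>>>>>     // shard, with no constraint on how many tasks are used to do it
>>>>>>     val shard = pmod(hash(col("id_col")), lit(64))
>>>>>>     val clustered = df.repartition(shard)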
>>>>>>
>>>>>> On Wed, Mar 28, 2018 at 7:26 AM, Patrick Woody <patrick.woo...@gmail.com> wrote:
>>>>>>
>>>>>>> How would Spark determine whether or not to apply a recommendation - a cost threshold? And yes, it would be good to flesh out what information we get from Spark in the data source when providing these recommendations/requirements - I could see statistics and the existing outputPartitioning/Ordering of the child plan being used to provide the requirement.
>>>>>>>
>>>>>>> Should a data source be able to provide a Distribution proper, rather than just the clustering expressions? Two use cases would be explicit global sorting of the dataset and attempting to ensure a minimum write task size/number of write tasks.
>>>>>>>
>>>>>>> On Tue, Mar 27, 2018 at 7:59 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks for the clarification, definitely would want to require Sort but only recommend partitioning ... I think that would be useful to request based on details about the incoming dataset.
>>>>>>>>
>>>>>>>> On Tue, Mar 27, 2018 at 4:55 PM Ryan Blue <rb...@netflix.com> wrote:
>>>>>>>>
>>>>>>>>> A required clustering would not, but a required sort would. Clustering is asking for the input dataframe's partitioning, and sorting would be how each partition is sorted.
>>>>>>>>>
>>>>>>>>> On Tue, Mar 27, 2018 at 4:53 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I forgot since it's been a while, but does Clustering support allow requesting that partitions contain elements in order as well? That would be a useful trick for me. I.e.,
>>>>>>>>>>
>>>>>>>>>> Request/Require(SortedOn(Col1))
>>>>>>>>>> Partition 1 -> ((A,1), (A,2), (B,1), (B,2), (C,1), (C,2))
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 27, 2018 at 4:38 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks, it makes sense that the existing interface is for aggregation and not joins. Why are there requirements for the number of partitions that are returned, then?
>>>>>>>>>>>
>>>>>>>>>>> Does it make sense to design the write-side `Requirement` classes and the read-side reporting separately?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 27, 2018 at 3:56 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ryan, yeah, you are right that SupportsReportPartitioning doesn't expose a hash function, so Join can't benefit from this interface, as Join doesn't require a general ClusteredDistribution but a more specific one called HashClusteredDistribution.
>>>>>>>>>>>>
>>>>>>>>>>>> So currently only Aggregate can benefit from SupportsReportPartitioning and save a shuffle. We can add a new interface to expose the hash function to make it work for Join.
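>>>>>>>>>>>>
>>>>>>>>>>>> Just as a sketch of the idea (no such interface exists yet; the names are hypothetical):
>>>>>>>>>>>>
>>>>>>>>>>>>     trait SupportsReportHashFunction {
>>>>>>>>>>>>       // identifies the hash family, so Spark can check whether
>>>>>>>>>>>>       // both sides of a join are partitioned compatibly
>>>>>>>>>>>>       def hashFunctionName: String
>>>>>>>>>>>>       // mirrors how the source maps a clustering key to one of its
>>>>>>>>>>>>       // partitions, so Spark can shuffle the other side to match
>>>>>>>>>>>>       def partitionId(clusteringKey: Seq[Any], numPartitions: Int): Int
>>>>>>>>>>>>     }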
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 27, 2018 at 9:33 AM, Ryan Blue <rb...@netflix.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I just took a look at SupportsReportPartitioning and I'm not sure that it will work for real use cases. It doesn't specify, as far as I can tell, a hash function for combining clusters into tasks or a way to provide Spark a hash function for the other side of a join. It seems unlikely to me that many data sources would have partitioning that happens to match the other side of a join. And it looks like task order matters? Maybe I'm missing something?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that we should design the write side independently, based on what data stores actually need, and take a look at the read side based on what data stores can actually provide. Wenchen, was there a design doc for partitioning on the read path?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I completely agree with your point about a global sort. We recommend that all of our data engineers add a sort to most tables, because it introduces the range partitioner and does a skew calculation, in addition to making data filtering much better when it is read. It's really common for tables to be skewed by partition values.
>>>>>>>>>>>>>
>>>>>>>>>>>>> rb
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 26, 2018 at 7:59 PM, Patrick Woody <patrick.woo...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Ryan, Ted, Wenchen,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the quick replies.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @Ryan - the sorting portion makes sense, but I think we'd have to ensure something similar to requiredChildDistribution in SparkPlan, where we have the number of partitions as well, if we'd want to further report to SupportsReportPartitioning, yeah?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Specifying an explicit global sort can also be useful for filtering via Parquet row-group stats if we have a time-based/high-cardinality ID field. If my data source or catalog knows about previous queries on a table, it could be really useful to recommend more appropriate formatting for consumers on the next materialization. The same would be true of clustering on commonly joined fields.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Mar 26, 2018 at 10:05 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hmm. Ryan seems to be right.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking at sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsReportPartitioning.java:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     import org.apache.spark.sql.sources.v2.reader.partitioning.Partitioning;
>>>>>>>>>>>>>>>     ...
>>>>>>>>>>>>>>>     Partitioning outputPartitioning();
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Mar 26, 2018 at 6:58 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Actually, clustering is already supported; please take a look at SupportsReportPartitioning.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ordering is not proposed yet; it might be similar to what Ryan proposed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Mar 26, 2018 at 6:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Interesting.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Should requiredClustering return a Set of Expressions? That way, we can determine the order of the Expressions by looking at what requiredOrdering() returns.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Mar 26, 2018 at 5:45 PM, Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Pat,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for starting the discussion on this; we're really interested in it as well. I don't think there is a proposed API yet, but I was thinking something like this:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     interface RequiresClustering {
>>>>>>>>>>>>>>>>>>       List<Expression> requiredClustering();
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     interface RequiresSort {
>>>>>>>>>>>>>>>>>>       List<SortOrder> requiredOrdering();
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The reason why RequiresClustering should provide Expression is that it needs to be able to customize the implementation. For example, writing to HTable would require building a key (or the data for a key), and that might use a hash function that differs from Spark's built-ins. RequiresSort is fairly straightforward, but the interaction between the two requirements deserves some consideration. To make the two compatible, I think that RequiresSort must be interpreted as a sort within each partition of the clustering, but it could possibly be used for a global sort when the two overlap.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For example, if I have a table partitioned by "day" and "category", then the RequiredClustering would be by day, category. A required sort might be day ASC, category DESC, name ASC. Because that sort satisfies the required clustering, it could be used for a global ordering. But is that useful? How would the global ordering matter beyond a sort within each partition, i.e., how would the partition's place in the global ordering be passed?
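>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To make the day/category example concrete, the preparation Spark would do could be expressed as plain DataFrame operations - a sketch of the intended semantics only, not planner code, with df as the input:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     import org.apache.spark.sql.functions.col
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     val prepared = df
>>>>>>>>>>>>>>>>>>       // RequiresClustering(day, category): all rows for one
>>>>>>>>>>>>>>>>>>       // (day, category) combination land in the same task
>>>>>>>>>>>>>>>>>>       .repartition(col("day"), col("category"))
>>>>>>>>>>>>>>>>>>       // RequiresSort: order the rows within each task
>>>>>>>>>>>>>>>>>>       .sortWithinPartitions(
>>>>>>>>>>>>>>>>>>         col("day").asc, col("category").desc, col("name").asc)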
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To your other questions, you might want to have a look at the recent SPIP I'm working on to consolidate and clean up logical plans <https://docs.google.com/document/d/1gYm5Ji2Mge3QBdOliFV5gSPTKlX4q1DCBXIkiyMv62A/edit?ts=5a987801#heading=h.m45webtwxf2d>. That proposes more specific uses for the DataSourceV2 API that should help clarify what validation needs to take place. As for custom Catalyst rules, I'd like to hear about the use cases to see if we can build them into these improvements.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> rb
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Mar 26, 2018 at 8:40 AM, Patrick Woody <patrick.woo...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I saw in some of the discussions around DataSourceV2 writes that we might have the data source inform Spark of requirements for the input data's ordering and partitioning. Has there been a proposed API for that yet?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Even one level up, it would be helpful to understand how I should be thinking about the responsibility of the data source writer, when I should be inserting a custom Catalyst rule, and how I should handle validation/assumptions of the table before attempting the write.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>> Pat
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>>>>>>>> Netflix
>
> --
> Ryan Blue
> Software Engineer
> Netflix