Thanks for such a detailed response Lee!  You and your team have really raised 
the bar and made Apache DataSketches a very welcoming and approachable 
community.

I had to re-read your response several times to appreciate the finer details of 
the difficulties that you were referring to.  For an analytics architecture, I 
have anecdotal experience that it is often useful to leverage some intrinsic 
properties of analytic data to make simplifications.
In this instance, assuming that the write workload is independent of the read 
workload, it is fair to assume that the data being queried is immutable and of 
a fixed size, such as an update sketch (write) versus a compact sketch (read).  
However, I didn’t account for storing intermediate results of set operations 
off-heap as well.  At first, it seemed natural to store the intermediate 
accumulator sketch on the heap and combine it in a pairwise fashion with 
off-heap sketches.  This probably exposes a gap in my understanding of 
off-heap allocation, because mixing the two memory sources during an operation 
would not be possible.  You have provided a lot of insight here into the 
challenges of dynamically sized slot allocation.

In the past, I have used a memory-mapped file to allow the OS to manage 
loading/unloading file segments into memory on demand, which is used in Influx 
and Prometheus TSDB.  I had this in mind when you mentioned Postgres managing 
memory allocations.  
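For reference, the pattern I have in mind looks roughly like the following in 
plain Java NIO (the file name and contents here are hypothetical; a real 
system would likely wrap the mapped region in the library's Memory API rather 
than use the raw buffer directly):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapExample {
    // Maps a file read-only so the OS pages segments in and out on demand.
    // The mapping remains valid after the channel is closed.
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        MappedByteBuffer buf = mapReadOnly(tmp);
        System.out.println(buf.remaining());  // 4
        System.out.println(buf.get(0));       // 1
    }
}
```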

Not to derail the thread, but I would like to raise another discussion at some 
point on the broader topic of whether it is a good idea to perform thousands 
of set operations at query time, or whether pre-combining and eliminating the 
long tail of uninteresting data (with a tool like MacroBase) would help 
minimise the error introduced by all the set operations.  As mentioned on the 
site, using pre-computations to deal with power-law distributions would lead 
to many small sketches which would store every key anyway.

Returning to the topic at hand, we previously discussed having a generic byte 
implementation for a Tuple Sketch, which could unify the different concrete 
summary types.  I was thinking through how this could be implemented, and it 
would be difficult to do without introducing problems with backwards 
compatibility.  I had an alternative idea that also serves to further widen the 
utility of different Tuple Sketches - namely, introducing a conversion between 
summary types.

This function could be defined as:

        public <T extends Summary> Sketch<T> convertSummaries(
            Function<S, T> converter) { … }

This approach may be used to freely convert between summary representations 
such as integers, strings, and doubles.  For ArrayOfDoubles, a more concrete 
signature would be needed:

        public <S extends Summary> Sketch<S> convertSummaries(
            Function<double[], S> converter) { … }
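To make the conversion idea concrete, here is a self-contained toy model.  
ToySketch, IntegerSummary, and DoubleSummary are hypothetical stand-ins I am 
inventing for illustration (the library's real Summary and Sketch types carry 
more machinery), and convertSummaries is the proposed method, not an existing 
API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified stand-ins for the library's generic Tuple types.
interface Summary {}

class IntegerSummary implements Summary {
    final int value;
    IntegerSummary(int value) { this.value = value; }
}

class DoubleSummary implements Summary {
    final double value;
    DoubleSummary(double value) { this.value = value; }
}

// A toy "sketch": a map from retained hash keys to summaries.
class ToySketch<S extends Summary> {
    final Map<Long, S> entries = new HashMap<>();

    // The proposed conversion: rebuild the sketch, mapping each summary
    // through the user-supplied converter.  Keys are unchanged, so the
    // distinct-count estimate is unaffected; only summary payloads change.
    <T extends Summary> ToySketch<T> convertSummaries(Function<S, T> converter) {
        ToySketch<T> out = new ToySketch<>();
        for (Map.Entry<Long, S> e : entries.entrySet()) {
            out.entries.put(e.getKey(), converter.apply(e.getValue()));
        }
        return out;
    }
}

public class ConvertDemo {
    public static void main(String[] args) {
        ToySketch<IntegerSummary> intSk = new ToySketch<>();
        intSk.entries.put(42L, new IntegerSummary(7));
        ToySketch<DoubleSummary> dblSk =
            intSk.convertSummaries(s -> new DoubleSummary((double) s.value));
        System.out.println(dblSk.entries.get(42L).value);  // 7.0
    }
}
```

Because only the payloads are rewritten, the retained keys (and hence the 
estimate) survive the conversion unchanged.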

Such adaptation is compelling because it prevents an all-or-nothing approach 
to storing sketches in some external datastore.  If we decided today to use a 
TupleInteger and ran into limitations, we could adapt the sketches to a double 
implementation and lazily write these back to storage on access.  Moreover, 
there may be cases where sketches of different datatypes could be mixed in set 
operations.  The downside is that some summary types may not be coercible, but 
exposing this as a function allows the user to decide.

Concerning my situation, I have started adopting your latest branch and have 
successfully combined Integer Tuple sketches with Theta sketches for the 
engagements use case.  It is a substantial victory not to have to re-encode 
all our Theta sketches!

Thanks again for the background material, you have helped me and my team 
immensely.

David



> On 29 May 2020, at 19:56, leerho <[email protected]> wrote:
> 
> David, 
> 
> This is great, you are getting up-to-speed fast!  I really appreciate your 
> digging into this :)
> 
> Putting sketches off-heap is an interesting and deep topic.  I will try to 
> share some of the concepts here, but ultimately, this kind of information 
> needs to be in some sort of tutorial section on the web site.  Perhaps you 
> might have suggestions on how this could be better presented.
> 
> Large Java system clusters have large amounts of RAM.  A few years ago a 
> single machine in a cluster might have 24 CPUs, 48 hyperthreads, and 256GB of 
> RAM.  A medium-sized cluster of such machines might consist of 100 such 
> machines, where the cluster RAM is now 25TB.  That is a good chunk of memory! 
> 
> One model might be where each machine has only one JVM acting as a 
> supervisor, configured with perhaps 16 GB of RAM.  All the rest of memory 
> is allocated to data, which is paged in and out from disk or even ingested 
> directly from stream feeds or back-end systems such as Hadoop.  
> 
> Traditionally, most of the data was in the form of primitives and strings, 
> but one underlying assumption historically has been that the data is static 
> in size.  When the data is read in, you know what the size is and that 
> doesn't change for the lifetime that the data exists in memory.  Analytic 
> processing, of course, creates large amounts of intermediate storage, but even 
> in these cases, it is generally pretty easy to predict how much storage is 
> required.
> 
> Sketches, viewed as data, present some new challenges, especially when there 
> are billions of sketches that need to be processed.  Suppose in my query 
> processing I need to merge millions of sketches together and all of these 
> sketches are sitting in off-heap memory, as they were preprocessed and built 
> in the back-end system and ingested into my analytic query engine cluster.  
> Having to "heapify" each sketch image into a sketch object prior to its 
> being merged would be very costly, as that requires a deserialization and copy 
> for each sketch.  So the first objective of having sketches off-heap is to 
> provide the ability to interpret the sketch image for merging purposes 
> without having to copy or deserialize the image.   This we do with our 
> "Wrap(Memory)" functions. 
> 
> But the query engine needs to allocate perhaps thousands of set operators 
> (Union, Intersection, Difference) that do the merging.  It would also be 
> useful to have these allocated in large segment columns off-heap and manage 
> their memory allocation directly in order to reduce the pressure on the 
> garbage collector.  There are two challenges with this.  First, these 
> operators grow dynamically, and second, Java does not really support the 
> concept of programming dynamic objects off-heap. The only mechanism that Java 
> provides has been the ByteBuffer, which has severe limitations.
> 
> This is why we created the Memory component.  You can think of it as a 
> ByteBuffer replacement, but it is much more.  The Memory component is what 
> allows us to do updating and merging of sketches off-heap.  Once we decide to 
> do this, we are back into more of a C/C++ style of programming where we need 
> to do our own "malloc()" and "free()" operations directly.   But this is 
> exactly what large, real-time, query and analysis engines have been doing for 
> a while.  And these systems have been taking advantage of hidden capabilities 
> in Java, such as Unsafe, in order to achieve unparalleled performance.  Use 
> of Unsafe is a hotly debated topic in the Java community, nonetheless, our 
> Memory component also takes advantage of Unsafe (as does the ByteBuffer).  
> (How we move beyond Java 8 is a whole different topic!)
> 
> The first attempt to allocate a segment of, say, 1000 dynamic sketches (or 
> set operators) in off-heap memory is often to choose a slot size that is the 
> maximum size that a sketch can grow to, given its K configuration, and then 
> evenly divide the segment into 1000 slots of that size.  This turns out to be 
> horribly wasteful.  Big data is almost never just one chunk, but is often 
> highly partitioned into multiple dimensions.  And the result of this natural 
> fragmentation is that the sizes of all the combinations of these dimensions 
> tend to follow a power-law distribution, also called "the long tail".  This 
> happens in nature and in almost anything categorized by humans.  What this means 
> is that if you have millions of sketches that have processed the millions of 
> streams of all the dimensional combinations, the vast majority of the 
> sketches will have 1 or a few entries and be very small in size.  If each of 
> these tiny sketches occupies a slot in memory that was set to the maximum size 
> to which that sketch can grow, we have wasted a huge amount of memory.
> 
> A smarter approach is one we learned as we were integrating our C++ 
> datasketches library into PostgreSQL.  Of course C++ doesn't have the 
> off-heap problem at all. Nevertheless, the PostgreSQL system needs to manage 
> and track what gets allocated and deallocated in memory.  So PostgreSQL 
> created "palloc()" and "pfree()" functions for the user-developer, where 
> PostgreSQL can intercept and manage the underlying malloc and free processes. 
>   This is also what we have proposed to Druid and I think they like this 
> approach, but they also have a long history of using fixed sized slots and 
> transitioning to dynamically sized slots initiated by the user process is a 
> major transition for them and will take a while.
> 
> The question still remains on how to manage the overall system memory 
> requirements.  Although we cannot predict which sketches among the millions 
> of sketches will need only small slots and which ones will grow to the 
> maximum size, in aggregate, we can learn (from the data) and predict how much 
> memory we will need, which tends to be pretty stable.  
> 
> Cheers,
> 
> Lee.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Fri, May 29, 2020 at 4:08 AM David Cromberge 
> <[email protected] <mailto:[email protected]>> wrote:
> Hi Lee,
> 
> I have studied our usage of the off-heap features of the library.  Originally 
> there was a compelling reason to leverage this whilst decoding from 
> serialised bytes in storage.  When servicing queries on behalf of many 
> clients, decoding sketches into the heap was potentially problematic from a 
> scalability perspective.  However, for various reasons (unrelated to the 
> library), we do not currently make use of the off-heap features.
> Regardless, it would be interesting to hear some of the learnings from the 
> Druid case (whenever these can be provided), as I am busy collecting notes 
> and intend to supplement the website documentation where applicable.
> To sum up, off-heap may be reconsidered in the future, but is not currently a 
> priority.
> 
> If a byte-array Tuple sketch is being proposed, it may be possible in the end 
> to define the various specialisations (integer, double, strings and doubles) 
> in terms of the byte-array implementation.  It could also be of interest to 
> define a position in the preamble for Tuple sketches where there is a flag to 
> identify the summary type, where the defaults (integer, double, strings and 
> doubles) may reserve an identifier.  This would allow for a powerful Tuple 
> deserialiser that generalises over the summary.  However, I realise that 
> there are other concerns that come into play with backwards compatibility, as 
> well as performance criteria.
> 
> David.
> 
> 
>> On 28 May 2020, at 16:12, leerho <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> David, you are correct.  I also looked at the AOD implementation last night 
>> and it lacks some notable capabilities that I think you need, but others 
>> would want as well.
>> Integration of Theta (as you noted)
>> There is no selectable "mode" equivalent for a Union on how to combine two 
>> double summaries. Intersection has a dedicated combiner, but doesn't have 
>> the "mode" capability to allow easy choices between a set of modes.
>> There may be other issues, but I haven't studied this AOD code for several 
>> years :(
>> I am wondering if, for your case, we could have a generic, only-on-heap 
>> solution relatively quickly that you could use, get some experience with, and 
>> characterize for performance in your environment, and then work on a more 
>> efficient off-heap solution later. 
>> 
>> Using sketches off-heap requires some significant design decisions at the 
>> system level to make it work really well.  We have worked with the Druid 
>> folks for quite a while and have learned some things that may be helpful for 
>> you.  But these issues are outside and beyond the issues of designing a 
>> sketch that can work off-heap.   It would be helpful to get a sense of where 
>> you are in your thinking about going off-heap.
>> 
>> Lee.
>> 
>> 
>> 
>> On Thu, May 28, 2020 at 7:45 AM David Cromberge 
>> <[email protected] <mailto:[email protected]>> wrote:
>> As a follow-up to the proposal below, it may not be necessary to provide a 
>> combiner and default set of values together, since there may be some 
>> redundancy here.
>> It’s probably better for me to re-iterate the intention - to provide a 
>> meaningful way to interpret the presence of a key in a theta with some 
>> suitable default value.
>> 
>>> On 28 May 2020, at 15:39, David Cromberge <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> Lee, the array of bytes implementation is a better approach than adding 
>>> off-heap implementations for the integer, double and strings Tuple sketches.
>>> 
>>> I originally overlooked the ArrayOfDoubles sketch for the purposes of 
>>> tracking engagement, since it implies that many values could be associated 
>>> with a hashed key, which doesn’t quite fit the use case.  
>>> 
>>> Having said that, I have now looked through the implementation and have 
>>> switched to the array of doubles sketch instead - after all, you pointed 
>>> out that it should suffice. 
>>> 
>>> I have run some initial benchmarks on sizing, and in compacted form I did 
>>> not get a reduction in the size of a compacted sketch generated from a real 
>>> data set, despite the benefits of using primitive doubles.  I realise that 
>>> this is dependent on the test case and tuning / configuration parameters, 
>>> so we could add a TODO item to add a characterisation test for this, if 
>>> there is not one already.
>>> 
>>> We don’t seem to have discussed how the Theta sketches may be included for 
>>> intersections and unions regarding an AOD sketch.   For unions, the 
>>> behaviour is delegated to a merge operation on the sketch, which ultimately 
>>> adds values together for the same key.  Concerning intersection, a combiner 
>>> implementation is used to determine how values should be combined.  It is 
>>> noteworthy that in the druid extension, the values are summed together, 
>>> with a comment noting that this may not apply to all circumstances.
>>> 
>>> I would propose a similar mechanism for both union and intersection on the 
>>> other sketches, where a default array of values can be provided for a tuple 
>>> sketch:
>>> 
>>>     public void update(final org.apache.datasketches.theta.Sketch sketchIn, 
>>> double[] defaultValues, ArrayOfDoublesCombiner c) {...}
>>> 
>>> Of course, a factory could be provided that creates a combiner according to 
>>> a summary mode.  The use case for this suggestion is to have context 
>>> specific behaviour with regard to merging / combining values in the case of 
>>> unions and intersections, which could be sourced from the user or user 
>>> query.
>>> 
>>> I would be interested to hear your thoughts on the 
>>> ArrayOfDoubles/ArrayOfBytes Tuple sketch integration with Theta sketches,
>>> David
>>> 
>>> 
>>>> On 28 May 2020, at 04:32, leerho <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> David,  In fact, double values do just fine with integer data.  They are 
>>>> twice as big at the primitive level; however, the dedicated ArrayOfDoubles 
>>>> implementation in total might actually be smaller, since it is not 
>>>> carrying all the object overhead that is required to do generics.  Plus, 
>>>> it will be a whole lot faster! It is already fully implemented with all 
>>>> the set operations, off-heap memory, serialization /deserialization and a 
>>>> full test suite.  
>>>> 
>>>> Designing a similar dedicated ArrayOfIntegers would be a lot of work and 
>>>> wouldn't be my top priority for the next dedicated Tuple sketch to build.  
>>>> What would be more flexible would actually be a dedicated ArrayOfBytes 
>>>> implementation, because bytes are the foundation from which we can derive 
>>>> almost any summary we want.  
>>>> 
>>>> Think about it.
>>>> 
>>>> Lee.
>>>> 
>>>> 
>>>> 
>>>> On Wed, May 27, 2020 at 5:54 PM leerho <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> David,  This is good feedback.  I didn't realize you wanted off-heap 
>>>> operation.  That changes a lot of things.  I have to go right now, but let 
>>>> me think about this.  Attempting to leverage generics off-heap is messy.  
>>>> For your case it may be better to either leverage the ArrayOfDoubles 
>>>> implementation, which already operates off-heap, or think about emulating 
>>>> the AOD and creating a dedicated AOIntegers.  
>>>> 
>>>> Lee.
>>>> 
>>>> On Wed, May 27, 2020 at 3:04 PM David Cromberge 
>>>> <[email protected] <mailto:[email protected]>> 
>>>> wrote:
>>>> Hi Lee,
>>>> 
>>>> Thanks for providing such a detailed plan of action for the Tuple sketch 
>>>> package.
>>>> The enhancements that you have listed are interesting, and I will 
>>>> certainly check out your branch to get a clearer understanding of how the 
>>>> library is evolving.
>>>> 
>>>> For what it’s worth, here is a record of my attempt to integrate Theta 
>>>> sketches into the Tuple sketch set operations:
>>>> https://github.com/davecromberge/incubator-datasketches-java/commit/961ad48bbe709ccfcb973a7fab69e53088f113a5
>>>>  
>>>> <https://github.com/davecromberge/incubator-datasketches-java/commit/961ad48bbe709ccfcb973a7fab69e53088f113a5>
>>>> 
>>>> Although I have a cursory understanding of the library’s internals, I 
>>>> included the commit above because there were some interesting tradeoffs to 
>>>> the implementation, and it gave me a better appreciation for the 
>>>> internal workings of the existing Tuple sketch as well as some of the 
>>>> finer points in your improvement work.   To a lesser degree, it also 
>>>> serves to independently confirm your argument for adding new variants of 
>>>> the update methods!
>>>> 
>>>> During implementation, I was also faced with the decision as to whether to 
>>>> duplicate the methods or to convert a Theta sketch to a Tuple sketch first 
>>>> and delegate to the existing methods.  But, as you noted, this requires an 
>>>> additional iteration through the result set and incurs a performance 
>>>> penalty.  Therefore, I also duplicated the existing update methods, with 
>>>> some changes for result extraction.  To ensure correctness, I found it 
>>>> necessary to duplicate a large portion of the existing test cases as well 
>>>> - replicating so many of the existing tests was not ideal, but helped 
>>>> verify that the implementation was correct.
>>>> It’s also worth mentioning that I had some difficulty implementing the 
>>>> AnotB functionality, and in fact the results were incorrect when the 
>>>> sketch crossed into estimation mode (see ignored tests).  I’m pleased to 
>>>> have attempted the exercise because it will give much better context as I 
>>>> study your branch further - especially the AnotB refactoring.
>>>> 
>>>> There is one addition that I would like to suggest to your list of TODO 
>>>> items - namely, off-heap implementations.  I am considering using the 
>>>> integer tuple sketch for our engagement use case and would prefer to 
>>>> prevent memory pressure by loading many sketches onto the heap.  I have 
>>>> noticed this come up in the past on #datasketches slack channel in a 
>>>> conversation with some Druid team members.  It appears that the off-heap 
>>>> implementations were omitted from the library due to time constraints, and 
>>>> this is an area where I could also potentially provide default 
>>>> implementations for the other tuple sketches.  I think this is important 
>>>> to consider, because the existing ArrayOfDoubles implementation uses an 
>>>> abstract class for the parent.  Making the other Tuple sketches abstract 
>>>> in a similar manner is potentially a breaking change as well.
>>>> 
>>>> I am excited to collaborate together on this feature, and I would 
>>>> be happy to contribute in any possible way and coordinate through the 
>>>> project TODO page 
>>>> <https://github.com/apache/incubator-datasketches-java/projects/1>!
>>>> 
>>>> David
>>>> 
>>>> 
>>>> 
>>>>> On 27 May 2020, at 20:04, leerho <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> David,
>>>>> 
>>>>> Thanks.  I have been putting a lot of thought into it as well and decided 
>>>>> that it was time to make some other long-needed changes in the Tuple 
>>>>> Family of sketches as well including the package layout, which has been 
>>>>> quite cumbersome.  I would suggest holding back on your actual 
>>>>> implementation work until you understand what I have changed so far and 
>>>>> then we can strategize on how to finish the work.   I have checked in my 
>>>>> changes so far into the "Tuple_Theta_Extension" branch, which you can 
>>>>> check out to see what I have been up to :)
>>>>> 
>>>>> The family of tuple sketches has evolved over time and somewhat 
>>>>> haphazardly.  So the first thing I decided to do was some 
>>>>> rearranging of the package structure to make future downstream 
>>>>> improvements and extensions easier.  
>>>>> 
>>>>> 1.  The first problem is that the root tuple directory was cluttered with 
>>>>> two different groups of classes that made it difficult for anyone to 
>>>>> figure out what is going on.  One group of classes form the base generic 
>>>>> classes of the tuple sketch on which the concrete extensions "adouble" (a 
>>>>> single double), "aninteger" (a single integer), and "strings" (array of 
>>>>> strings) depend.  These three concrete extensions are already in their 
>>>>> own sub directories.  
>>>>> 
>>>>> The second, larger group of classes was a dedicated non-generic 
>>>>> implementation of the tuple sketch, which implemented an array of 
>>>>> doubles.  All of these classes had "ArrayOfDoubles" in their name.  These 
>>>>> classes shared no code with the root generic tuple classes except for a 
>>>>> few methods in the SerializerDeserializer and the Util classes.  By 
>>>>> making a few methods public, I was able to move all of the 
>>>>> "ArrayOfDoubles" classes into their own subdirectory.  This creates an 
>>>>> incompatible API break, which will force us to move to a 2.0.0 for the 
>>>>> next version.   Now the tuple root directory is much cleaner and easier 
>>>>> to navigate and understand.  There are several reasons for this separate 
>>>>> dedicated implementation. First, we felt that a configurable array of 
>>>>> doubles would be a relatively common use case.  Second, we wanted a full 
>>>>> concrete example of the tuple sketch as an example of what it would look 
>>>>> like including both on-heap and off-heap variants.   It is this 
>>>>> ArrayOfDoubles implementation that has been integrated into Druid, for 
>>>>> example. 
>>>>> 
>>>>> 2. Now that the package directories are cleaned up I was able to focus on 
>>>>> what it would mean to allow Tuple sketches to perform set operations with 
>>>>> Theta sketches.  
>>>>> 
>>>>> One approach would be to just provide a converter to take in a Theta 
>>>>> sketch and produce a Tuple sketch with some default or configured summary 
>>>>> and leave everything else the way it is.  But this is less efficient as 
>>>>> it requires more object creation and copying than a direct integration 
>>>>> would.  It turns out that modifying the generic Union and Intersection 
>>>>> classes only required adding one method to each.  I did some minor code 
>>>>> cleanup and code documentation at the same time.  
>>>>> 
>>>>> The AnotB operator is another story.  We have never been really happy 
>>>>> with how this was implemented the first time.  The current API is clumsy. 
>>>>>  So I have taken the opportunity to redesign the API for this class.  It 
>>>>> still has the current API methods but deprecated.  With the new modified 
>>>>> class the user has several ways of performing AnotB.
>>>>> 
>>>>> As stateless operations:
>>>>> With Tuple: resultSk = aNotB(skTupleA, skTupleB);
>>>>> With Theta: resultSk = aNotB(skTupleA, skThetaB);
>>>>> As stateful, sequential operations:
>>>>> void setA(skTupleA);
>>>>> void notB(skTupleB);   or   void notB(skThetaB);   //These are 
>>>>> interchangeable.
>>>>> ...
>>>>> void notB(skTupleB);   or   void notB(skThetaB);   //These are 
>>>>> interchangeable.
>>>>> resultSk = getResult(reset = false);  // This allows getting an 
>>>>> intermediate result
>>>>> void notB(skTupleB);   or   void notB(skThetaB);   //Continue...
>>>>> resultSK = getResult(reset = true); //This returns the result and clears 
>>>>> the internal state to empty.
>>>>> This I think is pretty slick and flexible.  
>>>>> 
>>>>> Work yet to be done on main:
>>>>> Reexamine the Union and Intersection APIs to add the option of an 
>>>>> intermediate result.
>>>>> Update the other concrete extensions to take advantage of the above new 
>>>>> API: "aninteger", "strings".
>>>>> Examine the dedicated "ArrayOfDoubes" implementation to see how hard it 
>>>>> would be to make the same changes as above.  Implement. Test.
>>>>> Work yet to be done on test:
>>>>> 
>>>>> I did major redesign of the testing class for the AnotB generic class 
>>>>> using the "adouble" concrete extension.  You can see this in 
>>>>> AdoubleAnotBTest.java.  This is essentially a deep exhaustive test of the 
>>>>> base AnotB classes via the concrete extension.   
>>>>> With the deep testing using the "adouble" done, we still need to design 
>>>>> new tests for the "aninteger" and "strings" extensions.  These can be 
>>>>> shallow tests.
>>>>> If we decide to do the same API extensions on the ArrayOfDoubles classes, 
>>>>> those will need to be tested.
>>>>> Work to be done on documentation:
>>>>> The website documentation is still rather thin on the whole Tuple family. 
>>>>>  Having someone that is a real user of these classes contribute to the 
>>>>> documentation to make it more understandable would be outstanding!
>>>>> Work to be done on characterization.
>>>>> The Tuple family has some characterization, but it is sparse and a lot 
>>>>> more work here would give users a sense of the performance they 
>>>>> could expect.  We have also found that characterization is a powerful way 
>>>>> to find statistical bugs that don't show up in unit tests.   I could 
>>>>> guide you through how to set up the various "test harnesses", which is 
>>>>> really pretty simple, but the real thinking goes into the design of the 
>>>>> test and understanding the output.  This is a great way to really 
>>>>> understand how these sketches behave and why.  
>>>>> Work to be done on code reviews:
>>>>> Having independent set of eyes going over the code would also be a huge 
>>>>> contribution.  
>>>>> Once you have had a chance to study this we should talk about how you 
>>>>> want to contribute.  Clearly a lot of what I have done so far required 
>>>>> deep understanding of the Tuple and Theta classes and was much more 
>>>>> efficient for me to do.  It would have been a hard slog for anyone new to 
>>>>> the library to undertake.
>>>>> 
>>>>> Once we decide on a strategy, we should put kanban cards in the project 
>>>>> TODO page 
>>>>> <https://github.com/apache/incubator-datasketches-java/projects/1>.
>>>>> 
>>>>> Please let me know what you think!
>>>>> 
>>>>> Lee.
>>>>> 
>>>>>    
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, May 27, 2020 at 7:53 AM David Cromberge 
>>>>> <[email protected] <mailto:[email protected]>> 
>>>>> wrote:
>>>>> Thank you Lee for your proposal regarding my use case and Tuple sketches.
>>>>> 
>>>>> I have spent some time considering the proposal, and I have started 
>>>>> implementing a potential solution.
>>>>> 
>>>>> At what stage of the pipeline should characterisation tests be proposed, 
>>>>> since they would obviously depend on a new SNAPSHOT version of the core 
>>>>> library being available? 
>>>>> 
>>>>> I would be grateful for any input about the characterisation workflow.
>>>>> 
>>>>> Thank you,
>>>>> David
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected] 
>>>>> <mailto:[email protected]>
>>>>> For additional commands, e-mail: [email protected] 
>>>>> <mailto:[email protected]>
>>>>> 
>>>> 
>>> 
>> 
> 
