[ https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248509#comment-13248509 ]
jirapos...@reviews.apache.org commented on PIG-2632: ---------------------------------------------------- bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, line 133 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100075#file100075line133> bq. > bq. > it would be nice if Pig called outputSchema() once during the init and that was saved somewhere. bq. bq. Jonathan Coveney wrote: bq. It calls it many times, alas. I feel like changing that is a job for another patch, but I could tackle that here. But given there is already the expectation that it will be called an arbitrary number of times, I don't think that calling it once more on the front end will be an issue? I agree,let's not add to this already big patch. This should be dealt with in a separate patch. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, lines 151-152 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100075#file100075line151> bq. > bq. > those are the inputSchema and the outputSchema. The fields should probably called just that and have boolean flag to know if we use generated Tuples? bq. bq. Jonathan Coveney wrote: bq. Well, we need to know what the tuple to generate is. I will change it save the ID that serves as the base, but as is the Schema info isn't saved and it isn't available. I was more commenting on the naming. Those are the input and output schema independently of code generation. Knowing if you can generate or not could be saved with a boolean. Possibly you don't even need to save the information than can generate the tuples or not. It could fall back to the DefaultTuple. (Anyway, this one is not very important) bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, line 188 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100075#file100075line188> bq. > bq. > I would make it: bq. > TupleFactory.geInstanceForSchema().newTuple() bq. bq. Jonathan Coveney wrote: bq. I think it's worth thinking about how we want to do this. Dmitriy implemented newTupleForSchema for the PrimitiveTuple work, and he suggested I use that for this... however I agree, I think there should be a "SchemaTupleFactory" that generates SchemaTuples of a given SchemaTuple. At the same time, that might be a lot of scaffolding for not a lot of gain: once you generate the code for a given SchemaTuple, there's not a lot of work to be done (though this could encapsulate some of the static maps that come later). One other factor is that the TupleFactory interface doesn't really lend itself very nicely to the SchemaTuple, but I do think it could be extended and be made to work. Would appreciate more of your thoughts on this. I meant: TupleFactory.geInstanceForSchema(inputSchemaForGen).newTuple() The change is relatively small, it is just moving this from one class to the other and getting rid of the Maps. PrimitiveTuple should follow the same logic. Asking for the schema in the factory initialization will simplify the code. As you say, you won't need to constantly look up the schema. Also pig is about writing a lot of tuples of the same type. The block of code (see 2 comments above) that triggers the code generation would just get the factories instead of actually generating Tuples. This is caused by the api asking for the schema in the wrong place. TupleFactory is an abstract class so it should be doable to add methods without breaking compatibility. If needed, backard compatibility can be maintained by having newTupleForSchema(inputSchemaForGen) call TupleFactory.geInstanceForSchema(inputSchemaForGen).newTuple() bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 70 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line70> bq. > bq. > largestSetValue should already be correct. Why does largestSetValue stays at the size of the schema? bq. bq. Jonathan Coveney wrote: bq. I guess once it is at the value of the SchemaTuple (excluding appends), there's no need to update it. Correct but know you have to check this every time you update. I was thinking it would simplify the code a little if it was just the actual size of the tuple (including past the original size). bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 105 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line105> bq. > bq. > if it is always overridden, it should be abstract bq. bq. Jonathan Coveney wrote: bq. It's overriden, but the overriden version calls super. Right now it returns 0 because I don't want to take the time to calculate it out while the class is still evolving. Ok, let's make it abstract once we get to this. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 125 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line125> bq. > bq. > redundant with the 2 others? bq. > idClassMap.get(schemaIdMap.get(schema)) bq. bq. Jonathan Coveney wrote: bq. Perhaps this is what a SchemaTupleFactory would make cleaner, but my desire is for Tuple creation to be as fast as possible. This is why there are 5 maps... my thought is that they are all static, so the extra cost of caching the extra info is worth avoiding two gets in a row to link the info together when we need to make Tuples. Am open to thoughts. I like the SchemaTupleFactory approach. It would alleviate the need for this. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 130-142 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line130> bq. > bq. > could be computed statically by the generated class if you make a static getSchema(schemaString). bq. bq. Jonathan Coveney wrote: bq. Well, my thought is that a static cache wouldn't take up much memory, and it would avoid having to recreate the Schema object every time getSchema() is called. I don't forsee it being called a ton, but possibly enough that recreating the object a bunch would be wasteful. Perhaps this is premature optimization? caching is fine. I meant that you could have the Schema object "cached" in a static field of the generated class instead of a runtime generation + cache in the generated class: static schemaString = "...."; static Schema schema = Utils.getSchemaFromString(schemaString); public Schema getSchema() { return schema; } bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 144 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line144> bq. > bq. > setAll() ? bq. bq. Jonathan Coveney wrote: bq. It's not really a setAll imho. It's setting it equal to. Maybe that's what setAll means, but it seems to convey something different. It sounds like setEqualTo() is what set() generally means. when you set something to a value then you expect it to be equal to the value. What do you think? bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 159 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line159> bq. > bq. > do we need all of set(Tuple), set(List), setEqualTo(List), setEqualTo(Tuple) ? bq. bq. Jonathan Coveney wrote: bq. My working version adds even more. I think a lot of it should be made protected, or at least, I should be more thoughtful about what it should be. I think "set" is probably enough, but perhaps it should just be the void version? I guess this is where working with ruby where everything generally returns "this" by default, thus allowing for nice chaining of methods, is the norm. I'm fine with the chaining mechanism. We should try to avoid having both as it makes the code harder to read. You cant set and ignore the return value; bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 191 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line191> bq. > bq. > you could add a properties file to the jar with this info. Or a generated factory that knows all the classes generated. You mainly need the mapping schema -> id bq. > bq. > you could also use true. Although that would not be THE answer to THE question. bq. bq. Jonathan Coveney wrote: bq. I think the jar properties is the way to go, at least as the ultimate source of the classes that were generated. Alternately, it'd be nice if there was an easy way to browse a jar for certain class files, but I couldn't find a clean one? If there was, we could just search for SchemaTuple_*.class and keep track of which files went across. Do you know if there is an easier way to search jars like that? I can dig around as well. tl;dr: As you know what classes you generate I would go for a mechanism where you provide the list of classes to load like a properties file in the jar. I don't think there is a clean way of scanning a package out of the box. Usually dependency injection containers (like Spring) have a mechanism for scanning the classpath for classes with a given annotation. example: http://static.springsource.org/spring/docs/3.0.0.M3/spring-framework-reference/html/ch04s12.html (I'm not saying we should add dependencies to Pig! Just FYI) You could have a simple scanning mechanism using getResource(): http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/ClassLoader.html#getResource%28java.lang.String%29 this.getClass().getClassLoader().getResource("my/package/path") will return a URL and you can scan yourself the location of the URL. If the package is not in a jar and on the local file system, you will get a file:/... url that you can easily make a File of. If the package is in a jar, then you will need to use a JarFile to scan the content of the package in the jar. The issue here is that the classloading mechanism is extensible: The URL could be a HTTP url or some other that do not provide a listing option. There is no way to have a fully generic scanning mechanism if you don't know the name of the classes you want. A search for "enumerate classes in a package" returns examples of people doing this. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 209 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line209> bq. > bq. > As we have schema aware Tuples, in the future we could add get(String fieldName). We may not want to strip the aliases to reuse the class. bq. bq. Jonathan Coveney wrote: bq. Hmm, it seems pretty silly that two Schemas that are essentially "int,int,long" except with different names would necessitate a different class. More importantly, however, I ran into cases where schema names were different due to the way that names were generated by different pieces of code, and so the lookup was thwarted if you were strict about names. Hmm hmm hmm. Do you think the win from get(String) is worth the potential complication? It could be convenient... I would need to look more into it. I would be interested in seeing the problems you see when not stripping the names. This seems changeable later if needed, so let's keep it that way for now. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 270-273 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line270> bq. > bq. > Logger bq. bq. Jonathan Coveney wrote: bq. woops, I usually mark these //remove and whack them before the patch goes out... no worries. Usually it can be useful to keep debug log statements for future modification. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 310-321 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line310> bq. > bq. > refactor DataByteArray to provide this? bq. bq. Jonathan Coveney wrote: bq. So here is the thing about this. Currently, a ton of the useful methods in BinInterSedes (where this logic was taken from) is private. Also, they aren't static (though there's no real reason why they shouldn't be?) This snippet (and other logic like it) should probably be made into a protected static method of BinInterSedes. Thoughts? It could also make sense for classes to have a static method to serialize themselves, but I'd argue that the BinInterSedes approach is probably more consistent with how pig is laid out (though I have no idea why most of the methods there are private instead of protected static). private stuff can be safely moved around and refactored :) (from a compatibility point of view) Let's think of a way to have the same code used in all cases. There used to be one type of tuple so what made sense before may need to changee. - parent class for both ? - TupleSerializer class ? - static helpers? bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 325-333 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line325> bq. > bq. > Same comment, the String serialization should be shared with the regular Tuple. bq. > bq. > Also we could keep the a UTF8 in the SchemaTuple to reduce memory usage and convert lazilyto String (possibly caching the result) bq. bq. Jonathan Coveney wrote: bq. This logic is stolen directly from BinInterSedes (see comment above). Can you flesh out what you mean about UTF8? If we change it here we should probably change it there as well (and make that the canonical protected static source of all your string serialization needs). Following your goal of reducing memory utilization and as the serialized format is UTF8, we could keep UTF8 as the memory representation instead of String. Java Strings are UTF16 which is minimum 2 bytes per characters. If the data is mostly ascii, this is wasteful. To maintain backward compatibility the get(int) would lazily convert to String on demand, possibly caching the result in a second field to save processing, but this may be overkill. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 452 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line452> bq. > bq. > is (null, null, null) the same as null ? bq. bq. Jonathan Coveney wrote: bq. In DefaultTuple, this is deprecated and not really used. Given that, I use this to basically mean "has any information been written to this." So in your case, both would be null. The more important question is to make sure I use it consistently throughout. Honestly, the whole null thing is a pain, and I need to really comb over things to make sure I incorporated it properly. agreed, we should just make sure this is the same behavior as the DefaultTuple bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 474 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line474> bq. > bq. > this.set(t) ? bq. bq. Jonathan Coveney wrote: bq. reference is this weird Franken-method that isn't used anywhere in the codebase. I don't know that I want to implement it when the semantics of what it should do don't seem clear at all. Open to thoughts on this though. The original intent of it doesn't make much sense for a SchemaTuple since the underlying structures are primitives, not objects. "the underlying structures are primitives, not objects" for now, as Bags and Maps will come. For backward compatibility, I think we should implement this as a set(). It seems this is intended as a cheap set, not sure if "modifying one modifies the others" behavior is expected. some UDFs could depend on this. Also let's mark reference() as deprecated right now so that we can remove it later. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 484 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line484> bq. > bq. > We could have a list implementation that just wraps a tuple. bq. bq. Jonathan Coveney wrote: bq. Hmm, I think this is an intriguing idea, though it might get sills, because that means you'd have a list which wraps a tuple which wraps a list. this is just for saving a list allocation. A wrapper list would use less memory. We can punt this for now. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 495 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line495> bq. > bq. > yep, sounds like it could be generated and fall back to this if the other is a regular tuple bq. bq. Jonathan Coveney wrote: bq. My working copy almost has this finished. there are three comparison functions: compareTo(Object), compareTo(SchemaTuple), and compareToSpecific(SchemaTuple_). It will check (unless you tell it not to) if you were actually given a SchemaTuple_X (or just a schemaTuple) and do progressively faster comparisons. I also need to generate a RawComparator but I'd rather clean everything else up first (especially the layout of the API and where pieces live). cool bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 535 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line535> bq. > bq. > you may want to log compilation warnings/errors bq. bq. Jonathan Coveney wrote: bq. Is there a need to log it if the runtime exception is going to be thrown anyway? Because those errors get written to stderr. But I suppose I could pipe them through the Logger, if that's what you mean? Ok. You can also get them programatically, but stderr is good too. Let's make sure we don't swamp the output of Pig with too many messages. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 609-635 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line609> bq. > bq. > the parameter t is thrown away ? bq. > bq. > they don't need to be public. bq. bq. Jonathan Coveney wrote: bq. This is more for conciseness in the code that generates the SchemaTuple_ code. It allows me to make a bunch of if then elses smaller. I suppose there are other ways to do it that don't involve a method call... I go back and forth on that. But the reason is so later on I can do: bq. add(" setPos_X(SchemaTuple.unbox(val, pos_X));"); bq. which means that the right one will be based on the type of the field being filled (ie saving me a bunch of lines, but perhaps that should be trumped by cleanliness of the output code/API). Ah yeah, this comment is obsoleted by my other comment bellow. I should have removed it. I saw how it simplifies the generation later. See my proposal for this bellow. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 698 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line698> bq. > bq. > this file has 1500 lines :) bq. > I think those classes have earned their own package. bq. bq. Jonathan Coveney wrote: bq. Yeah, I mean, you're right... but the logic is EXTREMELY narrow, which is why I don't like the idea of splitting it out. Yeah, it makes the file larger, but it makes 0 sense outside of the context of what is going on here. Perhaps that should be rectified? I don't feel strongly about it. Maybe you want to separate the base class SchemaTuple (runtime) from the Generator ? bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 857-860 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line857> bq. > bq. > this could be a method in the parent class bq. bq. Jonathan Coveney wrote: bq. do you mean in SchemaTuple, or in TypeInFunctionStringOut? Where it makes sense to you. based on the way you removed everything that does not need to be generated, you could factor out the error handling and just generate "default: handleDefault(fieldNum)" in SchemaTuple? bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 1138 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line1138> bq. > bq. > CheckIfNullString.masks bq. > maybe they should be in a Masks class then, along with the masking code generation (" & (byte)+...) bq. bq. Jonathan Coveney wrote: bq. I don't know how worthwhile it is to split all of this stuff out into classes. Maybe it is. A lot of stuff is used once or twice in disparate places, and I fear that splitting it out more and more will just make people jump through 50 classes to understand what a line of code gen is doing. On the other hand, it would make fixing bugs much cleaner. What probably needs to be done is some sort of cleaner code generation framework... would appreciate more thoughts here. My goal is to have the bit management thing implemented once so that it is easier to change/improve/bugfix. agreed that we should keep things simple. If we have a SchemaTupleClassGenerator that contains bit management helpers that could simplify things. Separating the runtime from the generation would reduce the amount of things you have to look at at the same time. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 1283 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line1283> bq. > bq. > This feels weird. bq. > what about: bq. > "SchemaTuple.unbox(("+getBoxedTypeName()+")appendGet(diff))" bq. > bq. > getBoxedTypeName() { bq. > switch (type) { bq. > case (DataType.INTEGER): return "Integer"; bq. > ... bq. bq. Jonathan Coveney wrote: bq. I actually was able to replace it with: bq. add(" case ("+fieldNum+"): return "+fs.type+";"); bq. sometimes I miss the forest for the trees. cool bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 1413 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line1413> bq. > bq. > use http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/StringBuilder.html for all those string concatenations. bq. bq. Jonathan Coveney wrote: bq. yeahh this is a good idea. I doubt the code gen takes up much time at all but given the gen'd files are potentially large, it's probably good practice a concatenation copies both strings, so concatenating large strings over and over can become slow. Note that "a"+"B"+foo+bar+"c" gets replaced by the compiler by new StringBuilder().append("a").append("B").append(foo).append(bar).append("c").toString() But you still concatenate those to other strings so will end up duplicating large strings. bq. On 2012-04-06 03:30:08, Julien Le Dem wrote: bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 1522-1524 bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line1522> bq. > bq. > what does isObjNotTup mean? What is this case? bq. bq. Jonathan Coveney wrote: bq. it was an obscure case that was obsolete. I removed it ok - Julien ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4651/#review6722 ----------------------------------------------------------- On 2012-04-05 01:31:00, Jonathan Coveney wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4651/ bq. ----------------------------------------------------------- bq. bq. (Updated 2012-04-05 01:31:00) bq. bq. bq. Review request for Julien Le Dem. bq. bq. bq. Summary bq. ------- bq. bq. This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code generate Tuples which can be used for fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, and it's ~15% smaller serialized (heavily heavily depends on the data, though). Need to do get/set tests, but assuming that it's on par (or even faster) than Tuple, the memory gain is huge. bq. bq. Need to clean up the code and add tests. bq. bq. Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDF's. The next step is to make a SchemaBag, where I think the serialization savings will be really huge. bq. bq. Needs tests and comments, but I want the code to settle a bit. bq. bq. bq. This addresses bug PIG-2632. bq. https://issues.apache.org/jira/browse/PIG-2632 bq. bq. bq. Diffs bq. ----- bq. bq. trunk/src/org/apache/pig/data/TypeAwareTuple.java 1309628 bq. trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1309628 bq. trunk/src/org/apache/pig/data/BinInterSedes.java 1309628 bq. trunk/src/org/apache/pig/data/PrimitiveTuple.java 1309628 bq. trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION bq. trunk/src/org/apache/pig/data/TupleFactory.java 1309628 bq. trunk/src/org/apache/pig/impl/PigContext.java 1309628 bq. trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1309628 bq. bq. Diff: https://reviews.apache.org/r/4651/diff bq. bq. bq. Testing bq. ------- bq. bq. bq. Thanks, bq. bq. Jonathan bq. bq. > Create a SchemaTuple which generates efficient Tuples via code gen > ------------------------------------------------------------------ > > Key: PIG-2632 > URL: https://issues.apache.org/jira/browse/PIG-2632 > Project: Pig > Issue Type: Improvement > Reporter: Jonathan Coveney > Assignee: Jonathan Coveney > Fix For: 0.11 > > Attachments: PIG-2632-0.patch, PIG-2632-1.patch > > > This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing > the Schema on the frontend, we can code generate Tuples which can be used for > fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, > and it's ~15% smaller serialized (heavily heavily depends on the data, > though). Need to do get/set tests, but assuming that it's on par (or even > faster) than Tuple, the memory gain is huge. > Need to clean up the code and add tests. > Right now, it generates a SchemaTuple for every inputSchema and outputSchema > given to UDF's. The next step is to make a SchemaBag, where I think the > serialization savings will be really huge. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira