[
https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248133#comment-13248133
]
[email protected] commented on PIG-2632:
----------------------------------------------------
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. >
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java,
line 133
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100075#file100075line133>
bq. >
bq. > it would be nice if Pig called outputSchema() once during the init
and that was saved somewhere.
It calls it many times, alas. I feel like changing that is a job for another
patch, but I could tackle that here. But given there is already the expectation
that it will be called an arbitrary number of times, I don't think that calling
it once more on the front end will be an issue?
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. >
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java,
lines 136-145
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100075#file100075line136>
bq. >
bq. > So the side effect is that the Tuple class get generated here ? This
should be encapsulated in a more obvious method here.
bq. > like: generateTupleForSchema(tmpS) and not necessarily create an
instance of a tuple.
bq. > also that would remove duplication here.
I was trying to leverage Dmitriy's newTupleForSchema, but I think you're right.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. >
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java,
lines 151-152
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100075#file100075line151>
bq. >
bq. > those are the inputSchema and the outputSchema. The fields should
probably called just that and have boolean flag to know if we use generated
Tuples?
Well, we need to know what the tuple to generate is. I will change it save the
ID that serves as the base, but as is the Schema info isn't saved and it isn't
available.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. >
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java,
line 188
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100075#file100075line188>
bq. >
bq. > I would make it:
bq. > TupleFactory.geInstanceForSchema().newTuple()
I think it's worth thinking about how we want to do this. Dmitriy implemented
newTupleForSchema for the PrimitiveTuple work, and he suggested I use that for
this... however I agree, I think there should be a "SchemaTupleFactory" that
generates SchemaTuples of a given SchemaTuple. At the same time, that might be
a lot of scaffolding for not a lot of gain: once you generate the code for a
given SchemaTuple, there's not a lot of work to be done (though this could
encapsulate some of the static maps that come later). One other factor is that
the TupleFactory interface doesn't really lend itself very nicely to the
SchemaTuple, but I do think it could be extended and be made to work. Would
appreciate more of your thoughts on this.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/BinInterSedes.java, lines 601-602
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100076#file100076line601>
bq. >
bq. > should SchemaTuple extend TypeAwareTuple ?
Agreed, and my working version has that update.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/PrimitiveTuple.java, line 213
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100077#file100077line213>
bq. >
bq. > BytesKey ?
good call
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 46-48
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line46>
bq. >
bq. > statics on top
I really need to clean up where everything goes, noted.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 70
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line70>
bq. >
bq. > largestSetValue should already be correct. Why does largestSetValue
stays at the size of the schema?
I guess once it is at the value of the SchemaTuple (excluding appends), there's
no need to update it.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 77
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line77>
bq. >
bq. > isAppendNull()
bq. > Should the append...() methods be public?
I think a lot of these methods should be protected, agreed
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 81
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line81>
bq. >
bq. > the result would be the same, right?
yeah, you're right
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 94
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line94>
bq. >
bq. > setAppend()
yeah need to rename all of these. naming is what I am worst at :S
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 105
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line105>
bq. >
bq. > if it is always overridden, it should be abstract
It's overriden, but the overriden version calls super. Right now it returns 0
because I don't want to take the time to calculate it out while the class is
still evolving.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 119
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line119>
bq. >
bq. > set? stream?
This originally meant "Schema Tuple Element Write" but yeah, it should be more
descriptive.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 125
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line125>
bq. >
bq. > redundant with the 2 others?
bq. > idClassMap.get(schemaIdMap.get(schema))
Perhaps this is what a SchemaTupleFactory would make cleaner, but my desire is
for Tuple creation to be as fast as possible. This is why there are 5 maps...
my thought is that they are all static, so the extra cost of caching the extra
info is worth avoiding two gets in a row to link the info together when we need
to make Tuples. Am open to thoughts.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 130-142
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line130>
bq. >
bq. > could be computed statically by the generated class if you make a
static getSchema(schemaString).
Well, my thought is that a static cache wouldn't take up much memory, and it
would avoid having to recreate the Schema object every time getSchema() is
called. I don't forsee it being called a ton, but possibly enough that
recreating the object a bunch would be wasteful. Perhaps this is premature
optimization?
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 144
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line144>
bq. >
bq. > setAll() ?
It's not really a setAll imho. It's setting it equal to. Maybe that's what
setAll means, but it seems to convey something different.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 159
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line159>
bq. >
bq. > do we need all of set(Tuple), set(List), setEqualTo(List),
setEqualTo(Tuple) ?
My working version adds even more. I think a lot of it should be made
protected, or at least, I should be more thoughtful about what it should be. I
think "set" is probably enough, but perhaps it should just be the void version?
I guess this is where working with ruby where everything generally returns
"this" by default, thus allowing for nice chaining of methods, is the norm.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 191
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line191>
bq. >
bq. > you could add a properties file to the jar with this info. Or a
generated factory that knows all the classes generated. You mainly need the
mapping schema -> id
bq. >
bq. > you could also use true. Although that would not be THE answer to
THE question.
I think the jar properties is the way to go, at least as the ultimate source of
the classes that were generated. Alternately, it'd be nice if there was an easy
way to browse a jar for certain class files, but I couldn't find a clean one?
If there was, we could just search for SchemaTuple_*.class and keep track of
which files went across. Do you know if there is an easier way to search jars
like that? I can dig around as well.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 209
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line209>
bq. >
bq. > As we have schema aware Tuples, in the future we could add
get(String fieldName). We may not want to strip the aliases to reuse the class.
Hmm, it seems pretty silly that two Schemas that are essentially "int,int,long"
except with different names would necessitate a different class. More
importantly, however, I ran into cases where schema names were different due to
the way that names were generated by different pieces of code, and so the
lookup was thwarted if you were strict about names. Hmm hmm hmm. Do you think
the win from get(String) is worth the potential complication? It could be
convenient...
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 270-273
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line270>
bq. >
bq. > Logger
woops, I usually mark these //remove and whack them before the patch goes out...
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 268
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line268>
bq. >
bq. > there's only one code path where updateMaps is true. It could be
simplified a bit.
agreed
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 310-321
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line310>
bq. >
bq. > refactor DataByteArray to provide this?
So here is the thing about this. Currently, a ton of the useful methods in
BinInterSedes (where this logic was taken from) is private. Also, they aren't
static (though there's no real reason why they shouldn't be?) This snippet (and
other logic like it) should probably be made into a protected static method of
BinInterSedes. Thoughts? It could also make sense for classes to have a static
method to serialize themselves, but I'd argue that the BinInterSedes approach
is probably more consistent with how pig is laid out (though I have no idea why
most of the methods there are private instead of protected static).
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 325-333
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line325>
bq. >
bq. > Same comment, the String serialization should be shared with the
regular Tuple.
bq. >
bq. > Also we could keep the a UTF8 in the SchemaTuple to reduce memory
usage and convert lazilyto String (possibly caching the result)
This logic is stolen directly from BinInterSedes (see comment above). Can you
flesh out what you mean about UTF8? If we change it here we should probably
change it there as well (and make that the canonical protected static source of
all your string serialization needs).
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 344-353
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line344>
bq. >
bq. > same comment as before.
bq. > The same logic should be use with regular tuples.
bq. > The same logic should be shared across types.
bq. >
bq. > It could also use a variable size int format.
bq. > https://developers.google.com/protocol-buffers/docs/encoding#types
varints is a good idea. general question of where this logic should live still
holds as well, as this logic is also directly ripped from BinInterSedes.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 452
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line452>
bq. >
bq. > is (null, null, null) the same as null ?
In DefaultTuple, this is deprecated and not really used. Given that, I use this
to basically mean "has any information been written to this." So in your case,
both would be null. The more important question is to make sure I use it
consistently throughout. Honestly, the whole null thing is a pain, and I need
to really comb over things to make sure I incorporated it properly.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 474
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line474>
bq. >
bq. > this.set(t) ?
reference is this weird Franken-method that isn't used anywhere in the
codebase. I don't know that I want to implement it when the semantics of what
it should do don't seem clear at all. Open to thoughts on this though. The
original intent of it doesn't make much sense for a SchemaTuple since the
underlying structures are primitives, not objects.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 479
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line479>
bq. >
bq. > DefaultTuple replaces null with ""
Good idea
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 484
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line484>
bq. >
bq. > We could have a list implementation that just wraps a tuple.
Hmm, I think this is an intriguing idea, though it might get sills, because
that means you'd have a list which wraps a tuple which wraps a list.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 495
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line495>
bq. >
bq. > yep, sounds like it could be generated and fall back to this if the
other is a regular tuple
My working copy almost has this finished. there are three comparison functions:
compareTo(Object), compareTo(SchemaTuple), and compareToSpecific(SchemaTuple_).
It will check (unless you tell it not to) if you were actually given a
SchemaTuple_X (or just a schemaTuple) and do progressively faster comparisons.
I also need to generate a RawComparator but I'd rather clean everything else up
first (especially the layout of the API and where pieces live).
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 535
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line535>
bq. >
bq. > you may want to log compilation warnings/errors
Is there a need to log it if the runtime exception is going to be thrown
anyway? Because those errors get written to stderr. But I suppose I could pipe
them through the Logger, if that's what you mean?
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 545
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line545>
bq. >
bq. > Class.forName(className) uses the classLoader of the current class.
nice!
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 585
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line585>
bq. >
bq. > if it does not need to be public then yes
it depends heavily on how we end up forcing people to make these things.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 609-635
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line609>
bq. >
bq. > the parameter t is thrown away ?
bq. >
bq. > they don't need to be public.
This is more for conciseness in the code that generates the SchemaTuple_ code.
It allows me to make a bunch of if then elses smaller. I suppose there are
other ways to do it that don't involve a method call... I go back and forth on
that. But the reason is so later on I can do:
add(" setPos_X(SchemaTuple.unbox(val, pos_X));");
which means that the right one will be based on the type of the field being
filled (ie saving me a bunch of lines, but perhaps that should be trumped by
cleanliness of the output code/API).
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 637-659
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line637>
bq. >
bq. > depending on how those are used, java already does auto un-boxing.
bq. > The following works so you may not need them:
bq. > long unbox(Long v) {
bq. > return v;
bq. > }
bq. >
bq. > they don't need to be public
Good point. I hate auto-boxing/un-boxing as a language feature, but if it can
be utilized to clean things up it is worth it.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 667-685
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line667>
bq. >
bq. > same comment, the following works:
bq. > Float box(float v) {
bq. > return v;
bq. > }
bq. >
bq. > they don't need to be public
Yeah same as above as well. I think I can actually also leverage auto
boxing/unboxing deeper in the code to remove some of the need for these
functions.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 698
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line698>
bq. >
bq. > this file has 1500 lines :)
bq. > I think those classes have earned their own package.
Yeah, I mean, you're right... but the logic is EXTREMELY narrow, which is why I
don't like the idea of splitting it out. Yeah, it makes the file larger, but it
makes 0 sense outside of the context of what is going on here. Perhaps that
should be rectified?
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 857-860
bq. > <https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line857>
bq. >
bq. > this could be a method in the parent class
do you mean in SchemaTuple, or in TypeInFunctionStringOut?
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 1138
bq. >
<https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line1138>
bq. >
bq. > CheckIfNullString.masks
bq. > maybe they should be in a Masks class then, along with the masking
code generation (" & (byte)+...)
I don't know how worthwhile it is to split all of this stuff out into classes.
Maybe it is. A lot of stuff is used once or twice in disparate places, and I
fear that splitting it out more and more will just make people jump through 50
classes to understand what a line of code gen is doing. On the other hand, it
would make fixing bugs much cleaner. What probably needs to be done is some
sort of cleaner code generation framework... would appreciate more thoughts
here.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 1283
bq. >
<https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line1283>
bq. >
bq. > This feels weird.
bq. > what about:
bq. > "SchemaTuple.unbox(("+getBoxedTypeName()+")appendGet(diff))"
bq. >
bq. > getBoxedTypeName() {
bq. > switch (type) {
bq. > case (DataType.INTEGER): return "Integer";
bq. > ...
I actually was able to replace it with:
add(" case ("+fieldNum+"): return "+fs.type+";");
sometimes I miss the forest for the trees.
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, line 1413
bq. >
<https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line1413>
bq. >
bq. > use
http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/StringBuilder.html for
all those string concatenations.
yeahh this is a good idea. I doubt the code gen takes up much time at all but
given the gen'd files are potentially large, it's probably good practice
bq. On 2012-04-06 03:30:08, Julien Le Dem wrote:
bq. > trunk/src/org/apache/pig/data/SchemaTuple.java, lines 1522-1524
bq. >
<https://reviews.apache.org/r/4651/diff/1/?file=100078#file100078line1522>
bq. >
bq. > what does isObjNotTup mean? What is this case?
it was an obscure case that was obsolete. I removed it
- Jonathan
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review6722
-----------------------------------------------------------
On 2012-04-05 01:31:00, Jonathan Coveney wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4651/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-04-05 01:31:00)
bq.
bq.
bq. Review request for Julien Le Dem.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This work builds on Dmitriy's PrimitiveTuple work. The idea is that,
knowing the Schema on the frontend, we can code generate Tuples which can be
used for fun and profit. In rudimentary tests, the memory efficiency is 2-4x
better, and it's ~15% smaller serialized (heavily heavily depends on the data,
though). Need to do get/set tests, but assuming that it's on par (or even
faster) than Tuple, the memory gain is huge.
bq.
bq. Need to clean up the code and add tests.
bq.
bq. Right now, it generates a SchemaTuple for every inputSchema and
outputSchema given to UDF's. The next step is to make a SchemaBag, where I
think the serialization savings will be really huge.
bq.
bq. Needs tests and comments, but I want the code to settle a bit.
bq.
bq.
bq. This addresses bug PIG-2632.
bq. https://issues.apache.org/jira/browse/PIG-2632
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/src/org/apache/pig/data/TypeAwareTuple.java 1309628
bq.
trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
1309628
bq. trunk/src/org/apache/pig/data/BinInterSedes.java 1309628
bq. trunk/src/org/apache/pig/data/PrimitiveTuple.java 1309628
bq. trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION
bq. trunk/src/org/apache/pig/data/TupleFactory.java 1309628
bq. trunk/src/org/apache/pig/impl/PigContext.java 1309628
bq.
trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java
1309628
bq.
bq. Diff: https://reviews.apache.org/r/4651/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Jonathan
bq.
bq.
> Create a SchemaTuple which generates efficient Tuples via code gen
> ------------------------------------------------------------------
>
> Key: PIG-2632
> URL: https://issues.apache.org/jira/browse/PIG-2632
> Project: Pig
> Issue Type: Improvement
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2632-0.patch, PIG-2632-1.patch
>
>
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing
> the Schema on the frontend, we can code generate Tuples which can be used for
> fun and profit. In rudimentary tests, the memory efficiency is 2-4x better,
> and it's ~15% smaller serialized (heavily heavily depends on the data,
> though). Need to do get/set tests, but assuming that it's on par (or even
> faster) than Tuple, the memory gain is huge.
> Need to clean up the code and add tests.
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema
> given to UDF's. The next step is to make a SchemaBag, where I think the
> serialization savings will be really huge.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira