Hi Pig users,
Is there an easy/efficient way to sample an inner bag? For example, with input
in a relation like
(id1,att1,{(a,0.01),(b,0.02),(x,0.999749968742)})
(id1,att2,{(a,0.03),(b,0.04),(x,0.998749217772)})
(id2,att1,{(b,0.05),(c,0.06),(x,0.996945334509)})
I’d like to sample 1/3 the
Take a look at this output. Talk about mixed signals!!! :-P
(I changed the error slightly to obfuscate internal database and table names)
… so basically it says it's both successful and failed.
Output(s):
Successfully stored 5406592 records in:
Upgraded to 12.1 and now I'm getting this whenever I try to REGISTER a
jar. I don't use slf4j, so I have no idea what's causing it. Has
anyone else run into it? My Hadoop version is cdh3u3.
Pig Stack Trace
---
ERROR 2998: Unhandled internal error.
Update: Turns out I'm getting it on 11.1 as well. Must be a problem
with something in my jar.
On Tue, May 27, 2014 at 1:26 PM, Ryan Compton compton.r...@gmail.com wrote:
Upgraded to 12.1 and now I'm getting this whenever I try to REGISTER a
jar. I don't use slf4j, so I have no idea what's
If you know how many items you want from each inner bag exactly, you can hack
it like this:
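x = foreach x {
    -- attach a random sort key to every tuple of the inner bag
    y = foreach x generate RANDOM() as rnd, *;
    -- sorting by that key shuffles the bag; the limit keeps the first $SAMPLE_NUM tuples
    y = order y by rnd;
    y = limit y $SAMPLE_NUM;
    -- drop the random key again
    y = foreach y generate $1 ..;
    generate group, y;
};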
Basically randomize the
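For reference, here is how the same trick might look as a complete script against data shaped like the example in the original question. This is an untested sketch: the input path, the relation names and the field names (id, att, items, name, score) are all made up for illustration, and $SAMPLE_NUM is assumed to be supplied on the command line.

```pig
-- Hypothetical input path and schema, matching rows such as
-- (id1,att1,{(a,0.01),(b,0.02),(x,0.999749968742)})
a = LOAD 'input' AS (id:chararray, att:chararray,
                     items:bag{t:(name:chararray, score:double)});

sampled = FOREACH a {
    -- tag each tuple of the inner bag with a random sort key
    keyed = FOREACH items GENERATE RANDOM() AS rnd, name, score;
    -- ordering by the random key shuffles the bag
    shuffled = ORDER keyed BY rnd;
    -- keep a fixed number of tuples from the shuffled bag
    picked = LIMIT shuffled $SAMPLE_NUM;
    -- strip the random key before emitting the sampled bag
    kept = FOREACH picked GENERATE name, score;
    GENERATE id, att, kept;
};

DUMP sampled;
```

It could be run with something like pig -param SAMPLE_NUM=1 sample.pig (the script name is hypothetical). Note that this takes a fixed count per bag rather than a fraction of each bag, so it only fits the case where the number of items wanted is known up front.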
@Mehmet... great hack! I like it :-P
On Tue, May 27, 2014 at 5:08 PM, Mehmet Tepedelenlioglu
mehmets...@yahoo.com wrote:
If you know how many items you want from each inner bag exactly, you can
hack it like this:
x = foreach x {
y = foreach x generate RANDOM() as rnd, *;
y =