How to sample an inner bag?

2014-05-27 Thread william.dowling
Hi Pig users,

Is there an easy/efficient way to sample an inner bag? For example, with input in a relation like

(id1,att1,{(a,0.01),(b,0.02),(x,0.999749968742)})
(id1,att2,{(a,0.03),(b,0.04),(x,0.998749217772)})
(id2,att1,{(b,0.05),(c,0.06),(x,0.996945334509)})

I’d like to sample 1/3 the

Misleading error messages in pig… 0.12.1 (both successful and failed)

2014-05-27 Thread Kevin Burton
Take a look at this output. Talk about mixed signals!!! :-P (I changed the error slightly to obfuscate internal database and table names) … so basically it says it's both successful and failed. Output(s): Successfully stored 5406592 records in:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.slf4j.spi.LocationAwareLogger.log

2014-05-27 Thread Ryan Compton
Upgraded to 12.1 and now I'm getting this whenever I try to REGISTER a jar. I don't use slf4j, so I have no idea what's causing it. Has anyone else run into it? My Hadoop version is cdh3u3. Pig Stack Trace --- ERROR 2998: Unhandled internal error.

Re: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.slf4j.spi.LocationAwareLogger.log

2014-05-27 Thread Ryan Compton
Update: Turns out I'm getting it on 11.1 as well. Must be a problem with something in my jar. On Tue, May 27, 2014 at 1:26 PM, Ryan Compton compton.r...@gmail.com wrote: Upgraded to 12.1 and now I'm getting this whenever I try to REGISTER a jar. I don't use slf4j, so I have no idea what's

Re: How to sample an inner bag?

2014-05-27 Thread Mehmet Tepedelenlioglu
If you know how many items you want from each inner bag exactly, you can hack it like this:

x = foreach x {
    y = foreach x generate RANDOM() as rnd, *;
    y = order y by rnd;
    y = limit y $SAMPLE_NUM;
    y = foreach y generate $1 ..;
    generate group, y;
}

Basically randomize the
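
For anyone who wants to check the logic outside Pig, the randomize, sort, limit idea in this reply can be sketched in plain Python. This is only an illustration, not Pig code: the function name `sample_inner_bag`, the `rows` list, and the seeded `rng` are assumptions made up for the example, and the sample data is copied from the original question.

```python
import random

def sample_inner_bag(bag, sample_num, rng=random):
    """Pick sample_num tuples from bag by sorting on a random key.

    Mirrors the steps of the hack: attach a random number to each
    tuple, order by it, keep the first sample_num, drop the random key.
    """
    keyed = [(rng.random(), item) for item in bag]   # RANDOM() as rnd, *
    keyed.sort(key=lambda pair: pair[0])             # order by rnd
    limited = keyed[:sample_num]                     # limit $SAMPLE_NUM
    return [item for _, item in limited]             # drop the rnd column

# Example rows from the original question: (id, att, inner bag).
rows = [
    ("id1", "att1", [("a", 0.01), ("b", 0.02), ("x", 0.999749968742)]),
    ("id1", "att2", [("a", 0.03), ("b", 0.04), ("x", 0.998749217772)]),
    ("id2", "att1", [("b", 0.05), ("c", 0.06), ("x", 0.996945334509)]),
]

rng = random.Random(0)  # seeded only so repeated runs give the same picks
sampled = [(id_, att, sample_inner_bag(bag, 1, rng)) for id_, att, bag in rows]

for row in sampled:
    print(row)
```

With three items per bag, a sample size of 1 corresponds to the 1/3 asked about in the question. Note that this takes a fixed count per bag, so it only matches a fractional sample when all bags are the same size.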

Re: How to sample an inner bag?

2014-05-27 Thread Pradeep Gollakota
@Mehmet... great hack! I like it :-P On Tue, May 27, 2014 at 5:08 PM, Mehmet Tepedelenlioglu mehmets...@yahoo.com wrote: If you know how many items you want from each inner bag exactly, you can hack it like this: x = foreach x { y = foreach x generate RANDOM() as rnd, *; y =