RE: How to sample an inner bag?

2014-05-29 Thread william.dowling
Mehmet Tepedelenlioglu wrote (Wednesday, May 28, 2014 4:27 PM): I have no experience with the Python UDFs (I use Java). But I doubt the example you supplied would work. First, I am

RE: How to sample an inner bag?

2014-05-28 Thread william.dowling
Mehmet Tepedelenlioglu wrote (Tuesday, May 27, 2014 5:09 PM): If you know how many items you want from each inner bag exactly, you can hack

Re: How to sample an inner bag?

2014-05-28 Thread Mehmet Tepedelenlioglu
If you know how many items you want from each inner bag exactly, you can hack it like this: x = foreach x { y = foreach x generate RANDOM() as rnd, *; y = order y by rnd; y = limit y $SAMPLE_NUM; y = foreach y generate $1

How to sample an inner bag?

2014-05-27 Thread william.dowling
Hi Pig users, Is there an easy/efficient way to sample an inner bag? For example, with input in a relation like (id1,att1,{(a,0.01),(b,0.02),(x,0.999749968742)}) (id1,att2,{(a,0.03),(b,0.04),(x,0.998749217772)}) (id2,att1,{(b,0.05),(c,0.06),(x,0.996945334509)}) I’d like to sample 1/3 the

Re: How to sample an inner bag?

2014-05-27 Thread Mehmet Tepedelenlioglu
If you know how many items you want from each inner bag exactly, you can hack it like this:

    x = foreach x {
        y = foreach x generate RANDOM() as rnd, *;
        y = order y by rnd;
        y = limit y $SAMPLE_NUM;
        y = foreach y generate $1 ..;
        generate group, y;
    }

Basically randomize the
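
For readers who want to check the logic of the hack outside Pig, here is a minimal Python sketch of the same four steps applied to a single inner bag: attach a random key to each tuple, order by that key, keep the first N, then drop the key. The names `sample_bag`, `sample_num`, and `rng` are hypothetical and chosen for illustration only; nothing here is part of Pig or of the original poster's script.

```python
import random


def sample_bag(bag, sample_num, rng=random):
    """Pick up to sample_num tuples from one inner bag, uniformly at random."""
    # Step 1: attach a random key to every tuple (RANDOM() as rnd, *).
    keyed = [(rng.random(), item) for item in bag]
    # Step 2: order by the random key only (order y by rnd).
    keyed.sort(key=lambda pair: pair[0])
    # Steps 3 and 4: keep the first sample_num entries (limit),
    # then drop the key again (generate $1 ..).
    return [item for _, item in keyed[:sample_num]]


# One inner bag taken from the example input in the original question.
bag = [("a", 0.01), ("b", 0.02), ("x", 0.999749968742)]
print(sample_bag(bag, 1))
```

In plain Python, `random.sample(bag, k)` gives a fixed-size sample directly; the sort-by-random-key form is spelled out here only because it mirrors what the nested foreach does. As in the Pig version, a bag with fewer than `sample_num` tuples is returned whole, in shuffled order.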

Re: How to sample an inner bag?

2014-05-27 Thread Pradeep Gollakota
@Mehmet... great hack! I like it :-P On Tue, May 27, 2014 at 5:08 PM, Mehmet Tepedelenlioglu mehmets...@yahoo.com wrote: If you know how many items you want from each inner bag exactly, you can hack it like this: x = foreach x { y = foreach x generate RANDOM() as rnd, *; y =