date:20100420

[jira] Updated: (PIG-1385) UDF to create tuples and bags

2010-04-20 Thread hc busy (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1385:
-

Attachment: PIG-1385-trunk.patch

> UDF to create tuples and bags
> -
>
> Key: PIG-1385
> URL: https://issues.apache.org/jira/browse/PIG-1385
> Project: Pig
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.6.0
>Reporter: hc busy
> Attachments: PIG-1385-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Based on this conversation:
> > On Tue, Apr 20, 2010 at 6:34 PM, hc busy  wrote:
> >
> > > What about making them part of the language using symbols?
> > >
> > > instead of
> > >
> > > foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
> > >
> > > have language support
> > >
> > > foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
> > >
> > > or even:
> > >
> > > foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
> > >
> > >
> > > Is there a reason not to do the second or third, other than it being
> > > more complicated?
> > >
> > > Certainly I'd volunteer to put the top implementation into the util
> > > package and submit it for builtins, but the latter syntactic candy
> > > seems more natural.
> > >
> > >
> > >
> > > On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates  wrote:
> > >
> > >> The grouping package in piggybank is left over from back when Pig
> > >> allowed users to define grouping functions (0.1).  Functions like
> > >> these should go in evaluation.util.
> > >>
> > >> However, I'd consider putting these in builtin (in main Pig) instead.
> > >> These are things everyone asks for and they seem like a reasonable
> > >> addition to the core engine.  This will be more of a burden to write
> > >> (as we'll hold them to a higher standard) but of more use to people
> > >> as well.
> > >>
> > >> Alan.
> > >>
> > >>
> > >> On Apr 19, 2010, at 12:53 PM, hc busy wrote:
> > >>
> > >>> Sometimes I wonder... I mean, somebody went to the trouble of making
> > >>> a path called
> > >>>
> > >>> org.apache.pig.piggybank.grouping
> > >>>
> > >>> (where it seems like this code belongs), but didn't check any Java
> > >>> code into that package.
> > >>>
> > >>>
> > >>> Any comments about where to put this kind of utility class?
> > >>>
> > >>>
> > >>>
> > >>> On Mon, Apr 19, 2010 at 12:07 PM, Andrey S  wrote:
> > >>>
> > >>>  2010/4/19 hc busy 
> > 
> >   That's just the way it is right now, you can't make bags or tuples
> > > directly... Maybe we should have some UDF's in piggybank for these:
> > >
> > > toBag()
> > > toTuple(); --which is kinda like exec(Tuple in){return in;}
> > > TupleToBag(); --some times you need it this way for some reason.
> > >
> > >
> >  OK. I'll place my current code here; maybe later I'll make a patch
> >  (if such an implementation is acceptable, of course).
> > 
> >  import org.apache.pig.EvalFunc;
> >  import org.apache.pig.data.BagFactory;
> >  import org.apache.pig.data.DataBag;
> >  import org.apache.pig.data.Tuple;
> >  import org.apache.pig.data.TupleFactory;
> > 
> >  import java.io.IOException;
> > 
> >  /**
> >  * Convert any sequence of fields to a bag of tuples with the specified
> >  * count of fields per tuple.
> >  * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
> >  * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
> >  *
> >  * @author astepachev
> >  */
> >  public class ToBag extends EvalFunc<DataBag> {
> >   public BagFactory bagFactory;
> >   public TupleFactory tupleFactory;
> > 
> >   public ToBag() {
> >   bagFactory = BagFactory.getInstance();
> >   tupleFactory = TupleFactory.getInstance();
> >   }
> > 
> >   @Override
> >   public DataBag exec(Tuple input) throws IOException {
> >   if (input.isNull())
> >   return null;
> >   final DataBag bag = bagFactory.newDefaultBag();
> >   final Integer counter = (Integer) input.get(0);
> >   if (counter == null)
> >   return null;
> >   Tuple tuple = tupleFactory.newTuple();
> >   for (int i = 0; i < input.size() - 1; i++) {
> >   if (i % counter == 0) {
> >   tuple = tupleFactory.newTuple();
> >   bag.add(tuple);
> >   }
> >   tuple.append(input.get(i + 1));
> >   }
> >   return bag;
> >   }
> >  }
> > 
> >  import org.apache.pig.ExecType;
> >  import org.apache.pig.PigServer;
> >  import org.junit.Before;
> >  import org.junit.Test;
> > 
> >  import java.io.IOException;
> >  import java.net.URISyntaxException;
> >  import java.net.URL;
> > 
> >  import static org.junit.Assert.assertTrue;
> > >>
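
The grouping performed by the quoted ToBag.exec() can be tried without a Pig installation. The following is a minimal sketch under stated assumptions, not the attached patch: it uses plain java.util lists in place of Pig's Tuple and DataBag, the class name ToBagSketch is hypothetical, and the guard against a non-positive count is an addition that the quoted code does not have.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Pig-free sketch of the quoted ToBag logic: input[0] is the group size,
// and the remaining fields are split into runs of that size.
final class ToBagSketch {
    static List<List<Object>> toBag(List<Object> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null; // mirrors the null checks in the quoted UDF
        }
        int count = (Integer) input.get(0);
        if (count <= 0) {
            // Assumption: not in the quoted code, where a count of 0
            // would fail on the modulo below.
            throw new IllegalArgumentException("count must be positive");
        }
        List<List<Object>> bag = new ArrayList<>();
        List<Object> tuple = null;
        for (int i = 0; i < input.size() - 1; i++) {
            if (i % count == 0) {        // start a new tuple every `count` fields
                tuple = new ArrayList<>();
                bag.add(tuple);
            }
            tuple.add(input.get(i + 1)); // i + 1 skips the leading count field
        }
        return bag;
    }

    public static void main(String[] args) {
        // prints [[a, b], [c, d], [e]]
        System.out.println(toBag(Arrays.<Object>asList(2, "a", "b", "c", "d", "e")));
    }
}
```

Note that the last tuple is shorter when the number of fields is not a multiple of the count; the quoted UDF behaves the same way.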

[jira] Updated: (PIG-1385) UDF to create tuples and bags

2010-04-20 Thread hc busy (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1385:
-

Status: Patch Available  (was: Open)

> UDF to create tuples and bags
> -
>
> Key: PIG-1385
> URL: https://issues.apache.org/jira/browse/PIG-1385
> Project: Pig
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.6.0
>Reporter: hc busy
> Attachments: PIG-1385-trunk.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-20 Thread hc busy (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1386:
-

Attachment: PIG-1386-trunk.patch

The patch

> UDF to extend functionalities of MaxTupleBy1stField
> ---
>
> Key: PIG-1386
> URL: https://issues.apache.org/jira/browse/PIG-1386
> Project: Pig
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.6.0
>Reporter: hc busy
> Attachments: PIG-1386-trunk.patch
>
>
> Based on this conversation:
> totally, go for it, it'd be pretty straightforward to add this
> functionality.
> On Tue, Apr 20, 2010 at 6:45 PM, hc busy  wrote:
> > Hey, while we're on the subject, and I have your attention, can we
> > refactor the UDF MaxTupleBy1stField to take constructor arguments?
> >
> > define customMaxTuple ExtremalTupleByNthField(n, 'min');
> > G = group T by id;
> > M = foreach G generate customMaxTuple(T);
> >
> > Where n is the nth field, and the second parameter allows us to specify
> > "min", "max", "median", etc.
> >
> > Does this seem like something useful to everyone?
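
The proposal above amounts to picking one tuple from a bag by comparing a chosen field. The following is a rough sketch of that idea in plain Java, assuming a 0-based field index, mutually comparable non-null field values, and only the 'min' and 'max' modes ("median" is not covered). The class and method names are hypothetical and are not taken from PIG-1386-trunk.patch.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Pig-free sketch: a bag is a list of tuples, a tuple is a list of fields.
final class ExtremalSketch {
    // Returns the tuple whose n-th field (0-based, an assumption) is the
    // smallest or largest in the bag, or null for an empty bag.
    @SuppressWarnings("unchecked")
    static List<Object> extremal(List<List<Object>> bag, int n, String mode) {
        Comparator<List<Object>> byNth =
            (a, b) -> ((Comparable<Object>) a.get(n)).compareTo(b.get(n));
        if ("min".equals(mode)) {
            return bag.stream().min(byNth).orElse(null);
        }
        if ("max".equals(mode)) {
            return bag.stream().max(byNth).orElse(null);
        }
        throw new IllegalArgumentException("unsupported mode: " + mode);
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList("x", 3),
            Arrays.<Object>asList("y", 1),
            Arrays.<Object>asList("z", 2));
        System.out.println(extremal(bag, 1, "min")); // prints [y, 1]
        System.out.println(extremal(bag, 1, "max")); // prints [x, 3]
    }
}
```

In a real UDF the field index and mode would arrive as constructor arguments, as in the define statement quoted above.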

[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-20 Thread hc busy (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1386:
-

Status: Patch Available  (was: Open)

Here's a first stab.

> UDF to extend functionalities of MaxTupleBy1stField
> ---
>
> Key: PIG-1386
> URL: https://issues.apache.org/jira/browse/PIG-1386
> Project: Pig
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 0.6.0
>Reporter: hc busy

[jira] Created: (PIG-1387) Syntactical Sugar for PIG-1385

2010-04-20 Thread hc busy (JIRA)

Syntactical Sugar for PIG-1385
--

 Key: PIG-1387
 URL: https://issues.apache.org/jira/browse/PIG-1387
 Project: Pig
  Issue Type: Wish
  Components: grunt
Affects Versions: 0.6.0
Reporter: hc busy


From this conversation: extend PIG-1385 so that, instead of calling a UDF, Pig uses built-in
behavior when the (), {}, [] groupings are encountered.
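
To make the intended result of the three groupings concrete, here is a small sketch in plain Java of what one row might become under the third form quoted below. It is an illustration under assumptions, not Pig's implementation: a tuple is modeled as a List, a bag as a List of tuples, a map as a Map, and each bag element is assumed to be wrapped in a one-field tuple. The class name SugarSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Models: foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
final class SugarSketch {
    static List<Object> generate(List<Object> row) {
        // ($0, $1, $2): the first three fields become one tuple
        List<Object> tuple = new ArrayList<>(row.subList(0, 3));

        // {$3, $4, $5}: each field becomes a one-field tuple in a bag (assumption)
        List<List<Object>> bag = new ArrayList<>();
        for (Object field : row.subList(3, 6)) {
            bag.add(Collections.singletonList(field));
        }

        // [$6#$7, $8#$9]: key#value pairs become map entries
        Map<Object, Object> map = new LinkedHashMap<>();
        map.put(row.get(6), row.get(7));
        map.put(row.get(8), row.get(9));

        return Arrays.<Object>asList(tuple, bag, map, row.get(10), row.get(11));
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList(
            0, 1, 2, 3, 4, 5, "k1", "v1", "k2", "v2", 10, 11);
        // prints [[0, 1, 2], [[3], [4], [5]], {k1=v1, k2=v2}, 10, 11]
        System.out.println(generate(row));
    }
}
```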


> > What about making them part of the language using symbols?
> >
> > instead of
> >
> > foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
> >
> > have language support
> >
> > foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
> >
> > or even:
> >
> > foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
> >
> >
> > Is there reason not to do the second or third other than being more
> > complicated?
> >
> > Certainly I'd volunteer to put the top implementation in to the util
> > package and submit them for builtin's, but the latter syntactic candies
> > seems more natural..
> >
> >
> >
> > On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates  wrote:
> >
> >> The grouping package in piggybank is left over from back when Pig
> allowed
> >> users to define grouping functions (0.1).  Functions like these should
> go in
> >> evaluation.util.
> >>
> >> However, I'd consider putting these in builtin (in main Pig) instead.
> >>  These are things everyone asks for and they seem like a reasonable
> addition
> >> to the core engine.  This will be more of a burden to write (as we'll
> hold
> >> them to a higher standard) but of more use to people as well.
> >>
> >> Alan.
> >>
> >>
> >> On Apr 19, 2010, at 12:53 PM, hc busy wrote:
> >>
> >>  Some times I wonder... I mean, somebody went to the trouble of making a
> >>> path
> >>> called
> >>>
> >>> org.apache.pig.piggybank.grouping
> >>>
> >>> (where it seems like this code belong), but didn't check in any java
> code
> >>> into that package.
> >>>
> >>>
> >>> Any comment about where to put this kind of utility classes?
> >>>
> >>>
> >>>
> >>> On Mon, Apr 19, 2010 at 12:07 PM, Andrey S  wrote:
> >>>
> >>>  2010/4/19 hc busy 
> 
>   That's just the way it is right now, you can't make bags or tuples
> > directly... Maybe we should have some UDF's in piggybank for these:
> >
> > toBag()
> > toTuple(); --which is kinda like exec(Tuple in){return in;}
> > TupleToBag(); --some times you need it this way for some reason.
> >
> >
> >  Ok. I place my current code here, may be later I make a patch (if
> such
>  implementation is acceptable of course).
> 
>  import org.apache.pig.EvalFunc;
>  import org.apache.pig.data.BagFactory;
>  import org.apache.pig.data.DataBag;
>  import org.apache.pig.data.Tuple;
>  import org.apache.pig.data.TupleFactory;
> 
>  import java.io.IOException;
> 
>  /**
>  * Convert any sequence of fields to a bag of tuples with the specified
>  * count of fields per tuple.
>  * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
>  * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
>  *
>  * @author astepachev
>  */
>  public class ToBag extends EvalFunc<DataBag> {
>   public BagFactory bagFactory;
>   public TupleFactory tupleFactory;
> 
>   public ToBag() {
>   bagFactory = BagFactory.getInstance();
>   tupleFactory = TupleFactory.getInstance();
>   }
> 
>   @Override
>   public DataBag exec(Tuple input) throws IOException {
>   if (input.isNull())
>   return null;
>   final DataBag bag = bagFactory.newDefaultBag();
>   final Integer counter = (Integer) input.get(0);
>   if (counter == null)
>   return null;
>   Tuple tuple = tupleFactory.newTuple();
>   for (int i = 0; i < input.size() - 1; i++) {
>   if (i % counter == 0) {
>   tuple = tupleFactory.newTuple();
>   bag.add(tuple);
>   }
>   tuple.append(input.get(i + 1));
>   }
>   return bag;
>   }
>  }
> 
>  import org.apache.pig.ExecType;
>  import org.apache.pig.PigServer;
>  import org.junit.Before;
>  import org.junit.Test;
> 
>  import java.io.IOException;
>  import java.net.URISyntaxException;
>  import java.net.URL;
> 
>  import static org.junit.Assert.assertTrue;
> 
>  /**
>  * @author astepachev
>  */
>  public class ToBagTest {
>   PigServer pigServer;
>   URL inputTxt;
> 
>   @Before
>   public void init() throws IOException, URISyntaxException {
>   pigServer = new PigServer(ExecType.LOCAL);
>   inputTxt =
>  this.getClass().getResource("bagTest.txt").toURI().toURL();
>   }
> 
>   @Test
>   public void testSimple() throws IOException {
>   pigServer.registerQuery("a = load '" + inputTxt.toExternalForm()
> +
>  "

[jira] Updated: (PIG-1385) UDF to create tuples and bags

2010-04-20 Thread hc busy (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1385:
-

Affects Version/s: 0.6.0
  Description: 
Based on this conversation:

> On Tue, Apr 20, 2010 at 6:34 PM, hc busy  wrote:
>
> > What about making them part of the language using symbols?
> >
> > instead of
> >
> > foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
> >
> > have language support
> >
> > foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
> >
> > or even:
> >
> > foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
> >
> >
> > Is there reason not to do the second or third other than being more
> > complicated?
> >
> > Certainly I'd volunteer to put the top implementation in to the util
> > package and submit them for builtin's, but the latter syntactic candies
> > seems more natural..
> >
> >
> >
> > On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates  wrote:
> >
> >> The grouping package in piggybank is left over from back when Pig
> allowed
> >> users to define grouping functions (0.1).  Functions like these should
> go in
> >> evaluation.util.
> >>
> >> However, I'd consider putting these in builtin (in main Pig) instead.
> >>  These are things everyone asks for and they seem like a reasonable
> addition
> >> to the core engine.  This will be more of a burden to write (as we'll
> hold
> >> them to a higher standard) but of more use to people as well.
> >>
> >> Alan.
> >>
> >>
> >> On Apr 19, 2010, at 12:53 PM, hc busy wrote:
> >>
> >>  Some times I wonder... I mean, somebody went to the trouble of making a
> >>> path
> >>> called
> >>>
> >>> org.apache.pig.piggybank.grouping
> >>>
> >>> (where it seems like this code belong), but didn't check in any java
> code
> >>> into that package.
> >>>
> >>>
> >>> Any comment about where to put this kind of utility classes?
> >>>
> >>>
> >>>
> >>> On Mon, Apr 19, 2010 at 12:07 PM, Andrey S  wrote:
> >>>
> >>>  2010/4/19 hc busy 
> 
>   That's just the way it is right now, you can't make bags or tuples
> > directly... Maybe we should have some UDF's in piggybank for these:
> >
> > toBag()
> > toTuple(); --which is kinda like exec(Tuple in){return in;}
> > TupleToBag(); --some times you need it this way for some reason.
> >
> >
> >  Ok. I place my current code here, may be later I make a patch (if
> such
>  implementation is acceptable of course).
> 
>  import org.apache.pig.EvalFunc;
>  import org.apache.pig.data.BagFactory;
>  import org.apache.pig.data.DataBag;
>  import org.apache.pig.data.Tuple;
>  import org.apache.pig.data.TupleFactory;
> 
>  import java.io.IOException;
> 
>  /**
>  * Convert any sequence of fields to a bag of tuples with the specified
>  * count of fields per tuple.
>  * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
>  * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
>  *
>  * @author astepachev
>  */
>  public class ToBag extends EvalFunc<DataBag> {
>   public BagFactory bagFactory;
>   public TupleFactory tupleFactory;
> 
>   public ToBag() {
>   bagFactory = BagFactory.getInstance();
>   tupleFactory = TupleFactory.getInstance();
>   }
> 
>   @Override
>   public DataBag exec(Tuple input) throws IOException {
>   if (input.isNull())
>   return null;
>   final DataBag bag = bagFactory.newDefaultBag();
>   final Integer counter = (Integer) input.get(0);
>   if (counter == null)
>   return null;
>   Tuple tuple = tupleFactory.newTuple();
>   for (int i = 0; i < input.size() - 1; i++) {
>   if (i % counter == 0) {
>   tuple = tupleFactory.newTuple();
>   bag.add(tuple);
>   }
>   tuple.append(input.get(i + 1));
>   }
>   return bag;
>   }
>  }
> 
>  import org.apache.pig.ExecType;
>  import org.apache.pig.PigServer;
>  import org.junit.Before;
>  import org.junit.Test;
> 
>  import java.io.IOException;
>  import java.net.URISyntaxException;
>  import java.net.URL;
> 
>  import static org.junit.Assert.assertTrue;
> 
>  /**
>  * @author astepachev
>  */
>  public class ToBagTest {
>   PigServer pigServer;
>   URL inputTxt;
> 
>   @Before
>   public void init() throws IOException, URISyntaxException {
>   pigServer = new PigServer(ExecType.LOCAL);
>   inputTxt =
>  this.getClass().getResource("bagTest.txt").toURI().toURL();
>   }
> 
>   @Test
>   public void testSimple() throws IOException {
>   pigServer.registerQuery("a = load '" + inputTxt.toExternalForm() +
>   "' using PigStorage(',') " +
>   "as (id:int, a:chararray, b:chararray, c:chararray, d:chararray);");
> >>

[jira] Created: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-20 Thread hc busy (JIRA)

UDF to extend functionalities of MaxTupleBy1stField
---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy


Based on this conversation:

totally, go for it, it'd be pretty straightforward to add this
functionality.



On Tue, Apr 20, 2010 at 6:45 PM, hc busy  wrote:

> Hey, while we're on the subject, and I have your attention, can we
> refactor the UDF MaxTupleBy1stField to take constructor arguments?
>
> define customMaxTuple ExtremalTupleByNthField(n, 'min');
> G = group T by id;
> M = foreach G generate customMaxTuple(T);
>
> Where n is the nth field, and the second parameter allows us to specify
> "min", "max", "median", etc.
>
> Does this seem like something useful to everyone?

[jira] Created: (PIG-1385) UDF to create tuples and bags

2010-04-20 Thread hc busy (JIRA)

UDF to create tuples and bags
-

 Key: PIG-1385
 URL: https://issues.apache.org/jira/browse/PIG-1385
 Project: Pig
  Issue Type: New Feature
  Components: tools
Reporter: hc busy


Based on this conversation:

totally, go for it, it'd be pretty straightforward to add this
functionality.



On Tue, Apr 20, 2010 at 6:45 PM, hc busy  wrote:

> Hey, while we're on the subject, and I have your attention, can we refactor
> the UDF MaxTupleByFirstField to take constructor arguments?
>
> define customMaxTuple ExtremalTupleByNthField(n, 'min');
> G = group T by id;
> M = foreach G generate customMaxTuple(T);
>
> Where n is the nth field, and the second parameter allows us to specify
> "min", "max", "median",  etc...
>
> Does this seem like something useful to everyone?
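
The proposal above can be sketched in plain Java, with no Pig dependency. This is only an illustration of the selection logic, not the UDF that was eventually written: the class name ExtremalByNthField, the method pick, and the use of lists of integers in place of Pig tuples are all assumptions made for the example, and only 'min' and 'max' are covered (not 'median').

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for the proposed UDF; lists of integers stand in for Pig tuples. */
class ExtremalByNthField {
    /**
     * Returns the row whose n-th field (0-based) is smallest ("min") or
     * largest ("max"). Returns null when there are no rows.
     */
    static List<Integer> pick(List<List<Integer>> rows, int n, String mode) {
        // Order rows by the value of their n-th field only.
        Comparator<List<Integer>> byNth =
                Comparator.comparing((List<Integer> row) -> row.get(n));
        switch (mode) {
            case "min":
                return rows.stream().min(byNth).orElse(null);
            case "max":
                return rows.stream().max(byNth).orElse(null);
            default:
                throw new IllegalArgumentException("unsupported mode: " + mode);
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = Arrays.asList(
                Arrays.asList(1, 9), Arrays.asList(2, 3), Arrays.asList(3, 5));
        System.out.println(pick(rows, 1, "min")); // row with the smallest second field
        System.out.println(pick(rows, 1, "max")); // row with the largest second field
    }
}
```

Passing the field index and the mode as parameters mirrors the constructor arguments suggested in the message, so one class could replace a family of single-purpose UDFs.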
>
>
>
> On Tue, Apr 20, 2010 at 6:34 PM, hc busy  wrote:
>
> > What about making them part of the language using symbols?
> >
> > instead of
> >
> > foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
> >
> > have language support
> >
> > foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
> >
> > or even:
> >
> > foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
> >
> >
> > Is there a reason not to do the second or third, other than it being more
> > complicated?
> >
> > Certainly I'd volunteer to put the top implementation into the util
> > package and submit them for builtins, but the latter syntactic sugar
> > seems more natural.
> >
> >
> >
> > On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates  wrote:
> >
> >> The grouping package in piggybank is left over from back when Pig allowed
> >> users to define grouping functions (0.1).  Functions like these should go in
> >> evaluation.util.
> >>
> >> However, I'd consider putting these in builtin (in main Pig) instead.
> >> These are things everyone asks for and they seem like a reasonable addition
> >> to the core engine.  This will be more of a burden to write (as we'll hold
> >> them to a higher standard) but of more use to people as well.
> >>
> >> Alan.
> >>
> >>
> >> On Apr 19, 2010, at 12:53 PM, hc busy wrote:
> >>
> >>> Sometimes I wonder... I mean, somebody went to the trouble of making a
> >>> path called
> >>>
> >>> org.apache.pig.piggybank.grouping
> >>>
> >>> (where it seems like this code belongs), but didn't check in any java code
> >>> into that package.
> >>>
> >>>
> >>> Any comments about where to put this kind of utility class?
> >>>
> >>>
> >>>
> >>> On Mon, Apr 19, 2010 at 12:07 PM, Andrey S  wrote:
> >>>
> >>>  2010/4/19 hc busy 
> 
>   That's just the way it is right now, you can't make bags or tuples
> > directly... Maybe we should have some UDF's in piggybank for these:
> >
> > toBag()
> > toTuple(); --which is kinda like exec(Tuple in){return in;}
> > TupleToBag(); --some times you need it this way for some reason.
> >
> >
> >  OK, I'll post my current code here; maybe later I'll make a patch (if such an
>  implementation is acceptable, of course).
> 
>  import org.apache.pig.EvalFunc;
>  import org.apache.pig.data.BagFactory;
>  import org.apache.pig.data.DataBag;
>  import org.apache.pig.data.Tuple;
>  import org.apache.pig.data.TupleFactory;
> 
>  import java.io.IOException;
> 
>  /**
>   * Converts any sequence of fields to a bag of tuples with the specified
>   * number of fields each.
>   * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
>   * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
>   *
>   * @author astepachev
>   */
>  public class ToBag extends EvalFunc<DataBag> {
>      public BagFactory bagFactory;
>      public TupleFactory tupleFactory;
> 
>      public ToBag() {
>          bagFactory = BagFactory.getInstance();
>          tupleFactory = TupleFactory.getInstance();
>      }
> 
>      @Override
>      public DataBag exec(Tuple input) throws IOException {
>          // No input at all: nothing to convert.
>          if (input == null || input.isNull())
>              return null;
>          final DataBag bag = bagFactory.newDefaultBag();
>          // Field 0 is the number of fields per output tuple.
>          final Integer counter = (Integer) input.get(0);
>          // A zero count would make the modulo below throw.
>          if (counter == null || counter <= 0)
>              return null;
>          Tuple tuple = tupleFactory.newTuple();
>          for (int i = 0; i < input.size() - 1; i++) {
>              // Start a new tuple every `counter` fields.
>              if (i % counter == 0) {
>                  tuple = tupleFactory.newTuple();
>                  bag.add(tuple);
>              }
>              tuple.append(input.get(i + 1));
>          }
>          return bag;
>      }
>  }
> 
>  import org.apache.pig.ExecType;
>  import org.apache.pig.PigServer;
>  import org.junit.Before;
>  import org.junit.Test;
> 
>  import java.io.IOException;
>  import java.net.URISyntaxException;
>  import java.net.URL;
> 
>  import static org.junit.Assert.assertTrue;
> 
>  /**
>  * @author ast
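
The chunking rule that the quoted ToBag UDF describes (first field is a count, the remaining fields are split into tuples of that size) can be tried without a Pig installation. The following is a minimal sketch under stated assumptions: ToBagSketch and chunk are made-up names, plain Java lists stand in for Pig's Tuple and DataBag, and returning null for a non-positive count is a choice made here for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone illustration of the chunking rule; not the Pig UDF itself. */
class ToBagSketch {
    /**
     * Splits fields into rows of `count` elements each; the last row may be
     * shorter. Returns null for a null or non-positive count (an assumption
     * of this sketch).
     */
    static List<List<Object>> chunk(Integer count, List<?> fields) {
        if (count == null || count <= 0) {
            return null;
        }
        List<List<Object>> bag = new ArrayList<>();
        List<Object> row = null;
        for (int i = 0; i < fields.size(); i++) {
            // Start a new row every `count` fields.
            if (i % count == 0) {
                row = new ArrayList<>();
                bag.add(row);
            }
            row.add(fields.get(i));
        }
        return bag;
    }

    public static void main(String[] args) {
        // Five fields in rows of two: the final row holds the single leftover field.
        System.out.println(chunk(2, Arrays.asList("a", "b", "c", "d", "e")));
        // prints [[a, b], [c, d], [e]]
    }
}
```

Note that a trailing partial row is kept rather than dropped, which matches the loop in the quoted code: a new tuple is added to the bag as soon as its first field arrives.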

[jira] Commented: (PIG-1384) Adding contrib javadoc to main Pig javadoc

2010-04-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859181#action_12859181
 ] 

Hadoop QA commented on PIG-1384:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12442360/PIG-1384-0.7.patch
  against trunk revision 935968.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/296/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/296/console

This message is automatically generated.

> Adding contrib javadoc to main Pig javadoc
> --
>
> Key: PIG-1384
> URL: https://issues.apache.org/jira/browse/PIG-1384
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.7.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1384-0.7.patch, PIG-1384-trunk.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

2010-04-20 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859159#action_12859159
 ] 

Ashutosh Chauhan commented on PIG-798:
--

Viraj,

I am confused with this description. It seems to me that you are first storing 
some data using BinStorage and then loading it using PigStorage. If that is so, 
obviously it will not work. PigStorage and BinStorage aren't interoperable in 
this way. Specifically, data stored using BinStorage, can only be loaded using 
BinStorage.

> Schema errors when using PigStorage and none when using BinStorage in 
> FOREACH??
> ---
>
> Key: PIG-798
> URL: https://issues.apache.org/jira/browse/PIG-798
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Viraj Bhat
> Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab separated text file, which I load using 
> PigStorage() and store using BinStorage()
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, 
> url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> ===
> (Amy)
> (Fred)
> ===
> The above code works properly and returns the right results. If I use 
> PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> ===
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other 
> Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> ===
> So why should the semantics of BinStorage() be different from PigStorage() 
> where it is ok not to specify a schema? Should it not be consistent across 
> both?


[jira] Commented: (PIG-1341) BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED

2010-04-20 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859157#action_12859157
 ] 

Ashutosh Chauhan commented on PIG-1341:
---

I think BinStorage is an internal way of moving data around in Pig and it 
should be treated that way. I think we should discourage users from relying on it. 
Otherwise, we need to add capabilities such as the one requested here. An important 
consequence of making such a change is that we can't then swap out BinStorage for 
other storage mechanisms. If Avro (or protobuf or whatever) proved to be a 
better replacement for BinStorage, we couldn't just swap it in place of 
BinStorage unless we added to it all the capabilities that BinStorage has. 
Therefore, I suggest keeping the capabilities of BinStorage minimal.

> BinStorage cannot convert DataByteArray to Chararray and results in 
> FIELD_DISCARDED_TYPE_CONVERSION_FAILED
> --
>
> Key: PIG-1341
> URL: https://issues.apache.org/jira/browse/PIG-1341
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Richard Ding
> Attachments: PIG-1341.patch
>
>
> Script reads in BinStorage data and tries to convert a column which is in 
> DataByteArray to Chararray. 
> {code}
> raw = load 'sampledata' using BinStorage() as (col1,col2, col3);
> --filter out null columns
> A = filter raw by col1#'bcookie' is not null;
> B = foreach A generate col1#'bcookie'  as reqcolumn;
> describe B;
> --B: {reqcolumn: bytearray}
> X = limit B 5;
> dump X;
> B = foreach A generate (chararray)col1#'bcookie'  as convertedcol;
> describe B;
> --B: {convertedcol: chararray}
> X = limit B 5;
> dump X;
> {code}
> The first dump produces:
> (36co9b55onr8s)
> (36co9b55onr8s)
> (36hilul5oo1q1)
> (36hilul5oo1q1)
> (36l4cj15ooa8a)
> The second dump produces:
> ()
> ()
> ()
> ()
> ()
> It also throws an error message: FIELD_DISCARDED_TYPE_CONVERSION_FAILED 5 
> time(s).
> Viraj


[jira] Commented: (PIG-1339) International characters in column names not supported

2010-04-20 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859152#action_12859152
 ] 

Ashutosh Chauhan commented on PIG-1339:
---

This is not reproducible on trunk. I get the expected output. Viraj, can you 
please verify whether it works for you on trunk?

> International characters in column names not supported
> --
>
> Key: PIG-1339
> URL: https://issues.apache.org/jira/browse/PIG-1339
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>
> There is a particular use-case in which someone specifies a column name to be 
> in International characters.
> {code}
> inputdata = load '/user/viraj/inputdata.txt' using PigStorage() as (あいうえお);
> describe inputdata;
> dump inputdata;
> {code}
> ==
> Pig Stack Trace
> ---
> ERROR 1000: Error during parsing. Lexical error at line 1, column 64.  
> Encountered: "\u3042" (12354), after : ""
> org.apache.pig.impl.logicalLayer.parser.TokenMgrError: Lexical error at line 
> 1, column 64.  Encountered: "\u3042" (12354), after : ""
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1791)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_scan_token(QueryParser.java:8959)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_51(QueryParser.java:7462)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_120(QueryParser.java:7769)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_106(QueryParser.java:7787)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_63(QueryParser.java:8609)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3R_32(QueryParser.java:8621)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_3_4(QueryParser.java:8354)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_2_4(QueryParser.java:6903)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1249)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
> at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:391)
> ==
> Thanks Viraj
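
The lexical error in the trace reports "\u3042" (12354), which is the first character of the column name. The sketch below shows, in plain Java, how an ASCII-only identifier rule rejects such a name while a Unicode-aware rule accepts it. Both rules are hypothetical and written only for this illustration; they are not taken from Pig's QueryParser grammar, and the class and method names are made up.

```java
import java.util.regex.Pattern;

/** Illustration only: two hypothetical identifier rules, neither taken from Pig. */
class IdentifierCheck {
    // Hypothetical ASCII-only rule: a letter followed by letters, digits, or underscores.
    static final Pattern ASCII_ID = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    static boolean isAsciiIdentifier(String s) {
        return ASCII_ID.matcher(s).matches();
    }

    // Hypothetical Unicode-aware rule: same shape, but any Unicode letter or digit is allowed.
    static boolean isUnicodeIdentifier(String s) {
        if (s.isEmpty() || !Character.isLetter(s.codePointAt(0))) {
            return false;
        }
        return s.codePoints().allMatch(cp -> Character.isLetterOrDigit(cp) || cp == '_');
    }

    public static void main(String[] args) {
        // The column name from the report, written with escapes to avoid source-encoding issues.
        String name = "\u3042\u3044\u3046\u3048\u304A";
        System.out.println(name.codePointAt(0));       // prints 12354
        System.out.println(isAsciiIdentifier(name));   // prints false
        System.out.println(isUnicodeIdentifier(name)); // prints true
    }
}
```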


[jira] Commented: (PIG-1383) Remove empty svn directorirs from source tree

2010-04-20 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859099#action_12859099
 ] 

Daniel Dai commented on PIG-1383:
-

+1, go ahead and remove them.

> Remove empty svn directorirs from source tree
> -
>
> Key: PIG-1383
> URL: https://issues.apache.org/jira/browse/PIG-1383
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Richard Ding
> Fix For: 0.8.0
>
>
> Directories such as src/org/apache/pig/backend/local/ and its sub directories 
> are empty and should be removed from the svn repository.


[jira] Updated: (PIG-1384) Adding contrib javadoc to main Pig javadoc

2010-04-20 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1384:


Status: Patch Available  (was: Open)

Hudson may cry loud since there are so many new javadoc warnings, especially in 
piggybank.

> Adding contrib javadoc to main Pig javadoc
> --
>
> Key: PIG-1384
> URL: https://issues.apache.org/jira/browse/PIG-1384
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.7.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1384-0.7.patch, PIG-1384-trunk.patch
>
>



[jira] Updated: (PIG-1384) Adding contrib javadoc to main Pig javadoc

2010-04-20 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1384:


Attachment: PIG-1384-trunk.patch
PIG-1384-0.7.patch

> Adding contrib javadoc to main Pig javadoc
> --
>
> Key: PIG-1384
> URL: https://issues.apache.org/jira/browse/PIG-1384
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.7.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1384-0.7.patch, PIG-1384-trunk.patch
>
>



[jira] Created: (PIG-1384) Adding contrib javadoc to main Pig javadoc

2010-04-20 Thread Daniel Dai (JIRA)

Adding contrib javadoc to main Pig javadoc
--

 Key: PIG-1384
 URL: https://issues.apache.org/jira/browse/PIG-1384
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0





[jira] Created: (PIG-1383) Remove empty svn directorirs from source tree

2010-04-20 Thread Richard Ding (JIRA)

Remove empty svn directorirs from source tree
-

 Key: PIG-1383
 URL: https://issues.apache.org/jira/browse/PIG-1383
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
 Fix For: 0.8.0


Directories such as src/org/apache/pig/backend/local/ and its sub directories 
are empty and should be removed from the svn repository.


[jira] Created: (PIG-1382) Command line option -c doesn't work

2010-04-20 Thread Richard Ding (JIRA)

Command line option -c doesn't work
---

 Key: PIG-1382
 URL: https://issues.apache.org/jira/browse/PIG-1382
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Richard Ding
 Fix For: 0.8.0


Currently this option is not used, but it's documented:

"-c, -cluster clustername, kryptonite is default"

We should either remove it from the documentation or find some way to use it.


[jira] Updated: (PIG-1375) [Zebra] To support writing multiple Zebra tables through Pig

2010-04-20 Thread Yan Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1375:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to the trunk.

> [Zebra] To support writing multiple Zebra tables through Pig
> 
>
> Key: PIG-1375
> URL: https://issues.apache.org/jira/browse/PIG-1375
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.8.0
>
> Attachments: PIG-1375.patch, PIG-1375.patch, PIG-1375.patch
>
>
> In Zebra, we already have multiple outputs support for map/reduce.  But we do 
> not support this feature if users use Zebra through Pig.
> This jira is to address this issue. We plan to support writing to multiple 
> output tables through Pig as well.
> We propose to support the following Pig store statements with multiple 
> outputs:
> store relation into 'loc1,loc2,loc3' using 
> org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
> 'complete name of your custom partition class', 'some arguments to partition 
> class'); /* if certain partition class arguments are needed */
> store relation into 'loc1,loc2,loc3' using 
> org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
> 'complete name of your custom partition class'); /* if no partition class 
> arguments are needed */
> Note that users need to specify up to three arguments - storage hint string, 
> complete name of partition class and partition class arguments string.


[jira] Commented: (PIG-1375) [Zebra] To support writing multiple Zebra tables through Pig

2010-04-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858754#action_12858754
 ] 

Hadoop QA commented on PIG-1375:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12442232/PIG-1375.patch
  against trunk revision 935101.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/295/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/295/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/295/console

This message is automatically generated.

> [Zebra] To support writing multiple Zebra tables through Pig
> 
>
> Key: PIG-1375
> URL: https://issues.apache.org/jira/browse/PIG-1375
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.8.0
>
> Attachments: PIG-1375.patch, PIG-1375.patch, PIG-1375.patch
>
>
> In Zebra, we already have multiple outputs support for map/reduce.  But we do 
> not support this feature if users use Zebra through Pig.
> This jira is to address this issue. We plan to support writing to multiple 
> output tables through Pig as well.
> We propose to support the following Pig store statements with multiple 
> outputs:
> store relation into 'loc1,loc2,loc3' using 
> org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
> 'complete name of your custom partition class', 'some arguments to partition 
> class'); /* if certain partition class arguments are needed */
> store relation into 'loc1,loc2,loc3' using 
> org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
> 'complete name of your custom partition class'); /* if no partition class 
> arguments are needed */
> Note that users need to specify up to three arguments - storage hint string, 
> complete name of partition class and partition class arguments string.

