Re: Pig 0.11

2012-10-10 Thread Bill Graham
+1 for me.

There's https://issues.apache.org/jira/browse/PIG-2756 which tracks a few
documentation issues that should block Pig 0.11, but they can also be done
on the trunk and merged to the branch. Gianmarco, you can add a rank
subtask there to serve as a reminder.


On Wed, Oct 10, 2012 at 11:03 PM, Gianmarco De Francisci Morales <
g...@apache.org> wrote:

> We are missing some documentation on the RANK but I guess we could add that
> to the branch and trunk in parallel.
> All the patches I was keeping an eye on are in.
>
> So +1 for me.
> --
> Gianmarco
>
>
>
> On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney wrote:
>
> > I think all of the major patches are in, no? Now it's just bug testing?
> > Just wanted to touch base on where we are at with this.
> >
>



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgra...@gmail.com going forward.*


Re: Pig 0.11

2012-10-10 Thread Gianmarco De Francisci Morales
We are missing some documentation on the RANK but I guess we could add that
to the branch and trunk in parallel.
All the patches I was keeping an eye on are in.

So +1 for me.
--
Gianmarco



On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney wrote:

> I think all of the major patches are in, no? Now it's just bug testing?
> Just wanted to touch base on where we are at with this.
>


[jira] Subscription: PIG patch available

2012-10-10 Thread jira
Issue Subscription
Filter: PIG patch available (37 issues)

Subscriber: pigdaily

Key         Summary
PIG-2963    Illustrate command and POPackageLite
            https://issues.apache.org/jira/browse/PIG-2963
PIG-2960    Increase the timeout for unit test
            https://issues.apache.org/jira/browse/PIG-2960
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2958    Pig tests do not appear to have a logger attached
            https://issues.apache.org/jira/browse/PIG-2958
PIG-2957    TetsScriptUDF fail due to volume prefix in jar
            https://issues.apache.org/jira/browse/PIG-2957
PIG-2956    Invalid cache specification for some streaming statement
            https://issues.apache.org/jira/browse/PIG-2956
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2954    TestParamSubPreproc still depends on "bash" to run
            https://issues.apache.org/jira/browse/PIG-2954
PIG-2953    "which" utility does not exist on Windows
            https://issues.apache.org/jira/browse/PIG-2953
PIG-2943    DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health
            https://issues.apache.org/jira/browse/PIG-2943
PIG-2942    DevTests, TestLoad has a false failure on Windows
            https://issues.apache.org/jira/browse/PIG-2942
PIG-2940    HBaseStorage store fails in secure cluster
            https://issues.apache.org/jira/browse/PIG-2940
PIG-2931    $ signs in the replacement string make parameter substitution fail
            https://issues.apache.org/jira/browse/PIG-2931
PIG-2928    Fix e2e test failures in trunk: FilterBoolean_23/24
            https://issues.apache.org/jira/browse/PIG-2928
PIG-2925    Extremely long JobConf values should not be added to Streaming environment
            https://issues.apache.org/jira/browse/PIG-2925
PIG-2908    Fix unit tests to work with jdk7
            https://issues.apache.org/jira/browse/PIG-2908
PIG-2898    Parallel execution of e2e tests
            https://issues.apache.org/jira/browse/PIG-2898
PIG-2881    Add SUBTRACT eval function
            https://issues.apache.org/jira/browse/PIG-2881
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2824    Pushing checking number of fields into LoadFunc
            https://issues.apache.org/jira/browse/PIG-2824
PIG-2801    grunt "sh" command should invoke the shell implicitly instead of calling exec directly with the command tokens
            https://issues.apache.org/jira/browse/PIG-2801
PIG-2798    pig streaming tests assume interpreters are auto-resolved
            https://issues.apache.org/jira/browse/PIG-2798
PIG-2796    Local temporary paths are not always valid HDFS path names.
            https://issues.apache.org/jira/browse/PIG-2796
PIG-2795    Fix test cases that generate pig scripts with "load " + pathStr to encode "\" in the path
            https://issues.apache.org/jira/browse/PIG-2795
PIG-2794    Pig test: add utils to simplify testing on Windows
            https://issues.apache.org/jira/browse/PIG-2794
PIG-2778    Add 'matches' operator to predicate pushdown
            https://issues.apache.org/jira/browse/PIG-2778
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-2657    Print warning if using wrong jython version
            https://issues.apache.org/jira/browse/PIG-2657
PIG-2579    Support for multiple input schemas in AvroStorage
            https://issues.apache.org/jira/browse/PIG-2579
PIG-2495    Using merge JOIN from a HBaseStorage produces an error
            https://issues.apache.org/jira/browse/PIG-2495
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417
PIG-2405    svn tags/release-0.9.1: some unit test case failed with open JDK
            https://issues.apache.org/jira/browse/PIG-2405
PIG-2362    Rework Ant build.xml to use macrodef instead of antcall
            https://issues.apache.org/jira/browse/PIG-2362
PIG-2312    NPE when relation and column share the same name and used in Nested Foreach
            https://issues.apache.org/jira/browse/PIG-2312
PIG-1942    script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
            https://issues.apache.org/jira/browse/PIG-1942
PIG-1237    Piggybank MutliStorage - specify field to write in output
            https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


Re: PigServer API

2012-10-10 Thread Bill Graham
Ok, I'm sold. :)

On Wed, Oct 10, 2012 at 11:00 AM, Prashant Kommireddi
wrote:

> Thanks Bill.
>
> The rationale behind providing a List is that it simply provides a lot
> more methods than an iterator. You are right in saying one could do that in
> the caller code, but I have a feeling providing this helper in the API would be
> beneficial. For example, a framework that is used by clients could initiate
> several pig scripts/store commands at once. At the framework layer, you
> might want to be able to determine the number of MR jobs in total spawned
> by these multiple scripts and query stats on those. That's just one
> use-case, there could be more methods on List that a user could be
> interested in.
>
> -Prashant
>
>
> On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham wrote:
>
>> Hi Prashant,
>>
>> [Replying to the dev list to get others' take on these...]
>>
>> Just curious, why do you prefer a List of JobStats over the already
>> existing iterator? I hesitate to add one-liner methods if it's something
>> that can be a one-liner by the caller, unless the use case is very common.
>>
>> Making getSuccessfulJobs() and getFailedJobs() public seems reasonable to
>> me.
>>
>> I'm not sure about the rationale behind the differences between
>> registerScript and store(). Store() and registerQuery() are able to
>> manually add to the DAG as statements come in, but register script needs
>> parsing for execution. That's probably why execution is delegated to the
>> GruntParser. The resulting DAG for a single-store script should be the same
>> though. It seems like registerScript() should be able to return a list of
>> ExecJobs.
>>
>> thanks,
>> Bill
>>
>>
>> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi wrote:
>>
>>> Hi Bill,
>>>
>>> I am looking at PigStats and JobGraph, and am thinking of adding some
>>> functions. Let me know what you think.
>>>
>>> *getJobList()* returns a List representation of the iterator.
>>>
>>> public List<JobStats> getJobList() {
>>>     return IteratorUtils.toList(iterator());
>>> }
>>>
>>> What do you think about making getSuccessfulJobs() and getFailedJobs()
>>> public and exposing it to the API? Currently they are package-private?
>>>
>>> Had another question, seems like the execution flow for
>>> PigServer.registerScript/Query is different from PigServer.store(). Was
>>> there a reason to make these different? The function store() returns an
>>> ExecJob which is great to get info regarding the runs, but registerScript()
>>> calls the GruntParser for execution which I think is a different flow?
>>>
>>> Thanks,
>>> Prashant
>>>
>>>
>>> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham wrote:
>>>
>>>> Makes sense to me. We could return a PigStats object.
>>>>
>>>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi <
>>>> prash1...@gmail.com> wrote:
>>>>
>>>> > Hi All,
>>>> >
>>>> > I am looking at PigServer methods for running scripts/queries and it seems
>>>> > like currently their return type is void, which does not tell much about job
>>>> > completion.
>>>> >
>>>> > public void registerScript(InputStream in, Map<String,String> params,
>>>> >         List<String> paramsFiles) throws IOException {
>>>> >     try {
>>>> >         String substituted = doParamSubstitution(in, params,
>>>> >                 paramsFiles);
>>>> >         GruntParser grunt = new GruntParser(new
>>>> >                 StringReader(substituted));
>>>> >         grunt.setInteractive(false);
>>>> >         grunt.setParams(this);
>>>> >         grunt.parseStopOnError(true);
>>>> >     } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
>>>> >         log.error(e.getLocalizedMessage());
>>>> >         throw new IOException(e.getCause());
>>>> >     }
>>>> > }
>>>> >
>>>> >
>>>> > We do have a handle on number of jobs succeeded/failed as part of the job
>>>> > run, so that is something we should add as return type?
>>>> >
>>>> > Thanks,
>>>> > Prashant
>>>> >




>>>
>>>
>>
>>
>>
>
>


-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgra...@gmail.com going forward.*
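
For readers following the getJobList() proposal in this thread, the idea can be sketched without the commons-collections dependency roughly as follows. This is only an illustration under stated assumptions: JobStatsStub and JobGraphSketch are hypothetical stand-ins, not Pig's real JobStats or JobGraph classes, and the filtering helper merely shows the kind of List-based convenience the thread has in mind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Pig's JobStats; it carries only what the sketch needs.
final class JobStatsStub {
    final String jobId;
    final boolean successful;

    JobStatsStub(String jobId, boolean successful) {
        this.jobId = jobId;
        this.successful = successful;
    }
}

// Hypothetical stand-in for a job graph that already exposes an iterator.
final class JobGraphSketch implements Iterable<JobStatsStub> {
    private final List<JobStatsStub> jobs;

    JobGraphSketch(JobStatsStub... jobs) {
        this.jobs = Arrays.asList(jobs);
    }

    @Override
    public Iterator<JobStatsStub> iterator() {
        return jobs.iterator();
    }

    // The proposed helper: copy the iterator's elements into a new List.
    List<JobStatsStub> getJobList() {
        List<JobStatsStub> result = new ArrayList<JobStatsStub>();
        for (JobStatsStub js : this) {
            result.add(js);
        }
        return result;
    }

    // With a List in hand, filtering by outcome is a short loop.
    List<JobStatsStub> getFailedJobs() {
        List<JobStatsStub> failed = new ArrayList<JobStatsStub>();
        for (JobStatsStub js : getJobList()) {
            if (!js.successful) {
                failed.add(js);
            }
        }
        return failed;
    }
}
```

A framework layer could then call getJobList().size() to count the jobs spawned by a script, which is the use case Prashant describes.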


[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

2012-10-10 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2579:
---

Attachment: PIG-2579-6.patch

Updating the patch.

@Santhosh,
Can you please also remove the following files when committing the patch? They 
are no longer used by tests so should be deleted.
{code}
#   deleted:    contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro
#   deleted:    contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro
{code}

> Support for multiple input schemas in AvroStorage
> -
>
> Key: PIG-2579
> URL: https://issues.apache.org/jira/browse/PIG-2579
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: 0.9.2, 0.11
> Reporter: Stan Rosenberg
> Assignee: Cheolsoo Park
> Priority: Minor
> Attachments: avro_storage_union_schema.patch, 
> avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, 
> PIG-2579-2.patch, PIG-2579-3.patch, PIG-2579-4.patch, PIG-2579-5.patch, 
> PIG-2579-6.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple 
> input schemas.  The assumption is that the input consists of avro files 
> having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached 
> (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by 
> create_avro2.pig, followed by read_avro.pig.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

2012-10-10 Thread Cheolsoo Park


> On Oct. 10, 2012, 9:50 p.m., Santhosh Srinivasan wrote:
> > contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java,
> >  line 102
> > 
> >
> > Can you replace add with set since you are already initializing the 
> > capacity of the array list?

No, I can't. Initializing the capacity does not change the size of the list, 
which stays 0 until I add entries to it. set() throws an 
IndexOutOfBoundsException if (index < 0 || index >= size()), which is always 
true for any non-negative index while size() == 0.

http://docs.oracle.com/javase/6/docs/api/java/util/ArrayList.html#set(int, E)
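
The capacity-versus-size point can be checked with a small standalone sketch. The class and method names below are made up for illustration and have nothing to do with PigAvroRecordReader; only the java.util.ArrayList behavior is what is being demonstrated.

```java
import java.util.ArrayList;
import java.util.List;

final class CapacityVsSize {
    // Returns true if set(0, ...) on a freshly created list with the given
    // initial capacity throws IndexOutOfBoundsException.
    static boolean setFailsOnEmptyList(int initialCapacity) {
        List<String> list = new ArrayList<String>(initialCapacity);
        try {
            list.set(0, "x"); // size() is still 0, so index 0 is out of range
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // add() grows the size; after that, set() on an existing index works.
    static String addThenSet() {
        List<String> list = new ArrayList<String>(4);
        list.add("first");
        list.set(0, "replaced");
        return list.get(0);
    }
}
```

The initial capacity only pre-sizes the backing array, so add() is still needed to create each element before set() can replace it.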


- Cheolsoo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/#review12319
---


On Oct. 10, 2012, 11:15 p.m., Cheolsoo Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6884/
> ---
> 
> (Updated Oct. 10, 2012, 11:15 p.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> ---
> 
> Add support for multiple avro schemas to AvroStorage. This patch is based on 
> Stan Rosenberg's original work.
> 
> Please see https://issues.apache.org/jira/browse/PIG-2579 for details
> 
> 
> This addresses bug PIG-2579.
> https://issues.apache.org/jira/browse/PIG-2579
> 
> 
> Diffs
> -
> 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
>  d7a004f 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
>  84280af 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java
>  fb5cc25 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
>  75057f9 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
>  1f6e581 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
>  0761d5a 
> 
> Diff: https://reviews.apache.org/r/6884/diff/
> 
> 
> Testing
> ---
> 
> New unit tests are added:
> - TestAvroStorageUtils.testMergeSchema
> - TestAvroStorage.testMultipleSchemas1,2
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>



Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

2012-10-10 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/
---

(Updated Oct. 10, 2012, 11:15 p.m.)


Review request for pig and Santhosh Srinivasan.


Changes
---

Incorporate Santhosh's comments.


Description
---

Add support for multiple avro schemas to AvroStorage. This patch is based on 
Stan Rosenberg's original work.

Please see https://issues.apache.org/jira/browse/PIG-2579 for details


This addresses bug PIG-2579.
https://issues.apache.org/jira/browse/PIG-2579


Diffs (updated)
-

  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
 d7a004f 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
 84280af 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java
 fb5cc25 
  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
 75057f9 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
 1f6e581 
  
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
 0761d5a 

Diff: https://reviews.apache.org/r/6884/diff/


Testing
---

New unit tests are added:
- TestAvroStorageUtils.testMergeSchema
- TestAvroStorage.testMultipleSchemas1,2


Thanks,

Cheolsoo Park



[jira] [Commented] (PIG-2951) Overflow, Underflow errors

2012-10-10 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473645#comment-13473645
 ] 

Jonathan Coveney commented on PIG-2951:
---

+1 to the idea of incrementing a counter

And if you need to work with huge numbers where precision matters, +1 the 
BigInteger/BigDecimal patch :)

> Overflow, Underflow errors
> --
>
> Key: PIG-2951
> URL: https://issues.apache.org/jira/browse/PIG-2951
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.0.0, 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 
> 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.9.2, 0.10.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Attachments: pig-2951.patch
>
>
> With very large (or very small) integer values there is a possibility of 
> overflow (or underflow) errors. Worse, instead of failing, this currently 
> produces incorrect results, leaving the user with no clue that some of the 
> tuples may have wrong values.
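
As background for the issue description, the following sketch shows how plain Java integer arithmetic wraps silently, how an overflow can be detected, and how BigInteger avoids the problem. It illustrates Java semantics only and is not taken from pig-2951.patch; the class and method names are hypothetical, and Math.addExact assumes Java 8 or later.

```java
import java.math.BigInteger;

final class OverflowSketch {
    // Plain int addition wraps around silently on overflow.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Math.addExact (Java 8+) throws ArithmeticException instead of wrapping,
    // which is one place where a warning counter could be incremented.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // BigInteger has arbitrary precision, so the sum is exact.
    static BigInteger exactAdd(int a, int b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }
}
```

For example, wrappingAdd(Integer.MAX_VALUE, 1) yields Integer.MIN_VALUE with no error, while exactAdd returns the mathematically correct 2147483648.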

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2963) Illustrate command and POPackageLite

2012-10-10 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2963:
---

Assignee: Cheolsoo Park
  Status: Patch Available  (was: Open)

> Illustrate command and POPackageLite
> 
>
> Key: PIG-2963
> URL: https://issues.apache.org/jira/browse/PIG-2963
> Project: Pig
> Issue Type: Bug
> Reporter: Allan Avendaño
> Assignee: Cheolsoo Park
> Priority: Critical
> Fix For: 0.11
>
> Attachments: PIG-2963.patch
>
>
> While trying to execute a simple script like:
> A = LOAD 'test01' AS (f1:chararray,f2:int,f3:chararray);
> B = order A by f1;
> illustrate B;
> or 
> C = foreach B generate f1, f2; 
> illustrate C;
> I got the following exception:
> java.lang.RuntimeException: ReadOnceBag does not support getMemorySize 
> operation
>   at org.apache.pig.data.ReadOnceBag.getMemorySize(ReadOnceBag.java:74)
>   at org.apache.pig.data.SizeUtil.getPigObjMemSize(SizeUtil.java:61)
>   at org.apache.pig.data.DefaultTuple.getMemorySize(DefaultTuple.java:180)
>   at 
> org.apache.pig.pen.util.ExampleTuple.getMemorySize(ExampleTuple.java:97)
>   at 
> org.apache.pig.data.DefaultAbstractBag.getMemorySize(DefaultAbstractBag.java:148)
>   at 
> org.apache.pig.data.DefaultAbstractBag.markSpillableIfNecessary(DefaultAbstractBag.java:100)
>   at 
> org.apache.pig.data.DefaultAbstractBag.add(DefaultAbstractBag.java:92)
>   at org.apache.pig.pen.Illustrator.addData(Illustrator.java:116)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackageLite.illustratorMarkup(POPackageLite.java:227)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackageLite.getNext(POPackageLite.java:182)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:422)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>   at 
> org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:235)
>   at 
> org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257)
>   at 
> org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:238)
>   at 
> org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:103)
>   at 
> org.apache.pig.pen.LineageTrimmingVisitor.<init>(LineageTrimmingVisitor.java:98)
>   at 
> org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:166)
>   at org.apache.pig.PigServer.getExamples(PigServer.java:1180)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:738)
>   at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:626)
>   at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:323)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
>   at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>   at org.apache.pig.Main.run(Main.java:538)
>   at org.apache.pig.Main.main(Main.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> ==
> At log file, the following:
> Pig Stack Trace
> ---
> ERROR 2997: Encountered IOException. Exception
> java.io.IOException: Exception
> at org.apache.pig.PigServer.getExamples(PigServer.java:1186)
> at 
> org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:738)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:626)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:323)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> at org.apache.pig.Mai

[jira] [Updated] (PIG-2963) Illustrate command and POPackageLite

2012-10-10 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2963:
---

Attachment: PIG-2963.patch

This is a regression from PIG-2923. I verified that the error goes away when 
PIG-2923 is reverted.

What's happening is that illustrate adds to DefaultAbstractBag a tuple that has 
a ReadOnceBag field.

The problem is that PIG-2923 made DefaultAbstractBag check whether it should 
spill to disk every time a new element is added to it. To compute the memory 
size of the new element, DefaultAbstractBag iterates through every field of the 
tuple, one of which in this case is a ReadOnceBag. Unfortunately, ReadOnceBag 
doesn't support getMemorySize() and throws a runtime exception.

I am attaching a simple fix that makes ReadOnceBag.getMemorySize() return 0 
instead of throwing a runtime exception. I am returning 0 here because the 
comment in ReadOnceBag says "this bag does not store the tuples in memory".

I am not familiar with ReadOnceBag, so please correct me if this is not a 
proper fix.

Thanks!
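
The failure mode and the proposed fix can be modeled in isolation roughly as below. This is a simplified sketch, not the contents of PIG-2963.patch: SizedField, ThrowingReadOnceField, ZeroSizeReadOnceField and SpillCheckingBag are hypothetical names that only mimic the interaction between a bag that estimates memory on every add() and a field that either throws or reports 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal interface; Pig's real bag and tuple types have many more methods.
interface SizedField {
    long getMemorySize();
}

// Mimics the old behavior: size estimation is unsupported and throws.
final class ThrowingReadOnceField implements SizedField {
    @Override
    public long getMemorySize() {
        throw new RuntimeException("ReadOnceBag does not support getMemorySize operation");
    }
}

// Mimics the proposed fix: report 0 because nothing is held in memory.
final class ZeroSizeReadOnceField implements SizedField {
    @Override
    public long getMemorySize() {
        return 0L;
    }
}

// Mimics a bag that sums field sizes on every add() to decide whether to spill.
final class SpillCheckingBag {
    private final List<SizedField> fields = new ArrayList<SizedField>();
    private long estimatedSize = 0L;

    void add(SizedField field) {
        estimatedSize += field.getMemorySize(); // propagates any exception from the field
        fields.add(field);
    }

    long estimatedSize() {
        return estimatedSize;
    }

    int count() {
        return fields.size();
    }
}
```

Adding a ZeroSizeReadOnceField succeeds and leaves the estimate unchanged, whereas adding a ThrowingReadOnceField aborts the add(), which corresponds to the exception seen in the illustrate stack trace.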

> Illustrate command and POPackageLite
> 
>
> Key: PIG-2963
> URL: https://issues.apache.org/jira/browse/PIG-2963
> Project: Pig
> Issue Type: Bug
> Reporter: Allan Avendaño
> Priority: Critical
> Fix For: 0.11
>
> Attachments: PIG-2963.patch
>
>
> While trying to execute a simple script like:
> A = LOAD 'test01' AS (f1:chararray,f2:int,f3:chararray);
> B = order A by f1;
> illustrate B;
> or 
> C = foreach B generate f1, f2; 
> illustrate C;
> I got the following exception:
> java.lang.RuntimeException: ReadOnceBag does not support getMemorySize 
> operation
>   at org.apache.pig.data.ReadOnceBag.getMemorySize(ReadOnceBag.java:74)
>   at org.apache.pig.data.SizeUtil.getPigObjMemSize(SizeUtil.java:61)
>   at org.apache.pig.data.DefaultTuple.getMemorySize(DefaultTuple.java:180)
>   at 
> org.apache.pig.pen.util.ExampleTuple.getMemorySize(ExampleTuple.java:97)
>   at 
> org.apache.pig.data.DefaultAbstractBag.getMemorySize(DefaultAbstractBag.java:148)
>   at 
> org.apache.pig.data.DefaultAbstractBag.markSpillableIfNecessary(DefaultAbstractBag.java:100)
>   at 
> org.apache.pig.data.DefaultAbstractBag.add(DefaultAbstractBag.java:92)
>   at org.apache.pig.pen.Illustrator.addData(Illustrator.java:116)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackageLite.illustratorMarkup(POPackageLite.java:227)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackageLite.getNext(POPackageLite.java:182)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:422)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>   at 
> org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:235)
>   at 
> org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257)
>   at 
> org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:238)
>   at 
> org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:103)
>   at 
> org.apache.pig.pen.LineageTrimmingVisitor.<init>(LineageTrimmingVisitor.java:98)
>   at 
> org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:166)
>   at org.apache.pig.PigServer.getExamples(PigServer.java:1180)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:738)
>   at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:626)
>   at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:323)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
>   at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>   at org.apache.pig.Main.run(Main.java:538)
>   at org.apache.pig.Main.main(Main.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> 

Re: Review Request: PIG-2579 Support for multiple input schemas in AvroStorage

2012-10-10 Thread Santhosh Srinivasan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6884/#review12319
---


Couple of comments. Otherwise, the patch looks good.


contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java


Can you replace add with set since you are already initializing the 
capacity of the array list?



contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java


Can you add a message for the failure?


- Santhosh Srinivasan


On Sept. 28, 2012, 8:44 p.m., Cheolsoo Park wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6884/
> ---
> 
> (Updated Sept. 28, 2012, 8:44 p.m.)
> 
> 
> Review request for pig and Santhosh Srinivasan.
> 
> 
> Description
> ---
> 
> Add support for multiple avro schemas to AvroStorage. This patch is based on 
> Stan Rosenberg's original work.
> 
> Please see https://issues.apache.org/jira/browse/PIG-2579 for details
> 
> 
> This addresses bug PIG-2579.
> https://issues.apache.org/jira/browse/PIG-2579
> 
> 
> Diffs
> -
> 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
>  d7a004f 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
>  84280af 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java
>  fb5cc25 
>   
> contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
>  75057f9 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
>  1f6e581 
>   
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorageUtils.java
>  0761d5a 
> 
> Diff: https://reviews.apache.org/r/6884/diff/
> 
> 
> Testing
> ---
> 
> New unit tests are added:
> - TestAvroStorageUtils.testMergeSchema
> - TestAvroStorage.testMultipleSchemas1,2
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>



[jira] [Assigned] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-10 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman reassigned PIG-2910:


Assignee: Thejas M Nair  (was: Eli Reisman)

sorry wrong button!

> Make toString() methods on Schema and FieldSchema be readable by 
> Utils.getSchemaFromString()
> 
>
> Key: PIG-2910
> URL: https://issues.apache.org/jira/browse/PIG-2910
> Project: Pig
> Issue Type: Bug
> Components: impl, parser
> Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
> Reporter: Russell Jurney
> Assignee: Thejas M Nair
> Labels: newbie
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
> PIG-2910-4.patch
>
>
> I want to toString() schemas and send them to the backend via UDFContext. At 
> the moment this requires writing your own toString() method that 
> Utils.getSchemaFromString() can read. Making a readable schema for the 
> backend would be an improvement.
> I spoke with Thejas, who believes this is a bug. The workaround for the 
> moment is, for example:
> String schemaString = inputSchema.toString().substring(1, 
> inputSchema.toString().length() - 1);
> // Set the input schema for processing
> UDFContext context = UDFContext.getUDFContext();
> Properties udfProp = context.getUDFProperties(this.getClass());
> udfProp.setProperty("horton.json.udf.schema", schemaString);
> ...
> schema = Utils.getSchemaFromString(strSchema);
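
The substring workaround quoted above can be tried in isolation with the sketch below. It assumes, without confirmation from this thread, that the toString() output is wrapped in one enclosing pair of braces, for example "{name: chararray,age: int}"; java.util.Properties stands in for the UDFContext properties, and the class and method names are hypothetical.

```java
import java.util.Properties;

final class SchemaStringSketch {
    // Drops the first and last character, e.g. an enclosing "{" and "}".
    static String stripEnclosing(String schemaToString) {
        return schemaToString.substring(1, schemaToString.length() - 1);
    }

    // Stores the stripped schema under a property key and reads it back,
    // standing in for the UDFContext round trip described in the issue.
    static String roundTrip(String schemaToString, String key) {
        Properties props = new Properties();
        props.setProperty(key, stripEnclosing(schemaToString));
        return props.getProperty(key);
    }
}
```

With the assumed input "{name: chararray,age: int}", the stored value is "name: chararray,age: int", which is the form the workaround then hands to Utils.getSchemaFromString().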

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-10 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated PIG-2910:
-

Attachment: PIG-2910-4.patch

How's this? While writing a test, I found one in TestSchema that seems built 
for this purpose (see patch), so I modified it. This passes 'ant compile-test'.

Thanks for the advice!


> Make toString() methods on Schema and FieldSchema be readable by 
> Utils.getSchemaFromString()
> 
>
> Key: PIG-2910
> URL: https://issues.apache.org/jira/browse/PIG-2910
> Project: Pig
> Issue Type: Bug
> Components: impl, parser
> Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
> Reporter: Russell Jurney
> Assignee: Thejas M Nair
> Labels: newbie
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2910-1.patch, PIG-2910-2.patch, PIG-2910-3.patch, 
> PIG-2910-4.patch
>
>
> I want to toString() schemas and send them to the backend via UDFContext. At 
> the moment this requires writing your own toString() method that 
> Utils.getSchemaFromString() can read. Making a readable schema for the 
> backend would be an improvement.
> I spoke with Thejas, who believes this is a bug. The workaround for the 
> moment is, for example:
> String schemaString = inputSchema.toString().substring(1, 
> inputSchema.toString().length() - 1);
> // Set the input schema for processing
> UDFContext context = UDFContext.getUDFContext();
> Properties udfProp = context.getUDFProperties(this.getClass());
> udfProp.setProperty("horton.json.udf.schema", schemaString);
> ...
> schema = Utils.getSchemaFromString(strSchema);



[jira] [Assigned] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-10 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman reassigned PIG-2910:


Assignee: Eli Reisman  (was: Thejas M Nair)




[jira] [Commented] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473535#comment-13473535
 ] 

Thejas M Nair commented on PIG-2910:


Yes, the changes in the 2910-3 patch look good. Can you please add a test case, 
and also add a comment that this schema string has "{}" around it, and that is 
why the substring is being done?
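
The brace handling requested above can be sketched without any Pig dependency. This is only an illustration under assumptions: `stripOuterBraces` is a hypothetical helper name, and the sample string merely imitates a schema printed with "{}" around it; neither is taken from the patch.

```java
final class SchemaStringSketch {
    // Hypothetical helper, not part of Pig. The printed schema string has
    // "{}" around it, so the outer pair is removed before the string is
    // handed back to a parser such as Utils.getSchemaFromString().
    static String stripOuterBraces(String schemaString) {
        if (schemaString.length() >= 2
                && schemaString.startsWith("{")
                && schemaString.endsWith("}")) {
            return schemaString.substring(1, schemaString.length() - 1);
        }
        // Nothing to strip: return the input unchanged.
        return schemaString;
    }

    public static void main(String[] args) {
        // Illustrative value only; the real toString() output may differ.
        String printed = "{name: chararray,age: int}";
        System.out.println(stripOuterBraces(printed));
    }
}
```

Guarding on both braces keeps the helper safe for strings that are already bare, which the unconditional substring(1, length - 1) in the workaround is not.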





[jira] [Updated] (PIG-2910) Make toString() methods on Schema and FieldSchema be readable by Utils.getSchemaFromString()

2012-10-10 Thread Eli Reisman (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Reisman updated PIG-2910:
-

Attachment: PIG-2910-3.patch

That makes a lot of sense. Is 2910-3 patch a bit more like it then?




Re: ORDER BY illustrator

2012-10-10 Thread Russell Jurney
There were some upgrades to ILLUSTRATE for Pig 0.10, I think, but that
patch should get applied to 0.9.x too?

Russell Jurney twitter.com/rjurney


On Oct 10, 2012, at 7:53 AM, Allan  wrote:

> I already created the JIRA https://issues.apache.org/jira/browse/PIG-2963.
>
> Best regards,
>
>
> --
> Allan Avendaño S.
> --


Re: PigServer API

2012-10-10 Thread Prashant Kommireddi
Thanks Bill.

The rationale behind providing a List is that it simply offers a lot more
methods than an iterator. You are right that one could do that in the
caller code, but I have a feeling providing this helper in the API would be
beneficial. For example, a framework that is used by clients could initiate
several Pig scripts/store commands at once. At the framework layer, you
might want to determine the total number of MR jobs spawned by these
multiple scripts and query stats on those. That's just one use case; there
could be more methods on List that a user could be interested in.
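
A minimal sketch of the helper under discussion, written against plain java.util so it runs without Pig. The class name `JobListSketch` and the String job IDs are stand-ins (assumptions) for JobGraph and JobStats:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class JobListSketch {
    // Drains an iterator into a List so callers get size(), get(i),
    // contains() and the rest of the List API.
    static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Strings stand in for JobStats objects here.
        Iterator<String> jobs = Arrays.asList("job_1", "job_2", "job_3").iterator();
        List<String> jobList = toList(jobs);
        System.out.println(jobList.size() + " jobs, first = " + jobList.get(0));
    }
}
```

With a List in hand, a framework could sum the sizes across several scripts to get the total job count mentioned above.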

-Prashant

On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham  wrote:

> Hi Prashant,
>
> [Replying to the dev list to get others' take on these...]
>
> Just curious, why do you prefer a List of JobStats over the already
> existing iterator? I hesitate to add one-liner methods if it's something
> that can be a one-liner by the caller, unless the use case is very common.
>
> Making getSuccessfulJobs() and getFailedJobs() public seems reasonable to
> me.
>
> I'm not sure about the rationale behind the differences between
> registerScript and store(). Store() and registerQuery() are able to
> manually add to the DAG as statements come in, but register script needs
> parsing for execution. That's probably why execution is delegated to the
> GruntParser. The resulting DAG for a single-store script should be the same
> though. It seems like registerScript() should be able to return a list of
> ExecJobs.
>
> thanks,
> Bill
>
>
> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi 
> wrote:
>
>> Hi Bill,
>>
>> I am looking at PigStats and JobGraph, and am thinking of adding some
>> functions. Let me know what you think.
>>
>> *getJobList()* returns a List representation of the iterator.
>>
>> public List<JobStats> getJobList() {
>>     return IteratorUtils.toList(iterator());
>> }
>>
>> What do you think about making getSuccessfulJobs() and getFailedJobs()
>> public and exposing them in the API? Currently they are package-private.
>>
>> Had another question, seems like the execution flow for
>> PigServer.registerScript/Query is different from PigServer.store(). Was
>> there a reason to make these different? The function store() returns an
>> ExecJob which is great to get info regarding the runs, but registerScript()
>> calls the GruntParser for execution which I think is a different flow?
>>
>> Thanks,
>> Prashant
>>
>>
>> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham  wrote:
>>
>>> Makes sense to me. We could return a PigStats object.
>>>
>>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi >> >wrote:
>>>
>>> > Hi All,
>>> >
>>> > I am looking at PigServer methods for running scripts/queries and it
>>> > seems like currently their return type is void, which does not tell
>>> > much about job completion.
>>> >
>>> > public void registerScript(InputStream in, Map
>>> > params,List paramsFiles) throws IOException {
>>> > try {
>>> > String substituted = doParamSubstitution(in, params,
>>> > paramsFiles);
>>> > GruntParser grunt = new GruntParser(new
>>> > StringReader(substituted));
>>> > grunt.setInteractive(false);
>>> > grunt.setParams(this);
>>> > grunt.parseStopOnError(true);
>>> > } catch (org.apache.pig.tools.pigscript.parser.ParseException
>>> e) {
>>> > log.error(e.getLocalizedMessage());
>>> > throw new IOException(e.getCause());
>>> > }
>>> > }
>>> >
>>> >
>>> > We do have a handle on number of jobs succeeded/failed as part of the
>>> job
>>> > run, so that is something we should add as return type?
>>> >
>>> > Thanks,
>>> > Prashant
>>> >
>>>
>>>
>>>
>>> --
>>> *Note that I'm no longer using my Yahoo! email address. Please email me
>>> at
>>> billgra...@gmail.com going forward.*
>>>
>>
>>
>
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me
> at billgra...@gmail.com going forward.*
>


Re: PigServer API

2012-10-10 Thread Bill Graham
Hi Prashant,

[Replying to the dev list to get others' take on these...]

Just curious, why do you prefer a List of JobStats over the already
existing iterator? I hesitate to add one-liner methods if it's something
that can be a one-liner by the caller, unless the use case is very common.

Making getSuccessfulJobs() and getFailedJobs() public seems reasonable to
me.

I'm not sure about the rationale behind the differences between
registerScript and store(). Store() and registerQuery() are able to
manually add to the DAG as statements come in, but register script needs
parsing for execution. That's probably why execution is delegated to the
GruntParser. The resulting DAG for a single-store script should be the same
though. It seems like registerScript() should be able to return a list of
ExecJobs.

thanks,
Bill

On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi wrote:

> Hi Bill,
>
> I am looking at PigStats and JobGraph, and am thinking of adding some
> functions. Let me know what you think.
>
> *getJobList()* returns a List representation of the iterator.
>
> public List<JobStats> getJobList() {
>     return IteratorUtils.toList(iterator());
> }
>
> What do you think about making getSuccessfulJobs() and getFailedJobs()
> public and exposing them in the API? Currently they are package-private.
>
> Had another question, seems like the execution flow for
> PigServer.registerScript/Query is different from PigServer.store(). Was
> there a reason to make these different? The function store() returns an
> ExecJob which is great to get info regarding the runs, but registerScript()
> calls the GruntParser for execution which I think is a different flow?
>
> Thanks,
> Prashant
>
>
> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham  wrote:
>
>> Makes sense to me. We could return a PigStats object.
>>
>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi wrote:
>>
>> > Hi All,
>> >
>> > I am looking at PigServer methods for running scripts/queries and it
>> > seems like currently their return type is void, which does not tell
>> > much about job completion.
>> >
>> > public void registerScript(InputStream in, Map
>> > params,List paramsFiles) throws IOException {
>> > try {
>> > String substituted = doParamSubstitution(in, params,
>> > paramsFiles);
>> > GruntParser grunt = new GruntParser(new
>> > StringReader(substituted));
>> > grunt.setInteractive(false);
>> > grunt.setParams(this);
>> > grunt.parseStopOnError(true);
>> > } catch (org.apache.pig.tools.pigscript.parser.ParseException
>> e) {
>> > log.error(e.getLocalizedMessage());
>> > throw new IOException(e.getCause());
>> > }
>> > }
>> >
>> >
>> > We do have a handle on number of jobs succeeded/failed as part of the
>> job
>> > run, so that is something we should add as return type?
>> >
>> > Thanks,
>> > Prashant
>> >
>>
>>
>>
>> --
>> *Note that I'm no longer using my Yahoo! email address. Please email me at
>> billgra...@gmail.com going forward.*
>>
>
>


-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgra...@gmail.com going forward.*


Re: ORDER BY illustrator

2012-10-10 Thread Allan
I already created the JIRA https://issues.apache.org/jira/browse/PIG-2963.

Best regards,


-- 
Allan Avendaño S.
--


[jira] [Created] (PIG-2963) Illustrate command and POPackageLite

2012-10-10 Thread JIRA
Allan Avendaño created PIG-2963:
---

 Summary: Illustrate command and POPackageLite
 Key: PIG-2963
 URL: https://issues.apache.org/jira/browse/PIG-2963
 Project: Pig
  Issue Type: Bug
Reporter: Allan Avendaño
Priority: Critical
 Fix For: 0.11


While trying to execute a simple script like:

A = LOAD 'test01' AS (f1:chararray,f2:int,f3:chararray);
B = order A by f1;
illustrate B;

or 

C = foreach B generate f1, f2; 
illustrate C;

I got the following exception:

java.lang.RuntimeException: ReadOnceBag does not support getMemorySize operation
at org.apache.pig.data.ReadOnceBag.getMemorySize(ReadOnceBag.java:74)
at org.apache.pig.data.SizeUtil.getPigObjMemSize(SizeUtil.java:61)
at org.apache.pig.data.DefaultTuple.getMemorySize(DefaultTuple.java:180)
at org.apache.pig.pen.util.ExampleTuple.getMemorySize(ExampleTuple.java:97)
at org.apache.pig.data.DefaultAbstractBag.getMemorySize(DefaultAbstractBag.java:148)
at org.apache.pig.data.DefaultAbstractBag.markSpillableIfNecessary(DefaultAbstractBag.java:100)
at org.apache.pig.data.DefaultAbstractBag.add(DefaultAbstractBag.java:92)
at org.apache.pig.pen.Illustrator.addData(Illustrator.java:116)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackageLite.illustratorMarkup(POPackageLite.java:227)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackageLite.getNext(POPackageLite.java:182)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:422)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:235)
at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257)
at org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:238)
at org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:103)
at org.apache.pig.pen.LineageTrimmingVisitor.<init>(LineageTrimmingVisitor.java:98)
at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:166)
at org.apache.pig.PigServer.getExamples(PigServer.java:1180)
at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:738)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:626)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:323)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:154)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
==

The log file contains the following:

Pig Stack Trace
---
ERROR 2997: Encountered IOException. Exception

java.io.IOException: Exception
at org.apache.pig.PigServer.getExamples(PigServer.java:1186)
at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:738)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:626)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:323)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:154)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
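
Reading the first trace from the bottom up, the failure appears to follow this shape: a bag's add() asks for its memory size, the size of a contained tuple is computed from its fields, and one field is a read-once bag that refuses to report a size. The sketch below models only that call shape. Every class in it is a hypothetical stand-in written for illustration; none of it is Pig code, and it does not claim to show the root cause or the fix.

```java
import java.util.ArrayList;
import java.util.List;

final class MemorySizeFailureSketch {
    interface Sized {
        long getMemorySize();
    }

    // Stand-in for a read-once bag: it cannot report its size.
    static final class ReadOnceLike implements Sized {
        public long getMemorySize() {
            throw new RuntimeException("getMemorySize not supported");
        }
    }

    // Stand-in for a tuple: its size is the sum of its fields' sizes.
    static final class TupleLike implements Sized {
        final List<Sized> fields = new ArrayList<>();

        public long getMemorySize() {
            long total = 0;
            for (Sized field : fields) {
                total += field.getMemorySize();
            }
            return total;
        }
    }

    // Stand-in for a spillable bag: add() re-measures the bag's contents.
    static final class SpillableLike {
        final List<Sized> items = new ArrayList<>();
        long lastMeasuredSize;

        void add(Sized item) {
            items.add(item);
            long total = 0;
            for (Sized s : items) {
                total += s.getMemorySize();
            }
            lastMeasuredSize = total;
        }
    }

    // Returns true when adding a tuple that holds a read-once field fails.
    static boolean addFails() {
        TupleLike tuple = new TupleLike();
        tuple.fields.add(new ReadOnceLike());
        try {
            new SpillableLike().add(tuple);
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("add failed: " + addFails());
    }
}
```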

Re: ORDER BY illustrator

2012-10-10 Thread Allan
Thanks for your reply. I was wondering if someone else is facing something
similar with the illustrator.
It doesn't happen on Pig 0.10.0.

Best regards,

On Wed, Oct 10, 2012 at 8:17 AM, Russell Jurney wrote:

> I don't know what the issue is, but I can JIRA it for you. What
> happens when you try the same on Pig 0.10.0?
>
>
-- 
Allan Avendaño S.
--