[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Mark Wagner (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747131#comment-13747131
 ] 

Mark Wagner commented on PIG-3419:
--

I'd also be in favor of putting this in trunk as opposed to a Tez branch. Although 
the motivation for this is Tez, I think we would want this patch in Pig even if 
there wasn't Tez support.

A couple short comments for Achal: 
* It looks like the build targets that include the META-INF files are only 
executed when building against hadoopversion=23. The META-INF files also don't 
seem to be included in the pig.jar and pig-withouthadoop.jar that go in the 
root directory. I tried copying in the correct jars, but something still seems 
to be off.
* The changes to the try/catch blocks in MapReduceLauncher break on 23, because 
HadoopShims for 23 doesn't throw an exception where HadoopShims for 20 does. 
Maybe that should be fixed in HadoopShims though.

> Pluggable Execution Engine 
> ---
>
>                 Key: PIG-3419
>                 URL: https://issues.apache.org/jira/browse/PIG-3419
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.12
>            Reporter: Achal Soni
>            Assignee: Achal Soni
>            Priority: Minor
>         Attachments: execengine.patch, finalpatch.patch, 
> mapreduce_execengine.patch, stats_scriptstate.patch, test_suite.patch
>
>
> In an effort to adapt Pig to work using Apache Tez 
> (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for 
> a cleaner ExecutionEngine abstraction than existed before. The changes are 
> not that major as Pig was already relatively abstracted out between the 
> frontend and backend. The changes in the attached commit are essentially the 
> barebones changes -- I tried to not change the structure of Pig's different 
> components too much. I think it will be interesting to see in the future how 
> we can refactor more areas of Pig to really honor this abstraction between 
> the frontend and backend. 
> Some of the changes were to reinstate an ExecutionEngine interface to tie 
> together the frontend and backend, to make Pig delegate to the EE when 
> necessary, and to create an MRExecutionEngine that implements this 
> interface. Other work included changing ExecType to cycle through the 
> ExecutionEngines on the classpath and select the appropriate one (this is 
> done using Java ServiceLoader, exactly as MapReduce does when choosing the 
> framework to use between local and distributed mode). I also tried to make 
> ScriptState, JobStats, and PigStats as abstract as possible in their current 
> state. I think some future work will be needed here, perhaps to 
> re-evaluate the usage of ScriptState and the responsibilities of the 
> different statistics classes. I haven't touched the PPNL, but I think more 
> abstraction is needed there, perhaps in a separate patch. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Subscription: PIG patch available

2013-08-21 Thread jira
Issue Subscription
Filter: PIG patch available (19 issues)

Subscriber: pigdaily

Key         Summary
PIG-3431    Return more information for parsing related exceptions.
            https://issues.apache.org/jira/browse/PIG-3431
PIG-3430    Add xml format for explaining MapReduce Plan.
            https://issues.apache.org/jira/browse/PIG-3430
PIG-3426    Add support for removing s3 files
            https://issues.apache.org/jira/browse/PIG-3426
PIG-3419    Pluggable Execution Engine
            https://issues.apache.org/jira/browse/PIG-3419
PIG-3379    Alias reuse in nested foreach causes PIG script to fail
            https://issues.apache.org/jira/browse/PIG-3379
PIG-3374    CASE and IN fail when expression includes dereferencing operator
            https://issues.apache.org/jira/browse/PIG-3374
PIG-3349    Document ToString(Datetime, String) UDF
            https://issues.apache.org/jira/browse/PIG-3349
PIG-3346    New property that controls the number of combined splits
            https://issues.apache.org/jira/browse/PIG-3346
PIG-        Fix remaining Windows core unit test failures
            https://issues.apache.org/jira/browse/PIG-
PIG-3325    Adding a tuple to a bag is slow
            https://issues.apache.org/jira/browse/PIG-3325
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3168    TestMultiQueryBasic.testMultiQueryWithSplitInMapAndMultiMerge fails in trunk
            https://issues.apache.org/jira/browse/PIG-3168
PIG-3117    A debug mode in which pig does not delete temporary files
            https://issues.apache.org/jira/browse/PIG-3117
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3048    Add mapreduce workflow information to job configuration
            https://issues.apache.org/jira/browse/PIG-3048
PIG-3021    Split results missing records when there is null values in the column comparison
            https://issues.apache.org/jira/browse/PIG-3021

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


Re: Slow Group By operator

2013-08-21 Thread Cheolsoo Park
Hi Benjamin,

Thank you very much for sharing detailed information!

1) From the runtime numbers that you provided, the mappers are very slow.

Counter               Map         Reduce     Total
CPU time spent (ms)   5,081,610   168,740    5,250,350
CPU time spent (ms)   5,052,700   178,220    5,230,920
CPU time spent (ms)   5,084,430   193,480    5,277,910

2) In your GROUP BY query, you have an algebraic UDF "COUNT".

I am wondering whether disabling the combiner will help here. I have seen a lot
of cases where the combiner actually hurts performance significantly when it
doesn't reduce the volume of mapper output much. Briefly looking at
generate_data.pl in PIG-200, it looks like a lot of random keys are
generated. So I guess you will end up with a large number of small bags
rather than a small number of large bags. If that's the case, the combiner will
only add overhead to the mappers.

Can you try adding "set pig.exec.nocombiner true;" to your script and see
whether it helps?
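
To illustrate the point about bag sizes, here is a rough Python sketch (not Pig
code, and not taken from the benchmark). It counts how many records a
COUNT-style combiner would emit for one mapper's input. The key distributions,
the record count, and the helper name combiner_reduction are all assumptions
made for illustration.

```python
import random
from collections import Counter


def combiner_reduction(keys):
    """Return (records_in, records_out) for a COUNT-style combiner.

    The combiner emits one partial count per distinct key it sees, so its
    output size equals the number of distinct keys in the mapper's input.
    """
    partial_counts = Counter(keys)
    return len(keys), len(partial_counts)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000             # assumed number of records seen by one mapper

# Assumption: "random keys" means almost every key is distinct.
random_keys = [rng.randrange(10**9) for _ in range(n)]
# Assumption: a contrasting input with only 100 possible keys.
few_keys = [rng.randrange(100) for _ in range(n)]

rand_in, rand_out = combiner_reduction(random_keys)
few_in, few_out = combiner_reduction(few_keys)

print(f"random keys: {rand_in} records in, {rand_out} records out")
print(f"few keys:    {few_in} records in, {few_out} records out")
```

With nearly unique keys the combiner emits almost as many records as it
receives, so it does extra work without shrinking the shuffle. With only a few
distinct keys it collapses the input to one record per key, which is the case
where a combiner pays off.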

Thanks,
Cheolsoo






On Wed, Aug 21, 2013 at 3:52 AM, Benjamin Jakobus wrote:

> Hi Cheolsoo,
>
> >>What's your query like? Can you share it? Do you call any algebraic UDF
> >> after group by? I am wondering whether combiner matters in your test.
> I have been running 3 different types of queries.
>
> The first was performed on datasets of 6 different sizes:
>
> - Dataset size 1: 30,000 records (772KB)
> - Dataset size 2: 300,000 records (6.4MB)
> - Dataset size 3: 3,000,000 records (63MB)
> - Dataset size 4: 30 million records (628MB)
> - Dataset size 5: 300 million records (6.2GB)
> - Dataset size 6: 3 billion records (62GB)
>
> The dataset sizes grow by a factor of ten at each step: dataset n contains
> 3000 * 10^n records.
> A seventh dataset consisting of 1,000 records (23KB) was produced to
> perform join operations on. Its schema is as follows:
> name - string
> marks - integer
> gpa - float
> The data was generated using the generate_data.pl Perl script available
> for download from https://issues.apache.org/jira/browse/PIG-200.
> The results are as follows:
>
>
>               Set 1    Set 2    Set 3    Set 4    Set 5      Set 6
> Arithmetic    32.82    36.21    49.49    83.25    423.63     3900.78
> Filter 10%    32.94    34.32    44.56    66.68    295.59     2640.52
> Filter 90%    33.93    32.55    37.86    53.22    197.36     1657.37
> Group         49.43    53.34    69.84    105.12   497.61     4394.21
> Join          49.89    50.08    78.55    150.39   1045.34    10258.19
>
> *Averaged performance of arithmetic, join, group, order, distinct select
> and filter operations on six datasets using Pig. Scripts were configured
> to use 8 reduce and 11 map tasks.*
>
>
>
>               Set 1    Set 2    Set 3    Set 4    Set 5      Set 6
> Arithmetic    32.84    37.33    72.55    300.08   2633.72    27821.19
> Filter 10%    32.36    53.28    59.22    209.5    1672.3     18222.19
> Filter 90%    31.23    32.68    36.8     69.55    331.88     3320.59
> Group         48.27    47.68    46.87    53.66    141.36     1233.4
> Join          48.54    56.86    104.6    517.5    4388.34    -
> Distinct      48.73    53.28    72.54    109.77   -          -
>
> *Averaged performance of arithmetic, join, group, distinct select and
> filter operations on six datasets using Hive. Scripts were configured to
> use 8 reduce and 11 map tasks.*
>
> (If you want to see the standard deviation, let me know).
>
> So, to summarize the results: Pig outperforms Hive, with the exception of
> using *Group By*.
>
> The Pig scripts used for this benchmark are as follows:
> *Arithmetic*
> -- Generate with basic arithmetic
> A = load '$input/dataset_3' using PigStorage('\t') as (name, age,
> gpa) PARALLEL $reducers;
> B = foreach A generate age * gpa + 3, age/gpa - 1.5 PARALLEL $reducers;
> store B into '$output/dataset_3_projection' using PigStorage()
> PARALLEL $reducers;
>
> *Filter 10%*
> -- Filter that removes 10% of data
> A = load '$input/dataset_3' using PigStorage('\t') as (name, age,
> gpa) PARALLEL $reducers;
> B = filter A by gpa < '3.6' PARALLEL $reducers;
> store B into '$output/dataset_3_filter_10' using PigStorage()
> PARALLEL $reducers;
>
>
> *Filter 90%*
> -- Filter that removes 90% of data
> A = load '$input/dataset_3' using PigStorage('\t') as (name, age,
> gpa) PARALLEL $reducers;
> B = filter A by age < '25' PARALLEL $reducers;
> store B into '$output/dataset_3_filter_90' using PigStorage()
> PARALLEL $reducers;
>
> *Group*
> A = load '$input/dataset_3' using PigStorage('\t') as (name, age,
> gpa) 

[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747064#comment-13747064
 ] 

Julien Le Dem commented on PIG-3419:


The point is to be able to implement alternate execution engines without having 
to fork Pig.
I think it should go in trunk.



[jira] [Updated] (PIG-3436) Make pigmix run with Hadoop2

2013-08-21 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3436:


Attachment: PIG-3436-1.patch

> Make pigmix run with Hadoop2
> 
>
> Key: PIG-3436
> URL: https://issues.apache.org/jira/browse/PIG-3436
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3436-1.patch
>
>




[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747065#comment-13747065
 ] 

Dmitriy V. Ryaboy commented on PIG-3419:


I'd like this patch in trunk since it's not Tez-specific, and allows people to 
experiment with other runtimes (for example, Spark or Drill).



[jira] [Created] (PIG-3436) Make pigmix run with Hadoop2

2013-08-21 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-3436:
---

 Summary: Make pigmix run with Hadoop2
 Key: PIG-3436
 URL: https://issues.apache.org/jira/browse/PIG-3436
 Project: Pig
  Issue Type: Improvement
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12
 Attachments: PIG-3436-1.patch





[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747021#comment-13747021
 ] 

Cheolsoo Park commented on PIG-3419:


[~julienledem], I haven't looked at it yet, but I will review it tonight. I 
will also run full unit tests.

Btw, I met with Mark, Olga, Rohini, and Daniel at LinkedIn this morning. We 
decided to create a tez branch. Rohini suggested that this patch should go into 
that branch instead of trunk. Can we agree on where we should commit this patch 
first? Personally, I think this can go into trunk directly since it's quite 
general. But there were some concerns.



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746999#comment-13746999
 ] 

Julien Le Dem commented on PIG-3419:


I have submitted my review. This looks great [~achalsoni81]!
[~cheolsoo] does it look good to you?
Once Achal has updated his patch I'm willing to commit.



[jira] [Commented] (PIG-3385) DISTINCT no longer uses custom partitioner

2013-08-21 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746937#comment-13746937
 ] 

Koji Noguchi commented on PIG-3385:
---

While looking at this jira, I noticed that the custom partitioner is dropped 
when running with multi-query optimization.  Created PIG-3435.

> DISTINCT no longer uses custom partitioner
> --
>
> Key: PIG-3385
> URL: https://issues.apache.org/jira/browse/PIG-3385
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Will Oberman
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-3385-v01.patch, pig-3385-v02.patch
>
>
> From u...@pig.apache.org:  It looks like an optimization was put in to make 
> distinct use a special partitioner which prevents the user from setting the 
> partitioner.



[jira] [Created] (PIG-3435) Custom Partitioner not working with MultiQueryOptimizer

2013-08-21 Thread Koji Noguchi (JIRA)
Koji Noguchi created PIG-3435:
-

 Summary: Custom Partitioner not working with MultiQueryOptimizer
 Key: PIG-3435
 URL: https://issues.apache.org/jira/browse/PIG-3435
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Koji Noguchi
Assignee: Koji Noguchi


When looking at PIG-3385, I noticed some issues in the handling of custom 
partitioners with multi-query optimization.

{noformat}
C1 = group B1 by col1 PARTITION BY
   org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
C2 = group B2 by col1 PARTITION BY
   org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
{noformat}
These seem to be merged into one mapreduce job correctly, but the custom 
partitioner information is lost.

{noformat}
C1 = group B1 by col1 PARTITION BY 
org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
C2 = group B2 by col1 parallel 2;
{noformat}
These seem to be merged even though they should run with two different 
partitioners.



[jira] [Updated] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Achal Soni (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Achal Soni updated PIG-3419:


Attachment: finalpatch.patch

Let me know if there are any pressing changes to this patch!



[jira] [Commented] (PIG-3419) Pluggable Execution Engine

2013-08-21 Thread Achal Soni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746842#comment-13746842
 ] 

Achal Soni commented on PIG-3419:
-

I have taken all the suggestions into account and regenerated a new patch that 
is hopefully cleaner, smaller, and reflects most of the suggestions. The patch 
is attached and the review board is the following:

https://reviews.apache.org/r/13714/





RE: can't parse the values using XML loader

2013-08-21 Thread william.dowling
Part of the problem might be that the regexp has

(.*)

but you need
(.*)

Using regexps to parse XML is awfully brittle. An alternative is to use a UDF 
that calls out to an XML parser. I use ElementTree from python UDFs.
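
As a minimal sketch of the parser-based approach, the snippet below reads one
record with ElementTree. The archived message lost its XML tags, so the child
element names here (TITLE, ARTIST, and so on) are placeholders rather than the
poster's actual schema; only the CATALOG element name comes from the script.
The function name parse_record is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: only <CATALOG> appears in the original script; the
# child tag names are assumed for illustration.
RECORD = """<CATALOG>
  <TITLE>hadoop developer</TITLE>
  <ARTIST>ajay</ARTIST>
  <COUNTRY>india</COUNTRY>
  <COMPANY>ITC</COMPANY>
  <PRICE>10.90</PRICE>
  <YEAR>2013</YEAR>
</CATALOG>"""


def parse_record(xml_text):
    """Return the text of each child element, in document order, as a tuple."""
    root = ET.fromstring(xml_text)
    return tuple((child.text or "").strip() for child in root)


print(parse_record(RECORD))
```

Because the parser handles whitespace and nesting itself, the result does not
depend on how the record is split across lines. Wiring the function into Pig
as a Python UDF (registration and output schema) is left out of this sketch.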

Will Dowling


From: Muni mahesh [mahesh87.had...@gmail.com]
Sent: Wednesday, August 21, 2013 6:58 AM
To: dev@pig.apache.org; u...@pig.apache.org
Subject: can't parse the values using XML loader

*Input file :*



hadoop developer
ajay
india
ITC
10.90
2013



*Pig Script:*

register /usr/lib/pig/piggybank.jar;

A = load '/home/sudeep/Desktop/CATALOG.xml' using
org.apache.pig.piggybank.storage.XMLLoader('CATALOG') as (x:
chararray);


B = foreach A GENERATE
FLATTEN(REGEX_EXTRACT_ALL(x,'\\n*\\n(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*\\n*'))
as (id: int, name:chararray);


*Output Expected :*

(hadoop, ajay, india, ITC, 10.90, 2013)

*Issue :*

But the output I am getting is:

()

I suspect it is not able to parse the values between the tags.


Re: can't parse the values using XML loader

2013-08-21 Thread Amit
Hello,
Moreover, REGEX_EXTRACT_ALL uses Matcher.matches(), which tries to match the 
entire input string against the pattern rather than just part of it. You may 
want to write your own REGEX UDF (if you are not going the route suggested by 
Will) which uses Matcher.find() instead of Matcher.matches().
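
The difference can be sketched in Python, whose re.fullmatch and re.search
behave analogously to Java's Matcher.matches() and Matcher.find(). This is an
illustration only, not the Pig UDF itself, and the record and tag names are
hypothetical.

```python
import re

# Hypothetical input: the tag names are placeholders for illustration.
record = "<CATALOG>\n  <ARTIST>ajay</ARTIST>\n</CATALOG>"
pattern = re.compile(r"<ARTIST>(.*)</ARTIST>")

# Whole-string match, analogous to Matcher.matches(): fails because the
# pattern does not account for the surrounding <CATALOG> lines.
whole = pattern.fullmatch(record)

# First match anywhere in the string, analogous to Matcher.find().
partial = pattern.search(record)

print(whole)
print(partial.group(1))
```

A pattern written for one element only succeeds under whole-string matching
if it also describes everything before and after that element, which is why
small differences in whitespace or tags produce an empty result.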


 
Regards,
Amit




 From: "william.dowl...@thomsonreuters.com" 
To: u...@pig.apache.org; dev@pig.apache.org 
Sent: Wednesday, August 21, 2013 12:19 PM
Subject: RE: can't parse the values using XML loader
 

Part of the problem might be that the regexp has

(.*)

but you need
(.*)

Using regexps to parse XML is awfully brittle. An alternative is to use a UDF 
that calls out to an XML parser. I use ElementTree from python UDFs.

Will Dowling


From: Muni mahesh [mahesh87.had...@gmail.com]
Sent: Wednesday, August 21, 2013 6:58 AM
To: dev@pig.apache.org; u...@pig.apache.org
Subject: can't parse the values using XML loader

*Input file :*



hadoop developer
ajay
india
ITC
10.90
2013



*Pig Script:*

register /usr/lib/pig/piggybank.jar;

A = load '/home/sudeep/Desktop/CATALOG.xml' using
org.apache.pig.piggybank.storage.XMLLoader('CATALOG') as (x:
chararray);


B = foreach A GENERATE
FLATTEN(REGEX_EXTRACT_ALL(x,'\\n*\\n(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*\\n*'))
as (id: int, name:chararray);


*Output Expected :*

(hadoop, ajay, india, ITC, 10.90, 2013)

*Issue:*

But the output I am getting is:

()

I suspect it is not able to parse the values between the tags.

[jira] [Updated] (PIG-3385) DISTINCT no longer uses custom partitioner

2013-08-21 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-3385:
--

Attachment: pig-3385-v02.patch

Uploading a patch with a test. I noticed that the original test for custom 
partitioners did not produce partition results different from the default, so I 
added one trivial partitioner that always returns 1 (the second reducer).
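
The idea behind that test can be sketched in a few lines of Python. This is only a model, not the code in the patch: hash_partition stands in for a default hash-based partitioner (CRC32 is used purely to keep the example deterministic), and always_one_partition is a hypothetical stand-in for the test partitioner.

```python
import zlib


def hash_partition(key, num_reducers):
    """Stand-in for a default hash partitioner; CRC32 is an assumption made for determinism."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def always_one_partition(key, num_reducers):
    """Hypothetical model of the test partitioner: every key goes to reducer index 1."""
    return 1


keys = ["a", "b", "c", "d", "e", "f", "g", "h"]
default_parts = {hash_partition(k, 2) for k in keys}
custom_parts = {always_one_partition(k, 2) for k in keys}

print(sorted(default_parts))
print(sorted(custom_parts))
```

Because the constant partitioner sends every record to the second reducer, output landing anywhere else would show that the custom partitioner was not applied.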

> DISTINCT no longer uses custom partitioner
> --
>
> Key: PIG-3385
> URL: https://issues.apache.org/jira/browse/PIG-3385
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Will Oberman
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-3385-v01.patch, pig-3385-v02.patch
>
>
> From u...@pig.apache.org:  It looks like an optimization was put in to make 
> distinct use a special partitioner which prevents the user from setting the 
> partitioner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3385) DISTINCT no longer uses custom partitioner

2013-08-21 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-3385:
--

Component/s: (was: documentation)
 impl
   Assignee: Koji Noguchi

> DISTINCT no longer uses custom partitioner
> --
>
> Key: PIG-3385
> URL: https://issues.apache.org/jira/browse/PIG-3385
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Will Oberman
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-3385-v01.patch
>
>
> From u...@pig.apache.org:  It looks like an optimization was put in to make 
> distinct use a special partitioner which prevents the user from setting the 
> partitioner.



[jira] [Updated] (PIG-3117) A debug mode in which pig does not delete temporary files

2013-08-21 Thread Ido Hadanny (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ido Hadanny updated PIG-3117:
-

Fix Version/s: 0.12
Affects Version/s: 0.11.1
   Status: Patch Available  (was: Open)

The patch introduces a pig.delete.intermediate.files property that keeps all 
intermediate files when set to false.

> A debug mode in which pig does not delete temporary files
> -
>
> Key: PIG-3117
> URL: https://issues.apache.org/jira/browse/PIG-3117
> Project: Pig
>  Issue Type: Wish
>Affects Versions: 0.11.1, 0.10.0
>Reporter: Ido Hadanny
>Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: remove_intermediate_results.diff
>
>
> when we debug our pig jobs on pre-production data, we usually find bugs we 
> couldn't detect in our UT, as env and data are not quite the same.
> when the final output of a script is not quite what we expect, we start 
> divide-and-conquer, running it line by line and inspecting the intermediate 
> output of each stage. 
> It would be great if we could simply configure pig not to delete the 
> intermediate MR outputs, and store them as plaintext instead of snappy format.



[jira] [Updated] (PIG-3117) A debug mode in which pig does not delete temporary files

2013-08-21 Thread Ido Hadanny (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ido Hadanny updated PIG-3117:
-

Attachment: remove_intermediate_results.diff

The patch introduces a pig.delete.intermediate.files property that keeps all 
intermediate files when set to false.

> A debug mode in which pig does not delete temporary files
> -
>
> Key: PIG-3117
> URL: https://issues.apache.org/jira/browse/PIG-3117
> Project: Pig
>  Issue Type: Wish
>Affects Versions: 0.10.0
>Reporter: Ido Hadanny
>Assignee: Cheolsoo Park
> Attachments: remove_intermediate_results.diff
>
>
> when we debug our pig jobs on pre-production data, we usually find bugs we 
> couldn't detect in our UT, as env and data are not quite the same.
> when the final output of a script is not quite what we expect, we start 
> divide-and-conquer, running it line by line and inspecting the intermediate 
> output of each stage. 
> It would be great if we could simply configure pig not to delete the 
> intermediate MR outputs, and store them as plaintext instead of snappy format.



[jira] [Updated] (PIG-3434) Null subexpression in bincond nullifies outer tuple (or bag)

2013-08-21 Thread Pavel Fedyakov (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Fedyakov updated PIG-3434:


Description: 
According to docs, for bincond operator "If a Boolean subexpression results in 
null value, the resulting expression is null" 
(http://pig.apache.org/docs/r0.11.0/basic.html#nulls).

It works as described in plain foreach..generate expression:

{{in = load 'in';}}
{{out = FOREACH in GENERATE 1, ($0 > 0 ? 2 : 3);}}
{{dump out;}}


in (3 lines, 2nd is empty):
{{0}}

{{1}}

out:
{{(1,3)}}
{{(1,)}}
{{(1,2)}}

But if we wrap generated variables in tuple (or bag), we lose the whole 2nd 
line in output:

{{out = FOREACH in GENERATE (1, ($0 > 0 ? 2 : 3));}}

out:
{{((1,3))}}
{{()}}
{{((1,2))}}


  was:
According to docs, for bincond operator "If a Boolean subexpression results in 
null value, the resulting expression is null" 
(http://pig.apache.org/docs/r0.11.0/basic.html#nulls).

It works as described in plain foreach..generate expression:

{{in = load 'in';}}
{{out = FOREACH in GENERATE 1, ($0 > 0 ? 2 : 3);}}
{{dump out;}}


in (3 lines, 2nd is empty):
{{0}}

{{1}}

out:
{{(1,3)}}
{{(1,)}}
{{(1,2)}}

But if we wrap generated variables in tuple (or bag), we lost the whole 2nd 
line in output:

{{out = FOREACH in GENERATE (1, ($0 > 0 ? 2 : 3));}}

out:
{{((1,3))}}
{{()}}
{{((1,2))}}



> Null subexpression in bincond nullifies outer tuple (or bag)
> 
>
> Key: PIG-3434
> URL: https://issues.apache.org/jira/browse/PIG-3434
> Project: Pig
>  Issue Type: Bug
>Reporter: Pavel Fedyakov
>
> According to docs, for bincond operator "If a Boolean subexpression results 
> in null value, the resulting expression is null" 
> (http://pig.apache.org/docs/r0.11.0/basic.html#nulls).
> It works as described in plain foreach..generate expression:
> {{in = load 'in';}}
> {{out = FOREACH in GENERATE 1, ($0 > 0 ? 2 : 3);}}
> {{dump out;}}
> in (3 lines, 2nd is empty):
> {{0}}
> {{1}}
> out:
> {{(1,3)}}
> {{(1,)}}
> {{(1,2)}}
> But if we wrap generated variables in tuple (or bag), we lose the whole 2nd 
> line in output:
> {{out = FOREACH in GENERATE (1, ($0 > 0 ? 2 : 3));}}
> out:
> {{((1,3))}}
> {{()}}
> {{((1,2))}}



[jira] [Created] (PIG-3434) Null subexpression in bincond nullifies outer tuple (or bag)

2013-08-21 Thread Pavel Fedyakov (JIRA)
Pavel Fedyakov created PIG-3434:
---

 Summary: Null subexpression in bincond nullifies outer tuple (or 
bag)
 Key: PIG-3434
 URL: https://issues.apache.org/jira/browse/PIG-3434
 Project: Pig
  Issue Type: Bug
Reporter: Pavel Fedyakov


According to docs, for bincond operator "If a Boolean subexpression results in 
null value, the resulting expression is null" 
(http://pig.apache.org/docs/r0.11.0/basic.html#nulls).

It works as described in plain foreach..generate expression:

{{in = load 'in';}}
{{out = FOREACH in GENERATE 1, ($0 > 0 ? 2 : 3);}}
{{dump out;}}


in (3 lines, 2nd is empty):
{{0}}

{{1}}

out:
{{(1,3)}}
{{(1,)}}
{{(1,2)}}

But if we wrap generated variables in tuple (or bag), we lost the whole 2nd 
line in output:

{{out = FOREACH in GENERATE (1, ($0 > 0 ? 2 : 3));}}

out:
{{((1,3))}}
{{()}}
{{((1,2))}}
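
The behaviour this report appears to expect can be modelled in plain Python. This is only an illustration of the documented null rule, not Pig's implementation. The helper names are hypothetical, and the wrapped result for the second row is my reading of the report rather than something it states.

```python
def bincond(cond, if_true, if_false):
    """Model of bincond: a null (None) condition yields a null result."""
    if cond is None:
        return None
    return if_true if cond else if_false


def greater_than_zero(value):
    """Model of `$0 > 0`: comparing a null value yields null."""
    return None if value is None else value > 0


rows = [0, None, 1]  # the three input lines; the empty 2nd line is modelled as null

flat = [(1, bincond(greater_than_zero(v), 2, 3)) for v in rows]
wrapped = [((1, bincond(greater_than_zero(v), 2, 3)),) for v in rows]

print(flat)     # [(1, 3), (1, None), (1, 2)]
print(wrapped)  # [((1, 3),), ((1, None),), ((1, 2),)]
```

In this model the null stays confined to the bincond field, so the second row keeps its outer tuple and its first field instead of collapsing to an empty tuple.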




can't parse the values using XML loader

2013-08-21 Thread Muni mahesh
*Input file :*



hadoop developer
ajay
india
ITC
10.90
2013



*Pig Script:*

register /usr/lib/pig/piggybank.jar;

A = load '/home/sudeep/Desktop/CATALOG.xml' using
org.apache.pig.piggybank.storage.XMLLoader('CATALOG') as (x:
chararray);


B = foreach A GENERATE
FLATTEN(REGEX_EXTRACT_ALL(x,'\\n*\\n(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*(.*)\\n\\s*\\n*'))
as (id: int, name:chararray);


*Output Expected :*

(hadoop, ajay, india, ITC, 10.90, 2013)

*Issue:*

But the output I am getting is:

()

I suspect it is not able to parse the values between the tags.


Re: Slow Group By operator

2013-08-21 Thread Benjamin Jakobus
Hi Cheolsoo,

>>What's your query like? Can you share it? Do you call any algebraic UDF
>> after group by? I am wondering whether combiner matters in your test.
I have been running 3 different types of queries.

The first was performed on datasets of 6 different sizes:


   - Dataset size 1: 30,000 records (772KB)
   - Dataset size 2: 300,000 records (6.4MB)
   - Dataset size 3: 3,000,000 records (63MB)
   - Dataset size 4: 30 million records (628MB)
   - Dataset size 5: 300 million records (6.2GB)
   - Dataset size 6: 3 billion records (62GB)

The datasets grow by a factor of ten at each step, so dataset n contains
3000 * 10^n records.
A seventh dataset consisting of 1,000 records (23KB) was produced to
perform join operations on. Its schema is as follows:
name - string
marks - integer
gpa - float
The data was generated using the generate data.pl Perl script available for
download from https://issues.apache.org/jira/browse/PIG-200. The results are
as follows:


              Set 1    Set 2    Set 3    Set 4     Set 5      Set 6
Arithmetic    32.82    36.21    49.49    83.25    423.63    3900.78
Filter 10%    32.94    34.32    44.56    66.68    295.59    2640.52
Filter 90%    33.93    32.55    37.86    53.22    197.36    1657.37
Group         49.43    53.34    69.84   105.12    497.61    4394.21
Join          49.89    50.08    78.55   150.39   1045.34   10258.19

*Averaged performance of arithmetic, join, group, order, distinct select
and filter operations on six datasets using Pig. Scripts were configured as
to use 8 reduce and 11 map tasks.*



              Set 1    Set 2    Set 3    Set 4     Set 5      Set 6
Arithmetic    32.84    37.33    72.55   300.08   2633.72   27821.19
Filter 10%    32.36    53.28    59.22   209.5    1672.3    18222.19
Filter 90%    31.23    32.68    36.8     69.55    331.88    3320.59
Group         48.27    47.68    46.87    53.66    141.36    1233.4
Join          48.54    56.86   104.6    517.5    4388.34       -
Distinct      48.73    53.28    72.54   109.77      -          -

*Averaged performance of arithmetic, join, group, distinct select and
filter operations on six datasets using Hive. Scripts were configured as to
use 8 reduce and 11 map tasks.*

(If you want to see the standard deviation, let me know).

So, to summarize the results: Pig outperforms Hive, with the exception of
using *Group By*.
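
As a quick check of that summary, the short Python sketch below computes the Hive/Pig ratio per operation for Set 4, using the figures from the two tables above. It assumes the figures are runtimes, so lower is better. The tables do not state units, but the ratio is unitless either way.

```python
# Set 4 figures copied from the two tables above, as (Pig, Hive) pairs.
set4 = {
    "Arithmetic": (83.25, 300.08),
    "Filter 10%": (66.68, 209.5),
    "Filter 90%": (53.22, 69.55),
    "Group": (105.12, 53.66),
    "Join": (150.39, 517.5),
}

# A ratio above 1 means Hive's figure is larger than Pig's for that operation.
ratios = {op: hive / pig for op, (pig, hive) in set4.items()}

for op, ratio in ratios.items():
    print(f"{op}: Hive/Pig = {ratio:.2f}")
```

For Set 4, Group is the only operation whose ratio falls below 1, which matches the summary.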

The Pig scripts used for this benchmark are as follows:
*Arithmetic*
-- Generate with basic arithmetic
A = load '$input/dataset_3' using PigStorage('\t') as (name, age,
gpa) PARALLEL $reducers;
B = foreach A generate age * gpa + 3, age/gpa - 1.5 PARALLEL $reducers;
store B into '$output/dataset_3_projection' using PigStorage()
PARALLEL $reducers;

*Filter 10%*
-- Filter that removes 10% of data
A = load '$input/dataset_3' using PigStorage('\t') as (name, age,
gpa) PARALLEL $reducers;
B = filter A by gpa < '3.6' PARALLEL $reducers;
store B into '$output/dataset_3_filter_10' using PigStorage()
PARALLEL $reducers;


*Filter 90%*
-- Filter that removes 90% of data
A = load '$input/dataset_3' using PigStorage('\t') as (name, age,
gpa) PARALLEL $reducers;
B = filter A by age < '25' PARALLEL $reducers;
store B into '$output/dataset_3_filter_90' using PigStorage()
PARALLEL $reducers;

*Group*
A = load '$input/dataset_3' using PigStorage('\t') as (name, age,
gpa) PARALLEL $reducers;
B = group A by name PARALLEL $reducers;
C = foreach B generate flatten(group), COUNT(A.age) PARALLEL $reducers;
store C into '$output/dataset_3_group' using PigStorage() PARALLEL
$reducers;

*Join*
A = load '$input/dataset_3' using PigStorage('\t') as (name, age,
gpa) PARALLEL $reducers;
B = load '$input/dataset_join' using PigStorage('\t') as (name, age, gpa)
PARALLEL $reducers;
C = cogroup A by name inner, B by name inner PARALLEL $reducers;
D = foreach C generate flatten(A), flatten(B) PARALLEL $reducers;
store D into '$output/dataset_3_cogroup_big' using PigStorage()
PARALLEL $reducers;

Similarly, here are the Hive scripts:
*Arithmetic*
SELECT (dataset.age * dataset.gpa + 3) AS F1, (dataset.age/dataset.gpa -
1.5) AS F2
FROM dataset
WHERE dataset.gpa > 0;

*Filter 10%*
SELECT *
FROM dataset
WHERE dataset.gpa < 3.6;

*Filter 90%*
SELECT *
FROM dataset
WHERE dataset.age < 25;

*Group*
SELECT COUNT(dataset.age)
FROM dataset
GROUP BY dataset.name;

*Join*
SELECT *
FROM dataset JOIN dataset_join
ON dataset.name = dataset_join.name;

I will re-run the benchmarks to see whether it is the reduce or 

[jira] [Created] (PIG-3433) The import sdsu cannot be resolved

2013-08-21 Thread Ido Hadanny (JIRA)
Ido Hadanny created PIG-3433:


 Summary: The import sdsu cannot be resolved
 Key: PIG-3433
 URL: https://issues.apache.org/jira/browse/PIG-3433
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.11.1
 Environment: Eclipse indigo
Reporter: Ido Hadanny


executed:
➜  trunk  svn update
At revision 1516115.
ant clean eclipse-files
ant compile gen

getting:
https://issues.apache.org/jira/browse/PIG-3399

AND after manually removing the wrong javacc-4.2 dependency, getting:
"The import sdsu cannot be resolved" in DataGenerator.java
