date:20120828

[jira] [Created] (PIG-2894) [Piggybank] HadoopJobHistoryLoader for hadoop 0.20.205+

2012-08-28 Thread Aniket Mokashi (JIRA)

Aniket Mokashi created PIG-2894:
---

 Summary: [Piggybank] HadoopJobHistoryLoader for hadoop 0.20.205+
 Key: PIG-2894
 URL: https://issues.apache.org/jira/browse/PIG-2894
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi


With MAPREDUCE-323 (https://issues.apache.org/jira/browse/MAPREDUCE-323), Hadoop 
moves job history files to the done directory. As a result, the current 
HadoopJobHistoryLoader can no longer be used. We need to fix this to make it 
useful again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-28 Thread Thejas M Nair (JIRA)

Thejas M Nair created PIG-2895:
--

 Summary: jodatime jar missing in pig-withouthadoop.jar
 Key: PIG-2895
 URL: https://issues.apache.org/jira/browse/PIG-2895
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11


The jodatime jar is missing from pig-withouthadoop.jar. When an external hadoop.jar 
is used, Pig fails with a class-not-found error.
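
One quick way to confirm this kind of packaging problem is to ask the JVM whether the class can be located at all. The following is only a sketch: ClasspathCheck and isPresent are hypothetical names, not part of Pig, and org.joda.time.DateTime is used as the probe because it is the Joda-Time class the datetime support relies on.

```java
// Hypothetical helper (not part of Pig): reports whether a class can be
// located by the class loader that loaded this helper.
class ClasspathCheck {
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but could not be linked; treat it as unusable
            return false;
        }
    }

    public static void main(String[] args) {
        // Run with the same classpath as the failing Pig invocation
        System.out.println("joda present: " + isPresent("org.joda.time.DateTime"));
    }
}
```

Running the main method with pig-withouthadoop.jar and the external hadoop.jar on the classpath should show whether Joda-Time is actually visible.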



[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-28 Thread Thejas M Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2895:
---

Attachment: PIG-2895.1.patch

 jodatime jar missing in pig-withouthadoop.jar
 -

 Key: PIG-2895
 URL: https://issues.apache.org/jira/browse/PIG-2895
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11

 Attachments: PIG-2895.1.patch


 jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
 is used, pig will fail with class not found error.


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

2012-08-28 Thread Ted Malaska (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443559#comment-13443559
 ] 

Ted Malaska commented on PIG-2886:
--

Thanks Bill,

I tried running TestHBaseStorage and it freezes on SetUp.  

    public void setUp() throws Exception {
        // This is needed by Pig
        cluster = MiniCluster.buildCluster();
        conf = cluster.getConfiguration();

        util = new HBaseTestingUtility(conf);
        util.startMiniZKCluster();
        util.startMiniHBaseCluster(1, 1);
    }

Just wondering if you know what I'm missing to make this work.  Hopefully I 
will get time in the next couple of days to research this.

 Add Scan TimeRange to HBaseStorage 
 ---

 Key: PIG-2886
 URL: https://issues.apache.org/jira/browse/PIG-2886
 Project: Pig
  Issue Type: Bug
Reporter: Ted Malaska
Priority: Minor
 Attachments: PIG-2886-0.patch, PIG-2886-1.patch


 I have a client that wants to use pig.  They are using MR now.  They can't 
 use PIG right now because they only want to fetch the last day's worth of 
 data in HBase.  A filter with time range would require reading all the HStore 
 files.  If we hold major compaction until after the fetch and use Scan Time 
 Range we only need to read very little in compression. 
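
The description above relies on the idea that a scan restricted to a time range can skip whole files whose timestamps fall outside it. A minimal sketch of that pruning idea follows. It is a simplified model and not HBase code: FileMeta, filesToRead and the half-open [minStamp, maxStamp) range are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not HBase code): each file carries the smallest and
// largest cell timestamp it contains, and a time-ranged read only opens
// files whose span overlaps the requested range.
class TimeRangeSketch {
    // Hypothetical stand-in for a store file's timestamp metadata
    static final class FileMeta {
        final String name;
        final long minTs;
        final long maxTs;

        FileMeta(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // Returns the names of files overlapping the half-open range [minStamp, maxStamp)
    static List<String> filesToRead(List<FileMeta> files, long minStamp, long maxStamp) {
        List<String> selected = new ArrayList<>();
        for (FileMeta f : files) {
            if (f.maxTs >= minStamp && f.minTs < maxStamp) {
                selected.add(f.name);
            }
        }
        return selected;
    }
}
```

In this model, holding off major compaction matters because compaction merges old and new cells into one file, whose timestamp span then overlaps almost any requested range.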


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-28 Thread Julien Le Dem (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443614#comment-13443614
 ] 

Julien Le Dem commented on PIG-1314:


Hi Thejas,
this commit added JobControlCompiler.java.orig which I suspect is not what you 
intended.
http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java.orig?view=log&pathrev=1376800
Could you double check?
Thanks, Julien

 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
  Labels: gsoc2012
 Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
 PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
 PIG-1314-7.patch

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?
 This is a candidate project for Google summer of code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-08-28 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443630#comment-13443630
 ] 

Thejas M Nair commented on PIG-1314:


Yes, that was not intentional. Deleted JobControlCompiler.java.orig in svn.


 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
  Labels: gsoc2012
 Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, 
 PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, 
 PIG-1314-7.patch

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?
 This is a candidate project for Google summer of code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012


[jira] [Commented] (PIG-2819) ObjectSerializer should support classloader

2012-08-28 Thread Aniket Mokashi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443632#comment-13443632
 ] 

Aniket Mokashi commented on PIG-2819:
-

I discussed this briefly with Julien during the hackathon. This is useful for an 
HCatLoader-like use case (deserializing InputJobInfo). Do you guys have a patch 
for this?

 ObjectSerializer should support classloader
 ---

 Key: PIG-2819
 URL: https://issues.apache.org/jira/browse/PIG-2819
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Raghu Angadi

 {{ObjectSerializer}} is pretty useful and could be used by UDFs and other user 
 code.
 Currently its limitation is that the class being deserialized must 
 be visible to the root class loader (i.e. be part of the CLASSPATH on the front 
 end). The registered jars are not visible. This is because the 
 {{java.io.ObjectInputStream}} used to deserialize is from the root 
 classloader.
 ObjectSerializer should support another method, {{deserialize(str, 
 ClassLoader)}}.
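
A common way to get this behaviour in plain Java is to subclass java.io.ObjectInputStream and override resolveClass so that classes are looked up through a caller-supplied loader. The sketch below shows that technique only. It is not the Pig patch: the class name and the serialize, deserialize and roundTrip helpers are hypothetical, and it works on raw bytes rather than the string form that ObjectSerializer uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

// ObjectInputStream that resolves classes through a caller-supplied loader
class ClassLoaderObjectInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    ClassLoaderObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Try the supplied loader first (e.g. one that sees registered jars)
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            // Fall back to the default lookup, which also handles primitive types
            return super.resolveClass(desc);
        }
    }

    // Hypothetical helper: plain Java serialization to a byte array
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: deserialize using the given class loader
    static Object deserialize(byte[] data, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ClassLoaderObjectInputStream(new ByteArrayInputStream(data), loader)) {
            return in.readObject();
        }
    }

    // Convenience for demos: serialize then deserialize, wrapping checked exceptions
    static Object roundTrip(Object obj, ClassLoader loader) {
        try {
            return deserialize(serialize(obj), loader);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A caller would pass whichever loader can see the registered jars. How Pig exposes that loader is outside the scope of this sketch.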


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

2012-08-28 Thread Cheolsoo Park (JIRA)

[
https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443636#comment-13443636
]

Cheolsoo Park commented on PIG-2886:

Hi Ted,

Regarding TestHBaseStorage, does it hang in hadoop 20 or 23? I assume that
you're not setting -Dhadoopversion so using hadoop 20 by default. In hadoop
20, TestHBaseStorage passes for me with your patch. I.e. ant clean test
-Dtestcase=TestHBaseStorage -Dhadoopversion=20 passes.
{code}
[junit] Running org.apache.pig.test.TestHBaseStorage
[junit] Tests run: 23, Failures: 0, Errors: 0, Time elapsed: 131.728 sec
{code}
If it doesn't pass for you, it is probably an environment issue (e.g. did you
set umask 0022?).

However, it does time out in hadoop 23, and I believe that it's expected since
hbase jar from the maven repository is not binary compatible with hadoop 23.
I.e. ant clean test -Dtestcase=TestHBaseStorage -Dhadoopversion=23 fails with
time out error, and the following error can be found in the test log
(build/test/logs/TEST-org.apache.pig.test.TestHBaseStorage.txt):
{code}
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 7 more
{code}

I ran into the same issue while bumping hbase to 0.94, but it seems to apply to
0.90 (current version in trunk) as well. Please see HBASE-5680 for more details.

Please correct me if I am wrong about TestHBaseStorage in hadoop 23.

Thanks!

Add Scan TimeRange to HBaseStorage
---

Key: PIG-2886
URL: https://issues.apache.org/jira/browse/PIG-2886
Project: Pig
Issue Type: Bug
Reporter: Ted Malaska
Priority: Minor
Attachments: PIG-2886-0.patch, PIG-2886-1.patch

I have a client that wants to use pig. They are using MR now. They can't
use PIG right now because they only want to fetch the last day's worth of
data in HBase. A filter with time range would require reading all the HStore
files. If we hold major compaction until after the fetch and use Scan Time
Range we only need to read very little in compression.


[jira] [Commented] (PIG-1483) [piggybank] Add HadoopJobHistoryLoader to the piggybank

2012-08-28 Thread Aniket Mokashi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443639#comment-13443639
 ] 

Aniket Mokashi commented on PIG-1483:
-

Opened https://issues.apache.org/jira/browse/PIG-2894.

 [piggybank] Add HadoopJobHistoryLoader to the piggybank
 ---

 Key: PIG-1483
 URL: https://issues.apache.org/jira/browse/PIG-1483
 Project: Pig
  Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1483_1.patch, PIG-1483.patch


 PIG-1333 added many script-related entries to the MR job xml file and thus 
 it's now possible to use Pig for querying Hadoop job history/xml files to get 
 script-level usage statistics. What we need is a Pig loader that can parse 
 these files and generate corresponding data objects.
 The goal of this jira is to create a HadoopJobHistoryLoader in piggybank.
 Here is an example that shows the intended usage:
 *Find all the jobs grouped by script and user:*
 {code}
 a = load '/mapred/history/_logs/history/' using HadoopJobHistoryLoader() as 
 (j:map[], m:map[], r:map[]);
 b = foreach a generate (Chararray) j#'PIG_SCRIPT_ID' as id, (Chararray) 
 j#'USER' as user, (Chararray) j#'JOBID' as job; 
 c = filter b by not (id is null);
 d = group c by (id, user);
 e = foreach d generate flatten(group), c.job;
 dump e;
 {code}
 A couple more examples:
 *Find scripts that use only the default parallelism:*
 {code}
 a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], 
 m:map[], r:map[]);
 b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' 
 as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
 c = group b by (id, user, script_name) parallel 10;
 d = foreach c generate group.user, group.script_name, MAX(b.reduces) as 
 max_reduces;
 e = filter d by max_reduces == 1;
 dump e;
 {code}
 *Find the running time of each script (in seconds):*
 {code}
 a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], 
 m:map[], r:map[]);
 b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' 
 as script_name, (Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as 
 end;
 c = group b by (id, user, script_name);
 d = foreach c generate group.user, group.script_name, (MAX(b.end) - 
 MIN(b.start))/1000;
 dump d;
 {code}


[jira] [Commented] (PIG-2886) Add Scan TimeRange to HBaseStorage

2012-08-28 Thread Ted Malaska (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443640#comment-13443640
 ] 

Ted Malaska commented on PIG-2886:
--

Great thanks.  Got it.  

I was first running it on my local machine (no Hadoop) and it would freeze.  Then I tried 
it on CDH4 and it didn't work either.  I will try it on CDH3 tonight.

By the way, do you see anything else in the code I should add or clean up?

I should have time to work on it tonight.

Ted Malaska  

 Add Scan TimeRange to HBaseStorage 
 ---

 Key: PIG-2886
 URL: https://issues.apache.org/jira/browse/PIG-2886
 Project: Pig
  Issue Type: Bug
Reporter: Ted Malaska
Priority: Minor
 Attachments: PIG-2886-0.patch, PIG-2886-1.patch


 I have a client that wants to use pig.  They are using MR now.  They can't 
 use PIG right now because they only want to fetch the last day's worth of 
 data in HBase.  A filter with time range would require reading all the HStore 
 files.  If we hold major compaction until after the fetch and use Scan Time 
 Range we only need to read very little in compression. 


[jira] [Commented] (PIG-2893) fix DBStorage compile issue

2012-08-28 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443785#comment-13443785
 ] 

Alan Gates commented on PIG-2893:
-

+1, patch looks good.

 fix DBStorage compile issue
 ---

 Key: PIG-2893
 URL: https://issues.apache.org/jira/browse/PIG-2893
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
 Attachments: PIG-2893.1.patch


 DBStorage does not compile after the datetime patch was committed. The joda 
 datetime was passed as the argument to java.sql.PreparedStatement.setDate() 
 instead of a java.sql.Date.
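
For context, PreparedStatement.setDate(int, java.sql.Date) has no overload that accepts a java.util.Date, which is what Joda's DateTime.toDate() returns. A minimal sketch of the usual conversion follows. It is not necessarily what the attached patch does, and SqlDateSketch and toSqlDate are hypothetical names.

```java
import java.util.Date;

// Hypothetical helper showing the java.util.Date -> java.sql.Date conversion
class SqlDateSketch {
    // java.sql.Date is constructed from milliseconds since the epoch,
    // so the java.util.Date value is carried over via getTime().
    static java.sql.Date toSqlDate(Date utilDate) {
        return new java.sql.Date(utilDate.getTime());
    }
}
```

With a Joda DateTime named field, the call would then look like ps.setDate(sqlPos, SqlDateSketch.toSqlDate(field.toDate())). If the time of day must be preserved in the database, setTimestamp with a java.sql.Timestamp may be the better fit.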


[jira] [Commented] (PIG-2892) piggybank build failing on trunk

2012-08-28 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443786#comment-13443786
 ] 

Alan Gates commented on PIG-2892:
-

Thejas filed a separate issue for this, PIG-2893, and has also posted a patch 
on that JIRA.  It looks like it handles the date a little differently.  
I'm not sure which is the right solution; you should work with Thejas to figure 
that out.  If it's OK with you, I'll mark this one as a 
duplicate.

 piggybank build failing on trunk
 

 Key: PIG-2892
 URL: https://issues.apache.org/jira/browse/PIG-2892
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Reporter: Alan Gates
Assignee: Cheolsoo Park
Priority: Critical
 Attachments: PIG-2892.patch


 When I try to build Piggybank I get:
 {code}
[javac] 
 /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/build.xml:92: 
 warning: 'includeantruntime' was not set, defaulting to 
 build.sysclasspath=last; set to false for repeatable builds
 [javac] Compiling 159 source files to 
 /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/build/classes
 [javac] 
 /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java:121:
  cannot find symbol
 [javac] symbol  : method setDate(int,java.util.Date)
 [javac] location: interface java.sql.PreparedStatement
 [javac] ps.setDate(sqlPos, ((DateTime) field).toDate());
 {code}


[jira] [Commented] (PIG-2892) piggybank build failing on trunk

2012-08-28 Thread Cheolsoo Park (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443787#comment-13443787
 ] 

Cheolsoo Park commented on PIG-2892:


Hi Alan, I looked at PIG-2893, and his patch seems good to me. In addition, he 
updated the test case. Please go ahead and close this as a duplicate. Thanks!

 piggybank build failing on trunk
 

 Key: PIG-2892
 URL: https://issues.apache.org/jira/browse/PIG-2892
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Reporter: Alan Gates
Assignee: Cheolsoo Park
Priority: Critical
 Attachments: PIG-2892.patch


 When I try to build Piggybank I get:
 {code}
[javac] 
 /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/build.xml:92: 
 warning: 'includeantruntime' was not set, defaulting to 
 build.sysclasspath=last; set to false for repeatable builds
 [javac] Compiling 159 source files to 
 /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/build/classes
 [javac] 
 /grid/0/hortonal/src/pig/top/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java:121:
  cannot find symbol
 [javac] symbol  : method setDate(int,java.util.Date)
 [javac] location: interface java.sql.PreparedStatement
 [javac] ps.setDate(sqlPos, ((DateTime) field).toDate());
 {code}


[jira] [Updated] (PIG-2888) Improve performance of POPartialAgg

2012-08-28 Thread Dmitriy V. Ryaboy (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2888:
---

Attachment: partialagg_patch_5.patch

 Improve performance of POPartialAgg
 ---

 Key: PIG-2888
 URL: https://issues.apache.org/jira/browse/PIG-2888
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch, 
 partialagg_patch_3.patch, partialagg_patch_4.patch, partialagg_patch_5.patch


 During performance testing, we found that POPartialAgg can cause performance 
 degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't 
 well suited to the operator's assumptions. Changing the implementation to a 
 more flexible hash-based model can provide significant performance 
 improvements.


[jira] [Commented] (PIG-2888) Improve performance of POPartialAgg

2012-08-28 Thread Dmitriy V. Ryaboy (JIRA)

[
https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443793#comment-13443793
]

Dmitriy V. Ryaboy commented on PIG-2888:

bq. There's a pig.exec.nocombiner that was not replaced by a constant.

Fixed.

bq. It would be nice to have a consistent way of getting booleans (and floats)
from the conf

Feels like scope creep.. maybe in another ticket? I don't want to get into how
to design that around Properties, Configurations, and PigConfigurations.

bq. some of the class description was still applicable
Added better docs.

bq. what is the reason for this particular value?

Bad math :). Fixed the math and added an explanation of how I got there.

bq. Don't you want a visitor to just list them all once and set the count? That
way you would not have to worry about keeping a reference on them.

I could do that, but this feels much cleaner -- no visitors, no serialization,
no changes to the MRCompiler/JCCompiler, very self-contained, and works at
runtime instead of having to be preset by the planner.

bq. +0.5 so that it is never 0 ? Math.min(1, ...) is more readable.

No, +0.5 so that it's a round() instead of floor()
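
To illustrate the distinction: casting a non-negative double to int truncates, so adding 0.5 first turns the cast into round-to-nearest. The sketch below assumes non-negative inputs, and the names are hypothetical rather than taken from the patch.

```java
// Illustrative only: shows why "+ 0.5" before an int cast rounds instead of flooring
class RoundingSketch {
    // Plain cast: drops the fractional part (floor for non-negative values)
    static int floorCast(double x) {
        return (int) x;
    }

    // Adding 0.5 before the cast rounds to the nearest integer (non-negative values)
    static int roundCast(double x) {
        return (int) (x + 0.5);
    }
}
```

Rounding this way can still produce 0 for inputs below 0.5, so a lower bound of 1 would need a separate clamp such as Math.max(1, n).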

bq. LOG.info() should be wrapped in if (LOG.isInfoEnabled()) { ... } for perf
Done for places where it matters (functions invoked more than once and messages
where args are not constants)
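
The guard pattern being discussed can be sketched as follows. To stay dependency-free this uses java.util.logging, where isLoggable(Level.INFO) plays the role of the isInfoEnabled() check mentioned above. LogGuardSketch, describe and the formatted counter are hypothetical and exist only to show that the message is not built when the level is off.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative only: skip building a log message when INFO is disabled
class LogGuardSketch {
    private static final Logger LOG = Logger.getLogger(LogGuardSketch.class.getName());

    // Counts how many times the (potentially expensive) message was built
    static int formatted = 0;

    static void setLevel(Level level) {
        LOG.setLevel(level);
    }

    // Stand-in for message construction with non-constant arguments
    static String describe(int records, long bytes) {
        formatted++;
        return "processed " + records + " records, " + bytes + " bytes";
    }

    static void report(int records, long bytes) {
        // Without the guard, describe() would run even when INFO is off
        if (LOG.isLoggable(Level.INFO)) {
            LOG.info(describe(records, bytes));
        }
    }
}
```

The guard matters most in code paths invoked per record or per batch. For a constant message logged once, the check saves nothing.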

bq.in aggregateSecondLevel() can't the processedInputMap be reused?

No -- aggregate() adds to the list of tuples in the target map, we want to
overwrite in this case.

bq. in getMinOutputReductionFromProp(), if minReduction = 0 it should throw an
exception.

Added a log message instead.

Improve performance of POPartialAgg
---

Key: PIG-2888
URL: https://issues.apache.org/jira/browse/PIG-2888
Project: Pig
Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch,
partialagg_patch_3.patch, partialagg_patch_4.patch, partialagg_patch_5.patch

During performance testing, we found that POPartialAgg can cause performance
degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't
well suited to the operator's assumptions. Changing the implementation to a
more flexible hash-based model can provide significant performance
improvements.

[jira] [Commented] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-28 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443811#comment-13443811
 ] 

Alan Gates commented on PIG-2895:
-

When I run the e2e tests I am still seeing an error, even once this patch is 
applied.

 jodatime jar missing in pig-withouthadoop.jar
 -

 Key: PIG-2895
 URL: https://issues.apache.org/jira/browse/PIG-2895
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11

 Attachments: PIG-2895.1.patch


 jodatime jar is missing in pig-withouthadoop.jar. When an external hadoop.jar 
 is used, pig will fail with class not found error.

