[jira] Assigned: (PIG-882) log level not propagated to loggers

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-882:
--

Assignee: Daniel Dai

> log level not propagated to loggers 
> 
>
> Key: PIG-882
> URL: https://issues.apache.org/jira/browse/PIG-882
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, 
> PIG-882-4.patch, PIG-882-5.patch
>
>
> Pig accepts a log level as a parameter, but the level it captures is not 
> applied appropriately, so loggers in different classes do not log at the 
> specified level.
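
A minimal sketch of what propagating a level can look like, using java.util.logging from the JDK purely as a stand-in. The thread does not show Pig's logging backend or the contents of the attached patches, so the class and method names below are hypothetical. The idea is to apply the requested level to the root logger so that per-class loggers without an explicit level of their own inherit it.

```java
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not Pig code: pushes one level down to every logger
// that has not been given an explicit level of its own.
class LogLevelPropagation {

    // Parse a level name such as "fine" or "warning" and set it on the root
    // logger (the logger named ""). Child loggers inherit the effective level.
    static void applyLevel(String levelName) {
        Level level = Level.parse(levelName.toUpperCase(Locale.ROOT));
        Logger.getLogger("").setLevel(level);
    }

    // Report whether a logger with the given name would emit at the given level.
    static boolean wouldLog(String loggerName, Level level) {
        return Logger.getLogger(loggerName).isLoggable(level);
    }

    public static void main(String[] args) {
        applyLevel("warning");
        System.out.println(wouldLog("org.example.SomeClass", Level.INFO));
    }
}
```

Setting the level once at the root keeps the individual classes unaware of the command-line parameter; a class that sets its own level explicitly would still override it.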

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-882) log level not propagated to loggers

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-882:
---

Fix Version/s: 0.4.0
Affects Version/s: 0.3.0
   Status: Patch Available  (was: In Progress)

> log level not propagated to loggers 
> 
>
> Key: PIG-882
> URL: https://issues.apache.org/jira/browse/PIG-882
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, 
> PIG-882-4.patch, PIG-882-5.patch
>
>
> Pig accepts a log level as a parameter, but the level it captures is not 
> applied appropriately, so loggers in different classes do not log at the 
> specified level.




zookeeper patch builds

2009-07-29 Thread Giridharan Kesavan
Looks like the Hudson space issue is resolved; I've restarted the zookeeper patch 
build jobs.

-Giri


[jira] Updated: (PIG-882) log level not propagated to loggers

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-882:
---

Attachment: PIG-882-5.patch

> log level not propagated to loggers 
> 
>
> Key: PIG-882
> URL: https://issues.apache.org/jira/browse/PIG-882
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
> Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, 
> PIG-882-4.patch, PIG-882-5.patch
>
>
> Pig accepts a log level as a parameter, but the level it captures is not 
> applied appropriately, so loggers in different classes do not log at the 
> specified level.




[jira] Updated: (PIG-882) log level not propagated to loggers

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-882:
---

Status: In Progress  (was: Patch Available)

> log level not propagated to loggers 
> 
>
> Key: PIG-882
> URL: https://issues.apache.org/jira/browse/PIG-882
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
> Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, 
> PIG-882-4.patch, PIG-882-5.patch
>
>
> Pig accepts a log level as a parameter, but the level it captures is not 
> applied appropriately, so loggers in different classes do not log at the 
> specified level.




[jira] Commented: (PIG-882) log level not propagated to loggers

2009-07-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737006#action_12737006
 ] 

Hadoop QA commented on PIG-882:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12414928/PIG-882-4.patch
  against trunk revision 799141.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/145/testReport/
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/145/console

This message is automatically generated.

> log level not propagated to loggers 
> 
>
> Key: PIG-882
> URL: https://issues.apache.org/jira/browse/PIG-882
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
> Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, 
> PIG-882-4.patch
>
>
> Pig accepts a log level as a parameter, but the level it captures is not 
> applied appropriately, so loggers in different classes do not log at the 
> specified level.




[jira] Commented: (PIG-833) Storage access layer

2009-07-29 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736998#action_12736998
 ] 

Raghu Angadi commented on PIG-833:
--

There will be benchmark results either attached to this jira or to a subsequent 
jira.

I would like to compare against SequenceFiles and the new format in Hive. We 
should see on-par performance.

Major performance benefits come from commonly used projections (through column 
groups) and map-side joins of sorted tables. An important part of the motivation 
is features such as column security and the ability to delete entire columns.

We are running some larger-scale benchmarks internally, but these run on 
Yahoo's internal data sources.


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Created: (PIG-898) TextDataParser does not handle delimiters from one complex type in another

2009-07-29 Thread Santhosh Srinivasan (JIRA)
TextDataParser does not handle delimiters from one complex type in another
--

 Key: PIG-898
 URL: https://issues.apache.org/jira/browse/PIG-898
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: 0.4.0


Currently, TextDataParser does not handle delimiters of one complex type inside 
another. For example, key1(#value1} will not be parsed correctly. The production 
for strings matches any sequence of characters that does not contain any 
delimiters for the complex types.




[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Status: Patch Available  (was: Open)

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa' ;
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)




[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Attachment: PIG-880.patch

Attached patch creates maps with value type set to DataByteArray (i.e., 
bytearray) for text data parsed by PigStorage. This change is consistent with 
the language semantics of treating value type as bytearray. New test cases have 
been added.
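
The stack trace in this issue ends in ArrayList.toArray calling Arrays.copyOf and System.arraycopy, which is where Java raises ArrayStoreException when a list element is not assignable to the component type of the target array. The sketch below reproduces only that JDK behaviour. The class name is hypothetical, and the idea that the values in the failing script had mixed runtime types is an inference from the patch description above, not something the thread states outright.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo, not Pig code: shows how copying a list into a typed
// array fails when the elements do not all share the array's component type.
class ToArrayMismatch {

    // Returns true if copying the list into a String[] raises
    // ArrayStoreException, false if the copy succeeds.
    static boolean copyFails(List<Object> values) {
        try {
            values.toArray(new String[0]);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Object> mixed = new ArrayList<>();
        mixed.add("alice");          // a String
        mixed.add(new byte[] {42});  // a raw byte array, not a String
        System.out.println(copyFails(mixed));
    }
}
```

Giving every value the same runtime type, as the attached patch is described as doing with DataByteArray, removes this class of failure because the target array's component type then matches every element.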

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa' ;
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)




[jira] Assigned: (PIG-880) Order by is broken with complex fields

2009-07-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan reassigned PIG-880:
---

Assignee: Santhosh Srinivasan

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa' ;
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)




[jira] Created: (PIG-897) Pig should support counters

2009-07-29 Thread Santhosh Srinivasan (JIRA)
Pig should support counters
---

 Key: PIG-897
 URL: https://issues.apache.org/jira/browse/PIG-897
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
Reporter: Santhosh Srinivasan
 Fix For: 0.4.0


Pig should support the use of counters. The use of the counters can possibly be 
via the script or via Java APIs.




[jira] Resolved: (PIG-889) Pig can not access reporter of PigHadoopLog in Load Func

2009-07-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan resolved PIG-889.
-

  Resolution: Won't Fix
Release Note: As per the discussion with Jeff, closing the bug as won't fix

> Pig can not access reporter of PigHadoopLog in Load Func
> 
>
> Key: PIG-889
> URL: https://issues.apache.org/jira/browse/PIG-889
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_889_Patch.txt
>
>
> I'd like to increment a Counter in my own LoadFunc, but doing so throws a 
> NullPointerException. It seems that the reporter is not initialized. 
> I looked into this problem and found that PigInputFormat needs to call 
> PigHadoopLogger.getInstance().setReporter(reporter).




[jira] Commented: (PIG-889) Pig can not access reporter of PigHadoopLog in Load Func

2009-07-29 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736990#action_12736990
 ] 

Santhosh Srinivasan commented on PIG-889:
-

PigHadoopLogger implements the PigLogger interface. As part of the 
implementation it uses the Hadoop reporter for aggregating the warning messages.

> Pig can not access reporter of PigHadoopLog in Load Func
> 
>
> Key: PIG-889
> URL: https://issues.apache.org/jira/browse/PIG-889
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_889_Patch.txt
>
>
> I'd like to increment a Counter in my own LoadFunc, but doing so throws a 
> NullPointerException. It seems that the reporter is not initialized. 
> I looked into this problem and found that PigInputFormat needs to call 
> PigHadoopLogger.getInstance().setReporter(reporter).




[jira] Commented: (PIG-833) Storage access layer

2009-07-29 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736964#action_12736964
 ] 

Jeff Hammerbacher commented on PIG-833:
---

Hey Raghu,

Good stuff! Do you guys have any internal benchmarks that you could add to the 
docs on design and usage?

Thanks,
Jeff

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




[jira] Updated: (PIG-885) New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, HashFNV, DiffDate)

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-885:
---

Attachment: PIG-885-8.patch

Add NullPointerException check

> New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, 
> HashFNV, DiffDate)
> 
>
> Key: PIG-885
> URL: https://issues.apache.org/jira/browse/PIG-885
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: PIG-885-2.patch, PIG-885-3.patch, PIG-885-4.patch, 
> PIG-885-5.patch, PIG-885-6.patch, PIG-885-7.patch, PIG-885-8.patch, 
> PIG-885.patch
>
>
> Bunch of UDFs:
> 1. Bin -- Converts a continuous value into discrete values
> 2. Decode -- Converts a given attribute or expression into another string 
> value, based on the value of the source attribute
> 3. LookupInFiles -- Checks for the existence of an expression in a series of 
> text files
> 4. RegexExtract and RegexMatch -- Similar to Perl regexes
> 5. HashFNV -- An implementation of the FNV hash
> 6. DiffDate -- Calculates the number of days between two dates
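
For reference, a minimal sketch of an FNV hash. The issue does not say which variant or width the HashFNV UDF implements, so the 32-bit FNV-1a form below, the class name, and the method signatures are assumptions and not the piggybank code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of 32-bit FNV-1a; not taken from the PIG-885 patches.
class Fnv1a32 {
    private static final int OFFSET_BASIS = 0x811c9dc5; // 2166136261 unsigned
    private static final int PRIME = 0x01000193;        // 16777619

    // FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
    // Integer overflow wraps, which gives the required arithmetic mod 2^32.
    static int hash(byte[] data) {
        int h = OFFSET_BASIS;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= PRIME;
        }
        return h;
    }

    // Convenience overload that hashes the UTF-8 bytes of a string.
    static int hash(String s) {
        return hash(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(hash("a")));
    }
}
```

The older FNV-1 variant differs only in the order of the two steps inside the loop (multiply first, then XOR).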




[jira] Updated: (PIG-892) Make COUNT and AVG deal with nulls according to the SQL standard

2009-07-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-892:
---

Attachment: PIG-892_v3.patch

Patch addressing the comments from Santhosh

> Make COUNT and AVG deal with nulls according to the SQL standard
> ---
>
> Key: PIG-892
> URL: https://issues.apache.org/jira/browse/PIG-892
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: 0.4.0
>
> Attachments: PIG-892.patch, PIG-892_v2.patch, PIG-892_v3.patch
>
>
> Both COUNT and AVG need to ignore nulls. Also add COUNT_STAR to match 
> COUNT(*) in SQL.
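
A small sketch of the semantics requested here, written against plain Java lists and not Pig's aggregate UDF interfaces. The class and method names are hypothetical, and returning null for an average with no non-null inputs follows SQL; it is an assumption about what the attached patches do.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of SQL-style null handling; not the Pig builtins.
class NullAwareAggregates {

    // COUNT(col): counts only the non-null values.
    static long count(List<Double> values) {
        long n = 0;
        for (Double v : values) {
            if (v != null) {
                n++;
            }
        }
        return n;
    }

    // COUNT(*): counts every row, null or not.
    static long countStar(List<Double> values) {
        return values.size();
    }

    // AVG(col): nulls contribute to neither the sum nor the divisor.
    static Double avg(List<Double> values) {
        double sum = 0.0;
        long n = 0;
        for (Double v : values) {
            if (v != null) {
                sum += v;
                n++;
            }
        }
        if (n == 0) {
            return null; // no non-null input, so the average is undefined
        }
        return sum / n;
    }

    public static void main(String[] args) {
        List<Double> col = Arrays.asList(1.0, null, 3.0);
        System.out.println(count(col) + " " + countStar(col) + " " + avg(col));
    }
}
```

The difference matters most for AVG: dividing by the row count instead of the non-null count would silently pull the average toward zero.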




[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig

2009-07-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-792:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

The code has been committed. Thanks, Sri and Ying, for this important 
contribution.

> PERFORMANCE: Support skewed join in pig
> ---
>
> Key: PIG-792
> URL: https://issues.apache.org/jira/browse/PIG-792
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: skewedjoin.patch
>
>
> Fragmented replicated join has a few limitations:
>  - One of the tables needs to be loaded into memory
>  - Join is limited to two tables
> Skewed join partitions the table and joins the records in the reduce phase. 
> It computes a histogram of the key space to account for skewing in the input 
> records. Further, it adjusts the number of reducers depending on the key 
> distribution.
> We need to implement the skewed join in pig.
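
A rough sketch of the histogram idea described above: count how often each join key appears in a sample, then give heavily used keys more than one reducer. The sampling input, the per-reducer threshold, and all names are hypothetical; the committed skewedjoin.patch is not shown in this thread and may work quite differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of key-frequency based reducer allocation.
class SkewHistogram {

    // Build a histogram of key frequencies from a sample of join keys.
    static Map<String, Integer> histogram(List<String> sampledKeys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : sampledKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Number of reducers to spread one key over: ceil(count / capacity),
    // with a floor of one reducer per key.
    static int reducersFor(int keyCount, int maxRecordsPerReducer) {
        return Math.max(1, (keyCount + maxRecordsPerReducer - 1) / maxRecordsPerReducer);
    }

    public static void main(String[] args) {
        Map<String, Integer> h = histogram(Arrays.asList("a", "a", "a", "b"));
        for (Map.Entry<String, Integer> e : h.entrySet()) {
            System.out.println(e.getKey() + " -> " + reducersFor(e.getValue(), 2));
        }
    }
}
```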




[jira] Updated: (PIG-885) New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, HashFNV, DiffDate)

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-885:
---

Attachment: PIG-885-7.patch

Add null checking to all applicable UDFs

> New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, 
> HashFNV, DiffDate)
> 
>
> Key: PIG-885
> URL: https://issues.apache.org/jira/browse/PIG-885
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: PIG-885-2.patch, PIG-885-3.patch, PIG-885-4.patch, 
> PIG-885-5.patch, PIG-885-6.patch, PIG-885-7.patch, PIG-885.patch
>
>
> Bunch of UDFs:
> 1. Bin -- Converts a continuous value into discrete values
> 2. Decode -- Converts a given attribute or expression into another string 
> value, based on the value of the source attribute
> 3. LookupInFiles -- Checks for the existence of an expression in a series of 
> text files
> 4. RegexExtract and RegexMatch -- Similar to Perl regexes
> 5. HashFNV -- An implementation of the FNV hash
> 6. DiffDate -- Calculates the number of days between two dates




[jira] Updated: (PIG-882) log level not propagated to loggers

2009-07-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-882:
---

Attachment: PIG-882-4.patch

Sync with latest trunk

> log level not propagated to loggers 
> 
>
> Key: PIG-882
> URL: https://issues.apache.org/jira/browse/PIG-882
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
> Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, 
> PIG-882-4.patch
>
>
> Pig accepts a log level as a parameter, but the level it captures is not 
> applied appropriately, so loggers in different classes do not log at the 
> specified level.




[jira] Updated: (PIG-513) PERFORMANCE: optimize some of the code in DefaultTuple

2009-07-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-513:
-

Status: Patch Available  (was: Reopened)

> PERFORMANCE: optimize some of the code in DefaultTuple
> --
>
> Key: PIG-513
> URL: https://issues.apache.org/jira/browse/PIG-513
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-513.patch, pig-513_2.patch
>
>
> The following areas in DefaultTuple.java can be changed:
> The member methods get(), set(), getType() and isNull() all call 
> checkBounds(), which is a redundant call since all four of these methods throw 
> ExecException. Instead of doing a bounds check, we can catch the 
> IndexOutOfBoundsException in a try-catch and rethrow it as an ExecException.
> The write() method has the following unused object (d in the code below):
> {code}
> for (int i = 0; i < sz; i++) {
>     try {
>         Object d = get(i);
>     } catch (ExecException ee) {
>         throw new RuntimeException(ee);
>     }
>     DataReaderWriter.writeDatum(out, mFields.get(i));
> }
> {code}
> {noformat}
> The get(i) call in the try should be replaced by the writeDatum call directly, 
> since d is never used and there is an unnecessary call to get().
> {noformat}




[jira] Updated: (PIG-513) PERFORMANCE: optimize some of the code in DefaultTuple

2009-07-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-513:
-

Attachment: pig-513_2.patch

I encountered the same issue of wasted work in checkBounds() while profiling 
the Merge Join. Since Java performs bounds checks anyway when accessing 
elements of an ArrayList, this method call duplicates that work. In this 
particular case, 6% of the total query time is spent in this method call. 
Attaching the patch generated against current trunk.
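
The approach proposed in the issue (drop the explicit bounds check and translate the exception that ArrayList already throws) can be sketched as follows. This is a simplified stand-in and not DefaultTuple itself: the class, the nested ExecException, and the message text are hypothetical, and the attached pig-513_2.patch may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tuple backed by an ArrayList.
class TupleSketch {

    // Stand-in for Pig's checked ExecException.
    static class ExecException extends Exception {
        ExecException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final List<Object> fields = new ArrayList<>();

    void append(Object value) {
        fields.add(value);
    }

    // No up-front checkBounds(): ArrayList.get already validates the index,
    // so the cost of translating the failure is paid only on the error path.
    Object get(int fieldNum) throws ExecException {
        try {
            return fields.get(fieldNum);
        } catch (IndexOutOfBoundsException e) {
            throw new ExecException(
                "Request for field number " + fieldNum
                    + " exceeds tuple size of " + fields.size(), e);
        }
    }

    public static void main(String[] args) throws ExecException {
        TupleSketch t = new TupleSketch();
        t.append("x");
        System.out.println(t.get(0));
    }
}
```

The trade-off is that the common, in-range access does a single index check, while an out-of-range access becomes slightly more expensive because an exception is created and wrapped.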

> PERFORMANCE: optimize some of the code in DefaultTuple
> --
>
> Key: PIG-513
> URL: https://issues.apache.org/jira/browse/PIG-513
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-513.patch, pig-513_2.patch
>
>
> The following areas in DefaultTuple.java can be changed:
> The member methods get(), set(), getType() and isNull() all call 
> checkBounds(), which is a redundant call since all four of these methods throw 
> ExecException. Instead of doing a bounds check, we can catch the 
> IndexOutOfBoundsException in a try-catch and rethrow it as an ExecException.
> The write() method has the following unused object (d in the code below):
> {code}
> for (int i = 0; i < sz; i++) {
>     try {
>         Object d = get(i);
>     } catch (ExecException ee) {
>         throw new RuntimeException(ee);
>     }
>     DataReaderWriter.writeDatum(out, mFields.get(i));
> }
> {code}
> {noformat}
> The get(i) call in the try should be replaced by the writeDatum call directly, 
> since d is never used and there is an unnecessary call to get().
> {noformat}
