[jira] Updated: (PIG-1843) NPE in schema generation
[ https://issues.apache.org/jira/browse/PIG-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1843:

    Attachment: PIG-1843-1.patch

> NPE in schema generation
>
>                 Key: PIG-1843
>                 URL: https://issues.apache.org/jira/browse/PIG-1843
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.0, 0.9.0
>
>         Attachments: PIG-1843-1.patch
>
> Hit an NPE in the following script:
> {code}
> a = load 'table_testBagDereferenceInMiddle2' as (a0:chararray);
> b = foreach a generate MapGenerate(STRSPLIT(a0).$0);
> {code}
> {code}
> public class MapGenerate extends EvalFunc {
>     @Override
>     public Map exec(Tuple input) throws IOException {
>         Map m = new HashMap();
>         m.put("key", new Integer(input.size()));
>         return m;
>     }
>
>     @Override
>     public Schema outputSchema(Schema input) {
>         return new Schema(new Schema.FieldSchema(getSchemaName("parselong", input), DataType.MAP));
>     }
> }
> {code}
> Error message:
> Caused by: java.lang.NullPointerException
>     at org.apache.pig.EvalFunc.getSchemaName(EvalFunc.java:76)
>     at string.PARSELONG.outputSchema(PARSELONG.java:63)
>     at org.apache.pig.newplan.logical.expression.UserFuncExpression.getFieldSchema(UserFuncExpression.java:154)
>     at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:192)
>     at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:143)
>     at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:71)
>     at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:104)
>     at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:93)
>     at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:73)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:279)
>     at org.apache.pig.PigServer.compilePp(PigServer.java:1480)
>     at org.apache.pig.PigServer.explain(PigServer.java:1042)

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (PIG-1843) NPE in schema generation
[ https://issues.apache.org/jira/browse/PIG-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990886#comment-12990886 ]

Daniel Dai commented on PIG-1843:

The problem happens when we have nested UDF:
1. Inner UDF does not define complete outputSchema
2. Outer UDF does not define getArgToFuncMapping
3. outputSchema in outer UDF uses inner schema to infer alias
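The failure mode in the comment on PIG-1843 can be illustrated without Pig on the classpath. The sketch below is not Pig's code and not the attached patch: `FieldSketch`, `schemaName`, and `safeSchemaName` are hypothetical stand-ins, and it assumes the NPE arises because the outer UDF receives a missing input schema (or a field with no alias) from the inner UDF. It only shows one defensive way an outer UDF could derive an output alias under that assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a schema field; NOT org.apache.pig's Schema.FieldSchema.
class FieldSketch {
    final String alias; // may be null when the inner UDF declares no alias

    FieldSketch(String alias) {
        this.alias = alias;
    }
}

class SchemaNameSketch {
    // Naive version: dereferences the input schema without checking it,
    // so a null schema from the inner UDF triggers a NullPointerException.
    static String schemaName(String name, List<FieldSketch> input) {
        return name + "_" + input.get(0).alias.toLowerCase();
    }

    // Defensive version: falls back to the bare name when the inner schema
    // is missing, empty, or has no alias on its first field.
    static String safeSchemaName(String name, List<FieldSketch> input) {
        if (input == null || input.isEmpty()
                || input.get(0) == null || input.get(0).alias == null) {
            return name;
        }
        return name + "_" + input.get(0).alias.toLowerCase();
    }

    public static void main(String[] args) {
        List<FieldSketch> complete = Arrays.asList(new FieldSketch("a0"));
        System.out.println(safeSchemaName("parselong", complete)); // parselong_a0
        System.out.println(safeSchemaName("parselong", null));     // parselong
    }
}
```

Whether the real fix belongs in the UDF or in Pig's schema resetting is a question for the patch itself; the sketch only makes the null path visible.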
[jira] Created: (PIG-1843) NPE in schema generation
NPE in schema generation

                 Key: PIG-1843
                 URL: https://issues.apache.org/jira/browse/PIG-1843
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.9.0, 0.8.0

Hit an NPE in the following script:
{code}
a = load 'table_testBagDereferenceInMiddle2' as (a0:chararray);
b = foreach a generate MapGenerate(STRSPLIT(a0).$0);
{code}
{code}
public class MapGenerate extends EvalFunc {
    @Override
    public Map exec(Tuple input) throws IOException {
        Map m = new HashMap();
        m.put("key", new Integer(input.size()));
        return m;
    }

    @Override
    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(getSchemaName("parselong", input), DataType.MAP));
    }
}
{code}
Error message:
Caused by: java.lang.NullPointerException
    at org.apache.pig.EvalFunc.getSchemaName(EvalFunc.java:76)
    at string.PARSELONG.outputSchema(PARSELONG.java:63)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.getFieldSchema(UserFuncExpression.java:154)
    at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:192)
    at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:143)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:71)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:104)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:93)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:73)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:279)
    at org.apache.pig.PigServer.compilePp(PigServer.java:1480)
    at org.apache.pig.PigServer.explain(PigServer.java:1042)
[jira] Commented: (PIG-1793) Add macro expansion to Pig Latin
[ https://issues.apache.org/jira/browse/PIG-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990842#comment-12990842 ]

Richard Ding commented on PIG-1793:

Unit tests pass. The output of test-patch:
{code}
     [exec] -1 overall.
     [exec]
     [exec]     +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included. The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     -1 javac. The applied patch generated 973 javac compiler warnings (more than the trunk's current 962 warnings).
     [exec]
     [exec]     +1 findbugs. The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     -1 release audit. The applied patch generated 514 release audit warnings (more than the trunk's current 513 warnings).
{code}
The release audit warning is HTML related.

> Add macro expansion to Pig Latin
>
>                 Key: PIG-1793
>                 URL: https://issues.apache.org/jira/browse/PIG-1793
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.9.0
>
>         Attachments: PIG-1793.patch
>
> As production Pig scripts grow longer and longer, Pig Latin has a need to
> integrate standard programming techniques of separation and code sharing
> offered by functions and modules. A proposal of adding macro expansion to
> Pig Latin is posted here: http://wiki.apache.org/pig/TuringCompletePig
> Below is a brief summary of the proposed syntax (and examples):
>
>    * Macro Definition
> The existing DEFINE keyword will be expanded to allow definitions of Pig
> macros.
> *Syntax*
> {code}
> define () returns {
>
> };
> {code}
> *Example*
> {code}
> define my_macro(A, sortkey) returns C {
>     B = filter $A by my_filter(*);
>     $C = order B by $sortkey;
> }
> {code}
>
>    * Macro Expansion
> *Syntax*
> {code}
> = ();
> {code}
> *Example:* Use above macro in a Pig script:
> {code}
> X = load 'foo' as (user, address, phone);
> Y = my_macro(X, user);
> store Y into 'bar';
> {code}
> This script is expanded into the following Pig Latin statements:
> {code}
> X = load 'foo' as (user, address, phone);
> macro_my_macro_B_1 = filter X by my_filter(*);
> Y = order macro_my_macro_B_1 by user;
> store Y into 'bar';
> {code}
> *Notes*
> 1. Any alias in the macro which isn't visible from outside will be prefixed
> with macro name and suffixed with instance id to avoid namespace collision.
> 2. Macro expansion is not a complete replacement for function calls.
> Recursive expansions are not supported.
>
>    * Macro Import
> The new IMPORT keyword can be used to add macros defined in another Pig Latin
> file.
> *Syntax*
> {code}
> import ;
> {code}
> *Example*
> {code}
> import my_macro.pig;
> {code}
> *Note:* All macro names are in the global namespace.
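The renaming rule in the notes above (macro-local aliases get the macro name as a prefix and the instance id as a suffix, while `$` parameters are replaced by the caller's arguments) can be sketched as plain text substitution. This is only an illustration under that reading of the proposal, not the parser-based implementation in PIG-1793.patch; the class `MacroExpansionSketch` and its `expand` method are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MacroExpansionSketch {
    // Expands one macro invocation by textual substitution.
    // bindings maps parameter and return names (without '$') to caller aliases.
    static String expand(String macroName, int instanceId, String body,
                         Map<String, String> bindings, String... localAliases) {
        String out = body;
        // Rename macro-local aliases first: macro_<name>_<alias>_<instance id>.
        // The lookbehind skips "$B"-style references and longer identifiers.
        for (String alias : localAliases) {
            String renamed = "macro_" + macroName + "_" + alias + "_" + instanceId;
            out = out.replaceAll("(?<![\\w$])" + Pattern.quote(alias) + "\\b",
                                 Matcher.quoteReplacement(renamed));
        }
        // Then substitute each $parameter with the caller's argument.
        for (Map.Entry<String, String> e : bindings.entrySet()) {
            out = out.replaceAll("\\$" + Pattern.quote(e.getKey()) + "\\b",
                                 Matcher.quoteReplacement(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("A", "X");
        bindings.put("sortkey", "user");
        bindings.put("C", "Y");
        String body = "B = filter $A by my_filter(*);\n$C = order B by $sortkey;";
        System.out.println(expand("my_macro", 1, body, bindings, "B"));
    }
}
```

Run against the `my_macro` example, this produces the two middle statements of the expanded script quoted above. A regex pass like this would mis-handle aliases inside string literals, which is one reason a real implementation would work on the parsed script instead.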
[jira] Commented: (PIG-1793) Add macro expansion to Pig Latin
[ https://issues.apache.org/jira/browse/PIG-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990818#comment-12990818 ]

Richard Ding commented on PIG-1793:

Attaching the patch that implements the above proposed Pig syntax. The only change is the IMPORT statement which now requires the file name be a quoted string:
{code}
import 'my_macro.pig';
{code}
[jira] Updated: (PIG-1793) Add macro expansion to Pig Latin
[ https://issues.apache.org/jira/browse/PIG-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1793:

    Attachment: PIG-1793.patch
[jira] Commented: (PIG-1825) ability to turn off the write ahead log for pig's HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990759#comment-12990759 ]

Alan Gates commented on PIG-1825:

Unit tests pass. The output of test-patch:

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included. The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs. The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit. The applied patch does not increase the total number of release audit warnings.

As this points out, the functionality isn't tested. Before we can check it in we'll need a test added to the hbase unit tests that shows that you can write to hbase with this option set.

> ability to turn off the write ahead log for pig's HBaseStorage
>
>                 Key: PIG-1825
>                 URL: https://issues.apache.org/jira/browse/PIG-1825
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Corbin Hoenes
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: HBaseStorage_noWAL.patch
>
> Added an option to allow a caller of HBaseStorage to turn off the
> WriteAheadLog feature while doing bulk loads into hbase.
> From the performance tuning wikipage:
> http://wiki.apache.org/hadoop/PerformanceTuning
> "To speed up the inserts in a non critical job (like an import job), you can
> use Put.writeToWAL(false) to bypass writing to the write ahead log."
> We've tested this on HBase 0.20.6 and it helps dramatically.
> The -noWAL option is passed in just like other options for hbase storage:
> STORE myalias INTO 'MyTable' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycolumnfamily:field1
> mycolumnfamily:field2','-noWAL');
> This would be my first patch so please educate me with any steps I need to
> do.
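How a storage function might turn an options string such as '-noWAL' into a write-ahead-log setting can be sketched in plain Java. This is not HBaseStorage's actual option handling and not the attached patch; `StorageOptionsSketch` and its `writeToWal` field are hypothetical, and only the `-noWAL` flag name comes from the issue.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StorageOptionsSketch {
    // true unless the caller passed -noWAL; the default keeps the log enabled
    final boolean writeToWal;

    StorageOptionsSketch(String optString) {
        Set<String> flags = new HashSet<>();
        if (optString != null && !optString.trim().isEmpty()) {
            // Options are assumed to be whitespace-separated tokens.
            flags.addAll(Arrays.asList(optString.trim().split("\\s+")));
        }
        this.writeToWal = !flags.contains("-noWAL");
    }

    public static void main(String[] args) {
        System.out.println(new StorageOptionsSketch("-noWAL").writeToWal);
        System.out.println(new StorageOptionsSketch("").writeToWal);
    }
}
```

In a real storage function the resulting flag would be applied to each write, which is the step the wiki quote above refers to; that part depends on the HBase client API and is left out here.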
Re: Pig developer meeting in February
Me too, I am interested in coming,

Romain

On Fri, Jan 28, 2011 at 3:35 PM, Santhosh Srinivasan wrote:

> I am planning to attend.
>
> -----Original Message-----
> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
> Sent: Friday, January 28, 2011 12:58 PM
> To: dev@pig.apache.org
> Subject: RE: Pig developer meeting in February
>
> I believe we have critical mass so the meeting is on!
>
> If you have not responded yet but planning to attend, please, let me know.
>
> Thanks,
>
> Olga
>
> -----Original Message-----
> From: Julien Le Dem [mailto:led...@yahoo-inc.com]
> Sent: Thursday, January 27, 2011 5:21 PM
> To: dev@pig.apache.org
> Subject: Re: Pig developer meeting in February
>
> Me too.
> Julien
>
> On 1/27/11 4:09 PM, "Dmitriy Ryaboy" wrote:
>
> Ok yeah I'll come :).
>
> On Thu, Jan 27, 2011 at 3:17 PM, Olga Natkovich wrote:
>
> > While there is a lively discussion on this thread, I have not actually
> > gotten any responses to having the meeting with exception of 1 person :).
> >
> > Please, let me know by the end of the week if you are planning to attend.
> > If we don't get at least a few more responses I suggest we postpone
> > the meeting.
> >
> > Thanks,
> >
> > Olga
> >
> > -----Original Message-----
> > From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> > Sent: Wednesday, January 26, 2011 6:04 PM
> > To: dev@pig.apache.org
> > Subject: Re: Pig developer meeting in February
> >
> > Right, we do partition filtering, but not true predicate pushdown.
> >
> > On Wed, Jan 26, 2011 at 5:59 PM, Daniel Dai wrote:
> >
> > > Are you talking about LoadMetadata.setPartitionFilter?
> > > PartitionFilterOptimizer will do that.
> > >
> > > Daniel
> > >
> > > Dmitriy Ryaboy wrote:
> > >
> > >> I may be wrong but I think predicate pushdown is designed for, but
> > >> not actually implemented in the current LoadPushdown interface (you
> > >> can only push projections). If I am wrong, that's great.. but if
> > >> not, that would be an important feature to add, as people are trying
> > >> to connect Pig to "smart" storage systems like rdbmses, HBase, and
> > >> Cassandra more and more. I think we only kind of simulate this with
> > >> partition keys info, which is not always sufficient
> > >>
> > >> D
> > >>
> > >> On Wed, Jan 26, 2011 at 2:41 PM, Julien Le Dem wrote:
> > >>
> > >>> If making Pig Thread safe (i.e.: two threads running a different
> > >>> pig script) is important then we need to change some of the APIs
> > >>> from static singleton access to a dependency injection pattern.
> > >>> In that case, this should probably be done before 1.0. For example:
> > >>> UDFContext should be passed to the UDF after construction (similar
> > >>> to the ServletContext in Servlet or the way Hadoop passes the
> > >>> context to tasks). Also a clearly separated API that does not
> > >>> depend on the Pig implementation would help.
> > >>> For example UDFContext is in org.apache.pig.impl.util when it
> > >>> would be better in org.apache.pig.api (Or at least an interface
> > >>> defining it)
> > >>>
> > >>> Julien
> > >>>
> > >>> On 1/24/11 10:14 AM, "Olga Natkovich" wrote:
> > >>>
> > >>> Hi Guys,
> > >>>
> > >>> I think it is time for us to have another meeting. Yahoo would be
> > >>> happy to host if this works for everybody. How about Wednesday,
> > >>> 2/9 4-6 pm.
> > >>> Please, let us know if you are planning to attend and if the
> > >>> date/time works for you.
> > >>>
> > >>> Things that come to mind to discuss and as always feel free to
> > >>> suggest others:
> > >>>
> > >>> - Error handling proposal - this might be easier to finalize
> > >>>   face-to-face
> > >>> - Pig 0.9 plan
> > >>> - Pig Roadmap beyond 0.9
> > >>>     o What do we want to do in Pig.next?
> > >>>     o Are we ready for Pig 1.0
> > >>>
> > >>> Olga
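Julien's point about replacing static singleton access with a context passed in after construction can be sketched in a few lines. Nothing below is Pig's API: `UdfContextSketch`, `MapContext`, `PrefixUdf`, and `InjectionSketch` are hypothetical names, and the sketch only assumes that each running script should see its own context object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical context interface, standing in for the role UDFContext plays.
interface UdfContextSketch {
    String getProperty(String key);
}

class MapContext implements UdfContextSketch {
    private final Map<String, String> props = new HashMap<>();

    MapContext with(String key, String value) {
        props.put(key, value);
        return this;
    }

    public String getProperty(String key) {
        return props.get(key);
    }
}

// A UDF that is handed its context after construction
// instead of reading it from a static singleton.
class PrefixUdf {
    private UdfContextSketch context;

    void setContext(UdfContextSketch context) {
        this.context = context;
    }

    String exec(String input) {
        return context.getProperty("prefix") + input;
    }
}

class InjectionSketch {
    // Runs one "script" per prefix on its own thread, each with its own context.
    static String[] runScripts(String input, String... prefixes) {
        String[] results = new String[prefixes.length];
        Thread[] threads = new Thread[prefixes.length];
        for (int i = 0; i < prefixes.length; i++) {
            final int idx = i;
            PrefixUdf udf = new PrefixUdf();
            udf.setContext(new MapContext().with("prefix", prefixes[idx]));
            threads[idx] = new Thread(() -> results[idx] = udf.exec(input));
            threads[idx].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (String r : runScripts("x", "a:", "b:")) {
            System.out.println(r);
        }
    }
}
```

Because no state is shared through a static field, the two threads cannot observe each other's settings, which is the property the singleton approach makes hard to guarantee.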
[jira] Commented: (PIG-1794) Javascript support for Pig embedding and UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990661#comment-12990661 ]

Richard Ding commented on PIG-1794:

Review comments are here https://reviews.apache.org/r/321/. Since the review board hasn't been linked with Jira, please upload the new patch to the jira.

As for the 'include' statement, a related jira is PIG-1824 where the idea is to add SHIP clause so Pig would ship 'import/include' scripts to the backend.

> Javascript support for Pig embedding and UDFs in scripting languages
>
>                 Key: PIG-1794
>                 URL: https://issues.apache.org/jira/browse/PIG-1794
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Julien Le Dem
>            Assignee: Julien Le Dem
>             Fix For: 0.9.0
>
>         Attachments: jsScripting.patch
>
> The attached patch proposes a javascript implementation for Pig embedding and
> UDFs in scripting languages.
> It is similar to the Jython implementation and uses Rhino provided in the JDK.
> some differences:
> - output schema is provided by: .outSchema="" as
> javascript does not have annotations or decorators but functions are first
> class objects
> - tuples are converted to objects using the input schema (the other way
> around using the output schema)
> The attached patch is not final yet. In particular it lacks unit tests.
> See test/org/apache/pig/test/data/tc.js for the "transitive closure" example
> See the following JIRAs for more context:
> https://issues.apache.org/jira/browse/PIG-928
> https://issues.apache.org/jira/browse/PIG-1479
[jira] Commented: (PIG-1782) Add ability to load data by column family in HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990653#comment-12990653 ]

Dmitriy V. Ryaboy commented on PIG-1782:

That seems reasonable to me. The only reason I suggest deprecating the current HBaseStorage is that it's awkwardly placed in backend.hadoop.hbase which is not where anyone really expects to find it. But I guess we can do that in a different ticket.

> Add ability to load data by column family in HBaseStorage
>
>                 Key: PIG-1782
>                 URL: https://issues.apache.org/jira/browse/PIG-1782
>             Project: Pig
>          Issue Type: New Feature
>         Environment: Java 6, Mac OS X 10.6
>            Reporter: Eric Yang
>            Assignee: Bill Graham
>
> It would be nice to load all columns in the column family by using short hand
> syntax like:
> {noformat}
> CpuMetrics = load 'hbase://SystemMetrics' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
> {noformat}
> Assuming there are columns cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1, in
> cpu column family.
> CpuMetrics would contain something like:
> {noformat}
> (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
> {noformat}
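The shorthand requested in PIG-1782 amounts to treating a column spec that ends in ':' as a prefix match over a row's column names. A minimal sketch of that selection, assuming column names are plain "family:qualifier" strings; `ColumnFamilySketch` and `columnsInFamily` are hypothetical and do not reflect how HBaseStorage talks to HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

class ColumnFamilySketch {
    // Returns, in sorted order, every column whose name starts with the
    // family prefix (for example "cpu:").
    static List<String> columnsInFamily(Collection<String> columns, String familyPrefix) {
        List<String> selected = new ArrayList<>();
        for (String column : new TreeSet<>(columns)) {
            if (column.startsWith(familyPrefix)) {
                selected.add(column);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList(
                "cpu:user.1", "mem:free", "cpu:sys.0", "cpu:user.0", "cpu:sys.1");
        System.out.println(columnsInFamily(row, "cpu:"));
    }
}
```

With the sample row, the selected columns line up with the tuple layout shown in the issue, minus the row key that `-loadKey` would prepend.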
[jira] Commented: (PIG-1717) pig needs to call setPartitionFilter if schema is null but getPartitionKeys is not
[ https://issues.apache.org/jira/browse/PIG-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990496#comment-12990496 ]

Gerrit Jansen van Vuuren commented on PIG-1717:

Thanks :)

> pig needs to call setPartitionFilter if schema is null but getPartitionKeys
> is not
>
>                 Key: PIG-1717
>                 URL: https://issues.apache.org/jira/browse/PIG-1717
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-1717.patch, PIG-1717.v1.patch, PIG-1717.v2.patch,
> patchReleaseAuditWarnings.txt.gz, testlog.tgz,
> trunkReleaseAuditWarnings.txt.gz
>
> I'm writing a loader that works with hive style partitioning e.g.
> /logs/type1/daydate=2010-11-01
> The loader does not know the schema upfront and this is something that the
> user adds in the script using the AS clause.
> The problem is that this user defined schema is not available to the loader,
> so the loader cannot return any schema. The loader does know what the
> partition keys are, and pig needs some way to know about these partition
> keys.
> Currently if the schema is null pig never calls the
> LoadMetadata.getPartitionKeys method or the setPartitionFilter method.
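The reason a loader can know its partition keys without knowing the schema is that hive style paths carry them as `key=value` directory names. A small sketch of extracting them from a path like the one in the issue; `PartitionPathSketch` and `partitionKeys` are hypothetical names, and the code is independent of Pig's LoadMetadata interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PartitionPathSketch {
    // Collects key=value path segments in the order they appear.
    // Segments without '=' (or with an empty key or value) are ignored.
    static Map<String, String> partitionKeys(String path) {
        Map<String, String> keys = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0 && eq < segment.length() - 1) {
                keys.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(partitionKeys("/logs/type1/daydate=2010-11-01"));
    }
}
```

A loader built this way could report `daydate` as a partition key even when the column schema only arrives later through the AS clause, which is the situation the issue describes.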