[jira] Updated: (PIG-824) SQL interface for Pig
[ https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-824: -- Attachment: SQL_IN_PIG.html PIG-824.1.patch PIG-824.binfiles.tar.gz PIG-824.binfiles.tar.gz - contains the libs that the patch depends on PIG-824.1.patch - patch SQL_IN_PIG.html - (brief) document JFlex.jar has not been included because it is covered by the GPL. It will have to be downloaded to the lib dir to build with the patch. In the future, Ivy will be set up to download it. > SQL interface for Pig > - > > Key: PIG-824 > URL: https://issues.apache.org/jira/browse/PIG-824 > Project: Pig > Issue Type: New Feature >Reporter: Olga Natkovich > Attachments: PIG-824.1.patch, PIG-824.binfiles.tar.gz, SQL_IN_PIG.html > > > In the last 18 months, PigLatin has gained significant popularity within the > open source community. Many users like its data flow model, its rich type > system and its ability to work with any data available on HDFS or outside. We > have also heard from many users that having Pig speak SQL would bring many > more users. Having a single system that exports multiple interfaces is a big > advantage as it guarantees consistent semantics, allows custom code reuse, and > reduces the amount of maintenance. This is especially relevant for projects > where it makes sense to use both interfaces for different parts of the system. > For instance, in a > data warehousing system, you would have an ETL component that brings data into > the warehouse and a component that analyzes the data and produces reports. > PigLatin is uniquely suited for ETL processing while SQL might be a better > fit for report generation. > To start, it would make sense to implement a subset of the SQL92 standard and to > be as standard-compliant as possible. This would include all the > standard constructs: select, from, where, group-by + having, order by, limit, > join (inner + outer). Several extensions such as support for Pig's UDFs and > possibly streaming, multiquery and support for Pig's complex types would be > helpful. > This work is dependent on metadata support outlined in > https://issues.apache.org/jira/browse/PIG-823 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-925) Fix join in local mode
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-925: -- Assignee: Daniel Dai > Fix join in local mode > -- > > Key: PIG-925 > URL: https://issues.apache.org/jira/browse/PIG-925 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.4.0 > > > Join is broken after LOJoin patch (Optimizer_Phase5.patch of > [PIG-697|https://issues.apache.org/jira/browse/PIG-697). Even the simplest > join script is not working under local mode: > eg: > a = load '1.txt'; > b = load '2.txt'; > c = join a by $0, b by $0; > dump c; > Caused by: java.lang.NullPointerException > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) > at > org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117) > at > org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146) > at > org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109) > at > org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165) -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-925) Fix join in local mode
Fix join in local mode -- Key: PIG-925 URL: https://issues.apache.org/jira/browse/PIG-925 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Fix For: 0.4.0 Join is broken after LOJoin patch (Optimizer_Phase5.patch of [PIG-697|https://issues.apache.org/jira/browse/PIG-697). Even the simplest join script is not working under local mode: eg: a = load '1.txt'; b = load '2.txt'; c = join a by $0, b by $0; dump c; Caused by: java.lang.NullPointerException at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117) at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146) at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109) at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743498#action_12743498 ] Hadoop QA commented on PIG-922: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416618/PIG-922-p1_1.patch against trunk revision 804310. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/165/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/165/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/165/console This message is automatically generated. > Logical optimizer: push up project > -- > > Key: PIG-922 > URL: https://issues.apache.org/jira/browse/PIG-922 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.4.0 > > Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch > > > This is a continuation work of > [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add > another rule to the logical optimizer: Push up project, ie, prune columns as > early as possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Hudson build is back to normal: Pig-Patch-minerva.apache.org #165
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/165/
[jira] Commented: (PIG-914) Change the PIG hbase interface to use bytes along with strings
[ https://issues.apache.org/jira/browse/PIG-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743493#action_12743493 ] Daniel Dai commented on PIG-914: Hi, Alex, Are you able to assign the issue to yourself through Jira? The same goes for PIG-915 and PIG-916. > Change the PIG hbase interface to use bytes along with strings > -- > > Key: PIG-914 > URL: https://issues.apache.org/jira/browse/PIG-914 > Project: Pig > Issue Type: Improvement >Reporter: Alex Newman >Priority: Minor > > Currently start rows, table names, and column names are all strings, but HBase > supports bytes, so we might want to change the Pig interface to support bytes > along with strings. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-922: --- Status: Patch Available (was: Open) > Logical optimizer: push up project > -- > > Key: PIG-922 > URL: https://issues.apache.org/jira/browse/PIG-922 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.4.0 > > Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch > > > This is a continuation work of > [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add > another rule to the logical optimizer: Push up project, ie, prune columns as > early as possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-922: --- Attachment: PIG-922-p1_1.patch Address comments by Hudson. > Logical optimizer: push up project > -- > > Key: PIG-922 > URL: https://issues.apache.org/jira/browse/PIG-922 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.4.0 > > Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch > > > This is a continuation work of > [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add > another rule to the logical optimizer: Push up project, ie, prune columns as > early as possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-922: --- Status: Open (was: Patch Available) > Logical optimizer: push up project > -- > > Key: PIG-922 > URL: https://issues.apache.org/jira/browse/PIG-922 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.4.0 > > Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch > > > This is a continuation work of > [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add > another rule to the logical optimizer: Push up project, ie, prune columns as > early as possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-892) Make COUNT and AVG deal with nulls accordingly with SQL standar
[ https://issues.apache.org/jira/browse/PIG-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743467#action_12743467 ] Daniel Dai commented on PIG-892: +1 > Make COUNT and AVG deal with nulls accordingly with SQL standar > --- > > Key: PIG-892 > URL: https://issues.apache.org/jira/browse/PIG-892 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.3.0 >Reporter: Olga Natkovich >Assignee: Olga Natkovich > Fix For: 0.4.0 > > Attachments: PIG-892.patch, PIG-892_v2.patch, PIG-892_v3.patch > > > both COUNT and AVG need to ignore nulls. Also add COUNT_STAR to match > COUNT(*) in SQL -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
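Note on the semantics requested in PIG-892: the sketch below illustrates, in Python rather than in Pig's own Java code, what "ignore nulls" means for COUNT and AVG and how COUNT_STAR differs. The function names are hypothetical and are not Pig's builtin implementations; None stands in for a null field.

```python
from typing import Optional, Sequence


def count(values: Sequence[Optional[float]]) -> int:
    """COUNT(col): count only the non-null values."""
    return sum(1 for v in values if v is not None)


def count_star(values: Sequence[Optional[float]]) -> int:
    """COUNT(*): count every row, null or not."""
    return len(values)


def avg(values: Sequence[Optional[float]]) -> Optional[float]:
    """AVG(col): average over non-null values; null if there are none."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return sum(non_null) / len(non_null)


column = [1.0, None, 3.0]
print(count(column), count_star(column), avg(column))
```

With one null among three rows, COUNT sees two values, COUNT_STAR sees three rows, and AVG divides by two rather than three.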
[jira] Commented: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743456#action_12743456 ] Hadoop QA commented on PIG-922: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416587/PIG-922-p1_0.patch against trunk revision 804310. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. -1 release audit. The applied patch generated 164 release audit warnings (more than the trunk's current 163 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/164/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/164/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/164/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/164/console This message is automatically generated. > Logical optimizer: push up project > -- > > Key: PIG-922 > URL: https://issues.apache.org/jira/browse/PIG-922 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.4.0 > > Attachments: PIG-922-p1_0.patch > > > This is a continuation work of > [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. 
We need to add > another rule to the logical optimizer: Push up project, ie, prune columns as > early as possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Pig-Patch-minerva.apache.org #164
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/164/changes Changes: [pradeepkth] PIG-845: PERFORMANCE: Merge Join (ashutoshc via pradeepkth) - deleting renamed file - MRStreamHandler.java [pradeepkth] PIG-845: PERFORMANCE: Merge Join (ashutoshc via pradeepkth) [daijy] PIG-913: Error in Pig script when grouping on chararray column -- [...truncated 111633 lines...] [exec] [junit] [exec] [junit] 09/08/14 21:35:32 INFO mapReduceLayer.JobControlCompiler: Setting up single store job [exec] [junit] 09/08/14 21:35:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200908142134_0002/job.jar. blk_-6402472781047644060_1012 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block blk_-6402472781047644060_1012 src: /127.0.0.1:44681 dest: /127.0.0.1:40670 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block blk_-6402472781047644060_1012 src: /127.0.0.1:39714 dest: /127.0.0.1:53345 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block blk_-6402472781047644060_1012 src: /127.0.0.1:41010 dest: /127.0.0.1:41033 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block blk_-6402472781047644060_1012 of size 1497453 from /127.0.0.1 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: PacketResponder 0 for block blk_-6402472781047644060_1012 terminating [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block blk_-6402472781047644060_1012 of size 1497453 from /127.0.0.1 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:41033 is added to blk_-6402472781047644060_1012 size 1497453 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: PacketResponder 1 for block blk_-6402472781047644060_1012 terminating [exec] [junit] 09/08/14 21:35:33 INFO 
dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:53345 is added to blk_-6402472781047644060_1012 size 1497453 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block blk_-6402472781047644060_1012 of size 1497453 from /127.0.0.1 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40670 is added to blk_-6402472781047644060_1012 size 1497453 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: PacketResponder 2 for block blk_-6402472781047644060_1012 terminating [exec] [junit] 09/08/14 21:35:33 INFO fs.FSNamesystem: Increasing replication for file /tmp/hadoop-hudson/mapred/system/job_200908142134_0002/job.jar. New replication is 2 [exec] [junit] 09/08/14 21:35:33 INFO fs.FSNamesystem: Reducing replication for file /tmp/hadoop-hudson/mapred/system/job_200908142134_0002/job.jar. New replication is 2 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200908142134_0002/job.split. 
blk_5455499385688750307_1013 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block blk_5455499385688750307_1013 src: /127.0.0.1:39716 dest: /127.0.0.1:53345 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block blk_5455499385688750307_1013 src: /127.0.0.1:44685 dest: /127.0.0.1:40670 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block blk_5455499385688750307_1013 src: /127.0.0.1:39881 dest: /127.0.0.1:59910 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block blk_5455499385688750307_1013 of size 1837 from /127.0.0.1 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: PacketResponder 0 for block blk_5455499385688750307_1013 terminating [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:59910 is added to blk_5455499385688750307_1013 size 1837 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block blk_5455499385688750307_1013 of size 1837 from /127.0.0.1 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40670 is added to blk_5455499385688750307_1013 size 1837 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: PacketResponder 1 for block blk_5455499385688750307_1013 terminating [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block blk_5455499385688750307_1013 of size 1837 from /127.0.0.1 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:53345 is added to blk_54554993856887503
Food for thought on Pig design
http://dreamsongs.com/WIB.html mainly section 2.1 on Worse is Better I stumbled across this article today and found the section on Worse is Better very interesting, especially since he is directly comparing the design philosophies of C vs Lisp. The article is almost 20 years old, so you may have seen it before. Alan.
[jira] Updated: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-924: -- Attachment: pig_924.patch The attached patch includes dynamic shims that could be used with Pig if it didn't bundle its hadoop classes. > Make Pig work with multiple versions of Hadoop > -- > > Key: PIG-924 > URL: https://issues.apache.org/jira/browse/PIG-924 > Project: Pig > Issue Type: Bug >Reporter: Dmitriy V. Ryaboy > Attachments: pig_924.patch > > > The current Pig build scripts package hadoop and other dependencies into the > pig.jar file. > This means that if users upgrade Hadoop, they also need to upgrade Pig. > Pig has relatively few dependencies on Hadoop interfaces that changed between > 18, 19, and 20. It is possible to write a dynamic shim that allows Pig to > use the correct calls for any of the above versions of Hadoop. Unfortunately, > the build process prevents us from doing this at runtime, and > forces an unnecessary Pig rebuild even if dynamic shims are created. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
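Note on the "dynamic shim" idea in PIG-924: the following is only a rough Python sketch of runtime shim selection, not the contents of pig_924.patch. The shim classes, the registry, and the describe() method are all hypothetical; the only detail taken from the issue is that the Hadoop releases in question are 18, 19, and 20.

```python
from typing import Dict, Type


class HadoopShim:
    """Hypothetical base class wrapping the calls that differ per release."""
    release = "unknown"

    def describe(self) -> str:
        return f"shim for Hadoop {self.release}"


class Shim18(HadoopShim):
    release = "0.18"


class Shim19(HadoopShim):
    release = "0.19"


class Shim20(HadoopShim):
    release = "0.20"


# Hypothetical registry keyed by "major.minor".
SHIMS: Dict[str, Type[HadoopShim]] = {
    cls.release: cls for cls in (Shim18, Shim19, Shim20)
}


def select_shim(hadoop_version: str) -> HadoopShim:
    """Pick a shim at runtime from the version string of the installed Hadoop."""
    key = ".".join(hadoop_version.split(".")[:2])
    if key not in SHIMS:
        raise ValueError(f"no shim available for Hadoop {hadoop_version}")
    return SHIMS[key]()


print(select_shim("0.20.0").describe())
```

The selection only helps if the version is detected from whatever Hadoop is on the classpath at runtime, which is why the issue objects to bundling the Hadoop classes into pig.jar.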
[jira] Created: (PIG-924) Make Pig work with multiple versions of Hadoop
Make Pig work with multiple versions of Hadoop -- Key: PIG-924 URL: https://issues.apache.org/jira/browse/PIG-924 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy The current Pig build scripts package hadoop and other dependencies into the pig.jar file. This means that if users upgrade Hadoop, they also need to upgrade Pig. Pig has relatively few dependencies on Hadoop interfaces that changed between 18, 19, and 20. It is possible to write a dynamic shim that allows Pig to use the correct calls for any of the above versions of Hadoop. Unfortunately, the build process prevents us from doing this at runtime, and forces an unnecessary Pig rebuild even if dynamic shims are created. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-923) Allow setting logfile location in pig.properties
[ https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-923: -- Attachment: pig_923.patch One-line change to allow Main.java to default to the value specified in pig.logfile. -l still overrides. Not specifying pig.logfile in pig.properties results in the same behavior as before. No unit tests; checked manually. > Allow setting logfile location in pig.properties > > > Key: PIG-923 > URL: https://issues.apache.org/jira/browse/PIG-923 > Project: Pig > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Dmitriy V. Ryaboy > Fix For: 0.4.0 > > Attachments: pig_923.patch > > > Local log file location can be specified through the -l flag, but it cannot > be set in pig.properties. > This JIRA proposes a change to Main.java that allows it to read the > "pig.logfile" property from the configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
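Note on the precedence described for PIG-923: the sketch below shows the lookup order in Python, assuming the behavior stated in the comment (the -l flag wins, then the pig.logfile property, then whatever Pig did before). It is not the change made to Main.java; resolve_logfile and its default argument are hypothetical names used only for illustration.

```python
from typing import Mapping, Optional


def resolve_logfile(cli_value: Optional[str],
                    properties: Mapping[str, str],
                    default: Optional[str] = None) -> Optional[str]:
    """Return the log file location, most specific setting first.

    cli_value  -- value passed with -l, or None if the flag was not given
    properties -- settings read from pig.properties
    default    -- stands in for the pre-existing behavior (an assumption here)
    """
    if cli_value is not None:
        return cli_value                      # -l still overrides
    return properties.get("pig.logfile", default)


props = {"pig.logfile": "/var/log/pig/pig.log"}
print(resolve_logfile(None, props))
print(resolve_logfile("/tmp/run.log", props))
print(resolve_logfile(None, {}, default="unchanged"))
```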
[jira] Created: (PIG-923) Allow setting logfile location in pig.properties
Allow setting logfile location in pig.properties Key: PIG-923 URL: https://issues.apache.org/jira/browse/PIG-923 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Dmitriy V. Ryaboy Fix For: 0.4.0 Local log file location can be specified through the -l flag, but it cannot be set in pig.properties. This JIRA proposes a change to Main.java that allows it to read the "pig.logfile" property from the configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-922: --- Status: Patch Available (was: Open) > Logical optimizer: push up project > -- > > Key: PIG-922 > URL: https://issues.apache.org/jira/browse/PIG-922 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.4.0 > > Attachments: PIG-922-p1_0.patch > > > This is a continuation work of > [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add > another rule to the logical optimizer: Push up project, ie, prune columns as > early as possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-922: --- Attachment: PIG-922-p1_0.patch Attach patch for phase 1. > Logical optimizer: push up project > -- > > Key: PIG-922 > URL: https://issues.apache.org/jira/browse/PIG-922 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.4.0 > > Attachments: PIG-922-p1_0.patch > > > This is a continuation work of > [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add > another rule to the logical optimizer: Push up project, ie, prune columns as > early as possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743348#action_12743348 ]

Daniel Dai commented on PIG-922:

Design for push up projection rule:

Presumption:

* Prune columns of loader, save time for record parsing

    a = load 'a' as (n1:chararray, n2:chararray, n3:chararray);
    b = foreach a generate n1, n2;
  =>
    a = load 'a' as (n1:chararray, n2:chararray);

  We do not need to parse n3 in our loader.

* Prune columns across map-reduce boundary (between map-reduce jobs or inter map-reduce jobs), save bandwidth

    a = load 'a' as (n1:chararray, n2:chararray, n3:chararray);
    b = group a by n1;
    c = order b by n2;
    d = foreach c generate n2, n3;
  =>
    a = load 'a' as (n1:chararray, n2:chararray, n3:chararray);
    b = group a by n1;
    b1 = foreach b generate n2, n3;
    c = order b1 by n2;
    d = foreach c generate n2, n3;

* Prune column within map-reduce boundary does not seem to be helpful

    store a into 'a';
    b = filter a by n1 == '1';
    c = foreach b generate n2;
    dump c;
  =>
    store a into 'a';
    a1 = foreach a generate n1, n2;
    b = filter a1 by n1 == '1';
    c = foreach b generate n2;
    dump c;

  In this case, an extra foreach step is processed, but we gain no benefit.

Algorithm description:

1. Divide all logical operators into two categories: those that create a map-reduce boundary and those that do not.
     boundary = true:  LOCoGroup, LOCross, LOJoin, LODistinct, LOSort
     boundary = false: LOFilter, LOForEach, LODefine, LOLoad, LOStore, LOSplit, LOSplitOutput, LOStream, LOUnion
   LOJoin can be a boundary or not, depending on the type of join.

2. We collect required fields from the bottom; a reverse dependency order walker algorithm is required to do this.

3. We do not actually start from the leaf. We start from the last LOForEach. Only LOForEach prunes columns. If there is no LOForEach in the script, then we cannot prune anything.

4. From a required output, we need an algorithm to figure out the required input:

                                          <= require $0, $2, $3
     b = foreach a generate $0, $2+$3;
                                          <= require $0, $1

5. From the bottom LOForEach, we collect required fields all the way up. If we move over a boundary operator, we save the position because it is possible to put a projection there:

     ..
                        => projection here
     x = CoGroup .
     ..
                        => projection here
     y = order ..

   Put the projection right before the boundary to make sure less data crosses the boundary. However, we do not make this decision and do the actual pruning now; we will do the actual pruning top down.

6. While traversing up, if we see an operator with more than one input, we trace required fields in all directions. We rely on the output schema of this operator to figure out which required fields belong to which input. If we see an operator with more than one output, we collect required fields until all outputs have been traced.

7. If we see LOStream or LOStore, we stop.

8. If we see LOLoad, we stop and set the required fields in LOLoad.

9. From LOLoad, we do a top-down traversal to decide whether we need to put a projection, and if yes, insert a ForEach.

10. We only add a projection if it is necessary. It is only necessary when the required fields of that boundary operator are fewer than the output fields of the operator before it.

     Filter ..       (output fields: n1, n2, n3)
                     <= we can prune n3 here
     x = CoGroup     (required fields: n1, n2)

11. It is possible that we create a foreach which could be combined into the previous foreach; however, we do not handle that in the PushUpProject rule.

     ForEach..
                     <= we will add a ForEach anyway here
     x = CoGroup .

12. Every time we insert a LOForEach, we need to adjust the projection map all the way down.

13. To fit PushUpProject into the current optimizer framework, we hook the check rule to LOForEach. Every time, we start from a LOForEach and we never push up over another LOForEach. So we stop at LOForEach and save the required fields up to this point.
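Note on steps 4 and 10 of the design above: the two small decisions they describe can be sketched as follows. This is a Python illustration under simplifying assumptions, not the code in the PIG-922 patches: a foreach is modeled as a list where entry i is the set of input column positions read by output column i, and the function names are hypothetical.

```python
from typing import List, Set


def required_inputs(generate: List[Set[int]],
                    required_outputs: Set[int]) -> Set[int]:
    """Step 4: map required output columns of a foreach back to its inputs.

    generate[i] holds the input positions that output column i depends on.
    """
    needed: Set[int] = set()
    for out_col in required_outputs:
        needed |= generate[out_col]
    return needed


def needs_projection(output_fields: Set[str],
                     required_fields: Set[str]) -> bool:
    """Step 10: a projection before a boundary operator only pays off when
    the preceding operator outputs fields the boundary operator does not need.
    """
    return bool(output_fields - required_fields)


# b = foreach a generate $0, $2+$3;   (outputs $0 and $1 of b are required)
generate_b = [{0}, {2, 3}]
print(sorted(required_inputs(generate_b, {0, 1})))        # [0, 2, 3]

# Filter outputs n1, n2, n3; the CoGroup after it needs only n1, n2.
print(needs_projection({"n1", "n2", "n3"}, {"n1", "n2"}))  # True
```

Walking the plan bottom-up would apply required_inputs at each foreach, and the top-down pass would call needs_projection at each saved boundary position.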
> Logical optimizer: push up project > -- > > Key: PIG-922 > URL: https://issues.apache.org/jira/browse/PIG-922 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.4.0 > > > This is a continuation work of > [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add > another rule to the logical optimizer: Push up project, ie, prune columns as > early as possible. -- This message is automatically generated by
[jira] Commented: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743347#action_12743347 ] Daniel Dai commented on PIG-922: There will be three patches for this issue: phase 1: Infrastructure to find relevant input columns from output column phase 2: Infrastructure to prune column for each relational operator phase 3: push up project optimization rule > Logical optimizer: push up project > -- > > Key: PIG-922 > URL: https://issues.apache.org/jira/browse/PIG-922 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.4.0 > > > This is a continuation work of > [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add > another rule to the logical optimizer: Push up project, ie, prune columns as > early as possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743336#action_12743336 ]

Hudson commented on PIG-913:

Integrated in Pig-trunk #522 (See [http://hudson.zones.apache.org/hudson/job/Pig-trunk/522/])
    : Error in Pig script when grouping on chararray column

> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Viraj Bhat
> Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913-2.patch, PIG-913.patch
>
>
> I have a very simple script which fails at parsetime due to the schema I specified in the loader.
> {code}
> data = LOAD '/user/viraj/studenttab10k' AS (s:chararray);
> dataSmall = limit data 100;
> bb = GROUP dataSmall by $0;
> dump bb;
> {code}
> =
> 2009-08-06 18:47:56,297 [main] INFO org.apache.pig.Main - Logging error messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 2009-08-06 18:47:56,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://localhost:9000
> 2009-08-06 18:47:56,694 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: localhost:9001
> 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias bb
> 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb
> Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> =
> =
> Pig Stack Trace
> ---
> ERROR 1002: Unable to store alias bb
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias bb
>         at org.apache.pig.PigServer.openIterator(PigServer.java:481)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>         at org.apache.pig.Main.main(Main.java:397)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias bb
>         at org.apache.pig.PigServer.store(PigServer.java:536)
>         at org.apache.pig.PigServer.openIterator(PigServer.java:464)
>         ... 6 more
> Caused by: java.lang.NullPointerException
>         at org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359)
>         at org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64)
>         at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335)
>         at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:46)
>         at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>         at org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:67)
>         at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:187)
>         at org.apache.pig.PigServer.compileLp(PigServer.java:854)
>         at org.apache.pig.PigServer.compileLp(PigServer.java:791)
>         at org.apache.pig.PigServer.store(PigServer.java:509)
>         ... 7 more
> =

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-845) PERFORMANCE: Merge Join
[ https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-845:
---
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

Patch committed to trunk. Thanks Ashutosh for this significant contribution!

> PERFORMANCE: Merge Join
> ---
>
> Key: PIG-845
> URL: https://issues.apache.org/jira/browse/PIG-845
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Assignee: Ashutosh Chauhan
> Attachments: merge-join.patch
>
>
> This join would work if the data for both tables is sorted on the join key.
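The issue text only states the precondition for a merge join (both inputs sorted on the join key). As a rough illustration of why that precondition helps, here is a minimal Python sketch of a sort-merge inner join over in-memory lists. It is not the code from merge-join.patch; the function name `merge_join` and the sample relations are hypothetical and exist only for this example.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists of tuples that are ALREADY sorted on the join key.

    Hypothetical helper for illustration; it does not sort its inputs.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # left key is behind, advance the left cursor
        elif lk > rk:
            j += 1          # right key is behind, advance the right cursor
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append(l_row + r_row)
            i, j = i_end, j_end
    return out


# Sample data (made up), both sorted on the first field.
users = [(1, "alice"), (2, "bob"), (2, "bobby"), (4, "dan")]
orders = [(2, "book"), (3, "pen"), (4, "ink"), (4, "pad")]

for row in merge_join(users, orders):
    print(row)
```

Because each cursor only moves forward, every input is read once and no hash table or re-sort is needed, which is presumably the performance benefit the issue title refers to.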
[jira] Updated: (PIG-845) PERFORMANCE: Merge Join
[ https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-845:
-
    Attachment: (was: merge-join.patch)
[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-913:
---
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Patch committed.
[jira] Assigned: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-922:
--
    Assignee: Daniel Dai

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Affects Versions: 0.3.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.4.0
>
>
> This is a continuation of [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add another rule to the logical optimizer: push up project, i.e., prune columns as early as possible.
[jira] Created: (PIG-922) Logical optimizer: push up project
Logical optimizer: push up project
--

Key: PIG-922
URL: https://issues.apache.org/jira/browse/PIG-922
Project: Pig
Issue Type: New Feature
Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Fix For: 0.4.0

This is a continuation of [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add another rule to the logical optimizer: push up project, i.e., prune columns as early as possible.
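To make "prune columns as early as possible" concrete, here is a deliberately simplified Python sketch of the idea. It is an assumption-laden toy, not Pig's optimizer: the plan is modelled as a flat list of dictionaries, and the names `push_up_project`, `"columns"` and `"uses"` are hypothetical. The rule collects every column referenced downstream of the load and, if some loaded columns are never used, inserts a projection directly after the load.

```python
def push_up_project(plan):
    """Insert a projection right after the load that keeps only columns
    referenced by later operators. Toy model: plan[0] must be the load."""
    load, rest = plan[0], plan[1:]

    # Union of all columns any downstream operator refers to.
    needed = set()
    for op in rest:
        needed |= set(op["uses"])

    # Preserve the load's column order among the columns we keep.
    kept = [c for c in load["columns"] if c in needed]
    if len(kept) == len(load["columns"]):
        return plan  # every loaded column is used, nothing to prune

    prune = {"op": "project", "uses": kept}
    return [load, prune] + rest


# Hypothetical plan: four columns are loaded, only two are ever used.
plan = [
    {"op": "load", "columns": ["name", "age", "gpa", "city"]},
    {"op": "filter", "uses": ["age"]},
    {"op": "group", "uses": ["city"]},
    {"op": "store", "uses": ["city"]},
]

for op in push_up_project(plan):
    print(op)
```

In this toy plan, `name` and `gpa` are dropped immediately after the load, so later operators carry less data. A real rule would also have to account for schemas, nested plans, and operators whose column usage is not known statically.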
[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743264#action_12743264 ]

Daniel Dai commented on PIG-913:

This release audit warning is caused by a new golden file. We cannot add release audit notes to golden files.