[jira] Commented: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-09-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749806#action_12749806
 ] 

Hadoop QA commented on PIG-934:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418219/pig-934_2.patch
  against trunk revision 806668.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/4/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/4/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/4/console

This message is automatically generated.

> Merge join implementation currently does not seek to right point on the right 
> side input based on the offset provided by the index
> --
>
> Key: PIG-934
> URL: https://issues.apache.org/jira/browse/PIG-934
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.1
>Reporter: Pradeep Kamath
>Assignee: Ashutosh Chauhan
> Attachments: pig-934_2.patch
>
>
> We use POLoad to seek into right file which has the following code: 
> {noformat}
>public void setUp() throws IOException{
> String filename = lFile.getFileName();
> loader = 
> (LoadFunc)PigContext.instantiateFuncFromSpec(lFile.getFuncSpec());
> is = FileLocalizer.open(filename, pc);
> loader.bindTo(filename , new BufferedPositionedInputStream(is), 
> this.offset, Long.MAX_VALUE);
> }
> {noformat}
> Between opening the stream and bindTo we do not seek to the right offset. 
> bindTo itself does not perform any seek.
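
The missing step can be illustrated with plain java.io, independent of Pig. This is only a sketch under stated assumptions: the class SeekBeforeBind and the helpers skipFully and firstLineAt are hypothetical names, not part of POLoad or of the attached patch. It shows one way to advance a stream to a known byte offset before handing it to a reader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the code from pig-934_2.patch.
class SeekBeforeBind {
    /** Advances the stream by exactly offset bytes; skip() alone may skip fewer. */
    static void skipFully(InputStream in, long offset) throws IOException {
        long remaining = offset;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() >= 0) {
                remaining--; // skip() made no progress, so consume one byte instead
            } else {
                throw new EOFException("offset " + offset + " is past the end of the stream");
            }
        }
    }

    /** Opens the data, seeks to offset, and returns the first line read from there. */
    static String firstLineAt(byte[] data, long offset) throws IOException {
        InputStream is = new ByteArrayInputStream(data);
        skipFully(is, offset); // the seek that the issue reports as missing
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "k1\tv1\nk2\tv2\nk3\tv3\n".getBytes(StandardCharsets.UTF_8);
        // Each record is 6 bytes long, so offset 6 is the start of the second record.
        System.out.println(firstLineAt(data, 6));
    }
}
```

The loop around skip() is there because InputStream.skip is allowed to skip fewer bytes than requested, so a single call is not a reliable seek.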

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-09-01 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749822#action_12749822
 ] 

Giridharan Kesavan commented on PIG-934:


I resubmitted the patch to Hudson because the core tests failed when javac 
could not be found.




[jira] Updated: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.

2009-09-01 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-939:
---

Status: Patch Available  (was: Open)

> Checkstyle pulls in junit3.7 which causes the build of test code to fail.
> -
>
> Key: PIG-939
> URL: https://issues.apache.org/jira/browse/PIG-939
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.3.0
>Reporter: Lee Tucker
> Attachments: pig-939.patch
>
>
> Pig fails to compile if you execute: 
> ant -D clean findbugs checkstyle test 
> The build fails with this error:
> [javac] Compiling 153 source files to 
> /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/build/test/classes
> [javac] 
> /export/crawlspace/kryptonite/hadoopqa/workspace/workspace/CCDI-Pig-2.3/pig-2.3.0.0.20.0.2967040009/test/org/apache/pig/test/PigExecTestCase.java:31:
>  cannot find symbol
> [javac] symbol  : constructor TestCase()
> [javac] location: class junit.framework.TestCase
> [javac] public abstract class PigExecTestCase extends TestCase {
> [javac] ^
> Once that's done, a copy of junit 3.7 cached by ivy will continue to cause 
> the build to fail.  The build succeeds if you remove the cached copy and 
> then run:
> ant -D clean findbugs test
> This shows that running checkstyle is what pulls in junit 3.7.




[jira] Updated: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.

2009-09-01 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-939:
---

Attachment: pig-939.patch

This patch should fix the issue of junit-3.7 being downloaded.




[jira] Commented: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-09-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749853#action_12749853
 ] 

Hadoop QA commented on PIG-934:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418219/pig-934_2.patch
  against trunk revision 806668.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/5/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/5/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/5/console

This message is automatically generated.




[jira] Commented: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-09-01 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749901#action_12749901
 ] 

Ashutosh Chauhan commented on PIG-934:
--

All tests passed on my local box. Not sure why they failed on hudson. 




[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2009-09-01 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749905#action_12749905
 ] 

Mridul Muralidharan commented on PIG-940:
-

Is this supported in Hadoop? That is, can you specify input that lives on a 
different HDFS and still get a mapred job to work? IIRC no, but I could be 
missing something.

If the answer is no, I am not sure Pig can support it without an intermediate 
distcp ...

> Cross site HDFS access using the default.fs.name not possible in Pig
> 
>
> Key: PIG-940
> URL: https://issues.apache.org/jira/browse/PIG-940
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
> Environment: Hadoop 20
>Reporter: Viraj Bhat
> Fix For: 0.3.0
>
>
> I have a script that reads data from a remote HDFS location (an HDFS 
> installed at hdfs://remotemachine1.company.com/), because I do not want to 
> copy this huge amount of data between HDFS locations.
> However, I want my Pig script to write data to the HDFS running on 
> localmachine.company.com.
> Currently Pig does not support this and complains that: 
> "hdfs://localmachine.company.com/user/viraj/A1.txt does not exist"
> {code}
> A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b); 
> B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d); 
> C = JOIN A by a, B by c; 
> store C into 'output' using PigStorage();  
> {code}
> ===
> 2009-09-01 00:37:24,032 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localmachine.company.com:8020
> 2009-09-01 00:37:24,277 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localmachine.company.com:50300
> 2009-09-01 00:37:24,567 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
>  - Rewrite: POPackage->POForEach to POJoinPackage
> 2009-09-01 00:37:24,573 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size before optimization: 1
> 2009-09-01 00:37:24,573 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size after optimization: 1
> 2009-09-01 00:37:26,197 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>  - Setting up single store job
> 2009-09-01 00:37:26,249 [Thread-9] WARN  org.apache.hadoop.mapred.JobClient - 
> Use GenericOptionsParser for parsing the arguments. Applications should 
> implement Tool for the same.
> 2009-09-01 00:37:26,746 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 0% complete
> 2009-09-01 00:37:26,746 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 100% complete
> 2009-09-01 00:37:26,747 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 1 map reduce job(s) failed!
> 2009-09-01 00:37:26,756 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Failed to produce result in: 
> "hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480"
> 2009-09-01 00:37:26,756 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Failed!
> 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
> Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log
> ===
> The error file in Pig contains:
> ===
> ERROR 2998: Unhandled internal error. 
> org.apache.pig.backend.executionengine.ExecException: ERROR 2100: 
> hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
> at 
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
> at 
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
> at 
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
> at 
> org.apache.hadoop.

[jira] Commented: (PIG-939) Checkstyle pulls in junit3.7 which causes the build of test code to fail.

2009-09-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749921#action_12749921
 ] 

Hadoop QA commented on PIG-939:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418232/pig-939.patch
  against trunk revision 806668.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/6/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/6/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/6/console

This message is automatically generated.




Request for feedback: cost-based optimizer

2009-09-01 Thread Dmitriy Ryaboy
Hi everyone,
Attached is a (very) preliminary document outlining a rough design we
are proposing for a cost-based optimizer for Pig.
This is being done as a capstone project by three CMU Master's
students (myself, Ashutosh Chauhan, and Tejal Desai). As such, it is
not necessarily meant for immediate incorporation into the Pig
codebase, although it would be nice if it, or parts of it, are found
to be useful in the mainline.

We would love to get some feedback from the developer community
regarding the ideas expressed in the document, any concerns about the
design, suggestions for improvement, etc.

Thanks,
Dmitriy, Ashutosh, Tejal


RE: Request for feedback: cost-based optimizer

2009-09-01 Thread Santhosh Srinivasan
Dmitriy and Gang,

The mailing list does not allow attachments. Can you post it on a
website and just send the URL ?

Thanks,
Santhosh 

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] 
Sent: Tuesday, September 01, 2009 9:48 AM
To: pig-dev@hadoop.apache.org
Subject: Request for feedback: cost-based optimizer

Hi everyone,
Attached is a (very) preliminary document outlining a rough design we
are proposing for a cost-based optimizer for Pig.
This is being done as a capstone project by three CMU Master's students
(myself, Ashutosh Chauhan, and Tejal Desai). As such, it is not
necessarily meant for immediate incorporation into the Pig codebase,
although it would be nice if it, or parts of it, are found to be useful
in the mainline.

We would love to get some feedback from the developer community
regarding the ideas expressed in the document, any concerns about the
design, suggestions for improvement, etc.

Thanks,
Dmitriy, Ashutosh, Tejal


Re: Request for feedback: cost-based optimizer

2009-09-01 Thread Dmitriy Ryaboy
Whoops :-)
Here's the Google doc:
http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdA&hl=en

-Dmitriy

On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasan wrote:
> Dmitriy and Gang,
>
> The mailing list does not allow attachments. Can you post it on a
> website and just send the URL ?
>
> Thanks,
> Santhosh
>
> -Original Message-
> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> Sent: Tuesday, September 01, 2009 9:48 AM
> To: pig-dev@hadoop.apache.org
> Subject: Request for feedback: cost-based optimizer
>
> Hi everyone,
> Attached is a (very) preliminary document outlining a rough design we
> are proposing for a cost-based optimizer for Pig.
> This is being done as a capstone project by three CMU Master's students
> (myself, Ashutosh Chauhan, and Tejal Desai). As such, it is not
> necessarily meant for immediate incorporation into the Pig codebase,
> although it would be nice if it, or parts of it, are found to be useful
> in the mainline.
>
> We would love to get some feedback from the developer community
> regarding the ideas expressed in the document, any concerns about the
> design, suggestions for improvement, etc.
>
> Thanks,
> Dmitriy, Ashutosh, Tejal
>


[jira] Commented: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2009-09-01 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750040#action_12750040
 ] 

Koji Noguchi commented on PIG-940:
--

bq. Is this supported in hadoop ? 
Sure.

> Cross site HDFS access using the default.fs.name not possible in Pig
> 
>
> Key: PIG-940
> URL: https://issues.apache.org/jira/browse/PIG-940
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
> Environment: Hadoop 20
>Reporter: Viraj Bhat
> Fix For: 0.3.0
>
>
> I have a script that reads data from a remote HDFS location (an HDFS 
> installed at hdfs://remotemachine1.company.com/), because I do not want to 
> copy this huge amount of data between HDFS locations.
> However, I want my Pig script to write data to the HDFS running on 
> localmachine.company.com.
> Currently Pig does not support this and complains that: 
> "hdfs://localmachine.company.com/user/viraj/A1.txt does not exist"
> {code}
> A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b); 
> B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d); 
> C = JOIN A by a, B by c; 
> store C into 'output' using PigStorage();  
> {code}
> ===
> 2009-09-01 00:37:24,032 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localmachine.company.com:8020
> 2009-09-01 00:37:24,277 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localmachine.company.com:50300
> 2009-09-01 00:37:24,567 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
>  - Rewrite: POPackage->POForEach to POJoinPackage
> 2009-09-01 00:37:24,573 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size before optimization: 1
> 2009-09-01 00:37:24,573 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size after optimization: 1
> 2009-09-01 00:37:26,197 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>  - Setting up single store job
> 2009-09-01 00:37:26,249 [Thread-9] WARN  org.apache.hadoop.mapred.JobClient - 
> Use GenericOptionsParser for parsing the arguments. Applications should 
> implement Tool for the same.
> 2009-09-01 00:37:26,746 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 0% complete
> 2009-09-01 00:37:26,746 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 100% complete
> 2009-09-01 00:37:26,747 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 1 map reduce job(s) failed!
> 2009-09-01 00:37:26,756 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Failed to produce result in: 
> "hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480"
> 2009-09-01 00:37:26,756 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Failed!
> 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
> Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log
> ===
> The error file in Pig contains:
> ===
> ERROR 2998: Unhandled internal error. 
> org.apache.pig.backend.executionengine.ExecException: ERROR 2100: 
> hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
> at 
> org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
> at 
> org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
> at 
> org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
> at 
> org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>
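
The behaviour requested in this issue can be sketched with java.net.URI alone: a location that already carries a scheme is kept as-is, and only scheme-less locations are resolved against the default file system. This is a sketch under stated assumptions, not Pig or Hadoop code; the class QualifyPath, the method qualify and the working directory /user/viraj/ are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration of qualifying input and output locations.
class QualifyPath {
    /**
     * Returns location unchanged when it already names a file system
     * (for example hdfs://remotemachine1.company.com/...). Otherwise it is
     * resolved against the working directory on the default file system.
     */
    static URI qualify(String location, URI defaultFs, String workingDir) {
        URI uri = URI.create(location);
        if (uri.isAbsolute()) {
            return uri; // has a scheme, so do not rewrite it to the default file system
        }
        return defaultFs.resolve(workingDir).resolve(uri);
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://localmachine.company.com:8020/");
        System.out.println(qualify(
                "hdfs://remotemachine1.company.com/user/viraj/A1.txt", defaultFs, "/user/viraj/"));
        System.out.println(qualify("output", defaultFs, "/user/viraj/"));
    }
}
```

With this rule the two LOAD locations in the script would stay on remotemachine1.company.com, while the relative STORE location 'output' would land on the local HDFS.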

[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-918:
-

Attachment: pig-zebra.patch

When you generate a patch with 'git diff', please use 'git diff --no-prefix' so 
that the patch applies with the 'patch -p0' command. I am updating the attached 
patch with this change.


> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch
>
>
> Zebra's LOAD call with projections that only include column(s) in the first 
> column group will hang. An improper range of random numbers used to index 
> the array of column groups always skips the first element, so if no other 
> column group is used, the loop keeps running without a chance to break.  
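
The failure mode described above can be sketched as follows. This is not the Zebra code or the attached patch; the class ColumnGroupPicker and the method pickUsedGroup are hypothetical names used only to show why the random index range matters.

```java
import java.util.Random;

// Hypothetical illustration of picking a used column group at random.
class ColumnGroupPicker {
    /** Returns the index of a randomly chosen used column group, or -1 if none is used. */
    static int pickUsedGroup(boolean[] used, Random random) {
        boolean anyUsed = false;
        for (boolean u : used) {
            anyUsed |= u;
        }
        if (!anyUsed) {
            return -1; // nothing to pick, so do not enter the loop at all
        }
        while (true) {
            // nextInt(n) draws from 0 (inclusive) to n (exclusive), so index 0 is reachable.
            // A range such as 1 + nextInt(n - 1) never yields 0 and would spin forever
            // when only the first column group is used.
            int i = random.nextInt(used.length);
            if (used[i]) {
                return i;
            }
        }
    }

    public static void main(String[] args) {
        boolean[] onlyFirstUsed = {true, false, false};
        System.out.println(pickUsedGroup(onlyFirstUsed, new Random()));
    }
}
```

The explicit check for "no group used" is an extra guard added in this sketch so that the loop always has a reachable exit.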




[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-918:
-

Affects Version/s: (was: 0.3.0)
   0.4.0




[jira] Commented: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750055#action_12750055
 ] 

Raghu Angadi commented on PIG-918:
--

I just committed this. Thanks Yan.




[jira] Created: (PIG-941) [zebra] Loading non-existing column generates error

2009-09-01 Thread Yiping Han (JIRA)
[zebra] Loading non-existing column generates error
---

 Key: PIG-941
 URL: https://issues.apache.org/jira/browse/PIG-941
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Yiping Han


Loading a column that does not exist generates the following error:

2009-09-01 21:29:15,161 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
2999: Unexpected internal error. null

For example:

STORE urls2 into '$output' using 
org.apache.pig.table.pig.TableStorer('md5:string, url:string');

and then in another pig script, I load the table:

input = LOAD '$output' USING org.apache.pig.table.pig.TableLoader('md5,url, 
domain');

where domain is a column that does not exist.




[jira] Commented: (PIG-833) Storage access layer

2009-09-01 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750093#action_12750093
 ] 

Jing Huang commented on PIG-833:


Hi Yongqiang, 
Sorry for the late reply. I was out of town last week. 
Right, SF_F is not defined in the schema. Querying a non-existent column is 
allowed, and it will return null.
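
A minimal sketch of that behaviour, assuming a row is modelled as a map from column name to value. The class NullPaddingProjection, the method project and the column names other than SF_F are hypothetical; this is not the Zebra TableLoader.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of projecting a row onto a list of column names.
class NullPaddingProjection {
    /** Returns the requested columns in order; columns absent from the row come back as null. */
    static List<Object> project(Map<String, Object> row, String projection) {
        List<Object> out = new ArrayList<>();
        for (String column : projection.split(",")) {
            out.add(row.get(column.trim())); // Map.get returns null for an unknown column
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("SF_A", "a-value"); // hypothetical column
        row.put("SF_B", "b-value"); // hypothetical column
        System.out.println(project(row, "SF_A, SF_B, SF_F"));
    }
}
```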

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.




Re: Request for feedback: cost-based optimizer

2009-09-01 Thread Jianyong Dai
I am still reading, but one interesting question is: why did you decide to put 
the CBO in the physical layer?


Dmitriy Ryaboy wrote:
> Whoops :-)
> Here's the Google doc:
> http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdA&hl=en
>
> -Dmitriy
>
> On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasan wrote:
>> Dmitriy and Gang,
>>
>> The mailing list does not allow attachments. Can you post it on a
>> website and just send the URL?
>>
>> Thanks,
>> Santhosh
>>
>> -----Original Message-----
>> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
>> Sent: Tuesday, September 01, 2009 9:48 AM
>> To: pig-dev@hadoop.apache.org
>> Subject: Request for feedback: cost-based optimizer
>>
>> Hi everyone,
>> Attached is a (very) preliminary document outlining a rough design we
>> are proposing for a cost-based optimizer for Pig.
>> This is being done as a capstone project by three CMU Master's students
>> (myself, Ashutosh Chauhan, and Tejal Desai). As such, it is not
>> necessarily meant for immediate incorporation into the Pig codebase,
>> although it would be nice if it, or parts of it, are found to be useful
>> in the mainline.
>>
>> We would love to get some feedback from the developer community
>> regarding the ideas expressed in the document, any concerns about the
>> design, suggestions for improvement, etc.
>>
>> Thanks,
>> Dmitriy, Ashutosh, Tejal



Re: Request for feedback: cost-based optimizer

2009-09-01 Thread Dmitriy Ryaboy
Our initial survey of related literature showed that the usual place
for a CBO tends to be between the physical and logical layer (in fact,
the famous Cascades paper advocates removing the distinction between
physical and logical operators altogether, and using an "is_logical"
and "is_physical" flag instead -- meaning an operator can be one,
both, or neither).

The reasoning is that you cannot properly determine the cost of a plan
if you don't know the physical "properties" of the operators that
implement it. An optimizer that works at the logical layer would by
definition create the same plan whether in local or mapreduce mode
(since such differences are abstracted from it). This is clearly
incorrect, as the properties of the environment in which these plans
are executed are drastically different.  Working at the physical layer
lets us stay close to the iron and adjust based on the specifics of
the execution environment.

Certainly one can posit a framework for a CBO that would set up the
necessary interfaces and plumbing for optimizing in any execution
mode, and invoke the proper implementations at run time; we are not
discounting that possibility (haven't gotten quite that far in the
design, to be honest).  But we feel that the implementations have to
be execution mode specific.
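
To make the argument above concrete, here is a minimal sketch of a cost-based choice that depends on the execution mode. Everything in it is an assumption made for illustration: the names `CboSketch`, `ExecMode`, `JoinImpl`, and `CostModel`, and all cost constants, are hypothetical and are not taken from Pig or from the design document.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: none of these types or constants come from Pig.
class CboSketch {
    enum ExecMode { LOCAL, MAPREDUCE }
    enum JoinImpl { HASH_JOIN, MERGE_JOIN }

    /** Cost of running one physical join implementation over two inputs. */
    interface CostModel {
        double cost(JoinImpl impl, long leftRows, long rightRows);
    }

    // Made-up constants, chosen only so that the two modes disagree.
    static final double JOB_STARTUP = 10_000.0;
    static final double SHUFFLE_PER_ROW = 10.0;

    // One cost model per execution mode: the "mode specific" implementations.
    static final Map<ExecMode, CostModel> MODELS = new EnumMap<>(ExecMode.class);
    static {
        // Local mode (assumed): hash join makes one pass over both inputs,
        // merge join pays for an extra sorting pass.
        MODELS.put(ExecMode.LOCAL, (impl, l, r) ->
                impl == JoinImpl.HASH_JOIN ? (l + r) : 2.0 * (l + r));
        // MapReduce mode (assumed): every plan pays a job startup cost, and
        // the hash join shuffles every row across the network.
        MODELS.put(ExecMode.MAPREDUCE, (impl, l, r) ->
                JOB_STARTUP + (impl == JoinImpl.HASH_JOIN ? SHUFFLE_PER_ROW : 2.0) * (l + r));
    }

    /** Picks the cheapest join implementation under the given mode's cost model. */
    static JoinImpl chooseJoin(ExecMode mode, long leftRows, long rightRows) {
        CostModel model = MODELS.get(mode);
        JoinImpl best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (JoinImpl impl : JoinImpl.values()) {
            double c = model.cost(impl, leftRows, rightRows);
            if (c < bestCost) {
                best = impl;
                bestCost = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Same logical join, different physical choice per mode.
        System.out.println(chooseJoin(ExecMode.LOCAL, 1000, 1000));      // HASH_JOIN
        System.out.println(chooseJoin(ExecMode.MAPREDUCE, 1000, 1000));  // MERGE_JOIN
    }
}
```

The shared `chooseJoin` plumbing is mode-agnostic, while the numbers that drive the decision live in per-mode `CostModel` instances. An optimizer that saw only the logical join could not tell the two cases apart.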

-Dmitriy



[jira] Updated: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-09-01 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-934:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Checked that the unit tests pass locally on my machine too.

Patch committed - Thanks Ashutosh!

> Merge join implementation currently does not seek to right point on the right 
> side input based on the offset provided by the index
> --
>
> Key: PIG-934
> URL: https://issues.apache.org/jira/browse/PIG-934
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.1
>Reporter: Pradeep Kamath
>Assignee: Ashutosh Chauhan
> Attachments: pig-934_2.patch
>
>
> We use POLoad to seek into right file which has the following code: 
> {noformat}
> public void setUp() throws IOException {
>     String filename = lFile.getFileName();
>     loader = (LoadFunc) PigContext.instantiateFuncFromSpec(lFile.getFuncSpec());
>     is = FileLocalizer.open(filename, pc);
>     loader.bindTo(filename, new BufferedPositionedInputStream(is),
>             this.offset, Long.MAX_VALUE);
> }
> {noformat}
> Between opening the stream and bindTo we do not seek to the right offset. 
> bindTo itself does not perform any seek.
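
The problem described above can be reproduced outside Pig with plain `java.io` streams. The sketch below is not the committed patch and does not use Pig's classes: `SeekSketch`, `skipFully`, `readLine`, and `firstLineAt` are hypothetical names, and `readLine` merely stands in for a loader that, like `bindTo`, reads from wherever the stream happens to be positioned.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; these helpers are not part of Pig.
class SeekSketch {

    /** Skips exactly n bytes; loops because InputStream.skip may skip fewer. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() >= 0) {
                remaining--; // skip made no progress, so consume one byte instead
            } else {
                throw new EOFException("offset " + n + " is past the end of the stream");
            }
        }
    }

    /** Stand-in for a loader: reads one line from the stream's current position. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) >= 0 && b != '\n') {
            sb.append((char) b);
        }
        return sb.toString();
    }

    /** Opens the data and returns the first line read, optionally seeking first. */
    static String firstLineAt(byte[] data, long offset, boolean seekFirst) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            if (seekFirst) {
                skipFully(in, offset); // the step missing between open and bind
            }
            return readLine(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "k1,1\nk2,2\nk3,3\n".getBytes(StandardCharsets.US_ASCII);
        long offset = 5; // byte offset of the second record, as an index would supply
        System.out.println(firstLineAt(data, offset, false)); // k1,1
        System.out.println(firstLineAt(data, offset, true));  // k2,2
    }
}
```

Handing the offset to the reader is not enough. Unless the stream is advanced first, reading starts at the first record regardless of what the index says.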




[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-935:


Attachment: skmapbug.patch

Added code to explicitly check for -1 in orderby

> Skewed join throws an exception when used with map keys
> ---
>
> Key: PIG-935
> URL: https://issues.apache.org/jira/browse/PIG-935
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
> Attachments: skmapbug.patch
>
>
> Skewed join throws a runtime exception for the following query:
> A = load 'map.txt' as (e);
> B = load 'map.txt' as (f);
> C = join A by (chararray)e#'a', B by (chararray)f#'a' using "skewed";
> explain C;
> Exception:
> Caused by: java.lang.ClassCastException: 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast
>  cannot be cast to 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSortCols(MRCompiler.java:1492)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSamplingJob(MRCompiler.java:1894)
> ... 27 more
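
The failure mode in this trace is an unchecked downcast: code that expects every join key to be a plain projection meets a cast expression instead. The sketch below reproduces that pattern with stand-in types. It is not the content of skmapbug.patch; `CastSketch`, `Expr`, `Project`, `Cast`, and both `sortCols*` methods are hypothetical, and the `-1` sentinel is only one assumed way to flag a key that is not a simple column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; these are not Pig's POProject and POCast classes.
class CastSketch {
    interface Expr {}

    /** A plain column projection, e.g. $0. */
    static final class Project implements Expr {
        final int column;
        Project(int column) { this.column = column; }
    }

    /** A cast wrapped around another expression, e.g. (chararray)e#'a'. */
    static final class Cast implements Expr {
        final Expr input;
        Cast(Expr input) { this.input = input; }
    }

    /** Unsafe: assumes every key is a Project and throws ClassCastException otherwise. */
    static List<Integer> sortColsUnchecked(List<Expr> keys) {
        List<Integer> cols = new ArrayList<>();
        for (Expr e : keys) {
            cols.add(((Project) e).column);
        }
        return cols;
    }

    /** Guarded: reports -1 for any key that is not a plain projection. */
    static List<Integer> sortColsChecked(List<Expr> keys) {
        List<Integer> cols = new ArrayList<>();
        for (Expr e : keys) {
            cols.add(e instanceof Project ? ((Project) e).column : -1);
        }
        return cols;
    }

    public static void main(String[] args) {
        List<Expr> keys = Arrays.asList(new Project(0), new Cast(new Project(1)));
        System.out.println(sortColsChecked(keys)); // [0, -1]
        try {
            sortColsUnchecked(keys);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed");
        }
    }
}
```

Callers of the guarded version then have to handle the sentinel explicitly rather than treating it as a column index.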




[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-935:


Attachment: (was: skjoinmapbug.patch)




[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-935:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-935:


Status: Open  (was: Patch Available)

> Skewed join throws an exception when used with map keys
> ---
>
> Key: PIG-935
> URL: https://issues.apache.org/jira/browse/PIG-935
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
> Attachments: skjoinmapbug.patch
>
>
> Skewed join throws a runtime exception for the following query:
> A = load 'map.txt' as (e);
> B = load 'map.txt' as (f);
> C = join A by (chararray)e#'a', B by (chararray)f#'a' using "skewed";
> explain C;
> Exception:
> Caused by: java.lang.ClassCastException: 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast
>  cannot be cast to 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSortCols(MRCompiler.java:1492)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSamplingJob(MRCompiler.java:1894)
> ... 27 more




[jira] Commented: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750256#action_12750256
 ] 

Hadoop QA commented on PIG-935:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418325/skmapbug.patch
  against trunk revision 810327.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/7/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/7/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/7/console

This message is automatically generated.




[jira] Commented: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-01 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750297#action_12750297
 ] 

Sriranjan Manjunath commented on PIG-935:
-

The unit test failures are unrelated to my patch.
