[jira] Updated: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-936:
---

Status: Open  (was: Patch Available)

> making dump and PigDump independent from Tuple.toString
> ---
>
> Key: PIG-936
> URL: https://issues.apache.org/jira/browse/PIG-936
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Olga Natkovich
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_936.Patch
>
>
> Since Tuple is an interface, a toString implementation can change from one 
> tuple implementation to the next. This means that format of dump and PigDump 
> will be different depending on the tuples processed. This could be quite 
> confusing to the users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750308#action_12750308
 ] 

Jeff Zhang commented on PIG-936:


Does anyone know why my patch failed?
The error message in the build log is: (Stripping trailing CRs from patch.)
I do not quite understand it. I developed this patch on Windows; is it 
necessary for me to develop on a Linux platform?






Re: Request for feedback: cost-based optimizer

2009-09-02 Thread Jianyong Dai
Yes, physical properties are important for an optimizer. To optimize Pig 
well, we need to know the underlying Hadoop execution environment, such 
as the number of map-reduce jobs, how many maps/reducers there are, how 
the job is configured, etc. This is true even for a rule-based optimizer. 
Unfortunately, the physical layer does not provide as much physical 
information as the name suggests. Basically, the physical layer is a 
rephrasing of the logical layer using physical operators. Compared to 
logical operators, physical operators include the implementation of 
pipeline processing but strip away many logical details such as "schema". 
Also, in the logical layer we have infrastructure to restructure logical 
operators, such as moving nodes around, swapping nodes, etc., which does 
not exist in the physical layer. From the optimizer's point of view, the 
physical layer does not give the necessary information and is harder to 
deal with. If you would like to work with physical details, I think the 
map-reduce layer is the right place to look. However, restructuring the 
map-reduce layer is hard because we do not have all the infrastructure 
to move things around. Another approach is to use a combined logical 
layer and map-reduce layer for the optimization. In this approach, you 
restructure the logical layer by observing the physical details from the 
map-reduce layer. The downside is that we have to tightly couple Pig to 
Hadoop. But now that Pig is a subproject of Hadoop and almost all Pig 
users are using Hadoop, I think it is fine to optimize things towards 
Hadoop.



Dmitriy Ryaboy wrote:

Our initial survey of related literature showed that the usual place
for a CBO tends to be between the physical and logical layer (in fact,
the famous Cascades paper advocates removing the distinction between
physical and logical operators altogether, and using an "is_logical"
and "is_physical" flag instead -- meaning an operator can be one,
both, or neither).

The reasoning is that you cannot properly determine a cost of a plan
if you don't know the physical "properties" of the operators that
implement it. An optimizer that works at a logical layer would by
definition create the same plan whether in local or mapreduce mode
(since such differences are abstracted from it). This is clearly
incorrect, as the properties of the environment in which these plans
are executed are drastically different.  Working at the physical layer
lets us stay close to the iron and adjust based on the specifics of
the execution environment.

Certainly one can posit a framework for a CBO that would set up the
necessary interfaces and plumbing for optimizing in any execution
mode, and invoke the proper implementations at run time; we are not
discounting that possibility (haven't gotten quite that far in the
design, to be honest).  But we feel that the implementations have to
be execution mode specific.
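
To make the idea of shared plumbing with mode-specific implementations
concrete, here is a minimal Java sketch. Everything in it is an assumption
made for illustration: ExecMode, CandidatePlan, CostModel and CostModels
are hypothetical names, not Pig classes, and the shuffle penalties are
made-up numbers. It only shows that the same two candidate plans can rank
differently once the cost model knows the execution mode.

```java
import java.util.Arrays;
import java.util.List;

// All names here are hypothetical; none of them are Pig classes.
enum ExecMode { LOCAL, MAPREDUCE }

/** A candidate plan reduced to the two properties this sketch prices. */
final class CandidatePlan {
    final String name;
    final long bytesRead;
    final int shuffleStages;

    CandidatePlan(String name, long bytesRead, int shuffleStages) {
        this.name = name;
        this.bytesRead = bytesRead;
        this.shuffleStages = shuffleStages;
    }
}

/** Shared interface; each execution mode supplies its own implementation. */
interface CostModel {
    double cost(CandidatePlan plan);
}

final class CostModels {
    // The per-shuffle penalties are made-up numbers for illustration only.
    static CostModel forMode(ExecMode mode) {
        final double shufflePenalty = (mode == ExecMode.LOCAL) ? 1e6 : 5e9;
        return plan -> plan.bytesRead + plan.shuffleStages * shufflePenalty;
    }

    /** Returns the plan with the lowest cost under the given model. */
    static CandidatePlan cheapest(List<CandidatePlan> plans, CostModel model) {
        CandidatePlan best = null;
        for (CandidatePlan p : plans) {
            if (best == null || model.cost(p) < model.cost(best)) {
                best = p;
            }
        }
        return best;
    }
}

final class CboSketchDemo {
    public static void main(String[] args) {
        List<CandidatePlan> plans = Arrays.asList(
            new CandidatePlan("scan-more-no-shuffle", 3_000_000_000L, 0),
            new CandidatePlan("scan-less-one-shuffle", 1_000_000_000L, 1));
        // The same candidates are ranked once per execution mode.
        for (ExecMode mode : ExecMode.values()) {
            CandidatePlan pick = CostModels.cheapest(plans, CostModels.forMode(mode));
            System.out.println(mode + " -> " + pick.name);
        }
    }
}
```

With these assumed penalties the plan that shuffles wins in local mode and
loses in mapreduce mode, which is the kind of difference a purely logical
optimizer could not see.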

-Dmitriy

On Tue, Sep 1, 2009 at 6:26 PM, Jianyong Dai wrote:
  

I am still reading, but one interesting question is why you decided to put
the CBO in the physical layer?

Dmitriy Ryaboy wrote:


Whoops :-)
Here's the Google doc:

http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdA&hl=en

-Dmitriy

On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasan
wrote:

  

Dmitriy and Gang,

The mailing list does not allow attachments. Can you post it on a
website and just send the URL ?

Thanks,
Santhosh

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
Sent: Tuesday, September 01, 2009 9:48 AM
To: pig-dev@hadoop.apache.org
Subject: Request for feedback: cost-based optimizer

Hi everyone,
Attached is a (very) preliminary document outlining a rough design we
are proposing for a cost-based optimizer for Pig.
This is being done as a capstone project by three CMU Master's students
(myself, Ashutosh Chauhan, and Tejal Desai). As such, it is not
necessarily meant for immediate incorporation into the Pig codebase,
although it would be nice if it, or parts of it, are found to be useful
in the mainline.

We would love to get some feedback from the developer community
regarding the ideas expressed in the document, any concerns about the
design, suggestions for improvement, etc.

Thanks,
Dmitriy, Ashutosh, Tejal








[jira] Updated: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-936:
---

Attachment: Pig_936_2.Patch

Hi, Jeff,
It is fine to make a patch on Windows. "Stripping trailing CRs from patch" does 
fail the patch. The problem is PigDump.java, which is now surprisingly in 
Windows format (you can see the trailing ^M if you open it in vi). If you 
convert PigDump.java into Unix format (by using dos2unix), then your patch can 
be applied. I attached the patch again. The only change is that it also 
converts PigDump.java into Unix format, so it can be applied to trunk.
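
For readers unfamiliar with the line-ending problem, the following Java
sketch shows the difference in question: a Windows-format file ends each
line with CR LF, while a Unix-format file uses LF alone. LineEndings is a
hypothetical helper written for this illustration; it is not part of Pig,
and it only approximates what a tool like dos2unix does.

```java
// Hypothetical helper, not part of Pig or of the patch tool.
final class LineEndings {
    /** True if the text contains at least one Windows-style CRLF line ending. */
    static boolean hasCrlf(String text) {
        return text.contains("\r\n");
    }

    /** Rewrites every CRLF as a bare LF, leaving other characters untouched. */
    static String toUnix(String text) {
        return text.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        String windowsSource = "public class PigDump {\r\n}\r\n";
        // Check for CRLF before and after the conversion.
        System.out.println(hasCrlf(windowsSource));
        System.out.println(hasCrlf(toUnix(windowsSource)));
    }
}
```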

I also reviewed the patch. It looks good to me. I am fine with putting 
TupleFormat and BagFormat into the package "org.apache.pig.impl.util". I will 
commit it shortly if there are no other comments. Thanks!
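
As a rough illustration of the idea behind this issue (formatting tuples
in one place rather than relying on each implementation's toString), here
is a minimal Java sketch. It is not the code in the patch: FieldTuple,
ListTuple and DumpFormat are hypothetical stand-ins, and the actual
TupleFormat API may look quite different. The output style, with
parentheses, commas, and an empty string for a null field, is an
assumption modelled on typical dump output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical stand-in for a tuple interface; not org.apache.pig.data.Tuple.
interface FieldTuple {
    int size();
    Object get(int index);
}

/** One possible implementation, with its own implementation-specific toString. */
final class ListTuple implements FieldTuple {
    private final List<Object> fields;

    ListTuple(Object... fields) {
        this.fields = Arrays.asList(fields);
    }

    public int size() { return fields.size(); }

    public Object get(int index) { return fields.get(index); }

    @Override
    public String toString() { return "ListTuple" + fields; }
}

/** Formats any FieldTuple the same way, without calling the tuple's toString. */
final class DumpFormat {
    static String format(FieldTuple tuple) {
        StringJoiner out = new StringJoiner(",", "(", ")");
        for (int i = 0; i < tuple.size(); i++) {
            Object field = tuple.get(i);
            if (field == null) {
                out.add("");                              // null renders as empty
            } else if (field instanceof FieldTuple) {
                out.add(format((FieldTuple) field));      // nested tuples recurse
            } else {
                out.add(field.toString());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        FieldTuple t = new ListTuple("this", 2, null);
        System.out.println(t);          // depends on the implementation
        System.out.println(format(t));  // depends only on DumpFormat
    }
}
```

Because the formatter reads fields only through the interface, two tuple
implementations holding the same values produce the same dump text.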

> making dump and PigDump independent from Tuple.toString
> ---
>
> Key: PIG-936
> URL: https://issues.apache.org/jira/browse/PIG-936
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_936.Patch, Pig_936_2.Patch
>
>
> Since Tuple is an interface, a toString implementation can change from one 
> tuple implementation to the next. This means that format of dump and PigDump 
> will be different depending on the tuples processed. This could be quite 
> confusing to the users.




[jira] Updated: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-936:
---

Affects Version/s: (was: 0.4.0)
   0.3.0
   Status: Patch Available  (was: Open)




[jira] Commented: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750608#action_12750608
 ] 

Hadoop QA commented on PIG-936:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418420/Pig_936_2.Patch
  against trunk revision 810327.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/8/testReport/
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/8/console

This message is automatically generated.




[jira] Updated: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-936:
---

Status: Patch Available  (was: Open)

> making dump and PigDump independent from Tuple.toString
> ---
>
> Key: PIG-936
> URL: https://issues.apache.org/jira/browse/PIG-936
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_936.Patch, Pig_936_2.Patch, Pig_936_3.Patch
>
>
> Since Tuple is an interface, a toString implementation can change from one 
> tuple implementation to the next. This means that format of dump and PigDump 
> will be different depending on the tuples processed. This could be quite 
> confusing to the users.




[jira] Updated: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-936:
---

Attachment: Pig_936_3.Patch

Several files were missing from the last patch. Resubmitting.




[jira] Updated: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-936:
---

Status: Open  (was: Patch Available)




[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-02 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-935:
---

  Resolution: Fixed
Assignee: Santhosh Srinivasan
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Reviewed and checked that the latest patch does not cause unit test failures on 
my local patch.

Patch committed. Thanks Sriranjan!

> Skewed join throws an exception when used with map keys
> ---
>
> Key: PIG-935
> URL: https://issues.apache.org/jira/browse/PIG-935
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Santhosh Srinivasan
> Attachments: skmapbug.patch
>
>
> Skewed join throws a runtime exception for the following query:
> A = load 'map.txt' as (e);
> B = load 'map.txt' as (f);
> C = join A by (chararray)e#'a', B by (chararray)f#'a' using "skewed";
> explain C;
> Exception:
> Caused by: java.lang.ClassCastException: 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast
>  cannot be cast to 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSortCols(MRCompiler.java:1492)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSamplingJob(MRCompiler.java:1894)
> ... 27 more
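
The exception above has the shape of an unchecked downcast: code expects a
projection operator but receives a cast operator wrapping one, as happens
when the join key is written as (chararray)e#'a'. The Java sketch below
illustrates that failure mode and one defensive alternative. It is not the
committed fix, which is not shown in this thread; ExprOp, ProjectOp,
CastOp and SortColsSketch are hypothetical miniature types, not Pig's
POCast and POProject.

```java
// Hypothetical miniature operator types; not Pig's POCast/POProject.
interface ExprOp {}

/** Stands in for a projection of one input column. */
final class ProjectOp implements ExprOp {
    final int column;
    ProjectOp(int column) { this.column = column; }
}

/** Stands in for a cast wrapped around another expression. */
final class CastOp implements ExprOp {
    final ExprOp input;
    final String targetType;
    CastOp(ExprOp input, String targetType) {
        this.input = input;
        this.targetType = targetType;
    }
}

final class SortColsSketch {
    /** Assumes the root is always a projection; fails when it is a cast. */
    static int columnUnchecked(ExprOp root) {
        return ((ProjectOp) root).column;
    }

    /** Walks through any casts before looking for the projection. */
    static int columnChecked(ExprOp root) {
        ExprOp op = root;
        while (op instanceof CastOp) {
            op = ((CastOp) op).input;
        }
        if (op instanceof ProjectOp) {
            return ((ProjectOp) op).column;
        }
        throw new IllegalArgumentException("no projection found in expression");
    }

    public static void main(String[] args) {
        ExprOp key = new CastOp(new ProjectOp(0), "chararray");
        System.out.println(columnChecked(key));
        try {
            columnUnchecked(key);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getMessage());
        }
    }
}
```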




[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-02 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-935:
---

Assignee: (was: Santhosh Srinivasan)

> Skewed join throws an exception when used with map keys
> ---
>
> Key: PIG-935
> URL: https://issues.apache.org/jira/browse/PIG-935
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
> Attachments: skmapbug.patch
>
>
> Skewed join throws a runtime exception for the following query:
> A = load 'map.txt' as (e);
> B = load 'map.txt' as (f);
> C = join A by (chararray)e#'a', B by (chararray)f#'a' using "skewed";
> explain C;
> Exception:
> Caused by: java.lang.ClassCastException: 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast
>  cannot be cast to 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSortCols(MRCompiler.java:1492)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.getSamplingJob(MRCompiler.java:1894)
> ... 27 more




[jira] Commented: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750651#action_12750651
 ] 

Hadoop QA commented on PIG-936:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12418422/Pig_936_3.Patch
  against trunk revision 810327.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/9/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/9/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/9/console

This message is automatically generated.




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-09-02 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Status: Open  (was: Patch Available)

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-09-02 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Attachment: (was: sampler.patch)

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: samplerinterface.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-09-02 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-09-02 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Attachment: samplerinterface.patch

Addressed the review comments.




[jira] Commented: (PIG-941) [zebra] Loading non-existing column generates error

2009-09-02 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750677#action_12750677
 ] 

Jing Huang commented on PIG-941:


Hi Yiping,
I am trying to reproduce the scenario that you are seeing. 
Say I have a table t1 with two columns, 'word' and 'count'. I try to load t1 
with the projection ('word,count,domain') -- please note that 'domain' is a 
non-existing column. 

Here is the result that I got:
(this,2,)
(is,1,)
(a,4,)
(test,2,)
(hello,1,)
(world,3,)


==

If I only query ('word,count'), 
I get this result:
(this,2)
(is,1)
(a,4)
(test,2)
(hello,1)
(world,3)

===
So I think zebra handles a non-existing column correctly. 

Now I have a question: which zebra jar are you using?
I recall that some time back we did have a bug report that zebra mishandled 
queries on a non-existing column.

Thanks
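
The behaviour shown in the two result listings above (a column that is
projected but absent from the table comes back as an empty field) can be
sketched in a few lines of Java. This is only an illustration of that
behaviour, not zebra's implementation; ProjectionSketch and its methods
are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration; not zebra's TableLoader.
final class ProjectionSketch {
    /** Picks the requested columns from a row; absent columns become null. */
    static List<Object> project(Map<String, Object> row, List<String> columns) {
        List<Object> out = new ArrayList<>();
        for (String column : columns) {
            out.add(row.get(column));   // Map.get returns null for a missing key
        }
        return out;
    }

    /** Renders fields in dump style, with null shown as an empty field. */
    static String render(List<Object> fields) {
        StringJoiner out = new StringJoiner(",", "(", ")");
        for (Object field : fields) {
            out.add(field == null ? "" : field.toString());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("word", "this");
        row.put("count", 2);
        List<String> projection = new ArrayList<>();
        projection.add("word");
        projection.add("count");
        projection.add("domain");       // not present in the row
        System.out.println(render(project(row, projection)));
    }
}
```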

> [zebra] Loading non-existing column generates error
> ---
>
> Key: PIG-941
> URL: https://issues.apache.org/jira/browse/PIG-941
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Reporter: Yiping Han
>
> Loading a column that does not exist generates the following error:
> 2009-09-01 21:29:15,161 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2999: Unexpected internal error. null
> Example is like this:
> STORE urls2 into '$output' using 
> org.apache.pig.table.pig.TableStorer('md5:string, url:string');
> and then in another pig script, I load the table:
> input = LOAD '$output' USING org.apache.pig.table.pig.TableLoader('md5,url, 
> domain');
> where domain is a column that does not exist.




[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-09-02 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750701#action_12750701
 ] 

Daniel Dai commented on PIG-890:


+1 for the patch.




[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-09-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750715#action_12750715
 ] 

Hadoop QA commented on PIG-890:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12418432/samplerinterface.patch
  against trunk revision 810677.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/10/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/10/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/10/console

This message is automatically generated.




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-09-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Attachment: PIG-922-p2_preview2.patch

I have attached an incremental phase 2 patch, so you can review the phase 2 
changes by themselves.

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
> PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
> PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch
>
>
> This is a continuation of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: push up project, i.e., prune columns 
> as early as possible.




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-09-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-890:
---

   Resolution: Fixed
Fix Version/s: 0.4.0
   Status: Resolved  (was: Patch Available)

Patch committed. Unit test failure is not related to this patch. Thanks Sri. 

> Create a sampler interface and improve the skewed join sampler
> --
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Fix For: 0.4.0
>
> Attachments: samplerinterface.patch
>
>
> We need a different sampler for order by and skewed join. We thus need a 
> better sampling interface. The design of the same is described here: 
> http://wiki.apache.org/pig/PigSampler




[jira] Commented: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750753#action_12750753
 ] 

Daniel Dai commented on PIG-936:


Unit test failure is not related to this patch.




[jira] Commented: (PIG-936) making dump and PigDump independent from Tuple.toString

2009-09-02 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750763#action_12750763
 ] 

Jeff Zhang commented on PIG-936:


Daniel,

Thank you for your help.







[jira] Created: (PIG-942) Maps are not implicitly casted

2009-09-02 Thread Sriranjan Manjunath (JIRA)

Maps are not implicitly casted
------------------------------

                 Key: PIG-942
                 URL: https://issues.apache.org/jira/browse/PIG-942
             Project: Pig
          Issue Type: Bug
            Reporter: Sriranjan Manjunath


The statement A = load 'foo' as (m) leads to the following exception when foo contains maps:

java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.util.Map
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:98)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POMapLookUp.getNext(POMapLookUp.java:115)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:612)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:278)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

The same works if I explicitly declare m as a map in the load schema: A = load 'foo' as (m:[])




[jira] Commented: (PIG-942) Maps are not implicitly casted

2009-09-02 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750796#action_12750796
 ] 

Sriranjan Manjunath commented on PIG-942:
-

Here's the complete script:

A = load 'map.txt' as (e);
B = load 'map.txt' as (f);
C = join A by (chararray)e#'100', B by (chararray)f#'100';
dump C;
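
For comparison, a sketch of the same script with the map type declared in the load schema, following the workaround noted in the issue description (as (m:[])). It has not been run here and assumes each line of map.txt holds a single map-valued field.

```pig
-- Declare e and f as maps ([]) so the # lookup is not applied
-- to an untyped (bytearray) field.
A = load 'map.txt' as (e:[]);
B = load 'map.txt' as (f:[]);

-- The join keys are unchanged from the original script.
C = join A by (chararray)e#'100', B by (chararray)f#'100';
dump C;
```

The only difference from the failing script is the :[] type annotation on each load.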

