[jira] [Updated] (PIG-3623) HBaseStorage: setting loadKey and noWAL to false doesn't have any affect

2014-02-04 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3623:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

+1. Committed to trunk. Thanks Nezih

TestHBaseStorage still fails on trunk, but it passes when this patch is applied 
to the older revision. I will address the trunk failure in a separate JIRA. 

> HBaseStorage: setting loadKey and noWAL to false doesn't have any affect
> 
>
> Key: PIG-3623
> URL: https://issues.apache.org/jira/browse/PIG-3623
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Michael Stefaniak
>Assignee: Nezih Yigitbasi
> Fix For: 0.13.0
>
> Attachments: PIG-3623.1.patch, PIG-3623.2.patch, PIG-3623.3.patch, 
> PIG-3623.patch
>
>
> The documentation for HBaseStorage 
> (http://pig.apache.org/docs/r0.12.0/func.html#HBaseStorage)
> says -loadKey=(true|false) Load the row key as the first value in every tuple 
> returned from HBase (default=false)
> However, looking at the source 
> (http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java)
> it only checks whether this option is present:
> loadRowKey_ = configuredOptions_.hasOption("loadKey");
> So setting -loadKey=false in the options string still results in a true value.
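
The difference between a presence check and a value-aware check can be sketched as below. This is a minimal illustration, not the HBaseStorage code and not the committed patch: the class `LoadKeyOptionSketch`, its small option parser, and the rule that a bare `-loadKey` means true are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parsed options string; not the HBaseStorage code.
final class LoadKeyOptionSketch {

    // Parses "-name=value" or bare "-name" tokens into a map (bare flags map to null).
    static Map<String, String> parse(String optString) {
        Map<String, String> opts = new HashMap<>();
        for (String token : optString.trim().split("\\s+")) {
            if (!token.startsWith("-")) {
                continue; // ignore anything that is not an option token
            }
            String body = token.substring(1);
            int eq = body.indexOf('=');
            if (eq < 0) {
                opts.put(body, null);
            } else {
                opts.put(body.substring(0, eq), body.substring(eq + 1));
            }
        }
        return opts;
    }

    // Presence-only check: the behavior the report describes.
    static boolean presenceOnly(Map<String, String> opts, String name) {
        return opts.containsKey(name);
    }

    // Value-aware check: absent means false, a bare flag means true (assumption),
    // otherwise the supplied value is parsed as a boolean.
    static boolean valueAware(Map<String, String> opts, String name) {
        if (!opts.containsKey(name)) {
            return false;
        }
        String value = opts.get(name);
        return value == null || Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse("-loadKey=false -caching=100");
        System.out.println("presence-only: " + presenceOnly(opts, "loadKey"));
        System.out.println("value-aware:   " + valueAware(opts, "loadKey"));
    }
}
```

With the options string `-loadKey=false`, the presence-only check still reports the option as enabled, which matches the behavior described in the report; the value-aware check does not.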



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] Subscription: PIG patch available

2014-02-04 Thread jira
Issue Subscription
Filter: PIG patch available (17 issues)

Subscriber: pigdaily

Key       Summary
PIG-3741  Utils.setTmpFileCompressionOnConf can cause side effect for SequenceFileInterStorage
          https://issues.apache.org/jira/browse/PIG-3741
PIG-3737  Bundle dependent jars in distribution in %PIG_HOME%/lib folder
          https://issues.apache.org/jira/browse/PIG-3737
PIG-3735  UDF to data cleanse the dirty data with expected pattern
          https://issues.apache.org/jira/browse/PIG-3735
PIG-3724  pig e2e tests dont have hadoop libs on classpath
          https://issues.apache.org/jira/browse/PIG-3724
PIG-3679  e2e StreamingPythonUDFs_10 fails in trunk
          https://issues.apache.org/jira/browse/PIG-3679
PIG-3670  Fix assert in Pig script
          https://issues.apache.org/jira/browse/PIG-3670
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
          https://issues.apache.org/jira/browse/PIG-3635
PIG-3623  HBaseStorage: setting loadKey and noWAL to false doesn't have any affect
          https://issues.apache.org/jira/browse/PIG-3623
PIG-3615  Update the way that JsonLoader/JsonStorage deal with BigDecimal
          https://issues.apache.org/jira/browse/PIG-3615
PIG-3613  UDF for SimilarityMatching between strings with matching scores
          https://issues.apache.org/jira/browse/PIG-3613
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587
PIG-3456  Reduce threadlocal conf access in backend for each record
          https://issues.apache.org/jira/browse/PIG-3456
PIG-3447  Compiler warning message dropped for CastLineageSetter and others with no enum kind
          https://issues.apache.org/jira/browse/PIG-3447
PIG-3441  Allow Pig to use default resources from Configuration objects
          https://issues.apache.org/jira/browse/PIG-3441
PIG-3373  XMLLoader returns non-matching nodes when a tag name spans through the block boundary
          https://issues.apache.org/jira/browse/PIG-3373
PIG-3347  Store invocation brings side effect
          https://issues.apache.org/jira/browse/PIG-3347

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-259) allow store to overwrite existing directroy

2014-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891440#comment-13891440
 ] 

Daniel Dai commented on PIG-259:


PIG-259.9.patch committed. Thanks Nezih!

> allow store to overwrite existing directroy
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Olga Natkovich
>Assignee: Nezih Yigitbasi
> Fix For: 0.13.0
>
> Attachments: PIG-259.5.patch, PIG-259.6.patch, PIG-259.7.patch, 
> PIG-259.8.patch, PIG-259.9.patch, Pig_259.patch, Pig_259_2.patch, 
> Pig_259_3.patch, Pig_259_4.patch
>
>
> we have users who are asking for a flag to overwrite existing directory





[jira] [Updated] (PIG-259) allow store to overwrite existing directroy

2014-02-04 Thread Nezih Yigitbasi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nezih Yigitbasi updated PIG-259:


Attachment: PIG-259.9.patch

Daniel, delta patch added. Thanks for the review.

> allow store to overwrite existing directroy
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Olga Natkovich
>Assignee: Nezih Yigitbasi
> Fix For: 0.13.0
>
> Attachments: PIG-259.5.patch, PIG-259.6.patch, PIG-259.7.patch, 
> PIG-259.8.patch, PIG-259.9.patch, Pig_259.patch, Pig_259_2.patch, 
> Pig_259_3.patch, Pig_259_4.patch
>
>
> we have users who are asking for a flag to overwrite existing directory





[jira] [Commented] (PIG-259) allow store to overwrite existing directroy

2014-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891303#comment-13891303
 ] 

Daniel Dai commented on PIG-259:


Sounds good. Since the patch is committed, can you upload the delta patch?

> allow store to overwrite existing directroy
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Olga Natkovich
>Assignee: Nezih Yigitbasi
> Fix For: 0.13.0
>
> Attachments: PIG-259.5.patch, PIG-259.6.patch, PIG-259.7.patch, 
> PIG-259.8.patch, Pig_259.patch, Pig_259_2.patch, Pig_259_3.patch, 
> Pig_259_4.patch
>
>
> we have users who are asking for a flag to overwrite existing directory





Re: Review Request 17681: [PIG-3742] Set MR runtime settings on tez runtime

2014-02-04 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17681/
---

(Updated Feb. 4, 2014, 10:08 p.m.)


Review request for pig, Cheolsoo Park and Daniel Dai.


Changes
---

This patch deletes the util classes and adds them as new files in the util 
package instead of using svn mv.


Bugs: PIG-3742
https://issues.apache.org/jira/browse/PIG-3742


Repository: pig


Description
---

Changes made:
1) Converted the relevant MR settings to equivalent Tez settings and set them 
on AM, Vertex and Edge.
2) Moved the util and helper classes (SecurityHelper and TezCompilerUtil) to a 
util package. This does not show up cleanly in Review Board. I will do an svn mv 
while committing.
3) Fixed an issue with the 1-1 edge in orderby while running pigmix, where 
parallelism was not reflected in the second edge when the parallelism of the 
first vertex changed after input split calculation. Also made 
POIdentityInOutTez work with shuffle input, so that performance could be 
tested with a 1-1 edge or a shuffle edge with a round robin partitioner. A 
shuffle edge with a round robin partitioner or hash partitioner performed very 
badly compared to MR. Even with a 1-1 edge, performance is bad for L10.pig, 
which orders by multiple columns. Order by performance still needs work. 
Hoping unsorted shuffle with TEZ-661 might make it better.
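
The key-translation idea in item 1 can be sketched as follows. This is only an illustration under stated assumptions: the class `MrToTezSettingsSketch` is hypothetical, and the property names are placeholders invented for the example, not real MR or Tez configuration keys and not the contents of MRToTezHelper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of translating settings from one key namespace to another.
final class MrToTezSettingsSketch {

    // Placeholder key names, invented for illustration; not real MR or Tez properties.
    private static final Map<String, String> KEY_MAP = new LinkedHashMap<>();
    static {
        KEY_MAP.put("example.mr.sort.buffer.mb", "example.tez.sort.buffer.mb");
        KEY_MAP.put("example.mr.output.compress", "example.tez.output.compress");
    }

    // Copies every MR setting that has a known counterpart under its translated key.
    // Settings without a counterpart are left out of the result.
    static Map<String, String> translate(Map<String, String> mrConf) {
        Map<String, String> tezConf = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : KEY_MAP.entrySet()) {
            String value = mrConf.get(entry.getKey());
            if (value != null) {
                tezConf.put(entry.getValue(), value);
            }
        }
        return tezConf;
    }

    public static void main(String[] args) {
        Map<String, String> mrConf = new LinkedHashMap<>();
        mrConf.put("example.mr.sort.buffer.mb", "256");
        mrConf.put("example.mr.unrelated", "x");
        System.out.println(translate(mrConf));
    }
}
```

A table-driven translation like this keeps the mapping in one place; how the translated values are then applied to the AM, vertices and edges is outside the scope of the sketch.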


Diffs (updated)
-

  
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java 1563492
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/SecurityHelper.java 1563492
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1563492
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompilerUtil.java 1563492
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 1563492
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java 1563492
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java 1563492
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/MRToTezHelper.java PRE-CREATION
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/SecurityHelper.java PRE-CREATION
  http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezCompilerUtil.java PRE-CREATION

Diff: https://reviews.apache.org/r/17681/diff/


Testing
---

Unit and tez.conf e2e tests pass.


Thanks,

Rohini Palaniswamy



Re: Review Request 17681: [PIG-3742] Set MR runtime settings on tez runtime

2014-02-04 Thread Rohini Palaniswamy


> On Feb. 4, 2014, 8:05 p.m., Daniel Dai wrote:
> > Seems the patch cannot apply cleanly. Can you rebase?

This is because of the svn mv of the util classes. I uploaded a patch that 
deletes the moved files and adds them as new files. I will do the svn mv 
during the commit.


- Rohini


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17681/#review33632
---


On Feb. 4, 2014, 5:40 p.m., Rohini Palaniswamy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17681/
> ---
> 
> (Updated Feb. 4, 2014, 5:40 p.m.)
> 
> 
> Review request for pig, Cheolsoo Park and Daniel Dai.
> 
> 
> Bugs: PIG-3742
> https://issues.apache.org/jira/browse/PIG-3742
> 
> 
> Repository: pig
> 
> 
> Description
> ---
> 
> Changes made:
> 1) Converted the relevant MR settings to equivalent Tez settings and set them 
> on AM, Vertex and Edge.
> 2) Moved the util and helper classes (SecurityHelper and TezCompilerUtil) to 
> a util package. Does not show up cleanly in review board. Will be doing a svn 
> mv while committing.
> 3) Fixed an issue with the 1-1 edge in orderby while running pigmix, where 
> parallelism was not reflected in the second edge when the parallelism of the 
> first vertex changed after input split calculation. Also made 
> POIdentityInOutTez work with shuffle input, so that performance could be 
> tested with a 1-1 edge or a shuffle edge with a round robin partitioner. A 
> shuffle edge with a round robin partitioner or hash partitioner performed 
> very badly compared to MR. Even with a 1-1 edge, performance is bad for 
> L10.pig, which orders by multiple columns. Order by performance still needs 
> work. Hoping unsorted shuffle with TEZ-661 might make it better.
> 
> 
> Diffs
> -
> 
>   
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java 1563492
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/SecurityHelper.java 1563492
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 1563492
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompilerUtil.java 1563492
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java 1563492
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java 1563492
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java 1563492
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/MRToTezHelper.java PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/SecurityHelper.java PRE-CREATION
>   http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezCompilerUtil.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/17681/diff/
> 
> 
> Testing
> ---
> 
> Unit and tez.conf e2e tests pass.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>



[jira] [Commented] (PIG-3441) Allow Pig to use default resources from Configuration objects

2014-02-04 Thread Bhooshan Mogal (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891280#comment-13891280
 ] 

Bhooshan Mogal commented on PIG-3441:
-

[~daijy], yes, I agree. There are a lot of places where Configuration objects 
are re-created in Pig. I tried a bunch of them, but changing this particular 
one, {{ConfigurationUtil.toConfiguration()}}, seemed to fix the problem for me. 
This method is also called from multiple places.

> Allow Pig to use default resources from Configuration objects
> -
>
> Key: PIG-3441
> URL: https://issues.apache.org/jira/browse/PIG-3441
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11.1
>Reporter: Bhooshan Mogal
> Attachments: PIG-3441.patch, PIG-3441_1.patch
>
>
> Pig currently ignores parameters from configuration files added statically to 
> Configuration objects as Configuration.addDefaultResource(filename.xml).
> Consider the following scenario -
> In a Hadoop FileSystem driver for a non-HDFS filesystem, you load properties 
> specific to that FileSystem in a static initializer block in the class that 
> extends org.apache.hadoop.fs.FileSystem, like below - 
> {code}
> class MyFileSystem extends FileSystem {
>   // Runs once, when the class is loaded.
>   static {
>     Configuration.addDefaultResource("myfs-default.xml");
>     Configuration.addDefaultResource("myfs-site.xml");
>   }
> }
> {code}
> Interfaces like the Hadoop CLI, Hive, Hadoop M/R can find configuration 
> parameters defined in these configuration files as long as they are on the 
> classpath.
> However, Pig cannot find parameters from these files, because it ignores 
> configuration files added statically.
> Pig should allow users to specify if they would like pig to read parameters 
> from resources loaded statically.





[jira] [Updated] (PIG-259) allow store to overwrite existing directroy

2014-02-04 Thread Nezih Yigitbasi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nezih Yigitbasi updated PIG-259:


Attachment: PIG-259.8.patch

When I removed the isOverwrite method, tests failed. The user tells us whether 
to overwrite, and we use that flag to decide whether to catch file not found 
problems during validation. Implementing the interface alone is not enough for 
that: if the user says "-overwrite false" but we only check whether PigStorage 
implements OverwritingStoreFunc, we ignore the user's input. So we need a flag 
that carries the user's choice. To make the intent clearer, I renamed 
OverwritingStoreFunc to OverwritableStoreFunc and the method from "isOverwrite" 
to "shouldOverwrite", and also added some javadoc.
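
The reasoning above can be sketched as below. This is a simplified, hypothetical model rather than the Pig interface: the real OverwritableStoreFunc may declare more methods, and the names `OverwriteValidationSketch`, `SketchStorage` and `shouldFail` exist only in this example. It also assumes the validation step in question is one that rejects a store whose output location already exists.

```java
// Hypothetical, simplified model of the interface discussed in this comment.
final class OverwriteValidationSketch {

    // Stand-in for OverwritableStoreFunc; the real interface may look different.
    interface OverwritableStoreFuncSketch {
        // True only when the user asked for existing output to be replaced.
        boolean shouldOverwrite();
    }

    // Stand-in store function that remembers the user's "-overwrite" choice.
    static final class SketchStorage implements OverwritableStoreFuncSketch {
        private final boolean overwrite;

        SketchStorage(boolean overwrite) {
            this.overwrite = overwrite;
        }

        @Override
        public boolean shouldOverwrite() {
            return overwrite;
        }
    }

    // Returns true when validation should reject the store because output exists.
    // Checking only "instanceof" would ignore a user who passed "-overwrite false".
    static boolean shouldFail(Object storeFunc, boolean outputExists) {
        boolean overwrite = storeFunc instanceof OverwritableStoreFuncSketch
                && ((OverwritableStoreFuncSketch) storeFunc).shouldOverwrite();
        return outputExists && !overwrite;
    }

    public static void main(String[] args) {
        System.out.println(shouldFail(new SketchStorage(false), true));
        System.out.println(shouldFail(new SketchStorage(true), true));
    }
}
```

The interface marks a store function as capable of overwriting, while the method reports whether the user actually asked for it; both pieces are needed for the validation decision.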


> allow store to overwrite existing directroy
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Olga Natkovich
>Assignee: Nezih Yigitbasi
> Fix For: 0.13.0
>
> Attachments: PIG-259.5.patch, PIG-259.6.patch, PIG-259.7.patch, 
> PIG-259.8.patch, Pig_259.patch, Pig_259_2.patch, Pig_259_3.patch, 
> Pig_259_4.patch
>
>
> we have users who are asking for a flag to overwrite existing directory





[jira] [Commented] (PIG-3441) Allow Pig to use default resources from Configuration objects

2014-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891228#comment-13891228
 ] 

Daniel Dai commented on PIG-3441:
-

[~bdmogal] I see several more places where we instantiate Configuration without 
default config files (e.g., HExecutionEngine:111); I am not sure if we need to 
change those as well. We need to dig into the configuration propagation process 
more; it is quite complicated right now.
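
Why the way a Configuration is instantiated matters can be shown with a toy model. The sketch below is not org.apache.hadoop.conf.Configuration; it is a hypothetical class, `DefaultResourceSketch`, that only mimics two things Hadoop's class is known to offer: a static registry of default resources and a constructor flag that decides whether those defaults are loaded. The property name and value are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; this is not org.apache.hadoop.conf.Configuration.
final class DefaultResourceSketch {

    // Statically registered default resources, shared by all instances.
    private static final List<Map<String, String>> DEFAULTS = new ArrayList<>();

    private final Map<String, String> props = new HashMap<>();

    // Mimics registering a default resource, e.g. from a static initializer.
    static void addDefaultResource(Map<String, String> resource) {
        DEFAULTS.add(resource);
    }

    // When loadDefaults is false, statically registered resources are skipped.
    DefaultResourceSketch(boolean loadDefaults) {
        if (loadDefaults) {
            for (Map<String, String> resource : DEFAULTS) {
                props.putAll(resource);
            }
        }
    }

    String get(String key) {
        return props.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> myfs = new HashMap<>();
        myfs.put("myfs.endpoint", "example-host"); // made-up property
        addDefaultResource(myfs);
        System.out.println(new DefaultResourceSketch(true).get("myfs.endpoint"));
        System.out.println(new DefaultResourceSketch(false).get("myfs.endpoint"));
    }
}
```

In this model, an instance created without defaults never sees the statically registered properties, which is the symptom the issue describes for Pig.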

> Allow Pig to use default resources from Configuration objects
> -
>
> Key: PIG-3441
> URL: https://issues.apache.org/jira/browse/PIG-3441
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11.1
>Reporter: Bhooshan Mogal
> Attachments: PIG-3441.patch, PIG-3441_1.patch
>
>
> Pig currently ignores parameters from configuration files added statically to 
> Configuration objects as Configuration.addDefaultResource(filename.xml).
> Consider the following scenario -
> In a Hadoop FileSystem driver for a non-HDFS filesystem, you load properties 
> specific to that FileSystem in a static initializer block in the class that 
> extends org.apache.hadoop.fs.FileSystem, like below - 
> {code}
> class MyFileSystem extends FileSystem {
>   // Runs once, when the class is loaded.
>   static {
>     Configuration.addDefaultResource("myfs-default.xml");
>     Configuration.addDefaultResource("myfs-site.xml");
>   }
> }
> {code}
> Interfaces like the Hadoop CLI, Hive, Hadoop M/R can find configuration 
> parameters defined in these configuration files as long as they are on the 
> classpath.
> However, Pig cannot find parameters from these files, because it ignores 
> configuration files added statically.
> Pig should allow users to specify if they would like pig to read parameters 
> from resources loaded statically.





[jira] [Commented] (PIG-3347) Store invocation brings side effect

2014-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891178#comment-13891178
 ] 

Daniel Dai commented on PIG-3347:
-

It could, but it would introduce a lot of complications. Currently only 
LOForEach/LOSplitOutput deals with duplicate uids; otherwise the handling would 
sprawl to all operators and all optimizer rules.

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch, PIG-3347-2-testonly.patch, 
> PIG-3347-3.patch, PIG-3347-4-testonly.patch, PIG-3347-5.patch
>
>
> The problem is that intermediate 'store' invocation "changes" the final store 
> output. Looks like it brings some kind of side effect. We did use 'local' 
> mode to run script
> here is the input data:
> 1
> 1
> Here is the script:
> {code}
> a = load 'test';
> a_group = group a by $0;
> b = foreach a_group {
>   a_distinct = distinct a.$0;
>   generate group, a_distinct;
> }
> --store b into 'b';
> c = filter b by SIZE(a_distinct) == 1;
> store c into 'out';
> {code}
> We expect output to be:
> 1 1
> The output is an empty file.
> Uncomment the {code}--store b into 'b';{code} line and see the difference.
> You would get the expected output.





[jira] [Updated] (PIG-3744) SequenceFileLoader does not support BytesWritable

2014-02-04 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3744:


Attachment: PIG-3744-2.patch

A test failed when I ran it before committing, because a single quote in the 
LOAD statement was accidentally deleted before I generated the patch. Fixed 
that in PIG-3744-2.patch.

> SequenceFileLoader does not support BytesWritable
> -
>
> Key: PIG-3744
> URL: https://issues.apache.org/jira/browse/PIG-3744
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.13.0
>
> Attachments: PIG-3744-1.patch, PIG-3744-2.patch
>
>
> SequenceFileLoader should be referring to BytesWritable for bytearray type, 
> but it refers to pig's DataByteArray which does not even implement Writable.





[jira] [Updated] (PIG-3744) SequenceFileLoader does not support BytesWritable

2014-02-04 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3744:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Daniel for the review.

> SequenceFileLoader does not support BytesWritable
> -
>
> Key: PIG-3744
> URL: https://issues.apache.org/jira/browse/PIG-3744
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.13.0
>
> Attachments: PIG-3744-1.patch, PIG-3744-2.patch
>
>
> SequenceFileLoader should be referring to BytesWritable for bytearray type, 
> but it refers to pig's DataByteArray which does not even implement Writable.





[jira] [Commented] (PIG-3744) SequenceFileLoader does not support BytesWritable

2014-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891160#comment-13891160
 ] 

Daniel Dai commented on PIG-3744:
-

+1

> SequenceFileLoader does not support BytesWritable
> -
>
> Key: PIG-3744
> URL: https://issues.apache.org/jira/browse/PIG-3744
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.13.0
>
> Attachments: PIG-3744-1.patch
>
>
> SequenceFileLoader should be referring to BytesWritable for bytearray type, 
> but it refers to pig's DataByteArray which does not even implement Writable.





[jira] [Commented] (PIG-259) allow store to overwrite existing directroy

2014-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891158#comment-13891158
 ] 

Daniel Dai commented on PIG-259:


Yes, good point. Let's remove the method to keep the interface simpler.

> allow store to overwrite existing directroy
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Olga Natkovich
>Assignee: Nezih Yigitbasi
> Fix For: 0.13.0
>
> Attachments: PIG-259.5.patch, PIG-259.6.patch, PIG-259.7.patch, 
> Pig_259.patch, Pig_259_2.patch, Pig_259_3.patch, Pig_259_4.patch
>
>
> we have users who are asking for a flag to overwrite existing directory





[jira] [Updated] (PIG-3744) SequenceFileLoader does not support BytesWritable

2014-02-04 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3744:


Status: Patch Available  (was: Open)

> SequenceFileLoader does not support BytesWritable
> -
>
> Key: PIG-3744
> URL: https://issues.apache.org/jira/browse/PIG-3744
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.13.0
>
> Attachments: PIG-3744-1.patch
>
>
> SequenceFileLoader should be referring to BytesWritable for bytearray type, 
> but it refers to pig's DataByteArray which does not even implement Writable.





[jira] [Updated] (PIG-3744) SequenceFileLoader does not support BytesWritable

2014-02-04 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3744:


Attachment: PIG-3744-1.patch

> SequenceFileLoader does not support BytesWritable
> -
>
> Key: PIG-3744
> URL: https://issues.apache.org/jira/browse/PIG-3744
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.13.0
>
> Attachments: PIG-3744-1.patch
>
>
> SequenceFileLoader should be referring to BytesWritable for bytearray type, 
> but it refers to pig's DataByteArray which does not even implement Writable.





[jira] [Commented] (PIG-259) allow store to overwrite existing directroy

2014-02-04 Thread Nezih Yigitbasi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891145#comment-13891145
 ] 

Nezih Yigitbasi commented on PIG-259:
-

Daniel, one question. Do you think the isOverwrite() method in the 
OverwritingStoreFunc interface is necessary? If a store func implements this 
interface, it is very likely to return true from isOverwrite(). Maybe we 
should remove that method. What do you think?

> allow store to overwrite existing directroy
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Olga Natkovich
>Assignee: Nezih Yigitbasi
> Fix For: 0.13.0
>
> Attachments: PIG-259.5.patch, PIG-259.6.patch, PIG-259.7.patch, 
> Pig_259.patch, Pig_259_2.patch, Pig_259_3.patch, Pig_259_4.patch
>
>
> we have users who are asking for a flag to overwrite existing directory





[jira] [Commented] (PIG-3347) Store invocation brings side effect

2014-02-04 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891127#comment-13891127
 ] 

Koji Noguchi commented on PIG-3347:
---

bq. we will need to generate a new uid for col2 to avoid uid conflict (using a 
UDF IdentityColumn)

Daniel, I think I understand how it is being used, but my confusion is: for the 
pure purpose of tracking column lineage, shouldn't the redundant uid inside the 
relation be allowed?  Isn't the requirement of no-conflict-uid coming from 
using the same uid for ProjectionPatcher which serves a different purpose than 
the lineage tracking?

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch, PIG-3347-2-testonly.patch, 
> PIG-3347-3.patch, PIG-3347-4-testonly.patch, PIG-3347-5.patch
>
>
> The problem is that intermediate 'store' invocation "changes" the final store 
> output. Looks like it brings some kind of side effect. We did use 'local' 
> mode to run script
> here is the input data:
> 1
> 1
> Here is the script:
> {code}
> a = load 'test';
> a_group = group a by $0;
> b = foreach a_group {
>   a_distinct = distinct a.$0;
>   generate group, a_distinct;
> }
> --store b into 'b';
> c = filter b by SIZE(a_distinct) == 1;
> store c into 'out';
> {code}
> We expect output to be:
> 1 1
> The output is an empty file.
> Uncomment the {code}--store b into 'b';{code} line and see the difference.
> You would get the expected output.





[jira] [Created] (PIG-3744) SequenceFileLoader does not support BytesWritable

2014-02-04 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-3744:
---

 Summary: SequenceFileLoader does not support BytesWritable
 Key: PIG-3744
 URL: https://issues.apache.org/jira/browse/PIG-3744
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.13.0


SequenceFileLoader should be referring to BytesWritable for bytearray type, but 
it refers to pig's DataByteArray which does not even implement Writable.





Re: Review Request 17681: [PIG-3742] Set MR runtime settings on tez runtime

2014-02-04 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17681/#review33632
---


Seems the patch cannot apply cleanly. Can you rebase?

- Daniel Dai





[jira] [Resolved] (PIG-259) allow store to overwrite existing directroy

2014-02-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-259.


   Resolution: Fixed
Fix Version/s: 0.13.0
 Hadoop Flags: Reviewed

Also added some comments to OverwritingStoreFunc. +1.

Patch committed to trunk. Thanks Nezih!

> allow store to overwrite existing directroy
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Olga Natkovich
>Assignee: Nezih Yigitbasi
> Fix For: 0.13.0
>
> Attachments: PIG-259.5.patch, PIG-259.6.patch, PIG-259.7.patch, 
> Pig_259.patch, Pig_259_2.patch, Pig_259_3.patch, Pig_259_4.patch
>
>
> we have users who are asking for a flag to overwrite existing directory



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (PIG-3567) LogicalPlanPrinter throws OOM for large scripts

2014-02-04 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3567:


Fix Version/s: 0.13.0
   0.12.1

> LogicalPlanPrinter throws OOM for large scripts
> ---
>
> Key: PIG-3567
> URL: https://issues.apache.org/jira/browse/PIG-3567
> Project: Pig
>  Issue Type: Bug
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.12.1, 0.13.0
>
> Attachments: PIG-3567.patch
>
>
> As mentioned in PIG-3455, LogicalPlanPrinter throws OOM for large scripts. 
> The problem is that LogicalPlanPrinter's visit method generates a large 
> string before it's written to the PrintStream.
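
The idea behind the report can be illustrated with a minimal sketch, which is not the attached patch. It assumes a hypothetical PlanNode class standing in for a plan operator and a hypothetical StreamingPlanPrinter; neither is Pig's real class. Each node is written to the PrintStream as soon as it is visited, so no dump-sized string is ever held in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a plan operator; not Pig's real operator class.
class PlanNode {
    final String name;
    final List<PlanNode> children = new ArrayList<>();

    PlanNode(String name) {
        this.name = name;
    }

    PlanNode add(PlanNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical printer used only for this sketch.
class StreamingPlanPrinter {
    // Writes each node to the stream as soon as it is visited, so memory use
    // grows with the depth of the plan rather than with the size of its dump.
    static void print(PlanNode node, PrintStream out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.print("    ");
        }
        out.print("|---");
        out.println(node.name);
        for (PlanNode child : node.children) {
            print(child, out, depth + 1);
        }
    }

    // Convenience helper for small plans only; it buffers the whole dump,
    // which is exactly what a large plan should avoid.
    static String render(PlanNode root) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer);
        print(root, out, 0);
        out.flush();
        return buffer.toString();
    }
}
```

For a real plan the caller would pass the destination stream (for example System.out) straight to print and skip render entirely.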





[jira] [Updated] (PIG-259) allow store to overwrite existing directroy

2014-02-04 Thread Nezih Yigitbasi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nezih Yigitbasi updated PIG-259:


Attachment: PIG-259.7.patch

Daniel,
Thanks for the comments.
1. Good catch.
2. Updated PigOutputFormat to be consistent with InputOutputFileValidator. Both 
do checks now.
3. Fixed.

> allow store to overwrite existing directroy
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Olga Natkovich
>Assignee: Nezih Yigitbasi
> Attachments: PIG-259.5.patch, PIG-259.6.patch, PIG-259.7.patch, 
> Pig_259.patch, Pig_259_2.patch, Pig_259_3.patch, Pig_259_4.patch
>
>
> we have users who are asking for a flag to overwrite existing directory





[jira] [Commented] (PIG-3347) Store invocation brings side effect

2014-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891032#comment-13891032
 ] 

Daniel Dai commented on PIG-3347:
-

[~knoguchi], in "B = foreach A generate a as col1, a as col2;", we need to 
generate a new uid for col2 to avoid a uid conflict (using a UDF, 
IdentityColumn). The downside is that this breaks the lineage chain. The uid is 
mostly used in the optimizer, and there are several holes when we use it for 
pure lineage. Optimizer rules are expected to live with these holes by skipping 
the optimization (e.g., PushUpFilter skips a foreach with a UDF, which includes 
the IdentityColumn introduced to fix the uid conflict).
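
A minimal sketch of the uid behaviour described above, under stated assumptions: Field and UidAssigner are hypothetical classes written for this illustration and are not Pig's schema or optimizer code. The first reference to a column keeps its uid, so lineage is preserved; a repeated reference gets a fresh uid, so uids stay unique within the relation at the cost of losing the direct link to the source column.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a schema field carrying a uid; not Pig's real schema class.
class Field {
    final String alias;
    final long uid;

    Field(String alias, long uid) {
        this.alias = alias;
        this.uid = uid;
    }
}

// Hypothetical helper; assumes firstFreeUid is larger than every uid already in use.
class UidAssigner {
    private long nextUid;

    UidAssigner(long firstFreeUid) {
        this.nextUid = firstFreeUid;
    }

    // Builds the output schema of a projection. projected.get(i) is the input
    // column referenced by the i-th output, aliases.get(i) is its new name.
    List<Field> project(List<Field> projected, List<String> aliases) {
        List<Field> output = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < projected.size(); i++) {
            long uid = projected.get(i).uid;
            if (!seen.add(uid)) {
                // Same column referenced again: break lineage, keep uids unique.
                uid = nextUid++;
                seen.add(uid);
            }
            output.add(new Field(aliases.get(i), uid));
        }
        return output;
    }
}
```

Projecting a column with uid 1 twice, as col1 and col2, yields uids 1 and 2 in this model, so col2 can no longer be matched to the source column by uid alone.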

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch, PIG-3347-2-testonly.patch, 
> PIG-3347-3.patch, PIG-3347-4-testonly.patch, PIG-3347-5.patch
>
>
> The problem is that an intermediate 'store' invocation "changes" the final 
> store output. It looks like it brings some kind of side effect. We used 
> 'local' mode to run the script.
> Here is the input data:
> 1
> 1
> Here is the script:
> {code}
> a = load 'test';
> a_group = group a by $0;
> b = foreach a_group {
>   a_distinct = distinct a.$0;
>   generate group, a_distinct;
> }
> --store b into 'b';
> c = filter b by SIZE(a_distinct) == 1;
> store c into 'out';
> {code}
> We expect output to be:
> 1 1
> The output is an empty file.
> Uncomment the {code}--store b into 'b';{code} line and see the difference.
> You would get the expected output.





[jira] [Commented] (PIG-3741) Utils.setTmpFileCompressionOnConf can cause side effect for SequenceFileInterStorage

2014-02-04 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891030#comment-13891030
 ] 

Julien Le Dem commented on PIG-3741:


Ideally each store would get its own config object, but that would be a major 
refactoring.
In the meantime, this looks like a good improvement to me.
+1

> Utils.setTmpFileCompressionOnConf can cause side effect for 
> SequenceFileInterStorage
> 
>
> Key: PIG-3741
> URL: https://issues.apache.org/jira/browse/PIG-3741
> Project: Pig
>  Issue Type: Bug
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Fix For: 0.12.1
>
> Attachments: PIG-3741.patch
>
>
> Currently, Utils.setTmpFileCompressionOnConf(pigContext, conf); is invoked 
> for every job. In case of Seqfile, this api sets mapreduce params on conf to 
> assist SequenceFileInterStorage. However, as a side effect, this might change 
> the behavior of other storers due to these mapred properties. This api should 
> only be called for jobs with intermediate storage.
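
The proposed behaviour can be sketched as follows. This is an illustration under assumptions, not the attached patch: TmpCompressionConfig, its configure method, and the property name sketch.output.compress are all hypothetical, and a plain Map stands in for the job configuration. The compression setting is applied only when the job writes intermediate data, so other storers never see it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for this sketch; not Pig's Utils class.
class TmpCompressionConfig {
    // Placeholder property name used only here; not a real Hadoop or Pig key.
    static final String SKETCH_COMPRESS_KEY = "sketch.output.compress";

    // Returns a copy of the job configuration. The compression property is set
    // only for jobs with intermediate storage; the input map is never modified.
    static Map<String, String> configure(Map<String, String> conf, boolean hasIntermediateStore) {
        Map<String, String> result = new HashMap<>(conf);
        if (hasIntermediateStore) {
            result.put(SKETCH_COMPRESS_KEY, "true");
        }
        return result;
    }
}
```

Returning a copy rather than mutating a shared object is a design choice of this sketch; it keeps one job's settings from leaking into another.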





[jira] [Updated] (PIG-3347) Store invocation brings side effect

2014-02-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3347:


Attachment: PIG-3347-5.patch

Attached another patch which also addresses Koji's new case.

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch, PIG-3347-2-testonly.patch, 
> PIG-3347-3.patch, PIG-3347-4-testonly.patch, PIG-3347-5.patch
>
>





[jira] [Commented] (PIG-259) allow store to overwrite existing directroy

2014-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890971#comment-13890971
 ] 

Daniel Dai commented on PIG-259:


Thanks for the update. A few more comments:
1. PigOutputFormat.java: Remove "PigStorage ps = (PigStorage) sFunc;"; we 
cannot assume sFunc is a PigStorage.
2. InputOutputFileValidator.java: Shall we skip checkOutputSpecs when overwrite 
happens? There is nothing wrong with catching FileAlreadyExistsException, 
but since you skip checkOutputSpecs in PigOutputFormat, it seems 
better to do it consistently.
3. There is another tab in PigStorage.java: "protected ResourceSchema schema"
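
Comments 1 and 2 can be sketched together, under stated assumptions: SketchStoreFunc, OverwriteAware, and OutputChecks are hypothetical names for this illustration and are not Pig's actual API. The store function is tested against a capability interface instead of being cast to a concrete class, and a single helper decides whether the output check runs, so every caller behaves consistently.

```java
// Hypothetical marker for any store function; not Pig's real StoreFunc.
interface SketchStoreFunc {
}

// Hypothetical capability interface for store functions that support overwrite.
interface OverwriteAware {
    boolean shouldOverwrite();
}

// Hypothetical helper shared by every place that validates the output location.
class OutputChecks {
    // Returns true when the "output already exists" check should run.
    // Store functions that do not implement OverwriteAware are always checked.
    static boolean mustCheckOutput(SketchStoreFunc sFunc) {
        if (sFunc instanceof OverwriteAware) {
            return !((OverwriteAware) sFunc).shouldOverwrite();
        }
        return true;
    }
}
```

Because the check goes through instanceof, a store function that knows nothing about overwriting keeps the default behaviour and never triggers a ClassCastException.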

> allow store to overwrite existing directroy
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Olga Natkovich
>Assignee: Nezih Yigitbasi
> Attachments: PIG-259.5.patch, PIG-259.6.patch, Pig_259.patch, 
> Pig_259_2.patch, Pig_259_3.patch, Pig_259_4.patch
>
>
> we have users who are asking for a flag to overwrite existing directory





[jira] [Commented] (PIG-3347) Store invocation brings side effect

2014-02-04 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890942#comment-13890942
 ] 

Koji Noguchi commented on PIG-3347:
---

bq. UID is to track column lineage in the logical optimizer, so that we can 
freely move operators up and down; ProjectionPatcher will reposition the column 
according to uid

I think part of my confusion comes from these two uses. The UID is used (1) for 
tracking column lineage and (2) by ProjectionPatcher to reposition columns, 
which requires the UID to be unique within each relation.

Because of (2), we're seeing a new uid being created whenever a column is 
referenced multiple times.
For example:
A = load 'a.txt' as (a:int);
B = foreach A generate a as col1, a as col2; 

This would create a schema like 
{noformat}
1-2: (Name: LOStore Schema: col1#1:int,col2#2:int)
...
|---A: (Name: LOLoad Schema: a#1:int)RequiredFields:null
{noformat}

So without traversing the lineage, I cannot connect 'col2' to the original 'a'.
However, optimizer rules like PushUpFilter and FilterAboveForeach seem to use 
just the UID to determine the field usages...

But this is outside of this jira.  I need to spend more time learning how the 
pig compiler works.

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch, PIG-3347-2-testonly.patch, 
> PIG-3347-3.patch, PIG-3347-4-testonly.patch
>
>





[jira] [Updated] (PIG-3347) Store invocation brings side effect

2014-02-04 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-3347:
--

Attachment: PIG-3347-4-testonly.patch

Thanks [~daijy].  
Adding one more testcase that I believe should push the filter before foreach.
This one succeeds without the patch but fails with the patch.

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch, PIG-3347-2-testonly.patch, 
> PIG-3347-3.patch, PIG-3347-4-testonly.patch
>
>





[jira] [Commented] (PIG-3347) Store invocation brings side effect

2014-02-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890903#comment-13890903
 ] 

Daniel Dai commented on PIG-3347:
-

All unit tests pass with the patch.

> Store invocation brings side effect
> ---
>
> Key: PIG-3347
> URL: https://issues.apache.org/jira/browse/PIG-3347
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.11
> Environment: local mode
>Reporter: Sergey
>Assignee: Daniel Dai
>Priority: Critical
> Fix For: 0.12.1
>
> Attachments: PIG-3347-1.patch, PIG-3347-2-testonly.patch, 
> PIG-3347-3.patch
>
>





Review Request 17681: [PIG-3742] Set MR runtime settings on tez runtime

2014-02-04 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17681/
---

Review request for pig, Cheolsoo Park and Daniel Dai.


Bugs: PIG-3742
https://issues.apache.org/jira/browse/PIG-3742


Repository: pig


Description
---

Changes made:
1) Converted the relevant MR settings to equivalent Tez settings and set them 
on AM, Vertex and Edge.
2) Moved the util and helper classes (SecurityHelper and TezCompilerUtil) to a 
util package. Does not show up cleanly in review board. Will be doing a svn mv 
while committing.
3) Fixed an issue with 1-1 edge in orderby while running pigmix where 
parallelism was not reflected in the second edge when the parallelism of first 
vertex changed after input split calculation. Also made POIdentityOutTez work 
with shuffle input as well when trying to test performance with 1-1 edge or 
shuffle edge with round robin partitioner. Shuffle edge with round robin 
partitioner or hash partitioner was very bad compared to MR. Even with 1-1 
edge, performance is bad for L10.pig which orders by multiple columns. Still 
need to work on order by performance. Hoping unsorted shuffle with TEZ-661 
might make it better.


Diffs
-

  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/POIdentityInOutTez.java
 1563492 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/SecurityHelper.java
 1563492 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java
 1563492 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompilerUtil.java
 1563492 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
 1563492 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java
 1563492 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java
 1563492 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/MRToTezHelper.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/SecurityHelper.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/pig/branches/tez/src/org/apache/pig/backend/hadoop/executionengine/tez/util/TezCompilerUtil.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/17681/diff/


Testing
---

Unit and tez.conf e2e tests pass.


Thanks,

Rohini Palaniswamy



Re: pig load function

2014-02-04 Thread Mark Wagner
Hi Krishna,

By default it uses the PigStorage LoadFunc. You can change that
behavior though by setting "pig.default.load.func" to your LoadFunc.
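
The lookup described above can be sketched as follows. This models the behaviour only and is not Pig's implementation: DefaultLoadFunc and its resolve method are hypothetical, and com.example.MyLoader in the usage note is a made-up class name. Only the property name "pig.default.load.func" and the PigStorage default come from this thread.

```java
import java.util.Properties;

// Hypothetical helper that models how the loader name is chosen.
class DefaultLoadFunc {
    static final String KEY = "pig.default.load.func";
    static final String FALLBACK = "PigStorage";

    // An explicit "using" clause wins; otherwise the configured default is
    // used, and PigStorage is the fallback when nothing is configured.
    static String resolve(Properties props, String usingClause) {
        if (usingClause != null) {
            return usingClause;
        }
        return props.getProperty(KEY, FALLBACK);
    }
}
```

With no property set and no "using" clause, resolve returns "PigStorage"; after props.setProperty("pig.default.load.func", "com.example.MyLoader") it returns that class name instead.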

-Mark

On Mon, Feb 3, 2014 at 8:37 PM, Krishna Prasad Ambaripeta
 wrote:
> Hi. I am new to Pig and have a basic question. When we write "a = load 
> 'a/y.txt' as (a,b)", which Pig function does it call? Is it LoadFunc?
> Thanks for the support.
> Thanks, Krishna Prasad


pig load function

2014-02-04 Thread Krishna Prasad Ambaripeta
Hi. I am new to Pig and have a basic question. When we write "a = load 
'a/y.txt' as (a,b)", which Pig function does it call? Is it LoadFunc?
Thanks for the support.
Thanks, Krishna Prasad